Estimation bias and agreement limits between two common self-report methods of habitual sleep duration in epidemiological surveys

Korman, Maria; Zarina, Daria; Tkachev, Vadim; Merikanto, Ilona; Bjorvatn, Bjørn; Bjelajac, Adrijana Koscec; Penzel, Thomas; Landtblom, Anne-Marie; Benedict, Christian; Chan, Ngan Yin; Wing, Yun Kwok; Dauvilliers, Yves; Morin, Charles M.; Matsui, Kentaro; Nadorff, Michael; Bolstad, Courtney J.; Chung, Frances; Mota-Rolim, Sérgio; De Gennaro, Luigi; Plazzi, Giuseppe; Yordanova, Juliana; Holzinger, Brigitte; Partinen, Markku; Reis, Cátia

doi:10.1038/s41598-024-53174-1

Published: 10 February 2024

Scientific Reports volume 14, Article number: 3420 (2024)

Abstract

Accurate measurement of habitual sleep duration (HSD) is crucial for understanding the relationship between sleep and health. This study aimed to assess the bias and agreement limits between two commonly used short HSD self-report methods, considering sleep quality (SQ) and social jetlag (SJL) as potential predictors of bias. Data from 10,268 participants in the International COVID Sleep Study-II (ICOSS-II) were used. Method-Self and Method-MCTQ were compared. Method-Self involved a single question about average nightly sleep duration (HSD_self), while Method-MCTQ estimated HSD from reported sleep times on workdays (HSD_MCTQwork) and free days (HSD_MCTQfree). Sleep quality was evaluated using a Likert scale and the Insomnia Severity Index (ISI) to explore its influence on estimation bias. HSD_self was on average 42.41 ± 67.42 min lower than HSD_MCTQweek, with an agreement range within ± 133 min. The bias and agreement range between methods increased with poorer SQ. HSD_MCTQwork showed less bias and better agreement with HSD_self compared to HSD_MCTQfree. Sleep duration irregularity was − 43.35 ± 78.26 min on average. Subjective sleep quality predicted a significant proportion of variance in HSD_self and estimation bias. The two methods showed very poor agreement and a significant systematic bias, both worsening with poorer SQ. Method-MCTQ considered sleep intervals without adjusting for SQ issues such as wakefulness after sleep onset but accounted for sleep irregularity and sleeping in on free days, while Method-Self reflected respondents’ interpretation of their sleep, focusing on their sleep on workdays. Including an SQ-related question in surveys may help bidirectionally adjust the possible bias and enhance the accuracy of sleep-health studies.

Introduction

Habitual sleep duration (HSD) is a widely investigated parameter because of its many highly reproducible associations with physical and psychological health outcomes^1,2. It is common to find that health outcomes of interest deteriorate as self-reported HSD deviates from the reference sleep norm interval^3,4,5,6,7. Choosing the right tools to estimate HSD is challenging in epidemiological sleep research. The best method to self-report HSD is a sleep diary⁸, but it is generally not applicable in surveys. The majority of sleep questionnaires validated against polysomnography (PSG), which are routinely used in clinical evaluation to reliably distinguish between individuals with and without sleep disorders, are relatively long⁹. To ensure good compliance and high response rates, tools with a minimal number of items are therefore prioritized in epidemiological surveys¹⁰.

Assessment of HSD in epidemiological surveys can include single questions such as “How many hours do you usually sleep at night?” (e.g., Pittsburgh Sleep Quality Index—PSQI, Self-Assessment of Sleep Survey—SASS)^11,12, which assumes that adults provide an accurate global and retrospective approximation of their sleep length. Other HSD estimation methods use two questions about sleep onset and offset times to estimate the sleep interval (e.g., Karolinska Sleep Questionnaire—KSQ, Basic Nordic Sleep Questionnaire—BNSQ, Munich Chronotype Questionnaire—MCTQ); these questions are asked separately for work and free days^13,14,15. This method estimates sleep timing and crucial sleep metrics like social jetlag (SJL) and irregular sleep¹⁶. For example, inconsistent sleep timing is an important risk factor for metabolic abnormalities, even more significant than sleep duration¹⁷.

Various studies found weak-to-moderate correlations between single items of HSD and objectively measured sleep; however, the agreement between different methods is poor, with limits ranging between 2.0 and 3.5 h above and below the mean difference^1,18,19,20,21,22. Also, sleep diaries and single-question HSDs displayed either non-significant or weak associations¹. Self-assessment and time-in-bed duration calculated from habitual bedtime and wake time (rather than sleep onset and offset times) were recently reported to show disagreement with actigraphy-based sleep duration. Specifically, the single question provided a significant underestimate of HSD, while the bed-wake interval agreed well with Time-in-Bed (TIB) but overestimated Total Sleep Time (TST)¹⁸. These biases and disagreements pose a significant challenge to the accurate assessment of the contribution of HSD to physical and psychological health in survey research. Further, a recent methodological review showed that variability in the questions relating to sleep, such as event definitions (e.g., “go to bed” vs. “fall asleep”), context (e.g., “habitual” vs. “work/free days”) and timeframe (“typical night” vs. “recently”), leads to discrepancies in HSD estimation by different self-report methods²³. Additionally, perceived sleep quality, insomnia symptoms and social schedules are important factors that can affect self-reported HSD¹⁹, but the extent of these effects has not been systematically quantified in large cohorts.

Sleep quality refers to the subjective experience of sleep, reflecting a number of quantifiable components of physiological sleep, such as depth of sleep (i.e., amount of slow-wave sleep), sleep continuity (i.e., wake after sleep onset, percentage of time awake, and number of awakenings) and additional internal or external factors (i.e., circadian profile, pain, stress)²⁴. Poor sleep quality can lead to overestimation or underestimation of sleep duration²⁵. A single question on overall sleep quality using a Likert scale is common in both experimental and epidemiological studies, with a verbal scale providing more stable estimation than a numerical scale^10,12. The Insomnia Severity Index (ISI) is sometimes also used as a proxy for sleep quality^26,27. Social time pressure refers to the demands and constraints of social obligations that may limit sleep duration²⁸. In industrialized societies, people often experience high social time pressure on workdays and a large mismatch between internal biological time and social time. This mismatch can be quantified by the difference between the mid-sleep points on free days and workdays; it reflects irregularity of sleep timing and is called Social Jet Lag (SJL)²⁹. Because self-report questions always encompass more than physiological sleep duration alone, it is important to evaluate the differences between common self-report methods used to assess HSD in surveys, with a focus on the potential predictors of the bias. The first objective of this study was to evaluate within-subject estimation bias and the limits of agreement between two short self-report methods used to assess HSD in a large, global, heterogeneous sample from the International COVID Sleep Study II (ICOSS-II) project³⁰. The second objective was to assess the contribution of subjective Sleep Quality and Social Time Pressure to the HSD estimation bias. The contribution of Sleep Quality was validated against the Insomnia Severity Index (ISI), one of the most widely used tools to assess sleep problems in clinical and community samples²⁷.

Results

The sample consisted of 10,268 participants with a mean age of 43.16 ± 16.80 years (mean ± standard deviation), of whom 68.3% were female. Demographic descriptives are given in Table 1.

Table 1 Socio-demographic characteristics and sleep measures of the sample. Mean ± SD or frequency (% of group total).

Estimation of habitual sleep duration bias and the agreement between methods

Distributions of HSDs from both methods are shown in Fig. 1a, with mean HSD_self being shorter (418.9 ± 77.2 min) than mean HSD_MCTQweek (461.4 ± 75.1 min). A paired t-test was used to quantify the within-subject difference between methods. A systematic HSD estimation bias was observed (t = − 63.07, df = 10,267, p < 0.001). The mean bias was − 42.41 ± 67.42 min (95% CI of the difference: − 43.72 to − 41.11) and was normally distributed (Fig. 1b), although HSD_self and HSD_MCTQweek were significantly positively correlated (rho = 0.604, p < 0.001, weighted by age).

The level of agreement between the two HSD assessment methods is visualized using the Bland–Altman plot in Fig. 1c. As neither of the two methods is a “reference”, the bias was plotted against the mean of the HSD_self and HSD_MCTQweek values. To assess whether the bias (represented by the gap between the X axis and the blue mean line) is stable across the whole range of values, a linear regression line (red) was fitted to the data points. A Pearson test demonstrated a statistically significant but negligible slope (k = 0.034, Beta = 0.02, p = 0.03). Finally, the limits of agreement between methods were calculated as: lower limit \(\overline{d} - 1.96\,s = -42.41 - (1.96 \times 67.42) \approx -175\) min; upper limit \(\overline{d} + 1.96\,s = -42.41 + (1.96 \times 67.42) \approx 90\) min, where \(\overline{d}\) is the mean bias and \(s\) is its standard deviation. Altogether, the two methods agreed only to within ± 133 min around the mean bias; in other words, HSD_self may be up to 90 min above or 175 min below HSD_MCTQweek.

A simple regression model using weighted joint distribution of gender and age by country showed that age was not a significant predictor of the HSD bias (F(1, 10,256) = 2.77, p = 0.096, Beta = 0.016). However, women had significantly larger HSD bias than men (t = 4.55, p < 0.001, mean difference = 6.6 min), but with a negligibly small effect size (Cohen’s d = 0.097).

Sleeping well? The HSD estimation bias and the agreement of the methods depend on subjective sleep quality

HSD estimated by both methods negatively correlated with participants’ subjective Sleep Quality, with sleep quality demonstrating a stronger relation to HSD_self (Pearson correlations weighted by age: rho = − 0.334, p < 0.01, rho = − 0.134, p < 0.01; HSD_self and HSD_MCTQweek, respectively). Although the two methods are presumably estimating the same construct, using the Fisher r-to-z transformation we found that the two correlation coefficients were also significantly different (z = − 15.71, p < 0.01). The correlation between HSD estimation bias and subjective Sleep Quality was also significant (rho = − 0.207, p < 0.01).
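The comparison of the two correlation coefficients can be illustrated with a minimal Python sketch of the Fisher r-to-z transformation. It assumes the textbook formula for two independent correlations; the function name `fisher_z_difference` is hypothetical, and the sketch is not claimed to be the exact procedure used in the study. Because the reported correlations were weighted by age and come from the same participants, the result is expected to be close to, but not identical with, the reported z value.

```python
import math


def fisher_z_difference(r1, r2, n1, n2):
    """z statistic for the difference between two correlations.

    Assumption: the two correlations are treated as independent,
    which is the simplest textbook form of the test.
    """
    z1 = math.atanh(r1)  # Fisher transformation of the first correlation
    z2 = math.atanh(r2)  # Fisher transformation of the second correlation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


# Correlations of Sleep Quality with HSD_self and HSD_MCTQweek, as reported above.
z = fisher_z_difference(-0.334, -0.134, 10268, 10268)
print(f"z = {z:.2f}")
```

With the reported coefficients this gives a z statistic of roughly − 15.2, the same order as the reported − 15.71.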

To quantify how the agreement between the two methods depends on subjective sleep quality, and given the large sample size of the ICOSS-II study, the HSD bias was analyzed separately for each of the five Sleep Quality groups. One-way ANOVA showed that the estimation bias became more negative as sleep quality decreased (F(4, 10,256) = 105.16, p < 0.001). The results are summarized in Fig. 2. The minimal HSD estimation bias (− 26.69 ± 58.10 min) and the narrowest range of agreement between methods (± 114 min) were found in the group sleeping “well”. The estimation bias and range of agreement became progressively larger with poorer sleep quality. The HSD estimation bias in the group sleeping “badly” reached a maximum of − 79.97 ± 97.29 min, with a range of agreement of ± 191 min. Post-hoc pairwise comparisons with Bonferroni corrections demonstrated significant differences between each of the five sleep quality groups (see supplementary information SI-Table S.1), suggesting that the underestimation of HSD_self relative to HSD_MCTQweek increases incrementally.
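The agreement ranges quoted for the whole sample and for the best and worst Sleep Quality groups follow from the reported mean bias and standard deviation. The Python sketch below recomputes them from those summary statistics; the helper name `agreement_summary` is hypothetical, and only the values printed in the text are used.

```python
def agreement_summary(mean_bias, sd_bias, z=1.96):
    """Limits of agreement (in minutes) from a mean bias and its SD."""
    half_width = z * sd_bias
    return {
        "lower": mean_bias - half_width,
        "upper": mean_bias + half_width,
        "half_width": half_width,
    }


# Mean bias and SD in minutes, as reported in the text.
reported = {
    "whole sample": (-42.41, 67.42),
    "well": (-26.69, 58.10),
    "badly": (-79.97, 97.29),
}

for label, (mean_bias, sd_bias) in reported.items():
    result = agreement_summary(mean_bias, sd_bias)
    print(
        f"{label}: half-width {result['half_width']:.1f} min, "
        f"limits {result['lower']:.1f} to {result['upper']:.1f} min"
    )
```

The half-widths come out at about 132, 114 and 191 min, which matches the reported ± 133, ± 114 and ± 191 min to within rounding.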

Workdays or free days? The HSD estimation bias and the agreement of the methods depend on social time pressure (workdays/free days)

Most participants reported irregular sleep durations across the week. The mean difference between HSD_MCTQwork and HSD_MCTQfree was − 43.35 ± 78.26 min (449.0 ± 81.1 and 492.3 ± 87.7 min, respectively; paired t-test, t(10,267) = − 56.13, p < 0.001). Accordingly, in the distribution of the difference between HSD_MCTQwork and HSD_MCTQfree, the majority of respondents reported longer sleep duration on free days (percentiles in minutes: 25th = 0, 50th = 30, 75th = 75).

Next, we tested the hypothesis that HSD_MCTQwork would demonstrate a smaller estimation bias and better agreement with HSD_self than HSD_MCTQfree. The mean estimation bias for HSD_MCTQwork was smaller than for HSD_MCTQfree (− 30 min and − 73 min, respectively, Fig. 3a). Further, the agreement limits of HSD_MCTQwork with HSD_self were similar to those of HSD_MCTQweek but better than those of HSD_MCTQfree (± 140 min vs. ± 169 min, respectively, Fig. 3b,c). The observation that Sleep Quality groups were significantly different from each other was also replicated in the HSD_self–HSD_MCTQwork and HSD_self–HSD_MCTQfree comparisons (SI-Tables S.2, S.3).

The mean SJL of the sample was 56.5 ± 62.2 min (SJL percentiles, in minutes: 25th = 15, 50th = 45, 75th = 90). There were no significant differences in SJL between the Sleep Quality groups (One-way ANOVA p = 0.205).

The combined contribution of sleep quality and social time pressure on HSD estimation bias

Having established the effects of Sleep Quality and Social Time Pressure on HSD estimation bias, we presumed that their combination may demonstrate conditions under which the bias is minimal and the agreement between the methods is most reliable. One-way ANOVAs showed that the estimation bias became more negative in both methods as the sleep quality decreased (F(4, 10,263) = 84.312, p < 0.001; F(4, 10,263) = 79.65, p < 0.001; Method-MCTQ_work and Method-MCTQ_free, respectively). Post-hoc pairwise comparisons with Bonferroni corrections for HSD_MCTQwork showed that “well” and “rather well” Sleep Quality groups did not differ, while all other groups showed significant differences (SI-Table S.4). In contrast, for HSD_MCTQfree, “rather badly” and “badly” Sleep Quality groups were not significantly different from each other, while all other groups showed significant differences (SI-Table S.5). The “well” and “rather well” sleeping groups during workdays showed the best parameters: the mean HSD estimation bias was only − 15.81 ± 62.77 min and the two methods agreed within ± 114 min (Fig. 4a,b).

Weighted least squares stepwise regressions were conducted to examine the extent to which Sleep Quality and Social Time Pressure (represented by SJL) explained the variance in the different HSDs and in the HSD estimation bias itself. The main model had 5 predictors: Sleep Quality, SJL, age, gender, and BMI. The gender and age distribution by country was used for weighting. The model explained 13.7% of the HSD_self variance, 4.2% of the HSD_MCTQweek variance, 3.6% of the HSD_MCTQwork variance, 10.8% of the HSD_MCTQfree variance and 6.9% of the variance in the HSD estimation bias. The leading predictor in all models except HSD_MCTQfree was Sleep Quality, with HSD_self demonstrating the largest dependence (12.5%, 2.1%, 2.1% and 6.2% for HSD_self, HSD_MCTQweek, HSD_MCTQwork and the HSD estimation bias, respectively). The leading predictor of HSD_MCTQfree was SJL (7.4%). Age and gender were significant predictors in most models but explained less than 1% of the variance in all of them (statistical details in supplementary information SI-Table S.6).

Comparison between the contributions of sleep quality and ISI score to HSD estimation bias

The contribution of subjective Sleep Quality to the models was validated against the ISI score, a clinical index of insomnia symptom severity. Weighted least squares stepwise regressions were re-run with the ISI score in place of Sleep Quality, keeping the other four predictors of the original model. The variance in HSD_self, HSD_MCTQweek and HSD_MCTQwork was primarily explained by the ISI score, but the models were less robust (8.4%, 1.4% and 1.5%, respectively). Full statistical details are given in SI-Table S.7, and SI-Fig. S.1 shows the distribution of the HSD estimation bias values by ISI categories. Finally, a model including both Sleep Quality and the continuous ISI score as predictors (together with SJL, gender, age, and BMI) explained 6.9% of the variance in HSD estimation bias. Note that the ISI score was the least robust contributor, accounting for only 0.1% of the variance (SI-Table S.8), which demonstrates that the ISI score was practically redundant as a predictor of the HSD estimation bias.

Discussion

It is not clear which self-report method for measuring sleep duration can be recommended with confidence for large online surveys, since large discrepancies are systematically observed between different methods. Our findings in a large international sample of 10,268 participants likewise showed a poor agreement range (± 133 min) and a high systematic estimation bias (− 42.41 ± 67.42 min) between a single question and HSD derived from sleep onset and offset. Thus, for a given person, self-reported sleep duration (HSD_self) will almost always be lower than the self-reported sleep interval (according to HSD_MCTQweek). For example, if somebody says they sleep 7.5 h a night, they would on average estimate their sleep interval as ~ 8 h 12 min (+ 42 min), but the accuracy of this estimate will be very low (± 133 min).

While inaccuracy and problems with face validity of different methods are well recognized in the literature, differences in the dimensionality of the self-report methods, that is, the factors that contribute to the poor agreement between them and at least partially explain the bias, have been less studied^18,19,23. If HSD is systematically under- or overestimated depending on the question, the associations of health outcomes with sleep duration will also be systematically inflated or flattened³¹. Our findings showed that subjective sleep quality was a strong driver of the estimation bias; the bias almost tripled from the best to the worst Sleep Quality group (from − 26.69 ± 58.10 to − 79.97 ± 97.29 min). Furthermore, estimation bias changed incrementally with decreasing sleep quality. We also showed that a single question addressing sleep quality contributed more to the model explaining the HSD estimation bias than a multi-item insomnia symptom severity score. Moreover, having both Sleep Quality and ISI scores as predictors of HSD estimation bias was, in fact, redundant. Sleep quality was also the leading predictor of HSD_self, HSD_MCTQweek and HSD_MCTQwork, while SJL was the leading predictor of HSD_MCTQfree. The quantitative estimate of the bias between methods can be used bi-directionally to convert HSD from one method to the other, if a subjective sleep quality parameter is available.

Our findings therefore indicate that assessing HSD with a single question and deriving HSD from sleep onset and offset may capture distinct aspects of sleep duration. HSD_MCTQweek was only subtly influenced by sleep quality, while HSD_self and the estimation bias were profoundly sensitive to it. Conversely, the single-question method accounts for poor sleep but lacks sensitivity to sleep rebound on free days. This may happen because people tend to report the most representative days of the week (i.e., workdays), and report lower sleep satisfaction during workdays. This makes the single-question method more susceptible to sleep misperception. Sleep misperception has been found to vary widely in the general population and in patients with insomnia³², hypersomnia³³ and obstructive sleep apnea³⁴. These results agree with previous findings in which single questions about sleep duration and sleep quality from the PSQI were shown to represent workdays, whereas when the same PSQI questions were asked separately for workdays and free days, participants from the general population³⁵ as well as from clinical populations reported better sleep on free days, and this difference was mediated by SJL³⁶. Women had a slightly higher HSD estimation bias than men (~ 6 min), and this finding may be explained by the fact that women tend to report lower sleep quality³⁷. Interestingly, although sleep duration changes through life³⁸, age had no effect on the HSD estimation bias, suggesting that the underestimation of HSD_self relative to HSD_MCTQweek is a phenomenon that is stable across ages and related to sleep quality.

Several limitations should be considered when interpreting our results. The sample was a convenience sample collected during the COVID-19 pandemic, included participants with the novel health profile of long COVID, and had a clear overrepresentation of women (68.3%). In particular, the data collection period was associated with many changes in the social and personal lives of people across participating countries, although data were not collected during confinement. Sleep–wake habits during the pandemic were adaptively changing worldwide, with many people working and studying from home^39,40,41. Additionally, this study was designed to engage participants who may have had COVID-19 and suffer from symptoms of long COVID^25,30. Indeed, 9.1% of the sample reported symptoms of long COVID when enrolled in the ICOSS-II study. However, the sensitivity analyses in a subgroup of participants with long-COVID symptoms and in a subgroup of older adults supported the conclusion that the HSD bias between methods is a stable trait primarily related to Sleep Quality (see details in the “Methods” and Supplementary Materials sections). Altogether, the generalizability of the web-based survey is limited, but this may be partially offset by the large sample size and the uniform data acquisition period.

Concerns about the accuracy of self-reported sleep duration in surveys are longstanding^19,42,43, even prompting suggestions to exclude it from epidemiological studies⁴⁴. Nevertheless, in large-scale field sleep studies the use of self-report tools is often the only possible option, as in the case of the COVID-19 pandemic^28,30. In recent years, many studies have shown associations of self-report measures with chronic diseases and mental health^5,6,7,45, identifying risk factors, screening for sleep disorders, monitoring changes in population habits, and clarifying the broader public health implications. We believe that researchers using measures of sleep duration based on self-reports should be aware of the meanings and limitations associated with each method, as well as of the disagreement between them, should not assume that all of them reflect physiological sleep to the same extent, and should strive to add objective measurements of sleep duration or a sleep diary when possible.

To conclude, the two methods showed very poor agreement and a significant systematic bias, both worsening with poorer subjective sleep quality. The method using self-reported sleep onset and offset times provides a “raw” calculation of the sleep intervals for work and free days, accounts for irregularities in sleep duration and timing but is inherently insensitive to the frequency and length of awakenings^46,47. The accuracy of sleep intervals estimations would benefit from inclusion of a wakefulness after sleep onset item, as in Evanger et al.⁴⁸. The single-question sleep duration assessment was found to be associated with sleep quality, and thus may reflect in part how respondents perceive their sleep. However, this method is inherently insensitive to the sleep rebound that occurs on days off^31,49. We suggest that assessing sleep duration and subjective sleep quality separately for workdays and free days may improve the design of future studies^35,36. This can be done using either single or two-question approach, in accordance with the specific objectives of the study and, when possible, should include objective measures of sleep. Future studies should evaluate whether including items assessing sleep quality (e.g., single question) and wakefulness after sleep onset may facilitate the implementation of adjustments accounting for potential biases between HSD estimation methods.

Methods

Data collection

This study used data from the International COVID Sleep Study II (ICOSS-II)³⁰, an international collaboration between sleep and circadian rhythm experts. ICOSS-II was a web-based anonymous survey conducted between May and December 2021 in parallel across the following 16 countries, using translations into local languages: Austria, Brazil, Bulgaria, Canada, Hong Kong/China, Croatia, Finland, France, Germany, Israel, Italy, Japan, Norway, Portugal, Sweden, USA. The survey used the Qualtrics and REDCap platforms. The study conforms to the standards of the Declaration of Helsinki. After a brief explanation of the study, the survey became available to participants once they had given informed consent to take part. All investigators obtained local ethical committee (REB) approval when applicable (detailed list in supplementary material Table S.8). Due to the anonymous nature of the survey, REB permissions were exempted in some countries.

A total of 16,899 participants opened the link to the ICOSS questionnaire, and 15,859 had valid data. For this study we excluded shift/night workers and subjects reporting severe health conditions (atrial fibrillation, heart failure, stroke, other heart conditions, chronic obstructive pulmonary disease, kidney failure, cancer, immunosuppressive treatment, ongoing COVID-19). For quality control reasons, we excluded participants with HSD < 2.5 h or > 16 h (in either HSD_self or HSD_MCTQfree), with a discrepancy in sleep duration estimation of more than 400 min between the two methods, or with missing data in the sleep duration and sleep quality parameters. The final sample comprised 10,268 individuals.
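The quality-control rules above can be expressed as a simple per-participant filter. The Python sketch below is illustrative only: the function name, the argument names and the use of `None` for missing values are hypothetical, and it assumes that the 400 min discrepancy is measured between HSD_self and HSD_MCTQweek, which the text does not state explicitly. All durations are in minutes.

```python
def passes_quality_control(
    hsd_self,
    hsd_mctq_free,
    hsd_mctq_week,
    sleep_quality,
    min_hsd=150,          # 2.5 h expressed in minutes
    max_hsd=960,          # 16 h expressed in minutes
    max_discrepancy=400,  # maximum allowed difference between methods, in minutes
):
    """Return True if a participant's record satisfies the quality-control rules."""
    # Exclude records with missing sleep duration or sleep quality data.
    if any(v is None for v in (hsd_self, hsd_mctq_free, hsd_mctq_week, sleep_quality)):
        return False
    # Exclude implausibly short or long durations in either checked measure.
    for hsd in (hsd_self, hsd_mctq_free):
        if hsd < min_hsd or hsd > max_hsd:
            return False
    # Assumption: the discrepancy is taken between HSD_self and HSD_MCTQweek.
    return abs(hsd_self - hsd_mctq_week) <= max_discrepancy


print(passes_quality_control(420, 510, 465, "rather well"))  # plausible record
print(passes_quality_control(120, 510, 465, "badly"))        # HSD_self below 2.5 h
```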

Sleep assessment items and measures

HSD was assessed twice for each participant, using two methods. Method-Self was based on a single question (i.e., “How many hours per night you have been sleeping on average CURRENTLY?”) answered in the format hh:mm (HSD_self). Method-MCTQ used an adapted version of the Munich Chronotype Questionnaire (µMCTQ). Its questions referred to sleep onset and offset times, reported in 24 h local time format (i.e., “At what time do you usually fall asleep at work/free days CURRENTLY?”, “At what time do you usually wake up at work/free days CURRENTLY?”). Separate reports were obtained for workdays and free days, enabling calculation of HSD during workdays and free days (HSD_MCTQwork, HSD_MCTQfree) and of a weighted weekly average HSD, assuming 5 workdays (HSD_MCTQweek)⁵⁰. The resolution of the answers was 15 min. Sleep mid-points (between reported sleep onset and offset times) on workdays and free days were used to calculate SJL (absolute difference between sleep mid-points on free days and workdays)²⁹.
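The derived measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: all function names are hypothetical, the handling of sleep intervals that cross midnight (modulo 24 h) is an assumption, and so is the choice to take the SJL difference the short way around the clock.

```python
MINUTES_PER_DAY = 1440


def to_minutes(hhmm):
    """Convert a 24 h 'HH:MM' clock time to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def sleep_duration(onset, offset):
    """Sleep interval in minutes; the modulo handles intervals crossing midnight."""
    return (to_minutes(offset) - to_minutes(onset)) % MINUTES_PER_DAY


def mid_sleep(onset, offset):
    """Mid-point of the sleep interval, in minutes after midnight."""
    return (to_minutes(onset) + sleep_duration(onset, offset) / 2) % MINUTES_PER_DAY


def weekly_hsd(hsd_work, hsd_free, workdays=5):
    """Weighted weekly average duration, assuming 5 workdays and 2 free days."""
    return (workdays * hsd_work + (7 - workdays) * hsd_free) / 7


def social_jetlag(mid_work, mid_free):
    """Absolute mid-sleep difference in minutes (short way around the clock)."""
    diff = abs(mid_free - mid_work) % MINUTES_PER_DAY
    return min(diff, MINUTES_PER_DAY - diff)


# Hypothetical respondent: asleep 23:30 to 07:00 on workdays, 00:30 to 09:00 on free days.
hsd_work = sleep_duration("23:30", "07:00")
hsd_free = sleep_duration("00:30", "09:00")
hsd_week = weekly_hsd(hsd_work, hsd_free)
sjl = social_jetlag(mid_sleep("23:30", "07:00"), mid_sleep("00:30", "09:00"))
print(hsd_work, hsd_free, round(hsd_week, 1), sjl)
```

Because the weekly average is linear in its inputs, applying the same weighting to the sample means reported in the Results (449.0 and 492.3 min) gives about 461.4 min, in line with the reported mean HSD_MCTQweek.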

Subjective Sleep Quality was reported by participants on a 5-point Likert scale (i.e., well, rather well, neither well or badly, rather badly and badly) as in the BNSQ, in response to the question “How well have you been sleeping CURRENTLY?”. We used these categories to stratify the sample into Sleep Quality groups. Symptoms of insomnia were assessed using the Insomnia Severity Index (ISI), a 7-item questionnaire assessing the nature, severity, and impact of insomnia during “the last month”. Each item is rated on a 5-point Likert scale (0 = no problem to 4 = very severe problem), yielding a total score ranging from 0 to 28. The total score was interpreted as follows: absence of insomnia (0–7); sub-threshold insomnia (8–14); moderate insomnia (15–21); and severe insomnia (22–28)²⁷.
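The ISI scoring and interpretation bands described above map directly onto a small lookup. The Python sketch below uses the cut-offs given in the text; the function and constant names are hypothetical.

```python
# Upper bound of each band (inclusive) and its interpretation, as listed in the text.
ISI_BANDS = [
    (7, "absence of insomnia"),
    (14, "sub-threshold insomnia"),
    (21, "moderate insomnia"),
    (28, "severe insomnia"),
]


def isi_total(item_scores):
    """Sum the seven ISI items, each rated from 0 to 4."""
    if len(item_scores) != 7:
        raise ValueError("the ISI has exactly 7 items")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("each ISI item is rated from 0 to 4")
    return sum(item_scores)


def isi_category(total):
    """Map an ISI total score (0 to 28) to its interpretation band."""
    if not 0 <= total <= 28:
        raise ValueError("ISI total score must be between 0 and 28")
    for upper_bound, label in ISI_BANDS:
        if total <= upper_bound:
            return label


# Hypothetical respondent.
total = isi_total([2, 1, 3, 2, 2, 1, 2])
print(total, isi_category(total))
```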

Statistical analysis

Data are reported as mean ± SD or frequency (% of group total). The agreement between the two methods for assessment of HSD (Method-Self and Method-MCTQ) was analyzed using the approach proposed by Bland and Altman⁵¹. Mean differences between the methods [HSD_self–HSD_MCTQweek], [HSD_self–HSD_MCTQwork] or [HSD_self–HSD_MCTQfree] were evaluated as a measure of systematic bias using paired t-tests. The upper and lower limits of agreement were defined as mean difference ± 1.96 × standard deviation, with the corresponding 95% confidence interval (95% CI). The difference between the limits of agreement represents the range of HSD values covering the agreement between the two methods for ~ 95% of the individuals, as a measure of precision. Sleep Quality groups were compared using Mann–Whitney tests or t-tests for continuous variables, according to the type and distribution of the variables. A simple regression model weighted by the joint distribution of gender and age by country was used to estimate the contribution of these demographics to the HSD bias. Multiple regressions were run to evaluate the extent to which Sleep Quality and social time pressure (given by SJL) explained the variance in the different HSDs and in the HSD estimation bias itself. The main model included a set of 5 predictors: Sleep Quality, SJL, and potential demographic confounders previously linked to HSD, namely age, gender, and Body Mass Index (BMI). In the validation analysis, ISI score was also used as a predictor. Collinearity tests showed no multicollinearity concerns among the predictors.
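For paired raw data, the Bland and Altman quantities defined above (bias, its standard deviation, and the limits of agreement) can be computed as in the following Python sketch. It uses only the standard library, the function name `bland_altman` and the toy values are hypothetical, and it does not apply the sample weighting used elsewhere in the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Bias and limits of agreement for paired measurements (same units)."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need one value per participant")
    # Within-subject differences and per-participant means of the two methods.
    differences = [a - b for a, b in zip(method_a, method_b)]
    pair_means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "sd": sd,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
        "pair_means": pair_means,  # x-axis values of a Bland-Altman plot
    }


# Toy data in minutes (not from the study): HSD_self and HSD_MCTQweek for four people.
hsd_self = [420, 390, 450, 360]
hsd_mctq_week = [450, 450, 480, 450]
result = bland_altman(hsd_self, hsd_mctq_week)
print(f"bias = {result['bias']:.1f} min, SD = {result['sd']:.1f} min")
print(f"limits of agreement: {result['lower']:.1f} to {result['upper']:.1f} min")
```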

Sensitivity analyses exploring plausible biases were performed in a subgroup of participants with long-COVID symptoms (SI-Table S.8) and in a subgroup of older adults (> 65 years old, the majority retired, SI-Table S.9). (1) As the ICOSS-II data were collected 15–21 months after the onset of the COVID-19 pandemic, the first subgroup for sensitivity analysis included 934 individuals (9.1% of the total) who met the WHO criteria for long COVID-19⁵². COVID-19 is a recent disorder that impacts sleep and may alter how sleep duration is perceived under the two estimation methods. We therefore performed a sensitivity analysis of the HSD estimation bias and of the agreement between Method-Self and Method-MCTQ in this sub-sample of participants with symptoms of long COVID. (2) Since age and retirement play a major role in sleep habits, sleep quality and social time pressure, the second subgroup for sensitivity analysis included 1187 older participants (11.5% of the total). The mean age of this group was 71.22 ± 3.68 years. The data were analyzed using SPSS 29.0 (IBM Corp., Armonk, NY, USA) and R (version 4.0.5).

Data availability

We included all the data needed for the evaluation of the conclusions in the “Results” section or in the Supplementary Information file. Additional data related to this article may be requested from the authors.

References

Benz, F. et al. How many hours do you sleep? A comparison of subjective and objective sleep duration measures in a sample of insomnia patients and good sleepers. J. Sleep Res. 32, e13802 (2023).
Chaput, J.-P. et al. Sleep duration and health in adults: An overview of systematic reviews. Appl. Physiol. Nutr. Metab. 45, S218–S231 (2020).
Zhu, G. et al. Exploration of sleep as a specific risk factor for poor metabolic and mental health: A UK biobank study of 84,404 participants. Nat. Sci. Sleep 13, 1903–1912 (2021).
Kósa, K., Vincze, S., Veres-Balajti, I. & Bába, É. B. The pendulum swings both ways: Evidence for U-shaped association between sleep duration and mental health outcomes. Int. J. Environ. Res. Public Health 20, 5650 (2023).
Cappuccio, F. P., Cooper, D., D’Elia, L., Strazzullo, P. & Miller, M. A. Sleep duration predicts cardiovascular outcomes: A systematic review and meta-analysis of prospective studies. Eur. Heart J. https://doi.org/10.1093/eurheartj/ehr007 (2011).
Cappuccio, F. P., D’Elia, L., Strazzullo, P. & Miller, M. A. Sleep duration and all-cause mortality: A systematic review and meta-analysis of prospective studies. Sleep 33, 585–592 (2010).
Reis, C. et al. Sleep duration, lifestyles and chronic diseases: A cross-sectional population-based study. Sleep Sci. 11, 217–230 (2018).
Carney, C. E. et al. The consensus sleep diary: Standardizing prospective sleep self-monitoring. Sleep 35, 287–302 (2012).
Shahid, A., Wilkinson, K., Marcu, S. & Shapiro, C. M. STOP, THAT and One Hundred Other Sleep Scales (Springer, 2012). https://doi.org/10.1007/978-1-4419-9893-4.
Croy, I., Smith, M. G., Gidlöf-Gunnarsson, A. & Persson-Waye, K. Optimal questions for sleep in epidemiological studies: Comparisons of subjective and objective measures in laboratory and field studies. Behav. Sleep Med. 15, 466–482 (2017).
Buysse, D. J., Reynolds, C. F., Monk, T. H., Berman, S. R. & Kupfer, D. J. The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research. Psychiatry Res. 28, 193–213 (1989).
Dietch, J. R., Sethi, K., Slavish, D. C. & Taylor, D. J. Validity of two retrospective questionnaire versions of the Consensus Sleep Diary: The whole week and split week Self-Assessment of Sleep Surveys. Sleep Med. 63, 127–136 (2019).
Nordin, M., Åkerstedt, T. & Nordin, S. Psychometric evaluation and normative data for the Karolinska sleep questionnaire. Sleep Biol. Rhythms 11, 216–226 (2013).
Roenneberg, T., Wirz-Justice, A. & Merrow, M. Life between clocks: Daily temporal patterns of human chronotypes. J. Biol. Rhythms 18, 80–90 (2003).
Partinen, M. & Gislason, T. Basic Nordic Sleep Questionnaire (BNSQ): A quantitated measure of subjective sleep complaints. J. Sleep Res. 4, 150–155 (1995).
Monk, T. H. et al. Measuring sleep habits without using a diary: The sleep timing questionnaire. Sleep 26, 208–212 (2003).
Huang, T. & Redline, S. Cross-sectional and prospective associations of actigraphy-assessed sleep regularity with metabolic abnormalities: The multi-ethnic study of atherosclerosis. Diabetes Care 42, 1422–1429 (2019).
Lauderdale, D. S. Commentary on “Agreement between simple questions about sleep duration and sleep diaries in a large online survey”. Sleep Health 1, 138–139 (2015).
Miller, C. B. et al. Agreement between simple questions about sleep duration and sleep diaries in a large online survey. Sleep Health 1, 133–137 (2015).
Silva, G. E. et al. Relationship between reported and measured sleep times. J. Clin. Sleep Med. 3, 622–630 (2007).
Matthews, K. A. et al. Similarities and differences in estimates of sleep duration by polysomnography, actigraphy, diary, and self-reported habitual sleep in a community sample. Sleep Health 4, 96–103 (2018).
Lee, P. H. Validation of the National Health And Nutritional Survey (NHANES) single-item self-reported sleep duration against wrist-worn accelerometer. Sleep Breath. 26, 2069–2075 (2022).
Robbins, R. et al. Self-reported sleep duration and timing: A methodological review of event definitions, context, and timeframe of related questions. Sleep Epidemiol. 1, 100016 (2021).
McCarter, S. J. et al. Physiological markers of sleep quality: A scoping review. Sleep Med. Rev. 64, 101657 (2022).
Santos, R. B. et al. Prevalence and predictors of under or overestimation sleep duration in adults: The ELSA-Brasil study. Sleep Epidemiol. 1, 100013 (2021).
Muzni, K., Groeger, J. A., Dijk, D. & Lazar, A. S. Self-reported sleep quality is more closely associated with mental and physical health than chronotype and sleep duration in young adults: A multi-instrument analysis. J. Sleep Res. 30, e13152 (2021).
Morin, C. M., Belleville, G., Bélanger, L. & Ivers, H. The insomnia severity index: Psychometric indicators to detect insomnia cases and evaluate treatment response. Sleep 34, 601–608 (2011).
Korman, M. et al. COVID-19-mandated social restrictions unveil the impact of social time pressure on sleep and body clock. Sci. Rep. 10, 22225 (2020).
Roenneberg, T., Pilz, L. K., Zerbini, G. & Winnebeck, E. C. Chronotype and social jetlag: A (self-) critical review. Biology (Basel) 8, 54 (2019).
Merikanto, I. et al. Disturbances in sleep, circadian rhythms and daytime functioning in relation to coronavirus infection and Long-COVID—A multinational ICOSS study. J. Sleep Res. 31, e13542 (2022).
Jackson, C. L. et al. 0694 Concordance between self-reported and objectively-assessed sleep duration among African–American adults: Findings from the Jackson Heart Sleep Study. Sleep 42, A278–A278 (2019).
Fernandez-Mendoza, J. et al. Sleep misperception and chronic insomnia in the general population: Role of objective sleep duration and psychological profiles. Psychosom. Med. 73, 88–97 (2011).
Evangelista, E. et al. Characteristics associated with hypersomnia and excessive daytime sleepiness identified by extended polysomnography recording. Sleep 44, zsaa264 (2021).
Choi, S. J., Suh, S., Ong, J. & Joo, E. Y. Sleep misperception in chronic insomnia patients with obstructive sleep apnea syndrome: Implications for clinical assessment. J. Clin. Sleep Med. 12, 1517–1525 (2016).
Pilz, L., Keller, L., Lenssen, D. & Roenneberg, T. Time to rethink sleep quality: PSQI scores reflect sleep quality on workdays. Sleep 41, zsy029 (2018).
Reis, C., Pilz, L. K., Keller, L. K., Paiva, T. & Roenneberg, T. Social timing influences sleep quality in patients with sleep disorders. Sleep Med. 71, 8–17 (2020).
Fatima, Y., Doi, S. A. R., Najman, J. M. & Al Mamun, A. Exploring gender difference in sleep quality of young adults: Findings from a large population study. Clin. Med. Res. 14, 138–144 (2016).
Hirshkowitz, M. et al. National Sleep Foundation’s updated sleep duration recommendations: Final report. Sleep Health 1, 233–243 (2015).
Leone, M. J. & Sigman, M. Effects of lockdown on human sleep and chronotype during the COVID-19 pandemic. Curr. Biol. 30, R905–R931 (2020).
Scarpelli, S. et al. Subjective sleep alterations in healthy subjects worldwide during COVID-19 pandemic: A systematic review, meta-analysis and meta-regression. Sleep Med. 100, 89–102 (2022).
Brandão, L. E. M. et al. Social jetlag changes during the COVID-19 pandemic as a predictor of insomnia—A multi-national survey study. Nat. Sci. Sleep 13, 1711–1722 (2021).
Bliwise, D. L. & Young, T. B. The parable of parabola: What the U-shaped curve can and cannot tell us about sleep. Sleep 30, 1614–1615 (2007).
Lauderdale, D. S., Knutson, K. L., Yan, L. L., Liu, K. & Rathouz, P. J. Self-reported and measured sleep duration. Epidemiology 19, 838–845 (2008).
Bianchi, M. T., Thomas, R. J. & Westover, M. B. An open request to epidemiologists: Please stop querying self-reported sleep duration. Sleep Med. 35, 92–93 (2017).
Schurhoff, N. & Toborek, M. Circadian rhythms in the blood–brain barrier: Impact on neurological disorders and stress responses. Mol. Brain 16, 5 (2023).
Zavada, A., Gordijn, M. C. M., Beersma, D. G. M., Daan, S. & Roenneberg, T. Comparison of the Munich Chronotype Questionnaire with the Horne–Östberg’s morningness–eveningness score. Chronobiol. Int. 22, 267–278 (2005).
Roenneberg, T., Daan, S. & Merrow, M. The art of entrainment. J. Biol. Rhythms 18, 183–194 (2003).
Evanger, L. N. et al. Later school start time is associated with longer school day sleep duration and less social jetlag among Norwegian high school students: Results from a large-scale, cross-sectional study. J. Sleep Res. 32, e13840 (2023).
St-Onge, M.-P. et al. Information on bedtimes and wake times improves the relation between self-reported and objective assessments of sleep in adults. J. Clin. Sleep Med. 15, 1031–1036 (2019).
Ghotbi, N. et al. The µMCTQ: An ultra-short version of the Munich ChronoType Questionnaire. J. Biol. Rhythms https://doi.org/10.1177/0748730419886986 (2019).
Bland, J. M. & Altman, D. G. Comparing methods of measurement: Why plotting difference against standard method is misleading. Lancet 346, 1085–1087 (1995).
Soriano, J. B., Murthy, S., Marshall, J. C., Relan, P. & Diaz, J. V. A clinical case definition of post-COVID-19 condition by a Delphi consensus. Lancet Infect. Dis. 22, e102–e107 (2022).

Acknowledgements

We acknowledge Ying Huang (Germany), Harald Hrubos-Strøm (Norway), Colin A. Espie (Great Britain) and Yuichi Inoue (Japan) for their instrumental input into the study design or for providing data for this study. This material is partially the result of work supported with resources at the South Texas Veterans Health Care System in San Antonio, TX, USA. The contents of this publication do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.

Author information

Authors and Affiliations

Department of Occupational Therapy, Faculty of Health Sciences, Ariel University, Ariel, Israel
Maria Korman, Daria Zarina & Vadim Tkachev
SleepWell Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Ilona Merikanto
Orton Orthopaedics Hospital, Helsinki, Finland
Ilona Merikanto
Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
Bjørn Bjorvatn
Norwegian Competence Center for Sleep Disorders, Haukeland University Hospital, Bergen, Norway
Bjørn Bjorvatn
Institute for Medical Research and Occupational Health, Zagreb, Croatia
Adrijana Koscec Bjelajac
Sleep Medicine Center, Charité Universitätsmedizin Berlin, Berlin, Germany
Thomas Penzel
Department of Medical Sciences, Neurology, Uppsala University, Uppsala, Sweden
Anne-Marie Landtblom
Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
Anne-Marie Landtblom
Department of Pharmaceutical, Uppsala University, Uppsala, Sweden
Christian Benedict
Li Chiu Kong Family Sleep Assessment Unit, Department of Psychiatry, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR, China
Ngan Yin Chan & Yun Kwok Wing
Sleep-Wake Disorders Unit, Department of Neurology, Gui-de-Chauliac Hospital, CHU Montpellier, INSERM Institute of Neurosciences of Montpellier, University of Montpellier, Montpellier, France
Yves Dauvilliers
Centre de Recherche CERVO/Brain Research Center, École de Psychologie, Université Laval, Quebec, QC, Canada
Charles M. Morin
Department of Clinical Laboratory, National Center Hospital, National Center of Neurology and Psychiatry, Kodaira, Japan
Kentaro Matsui
Department of Psychology, Mississippi State University, Starkville, MS, USA
Michael Nadorff & Courtney J. Bolstad
South Texas Veterans Health Care System, San Antonio, TX, USA
Courtney J. Bolstad
Department of Anesthesia and Pain Management, Toronto Western Hospital, University Health Network, University of Toronto, Toronto, ON, Canada
Frances Chung
Brain Institute, Physiology and Behavior Department and Onofre Lopes University Hospital, Federal University of Rio Grande do Norte, Natal, Brazil
Sérgio Mota-Rolim
Department of Psychology, Sapienza University of Rome, Roma, Lazio, Italy
Luigi De Gennaro
IRCCS Fondazione Santa Lucia, Rome, Italy
Luigi De Gennaro
IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
Giuseppe Plazzi
Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio-Emilia, Modena, Italy
Giuseppe Plazzi
Institute of Neurobiology, Bulgarian Academy of Sciences, Sofia, Bulgaria
Juliana Yordanova
Institute for Consciousness and Dream Research, Medical University of Vienna, Vienna, Austria
Brigitte Holzinger
Department of Clinical Neurosciences, University of Helsinki Clinicum Unit, Helsinki, Finland
Markku Partinen
Helsinki Sleep Clinic, Terveystalo Healthcare Services, Helsinki, Finland
Markku Partinen
Católica Research Centre for Psychological - Family and Social Wellbeing, Universidade Católica Portuguesa, Lisbon, Portugal
Cátia Reis
Instituto de Medicina Molecular João Lobo Antunes, Universidade de Lisboa, Lisbon, Portugal
Cátia Reis

Contributions

Conceptualization: M.K., C.R.; Data curation: M.K. (Israel), I.M., M.P. (Finland), B.B. (Norway), A.K.B. (Croatia), T.P. (Germany), A.L., C.B. (Sweden), N.Y.C., Y.K.W. (Hong Kong), Y.D. (France), C.M.M. (Canada), K.M. (Japan), M.N., F.C. (US), S.M.R. (Brazil), L.G., G.P. (Italy), J.Y. (Bulgaria), B.H. (Austria), C.R. (Portugal); Formal analysis: M.K., D.Z., V.T.; Analysis discussion: M.K., C.R.; Methodology: M.K., C.R.; Project administration: M.P., I.M.; Writing—original draft: M.K., D.Z., V.T., C.R. All authors reviewed, edited and approved the final version.

Corresponding authors

Correspondence to Maria Korman or Cátia Reis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Korman, M., Zarina, D., Tkachev, V. et al. Estimation bias and agreement limits between two common self-report methods of habitual sleep duration in epidemiological surveys. Sci Rep 14, 3420 (2024). https://doi.org/10.1038/s41598-024-53174-1

Received: 19 November 2023
Accepted: 29 January 2024
Published: 10 February 2024
DOI: https://doi.org/10.1038/s41598-024-53174-1
