Digital phenotyping by consumer wearables identifies sleep-associated markers of cardiovascular disease risk and biological aging

Sleep is associated with various health outcomes. Despite their growing adoption, the potential for consumer wearables to contribute sleep metrics to sleep-related biomedical research remains largely uncharacterized. Here we analyzed sleep tracking data, along with questionnaire responses and multi-modal phenotypic data generated from 482 normal volunteers. First, we compared wearable-derived and self-reported sleep metrics, particularly total sleep time (TST) and sleep efficiency (SE). We then identified demographic, socioeconomic and lifestyle factors associated with wearable-derived TST; they included age, gender, occupation and alcohol consumption. Multi-modal phenotypic data analysis showed that wearable-derived TST and SE were associated with cardiovascular disease risk markers such as body mass index and waist circumference, whereas self-reported measures were not. Using wearable-derived TST, we showed that insufficient sleep was associated with premature telomere attrition. Our study highlights the potential for sleep metrics from consumer wearables to provide novel insights into data generated from population cohort studies.

The relationship between sleep and various health outcomes has been extensively studied. Among others, insufficient sleep has been linked to obesity 1,2 , hypertension [3][4][5][6] , cardiovascular disease (CVD) 7-10 , insulin resistance [11][12][13][14] , and even premature death 15,16 . Previous studies on sleep-health interactions have relied on three methods to quantify sleep: sleep questionnaires/diaries, actigraphy, and polysomnography (PSG). There are drawbacks associated with each approach. First, sleep questionnaires/diaries lack precision and rely on subjective recall 17 . Second, actigraphy involves specialized devices and is only suitable for relatively short studies. Finally, PSG studies, while being the gold standard in accuracy, are very resource-intensive to conduct 18 .
The digital revolution has resulted in the proliferation of consumer wearables with activity tracking functionality. These devices range from relatively simple and low-cost fitness trackers to more sophisticated and multifunctional smartwatches. Beyond physical activity, such devices also track sleep duration and sleep stages, the latter using integrated heart rate (HR) sensors. Although these devices are marketed as tools to promote healthy sleep habits, their rapidly growing adoption suggests their potential as sources of quantitative sleep data for sleep-related biomedical research.
Recent studies have begun exploring the potential of sleep data derived from consumer wearables. First, researchers have compared the accuracy of sleep as measured by consumer wearables from several manufacturers (e.g., Fitbit and Jawbone) with gold standard PSG measurements [19][20][21][22] . Consumer wearables were found to perform similarly to actigraphs in that they were accurate in detecting sleep but did less well in detecting wake 21,23 . Some cohort studies have begun using consumer wearables. For example, we recently used Fitbit-derived sleep tracking data to show differences in sleep patterns among volunteers stratified into various activity pattern clusters 24 . Xu et al. used Fitbit Charge HR sleep data from 748 individuals to demonstrate independent associations of both sleep duration and sleep duration variation with body mass index (BMI) 25 . Additionally, Turel et al. used Fitbit devices to show a negative association between sleep duration and abdominal obesity 26 .
Despite these advances, the potential role of sleep metrics from consumer wearables in population health studies remains largely unexplored. First, there has been limited comparison between sleep metrics from consumer wearables and self-reported sleep quality from questionnaires such as the Pittsburgh Sleep Quality Index (PSQI), which is typically used in large cohort studies where it is impractical and costly to use actigraphy or PSG 25,27 . This is important if consumer wearables are to replace or augment sleep questionnaires in future cohort studies. Second, the utility of consumer wearables in identifying associations between sleep and health markers is relatively unknown, especially in Asians, a population with considerably different sleep behavior compared to Western cohorts 25,28 . Health markers of typical interest in population health studies include CVD risk markers such as anthropometrics, blood pressure, lipid profile, and fasting blood glucose (FBG). Telomeres are hexameric repeats that cap chromosome ends and are progressively shortened with successive cell divisions 29 . Leukocyte telomere length (LTL) is thus usually included in cohort studies as a biomarker of aging 30 . Finally, there has yet to be an exploration of how wearable sleep data correlates with demographic, socioeconomic, and lifestyle factors.
Using an expanded cohort and dataset compared to our initial study 24 , we sought to address these gaps through a comprehensive analysis of sleep data obtained from Fitbit Charge HR activity trackers worn by 482 Singaporean volunteers. Apart from the wearable tracking, these volunteers were comprehensively profiled for CVD risk markers and LTL. We found that sleep metrics from consumer wearables could be used to identify not just sleep-related demographic, socioeconomic, and lifestyle factors in health cohorts, but also CVD risk markers affected by sleep duration and quality. Furthermore, we used wearable-derived sleep duration to show that volunteers with insufficient sleep experienced premature telomere shortening. Our results highlight the potential for consumer-grade wearables as sources of quantitative sleep metrics in population health studies, thus increasing power to detect sleep-associated factors.

Results
Comparison between wearable-derived and subjective sleep metrics. The cohort of 482 volunteers was tracked using Fitbit Charge HR wearables that measured physical activity, HR, and sleep. Summary statistics for the cohort are shown in Table 1. The volunteers were on average 46 years of age (range 21 to 69 years) at enrollment. On average, they had 4 nights of tracked sleep (range 3 to 11 nights), with a mean total sleep time (TST) of 6 h and 28 min. We first compared objective sleep measures from consumer wearables to subjective PSQI responses. The PSQI sleep questionnaire comprises several components, each encompassing a different aspect of sleep quality. It then summarizes individual component scores into a global PSQI score. We compared wearable-derived TST and SE with global PSQI scores and found no significant correlation for either (r s = −0.089, p = 0.091 and r s = −0.080, p = 0.129 respectively). Wearable-derived TST, however, showed a significant, albeit weak, correlation with self-reported TST (r s = 0.283, p = 2.394E-10). We asked if this weak correlation could be due in part to the relatively short study duration, and therefore raised the inclusion threshold from at least three nights of tracked sleep to at least four and at least five nights. Indeed, as the threshold increased, the correlation with self-reported TST rose to 0.322 (p = 6.218E-09, n = 310) and 0.397 (p = 1.425E-06, n = 138) respectively. We then categorized self-reported TST by levels specified in component 3 of the PSQI, which profiles habitual sleep duration. Compared to those with the lowest score of 0 (>7 h of sleep), those with scores of 1 (6 to 7 h) and 2 (5 to 6 h) had lower wearable-derived TST (β = −0.321, CI = −0.512 to −0.131, p = 0.001 and β = −0.721, CI = −1.027 to −0.415, p = 4.94E-06 respectively, Fig. 1a).
However, those with a score of 3 (<5 h) exhibited no significant difference in TST compared to those with a score of 0 (β = −0.428, CI = −0.972 to 0.115, p = 0.123), indicating lower concordance with wearable-derived TST among those with self-perceived chronic sleep deprivation. This may be due to the limited number of volunteers in that category (n = 13) and correspondingly higher variability in wearable-derived TST. Overall, volunteers on average over-estimated habitual sleep duration by 6 min compared to objective wearable-derived measurements (p = 0.067, paired Student's t-test).
Apart from TST, the Fitbit wearables also measure sleep efficiency (SE) as the fraction of TST over total time in bed. Similarly, the PSQI estimates SE using self-reported habitual sleep duration, sleep times and wake times. An overall comparison between wearable-derived and self-reported SE revealed no correlation (r s = −0.080, p = 0.081). However, when self-reported SE was grouped by PSQI component 4 scoring thresholds (>85%, 75-84%, 65-75%, <65%), volunteers in the <65% group had significantly lower wearable-derived SE compared with the others (mean SE = 89.722% vs 92.638%, two-sided Student's t-test p-value = 0.005, Fig. 1b). Thus, only volunteers with the poorest self-perceived SE have concordantly lower wearable-derived SE. We note that wearable-derived SE is almost uniformly high (mean SE = 92.584%), with little variation (SE standard deviation = 3.060%), possibly due to the lower sensitivity of Fitbit wearables in detecting wake states as opposed to sleep states 19 .
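The SE definition above (TST as a fraction of total time in bed) and the PSQI component 4 grouping can be illustrated with a minimal Python sketch. The function names are hypothetical, the treatment of values falling exactly on a threshold is an assumption, and the study's own analyses were performed in R rather than with this code.

```python
def sleep_efficiency(total_sleep_min, time_in_bed_min):
    """Return sleep efficiency (%): share of time in bed spent asleep."""
    if time_in_bed_min <= 0:
        raise ValueError("time in bed must be positive")
    return 100.0 * total_sleep_min / time_in_bed_min


def psqi_component4_score(se_percent):
    """Map sleep efficiency (%) to a component 4 score (0 = best, 3 = worst).

    Assumption: boundary values (exactly 85, 75 or 65) go to the better group.
    """
    if se_percent >= 85:
        return 0
    if se_percent >= 75:
        return 1
    if se_percent >= 65:
        return 2
    return 3


# Example with made-up numbers: 360 min asleep out of 400 min in bed.
se = sleep_efficiency(360, 400)
score = psqi_component4_score(se)
```

With these illustrative inputs the volunteer would fall in the highest-efficiency group, which is where most wearable-derived SE values in this cohort lie.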
Another wearable-derived sleep metric available was the number of awakenings per sleep session. For each volunteer, we obtained the average daily number of nocturnal awakenings. We then compared this number against responses to question 5b of the PSQI, which asks volunteers how frequently they had trouble sleeping due to waking up in the middle of the night or early morning. We observed a weak correlation (r s = 0.189, p = 4.114E-05), with volunteers reporting the highest (≥3 times/week) number of nocturnal awakenings having significantly higher daily wearable-detected nocturnal awakenings compared to those reporting no trouble sleeping in the past month (β = 0.586, CI = 0.185 to 0.987, p = 0.004, Fig. 1c). Collectively, our findings show that consumer wearables provide objective sleep metrics that, although associated to a certain extent with subjective measures of sleep quality, are orthogonal to them.
Relationship between wearable sleep metrics and cohort demographics. We next determined if wearable-derived sleep metrics can identify sleep-associated demographic, socioeconomic and lifestyle factors in our cohort. These factors were obtained from responses to detailed demographic and socioeconomic questionnaires administered during volunteer recruitment.
We next examined relationships between wearable-derived TST and socioeconomic factors. Among others, we considered income levels, residence type, education level and occupation type (Supplementary Table 1). Of these factors, occupation type and residence type were associated with TST. Volunteers engaged in manual work slept 27 min less than volunteers engaged in other occupation types (i.e., service industry, office work and unemployed/retired, CI = −50 to −4 min, p = 0.022, Fig. 2d). Furthermore, volunteers living in private residences slept 15 min longer than those living in public housing (CI = 2-27 min, p = 0.019).
Several self-reported lifestyle factors were also analyzed for association with wearable-derived TST. These included exercise, smoking status, alcohol consumption and caffeine consumption (Supplementary Table 2). Apart from alcohol consumption, no other significant associations were found. Volunteers who self-reported alcohol consumption within the past three months slept 19 min longer than those who did not, adjusting for age, gender and ethnicity (CI = 8-30 min, p = 8.54E-04). When alcohol consumption was broken down by type of alcohol, volunteers reporting consumption of hard liquor had the largest difference in TST compared to those that did not (28 min longer, CI = 10-46 min, p = 0.002), followed by red wine (19 min longer, CI = 5-33 min, p = 0.008) and beer (18 min longer, CI = 4-32 min, p = 0.014). We did not identify any significant associations when the analyses in this section  Tables 1 and 3).

Wearable-derived sleep metric associations with CVD markers.
A key aim of the study cohort was to study CVD risk in normal individuals.  Table 3). Neither wearable-derived TST nor SE was significantly associated with blood pressure or FBG levels in this study. As with cohort demographics, we tested self-reported sleep metrics for TST and SE using the same models and found no significant associations (Supplementary Table 4). We also considered models with interactions between wearable-derived sleep metrics and two factors: age and gender. Apart from an interaction between wearable-derived TST and age for SMP, we did not identify any other significant interactions (Supplementary Table 5).
Wearable-inferred sleep insufficiency is associated with premature telomere attrition. A subset of the cohort (n = 175) underwent whole-genome sequencing (WGS) for prospective genetic studies. Studies have shown that LTL can be estimated from WGS data by analyzing reads containing the telomeric repeat motif (TTAGGG). We used a tool called Telomerecat that estimates LTL by calculating the ratio between read-pairs completely mapping to the telomere and those that span the telomere boundary 31 . WGS-inferred LTL ("WGS-LTL") was computed for the 175 volunteers with WGS data, and the estimated values corrected to account for different sequencing runs. We first selected 20 volunteers of varying WGS-LTL and experimentally measured LTL using quantitative polymerase chain reaction (qPCR, "qPCR-LTL").  Fig. 3a). We then asked if wearable-derived sleep metrics were associated with telomere length and found a positive association between wearable-derived TST and WGS-LTL (β =  Fig. 3c).
To validate this finding, we performed qPCR-based telomere length estimation on 305 volunteers from the cohort without WGS data. We were able to replicate the association between wearable-derived TST and qPCR-LTL (β = 7.288E-04, CI = 8.318E-05 to 0.001, p = 0.028, adjusted for age, gender, ethnicity, and BMI, Fig. 3d), as well as the observation that volunteers with adequate sleep had longer telomeres than those with insufficient sleep (β = 0.253, CI = 0.079 to 0.427, p = 0.005, adjusted for age, gender, ethnicity and BMI). However, when self-reported TST was used instead of wearable-derived TST, a significant association with LTL was found in the WGS-based discovery cohort (β = 93.835, CI = 25.645 to 162.026, p = 0.008) but not in the qPCR-based validation cohort (β = 0.020, CI = −0.015 to 0.055, p = 0.258). We also did not identify any significant interactions  There is evidence that excessive sleep may be associated with not only increased morbidity and mortality 32 , but also shorter telomeres 33,34 . This could result in increased LTL heterogeneity in the adequate sleep group (>7 h). We thus repeated the analysis with the adequate sleep group further stratified into two groups: adequate sleep (>7 h but ≤9 h) and long sleep (>9 h), with the adequate sleep group set as reference (Supplementary Fig. 1

Discussion
We have shown in a sizeable cohort of 482 individuals how sleep metrics from consumer wearables can be used in biomedical research, particularly in the context of population health studies. This multi-modal cohort is one of the largest to date with consumer wearable sleep metrics. Our comparison of objective wearable-derived sleep metrics against subjective measures obtained through the PSQI provides insights into the characteristics of these two modalities. The weak correlation between wearable-derived and self-reported TST is consistent with previous studies comparing objectively measured TST (PSG and actigraphy) with self-reported TST [35][36][37] . For example, Landry et al. reported a correlation of 0.29 between actigraph-derived and self-reported TST, despite a longer minimum tracking duration than ours (14 nights vs 3 nights). Concordance was poorer when we compared wearable-derived SE and number of nocturnal awakenings with their self-reported counterparts. This is again consistent with previous reports of no correlation between objectively measured (PSG and actigraphy) and PSQI-derived SE 37,38 . Our comparison of wearable-derived sleep metrics against volunteer-provided responses to the PSQI, an instrument frequently used in population studies, will inform investigators considering using wearables in future studies. Furthermore, the limitations highlighted present opportunities for researchers to develop algorithms that can more accurately detect wake states. Indeed, newer generations of Fitbit wearables (e.g., Alta HR) combine accelerometer and HR variability data to accurately stage sleep, increasing wake detection specificity to over 88% 20 .
Fig. 3 Wearable-derived TST predicts leukocyte telomere length. a Adjusted WGS-LTL by age-group. b Adjusted wearable-derived TST and adjusted WGS-LTL. c Adjusted WGS-LTL and adjusted qPCR-LTL of volunteers with insufficient (<5 h) and adequate (>7 h) TST. d Adjusted wearable-derived TST and adjusted qPCR-LTL. Asterisks denote significance of component score in linear model compared to reference score of 0. **p < 0.01, ***p < 0.001. LTL leukocyte telomere length, WGS-LTL LTL estimated using whole-genome sequencing, qPCR-LTL LTL estimated using quantitative PCR, TST total sleep time, bp base pairs, T/S T/S ratio. All LTL values are adjusted for age, gender, ethnicity, and BMI.
The questionnaire responses from volunteers allowed us to study how wearable-derived sleep metrics are influenced by demographic, socioeconomic and lifestyle factors. Among others, we showed that wearable-derived TST was associated with age, gender, ethnicity, occupation type, and even habitual alcohol consumption. With many countries realizing the importance of population health studies, the use of consumer wearables to identify demographic, socioeconomic and lifestyle factors associated with sleep duration could provide vital insights into population sub-groups at risk for poorer health outcomes due to insufficient sleep.
Our analysis of how wearable-derived TST and SE relate to various CVD risk markers demonstrates the utility of wearable-derived sleep metrics in biomedical research. Xu et al. previously described a link between habitual sleep duration estimated using Fitbit Charge HR devices and BMI in a predominantly European-American cohort of 471 individuals 25 . We identified this relationship in an Asian cohort despite a considerably shorter average tracking duration (4 nights vs 78 nights), suggesting that sleep metrics from even short studies can be useful. One novel aspect of this study was our analysis of wearable-derived SE against CVD risk markers. Our findings of links between SE and three obesity markers (BMI, WC and WHtR) are supported by previous studies performed using orthogonal approaches such as PSG and actigraphy 39,40 . This indicates that wearables can contribute beyond TST to health cohort studies, notwithstanding current limitations to the accuracy of wearable-derived SE. The paucity of associations between wearable-derived sleep metrics and clinical parameters such as FBG (a marker of insulin sensitivity) and blood pressure could be due in part to the cohort size, and highlights the need for wearables to be included in larger population-scale cohort studies in order to thoroughly assess their utility.
In both analyses of cohort demographics and CVD risk markers, no significant associations were identified when self-reported TST was used. This stood in contrast to wearable-derived TST, which showed multiple significant associations. Several of these associations have been previously described; for example, shorter TST has been linked to older age 41 , male gender 25,41 , manual labor 42 , and increased BMI 1,25 . Several factors may explain the poor performance of self-reported TST. First, wearable-derived TST is more precise than self-reported TST (minute-level vs hour-level resolution), resulting in higher sensitivity to small differences in TST. Second, self-reported TST is only moderately correlated with TST objectively measured using PSG 35,36 or actigraphy 37 . In contrast, TST from consumer wearables (e.g., Fitbit) is highly correlated with TST measured using PSG 20,43,44 and actigraphy 23 . Third, the subjectivity of self-reported TST exposes it to biases; one study found that while men self-reported longer TST than women, actigraphy data indicated the opposite 45 . Questionnaires and PSG are on opposite ends of the sleep detection accuracy spectrum 17 , whereas contact-based approaches such as actigraphy and consumer wearables are second only to PSG in accuracy. Our study therefore reinforces the utility of consumer wearables as a low-cost yet objective source of sleep metrics in population cohort studies.
Beyond comparisons with the usual clinical health markers, our study provides a novel demonstration of the utility of wearable-derived sleep data in the study of biological aging. We showed that in a normal free-living cohort, individuals with short habitual sleep duration experienced premature telomere attrition.
Previous studies of this phenomenon have either used sleep questionnaires 33,[46][47][48] or cohorts of sleep disorder patients 49,50 . As premature telomere shortening has been linked to the early onset of various age-related diseases 51 and all-cause mortality 52,53 , new evidence such as this on the link between insufficient sleep and accelerated aging is vital in helping shape public policy (e.g., later school start times, altered work hours and schedules, etc.) and in promoting healthier sleep habits among the public. This finding is especially relevant to Singapore, a developed nation whose citizenry is among the most sleep-deprived in the world 41,54 . We did not identify in our study statistically significant shorter LTL among volunteers with long wearable-derived TST. This could be due to the small number of volunteers who are long sleepers. In fact, only one volunteer had wearable-derived TST exceeding 10 h. Our use of (1) wearables for sleep tracking and (2) WGS for LTL estimation in this study demonstrates the versatility that emerging technologies can bring to population cohort studies by providing added behavioral and phenotypic data beyond their primary functions.
In summary, our study has demonstrated various ways in which sleep metrics from wearables can be used in cohort studies. Apart from comparing wearable-derived and self-reported sleep metrics, our work has shown that wearables can be used to study how sleep relates to demographic, socioeconomic and lifestyle factors, as well as various markers of health and aging. The increasing ubiquity of wearables and other forms of digital health represents a rich source of behavioral data that can be tapped by investigators running cohort studies. Beyond the use of wearables as study-provided devices, a BYOD (bring your own device) model, where participants share data from their own wearables with investigators through application programming interfaces (APIs), is also possible. This is particularly attractive as the BYOD model allows for much longer tracking durations with minimal incremental cost. At the same time, the use of wearables and other digital health devices in population health studies can catalyze further development of digital applications that promote healthy behavior, including sleep habits.

Methods
Study volunteers and ethics statement. Volunteers responding to print advertisements were recruited as part of the SingHEART/Biobank study using a protocol and written informed consent form approved by the SingHealth Centralized Institutional Review Board (ref: 2015/2601). The cohort's details and its inclusion criteria have been previously described 24 . Among others, volunteers underwent an activity tracking study using a consumer wearable (Fitbit Charge HR) and detailed clinical profiling of various CVD risk markers (anthropometry, blood pressure, lipid panel, FBG, etc.). In addition, DNA was extracted from volunteer whole blood samples for molecular studies. After evaluation for completeness of sleep and activity tracking data, and the removal of subjects with extreme outlier activity metrics, 482 volunteers were included in this study. The sample size is comparable to, or exceeds, that of similar studies 18,19 .
Processing of wearable sleep metrics. For each volunteer, we extracted their Fitbit data (activity, heart rate and sleep) using the Fitbit Web API (https://dev.fitbit.com/reference/web-api/quickstart/). Data completeness was evaluated by availability of HR data, and days with no intraday steps were excluded 24 . We considered days with at least 20 h of data to be complete, and only volunteers with at least three data-complete days were included. Detailed sleep tracking data from Fitbit was obtained in the JSON (JavaScript Object Notation) format, and processed using R. For each day, we summed the duration of all sleep sessions starting between 8 PM and 8 AM. We then averaged daily sums for each volunteer to obtain the TST. Sleep hour was determined by calculating the average start time of sleep sessions occurring between 8 PM and 8 AM with a duration of at least 3 h. Wake hour was determined by averaging the end time of sleep sessions. SE was computed in a similar fashion, except that for each day, the average SE of sleep sessions was obtained. In addition, we estimated the number of nocturnal awakenings by averaging daily total wake counts.
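The TST aggregation described above (sum nocturnal sessions per day, then average the daily sums per volunteer) can be sketched in Python as follows. This is an illustration only, not the study's R code: the function name and the input layout (a list of start time and minutes-asleep pairs) are hypothetical, and two details are assumptions, namely that a session starting at or after 8 PM is attributed to the following calendar date and that a session starting at exactly 8 AM is excluded.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def nightly_tst_hours(sessions):
    """Average nightly total sleep time (hours) for one volunteer.

    sessions: iterable of (start: datetime, minutes_asleep: float).
    Returns None when no nocturnal session is present.
    """
    per_night = defaultdict(float)
    for start, minutes_asleep in sessions:
        if 8 <= start.hour < 20:
            continue  # session starts in the daytime (e.g. a nap): skipped
        # Assumption: label each night by the calendar date on which it ends.
        if start.hour >= 20:
            night = start.date() + timedelta(days=1)
        else:
            night = start.date()
        per_night[night] += minutes_asleep
    if not per_night:
        return None
    # Mean of the nightly sums, converted from minutes to hours.
    return sum(per_night.values()) / len(per_night) / 60.0


# Made-up example: two nights, one of them split into two sessions, plus a nap.
example = [
    (datetime(2017, 1, 1, 23, 0), 300),
    (datetime(2017, 1, 2, 4, 30), 90),
    (datetime(2017, 1, 2, 14, 0), 60),
    (datetime(2017, 1, 2, 22, 30), 420),
]
mean_tst = nightly_tst_hours(example)
```

Summing before averaging means that a night interrupted by a long awakening still counts as a single night, which matches the per-day summation described in the text.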
Questionnaires. The volunteer recruitment process included the administration of several questionnaires. These included the SingHEART patient questionnaire, which covered demographics, socioeconomic factors, medical history, smoking history, alcohol consumption patterns, exercise and dietary habits. The General Practice Physical Activity Questionnaire (GPPAQ) was used to estimate physical activity levels as previously described 24 . Furthermore, the Pittsburgh Sleep Quality Index (PSQI) questionnaire, which assesses sleep quality and disturbances over a one-month time interval, was also administered 27 . Volunteers were asked about their sleep habits, for example their bed time, hours of sleep per night, sleep trouble and sleep quality. The PSQI contains 19 self-rated questions and 5 questions rated by the bed partner or roommate (if one is available). The 19 self-rated items are then combined to produce seven component scores, each of which has a range of 0-3. Components 3 and 4, as well as Question 5b, were examined in this study due to their relevance to our wearable-derived sleep metrics (TST, SE and nocturnal awakenings respectively).
Association tests. Multiple linear regression analyses described in this study were conducted using the GLM (generalized linear model) function in R with a Gaussian error distribution. When gender was considered as a covariate, the female gender was set as the reference level. The reference levels for the socioeconomic factors, when set as covariates, were "Public-housing" for residence type, "Others" for education level and "Manual-labor" for occupation type. For lifestyle factors, the reference level for alcohol, caffeine, tea and green tea consumption was "No"; the reference levels for smoking and exercise status were "Ex-smoker" and "Never/hardly" respectively.
For linear regression analysis between wearable-derived sleep metrics and CVD risk markers, three models were used. Age, gender, ethnicity, and daily step counts were included as covariates. Model 1 and Model 2, which considered the TST-based and SE-based sleep metrics respectively, were Marker ~ Age + Gender + Ethnicity + AverageDailyTotalSteps + TST and Marker ~ Age + Gender + Ethnicity + AverageDailyTotalSteps + SE. Model 3, which includes both TST and SE, was Marker ~ Age + Gender + Ethnicity + AverageDailyTotalSteps + TST + SE. For linear regression analysis between wearable-derived TST metrics and socioeconomic factors, the following model was used: TST ~ Age + Gender + Ethnicity + SocioeconomicFactor. In addition, linear regression analysis between wearable-derived TST metrics and LTL was done using this model: LTL ~ Age + Gender + Ethnicity + BMI + TST. For analysis of interactions between wearable-derived sleep metrics and age or gender, the models used were LTL ~ TST * Age + Gender + Ethnicity + BMI and LTL ~ TST * Gender + Age + Ethnicity + BMI, respectively.
DNA extraction. Genomic DNA was extracted from volunteer whole blood specimens using the Chemagic DNA blood kit (Perkin Elmer, MA) following manufacturer's protocol. The quality and quantity of extracted genomic DNA were assessed using LabChip DS (Perkin Elmer).
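The covariate-adjusted linear models listed under Association tests (for example Marker ~ Age + Gender + ... + TST) were fitted in R with a Gaussian GLM, which is equivalent to ordinary least squares. As a hedged illustration of what such a fit computes, the Python sketch below solves the normal equations directly. The function name is hypothetical, categorical covariates such as gender or ethnicity are assumed to be pre-coded as 0/1 indicator columns relative to their reference level, and standard errors, confidence intervals and p-values are not computed here.

```python
def fit_ols(rows, y):
    """Least-squares coefficients for y ~ intercept + covariates.

    rows: list of covariate tuples, e.g. (age, is_male, tst_hours).
    y: list of outcome values (e.g. a CVD risk marker).
    Returns [intercept, beta_1, ..., beta_k].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented normal equations: [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

In this layout the last coefficient returned corresponds to the sleep metric, so it plays the role of the β values reported in the Results, adjusted for the covariates that precede it in each row.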
Telomere length estimation. To estimate LTL from WGS data (WGS-LTL), we used the Telomerecat 31 program, which does so by calculating the ratio between read-pairs completely mapping to the telomere and those that span the telomere boundary. As part of the SingHEART/Biobank study, we had sequenced the genomes of 546 volunteers (of which 175 overlapped with this study) at a target depth of 30×. The sequencing was performed by commercial sequencing providers using the Illumina HiSeq X platform. Telomerecat was used to estimate WGS-LTL for this dataset. Briefly, raw sequencing reads in FASTQ files were aligned to the human reference genome (hs37d5) using BWA-MEM version 0.7.12 55 . The resulting BAM files were further processed using Sambamba version 0.5.8 56 to sort the reads and flag duplicates. Telomerecat was run in two steps. First, the bam2telbam command was run on individual BAM files to generate telbam files, which are small BAM files containing only sequencing reads relevant to LTL estimation. The telbam2length command was then executed to generate LTL estimates for the entire set of 546 telbam files. Finally, we used linear regression to correct for different sequencing runs. BAM files containing telomeric and subtelomeric sequencing reads are available in the European Nucleotide Archive (ENA, accession number PRJEB29577). WGS data for the individuals are deposited in the European Genome-phenome Archive (EGA, accession number EGAS00001003570) and are available subject to Data Access Committee (DAC) approval.
Telomere length measurement by qPCR. We used the qPCR method described by Cawthon 57,58 to estimate LTL. The qPCR experiments were performed by operators blinded to participant characteristics. Briefly, two primer pairs are used, with one targeting telomere repeats (T) and another targeting 36B4, a known single-copy gene (S). For a given sample, the ratio between T and S amplification products (T/S ratio) is calculated. This T/S ratio correlates with telomere length, and the relative difference in T/S ratio between samples is proportional to the relative difference in their telomere lengths. Experimental details are as follows. Each genomic DNA sample was normalized to 35 ng/μL. Sequences of T and S primers and their final concentrations are provided in Supplementary Table 6. A reference DNA sample (Promega, cat. no.: G1521) was diluted serially threefold from 109.0 ng/μL to 36.3, 12.1, 4.0, and 1.3 ng/μL to generate a standard curve. All samples were run in triplicate, and the telomere and 36B4 PCRs were performed on two separate plates in identical well positions. Each reaction mix was prepared in 10 μL containing 1× PowerUp SYBR Green Master Mix (Applied Biosystems), 270-300 μM forward and 500-900 nM reverse primers and 35 ng gDNA or reference DNA. All PCRs were performed on a StepOnePlus Real Time PCR system (Applied Biosystems) with the following program: Stage 1: 2 min at 50°C, 2 min at 95°C; Stage 2: 40 cycles of 15 s at 95°C, 15 s at 54°C (Telomere PCR) or 58°C (36B4 PCR), 1 min at 72°C with signal acquisition. The T and S concentrations were interpolated using the standard curve and the averages of T and S concentrations were calculated from the triplicates. Any outliers were removed or repeated. Inter-plate variability was controlled using a normalization factor derived from a control sample (Promega, cat. no.: G3041) run in triplicate in each run. All T and S concentrations were corrected using the normalization factor.
As the experiment was conducted in two batches, linear regression was used to adjust for batch effects in the T and S concentrations prior to calculating the final T/S ratios.
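The T/S calculation described above (interpolate T and S concentrations from a standard curve, average the triplicates, apply the normalization factor, then take the ratio) can be sketched in Python. This is a simplified illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the standard curve is assumed to be a straight line of threshold cycle (Ct) against log10 concentration, separate normalization factors for the T and S plates are assumed, and outlier handling and batch adjustment are omitted.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Fit Ct = intercept + slope * log10(concentration) by least squares."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


def interpolate_concentration(ct, curve):
    """Invert the standard curve to convert a Ct value to a concentration."""
    intercept, slope = curve
    return 10 ** ((ct - intercept) / slope)


def ts_ratio(t_cts, s_cts, t_curve, s_curve, t_norm=1.0, s_norm=1.0):
    """T/S ratio from triplicate Ct values of the telomere and 36B4 assays.

    t_norm and s_norm are hypothetical per-plate normalization factors.
    """
    t = sum(interpolate_concentration(c, t_curve) for c in t_cts) / len(t_cts)
    s = sum(interpolate_concentration(c, s_curve) for c in s_cts) / len(s_cts)
    return (t * t_norm) / (s * s_norm)
```

Because the ratio is relative, only differences in T/S between samples are interpretable, in line with the statement that relative differences in T/S are proportional to relative differences in telomere length.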
Statistics and reproducibility. All statistical analyses in this study were performed using the R statistical environment. Correlations between continuous values (e.g., wearable-derived TST) and discrete questionnaire responses (e.g., PSQI scores) were calculated as Spearman's rank correlation coefficients (denoted as r s ), whereas correlations between pairs of continuous values were calculated as Pearson correlation coefficients (denoted as r p ). The relationship between TST and LTL was explored in two separate cohorts: first, a discovery cohort comprising 175 individuals with LTL estimated using WGS, and second, a validation cohort comprising 305 individuals with LTL estimated using qPCR. The validation cohort, apart from demonstrating the reproducibility of the observation made in the discovery cohort, also showed that the relationship was present regardless of the method used to estimate LTL (WGS or qPCR).
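To make the distinction between the two coefficients concrete, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which is the standard definition when ties are present (as they are for discrete questionnaire scores). The function names are hypothetical and p-values are not computed; the study itself used R.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Working on ranks makes the coefficient insensitive to the spacing between questionnaire categories, which is why it suits comparisons between minute-level wearable data and ordinal PSQI scores.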
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Study participant characteristics (used to generate