Introduction

Excessive daytime sleepiness (EDS) is a chief symptom of chronic insufficient sleep1 as well as of several primary sleep disorders, such as sleep apnea, narcolepsy, and circadian rhythm disorders2,3. Several disease processes and medications are also associated with prevalent and incident EDS4,5,6. EDS is estimated to contribute to risk for motor vehicle crashes, work-related accidents, and loss of productivity, highlighting its public health importance7,8. Clinically, EDS also negatively affects cognition, behavior, and quality of life9. Therefore, sleep interventions often identify reduction in EDS as a chief goal. EDS is also associated with an increased risk for cardio-metabolic disorders, psychiatric problems, and mortality6,10 through pathways that may be causal, bi-directional, or reflect pleiotropic effects.

While EDS occurs in a variety of settings associated with insufficient sleep, there is large inter-individual variability in levels of EDS that is not fully explained by sleep duration, sleep quality, or chronic disease11. Experimental studies have shown that there is also individual vulnerability to EDS following sleep restriction11,12. The heritability of daytime sleepiness is estimated to be between 0.37 and 0.48 in twin studies13,14,15, 0.17 in family studies16, and between 0.084 and 0.17 in GWAS17,18, suggesting that genetic factors contribute to variation in sleepiness. Despite multiple candidate gene studies19 and GWAS17,20,21, including one from the first genetic release of the UK Biobank18, few genome-wide significant genetic variants have been reported, likely reflecting the heterogeneous and multifactorial etiology of the phenotype and low statistical power.

Here, we extend our GWAS of self-reported daytime sleepiness to the full UK Biobank dataset22 and identify multiple genetic variants grouping into different biological subtypes that associate with sleepiness. Bioinformatics analyses further highlight relevant biological processes and reveal shared genetic background with other diseases.

Results

Sample characteristics

In the UK Biobank22, 452,071 participants of European genetic ancestry self-reported the frequency of daytime sleepiness using the question: “How likely are you to doze off or fall asleep during the daytime when you don’t mean to? (e.g.: when working, reading or driving)”, with the answer categories “never” (N = 347,285), “sometimes” (N = 92,794), “often” (N = 11,963), or “all of the time” (N = 29). The severity of daytime sleepiness increased with older age, female sex, higher body mass index (BMI), various behavioral, social and environmental factors, and chronic diseases (Supplementary Table 1). Self-reported daytime sleepiness was positively but weakly correlated with self-reported insomnia symptoms, morning chronotype, sleep apnea (ICD-10 coded or self-reported physician diagnosis), and both shorter and longer self-reported sleep duration, consistent with earlier reports or known clinical correlates18 (Spearman correlation <0.2; Supplementary Table 1 and Supplementary Fig. 1). Self-reported daytime sleepiness was also weakly correlated with shorter sleep duration, lower sleep efficiency (indicating more time awake during the sleep period), and longer daytime inactivity duration estimated using 7-day accelerometry in a subset (N = 85,388) of UK Biobank participants (Methods; Supplementary Table 2).

GWAS, sensitivity, and replication analyses

We performed a GWAS of self-reported daytime sleepiness treating the four categories as a continuous variable using a linear mixed regression model23 adjusted for age, sex, genotyping array, ten principal components (PCs) of ancestry and genetic relatedness matrix, and identified 37 genome-wide significant loci (P < 5 × 10−8) (Fig. 1, Supplementary Fig. 2, Supplementary Table 3). The most significant association was observed within the gene KSR2, a gene associated with multiple physiological pathways relevant to sleep and metabolism24,25 (see Discussion). Additional novel loci were identified within or near genes with known actions on sleep–wake control regulation or that are associated with sleep disorders (e.g. PLCL1 (ref. 26), GABRA2 (ref. 27), BTBD9 (ref. 28), HTR7 (ref. 29), RAI1 (ref. 30)), metabolic traits (e.g. GCKR31, SLC39A8 (ref. 32)), and psychiatric traits (e.g. AGAP1 (ref. 33), CACNA1C34). Regional association plots of genome-wide significant loci are shown in Supplementary Fig. 3. We identified 37 association signals driven by common lead variants with minor allele frequency 0.08–0.49. The previously identified rare variant signals for daytime sleepiness at AR/OPHN1 (MAF = 0.002), ROBO1 (MAF = 0.003), and TMEM132B (MAF = 0.004) in the first release of the UK Biobank (N = 111,975)18 were not significantly associated with sleepiness in this study (P = 0.006–0.03; Supplementary Table 4). The lack of consistency across these analyses may reflect initial false-positive signals at rare variants (MAF = 0.001–0.005) and/or selection bias in the initial sample, in which heavy smokers had been oversampled35. However, two genome-wide significant loci, HCRTR2 and PATJ, overlapped with those identified for a composite sleep trait in the interim release sample and a suggestive sleepiness signal at CEPB1 was replicated. No association was seen with single-nucleotide polymorphisms (SNPs) reported in smaller independent GWAS of EDS, hypersomnia, or narcolepsy (Supplementary Table 4).

Fig. 1

Manhattan plot for genome-wide association analysis of self-reported daytime sleepiness. Dotted line indicates genome-wide significance. Genetic association signals are highlighted in green and annotated with the nearest genes

Previous longitudinal research indicated obesity and weight gain are associated with incidence of daytime sleepiness5; therefore, we performed an additional GWAS adjusting for BMI to identify loci that may operate in obesity-independent pathways. This analysis identified five additional loci (Supplementary Figs. 4 and 5; Supplementary Table 3). Effect estimates at the 37 loci identified in the primary model were largely unchanged.

Sensitivity analyses on autosomes additionally adjusted for potential confounders (including depression, socio-economic status, alcohol intake frequency, smoking status, caffeine intake, employment status, marital status, neurodegenerative disorders, and psychiatric problems) and stratified by obesity and sleep duration did not substantially alter effect estimates of the identified signals (Supplementary Data 1; Supplementary Tables 5 and 6). Secondary GWAS (N = 255,426), excluding shiftworkers and individuals with chronic health or psychiatric illnesses, additionally identified significant variants in SEMA3D and revealed marginally significant interactions with health status at PATJ, ZNF326/BARHL2, ECE2, ASAP1, and CYP1A1/CYP1A2 (interaction P < 0.05; Supplementary Fig. 6, Supplementary Table 7). Conditional analyses at each locus identified no secondary signals. Sex-stratified analyses on autosomes additionally identified CWC27 and DIAPH3 in women but not in men; however, significant gene by sex interactions were not observed (Supplementary Fig. 7, Supplementary Table 8).

Replication was attempted using self-reported daytime sleepiness indices (based on related questions) available in European whites from HUNT36 (N = 29,906; Supplementary Table 9) and Health 2000 studies37 (N = 4546; Supplementary Table 10) (Methods). Five individual signals, including KSR2, SUSD4, and CYP1A1/CYP1A2, were marginally significant (P < 0.05) with a consistent association direction in individual cohorts and/or meta-analysis (Supplementary Table 11). A genetic risk score (GRS) of 42 sleepiness loci weighted by the effect estimates from our primary daytime sleepiness GWAS was replicated in a meta-analysis of HUNT and Health 2000 (Fisher’s P = 0.00031; Supplementary Table 11), and this remained significant after removing the three marginally associated loci from the meta-analysis (Fisher’s P = 0.017). Marginal association with a tiredness phenotype was observed for two signals in a meta-analysis of FINRISK38 (N = 20,344; Supplementary Table 12) and Finnish Twin39 (N = 5766; Supplementary Table 13); however, a combined GRS was not significant (Fisher’s P = 0.551; Supplementary Table 14).
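The combined P values above are labeled as Fisher's P. As an illustration only, the following minimal sketch shows how Fisher's method combines independent per-cohort P values; the function name and the example inputs are hypothetical and are not taken from the supplementary tables.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has a closed form, so no external library is needed.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # P(chi2 with 2k df > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-cohort P values (illustrative, not from the paper).
combined = fisher_combined_p([0.01, 0.04])
```

With a single cohort the function returns the input P value unchanged, which is a quick sanity check on the closed-form tail.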

We further validated our results through associations with subjective and objective measures of sleep patterns and disorders available in the UK Biobank. A GRS of 42 variants was associated with self-reported shorter sleep duration, morning chronotype, increased insomnia symptoms, increased frequency of daytime napping, and with accelerometry-assessed lower sleep efficiency and increased duration of daytime inactivity (Table 1).
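A weighted GRS of the kind described here is commonly computed as the sum of risk-allele dosages multiplied by their GWAS effect estimates. The sketch below assumes that form; the weights and dosages are made-up values, and any rescaling or standardization used in the actual analysis is not shown.

```python
def weighted_grs(dosages, weights):
    """Sum of risk-allele dosages (0 to 2) weighted by per-allele effects."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect estimates for three variants (not values from the paper).
example_weights = [0.010, 0.020, 0.005]
# Risk-allele dosages for one participant; imputed dosages may be fractional.
example_dosages = [0, 1, 2]
score = weighted_grs(example_dosages, example_weights)
```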

Table 1 Association of weighted genetic risk score (GRS) of all 42 daytime sleepiness loci, 10 sleep propensity loci, and 27 sleep fragmentation loci with (a) multiple self-reported sleep traits and (b) 7-day accelerometry-derived sleep, circadian, and activity traits in the UK Biobank

Clustering of sleepiness loci suggests biological subtypes

Genetic variants may influence daytime sleepiness through different mechanisms. Therefore, to dissect this heterogeneity, we investigated associations of individual SNPs with other sleep traits (Supplementary Data 2).

Individual daytime sleepiness increasing alleles at PATJ and PLCL1 were also associated with morning chronotype; loci at metabolism regulatory genes KSR2, LOC644191/CRHR1, and SLC39A8 with self-reported sleep duration (KSR2 with increased sleep duration, LOC644191/CRHR1 with long sleep, and SLC39A8 with short sleep); LMOD1 and LOC644456/LOC730134 with both insomnia and short sleep duration; and at the orexin/hypocretin receptor HCRTR2 with both morning chronotype and short sleep duration, suggesting common genetic factors. Adjusting for sleep disturbance traits (ICD-10 code defined sleep apnea or narcolepsy, or self-reported sleep duration hours, frequent insomnia symptoms or chronotype) together attenuated effect estimates for several loci, suggesting that these genetic variants influence sleepiness through altered sleep patterns and sleep disorders; however, adjustment for any trait alone only minimally altered effect estimates at individual loci (Supplementary Data 1).

Using 7-day accelerometry-derived data available in a subset of the UK Biobank (N = 85,388), we observed associations of several daytime sleepiness alleles with reduced sleep efficiency (e.g., SNX17) whereas others were associated with increased sleep efficiency (e.g., PLCL1), suggesting that genetic mechanisms may lead to sleepiness through effects on increased sleep fragmentation (i.e., low sleep efficiency) or increased sleep propensity (i.e., high sleep efficiency), respectively (Supplementary Data 2). Therefore, we performed hierarchical clustering analyses on risk alleles for sleepiness at 42 loci according to their association effect sizes (z-scores) with objective estimates of sleep efficiency, sleep duration, and number of sleep bouts, and self-reported frequent insomnia symptoms. An iterative approach based on silhouette coefficients was performed to remove cluster outliers (Methods; Supplementary Fig. 8). We interpreted sleepiness alleles showing patterns of association with higher sleep efficiency, longer sleep duration, fewer discrete sleep bouts and fewer insomnia symptoms as reflective of greater intrinsic sleep propensity, whereas sleepiness alleles associated with these sleep traits in a largely inverse manner were interpreted as reflective of disturbed sleep or a sleep fragmentation phenotype resulting in less restorative sleep (Fig. 2). GRS of daytime sleepiness loci stratified by the two clusters support our interpretation, with sleep propensity loci showing robust associations with early circadian traits (e.g. morning chronotype P = 5.54 × 10−4; Table 1).
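To make the silhouette-based outlier step concrete, here is a minimal sketch that computes per-allele silhouette coefficients from z-score vectors and flags alleles that fit a neighboring cluster better than their own. The allele names, z-scores, cluster labels, Euclidean distance, and the rule of flagging negative coefficients are all assumptions for illustration; the Methods and Supplementary Fig. 8 define the procedure actually used.

```python
import math


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of each point (Euclidean distance)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical z-scores per allele for (sleep efficiency, sleep duration,
# number of sleep bouts, insomnia symptoms); not values from the paper.
alleles = {
    "allele_A": (2.1, 1.5, -1.2, -0.8),
    "allele_B": (1.8, 1.1, -0.9, -1.0),
    "allele_C": (-2.0, -1.4, 1.6, 1.2),
    "allele_D": (-1.7, -1.0, 1.1, 0.9),
    "allele_E": (-1.9, -1.2, 1.3, 1.1),  # deliberately placed in cluster 0
}
labels = [0, 0, 1, 1, 0]
scores = silhouette_scores(list(alleles.values()), labels)
outliers = [name for name, s in zip(alleles, scores) if s < 0]
```

In an iterative scheme, flagged alleles would be removed and the clustering repeated until no negative coefficients remain.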

Fig. 2

Daytime sleepiness risk alleles associate predominantly with sleep propensity or sleep fragmentation phenotypes. Each cell shows effect sizes (z-scores) of associations between sleepiness risk alleles (positively associated with self-reported daytime sleepiness) and sleep traits (accelerometry-derived sleep efficiency, sleep duration, number of sleep bouts, and self-reported insomnia symptoms). Blue color indicates positive z-scores and red color indicates negative z-scores. Sleep propensity alleles were defined as more likely associated with higher sleep efficiency, longer sleep duration, fewer sleep bouts, and fewer insomnia symptoms. Sleep fragmentation alleles were defined as more likely associated with lower sleep efficiency, shorter sleep duration, more sleep bouts, and more insomnia symptoms

Functional effects of loci

Sleepiness loci lie in genomic regions encompassing 164 genes (Supplementary Data 3), and 3 associations are in strong linkage disequilibrium with known GWAS associations for other traits, including blood cell count, high-density lipoprotein cholesterol, and caffeine metabolism. Genes at multiple loci have been implicated in Mendelian syndromes or in experimental studies in mouse or fly models. Eighteen loci harbor one or more genes with potential drug targets.

We performed fine mapping analyses for potential causal variants using PICS40 and identified 33 variants within 25 sleepiness loci with a causal probability larger than 0.2 (Supplementary Data 4). The majority of likely causal variants were intronic (65%) followed by non-coding transcript variants (8%) and nonsense-mediated decay transcript variants (7%) (Supplementary Fig. 9). Functional variants included a missense variant rs12140153 within PATJ, a synonymous variant rs11078398 within RAI1, regulatory variants rs10800796 in the promoter region of LMOD1, and rs239323 in a CTCF-binding site in the gene POM121L2. Using the Oxford Brain Imaging Genetics (BIG) server41, we further observed the pleiotropic locus at rs13135092 (SLC39A8, previously associated with blood lipids, height, schizophrenia, and other traits42) to be significantly associated with bilateral putamen and striatum volume in the UK Biobank (P < 2.8 × 10−7; N = 9,707; Supplementary Fig. 10). This could be of particular interest given the importance of these central brain centers in influencing motor and emotional behaviors, and emerging data implicating these centers in the integration of behavioral inputs that modulate arousal and sleep–wake states43,44.

Gene-based, pathway, and tissue enrichment analyses

Gene-based analyses using PASCAL45 identified 94 genes associated with self-reported daytime sleepiness (enrichment P < 2.29 × 10−6) (Supplementary Table 15), of which 61 overlapped with genes under significant association peaks shown in Supplementary Data 3. Tissue enrichment analysis across 53 tissues in the GTEx database using MAGMA46 identified multiple brain tissues including the frontal cortex, cerebellum, anterior cingulate cortex, nucleus accumbens, caudate nucleus, putamen, hypothalamus, amygdala, and hippocampus (enrichment P < 10−3; Supplementary Table 16, Supplementary Fig. 11a). Pathway and ontology analyses using PASCAL identified enrichment in neuronal synaptic transmission pathways and EnrichR47 identified pathways involved with the central nervous system, neurotransmitters, and metabolic processes (e.g. insulin receptor signaling pathway) (Supplementary Table 17 and Supplementary Data 5). Genes at loci showing clustering with sleep propensity phenotypes (n = 37) showed enriched expression in brain tissues including cortex and amygdala (enrichment P < 10−3; Supplementary Fig. 11b). In contrast, no tissues were enriched in expression of genes that showed clustering with sleep fragmentation phenotypes (n = 86) perhaps reflecting further heterogeneity (Supplementary Fig. 11c). Pathway and ontology analyses results for clustered genes using FUMA also reveal different patterns (Supplementary Figs. 12 and 13).

The heritability of self-reported daytime sleepiness explained by genome-wide SNPs was estimated at 6.9% (SE = 1%). Partitioning heritability across tissue types and functional annotation classes indicated enrichment of heritability in central nervous system and adrenal/pancreas tissue lineages, and in regions conserved in mammals, introns, and H3K4me1-marked regions (potentially active and primed enhancers) (enrichment P < 8.3 × 10−4) (Supplementary Tables 18 and 19).

Genetic correlation and Mendelian randomization

Consistent with daytime sleepiness being a symptom of several sleep disorders, GRSs of genome-wide significant SNPs for restless legs syndrome48 (P = 0.0002), insomnia49 (P = 4 × 10−7), and coffee consumption50 (P = 1.87 × 10−12) (often used as a sleepiness “counter-measure”) were significantly associated with self-reported daytime sleepiness phenotype (Table 2). Although EDS is a key symptom of narcolepsy, the GRS of narcolepsy51 was not associated with self-reported daytime sleepiness (P = 0.126), suggesting narcolepsy loci did not explain sleepiness variation in this sample. We could not examine the genetic overlap of sleep apnea loci and sleepiness because few significant loci for sleep apnea have been reported in the literature and there was limited sleep apnea information in this cohort.

Table 2 Association between weighted genetic risk scores (GRS) of significant SNPs (P < 5 × 10−8) for other sleep behavioral traits and sleep disorders with self-reported daytime sleepiness phenotype in UK Biobank

To investigate the genetic correlation between sleepiness and other common disorders, we tested the proportion of genetic variation of self-reported daytime sleepiness shared with 233 other traits with published GWAS summary statistics in LDSC52. After adjusting for multiple comparisons, significant positive genetic correlations were observed for daytime sleepiness with obesity traits, coronary heart disease, and psychiatric traits (P < 0.0001) (Supplementary Data 6). The genetic correlations with coronary artery disease and psychiatric traits persisted after adjusting for BMI (Fig. 3). Consistently suggestive negative genetic correlations for daytime sleepiness with subjective well-being and reproductive traits (age at menarche and age at first birth) were also observed (P < 0.005).

Fig. 3

Top significant genetic correlations (rg) between self-reported daytime sleepiness and independent traits, estimated from published genome-wide summary statistics using LD score regression (LDSC). Blue color indicates positive genetic correlation and red color indicates negative genetic correlation. Larger colored squares correspond to more significant P values, and asterisks indicate significant (P < 2.2 × 10−4) genetic correlations after adjusting for multiple comparisons of 224 available traits. All genetic correlations in this report can be found in tabular form in Supplementary Data 6
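The significance threshold in this caption is consistent with a Bonferroni correction of α = 0.05 over the 224 traits. The arithmetic below assumes that correction was the one applied.

```python
alpha = 0.05
n_traits = 224  # number of traits stated in the Fig. 3 caption

# Bonferroni-adjusted per-test threshold (assumed correction method).
threshold = alpha / n_traits
print(f"{threshold:.1e}")  # prints 2.2e-04
```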

To evaluate the causal relationship between sleep disorders or other disease traits and daytime sleepiness, we performed two-sample summary-level Mendelian Randomization (MR) analyses using independent genetic variants from published summary statistics from GWAS of BMI, type 2 diabetes, coronary heart disease, neuroticism, bipolar disorder, depression, schizophrenia, age at menarche, restless legs syndrome, narcolepsy, insomnia, sleep duration, and chronotype as exposures and daytime sleepiness as outcome53. Using the inverse variance weighted (IVW) approach, we identified a putative causal association of higher BMI with increased daytime sleepiness (IVW β = 0.018; 95% CI [0.008, 0.028]; P = 0.0004), which was significant after accounting for multiple comparisons (IVW P < 0.003; Supplementary Table 20). However, there was evidence of variant heterogeneity potentially due to horizontal pleiotropy (Cochran’s Q = 677.17; P = 1.09 × 10−37; Supplementary Table 21). Therefore, we performed sensitivity analysis using the Radial MR-Egger approach (Methods)54 to control for bias due to pleiotropy, and observed an effect that was consistent with our main IVW analyses but less precisely estimated (wider confidence intervals) because this method is statistically less efficient (MR-Egger β = 0.025; 95% CI [−0.005, 0.055]; P = 0.103; Fig. 4 and Supplementary Table 21). An additional suggestive causal association of type 2 diabetes with increased daytime sleepiness was also observed (IVW β = 0.005; 95% CI [0.001, 0.009]; P = 0.014) with evidence of heterogeneity (Cochran’s Q = 88.38; P = 0.005), and Radial MR-Egger again showed a consistent effect direction (MR-Egger β = 0.002; 95% CI [−0.006, 0.01]; P = 0.637; Supplementary Table 21). Reverse MR did not identify any strong evidence for daytime sleepiness having a causal effect on any of the outcomes we examined (Supplementary Table 22).
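For readers less familiar with summary-level MR, the sketch below shows one standard way to compute an IVW estimate and Cochran's Q from per-variant summary statistics: per-variant ratio estimates are combined with first-order inverse-variance weights under a fixed-effect model. The function name and inputs are hypothetical, and the weighting scheme actually used in the analysis may differ (see Methods).

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    Returns (beta_ivw, se_ivw, cochran_q).
    """
    # Ratio (Wald) estimate of the causal effect for each variant.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order weights: inverse variance of each ratio estimate.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_w = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se_ivw = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviation of each ratio from the pooled
    # estimate; large values suggest heterogeneity across variants.
    cochran_q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return beta_ivw, se_ivw, cochran_q


# Hypothetical summary statistics for three variants (not from the paper).
beta_ivw, se_ivw, q_stat = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.01],
)
```

In this toy input every variant implies the same ratio, so Q is essentially zero; heterogeneous ratios, as reported for BMI, would inflate Q. The square roots of these weights correspond to the x-axis of a radial plot.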

Fig. 4

Radial plots of two-sample Mendelian randomization (MR) analysis of daytime sleepiness. a MR between BMI and daytime sleepiness outcome using IVW and MR-Egger tests. b MR between Type 2 diabetes and daytime sleepiness outcome using IVW and MR-Egger tests. The x-axis is the inverse standard error (square root weights in the IVW analysis) for each SNP. The y-axis scale represents the ratio estimate for the causal effect of an exposure on outcome for each SNP (\(\hat \beta _j\)) multiplied by the same square root weight

Discussion

This study expands our knowledge of the genetic architecture of daytime sleepiness. Despite the modest SNP-heritability (h2 = 6.9%, consistent with previous reports17,18), we identified 42 genome-wide significant loci (P < 5 × 10−8) owing to the increased statistical power afforded by 452,071 samples. The association effects were largely unchanged after adjusting for BMI, depression, socio-economic status, alcohol, smoking, caffeine, employment, neurodegenerative disorders, and sleep disturbance traits individually, and upon exclusion of shiftworkers and sleep/psychiatric medication users. We did not evaluate the effect of restless legs syndrome and periodic limb movement disorder because information on these disorders was not collected in the UK Biobank.

An aggregate effect of a genetic risk score of 42 loci was confirmed in independent Scandinavian cohorts with different self-reported daytime sleepiness measures. Despite the challenges of individual loci replication with insufficient power (5–57% in replication cohorts; Supplementary Table 11), variable questionnaires in different scales across different cohorts and the multifactorial etiology of sleepiness, we observed nominal replication at five loci including our most significant association observed at KSR2, a gene regulating multiple signaling pathways (e.g. the ERK/MEK signaling pathway), affecting energy balance, cellular fatty acid, and glucose oxidation that is implicated in obesity, insulin resistance, and heart rate during sleep in previous studies in humans and mice24,25. While the GRS association was highly significant when including the three loci with nominal significance in the meta-analysis, the effect estimates after removing the three loci remained at 67% of the original effect in HUNT and 86% of the original effect in Health 2000, suggesting that additional individual sleepiness loci contribute to the combined effect of the GRS. However, replication in additional, well-powered cohorts will be important.

The validation of our results was further supported by associations between sleepiness GRS with self-reported shorter sleep duration, morning chronotype, increased insomnia risk, increased frequency of daytime napping, and with accelerometry-derived lower sleep efficiency and increased duration of daytime inactivity. These associations also suggest sleepiness loci impact other sleep parameters such as sleep latency, sleep efficiency, and sleep timing. The sleepiness GRS was not associated with 7-day accelerometry-derived continuous sleep duration, largely reflecting the heterogeneity of self-reported sleepiness (sleep propensity vs sleep fragmentation).

Our results were strengthened by previous GWAS associations with related traits (e.g. metabolic and psychiatric traits), model organism evidence for sleep phenotypes (HCRTR2, SEMA7A, and RAI1), and tissue and pathway enrichment analyses. Genes under association peaks were enriched in multiple brain tissues, including brain regions implicated in sleep–wake and arousal disorders55 as well as centers responsive to sleep deprivation and pathways involved with the central nervous system, neurotransmitters, and metabolic processes. Enrichment of partitioned heritability was observed for variants in the central nervous system and in highly conserved regions shared by humans and 28 other mammals56, suggesting strong conservation of sleep regulation throughout evolution.

We also investigated the heterogeneity of daytime sleepiness loci for the first time by performing clustering analysis according to individual SNP associations with four major sleep parameters: 7-day accelerometry-derived sleep efficiency, sleep duration, and number of sleep bouts, and self-reported frequent insomnia symptoms. We discovered sleepiness risk variants at 10 loci (e.g. PLCL1 and KSR2) and their GRS that associated with sleep propensity traits (higher sleep efficiency, longer sleep duration, fewer discrete sleep bouts, and fewer insomnia symptoms); whereas sleepiness variants at 27 loci (e.g. LMOD1, HCRTR2, and GABRA2, known to play a central role in sleep/wake control and narcolepsy57) and their GRS were more likely to contribute to sleep fragmentation (lower sleep efficiency, shorter sleep duration, more sleep bouts and more insomnia symptoms). Sleep propensity GRS revealed significant associations with early chronotype, reflective of circadian influences on sleep drive. Genes at sleep propensity loci also showed enriched expression in brain tissues whereas no tissues were enriched in sleep fragmentation loci, suggesting that the mechanisms associated with sleep fragmentation may be more complex, reflective of multifactorial influences. Future experimental studies and statistically robust clustering analyses that include other sleep and related traits are needed to validate and distinguish the biological subtypes of daytime sleepiness58.

We extended our analysis to compare the genetic architecture between daytime sleepiness and other common disorders, and observed significant genetic correlations with obesity, coronary heart disease, and psychiatric traits. The genetic correlations of sleepiness with coronary artery disease and psychiatric traits persisted after adjusting for BMI, perhaps partially reflecting shared neurologic or neuroendocrine factors, such as those that underlie the associations of insomnia and short sleep with cardiac and psychiatric traits49,59. Using MR analysis, we identified a potential causal association of higher BMI with increased daytime sleepiness, consistent with prospective epidemiological studies, which likely reflects metabolic and/or circadian dysfunction in obese people5. A suggestive causal association of type 2 diabetes with daytime sleepiness was also identified, which may reflect a high prevalence of sleep disturbances in diabetes (e.g., sleep apnea) or systemic inflammation. Reverse MR did not identify any strong evidence for sleepiness having a causal effect on any of the outcomes we examined, implying that sleepiness is often a “symptom” of other disorders. However, replications with large samples are required to confirm the causal relationships. Future systematic MR analyses with other common disorders may be of particular interest.

This study has several strengths. It is the largest GWAS of self-reported daytime sleepiness with better power than previous studies. We identified 42 loci and confirmed the aggregated association effect in independent cohorts. Moreover, using a wide range of data on individual sleep traits—both self-reported and objectively measured—we showed that individual daytime sleepiness variants associate with unique patterns of sleep and circadian traits that largely cluster into two biological subtypes, intrinsic sleep propensity and sleep fragmentation. These findings were extended with knowledge from published databases of tissue-based expression, pathway annotations, and GWAS summary statistics for other traits.

This study also has several limitations. Primary analyses used self-reported daytime sleepiness expressed as a continuous variable derived from a 4-point scale. Future work should evaluate the psychometric properties of this question and compare it to other frequently used measures of daytime sleepiness, such as the Epworth Sleepiness Scale (ESS) or Maintenance of Wakefulness Test. It is likely that there was some loss of power due to use of a single measure of self-reported sleepiness resulting in random misclassification. However, the large sample for which questionnaire data were available provided results that were able to be further studied in a smaller sample of 7-day accelerometry-derived sleep data (which has been shown to agree well with polysomnography). Future work using objective measurements of sleepiness, such as from vigilance tests, may provide further insights into the genetics of sleepiness-related traits. The statistical power of this study may also be limited by the heterogeneity of daytime sleepiness which could be addressed by adjusting or restricting analyses using the available covariate data. Only individuals of European ancestries aged 40–69 years old in the UK were included, which limits generalizability to other populations and age groups, especially considering that sleep patterns change with age5.

In summary, we conducted an extensive series of analyses from a large-scale GWAS and identified the heterogeneous genetic architecture of daytime sleepiness. Multiple genetic loci were identified, including genes expressed in brain areas implicated in sleep–wake control and genes influencing metabolism. Shared genetic factors were identified for daytime sleepiness and other sleep disorders, with evidence that sleepiness variants clustered with two predominant phenotypes (sleep propensity and sleep fragmentation), with the former showing stronger evidence for enrichment in central nervous system tissues, suggesting two unique mechanistic pathways. Genetic variants for daytime sleepiness also overlapped those for other diseases and lifestyle traits, with evidence that higher BMI and possibly diabetes are causally associated with increased daytime sleepiness. This work will advance understanding of biological mechanisms relating to sleepiness and underlying sleep and circadian regulation, and open new avenues for future study.

Methods

Population and study design

The discovery analysis was conducted on participants of European ancestry from the UK Biobank study22. The UK Biobank is a prospective study that has enrolled over 500,000 people aged 40–69 living in the United Kingdom. Baseline measures collected between 2006 and 2010, including self-reported health questionnaires and anthropometric assessments, were used in this analysis. Participants taking any self-reported sleep medication (Supplementary Note 1) were excluded. The UK Biobank study was approved by the National Health Service National Research Ethics Service (ref. 11/NW/0382), and all participants provided written informed consent to participate in the UK Biobank study. In total, 452,071 individuals of European ancestry were studied with available phenotypes and genotyping passing quality control, as described below.

EDS and covariate measurements

Self-reported daytime sleepiness was ascertained in the UK Biobank using the question “How likely are you to dose off or fall asleep during the daytime when you don’t mean to? (e.g. when working, reading or driving)” with the response options of “Never/rarely”, “sometimes”, “often”, “all of the time”, “do not know”, and “prefer not to answer”. Participants reporting “do not know” and “prefer not to answer” were set to missing. Other responses were coded continuously as 1 to 4 corresponding to the severity of daytime sleepiness. The primary covariates used were self-reported age and sex, and BMI calculated as weight/height2. Covariates used in the sensitivity analyses include potential confounders (depression, social economic status, alcohol intake frequency, smoking status, caffeine intake, employment status, marital status, neurodegenerative disorders, and use of psychiatric medications) and indices of sleep disorders and sleep traits (daytime napping, sleep apnea, narcolepsy, sleep duration, insomnia, and chronotype). Depression was recorded as a binary variable (yes/no) corresponding to question “Ever depressed for a whole week?”. Social economic status was measured by the Townsend Deprivation Index based on aggregated data from national census output areas in the UK. Alcohol intake frequency was coded as a continuous variable corresponding to “daily or almost daily”, “three or four times a week”, “once or twice a week”, “once to three times a month”, “special occasions only”, and “never” drinking alcohol. Smoking status was categorized as “current”, “past”, or “never” smoked. Caffeine intake was coded continuously corresponding to self-reported cups of tea/coffee per day. Employment status was categorized as “employed”, “retired”, “looking after home and/or family”, “unable to work because of sickness or disability”, “unemployed”, “doing unpaid or voluntary work”, or “full or part-time student”. 
Neurodegenerative disorder cases (N = 517) were identified as a union of International Classification of Diseases (ICD)-10 coded Parkinson’s disease (G20–G21), Alzheimer’s disease (G30), and other degenerative diseases of the nervous system (G23, G31–G32). Daytime napping was coded continuously (“never/rarely”, “sometimes”, or “usually”) in response to the question “Do you have a nap during the day?” Sleep apnea cases (N = 5571) were identified as a union of self-reported and ICD-10 coded (G47.3) sleep apnea. Narcolepsy cases (N = 7) were determined by the ICD-10 code (G47.4). Insomnia was recorded as “never/rarely”, “sometimes”, or “usually” in response to the question “Do you have trouble falling asleep at night or do you wake up in the middle of the night?”. Individuals who reported “usually” were considered frequent insomnia symptom cases. Sleep duration was recorded as discrete integers in response to the question “About how many hours sleep do you get in every 24 hours (please include naps)”. In this study, short sleep was defined as sleep duration shorter than 7 h and long sleep as sleep duration longer than 8 h. Chronotype was categorized as “definitely an ‘evening’ person”, “more an ‘evening’ than a ‘morning’ person”, “more a ‘morning’ than ‘evening’ person”, and “definitely a ‘morning’ person”. Secondary analyses were performed on participants further excluding shiftworkers, psychiatric medication users, and participants with chronic and psychiatric illness (described in Supplementary Note 2, N = 255,426).
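The phenotype coding described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names and the exact response strings are assumptions, and only the 1 to 4 coding, the treatment of non-informative answers as missing, and the 7 h and 8 h sleep duration cut-offs are taken from the text.

```python
# Illustrative sketch; function names and response strings are hypothetical.
SLEEPINESS_CODES = {
    "Never/rarely": 1,
    "Sometimes": 2,
    "Often": 3,
    "All of the time": 4,
}


def code_sleepiness(response):
    """Return 1-4 for an informative response, or None (missing) otherwise."""
    if response is None:
        return None
    # "Do not know" and "Prefer not to answer" are absent from the mapping,
    # so they fall through to None (missing).
    return SLEEPINESS_CODES.get(response.strip().capitalize())


def sleep_category(hours):
    """Classify self-reported sleep duration (integer hours per 24 h)."""
    if hours < 7:
        return "short"
    if hours > 8:
        return "long"
    return "normal"
```

Under this coding, a participant answering “often” receives a score of 3, and a participant reporting 7 or 8 h of sleep belongs to neither the short nor the long sleep group.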

Activity-monitor-derived measures of sleep

Raw accelerometer data (.cwa) were collected using open-source Axivity AX3 wrist-worn triaxial accelerometers (https://github.com/digitalinteraction/openmovement) in 103,711 individuals from the UK Biobank for up to 7 days60. We converted .cwa files to .wav files using Omconvert (https://github.com/digitalinteraction/openmovement/tree/master/Software/AX3/omconvert)60,61. Time windows of sleep (SPT-window) and activity levels were extracted for each 24-h period using a heuristic algorithm implemented in the R package GGIR (https://cran.r-project.org/web/packages/GGIR/GGIR.pdf)62,63. Briefly, for each individual, a 5-min rolling median of the absolute change in z-angle (representing the dorsal–ventral direction when the wrist is in the anatomical position) was calculated across a 24-h period. The 10th percentile of the output was used to construct an individual’s threshold, distinguishing periods with movement from non-movement. Inactivity bouts were defined as periods of inactivity lasting at least 30 min. Inactivity bouts separated by gaps of less than 60 min were combined into blocks. The SPT-window was defined as the longest inactivity block, with sleep onset as the start of the block and waking time as the end of the block. The sleep measurements derived from accelerometer data using this algorithm have been shown to provide reliable estimates of sleep onset time, waking time, SPT-window duration, and sleep duration within the SPT-window compared to polysomnography63. We applied exclusion criteria based on accelerometer data quality, excluding records with (1) non-zero or missing values in “data problem indicator” (Field 90002); (2) 0 in “good wear time” (Field 90015); (3) 0 in “good calibration” (Field 90016); (4) 0 in “calibrated on own data” (Field 90017); (5) “data recording errors” (Field 90182) >788 (Q3 + 1.5 × IQR); or (6) non-zero values in “interrupted recording periods” (Field 90180). Accelerometry data from 85,388 participants of European ancestry passed quality control and were analyzed in this study.
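The SPT-window heuristic can be illustrated with a minimal Python sketch. This is not the GGIR implementation: the function name, the trailing (rather than centered) rolling window, the simple percentile rule, and the `scale` multiplier applied to the 10th percentile are all assumptions made for illustration. Only the 5-min rolling median, the 10th percentile, the 30-min minimum bout length, the 60-min merging rule, and the choice of the longest block come from the description above.

```python
from statistics import median


def spt_window(z_angle, epoch_s=5, scale=15.0):
    """Return (onset, wake) epoch indices of the longest inactivity block.

    z_angle: z-angle per epoch over one 24-h period (degrees).
    epoch_s: epoch length in seconds (assumed input resolution).
    scale:   multiplier applied to the 10th percentile to form the
             threshold (an assumption, not stated in the text).
    """
    win = max(1, 300 // epoch_s)  # 5-min window expressed in epochs
    change = [abs(b - a) for a, b in zip(z_angle, z_angle[1:])]
    # Trailing 5-min rolling median of the absolute z-angle change.
    rolled = [median(change[max(0, i - win + 1): i + 1])
              for i in range(len(change))]
    ranked = sorted(rolled)
    threshold = ranked[int(0.10 * (len(ranked) - 1))] * scale
    still = [value <= threshold for value in rolled]

    min_len = (30 * 60) // epoch_s  # bouts must last at least 30 min
    max_gap = (60 * 60) // epoch_s  # gaps under 60 min are bridged

    # Collect inactivity bouts as [start, end) index pairs.
    bouts, start = [], None
    for i, flag in enumerate(still + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                bouts.append([start, i])
            start = None

    # Merge bouts separated by short gaps into blocks.
    blocks = []
    for bout in bouts:
        if blocks and bout[0] - blocks[-1][1] < max_gap:
            blocks[-1][1] = bout[1]
        else:
            blocks.append(list(bout))
    if not blocks:
        return None
    onset, wake = max(blocks, key=lambda b: b[1] - b[0])
    return onset, wake
```

In this sketch a short daytime nap forms its own block and is ignored, whereas two long nocturnal bouts separated by a brief awakening are merged into a single SPT-window.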

The distributions of accelerometer data are described in Supplementary Table 2. The details of each measurement are as follows. L5 and M10 were the least-active 5-h window and most-active 10-h window for each day, estimated from a moving average of a contiguous 5-h or 10-h window. The L5 timing was defined as the number of hours elapsed from the previous midnight, whereas the M10 timing was defined as the number of hours elapsed from the previous midday. Sleep midpoint was the midpoint between the start and end of the SPT-window. The L5, M10, and sleep midpoint variables capture the circadian characteristics of an individual. Sleep episodes within the SPT-window were defined as periods in which the z-axis angle changed by less than 5° for at least 5 min62. Sleep duration in an SPT-window was calculated as the sum of all sleep episodes. The mean and standard deviation of sleep duration across all SPT-windows were investigated in this study. Sleep efficiency was calculated as sleep duration divided by the total SPT-window duration. Sleep fragmentation was examined by counting the number of sleep episodes of at least 5 min separated by at least 5 s of wakefulness within an SPT-window. Diurnal inactivity duration was the total duration of estimated bouts of inactivity that fell outside of the SPT-window in 24 h, which included both inactivity and naps.
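As an illustration of how the per-night summary measures relate to one another, the following Python sketch computes sleep duration, sleep efficiency, sleep midpoint, and the number of sleep episodes from an SPT-window and a list of sleep episodes. The function name and the input representation (times in minutes, episodes as start and end pairs that already satisfy the 5-min criterion) are assumptions made for this example.

```python
def sleep_metrics(spt_onset, spt_wake, episodes):
    """Summarize one SPT-window.

    spt_onset, spt_wake: start and end of the SPT-window (minutes).
    episodes: list of (start, end) sleep episodes within the window,
              assumed to have been detected already.
    """
    spt_duration = spt_wake - spt_onset
    if spt_duration <= 0:
        raise ValueError("SPT-window must have positive duration")
    sleep_duration = sum(end - start for start, end in episodes)
    return {
        "sleep_duration": sleep_duration,                   # sum of episodes
        "sleep_efficiency": sleep_duration / spt_duration,  # fraction asleep
        "sleep_midpoint": (spt_onset + spt_wake) / 2,       # window midpoint
        "n_episodes": len(episodes),                        # fragmentation
    }
```

For example, an SPT-window of 240 min containing two episodes of 100 and 90 min gives a sleep duration of 190 min and a sleep efficiency of 190/240.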

Genotyping and quality control

DNA samples from 502,631 participants in the UK Biobank were genotyped on two arrays: UK BiLEVE (807,411 markers) and UKB Axiom (825,927 markers). In all, 488,377 samples and 805,426 genotyped markers passed standard QC64 and were available in the full data release. SNPs were imputed to a Haplotype Reference Consortium (HRC) panel (~96 million SNPs). Detailed descriptions of genotyping, QC, and imputation are available elsewhere64. We further performed K-means clustering using the PCs of ~100,000 high-quality genotyped SNPs (missingness < 1.5% and MAF > 2.5%) and identified 453,964 participants of European ancestry.

Genome-wide association analysis

We performed a genome-wide association analysis (GWAS) of self-reported daytime sleepiness as a continuous variable derived from a 4-point scale using 452,071 individuals of European ancestry in the UK Biobank. A linear mixed regression model was applied adjusting for age, sex, genotyping array, 10 PCs, and genetic relatedness matrix, using BOLT-LMM with an MAF > 0.001, BGEN imputation score > 0.3, maximum per SNP missingness of 10%, and per sample missingness of 40%23. Reference 1000 Genomes European-ancestry (EUR) LD scores and genetic map (hg19) were used in this analysis. X-chromosome data were imputed and analyzed separately (with male genotypes coded as 0/2 and female genotypes coded as 0/1/2) using the same analytical approach in BOLT-LMM as for the autosomes. A signal at IGSF1 on chromosome X, driven by a single rare variant (MAF = 0.006), was identified; because it may reflect a genotyping artifact or a false-positive association, we do not report it as a main finding. Similar linear mixed regression analyses were performed additionally adjusting for BMI and stratified by sex. A secondary GWAS excluding related individuals, shiftworkers, individuals who used psychiatric medications, and participants with chronic health and psychiatric illness (N = 255,426) was performed adjusting for age, sex, genotyping array, and 10 PCs in PLINK 1.9 (ref. 65). We used a hard-call genotype threshold of 0.1, SNP imputation quality threshold of 0.80, and an MAF threshold of 0.001. SNP-heritability, defined as the proportion of trait variance explained by genome-wide additive genetic effects, was estimated using BOLT-REML23. The genome-wide significance level was set at 5 × 10⁻⁸. Gene-sex and gene-health status interaction analyses were performed on unrelated individuals using a linear regression model in PLINK with the additional --interaction flag.
Conditional analyses to dissect independent signals in significant genomic regions were performed using GCTA-COJO66 with MAF > 0.001 and a genome-wide significance threshold of P < 5 × 10⁻⁸ through a stepwise selection procedure using the --cojo-slct flag. Variant annotation for each significant locus was performed using PICS with the 1000 Genomes EUR LD reference and a causal probability of 0.2 or greater40.

Sensitivity and stratification analyses of significant loci

Sensitivity analyses of the genome-wide significant loci on autosomes in the primary analysis (P < 5 × 10⁻⁸) were performed additionally adjusting for potential confounders (including depression, socio-economic status, alcohol intake frequency, smoking status, caffeine intake, employment status, marital status, and psychiatric problems) and clinically important sleep traits (including sleep apnea, narcolepsy, sleep duration hours, insomnia, and chronotype) individually in 337,539 unrelated individuals using PLINK. Sleep traits were further adjusted for jointly in the model to investigate their combined effect on sleepiness signals. Stratified association analyses with self-reported daytime sleepiness were performed in individuals without obesity (BMI < 30, N = 256,373) vs individuals with obesity (BMI ≥ 30, N = 81,163), and in long sleepers (self-reported sleep duration > 8 h; N = 25,272) vs short sleepers (self-reported sleep duration < 7 h; N = 78,393), and the strata were tested for heterogeneity of effects.
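The text does not specify which heterogeneity test was applied to the stratified estimates. One common choice, shown here purely as an assumption, is a two-sided z-test on the difference between two independent effect estimates. The function name and the example values are hypothetical.

```python
import math


def heterogeneity_test(beta_1, se_1, beta_2, se_2):
    """Two-sided z-test for a difference between two independent estimates.

    This is an assumed test, not necessarily the one used in the study.
    """
    z = (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

Applied to a single locus, the inputs would be the per-allele effect and standard error in each stratum (for example, individuals with and without obesity); a small P value suggests that the effect differs between strata.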

Heterogeneity analysis

Genome-wide significant loci identified by the primary GWAS were further investigated to understand whether they contribute to daytime sleepiness through different mechanisms, by testing the associations of sleepiness risk alleles with BMI and other sleep traits in the UK Biobank (including self-reported sleep duration, insomnia, chronotype, long sleep duration [>8 h], short sleep duration [<7 h], snoring, obstructive sleep apnea defined by ICD-10 code [G47.3], hypersomnolence [defined as sleepiness plus long sleep duration without any chronic or psychiatric diseases], and 7-day accelerometry data). Linear or logistic regression analyses were performed adjusting for age, sex, genotyping array, and 10 PCs. Genome-wide summary statistics of sleep duration, insomnia, chronotype, long sleep duration, short sleep duration, and 7-day accelerometry generated using BOLT-LMM were available in public databases49,59,67,68. We performed hierarchical cluster analyses using the pairwise Euclidean distances between 42 loci: \(D(\mathbf{X}_i, \mathbf{X}_j) = \sqrt{\sum_{k=1}^{4} (x_{ik} - x_{jk})^2}\), where \(\mathbf{X}_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4})^{\mathrm{T}}\) corresponds to the association z-scores with accelerometer-derived sleep efficiency, sleep duration, sleep fragmentation (number of sleep periods), and self-reported insomnia for SNP i, i = 1, …, 42. We took an iterative approach to improve the performance of our clustering analysis by removing cluster outliers based on silhouette coefficients. Briefly, the silhouette coefficient is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). It ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.
In the initial clustering results (Iteration 1), loci at GAPVD1/MAPKAP1, PATJ, and POM121L2/FKSG83 showed negative silhouette coefficients, indicating that they were likely to be incorrectly clustered (Supplementary Fig. 8a). We therefore removed these loci in a subsequent iteration of the clustering analysis. In Iteration 2, the ECE2 locus showed a negative silhouette coefficient and was therefore removed (Supplementary Fig. 8b). In Iteration 3, all loci showed positive silhouette coefficients (Supplementary Fig. 8c), indicating reasonable classification. The average silhouette coefficient improved from 0.32 in our original classification to 0.4 in our final classification (with all positive coefficients).
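The silhouette criterion used to flag poorly clustered loci can be sketched as follows. This Python example assumes that cluster labels have already been assigned (the hierarchical clustering step itself is not shown) and uses the standard definition s = (b − a)/max(a, b), where a is the mean Euclidean distance from a locus to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name and inputs are hypothetical.

```python
import math


def silhouette_coefficients(points, labels):
    """Return one silhouette coefficient per point.

    points: list of equal-length numeric vectors (e.g. four z-scores per locus).
    labels: cluster label for each point.
    """
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(label, []).append(math.dist(p, q))
        own = by_cluster.pop(own_label, [])
        if not own or not by_cluster:
            scores.append(0.0)  # convention for singletons or a single cluster
            continue
        a = sum(own) / len(own)                                       # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())         # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores
```

Loci with a negative coefficient would be removed and the clustering repeated on the remaining loci until all coefficients are positive, mirroring the iterations described above.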

Gene, pathway, and tissue enrichment analyses

We further examined the genes within genome-wide significant loci using gene-based pathway and tissue enrichment analyses45,47,69. Gene-based analysis was performed using PASCAL, which estimated a combined association P value from the summary statistics of multiple SNPs in a gene45. Pathway and ontology enrichment analyses were performed using FUMA69 and EnrichR47. Tissue enrichment analysis was performed using MAGMA46 in FUMA, which controlled for gene size. Pathway and tissue enrichment analyses were also performed on genes within loci belonging to sleep propensity and sleep fragmentation clusters separately.

We constructed a weighted genetic risk score (GRS) comprising the 42 significant sleepiness loci and tested for associations with other self-reported sleep traits (sleep duration, long sleep duration, short sleep duration, insomnia, chronotype, and day naps) and 7-day accelerometry traits in the UK Biobank. Weighted GRS analyses were performed by summing the products of the risk allele count and the effect estimate reported in the primary GWAS of self-reported daytime sleepiness using R package gds (https://cran.r-project.org/web/packages/gds/gds.pdf). Using the same approach, we also tested whether GRSs of reported loci for insomnia, sleep duration, short sleep, long sleep, day naps, chronotype, restless legs syndrome (RLS), narcolepsy, and coffee consumption were associated with self-reported daytime sleepiness. The SNPs selected for each trait included 57 genome-wide significant loci for frequent insomnia49; 78, 27, and 8 loci for sleep duration, long sleep, and short sleep, respectively59; 348 loci for chronotype67; 125 loci for daytime napping; 20 genome-wide significant loci for RLS48; 8 non-HLA suggestive loci (P < 10⁻⁴) in a narcolepsy case–control study of European Americans51; and 8 loci for coffee consumption50.
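For a single individual, the weighted GRS is the sum over loci of the risk allele count multiplied by the corresponding effect estimate. The Python sketch below illustrates this calculation only; the function name and the example values are hypothetical, and the score-trait association step is not shown.

```python
def weighted_grs(risk_allele_counts, effect_estimates):
    """Weighted genetic risk score for one individual.

    risk_allele_counts: number of risk alleles (or dosage, 0-2) per locus.
    effect_estimates:   per-allele effect from the discovery GWAS, aligned
                        to the same risk allele and in the same locus order.
    """
    if len(risk_allele_counts) != len(effect_estimates):
        raise ValueError("one effect estimate is required per locus")
    return sum(count * beta
               for count, beta in zip(risk_allele_counts, effect_estimates))
```

For instance, an individual carrying 2, 1, and 0 risk alleles at three loci with effects of 0.01, 0.02, and 0.03 has a score of 2 × 0.01 + 1 × 0.02 + 0 × 0.03 = 0.04.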

Genetic correlation analyses

Genetic correlation analysis using LD Score regression was performed on genome-wide SNPs mapped to the HapMap3 reference panel between daytime sleepiness (with and without adjustment for BMI) and 233 published GWAS available in LDHub52. The significance level was set at 10⁻⁴ to correct for multiple comparisons. Pairwise genetic correlations among daytime sleepiness, frequent insomnia, sleep duration, long sleep duration, short sleep duration, and chronotype were estimated locally using LDSC. We also partitioned heritability across 8 cell-type regions and 25 functional annotation categories available in LDSC70. Enrichment of the partitioned heritability was calculated in each region with and without extension (±500 bp).

MR analyses

To investigate the causal relationship between daytime sleepiness and other traits, we performed two-sample Mendelian randomization (MR) using the MRbase package in R53. The inverse-variance weighted (IVW) approach, which assumes no horizontal pleiotropy, was implemented as the primary approach in this analysis. BMI, type 2 diabetes, coronary heart disease, psychiatric and reproductive traits, and other sleep and circadian traits (narcolepsy, insomnia, sleep duration, and chronotype) were tested as exposures for daytime sleepiness. Independent genome-wide significant SNPs extracted from publicly available summary statistics of exposures of interest (Supplementary Table 20) were tested as instruments for their effect on daytime sleepiness. The significance level was determined as IVW P < 0.003 after accounting for multiple comparisons. We identified a putative causal association of higher BMI with increased sleepiness risk (IVW β = 0.018; 95% CI [0.008, 0.028]; P = 0.0004; Supplementary Table 21). The mean F statistic was 32.7, indicating that the instruments were sufficiently strong. However, Cochran’s Q statistic was 677.16 (P = 1.09 × 10⁻³⁷), indicating substantial heterogeneity about the IVW slope. This is an indicator of potential horizontal pleiotropy that violates the traditional IV assumptions. Therefore, we applied MR-Egger regression on the radial plot scale as a sensitivity analysis54. The mean \(I_{\mathrm{GX}}^2\) statistic was 0.89, indicating that the instruments were sufficiently strong for this analysis71. We observed a consistent effect direction for Radial MR-Egger (β = 0.025; 95% CI [−0.005, 0.055]; P = 0.103). Rucker’s Q statistic for Radial MR-Egger was 676.58, indicating that the IVW and Radial MR-Egger models fit the data equally well (Supplementary Table 21). We also investigated the suggestive putative causal association of type 2 diabetes with increased sleepiness risk (IVW mean F = 29; β = 0.005; 95% CI [0.001, 0.009]; P = 0.014; Supplementary Table 21).
Given the evidence of heterogeneity among variants (Cochran’s Q = 88.38; P = 0.005), we again performed a sensitivity analysis using Radial MR-Egger (mean \(I_{\mathrm{GX}}^2\) = 0.84) and observed a consistent effect direction (β = 0.025; 95% CI [−0.005, 0.055]; P = 0.637; Supplementary Table 21). Reverse MR analyses between daytime sleepiness and other outcomes were conducted using genome-wide significant sleepiness SNPs as instruments, and did not identify any causal association (IVW P > 0.05; Supplementary Table 22).
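The IVW estimate and Cochran’s Q statistic reported above can be illustrated from summary statistics with a short Python sketch. This is a fixed-effect version with first-order weights, shown as an assumption for clarity; it is not the MRbase implementation, which may use different weighting or a random-effects variance. The function name and inputs are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    Each instrument j contributes a Wald ratio beta_outcome[j] / beta_exposure[j]
    with first-order weight beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (beta_ivw, se_ivw, cochran_q, degrees_of_freedom).
    """
    weights = [bx ** 2 / se ** 2
               for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total
    se_ivw = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of ratios from the IVW slope.
    cochran_q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return beta_ivw, se_ivw, cochran_q, len(ratios) - 1
```

A Q statistic that is large relative to its degrees of freedom, as observed for BMI, indicates that the instrument-specific ratios vary more than expected by chance, which motivates pleiotropy-robust sensitivity analyses such as MR-Egger.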

Replication analyses

Replication analyses were conducted using self-reported daytime sleepiness or fatigue in Scandinavian individuals from four population-based studies: the Nord-Trøndelag Health Study (HUNT), the Health 2000 Survey, FINRISK, and the Finnish Twin Cohort Study.

The HUNT is a large longitudinal population health study that has investigated the county of Nord-Trøndelag, Norway since 1984 (ref. 36). Three surveys (HUNT1 [1984–1986], HUNT2 [1995–1997], and HUNT3 [2006–2008]) have been completed, including more than 120,000 individuals. The daytime sleepiness phenotype was collected in HUNT3 by asking the question “How often in the last 3 months have you felt sleepy during the day?” with the choices “Never/seldom”, “Sometimes”, and “Several times”. Individuals with self-reported stroke, myocardial infarction, angina pectoris, diabetes mellitus, hypo- and hyperthyroidism, fibromyalgia, and arthritis were excluded from the replication analysis.

DNA samples were collected from 71,860 HUNT participants and genotyped on one of three Illumina arrays: HumanCoreExome12 v1.0, HumanCoreExome12 v1.1, and UM HUNT Biobank v1.0. Imputation was performed on samples of European ancestry using a combined reference panel comprising the HRC and 2202 whole-genome sequenced HUNT participants. In total, 29,906 individuals with both phenotype and imputed genotype data were available for this analysis. Sample distributions are presented in Supplementary Table 9. A generalized linear mixed model analysis was performed on continuous sleepiness adjusted for age, sex, genotyping batch effect, and four PCs using SAIGE v0.25. A second analysis additionally adjusting for BMI was conducted for replication of the loci identified after adjusting for BMI.

The Health 2000 Survey is a population-based sample representing the population structure of individuals from Finland who at the time of contact were over 18 years old. Individuals over 30 years of age answered a number of health and lifestyle-related questionnaires37. These data were collected between 11 September 2000 and 2 March 2001 with the goal of revealing and studying public health problems in Finland. The Ethics Committee of the Helsinki and Uusimaa Hospital District approved the study protocol, and written informed consent was obtained from all participants after they were provided with a description of the study. The full Epworth Sleepiness Scale (ESS; 0–24) was included among the questionnaires and was available for 4546 individuals with genotyping data in the study (Supplementary Table 10).

Genotyping was performed at the Finnish Genome Center using the Illumina Human610K genotyping array. Imputation was performed against 2690 hcWGS and 5092 WES Finnish genomes (http://www.sisuproject.fi/). Linear regression analyses were performed on continuous ESS adjusted for age, sex, genotyping batch effect, and 10 PCs using snptest v2.5. Shiftworkers were excluded, and a secondary analysis was additionally adjusted for BMI.

FINRISK is a population-based study in Finland, initiated in 1972 and repeated every 5 years since then, that investigates risk factors for cardiovascular outcomes38. Nine cross-sectional surveys including 101,451 participants aged 25–74 years were conducted between 1972 and 2012. DNA samples have been collected since the 1992 survey.

We studied exhaustion and fatigue in this population. This was ascertained by asking the question “During the past 30 days, have you felt yourself exhausted or overstrained?” with the choices “Never”, “Sometimes”, and “Often”. In total, 20,344 individuals with both phenotype and whole-genome genotyped and imputed data were available for this study (Supplementary Table 12). Genotyping was performed at the Wellcome Trust Sanger Institute (Cambridge, UK), at the Broad Institute of Harvard and MIT (MA, USA), and at the Institute for Molecular Medicine Finland (FIMM) Genotyping Unit using Illumina beadchips (Human610-Quad, HumanOmniExpress, HumanCoreExome). The data were imputed using the 1000 Genomes Project phase 3 haplotypes and a custom haplotype set of 2000 whole-genome sequenced Finnish individuals as reference panels. Linear regression analyses for exhaustion were performed with snptest v2.5 and adjusted for age, sex, genotyping batch effects, and 10 PCs. Shiftworkers were removed from the analyses. A secondary analysis was additionally adjusted for BMI.

The Finnish Twin Cohort Study consists of same-sex twin pairs born before 1958, who participated in two questionnaire surveys in 1975 and 1981. In 1990, twins who had participated in either previous survey and who were born between 1930 and 1957 were invited to participate in a further questionnaire survey. The survey included a broad set of items on sleep and sleep disorders, as reported earlier39. Daytime fatigue was ascertained by asking the question “During the past year have you experienced any of the following symptoms: Daytime fatigue?” with the choices “Never”, “Every day or almost every day”, “On 3–5 days per week”, “On 1–2 days per week”, “Less often than once a week”, “About once a month”, and “Rarely”. These were recoded into three categories: “Never” and “Rarely” as “Low”; “On 1–2 days per week” and “Less often than once a week” as “Intermediate”; and “Every day or almost every day” and “On 3–5 days per week” as “High”. A total of 5766 individuals with phenotype and imputed genotype data were available for the study (Supplementary Table 13). Genotyping was done at the Wellcome Trust Sanger Institute (Cambridge, UK), at the Broad Institute of Harvard and MIT (MA, USA), at the Institute for Molecular Medicine Finland, and at Thermo Fisher Scientific (Santa Clara, CA, USA) using Illumina (Human610-Quad, Human670-QuadCustom, HumanCoreExome) and Affymetrix (FinnGen Axiom array) platforms. Genotypes were imputed using the Haplotype Reference Consortium release 1.1 reference panel. Linear mixed model association analysis for EDS was performed with RVTESTS v2.0.9, adjusted for age and sex, with the genetic kinship matrix included as a random effect to control for sample relatedness and population structure.

A GRS of all sleepiness loci was also tested in the four cohorts. Meta-analyses of the sleepiness cohorts (HUNT and Health 2000) and the tiredness cohorts (FINRISK and Finnish Twin) were performed using Fisher’s method.
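Fisher’s method combines k independent P values through the statistic X = −2 Σ ln(p), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below illustrates the calculation using the closed-form chi-square survival function for even degrees of freedom; the function name is hypothetical, and how the P values for each cohort were obtained is not shown.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p) ~ chi-square with 2k degrees of freedom under the null.
    For even degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!.
    """
    if not p_values or any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

Because the method uses only P values, it does not require the cohorts to share a phenotype scale, but it also ignores the direction of effect, so directional consistency needs to be checked separately.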

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.