Main

Adults are advised to sleep at least seven to eight hours each night1,2,3,4, and it is widely perceived that shorter sleep could be a pervasive negative factor for physical, mental and cognitive health5,6,7,8, yielding increased risk of Alzheimer’s disease (AD) and other dementias9,10,11,12,13,14,15. However, we still do not know what amount of sleep is associated with good brain health and whether a causal relationship between variations in habitual sleep duration and brain health exists. Here we address these questions, analysing MRIs of the brain and genetic data in a combined longitudinal and cross-sectional design.

Brain health encompasses multiple features16. Important aspects can be indexed by rate of atrophy, which increases in normal ageing17, in cognitive decline18, in AD19 and with, for example, cardiovascular risk factors20. Lower rates of atrophy are related to healthy lifestyle21 and better maintained cognitive function22. Hence, if insufficient habitual sleep has detrimental effects on the brain, it is likely that short sleep will be associated with higher rates of atrophy. Still, the evidence for a role of sleep in neurodegeneration was not considered sufficiently strong to include sleep among the 12 potentially modifiable risk factors by the Lancet Commission on dementia prevention23, the World Health Organization guidelines on the risk reduction of cognitive decline and dementia24 do not mention sleep, and only a few studies have tested the relationship between sleep duration and brain atrophy using longitudinal MRIs. One study reported durations shorter and longer than seven hours to be associated with more frontotemporal grey matter loss25, while three others found no relationships26,27,28. The paucity of relationships reported may be due to small effect sizes with insufficient statistical power and scarce sampling of very short and very long sleep durations. In a longitudinal study of 28,000 participants, faster cognitive decline was observed in individuals sleeping four hours or less or ten hours or more, compared with a reference group sleeping seven hours, with no relationship between these extreme intervals29. In a cross-sectional study of 21,000 participants from the UK Biobank (UKB), we found that variations within the range of five to nine hours of sleep were not related to smaller hippocampal volume, whereas shorter and longer durations were26.

The question of how much sleep is associated with good brain health can also be addressed using cross-sectional data. There seems to be an inverted U-shaped relationship between sleep duration and brain health, since both long and short sleep are associated with increased risk of cognitive decline30,31 and smaller regional brain volumes26. This pattern falls into a broader line of converging evidence from multiple sources of research. A meta-analysis of 35 studies of sleep duration and mortality found that seven hours was associated with the lowest risk32. Two recent very large studies found seven hours of sleep to be associated with the highest cognitive performance33,34 and lowest dementia risk35. Seven hours is close to the average reported sleep duration in epidemiological studies36, suggesting that average and ‘optimal’ sleep duration converge. Hence, we would expect similar estimates in cross-sectional analyses of brain morphometry. Importantly, such results cannot be used to make inferences about atrophy and brain change, as inter-individual brain volumetric differences even in adults mainly reflect early developmental processes17,37. Accordingly, larger brain volumes are positively and stably related to lifelong higher cognitive function and demographic variables such as education17,38,39. Cross-sectional sleep–volume relationships26,40,41,42,43,44,45,46,47,48,49,50 therefore represent mostly stable factors, not brain changes26 (for an overview of previous studies, see Supplementary Information, ‘Reviewed studies’).

Cross-sectional relationships can be further investigated using genetic information. Twin and genome-wide association studies (GWAS) have demonstrated heritability of and polygenic influences on sleep duration, although GWAS heritability is modest51,52,53,54,55,56,57. Single nucleotide polymorphism heritability (SNP-h2) for sleep duration is typically below 10% (ref. 58). To date, up to 78 independent genetic loci have been associated with sleep duration51, among which the thyroid-specific transcription factor gene (PAX8) and Vaccinia related kinase 2 (VRK2) have been considered as the most robust findings. Besides gene discovery, genetic overlaps between sleep duration and other conditions have been studied51,53,59, suggesting pleiotropy between sleep duration, somatic disorders and neuropsychiatric health. However, no studies have investigated whether genes affect sleep duration uniformly for below-average versus above-average sleepers. Sleep duration tends to be positively related to health in below-average sleepers and negatively related to health in above-average sleepers. If the same is true for brain characteristics, it will be interesting to investigate genetic differences between these participants and to use Mendelian randomization (MR) analyses to examine the possible relationships between sleep duration and brain health as indexed by MRI.

Here we tested the relationship between sleep duration and rates of brain atrophy. Sleep duration was chosen as the sleep metric of focus because it is the most widely used, represents an aspect of sleep that for many people is under voluntary control and constitutes the basis for most recommendations about sleep. A higher rate of atrophy was regarded as a marker of declining brain health17,18,19,20. Longitudinal data from the Lifebrain consortium60 were combined with legacy data, yielding a sample of 8,153 longitudinal MRI brain scans from 3,893 participants (20–89 years), with two to seven examinations covering up to 11.2 years (mean, 2.51; s.d., 1.45; see Table 1). Possible influences of relevant somatic, psychiatric and societal variables were assessed. Additional analyses were conducted using 51,295 MRIs from 47,029 participants to estimate the amount of sleep associated with the overall thickest cortex and largest regional brain volumes. Genetic analyses were undertaken to further investigate the sleep–brain relationships. We took advantage of measured variation in genes for each trait of interest and used MR61 to explore the associations between sleep duration and brain structure.

Table 1 Origins of the total sample

Results

Associations were tested by using generalized additive mixed models (GAMMs) in R62, a nonlinear statistical approach that does not require a priori specification of a polynomial functional form63. Because the relationships between sleep duration and a range of health-related measures typically form an inverted U-shape, this approach allows us to accurately estimate the number of hours of sleep associated with the largest regional brain volumes and thickest cortex64. The code, detailed model statistics, complementary results and exact sample size for each sub-analysis are presented in the Supplementary Information. All statistical tests were two-tailed, and P values were adjusted according to the Benjamini–Hochberg procedure65.
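The Benjamini–Hochberg adjustment mentioned above can be illustrated with a minimal sketch. This is not the authors' code (their analyses were run in R and are in the Supplementary Information); the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical raw P values
    print(benjamini_hochberg(raw))
```

A region would then be called significant when its adjusted P value is below the chosen threshold, which is 0.05 in the analyses reported here.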

Self-reported sleep across ages

Mean self-reported sleep duration per night as a function of age is shown in Fig. 1a, superimposed on the US National Sleep Foundation recommendations66. The average sleep duration was relatively stable around seven hours across the lifespan. While it was significantly related to age (F = 33.1, P < 2 × 10−16, N = 47,034), age explained a very small part of the variance (R2 = 0.006). The average reported sleep durations were at or below the lower recommended limits at most ages. The distributions of sleep durations as functions of different covariates are shown in the Supplementary Information, ‘Subcortical cross-sectional’.

Fig. 1: Cross-sectional relationships.

a, Self-reported sleep duration superimposed on the recommended sleep intervals from the National Sleep Foundation. The blue/grey area depicts the recommended sleep interval (blue indicates ‘recommended’; grey indicates ‘may be appropriate’). The green line shows average self-reported sleep in this study; the blue and red lines show the 75th and 25th percentiles, respectively. The shaded area around each curve shows the 95% CI. b, Clusters of regions showing similar relationships between thickness and sleep duration. The graph shows thickness in each cluster as a function of sleep duration. The maximum thickness is 100%, illustrated by the coloured dots. NA, non-cortical region. c, Subcortical and global volumes as a function of sleep duration. The maximum volume is 100%. The red dots show the average reported sleep duration. Only regions significantly related to sleep duration are shown. The plots are corrected for baseline age, sex, site, follow-up time and ICV (except for the ICV plot). CC, corpus callosum.

Longitudinal sleep–brain atrophy associations

We analysed 19 volumetric brain variables and 32 cortical regions67, summed across the hemispheres. For each measure y, we ran the following model for the ith observation of the jth participant:

$$\begin{array}{l}y_{ij}=f\left(\mathrm{age}_{\mathrm{bl},j}\right)+\beta_{1}\left(\mathrm{age}_{\mathrm{bl},j}\right)\times \mathrm{sleep}_{j}+\beta_{2}\left(\mathrm{age}_{\mathrm{bl},j}\right)\times \mathrm{time}_{ij}\\ \quad +\,\beta_{3}\left(\mathrm{sleep}_{j}\right)\times \mathrm{time}_{ij}+\mathrm{covariates}_{ij}+b_{j}+\varepsilon_{ij}.\end{array}$$

Here f(agebl,j) is a smooth function of age at baseline, agebl,j. Next, β1(agebl,j), β2(agebl,j) and β3(sleepj) are varying-coefficient terms that depend smoothly on their arguments68. All smooth terms were constructed with cubic regression splines and penalized on the basis of their squared second derivatives. The term timeij denotes the time since baseline at the ith time point of the jth participant, and sleepj denotes the sleep duration of the jth participant. The first three smooth terms serve to control for the effect of age on the brain measure, the cross-sectional (between-participant) effect of sleep on the brain measure and how the effect of time depends on age, respectively. The fourth term, β3(sleepj) × timeij, is of primary interest, since it describes how the effect of time depends on sleep duration. Baseline age, self-reported sex, site and follow-up time were used as covariates. Intracranial volume (ICV) was included as a covariate in the volumetric analyses. Finally, bj is a random intercept term for participant j, and εij is a residual, both assumed normally distributed. The model was estimated using maximum marginal likelihood.
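To make the role of the fourth term explicit, the model can be differentiated with respect to time since baseline. The step below is a short derivation from the model as stated, under the simplifying assumption (ours, for illustration) that the covariate terms do not themselves change with time:

$$\frac{\partial y_{ij}}{\partial\,\mathrm{time}_{ij}}=\beta_{2}\left(\mathrm{age}_{\mathrm{bl},j}\right)+\beta_{3}\left(\mathrm{sleep}_{j}\right).$$

Two participants with the same baseline age but sleep durations s and s′ therefore differ in expected change per unit of time by β3(s) − β3(s′). A flat β3 corresponds to no relationship between sleep duration and the rate of change.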

As sleep duration was available for one time point only for most of the participants, we used the average value across time points for the small number of participants for whom more than one observation was available. The cortical analyses focused on thickness, which changes considerably with age69,70,71, but the results for area and volume are reported for completeness. Post hoc analyses were run controlling for socio-economic status (SES: income and education), body mass index (BMI), depression symptoms and a measure of global sleep quality in turn as covariates, as these variables may affect sleep duration, brain structure and possibly the relationship between them72,73,74.

For the 32 cortical regions, no significant relationships between sleep duration and cortical thinning, volume loss or area changes were found. As can be seen in Fig. 2 (right), the corrected P value for thickness change was not smaller than 0.05 in any region, and the same held for volume and area. Sex was included as a regressor in the main analyses. We also ran separate analyses for males and females, still yielding no evidence for significant relationships between sleep and thickness change for any cortical region (Supplementary Information, ‘Cortical longitudinal’).

Fig. 2: Sleep duration, cortical thickness and thickness change.

P values corrected for multiple comparisons by FDR are shown for each cortical region. The left panel shows the cross-sectional results (thickness). The right panel shows the longitudinal results (thickness change). GAMMs were used for testing. The P values are two-sided and adjusted using the Benjamini–Hochberg procedure. The dashed lines show P < 0.05. The results for cortical area and volume are shown in the Supplementary Information, ‘Cortical cross-sectional’ and ‘Cortical longitudinal’. Bankssts, banks of the superior temporal sulcus.

The results for the 19 volumetric structures are shown in Table 2. Longer sleep was linearly related to greater volume loss for the caudate (P = 0.02), while shorter sleep was linearly related to greater volume loss for cerebellum white matter (P = 0.006) and thalamus (P = 0.006) and greater expansion of the ventricles (P = 0.02) (for the details, see the Supplementary Information, ‘Subcortical longitudinal’). When we restricted the analyses to sleep duration at least five and no more than nine hours, the P value for the association between sleep duration and caudate atrophy increased to 0.07, while the other relationships were still significant. However, none of the sleep–atrophy relationships survived controlling for SES, and the relationship with the ventricles also did not survive controlling for BMI or depression symptoms, despite low correlations between sleep duration and the different covariates (for education, r = 0.02; for income, r = −0.02; for BMI, r = −0.04; for height (UKB only, controlling for age and sex), r = 0.032; for depression, r = −0.06; all P < 0.05). Controlling for the global sleep quality score did not weaken the duration–brain change relationships, but the relationships for brainstem and putamen became significant when controlling for the global score.

Table 2 Associations between sleep duration and brain volumetric change

Cross-sectional sleep–brain morphometry associations

We ran the model

$${y}_{{ij}}=f\left({{\mathrm{age}}}_{{ij}},{{\mathrm{sleep}}}_{j}\right)+{{\mathrm{covariates}}}_{{ij}}+{b}_{j}+{\varepsilon }_{{ij}}$$

for each brain variable, where yij denotes the volume or thickness for participant j at time point i; f(ageij, sleepj) is a tensor interaction term constructed with cubic regression splines according to ref. 75; the covariates are sex, site and (for volumetric analyses) ICV; bj are random intercepts; and εij are residuals. Note that although the estimated effects are cross-sectional, all available data were used, and hence random intercepts were included. The full model was compared to two reduced models: a model in which the tensor interaction term was replaced by two additive terms, f1(ageij) and f2(sleepj), and another model in which sleep was completely removed. As these models are nested, comparison in terms of likelihood ratio tests is valid. We hence based model selection on a likelihood ratio test with a 5% significance level. This allowed us to estimate the sleep duration associated with the maximum subcortical volume and cortical thickness and the smallest ventricles. The cortical results were visualized using ggseg76 (the vertex-wise results are shown in the Supplementary Information, ‘Cortical vertex analyses’).
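The nested-model comparison described above can be sketched as follows. This is an illustrative sketch, not the authors' R code: the function names and log-likelihood values are hypothetical, and the difference in degrees of freedom is assumed to be a positive integer, whereas penalized smooths in a GAMM generally have non-integer effective degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-half)             # closed form for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return the test statistic and P value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: model without sleep vs model with sleep.
    stat, p = likelihood_ratio_test(-1005.0, -1000.0, df_diff=2)
    print(f"LR statistic = {stat:.2f}, P = {p:.4f}")
```

With a 5% significance level, the reduced model (for example, the one without sleep terms) would be rejected when the returned P value is below 0.05.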

For 27 of the 32 cortical regions, a significant relationship between sleep duration and thickness was found (Fig. 2, left)—that is, the model without sleep terms was rejected in the likelihood ratio test. When we split the analyses by sex, five regions showed significant sleep–thickness relationships in males only (cuneus, lateral orbitofrontal, lateral occipital, fusiform and entorhinal). A formal sex-interaction analysis of these regions showed a significant effect of sex only for lateral orbitofrontal cortex, where very short and very long sleep were both more associated with thinner cortex in males than females.

The results for the total sample were entered into a K-means cluster analysis to reduce the dimensionality of the cortical data. This yielded three clusters (Fig. 1b and Supplementary Information, ‘Cortical cross-sectional’). One cluster (Cluster 1) covered the posterior medial cortices, superior parietal and caudal anterior cingulate cortex. The second and largest cluster (Cluster 2) included most of the lateral surface and superior frontal cortex. The third cluster (Cluster 3) included the rest of the lateral cortex (that is, insula and pars opercularis) and medial regions such as medial orbitofrontal cortex, rostral anterior cingulate, posterior cingulate and parahippocampal gyrus. All clusters showed inverted U-shaped relationships to sleep duration, but Cluster 1 showed the weakest effect. The vertex-wise analyses confirmed this general finding, showing only positive relationships between sleep duration and thickness in the below-average sleepers, and only negative relationships in the above-average sleepers (Supplementary Information, ‘Cortical vertex analyses’). To exclude the possibility that the use of different scanners influenced the results, we also used a two-stage approach. We first ran the thickness–sleep duration GAMMs separately in each sample and then performed meta-analysis of the different cohort results. This yielded very similar estimates, demonstrating that the use of different scanners did not bias the results (Supplementary Information, ‘Mega vs meta-analytic approach’).

The volumetric results are shown in Fig. 1c, Fig. 3 and Table 3. Most structures showed significant inverted U-shaped relationships to sleep duration. The sleep durations associated with the maximum subcortical volume, the thickest cortex and the smallest ventricles in Table 3 were entered into a meta-analysis. Weights were applied such that cortex and subcortex contributed equally to the meta-analytic fit. We excluded total grey matter volume (TGV) since this variable is a sum of other included variables. Corpus callosum structures were excluded because, with one exception, their estimated sleep duration at maximum volume could not be defined (monotonic sleep–volume relationship). Random-effects meta-analysis was used, to allow regions to have different sleep durations associated with maximum volume or thickness. The estimates and standard errors were computed from 5,000 Monte Carlo samples from the empirical Bayes posterior distribution of the model for each region, constraining the number of hours of sleep to be between four and ten. The detailed results are presented in the Supplementary Information, ‘Meta-analysis’. A sleep duration of 6.5 hours was associated with the maximum subcortical volume, the smallest ventricles and the thickest cortex. The critical values, as defined by the 95% confidence interval (CI), were 5.7 and 7.3 hours. Variability across age was small, while variability across regions was considerable. Controlling for the effects of SES, BMI and depression symptoms as covariates had no notable effects on the results, and no significant interaction effects with these variables were found (Supplementary Information, ‘Subcortical cross-sectional’). The analyses were also run controlling for the global sleep quality score (Fig. 3). Except for the thalamus, where sleep duration at maximum volume was reduced to the lower limit (four hours) when controlling for global sleep quality, most peak estimates were similar for the default model versus the model including global sleep quality as a covariate.
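As an illustration of how region-wise peak estimates can be pooled in a random-effects meta-analysis, the sketch below uses the DerSimonian–Laird estimator of between-region variance. That estimator is our assumption for the example; the text does not state which estimator was used, and the equal weighting of cortex and subcortex described above is not implemented. The input values are hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Pool estimates with DerSimonian-Laird random-effects weights."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-region variance tau2.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)
    return pooled, pooled_se, tau2


if __name__ == "__main__":
    peaks = [6.0, 7.0, 6.4]  # hypothetical peak sleep durations (hours)
    ses = [0.5, 0.5, 0.3]    # hypothetical standard errors
    pooled, se, tau2 = random_effects_meta(peaks, ses)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"pooled = {pooled:.2f} h, 95% CI ({low:.2f}, {high:.2f})")
```

Allowing a non-zero between-region variance is what lets regions have different true peak durations, which matches the motivation given for choosing a random-effects model.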

Fig. 3: Sleep at maximum subcortical volume.

The sleep durations associated with the maximum subcortical volume are indicated by the dots. Only regions significantly related to sleep duration are shown. The error bars indicate 95% CIs (N = 47,029; 51,295 observations). The default model is shown in red, and the model including global sleep quality as a covariate is shown in turquoise.

Table 3 Estimated sleep duration in hours associated with maximum (minimum for ventricles) volume or thickness for the variables used in the meta-analysis

Since ICV showed a relationship with sleep duration, we reran the cross-sectional meta-analysis without controlling for ICV. As expected, this affected the results, yielding 7.0 hours (95% CI, (6.3, 7.7)) as the duration associated with maximum volume and thickness.

GWAS, polygenic scores and MR

To explore the possible associations between brain structure and sleep duration, we performed a series of genetic analyses using cross-sectional data from UKB. The hippocampus, TGV and ICV were chosen as the regions of interest (ROIs) as they showed the typical inverted U-shaped relationship to sleep duration. For details about the selection of participants, quality control procedures and genetic analyses, see the Supplementary Information, ‘Genetic analyses’, ‘Genetics notes’ and ‘Genetics tables’.

The sample was stratified into shorter-than-average (≤7 hours) and longer-than-average (>7 hours) sleepers. Since an inverted U-shaped relationship between sleep duration and health, including brain health, is established, both short and long sleep are associated with poorer health. Importantly, the genetic contributions to sleep duration and brain health may be different in short sleepers compared with long sleepers51, and hence different relationships were expected in these two groups. Two independent samples were used for GWAS: (1) participants sleeping ≤7 hours without MRI (N = 197,137) and (2) participants sleeping >7 hours without MRI (N = 112,839). GWAS was performed independently for each trait in the corresponding sample. We further performed GWAS for hippocampal volume, TGV and ICV using the 29,155 UKB participants who were not included in the sleep duration GWAS. The GWAS results for these brain features were used for the polygenic score (PGS) and MR analyses below. Further details, Manhattan plots and QQ plots showing the GWAS results are presented in the Supplementary Information, ‘Genetic analyses’. We did not observe noticeable inflation in the association statistics (λ = 1.03 and 1.02 for shorter- and longer-than-average sleepers, respectively).
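The inflation factor λ reported above is conventionally computed as the median of the observed association chi-square statistics divided by the median of a chi-square distribution with one degree of freedom. The sketch below follows that standard definition; the function name and z scores are hypothetical, and the authors' exact pipeline is not described in this section.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom,
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Return lambda: median observed chi-square over its null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    z = [0.2, -1.1, 0.7, 0.5, -0.9, 1.6, -0.3]  # hypothetical SNP z scores
    print(f"lambda = {genomic_inflation(z):.3f}")
```

Values close to 1 are consistent with little confounding from population stratification or cryptic relatedness.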

We discovered three genomic loci significantly associated with sleep duration for participants sleeping ≤7 hours and one for those sleeping >7 hours, with minor allele frequency (MAF) >0.001. The three loci for short sleep included a region on chromosome 3 (hg19, chr3:52978418–53171555), a region on chromosome 11 (chr11:116631186–117072176) and a region on chromosome 15 (chr15:54586505–54622690). Genes mapped to these regions include APOA1/4/5, APOC3, ZNF256, BUD13, UNC13C, SIDT2, TAGLN, SIK3, PCSK7, RFT1, SFMBT1 and PAFAH1B2. The only locus for longer-than-average sleepers was mapped to chromosome 3 (chr3:70671137–70843060) and included two pseudo-genes, COX6CP6 and RNU6-281P, neither of which has a known function.

SNP-h2 was estimated by linkage disequilibrium (LD) score regression77 for sleep duration in the shorter-than-average sleepers (h2 = 0.045, s.e. = 0.0035) and longer-than-average sleepers (h2 = 0.021, s.e. = 0.0047), hippocampal volume (h2 = 0.29, s.e. = 0.03), TGV (h2 = 0.22, s.e. = 0.03) and ICV (h2 = 0.35, s.e. = 0.03). The genetic correlation for sleep duration was negative for the shorter- versus longer-than-average sleepers (rg = −0.40, s.e. = 0.10, P = 9.65 × 10−5), indicating that genetic variants related to longer sleep in the below-average-sleep-duration group tend to be related to shorter sleep in the above-average-sleep group.

Corresponding PGSs were calculated for each variable in each sleep duration group separately. The PGSs for ICV (PGS-ICV: t = 8.47, false discovery rate (FDR)-corrected P (PFDR) = 2.4 × 10−15) and TGV (PGS-TGV: t = 4.65, PFDR = 3.28 × 10−5) were significantly associated with sleep duration in the shorter-than-average sleepers (Fig. 4). The PGS for sleep duration in the shorter-than-average sleepers was significantly related to ICV (t = 6.99, PFDR = 3.03 × 10−11, Fig. 4b) and, to a lesser extent, to TGV (t = 2.69, PFDR = 6.42 × 10−2). No significant associations were identified for other pairs of traits.
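In its simplest form, a polygenic score is a weighted sum of a participant's effect-allele dosages, with GWAS effect sizes as weights, and the scores are then standardized across the sample. The sketch below shows only that core arithmetic; SNP selection, clumping and thresholding are not shown, and all names and numbers are hypothetical rather than taken from the authors' pipeline.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one participant."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Express scores in s.d. units relative to a sample mean of 0."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


if __name__ == "__main__":
    gwas_betas = [0.10, -0.20, 0.05]  # hypothetical per-SNP effect sizes
    genotypes = [                     # hypothetical dosages per participant
        [0, 1, 2],
        [2, 0, 1],
        [1, 1, 1],
    ]
    raw = [polygenic_score(g, gwas_betas) for g in genotypes]
    print(standardize(raw))
```

Standardized scores of this kind are what allow groups to be compared in s.d. units, as in the comparison of participants one standard deviation above and below the average PGS.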

Fig. 4: Genetic relations between sleep duration and brain structure.

a, Distributions of PGSs for ICV in different sleep duration strata among shorter-than-average sleepers (3–4 h, N = 541; 4–5 h, N = 2,937; 5–6 h, N = 13,863; 6–7 h, N = 177,493), expressed in s.d. compared with the sample mean of 0. The horizontal lines in each violin represent the median group value and the interquartile range, the height represents the 95% CI and the width represents the probability density. b, ICV for shorter-than-average sleepers with one standard deviation above (blue) and below (red) the average PGS for sleep duration. The grey shaded areas represent the 95% CIs. c, SNP effects on ICV (x axis) and sleep duration (y axis) for the shorter-than-average sleepers. d, TGV for shorter-than-average sleepers with one standard deviation above (blue) and below (red) the average PGS for sleep duration. The grey shaded areas represent the 95% CIs.

We performed bidirectional MR analyses between each brain volumetric trait and sleep duration (see also Supplementary Information, ‘STROBE-MR-checklist’, reporting according to best practice for MR studies). Among the 12 pairs, ICV showed an effect (34 instrumental SNPs; minimal F statistic > 24; inverse-variance weighted β = 0.060; s.e. = 0.017; P = 5.36 × 10−4) on sleep duration for the shorter-than-average sleepers (Fig. 4c and Supplementary Information, ‘Instrumental variables’), with no evidence of effects of sleep on ICV. TGV showed a trend-level effect for the shorter-than-average sleepers (P = 0.12). The low heritability for sleep in the study resulted in a weaker genetic instrument. While we were powered (>80%) to detect a true causal effect for hippocampal volume and ICV of 0.3 or larger, the low heritability for sleep in the study required a much larger sample size, on the basis of the Freeman model for power calculations in MR studies78. We therefore performed a robust MR analysis using the robust adjusted profile score79 for the direction from sleep to brain traits, but we did not detect significant relationships in this direction even with this more liberal threshold. The only significant relation (ICV to sleep duration) held when we performed the analysis using both a stringent and a weaker instrument selection protocol. This means that we did not detect evidence for strong effects of sleep duration on brain morphometry. See the Supplementary Information, ‘Genetic analyses’ and ‘Genetic notes’, for the full statistical results.
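The inverse-variance weighted (IVW) estimate and the instrument F statistic referred to above can be sketched from per-SNP summary statistics. This is a minimal fixed-effect version with first-order weights, given for illustration only. The function names and numbers are hypothetical, and the authors' software and exact settings are described in their Supplementary Information, not here.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-to-outcome effect."""
    # Each SNP gives a Wald ratio (outcome effect / exposure effect),
    # weighted by the inverse of its approximate variance, bx^2 / se_y^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return estimate, std_error, p_value


def instrument_f_statistic(beta_exposure, se_exposure):
    """Approximate strength of a single-SNP instrument, F = (beta / s.e.)^2."""
    return (beta_exposure / se_exposure) ** 2


if __name__ == "__main__":
    bx = [0.10, 0.20, 0.15]      # hypothetical SNP effects on the exposure
    by = [0.007, 0.011, 0.010]   # hypothetical SNP effects on the outcome
    se_y = [0.01, 0.01, 0.01]    # hypothetical s.e. of the outcome effects
    est, se, p = ivw_mr(bx, by, se_y)
    print(f"IVW beta = {est:.3f}, s.e. = {se:.3f}, P = {p:.3g}")
```

In a bidirectional analysis the same calculation is run twice, once with the brain trait as exposure and sleep duration as outcome and once with the roles reversed, each time using instruments selected for the exposure.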

Discussion

The current results give no indication that shorter or longer habitual sleep duration is associated with higher rates of brain atrophy measured longitudinally. Across a range of cortical and subcortical regions and metrics, no statistically significant relationships were observed when controlling for BMI, SES or depression symptoms. The absence of significant relationships was observed both when using the full range of self-reported sleep durations and when restricting the sample to those sleeping between 5 and 9 hours. Cross-sectionally, 6.5 hours of sleep was associated with the maximum relative regional brain volume and cortical thickness and the smallest ventricular volumes when controlling for ICV, with the critical lower limit being 5.7 and the higher limit 7.3 hours. This was true also when controlling for a measure of global sleep quality. A duration of 6.5 hours is below the lower limit of the current international recommendations, and 7.3 is lower than the upper limit suggested by the US National Sleep Foundation1,2,3,4. ICV was positively related to sleep duration, so not controlling for ICV yielded a cross-sectional association with maximum volume and thickness of 7.4 hours. Aligning with the longitudinal results, the MR analyses did not reveal evidence for an impact of short sleep on brain structure. Taken together, the longitudinal, cross-sectional and genetic results suggest that short habitual sleep duration is weakly related to poorer brain health in healthy adults as indexed by structural brain measures, and that somewhat less than 7 hours of sleep is associated with the most favourable features, in line with converging evidence from research on mortality, health and cognition.

Sleep duration and the brain

Sleep duration is the most widely studied, best supported and most straightforward sleep measure to address in relation to health7. It is also an aspect of sleep that may partly be modified by lifestyle. Our longitudinal results did not yield evidence for any relationship between sleep duration and brain atrophy. We therefore used the full sample of cross-sectional data to calculate the amount of sleep associated with maximum relative regional brain volume and cortical thickness. It is noteworthy that the resulting 6.5 hours is relatively well aligned with the average reported sleep duration of 7 hours and similar to the results of a recent meta-analysis of more than one million participants36. Furthermore, this corresponds with the conclusion of a meta-analysis of 35 studies of sleep duration and mortality, where 7 hours of sleep was associated with the lowest risk of premature death among adults32. A study of more than 700,000 participants found 7 hours of sleep to be associated with the highest performance on a spatial navigation task34, and a prospective study of more than 400,000 UKB participants found the lowest dementia risk in those reporting 7 hours of sleep35. Converging evidence thus suggests that somatic and brain health are associated with about 7 hours of sleep.

Results across studies suggest that substantially shorter sleep than the recommended duration does not need to be associated with worse outcomes. However, longer sleep may be. Previous research has established associations between long sleep and poorer brain48, cognitive25,26,29,49,80 and somatic health81. For instance, less than 5 hours and 8 hours of sleep were associated with similar increases in risk of premature death32. This mirrors the present results: 4 and 8 hours of sleep were associated with the same deviations from maximum cortical thickness. The American Academy of Sleep Medicine and the Sleep Research Society proposed no upper limit7 on sleep duration, whereas the US National Sleep Foundation recommended a maximum of 9 hours through most of adulthood and 8 hours in older adults66. From the present analyses, the critical values for short and long sleep were 5.7 and 7.3 hours, demonstrating that sleep durations longer than average but well within the recommended range may still be associated with less favourable volumetric brain outcomes. Associations between longer sleep duration and worse outcomes are often ascribed to underlying comorbidities7,66,81. We addressed this by controlling for somatic (BMI), mental (symptoms of depression) and social (SES) factors. Importantly, the longitudinal results showed that neither long nor short sleep was associated with higher rates of brain atrophy. We therefore believe that the combined longitudinal and cross-sectional results make a strong case that short habitual sleep is not a prevalent cause of poorer brain health as indicated by structural brain measures and rates of atrophy in the samples studied here.

In this regard, the association between ICV and sleep duration is interesting. ICV was the MRI-derived measure most positively associated with sleep duration, and the MR analysis suggested an effect of ICV on sleep duration in the shorter-than-average sleepers but not the inverse. As sleep has no causal effect on ICV in adults, this relationship must reflect other factors and demonstrates that associations between sleep duration and MRI-derived volumes may reflect non-causal and stable relationships that do not emerge as a function of variations in sleep duration. The partly common genetic underpinning of ICV and sleep duration suggests that there may be a mechanistic association, but this is not caused by sleep. Controlling for ICV removes the effect of global scaling—that is, that regional brain volumes scale with head size. Since ICV is sometimes regarded as a proxy for maximal brain size, controlling for ICV yields regional volumes representing deviations from the expected based on head size. Controlling for ICV also controls to some extent for body size, as head size and body size are normally related, although height and BMI were weakly related to sleep duration in the present data. Not controlling for ICV naturally led to a higher sleep duration estimate of 7.4 hours associated with maximal brain volumes and thickness, as ICV and sleep duration were positively related.

Underlying neurobiology cannot be directly inferred from MRIs, but the number of neurons has been shown to correlate with regional82 and global83 brain volumes cross-sectionally. Longitudinally, volumetric reductions and cortical thinning occurring during adulthood may be associated with the shrinkage of neurons, dendrites and axonal arborizations84,85, reduced spine numbers and density86, loss of synapses and dendritic branches87, and, in degenerative conditions such as AD, also neuronal loss84, although neuronal85,88,89 or glial90 loss probably plays a limited role in the volumetric reductions seen in normal ageing. Sleep duration–brain correlations were seen in the cross-sectional analyses only, so it is unlikely that these are caused by neurobiological events underlying morphometric changes observable with MRI during adulthood. Events ongoing during earlier life stages, in development, may thus be more relevant. Processes such as synaptogenesis and synapse elimination/pruning91, dendritic and axonal growth92,93, and intracortical myelinization94 can be involved in morphometric changes in childhood development. Some of these, however, such as synaptic density, will have minute effects on volumetric measures because their total volume is very small93,95. In any case, it must be stressed that the volumetric analyses in the present study are corrected for ICV, which means that relative and not absolute volumes are used in the calculations. It is thus not clear which or any of the processes above can contribute to explaining the observed cross-sectional relationship. Research into the neurobiology of sleep has focused more on electrophysiological processes and neurotransmitter systems than on macrostructural differences. 
In addition, the association between neurodegeneration and sleep problems may be due to any disturbance of normal brain function and structure probably affecting how we sleep, and the neurobiological foundation will then vary depending on the underlying condition. Hence, a neurobiological interpretation of the present findings will be speculative and must be based on general knowledge about the relationship between brain features and different human traits. For example, we have previously shown that sleep disturbances are associated with spatial expression patterns of oligodendrocytes and S1 pyramidal cell genes71, in line with theories of relationships between myelination and sleep96. To our knowledge, such analyses have not been reported for sleep duration specifically.

Genetic associations

Sleep duration is a complex trait modulated by more than the core circadian genes51. Previous studies have reported GWAS heritability to be modest51,52,53,54,55,56,57, which limits the power of the MR approach. Still, an MR study from UKB found that both short and long sleep were related to poorer visual memory and longer reaction time97. In contrast, another study did not find causal relationships between sleep patterns and AD98. In the current study, the genetic association analyses yielded some interesting results. First, we found that the genetic correlation for below- versus above-average sleepers was negative. This means that the genes related to longer sleep in the shorter-than-average sleepers were related to shorter sleep in the longer-than-average sleepers. This could mean that there is a genetically influenced drive towards the average reported sleep duration, for both the above- and below-average sleepers. This is interesting considering the present findings of larger regional brain volumes and cortical thickness in participants sleeping 6.5 hours, as well as the above reviewed evidence that ~7 hours of sleep is associated with better health and cognitive performance34. Second, the GWAS results suggested that genes involved in metabolism (for example, APOA1/4/5, APOC3 and RFT1) may contribute to inter-individual differences in sleep duration in the shorter-than-average sleepers. Genetic and cellular links between sleep and metabolism are a focus of current research99,100, although a further investigation into the details of these relationships is beyond the scope of this paper. Third, we found that PGSs for ICV and TGV were significantly positively associated with sleep duration in the short sleepers, which means that genes related to larger brain volumes are also related to longer sleep in the short sleepers. This finding aligns relatively well with our estimate of 6.5 hours being associated with the largest relative brain volumes. 
This suggests partly shared genetic variation between regional brain volumes and sleep duration. Finally, the MR analyses showed an effect of ICV on sleep duration, an effect that was robust after accounting for confounding factors (BMI, smoking and drinking habits, and neuropsychiatric disorders; Supplementary Information). However, there was no evidence of causal effects of sleep duration on any MRI-derived brain measure. Hence, in the current samples, people with larger heads on average report that they sleep longer, and this relationship partly depends on genetics. The lack of evidence for an inverse influence of sleep duration on ICV was expected, as ICV does not change in adults and hence cannot be affected by sleep. Still, the genetic results suggest that there may be a mechanistic relationship between ICV and sleep duration that could warrant further exploration. This effect was removed from the estimated sleep duration–brain volume relationships by covarying for ICV, which may contribute to explaining why the nominally significant relationship between TGV and sleep duration in the MR analysis did not survive corrections. In sum, the genetic results were consistent with a view of average and ‘optimal’ sleep duration as relatively well aligned and did not provide evidence for a causal relationship between sleep duration and brain structural features.

Variation across persons, ages and regions

The meta-analytic estimate of 6.5 hours is a best approximation, not a magic number. First, there was substantial regional heterogeneity in the cross-sectional results. For instance, the hippocampus showed peak volume at 6.3 hours, the white matter compartments even lower, while people with the thickest cortex reported sleeping between 6.4 and 7.0 hours. This does not mean that less sleep is necessarily more optimal for hippocampal volume than for cortical thickness, and these differences should be interpreted while keeping in mind the lack of evidence for sleep–atrophy relationships in the longitudinal analyses. These numbers therefore probably represent stable relationships rather than reflecting the effects of sleep duration per se.

Second, an important qualification is that individual differences in sleep need exist, due to, for instance, genetic differences and previous sleep history51,52,53,54,55,56,57,101. If deviations from an individual’s sleep need led to poorer brain health for that individual, this may not be picked up in our group analyses. We thus cannot from our results conclude that people should try to sleep 6.5 hours each night. Rather, people who report sleeping 6.5 hours tend to have the thickest cortex and largest regional brain volumes relative to ICV. Intra-individual causal effects of changing sleep duration on the brain can be assessed in sleep deprivation studies. Unfortunately, experimental sleep deprivation does not resemble habitual variations in sleep duration, and the long-term consequences on the brain from sleep deprivation, taking adaptations into account101, are not known.

Finally, associations with sleep could vary with age71, but the sleep–age interactions did not confirm that this was the case. The vertex-wise cortical thickness analyses (Supplementary Information) suggested that relationships with sleep duration were different in younger and older adults, but this was not confirmed in the GAMM analyses. We therefore believe that the meta-analytic results represent a good approximation of a general sleep–brain volume relationship on a group level, while ignoring that there naturally are variations across brain regions, ages and participants.

Caveats and limitations

The first limitation of our study is that self-reported sleep duration is not accurate and may reflect aspects of sleep other than duration. There is no perfect way to measure sleep duration without disrupting routine102. Self-reports are only moderately correlated with actigraph measures102,103. However, although actigraph results often correlate highly with polysomnography104, they tend to overestimate sleep duration104,105,106,107,108, and it is not known how well actigraphs perform outside a sleep lab setting. One study reported that the same genetic loci were related to sleep duration whether it was measured by actigraphs or self-reports51. The international recommendations for sleep duration were mostly based on studies involving self-report7,66, and self-reported sleep is the most relevant variable for clinical, public health and policy recommendations7. While acknowledging the limitations of self-reports, we also believe them to be the most relevant measure in the present context. Second, we studied morphometric brain measures only. Although other measures could show different sensitivity to sleep duration, such as white matter microstructure109 or Aβ accumulation110, brain morphometry is sensitive to normal and pathological brain changes19, and atrophy has consistently been identified as a factor governing age-related sleep changes111. Third, we have not considered cognitive function, for which different ranges of sleep duration may be associated with the highest scores. It is still likely that associations between sleep duration and cognitive performance or mental health are transient and reversible after restorative sleep, whereas associations with brain structure may be more permanent. Fourth, the samples were not thoroughly screened for sleep disorders such as sleep apnea112. 
If individuals with sleep problems were included, this would probably not attenuate the relationships and is therefore unlikely to explain the weak sleep–brain associations observed in the study. Fifth, to study differential genetic influences on sleep duration in participants with different habitual sleep patterns, we stratified the sample by seven hours, a strategy that made our GWAS underpowered. While the findings are promising, large-scale independent validation is needed. Furthermore, given the limited power of the GWAS, the relations suggested by the MR analysis will also need future replication. Sixth, we had no quantitative measure of head motion, so to the extent that head motion is correlated with sleep duration, this could be a confounder. Seventh, a number of covariates that could influence the sleep–brain relationships were not controlled for, including cardiovascular risk factors other than BMI. Finally, although some of the samples are population based, no MRI study is fully representative of the population from which it is sampled. Despite including studies from multiple European countries and the United States, we cannot exclude the possibility that different sleep–brain patterns might exist in other populations.

Conclusion

We did not find evidence suggesting that sleep duration was related to the rate of atrophy or that sleep shorter than the recommended duration1,66 was associated with smaller regional brain volumes, thinner cortex or smaller ventricles. Rather, sleeping less than the recommended amount was associated with thicker cortex and greater regional brain volumes relative to ICV, and moderately long sleep showed a stronger association with smaller volumes than even very short sleep (for example, less than five hours). As the average sleep duration was almost perfectly aligned with the duration associated with the largest volumes, this may suggest that normal brains promote adequate sleeping patterns, which are shorter than the current recommendations.

Methods

Transparency

The current work contains many analyses and analytic choices, which may affect the results. These include, for instance, which covariates are included in the different analyses, the exclusion of outliers and restriction of data ranges (for example, for sleep duration), model specifications and model selection. This information is too extensive to fit in the main text. To optimize transparency, we have included these details in the Supplementary Information (an overview is provided in Supplementary Table 3).

Sample

Community-dwelling participants were recruited from multiple countries in Europe and the United States. Some were convenience samples, whereas others were contacted on the basis of population registries. All participants at the age of majority gave written informed consent. All procedures were approved by a relevant ethics review board. For Lifebrain, approval was given by the Regional Ethical Committee for South Norway, and all sub-studies were approved by the relevant national review boards. For UKB, ethics approval was obtained from the National Health Service National Research Ethics Service (ref. no. 11/NW/0382).

In total, data from 47,039 participants (20.0–89.4 years) with information about sleep duration and MRI of the brain were included. For 3,910 participants, two or more MRI examinations were available, yielding a total of 51,320 MRIs (mean follow-up interval, 2.5 years; range, 0.005–11.2 years; 26,811 female and 24,509 male observations). The demographics of the samples are given in Table 1, and a brief description of each is given below (for the details, see the Supplementary Information, ‘Sample characteristics’).

Sleep measures

For the Human Connectome Project and the Lifebrain samples except Betula, sleep duration and other characteristics were measured by the Pittsburgh Sleep Quality Index (PSQI)113. For Betula, sleep characteristics were measured by the Karolinska Sleep Questionnaire114,115, which can be used to extract the same information covered by the PSQI116. For UKB, sleep was measured through multiple questions. For all samples except UKB, we calculated the PSQI global score following normal procedures but excluded the sleep duration component. For UKB, we calculated a sum score of different sleep-related measures (sleeplessness (field 1200), problems getting up in the morning (field 1170), daytime dozing (field 1220), snoring (field 1210) and chronotype (field 1180)). This global sleep quality score was used as a covariate in follow-up analyses of brain–sleep duration relationships.

MRI

The Lifebrain MRI data originated from seven different scanners (for the details, see ref. 26 and the Supplementary Information, ‘MRI methods’), processed with FreeSurfer version 6.0 (https://surfer.nmr.mgh.harvard.edu/)117,118,119,120. Because FreeSurfer is almost fully automated, to avoid introducing possible site-specific biases, we imposed gross quality control measures and did no manual editing. To assess the influence of the scanner on volumetric estimates, seven participants were scanned on seven scanners across the consortium sites (see ref. 26 for the details). Using the hippocampus as the test region, there was a significant main effect of the scanner on volume (F = 4.13, P = 0.046), but the between-participant rank order was close to perfectly retained between scanners, with a mean between-scanner Pearson correlation of r = 0.98 (range, 0.94–1.00). Analyses of five additional volumetric cortical and subcortical ROIs (medial temporal lobe (entorhinal and parahippocampal cortex), precuneus, superior temporal, caudate nucleus and caudal middle frontal) showed correlations close to 1.0 for all regions except medial temporal lobe, where correlations were somewhat lower but still more than r = 0.75 (ref. 121). Thus, including site as a random effect covariate in the analyses is probably sufficient to remove the influence of scanner differences.

UKB participants were scanned using three identical Siemens 3T Prisma scanners (UKB Brain Imaging—Acquisition Protocol (https://www.fmrib.ox.ac.uk/ukbiobank/protocol/)). FreeSurfer outputs122 and the volumetric scaling from T1 head image to standard space as a proxy for ICV were used in the analyses, generated using publicly available tools, primarily based on FSL (FMRIB Software library, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki). The details of the imaging protocol (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=2367) and structural image processing are provided on the UKB website (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=1977).

Statistical analyses

ROI analyses were run in R version 4.0.0 (ref. 123), using GAMMs fitted with the packages gamm4 version 0.2-26 (ref. 124) and mgcv version 1.8-28 (ref. 62). GAMMs offer an attractive alternative to linear mixed models in that a priori specifications of polynomial functional forms are not necessary, and GAMMs are able to accurately fit trajectories of different forms and complexities63. The Desikan–Killiany parcellation included in FreeSurfer yields 34 regions, but the temporal and the frontal poles were excluded from analysis due to substantial noise in these regions. This atlas was selected because it is well validated and commonly used for cortical ROI-based analyses, which is a benefit when comparing results across studies. Volumetric outliers were defined as observations with a residual more than four times the magnitude of the residual standard error in an analysis of age effects, and these were removed from the analyses. The FDR was used to adjust the P values for multiple comparisons, because family-wise error correction methods such as Bonferroni are very strict and would lead to serious loss of power. With the FDR controlled at 0.05, the expected proportion of false discoveries among the significant results is at most 0.05, which we consider acceptable. Hence, we used the Benjamini–Hochberg procedure to adjust the P values. The set of all subcortical ROIs constituted one family of tests, and the set of all cortical ROIs (for the measure of thickness) constituted another family of tests. The computer code can be found in the Supplementary Information in the relevant sections.
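
The Benjamini–Hochberg adjustment described above can be illustrated with a short sketch. This is not the R code used in the study (that code is in the Supplementary Information). It is a minimal Python illustration under stated assumptions: the function name and all P values are hypothetical, and each family of tests is adjusted separately, as described in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for two separate families of tests.
subcortical_p = [0.01, 0.04, 0.03, 0.005]
cortical_p = [0.20, 0.001, 0.049]

subcortical_adj = benjamini_hochberg(subcortical_p)
cortical_adj = benjamini_hochberg(cortical_p)

# A test is declared significant when its adjusted P value is at most 0.05.
significant = [p <= 0.05 for p in subcortical_adj]
```

Adjusting each family on its own means that the number of tests m differs between the subcortical and cortical analyses, so the same raw P value can receive a different adjusted value in each family.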

Genetic analyses

GWAS

Five GWAS analyses were performed: sleep duration for participants sleeping ≤7 hours (N = 197,137) and >7 hours (N = 112,839), separately, using a sample with no MRI data available; total hippocampal volume (N = 29,155); TGV (N = 29,155); and estimated ICV (N = 29,155). For each GWAS, sex, baseline age and the top ten genetic principal components were included as covariates in a linear regression model for identifying SNPs associated with each trait. In addition, each of the five traits was first normalized to have unit variance and zero mean. For total hippocampal volume and TGV, ICV was additionally included as a covariate. PLINK version 2.0 (ref. 125) was used for these analyses with the function glm. The FUMA server126 was used to annotate the GWAS results to genomic regions and nearby genes with the default parameters. Additional details, Manhattan plots and QQ plots showing the GWAS results are presented in the Supplementary Information, ‘Genetic analyses’.
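
The trait normalization and per-SNP regression can be illustrated in simplified form. The sketch below is an assumption-laden Python illustration, not the PLINK analysis itself: it standardizes a trait to zero mean and unit variance and regresses it on the allele count of a single SNP, omitting the covariates (sex, age, genetic principal components and, where relevant, ICV) that the actual models included. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance."""
    mu = mean(values)
    # Population SD is used here; the sample SD would be an equally
    # reasonable choice and is an assumption of this sketch.
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def simple_regression_slope(x, y):
    """Ordinary least squares slope of y on x, without covariates."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical effect-allele counts and raw trait values for six participants.
genotype = [0, 1, 2, 1, 0, 2]
trait = [6.0, 6.5, 7.0, 6.5, 6.0, 7.0]

trait_z = standardize(trait)
beta = simple_regression_slope(genotype, trait_z)
```

Because the trait is standardized first, the slope is expressed in standard deviations of the trait per copy of the effect allele, which makes effect sizes comparable across the five traits.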

GWAS were run instead of using summary statistics from previous genetic studies of sleep in UKB (for example, ref. 51) for three reasons: (1) we needed to ensure that we were using completely non-overlapping samples for the sleep and the sleep-MRI analyses; (2) we were interested in contrasting participants with below- versus above-average sleep duration and studying the variation within each group, which has not previously been done; and (3) an important aim is to assess whether there are plausible causal relations between sleep duration and brain structure, using PGS and MR methods. The widely used models for these methods assume monotonic relationships, where effects do not change direction across the range of phenotypic values. This does not fit the inverse U-shaped relationship between sleep duration and brain features. Thus, we could not use previously published summary statistics, particularly for the MR analysis.

SNP-h² and genetic correlation

The LD score regression model (ldsc)77 was used to estimate SNP-h² for each trait and genetic correlation between sleep duration and hippocampal volume, TGV and ICV. LD structure provided by ldsc from the HapMap 3 data was used in this analysis. The other parameters of ldsc were set to its default values.

PGS

To accurately estimate the PGSs for a trait, we first computed the posterior effect size per SNP using the Bayesian mixture model implemented in PRS-CS4 (ref. 127). The polygenic risk score via continuous shrinkage priors (PRS-CS) model is a widely used method for computing PGSs for highly polygenic traits127. PRS-CS shrinks effect sizes estimated from GWAS using LD correlations in a Bayesian framework, assuming a two-component mixture prior distribution. The LD correlations provided by PRS-CS were based on the 1000 Genomes phase 3 European population. In total, about 1.3 million high-quality SNPs were used. In addition, PRS-CS does not need information from the target sample where the estimated posterior effect will be used for computing PGSs. The GWAS sample and the target sample were thus treated fully independently in the PGS computation. Furthermore, in light of previously published GWAS results as well as ours, we assumed a highly polygenic genetic architecture for both MRI-derived traits and sleep duration, by setting the parameter φ to 0.01, instead of a grid-search strategy proposed by the model. We believe that our choice, though conservative, further reduces the overfitting risk. For the other parameters in PRS-CS, we used the default values.

The posterior effect sizes obtained by running PRS-CS on each GWAS were then used separately to compute PGSs. We did not use P values or LD thresholds to select SNPs. Rather, genome-wide SNPs were used for computing the PGSs. After removing rare variants (MAF < 0.01) in UKB, variants not in the HapMap 3 data and variants that are not in Hardy–Weinberg equilibrium (P < 10⁻⁶), we used the remaining 615,297 SNPs for computing the PGSs for each trait. Recent methodology studies all point to the advantage of using shrinkage-based methods (for example, LDpred2 (ref. 128), PRS-CS and the lasso-based models129) over P-value-based thresholding methods, particularly for highly polygenic traits. The computed posterior effects were used as weights in the computation of PGSs for a trait by using the score function from PLINK version 2.0. To examine the associations between PGSs for a trait with a second trait, linear regression models were used. The same covariates included in the GWAS analysis were included as covariates in addition to PGSs in these models.
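
The variant filtering and score computation described above amount to a quality-control step followed by a weighted sum of allele counts. The following Python sketch illustrates that logic under stated assumptions: the record layout, SNP identifiers, function names and all numbers are hypothetical, and PLINK's own output conventions (for example, how scores are scaled) may differ.

```python
def passes_qc(snp, maf_min=0.01, hwe_p_min=1e-6):
    """Apply the three variant filters described in the text."""
    return (
        snp["maf"] >= maf_min          # drop rare variants
        and snp["hwe_p"] >= hwe_p_min  # drop Hardy-Weinberg violations
        and snp["in_hapmap3"]          # keep HapMap 3 variants only
    )


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over SNPs that have a weight."""
    return sum(
        weights[rsid] * dosage
        for rsid, dosage in dosages.items()
        if rsid in weights
    )


# Hypothetical variant annotations with posterior effect sizes ("beta").
snps = [
    {"rsid": "rsA", "maf": 0.20, "hwe_p": 0.50, "in_hapmap3": True, "beta": 0.5},
    {"rsid": "rsB", "maf": 0.005, "hwe_p": 0.30, "in_hapmap3": True, "beta": 1.0},
    {"rsid": "rsC", "maf": 0.35, "hwe_p": 0.80, "in_hapmap3": True, "beta": -0.25},
    {"rsid": "rsD", "maf": 0.10, "hwe_p": 1e-9, "in_hapmap3": True, "beta": 0.7},
]
weights = {s["rsid"]: s["beta"] for s in snps if passes_qc(s)}

# Effect-allele counts (0, 1 or 2) for one hypothetical participant.
participant = {"rsA": 2, "rsB": 1, "rsC": 1, "rsD": 0}
score = polygenic_score(participant, weights)
```

No P-value threshold appears anywhere in the sketch: every variant that passes quality control contributes to the score, with its influence governed only by the shrunken effect size.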

The PRS-CS127 software, which implements Bayesian mixture models to incorporate LD structures in estimating allele effect sizes, was used for PGS analysis. High-quality SNPs provided by PRS-CS derived from the HapMap 3 dataset were used for constructing LD structures. The polygenicity parameter (φ) was set to 0.01, assuming a highly polygenic trait. Estimated effect sizes were used to compute PGSs using the score function from PLINK version 2.0 without further selection through association P values or LD r² values.

Two-sample MR

The TwoSampleMR R package130 was used to investigate the relations between sleep duration and the brain variables. Independent instrumental SNPs were selected using the following parameters: association P ≤ 10⁻⁶, MAF ≥ 0.05, LD r² ≤ 0.1 and LD distance = 10 kb. The LD structure was derived from 10,000 independent European participants randomly selected from UKB. The powerful inverse variance weighted model from TwoSampleMR was used as the main model. Other models implemented in the software were also run as sensitivity analyses. To further support the results, the analysis was repeated with P ≤ 10⁻⁵, which would strengthen the instrumentation for the less well-powered sleep duration traits. For the only significant relation (ICV to sleep duration for shorter-than-average sleepers), a third analysis was performed using P ≤ 5 × 10⁻⁸ to select instrumental SNPs. The standard output from TwoSampleMR is shown in the Supplementary Information, ‘Genetics notes’.
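
The logic of the inverse variance weighted model can be shown with a short sketch. This Python illustration implements the textbook fixed-effect estimator, in which per-SNP effects on the outcome are regressed on per-SNP effects on the exposure with weights equal to the inverse variance of the outcome effects. It is an assumption that this simplified form conveys the idea; the TwoSampleMR implementation may differ in details such as its handling of heterogeneity. All names and numbers are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted causal estimate and its SE."""
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2  # inverse variance of the outcome effect
        numerator += weight * bx * by
        denominator += weight * bx ** 2
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical per-SNP summary statistics for three instrumental SNPs.
beta_exposure = [0.10, 0.20, 0.30]  # SNP effects on the exposure (e.g. ICV)
beta_outcome = [0.05, 0.10, 0.15]   # SNP effects on the outcome (e.g. sleep)
se_outcome = [0.01, 0.01, 0.01]     # standard errors of the outcome effects

estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
```

In this toy example every SNP implies the same ratio of outcome effect to exposure effect, so the pooled estimate equals that ratio. With real data the ratios vary, and SNPs with more precisely estimated outcome effects receive more weight.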

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.