Introduction

Our society is aging at unprecedented rates, with the average human lifespan increasing globally. Aging-associated cognitive decline impairs daily functioning1, increases the frequency of hospitalization and emergency visits2, and is associated with heightened risk of multimorbidity3,4. Patients with cognitive impairment and dementia also face barriers to diagnosis5,6, leading to delayed detection that further hinders appropriate care, treatment, and patient functioning. Developing easily repeatable biomarkers of “brain health” (i.e., the combined preservation of optimal brain integrity and cognitive function) could facilitate early detection of current or future cognitive impairment risk, which may in turn guide potential therapeutic opportunities for patients.

Sleep-physiology based metrics are attractive as indicators of brain health because changes in sleep architecture are not only strongly associated with aging, but also with cognitive decline and a wide range of neuropathologic changes, suggesting that sleep may provide a general-purpose window into brain health. For example, people advancing into the fifth decade of age present with increased sleep-onset latency and sleep fragmentation and decreased total sleep time and sleep efficiency7,8. They also exhibit reductions in slow-wave sleep (SWS) percent, rapid eye movement (REM) percent, and density, duration and amplitude of sleep spindles, and increases in non-REM stage 1 (N1) percent and nighttime wakefulness8,9,10. These age-related sleep changes have been linked to subsequent cognitive decline11 and increased risk of dementia12,13.

One recently proposed sleep EEG-based biomarker of brain health is “brain age” (BA)14. Its difference from chronological age, called the “brain age index” (BAI), characterizes the extent to which an individual’s observed neurophysiologic functioning during sleep deviates from what would be expected for their chronological age. In excess, BAI has been linked to higher mortality15 and an underlying burden of disease, including dementia16, HIV infection17, hypertension14, and diabetes14. Although BAI gives insight into the general functional capacity of the brain, it is not explicitly designed to decode information about neuroanatomic integrity and its relationship with cognition has not yet been evaluated.

Here, our aim was to take a novel approach to measuring brain health by developing methods to decode neurocognitive information from sleep. Specifically, we developed a series of novel markers of brain health termed Sleep Cognitive Indices (SCIs). Unlike BAI, the SCIs are explicitly designed to correlate with specific components of cognition. Such indicators of brain health could be important for identifying age-related brain diseases which preferentially affect specific aspects of cognition, or for tracking the effects of interventions targeted at specific cognitive domains. We hypothesized that specific combinations of sleep-EEG features would be correlated with performance on specific cognitive tasks and that it may thus be possible to develop EEG-based indicators specifically correlated with different types of cognitive abilities. In comparison, we expected participants with elevated BAI to perform worse on cognitive assessments but reasoned that this correlation is likely nonspecific since BAI was developed to predict age.

Methods

Design and participants

We conducted a single-center, cross-sectional observational study consisting of adults (≥ 18 years of age) who underwent diagnostic polysomnography (PSG) between November 2018 and October 2019 at the Massachusetts General Hospital Sleep Laboratory. Enrolled participants completed a cognitive test battery within 40 days of their PSG. Patients were excluded if they had a baseline diagnosis of dementia or a learning disability, were unable to perform the cognitive tests due to a lack of English proficiency or impairment (motor, visual, or hearing), or if they had prior experience with the cognitive test battery. This study of human subjects was approved by the Mass General Brigham Institutional Review Board. All methods were performed in accordance with the study protocol and the Declaration of Helsinki. Written informed consent was provided by all participants. The number of subjects and their characteristics are summarized in Table 1.

Table 1 Baseline characteristics of study patients (N = 150).

Sleep signal preprocessing

Electroencephalogram (EEG) signals were recorded from six scalp electrodes: frontal (F3, F4), central (C3, C4), and occipital (O1, O2), each referenced to the contralateral mastoid (M1, M2). EEG signals were recorded at 512 Hz and downsampled to 200 Hz before analysis. These signals were then band-pass filtered between 0.1 and 20 Hz and noncerebral artifacts were removed using a previously described filtering method18.

The American Academy of Sleep Medicine (AASM) provides guidelines for classifying consecutive 30-s epochs of EEG signals into 5 “stages”19, including awake (W), rapid eye movement (REM) sleep, and 3 stages of non-REM sleep (N1, N2, N3). EEG epochs were classified following these AASM guidelines by licensed sleep technicians and the assigned stages were subsequently reviewed and revised as needed by a sleep physician. Only central electrode signals (C3-M2 and C4-M1) were used for our main analysis, as the public sleep dataset that we used for external validation, the Sleep Heart Health Study, included only central electrodes. We further explored model performance when either occipital or frontal electrodes were available for analysis in addition to central electrodes.

Spindle and slow oscillation characterization

Sleep spindle and slow oscillation features were obtained using Luna software9 (http://zzz.bwh.harvard.edu/luna/). Spindle detections were included only for epochs scored as N2 and N3. A single electrocardiogram (ECG) electrode was zero-phase band-pass filtered from 0.3 to 40 Hz and used to apply ECG-correction to remove ECG artifacts from the EEG signals. Slow oscillations were detected by band-pass filtering between 0.2 and 4.5 Hz. Positive-to-negative zero-crossings were then detected in the filtered signal, and intervals between 0.8 and 2-s were designated as slow oscillations if they had a negative peak higher than the median across all zero-crossings and a peak-to-peak amplitude higher than the median. All spindle and slow oscillation features used for analysis are summarized in Table 2.
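The slow oscillation rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the Luna implementation: the function name `detect_slow_oscillations` is hypothetical, the input is assumed to be already band-pass filtered, "negative peak higher than the median" is interpreted as a larger negative-peak magnitude, and the medians are taken over the duration-qualified cycles.

```python
import math
from statistics import median

def detect_slow_oscillations(signal, fs, min_dur=0.8, max_dur=2.0):
    """Return (start, end) sample indices of candidate slow oscillations.

    `signal` is assumed to be band-pass filtered (0.2-4.5 Hz) beforehand.
    """
    # Positive-to-negative zero-crossings: sample i is >= 0, sample i + 1 is < 0.
    crossings = [i + 1 for i in range(len(signal) - 1)
                 if signal[i] >= 0 and signal[i + 1] < 0]
    cycles = []
    for start, end in zip(crossings, crossings[1:]):
        duration = (end - start) / fs
        if min_dur <= duration <= max_dur:
            segment = signal[start:end]
            neg_peak = abs(min(segment))            # magnitude of the trough
            peak_to_peak = max(segment) - min(segment)
            cycles.append((start, end, neg_peak, peak_to_peak))
    if not cycles:
        return []
    # Assumption: thresholds are medians over the duration-qualified cycles.
    neg_thr = median(c[2] for c in cycles)
    p2p_thr = median(c[3] for c in cycles)
    return [(s, e) for s, e, neg, p2p in cycles if neg > neg_thr and p2p > p2p_thr]

# Synthetic 1 Hz signal in which only the third cycle has a large amplitude.
fs = 100
amplitudes = [1, 1, 4, 1, 1, 1]
signal = [a * math.sin(2 * math.pi * (n + 0.5) / fs)
          for a in amplitudes for n in range(fs)]
events = detect_slow_oscillations(signal, fs)  # -> [(250, 350)]
```

Only the cycle whose trough and peak-to-peak amplitude both exceed the median is retained, which mirrors the two amplitude criteria in the text.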

Table 2 Sleep features included in the Sleep Cognitive Index (SCI) model (BAI used a subset of features used in SCI, i.e., the last two feature domains).

Sleep macrostructure features

Sleep macrostructure measures were calculated following AASM definitions, including total sleep time (TST), wake after sleep onset (WASO), sleep efficiency (SE), total time in bed (TTB), sleep latency (Sleep_L), and REM latency (REM_L). Percentages of TST spent in N1, N2, N3, and REM were calculated using custom code written in Python (https://www.python.org/). All sleep macrostructure features are summarized in Table 2.
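As an illustration of how these macrostructure measures follow from a scored hypnogram, the sketch below computes them from a list of per-epoch stage labels. It is not the study code: the function name `macrostructure` and the dictionary keys are hypothetical, and it assumes the recording starts at lights-off and ends at lights-on, with sleep onset taken as the first epoch scored as any sleep stage.

```python
SLEEP_STAGES = ("N1", "N2", "N3", "REM")

def macrostructure(stages, epoch_sec=30):
    """Summarize a hypnogram (list of 'W', 'N1', 'N2', 'N3', 'REM'); times in minutes."""
    epoch_min = epoch_sec / 60
    sleep_epochs = [i for i, s in enumerate(stages) if s in SLEEP_STAGES]
    if not sleep_epochs:
        raise ValueError("hypnogram contains no sleep epochs")
    onset = sleep_epochs[0]                      # first epoch of any sleep stage
    tst = len(sleep_epochs) * epoch_min          # total sleep time
    ttb = len(stages) * epoch_min                # total time in bed
    rem_epochs = [i for i in sleep_epochs if stages[i] == "REM"]
    summary = {
        "TTB": ttb,
        "TST": tst,
        "SE": 100 * tst / ttb,                   # sleep efficiency, percent
        "Sleep_L": onset * epoch_min,            # sleep latency
        "WASO": sum(s == "W" for s in stages[onset:]) * epoch_min,
        "REM_L": (rem_epochs[0] - onset) * epoch_min if rem_epochs else None,
    }
    # Stage percentages are expressed relative to TST, as in the text.
    for stage in SLEEP_STAGES:
        summary[stage + "_pct"] = 100 * stages.count(stage) * epoch_min / tst
    return summary

hypnogram = ["W"] * 4 + ["N1"] * 2 + ["N2"] * 6 + ["W"] * 2 + ["N3"] * 4 + ["REM"] * 2
summary = macrostructure(hypnogram)
```

For this toy hypnogram of twenty 30-s epochs, time in bed is 10 min, TST is 7 min, and sleep efficiency is 70%.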

Cognitive test battery

All participants were asked to complete the NIH Toolbox Cognition Battery20. The NIH Toolbox Cognition Battery is one of the core domains in the NIH Toolbox for Assessment of Neurological and Behavioral Function. It consists of seven instruments that assess the following functional constructs: Flanker Inhibitory Control and Visual Attention (ICA), Dimensional Change Card Sort (DCCS; measures cognitive flexibility), List Sorting Working Memory (LSWM), Picture Sequence Memory (PSM; measures visual episodic memory), Pattern Comparison Processing Speed (PCPS), Picture Vocabulary (PV; measures vocabulary comprehension), and Oral Reading Recognition (ORR; measures reading decoding). Of these seven instruments, PV and ORR are classified as measures of crystallized cognition and the rest as measures of fluid cognition. Fluid cognition reflects a collection of cognitive processes involved in problem-solving, abstract thinking, and reasoning that are independent of past knowledge acquired through experience and education. In contrast, crystallized cognition represents a group of cognitive processes that apply prior knowledge from experience and education to solve problems. Although different, these two cognition types are tightly correlated components of total cognition. Despite this association, studies often examine these subdivisions of total cognition separately to better understand and treat neurologic conditions21,22. Additionally, both crystallized and fluid cognition have been shown to change with age23,24. For detailed instrument information, see Table S1. In addition to scores for individual tests, three composite scores for fluid, crystallized, and total cognition are provided. Absolute scores for each of the seven tests and the three composite scores were used for analyses (all non-age adjusted).

Statistical analyses

Developing the sleep cognitive indices

To develop SCI for specific cognitive measures, we created a series of regression models. The dependent variable of each model was a task’s absolute scores. Independent variables in SCI models included EEG features that were derived from spindles and slow waves, as well as the features in Table 2. Demographic variables, such as age and sex, were not included since our primary aim was to develop EEG-based indicators of neurocognitive health and evaluate how well brain signals alone could capture neurocognitive status, rather than produce accurate predictions of cognitive performance per se. EEG-based models were evaluated with a goodness of fit test (see below) in comparison to a full model with demographic variables. Because the number of independent variables (160 × 3 + 42 + 10 = 532 with all electrodes, 160 + 42 + 10 = 212 with one electrode set, Table 2) exceeded the number of participants (150) in our dataset, we used linear regression with Elastic Net regularization to prevent overfitting and to force regression models to select only the features most relevant to the target task. Note that Elastic Net regularization automatically selects which features to retain in the model, and thus the number of features selected varies depending on the specific prediction task and data used to develop the model. To avoid overestimation of regression performance, model training and feature selection were restricted to training data, while model performance was evaluated strictly on held-out test data. In summary, each SCI model is generated by extracting EEG measures of interest (determined by Elastic Net regularization), multiplying these EEG measures by the regression coefficients of the model of interest (fluid, crystallized, total), and adding the results to obtain a single number (SCI score).
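The final step described above, turning a fitted model into a single SCI score, amounts to a weighted sum. The sketch below is illustrative only: the feature names and coefficient values are made up, and the optional `intercept` is an assumption, since the text does not say whether one is added.

```python
def sci_score(features, coefficients, intercept=0.0):
    """Weighted sum of the EEG features retained by the regularized model."""
    # Features whose coefficient was shrunk to zero by Elastic Net add nothing.
    return intercept + sum(coef * features[name]
                           for name, coef in coefficients.items() if coef != 0.0)

# Feature counts quoted in the text (Table 2).
n_all_electrodes = 160 * 3 + 42 + 10     # 532 candidate features
n_one_electrode_set = 160 + 42 + 10      # 212 candidate features

# Hypothetical coefficients and feature values, for illustration only.
coefficients = {"spindle_density_N2": 2.0, "delta_theta_ratio_N3": 0.5, "rem_pct": 0.0}
features = {"spindle_density_N2": 3.0, "delta_theta_ratio_N3": 4.0, "rem_pct": 20.0}
score = sci_score(features, coefficients)  # 2.0 * 3.0 + 0.5 * 4.0 = 8.0
```

Because both candidate feature counts exceed the 150 participants, an unregularized least-squares fit would be underdetermined, which is the motivation the text gives for Elastic Net.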

For SCI model optimization and testing model performance, we used nested tenfold cross-validation (CV) (Fig. S1). For each functional construct and cognitive composite score, the outer CV loop separated data into ten folds, where each fold contained 15 distinct participants. Nine folds were used for model fitting (n = 135) and the other fold for model testing (n = 15). This was done ten times, such that testing was performed once on each fold. During model fitting, Elastic Net regression was performed to select the best subset of features and their coefficients. Strict separation of training and test set was maintained to achieve statistically unbiased estimates of out-of-sample performance. Our reported performance results are based on test data only.
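A minimal sketch of the index bookkeeping behind such a nested cross-validation follows. It only generates the participant splits; model fitting is omitted. The helper names, the random seed, and the use of ten inner folds are assumptions made for illustration (the exact inner-loop configuration is given in Fig. S1, not in the text).

```python
import random

def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_participants=150, outer_k=10, inner_k=10, seed=0):
    """Yield (train, test, inner_splits) for each outer fold."""
    outer_folds = k_fold_indices(range(n_participants), outer_k, seed)
    for i, test in enumerate(outer_folds):
        train = [idx for j, fold in enumerate(outer_folds) if j != i for idx in fold]
        # Inner folds are drawn from training participants only, so tuning
        # never sees the held-out test fold.
        inner_folds = k_fold_indices(train, inner_k, seed + i + 1)
        inner_splits = []
        for m, val in enumerate(inner_folds):
            fit = [idx for q, fold in enumerate(inner_folds) if q != m for idx in fold]
            inner_splits.append((fit, val))
        yield train, test, inner_splits

splits = list(nested_cv_splits())
```

Each outer split has 135 training and 15 test participants, and every participant appears in exactly one test fold, matching the numbers stated above.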

In addition to the new SCI models, we calculated the Brain Age Index (BAI) using a previously described machine learning model14. BAI includes features from the waveform time domain (e.g., line length and kurtosis, which reflect EEG signal complexity) and from the frequency domain (e.g., spectral power of the delta (0.5–4 Hz), theta (4–8 Hz), and alpha (8–12 Hz) bands, and their ratios14,25). All features are summarized in Table 2. Features of missing sleep stages were imputed using the K-nearest neighbor approach (K = 10). We used Pearson correlations to measure the degree to which the BAI correlates with cognitive test scores. Statistical significance was defined using a p-value < 0.05.
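The K-nearest neighbor imputation step can be illustrated with a small pure-Python sketch. This is one plausible variant, not the study's implementation: the function name `knn_impute` is hypothetical, distances are computed over the features observed in both rows, and a missing value is replaced by the unweighted mean of that feature over the K nearest rows in which it was observed.

```python
import math

def knn_impute(rows, k=10):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    def distance(a, b):
        # Root-mean-square difference over features observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j was observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Toy example with k = 2: the last row lacks its second feature.
rows = [[1.0, 10.0], [1.1, 12.0], [5.0, 50.0], [1.05, None]]
filled = knn_impute(rows, k=2)  # missing value becomes (10.0 + 12.0) / 2 = 11.0
```

In practice features would usually be standardized before computing distances so that no single feature dominates; that step is left out here for brevity.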

Pearson correlation was calculated between cognitive scores and the various SCI and BAI. To compare pairs of correlation coefficients (e.g. to evaluate the difference between the strength of correlations of BAI vs. SCI with each cognitive test), we completed Fisher r-to-z transformations for each pair of correlation coefficients26.
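For readers who want to see the arithmetic, the sketch below compares two correlation coefficients with the Fisher r-to-z transformation. It uses the form for two independent correlations, which is an assumption about the exact variant applied; the function name is hypothetical. With the total-cognition values reported in the Results (SCI r = 0.37, BAI r = −0.03, N = 150), it gives z ≈ 3.59.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z comparison of two correlations, treated as independent."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal p-value
    return z, p

z, p = compare_correlations(0.37, 150, -0.03, 150)
```

Because both correlations here come from the same participants, a test for dependent correlations could also be considered; the independent-samples form is shown only because it is the simplest.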

To evaluate how well SCI and BAI distinguish individuals who score low versus high on different cognitive tests, we divided participants into three groups of equal size for each cognitive test (1/3 low score, 1/3 medium score, 1/3 high score). We then performed group-level analysis of discriminability using Cuzick’s non-parametric test for trend to examine the statistical significance of the difference in SCI across the different score groups. For individual-level analysis of discriminability, we calculated Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve (AUC) for each SCI model. For the ROC analysis, the medium score group was excluded to ensure distinctness between groups.
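The two discriminability analyses can be sketched as follows. This is a simplified illustration with hypothetical function names: the Cuzick statistic is computed without a tie correction, group scores are taken as 1, 2, 3, and the AUC is obtained from pairwise comparisons between the low and high groups (equivalent to the Mann-Whitney formulation).

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def tertile_labels(scores):
    """Label participants 1 (low), 2 (medium), 3 (high) by cognitive score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for position, idx in enumerate(order):
        labels[idx] = position * 3 // len(scores) + 1
    return labels

def cuzick_z(index, labels):
    """Cuzick's trend statistic for an index across ordered groups (no tie correction)."""
    n = len(index)
    t_stat = sum(l * r for l, r in zip(labels, average_ranks(index)))
    sum_l, sum_l2 = sum(labels), sum(l * l for l in labels)
    expected = (n + 1) / 2 * sum_l
    variance = (n + 1) / 12 * (n * sum_l2 - sum_l ** 2)
    return (t_stat - expected) / math.sqrt(variance)

def auc_low_vs_high(index, labels):
    """Probability that a high scorer has a larger index than a low scorer."""
    low = [v for v, l in zip(index, labels) if l == 1]
    high = [v for v, l in zip(index, labels) if l == 3]   # medium group excluded
    wins = sum((h > lo) + 0.5 * (h == lo) for h in high for lo in low)
    return wins / (len(high) * len(low))

cognitive_scores = [10, 11, 12, 13, 14, 15, 16, 17, 18]
sci = [1, 2, 8, 4, 5, 6, 7, 3, 9]        # made-up index values
labels = tertile_labels(cognitive_scores)
z = cuzick_z(sci, labels)
auc = auc_low_vs_high(sci, labels)
```

In this toy example one low scorer has a high index and one high scorer a low index, so the AUC is 7/9 rather than 1.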

Evaluating cognitive variation related to age, sex, and education

Performance on cognitive tasks in the NIH Toolbox Cognition Battery depends on age20. We reasoned that, if our SCI indicators are valid, they should account for age-related variation in cognitive performance. If so, regression models that include age and SCI should explain no more of the variance in cognitive test performance than regression models that include SCI alone. Similar reasoning applies to other biological variables that might correlate with cognitive performance, including years of education and sex. To address these questions, we created a series of nested regression models and compared each submodel using a likelihood ratio test. Specifically, we first fitted two Elastic Net models for each cognitive test: (1) a submodel with EEG features alone (SCI model), and (2) a full model with EEG features, age, years of education, and sex. We then compared models by calculating the log-likelihood of each model and performing a likelihood ratio test to measure the change in deviance for the submodel.
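A likelihood ratio comparison of two nested regression models can be sketched as below, assuming Gaussian errors so that the change in deviance depends only on the residual sums of squares. The function names and the example numbers are hypothetical, and the chi-square reference distribution is only approximate when the models are fitted with regularization.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    extra = sum(y ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * extra

def likelihood_ratio_test(rss_sub, rss_full, n, extra_params):
    """Compare a submodel with a full model that has `extra_params` more terms."""
    # For Gaussian errors, 2 * (logL_full - logL_sub) = n * ln(RSS_sub / RSS_full).
    stat = max(n * math.log(rss_sub / rss_full), 0.0)
    return stat, chi2_sf(stat, extra_params)

# Made-up residual sums of squares; the full model adds age, education, and sex.
stat, p = likelihood_ratio_test(rss_sub=100.0, rss_full=95.0, n=150, extra_params=3)
```

A large p-value in such a test indicates that the added demographic terms do not improve the fit appreciably, which is the pattern the reasoning above predicts if the SCI already captures age-, sex-, and education-related variation.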

External validation

External validation was performed using EEG data from the Sleep Heart Health Study27,28,29,30 (SHHS), a composite cohort overlapping with the Framingham Heart Study31 (FHS). Participants were included if they completed a neuropsychological test battery32 in the FHS within 3 years of their SHHS polysomnography exam date. Scores from the following tests were used for the Wechsler Memory Scale (WMS) score calculation: Logical Memory (Immediate Recall, Delayed Recall, Recognition); Visual Reproductions (Immediate Recall, Delayed Recall, Recognition); and Paired Associate Learning (Immediate Recall, Delayed Recall). Of the 476 participants in the validation dataset, 152 were subsequently excluded due to incomplete WMS data, with the remaining 324 available for analysis. WMS does not include tests that are directly comparable with the three NIH toolbox composite scores (total, fluid, and crystallized); therefore we correlated the WMS score with all three composite SCI models (total, fluid, crystallized), with the expectation that these constructs are correlated and thus, if the SCI models capture valid physiologic information related to brain health, they should exhibit some measurable (if nonspecific) correlation with WMS scores.

A subset of participants in the FHS cohort was flagged for possible dementia using criteria as previously described33,34,35,36. Through the consensus diagnosis process, some of these participants were assigned a Clinical Dementia Rating (CDR)-like dementia severity rating of 0.5 and associated with the diagnosis of cognitive impairment no dementia (CIND). We further evaluated the SCI models by calculating the association between the subset of cases diagnosed as CIND and the SCI model outputs.

Statistical significance was defined using a p-value < 0.05. All statistical analyses were performed with code written in-house using Python (https://www.python.org/). We did not perform corrections for multiple comparisons, as our aim was to measure the correlation of each SCI with its corresponding target cognitive domain rather than to draw a general conclusion about the presence or absence of an association between sleep and cognition; that is, the primary focus of the study was to estimate effect sizes rather than statistical hypothesis testing.

Results

Overall, 168 participants were enrolled; 18 were subsequently excluded from analysis as they were determined to be ineligible or had missing or incomplete data. A flowchart illustrating the screening and enrollment of study participants is shown in Fig. 1. The final cohort included 150 participants (56% female) with a mean age of 48.8 ± 17.7 years. Participant characteristics are listed in Table 1. The median score for each cognitive test is listed in Table S2.

Figure 1

Recruitment flowchart and study design. Flow diagram shows screening and enrollment of study participants, exclusions, and arrival at the final cohort (N = 150).

Correlations of cognitive scores with sleep cognitive indices

Figure 2 shows the correlation between SCI and each cognitive test. SCI designed for specific cognitive measures showed significant correlations with total (r = 0.37, p < 0.0001) and fluid cognitive scores (r = 0.56, p < 0.0001), in addition to each of the five fluid subtests (PCPS: r = 0.33, p < 0.0001; Flanker ICA: r = 0.22, p = 0.006; LSWM: r = 0.46, p < 0.0001; DCCS: r = 0.30, p = 0.0002; PSM: r = 0.46, p < 0.0001). The SCI designed for measure of crystallized cognition performed poorly (r = − 0.07, p = 0.38), as did the SCI designed for its subtests (PV: r = − 0.12, p = 0.16; ORR: r = − 0.08, p = 0.34). We also show the correlation matrix when using the different SCI models to predict each cognitive score (Fig. S2).

Figure 2

Sleep Cognitive Index is moderately associated with total and fluid cognition and not associated with crystallized cognition. Scatter plots of the absolute (N = 150) and predicted scores for each subtest and composite measures on the NIH Toolbox Cognition Battery are shown below. True cognitive scores are compared with cognitive scores predicted by an Elastic Net model for each cognitive test and composite measure. Sleep spindle features were generated using Luna. Abbreviations DCCS, Dimensional change card sort; ICA, Inhibitory control & attention; LSWM, List sorting working memory; ORR, Oral reading recognition; PCPS, Pattern comparison processing speed; PSM, Picture sequence memory; PV, Picture vocabulary.

SCI indicators were normally distributed for all significant SCI models (Fig. S3). The top five features for significant SCI models are listed in Table S3. When evaluating the effect of EEG electrode location, we observed similar performance for the three composite cognition SCI models across different subsets of EEG electrodes (Table 3).

Table 3 Effect of EEG electrode placement on SCI performance (MAE: mean absolute error).

As shown in Table S4, SCI showed stronger correlations than BAI with both total (z = 3.59, p = 0.0003) and fluid cognition (z = 4.39, p < 0.0001). For fluid subtests, SCI had stronger correlations with LSWM (z = 3.83, p = 0.0001), DCCS (z = 2.31, p = 0.02), and PSM cognitive tasks (z = 2.53, p = 0.01) and similar correlations with the Flanker ICA (z = 1.4, p = 0.16) and PCPS tests (z = 1.2, p = 0.23). No difference in correlation between SCI and BAI was evident for crystallized composite (z = − 1.59, p = 0.11) and subtest scores (PV: z = − 1.16, p = 0.25; ORR: z = − 1.23, p = 0.22).

To evaluate the ability of SCI indicators to discriminate high versus low cognitive scores at the group level, we conducted Cuzick’s test for trend and found strong trends for total SCI (z = 4.72, p < 0.0001), fluid SCI (z = 7.06, p < 0.0001), and all fluid subtest SCIs (DCCS: z = 5.21, p < 0.0001; Flanker ICA: z = 5.15, p < 0.0001; LSWM: t = 5.16, p < 0.0001; PSM: t = 4.67, p < 0.0001; PCPS: t = 4.61, p < 0.0001). The SCI designed for the crystallized cognition score (z = − 0.86, p = 0.39) and subtest scores (PV: t = − 1.98, p = 0.05; ORR: t = − 1.42, p = 0.15) did not display any significant or meaningful trends (Fig. 3a). Receiver Operating Characteristic (ROC) curves and Area Under Curve (AUC) scores confirmed that SCI models could differentiate low versus high scorers at the individual level for fluid and total cognition composite and subtest scores (AUC ranged from 0.74 to 0.90), but not for crystallized composite and subtest scores (AUC ranged from 0.38 to 0.46). ROC curves are shown in Fig. 3b.

Figure 3

Analysis of discriminability of the Sleep Cognitive Index. (a) Sleep Cognitive Index models discriminated between high and low performers at the group level for total and fluid cognition. (b) ROC curve for each cognitive test. Cognitive scores (N = 150) were predicted using an Elastic Net model for each cognitive test and composite measure on the NIH Toolbox Cognition Battery. Abbreviations DCCS, Dimensional change card sort; ICA, Inhibitory control & attention; LSWM, List sorting working memory; ORR, Oral reading recognition; PCPS, Pattern comparison processing speed; PSM, Picture sequence memory; PV, Picture vocabulary.

Examination of BAI showed that among the three major cognition domains, only crystallized cognition exhibited a significant correlation, which was negative (Crystallized: r = − 0.25, p = 0.002; Fluid: r = 0.12, p = 0.15; Total: r = − 0.03, p = 0.75). Increased BAI was also negatively correlated with both crystallized cognition subtests (PV: r = − 0.25, p = 0.003; ORR: r = − 0.22, p = 0.006) and was positively correlated with the processing speed fluid subtest (PCPS: r = 0.20, p = 0.01; Flanker ICA: r = 0.06, p = 0.46; LSWM: r = 0.05, p = 0.52; DCCS: r = 0.04, p = 0.59; PSM: r = 0.02, p = 0.67). Figure 4 shows a scatter plot and linear fit between BAI and each cognitive test. The opposite signs of the correlations between BAI and PCPS (positive correlation) versus crystallized cognition (negative correlation) scores likely reflect the differential age-related changes observed in fluid and crystallized cognition: fluid cognition tends to decline with age and crystallized cognition likely increases to compensate24.

Figure 4

Brain Age Index is moderately associated with crystallized cognition and not associated with total and fluid cognition. Scatter plots of BAI and the absolute scores (N = 150) for each subtest and composite measures on the NIH Toolbox Cognition Battery are shown below. Abbreviations DCCS, Dimensional Change Card Sort; ICA, Inhibitory Control & Attention; LSWM, List Sorting Working Memory; ORR, Oral Reading Recognition; PCPS, Pattern Comparison Processing Speed; PSM, Picture Sequence Memory; PV, Picture Vocabulary.

Evaluating cognitive variation related to age, sex, and education

Likelihood ratio tests confirmed that SCI indicators for the three cognition measures and the Flanker ICA, LSWM, PCPS, and PSM cognitive tasks fit the data similarly to a full “brain health” model that incorporated EEG, age, education, and sex features (p = 0.1). Therefore, SCI models adequately capture variation in cognitive performance related to these factors. Detailed metrics for all models are listed in Table S5.

External validation

Of the 324 SHHS/FHS participants available for analysis, 20 participants had mild cognitive impairment at the time of neuropsychological evaluation. Using all participant data, both total and fluid cognition SCI indicators showed similar correlations with the participants’ total WMS scores (total: r = 0.31, p < 0.0001; fluid: r = 0.32, p < 0.0001). In contrast, the crystallized cognition SCI model was poorly indicative of participants’ total WMS scores (r = 0.07, p = 0.23). Correlations between the three SCI models and cognitive scores, along with score distributions are shown in Fig. 5. No significant change in the strength of association was observed when cognitively impaired participants were excluded from analysis (total: r = 0.30, p < 0.0001; fluid: r = 0.30, p < 0.0001; crystallized: r = 0.06, p = 0.28). Baseline characteristics of patients are listed in Table 4.

Figure 5

Validation of the three composite cognition SCI models in the Sleep Heart Health Study (SHHS) dataset. (a) SCI indicators of fluid and total intelligence are similarly associated with participants’ Wechsler Memory Scale (WMS) scores, while SCI indicators of crystallized intelligence are not associated with WMS scores. (b) All three SCI indicators are normally distributed. Participants (N = 324) completed the WMS through the Framingham Heart Study.

Table 4 Baseline characteristics of FHS patients (N = 324).

Discussion

In this cross-sectional observational study, we demonstrate that machine learning analyses of sleep EEG signals can generate indices that correlate with specific tests of cognition. These novel sleep EEG-derived machine learning models—the SCIs developed in the present study—were optimized to serve as indicators of brain health related to each cognitive task. They achieved a weak to moderate correlation with total cognition, moderate correlation with a composite measure of fluid cognition, and a range of weak to moderate correlations for fluid cognition subtests. SCIs for crystallized cognition and tasks were not correlated with composite crystallized cognition and subtest scores. Crucially, all significant SCI models performed well at differentiating low from high test scorers at the group and individual levels. Overall, our results suggest that overnight sleep EEG is a promising source of indicators of neurocognitive health. This is significant because sleep EEG is increasingly easy to monitor using home devices. Thus, SCIs may have promise for identifying signs of age-related brain diseases that preferentially affect specific aspects of cognitive health and for tracking the physiologic effects of interventions.

SCI versus BAI

Comparing BAI and SCI performance, we found SCIs exhibited stronger correlations with cognitive scores for total cognition, fluid cognition, and three fluid functional constructs: working memory, episodic memory, and cognitive flexibility. Because fluid cognition often declines at earlier stages of the Alzheimer’s Disease (AD) pathologic cascade24,37, measures of fluid cognition may serve as sensitive indicators of preclinical AD and increased vulnerability for cognitive decline in cognitively unimpaired adults.

In contrast, the previously published BAI was correlated with crystallized composite cognition and subtest scores and with the visual processing speed subtest of fluid cognition. No correlation was found between BAI and total cognition, fluid cognition, or the remaining four fluid subtests. Although we anticipated SCI to show stronger associations with cognition, the lack of an association between BAI and fluid cognition was unexpected. On further examination, we found that while crystallized cognition and chronological age (CA) were positively correlated (r = 0.37, p = 0.001), BA was negatively correlated with crystallized cognition in our cohort (r = − 0.16, p = 0.049). BAI (i.e., BA-CA) was therefore negatively correlated with crystallized cognition. This was not the case for fluid cognition. Because both CA and BA were negatively correlated with fluid cognition, their difference (BAI) was not associated with fluid cognition. The different results for BAI and SCI are in alignment with previous findings that relate distinct brain regions for the two cognition types. When evaluating the effects of different white-matter tracts on fluid and crystallized cognition, one study linked the forceps minor tract with measures of crystallized cognition and the superior longitudinal fasciculus with measures of fluid cognition23.
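The sign argument for BAI above can be made explicit with the linearity of covariance. Writing Y for a cognitive score, and using only the definition BAI = BA − CA from the text:

```latex
\operatorname{cov}(\mathrm{BAI},\, Y)
  = \operatorname{cov}(\mathrm{BA} - \mathrm{CA},\, Y)
  = \operatorname{cov}(\mathrm{BA},\, Y) - \operatorname{cov}(\mathrm{CA},\, Y)
```

For crystallized cognition, the first term is negative and the second is positive, so both contributions push the covariance with BAI in the negative direction. For fluid cognition, both terms are negative, so they partly cancel in the difference, which is consistent with the weak association observed for BAI.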

The lack of correlation between SCI and crystallized cognition may have arisen because the features computed did not capture predictive information about crystallized cognition or the choice of model was inadequate for this task due to possible non-linear relationships between sleep features and crystallized cognition.

Including non-EEG metrics of health as features of a cognition index could potentially improve correlations between sleep metrics and cognition. For example, one study that predicted individual sleep metrics using age, cognitive scores, status of cardiometabolic disease, and baseline covariates found that individuals who performed above average within their age group exhibited sleep metrics closer to younger and healthier individuals38.

Most influential features across SCI models

When reviewing the top contributors to significant SCI models, we found that a higher delta-to-theta ratio in N3 was important for total cognition, and a higher delta-to-alpha ratio in N3 was influential for both total cognition and working memory. This finding is in line with previous studies that show decreased delta band power during sleep for older adults with and without sleep disorders39,40 and increased delta band power during sleep in response to a motor learning task41.

Another N3 feature, line length, significantly contributed to total cognition, fluid cognition, cognitive flexibility, processing speed, and episodic memory models. Line length, also referred to as the mean resultant vector length, is the total variation in the signal amplitude and frequency and is a measure of EEG signal complexity. In our models, a larger signal complexity led to stronger correlations with cognition. This finding was likely driven by the line length of slow oscillations-associated spindles and delta-associated spindles during slow-wave sleep42.

Kurtosis of band power was also a strong feature for most models. Kurtosis is a measure of the amount of transiently occurring events, and a larger kurtosis corresponds with a more heavy-tailed distribution. For example, many transient 1-s spindles in a 30-s epoch can lead to higher kurtosis in the sigma band (11–15 Hz). In this study, we found that the tail extremity of alpha band power signals near the onset of sleep and during sleep–wake periods contributed to higher correlations with fluid cognition, working memory, and inhibitory control and visual attention scores. Meanwhile, the tail extremity of theta band power signals during N2 contributed to higher correlations with fluid cognition and four fluid functional constructs, excluding episodic memory. For total cognition, the tail extremity of delta band power signals during REM sleep likewise contributed to higher correlations.
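To make the two time-domain features concrete, the sketch below computes line length and kurtosis for a short sequence of values. The definitions are common ones and are assumed here rather than taken from the study code: line length is the sum of absolute differences between consecutive samples, and kurtosis is reported as excess kurtosis (zero for a normal distribution).

```python
def line_length(x):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; undefined for a constant sequence."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

# A steady band-power trace versus one with a single transient burst.
steady = [1.0, 2.0, 3.0, 4.0, 5.0] * 6
burst = [0.0] * 29 + [10.0]
k_steady = excess_kurtosis(steady)
k_burst = excess_kurtosis(burst)
```

The trace with one isolated burst has a much heavier tail and therefore a far larger kurtosis than the steady trace, which is the behavior the paragraph above describes for transient events such as spindles.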

With respect to spindle and slow oscillation features, spindle density during N2 was one of the top three contributors to the composite fluid cognition, inhibitory control and visual attention, cognitive flexibility, and processing speed models. This finding aligns with previous literature that links spindle density with different measures of fluid cognition and functional constructs43,44. Spindle amplitude, duration or frequency did not appear to be important. Further, the number of spindles that overlapped with a detected slow oscillation in N2 was an important feature for total cognition and three fluid functional constructs: cognitive flexibility, processing speed, and episodic memory. Coupling between the phase of slow-wave oscillations and spindle activity has been shown to facilitate memory consolidation and performance45 and influence cognitive impairment in older adults46. These studies further support our episodic memory model, for which the second most influential feature was the circular mean of slow oscillation phase at spindle peak, or mean coupling direction. We also discovered that slow oscillation peak duration predicts cognitive flexibility. Slow oscillation slope, which has been linked to the effectiveness of neuronal synchronization at the cortical level47, was also found to predict episodic memory.
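The mean coupling direction mentioned above is a circular statistic, so it cannot be computed as an ordinary average of phase angles. A minimal sketch, assuming the slow oscillation phase at each spindle peak has already been extracted in radians (the function name is hypothetical):

```python
import math

def coupling_direction(phases):
    """Circular mean (radians) and mean resultant length of spindle-peak phases."""
    c = sum(math.cos(p) for p in phases) / len(phases)
    s = sum(math.sin(p) for p in phases) / len(phases)
    # atan2 handles wrap-around: phases just below +pi and just above -pi
    # average to about pi, not to 0.
    return math.atan2(s, c), math.hypot(c, s)

mean_phase, resultant_length = coupling_direction([0.0, math.pi / 2])
wrapped_mean, _ = coupling_direction([math.pi - 0.1, -math.pi + 0.1])
```

The mean resultant length returned alongside the mean direction lies between 0 and 1 and indicates how tightly the spindle peaks cluster around a preferred slow oscillation phase.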

Among the sleep architecture features (macrostructure), the percentage of REM sleep was the only highly influential one, ranking as the third most important feature for working memory. This result agrees with previous studies that support the role of REM duration in working memory performance48,49. While REM occurs in the middle of the night, other macrostructure features are more likely to be affected by the fact that the MGH dataset is a clinical dataset. For example, for total sleep time (TST), multiple studies have shown a U-shaped relationship50,51,52,53 in which overly long or short TST is associated with worse cognition and TST between 6 and 8 h is associated with better cognition. However, sleep architecture in the sleep lab may not reflect participants' habitual sleep, given that participants are awoken around 6 am and experience the “first-night effect” associated with sleep studies. This also affects wake after sleep onset, sleep efficiency, total time in bed, sleep latency and REM latency.

Goodness of fit

A goodness of fit test showed that SCI models for all cognitive composite scores and all but three subtest scores (picture vocabulary, working memory, and episodic memory) were not improved by adding age and sex features. This suggests that SCI models capture changes in neurocognitive health related to age and sex via features of brain activity during sleep.

EEG electrodes used in the SCI models

As shown in Table 3, the SCI model trained using central EEG electrodes performs best. This could be explained by the top features: delta band power during N3 is highest at the central location, and therefore the delta-to-theta power ratio during N3 at the central location is the most predictive. Similarly, the spindles at the central electrodes are the so-called fast spindles, which have been shown to correlate with cognition more strongly than slow spindles at frontal electrodes54.

External validation

We investigated whether the SCIs designed for the three proposed measures of cognition were indicative of performance on the WMS in the SHHS/FHS dataset and found significant correlations between the SCIs for fluid and total cognition and WMS scores. Both models yielded comparable correlations, while the crystallized model showed no correlation. Compared to our MGH dataset, the overall performance of the total and fluid cognition SCI models was reduced in the SHHS/FHS dataset. This difference in performance is likely driven by differences in methodology, such as the use of different neuropsychological batteries (the WMS selected does not include specific measures of processing speed or working memory), the larger gap between neuropsychological and polysomnography exam dates (SHHS/FHS ≤ 1095 days; MGH ≤ 40 days), and the difference in the average age of the two cohorts (SHHS/FHS: 62 years; MGH: 49 years). Sex was evenly divided in both cohorts, while the level of education could not be compared because the cohorts used different methods of capturing education levels.

Limitations

Our study has several limitations, one of which is selection bias. As the study was offered only to those undergoing a PSG for suspected sleep disorders, participants likely had at least subjectively abnormal sleep. In addition, we did not control for medications. Therefore, the cohort would not be reflective of a healthy population. The cohort also lacked racial (76% White) and socioeconomic diversity. As the single in-lab PSG setting is known to create the first-night effect, sleep for some participants may not represent typical sleep at home. Lastly, noise in the cognition scores may exist, as we did not control for the time of day when administering the cognitive test battery or for the time between PSG and cognitive assessment.

Future directions

In future research, the night-to-night variability of SCI should be considered, as our previous work shows that the average night-to-night standard deviation in calculating BAI is 7.5 years, which can be reduced to less than 5 years by averaging consecutive nights55. Because calculating SCI only requires two central electrodes, information on night-to-night variability can be conveniently captured using home-based EEG recording devices to improve the reliability of SCI measurements.
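As a rough illustration of why averaging nights reduces variability, the sketch below applies the textbook result that the standard deviation of a mean of n independent measurements is the single-measurement standard deviation divided by the square root of n. The independence of nights is an assumption made here for simplicity; consecutive nights may be correlated, and the cited work may have quantified the reduction differently. The function names are hypothetical, and the 7.5-year figure refers to BAI, not SCI.

```python
import math


def sd_of_mean(sd_single_night, n_nights):
    """SD of the average over n nights, assuming independent nights (an assumption)."""
    return sd_single_night / math.sqrt(n_nights)


def nights_needed(sd_single_night, target_sd):
    """Smallest number of nights whose average has an SD strictly below target_sd."""
    if sd_single_night < 0 or target_sd <= 0:
        raise ValueError("sd_single_night must be >= 0 and target_sd must be > 0")
    n = 1
    while sd_of_mean(sd_single_night, n) >= target_sd:
        n += 1
    return n


# Night-to-night SD of BAI reported in the text: 7.5 years.
for n in (1, 2, 3):
    print(n, "night(s):", round(sd_of_mean(7.5, n), 2), "years")

print("nights needed for SD below 5 years:", nights_needed(7.5, 5.0))
```

Under this simplified independence assumption, two nights would leave the standard deviation slightly above 5 years, and three nights would bring it to roughly 4.3 years.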

Additionally, although BA is commonly measured using magnetic resonance imaging (MRI)56, structural MRI scans remain costly, are inaccessible to claustrophobic patients and those with metal implants, are difficult to deploy or repeat, and do not measure functional status. Sleep EEG-based brain age and health biomarkers may address some of these concerns because of the cost-effectiveness of EEG devices, the accessibility of home-based EEG recording devices, and the aging-associated changes in sleep EEG57. To understand the potential benefit in clinical settings, future work is needed to evaluate this biomarker in a more diverse population with cognitive impairment and underlying neuropathologic changes.

Conclusion

Sleep cognitive indices (SCI) are correlated with measures of total and fluid cognition, while the brain age index (BAI) is correlated with measures of crystallized cognition. Key features contributing to the observed relationships include delta-to-theta and delta-to-alpha band power ratios, kurtosis, spindle density, coupling between slow oscillations and spindles, and percentage of REM sleep. Further research is needed to improve the stability of SCI and to validate SCI as a brain health biomarker.