Introduction

The number of individuals on the Alzheimer’s disease (AD) continuum, from preclinical AD to AD dementia, is currently estimated at about 416 million globally1. AD presents a significant impact on public health systems and the well-being of older adults and caregivers2. Therefore, it is crucial to identify prevention strategies to prevent or delay disease progression. Adherence to healthy lifestyle habits may reduce the risk of dementia and preserve cognitive health in at-risk seniors, in a cost-effective way. Multidomain lifestyle behaviors may prevent or delay up to 40% of dementias3. Previous studies have found that healthier diet habits4, physical activity engagement5, cognitively stimulating activity participation6, and social support7 prevented or delayed cognitive decline among healthy older adults. Similar lifestyle habits (e.g., engaging in exercise, healthy diet, and cognitive activity) are also associated with decreased risks of developing mild cognitive impairment (MCI) and AD8,9,10. Therefore, promoting healthy lifestyle behaviors are important for brain resilience and AD prevention.

Most prior studies focused on a single lifestyle variable. A few studies examined multidomain lifestyles, cognitive and brain health but with a limited number (i.e., < 3) of lifestyle variables11,12,13. However, lifestyle behaviors may not act on the brain health in isolation. According to previous reviews, intervention for AD and cognitive impairment prevention is moving from targeting a single lifestyle factor to multidomain lifestyle features in recent decades, as multidomain interventions mimic a more ecologically valid approach to voluntary adherence to modifiable lifestyles habits14,15. This approach is also based on the assumption that AD is a complex disorder that is associated with multiple risk and protective factors16. Thus, older adults at risk for AD may benefit more from interventions that target multiple factors at once. These reasons underscore the need to deepen our understanding of engagement in multidomain lifestyle factors. How individuals adhere to multidomain lifestyles voluntarily in a real-world setting is presently unknown. Only a few behavioral studies have examined this question. For example, more social support was associated with more physical activity engagement and healthy food intake17. Additionally, there was a reverse relationship between physical activity engagement and dietary fat18. Therefore, we hypothesize that engagement in each lifestyle factor is not entirely independent (i.e., people who adhere to one healthy habit might be more or less likely to adhere to another). By identifying potential lifestyle profiles among individuals, we will be able to provide insights into personalized lifestyle promotion interventions.

The brain systems underlying successful multidomain healthy lifestyle engagement are presently poorly understood. This knowledge is critical to more effectively design personalized lifestyle interventions. A few studies to date have investigated the relationship between multidomain lifestyle behaviors and brain health. Bittner et al. (2019) found that a higher combined lifestyle risk score (i.e., defined by physical activity, social engagement, alcohol intake, and smoking) was associated with altered functional connectivity and gyrification in motor and frontal areas in healthy older adults. Relatedly, individuals with higher lifestyle risk scores showed an older brain age, estimated by T1-weighed structural images19. Multiple lifestyle factors were also associated with neuropathological biomarkers (e.g., amyloid burden)13,20. Prior research has focused on brain characteristics as the outcome of healthful lifestyles, rather than the cause. This overlooks the bidirectional relationship between the brain and lifestyle habits. Morris et al. (2022) found that functional connectivity in regions related to inhibitory control predicted older adults’ sedentary behaviors change after an intervention. Therefore, it may be equally important to consider the potential neural features that impact an individual’s lifestyle choices when interpreting the findings in cohort studies where directionally can be hard to discern. Additionally, the majority of these studies focused on generally healthy older populations, where these findings may not generalize to those at risk for AD and who stand to benefit the most from lifestyle engagement. Relatedly, there is a need for a mechanistic understanding of the cognitive processes and neural substrates that support intervention response in individuals at risk for AD to provide a deeper understanding underlying the efficacy of interventions21.

In the current study, we aim to (1) identify the potential profiles of individuals’ lifestyle habits based on healthy diet adherence, exercise engagement, cognitive activity engagement, and social network, and (2) identify the dissociable neurobiological substrates among lifestyle phenotypes based on functional and structural brain imaging. We examined these research questions in a well-characterized high AD-risk population from the Pre-symptomatic Evaluation of Experimental or Novel Treatments for Alzheimer’s disease (PREVENT-AD) longitudinal cohort22. Given the heterogeneity of results and distinct methodologies employed in previous studies, we chose to use a data-driven approach to identify lifestyle phenotypes and the underlying neural distinctions.

Method

Participants

We obtained cross-sectional data from 139 cognitively healthy older adults, as a subsample from the Presymptomatic Evaluation of Experimental or Novel Treatments for Alzheimer’s disease (PREVENT-AD) cohort. This longitudinal cohort at McGill University, recruited participants who were cognitively normal but are at an increased risk of developing dementia and have an immediate family member with a history of AD. Selection criteria for the 139 sub-sample included cognitive status, data completeness and quality. Participants with potential cognitive impairment, evaluated by a neuropsychologist, were not included in this sub-sample. Demographic information (age, sex, years of education, and APOE genotype) was collected during the baseline visit of each participant, which occurred between 2011 and 2017. APOE genotype was labeled as whether individuals were heterozygous carriers for alleles ε3 and ε4 or not. The consent form was reviewed and signed by all participants from this study. Specific consent forms were obtained from participants prior to each experimental procedure. The consent form, protocols and study procedures were approved by the McGill Institutional Review Board and/or Douglas Mental Health University Institute Research Ethics Board. All procedures were carried out in compliance with the ethical principles of the Declaration of Helsinki.

Behavioral data

Psychosocial and lifestyle questionnaires were administered through Qualtrics from 2017 to 2019 at separate timepoints (https://www.qualtrics.com). Because the majority of lifestyle variables were collected in 2018, the age of each individual was all corrected to their age in 2018. Below is the specific information about the lifestyle, psychosocial, and cognitive measures that were used in the current study.

The lifetime total physical activity questionnaire

Exercise and Sports sub-scale. This questionnaire was developed and validated by Friedenreich et al. 1998. The participants reported the frequency of each type of physical activity they engaged in by day, week, month, and year, and the duration of each activity per session. The accumulated time spent in each activity per year was calculated and summed across all activities, in order to derive the total time in exercise and sports engagement in a year. The total score was used as an indicator of physical activity engagement of individuals. The scale was assessed throughout the year 2017.

The lifetime cognitive activity scale

This scale was developed and validated by Wilson et al. 2003. Participants reported the frequency of different cognitive activities in different age stages: 6, 12, 18, 30–40, 40 years old and currently. The present study focuses on current cognitive engagement only. The sub-scale of current cognitive activity engagement lists 11 activities, and the participants indicated the frequency for each activity on a Likert scale from 1 (once per year) to 5 (every day). The sum score for this sub-scale was calculated as an indicator of cognitive activity engagement for individuals. The scale was assessed in 2018.

The diet habit questionnaire

This questionnaire consists of 4 items from the Mediterranean Diet Adherence scale (Martínez-González et al. 2012) and 22 items from the Educoeur questionnaire diet sub-scale (Goyer et al. 2013). Red meat intake was removed from the analysis because of a large portion of missing values. Higher scores suggest higher adherence to recommended dietary habits across all items. The summed score was calculated as an indicator of adherence to a healthy diet. The scale was assessed throughout the year 2018.

The social life frequency scale

This in-house questionnaire contains four items asking about the frequency of getting together with friends/relatives, inviting friends/relatives to participants' homes, visiting friends/relatives at their homes, and on the telephone with friends/relatives over the past month. The participants rate each item on a Likert scale from 1 (Not at all in the past month) to 6 (Everyday). The sum score across 4 items was calculated as an indicator of social activity engagement. The scale was assessed in 2018.

The geriatric depression scale (GDS)

This questionnaire was developed and validated23 to assess depression symptoms in older adults. The 15-item version was used. For each item, participants responded with yes or no. The maximal score is 15 and a higher score indicates a greater level of depression.

The geriatric anxiety inventory (GAI)

This questionnaire was developed and validated24 to assess anxiety symptoms in older adults. The 20-item version was used. For each item, participants responded with agree or disagree. The maximal score is 20 and a higher score indicates a greater level of anxiety.

Stress subscale from the depression anxiety stress (DASS)

This subscale was developed and validated25 for assessing stress. This subscale has 14 items, each on a Likert scale from 0 to 3. A higher score indicates a greater level of stress.

Subscale from the psychological wellbeing scale

This scale was developed and validated25 and the subscale to assess purpose of life was administered in the current study. This subscale has 14 items, each on a Likert scale from 1 (Strongly agree) to 6 (Strongly disagree). Higher score indicates greater level of purpose of life.

The apathy evaluation scale (AES)

This questionnaire was developed and validated26 to assess amotivation across cognitive, behavioral and emotional domains. The questionnaire contains 20-items. For each item, participants responded on a scale from 1 (A lot) to 4 (Not at all). The maximal score is 80 and higher score indicates more apathy.

The big five inventory

This inventory was developed and validated27 to assess five personality dimensions: Extraversion, Neuroticism, Consciousness, Agreeableness, and Openness. The questionnaire has 44 items in total, participants responded to each by rating from 1 (disagree strongly) to 5 (agree strongly). The summary scores for all five dimensions were calculated separately.

The repeatable battery for the assessment of neuropsychological status (RBANS)

This neuropsychological battery was developed with five cognitive domains: Immediate Memory (i.e., list learning, story remembering), Visuospatial Ability (i.e., figure copy, line orientation), Language (i.e., picture naming, semantic fluency), Attention (i.e., digit span, coding), and Delayed Memory (i.e., list recognition, story recall, figure recall)28. The battery was administered annually to all participants enrolled in PREVENT-AD. Data from 2016 and 2019 were included in the current analysis and change scores were computed by subtracting performance in 2016 from 2019, in order to assess the longitudinal change in cognitive function over time. There are 22 out of 139 participants had incomplete RBANS data, therefore 117 participants were included in the analysis.

MRI data acquisition and preprocessing

Participants were scanned in a Siemens TIM Trio 3 Tesla magnetic resonance imaging (MRI) scanner. The Siemens standard 12-channel coil was used. T1-weight structural data were acquired using an MPRAGE (Magnetization Prepared Rapid Gradient Echo Imaging) sequence. The parameters were as follows: repetition time (TR) = 2300 ms, echo time (TE) = 2.98 ms, inversion time (TI) = 900 ms, flip angle = 9°, Field of View (FOV) = 256 × 240 × 176 mm, phase encode A-P, GRAPPA acceleration factor = 2. Resting-state functional MRI data were acquired by an echo-planar imaging (EPI) sequence. The scanning duration for each run was 5.04 min and two runs were performed sequentially. The parameters were as follows: repetition time (TR) = 2000 ms, echo time (TE) = 30 ms, flip angle = 9°, FOV = 256 × 256 × 252 mm, phase encode A-P, Bandwidth = 2442/px. Thirty-two slices were collected in each run.

The preprocessing of both functional and structural data was performed using the fMRIprep pipeline29. The structural images went through skull stripping, brain tissue segmentation, spatial normalization to Montreal Neurological Institute (MNI) space, and surface reconstruction. Preprocessing steps for the resting-state functional images include head motion correction, realignment, slice timing correction for sequential acquisition, susceptibility distortion correction, co-registration to reconstructed structural images, and spatial normalization to standard space. Some extra preprocessing steps were performed on functional images in CONN toolbox30 as follows. The functional data were smoothed using a full-width half-maximum kernel of 6 mm. Volumes with framewise displacement above 0.5 mm and/or global blood-oxygen-level-dependent (BOLD) signal changes above 3 standard deviations were flagged as motion outliers. Noise reduction was performed with 10 noise components from white matter and cerebrospinal areas estimated using an anatomical component-based noise correction procedure (aCompCor) (Behzadi et al. 2007), 12 estimated subject-motion parameters estimated from fMRIprep, motion outliers, and constant and first-order linear session effects detected in CONN. Nine participants were removed because of having less than five minutes of scanning time after the outlier scans being removed31. These nine participants were not included in any analysis of the current study.

Structural and functional image features preparation

Schaefer 100 parcellation was applied to both functional and structural brain data32. Cortical area and thickness were extracted from the 100 parcels in T1 structural images, which resulted in 200 structural features. Fisher-Z transformed bivariate Pearson correlation coefficients between each pair of parcels were calculated in CONN toolbox for the resting state functional images, which resulted in (100 × 100 − 100)/2 = 4950 functional features.

Lifestyle phenotyping (K-means clustering)

The k-means algorithm was used to cluster the participants into different lifestyle phenotypes. K-means is an unsupervised methodology to investigate patterns in a dataset, by defining a priori the potential number of clusters (k) that exist in the dataset. For a given k, the algorithm will assign each data point into a cluster by minimizing the distance between that data point and the centroid (i.e., the center of a cluster). All data were transformed to z-scores initially and all analysis was done in R (version 4.1.3). The four lifestyle variables (physical activity, cognitive activity, diet, and social activity) were separately entered into a linear regression model as dependent variables, with age, years of education, sex, and APOE4 carrier status as independent variables. Residuals from the regression models of the four lifestyle variables were entered into k-means function from the stats package, with number of clusters set from 1 to 9. The Clusterboot function from fpc package was used to calculate the Jaccard index by bootstrapping the clustering 100 times and calculating how many datapoints stayed in the same cluster across resamples. The Jaccard index was calculated by the ratio between number of data points assigned to the same cluster and the total number of data points. We determined the optimal k using two criteria: 1) highest Jaccard index, a higher value indicates higher stability, and 2) ‘elbow’ for within sum of squares for k versus k + 1, a lower value of within sum of squares indicates higher compactness for a cluster, which is the distance of each participant from the centroid.

The final chosen clusters were validated by examining group differences in psychosocial variables and longitudinal cognitive changes. The goal of this step is to confirm that the resultant clusters are derived from real differences in the data rather than being spurious. Psychosocial variables included depression, anxiety, stress, apathy, purpose of life, and Big Five Inventory. Cognitive measures included the change in scores from 2019 to 2016 across the five dimensions of the RBANS battery. False Discovery Rate (FDR; p < 0.05) correction was applied within dimensions of Big Five Inventory, RBANS, and other psychosocial variables (i.e., depression, anxiety, stress, apathy, purpose of life) separately. A t-test was performed to examine the difference between cluster groups.

Imaging classification

We ran a classification analysis using neuroimaging features to classify groups obtained in the k-means clustering solution with optimal k. Age, sex, educational years, APOE4 genotype, and mean head motion (only for the functional features) were regressed out of each feature and the residuals were used in the prediction model, in order to control for potential confounding. To select the imaging feature set that would be used in the training model, a feature selection filter was applied to all imaging features using the sbf function in the caret package. For each imaging feature, a logistic regression model was generated with the clustering solution as a binary categorical dependent variable, and the imaging feature as an independent variable. Then the features that predicted the clustering at p < 0.05 in the logistic models were selected across a tenfold cross-validation manner.

The main classification model was built using a linear support vector machine (SVM) classifier with linear kernel in kernlab package. The linear SVM classifier tries to find a hyperplane that separates the data points in a N-dimension space (N = the number of features in this model). The analysis was done in a nested cross-validation manner, with 25 inner loop resampling and tenfold outer loop cross-validation, in order to avoid over-fitting when using the same samples for tuning and testing. The tenfold in the outer loop was consistent with the folds in feature selection described above. A list of cost C was created varying from 0.25 to 64. The analysis involved two steps: (1) Tuning the parameter cost in inner loop resamples by maximizing Area Under Curve (AUC) in a receiver operating characteristics (ROC) curve, (2) testing the model by applying the best cost value for each fold in outer loop cross-validation. Model performance was averaged across all 10 folds in the outer loop. The same pipeline with the same fold splits was repeated in the same seed with a Random Forest model and a SVM classifier with non-linear kernel, in order to examine the stability of the outcome across models. Number of predictors to be sampled at each split, and the minimum number of data points in a node for further splitting were tuned in the inner resample. The number of trees was set at 1000.

The performance of each classification model was evaluated by accuracy, sensitivity, specificity, and AUC. An AUC of 1 indicates perfect prediction, and an AUC of 0.5 performs at chance. The significance of accuracy was examined via a permutation test. The sample was permutated and generate a new accuracy which repeated 1000 times, in order to produce the null distribution of the accuracy. The p value was determined by the percentage of values from the null distribution that are equal or greater than the observed accuracy.

The whole schema for the analysis pipeline is depicted in Fig. 1:

Figure 1
figure 1

The classification analysis pipeline.

Pos-hoc analysis

In order to determine which neurobiological features were important for the classification of individuals into lifestyle adherence clusters, the weights for all features were extracted from the linear SVM model. Specifically, coefficients from the orthogonal vector to the hyperplane were extracted and the greater absolute value of the coefficient for each feature indicates greater importance in separating the clusters. We ranked the absolute values of coefficients and selected the features that ranked among the top 30 for no fewer than 8 out of the 10 outer folds as the final list of important features.

Results

Descriptive information

There are 139 participants included in the k-means analysis (age = 66.53 ± 5.04; 47 males; educational years = 15.50 ± 3.46; 48 APOE4 carriers). The distribution of each lifestyle habit and their correlation with each other are displayed in Fig. 2a. Spearman correlation was examined for each pair of lifestyle variables and the scatter plots are displayed in Fig. 2b. Cognitive activity engagement was significantly correlated with exercise engagement (r = 0.30, p = 0.001 FDR-corrected) and social activity engagement (r = 0.24, p = 0.01 FDR-corrected). While diet was not statistically associated with the other variables, a non-significant positive correlation between these lifestyle factors and all the other factors was observed.

Figure 2
figure 2

The distribution of each lifestyle variable (a). Scatterplots of the correlations between each pair of lifestyle variables (b). correlation coefficients and significance were displayed on each correlation pair (***p < 0.001, **p < 0.01, *p < 0.05).

Lifestyle phenotyping (K-means clustering) results

The within cluster sum of squares showed an ‘elbow’ at k = 2, as it decreased with a steeper slope from k = 1 to 2, compared with further increments in k (Fig. 3a). Additionally, the highest Jaccard index appeared at k = 2 (i.e., 0.90 for cluster1, 0.89 for cluster2), compared to clustering solutions with a higher k (from 2 to 9; Fig. 3b). Therefore, the solution of two clusters provided the best fit for the current k-means analysis. The scatter plots of pairs of lifestyle variables across all k values (i.e., from 1 to 9) are displayed in Supplementary Fig. S(1). Scatterplots allow visualization of the different centers (Fig. 4a). For the final two-cluster solution, we separately examined the two groups’ lifestyle variables and found the two groups represented adherence and non-adherence to healthy lifestyle choices (Fig. 4b).

Figure 3
figure 3

The total within sum of squares (a) and Jaccard index across different k (b), measures of cluster variability and stability respectively.

Figure 4
figure 4

Scatterplots for cluster = 2 among pairs of lifestyle variables (a). The location of black cross (X) indicates the centroid of each cluster. Mean of each lifestyle variable for the two groups derived from k-means clustering (b). Positive scores represent greater adherence to healthy lifestyle variable. Participants in Cluster 2 showed higher levels of engagement across each health behaviour (AL group) compared to participants in Cluster 1 (NAL group).

The demographic information and lifestyle engagement for both groups separately are summarized in Supplementary Table S(1). We used psychosocial variables (i.e., depression, anxiety, stress, apathy, purpose of life, big five personality), and cognitive change across 4 years (i.e., RBANS change scores for five dimensions from 2016 to 2019) to validate that the two clusters reflect meaningful behavioral patterns that extend beyond lifestyle choices. The results showed that the group with non-adherence healthy lifestyle (NAL) had a higher score in apathy (t = 3.25, p = 0.007 FDR corrected; Fig. 5a) and a lower score in Big Five extraversion subscale score (t = − 3.0, p = 0.01 FDR corrected; Fig. 5b) than the group with adherence to healthy lifestyle (AL) group. NAL group also showed more longitudinal decline in language index change (t = − 0.31, p = 0.015 FDR corrected; Fig. 5c).

Figure 5
figure 5

Group differences in apathy (a), extraversion (b), and language (c) index change. Green represents the AL group, and red represents NAL group.

Classification results

After feature reduction filtering, 255 brain imaging features were chosen as the final set of imaging features to be included in the classification model with 252 functional features and 3 structural features. The linear SVM model successfully classified each cluster assignment (p < 0.001; accuracy = 0.75, sensitivity = 0.72, specificity = 0.77, AUC = 0.87; ROC see Fig. 6). The non-linear SVM model and random forest model produced similar results (Supplementary Fig. 3).

Figure 6
figure 6

ROC for linear SVM classification model.

Feature importance

The features that were the most important to the classification model are displayed in Fig. 7. The functional connections that are important for the classification model were mostly between-network and cross-hemisphere. The importance value of each feature is displayed in Table 1 and mapped on the atlas in Fig. 7a. The connectivity direction for both AL and NAL groups are displayed in Fig. 7b. Nodes that showed higher importance are distributed mainly in the default mode network, dorsal attention network, sensorimotor network and visual network (Fig. 7c).

Figure 7
figure 7

Important functional features. (a) Anatomical map of important features and their relative importance values. Greater node size indicates higher frequency of this node appearing in this list. Thicker edge indicates greater importance value of the connection. Red indicates greater connectivity values in AL group, and gray indicates greater connectivity values in NAL group. (b) Functional connectivity directionality in both groups. Red indicates positive mean connectivity across subgroups, and blue indicates negative connectivity across subgroups. (c) Frequency of nodes from each intrinsic resting-state network (i.e., Yeo 7 networks)33 in the list of important features. VN (Visual Network), SMN (Sensorimotor Network), DAN (Dorsal Attention Network), VAN (Ventral Attention Network), LN (Limbic Network), FPN (Frontoparietal Network), DMN (Default Mode Network).

Table 1 The list of connectivity features that were most important to the model.

Discussion

In the current study, we identified two lifestyle phenotypes among older adults at-risk for AD and examined the neurobiological distinction between these two phenotypes. We applied rigorous data-driven machine learning methods, an important step towards generalizable findings and precision neurology. We have two major findings in this study. Firstly, the data-driven clustering approach revealed two distinct phenotypes that adhere to a healthier lifestyle and a less healthy lifestyle in all four categories of lifestyle habits. We found that older adults engaged in protective lifestyle behaviors across multiple domains, rather than in isolation. Between-group differences were also verified in increased extraversion, lower apathy levels, and more preserved cognitive change over time in adherence group. Secondly, the classification model identified functional connectivity features that successfully differentiated the two lifestyle phenotypes. Most features represented between-network functional connectivity, which revealed a distributed set of neural features related to lifestyle choices. The non-adherent lifestyle group (i.e., NAL group) showed a larger number of features representing greater between-network and cross-hemisphere functional connectivity than adherent lifestyle group (i.e., AL group).

We validated patterns in our data that showed two distinct sub-groups of people who either adhere well to all four lifestyle habits or adhered poorly to all four habits. The results indicated potential covarying factors across different lifestyle behaviors, which was under-examined in prior literature. Importantly, we found convergent validity for these two clusters using psychosocial and cognitive variables independent from the input of the clustering analysis. These findings converge with prior research. For example, older adults with higher extraversion and agreeableness were found to have both greater social networks and more moderate physical activity engagement34,35, which is consistent with our finding of the AL group having higher extraversion scores. There might be also a bidirectional relationship between executive function and health behaviors, such as physical activity36. Consistently, our results showed difference in language index changes between AL and NAL groups, and the semantic fluency subtest was driving this difference (t = − 3.14, p = 0.002; see Supplementary Fig. S(2)), which loads on executive control components37. Therefore, the distinction we observed across two multidomain lifestyle profiles indicates a potential set of psychological and neurobiological factors shared by many health behaviors, and at the same time provides validation that the two phenotypes found in our data-driven approach reflect ecologically valid patterns in the data.

We identified the neural features that differentiated individuals with differences in multidomain lifestyles using a classification prediction model. Features with higher weight and consistency for the prediction model were all between-network functional connectivity features, some of which showed greater functional connectivity in NAL group, while others showed greater functional connectivity in AL group. Nodes in temporal regions and prefrontal regions belonging to the DMN and DAN were found most frequently in these discriminative features (Fig. 7c). Previous studies also found that functional connectivity in DMN and attentional networks predicted adherence to an exercise intervention and a mental training programs38,39. Meanwhile, both DMN and DAN play an important role in supporting executive function40,41, which were found to contribute to voluntary physical activity engagement42. One potential psychological mechanism behind this relationship is that executive function facilitates initiating and adhering to health behaviors through “temporal self-regulation”36, with three important determinants for physical activity engagement: physical activity prepotency, intention, and executive function43. Combining the neurobiological distinction in nodes involving DMN and DAN and behavioral difference in verbal fluency between AL and NAL in our findings, it suggests that this connection between executive control and physical activity may generalize to multidomain lifestyles, and it might be a promising attempt to examine this theoretical framework across multiple lifestyle behaviors in the future.

Most of the features with stably high importance represented enhanced between network connectivity. More between-network features showed increased functional connectivity values in the NAL group compared to the AL group (9 vs 5). Specifically, individuals from the NAL group showed greater positive correlation or less anti-correlation between DMN and other task-positive networks (e.g., DAN, SMN) than the AL group. Studies have found that higher cross-network connectivity and lower within-network connectivity are associated with cognitive decline in healthy older adults44,45. A less segregated functional brain was also associated with AD symptoms46,47. Previous studies also identified the decreased anti-correlated relationship between DMN and task-positive networks (e.g. DAN, FPN) in both normal aging population and AD patients48,49,50,51. Regarding lifestyle, older adults who engaged more in physical and cognitive activity, showed greater modularity, a measure of network distinctiveness12. Therefore, in a summary, greater functional connectivity between task-negative and task-positive networks in the NAL group might suggest a more vulnerable connectivity status linked with lifestyle behaviors.

The final classification model included a majority of functional connectivity and only a few structural features after feature selection, and the most important and consistent features all consisted of functional connectivity variables. This indicates that the relationship between multidomain lifestyle habits and brain health might be primarily supported by cognitive reserve or functional plasticity (i.e., a more adaptable functional brain with higher efficiency and flexibility for cognitive processing) rather than brain reserve (i.e., the structural characteristics of a brain that copes with pathology and function loss)52. Although this observation is not necessarily consistent with some previous multidomain lifestyle studies that identified associations between lifestyle and selective aspects of brain structure19,20,53, the discrepancy may result from different methods and populations. The current study applied a data driven method by including all neural features in the same model, while other studies examined structural and functional features separately. Moreover, all of our participants were at high risk for AD. The structural characteristic in the current sample may already differ from other studies. Some high importance nodes in DMN in middle temporal gyrus (i.e., R DMN_Temp_1 and R DMN_Temp_2) and inferior frontal (i.e., R DMN_PFCv_2)32 overlapped with the rich club hubs that showed different nodal efficiency between AD patients and healthy older adults from a previous report54, indicating potential pre-symptomatic AD-related alternation in our current sample. Therefore, it is important to focus more attention on high AD-risk samples, and regions and networks that are vulnerable to AD pathology.

There are some limitations in the current study. First, we had a relatively small sample size, which may cause increased bias for the results accuracy55 and low stability56 for machine learning models. Accordingly, we interpret with caution individual features from the machine learning models. However, our study did show biological distinction among individuals with different lifestyle profiles, which provides insight into lifestyle-brain associations. Second, this was a cross-sectional design. We identified features linked to lifestyle behaviors, but we cannot establish any causal relationship between functional connectivity and lifestyles in the current study. Analyses using longitudinal or interventional study designs are needed in future research to further disentangle the bidirectional relationship between brain health and lifestyle habits. Third, the collection time of different variables did not always occur at the same time. The neuroimaging data were collected one or two years earlier than the lifestyle variables, and exercise variables were collected one year earlier than the other lifestyle variables. We hypothesize that lifestyle habits stay consistent across these years, particularly among older adults, but we cannot exclude the potential confounding of behavioral change. Lastly, the sample consists of a majority of females, which decreased the generalizability for the results. Nevertheless, we controlled the for sex by including it as a covariate in all models in the study.

In conclusion, we identified two phenotypes of lifestyle participation and the neurobiological distinction between them. The prediction model suggested an association between multidomain lifestyles and functional connectivity features in older adults at high risk for AD. Studies with larger samples and more diverse populations are needed to further examine brain-lifestyle relationships. Future studies should allocate more attention to investigating the psychosocial and cognitive factors that promote protective lifestyles, from real-world adherence to response and behavioral change following interventions.