Introduction

Brain atrophy in Alzheimer’s disease (AD) is associated with cognitive decline and the topological spread of neurofibrillary tangles (NFT)1. Neuropathological2,3,4 and in vivo neuroimaging5,6 studies challenge the hypothesis of AD as a single entity, supporting the hypothesis of AD as a heterogeneous disease. It was recently suggested that the heterogeneity in AD can be explained using two main dimensions, severity and typicality, which emerge in the form of various biomarker and clinical expressions7. Four AD subtypes are reported in the literature based on regional atrophy and/or NFT spread: typical, hippocampal sparing, limbic predominant7,8, and minimal atrophy subtypes. However, the most urgent questions are whether the observed heterogeneity reflects different disease stages or distinct subtypes, and if these subtypes finally converge at advanced stages of the disease7.

Advances in biomarker research, data collection, and computational methods have substantially enhanced our ability to study the heterogeneity in different diseases9. These computational methods unite various in vivo pathophysiological markers to model disease heterogeneity. Research on classification of AD patients into meaningful groups with neuropathological4, neuroimaging8,10, clinical11, and biochemical12 biomarkers has shed light on the heterogeneity underlying the clinical AD diagnosis. However, current findings are based on cross-sectional analyses, which increase the chance that identified patterns reflect patient groups observed in different disease stages rather than distinct disease subtypes. A recent study modeled subtype biomarker trajectories in vivo from cross-sectional imaging datasets to implicitly infer disease stages13. That is a first step towards assessing and accounting for disease staging. However, we cannot exclude the chance that the identified patterns may still reflect different disease stages, since longitudinal information was not used for clustering, only for characterizing subtypes post hoc. This assumption is partially confirmed in models with various biomarker types (increased disease specificity) but remains unrealistic when a well-defined timescale of events for each patient is not in place. Recent reviews that presented the current approaches for identifying subtypes in heterogeneous diseases9 and summarized the existing AD subtypes in the literature point out important data and methodological limitations that need to be overcome to reach a better understanding of the heterogeneity in AD7,8,14. According to their conclusions, the field is lacking longitudinal AD subtyping based on a clear timescale (i.e., age at measurement, age at disease onset) in order to disentangle disease stages from disease subtypes.

In this study, we aimed to assess whether heterogeneity in AD’s brain atrophy patterns results from observing patients at different disease stages or reflects distinct subtypes with specific atrophy and cognitive trajectories. Longitudinal data were modeled with a longitudinal Bayesian clustering framework15 over 8 years from the clinical disease onset (a clear timescale) to assess disease staging and heterogeneity simultaneously (previous studies used only cross-sectional data). This is a significant step towards the discovery of differential atrophy trajectories in AD, using structural magnetic resonance imaging (MRI) data from four international multi-center cohorts from four continents. Only amyloid-positive AD patients were included to increase diagnostic specificity (discovery dataset). In addition, with our approach, we could assess whether atrophy subtypes7,8 converge during the disease course, a vital step towards understanding the heterogeneity in AD. Frequency predictions of the discovered atrophy patterns were performed in an external validation dataset to assess the ability of our model to classify new patients with one or two MRI timepoints available. Finally, we assessed between and within subtype differences in cognitive decline and relevant disease modifiers such as APOE genotype, education, and premorbid intelligence.

Results

Our sample included 1196 individuals (891 AD dementia patients and 305 cognitively unimpaired individuals) from four cohorts (Supplementary Table 1). The discovery and validation datasets consisted of 320 and 571 AD dementia patients, respectively. Cohort demographics are summarized in Table 1.

Table 1 Demographic and clinical characteristics of participants in the training and validation cohorts

The longitudinal gray matter patterns that we estimated for the cognitively unimpaired (CU) and AD groups show that the CU group deteriorates in gray matter with aging (Fig. 1A) and, as expected, that the AD group has more extensive atrophy (Fig. 1B). The correction method (gray matter of each AD patient standardized with respect to the CU model underlying Fig. 1A) that was applied to the AD dataset shows, at the population level, that AD presents with distinct atrophy patterns depending on the patient’s age. Patients under 65 years of age typically have more posterior cortical atrophy, while patients over 75 years old show a prototypical AD mediotemporal atrophy pattern (Fig. 1C).

Fig. 1: Atrophy at population level in the CU and AD groups.

For the calculation of cognitively unimpaired (CU) and Alzheimer’s disease (AD) atrophy patterns at the different ages (A, B), the data were z-value transformed. One mixed effect multivariate model was used to visualize the differences in atrophy between the two diagnostic labels (red color; more atrophy, yellow color; less atrophy). The upper right color legend refers to standard deviations from the sample mean (0 corresponds to the mean of AD and CU sample values). At 55 years of age, AD has seemingly similar atrophy levels to the CU population and differences show up with ageing. For the visualization of the AD data correction based on the CU sample (C), two separate mixed effect multivariate models (one for the CU sample and one for the AD sample) were used. The AD data were standardized based on the CU data. Thus, the lower color legend shows standard deviations of the AD population below the CU population (w-values, 0 corresponds to the mean of CU sample values). Younger AD patients (between 55 and 65 years of age) show more posterior atrophy compared to controls, while older AD patients (above 75 years old) show more mediotemporal and hippocampal atrophy compared to controls. For the visualization models, the fitted values in panels A, B, and C are controlled for MRI field strength and cohort.
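The w-value standardization described above can be illustrated with a minimal sketch. It assumes a single region, a plain least-squares age trend fitted in the CU sample, and toy numbers that are not taken from the study; the study itself used mixed effect multivariate models with additional covariates, so the function names and data below are hypothetical.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def w_score(value, age, cu_ages, cu_values):
    """Deviation of one measurement from the CU age trend, in CU residual SDs.

    Negative values mean the measurement lies below what the CU model
    predicts for that age (i.e., more atrophy than expected from aging).
    """
    a, b = fit_line(cu_ages, cu_values)
    residuals = [v - (a + b * t) for t, v in zip(cu_ages, cu_values)]
    sd = stdev(residuals)  # simple sample SD of the CU residuals
    return (value - (a + b * age)) / sd

# Toy CU sample (hypothetical cortical thickness in mm), not study data.
cu_ages = [60, 60, 70, 70, 80, 80]
cu_values = [3.1, 2.9, 2.9, 2.7, 2.7, 2.5]

# A hypothetical 70-year-old patient with a thickness of 2.5 mm.
patient_w = w_score(2.5, 70, cu_ages, cu_values)
```

In this toy case the patient falls well below the CU trend, so `patient_w` is negative, which matches the convention in panel C where 0 corresponds to the CU mean.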

Clustering evaluation

Longitudinal clustering showed that the 2-cluster and 5-cluster models were optimal, with marginal differences between them. The 2-cluster model was preferable for one clustering criterion (fewer random effect parameters with high autocorrelation in their MCMC samples) while the 5-cluster model was more favorable for another (lower model deviance) (see Supplementary Table 2). The other clustering solutions had worse quality score combinations (either many autocorrelated MCMC samples or high model deviance) (Supplementary Table 2)15. The 2-cluster solution (Supplementary Fig. 1, fitted values) separated the discovery set only in terms of cortical severity (high versus low brain atrophy), whereas the 5-cluster solution (Fig. 2, fitted values) revealed spatially different atrophy subtypes. Since different spatial atrophy subtypes are of greater importance from an exploratory perspective and given the previous literature in AD subtypes7, we chose to interpret the results of the 5-cluster solution.
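One way to read the two-criterion comparison above is as a search for solutions that no other solution beats on both criteria at once. The sketch below is only an illustration of that idea, not the selection procedure of the clustering framework; the scores are invented placeholders, not the values in Supplementary Table 2.

```python
def pareto_front(solutions):
    """Return the cluster counts that are not dominated on both criteria.

    `solutions` maps number of clusters -> (autocorrelated_params, deviance);
    lower is better for both entries.
    """
    front = []
    for k, (autocorr, deviance) in solutions.items():
        dominated = any(
            a <= autocorr and d <= deviance and (a < autocorr or d < deviance)
            for j, (a, d) in solutions.items()
            if j != k
        )
        if not dominated:
            front.append(k)
    return sorted(front)

# Hypothetical quality scores for illustration only.
toy_scores = {
    2: (3, 950.0),
    3: (9, 940.0),
    4: (12, 930.0),
    5: (6, 900.0),
    6: (15, 910.0),
}

candidates = pareto_front(toy_scores)
```

With these placeholder scores, two solutions remain as candidates, each preferred by one criterion, and the final choice between them then rests on interpretability, as in the text.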

Fig. 2: Fitted values for cortical thickness and subcortical volumes for the different longitudinal patterns of atrophy from AD onset.

Atrophy-fitted values from clinical AD onset. Each row represents one cluster of patients with the corresponding pattern of atrophy. The color scale illustrates cortical thinning and subcortical volume loss compared to Aβ negative, cognitively unimpaired (CU) individuals (red color; more atrophy, yellow color; less atrophy). Data are w-value transformed and therefore colors represent standard deviations below the CU group controlled for aging. Fitted values are fixed for intracranial volume and MRI scanner field strength.

Cluster atrophy patterns and discriminant features

In the discovery dataset, we found five clusters of patients that showed gradual or steep longitudinal atrophy progression (Fig. 2). The largest cluster, minimal atrophy (MA, 59.1%), had very little mediotemporal atrophy at the clinical AD onset compared with the CU group (Fig. 2, 1.6 standard deviations below the CU population, Supplementary Fig. 2, 0.5 standard deviations below the CU population). It progressed slowly with entorhinal and hippocampal involvement that extended to other temporal lobe regions. The second largest cluster, limbic predominant atrophy (LPA, 29.1%), presented with entorhinal cortex atrophy at the clinical onset, with later involvement of other temporal lobe regions including the hippocampus. The third cluster, LPA+ (7.2%), was spatially similar to the LPA cluster but exhibited more atrophy in the entorhinal cortex at the AD onset. Atrophy progressively extended to the temporal lobe and then further to the rest of the cortex. We also found a cluster, diffuse atrophy (DA, 1.6%), with temporal and frontal involvement already at AD onset, where the atrophy diffused fast during the disease course. The last cluster, hippocampal sparing (HS, 3.1%), had parietal lobe atrophy and no involvement of medial-temporal lobe structures at disease onset, but fast atrophy progression. The MA and LPA patterns converged to widespread temporal lobe atrophy while the LPA+ converged to DA seven years after the disease onset. The most atypical atrophy pattern, HS, also progressed to a more diffuse atrophy pattern over time but with less involvement of the hippocampus. The cluster names were decided based on the atrophy pattern at AD onset. Table 2 provides a four-dimensional characterization of each subtype to illustrate how the patterns of atrophy and cognition evolve over time (see also Supplementary movie 1, Table 3, Fig. 3).

Table 2 Cluster/cognitive profiles summary
Table 3 Cluster characteristics
Fig. 3: Cluster-specific cognitive trajectories after the clinical onset of dementia.

The trajectories are estimated with mixed effect models to account for intra subject and cohort variability. MMSE Mini-Mental State Examination, ADAS Alzheimer’s disease assessment scale. Dotted lines represent 95% confidence intervals.

The cluster intercepts (AD onset) showed that the HS and DA clusters exhibit considerably thinner cortex in the parietal lobe than the other three clusters (Figs. 2 and 4). The LPA cluster has less entorhinal atrophy than the LPA+. Regarding the cluster slopes (atrophy evolution over time), the posterior cingulate gyrus, pars opercularis and pars orbitalis gyri, and insula discriminate both DA and HS from the other three clusters (Figs. 2, 4, Supplementary Fig. 2). The atrophy slopes of the HS cluster were the steepest, followed by the DA and the LPA+ clusters.

Fig. 4: Longitudinal clustering model cluster-mean intercept and slope atrophy coefficients.

Each row of the heatmap is grouped in terms of neuroanatomical spatial position (red color; more atrophy, yellow color; less atrophy). Columns that represent different clusters are grouped in terms of similarity between clusters. Vertical lines within cells represent cluster region mean ROI value (the vertical dotted line represents the value 0, no difference from the CU sample). The diffuse atrophy cluster has the lowest intercept and it is not grouped with any other cluster. The cluster slopes of the diffuse atrophy and hippocampal sparing clusters are grouped together. The minimal atrophy and limbic predominant atrophy/limbic predominant plus clusters are grouped together.

The five longitudinal patterns of atrophy (Fig. 2) revealed a fine grouping that included variations in the stereotypical distribution of atrophy staging in AD5 compared to the 2-cluster solution (Supplementary Fig. 1). In Table 3, we have summarized the longitudinal patterns of atrophy to show the different features of the five longitudinal patterns and the patient characteristics related to them. After the main cluster analysis, the post hoc hierarchical clustering of cluster-specific atrophy intercepts and slopes (Fig. 4, slope dendrogram and figure legend) revealed quantitatively that MA, LPA, and LPA+ have a similar spatial distribution of atrophy over time (but different atrophy levels at the AD onset and different rates of atrophy progression), starting in the mediotemporal lobe and spreading further into the neocortex. The HS pattern follows another spatial atrophy distribution, starting in cortical regions. The DA cluster is quantitatively grouped together with the HS pattern but expresses both progression atrophy patterns since we observed it in a later disease stage (already widespread atrophy).

Cluster characteristics

The percentages of patients from each cohort in the five clusters were similar (Table 3). In the discovery dataset, MA had the highest prevalence of APOE e4 carriers (75%), while HS had the lowest (40%). Patients in the DA and HS clusters had higher education levels (>15 years) followed by patients in the MA, LPA, and LPA+ (≤15 years). Using the MA (the largest cluster in the dataset) as reference group we found significantly lower American National Adult Reading Test (ANART) scores in LPA+ and HS (p < 0.05). Mini-Mental State Examination (MMSE) at AD onset was significantly worse for LPA (p < 0.05) (Fig. 3). Longitudinally, LPA+ and HS had the fastest decline in MMSE (p < 0.05). Regarding the Alzheimer’s disease assessment scale (ADAS-cog) subscales, memory (word recall) was initially lower in LPA and LPA+ had the fastest decline over time in that domain. Language (following commands) and praxis (constructional) were significantly worse for the HS than the other clusters at AD onset. Orientation (ADAS) was worse for the LPA+ at AD onset.

In the model validation, no differences in amyloid-β (Aβ) status between clusters were found. Information regarding patient medical history was available for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Japanese Alzheimer’s Disease Neuroimaging Initiative (J-ADNI), but not for the Australian Imaging, Biomarkers and Lifestyle study (AIBL) or the AddNeuroMed cohorts. A summary of the cluster medical history characteristics can be found in Supplementary Table 3. The distribution of disease duration at MRI visit for each cluster is presented in Supplementary Table 4.

Intercept and slope covariance matrices

MA had the greatest total nodal strength and was used as a reference group for pairwise cluster comparisons of intercepts and slopes. The nodal strength of the LPA and LPA+ was lower, with few exceptions (Fig. 5). The DA had higher nodal strength in only a few medial (frontal, temporal, and occipital) brain regions (intercepts and slopes) and the HS had higher nodal strength at the intercept of some ventromedial prefrontal and medial temporal regions. Cluster-specific intercept and slope covariance matrices are shown in Supplementary Fig. 3.

Fig. 5: Comparison of cluster-specific covariance matrices with node strength.

The cluster-specific intercept (A, C, E, and G) and slope (B, D, F, and H) covariance matrices were compared with network theory. Sphere diameter shows the node strength of each region. The regions where the minimal atrophy cluster has higher nodal strength than the other clusters are shown in red, while blue is used in the opposite case. The networks are presented in lateral sagittal and transversal view.
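As a rough illustration of the node strength comparison, the sketch below treats each covariance matrix as a weighted network and sums the absolute off-diagonal weights per region. The use of absolute values, the region names, and the matrices are assumptions made for this example and are not taken from the study.

```python
def node_strength(matrix):
    """Sum of absolute off-diagonal weights for each node of a square matrix."""
    n = len(matrix)
    return [
        sum(abs(matrix[i][j]) for j in range(n) if j != i)
        for i in range(n)
    ]

def strength_difference(reference, other):
    """Per-node strength of `reference` minus per-node strength of `other`."""
    return [r - o for r, o in zip(node_strength(reference), node_strength(other))]

# Hypothetical 3-region covariance matrices for two clusters.
regions = ["entorhinal", "hippocampus", "precuneus"]
reference_cluster = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
comparison_cluster = [
    [1.0, 0.1, 0.4],
    [0.1, 1.0, 0.2],
    [0.4, 0.2, 1.0],
]

differences = dict(zip(regions, strength_difference(reference_cluster, comparison_cluster)))
```

A positive difference corresponds to a region where the reference cluster has higher nodal strength (red in the figure), and a negative difference to the opposite case (blue).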

Model validation

Our model was validated in two ways. First, we used an independent external dataset of unseen patient MRIs to assess whether the classification of new data into one of the five longitudinal atrophy patterns yields sensible results. Second, we applied clustering separately to the ADNI and J-ADNI/AIBL datasets.

The cluster probabilities show that few patients had a high probability of belonging to more than one cluster in the discovery dataset (Supplementary Table 5), and even fewer patients in the validation dataset (0.009% of the dataset, Supplementary Table 6, Supplementary Fig. 4). Finally, median cortical and hippocampal atrophy at the median disease duration for each cluster in the validation dataset showed high similarity to the model’s fitted values at the same disease stage (Fig. 6, Supplementary Fig. 5).
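The idea of checking whether a patient has a high probability of belonging to more than one cluster can be sketched as follows. The threshold of 0.4 and the probability values are hypothetical choices for illustration; the study does not state the cut-off used here.

```python
def assign_cluster(probabilities, ambiguity_threshold=0.4):
    """Pick the most probable cluster and flag ambiguous assignments.

    `probabilities` maps cluster label -> membership probability for one
    patient. The assignment is flagged as ambiguous when the second most
    probable cluster also reaches `ambiguity_threshold`.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    best_label = ranked[0][0]
    ambiguous = len(ranked) > 1 and ranked[1][1] >= ambiguity_threshold
    return best_label, ambiguous

# Hypothetical membership probabilities for two patients.
clear_case = {"MA": 0.90, "LPA": 0.08, "LPA+": 0.01, "DA": 0.005, "HS": 0.005}
mixed_case = {"MA": 0.52, "LPA": 0.45, "LPA+": 0.01, "DA": 0.01, "HS": 0.01}

clear_result = assign_cluster(clear_case)
mixed_result = assign_cluster(mixed_case)
```

Counting the flagged patients and dividing by the dataset size would give a proportion comparable in spirit to the small percentages reported in the supplementary tables.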

Fig. 6: Comparison of model-fitted values and validation dataset atrophy levels.

Atrophy-fitted values after the AD onset for the trained clustering model versus the new validation dataset. New observations were classified to each cluster, and median disease duration was calculated. Then atrophy-fitted values at the median disease duration of each cluster were calculated through the clustering model (middle column). Median atrophy maps (group median atrophy) for the new data of each cluster are presented in the left column. The right column shows the hippocampal volumes of each cluster’s (boxplot colors: green; minimal; n = 420, olive; limbic predominant; n = 283, orange; limbic predominant+; n = 12, blue; diffuse; n = 13, purple; hippocampal sparing; n = 8) new observations (repeated measurements are included) and the model-fitted value hippocampal atrophy (green vertical line). The color scale of the cortical maps (left and middle column) reflects AD atrophy levels compared to a multicohort dataset of Aβ negative cognitively unimpaired (CU) individuals (red color; more atrophy, yellow color; less atrophy). Data are w-value transformed and therefore colors represent standard deviations below the CU group controlled for aging. Fitted values are fixed for intracranial volume and MRI scanner field strength.

Moreover, when clustering was applied separately to ADNI and J-ADNI/AIBL datasets, the former dataset showed five and the latter dataset showed four different patterns of atrophy (Supplementary Figs. 6 and 7 (uncorrected version of Supplementary Fig. 6), Supplementary Tables 7, 8). The atrophy patterns found in the separate cohorts were similar to the overall discovery dataset, including MA, LPA, LPA+, DA, and HS cases. Quantitatively, MA is more similar (in terms of intercept and slope distances) to the ADNI cluster 3 and J-ADNI/AIBL cluster 3, LPA is more similar to ADNI cluster 2 and J-ADNI/AIBL cluster 2, LPA+ is more similar to ADNI cluster 1 and J-ADNI/AIBL cluster 3, DA is more similar to ADNI cluster 4 and J-ADNI/AIBL cluster 4, and finally HS is more similar to ADNI cluster 2 and J-ADNI/AIBL cluster 1 (Supplementary Figs. 6, 7, Supplementary Tables 7, 8). The similarities between the ADNI and J-ADNI/AIBL datasets can be found in the supplementary analysis (Supplementary Fig. 7).
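The matching of discovery clusters to cohort-specific clusters by intercept and slope distances can be sketched as a nearest-neighbour search. This example assumes Euclidean distance on concatenated intercept and slope coefficients; the distance measure actually used is not specified in this section, and all coefficient values below are invented.

```python
import math

def nearest_cluster(reference, candidates):
    """Return the candidate label whose coefficient vector is closest to `reference`.

    Vectors are assumed to hold intercepts followed by slopes for the same
    ordered set of regions; closeness is Euclidean distance.
    """
    return min(candidates, key=lambda label: math.dist(reference, candidates[label]))

# Hypothetical coefficients: two regional intercepts, then two regional slopes.
discovery_pattern = [-0.5, -0.4, -0.1, -0.1]
cohort_clusters = {
    "cluster 1": [-2.0, -1.8, -0.4, -0.5],
    "cluster 2": [-1.2, -1.0, -0.2, -0.2],
    "cluster 3": [-0.6, -0.5, -0.1, -0.1],
}

best_match = nearest_cluster(discovery_pattern, cohort_clusters)
```

Because each discovery pattern is matched independently, two discovery patterns can map to the same cohort cluster, which is consistent with the repeated cluster numbers in the text when a cohort yields fewer clusters.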

Discussion

A major contribution of this study is the transition from a cross-sectional understanding of AD subtypes to the perspective brought by longitudinal clustering. Some of the previously reported AD subtypes seem to reflect different stages of the disease that can be observed in our five estimated longitudinal atrophy patterns. Hence, our data contribute a step towards solving the long-lasting problem of disentangling disease stages from actual disease subtypes. This was enabled by modeling longitudinal data using a clear timescale, i.e., over eight years, from disease onset in a large multiethnic cohort of 891 AD dementia cases from four continents. Another important finding is that AD subtypes with clearly distinct atrophy trajectories may converge in late disease stages. This introduces a new understanding of neurodegeneration in AD, which combined with knowledge of neuropathological and clinical heterogeneity, could set the ground for future personalized predictions of biological changes and cognitive decline in AD.

At the modeled clinical disease onset, our method successfully identified the same patterns of atrophy previously identified in neuropathological and neuroimaging subtyping studies (minimal atrophy, limbic predominant, typical AD, and hippocampal sparing)5,7,8,13,16. Our results revealed two main pathways of atrophy. We introduce the term pathway to describe AD patients that show similar spatial distribution of atrophied brain regions over time. Within the same atrophy pathway, patients may progress faster (LPA+) than others (LPA and MA) but their spatial distribution of atrophy over time is similar. This pathway contrasts with the second atrophy pathway in AD, which has a different spatial distribution with mainly cortical atrophy over time. The differences in progression rates also reflect the rates of cognitive decline of the patients. It is a very important future aim to understand the factors underlying these differences in progression within the same pathway but also between the different pathways that we have identified.

The minimal atrophy (atrophy limited to the entorhinal cortex), the limbic predominant (atrophy mainly in limbic areas), and the typical (widespread atrophy in the hippocampus, temporal, parietal, and frontal lobes) AD subtypes16 were identified in some disease stage of our MA, LPA, or LPA+ longitudinal atrophy clusters. MA was the most representative cluster in the datasets under investigation and it had the highest variability within cluster. Clustering methods often identify one cluster that represents the most prevalent pattern in a dataset, which is an average of more heterogeneous observations than the pattern that results from the remaining clusters in the dataset16. It is important to stress that our MA cluster includes patients that are grouped in the minimal and limbic predominant patterns of atrophy, and potentially some early stage typical AD patients reported in the literature7. This is the case, since in our study we model trajectories of atrophy from the disease onset accounting for longitudinal structural changes in CU Aβ negative subjects. Through this type of modeling, we connected patterns of atrophy from the literature by modeling atrophy trajectories and therefore disease staging explicitly. Our MA and LPA clusters probably belong to the same AD subtype observed in two distinct stages, since MA patients reached the LPA levels (baseline) two years after the AD onset. The differences in cognitive intercepts (MMSE and ADAS word recall) between our MA and LPA clusters support the view that they reflect different disease stages. The LPA+ cluster appears to be on the same atrophy pathway but with faster atrophy rates in comparison to the MA and LPA clusters. Patients in the LPA+ cluster had the steepest decline in cognition among the five identified clusters, including memory and orientation. LPA+ patients had similar APOE e47, education and disease onset as in MA and LPA. However, premorbid intelligence, a proxy for cognitive reserve17, was significantly higher in LPA+ than in MA and LPA. We believe that due to high cognitive reserve, patients of the LPA+ cluster can reach higher levels of brain atrophy than the MA and LPA clusters, while maintaining similar clinical severity until they reach the AD onset17. The dynamics of brain atrophy over time in the MA, LPA, and LPA+ clusters differed. However, our current data seem to indicate that these three longitudinal atrophy clusters belong to the same atrophy pathway in AD, namely the mediotemporal atrophy pathway. Atrophy in this well-documented pathway is shown to correlate with the neurofibrillary tangle pathology at autopsy1,5,18. Even though these three clusters (MA, LPA, and LPA+) belong to the same atrophy pathway, their rates of atrophy and cognitive decline differ substantially, which can have important clinical implications. These observed differences are likely due to a combination of protective and risk factors as well as potential concomitant non-AD brain pathologies7. For example, it was shown by Ferreira and colleagues that the location and frequency of markers of small vessel disease differ between AD subtypes19.

Our HS cluster resembles the hippocampal sparing subtype described in previous neuropathological and neuroimaging subtyping studies5,7,8,13,16. This subtype is more often characterized by cortical atrophy in comparison to the other AD subtypes7,8,16,18. In our study, some characteristics of the HS cluster included steep atrophy trajectories, a lower frequency of the APOE e4 allele7, high premorbid intelligence, more years of education, and early AD onset, which is in line with the characteristics associated with the hippocampal sparing subtype reported by previous studies7,8,13,16. This cluster had the lowest frequency, which is also in line with previous studies7,8. The chances of finding more hippocampal sparing patients were reduced since the cohort selection criteria included the amnestic phenotypic presentation of AD, which is frequently related to typical AD and thus the mediotemporal atrophy pathway4. The significantly affected constructional and ideational praxis is a key characteristic of the hippocampal sparing subtype7,13,16, which was also confirmed in our study. Comparisons between our MA and HS cluster covariance patterns revealed network differences between these two groups. In the MA, anatomical differences due to the disease were predominantly localized in the medial-temporal lobe and cortical regions combined as a network at the AD onset. On the other hand, the HS cluster network differences at the AD onset also involve the basal ganglia. Moreover, the HS cluster had higher nodal strength than the MA cluster at the intercept of some ventromedial prefrontal and medial temporal regions. Based on all these results, we believe that the HS pattern of atrophy represents a distinct atrophy pathway in AD, namely the cortical pathway.

Explaining the atrophy trajectories of our DA cluster is challenging, since excessive frontal and temporal atrophy was already present at the clinical onset. Our data showed that in advanced stages of the mediotemporal and cortical pathways of atrophy, AD patients may develop levels of atrophy comparable to those of our DA cluster. As a result, this cluster of patients can potentially belong to either of the two pathways of atrophy. Similarly to our LPA+, cognitive reserve in our DA cluster (education exceeded 15 years on average) may explain the greater atrophy levels (at dementia onset)7,17. Our DA cluster had a similar pattern of atrophy to that of the typical AD atrophy subtype reported in the literature7,8,13,16, but lower frequency. In a recent cross-sectional clustering study using tau PET that mainly included preclinical AD, no cluster had spatial tau distribution similar to the typical AD pattern of atrophy, but the cortical and medial-temporal patterns of tau were observed10. Further, two other studies in prodromal AD found clusters of individuals with decreased temporal-parietal glucose metabolism20 or increased temporal-parietal atrophy21 (typical AD pattern), but in low sample frequencies, which is in line with our findings.

Recently, it was proposed that Aβ aggregation in the default mode network (DMN) is predominantly associated with within-network but distant glucose hypometabolism22. Moreover, glucose metabolism, atrophy, and tau pathology are closely linked in AD7,18,22. We speculate that the mediotemporal path of neurodegeneration in AD may be initiated in the vulnerable temporal lobe after enough Aβ is deposited in distant DMN regions. In contrast, the cortical atrophy pathway patients may show less initial temporal lobe atrophy (and amnestic symptomatology) partially because they respond differently to Aβ aggregation in the DMN due to compensation mechanisms22 such as cognitive reserve17.

Our study has addressed some important methodological challenges that the existing literature of biological subtypes has not overcome so far. To our knowledge, this is the first time that AD atrophy subtypes were discovered based on modeling longitudinal biomarker trajectories8. An immediate advantage of our longitudinal clustering approach is that it overcomes the assumption that subjects of a cluster (cross-sectional analysis) remain in the same cluster when the disease advances, which is unrealistic8. Previous studies have employed arbitrary timescales to model biomarker progression8,10,13. Our estimates are based on a clearly defined timescale, namely the time from clinical onset. This approach provides the unique possibility to generate interpretations based on disease staging that help to trace abnormal changes early in the disease course of each cluster. Previously, longitudinal interpretations could not directly relate back to data in hand because they were not anchored to a specific timescale13. We calculated atrophy w-values for each patient corrected for the effects of aging in brain morphology based on a dataset of longitudinal Aβ negative CU individuals. Our model for the correction of ageing effects on the atrophy values, as it was shown in the results, identified the excess atrophy due to AD at different ages correctly and is in line with the literature comparing early and late onset AD23. This approach helped to estimate the within-subject variance more precisely and therefore account for the effects observed in aging9,15,24, which has been a limitation of cross-sectional estimations9,16,18. A common pitfall of clustering studies is to focus on finding labels for observations depending on their features in a population, which tends to overfit the training set. External validation datasets help to assess the ability of clustering models to generalize8.
We found that our longitudinal atrophy estimates and the unseen atrophy patterns in the validation dataset were highly concordant. Moreover, the application of longitudinal clustering separately in the ADNI and J-ADNI/AIBL cohorts showed similar longitudinal atrophy patterns to those found in the whole discovery dataset with small variations. The low sample percentages that some clusters exhibited are attributed to the underrepresentation of rare subtypes in some cohorts that focused on the typical AD phenotype, the lower sample that was used in the separate cohorts for clustering, and to the ability of our method to identify clusters of very low prevalence if they exist15. Concordance was high for the most prevalent atrophy patterns and lower for DA and HS, due to low sample sizes and cohort differences. Between ADNI and J-ADNI/AIBL cohorts, a quantitative assessment showed increased similarity in longitudinal atrophy trajectories, with small variations due to small sample sizes and cohort variability. Of interest, the hippocampal sparing and diffuse atrophy patterns of atrophy were found in both datasets but with lower prevalence than in the complete discovery dataset. This happened due to the split of the discovery dataset into smaller datasets that underrepresent the AD population. AD subtypes of lower prevalence in the population7 are doomed to be underrepresented or to disappear when clustering is applied to small datasets9. The combined analysis of the cohorts in the discovery dataset with one model, instead of building one clustering model per cohort, allowed us to build a single statistical model that produced more accurate estimates due to a larger sample size. Importantly, since our study was mainly based on longitudinal information from repeated cross-sectional measurements, we avoided interpreting structural relations between brain regions based on cross-sectional correlations. Instead, we focused only on the longitudinal correlation between brain regions, which is based on within-patient longitudinal trajectories.

Our study has some limitations. Only atrophy markers were modeled in the context of AD heterogeneity. Pre-AD scans were not included, which reduced our ability to infer atrophy patterns that precede the diagnosis of AD dementia. In the future, we envision combining and comparing other imaging modalities longitudinally, thus extending our current analyses to incorporate information about tau-related pathology. Moreover, adding biomarkers of non-AD pathologies to the design of future clustering studies will help in understanding the contribution of comorbidities to AD subtypes. The inclusion of subjects from four different continents is a strength since it increased the variability in the sample and therefore represented the AD population better, but it is also a limitation due to variability in MRI assessments, which we reduced by harmonizing the cohorts. Another limitation is the short follow-up period for AD patients included in the study. A future re-estimation of atrophy trajectories will include more MRI visits per patient to obtain better estimates. However, a strong methodological aspect of this study is the reconstruction of longitudinal subtype-atrophy profiles over the dementia part of the AD continuum, based on longitudinal individual patients’ data that comprised short segments of the disease continuum. Future studies should also include multiple MRIs from patients who are followed up from the preclinical until the dementia stage. Beyond these limitations, we assumed that the CU population has homogeneous brain morphology. Future studies should investigate whether CU individuals age differently and incorporate this information in the context of AD heterogeneity.

In conclusion, based on a large multiethnic cohort of AD dementia patients, we discovered five longitudinal patterns of brain atrophy that group the previously reported AD subtypes into two atrophy pathways (one mediotemporal and one cortical). We introduced a different understanding of the neurodegenerative aspect of AD heterogeneity by shifting from the cross-sectional understanding of AD subtypes to the perspective brought by longitudinal clustering. Our study is a step toward answering an urgent question: whether the observed heterogeneity in AD reflects disease stages or distinct biological subtypes. We believe that our proposed model can help to unravel the heterogeneity in AD, thus enabling precision medicine and potentially leading to successful disease-modifying treatments in the future.

Methods

Study design and participants

This study includes 891 AD dementia patients and 319 CU individuals from four international multi-center cohorts: the Alzheimer’s Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu), Japanese ADNI (J-ADNI, https://humandbs.biosciencedbc.jp/en/hum0043-v1)25, AddNeuroMed (https://consortiapedia.fastercures.org/consortia/anm/)26, and the Australian Imaging, Biomarkers and Lifestyle study (AIBL, Australian ADNI, https://aibl.csiro.au/)27 (Table 4). The AD inclusion criteria of the four cohorts were similar since the research protocols of J-ADNI, AIBL, and AddNeuroMed were designed to be comparable with ADNI (Supplementary material, p. 1–2). All participants provided written informed consent in accordance with the Declaration of Helsinki, and approval for the studies was obtained from the local ethics committees.

Table 4 Characteristics of the cohorts included in the study

Only \(A\beta\) positive AD patients (ADNI, J-ADNI, AIBL) were included in the discovery cohort to ensure that the identified clusters reflect AD pathology (Table 4). CU individuals were \(A\beta\) negative (to exclude preclinical AD) and remained CU during all future cognitive assessments available to date (Table 4). Participants in the discovery dataset had more than two MRI visits (Supplementary Table 1, Supplementary Fig. 8), while those in the validation dataset had at least one visit (ADNI, J-ADNI, AIBL, AddNeuroMed). Some patients from the validation dataset (AddNeuroMed) had access to more than one MRI visit (Supplementary Table 1).

Magnetic resonance imaging (MRI)

The J-ADNI, AddNeuroMed, and AIBL cohorts adopted the MRI protocol of ADNI. High-resolution sagittal 3D T1-weighted Magnetization Prepared Rapid Gradient Echo (MPRAGE) volumes (1.5 T and 3 T) with full brain and skull coverage were acquired, and detailed quality control (QC) was applied to the original images. Images were processed with the longitudinal stream of FreeSurfer 6.0 through TheHiveDB28. The FreeSurfer parcellations and segmentations were checked manually by a trained rater to exclude poor-quality outputs that would introduce noise into the results. Thickness of 34 cortical regions (Desikan atlas) and volumes of seven subcortical regions per hemisphere (Supplementary Table 9) were extracted and averaged between hemispheres. These regions were used as input for clustering. Estimated total intracranial volume (eTIV) was also extracted to account for differences in head size in volumetric measures.

Longitudinal clustering analysis

Statistical analysis consisted of three steps (Supplementary Fig. 9). In the first step, we estimated mean volume/thickness levels of the CU individuals (for the age span 50–90) in the discovery dataset based on linear mixed effect models. This was followed by calculations of w-values29, which are z-values adjusted for age and cohort, for the discovery and validation datasets based on the CU mixed effect models. In the CU mixed effect models (one model for each of the 41 brain regions averaged across the left and right hemispheres), volume/thickness per brain ROI was used as the response, cohort and subject ID as random effects, and age as a fixed effect. Adding cohort as a random effect in these models enabled us to make individual average volume/thickness predictions for the effects of the ADNI, J-ADNI, and AIBL cohorts and to use the population mean that corresponds to all individuals to harmonize the data of the AddNeuroMed cohort. The addition of the cohort random effect at this step of the analysis allows for future classification of MRI data from new cohorts to the identified longitudinal clusters. Adding age as a fixed effect allowed us to accurately estimate the anatomical changes in the 41 brain regions due to aging, since the CU dataset consisted of amyloid-negative healthy controls with up to nine MRI visits and a CU diagnosis at all of their follow-ups. The mean volume/thickness at any age (the expected fitted value of the mixed effect model for a specific cohort and chronological age) and its standard deviation (residual plus random effects standard deviation) were used to calculate w-values for AD patients. Consequently, w-values in our AD group (both discovery and validation datasets) reflect brain atrophy that is caused by the disease, free from the anatomical features of healthy aging and from cohort effects. To visually inspect this correction method, we employed a multivariate mixed effect model30 and visualized the results. After this correction, the effect of disease is what remained in the AD dataset to be assessed.
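As an illustration of the first step, the w-value calculation can be sketched as follows. This is a minimal sketch and not the pipeline used in the study: the function names, the linear form of the age effect, and all numeric values are hypothetical, and combining the residual and random-effects standard deviations through their variances is an assumption about how the two are pooled. A cohort offset of zero corresponds to using the population mean, as described for AddNeuroMed.

```python
import math

def expected_cu_value(age, intercept, age_slope, cohort_offset=0.0):
    """Expected CU volume/thickness at a given age (hypothetical linear age effect).

    cohort_offset is the cohort random effect; 0.0 means the population mean,
    which is what a cohort absent from the CU model would receive.
    """
    return intercept + age_slope * age + cohort_offset

def w_value(observed, expected, resid_sd, random_sd):
    """Deviation of an observed measure from the CU norm, in SD units."""
    # Assumption: the two SD components are pooled as variances.
    total_sd = math.sqrt(resid_sd ** 2 + random_sd ** 2)
    return (observed - expected) / total_sd

# Hypothetical numbers for one region of one AD patient aged 70.
expected = expected_cu_value(age=70, intercept=4.0, age_slope=-0.01, cohort_offset=0.2)
w = w_value(observed=2.9, expected=expected, resid_sd=0.3, random_sd=0.4)
print(round(w, 2))
```

A negative w-value indicates a volume or thickness below what is expected for a CU individual of the same age and cohort.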

In the second step, we applied an in-house pipeline for longitudinal clustering to the discovery dataset15,31. The multivariate mixture of generalized mixed effects clustering model32 incorporates Bayesian inference to explore heterogeneity in the longitudinal brain data. We applied this model to brain volume/thickness, but it has already been used in other applications31,33,34,35,36,37,38,39,40,41,42. The Bayesian approach allowed the implementation of a complex hierarchical model for each cluster, where probabilistic mixture modeling (number of clusters) is combined with mixed effect modeling (number of brain regions) in one model definition. The covariance between random effects (brain regions) for each cluster was also modeled, and thus the outcome included structural and functional relationships between brain regions at a cluster level. This implementation allows for the inclusion of random and fixed effects. Time (the timescale) was modeled by cluster-specific random intercepts and slopes. Disease duration (the time between age at AD onset and age at the MRI session) was used as the timescale variable (cluster-specific random effect) in the clustering model. Since the availability of disease duration as a continuous measure in months was the initial inclusion criterion for AD patients in this study, the timescale is common and disease duration is available for all AD patients in the model. The fixed effects modeled the population-level effects. MRI field strength and eTIV were used as fixed effects, assuming that they vary and follow a trend in the population but not between clusters. Following this approach, we accounted for some important effects that can bias the clustering algorithm9. 
Since the longitudinal clustering is based on traditional model-based probabilistic clustering (mixture of Gaussian distributions), we also estimated patient-cluster probabilities that reflect the chances of a patient belonging to each cluster given their response vector (random intercepts and slopes)43. Patients with similar volume/thickness patterns at AD onset and similar progression over time were clustered together. Post-clustering, we assessed how many AD volume/thickness patterns exist at clinical AD dementia onset, the rate of atrophy per year for each pattern, and the frequency of each pattern in the population. Although internal measurements of clustering quality (Silhouette, CH, and others) already exist for clustering assessment, they do not provide enough information when longitudinal data are clustered. Instead, model-fitting information, including model deviance and the percentage of MCMC chains with higher autocorrelation than the majority of chains, was used to assess clustering quality15,31. The model was initially fitted with linear slopes. After a sufficiently long simulation, the parameters were saved and then used to initialize the optimization again, but with the addition of quadratic terms. The second optimization step aimed to model the atrophy plateau that occurs after long disease duration. By adding the quadratic terms in this stepwise manner, we avoided the risk that all trend parameters (slopes and quadratic terms) are poorly optimized, which is common in models with many parameters44, such as our model-based clustering approach. Smoothed median-fitted value maps with volume/thickness at AD onset and for 8 consecutive years were calculated (using longitudinal information from patients at any time during the first 8 years of clinical disease duration) to characterize cluster volume and thickness tendency. Two different thresholds were used for the atrophy maps: 1.6 standard deviations below the CU normative values45, and 0.5 standard deviations below the CU normative values (a less conservative threshold).
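To make the trajectory and thresholding description concrete, the sketch below evaluates a quadratic trajectory for one region of one cluster from AD onset to year 8 and flags it against the two thresholds. It is only illustrative: the parameter values are hypothetical, time is expressed in years rather than months for readability, and the use of a strict inequality at the thresholds is an assumption.

```python
def fitted_w(intercept, slope, quad, years):
    """Cluster-level fitted w-value after a given disease duration (in years)."""
    return intercept + slope * years + quad * years ** 2

def atrophy_flags(w, conservative=-1.6, liberal=-0.5):
    """Flag a fitted w-value against the two atrophy-map thresholds."""
    return {"conservative": w < conservative, "liberal": w < liberal}

# Hypothetical parameters for one region in one cluster. The positive
# quadratic term flattens the decline at long disease durations.
params = {"intercept": -0.4, "slope": -0.3, "quad": 0.015}

for year in range(0, 9):
    w = fitted_w(years=year, **params)
    print(year, round(w, 2), atrophy_flags(w))
```

With these hypothetical values the region crosses the less conservative threshold within the first year and the conservative threshold only around year 6, which shows how the two maps can differ for the same trajectory.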

In the third step, we used the discovery set model as a classifier to assess the chance of each patient in the validation dataset belonging to any of the defined clusters46. We used the validation dataset for two reasons. First, since the validation dataset includes mainly patients with one MRI visit (79% of patients), we aimed to understand whether the longitudinal model can use this cross-sectional information to accurately assign patients to the longitudinal clusters. To assess the accuracy of this assignment, we calculated median volume/thickness images for all patients in each cluster of the validation set separately. Then, we compared those median images with the fitted values of our model from the second step (estimated at the median disease duration in months of the validation set for each cluster) to make an approximate assessment of how well new AD patients’ data are classified. This helped us to increase the transparency of the supervised classification procedure and assess the model’s ability to make relevant patient assignments into clusters. Second, by predicting cluster assignment in the validation dataset we were able to increase the size of the final clusters (pooled discovery and validation datasets) and make more accurate estimations of the cognitive profiles (and other characteristics) of the AD patient clusters. A further validation of the clustering method involved the application of the second step of the analysis independently in the ADNI and J-ADNI/AIBL datasets, to assess the volume/thickness patterns in the different datasets and their agreement with the complete dataset model. The correspondence between the results of the independent analysis in the ADNI and J-ADNI/AIBL datasets and their relation to the complete dataset analysis were assessed by means of the distance between the intercepts and slopes of the identified patterns.
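The assignment of a new patient to clusters can be illustrated with the standard posterior-probability calculation for a Gaussian mixture. The sketch below is a simplification and not the classifier used in the study: it assumes independent regions (diagonal covariance) within each cluster, whereas the clustering model estimates a full covariance between regions, and all cluster parameters and patient values are hypothetical. In practice the cluster means would be the fitted values at the patient’s disease duration.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def cluster_probabilities(x, clusters):
    """Posterior probability of each cluster given regional w-values x.

    Assumption: regions are independent within a cluster (diagonal covariance).
    """
    log_post = []
    for c in clusters:
        ll = math.log(c["weight"])
        ll += sum(log_normal_pdf(xi, m, s)
                  for xi, m, s in zip(x, c["means"], c["sds"]))
        log_post.append(ll)
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypothetical clusters described over two regions.
clusters = [
    {"weight": 0.6, "means": [-1.8, -0.5], "sds": [0.5, 0.5]},
    {"weight": 0.4, "means": [-0.4, -1.5], "sds": [0.5, 0.5]},
]
probs = cluster_probabilities([-2.0, -0.3], clusters)
print([round(p, 3) for p in probs])
```

The returned probabilities sum to one, so they can be read directly as a measure of assignment uncertainty for a patient with a single MRI visit.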

Some of the advantages of the overall pipeline are that it incorporates whole-brain data, leverages data of patients with different numbers of visits at different times, provides cluster visualization through the fitted values, provides clustering uncertainty measures, allows for the modeling of confounding effects, compares the patient’s cluster-specific volume/thickness with a group of healthy individuals15, and can potentially be used for the classification of new patients with only one MRI visit. In comparison to previous approaches10,13, longitudinal data are used in longitudinal modeling and not as an evaluation set in cross-sectional analysis.

Complementary statistical analysis

As mentioned in step two of the longitudinal clustering analysis, we estimated cluster-specific random effects covariance matrices for each cluster. Each off-diagonal element of the cluster-specific intercept covariance matrix represents the covariance between the intercept of one brain region and that of another region. Consequently, correlated brain regions may have similar structural connectivity. The same applies to slope covariance matrices. We focus more on slopes, which can provide more information about structural connectivity: correlated random slopes show that brain regions develop atrophy in a similar manner over time. It is important to note that the intercept/slope variance/covariance matrices per cluster refer to estimated regression random intercepts and slopes and not to the original volume/thickness data32. To characterize the differences between clusters in terms of structural (intercept) and longitudinal (slope) brain regional volume/thickness relationships, nodal strength47 was calculated based on the aforementioned intercept and slope variance/covariance matrices. This graph theory measurement summarizes information from covariance matrices for each brain region and reflects the sum of the correlations of a brain region with all the regions connected to it. Clusters were compared in pairs using BRAPH (http://braph.org/)48. It is important to stress that the nodal strength calculation was not used as the main analytical step in this study but only to help summarize the information from the cluster covariance matrices and to decrease the number of brain regions involved in the cluster interpretation. Moreover, after the main clustering analysis, the intercept and slope mean values per cluster were further clustered using hierarchical clustering, to investigate the existence of common atrophy intercepts and atrophy progression patterns (slopes) over time. This step helped to infer whether some clusters of patients follow the same spatial distribution of atrophy in the brain, but with faster or slower progression and/or different intercepts at AD onset (stage of atrophy at AD onset). For the ADAS-cog subscales, MMSE, and ANART, we applied generalized linear mixed effect models (and corrected our results post hoc) to explore differences between clusters. ANART scores were used to assess premorbid intelligence. All analyses were done with R (3.6.3).
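The nodal strength summary can be sketched as follows. This is an illustrative calculation and not the BRAPH implementation: the 3-region covariance matrix is hypothetical, and taking absolute values of the correlations before summing is an assumption about how negative weights are handled.

```python
import math

def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

def nodal_strength(corr):
    """Sum of connection weights of each region with all other regions.

    Assumption: absolute correlations are used as weights; self-connections
    (the diagonal) are excluded.
    """
    n = len(corr)
    return [sum(abs(corr[i][j]) for j in range(n) if j != i) for i in range(n)]

# Hypothetical random-slope covariance matrix for three regions in one cluster.
slope_cov = [
    [4.0, 1.0, 0.4],
    [1.0, 1.0, 0.1],
    [0.4, 0.1, 1.0],
]
strength = nodal_strength(cov_to_corr(slope_cov))
print([round(s, 2) for s in strength])
```

A region with high nodal strength in the slope matrix is one whose atrophy rate covaries with that of many other regions in the same cluster.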

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.