Introduction

Alzheimer’s disease (AD) is the most common cause of dementia worldwide and is characterised by progressive cognitive impairment and brain atrophy1. Its course involves several pathological events. The National Institute on Aging and Alzheimer’s Association has proposed a classification system to categorise individuals based on biomarker evidence of pathology. This is called the ATN classification system and is used to rate people for the presence of β-amyloid (cerebrospinal fluid (CSF) Aβ or amyloid positron emission tomography (PET): 'A'), hyperphosphorylated τ (CSF pτ or τ PET: 'T'), and neurodegeneration (atrophy on structural magnetic resonance imaging (MRI), fluorodeoxyglucose (FDG) PET, or CSF total τ: 'N'), resulting in eight possible biomarker combinations2. Furthermore, a recent report on the involvement of microglial activation in the spread of τ tangles over the neocortex in AD suggests an additional inflammation biomarker for AD3. The most consistent structural imaging finding in AD is reduced hippocampal volume4, but this is arguably not the most specific structural biomarker, as AD frequently presents with non-amnestic symptoms with initial involvement of extra-temporal regions of the brain5. Furthermore, reduced hippocampal volume has been found in many other neuropsychiatric conditions including schizophrenia6, depression7 and hippocampal sclerosis8, as well as the recently described limbic-predominant age-related TDP-43 encephalopathy9. Together with the hippocampal volume, Aβ(1–42), phosphorylated τ (pτ), and total τ (τ) CSF biomarkers have been shown to discriminate patients with AD from healthy controls10. However, their introduction into clinical practice is limited by considerable variability between laboratories and assay batches10.
Similarly, blood-based biomarkers, which are eagerly awaited to address issues related to the invasiveness and high cost of CSF-based ones, often stall in the early stages because of a disconnect between academia, where biomarkers are identified, and industry, where they should be developed and commercially distributed11.

In these last 40 years, improved computational power and storage capacity have led to numerous advances in developing non-invasive and low-cost structural biomarkers for AD that combine neuroimaging approaches, in particular structural MRI12, with machine learning. This approach involves the acquisition of image data, the segmentation of the region of interest (ROI), feature extraction and selection for classification/prediction. Critically, features extracted from radiological images are able to reveal useful new biology13,14 hidden to the clinician’s eye15—at a mesoscopic scale. For example, the mesoscopic architecture of entire tumours can reveal stromal phenotype or immune context, with strong prognostic or predictive utility16,17. In a radiomics analysis, the extracted features represent statistical morpho-functional traits of intensity, shape, texture, scale, grey level co-occurrence matrix (GLCM), grey level run-length matrix (RLM), grey level size zone matrix (GLSZM), neighbourhood grey tone difference matrix (NGTDM) and neighbourhood grey level dependence matrix (NGLDM)18. A number of studies have shown texture differences between AD patients and healthy controls (HC) in structures such as the hippocampus, corpus callosum, and thalamus19,20. Supplementary Data 1 summarises the results and methods of the most cited papers published in the last 5 years on the classification of AD and AD-related mild cognitive impairment (MCI) patients using multimodal features. Zhang et al.21 for instance used a single-hidden-layer neural network and predator-prey particle swarm optimisation algorithm to classify HC from AD patients. They extracted texture features from one selected axial slice of a T1-weighted (T1w) MRI scan and obtained 93% accuracy in an internal test set. 
Similarly, Sorensen et al.22 applied a linear discriminant analysis to cortical thickness measurements, volumetric measurements and hippocampal volume, shape and texture features extracted from a T1w MRI scan and reached 63% accuracy. With the integration of genetic and cerebrospinal fluid biomarkers, Tong et al.23 reached a 0.78 area under the curve (AUC) in the discrimination between HC and people with an AD-related mild cognitive impairment, thus pushing the technology towards earlier detection. They used a non-linear graph fusion method to reduce the number of volumetric features extracted from T1w MRI, intensity features extracted from PET data, three CSF measures and one genetic categorical feature. An improved performance was obtained with the view-aligned hypergraph learning approach used by Lin et al.24. They obtained 93, 90, 80 and 79% accuracies in the discrimination between HC and AD patients, HC and progressive MCI, HC and MCI, and stable and progressive MCI patients, respectively. In aggregate, when all groups (controls, prodromal forms of AD and AD) are combined, most methods reach lower accuracy values. Of note, in most studies, models were trained and tested on an internal dataset only (Supplementary Data 1).

This current study proposes a method able to characterise early and later forms of Alzheimer’s disease with the extraction from a T1w MRI sequence of 29,520 statistical morpho-functional traits distributed over a multi-regional brain mask obtained with an automatic segmentation. Healthy brain and diseases unrelated to AD pathology, including Parkinson’s disease and frontotemporal dementia have been combined for the development of a set of tools able to reveal the mesoscopic architecture unique to AD.

Methods

The study workflow is summarised in Fig. 1. The analysis of baseline age-matched T1w MRI images consisted of a two-step combined approach with and without the additional information given by cognitive scores and CSF-based biomarkers. The model was trained on 1.5 T T1w MRI scans obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). After stratified randomisation, 70% of data were used for training and 30% for validation (robustness test shown in Supplementary Fig. 1). The control group (nADrp) included healthy controls and patients with frontotemporal dementia or Parkinson’s disease; the disease group (ADrp) included people with AD-related mild cognitive impairment (referred to as MCIAD in the text) or Alzheimer’s disease. The method was tested on four cohorts: (1) the unseen 1.5 T ADNI cohort (30% of the entire 1.5 T cohort, made up of 65 CN, 62 MCIAD, 54 AD, 28 FTD and 25 PD); (2) the unseen 1.5 T dataset: 64 people obtained from the Open Access Series of Imaging Studies (OASIS) consortium with a baseline T1w MRI scan and the mini-mental state examination (MMSE) score (53 CN and 11 AD); (3) the unseen 3 T dataset: 402 people obtained from ADNI with T1w MRI scan, MMSE, logical memory delayed recall total (LDELTOTAL), Aβ, τ and pτ (172 CN, 161 MCIAD and 69 AD); (4) the ‘real-world’ memory clinic cohort (IMC cohort): 83 patients with atypical presentations who underwent clinical Amyloid PET imaging as part of their diagnostic workup with a 1.5 T T1w MRI scan (45 amyloid-negative (AMY−) and 38 amyloid-positive (AMY+)) and LDELTOTAL and MMSE scores (for a subgroup of 22 people: 11 AMY− and 11 AMY+).
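The stratified 70/30 randomisation described above can be illustrated with a minimal sketch. It assumes a per-group shuffle with the training share rounded to the nearest integer; the function name, the seed and the rounding rule are illustrative assumptions, not the authors' MATLAB procedure. The group sizes are those reported for the 1.5 T ADNI training cohort in the Results.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, val_idx) so that each label keeps roughly the same share."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)  # randomise within each diagnostic group
        cut = round(train_fraction * len(members))  # assumed rounding rule
        train_idx.extend(members[:cut])
        val_idx.extend(members[cut:])
    return sorted(train_idx), sorted(val_idx)

# Group sizes of the 1.5 T ADNI cohort as reported in the text (783 people).
group_sizes = {"CN": 216, "MCIAD": 208, "AD": 181, "FTD": 94, "PD": 84}
labels = [group for group, n in group_sizes.items() for _ in range(n)]
train_idx, val_idx = stratified_split(labels)
val_counts = {g: sum(1 for i in val_idx if labels[i] == g) for g in group_sizes}
```

Under this rounding assumption, the held-out counts per group come out as 65 CN, 62 MCIAD, 54 AD, 28 FTD and 25 PD, which agrees with the composition of the unseen 1.5 T ADNI cohort quoted above.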

Fig. 1: Overview of the study design and two-step least absolute shrinkage and selection operator (LASSO) approach.

Data used in this work were obtained from the ADNI database, the OASIS consortium and the hospital memory clinic (IMC cohort). Age-matched T1w MRI images were collected and segmented into 115 brain regions using FreeSurfer’s recon-all function. Isotropic (1 × 1 × 1) T1w MRI scans and their brain masks were used for the radiomic analysis in a combined two-step approach. After the selection and standardisation of features, a first least absolute shrinkage and selection operator (LASSO1) was trained to classify people into those without and with AD-related pathology (nADrp and ADrp). Within the latter group, a second LASSO (LASSO2) was trained to distinguish patients with mild cognitive impairment due to AD (MCIAD) from AD patients. The model was also integrated with cognitive scores (MMSE and LDELTOTAL) and CSF-based biomarkers (Aβ, τ and pτ). As the final algorithm was to be used to discriminate between ADrp and nADrp, healthy controls and patients affected by other non-AD pathologies (e.g. frontotemporal dementia and Parkinson’s disease dementia) were combined into one group, referred to as the non-AD-related pathology group. Initial analysis of T2w MRI data did not yield discriminatory information, so only T1w MRI data are reported.

For the IMC cohort, we received ethical approval from the Camden and Kings Cross UK Research Ethics Committee (IRAS n. 273966) to perform retrospective anonymised and unlinked analysis of all clinical data (including MR images), provided that these were anonymised at source by a member of the clinical care team. In particular, the study protocol states: 'For all patients undergoing Amyloid PET at Imperial College Healthcare NHS Trust (ICHT) from December 2013 to January 2023 we will perform retrospective anonymised and unlinked analysis of clinically collected data. This will be anonymised at source by members of the clinical care team. The data will be unlinked and there will be no prospective element to this data collection.' Informed consent was waived, as is the case for retrospective analysis of anonymised imaging data.

Data for ADNI and OASIS are openly available upon registration of investigator interest. All participants provided informed consent. Details about the Ethics statement of the ADNI study population can be found at: https://adni.loni.usc.edu. Details about the Ethics statement of the OASIS study population can be found at: https://www.oasis-brains.org/#data. Protocols for data collection and the list of institutions who approved data collection can be found at https://adni.loni.usc.edu/methods/documents/ for ADNI. OASIS is made available by the Washington University Alzheimer’s Disease Research Center, the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN).

MRI segmentation and radiomic analysis

T1w MRI images were segmented into brain masks of 115 sub-regions using FreeSurfer’s recon-all function (45 regions obtained from the segmentation of the white matter and 70 subcortical regions obtained from the additional segmentation of the cortex)25,26. Before segmentation, this function performs many pre-processing steps, including bias correction, image sampling and coregistration; the steps and brain regions extracted are summarised in Supplementary Table 1. The multi-regional brain masks were post-processed for the extraction of 656 features for each region using in-house software (TexLAB 2.0), which runs on MATLAB16. The extracted features are related to the shape and size, intensity, texture and wavelet decompositions of isotropic (1 × 1 × 1) T1w MRI scans (Supplementary Data 2). The standardised radiomic features with a false discovery rate (FDR) <5% were selected as the input for the LASSO. Tenfold cross-validation was performed to select the value of lambda that yielded the minimum cross-validated mean squared error. The weighted sum of the selected features gave the Alzheimer’s predictive vector, ApV. To improve model performance, the method was integrated with two cognitive measurements (MMSE and LDELTOTAL) and three CSF-based biomarkers (Aβ, τ and pτ). The result was a second predictive vector: ApVs.
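The selection pipeline described above (FDR filtering, standardisation, then an L1-penalised fit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the TexLAB or MATLAB implementation: the FDR step is assumed to be the Benjamini-Hochberg procedure, the LASSO is written as plain coordinate descent on a squared-error objective, the cross-validated choice of lambda is omitted, and all function names and toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def benjamini_hochberg(p_values, q=0.05):
    """Indices of features kept at FDR level q (assumed Benjamini-Hochberg step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank  # largest rank that still passes its threshold
    return sorted(order[:cutoff_rank])

def standardise(column):
    """Z-score one feature column (zero mean, unit population variance)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(columns, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X*beta throughout
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, residual)) / n
            scale = sum(c * c for c in col) / n
            new = soft_threshold(rho, lam) / scale
            residual = [r - (new - beta[j]) * c for c, r in zip(col, residual)]
            beta[j] = new
    return beta

# Toy example with hypothetical numbers: five p values, then two orthogonal features.
kept = benjamini_hochberg([0.001, 0.2, 0.012, 0.045, 0.025])
weights = lasso([[-1.0, 1.0], [1.0, 1.0]], [-2.0, 2.0], lam=0.5)
```

In the toy fit the second feature receives a coefficient of exactly zero, which is the property that lets the LASSO act as a feature selector: only features with non-zero weights contribute to the weighted sum.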

The model is composed of two steps:

  1. In the first stage of the classification, the algorithm discriminates people with an Alzheimer’s-related pathology from those without. The two inputs to LASSO1 are the nADrp group, which includes healthy controls and people with Parkinson’s disease and frontotemporal dementia, and the ADrp group, which includes people with MCIAD and AD. The result of the LASSO is a reduced number of features/regions with their corresponding weights. The weighted sum of regions/features gives the ApV1 (ApV1s with the inclusion of cognitive scores and CSF-related biomarkers). People not classified as nADrp are used as inputs for the second stage of the classification.

  2. In the second stage of the classification, the algorithm distinguishes between people with an AD-related mild cognitive impairment and people with Alzheimer’s disease. LASSO2 performs a weighted sum of selected features/regions and gives the ApV2 (ApV2s with the inclusion of cognitive scores and CSF-related biomarkers), which distinguishes a prodromal from a late phase of AD.
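The two-stage cascade above can be sketched in a few lines. This is only an illustration: the feature names, weights, intercepts and cut-offs below are hypothetical placeholders, and the direction of each cut-off (higher score meaning more AD-like) is an assumption rather than something stated in the text.

```python
def weighted_sum(features, weights, intercept=0.0):
    """Predictive vector: intercept plus the weighted sum of the selected features."""
    return intercept + sum(w * features[name] for name, w in weights.items())

def classify(features, stage1, stage2):
    """Stage 1 separates nADrp from ADrp; stage 2 splits ADrp into MCIAD and AD."""
    apv1 = weighted_sum(features, stage1["weights"], stage1["intercept"])
    if apv1 < stage1["cutoff"]:  # assumed direction: low ApV1 means nADrp
        return "nADrp", apv1, None
    apv2 = weighted_sum(features, stage2["weights"], stage2["intercept"])
    label = "AD" if apv2 >= stage2["cutoff"] else "MCIAD"
    return label, apv1, apv2

# Hypothetical models; real weights would come from LASSO1 and LASSO2.
STAGE1 = {
    "weights": {"hippocampus_left_texture": -1.0, "temporal_pole_right_intensity": 0.5},
    "intercept": 0.0,
    "cutoff": 0.5,
}
STAGE2 = {"weights": {"cerebral_wm_left_texture": 1.0}, "intercept": 0.0, "cutoff": 0.0}

subject = {
    "hippocampus_left_texture": -1.0,
    "temporal_pole_right_intensity": 1.0,
    "cerebral_wm_left_texture": -0.5,
}
label, apv1, apv2 = classify(subject, STAGE1, STAGE2)
```

Only subjects that pass the first cut-off ever reach the second model, so an error made in stage 1 cannot be corrected in stage 2; this is one reason the stage 1 specificity matters for the overall result.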

The performance of the algorithm was tested using two methods. In Method A, the features extracted from the 45-region brain mask (alone and together with cognitive/CSF scores) were used and, in Method B, features extracted from the (45 + 70)-region brain mask (alone and together with cognitive/CSF scores) were used. Based on the accuracy and AUC values, Method B was chosen for the computation of the ApV1, and Method A was chosen for the computation of ApV1s, ApV2 and ApV2s (Table 1).

Table 1 Methods comparison.

Genomic analysis

Six genome-wide association study (GWAS) analyses were performed across three phenotypes (nADrp, MCIAD, AD) derived from three variables (original label (ADNI), ApV and ApVs). One GWAS was performed for nADrp vs MCIAD and another GWAS for nADrp vs AD for each of the three variables. APOE4 allele status was provided by the ADNI APOE genotype dataset. All the GWAS analyses were adjusted for age and gender using the GWASTools R package (v1.36). Each GWAS analysis calculated the main effects of all single-nucleotide polymorphisms (SNPs) on the target label (MCIAD/AD). For all GWAS, the empirical p values were based on the Wald statistic27. Manhattan plots were used to visualise GWAS results.

Statistics and reproducibility

Standard statistical analysis was applied to all the figures as appropriate and indicated in the figure legends. All samples were used once. Multiple testing was corrected with the FDR method. All the statistical analyses were conducted in MATLAB R2019b.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Results

Characteristics of data and patients

Data used in this work were obtained from the ADNI database (www.loni.ucla.edu/ADNI), launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI is to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see www.adni-info.org. From this database, all people for whom baseline MRI data (T1w magnetisation-prepared rapid acquisition with gradient echo (MP-RAGE) sequence at 1.5 T), age, and cognitive scores (MMSE28, a brief screening test for cognitive status and the LDELTOTAL29, a measure of verbal episodic memory), CSF-based biomarkers (Aβ, τ and pτ) were available have been included.

For the diagnostic classification at baseline, the method was trained on 783 people scanned at 1.5 T (ADNI1 cohort). They were grouped as 216 healthy controls, 208 people with MCI due to AD (MCIAD), 181 AD, 94 patients with Frontotemporal Dementia (FTD), and 84 with Parkinson’s disease (PD).

In particular, based on the data obtained from the ADNI database, two new groups of people were defined: the nADrp group, which contains people who do not show any pathology related to AD (healthy controls, PD and FTD were included here); and the ADrp group which, on the contrary, contains people with MCI due to AD and AD patients.

The method was externally tested on:

  • An unseen 1.5 T dataset obtained from the OASIS consortium (https://www.oasis-brains.org/) of 64 people for whom baseline T1w sequence, age and MMSE scores were available (53 CN and 11 AD).

  • An unseen 3 T dataset of 402 people obtained from the ADNI3 cohort for whom baseline T1w sequence, age, cognitive scores and CSF related biomarkers were available (172 CN, 161 MCIAD and 69 AD).

  • The IMC cohort: 83 patients with atypical presentations who underwent clinical Amyloid PET imaging at the Imperial Memory Centre (IMC, London, UK) as part of their diagnostic workup with a 1.5 T T1w MRI scan. Of the 396 patients who had an Amyloid PET scan between December 2013 and June 2019, those (n = 83) who had an MRI scan acquired 3–6 months after the Amyloid PET scan and who received a clinical neuropsychological assessment that included the Logical Memory Test were included in the study. Of these, a subgroup of 22 patients also had an MMSE administered within 12 months of MRI scanning. At the Memory Centre, the decision to perform a clinical Amyloid PET scan is made by consensus within the Cognitive Neuroradiology Multidisciplinary Team30 and referral to Amyloid imaging is in line with the Appropriate Use Criteria published by the Amyloid Imaging Taskforce31. These criteria recommend the use of clinical Amyloid PET in three main categories of patients: (1) with persistent/progressive unexplained MCI; (2) with atypical course or aetiologically mixed presentation; (3) with early age of onset. Moreover, patients undergoing clinical Amyloid PET imaging should report objective cognitive impairment with substantial diagnostic uncertainty following a comprehensive evaluation31. For the IMC cohort, mainly employed for the classification/evaluation of earlier disease using structural MRI, all images were visually read as ‘amyloid-positive’ (AMY+, N = 45) or ‘amyloid-negative’ (AMY−, N = 38) by an experienced nuclear medicine radiologist using greyscale images. All AMY+ patients received a clinical diagnosis of AD.
AMY− patients were either diagnosed with another neurodegenerative disease (progressive non-AD MCI (N = 4), MCI due to hypertensive microvascular disease (N = 1), unspecified neurodegenerative disease (NDG) (N = 1), MCI due to previous stroke (N = 1), NDG with Parkinsonian features (N = 1), Lewy body dementia (N = 1), tauopathy (N = 1), normal pressure hydrocephalus (N = 1), isolated cerebral amyloid angiopathy (N = 1)) or with a non-neurodegenerative condition (e.g. depression). Patient characteristics are provided in Supplementary Fig. 2.

A multiparametric analysis was conducted on a subset of 118 diffusion tensor imaging (DTI) MRI sequences obtained from ADNI (39 AD, 40 CN and 39 MCIAD). They were used to assess the variability of the fractional anisotropy (FA) and its relationship with the extracted features. Finally, quantitative phenotypes derived from ADNI Genetics Core were available for 199 CN, 187 MCIAD and 166 AD people of our 1.5 T training cohort and used for GWAS analysis.

Radiomic predictive vector characterises Alzheimer’s disease

For each subject, T1w MRI images were automatically segmented into 115 regions from which radiomic features were independently acquired, standardised and reduced with a machine learning-based model. They were finally combined in Alzheimer’s predictive vectors.

ApV1 – a biomarker to discriminate between patients with and without AD-related pathology

Among the 656 features extracted for each of the 115 brain regions, LASSO1 selected 20 features (those with non-zero coefficients) distributed in 14 regions (Fig. 2a). The weighted sum of extracted features in the selected regions gave the Alzheimer’s predictive vector ApV1. With the integration of cognitive scores and CSF-based biomarkers, LASSO1 selected 19 features distributed among 12 regions (Fig. 2b). In a similar way, the combination of features, cognitive scores and regions gave the predictive vector ApV1s. Figure 2aI, aII (and bI-bII) show the tenfold cross-validated deviance of the LASSO fit and the feature coefficients plotted against the shrinkage parameter lambda extracted for the ApV1 (ApV1s). Figure 2aIII, aIV show the ROC curve for the validation of ApV1 (AUC of 0.99) and the distribution of the validated ApV1 in the nADrp and ADrp groups, respectively. Similarly, Fig. 2bIII, bIV show the ROC curve for the validation of ApV1s (AUC of 0.99) and the distribution of the validated ApV1s in the nADrp and ADrp groups, respectively. The predictive ability of the ApV1 in discriminating people without AD-related pathologies (nADrp) from those with AD-related pathology (ADrp) was compared to the clinical standard measures of hippocampal volume and CSF Aβ (Table 2). Of note, the measurements of diagnostic accuracy of Aβ are obtained with the application of established cut-off values32 from the comparison between CN and ADrp. Compared to the standard measures, our method showed higher specificity, sensitivity, accuracy, negative and positive predictive values, likelihood ratios and diagnostic odds ratios. ApV1 showed a state-of-the-art accuracy of 0.98 (0.26 and 0.62 for the volume of the hippocampus and CSF Aβ, respectively) in the prediction of AD-related pathologies. Of note, neither age nor CSF biomarkers were selected by LASSO1.
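The diagnostic accuracy measures listed above all derive from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from Table 2.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    inf = float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        # Likelihood ratios and the odds ratio are unbounded when a cell is empty.
        "lr_plus": sensitivity / (1 - specificity) if fp else inf,
        "lr_minus": (1 - sensitivity) / specificity,
        "dor": (tp * tn) / (fp * fn) if fp and fn else inf,
    }

# Hypothetical counts: 100 ADrp (90 detected) and 100 nADrp (80 correctly ruled out).
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these made-up counts the sensitivity is 0.90, the specificity 0.80, the accuracy 0.85 and the diagnostic odds ratio 36. Predictive values, unlike sensitivity and specificity, depend on the proportion of ADrp people in the cohort, so they should be compared across cohorts with care.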

Fig. 2: Results of LASSO1.

The biophysical mesoscopic properties of brain regions in nADrp and ADrp people are depicted by the combination of features/regions selected by the LASSO1. In the radial phylogeny trees, the components of ApV1 (a) and ApV1s (b) are summarised. aI and aII show the tenfold cross-validated deviance of the LASSO1 fit and feature coefficients plotted against the shrinkage parameter lambda. Shown in aIII is the ROC curve for the validation of ApV1. Shown in aIV is the distribution of the validated ApV1 in the nADrp (N = 152) and ADrp (N = 116) groups. bI and bII show the tenfold cross-validated deviance of the LASSO1 fit with the integration of cognitive scores and CSF-based biomarkers and the feature coefficients plotted against the shrinkage parameter lambda. bIII and bIV show the ROC curve for the validation of ApV1s, and the distribution of the validated ApV1s in the nADrp (N = 152) and ADrp (N = 116) groups. In the radial trees, branches are coloured based on the region selected (hippocampus: red, other: black), their brain hemisphere (left: orange, right: blue), and the cognitive score (green). In the box plots, points are laid over a 1.96 standard error of the mean (95% confidence interval) and one standard deviation (black vertical line).

Table 2 Diagnostic performance of the Alzheimer’s predictive vector ApV1 and ApV1s.

The testing of the method on the unseen 1.5 T OASIS cohort showed 0.81 and 0.83 accuracies for ApV1 and ApV1s, respectively (Table 2). Applied unmodified to a different field strength (3 T), our method showed 91 and 80% specificity, together with reduced accuracy of 0.49 and 0.47 for the ApV1 and ApV1s, respectively.

ApV2 — a biomarker to categorise ApV1/ApV1s positive patients into prodromal (MCIAD) and late (AD) groups

The LASSO2 selected eight features distributed in seven regions (Fig. 3a) with a dominance of the left brain. The weighted sum of the extracted features in the selected regions gave the Alzheimer’s predictive vector ApV2. With the integration of cognitive scores and CSF-based biomarkers, the LASSO2 selected 19 features distributed in 15 regions (Fig. 3b). The combination of features, cognitive scores and regions gave the predictive vector ApV2s. Figure 3aI, aII (and bI-bII) show the tenfold cross-validated deviance of the LASSO2 fit and the feature coefficients plotted against the shrinkage parameter lambda extracted for the ApV2 (ApV2s). Figure 3aIII, aIV show the ROC curve for the validation of ApV2 (AUC of 0.79) and the distribution of the validated ApV2 in the MCIAD and AD groups, respectively. Similarly, Fig. 3bIII, bIV show the ROC curve for the validation of ApV2s (AUC of 0.95) and the distribution of the validated ApV2s in the MCIAD and AD groups, respectively. The predictive ability of the ApV2 in discriminating people with prodromal and later forms of AD, in comparison with the standard clinical measures (the volume of the hippocampus and the CSF Aβ), was quantified with the measures of diagnostic accuracy and is summarised in Table 3. ApV2 reached an accuracy of 0.79 in the prediction of AD, with a higher accuracy of 0.86 with the integration of clinical scores, independent of age and CSF biomarkers. The high accuracy is remarkable given the continuum of disease progression between MCIAD and AD. Applied to a different field strength (3 T), our method showed an accuracy of 0.62 and 0.82 for the ApV2 and ApV2s, respectively. The LASSO2 could not be tested on the OASIS cohort as it does not include any MCIAD people. In aggregate, our results show a predominant dysfunction in the left hemisphere33. This confirms the strong left-hemispheric lateralisation found in the early stages of the disease compared to the weak right-hemispheric lateralisation found in advanced stages34 (see also Supplementary Note 1 and Supplementary Fig. 3).

Fig. 3: Results of LASSO2.

The biophysical mesoscopic properties of brain regions in MCIAD and AD people are depicted by the combination of features/regions selected by the LASSO2. In the radial phylogeny trees, the components of ApV2 (a) and ApV2s (b) are summarised. aI and aII show the tenfold cross-validated deviance of the LASSO2 fit and the feature coefficients plotted against the shrinkage parameter lambda. Shown in aIII is the ROC curve for the validation of ApV2. Shown in aIV is the distribution of the validated ApV2 in the MCIAD (N = 62) and AD (N = 54) groups. bI and bII show the tenfold cross-validated deviance of the LASSO2 fit with the integration of cognitive scores and CSF-based biomarkers and the feature coefficients plotted against the shrinkage parameter lambda. bIII and bIV show the ROC curve for the validation of ApV2s, and the distribution of the validated ApV2s in the MCIAD (N = 62) and AD (N = 54) groups. In the radial trees, branches are coloured based on the region selected (hippocampus: red, other: black), their brain hemisphere (left: orange, right: blue), and the cognitive score (green). In the box plots, points are laid over a 1.96 standard error of the mean (95% confidence interval) and one standard deviation (black vertical line).

Table 3 Diagnostic performance of the Alzheimer’s predictive vectors ApV2 and ApV2s.

Repeatability of the Alzheimer’s predictive vectors

The ApV methods were compared to the standard imaging measure (the volume of the hippocampus) and tested on a second T1w MRI scan obtained on the same day as the baseline scan used for training the model. The Bland–Altman plots are shown in Supplementary Fig. 4. Based on the reporting guidelines by Koo and Li35, a one-way random effects, absolute agreement, single rater/measurement intraclass correlation coefficient was evaluated and was 0.83, 0.89, 0.83 and 0.82 for ApV1, ApV1s, ApV2 and ApV2s, respectively. The intraclass correlation coefficient for the hippocampal volume was 0.94. A boxplot of the distribution of the volumes of the hippocampus in the main groups is also shown in Supplementary Fig. 4f. The robustness (non-random nature) of our ApV1 and ApV2 was further tested. Results are summarised in Supplementary Table 2, which reports the measurements of diagnostic accuracy of ApV1 (a) and ApV2 (b) obtained when the ApV is computed with the complete set of features extracted by the LASSO (Ftot), the four features with the highest weights (Ftest4) and all the possible permutations with three (Ftest3-p1, Ftest3-p2, Ftest3-p3, Ftest3-p4) and two features (Ftest2-p5, Ftest2-p6, Ftest2-p7, Ftest2-p8, Ftest2-p9 and Ftest2-p10). With regard to the ApV1, Ftest4 showed a performance comparable to Ftot. Among all the permutations, Ftest3-p2 obtained the best performance, involving the features extracted in the right middle temporal, rostral middle frontal and temporal pole (98% accuracy, 0.99 AUC). Regarding ApV2, the best performance was obtained when the ApV was computed with only two features extracted from the left cerebral white matter (WM) and left cerebellum WM (78% accuracy and 0.79 AUC).
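For reference, the one-way random effects, single-measurement coefficient named above, often written ICC(1,1), can be computed from the between-subject and within-subject mean squares. The sketch below is a minimal illustration with hypothetical test-retest values, not the authors' MATLAB code.

```python
def icc_one_way(measurements):
    """ICC(1,1): one-way random effects, absolute agreement, single measurement.

    `measurements` holds one row per subject, each with k repeated values.
    """
    n = len(measurements)
    k = len(measurements[0])
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    # Mean square between subjects (n - 1 degrees of freedom).
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Mean square within subjects (n * (k - 1) degrees of freedom).
    ms_within = sum(
        (value - m) ** 2
        for row, m in zip(measurements, subject_means)
        for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical scan/rescan scores for three subjects (k = 2 scans each).
test_retest = [[10.0, 12.0], [20.0, 19.0], [30.0, 31.0]]
icc = icc_one_way(test_retest)
```

The coefficient approaches 1 when the rescan differences are small relative to the spread between subjects, and it can become negative when repeated scans of the same subject disagree more than different subjects do.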

The ApV on 'real-world' data

The model was tested on the IMC cohort, which includes people who underwent a clinical amyloid PET scan at our institution and are classified as amyloid-positive (AMY+) or amyloid-negative (AMY−). When applied to this 'real-world' cohort, no statistical difference was found between ApV1 and ApV2 in people with positive/negative amyloid enhancement (p = 0.88) (Supplementary Fig. 5b). Regardless of the PET output, people were predominantly classified as nADrp (in particular, of the 45 AMY−, 42 were classified as nADrp, 2 as MCIAD and 1 as AD; of the 38 AMY+, 36 were classified as nADrp and 2 as MCIAD). The model was also tested on a subgroup of 22 people whose T1w MRI scan was obtained 5 ± 4 months after Amyloid PET imaging and was used together with the MMSE and the LDELTOTAL cognitive scores. In this small cohort, people with a negative PET scan were classified as nADrp (N = 8), MCIAD (N = 2) and AD (N = 1). People with a positive scan were evenly classified as nADrp and MCIAD (N = 5 each); only one subject was classified as AD. In relation to the PET output, our ApV1s showed a statistical difference between AMY− and AMY+ (p = 0.02) (Supplementary Fig. 5b).

Genome-wide association study and fractional anisotropy

Figure 4 shows the Manhattan plot of the GWAS for the ApVs. The Manhattan plot shows one SNP above a significance threshold of p < 10⁻⁷, corresponding to the RS ID rs2075650. The rs2075650 SNP was above the significance threshold across all variables: original labels, ApV and ApVs (Supplementary Figs. 6, 7). For all GWAS of cognitively normal vs mild cognitive impairment, no SNPs were above the threshold. Additionally, in the ApV group (ADrp vs AD), the SNP rs575606 was above a threshold of p < 10⁻⁶ (Supplementary Fig. 6). When performing a GWAS adjusting for the presence of one or two APOE4 alleles, no SNPs were identified as significantly associated with AD in any of the outcomes (Supplementary Fig. 7). Additionally, we present LocusZoom plots of the 2000 base pairs around rs2075650 on the GWAS results without the adjustment for APOE4 (Supplementary Fig. 8). An extensive interpretation of the GWAS results is included in Supplementary Note 2. Supplementary Note 2 also includes the allele frequency evaluation (allele proportions and Hardy–Weinberg Equilibrium Fisher’s exact test p value) for the SNP rs2075650, which shows ‘B’ to be the minor allele with both the ApVs and ApV classification (Supplementary Table 3).

Fig. 4: Genetic and molecular characteristics associated with the ApV biomarker.

A, B show the Q–Q and Manhattan plots of the genome-wide association study (GWAS) of the cognitively normal and Alzheimer’s disease labels derived from ApVs. In detail, B is the Manhattan plot of the p values (−log10(Wald p value)) from GWAS analysis of the ApVs. The horizontal line displays the cut-off for two significant levels (p < 10⁻⁷). Shown in A is the quantile–quantile (Q–Q) plot of the distribution of the observed p values (−log10(observed p value)) in this sample versus the expected p values (−log10(expected p value)) under the null hypothesis of no association. Shown in C is the variation of fractional anisotropy tested in 115 brain regions. A Wilcoxon rank-sum test was used to test the regional statistical difference of FA between nADrp (N = 79) and ADrp (N = 39) and between MCIAD (N = 31) and AD (N = 8) people. Shown in D are the absolute values of FA in the regions for which a statistical difference was found between nADrp and ADrp and between MCIAD and AD patients (p < 0.05).

In agreement with the ADrp phenotype, the analysis of fractional anisotropy (FA) from DTI MRI sequences showed a neuronal loss in ADrp people. The variation of FA was tested in 115 brain regions. A Wilcoxon rank-sum test was used to test the regional statistical difference of FA between nADrp (N = 79) and ADrp (N = 39) and between MCIAD (N = 31) and AD (N = 8) people. For most regions, no statistically significant reduction was present (p > 0.05) (Fig. 4C). Twenty-two out of 115 regions showed a significant variation of FA between nADrp and ADrp; of these, the left and right cerebral cortex and the left caudate showed an FA increase. Between MCIAD and AD, 11 out of 115 regions showed a significant variation of FA, with an increase of FA present only in the left amygdala. Figure 4D shows the absolute values of FA in the regions for which a statistical difference was found between nADrp and ADrp and between MCIAD and AD patients (p < 0.05).
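The region-wise comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: it implements the two-sided Wilcoxon rank-sum test with the normal approximation (no tie or continuity correction), applies no multiple-comparison correction, and the function names and any input values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p), where z is the standardised rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2.0          # mean of w under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))       # equals 2 * (1 - Phi(|z|))
    return z, p


def significant_regions(fa_group_a, fa_group_b, alpha=0.05):
    """Return {region: (z, p)} for regions whose FA differs with p < alpha.

    Each argument maps a region name to the list of FA values of one group.
    """
    results = {}
    for region in fa_group_a:
        z, p = rank_sum_test(fa_group_a[region], fa_group_b[region])
        if p < alpha:
            results[region] = (z, p)
    return results
```

A positive z indicates higher FA in the first group, so the sign can be used to separate regions with an FA decrease from those with an FA increase.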

Discussion

This study presents a novel MRI-based radiomic predictive vector which outperforms standard hippocampal volume and CSF Aβ measurements (Table 2), reaching an accuracy of 0.98 in an internal test set (mean value 0.9830, 95% confidence interval (CI) [0.9829, 0.9831]) for the triage of people without an AD-related pathology. Our ApV is robust and repeatable across MRI scans (Supplementary Fig. 4), demonstrating its potential for future use in clinical practice.

This method does not require a subject matter expert, but rather uses established software for both brain segmentation (FreeSurfer)25,26,36 and radiomics analysis16. The algorithm computes manually engineered features, allowing an easy interpretation of the ApV and facilitating clinical translation. To avoid overfitting, the dimensionality of the model is reduced with the ‘least absolute shrinkage and selection operator’ (LASSO)37, which selects the most informative and least redundant features corresponding to specific brain regions. The LASSO is suitable for the regression of high-dimensional features in a radiomics strategy38, allowing, in a single regression model, the statistical analysis of complex labelled data to exploit dependence patterns in specific brain regions. Compared to the most common multivariate models present in the literature (Random Forest, Naïve Bayes, K-Nearest Neighbours and Support Vector Machine), our univariate analysis shows higher accuracy (Supplementary Table 4) and easier interpretability, thanks to the use of manually engineered features. In order to improve the model’s generalisability, the training of ApV exploits commonalities and differences within the segmentations between controls and patients with FTD, PD, MCI due to Alzheimer’s disease and AD, appreciating that patients who come to the memory clinic may have other conditions. We reasoned that the extra information from FTD and PD segments would allow the model to gain a better contextual understanding of the regions of interest and to better discriminate nADrp from ADrp, rather than to detect FTD or PD per se. Because the inclusion of non-AD pathologies in the control group of the training set could have introduced a classification bias leading to an overestimated model accuracy, further tests were done to assess the impact of PD and FTD patients in the nADrp group. 
The measurements of diagnostic accuracy obtained when the classification is computed between CN and ADrp, as well as between CN and MCIAD and CN and AD patients (compared with the proposed original method, shown in italics in Table 4), show that the performance of our method is not influenced by the presence of PD and FTD patients in the nADrp group.
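To illustrate how the LASSO penalty performs feature selection, the sketch below fits a plain linear LASSO by cyclic coordinate descent with soft-thresholding, minimising (1/2n)·||y − Xβ||² + λ·||β||₁. It is a generic textbook formulation, assumed here for illustration only; it is not the implementation used to build the ApV, and the function names, the value of λ and any input data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink `value` towards zero by `lam`; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit a linear LASSO (no intercept) by cyclic coordinate descent.

    X is a list of rows (samples x features), assumed centred; y is the target.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def selected_features(beta, tol=1e-12):
    """Indices of the features whose coefficient was not shrunk to zero."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

Features whose coefficient is driven exactly to zero are dropped from the model, which is how a larger λ yields a smaller, less redundant feature set.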

Table 4 Test on the diagnostic performance of the algorithm.

In an internal test set (the 1.5 T ADNI cohort), the ApV1 is able to discriminate between people with (ADrp) and without (nADrp) Alzheimer’s related pathologies with an accuracy of 0.98. Unlike the majority of published research studies, where models are usually trained between two categories (e.g. HC vs AD or MCI vs AD) (Supplementary Data 1), our algorithm includes in the ADrp group both AD patients and people with the early form of AD, mild cognitive impairment. This procedure permits triage of patients who have neither MCIAD nor AD, taking into account the notion that Alzheimer’s disease exists along a spectrum, from early memory changes to functional dependence and death. To the best of our knowledge, the accuracy reached by the ApV in the internal dataset (obtained by analysing MRI data with or without cognitive scores) is superior to those reported in published research studies, which focus on a single internal test set only39,40,41. However, the true performance of a radiomic model needs to be validated on external datasets or independent institutional cohorts; in practice, only a minority of studies report an application of algorithms to external datasets42. When tested on an external test set (the unseen 1.5 T OASIS cohort), our algorithm reaches an accuracy of 0.86, higher than that of previously reported studies43. Furthermore, when compared to the standard clinical measures of hippocampal atrophy and cerebrospinal fluid beta-amyloid concentration, the ApV shows higher accuracy, presenting a potentially valid alternative to the invasive CSF measurements.
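For readers less familiar with the diagnostic measures quoted here, the sketch below derives accuracy, sensitivity, specificity and predictive values from binary labels. It is an illustrative helper only: the function name `diagnostic_metrics` and the label convention (1 = ADrp, 0 = nADrp) are assumptions, not part of the published pipeline.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute standard diagnostic measures from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }
```

A classifier that rarely predicts the positive class can combine a very high positive predictive value with low sensitivity, which is why the measures are best read together rather than in isolation.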

To be precise, the ApV is independent of the amyloid levels in the CSF. Although a stronger pathological biomarker signature is encountered when increased CSF concentrations of τ and pτ species, decreased concentrations of Aβ32,44 and cognitive scores are considered together with structural data, it is notable that Aβ, τ and pτ were not selected as part of the optimised ApV algorithm. This result can be explained by the intrinsically low accuracy of the CSF-based biomarkers collected for our cohorts (Supplementary Table 5), with respect to the established cut-off values (93 pg/ml for τ, 192 pg/ml for Aβ1–42 and 23 pg/ml for pτ)32. The non-overlapping nature of the ApV means that its combination with CSF biomarkers could be explored in the future to further improve accuracy in early MCIAD/AD.

The ApV describes the mesoscopic architecture and the biological changes of an AD brain. With an unsupervised approach, and appreciating the lack of post-mortem AD confirmation in our cohort of people, the algorithm selects texture and shape features, strong biomarkers of AD20,45,46, in regions typically involved in the development of the disease (the hippocampus, entorhinal cortex, amygdala47). In particular, our results show a predominant dysfunction in the left hemisphere33, confirming the strong left-hemispheric lateralisation found in the early stages of the disease compared to the weak right-hemispheric lateralisation found in advanced stages34. As extensively described in the 'Biological interpretation of ApV' section of Supplementary Note 1, the cortical grey matter structural changes, usually due to the ageing brain and cognitive decline caused by neuronal loss48,49,50, are represented in part within the ApV by GLCM and FD features51 and confirmed, with the multiparametric analysis of DTI MRI images, by the statistically significant decrease of FA in AD patients. For example, the GLCM correlation feature, filtered with an LHL wavelet filter, in the left lateral ventricle expresses the dependency of grey level values on their respective voxels in the GLCM, possibly relating to the distribution of grey levels in this brain region of AD patients, where ventriculomegaly is commonly observed. In most neurodegenerative disorders, brain parenchymal shrinkage causes the passive enlargement of the lateral, third and fourth ventricles, with a significant ventricular enlargement associated with AD52. Furthermore, cognitive decline, manifested as local neuronal loss in many hippocampal subfields (subiculum, cornu ammonis) following AD progression (as also confirmed by the statistically significant decrease of fractional anisotropy), is captured by the Neighbouring Grey Tone Difference Matrix (NGTDM) coarseness feature extracted in the right hippocampus. 
Coarseness is a measure of the average difference between the central voxel and its neighbourhood and is an indication of the spatial rate of change. A higher value indicates a lower spatial change rate and a locally more uniform texture. Together with a high-pass wavelet filter applied in one dimension and low-pass filters applied in the other two, the extraction of the coarseness in the hippocampus represents an index of heterogeneity. Interestingly, the algorithm also selects regions not commonly related to AD, such as the cerebellum and the ventral diencephalon. Together with a few studies reported in the literature53,54, this outcome challenges the traditional view that white matter bundles in the cerebellum or in the ventral diencephalon are not affected by AD, possibly highlighting new therapeutic opportunities.
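The behaviour of the coarseness feature can be illustrated with a small sketch. The code below computes NGTDM coarseness on a 2D image of discretised grey levels, following the commonly used definition 1 / Σ p_i·s_i, where p_i is the proportion of pixels with grey level i and s_i is the summed absolute difference between those pixels and the mean of their neighbours. It is a simplified illustration under stated assumptions (2D rather than 3D, 8-connected neighbourhood, no wavelet filtering, and a small hypothetical `eps` to avoid division by zero); it is not the feature extraction code used in this study.

```python
def ngtdm_coarseness(image, eps=1e-12):
    """NGTDM coarseness of a 2D list of integer grey levels.

    Higher values mean a lower spatial rate of change (a more uniform texture).
    """
    rows, cols = len(image), len(image[0])
    counts = {}  # grey level -> number of pixels with at least one neighbour
    s = {}       # grey level -> sum of |level - mean of the pixel's neighbours|
    for r in range(rows):
        for c in range(cols):
            # 8-connected neighbourhood, clipped at the image border.
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if not neighbours:
                continue
            level = image[r][c]
            mean_neighbours = sum(neighbours) / len(neighbours)
            counts[level] = counts.get(level, 0) + 1
            s[level] = s.get(level, 0.0) + abs(level - mean_neighbours)
    n_valid = sum(counts.values())
    weighted = sum((counts[level] / n_valid) * s[level] for level in counts)
    return 1.0 / (weighted + eps)
```

On toy inputs, an image split into two flat halves yields a higher coarseness than a checkerboard of the same grey levels, consistent with the interpretation that higher values correspond to a locally more uniform texture.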

The GWAS performed across nADrp, MCIAD and AD derived from the ApV classification labels highlights genetic insights distinct from the classical APOE-only gene association in AD. The non-causal significant alteration of the SNP rs2075650, located in the TOM40 gene, found in patients with an ADrp-like phenotype reinforces a body of research that associates this gene with MCIAD and AD55,56,57. TOM40 is located adjacent to APOE, and the two genes are thought to be correlated with Alzheimer’s due to linkage. The fact that rs2075650 is no longer significant after adjusting for APOE4 allele status suggests that the TOM40 association signal is driven by the APOE4 allele and surrounding variants.

The ApV is also age-independent for the age range used. The similarity between age-related atrophy in AD and in normal ageing represents one limitation of applying multivariate models to structural MRI58. In this study, this issue was first assessed following the age-correction method by Moradi et al.59. Because this method introduced a large distortion in the MRI images, limiting the reliability of the extracted features, age was instead considered as an additional feature. As a result, age was not selected among the least redundant, most significant features.

This method provides a biomarker able to detect an early stage of AD, with the potential to significantly improve the clinical decision support system. The ApV was tested on a clinical cohort of people with objective cognitive impairment and uncertain underlying aetiology caused by an atypical clinical course or the presence of multiple co-morbidities (Fig. 5a). When employed in this cohort, the ApV outperformed the hippocampal volume measurements (Fig. 5b) and the standard cognitive scores (Fig. 5e), showing a statistically significant difference between the AMY− and AMY+ groups (p = 0.02, Fig. 5d). Therefore, where isolated hippocampal atrophy or episodic memory impairment fails to differentiate AMY+ from AMY− patients, the ApV shows a stronger diagnostic potential.

Fig. 5: Early detection of Alzheimer’s disease in an atypical-AD cohort.

a Patients presenting at the IMC with suspected cognitive decline undergo a range of standard diagnostic investigations, such as MRI and neuropsychological assessment, which can vary across individuals depending on the clinical presentation. Where diagnostic uncertainty persists, the decision to perform amyloid PET imaging is made by consensus by a multidisciplinary team30 and in line with the appropriate use criteria31. In this context, a positive amyloid PET scan is highly suggestive of an underlying AD diagnosis, while a negative scan rules out AD. Patients with a negative amyloid PET scan often have either a non-AD type of dementia (e.g. FTD) or other non-neurodegenerative causes of cognitive impairment (e.g. depression). b The hippocampal volumes evaluated in the entire ADNI and IMC cohorts (N = 27 AMY−, N = 21 AMY+, N = 394 nADrp, N = 389 ADrp). c The distribution of the hippocampal volumes in the IMC cohort and in artificially thresholded subgroups of ADNI people (N = 27 AMY−, N = 21 AMY+, N = 387 nADrp and N = 340 ADrp in the mADNI group, N = 123 nADrp and N = 51 ADrp in the aADNI group). d The ApVs values of the IMC* and aADNI cohorts (where the volume of the hippocampus differs significantly between the control and disease groups (p = 0.02)) (N = 11 AMY−, N = 11 AMY+, N = 123 nADrp, N = 51 ADrp). e The distribution of the LDELTOTAL and MMSE scores in the IMC* and aADNI cohorts (N = 11 AMY−, N = 11 AMY+, N = 123 nADrp, N = 51 ADrp). In the box plots, points are laid over a 1.96 standard error of the mean (95% confidence interval) and one standard deviation (black vertical line).

Other than its retrospective nature, a limitation of this study is the lower performance of the method when tested unmodified at a higher field strength (the unseen 3 T dataset). As shown in Table 2, very high positive predictive values are associated with low sensitivity and overall low accuracy for both the ApV1 and ApV2 obtained from a baseline 3 T ADNI cohort. This result supports the hypothesis that MRI radiomic features are susceptible to magnetic field strength60 and limits the applicability of our current method to 1.5 T data. Future studies will focus on the development of pre-processing techniques to improve the performance of the algorithm on 3 T data, together with the introduction of an equivalent algorithm for higher field strengths. A second limitation of this study is the impossibility of directly comparing our method with the published literature. This is mainly related to how we decided to structure our input to improve the model’s generalisability: the control group, together with healthy people, also contains people with Parkinson’s disease and frontotemporal dementia. A third limitation of this study is related to the computational effort needed to pre-process the structural MRI data. The segmentation step performed by FreeSurfer’s recon-all function usually requires about 10–12 h per subject. To reduce computation time, we re-ran the analyses in parallel using 12 logical cores: with this approach, a group of 10–15 scans was segmented in the same amount of time. We believe that with the implementation of a faster segmentation pipeline, this work would outperform the clinical tests now used in isolation. A possible future solution to minimise segmentation time in clinical practice could be the extraction of a custom T1w-MRI-based template built from the chosen dataset (e.g. using the SPM DARTEL pipeline).
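The parallel segmentation strategy mentioned above can be sketched as follows. This is a hypothetical illustration, not the script used in this study: it assumes FreeSurfer is installed with `recon-all` on the PATH and `SUBJECTS_DIR` already set, the subject identifiers and file paths are placeholders, and the helper names are invented for this example. Each scan is launched as a separate `recon-all` process, with at most 12 running at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_recon_all_command(subject_id, t1_path):
    """Build the FreeSurfer command that fully segments one T1-weighted scan."""
    return ["recon-all", "-s", subject_id, "-i", t1_path, "-all"]


def run_command(command):
    """Run one command to completion and return its exit code."""
    return subprocess.run(command, check=False).returncode


def segment_in_parallel(scans, max_workers=12, runner=run_command):
    """Segment several scans concurrently.

    `scans` maps a subject identifier to the path of its T1-weighted image.
    Returns {subject_id: exit_code}. Threads are sufficient here because the
    heavy computation happens in the external recon-all processes.
    """
    commands = {sid: build_recon_all_command(sid, path) for sid, path in scans.items()}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {sid: pool.submit(runner, cmd) for sid, cmd in commands.items()}
        return {sid: future.result() for sid, future in futures.items()}
```

The `runner` argument is injectable so that the scheduling logic can be checked without FreeSurfer installed, for example with `segment_in_parallel(scans, runner=lambda cmd: 0)`.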

In summary, this study proposes an unsupervised approach for the development of an MRI-based biomarker for the biological characterisation of AD. The ApV is reproducible and robust. It can be easily computed with the calculation of manually engineered features and is ready to be integrated into the clinical decision support system without the need for additional sampling or patient testing.