A predictive model using the mesoscopic architecture of the living brain to detect Alzheimer’s disease

Background: Alzheimer’s disease, the most common cause of dementia, causes a progressive and irreversible deterioration of cognition that can sometimes be difficult to diagnose, leading to suboptimal patient care.
Methods: We developed a predictive model that computes multi-regional statistical morpho-functional mesoscopic traits from T1-weighted MRI scans, with or without cognitive scores. For each patient, a biomarker called “Alzheimer’s Predictive Vector” (ApV) was derived using a two-stage least absolute shrinkage and selection operator (LASSO).
Results: The ApV reliably discriminates between people with (ADrp) and without (nADrp) Alzheimer’s related pathologies (98% and 81% accuracy between ADrp - including the early form, mild cognitive impairment - and nADrp in internal and external hold-out test sets, respectively), without any a priori assumptions or need for neuroradiology reads. The new test is superior to the standard hippocampal atrophy (26% accuracy) and cerebrospinal fluid beta-amyloid (62% accuracy) measures. In a multiparametric analysis, the DTI-MRI derived fractional anisotropy readout of neuronal loss agrees with the ADrp phenotype, and the SNP rs2075650 is significantly altered in patients with an ADrp-like phenotype.
Conclusions: This new data analytic method demonstrates potential for increasing the accuracy of Alzheimer’s diagnosis.


Plain Language Summary
Alzheimer's disease is the most common cause of dementia, impacting memory, thinking and behaviour. It can be challenging to diagnose Alzheimer's disease, which can lead to suboptimal patient care. During the development of Alzheimer's disease the brain shrinks and the cells within it die. One method that can be used to assess the brain is magnetic resonance imaging, which uses magnetic fields and radio waves to produce images of the brain. In this study, we develop a method that uses magnetic resonance imaging data to identify differences in the brain between people with and without Alzheimer's disease, including before obvious shrinkage of the brain occurs. This method could be used to help diagnose patients with Alzheimer's disease.
Alzheimer's disease (AD) is the most common cause of dementia worldwide and is characterised by progressive cognitive impairment and brain atrophy 1 . The disease involves several pathological events. The National Institute on Aging and Alzheimer's Association has proposed a classification system to categorise individuals based on biomarker evidence of pathology. This is called the ATN classification system and is used to rate people for the presence of cerebrospinal fluid β-amyloid (CSF Aβ or amyloid positron emission tomography (PET): 'A'), hyperphosphorylated τ (CSF pτ or τ PET: 'T'), and neurodegeneration (atrophy on structural magnetic resonance imaging (MRI), FDG PET, or CSF total τ: 'N'), resulting in eight possible biomarker combinations 2 . Furthermore, a recent report on the involvement of microglial activation in the spread of τ tangles over the neocortex in AD suggests an additional inflammation biomarker for AD 3 . The most consistent structural imaging finding in AD is reduced hippocampal volume 4 , but this is arguably not the most specific structural biomarker, as AD frequently presents with non-amnestic symptoms with initial involvement of extra-temporal regions of the brain 5 . Furthermore, reduced hippocampal volume has been found in many other neuropsychiatric conditions, including schizophrenia 6 , depression 7 and hippocampal sclerosis 8 , as well as the recently described limbic-predominant age-related TDP-43 encephalopathy 9 . Together with the hippocampal volume, Aβ, phosphorylated τ (pτ) and total τ (τ) CSF biomarkers have been shown to discriminate patients with AD from healthy controls 10 . However, their introduction into clinical practice is limited by considerable variability between laboratories and assay batches 10 .
Similarly, blood-based biomarkers, which are eagerly awaited to address issues related to the invasiveness and high cost of CSF-based ones, often stall in the early stages because of a disconnect between academia, where biomarkers are identified, and industry, where they should be developed and commercially distributed 11 .
Over the last 40 years, improved computational power and storage capacity have led to numerous advances in developing non-invasive and low-cost structural biomarkers for AD that combine neuroimaging approaches, in particular structural MRI 12 , with machine learning. This approach involves the acquisition of image data, the segmentation of the region of interest (ROI), and feature extraction and selection for classification/prediction. Critically, features extracted from radiological images are able to reveal, at a mesoscopic scale, useful new biology 13,14 hidden to the clinician's eye 15 . For example, the mesoscopic architecture of entire tumours can reveal stromal phenotype or immune context, with strong prognostic or predictive utility 16,17 . In a radiomics analysis, the extracted features represent statistical morpho-functional traits of intensity, shape, texture, scale, grey level co-occurrence matrix (GLCM), grey level run-length matrix (RLM), grey level size zone matrix (GLSZM), neighbourhood grey tone difference matrix (NGTDM) and neighbourhood grey level dependence matrix (NGLDM) 18 . A number of studies have shown texture differences between AD patients and healthy controls (HC) in structures such as the hippocampus, corpus callosum, and thalamus 19,20 . Supplementary Data 1 summarises the results and methods of the most cited papers published in the last 5 years on the classification of AD and AD-related mild cognitive impairment (MCI) patients using multimodal features. Zhang et al. 21 , for instance, used a single-hidden-layer neural network and a predator-prey particle swarm optimisation algorithm to classify HC from AD patients. They extracted texture features from one selected axial slice of a T1-weighted (T1w) MRI scan and obtained 93% accuracy in an internal test set. Similarly, Sorensen et al. 
22 , using linear discriminant analysis, extracted cortical thickness measurements, volumetric measurements and hippocampal volume, shape and texture features from a T1w MRI scan and reached 63% accuracy. With the integration of genetic and cerebrospinal fluid biomarkers, Tong et al. 23 reached a 0.78 area under the curve (AUC) in the discrimination between HC and people with an AD-related mild cognitive impairment, thus pushing the technology towards earlier detection. They used a non-linear graph fusion method to reduce the number of volumetric features extracted from T1w MRI, intensity features extracted from PET data, three CSF measures and one genetic categorical feature. An improved performance was obtained with the view-aligned hypergraph learning approach used by Lin et al. 24 . They obtained 93, 90, 80 and 79% accuracies in the discrimination between HC and AD patients, HC and progressive MCI, HC and MCI, and stable and progressive MCI patients, respectively. In aggregate, when all patients, including controls, prodromal forms of AD and AD, are combined, most methods reach lower accuracy values. Of note, in most studies, models were trained and tested on an internal dataset only (Supplementary Data 1).
The current study proposes a method able to characterise early and later forms of Alzheimer's disease by extracting, from a T1w MRI sequence, 29,520 statistical morpho-functional traits distributed over a multi-regional brain mask obtained with an automatic segmentation. Healthy brains and diseases unrelated to AD pathology, including Parkinson's disease and frontotemporal dementia, have been combined for the development of a set of tools able to reveal the mesoscopic architecture unique to AD.

Methods
The study workflow is summarised in Fig. 1. The analysis of baseline age-matched T1w MRI images consisted of a two-step combined approach with and without the additional information given by cognitive scores and CSF-based biomarkers. The model was trained on 1.5 T T1w MRI scans obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI). After stratified randomisation, 70% of data were used for training and 30% for validation (robustness test shown in Supplementary Fig. 1). The control group (nADrp) included healthy controls and patients with frontotemporal dementia or Parkinson's disease; the disease group (ADrp) included people with AD-related mild cognitive impairment (referred to as MCI AD in the text) and with Alzheimer's disease. The method was tested on four cohorts: (1) The unseen 1.5 T ADNI cohort (30% of the entire 1.5 T cohort, made up of 65 CN, 62 MCI AD , 54 AD, 28 FTD and 25 PD); (2) The unseen 1.5 T dataset: 64 people obtained from the Open Access Series of Imaging Studies (OASIS) consortium with a baseline T1w MRI scan and the mini-mental state examination (MMSE) score (53 CN and 11 AD); (3) The unseen 3 T dataset: 402 people obtained from ADNI with T1w MRI scan, MMSE, logical memory delayed recall total (LDELTOTAL), Aβ, τ and pτ (172 CN, 161 MCI AD and 69 AD); (4) The 'real-world' memory clinic cohort (IMC cohort): 83 patients with atypical presentations who underwent clinical Amyloid PET imaging as part of their diagnostic workup with a 1.5 T T1w MRI scan (45 amyloid-negative (AMY−) and 38 amyloid-positive (AMY+)) and LDELTOTAL and MMSE scores (for a subgroup of 22 people: 11 AMY− and 11 AMY+).
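The stratified 70/30 randomisation described above can be illustrated with a minimal sketch. It assumes that stratification is done per diagnostic group and uses the group sizes reported for the 1.5 T ADNI cohort in the Results; the function name `stratified_split`, the seed and the rounding rule are illustrative choices, not the authors' implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices so that each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)                      # randomise within each group
        cut = round(train_fraction * len(members))
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)

# Group sizes of the 1.5 T ADNI cohort as reported in the Results (783 people).
cohort = ["CN"] * 216 + ["MCI_AD"] * 208 + ["AD"] * 181 + ["FTD"] * 94 + ["PD"] * 84
train_idx, test_idx = stratified_split(cohort)
```

With these group sizes, a per-group 30% hold-out gives 65 CN, 62 MCI AD, 54 AD, 28 FTD and 25 PD, which matches the composition of the unseen 1.5 T ADNI test cohort listed above.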
For the IMC cohort, we received ethical approval from the Camden and Kings Cross UK Research Ethics Committee (IRAS n. 273966) to perform retrospective anonymised and unlinked analysis of all clinical data (including MR images), provided that these were anonymised at source by a member of the clinical care team. In particular, the study protocol states: 'For all patients undergoing Amyloid PET at Imperial College Healthcare NHS Trust (ICHT) from December 2013 to January 2023 we will perform retrospective anonymised and unlinked analysis of clinically collected data. This will be anonymised at source by members of the clinical care team. The data will be unlinked and there will be no prospective element to this data collection.' Informed consent was waived, as is the case for retrospective analysis of anonymised imaging data.
Data for ADNI and OASIS are openly available upon registration of investigator interest. All participants provided informed consent. Details about the Ethics statement of the ADNI study population can be found at: https://adni.loni.usc.edu. Details about the Ethics statement of the OASIS study population can be found at: https://www.oasis-brains.org/#data. Protocols for data collection and the list of institutions who approved data collection can be found at https://adni.loni.usc.edu/methods/documents/ for ADNI. OASIS is made available by the Washington University Alzheimer's Disease Research Center, the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN).
MRI segmentation and radiomic analysis. T1w MRI images were segmented to brain masks of 115 sub-regions using FreeSurfer's recon-all function (45 regions obtained from the segmentation of the white matter + 70 subcortical regions obtained from the additional segmentation of the cortex) 25,26 . Before segmentation, this function performs many pre-processing steps, including bias correction, image sampling and coregistration; the steps and brain regions extracted are summarised in Supplementary Table 1. The multi-regional brain masks were post-processed for the extraction of 656 features for each region using in-house software (TexLAB 2.0), which runs on MATLAB 16 . The extracted features are related to the shape and size, intensity, texture and wavelet decompositions of isotropic (1 × 1 × 1) T1w MRI scans (Supplementary Data 2). The standardised radiomic features with a false discovery rate (FDR) <5% were selected as the input for the LASSO. Tenfold cross-validation was performed to select the lambda that yielded the minimum cross-validated mean squared error. The weighted sum of the selected features gave the Alzheimer's predictive vector, ApV. To improve the model performance, the method was integrated with two cognitive measurements (MMSE and LDELTOTAL) and three CSF-based biomarkers (Aβ, τ and pτ). The result was a second predictive vector: ApV s .
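The selection step described above (standardise the features, choose lambda by tenfold cross-validation on the mean squared error, then take the weighted sum of the surviving features) can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: it uses a linear LASSO fitted by coordinate descent on 0/1 labels, synthetic data in place of radiomic features, and a hypothetical lambda grid.

```python
import random

def standardise(X):
    """Z-score every column of a list-of-rows feature matrix."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    sd = [(sum((row[j] - mu[j]) ** 2 for row in X) / n) ** 0.5 or 1.0 for j in range(p)]
    return [[(row[j] - mu[j]) / sd[j] for j in range(p)] for row in X]

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    return z - lam if z > lam else z + lam if z < -lam else 0.0

def lasso_fit(X, y, lam, sweeps=100, tol=1e-8):
    """Coordinate descent on (1/2n)*||y - b0 - X.beta||^2 + lam*||beta||_1."""
    n, p = len(X), len(X[0])
    b0, beta = sum(y) / n, [0.0] * p
    resid = [yi - b0 for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        shift = sum(resid) / n                    # unpenalised intercept update
        b0 += shift
        resid = [r - shift for r in resid]
        biggest = abs(shift)
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            delta = soft_threshold(rho, lam) / col_sq[j] - beta[j]
            if delta:
                resid = [r - row[j] * delta for r, row in zip(resid, X)]
                beta[j] += delta
                biggest = max(biggest, abs(delta))
        if biggest < tol:                         # stop once a full sweep changes nothing
            break
    return b0, beta

def predict(X, b0, beta):
    """Weighted sum of features: the analogue of the predictive vector."""
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def cv_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest k-fold cross-validated mean squared error."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = {}
    for lam in lambdas:
        sq_err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            preds = predict([X[i] for i in fold], b0, beta)
            sq_err += sum((y[i] - q) ** 2 for i, q in zip(fold, preds))
        scores[lam] = sq_err / len(X)
    return min(scores, key=scores.get)

# Synthetic stand-in data: 100 subjects, 8 features; only features 0 and 3 carry signal.
rng = random.Random(1)
raw = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
labels = [1.0 if 2 * r[0] - 1.5 * r[3] + 0.3 * rng.gauss(0, 1) > 0 else 0.0 for r in raw]
X = standardise(raw)
best_lam = cv_lambda(X, labels, [0.1, 0.05, 0.02, 0.01, 0.005])
intercept, weights = lasso_fit(X, labels, best_lam)
apv = predict(X, intercept, weights)              # one score per subject
selected = [j for j, w in enumerate(weights) if w != 0.0]
```

Features whose weight is shrunk exactly to zero drop out of the sum, which is why the resulting vector depends on only a small subset of regions and features.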
The model is composed of two steps:
1. In the first stage of the classification, the algorithm works on the discrimination of people with an Alzheimer's related pathology. The two inputs to the LASSO1 are the nADrp group, which includes healthy controls and people with Parkinson's disease and frontotemporal dementia, and the ADrp group, which includes people with MCI AD and AD. The result of the LASSO is a reduced number of features/regions with their corresponding weights. The weighted sum of regions/features gives the ApV 1 (ApV 1s with the inclusion of cognitive scores and CSF related biomarkers). People not classified as nADrp are used as inputs for the second stage of the classification.
2. In the second stage of the classification, the algorithm works on the distinction between people with an AD-related mild cognitive impairment and people with Alzheimer's disease. The LASSO2 performs a weighted sum of selected features/regions and gives the ApV 2 (ApV 2s with the inclusion of cognitive scores and CSF related biomarkers), which distinguishes a prodromal from a late phase of AD.
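The two-stage logic above amounts to a simple cascade, sketched below. The class name `LinearScore`, the feature names, the weights and the 0.5 cut-offs are all hypothetical placeholders chosen for illustration; the actual selected features and thresholds are those reported in the figures and tables of the study.

```python
from dataclasses import dataclass

@dataclass
class LinearScore:
    """A predictive vector: intercept plus weighted sum of the selected features."""
    intercept: float
    weights: dict  # feature name -> non-zero LASSO weight (hypothetical values)

    def __call__(self, features):
        return self.intercept + sum(w * features[name] for name, w in self.weights.items())

def two_stage_classify(features, apv1, apv2, cut1=0.5, cut2=0.5):
    """Stage 1 separates nADrp from ADrp; stage 2 runs only for ADrp cases."""
    if apv1(features) < cut1:
        return "nADrp"
    return "AD" if apv2(features) >= cut2 else "MCI_AD"

# Hypothetical models and subjects, for illustration only.
apv1 = LinearScore(0.5, {"feature_a": 0.4})
apv2 = LinearScore(0.5, {"feature_b": 0.3})
subject_1 = {"feature_a": -1.0, "feature_b": 0.0}
subject_2 = {"feature_a": 1.0, "feature_b": -1.0}
subject_3 = {"feature_a": 1.0, "feature_b": 1.0}
results = [two_stage_classify(s, apv1, apv2) for s in (subject_1, subject_2, subject_3)]
```

Because stage 2 only ever sees subjects that stage 1 flags as ADrp, an error in the first stage cannot be corrected in the second; this is a property of any cascade of this kind.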
The performance of the algorithm was tested using two methods. In Method A, the features extracted from the 45-region brain mask (alone and together with cognitive/CSF scores) were used; in Method B, the features extracted from the (45 + 70)-region brain mask (alone and together with cognitive/CSF scores) were used. Based on the accuracy and AUC values, Method B was chosen for the computation of the ApV 1 , and Method A was chosen for the computation of ApV 1s , ApV 2 and ApV 2s (Table 1).
Fig. 1 Overview of the study design and two-step least absolute shrinkage and selection operator (LASSO) approach. Data used in this work were obtained from the ADNI database, the OASIS consortium and the hospital memory clinic (IMC cohort). Age-matched T1w MRI images were collected and segmented into 115 brain regions using FreeSurfer's recon-all function. Isotropic (1 × 1 × 1) T1w MRI scans and their brain masks were used for the radiomic analysis in a combined double-step approach. After the selection and the standardisation of features, a first least absolute shrinkage and selection operator (LASSO1) was trained to classify people into those without and with AD-related pathology (nADrp and ADrp). Within the last group, a second LASSO (LASSO2) was trained to distinguish patients with a mild cognitive impairment due to AD (MCI AD ) from AD patients. The model was also integrated with cognitive scores (MMSE and LDELTOTAL) and CSF-based biomarkers (Aβ, τ and pτ). As the final algorithm was to be used to discriminate between ADrp and nADrp, healthy controls and patients affected by other non-AD pathologies (e.g. frontotemporal dementia and Parkinson's disease dementia) were combined into one group referred to as the non-AD-related pathology group. Initial analysis of T2w MRI data did not yield discriminatory information, so only T1w MRI data is reported.
Genomic analysis. Six genome-wide association study (GWAS) analyses were performed across three phenotypes (nADrp, MCI AD , AD) derived from three variables (original label (ADNI), ApV and ApVs). One GWAS was performed for nADrp vs MCI AD and another GWAS for nADrp vs AD across all three variables. APOE4 allele status was provided by the ADNI APOE genotype dataset. All the GWAS analyses were adjusted for age and gender using the GWASTools R package (v1.36). Each GWAS analysis calculated the main effects of all single-nucleotide polymorphisms (SNPs) on the target label (MCI AD /AD). For all GWAS the empirical p values were based on the Wald statistic 27 . Manhattan plots were used to visualise GWAS results.
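As an illustration of how a Wald-based p value relates to the height of a point on a Manhattan plot, the sketch below assumes a single SNP effect estimate with its standard error and a standard normal reference distribution. The function names and the numerical inputs are hypothetical; the study itself relied on the GWASTools R package.

```python
import math

def wald_p_value(beta, se):
    """Two-sided p value for H0: beta = 0, using z = beta / se ~ N(0, 1)."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def manhattan_height(p_value):
    """Manhattan plots show -log10(p) on the vertical axis."""
    return -math.log10(p_value)

# Hypothetical effect estimate and standard error for one SNP.
p_snp = wald_p_value(beta=0.55, se=0.10)   # z = 5.5
height = manhattan_height(p_snp)
```

A z score of 5.5 corresponds to a two-sided p value below 10^-7, so this hypothetical SNP would plot above a line drawn at a height of 7.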
Statistics and reproducibility. Standard statistical analysis was applied to all the figures as appropriate and indicated in the figure legends. All samples were used once. Multiple testing was corrected with the FDR method. All the statistical analyses were conducted in Matlab R2019b.
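The text does not state which FDR procedure was applied. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure at a 5% level, the selection of tests (for example, candidate features) could look like this; the p values are invented for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True where it passes BH at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:   # step-up rule: keep the largest passing rank
            max_rank = rank
    keep = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            keep[i] = True
    return keep

# Invented p values for eight hypothetical features.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
passed = benjamini_hochberg(p_vals)
```

Only the first two p values pass here, although five are below 0.05 individually; this is the intended effect of controlling the false discovery rate across many tests.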
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Results
Characteristics of data and patients. Data used in this work were obtained from the ADNI database (www.loni.ucla.edu/ADNI), launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI is to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see www.adni-info.org. From this database, we included all people for whom baseline MRI data (T1w magnetisation-prepared rapid acquisition with gradient echo (MP-RAGE) sequence at 1.5 T), age, cognitive scores (MMSE 28 , a brief screening test for cognitive status, and LDELTOTAL 29 , a measure of verbal episodic memory) and CSF-based biomarkers (Aβ, τ and pτ) were available.
For the diagnostic classification at baseline, the method was trained on 783 people scanned at 1.5 T (ADNI1 cohort). They were grouped as 216 healthy controls, 208 people with MCI due to AD (MCI AD ), 181 AD, 94 patients with Frontotemporal Dementia (FTD), and 84 with Parkinson's disease (PD).
In particular, based on the data obtained from the ADNI database, two new groups of people were defined: the nADrp group, which contains people who do not show any pathology related to AD (healthy controls, PD and FTD were included here); and the ADrp group which, on the contrary, contains people with MCI due to AD and AD patients.
A multiparametric analysis was conducted on a subset of 118 diffusion tensor imaging (DTI) MRI sequences obtained from ADNI (39 AD, 40 CN and 39 MCI AD ). They were used to assess the variability of the fractional anisotropy (FA) and its relationship with the extracted features. Finally, quantitative phenotypes derived from ADNI Genetics Core were available for 199 CN, 187 MCI AD and 166 AD people of our 1.5 T training cohort and used for GWAS analysis. The classification between nADrp and ADrp, as well as the classification between MCI AD and AD patients were tested with two methods. With Method A, the algorithm received as input features extracted from the 45 brain regions resulting from segmentation of the white matter (without and with the CSF/cognitive scores). Method B considered the features extracted from the 70 subcortical regions (without and with the CSF/cognitive scores).
For each subject, T1w MRI images were automatically segmented into 115 regions from which radiomic features were independently acquired, standardised and reduced with a machine learning-based model. They were finally combined in Alzheimer's predictive vectors.
ApV 1 , a biomarker to discriminate between patients with and without AD-related pathology. Among the 656 features extracted for each of the 115 brain regions, LASSO1 selected 20 features (those with non-zero coefficients) distributed in 14 regions (Fig. 2a). The weighted sum of extracted features in the selected regions gave the Alzheimer's predictive vector ApV 1 . With the integration of cognitive scores and CSF-based biomarkers, LASSO1 selected 19 features distributed among 12 regions (Fig. 2b). In a similar way, the combination of features, cognitive scores and regions gave the predictive vector ApV 1s . Figure 2aI, aII (and bI-bII) show the tenfold cross-validated deviance of the LASSO fit and the feature coefficients plotted against the shrinkage parameter lambda extracted for the ApV 1 (ApV 1s ). The measures of diagnostic accuracy are summarised in Table 2. Of note, the measurements of diagnostic accuracy of Aβ are obtained with the application of established cut-off values 32 from the comparison between CN and ADrp. Compared to the standard measures, our method showed higher specificity, sensitivity, accuracy, negative and positive predictive values, likelihood ratios and diagnostic odds ratios. ApV 1 showed a state-of-the-art accuracy of 0.98 (0.26 and 0.62 for the volume of the hippocampus and CSF Aβ, respectively) in the prediction of AD-related pathologies. Of note, neither age nor CSF biomarkers were selected by LASSO1.
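The diagnostic accuracy measures named above all derive from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions; the counts are hypothetical and are not taken from Table 2.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "lr_plus": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_minus": (1 - sens) / spec,
        "dor": (tp * tn) / (fp * fn) if fp and fn else float("inf"),
    }

# Hypothetical counts: 100 ADrp (90 detected) and 100 nADrp (80 correctly excluded).
m = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

The diagnostic odds ratio equals the ratio of the positive to the negative likelihood ratio, so it summarises both in a single number.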
The testing of the method on the unseen 1.5 T OASIS cohort showed 0.81 and 0.83 accuracies for ApV 1 and ApV 1s , respectively ( Table 2). Applied unmodified to a different field strength (3 T), our method showed 91 and 80% specificity, together with reduced accuracy of 0.49 and 0.47 for the ApV 1 and ApV 1s , respectively.
ApV 2 , a biomarker to categorise ApV 1 /ApV 1s positive patients into prodromal (MCI AD ) and late (AD) groups. The LASSO2 selected eight features distributed in seven regions (Fig. 3a) with a dominance of the left brain. The weighted sum of the extracted features in the selected regions gave the Alzheimer's predictive vector ApV 2 . With the integration of cognitive scores and CSF-based biomarkers, the LASSO2 selected 19 features distributed in 15 regions (Fig. 3b). The combination of features, cognitive scores and regions gave the predictive vector ApV 2s . Figure 3aI, aII (and bI-bII) show the tenfold cross-validated deviance of the LASSO2 fit and the feature coefficients plotted against the shrinkage parameter lambda extracted for the ApV 2 (ApV 2s ). Figure 3aIII, aIV show the ROC curve for the validation of ApV 2 (AUC of 0.79) and the distribution of the validated ApV 2 in the MCI AD and AD groups, respectively. Similarly, Fig. 3bIII, bIV show the ROC curve for the validation of ApV 2s (AUC of 0.95) and the distribution of the validated ApV 2s in the MCI AD and AD groups, respectively. The predictive ability of the ApV 2 in discriminating people with prodromal and later forms of AD, in comparison with the standard clinical measures (the volume of the hippocampus and the CSF Aβ), was quantified with the measures of diagnostic accuracy and is summarised in Table 3. ApV 2 reached an accuracy of 0.79 in the prediction of AD, rising to 0.86 with the integration of clinical scores, independent of age and CSF biomarkers. The high accuracy is remarkable given the continuum of disease progression between MCI AD and AD. Applied to a different field strength (3 T), our method showed an accuracy of 0.62 and 0.82 for the ApV 2 and ApV 2s , respectively. The LASSO2 could not be tested on the OASIS cohort as it does not include any MCI AD people. In aggregate, our results show a predominant dysfunction in the left hemisphere 33 . 
This confirms the strong left-hemispheric lateralisation found in the early stages of the disease compared to weak right-hemispheric lateralisation found in advanced stages 34 (see also Supplementary Note 1 and Supplementary Fig. 3).
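For readers less familiar with the AUC values quoted for the ROC curves above, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The scores are invented, and the function name `roc_auc` is an illustrative choice rather than anything from the study's software.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), counting ties as one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented predictive-vector scores for three AD and three MCI AD subjects.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Here eight of the nine positive-negative pairs are ordered correctly, giving an AUC of 8/9; an AUC of 0.5 would indicate no discrimination.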
Repeatability of the Alzheimer's predictive vectors. The ApV methods were compared to the standard imaging measure (the volume of the hippocampus) and tested on a second T1w MRI scan obtained on the same day as the baseline scan used for training the model. The Bland-Altman plots are shown in Supplementary Fig. 4. Based on the reporting guidelines by Koo and Li 35 , a one-way random effects, absolute agreement, single rater/measurement intraclass correlation coefficient was evaluated and was 0.83, 0.89, 0.83 and 0.82 for ApV 1 , ApV 1s , ApV 2 and ApV 2s , respectively. The intraclass correlation coefficient for the hippocampal volume was 0.94. A boxplot of the distribution of the volumes of the hippocampus in the main groups is also shown in Supplementary Fig. 4f. The robustness (non-random nature) of our ApV 1 and ApV 2 was further tested. Results are summarised in Supplementary Table 2. The measurements of diagnostic accuracy of ApV 1 (a) and ApV 2 (b) are reported for the ApV computed with the complete set of features selected by the LASSO (Ftot), with the four features with the highest weights (Ftest4), and with all the possible permutations of three (Ftest3-p1, Ftest3-p2, Ftest3-p3, Ftest3-p4) and two features (Ftest2-p5, Ftest2-p6, Ftest2-p7, Ftest2-p8, Ftest2-p9 and Ftest2-p10). With regard to the ApV 1 , Ftest4 showed a performance comparable to Ftot. Among all the permutations, Ftest3-p2 obtained the best performance, involving the features extracted in the right middle temporal, rostral middle frontal and temporal pole regions (98% accuracy, 0.99 AUC). Regarding ApV 2 , the best performance was obtained when the ApV was computed with only two features extracted from the left cerebral white matter (WM) and left cerebellum WM (78% accuracy and 0.79 AUC).
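The repeatability coefficient described above (one-way random effects, absolute agreement, single measurement) can be written in terms of the between-subject and within-subject mean squares, MSB and MSW, as (MSB - MSW) / (MSB + (k - 1) MSW) for k repeated measurements. The following is a minimal sketch with invented test-retest values; the function name `icc_one_way` is hypothetical.

```python
def icc_one_way(data):
    """One-way random effects, single-measurement ICC for rows = subjects, columns = repeats."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)          # between subjects
    msw = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Invented scan/rescan scores for four subjects.
icc = icc_one_way([[0.20, 0.25], [0.60, 0.55], [0.90, 0.95], [0.40, 0.35]])
```

Values close to 1 indicate that differences between subjects dominate the scan-to-scan variation, which is the sense in which the predictive vectors are described as repeatable.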
The ApV on 'real-world' data. The model was tested on the IMC cohort, which includes people who underwent a clinical amyloid PET scan at our institution and are classified as amyloid-positive (AMY+) or amyloid-negative (AMY−). When applied to this 'real-world' cohort, no statistical difference in ApV 1 and ApV 2 was found between people with positive and negative amyloid enhancement (p = 0.88) (Supplementary Fig. 5b). Regardless of the PET output, people were classified as nADrp and MCI AD (in particular, of the 45 AMY−, 42 were classified as nADrp, 2 as MCI AD and 1 as AD; of the 38 AMY+, 36 were classified as nADrp and 2 as MCI AD ). The model was also tested on a subgroup of 22 people whose T1w MRI scan was obtained 5 ± 4 months after Amyloid PET imaging and was used together with the MMSE and the LDELTOTAL cognitive scores. In this small cohort, people with a negative PET scan were classified as nADrp (N = 8), MCI AD (N = 2) and AD (N = 1). People with a positive scan were evenly classified as nADrp and MCI AD (N = 5 each); only one subject was classified as AD. In relation to the PET output, our ApV 1s showed a statistical difference between AMY− and AMY+ (p = 0.02) (Supplementary Fig. 5b).
Genome-wide association study and fractional anisotropy. Figure 4 shows the Manhattan plot of the GWAS for the ApVs. The Manhattan plot shows one SNP above a significance threshold of p < 10 −7 . This SNP corresponds to the RS ID rs2075650. The rs2075650 SNP was above the significance threshold across all variables: original labels, ApV and ApVs (Supplementary Figs. 6, 7). For all cognitively normal vs mild cognitive impairment comparisons, no SNPs were above the threshold. Additionally, in the ApV group (nADrp vs AD), the SNP rs575606 was above a threshold of p < 10 −6 (Supplementary Fig. 6). When performing a GWAS adjusting for the presence of one or two APOE4 alleles, no SNPs were identified as significantly associated with AD in any of the outcomes (Supplementary Fig. 7). Additionally, we present LocusZoom plots of the 2000 base pairs around rs2075650 on the GWAS results without the adjustment for APOE4 (Supplementary Fig. 8). An extensive interpretation of the GWAS results is included in Supplementary Note 2. In particular, Supplementary Note 2 includes the allele frequency evaluation (allele proportions and Hardy-Weinberg Equilibrium Fisher's exact test p value) for the SNP rs2075650, which shows 'B' to be the minor allele with both the ApVs and ApV classification (Supplementary Table 3).
In agreement with the ADrp phenotype, the analysis of fractional anisotropy from DTI MRI sequences showed a neuronal loss in ADrp people. The variation of FA was tested in 115 brain regions. A Wilcoxon rank-sum test was used to test the regional statistical difference of FA between nADrp (N = 79) and ADrp (N = 39) and between MCI AD (N = 31) and AD (N = 8) people. For most regions, no statistically significant reduction was present (p > 0.05) (Fig. 4C). Twenty-two out of 115 regions showed a significant variation of FA between nADrp and ADrp (left and right cerebral cortex and the left caudate showed an FA increase). Between MCI AD and AD, 11 out of 115 regions showed a significant variation of FA (an increase of FA was present only in the left amygdala). Figure 4D shows the absolute values of FA in the regions for which a statistical difference was found between nADrp and ADrp and between MCI AD and AD patients (p < 0.05).
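The regional comparison above relies on the Wilcoxon rank-sum test. The sketch below shows the mechanics for one region, using the large-sample normal approximation without tie or continuity corrections, which is an assumption; statistical packages may instead use an exact method for small samples. The FA values are invented.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum statistic for x and two-sided p value (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):          # tied values share the average rank
            ranks[t] = (i + j) / 2 + 1
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return w, math.erfc(abs(z) / math.sqrt(2))

# Invented regional FA values for two small groups.
fa_group_a = [0.41, 0.43, 0.44, 0.46, 0.47]
fa_group_b = [0.30, 0.32, 0.33, 0.35, 0.36]
w_stat, p_val = rank_sum_test(fa_group_a, fa_group_b)
```

Running such a test once per region over 115 regions yields 115 p values, which is why a multiple-testing correction is relevant when interpreting the number of significant regions.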

Discussion
This study presents a novel MRI-based radiomic predictive vector which outperforms standard hippocampal volume and CSF Aβ measurements (Table 2) reaching a 0.98 accuracy in an internal test set (mean value 0.9830, 95% confidence interval (CI) [0.9829, 0.9831]) for the triage of people without an AD-related pathology. Our ApV is robust and repeatable across MRI scans (Supplementary Fig. 4), demonstrating its potential for applicability in clinical practice in the future.
This method does not require a subject matter expert, but rather uses established software for both brain segmentation (FreeSurfer) 25,26,36 and radiomics analysis 16 . The algorithm computes manually engineered features allowing an easy interpretation of the ApV and facilitating clinical translation. To avoid overfitting, the dimensionality of the model is reduced with the 'least absolute shrinkage and selection operator' 37 , which selects the most informative and less redundant features corresponding to specific brain regions. The LASSO is suitable for the regression of high-dimensional features in a radiomics strategy 38 allowing, in a single regression model, the statistical analysis of complex data where data are labelled to exploit dependence patterns in specific brain regions. Compared to the most common multivariate models present in the literature (Random Forest, Naïve Bayes, K-Nearest Neighbours and Support Vector Machine), our univariate analysis shows higher accuracy (Supplementary Table 4) and easier interpretability, thanks to the implementation of manually engineered features, facilitating clinical translation.
Table footnote: In the testing set, AUC values were generated from sensitivity and specificity 62 . DOR, diagnostic odds ratio.
In order to improve the model's generalisability, the training of ApV exploits commonalities and differences within the segmentations between controls and patients with FTD, PD, MCI due to Alzheimer's disease and AD-appreciating that patients who come to the memory clinic may have other conditions. We rationalised that the extra information from FTD and PD segments will allow the model to gain a better contextual understanding of the regions of interest and better discriminate nADrp from ADrp rather than for detecting FTD or PD per se. Appreciating that the inclusion of non-AD pathologies in the control group of the training set could have introduced a classification bias leading to an overrated model accuracy, further tests were done to assess the impact of PD and FTD patients in the nADrp group. The measurements of diagnostic accuracy obtained when the classification is computed between CN and ADrp, as well as between CN and MCI AD and CN and AD patients (in comparison with the proposed original method, in italic - Table 4) prove that the performance of our method is not influenced by the presence of PD and FTD patients in the nADrp group.
In an internal test set (the 1.5 T ADNI cohort), the ApV 1 is able to discriminate between people with (ADrp) and without (nADrp) Alzheimer's related pathologies with a 0.98 accuracy. Unlike the majority of published research studies, where models are usually trained between two categories (e.g. HC vs AD or MCI vs AD) (Supplementary Data 1), our algorithm includes both AD patients and people with the early form of AD, mild cognitive impairment, in the ADrp group. This procedure permits triage of patients who have neither MCI AD nor AD, taking into account the notion that Alzheimer's disease exists along a spectrum, from early memory changes to functional dependence and death. To the best of our knowledge, the accuracy reached by the ApV in the internal dataset (obtained by analysing MRI data with or without cognitive scores) is superior to the ones obtained from published research studies, which focus on a single internal test set only 39-41 . However, the true performance of a radiomic model needs to be validated on external datasets or independent institutional cohorts; in practice, only a minority of studies report an application of algorithms to external datasets 42 . When tested on an external test set (the unseen 1.5 T OASIS cohort), our algorithm reaches a 0.86 accuracy, higher than previously reported studies 43 . Furthermore, when compared to the standard clinical measures of hippocampal atrophy and cerebrospinal fluid beta-amyloid concentration, the ApV shows higher accuracy, presenting a potentially valid alternative to the invasive CSF measurements.
To be precise, the ApV is independent of the amyloid levels in the CSF. Although a stronger pathological biomarker signature is encountered when increased CSF concentrations of τ and pτ species, decreased concentrations of Aβ 32,44 and cognitive scores are considered together with structural data, it is notable that Aβ, τ and pτ were not selected as part of the optimised ApV algorithm. This result can be explained by the intrinsically low accuracy of the CSF-based biomarkers collected for our cohorts (Supplementary Table 5) with respect to the established cut-off values (93 pg/ml for τ, 192 pg/ml for Aβ1-42 and 23 pg/ml for pτ) 32 . Because the ApV does not overlap with the CSF biomarkers, a combination of the two could be explored in the future to further improve accuracy in early MCI AD/AD.
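The cut-off comparison described above can be illustrated with a minimal sketch. It assumes strict inequalities and the direction of abnormality stated in the text (increased τ and pτ, decreased Aβ1-42); the function names and the example values in the usage note are hypothetical and are not taken from the study's code or data.

```python
# Cut-off values quoted in the text, in pg/ml.
CSF_CUTOFFS = {"tau": 93.0, "abeta42": 192.0, "ptau": 23.0}

# Direction of abnormality (assumption based on the text): tau and p-tau
# are abnormal when increased, Abeta1-42 when decreased.
ABNORMAL_IF_ABOVE = {"tau": True, "abeta42": False, "ptau": True}


def is_abnormal(marker, value, cutoffs=CSF_CUTOFFS):
    """Return True if a single CSF value is on the abnormal side of its cut-off."""
    if ABNORMAL_IF_ABOVE[marker]:
        return value > cutoffs[marker]
    return value < cutoffs[marker]


def csf_abnormal_flags(tau, abeta42, ptau):
    """Flag each CSF marker of one subject against its cut-off."""
    values = {"tau": tau, "abeta42": abeta42, "ptau": ptau}
    return {name: is_abnormal(name, v) for name, v in values.items()}


def single_marker_accuracy(values, labels, marker):
    """Accuracy of a one-marker rule; labels are 1 (ADrp) or 0 (nADrp)."""
    predictions = [int(is_abnormal(marker, v)) for v in values]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For instance, with four made-up Aβ1-42 values `[150, 230, 180, 200]` and labels `[1, 0, 0, 0]`, `single_marker_accuracy(..., "abeta42")` measures how often the 192 pg/ml rule alone agrees with the labels.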
The ApV describes the mesoscopic architecture and the biological changes of an AD brain. With an unsupervised approach, and appreciating the lack of post-mortem AD confirmation in our cohort, the algorithm selects texture and shape features, strong biomarkers of AD 20,45,46 , in regions typically involved in the development of the disease (the hippocampus, entorhinal cortex, amygdala 47 ). In particular, our results show a predominant dysfunction in the left hemisphere 33 , in agreement with the strong left-hemispheric lateralisation found in the early stages of the disease compared to the weak right-hemispheric lateralisation found in advanced stages 34 . As extensively described in the 'Biological interpretation of ApV' in Supplementary Note 1, the cortical grey matter structural changes, usually due to the ageing brain and cognitive decline caused by neuronal loss [48][49][50] , are represented in part within the ApV by GLCM and FD features 51 and confirmed, with the multiparametric analysis of DTI MRI images, by the statistically significant decrease of FA in AD patients. For example, the GLCM correlation feature, filtered with an LHL wavelet filter, in the left lateral ventricle expresses the dependency of grey level values to their respective voxels in the GLCM, possibly relating to the distribution of grey levels in this brain region of AD patients, where ventriculomegaly is commonly observed. In most neurodegenerative disorders, brain parenchymal shrinkage causes the passive enlargement of the lateral, third and fourth ventricles, with a significant ventricular enlargement associated with AD 52 . Furthermore, cognitive decline, reflected in the local neuronal loss of many hippocampal subfields (subiculum, cornu ammonis) following AD progression (as also confirmed by the statistically significant decrease of fractional anisotropy), is expressed by the Neighbouring Grey Tone Difference Matrix (NGTDM) coarseness feature extracted in the right hippocampus.
This is a measure of the average difference between the central voxel and its neighbourhood and is an indication of the spatial rate of change. A higher value indicates a lower spatial change rate and a locally more uniform texture.
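The coarseness definition above can be made concrete with a small sketch. It is a simplified 2D illustration of the standard NGTDM coarseness, 1 / Σ p_i s_i, on an already discretised image with an 8-connected neighbourhood. It is not the 3D, wavelet-filtered computation used in the study; the function name and the handling of border pixels (averaging over the neighbours that exist) are assumptions made for illustration.

```python
import numpy as np


def ngtdm_coarseness(image):
    """Simplified 2D NGTDM coarseness for a discretised grey-level image.

    For every pixel, take the absolute difference between its grey level and
    the mean of its available 8-connected neighbours. s_i sums these
    differences over pixels of grey level i, p_i is the fraction of pixels
    with level i, and coarseness = 1 / sum_i(p_i * s_i).
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    s = {}       # grey level -> summed absolute difference
    counts = {}  # grey level -> number of pixels with a valid neighbourhood

    for r in range(rows):
        for c in range(cols):
            patch = img[max(r - 1, 0):min(r + 2, rows),
                        max(c - 1, 0):min(c + 2, cols)]
            n_neighbours = patch.size - 1
            if n_neighbours == 0:
                continue  # isolated pixel, no neighbourhood to compare with
            level = float(img[r, c])
            neighbour_mean = (patch.sum() - level) / n_neighbours
            s[level] = s.get(level, 0.0) + abs(level - neighbour_mean)
            counts[level] = counts.get(level, 0) + 1

    n_valid = sum(counts.values())
    if n_valid == 0:
        return float("inf")
    denominator = sum((counts[lv] / n_valid) * s[lv] for lv in s)
    # A perfectly uniform image has no spatial change at all.
    return 1.0 / denominator if denominator > 0 else float("inf")
```

On a 4 x 4 binary checkerboard the function returns a lower coarseness than on a 4 x 4 image split into two uniform halves, matching the statement that higher values indicate a lower spatial change rate and a locally more uniform texture.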
Together with high pass wavelet filters applied in one dimension and a low pass one applied in the other two, the extraction of the coarseness in the hippocampus represents an index of heterogeneity. Interestingly, the algorithm also selects regions not commonly related to AD, such as the cerebellum and the ventral diencephalon. Together with a few studies reported in the literature 53,54 , this outcome challenges the traditional view that white matter bundles in the cerebellum or in the ventral diencephalon are not affected by AD, possibly highlighting new therapeutic opportunities. The GWAS performed across nADrp, MCI AD and AD, derived from the ApV classification labels, highlights genetic insights distinct from the classical APOE-only gene association in AD. The non-causal significant alteration of the SNP rs2075650 found in patients with an ADrp-like phenotype reinforces a body of research that associates this variant, located in the TOM40 gene, with MCI AD and AD [55][56][57] . TOM40 is located adjacent to APOE, and the two genes are thought to be correlated with Alzheimer's disease due to linkage. Given that rs2075650 is no longer significant after adjusting for APOE4 allele status, the TOM40 association signal appears to be driven by the APOE4 allele and surrounding variants.
The ApV is also age-independent for the age range used. The similarity between age-related atrophy in AD and in normal ageing represents one limitation of applying multivariate models to structural MRI 58 . In this study, this issue was first assessed following the age-correction method by Moradi et al. 59 ; because this introduced a large distortion in the MRI images, limiting the reliability of the extracted features, age was instead considered as an additional feature. Age was not selected among the least redundant, most significant features. This method provides a biomarker able to detect an early stage of AD, with the potential to significantly improve the clinical decision support system. The ApV was tested on a clinical cohort of people with objective cognitive impairment and uncertain underlying aetiology caused by an atypical clinical course or the presence of multiple co-morbidities (Fig. 5a). When employed in this cohort, the ApV outperformed the hippocampal volume measurements (Fig. 5b) and the standard cognitive scores (Fig. 5e), showing a statistically significant difference between the AMY− and AMY+ groups (p = 0.02, Fig. 5d). Therefore, where isolated hippocampal atrophy or episodic memory impairment fails to differentiate AMY+ from AMY− patients, the ApV shows a stronger diagnostic potential.
Other than its retrospective nature, a limitation of this study is the lower performance of the method when tested unmodified at a different, higher field strength (the unseen 3 T dataset). As shown in Table 2, very high positive predictive values are associated with low sensitivity and overall low accuracy for both the ApV 1 and ApV 2 obtained from a baseline 3 T ADNI cohort. This result supports the hypothesis that MRI radiomic features are susceptible to magnetic field strength 60 and limits the applicability of our current method to 1.5 T data. Future studies will focus on the development of preprocessing techniques to improve the performance of the algorithm on 3 T data, together with the introduction of an equivalent algorithm for higher field strengths. A second limitation of this study is the impossibility of directly comparing our method with the published literature. This is mainly related to how we decided to structure our input to improve the model's generalisability: the control group, together with healthy people, also contains people with Parkinson's disease and frontotemporal dementia. A third limitation of this study is related to the computational effort needed to pre-process the structural MRI data. The segmentation step performed by FreeSurfer's recon-all function usually requires about 10-12 h per subject. To reduce computation time, we decided to re-run the analyses in parallel using 12 logical cores: with this approach, a group of 10-15 scans was segmented in the same amount of time. In fact, we believe that with the implementation of a faster segmentation pipeline, this work would outperform the clinical tests now used in isolation. A possible future solution to minimise segmentation time in clinical practice could be the extraction of a custom T1w-MRI-based template built from the chosen dataset (e.g. using the SPM DARTEL pipeline).
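One way to run the segmentation of several subjects in parallel, as described for the third limitation, is sketched below. This is an assumption about how such a batch could be organised, not the authors' script: the function names, the subject identifiers and the paths are hypothetical, and the sketch simply launches one `recon-all` process per scan with at most 12 running at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_recon_all_cmd(subject_id, t1_path, subjects_dir):
    """Build the FreeSurfer command that fully segments one T1w scan."""
    return ["recon-all", "-s", subject_id, "-i", t1_path,
            "-sd", subjects_dir, "-all"]


def segment_in_parallel(scans, subjects_dir, max_workers=12, dry_run=False):
    """Segment several scans concurrently.

    scans: dict mapping a subject id to the path of its T1w image.
    Returns the list of commands if dry_run is True, otherwise the list of
    process return codes (0 means recon-all finished without error).
    """
    commands = [build_recon_all_cmd(sid, path, subjects_dir)
                for sid, path in scans.items()]
    if dry_run:
        return commands

    def run(cmd):
        # Each worker thread only waits on an external process.
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

Calling `segment_in_parallel({"sub01": "/data/sub01_T1w.nii"}, "/data/subjects", dry_run=True)` returns the commands without executing them, which is a convenient check before committing many hours of compute.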
In summary, this study proposes an unsupervised approach for the development of an MRI-based biomarker for the biological characterisation of AD. The ApV is reproducible and robust. It can be easily computed with the calculation of manually engineered features and is ready to be integrated into the clinical decision support system without the need for additional sampling or patient testing.
Fig. 4 Genetic and molecular characteristics associated with the ApV biomarker. In A, B the Q-Q and Manhattan plots of genome-wide association study (GWAS) of the cognitively normal and Alzheimer's disease labels derived from ApVs are shown. In detail, B is the Manhattan plot of the p values (−log10(Wald p value)) from GWAS analysis of the ApVs. The horizontal line displays the cut-off for two significant levels (p < 10^−7). Shown in A is the quantile-quantile (Q-Q) plot of the distribution of the observed p values (−log10(observed p value)) in this sample versus the expected p values (−log10(expected p value)) under the null hypothesis of no association. Shown in C is the variation of fractional anisotropy tested in 115 brain regions. A Wilcoxon rank-sum test was used to test the regional statistical difference of FA between nADrp (N = 79) and ADrp (N = 39) and between MCI AD (N = 31) and AD (N = 8) people. D The absolute values of FA in the regions for which a statistical difference was found between nADrp and ADrp and between MCI AD and AD patients (p < 0.05) are shown.
The two inputs to the LASSO1 are the nADrp group, which includes healthy controls and people with Parkinson's and frontotemporal disease, and the ADrp group, which includes people with MCI AD and AD. The diagnostic performance of the algorithm was tested when the classification is computed between the ADrp group and healthy people, between CN and MCI AD and CN and AD patients.

Data availability
The radiomics data generated in this study, all the other data supporting the findings of this study, and the source data underlying the graphs and charts shown in the manuscript have been deposited into the Mendeley database under the accession code https://doi.org/10.17632/rpztyz22df 61 .

Code availability
The MATLAB scripts used to reproduce the key findings and generate figures are publicly accessible in Mendeley Data with the identifier https://doi.org/10.17632/rpztyz22df.1 61 .
Received: 28 July 2021; Accepted: 24 May 2022;
Fig. 5 Early detection of Alzheimer's disease in an atypical-AD cohort. a Patients presenting at the IMC with suspected cognitive decline undergo a range of standard diagnostic investigations, such as MRI and neuropsychological assessment, which can vary across individuals depending on the clinical presentation. Where diagnostic uncertainty persists, the decision to perform Amyloid PET Imaging is made by consensus by a multidisciplinary team 30 and in line with the appropriate use criteria 31 . In this context, a positive Amyloid PET imaging is highly suggestive of an underlying AD diagnosis, while a negative scan rules out AD. Patients with a negative Amyloid PET imaging often have either a non-AD type of dementia (e.g., FTD) or other non-neurodegenerative causes of cognitive impairment (e.g. depression).