Prediction of tau accumulation in prodromal Alzheimer’s disease using an ensemble machine learning approach

We developed machine learning (ML) algorithms to predict abnormal tau accumulation among patients with prodromal AD. We included 64 patients with prodromal AD from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. Supervised ML approaches based on the random forest (RF) and a gradient boosting machine (GBM) were used. The GBM yielded an AUC of 0.61 (95% confidence interval [CI] 0.579–0.647) with clinical data (age, sex, years of education) and a higher AUC of 0.817 (95% CI 0.804–0.830) with clinical and neuropsychological data. The highest AUC, 0.86 (95% CI 0.839–0.885), was achieved when additional information such as cortical thickness was added to the clinical data and neuropsychological results. Analysis of the variable importance ranking in each ML classifier showed that cortical thickness of the parietal and occipital lobes and neuropsychological tests of the memory domain were the most important features. Our ML algorithms predicting tau burden may provide important information for the recruitment of participants in potential clinical trials of tau-targeting therapies.


Scientific Reports | (2021) 11:5706 | https://doi.org/10.1038/s41598-021-85165-x | www.nature.com/scientificreports/
In a previous study, worse performance on the domain-specific neuropsychological tests was associated with a greater 18F-AV1451 uptake in key regions implicated in memory, visuospatial function, and language 14. In the combined prodromal AD and AD dementia group, increased tau PET uptake and reduced cortical thickness were associated with worse performance on a variety of neuropsychological tests 15. Altogether, these biomarkers seem to be potential features for classifiers predicting tau burden. In particular, different models with various combinations of biomarkers are needed because not all cohorts/centres have access to all biomarkers.
In the present study, we aimed to develop a model to predict tau burden in prodromal AD using multimodal biomarkers. We hypothesized that ML could provide an objective, unbiased estimator for classifying tau positivity as an alternative to conventional statistical methods. We developed and validated several RF and GBM models with various combinations of variables in order to account for differing clinical environments. Variable importance and partial dependence plots (PDP) were also assessed to identify the most relevant features and their relationship to tau burden.

Results
Demographics and clinical characteristics of participants. The demographic information of the participants is summarised in Table 1. The A+T+ group had a higher percentage of female participants than the A+T− group (58.8% vs. 26.7%, p = 0.009). The A+T− group had more years of education than the A+T+ group (16.6 ± 3.1 years vs. 15.3 ± 2.1 years, p = 0.045). There were no differences in age (p = 0.463) or frequency of APOE4 carriers (p = 0.374) between the A+T− and A+T+ groups.
The relative variable importance and PDP. The relative importance of each predictor in model 6 is shown in Fig. 2, which indicates the features contributing most to the prediction of tau positivity. In the GBM model, cortical thickness of the parietal lobe was the most important feature, followed by the neuropsychological test of memory domains, cortical thickness of the occipital lobe, and number cancellation test score. The important features identified by RF were similar to those identified in the GBM model: cortical thickness of the parietal lobe, the neuropsychological test of memory domains, cortical thickness of the occipital lobe, and word recognition score. As expected, according to the PDP, cortical thickness and memory scores are

Discussion
In the present study, we developed and compared ML approaches for prediction of the brain tau burden in prodromal AD patients using multimodal biomarkers based on the ADNI dataset. We found that the GBM with multimodal biomarkers showed a good predictive performance. In particular, the important features in predicting the brain tau burden in prodromal AD patients involved brain structures and neuropsychological results related to memory. We also found that the GBM with baseline demographics and neuropsychological results showed a reasonable predictive performance. Furthermore, the RF had performance similar to that of the GBM. Therefore, our approaches predicting the tau burden may provide important information for the recruitment of participants in potential clinical trials of tau-targeting therapies, which may help reduce screen failures. We developed six models for tau positivity with various combinations of input features reflecting clinical practice. Constructing a model on prediction performance alone may underestimate the acquisition cost and limited accessibility of clinical resources. Thus, offering several models is intended to broaden applicability across clinical settings with different resources and may support the deployment of cost-effective interventions.
We found that 53.1% of patients with prodromal AD showed a significant tau uptake, which was consistent with that seen in previous studies. Previously, Ossenkoppele et al. defined tau PET positivity (61.4%) using a Youden Index-derived cut-off 16,17, and Maass et al. defined significant tau PET uptake by Braak ROI-based staging using a regression-based conditional inference tree approach 8. In the ADNI data in particular, the frequency of significant tau uptake in prodromal AD obtained with the conditional inference method (57.9%) was similar to our result.
In the present study, our algorithm using demographics, neuropsychological results, APOE4 genotype, SUVR of FDG PET and cortical thickness showed a good predictive performance for predicting tau burden in Aβ+ MCI populations. Previously, a study predicting A/T/N stages for a spectrum of individuals ranging from healthy controls to those with MCI and AD was published 18. That study used structural MRI alone and showed that the model predicted tau at 89% across the clinical diagnostic groups. However, those prediction values were obtained across a spectrum from healthy controls to MCI and AD, whereas we analysed a homogeneous group of patients, which could affect predictive performance. Furthermore, in the present study, the GBM with only baseline demographics and neuropsychological results showed a reasonable predictive performance.
The GBM and RF both performed adequately in predicting tau positivity. In the GBM method, a number of weak learners are combined to decrease bias, and such models generally perform better on low-variance data. Since our data set consists only of prodromal AD patients, suggesting low variance, the GBM might be expected to perform better than the RF. In addition, interpretability, one of the main challenges of ML, was addressed by providing additional information on the model via variable importance and PDP. These results support considering ML as an accessible prediction tool for clinical use.
Through analysis of the variable importance ranking in each machine learning classifier, cortical thickness and neuropsychological tests related to memory function were selected as important features. Our findings were consistent with a recent study showing strong relationships between increased tau pathology, reduced cortical thickness, and worse performance on neuropsychological tests, most pronounced in bilateral temporoparietal regions, in prodromal AD and AD dementia 15. Considering that our participants consisted of those with prodromal AD, our findings might be explained by the fact that memory is affected early during the course of AD 19. Interestingly, we found that cortical thickness in the occipital region had a strong predictive value in prodromal AD. This finding might be supported by a previous study showing that cognitive function in the prodromal/early stage of AD is related to occipital connectivity 20.
We were able to conduct this study because the ADNI provides a large cohort of well-characterised subjects whose clinical and imaging data were acquired and analysed with standardised protocols. However, there are a few limitations to this study. First, we treated tau burden as a binary outcome, defined as positive when the in-vivo Braak stage was ≥ III/IV, which is of particular interest since it might be considered the transitional stage towards AD 8. There is, however, no consensus yet on how to label tau PET scans as normal or abnormal 8. Nevertheless, the frequency of tau (+) in prodromal AD patients in our study seemed to be similar to that observed in previous studies 8,17. Second, some etiologically important variables or risk factors that have previously been established in AD research were not examined. Future research should take into account other variables found to be of etiological significance. Third, the number of samples was relatively small. There is no consensus on how to estimate the effective sample size for machine learning models. Additionally, disease-specific data remain limited and relatively scarce in clinical practice. Therefore, our results need to be confirmed and clarified using a larger sample size in future studies. Despite these limitations, machine learning with the well-defined framework proposed here may be useful to explore the nature of heterogeneous tau pathology in the prodromal stage of AD and to examine the relationship between clinical information, neuropsychological profiles, and brain imaging. Developing a better understanding of the algorithms and integrating machine learning into clinical practice is therefore a critical step to support the development of general population prediction models in the prodromal stage of AD.
In conclusion, our ML algorithms for predicting the brain tau burden in prodromal AD showed good accuracy and could be a useful tool to screen study populations for tau-targeting therapies and to predict disease severity and prognosis. Future studies are warranted to evaluate tau burden in the transitional stage and account for other significant etiological variables.

Methods
Participants. Our study population primarily consisted of subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI)-3. A full list of inclusion/exclusion criteria is described in detail at http://adni.loni.usc.edu/methods/documents/. All participants provided written informed consent, and all protocols were approved by each participating site's institutional review board. The authors obtained approval from the ADNI Data Sharing and Publications Committee for data use and publication. In addition, all methods were implemented in accordance with the approved guidelines. Briefly, MCI participants had a subjective memory complaint with a Clinical Dementia Rating (CDR) score of 0.5 (Petersen et al., 2010). The stage of MCI (early or late) was determined using the Wechsler Memory Scale (WMS) Logical Memory II; Early MCI (EMCI) subjects must have education-adjusted scores between approximately 0.5 and 1.5 SD below the mean of cognitively normal adults (on delayed recall of one paragraph from WMS Logical Memory II). All subjects gave written informed consent prior to participation 6.
In this study, we included MCI patients who underwent 3.0 T MRI scanning, 18
Definition of Tau abnormality: outcome. We defined participants as having an abnormal "T" (T+) if their in-vivo Braak stage was III/IV or greater by a conditional inference tree approach. This approach embeds decision tree-structured regression models to determine in-vivo Braak staging based on AV1451 uptake, as suggested by a previous study 21. The regression model assigned participants with a mean Braak V/VI ROI AV1451 SUVR > 1.267 to in-vivo Braak stage V/VI. The remaining participants underwent the same procedure, using first Braak III/IV (> 1.207) and then Braak I/II (> 1.142) ROIs, leaving the remaining participants in in-vivo Braak stage 0. This conditional inference tree approach thus classified all participants into either Braak V/VI, Braak III/IV, Braak I/II or Braak stage 0 groups.
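The sequential thresholding described above can be sketched as follows. This is a minimal illustration that only applies the three SUVR cut-offs quoted in the text; it does not reproduce the conditional inference tree that derived them, and the function names and the dictionary layout of the input are assumptions made for this example.

```python
# Cut-offs quoted in the text, ordered from the most advanced stage downward.
# The names below (THRESHOLDS, braak_stage, is_tau_positive) are illustrative.
THRESHOLDS = [("V/VI", 1.267), ("III/IV", 1.207), ("I/II", 1.142)]


def braak_stage(suvr_by_roi):
    """Assign an in-vivo Braak stage from mean AV1451 SUVR per composite ROI.

    `suvr_by_roi` maps "V/VI", "III/IV" and "I/II" to the mean SUVR of that ROI.
    ROIs are checked from the most advanced stage downward; the first ROI whose
    SUVR exceeds its cut-off determines the stage, otherwise the stage is "0".
    """
    for stage, cutoff in THRESHOLDS:
        if suvr_by_roi[stage] > cutoff:
            return stage
    return "0"


def is_tau_positive(stage):
    """T+ is defined in the text as in-vivo Braak stage III/IV or greater."""
    return stage in ("III/IV", "V/VI")
```

Checking the most advanced ROI first mirrors the order given in the text, so a participant exceeding several cut-offs is assigned to the highest stage reached.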

Cortical thickness measurement.
In order to obtain local cortical thickness measurements for each subject, all T1 volume scans were processed by the CIVET pipeline (version 2.1.0) developed at the Montreal Neurological Institute for fully automated structural image analysis. In brief, using a linear transformation, native MRI images were registered to the MNI-152 template 22. The N3 algorithm was used to correct intensity non-uniformity caused by inhomogeneities in the magnetic field. Tissue was then classified into white matter (WM), grey matter (GM), cerebrospinal fluid (CSF), and background (BG) based on the T1-weighted image. The brain was split into the left and right hemispheres for the purpose of surface extraction. The surfaces of the inner and outer cortices were automatically extracted using the Constrained Laplacian-based Automated Segmentation with Proximities (CLASP) algorithm 23. The inner and outer surfaces had the same number of vertices, and there was a close correspondence between the counterpart vertices of the inner and outer surfaces 24. The cortical thickness was defined as the Euclidean distance between the linked vertices of the inner and outer surfaces; there were 40,962 vertices in each hemisphere in the native space 23,25.
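The thickness definition above, the Euclidean distance between linked inner and outer vertices, can be illustrated with a short sketch. It is not the CIVET implementation: the function names, the list-of-coordinates input, and the regional averaging helper are assumptions introduced for this example.

```python
import math


def vertex_thickness(inner_vertices, outer_vertices):
    """Return per-vertex thickness as the distance between linked vertices.

    Both arguments are sequences of (x, y, z) coordinates in millimetres,
    ordered so that index i on the inner surface is linked to index i on
    the outer surface.
    """
    if len(inner_vertices) != len(outer_vertices):
        raise ValueError("surfaces must have the same number of vertices")
    return [math.dist(p, q) for p, q in zip(inner_vertices, outer_vertices)]


def mean_regional_thickness(inner_vertices, outer_vertices, vertex_ids):
    """Average thickness over the vertices belonging to one region.

    `vertex_ids` is a hypothetical list of vertex indices for a lobe or ROI.
    """
    thickness = vertex_thickness(inner_vertices, outer_vertices)
    return sum(thickness[i] for i in vertex_ids) / len(vertex_ids)
```

Averaging over a set of vertex indices is one simple way to obtain a lobar value from vertex-wise measurements; the source does not state how regional values were aggregated.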
Cortical thickness values were calculated in native brain spaces rather than in Talairach spaces because of the limitations of linear stereotaxic normalisation 26 . Intracranial volume (ICV) is defined as the total volume of grey matter, white matter, and cerebrospinal fluid. We calculated ICV by measuring the total volume of the voxels within the brain mask 27 . Brain masks were generated using the FMRIB (Functional Magnetic Resonance Imaging of the Brain) Software Library (FSL) bet algorithm. Since cortical surface models were extracted from MRI volumes transformed into stereotaxic space, cortical thickness was measured in the native space by applying an inverse transformation matrix to the cortical surfaces and reconstructing them in native space 25 .
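The ICV calculation described above amounts to counting the voxels inside the brain mask and multiplying by the volume of one voxel. A minimal sketch, assuming the mask has already been loaded as nested lists of 0/1 values and that the voxel dimensions are known (both the input layout and the function name are hypothetical):

```python
def intracranial_volume_ml(mask, voxel_dims_mm):
    """Estimate ICV in millilitres from a binary brain mask.

    `mask` is a nested list indexed as mask[z][y][x], non-zero inside the brain.
    `voxel_dims_mm` is the voxel size (dx, dy, dz) in millimetres.
    """
    n_voxels = sum(1 for plane in mask for row in plane for v in row if v)
    dx, dy, dz = voxel_dims_mm
    voxel_volume_mm3 = dx * dy * dz
    return n_voxels * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3
```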
To measure hippocampal volume (HV), we used an automated hippocampus segmentation method using a graph cut algorithm combined with an atlas-based segmentation and morphological opening as described in an earlier study 28.
Machine learning algorithms. To examine changes in prediction accuracy according to the different combinations of predictors, we developed six models. We derived two tree-based ML algorithms: GBM and RF. GBM 29 is a tree ensemble model that generates a strong prediction model from weak learners, typically decision trees. The RF, proposed by Breiman 30, builds a tree ensemble predictor from multiple decision trees, in which the predictions of the individual trees are aggregated by averaging or majority voting 31.
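The two ensemble ideas described above can be illustrated with a toy sketch. It is not the authors' implementation: the boosting part uses one-split stumps on a single feature with squared-error loss for brevity, rather than the loss a tau-positivity classifier would use, and all function names and default parameters are assumptions.

```python
from collections import Counter


def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # these thresholds keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=25, learning_rate=0.3):
    """Boosting: each new stump is fitted to the residuals of the current model."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def majority_vote(tree_votes):
    """Random-forest style aggregation of class labels from individual trees.

    Ties resolve to the label that appears first in `tree_votes`.
    """
    return Counter(tree_votes).most_common(1)[0][0]
```

The contrast is in how trees are combined: boosting adds trees sequentially so that each corrects the errors of the previous ones, whereas a random forest grows trees independently and aggregates their outputs.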
K-fold cross-validation (CV) was used to divide the data set into K non-overlapping partitions. K-1 partitions were used as a training set on which a classifier was trained, and its generalization performance was tested on the one left-out validation set. This process was repeated K times. We selected K = 10 because accuracy empirically saturates at this value. Under the CV procedure, the generalization of the predictive power and the validation error were computed. The predictive performance was estimated using the area under the receiver operating characteristic (ROC) curve (AUC) and its 95% confidence interval.
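The partitioning and the AUC computation described above can be sketched in a few lines. This is an illustrative version only: the source does not state whether folds were stratified or how the confidence interval was obtained, so neither is shown, and the function names and the fixed random seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for K non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        validation = folds[i]
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1) receives
    a higher score than a randomly chosen negative (label 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the 64 participants of this study and K = 10, each validation fold would contain six or seven participants, which is one reason the per-fold AUC can be unstable in small samples.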
Interpretable ML: variable importance and partial dependence plot (PDP). For each optimized model, we examined the variable importance criterion, which measures the relative prediction power (prediction strength) using mean decreased accuracy (MDA) or the Gini index 10. For each analysis, variable importance was estimated to identify which independent variables were influential for accurate classification 32. Influential variables were ranked by their relative importance values. In tree-based models such as GBM and RF, the relative importance of a variable was estimated from the reduction in squared error loss attributable to splits on that variable, accumulated over all trees. A higher relative importance value indicated a greater influence of the variable on classifying tau positivity. To avoid over-weighted or under-weighted results, Min-Max normalisation 33 was conducted. We also constructed PDPs, as proposed by J.H. Friedman; a PDP is a graphical tool that shows whether a feature is positively or negatively related to the final prediction.
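As a rough illustration of the two operations named above, the sketch below rescales a feature with Min-Max normalisation and computes a partial dependence curve in the usual way: the feature of interest is fixed at each grid value in turn and the model's prediction is averaged over the observed values of all other features. The function names, the dictionary representation of a participant, and the `predict` callable are assumptions for this example, not the authors' code.

```python
def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def partial_dependence(predict, rows, feature, grid):
    """Return the average prediction for each grid value of `feature`.

    `predict` is any callable mapping a feature dictionary to a prediction,
    `rows` is a list of feature dictionaries (one per participant), and
    `grid` is the list of values at which `feature` is fixed.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original data are untouched
            modified[feature] = value  # force the feature to the grid value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve
```

A curve that falls as the grid value increases indicates a negative relationship between the feature and the predicted outcome, and a rising curve a positive one.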
Statistical analysis. For the comparison of demographic and clinical data, a two-sample t-test was used for continuous variables, and a chi-square test was used for categorical variables. All analyses were performed with R 34, version 3.6.1 (R Project for Statistical Computing).
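For a categorical comparison such as sex by tau group, the chi-square test can be sketched as below. This is a hedged illustration, not the authors' R code: it computes the Pearson statistic without continuity correction (the source does not state whether a correction was applied), and the cell counts in the usage lines are back-calculated from the reported percentages under the assumption of 34 A+T+ and 30 A+T− participants; they are not reported in this section.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    `table` is ((a, b), (c, d)); returns (statistic, p_value) with 1 degree
    of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# ASSUMPTION: counts inferred from 58.8% of 34 and 26.7% of 30 being female.
sex_by_group = ((20, 14),  # A+T+: female, male
                (8, 22))   # A+T-: female, male
statistic, p_value = chi_square_2x2(sex_by_group)
```

Under these assumed counts the test is significant at the 0.05 level, in the same direction as the reported p = 0.009.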

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.