β-amyloid and tau drive early Alzheimer’s disease decline while glucose hypometabolism drives late decline

Clinical trials focusing on therapeutic candidates that modify β-amyloid (Aβ) have repeatedly failed to treat Alzheimer’s disease (AD), suggesting that Aβ may not be the optimal target for treating AD. The evaluation of Aβ, tau, and neurodegenerative (A/T/N) biomarkers has been proposed for classifying AD. However, it remains unclear whether disturbances in each arm of the A/T/N framework contribute equally throughout the progression of AD. Here, using the random forest machine learning method to analyze participants in the Alzheimer’s Disease Neuroimaging Initiative dataset, we show that A/T/N biomarkers show varying importance in predicting AD development, with elevated biomarkers of Aβ and tau better predicting early dementia status, and biomarkers of neurodegeneration, especially glucose hypometabolism, better predicting later dementia status. Our results suggest that AD treatments may also need to be disease stage-oriented with Aβ and tau as targets in early AD and glucose metabolism as a target in later AD.

Since the findings by Lin and colleagues replicate those in earlier clinical analyses, the real strength of this study is the ability to generate algorithms and correlations for computational analyses that are consistent with clinical, pathological and statistical assessments. The challenge is to translate the analysis of readily available data, collected and aggregated over the course of decades, into the capacity for individual patient assessment in real time.
Reviewer #3 (Remarks to the Author):
In this study a Random Forest model was constructed to differentiate Alzheimer's disease, late mild cognitive impairment and cognitively unimpaired patients using PET, MRI and CSF biomarkers. The model was then used to interpret the contribution of these factors with feature importance analysis and by examining patterns of correlation with several cognitive scores.
Major comments
(1) Although it is interesting to see a machine learning method being used primarily for the interpretation of variable effects, the main question for most readers will be why this approach was chosen rather than a more common statistical method, like multivariate ANOVA? The trade-offs and rationale for this choice should be explained in detail in the introduction.
(2) In table 1 authors have shown that several confounding variables may potentially impact the results. However, no step appears to have been taken to address this in subsequent analysis. In particular, brain volumetric measures are known to change with age and therefore could be a proxy for this confounding feature. To ensure good quality results, this should be dealt with, e.g. by using covariate balancing.
(3) Feature importance is now largely superseded by a more advanced and accurate SHAP method (Lundberg et al., 2017). The implementation is readily compatible with the described procedure and adding this type of analysis to the paper will greatly improve the interpretability of the results.
(4) To make the most out of the modern machine learning methods it is particularly important to perform hyperparameter optimization. As this was not mentioned anywhere, I am assuming it was not done in this case. Either way, thorough evaluation should be done to show the effect of at least key RF parameters on the performance of the model and the results reported in the paper.
Minor comments
(1) Re: "…random forest machine learning method because it not only has high prediction accuracy due to its use of multiple decision trees". High accuracy is a property of a particular model, not an algorithm, and even very simple algorithms can be used to create very accurate models in specific cases; consider rephrasing.
(2) Some additional information should be added to fully characterize stability and quality of the constructed model: performance at other K-folds (e.g. 3, 10) and PR curves in addition to ROC curves.
(3) As noted by the authors, some of the features used in the model are correlated, and this should be characterized further, e.g. by including some plots/heatmaps to show the correlation structure with respect to the overall vs. selected features.
(4) The discussion emphasizes that reported findings are quite consistent with several previous studies. As Communications Biology prioritizes novelty, if there are any specific novel insights from this study, they should be identified more explicitly.
(5) Panels in Fig. 2 should be in the same order to make it easier for the reader.
We thank the reviewers for the thoughtful and constructive comments. We have addressed all the concerns and made changes accordingly. In particular, we substantially revised the Methods and Results sections to accommodate the reviewers' suggestions and added five new figures/tables as Supplementary materials. We also revised the Discussion accordingly. The changes are highlighted in yellow in the manuscript. The point-by-point response can be found below.

Reviewer #1
Overall the outcomes of the AI / machine learning approach conducted by Lin and colleagues match very well with those reported earlier by Jack and colleagues: Jack, Knopman et al. Lancet Neurol. 2013 Feb; 12(2): 207-216.
Response: We thank the reviewer for the suggestion. We have included this paper in the Discussion and added it as a reference (Page 14, Lines 304-306).
Since the findings by Lin and colleagues replicate those in earlier clinical analyses, the real strength of this study is the ability to generate algorithms and correlations for computational analyses that are consistent with clinical, pathological and statistical assessments. The challenge is to translate the analysis of readily available data, collected and aggregated over the course of decades, into the capacity for individual patient assessment in real time.
Response: We agree. We have included this information in our Discussion (Page 14, Lines 306-310).

Reviewer #2
Major Comments:
1) Results (page 9, line 222): While the paper describes the correlation of the various AD biomarkers with broad measures such as composite scores of memory and executive functioning, there is no description in the paper of the correlations between the individual measures themselves. This information would be helpful to the reader in showing which measures are uniquely informative versus which measures are highly correlated and thus likely to show similar levels of feature importance in random forest analysis.
2) Discussion (page 16, line 368): In the description of potential limitations/concerns, it should be noted that the distributions of all three groups are predominantly male, whereas Alzheimer's disease disproportionately affects women (e.g., most study case groups are 55-65% female). It should be discussed whether this is a feature of study design or completeness of data, as it may have some effect on the inferences that can be drawn.
Response: We thank the reviewer for pointing this out. We have expanded our discussion to reference this potential limitation (Pages 18-19, Lines 438-441) as follows.
"Additionally, while the available dataset from ADNI has more male participants, it should be noted that AD disproportionately affects women [69]. Future efforts may be needed to re-evaluate the outcome when data from the female participants become more available."
Minor Comments:
1) General: Wherever the gene APOE is referenced, it should be italicized, and all instances referring to the APOE ε4 allele (including in tables) should be printed as "APOE ε4" and not as "APOE4."

Response: We have changed the formatting accordingly throughout the manuscript.
2) Discussion (page 17, line 397): The controversial term "type 3 diabetes" is often used to indicate that aberrant glucose metabolism in the brain mirrors the effects of insulin resistance issues elsewhere in the body, however this characterization is prone to oversimplification and misunderstanding. A more pragmatic statement might be to say that the metabolic abnormalities present in AD are often likened to a form of diabetes of the brain, while keeping the same references.

Response: We have made the suggested change on Page 17, Lines 396-397.
3) Methods (page 19, line 452): Additional citations for papers describing the collection of ADNI MRI and glucose metabolism variables should be included, as details on the collections of the phenotypes will be important for some readers. Those details do not need to be reproduced in the manuscript, but links or citations for that information should be provided.
Response: [...] (Page 19, lines 459-460).

Reviewer #3
Major comments
(1) Although it is interesting to see a machine learning method being used primarily for the interpretation of variable effects, the main question for most readers will be why this approach was chosen rather than a more common statistical method, like multivariate ANOVA? The trade-offs and rationale for this choice should be explained in detail in the introduction.
Response: We thank the reviewer for raising this thoughtful concern. Because the main goal of the study is to rank the features as well as to determine the accuracy of the prediction, we chose the Random Forest method, a machine learning algorithm, as it fits this purpose. More common statistical methods, including ANOVA, would not be able to provide the information the study requires. We included this comment on Page 20, Line 482.
(2) In table 1 authors have shown that several confounding variables may potentially impact the results. However, no step appears to have been taken to address this in subsequent analysis. In particular, brain volumetric measures are known to change with age and therefore could be a proxy for this confounding feature. To ensure good quality results, this should be dealt with, e.g. by using covariate balancing.
Response: We thank the reviewer for the comment. We adjusted for age in the revision. Specifically, we fitted a linear regression model between the six brain volumetric measures and age in the CU group, then applied the fitted coefficients to all other groups to achieve covariate balancing before running RF. In the implementation, we used the function sklearn.linear_model.LinearRegression from the scikit-learn package to calculate the linear regression coefficients between brain volumetric measures and age. The changes can be seen on Page 5, Lines 134-135 and Page 20, Lines 491-498. We also updated the results in Table 3 accordingly.
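The age adjustment described in this response can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the response names sklearn.linear_model.LinearRegression, whereas this sketch uses a hand-written least-squares fit for a single measure. The data values, the helper names (fit_line, age_adjust) and the choice to re-center on the mean control age are all assumptions made for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def age_adjust(volumes, ages, slope, reference_age):
    """Remove the age trend estimated in controls from each volume."""
    return [v - slope * (a - reference_age) for v, a in zip(volumes, ages)]

# Hypothetical toy data: age in years and one volumetric measure in mm^3.
cu_age = [60.0, 65.0, 70.0, 75.0, 80.0]
cu_vol = [4000.0, 3900.0, 3800.0, 3700.0, 3600.0]
ad_age = [70.0, 80.0]
ad_vol = [3300.0, 3000.0]

# Fit the age trend on the cognitively unimpaired (CU) group only.
intercept, slope = fit_line(cu_age, cu_vol)
ref_age = mean(cu_age)

# Apply the CU-derived coefficient to every group (assumed adjustment form).
cu_adj = age_adjust(cu_vol, cu_age, slope, ref_age)
ad_adj = age_adjust(ad_vol, ad_age, slope, ref_age)
```

Fitting on the CU group alone keeps disease-related atrophy out of the estimated age slope, so the adjusted values in the other groups retain the disease effect while the normal-aging trend is removed.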
(3) Feature importance is now largely superseded by a more advanced and accurate SHAP method (Lundberg et al., 2017). The implementation is readily compatible with the described procedure and adding this type of analysis to the paper will greatly improve the interpretability of the results.
Response: We thank the reviewer for the suggestion. In this revision, we used the SHapley Additive exPlanations (SHAP) technique to implement an additional feature ranking analysis. We found that the results from SHAP analyses were consistent with those from Random Forest.
(4) To make the most out of the modern machine learning methods it is particularly important to perform hyperparameter optimization. As this was not mentioned anywhere, I am assuming it was not done in this case. Either way, thorough evaluation should be done to show the effect of at least key RF parameters on the performance of the model and the results reported in the paper.
Response: We agree. We added hyperparameter optimization in this revision. We changed the Methods accordingly (Pages 22-23, Lines 561-564), and as follows.

Minor comments
(1) Re: "…random forest machine learning method because it not only has high prediction accuracy due to its use of multiple decision trees". High accuracy is a property of a particular model, not an algorithm, and even very simple algorithms can be used to create very accurate models in specific cases; consider rephrasing.
Response: We thank the reviewer for the suggestion. We have rephrased the sentence accordingly (Page 4, Line 93).

Supplementary Figure 2. SHAP analysis depicting biomarker feature ranking importance. Comparison of feature ranking analysis from implementation of the SHapley Additive exPlanations (SHAP) technique. The bar plots show feature impacts on the cognitively unimpaired (CU) vs late mild cognitive impairment (LMCI) analysis, the LMCI vs Alzheimer's disease (AD) analysis, and the CU vs AD analysis.
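For readers unfamiliar with the SHAP feature ranking referred to in this response letter, the underlying Shapley attribution can be sketched as follows. This is a minimal illustration and not the authors' analysis: it computes exact Shapley values by enumerating all feature orderings for a made-up value function with three placeholder features (x1, x2, x3). Practical SHAP implementations for tree models use much faster algorithms.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.0
    if "x1" in coalition:
        score += 2.0
    if "x2" in coalition:
        score += 1.0
    if "x1" in coalition and "x2" in coalition:
        score += 1.0  # interaction term, shared equally by x1 and x2
    return score  # x3 never changes the output

phi = shapley_values(["x1", "x2", "x3"], toy_value)
ranking = sorted(phi, key=phi.get, reverse=True)
```

The attributions sum to the difference between the full-model output and the empty-coalition output, which is the property that makes the resulting bar-plot rankings additive and comparable across features.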
(2) Some additional information should be added to fully characterize stability and quality of the constructed model: performance at other K-folds (e.g. 3, 10) and PR curves in addition to ROC curves.
Response: [...] Figure 3) and have changed the Methods and Results accordingly (Page 9, Line 205 and Pages 22-23, Lines 557-564). We also added the results of the 3-fold and 10-fold calculations in addition to the original 5-fold outcomes (Supplementary Table 1) and have changed the Methods and Results accordingly (Page 9, Lines 202-203, and Page 22, Line 555).
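The two checks requested in this comment, performance at several K-fold settings and precision-recall (PR) curves, can be sketched without any external library. This is an illustrative sketch only and not the authors' pipeline: the labels and scores are made up, the helper names are hypothetical, folds are contiguous and unstratified, and tied scores are not handled specially. In scikit-learn, KFold and precision_recall_curve cover the same ground.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds

def precision_recall_points(labels, scores):
    """(precision, recall) after each sample, taken in order of decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points

# Hypothetical classifier output: 1 = positive class, higher score = more confident.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
pr_points = precision_recall_points(labels, scores)

# The fold counts mentioned in the response: 3, 5 and 10.
fold_sets = {k: kfold_indices(12, k) for k in (3, 5, 10)}
```

PR curves complement ROC curves when class sizes are unequal, because precision depends on the number of false positives relative to true positives rather than relative to all negatives.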