Introduction

Recent neuropathology and biomarker studies have found that the pathological changes in Alzheimer's disease (AD), particularly beta-amyloid plaques and neurofibrillary tangles (NFT), may precede the onset of clinical disease by as long as 20-30 years1,2,3. A logical next step for further investigation of these changes is to express the disease process in terms of a series of measurable biological indicators. A disease progression model is a good choice because it can provide a basis for learning from prior clinical experience and summarize knowledge in a quantitative fashion. Almost all of the current models describe disease progression using only longitudinal AD Assessment Scale-cognitive subscale (ADAS-cog) scores. Holford and Peace first developed a linear disease progression model to describe longitudinal changes in ADAS-cog data over time in mild-to-moderate AD patients4. Several nonlinear models were subsequently developed because the linear model was considered insufficient to portray cognitive decline in disease progression. Samtani et al developed a logistic model to describe the longitudinal change of ADAS-cog scores for mild cognitive impairment (MCI) patients5. Gomeni et al reported an indirect physiological response model for mild-to-moderate AD patients, assuming the rate constants kin and kout to be time-variant6. These models do not capture subtle biochemical or physiologic changes and may show large variability, resulting in poor prediction. The progression of the disease is usually very slow and requires long-term data collection for an accurate analysis. Disease progression models with more physiologic inputs may be more appropriate for chronic degenerative diseases7. The current challenge is finding a way to integrate the biomarkers into a disease progression model. Jack et al suggested a well-accepted model to describe the temporal evolution of AD biomarkers8. The original hypothesis was that the curves followed a sigmoid function.
Several other dynamic models of biomarkers have been developed since then, such as linear and exponential models9. These models lacked mathematical proof and description and, as such, were only conceptual.

The most widely studied cerebrospinal fluid (CSF) biomarkers include the CSF 42 amino acid form of Aβ (Aβ42) and phosphorylated tau protein (p-tau), which have been shown to serve as in vivo proxy measures of amyloid plaques and NFT, respectively10,11,12,13. The consensus of scientists worldwide is that AD biomarkers do not develop in an identical manner as the disease evolves. Instead, they develop in a sequential yet partially overlapping manner. The initiating event in the biological cascade that eventually leads to AD remains controversial. The amyloid cascade hypothesis assumes that a series of causal events is initiated by abnormal Aβ production/aggregation14,15,16. An alternative hypothesis is that p-tau develops first, but is confined to subcortical and medial temporal limbic areas. Neocortical Aβ deposits develop thereafter and, by unknown mechanisms, help the antecedent tau-related neurodegeneration to spread widely17,18,19,20. The sequence of this biological cascade is not consistent between individuals, so neither Aβ42 nor p-tau alone is sufficient to accurately diagnose the onset of AD. The changes in CSF Aβ42 and p-tau are very slow, and they approach plateaus at different times, limiting their utility for longitudinal measurement. The ratio of CSF Aβ42 to p-tau (the Ratio) may provide a promising strategy for monitoring the onset of AD. Because the level of CSF Aβ42 decreases over time while that of p-tau increases, the Ratio decreases more markedly than either biomarker changes alone.

Anomalies of the Ratio lead to neurodegeneration, possibly due to the direct neurotoxicity of aggregated Aβ and the collapse of the neuronal transport system that is caused by p-tau21. The first region in the brain showing significant atrophy during AD progression is the hippocampus, which plays an important role in the formation of new memories. The hippocampus can also provide measures of cerebral atrophy, due to the loss of synapses and neurons22,23,24. In addition, a prospective longitudinal cohort study found that greater atrophy in the hippocampal subfields predicted MCI conversion, whereas larger hippocampal volumes predicted cognitive stability and/or improvement25. Therefore, hippocampal volume is an essential indicator of disease progression in AD.

All of these pathological changes eventually lead to a loss of memory and functional ability, which are reflected by ADAS-cog. The disease progression model and the biomarker models are discrete entities at present, but they are in fact connected. Because of the current lack of mathematical models, the abnormal range for many markers is not yet known. The objective of this study is to integrate these biomarkers into a disease progression model and to determine their abnormal ranges. We put forward an empirical model in which the clinical continuum of AD is initiated by an abnormality of the Ratio, followed by hippocampal atrophy and cognitive impairment. A detailed disease progression model can be established by finding associations between the Ratio, hippocampal volume, and ADAS-cog.

Materials and methods

Study details

Data were provided by the ADNI database (http://adni.loni.usc.edu/). This database is open to the public and allows authorized scientists to access imaging, clinical, genomic, and biomarker data for the purpose of scientific investigation, teaching, or planning clinical research studies. The primary goal of ADNI is to investigate whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. Determining sensitive, specific markers of very early AD progression can aid researchers and clinicians in developing new treatments and monitoring their effectiveness. This can also reduce the time and cost of clinical trials.

The ADNI study began in 2004 and included 400 subjects diagnosed with MCI, 200 subjects with early AD, and 200 normal elderly (NL) subjects. The initial phase of the study was known as ADNI1. In 2009, ADNI1 was extended with ADNI GO, which assessed the existing ADNI1 cohort and added 200 participants identified as early MCI (EMCI). The objective of this phase was to examine biomarkers in an earlier stage of the disease. In 2011, as ADNI GO was ending, ADNI2 began. ADNI2 assessed participants from the ADNI1/ADNI GO cohort and added the following new participants: 150 NL, 100 EMCI, 150 late MCI (LMCI) and 150 mild AD patients.

We included 395 subjects from the ADNI study (Table 1), for whom baseline data on Aβ42, p-tau, hippocampal volume and ADAS-cog were available. Statistical analysis was conducted to determine whether we should consider the variation among phases of the study. A one-way ANOVA did not identify a significant difference between phases when the significance threshold was set at 0.01. Therefore, we treated all data as homogeneous across phases in subsequent data processing.

Table 1 Demographic characteristics of participants.

Model software

Dataset preparation, exploration and visualization were performed using R (version 2.15.0). Disease progression models were established using extended least squares regression in NONMEM (version 7.2, Icon Development Solutions, Ellicott City, MD, USA). The model-building strategy that we used was based on an approach that is widely accepted in the pharmacometrics community. Various models were tested, and model selection was based on mechanistic plausibility, parameter estimate precision, and the objective function value (OFV). The disease progression model was implemented with subroutine ADVAN6 with TOL equal to 5 in NONMEM, using the first-order conditional estimation (FOCE) method without η-ε interaction. Relevant covariates were screened by visual inspection using Xpose (version 4.0, Uppsala University, Uppsala, Sweden) and through multiple forward selection and backward elimination steps. The difference in OFVs between hierarchical models was assumed to have an approximate χ2 probability distribution, with the number of degrees of freedom equal to the difference in the number of parameters between the models. Covariates were considered significant when the difference between OFVs, with and without the covariate, was greater than 7.88 (P<0.005, one degree of freedom).
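The 7.88 cut-off can be checked with a short calculation. The sketch below is illustrative only and is not part of the NONMEM workflow: it uses the fact that a χ2 variable with one degree of freedom is the square of a standard normal variable, so it applies only when the nested models differ by a single parameter. The function names and the OFV values in the usage lines are hypothetical.

```python
from statistics import NormalDist


def chi2_df1_critical(alpha: float) -> float:
    """Upper-alpha critical value of a chi-square distribution with df = 1.

    For df = 1 the chi-square variable is a squared standard normal,
    so the critical value equals z_(1 - alpha/2) squared.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def covariate_is_significant(ofv_without: float, ofv_with: float,
                             alpha: float = 0.005) -> bool:
    """Likelihood-ratio test for adding ONE parameter (df = 1 only)."""
    return (ofv_without - ofv_with) > chi2_df1_critical(alpha)


threshold = chi2_df1_critical(0.005)
print(round(threshold, 2))  # 7.88

# Hypothetical OFV values, for illustration only.
print(covariate_is_significant(1000.0, 990.0))  # drop of 10.0 exceeds 7.88
print(covariate_is_significant(1000.0, 995.0))  # drop of 5.0 does not
```

A covariate that adds more than one parameter would need the critical value for the corresponding degrees of freedom, which this shortcut does not provide.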

Model performance evaluation

Bootstrap analysis

A total of 1000 bootstrap replicates were randomly generated from the original data, then refitted to the final model. Median parameter estimates and 95% confidence intervals (CI) obtained from these bootstrap replications were compared with those obtained from the original dataset.
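The resampling logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure run in NONMEM: the sample mean stands in for refitting the final model, the data values are invented, and the helper names `percentile` and `bootstrap_summary` are hypothetical.

```python
import random
import statistics


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def bootstrap_summary(data, estimator, n_replicates=1000, seed=0):
    """Return (median, 2.5th percentile, 97.5th percentile) of the estimator
    over datasets resampled with replacement from `data`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        # One replicate: draw as many records as the original dataset has.
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    estimates.sort()
    return (statistics.median(estimates),
            percentile(estimates, 2.5),
            percentile(estimates, 97.5))


# Invented per-subject values; statistics.mean is a stand-in for the model fit.
toy_data = [8.1, 6.9, 5.4, 7.2, 3.8, 4.9, 6.1, 2.7, 5.8, 7.7]
median_est, ci_low, ci_high = bootstrap_summary(toy_data, statistics.mean)
print(median_est, ci_low, ci_high)
```

In a population analysis the resampling unit would normally be the subject, so that all records from one individual stay together in each replicate.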

The visual predictive check (VPC)

VPC was constructed based on 1000 stochastic simulations from the final model. The median of the dependent variable (DV) and nonparametric 95% CI were then calculated (at the 2.5th and 97.5th percentiles) for the observed and all of the simulated datasets to graphically assess whether simulations were able to reproduce both the central trend and variability in the observed data.
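The comparison described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the simulator `toy_simulate` is a hypothetical stand-in for simulation from the final model, the data are invented, and a real VPC would usually repeat the calculation within bins of the independent variable.

```python
import random

LEVELS = (2.5, 50.0, 97.5)  # lower bound, median, upper bound


def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of an unsorted list."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def summarize(values):
    """Median and nonparametric 95% bounds of one dataset."""
    return {p: percentile(values, p) for p in LEVELS}


def vpc(observed, simulate, n_sim=1000, seed=0):
    """For each summary level, return the observed value together with the
    2.5th-97.5th percentile range of the same statistic across simulations."""
    rng = random.Random(seed)
    obs = summarize(observed)
    sims = [summarize(simulate(rng, len(observed))) for _ in range(n_sim)]
    result = {}
    for p in LEVELS:
        sim_values = [s[p] for s in sims]
        result[p] = {"observed": obs[p],
                     "sim_low": percentile(sim_values, 2.5),
                     "sim_high": percentile(sim_values, 97.5)}
    return result


def toy_simulate(rng, n):
    """Hypothetical simulator: n draws of the dependent variable."""
    return [rng.gauss(20.0, 5.0) for _ in range(n)]


toy_observed = toy_simulate(random.Random(42), 200)
bands = vpc(toy_observed, toy_simulate, n_sim=200)
for level in LEVELS:
    print(level, bands[level])
```

The model would be judged adequate when each observed statistic falls inside the corresponding simulated range.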

Model progression

Base model

Various linear and nonlinear models were tested to determine the associations among time, the Ratio, hippocampal volume, and ADAS-cog. Brief descriptions and mathematical expressions are listed in Table 2. The changes of the Ratio over time, hippocampal volume with the Ratio, and ADAS-cog with hippocampal volume are all depicted in Figure 1, where we tested linear, Emax and sigmoid models. In addition, because biological systems are self-limiting, we also tested a logistic model with homeostatic control for the Ratio. The rate of change of the Ratio was governed by a constant K and by the Ratio itself, describing the phenomenon where the rate of change first increased and then decreased as the Ratio increased. The Akaike Information Criterion (AIC) was used to compare these non-nested models, with the main focus on gauging how well the models conformed to the data.
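The behaviour of the logistic model can be illustrated with a short sketch. The functional form used here, dR/dt = -K·R·(1 - R/Rmax), is an assumption chosen to match the description above and the decline of the Ratio over time; the exact equation and its sign convention are given in Table 2, which is not reproduced here. Rmax = 30 is the value reported with the final model, while the value of K and the starting Ratio in the usage lines are hypothetical.

```python
import math

R_MAX = 30.0  # largest observed Ratio, as reported for the final model


def ratio_rate(r, k, r_max=R_MAX):
    """dR/dt under the ASSUMED form -K * R * (1 - R / Rmax)."""
    return -k * r * (1.0 - r / r_max)


def ratio_at(t, r0, k, r_max=R_MAX):
    """Closed-form solution of the assumed ODE with R(0) = r0, 0 < r0 < r_max."""
    return r_max / (1.0 + ((r_max - r0) / r0) * math.exp(k * t))


K_DEMO = 0.1   # hypothetical rate constant, per unit time
R0_DEMO = 8.0  # hypothetical starting Ratio

for t in (0, 5, 10, 20):
    print(t, round(ratio_at(t, R0_DEMO, K_DEMO), 3))

# The magnitude of the rate is largest at Rmax / 2 and smaller on either side.
for r in (5.0, 15.0, 25.0):
    print(r, abs(ratio_rate(r, K_DEMO)))
```

Under this form the rate of change depends only on K and the current Ratio, so subgroups that differ only in baseline would follow the same curve shifted in time.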

Table 2 Model comparison.
Figure 1

(A) The Ratio change over time; (B) hippocampal volume change with the Ratio; (C) ADAS-cog change with hippocampal volume. The solid line represents the loess regression line. The observed data are represented by circles.

Covariate model

According to the results of previous studies, covariates of interest in AD include disease state (DS), age, APOEε4 genotype and sex. We confirmed that there was no obvious correlation among these covariates. The ethnicity of the patients and their levels of education were reported to influence the disease progression, but we did not analyze these factors because of data bias. The effects of covariates were modeled using a linear equation (Eq 1).

where PTVi is the value of the model parameter for individual i, θ is the correlation coefficient estimated for the covariate, and covi is the covariate value of individual i.
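The display for Eq 1 is not reproduced in this text. A linear form consistent with the definitions above, written here as an assumption rather than as the authors' exact notation, would be:

```latex
PTV_i = \theta_0 + \theta \cdot cov_i
```

Here θ0 is a hypothetical symbol for the parameter value when the covariate equals 0. This reading agrees with the later definitions of R0, V0 and S0 as base values at a covariate value of 0.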

Categorical covariates were mapped to values of 0, 1, 2, and so on. APOEε4 is the best-established genetic risk factor for AD, predicting an earlier onset of AD and faster cognitive decline. The APOEε4 genotype was coded as 0 APOEε4 alleles (APOEε4=0), 1 APOEε4 allele (APOEε4=1) or 2 APOEε4 alleles (APOEε4=2). Sex was coded as 0 for male (SEX=0) or 1 for female (SEX=1). DS was coded as normal (DS=0), EMCI (DS=1), LMCI (DS=2), or AD (DS=3).

Results

The parameters of final model

Given the criteria we used to select the model, the most likely models (Table 2) for the Ratio over time, hippocampal volume with the Ratio and ADAS-cog with hippocampal volume were logistic, Emax and linear models, respectively, with the formulas in Model 8 used as the integrated base model. Because the trends between ADAS-cog and hippocampal volume were inconsistent for different disease states (Figure 1C), a uniform model covering all disease states was unavailable. We divided the four disease states into two subgroups (NL and EMCI, LMCI and AD) according to different rates of cognitive decline and further modified Model 8 in Table 2 (Eq 7,8).

The mathematical formulas of the final model were as follows (Eq 2,3,4,5,6,7,8,9):

where R is the value of the Ratio, V is the value of hippocampal volume, and S is the value of ADAS-cog. K is a constant controlling the rate of change of the Ratio. Rmax equals 30, the largest observed value of the Ratio. R0 is the base value of the Ratio when both DS and APOEε4 are equal to 0. θ1 is the correlation coefficient of DS on the Ratio. θ2 is the correlation coefficient of APOEε4 on the Ratio. V0 is the base value of hippocampal volume when age equals 0. θ3 is the correlation coefficient of age on hippocampal volume. θ4 is the correlation coefficient of DS on EC50. S0 is the base value of ADAS-cog when DS equals 0. K1 is the slope of ADAS-cog against hippocampal volume for NL and EMCI subjects. K2 is the slope of ADAS-cog against hippocampal volume for LMCI and AD subjects, and θ5 is the correlation coefficient of DS on ADAS-cog.

The results of the estimates in the final model are summarized in Table 3. The baseline of the Ratio decreased with the aggravation of disease and the increased number of APOEε4 alleles: NL with 0 APOEε4 allele showed the maximum value of 8.09, while AD with 2 APOEε4 alleles showed the minimum value of 1.55. As time progressed, the Ratio decreased. Using the final model, we could see that the Ratio for different subgroups showed the same declining trend, but with different baseline values.
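The reported extremes can be reproduced with a small worked example. The sketch assumes the linear covariate form described for the covariate model, takes the base value 8.09 from the text above, and uses the per-level decrease of 1.08 for DS and the per-allele decrease of 1.65 for APOEε4 quoted in the Discussion. The function and constant names are hypothetical.

```python
R0 = 8.09           # baseline Ratio for NL with 0 APOEε4 alleles
THETA_DS = -1.08    # change in baseline Ratio per disease-state level
THETA_APOE = -1.65  # change in baseline Ratio per APOEε4 allele

DS_CODES = {"NL": 0, "EMCI": 1, "LMCI": 2, "AD": 3}


def baseline_ratio(disease_state: str, apoe4_alleles: int) -> float:
    """Baseline Ratio under the assumed linear covariate model."""
    if apoe4_alleles not in (0, 1, 2):
        raise ValueError("APOEε4 allele count must be 0, 1 or 2")
    return R0 + THETA_DS * DS_CODES[disease_state] + THETA_APOE * apoe4_alleles


for state in DS_CODES:
    row = [round(baseline_ratio(state, n), 2) for n in (0, 1, 2)]
    print(state, row)
```

With these values the maximum is 8.09 (NL, 0 alleles) and the minimum is 8.09 - 3 × 1.08 - 2 × 1.65 = 1.55 (AD, 2 alleles), matching the figures reported above.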

Table 3 Results of the final integrated model.

The baseline of hippocampal volume showed a downward trend with increasing age: hippocampal volume decreased by 0.048 mL per year of age. EC50 showed an increasing trend with aggravation of disease: 0.707 mL for NL and 1.997 mL for AD. The results suggest that hippocampal volumes decrease faster for individuals in a more serious disease state because of the higher EC50.

The baseline of ADAS-cog increased with the aggravation of disease: 13.7 for NL and 27.35 for AD. The four groups were divided into two subgroups, using different slopes of linear models. The results show that the subgroups in different disease states demonstrated different increasing trends: ADAS-cog increased more rapidly if the subjects were in AD or LMCI stages.
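If DS enters EC50 and the ADAS-cog baseline linearly, as in the covariate model, the increments per DS level implied by the NL (DS=0) and AD (DS=3) values quoted above can be back-calculated. These are derived figures under that assumption, not values read from Table 3:

```latex
\theta_4 \approx \frac{1.997 - 0.707}{3} = 0.430,
\qquad
\theta_5 \approx \frac{27.35 - 13.7}{3} = 4.55
```

On this reading, each step in disease state would raise EC50 by about 0.43 and the ADAS-cog baseline by about 4.55 points.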

These results were all consistent with our hypothesis that a lower Ratio indicates a more severe disease state.

Overall, the final model parameters were well estimated, with reasonable confidence intervals and in good agreement with results calculated by the bootstrap (Table 3). The correlations of population predictions versus observations and individual predictions versus observations were in good agreement. The weighted residuals were, in general, randomly scattered around the zero line. Upon inclusion of the covariates into the model, we confirmed visually that there was no trend in the distribution of random effects in the final model (Figure 2).

Figure 2

Basic goodness-of-fit plots for the final model.

The results of the VPC show that the distribution of the observed data was contained within the empirical distribution of the estimates predicted by the final model over 1000 simulations. This indicates that the final model prediction was reasonable for both the point estimates and the distributions (Figure 3).

Figure 3

Visual predictive check from 1000 simulations. The solid line represents the median observed data, and the gray field around it represents a simulation-based 95% confidence interval for the median. The observed 5% and 95% percentiles are presented with dashed lines, and the gray fields around them are their 95% confidence intervals. The observed data are represented by circles.

Using the mean and 95% confidence interval, we were able to determine the range of the Ratio corresponding to each disease state (Table 4). The mathematical expression for the baseline of the Ratio is given in Eq 2.

Table 4 Corresponding scope of the Ratio to each disease state.

Comparison of LMCI-converters and LMCI non-converters

Two important goals in AD research are predicting the conversion of MCI to AD and identifying fast and slow disease progression in individual patients. In our established disease progression model, a lower Ratio was related to faster cognitive decline. The next question we examined was whether the model could predict MCI conversion. In the dataset, MCI was subdivided into EMCI and LMCI, and only some LMCI patients converted to AD (LMCI-converters). The comparison between LMCI-converters and LMCI non-converters (Table 5) was performed to see whether the Ratio was a good predictor. The results showed that the mean Ratio of LMCI-converters was always lower than that of LMCI non-converters, regardless of which APOEε4 group they belonged to.

Table 5 Comparison of LMCI-converters and LMCI non-converters. Values are summarised as mean (range).

Discussion

Our study makes several improvements over previous models. It is assumed that an AD patient will transition from normal to EMCI, then to LMCI, and finally to AD if the observation period is long enough. Due to the nonlinearity of disease progression, the model for one specific disease stage cannot describe the disease progression of another stage. All stages of the disease were considered in our study, including NL, EMCI, LMCI, and AD. Other studies only considered one disease stage, such as MCI or AD. Therefore, our model can be applied to the whole spectrum of AD progression even when disease stage changes. In addition, 7 years of data are available for some individuals, which is beneficial in examining the long-term changes of AD progression. More importantly, we have introduced a novel disease progression model which integrates three endpoints. This model is different from previous ones, which empirically model disease progression with clinical scores only. Instead of treating Aβ42, p-tau and hippocampal volume as covariates, our model utilizes them to develop a time-based disease progression model which can be applied to the entire AD spectrum from normal to dementia. In conventional longitudinal analyses, Aβ42, p-tau and hippocampal volume are treated as covariates, and their values are generally assumed to be time-invariant. In progressive chronic diseases, this assumption is not realistic because biological functions may deteriorate over time. Thus, a disease progression model that characterizes the time-varying disease status is desired, one that describes not only the change of the score but also the changes of biomarkers and imaging markers. Through our model, one can obtain not only a model of ADAS-cog but also a model of biomarkers, imaging markers and their abnormal ranges.
Finally, we have partially achieved the two most important goals in AD study: predicting conversion of MCI to AD and identifying fast and slow disease progression in individuals. As demonstrated by the results, individuals with a lower Ratio deteriorate faster and convert from MCI to AD more easily.

The APOE gene encodes a protein called apolipoprotein E, which is a major component of a specific type of lipoprotein called very low-density lipoproteins (VLDLs). The ε4 version of the APOE gene may increase the risk for an individual to develop AD by an unknown mechanism. Some researchers have found that this allele is associated with an increased number of amyloid plaques, while others have found that it is associated with elevated p-tau26. In either case, APOEε4 is related to a decreased Ratio, which is consistent with our findings. The results of this study indicate that each additional copy of the APOEε4 allele may decrease the Ratio by 1.65.

Disease state (DS) is an important factor that influences numerous important progression parameters. From NL to AD, the Ratio will decrease by 1.08 as DS increases by one level. Higher DS level is also related to smaller hippocampal volume, through its influence on EC50. In all disease states, hippocampal volume will decrease with an increase in age. LMCI and AD groups show faster cognitive decline than NL and EMCI groups.

Clinical criteria, which are often subjective and dependent on clinical judgment, are insufficient to identify early stages of AD when considered alone. In recent decades, the use of more objective biochemical and imaging markers, either replacing or complementing these clinical approaches, to facilitate an early and accurate diagnosis of the illness has been investigated extensively. Studies have shown that individuals with lower levels of CSF Aβ42 and higher levels of p-tau develop AD more frequently. Hence, the Ratio can be used to identify the early stages of AD and explore the subsequent events during AD progression. The combination of these two biomarkers appears to be superior to a single biomarker because only a slight change in either Aβ42 or p-tau is detected over relatively short intervals. Caroli and Frisoni even proposed that the CSF load was nearly disconnected from the disease stage, as they found that changes in CSF biomarkers alone were not significantly associated with the annual decline in cognitive and functional scores in MCI and AD groups27. However, the change in the Ratio is more apparent over time because the two biomarkers move in opposite directions. Furthermore, although there is no doubt that Aβ42 and p-tau are involved in the pathogenesis of AD, the sequence in which they change is still unclear. Nevertheless, the Ratio decreases regardless of which biomarker changes first.

Predicting MCI-converters is an important goal in AD research. A diagnosis of MCI, the most widely used indicator, poorly predicts whether an individual will deteriorate to AD because some patients convert while others remain relatively stable. Various studies on the identification of MCI-converters have been published, suggesting that clinical measures, in combination with CSF biomarkers or imaging markers, may improve the accuracy of conversion forecasting28,29,30. The results of our study suggest that combining the Ratio with an LMCI diagnosis can improve the accuracy of predicting LMCI-converters. Patients in the LMCI state with a lower Ratio are more likely to convert to AD.

There are some limitations of the present study. To date, CSF biomarkers have only been collected for a limited number of ADNI participants. Due to the relatively small sample size and short follow-up time, especially for AD, we could not make full use of the non-linear mixed effect model, which might be more appropriate for large samples and long periods.

Despite these limitations, this study makes unique contributions. To the best of our knowledge, this is the first study to introduce a model that explicitly describes the entire course of AD by identifying the mathematical relations between the Ratio over time, hippocampal volume with the Ratio, and ADAS-cog with hippocampal volume. Moreover, the same modeling strategy can be readily applied to systematically test hypotheses about the temporal relationships between other biomarkers and other neurodegenerative diseases.

In summary, the developed cascade model was suitable for describing the progression from NL to AD. Baseline disease state (DS) has an impact on all of the three endpoints (the Ratio, hippocampal volume and ADAS-cog), while APOEε4 genotype and age only influence the Ratio and hippocampal volume, respectively. The model provided a suitable tool for clinical trial simulations and could aid in the design of efficient clinical trials in the future. Using the Ratio, we are able to approximate the disease stage of an individual. Clinical measures in combination with the Ratio can improve the accuracy of MCI to AD conversion forecasting.

Author contribution

Wei LU, Tian-yan ZHOU, and Yue QIU designed the research; Yue QIU performed the research; Yue QIU and Liang LI analyzed data; and Yue QIU wrote the paper.