Among the top 10 causes of death in the United States, Alzheimer’s disease (AD) is the only condition without a viable treatment to cure or prevent it, or even significantly slow its progression1. Failure to develop a successful disease-modifying therapy for AD, despite large investments of public and private resources, is rooted in its complexity2,3,4. For instance, signaling pathway analyses of AD pathophysiology have implicated over 30 metabolic pathways and over 1000 chemical species4. Our incomplete understanding of how these mechanisms vary and interact at an individual level to create a clinically and biologically heterogeneous phenotype has resulted in an attempt to treat patients with varying underlying pathophysiology in a similar fashion5,6. Thus, failure to characterize and subtype AD at an individual level has represented a major roadblock in the development of effective therapeutic strategies to slow or halt AD progression. Recent biological classification of AD, based on imaging and cerebrospinal fluid (CSF) biomarkers, represents a major step toward the future development of personalized prognoses and therapeutic strategies7,8. The increasing availability of such data in large cohorts of subjects has made possible the development and testing of rigorous quantitative models of AD pathophysiology. For example, the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a multicenter, prospective, naturalistic study that began in 2003, comprises four sequential studies (ADNI-1, ADNI-GO, ADNI-2, and ADNI-3), which followed subjects for up to 15 years using genetic, blood- and CSF-based, imaging, and cognitive biomarkers. The abundance of data from this and similar multinational biomarker studies in AD will require a rigorous quantitative data-driven modeling approach to analyze, integrate, and interpret data at the level of the individual, where it can have maximum clinical impact.

Several mathematical models of AD progression have been developed recently. For example, one mathematical model includes a cellular biologic system of neurons, glia, macrophages, amyloid-β aggregation, and tau to simulate and validate at a cellular level the mechanisms underlying the failure of several drugs in recent clinical trials, and to suggest alternative approaches9. Moreover, a mathematical modeling approach has also been used to describe the key AD clinical biomarkers, including pathologic hallmark biomarkers (beta-amyloid and tau), neuronal loss biomarkers, and cognitive impairment10. This model was parameterized and tested to successfully simulate the natural history scenarios of three subtypes of AD presented in ref. 11: (1) early-onset autosomal dominant AD, (2) late-onset amyloid-first AD, and (3) late-onset tau-first AD.

Although these mathematical models bring new insights into AD progression and enable simulation of therapeutics, the current models are built upon a priori hypotheses of the AD pathophysiological network, which is still an open area of research12. In fact, there are dozens of pathophysiological pathways implicated in AD by systems biologists, and our understanding of these networks and their interactions remains incomplete13. Moreover, there has been limited work on mechanistic modeling of clinically measurable AD biomarkers. Most research to date on the key AD biomarkers has been observational or correlational. Such modeling approaches do not benefit from the tools of a more integrative systems approach that addresses disease mechanism14,15.

Computational data-driven modeling approaches have already achieved success in analyzing multi-dimensional clinical data in diseases such as cancer16,17 and cardiovascular disease18,19. Such data-driven approaches employ mathematical models for patient populations using clinical, omics, and biomarker data, as well as powerful and new means to personalize such models based on individual data, yielding personal risk profiles. These data-driven modeling approaches can simulate complex systems, helping to elucidate complex physiological interactions and optimize personalized prevention and treatment strategies. Examples of such work include statistical approaches, such as Bayesian generalized linear models20, Bayesian hierarchical models21,22, and those based on Markov chain Monte Carlo simulations23,24,25,26 to analyze genome sequencing and biomarker dynamics. Recently, machine learning techniques, such as deep recurrent neural networks, have been used to predict AD progression27.

In this paper, we propose to develop a computational data-driven modeling framework to predict AD biomarker progression. We propose a methodology to construct data-driven causal models at a group and individual patient level. This method does not depend on any specific hypothesis of AD progression and extracts the causal model completely from the empirical data. More specifically, we derive the causal model based on clinical biomarkers in the ADNI dataset. In this data-driven modeling approach, the causal model is learned from four biomarkers (amyloid-beta pathology, total-tau pathology, hippocampal volume, and cognitive decline) to describe AD progression. Moreover, we incorporate a disease progression score (DPS) in the causal model28 to unify AD progression for different subjects since the onset age and rates of progression may markedly vary within and across the different subject classes in ADNI.


We elaborate on the effectiveness of the proposed data-driven causal model here. First, we construct a population-based causal model that describes the biomarker dynamics for all eligible subjects in ADNI-1, including normal controls. By fitting the population parameters via the ADNI dataset, the population model describes the transition of AD biomarkers between three different disease stages, cognitively normal (CN), late mild cognitive impairment (LMCI), and Alzheimer’s disease (AD). Second, we derive a population model for LMCI and AD subjects only. Third, we analyze the Sobol sensitivity29,30 of the parameter space of the population model, which identifies the attribution of each model parameter. Based on the sensitivity analysis results, we finally construct a personalized model for each subject and provide personalized biomarker predictions for subjects who have more than four longitudinal biomarker data points.

A population model

We construct a causal model by fitting biomarkers of all subjects across the ADNI dataset. Since the causal model is a dynamic system expressed as ordinary differential equations (ODEs), we require at least two longitudinal data points for each subject. More specifically, we remove patients who do not provide at least two measurements for any one of the four biomarkers. The histograms in Fig. 1(a) summarize the available biomarker data in the ADNI dataset.

Fig. 1: The ADNI dataset histogram and the calibrated causal model plots.

a Histogram of four biomarkers in the ADNI dataset. Top left is amyloid-beta; top right is the tau; bottom left is the normalized hippocampal volume; bottom right is the cognitive subscale. X-axes are the corresponding magnitudes of each biomarker, and y-axes are their frequencies. The subjects are classified into “CN”, “LMCI”, and “AD”. b The calibrated causal model on three groups of patient data. X-axes are fitted DPS of biomarkers, and y-axes are the corresponding magnitudes of each biomarker. The orange circles, green triangles, and blue hexagons are data from “CN”, “LMCI”, and “AD” subjects correspondingly. The black solid lines are the solutions of the causal model. c The calibrated causal model on the dataset of LMCI and AD groups.

Algorithm 1

Population model calibration algorithm to compute the population parameters w(1) and DPS parameters (α, β). See details in Methods section.

Input \({{{\boldsymbol{y}}}}={\{{y}_{ijk}\}}_{ijk}\), \({{{\boldsymbol{t}}}}={\{{t}_{ij}\}}_{ij}\).

Initialize \({\alpha }^{0}\), \({\beta }^{0}\), and \({\boldsymbol{w}}^{0}\).

1: for l = 1 to L do

2: for k ∈ {A, T, N, C} do (population parameter calibration)

3: \({\boldsymbol{w}}_{k}^{l}={\mbox{argmin}}_{{\boldsymbol{w}}_{k}}{\sum }_{(i,j)\in {\mathcal{I}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}^{l-1}{t}_{ij}+{\beta }_{i}^{l-1};{\boldsymbol{w}}_{k}\right)\right)}^{2}.\)

4: \({\sigma }_{k}^{l}=\frac{1}{| {\mathcal{I}}_{k}-2I-4| }{\sum }_{(i,j)\in {\mathcal{I}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}^{l-1}{t}_{ij}+{\beta }_{i}^{l-1};{\boldsymbol{w}}_{k}^{l}\right)\right)}^{2}.\)

5: end for


7: for i = 1 to I do (update DPS parameters)

8: \(({\alpha }_{i}^{l},\,{\beta }_{i}^{l})={\mbox{argmin}}_{{\alpha }_{i},{\beta }_{i}}{\sum }_{(j,k)\in {\mathcal{I}}_{i}}\frac{1}{{\sigma }_{k}^{l}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{\boldsymbol{w}}_{k}^{l}\right)\right)}^{2}.\)

9: end for

10: end for

Output \({\boldsymbol{w}}^{L}\) as the population parameter \({\boldsymbol{w}}^{(1)}\), together with \({\alpha }^{L}\) and \({\beta }^{L}\).
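To make the DPS update step of Algorithm 1 concrete, the following is a minimal Python sketch of the per-subject minimization over (αi, βi) with the population curves held fixed. It rests on several assumptions that are not part of the original method: the population curves fk are replaced by hypothetical sigmoid stand-ins rather than ODE solutions, the optimizer is a plain grid search, and the subject data are synthetic. The names `fit_dps`, `curves`, and `sigmas` are illustrative only.

```python
import math


def sigmoid(s, a, b, c, d):
    """Stand-in biomarker curve (assumption: the paper's f_k is an ODE solution)."""
    return a / (1.0 + math.exp(-b * (s - c))) + d


def fit_dps(ages, observations, curves, sigmas, alpha_grid, beta_grid):
    """Grid search for (alpha, beta) minimising the variance-weighted squared error.

    ages         : measurement ages t_ij of one subject
    observations : dict biomarker -> list of measurements y_ijk (same order as ages)
    curves       : dict biomarker -> callable f_k(s) with population parameters fixed
    sigmas       : dict biomarker -> residual variance sigma_k used as weight
    """
    best = None
    for alpha in alpha_grid:
        for beta in beta_grid:
            loss = 0.0
            for k, curve in curves.items():
                for t, y in zip(ages, observations[k]):
                    loss += (y - curve(alpha * t + beta)) ** 2 / sigmas[k]
            if best is None or loss < best[0]:
                best = (loss, alpha, beta)
    return best[1], best[2]


# Hypothetical population curves and one synthetic, noise-free subject (not ADNI data).
curves = {
    "A": lambda s: sigmoid(s, 1.0, 1.0, 0.0, 0.0),
    "T": lambda s: sigmoid(s, 2.0, 0.5, 2.0, 0.0),
}
sigmas = {"A": 1.0, "T": 1.0}
ages = [70.0, 71.0, 72.0, 73.0]
true_alpha, true_beta = 1.5, -105.0
observations = {
    k: [f(true_alpha * t + true_beta) for t in ages] for k, f in curves.items()
}

alpha_grid = [i / 10 for i in range(1, 41)]        # 0.1, 0.2, ..., 4.0
beta_grid = [float(b) for b in range(-120, -59)]   # -120, -119, ..., -60
alpha_hat, beta_hat = fit_dps(ages, observations, curves, sigmas, alpha_grid, beta_grid)
print(alpha_hat, beta_hat)
```

In the full algorithm this step would alternate with the population parameter calibration; a gradient-based or derivative-free optimizer would normally replace the grid search once the curves come from an ODE solver.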

In Algorithm 1, the initial value of αi is chosen randomly in (0, 4), and the initial value of βi is chosen so that −10 ≤ si(t) ≤ 20 at all measurement times. We then obtain the population model in terms of the fitted DPS shown in Fig. 1(b). The population model (black solid line) is learned on three different disease stages, namely CN (orange circles), LMCI (green triangles), and AD (blue hexagons). The gray area is the confidence interval of the population model. More specifically, we sample the population parameters, w(1), from the posterior distribution (given by the simulation study) and run the model with the same initial condition 1000 times; the 95% confidence interval at every time point is then plotted. The simulation study and diagnostic plots corresponding to the population model are shown in the Supplementary Materials. From this figure, we can separate the biomarkers into three stages according to the population model. In particular, CN and AD patients correspond to s < 0 and s > 0, respectively, while LMCI patients are located around s = 0. Moreover, the first three biomarkers (Aβ, τ, and N) start at steady states when s < 0 (CN), change gradually around s = 0 (LMCI), and finally approach another steady state (AD). Unlike the other biomarkers, ADAS continues to grow, which means that cognitive symptoms worsen as AD progresses.

We also compare the population model with the sigmoid function fitting (black solid lines in Fig. 2). First, the population model provides relatively smooth transitions from one stage to another, while the sigmoid function fitting gives more abrupt changes for Aβ at s ≈ 4. Second, the population model follows the biomarker cascade theory, in which τ rises after Aβ starts decreasing, N increases after τ, and C rises after N. The sigmoid function fitting, however, makes Aβ and τ change after s = 0, while N and C change at s ≈ −3 and s ≈ −10, respectively.

Fig. 2: Subject biomarkers in ADNI data and the fitted sigmoid function.

Top left is amyloid-beta; top right is the tau; bottom left is the normalized hippocampal volume; bottom right is the cognitive subscale. X-axes are fitted DPS of biomarkers, and y-axes are the corresponding magnitudes of each biomarker. The subjects are classified into “CN”, “LMCI”, and “AD” according to ADNI diagnostic groups, where orange is “CN”, green is “LMCI”, and blue is “AD”. The black solid lines are fitted with sigmoid functions28.

Since the CN group might not follow the same disease trajectory, we next derive a population model based on the LMCI and AD groups only. In order to better identify the biomarker dynamics among LMCI and AD group patients, we fix the parameters of DPS, (α, β), that we obtained before and only update the causal model parameters, w, by using Algorithm 1. Figure 1(c) shows the population model based on LMCI and AD groups.

The transitions for the different biomarkers shown in Fig. 1(c) are similar to those in Fig. 1(b). However, the onset of changes occurs earlier in Fig. 1(c), since the LMCI and AD groups are prone to suffer from cognitive decline earlier. We summarize the parameters of the above-mentioned causal models in Table 1, which corresponds to the results given in Fig. 1(b) and (c).

Table 1 Population parameters w(1) of the calibrated causal models based on the ADNI dataset.

Sensitivity analysis

The quasi-Monte Carlo method is applied to compute the sensitivity indices. For more details about Sobol sensitivity analysis, please refer to refs. 29,30. Taking C(0) as the output, Fig. 3(a) lists the nine most sensitive parameters in terms of their first-order and total-order sensitivity indices. We see that the weight with a large first-order impact (Sm > 0.4) is associated with Aβ().
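As an illustration of how a first-order Sobol index can be estimated, the sketch below uses a pick-freeze estimator on a toy model. It is a simplified stand-in under stated assumptions: it draws plain pseudo-random samples rather than the quasi-Monte Carlo points used in this work, the two-parameter linear `toy_model` replaces the causal model output C(s), and the uniform parameter ranges are hypothetical.

```python
import random


def first_order_sobol(model, bounds, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices S_m = Var(E[z | w_m]) / Var(z).

    model  : callable taking a list of parameter values and returning a float
    bounds : list of (low, high) ranges; parameters are sampled uniformly (assumption)
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Two independent sample matrices A and B.
    A = [draw() for _ in range(n_samples)]
    B = [draw() for _ in range(n_samples)]
    f_a = [model(x) for x in A]
    f_b = [model(x) for x in B]

    # Total output variance from the pooled evaluations.
    pooled = f_a + f_b
    mean = sum(pooled) / len(pooled)
    variance = sum((v - mean) ** 2 for v in pooled) / (len(pooled) - 1)

    indices = []
    for m in range(dim):
        # Rows of A with column m replaced by the corresponding entry of B.
        f_ab = [model(A[j][:m] + [B[j][m]] + A[j][m + 1:]) for j in range(n_samples)]
        partial = sum(f_b[j] * (f_ab[j] - f_a[j]) for j in range(n_samples)) / n_samples
        indices.append(partial / variance)
    return indices


def toy_model(w):
    """Hypothetical output z = (w0 - 0.5) + 2 (w1 - 0.5); exact indices are 0.2 and 0.8."""
    return (w[0] - 0.5) + 2.0 * (w[1] - 0.5)


indices = first_order_sobol(toy_model, [(0.0, 1.0), (0.0, 1.0)])
print([round(s, 2) for s in indices])
```

For the toy model the exact values follow from Var(w0) = 1/12 and Var(2 w1) = 4/12, so the estimates should land near 0.2 and 0.8 up to sampling error. Parameters whose index exceeds a chosen threshold would then be the candidates for personalization.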

Fig. 3: The Sobol sensitivity analysis plots with respect to cognitive decline.

a First-order, second-order, and total-order Sobol sensitivities of C(0). Left: the red rectangles are assigned first-order sensitivities of model parameters, and the blue rectangles are their total-order sensitivities. The length of the rectangles represents the attribution of sensitivities to outputs; Right: each square represents the second-order sensitivity correlations of two model parameters. The lighter the color, the stronger the positive correlation while the darker the color, the stronger the negative correlation. b The dynamics of first-order Sobol sensitivities with respect to DPS. Each curve corresponds to the first-order sensitivity values with an output C(s). Only first-order sensitivity values greater than 0.01 are plotted. c The dynamics of second-order Sobol sensitivities for two parameters with respect to DPS. Only the maximum absolute second-order sensitivity values greater than 0.01 are plotted.

The right panel of Fig. 3(a) shows the second-order interactions between pairs of parameters. We see that the parameters associated with \({A}_{\beta }^{2}\) are always positively related to other terms. While the Aβ term with parameter wA1 is almost always positively related to other terms, the Aβ term with parameter wT3 is negatively related to other parameters except \({A}_{\beta }^{2}\). Compared with the first-order sensitivities, however, the second-order ones contribute little.

Figure 3(b, c) shows the dynamics of the sensitivities with respect to DPS. The first-order sensitivity value of wA1 decreases as DPS increases, which implies that the effect of Aβ on cognitive decline switches from linear to nonlinear in later-stage disease. At the same time, the first-order sensitivity values of some other parameters increase gradually, with a notable increase for wC3 and wC5. The second-order sensitivities between different parameters eventually converge to zero; thus, the interactions among different parameters weaken as the biomarkers reach equilibrium. Based on the results shown in Fig. 3(a), we select wA1, wA2, wT4, wT5, wN4, wN5, wC3, and wC5 as the most sensitive parameters for personalization by setting the threshold, TOL, to 0.01 in Algorithm 2.

Personalized model and biomarker prediction

Algorithm 2

Personalized model calibration algorithm. The personalized parameters are initialized by the population model. The personalized models are applied to subjects who meet the requirement, denoted as i ∈ Ω.

Input longitudinal biomarker data {yijk} at {tij} with i ∈ Ω;

Input the DPS parameter values (αi, βi) for each subject i ∈ Ω;

Input the population parameter values w(1) (w for simplicity);

Input sensitivity threshold, TOL.

1: for m = 1 to 21 do (first-order sensitivity)

2: \({\rm{S}}_{m}(z)=\frac{{\rm{Var}}_{{w}_{m}}\left[{\rm{E}}_{{w}_{ \sim m}}(z| {w}_{m})\right]}{{\rm{Var}}(z)}.\)

3: if Sm(z) ≥ TOL then

4: set wm as a personalized parameter and denote it as \({w}_{m}^{(2)}\) else

5: keep wm as a population parameter.

6: end if

7: end for


9: for i ∈ Ω do (personalized model calibration)

10: for k ∈ {A, T, N, C} do

11: Denote the personalized parameters in k-th equation as \({{{{\boldsymbol{{w}}}_{k}}}}^{(2)}\).

12: Select parameters to calibrate.

13: \({{{{\boldsymbol{{w}}}_{k}}}}^{(2)}={\arg \min }_{{{{{\boldsymbol{{w}}}_{k}}}}^{(2)}}\mathop{\sum }\limits_{j=1}^{M-1}{\left({\hat{y}}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{{w}}}_{k}}}}^{(2)}\right)\right)}^{2}.\)

14: \(P{A}_{ik}=\frac{{\hat{y}}_{iMk}-{f}_{k}\left({\alpha }_{i}{t}_{(iM)}+{\beta }_{i};{{{{\boldsymbol{{w}}}_{k}}}}^{(2)}\right)}{{\hat{y}}_{iMk}}\times 100 \% .\)

15: Compute prediction accuracy.

16: end for

17: end for

Output PAik for i ∈ Ω and k ∈ {A, T, N, C}.

Next, we build personalized models and provide biomarker predictions for subjects whose data satisfy the following two criteria: (1) there are at least four measurements for each biomarker; (2) each biomarker measurement changes monotonically with respect to DPS. Based on the first-order sensitivity analysis results shown in Fig. 3(a), we chose the eight most sensitive parameters as personalized parameters by setting TOL = 0.01 in Algorithm 2. For each subject, we denote the biomarker data as \(\hat{\boldsymbol{y}}({s}_{i})={[{\hat{\boldsymbol{A}}}_{\beta }({s}_{i}),\,\hat{\boldsymbol{\tau }}({s}_{i}),\,\hat{\boldsymbol{N}}({s}_{i}),\,\hat{\boldsymbol{C}}({s}_{i})]}^{T}\) (i = 1, …, M), fit the sensitive personalized parameters of the population model w(1) by using the first M − 1 data points, and test the prediction accuracy on the last data point by \(\frac{\hat{\boldsymbol{y}}({s}_{M})-{\boldsymbol{y}}({s}_{M})}{\hat{\boldsymbol{y}}({s}_{M})}\times 100 \%\). A detailed procedure is outlined in Algorithm 2.
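The hold-out evaluation described above can be sketched in a few lines of Python. The split into the first M − 1 points and the last point, and the signed percentage formula, follow the text; the function names, the example measurements, and the predicted value are hypothetical. The helper `accuracy_from_error`, which reports 100% minus the absolute percentage error, is only one plausible reading of how accuracies such as those quoted for individual subjects could be derived, and is not confirmed by the text.

```python
def holdout_split(measurements):
    """Split a subject's ordered measurements into the first M - 1 points and the last one."""
    if len(measurements) < 2:
        raise ValueError("need at least two measurements to hold one out")
    return measurements[:-1], measurements[-1]


def percent_prediction_error(observed, predicted):
    """Signed relative error (observed - predicted) / observed, in percent."""
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return (observed - predicted) / observed * 100.0


def accuracy_from_error(error_percent):
    """Assumed convention (not stated in the text): accuracy = 100 - |error|."""
    return 100.0 - abs(error_percent)


# Hypothetical cognitive scores for one subject, ordered by DPS.
scores = [20.0, 22.0, 25.0, 30.0]
train, held_out = holdout_split(scores)

# Suppose a model fitted on `train` predicts 28.5 at the held-out point (illustrative value).
error = percent_prediction_error(held_out, 28.5)
print(train, held_out, round(error, 6), round(accuracy_from_error(error), 6))
```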

Figure 4 shows the biomarker trajectories of the personalized model with training (blue) and testing (red) data for one subject (pseudo ID = 18). We also compare the personalized model with the sigmoid function fitting and find that the personalized model provides better prediction accuracy. In fact, the prediction accuracies given by the personalized model are 97.3% (Aβ), 95.9% (τ), 98.4% (N), and 95.1% (C), while those given by the sigmoid function fitting are 95.5% (Aβ), 90.8% (τ), 95.7% (N), and 63.4% (C). Since the sigmoid function fitting predicts by using the longitudinal information of the current biomarker only, it provides a less accurate cognitive score.

Fig. 4: The personalized model for one LMCI subject with pseudo ID = 18.

The green dashed lines are fitted by sigmoid functions, and black solid lines are the solutions of the personalized model. The blue markers are training data points while the red markers are for testing.

Furthermore, we build personalized models for the CN and LMCI groups (there are not enough data points in the AD group) with different numbers of longitudinal data points and summarize the predictive results in Tables 2 and 3. The tables indicate that our personalized models provide higher predictive accuracy than the sigmoid function fitting. Moreover, the accuracy of predicting biomarker dynamics increases as the number of biomarker data points increases.

Table 2 The prediction accuracy summary for CN subjects using different numbers of longitudinal biomarker datapoints (n).
Table 3 The prediction accuracy summary for LMCI subjects using different longitudinal data points (shown in the first column).


Unlike existing pathophysiological AD network models, which are based on a priori assumptions about biomarker trajectories, this work develops a data-driven causal modeling approach informed by AD clinical biomarker data and demonstrates both population and personalized models. The proposed population model traces the general biomarker dynamics for all patient data without any specific assumptions regarding the form of the model and enables personalized AD risk prediction by incorporating historical clinical data such as CSF protein and imaging biomarkers as well as cognitive scores. By introducing a DPS for each subject, we calibrate and scale AD biomarker progression across the ADNI population and derive population parameters. We also compare the proposed data-driven modeling approach to an empirical approach based on sigmoid function fitting and conclude that the proposed causal model is able to better capture disease progression with a smoother transition over time. Moreover, this causal model allows us to explore the underlying cascade relationship among biomarkers, while the empirical sigmoid function approach considers each biomarker as an independent term. The population model not only provides a means to classify different stages of AD progression for each biomarker, but also lays the foundation for personalized modeling.

Before constructing the personalized model, we performed a sensitivity analysis for the population parameters. From a clinical standpoint, the sensitivity analysis provides insights into AD progression in terms of which parameters play the greatest role in disease progression, and when during the disease course they are most relevant. From a computational standpoint, the sensitivity analysis aids the subsequent personalized parameter selection. Based on the sensitivity analysis, we see that change in cognition is driven primarily by first-order effects and is time-dependent. Initially, the greatest effects are by amyloid, represented by wA1, and to a lesser extent tau and neuronal vulnerability to tau, represented by wN4 and wN5, respectively. The amyloid parameter wA1 is most sensitive when the disease starts (DPS = 0), and its sensitivity diminishes as DPS increases. On the other hand, the sensitivities of the parameters related to N and C, namely wC3 and wC5, increase significantly as the disease progresses. Thus, the sensitivity analysis suggests that at the early stage of AD, cognitive decline is driven by Aβ levels, with this sensitivity decreasing linearly as the disease progresses, whereas at the later stages, cognitive decline is driven mainly by downstream effects, including the level of neuronal degeneration, represented by wC3, and the interaction of cognition and neuronal degeneration, represented by wC5. These results are consistent with prior observational studies based on ADNI and other longitudinal cohorts, which suggest that cognitive decline is driven primarily by high amyloid levels at earlier disease stages and by neurodegeneration at later stages31.

Sensitivity analysis also provides key insights in terms of personalized parameter selection. The paucity of longitudinal biomarker data and the relatively large number of model parameters can easily lead to overfitting for personalized models. Based on the sensitivity analysis results, we chose the eight most sensitive population parameters as personalized parameters and set the rest of the parameters at the mean population parameter values. In this case, calibration of personalized parameters based on sparse longitudinal biomarker data for each patient avoids the overfitting issue and provides a high-precision personalized prediction for each subject, as outlined in the Results section.

Limitations of this work include sampling bias. Because the ADNI dataset is a research cohort from academic clinics, only one-third of ADNI subjects agreed to provide CSF biomarkers. Thus we need to replicate these findings using data from more general practice settings in the future. Despite these limitations, this model advances our understanding of the complexity of AD biomarker pathophysiology over that of current biomarker models which have primarily been independent and ad hoc in nature, with inherent assumptions regarding the shape of individual biomarker trajectories. Our current approach is integrative and based on the cascade mechanism, yet without assumptions regarding the exact mathematical form of the individual biomarker models or the resulting shape of the biomarker trajectories. In the future, we intend to extend the current approach to the spatiotemporal domain by utilizing longitudinal imaging data to determine mechanisms driving the spread of pathology in time and space.


We propose a pathophysiology and data-driven modeling approach to construct a causal model of AD clinical biomarkers. We construct a causal model from the serial clinical biomarker measures across 819 subjects from the ADNI-1 dataset with mild AD (N = 192), late mild cognitive impairment (LMCI, N = 398), and normal cognition (N = 229) (more details are shown in Table 4). We use PseudoIDs instead of RIDs to link across all clinical biomarker data belonging to a patient. The CSF proteins measured in ADNI are the following: A-Beta 42 and phosphorylated tau 181 (p-tau 181)32,33. These measures were obtained through serial spinal taps on subjects over approximately two-year intervals. Of note, A-Beta in the CSF goes down, and total and phosphorylated tau go up as the disease progresses. Hippocampal volume, a measure of neurodegeneration, was measured through volumetric analysis of serial MRI images obtained at approximately one-year intervals. It goes down as the disease progresses. Finally, cognitive decline was measured through a pencil-and-paper neuropsychological test, the thirteen-item Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS13). This measures function in several cognitive domains affected by AD, including memory, language, and praxis, and is the de facto primary outcome measure in AD clinical trials. It goes up as the disease progresses.

Table 4 The characteristics of the ADNI-1 dataset in data-driven modeling.

After constructing and calibrating the population model with data across all ADNI subjects, we then personalize the parameters of the model using each patient’s longitudinal data to provide a personalized prediction of biomarker trajectories. The overall modeling approach is outlined in Fig. 5, and each step is elaborated in the following subsections.

Fig. 5: A flowchart of the pathophysiology and data-driven modeling approach.

First, given the initialized ODE model, a causal model is obtained by fitting the ADNI dataset and DPS model through sparse learning; second, the ADNI dataset is used to calibrate the population parameters in the causal model and obtain the population model; third, a sensitivity analysis is applied to analyze the sensitivity of each population parameter and determine the sensitive personalized parameters, and a simulation study is conducted to validate the population model. Then, the personalized model is obtained by calibrating the sensitive personalized parameters with the use of personalized data. Finally, a prediction is made by the personalized model.

The data-driven causal model learning via ADNI dataset

Four AD biomarkers are key factors in AD diagnosis and in the monitoring of AD progression: amyloid-beta Aβ, tau τ, neuronal degeneration N, and cognitive decline C. Amyloid-beta is the main component of amyloid plaques and is considered to be an early event in the pathological cascade of AD. Amyloid production leads to downstream tau phosphorylation, causing the formation of neurofibrillary tangles and neuropil threads. Tau is a microtubule-associated protein, which is very common in neurons of the central nervous system. Both amyloid-beta and tau phosphorylation contribute to neuronal degeneration and cognitive decline.

To describe the cascade relationship among the above-mentioned four biomarkers of AD progression, we consider a canonical system of ODEs. The amyloid-dependent cascade is initiated by amyloid-beta pathology Aβ and mediated via tau τ. Neuronal degeneration N starts with the rise of tau τ and, in turn, leads to the initiation of cognitive decline C. According to the above description, we consider the causal model as the system of ODEs:

$$\left\{\begin{array}{lll}\frac{d{A}_{\beta }}{dt}=\mathop{\sum }\limits_{\ell = 0}^{m}{w}_{1,\ell }{\phi }_{\ell }({A}_{\beta });\frac{d\tau }{dt}=\mathop{\sum }\limits_{| {{{\boldsymbol{\ell }}}}| \le m}{w}_{2,{{{\boldsymbol{\ell }}}}}{\psi }_{{{{\boldsymbol{\ell }}}}}({A}_{\beta },\tau );\\ \frac{dN}{dt}=\mathop{\sum }\limits_{| {{{\boldsymbol{\ell }}}}| \le m}{w}_{3,{{{\boldsymbol{\ell }}}}}{\psi }_{{{{\boldsymbol{\ell }}}}}(\tau ,N);\frac{dC}{dt}=\mathop{\sum }\limits_{| {{{\boldsymbol{\ell }}}}| \le m}{w}_{4,{{{\boldsymbol{\ell }}}}}{\psi }_{{{{\boldsymbol{\ell }}}}}(N,C),\end{array}\right.$$

where \({\boldsymbol{\ell }}=({\ell }_{1},{\ell }_{2})\), \(| {\boldsymbol{\ell }}| ={\ell }_{1}+{\ell }_{2}\), and m is the degree of the model. We choose the polynomial basis function in the initialized ODE model, namely,

$${\phi }_{\ell }(x)={x}^{\ell },\,\,{\psi }_{{{{\boldsymbol{\ell }}}}}(x,y)={x}^{{\ell }_{1}}{y}^{{\ell }_{2}}.$$
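As a small worked example of the polynomial basis, the Python sketch below enumerates the multi-indices with ℓ1 + ℓ2 ≤ m and counts the resulting weights in the four-equation system. The helper names are illustrative and not part of the original work. For m = 2 the count is 3 + 3 × 6 = 21, which matches the 21 parameters looped over in Algorithm 2.

```python
from itertools import product


def multi_indices(m):
    """All pairs (l1, l2) of non-negative integers with l1 + l2 <= m."""
    return [(l1, l2) for l1, l2 in product(range(m + 1), repeat=2) if l1 + l2 <= m]


def phi(l, x):
    """Univariate monomial phi_l(x) = x**l."""
    return x ** l


def psi(l, x, y):
    """Bivariate monomial psi_l(x, y) = x**l1 * y**l2."""
    l1, l2 = l
    return x ** l1 * y ** l2


def count_parameters(m):
    """Weights in the system: m + 1 for the A_beta equation, one per multi-index for the other three."""
    return (m + 1) + 3 * len(multi_indices(m))


print(multi_indices(2))
print(count_parameters(2))
```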

We then learn the causal model parameters in (1) by using ADNI data. More specifically, we use CSF amyloid-beta 1-42 (Aβ), CSF total tau (τ), the ratio of hippocampal volume to whole-brain volume on MRI (N), and the Alzheimer’s Disease Assessment Scale-cognitive (C) to calibrate Aβ, τ, N, and C, respectively in the causal model. In order to denoise longitudinal data for different subjects, we applied a sigmoid interpolation for each biomarker. Moreover, because AD has a different time of onset and rate of progression for different subjects, we employ DPS28 to unify the time scale across subjects in the causal model.

Disease progression scores

For different subjects in ADNI, the onset of disease and rate of progression are different within and among subject classes of CN, LMCI and AD. To fit the causal model for all subjects in the ADNI-1 study, we standardize the longitudinal measurement among patients by employing the DPS28. In particular, we define DPS si(t) as a linear function of the patient’s age t for each patient:

$${s}_{i}(t)={\alpha }_{i}\cdot t+{\beta }_{i},$$

where i = 1, 2, …, I is the patient index, αi is the rate of AD progression, and βi is the age of AD onset.
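The effect of the linear DPS map can be seen in a short Python sketch. The two parameter pairs below are hypothetical and chosen only to show how subjects of the same age can sit at different points on the common disease time scale. Note that, under this linear form, the score si = 0 is reached at age −βi/αi.

```python
def dps(age, alpha, beta):
    """Disease progression score s_i(t) = alpha_i * t + beta_i."""
    return alpha * age + beta


def age_at_score(score, alpha, beta):
    """Invert the linear map: the age at which a subject reaches a given score."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero to invert the DPS map")
    return (score - beta) / alpha


# Two hypothetical subjects (illustrative values, not fitted to ADNI).
slow = {"alpha": 1.0, "beta": -75.0}
fast = {"alpha": 2.0, "beta": -140.0}

for name, p in (("slow", slow), ("fast", fast)):
    print(name, dps(70.0, **p), age_at_score(0.0, **p))
```

At age 70 the first subject has a score of −5 while the second is already at 0, so a single population curve indexed by s can describe both.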

The sigmoid function fitting

We fit the data for each biomarker in ADNI to a sigmoid function. Specifically, each biomarker is parameterized by four parameters \({\boldsymbol{\theta }}_{k}={[{a}_{k},{b}_{k},{c}_{k},{d}_{k}]}^{T}\):

$${g}_{k}\left(s;{{{{\boldsymbol{\theta }}}}}_{k}\right)={a}_{k}{(1+{e}^{-{b}_{k}\left(s-{c}_{k}\right)})}^{-1}+{d}_{k},$$

where ak is a magnitude scale of the function, bk is a slope coefficient, and ck and dk determine function positions. Here we take g1(s) = Aβ(s), g2(s) = τ(s), g3(s) = N(s), g4(s) = C(s) and denote \({{{\boldsymbol{g}}}}={({g}_{1},{g}_{2},{g}_{3},{g}_{4})}^{T}\).
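For reference, the sigmoid and its derivative with respect to s can be written as a short Python sketch; the derivative is the quantity needed whenever the fitted curves are differentiated along the DPS axis. The function names and the numerical values in the example are illustrative only.

```python
import math


def sigmoid_biomarker(s, a, b, c, d):
    """g(s) = a / (1 + exp(-b (s - c))) + d."""
    return a / (1.0 + math.exp(-b * (s - c))) + d


def sigmoid_derivative(s, a, b, c, d):
    """dg/ds = a b exp(-b (s - c)) / (1 + exp(-b (s - c)))**2; d drops out."""
    e = math.exp(-b * (s - c))
    return a * b * e / (1.0 + e) ** 2


# At the midpoint s = c the value is a / 2 + d and the slope is a * b / 4.
print(sigmoid_biomarker(2.0, 3.0, 1.5, 2.0, 0.5))
print(sigmoid_derivative(2.0, 3.0, 1.5, 2.0, 0.5))
```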

Next, we apply the sparse learning to reveal the causal model in (1) which is re-written as

$$\frac{d{{{\boldsymbol{x}}}}}{ds}=\mathop{\sum}\limits_{| \ell | \le m}{{{{\boldsymbol{\phi }}}}}_{\ell }({{{\boldsymbol{x}}}}){w}_{\ell },\,{{{\rm{where}}}}\,{{{\boldsymbol{x}}}}={({A}_{\beta },\tau ,N,C)}^{T}\in {R}^{4}.$$

By taking uniform grid points \({\{{s}_{i}\}}_{i = 1}^{M}\) on s ∈ [−10, 20], we denote

$${D}_{i}=[{\phi }_{{\ell }_{1}}({\boldsymbol{g}}({s}_{i})),\cdots \,,{\phi }_{{\ell }_{n}}({\boldsymbol{g}}({s}_{i}))]\quad {\rm{and}}\quad {b}_{i}=\frac{d({\boldsymbol{g}}({s}_{i}))}{ds},$$

where \({\ell }_{1},\ldots ,{\ell }_{n}\) are in the set of \(| \ell | \le m\). By expanding

$$D=\left(\begin{array}{l}{D}_{1}\\ \vdots \\ {D}_{M}\end{array}\right){{{\rm{and}}}}\,b=\left(\begin{array}{l}{b}_{1}\\ \vdots \\ {b}_{M}\end{array}\right),$$

we learn the causal model via the following Lasso regression, namely,

$$\mathop{\min }\limits_{w}\parallel Dw-b{\parallel }_{2}^{2}+\lambda \parallel w{\parallel }_{1},$$

where \(\parallel w{\parallel }_{1}\) enforces the sparsity.
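A minimal way to solve a Lasso problem of this form is cyclic coordinate descent with soft thresholding, sketched below in plain Python. This is an illustration under stated assumptions rather than the solver used in this work: the library matrix D and target b are a hypothetical toy problem (columns 1, x, x² and a target equal to 0.5 x), and `lasso_cd` is an illustrative name. The objective matches the unscaled form above, ‖Dw − b‖² + λ‖w‖₁, so each coordinate update thresholds at λ/2.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_cd(D, b, lam, n_iter=200):
    """Minimise ||D w - b||_2^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n_rows, n_cols = len(D), len(D[0])
    w = [0.0] * n_cols
    resid = list(b)  # residual b - D w, with w = 0 initially
    col_sq = [sum(D[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_iter):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes its own contribution.
            rho = sum(D[i][j] * resid[i] for i in range(n_rows)) + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n_rows):
                    resid[i] -= D[i][j] * delta
                w[j] = new_wj
    return w


# Toy library with columns [1, x, x^2]; the target depends on x only.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
D = [[1.0, x, x * x] for x in xs]
b = [0.5 * x for x in xs]
w = lasso_cd(D, b, lam=0.1)
print(w)
```

The constant and quadratic weights are driven exactly to zero, while the linear weight is recovered with a small shrinkage proportional to λ. Library implementations may scale the objective differently (for example, scikit-learn's `Lasso` divides the squared error by twice the number of samples), so values of λ are not directly interchangeable.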

Here we keep the polynomial degrees of all the variables in the causal model consistent and choose m = 4 with \(\lambda =1{0}^{-7}\) in (5). By performing Lasso, we find that the result is consistent with the causal model when m = 2 but different from the one with m = 1, which indicates that the optimal choice for the causal model is m = 2. The general causal model of ODEs describing the progression of AD biomarkers is then summarized below (all rights to the in-silico model belong to the authors, and it cannot be used for any commercial purpose without permission):

$$\left\{\begin{array}{l}\frac{d{A}_{\beta }}{ds}={w}_{A0}+{w}_{A1}{A}_{\beta }+{w}_{A2}{A}_{\beta }^{2};\\ \frac{d\tau }{ds}={w}_{T0}+{w}_{T1}\tau +{w}_{T2}{\tau }^{2}+{w}_{T3}{A}_{\beta }+{w}_{T4}{A}_{\beta }^{2}+{w}_{T5}{A}_{\beta }\tau ;\\ \frac{dN}{ds}={w}_{N0}+{w}_{N1}N+{w}_{N2}{N}^{2}+{w}_{N3}\tau +{w}_{N4}{\tau }^{2}+{w}_{N5}\tau N;\\ \frac{dC}{ds}={w}_{C0}+{w}_{C1}C+{w}_{C2}{C}^{2}+{w}_{C3}N+{w}_{C4}{N}^{2}+{w}_{C5}NC,\end{array}\right.$$

with an initial condition Aβ(−10) = y0 and τ(−10) = N(−10) = C(−10) = 0, where y0 is also a parameter that we consider as a small positive value to initiate the cascade.
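To make the structure of the cascade in (6) concrete, the following sketch integrates the four ODEs with a classical fourth-order Runge-Kutta scheme. The weights are hypothetical placeholders (a logistic equation for Aβ, and saturating drives for τ, N, and C), not the calibrated values, and the integrator is our choice for illustration.

```python
import math

def rhs(x, w):
    """Right-hand side of the quadratic cascade for x = (A_beta, tau, N, C)."""
    A, tau, N, C = x
    wA, wT, wN, wC = w["A"], w["T"], w["N"], w["C"]
    dA = wA[0] + wA[1] * A + wA[2] * A * A
    dT = wT[0] + wT[1] * tau + wT[2] * tau * tau + wT[3] * A + wT[4] * A * A + wT[5] * A * tau
    dN = wN[0] + wN[1] * N + wN[2] * N * N + wN[3] * tau + wN[4] * tau * tau + wN[5] * tau * N
    dC = wC[0] + wC[1] * C + wC[2] * C * C + wC[3] * N + wC[4] * N * N + wC[5] * N * C
    return [dA, dT, dN, dC]

def integrate_cascade(w, y0, s0=-10.0, s1=20.0, steps=3000):
    """Classical RK4 from s0 to s1 with A_beta(s0) = y0 and tau = N = C = 0."""
    h = (s1 - s0) / steps
    x = [y0, 0.0, 0.0, 0.0]
    trajectory = [(s0, list(x))]
    for i in range(steps):
        k1 = rhs(x, w)
        k2 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k1)], w)
        k3 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k2)], w)
        k4 = rhs([xi + h * k for xi, k in zip(x, k3)], w)
        x = [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        trajectory.append((s0 + (i + 1) * h, list(x)))
    return trajectory

# Hypothetical weights: dA/ds = A - A^2, dtau/ds = A (1 - tau),
# dN/ds = tau (1 - N), dC/ds = N (1 - C).
weights = {
    "A": [0.0, 1.0, -1.0],
    "T": [0.0, 0.0, 0.0, 1.0, 0.0, -1.0],
    "N": [0.0, 0.0, 0.0, 1.0, 0.0, -1.0],
    "C": [0.0, 0.0, 0.0, 1.0, 0.0, -1.0],
}

# Small positive initial amyloid level that starts the cascade.
y0 = 1.0 / (1.0 + math.exp(10.0))
trajectory = integrate_cascade(weights, y0)
```

With these placeholder weights each biomarker rises only after the one upstream of it has started to rise, which mirrors the ordering Aβ → τ → N → C encoded in (6).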

Population model calibration

First, we calibrate the learned causal model by using the ADNI dataset and rewrite (6) as the following population model

$$\left\{\begin{array}{lll}\frac{d{A}_{\beta }}{ds}=\mathop{\sum }\limits_{\ell = 0}^{2}{w}_{A,\ell }^{(1)}{A}_{\beta }^{\ell },\frac{d\tau }{ds}=\mathop {\sum}\limits_{| \ell | \le 2}{w}_{T,\ell }^{(1)}{A}_{\beta }^{{\ell }_{1}}{\tau }^{{\ell }_{2}},\\ \frac{dN}{ds}=\mathop {\sum}\limits_{| \ell | \le 2}{w}_{N,\ell }^{(1)}{\tau }^{{\ell }_{1}}{N}^{{\ell }_{2}},\frac{dC}{ds}=\mathop {\sum}\limits_{| \ell | \le 2}{w}_{C,\ell }^{(1)}{N}^{{\ell }_{1}}{C}^{{\ell }_{2}},\end{array}\right.$$

where \({{{\boldsymbol{w}}}}=\{{w}_{A,\ell }^{(1)},{w}_{T,\ell }^{(1)},{w}_{N,\ell }^{(1)},{w}_{C,\ell }^{(1)}\}\) denote the population parameters. We also denote f1(s) = Aβ(s), f2(s) = τ(s), f3(s) = N(s), and f4(s) = C(s) with the initial conditions f1(−10) = y0, f2(−10) = f3(−10) = f4(−10) = 0. Then the population parameters are calibrated based on the ADNI dataset by minimizing the sum of squared differences between the data and the solution of the causal model, namely

$$\mathop{\min }\limits_{{{{\boldsymbol{{w}}}_{k}}}}\mathop{\sum}\limits_{(i,j)\in {{{{\mathcal{I}}}}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{w}}}}}_{k}\right)\right)}^{2},\,\,(i,j,k)\in {{{\mathcal{I}}}}$$

where yijk is the value of the k-th biomarker for the i-th patient at the j-th visit and \({{{{\mathcal{I}}}}}_{k}\) is the set of (i, j) for the k-th biomarker.

Since the biomarkers for each patient generally increase or decrease monotonically, we consider fitting the DPS as a least-squares linear regression problem, namely,

$$\mathop{\min }\limits_{{\alpha }_{i},{\beta }_{i}}\mathop{\sum}\limits_{(j,k)\in {{{{\mathcal{I}}}}}_{i}}\frac{1}{{\sigma }_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{w}}}}}_{k}\right)\right)}^{2},$$

where \({{{{\mathcal{I}}}}}_{i}\) is the set of (j, k) for the i-th patient and σk is the normalized sum of squared errors with respect to biomarker k, namely,

$${\sigma }_{k}=\frac{1}{| {{{{\mathcal{I}}}}}_{k}-2I-4| }\mathop{\sum}\limits_{(i,j)\in {{{{\mathcal{I}}}}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{w}}}}}_{k}\right)\right)}^{2}.$$

The detailed procedure to fit the parameters is shown in Algorithm 1. The optimization solver employs the Levenberg-Marquardt method34, which can avoid getting stuck in a local minimum.
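The per-patient step can be sketched as follows for a single biomarker, in which case the weight 1/σk is a constant factor and is dropped. This is a simplified Levenberg-Marquardt iteration written for illustration under stated assumptions, not a reproduction of Algorithm 1: the population curve is a hypothetical logistic sigmoid, and the visit times, initial guess, and damping schedule are all assumptions.

```python
import math

def curve(s):
    """Hypothetical population curve f_k(s); a logistic sigmoid stands in here."""
    return 1.0 / (1.0 + math.exp(-s))

def curve_slope(s):
    """Derivative of the stand-in curve with respect to s."""
    f = curve(s)
    return f * (1.0 - f)

def cost(alpha, beta, times, values):
    """Sum of squared residuals y_j - f(alpha t_j + beta)."""
    return sum((y - curve(alpha * t + beta)) ** 2 for t, y in zip(times, values))

def fit_dps(times, values, alpha=1.0, beta=0.0, iterations=200):
    """Estimate (alpha, beta) with a basic Levenberg-Marquardt loop."""
    mu = 1e-3  # damping parameter
    current = cost(alpha, beta, times, values)
    for _ in range(iterations):
        # Accumulate J^T J and J^T r, where J holds df/dalpha and df/dbeta.
        haa = hab = hbb = ga = gb = 0.0
        for t, y in zip(times, values):
            s = alpha * t + beta
            r = y - curve(s)
            ja, jb = curve_slope(s) * t, curve_slope(s)
            haa += ja * ja
            hab += ja * jb
            hbb += jb * jb
            ga += ja * r
            gb += jb * r
        # Solve the damped 2x2 system (J^T J + mu I) delta = J^T r.
        a11, a22 = haa + mu, hbb + mu
        det = a11 * a22 - hab * hab
        d_alpha = (a22 * ga - hab * gb) / det
        d_beta = (a11 * gb - hab * ga) / det
        trial = cost(alpha + d_alpha, beta + d_beta, times, values)
        if trial < current:
            # Accept the step and relax the damping.
            alpha, beta, current = alpha + d_alpha, beta + d_beta, trial
            mu = max(mu / 10.0, 1e-12)
        else:
            # Reject the step and increase the damping.
            mu *= 10.0
            if mu > 1e12:
                break
    return alpha, beta

# Synthetic, noise-free visits generated from alpha = 1.5, beta = -2.
visit_times = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [curve(1.5 * t - 2.0) for t in visit_times]
alpha_hat, beta_hat = fit_dps(visit_times, observed)
```

With several biomarkers, each residual would be scaled by the corresponding 1/σk before accumulation, as in (9).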

Sensitivity analysis

We assume that the parameters in the population model, \({{{{\boldsymbol{w}}}}}^{(1)}=[{w}_{A0}^{(1)},\,{w}_{A1}^{(1)},\,\cdots \,,\,{w}_{m}^{(1)},\,\cdots \,,\,{w}_{C4}^{(1)},\,{w}_{C5}^{(1)}]\in {{\mathbb{R}}}^{21}\), are independent and identically distributed inputs, where m is the index of inputs. In the sensitivity analysis below, we omit the superscript on the parameters for simplicity. The range of each input is 90–110% of its value shown in Table 1.

Then we perform Sobol sensitivity analysis, which is also called variance-based sensitivity analysis and is developed from the analysis of variance. As a global sensitivity analysis method, it analyzes the effects of each input by decomposing the variance of the output of the population model into fractions attributed to the inputs. In this paper, we perform both first-order and second-order sensitivity analyses of the parameters. In particular, the first-order sensitivity index measures the contribution of a single input to the variance of the output and is calculated by:

$${{{{\rm{S}}}}}_{m}(y)=\frac{{{{{\rm{Var}}}}}_{{w}_{m}}\left[{{{{\rm{E}}}}}_{{{{{\rm{w}}}}}_{ \sim {{{\rm{m}}}}}}(y| {w}_{m})\right]}{{{{\rm{Var}}}}(y)},$$

where \({w}_{ \sim m}=\left[{w}_{A0},\,\cdots \,,\,{w}_{m-1},\,{w}_{m+1},\,\cdots \,,\,{w}_{C5}\right]\) includes all inputs except wm. Next, the second-order sensitivity with respect to inputs m and n is measured by summing their first-order effects and the second-order interaction between inputs m and n:

$${{{{\rm{S}}}}}_{(m,n)}(y)={{{{\rm{S}}}}}_{m}(y)+{{{{\rm{S}}}}}_{n}(y)+\frac{{{{{\rm{Var}}}}}_{({w}_{m},{w}_{n})}\left[{{{{\rm{E}}}}}_{{{{{\rm{w}}}}}_{ \sim {{{\rm{m}}}},{{{\rm{n}}}}}}(y| {w}_{m},{w}_{n})\right]}{{{{\rm{Var}}}}(y)}.$$

Then we measure the total-order sensitivity index, which attributes the variance of the output to the first-order effect of an input, its second-order effects, and all higher-order effects:

$${{{{\rm{S}}}}}_{T,m}(y)=1-\frac{{{{{\rm{Var}}}}}_{{{{\rm{{w}}}_{ \sim m}}}}\left[{{{{\rm{E}}}}}_{{w}_{m}}(y| {{{\rm{{w}}}_{ \sim m}}})\right]}{{{{\rm{Var}}}}(y)}.$$
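The first-order and total-order indices can be estimated by Monte Carlo sampling. The sketch below uses the Saltelli estimator for the first-order index and the Jansen estimator for the total-order index, which are common choices but are assumptions here, since the text does not state which estimators were used. The toy model y = w1 + 2 w2 and the nominal values are hypothetical; for this additive model the exact indices are 0.2 and 0.8, and the total-order indices equal the first-order ones.

```python
import random

def sobol_indices(model, nominal, n_samples=20000, spread=0.1, seed=0):
    """Estimate first-order and total-order Sobol indices for each input.

    Inputs are sampled independently and uniformly within +/- spread of
    their nominal values (90-110% for spread = 0.1).
    """
    rng = random.Random(seed)
    dim = len(nominal)

    def draw():
        return [v * (1.0 + spread * (2.0 * rng.random() - 1.0)) for v in nominal]

    A = [draw() for _ in range(n_samples)]
    B = [draw() for _ in range(n_samples)]
    yA = [model(row) for row in A]
    yB = [model(row) for row in B]

    # Center the outputs; this leaves the indices unchanged but lowers estimator noise.
    mean = (sum(yA) + sum(yB)) / (2 * n_samples)
    yA = [y - mean for y in yA]
    yB = [y - mean for y in yB]
    variance = (sum(y * y for y in yA) + sum(y * y for y in yB)) / (2 * n_samples)

    first_order, total_order = [], []
    for m in range(dim):
        # Matrix A with column m taken from B.
        yAB = []
        for a_row, b_row in zip(A, B):
            mixed = list(a_row)
            mixed[m] = b_row[m]
            yAB.append(model(mixed) - mean)
        s_first = sum(yb * (yab - ya) for ya, yb, yab in zip(yA, yB, yAB)) / n_samples
        s_total = sum((ya - yab) ** 2 for ya, yab in zip(yA, yAB)) / (2 * n_samples)
        first_order.append(s_first / variance)
        total_order.append(s_total / variance)
    return first_order, total_order

def toy_model(w):
    """Hypothetical additive model used only to check the estimators."""
    return w[0] + 2.0 * w[1]

first_order, total_order = sobol_indices(toy_model, nominal=[1.0, 1.0])
```

In the setting of this section, the model would map the 21 population parameters to a scalar summary of the simulated biomarker trajectories, and the cost grows with the number of inputs because each input requires its own set of model evaluations.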

When the sensitivity value is positive, the corresponding parameter is positively correlated with the model output. If the value is negative, they are negatively correlated. The absolute value of parameter sensitivities represents the degree of influence on the model output. If the sensitivity value is closer to 0, changing this parameter will have less influence on the model output. Based on the sensitivity values and the number of biomarker measurements, we determine the personalized parameters to fit the longitudinal data points for each patient and keep the remaining parameters the same as the population parameter values. This can avoid overfitting when providing the personalized prediction for each subject.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.