Early role of vascular dysregulation on late-onset Alzheimer's disease based on multifactorial data-driven analysis

Multifactorial mechanisms underlying late-onset Alzheimer's disease (LOAD) are poorly characterized from an integrative perspective. Here spatiotemporal alterations in brain amyloid-β deposition, metabolism, vascular, functional activity at rest, structural properties, cognitive integrity and peripheral proteins levels are characterized in relation to LOAD progression. We analyse over 7,700 brain images and tens of plasma and cerebrospinal fluid biomarkers from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Through a multifactorial data-driven analysis, we obtain dynamic LOAD–abnormality indices for all biomarkers, and a tentative temporal ordering of disease progression. Imaging results suggest that intra-brain vascular dysregulation is an early pathological event during disease development. Cognitive decline is noticeable from initial LOAD stages, suggesting early memory deficit associated with the primary disease factors. High abnormality levels are also observed for specific proteins associated with the vascular system's integrity. Although still subjected to the sensitivity of the algorithms and biomarkers employed, our results might contribute to the development of preventive therapeutic interventions.

L ate-onset Alzheimer's disease (LOAD), the most common form of human dementia, is not causally associated with any unique neuropathological mechanism but rather with multiple concomitant factors. The high complexity of the mechanisms underlying the disease and the current lack of quantitative integrative models comparing them make our understanding of LOAD outcome/progression and the development of effective disease-modifying therapeutic agents difficult.
Historically, different hypotheses for the origin of the disease have been proposed and the most consistent are still the subject of scientific debate [1][2][3] . The vascular dysregulation hypothesis, dating from the early 1900s, proposes alterations to the balance between the blood flow substrate delivery and the neuronal/glial energy demands, which lead to brain dysfunction and disease [3][4][5] . Alternatively, amyloid-b (Ab) and tau misfolded proteins are thought to have a causal role on the cascade of cognitive/clinical events leading to LOAD 2,6 . The metabolic dysregulation hypothesis postulates impaired compensatory mechanisms associated with neuronal/glial energy production 7 . More recently, neuronal activity-dependent degeneration mechanisms have been postulated to explain the pathology as a consequence of neuronal/synaptic hyperactivity that expands a 'toxic' effect on surrounding connected neurons/synapses 1,8 . Tissular neurodegeneration and associated grey matter atrophy are other common hallmarks of disease progression, although their causes and roles are not totally understood, being generally thought of as a consequence of previous neuropathological factors.
Despite their importance, models that define a multifactorial LOAD pathogenesis [9][10][11][12][13][14] have generally been based on limited data that do not cover the multiplicity of possible biological factors that influence disease progression. For instance, in Jack et al. 10 , one of the most cited models of LOAD progression, the vascular dysregulation and the functional impairment components are ignored, even when these factors were historically and consistently associated with the disease's underlying mechanisms 1,[3][4][5]8 . Motivated by this lack of an integrative LOAD description, here we propose a multifactorial data-driven analysis (MFDDA) approach, wherein alteration levels of Ab misfolded proteins, metabolism, vascular regulation, functional activity at rest, structural tissue properties and protein levels are spatiotemporally characterized in relation to LOAD progression.
We analysed over 7,700 multimodality brain images and tens of different plasma and cerebrospinal fluid (CSF) biomarkers from 1,171 healthy and diseased subjects. Comparing the characteristic trajectory of each imaging or biospecimen biomarker in pathologic versus healthy aging, our data-driven approach revealed a multifactorial temporal ordering of disease progression. According to this ordering, and under the assumption that the analysed biomarkers represent specific physiological processes, vascular dysregulation might be the earliest/strongest brain pathologic factor associated with LOAD development, followed in order by Ab deposition, glucose metabolism dysregulation, functional impairment, and grey matter atrophy. Symptoms of cognitive decline were observed from initial LOAD stages, suggesting a continuous memory deterioration caused by subtle pathological alterations in the primary disease factors (for example, vascular/metabolic dysregulation and Ab effects). Our plasma and CSF results suggest the presence of early peripheral vascular alterations in the disease, and reveal new evidence about inflammatory activation, insulin resistance and associated lipid metabolism dysfunction. Finally, we highlight current limitations and challenges associated with the multifactorial modelling of LOAD. In addition to improving our understanding of LOAD, the methodology proposed in this study could be used for the analysis of other devastating human neurodegenerative disorders.

Results
Capturing abnormal biomarker trajectories in unhealthy aging. We evaluated Ab misfolded proteins, glucose metabolism, cerebral blood flow, functional activity and/or structural tissue brain patterns in a cohort of 1,171 subjects from the ADNI database (Methods section, Study participants; Supplementary Table 1). These five biological factors were mapped in vivo using corresponding neuroimaging techniques ( Fig. 1a; Methods section, Data Description and Processing): Florbetapir positron emission tomography (PET; for Ab deposition), Fluorodeoxyglucose PET (for glucose metabolism), Arterial Spin Labeling (ASL, for cerebral blood flow), resting functional magnetic resonance imaging (MRI; for neuronal activity at rest) and structural MRI (for structural tissular properties). Each participant was previously diagnosed at each visit as healthy control (HC), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI) or probable Alzheimer's disease patient (LOAD). In addition, participants were cognitively and genetically characterized (for example, according to the Mini Mental State Examination (MMSE) or to the number of apoee4 allele copies, respectively). See Supplementary Table 1 for a detailed sample description across all data modalities. For each mentioned biological factor, representative regional values were calculated for 78 regions covering all the grey matter 15 . See Methods section (image processing subsections) for a description of evaluated multimodality imaging measurements.
We proceeded to reconstruct the characteristic trajectory of each biological factor at each brain region during healthy or unhealthy aging. For this, different aging-mediated disease trajectories were generated using a generative spatiotemporal model ( Fig. 1b- (Fig. 1c). Next, spatiotemporal abnormality trajectories for each specific biomarker were obtained comparing the mean characteristic curve for each diseased clinical transition with the corresponding curve for the healthy aging transition (Fig. 2a-e; MFDDA, Methods section). In addition, these trajectories were used to calculate a total LOAD-abnormality index for each biological factor and brain region, that is, the normalized area under the obtained abnormality curve (MFDDA, Methods section).
This generative procedure was repeated 500 times via a bootstrapping technique, which improved the robustness of the estimations and allowed to control the stability of the results. Finally, each factor-regional trajectory and associated abnormality scores were calculated as the mean of all the bootstrap outcomes. Similarly, aging/disease characteristic trajectories and corresponding abnormality trajectories/scores were generated for MMSE, included as a measure of symptoms severity 13 , and for 146 plasma and 87 CSF potential biomarkers (Supplementary  Tables 2 and 3). For further details see Fig. 1, Methods section (MFDDA subsection) and Supplementary Note 1.

Multifactorial biomarker ordering in LOAD progression.
Identification of the sequence of pathological events underlying LOAD progression is still a major challenge (Fig. 2). In the last decade, different models have been proposed [9][10][11][12][13][14]16 . These studies have contributed to the understanding of the ordering in biomarker abnormalities associated with LOAD, using different observational 9,10 or data-driven perspectives [11][12][13][14] . In addition to being less sensitive to subjective criteria, data-driven models present the advantage of being directly applicable to different diseases. For example, in refs 11,12 a probabilistic event-based   Figure 1 | Representation of the multifactorial data-driven generative approach. (a) Brain multimodality images and plasma/CSF biomarkers. (b) Regional patterns for Ab deposition across the entire sample. (c) Reconstructed regional Ab characteristic trajectories for HC to LOAD (left) and HC to HC (right) clinical transitions, over a 30-year aging period. (d) Regional (left) and total (right) Ab abnormality trajectories during the age-mediated clinical transitions. model is applied to Alzheimer's and Huntington's cohorts, providing disease-specific pathologic event orderings and individual disease states, without the assumption of a priori event ordering or requiring an initial grouping of the patients into clinical stages.
However, in general, previous models of LOAD progression considered an insufficient number of interrelated neuropathological factors and/or brain areas. For example, it is common to find multimodal analyses of grey matter atrophy, Ab deposition and/or functional impairment, in which vascular dysregulation is not included (see refs 1,9,10,16). In other studies 14,17 , conclusions have been based only on the observation of specific brain areas, those presumably more affected in advanced LOAD states (for example, hippocampal, ventricles and entorhinal regions). In refs 11,12, the analyses were limited to structural, cognitive, Ab classification in negative and positive subjects, and/or a few peripheral protein biomarkers (CSF Ab 1-42 , tau and phosphorylated tau (ptau)); brain vascular, functional and metabolic components, as well as other relevant peripheral protein biomarkers, were not considered. See the Discussion section for further description of previous models. Motivated by this lack of integrative LOAD models, here we aimed to identify a comprehensive multifactorial biomarker ordering in LOAD progression based on the spatiotemporal abnormality levels obtained previously for the whole HC to LOAD clinical transition.
First, we did a clinical pairwise comparison between all imaging biomarkers, based on their reconstructed spatiotemporal abnormalities. For each pair of factors (imaging modalities), and for each brain region and time point, a value of 1 was assigned to the factor with the higher abnormality value. This comparison was repeated across all brain regions and time points, and the results were summarized in a 5 Â 5 hierarchical matrix (Fig. 3a). Each element i,j (i,j ¼ 1..5) of this hierarchical matrix reflects the percentage of regions and time points at which the imaging modality j exceeded in abnormality magnitude the modality i during the HC to LOAD clinical transition. The columns of the matrix were reordered keeping from left to right the factors predominating in effect levels. We observed a remarkable predominance of the vascular dysregulation component over the other pathologic biomarkers (Fig. 3a). In total, the vascular factor was B80% more abnormal across all brain regions and time points than were the other factors considered. It was followed in spatiotemporal abnormality levels by Ab deposition, metabolic dysfunction, functional impairment and grey matter atrophy.
Next, to create and compare factor-specific abnormality curves during LOAD development, for each biological factor we calculated the average abnormality curve across all brain regions and, after normalizing by the maximum abnormality value, depicted together the final average curves (Fig. 3b). In the averaging calculation, each region's multifactorial abnormality curves were weighted according to the region's relevance during the pathological progression. For this, we assumed the sum of each region's abnormality levels across all biological factors to be a local multifactorial measure of vulnerability to the disease (Fig. 4). With the purpose of also analysing symptoms severity and peripheral protein alterations as a function of disease progression, we included in our multifactorial analysis the abnormality trajectory obtained for MMSE and for three commonly referenced CSF proteins (Ab 1-42 , tau and ptau 10 ). Again, we observed ( Fig. 3b) a higher abnormality magnitude for the vascular component, which exceeded the alterations in the other factors. Consistent with the previous hierarchical results (Fig. 3a), the vascular dysregulation was followed in abnormality magnitude by Ab deposition, metabolic dysfunction, functional impairment and grey matter atrophy. We noticed similar abnormality levels for Ab deposition, glucose metabolism and neuronal function at the early stages of the disease. However, these three factors diverged in abnormality levels with disease progression, explaining the global differences observed in the hierarchical matrix (Fig. 3a), and coinciding also with a 'slower' but consistent increase in structural atrophy. Around the last phase of the LMCI period, the structural atrophy becomes more abnormal (in terms of biomarker distance to the healthy state) than the functional impairment. In addition, we observed symptoms of memory impairment from very early disease stages. Contrary to what previous observational models proposed 9,10 , alterations in memory preceded the abnormalities observed for different molecular biomarkers (for example, CSF Ab 1-42 , tau and ptau proteins). This suggests that cognitive decline associated with LOAD is not a final output of large brain changes, but a continuous consequence of subtle pathological alterations in primary disease factors (for example, vascular dysregulation and Ab effects).
See Discussion section for further biological interpretation of these results.
Peripheral vascular and inflammatory alterations. Proteins execute central functions in living organisms and their peripheral concentrations/interactions are strongly associated with individual health conditions. This makes the analysis of peripheral protein dynamics a crucial step towards understanding the biological mechanisms underlying aging and associated neurodegenerative diseases. Particularly, peripheral plasma and CSF protein measurements have been suggested as promising biomarkers of pre-symptomatic pathological processes underlying LOAD 18,19 . Here similar to the imaging biomarkers, we aimed to explore possible abnormalities in plasma and CSF proteins associated with LOAD progression. For this, 146 plasma and 87 CSF protein biomarkers were analysed and sorted according to their obtained LOAD-abnormality indices ( Fig. 5; Supplementary  Tables 2 and 3).
Heart-type fatty acid-binding protein (hFABP) was identified as the most abnormal CSF biospecimen ( Fig. 5a,b, Supplementary Table 3). CSF hFABP levels are known to be significantly altered in LOAD patients 20 , having a high predictive power of the progression from MCI to LOAD states 20,21 . The CSF hFABP level is significantly associated with longitudinal atrophy of the entorhinal cortex and other LOAD-vulnerable neuroanatomical regions 22 , and is also considered a sensitive biomarker of specific cardiovascular disorders 23 . Cortisol and Apolipoprotein A (Apo A) were identified as the other most abnormal CSF biospecimens (Fig. 5a). Cortisol is a relevant risk factor for stress, glucose and cardiovascular dysregulation 24 , which has been strongly linked to early phases of LOAD progression and to the hyperactivity of the hypothalamic-pituitary-adrenal axis 25 . Apo A is a high-density lipoprotein with a central role in lipid metabolism. Peripheral Apo A concentration is strongly associated with the integrity of the vascular system and the risk of developing cardiovascular disorders 26 . In addition, we observed high abnormality levels for other CSF measurements previously associated with LOAD progression, for example, tau, ptau, ferritin and Ab 1-42 (scored in the total LOAD-abnormality positions 4, 6, 10 and 13, respectively, out of all of the 87 Element i,j represents the total percentage of regions and time points at which the biological factor j is more abnormal than is the factor i. (b) Multifactorial temporal ordering in disease progression, based on the factor-specific abnormality trajectories (temporal abnormalities averaged across all brain regions), memory deficit and three CSF biomarkers (Ab 1 À 42 , tau and ptau). All of the results were calculated for the HC to LOAD clinical transition. Dotted lines indicate 95 % confidence intervals, reflecting the uncertainties associated to the estimated mean trajectories, and obtained with 500 bootstrapping resamples. Inset figure provides detail of the trajectories obtained for early states of the disease (HC to EMCI transition). Note how in the initial states the vascular component is separating from the other components, while Ab, metabolism and functional dysregulation remain close, with a notable overlap among their confidence intervals, until more advanced pathological states. See Supplementary Fig. 2 for equivalent results obtained evaluating the model assuming a sigmoid (instead of linear) relationship between age and disease state, respectively.
considered CSF biomarkers). However, and contrary to what has been suggested by previous LOAD models 9,10 ( Fig. 6a), alterations observed in Ab 1-42 proteins were considerably lower than those observed for other CSF proteins (for example, hFBAP, cortisol and Apo A, with approximately a twofold higher average abnormality level for these latest descriptors). This effect was consistent from early to advanced disease states, which might suggest that hFBAP, cortisol and Apo A protein levels in the CSF could be earlier LOAD biomarkers than Ab 1-42 concentration. Among all of the studied plasma biospecimens, interferon-ginduced protein 10 (IP-10) presented the highest abnormality levels (Figs 5a and 6b, Supplementary Table 2). Alterations in plasma IP-10 reflect peripheral inflammation processes, which are a characteristic feature in aging and associated neurodegenerative disorders 27 . Among other functions, IP-10 is a strong modulator of angiogenesis 28 , which has a key role in poor vascularization and abnormal vasculature disorders 29 . IP-10 was followed in total abnormality levels by pregnancy-associated plasma protein A (PAPP-A), a predictor of adverse vascular events, including high risk of heart infarction 30 . Total and intact proinsulin followed IP-10 and PAPP-A in plasma abnormality levels. Proinsulin is the main precursor of insulin (scored at position 10 out of all the 146 plasma biomarkers). The consistent alterations of insulin and associated proteins in LOAD, and the presence of common cellular responses and pathogenesis, have motivated the classification of this disorder as a form of type III diabetes 31 . Peripheral insulin is suggested to enter the brain via a saturation mechanism involving the blood-brain barrier (BBB) 32 . Alterations in BBB permeability, which recently have been observed at early stages of LOAD 33 , might be associated with alterations in brain insulin resistance 34 . Moreover, peripheral and brain insulin alterations may alter the BBB transport of amino acids and drugs 32 , as well as induce changes in brain glucose, Ab and ptau regulations 35 . Glutathione S-transferase alpha and plasma matrix metalloproteinase 1 (MMP1) proteins were also identified with high abnormality levels. Glutathione S-transferase alpha alterations are strongly associated with oxidative stress 36 , which is caused by the age-dependent imbalance between the generation and detoxification of reactive oxygen and nitrogen species 37 . Among other relevant pathogenic functions, oxidative stress constitutes a regular pathway for different brain mechanisms leading to BBB dysfunction 38 . Brain MMP1 concentrations have been found to be significantly elevated in LOAD subjects 39 . Matrix metalloproteinase alterations are thought to be linked to neuroinflammatory processes 40,41 and BBB dysfunction 39,41 . In summary, changes of these plasma biomarkers suggest an early alteration of the peripheral vascular system during LOAD progression, as well as allude to other relevant pathologic mechanisms (for example, inflammatory hyperactivation).

Discussion
Neurodegenerative disorders are the consequence of agingassociated multifactorial biological dysfunction. Sophisticated modelling of spatiotemporal abnormalities associated with LOAD progression is a crucial step towards the understanding of the pathological mechanisms underlying this disease, and possibly contributing to the development of effective, disease-modifying therapies. Here we reconstructed LOAD-abnormality trajectories for multiple in vivo brain and peripheral biological descriptors. Under the assumption that these biomarkers represent specific physiological processes, we obtained, for the first time to our knowledge, an integrative data-driven model of LOAD progression. In general, our results suggest the role of vascular dysregulation on the early cascade of events associated with the disease's progression (Figs 3-6, Supplementary Tables 2 and 3).
Due to the increasing high prevalence of LOAD and other neurodegenerative diseases, it is imperative to define realistic biomarker-based models of disease progression 42 . Compared with traditional observational disease models 9,10 (see Fig. 6 for a comparison with our results), data-driven models are less susceptible to subjective expert criteria, as the conclusions are 3.5 Figure 4 | Regional total abnormality levels associated with LOAD progression. Brain regions were sorted from maximum to minimum total effect values, to illustrate their multifactorial damage. Note the across-brain consistent change in the vascular component, which is considerably less prominent for other factors (for example, functional and structural alterations).
dictated by real data values [11][12][13][14]17 . Also, and importantly, datadriven analyses allow for adding and interrelating large amounts of diverse/complementary data (see refs 14,17). For example, in this study we summarized in a comprehensive framework the information contained in five different brain imaging modalities and tens of peripheral biospecimens. Previous generative models of LOAD progression were notably valuable for understanding the underlying pathological mechanisms [11][12][13]43,44 . These models offered spatiotemporal descriptions of the amyloid, metabolic and/or structural changes associated with LOAD. Here we  extended such models to capture disease-related changes in other relevant imaging (that is, vascular, functional) and protein (that is, plasma, CSF) descriptors. This broad extension allowed us to obtain an integrative multifactorial description of the disease's progression, characterizing significant pathological alterations from the molecular to the macroscopic scales. A quantitative comparison between the mentioned data-driven models [11][12][13][14]17 is still missing. Due to its importance for recognizing/summarizing methodological differences and to reach a deeper consensus about LOAD progression, it will be among the objectives of our future research.
The comparison between unhealthy and healthy aging and the generation of temporal abnormality trajectories are crucial steps in our progression analysis. First, time plays a central role in the causal cascade of events that contribute to adverse clinical symptoms. Also, progressive neurodegenerative diseases can temporally coexist with non-pathological aging effects, making age a major confounding factor in biomarker examinations 45 . Findings obtained at a given cross-sectional point do not necessarily represent the dynamics of abnormal events associated with a large pathological progression, neither are they associated with time-dependent multifactorial pathological interrelations. Thus, in addition to its multifactorial attribute, the temporal aspect of the proposed generative analysis implies key advantages over traditional neurodegenerative cross-sectional studies. Methodologically, this study provides important contributions towards the integration of multi-modal data sets (for example, bridging neuroimaging and molecular fields) and its application in the study of aging and neurodegenerative disorders.
The causal role of vascular dysregulation on LOAD has been suggested from the beginning of the 20th century (for a recent review, see ref. 3). Although largely undervalued under the rise of other recent hypotheses (for example, the amyloid cascade hypothesis), a growing body of evidence supports the idea that vascular dysregulation is a major risk factor for LOAD development [3][4][5] . For instance, a significant age-dependent BBB permeability breakdown, that correlates with cognitive dysfunction, has been observed in the human hippocampus 33 .
This aging effect is thought to have a key impact on BBB-mediated misfolded protein clearance and deposition, and consequently on associated misfolded protein toxic effects. In line with this, our results (Figs 3-6, Supplementary Tables 2 and 3) suggest that vascular dysregulation is an early pathological event during disease development, followed in biomarker changing levels by Ab deposition, metabolic dysfunction, functional impairment and structural atrophy. Although our MFDDA does not reveal causal pathologic interactions, concordant evidence suggests that in LOAD Ab deposition is mainly caused by a deficiency in the Ab clearance system rather than by an Ab overproduction 43,46 , whereas Ab clearance is associated with the vascular system's integrity [47][48][49] . Ab efflux across the BBB sequestrates around 60% of the brain's Ab proteins 48 . At the same time, Ab has vascular destructive activity, making the cerebrovasculature a primary target for Ab toxicity 50 . Also, Ab is thought to have a negative impact on mitochondrial function, which consequently may increase reactive oxygen species (ROS) production and reduce mitochondrial Ab clearance, in a continuous feed-forward mechanism 51 . While a decrease in energy availability may affect cellular activity and lead to brain functional impairment, the neurodegenerative progression may decrease glucose metabolism because of the reduced synaptic energy needs 52 . All together, vascular dysregulation, Ab toxicity, failure in cellular energy demands (that is, hypometabolism) and functional toxicity may cause neuronal/glial cell death (that is, structural atrophy) and cognitive decline in a continuous degenerative cycle.
Importantly, multifactorial pathological interactions are not restricted to local/regional levels. From the microscopic to the macroscopic scales, multimodal brain connections can also be a conduit for disease spreading mechanisms (for review see ref. 52). In addition to contributing to the intercellular transference of factor-specific abnormalities (e.g. propagation of neuronal/ synaptic toxicity across anatomical/functional connections), strong relationships persist among the different forms of brain connectivity. For example, the vascular system supplies oxygen, glucose and other nutrients, and clears away deoxygenated blood and metabolic products, having a direct impact on the    Supplementary Tables 2 and 3). Confidence intervals were omitted for visual clarity. Crucial inter-model differences are: (1) the absence of a vascular component in a and the subsequent assumption that Ab measurements are the earliest biomarkers, whereas in b the vascular dysfunction is the earliest/stronger altered event, followed by Ab deposition; (2) CSF Ab 42 and tau are proposed in a as the two major proteinopathies underlying LOAD, with higher sensitivity to disease progression than the metabolic/structural and memory biomarkers, however our results suggest that these proteins were not the strongest altered CSF proteins during disease progression (for example, plasma IP-10, PAPP-A and total proinsulin, and CSF hFABP, cortisol and Apo A, showed higher sensitivity) while imaging and memory biomarkers appeared consistently as earlier biomarkers (see Results section, and Supplementary Tables 2 and 3); (3) in a, abnormalities in cognitive decline are only detectable at advanced abnormality levels for the considered biological biomarkers. In contrast, in b, alterations in cognition are observable in parallel with changes in the primary disease factors (for example, vascular/metabolic dysfunction and Ab deposition) and exceed in magnitude alterations observed for CSF Ab 1 À 42 , tau and ptau.
brain's functional/metabolic activity 53 . However, in a feedback relationship, drastic changes in functional/metabolic activities can notably modulate brain vascular networks 54 . This connectivitymediated effect can explain spatial mismatches between factor abnormalities observed during disease progression. For example, while regional metabolic alterations in LOAD have been found to follow Ab deposition in many brain regions 55 , the remaining spatial mismatch between hypometabolism and Ab binding could be explained by functional connections to Ab deposition areas 56,57 .
In addition to highlighting possible peripheral vascular and inflammatory alterations during LOAD progression, our plasma and CSF analysis suggests possible mechanisms contributing to such dysregulations. Peripheral insulin resistance, inflammatory and lipid/fatty acid metabolism alterations, supported by the observed high proinsulin, IP-10, hFABP and Apo A abnormalities, were some of the main pathological mechanisms suggested by the proteomics findings. Some of these have been previously associated with vascular/metabolic integrity and neurodegenerative progression 3,31,34,35 . Potentially, they might be reflecting the cascade of multifactorial pathological events conducive to LOAD. However, as discussed below, we should be cautious about the interpretation of these findings, since our analysis does not reveal direct causal relationships among the considered biomarkers, and consequently, neither among their corresponding biological factors.
In general, these results could contribute to the development of efficient, cost-effective, therapeutic interventions. The proposed MFDDA could be employed to enrich clinical trial populations by allowing for selective enrolment of subjects at a particular pathological disease transition. In this sense, the availability of characteristic abnormality trajectories for different biomarkers can help to evaluate if a given subject with a specific demographic profile is closest (in terms of biomarkers abnormalities) to one (for example, HC to EMCI) or another possible (for example, LMCI to LOAD) clinical transition. The multivariate distances to the reconstructed characteristic trajectories could represent useful quantitative indices of individual disease transitions. Moreover, and importantly, this model could be employed as an accurate quantitative descriptor of drug response, characterizing and predicting future deviations from characteristic disease trajectories generated with and without considering drug effects. The study of patients under a specific medication could allow to obtain biomarker trajectories as a function of age, disease state and medication levels. When a new patient would be analysed, consistent deviations from a specific trajectory and/or closeness to others could be reflecting how well that particular patient is responding to therapy.
In line with previous models of LOAD progression 9-13 , a strong assumption in our study is that the analysed biomarkers precisely reflect specific pathophysiological processes. Qualitative and data-driven models depend on how realistically the available observations represent the underlying biological processes. Although grey matter density and atrophy measurements, obtained with structural MRI techniques, are commonly used to characterize structural brain properties, results and interpretations depend on how accurately the used MRI techniques reflect the real tissue properties and also under which spatial scales these measurements are precise. Similarly, current ASL and PET techniques still offer a limited characterization of the vascular and metabolic/Ab brain properties. Consequently, it is important to exert caution about the observed biomarkers ordering with disease progression. Although the obtained abnormality trajectories may be reflecting a tentative ordering in which pathophysiological events occur, our results should be interpreted more in terms of biomarker sensitivity to disease progression than in terms of causal pathologic interactions conducive of LOAD. In addition, here structural alterations were only evaluated in the grey matter, ignoring possible alterations within the white matter and in its associated structural connectivity patterns. This will be the main focus of a separate study, for which we are combining structural T1 atrophy and diffusion-weighted connectivity measurements 58,59 . Another potential limitation of our study is that all evaluations were performed within a linear regression framework. This could mean that the obtained results are mainly reflecting the linear tendencies in the analysed biomarkers. The alternative use of non-linear modelling techniques (for example, radial basis and kernel functions) may provide a solution to overcome this particular limitation. Similarly, the assumption of an explicit analytic expression associated with each biomarker (for example, equation 3, Methods section) is a limitation in line with some previous data-driven models 13,14,17 . An event-based perspective 11,12 presents the advantage of not assuming any a priori biomarkers shape. However, the latest models demand a high computational cost to test exhaustively all the possible combinations in events ordering, which can make it difficult to apply such perspectives to a high number of multifactorial biomarkers. Finally, and similarly to the previous models 9-14 , here we are not addressing the issue that initial small changes/ alterations in specific biological factors could potentially cause large alterations in other interconnected factors. This traditional limitation suggests the need to study disease progression not only in terms of alteration levels of specific biomarkers, but also through the analysis of the multifactorial causal pathological interactions that take place at the different spatiotemporal scales. Causal analyses could potentially lead us to a more integrative understanding of neurodegenerative progression, and will form the central purpose of our future research.

Methods
Ethics statement. The study was conducted according to Good Clinical Practice guidelines, the Declaration of Helsinki Principles, US 21CFR Part 50-Protection of Human Subjects, and Part 56-Institutional Review Boards, and pursuant to state and federal HIPAA regulations (adni.loni.usc.edu). Study subjects and/or authorized representatives gave written informed consent at the time of enrolment for sample collection and completed questionnaires approved by each participating sites Institutional Review Board. The authors obtained approval from the ADNI Data Sharing and Publications Committee for data use and publication, see documents http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ ADNI_Data_Use_Agreement.pdf and http://adni.loni.usc.edu/wp-content/ uploads/how_to_apply/ADNI_Manuscript_Citations.pdf, respectively.
MFDDA of LOAD progression. Here we considered that LOAD progression can be characterized by a set of N biological and behavioural descriptors/biomarkers, where each descriptor i (i ¼ 1..N) is uniquely described by its temporal abnormality level (D HC!LOAD i ) and a total abnormality index (Ab HC!LOAD i ). As aging is a major risk factor for LOAD, the dynamic behaviour of the abnormality level should depend on the statistical distance between an unhealthy (LOAD associated) and a healthy aging: where y i is the spatiotemporal function describing the dynamics of the biomarker i during aging, including or not the influence of pathological factors. In addition to age and clinical state, we considered gender (gen), educational level (edu) and apoe-e4 genotype (apoee4) as other aging and disease relevant factors. Note that the inclusion of these variables here should not be considered exclusive of other possible risk factors in posterior LOAD studies (for example, familial history, life style conditions). Then, each biomarker observation can be written as: where f i is a function interrelating the risk factors and biomarker i, DS is the individual disease state, reflecting LOAD progression and symptoms severity, and e i is a noise term capturing the individual variability associated to i. Outlier detection for imaging biomarkers considered all brain regions, using the multivariate Mahalanobis distance.
Both steps (i) and (ii) controlled for cognitive heterogeneity, whereas (iii) controlled for variability and noise on biological measurements, improving together cognitive and biological data homogeneity at the clinical group levels. Importantly, although the choice of a categorical DS variable may overlook possible individual variability beyond the clinical classification, a strong correlation has been observed between the clinical diagnoses within ADNI data and a continuous LOAD disease progression score obtained after integrating multiple cognitive, neuroimaging and biospecimen biomarkers 17 .
Once the parameters for each f i expression were estimated based on the homogeneity/quality controlled data, we marginalized the f i expression by gender, educational level and apoe-e4, from their minimum to their maximum values, respectively (for mathematic details, see Supplementary Note 1). This is equivalent to keeping only the temporal (age) and disease components (DS) in a final y i age; DS ð Þfunction, after weighting by the effects of the marginalized risk factors. Note that such weighting method differs notably from the traditional covariate 'controlling' procedure (that is, removing the covariate effects on each descriptor 13 ), which may cause the loss of relevant risk factor effects on the disease's analysis and the deletion of useful information contained in the data.
Next, using the marginalized estimatorŷ i age; DS ð Þ , each descriptor was analytically reconstructed under different aging and disease conditions. For this, different possible aging and disease trajectories were generated, covering all the possible clinical state transitions during aging, with the DS going from: HC to HC, HC to EMCI, HC to LMCI, and HC to LOAD states, respectively. Similar to the approach described in ref. 13, the generative procedure consisted in evaluating the marginalizedŷ i age; DS ð Þfor different values of age and DS ( Supplementary Fig. 1). For example, for a HC to LOAD transition, ages took on 1560 increasing values (that is, once every week) while the DS incremented linearly with age (or following a sigmoid, reaching similar conclusions; Supplementary Figs 1  where K is a normalization constant, guaranteeing a maximum total abnormality index of 1 across all considered biomarkers. Potentially, some biomarkers can be abnormal at early disease states and tend to have normal values with aging and/or advanced disease progression processes (for example, a protein with an increase in concentration for early disease states, and a posterior concentration decrease due to a down regulation in its associated gene). In such cases the total abnormality index (Ab HC!LOAD i ) could give a biased measure of how early these biomarkers become affected. To overcome this we also calculated intermediate abnormality indexes (Ab HC!EMCI i and Ab HC!LMCI i , respectively), evaluating expression (5) until the corresponding EMCI and LMCI time points were reached (see Supplementary  Tables 2 and 3).
Finally, for each biomarker, we used a bootstrapping procedure (creating 500 different data sets with replacement) to calculate the mean and the 95 % confidence intervals of the obtained characteristic trajectories and their associated abnormality levels (see Fig. 3 Structural MRI acquisition/processing. Brain structural T1-weighted three-dimensional images were acquired for all subjects. For a detailed description of acquisition details, see http://adni.loni.usc.edu/methods/documents/ mri-protocols/. All images underwent non-uniformity correction using the N3 algorithm 63 . Next, they were segmented in grey matter, white matter and CSF probabilistic maps, using SPM12 (www.fil.ion.ucl.ac.uk/spm). Grey matter segmentations were standardized to MNI space using DARTEL tool from SPM12. Each map was modulated to preserve the total amount of signal/tissue. Mean grey matter density and determinant of the Jacobian (DJ) values were calculated for 78 regions covering all the brain's grey matter 15 . For each region, obtained density and DJ values were statistically controlled for differences in acquisition protocols. Both measurements provided equivalent modelling results. All the results/figures presented in this study correspond to the DJ, which constitutes a robust local measure of structural atrophy.
Fluorodeoxyglucose PET acquisition/processing. A 185 MBq (5 þ 0.5 mCi) of [18F]-FDG was administered to each participant and brain PET imaging data were acquired B20 min post injection. All images were corrected using measured attenuation. Also, images were preprocessed according to four main steps 64 : (1) dynamic co-registration (separate frames were co-registered to one another lessening the effects of patient motion), (2) across time averaging, (3) re-sampling and reorientation from native space to a standard voxel image grid space ('AC-PC' space) and (4) spatial filtering to produce images of a uniform isotropic resolution of 8 mm FWHM. Next, using the registration parameters obtained for the structural T1 image with nearest acquisition date, all FDG-PET images were spatially normalized to the MNI space. Regional standardized uptake value ratio (SUVR) values for the considered 78 regions 15 were calculated taking the cerebellum as reference region.
Resting functional MRI acquisition/processing. Resting-state functional images were obtained using an echo-planar imaging sequence, on a 3.0-Tesla Philips MRI scanner. Acquisition parameters were: 140 time points, repetition time (TR) ¼ 3,000 ms, echo time (TE) ¼ 30 ms, flip angle ¼ 80°, number of slices ¼ 48, slice thickness ¼ 3.3 mm, spatial resolution ¼ 3 Â 3 Â 3 mm 3 and in plane matrix ¼ 64 Â 64. Preprocessing steps included: (1) motion correction, (2) slice timing correction, (3) spatial normalization to MNI space using the registration parameters obtained for the structural T1 image with nearest acquisition date, and (4) signal filtering to keep only low-frequency fluctuations (0.01-0.08 Hz) 65 . To have regional quantitative indicators of the brain's functional integrity, fractional amplitude of low-frequency fluctuation 66 , regional homogeneity 67 and functional connectivity degree 68 measures were calculated for each considered brain region. Among these three measurements, low-frequency fluctuation showed the highest sensitivity to disease progression. Consequently, all the results presented in this study correspond to this measure.
Ab PET acquisition/processing. A 370 MBq (10mCi ± 10%) bolus injection of AV-45 was administered to each participant, and 20 min continuous brain PET imaging scans were acquired B50 min post injection. The images were reconstructed immediately after the 20-min scan, and when motion artifact was detected, another 20-min continuous scan was acquired. For each individual PET acquisition, images were initially preprocessed according to four main steps 64 : (1) dynamic co-registration (separate frames were co-registered to one another lessening the effects of patient motion), (2) across time averaging, (3) re-sampling and reorientation from native space to a standard voxel image grid space (''AC-PC'' space), and (4) spatial filtering to produce images of a uniform isotropic resolution of 8 mm FWHM. Next, using the registration parameters obtained for the structural T1 image with nearest acquisition date, all amyloid images were spatially normalized to the MNI space. Considering the Cerebellum as an Ab non-specific binding reference, regional SUVR values for the considered 78 grey matter regions were calculated.
Data availability. All data used in this study is available at the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).