Introduction

Over a century after Charcot, Alzheimer, and Lewy, we still do not fully understand the pathogenic causes of sporadic neurodegenerative disorders, such as Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis (ALS), and frontotemporal dementia (FTD) [1]. Consequently, despite recent advances in anti-amyloid therapy for early AD [2, 3], disease-modifying treatments generally remain elusive [4]. With a projected three-fold increase in the incidence of dementia by 2050 [5], there is an urgent imperative to identify and therapeutically target the root molecular causes of neurodegeneration.

Historically, physiological changes to the living brain could not be observed. Characteristic post-mortem pathological features, such as amyloid plaques and neurofibrillary tangles in AD or spinal cord degeneration in ALS, were only identified after the advent of histology and the clinico-anatomical method [6, 7]. Lacking early biomarkers, the nosology of neurodegenerative diseases has been driven primarily by clinical symptoms. For example, the diagnosis of AD is an evolving concept, with distinct clinical and pathobiological definitions, intertwined with the historical concept of “senile dementia” [8]. However, there is significant symptomatic as well as pathological overlap between neurodegenerative diseases [9] and with normal brain aging [10]. As such, even expert clinicians can make erroneous diagnoses, with around one-fifth of patients being clinically misdiagnosed with AD or PD compared to post-mortem pathological examination [11, 12]. Formalized criteria attempt to codify supportive and exclusionary features to standardize diagnosis [13], but, without definitive biological definitions and markers, it can be difficult to distinguish risk factors from prodromal disease features to place patients along an expected progression timeline [14]. As a result, our current conceptions of the major neurodegenerative “diseases” are arguably in fact syndromes, grouped together by shared clinical manifestation but potentially obscuring diverse underlying physiological mechanisms [15,16,17].

On the other hand, neuropathological examination at autopsy shows that most patients have mixed pathology [18,19,20,21,22,23,24]. From autopsy analysis of 10 pathologies in individuals from 8 diagnostic classes, Robinson et al. found 161 different pathological combinations, with up to 7 present concurrently in a given individual [25]. In particular, tau pathology is nearly universal across the major neurodegenerative disorders [26]. In addition to the characteristic accumulation of amyloid and tau, co-pathological TDP-43 and α-synuclein are present in one-third and half of AD patients, respectively [26]. AD is also associated with neuroinflammation and metabolic dysregulation [27,28,29], and shares risk factors, pathology and symptoms with vascular dementia [30]. Patients with both sporadic and genetic variants of FTD have tau, TDP-43, and other proteinopathies [31], likely forming a continuum with ALS patients [32]. Furthermore, FTD itself encompasses several clinical syndromes with shared symptoms, including behavioral variant FTD, (semantic and non-fluent) primary progressive aphasias, progressive supranuclear palsy (PSP), and corticobasal syndrome (CBS) [33]. Synucleinopathies, such as PD, dementia with Lewy bodies (DLB) and REM sleep behavior disorder, also exhibit overlapping clinical, neurochemical, and morphological characteristics [34]. While PD is primarily associated with movement dysfunction due to nigrostriatal dopaminergic loss, patients also present various neurobiological alterations with strong associations with multiple neurotransmitter systems and peripheral organs [35]. The presence of co-pathologies could affect the observed efficacy of treatments in clinical trials [36], and may require a more nuanced approach involving individualized and multi-factorial treatment [37].

While early pathological studies were limited to post-mortem autopsy, the advent of in vivo biomarkers in recent decades has allowed quantitative assessment throughout disease progression starting from preclinical or prodromal stages. Non-invasive neuroimaging techniques have enabled the characterization of structural, functional, proteinopathy, vascular, and metabolic alterations, revealing long periods of preclinical pathogenesis [38]. Trading spatial specificity for improved temporal resolution, electrophysiological modalities such as EEG/MEG can evaluate regional and network activity dysfunction [39]. On the other end, bulk tissue and single cell/nucleus transcriptomics can achieve microscopic spatial resolution, although they are dependent on the acquisition of post-mortem brain tissue and thus more restricted in spatial coverage and sample size [40, 41]. In addition, many plasma, cerebrospinal fluid (CSF), and peripheral markers have shown promise for integration into clinical practice [42].

Following the emergence of potential in vivo biomarkers, there have been increasing efforts to define neurodegenerative diseases, and categorize and stage patients based on underlying biological alterations rather than by clinical symptoms [43,44,45,46,47,48]. For an autosomal dominant disorder with a genetic continuum such as Huntington’s disease (HD), the starting point can be defined by genotype, followed by pathological biomarkers, and finally the appearance of symptoms and functional changes [48]. Alternatively, the amyloid/tau/neurodegeneration (A/T/N) framework [46] does not consider temporal ordering, but instead categorizes patients along an “Alzheimer’s continuum” based on a combination of binary features: namely, the presence of (e.g., CSF or PET) markers of amyloid and tau pathology as well as neurodegeneration or neuronal injury (e.g., hippocampal volume, cortical thickness, or CSF neurofilament light) [49]. With the development of CSF α-synuclein seed amplification assays [50], biomarker-based criteria are now emerging for PD. Two recently proposed approaches, SynNeurGe and the Neuronal Synuclein Disease Integrated Staging System (NSD-ISS), classify PD- and DLB-related disorders based on genotype, the presence of (e.g., CSF) α-synuclein, and imaging markers of PD-associated neurodegeneration without necessitating clinical symptoms [51, 52]. Theoretically, biomarker-based categorization can also flexibly incorporate other forms of pathology (e.g., vascular or metabolic indices in AD [28]) and alternative markers of the same pathology, although alignment with symptoms and clinical diagnosis appears to be sensitive to the specific choice of biomarkers for the A/T/N framework [49]. Biomarker-driven categorization is expected to improve the biological homogeneity of preclinical and prodromal subjects enrolled in clinical trials [48, 51, 52], but it remains to be seen whether the correct physiological factors are being considered [53]. 
The implications of clinical and pathological intra-disease heterogeneity and inter-disease overlap require further clarification. Perhaps a more integrative taxonomy of neurodegenerative disorders is needed, considering the multi-dimensional variability of clinical, anatomical, molecular, and etiological factors. To this end, a branching hierarchy considering divergence in genetics, followed by molecular pathways, and finally modifiable risk factors has been proposed [54].

Regardless of disease definitions, there are critical open questions about the mechanisms of onset and progression. Are the varied manifestations of each disorder diverging responses to a common, latent cause, or a combination of distinct underlying processes resulting in similar clinical syndromes [44, 55]? Are the various culprit proteinopathies the true etiology of neurodegenerative disorders, or are they the consequences of compensatory mechanisms [56]? What factors underlie the varying therapeutic needs and treatment responses of patients with the same clinical diagnosis?

To address these questions, there is a need for a systems-level understanding of interactions across various physiological systems and levels of brain organization [57]. Multi-modal computational modeling can support these efforts by integrating data types across spatial and temporal scales in biologically interpretable formulations. In this review, we cover recent advances in the computational modeling of spatiotemporal brain alterations in various neurodegenerative disorders. We particularly emphasize how data-driven in silico models, which are fit to empirical observations without necessitating detailed a priori knowledge about underlying mechanisms, can evaluate disease hypotheses and impact clinical practice. These approaches are presented in increasing order of mechanistic detail, summarized in Table 1. We begin by introducing continuous- and discrete-time disease progression models (DPMs) that stitch together data from cross-sectional observational studies to infer the order of physiological alterations and their variability in patients. Although these methods can flexibly incorporate multiple modalities with minimal a priori specification, they cannot resolve potential interactions between physiological variables. Addressing this consideration with causal structure, we discuss dynamical system models of interacting physiological factors and network propagation. These models have an element of mechanistic insight, and recent studies have extended them with the molecular and cellular architecture of the brain. Finally, we consider multi-scale biophysical models, where effects explicitly propagate from microscopic cellular mechanisms to mesoscopic circuits and macroscopic signals reproducing empirical neuroimaging and electrophysiological data. Together, this body of work follows the general theme of inferring latent disease mechanisms by fitting interpretable whole-brain computational models to observable biomarker data.

Table 1 Computational approaches to integrating multi-modal neuroimaging data to characterize disease progression and infer latent mechanisms.

Biomarker trajectories in latent disease time

Neurodegenerative disorders can affect multiple symptomatic domains, including memory, language, and executive function, and involve diverse physiological alterations, such as proteinopathy, cerebrovascular impairment, atrophy and hypometabolism. Often, the profile of physiological and symptomatic deterioration is characteristic of the stage of disease progression. For example, incontinence followed by sleep disorders are some of the earliest symptoms of PD (occurring 1–2 decades before motor symptoms) [58], and, during the long prodromal phase of AD, decline in semantic memory precedes more global cognitive deficit and eventually dementia [59].

The idea that neurodegenerative disease progression follows stereotypical hierarchies quantifiable by biological (rather than clinical) variables can be traced to post-mortem pathological staging [60]. In the early 1990s, Braak and Braak identified a characteristic sequence of neurofibrillary tangle progression in the brains of AD patients (Fig. 1A), from transentorhinal (Stages I-II) to limbic (Stages III-IV) and finally neocortical (Stages V-VI) regions [61]. Neuropathological staging has since been attempted for various proteinopathies, diseases, and cohorts [62, 63]. These studies emphasize the importance of the disease-specific spatiotemporal progression of pathological factors; for example, while tau pathology is involved in both AD and chronic traumatic encephalopathy (CTE), it follows distinct spreading patterns in the two disorders [60].

Fig. 1: Data-driven biomarker trajectory inference and staging.

A Neuropathological staging systems, such as the Braak stages for AD, represent the earliest attempts to identify characteristic pathophysiological progression patterns [61]. The accumulation of neurofibrillary tangles begins in transentorhinal regions (Stages I and II) and propagates along a stereotypical pattern to limbic (Stages III and IV) and neocortical (Stages V and VI) regions. The figure has been adapted with permission from [332]. B Using in vivo (imaging, fluid, and clinical) biomarkers from large observational studies, continuous-time disease progression models attempt to stitch together data points from many subjects to infer population trajectories along a latent disease time. With minimal a priori assumptions, these methods must account for inter-subject variability in disease onset and progression rate, as well as the potential existence of sub-populations with distinct trajectories. C Event-based modeling is another approach to characterizing biomarker alterations over disease progression. (Left) This method does not explicitly model the trajectory along a latent temporal variable, but instead identifies the most likely sequences of biomarker alterations, along with their uncertainty represented by the gray elements in this positional variance diagram. These markers can be any combination of features from different brain regions and modalities. (Right) Event-based modeling is the basis for simultaneous Subtyping and Stage Inference (SuStaIn), a method that identifies sub-populations with varying event sequences. For example, SuStaIn identified 3 subtypes of AD atrophy progression, corresponding to typical, cortical-dominant, and subcortical patterns. The figure was originally published in [132], is covered by the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/) and has been adapted to show only 4 disease stages.

The establishment of in vivo (PET, MRI, and CSF) markers in clinical practice has offered a chance to extend staging systems to the preclinical phase, closer to pathogenesis. In an influential work, Jack et al. proposed a hypothetical cascade of multiple biomarkers in AD [64], conceptually similar to Fig. 1B. Echoing the traditional amyloid hypothesis [65], these biomarker curves implied that abnormal levels of amyloid and tau accumulation are followed by structural alterations, which finally lead to clinical symptoms [64]. An important corollary of this would be that upstream physiological alterations can signal the onset of symptoms, allowing early diagnosis and treatment. Enabled by large, observational imaging initiatives in various neurodegenerative disorders [66,67,68], many studies have since attempted to test hypothetical cascades and uncover the true orderings of biomarker alterations using DPMs.

Constructing disease trajectories from cross-sectional data

These DPMs are generally data-driven, typically fitting monotonic functions to empirical biomarker data with minimal assumptions about the underlying mechanisms. A key problem in fitting such trajectories is the absence of observations covering the entire course of disease progression in any single subject. Longitudinal studies are typically much shorter than the decades-long periods of preclinical, prodromal, and finally symptomatic progression of most neurodegenerative disorders. As a result, inferring population biomarker trajectories over the entire course of a disease requires stitching together data from subjects at varying disease stages (Fig. 1B) [69]. This data, whether from a single visit or a sequence of measurements, can have inter-individual variability in disease stage and severity, and subjects may not follow identical trajectories.

For simplicity, we first consider the case where there is a common population trajectory; that is, the sequence and severity of biomarker alterations are relatively consistent across patients. Individuals’ snapshots must be temporally aligned to correctly place each subject in the population trajectory. Continuous-time DPMs usually achieve this by arranging subjects according to a latent temporal variable, variously referred to as “disease age”, “disease time”, “disease progression score” (DPS), or “pseudotime”. This disease age is distinct from chronological age but better reflects onset and progression from patients’ markers [70]. To fit long-term, multivariate population biomarker trajectories from longitudinal snapshots over a shorter period, a popular approach for continuous-time DPMs has been to combine (i) mixed-effects modeling to account for subject-specific random effects on a fixed population trajectory, and (ii) self-modeling regression to adjust the population trajectory for individualized onset and rate of progression along a common latent disease time. In the remainder of this section, we will discuss some applications of this paradigm (as well as others) to various neurodegenerative disorders.
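As an illustration only, the combination of mixed-effects modeling and self-modeling regression described above can be written in a generic form. The notation is an assumption introduced here, not taken from any cited study: $y_{ijk}$ is biomarker $k$ of subject $i$ at visit $j$, $t_{ij}$ is chronological age, $f_k$ is the fixed population curve, $\tau_i$ and $\alpha_i$ are the subject-specific onset shift and progression rate, $u_{ik}$ is a random effect, and $\varepsilon_{ijk}$ is measurement noise.

```latex
% Generic time-warped mixed-effects model (assumed notation)
\begin{aligned}
  s_{ij}  &= \alpha_i \,(t_{ij} - \tau_i), \qquad \alpha_i > 0
          && \text{(latent disease time of subject } i \text{ at visit } j\text{)} \\
  y_{ijk} &= f_k(s_{ij}) + u_{ik} + \varepsilon_{ijk},
          \qquad u_{ik} \sim \mathcal{N}(0, \sigma_{u,k}^2),\;
                 \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_k^2)
          && \text{(observed biomarker)}
\end{aligned}
```

Individual methods differ in the form chosen for $f_k$ (for example linear, sigmoidal, or spline-based) and in how $\tau_i$ and $\alpha_i$ are estimated.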

Familial age of onset as scaffolding for disease time

Due to a degree of predictability imposed by genetic risk, autosomal dominant disorders such as dominantly inherited AD (DIAD) and familial FTD are a suitable testbed for the DPM temporal alignment problem. In these disorders, individuals highly likely to progress to dementia can be identified in the preclinical phase. DIAD is relatively rare (around 1% of total AD cases) [71] and occurs significantly earlier (around 30–50 years of age) [72]. Unlike sporadic AD, which has no Mendelian inheritance pattern, DIAD is associated with pathogenic mutations of amyloid precursor protein (APP), presenilin-1 (PSEN1), and presenilin-2 (PSEN2) [71]. Although age of onset can vary, with over one-quarter of at-risk siblings developing familial AD more than 10 years apart in age [73], a systematic review and meta-analysis suggests that parental age of onset explains over 38% of the variance [74]. Likewise, autosomal dominant inheritance is observed in 10–15% of FTD patients due to mutations in genes such as progranulin (GRN), microtubule-associated protein tau (MAPT) and chromosome 9 open reading frame 72 (C9orf72), as well as others [75]. The influence of genetic risk on age of onset in familial FTD is genotype-dependent [76]. This heritability of disease onset age has informed early attempts at modeling biomarker progression in DIAD.

For example, Bateman et al. estimated expected years to/since disease onset in DIAD by subtracting parental age of onset from patients’ chronological age [71]. This estimate was used to fit linear mixed effects models of multi-modal biomarkers. The resulting population trajectories suggest that CSF amyloid is the earliest biomarker to become abnormal (declining up to 25 years before symptom onset), followed by amyloid PET, CSF tau, and atrophy (15 years before onset), hypometabolism and episodic memory dysfunction (10 years before onset), and cognitive impairment (5 years before clinical diagnosis).
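The alignment step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the carrier data are invented, the function names are hypothetical, and an ordinary least-squares line stands in for the linear mixed-effects models used in the study (no random effects are modeled here).

```python
from statistics import mean

def estimated_years_to_onset(age, parental_onset_age):
    """Negative: years remaining before expected onset; positive: years since."""
    return age - parental_onset_age

def fit_line(x, y):
    """Ordinary least-squares slope and intercept (a stand-in without random effects)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical mutation carriers: (age, parental age of onset, biomarker value)
carriers = [(30, 45, 1.0), (35, 45, 0.8), (40, 45, 0.6), (50, 45, 0.2)]

# Re-index every observation by estimated years to/since onset instead of age
eyo = [estimated_years_to_onset(a, p) for a, p, _ in carriers]
slope, intercept = fit_line(eyo, [b for _, _, b in carriers])
```

Re-indexing by estimated years to onset places carriers of different ages on one shared axis, which is what allows a single population trajectory to be fit from short individual follow-up.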

In familial FTD, atrophy patterns vary by genotype between carriers of C9orf72, GRN, and MAPT variants [77]. For these mutation carriers, Staffaroni et al. predicted symptom onset using a joint Bayesian mixed effects model of longitudinal clinical assessments, regional brain volume, and plasma neurofilament light chain (NfL) data [78]. Estimated disease onset ages were sampled from a prior distribution of carriers of the same mutations, and biomarker functions were fit with mutation-specific temporal shift and scale parameters. Using this method, regional brain atrophy and elevated plasma NfL levels were found to appear 10–40 years before noticeable symptomatic deterioration across genotypes [78].

Other studies have attempted to extrapolate models from genetic to sporadic disease. In a genetic AD cohort, Almkvist et al. fit curvilinear functions mapping years to expected clinical onset to various cognitive assessments [79]. If non-familial AD follows a similar population clinical trajectory, inverting these relationships could be used to infer years to/since clinical onset from shared cognitive assessments. This calculated disease age did correlate better with CSF and imaging biomarkers than chronological age in non-familial mild cognitive impairment (MCI) and AD patients [80], with a bimodal distribution of onset age corresponding to early- and late-onset forms of sporadic AD.

It is important to test the assumptions of these extrapolations, and consider the extent to which familial and sporadic variants of a disorder are aligned in their pathological cascades. DIAD presents an opportunity to study the pre-symptomatic stage in carriers of risk variants who will go on to develop AD [72] with a somewhat predictable age of onset [74]. While similar trajectories of posterior cingulate amyloid deposition and memory decline have been noted in DIAD and sporadic AD patients, the latter display faster hippocampal atrophy [38, 65, 71] and an amyloid-independent medial temporal tauopathy [81]. A recent comparison of DIAD and sporadic early-onset AD clinical and biomarker progression reiterates that the former is more homogeneous, while the latter is more likely to exhibit atypical phenotypes [82]. For example, unlike sporadic AD, almost all DIAD patients exhibited an amnestic syndrome. Differences in genetic risk factors may drive the heterogeneity of sporadic AD [82], as may its potentially later onset age, since more varied risk factors and comorbidities can accumulate over time.

Estimating disease onset in sporadic disorders

The pathological processes underlying sporadic neurodegenerative disorders such as AD and PD also begin decades before their characteristic symptoms [45, 58, 83]. Several early works used standardized clinical assessments to align subjects [84, 85]. In the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, Yang et al. fit individual-specific functions to longitudinal cognitive scores with temporal offsets representing age of onset [86], and applied this subject ordering to other biomarkers. While the inferred ordering was consistent with the hypothetical cascade [64], this method did not account for subject-specific variability in rate of progression, which can be a major consideration in AD [87].

Subsequent DPMs have considered inter-individual variability in rate of progression [88]. In an exemplar study on the ADNI cohort, Jedynak et al. iteratively fit (i) a subject-specific DPS as a linear function of chronological age and (ii) population biomarker curves as sigmoidal functions of DPS [88]. To simultaneously fit non-linear population curves and subject-specific disease time, many studies use iterative or Bayesian approaches [70, 89].
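To make the alternating scheme concrete, the sketch below shows only the subject-level step: with the population sigmoid held fixed, it searches for the linear age-to-DPS mapping that best explains one subject's visits. Everything here is assumed for illustration (the function names, the sigmoid parameterization, the coarse grid search, and the synthetic subject); it is not the estimation procedure of the cited study. A full method would alternate this step with refitting the population curve parameters.

```python
import math

def sigmoid(s, a=1.0, b=1.0, c=0.0, d=0.0):
    """Population biomarker curve as a function of disease progression score s."""
    return a / (1.0 + math.exp(-b * (s - c))) + d

def fit_subject_dps(ages, values, alphas, betas):
    """Grid-search (alpha, beta) in DPS = alpha * age + beta for one subject,
    minimizing squared error against a fixed population sigmoid."""
    best = None
    for alpha in alphas:
        for beta in betas:
            sse = sum((sigmoid(alpha * t + beta) - y) ** 2
                      for t, y in zip(ages, values))
            if best is None or sse < best[0]:
                best = (sse, alpha, beta)
    return best[1], best[2]

# Synthetic subject generated with alpha = 0.5 and beta = -35 (assumed values)
ages = [66.0, 68.0, 70.0, 72.0]
values = [sigmoid(0.5 * t - 35.0) for t in ages]

alphas = [0.25, 0.5, 0.75, 1.0]               # candidate progression rates
betas = [-40.0 + 0.5 * i for i in range(21)]  # candidate offsets, -40 to -30
alpha_hat, beta_hat = fit_subject_dps(ages, values, alphas, betas)
```

In this parameterization the slope plays the role of a subject-specific rate of progression and the intercept that of a subject-specific onset shift, the two sources of inter-individual variability discussed above.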

In contrast to the hypothetical biomarker cascade of Jack et al. [64], the Rey Auditory Verbal Learning Test was the first marker to become abnormal in the data-driven model of Jedynak et al., followed by hippocampal volume and CSF amyloid and tau concentrations [88]. Considering covariates, Ishida et al. found genotype-dependent timing of cognitive decline in female AD patients [89].

In addition to determining the timing of biomarker alterations, DPMs can be used to stage individual patients and predict the onset of clinical symptoms. Combining Bayesian inference with flexible logistic basis functions and stage-dependent rates of progression, Bilgel and Jedynak predicted age of dementia onset in the ADNI cohort with a root mean-squared error of 1.5 years [90]. The distributions of model-inferred disease times differ significantly between diagnostic classes in the AD spectrum [70]. Such estimates of latent disease time can be used to define clinical trial endpoints [78] and detect treatment effects using fewer participants [89]. DPM-inferred individualized disease time can also be the basis for data-driven probabilistic diagnosis and estimation of time to conversion [91]. Based on the DPM of Lorenzi et al., the transition from healthy to diseased state in AD largely corresponds to hypometabolism and temporal atrophy, with more advanced stages reflected by neuropsychological markers [91]. These results suggest that integrative, model-based disease time inference is particularly useful for early disease stages, when clinical symptoms are less evident. Furthermore, these DPMs can flexibly incorporate clinical, imaging, and fluid biomarkers, as well as other features such as topological properties of spatial brain maps [92].

Temporal associations between markers

Because they are based on observational studies, the DPMs presented so far do not provide explicit evidence for causality. However, a simple yet critical epidemiological criterion for causality is temporality: presumed causes must precede their consequences [93]. Through this lens, the timing of biomarker alterations from DPMs can be used to evaluate disease hypotheses.

A major topic in AD research is the relationship between the two defining pathologies: amyloid and tau. Amyloid is believed to facilitate tau pathology, but the two proteinopathies also appear to have synergistic as well as independent effects [94, 95]. Developed for survival analysis, the framework of accelerated failure time (AFT) is a straightforward way to evaluate temporality via a common population biomarker trajectory with individual-specific temporal shifts. Based on AFT analysis, model-inferred individual temporal shifts of amyloid and tau accumulation in the AD spectrum are better correlated with increasing proteinopathy burden than chronological age. APOE ε4 genotype shifted both amyloid and tau curves earlier, by 6.1 and 2.6 years, respectively. These curves were also moderately correlated, with an average delay of 13.3 years between amyloid and tau accumulation [96]. While the AFT analysis does not demonstrate causation, it shows how the timings of amyloid and tau pathology are related and affected by covariates. Disentangling synergistic from independent effects of amyloid and tau requires more detailed mechanistic modeling, discussed in later sections.

Cerebrovascular disease (CVD) pathology also frequently co-exists in AD, suggesting a potential relation to proteinopathy accumulation. Comparing disease trajectories of CVD-associated white matter hyperintensities (WMH) and fractional anisotropy (FA) with AD-associated amyloid and tau PET imaging shows moderate within-disease temporal correlations between individualized timings of amyloid and tau accumulation (r = 0.57) and WMH and FA alterations (r = 0.44) in the AD spectrum [97]. However, these imaging measures of CVD and AD pathology did not show strong correlations across disease measures nor with hippocampal volume, nor were associations with clinical symptoms considered. As a result, the authors propose that vascular and proteinopathy components in AD represent independent mechanisms [97]. However, interpretations are limited by the non-specificity of imaging measures to vascular pathophysiology, as well as aspects of vascular dysfunction not captured by WMH or FA.

Perfusion imaging modalities such as arterial spin labeling MRI can measure vascular function [98], which may be disrupted before structural alterations. Acknowledging the presence of diverse physiological alterations in late-onset AD, Iturria-Medina et al. fit multi-modal (structural, functional, metabolic, amyloid, and tau) imaging, CSF, and plasma mixed-effects models to infer biomarker abnormality as the distance between diseased and healthy trajectories [99]. Notably, vascular alterations (from arterial spin labeling MRI) preceded all other biomarker alterations, and memory deficit was observed early and continued to decline in parallel with neuroimaging- and biospecimen-based markers over disease progression.

Assumptions about trajectory shape

In addition to inter-subject variability in timing, assumptions about trajectory shape and biomarker dynamic range are important considerations in fitting empirical data. Using a non-parametric approach to fit monotonic splines to population trajectories in the ADNI cohort shows varying degrees of linearity and sigmoidal form among biomarkers [69], suggesting that some biomarkers did not capture the final, plateauing stage of a hypothetical sigmoidal disease trajectory. On the other hand, hippocampal volume had the highest signal-to-noise ratio at these disease stages, in agreement with another non-linear mixed-effects model where it was the largest contributor to model-inferred disease time [69, 89].

A common assumption is that trajectories are monotonic, with markers progressing consistently from normal to diseased levels. It is important to note that actual biomarker progression may not conform to assumptions about trajectory shape, such as linearity, exponentiality, sigmoidal shape or even monotonicity, especially when considering features derived from topology [100] or dimensionality reduction [101]. Relaxing assumptions about mean trajectory shape apart from smoothness, Schmidt-Richberg et al. developed a probabilistic method based on vector generalized additive models (VGAMs) to estimate disease stage and rate of progression using quantile regression [102]. Using converter subjects that progressed to a worsened disease state, this method fits biomarker probability density functions for clinical assessments and low-dimensional projections of imaging data obtained using Laplacian eigenmaps [101], while handling missing data and non-monotonic biomarker trajectories. Addressing the common trajectory assumption, Guerrero et al. transitioned from mean population to individualized disease progression models by selecting a subpopulation of similar patients based on neighborhood in a low-dimensional projection [103]. The theory of fitting subject-specific trajectories as temporally re-parameterized, spatially-shifted variants of a group trajectory that is a geodesic on a Riemannian data manifold has also been mathematically developed [104], and applied to high-dimensional cortical thickness features [105] as well as neuropsychological data from ADNI [104].

Feature selection and inferring disease time from high-dimensional data

At a finer resolution than ROI-averaged features, other works have applied DPMs to voxel- or vertex-wise data. This higher-resolution characterization can help resolve pathological trajectories that may be regionally variable. Bilgel et al. [106] extended an earlier DPM [88] to voxel-wise amyloid PET data from the Baltimore Longitudinal Study of Aging (BLSA) cohort, finding the earliest amyloid accumulation in the precuneus despite a rate of change similar to that of other cortical regions, which is consistent with other studies [107]. Notably, their calculated DPS correlated better with mean cortical distribution volume ratio than subject-specific offset and rate of change parameters. Marinescu et al. developed Data-driven Inference of Vertex-wise Evolution (DIVE) [108], which was used to infer sigmoidal biomarker trajectories of vertex clusters in AD and posterior cortical atrophy (PCA) from cortical thickness MRI and amyloid PET data. By iteratively clustering vertices, estimating biomarker trajectories for each cluster, and inferring disease pseudo-time, DIVE can automatically segment the cortex into (potentially disconnected) regions sharing similar progression patterns.

Model scalability with large numbers of features is an important consideration for high-dimensional data from transcriptomics, proteomics, and epigenomics. Analogous to DPMs, trajectory inference methods are commonly used in single-cell analyses to characterize dynamic cellular processes such as differentiation and life cycles from single-cell omics [109]. These concepts have also been applied to infer population trajectories from cross-sectional data [110, 111]. The general approach to trajectory inference is to fit a graph to individuals’ data points in a reduced dimensional space, linking them along a continuum that can be used to calculate a pseudotime (which in this context is equivalent to a disease time or progression score). Prioritizing variance between patients and controls during dimensionality reduction, a contrastive trajectory inference algorithm was applied to bulk tissue post-mortem brain and in vivo blood gene expression from cross-sectional cohorts of late-onset AD and HD patients [110]. This method used distance along a minimum spanning tree to healthy control references to calculate patients’ pseudotimes, which are significantly correlated with the severity of neuropathologies (CERAD, Braak, and Vonsattel stages) and cognitive performance. Another study on transcriptomics-based trajectory inference in AD instead used a manifold learning approach that fits a non-linear transformation to a low-dimensional space where subjects have a tree structure. Pseudotimes calculated from this tree did correspond to neuropathological stages and diagnoses, and a “disease resistant state” was also found, consisting of subjects with disease-like transcriptomic profiles but no pathological diagnosis of AD [111]. In general, the choice of dimensionality reduction algorithms and graph structure can influence results, such as the ability to identify branching structure in the data [112].
Beyond transcriptomics, trajectory inference methods have also been applied to voxel-scale imaging data using latent embeddings from variational autoencoders (VAEs) [113].
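
To make the graph-based idea concrete, the following is a minimal sketch of a pseudotime computed as path length along a minimum spanning tree. It is not the algorithm of [110] or [111]: the 2D coordinates stand in for an already reduced space, the averaging over control references and the min-max rescaling are illustrative assumptions, and all function names and data are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two points in the reduced space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm. Returns {node: [(neighbor, edge_length), ...]}."""
    n = len(points)
    in_tree = {0}
    tree = defaultdict(list)
    while len(in_tree) < n:
        # Shortest edge that connects the current tree to a new node.
        w, i, j = min(
            (euclidean(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        tree[i].append((j, w))
        tree[j].append((i, w))
        in_tree.add(j)
    return tree


def tree_distances(tree, root, n):
    """Path length along the tree from `root` to every node."""
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbor, w in tree[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + w
                stack.append(neighbor)
    return [dist[i] for i in range(n)]


def pseudotime(points, control_idx):
    """Mean tree distance to the control references, rescaled to [0, 1].

    Averaging over controls is an assumption made for this sketch.
    """
    n = len(points)
    tree = minimum_spanning_tree(points)
    per_control = [tree_distances(tree, c, n) for c in control_idx]
    raw = [sum(d[i] for d in per_control) / len(per_control) for i in range(n)]
    lo, hi = min(raw), max(raw)
    return [(r - lo) / (hi - lo) for r in raw]


# Hypothetical embedding: two controls followed by three patients.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (2.0, 0.1), (3.0, 0.0)]
pt = pseudotime(points, control_idx=[0, 1])
print([round(v, 3) for v in pt])
```

In this toy example the tree is a simple chain, so the pseudotime increases with distance from the controls. With real data, the embedding and the choice of reference subjects would dominate the result, in line with the sensitivity to dimensionality reduction and graph structure noted above.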

The impact of disease variability on staging

In general, heterogeneity in biomarker trajectories can be a major confounding factor for staging. Acknowledging the symptomatic and physiological heterogeneity of neurodegenerative disorders, there have been many attempts to identify disease subtypes from clinical data, in vivo markers, and pathology [114]. While a detailed discussion of subtyping is outside the scope of this review, some methodological concerns are covered elsewhere [115]. Typically, subtyping involves unsupervised methods such as clustering or network community detection [116] applied to cross-sectional [117] or longitudinal [118] features. In AD, the consensus from imaging and neuropathology points towards 3 subtypes, representing typical, limbic-predominant and hippocampal-sparing/minimal-atrophy spatial patterns [119, 120], while CSF proteomics-based clustering identified 5 subtypes with distinct molecular signatures that are identifiable from the pre-clinical phase [121]. However, disease stage can also exert a significant influence on progression-naïve subtyping (e.g., in PD [122]), and distinguishing between effects due to disease progression and trajectory is important [110, 123].

To address both sources of variability simultaneously, expectation-maximization methods can be used iteratively to assign subjects to subtypes and to construct biomarker trajectories for those subtypes, with an initial subtyping solution provided by clustering. Applying this approach to a reduced dimensional fused network of multi-omics (transcriptomics, epigenetics, proteomics, and metabolomics) data identified 3 molecular subtypes in AD [123].

With the presence of disease subtypes, the interpretation of subtype-specific DPS can become more complicated. For example, when subtype-specific DPS reflects the distance from a subject to a healthy control reference population along its trajectory [123], can these scores be compared across subtypes? One way to anchor the subtypes would be to calibrate all subtype-specific DPS in reference to a clinical score threshold. However, this is likely not a major concern in practice, as accurately placing a patient along the expected trajectory of their identified subtype would be more relevant than comparing scores across subtypes.

Patients sharing the same clinical diagnosis may not be biologically homogeneous, and may individually follow distinct trajectories. To account for this variability, we have seen attempts to shift towards unsupervised discovery of subtypes [123] and individualized modeling [103]. Other studies have considered the effects of risk factors, such as APOE genotype, on model parameters [84]. While population disease trajectories inferred from cross-sectional data are informative in understanding the stereotypical sequence of biomarker alterations, it is important to consider the factors that may contribute to heterogeneity during modeling and analysis.

Sequences of alterations in event-based models

In contrast to the DPMs presented so far that assume a latent temporal continuum of disease progression, event-based models (EBMs) order biomarkers according to discrete transitions from normal to abnormal states. Because of this simplicity, EBMs can extract an intuitive biomarker ordering, depicted in Fig. 1C, using cross-sectional data from small datasets [124]. With this practical advantage, applications of EBMs to a variety of imaging, clinical, neuropathological and biospecimen features across diseases have provided data-driven insight into biomarker ordering and their subtype variability.

Discretizing disease stages

As with continuous-time DPMs, some of the earliest EBM studies addressed autosomal-dominant disorders, where carriers can be identified before symptom onset. An influential work by Fonteijn et al. [124] characterized the progression of regional atrophy and clinical diagnosis in familial AD and HD patients. At the core of EBMs are mixture models, a statistical approach to fitting data arising from multiple subpopulations. In the original EBM formulation of Fonteijn et al., a mixture of Gaussian and uniform distributions is fit to each event/biomarker. The Gaussian distribution corresponds to the likelihood of observing a biomarker value when the event has not occurred, while the uniform distribution corresponds to the likelihood given that the event has occurred [124]. An overall likelihood can then be calculated for each sequence of event orderings, and a Markov chain Monte Carlo (MCMC) algorithm is used to sample the posterior distribution of event orderings given biomarker data. A characteristic sequence of events as well as their uncertainty estimates (represented by the gray elements in Fig. 1C) can then be calculated. In familial AD, hippocampal atrophy was the earliest imaging marker, occurring before MCI diagnosis, and soon followed by inferior parietal and precuneus atrophy. In HD, putamen, caudate, thalamus, posterior cingulate and superior frontal atrophy were the earliest markers.
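
The likelihood construction described above can be sketched on a toy problem. The code below is an illustrative simplification rather than the implementation of [124]: the distribution parameters are fixed by hand instead of being fit as a mixture, a uniform prior over stages is assumed, and exhaustive enumeration of orderings replaces MCMC sampling (feasible only for a handful of biomarkers). All names and values are hypothetical.

```python
import math
from itertools import permutations


def gaussian_pdf(x, mean, sd):
    """Likelihood of a biomarker value when the event has NOT occurred."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, low, high):
    """Likelihood of a biomarker value when the event HAS occurred."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def stage_likelihood(values, order, k, normal, abnormal):
    """P(subject data | ordering, stage k): the first k events have occurred."""
    occurred = set(order[:k])
    p = 1.0
    for b, x in enumerate(values):
        if b in occurred:
            p *= uniform_pdf(x, *abnormal[b])
        else:
            p *= gaussian_pdf(x, *normal[b])
    return p


def sequence_log_likelihood(data, order, normal, abnormal):
    """Log-likelihood of an ordering, marginalizing each subject over stages."""
    n_stages = len(order) + 1
    total = 0.0
    for values in data:
        p = sum(
            stage_likelihood(values, order, k, normal, abnormal)
            for k in range(n_stages)
        ) / n_stages  # uniform prior over stages (an assumption)
        total += math.log(p)
    return total


def best_ordering(data, normal, abnormal):
    """Exhaustive search over orderings; stands in for MCMC on toy problems."""
    n = len(normal)
    return max(
        permutations(range(n)),
        key=lambda o: sequence_log_likelihood(data, o, normal, abnormal),
    )


def most_likely_stage(values, order, normal, abnormal):
    """Assign a subject to the stage that maximizes their data likelihood."""
    return max(
        range(len(order) + 1),
        key=lambda k: stage_likelihood(values, order, k, normal, abnormal),
    )


# Hypothetical z-scored biomarkers (lower = more abnormal), three per subject.
normal = [(0.0, 1.0)] * 3      # (mean, sd) of the pre-event Gaussian
abnormal = [(-8.0, 2.0)] * 3   # (low, high) of the post-event uniform
data = [
    (0.0, 0.0, 0.0),
    (-4.0, 0.0, 0.0),
    (-4.0, -4.0, 0.0),
    (-4.0, -4.0, -4.0),
]
order = best_ordering(data, normal, abnormal)
print(order, most_likely_stage(data[2], order, normal, abnormal))
```

Because the number of orderings grows factorially with the number of biomarkers, enumeration quickly becomes impractical, which is one motivation for sampling the posterior instead.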

Applications of EBMs to a variety of disorders have reproduced known aspects of disease progression while providing a quantitative staging system. Young et al. [125] extended the original EBM of Fonteijn et al. to sporadic AD, reproducing the early abnormality in CSF protein levels that is consistently observed, followed by regional atrophy rate, cognitive decline, and decreased regional brain volume. Similar EBMs have been applied to anatomical connectivity-derived network measures in sporadic AD [126]. Combining a cross-sectional EBM with longitudinal differential equation modeling of multi-modal biomarkers in DIAD showed that many biomarker alterations accelerate with disease progression [127], in contrast to the sigmoidal plateauing hypothesized by many studies [64]. In DIAD, cortical and then subcortical amyloid accumulation was followed by p-tau, CSF amyloid, and tau, neurodegeneration in the putamen and nucleus accumbens, cognitive decline, cerebral hypometabolism and finally other regional neurodegeneration [127]. Using gray and white matter, brainstem, cerebellar, and ventricular volumes, Eshaghi et al. applied an EBM to transitions from normal to atrophic states of regional gray matter in multiple sclerosis (MS) patients [128]. Consistent with histopathology, primary-progressive and relapse-onset variants of MS showed similar orderings of regional atrophy. These prolific applications reflect the simplicity and generality of the EBM approach.

Estimating disease time from EBMs

Unlike continuous-time DPMs, the standard EBM formulation infers only the relative ordering of biomarker alterations, not the timing between events or a global disease time. Variations such as the temporal event-based model (TEBM) assign both a stage and progression risk to individuals, thus placing them on a disease timeline [129]. TEBM predicts conversion time (when all cognitive biomarkers become abnormal) with more accuracy and precision than the standard EBM or a continuous-time Gaussian process DPM. Acknowledging the noise in clinical diagnosis, Venkatraghavan et al. developed a discriminative EBM that also incorporates timing between events [130]. This approach first fits PDFs for easily separable subsets of controls and AD patients. It then infers subject-specific orderings, generalizes these orderings to the population, estimates relative temporal distances between events, and stages patients based on this ordering. Model-derived patient stages from the discriminative EBM better reflected the progression of AD patients than stage estimates from other contemporary EBM formulations [130]. In a “typical AD” subset consisting of amyloid-negative controls and amyloid-positive MCI and AD patients, p-tau becomes abnormal before the cognitive assessments. However, with the full dataset, CSF amyloid and cognitive alterations precede p-tau, illustrating that the biases introduced by inclusion criteria must be considered. Notably, hippocampal volume and other structural measures follow cognitive alterations, potentially due to the insufficient sensitivity of these imaging markers to early mild changes in this cohort [130].

Accounting for heterogeneity of event sequences

The original EBM formulation assumed a common monotonic trajectory for all biomarkers across the cohort. This assumption is likely more valid for certain subgroups, such as patients with autosomal dominant disorders, or amyloid- and/or APOE-positive subjects who show less variability in event sequences [125]. Alternative formulations [131] allow heterogeneity in the temporal ordering of biomarkers using a probabilistic model. However, they model the pre- and post-event probability density functions as independent Gaussian distributions for each biomarker, even though biomarkers likely exhibit correlation. Disentangling these distinct sources of variability, Subtype and Stage Inference (SuStaIn) simultaneously performs unsupervised subtyping and temporal disease staging using an iterative training procedure [132]. As a result, it has been able to derive data-driven subtypes based on progression patterns across many diseases.

Using structural MRI data, this method was able to identify known genotypes from imaging data of FTD patients, while proposing two distinct latent phenotypic subtypes linked to the C9orf72 genotype [132]. In AD, three subtypes were identified based on regional origin of atrophy: (i) the hippocampus and amygdala in the typical subgroup, (ii) the nucleus accumbens, insula, and cingulate in the cortical subgroup, and (iii) the pallidum, putamen, nucleus accumbens and caudate in the subcortical subgroup [132]. These data-driven progression subtypes appear to correspond to the 3 neuropathologically observed subtypes of AD [120, 133]. SuStaIn also identified 4 distinct spatiotemporal trajectories of AD tau accumulation, corresponding to different clinical profiles and outcomes [134]. Modeling amyloid accumulation, SuStaIn found cortical and subcortical subtypes with the latter corresponding to more typical AD clinical presentation [135], while another study showed that cortical amyloid deposition is best explained by three subtypes defined by frontal, parietal, and occipital initiation of abnormality [136]. Combining both proteinopathies in AD, SuStaIn also consistently reproduces complementary “amyloid-first” and “tau-first” subtypes from separate modeling using in vivo PET and neuropathological evaluation [137].

In TDP-43 proteinopathies, including ALS, FTD, and the recently characterized limbic-predominant age-related TDP-43 encephalopathy neuropathological change (LATE-NC) [138], an ordinal variant of SuStaIn has been used to define a more fine-grained data-driven subtyping, staging and disease classification based on neuropathological progression [139]. From atrophy progression in the ALS-FTD spectrum, SuStaIn subtyping found two cortical atrophy subtypes in addition to a normal-appearing group, and staging correlated well with clinical and neuropathological measures [140]. Using diffusion and neuromelanin-sensitive MRI measures in PD, SuStaIn suggested the presence of 2 distinct subtypes, with different clinical and pathological progression [141]. In MS patients, three subtypes were identified (normal-appearing white matter, cortical, and lesion), with the lesion subtype having the highest relapse rate and positive treatment response [142].

From sequences of alterations to interactions between biomarkers

In the past decade, both continuous time DPMs and discretized EBMs have helped characterize the sequence of physiological alterations in various neurodegenerative disorders. Some examples of these applications are summarized in Table 2. These methods have attempted to account for inter-subject variability in timing, onset, and trajectory, potential bias due to covariance between biomarkers, and differing trajectory shapes across biomarkers. DPMs have revealed a long prodromal period with multi-factorial alterations, such as the early roles of CSF amyloid accumulation in dominantly inherited AD [71], vascular dysregulation in AD [99], and atrophy and elevated NfL before symptom onset in FTD [78]. Furthermore, DPMs can integrate multiple data modalities to stage patients and estimate future progression [91], which may be particularly useful for pre-symptomatic individuals. While DPMs and EBMs often assume a common population trajectory, formulations such as SuStaIn can identify patient subtypes from variability in progression ordering [132]. However, the typical optimization procedure constrains EBMs to a limited number of features, precluding their application to high-dimensional (e.g., multi-omics) or high-resolution (e.g., whole-brain imaging) data, and thereby limiting mechanistic insight.

Table 2 Example applications of data-driven methods to characterize biomarker alterations in neurodegenerative disease cohorts.

Mechanistically, similar pathological cascades seem to be shared between neurodegenerative diseases [143]. Data-driven subtyping and transdiagnostic clustering can help identify the distinct and shared mechanisms of different neurodegenerative disorders. To a limited extent, the ordering of disease alterations can be used to evaluate the temporality of pathogenic hypotheses [96, 97, 99]. However, the DPMs discussed account for relationships between biomarkers only implicitly, such as via joint probability distributions [78]. As such, while empirically validating hypothetical disease cascades is an important step towards understanding disease progression, biomarker timing only hints at the relationships between various neurobiological processes. In the following section, we discuss causal models that explicitly incorporate the interactions between multiple disease factors to evaluate hypotheses of disease pathogenesis and progression.

Evaluating disease hypotheses using mechanistic and causal models

Neurodegenerative disorders are accompanied by a multitude of alterations. Aging is a major risk factor across neurodegenerative disorders, which share features such as genomic instability, loss of proteostasis and cellular senescence [10]. A variety of genetic, environmental, pathogenic, lifestyle and dietary risk factors also contribute to sporadic disorders [27, 144, 145]. There is a notable vascular component to dementias, from AD [27, 30] to FTD [146]. The peripheral system [17] and gut-brain axis [147] also contribute to the onset and progression of PD. The integrity of the brain is thus multi-faceted, involving vascular, metabolic, and inflammatory processes that support neuronal function. It is therefore necessary to consider interactions between physiological factors and their causal directions, which may follow indirect, non-linear and complex pathways [148] (Fig. 2A), as well as inter-individual differences.

Fig. 2: Mechanistic models of pathophysiological interactions and network propagation.

Dynamical systems-based models are used to explicitly represent intra-region interactions between different physiological systems, and inter-region propagation of pathophysiology. A Dynamical systems models impose causal structure on the relationships between variables. They can be used to simulate the spatiotemporal evolution of brain dynamics, and to determine optimal therapeutic inputs. The figure on the right has been adapted from [263] and is under the Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/. B Network models of connectome-driven pathophysiology propagation. These models consider the spatiotemporal propagation of disease factors, such as misfolded proteins, from regional epicenters along brain networks (e.g., structural, functional, or vascular connectomes). C A network diffusion model noted the resemblance between eigenmodes of the structural connectome graph Laplacian and the disease-specific atrophy patterns observed in healthy ageing, AD and bvFTD. Figures have been adapted with permission from [168]. D An epidemic spreading model (ESM) frames proteinopathy dynamics in terms of regional production, clearance, misfolding and propagation of misfolded proteins, and replicates spatial progression patterns observed from PET imaging. The figure has been adapted with permission from [174]. E Although no approved α-synuclein PET tracer exists at the time of writing, this epidemiological model of neurotoxic protein propagation and subsequent atrophy in PD patients replicated empirical atrophy patterns. The figure has been adapted with permission from ref. [182].

In this section, we shift away from the DPMs of the previous sections, which infer the order of biomarker alterations and event sequences but cannot resolve how these factors may influence each other. We now consider dynamical systems-based causal models of neurodegenerative disease progression. These methods explicitly employ interactions between disease factors. As a subset of these models with particular relevance to neurodegenerative diseases, we also consider network models of pathology propagation. With mechanistic interpretability and causal structure, these models are suited to testing disease hypotheses and inferring perturbational/treatment effects.

Model-inferred targets for combinatorial therapy in complex disorders

Given the unknown etiologies and heterogeneity of most neurodegenerative disorders, multiple therapeutic approaches are likely required [149]. However, identifying treatment targets is not trivial; disease-affected biomarkers do not necessarily translate to effective therapeutic targets [150]. Selecting candidate targets and designing clinical trials is likely to benefit from personalized and precision medicine approaches [151]. To this end, dynamical systems modeling using systems of coupled differential equations can characterize the spatio-temporal behavior of key variables, impose causal structure on interactions, identify pathways involved in disease progression, and predict the outcomes of interventions.

Using a dynamical systems framework called multifactorial causal modeling (MCM), Iturria-Medina et al. fit whole-brain population models of structural, functional, metabolic, vascular, and amyloid alterations as functions of their local pairwise interactions and inter-region propagation along anatomical, vascular and functional networks [152]. Consistent with earlier DPM analysis [99], vascular followed by functional activity alterations were the most likely initial pathogenic events based on cross-sectional data. Such dynamical systems models are well suited to the rich mathematical tools of control theory, to determine perturbational inputs to guide the brain to a different state (Fig. 2A). These models can identify optimal treatment targets, doses, and durations in various domains from molecular interaction networks [153] to brain stimulation [154]. In AD, MCM suggests that single-target therapy (e.g., targeting only amyloid accumulation or only vascular dysregulation) would be the least efficient way to return an advanced AD brain to a healthy state [152].

While cross-sectional data can be leveraged to select specific parameters to personalize using sensitivity analysis [155], applying MCM directly to individualized data can suggest biologically-based, patient-specific combination therapy to restore a healthy brain state [156]. Using a similar dynamical systems model, Zheng et al. note a stage dependence in the relationship between physiological biomarkers and cognition; the amyloid parameter is most important at early disease stages but decreases in influence over time as neuronal degeneration has a stronger effect, further supporting combination therapy [155].

Network models of misfolded protein propagation

Neurotoxicity due to misfolded protein accumulation and inter-region propagation is a common theme across disorders, implicating pathogenic proteins with archetypal spreading patterns such as amyloid, tau, alpha-synuclein and TDP-43 [157]. In addition to the characteristic patterns of proteinopathy, numerous studies have also co-localized structural and functional networks with disease-specific pathological alterations [158, 159], such as default mode network atrophy and hypometabolism in AD [160] and functional connectivity-associated tau accumulation in primary tauopathies [161]. These findings are the basis of the network degeneration hypothesis of neurodegeneration, depicted in Fig. 2B, which suggests that pathological changes propagate along brain networks [162, 163]. The convergence of empirical data and the emergent field of network neuroscience [164] has enabled extensive connectome-based modeling studies of neurodegenerative and other brain disorders [165,166,167].

A wide range of network propagation models have been applied to data from molecular neuroimaging. These models are defined and differentiated by their assumptions about seeding, clearance, propagation, and network organization. At the whole-brain level, simple isotropic diffusion is insufficient to explain the spatiotemporal spreading of pathological proteins across the brain. With diverse tissue types and long-range connections, the cytoarchitecture and connectome are likely determinants of propagation.

Investigating the consequences of purely diffusive propagation without regional specificity, Raj et al. developed a linear network diffusion model (NDM), with protein propagation along concentration gradients on a static structural connectome obtained from tractography [168, 169]. Certain eigenmodes of network diffusion patterns showed similarities to AD and behavioral variant FTD (bvFTD) atrophy patterns (Fig. 2C), and this model was also more predictive of end-stage atrophy and metabolic alterations than baseline imaging in the ADNI cohort, with inter-class differences in rate parameters [169]. Lacking a directed human connectome, Pandya et al. [170] extended the network diffusion model with regional analogs from the mouse connectome and examined the effects of directed connectivity on progressive supranuclear palsy (PSP) atrophy. Both anterograde and retrograde propagation of purported tauopathy captured distinct topological patterns, suggesting the importance of propagation in both directions. Extensions of this approach attempted to infer seed regions of atrophy patterns [171], with most AD seed regions located in the temporal lobe, hippocampus and entorhinal cortex. Notably, model-derived seeds had a higher predictive power than assuming a common, hippocampal seeding, although no lower-dimensional latent structure was observed in the atrophy patterns and seeding regions. The importance of seed regions in determining eventual spatial spreading is emphasized by an anisotropic diffusion model, which recovers characteristic amyloid, tau, α-synuclein and TDP-43 deposition patterns based on different seed regions [172].
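
As a rough illustration of diffusion along concentration gradients on a connectome, the sketch below integrates dx/dt = -βLx on a small graph, where L is the graph Laplacian. It is a simplified stand-in rather than the model of [168, 169]: it assumes a combinatorial (unnormalized) Laplacian, an undirected four-region chain, forward-Euler integration, and hand-picked values of β and the seed, all of which are hypothetical.

```python
def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of a weighted, undirected graph.

    `adj` is a symmetric adjacency matrix with a zero diagonal.
    """
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def diffuse(adj, x0, beta, t_end, dt=0.01):
    """Forward-Euler integration of dx/dt = -beta * L x from seed pattern x0."""
    lap = graph_laplacian(adj)
    n = len(x0)
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        dx = [-beta * sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x


# Hypothetical connectome: four regions connected in a chain, seeded in region 0.
adj = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
seed = [1.0, 0.0, 0.0, 0.0]

early = diffuse(adj, seed, beta=1.0, t_end=1.0)
late = diffuse(adj, seed, beta=1.0, t_end=50.0)
print([round(v, 3) for v in early])
print([round(v, 3) for v in late])
```

Under these assumptions the total burden is conserved and the pattern relaxes towards a uniform distribution, with intermediate patterns shaped by the slowly decaying Laplacian eigenmodes. This also makes clear why purely diffusive models depend so strongly on the seed region and the chosen connectome.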

An important consideration in modeling protein propagation is chemical kinetics, such as the relationship between aggregation and clearing processes [173]. Garbarino and Lorenzi used Bayesian model comparison to evaluate different hypotheses of amyloid propagation in AD in silico [107]. Among increasingly complex dynamical systems models assuming (i) constant diffusion of amyloid [168], (ii) reaction-diffusion where aggregation and diffusion are simultaneous, and (iii) non-linear accumulation, clearance and propagation, the last performed best; in this model, propagation is triggered by saturated aggregation rather than being a constant diffusive process.

Based on in vitro, animal and human studies, the aggregation of proteinopathy is believed to induce “prion-like” misfolding in normal proteins [60]. Compartmental models of interacting susceptible, infectious, and recovered (SIR) populations are commonly used to simulate infectious diseases, and such an epidemic spreading model (ESM) has been developed for intra-brain pathology propagation [174]. Initially applied to amyloid PET data from ADNI (Fig. 2D), the ESM showed a decrease in amyloid clearance rate and age of pathology appearance when going from healthy controls to early and late MCI and finally AD patients [174]. Applications of this ESM to tau PET spreading patterns note that regions with high amyloid accumulation also display higher tau levels than predicted by connectivity-based spreading alone [175]. Other ESMs using MEG and tau PET data also demonstrated that functional connectivity predicts tau distribution patterns better than structural connectivity or simple diffusion [176], implicating dynamic activity as a substrate of pathological progression. In AD, amyloid accumulation is believed to form feedback loops with neurovascular uncoupling [177] and tau accumulation [178]. Causal mediation analysis suggests that amyloid positivity contributes to tau in the inferior temporal gyrus via a direct pathway as well as via medial temporal lobe tau levels [179], implying that both pathways would need to be targeted once an individual exhibits neocortical tau. Other, more detailed theoretical models (incorporating multiple forms of nucleation, elongation, etc.) also support the importance of amyloid-tau interactions and misfolded protein clearance, although available PET imaging data is unable to resolve all model mechanisms [180, 181]. Epidemiological models have also been combined with downstream modeling of the neurotoxic effects of proteinopathy that result in atrophy (Fig. 2E) [182].

Competing hypotheses credit either connectivity-dependent intracellular or distance-dependent extracellular mechanisms for misfolded protein propagation. To compare these alternatives, Schäfer et al. [183] modeled longitudinal PET data using individualized network diffusion models. Although limited by the lack of follow-up imaging samples (with typically 3–4 visits per subject in ADNI), connectivity-based models seem to better match the longitudinal progression patterns observed in tau PET. Similar analyses using subject-specific Bayesian hierarchical modeling found statistically significant differences in average tau production rates and tau-dependent atrophy parameters between amyloid-positive and amyloid-negative individuals [184, 185]. Other works suggest the importance of disease stage, with early spatiotemporal evolution of tau driven by propagation whereas local production dominates in later stages, with individual and regional factors explaining some variability [186, 187].

The model-inferred evidence for activity-dependent, connectome-driven, and amyloid-enabled tau accumulation exemplifies one application of causal modeling in evaluating disease hypotheses [175, 176, 179]. However, specific biological mechanisms of seeding, propagation, and selective regional vulnerability remain unresolved [188]. Convincing answers to these mechanistic questions will likely require continued integration of macroscopic models dominant in the imaging community with microscopic aspects of chemical kinetics [173], cellular and molecular features [167], and clinical phenotype in dynamically-evolving models [165].

Molecular and cellular vulnerability to disease progression

Brain regions are differentiated by various molecular factors such as cytoarchitecture, neurochemistry, transcriptomics, and connectivity [189, 190], which render them selectively vulnerable in disease [60]. While the DPMs discussed so far can reconstruct biomarker trajectories and sequences, characterize spatial patterns of alterations and infer interactions between macroscopic neuroimaging features, the underlying molecular and cellular mechanisms are more difficult to ascertain.

Linking these biomarkers, which are often non-specific to pathophysiology [191, 192], to mechanistic pathways requires information about features such as gene expression and neurochemical organization. However, molecular data must typically be obtained post-mortem, limiting its spatial and temporal coverage, sample size, and availability for a disease population of interest. Recently, many analyses have instead attempted to link spatiotemporal imaging pathology from neurodegenerative disease cohorts with template distributions of molecular features, such as mRNA expression for over 20,000 genes from the Allen Human Brain Atlas (AHBA) [193, 194], or neurotransmitter receptor densities from post-mortem autoradiography [195, 196] or PET imaging [197]. The growing body of research integrating neuroimaging-derived features with whole-brain molecular data (typically from averaged templates) has been termed the “molecular nexopathy paradigm” [198] or “imaging transcriptomics” [199, 200] (Fig. 3A). In this section, we summarize recent attempts to integrate molecular and cellular features in computational models of disease progression. We begin with several studies that show spatial correlations between imaging and molecular features. We then discuss how cellular and molecular features are used to augment mechanistic models, such as molecular-informed network propagation models and whole-brain dynamical models of coupled molecular and macroscopic physiological systems [201].
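
The spatial correlation analyses that open this section can be sketched as a regional correlation with a permutation-based null. The example below is a minimal illustration under stated assumptions: both regional maps are invented, and the naive label-shuffling null ignores spatial autocorrelation, which autocorrelation-preserving nulls (such as spin tests) are designed to address. Function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long regional maps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling regional labels of `y`.

    Naive null: regions are treated as exchangeable, which real brain
    maps generally are not because of spatial autocorrelation.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical regional values: atrophy in patients vs. a template receptor map.
atrophy = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90, 0.20, 0.70]
receptor_density = [1.0, 2.1, 1.8, 3.9, 3.2, 4.5, 1.1, 3.3]

r, p = permutation_p_value(atrophy, receptor_density)
print(round(r, 3), round(p, 4))
```

A significant co-localization of this kind is correlational only; it motivates, but does not replace, the mechanistic models that incorporate molecular features explicitly.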

Fig. 3: Linking cellular and molecular architecture with large-scale brain alterations.

A Imaging transcriptomics analyses typically use spatial correlations to identify cellular and molecular features co-localized with imaging alterations. B Dynamical systems models can explicitly incorporate properties such as a neurochemical architecture as a mediator of imaging-measured physiological interactions. The figure has been adapted with permission from [237]. C Biophysically constrained models consider cellular, mesoscale circuit, and macroscopic network effects at the appropriate scale. These models can incorporate synaptic mechanisms such as amyloid- and tau-mediated hyperexcitability [275] and serotonergic receptor-mediated gain modulation [260], and simulate their consequences on large-scale neuronal activity. The figure has been adapted with permission from [260].

Neurochemical correlates of functional, perfusion, and structural alterations

Neurotransmitter receptors are particularly relevant to behavioral function, interactions between physiological systems, and pharmacological response. Neurotransmission dysfunction is implicated in many neurodegenerative disorders including AD and PD, and in their frequently co-occurring psychiatric symptoms [202, 203]. However, the expense of PET and the lack of in vivo radioligands [204] has impeded large-scale, case-control imaging studies for many receptor types in disease populations. Nevertheless, healthy template distributions of neurotransmitter receptors are an informative proxy, and their physiological relevance to various populations has been supported by co-localization with macroscopic imaging signatures.

Because neurotransmitter receptors form the signaling system underlying neuronal activity, a natural first question is how their architecture relates to spatial findings from functional and perfusion imaging. Cerebral blood flow (CBF) response to multiple drugs in young, healthy subjects is spatially correlated with autoradiography-derived receptor densities according to the corresponding drug-receptor affinity [205]. In the case of the dopaminergic D2 receptor, antipsychotic CBF response was better explained by PET-derived receptor density maps than by the mRNA expression profile of the corresponding gene DRD2 [206]. This is likely due to the many intermediary post-transcriptional steps separating gene expression from functioning receptors, supporting the complementary, but not identical, informativeness of these features. Acknowledging the multi-receptor binding of most psychedelic drugs and the complex interactions between various neurotransmitter systems, Luppi et al. found that pharmacologically induced functional network reorganization is co-localized with neurotransmitter receptor expression and with regional susceptibility to cortical thinning in 11 neurological, developmental, and psychiatric disorders [207].

The impact of neurochemistry on structural vulnerability has also been supported by group-wise differences in the spatial correlation between receptor densities and disease-associated imaging features, such as atrophy patterns in schizophrenia patients with dyskinesia or parkinsonism [208], cortical thinning in PD patients with and without visual hallucinations [209], and white matter tract alterations in major psychiatric disorders (MPDs) [210]. In addition to patients with psychiatric symptoms, recent works have also co-localized healthy neurotransmitter receptor and transporter expression with structural and functional alterations in neurodegenerative disorders. In behavioral variant FTD patients, reduced fractional amplitude of low frequency fluctuations (fALFF) in fronto-temporal and fronto-parietal regions correlated with the densities of serotonergic 5HT1B and 5HT2A, GABAA, and D2 receptors as well as the norepinephrine transporter [211]. In particular, the strengths of the latter two associations correlated with symptom severity. In a PD cohort, fALFF alterations significantly associated with healthy D2 and 5HT1B receptor templates for both on and off levodopa conditions [212]. Voxel-wise gray matter volume differences spatially correlated with D1 receptor and serotonin transporter densities in primary progressive aphasia (PPA) patients [213], and these spatial correlations are dependent on genotype and disease stage in FTD patients [214]. Specifically, prodromal C9orf72 mutation carriers were associated with dopaminergic and cholinergic pathways, and MAPT carriers were linked to dopaminergic and serotonergic pathways, whereas no significant neurotransmission associations were found for prodromal GRN carriers. On the other hand, symptomatic FTD patients of all subtypes showed multi-receptor involvement including dopaminergic, serotonergic, glutamatergic, and cholinergic pathways [214]. 
These studies suggest that the neurochemical architecture of the brain may influence the selective vulnerability of brain regions to structural and functional alterations, a topic that is further explored by mechanistic models discussed in later sections.
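As an illustration of the co-localization analyses described above, the following is a minimal sketch of a spatial correlation between a receptor density map and a regional alteration map, assessed with a permutation test. The regional values and the function names (`pearson`, `permutation_p_value`) are hypothetical and are not taken from any cited study. The naive shuffling used here ignores spatial autocorrelation, which published analyses typically address with spatially constrained null brain maps.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(receptor_map, alteration_map, n_perm=2000, seed=0):
    """Return the observed correlation and a two-sided permutation p-value.

    Assumption: regions are exchangeable. This ignores spatial
    autocorrelation, so the test is more liberal than spatial null models.
    """
    rng = random.Random(seed)
    observed = pearson(receptor_map, alteration_map)
    shuffled = list(alteration_map)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(receptor_map, shuffled)) >= abs(observed):
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical regional values (arbitrary units), one entry per brain region.
receptor_density = [0.9, 0.7, 0.8, 0.3, 0.2, 0.4, 0.6, 0.1]
atrophy_score = [1.8, 1.5, 1.4, 0.7, 0.3, 0.9, 1.1, 0.2]

r, p = permutation_p_value(receptor_density, atrophy_score)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

In practice, the same comparison would be repeated for each receptor or transporter template, with correction for multiple comparisons.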

Transcriptomic correlates of imaging alterations

The spatial variation of gene expression in the brain and its relationship to imaging-derived features is also a topic of increasing interest [215]. Gene expression provides complementary molecular information to neurochemistry, and can be related to specific biological pathways using gene ontology. Correlative analyses using transcriptomic data have been applied to imaging signatures such as morphometric alterations in psychiatric disorders [216] and inter-individual variability in healthy white matter functional connectivity [217]. The transcriptomic correlates of white matter tract alterations are consistent with genes associated with MPDs from other lines of evidence, such as genome-wide association studies (GWAS) [210].

The correspondence between specific, disease-associated genes and imaging measures can be disease- and pathology-specific; AD amyloid deposition shows a moderate positive correlation with the amyloid precursor protein gene APP, whereas neurodegeneration instead shows a similar association with the tau-associated gene MAPT [218]. In the main FTD genotypes, there is no significant correlation between atrophy patterns and C9orf72, GRN, and MAPT expression [219]. However, genes associated with astrocytes and endothelial cells were overexpressed in regions with high atrophy, while neuronal- and microglial-associated genes were overexpressed in spread regions. In ALS patients, only OPTN showed a significant correlation with atrophy among disease-associated genes [220]. These correlative analyses can thus offer data-driven insight in addition to risk genes identified by GWAS.

Co-localization of imaging alterations and cell type expression

In addition to the contributions of diverse molecular pathways, the differential involvement of various cell types in neurodegenerative disorders is increasingly acknowledged [221,222,223]. Even characteristic disease genes, such as the APOE ε4 allele, appear to have cell type-specific effects [224], and mediation analysis suggests a pathway from tau pathology to cognitive decline via specific inhibitory neuronal, oligodendrocyte, astrocyte, and endothelial cell populations [225]. In a case-control comparison of post-mortem tissue from AD patients and controls, the expression of cell-type marker genes points to a decrease in excitatory neurons but an increase in inhibitory neurons and astrocytes in regions associated with AD cortical thinning [226]. Whole-brain cell type proportion estimates from the AHBA gene expression data also indicate a correlation between the densities of non-neuronal cell types, particularly microglia and astrocytes, and atrophy across 11 neurodegenerative diseases including early and late onset AD, PD, ALS, FTD and dementia with Lewy bodies [227].

However, tissue heterogeneity is a notable limitation in bulk transcriptomics. As genomics progresses from bulk tissue to single-cell/nucleus sequencing and spatial transcriptomics [228, 229], and larger omics datasets such as the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) become available [230], there is an increasing opportunity to integrate molecular information across scales and characterize regional and inter-individual variability [231, 232]. For example, Zeighami et al. combined the AHBA with single-cell gene expression from the middle temporal gyrus to compare the spatial expression patterns of disease-associated genes for 40 brain disorders, including neurodegenerative, developmental, psychiatric and movement disorders, and identify enrichment in specific cell types [233].

These applications represent some of the first efforts at resolving the cell type basis of neuroimaging alterations. Given the complicated and non-specific interpretation of many imaging measures (e.g., the influence of hemodynamics and afferent signals over regional activity on the BOLD signal [191]), the ability to disentangle the contributions of diverse cell types is a promising development.

Neurochemical and transcriptomic features in causal models

Molecular features can also be used to augment causal and network models discussed previously. Extensions of MCM have incorporated molecular mediation of interactions between pathological factors (Fig. 3B) to improve the model explainability of structural, functional, metabolomic, cerebrovascular, and proteinopathy alterations in the AD spectrum [234, 235]. In these personalized models, inter-individual variability in receptor-mediated interaction terms closely correlated with symptom severity. In AD, two “disease axes”, consisting of receptor-mediated biological interactions (e.g., between vascular and metabolic alterations), robustly correlated with inter-individual variability in i) executive dysfunction and ii) memory, language, and visuospatial symptoms [235]. Consistent with the dual syndrome hypothesis of PD [236], two distinct axes of model-inferred receptor-mediated interactions were identified: a primary axis corresponding to motor symptoms, and a secondary axis corresponding to visuospatial, psychiatric, and memory symptoms with a strong cholinergic component [237]. Likewise, inter-individual co-variability between transcriptomic contributions to imaging alterations and symptom severity in AD suggests the involvement of a wide variety of pathways, ranging from oxidative stress and immune/inflammatory response to G protein-coupled receptors and mRNA splicing [234]. These findings support a biologically and clinically relevant role of multiple neurotransmitter systems and diverse molecular pathways, informed by the neurochemical and transcriptomic organization of healthy brains.

The relative influence of local molecular vulnerability on connectome-driven propagation of proteinopathy (e.g., amyloid and tau in AD) remains an open question. Based on graph and network metrics in cognitively normal subjects, amyloid propagation co-localized with CLU expression and dendritic genes, whereas tau propagation was associated with MAPT expression and axonal genes, in addition to a shared association with lipid metabolism and the APOE gene [238]. Gradients of APOE and the glutamatergic synaptic gene SLC1A2 expression are also implicated in the tau spreading network in cognitively unimpaired subjects [239]. However, an NDM suggests that gene expression alone does not explain pathology in AD; connectome-driven propagation predicts atrophy and hypometabolism better than the expression of single genes or principal components of multiple genes [240].
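To make the notion of connectome-driven propagation concrete, the following is a minimal sketch of a network diffusion model, in which regional pathology x evolves as dx/dt = -β L x, where L is the graph Laplacian of the structural connectome. The four-region chain connectome, the seed region, and the parameter values are hypothetical, and forward Euler integration is used only for simplicity.

```python
def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric weighted adjacency matrix."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]

def diffuse(adjacency, x0, beta=0.5, dt=0.01, steps=1000):
    """Integrate dx/dt = -beta * L x with forward Euler steps."""
    lap = graph_laplacian(adjacency)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [-beta * sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x

# Hypothetical connectome: four regions connected in a chain.
connectome = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
seed = [1.0, 0.0, 0.0, 0.0]  # all pathology starts in region 0

pathology = diffuse(connectome, seed)
print([round(v, 3) for v in pathology])
```

In this sketch, pathology decreases with graph distance from the seed and the total amount is conserved. Local molecular vulnerability could be introduced by adding region-specific production or clearance terms, which is one way the influence of gene expression can be compared against pure network spread.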

The pathways linking microscopic aggregation of α-synuclein with macroscopic functional activity and global brain network dysfunction in PD are also unresolved. Zheng et al. developed an epidemic spreading model of atrophy as a combination of α-synuclein mediated neurodegeneration and deafferentation (Fig. 2E) [182]. Informed by evidence of gene function, the production and clearance of α-synuclein in this model were determined by the regional expression levels of SNCA and GBA, respectively. These transcriptomic features significantly improved model fit, with the substantia nigra identified as the region most likely to result in an epidemic spreading condition from an initial misfolded protein seeding. Similar dependence on structural brain networks and transcriptomic factors (SNCA and GBA) was also observed in the related synucleinopathy of isolated REM sleep behavior disorder using compartmental modeling [241]. Validating a computational network diffusion model in mice injected with α-synuclein, Henderson et al. found evidence for primarily retrograde transmission and dependence on SNCA expression [242]. In sporadic and genetic bvFTD, an agent-based spreading model implicated both network spreading effects and transcriptomic vulnerability [243]; atrophy patterns from deformation-based morphometry (DBM) were correlated with the expression of FTD-associated C9orf72 and TARDBP genes. Epicenters varied between groups, potentially reflecting the convergence of multiple pathogenic factors to a common clinical syndrome mediated by network architecture [243]. These diverse applications demonstrate how transcriptomic data can be integrated into mechanistic models with or without prior knowledge of gene function.

Linking molecular features to model-inferred treatment needs

With regional variability in physiological interactions, connectome-based spreading of pathological factors, complex relationships between physiological and clinical biomarkers, and cell type-specific vulnerability, clinical prognosis can be complicated. Notably, causal models such as MCM can solve the problem of optimal, personalized treatment using the mathematical tools of control theory [156]. Optimal controller design supports the efficiency of multi-factorial treatments, and notably, model-derived personalized therapeutic intervention fingerprints were found to better predict plasma gene expression than clinical assessments in the ADNI cohort [156]. In the PPMI cohort, imaging-derived therapeutic intervention fingerprints correlated significantly with genetic factors and plasma gene expression that also explain levodopa response [244].

The recently emerging body of work integrating molecular features with longitudinal imaging and clinical data indicates that connectivity, multiple transcriptomic pathways, diverse neurotransmitter systems and cytoarchitecture together determine regional vulnerability to physiological alterations in neurodegenerative disorders. As such analyses proliferate, it is important to note several sources of variability. Post-mortem data is rare and typically under-sampled compared to imaging data, and inter-subject or even inter-hemisphere variability is not fully characterized. At a more fundamental level, pleiotropic genes and polygenic traits complicate the reverse inference of genes responsible for imaging phenotypes [245]. Future work should aim to standardize methodology, for example, microarray probe selection, the choice of brain atlas, interpolation, lateralization, within- and across-donor normalization, null brain maps, and open-access toolboxes [246,247,248,249]. Nevertheless, the nascent field of imaging transcriptomics, using molecular data from representative populations [199], is a promising approach to linking in vivo macroscopic alterations with molecular pathways.

Biophysically constrained multi-scale dynamical models

Connectivity and interactions in the brain span various scales, from local synapses and mesoscale circuits to long-range projections between distant brain regions [250]. As a result, there are complex relationships between microscopic molecular factors such as gene expression, cellular properties such as membrane potential and spike density, aggregation of neurotoxic pathology and neuronal activity, and macroscopic brain network dynamics. While the models discussed so far can hint at disease-relevant aspects of brain organization, the propagation of dysfunction up the hierarchy from microscopic to macroscopic scales has not been explicitly addressed.

Biophysically constrained, whole-brain models of neuronal activity attempt to integrate data from multiple spatial scales to capture these relationships, typically with connectivity at the scales of i) cortical circuits comprising interacting excitatory and inhibitory neuronal populations and ii) long-range projections between macroscopic regions [251]. We distinguish these biophysically constrained models by their explicit modeling of multiple levels of brain organization. These models must be detailed enough to provide mechanistic specificity, yet coarse-grained enough to be tractable. Microscopic properties of neural populations are typically averaged into spatiotemporal mean field or temporal neural mass models consisting of interacting populations of excitatory and inhibitory neurons, which contribute to macroscopic regional signals and network phenomena (Fig. 3C). By optimizing regional and global parameters to fit empirical data (e.g. fMRI or EEG/MEG signals), such approaches can evaluate the influence of cellular and molecular features on macroscopic alterations [252,253,254], and identify treatment targets for pharmacological interventions or brain stimulation [154].
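As a simplified illustration of a neural mass description, the sketch below simulates a single region as coupled excitatory (E) and inhibitory (I) populations in the style of the Wilson-Cowan equations. All coupling weights, thresholds, and time constants are hypothetical values chosen for illustration, not parameters from any cited model. Whole-brain models would couple many such regions through the structural connectome and add a forward model for the measured signal.

```python
import math

def sigmoid(x, threshold=0.0, gain=1.0):
    """Population response function mapping net input to activity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))

def simulate_ei(w_ee=12.0, w_ei=10.0, w_ie=10.0, w_ii=2.0, drive=1.5,
                tau_e=0.01, tau_i=0.02, dt=0.0005, steps=4000):
    """Forward Euler simulation of one excitatory-inhibitory neural mass.

    w_xy is the (hypothetical) coupling weight onto population x from y.
    Returns a list of (E, I) activity pairs, one per time step.
    """
    e, i = 0.1, 0.1
    trace = []
    for _ in range(steps):
        de = (-e + sigmoid(w_ee * e - w_ei * i + drive, threshold=4.0)) / tau_e
        di = (-i + sigmoid(w_ie * e - w_ii * i, threshold=3.7)) / tau_i
        e += dt * de
        i += dt * di
        trace.append((e, i))
    return trace

def mean_excitation(trace):
    """Average excitatory activity over the second half of a simulation."""
    tail = trace[len(trace) // 2:]
    return sum(e for e, _ in tail) / len(tail)

baseline = simulate_ei()
reduced_inhibition = simulate_ei(w_ei=6.0)  # weaker inhibition onto E
print(f"baseline mean E: {mean_excitation(baseline):.3f}")
print(f"reduced-inhibition mean E: {mean_excitation(reduced_inhibition):.3f}")
```

Comparing simulations under different coupling weights, as in the last lines, mirrors in miniature how such models are used to ask which cellular-scale change best reproduces an observed macroscopic alteration.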

Evaluating effective connectivity

Dynamic Causal Modeling (DCM) is a popular framework for model-based hypothesis testing [255], and has been used in many studies on task-based or resting state functional imaging (i.e., fMRI, EEG and MEG). At the core of DCM are individualized differential equation models of excitatory and inhibitory neural masses with local connections in cortical microcircuits as well as laminar-specific inter-regional projections. A forward model transforms this modeled neuronal activity into measured signal (e.g., BOLD signal for the fMRI models). Bayesian model comparison is then used to evaluate competing models [256], and DCM parameters can be compared across subjects and diagnostic classes. Unlike correlative measures (e.g., functional connectivity), DCM employs a causal model, and examining its parameters enables analysis of properties such as the effective connectivity of regional neuronal populations [257]. For example, DCM-inferred effective connectivity from the left dorsal premotor cortex to the left superior parietal cortex was (negatively) correlated with years to clinical onset in pre-symptomatic HD mutation carriers [258].
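The Bayesian model comparison step can be illustrated with a short sketch. Assuming a uniform prior over candidate models, the posterior probability of each model is proportional to the exponential of its log evidence (in DCM this is usually approximated by the variational free energy). The model names and log evidence values below are hypothetical.

```python
import math

def posterior_model_probabilities(log_evidences):
    """Posterior model probabilities under a uniform prior over models.

    Subtracting the maximum log evidence before exponentiating avoids
    numerical underflow for large negative values.
    """
    best = max(log_evidences.values())
    weights = {name: math.exp(value - best) for name, value in log_evidences.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical approximate log evidences for three competing connectivity models.
log_evidence = {
    "forward_only": -1050.0,
    "forward_backward": -1046.0,
    "no_coupling": -1060.0,
}

posterior = posterior_model_probabilities(log_evidence)
log_bayes_factor = log_evidence["forward_backward"] - log_evidence["forward_only"]

for name, prob in posterior.items():
    print(f"{name}: {prob:.3f}")
print(f"log Bayes factor (forward_backward vs forward_only): {log_bayes_factor:.1f}")
```

Because only differences in log evidence matter, a difference of a few units between two models already corresponds to a strong preference for one over the other.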

Neurotransmission modulates functional activity on a fixed structural connectome

A cognitively essential phenomenon that spans spatial scales and neurophysiological systems is the emergence of complex neuronal dynamics on a relatively fixed structural network via neurotransmitter modulation [259]. Multi-scale dynamical models are well suited to evaluating the effect of neurotransmission on the activities of local neuronal populations, as well as their hemodynamic or electrophysiological signatures (via observable BOLD or M/EEG signals).

The serotonergic system has several agonists of neuropsychiatric interest including psilocybin and LSD. Given the relatively fast action of these drugs, they offer a testbed for in silico dynamical modeling of pharmacological interventions. To investigate the effect of LSD on functional dynamics in a whole-brain, mean-field computational model, Deco et al. incorporated a single global gain parameter mediating the effect of local 5HT2A receptor density on regional neuronal activity [260]. In this model, neuronal parameters were first fit to minimize the statistical distance between the temporal correlations of simulated and placebo condition functional connectivity matrices, and then tuned to the LSD condition using the global gain parameter. To simulate the effects of psilocybin intake on the BOLD signal, Kringelbach et al. instead used a dynamically coupled model of neuronal-neurotransmitter interaction [259]. Serotonin release is determined by neuronal activity, and vice versa, with regional 5HT2A receptor density modulating the effect, and the model is fitted to features obtained by clustering the phase coherence between regional activity in a reduced dimensional space. The results support the importance of specific receptor density in both models; the 5HT2A receptor distribution is significantly more informative to the pharmacological response of LSD [260] and psilocybin [259] than other serotonergic receptors or the serotonergic transporter. This molecular insight has therapeutic implications since pharmacological treatment of psychiatric disorders typically involves selective serotonin reuptake inhibitors (SSRIs) acting via the transporter.
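The idea of a single global gain parameter acting through regional receptor density can be sketched as follows. Here the slope of a generic sigmoidal response function is scaled by 1 + s × d, where d is the regional 5HT2A density and s is the global gain. This functional form, the function name `firing_rate`, and all numerical values are illustrative assumptions rather than the exact formulation of the cited models.

```python
import math

def firing_rate(input_current, receptor_density, global_gain,
                slope=1.0, threshold=1.0):
    """Sigmoidal population rate with receptor-dependent gain.

    The local gain is 1 + global_gain * receptor_density, so regions with
    higher receptor density respond more steeply to the same input.
    """
    local_gain = 1.0 + global_gain * receptor_density
    return 1.0 / (1.0 + math.exp(-slope * local_gain * (input_current - threshold)))

# Hypothetical normalized 5HT2A receptor densities for three regions.
densities = {"region_a": 0.2, "region_b": 0.5, "region_c": 0.9}

for condition, gain in (("placebo", 0.0), ("drug", 2.0)):
    rates = {
        name: firing_rate(1.5, density, gain) for name, density in densities.items()
    }
    summary = ", ".join(f"{name}={rate:.3f}" for name, rate in rates.items())
    print(f"{condition}: {summary}")
```

With zero global gain, all regions respond identically to the same input, which corresponds to the placebo condition. A non-zero gain introduces regional differences that follow the receptor map, and this single parameter is what would be tuned to match the drug condition.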

Psychedelic drug response has also been associated with an increased entropy of electrophysiological signals, and the subjective experience of psychedelics and the increased firing rate entropy may relate to the consequential ease of achieving different dynamical states of activity [261]. Herzog et al. fit mean-field models, with serotonergic gain modulation of neuronal firing rate, to fMRI data from subjects under the influence of LSD and controls, and simulated resting state activity with and without 5HT2A agonism [262]. The regional increase in activity entropy due to LSD was explained well by a combination of local 5HT2A receptor density and connectivity. Using network control theory informed by 5HT2A receptor density, Singleton et al. quantified LSD, psilocybin and DMT response as a reduction in brain network control energy [263, 264], associated with reduced functional connectivity differentiation [265]. In a non-psychedelic scenario, Coronel-Oliveros et al. demonstrated the relevance of PET templates of cholinergic receptors and transporters to resting state and attentional task activity [266]. Whole-brain models based on neural masses were fit to EEG and BOLD signals from nicotine users. The model-inferred mechanistic effect of nicotine was reduced global coupling and local feedback inhibition. Furthermore, nodal functional connectivity changes correlated with α4β2 receptor density [266].

Other multi-scale dynamical models have also incorporated aspects of neurotransmission. A mean field model links observable measures of functional integration/segregation with unobservable neurotransmitter kinetics (synaptic release and receptor binding) and its coupling with neuronal activity [267]. This model demonstrates that departures from an optimal E/I (i.e., glutamate/GABA) balance are associated with altered network measures of integration/segregation observed in functional connectivity analysis in neurological disorders [267]. A DCM study also integrated neurotransmitter concentrations from magnetic resonance spectroscopy (MRS) and resting state activity from MEG, to examine the specific connections affected by inter-individual differences in neurotransmitter concentrations in healthy subjects [268]. As expected, GABA concentrations influenced local recurrent inhibitory effective connectivity in the model, while glutamate levels influenced excitatory connections.

Mechanisms of excitatory-inhibitory imbalance and excitotoxicity in neural mass models

Electrophysiological data from M/EEG provides better temporal resolution at the expense of the spatial resolution of fMRI, and certain features such as attenuation of specific spectral bands are characteristic of AD [269]. In an early work combining neural mass modeling with a connectome template in AD, de Haan et al. simulated electrophysiological signals in healthy and diseased states with activity-dependent degeneration of synapses in response to spike density [270]. Compared to non-specific degeneration, activity-dependent degeneration better explained the structural and functional network alterations expected in AD, including oscillatory slowing, power spectrum attenuation, long-range desynchronization, hub vulnerability, and altered functional networks [270]. Notably, hub regions with high connectivity showed specific vulnerability as the sites of both higher amyloid deposition and increased neural activity. Although the symptomatic correlates of these alterations were not characterized, such models can be used to simulate expected macroscopic outcomes of interventions [271]. Counter-intuitively, excitatory neuronal stimulation was found to best preserve network activity. These findings highlight the complex response of the brain to simple perturbations, and the need for principled modeling of treatment effects. In AD patients, Sanchez-Rodriguez et al. used the framework of optimal control to determine stimulation target regions to steer the alpha band power spectrum towards a healthier, higher frequency state [154]. Notably, individuals with high anatomical connectivity (i.e., short path lengths and high global efficiency) had a lower stimulation energy cost.

Fitting MEG data from controls and MCI patients with amyloid pathology, van Nifterick et al. evaluated various mechanistic hypotheses of cellular alterations to excitatory and inhibitory populations in AD [272]. Pyramidal neuronal hyperactivity, inhibitory neuronal hypoexcitability, increased excitatory-excitatory coupling and decreased inhibitory-excitatory coupling were linked to oscillatory slowing [272]. Similar neural mass models have been used to assess various candidate markers of excitation-inhibition (E/I) ratio [273].

Other works explicitly include local amyloid and tau levels and their pathological propagation in neural mass models. Alexandersen et al. assumed connectivity-driven propagation of tau and more diffuse spatial spreading of amyloid simultaneously from multiple epicenters [274]. This model showed an initial increase in alpha band power in simulated M/EEG signals followed by a decrease, as well as a slowing of alpha band oscillations due to a decrease in excitatory activity and increased global coupling.

With the advent of new imaging targets, dynamical models can incorporate several molecular factors simultaneously. Sanchez-Rodriguez et al. combined structural, functional, amyloid, tau, and glial imaging, plasma markers, and clinical data from 132 subjects on the AD spectrum from the Translational Biomarkers in Aging and Dementia (TRIAD) cohort in a neural mass model [275]. The subject-specific influences on neuronal excitability (i.e., firing thresholds) of regional levels of amyloid, tau, and their synergistic interaction were optimized to reconstruct individuals’ BOLD signals. AD subjects were characterized by lower alpha power and increased theta power, with neuronal excitability differing based on amyloid status and Braak stage. Notably, model-inferred, latent neuronal hyperexcitability correlated with worsened cognitive symptoms and plasma tau biomarker concentrations [275].

The Virtual Brain (TVB) [276] is a multi-scale, whole-brain mean-field modeling framework that has been used to reproduce features of empirical fMRI and EEG signals [277], relate them to cellular-scale properties such as E/I balance [278], optimize lead placement for deep brain stimulation [279] and predict functional connectivity outcomes of neurosurgery [280, 281]. In the context of neurodegenerative disorders, TVB has also been used to infer the macroscopic impact on EEG signals of amyloid modulation of neuronal dynamics [282]. In AD patients, simulations reproduced properties of EEG signals, and reducing model weights (simulating the effects of the NMDA receptor antagonist memantine) partially reversed the characteristic oscillatory slowing [282]. Dynamical models such as TVB can also be used to infer relevant mechanistic alterations. Fitting resting state fMRI data in controls, amnestic MCI subjects and AD patients, statistically significant inter-subject correlations were observed between model parameters (representing excitatory-excitatory, excitatory-inhibitory, inhibitory-excitatory, and global coupling) and various cognitive domains [283]. These model parameters can differ between diagnostic categories and functional networks. For example, AD patients have significantly increased excitatory coupling in the default mode network, but it is reduced in the somatomotor network. The frontoparietal network, which is preserved in AD, involves alterations to all 4 TVB parameters in FTD [284].

Perturbational trajectories in low dimensional space

Low-dimensional embeddings can be useful in capturing the salient structure of high-dimensional brain states, such as neuronal activity or functional connectivity [285]. To compare the regional effects of external stimulation on different brain regions and across conditions, Sanz Perl et al. fit a phenomenological whole-brain model to the empirical functional connectivity of healthy controls, and AD and behavioral variant FTD patients [286]. Different waveforms of stimulation were applied to these models (specifically, to the bifurcation parameter, related to the excitatory-inhibitory balance). The resulting functional connectivity trajectories were visualized in a low dimensional space via variational autoencoders (VAEs), a non-linear dimensionality reduction technique with a regularized latent space. While the parameters of this model themselves are difficult to interpret, proximity in the VAE latent space implies similar functional connectivity, and diagnostic classes clustered well in the latent space representation of functional connectivity. Perturbational trajectories in this latent space were then used to determine the proximity of different brain regions to the distribution in healthy controls. Depending on the waveform, stimulation to visual areas, the sensorimotor cortex, and the temporal lobe, including the hippocampus, perturbed AD subjects towards healthy latent representations. Meanwhile, frontal regions were the most important to behavioral variant FTD perturbations [286].

Multi-modal data integration in biophysical models

Dynamical models are a promising approach to understanding the molecular mechanisms behind macroscopic observations, and their associations with clinical variables at the group or individual level. Many dynamical models have incorporated molecular information, particularly neurotransmission and neurotoxic proteinopathy mechanisms, and explored the effects of external perturbation. More generally, other molecular drivers of regional susceptibility can also be incorporated, potentially informed by the models presented in the preceding sections. In contrast to the many dynamical models that simulate functional or electrophysiological activity, Khanal et al. developed a biophysical model of atrophy and brain deformation to generate realistic simulated atrophy patterns from longitudinal data in AD [287]. Accounting for the different mechanical properties of parenchyma and CSF, tissue remodeling minimizes internal mechanical stress due to neuronal death. A promising avenue of future work is the incorporation of other physical drivers of neurodegenerative brain alterations, including mechanical stress due to atrophy or inflammation, molecular influences on cellular properties, macroscopic influence on brain networks, and global effects of environmental factors. Looking beyond dynamical models of neuronal activity and considering mechanical effects is a promising direction for multi-scale models [288].

Discussion and conclusion

Summary

Despite varying genetic [144], environmental [145], and age-related [10] risk factors, causes of sporadic neurodegenerative disease onset remain unknown. Healthy brain function requires the coordination of multiple physiological systems, and neurodegenerative disorders can involve altered neuronal activity, proteinopathies, vascular dysfunction, neuroinflammation, metabolic alterations, cell death, and atrophy. In the preceding decade, computational models of these multi-factorial processes have proliferated. Continuous time DPMs and discretized EBMs have provided data-driven staging of biomarker abnormality, network propagation models have characterized pathology propagation across brain networks, the integration of molecular data sources has identified salient aspects of cyto-, receptor- and transcriptomic-architecture, and dynamical systems models have been used to reproduce and evaluate disease mechanisms.

Causal inference using computational models

Yet, the central question remains: what are the causes of sporadic neurodegenerative disease onset, and how can they be treated? Over the 20th century, the post mortem clinico-anatomical method has been largely superseded by correlational in vivo neuroimaging studies [93], but this has not resulted in robust, disease-specific diagnostic or prognostic markers [289, 290]. Biomarker correlates of symptoms and treatment effects may be spurious, compensatory, or secondary to causal pathogenic mechanisms. The gold standard of causal evidence is the randomized controlled trial, requiring experimental intervention, in which studied populations, manipulated variables, and observed outcomes are carefully defined and individuals are blindly separated into control and treated groups. In the context of neurodegenerative pathogenesis in humans, this is often infeasible due to ethical concerns, the long temporal scale of disease progression, and the lack of appropriate counterfactuals.

With sufficient coverage of disease variability, neuroimaging-based modeling of disease progression can help address these limitations. Datasets should encompass i) the various stages of disease progression and ii) inter-individual and -subtype variability, although the exact requirements will vary based on the specific objectives and modeling methods. For example, a broad exploratory analysis will likely require more subjects compared to a clearly defined hypothesis-driven study. More complex models (i.e., with more parameters) will generally require more data, and individualized models will typically necessitate multiple longitudinal follow-up visits. Statistical considerations such as the effect sizes of disease features and desired statistical power can also impose sample size requirements. The existence of multiple biological subtypes with the same clinical diagnosis would also require larger sample sizes to distinguish their distinct progression patterns. The enrollment of preclinical, prodromal, and at-risk individuals will also depend on our knowledge of risk factors and predictive markers, while later, more severe stages of disease progression may be underrepresented due to patient drop out. A further consideration is environmental, socioeconomic, and genetic variability, which may not be sufficiently covered by a single center. In practice, sufficient coverage and sample sizes can often be achieved only by multi-site observational studies, which can suffer from recruitment bias, missing data, patient drop-out, and varying protocols. While standardized workflows and validation in independent datasets may alleviate some of these issues, appropriate post-hoc harmonization is often necessary to correct for technical and sample differences in data acquired from multiple centers [291]. 
Nevertheless, large observational studies can still be leveraged for causal inference [292], and the Bradford Hill criteria provide a blueprint to design experiments and analyses to evaluate causality [93]. For example, DPM and EBM approaches can support or refute purported causal relationships between biomarkers via temporality, while dynamical systems approaches can incorporate explicit causal structure and are well suited to modeling response to external perturbation. Other statistical techniques such as regression discontinuity design, difference-in-differences, Bayesian networks, and structural equation modeling are also appropriate for quasi-experimental causal inference [293]. Finally, we must be aware of the multiplicity of meaning behind the term “causal mechanism” in the literature, which can range in spatial scale from molecular interactions and cellular processes to circuit properties and abstract topological and network concepts [294].
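As an illustration of one of the quasi-experimental techniques named above, the following is a minimal sketch of a two-group, two-period difference-in-differences estimate. It is not drawn from any of the cited studies: the function name, the outcome (annual atrophy rate), and all numbers are hypothetical, and the estimate is only interpretable causally under the parallel-trends assumption (that both groups would have changed equally in the absence of the exposure).

```python
from statistics import mean


def difference_in_differences(exposed_pre, exposed_post, control_pre, control_post):
    """Change in the exposed group minus change in the control group.

    Each argument is a list of outcome values for one group at one time point.
    Assumes parallel trends between groups absent the exposure.
    """
    exposed_change = mean(exposed_post) - mean(exposed_pre)
    control_change = mean(control_post) - mean(control_pre)
    return exposed_change - control_change


# Hypothetical annual atrophy rates (% volume loss per year); illustrative only.
exposed_before = [1.0, 1.2, 1.1]
exposed_after = [0.8, 0.9, 1.0]
unexposed_before = [1.1, 1.0, 1.2]
unexposed_after = [1.0, 1.1, 1.2]

did = difference_in_differences(
    exposed_before, exposed_after, unexposed_before, unexposed_after
)
print(f"Difference-in-differences estimate: {did:.2f}")
```

In practice, such an estimate would usually be obtained from a regression with a group-by-time interaction term so that covariates and uncertainty can be handled; the subtraction above only shows the core logic.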

In addition to considerations about causality, study design is often not explicitly considered or justified in computational models of observational data. In randomized experiments, there is a clear and well-defined distinction between pre-existing covariates and outcomes after treatment. If the outcomes differ between the two groups by significantly more than would be expected by chance, the difference is attributed to the treatment. However, the distinction between covariates and outcomes can be blurred in observational studies. Well-designed observational studies should aim to approximate the characteristics of randomized experiments. One way to do so is to define multiple control groups to account for plausible alternative explanations [295].

On a related note, case-control studies may be sensitive to the precise selection criteria, particularly in the absence of robust biomarker disease categories. Often, inclusion criteria are intended to homogenize the studied population, but can introduce assumptions and biases. For example, amyloid-based definitions of “typical AD” can affect the results of downstream DPM analysis [130], and it is unclear whether patients with tau but no amyloid pathology should be considered to have early AD or non-AD pathology [296].

Although they are inherently limited by cohorts, study design, and collected data modalities, computational brain models can potentially resolve the “causality gap” [297], a prerequisite for improving treatment target selection [154, 298]. Conversely, interventional studies can provide rich evidence to evaluate generative brain models and resolve causality. Autopsies [299] and biopsies conducted during standard treatment procedures (e.g., deep brain stimulation [300] or tumor [231] surgeries) can also be a valuable source of omics data with minimal modifications to routine clinical workflow. Phenotyping patients selected based on genotype also offers an alternative to the typical imaging transcriptomics workflow [245]. As the field continues to mature, closer links between experimental design, computational modeling, and clinical considerations are imperative to resolving the unmet potential of neuroimaging-based modeling in clinical practice [290].

Clinical applications of computational models

Computational models also have rich applications outside of causal inference and scientific hypothesis testing. Suggested use cases of DPM-inferred latent disease time include defining endpoints [78] and inclusion criteria [89, 129] to reduce the number of subjects required to observe intervention effects in clinical trials. Personalized treatment design based on whole-brain dynamical models can also guide clinical interventions (Fig. 4A) [152, 156, 179, 244, 301], a paradigm that has wide applications from psychiatric disorders [251, 302] to epilepsy [303, 304]. While other applications are often focused on controlling pathological neuronal activity via stimulation, neurodegenerative disorders are likely to require multi-faceted treatment addressing the many affected physiological systems [152], with some responding more slowly than others. Indeed, current single-target, anti-amyloid monoclonal antibodies fail to cross the threshold of clinically important cognitive and functional benefit [305]. Model-inferred computational drug repurposing based on associated molecular pathways can also be used to speed up the drug development process (Fig. 4B) [306]. Given potential limitations on the controllability of brain networks as well as the practical implementation of such interventions, significant work is still needed to translate computational models into clinical practice [307].
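To make the link between trial enrichment and sample size concrete, the sketch below uses the standard normal-approximation formula for comparing two group means, n per arm = 2((z_{1-α/2} + z_{power}) / d)^2, where d is the standardized effect size. The function name, the default significance level and power, and the example effect sizes are assumptions chosen for illustration; they are not taken from the cited studies.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm for a two-sided, two-sample comparison.

    effect_size is the standardized mean difference (Cohen's d).
    Uses the normal approximation, so it slightly underestimates small-sample needs.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical scenario: enrichment doubles the observable standardized effect.
unenriched = subjects_per_arm(0.25)
enriched = subjects_per_arm(0.50)
print(f"Per-arm sample size: {unenriched} (d = 0.25) vs {enriched} (d = 0.50)")
```

Because the required sample size scales with the inverse square of the effect size, doubling the observable effect (for instance, by selecting participants at a stage where progression is fastest) cuts the required enrollment to roughly a quarter under these assumptions.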

Fig. 4: Using multi-factorial computational models to improve treatment selection and test mechanistic hypotheses.

Computational models of spatiotemporal pathophysiology progression can go beyond correlative analysis and infer disease-altered mechanisms. (A) Integrative in silico modeling of the progression of multiple biomarkers can be used to predict future disease progression and infer optimal therapeutic interventions at an individualized level [156]. (B) The role of the molecular architecture of the brain in various disease-affected alterations is an open question. Molecular pathways enriched in disease-affected tissue (e.g., where amyloid and tau accumulation alters functional activity) can be used to identify potential therapeutic targets [306]. (C) Computational modeling can benefit from diverse data sources, incorporating population-derived distributions of disease onset age with in vivo biomarker data [78], using homologous structures in other species to inform directed network propagation models [170], and validating model predictions using invasive experiments in animal models [312].

Expanding current whole-brain models

There are several other methodological avenues for future work in whole-brain computational models. Connectome-based modeling typically considers either structural or functional connectivity. However, a more complete picture may also need to consider metabolic, vascular, and molecular connectivity [152]. Comprehensive integration of multiple forms of connectivity and features driving local molecular vulnerability using causal, network, and biophysical models is a promising avenue of research [308, 309].

Supporting these methodological extensions are technical advances providing new sources of molecular data. Causes of selective vulnerability, including cell type, can be further characterized throughout the cellular life cycle by single-cell profiling and iPSC methods [221]. Given the differential contributions of genes associated with specific sub-cellular structures, the ever-improving spatial resolution of cellular profiling is a valuable development [238]. While in vitro and animal models cannot perfectly reproduce cognitive decline and other phenotypic aspects of neurodegenerative disorders [310], they can be used to validate new methods (Fig. 4C). In non-human animals, meso- and micro-scale features can be probed more invasively, and intervention effects can be tested more readily [311]. Computational models of protein propagation in rodents can incorporate features such as directional [170] and meso-scale connectivity [312, 313], as well as microglial influence [314]. Other functionally relevant neuroanatomical features, such as laminar structure, may be further resolved by advances in human imaging [315].

Finally, an important consideration that is often overlooked in DPMs is the role of genetic, sex, environmental, lifestyle, and comorbid risk factors in sporadic disease onset and progression. One way to address this would be through genome-wide association studies (GWAS) with imaging phenotypes and analysis of modifiable risk factors from large observational studies. For example, diabetes, air pollution, and alcohol intake frequency were associated with structural degeneration of a vulnerable brain network [316]. DPMs have also revealed genotype-specific progression patterns. In sporadic AD, an early EBM application noted more homogeneous disease progression in APOE ε4 carriers [125], while a continuous-time DPM demonstrated sex differences in how genotype affects progression [89]. The main familial FTD genotypes also present distinct progression patterns [78]. Data-driven analysis of the impact of other non-autosomal dominant genetic risk factors on progression trajectories would address an important gap in our understanding of sporadic neurodegenerative diseases.

Bringing in vivo biomarkers to clinical practice

Although imaging and fluid measures have shown promise in research settings, we still lack definitive clinical biomarkers across neurodegenerative diseases, particularly in the early prodromal or preclinical stages [317,318,319]. Robust in vivo biomarkers are likely necessary for the detection of disorders in the preclinical phase, when treatment is most likely to succeed, as well as to monitor progression and treatment response [296, 320]. While there may be systemic factors hindering impact on clinical practice, such as a shortage of resources, physician unfamiliarity, non-standardized testing, lack of regulatory approval, incomplete validation, and inconsistent coverage by healthcare systems [42, 321], there are also technical limitations. In vivo biomarkers have varying but generally imperfect specificity and sensitivity to the underlying physiological process of interest. For example, the BOLD signal is merely a proxy for functional brain activity [322], molecular imaging is susceptible to regionally heterogeneous ligand uptake and off-target binding [296], and fluid markers are subject to variable protein kinetics [296]. As such, biomarkers (and any model-inferred features such as disease time) require extensive normative characterization, standardization across studies, replication across cohorts, and validation with neuropathology and clinical status [323].
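The practical consequence of imperfect sensitivity and specificity can be shown with a short worked calculation. The sketch below computes both quantities from a 2x2 comparison against a reference standard (e.g., neuropathology) and then applies Bayes' rule to obtain the positive predictive value at a given prevalence. The function names, counts, and prevalence are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(true_pos, false_pos, false_neg, true_neg):
    """Return (sensitivity, specificity) from counts against a reference standard."""
    sensitivity = true_pos / (true_pos + false_neg)  # fraction of true cases detected
    specificity = true_neg / (true_neg + false_pos)  # fraction of non-cases correctly cleared
    return sensitivity, specificity


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a biomarker-positive individual truly has the pathology."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical validation cohort: 100 pathology-confirmed cases, 100 controls.
sens, spec = sensitivity_specificity(true_pos=80, false_pos=10, false_neg=20, true_neg=90)

# Hypothetical screening setting in which 10% of those tested have the pathology.
ppv = positive_predictive_value(sens, spec, prevalence=0.10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, PPV at 10% prevalence={ppv:.2f}")
```

Under these assumed numbers, fewer than half of positive results would reflect true pathology, which illustrates why a marker that performs well in a balanced research cohort may still be insufficient for preclinical screening where prevalence is low.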

Towards biomarker-based disease definitions

Even with clinical and neuropathological validation, there is room for disagreement about what (combination of) pathology constitutes a particular disorder due to the absence of definitive biological disease definitions. Individuals with tau but no amyloid pathology from PET imaging may be considered to have either early AD or non-AD pathology [296]. Furthermore, co-pathologies are potential sources of heterogeneity driving biological subtypes in disorders such as AD [119].

The knowledge gap between symptomatically defined neurodegenerative diseases and unknown pathogenic causes is a major impediment to drug development [324]. In general, the ratio of research and development expenses to FDA-approved drugs has been rising exponentially since the middle of the 20th century [325]. Yet, potential reversals to this trend may be occurring due to genomics-validated targets for rare diseases, where drug development benefits from biologically homogenized patient populations [326]. Until the recent anti-amyloid monoclonal antibodies, neurodegenerative disorders had suffered for decades from a lack of successful drug trials [320]. The typical pharmacological approach for these complex and heterogeneous diseases is fixated on a single target, usually reducing the insoluble form of a proteinopathy such as amyloid, tau, α-synuclein, or TDP-43 [56, 327, 328, 329]. However, looking beyond the usual proteinopathy suspects can reveal effective modifiable risk factors such as vascular health [146].

To this end, biological disease definitions are imperative, and likely involve molecular networks spanning multiple pathways [324]. A thorough molecular profiling of brain tissue or other biospecimens is needed to stratify biological heterogeneity [123, 330], along with combination therapy addressing the affected molecular networks and macroscopic physiological systems [320, 328]. The integrative computational modeling approaches discussed in this review can support these efforts to uncover the biological basis of clinical heterogeneity in transdiagnostic populations [37, 331].