Introduction

The behaviors of complex systems, ranging from cell migration to pathological protein diffusion, brain activity, human mobility, and bird flocking, exhibit not only nonlinearity but also stochasticity1,2,3. Stochasticity plays a crucial role in enhancing a system’s adaptability to rapidly changing environments4,5, facilitating information processing6,7, and increasing robustness8,9. The emergence of order from disorder has long fascinated scientists, particularly in the context of system dynamics. While the behaviors of complex systems can be observed experimentally, their underlying dynamics remain elusive. Therefore, stochastic differential equations (SDEs) have been widely employed to model such stochastic systems due to their ability to simultaneously describe deterministic evolution and random fluctuations stemming from unresolved degrees of freedom.

However, conventional SDE models used to describe real-world scenarios have certain limitations, such as predefined forms, simplified physics, and assumed parameter values. Fortunately, the increasing availability of empirical data, including network topologies and node activities, provides an opportunity to shift this paradigm. Instead of modeling the dynamics of a complex system using a predefined SDE, it becomes possible to infer the hidden SDE from observational data on system behaviors.

Discovering the governing laws of dynamics from data has become a prominent field of artificial intelligence-empowered scientific exploration10,11,12,13, making significant progress in recent years14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29. Numerous data-driven methods have been proposed to identify ordinary differential equations (ODEs) and partial differential equations for single- and few-body nonlinear systems14,15,16, as well as ODEs for large networks17,18,19. However, these methods may not effectively address real systems exhibiting stochasticity. Previous efforts to learn stochastic dynamics have primarily focused on predicting a system’s future evolution rather than inferring its underlying SDE20. Additionally, the majority of previous methods have been validated on simulated systems with known ground-truth dynamics24,25, and few have demonstrated the ability to infer real stochastic systems with unknown underlying dynamics (with exceptions such as refs. 27,28).

Here, we aim to address a fundamental question: given the observations of network topology and nodes’ activity series, how can we infer the coupled SDEs that capture the hidden stochastic dynamics of a complex system? The main contributions of our work are summarized as follows:

  1. We propose a method termed the Langevin Graph Network Approach (LaGNA) that incorporates an innovative message-passing mechanism to separate the dynamical sources within nodal activity data. The method then infers concise mathematical expressions for each of these sources by leveraging the corresponding neural network modules. Comparative analyses showcase our method’s ability to unveil the hidden coupled SDEs of complex networked systems, with superior performance over five state-of-the-art methods in the field.

  2. We apply our method to natural flocking, an intriguing phenomenon and important research topic in statistical physics and complex systems. From the trajectory data of several flocks, our method successfully infers the SDE of real flocking dynamics. The inferred SDE bears a remarkable resemblance to the second-order Vicsek model, providing unprecedented evidence that the Vicsek model is not just a toy model but captures genuine flocking dynamics.

  3. We apply our method to the spreading of pathological tau protein in Alzheimer’s disease (AD) brains, a frontier problem in neuroscience. From experimental data on tau pathology in AD mouse brains, our method successfully infers a novel SDE that captures the tau diffusion dynamics. The finding not only enables early-stage prediction of the percentage of brain areas that will be affected by tau pathology but also offers novel quantitative insights into the mechanism of tau pathology.

Results

Overview of the LaGNA framework

The state evolution of a complex system is often driven by several dynamical sources, including the self-dynamics of each node, the interactions between nodes, and intrinsic stochastic diffusion. In the first stage of LaGNA (Fig. 1a–d), we design a message-passing mechanism guided by the complex network’s nontrivial topology. The mechanism consists of three neural network (NN) modules, a self-dynamics simulator \(\hat{\mathbf{f}}(\cdot)\), an interaction dynamics simulator \(\hat{\mathbf{g}}(\cdot)\), and a diffusion simulator \(\hat{\boldsymbol{\phi}}(\cdot)\), tailored to separate the dynamical sources hidden in nodes’ activity data (Fig. 1b, d) and differing from the message passing used in graph neural networks30.

Fig. 1: Pipeline of LaGNA.

a Observation data include each node’s activity series and the network topology. b The stochastic dynamics of a complex networked system consists of three components: self (blue), interaction (gray), and diffusion (purple). c Predicted future states of nodes by the well-trained LaGNA. d In LaGNA, each node’s state evolution is governed by three neural network modules: \(\hat{\mathbf{f}}(\cdot)\) for self-dynamics, \(\hat{\boldsymbol{\phi}}(\cdot)\) for diffusion, and \(\hat{\mathbf{g}}(\cdot)\) for interaction. e–g The explicit mathematical expressions of these parts are derived by penetrating the three modules, forming the final stochastic differential equation (SDE), where \(N_{\min}\) represents the minimal number of elementary terms required.

Each node i’s activity at time t is denoted as a d-dimensional vector \(\mathbf{x}_i(t)\equiv(x_{i,1}(t),x_{i,2}(t),\ldots,x_{i,d}(t))^{\mathrm{T}}\). Given the input of nodes’ activities xi(t), i = 1, 2, …, n (Fig. 1a), LaGNA estimates the states at the next time step \(\hat{\mathbf{x}}_i(t+\mathrm{d}t)\) (Fig. 1c) using the following equation:

$$\hat{\mathbf{x}}_i(t+\mathrm{d}t)=\mathbf{x}_i(t)+\Big(\hat{\mathbf{f}}(\mathbf{x}_i(t))+\sum_{j=1}^{n}A_{ij}\,\hat{\mathbf{g}}\big(\mathbf{x}_i(t),\mathbf{x}_j(t)\big)\Big)\,\mathrm{d}t+\hat{\boldsymbol{\phi}}(\mathbf{x}_i(t))\,\mathrm{d}\mathbf{W}_t.$$
(1)

Here, Aij is the adjacency matrix representing the network topology, and Wt is a d-dimensional vector representing the Wiener process (i.e., normally distributed around zero with variance dt). Note that the form of Eq. (1) can describe a wide range of complex dynamical systems31,32,33, including epidemic spreading, neuronal dynamics, ecological dynamics, gene regulation, as well as flocking and tau pathology diffusion, as we show below. At this stage, LaGNA can be viewed as an implicit dynamical system with a large number of trainable parameters: θf, θg, and θϕ, representing the self, interaction, and diffusion simulators, respectively. To capture the underlying dynamics of a given complex system, LaGNA’s outputs \(\hat{\mathbf{x}}_i(t+\mathrm{d}t)\) need to behave similarly to the true observations xi(t + dt). Because of the intrinsic stochasticity, directly minimizing the difference between \(\hat{\mathbf{x}}_i(t+\mathrm{d}t)\) and xi(t + dt) would result in overfitting. Therefore, we train LaGNA with observation pairs xi(t) and xi(t + dt), and obtain its optimal parameters by maximizing instead the expectation:

$$\hat{\boldsymbol{\theta}}_f,\hat{\boldsymbol{\theta}}_g,\hat{\boldsymbol{\theta}}_\phi:=\arg\max_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\mathbb{E}\big[\log p_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\big(x_i(t+\mathrm{d}t)\,|\,x_i(t),\mathrm{d}t\big)\big],$$
(2)

where \(p_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\) is the probability density of the normal distribution generated by the model of Fig. 1b with parameters θf, θg, θϕ. Note that Eq. (2) describes the case d = 1; refer to the “Methods” section for the case d > 1.
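
To make the first stage concrete, the sketch below implements the forward step of Eq. (1) with simple multilayer perceptron modules and a diagonal diffusion term; the module architectures, sizes, and names are illustrative assumptions, not the authors’ implementation.

```python
# Minimal sketch of the LaGNA forward step of Eq. (1), assuming MLP modules
# and a diagonal diffusion matrix (illustrative assumptions).
import torch
import torch.nn as nn

class LaGNA(nn.Module):
    def __init__(self, d, hidden=64):
        super().__init__()
        self.f_net = nn.Sequential(nn.Linear(d, hidden), nn.Tanh(), nn.Linear(hidden, d))      # self-dynamics
        self.g_net = nn.Sequential(nn.Linear(2 * d, hidden), nn.Tanh(), nn.Linear(hidden, d))  # interaction
        self.phi_net = nn.Sequential(nn.Linear(d, hidden), nn.Tanh(), nn.Linear(hidden, d))    # diffusion

    def forward(self, x, A, dt):
        # x: (n, d) node states; A: (n, n) adjacency matrix
        n, d = x.shape
        pairs = torch.cat([x.unsqueeze(1).expand(n, n, d),
                           x.unsqueeze(0).expand(n, n, d)], dim=-1)
        msg = (A.unsqueeze(-1) * self.g_net(pairs)).sum(dim=1)   # sum_j A_ij g(x_i, x_j)
        drift = self.f_net(x) + msg
        sigma = self.phi_net(x).abs()                            # diffusion intensity, kept positive
        mu = x + drift * dt                                      # mean of next state
        var = sigma**2 * dt                                      # variance of next state
        return mu, var
```

Training then amounts to maximizing the Gaussian log-likelihood of the observed next states under the returned mean and variance, as in Eq. (2).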

The well-trained LaGNA can predict future behaviors; however, it still lacks an explicit equation describing the underlying dynamics of the system. In the second stage, we therefore unveil the inner workings of the LaGNA black box. The tailored message-passing mechanism (Fig. 1d) has separated the underlying dynamics into the three neural network modules \(\hat{\mathbf{f}}(\cdot)\), \(\hat{\mathbf{g}}(\cdot)\), and \(\hat{\boldsymbol{\phi}}(\cdot)\). This decomposition allows us to penetrate each module and derive explicit expressions for the three parts. Using pre-constructed comprehensive libraries of terms, i.e., LF, LG, and LΦ shown in Supplementary Information Section I-B, we identify the optimal combination of terms from the libraries using a modified version of our two-phase approach17. Our framework thus separates and identifies concise mathematical expressions for the self-dynamics, interaction dynamics, and intrinsic stochastic diffusion, which together form the final stochastic differential equation (Fig. 1e–g). LaGNA balances the accuracy and complexity of the mathematical expressions (see Supplementary Information Section III-C), making it an interpretable learner for discovering the hidden SDEs of complex networked systems. Further details are described in Methods and Supplementary Information Section I.

Learning the stochastic dynamics of signed and weighted networks

Signed and weighted networks are prevalent in various biological and physical systems. In neuronal systems, for instance, the synapses between neurons can be either excitatory, enhancing the activity of the receiving neuron, or inhibitory, reducing it. In physical systems like power grids and traffic networks, link weights play a crucial role in system characterization. The combined effect of heterogeneous links in these networks makes interactions intricate and poses challenges for dynamics inference.

To address the challenge of interaction heterogeneity and validate the effectiveness of our framework, we simulate a stochastic system with Hindmarsh-Rose (HR) neuronal dynamics on a signed network34 (refer to Supplementary Information Section II-B.3). In the simulations, we randomly assign half of the nodes as excitatory and the other half as inhibitory. Links from excitatory nodes are excitatory with Vsyn = 2, while links from inhibitory nodes are inhibitory with Vsyn = −1.5. To infer the hidden HR dynamics, we incorporate the knowledge of link types and use two NNs to estimate the excitatory and inhibitory interaction dynamics, respectively. It is worth noting that we use only one trial of the nodes’ activity sequence. The results in Fig. 2b–d show that our framework accurately estimates the terms of self, diffusion, and, notably, the two types of interactions. The inferred SDEs successfully reproduce the force field (Fig. 2e) and the stochastic trajectory (Fig. 2f). Additionally, we consider a weighted network with 0 ≤ Aij ≤ 1 and simulate the network dynamics using stochastic Rössler equations35 (refer to Supplementary Information Section II-B.2). The results in Fig. 2h–j show that our framework accurately infers the stochastic Rössler dynamics on weighted networks. The trajectory generated by the inferred SDEs exhibits dynamical characteristics similar to those of the original trajectory (Fig. 2g, l), and the reproduced force field closely aligns with the true force field (Fig. 2k).

Fig. 2: Validating LaGNA on signed and weighted networks.

a–f Inference results of stochastic Hindmarsh-Rose (HR) dynamics on a 20-node signed network, where half of the nodes are excitatory and the other half are inhibitory. a Simulated 3-dimensional stochastic HR trajectory of one node. b–d Comparisons between the values generated by the inferred SDE and those simulated by the original dynamics model for the self (b), interaction (c), and diffusion (d) parts, respectively. e, f The force field and trajectories generated by the inferred SDE. g–l Same as above, but for the inference results of stochastic Rössler dynamics on a 20-node weighted network, where link weights are drawn uniformly at random between zero and one. m Inference inaccuracy of five previous methods compared to LaGNA on the stochastic Lorenz network dynamics benchmark. The bar chart depicts the sMAPE of five independent runs, with standard deviations indicated.

To underscore the significance of our LaGNA method in inferring the stochastic dynamics of complex networked systems, we compare LaGNA with five state-of-the-art methods, namely Modified-SINDy29, Two-Phase inference17, SDE-Net25, SVISE26, and SFI28, using the stochastic Lorenz networked system (refer to Supplementary Information Section II-B.1). In this model system, each node’s state is represented by a three-dimensional vector xi = (xi,1, xi,2, xi,3), where the intrinsic stochastic diffusion in one dimension (e.g., xi,2) can be influenced by another (e.g., xi,3). The stochastic intensity is denoted by \(1/\sqrt{\gamma}\), with smaller γ indicating higher stochasticity. As shown in Fig. 2m, LaGNA achieves significantly lower errors in inferring the networked SDE, outperforming the other methods by two orders of magnitude. Although SDE-Net, SVISE, and SFI can effectively estimate drift and diffusion, explicit expressions for the networked SDEs remain elusive for these three methods because the drift field of networked SDEs encompasses both self-dynamics and interaction effects. In contrast, LaGNA incorporates three specifically designed neural network modules, especially the message-passing module defined on links to capture the interaction dynamics, which together distinguish between self, interaction, and diffusion effects in observation data. This distinction enables accurate learning of networked SDEs from stochastic trajectories, effectively overcoming the limitations of previous methods (see Supplementary Information Section V for further comparisons).

Note that there is an interesting method recently introduced for learning macroscopic dynamical descriptions of stochastic dissipative systems27. However, the objective of this method differs from ours, as it aims to capture coarse-grained macroscopic behavior rather than the node-level microscopic dynamics required for our exploration of the real complex systems in the following sections. Due to the disparity in objectives and outputs, this method is not included in the comparison tests.

Learning the dynamics of empirical bird flocks

Collective motions and swarming are fascinating phenomena widely observed in nature3,36,37, such as bird and fish flocks, cell motions, and bacteria colonies. How individuals interact when large numbers of them move together in groups without colliding, or even perform tasks together, like hunting3,38, has been a topic of widespread discussion. The prevailing consensus is that the condensation of individuals results from each bird matching the velocity of its neighbors, known as alignment, together with the tendency to stay close to neighbors while avoiding collisions, known as cohesion3,39.

To discover the underlying dynamics from the flocking trajectories, we extend the internal architecture of LaGNA based on the above hypothesis. Specifically, we implement a second-order version by setting up three specialized NNs that simulate self-propulsion, cohesion, and alignment, respectively. We modify the loss function to be the sum of the negative log-likelihood loss \(\mathcal{L}_{\mathrm{nl}}\) and three prediction errors:

$$\mathcal{L}=\beta_1\mathcal{L}_{\mathrm{nl}}+\beta_2\mathcal{L}_{\mathrm{r}}+\beta_3\mathcal{L}_{\mathrm{v}}+\beta_4\mathcal{L}_{\mathrm{a}},$$
(3)

where β1, β2, β3, β4 are hyperparameters balancing the different parts of the loss, and \(\mathcal{L}_{\mathrm{r}}\), \(\mathcal{L}_{\mathrm{v}}\), and \(\mathcal{L}_{\mathrm{a}}\) are the squared errors between the predicted and true displacements, velocities, and accelerations, respectively. To validate the extended framework, we generate a 20-bird flocking system with the 3-dimensional Vicsek model. The results show that our framework accurately estimates the self-propulsion, cohesion, and alignment strengths of the Vicsek model system, as shown in Fig. 3a–e. The inferred second-order SDEs successfully regenerate the collective behaviors, as detailed in Supplementary Information Section II-B.4.
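
As an illustration, a hedged sketch of the composite loss in Eq. (3) follows; the Gaussian form of \(\mathcal{L}_{\mathrm{nl}}\) and the default β values are assumptions for demonstration, not the fitted hyperparameters of Supplementary Table 5.

```python
# Sketch of the composite loss of Eq. (3); beta defaults and the Gaussian
# negative log-likelihood form are illustrative assumptions.
import torch

def composite_loss(mu, var, x_next, r_pred, r_true, v_pred, v_true,
                   a_pred, a_true, betas=(1.0, 1.0, 1.0, 1.0)):
    b1, b2, b3, b4 = betas
    nll = 0.5 * (torch.log(var) + (x_next - mu)**2 / var).mean()  # L_nl
    l_r = ((r_pred - r_true)**2).mean()                           # displacement error
    l_v = ((v_pred - v_true)**2).mean()                           # velocity error
    l_a = ((a_pred - a_true)**2).mean()                           # acceleration error
    return b1 * nll + b2 * l_r + b3 * l_v + b4 * l_a
```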

Fig. 3: Inferring flocking dynamics from both simulation (a–f) and experimental (g–p) data.

a The 3-dimensional trajectories of 20 birds generated by the second-order Vicsek model. b–e The diffusion, self, cohesion, and alignment parts of the first dimension estimated by \(\hat{\phi}(\cdot)\), \(\hat{f}(\cdot)\), \(\hat{g}_a(\cdot)\), and \(\hat{g}_c(\cdot)\), respectively, compared with the true values of the Vicsek model. f The flocking trajectories generated by the inferred stochastic Vicsek dynamics. g–p Inferring the flocking dynamics from four empirical datasets40. g Comparison between the force field generated by the inferred SDE and that calculated from the empirical data for 8 pigeons at time stamps t = 25, 50, 100. h The flocking trajectories generated by the inferred SDE; the subplot shows the empirical trajectories from the first flock dataset. i, j Visualizations of the alignment and cohesion functions of the inferred SDE. k–p The trajectories and force fields generated by the inferred SDE for the second, third, and fourth flock datasets, respectively.

Furthermore, we apply the extended second-order SDE inference framework to learn the empirical dynamics of bird flocks. The dataset consists of four sets of homing flights collected from homing pigeons equipped with GPS devices40. The pigeons were released approximately 15 km away from their home loft, and the GPS devices recorded their locations during the return journey at a sampling interval of 0.2 seconds. Because there is little variation in the vertical direction of the flocks, we focus on the movement in the horizontal plane and perform the following preprocessing steps: spline interpolation to a time step of 0.01 for data augmentation; normalization of the data; extraction of the time period after takeoff and before descent, when collective behaviors are most prominent; and alignment of the coordinates. Pigeons whose data exhibited outliers were removed. In the end, the first flock set contains 8 pigeons, and the other three sets contain 7 pigeons each.

Using the extended framework, we successfully learn the self-propulsion, alignment, and cohesion parts from one of the four flock datasets. The estimated strengths closely match specific scaled functions. Specifically, the alignment strength matches \(\hat{\mathcal{A}}=a_1(\exp(-r_{ij}/3)+a_2)+a_3\), the cohesion strength matches \(\hat{\mathcal{C}}=c_1\big((r_{ij}/2-1)^3/(r_{ij}/2+1)^6+c_2\big)+c_3\), and the self-propulsion strength is \(s_1(|\mathbf{v}_i|^2+s_2)+s_3\) (refer to Supplementary Information Section III-A). Therefore, the inferred SDE is

$$\dot{\mathbf{v}}_i=s_1\left(|\mathbf{v}_i|^2+s_2\right)+s_3+\sum_{j\ne i}\left(\hat{\mathcal{C}}(r_{ij})\,\mathbf{r}_{ij}+\hat{\mathcal{A}}(r_{ij})\,\mathbf{v}_{ij}\right)+\hat{\boldsymbol{\epsilon}}\,\mathrm{d}\mathbf{W}_t,$$
(4)

where \(\mathbf{v}_i=\dot{\mathbf{r}}_i\), \(\mathbf{r}_{ij}=\mathbf{r}_j-\mathbf{r}_i\), \(\mathbf{v}_{ij}=\mathbf{v}_j-\mathbf{v}_i\), \(r_{ij}=|\mathbf{r}_{ij}|\), \(\mathbf{W}_t\sim\mathcal{N}(0,\mathrm{d}t)\) represents the Wiener process with mean zero and variance dt (with dt = 0.01), and \(\hat{\boldsymbol{\epsilon}}\) is the estimated intensity of stochasticity. The reproduced force field is consistent with the actual field for a substantial duration in each flock system, and long-term predictions reveal a diverse range of behaviors, as depicted in Fig. 3g–j. To assess the generalizability of the inferred SDE, we employ it to describe the three other datasets, which were not used in training. We observe that by fine-tuning only the scaling coefficients, without altering the equation form, Eq. (4) effectively captures the underlying dynamic mechanism of the collective behaviors exhibited in these three datasets, as illustrated in Fig. 3k–p. The hyperparameters and coefficients are listed in Supplementary Information Table 5.
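
For concreteness, the following is a minimal Euler–Maruyama sketch of simulating Eq. (4) in the horizontal plane. The coefficient values are placeholders rather than the fitted values of Supplementary Table 5, and the scalar self-propulsion term is applied per velocity component as a simplifying assumption.

```python
# Euler-Maruyama sketch of the inferred flocking SDE, Eq. (4).
# All coefficients below are placeholders, not the fitted values.
import numpy as np

def A_hat(r, a1=1.0, a2=0.0, a3=0.0):           # alignment strength
    return a1 * (np.exp(-r / 3) + a2) + a3

def C_hat(r, c1=1.0, c2=0.0, c3=0.0):           # cohesion strength
    return c1 * ((r / 2 - 1)**3 / (r / 2 + 1)**6 + c2) + c3

def step(r, v, dt=0.01, s=(0.1, 0.0, 0.0), eps=0.05, rng=np.random.default_rng(0)):
    # r, v: (n, 2) positions and velocities in the horizontal plane
    n = len(r)
    speed2 = (v**2).sum(1, keepdims=True)
    acc = np.repeat(s[0] * (speed2 + s[1]) + s[2], 2, axis=1)   # self-propulsion
    for i in range(n):
        rij = r - r[i]                                          # r_j - r_i
        vij = v - v[i]                                          # v_j - v_i
        dist = np.linalg.norm(rij, axis=1)
        mask = np.arange(n) != i
        acc[i] += (C_hat(dist[mask])[:, None] * rij[mask]
                   + A_hat(dist[mask])[:, None] * vij[mask]).sum(0)
    v_new = v + acc * dt + eps * rng.normal(0, np.sqrt(dt), v.shape)
    return r + v_new * dt, v_new                                # semi-implicit update
```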

The renowned Vicsek model has long served as a staple in flocking dynamics research, often regarded as a simplistic representation. Our finding offers unprecedented evidence that the Vicsek model transcends its toy model status, effectively encapsulating authentic flocking dynamics. Remarkably, Eq. (4) is autonomously inferred from the observation data, devoid of preconceived assumptions about its structure. Consequently, the striking resemblance of the inferred SDE to the second-order Vicsek model41,42 unveils new perspectives for understanding and modeling the collective behaviors of real flocks.

Learning the spreading dynamics of tau pathology in mouse brain

Tau proteins play a crucial role in maintaining the stability of axon microtubules, which is essential for the proper functioning of the brain43. However, in Alzheimer’s disease (AD), misfolded and hyperphosphorylated tau proteins lose their ability to bind microtubules properly, leading to their accumulation as neurofibrillary tangles, a hallmark of the disease44. Previous experimental studies have shown that in the early stages of AD, pathological tau spreads from the transentorhinal cortex to other areas of the limbic and neocortical regions, suggesting a spread along neuroanatomical connections45. There is also evidence indicating that tau can be released into the extracellular space, either as free tau or in vesicles such as exosomes46,47. As the second empirical system, here we apply our proposed inference framework to the observed tau pathology data to identify the hidden governing equation of the spreading dynamics contributed by both neuroanatomical connections and spatial proximity due to extracellular diffusion.

To obtain the tau spreading data, biologists injected nontransgenic (NTG) mice at five specific injection sites with paired helical filament (PHF) tau extracted from the hippocampus and overlying cortex. The injected mice were euthanized at different time points (1, 3, 6, and 9 months after injection) to obtain pseudo-longitudinal data48,49. The brain sections of each mouse were then stained to quantify the percentage of affected area in different brain regions, as shown in Fig. 4a. We consider the bidirectional diffusion of tau pathology along neuroanatomical connections, in both the retrograde (from axon terminals to the cell body) and anterograde (from the cell body to axon terminals) directions49,50. We also take into account the influence of geographical distance on diffusion, as shown in Fig. 4b. This yields a total of n = 160 brain regions, with time-dependent percentages of the area occupied by tau pathology denoted as y(t). We have neuroanatomical and Euclidean weighted adjacency matrices A and D, respectively, with the anterograde matrix represented as AT.

Fig. 4: Inferring AD tau pathology diffusion dynamics from experimental data.

a Experimental data. Nontransgenic (NTG) mice were injected unilaterally with AD PHF tau at the selected injection sites at 3–4 months of age. Four, eight, six, and six mice were euthanized at 1, 3, 6, and 9 months post-injection (MPI), respectively, and the pathology was recorded48,49. b The anatomical connections in the retrograde direction (upper) and the Euclidean proximal connections (bottom). c Comparison between true and predicted area occupied (AO) by tau pathology at sites iSUM, cCA1, and cCOAa. The true trajectories represent the average pathology observed in different mice at the 4 time points; error shades indicate the standard deviation (std). d Predictions of log tau pathology from the inferred diffusion dynamics compared to the true log tau pathology at 1, 3, 6, and 9 MPI, respectively. The squared Pearson correlation R2 and P-values of multiple linear regression tests are noted. e The pathology spread patterns predicted from 5 randomly selected injection sites for 500 independent runs (purple dots) vs. the predictions starting from the true 5 injection sites (green dots). Dashed lines indicate median values. f Comparison between the predictions of the implicit effective model and the explicit heterogeneous diffusion model. Error bars indicate std. g The effectiveness of the inferred diffusion equation on the LRRK2 G2019S mice pathology data.

By applying our framework, we infer the governing equation of the tau pathology diffusion dynamics as follows:

$$y_i(t)=\left(b_{0i}\,x_i(0)+b_{1i}\sum_{j=1}^{n}\tilde{A}_{ij}\frac{x_j(0)}{1+e^{-A_{ij}}}+b_{2i}\sum_{j=1}^{n}\tilde{A}_{ji}\frac{x_j(0)}{1+e^{-A_{ji}}}+b_{3i}\sum_{j=1}^{n}\tilde{D}_{ij}\,x_j(0)\,e^{D_{ij}}\right)\cdot T(t)+\sigma_i(t).$$
(5)

Here, \(D_{ij}=1/\log(E_{ij}^2)\), where the matrix E contains the actual Euclidean distances between regions, and the term \(T(t)=c_t+1.5t\) includes a trainable parameter \(c_t\) that captures the varying propagation rate over time. Elements of D smaller than 0.11 are set to zero, meaning that there is no immediate spatial diffusion between two regions that are far apart (refer to Supplementary Information Section III-B). The binary matrix \(\tilde{A}\) has elements \(\tilde{A}_{ij}=1\) when Aij > 0 and \(\tilde{A}_{ij}=0\) otherwise; the same applies to the binary matrix \(\tilde{D}\). In the initial state x(0), only the five sites injected with 1 μg of tau are set to 1 and the rest are zero, consistent with previous work49. The weights b0, b1, b2, and b3 are heterogeneous factors assigned to different regions51, and the term σ represents the stochastic noise in the system. It is worth mentioning that the three summation terms on the right side of the inferred equation correspond to retrograde, anterograde, and spatial diffusion, respectively, which demonstrates the biological interpretability of our inference result.
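
A sketch of evaluating Eq. (5) for all regions at once is given below; the matrix names follow the text, while the per-region coefficients are placeholders to be fitted from data.

```python
# Sketch of evaluating the inferred tau-diffusion equation, Eq. (5), for all
# n regions simultaneously; coefficient vectors are fitting placeholders.
import numpy as np

def predict_tau(x0, A, D, t, b0, b1, b2, b3, c_t):
    # x0: (n,) initial seeding (1 at injected sites, 0 elsewhere)
    # A:  (n, n) anatomical weights (retrograde); anterograde uses A.T
    # D:  (n, n) Euclidean proximity weights, already thresholded at 0.11
    A_bin = (A > 0).astype(float)                         # binary matrix A~
    D_bin = (D > 0).astype(float)                         # binary matrix D~
    retro = (A_bin / (1 + np.exp(-A))) @ x0               # sum_j A~_ij x_j(0)/(1+e^{-A_ij})
    antero = (A_bin.T / (1 + np.exp(-A.T))) @ x0          # sum_j A~_ji x_j(0)/(1+e^{-A_ji})
    spatial = (D_bin * np.exp(D)) @ x0                    # sum_j D~_ij x_j(0) e^{D_ij}
    T = c_t + 1.5 * t                                     # time-scaling term as written above
    return (b0 * x0 + b1 * retro + b2 * antero + b3 * spatial) * T
```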

Due to the stochastic nature of pathology propagation in brains, spreading patterns emerge only after early stages of fluctuating duration. As shown in Fig. 4c, d, the inferred equation adeptly predicts the tau diffusion at 6 and 9 months post-injection (MPI) given the injection sites. To verify that the prediction is specific to the injection sites, we assess its performance against 500 randomly selected initial sets of five regions each. The results affirm that injections at the experimental seed regions yield the highest accuracy across all time points (Fig. 4e). Furthermore, we evaluate a degenerate model that treats each brain region equally. The outcomes demonstrate reduced predictability of the degenerate model (Fig. 4f), underscoring the significant influence of regional heterogeneity on tau pathology diffusion.

Finally, we apply our method to infer the tau diffusion dynamics in mice carrying the LRRK2 G2019S mutation. This mutation is the most common cause of familial Parkinson’s disease and a common risk factor for idiopathic Parkinson’s disease. Mice carrying this mutation exhibit altered tau pathology patterns, but, intriguingly, the inferred equation that accurately delineates tau diffusion in mutant mice (Fig. 4g) shares the same form as Eq. (5). The remarkable distinction is that while tau pathology diffusion in NTG mice lacks a directional preference, the diffusion in mutant mice exhibits a pronounced inclination towards the retrograde direction. This preference is quantified by the absolute average value of the retrograde coefficient b1, which ranges from 0.7 to 0.9, in contrast with the anterograde coefficient b2, which falls within the range 0.1–0.3. These results align with a recent experiment50.

Our discovery offers new insights into tau pathology. Firstly, the inferred Eq. (5) holds promise in generally capturing tau pathology dynamics in brains. Secondly, the results shed light on the significance of spatial diffusion, a factor overlooked in previous studies, indicating its non-negligible impact on tau pathology. Lastly, the delineation of coefficients for retrograde and anterograde diffusion terms underscores the distinct tau pathology dynamics in mutant mice. These findings collectively enhance our understanding of tau pathology mechanisms.

Discussion

Inferring the governing equations of complex systems from observational data is a crucial direction for automating scientific discovery. Previous studies have primarily benchmarked algorithms on model systems with known ground truths. In contrast, our work delves into two important real-world systems and successfully distills their concealed networked SDEs. This not only showcases the applicability of our approach but also generates novel insights into the mechanisms hidden in empirical flocking and tau pathology diffusion. Importantly, our LaGNA method requires only one trial of nodes’ activity sequences and only snapshots rather than continuous time series data, enhancing its flexibility and adaptability to other real scenarios with the aid of inductive bias.

While LaGNA demonstrates superior performance compared to previous state-of-the-art methods and provides valuable insights into real complex systems, it does have limitations that necessitate further attention in future research. Firstly, in some scenarios, the activity time series of certain nodes may be inaccessible. Therefore, it is worth determining the minimal sub-network structure required to unveil the system dynamics52,53,54. Real data from stochastic systems often exhibit a combination of intrinsic stochasticity and extrinsic noise, with the latter arising from measurement errors. Distinguishing between these types of noise poses significant challenges55,56,57. Without prior knowledge of the dominant source of noise, we treat all noises as intrinsic in this work, and LaGNA demonstrates accurate inference when the relative strength of extrinsic noise is below 10%. When extrinsic noise is more pronounced, a preprocessing step of denoising, such as the Kalman-Takens filter57, enhances inference capability (see Supplementary Information Sections V-B and V-C). However, future efforts are needed to better address extrinsic noises in data.

Secondly, while many real complex networks have been successfully mapped in the past, obtaining the topological data of a network may not always be feasible in certain scenarios. In such cases, there is a need to infer both network topology and system dynamics. Recent commendable efforts have been made to address this challenge18,58,59, yet they either require activity data from many trials with different initial states18 or learn for dynamics prediction rather than inference58,59. Simultaneously inferring both the dynamical equation and network topology of a large real system using a limited amount of experimentally feasible data remains a challenging task.

Thirdly, although the pre-constructed libraries in the second stage of LaGNA can contain a large number of orthogonal or non-orthogonal elementary function terms, it is still possible that they overlook certain terms. Symbolic regression, an alternative that does not rely on pre-constructed libraries, faces challenges in higher dimensions. Thus, further efforts are needed to enhance the automation of current methods.

Fourthly, there has been considerable interest in higher-order interactions within complex systems in recent years59,60,61. LaGNA can be extended to accommodate higher-order systems by incorporating additional terms such as \(\sum_{j,k}A_{i,j,k}\,\mathbf{h}(\mathbf{x}_i(t),\mathbf{x}_j(t),\mathbf{x}_k(t))\) into the interaction part of Eq. (1), where Ai,j,k represents the third-order network and the function h denotes the third-order interaction dynamics. Yet this extension will increase the complexity of identifying an optimal equation, presenting a promising avenue for future efforts.

Methods

Loss function of LaGNA

Consider a complex networked system whose dynamics are governed by stochastic differential equations (SDEs)

$$\mathrm{d}\mathbf{x}_i(t)=\Big(\mathbf{F}(\mathbf{x}_i(t))+\sum_{j=1}^{n}A_{ij}\,\mathbf{G}\big(\mathbf{x}_i(t),\mathbf{x}_j(t)\big)\Big)\,\mathrm{d}t+\boldsymbol{\Phi}(\mathbf{x}_i(t))\,\mathrm{d}\mathbf{W}_t.$$
(6)

Here, xi(t) represents the d-dimensional state of node i at time t; A is the adjacency matrix of size n × n, where Aij denotes the influence from node j on node i; \(\mathbf{F}\equiv(F_1(\mathbf{x}_i),F_2(\mathbf{x}_i),\ldots,F_d(\mathbf{x}_i))^{\mathrm{T}}\) and \(\mathbf{G}\equiv(G_1(\mathbf{x}_i,\mathbf{x}_j),G_2(\mathbf{x}_i,\mathbf{x}_j),\ldots,G_d(\mathbf{x}_i,\mathbf{x}_j))^{\mathrm{T}}\) are nonlinear functions representing the self and interaction dynamics, respectively; Φ(xi(t)) is the positive-definite diffusion matrix of size d × d; and Wt is a d-dimensional vector representing the Wiener process with mean zero and variance dt25. Note that, by choosing different F and G, Eq. (6) can describe a wide range of system dynamics32,33.
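
For reference, a minimal Euler–Maruyama sketch for simulating Eq. (6) is shown below, assuming user-supplied functions F, G, and Phi with the shapes described above; it is an illustrative sketch, not the authors’ simulation code.

```python
# Euler-Maruyama sketch for simulating the networked SDE of Eq. (6).
# F(x_i) -> (d,), G(x_i, x_j) -> (d,), Phi(x_i) -> (d, d) are assumptions.
import numpy as np

def simulate(F, G, Phi, A, x0, dt, steps, rng=np.random.default_rng(0)):
    # x0: (n, d) initial states; A: (n, n) adjacency matrix
    n, d = x0.shape
    traj = np.empty((steps + 1, n, d))
    traj[0] = x0
    x = x0.copy()
    for s in range(steps):
        inter = np.zeros_like(x)
        for i in range(n):
            for j in range(n):
                if A[i, j] != 0:
                    inter[i] += A[i, j] * G(x[i], x[j])   # interaction term
        dW = rng.normal(0.0, np.sqrt(dt), (n, d))         # Wiener increments
        for i in range(n):
            x[i] = x[i] + (F(x[i]) + inter[i]) * dt + Phi(x[i]) @ dW[i]
        traj[s + 1] = x
    return traj
```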

For simplicity, let us first consider the case d = 1. Given x(t) and dt, x(t + dt) can be considered as points drawn from the normal distribution

$$x_i(t+\mathrm{d}t)\sim\mathcal{N}\!\left(\mu_i(t),\,\sigma_i^2(t)\right),$$
(7)

where \(\mu_i(t)=x_i(t)+\big[F(x_i(t))+\sum_{j=1}^{n}A_{ij}G(x_i(t),x_j(t))\big]\,\mathrm{d}t\) and \(\sigma_i^2(t)=\Phi^2(x_i(t))\,\mathrm{d}t\). To train the network end to end, we use all nodes’ states at time t, x(t), as inputs. Based on the network topology Aij, we map the information flow from node j to node i using a function g(xi(t), xj(t)). The estimated information values are aggregated element-wise for each receiving node over all its sending nodes. Additionally, we map the self-dynamics of each node i using a function f(xi(t)). The estimated mean and variance of node i’s activity distribution can then be written as \(\tilde{\mu}_i(t)=x_i(t)+\big[f(x_i(t))+\sum_{j=1}^{n}A_{ij}g(x_i(t),x_j(t))\big]\,\mathrm{d}t\) and \(\tilde{\sigma}_i^2(t)=\phi^2(x_i(t))\,\mathrm{d}t\), respectively. The functions f, g, and ϕ are determined by the trainable parameters θf, θg, and θϕ, respectively.

To obtain the optimal parameters, we train the model in Fig. 1b by maximizing the likelihood of the observations under the estimated distribution. Since the true distribution is inaccessible and only the next-step state is available, we maximize the following expectation via maximum likelihood estimation (MLE):

$$\hat{\boldsymbol{\theta}}_f,\hat{\boldsymbol{\theta}}_g,\hat{\boldsymbol{\theta}}_\phi:=\arg\max_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\mathbb{E}\big[\log p_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\big(x_i(t+\mathrm{d}t)\,|\,x_i(t),\mathrm{d}t\big)\big],$$
(8)

where \(p_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\) represents the probability density of the normal distribution generated by the model of Fig. 1b with parameters θf, θg, θϕ, i.e.,

$$p\big(x_i(t+1);\tilde{\mu}_i(t),\tilde{\sigma}_i^2(t)\big)=\frac{1}{\tilde{\sigma}_i(t)\sqrt{2\pi}}\exp\left(-\frac{(x_i(t+1)-\tilde{\mu}_i(t))^2}{2\tilde{\sigma}_i^2(t)}\right).$$
(9)

Maximizing the likelihood in Eq. (9) is equivalent to minimizing the negative log-likelihood using the estimated \({\tilde{\mu }}_{i}(t)\) and \({\tilde{\sigma }}_{i}^{2}(t)\), i.e.,

$$-\log\big(p\big(x_i(t+1);\tilde{\mu}_i(t),\tilde{\sigma}_i^2(t)\big)\big)=-\log\left(\frac{1}{\tilde{\sigma}_i(t)\sqrt{2\pi}}\right)-\log\left(\exp\left(-\frac{(x_i(t+1)-\tilde{\mu}_i(t))^2}{2\tilde{\sigma}_i^2(t)}\right)\right)$$
(10)

Here, the constant coefficients and terms can be omitted, hence the loss function becomes

$$\mathcal{L}_i(\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi\,|\,x_i(t+1),x_i(t),\mathrm{d}t):=\frac{(x_i(t+1)-\tilde{\mu}_i(t))^2}{\tilde{\sigma}_i^2(t)}+\log\left\vert\tilde{\sigma}_i^2(t)\right\vert.$$
(11)

For a training dataset containing n observed nodes, the expectation becomes

$$\arg\max_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\frac{1}{n}\sum_{i=1}^{n}\log p_{\boldsymbol{\theta}_f,\boldsymbol{\theta}_g,\boldsymbol{\theta}_\phi}\big(x_i(t+1)\,|\,x_i(t),\mathrm{d}t\big).$$
(12)
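
In code, Eqs. (11) and (12) reduce to a few lines; the sketch below assumes the estimated mean and variance come from the three neural modules, and the small eps guard is an added numerical-stability assumption.

```python
# Per-node loss of Eq. (11), averaged over nodes as in Eq. (12).
import torch

def lagna_nll(x_next, mu, var, eps=1e-8):
    # Squared residual scaled by the variance, plus the log-variance term;
    # the constant 0.5*log(2*pi) is dropped, as in Eq. (11).
    return ((x_next - mu)**2 / (var + eps) + torch.log(var + eps)).mean()
```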

For the case d > 1, the negative logarithm of a multivariate normal distribution can be written as

$$-\log\big(p(\mathbf{x}_i(t+1);\boldsymbol{\mu}(t),\Sigma(t))\big)=\frac{d}{2}\log(2\pi)+\frac{1}{2}\log|\Sigma(t)|+\frac{1}{2}\big(\mathbf{x}_i(t+1)-\boldsymbol{\mu}(t)\big)^{\mathrm{T}}\Sigma(t)^{-1}\big(\mathbf{x}_i(t+1)-\boldsymbol{\mu}(t)\big),$$
(13)

where Σ(t) is a positive semidefinite matrix.

Inference of self, interaction, and diffusion parts

After the well-trained model of Fig. 1b has separated the self, interaction, and diffusion parts, we adopt the core idea of our previously proposed two-phase inference approach17 to infer a concise form for each part. Specifically, with three pre-constructed extensive libraries LF, LG, and LΦ that contain widely used elementary functions (see Supplementary Information Section I-B), we feed the time series data xi(t), i = 1, …, n, into LF, LG, and LΦ, and obtain the time-varying matrices ΘF(t) ≡ LF(xi(t)), ΘG(t) ≡ LG(xi(t), xj(t)), and ΘΦ(t) ≡ LΦ(xi(t)). The inference problem can then be formulated using the estimated values as follows:

$$\hat{\mathbf{f}}(\mathbf{x}_i(t))=\tilde{\Theta}_F(t)\,\boldsymbol{\xi}_F,\\ \hat{\mathbf{g}}(\mathbf{x}_i(t),\mathbf{x}_j(t))=\tilde{\Theta}_G(t)\,\boldsymbol{\xi}_G,\\ \hat{\boldsymbol{\phi}}(\mathbf{x}_i(t))/\sqrt{\mathrm{d}t}=\tilde{\Theta}_\Phi(t)\,\boldsymbol{\xi}_\Phi.$$
(14)

Here \(\tilde{\Theta}_F\equiv\Theta_F\otimes I_d\), \(\tilde{\Theta}_G\equiv\Theta_G\otimes I_d\), and \(\tilde{\Theta}_\Phi\equiv\Theta_\Phi\otimes I_d\), where ⊗ is the Kronecker product and Id is the d × d identity matrix. The objective is therefore to find sparse coefficient vectors ξF, ξG, and ξΦ, in which most elements are zero and only the coefficients of highly relevant elementary functions are non-zero, such that Eq. (14) closely matches the observed data.
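
For illustration, a library matrix such as ΘF(t) can be formed by evaluating candidate elementary functions column-wise on the data; the candidate list below is a small example for d = 1, not the full library of the Supplementary Information.

```python
# Sketch of evaluating a term library on data to form Theta_F(t) of Eq. (14).
# The candidate functions are a small illustrative subset of L_F.
import numpy as np

def theta_F(x):
    # x: (T,) scalar time series of one node (d = 1 case)
    candidates = [np.ones_like(x), x, x**2, x**3, np.sin(x), np.exp(-x)]
    return np.column_stack(candidates)   # shape (T, number of library terms)
```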

The first phase involves global regression to find a few of the most relevant elementary functions for each part, based on the optimization problems:

$$\arg\min_{\boldsymbol{\xi}_F}\int_0^T\sum_{i=1}^{n}\big\|\tilde{\Theta}_F(t)\boldsymbol{\xi}_F-\hat{\mathbf{f}}(\mathbf{x}_i(t))\big\|^2\,\mathrm{d}t+\lambda_F\|\boldsymbol{\xi}_F\|_1,\\ \arg\min_{\boldsymbol{\xi}_G}\int_0^T\sum_{i=1}^{n}\sum_{j=1}^{n}\big\|\tilde{\Theta}_G(t)\boldsymbol{\xi}_G-\hat{\mathbf{g}}(\mathbf{x}_i(t),\mathbf{x}_j(t))\big\|^2\,\mathrm{d}t+\lambda_G\|\boldsymbol{\xi}_G\|_1,\\ \arg\min_{\boldsymbol{\xi}_\Phi}\int_0^T\sum_{i=1}^{n}\big\|\tilde{\Theta}_\Phi(t)\boldsymbol{\xi}_\Phi-\hat{\boldsymbol{\phi}}(\mathbf{x}_i(t))/\sqrt{\mathrm{d}t}\big\|^2\,\mathrm{d}t+\lambda_\Phi\|\boldsymbol{\xi}_\Phi\|_1,$$
(15)

where λF, λG, and λΦ are hyperparameters that regulate the sparsity of the coefficients. In the implementation, we use the least absolute shrinkage and selection operator (LASSO) with five-fold cross-validation to determine the optimal hyperparameters. Through this global regression, we obtain the degree of relevance between each elementary function in the libraries and the hidden dynamics, significantly reducing the model space.

Next, in the second phase we identify the minimal number of elementary functions for the self, interaction, and diffusion parts, respectively, which together constitute the final stochastic differential equation. To do so, we add the most relevant elementary functions one by one according to the relevance ranking obtained in the first phase. We use the metric \(\kappa^2=1-\frac{\sum_i(\hat{y}_i-y_i)^2}{\sum_i(y_i-\overline{y})^2}\) to score the regression of a temporary combination of elementary functions; the more accurate the current equation, the closer κ2 is to 1. Here, \(\hat{y}_i\), \(y_i\), and \(\overline{y}\) are the prediction, the true value, and the mean of the true values, respectively. As we sequentially add relevant elementary functions to the equation, κ2 changes accordingly. The minimal number of elementary functions for each part is determined when adding more elementary functions no longer increases, or even decreases, the value of κ2, as shown in Fig. 1e–g.
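
The two phases can be sketched as follows, with scikit-learn’s LassoCV standing in for the five-fold cross-validated LASSO of the first phase; this is a simplified stand-in, not the authors’ exact implementation.

```python
# Sketch of the two-phase inference: phase one ranks library terms by LASSO
# relevance; phase two adds them one by one until kappa^2 stops improving.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def two_phase(Theta, y):
    # Phase 1: global sparse regression with five-fold cross-validation
    lasso = LassoCV(cv=5).fit(Theta, y)
    order = np.argsort(-np.abs(lasso.coef_))          # terms ranked by relevance
    # Phase 2: greedy addition by relevance, scored by kappa^2
    best_k2, best_terms = -np.inf, []
    for m in range(1, len(order) + 1):
        terms = order[:m]
        pred = LinearRegression().fit(Theta[:, terms], y).predict(Theta[:, terms])
        k2 = 1 - ((pred - y)**2).sum() / ((y - y.mean())**2).sum()
        if k2 <= best_k2:                             # no further improvement
            break
        best_k2, best_terms = k2, terms
    return best_terms, best_k2
```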

Quantification of inference inaccuracy

The goal of our study is to infer the mathematical equation describing the dynamics underlying a complex system, rather than only to predict its future states. To quantify the difference between the inferred and true equations, we use the symmetric mean absolute percentage error (sMAPE):

$$\mathrm{sMAPE}=\frac{1}{k}\sum_{i=1}^{k}\frac{|I_i-R_i|}{|I_i|+|R_i|}.$$
(16)

Here, k is the cardinality of the set containing the inferred and true terms, and Ii and Ri are the inferred and true coefficients of each term, respectively. The value of sMAPE lies in the interval between 0 and 1; the smaller the sMAPE, the more accurate the inference result. Importantly, sMAPE is sensitive to false-negative and false-positive errors: if the inferred equation contains a term that should not be there, or misses a term that should be there, the value of sMAPE increases significantly.
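
A direct implementation of Eq. (16) is given below; representing each equation as a mapping from terms to coefficients is an assumption for illustration, and it makes the sensitivity to false positives and false negatives explicit, since absent terms enter with coefficient zero.

```python
# sMAPE of Eq. (16) over the union of inferred and true terms.
# Terms present in only one equation contribute the maximal error of 1.
def smape(inferred: dict, true: dict) -> float:
    terms = set(inferred) | set(true)
    return sum(abs(inferred.get(t, 0.0) - true.get(t, 0.0)) /
               (abs(inferred.get(t, 0.0)) + abs(true.get(t, 0.0)))
               for t in terms) / len(terms)

# Example: a spurious x**2 term and a slightly off coefficient both penalize.
# smape({"x": 1.0, "x**2": 0.5}, {"x": 1.1})
```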

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.