Abstract
Complex systems with many interacting nodes are inherently stochastic and best described by stochastic differential equations. Despite increasing observation data, inferring these equations from empirical data remains challenging. Here, we propose the Langevin graph network approach to learn the hidden stochastic differential equations of complex networked systems, outperforming five state-of-the-art methods. We apply our approach to two real systems: bird flock movement and tau pathology diffusion in brains. The inferred equation for bird flocks closely resembles the second-order Vicsek model, providing unprecedented evidence that the Vicsek model captures genuine flocking dynamics. Moreover, our approach uncovers the governing equation for the spread of abnormal tau proteins in mouse brains, enabling early prediction of tau occupation in each brain region and revealing distinct pathology dynamics in mutant mice. By learning interpretable stochastic dynamics of complex systems, our findings open new avenues for downstream applications such as control.
Introduction
The behaviors of complex systems, ranging from cell migration to pathological protein diffusion, brain activity, human mobility, and bird flocking, exhibit not only nonlinearity but also stochasticity1,2,3. Stochasticity plays a crucial role in enhancing a system’s adaptability to rapidly changing environments4,5, facilitating information processing6,7, and increasing robustness8,9. The emergence of order from disorder has long fascinated scientists, particularly in the context of system dynamics. While the behaviors of complex systems can be observed experimentally, their underlying dynamics remain elusive. Therefore, stochastic differential equations (SDEs) have been widely employed to model such stochastic systems due to their ability to simultaneously describe deterministic evolution and random fluctuations stemming from unresolved degrees of freedom.
However, conventional SDE models used to describe real-world scenarios have certain limitations, such as predefined forms, simplified physics, and assumed parameter values. Fortunately, the increasing availability of empirical data, including network topologies and node activities, provides an opportunity to shift this paradigm. Instead of modeling the dynamics of a complex system using a predefined SDE, it becomes possible to infer the hidden SDE from observational data on system behaviors.
Discovering the governing laws of dynamics from data has become a prominent field of artificial intelligence-empowered scientific exploration10,11,12,13, making significant progress in recent years14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29. Numerous data-driven methods have been proposed to identify ordinary differential equations (ODEs) and partial differential equations for single- and few-body nonlinear systems14,15,16, as well as ODEs for large networks17,18,19. However, these methods may not effectively address real systems exhibiting stochasticity. Previous efforts to learn stochastic dynamics have primarily focused on predicting a system’s future evolution rather than inferring its underlying SDE20. Additionally, the majority of previous methods have been validated on simulated systems with known ground-truth dynamics24,25, and few have demonstrated the ability to infer real stochastic systems with unknown underlying dynamics (with exceptions like27,28).
Here, we aim to address a fundamental question: given the observations of network topology and nodes’ activity series, how can we infer the coupled SDEs that capture the hidden stochastic dynamics of a complex system? The main contributions of our work are summarized as follows:
1. We propose a method termed the Langevin Graph Network Approach (LaGNA) that incorporates an innovative message-passing mechanism to separate dynamical sources within nodal activity data. This method subsequently infers concise mathematical expressions for each of these dynamic sources by leveraging corresponding neural network modules. Comparative analyses showcase our method’s proficiency in effectively unveiling the hidden coupled SDEs of complex networked systems, demonstrating superior performance compared to five state-of-the-art methodologies in the field.
2. We apply our method to natural flocking, an intriguing phenomenon and important research topic in the community of statistical physics and complex systems. From the trajectory data of several flocks, our method successfully infers the SDE of real flocking dynamics. The inferred SDE exhibits a remarkable resemblance to the second-order Vicsek model, providing unprecedented evidence that the Vicsek model is not just a toy model but captures genuine flocking dynamics.
3. We apply our method to the spreading of pathological tau protein in Alzheimer’s disease (AD) brains, a frontier problem in neuroscience. From the experimental data of tau pathology in AD mice brains, our method successfully infers a novel SDE that captures the tau diffusion dynamics. The finding not only enables early-stage prediction of the percentage of brain areas that will be affected by tau pathology but also offers novel quantitative insights into the mechanism of tau pathology.
Results
Overview of the LaGNA framework
The state evolution of a complex system is often driven by several dynamic sources, including the self-dynamics of each node, the interaction between nodes, and intrinsic stochastic diffusion. In the first stage of LaGNA (Fig. 1a–d), we design a message-passing mechanism guided by the complex network’s nontrivial topology. The message-passing mechanism consists of three neural network (NN) modules: self-dynamics simulator \(\hat{{{{{{\bf{f}}}}}}}(\cdot )\), interaction dynamics simulator \(\hat{{{{{{\bf{g}}}}}}}(\cdot )\), and diffusion simulator \(\hat{{{{{{\boldsymbol{\phi }}}}}}}(\cdot )\), tailored to separate the dynamical sources hidden in nodes’ activity data (Fig. 1b,d) and differing from that used in graph neural network30.
Each node i’s activity at time t is denoted as a d-dimensional vector \({{{{{{\bf{x}}}}}}}_{i}(t)\equiv {({x}_{i,1}(t),{x}_{i,2}(t),\ldots,{x}_{i,d}(t))}^{{{{{{\rm{T}}}}}}}\). Given the input of nodes’ activities xi(t), i = 1, 2, …, n (Fig. 1a), the LaGNA estimates the states at the next time step \({\hat{{{{{{\bf{x}}}}}}}}_{i}(t+\,{{\mbox{d}}}\,t)\) (Fig. 1c) using the following equation:
$$\hat{\mathbf{x}}_{i}(t+\mathrm{d}t)=\mathbf{x}_{i}(t)+\Big[\hat{\mathbf{f}}(\mathbf{x}_{i}(t))+\sum_{j=1}^{n}A_{ij}\,\hat{\mathbf{g}}(\mathbf{x}_{i}(t),\mathbf{x}_{j}(t))\Big]\mathrm{d}t+\hat{\boldsymbol{\phi}}(\mathbf{x}_{i}(t))\,\mathbf{W}_{t}\qquad (1)$$
Here, Aij is the adjacency matrix representing the network topology, and Wt is the d-dimensional vector representing the Wiener process (i.e., normally distributed around zero with variance dt). Note that the form of Eq. (1) can describe a wide range of complex dynamical systems31,32,33, including epidemic spreading, neuronal dynamics, ecological dynamics, gene regulation, as well as flocking and tau pathology diffusion, as we will show below. The current stage of LaGNA can be viewed as an implicit dynamical system with a large number of trainable parameters: θf, θg, and θϕ representing the self, interaction, and diffusion simulators, respectively. To capture the underlying dynamics of a given complex system, LaGNA’s outputs \({\hat{{{{{{\bf{x}}}}}}}}_{i}(t+\,{{\mbox{d}}}\,t)\) need to exhibit behavior similar to the true observation xi(t + dt). Due to the intrinsic stochasticity, minimizing the difference between \({\hat{{{{{{\bf{x}}}}}}}}_{i}(t+\,{{\mbox{d}}}\,t)\) and xi(t + dt) will result in overfitting. Therefore, we train LaGNA with observation pairs xi(t) and xi(t + dt), and obtain its optimal parameters by maximizing instead the expectation:
$$\max_{\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{g},\boldsymbol{\theta}_{\phi}}\;\mathbb{E}\Big[\log p_{\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{g},\boldsymbol{\theta}_{\phi}}\big(x_{i}(t+\mathrm{d}t)\,\big|\,x_{i}(t)\big)\Big]\qquad (2)$$
where \({p}_{{{{{{{\boldsymbol{\theta }}}}}}}_{f},{{{{{{\boldsymbol{\theta }}}}}}}_{g},{{{{{{\boldsymbol{\theta }}}}}}}_{\phi }}\) is the probability density of the normal distribution generated by the model of Fig. 1b with parameters θf, θg, θϕ. Note that Eq. (2) describes the case of d = 1; refer to the “Methods” section for situations d > 1.
The well-trained LaGNA has the ability to predict future behaviors; however, it currently lacks an explicit equation to describe the underlying dynamics of the system. In the second stage, we aim to unveil the inner workings of the LaGNA black box. The tailored message-passing mechanism (Fig. 1d) has separated the underlying dynamics into three neural network modules, namely \(\hat{{{{{{\bf{f}}}}}}}(\cdot )\), \(\hat{{{{{{\bf{g}}}}}}}(\cdot )\), and \(\hat{{{{{{\boldsymbol{\phi }}}}}}}(\cdot )\). This decomposition allows us to penetrate each module, deriving explicit expressions for the three parts. Using pre-constructed comprehensive libraries of terms, i.e., LF, LG, and LΦ shown in Supplementary Information Section I-B, we identify the optimal combination of terms from the libraries using a modified version of our two-phase approach17. Our framework successfully separates and identifies concise mathematical expressions for self-dynamics, interaction dynamics, and intrinsic stochastic diffusion, respectively, which together form the final stochastic differential equation (Fig. 1e,f,g). LaGNA enables the balance of accuracy and complexity of mathematical expressions (see Supplementary Information Section III-C), becoming an interpretable learner for discovering the hidden SDEs of complex networked systems. Further details are described in Methods and Supplementary Information Section I.
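As an illustration, the one-step predictive distribution produced by the first stage can be sketched in plain NumPy. Here `f`, `g`, and `phi` are toy closed-form stand-ins for the trained neural-network modules, and the network, states, and coefficients are all hypothetical:

```python
import numpy as np

# Toy stand-ins for LaGNA's three neural-network modules (assumed forms,
# not the trained networks): self-dynamics f, interaction g, diffusion phi.
f = lambda x: -x                       # self-dynamics simulator f(x_i)
g = lambda xi, xj: np.tanh(xj - xi)    # interaction simulator g(x_i, x_j)
phi = lambda x: 0.1 * np.ones_like(x)  # diffusion simulator phi(x_i)

def lagna_step_stats(x, A, dt):
    """One message-passing step: mean and variance of the Gaussian that
    the model predicts for each node's next state (scalar case d = 1)."""
    n = len(x)
    # Aggregate messages g(x_i, x_j) over incoming links A_ij.
    interaction = np.array([sum(A[i, j] * g(x[i], x[j]) for j in range(n))
                            for i in range(n)])
    mu = x + (f(x) + interaction) * dt   # drift part of the next state
    var = phi(x) ** 2 * dt               # diffusion part (variance)
    return mu, var

def gaussian_nll(x_next, mu, var):
    """Negative log-likelihood of observed next states under N(mu, var)."""
    return np.mean(0.5 * np.log(2 * np.pi * var)
                   + (x_next - mu) ** 2 / (2 * var))

# Tiny example: a 3-node chain network.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = np.array([0.5, -0.2, 0.1])
mu, var = lagna_step_stats(x, A, dt=0.01)
x_next = mu + np.sqrt(var) * np.random.randn(3)  # simulated observation
loss = gaussian_nll(x_next, mu, var)
```

Training drives the module parameters so that this negative log-likelihood, averaged over observation pairs, is minimized.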
Learning the stochastic dynamics of signed and weighted networks
Signed and weighted networks are prevalent in various biological and physical systems. In neuronal systems, for instance, the synapses between neurons can be either excitatory, enhancing the activity of the receiving neuron, or inhibitory, reducing activity. In physical systems like power grids and traffic networks, link weights play a crucial role in system characterization. The combined effect of heterogeneous links in these networks makes interaction intricate and poses challenges in dynamics inference.
To address the challenge of interaction heterogeneity and validate the effectiveness of our framework, we conduct simulations of a stochastic system with Hindmarsh-Rose (HR) neuronal dynamics on a signed network34 (refer to Supplementary Information Section II-B.3). In the simulations, we randomly assign half of the nodes as excitatory and the other half as inhibitory. The links from excitatory nodes show excitability with Vsyn = 2, while the links from inhibitory nodes show inhibition with Vsyn = − 1.5. To infer the hidden HR dynamics, we incorporate the knowledge of link types and utilize two NNs for estimating excitatory and inhibitory interaction dynamics, respectively. It is worth noting that we use only one trial of the nodes’ activity sequence. The results in Fig. 2b–d show that our framework accurately estimates the terms of self, diffusion, and, notably, the two types of interactions. The inferred SDEs successfully reproduce the force field (Fig. 2e) and the stochastic trajectory (Fig. 2f). Additionally, we consider a weighted network Aij, where 0 ≤ Aij ≤ 1, and further simulate the network dynamics using stochastic Rössler equations35 (refer to Supplementary Information Section II-B.2). The results in Fig. 2h–j show that our framework accurately infers the stochastic Rössler dynamics on weighted networks. The trajectory generated by the inferred SDEs exhibits similar dynamical characteristics compared to the original trajectory (Fig. 2g, l), and the reproduced force field closely aligns with the true force field (Fig. 2k).
To underscore the significance of our LaGNA method in inferring the stochastic dynamics of complex networked systems, we conduct comparisons between LaGNA and five state-of-the-art methods, namely Modified-SINDy29, Two-Phase inference17, SDE-net25, SVISE26 and SFI28, utilizing the stochastic Lorenz networked system (refer to Supplementary Information Section II-B.1). In this model system, each node’s state is represented by a three-dimensional vector xi = (xi,1, xi,2, xi,3), where the intrinsic stochastic diffusion in one dimension (e.g., xi,2) can be influenced by another (e.g., xi,3). The stochastic intensity is denoted by \(1/\sqrt{\gamma }\), with a smaller γ indicating higher stochasticity. As shown in Fig. 2m, among the evaluated methods, LaGNA demonstrates significantly reduced errors in inferring the networked SDE, outperforming the other methods by two orders of magnitude. Although SDE-Net, SVISE, and SFI can effectively estimate drift and diffusion, explicit expressions for the networked SDEs remain elusive for the three methods. This challenge arises from the fact that the drift field of networked SDEs encompasses both self-dynamics and interaction effects. In contrast, LaGNA incorporates three specifically designed neural network modules, especially the message-passing module defined on links to capture the interaction dynamics, which together distinguish between self, interaction, and diffusion effects in observation data. This distinction enables accurate learning of networked SDEs from stochastic trajectories, effectively overcoming the limitations of previous methods (see Supplementary Information Section V for further comparisons).
Note that there is an interesting method recently introduced for learning macroscopic dynamical descriptions of stochastic dissipative systems27. However, the objective of this method differs from ours, as it aims to capture coarse-grained macroscopic behavior rather than the node-level microscopic dynamics required for our exploration of the real complex systems in the following sections. Due to the disparity in objectives and outputs, this method is not included in the comparison tests.
Learning the dynamics of empirical bird flocks
Collective motions and swarming are fascinating phenomena widely observed in nature3,36,37, such as bird and fish flocks, cell motions, and bacteria colonies. Understanding how individuals interact when large numbers of individuals move together in groups without colliding, or even when they perform tasks together, like hunting3,38, has been a topic of widespread discussion. The prevailing consensus on this phenomenon is that the condensation of individuals results from birds being consistent with their neighbors regarding speed, also known as alignment, as well as their tendency to maintain a close distance while avoiding collisions, also known as cohesion3,39.
To discover the underlying dynamics from the flocking trajectories, we extend the internal architecture of the LaGNA based on the above hypothesis. Specifically, we implement a second-order version by setting up three specialized NNs for simulating the self-propulsion, cohesion, and alignment respectively. We modify the loss function as the summation of negative log-likelihood loss \({{{{{{\mathcal{L}}}}}}}_{{{{{{\rm{nl}}}}}}}\) and three prediction errors:
$$\mathcal{L}=\beta_{1}\mathcal{L}_{\mathrm{nl}}+\beta_{2}\mathcal{L}_{\mathrm{r}}+\beta_{3}\mathcal{L}_{\mathrm{v}}+\beta_{4}\mathcal{L}_{\mathrm{a}}\qquad (3)$$
where β1, β2, β3, β4 are hyperparameters balancing the different parts of the loss, and \({{{{{{\mathcal{L}}}}}}}_{{{{{{\rm{r}}}}}}}\), \({{{{{{\mathcal{L}}}}}}}_{{{{{{\rm{v}}}}}}}\) and \({{{{{{\mathcal{L}}}}}}}_{{{{{{\rm{a}}}}}}}\) are the squared errors between the predicted and true displacements, velocities, and accelerations, respectively. To validate the effectiveness of the extended framework, we generate a 20-bird flocking system with the 3-dimensional Vicsek model. The results show that our framework accurately estimates the self-propulsion, cohesion, and alignment strengths of the Vicsek model system, as shown in Fig. 3a–e. The inferred second-order SDEs successfully regenerate the collective behaviors, as detailed in Supplementary Information Section II-B.4.
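For intuition, a second-order Vicsek-like flock of the kind used for this validation can be integrated with the Euler–Maruyama scheme. The interaction kernels and coefficients below are illustrative choices, not the paper's fitted values, and the simulation is 2D rather than 3D:

```python
import numpy as np

def simulate_flock(n=20, steps=500, dt=0.01, seed=0):
    """Euler-Maruyama integration of a toy second-order Vicsek-like flock:
    dv_i = [self-propulsion + cohesion + alignment] dt + eps dW_t,
    dr_i = v_i dt. Coefficients are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    alpha, v0, eps = 1.0, 1.0, 0.05
    r = rng.normal(0.0, 1.0, (n, 2))                      # positions
    v = rng.normal(0.0, 0.1, (n, 2)) + np.array([1.0, 0.0])  # velocities
    for _ in range(steps):
        # Self-propulsion: relax each speed toward the preferred speed v0.
        dv = alpha * (v0 ** 2 - np.sum(v ** 2, axis=1, keepdims=True)) * v
        for i in range(n):
            rij = r - r[i]                     # vectors toward neighbors
            d = np.linalg.norm(rij, axis=1)
            mask = d > 1e-9                    # exclude self (d = 0)
            unit = rij[mask] / d[mask][:, None]
            # Cohesion: attract beyond unit distance, repel within it.
            dv[i] += (0.1 * (d[mask][:, None] - 1.0) * unit).sum(axis=0)
            # Alignment: match neighbors' velocities, decaying with distance.
            dv[i] += (0.5 * np.exp(-d[mask][:, None]) * (v[mask] - v[i])).sum(axis=0)
        v = v + dv * dt + eps * np.sqrt(dt) * rng.normal(size=(n, 2))
        r = r + v * dt
    return r, v

r, v = simulate_flock()
# Polar order parameter: close to 1 when all velocities are aligned.
order = np.linalg.norm(v.mean(axis=0)) / np.mean(np.linalg.norm(v, axis=1))
```

Pairs of trajectories generated this way (positions plus velocities) are exactly the kind of input the extended second-order framework consumes.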
Furthermore, we apply our extended second-order SDE inference framework to learn the empirical dynamics of bird flocks. The dataset consists of four sets of homing flights, which were collected from homing pigeons equipped with GPS devices40. The pigeons were released approximately 15 km away from their home loft, and the GPS devices recorded their location during the return journey at a sampling rate of 0.2 seconds. Because there is less variation in the vertical direction of the bird flocks, here we primarily focus on the movement in the horizontal plane and perform the following data preprocessing steps: spline interpolation with a sampling rate of 0.01 for data augmentation; normalization of the data; extraction of the time period after takeoff and before descent when collective behaviors are most prominent; and alignment of the coordinates. Note that some pigeons exhibited outliers, so their data was removed. In the end, the first flock set contains 8 pigeons, and the other three sets contain 7 pigeons each.
Using the extended framework, we successfully learn the self-propulsion, alignment, and cohesion parts based on one of the four flock datasets. The estimated strengths exhibit a close correspondence with specific scaled functions. Specifically, the alignment strength matches with \(\hat{{{{{{\mathcal{A}}}}}}}={a}_{1}(\exp (-{r}_{ij}/3)+{a}_{2})+{a}_{3}\), the cohesion strength matches with \(\hat{{{\mathcal{C}}}}=c_{1}((r_{ij}/2-1)^{3}/(r_{ij}/2+1)^{6}+c_{2})+c_{3}\), and the self-propulsion strength is s1(∣vi∣2 + s2) + s3 (refer to Supplementary Information Section III-A). Therefore, the inferred SDE is
where \({{{{{{\bf{v}}}}}}}_{i}=\dot{{{{{{{\bf{r}}}}}}}_{i}}\), rij = rj − ri, vij = vj − vi, rij = ∣rij∣, \({{{{{{\bf{W}}}}}}}_{t} \sim \, {{{{{\mathcal{N}}}}}}(0,\,{{\mbox{d}}}\,t)\) representing the Wiener process with mean zero and variance dt with dt = 0.01, and \(\hat{{{{{{\boldsymbol{\epsilon }}}}}}}\) is the estimated intensity of stochasticity. The reproduced force field exhibits consistency with the actual field for a substantial duration across each flock system, and long-term predictions reveal a diverse range of behaviors, as depicted in Fig. 3g–j. To assess the generalizability of the inferred SDE, we employ it to describe three other datasets that were not used in training. We observe that by solely fine-tuning the scaling coefficients without altering the equation form, Eq. (4) is able to effectively capture the underlying dynamic mechanism of the collective behaviors exhibited in these three datasets, as illustrated in Fig. 3k–p. The hyperparameters and coefficients are shown in Supplementary Information Table 5.
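As a sketch, the recovered strength functions can be evaluated directly. The coefficients `a1..a3` and `c1..c3` below are placeholders (the fitted per-flock values are listed in Supplementary Table 5):

```python
import numpy as np

# Placeholder coefficients; the fitted values differ per flock dataset.
a1, a2, a3 = 1.0, 0.1, 0.0
c1, c2, c3 = 1.0, 0.0, 0.0

def alignment_strength(r):
    """A(r) = a1*(exp(-r/3) + a2) + a3: decays with pairwise distance."""
    return a1 * (np.exp(-r / 3) + a2) + a3

def cohesion_strength(r):
    """C(r) = c1*((r/2 - 1)^3 / (r/2 + 1)^6 + c2) + c3: with c2 = c3 = 0,
    negative (repulsive) below r = 2 and positive (attractive) above it."""
    return c1 * ((r / 2 - 1) ** 3 / (r / 2 + 1) ** 6 + c2) + c3

r = np.linspace(0.1, 10, 200)
A_vals = alignment_strength(r)
C_vals = cohesion_strength(r)
```

With these placeholder coefficients the cohesion term changes sign at r = 2, i.e., birds repel when too close and attract when too far, while alignment is strongest for nearby neighbors.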
The renowned Vicsek model has long served as a staple in flocking dynamics research, often regarded as a simplistic representation. Our finding offers unprecedented evidence that the Vicsek model transcends its toy model status, effectively encapsulating authentic flocking dynamics. Remarkably, Eq. (4) is autonomously inferred from the observation data, devoid of preconceived assumptions about its structure. Consequently, the striking resemblance of the inferred SDE to the second-order Vicsek model41,42 unveils new perspectives for understanding and modeling the collective behaviors of real flocks.
Learning the spreading dynamics of tau pathology in mouse brain
Tau proteins play a crucial role in maintaining the stability of axon microtubules, which is essential for the proper functioning of the brain43. However, in Alzheimer’s disease (AD), misfolded and hyperphosphorylated tau proteins lose their ability to bind microtubules properly, leading to their accumulation as neurofibrillary tangles, a hallmark of the disease44. Previous experimental studies have shown that in the early stages of AD, pathological tau spreads from the transentorhinal cortex to other areas of the limbic and neocortical regions, suggesting a spread along neuroanatomical connections45. There is also evidence indicating that tau can be released into the extracellular space, either as free tau or in vesicles such as exosomes46,47. As the second empirical system, here we apply our proposed inference framework to the observed tau pathology data to identify the hidden governing equation of the spreading dynamics contributed by both neuroanatomical connections and spatial proximity due to extracellular diffusion.
To obtain the tau spreading data, biologists injected nontransgenic (NTG) mice in five specific injection sites with paired helical filament (PHF) tau extracted from the hippocampus and overlaying cortex. The injected mice were euthanized at different time points (1, 3, 6, and 9 months after injection) to obtain pseudo-longitudinal data48,49. The brain sections of each mouse were then stained to label the percentage of infected areas in different brain regions, as shown in Fig. 4a. We consider the bidirectional diffusion of tau pathology along neuroanatomical connections, with retrograde (from terminals to the cell body) and anterograde (from the cell body to terminals) directions49,50. We also take into account the influence of geographical distance on diffusion, as shown in Fig. 4b. This leads to a total of n = 160 brain regions, with time-dependent percentages of the area occupied by tau pathology denoted as y(t). We have neuroanatomical and Euclidean weighted adjacency matrices A and D, respectively, with the anterograde matrix represented as AT.
By applying our framework, we infer the governing equation of the tau pathology diffusion dynamics as follows:
Here, \({D}_{ij}=1/\log ({E}_{ij}^{2})\), where matrix E represents the actual Euclidean distance between different regions, and the term T(t) = ct + 1.5t includes the trainable parameter ct that captures the varying propagation rate over time. The elements in D less than 0.11 are set to zero, meaning that there is no immediate spatial diffusion between two regions that are far apart (refer to Supplementary Information Section III-B). The binary matrix \(\tilde{A}\) has elements \({\tilde{A}}_{ij}=1\) when Aij > 0, and \({\tilde{A}}_{ij}=0\) otherwise; the same applies to the binary matrix \(\tilde{D}\). In the initial state y(0), only the five sites that were injected with a volume of 1 μg of tau are set to a value of 1 and the rest are zero, consistent with previous work49. The weights b0, b1, b2 and b3 are heterogeneous factors assigned to different regions51. The term σ represents the stochastic noise in the system. It is worth mentioning that the three terms on the right side of the inferred equation correspond to retrograde, anterograde, and spatial diffusion, respectively, which demonstrates the biological interpretability of our inference result.
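The construction of the thresholded spatial-proximity matrix D can be sketched as follows; the distance matrix `E` here is a toy example with hypothetical values, not the mouse-brain data:

```python
import numpy as np

def spatial_proximity_matrix(E, threshold=0.11):
    """Build the spatial-diffusion weights D_ij = 1 / log(E_ij^2) from a
    Euclidean distance matrix E, zeroing entries below the threshold so
    that distant regions do not couple directly."""
    with np.errstate(divide="ignore"):
        D = 1.0 / np.log(E ** 2)      # diagonal (E_ii = small) handled below
    np.fill_diagonal(D, 0.0)          # no self-diffusion term
    D[D < threshold] = 0.0            # prune weak long-range couplings
    Dbin = (D > 0).astype(float)      # binary mask, analogous to D-tilde
    return D, Dbin

# Toy pairwise distances between 4 regions (values are illustrative).
E = np.array([[1.0,   3.0,  20.0, 200.0],
              [3.0,   1.0,   5.0,  30.0],
              [20.0,  5.0,   1.0,   8.0],
              [200.0, 30.0,  8.0,   1.0]])
D, Dbin = spatial_proximity_matrix(E)
```

Because 1/log(E²) decreases slowly with distance, the 0.11 cutoff only removes couplings between genuinely remote region pairs (here, the pair at distance 200).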
Due to the stochastic nature of pathology propagation in brains, spreading patterns emerge only after an early stage whose duration fluctuates. As shown in Fig. 4c,d, the inferred equation adeptly predicts the tau diffusion at 6 and 9 months post-injection (MPI) given the injection sites. To validate the prediction’s specificity to the injection sites, we assess its performance against 500 randomly selected initial sets comprising five regions each. The results affirm that injections with the experimental seed regions yield the highest accuracy across all time points (Fig. 4e). Furthermore, we evaluate a degenerate model that treats each brain region equally. The outcomes demonstrate reduced predictability of the degenerate model (Fig. 4f), underscoring the significant influence of regional heterogeneity on tau pathology diffusion.
Finally, we apply our method to infer the tau diffusion dynamics in mice with LRRK2G2019S mutation. This mutation is the most common cause of familial Parkinson’s disease and a common risk factor for idiopathic Parkinson’s disease. Mice carrying this mutation exhibit altered tau pathology patterns, but, intriguingly, the inferred equation that accurately delineates tau diffusion in mutant mice (Fig. 4g) shares the same form as Eq. (5). The remarkable distinction lies in the observation that while tau pathology diffusion in NTG mice lacks a directional preference, the diffusion in mutated mice exhibits a pronounced inclination towards the retrograde direction. This preference is quantified by the absolute average value of coefficient b1 of the retrograde direction ranging from 0.7–0.9, contrasting with b2 for the anterograde direction, which falls within the range of 0.1–0.3. These results align with a recent experiment50.
Our discovery offers new insights into tau pathology. Firstly, the inferred Eq. (5) holds promise in generally capturing tau pathology dynamics in brains. Secondly, the results shed light on the significance of spatial diffusion, a factor overlooked in previous studies, indicating its non-negligible impact on tau pathology. Lastly, the delineation of coefficients for retrograde and anterograde diffusion terms underscores the distinct tau pathology dynamics in mutant mice. These findings collectively enhance our understanding of tau pathology mechanisms.
Discussion
Inferring the governing equations of complex systems from observation data is a crucial step toward automating scientific discovery. Previous studies have primarily focused on benchmarking algorithms on model systems with known ground truths. In contrast, our work delves into two important real-world systems and successfully distills their concealed networked SDEs. This not only showcases the applicability of our approach but also generates novel insights for understanding the mechanisms hidden in empirical flocking and tau pathology diffusion. Importantly, our LaGNA method requires only one trial of nodes’ activity sequence and only snapshots rather than continuous time series data, enhancing its flexibility and adaptability to other real scenarios with the aid of inductive bias.
While LaGNA demonstrates superior performance compared to previous state-of-the-art methods and provides valuable insights into real complex systems, it does have limitations that necessitate further attention in future research. Firstly, in some scenarios, the activity time series of certain nodes may be inaccessible. Therefore, it is worth determining the minimal sub-network structure required to unveil the system dynamics52,53,54. Real data from stochastic systems often exhibit a combination of intrinsic stochasticity and extrinsic noise, with the latter arising from measurement errors. Distinguishing between these types of noise poses significant challenges55,56,57. Without prior knowledge of the dominant source of noise, we treat all noises as intrinsic in this work, and LaGNA demonstrates accurate inference when the relative strength of extrinsic noise is below 10%. When extrinsic noise is more pronounced, a preprocessing step of denoising, such as the Kalman-Takens filter57, enhances inference capability (see Supplementary Information Sections V-B and V-C). However, future efforts are needed to better address extrinsic noises in data.
Secondly, while many real complex networks have been successfully mapped in the past, obtaining the topological data of a network may not always be feasible in certain scenarios. In such cases, there is a need to infer both network topology and system dynamics. Recent commendable efforts have been made to address this challenge18,58,59, yet they either require activity data from many trials with different initial states18 or learn for dynamics prediction rather than inference58,59. Simultaneously inferring both the dynamical equation and network topology of a large real system using a limited amount of experimentally feasible data remains a challenging task.
Thirdly, while the pre-constructed libraries in the second stage of LaGNA can contain a large number of orthogonal or non-orthogonal elementary function terms, it is still possible that the use of pre-constructed libraries may overlook certain terms. Symbolic regression, an alternative method that does not rely on pre-constructed libraries, faces challenges in higher-dimensional settings. Thus, further efforts are needed to enhance the automation of current methods.
Fourthly, there has been considerable interest in higher-order interactions within complex systems in recent years59,60,61. LaGNA can be extended to accommodate higher-order systems by incorporating additional terms such as \({\sum }_{j,k}{A}_{i,j,k}{{{{{\bf{h}}}}}}({{{{{{\bf{x}}}}}}}_{i}(t),{{{{{{\bf{x}}}}}}}_{j}(t),{{{{{{\bf{x}}}}}}}_{k}(t))\) into the interaction part of Eq. (1), where Ai,j,k represents the third-order network and the function h denotes the third-order interaction dynamics. Yet this extension will increase the complexity of identifying an optimal equation, presenting a promising avenue for future efforts to address.
Methods
Loss function of LaGNA
Consider a complex networked system whose dynamics are governed by stochastic differential equations (SDEs)
$$\mathrm{d}\mathbf{x}_{i}(t)=\Big[\mathbf{F}(\mathbf{x}_{i}(t))+\sum_{j=1}^{n}A_{ij}\,\mathbf{G}(\mathbf{x}_{i}(t),\mathbf{x}_{j}(t))\Big]\mathrm{d}t+\boldsymbol{\Phi}(\mathbf{x}_{i}(t))\,\mathrm{d}\mathbf{W}_{t}\qquad (6)$$
Here, xi(t) represents the d-dimensional state of node i at time t; A is the adjacency matrix of size n × n, where Aij denotes the influence from node j to i; \({{{{{\bf{F}}}}}}\equiv {({F}_{1}({{{{{{\bf{x}}}}}}}_{i}),\, {F}_{2}({{{{{{\bf{x}}}}}}}_{i}),\ldots,\, {F}_{d}({{{{{{\bf{x}}}}}}}_{i}))}^{{{{{{\rm{T}}}}}}}\) and \({{{{{\bf{G}}}}}}\equiv {({G}_{1}({{{{{{\bf{x}}}}}}}_{i},{{{{{{\bf{x}}}}}}}_{j}),{G}_{2}({{{{{{\bf{x}}}}}}}_{i},{{{{{{\bf{x}}}}}}}_{j}),\ldots,{G}_{d}({{{{{{\bf{x}}}}}}}_{i},{{{{{{\bf{x}}}}}}}_{j}))}^{{{{{{\rm{T}}}}}}}\) are nonlinear functions representing the self and interaction dynamics, respectively; Φ(xi(t)) is the positive-definite diffusion matrix of size d × d, and Wt is a d-dimensional vector representing the Wiener process with mean zero and variance dt25. Note that, by choosing different F and G, Eq. (6) can describe a wide range of systems dynamics32,33.
For simplicity, let’s consider first the case d = 1. Given x(t) and dt, x(t + dt) can be considered as points drawn from the normal distribution
$$x_{i}(t+\mathrm{d}t)\sim\mathcal{N}\big(\mu_{i}(t),\,\sigma_{i}^{2}(t)\big)\qquad (7)$$
where \({\mu }_{i}(t)={x}_{i}(t)+[F({x}_{i}(t))+{\sum }_{j=1}^{n}{A}_{ij}G({x}_{i}(t), \ {x}_{j}(t))]{{\mbox{d}}}t\), and \({\sigma }_{i}^{2}(t)={\Phi }^{2}({x}_{i}(t)){{\mbox{d}}}t\). To train the network end to end, we use all nodes’ states at time t, x(t), as inputs. Based on the network topology Aij, we map the information flow from node j to node i using a function g(xi(t), xj(t)). The estimated information values are then aggregated element-wise for each receiving node over all respective sending nodes. Additionally, we map the self-dynamics of each node i using a function f(xi(t)). The estimated mean and variance of node i’s activity distribution can be written as \({\tilde{\mu }}_{i}(t)={x}_{i}(t)+[f({x}_{i}(t))+{\sum }_{j=1}^{n}{A}_{ij}g({x}_{i}(t), \ {x}_{j}(t))]{{\mbox{d}}}t\) and \({\tilde{\sigma }}_{i}^{2}(t)={\phi }^{2}({x}_{i}(t)){{\mbox{d}}}t\) respectively. The functions f, g, and ϕ are determined by the trainable parameters θf, θg, and θϕ, respectively.
To obtain the optimal parameters, we train the model in Fig. 1b by maximizing the likelihood between the true and estimated distributions. Since the true distribution is inaccessible and only the next-step state is available, we instead maximize the following expectation via maximum likelihood estimation (MLE):
$$(\boldsymbol{\theta}_{f}^{*},\boldsymbol{\theta}_{g}^{*},\boldsymbol{\theta}_{\phi}^{*})=\arg\max_{\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{g},\boldsymbol{\theta}_{\phi}}\;\mathbb{E}\Big[\log p_{\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{g},\boldsymbol{\theta}_{\phi}}\big(x_{i}(t+\mathrm{d}t)\,\big|\,x_{i}(t)\big)\Big]\qquad (8)$$
where \({p}_{{{{{{{\boldsymbol{\theta }}}}}}}_{f},{{{{{{\boldsymbol{\theta }}}}}}}_{g},{{{{{{\boldsymbol{\theta }}}}}}}_{\phi }}\) represents the probability density of the normal distribution generated by the model of Fig. 1b with parameters θf, θg, θϕ, i.e.,
$$p_{\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{g},\boldsymbol{\theta}_{\phi}}\big(x_{i}(t+\mathrm{d}t)\,\big|\,x_{i}(t)\big)=\frac{1}{\sqrt{2\pi\tilde{\sigma}_{i}^{2}(t)}}\exp\!\left(-\frac{\big(x_{i}(t+\mathrm{d}t)-\tilde{\mu}_{i}(t)\big)^{2}}{2\tilde{\sigma}_{i}^{2}(t)}\right)\qquad (9)$$
Maximizing the likelihood in Eq. (9) is equivalent to minimizing the negative log-likelihood using the estimated \({\tilde{\mu }}_{i}(t)\) and \({\tilde{\sigma }}_{i}^{2}(t)\), i.e.,
$$-\log p_{\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{g},\boldsymbol{\theta}_{\phi}}\big(x_{i}(t+\mathrm{d}t)\,\big|\,x_{i}(t)\big)=\frac{1}{2}\log\big(2\pi\tilde{\sigma}_{i}^{2}(t)\big)+\frac{\big(x_{i}(t+\mathrm{d}t)-\tilde{\mu}_{i}(t)\big)^{2}}{2\tilde{\sigma}_{i}^{2}(t)}\qquad (10)$$
Here, the constant coefficients and terms can be omitted, hence the loss function becomes
$$\mathcal{L}_{i}=\log\tilde{\sigma}_{i}^{2}(t)+\frac{\big(x_{i}(t+\mathrm{d}t)-\tilde{\mu}_{i}(t)\big)^{2}}{\tilde{\sigma}_{i}^{2}(t)}\qquad (11)$$
For a training dataset containing n observed nodes, the expectation becomes
$$\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}\left[\log\tilde{\sigma}_{i}^{2}(t)+\frac{\big(x_{i}(t+\mathrm{d}t)-\tilde{\mu}_{i}(t)\big)^{2}}{\tilde{\sigma}_{i}^{2}(t)}\right]\qquad (12)$$
For the case d > 1, the negative logarithm of a multivariate normal distribution can be written as
$$-\log p\big(\mathbf{x}_{i}(t+\mathrm{d}t)\,\big|\,\mathbf{x}_{i}(t)\big)=\frac{1}{2}\log\det\boldsymbol{\Sigma}(t)+\frac{1}{2}\big(\mathbf{x}_{i}(t+\mathrm{d}t)-\tilde{\boldsymbol{\mu}}_{i}(t)\big)^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(t)\big(\mathbf{x}_{i}(t+\mathrm{d}t)-\tilde{\boldsymbol{\mu}}_{i}(t)\big)+\frac{d}{2}\log(2\pi)\qquad (13)$$
where Σ(t) is a positive semidefinite matrix.
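For d = 1, the simplified training loss described above differs from the full Gaussian negative log-likelihood only by a constant factor and offset, which do not affect the optimal parameters. This can be checked numerically on synthetic values (all data below are randomly generated stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
mu = rng.normal(0.0, 1.0, n)        # estimated means  mu~_i(t)
var = rng.uniform(0.5, 2.0, n)      # estimated variances sigma~_i^2(t)
x_next = mu + np.sqrt(var) * rng.normal(size=n)  # observed next states

# Full Gaussian negative log-likelihood per node.
nll_full = 0.5 * np.log(2 * np.pi * var) + (x_next - mu) ** 2 / (2 * var)

# Simplified loss: drop the additive constant log(2*pi)/2 and the
# overall factor 1/2, neither of which changes the minimizer.
loss_simple = np.log(var) + (x_next - mu) ** 2 / var

# The two differ by an affine transform: nll_full = 0.5*loss_simple + const.
const = 0.5 * np.log(2 * np.pi)
```

Since an affine transform preserves argmin, minimizing the simplified loss is equivalent to maximizing the likelihood.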
Inference of self, interaction, and diffusion parts
After the well-trained model of Fig. 1b separates the self, interaction, and diffusion parts, we adopt the core idea of the two-phase inference approach proposed by us previously17 to infer the concise form for each part. Specifically, with three pre-constructed extensive libraries LF, LG, and LΦ that contain widely used elementary functions (see Supplementary Information Section I-B), we introduce the time series data xi(t), where i ∈ n, into LF, LG, and LΦ, and obtain the time-varying matrices ΘF(t) ≡ LF(xi(t)), ΘG(t) ≡ LG(xi(t), xj(t)), and ΘΦ(t) ≡ LΦ(xi(t)). Then, the inference problem can be formulated using the estimated values as follows:
$$\hat{\mathbf{f}}(\mathbf{x}_{i}(t))=\tilde{\Theta}_{F}(t)\,\boldsymbol{\xi}_{F},\qquad \hat{\mathbf{g}}(\mathbf{x}_{i}(t),\mathbf{x}_{j}(t))=\tilde{\Theta}_{G}(t)\,\boldsymbol{\xi}_{G},\qquad \hat{\boldsymbol{\phi}}(\mathbf{x}_{i}(t))=\tilde{\Theta}_{\Phi}(t)\,\boldsymbol{\xi}_{\Phi}\qquad (14)$$
Here \({\tilde{\Theta }}_{F}\equiv {\Theta }_{F}\bigotimes {I}_{d}\), \({\tilde{\Theta }}_{G}\equiv {\Theta }_{G}\bigotimes {I}_{d}\), and \({\tilde{\Theta }}_{\Phi }\equiv {\Theta }_{\Phi }\bigotimes {I}_{d}\), where ⨂ is the Kronecker product and Id is the d × d identity matrix. Therefore, the objective is to find appropriate sparse coefficients ξF, ξG, and ξΦ, in which most elements are zero and only the coefficients of highly relevant elementary functions are non-zero, such that Eq. (14) closely matches the observed data.
The first phase involves global regression to find a few of the most relevant elementary functions for each part, based on the optimization formulas:
where λF, λG and λϕ are hyper-parameters that regulate the sparsity of the coefficients. In the implementation, we use the least absolute shrinkage and selection operator (LASSO) with five-fold cross-validation to determine the optimal hyper-parameters. Through global regression, we obtain the degree of relevance between each elementary function in the libraries and the hidden dynamics, significantly reducing the model space.
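The global regression phase can be sketched with scikit-learn's LassoCV, whose `cv=5` option performs the five-fold cross-validation mentioned above. The candidate library and the synthetic target below are hypothetical stand-ins for LF and the separated self-dynamics, not the paper's actual libraries:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=500)

# Hypothetical library Theta_F: candidate elementary functions evaluated on x.
Theta = np.column_stack([x, x**2, x**3, np.sin(x), np.exp(x)])

# Synthetic "observed" self-dynamics with a small noise floor.
y = 1.5 * x - 0.8 * x**3 + 0.01 * rng.standard_normal(500)

# LASSO with five-fold cross-validation selects the sparsity penalty lambda.
lasso = LassoCV(cv=5).fit(Theta, y)
```

The magnitudes of `lasso.coef_` then serve as the relevance degrees of the library terms, shrinking the model space before the second phase.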
Next, we utilize the second phase to identify the minimal number of elementary functions for the self, interaction, and diffusion parts, respectively, which constitute the final stochastic differential equation. To do so, we add the most relevant elementary functions one by one according to the relevance degree obtained in the first phase. We use the metric \({\kappa }^{2}=1-\frac{{\sum }_{i}{({\hat{y}}_{i}-{y}_{i})}^{2}}{{\sum }_{i}{({y}_{i}-\overline{y})}^{2}}\) to indicate the regression score of a temporary combination of elementary functions. The more accurate the current equation, the closer κ2 is to 1. Here, \({\hat{y}}_{i}\), \({y}_{i}\), and \(\overline{y}\) are the prediction, the true value, and the mean of the true values, respectively. As we sequentially add the relevant elementary functions into the equation, the metric κ2 changes accordingly. The minimal number of elementary functions for each part is determined when adding more elementary functions does not increase, or even decreases, the value of κ2, as shown in Fig. 1e–g.
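A minimal sketch of this second phase, assuming a precomputed library matrix and a relevance ordering from the first phase; the stopping tolerance `tol` is an illustrative choice:

```python
import numpy as np


def kappa2(y_true, y_pred):
    """Regression score kappa^2 = 1 - SS_res / SS_tot (same form as R^2)."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def select_minimal_terms(Theta, y, order, tol=1e-3):
    """Add library columns one by one in relevance order; stop when kappa^2
    no longer improves by more than tol. Refit by least squares at each step."""
    best, chosen = -np.inf, []
    for idx in order:
        cols = chosen + [idx]
        coef, *_ = np.linalg.lstsq(Theta[:, cols], y, rcond=None)
        score = kappa2(y, Theta[:, cols] @ coef)
        if score - best <= tol:
            break  # adding this term does not increase kappa^2
        best, chosen = score, cols
    return chosen, best
```

For a target built from two of three candidate terms, the sketch recovers exactly those two and stops, mirroring the plateau of κ2 in Fig. 1e–g.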
Quantification of inference inaccuracy
The goal of our study is to infer the mathematical equation that describes the dynamics underlying a complex system, rather than only to predict the system's future states. To quantify the difference between the inferred and true equations, we use the symmetric mean absolute percentage error (sMAPE):
\[{{\mbox{sMAPE}}}=\frac{1}{k}{\sum }_{i=1}^{k}\frac{|{I}_{i}-{R}_{i}|}{|{I}_{i}|+|{R}_{i}|}.\]
Here, k is the cardinality of the set containing the inferred and true terms, and Ii and Ri are the inferred and true coefficients of each term, respectively. The value of sMAPE lies within the interval [0, 1]; the smaller the sMAPE, the more accurate the inference result. Importantly, sMAPE is sensitive to both false negative and false positive errors: if the inferred equation contains a term that should not be there, or misses a term that should be there, the value of sMAPE increases significantly.
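A direct translation of this metric, assuming the inferred and true coefficient vectors are aligned over the union of terms, with a coefficient of zero on one side for a missing or spurious term:

```python
import numpy as np


def smape(inferred, true):
    """Symmetric mean absolute percentage error over k aligned terms.
    A term present on only one side (coefficient 0 on the other) contributes
    the maximal per-term error of 1, so false positives/negatives are penalized."""
    inferred = np.asarray(inferred, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.mean(np.abs(inferred - true) / (np.abs(inferred) + np.abs(true)))
```

For example, a spurious term inferred with coefficient 0 versus a true coefficient of 2 contributes a per-term error of 1, raising the average sharply.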
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The codes for generating the simulation data in this study are deposited in the public GitHub repository51. The real data of bird flocks40 and tau pathology49 are shared via the link https://doi.org/10.6084/m9.figshare.24804894.v4. Source data are provided with this paper.
Code availability
The codes are available in the public GitHub repository (https://github.com/Ting-TingGao/Network-SDE-Inference.git) and on Zenodo (https://doi.org/10.5281/zenodo.12112887)51.
References
Brückner, D. B. et al. Stochastic nonlinear dynamics of confined cell migration in two-state systems. Nat. Phys. 15, 595–601 (2019).
Ji, F., Wu, Y., Pumera, M. & Zhang, L. Collective behaviors of active matter learning from natural taxes across scales. Adv. Mater. 35, 2203959 (2023).
Vicsek, T. & Zafeiris, A. Collective motion. Phys. Rep. 517, 71–140 (2012).
Shahrezaei, V. & Swain, P. S. The stochastic nature of biochemical networks. Curr. Opin. Biotechnol. 19, 369–374 (2008).
Acar, M., Mettetal, J. T. & Van Oudenaarden, A. Stochastic switching as a survival strategy in fluctuating environments. Nat. Genet. 40, 471–475 (2008).
Rolls, E. T. & Deco, G. The Noisy Brain: Stochastic Dynamics as a Principle of Brain Function (Oxford University Press, Oxford, 2010).
Mendonça, P. R. et al. Stochastic and deterministic dynamics of intrinsically irregular firing in cortical inhibitory interneurons. eLife 5, e16475 (2016).
Palmer, T. Stochastic weather and climate models. Nat. Rev. Phys. 1, 463–471 (2019).
Grilli, J. Macroecological laws describe variation and diversity in microbial communities. Nat. Commun. 11, 4743 (2020).
Georgescu, I. How machines could teach physicists new scientific concepts. Nat. Rev. Phys. 4, 736–738 (2022).
Krenn, M. et al. On scientific understanding with artificial intelligence. Nat. Rev. Phys. 4, 761–769 (2022).
Liu, Z. & Tegmark, M. Machine learning conservation laws from trajectories. Phys. Rev. Lett. 126, 180604 (2021).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).
Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. Adv. Neural Inf. Process. Syst. 33, 17429–17442 (2020).
Gao, T.-T. & Yan, G. Autonomous inference of complex network dynamics from incomplete and noisy data. Nat. Comput. Sci. 2, 160–168 (2022).
Zhang, Y. et al. Universal framework for reconstructing complex networks and node dynamics from discrete or continuous dynamics data. Phys. Rev. E 106, 034315 (2022).
Rao, C. et al. Encoding physics to learn reaction–diffusion processes. Nat. Mach. Intell. 5, 765–779 (2023).
Gu, T. et al. Stochastic trajectory prediction via motion indeterminacy diffusion. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 17113–17122 (2022).
Tang, K., Ao, P. & Yuan, B. Robust reconstruction of the Fokker–Planck equations from time series at different sampling rates. EPL 102, 40003 (2013).
Bernhard, J. E., Moreland, J. S. & Bass, S. A. Bayesian estimation of the specific shear and bulk viscosity of quark–gluon plasma. Nat. Phys. 15, 1113–1117 (2019).
Mitra, E. D. & Hlavacek, W. S. Parameter estimation and uncertainty quantification for systems biology models. Curr. Opin. Syst. Biol. 18, 9–18 (2019).
Brückner, D. B., Ronceray, P. & Broedersz, C. P. Inferring the dynamics of underdamped stochastic systems. Phys. Rev. Lett. 125, 058103 (2020).
Dietrich, F. et al. Learning effective stochastic differential equations from microscopic simulations: linking stochastic numerics to deep learning. Chaos 33, 023121 (2023).
Course, K. & Nair, P. B. State estimation of a physical system with unknown governing equations. Nature 622, 261–267 (2023).
Chen, X. et al. Constructing custom thermodynamics using deep learning. Nat. Comput. Sci. 4, 66–85 (2024).
Frishman, A. & Ronceray, P. Learning force fields from stochastic trajectories. Phys. Rev. X 10, 021009 (2020).
Kaheman, K., Brunton, S. L. & Kutz, J. N. Automatic differentiation to simultaneously identify nonlinear dynamics and extract noise probability distributions from data. Mach. Learn.: Sci. Technol. 3, 015031 (2022).
Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? Int. Conf. Learn. Represent. (ICLR) (2019).
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D.-U. Complex networks: structure and dynamics. Phys. Rep. 424, 175–308 (2006).
Barzel, B. & Barabási, A.-L. Universality in network dynamics. Nat. Phys. 9, 673–681 (2013).
Meena, C. et al. Emergent stability in complex network dynamics. Nat. Phys. 19, 1033–1042 (2023).
Borges, F. et al. Inference of topology and the nature of synapses, and the flow of information in neuronal networks. Phys. Rev. E 97, 022303 (2018).
Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y. & Zhou, C. Synchronization in complex networks. Phys. Rep. 469, 93–153 (2008).
Cavagna, A. et al. Scale-free correlations in starling flocks. Proc. Natl Acad. Sci. USA 107, 11865–11870 (2010).
Katz, Y., Tunstrøm, K., Ioannou, C. C., Huepe, C. & Couzin, I. D. Inferring the structure and dynamics of interactions in schooling fish. Proc. Natl Acad. Sci. USA 108, 18720–18725 (2011).
Vásárhelyi, G. et al. Optimized flocking of autonomous drones in confined environments. Sci. Robot. 3, eaat3536 (2018).
Reynolds, C. W. Flocks, herds and schools: a distributed behavioral model. Comput. Graph. 21, 25–34 (1987).
Nagy, M., Ákos, Z., Biro, D. & Vicsek, T. Hierarchical group dynamics in pigeon flocks. Nature 464, 890–893 (2010).
Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I. & Shochet, O. Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett. 75, 1226 (1995).
Grégoire, G., Chaté, H. & Tu, Y. Moving and staying together without a leader. Physica D 181, 157–170 (2003).
Wang, Y. & Mandelkow, E. Tau in physiology and pathology. Nat. Rev. Neurosci. 17, 22–35 (2016).
Guo, J. L. et al. Unique pathological tau conformers from Alzheimer's brains transmit tau pathology in nontransgenic mice. J. Exp. Med. 213, 2635–2654 (2016).
Braak, H. & Braak, E. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol. 82, 239–259 (1991).
Wu, J. W. et al. Neuronal activity enhances tau propagation and tau pathology in vivo. Nat. Neurosci. 19, 1085–1092 (2016).
Asai, H. et al. Depletion of microglia and inhibition of exosome synthesis halt tau propagation. Nat. Neurosci. 18, 1584–1593 (2015).
Henderson, M. X. et al. Spread of α-synuclein pathology through the brain connectome is modulated by selective vulnerability and predicted by network analysis. Nat. Neurosci. 22, 1248–1257 (2019).
Cornblath, E. J. et al. Computational modeling of tau pathology spread reveals patterns of regional vulnerability and the impact of a genetic risk factor. Sci. Adv. 7, eabg6677 (2021).
Ramirez, D. M. et al. Endogenous pathology in tauopathy mice progresses via brain networks. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.541792 (2023).
Gao, T. LaGNA: Learning interpretable dynamics of stochastic complex systems. https://doi.org/10.5281/zenodo.12112887 (2024).
Casadiego, J., Nitzan, M., Hallerberg, S. & Timme, M. Model-free inference of direct network interactions from nonlinear collective dynamics. Nat. Commun. 8, 2192 (2017).
Shen, J., Liu, F., Tu, Y. & Tang, C. Finding gene network topologies for given biological function with recurrent neural network. Nat. Commun. 12, 3125 (2021).
Levina, A., Priesemann, V. & Zierenberg, J. Tackling the subsampling problem to infer collective properties from limited data. Nat. Rev. Phys. 4, 770–784 (2022).
Swain, P. S., Elowitz, M. B. & Siggia, E. D. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc. Natl Acad. Sci. USA 99, 12795–12800 (2002).
Lind, P. G. et al. Extracting strong measurement noise from stochastic time series: applications to empirical data. Phys. Rev. E 81, 041125 (2010).
Hamilton, F., Berry, T. & Sauer, T. Kalman–Takens filtering in the presence of dynamical noise. Eur. Phys. J. Spec. Top. 226, 3239–3250 (2017).
Prasse, B. & Van Mieghem, P. Predicting network dynamics without requiring the knowledge of the interaction graph. Proc. Natl Acad. Sci. USA 119, e2205517119 (2022).
Li, X. et al. Higher-order Granger reservoir computing: simultaneously achieving scalable complex structures inference and accurate dynamics prediction. Nat. Commun. 15, 2506 (2024).
Lambiotte, R., Rosvall, M. & Scholtes, I. From networks to optimal higher-order models of complex systems. Nat. Phys. 15, 313–320 (2019).
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
Acknowledgements
GY is supported by the National Natural Science Foundation of China (grants no. T2225022, no. 12161141016, no. 12350710786, and no. 62088101), STI2030 Major Project (grant no. 2021ZD0204500), Shanghai Municipal Science and Technology Major Project (grant no. 2021SHZDZX0100), Shuguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission (grant no. 22SG21), and the Fundamental Research Funds for the Central Universities. BB is supported by the Israel Science Foundation (grant no. 499/19), the Israel-China ISF-NSFC joint research program (grant no. 3552/21), the US National Science Foundation CRISP award no. 1735505, and the VATAT grant for data science research. The authors are also grateful for the helpful discussion with Zhuohao He, Jack M. Moore, Xiaozhu Zhang, Tongyu Li, and Xiaolei Ru.
Author information
Authors and Affiliations
Contributions
G.Y. conceived the research, G.Y., T.T.G. and B.B. designed it, T.T.G. performed it, G.Y., T.T.G., and B.B. analyzed the results, and G.Y. and T.T.G. wrote the manuscript with input from B.B.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Tailin Wu, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gao, TT., Barzel, B. & Yan, G. Learning interpretable dynamics of stochastic complex systems from experimental data. Nat Commun 15, 6029 (2024). https://doi.org/10.1038/s41467-024-50378-x