A general model-based causal inference method overcomes the curse of synchrony and indirect effect

Park, Se Ho; Ha, Seokmin; Kim, Jae Kyoung

doi:10.1038/s41467-023-39983-4

Article
Open access
Published: 24 July 2023

Nature Communications volume 14, Article number: 4287 (2023)

Abstract

To identify causation, model-free inference methods, such as Granger Causality, have been widely used due to their flexibility. However, they have difficulty distinguishing synchrony and indirect effects from direct causation, leading to false predictions. To overcome this, model-based inference methods that test the reproducibility of data with a specific mechanistic model to infer causality were developed. However, they can only be applied to systems described by a specific model, greatly limiting their applicability. Here, we address this limitation by deriving an easily testable condition for a general monotonic ODE model to reproduce time-series data. We built a user-friendly computational package, General ODE-Based Inference (GOBI), which is applicable to nearly any monotonic system with positive and negative regulations described by ODE. GOBI successfully inferred positive and negative regulations in various networks at both the molecular and population levels, unlike existing model-free methods. Thus, this accurate and broadly applicable inference method is a powerful tool for understanding complex dynamical systems.

Introduction

Identifying causal interactions is crucial for understanding the underlying mechanisms of systems in nature. A recent surge in time-series data collection with advanced technology offers opportunities to computationally uncover causation¹. Various model-free methods, such as Granger causality (GC)² and convergent cross mapping (CCM)³, have been widely used to infer causation from time-series data. Although they are easy to implement and broadly applicable^{4,5,6,7,8,9,10}, they usually struggle to differentiate generalized synchrony (i.e., similar periods among components) from causality^{11,12,13,14,15} and to distinguish between direct and indirect causation^{16,17,18,19,20}. For instance, when oscillatory time-series data are given, nearly all-to-all connected networks are inferred¹². To prevent such false positive predictions, model-free methods have been improved (e.g., partial cross mapping (PCM)²⁰), but further investigation is needed to show their universal validity.

Alternatively, model-based methods infer causality by testing the reproducibility of time-series data with mechanistic models using various methods such as simulated annealing²¹ and the Kalman filter^{22,23}. Although testing the reproducibility is computationally expensive, as long as the underlying model is accurate, the model-based inference method is accurate even in the presence of generalized synchrony in time series and indirect effect^{21,22,23,24,25,26,27,28,29}. However, the inference results strongly depend on the choice of model, and inaccurate model imposition can result in false positive predictions, limiting their applicability. To overcome this limit, inference methods using flexible models were developed^{30,31,32,33,34,35,36,37,38,39}. In particular, the most recent method, ION¹², infers causation from X to Y described by the general monotonic ODE model between two components, i.e., $\frac{dY}{dt}=f(X,Y)$. However, ION is applicable only when every component is affected by at most one other component.

Here, we develop a model-based method that infers interactions among multiple components described by the general monotonic ODE model:

$$\frac{dY}{dt}=f(\mathbf{X})=f({X}_{1},{X}_{2},\cdots ,{X}_{N}),$$

(1)

where f can be any smooth function that is monotonically increasing or decreasing in each X_i, and X_N is Y in the presence of self-regulation. Thus, our approach considerably resolves the fundamental limit of model-based inference: strong dependence on a chosen model. Furthermore, we derive a simple condition for the reproducibility of time series with Eq. (1), which does not require computationally expensive fitting, unlike previous model-based approaches. To facilitate our approach, we develop a user-friendly computational package, GOBI (General ODE-Based Inference). GOBI successfully infers causal relationships in gene regulatory networks, ecological systems, and cardiovascular disease caused by air pollution from synchronous time-series data, for which popular model-free methods fail. Furthermore, GOBI can also distinguish between direct and indirect causation, even from noisy time-series data. Because GOBI is both accurate and broadly applicable, which had not been achieved by previous model-free or model-based inference methods, it can be a powerful tool for understanding complex dynamical systems.

Results

Inferring regulation types from time series

We first illustrate the common properties of time series generated by either positive or negative regulation with simple examples. When the input signal X positively regulates Y (X → Y) (Fig. 1a), $\dot{Y}$ increases whenever X increases. Thus, for any pair of time points t and t^* for which X^d(t, t^*) ≔ X(t) − X(t^*) > 0, we have ${\dot{Y}}^{d}(t,{t}^{*}):=\dot{Y}(t)-\dot{Y}({t}^{*}) > 0$. Similarly, when X negatively regulates Y (X ⊣ Y) (Fig. 1c left), if X^d(t, t^*) < 0, then ${\dot{Y}}^{d}(t,{t}^{*}) > 0$. Thus, in the presence of either positive (σ = +) or negative (σ = −) regulation, the following regulation-detection function is always positive (Fig. 1b, c):

$${I}_{{X}^{\sigma }}^{Y}(t,{t}^{*}):=\sigma {X}^{d}(t,{t}^{*})\cdot {\dot{Y}}^{d}(t,{t}^{*})$$

(2)

defined on (t, t^*) such that σX^d(t, t^*) > 0.
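As a minimal sketch of Eq. (2), the following Python snippet evaluates the regulation-detection function on every sampled pair (t, t^*) in its domain. The function name `regulation_detection_1d` and the toy signals are illustrative assumptions and are not taken from the GOBI package; $\dot{Y}$ is supplied directly here rather than estimated from measured data.

```python
import math


def regulation_detection_1d(x, y_dot, sigma):
    """Values of I(t, t*) = sigma * X^d * Ydot^d on the domain sigma * X^d > 0.

    x and y_dot are equally long lists sampled at the same time points;
    sigma is +1 for positive and -1 for negative regulation.
    """
    values = []
    n = len(x)
    for i in range(n):
        for j in range(n):
            x_d = x[i] - x[j]
            if sigma * x_d > 0:  # restrict to the domain of Eq. (2)
                values.append(sigma * x_d * (y_dot[i] - y_dot[j]))
    return values


# Toy data (assumed): Ydot = 2 * X, i.e. X positively regulates Y.
ts = [0.1 * k for k in range(50)]
x = [math.sin(t) for t in ts]
y_dot = [2.0 * v for v in x]

positive_type = regulation_detection_1d(x, y_dot, +1)
negative_type = regulation_detection_1d(x, y_dot, -1)
```

For this toy input every value in `positive_type` is positive, while every value in `negative_type` is negative, which is the behavior the text describes for a true positive regulation.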

**Fig. 1: Inferring regulation types using regulation-detection functions and scores.**

This idea can be extended to a case with multiple causes. For instance, when X₁ and X₂ positively regulate Y together (Fig. 1d), if both ${X}_{1}^{d}\, > \,0$ and ${X}_{2}^{d} \, > \, 0$, then ${\dot{Y}}^{d} \, > \, 0$. This leads to the positivity of the regulation-detection function for $\begin{array}{c}{X}_{1}\to \\ {X}_{2}\to \end{array}Y$, ${I}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}(t,{t}^{*}):={X}_{1}^{d}(t,{t}^{*})\cdot {X}_{2}^{d}(t,{t}^{*})\cdot {\dot{Y}}^{d}(t,{t}^{*})$, defined for (t, t^*) such that ${X}_{1}^{d}(t,{t}^{*})\, > \,0$ and ${X}_{2}^{d}(t,{t}^{*})\, > \,0$ (Fig. 1e). Similarly, if X₁ and X₂ positively and negatively regulate Y, respectively ($\begin{array}{c}{X}_{1}\to \\ {X}_{2} \dashv \end{array}Y$) (Fig. 1g), the regulation-detection function for $\begin{array}{c}{X}_{1}\to \\ {X}_{2} \dashv \end{array}Y$, ${I}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}(t,{t}^{*}):={X}_{1}^{d}(t,{t}^{*})\cdot (-{X}_{2}^{d}(t,{t}^{*}))\cdot {\dot{Y}}^{d}(t,{t}^{*})$, is positive for (t, t^*) such that ${X}_{1}^{d}(t,{t}^{*}) > 0$ and ${X}_{2}^{d}(t,{t}^{*}) < 0$ (Fig. 1i). Note that unlike ${I}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}$ (${I}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}$), ${I}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}$ (${I}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}$) is not always positive for $\begin{array}{c}{X}_{1}\to \\ {X}_{2}\to \end{array}Y$ ($\begin{array}{c}{X}_{1}\to \\ {X}_{2} \dashv \end{array}Y$) (Fig. 1f, h). See Supplementary Fig. 1 for other types of 2D regulations.

In the presence of monotonic regulation, the regulation-detection function ${I}_{{\mathbf{X}}^{\sigma }}^{Y}$ is positive. The positivity of ${I}_{{\mathbf{X}}^{\sigma }}^{Y}$ can be quantified with its normalized integral, the regulation-detection score ${S}_{{\mathbf{X}}^{\sigma }}^{Y}$ (Eq. (4)). Thus, ${S}_{{\mathbf{X}}^{\sigma }}^{Y}=1$ in the presence of regulation type σ since the regulation-detection function is positive (see Supplementary Information for details). However, even in the absence of regulation type σ, ${S}_{{\mathbf{X}}^{\sigma }}^{Y}$ can often be one. For instance, when X₁ positively regulates Y and X₂ does not regulate Y (Fig. 1j), $\dot{Y}$ increases whenever X₁ increases regardless of X₂. Thus, both ${I}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}$ and ${I}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}$ are positive (Fig. 1k, l). Here, ${S}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}={S}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}=1$ reflects that X₂ does not affect the regulation X₁ → Y. Thus, to quantify the effect of a new component (e.g., X₂) on an existing regulation (e.g., X₁ → Y), we develop a regulation-delta function Δ:

$${\Delta }_{{X}_{1}^{+}}^{Y}({X}_{2}):={S}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}-{S}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}.$$

(3)

If ${\Delta }_{{X}_{1}^{+}}^{Y}({X}_{2})=0$, ${S}_{{X}_{1}^{+}{X}_{2}^{+}}^{Y}=1$ (${S}_{{X}_{1}^{+}{X}_{2}^{-}}^{Y}=1$) does not indicate the presence of $\begin{array}{c}{X}_{1}\to \\ {X}_{2}\to \end{array}Y$ ($\begin{array}{c}{X}_{1}\to \\ {X}_{2} \dashv \end{array}Y$).
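The role of Δ in Eq. (3) can be illustrated with a small Python sketch. Because Eq. (4) is not reproduced in this section, the score below is an assumed discrete stand-in for the "normalized integral": the sum of the regulation-detection function over its sampled domain divided by the sum of its absolute values. The names `score` and `delta` and the toy signals are hypothetical.

```python
import math


def score(causes, signs, y_dot):
    """Assumed discrete regulation-detection score: sum(I) / sum(|I|).

    causes: list of time series (one list per cause X_k)
    signs:  list of +1 / -1, the regulation type sigma_k of each cause
    """
    num = 0.0
    den = 0.0
    n = len(y_dot)
    for i in range(n):
        for j in range(n):
            diffs = [s * (c[i] - c[j]) for c, s in zip(causes, signs)]
            if all(d > 0 for d in diffs):  # domain: every sigma_k * X_k^d > 0
                value = math.prod(diffs) * (y_dot[i] - y_dot[j])
                num += value
                den += abs(value)
    return num / den if den > 0 else float("nan")


def delta(x1, x2, y_dot):
    """Regulation-delta of Eq. (3): S for (X1+, X2+) minus S for (X1+, X2-)."""
    return score([x1, x2], [+1, +1], y_dot) - score([x1, x2], [+1, -1], y_dot)


# Toy signals (assumed).
ts = [0.1 * k for k in range(100)]
x1 = [math.sin(t) for t in ts]
x2 = [math.cos(1.7 * t) for t in ts]

# Case 1: Ydot = X1, so X2 does not regulate Y.
delta_unrelated = delta(x1, x2, x1)
# Case 2: Ydot = X1 + X2, so both X1 and X2 positively regulate Y.
delta_joint = delta(x1, x2, [a + b for a, b in zip(x1, x2)])
```

In the first case both scores equal one and Δ vanishes, so the score alone cannot support a regulation from X₂. In the second case the score for the wrong sign of X₂ drops below one and Δ becomes positive.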

Inferring regulatory network structures

${S}_{{\mathbf{X}}^{\sigma }}^{Y}=1$ together with Δ ≠ 0 can be used as an indicator of regulation type σ from X to Y. Based on this, we construct a framework for inferring a regulatory network from time-series data (Fig. 2a). To illustrate this, we obtain multiple time-series data simulated with random input signal A and different initial conditions of B and C randomly selected from [−1, 1].

**Fig. 2: Framework for inferring regulatory networks.**

From each time series, the regulation-detection score ${S}_{{X}^{\sigma }}^{Y}$ is calculated for every type of 1D regulation σ from X to Y (X, Y = A, B, or C) (Step 1). Because only A ⊣ B satisfies the criterion ${S}_{{X}^{\sigma }}^{Y}=1$ for every time series, only A ⊣ B is inferred as a 1D regulation. Note that even for the other regulations, ${S}_{{\mathbf{X}}^{\sigma }}^{Y}=1$ can occur for a few time series, leading to a false positive prediction. This can be prevented by using multiple time series. Next, ${S}_{{\mathbf{X}}^{\sigma }}^{Y}$ is calculated for every 2D regulation type σ (Step 2). Three types of regulation ($\begin{array}{c}A \dashv \\ C\to \end{array}B$, $\begin{array}{c}A \dashv \\ C \dashv \end{array}B$, and $\begin{array}{c}A\to \\ B\to \end{array}C$) satisfy the criterion ${S}_{{\mathbf{X}}^{\sigma }}^{Y}=1$ for every time series. Among these, we can identify false positive regulations by using a regulation-delta function (Step 3). ${\Delta }_{{A}^{-}}^{B}(C)$ is equal to zero for every time series, indicating that $\begin{array}{c}A \dashv \\ C\to \end{array}B$ and $\begin{array}{c}A \dashv \\ C \dashv \end{array}B$ are false positive regulations. Thus, $\begin{array}{c}A\to \\ B\to \end{array}C$ is the only inferred 2D regulation as it satisfies the criteria for the regulation-delta function (${\Delta }_{{A}^{+}}^{C}(B)\ne 0$ and ${\Delta }_{{B}^{+}}^{C}(A)\ne 0$). By merging the inferred 1D and 2D regulations, the regulatory network is successfully inferred. Here, note that regulation A → C is not detected by the 1D regulation-detection score since C has multiple causes. However, the 2D regulation-detection score detects $\begin{array}{c}A\to \\ B\to \end{array}C$, which contains A → C. This demonstrates the need for multi-dimensional inferences, as the 1D criterion alone would not have been sufficient to fully capture the regulatory relationships in the network. Since this system has three components, we infer up to 2D regulations. If there are N components in the system, we go up to (N − 1)D regulations (Supplementary Fig. 2).
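To make the size of the search space concrete, the sketch below enumerates the candidate regulation types that Steps 1 and 2 would have to score for a given set of components. It assumes that the causes of a target are drawn from the other components only (no self-regulation) and that each cause is either positive or negative; the function name `candidate_regulations` is hypothetical.

```python
from itertools import combinations, product


def candidate_regulations(components):
    """List every (causes-with-signs, target) pair from 1D up to (N - 1)D."""
    candidates = []
    for target in components:
        others = [c for c in components if c != target]
        for dim in range(1, len(components)):  # 1D, ..., (N - 1)D
            for causes in combinations(others, dim):
                for signs in product("+-", repeat=dim):
                    candidates.append((tuple(zip(causes, signs)), target))
    return candidates


candidates = candidate_regulations(["A", "B", "C"])
n_1d = sum(1 for causes, _ in candidates if len(causes) == 1)
n_2d = sum(1 for causes, _ in candidates if len(causes) == 2)
```

For the three-component example this gives 12 candidate 1D regulations and 12 candidate 2D regulations. The count grows quickly with N, which is one reason the amount of available data matters for higher-dimensional inference.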

We apply the framework to infer regulatory networks from simulated time-series data of various biological models. In these models, the degradation rates of molecules increase as their concentrations increase, like in most biological systems (i.e., self-regulation is negative). Such prior information, including the types of self-regulation, can be incorporated into our framework. For example, to incorporate negative self-regulation, when detecting ND regulation, one can use the (N + 1)D regulation-detection function and score that include negative self-regulation. Specifically, when inferring 1D positive regulation from X to Y, the criterion ${S}_{{X}^{+}{Y}^{-}}^{Y}=1$ is used. To illustrate this, we assume the negative self-regulation to infer the network structures of biological models (see below for details). Note that this assumption is optional for inference (see Supplementary Information for details).
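The effect of including negative self-regulation can be sketched with a toy system. The example below assumes the model dY/dt = X − Y with X = sin(t), integrated by forward Euler, and again uses an assumed discrete score (sum of the regulation-detection function over its domain divided by the sum of its absolute values) in place of Eq. (4). All names are hypothetical.

```python
import math


def score(causes, signs, y_dot):
    """Assumed discrete regulation-detection score: sum(I) / sum(|I|)."""
    num = 0.0
    den = 0.0
    n = len(y_dot)
    for i in range(n):
        for j in range(n):
            diffs = [s * (c[i] - c[j]) for c, s in zip(causes, signs)]
            if all(d > 0 for d in diffs):
                value = math.prod(diffs) * (y_dot[i] - y_dot[j])
                num += value
                den += abs(value)
    return num / den if den > 0 else float("nan")


# Toy model (assumed): dY/dt = X - Y, with X = sin(t) and Y(0) = 0.
dt, steps, stride = 0.01, 2000, 20
y = 0.0
xs, ys, y_dots = [], [], []
for k in range(steps):
    x = math.sin(k * dt)
    if k % stride == 0:  # keep every 20th point to limit the pair count
        xs.append(x)
        ys.append(y)
        y_dots.append(x - y)  # exact derivative of the toy model
    y += dt * (x - y)  # forward Euler step

s_without_self = score([xs], [+1], y_dots)          # 1D criterion, X -> Y only
s_with_self = score([xs, ys], [+1, -1], y_dots)     # X -> Y plus Y -| Y
```

Here `s_with_self` equals one up to rounding, whereas `s_without_self` falls clearly below one because degradation of Y masks the effect of X when self-regulation is ignored.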

From the time series simulated with the Kim-Forger model (Fig. 2b left), describing the negative feedback loop of the mammalian circadian clock⁴⁰, using the criterion ${S}_{{X}^{\sigma }{Y}^{-}}^{Y}=1$, two positive 1D regulations (M → P_C and P_C → P) and one negative 1D regulation (P ⊣ M) are inferred (Fig. 2b middle). Among the six different types of 2D regulations ($\begin{array}{c}M\to \\ P\to \end{array}{P}_{C}$, $\begin{array}{c}M\to \\ P \dashv \end{array}{P}_{C}$, $\begin{array}{c}{P}_{C}\to \\ M\to \end{array}P$, $\begin{array}{c}{P}_{C}\to \\ M \dashv \end{array}P$, $\begin{array}{c}P \dashv \\ {P}_{C}\to \end{array}M$, and $\begin{array}{c}P \dashv \\ {P}_{C} \dashv \end{array}M$) satisfying the criterion ${S}_{{\mathbf{X}}^{\sigma }{Y}^{-}}^{Y}=1$ for all the time series, none pass the Δ test (i.e., ${\Delta }_{{M}^{+}}^{{P}_{C}}(P)={\Delta }_{{P}_{C}^{+}}^{P}(M)={\Delta }_{{P}^{-}}^{M}({P}_{C})=0$) (Fig. 2b middle). Thus, no 2D regulation is inferred. By merging the three inferred 1D regulations, the negative feedback loop structure is recovered (Fig. 2b right). Our method also successfully infers the negative feedback loop structure of Frzilator⁴¹ (Fig. 2c) and the 4-state Goodwin oscillator⁴² (Fig. 2d). Furthermore, our framework correctly infers systems having 2D regulations: the Goldbeter model describing the Drosophila circadian clock⁴³ (Fig. 2e) and the regulatory network of the cAMP oscillator of Dictyostelium⁴⁴ (Fig. 2f) (see Supplementary Information for the equations and parameters of the models and Supplementary Data 1 for detailed inference results). Here, assuming negative self-regulation allows us to reduce ND regulation to (N − 1)D regulation. This simplification is important for accurate inference when data is limited (Supplementary Fig. 3). Moreover, it should be noted that when the assumptions about the types of self-regulation are not met, only the links that violate these assumptions become untrustworthy, while the other inference results are not affected (Supplementary Fig. 3). Taken together, our method successfully infers regulatory networks from various in silico systems regardless of their explicit forms of ODE by assuming a general monotonic ODE (Eq. (1)). Unlike our approach, model-based methods that require specifying the model equations produce inaccurate inferences if inappropriate functional bases are chosen (Supplementary Fig. 4).

Inference with noisy time series

In the presence of noise in the time-series data, the regulation-detection score (${S}_{{\mathbf{X}}^{\sigma }}^{Y}$) is perturbed. Thus, ${S}_{{\mathbf{X}}^{\sigma }}^{Y}$ may not be one even if there is a regulation type σ from X to Y. For example, in the case of an Incoherent Feed-forward Loop (IFL) which contains A ⊣ B (Fig. 3a), ${S}_{{A}^{-}}^{B}$ is always one in the absence of noise (Fig. 2a Step 1, blue), but not in the presence of noise (Fig. 3b blue). Thus, for noisy data, we need to relax the criterion ${S}_{{\mathbf{X}}^{\sigma }}^{Y}=1$ to ${S}_{{\mathbf{X}}^{\sigma }}^{Y} > {S}^{\mathrm{thres}}$ where ${S}^{\mathrm{thres}} < 1$ is a threshold. Because ${S}_{{A}^{-}}^{B}$ gets farther away from one as the noise level increases, ${S}^{\mathrm{thres}}$ should also be decreased accordingly. We choose ${S}^{\mathrm{thres}}$ as 0.9 − 0.005 × (noise level), with which true and false regulations can be distinguished in the majority of cases for our previous in silico examples (Fig. 3b and Supplementary Fig. 5e). For instance, ${S}^{\mathrm{thres}}$ (green dashed line, Fig. 3b) overall separates true regulation (Fig. 3b blue) and false regulation (Fig. 3b red). Here we choose A → C, which has the highest regulation-detection score among all false positive 1D regulations (Fig. 2a Step 1, red).

**Fig. 3: Extended framework for inferring a regulatory network from noisy data.**

We found that the fraction of time-series data satisfying ${S}_{{\mathbf{X}}^{\sigma }}^{Y} > {S}^{\mathrm{thres}}$, which we refer to as the Total Regulation Score (TRS) (Fig. 3c left), more clearly distinguishes the true (Fig. 3c right blue) and false (Fig. 3c right red) regulations. Thus, we use the criterion ${\mathrm{TRS}}_{{\mathbf{X}}^{\sigma }}^{Y} > {\mathrm{TRS}}^{\mathrm{thres}}$ to infer the regulation. Similar to ${S}^{\mathrm{thres}}$, ${\mathrm{TRS}}^{\mathrm{thres}}$ also decreases as the noise level increases. Specifically, we use ${\mathrm{TRS}}^{\mathrm{thres}}=0.9-0.01\times \text{(noise level)}$, which successfully distinguishes between the true and false regulations of IFL (Fig. 3c right) and in silico systems investigated in the previous section (Supplementary Fig. 5f). See Methods and Supplementary Information for how to quantify the noise level. Note that ${\mathrm{TRS}}_{{\mathbf{X}}^{\sigma }}^{Y}$ is a measure that incorporates a weight on each regulation-detection score, reflecting the size of the domain of the regulation-detection function (see Supplementary Information for details).
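The two noise-dependent thresholds can be written down directly. The sketch below is a simplified illustration: it assumes the noise level is expressed in percent, treats TRS as the plain, unweighted fraction of time series whose score exceeds the score threshold (the domain-size weighting mentioned above is omitted), and uses made-up scores. The function names are hypothetical.

```python
def s_threshold(noise_level):
    """Score threshold: 0.9 - 0.005 * (noise level), noise level assumed in percent."""
    return 0.9 - 0.005 * noise_level


def trs_threshold(noise_level):
    """TRS threshold: 0.9 - 0.01 * (noise level), noise level assumed in percent."""
    return 0.9 - 0.01 * noise_level


def total_regulation_score(scores, noise_level):
    """Unweighted TRS: fraction of time series with S above the score threshold."""
    threshold = s_threshold(noise_level)
    return sum(1 for s in scores if s > threshold) / len(scores)


def is_inferred(scores, noise_level):
    """Apply the criterion TRS > TRS^thres."""
    return total_regulation_score(scores, noise_level) > trs_threshold(noise_level)


# Made-up regulation-detection scores from five time series at 20% noise.
example_scores = [0.95, 0.85, 0.78, 0.99, 0.81]
trs = total_regulation_score(example_scores, noise_level=20)
accepted = is_inferred(example_scores, noise_level=20)
```

With these made-up numbers, four of the five scores exceed the score threshold of 0.8, so the unweighted TRS is 0.8, which is above the TRS threshold of 0.7.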

Next, we investigate whether the Δ test can distinguish direct and indirect regulations using examples of the coherent feed-forward loop (CFL, Fig. 3d) and a single feed-forward loop (SFL, Fig. 3e). In CFL, direct negative regulation from A to C exists. On the other hand, in SFL, only indirect negative regulation from A to C, induced from a regulatory chain A ⊣ B → C, exists.

In the presence of noise, the regulation-delta function often fails to distinguish these direct and indirect regulations from A to C in CFL and SFL. Specifically, for both CFL and SFL with 20% multiplicative noise, ${S}_{{A}^{-}{B}^{+}}^{C}$ is larger than ${S}^{\mathrm{thres}}$ and ${\Delta }_{{B}^{+}}^{C}(A)$ is strictly negative (Fig. 3f, g) in most cases. Here, the sign of Δ is quantified by using a one-tailed Wilcoxon signed rank test (Supplementary Fig. 6a). Thus, the regulation $\begin{array}{c}A \dashv \\ B\to \end{array}C$ is inferred not only from CFL but also from SFL. This indicates that in the presence of noise, the regulation-delta function can be skewed toward a specific type of regulation, even for indirect regulation. To prevent such false positive predictions, we develop another criterion. Specifically, we use a surrogate time series of A (A_shuffled, Fig. 3h) to destroy the dependence of C on A in the presence of direct regulation (A ⊣ C). As a result, the regulation-detection score ${S}_{{A}_{\mathrm{shuffled}}^{-}{B}^{+}}^{C}$ is significantly reduced compared to ${S}_{{A}^{-}{B}^{+}}^{C}$ (Fig. 3i top). On the other hand, if A does not directly regulate C, then the regulation-detection score ${S}_{{A}_{\mathrm{shuffled}}^{-}{B}^{+}}^{C}$ does not decrease much (Fig. 3i bottom), and ${S}_{{A}^{-}{B}^{+}}^{C}$ is not significantly larger than ${S}_{{A}_{\mathrm{shuffled}}^{-}{B}^{+}}^{C}$. When multiple time series are given, we calculate the p-value for each time series and integrate them using Fisher’s method. The criterion (the combined p-value is smaller than the value obtained by combining p = 0.001 for every time series) successfully distinguishes between direct and indirect regulation even when the noise level varies (Supplementary Fig. 6b).
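Combining per-series p-values with Fisher's method can be sketched in a few lines. The snippet below takes the individual surrogate-test p-values as given (how each one is computed is not shown) and uses the closed form of the chi-square survival function for an even number of degrees of freedom. The acceptance rule encodes one reading of the criterion above, namely that the combined p-value must fall below the value obtained when every series has p = 0.001; that reading, the function names, and the example p-values are assumptions.

```python
import math


def fisher_combine(p_values):
    """Fisher's method: combined p-value of k independent p-values.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has the closed form
    exp(-h) * sum_{i<k} h**i / i!  with h = statistic / 2.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)
    series = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * series


def passes_surrogate_test(p_values, critical_p=0.001):
    """Assumed rule: combined p below the combination of critical_p for every series."""
    reference = fisher_combine([critical_p] * len(p_values))
    return fisher_combine(p_values) < reference


# Hypothetical p-values from three time series.
direct_like = passes_surrogate_test([1e-4, 5e-4, 2e-4])
indirect_like = passes_surrogate_test([0.2, 0.001, 0.3])
```

In this sketch the first set of p-values passes and the second does not, since a single small p-value is not enough to pull the combined value below the reference.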

From the noisy time series, using the criterion ${\mathrm{TRS}}_{{\mathbf{X}}^{\sigma }}^{Y} > {\mathrm{TRS}}^{\mathrm{thres}}$, all potential 1D (Fig. 3h upper-left) and 2D (Fig. 3h upper-right) regulations are inferred. Then, among the inferred regulations, we need to identify indirect regulations. Unlike IFL, CFL and SFL have a potential indirect regulation. That is, A ⊣ C has the potential to be indirect since there is a regulatory chain A ⊣ B → C. In this case, we use a surrogate time series of a potential source of indirect regulation (A) to test whether ${S}_{{A}^{-}{B}^{+}}^{C}$ is significantly larger than ${S}_{{A}_{\mathrm{shuffled}}^{-}{B}^{+}}^{C}$. This reveals that A ⊣ C is a direct regulation for CFL, but not SFL. Then, merging 1D and 2D results successfully recovers the network structure of IFL, CFL, and SFL even from noisy time series (Fig. 3j). Since our method involves multi-dimensional inferences, in the presence of noise, various dimensional regulations for a single target can be detected. In this case, only the regulation with the highest value of TRS is inferred. In the example of CFL, our 1D framework infers B → C and 2D framework infers $\begin{array}{c}A \dashv \\ B\to \end{array}C$. Since ${\mathrm{TRS}}_{{A}^{-}{B}^{+}}^{C}$ is higher than ${\mathrm{TRS}}_{{B}^{+}}^{C}$, only 2D regulation $\begin{array}{c}A \dashv \\ B\to \end{array}C$ is inferred (Fig. 3j).

Based on TRS and post-filtering tests (Δ test and surrogate test), we develop a user-friendly computational package, GOBI, which can be used to infer regulations for systems described by Eq. (1) (see README file on Github⁴⁵ and Supplementary Information for manuals). GOBI successfully infers regulatory networks from simulated time series using ODE models (Fig. 2b–f) in the presence of multiplicative noise (Fig. 3k) and other types of noise (Supplementary Fig. 7a). Here, the F₂ score, the weighted harmonic mean of precision and recall, is nearly one, indicating that GOBI is able to recover all regulations almost perfectly. However, it should be noted that noise types that significantly affect the shapes of trajectories can result in the decreased performance of GOBI, which uses time series shape information for inference (Supplementary Fig. 7b).
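For reference, the F₂ score is the F_β score with β = 2, which weights recall more heavily than precision. A minimal Python sketch, with a hypothetical function name and made-up counts of correctly inferred, falsely inferred, and missed regulations:

```python
def f_beta(true_pos, false_pos, false_neg, beta=2.0):
    """F_beta = (1 + b^2) * P * R / (b^2 * P + R), written in terms of counts."""
    b2 = beta * beta
    denominator = (1 + b2) * true_pos + b2 * false_neg + false_pos
    return (1 + b2) * true_pos / denominator if denominator > 0 else 0.0


# Made-up example: 8 true regulations recovered, 1 false positive, 2 missed.
f2_example = f_beta(true_pos=8, false_pos=1, false_neg=2)
# A perfect recovery gives a score of exactly one.
f2_perfect = f_beta(true_pos=5, false_pos=0, false_neg=0)
```

The made-up example evaluates to 40/49, roughly 0.82; the missed regulations lower the score more than the false positive does, reflecting the emphasis on recall.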

Successful network inferences from experimentally measured time series

When the proposed thresholds for the regulation-detection score (Fig. 3b) and Total Regulation Score (Fig. 3c) and two critical values of significance (i.e., p-value = 0.01 for the Δ test and p-value = 0.001 for the surrogate test) are used, GOBI successfully infers the regulatory networks from in silico time series. Here, we use GOBI with these default hyperparameters to infer regulatory networks from experimentally measured time series. From the population data of two unicellular ciliates Paramecium aurelia (P) and Didinium nasutum (D)^{3,46} (Fig. 4a left), the network between the prey (P) and predator (D) is successfully inferred (Fig. 4a and Supplementary Fig. 9a).

**Fig. 4: Inferring regulatory networks from experimental data.**

Next, we apply GOBI to the time series of the synthetic genetic oscillator, which consists of Tetracycline repressor (TetR) and RNA polymerase sigma factor (σ²⁸)⁴⁷ (Fig. 4b left). While the time series are measured under different conditions after adding purified TetR or inactivating intrinsic TetR, our method consistently infers the negative feedback loop, including negative self-regulation, based on two direct regulations σ²⁸ → TetR and TetR ⊣ σ²⁸ for all cases (Fig. 4b middle and Supplementary Fig. 9b). This indicates that our method can infer regulations even when the data are obtained under different conditions, since we do not specify particular equations or parameters in Eq. (1).

We next investigate the time-series data from a slightly more complex synthetic oscillator, the three-gene repressilator⁴⁸ (Fig. 4c left). As the amount of data is greatly reduced compared to the synthetic genetic oscillator (Fig. 4b), we assume negative self-regulation. Then, the criterion ${\mathrm{TRS}}_{{\mathbf{X}}^{\sigma }{Y}^{-}}^{Y} > {\mathrm{TRS}}^{\mathrm{thres}}$ infers three negative 1D regulations and three 2D regulations (Fig. 4c middle). Among the 2D regulations, positive regulations are inferred as indirect as they do not pass the surrogate test (Fig. 4c middle, dashed arrow). Thus, among the inferred 2D regulations, only the negative regulations, consistent with the inferred 1D regulations, are inferred as direct regulations. Gathering these results, GOBI successfully infers the network structure of the repressilator (Fig. 4c right and Supplementary Fig. 9c). Note that although our method infers the regulations among proteins as direct, in fact, mRNA exists as an intermediate step between the negative regulations among the proteins. This happens due to the short translation time in Escherichia coli⁴⁹, which causes the mRNA and protein profiles to exhibit similar shapes and phases. This indicates that our method infers indirect regulations with a short intermediate step as direct regulations. Furthermore, compared to the synthetic genetic oscillator (Fig. 4b), the amount of data is small, and the number of components is large; thus, it is essential to assume negative self-regulation for correct inference, i.e., without the assumption, the available data is insufficient to fill the space of the regulation-detection function, making it difficult to detect 2D regulations.

We apply GOBI to the time series measuring the amounts of four cofactors present at the estrogen-sensitive pS2 promoter after treatment with estradiol^{50,51} (Fig. 4d left). As all components are expected to decay in proportion to their own concentrations, negative self-regulations are assumed, which is critical due to the small amount of data. GOBI infers five 1D regulations (HDAC ⊣ hER, TRIP1 ⊣ hER, hER → POLII, TRIP1 ⊣ POLII, and HDAC ⊣ POLII) that satisfy the criterion ${\mathrm{TRS}}_{{\mathbf{X}}^{\sigma }{Y}^{-}}^{Y} > {\mathrm{TRS}}^{\mathrm{thres}}$. However, we exclude them because hER and POLII have two and three causes, forming 2D and 3D regulations, respectively, although the 1D criterion assumes a single cause (Fig. 4d middle, dashed box). If all of these regulations are effective, they will be identified as 2D and 3D regulations. Indeed, among the 11 candidates for 2D regulations, most of them include the five inferred 1D regulations. Via the Δ test and surrogate test, indirect regulations are identified among inferred 2D regulations (Supplementary Fig. 9d). For example, 2D regulation $\begin{array}{c}\mathrm{hER}\to \\ \mathrm{HDAC} \dashv \end{array}\mathrm{POLII}$ satisfies the criterion ${\mathrm{TRS}}_{{\mathbf{X}}^{\sigma }{Y}^{-}}^{Y} > {\mathrm{TRS}}^{\mathrm{thres}}$. Among two causal variables (i.e., hER and HDAC), only positive regulation from hER passes the post-filtering test, i.e., only 1D regulation hER → POLII, but not HDAC ⊣ POLII, is inferred as a direct regulation. Consequently, after excluding all the indirect regulations, two 1D regulations (hER → POLII and HDAC ⊣ hER) and one 2D regulation ($\begin{array}{c}\mathrm{POLII}\to \\ \mathrm{TRIP1}\to \end{array}\mathrm{HDAC}$) are inferred (Supplementary Fig. 9d). While we are not able to further infer 3D regulations due to the limited amount of data, the inferred regulations are supported by the experiments. That is, estradiol triggers the binding of hER to the pS2 promoter to recruit POLII⁵⁰, supporting hER → POLII. Also, inhibition of POLII phosphorylation blocks the recruitment of HDAC but does not affect the APIS engagement at the pS2 promoter⁵⁰, supporting POLII → HDAC and no regulation from POLII to TRIP1, which is a surrogate measure of APIS. Without inhibition of POLII, HDAC is recruited after the APIS engagement, and when the HDAC has maximum occupation, then the pS2 promoter becomes refractory to hER⁵⁰, supporting TRIP1 → HDAC ⊣ hER. Interestingly, the inferred network contains a negative feedback loop, which is required to generate sustained oscillations⁵².

Finally, we investigate five time series of air pollutants and cardiovascular disease occurrence in Hong Kong from 1994 to 1997⁵³ (Fig. 4e left). Since our goal is to identify which pollutants cause cardiovascular disease, we fix the disease as a target. Also, we assume the negative self-regulation of disease, reflecting death. While two positive causal links from NO₂ and respirable suspended particulates (Rspar) to the disease are identified as 1D regulations (Fig. 4e middle), we exclude them because they share the same target (Fig. 4e middle, dashed box). Among two inferred 2D regulations, one passes the Δ test and surrogate test (Fig. 4e middle). Furthermore, no 3D or 4D regulation is inferred (Supplementary Fig. 9e). The inferred network indicates that both NO₂ and Rspar are major causes of cardiovascular diseases (Fig. 4e right). Indeed, it was reported that NO₂ and Rspar are associated with hospital admissions and mortality due to cardiovascular disease, respectively⁵⁴.

Comparison between our framework and other model-free inference methods

Here, we compare our framework with popular model-free methods, i.e., GC, CCM, and PCM, by using the experimental time-series data in the previous section (Fig. 4a–e). Unlike our method, the model-free methods can only infer the presence of regulation and not its type (i.e., positive and negative). Thus, the arrows represent inferred regulations, which could be either positive or negative.

For the prey–predator system and the genetic oscillator (Fig. 4a, b), we merge them to create a more challenging case. Specifically, from the set of eight different time-series data of a genetic oscillator measured under different conditions, we select one that has a similar phase to the time series of the prey–predator system (Fig. 4b panel at the 2nd row and 2nd column). Then, we merge the selected time series with the time series of the prey–predator system. While GOBI and PCM successfully detect two independent feedback loops (Fig. 5a), CCM and GC make false positive predictions (e.g., P to σ²⁸ in Fig. 5a) because they usually misidentify synchrony as causality. Furthermore, when we reduce the sampling rate by half, the accuracy of PCM dramatically drops, whereas GOBI can still infer the true network structure (Supplementary Fig. 10).

**Fig. 5: Model-free methods, but not our method, make a false prediction due to the presence of synchrony and indirect effect.**

For a similar reason, synchrony obscures the inference of the model-free methods for the repressilator (Fig. 5b). Moreover, the model-free methods fail to distinguish between direct and indirect regulations. For example, they infer the indirect regulation TetR → λcl induced by the regulatory chain TetR ⊣ LacI ⊣ λcl, unlike our method. Similarly, due to synchrony and indirect effect, for the system of cofactors at the pS2 promoter, model-free methods infer an almost fully connected causal network, unlike our method (Fig. 5c).

When we use 3 years of data (full-length data) on air pollutants and cardiovascular disease, PCM infers the same structure as GOBI infers, i.e., only NO₂ and Rspar cause the disease (Fig. 5d gray)²⁰. On the other hand, when a subset of the data (i.e., two years of data) is used, only GOBI infers the same structure (Fig. 5d purple). This indicates that GOBI is more reliable and accurate than the model-free methods.

Discussion

We develop an inference method that considerably resolves the weaknesses of model-free and model-based inference methods. We derive the conditions for interactions satisfying the general monotonic ODE (Eq. (1)). As this allows us to easily check the reproducibility of given time-series data with the general monotonic ODE (i.e., the existence of an ODE satisfying given time-series data) without fitting, the computational cost is dramatically reduced compared to the previous model-based approaches. Importantly, as our method can be applied to any system described by the general monotonic ODE (Eq. (1)), it significantly addresses the fundamental limit of the model-based approach (i.e., the requirement of an a priori model accurately describing the system) (Supplementary Fig. 4). In addition, our method does not run the serious risk of misidentifying generalized synchrony as causality, unlike the previous model-free approaches. Note, however, that our approach still cannot handle a completely synchronized system. Furthermore, our method successfully distinguishes direct causal relations from indirect causal relations by adopting the surrogate test (Fig. 3). In this way, our framework dramatically reduces the false positive predictions, which are the inherent flaw of the model-free inference methods (Fig. 5). Taken together, we develop an accurate and broadly applicable inference method that can uncover unknown functional relationships underlying a system from its output time-series data (Fig. 4).

Despite these advantages, our method has some limitations that should be addressed. First, our framework assumes that when X causes Y, X causes Y either positively or negatively. Thus, GOBI cannot capture the regulation when X causes Y both positively and negatively or when the type of regulation changes over time. However, GOBI can potentially be extended to detect temporal-structured models, including non-monotonic regulation (Supplementary Fig. 11). It would be interesting in future work to investigate the extended framework thoroughly under diverse circumstances. Additionally, while we have considered the general form of monotonic ODE (Eq. (1)), GOBI can also be extended to describe interactions that include time delays (Supplementary Fig. 12). This will be an interesting future direction to make GOBI more broadly applicable. Another limitation is the possibility of false positive predictions. This occurs because our method tests the reproducibility of time-series data using necessary conditions. Specifically, the regulation-detection score can be one even in the absence of regulation. To resolve this, we use multiple time-series data and perform post-filtering tests (i.e., Δ test and surrogate test). Nonetheless, it should be noted that inferring high-dimensional regulations requires a large amount of data (Supplementary Fig. 13). To address this challenge, we can use prior knowledge about the system. For example, in biological systems, negative self-regulation can be assumed because the degradation rates of molecules increase as their concentrations increase. By assuming negative self-regulation, we are able to reduce the ND regulation to (N − 1)D regulation, which allows us to successfully infer the network structure even with a small amount of experimental data (Fig. 4c). Note that when an a priori assumption (e.g., the type of self-regulation) is not met, only the links that violate the assumption are untrustworthy, i.e., the other inference results are not affected (Supplementary Fig. 3).

To use GOBI, we need to choose hyper-parameters. When applying GOBI to noisy data, users must choose thresholds for the regulation-detection region, regulation-detection function, and total regulation score, as well as two critical values of significance (i.e., p-values for Δ test and surrogate test). In this study, we determine these values by using noisy simulated data of various examples (Fig. 3 and Supplementary Fig. 5). Nevertheless, these values are effective when they are applied to experimental time-series data (Figs. 4 and 5). Thus, we have set those values of hyper-parameters as the default values of GOBI. However, the optimal threshold may vary depending on the data characteristics, and users may need to adjust the thresholds based on the importance of avoiding false positive or false negative predictions. Another hyper-parameter that requires consideration is the choice of the sampling rate. In this study, we use a sampling rate of 100 points per period after evaluating the trade-off between computational cost and accuracy. However, users can decrease or increase the sampling rate if the computation speed is too slow or if a higher level of accuracy is required, respectively.

Methods

Computational package for inferring regulatory network

Here, we describe the key steps of our computational package, GOBI (https://github.com/Mathbiomed/GOBI)⁴⁵. For the experimental time-series data X(t) = (X₁(t), X₂(t), ⋯ , X_N(t)), X(t) can be interpolated with either the ‘spline’ or ‘fourier’ method, chosen by the user. For the spline interpolation, we use the MATLAB function ‘interp1’ with the option ‘spline’, and for the Fourier interpolation, we use the MATLAB function ‘fit’ with the option ‘fourier1–8’. After the interpolation, the derivative of X(t) is computed using the MATLAB function ‘gradient’ to compute the regulation–detection score.
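The derivative step can be illustrated with a minimal Python sketch. It is not the GOBI code (which is written in MATLAB); it only assumes a uniformly sampled series and uses central differences in the interior with one-sided differences at the two ends. The function name `gradient` and the sine-wave example are illustrative choices, and the interpolation step is not reproduced here.

```python
import math


def gradient(values, dt):
    """Finite-difference derivative on a uniform grid with spacing dt.

    Central differences in the interior, one-sided differences at both ends.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    grad = [0.0] * n
    grad[0] = (values[1] - values[0]) / dt
    grad[-1] = (values[-1] - values[-2]) / dt
    for i in range(1, n - 1):
        grad[i] = (values[i + 1] - values[i - 1]) / (2.0 * dt)
    return grad


# Hypothetical example: one period of sin(t) sampled at 100 points.
n_points = 100
dt = 2.0 * math.pi / n_points
t = [i * dt for i in range(n_points)]
x = [math.sin(ti) for ti in t]
dx = gradient(x, dt)

# Largest deviation from the exact derivative cos(t).
max_error = max(abs(d - math.cos(ti)) for d, ti in zip(dx, t))
```

At 100 points per period the finite-difference error stays small for a smooth signal, which is one reason smoothing or interpolating noisy data before differentiation matters.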

Regulation–detection region

For the ND regulation (Eq. (1)) with regulation type σ, the regulation-detection region ($R_{\mathbf{X}^{\sigma}}$) is defined as the set of (t, t^*) on the domain of time series $[0,\tau)^{2}$ satisfying $\sigma(i)X_{i}^{d}(t,t^{*}) > 0$ for all i = 1, 2, ⋯ , N. For example, with the positive 1D regulation X → Y (σ = +), $R_{X^{+}}$ is the set of (t, t^*) where X^d > 0. For the 2D regulation $\begin{array}{c}X_{1}\to \\ X_{2} \dashv \end{array}Y$ (σ = (+, −)), $R_{X_{1}^{+}X_{2}^{-}}$ is the set of (t, t^*) satisfying both $X_{1}^{d} > 0$ and $X_{2}^{d} < 0$. The size of the regulation-detection region ($\mathrm{size}(R_{\mathbf{X}^{\sigma}})$) is the fraction of $R_{\mathbf{X}^{\sigma}}$ over the domain $[0,\tau)^{2}$. In the presence of noise, we only consider a region which is not small (i.e., $\mathrm{size}(R_{\mathbf{X}^{\sigma}}) > R^{\mathrm{thres}}$) to avoid an error from the noise. The value of $R^{\mathrm{thres}}$ can be chosen from 0 to 0.1, and the choice of $R^{\mathrm{thres}}$ does not significantly affect the results (Supplementary Fig. 5a). However, a small value of $R^{\mathrm{thres}}$ is recommended for inferring high-dimensional regulations since the average of $\mathrm{size}(R_{\mathbf{X}^{\sigma}})$ decreases exponentially as dimension increases (see Supplementary Information for details).
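As a rough illustration, the region and its size can be computed on a sampled grid as follows. This Python sketch assumes the definition X^d(t, t^*) = X(t) − X(t^*) given earlier in the article. The function names, the sine and cosine test signals, and the value R^thres = 0.05 are illustrative assumptions rather than GOBI's implementation.

```python
import math


def difference_matrix(values):
    """X^d(t, t*) = X(t) - X(t*) for every pair of sampled time points."""
    return [[a - b for b in values] for a in values]


def detection_region(series_list, signs):
    """Mask of pairs (t, t*) with sign_i * X_i^d(t, t*) > 0 for all i."""
    n = len(series_list[0])
    diffs = [difference_matrix(s) for s in series_list]
    return [
        [all(sg * d[a][b] > 0 for sg, d in zip(signs, diffs)) for b in range(n)]
        for a in range(n)
    ]


def region_size(region):
    """Fraction of the sampled domain [0, tau)^2 covered by the region."""
    n = len(region)
    return sum(cell for row in region for cell in row) / (n * n)


n_points = 100
t = [2.0 * math.pi * i / n_points for i in range(n_points)]
x1 = [math.sin(ti) for ti in t]
x2 = [math.cos(ti) for ti in t]

size_1d = region_size(detection_region([x1], [+1]))          # R_{X1+}
size_2d = region_size(detection_region([x1, x2], [+1, -1]))  # R_{X1+ X2-}

r_thres = 0.05  # illustrative; the text allows any value from 0 to 0.1
usable = size_2d > r_thres
```

In this toy case the 2D region is roughly half the size of the 1D region, in line with the remark that the region shrinks as the dimension grows.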

Regulation–detection function and score

When the regulation type σ from X = (X₁, X₂, ⋯ , X_N) to Y exists, the following regulation-detection function ($I_{\mathbf{X}^{\sigma}}^{Y}$), defined on the regulation-detection region $R_{\mathbf{X}^{\sigma}}$, is always positive:

$$I_{\mathbf{X}^{\sigma}}^{Y} := \dot{Y}^{d}\cdot \prod_{i=1}^{N}\sigma(i)X_{i}^{d}.$$

Thus, the following regulation-detection score ($S_{\mathbf{X}^{\sigma}}^{Y}$) is one:

$$S_{\mathbf{X}^{\sigma}}^{Y} := \frac{\iint_{R_{\mathbf{X}^{\sigma}}} I_{\mathbf{X}^{\sigma}}^{Y}(t,t^{*})\,dt\,dt^{*}}{\iint_{R_{\mathbf{X}^{\sigma}}} |I_{\mathbf{X}^{\sigma}}^{Y}(t,t^{*})|\,dt\,dt^{*}}$$

(4)

(see Supplementary Information for details). However, this no longer holds in the presence of noise. Thus, we relax the criterion from $S_{\mathbf{X}^{\sigma}}^{Y} = 1$ to $S_{\mathbf{X}^{\sigma}}^{Y} > S^{\mathrm{thres}}$. Among the data with a non-negligible $R_{\mathbf{X}^{\sigma}}$ (i.e., $\mathrm{size}(R_{\mathbf{X}^{\sigma}}) > R^{\mathrm{thres}}$), the fraction of data satisfying the criterion $S_{\mathbf{X}^{\sigma}}^{Y} > S^{\mathrm{thres}}$ is called the total regulation score ($\mathrm{TRS}_{\mathbf{X}^{\sigma}}^{Y}$). Finally, we infer the regulation from noisy time-series data using the criterion $\mathrm{TRS}_{\mathbf{X}^{\sigma}}^{Y} > \mathrm{TRS}^{\mathrm{thres}}$. We use $S^{\mathrm{thres}} = 0.9 - 0.005\times(\text{noise level})$ and $\mathrm{TRS}^{\mathrm{thres}} = 0.9 - 0.01\times(\text{noise level})$ (Fig. 3a–c and Supplementary Fig. 5). The noise level of the time series is approximated using the mean square of the residual between the noisy and fitted time series (Supplementary Fig. 8).
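The score and the total regulation score can be sketched numerically by replacing the double integrals with sums over sampled pairs (t, t^*). The Python below is an illustration under stated assumptions, not the GOBI implementation: it takes X^d(t, t^*) = X(t) − X(t^*) and the analogous difference for the derivative of Y, as defined earlier in the article. The function names and the toy system dY/dt = 2X are hypothetical, and the derivative of Y is supplied analytically rather than estimated from data.

```python
import math


def regulation_detection_score(x_list, y_dot, signs):
    """Return (score, region size) for one time series.

    x_list: candidate causes X_1..X_N; y_dot: derivative of the target Y;
    signs: +1 or -1 per cause (the regulation type sigma).
    """
    n = len(y_dot)
    numerator = denominator = 0.0
    count = 0
    for a in range(n):
        for b in range(n):
            diffs = [sg * (x[a] - x[b]) for sg, x in zip(signs, x_list)]
            if all(d > 0 for d in diffs):  # (t, t*) lies in the region
                value = (y_dot[a] - y_dot[b]) * math.prod(diffs)
                numerator += value
                denominator += abs(value)
                count += 1
    score = numerator / denominator if denominator > 0 else None
    return score, count / (n * n)


def total_regulation_score(results, s_thres, r_thres):
    """Fraction of series with score > s_thres among those with size > r_thres."""
    kept = [s for s, size in results if s is not None and size > r_thres]
    if not kept:
        return None
    return sum(s > s_thres for s in kept) / len(kept)


# Hypothetical noise-free data: dY/dt = 2 * X, so X regulates Y positively.
n_points = 50
t = [2.0 * math.pi * i / n_points for i in range(n_points)]
series = []
for phase in (0.0, 0.7, 1.9):
    x = [math.cos(ti + phase) for ti in t]
    y_dot = [2.0 * xi for xi in x]
    series.append((x, y_dot))

positive = [regulation_detection_score([x], yd, [+1]) for x, yd in series]
negative = [regulation_detection_score([x], yd, [-1]) for x, yd in series]

# With zero noise both thresholds in the text reduce to 0.9.
trs_positive = total_regulation_score(positive, s_thres=0.9, r_thres=0.05)
trs_negative = total_regulation_score(negative, s_thres=0.9, r_thres=0.05)
```

For this toy system the positive hypothesis scores one on every series and the negative hypothesis scores minus one, so only the positive regulation passes the TRS criterion.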

Δ test

When we add any regulation to an existing true regulation, the regulation-detection score is always one (Fig. 1j–l). Thus, to test whether the additional regulation is effective, we consider $\Delta_{\mathbf{X}^{\sigma}}^{Y}(X_{\mathrm{new}}) = S_{\mathbf{X}^{\sigma}X_{\mathrm{new}}^{+}}^{Y} - S_{\mathbf{X}^{\sigma}X_{\mathrm{new}}^{-}}^{Y}$, where $S_{\mathbf{X}^{\sigma}X_{\mathrm{new}}^{+}}^{Y}$ ($S_{\mathbf{X}^{\sigma}X_{\mathrm{new}}^{-}}^{Y}$) is the regulation-detection score when the new component (X_new) is positively (negatively) added to the existing regulation type σ. Because $\Delta_{\mathbf{X}^{\sigma}}^{Y}(X_{\mathrm{new}}) = 0$ reflects that the new component (X_new) does not have any regulatory role, the newly added regulation is inferred only when $\Delta_{\mathbf{X}^{\sigma}}^{Y}(X_{\mathrm{new}}) \ne 0$ for some data. In particular, Δ > 0 (Δ < 0) indicates that the new component adds positive (negative) regulation. In the presence of noise, the positive (negative) regulation is inferred if Δ ≥ 0 (Δ ≤ 0) consistently for all time series. If the number of time series is greater than 25, the sign of Δ is quantified by a one-tailed Wilcoxon signed rank test. We set the critical value of significance as 0.01, but it can be chosen by the user.
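The behavior of Δ can be checked on toy data. The following Python sketch again replaces the integrals with sums over sampled pairs and assumes X^d(t, t^*) = X(t) − X(t^*). The functions `score` and `delta` and the two example systems are hypothetical, and the Wilcoxon signed rank test for many time series is not implemented here.

```python
import math


def score(x_list, y_dot, signs):
    """Regulation-detection score S for one time series (None if undefined)."""
    n = len(y_dot)
    numerator = denominator = 0.0
    for a in range(n):
        for b in range(n):
            diffs = [sg * (x[a] - x[b]) for sg, x in zip(signs, x_list)]
            if all(d > 0 for d in diffs):
                value = (y_dot[a] - y_dot[b]) * math.prod(diffs)
                numerator += value
                denominator += abs(value)
    return numerator / denominator if denominator > 0 else None


def delta(x_list, signs, x_new, y_dot):
    """S with X_new added positively minus S with X_new added negatively."""
    s_plus = score(x_list + [x_new], y_dot, signs + [+1])
    s_minus = score(x_list + [x_new], y_dot, signs + [-1])
    return s_plus - s_minus


n_points = 60
t = [2.0 * math.pi * i / n_points for i in range(n_points)]
x = [math.cos(ti) for ti in t]
x_new = [math.sin(ti) for ti in t]

# Case 1: dY/dt depends on X only, so X_new has no regulatory role.
y_dot_only_x = [2.0 * xi for xi in x]
delta_no_role = delta([x], [+1], x_new, y_dot_only_x)

# Case 2: dY/dt increases with both X and X_new.
y_dot_both = [xi + 3.0 * xn for xi, xn in zip(x, x_new)]
delta_positive_role = delta([x], [+1], x_new, y_dot_both)
```

In the first case both extended scores equal one, so Δ vanishes. In the second case only the positive extension keeps a score of one, so Δ is positive.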

Surrogate test

Indirect regulation is induced by a chain of direct regulations. For example, in SFL (Fig. 3e), the regulatory chain A ⊣ B → C induces the indirect negative regulation A ⊣ C. In the presence of noise, the Δ test sometimes fails to distinguish between direct and indirect regulations (Fig. 3d–g). Thus, after the Δ test, if the inferred regulation has the potential to be indirect, we additionally perform the surrogate test to determine whether the inferred regulation is direct or indirect. Specifically, for each candidate of indirect regulation, we shuffle the time series of the cause using the MATLAB function ‘perm’ and then calculate the regulation-detection scores. Then, we test whether the original regulation-detection score is significantly larger than the shuffled ones by using a one-tailed Z test. Given k time series, we obtain k p-values (p_i, i = 1, 2, ⋯ , k). We then combine them into one test statistic (χ²) using Fisher’s method, $\chi_{2k}^{2} \sim -2\sum_{i=1}^{k}\log(p_{i})$. We set the critical value of the significance of Fisher’s method by combining p_i = 0.001 for all the data, but it can also be chosen by the user.
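The statistical part of the surrogate test can be sketched in Python as follows. This is one possible reading of the procedure, not the GOBI code. The function names and example p-values are hypothetical, the use of the sample standard deviation in the Z test is an assumption, and the shuffling and rescoring of the cause time series is omitted. The closed-form tail probability is the standard expression for a chi-square distribution with an even number of degrees of freedom.

```python
import math
import statistics


def one_tailed_z_p_value(original, surrogate_scores):
    """P(Z >= z) for z = (original - mean) / std of the surrogate scores."""
    mean = statistics.mean(surrogate_scores)
    std = statistics.stdev(surrogate_scores)
    if std == 0:
        raise ValueError("surrogate scores have zero variance")
    z = (original - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fisher_statistic(p_values):
    """Fisher's combined statistic, -2 * sum(log p_i)."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_dof(x, k):
    """Upper tail of a chi-square distribution with 2k degrees of freedom.

    For even degrees of freedom: exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    half = x / 2.0
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined_p_value(p_values):
    """Combined p-value from Fisher's statistic on 2k degrees of freedom."""
    return chi2_sf_even_dof(fisher_statistic(p_values), len(p_values))


# Hypothetical p-values from k = 4 time series.
p_values = [0.0004, 0.002, 0.0001, 0.0009]
statistic = fisher_statistic(p_values)
# Critical value obtained by combining p_i = 0.001 for every series.
critical = fisher_statistic([0.001] * len(p_values))
passes = statistic > critical
```

Under this reading, a candidate regulation is kept as direct when its combined statistic exceeds the value obtained with p_i = 0.001 for all k series.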

Model-free methods

For CCM³ and PCM²⁰, we choose an appropriate embedding dimension using the false nearest neighbor algorithm. Also, we select a time lag producing the first minimum of delayed mutual information. To select the threshold value ‘T’ in PCM, we use k-means clustering as suggested in²⁰. We run CCM using ‘skccm’ and PCM using the code provided in²⁰. For GC², we run the code provided in⁵⁵, setting the order of the AR process to the first minimum of delayed mutual information, in the same way that we choose the maximum delay for CCM and PCM. We infer a direct regulation when the null hypothesis that Y does not Granger-cause X is rejected using the F statistic at a 95% confidence level (i.e., a significance level of 0.05)². Specifically, we use embedding dimension 2 for the prey–predator, genetic oscillator, and estradiol data sets; and 3 for the repressilator and air pollutants and cardiovascular disease data sets. Also, we use time lag 2 for prey–predator; 3–10 for the genetic oscillator (there are eight different time-series data sets); 10 for the repressilator; 15 for the estradiol data set; and 3 for the air pollutants and cardiovascular disease data set.

in silico time-series data

With the ODE describing the system, we simulate the time-series data using the MATLAB function ‘ode45’. The sampling rate is 100 points per period for all the examples (Figs. 1, 2, and 3). For the multiple time-series data (Figs. 2 and 3), we generate 100 different time series with different initial conditions. Then, before applying our method, we normalize each time series by re-scaling it to have minimum 0 and maximum 1. To introduce measurement noise in a time series, we add multiplicative noise sampled randomly from a normal distribution with mean 0 and standard deviation given by the noise level. For example, for 10% multiplicative noise, we add the noise X(t_i) ⋅ ε to X(t_i), where ε ~ N(0, 0.1²). Before applying our method, all the simulated noisy time series are fitted using the MATLAB function ‘fit’ with the option ‘fourier4’. However, if the noise level is too high, ‘fourier4’ tends to overfit and capture the noise. Thus, in the presence of a high level of noise, ‘fourier2’ is recommended for smoothing.
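The rescaling and noise steps can be illustrated with a short Python sketch. The function names, the seed, and the sample values are hypothetical, and the order in which the two steps are applied here is an assumption, since the text does not state it explicitly.

```python
import random


def min_max_normalize(values):
    """Rescale a series so that its minimum is 0 and its maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be rescaled")
    return [(v - lo) / (hi - lo) for v in values]


def add_multiplicative_noise(values, noise_level, rng):
    """Return X(t_i) + X(t_i) * eps_i with eps_i ~ N(0, noise_level^2)."""
    return [v + v * rng.gauss(0.0, noise_level) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [3.0, 5.0, 4.0, 7.0, 3.5]
normalized = min_max_normalize(clean)
noisy = add_multiplicative_noise(normalized, 0.1, rng)  # 10% noise
```

Because the noise is multiplicative, samples equal to zero stay at zero and larger values receive proportionally larger perturbations.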

Experimental time-series data

For the experimental data, we first calculate the period of data by using the first peak of auto-correlation. Then, we cut the time series into periods (Fig. 4a, b). Specifically, we cut the prey-predator time series every five days to generate seven different time series (Fig. 4a). When the number of cycles in the data is low (<5), to generate enough multiple time series (Fig. 4c–e), we cut the data using the moving-window technique. That is, we choose the window whose size is the period of the time series. Then, along the time series, we move the window until the next window overlaps with the current window by 90%. Then, the time series in every window is used for our approach. We employ this approach for the repressilator (Fig. 4c); estradiol data set (Fig. 4d); and air pollution and cardiovascular disease data (Fig. 4e). For instance, we use time-series data of air pollutants and cardiovascular disease with a window size of one year and an overlap of 11 months (i.e., move the window for a month) to generate 23 data sets. Before this, the time series of disease admissions are smoothed using a simple moving average with a window width of seven days to avoid the effect of days of the week. Each time series is interpolated using the MATLAB function ‘spline’ (Fig. 4a–d) or ‘fourier2’ (Fig. 4e), depending on the noise level of the time-series data.
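The moving-window cut and the moving-average smoothing can be sketched in Python as follows. The function names and the example series are hypothetical. The text does not specify how the moving average is aligned, so the sketch simply averages over each full window.

```python
def sliding_windows(values, window, overlap):
    """Cut a series into windows of fixed length with fractional overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, round(window * (1.0 - overlap)))
    last_start = len(values) - window
    return [values[s:s + window] for s in range(0, last_start + 1, step)]


def moving_average(values, width):
    """Simple moving average over each full window of the given width."""
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]


# Hypothetical series: 100 samples, period of 20 samples, 90% overlap.
series = list(range(100))
windows = sliding_windows(series, window=20, overlap=0.9)
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7, 8], width=7)
```

With a 90% overlap, consecutive windows are shifted by one tenth of the period, so a short recording still yields many partially overlapping time series.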

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data sets generated in this study are publicly available on GitHub⁴⁵. The references for the public data sets used and analyzed during this study can be found in the “Results” section^{3,20,46,47,48,50}. Source data are provided with this paper.

Code availability

The code for the GOBI package, including the code for all the figures presented in this article, is publicly available on GitHub⁴⁵.

References

1. Saint-Antoine, M. M. & Singh, A. Network inference in systems biology: recent developments, challenges, and applications. Curr. Opin. Biotechnol. 63, 89–98 (2020).
2. Granger, C. W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 424–438 (1969).
3. Sugihara, G. et al. Detecting causality in complex ecosystems. Science 338, 496–500 (2012).
4. Pourzanjani, A., Herzog, E. D. & Petzold, L. R. On the inference of functional circadian networks using Granger causality. PLoS ONE 10, e0137540 (2015).
5. Runge, J. et al. Inferring causation from time series in earth system sciences. Nat. Commun. 10, 1–13 (2019).
6. Kamiński, M., Ding, M., Truccolo, W. A. & Bressler, S. L. Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biol. Cybern. 85, 145–157 (2001).
7. Deyle, E. R., Maher, M. C., Hernandez, R. D., Basu, S. & Sugihara, G. Global environmental drivers of influenza. Proc. Natl Acad. Sci. USA 113, 13081–13086 (2016).
8. Ma, H. et al. Detection of time delays and directional interactions based on time series from complex dynamical systems. Phys. Rev. E 96, 012221 (2017).
9. Tsonis, A. A. et al. Dynamical evidence for causality between galactic cosmic rays and interannual variation in global temperature. Proc. Natl Acad. Sci. USA 112, 3253–3256 (2015).
10. Ye, H., Deyle, E. R., Gilarranz, L. J. & Sugihara, G. Distinguishing time-delayed causal interactions using convergent cross mapping. Sci. Rep. 5, 1–9 (2015).
11. Stokes, P. A. & Purdon, P. L. A study of problems encountered in Granger causality analysis from a neuroscience perspective. Proc. Natl Acad. Sci. USA 114, E7063–E7072 (2017).
12. Tyler, J., Forger, D. & Kim, J. K. Inferring causality in biological oscillators. Bioinformatics 38, 196–203 (2022).
13. Nawrath, J. et al. Distinguishing direct from indirect interactions in oscillatory networks with multiple time scales. Phys. Rev. Lett. 104, 038701 (2010).
14. Schelter, B. et al. Direct or indirect? Graphical models for neural oscillators. J. Physiol. 99, 37–46 (2006).
15. Cobey, S. & Baskerville, E. B. Limits to causal inference with state-space reconstruction for infectious disease. PLoS ONE 11, e0169050 (2016).
16. Guo, S., Seth, A. K., Kendrick, K. M., Zhou, C. & Feng, J. Partial Granger causality—eliminating exogenous inputs and latent variables. J. Neurosci. Methods 172, 79–93 (2008).
17. Frenzel, S. & Pompe, B. Partial mutual information for coupling analysis of multivariate time series. Phys. Rev. Lett. 99, 204101 (2007).
18. Zhao, J., Zhou, Y., Zhang, X. & Chen, L. Part mutual information for quantifying direct associations in networks. Proc. Natl Acad. Sci. USA 113, 5130–5135 (2016).
19. Runge, J., Petoukhov, V. & Kurths, J. Quantifying the strength and delay of climatic interactions: The ambiguities of cross correlation and a novel measure based on graphical models. J. Clim. 27, 720–739 (2014).
20. Leng, S. et al. Partial cross mapping eliminates indirect causal influences. Nat. Commun. 11, 1–9 (2020).
21. Gotoh, T. et al. Model-driven experimental approach reveals the complex regulatory distribution of p53 by the circadian factor period 2. Proc. Natl Acad. Sci. USA 113, 13516–13521 (2016).
22. Pirgazi, J. & Khanteymoori, A. R. A robust gene regulatory network inference method base on Kalman filter and linear regression. PLoS ONE 13, e0200094 (2018).
23. Wang, Z., Liu, X., Liu, Y., Liang, J. & Vinciotti, V. An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series. IEEE/ACM Trans. Comput. Biol. Bioinforma. 6, 410–419 (2009).
24. Lillacci, G. & Khammash, M. Parameter estimation and model selection in computational biology. PLoS Comput. Biol. 6, e1000696 (2010).
25. McBride, D. & Petzold, L. Model-based inference of a directed network of circadian neurons. J. Biol. Rhythms 33, 515–522 (2018).
26. Pitt, J. A. & Banga, J. R. Parameter estimation in models of biological oscillators: an automated regularised estimation approach. BMC Bioinforma. 20, 1–17 (2019).
27. Radde, N. & Kaderali, L. Inference of an oscillating model for the yeast cell cycle. Discret. Appl. Math. 157, 2285–2295 (2009).
28. Toni, T., Welch, D., Strelkowa, N., Ipsen, A. & Stumpf, M. P. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187–202 (2009).
29. Trejo Banos, D., Millar, A. J. & Sanguinetti, G. A Bayesian approach for structure learning in oscillating regulatory networks. Bioinformatics 31, 3617–3624 (2015).
30. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
31. Kim, J. K. & Forger, D. B. On the existence and uniqueness of biological clock models matching experimental data. SIAM J. Appl. Math. 72, 1842–1855 (2012).
32. Konopka, T. & Rooman, M. Gene expression model (in) validation by Fourier analysis. BMC Syst. Biol. 4, 1–12 (2010).
33. Mangan, N. M., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Trans. Mol. Biol. Multi Scale Commun. 2, 52–63 (2016).
34. McGoff, K. A. et al. The local edge machine: inference of dynamic models of gene regulation. Genome Biol. 17, 1–13 (2016).
35. Pigolotti, S., Krishna, S. & Jensen, M. H. Oscillation patterns in negative feedback loops. Proc. Natl Acad. Sci. USA 104, 6533–6537 (2007).
36. Pigolotti, S., Krishna, S. & Jensen, M. H. Symbolic dynamics of biological feedback networks. Phys. Rev. Lett. 102, 088701 (2009).
37. Tegnér, J., Zenil, H., Kiani, N. A., Ball, G. & Gomez-Cabrero, D. A perspective on bridging scales and design of models using low-dimensional manifolds and data-driven model inference. Philos. Trans. R. Soc. A 374, 20160144 (2016).
38. Xie, X., Samaei, A., Guo, J., Liu, W. K. & Gan, Z. Data-driven discovery of dimensionless numbers and governing laws from scarce measurements. Nat. Commun. 13, 7562 (2022).
39. Chen, Z., Liu, Y. & Sun, H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 12, 6136 (2021).
40. Kim, J. K. & Forger, D. B. A mechanism for robust circadian timekeeping via stoichiometric balance. Mol. Syst. Biol. 8, 630 (2012).
41. Igoshin, O. A., Goldbeter, A., Kaiser, D. & Oster, G. A biochemical oscillator explains several aspects of myxococcus xanthus behavior during development. Proc. Natl Acad. Sci. USA 101, 15760–15765 (2004).
42. Goodwin, B. C. Oscillatory behavior in enzymatic control processes. Adv. Enzym. Regul. 3, 425–437 (1965).
43. Goldbeter, A. A model for circadian oscillations in the drosophila period protein (per). Proc. R. Soc. Lond. Ser. B 261, 319–324 (1995).
44. Maeda, M. et al. Periodic signaling controlled by an oscillatory circuit that includes protein kinases erk2 and pka. Science 304, 875–878 (2004).
45. Park, S. H., Ha, S. & Kim, J. K. A general model-based causal inference method overcomes the curse of synchrony and indirect effect. Mathbiomed/GOBI: GOBI (General ODE-based causal inference) (v1.0.0). Zenodo https://doi.org/10.5281/zenodo.7997213 (2023).
46. Veilleux, B. G. The analysis of a predatory interaction between didinium and paramecium (M.Sc. thesis, University of Alberta, Edmonton, Canada, 1976).
47. Aufinger, L., Brenner, J. & Simmel, F. C. Complex dynamics in a synchronized cell-free genetic clock. Nat. Commun. 13, 1–9 (2022).
48. Potvin-Trottier, L., Lord, N. D., Vinnicombe, G. & Paulsson, J. Synchronous long-term oscillations in a synthetic gene circuit. Nature 538, 514–517 (2016).
49. Choi, B. et al. Bayesian inference of distributed time delay in transcriptional and translational regulation. Bioinformatics 36, 586–593 (2020).
50. Métivier, R. et al. Estrogen receptor-α directs ordered, cyclical, and combinatorial recruitment of cofactors on a natural target promoter. Cell 115, 751–763 (2003).
51. Lemaire, V., Lee, C. F., Lei, J., Métivier, R. & Glass, L. Sequential recruitment and combinatorial assembling of multiprotein complexes in transcriptional activation. Phys. Rev. Lett. 96, 198102 (2006).
52. Novák, B. & Tyson, J. J. Design principles of biochemical oscillators. Nat. Rev. Mol. Cell Biol. 9, 981–991 (2008).
53. Wong, T. W. et al. Air pollution and hospital admissions for respiratory and cardiovascular diseases in Hong Kong. Occup. Environ. Med. 56, 679–683 (1999).
54. Milojevic, A. et al. Short-term effects of air pollution on a range of cardiovascular events in England and Wales: case-crossover analysis of the minap database, hospital admissions and mortality. Heart 100, 1093–1098 (2014).
55. Chandler. Granger causality test. MATLAB Central File Exchange (2023).


Acknowledgements

We thank Seokjoo Chae, Hyukpyo Hong, Yun Min Song, and Olive Cawiding for their valuable comments. This work was supported by Samsung Science and Technology Foundation SSTF-BA1902-01 (to J.K.K.) and Institute for Basic Science IBS-R029-C3 (to J.K.K.).

Author information

Authors and Affiliations

Department of Mathematics, University of Wisconsin-Madison, Madison, WI, 53706, USA
Se Ho Park
Biomedical Mathematics Group, Institute for Basic Science, Daejeon, 34126, Republic of Korea
Se Ho Park, Seokmin Ha & Jae Kyoung Kim
Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea
Seokmin Ha & Jae Kyoung Kim


Contributions

S.H.P., S.H., and J.K.K. designed the research. S.H.P. and S.H. developed the method. S.H.P. performed computation. S.H.P. analyzed data. J.K.K. supervised the project. All authors wrote the paper.

Corresponding author

Correspondence to Jae Kyoung Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Park, S.H., Ha, S. & Kim, J.K. A general model-based causal inference method overcomes the curse of synchrony and indirect effect. Nat Commun 14, 4287 (2023). https://doi.org/10.1038/s41467-023-39983-4


Received: 01 December 2022
Accepted: 22 June 2023
Published: 24 July 2023
DOI: https://doi.org/10.1038/s41467-023-39983-4
