Introduction

Networks underlie much cellular and biological behavior; examples include transcriptional, protein-protein interaction, signaling, metabolic, cell-cell, endocrine, ecological, and social networks, among many others. Identifying and representing their structure has therefore been a research focus for decades. This effort is not only experimental but predominantly computational, drawing on a variety of statistical methodologies that integrate prior knowledge from interaction databases with new experimental data sets1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24. Alternatively, a variety of methods have investigated general ways to infer detailed reaction mechanisms (often a foundation of networks) from experimental data25,26,27,28,29. Such tasks may be considered a subset of network inference.

Network structure is usually represented as either an undirected or a directed graph, with edges between nodes specifying the system. There are five main areas where current approaches to reconstructing networks struggle to capture important features of biological networks. The first is directionality of edges6,8,30,31. Commonly employed correlational methods predominantly generate undirected edges, which impedes causal and other mechanistic analyses. Second is cycles. Cycles such as feedback or feedforward loops are nearly ubiquitous in biological systems and central to their function32,33. This also includes an important type of cycle: self-regulation of a node, that is, an edge onto itself, which is rarely considered34. Third is that biological networks are often dynamic. Two notable examples are circadian and p53 oscillators35,36, where dynamics are key to biological function. Directionality and edge signs (i.e. positive or negative) dictate dynamics. Fourth is pinpointing how external variables impinge on network nodes. For example, is the effect of a growth factor on a network node direct, or through other nodes in the network? Fifth, the design and method employed should be robust to typical experimental noise levels. The experimental design and data requirements to uniquely identify the dynamic, directed and signed edge structures in biological networks containing all types of cycles and external stimuli remain a largely open but significant problem. Any such design should ideally be feasible to implement with current experimental technologies.

Modular Response Analysis (MRA) approaches, pioneered by Kholodenko and colleagues in 2002 (refs. 37,38), inherently deal with cycles and directionality by prescribing systematic perturbation experiments followed by steady-state measurements. The premise for data requirements is to measure the entire system response to at least one perturbation for each node. Thus, an n node system requires n experiments, if the system response can be measured in a global fashion (i.e. all nodes measured at once). The original instantiations struggled with the impact of experimental noise, but total least squares MRA and Monte Carlo sampling helped to improve performance39,40,41. Incomplete and prior knowledge can be handled as well using both maximum likelihood and Bayesian approaches42,43,44,45. However, these approaches are based on steady-state data, or fixed time point data, limiting their ability to deal with dynamic systems. There is a formal requirement for small perturbations, which are experimentally problematic and introduce issues for estimation with noisy data. Subsequent approaches have recommended the use of large perturbations as a trade-off in dealing with noisy data, but the theory still formally requires small perturbations41. Lastly, there are two classes of biologically relevant edges that MRA does not comprehensively address. The first is self-regulation of a node, which is often normalized (to -1), causing it to not be uniquely identifiable. The second is the effect of stimuli external to the network (basally present or administered) on the modeled nodes.

In addition to perturbations, another experimental design feature that can inform directionality is a time series, which has also been integrated into MRA. This work37,46 uses time-series perturbation data to uniquely infer a signed, directed network that can predict dynamic network behavior. In an n node open system (e.g. protein levels are not constant), multiple nodes would either be distinctly perturbed more than once, such as both production and degradation of a transcript, or phosphorylation and dephosphorylation of a protein, or the system would be monitored before and after the perturbation (with one perturbation per node). This can be experimentally challenging both in terms of scale and finding suitable distinct perturbations for a node. Moreover, as is often the case, noise in the experimental data severely limits inference accuracy (due to the required estimation of second derivatives). Subsequent work47 recommends smaller perturbations and smaller time point intervals but also does not address noisy data. Further work has demonstrated that larger perturbations produce better results due to inevitable experimental noise41. Thus, there remains a need for methods that can infer signed, directed networks from feasible perturbation time course experiments that capture dynamics, can uniquely estimate edge properties related to self-regulation and external stimuli, and function in the presence of typical experimental noise levels.

Here we describe a novel, MRA-inspired approach called Dynamic Least-squares MRA (DL-MRA). For an n-node system, n perturbation time courses are required, and thus experimental requirements scale linearly as the network size increases. The approach uses an underlying network model that captures dynamic, directional, and signed networks that include cycles, self-regulation, and external stimulus effects. We test DL-MRA using simulated time-series perturbation data with known network topology under increasing levels of simulated noise. The approach has good accuracy and precision for identifying network structure in randomly generated two and three node networks that contain a wide variety of cycles. For the investigated cases, we find that 7 to 11 evenly distributed time points yielded reasonable results, although we expect this will strongly depend on time point placement. We apply the approach to models describing a cell state switching network48, a signal transduction network49, and a gene regulatory network32. Although signaling networks are often a focus in network biology, our analysis suggests they have unique properties that render them generally recalcitrant to reconstruction. Results from the gene regulatory network application suggest that incomplete perturbation (e.g. partial knockdown vs. knockout) is more informative than complete inhibition. While challenges remain for expanding to other and larger systems, the proposed algorithm robustly infers a wide range of networks with good specificity and sensitivity using feasible time course experiments, all while making progress on limitations of current inference approaches.

Results

Formulation of sufficient experimental data requirements for network reconstruction

Consider a 2-node network with four directed, weighted edges (Fig. 1a). An external stimulus may affect each of the two nodes differently and its effect is quantified by S1,ex and S2,ex, respectively (e.g. Methods, Eq. (15)). We also allow for basal/constitutive production in each node (Si,b). Let xi(k) be the activity of node i at time point tk. The network dynamics can be cast as a system of ordinary differential equations (ODEs) as follows

$$\frac{{dx_1}}{{dt}} \equiv f_1(x_1(k),x_2(k),S_{1,ex},S_{1,b}) \equiv f_1(k);\frac{{dx_2}}{{dt}} \equiv f_2(x_1(k),x_2(k),S_{2,ex},S_{2,b}) \equiv f_2(k).$$
(1)
Fig. 1: Overall DL-MRA approach.

a Two-node network with Jacobian elements labeled. Green arrows are stimuli and basal production terms. Time course experimental design with perturbations: vehicle (b), Node 1 (d), Node 2 (f). The vehicle may be the solvent like DMSO for inhibition with a drug, or a nontargeting si/shRNA for inhibition with si/shRNA. Simulated time course data for Vehicle perturbation (c), Node 1 perturbation (e), Node 2 perturbation (g) from the network in a. Left column: no added noise; right column: 10:1 signal-to-noise added. Actual versus inferred model parameters (S1,b, S1,ex, F11, F12, S2,b, S2,ex, F21, F22) for direct solution of Eqs. (3) and (4) in the absence (h) or presence (i) of noise, or with noise and the least-squares approach (j). In h and i, error bars are standard deviation across time points.

The network edges can be connected to the system dynamics through the Jacobian matrix J37,38,46,

$${{{\mathbf{J}}}} \equiv \left( {\begin{array}{*{20}{c}} {F_{11}} & {F_{12}} \\ {F_{21}} & {F_{22}} \end{array}} \right) \equiv \left( {\begin{array}{*{20}{c}} {{\textstyle{{\partial f_1} \over {\partial x_1}}}} & {{\textstyle{{\partial f_1} \over {\partial x_2}}}} \\ {{\textstyle{{\partial f_2} \over {\partial x_1}}}} & {{\textstyle{{\partial f_2} \over {\partial x_2}}}} \end{array}} \right)$$
(2)

The network edge weights (Fij’s) describe how the activity of one node affects the dynamics of another node in a causal and direct sense, given the explicitly considered nodes (though not necessarily in a physical sense). In practice, however, causality can only be approached if every component of the system is included in the model, which is not typical (and even more so, there must be no model mismatch, which is almost impossible to guarantee)6,30,31,50,51. In MRA, these nodes may be individual species or “modules”. In order to simplify a complex network it may often be separated into “modules” comprising smaller networks of inter-connected species with the assumption that each module is generally insulated from other modules except for information transfer through so-called communicating species37. Cases where such modules may not be completely isolated are explored elsewhere52.
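To make the link between Eq. (1) and Eq. (2) concrete, the following minimal sketch evaluates the Jacobian of a two-node model by central finite differences. The linear right-hand side and its numerical values (F11 = -1, F12 = 0, F21 = 0.5, F22 = -0.8, unit stimuli) are illustrative assumptions and are not taken from the Methods; the function names are hypothetical.

```python
def f(x, S):
    """Right-hand side of a hypothetical linear 2-node model (assumed values)."""
    x1, x2 = x
    dx1 = -1.0 * x1 + 0.0 * x2 + S[0]   # F11 = -1.0, F12 = 0.0
    dx2 = 0.5 * x1 - 0.8 * x2 + S[1]    # F21 = 0.5,  F22 = -0.8
    return [dx1, dx2]


def numerical_jacobian(func, x, S, h=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d x_j at state x."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus, S), func(x_minus, S)
        for i in range(n):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J


# Evaluate the Jacobian at an arbitrary state; for a linear model it is
# the same at every state and equals the edge-weight matrix.
J = numerical_jacobian(f, x=[0.3, 0.7], S=[1.0, 1.0])
```

For a linear model the Jacobian is constant, so each entry of `J` is simply the corresponding edge weight; for non-linear models the same routine would return state-dependent (and hence time-varying) values.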

What experimental data are sufficient to uniquely estimate the signed directionality of the network edges and thus infer the causal relationships within the system? Fundamentally, we know that perturbations and/or dynamics are important for inferring causality6,37,46,51,52. Consider a simple setup of three time-course experiments that each measure x1 and x2 dynamics in response to a stimulus (Fig. 1b–g). One time course is in the presence of no perturbation (vehicle), one has a perturbation of Node 1, and one has a perturbation of Node 2. Consider further that the perturbations are reasonably specific, such that the perturbation of x1 has negligible direct effects on x2, and vice versa, and that these perturbations may be large. Experimentally, this could be an shRNA or gRNA that is specific to a particular node, or that a small molecule inhibitor is used at low enough dose to predominantly inhibit the targeted node. A well-posed estimation problem can be formulated (see Methods) that, in principle, allows for unique estimation of the Jacobian elements as a function of time with the following set of linear algebra relations:

$$\left[ {\begin{array}{*{20}{c}} {y_1(t_{k + 1})} \\ {y_{1,2}^{}(t_k)} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\Delta _tx_1(t_{k + 1})} & {\Delta _tx_2(t_{k + 1})} \\ {\Delta _{p,2}^{}x_1(t_k)} & {\Delta _{p,2}^{}x_2(t_k)} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {F_{11}(t_k)} \\ {F_{12}(t_k)} \end{array}} \right]$$
(3)
$$\left[ {\begin{array}{*{20}{c}} {y_2(t_{k + 1})} \\ {y_{2,1}^{}(t_k)} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\Delta _tx_1(t_{k + 1})} & {\Delta _tx_2(t_{k + 1})} \\ {\Delta _{p,1}^{}x_1(t_k)} & {\Delta _{p,1}^{}x_2(t_k)} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {F_{21}(t_k)} \\ {F_{22}(t_k)} \end{array}} \right]$$
(4)

Here, yi,j refers to a measured first-time derivative of node i in the presence of node j perturbation (if used), and Δ to a difference with respect to perturbation (subscript p) or time (subscript t) (see Methods). Since we do not use data from the perturbation of node i for estimation of node i edges, we do not have to impose assumptions on how the perturbation functionally acts on the system dynamics (see Methods). Moreover, constraints on the perturbation strength can be relaxed, following recent recommendations41 (although accuracy of the underlying Taylor series approximation can affect estimation; see Methods). If these measurements with and without perturbations were each taken in their steady state as is done in MRA, the solution for Fij would be trivial. MRA gets around this by normalizing self-regulatory parameters Fii to -1. Using dynamic data allows unique estimation of self-regulatory parameters without such normalization. Estimation of the node-specific stimulus strengths or basal production rates (S’s) requires evaluation after specific functional assumptions, but in general these effects are knowable from the data to be generated (see Methods and the results below).
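As an illustration of how Eq. (3) yields F11 and F12 at one time point, the sketch below solves the 2 × 2 system with Cramer's rule. It assumes that the left-hand side entries are differences in the time derivative of x1 (between consecutive time points, and between the Node 2 perturbation and vehicle), which is our reading of the definitions deferred to Methods. All state values and edge weights are hypothetical, and the derivatives are generated from an assumed linear model rather than estimated from data.

```python
def solve_2x2(A, b):
    """Solve A z = b for a 2x2 matrix A using Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular system: the two rows are not independent")
    z1 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    z2 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return z1, z2


# Hypothetical ground truth used only to generate consistent derivatives.
F11_TRUE, F12_TRUE, S1 = -1.0, 0.4, 1.0


def f1(x):
    """Assumed linear dynamics of node 1."""
    return F11_TRUE * x[0] + F12_TRUE * x[1] + S1


x_k = (0.20, 0.30)      # vehicle state at t_k (hypothetical)
x_k1 = (0.50, 0.45)     # vehicle state at t_{k+1} (hypothetical)
x_p2_k = (0.15, 0.00)   # state at t_k with Node 2 completely inhibited

# Row 1: differences in time; row 2: differences due to Node 2 perturbation.
A = [[x_k1[0] - x_k[0], x_k1[1] - x_k[1]],
     [x_p2_k[0] - x_k[0], x_p2_k[1] - x_k[1]]]
b = [f1(x_k1) - f1(x_k), f1(x_p2_k) - f1(x_k)]

F11_est, F12_est = solve_2x2(A, b)
```

With exact derivatives the assumed edge weights are recovered. In an experiment the entries of `b` would themselves be finite-difference estimates from measured data, which is where measurement noise enters the direct solution.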

Note that this formulation is generalizable to an n-dimensional network. With \(n^2\) unknown parameters in the Jacobian matrix, n equations originate from the vehicle perturbation and n−1 equations originate from each of the n perturbations (discarding equations from Node i with Perturbation i). This results in \(n + n(n - 1) = n + n^2 - n = n^2\) independent equations.

$$\begin{array}{l}S_{1,b} = - \left(F_{11}x_{1,ss} + F_{12}x_{2,ss}\right)\\ S_{2,b} = - \left(F_{21}x_{1,ss} + F_{22}x_{2,ss}\right)\end{array}$$
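The relation above follows from setting the time derivatives to zero at a steady state in the absence of external stimulus, assuming a linear model of the form dx_i/dt = Σ_j F_ij x_j + S_i,b. A minimal sketch of that calculation is given below; the Jacobian and steady-state values are hypothetical and the function name is ours.

```python
def basal_production(F, x_ss):
    """Return S_b such that F x_ss + S_b = 0 (steady state, no external stimulus)."""
    n = len(F)
    return [-sum(F[i][j] * x_ss[j] for j in range(n)) for i in range(n)]


# Hypothetical Jacobian and pre-stimulus steady-state activities.
F = [[-1.0, 0.0],
     [0.5, -0.8]]
x_ss = [0.5, 0.4]

S_b = basal_production(F, x_ss)
```

Because the expression is written for n nodes, the same function applies unchanged to the three-node cases considered later.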

Using sufficient simulated data to reconstruct a network

As an initial test of the above formulation, we used a simple 2 node, single activator network where Node 1 activates Node 2, one node has first-order degradation (-1 diagonal elements), and the other has negative self-regulation (-0.8 diagonal) (Fig. 1a—see Methods for equations). A stimulus at t = 0 (time-invariant; S,ex = 1) increases the activity of each node, which we sample with an evenly spaced 11-point time course. This simulation was done for no perturbation (i.e. vehicle) and for each perturbation (Node 1 and Node 2) to generate the necessary simulation data per the theoretical considerations above (Fig. 1c, e, g, left panel). Here, we modeled perturbations as complete inhibition; for example, a perturbation of Node 1 makes its value 0 at all times. Solving Eqs. (3) and (4) to infer the Jacobian elements at each time point yielded good agreement between the median estimates and the ground truth values (Fig. 1h, “Analytic Solution”, No Noise). Using the node activity data corresponding to the last time point in the time course and the median estimates of Jacobian elements, the external stimuli S1,ex and S2,ex were also determined (Eqs. (18) and (19)) and reasonably agree with the ground truth values.
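The experimental design described above (vehicle plus one complete inhibition per node, sampled at 11 evenly spaced time points) can be sketched as follows. This is an illustrative simulation and not the authors' code: the activation strength F21 = 0.5, the assignment of the -1 and -0.8 diagonal terms to Nodes 1 and 2, zero initial activities, and the fixed-step RK4 integrator are all assumptions.

```python
def simulate(F, S, x0, t_end=10.0, n_points=11, inhibited=None, substeps=100):
    """Integrate dx/dt = F x + S with fixed-step RK4.

    If `inhibited` is a node index, that node is clamped to 0 at all times
    (complete inhibition). Returns states at n_points evenly spaced times.
    """
    n = len(x0)

    def rhs(x):
        dx = [sum(F[i][j] * x[j] for j in range(n)) + S[i] for i in range(n)]
        if inhibited is not None:
            dx[inhibited] = 0.0  # the clamped node does not change
        return dx

    x = list(x0)
    if inhibited is not None:
        x[inhibited] = 0.0
    dt = t_end / ((n_points - 1) * substeps)
    trajectory = [list(x)]
    for _ in range(n_points - 1):
        for _ in range(substeps):
            k1 = rhs(x)
            k2 = rhs([x[i] + 0.5 * dt * k1[i] for i in range(n)])
            k3 = rhs([x[i] + 0.5 * dt * k2[i] for i in range(n)])
            k4 = rhs([x[i] + dt * k3[i] for i in range(n)])
            x = [x[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                 for i in range(n)]
        trajectory.append(list(x))
    return trajectory


F = [[-1.0, 0.0], [0.5, -0.8]]   # assumed single-activator network
S = [1.0, 1.0]                   # time-invariant stimulus on each node
x0 = [0.0, 0.0]

vehicle = simulate(F, S, x0)
node1_inhibited = simulate(F, S, x0, inhibited=0)
node2_inhibited = simulate(F, S, x0, inhibited=1)
```

The three returned trajectories play the role of the data in Fig. 1c, e, g (left panel); each is a list of 11 two-element states.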

How does this approach fare when data are noisy? We performed the estimation with the same data but with a relatively small amount of simulated noise added (10:1 signal-to-noise—Fig. 1c, e, g). The resulting estimates are neither accurate nor precise, varying on a scale more than ten times greater than each parameter’s magnitude with median predictions both positive and negative regardless of the ground truth value (Fig. 1i). The stimulus strengths S1,ex and S2,ex are estimated to be negative, while the ground truth is positive.

Although the analytic equations suggest the sufficiency of the perturbation time course datasets to uniquely estimate the edge weights, in practice even small measurement noise corrupts estimates obtained from direct solution of these equations. Therefore, we considered an alternative representation by employing a least squares estimation approach rather than solving the linear equations directly. For a given set of guesses for edge weight and stimulus parameters, one can integrate to obtain a solution for the dynamic behavior of the resulting model, which can be directly compared to data in a least-squares sense. Least squares methods were shown to improve traditional MRA-based approaches39,40, but had never been formulated for such dynamic problems. Two hurdles were how to model the effect of a perturbation without (i) adding additional parameters to estimate or (ii) requiring strong functional assumptions regarding perturbation action. We solved these here by using the already-available experimental measurements within the context of the least-squares estimation (see Methods). We applied this approach to the single activator model, in the 10:1 signal-to-noise ratio case above where the analytic approach failed. This new estimation approach was able to infer the network structure accurately and precisely (Fig. 1j). We conclude that analytic formulations can be useful for suggesting experimental designs that should be sufficient for obtaining unique estimates for a network reconstruction exercise, but in practice directly applying those equations may yield neither precise nor accurate estimates. In contrast, a least-squares formulation seems to work well for this application.
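The core of such a least-squares formulation is an objective that integrates a candidate model and sums the squared residuals against the data. The sketch below shows only that objective for the vehicle time course of an assumed linear two-node model. The parameter ordering, the noise model (Gaussian with a standard deviation of 10% of each value as a stand-in for 10:1 signal-to-noise), and the omission of the perturbation time courses are simplifying assumptions; the treatment of perturbations described in Methods is not reproduced here, and the minimization itself would be delegated to any non-linear least-squares solver.

```python
import random


def simulate(F, S, x0, times, substeps=50):
    """Integrate dx/dt = F x + S with RK4 and return the states at `times`."""
    n = len(x0)

    def rhs(x):
        return [sum(F[i][j] * x[j] for j in range(n)) + S[i] for i in range(n)]

    x, out = list(x0), [list(x0)]
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = (t1 - t0) / substeps
        for _ in range(substeps):
            k1 = rhs(x)
            k2 = rhs([x[i] + 0.5 * dt * k1[i] for i in range(n)])
            k3 = rhs([x[i] + 0.5 * dt * k2[i] for i in range(n)])
            k4 = rhs([x[i] + dt * k3[i] for i in range(n)])
            x = [x[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                 for i in range(n)]
        out.append(list(x))
    return out


def sse(params, data, times, x0):
    """Sum of squared errors; params = [F11, F12, F21, F22, S1, S2] (assumed order)."""
    F = [list(params[0:2]), list(params[2:4])]
    S = list(params[4:6])
    model = simulate(F, S, x0, times)
    return sum((m - d) ** 2
               for m_row, d_row in zip(model, data)
               for m, d in zip(m_row, d_row))


times = [float(k) for k in range(11)]            # 11 evenly spaced points, 0 to 10
true_params = [-1.0, 0.0, 0.5, -0.8, 1.0, 1.0]   # hypothetical ground truth
wrong_params = [-1.0, 0.0, -0.5, -0.8, 1.0, 1.0]  # F21 with the wrong sign
x0 = [0.0, 0.0]

clean = simulate([[-1.0, 0.0], [0.5, -0.8]], [1.0, 1.0], x0, times)
rng = random.Random(0)
noisy = [[v + rng.gauss(0.0, 0.1 * abs(v)) for v in row] for row in clean]

sse_true = sse(true_params, noisy, times, x0)
sse_wrong = sse(wrong_params, noisy, times, x0)
```

Even with noise added, the objective is much smaller at the assumed ground truth than at a parameter set with the wrong sign for F21, because the whole trajectory constrains each parameter rather than a single pair of time points.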

Reconstruction of random 2 and 3 node networks

To investigate the robustness of the least-squares estimation approach, we applied it to increasingly complex networks with larger amounts of measurement noise and smaller numbers of time points (Fig. 2). We focused on 2 and 3 node networks. We generated 50 randomized 2 and 3 node models, where each edge weight is randomly sampled from a uniform distribution over the interval [−2, 2], and the basal and external strength from [0, 2] (Fig. 2a, Supplementary Figs. S1a and S2a). Each random network was screened for stability. Many networks (29/50 for 2 node and 3 node) displayed potential for oscillatory behavior (non-zero imaginary parts of the eigenvalues of the Jacobian matrix). However, since the real parts of the eigenvalues are negative, these oscillations should dampen over time, and no sustained oscillatory behavior was analyzed. For each random model, we generated a simulated dataset based on the prescribed experimental design, using complete inhibition as the perturbation. We considered evenly spaced sampling within the time interval of 0–10 AU (approximate time to reach steady state; Supplementary Figs. S1b and S2b) with different numbers of time points (3, 7, 11 and 21), and added noise at 10:1, 5:1, and 2:1 signal-to-noise to the data. Non-uniform time point spacing may change inference results, but that was not explored in these first investigations.
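A sketch of how such random two-node networks could be generated and screened is shown below. Edge weights and stimulus terms are drawn from the stated uniform intervals; the use of rejection sampling to enforce stability, the fixed random seed, and the helper names are assumptions rather than the procedure from Methods. Stability is taken to mean that all Jacobian eigenvalues have negative real parts, and potential for oscillation is flagged by non-zero imaginary parts.

```python
import cmath
import random


def eigenvalues_2x2(F):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = F[0][0] + F[1][1]
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def random_stable_network(rng):
    """Draw edge weights from U[-2, 2] until the network is stable."""
    while True:
        F = [[rng.uniform(-2.0, 2.0) for _ in range(2)] for _ in range(2)]
        if all(ev.real < 0.0 for ev in eigenvalues_2x2(F)):
            S_b = [rng.uniform(0.0, 2.0) for _ in range(2)]    # basal production
            S_ex = [rng.uniform(0.0, 2.0) for _ in range(2)]   # external stimulus
            return F, S_b, S_ex


rng = random.Random(0)  # fixed seed so the sketch is reproducible
networks = [random_stable_network(rng) for _ in range(50)]

# Networks with complex eigenvalues show damped oscillations.
n_oscillatory = sum(
    any(ev.imag != 0.0 for ev in eigenvalues_2x2(F)) for F, _, _ in networks
)
```

For three-node networks the same screen applies, but the eigenvalues would be obtained from a general eigenvalue routine instead of the closed-form 2 × 2 expression.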

Fig. 2: Application to linear two and three node models.

a Connections around a Node i in an n-Node Model. Si,b and Si,ex are the basal production and external stimulus terms acting on Node i, respectively. Fii is the self-regulation term; Fij the effect of Node j on Node i and Fji the effect of Node i on Node j. b Example of different signal-to-noise ratio effects on time course data. Ground truth versus estimated edge weights across all 50 random networks and noise levels for data from four different total timepoints (3,7,11,21) for 2 node (c) and 3 node (d) networks. Quadrant shading indicates edge classification. Fraction of network parameters correctly classified in 50 randomly generated 2 node networks (e) and 3 node networks (f) with different noise levels and total timepoints. g Fraction of network parameters correctly classified in 50 randomly generated 3 node networks with dynamic MRA using two sets of perturbation data.

For each random network model, number of time points, and noise level, we evaluated the fidelity of the proposed reconstruction approach in terms of signed directionality (Fig. 2c–f). We overall found reasonable agreement between inferred and ground truth values, even at the higher noise levels and low number of timepoints. Expectedly, the overall classification accuracy increases with more time points and decreases with higher noise levels. But, surprisingly, even in the worst case investigated of 3 timepoints and 2:1 signal-to-noise ratio, classification accuracy was above 85% for 2 node models and 70% for 3 node models. Increasing the number of nodes decreases performance, with 3-node reconstruction being slightly worse than 2-node reconstruction, other factors held constant.
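The classification accuracy referred to here can be illustrated with a simple sign-based comparison of estimated and ground truth edge weights. The tolerance used to call an edge null and the example values are hypothetical; the criterion actually used for Fig. 2 is defined in Methods and may differ.

```python
def classify(weight, tol=0.0):
    """Classify an edge as positive (+1), negative (-1), or null (0)."""
    if weight > tol:
        return 1
    if weight < -tol:
        return -1
    return 0


def fraction_correct(true_weights, estimated_weights, tol=0.0):
    """Fraction of edges whose estimated class matches the ground truth class."""
    matches = sum(
        classify(t, tol) == classify(e, tol)
        for t, e in zip(true_weights, estimated_weights)
    )
    return matches / len(true_weights)


# Hypothetical example: F11, F12, F21, F22 for one network.
true_edges = [-1.0, 0.0, 0.5, -0.8]
estimated_edges = [-0.9, 0.05, 0.6, 0.1]

accuracy = fraction_correct(true_edges, estimated_edges, tol=0.1)
```

In this example three of the four edges are assigned the correct class; the last edge is truly negative but its estimate falls within the null tolerance.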

We wondered whether the magnitude of an edge weight influenced its classification accuracy, since small edge weights may be more difficult to discriminate from noise. We found that edge weights with greater absolute values, which are expected to have a greater influence on the networks, were more likely to be classified correctly (Supplementary Figs. S1c–f and S2c–f). Also, for models with damped oscillatory behavior, the classification accuracy is very similar to that of all 50 random models (Supplementary Fig. S3a, b).

How does this method compare to similar network reconstruction methods? There are limited methods to compare to which also use dynamic data and sequential perturbations. MRA37, from which this method was inspired, uses steady-state data. However, we could use MRA methods requiring dynamic perturbation data as is used in our method46,53. To compare, we generated another set of perturbation data with 50% perturbation (as opposed to 100%). We then used the two sets of perturbation data to estimate the network node edges with dynamic modular response analysis (Fig. 2g). Even in the absence of noise, for low to medium numbers of timepoints (3-11) the network is not always accurately inferred (Fig. 2g). In the presence of noise, DL-MRA performs better, although the difference between the two methods becomes smaller at high numbers of timepoints. Thus, DL-MRA not only performs better with half the data, but it also estimates 6 additional parameters: the basal production and external stimulus for each node. Although Cho’s approach47 builds upon MRA methods by recommending smaller time point intervals and smaller perturbations, for our purposes, the time intervals and perturbations are fixed and this would not affect the results obtained here. Moreover, further work has actually recommended larger perturbations while dealing with noisy data41.

To explore a scenario where data from a node might be unavailable, we removed the data from one of the nodes in the 50 random 3 node models and used the remaining data to reconstruct a 2-node system (Supplementary Fig. S4). Comparing with corresponding model parameters in the 3 node system, we find a good but expectedly reduced classification accuracy (no noise: 94.75%; 10:1 signal-to-noise: 93.75%; 5:1 signal-to-noise: 91.25%; 2:1 signal-to-noise: 87%).

A part of the inference process is performing parameter estimation using multiple starting guesses (i.e. multi-start), and we wanted to determine how robust the estimated parameters were across the multi-start processes. We looked at the distribution of the coefficient of variation (CV) among the parameters in the multi-start results in the 50 random 3 node models, where either the data generated from the estimated parameters had a low sum of squared errors (SSE) compared to the original data (\(< 10^{-4}\)) or the SSE was less than twice the minimum SSE. We find that the CVs peak around zero and generally have a small spread, especially for low noise scenarios (Supplementary Fig. S5). This implies good convergence of the parameter sets obtained through multi-start.
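The convergence check described above can be sketched as follows: keep the multi-start runs whose SSE is less than twice the minimum, then compute the CV of each parameter across the retained runs. The example runs are hypothetical, and dividing by the absolute value of the mean (so that negative edge weights give a non-negative CV) is an assumption about the convention used.

```python
import statistics


def multistart_cv(runs, sse_factor=2.0):
    """CV of each parameter across runs with SSE < sse_factor * min SSE.

    `runs` is a list of (sse, parameter_list) tuples from a multi-start fit.
    Parameters whose mean across runs is exactly zero are not handled here.
    """
    best = min(sse for sse, _ in runs)
    kept = [params for sse, params in runs if sse < sse_factor * best]
    if len(kept) < 2:
        raise ValueError("need at least two retained runs to compute a CV")
    return [
        statistics.stdev(column) / abs(statistics.mean(column))
        for column in zip(*kept)
    ]


# Hypothetical multi-start results for two parameters.
runs = [
    (0.010, [-1.00, 0.50]),
    (0.012, [-1.02, 0.49]),
    (0.015, [-0.98, 0.51]),
    (0.300, [0.40, -1.20]),   # poor local minimum, filtered out by the SSE rule
]

cvs = multistart_cv(runs)
```

CVs close to zero, as in this example, indicate that the retained starting guesses converged to essentially the same parameter values.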

We conclude that the network parameters of 2 and 3 node systems can be robustly and uniquely estimated using DL-MRA. However, these were ideal conditions where there was no model mismatch that is expected in specific biological applications. How does DL-MRA perform when applied to data reflective of different biological use cases?

Application to cell state networks

Cell state transitions are central to multi-cellular organism biology. They are commonly transcriptomic in nature and underlie development and tissue homeostasis and can also play roles in disease, such as drug resistance in cancer48,54,55,56,57,58,59,60,61. Could DL-MRA reconstruct cell state transition networks? As the application, we use previous data on SUM159 cells that transition between luminal, basal and stem-like cells48. Pure populations of luminal, basal and stem-like cells eventually grow to a stable final ratio amongst the three. The authors used a discrete time Markov transition probability model to describe the data and estimate a cell state transition network (Fig. 3a). Thus, we seek to compare DL-MRA to such a Markov model in this case.

Fig. 3: Application to cell state transition networks.

a Markov transition model of SUM159 cell states. b Cell proportions over time for SUM159 cells using Markov transition parameters (dots), starting at different initial proportions and respective DL-MRA model fits (lines). c Parameters from DL-MRA estimates of SUM159 data are similarly classified as transformed Markov parameters (See Methods, Eqs. (29) and (30)). d Ground truth versus estimated edge weights across 50 random cell transition networks and noise levels for data from four different total timepoints (3,7,11,21).

We hypothesized that perturbations to the system in this case, in contrast to above, did not have to change node activity (i.e. edges). Rather, we thought that perturbing the equilibrium cell state distribution could serve an equivalent purpose. Thus, the data for reconstruction consisted of observing the cell state proportions evolve over time from “pure” populations (Fig. 3b), in addition to equal proportions. DL-MRA is capable of explaining the data (Fig. 3b). Interpretation of the network parameters estimated by DL-MRA depends on the transformation of the original discrete time Markov probabilities to a continuous time formulation (see Methods; there are constraints on self-regulatory parameters), but DL-MRA correctly infers the cell state transition network as well (Fig. 3c). Conveniently, DL-MRA is not constrained to 1-day time point spacing as is the original discrete time Markov model.

How do noise and the number of timepoints affect the reconstruction? As above, we generated data for 50 random cell state transition models with 3, 7, 11 and 21 timepoints within 5 days, as the models generally seemed to reach close to equilibrium within 5 days. Noise levels of 10:1, 5:1 and 2:1 were used. All parameters were classified accurately (Fig. 3d), although additional constraints in the estimation (see Methods) facilitate this classification performance. With 3 timepoints, there was deviation from perfect fit even with no noise in the data. At 7 or more timepoints, the estimates matched ground truth well, and noise expectedly reduced the accuracy (Fig. 3d). We conclude that DL-MRA can robustly infer cell state networks given perturbation data in the form of non-equilibrium proportions as initial conditions.

Application to intracellular signaling networks

How does the method perform for intracellular signaling networks? The Huang–Ferrell model49 (Fig. 4a) is a well-known intracellular signaling pathway model and has been investigated by different reconstruction methods, including previous versions of MRA37,39,41,46,62. It captures signal flux through a three-tiered MAPK cascade where the 2nd and 3rd tier contain two phosphorylation sites. An important aspect of the Huang–Ferrell model is that although the reaction scheme is a cascade and without obvious feedbacks, there may be hidden feedbacks due to sequestration effects and depending on how the perturbations were performed.

Fig. 4: Application to a signaling network.

a Full reaction scheme for the Huang–Ferrell (HF) model, depicting the parameters k3, k15 and k27 which were perturbed sequentially to generate the perturbation data. b Model coarse-graining to a 3-node network. c Data generated for each node with a small E1 stimulus (\(2.5 \times 10^{-6}\) µM). d Model parameters estimated as significant (bold) and negligible (dotted lines). e SELDOM true graph values represented in the 3-node model with parameters considered (bold) and not considered (dotted lines).

In order to reconstruct the Huang–Ferrell MAPK network through DL-MRA, we first simplified it to a three-node model with p-MAPKKK, pp-MAPKK and pp-MAPK as observable nodes, as is typical for reconstruction efforts (Fig. 4b)37,39,41,46,52,62. Second, to model perturbations, we sequentially perturbed the activation parameters of each of the observable species (k3, k15 and k27 respectively). Such perturbations, although hard to achieve experimentally, are important because modules must be “insulated” from one another and perturbations must be specific to the observables37,52. Even specific inhibitors do not have such kinetic specificity. Third, in the simplification of the reaction scheme, the observables are shown to influence each other but in the actual scheme, they conduct their effects through the unphosphorylated and semi-phosphorylated species. We sought to keep the levels of these two species relatively constant between different perturbations, so that they wouldn’t add to non-linearities in the estimation. Therefore, we used a stimulus which only activated the observables to a maximum of about 5% of the total forms of the protein52.

Estimation with DL-MRA under the above conditions fits the data (Fig. 4c) and predicts positive node edges down the reaction cascade (F21, F32), negligible direct relation between p-MAPKKK and pp-MAPK (F13, F31), negative self-regulation of each of the observables (F11, F22, F33), negative feedbacks from pp-MAPKK to p-MAPKKK (F12) and from pp-MAPK to pp-MAPKK (F23), and negligible external stimuli on pp-MAPKK and pp-MAPK (S2,ex, S3,ex). All these effects are consistent with the reaction scheme. The negative feedback effects, although not immediately obvious, are consistent with ground truth sequestration effects. For instance, pp-MAPK has an overall negative effect on pp-MAPKK, as the existence of pp-MAPK lowers the amount of the species MAPK and p-MAPK, which sequester pp-MAPKK and allow it to avoid deactivation by its phosphatase.

How do the estimation results for the Huang–Ferrell model in our method compare with those obtained from other methods? Previous work using MRA also reported negative feedbacks from successive modules to the preceding ones37,46,52. Similarly, self-regulation parameters in most preceding MRA-based methods are also estimated to be negative but are fixed at -1 (refs. 37,39,52).

Besides MRA-inspired methods, SELDOM is another network reconstruction method which can also deal with dynamic data62. SELDOM is a data-driven method which uses ensembles of logic-based dynamic models followed by training and model reduction steps to predict state trajectories under untested conditions. However, when dealing with the Huang–Ferrell network, the true value model of SELDOM does not map the effects of self-regulation, nor feedback effects between nodes (Fig. 4e). This may be explained by the fact that although SELDOM uses an extensive number of models to test the data obtained from multiple different stimuli, perturbation data were not included to test the Huang–Ferrell model. This implies that systematic perturbation of each of the nodes, as prescribed by MRA-based methods, is necessary in order to unearth feedbacks and self-regulation effects.

Although application of DL-MRA to the Huang–Ferrell model was able to unearth latent network structure, the simulation conditions were restrictive. First, the perturbation scheme chosen in this paper, although specifically targeted at the observable species, is hard to produce experimentally. In practice, knock-down/out, overexpression, or specific inhibitors could be used as suitable perturbations, but they do not have the precision needed to be compatible with MRA-imposed constraints. The feedback effect observed could depend on the perturbation scheme chosen; for instance, knockdown of an entire module as a perturbation would likely have manifested as positive feedback to the preceding module. That is because such a knockdown would have reduced the effect of sequestration of the module on the preceding observable and would have made it more available for dephosphorylation. Second, we assumed a low stimulus to avoid effects from the unphosphorylated version of the proteins. A higher activation may increase non-linearities, adding to the complexity of the model, whereas a lower stimulus may not activate enough proteins to be well detected in experiments. The degree of activation needed for an experiment may be hard to predict beforehand. Such specific perturbations and stimuli were needed to reduce the effects arising from the behavior of the non-observable species. Hence, application of DL-MRA to intracellular signaling networks with multiple physical interactions needs to be carefully considered before modeling or experiments.

Application to gene regulatory networks: partial perturbations are more informative than full perturbations

Here, we applied DL-MRA further to a series of well-studied non-linear feed forward loop (FFL) gene regulatory network models that have time-varying Jacobian elements (Fig. 5a, Table 1)32,33. Such FFL motifs are strongly enriched in multiple organisms and are important for signaling functions such as integrative control, persistence detection, and fold-change responsiveness63,64,65.

Fig. 5: Application to 16 non-linear gene regulatory networks: no noise.
figure 5

a Feedforward loop (FFL) network models. Across all 16 models (Table 1), F11, F22, and F33 values are fixed at -1 and F12, F13, and F23 values are fixed at 0. F21, F31, and F32 values can be positive or negative depending on the model. The combined effect of x1 and x2 on x3 is described by either an AND gate or an OR gate. There are 16 possible model structures (Table 1). b 100% inhibitory perturbations may not provide accurate classification even without noise. In Model #1, F31 is positive (ground truth) but is estimated as null. c Specific structure of Model #1. d Node activity simulation data for 100% inhibition in Model #1, implying that it is impossible to infer F31 from such data. e Node activity simulation data for 50% inhibition in Model #1, showing potential to infer F31. f Fraction of model parameters correctly classified in all the 16 non-linear models without noise, for 100% inhibition vs 50% inhibition.

Table 1 Structure of each of the 16 non-linear models.

The FFL network has three nodes (x1, x2, and x3), and the external stimulus acts on x1 (S1,ex). There is no external stimulus on x2 and x3; however, there may be basal production of x2 (S2,b) and x3 (S3,b). Each node exhibits first-order decay (Fii = −1). The parameters F12, F13, and F23 represent connections that do not exist in the model; we call these null edges, but we allow them to be estimated. The relationship between x1 and x2 (F21), between x1 and x3 (F31), or between x2 and x3 (F32) can be either activating or inhibitory. Furthermore, x1 and x2 can regulate x3 through an “AND” gate (both needed) or an “OR” gate (either sufficient) (Fig. 5a). These permutations give rise to 16 different FFL structures (Table 1).

To generate simulated experimental data from these models, we first integrated the system of ODEs starting from a zero initial condition to find the steady state in the absence of stimulus. We then introduced the external stimulus and integrated the system of ODEs (see Methods) to generate time series perturbation data consistent with the proposed reconstruction algorithm, using full inhibitory perturbations. We used 11 evenly spaced timepoints for all 16 non-linear models, based on the random 3-node model analysis above, and also added noise as above.

We first noticed that even in the absence of added noise, a surprising number of inferences were incorrect (Fig. 5b, f). Model #1 (Table 1, Fig. 5b, c) is used as an example, where F21, F31 and F32 are activators with an AND gate, and F31 is incorrectly predicted as null (Fig. 5b; compare ground truth to 100% inhibition). To understand the reason for the incorrect estimation, we looked at the node activity dynamics across the perturbation time courses (Fig. 5d). All three nodes start from an initial steady state of zero, but Node 3 remains at zero in all three perturbation cases. The reason is as follows. Since x1 is required for the activation of x2 and x3, complete inhibition of x1 completely blocks both x2 and x3 activation. Likewise, because both x1 and x2 are required for the activation of x3, completely inhibiting x2 activity also completely inhibits x3. Thus, given this experimental setup, it is impossible to discern whether x1 directly influences x3 or acts solely through x2.

We thus reasoned that full inhibitory perturbation may suppress the information necessary to correctly reconstruct the network, but that a partial perturbation experiment may contain enough information available to make a correct estimate. If this were true, then upon applying partial perturbations (we chose 50% here), Node 3 dynamics should show differences across the perturbation time courses. Simulations showed that this is the case (Fig. 5e). Subsequently, we found that for partial perturbation data, F31 is correctly identified as an activator. More broadly, we obtain perfect classification from noise-free data across all 16 FFL networks when partial perturbation data are used, as opposed to 5/16 networks having discrepancy with full perturbation data (Fig. 5f). The fits to simulated data from the reconstructed model align very closely, despite model mismatch (Supplementary Fig. S6). We conclude that in these cases of non-linear networks, a partial inhibition is necessary to estimate all the network parameters accurately. Thus, moving forward, we instead applied 50% perturbation to all simulation data and proceeded with least squares estimation.
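The loss of Node 3 signal under full inhibition can be illustrated with a small toy simulation. This is only a sketch under stated assumptions: the Hill-type activation with coefficient 2, the modeling of inhibition as scaling a node's production term, the forward-Euler integrator, and all function names are hypothetical choices for illustration and are not the rate laws or solver used for the 16 FFL models.

```python
def hill(x, n=2):
    """Hypothetical saturating activation, x^n / (1 + x^n)."""
    xn = x ** n
    return xn / (1.0 + xn)


def simulate_and_ffl(inhibited_node=None, strength=0.0, stimulus=1.0,
                     t_end=10.0, dt=0.01):
    """Forward-Euler integration of an assumed AND-gate FFL from a zero state.

    Inhibition is assumed to scale the production term of one node by
    (1 - strength). Returns the x3 trajectory, one value per step.
    """
    keep = [1.0, 1.0, 1.0]
    if inhibited_node is not None:
        keep[inhibited_node] = 1.0 - strength
    x1 = x2 = x3 = 0.0
    x3_trace = [x3]
    for _ in range(int(round(t_end / dt))):
        dx1 = keep[0] * stimulus - x1                 # stimulus on x1, first-order decay
        dx2 = keep[1] * hill(x1) - x2                 # x1 activates x2
        dx3 = keep[2] * hill(x1) * hill(x2) - x3      # AND gate: x1 and x2 both needed
        x1, x2, x3 = x1 + dt * dx1, x2 + dt * dx2, x3 + dt * dx3
        x3_trace.append(x3)
    return x3_trace


# One perturbation time course per node, at 100% and at 50% inhibition.
full = [simulate_and_ffl(node, 1.0) for node in range(3)]
half = [simulate_and_ffl(node, 0.5) for node in range(3)]
print([max(trace) for trace in full])   # x3 never leaves zero under 100% inhibition
print([trace[-1] for trace in half])    # x3 is non-zero and differs between perturbations
```

In this toy setting, the three full-inhibition time courses give identical (zero) x3 data, so they carry no information that separates a direct x1 to x3 edge from an indirect one, whereas the 50% time courses differ from one another.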

Application to gene regulatory networks: performance

The above analysis prompted us to use a partial (50%) perturbation strategy, since it correctly classified each edge for each model in the absence of noise. What classification performance do we obtain in the presence of varying levels of experimental noise? We first devised the following strategy to assess classification performance. We generated 50 bootstrapped datasets for each network structure/signal-to-noise pair, and thus obtained 50 sets of network parameter estimates. To classify the network parameters, we used a symmetric cutoff of a percentile window around the median of these 50 estimates (Fig. 6a). We illustrate this approach with three different example edges and associated estimates, one being positive (Edge 1), one being negative (Edge 2), and one being null (Edge 3). Given the window of values defined by the chosen percentile cutoff, if the estimates in this window are all positive, then the network parameter is classified as positive. Similarly, if the estimates in this window are all negative, then the parameter is classified as negative. Finally, if the estimates in the window cross zero (i.e. span both positive and negative values), then the parameter is classified as null. First, consider the case where the percentile window is set at just the median, with no percentile span. Then, the classifications for true positives and negatives are likely to be accurate, while the null parameters are likely to be incorrectly categorized as either positive or negative (Fig. 6a). If we increase the percentile window span slightly (e.g. between the 40th and 60th percentile, middle panel), we can categorize null edges better while maintaining good classification accuracy for both true positive and true negative edges. However, if we relax the percentile window too much (e.g. between the 10th and 90th percentile, far right panel), we may categorize most parameters as null, including the true positives and negatives. Thus, it is clear that there will be an optimal percentile cutoff that maximizes true positives and minimizes false positives as the threshold is shifted from the median to the entire range.
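The percentile-window rule described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the function names, the linear-interpolation percentile, and the example estimate lists are assumptions introduced here.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one estimate")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def classify_edge(estimates, lower_pct=40.0, upper_pct=60.0):
    """Classify one edge from its bootstrap estimates.

    Returns "+" if the whole percentile window is positive, "-" if it is
    entirely negative, and "0" (null) if the window touches or crosses zero.
    """
    vals = sorted(estimates)
    window_low = percentile(vals, lower_pct)
    window_high = percentile(vals, upper_pct)
    if window_low > 0:
        return "+"
    if window_high < 0:
        return "-"
    return "0"


# Made-up estimates for a null edge whose median happens to be slightly positive.
null_like = [-0.2, -0.1, 0.05, 0.1, 0.3]
print(classify_edge(null_like, 50, 50))  # median only: forced to a sign
print(classify_edge(null_like, 40, 60))  # modest window: window crosses zero
```

With the median-only window, the null-like edge is assigned a sign, while the 40th to 60th percentile window spans zero and returns null, which mirrors the trade-off described for Fig. 6a.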

Fig. 6: Application to 16 non-linear gene regulatory networks: including noise.
figure 6

a Classification scheme for a distribution of parameter estimates. Going from left to right panels, the same parameter distribution with an actual (ground truth) value of positive (+), negative (−), or null (0), respectively, is estimated using different percentile windows centered on the median. The percentile “window” is the median value for the leftmost panel (rigorous classification), between 40th and 60th percentile in the second panel, and between 10th and 90th percentile in the third panel (conservative classification). Going from rigorous to conservative (left to right), an intermediate between the two gives a good classification performance. b ROC curves across all parameters for all 16 FFL models. Different color lines are different noise levels. c Fraction of correctly classified model parameters for different noise levels broken down by FFL model type. d Fraction of each model parameter correctly classified for different noise levels broken down by parameter type.

Next, we applied this classification strategy to the 16 FFL model estimates from data with different noise levels. We varied the percentile window from the median only (50) to the entire range of estimated values (100) and calculated the true and false positive rates for all edges across all 16 FFL models, which allowed generation of receiver operating characteristic (ROC) curves (Fig. 6b). For each noise level, we chose the percentile window that yielded a 5% false positive rate (13-87 percentile for 10:1 Signal:Noise, 19-81 percentile for 5:1 Signal:Noise, and 21-79 percentile for 2:1 Signal:Noise). Using this simple cutoff classifier, we observed good classification performance across all noise levels according to traditional area under the ROC curve metrics (10:1 AUC = 0.99, 5:1 AUC = 0.9, 2:1 AUC = 0.92).

How does classification accuracy break down by FFL model and edge type? To evaluate the performance for each of the 16 FFL cases, we calculated the fraction of the 12 links in each FFL model that was classified correctly as a function of signal-to-noise, given the percentile windows determined above (Fig. 6c). We also looked at the fraction of the 16 models in which each of the 12 links was correctly classified (Fig. 6d). Perfect classification is a value of one, which is the case for no noise, and for many cases with 10:1 signal-to-noise.

In general, as noise level increases, prediction accuracy decreases, as expected. Although for some models and parameters, performance at 2:1 signal-to-noise is poor, in some cases it is surprisingly good. This suggests that the proposed method can yield information even in high noise cases; this information is particularly impactful for null, self-regulatory, and stimulus edges. High noise has strong effects on inference of edges that are either distinct across models, time variant or reliant on other node activities (F21, F31, F32) (Fig. 6c, d, Supplementary Fig. S7). F21, which is reliant on activity of x1, is inferred better than F31 and F32. This may be caused by the fact that x3 dynamics depend on both x1 and x2, whereas x2 dynamics only depend on x1.

Comparing across models, we find that Models 1-8 are reconstructed slightly better than Models 9-16 (Fig. 6c) when noise is high. This performance gap is predominantly caused by S3,b misclassification—basal production of Node 3 (Supplementary Fig. S7). What is the reason for the possible misclassification of S3,b in Models 9-16? We know that S3,b depends on the initial values of x1, x2 and x3 and the estimated values of F31, F32 and F33 (See Methods, Eq. (19)). For Models 1-8, x1(t = 0) and x2(t = 0) are both zero and therefore S3,b is effectively only dependent on estimated value of F33 and x3(t = 0) (Supplementary Fig. S6 and Methods). But for Models 9-16, x2(t = 0) is non-zero and S3,b is dependent on the estimated values of both F32 and F33, in addition to x2(t = 0) and x3(t = 0), which increases the variability of S3,b estimates. Therefore with high levels of noise, S3,b is more likely to be mis-classified in Models 9-16, whereas this does not happen in Models 1-8 (Fig. 6c, d, Supplementary Fig. S7). In the future, including stimulus and basal production parameters in the least squares estimations themselves, rather than further deriving algebraic relations to estimate them, will likely help improve reliability.

We conclude that (i) when dealing with non-linear gene regulatory networks, complete perturbations such as genetic knockouts may fundamentally impede one’s ability to deduce network architecture and (ii) this class of non-linear networks can be reconstructed with reasonable performance using the proposed strategy employing partial perturbations.

Discussion

Despite intensive research focus on network reconstruction, there is still room to improve discrimination between direct and indirect edges (towards causality), particularly when biologically-ubiquitous feedback and feedforward cycles are present that stymie many statistical or correlation-based methods, and given that experimental noise is inevitable. The presented DL-MRA method prescribes a realistic experimental design for inference of signed, directed edges when typical levels of noise are present. It allows estimation of self-regulation edges as well as those for basal production and external stimuli. For 2 and 3 node networks, the method can successfully handle random linear networks, cell state transition networks, and gene regulatory networks, and, under certain limiting conditions, signaling networks. Prediction accuracy improved with more timepoints, which in our case accounted for more relevant dynamic data. However, we would like to stress that here we did not explore time point placement, which likely underlies the performance increase rather than simply number of timepoints. Prediction accuracy was strong in many cases even with simulated noise that exceeds typical experimental variability (2:1 signal-to-noise). The method presented here is quite general and could be applied not only to cell and molecular biology, but also vastly different fields where perturbation time course experiments are possible, and where network structures are important to determine.

One type of non-linear model that we did not investigate is one with sustained oscillations, such as those found in the cell cycle66, or sometimes even MAPK signaling pathways67,68,69. We found that in our application to general two and three node linear models, DL-MRA could reconstruct multiple networks that have damped oscillatory behavior (Fig. 1b). However, we expect time point measurement selection and frequency to be much more important for inferring networks that give rise to sustained oscillations, with time points comprehensively covering peaks and troughs, and the frequency high enough to do so. We do expect that the method could infer the structure of such networks given appropriate sampling, but this requires a much deeper investigation.

MRA and its subsequent methods allow for inference of direct edges by prescribing systematic perturbation of each node37,39,41,43,45 and the idea of directionality has been followed through in DL-MRA. Often, such edge directness is equated to causality, but this is not necessarily the case, especially when the entire system is not explicitly represented. In practice, the causality and strength of an edge may be dependent on how well the model represents the underlying phenomenon and might be affected by simplification of larger networks, non-linearities in the actual model and even by noise in the data. Secondly, in discussions about causal system inferences, consideration of the counterfactuals is important30,31,50,51. For a network of nodes going through dynamics, the counterfactuals to intrinsic network edges causing the dynamics would be the environmental factors extrinsic to the network edges. In DL-MRA, by evaluating external stimuli and basal production as well as the network edges, we have mapped some counterfactuals to node dynamics, thus presenting a more complete map of the causal factors to the network dynamics compared to methods which only show network edges. This also allows for a concise mapping of the environmental contexts in which the network edges are reconstructed.

Application of DL-MRA could reconstruct cell state transition networks based on discrete time Markov transition models, with the added benefit of not being constrained to specific time intervals. It can also successfully handle noisy data. The additional constraints in DL-MRA in the context of cell state transitions (summations of transition rates; see Methods) imply that the underlying network may be estimated with fewer data than in other cases. This method can be a useful tool to model cell state transitions and predict cell state. Perturbations were modeled as a difference in initial states, and that worked well in this case, suggesting that such modeling of perturbations may work in other cell state transition or biological networks.

Although application of DL-MRA to an intracellular signaling network (Huang–Ferrell MAPK) was able to explain its ground truth, including feedback due to sequestration, the method was constrained to specific, difficult-to-implement perturbations and a low stimulus, which may not always be feasible experimentally. Specific inhibitors could be a source of perturbation, but even they influence more kinetic parameters than was required here for a clean solution. In MRA, a larger reaction scheme is often simplified into modules, with one species in each module representing the activity of that module. But often, the activity of the other species in the module is implicit and becomes significant in dictating how perturbations and stimulus affect the network dynamics. Moreover, the type of perturbation chosen, such as specific inhibitors versus knock-down, may also yield different network inference results. Therefore, the use of MRA methods on simplified large intracellular signaling networks, especially when dealing with experiments, has significant caveats that should be carefully considered41,70.

Although complete inhibition is often used for perturbation studies of gene regulatory networks (e.g. CRISPR-mediated gene knockout), we found that partial inhibition is important to fully reconstruct the considered non-linear gene regulatory networks. It is important here, however, to distinguish small perturbations from partial perturbations. Small perturbations are formally recommended for both MRA and other techniques70 where the effects of noise are not extensively explored. In practice, however, there is a tradeoff between perturbation strength and feasibility, since the effects of small perturbations are masked by noise41. Partial perturbations as considered in this work (~50%) are much larger than what are typically considered small perturbations. The theoretical formulation of DL-MRA reduces the impact of not having small perturbations, because perturbation data from a particular node are not used for inference of edges connected to that node. Yet, DL-MRA still uses linearizations of the Jacobian, which are always subject to greater inaccuracy the further such perturbations take the system from the reference point. Since many biological networks share the same types of non-linear features contained within the considered FFL models, this is not likely to be the only case when partial inhibition will be important. We are thus inclined to speculate that large partial perturbations may be a generally important experimental design criterion moving forward. Partial inhibition is often “built-in” to certain assay types, such as si/shRNA or pharmacological inhibition, which are titratable to a certain extent.

One major remaining challenge is scaling to larger networks. Here, we limited our analysis to 2 and 3 node networks. Conveniently, the number of perturbation time courses needed grows linearly (as opposed to exponentially) with the number of considered nodes. Furthermore, as long as system-wide or omics-scale assays are available, the experimental workload also grows linearly. This is routine for transcriptome analyses71, and is becoming even more commonplace for proteomic assays (e.g. mass cytometry72, cyclic immunofluorescence, mass spectrometry73, RPPA74). Thus, the method is arguably experimentally scalable to larger networks.

However, the computational scaling past 2 and 3 node models remains to be determined and is likely to require different approaches for parameter estimation. Increasing the network size will quadratically increase the number of unknown parameters, which will significantly increase the computational requirements for obtaining robust solutions. Yet, recent work has shown that large estimation problems in ODE models may be broken into several smaller problems75, which may be applicable here, and is likely to yield large computational speed up by allowing parallelization of much smaller tasks. However, theory on how to merge potentially discrepant results between independently estimated overlapping subnetworks would need to be derived. Importantly, we saw in the linear 2 and 3 node model examples that the impact of experimental noise was larger for 3 node models, implying that increasing the number of nodes past 3 will further increase the impact of experimental noise. Another synergistic avenue could be imposing prior knowledge to improve initial parameter guesses and even reduce the parametric space, such as in Bayesian Modular Response Analysis45, or with functional database information76. Such prior knowledge could also help inform emergent network properties as network size grows, such as degree distributions for scale-free networks2. Here, we only investigated dense subnetworks, so sparseness patterns and judicious allocation of non-zero Jacobian elements could also have great impact on estimation for large networks. Overall, application to larger networks is of great interest but these non-trivial computational roadblocks must be solved first.

In conclusion, the proposed approach to network reconstruction is systematic and feasible, robustly operating in the presence of experimental noise and accepting data from large perturbations. It addresses important features of biological networks that current methods struggle to account for: causality/directionality/sign, cycles (including self-regulation), dynamic behavior and environmental stimuli. It does so while leveraging dynamic data of the network and only requires one perturbation per node for completeness. We expect this approach to be broadly useful not only for reconstruction of biological networks, but to enable using such networks to build more predictive models of disease and response to treatment, and more broadly, to other fields where such networks are important for system behavior.

Methods

Deriving sufficiency conditions for unique estimation of Jacobian elements

The first-order partial derivatives comprising J (Eq. (2)) can be approximated by a first-order Taylor series expansion of Eq. (1) about a time point k

$$f_1(k + 1) \approx f_1(k) + \frac{\partial }{{\partial x_1}}\left( {f_1(k)} \right) \cdot \left( {x_1(k + 1) - x_1(k)} \right)\, + \frac{\partial }{{\partial x_2}}\left( {f_1(k)} \right) \cdot \left( {x_2(k + 1) - x_2(k)} \right)$$
(5)
$$f_2(k + 1) \approx f_2(k) + \frac{\partial }{{\partial x_1}}\left( {f_2(k)} \right) \cdot \left( {x_1(k + 1) - x_1(k)} \right)\, + \frac{\partial }{{\partial x_2}}\left( {f_2(k)} \right) \cdot \left( {x_2(k + 1) - x_2(k)} \right)$$
(6)

Equations (5) and (6) may be written more succinctly as

$$\begin{array}{l}y_1(k + 1) \approx F_{11}(k) \cdot \Delta _tx_1(k + 1) + F_{12}(k) \cdot \Delta _tx_2(k + 1)\\ y_2(k + 1) \approx F_{21}(k) \cdot \Delta _tx_1(k + 1) + F_{22}(k) \cdot \Delta _tx_2(k + 1)\end{array}$$
(7)

where

$$y_i(k + 1) \equiv f_i(k + 1) - f_i(k);\Delta _tx_i(k + 1) \equiv x_i(k + 1) - x_i(k).$$
(8)

The approximation in Eq. (7) becomes more accurate as more time points are measured. Also, the edge weights are potentially time-dependent, although this is rarely considered when describing biological networks.

How do we estimate the edge weights F in Eq. (7) and thus reconstruct the network? Time series data can inform xi’s and fi’s as a function of time, following application of a stimulus. Given such stimulus-response data, however, for each time point there are only two equations for four unknowns, an underdetermined system for which more data are needed.

Consider now stimulus-response time course data in the presence of single perturbations. Let pi be a variable that reflects the strength and/or presence of different potential perturbations: p1 represents perturbation of x1 and p2 represents perturbation of x2. If pj is not explicitly written, its value is zero and/or it has no effect. Now, the ODEs become a function of the perturbation variables

$$f_{i,j}(k) \equiv f_i(k,p_j) = f_i(x_1(k),x_2(k),p_j)$$
(9)

The 1st order Taylor series expansions for cases with perturbations become

$$y_{1,1}(k) \approx F_{11}(k) \cdot \Delta _{p,1}x_1(k) + F_{12}(k) \cdot \Delta _{p,1}x_2(k) + \frac{\partial }{{\partial p_1}}\left( {f_1(k)} \right) \cdot p_1$$
(10)
$$y_{1,2}(k) \approx F_{11}(k) \cdot \Delta _{p,2}x_1(k) + F_{12}(k) \cdot \Delta _{p,2}x_2(k) + \frac{\partial }{{\partial p_2}}\left( {f_1(k)} \right) \cdot p_2$$
(11)
$$y_{2,1}(k) \approx F_{21}(k) \cdot \Delta _{p,1}x_1(k) + F_{22}(k) \cdot \Delta _{p,1}x_2(k) + \frac{\partial }{{\partial p_1}}\left( {f_2(k)} \right) \cdot p_1$$
(12)
$$y_{2,2}(k) \approx F_{21}(k) \cdot \Delta _{p,2}x_1(k) + F_{22}(k) \cdot \Delta _{p,2}x_2(k) + \frac{\partial }{{\partial p_2}}\left( {f_2(k)} \right) \cdot p_2$$
(13)

where

$$y_{i,j}(k) \equiv f_{i,j}(k) - f_i(k);\Delta _{p,j}x_i(k) \equiv x_i(k,p_j) - x_i(k)$$
(14)

Here, we have expanded with respect to the perturbation, rather than with respect to time as previously. However, since the reference point is the same, the Jacobian elements remain identical in these equations. It is also interesting to note that the Jacobian elements, or network, may be affected by the perturbation, but we do not necessarily have to know those effects mathematically, since the reference point is the same. Now we have six potential equations with which to estimate the four Jacobian elements. However, we must make some determination as to how the perturbations p1 and p2 directly affect Node 1 and Node 2 dynamics f1 and f2 to account for the perturbation variable partial derivatives.

By design, the Node 1 perturbation has significant direct effects on Node 1 dynamics, and similarly for the Node 2 perturbation on Node 2 dynamics. Using equations that include \(\partial f_1/\partial p_1\) and \(\partial f_2/\partial p_2\) requires precise definition of the perturbation strengths and their effects on dynamics, which could be difficult to determine experimentally and to implement in simulations. Therefore, we do not employ equations involving such terms. On the other hand, if the Node 1 perturbation has negligible direct effect on Node 2 dynamics, that is, its effects on Node 2 dynamics occur through the network (i.e. p1 is not explicit in f2), and similarly the Node 2 perturbation has negligible direct effect on Node 1 dynamics, then \(\partial f_2/\partial p_1\) and \(\partial f_1/\partial p_2\) are approximately zero. This mild condition is often the case experimentally. The only determining factors for the suitability of the Taylor series truncation are the spacing of time points and the accuracy of the expansion about the perturbation difference. From this, the main set of linear equations presented in Eqs. (3) and (4) is obtained.
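To make the counting argument concrete, the following sketch solves for one row of the Jacobian (F11, F12) at a single time point by pairing the time-difference equation for Node 1 with the Node 2 perturbation equation, in which the \(\partial f_1/\partial p_2\) term is taken as zero. It is an illustration only: the function name, the Cramer's-rule solve, and the example numbers are assumptions, the rate values would in practice have to be estimated from data, and the study itself fits parameters by least squares across time points rather than solving each time point exactly.

```python
def solve_node_row(dt_x, y_t, dp_x, y_p, tol=1e-12):
    """Solve two linear equations for (F_i1, F_i2) by Cramer's rule.

    dt_x : (delta_t x1, delta_t x2), the change between consecutive time points
    y_t  : f_i(k+1) - f_i(k), the matching change in the node i rate
    dp_x : (delta_p x1, delta_p x2), the change caused by perturbing node j != i
    y_p  : f_i(k, p_j) - f_i(k), the matching change in the node i rate
    """
    a, b = dt_x
    c, d = dp_x
    det = a * d - b * c
    if abs(det) < tol:
        raise ValueError("the two equations are not independent at this time point")
    return (y_t * d - b * y_p) / det, (a * y_p - c * y_t) / det


# Hypothetical linear Node 1 rate law, used only to generate consistent numbers.
def f1(x1, x2, S1=1.0, F11=-1.0, F12=0.5):
    return S1 + F11 * x1 + F12 * x2


x_k = (1.0, 2.0)        # unperturbed state at time point k (made-up values)
x_k1 = (1.5, 2.25)      # unperturbed state at time point k + 1
x_k_p2 = (0.75, 1.0)    # state at time point k when Node 2 is perturbed

dt_x = (x_k1[0] - x_k[0], x_k1[1] - x_k[1])
y_t = f1(*x_k1) - f1(*x_k)
dp_x = (x_k_p2[0] - x_k[0], x_k_p2[1] - x_k[1])
y_p = f1(*x_k_p2) - f1(*x_k)

F11_est, F12_est = solve_node_row(dt_x, y_t, dp_x, y_p)
print(F11_est, F12_est)  # recovers F11 = -1 and F12 = 0.5 for this linear example
```

Because the example rate law is linear and the Node 2 perturbation does not enter f1 directly, both equations hold exactly and the edge weights are recovered; for non-linear systems the same equations hold only approximately.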

General estimation model equations

We employ the following general model for a two-node network:

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = f_1(x_1,x_2) = S_1 + F_{11}x_1 + F_{12}x_2\\ \frac{{dx_2}}{{dt}} = f_2(x_1,x_2) = S_2 + F_{21}x_1 + F_{22}x_2\end{array}$$
(15)

Here, S1 and S2 are the stimulus strengths on Node 1 and Node 2, respectively, and F11, F12, F21 and F22 are the network edge weights (Fig. 1a). In many systems, there may be a basal or constitutive production driving the node activities in addition to an external stimulus. For these cases, the stimulus term (Si) may be considered as the sum of two effects: the basal production term (Si,b) and the external stimulus (Si,ex). The two-node model can then be represented by the following equations:

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = S_{1,b} + S_{1,ex} + F_{11}x_1 + F_{12}x_2\\ \frac{{dx_2}}{{dt}} = S_{2,b} + S_{2,ex} + F_{21}x_1 + F_{22}x_2\end{array}$$
(16)

Or more generally,

$$\frac{{dx_i}}{{dt}} = S_{i,b} + S_{i,ex} + \mathop {\sum}\limits_{j = 1}^n {F_{ij}x_j} ,$$
(17)

where n is the total number of nodes.

When a steady state exists, the dxi/dt terms become zero and it becomes easy to represent the stimulus terms as a function of the node activities (xi) and network edges (Fij).

$$S_{i,b} + S_{i,ex} = - \left( {\mathop {\sum}\limits_{j = 1}^n {F_{ij}x_{j,ss}} } \right)$$
(18)

This shows that the perturbation time course data generally constrain not only the edge weights, but also the stimulus terms. For a system at a steady state without an external stimulus, for example at t = 0:

$$S_{i,b} = - \left( {\mathop {\sum}\limits_{j = 1}^n {F_{ij}x_{j,ss}} } \right)$$
(19)

The two-node single activator model

The two-node single activator model (Fig. 1a, Supplementary Fig. S1a) is described by

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = f_1(x_1,x_2) = 1 - x_1\\ \frac{{dx_2}}{{dt}} = f_2(x_1,x_2) = 1 + 1.5x_1 - 0.8x_2\end{array}$$
(20)

Here, S1,ex = 1, F11 = −1, F12 = 0, S2,ex = 1, F21 = 1.5, F22 = −0.8. The basal production terms are both zero, for simplicity, and the initial conditions for x1(t = 0) and x2(t = 0) are zero. The stimulus terms Si,ex are calculated through Eq. (18), using the median values of Fij and the xi(t = 10), when the system reaches near steady state.
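As a quick consistency check of the steady-state relation, the sketch below recovers the stimulus terms of the two-node single activator model from its edge weights and steady-state activities, summing each edge weight times the steady-state activity of its source node as implied by Eq. (17). The function name is hypothetical, and the steady state is computed analytically from Eq. (20) here, whereas the text above uses the simulated values at t = 10.

```python
def stimulus_from_steady_state(F, x_ss):
    """Return S_i = -sum_j F[i][j] * x_ss[j] for every node i."""
    return [-sum(F_ij * x_j for F_ij, x_j in zip(row, x_ss)) for row in F]


# Edge weights of the two-node single activator model (Eq. (20)).
F = [[-1.0, 0.0],
     [1.5, -0.8]]

# Analytical steady state of Eq. (20): x1 = 1 and x2 = (1 + 1.5 * x1) / 0.8.
x_ss = [1.0, (1.0 + 1.5 * 1.0) / 0.8]

S = stimulus_from_steady_state(F, x_ss)
print(S)  # both entries are approximately 1, matching S1,ex = S2,ex = 1
```

Since basal production is zero in this model, the recovered values correspond to the external stimulus terms alone.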

Random two-node and three-node models

The random 2 node network is described by

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = f_1(x_1,x_2) = S_{1,b} + S_{1,ex} + F_{11}x_1 + F_{12}x_2\\ \frac{{dx_2}}{{dt}} = f_2(x_1,x_2) = S_{2,b} + S_{2,ex} + F_{21}x_1 + F_{22}x_2\end{array}$$
(21)

Values for S1,b, S2,b, S1,ex and S2,ex are sampled from a uniform distribution over the range [0,2] and values for F11, F12, F21, and F22 are sampled from a uniform distribution over the range [−2,2] using the MATLAB function rand. To capture basal activity, we use a two-step approach. First, starting from node activity values of zero and without the external stimulus on Node 1 and Node 2 (S1,ex = S2,ex = 0 in Eq. (21)), we simulate until the network reaches steady state with just basal production driving the network behavior. Then, we introduce the external stimulus on Node 1 and Node 2, integrate the ODEs using ode15s in MATLAB with default settings, and sample evenly spaced time points. We sample 3, 7, 11, and 21 evenly spaced time points across a time course from 0 to 10 arbitrary time units in all cases.

The random 3 node networks use the same sampling rules as the 2 node networks with the following equations.

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = f_1(x_1,x_2,x_3) = S_{1,b} + S_{1,ex} + F_{11}x_1 + F_{12}x_2 + F_{13}x_3\\ \frac{{dx_2}}{{dt}} = f_2(x_1,x_2,x_3) = S_{2,b} + S_{2,ex} + F_{21}x_1 + F_{22}x_2 + F_{23}x_3\\ \frac{{dx_3}}{{dt}} = f_3(x_1,x_2,x_3) = S_{3,b} + S_{3,ex} + F_{31}x_1 + F_{32}x_2 + F_{33}x_3\end{array}$$
(22)

Intracellular signaling networks

In the simplification of the Huang–Ferrell network to three nodes, p-MAPKKK, pp-MAPKK and pp-MAPK were taken as nodes. Since the basal values of the nodes are zero in the absence of external stimuli, the basal production was set to zero beforehand and not considered in the estimation of the rest of the network. Aside from the basal production edges, a full 3 node network (Fig. 4b) was estimated from the simulation data of each of the observables. After estimation, parameters with values less than 1/100th of the largest parameter were considered negligible.

Cell state transition models

The cell transition model48 is a discrete time Markov probability model. Here, we show how this form is related to the ODE model used in DL-MRA. Starting at any initial value, each next step, representing a time difference of one day, follows from the previous time point as follows:

$$\begin{array}{l}x_{1,t + 1} = M_{11}x_{1,t} + M_{12}x_{2,t} + M_{13}x_{3,t}\\ x_{2,t + 1} = M_{21}x_{1,t} + M_{22}x_{2,t} + M_{23}x_{3,t}\\ x_{3,t + 1} = M_{31}x_{1,t} + M_{32}x_{2,t} + M_{33}x_{3,t}\end{array}$$
(23)

where Mij denotes the Markov transition probability of species j into species i. In matrix form, this may be written as:

$$\left[ \begin{array}{c}x_{1,t + 1}\\ x_{2,t + 1}\\ x_{3,t + 1}\end{array} \right] = \left[ \begin{array}{ccc}M_{11} & M_{12} & M_{13}\\ M_{21} & M_{22} & M_{23}\\ M_{31} & M_{32} & M_{33}\end{array} \right]\left[ \begin{array}{c}x_{1,t}\\ x_{2,t}\\ x_{3,t}\end{array} \right]$$
(24)

Representing the Markov parameter matrix as M and the species relative concentration variables as vector X, the equation becomes

$$X_{t + 1} = MX_t$$
(25)

The Markov transition probabilities for a species must add up to 1. In experimental terms, a species can either transition to other species or stay the same and the sum of all those probabilities is 1.

$$\mathop {\sum}\limits_{i = 1}^3 {M_{ij}} = 1$$
(26)

As a first step in relating these equations to the ODE form underlying DL-MRA, we put the variables in terms of ∆x (with respect to time),

$$X_{t + 1} - X_t = MX_t - X_t$$
(27)
$$\Delta X_{t + 1} = (M - I)X_t$$
(28)
$$\Delta X_{t + 1} = M^\prime X_t$$
(29)

where M’ is M − I and I is the identity matrix; that is, M’ is M with 1 subtracted from each of its diagonal elements. Hence Eq. (26) for M’ becomes

$$\mathop {\sum}\limits_{i = 1}^3 {M^\prime _{ij}} = 0$$
(30)

This also implies that the diagonal term of M’ is the negative of the sum of the other two terms in the same column. In experimental terms, the reduction in a species is equal to the amount converted to other species.
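A small numerical check of Eqs. (26) to (30) may help here. The transition matrix below is a hypothetical example whose columns sum to 1; the helper names are also assumptions made for this sketch.

```python
def subtract_identity(M):
    """Return M' = M - I."""
    n = len(M)
    return [[M[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def column_sums(A):
    """Sum each column of a square matrix stored as a list of rows."""
    return [sum(row[j] for row in A) for j in range(len(A[0]))]


def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Hypothetical Markov matrix: entry M[i][j] is the probability that
# species j becomes species i in one step, so each column sums to 1.
M = [[0.90, 0.10, 0.00],
     [0.05, 0.80, 0.20],
     [0.05, 0.10, 0.80]]
M_prime = subtract_identity(M)

x_t = [0.5, 0.3, 0.2]
x_next = mat_vec(M, x_t)         # Eq. (25)
delta_x = mat_vec(M_prime, x_t)  # Eq. (29)
```

In this example the columns of M_prime sum to zero, delta_x equals x_next minus x_t, and the total of the species fractions is conserved from one step to the next.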

The above equations apply for the cases where ∆t is 1. We can incorporate arbitrary time steps as

$$\Delta X_{t + \Delta t} = M^\prime _{\Delta t}X_t\Delta t$$
(31)

where ∆t is the scalar value of the time difference and M’∆t is the matrix of parameters specific to the chosen time difference. In the limit where ∆t tends to 0, the equation becomes

$$\mathop {{\lim }}\limits_{\Delta t \to 0} (\Delta X_{t + \Delta t}/\Delta t) = M_{dt}^\prime X_t$$
(32)
$$dX/dt = M_{dt}^\prime X_t$$
(33)

where M’dt is the matrix of parameters specific to the case where ∆t is infinitesimally small. Note that Eq. (33) is similar in form to Eq. (22), only without the stimulus terms, and that M’dt is equivalent to the Jacobian matrix F with terms Fij. There is an added constraint that the terms in each column sum to zero or, equivalently, that each diagonal term is the negative of the sum of the other two terms in the same column.

$$\frac{{dX}}{{dt}} = FX_t$$
(34)
$$F_{ii} = - \mathop {\sum}\limits_{j = 1,j \ne i}^3 {F_{ji}}$$
(35)

Non-linear models

The non-linear feedforward loop models32 are described by:

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = f_1(x_1,x_2,x_3) = 1 - x_1\\ \frac{{dx_2}}{{dt}} = f_2(x_1,x_2,x_3) = f(x_1,K_{x_1x_2}) - x_2\\ \frac{{dx_3}}{{dt}} = f_3(x_1,x_2,x_3) = G(x_1,K_{x_1x_3},x_2,K_{x_2x_3}) - x_3\end{array}$$
(36)

When an AND gate is present

$$G(x_1,K_{x_1x_3},x_2,K_{x_2x_3}) = f(x_1,K_{x_1x_3}) \cdot f(x_2,K_{x_2x_3})$$
(37)

When an OR gate is present

$$G(x_1,K_{x_1x_3},x_2,K_{x_2x_3}) = fc(x_1,K_{x_1x_3},K_{x_2x_3},x_2) + fc(x_2,K_{x_2x_3},K_{x_1x_3},x_1)$$
(38)

For a given u, v ∈ {x1, x2, x3} and K, Ku, Kv ∈ {\(K_{x_1x_2}\), \(K_{x_1x_3}\), \(K_{x_2x_3}\)}:

If u activates its target, then:

$$f(u,K) = \frac{{\left( {\frac{u}{K}} \right)^2}}{{1 + \left( {\frac{u}{K}} \right)^2}};\,fc(u,K_u,K_v,v) = \frac{{\left( {\frac{u}{{K_u}}} \right)^2}}{{1 + \left( {\frac{u}{{K_u}}} \right)^2 + \left( {\frac{v}{{K_v}}} \right)^2}}$$
(39)

If u represses its target, then:

$$f(u,K) = \frac{1}{{1 + \left( {\frac{u}{K}} \right)^2}};\,fc(u,K_u,K_v,v) = \frac{1}{{1 + \left( {\frac{u}{{K_u}}} \right)^2 + \left( {\frac{v}{{K_v}}} \right)^2}}$$
(40)
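Eqs. (36) to (40) translate directly into code. The sketch below is illustrative only; the function names, the boolean flags used to select activation or repression, and the example values are assumptions and do not come from the Supplementary Code.

```python
def f(u, K, activates=True):
    """Single-input regulation function, Eq. (39) or Eq. (40)."""
    r = (u / K) ** 2
    return r / (1.0 + r) if activates else 1.0 / (1.0 + r)


def fc(u, K_u, K_v, v, activates=True):
    """Competitive regulation function used by the OR gate."""
    r_u = (u / K_u) ** 2
    r_v = (v / K_v) ** 2
    numerator = r_u if activates else 1.0
    return numerator / (1.0 + r_u + r_v)


def gate(x1, K13, x2, K23, kind, act13=True, act23=True):
    """Combine the two inputs to Node 3 with an AND or OR gate."""
    if kind == "AND":   # Eq. (37)
        return f(x1, K13, act13) * f(x2, K23, act23)
    if kind == "OR":    # Eq. (38)
        return fc(x1, K13, K23, x2, act13) + fc(x2, K23, K13, x1, act23)
    raise ValueError("kind must be 'AND' or 'OR'")


def ffl_rhs(x, K12, K13, K23, kind, act12=True, act13=True, act23=True):
    """Right-hand side of the feedforward loop model, Eq. (36)."""
    x1, x2, x3 = x
    return [1.0 - x1,
            f(x1, K12, act12) - x2,
            gate(x1, K13, x2, K23, kind, act13, act23) - x3]


# Example: all-activating AND-gate loop with hypothetical K values of 1.
derivatives = ffl_rhs([1.0, 0.5, 0.25], 1.0, 1.0, 1.0, "AND")
```

Switching the three flags between activation and repression and the gate between AND and OR enumerates the different loop variants from a single right-hand-side function.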

Effectively, an external stimulus S1,ex = 1 acts on Node 1 at t = 0 and is propagated through the network. No external stimulus acts on Node 2 or Node 3. However, in many cases there is basal production in Node 2, Node 3, or both, which leads to a non-zero steady-state of the network before the external stimulus is introduced.

To capture basal activity, we use a two-step approach. First, starting from node activity values of zero and without the external stimulus on Node 1 (S1,ex = 0), we simulate until the network reaches steady-state. Then, using the steady-state node values without the external stimulus as the initial conditions, we introduce the external stimulus on Node 1, integrate the ODEs using ode15s in MATLAB with default settings, and sample 11 evenly spaced time points. We chose 11 time points because this yields good classification accuracy for the above random 3 node model even in the presence of noisy data. For each of the 16 non-linear models, the values of the parameters (K, Ku, Kv) were varied and chosen so that the resulting node activity data are responsive to the stimulus and perturbations (Supplementary Fig. S6; see Supplementary Code for values).

Modeling perturbations

Precisely modeling perturbations can be a challenge, since experimentally there may be several ways of causing a perturbation with different mechanisms, such as siRNAs or competitive, non-competitive, or uncompetitive inhibition. It may be hard to quantify how much a perturbation affects a node in terms of its dynamics (i.e. the right-hand sides of the ODEs). Therefore, we employ the following approaches, which circumvent the need to model how each perturbation mechanistically manifests in the ODEs during parameter estimation. There are two cases to consider: (i) when we have a perturbation of node i and we need to simulate node i dynamics; (ii) when we have a perturbation of node i and we need to simulate the dynamics of another node j. To illustrate the approach, we use the above-described 2 node model with an example of a Node 1 perturbation. Recall that

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = S_{1,b} + S_{1,ex} + F_{11}x_1 + F_{12}x_2\\ \frac{{dx_2}}{{dt}} = S_{2,b} + S_{2,ex} + F_{21}x_1 + F_{22}x_2\end{array}$$
(41)

For case (i), we have to obtain values for x1 under perturbation of Node 1. We refer to the perturbed time-course as x1,1. In experimental situations, x1,1 would be measured directly. To obtain simulation data for x1,1 we use the following:

$$x_{1,1}(k) = p_1 \times x_1(k),$$
(42)

where x1 is obtained from the simulations without perturbations, p1 is the fraction of Node 1 activity that remains under the perturbation, and recall that k refers to time point k. For a 50% inhibition, p1 = 0.5, and for a complete inhibition, p1 = 0.

For case (ii), we have to obtain the values for x2 under perturbation of Node 1, which we refer to as x2,1. To do this, we have to integrate the ODE for dx2/dt, but using x1,1 values, as follows

$$\frac{{dx_{2,1}}}{{dt}} = S_{2,b} + S_{2,ex} + F_{21}x_{1,1} + F_{22}x_{2,1}$$
(43)

Here, x2 has been replaced with x2,1 to represent x2 under perturbation of Node 1, for clarity. To solve this equation, we simply use the “measured” x1,1 time course directly in the ODE.

When data are generated by simulations, there is little practical limit to temporal resolution. With real data, however, solving Eq. (43) may require values for x1,1 at time points where measurements are not available, depending on the solver being used. We therefore fit the x1,1 data to a polynomial using polyfit in MATLAB, and use the polynomial to interpolate at any required time point. In this work, we used a polynomial of order 5, which fits the data while avoiding overfitting, but the functional form is quite malleable so long as it captures the data trends.
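The two perturbation cases can be sketched as follows for a Node 1 perturbation. This is an illustration under stated assumptions, not the implementation used here: piecewise-linear interpolation stands in for the polyfit step, a fixed-step Runge-Kutta scheme stands in for ode15s, and all function names and example values are hypothetical.

```python
import bisect


def perturb_node(x, p):
    """Case (i), Eq. (42): scale the unperturbed time course by p."""
    return [p * value for value in x]


def make_interpolator(times, values):
    """Piecewise-linear interpolant of sampled data (an assumption; the
    text fits a fifth-order polynomial with polyfit instead)."""
    def interpolate(t):
        if t <= times[0]:
            return values[0]
        if t >= times[-1]:
            return values[-1]
        j = bisect.bisect_right(times, t)
        w = (t - times[j - 1]) / (times[j] - times[j - 1])
        return (1.0 - w) * values[j - 1] + w * values[j]
    return interpolate


def simulate_x21(x11_of_t, S2, F21, F22, x21_start, t_end, dt=0.01):
    """Case (ii), Eq. (43): integrate dx21/dt with the 'measured' x11
    supplied as a function of time. S2 stands for S2,b + S2,ex."""
    def rhs(t, x21):
        return S2 + F21 * x11_of_t(t) + F22 * x21

    x21 = x21_start
    trajectory = [(0.0, x21)]
    for step in range(round(t_end / dt)):
        t = step * dt
        k1 = rhs(t, x21)
        k2 = rhs(t + 0.5 * dt, x21 + 0.5 * dt * k1)
        k3 = rhs(t + 0.5 * dt, x21 + 0.5 * dt * k2)
        k4 = rhs(t + dt, x21 + dt * k3)
        x21 += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        trajectory.append((t + dt, x21))
    return trajectory


# Hypothetical data: a constant x1 time course under 50% inhibition.
times = [0.0, 5.0, 10.0]
x11 = perturb_node([4.0, 4.0, 4.0], 0.5)
x21_course = simulate_x21(make_interpolator(times, x11),
                          S2=1.0, F21=0.5, F22=-1.0,
                          x21_start=0.0, t_end=10.0)
```

Because x1,1 enters only as an input signal, the ODE for Node 1 never has to be modified to represent the perturbation mechanism.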

For modeling perturbations of the cell transition model, the initial value of the simulated data for the perturbed node was set to zero during simulation. The estimation was performed in the same way as for a random 3 node network, as described above.

For modeling perturbations of the Huang–Ferrell model, the parameters k3, k15 and k27 were sequentially set to zero. The estimation was performed in the same way as for a random 3 node network, as described above.

Simulated noise

Normally distributed white (zero mean) noise is added to simulated time courses point-wise with

$$y = x + N(0,d \cdot x)$$
(44)

where x is the simulation data point, y is the noisy data point, and d represents the noise level. Signal-to-noise ratios of 10:1, 5:1 and 2:1 correspond to d = 0.1, 0.2, and 0.5, respectively. Normally distributed samples are obtained using randn in MATLAB. While there are many different distributional options for modeling noise, we chose this one for simplicity and to capture the generic effects of noisier data. We do not intend to answer whether specific distributional assumptions about the form of the noise have a significant impact on the method's performance.
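Eq. (44) can be sketched point-wise as below. The sketch assumes that d · x is the standard deviation of the added noise and takes its absolute value so that negative data points are handled; both choices, along with the function name, are assumptions rather than details stated in the text.

```python
import random


def add_noise(x, d, rng=None):
    """Return y = x + N(0, d * x) for every point in a time course.

    Assumption: d * |x| is used as the standard deviation.
    """
    rng = rng or random.Random()
    return [value + rng.gauss(0.0, d * abs(value)) for value in x]


# Example: a 5:1 signal-to-noise ratio (d = 0.2) on hypothetical data.
noisy = add_noise([0.0, 0.5, 1.0, 1.5], d=0.2, rng=random.Random(0))
```

Under this noise model a data point that is exactly zero stays zero, because the noise scales with the signal.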

Parameter estimation

For the two-node model, the entire network, with and without perturbations, can be explained by the following system of equations

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = S_{1,b} + S_{1,ex} + F_{11}x_1 + F_{12}x_2\\ \frac{{dx_2}}{{dt}} = S_{2,b} + S_{2,ex} + F_{21}x_1 + F_{22}x_2\\ \frac{{dx_{2,1}}}{{dt}} = S_{2,b} + S_{2,ex} + F_{21}x_{1,1} + F_{22}x_{2,1}\\ \frac{{dx_{1,2}}}{{dt}} = S_{1,b} + S_{1,ex} + F_{11}x_{1,2} + F_{12}x_{2,2}\end{array}$$
(45)

where x1,1 and x2,2 are the perturbed node values, from either simulated or experimental data. Eight parameters (S1,b, S1,ex, F11, F12, S2,b, S2,ex, F21, F22) need to be estimated to fully reconstruct this network. We seek a set of parameters that minimizes deviation between simulated and measured dynamics.

For an initial guess, the node edge parameters (Fij) are randomly sampled from a uniform distribution over the interval [−2,2] and the stimulus parameters (Si,ex) are sampled from a uniform distribution over the interval [0,2]. Using data at t = 0, which corresponds to a steady-state without Si,ex, the Si,b can be estimated during each iteration of the estimation as:

$$\begin{array}{l}\hat S_{1,b} = - (\hat F_{11}x_1(t = 0) + \hat F_{12}x_2(t = 0))\\ \hat S_{2,b} = - (\hat F_{21}x_1(t = 0) + \hat F_{22}x_2(t = 0))\end{array}$$
(46)

For an n-node model, this equation can be scaled accordingly to obtain each \(\hat S\)i,b.
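For n nodes, Eq. (46) amounts to setting each basal term to the negative of the corresponding row of the estimated F applied to the t = 0 data. A minimal sketch follows, with a hypothetical function name and example values.

```python
def estimate_basal(F_hat, x0):
    """Basal production implied by steady-state at t = 0:
    S_b[i] = -sum_j F_hat[i][j] * x0[j]."""
    return [-sum(F_ij * x_j for F_ij, x_j in zip(row, x0)) for row in F_hat]


# Hypothetical current estimates and t = 0 data, for illustration only.
F_hat = [[-1.0, 0.5], [0.2, -2.0]]
x0 = [1.0, 2.0]
S_b_hat = estimate_basal(F_hat, x0)
```

With these values, substituting S_b_hat back into the right-hand sides with the external stimulus switched off gives zero derivatives at t = 0, which is the steady-state condition the estimate relies on.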

For these initial guesses, we compute the activity data using Eq. (45). The perturbation data xk,k are used in the perturbation equations as detailed above (Eq. (43)). Let \(\hat x\)i and \(\hat x\)i,j denote the predicted node activity values for the non-perturbed and perturbed cases, respectively. For a total of n nodes and Nt time points, the objective function is the sum of squared errors Φ

$$\Phi = \mathop {\sum}\limits_{k = 1}^{N_t} {\left[ {\left( {\mathop {\sum}\limits_{i = 1}^n {\left( {x_i(k) - \hat x_i(k)} \right)^2} } \right) + \mathop {\sum}\limits_{i = 1}^n {\mathop {\sum}\limits_{j \ne i} {\left( {x_{i,j}(k) - \hat x_{i,j}(k)} \right)^2} } } \right]}$$
(47)

Note that we do not use data from node j when perturbation j was applied (per the derivation). The MATLAB function fmincon is used to minimize Φ by changing edge weights and stimulus terms within the range [−10,10].
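The objective in Eq. (47) can be written compactly once a data layout is fixed. In the sketch below, the layout is an assumption: x[i] holds the unperturbed time course of node i, and x_pert[(i, j)] holds the time course of node i under perturbation of node j, with zero-based indices. Entries with i equal to j are skipped, as stated above.

```python
def sum_squared_errors(x, x_hat, x_pert, x_pert_hat):
    """Sum of squared errors over unperturbed and perturbed time courses."""
    n = len(x)
    phi = 0.0
    for i in range(n):
        # Unperturbed term for node i.
        phi += sum((obs - pred) ** 2 for obs, pred in zip(x[i], x_hat[i]))
        for j in range(n):
            if j == i:
                continue  # node j data are not used under perturbation j
            phi += sum((obs - pred) ** 2
                       for obs, pred in zip(x_pert[(i, j)], x_pert_hat[(i, j)]))
    return phi


# Hypothetical two-node data with two time points each.
phi = sum_squared_errors(
    x=[[1.0, 2.0], [3.0, 4.0]],
    x_hat=[[1.0, 1.0], [3.0, 5.0]],
    x_pert={(0, 1): [0.0, 0.0], (1, 0): [1.0, 1.0]},
    x_pert_hat={(0, 1): [0.0, 2.0], (1, 0): [1.0, 1.0]},
)
```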

We employ “multi-start” by running the estimation 10 times, starting from different randomly generated initial starting points77. The estimated parameters and their respective final sum of squared errors (Φ) are saved and the estimated parameter set corresponding to the minimum Φ is chosen as the final parameter set. Variability of parameter estimates across multi-start runs is explored in Supplementary Fig. S5.
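The multi-start procedure can be sketched generically as below. The local optimizer is passed in as a callable, local_fit, which is a hypothetical stand-in for one fmincon run and is expected to return a pair of estimated parameters and final Φ; the toy objective in the example is for illustration only.

```python
import random


def multi_start(local_fit, bounds, n_starts=10, rng=None):
    """Run local_fit from n_starts random initial guesses and keep the
    result with the smallest final sum of squared errors."""
    rng = rng or random.Random()
    runs = []
    for _ in range(n_starts):
        # Draw each initial parameter uniformly within its own interval.
        guess = [rng.uniform(low, high) for low, high in bounds]
        runs.append(local_fit(guess))
    best_params, best_phi = min(runs, key=lambda run: run[1])
    return best_params, best_phi, runs


# Initial-guess intervals for the two-node model: four edge weights in
# [-2, 2] followed by two external stimulus terms in [0, 2].
bounds = [(-2.0, 2.0)] * 4 + [(0.0, 2.0)] * 2


def toy_fit(guess):
    """Placeholder 'optimizer' that returns the guess and a toy error."""
    return guess, sum(value * value for value in guess)


best_params, best_phi, runs = multi_start(toy_fit, bounds, rng=random.Random(0))
```

Returning all runs alongside the best one keeps the information needed to examine how much the estimates vary between starting points.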

Parameter estimation for non-linear models

For estimating the non-linear models, we start with the prior knowledge that S1,b is always zero and that S2,ex and S3,ex are always zero as well, which is directly evident from the x1 initial conditions and from the x2 and x3 stimulus responses in the presence of a complete Node 1 perturbation. The equations for the non-perturbation case become

$$\begin{array}{l}\frac{{dx_1}}{{dt}} = S_{1,ex} + F_{11}x_1 + F_{12}x_2 + F_{13}x_3\\ \frac{{dx_2}}{{dt}} = S_{2,b} + F_{21}x_1 + F_{22}x_2 + F_{23}x_3\\ \frac{{dx_3}}{{dt}} = S_{3,b} + F_{31}x_1 + F_{32}x_2 + F_{33}x_3\end{array}$$
(48)

Since the system is at steady-state before the external stimulus, the basal production parameters can be estimated during each iteration of the estimation as

$$\begin{array}{l}\hat S_{2,b} = - (\hat F_{21}x_1(t = 0) + \hat F_{22}x_2(t = 0) + \hat F_{23}x_3(t = 0))\\ \hat S_{3,b} = - (\hat F_{31}x_1(t = 0) + \hat F_{32}x_2(t = 0) + \hat F_{33}x_3(t = 0))\end{array}$$
(49)

where \(\hat F\)i,j are the current model parameter estimates and xi (t = 0) are the x values at the initial system steady state before the induction of external stimulus.

Bootstrapping simulated data for the FFL model cases

To generate multiple parameter set estimates to classify edge weights for the FFL model cases, we employ a bootstrapping approach. In an experimental scenario, each data point will have a mean and a standard deviation, and upon a distributional assumption (e.g. normal), one can then resample datasets to obtain measures of estimation uncertainty. We use the simulated data as the mean, and then vary the standard deviation as described above to generate 50 bootstrapped datasets for each of the 16 considered models. Estimation is carried out for each of the 50 datasets using multi-start, which yields 50 best-fitting parameter sets for each model. Uncertainty analysis and classification error are based on these sets.
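The resample-and-estimate loop can be sketched as follows. The estimate callable is a hypothetical stand-in for one multi-start estimation, and the sketch assumes a normal distribution with the simulated value as the mean and d times its magnitude as the standard deviation.

```python
import random


def bootstrap_estimates(mean_data, d, estimate, n_boot=50, rng=None):
    """Resample n_boot datasets around mean_data and estimate on each."""
    rng = rng or random.Random()
    estimates = []
    for _ in range(n_boot):
        # Assumption: each point is drawn from N(mean, (d * |mean|)^2).
        resampled = [rng.gauss(m, d * abs(m)) for m in mean_data]
        estimates.append(estimate(resampled))
    return estimates


def toy_estimate(data):
    """Placeholder estimator (the sample mean), for illustration only."""
    return sum(data) / len(data)


estimates = bootstrap_estimates([1.0, 2.0, 3.0], d=0.1,
                                estimate=toy_estimate,
                                rng=random.Random(0))
```

The spread of the returned list is what an uncertainty analysis would summarize for each parameter.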