Inferring sparse networks for noisy transient processes

Inferring causal structures of real world complex networks from measured time series signals remains an open issue. The current approaches are inadequate to discern between direct versus indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, as well as nonlinear and transient dynamics of real world processes. We report a sparse regression (referred to as the -min) approach with theoretical bounds on the constraints on the allowable perturbation to recover the network structure that guarantees sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors), and the numerical stability of -min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory network and Michaelis-Menten dynamics, as well as real world data sets from DREAM5 challenge. These investigations suggest that our approach can significantly improve, oftentimes by 5 orders of magnitude over the methods reported previously for inferring the structure of dynamic networks, such as Bayesian network, network deconvolution, silencing and modular response analysis methods based on optimizing for sparsity, transients, noise and high dimensionality issues.

regression (henceforth referred to as the  1 -min) formulation to recover the structure of dynamic networks from noisy data gathered under transient conditions. Our main contribution is in providing a theoretical bound on the constraints of the  1 -min formulation and providing stable numerical procedures that overcome effects of nonlinear couplings in large interconnected processes, availability of only a small sample of short time series ensembles, and inaccuracies in estimating noise levels. These bounds mitigate tedious trial and error procedures employed customarily as part of  1 -min implementations 1,[34][35][36] . The theoretical results and subsequent experimental studies suggest that the present  1 -min approach is more robust to noise compared to the contemporary dynamic Bayesian network [14][15][16][17][18][19] as well as NDs 26,27,32 . It is shown that up to 5 orders of magnitude reduction in the inference error are possible from the present approach, leading to a more accurate inference of the network structure for complex real world networks.

Methods
Towards a more formal treatment, we define a real world system as high dimensional coupled differential equation of the form = ( , ), ( ) n is a state vector, p is the parameter vector, τ ( ) x is an initial condition. As noted in the foregoing, such dynamics can also be represented in form of a network 37 shown in Fig. 1, where the node i represents the state variable x i and a directed arc represents the existence and the strength of the coupling (direct influence) s ij between node i and node j. In this context, the direct influence ( ) s t ij of node j on node i around a certain point x in the state space defined in Eq. (1) can be expressed as It may be noted that, a node j is connected to a node i at time t if ( ) ≠ s t 0 ij . Hence, ( ) = ( ( )) S t s t ij captures the physical structure of the dynamical system (1) at time t. In practice, ( ) S t needs to be inferred from the measurements of the total influence ( ) g t ij between every pair of nodes 26,27 or estimated from time series outputs of the dynamic system gathered under transient conditions 33 . The total influence ( ) g t ij is the sum of the direct influence of node j on node i and all indirect influences from node j to node i through other nodes connecting to both of them (see Fig. 1b). For example, total influence from → 1 4, ( ) g t 41 is the sum of indirect influences along the paths → → → 1 3 2 4 and → → 1 2 4, or ( ) = ( ) ( ) ( ) + ( ) ( ) g t s t s t s t s t s t 41 31 23 42 21 42 . In other words, the total influence that node j has on node i around a certain point x on the state space defined in Eq. (1) is defined recursively as which is similar to the expression noted in in Barzel and Barabási 27 . Conventionally, under stationarity assumptions, ( ) g t ij can be approximated using similarity measures, such as correlation and mutual information 8 estimated from raw samples of time series. The direct and total influence matrices are related at every time t by the following equation: where ( ) B t and ( ) C t are functions (defined depending on the context) of ( ) S t and ( ) G t , respectively. Pertinently, when the underlying dynamical system is linear and time-invariant, ( ) S t and ( ) G t do not depend on time. Eq. (7) generalizes previous network deconvolution formulations as follows: for Feizi et al. 26 , ( ) = ( + ), ( ) = B t I G C t G, for Barzel and Barabási 27 , and for Sontag et al. 33 . For simplicity of expressions, we use henceforth S, B and C instead of ( ), ( ) S t B t and ( ) C t in this subsection. The "true" network structure S 0 can be estimated by solving the following  1 -min formulation: , and  is the allowable perturbation that captures the effects of noise in the measured data. We note that in the absence of noise, this formulation is equivalent to ND and MRA. In the following sections we present two alternative  1 -min formulations for direct influence inference. The first formulation presented in Eqs (9, 10) addresses the estimation of s ij for real world scenarios when the total influence g ij is directly measurable (e.g., based on the strengths of co-excitations), and the second formulation Eqs (21,22) addresses the inference of the network structure (i.e., determine all node pairs where ( ) = ∀ s t t 0 ij ) under one of the most generic scenarios of using multiple ensembles of time series realizations of the state variables, collected under noisy and transient conditions with different parameter settings. It may be noted that inferring the network structure under such generic conditions has not been investigated to date.

Network inference when total influence matrix is available.
For the case where the measurements of total influence matrix G are provided 26 , the relaxed  1 -min formulation can be written as where g i is the i th column of G. In order to solve for an accurate estimate of S 0 from Eqs (9) or (10) using standard solvers 38,39 , estimation of  and ε i are crucial. Specifically, when noisy measurements of the total influence matrix differ from the "true" total influence as = + ∆ G G G 0 , the estimated direct influence matrix differs from the true direct influence matrix as = + ∆ S S S 0 , and The quantity ∆ + ∆ SG S F is called total perturbation. In vector form, can represent the total perturbation for computing row i of S 0 . The bounds on  and ε i are as follows (See Theorem 1 in Supplementary Information): where γ is the largest eigenvalue of Δ G, δ K is the restricted isometry constant 40 and · F is the Frobenius norm of a matrix. By employing these bounds, we can set the values of  and ε i for effective network inference. As subsequent numerical investigations indicate, the performance of the method does not degrade significantly due to the presence of noise, and this is the major advantage of the present approach. It may be noted that our method is designed to provide the sparsest network structure that replicates the measured total influence G within a bound (specified in terms of the allowable total perturbation). This is very important because only a small set of noisy observations are available, for most real world applications. For example, in the case of genetic regulatory networks, only a subset of dynamic regimes (i.e. marked by the active degrees of freedom) of the underlying process are captured. Therefore, identification of true network structure would never be guaranteed by any approach, and among the network structures that can replicate the observed total influence within a specified bound, the sparsest network would be of the most interest. Although sparser than the network derived by ND,  1 -min derived structure might be adequate to uncover the total dynamic couplings of the process captured in the observed data.
In real world scenarios, ∆G is not always known. Overestimation of ∆G can lead to network structures that are sparser than the original. However, we show that the effects of under-estimation of noise can be alleviated to a great extent. When noise level is unknown but multiple realizations of the noisy measurements of G are available, it is possible to further reduce the inference error by combining the estimates with different realizations of G . This averaging procedure allows us to improve the network inference accuracy when multiple measurements of the total influence matrix are available. For example, when the network structure does not change significantly as the system approaches a steady state, the total influence matrices can be measured multiple times, each corresponds to one time window.
Network inference when the time series under transient conditions are available (total influence matrix not given). In practice, g ij are often estimated using convenient similarity measures such as correlation or mutual information between the time series ( ) x t i and ( ) x t j of the nodes i j and as stated in the foregoing section. These estimations have a very low accuracy due to nonstationaries (transient), low sampling rates and sample size limitation; and can not capture the total influence in the system. Also, in most real world applications, only finite samples of time series ( ) x t are available, and the present NDs can not be employed in these scenarios. To overcome these drawbacks, we have adapted an approach to estimate the direct influence based on multiple time series ensembles obtained by perturbing parameters of the dynamical system Eq. (1) 33 . We first modify the perturbation procedure proposed by Sontag et al. 33 to make it more robust to numerical error then further improve the accuracy of network inference by introducing a sparse regression formulation and the averaging scheme.
A robust perturbation procedure. According to Sontag et al. 33 can be derived from the following equation: Note that Γ plays the role of the total influence matrix G in the previous section. To compute the row i of the matrix S, the parameters p j to be perturbed are chosen such that ∈ p P j i 33 . As a consequence, changes in p j indirectly affect x i , and dx is much smaller (2 orders of magnitude smaller as in the Table 1 for the network studied in case study 1) compared to other columns when ∈ p P j i . A numerical issue this poses can be understood based on the following linear system of equations Here, the sensitivity of solution u to the change in A can be quantified as follows 41 becomes several magnitudes larger than other rows. Therefore, the perturbation procedure proposed by Sontag et al. 33 is very unrobust to noise or numerical error in x i s. The following modification to the perturbation procedure addresses the aforementioned issue. Consider the case when  x i depends linearly on x i as in the following system 42 : This system describes popular biochemical reactions when the activity of a chemical species is inhibited by its own concentration 43,44 . To compute the i th row of the Jacobian, the parameters p i is also perturbed. Note that The remaining parameters are perturbed as in Eqs (17,18). Therefore, to compute we can solve the system of equations (16) with and other P s ij , R s ij are defined as in (17,18).
A robust network identification approach. In addition to the perturbation procedure proposed in Eqs (17)(18)(19), we present a method to solve Eq. (16) that is more robust to the presence of noise. In the present context, the  1 -min formulation of Eq. (16) takes the following form: As noted in the foregoing section, estimation of  and ε i based on the noise levels when measuring ( ) x t is essential to ensure that the solution to Eq. (21) serves as a viable estimator of the "true" direct influence S 0 . The following bounds and approximation allow the specification of  and ε i (Theorems 4 and 5 in Supplementary Information) 258e-5 0.0100 − 1.651e-4 0.0820 0.0820 Table 1. The matrix R for computing the first row of S is estimated using Sontag et al. 33 's perturbation procedure. The first row/column of R is two orders of magnitude smaller than others, which presents major numerical issues for inferring structures of large networks.
Scientific RepoRts | 6:21963 | DOI: 10.1038/srep21963 are the errors incurred when measuring ( , ), , respectively. As stated in the foregoing, noise level is not known a priori in most real world systems. In this situation, the network structure is deduced based on the entries in the estimated ( ) S t 0 that are equal to zero for all t and can be estimated by the entries in as are measurements or approximations of the total influence matrix Γ ( ) t 0 at time t r (see Proposition 2 in Supplementary Information). This averaging procedure allows us to improve the accuracy to predict the pair of nodes that are not connected when the measurement noise level is not available. As a result, our method ensures low false positive rates on the "arcs". As noted in the context of Proposition 1, network inference with ( ) S N tends to be at least as good as with ( ) S t r even when ( ( ) ) Var S t r is arbitrarily large.

Results
We have considered two case studies to validate the theoretical results and evaluate the performance of the  1 -min approach. The first case study contains two simulation scenarios. The first scenario simulates a scale-free network whose structure resembles that of the genetic regulation process of E. Coli species 45 . Here, the challenge is to estimate the true network structure, i.e., the direct influence matrix S 0 from a noisy total influence matrix G. This scenario is optimal for assessing the closeness of the bounds stated in Eqs (14,15)  , and comparing the performance of the  1 -min formulation relative to the recent ND methods in terms of inference error and sparsity. The next scenario simulates a system of Hill-type differential equations modeling a gene interaction network. Here, the challenge is to estimate the true network structure from noisy and transient time series data. The second case study is an application of our method to infer genetic regulatory networks (GRNs) from empirical data in the context of DREAM5 challenge 46 . This challenge is a standard framework for evaluating GRN inference methods.
Case I: simulation studies. Inferring direct influence networks from total influence network. First, we adapted the procedure specified by Muchnik 47 to generate 500 random realizations of scale-free networks consisting of = n 100 nodes, with a degree exponent of 2.2. In each realization, the weights of the true direct influence network, s ij 0 follow the distribution  µ σ ( , ) . We considered cases where the measurement noise level ∆G F is known as well as those where there is uncertainty in estimating the measurement noise level.
We first compare the "true" bound ( ) 0  (computed using S 0 ) and the bounds for  estimated based on Eqs (13,14). In the presence of noise, the bounds appear to be in the same order of magnitude for all simulated networks ( Table 2). The results also suggest that the bound specified in Eq. (13) closely matches the "true" bound and can be used to approximate the feasible region when  ( ) 0 is unknown with high accuracy. Although the bound in Eq. (14) tends to be loose, it can be used as an upper bound for ( ) 0  . We next compared the performance of ND and  1 -min approaches (using our bounds Eqs (13) and (14) , where Ŝ is computed using the different methods being compared. The  1 -min approach with "true" constraint bound ε ε = ( ) i i 0 significantly improves the ND (the mean and the variance of the estimated ρ were reduced by 45% and 99%, respectively) (Fig. 2).
 (based on Eq. (13)), the  1 -min approach performs much better than ND (the mean and variance of ρ are reduced by 33.5% and . 87 5%, respectively). More importantly, the inference error of  1 -min approaches were concentrated around of 0.15 within ± 0.05, while those of ND were spread over a larger range, from 0.3 to 0.6. This suggests that  1 -min approach using our bound in Eq. (13) is more robust than ND to noise and approximation error incurred when measuring the total influence matrix.
We also compared the sparsity of the recovered networks measured in terms of Hoyer sparsity measure 48 defined as follows  (Eq. 14) 8.61 × 10 −2 Note that ( ) ∈ , Hoyer S [0 1]. The closer it is to 1, the sparser S is. In terms of this measure, the solution of the  1 -min approach is much sparser (mean is 16.38% larger, variance is 69% smaller when using the true bound ε ε = ( ) i i 0 , and mean is 15.90% larger, variance is 75.69% smaller when using the approximated bound ε ε = ) ( ) i i 1 than solution of ND (Fig. 2b). Also, the Hoyer measure of the  1 -min approach is concentrated more around a much higher value (sparse matrices) than that of ND indicating that the  1 -min approach using our bound gives a significantly sparser solution than ND. As a result, this gives a more interpretable connection structure without the loss of performance. We also studied the effects of the bounds of  1 -min formulation on inference error to verify Eq. (40) numerically. When ε ε / > ( )

Table 2. Comparison of bounds on total perturbation obtained using Eqs (13) and (14) suggests that Eq. (13) provides a good approximation and Eq. (14) serves as an upper bound of
, the inference error trends almost linearly with ε i (see Fig. 3). This confirms the conclusion of Theorem 3. Also, when ε ε / < ( ) 1 i i 0 and tends toward 0, the inference error increases. This shows an evidence of over-fitting.
Subsequently, we studied the effect of averaging (Proposition 1) in the context of the  1 -min and ND methods. We conducted N = 40 simulations, in each of which, , S G 0 0 and ∆G were generated as stated in the foregoing. We used the inference error without ρ ( ) N and with averaging ρ as measures for comparison from each simulation defined as follows: Histograms summarizing the relative performance of ND and  1 -min approaches for the benchmark numerical case in terms of (a) inference error that quantifies the accuracy and (b) Hoyer measure that quantifies the sparsity of the solution. The solution from the  1 -min approach is more precise and sparser than ND: compared to NDs, the mean and the variance of the inference error are reduced by 45% and 99%, respectively, when using  1 -min with ε ε = ( )  suggest that averaging reduces the inference error of both methods by about 8 times in all cases, thus supporting the validity of Proposition 1 (Fig. 4). The inference errors were almost the same between ND and  1 -min with ε ε = ( ) i 1 .
Inferring direct influence network structure from multiple time series under transient conditions. In this section we represent the performance of  1 -min approach in inferring network structure from transient time series with an unknown noise level. In this study we used Michaelis-Menten dynamic system given by 27 : where the "true" network defined by ( ) s ij 0 is a scale-free network 45 generated randomly with degree exponent γ = .
2 2 consisting of = n 40 nodes with about 70 edges, whose weights s ij 0 follow the distribution ( , . ) 5 0 25  . We obtained 30 different variants of this network. For each of these invariants (trials), a perturbed network was obtained by changing (perturbing) the parameters according to Eqs (18)(19)(20). Every solution ( ) x t , ∈ , t [0 1], obtained from an initial condition ( ) x 0 was contaminated with noise of the form  σ ( , ) 0 2 to simulate a noisy measurement ( ) x t . Here σ 2 was chosen to be 10 As summarized in Fig. 5, the  1 -min approach performs better than Sontag et al.'s 33 method in all cases tested. In fact, ρ ρ , were reduced by 10 5 times. The poor performance of Sontag et al.'s 33 method is attributed to the numerical issues noted in the earlier section. A further 30% reduction in inference error resulted from averaging for both cases. Next, the cases (c) and (d) were designed to simulate the real situations where the noise magnitude is unknown. We considered cases where the noise levels are under or overestimated by 1 order of magnitude. While Sontag et al.'s 33 method would not be applicable in such cases,  1 -min without averaging was found to lead to suboptimal inference. Under underestimation ε ε < / ( ) 10 i i 0 , averaging was found to further reduce the inference error by about 70%, and the inference error ρs were of the same level as one would obtain when the noise level is known. This result is consistent with and is a clear verification of Proposition 2. When the noise level is overestimated, the resulting network tends to be highly sparse, offering excellent specificity in identifying the absence of direct coupling. The inference errors are therefore low even without averaging by default. In this case averaging reduces the inference errors by 5%. The p-values of the paired t-tests between the inference error with and without averaging were below 0.0282 in all cases suggesting that averaging helps improve network inference. Case II: Application to empirical genetic regulatory network inference. Next, we applied our method to infer real world GRNs and compare its performance with other methods including ND 26 , Bayesian network inference, Pearson and Spearman correlation networks 8 using the framework presented in DREAM5 challenge. Here, the Pearson and Spearman correlations were considered as they are the most widely used methods for network inference and can provide a reasonable estimation of the total influence matrix 26,27 . In addition, ND has been most effective in inferring network topology when the total influence matrix G is estimated using Person and Spearman correlations. Therefore, these serve as the challenging test cases to evaluate the performance of  1 -min where ND is already effective. The DREAM5 challenge contains gene-expression microarray data of three species including an in silico benchmark, a prokaryotic model organism (E. coli) and a eukaryotic model organism (S. cerevisiae). Beside ρ and Hoyer metrics, we employed the following score, which was used in earlier works 8 to assess the performance of a network inference method for recovering the structure underlying these data sets: where p ROC and p PR are p-values computed from AUROC (area under receiver operating characteristic curve) and AUPR (area under precision-recall curve).
The results of the performance evaluation are summarized in Fig. 6. We note that for computing the performance metrics we first generated 30 different G matrices with Pearson correlation, 30 others with Spearman correlation and another 30 with Mutual Information for each data set. The G matrix in each case was estimated using samples of size 75% of the data set. The averaging procedure considers the S matrices estimated from these G matrices using different methods. In terms of ξ-score (Eq. (31)), which quantifies how well-in terms of having low false negative rates (FNR, related to sensitivity), and low false positive rates (FRN, related to specificity), the true positive rate (TPR) and true negative rate (TNR)-the estimated Ŝ captures S 0 ,  1 -min approach yields Ŝ with at least 18.53% higher than with ND in all cases tested except the in silico case (see Fig. 6). Both ND and  1 -min performed better than Bayesian network approach whose ξ-scores were 14.891, 0.029, 0.0001, respectively, for the three data sets 8 . In terms of ρ-score (Eq. (29)), which quantifies the false positive rates (i. e., the specificity),  1 -min approach reduces ρ by 2-3 orders compared to ND in all cases. These results provide a strong evidence for the relevance of the  1 -min approach for network structure inference. In terms of sparsity,  1 -min approach increased the Hoyer measure by about 20% in most cases, and were much closer to the Hoyer measures of the gold-standard network, compared to ND.
As noted earlier for in silico data, although the ρ-score with  1 -min was at least 1160% lower (i.e., higher specificity) and Hoyer was 33% higher (i.e., higher sparsity), the ξ-score was slightly (10%) lower than with ND. The lower ξscore for  1 -min is perhaps a consequence of the method being susceptible to over-specification of the noise level. In this context, it must be noted that the solutions from both ND and  1 -min can replicate the observed total influence G within a specified bound (as total perturbation). However, the solutions from  1 -min tend to be much sparser and have lower false positive rate. Given that there were only 805 sample measurements to reconstruct G matrices for 1643 nodes in the in silico network, it is highly likely that several dynamic modes (degree of freedom) are not observable from the data. Therefore,  1 -min generated a much sparser network which, by formulation, is guaranteed to be adequate to capture the observed modes of the dynamics within the specified total perturbation limits. The ND derived networks for in silico and other cases that have higher ξ-score, intriguingly, were consistently found to have much lower Hoyer score (hence sparsity) even compared to the specified total influence matrix. Thus,  1 -min-generated solutions provide significant improvement in specificity, although the sensitivity at times were found to be slightly lower than with ND.
Averaging improves the ξ-scores (Eq. (31)) with all methods by at most 10%. This is perhaps due to the near-stationarity of the total influence matrix G, when computed using data over long time windows that smooths out various higher order transient effects. Also, one may note that the averaging makes the network inferred from ND less sparse than without averaging. This is because under noise, transients and data sparsity, ND yields vastly different network topologies depending on the samples employed. Averaging over these vastly different networks causes a reduction in sparsity. These results, taken together suggest that the  1 -min approach is perhaps the best known means to provide specificity for network inference from transient and noisy data. The utility of the approach would be to provide a minimal set of arcs (dynamic couplings or direct influences) to be considered for further network dynamics reconstruction applications.
Discussion and Concluding remarks. In this paper, we have investigated a method to robustly infer the structure of a network representing a sparse dynamical system from noisy, transient time series data. When the noise level is known, the  1 -min formulation employing our theoretical formula for the bound on total perturbation improves the recently reported NDs in terms of both accuracy and sparsity. When the noise level is unknown, we have shown that by averaging the networks inferred from different time points or conditions, the inference of network structure of real world processes becomes highly plausible.
Pertinently, for most real world processes, the total influence is not known a priori; only the time series ensembles gathered under transient conditions are available (e.g., gene expression microarray data 8,49 , protein-protein interaction data 50 as in the case of Michaelis-Menten dynamics). It has been noted that most of the earlier approaches present severe accuracy, noise sensitivity and/or numerically stability issues for such realistic scenarios. To overcome these limitations, we have investigated the  1 -min approach with a novel perturbation procedure for time series based network inference. Averaging over the solutions estimated at different time windows has been shown to allow inference of the structure for complex real world networks, especially when the noise levels are unknown or cannot be accurately estimated.
Next, we have applied our method to three benchmark systems: a sparse scale-free network 51 with a specified noise level and the total influence between any two nodes given, a genetic regulatory network model formulated in terms of a system of Hill-type differential equations 27 , and GRNs of DREAM5 challenge 46 . These analyses suggest that our proposed bounds on the constraints for the  1 -min formulation, extracted from a few time series samples acquired under transient conditions, are of the same order (i.e., they closely envelop) with the constraints estimated based on the full knowledge of the noise level. The  1 -min formulation reduces the inference errors defined in (31) and (29) by 18.53% and 2 to 3 orders of magnitude, respectively, and improves the sparsity of the solution (measured in terms of Hoyer sparsity measure) by 15.9%, in comparison with conventional approaches including various versions of dynamic Bayesian approaches for network inference as well as ND. If instead of the total influence, only the time series gathered under transient conditions is provided, such as in the case of Michaelis-Menten dynamics,  1 -min approach achieves a 4 order reduction in inference error compared to MRA. The total influence G matrix is estimated by Pearson correlation (blue/dark), Spearman correlation (red/light) and Mutual Information (green/light). Compared to ND, the prediction scores with  1 -min are increased by 23 .65% for S. cerevisiae, respectively. For in silico data, ND gives a solution with 11% higher prediction score but 33% less sparse than  1 -min approach. Averaging slightly improves the performance of all methods (< 10%).
Scientific RepoRts | 6:21963 | DOI: 10.1038/srep21963 These theoretical and and numerical studies suggest that our proposed method can be employed to effectively infer the presence of dynamic coupling (i.e., arc set or the direct influence in a dynamic network) based on sparse samples.
As with any network reconstruction approach, the method assumes that the time series realizations taken together can adequately mirror the salient dynamic regimes of the underlying process 52 , and as noted earlier, the approach is restricted to ensuring high levels of specificity and not sensitivity in identifying the direct influences. Additionally, while the approach is fairly robust to the presence of noise, the estimates ŝ ij from the averaging procedure for the arcs with = s 0 ij 0 is guaranteed to converge to zero only in the presence of additive noise. More specifically, one of the following conditions need to hold for the approach to be applicable: (1) the governing equation of the process dynamics is specified, so that ( ) G t or ( ) R t can be constructed; (2) one or more realizations of ( ) G t (based on ND or silencing method) or ( ) R t (based on MRA) are given. In our experience, 30 realizations ensured the convergence of the averaging method; (3) one realization of a n-dimensional time series is available for estimating ( ) G t using various alternative methods outlined in Feizi et al.'s 26 or n 2 time series realizations with the same initial condition are available for estimating ( ) R t using Eq. (17). Note that Scenario 1 is useful only for applications such as to investigate if there exists a more compact (sparser) network representation to capture the specified process dynamics. In Scenarios 2 and 3, we assume that the noise level or its lower limit is known, and adequate number of realizations are available to ensure convergence of the averaging method. In scenario 3, Eq. (17) yields a finite space-time approximation of the partial derivatives , ing the parameters p j and keeping the initial condition the same for two time series signals. The length of the time series in this case can be really small, or it can just be samples taken over multiple (roughly 30), short (can be even 2 samples) time windows. However, the time steps (or sampling interval) in each time window must be small enough to ensure that ( ) R t values locally converge. Sensitivity of the network inference performance to time step size, however, needs further investigation.
Efforts are underway to address some of the  1 -min aforementioned limitations. We are investigating a two-stage approach to recover local nonlinear dynamics from sparse time series data. For future research, we will consider a more realistic scenario where not all state variables can be measured. In GRN inference, for example, only the outputs/activations of only those genes that have been discovered are measured. However, unknown genes might have significant influence on the network structure. Removing the effects of unmeasured variables, when combined with the method proposed in this paper, will lead to a more advanced network inference method.