Reconstruction of Complex Directional Networks with Group Lasso Nonlinear Conditional Granger Causality

Reconstructing the networks underlying complex systems is one of the most crucial problems in many areas of science and engineering. In this paper, rather than identifying the parameters of complex systems governed by pre-defined models, or taking polynomial and rational functions as prior information for subsequent model selection, we put forward a general framework for nonlinear causal network reconstruction from time series with limited observations. After obtaining multi-source datasets through a data-fusion strategy, we propose a novel method to handle the nonlinearity and directionality of complex networked systems, namely group lasso nonlinear conditional Granger causality. Specifically, our method exploits different sets of radial basis functions to approximate the nonlinear interactions between each pair of nodes, and integrates sparsity into the selection of grouped variables. The performance of our approach is first assessed on two types of simulated datasets, from a nonlinear vector autoregressive model and from nonlinear dynamic models, and then verified on the benchmark datasets of the DREAM3 Challenge4. The effects of data size and noise intensity are also discussed. All of the results demonstrate that the proposed method performs better in terms of a higher area under the precision-recall curve.

Moreover, some methods take polynomial and rational functions as prior information for subsequent model selection 39,40, which is usually difficult to obtain in advance.
In this paper, we focus on inferring complex nonlinear causal networks from time series and propose a new method termed group lasso nonlinear conditional Granger causality (GLasso-NCGC). In particular, we establish a general data-driven framework for nonlinear network reconstruction from time series with limited observations, which requires neither a pre-defined model nor polynomial and rational functions as prior information. After obtaining multi-source datasets through the data-fusion strategy introduced in a recent high-quality paper 3, we first introduce the formulation of the multivariate nonlinear conditional Granger causality (NCGC) model, and exploit different groups of radial basis functions (RBF) to approximate the nonlinear interactions between each pair of nodes. We then decompose the task of inferring the whole network into local neighborhood selections centered at each target node. Exploiting the natural sparsity of real complex networks, group lasso regression 41 is utilized for each local structure recovery, where the RBF variables derived from the different centers of each node are either eliminated or selected as a group. Next, we obtain the candidate structure of the network by solving the group lasso regression problem. Finally, the network is judged at a given significance level with Fisher's statistical test (F-test) to remove falsely detected links.
To evaluate the performance of the proposed method, simulated datasets from a nonlinear vector autoregressive model are generated first. Compared with CGC, Lasso-CGC and NCGC, we find that GLasso-NCGC outperforms the other methods in terms of the Area Under the Precision-Recall curve (AUPR) and the Area Under the Receiver-Operating-Characteristic curve (AUROC). The effects of data size and noise intensity are also investigated. In addition, simulations based on nonlinear dynamic models with the network topology given in advance are carried out. We consider two classical nonlinear dynamic models, used for modelling biochemical reaction networks 40 and gene regulatory networks 42 respectively. For gene regulatory networks with Michaelis-Menten dynamics in particular, we simulate the gene regulatory model on random, small-world and scale-free networks, and explore how the performance of these methods is influenced by different average degrees, noise intensities and amounts of data for these three types of networks. Finally, based on the sub-challenge of the Dialogue on Reverse Engineering Assessment and Methods (DREAM), we use the benchmark datasets of DREAM3 Challenge4 for investigation. All of the results demonstrate that the proposed GLasso-NCGC method performs best on the aforementioned datasets and is robust to noise.

Models and Methods
Multivariate conditional Granger causality.
In the multivariate conditional Granger causality (CGC) model, the current value Y_t can be expressed by the past values of Y and Z_1, Z_2, …, Z_{N−2}, as in Equation (1); meanwhile, Y_t can also be written as a joint representation of the past values of Y, X and Z_1, Z_2, …, Z_{N−2}, as in Equation (2):

Y_t = Σ_{k=1}^p a_k Y_{t−k} + Σ_{j=1}^{N−2} Σ_{k=1}^p c_{jk} Z_{j,t−k} + ε_1,    (1)

Y_t = Σ_{k=1}^p a′_k Y_{t−k} + Σ_{k=1}^p b_k X_{t−k} + Σ_{j=1}^{N−2} Σ_{k=1}^p c′_{jk} Z_{j,t−k} + ε_2,    (2)

where a_k, a′_k, b_k, c_{jk} and c′_{jk} are the coefficients of the VAR model, p is the model order, and ε_1 and ε_2 are the noise terms. Let Z = {Z_1, Z_2, …, Z_{N−2}}; the CGC index is then CGCI_{X→Y|Z} = ln(var(ε_1)/var(ε_2)), which measures the conditional causal interaction between two temporal variables. If var(ε_1) is larger than var(ε_2), i.e. CGCI_{X→Y|Z} > 0, then X causes Y conditioned on the other N − 2 variables Z. The statistical judgement of the final results can be made at a given significance level with the F-test.
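As an illustration, the CGCI computation can be sketched in a few lines of numpy, fitting both VAR models by ordinary least squares (a minimal sketch with our own variable names, not the paper's implementation):

```python
import numpy as np

def cgc_index(x, y, z, p=1):
    """CGCI_{X->Y|Z}: compare the restricted VAR (past of Y and Z) with the
    full VAR (past of Y, X and Z) and return ln(var(eps1)/var(eps2))."""
    T = len(y)

    def lags(s):
        # Columns s_{t-1}, ..., s_{t-p}, aligned with the targets y_t, t = p..T-1.
        return np.column_stack([s[p - k - 1:T - k - 1] for k in range(p)])

    target = y[p:]
    Zl = np.hstack([lags(z[:, j]) for j in range(z.shape[1])])
    Xr = np.column_stack([np.ones(T - p), lags(y), Zl])   # restricted model
    Xf = np.column_stack([Xr, lags(x)])                   # full model adds past of X
    res1 = target - Xr @ np.linalg.lstsq(Xr, target, rcond=None)[0]
    res2 = target - Xf @ np.linalg.lstsq(Xf, target, rcond=None)[0]
    return np.log(res1.var() / res2.var())
```

On synthetic data where X drives Y (e.g. y_t = 0.5 y_{t−1} + 0.8 x_{t−1} + noise), the index for the true direction comes out clearly positive, while the reverse direction stays near zero.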
Group lasso nonlinear conditional Granger causality. The unified framework of the multivariate nonlinear conditional Granger causality (NCGC) model is formulated in Equation (3):

Y = Φ(X)A,    (3)

where Y is the target matrix, A is the coefficient matrix, and Φ(X) is the data matrix of nonlinear kernel functions. The detailed descriptions of all the elements above are given in Equation (4).
For each target variable y_i ∈ Y, we obtain N independent equations of the form

y_i = Σ_{j=1}^N Σ_{k=1}^n a_{ij}(k) φ(X_j; c_j^k) + ε_i,    (4)

where C_j = {c_j^1, c_j^2, …, c_j^n} is the set of n centers in the space of X_j, which can be acquired by the k-means clustering method; α_i denotes the coefficient vector between the target variable y_i and Φ(X), and a_{ij}(k) is the coefficient corresponding to the radial basis function φ(X_j; c_j^k). In principle, the least squares method can be used to solve Equation (4), but the solution may be problematic when the number of variables is larger than the number of available samples. Knowing that A is sparse, we can turn to regularization-based methods. In this case, it is noteworthy that the variables have an apparent group structure, so variables belonging to the same group should be treated as a whole: the series of RBF variables built from the different centers of the same time series should be either eliminated or selected as a group. Group lasso is therefore adopted to solve this sparse regression problem, as formulated in Equation (5).
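Before turning to Equation (5), the grouped RBF design matrix Φ(X) of Equation (4) can be sketched in code. This is a minimal numpy version with a plain 1-D k-means for the centers; the Gaussian kernel and its width are assumed choices for illustration, not quoted from the paper:

```python
import numpy as np

def kmeans_1d(values, n, iters=30, seed=0):
    """Plain k-means on scalar samples; returns the n sorted center locations."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=n, replace=False).astype(float)
    for _ in range(iters):
        # Assign each sample to its nearest center, then update the centers.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for c in range(n):
            if np.any(labels == c):   # leave empty clusters where they are
                centers[c] = values[labels == c].mean()
    return np.sort(centers)

def rbf_design(X, n=4, width=1.0):
    """Build Phi(X) from lagged data X of shape (T, N): node j contributes one
    group of n Gaussian RBF columns, occupying columns j*n .. (j+1)*n - 1."""
    cols = []
    for j in range(X.shape[1]):
        centers = kmeans_1d(X[:, j], n)
        cols.append(np.exp(-(X[:, j][:, None] - centers[None, :]) ** 2
                           / (2.0 * width ** 2)))
    return np.hstack(cols)
```

The column grouping (n contiguous columns per node) is exactly what the group penalty in Equation (5) acts on.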
In Equation (5), we take the l_2 norm as the intra-group penalty. The detailed algorithm of Group Lasso Nonlinear Conditional Granger Causality is shown in Algorithm 1. With small samples at hand, a first-order vector autoregressive model is usually considered 20,21. Accordingly, we also set the model order to p = 1, which is in accordance with the characteristics of dynamic systems governed by state-space equations. For time-delayed dynamic systems, the implementation can be extended straightforwardly to a higher-order vector autoregressive model. In addition, cross-validation can be used to select the optimal parameters, such as the number of RBF centers n and the penalty coefficient λ_i. We mainly describe the selection of λ_i, for which we use the two-stage refined selection similarly adopted by Khan et al. 43. In the first stage, we search over the coarse grid λ ∈ {λ_1, λ_2, λ_3, λ_4, λ_5}, which determines the neighborhood of the optimal λ_i. In the second stage, we perform a refined search within that neighborhood, i.e. λ ∈ [0.5λ_i, 1.5λ_i], with interval Δλ = kλ_i (0 < k < 1). Here, λ_1 = 10^−4, λ_2 = 10^−3, λ_3 = 10^−2, λ_4 = 10^−1, λ_5 = 1 and k = 0.1. For example, if λ_3 = 10^−2 is chosen in the first stage, the refined search is confined to the range λ ∈ [0.005, 0.015] with Δλ = 10^−3. For large-scale networks, a relatively larger k can be taken to reduce the search space and keep the computational cost low.
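The two-stage λ refinement just described can be sketched generically; the cross-validation scorer `cv_score` is a placeholder for whatever error criterion is minimized:

```python
import numpy as np

def two_stage_lambda(cv_score, coarse=(1e-4, 1e-3, 1e-2, 1e-1, 1.0), k=0.1):
    """Stage 1: pick the best lambda on the coarse grid. Stage 2: rescan
    [0.5*lam, 1.5*lam] with step k*lam and return the refined minimiser."""
    lam = min(coarse, key=cv_score)
    fine = np.arange(0.5 * lam, 1.5 * lam + 1e-12, k * lam)
    return min(fine, key=cv_score)
```

For instance, with a scorer minimized near 0.012, stage 1 selects λ_3 = 10^−2 and the refined grid [0.005, 0.015] with Δλ = 10^−3 returns 0.012, matching the example in the text.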
Meanwhile, in order to ensure the variety of the datasets, multiple measurements are often carried out under different conditions (e.g. adding perturbations or knocking out nodes). As a result, Equation (3) can be extended as the following formula.
In Equation (6), m denotes the number of measurements; at the m-th measurement, Ỹ_m and X̃_m are acquired and integrated into Equation (6). Equation (6) can then also be divided into N independent equations, each solved by the group lasso optimization of Equation (5). Finally, we execute the nonlinear conditional Granger causality test (F-test) at the given significance level P_val. In this paper, we set P_val = 0.01 for all statistical analyses.

Results
Nonlinear vector autoregressive model. The first simulation is based on a nonlinear vector autoregressive model with N = 10 nodes (variables); see Equation (7).

Algorithm 1. Group Lasso Nonlinear Conditional Granger Causality
2: According to the model order p, form the data matrix X = [X_1, X_2, …, X_N] and the target matrix Y = [y_1, y_2, …, y_N].
3: Transform the data matrix X into Φ(X) based on radial basis functions, as shown in Equation (4).
4: For each target variable y_i ∈ Y:
• Execute group lasso regression;
• Obtain the candidate variable set S_i for each y_i according to α_i; rearrange the data matrix Φ(X) with S_i, denoted Φ_{S_i}, and reform the expression of the nonlinear conditional Granger causality model;
• Execute nonlinear conditional Granger causality with the F-test at the given significance level P_val, and confirm the causal variables of y_i.

The directed graph of Equation (7), together with the true adjacency matrix, is shown in Fig. 1. In Fig. 1(a), black lines are linear edges and blue lines are nonlinear edges. In Fig. 1(b), white points are existent edges among nodes, while black areas stand for nonexistent edges. The synthetic datasets (M = 100 samples) are then generated from the model of Equation (7), where ε is a vector of zero-mean uncorrelated Gaussian noise terms with unit variance.
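The group-lasso step of the algorithm above can be sketched as a proximal-gradient (ISTA-style) solver; this is a minimal illustration with our own variable names, not the paper's implementation:

```python
import numpy as np

def group_lasso(Phi, y, groups, lam, iters=2000):
    """Minimise 0.5*||y - Phi a||^2 + lam * sum_g ||a_g||_2 by proximal
    gradient descent. `groups` lists the column indices of each node's RBF
    group; a group driven exactly to zero prunes that node from y's
    candidate neighborhood."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2    # 1/Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(iters):
        a = a - step * (Phi.T @ (Phi @ a - y))  # gradient step on the squared loss
        for g in groups:                        # block soft-thresholding (prox step)
            norm = np.linalg.norm(a[g])
            a[g] = 0.0 if norm <= step * lam else a[g] * (1.0 - step * lam / norm)
    return a
```

The groups that survive with nonzero coefficients form the candidate set S_i, which is then screened by the F-test as in step 4.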
The networks inferred by CGC, Lasso-CGC, NCGC and GLasso-NCGC are shown in Fig. 2. In terms of the number of edges correctly recovered (true positives), GLasso-NCGC captures almost all the existing edges except for 2 → 2, 3 → 3 and 3 → 8. Compared with GLasso-NCGC, NCGC additionally fails to recover the two existing edges 6 → 8 and 10 → 7. Because they neglect the nonlinear influences among nodes, both CGC and Lasso-CGC miss many existing edges, leading to many false negatives.
To measure performance further, we plot ROC and PR curves. Figure 3 shows the PR and ROC curves generated by CGC, Lasso-CGC, NCGC and GLasso-NCGC respectively. It can be seen from Fig. 3 that GLasso-NCGC outperforms its competitors with the highest AUROC and AUPR scores.
In the following section, the robustness of the different methods is explored. First, multiple realizations are carried out; here the number of realizations is set to 100. Figure 4 shows the discovery-rate matrices over these realizations, and Table 1 compares the discovery rates of true positives inferred by the methods, where the discovery rate of an edge is the fraction of realizations in which that edge is correctly discovered. As Table 1 summarizes, Lasso-CGC improves on CGC with a relatively larger discovery rate for the true edges. However, because both neglect the nonlinearity of the model, they cannot perform better than NCGC and GLasso-NCGC. In general, for nonlinear edges, GLasso-NCGC outperforms nearly all the other methods; in particular, for the edges 1 → 6, 2 → 7, 3 → 8 and 4 → 9, GLasso-NCGC and NCGC identify these true causal edges in a large percentage of the 100 independent realizations. For linear edges, both GLasso-NCGC and NCGC also maintain high discovery rates.
By comparison, GLasso-NCGC uses group lasso to improve on NCGC by silencing many false positives, ultimately achieving the best reconstruction of the adjacency matrix. The average AUROC and average AUPR of the different methods over 100 realizations are given in Table 2.
We then explore how the performance of CGC, Lasso-CGC, NCGC and GLasso-NCGC is influenced by the number of samples. As the number of samples varies from 100 to 300, at each point we compute the average AUROC and average AUPR over 100 independent realizations. From Fig. 5(a) and (b), the performance of all four methods improves as the number of samples increases, and GLasso-NCGC scores highest in both average AUROC and average AUPR.
Next, with M = 100 samples, the performance of CGC, Lasso-CGC, NCGC and GLasso-NCGC under different noise intensities σ (0.1~1) is compared, as shown in Fig. 5(c) and (d). At each noise intensity, the average AUROC and average AUPR over 100 independent realizations are computed. GLasso-NCGC again achieves the best score in both measures.
Nonlinear dynamic model. Consider a complex dynamic system of N variables (nodes) governed by the coupled nonlinear equations

ẋ_i = F(x_i) + Σ_{j=1}^N b_{ij} H(x_i, x_j) + ε_i,    (8)

where the state vector is x = [x_1, x_2, …, x_N]^T, the first term on the right-hand side of Equation (8) describes the self-dynamics of the i-th variable, and the second term describes the interactions between variable i and its interacting partners j. The nonlinear functions F(x_i) and H(x_i, x_j) represent the dynamical laws that govern the variables of the system. Let B be the adjacency matrix describing the interactions among variables: b_ij ≠ 0 if the j-th variable connects to the i-th one, and b_ij = 0 otherwise. ε_i is zero-mean uncorrelated Gaussian noise with variance σ². Admittedly, there is no unified way to establish a nonlinear framework for all complex networked systems, but Equation (8) has broad applications in many scientific domains: with appropriate choices of F(x_i) and H(x_i, x_j), it can model various known systems, ranging from ecological to social and physical systems 11,18,40,42. Specifically, Equation (8) can be transformed into the discrete-time expression
x_i(t_{k+1}) = x_i(t_k) + F(x_i(t_k)) + Σ_{j=1}^N b_{ij} H(x_i(t_k), x_j(t_k)) + ε_i(t_k),

where t_{k+1} − t_k = 1. The simulated systems are as follows. S1: Biochemical reaction network (BRN), Equation (9). Here, x_1, x_2 and x_3 are the concentrations of the mRNA transcripts of genes 1, 2, 3 respectively; x_4, x_5 and x_6 are the concentrations of the corresponding proteins; α_1, α_2, α_3 are the maximum promoter strengths of the corresponding genes; γ_1, γ_2, γ_3 are the mRNA decay rates; γ_4, γ_5, γ_6 are the protein decay rates; β_1, β_2, β_3 are the protein production rates; and n_1, n_2, n_3 are the Hill coefficients. From the reconstruction results for Equation (9), GLasso-NCGC recovers almost all of the real edges over 100 independent realizations, as shown in Fig. 6(c). In contrast, Fig. 6(c) shows that CGC, Lasso-CGC and NCGC cannot discover all the true positives and produce many false positives. For further analysis, we next choose some representative edges; their discovery rates are given in Table 3. Compared with CGC, Lasso-CGC and NCGC, GLasso-NCGC demonstrates a considerable advantage for both the true positives and the false positives.
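To illustrate the discrete-time simulation scheme, here is a minimal Euler-style sketch for networked dynamics of the form of Equation (8). The specific choices below (linear degradation F(x) = −x, a Michaelis-Menten-type activation for H, and a small step dt for numerical stability) are illustrative assumptions, not the paper's exact Equations (9) or (10):

```python
import numpy as np

def simulate_network(B, T=200, sigma=0.1, h=2, seed=0):
    """Iterate x(t_{k+1}) = x(t_k) + dt*[F(x) + B @ H(x)] + noise,
    with F(x) = -x and H(x_j) = x_j^h / (1 + x_j^h) (assumed forms)."""
    rng = np.random.default_rng(seed)
    N = B.shape[0]
    dt = 0.01                                # assumed integration step
    x = rng.uniform(0.1, 1.0, N)             # random initial state
    traj = np.empty((T, N))
    for k in range(T):
        hill = x ** h / (1.0 + x ** h)       # Michaelis-Menten-type response
        x = x + dt * (-x + B @ hill) + sigma * np.sqrt(dt) * rng.standard_normal(N)
        traj[k] = x
    return traj
```

For example, a directed ring adjacency matrix B yields a trajectory matrix of shape (T, N) that can be fed directly to the reconstruction pipeline.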

The quantitative comparison in Fig. 6(b) shows that GLasso-NCGC achieves the highest average AUROC and average AUPR with the smallest standard deviations (0.9905 (0.0234) and 0.9037 (0.0235), respectively). Here, we also explore the robustness of the different Granger causality methods.
We first construct the adjacency matrix B with N nodes and simulate the gene regulatory model on three classical types of networks: random, small-world and scale-free (RN, SW and SF respectively). Synthetic datasets of M samples are then generated from the gene regulatory model of Equation (10). To reconstruct large-scale complex networks, multiple measurements under different conditions should be executed, either by adding perturbations in different ways or by using random initial values; meanwhile, the outputs of the model are assumed to be contaminated by Gaussian noise. With the number of measurements set to m and T time points obtained at each measurement, a data matrix of size M × N is collected (M = mT).
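The multi-measurement data collection (m measurements of T time points each, stacked into an M × N matrix with M = mT) can be sketched generically; `simulate` is a placeholder for any trajectory generator such as the GRN model:

```python
import numpy as np

def collect_measurements(simulate, N, m=100, T=4, seed=0):
    """Run m independent measurements from random initial conditions and
    stack the resulting T x N trajectories into one (m*T) x N data matrix."""
    rng = np.random.default_rng(seed)
    blocks = [simulate(rng.uniform(0.1, 1.0, N), T) for _ in range(m)]
    return np.vstack(blocks)
```

With m = 100 and T = 4 this reproduces the M = 400 samples used in the SW-network experiment below.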
Here, we take an SW network of size 100 with average degree 〈k〉 = 5 for detailed analysis. We set the Gaussian noise intensity to 0.1 and generate 400 samples from 100 measurements (each with 4 time points).

Table 3. Comparison of the discovery rates of representative edges extracted from the true adjacency matrix in the BRN simulation.

The true SW network is shown in Fig. 8(a), together with the networks reconstructed by the different methods. From Fig. 8(a), the network inferred by GLasso-NCGC is the most similar to the true network. To some extent, CGC and Lasso-CGC also recover structures similar to the true network, but they generate many false positives. Meanwhile, NCGC almost fails in this case, because the NCGC model involves more parameters and the ratio of the number of samples to the number of parameters is much smaller than in the CGC model. In fact, the superior performance of GLasso-NCGC confirms that regularization-based methods have an advantage when there are many parameters and relatively few samples. In Fig. 8(b), we plot the PR and ROC curves of CGC, Lasso-CGC, NCGC and GLasso-NCGC; GLasso-NCGC achieves an almost perfect reconstruction. Furthermore, the reconstructed values of the elements of the inferred matrices for the size-100 SW network are shown in Fig. 9(a), where red points are existent edges and green points are nonexistent edges. CGC, Lasso-CGC and NCGC cannot distinguish existent from nonexistent edges, because the red and green points overlap heavily and no threshold (P_val) separates them; GLasso-NCGC, by contrast, shows a wide and clear gap between existent and nonexistent edges. To assess robustness, we next apply our method under different noise intensities. From Fig. 9(b), GLasso-NCGC maintains a relatively good performance even under strong measurement noise.
Then we explore the performance of these methods influenced by different average degrees, noise intensities and amounts of data simultaneously for different types of networks. The detailed results are shown in Table 4. In general, all of the results demonstrate the optimality and robustness of GLasso-NCGC.
To compare the computational requirements of these methods, we simulate the SW and SF networks (N = 100, 〈k〉 = 5, M = 400) and report the average computational time over 10 independent realizations in Table 5. The specifications of the computer used to run the simulations are as follows. Matlab version: R2013a (64 bit); operating system: Windows 10 (64 bit); processor: Intel(R) Core(TM) i7-4770 CPU @ 3.
For the DREAM3 datasets, multiple measurements are likewise taken under different conditions. For the networks of size 50 and size 100, the numbers of measurements are set to 23 and 46 respectively (each measurement with 21 time points). To assess the results of the different methods, we again calculate the AUROC and AUPR. We observe that some methods achieve the highest AUROCs while their AUPRs are quite small; in many cases a good AUROC may be accompanied by low precision because of a large FP/TP ratio. AUPR is therefore taken as the final evaluation metric. In Table 6, GLasso-NCGC achieves the best AUPR on all the datasets except the size-50 Yeast1 network. Furthermore, the average AUPRs of the methods are computed and plotted in Fig. 10; GLasso-NCGC again performs best, with the highest average AUPR.
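As a concrete note on the evaluation metric, AUPR can be computed by sweeping the ranked edge scores and summing precision over the recall gained at each threshold (a minimal sketch; `scores` are edge confidences and `truth` the binary gold-standard edges):

```python
import numpy as np

def aupr(scores, truth):
    """Area under the precision-recall curve via step-wise summation."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    truth = np.asarray(truth, dtype=float)[order]
    tp = np.cumsum(truth)              # true positives at each cut-off
    fp = np.cumsum(1.0 - truth)        # false positives at each cut-off
    precision = tp / (tp + fp)
    recall = tp / truth.sum()
    # Sum precision over the recall increment contributed by each step.
    return float(np.sum(precision * np.diff(np.concatenate([[0.0], recall]))))
```

A perfect ranking gives AUPR = 1, while a ranking that places all true edges last is heavily penalized, which is why AUPR is the stricter metric when false positives vastly outnumber true positives.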

Conclusions
Reconstructing complex networks is of great use for analyzing and understanding the collective dynamics of interacting nodes. In this work, using multi-source datasets obtained through a data-fusion strategy, a new method, group lasso nonlinear conditional Granger causality (GLasso-NCGC), is proposed for network recovery from time series. The performance evaluations show that GLasso-NCGC is superior to the other methods considered. The effects of data size and noise intensity are also discussed. Although the models and applications we focus on here are the biochemical reaction network and the gene regulatory network, our method can also be applied to other complex networked systems, such as Kuramoto oscillator networks and mutualistic networks 40,42.
It is also important to note that we adopt only the first-order nonlinear conditional Granger causality (NCGC) model, in accordance with the characteristics of dynamic systems governed by state-space equations. For time-delayed dynamic systems, such as the coupled Mackey-Glass system 28, the framework of our method can be flexibly extended to a higher-order NCGC model with group lasso regression, which we leave for future research. In the information era, with its explosive growth of data, our proposed method provides a general and effective data-driven framework for nonlinear network reconstruction, especially for complex networked systems that can be cast in the form Y = Φ(X)A.

Table 5. The average computational time (in sec.) of different methods over 10 independent realizations in the simulation of the GRN model on SW and SF networks.