Abstract
Identifying a critical transition and its leading biomolecular network during the initiation and progression of a complex disease is a challenging task, but holds the key to early diagnosis and further elucidation of the essential mechanisms of disease deterioration at the network level. In this study, we developed a novel computational method for identifying early-warning signals of the critical transition and its leading network during disease progression, based on high-throughput data using a small number of samples. The leading network makes the first move from the normal state toward the disease state during a transition, and thus is causally related to disease-driving genes or networks. Specifically, we first define a state-transition-based local network entropy (SNE), and prove that SNE can serve as a general early-warning indicator of any imminent transitions, regardless of specific differences among systems. The effectiveness of this method was validated by functional analysis and experimental data.
Introduction
There is usually a sudden health deterioration during gradual progression of many chronic diseases, such as prostate cancer^{1}, asthma attacks^{2}, epileptic seizures^{3}, and others^{4,5,6,7,8}. This critical phenomenon generally results in a drastic or a qualitative transition in the focal system or network from a normal state to a diseased state, which corresponds to a so-called bifurcation point in dynamical systems theory^{9,10}. Clearly, predicting and further elucidating this critical transition at the network level hold the key to understanding the fundamental mechanism of disease development and progression.
In general, disease progression can be divided into three states^{11}, i.e., a normal state, a pre-disease state (or a critical state), and a disease state (Figs. 1 a–d). In the normal state, a disease is under control or in an incubation period, and dynamically it can be considered to have high resilience and robustness to perturbations (Figs. 1 b and g, also Fig. S1 a). The pre-disease state is defined as the limit of the normal state, which occurs before the imminent phase transition point is reached, but it has low resilience and robustness due to its dynamical structure (Figs. 1 c and g, also Fig. S1 a). At this stage, the system is sensitive to external stimuli and can still be returned to the normal state by appropriate intervention; however, a small change in the parameters of the system may suffice to drive the system into collapse through bifurcation, which often implies a phase transition to the disease state. The disease state represents a seriously deteriorated stage possibly with high resilience and robustness (Figs. 1 d and g, also Fig. S1 a), where the system usually finds it difficult to recover or return to the normal state even after treatment, which contrasts with the pre-disease state. Therefore, it is crucial to detect the pre-disease state to prevent qualitative deterioration and to further elucidate its molecular mechanism.
For many complex diseases, however, it is a difficult task to predict a pre-disease state because the state of the system may change little before the bifurcation point or the critical transition is reached, namely, there may be little difference between the normal and pre-disease states; note that a pre-disease state can be considered as a limit of the normal state but a disease state is different from the normal state. This is also why diagnosis based on traditional biomarkers or static measurements may fail to distinguish a pre-disease state from a normal state (Fig. 1e). Another obstacle that hampers the detection of early-warning signals is the complexity of diseases, which can involve thousands of genetic factors (e.g., SNPs and CNVs) and epigenetic factors (e.g., methylation and acetylation). To overcome those problems, a general early-warning indicator was recently developed based on a new model-free concept, i.e., a dynamical subnetwork of biomarkers, or a dynamical network biomarker (DNB)^{11}, which appears only during a pre-disease state and satisfies three measurable conditions. A DNB can, at least in principle, distinguish a pre-disease state from other states in any disease on the basis of its dynamical nature (regardless of the differences among diseases) (Figs. 1 c, f and g) and it also has a solid theoretical basis derived from bifurcation theory and center manifold theory^{12,13}. In particular, DNBs have been proven theoretically to be the leading biomolecular networks (or leading networks) in critical transitions, which make the first move from a normal state to a disease state. In other words, the leading network is the first subnetwork that breaks through the limit of a normal state to move into a disease state, which means that it is clearly related to the causal (or driving) genes in a disease network, in contrast to the differential gene expression that results from the disease (as the consequence of the disease).
Therefore, identifying the leading network during a critical transition can signal the emergence of a pre-disease state, enabling early diagnosis of the disease, and can also help to elucidate the mechanism of disease initiation and progression at the network level. As shown in Fig. S1, a DNB is a dynamical signal to identify the pre-disease state, rather than the disease state detected by the traditional static biomarkers.
In general, the reliable identification of DNBs and pre-disease stages from many thousands of genes and from many stages using high-throughput data is not easy to achieve with noisy data and a small number of samples^{14}, which makes the identification of DNBs and their critical stages inaccurate. Identifying DNBs that satisfy the three conditions among genome-wide scale variables in high-throughput datasets is also a computationally intensive task. Therefore, an effective computational method is required to reliably and efficiently identify DNBs to accurately predict pre-disease states and further elucidate the mechanism of disease deterioration. In this study, we used center manifold theory as the basis to develop a novel computational method for identifying the leading networks before critical transitions in complex diseases in an efficient and accurate manner. To achieve this, we derived a new type of entropy, i.e., state-transition-based local network entropy (SNE). The SNE is defined as a state-transition-based Shannon entropy that is conditional on the previous state of a local dynamical network in a Markov process, which is also the entropy rate of the state change in a biomolecular network, where each node represents a gene (or a protein, or a chemical) and each edge represents a regulatory relation between two genes, with the assumption that a Markov process governs the dynamics of each node. Given a biomolecular network, e.g., a protein-protein interaction (PPI) network or a correlation network, we can theoretically prove that the SNE is drastically reduced when the system approaches a pre-disease state, whereas there is no significant change in the SNE at normal and disease states. Thus, the SNE provides a clear signal for detecting the critical transition and its leading network (or DNB) (see Figs. 1 f and g, also Fig. S1).
In particular, only local information is required to evaluate the SNE for a specific node, so we can design a very efficient algorithm that avoids the processing of most of the noisy data, which reduces the computational requirements and also improves the reliability and accuracy. In addition, this is a model-free method that can be theoretically applied to any disease or biological system with sudden transition events, while it also requires no parameter tuning due to the nature of the SNE. From a dynamical viewpoint, we show that the SNE can quantify the robustness of a system during the time evolution of disease progression, which can be applied directly to the dynamical and structural analysis of network rewiring during disease progression for many biomedical problems. The entire network can also be naturally decomposed into four layers (i.e., a DNB core, a DNB boundary, a non-DNB boundary, and a non-DNB core) based on the topological structure of the leading network (see Fig. 2a), which reveals the dynamical roles of genes (or proteins, or chemicals) during disease development and progression. To demonstrate the effectiveness and efficiency of the SNE, we applied our method to a simulated dataset and two real disease datasets (i.e., lung injury and liver cancer), and we successfully identified the critical transitions and the leading networks. The relevance of the identified leading networks to the diseases was also validated by functional enrichment analysis and by using related experimental data. Note that the DNB or the leading network in this paper is not for identifying the critical transition phenomenon itself but for detecting the state just before the critical transition (i.e., a pre-disease state), and therefore, it is of great importance for early diagnosis of complex diseases. For brevity, throughout this paper "identifying the critical transition" means identifying the state just before the critical transition.
Results
The dynamics of the progression of complex diseases is usually very complicated before and after sudden deterioration, and it may not be fully expressed even by using a very high-dimensional space. However, provided that a system is driven by some known or unknown parameters toward a bifurcation point, the system can be generally guaranteed to be eventually constrained to a one- or two-dimensional space (i.e., the center manifold), which can be expressed in a very simple form for any dynamical system regardless of their differences^{11}. This is the theoretical basis for developing a general indicator that detects critical transitions and their leading networks in this study.
State transitions in biomolecular networks
The network entropy was originally proposed for the study of demographic stability in population models^{15,16} and further extended to analyzing the topology and robustness of protein interaction networks^{17}. Theoretically, the network entropy was derived from the entropy rate^{18}, i.e., a conditional entropy, for measuring network robustness and stability based on a random walk process among the nodes (e.g., genes or proteins) of a network. However, the dynamics of a biological network cannot generally be described simply by individual moves from one gene to another via a random walk, because it is dominated by changes in the system state governed by the dynamical network. In this study, we define a general network entropy based on the state transitions of a local network, rather than the special dynamics of a random walk. This entropy is also a local-network-based measurement with state transition variables (not with the original state variables), so it can be exploited to overcome computational difficulties, such as the computational complexity, noisy data, and accuracy, which are encountered during high-throughput data processing.
First, we define the network state (or original variables) and the transition states for a dynamical network in a Markov process. For an n-node network, let Z(t) = (z_{1}(t), …, z_{n}(t)) represent the network state at t, where z_{i}(t) denotes the expression value of node i (i.e., a gene or protein). Then, x_{i}(t) ∈ {0, 1} is defined to measure whether or not node i has a large change at the sampling time point t, i.e., if the change z_{i}(t) − z_{i}(t−1) is sufficiently large, such as |z_{i}(t) − z_{i}(t−1)| > d_{i}, then x_{i}(t) = 1, otherwise x_{i}(t) = 0 (see Supplementary Information A for detailed definition). Thus, X(t) = (x_{1}(t), …, x_{n}(t)) is the transition state for the network at t.
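The thresholding step can be illustrated with a short Python sketch. It assumes that a "large change" is judged by the absolute difference between consecutive samples and that the thresholds d_{i} are supplied by the user; the function name `transition_states` and the toy values are hypothetical, and the precise definition is the one given in Supplementary Information A.

```python
from typing import List, Sequence


def transition_states(z: Sequence[Sequence[float]],
                      d: Sequence[float]) -> List[List[int]]:
    """Binarize changes between consecutive samples.

    z[t][i] is the expression value of node i at sampling point t and
    d[i] is the threshold for node i. Row k of the result holds
    x_i(k + 1) for every node i, so the output has len(z) - 1 rows.
    Assumption: a change is "large" when |z_i(t) - z_i(t-1)| > d_i.
    """
    states = []
    for prev, curr in zip(z, z[1:]):
        states.append([1 if abs(c - p) > di else 0
                       for p, c, di in zip(prev, curr, d)])
    return states


# Toy data: three sampling points, three nodes (illustrative values only).
z = [[1.0, 2.0, 3.0],
     [1.1, 3.5, 3.0],
     [2.5, 3.6, 1.0]]
d = [0.5, 0.5, 0.5]
X = transition_states(z, d)
print(X)
```

Each row of `X` is one network transition state X(t); the first sampling point has no predecessor and therefore produces no row.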
Next, we define a local network structure that is centered on each node, which is the basis for deriving the network entropy, i.e., the SNE. We assume that node i has m linked first-order neighbor nodes i_{1}, …, i_{m} (see Fig. 2b), which form a local network centered on node i with local transition state X^{i}(t) = (x_{i}(t), x_{i_{1}}(t), …, x_{i_{m}}(t)) at t. Clearly, based on the current state X^{i}(t) = A_{1} at time t, there is a total of 2^{m+1} possible state transitions (or possible transition states) X^{i}(t+1), which are denoted as {A_{u}}, u ∈ {1, 2, …, 2^{m+1}}, for this local network at the next time point t + 1 (see Supplementary Information B for details). To simplify the notation, we omit i and denote X^{i}(t) as X(t), while we also denote the transition state simply as the state.
Based on the network structure, we can derive the Markov matrix P = (p_{u,v}), where p_{u,v} describes the transition rate from state v to state u as follows:
p_{u,v} = Pr(X(t + 1) = A_{u} | X(t) = A_{v}), (1)
where u, v ∈ {1, 2, …, 2^{m+1}} and Σ_{u} p_{u,v} = 1 (see Supplementary Information A for detailed definition and derivation). Thus, we derive the following stochastic Markov process for X(t):
{X(t + i)}_{i=0,1,…} = {X(t), X(t + 1), …, X(t + i), …}, (2)
with X(t + i) = A_{u}, u ∈ {1, 2, …, 2^{m+1}}.
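As a rough illustration of the state space and the Markov matrix, the following Python sketch indexes the local binary states and estimates transition frequencies from an observed sequence of local states. This is only an assumption-laden stand-in: the paper derives p_{u,v} from the network structure (Supplementary Information A), whereas the sketch simply counts observed transitions. The function names, the 0-based state indices, and the uniform fallback for unvisited states are all hypothetical choices.

```python
from typing import List, Sequence


def encode_state(bits: Sequence[int]) -> int:
    """Map a local state (x_i, x_i1, ..., x_im) to an index 0 .. 2^(m+1) - 1."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def empirical_transition_matrix(seq: Sequence[int],
                                n_states: int) -> List[List[float]]:
    """Estimate P[u][v], the frequency of moving from state v to state u.

    Columns sum to one, matching the convention that p_{u,v} is the
    rate from v to u. Assumption: a state that is never left in the
    data gets a uniform column.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for v, u in zip(seq, seq[1:]):
        counts[u][v] += 1
    P = [[0.0] * n_states for _ in range(n_states)]
    for v in range(n_states):
        total = sum(counts[u][v] for u in range(n_states))
        for u in range(n_states):
            P[u][v] = counts[u][v] / total if total else 1.0 / n_states
    return P


# Toy local network with m = 1 neighbor, hence 2^(m+1) = 4 states.
observed = [(0, 0), (0, 1), (1, 1), (0, 1), (1, 1), (0, 0)]
sequence = [encode_state(s) for s in observed]
P = empirical_transition_matrix(sequence, 4)
```

With only a handful of samples, such frequency estimates are very coarse, which is one reason a structure-based derivation is preferable in practice.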
SNE and robustness
We assume a stationary Markov process for each local network centered on node i with its m linked first-order neighbor nodes i_{1}, i_{2}, …, i_{m}, and we define the SNE at node i as
H_{i}(t) = −Σ_{u,v} π_{v} p_{u,v} log p_{u,v}, (3)
where π = (π_{1}, …, π_{2^{m+1}}) is the stationary distribution that satisfies Σ_{v} p_{u,v} π_{v} = π_{u}. We can easily prove that H_{i}(t) = H(X(t) | X(t − 1)), i.e., it is a conditional entropy. Moreover, we can prove (see^{18,19}):
H_{i}(t) = lim_{T→∞} (1/T) H(X(t), X(t + 1), …, X(t + T)), (4)
where H(X(t), X(t + 1), …, X(t + T)) is the entropy of {X(t), …, X(t + T)}. In other words, H_{i}(t) is actually an entropy rate. The detailed derivation is given in Supplementary Information B.
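A minimal numerical sketch of this entropy is given below. It assumes the standard conditional-entropy form for a stationary Markov chain, H = −Σ_{u,v} π_{v} p_{u,v} log p_{u,v}, with a column-stochastic matrix (p_{u,v} is the rate from state v to state u). The function names are hypothetical, and the damped power iteration used to find the stationary distribution is a numerical convenience rather than part of the method.

```python
import math
from typing import List

Matrix = List[List[float]]


def stationary_distribution(P: Matrix, tol: float = 1e-12,
                            max_iter: int = 100000) -> List[float]:
    """Approximate pi with P pi = pi for a column-stochastic P.

    Iterates pi <- (pi + P pi) / 2, which has the same fixed point as
    P but avoids oscillation for periodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(P[u][v] * pi[v] for v in range(n)) for u in range(n)]
        new = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def sne(P: Matrix) -> float:
    """Conditional entropy of the next state given the current state."""
    pi = stationary_distribution(P)
    n = len(P)
    h = 0.0
    for v in range(n):
        for u in range(n):
            if P[u][v] > 0.0:
                h -= pi[v] * P[u][v] * math.log(P[u][v])
    return h


uniform = [[0.5, 0.5], [0.5, 0.5]]   # next state is unpredictable
swap = [[0.0, 1.0], [1.0, 0.0]]      # next state is fully determined
biased = [[0.9, 0.5], [0.1, 0.5]]
h_uniform = sne(uniform)
h_swap = sne(swap)
pi_biased = stationary_distribution(biased)
```

The two extreme cases bracket the behavior discussed in the text: an unpredictable local network has maximal entropy (log 2 for two states), while a fully determined one has zero entropy.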
Thus, for a network with n nodes, the average SNE at t is given by
H(t) = (1/n) Σ_{i=1}^{n} H_{i}(t). (5)
A significant feature is that the SNE can be proven to be positively correlated with the robustness or resilience of the network, which is defined as its capacity to withstand random changes (see Supplementary Information B for detailed descriptions). Thus, a higher H_{i}(t) is correlated with a more robust local system. At the critical transition, the system undergoes a qualitative structural change at which H_{i}(t) reaches its lowest value, with H_{i}(t) → 0. Thus, the SNE can quantitatively measure the structural stability and robustness of the system and detect the critical transition. It is worth noting that the sharp decrease of network robustness coincides with the critical decline of system resilience^{20,21} when the system approaches a tipping point, a generic dynamical phenomenon known as “critical slowing down”^{22,23}.
Clearly, the SNE has two major differences from other entropy definitions. (a) First, it is a state-transition-based entropy, i.e., it depends on the state transitions of a dynamical network, rather than the special dynamics of a random walk among nodes. (b) Second, it is a local-network-based entropy, i.e., it depends on the local structure of a dynamical network. Based on these two features, we can evaluate the system robustness in an accurate (a) and efficient (b) manner, even with noisy data and a small number of samples. We also adopt the transition state variables in the SNE, rather than the original variables, which characterize the main differences between normal and pre-disease states.
Identifying critical transitions and their leading networks using SNE
Next, we describe our main theoretical results related to the identification of critical transitions and their leading networks during disease progression using the SNE. Note that identifying a critical transition in this paper means identifying the state (or early signal) just before the critical transition, rather than the critical transition phenomenon.
It can be proved that there is a group of molecules (i.e., genes or proteins), known as the dominant group^{11}, that satisfy all of the following conditions in terms of state transition variables (see the proof in Theorem 2 in Supplementary Information A) for any two samples (i.e., at time t and time t + T).
When the system is in the normal state, the following result holds.
For any two nodes i and j (including i = j) in the network, x_{i}(t+T) is statistically independent of x_{j}(t).
When the system approaches a critical transition point, the following result holds.
If both i and j are in the dominant group or the DNB, there is a strong correlation between x_{i}(t + T) and x_{j}(t);
If neither i nor j (including i = j) is in the dominant group, x_{i}(t + T) is statistically independent of x_{j}(t).
The network containing this dominant group is known as the DNB^{11} or the leading network of the critical transition, which makes the first move from the normal state toward the disease state during the critical transition.
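The contrast between independent and strongly correlated transition variables can be checked numerically. The sketch below uses a plain Pearson correlation between x_{i}(t + T) and x_{j}(t) along a single series of transition states; treating one series as the sample set, the function name `lagged_correlation`, and returning 0.0 when a series is constant are all assumptions made for illustration.

```python
import math
from typing import Sequence


def lagged_correlation(xi: Sequence[int], xj: Sequence[int], T: int) -> float:
    """Pearson correlation between x_i(t + T) and x_j(t) over all valid t.

    Returns 0.0 when either series is constant (assumption: a constant
    series carries no correlation information).
    """
    a = list(xi[T:])
    b = list(xj[:len(xj) - T])
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Toy series: node i repeats the transition state of node j one step later.
xj = [0, 1, 1, 0, 1, 0, 0, 1]
xi = [0] + xj[:-1]
r_follow = lagged_correlation(xi, xj, 1)
r_constant = lagged_correlation([1] * 8, xj, 1)
```

For two nodes inside the dominant group near the critical point, one would expect values of this kind to be large in magnitude, whereas in the normal state they should be close to zero.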
Based on the DNB, we can decompose all of the genes in a network into four layers (see Fig. 2a), i.e., DNB core genes, DNB boundary genes, non-DNB boundary genes, and non-DNB core genes. Calculating the SNE using Eq.(3), we can show that the following results hold when a system approaches a critical transition (see Methods and Supplementary Information C for detailed analysis):
for DNB core genes (type-1, red nodes in Fig. 2a), the SNE drastically decreases;
for DNB boundary genes (type-2, orange nodes in Fig. 2a), the SNE decreases;
for non-DNB boundary genes (type-3, blue nodes in Fig. 2a), the SNE decreases;
for non-DNB core genes (type-4, purple nodes in Fig. 2a), the SNE remains almost constant.
Note that the DNB or the leading network is composed of DNB core and DNB boundary genes, whereas non-DNB boundary genes are first-order neighbors of the DNB. The DNB core, DNB boundary, and non-DNB boundary genes form an SNE network, although most genes among the whole biological system are generally expected to be non-DNB core genes.
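The four-layer decomposition can be sketched in Python once a DNB member set is known. The text states that non-DNB boundary genes are first-order neighbors of the DNB; the sketch additionally assumes that a DNB gene is a boundary gene when it has at least one non-DNB neighbor and a core gene otherwise. That split, the function name, and the toy chain network are hypothetical.

```python
from typing import Dict, Set


def decompose_layers(adjacency: Dict[str, Set[str]],
                     dnb: Set[str]) -> Dict[str, str]:
    """Assign every node to one of the four layers.

    Assumed rules: DNB nodes with a non-DNB neighbor are "DNB boundary",
    other DNB nodes are "DNB core"; non-DNB nodes with a DNB neighbor
    are "non-DNB boundary", the rest are "non-DNB core".
    """
    layers = {}
    for node, neighbors in adjacency.items():
        touches_dnb = any(n in dnb for n in neighbors)
        touches_other = any(n not in dnb for n in neighbors)
        if node in dnb:
            layers[node] = "DNB boundary" if touches_other else "DNB core"
        else:
            layers[node] = "non-DNB boundary" if touches_dnb else "non-DNB core"
    return layers


# Toy chain a - b - c - d - e with DNB members {a, b}.
adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
layers = decompose_layers(adjacency, {"a", "b"})
```

Only local neighborhood information is needed for each node, which mirrors the locality of the SNE itself.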
Based on the above results, although the average SNE (or the summed SNE) of the entire network decreases as the system approaches a critical transition, using all of the genes is not an efficient strategy for identifying the critical transition and the leading network because of noisy data and measurement errors. To obtain a clear early-warning signal, we can intentionally select those genes whose SNEs decrease or drastically decrease, and then calculate the average SNE using Eq.(5). In this manner, we can reduce the effects of noise and data errors (see Supplementary Information D), and greatly improve the sensitivity for detecting the early warning signal. Close to the critical transition, the average SNE of these genes will decrease drastically, thereby providing a clear early warning signal. It should be noted that genes with decreasing SNEs are DNB genes (DNB core genes and DNB boundary genes) and the first-order neighbors of the DNB (non-DNB boundary genes), which form an SNE network that covers the leading network (see Fig. 2a). The SNE network provides sufficient information for studying the leading network, but further clustering of these genes based on the SNEs and the three conditions of the DNB can identify accurate gene sets for the leading network or the DNB-core network (see Supplementary Information F). To summarize the above theoretical result on the early warning signal by SNE, we have the following statement.
Drastic decrease of the average SNE is the early warning signal of the critical transition.
In other words, the drastic decrease of the average SNE on the DNB implies the emergence of the critical transition or the pre-disease state. Finally, it should be noted that in this paper we aim to identify the leading network that first moves into the disease state driven by known or unknown factors.
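The selection step described above can be sketched as follows. Per-gene SNE values are assumed to be available at each sampling point, and a gene is selected at time t when its SNE has dropped by more than a cutoff since the previous point. The cutoff `min_drop`, the function name, and the toy values are hypothetical illustration devices; the actual procedure is described in Supplementary Information E.

```python
from typing import Dict, List, Optional, Tuple


def average_sne_of_decreasing(sne_by_gene: Dict[str, List[float]],
                              t: int,
                              min_drop: float) -> Tuple[List[str], Optional[float]]:
    """Select genes whose SNE fell by more than min_drop from t-1 to t.

    Returns the sorted gene list and the average SNE of those genes at
    time t, or ([], None) when no gene qualifies.
    """
    selected = sorted(g for g, h in sne_by_gene.items()
                      if h[t - 1] - h[t] > min_drop)
    if not selected:
        return [], None
    average = sum(sne_by_gene[g][t] for g in selected) / len(selected)
    return selected, average


# Toy SNE trajectories over four sampling points (illustrative values only).
sne_by_gene = {
    "g1": [1.0, 0.9, 0.2, 0.8],
    "g2": [1.1, 1.0, 0.3, 0.9],
    "g3": [1.0, 1.0, 1.0, 1.0],
    "g4": [0.9, 1.0, 0.95, 1.0],
}
genes_t2, avg_t2 = average_sne_of_decreasing(sne_by_gene, 2, 0.3)
genes_t1, avg_t1 = average_sne_of_decreasing(sne_by_gene, 1, 0.3)
```

Averaging only over the selected genes keeps the many nearly constant genes (such as `g3` in the toy data) from diluting the signal.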
Numerical simulation
To demonstrate the effectiveness of the SNE, we used a six-node gene regulatory network (shown in Fig. 3a) to show the SNE for each gene and the average SNE. Detailed descriptions of the network represented by a set of stochastic differential equations are provided in Supplementary Information E, and numerical simulations are provided in Fig. 3. The numerical simulation shows that a drastic change (or sharp decrease) in the SNE indicates the emergence of a critical transition, which validates that the SNE can serve as a general indicator by detecting an abrupt catastrophic change in the system and the leading network (z_{1}, z_{2}, z_{3}).
Application to complex diseases
We further analyzed the prediction of two complex diseases using high-throughput experimental data, i.e., microarray data for HCV-induced dysplasia and hepatocellular carcinoma (HCC)^{24}, and lung injury after carbonyl chloride inhalation exposure^{25}. Figure 4 shows the identified pre-transition states and the leading networks just before the critical deteriorations by our SNE-based method, which agreed with the observed biological phenotypes described in the original datasets^{24,25}.
Figures 4a–b show the SNE-based prediction results, i.e., the minima of the average SNEs indicate the critical transition points (i.e., sampling time point 4 for HCC and sampling time point 4 for lung injury). For HCC, Figs. 4 c–f show the dynamics of the difference SNE for the whole human molecular network and the leading network (or DNB) with their functional interactions (protein-protein interactions and TF-target regulations), which clearly shows that the sudden deterioration was near the “very early HCC” stage at which the average SNE of the identified leading network sharply decreases to the minimum (also see the whole network for other periods in Supplementary Information Fig. S9). Clearly, the members of the DNB (or the leading network) behaved in a significantly different way from other genes, but only near the “very early HCC” stage (e), which indicates imminent deterioration (e.g., metastasis). Interestingly, members of the leading network behaved similarly to other genes after the system moved to the deteriorated state, i.e., the early HCC stage (f). Thus, the leading network is not a disease biomarker but a pre-disease (or pre-deterioration) biomarker because it only appeared during the “pre-disease” stage. Figures 4 g–j show the dynamics of the difference SNE in the whole mouse molecular network during the evolution of lung injury, which clearly shows the significance of the leading network (DNB) in terms of expression variations and network structures near the critical state (8 h), i.e., the average SNE of the identified leading network sharply decreases to the minimum at 8 h (also see the whole network for other periods in Supplementary Information Fig. S10). Prior to the disease state, there were no significant differences between the DNB (the leading network) members and other genes during all periods (g and h), with the exception of 8 h when the DNB members behaved very differently in terms of their SNEs, i.e., they decreased sharply.
However, after the system was driven into the disease state (j), interestingly the DNB members appeared to behave in a manner similar to other genes again. This was consistent with our previous results^{11} and the observed biological phenotypes (see the experimental descriptions provided in Supplementary Information F). The results in the two diseases show the effectiveness of our method in detecting the early signal of critical transitions and their leading networks on the basis of small samples.
The detailed algorithm and data description are provided in Supplementary Information E and F, respectively. To explore the biological implication of the leading network where the SNEs decrease sharply, we conducted functional analysis for the identified genes (or proteins) in each of the two datasets (see Methods and Supplementary Information H). In particular, we used HCC liver cancer as a concrete example to explain the calculation procedure in detail (see Supplementary Information F).
To verify the biological significance of the identified leading network, we also carried out bootstrap and cross validation analysis of the two diseases individually (described in Supplementary Information G). For phosgene inhalation lung injury (acute lung injury), and HCV-induced dysplasia and HCC liver cancer, some enriched gene ontology (GO) functions and dysfunctional pathways underlying the leading networks are listed in Table 1. Some of the identified members of the leading network are also shown in Table 1 (see Supplementary Table ‘Identified extended leading network’ for complete lists). The detailed descriptions are presented in Supplementary Information H.
The functional analysis shows that the members of the leading network are highly relevant to the corresponding complex diseases, which validates the effectiveness of our method. In the HCC study, we found that many genes included in the identified leading network were consistent with the response to HCV infection in vivo, especially the activation of the immune system and the dysfunctions associated with basic cell metabolism of hosts^{26,27,28}. The results of GO and pathway enrichment analyses are provided in Supplementary Table ‘KEGG enrichment analysis’. These results show that the genes of the leading network played significantly important roles during disease development. In the enrichment analysis, the most enriched functions indicate its significant relationship with disease evolution. At the pathway level, the pathways in cancer and Hepatitis C appear to be significant, which provides clear evidence that most of the genes selected by the SNE are directly related to HCC. Some enriched pathways are related to the dysfunctions in basic cell metabolism, which implies the reproduction and release of HCV. Some pathways show the common characteristics of cancer, especially the signaling pathways involved in cell growth, such as transcriptional misregulation in cancer, purine metabolism, the Wnt signaling pathway, and the TGF-beta signaling pathway. These dysfunctional pathways indicate the cell status when HCV invades the host cells. HCV readily uses the host resources to replicate its genetic material (RNA) for viral replication. At the GO function level, some genes were also related to important biological processes beyond the pathways mentioned above. For example, CLU, IL1B, and TNF lead to an inflammatory response^{27}. The regulations of anti-apoptosis and growth are also dysfunctional^{28}.
However, in order to reproduce and release, the functional interruption of RNA biosynthetic processes and gene expression are enriched in the genes CD81, POLR1A, POLR1E, TCERG1, and AR, while transport activities are enriched in DRD2, PPARG, JPH2, and SNCA. They are often used for releasing new viruses after their reproduction. We also compared the members of the identified leading network with those reported in a previous study^{28}. We validated that 5 out of the 167 genes (hypergeometric test, p-value < 0.02) had a close correspondence with HCC induced by HCV infection during the dysplastic stage and the very early stage of HCC. A detailed functional analysis of lung injury is given in Supplementary Information H.
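For readers unfamiliar with the hypergeometric test used in such overlap comparisons, a minimal Python sketch of the upper-tail p-value follows. The function name and the toy numbers are hypothetical; the background size and reference list size used in the actual comparison are not stated here, so no attempt is made to reproduce the reported p-value.

```python
from math import comb


def hypergeom_upper_tail(population: int, successes: int,
                         draws: int, observed: int) -> float:
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items of which `successes` are marked."""
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(observed, upper + 1))
    return tail / total


# Toy numbers only: 10 genes in the background, 4 in a reference list,
# 3 selected genes, 2 of which overlap with the reference list.
p_toy = hypergeom_upper_tail(10, 4, 3, 2)
```

A small p-value indicates that the observed overlap is unlikely under random selection from the background.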
Discussion
In this study, we proposed a new method with the SNE for identifying critical transitions and their leading networks during disease progression. From the viewpoints of both theoretical analysis and numerical computation, we demonstrated that the SNE can be used to characterize the dynamical behavior of a system and provide a general indicator for the detection of the DNB when a system reaches a critical transition point (see Fig. 1 and Fig. 3).
By only choosing the genes with SNEs that decreased drastically, we identified the leading network during a critical transition that made the first move toward a disease state, or a deteriorated state from a nonlinear dynamics perspective. The identification of the leading network using a small number of samples is vital for the early diagnosis of diseases and elucidating the essential mechanisms of disease deterioration at the network level. The leading networks are also related to causal genes or networks, so they can provide a new type of biomarkers for the detection of pre-disease states in a dynamical manner (Figs. 1 c and g), which contrasts with traditional gene- or protein-based biomarkers that evaluate systems in a generally static manner. It is notable that SNE networks satisfy the three conditions of DNBs in the vicinity of a critical point (see reference 11 and their derivation in Supplementary Information B), so these results are compatible with the DNB theory^{11}.
The SNE has several advantages. First, the SNE method is efficient because its calculation requires only local information for a local network centered on a specific node. By focusing on the local structure and listing all of the possible state transitions for each node in a network, we can obtain its SNE. Second, we ignore nodes where the SNE increases or changes little, which facilitates a significant reduction in the effect of noise, data errors, and the computational complexity. Third, the SNE can detect the early signal of the critical transition and identify its leading network, which facilitates the elucidation of the essential mechanisms during disease development and progression at the network level. Compared with correlation-based methods that can only describe linear dependency relationships between any two nodes, our state-transition-based SNE describes nonlinear relationships among nodes and the collective behavioral dynamics of a group of nodes. Fourth, the SNE algorithm is simple to implement and it may be viewed as a DNB-free method, although the theoretical background of the SNE is based on the DNB. Finally, the SNE can also be considered as a measurement of resilience and robustness for the corresponding local network, which can be applied directly to the dynamical and structural analysis of network rewiring during disease progression in many biomedical problems. We applied our method to two diseases, i.e., lung injury and liver cancer, which demonstrated its effectiveness and efficiency for identifying critical transitions and their leading networks.
One potential application of the DNB in medicine is the early diagnosis of complex diseases. As demonstrated in our previous work and also in this paper, it is difficult to distinguish the normal and pre-disease states by one sample (e.g., by only one health examination in one year) although it is possible to detect the disease state. But with a number of consecutive samples (e.g., with several health examinations in one year), we can clearly detect the pre-disease state before the drastic deterioration, thus opening a new way to predictive and preventive medicine (and even personalized medicine). Another potential application is to detect the critical transition of a specific biological process (e.g., cell differentiation or cell proliferation process) in biology since our theoretical result can be essentially applied to identify the early state before any drastic change of a biological system (or state change).
Methods
Identifying critical transitions and their leading networks using SNE
Dynamical networks
The theoretical results were derived by considering the following discrete-time system, subject to noise perturbations, near its equilibrium:
Z(k + 1) = f(Z(k); P). (6)
We assume that {λ_{1}(P), …, λ_{n}(P)} are the eigenvalues of the Jacobian matrix of f at the equilibrium, with each λ_{i} between 0 and 1. Among the eigenvalues, the largest in terms of its modulus, say λ_{1}, approaches 1 when parameter P → P_{c}. This eigenvalue characterizes the system's rate of change around a fixed point and it is known as the dominant eigenvalue. In the overall network, nodes that are influenced directly by the dominant eigenvalue λ_{1} are known as the dominant group members, the leading network, or the DNB (see Supplementary Information B), i.e., a group of nodes that make the first move toward the disease state thereby indicating the approach of a sudden deterioration. Thus, in the network shown in Fig. 2a, the nodes or genes can be categorized into four groups based on the topological structure of the DNB, i.e., DNB core genes, DNB boundary genes, non-DNB boundary genes, and non-DNB core genes.
Dynamical evolution of state transitions
We consider the linearized equations of Eq.(6) with the Jacobian matrix of f at the equilibrium, which has distinct real eigenvalues (λ_{1}(P), …, λ_{n}(P))^{29}. After introducing new variables Y(k) = (y_{1}(k), …, y_{n}(k)) and a transformation matrix S, i.e., ΔY(t) = S^{−1}(Z(t) − Z(t − 1)) (Supplementary Information A2), we have
Y(k + 1) = Λ(P)Y(k) + ζ(k), (7)
where Λ(P) = diag(λ_{1}(P), …, λ_{n}(P)) is the diagonalized form of the Jacobian matrix, ζ(k) = (ζ_{1}(k), …, ζ_{n}(k)) are Gaussian noises with zero means and covariances κ_{ij} = Cov(ζ_{i}, ζ_{j}), and y_{1} is the eigenvector of the dominant eigenvalue λ_{1}.
For any integer T > 0, Eq. (8) holds, where ε_{i}(t) is a small white noise (see Supplementary Information A2).
Therefore, given that ΔZ(t) = SΔY(t), we can use Eq. (8) to derive the main results on the dynamical evolution of the original variable ΔZ and the local transition state X (see also the derivations in Supplementary Information A). The detailed derivations of the dynamics for the four types of nodes at different stages are presented in Supplementary Information C.
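As a worked illustration of how a diagonalized linear system evolves over T steps, the following derivation assumes that each decoupled mode obeys y_i(k+1) = λ_i y_i(k) + ζ_i(k) with noise that is independent across time steps. This generic form is an assumption made here for illustration; it is not a reproduction of the numbered equations of this paper, which are given in Supplementary Information A.

```latex
% Assumed mode dynamics (illustrative, not the paper's numbered equation):
%   y_i(k+1) = \lambda_i y_i(k) + \zeta_i(k)
\begin{align}
  y_i(t+T) &= \lambda_i^{T}\, y_i(t)
              + \sum_{j=0}^{T-1} \lambda_i^{\,T-1-j}\, \zeta_i(t+j), \\
  \operatorname{Var}\!\Big[\sum_{j=0}^{T-1} \lambda_i^{\,T-1-j}\, \zeta_i(t+j)\Big]
           &= \kappa_{ii} \sum_{m=0}^{T-1} \lambda_i^{2m}
            = \kappa_{ii}\, \frac{1-\lambda_i^{2T}}{1-\lambda_i^{2}}, \\
  \lim_{\lambda_1 \to 1} \kappa_{11}\, \frac{1-\lambda_1^{2T}}{1-\lambda_1^{2}}
           &= T\, \kappa_{11}.
\end{align}
```

Under this assumption, modes with λ_i well below 1 forget their initial condition quickly and keep a bounded noise term, whereas the dominant mode retains both its initial deviation and the accumulated noise as λ_1 approaches 1.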
Data preprocessing and functional analysis
Data processing
Two gene expression profiling datasets were downloaded from the NCBI GEO database (accession numbers GSE6764 and GSE2565) (www.ncbi.nlm.nih.gov/geo). Probe sets without corresponding gene symbols were excluded from the analysis, and the expression values of probe sets mapped to the same gene were averaged.
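The probe-level preprocessing described above can be sketched as follows. The function name `collapse_probes`, the dictionary-based data layout, and the toy probe IDs and values are hypothetical and are used only to make the two steps (dropping unannotated probe sets, averaging per gene) concrete.

```python
from collections import defaultdict


def collapse_probes(probe_values, probe_to_gene):
    """Average expression vectors of probe sets that map to the same gene.

    probe_values:  dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene symbol ('' or None when unannotated)
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_values.items():
        gene = probe_to_gene.get(probe_id)
        if not gene:  # skip probe sets without a gene symbol
            continue
        grouped[gene].append(values)
    # Column-wise mean across all probe sets assigned to each gene
    return {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Toy data (hypothetical probe IDs and expression values)
probe_values = {"p1": [2.0, 4.0], "p2": [4.0, 8.0], "p3": [1.0, 1.0], "p4": [9.0, 9.0]}
probe_to_gene = {"p1": "TP53", "p2": "TP53", "p3": "MYC", "p4": None}
gene_expr = collapse_probes(probe_values, probe_to_gene)
print(gene_expr)
```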
In each disease dataset, the expression profiling information was mapped to the integrated networks individually. For each species, we downloaded biomolecular interaction networks from several databases, including BioGrid (www.thebiogrid.org), TRED (www.rulai.cshl.edu/cgi-bin/TRED/), KEGG (www.genome.jp/kegg), and HPRD (www.hprd.org). First, the available functional linkage information for Mus musculus and Homo sapiens was downloaded from these databases and combined. After removing redundant entries, we obtained 65,625 linkages among 11,451 human proteins/genes and 37,950 linkages among 6,683 mouse proteins/genes. Next, the genes measured in the microarray datasets were mapped individually to these integrated functional linkage networks. The leading networks (LNs) for the two diseases were then identified using the proposed SNE algorithm. In total, the leading network identified for HCC contained 167 proteins, including four transcription factors (TFs), and the leading network identified for acute lung injury contained 182 proteins, including 16 TFs. These networks were visualized using Cytoscape (www.cytoscape.org).
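A minimal sketch of the network integration step is given below. It assumes that every linkage can be treated as an undirected pair of gene symbols, which is a simplification (regulatory links such as those in TRED are directed); the function names and the toy edge lists are hypothetical.

```python
def merge_networks(*edge_lists):
    """Union of undirected edges from several sources, with duplicates removed."""
    merged = set()
    for edges in edge_lists:
        for a, b in edges:
            # Sort the endpoints so that (A, B) and (B, A) count as one linkage
            merged.add(tuple(sorted((a, b))))
    return merged


def restrict_to_measured(edges, measured_genes):
    """Keep only linkages whose two endpoints were both measured on the array."""
    measured = set(measured_genes)
    return {(a, b) for a, b in edges if a in measured and b in measured}


# Toy edge lists standing in for two source databases (hypothetical content)
source_1 = [("A", "B"), ("B", "C")]
source_2 = [("B", "A"), ("C", "D")]
network = merge_networks(source_1, source_2)
mapped = restrict_to_measured(network, ["A", "B", "C"])
print(sorted(network), sorted(mapped))
```

Storing each linkage as a sorted tuple inside a set makes the redundancy removal implicit, so the same edge reported by several databases is counted once.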
Functional analysis of the leading networks
In the high-throughput gene expression profiling datasets for the two diseases, i.e., HCC induced by HCV infection and acute lung injury induced by phosgene gas, we detected the early-warning signals as well as the leading networks using our SNE algorithm and the corresponding protein interaction networks.
It has been reported that TFs whose target genes show high differential expression might be causal factors. Bearing this in mind, we added TFs whose targets were present in the leading networks, as well as proteins or genes adjacent to any member of the leading networks, and treated the results as the extended leading networks. In Supplementary Information H, we describe the data processing method in detail and present the results of the functional analysis (g:Profiler: http://biit.cs.ut.ee/gprofiler/ and NOA: http://app.aporc.org/NOA/)^{30,31} of these leading networks for the two complex diseases.
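The construction of an extended leading network can be sketched as follows, assuming the interaction network is available as a list of gene pairs and the TF annotation as a mapping from each TF to its targets. The function name `extend_leading_network` and the toy identifiers are hypothetical.

```python
def extend_leading_network(leading, edges, tf_targets):
    """Return leading-network members plus their first neighbours and
    the TFs that have at least one target inside the leading network."""
    leading = set(leading)
    extended = set(leading)
    # Add genes/proteins adjacent to any member of the leading network
    for a, b in edges:
        if a in leading:
            extended.add(b)
        if b in leading:
            extended.add(a)
    # Add TFs with at least one target in the leading network
    for tf, targets in tf_targets.items():
        if leading & set(targets):
            extended.add(tf)
    return extended


# Toy inputs (hypothetical gene and TF identifiers)
leading = {"G1", "G2"}
edges = [("G1", "G3"), ("G4", "G5"), ("G2", "G1")]
tf_targets = {"TF1": ["G2", "G9"], "TF2": ["G5"]}
extended = extend_leading_network(leading, edges, tf_targets)
print(sorted(extended))
```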
Based on the functional analysis, we found a close relationship between the members of the leading networks and the complex diseases. Several genes that had been verified in other published reports were also identified. Newly identified genes can be treated as novel biomarker candidates.
We performed a functional analysis of the genes in the leading networks identified by the SNE for HCC and acute lung injury. We also carried out pathway enrichment analysis on the leading networks, and the results were validated by bootstrap and cross-validation analysis (see Supplementary Information G for details). The functional enrichment showed that the leading networks are significantly related to the corresponding diseases. Supplementary Table ‘KEGG Enrichment analysis’ shows the results for the two diseases (see Supplementary Information H).
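The enrichment analysis in this study was performed with g:Profiler and NOA. As a generic stand-in that shows the kind of calculation underlying pathway enrichment, the sketch below computes a one-sided hypergeometric p-value; it is not the procedure implemented by either tool. The function name is hypothetical, and the pathway size and overlap in the usage line are invented for illustration; only the background size (11,451 human proteins/genes) and the leading-network size (167) come from the text above.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, sample_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, pathway_size, sample_size).

    population:   number of genes in the background network
    pathway_size: number of background genes annotated to the pathway
    sample_size:  number of genes in the leading network
    overlap:      number of leading-network genes annotated to the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Background and leading-network sizes from the text; pathway size (100)
# and overlap (10) are hypothetical values chosen for illustration.
p_value = hypergeom_enrichment_p(11451, 100, 167, 10)
print(p_value)
```

In practice such p-values would be computed for many pathways and corrected for multiple testing; that step is omitted here.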
References
 1.
Hirata, Y., Bruchovsky, N. & Aihara, K. Development of a mathematical model that predicts the outcome of hormone therapy for prostate cancer. J. Theor. Biol. 264, 517–527 (2010).
 2.
Venegas, J. G. et al. Self-organized patchiness in asthma as a prelude to catastrophic shifts. Nature 434, 777–782 (2005).
 3.
Litt, B. et al. Epileptic seizures may begin hours in advance of clinical onset: a report of five patients. Neuron 30, 51–64 (2001).
 4.
McSharry, P. E., Smith, L. A. & Tarassenko, L. Prediction of epileptic seizures: are nonlinear methods relevant? Nature Med. 9, 241–242 (2003).
 5.
Roberto, P. B., Eliseo, G. & Josef, C. Transition models for change-point estimation in logistic regression. Statist. Med. 22, 1141–1162 (2003).
 6.
Paek, S., Chung, H., Jeong, S. & Park, C. Hearing preservation after gamma knife stereotactic radiosurgery of vestibular schwannoma. Cancer 104, 580–590 (2005).
 7.
He, D., Liu, Z., Honda, M., Kaneko, S. & Chen, L. Coexpression network analysis in chronic hepatitis B and C hepatic lesion reveals distinct patterns of disease progression to hepatocellular carcinoma. Journal of Molecular Cell Biology 4, 140–152 (2012).
 8.
Liu, J. K., Rovit, R. L. & Couldwell, W. T. Pituitary Apoplexy. Seminars in Neurosurgery 12, 315–320 (2001).
 9.
Gilmore, R. Catastrophe Theory for Scientists and Engineers, (Dover, 1981).
 10.
Murray, J. D. Mathematical Biology, (Springer, 1993).
 11.
Chen, L., Liu, R., Liu, Z. P., Li, M. & Aihara, K. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci. Rep. 2, 342 (2012).
 12.
Arnol'd, V. I. Dynamical systems V: bifurcation theory and catastrophe theory, (Springer, 1994).
 13.
Murdock, J. Normal forms and unfoldings for local dynamical systems, (Springer, 2003).
 14.
Chen, L., Wang, R., Li, C. & Aihara, K. Modeling Biomolecular Networks in Cells: Structures and Dynamics, (Springer, New York, 2010).
 15.
Demetrius, L., Gundlach, V. & Ochs, G. Complexity and demographic stability in population models. Theoretical Population Biology 65, 211–225 (2004).
 16.
Demetrius, L. & Manke, T. Robustness and network evolution - an entropic principle. Physica A 346, 682–696 (2005).
 17.
Manke, T., Demetrius, L. & Vingron, M. An entropic characterization of protein interaction networks and cellular robustness. J. R. Soc. Interface 3, 843–850 (2006).
 18.
Cover, T. & Thomas, J. Elements of information theory, (Wiley, New Jersey, 2005).
 19.
Gomez-Gardenes, J. & Latora, V. Entropy rate of diffusion processes on complex networks. Phys. Rev. E 78, 065102(4) (2008).
 20.
Gunderson, L. H. Ecological resilience - in theory and application. Ann. Rev. Ecol. Syst. 31, 425–439 (2000).
 21.
Walker, B., Holling, C. S., Carpenter, S. R. & Kinzig, A. Resilience, adaptability and transformability in social-ecological systems. Ecology and Society 9, 1–5 (2004).
 22.
Strogatz, S. H. Nonlinear Dynamics And Chaos: With Applications To Physics, Biology, Chemistry And Engineering, (Addison-Wesley, Reading, MA, 1994).
 23.
Van Nes, E. H. & Scheffer, M. Slow recovery from perturbations as a generic indicator of a nearby catastrophic shift. Am. Nat. 169, 738–747 (2007).
 24.
Wurmbach, E. et al. Genome-wide molecular profiles of HCV-induced dysplasia and hepatocellular carcinoma. Hepatology 45, 938–947 (2007).
 25.
Sciuto, A. M., Phillips, C. S. & Orzolek, L. D. Genomic analysis of murine pulmonary tissue following carbonyl chloride inhalation. Chem. Res. Toxicol. 18, 1654–1660 (2005).
 26.
Bruix, J., Boix, L., Sala, M. & Llovet, J. M. Focus on hepatocellular carcinoma. Cancer Cell 5, 215–219 (2004).
 27.
Farazi, P. A. & DePinho, R. A. Hepatocellular carcinoma pathogenesis: from genes to environment. Nat Rev Cancer 6, 674–687 (2006).
 28.
Wurmbach, E., Chen, Y., Khitrov, G. & Zhang, W. Genome-wide molecular profiles of HCV-induced dysplasia and hepatocellular carcinoma. Hepatology 45, 938–947 (2007).
 29.
Scheffer, M., Bascompte, J., Brock, W. & Brovkin, V. Early-warning signals for critical transitions. Nature 461, 53–59 (2009).
 30.
Reimand, J., Arak, T. & Vilo, J. g:Profiler – a web server for functional interpretation of gene lists (2011 update). Nucleic Acids Res. 39, W307–315 (2011).
 31.
Wang, J., Huang, Q., Liu, Z., Wang, Y., Wu, L., Chen, L. & Zhang, X. NOA: a novel Network Ontology Analysis method. Nucleic Acids Res. 39, e87 (2011).
Acknowledgements
This work was supported by NSFC under grant Nos. 91029301, 61134013, and 61072149, by the Chief Scientist Program of SIBS of CAS under grant No. 2009CSP002, and by the Knowledge Innovation Program of SIBS of CAS under grant No. 2011KIP203. This work was also supported by the Shanghai Pujiang Program, 973 Program No.2011CB910201, the National Center for Mathematics and Interdisciplinary Sciences of CAS, and the Aihara Project, the FIRST program from JSPS initiated by CSTP. We thank Yoshito Hirata for his comment on cross validation.
Author information
Affiliations
Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan
Rui Liu, Luonan Chen & Kazuyuki Aihara
Department of Mathematics, South China University of Technology, Guangzhou 510640, China
Rui Liu
Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for Pre-Diabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
Meiyi Li, Zhi-Ping Liu, Jiarui Wu & Luonan Chen
Contributions
L.C., R.L. and K.A. conceived the research. L.C., R.L. and M.L. designed the numerical simulation and the actual experimental data analysis. R.L. and M.L. performed the numerical experiments. Z.P.L., M.L. and J.W. contributed to the functional analysis. All the authors wrote the manuscript.
Competing interests
The authors declare no competing financial interests.
Corresponding author
Correspondence to Luonan Chen.
Supplementary information
PDF files
 1.
Supplementary Information
Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/
About this article
Data and materials availability: Data are available in the National Center for Biotechnology Information Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo) under accession numbers GSE2565 and GSE6764.