Identifying early-warning signals of critical transitions with strong noise by dynamical network markers

Identifying early-warning signals of a critical transition in a complex system is difficult, especially when the target system is constantly perturbed by big noise, which makes traditional methods fail due to the strong fluctuations of the observed data. In this work, we show that when the noise is not sufficiently small, which is a ubiquitous case in real systems, the critical transition is not a traditional state-transition but a probability distribution-transition. We present a model-free computational method to detect the warning signals before such transitions. The key idea is a strategy of "making big noise smaller" by a distribution-embedding scheme, which transforms the data from the observed state-variables with big noise to their distribution-variables with small noise, and thus makes the traditional criteria effective because of the significantly reduced fluctuations. Specifically, by increasing the dimension of the observed data through moment expansion, which changes the system from state-dynamics to probability distribution-dynamics, we derive new data in a higher-dimensional space but with much smaller noise. Then, we develop a criterion based on the dynamical network marker (DNM) to signal the impending critical transition using the transformed higher-dimensional data. We also demonstrate the effectiveness of our method in biological, ecological and financial systems.

For a complex dynamical system with multiple variables or a network, assume that we measure the variables at different time points. In this section, we theoretically introduce several generic properties of such a dynamical network when the system approaches a critical transition point.
Specifically, based on bifurcation theory and center manifold theory (6,7), we derive the conditions for obtaining a dynamical network marker (DNM) in a general system, or a dynamical network biomarker (DNB) in a biological system, which can characterize the generic properties and predict the critical transition.
We consider the following discrete-time dynamical system that represents the dynamical evolution of a network:

Z(t + 1) = f(Z(t); P),   (S1)

where Z(t) = (z_1(t), ..., z_n(t)) is an n-dimensional state vector of variables at time instant t, while P = (p_1, ..., p_s) is a parameter vector of driving factors that represent slowly changing factors, and f : R^n × R^s → R^n is a generally nonlinear function.
Furthermore, we assume that the following conditions hold for Eq.(S1).
1. Z̄ is a fixed point of system (S1) such that Z̄ = f(Z̄; P).
2. There is a value P_c such that one eigenvalue, or a pair of eigenvalues, of the Jacobian matrix ∂f(Z̄; P_c)/∂Z evaluated at Z = Z̄ is equal to 1 in modulus.
3. When P = P_c, the remaining eigenvalues of (S1) are not equal to 1 in modulus.
These three assumptions, together with other transversality conditions (1), imply that the system undergoes a phase change at Z̄, i.e., a codimension-one bifurcation, when P reaches the threshold P_c.

From a mathematical perspective, the bifurcation is generic, i.e., almost all bifurcations in a general system satisfy these conditions. It is notable that most systems described by differential equations can be discretized and transformed into the form of Eq.(S1), e.g., using methods such as the Euler scheme or the Poincaré section. Thus, we focus on the difference equation (S1) in the theoretical analysis of this section.
It is known that the dynamics of a nonlinear system is highly complex far before or after a sudden transition; therefore, the state equations of systems are generally constructed in a very high-dimensional space using a large number of variables and parameters (1-4, 15, 17).
However, if a system driven by known or unknown parameters approaches a critical point, which is a very special phase during its dynamical evolution, it is theoretically guaranteed that the system will eventually be constrained to a one- or two-dimensional space (i.e., the center manifold), which can be expressed in a simple form around a codimension-one bifurcation point (5,6,15,17). This is generally guaranteed by bifurcation theory and center manifold theory (5-8). Thus, we can detect the signal of any dynamical system only during this special phase and not during other periods (i.e., neither the before-transition state nor the after-transition state), which is part of the theoretical foundation of this study (15,17).
For system (S1) near Z̄ and before P reaches P_c, we assume that the system is at a stable fixed point Z̄, so all of the eigenvalues are within (0, 1) in modulus. The parameter value P_c at which the state shift of the system occurs is known as a bifurcation parameter value or a critical transition value.
This theoretical result was derived by considering the linearized system or equations for Eq.(S1) under small noise perturbations near Z̄. Specifically, by introducing the new variables Y(t) = (y_1(t), ..., y_n(t)) and a transformation matrix S, i.e.,

Z(t) − Z̄ = S Y(t),

we have

Y(t + 1) = Λ(P) Y(t) + ζ(t),

where Λ(P) is the diagonalized matrix of ∂f(Z̄; P)/∂Z evaluated at Z = Z̄, and ζ(t) = (ζ_1(t), ..., ζ_n(t)) are small Gaussian noises with zero means. We denote by σ_i the small standard deviation of ζ_i for all i. Without any loss of generality, the diagonalized matrix is Λ(P) = diag(λ_1(P), ..., λ_n(P)), where each λ_i is an eigenvalue of the Jacobian matrix. Among the eigenvalues of Λ, the largest one in modulus, say λ_1, approaches 1 in modulus as the parameter P → P_c. It should be noted that there may be more than one real dominant eigenvalue (the case of multiple roots), for which the derivation is similar (15); thus we only discuss the case with a unique real dominant eigenvalue λ_1, i.e., the generic case. The eigenvalue λ_1 characterizes the system's rate of change around a fixed point and is known as the dominant eigenvalue. The before-transition state corresponds to the period with |λ_1| < 1, whereas the pre-transition stage corresponds to the period with |λ_1| → 1. Without loss of generality, we assume that the first variable y_1 in Y corresponds to λ_1, namely (y_1, 0, ..., 0) is along the eigenvector of λ_1.
With respect to the original state space, noting z_i(k) = s_i1 y_1(k) + · · · + s_in y_n(k) + z̄_i, the variance is

Var(z_i) = s_i1² Var(y_1) + · · · + s_in² Var(y_n).

When s_i1 ≠ 0, lim Var(z_i) → +∞ (here lim denotes the limit as P → P_c).
When s_i1 = 0, lim Var(z_i) → V_i, where V_i is a bounded positive value.

Similarly, for the correlations:
When s_i1 ≠ 0 and s_j1 ≠ 0, lim |PCC(z_i, z_j)| → 1.
When s_i1 ≠ 0 and s_j1 = 0, lim PCC(z_i, z_j) → 0.
When s_i1 = 0 and s_j1 ≠ 0, lim PCC(z_i, z_j) → 0.
When s_i1 = 0 and s_j1 = 0, lim PCC(z_i, z_j) approaches a constant.
Hence, close to such a fixed point, among the original variables Z = (z_1, ..., z_n) there is a dominant group, or a DNM, composed of the variables z_i = s_i1 y_1(k) + · · · with s_i1 ≠ 0. In other words, each DNM member is directly related to y_1. Therefore, for the case of a real dominant eigenvalue, we can prove that there is only one dominant group (i.e., DNM) (see Fig.S1).
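These limits can be checked analytically. The sketch below is an illustrative example of our own (not from the paper): three hypothetical variables z = S y are built from independent AR(1) modes y_i with coefficients λ_i, so that Var(y_i) = σ²/(1 − λ_i²) and Cov(z) = S Cov(y) Sᵀ. As λ_1 → 1, the SDs of dominant-group members (rows with s_i1 ≠ 0) diverge, their mutual |PCC| tends to 1, and their PCC with a non-dominant variable stays near 0:

```python
import numpy as np

# Hypothetical 3-variable illustration (not from the paper): z = S y + z_bar,
# where the modes y_i are independent AR(1) processes
# y_i(k+1) = lam_i * y_i(k) + zeta_i, so Var(y_i) = sigma^2 / (1 - lam_i^2).
def z_stats(S, lams, sigma=0.1):
    var_y = sigma**2 / (1.0 - np.asarray(lams, float)**2)
    cov_z = S @ np.diag(var_y) @ S.T       # Cov(z) = S Cov(y) S^T
    sd = np.sqrt(np.diag(cov_z))
    pcc = cov_z / np.outer(sd, sd)         # Pearson correlation matrix
    return sd, pcc

# z_1, z_2 load on y_1 (s_i1 != 0): they form the dominant group;
# z_3 has s_31 = 0 and stays outside it.
S = np.array([[1.0, 0.3, 0.0],
              [0.5, 0.0, 0.2],
              [0.0, 0.4, 0.6]])

far  = z_stats(S, [0.5,      0.3, 0.2])  # far from the transition
near = z_stats(S, [0.999999, 0.3, 0.2])  # dominant eigenvalue ~ 1

print(near[0][0] > 100 * far[0][0])  # True: SD of a dominant-group member diverges
print(abs(near[1][0, 1]) > 0.999)    # True: |PCC| inside the dominant group -> 1
print(abs(near[1][0, 2]) < 0.05)     # True: PCC with a non-dominant variable -> 0
```

The matrix S and the λ values are arbitrary choices made only to exercise the three cases above.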
For this dominant group, we conclude that there are the following generic properties when the system approaches a critical transition point or tipping point (15).
Theorem 1 We consider a stochastically perturbed linearized system for Eq.(S1). When P approaches the saddle-node or period-doubling bifurcation point, there is a dominant group, and the following results hold (also see Fig.S1).
• If both z_i and z_j are in the dominant group, then |PCC(z_i, z_j)| → 1, while SD(z_i) → ∞ and SD(z_j) → ∞;

• if z_i is in the dominant group but z_j is not, then PCC(z_i, z_j) → 0, while SD(z_i) → ∞ and SD(z_j) approaches a bounded value;
• if neither z_i nor z_j is in the dominant group, then PCC(z_i, z_j) approaches a constant, while both SD(z_i) and SD(z_j) approach bounded values,
where PCC is the Pearson correlation coefficient and SD is the standard deviation.
This dominant group of variables or elements is the DNM. This theorem is part of the theoretical basis for detecting the pre-transition state in multi-variable systems with small noise.
The three conditions in the theorem are actually the criteria used to detect the DNM (15). From the first criterion, each variable in a DNM group fluctuates strongly, owing to the dynamical nature of the DNM; from the second criterion, each pair of variables in the DNM is strongly correlated. In other words, the DNM is a group of variables with strongly collective fluctuations, which indicates the emergence of the critical transition.
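As a hedged sketch of such criteria (not the published DNB algorithm; the function name, window length and toy data are our own), the following computes, over sliding windows, the mean SD within a candidate group, the mean |PCC| within it, and the mean |PCC| between the group and the remaining variables:

```python
import numpy as np

def dnm_scores(X, group, window):
    """Sliding-window DNM-style indicators for data X (rows: time points,
    columns: variables): for a candidate group, return per window the mean
    SD inside the group, the mean |PCC| inside the group, and the mean
    |PCC| between the group and the remaining variables."""
    n_t, n_v = X.shape
    others = [j for j in range(n_v) if j not in group]
    out = []
    for t in range(window, n_t + 1):
        W = X[t - window:t]
        sd_in = W[:, group].std(axis=0).mean()
        C = np.abs(np.corrcoef(W, rowvar=False))
        pcc_in = np.mean([C[i, j] for i in group for j in group if i < j])
        pcc_out = np.mean([C[i, j] for i in group for j in others])
        out.append((sd_in, pcc_in, pcc_out))
    return np.array(out)

# Toy data: variables 0 and 1 share a strong common mode (a stand-in
# for y_1); variable 2 is independent of both.
rng = np.random.default_rng(0)
common = rng.normal(size=300)
X = np.column_stack([common + 0.1 * rng.normal(size=300),
                     common + 0.1 * rng.normal(size=300),
                     rng.normal(size=300)])
scores = dnm_scores(X, group=[0, 1], window=100)
```

In this toy setting the in-group |PCC| stays close to 1 while the out-group |PCC| remains small, which is the signature the criteria look for.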
Actually, for the saddle-node or period-doubling bifurcation, we have proved that there is only one dominant group (i.e., DNM) in the SI of (15) (see Fig.S1). Note that the theorem holds for a linearized system. The DNM is the subnetwork that makes the first move from one state toward the other at the critical transition point; therefore, we refer to the DNM as the leading network in this critical transition (17).

Figure S1: | A sketch for the case of one dominant group (one DNM) just before the critical transition. Under the assumption that the dominant eigenvalue is real, the sketch shows a network with one dominant group when the system approaches the critical point, where the orange nodes represent variables inside the dominant group, while the green nodes are outside. Each edge represents the correlation between two variables. It can be seen that among the orange nodes in the dominant group the correlations are strong, while there is no correlation between a dominant-group member and a non-dominant-group node. Note that, if the system is not near the critical point, there are generally edges between dominant and non-dominant variables, which all belong to one network.
Note that there is only one DNM for any codimension-one bifurcation, except the Neimark-Sacker bifurcation, which corresponds to two DNMs with one quasi-DNM. These two DNMs both exhibit strong SDs but lose the correlations between them (except those involving the quasi-DNM).
Actually, for a higher-codimension bifurcation, there are multiple DNMs, but we can obtain the early-warning signals by detecting any one of them. In particular, the saddle-node bifurcation is a typical catastrophic bifurcation, whose prediction is of great importance, in contrast to the period-doubling bifurcation, which is non-catastrophic. Next, we study the properties of the Neimark-Sacker bifurcation.
By solving the above three equations, we obtain the variances of y_1 and y_2. It is clear that both lim Var(y_1) → +∞ and lim Var(y_2) → +∞ as P approaches the bifurcation point. Next, we analyze the variances and correlations of the original variables based on the above results. For the original variables Z, through Z(k) − Z̄ = S Y(k), we have z_i(k) = s_i1 y_1(k) + s_i2 y_2(k) + · · · + s_in y_n(k) + z̄_i.
• When s_i2 s_j2 ≠ 0, s_i2² ≫ s_i1² and s_j2² ≫ s_j1² (i.e., s_i1/s_i2 ≈ 0 and s_j1/s_j2 ≈ 0), the two variables z_i, z_j are much more related to y_2 than to y_1, and generally they belong to the y_2-related dominant group. Therefore, we have the following critical results (also see Fig.S2).
Therefore, there are two dominant groups, related to y_1 and y_2 respectively (see Fig.S2). The PCC between a y_1-related member z_i and a y_2-related member z_j is either 0 (if z_i and z_j are related only to y_1 and y_2, respectively) or a bounded value (if z_i and z_j are related to y_1 and y_2 simultaneously).
To summarize and state these results clearly, we make the following definitions.
• Dominant group 1: contains the variables z_i with s_i1 ≠ 0 and s_i2 = 0, i.e., variables related only to y_1.
• Dominant group 2: contains the variables z_i with s_i1 = 0 and s_i2 ≠ 0, i.e., variables related only to y_2.
• Common-dominant group: contains the variables z_i with s_i1 ≠ 0 and s_i2 ≠ 0, i.e., variables related to y_1 and y_2 simultaneously.
• Non-dominant group: contains the variables z_i with s_i1 = 0 and s_i2 = 0, i.e., variables related to neither y_1 nor y_2.
Theorem 2 We consider a stochastically perturbed linearized system for Eq.(S1). When P approaches the Neimark-Sacker bifurcation point, the following results hold (also see Fig.S2).
• If both z_i and z_j are in the same dominant group (1 or 2), then |PCC(z_i, z_j)| → 1, while SD(z_i) → ∞ and SD(z_j) → ∞;
• if z_i is in dominant group 1 and z_j is in dominant group 2, then PCC(z_i, z_j) → 0, while SD(z_i) → ∞ and SD(z_j) → ∞;
• if z_i is in a dominant group (dominant group 1 or 2, or the common-dominant group) and z_j is in the non-dominant group, then PCC(z_i, z_j) → 0, while SD(z_i) → ∞ and SD(z_j) approaches a bounded value;

Figure S2: | A sketch for the case of two dominant groups just before the critical transition (Neimark-Sacker bifurcation). Each edge represents the correlation between two variables. It can be seen that among the orange nodes in the same dominant group (1 or 2) the correlations are strong. However, between dominant group 1 (or 2) and the common-dominant group, any two nodes show relatively weak correlation. There is no correlation between dominant group 1 and dominant group 2, or between a dominant-group member and a non-dominant variable. Note that, if the system is not near the critical point, there are generally edges between dominant and non-dominant variables, which all belong to one network.
• if z_i is in the common-dominant group, and z_j is in dominant group k (k = 1, 2) or the common-dominant group, then |PCC(z_i, z_j)| approaches a constant less than 1, while SD(z_i) → ∞ and SD(z_j) → ∞;
• if neither z_i nor z_j is in any dominant group, then |PCC(z_i, z_j)| approaches a constant less than 1, while both SD(z_i) and SD(z_j) approach bounded values,
where PCC is the Pearson correlation coefficient and SD is the standard deviation.
Remark 1 When both z_i and z_j are in the common-dominant group, if their relations to y_1 are much stronger than those to y_2, then z_i and z_j are generally considered to be in dominant group 1. Conversely, if their relations to y_2 are much stronger than those to y_1, then z_i and z_j are generally considered to be in dominant group 2.
Thus, when the members of the common-dominant group are related to y_1 and y_2 in a similar way, e.g., z_i and z_j approximately satisfy s_i1/s_i2 ≈ s_j1/s_j2, then lim |PCC(z_i, z_j)| approaches a constant. Note that the theorem holds for a linearized system.

A.1.3 Dynamical network marker for nonlinear systems near critical transition point with small noise
We presented the critical properties of the linearized system for Eq.(S1). However, for a nonlinear system Eq.(S1) approaching a bifurcation point (the saddle-node or period-doubling bifurcation point) or tipping point, we can directly obtain the properties of the DNM based on Theorem 1, as in the following remark.
Remark 2 For the nonlinear case of Eq.(S1) near a bifurcation point (the saddle-node or period-doubling bifurcation point), the dynamical behavior has the same tendency as that of the linearized case. That is, when the system is approaching the bifurcation point, both indices SD and |PCC| in the DNM increase sharply, while |PCC| between DNM and non-DNM molecules decreases rapidly, i.e., 1. if both z_i and z_j are in the DNM, then PCC(z_i, z_j) increases, while both SD(z_i) and SD(z_j) increase sharply; 2. if z_i is in the DNM but z_j is not, then PCC(z_i, z_j) decreases, while SD(z_i) increases sharply and there is no significant change for SD(z_j); 3. if neither z_i nor z_j is in the DNM, then there are no significant changes in PCC(z_i, z_j), SD(z_i) and SD(z_j).

Remark 3 For the nonlinear case of Eq.(S1) near a Neimark-Sacker bifurcation point, the dynamical behavior has the same tendency as that of the linearized case. That is, when the system is approaching the bifurcation point, both indices SD and |PCC| in DNM k (k = 1, 2) increase sharply. However, for the three cases, i.e., any two members between DNM 1 and DNM 2, between DNM k (k = 1, 2) and non-DNM molecules, and between the quasi-DNM and non-DNM molecules, |PCC| decreases rapidly. On the other hand, for the other two cases, i.e., any two members in the quasi-DNM, and between DNM k (k = 1, 2) and the quasi-DNM, |PCC| shows no significant change. Specifically:

1. If both z_i and z_j are in DNM 1 (or DNM 2), then PCC(z_i, z_j) increases, while both SD(z_i) and SD(z_j) drastically increase; 2. if z_i is in DNM 1 (or DNM 2, or the quasi-DNM) but z_j belongs to non-DNM, then PCC(z_i, z_j) decreases, while SD(z_i) drastically increases, and there is no significant change for SD(z_j); 3. if z_i is in DNM 1 and z_j is in DNM 2, then PCC(z_i, z_j) decreases, while both SD(z_i) and SD(z_j) drastically increase; 4. if z_i is in DNM 1 (or DNM 2) and z_j is in the quasi-DNM, then there is no significant change in PCC(z_i, z_j), while both SD(z_i) and SD(z_j) drastically increase; 5. if both z_i and z_j are in the quasi-DNM, then there is no significant change in PCC(z_i, z_j), while SD(z_i) and SD(z_j) drastically increase; 6. if neither z_i nor z_j is in any DNM, then there are no significant changes in PCC(z_i, z_j), SD(z_i) and SD(z_j).
Note that, for item 5 in Remark 3, the members of the quasi-DNM (the common-dominant group) are related to y_1 and y_2 in a similar way, e.g., z_i and z_j approximately satisfy s_i1/s_i2 ≈ s_j1/s_j2, and thus PCC(z_i, z_j) shows no significant change. As indicated in the theoretical results, there is only one DNM for any codimension-one bifurcation except the Neimark-Sacker bifurcation, which corresponds to two DNMs with one quasi-DNM. These two DNMs both exhibit strong SDs but lose the correlations between them (except those involving the quasi-DNM). Actually, for a higher-codimension bifurcation, there are multiple DNMs, but we can obtain the early-warning signals by detecting any one of them. In particular, the saddle-node bifurcation is a typical catastrophic bifurcation, whose prediction is of great importance, in contrast to the period-doubling bifurcation, which is non-catastrophic.
Based on the above results, by using either multiple-sample data or time-course high-throughput data for complex biological processes, we can identify the corresponding DNMs (or DNBs: dynamical network biomarkers) from the above conditions, e.g., by the algorithm in the following section D or the algorithm described for DNB (15,17). Clearly, this approach is a model-free method, and the DNM can be obtained from the data alone. Note that sample data without time points can also be used to construct the DNM based on the above conditions.
Critical slowing-down (CSD) (16) for single variables has been considered a leading indicator for predicting critical transitions provided that the system is fluctuated by small noise, since it assumes a linear restoring force, i.e., small noise. However, CSD mainly characterizes the dynamics of single variables, in contrast to the DNM, which characterizes the dynamics of multiple variables or a network. Actually, for a single-variable system, the DNM is equivalent to the principle of CSD.
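For reference, the standard single-variable CSD indicators can be sketched as follows (a minimal illustration of the usual estimators; the AR(1)-drift toy example is ours, not the paper's):

```python
import numpy as np

def csd_indicators(x, window):
    """Classical single-variable CSD indicators over sliding windows:
    standard deviation and lag-1 autocorrelation (AR)."""
    sds, ars = [], []
    for t in range(window, len(x) + 1):
        w = np.asarray(x[t - window:t], float)
        sds.append(w.std())
        a, b = w[:-1] - w[:-1].mean(), w[1:] - w[1:].mean()
        ars.append((a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum()))
    return np.array(sds), np.array(ars)

# Toy AR(1) process whose coefficient drifts toward 1: both indicators
# rise as the "transition" (loss of linear stability) is approached.
rng = np.random.default_rng(1)
lam = np.linspace(0.2, 0.98, 4000)
x = np.zeros(4000)
for k in range(1, 4000):
    x[k] = lam[k] * x[k - 1] + 0.1 * rng.normal()
sds, ars = csd_indicators(x, 500)
```

Under small noise, rising SD and lag-1 AR near the end of the series are exactly the CSD signature; under big noise, as the text argues, these raw-state indicators can fail.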

A.2 Three states during a critical transition process
During a state transition, the dynamics of the system can be divided into three stages, as shown in Fig.S3. The before-transition state corresponds to a stable equilibrium; in bio-medical systems, it generally corresponds to a "normal state", a stable period during which the disease is under control. The pre-transition state is the limit of the before-transition state; in bio-medical systems, it represents a "pre-disease state" just before the critical transition to the disease state. The after-transition state corresponds to another stable equilibrium; in bio-medical systems, it represents a seriously ill stage or "disease state", from which it is usually difficult to return to the before-transition state even by large perturbations.
As shown in Table S1, for a one-dimensional system, the before-transition state and the pre-transition state are near each other with similar values, whereas the after-transition state is significantly different from these two states. On the other hand, the variation of the before-transition state (low) is significantly different from that of the pre-transition state (high), but may be similar to that of the after-transition state (low). An analogous characterization holds for a multi-dimensional system.

Figure S3: | Definition of three states in a transition process. The before-transition state is a stable equilibrium. The pre-transition state is the limit of the before-transition state. The after-transition state is another stable equilibrium. (a) Deterministic dynamical evolution (black line) of the system state with time or parameter. (b) Stochastic dynamical evolution (blue line) of system states with time or parameter. In terms of state value (i.e., when it is a deterministic system or we only consider the average value of a stochastic system), there is no significant difference between the before-transition state and the pre-transition state, but the after-transition state is significantly different from the other two states. In terms of variations of the state value (when it is fluctuated by noise or it is a stochastic system), there is a significant difference between the before-transition state and the pre-transition state. In a real system, the system is always fluctuated by noise, and thus the process of a state transition can be characterized by three states. See also Table S1 for the features of the three states in a process of state transition. The green circle represents the pre-transition or critical state, while the red one is the state immediately after the transition. We aim to detect the pre-transition state (green circle), rather than the red circle.

A.3 Distribution embedding near critical transition point with big noise by moment expansion
Note that we do not consider the flickering phenomenon in this paper. In other words, we assume that we only have the observed data near the original stable state before the critical transition to another state. Thus, the probability distribution from the observed data is the conditional distribution for the whole system (e.g., in contrast to the stationary probability distribution), since there are no observed data on the after-transition state. Note also that, based on the features summarized in Table S1, the three states can be distinguished when the system is fluctuated by small noise.
When the system is fluctuated by big noise, the critical point comes far earlier than the bifurcation point, which may make the critical slowing-down principle fail. However, we can transform the stochastic system into moment equations, i.e., a set of ordinary differential equations (ODEs) whose variables are moments (the mean, variance, skewness, and so on), and thus reduce the level of the original noise. A set of moments corresponds to a probability distribution, and such a transformation actually converts the state dynamics into the distribution dynamics. In the following, we explain this procedure based on dynamical systems theory.
Then, we can approximate system (S4) by the following moment evolution equation (11) with moment expansion to the k-th order:

dm(t)/dt = g(m(t)) + η_small(t),   (S5)

where g(m(t)) = (g_1(m(t)), ..., g_N(m(t))) are nonlinear moment functions derived from f(x(t)), and the moment variables are m(t) = (m_1(t), ..., m_N(t)). Due to the truncation at the k-th order of moments, there are error functions η_small(t) = (η_small,1(t), ..., η_small,N(t)), which can be taken as noise terms. Here each m_i(t) is a moment, and N is the total number of moments up to the k-th order.
In particular, if we expand the moments to k = 2, then N = n(n + 3)/2, where the moment variables m_1(t) are the means (the first-order moments of the variables x_i, i.e., ⟨x_i⟩) and m_2(t) are the covariances (the second-order central moments of the variables x_i and x_j, i.e., ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩). Actually, to approximate the original stochastic dynamics, or to minimize the error terms η_small(t) in the finite-order moment equations (S5), many sophisticated schemes to truncate moments, such as moment closure (12,13), have been proposed. With this moment-system with smaller noise, we can directly use the DNM to detect the critical transition, where the critical point is not the bifurcation point of the original system (S4) but that of (S5). Note that any probability distribution can be represented or expanded by the Gram-Charlier or Edgeworth series (26) in terms of the moments m(t). Hence, a set of moments represents one probability distribution; i.e., the moment-system represents the dynamics of the state probability-distribution rather than the state dynamics of the original system x(t), and thus the critical point of the moment-system (S5) corresponds to a drastic change of the probability-distribution rather than a drastic change of the state. Note also that a linear system (S4) can be exactly expressed by (S5) with k = 2 and zero error.
For a general nonlinear stochastic system, with moment expansion to an infinite order (11), we have

dM(t)/dt = F(M(t)),   (S6)

where F(M(t)) = (g_1(M(t)), g_2(M(t)), ...) and M(t) = (m_1(t), m_2(t), ...) is the infinite vector of moments. That is, as k → ∞, the dynamics of system (S4) can be expressed by (S6) in an exact manner, which becomes a deterministic system with zero error or noise. Also see the intuitive explanation in Figure 1 in the main text.
In other words, it is expected that the higher the order of the moment expansion, the more accurately the resulting dynamics (S5) approximate those of the original system (S4) in terms of the distribution, and thus the smaller the noise or error terms are. This result gives the theoretical basis for reducing the noise level by increasing the dimension of the original system. In particular, the moment system corresponds to the distribution dynamics, i.e., a set of moments represents one distribution. Thus, (S5) or (S6) can also be viewed as a transformation from state dynamics with big noise to distribution dynamics with small noise. Note that we can only observe data in the original state before the transition and have no information on the state after the transition, i.e., no flickering; thus, the observed probability distribution is the conditional distribution. Also note that in real situations we only have observed data and do not need the above analytical implementations. Next, we will first describe the implication of the critical transition for the moment-system, and then give the detailed procedure for constructing the synthetic data in a higher-dimensional space from the observed original data.
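The noise-reduction effect behind this transformation can be illustrated numerically: for a stationary series with weakly correlated noise, the windowed first moment fluctuates roughly √τ times less than the raw observations (a toy illustration of our own, not one of the paper's examples):

```python
import numpy as np

# "Making big noise smaller": the windowed mean (first moment m1) of a
# noisy stationary series fluctuates about sqrt(tau) times less than the
# raw observations (toy i.i.d.-noise illustration).
rng = np.random.default_rng(2)
x = 1.0 + 1.5 * rng.normal(size=20000)   # state data with big noise
tau = 100                                # sliding-window length
m1 = np.array([x[t - tau:t].mean() for t in range(tau, len(x) + 1)])
print(round(float(x.std()), 2))   # close to 1.5
print(m1.std() < x.std() / 5)     # True: the moment series is much quieter
```

For correlated (e.g., AR(1)) data the reduction factor is smaller than √τ but the qualitative effect is the same, which is what makes the DNM criteria usable on the transformed data.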
Next, we specifically derive the moment evolution equations by expanding the moments to the second order, i.e., k = 2. Let the first-order moments (or means) be u = (u_1, ..., u_n) with u_i = ⟨x_i⟩, and the second-order moments (or covariances) be v = {v_ij}, i, j = 1, ..., n, with v_ij = ⟨(x_i − u_i)(x_j − u_j)⟩. Then the moment equations are the following deterministic system (11):

du_i/dt = g_i(u, v),   (S7)
dv_ij/dt = g_ij(u, v),   (S8)

where the functions g_i and g_ij, given in (S9) and (S10), are obtained by expanding f around the mean u. Therefore, the original stochastic system (S4) is transformed into the deterministic system (S7)-(S8). In particular, if the original system is linear, i.e., f(x) = Ax + B, where A = (A_ij) is an n × n constant matrix and B = (B_i) is a constant n-vector, then obviously we can analytically derive the moment system (S7)-(S8) directly, due to ⟨f(x)⟩ = A⟨x⟩ + B = Au + B, so that (S9) and (S10) involve only the first- and second-order moments.

Thus, the original system can be analytically expressed by the first-order moments u and the second-order moments v.
However, if the original system is nonlinear, the deterministic system (S7)-(S8) is generally not closed in the first- and second-order moments. That is, the expressions (S9) and (S10) usually involve higher-order moments (the third- or higher-order moments).
To circumvent this problem, approximation methods such as moment closure (41) are used to truncate the moments at the second order, thereby making (S7)-(S8) closed in terms of the first- and second-order moments. Due to such an approximation, additional error or noise terms appear in g_i and g_ij of (S7)-(S8), as described in (S5). Note that we can also expand the original system by the binomial moment series (27) in a similar way.
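For intuition, the second-order (Gaussian) moment closure can be written down explicitly for a scalar system; the sketch below is a standard textbook form under the assumption ẋ = f(x) + η with noise intensity σ², using the u, v notation of this section (an illustration, not the paper's expressions (S9)-(S10)):

```latex
% Gaussian (second-order) moment closure for \dot{x} = f(x) + \eta:
% expand f around the mean u = \langle x \rangle and set third- and
% higher-order central moments to zero, with v = \mathrm{Var}(x).
\begin{aligned}
\dot{u} &= f(u) + \tfrac{1}{2} f''(u)\, v, \\
\dot{v} &= 2 f'(u)\, v + \sigma^{2}.
\end{aligned}
% The neglected higher-order terms play the role of the small error
% terms \eta_{\mathrm{small}} in (S5).
```

For linear f the second derivative vanishes and the system closes exactly, consistent with the linear case discussed above.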

A.4 Critical state-transition and critical distribution-transition
Note that we do not consider the flickering phenomenon in this paper; i.e., we assume that we only have observed data near the original stable state before the critical transition to another state, and based on such observed data, we detect the early-warning signals of the critical transition.
Thus, the probability distribution from the observed data is the conditional distribution for the whole system, since there are no observed data on the after-transition state.
By this moment-system (S5) or (S6) with smaller noise, we can directly use the DNM to detect the critical transition, where the critical point is not the bifurcation point of the original system (S4) but that of (S5) or (S6). As described in Supplementary Information B, any probability-distribution can be represented or expanded by the Gram-Charlier or Edgeworth series in terms of the moments m(t). Hence, the moment-system represents the dynamics of the state probability-distribution rather than the state dynamics of the original system, and thus the critical point of (S5) or (S6) corresponds to the drastic change of the probability-distribution for the state x(t) (i.e., distribution-transition) rather than the drastic change of the state x(t) (i.e., state-transition). In other words, the bifurcation point of the moment-system (S5) m(t) is the critical "distribution-transition" point (for a stochastic system with noise), whereas the traditional bifurcation point of the original state-system (S4) x(t) is the "state-transition" point (for a deterministic system without noise) (see Supplementary Information B and Fig.S11). Thus, the critical transition in this paper means the critical distribution-transition, which is a generalization of the traditional state-transition (see Fig.S11).

A.5 Constructing time-course data of moments in a higher-dimensional space from original data
From the observed original data x(t) of the state variables, we can construct the time-course data of the moment variables m(t) in a higher-dimensional space simply by using a sliding-window scheme, as shown in Fig.S4. In particular, for k = 2 with the window interval τ (an integer), the time-course data for the first-order moment of variable x_i(t), i = 1, ..., n, can be obtained by

u_i(t) = (1/τ) Σ_{s=t−τ+1}^{t} x_i(s),

where there are n first-order moments (or means) and t ≥ τ. On the other hand, for the second central moment of variables x_i(t) and x_j(t), i, j = 1, ..., n, we have

v_ij(t) = (1/τ) Σ_{s=t−τ+1}^{t} (x_i(s) − u_i(t))(x_j(s) − u_j(t)),

where there are n(n + 1)/2 second-order moments (or covariances) due to the symmetry of the covariances.
From such a procedure, the original n-dimensional time-course data are transformed into n(n + 3)/2-dimensional time-course data. Depending on the observed system, if higher-order moments (k > 2) are required to make the noise sufficiently small, we can obtain their time-course data in a similar way. Thus, instead of the original data with big noise, we can use the DNM to analyze the transformed higher-dimensional data with the reduced noise level for detecting the early-warning signals of the critical transition.
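A minimal sketch of this sliding-window construction for k = 2 (the function name and the random test data are ours):

```python
import numpy as np

def moment_series(X, tau):
    """Transform state data X (rows: time, columns: n variables) into
    time-course data of first moments (n means) and second central
    moments (n(n+1)/2 covariances) with a sliding window of length tau;
    the output dimension is n(n+3)/2."""
    n_t, n = X.shape
    iu = np.triu_indices(n)                  # n(n+1)/2 covariance entries
    M = []
    for t in range(tau, n_t + 1):
        W = X[t - tau:t]
        mean = W.mean(axis=0)                # first-order moments
        cov = np.atleast_2d(np.cov(W, rowvar=False))  # second central moments
        M.append(np.concatenate([mean, cov[iu]]))
    return np.array(M)

# Example: n = 3 state variables become 3 * (3 + 3) / 2 = 9 moment variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
M = moment_series(X, tau=50)
print(M.shape)   # (451, 9)
```

The resulting matrix M can then be fed to the DNM criteria in place of the raw data X.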

B One-dimensional example
In this section, we illustrate the DNM method as well as the critical distribution-transition using a one-dimensional example.

B.1 Moment-system
A one-dimensional state system x(t) is represented by Eq.(S11), where η is a white noise with zero mean, i.e., ⟨η⟩ = 0, and variance (or amplitude) ⟨η²⟩ = σ.

Figure S5: | The state transition in the system of Eq.(S11) without noise. The sketch presents the state transition in the deterministic system (S11). The solid curves represent the stable equilibria and the dashed curve stands for the unstable equilibria. With the increase of the parameter p, the transition occurs near p = 2 under small noise.
By analyzing Eq.(S11), we find that when the system is under small noise (σ = 0.1), there is a bifurcation point around p = 2 (see Fig.S5). As shown in Fig.2 in the main text, the traditional CSD-based criteria, i.e., SD and AR, can signal the critical transition, since the critical point is near the bifurcation point. However, when the noise level increases to σ = 1.5, both SD and AR fail to signal the imminent transition, which occurs much earlier than in the original system under small noise. Therefore, we use the moment-expanding scheme to "make big noise smaller" so that the DNM-based method works again. Actually, when the system is under big noise, the critical point lies far ahead of the bifurcation point, compared with the small-noise case, where the critical point is close to the bifurcation point (see Fig.2 in the main text). In general, big noise makes the critical transition occur earlier.
Let E(x) represent the expectation value of x. Through the process of moment closure, i.e., truncating the third- and higher-order central moments, we obtain the two-dimensional moment equations (S14)-(S15) of Eq.(S11), where m_1 is the mean (the first-order moment), m_2 is the variance (the second-order central moment), σ is the amplitude of the original white noise η, and ξ_i is a small noise generated by the moment closure.
For the 2-dimensional system (S14, S15), the DNM-based method works again due to the reduced noise level. We computed the SDs and ARs for the first-order moment (m_1) and the second-order moment (m_2) in system (S14, S15), respectively. The prediction results by DNM are given in Fig.2g-i in the main text.
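The CSD indicators used here, SD and lag-1 autocorrelation (AR), can be sketched as sliding-window statistics applied to each moment series; the window length is a tuning choice of this sketch, not prescribed by the text.

```python
import numpy as np

def csd_indicators(x, w):
    """Sliding-window CSD indicators for a 1-D series x: standard
    deviation (SD) and lag-1 Pearson autocorrelation (AR) within each
    window of length w. Applied to m_1(t) and m_2(t), both indicators
    are expected to rise as the moment-system bifurcation is approached."""
    sds, ars = [], []
    for t in range(len(x) - w + 1):
        win = x[t:t + w]
        sds.append(win.std())
        a, b = win[:-1], win[1:]
        # lag-1 autocorrelation = correlation of the window with itself shifted by 1
        ars.append(np.corrcoef(a, b)[0, 1])
    return np.array(sds), np.array(ars)
```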

B.2 Critical state-transition represented by states, and critical distribution-transition by moments
Generally, the probability distribution of x(t) can be represented or expanded by a Gram-Charlier or Edgeworth series (26) in terms of the moments m(t) or cumulants, i.e., the probability distribution of x(t) can be expressed as φ(x(t)) = Prob(x(t)) = f(x(t); m(t)). With finite moments or cumulants (i.e., finitely many terms of the Gram-Charlier series), the Gram-Charlier series is an approximation to the original probability distribution, but a Gaussian distribution can be exactly represented by the Gram-Charlier series with the first and second moments or cumulants. Note that the probability distribution here is not the stationary distribution in the whole space but the probability distribution in the basin of the original stable equilibrium. It is also an approximation of the real probability distribution because of the finite moments and the moment closure. Thus, a bifurcation of the moment-system with second-order moments will result in a drastic change of the partial distribution for the Gaussian part, i.e., a distortion of the Gaussian partial-distribution.
Thus the critical point or bifurcation point of the moment-system (S14)-(S15) corresponds to a drastic change of this probability distribution rather than a drastic change of the state x(t). In other words, different from the critical state-transition of the deterministic system in terms of x(t), the transition of the stochastic system (S14)-(S15) in terms of m_1(t), m_2(t) is a critical distribution-transition. As a result of the distribution-transition, the state carrying high probability changes from one stable state to another.
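As a concrete sketch of the moment-to-distribution correspondence above (the truncation level and the function name are ours), the Gram-Charlier A series with only the first two moments reduces exactly to the Gaussian density, while a third-moment term supplies the leading non-Gaussian correction.

```python
import numpy as np

def gram_charlier_pdf(x, m1, m2, m3=0.0):
    """Gram-Charlier A series truncated after the third cumulant:
    a Gaussian with mean m1 and variance m2, corrected by the
    standardized skewness term. With only the first two moments the
    series is exactly the Gaussian density, as stated in the text."""
    s = np.sqrt(m2)
    z = (x - m1) / s
    gauss = np.exp(-z**2 / 2) / (s * np.sqrt(2 * np.pi))
    skew = m3 / s**3                 # standardized third central moment
    he3 = z**3 - 3*z                 # probabilists' Hermite polynomial He3
    return gauss * (1 + skew * he3 / 6)
```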
Next, we show that the earlier transition caused by big noise is actually a "critical distribution-transition" in a stochastic system, that is, before the real bifurcation value (p_c = 2 of the deterministic system in the above example), the probability distribution of the state drastically changes from one to the other in the form of the moments. Such a critical distribution-transition is related to the magnitude of the noise and the distance between the parameter p and the bifurcation value p_c: the larger the noise, the earlier the distribution-transition occurs, i.e., the farther ahead of the bifurcation value. Different from the traditional (critical) "state-transition" of a deterministic system, the critical "distribution-transition" of a stochastic system results in a new probability distribution. By such a transition, the probability of the current stable state may be significantly reduced while the probability of another stable state may be drastically increased. Specifically, based on the above example (with large samples), there is a probability distribution of the state between p = −2 and p = 2 (see Fig.S5), during which the system may shift stochastically from one stable state to the other. Figure S6 shows the changes of the distribution of x in Eq.(S11) under a big noise σ = 1.5 (Fig.S6a) and a small noise σ = 0.1 (Fig.S6b), respectively. The drastic change of the distribution can be measured by the Kullback-Leibler (KL) divergence between two distributions, as shown in Fig.S7. It is worth noting that although the KL divergence can indicate the earlier transition due to the big noise, it requires many samples to estimate the distribution, which is not available in many realistic cases. (In the figure, the red points represent the stable equilibria, and the black point is the unstable equilibrium.)
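The histogram-based KL estimate mentioned above can be sketched as follows; the bin count and the smoothing constant are choices of this sketch, and, as noted in the text, a reliable estimate needs many samples.

```python
import numpy as np

def kl_divergence(samples_p, samples_q, bins=50, eps=1e-12):
    """Histogram-based estimate of KL(P || Q) between two sample sets,
    used here to quantify the drastic change of the distribution of x.
    eps smooths empty bins; the estimate is only trustworthy when both
    sample sets are large."""
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    p, _ = np.histogram(samples_p, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```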
It can be seen that there is a bifurcation of the moment-system (S14, S15) between p = 0.68 and p = 0.9 (actually around p = 0.80) when σ = 1.5, and a bifurcation between p = 1.77 and p = 2.1 (actually around p = 1.94) when σ = 0.1. Clearly, the moment-system (S14, S15) approximates the original system, and thus its bifurcation point in the parameter (i.e., p = 0.80 for σ = 1.5 or p = 1.94 for σ = 0.1) approximates the critical distribution-transition point of the original stochastic system.

C Numerical simulation of an 18-node regulatory network

Figure S9: | In this sketch of a molecular network, there are 18 nodes whose dynamical regulatory relationships are given by the stochastic system (S16). The edges represent positive or negative regulations among nodes.

In this section, we use a regulatory network with 18 molecules (see Fig.S9) to conduct a numerical simulation and theoretically demonstrate the effectiveness of DNM for detecting the pre-transition state. Molecular networks are often used to study various biological processes such as transcription, translation, diffusion, and translocation processes that affect gene activities (1,4,22-24). The following 18 differential equations represent the gene regulation of 18 genes in a network, where gene regulation is represented in Michaelis-Menten form with the exception of the degradation rates, which are linearly proportional to the concentrations of the corresponding genes.
where P is a scalar control parameter ranging from −1 to 1 and η_i(t) (i = 1, 2, ..., 18) are Gaussian noises with zero means and covariances κ_ij = Cov(η_i, η_j). Here we set the amplitude κ_ii of η_i to 1. x_i (i = 1, ..., 18) represents the concentration of mRNA-i. There is a stable equilibrium when the parameter P is smaller than the critical value P_c = 0. When P passes P_c = 0, there is a state transition of the system (see Fig.3 in the main text). However, it is hard to signal the transition by the traditional CSD-based method due to the big noise. Therefore we carry out the DNM method by expanding the dimension of the system.
The differential equations Eq.(S16) can be transformed into the difference equations X(k + 1) = f(X(k), P) using the Euler scheme (25), with a small time interval ∆t = 0.01 for P ∈ [−1, 1]. Note that X(k) is the value of the vector X(t) at the time instant k∆t. By using this system, we simulated the data and applied the DNM scheme to identify the pre-transition state, as shown in Fig.3 in the main text. The simulations were performed in MATLAB (R2009a) using the Euler-Maruyama integration method with the Itô calculus (25).

Figure S10: | The change of the largest singular value. This figure shows the change of the largest singular value as the system approaches the critical point. Clearly, due to the noisy data and the small number of samples, there is no significant signal near the critical point, i.e., the largest singular value obtained by SVD cannot indicate the imminent critical transition under big noise.
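The Euler-Maruyama difference scheme above can be sketched as follows. The full 18-node drift of Eq.(S16) is not reproduced in this excerpt, so the example passes in a hypothetical 3-node Michaelis-Menten-style drift of the same general form (saturating regulation plus linear degradation); it is an illustration, not the authors' system.

```python
import numpy as np

def mm_drift(x, P):
    """Hypothetical 3-node stand-in for the drift of system (S16):
    Michaelis-Menten-form regulation plus linear degradation, with P as
    the scalar control parameter. Not the actual Eq.(S16)."""
    act = x**2 / (1.0 + x**2)          # saturating regulation terms
    return np.array([P + act[1] - x[0],
                     act[0] + act[2] - x[1],
                     act[1] - x[2]])

def simulate_network(f, X0, P, steps, dt=0.01, kappa=1.0, seed=0):
    """Euler-Maruyama (Ito) difference equations
    X(k+1) = X(k) + f(X(k), P) * dt + sqrt(kappa * dt) * eta(k),
    with eta(k) ~ N(0, I), matching the discretization described above."""
    rng = np.random.default_rng(seed)
    X = np.empty((steps, len(X0)))
    X[0] = X0
    for k in range(steps - 1):
        eta = rng.standard_normal(len(X0))
        X[k + 1] = X[k] + f(X[k], P) * dt + np.sqrt(kappa * dt) * eta
    return X
```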

C.2 Comparison with SVD
For comparison with DNM, we also employed singular value decomposition (SVD) to detect the early-warning signals of this multi-variable system from the observed data, i.e., using the largest singular value (Fig.S10), which is the square root of the dominant eigenvalue of M M*, where M is the stochastically changing matrix of X in system (S16) and M* is the conjugate transpose of M. As shown in Figure S10, due to the noisy data and the small number of samples, there is no significant signal in the largest singular value, which therefore cannot indicate the imminent critical transition under big noise.
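The comparison index can be sketched directly: the largest singular value of each sliding data window. The mean-centering and the window length are choices of this sketch rather than details given in the text.

```python
import numpy as np

def largest_singular_value(M):
    """Largest singular value of the observation matrix M
    (samples x variables), i.e. the square root of the dominant
    eigenvalue of M M*, used as the comparison index against DNM."""
    return np.linalg.svd(M, compute_uv=False)[0]

def sv_index(X, w):
    """Sliding-window series of the largest singular value of the
    mean-centered data matrix, as plotted in Fig.S10."""
    vals = []
    for t in range(X.shape[0] - w + 1):
        win = X[t:t + w]
        vals.append(largest_singular_value(win - win.mean(axis=0)))
    return np.array(vals)
```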

D Algorithm for calculating DNM
For a given dataset, the calculation of the DNM score is based on the following procedure.
At each time point, calculate the 1st and 2nd moments of the data for each variable.
We conduct a new type of data normalization for all the variables (the 1st and 2nd moments) obtained above.
A = (D_case − mean(N_control)) / SD(N_control), where A denotes the normalized expression data for each variable in each case sample, D_case is the data for each variable in every case sample, while mean(N_control) and SD(N_control) are the mean and standard deviation of each variable over all the control samples, respectively.
At each sampling point (or period), by using Student's t-test with significance level p < 0.05, choose the variables whose expressions show significant changes (in the sense of mean values) between the case samples and the control samples.
By using the false discovery rate (FDR), correct for the multiple comparisons, or multiple Student's t-tests, over the variables selected in each period.
The two-fold change method is then adopted to further screen variables that exhibit relatively significant changes in standard deviation in each period.
Cluster the variables at each sampling time point by correlations, i.e., each group (or the "in" group) is composed of molecules with high correlations, and the "out" group contains all the other molecules. According to the theoretical results, if a time period is in or close to the pre-transition stage, then the clusters obtained in this period are potential dominant groups and should satisfy the 3 criteria of DNM. Hence every cluster is a candidate for the DNM, whereas its corresponding period is a candidate for the transition point in the pre-transition state.
Step 3 Among all clusters, determine the dominant group, or the DNM, by significance analysis, i.e., the 3 criteria of DNM listed in the main text are used to determine whether a candidate group is a DNM.
Calculate the DNM score.
I = SD · PCC_in / (PCC_out + ε), (S18) where SD is the average standard deviation of all variables in the DNM; PCC_in is the average Pearson's correlation coefficient (in absolute value) between variables in the DNM; PCC_out is the average Pearson's correlation coefficient (in absolute value) between a variable inside the DNM and one outside; and ε is a small positive constant to avoid division by zero. Here, instead of SD, it is also appropriate to use the CV (coefficient of variation) in (S18) when the variables are not normalized.
We then screen the clusters using the three criteria of DNM. That is, the SD of each variable in a candidate dominant group should sharply increase; the PCC_in (in absolute value) for each pair of variables within a candidate dominant group should drastically increase; and the PCC_out between a variable in a candidate dominant group and one outside should drastically decrease.
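A sketch of the DNM score (S18) for one candidate group, assuming the ratio form SD · PCC_in / (PCC_out + ε) implied by the definitions above; the function and argument names are ours.

```python
import numpy as np

def dnm_score(case_in, case_out, eps=1e-8):
    """DNM score I = SD * PCC_in / (PCC_out + eps):
    SD      - average standard deviation over DNM members,
    PCC_in  - average |Pearson correlation| between DNM members,
    PCC_out - average |Pearson correlation| between a DNM member and a
              variable outside the group.
    case_in: T x d matrix of the d DNM members; case_out: T x m outsiders."""
    sd_in = case_in.std(axis=0).mean()
    d = case_in.shape[1]
    R_in = np.abs(np.corrcoef(case_in, rowvar=False))
    iu = np.triu_indices(d, k=1)
    pcc_in = R_in[iu].mean()
    # cross-correlations between inside and outside variables
    R = np.abs(np.corrcoef(np.hstack([case_in, case_out]), rowvar=False))
    pcc_out = R[:d, d:].mean()
    return sd_in * pcc_in / (pcc_out + eps)
```

A sharp rise of this score over the sampling periods flags the candidate period as the pre-transition state.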

E Application to three real datasets
For the three datasets, that is, the gene expression profiling dataset of acute lung injury induced by phosgene gas, the ecological data of the eutrophic lake state, and the financial data around the bankruptcy of Lehman Brothers, we identified the pre-transition states by using DNM.
The gene expression profiling datasets were downloaded from the NCBI GEO database (accession IDs: GSE2565, GSE13009, GSE30550) (www.ncbi.nlm.nih.gov/geo). In these datasets, probe sets without corresponding gene symbols were ignored during our analysis, and the expression values of probe sets mapped to the same gene were averaged. The three datasets are described in Table S1. In addition, we describe the data processing in detail and conducted functional analysis (g:Profiler: http://biit.cs.ut.ee/gprofiler/ and NOA: http://app.aporc.org/NOA/) (29,30) for some important molecules.

E.1 Dataset 1. Gene expression data of acute lung injury

This dataset was obtained in an experiment on toxic gas-induced lung injury effects, i.e., pulmonary edema (33), and was downloaded from the NCBI GEO database (ID: GSE2565) (www.ncbi.nlm.nih.gov/geo). In this dataset, there are 22,690 original probe sets. We mapped them to the corresponding NCBI Entrez gene symbols by using the GEO annotation; probe sets without corresponding gene symbols were not considered in our analysis.
The expression values of probe sets mapped to the same gene were averaged, leaving 12,871 genes. Furthermore, the expression profiling information was mapped to integrated networks, i.e., genes were linked and correlated by the combined functional couplings among them from various databases of protein-protein interactions. We downloaded the biomolecular interaction networks and functional linkage information for Mus musculus from several databases, including BioGRID (www.thebiogrid.org), STRING (http://string-db.org/), and KEGG (www.genome.jp/kegg). After removing redundancy, we obtained 7,950 linkages among 6,683 mouse proteins/genes for acute lung injury. The genes measured in these microarray datasets were then mapped individually to these integrated functional linkage networks.
To study acute lung injury, a genomic approach was used to investigate the molecular mechanism of phosgene-induced lung injury. The experiments were conducted to determine the temporal effects of phosgene exposure on lung tissue antioxidant enzyme concentrations and gene expression levels, and these results were compared with those from air-exposed mice treated in a similar manner to assess the role of the GSH redox cycle in this oxidative lung injury model. To produce two groups of data, i.e., the control group data and the case group data, two groups of CD-1 male mice were exposed to air or phosgene, respectively. Lung tissues were collected from air- or phosgene-exposed mice at 0.5, 1, 4, 8, 12, 24, 48, and 72 hr after exposure. The details of the experiment are available in the original paper (33). We introduce some key background as well as our functional analysis as follows.
Phosgene, a choking (pulmonary) agent, is one of the most important and common industrial chemical gases (32). Some pathogenic mechanisms of the acute lung injury induced by phosgene have been identified (33). According to the calculation results of the dynamical network marker (DNM), a major transition is considered to occur between 4 hr and 8 hr. The pathway enrichment analysis and GO functional analysis also showed that the identified genes in the DNM were closely related to the mechanism of disease progression (33,39); for example, DNM genes including IL1B are related to the MAPK signaling pathway. This indicated the denaturation of lipids, proteins, and nucleic acids that may have been oxidized by phosgene (33,39).
Briefly, it was found that the main physiological effects occurred within the first 8 hours after exposure, resulting in common observations of enhanced BALF protein levels, increased pulmonary edema, and ultimately decreased survival rates (33). At the concentration delivered, 50%-60% mortality was routinely observed at 12 hours, while 60%-70% mortality was observed at 24 hours (33). The detailed results are also available in the original paper (33). Early-warning signals of lung injury based on the identified DNM are shown in Fig.4a of the main text, which shows that the pre-transition state may start around 4 hr, while the system may enter the after-transition state after 12 hr. Our prediction based on the DNM score agreed with the actual disease development.
To explain our method more clearly, we use acute lung injury as a concrete example to describe our computational procedure step by step. In the GSE2565 dataset, there are 22,690 original probe sets. We mapped them to the corresponding NCBI Entrez gene symbols by using the GEO annotation; probe sets without corresponding gene symbols were screened out, while probe sets that detected the same genes were combined by averaging. After this procedure, there were 12,871 genes left.
Step 1 Choose differentially expressed genes from the high-throughput gene data for acute lung injury. Based on the set A of selected differentially expressed molecules, through two-fold change screening we obtain B = [0, 29, 72, 195, 269, 163, 173, 188, 176] genes, respectively, for the 9 sampling time points.
Step 2 We carried out the dimension-expanding scheme by calculating the 1st and 2nd moments of the data in each 2-sampling-point sliding window.
We conducted the data normalization for all the variables (the 1st and 2nd moments).
A = (D_case − mean(N_control)) / SD(N_control), where A denotes the normalized expression data for each variable in each case sample, D_case is the data for each variable in every case sample, while mean(N_control) and SD(N_control) are the mean and standard deviation of each variable over all the control samples, respectively.
Cluster the variables at each sampling time point by correlations. We obtained the candidate groups (the 2nd and 3rd groups among all clustered groups).
Step 3 Among all clusters, determine the dominant group or the DNM by significance analysis.
Calculate the DNM score.

The first cluster qualified to show a critical transition is the 3rd group, clustered at the 5th sampling point (8 hr).

E.2 Dataset 2. Ecological data about the eutrophic lake state
The data record the historical changes in the Erhai Lake catchment system in Yunnan, China (34). The monitoring data mainly provide historical trends in lake water quality and several related chemical indices. Microfossil and geochemical records from dated lake sediment cores were used to reconstruct the trends in the state of the lake diatom communities and water quality back to the 1880s, and these records seem to reproduce the abrupt change in algal states observed in the recent monitoring data, between 2001 and 2005. From the combined monitoring and lake sediment data, it seems that a profound transition in the algal community occurred around 2001. The original study also points out that the transition in Erhai Lake in ∼2001 corresponds to the classic development of a bistable system (34), that is, the shift in the state of the diatom communities and the abrupt changes in other water-quality indicators are consistent with the behaviour of a lake shifting from one stable state (the oligotrophic state) to an alternative stable state (the eutrophic state). Therefore, the DNM-based method is applicable to predict the imminent transition from one state to another (see Fig.4f in the main text).
The dataset records the historical data for lake-water-quality variables, including the lake water level and other indices. We listed all of the 18 identified DNM members in the Supplementary Table 'Identified DNM members B'.

E.3 Dataset 3. Financial data about the bankruptcy of Lehman Brothers
Critical transitions in financial markets are often related to the bursting of unstable "financial bubbles". In financial markets the participants slowly build up an ever-densifying web of mutual dependencies through investments and transactions to hedge risks, which gradually creates the unstable "bubbles" (42,43). Detecting the onset of critical transitions in these complex dynamical systems is difficult due to the lack of understanding of the underlying mechanisms, as well as of the insight needed to create models with predictive power. The financial data about the bankruptcy of Lehman Brothers consist of the daily prices of interest-rate swaps (IRS) in the USD and EUR currencies. The data span more than twelve years: the EUR data from 12/01/1998 to 12/08/2011 and the USD data from 04/29/1999 to 06/06/2011 (35). It was found that the CSD-based indices, i.e., SD and AR, "do not show a distinctive change of behavior around the time of the bankruptcy in both USD and EUR markets" (35) (also see Fig.S12). However, the DNM-based method provides an obvious signal for the critical transition (see Fig.4g in the main text). The DNM members of this financial dataset are the mean of USD (mean(USD)), the standard deviation of USD (SD(USD)), the mean of EUR (mean(EUR)), the standard deviation of EUR (SD(EUR)), and the correlation between USD and EUR (Corr(USD, EUR)). In the USD market, a long-term build-up of stress started at the beginning of 2004 and continued for more than four years, eventually peaking shortly before the bankruptcy (35). This is consistent with the common understanding that markets create 'bubbles' (44), which grow slowly over time and may 'burst', leading to sudden regime shifts triggered by a catalyzing event. The EUR market, on the other hand, appears to have played a more submissive role. The information dissipation length indicator rose distinctly for about half a year before the Lehman Brothers bankruptcy and diminished over about the same amount of time (35).
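The five DNM members listed above can be computed in sliding windows over the two price series; this is a sketch, and the window length is our choice rather than a value given in the text.

```python
import numpy as np

def financial_dnm_members(usd, eur, w):
    """Sliding-window computation of the five DNM members listed above:
    mean(USD), SD(USD), mean(EUR), SD(EUR), Corr(USD, EUR).
    usd and eur are equal-length NumPy arrays of daily prices."""
    T = len(usd)
    rows = []
    for t in range(T - w + 1):
        u, e = usd[t:t + w], eur[t:t + w]
        rows.append([u.mean(), u.std(), e.mean(), e.std(),
                     np.corrcoef(u, e)[0, 1]])
    return np.array(rows)   # one column per member, in the order listed
```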
We listed all of the 5 identified DNM members in the Supplementary Table 'Identified DNM members C'.

F State-transition with small noise and distribution-transition with big noise

Figure S13 illustrates the mathematical meaning of the critical state detected by dynamical network markers for a system with small or big noise. For a system with small noise, the critical point is the bifurcation point of the original state system, whereas for a system with big noise, the critical point is the bifurcation point of the moment-system, which corresponds to the probability distribution. Note that it is the probability distribution in the basin of the stable equilibrium rather than in the whole space, and that the moment-system with moment closure is an approximation of the real (conditional) probability distribution. Thus, for a system with big noise, the critical point is for the probability distribution of the state.
Clearly, the distribution-transition is a generalization of the traditional state-transition. For instance, the bifurcation of a moment-system with second-order moments will result in a drastic change of the partial distribution of the Gaussian part (or a distortion of the Gaussian part) rather than of the real distribution.
G Comparison of the distribution embedding scheme with the support vector machine

Each individual system has its specific critical transition, and thus detecting early-warning signals of the critical transition is a system- or individual-specific problem. Although detecting early-warning signals of the tipping point is not a typical classification problem, to compare with machine learning methods we designed a computational problem of classifying the before-transition samples and the pre-transition samples. In particular, we compared the performance of our embedding scheme with a machine learning scheme, i.e., the support vector machine (SVM). The SVM maps the observed sample space to a higher-dimensional space and thus increases the dimensionality of the data, so that the samples can be classified by hyperplanes in the high-dimensional space. Thus the SVM is a supervised learning scheme which increases the dimensionality of the data to improve the ability of the classifiers.

Figure S13: | Detection of early-warning signals by dynamical network markers for a system with small or big noise. For a system with small noise, shown in the upper figure, DNM detects the critical state, which is the bifurcation point of the original deterministic system of x(t). Thus, the critical state detected by DNM for x(t) corresponds to the state-transition of x(t); in other words, such a transition results in a drastic change of the state x(t). On the other hand, for a system with big noise, shown in the lower figure, there is no effective method to detect the critical state due to the strong fluctuations. In this work, we transform the original system x(t) with big noise to the moment system m(t) with small noise, whose critical state can be detected by DNM. Since the moment system corresponds to the probability distribution (i.e., a set of moments represents one probability distribution), the bifurcation point of the moment system is also the critical point of the distribution p(x(t)). Thus, the critical state of the moment system detected by DNM for m(t) is the critical point of the distribution, i.e., the distribution-transition; such a transition results in a drastic change of the distribution of x(t). Clearly, the distribution-transition is a generalization of the state-transition, i.e., for small noise they are equivalent.
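The AUC values reported for the ROC comparison in Fig.S14 can be computed from any pair of score samples; a sketch of AUC as the Mann-Whitney statistic follows (the function name is ours).

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the ROC curve computed as the Mann-Whitney statistic:
    the probability that a randomly chosen pre-transition (positive)
    sample scores higher than a randomly chosen before-transition
    (negative) sample, with ties counting one half."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)
```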

Specifically, we applied these two methods to a dataset from the eighteen-node network Eq.(S16).

Figure S14: | Comparison of the distribution embedding scheme with a machine learning scheme. Based on a dataset from the eighteen-node network Eq.(S16), the receiver operating characteristic (ROC) curves show the performance of our distribution embedding scheme and of the support vector machine (SVM), respectively. It can be seen that under big noise, the distribution embedding scheme, or high-dimensional embedding (AUC=0.8124), works better than the SVM (AUC=0.8124).

H The signal-to-noise ratio of the three datasets

We investigated the signal-to-noise ratio (SNR) of the three real datasets at the critical points in terms of noise and signal. Specifically, we carried out the CSD-based method on each dataset with the original data (before the moment expansion), and then compared the signals between the CSD method and our moment-expanding algorithm. The comparison of SNR is presented in Table S2. The signal-to-noise ratio is calculated by the formula SNR = µ/σ, where µ is the signal mean and σ is the standard deviation of the noise. The results indicate that the SNRs in the higher-dimensional space all increased after the moment expansion, compared with those in the original space, which shows the effectiveness of our method.
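A sketch of estimating SNR = µ/σ from a single noisy series; the moving-average signal estimate and its window length are our choices, not details from the text.

```python
import numpy as np

def snr(x, window=11):
    """Estimate SNR = mu/sigma from a noisy series: mu is taken as the
    mean of a moving-average 'signal' estimate, sigma as the standard
    deviation of the residual noise. The smoothing window is a choice
    of this sketch."""
    kernel = np.ones(window) / window
    smooth = np.convolve(x, kernel, mode='valid')   # signal estimate
    # residual noise, aligned to the center of each smoothing window
    resid = x[window // 2: window // 2 + len(smooth)] - smooth
    return abs(smooth.mean()) / resid.std()
```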