Main

Although some research has focused on the details of such transient dynamics as failure propagation19, there is an entire class of real-world dynamic complex systems in which networks can spontaneously recover after their collapse, and the mechanism for this global network recovery is not yet adequately understood. The Internet can initially fail after a severe attack and then, after a period of time, recover. A human brain can spontaneously recover after an epileptic attack. A traffic network returns to its normal state after a period of gridlock. A financial network may recover, after a period of time, from the failure of a large fraction of its constituents.

We develop a framework for understanding dynamic networks that demonstrate an ability to spontaneously recover. We start with three fundamental assumptions.

First, we assume that any node in the network can fail independently of other nodes (internal failure) with a probability p dt during a time interval dt. Second, we assume that any node can fail owing to external causes, for example, if it has a substantially damaged neighbourhood. Nodes that have failed neither externally nor internally are regarded as active. We use a simple threshold rule (similar to that proposed in ref. 20) to define a substantially damaged neighbourhood: it is a neighbourhood containing m or fewer active nodes, where m is an integer. If node j has more than m active neighbours during dt, we assume that its neighbourhood is ‘healthy’, but if node j has ≤m active neighbours during the interval dt, there is a probability r dt that node j will externally fail. We call the parameter r the ‘damage conductivity’. As a third premise, we assume that there is a reversal process, a recovery from failures. Node j recovers from an internal failure after a time period τ, and it recovers from an external failure after a time τ′. For simplicity, we set τ′ = 1.
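The three assumptions above can be sketched as a discrete-time simulation. This is a minimal illustration, not the authors' code: the time step dt = 1, the synchronous update order, the ring-lattice stand-in for a regular network, and all function and variable names are assumptions made for the example.

```python
import random

def ring_neighbours(n, k):
    """Regular-graph stand-in (assumption): each node links to its k/2
    nearest nodes on each side of a ring, so every node has degree k."""
    half = k // 2
    return [[(i + d) % n for d in range(-half, half + 1) if d != 0]
            for i in range(n)]

def simulate(n=100, k=10, m=4, p=0.0033, r=0.8, tau=100, steps=2000, seed=0):
    """Return z(t), the fraction of active nodes at each step (dt = 1)."""
    rng = random.Random(seed)
    nbrs = ring_neighbours(n, k)
    internal_left = [0] * n   # remaining steps of internal failure (0 = none)
    external = [False] * n    # externally failed during the current step
    z = []
    for _ in range(steps):
        # A node is active if it has failed neither internally nor externally.
        active = [internal_left[i] == 0 and not external[i] for i in range(n)]
        z.append(sum(active) / n)
        new_external = [False] * n
        for i in range(n):
            # Internal failure: probability p per step, recovery after tau steps.
            if internal_left[i] > 0:
                internal_left[i] -= 1
            elif rng.random() < p:
                internal_left[i] = tau
            # External failure: probability r if at most m neighbours are
            # active; it lasts tau' = 1 step, so the flag is rebuilt each step.
            if sum(active[j] for j in nbrs[i]) <= m and rng.random() < r:
                new_external[i] = True
        external = new_external
    return z
```

Calling `simulate()` with the defaults returns a list of 2000 values of z. The default p corresponds roughly to p* = 0.28 for τ = 100; whether the series visibly flips between two levels depends on the graph and update scheme chosen here.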

If there are no recoveries (τ, τ′ → ∞) the system reduces to the Watts model20 generalized and rigorously solved in ref. 21. We find that introducing dynamic recovery leads to spontaneous network collapse and recovery: the phase-switching phenomenon.

We first perform numerical simulations for regular networks (in which all nodes have the same degree k) and then for Erdős–Rényi networks22,23. In simulations we use τ = 100, and networks have N = 10⁷ nodes to approximate the thermodynamic limit. As most numerical results depend only on the product pτ, instead of using p and τ we define a more convenient single parameter p* ≡ 1−exp(−pτ). It can be shown that p* represents the average fraction of internally failed nodes (see the Supplementary Information for an explanation). The global state of the network is best characterized by the fraction of active nodes in the network, z. The most interesting question is how p* (which controls internal failures) and r (which controls external failures) affect the entire network. For a set of different values of p* and r, we numerically calculate a time-averaged fraction 〈z〉 = 〈z(r,p*)〉 of active nodes. Figure 1a shows 〈z〉 as a function of p*, for three different r values, for a regular network with k = 10 and m = 4. For some r values we encounter a discontinuity in 〈z〉 while slowly changing p*, in the increasing direction, the decreasing direction, or sometimes both. The hysteresis shown in Fig. 1a is the characteristic feature of a first-order phase transition. Repeating this procedure for many values of r, we obtain the two-parameter (r,p*) phase diagram in Fig. 1b. Discontinuity lines (spinodals) in the (r,p*) space separate two collective phases corresponding to high network activity (Phase I, large values of 〈z〉) and low network activity (Phase II, low values of 〈z〉). Between the spinodals is the hysteresis region (the purple region in Fig. 1b), in which either of the two network phases can exist.
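As a small worked example of the reparametrization, the conversion between p and p* can be written as two helper functions. The function names are illustrative and not taken from the paper.

```python
import math

def p_star(p, tau):
    """Average fraction of internally failed nodes, p* = 1 - exp(-p * tau)."""
    return 1.0 - math.exp(-p * tau)

def p_from_p_star(p_star_value, tau):
    """Inverse map: the per-unit-time failure probability p that yields p*."""
    return -math.log(1.0 - p_star_value) / tau
```

For instance, with τ = 100 a target value p* = 0.28 corresponds to p = −ln(0.72)/100, roughly 0.0033 per unit time.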

Figure 1: Critical behaviour of the system with first-order phase transition and hysteresis.

a, Equilibrium average fraction of active nodes, 〈z(p*)〉, simulation results (symbols) and the MFT prediction (solid lines), for three different values of r. Parameters for random regular networks, N = 10⁷, k = 10 and m = 4, are used in this example. b, The phase diagram in the model parameters (r, p*) exhibits two phases. Phase I (green region) represents a high-activity collective network mode; Phase II (orange) represents a low-activity mode. The hysteresis region (purple) is bounded by spinodals, denoted by red and blue lines. The lines merge at a critical point located at (r = 0.637, p* = 0.386). Colours in the diagram highlight regions of different phases. Analytical MFT results for spinodals are denoted by black lines. Point A shows the parameters used in Fig. 2. c, Comparison of the analytical MFT result (dashed lines) with numerical results (dots), for the spinodals in the (r,p*) phase diagram, for regular networks. d, Comparison of the analytical MFT result with simulation results, for an Erdős–Rényi (ER) network with 〈k〉 = 10, m = 4, N = 10⁷.

The system can be analytically described using a mean-field theory (MFT), details of which we provide in the Methods section. For an arbitrary network with degree distribution f_k, the fraction of failed nodes a = 1−〈z〉 is well approximated by the equation

$$a = p^{*} + r\,(1-p^{*}) \sum_{k} f_{k} \sum_{j=0}^{m} \binom{k}{k-j}\, a^{k-j} (1-a)^{j} \qquad (1)$$

The values of r and p* determine whether there is a single solution for 〈z〉 = 1−a (a single phase) or three solutions (two physical stable solutions that we observe in simulations and a third solution, which is dynamically unstable). Our approximate solution is similar to the exact analytical solution of the Watts model21 for cascading failures with no recovery. However, in our model, because we assume τ′ = 1, nodes that externally failed at the previous stage of the cascade (with probability r) may become active again at the next stage. The solution of equation (1) gives a discontinuity in 〈z〉 for certain values of r and p*. Figure 1a compares analytical results for 〈z(p*)〉 for three different values of r with the simulation results. MFT agrees well with simulations, but the deviation increases close to discontinuities. Figure 1b shows the MFT prediction for the position of the spinodals (black lines). The deviation of the MFT approximation from simulations becomes smaller as the connectivity k increases (Fig. 1c), which is a mean-field characteristic. Figure 1d shows numerical and analytical results for the Poisson degree distribution (Erdős–Rényi network22), for 〈k〉 = 10 and m = 4. In this case, because a substantial fraction of nodes have k close to the value of m, there is less agreement between MFT and simulation results. In the Supplementary Information we discuss and compare our model with the Ising, lattice gas and forest fire models24,25.

As many real-world networks are small or medium-sized, we perform numerical simulations for small networks, in which fluctuations are very pronounced, and find dynamic behaviour that is qualitatively different. Figure 2a shows the fraction of active nodes z(t) for a regular (k = 10, m = 4) network with N = 100 nodes, where (r = 0.8, p* = 0.28) is an arbitrary point in the hysteresis region. We find that z flips back and forth from one phase to the other, between 〈z_high〉 ≈ 0.7 in the high-activity phase and 〈z_low〉 ≈ 0.14 in the low-activity phase. The probability distribution function (PDF) of z has a bimodal shape (see Fig. 2b), similar to that of a random walker in a double-well potential. This behaviour is characteristic of small systems near the critical point, in particular of the spontaneous folding and unfolding of proteins26, or of fluids27.

Figure 2: Two network modes characterized by high and low network activity.

a, Dynamical switching (flipping) of the fraction of active nodes z between two collective modes in the subcritical region, an example for p* = 0.28, r = 0.80 (point A, in yellow, Fig. 1b), with k = 10, m = 4 and N = 100. Green circles mark avoided transitions. b, The PDF of z shows a bimodal form. c, The white line represents the trajectory (rλ(t), pλ*(t)) of the system in the phase diagram, from t = 0 to the moment of the first transition (point 1, Fig. 2a), in the same numerical simulation that produced z(t) in Fig. 2a. The system was in the low-activity phase until the trajectory crossed the left spinodal, resulting in a global recovery event. Analogously, when the system is in the high-activity state the right spinodal becomes relevant (points 2 and 4). Transitions between the macroscopic states are essentially first-passage processes on alternating spinodals. d, For the same parameters as in a–c, the expected lifetime of the system in a given state, measured in simulations, increases exponentially with the system size N, confirming our theoretical results. Black lines represent linear regressions in the (N, ln T) diagram.

To understand these violent dynamics of the network we need to determine the mechanism of global network recovery and collapse. Consider a system with a small number of nodes N, and the fraction of externally failed nodes among the nodes that have a critically damaged neighbourhood (CDN). At a random time t this fraction is not exactly r but, owing to the probabilistic nature of external failures, fluctuates around r. Thus, for short time intervals [t−λ, t] we can define the local-time realization of r, rλ(t), as the time-averaged fraction of externally failed nodes among the CDN nodes during that interval. We expect that during short time intervals the ‘true’ damage conductivity is not r, but rλ(t). A natural choice for λ is the system relaxation time. To estimate it, we use a typical cascade duration, which gives λ ≈ 5 for an N = 100 network. Similarly we define pλ*(t) as the average fraction of internally failed nodes during [t−λ, t].
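The local-time quantities rλ(t) and pλ*(t) defined above can be sketched as moving-window averages over per-step counts. This is an illustration under stated assumptions: time is discrete, the window covers the last `lam` recorded steps, and rλ is computed as a pooled ratio (total externally failed CDN nodes over total CDN nodes in the window), which is one possible reading of "time-averaged fraction". The function and argument names are hypothetical.

```python
def local_time_parameters(n_cdn, n_cdn_ext_failed, n_internal_failed,
                          n_nodes, lam):
    """Return (r_lam, p_lam), one value per time step.

    n_cdn[t]             number of nodes with a critically damaged neighbourhood
    n_cdn_ext_failed[t]  how many of those are externally failed
    n_internal_failed[t] number of internally failed nodes
    """
    r_lam, p_lam = [], []
    for t in range(len(n_cdn)):
        lo = max(0, t - lam + 1)          # window is shorter at the start
        width = t + 1 - lo
        cdn = sum(n_cdn[lo:t + 1])
        ext = sum(n_cdn_ext_failed[lo:t + 1])
        # r_lam is undefined (None) when no CDN node occurred in the window.
        r_lam.append(ext / cdn if cdn > 0 else None)
        p_lam.append(sum(n_internal_failed[lo:t + 1]) / (width * n_nodes))
    return r_lam, p_lam
```

Plotting the pairs (r_lam[t], p_lam[t]) against the spinodals then gives the kind of trajectory described in the text.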

The evolution of the system can then be described as a trajectory (rλ(t), pλ*(t)) in the phase diagram. Our crucial hypothesis is this: the global recovery event of the network in the low-activity phase occurs when the trajectory (rλ(t), pλ*(t)) crosses the left spinodal (the red line in Fig. 1b), triggering a cascade and causing a transition to the high-activity phase. Similarly, the transition from the high-activity phase to the low-activity phase occurs when the trajectory crosses the right spinodal (the blue line in Fig. 1b). The phase-flipping phenomenon is then simply explained as the alternating crossing of the two spinodals by the trajectory (rλ(t), pλ*(t)) in the phase diagram. Numerical simulations confirm our hypothesis. For z(t) in Fig. 2a, we measure the corresponding (rλ(t), pλ*(t)) trajectory. Point 1 in Fig. 2a denotes the moment when the first jump from the lower state to the upper state is registered. The position of the point (rλ(t), pλ*(t)) at that moment is marked in the phase diagram in Fig. 2c, and it is very close to the left spinodal. Similarly, the first jump from the upper state to the lower state (point 2 in Fig. 2a) is plotted in Fig. 2c. As expected, the system at that moment is close to the right spinodal. A few more transitions are presented, confirming our hypothesis for the jump mechanism. In Fig. 2c, the white curve represents the trajectory (rλ(t), pλ*(t)) from t = 0 to the moment of the first transition at point 1. Sometimes the system crosses the ‘dangerous’ spinodal, leaves the hysteresis region for a very short time, and then quickly returns to the hysteresis region without triggering a cascade of failures (recoveries) and a corresponding transition. Exceptionally large spikes in Fig. 2a correspond to such avoided transitions. Note that when r = 1 the global recovery process is disabled: the fluctuations of rλ vanish, because rλ(t) = 1 for every t, and the system cannot cross the left spinodal, which is necessary for global recovery.

Two important observables are the average lifetimes Tdown(N) and Tup(N) of the system in, respectively, Phase II (down) and Phase I (up). Derivations for their estimates are presented in the Supplementary Information, with results

$$T_{\mathrm{down}}(N) \sim \exp\!\left[\frac{N\lambda\,E[a(r,p^{*})]\,(r-r_{s})^{2}}{2r(1-r)}\right] \qquad (2)$$

and similarly Tup(N) ∼ exp[Nλ(p*−ps*)²/(2p*(1−p*))], where E[a(r,p*)] is the average fraction of CDN nodes, and rs (ps*) is a typical r (p*) position on the left (right) spinodal, where most transitions to the upper (lower) state occur. Thus, equation (2) predicts that the average lifetime in a given state increases exponentially with the size N, as confirmed by the simulations in Fig. 2d.
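The exponential scaling can be illustrated numerically. Reading the Tup expression as exp[Nλ(p*−ps*)²/(2p*(1−p*))], ln Tup grows linearly in N with a slope that the first function below evaluates; the second function is an ordinary least-squares fit of ln T against N, in the spirit of the regressions in Fig. 2d. This is a sketch only: the function names are illustrative, and any value passed for ps* is a hypothetical input, since the text gives no number for it.

```python
import math

def up_lifetime_slope(p_star, p_s_star, lam):
    """Predicted slope d(ln T_up)/dN = lam * (p* - p_s*)^2 / (2 p* (1 - p*))."""
    return lam * (p_star - p_s_star) ** 2 / (2.0 * p_star * (1.0 - p_star))

def fit_log_lifetime(sizes, lifetimes):
    """Least-squares fit ln T = intercept + slope * N; returns (slope, intercept)."""
    ys = [math.log(t) for t in lifetimes]
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Comparing the fitted slope from measured lifetimes with `up_lifetime_slope` is one way to check the predicted dependence on λ and on the distance to the spinodal.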

To obtain plausible empirical support for our dynamic network model, we study the economic networks of companies in both developed and developing countries. To map a real economic network to our model, we use market returns to construct an appropriate binary variable for each node. To pick up fundamental changes in the companies rather than speculation, we measure a company’s return over a period of 100 days. We define the state of company i at time t as ‘good’ (‘bad’) if, during the period [t−100,t], the company's market value increases (decreases). At each t, we define z(t) as the fraction of companies that have positive net returns during [t−100,t].
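The mapping from market data to z(t) can be sketched as follows. The example assumes that each company is represented by a list of daily prices of equal length, that price stands in for market value, and that an unchanged price counts as not ‘good’; the function name and input layout are illustrative, not taken from the paper.

```python
def fraction_positive_returns(prices, window=100):
    """prices: dict mapping company name -> list of daily prices.

    Returns z(t) for t = window, ..., n_days - 1, where z(t) is the fraction
    of companies whose price at t exceeds their price at t - window.
    """
    series = list(prices.values())
    n_days = len(series[0])
    z = []
    for t in range(window, n_days):
        good = sum(1 for s in series if s[t] > s[t - window])
        z.append(good / len(series))
    return z
```

A histogram of the returned values gives an empirical P(z) of the kind shown in Fig. 3b,d.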

Figure 3 shows results for z(t) and its PDF P(z) for two real financial markets over a ten-year period. Figure 3a,b shows the Indian financial index BSE200, and Fig. 3c,d shows the US financial index S&P500. Figure 3a,c shows that z(t) switches back and forth between high and low values, resembling the phase-flipping phenomenon that our model predicts for the hysteresis regime. Figure 3b,d shows that the PDF P(z) exhibits an asymmetric bimodal shape. Hence, for both the Indian and the S&P500 index data, z shows bimodal PDF behaviour consistent with the hysteresis regime for the past decade (which includes the severe economic crisis), but not during previous decades. This finding suggests the intriguing possibility that the model parameters of the economic network can change from year to year or from decade to decade, and that the system can enter and exit the hysteresis region. A comparison with real data indicates that our model is a plausible qualitative explanation for the behaviour we observe in real economic networks. It supports the concept of economic states28,29 and provides a critical understanding of possible hysteresis reported in economic systems30. From the phase-flipping mechanism that we have uncovered, we can draw some interesting conclusions about economic networks. Several negative economic events are much more dangerous when concentrated within a time interval Δt ≈ λ. When they are combined they can increase pλ*(t) and push the system across the right spinodal. If they are separated by long intervals, the system is more likely to absorb the damage without collapse. The possible relation between the avoided transitions in our model and the flash crash phenomenon in real-world networks is discussed in the Supplementary Information.

Figure 3: Properties of phase-flipping phenomenon in financial data for developing and developed markets.

a, For the constituents of the Indian index (BSE200), the fraction of stocks z with positive return as a function of time switches back and forth between the two network modes characterized by high and low network activity. b, Bimodal form in the PDF of financial data during the past decade. c, The same as in a but for the S&P500 financial index. d, Bimodal form in the PDF of the S&P500 financial data.

Methods

We give a brief derivation of equation (1). The average fraction of internally failed nodes in the network is p* = 1−exp(−pτ). For external failures, let E_k be the probability that a node of degree k is located in a CDN (with fewer than m+1 active neighbours) in the steady state. Assume that the time-averaged fraction of failed nodes (either internally failed or externally failed) is 0<a<1. In a mean-field approximation, a is also the average probability that any node has failed, so using combinatorics we obtain

$$E_{k} = \sum_{j=0}^{m} \binom{k}{k-j}\, a^{k-j} (1-a)^{j}$$

The probability that a node of degree k has externally failed is then r E_k. If we denote the failure events as A = {internal failure} and B = {external failure} and assume they are approximately independent, the probability that a randomly chosen node of degree k has failed is a_k = P(A)+P(B)−P(A)P(B), where P(A) = p* and P(B) = r E_k are the probabilities of the events. Summing over all k, weighted by the degree distribution f_k, we obtain equation (1). We also note that the giant component of active nodes in the network exists (is non-zero) in the entire region presented in Fig. 1b.
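The derivation above can be turned into a small numerical solver. The sketch assumes that E_k is the binomial probability that at most m of a node's k neighbours are active when each neighbour has failed independently with probability a, so that the self-consistency condition reads a = p* + r(1−p*) Σ_k f_k E_k. Root finding by a grid scan plus bisection is a choice made for this example, and all names are illustrative.

```python
from math import comb

def cdn_probability(k, m, a):
    """E_k: probability that at most m of k neighbours are active."""
    return sum(comb(k, j) * (1 - a) ** j * a ** (k - j)
               for j in range(min(m, k) + 1))

def mft_rhs(a, r, p_star, m, degree_dist):
    """Right-hand side p* + r (1 - p*) sum_k f_k E_k(a)."""
    e = sum(f * cdn_probability(k, m, a) for k, f in degree_dist.items())
    return p_star + r * (1 - p_star) * e

def mft_solutions(r, p_star, m, degree_dist, grid=20000, tol=1e-12):
    """Solutions a in (0, 1) of a = mft_rhs(a), found from sign changes."""
    def g(a):
        return mft_rhs(a, r, p_star, m, degree_dist) - a
    roots = []
    xs = [i / grid for i in range(grid + 1)]
    for lo, hi in zip(xs, xs[1:]):
        g_lo, g_hi = g(lo), g(hi)
        if g_lo == 0.0 and lo > 0.0:
            roots.append(lo)              # root sits exactly on a grid point
        elif g_lo * g_hi < 0:
            while hi - lo > tol:          # bisection inside the bracket
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots
```

For a regular network with k = 10 and m = 4, `mft_solutions(0.8, 0.28, 4, {10: 1.0})` returns three roots, at about a = 0.31, 0.51 and 0.85. The outer two correspond to 〈z〉 of roughly 0.69 and 0.15, close to the high- and low-activity values reported for Fig. 2a.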

For the phase-flipping mechanism, our initial choice for λ is supported by simulations. If for λ we choose a much larger value than the relaxation time (which is the natural choice), the fluctuations of rλ(t) and pλ*(t) become too small and the trajectory (rλ(t), pλ*(t)) shrinks to a small region around point A and it does not cross the spinodals when it is supposed to. If λ is too small (for example, λ = 1), the system cannot adiabatically follow rapid changes in rλ(t) and pλ*(t).