Non-Markovian recovery makes complex networks more resilient against large-scale failures

Non-Markovian spontaneous recovery processes with a time delay (memory) are ubiquitous in the real world. How does the non-Markovian characteristic affect failure propagation in complex networks? We consider failures due to internal causes at the nodal level and external failures due to an adverse environment, and develop a pair approximation analysis taking into account the two-node correlation. In general, a high failure stationary state can arise, corresponding to large-scale failures that can significantly compromise the functioning of the network. We uncover a striking phenomenon: memory associated with nodal recovery can counter-intuitively make the network more resilient against large-scale failures. In natural systems, the intrinsic non-Markovian characteristic of nodal recovery may thus be one reason for their resilience. In engineering design, incorporating certain non-Markovian features into the network may be beneficial to equipping it with a strong resilient capability to resist catastrophic failures.

T he dynamics of failure propagation on complex networks constitute an active area of research in network science and engineering with significant and broad applications. This is because the functioning of a modern society relies on the cooperative working of many networked systems such as the electrical power grids, various transportation networks, computer and communication networks, and business networks, but these networks typically possess a complex structure and are vulnerable to failures and intentional attacks. Among the diverse failure scenarios, one of the most severe types is cascading failures 1 , where the failure of some nodes would cause their neighbors to fail and the process would propagate to the entire network, disabling a large fraction of the nodes and causing malfunctioning of the system at a large scale [2][3][4][5][6][7][8][9][10][11][12][13][14][15] . Classic examples of cascading failures includes power blackout-the collapse of power grids 5,6 , traffic jams 16 , and even economic depression 14,17 . Previous studies mostly focused on how cascading failures occur, how network structures and failure propagation are related, and on network robustness and vulnerability to failure propagation [18][19][20][21][22] .
A tacit assumption employed in many previous studies of cascading failures is irreversible failure propagation, where a node, if it has failed, cannot recover and is no longer able to function actively. A failed node is then removed from the network completely, including all the links associated with it. There are real-world situations of networked systems, such as financial and transportation networks, where failed nodes can recover from malfunctioning spontaneously after a collapse [23][24][25][26][27][28] . In general, there are two types of failure-and-recovery scenarios 29 : internal and external. In the first type, a node fails because of internal causes (e.g., the occurrence of some abnormal or undesired dynamical behaviors within the node), which is independent of the states of its neighbors. In this case, the node can recover spontaneously after a period of time. An example is the failure of a company characterized by a drop in its market value due to poor management, followed by recovery due to internal restructuring. The second type is external failures, where a node's failure is externally triggered, e.g., by the failures of its neighboring nodes. After a period of time, as its local "environment" is improved, the node is able to recover spontaneously. The time of recovery depends not only on the specific type of failure-andrecovery mechanism, i.e., whether internal or external, but also on the individual node and its position in the network. For example, for a given node in the network, it may take longer to complete an internal restructuring process to recover from a failure due to an internal than an external cause. Previous computation and mean-field analysis have revealed that cascading dynamics incorporating a failure-and-recovery mechanism can exhibit a rich variety of phenomena such as phase transitions, hysteresis, and phase flipping [29][30][31][32][33] . With respect to the resilience responses of networks, the effects of removing a fraction of nodes and links on network functions were studied [34][35][36][37] , demonstrating that resilience can be used to characterize the critical functionality of the network with applications in complex infrastructure engineering 36,37 .
In spite of the variations in the recovery dynamics across networks or even nodes in the same network, generally the process can be classified into two distinct types: Markovian and non-Markovian. In a Markovian recovery process, an event occurs at a fixed rate and the interevent time follows an exponential distribution [38][39][40] , rendering memoryless the process. On the contrary, a non-Markovian recovery (NMR) process has memory, as the current state of a node depends not only on the most recent state but also on the previous states. In this case, the interevent time distribution is not exponential but typically exhibits a heavy tail. For example, in human activity and interaction dynamics, the occurrences of contacts among the individuals in a social network can be characteristically non-Markovian, for which there is mounting empirical evidence [41][42][43][44][45][46][47][48] . Non-Markovian type of recovery process also occurs in biochemical reactions 49 and in the financial markets 12,50 . We note that, in the context of spreading dynamics on complex networks, the effects of the non-Markovian process, due to its high relevance to the real world, have attracted growing attention [51][52][53][54][55][56][57][58] . From the point of view of mathematical analysis, incorporating memories into the dynamical process makes analytic treatment challenging.
While the impacts of non-Markovian processes on spreading dynamics have been reasonably well-documented [51][52][53][54][55][56][57][58] , there has been little work so far addressing the influence of non-Markovian recovery process on failure propagation dynamics. In this paper, we address this issue systematically through a comparison study of two types of dynamical processes: one with Markovian and another with NMR. In the Markovian recovery (MR) model, failures due to internal and external causes will recover with different constant rates. In the NMR model, such a constant rate cannot be defined. We thus resort to the recovery time. In particular, we assume that the failed nodes due to internal and external causes will take different time to recover, so a memory effect is naturally built into the model. For each model, we develop a mean-field theory and an analysis based on the pair approximation (PA) 29,59-62 that retains the two-node correlation but ignores any correlation of higher orders. Comparing results with numerical simulations indicates that both mean-field theory and PA analysis capture the key features of the failure propagation dynamics qualitatively, but the PA analysis yields results that are in better quantitative agreement with numerics. The counterintuitive and striking phenomenon is then that non-Markovian character with a memory effect makes the network more resilient against large-scale failures. There are two implications. Firstly, in physical, biological, or other natural networked systems, the intrinsic non-Markovian character of nodal recovery may be one reason for resilience of these networks and their existence in a harsh environment. Secondly, in engineering and infrastructure design, incorporating certain non-Markovian features into the network may help strengthen its resilience and robustness.

Results
Spontaneous recovery models. For general failure propagation dynamics on a network, a node can be in one of two states: an active (labeled as A-type) state in which the node functions properly and an inactive state (I-type) in which the node has failed. To distinguish the causes for a node to become inactive, we label an inactive node due to internal or external failure as X-type or Y-type, respectively. In the NMR model, an A-type node may fail spontaneously at the rate β 1 to become an X-type node, or it may fail at the rate β 2 to become a Y-type node when the number of its A-type neighboring nodes is less than or equal to a threshold integer value m that sets the limit on neighboring support for proper functioning of a node. Without loss of generality, we assume that external failures occur more frequently than internal failures: β 1 < β 2 . This is often the case as internal failures can be made less probable by building up the capability of the nodes through better equipment and/or management, while external failures are uncontrollable and more difficult to avoid. For examples, falling stocks may be the result of unanticipated changes in the market rather than poor management. In a road network, failures are caused more often by congestion than by physical failures. Once a node becomes inactive, it takes time τ 1 to recover from an internal failure (when the node is of the X-type) or time τ 2 to recover from an external failure (when the node is of the Y-type). The non-Markovian characteristic is taken into account through the incorporation of a memory effect into the model. In particular, the nodes that will recover at time t constitute those that were turned into X-type inactive nodes at the time t − τ 1 and those turned into Y-type inactive nodes at the time t − τ 2 . Here, we assume τ 1 > τ 2 , for the reason that repairing a node or restructuring the management due to the malfunctioning of the node itself would need more time. For example, reorganizing a company or repairing a road often takes more time. The failure processes characterized by the rates β 1 and β 2 as well as the recovery processes as determined by τ 1 and τ 2 are schematically illustrated in Fig. 1.
Note that the case of τ 1 < τ 2 may also arise in the real world. For example, for an infrastructure network in civil engineering, when an earthquake strikes and destroys buildings (nodes), the time to rebuild can be longer than that required for recovering from internal failures, e.g., the collapse of a roof due to some material failure. Our computations of this case yield qualitatively similar results to those in the case of τ 1 > τ 2 -see Supplementary Note 3 for detail.
The MR and NMR models differ only in the recovery processes. In the MR model, an inactive node of the X-type or the Y-type recovers at a constant rate μ 1 or μ 2 , respectively, as illustrated in Fig. 1. Consequently, the number of nodes recovered at time t depends only on the number of inactive nodes of both Xtype and Y-type at the previous time step.
To develop theories for failure propagation on networks with MR or NMR recovery process and to identify the key differences between the two type of dynamics, we focus on random regular networks. In the numerical simulations, we use a relatively large network size N = 3 × 10 4 with the degree k = 35. In the NMR model, the recovery times are taken to be τ 1 = 100 and τ 2 = 1 for the X-type and Y-type of nodes, respectively. In the MR model, the values of the recovery rates are set to be μ 1 = 1/τ 1 = 0.01 and μ 2 = 1/τ 2 = 1 so that they correspond to the same scales for the recovery times in the NMR model (see Supplementary Note 1 for a more detailed explanation). The threshold values in both models are m = 15. Synchronous updating is invoked in simulations with the time step Δt = 0.01.
Markovian recovery process. Mean-field theory: We start with setting up the dynamical equations for MR dynamics and comparing results with simulations. Based on the mean-field theory in "Mean-field theory for MR dynamics" of "Methods" section, we first examine the behavior of E t ([I]) in Eq. (4). Figure 2a  To check which branch the system would take on and whether there are two states for some range of parameters, the simulation results for moving the value of β 1 up (circles) and down (squares) are shown in Fig. 2b for comparison. As the values of β 1 are increased or decreased, the initial state is taken to be the final state corresponding to the previous value of β 1 -the adiabatic process. The results indicate that: (i) the values of [I] from simulations follow the two branches given by the mean-field approximation, and (ii) the low-failure (high-failure) branch is followed when moving β 1 up (down) until a particular value at which there is a jump to the high-failure (low-failure) branchthe signature of a hysteresis. The results also imply that if one starts from the initial conditions [X] 0 ≠ 0 and [Y] 0 = 0, there exists a critical value of β c ≈ 0.007 for a sudden increase in the number of failed nodes when [X] 0 is small as the system will Fig. 1 Schematic illustration of NMR and MR models. An active (A-type) node may fail spontaneously at the rate β 1 to become an X-type node due to internal causes, or it may fail at the rate β 2 to become a Y-type node when the number of its A-type neighbors n is less than or equal to a threshold m setting the necessary neighboring support for the proper functioning of a node. In the NMR model, the X-type and Y-type nodes take the time duration τ 1 and τ 2 from the time they are generated to recover, respectively. In the MR model, the X-type and Y-type nodes recover, respectively at the rates μ 1 and μ 2 . c Phase diagram on the β 2 /μ 2 -β 1 /μ 1 plane as predicted by the mean-field theory. Systems in the bistable phase will evolve either to a high-failure or a low-failure phase depending on the initial conditions. Beyond the critical point, there is no distinction between the low-failure and high-failure phases.
follow the low-failure branch. However, for large [X] 0 , the critical value β c becomes β c ≈ 0.003 as the system will follow the high-failure branch. A plot of β c against [X] 0 will therefore exhibit two plateaus with β c ≈ 0.007 for small [X] 0 and β c ≈ 0.003 for large [X] 0 . The mean-field approximation not only simplifies the analysis but also provides insights into the dynamical process. For example, the mean-field theory suggests the ratios β 1 /μ 1 and β 2 /μ 2 as key parameters. In general, solutions can be obtained numerically by solving Eq. (5) together with Eq. (4). The results are shown in Fig. 2c as a phase diagram. For parameters falling into the regions corresponding to the low-failure (high-failure) phase, the system will evolve into a low-failure (high-failure) state. For parameters in the bistable phase, the system will evolve either to a low-failure or a high-failure state, depending on the initial conditions. The high-failure and low-failure phase boundaries meet at the critical point determined by β 1 /μ 1 ≈ 0.745 and β 2 /μ 2 ≈ 1.020.
In addition to the stationary state, the evolution of the system can also be studied by iterating Eqs. (1) and (2) for a given initial condition. Figure 3a shows the evolution of the MR dynamics as obtained by the mean-field theory for β 1 = 0.004 and β 2 = 2. In the three-dimensional space formed by Pairwise approximation theory for the MR model: It is possible to formulate a theory that takes into account of two-node spatial correlation based on the pairwise approximation (PA). The basic idea is to follow the evolution of different types of links, i.e., links that connect different pairs of neighboring nodes 62 . The PA method has been used widely in studying epidemic and information spreading [63][64][65] , and in coevolving voter models and adaptive games with two or more strategies [66][67][68][69] . In "Effect of nodal correlation: pairwise approximation for the MR model" of "Methods" section, we develop a PA based theory for the MR model. Figure 4 presents a comparison of the predictions of the PA analysis and mean-field theory with the numerical results, where Fig. 4a shows the time evolution of [X] t and [Y] t from the initial state [X] 0 = [Y] 0 = 0 for β 1 = 0.009, β 2 = 2.0, μ 1 = 0.01, and μ 2 = 1. While both mean-field and PA theories capture the key features in time evolution, the results of PA are in better agreement with those from simulations. It is useful to understand the dynamical behaviors in the MR model qualitatively (so as to enable a meaningful comparison with those of the NMR model later). For this purpose, we identify several stages in the time evolution as marked in Fig. 4a. In the early stage, i.e., t ∈ [t O , t A ], most nodes are active and they have more active neighbors, violating the condition n A ≤ m. As a result, only internal failures occur and [X] t grows but [Y] t decreases and eventually vanishes. For t ∈ [t A , t B ], [X] t , active nodes start to fail into Y-type nodes, leading to fewer active nodes in the system and triggering more external nodal failures. This results in the observed rapid increase in [Y]. In the later stage t ∈ [t B , t C ], there are more failed nodes than active ones. While the failed nodes of X and Y types can recover with their respective rates, the remaining or recovered active nodes will more likely fail again through external than internal causes due to the many failed nodes surrounding the active nodes. Consequently, in this later stage, [Y] t increases and [X] t decreases toward their respective steady-state values for t → ∞, with [Y] > [X] when the system evolves into a high-failure state. The PA analysis captures the behavior of [X] t over time and the onset of [Y] t better than the mean-field analysis. Figure 4b shows the phase diagram for μ 1 = 0.01 and μ 2 = 1.0. The mean-field phase diagram is the same as that shown in Fig. 2c, where it can be seen that the results of the PA analysis (solid curves) are indeed in better agreement with the simulation results than the predictions of the single-node mean-field theory.
Note that Fig. 2  value of β 2 = 2.0) and search for the value of β 1 beyond which the system attains a high-failure state (see Supplementary Fig. 1 in Supplementary Note 2). The critical value thus depends on [X] 0 , the initial fraction of failed nodes due to an internal mechanism. Figure 4c shows the numerically obtained functional relation β c ([X] 0 ) (open circles), together with two types of theoretical prediction (PA analysis and mean-field theory). As the initial fraction [X] 0 is increased from a near zero value, β c maintains at a relatively higher constant value (about 0.007). As [X] 0 increases through the value of about 0.4, the value of the critical rate suddenly decreases to about 0.003. We see that, again, the prediction of the abrupt change in β c by the PA analysis is more accurate than that by the mean-field theory.
What is the physical meaning of the abrupt decrease in the critical value of the spontaneous failure rate as displayed in Fig. 4c? A higher value of β c means that the network system is more resilient to large-scale failures as it requires a larger rate value to drive the system into a high-failure state. As the fraction of initially failed nodes is increased, the network as a whole is more prone to large-scale failure so we expect the value of β c to decrease. Because of the lack of any memory effect in the ideal, Markovian type of recovery process, i.e., after a node fails, it either recovers instantaneously or does not recover (with probabilities determined by the rate of recovery), we expect a characteristic change in the system dynamics as characterized by the value of the critical rate β c to occur in an abrupt manner. Indeed, as Fig. 4c reveals, as the fraction of initially failed nodes is increased through a threshold value, there is a sudden decrease of about 50% in the value of β c , giving rise to a first-order type of transition. This behavior of abrupt transition may not occur in reality because of the assumed Markovian recovery process, which is ideal and cannot be expected to arise typically in the physical world. In the next section, we will demonstrate that making the dynamics more physical by assuming non-Markovian type of recovery process will drastically alter the picture of transition in Fig. 4c.
Non-Markovian recovery process. To analyze failure propagation dynamics in systems with NMR, a viable approach is to construct difference equations that relate the fractions of types of nodes and links at time t + Δt to those at time t. It is necessary to keep track of the time when a node becomes the X or Y type as well as the time at which a link becomes type UV. In "Pairwise approximation theory for the NMR model" of "Methods" section, we develop a PA analysis for the NMR model. Figure 5  To describe the key features of the NMR model, we divide the evolution into five stages with the respective time intervals , and [t D , t E ], as shown in Fig. 5a. In the earliest stage [t O , t A ], [X] t increases due to internal failures but [X] t is insufficient to cause external failures. The behavior is similar to that in the MR model, but the duration is shorter and the rise in [X] t is steeper in the NMR model. The reason is that the memory effect in NMR model allows the recovery of X-type nodes to take place only after τ 1 steps, while the recovery occurs probabilistically in the MR model. In the narrow time window of [t A , t B ], [X] t attains a level high enough to trigger the onset of many external failures. As a result, the failed nodes constitute the majority in the system and [A] t decreases sharply, giving rise to the sharp increase in [Y] t . The Y-type nodes recover deterministically after τ 2 (τ 2 < τ 1 ) into active nodes. In the period [t B , t C ], the recovery of Y-type nodes refuels the system with active nodes that can participate in two paths: more internal and external failures. For t C < τ 1 , the existing X-type nodes have yet to recover and [X] t continues to increase but at a slower pace due to the external failure path, while [A] t reduces slightly.
In the time window [t C , t D ], the initial internally failed nodes begin to recover as t C > τ 1 , in addition to the recovery of the Ytype nodes. The A-type nodes due to recovery will be more likely to become Y-type as the failed nodes remain the majority (due to the parameter setting β 2 > β 1 in this example). This leads to the observed increase in [Y] t and decrease in [X] t in the time interval [t C , t D ]. In the final stage [t D , t E ], [X] t stops decreasing because the recovery of X-type nodes at the time t ≳ t D is due to those failed internally at t ≳ t B for which the number was small. However, the recovery of Y-type nodes at a shorter time scale supplies fresh active nodes. The fraction of failed nodes [X] t + [Y] t is so high, i.e., approaching the high-failure state, that the dynamics lead to a higher steady value of [Y] than [X] in long time. For time well beyond t D , both [X] and [Y] become steady. Figure 5b shows the phase diagram of the NMR model analogous to Fig. 4b Fig. 5c. The pair approximation, again, gives more accurate prediction than that from the mean-field theory. The result in Fig. 5c demonstrates the striking effect of non-Markovian type of recovery with memory on the failure propagation dynamics, which is in stark contrast to the ideal case of Markovian process as exemplified in Fig. 4c. In particular, as the fraction [X] 0 of initially failed nodes is increased from a near zero value to one, the value of β c begins to decrease continuously and smoothly until it reaches a minimum, at which β c increases relatively more rapidly to a high value of about 0.006 for [X] 0 ≈ 0.3. For [X] 0 > 0.3, the value of β c remains approximately constant at 0.006. Comparing Fig. 5c with Fig. 4c, we see two major, characteristic differences. Firstly, the behavior of an abrupt decrease in the Markovian case is replaced by a gradual process in the non-Markovian case, essentially converting a firstorder like process to a second-order one. Secondly and more importantly, β c recovers from its minimum value and maintains at a high value regardless of the value of [X] 0 insofar as it exceeds about 30%. This means that, the system can maintain its degree of resilience even when the initial fraction of failed nodes reaches 100%! This contrasts squarely the behavior in the Markovian case, where the system resilience is reduced dramatically even when only about 40% of the nodes failed initially. In this sense, we say that a non-Markovian type of memory effect makes the network system more resilient against failure propagation.
While the behavior in Fig. 5c is counterintuitive, a heuristic reason is as follows. For an initial state with many initial X-type nodes, the few remaining nodes will switch from being active to the Y-type and back. All the initial X-type nodes will have to wait for the time period τ 1 to recover. At that time, the system becomes one with only a few failed nodes-effectively equivalent to one with small [X] 0 value and requiring a larger β c value to evolve into the high-failure state. In a range of small [X] 0 , a smaller β c can already cause more active nodes to become Y-type, helping maintain the system in a high-failure state as described for Fig. 5a. Theoretical support for the behavior is provided by the PA analysis and mean-field theory, as shown in Fig. 5c.
In addition to the different time evolution in the MR and NMR models, there are also cases where the same initial conditions MR and NMR dynamics on heterogeneous networks. So far our analysis and simulations have been carried out for MR and NMR dynamics on random regular networks. We find that altering the network structure causes little change in the qualitative results. For example, we have carried out simulations on scale-free networks of size N = 3 × 10 4 with degree range k min ; ffiffiffiffi N p Â Ã and degree distribution P(k)~k −γ . Figure 7 shows the results of β c versus [X] 0 for the MR and NMR dynamics for networks with γ = 3. Because of the heterogeneity in the nodal degree distribution, the threshold on external failure is given in terms of the fraction one-half of the failed neighbors.
Comparing results with Fig. 4c for MR dynamics and Fig. 5c for NMR dynamics in random regular networks, we see that the key features are similar when the underlying random regular networks are replaced by scale-free networks. We have also carried out numerical simulations on four additional types of synthetic and empirical networks: (a) networks with degree-degree correlation, (b) networks with a community structure, (c) empirical arenas-email network, and (d) empirical friendship-hamster network, with results presented in Supplementary Notes 4 and 5 for the former and latter two cases, respectively. These results, together with Fig. 7, suggest that, for   a b c heterogeneous networks, a non-Markovian process tends to enhance the network resilience against large-scale failures.

Discussion
The intrinsic memory effect associated with non-Markovian processes makes it challenging to analyze the underlying network dynamics, new and surprising phenomena can arise. Most previous studies treated Markovian processes through either a meanfield type of theory 60,61 or an effective degree approach 59 . For non-Markovian processes, the mean-field approximation can still be applied 29,31-33 , but it is necessary to invoke a higher-order theory such as the PA analysis. Our work presents such an example in the context of failure propagation in complex networks.
Our study has demonstrated that, in both models, the network can evolve into a low-failure or a high-failure state, with the latter corresponding to the undesired state of large-scale failure. Both the mean-field and PA theories are capable of predicting the dynamical behaviors of failure propagation, and the performances of the theories are gauged by simulation results, revealing that the more laborious pair approximation gives results in better quantitative agreement with the numerics. Our systematic computations on different complex networks and two types of theoretical analyses have uncovered a striking phenomenon: the non-Markovian memory effect in the nodal recovery can counterintuitively make the network more resilient against large-scale failures.
Our finding also calls for the incorporation of non-Markovian type of memory factors into the design of communication, computer, and infrastructure networks in various engineering disciplines. We hope our work will stimulate interest in examining and exploiting non-Markovian processes in various network dynamical processes. We have carried out a systematic study of the effects of Markovian versus non-Markovian recovery on network synchronization using the paradigmatic Kuramoto network model, with the main finding that non-Markovian recovery makes the network more resilient against large-scale breakdown of synchronization (Supplementary Note 6).

Methods
Mean-field theory for MR dynamics. Let [A] t , [X] t , and [Y] t be the fractions of Atype, X-type, and Y-type nodes in the system at time t, respectively. A hierarchical set of dynamical equations for the MR model can be constructed to include increasingly longer spatial correlation. The equations for the evolution of the fractions of different types of nodes are: where the first term in each equation gives the supply to [X] ([Y]) due to internal (external) failures and the second term represents the drop in [X] ([Y]) due to recovery. Note that, because of the relation an equation for [A] t is unnecessary. The quantity E t is the probability of an A-type node having j ≤ m neighbors of A-type nodes at time t and thus the node will be infected at the rate β 2 .
In general, the quantity E t involves the correlation between two neighboring nodes. To connect Eqs. (1) and (2) so as to retain the simplicity of a single-node theory, we use the approximation where C kÀj k ¼ k!=ðj!ðk À jÞ!Þ. Equations (1) (4) form a set of equations, from which the fractions of different types of nodes can be solved. This is the simplest singlesite mean-field approximation for the MR dynamics that ignores any spatial correlation. Despite its simplicity, it is capable of revealing the key features in the stationary state, in which Eqs. (1) and (2) require the fraction of failed nodes [I] to satisfy In general, the equations of single-node quantities, e.g., Eq. (2), necessarily involve quantities of more extensive spatial correlation because the interplay between the failure of a node and the states of its neighboring nodes.
t is the probability of an A-type node having an inactive node regardless of the types of the neighbors, the probability that there are exactly j neighbors of A-type and (k − j) inactive neighbors of either X or Y type is where k is the degree of the node. The quantity E t in Eq. 2, as schematically depicted in Fig. 8a, is thus given by which indicates explicitly that the dynamics of single-node quantities are governed by the two-node quantity [AI] t . This is reminiscence of the BBGKY (Bogoliubov-Born-Green-Kirkwood-Yvon) hierarchy of equations for the distribution functions in a system consisting of a large number of interacting particles in statistical physics 70 . Only under the approximation [AI] t ≈ [A] t [I] t (so that the two-node correlation can be neglected) will the resulting equation be Eq. (4)-a set of singlenode mean-field equations.
To proceed, we derive the dynamical equations for [UV] t that will in general involve more extensive spatial correlation. For example, a link of the type AA would evolve into a different type depending on the neighborhoods of the two nodes, effectively a small cluster of nodes. To develop a manageable approximation, we retain the two-node correlation and decouple any longer spatial correlation in terms of one-node and two-node functions. This is the idea behind PA for obtaining a closed set of equations. In particular, the dynamical equations for [AX] t and [AA] t are where is the probability of an A-type node having j ≤ m A-type neighbors among its (k − 1) neighbors, given that one neighbor is inactive, and is the probability of an A-type node having j ≤ m − 1 A-type neighbors among its (k − 1) neighbors, given that one neighbor is active. Figure 8 illustrates the meanings of E t , E 0 t , and E Pairwise approximation theory for the NMR model. Specifically, we let ½U l t be the fraction of nodes of type U at time t, which became type U from some other type only l time steps ago, and ½U l 1 V l 2 t be the fraction of links of the UV type when the corresponding node(s) associated with a link became that of the labeled type l 1 and l 2 time steps ago. The time evolution of the fraction of X-type nodes in the NMR model is given by ½X l tþΔt ¼ β 1 Δt½A t ; l 2 ½0; ΔtÞ; ½X lÀΔt t ; l 2 ½Δt; τ 1 ; 0; l 2 ðτ 1 ; 1Þ:: È ð13Þ The first line in Eq. (13) gives the new supply due to internal failure of A-type nodes in the time duration [t, t + Δt). The second line accounts for the nodes which were inactive for a duration l − Δt at time t but have not reached the time for recovery at time t + Δt. The third line states that all X-type nodes that came to existence τ 1 earlier have been recovered. Similarly, the time evolution of the fraction of Y-type nodes is given by ½Y l tþΔt ¼ β 2 ΔtE t ½A t ; l 2 ½0; ΔtÞ; ½Y lÀΔt t ; l 2 ½Δt; τ 2 ; 0; l 2 ðτ 2 ; 1Þ; where E t is defined in Eq. (8) and [AI] t = [AX] t + [AY] t . The fractions of X-type and Y-type nodes, regardless of how long they have been in the corresponding state, are given by ½X t ¼ P τ 1 l¼0 ½X l t and ½Y t ¼ P τ 2 l¼0 ½Y l t , respectively. The fraction of active nodes follows from To develop a PA analysis for failure propagation dynamics with NMR, we construct the equations for the time evolution of UV-types of links and retain t . The blue (red) color indicates a node in the active (failure) state. Open circles are nodes that may take on A, X, or Y state. In the PA equations, E t is the probability of an A-type node having n ≤ m neighbors of A-type nodes at time t, E 0 t is the probability of an A-type node having n ≤ m A-type neighbors among its (k − 1) neighbors given that one neighbor is inactive, and E″ t is the probability of an A-type node having n ≤ m − 1 A-type neighbors among its (k − 1) neighbors given that one neighbor is active. spatial correlation up to two neighboring nodes. Our derivation of the counterparts of Eqs. (13) and (14) in the MR case suggests the necessity to examine the history of the inactive nodes(s) associated with a link. For example, the time evolution of the links in ½AX l t is governed by ½AX l tþΔt ¼ β 1 Δt½AA t þ β 1 Δtð½X τ 1 A t þ ½Y τ 2 A t Þ; l 2 ½0; ΔtÞ; ½X τ 1 X lÀΔt t þ ½Y τ 2 X lÀΔt t þ ð1 À β 1 Δt À β 2 ΔtE 0 t Þ ½AX lÀΔt t ; l 2 ½Δt; τ 1 ; 0; l 2 ðτ 1 ; 1Þ; 8 > > > > > > > > < > > > > > > > > : where E 0 is defined in Eq. (11). The first line represents the new supply to AX-type of links due to an internal failure in one of the active nodes associated with a link of the AA-type, and an internal failure together with a recovery of an inactive node in a link of the XA-types and YA-types. The second line includes the supply to AX ltype links due to recoveries from XX and YX types as well as the links of AX l−Δt type that became AX l type in the recent duration Δt. The last line comes from the fact that an X-type node must recover after a time τ 1 since it became inactive. The fraction of links of AX-type, regardless of how long the node in the link has taken in the X-type, is given by ½AX t ¼ P τ 1 l¼0 ½AX l t . We thus have that the fraction of AA-type of links evolves in time as where E 00 t is defined in Eq. (12). Equations for other types of links can also be constructed (Supplementary Note 1). Equations (15) and (16) are analogous to Eqs. (9) and (10) in the MR model. The number of equations is determined by the divisions of τ 1 and τ 2 into the small time steps Δt, which increases rapidly when Δt is small compared with the other time scales in the NMR dynamics.
A crude approximation analogous to the mean-field theory can be developed for the NMR model by retaining only the fractions of nodes in the equations, which can be done by decoupling the two-node quantities such as [AI] t by [AI] t ≈ [A] t [I] t . The resulting equations governing the fractions of different types of nodes become and where E t takes on the approximate form in Eq. (4). Equations (17), (18), and (4) form a set of equations that can be solved to yield the fractions of different types of nodes. The first two terms in Eqs. (17) and (18) correspond to the increase in inactive nodes due to failure and due to those remaining inactive, and the last term corresponds to recovery. The number of equations, again, depends on the choice of Δt. This is the mean-field approximation for the NMR model that ignores any spatial correlation.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The source data underlying Figs