## Abstract

Non-Markovian spontaneous recovery processes with a time delay (memory) are ubiquitous in the real world. How does the non-Markovian characteristic affect failure propagation in complex networks? We consider failures due to internal causes at the nodal level and external failures due to an adverse environment, and develop a pair approximation analysis taking into account the two-node correlation. In general, a high-failure stationary state can arise, corresponding to large-scale failures that can significantly compromise the functioning of the network. We uncover a striking phenomenon: memory associated with nodal recovery can counter-intuitively make the network more resilient against large-scale failures. In natural systems, the intrinsic non-Markovian characteristic of nodal recovery may thus be one reason for their resilience. In engineering design, incorporating certain non-Markovian features into the network may help equip it with strong resilience against catastrophic failures.

## Introduction

The dynamics of failure propagation on complex networks constitute an active area of research in network science and engineering with significant and broad applications. The functioning of a modern society relies on the cooperative working of many networked systems such as electrical power grids, transportation networks, computer and communication networks, and business networks, but these networks typically possess a complex structure and are vulnerable to failures and intentional attacks. Among the diverse failure scenarios, one of the most severe types is cascading failures^{1}, where the failure of some nodes causes their neighbors to fail and the process propagates through the network, disabling a large fraction of the nodes and causing the system to malfunction at a large scale^{2,3,4,5,6,7,8,9,10,11,12,13,14,15}. Classic examples of cascading failures include power blackouts (the collapse of power grids)^{5,6}, traffic jams^{16}, and even economic depression^{14,17}. Previous studies mostly focused on how cascading failures occur, how network structures and failure propagation are related, and on network robustness and vulnerability to failure propagation^{18,19,20,21,22}.

A tacit assumption employed in many previous studies of cascading failures is irreversible failure propagation, where a node, if it has failed, cannot recover and is no longer able to function actively. A failed node is then removed from the network completely, including all the links associated with it. There are real-world situations of networked systems, such as financial and transportation networks, where failed nodes can recover from malfunctioning spontaneously after a collapse^{23,24,25,26,27,28}. In general, there are two types of failure-and-recovery scenarios^{29}: internal and external. In the first type, a node fails because of internal causes (e.g., the occurrence of some abnormal or undesired dynamical behaviors within the node), which is independent of the states of its neighbors. In this case, the node can recover spontaneously after a period of time. An example is the failure of a company characterized by a drop in its market value due to poor management, followed by recovery due to internal restructuring. The second type is external failures, where a node’s failure is externally triggered, e.g., by the failures of its neighboring nodes. After a period of time, as its local “environment” is improved, the node is able to recover spontaneously. The time of recovery depends not only on the specific type of failure-and-recovery mechanism, i.e., whether internal or external, but also on the individual node and its position in the network. For example, for a given node in the network, it may take longer to complete an internal restructuring process to recover from a failure due to an internal than an external cause. Previous computation and mean-field analysis have revealed that cascading dynamics incorporating a failure-and-recovery mechanism can exhibit a rich variety of phenomena such as phase transitions, hysteresis, and phase flipping^{29,30,31,32,33}. 
With respect to the resilience responses of networks, the effects of removing a fraction of nodes and links on network functions were studied^{34,35,36,37}, demonstrating that resilience can be used to characterize the critical functionality of the network with applications in complex infrastructure engineering^{36,37}.

In spite of the variations in the recovery dynamics across networks or even nodes in the same network, the process can generally be classified into two distinct types: Markovian and non-Markovian. In a Markovian recovery process, an event occurs at a fixed rate and the interevent time follows an exponential distribution^{38,39,40}, rendering the process memoryless. In contrast, a non-Markovian recovery (NMR) process has memory, as the current state of a node depends not only on the most recent state but also on the previous states. In this case, the interevent time distribution is not exponential but typically exhibits a heavy tail. For example, in human activity and interaction dynamics, the occurrences of contacts among the individuals in a social network can be characteristically non-Markovian, for which there is mounting empirical evidence^{41,42,43,44,45,46,47,48}. Non-Markovian recovery processes also occur in biochemical reactions^{49} and in financial markets^{12,50}. We note that, in the context of spreading dynamics on complex networks, the effects of non-Markovian processes, owing to their high relevance to the real world, have attracted growing attention^{51,52,53,54,55,56,57,58}. From the point of view of mathematical analysis, incorporating memory into the dynamical process makes analytic treatment challenging.
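
The distinction can be made concrete with a short worked equation. Here *μ* denotes a constant recovery rate and *τ* a fixed recovery delay; both symbols are introduced formally with the models in "Results", and the fixed-delay case is used only as the simplest illustration of a process with memory. For an exponentially distributed waiting time *T* with rate *μ*,

\[
P(T > s + t \mid T > s) = \frac{e^{-\mu (s+t)}}{e^{-\mu s}} = e^{-\mu t} = P(T > t), \qquad \langle T \rangle = \frac{1}{\mu},
\]

so the time already waited, *s*, carries no information. For a deterministic delay *T* = *τ*, by contrast,

\[
P(T > s + t \mid T > s) =
\begin{cases}
1, & s + t < \tau,\\
0, & s + t \ge \tau,
\end{cases}
\qquad 0 \le s < \tau,
\]

which depends explicitly on the elapsed time *s*: the process remembers when the failure occurred.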

While the impacts of non-Markovian processes on spreading dynamics have been reasonably well documented^{51,52,53,54,55,56,57,58}, there has been little work so far addressing the influence of non-Markovian recovery processes on failure propagation dynamics. In this paper, we address this issue systematically through a comparison study of two types of dynamical processes: one with Markovian recovery and another with non-Markovian recovery. In the Markovian recovery (MR) model, failures due to internal and external causes recover at different constant rates. In the NMR model, such a constant rate cannot be defined. We thus resort to the recovery time. In particular, we assume that the nodes that failed due to internal and external causes take different times to recover, so a memory effect is naturally built into the model. For each model, we develop a mean-field theory and an analysis based on the pair approximation (PA)^{29,59,60,61,62} that retains the two-node correlation but ignores any correlation of higher orders. Comparison with numerical simulations indicates that both the mean-field theory and the PA analysis capture the key features of the failure propagation dynamics qualitatively, but the PA analysis yields results that are in better quantitative agreement with the numerics. The striking and counterintuitive finding is that a non-Markovian character with a memory effect makes the network more resilient against large-scale failures. There are two implications. Firstly, in physical, biological, or other natural networked systems, the intrinsic non-Markovian character of nodal recovery may be one reason for the resilience of these networks and their existence in a harsh environment. Secondly, in engineering and infrastructure design, incorporating certain non-Markovian features into the network may help strengthen its resilience and robustness.

## Results

### Spontaneous recovery models

For general failure propagation dynamics on a network, a node can be in one of two states: an active (labeled as *A*-type) state in which the node functions properly and an inactive state (*I*-type) in which the node has failed. To distinguish the causes for a node to become inactive, we label an inactive node due to internal or external failure as *X*-type or *Y*-type, respectively.

In the NMR model, an *A*-type node may fail spontaneously at the rate *β*_{1} to become an *X*-type node, or it may fail at the rate *β*_{2} to become a *Y*-type node when the number of its *A*-type neighboring nodes is less than or equal to a threshold integer value *m* that sets the limit on neighboring support for proper functioning of a node. Without loss of generality, we assume that external failures occur more frequently than internal failures: *β*_{1} < *β*_{2}. This is often the case as internal failures can be made less probable by building up the capability of the nodes through better equipment and/or management, while external failures are uncontrollable and more difficult to avoid. For example, falling stocks may be the result of unanticipated changes in the market rather than poor management. In a road network, failures are caused more often by congestion than by physical failures. Once a node becomes inactive, it takes time *τ*_{1} to recover from an internal failure (when the node is of the *X*-type) or time *τ*_{2} to recover from an external failure (when the node is of the *Y*-type). The non-Markovian characteristic is taken into account through the incorporation of a memory effect into the model. In particular, the nodes that recover at time *t* are those that were turned into *X*-type inactive nodes at the time *t* − *τ*_{1} and those turned into *Y*-type inactive nodes at the time *t* − *τ*_{2}. Here, we assume *τ*_{1} > *τ*_{2}, for the reason that repairing a node or restructuring the management after a malfunction of the node itself needs more time. For example, reorganizing a company or repairing a road often takes more time. The failure processes characterized by the rates *β*_{1} and *β*_{2} as well as the recovery processes as determined by *τ*_{1} and *τ*_{2} are schematically illustrated in Fig. 1.

Note that the case of *τ*_{1} < *τ*_{2} may also arise in the real world. For example, for an infrastructure network in civil engineering, when an earthquake strikes and destroys buildings (nodes), the time to rebuild can be longer than that required for recovering from internal failures, e.g., the collapse of a roof due to some material failure. Our computations of this case yield qualitatively similar results to those in the case of *τ*_{1} > *τ*_{2}—see Supplementary Note 3 for detail.

The MR and NMR models differ only in the recovery processes. In the MR model, an inactive node of the *X*-type or the *Y*-type recovers at a constant rate *μ*_{1} or *μ*_{2}, respectively, as illustrated in Fig. 1. Consequently, the number of nodes recovered at time *t* depends only on the number of inactive nodes of both *X*-type and *Y*-type at the previous time step.

To develop theories for failure propagation on networks with MR or NMR processes and to identify the key differences between the two types of dynamics, we focus on random regular networks. In the numerical simulations, we use a relatively large network size *N* = 3 × 10^{4} with the degree *k* = 35. In the NMR model, the recovery times are taken to be *τ*_{1} = 100 and *τ*_{2} = 1 for the *X*-type and *Y*-type nodes, respectively. In the MR model, the values of the recovery rates are set to be *μ*_{1} = 1/*τ*_{1} = 0.01 and *μ*_{2} = 1/*τ*_{2} = 1 so that they correspond to the same scales for the recovery times in the NMR model (see Supplementary Note 1 for a more detailed explanation). The threshold values in both models are *m* = 15. Synchronous updating is invoked in simulations with the time step Δ*t* = 0.01.
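
The update rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the reported simulations: the function names, the parameter dictionary `p`, and the integer recovery clocks are hypothetical; the graph builder (a union of random perfect matchings) is only a stand-in for a true random regular network; and checking internal failure before external failure within one time step is an arbitrary choice.

```python
import random

def random_near_regular_graph(n, k, rng):
    """Union of k random perfect matchings on n nodes (n must be even).

    Duplicate edges are dropped, so a few nodes may end up with degree
    slightly below k. Assumption: used here only as a simple stand-in
    for a random regular network.
    """
    nbrs = [set() for _ in range(n)]
    nodes = list(range(n))
    for _ in range(k):
        rng.shuffle(nodes)
        for a, b in zip(nodes[0::2], nodes[1::2]):
            nbrs[a].add(b)
            nbrs[b].add(a)
    return [sorted(s) for s in nbrs]

def step(state, clock, nbrs, p, rng, markovian):
    """One synchronous update of all nodes.

    state[i] is 'A', 'X' or 'Y'; clock[i] counts the time steps left
    until recovery and is only used in the non-Markovian (NMR) mode.
    """
    dt = p["dt"]
    new_state, new_clock = state[:], clock[:]
    for i, s in enumerate(state):
        if s == "A":
            n_active = sum(state[j] == "A" for j in nbrs[i])
            if rng.random() < p["beta1"] * dt:            # internal failure
                new_state[i] = "X"
                new_clock[i] = round(p["tau1"] / dt)
            elif n_active <= p["m"] and rng.random() < p["beta2"] * dt:
                new_state[i] = "Y"                        # external failure
                new_clock[i] = round(p["tau2"] / dt)
        elif markovian:
            tau = p["tau1"] if s == "X" else p["tau2"]
            if rng.random() < dt / tau:                   # rate mu = 1 / tau
                new_state[i] = "A"
        else:
            new_clock[i] = clock[i] - 1                   # deterministic delay
            if new_clock[i] <= 0:
                new_state[i] = "A"
    return new_state, new_clock
```

Calling `step` repeatedly with `markovian=True` or `markovian=False` on the same graph and initial state gives the MR and NMR dynamics, respectively. Initially failed nodes are assumed to wait the full recovery time, so their clocks should be set to `round(tau / dt)` before the first step.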

### Markovian recovery process

*Mean-field theory*: We start with setting up the dynamical equations for MR dynamics and comparing results with simulations. Based on the mean-field theory in “Mean-field theory for MR dynamics” of “Methods” section, we first examine the behavior of *E*_{t}([*I*]) in Eq. (4). Figure 2a shows the dependence of *E*_{t}([*I*]) on the fraction of failed nodes [*I*]. It can be seen that *E*_{t} exhibits two different types of behaviors over a large part of [*I*]: *E*_{t} ~ 0 for a wide range of small [*I*] values (low failure) and *E*_{t} ~ 1 for a range of large [*I*] values (high failure). In the low failure state, external failure events rarely occur. In the high failure state, an active node is supported by an insufficient number of active neighbors and external failure events almost always happen. It implies that the stationary state [*I*] can possess two branches: setting *E*_{t} = 0 in Eq. (5) gives [*I*] = 1 − 1/(*β*_{1}/*μ*_{1} + 1) as the low-failure branch, while setting *E*_{t} = 1 gives [*I*] = 1 − 1/(*β*_{2}/*μ*_{2} + *β*_{1}/*μ*_{1} + 1) as the high-failure branch. The two branches are shown in Fig. 2b (dashed and solid curves) in terms of the dependence of [*I*] on *β*_{1}, for *μ*_{1} = 0.01, *β*_{2} = 2, and *μ*_{2} = 1 as an example. To check which branch the system would take on and whether there are two states for some range of parameters, the simulation results for moving the value of *β*_{1} up (circles) and down (squares) are shown in Fig. 2b for comparison. As the values of *β*_{1} are increased or decreased, the initial state is taken to be the final state corresponding to the previous value of *β*_{1}—the adiabatic process. 
The results indicate that: (i) the values of [*I*] from simulations follow the two branches given by the mean-field approximation, and (ii) the low-failure (high-failure) branch is followed when moving *β*_{1} up (down) until a particular value at which there is a jump to the high-failure (low-failure) branch—the signature of a hysteresis. The results also imply that if one starts from the initial conditions [*X*]_{0} ≠ 0 and [*Y*]_{0} = 0, there exists a critical value of *β*_{c} ≈ 0.007 for a sudden increase in the number of failed nodes when [*X*]_{0} is small as the system will follow the low-failure branch. However, for large [*X*]_{0}, the critical value *β*_{c} becomes *β*_{c} ≈ 0.003 as the system will follow the high-failure branch. A plot of *β*_{c} against [*X*]_{0} will therefore exhibit two plateaus with *β*_{c} ≈ 0.007 for small [*X*]_{0} and *β*_{c} ≈ 0.003 for large [*X*]_{0}.
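
The two branches follow from a short balance argument. As a sketch, assume that in the stationary state the gain of each failed type from failures equals its loss from recovery, and let *E* denote the stationary value of *E*_{t}:

\[
\beta_{1}[A] = \mu_{1}[X], \qquad \beta_{2}E\,[A] = \mu_{2}[Y], \qquad [A] + [X] + [Y] = 1 .
\]

Eliminating [*X*] and [*Y*] gives

\[
[A] = \frac{1}{1 + \beta_{1}/\mu_{1} + E\,\beta_{2}/\mu_{2}}, \qquad [I] = 1 - [A].
\]

Taking *β*_{1} = 0.004 as an example value, with *μ*_{1} = 0.01, *β*_{2} = 2, and *μ*_{2} = 1, the limit *E* = 0 yields [*I*] = 1 − 1/1.4 ≈ 0.286 and the limit *E* = 1 yields [*I*] = 1 − 1/3.4 ≈ 0.706.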

The mean-field approximation not only simplifies the analysis but also provides insights into the dynamical process. For example, the mean-field theory suggests the ratios *β*_{1}/*μ*_{1} and *β*_{2}/*μ*_{2} as key parameters. In general, solutions can be obtained numerically by solving Eq. (5) together with Eq. (4). The results are shown in Fig. 2c as a phase diagram. For parameters falling into the regions corresponding to the low-failure (high-failure) phase, the system will evolve into a low-failure (high-failure) state. For parameters in the bistable phase, the system will evolve either to a low-failure or a high-failure state, depending on the initial conditions. The high-failure and low-failure phase boundaries meet at the critical point determined by *β*_{1}/*μ*_{1} ≈ 0.745 and *β*_{2}/*μ*_{2} ≈ 1.020.
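
A numerical solution of this kind can be sketched as follows. The code assumes a binomial closure for *E*_{t}, namely that each of the *k* neighbors of an active node is failed independently with probability [*I*]; Eq. (4) itself is not reproduced here, so this form, like the function names, is an assumption of the sketch. Stationary states are located as roots of the self-consistency condition [*I*] = 1 − 1/(1 + *β*_{1}/*μ*_{1} + *E*([*I*]) *β*_{2}/*μ*_{2}).

```python
from math import comb

K, M = 35, 15  # degree and threshold used in the text

def E(i_frac, k=K, m=M):
    """Assumed closure: probability that at most m of k neighbors are
    active when each neighbor is failed independently with prob. i_frac."""
    a = 1.0 - i_frac
    return sum(comb(k, j) * a**j * i_frac**(k - j) for j in range(m + 1))

def stationary_states(r1, r2, grid=2000):
    """Roots of I = 1 - 1/(1 + r1 + r2*E(I)) on [0, 1].

    r1 = beta1/mu1 and r2 = beta2/mu2 are the two ratios identified by
    the mean-field theory.
    """
    def f(i):
        return 1.0 - 1.0 / (1.0 + r1 + r2 * E(i)) - i

    roots = []
    for n in range(grid):
        lo, hi = n / grid, (n + 1) / grid
        if f(lo) * f(hi) < 0:          # a sign change brackets a root
            for _ in range(50):        # refine by bisection
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots
```

For example, `stationary_states(0.4, 2.0)` corresponds to *β*_{1} = 0.004, *μ*_{1} = 0.01, *β*_{2} = 2, and *μ*_{2} = 1. Three roots indicate bistability, with the smallest and largest roots being the low-failure and high-failure states and the middle root an unstable state between them; a single root indicates a monostable phase.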

In addition to the stationary state, the evolution of the system can also be studied by iterating Eqs. (1) and (2) for a given initial condition. Figure 3a shows the evolution of the MR dynamics as obtained by the mean-field theory for *β*_{1} = 0.004 and *β*_{2} = 2. In the three-dimensional space formed by [*A*], [*X*], and [*Y*], the sum rule [*A*] + [*X*] + [*Y*] = 1 defines a triangular plane, as shown in Fig. 3. At any time *t*, the state of the system is characterized by a point in the plane. The results show that the MR dynamics will evolve into either the low-failure or the high-failure state (filled circles), depending on where the system begins. The mean-field theory also gives a separatrix, the line traced out by the open circles, where the system will evolve into a different state starting from a point on a different side of the separatrix. For [*X*]_{0} > 0.38, the system will evolve to a high-failure state with ([*X*], [*Y*], [*A*]) given approximately by (0.119, 0.580, 0.301). For [*X*]_{0} < 0.38, the system may evolve to the high-failure state or a low-failure state at around (0.285, 0, 0.715). Numerical results are shown in Fig. 3b, verifying all the features predicted by the mean-field theory. For example, the high-failure state is given by ([*X*], [*Y*], [*A*]) ~ (0.124, 0.579, 0.298) and the low-failure state at around (0.287, 0, 0.713), both are quite close to the values predicted by the mean-field theory.
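
The dependence of the final state on the initial condition can be reproduced with a simple forward-Euler iteration. This sketch assumes mean-field rate equations in which each failed fraction gains from failures and loses from recovery, with *E*_{t} given by a binomial closure; the step size `dt`, the horizon `t_max`, and the function names are choices made for illustration only.

```python
from math import comb

def E(i_frac, k=35, m=15):
    """Assumed closure: P(at most m of k neighbors active) when each
    neighbor is failed independently with probability i_frac."""
    a = 1.0 - i_frac
    return sum(comb(k, j) * a**j * i_frac**(k - j) for j in range(m + 1))

def evolve_mr(x0, y0, beta1=0.004, beta2=2.0, mu1=0.01, mu2=1.0,
              dt=0.05, t_max=1500.0):
    """Forward-Euler iteration of the assumed mean-field MR equations.

    Returns the final fractions ([X], [Y], [A]).
    """
    x, y = x0, y0
    for _ in range(int(t_max / dt)):
        a = 1.0 - x - y                      # sum rule [A] + [X] + [Y] = 1
        e = E(x + y)
        x, y = (x + dt * (beta1 * a - mu1 * x),
                y + dt * (beta2 * e * a - mu2 * y))
    return x, y, 1.0 - x - y
```

Starting from `evolve_mr(0.0, 0.0)` the iteration settles on the low-failure state, whereas `evolve_mr(0.9, 0.0)` ends in the high-failure state, in line with the bistability described above.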

*Pairwise approximation theory for the MR model*: It is possible to formulate a theory that takes into account two-node spatial correlation based on the pairwise approximation (PA). The basic idea is to follow the evolution of different types of links, i.e., links that connect different pairs of neighboring nodes^{62}. The PA method has been used widely in studying epidemic and information spreading^{63,64,65}, and in coevolving voter models and adaptive games with two or more strategies^{66,67,68,69}. In “Effect of nodal correlation: pairwise approximation for the MR model” of “Methods” section, we develop a PA-based theory for the MR model.

Figure 4 presents a comparison of the predictions of the PA analysis and mean-field theory with the numerical results, where Fig. 4a shows the time evolution of [*X*]_{t} and [*Y*]_{t} from the initial state [*X*]_{0} = [*Y*]_{0} = 0 for *β*_{1} = 0.009, *β*_{2} = 2.0, *μ*_{1} = 0.01, and *μ*_{2} = 1. While both mean-field and PA theories capture the key features in time evolution, the results of PA are in better agreement with those from simulations. It is useful to understand the dynamical behaviors in the MR model qualitatively (so as to enable a meaningful comparison with those of the NMR model later). For this purpose, we identify several stages in the time evolution as marked in Fig. 4a. In the early stage, i.e., *t* ∈ [*t*_{O}, *t*_{A}], most nodes are active and they have more than *m* active neighbors, so the condition *n*_{A} ≤ *m* for external failure is not met. As a result, only internal failures occur and [*X*]_{t} grows but [*Y*]_{t} decreases and eventually vanishes. For *t* ∈ [*t*_{A}, *t*_{B}], [*X*]_{t} is large enough that active nodes start to fail into *Y*-type nodes, leading to fewer active nodes in the system and triggering more external nodal failures. This results in the observed rapid increase in [*Y*]. In the later stage *t* ∈ [*t*_{B}, *t*_{C}], there are more failed nodes than active ones. While the failed nodes of *X* and *Y* types can recover with their respective rates, the remaining or recovered active nodes will more likely fail again through external than internal causes due to the many failed nodes surrounding the active nodes. Consequently, in this later stage, [*Y*]_{t} increases and [*X*]_{t} decreases toward their respective steady-state values for *t* → *∞*, with [*Y*] > [*X*] when the system evolves into a high-failure state. The PA analysis captures the behavior of [*X*]_{t} over time and the onset of [*Y*]_{t} better than the mean-field analysis. Figure 4b shows the phase diagram for *μ*_{1} = 0.01 and *μ*_{2} = 1.0.
The mean-field phase diagram is the same as that shown in Fig. 2c, where it can be seen that the results of the PA analysis (solid curves) are indeed in better agreement with the simulation results than the predictions of the single-node mean-field theory.

Note that Fig. 2 reveals the emergence of a critical value *β*_{c} of the spontaneous failure rate beyond which the system incurs a large-scale failure starting from the initial conditions [*X*]_{0} ≠ 0 and [*Y*]_{0} = 0. The critical rate *β*_{c} is calculated by starting the system from these initial conditions for different values of *β*_{1} (for a fixed value of *β*_{2} = 2.0) and searching for the value of *β*_{1} beyond which the system attains a high-failure state (see Supplementary Fig. 1 in Supplementary Note 2). The critical value thus depends on [*X*]_{0}, the initial fraction of failed nodes due to an internal mechanism. Figure 4c shows the numerically obtained functional relation *β*_{c}([*X*]_{0}) (open circles), together with two types of theoretical prediction (PA analysis and mean-field theory). As the initial fraction [*X*]_{0} is increased from a near-zero value, *β*_{c} remains at a relatively high constant value (about 0.007). As [*X*]_{0} increases through the value of about 0.4, the value of the critical rate suddenly decreases to about 0.003. We see that, again, the prediction of the abrupt change in *β*_{c} by the PA analysis is more accurate than that by the mean-field theory.
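
The search for *β*_{c} can be sketched at the mean-field level with a bisection on *β*_{1}. The sketch below is not the procedure of Supplementary Note 2: it assumes a binomial closure for *E*_{t} (tabulated on a grid for speed), a forward-Euler integrator, and a hypothetical criterion [*Y*] > 0.25 for labeling the final state as high-failure. It also assumes that the outcome changes only once, from low-failure to high-failure, as *β*_{1} increases.

```python
from math import comb

K, M, GRID = 35, 15, 2000
# Tabulate the assumed binomial closure E([I]) once on a uniform grid.
E_TAB = [sum(comb(K, j) * (1 - g / GRID) ** j * (g / GRID) ** (K - j)
             for j in range(M + 1)) for g in range(GRID + 1)]

def reaches_high_failure(beta1, x0, beta2=2.0, mu1=0.01, mu2=1.0,
                         dt=0.05, t_max=3000.0):
    """Integrate the assumed mean-field MR equations from ([X]0, [Y]0=0)
    and report whether the final state is labeled high-failure."""
    x, y = x0, 0.0
    for _ in range(int(t_max / dt)):
        a = 1.0 - x - y
        e = E_TAB[min(GRID, max(0, round((x + y) * GRID)))]
        x, y = (x + dt * (beta1 * a - mu1 * x),
                y + dt * (beta2 * e * a - mu2 * y))
    return y > 0.25   # hypothetical classification threshold

def critical_beta1(x0, lo=0.0, hi=0.02, iters=12):
    """Bisection estimate of beta_c for a given [X]0.

    Returns None if [lo, hi] does not straddle the transition.
    """
    if reaches_high_failure(lo, x0) or not reaches_high_failure(hi, x0):
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reaches_high_failure(mid, x0):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Evaluating `critical_beta1(x0)` over a range of `x0` values traces out a mean-field estimate of the curve *β*_{c}([*X*]_{0}). Close to the transition the relaxation is slow, so a finite `t_max` biases the estimate slightly.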

What is the physical meaning of the abrupt decrease in the critical value of the spontaneous failure rate as displayed in Fig. 4c? A higher value of *β*_{c} means that the network system is more resilient to large-scale failures as it requires a larger rate value to drive the system into a high-failure state. As the fraction of initially failed nodes is increased, the network as a whole is more prone to large-scale failure so we expect the value of *β*_{c} to decrease. Because of the lack of any memory effect in the ideal, Markovian type of recovery process, i.e., after a node fails, it either recovers instantaneously or does not recover (with probabilities determined by the rate of recovery), we expect a characteristic change in the system dynamics as characterized by the value of the critical rate *β*_{c} to occur in an abrupt manner. Indeed, as Fig. 4c reveals, as the fraction of initially failed nodes is increased through a threshold value, there is a sudden decrease of about 50% in the value of *β*_{c}, giving rise to a first-order type of transition. This behavior of abrupt transition may not occur in reality because of the assumed Markovian recovery process, which is ideal and cannot be expected to arise typically in the physical world. In the next section, we will demonstrate that making the dynamics more physical by assuming non-Markovian type of recovery process will drastically alter the picture of transition in Fig. 4c.

### Non-Markovian recovery process

To analyze failure propagation dynamics in systems with NMR, a viable approach is to construct difference equations that relate the fractions of types of nodes and links at time *t* + Δ*t* to those at time *t*. It is necessary to keep track of the time when a node becomes the *X* or *Y* type as well as the time at which a link becomes type UV. In “Pairwise approximation theory for the NMR model” of “Methods” section, we develop a PA analysis for the NMR model. Figure 5 shows the simulation results from the NMR model, together with predictions of the PA analysis and mean-field approximation for Δ*t* = 0.01. The time evolution of [*X*]_{t} and [*Y*]_{t} is shown for the parameter setting *β*_{1} = 0.009, *β*_{2} = 2.0, *τ*_{1} = 100 (thus *μ*_{1} = 0.01), and *τ*_{2} = 1 (thus *μ*_{2} = 1). The initial conditions are [*X*]_{0} = [*Y*]_{0} = 0. Both theories capture the key features of the dynamics. Comparing with results from the MR model [e.g., Fig. 4a], we see that the time evolution of the dynamical variables in the NMR model is different from that in the MR model, in spite of the approximately identical steady-state values.

To describe the key features of the NMR model, we divide the evolution into five stages with the respective time intervals [*t*_{O}, *t*_{A}], [*t*_{A}, *t*_{B}], [*t*_{B}, *t*_{C}], [*t*_{C}, *t*_{D}], and [*t*_{D}, *t*_{E}], as shown in Fig. 5a. In the earliest stage [*t*_{O}, *t*_{A}], [*X*]_{t} increases due to internal failures but [*X*]_{t} is insufficient to cause external failures. The behavior is similar to that in the MR model, but the duration is shorter and the rise in [*X*]_{t} is steeper in the NMR model. The reason is that the memory effect in the NMR model allows the recovery of *X*-type nodes to take place only after the time *τ*_{1}, while the recovery occurs probabilistically in the MR model. In the narrow time window of [*t*_{A}, *t*_{B}], [*X*]_{t} attains a level high enough to trigger the onset of many external failures. As a result, the failed nodes constitute the majority in the system and [*A*]_{t} decreases sharply, giving rise to the sharp increase in [*Y*]_{t}. The *Y*-type nodes recover deterministically after *τ*_{2} (*τ*_{2} < *τ*_{1}) into active nodes. In the period [*t*_{B}, *t*_{C}], the recovery of *Y*-type nodes refuels the system with active nodes that can participate in two paths: more internal and external failures. For *t*_{C} < *τ*_{1}, the existing *X*-type nodes have yet to recover and [*X*]_{t} continues to increase but at a slower pace due to the external failure path, while [*A*]_{t} reduces slightly.

In the time window [*t*_{C}, *t*_{D}], the initial internally failed nodes begin to recover as *t*_{C} > *τ*_{1}, in addition to the recovery of the *Y*-type nodes. The *A*-type nodes due to recovery will be more likely to become *Y*-type as the failed nodes remain the majority (due to the parameter setting *β*_{2} > *β*_{1} in this example). This leads to the observed increase in [*Y*]_{t} and decrease in [*X*]_{t} in the time interval [*t*_{C}, *t*_{D}]. In the final stage [*t*_{D}, *t*_{E}], [*X*]_{t} stops decreasing because the *X*-type nodes recovering at the time *t* ≳ *t*_{D} are those that failed internally at *t* ≳ *t*_{B}, whose number was small. However, the recovery of *Y*-type nodes at a shorter time scale supplies fresh active nodes. The fraction of failed nodes [*X*]_{t} + [*Y*]_{t} is so high, i.e., approaching the high-failure state, that the dynamics lead to a higher steady value of [*Y*] than [*X*] in the long run. For time well beyond *t*_{D}, both [*X*] and [*Y*] become steady.

Figure 5b shows the phase diagram of the NMR model analogous to Fig. 4b for the MR model, with *μ*_{1} = 0.01 and *μ*_{2} = 1.0. The results of the PA analysis (solid curve) are in better agreement with the simulation results than those obtained from the mean-field theory (dashed curve). The difference in dynamics in the NMR model also alters the dependence of *β*_{c} to sustain a high-failure state on [*X*]_{0}. Carrying out the same analysis as for the MR model (see Supplementary Fig. 1 in Supplementary Note 2 for details), we get the relationship *β*_{c}([*X*]_{0}) for attaining a high-failure state for a given initial condition, as shown in Fig. 5c. The pair approximation, again, gives more accurate prediction than that from the mean-field theory.

The result in Fig. 5c demonstrates the striking effect of non-Markovian recovery with memory on the failure propagation dynamics, which is in stark contrast to the ideal case of a Markovian process as exemplified in Fig. 4c. In particular, as the fraction [*X*]_{0} of initially failed nodes is increased from a near-zero value to one, the value of *β*_{c} begins to decrease continuously and smoothly until it reaches a minimum, after which *β*_{c} increases relatively more rapidly to a high value of about 0.006 for [*X*]_{0} ≈ 0.3. For [*X*]_{0} > 0.3, the value of *β*_{c} remains approximately constant at 0.006. Comparing Fig. 5c with Fig. 4c, we see two major, characteristic differences. Firstly, the abrupt decrease in the Markovian case is replaced by a gradual process in the non-Markovian case, essentially converting a first-order-like process into a second-order one. Secondly and more importantly, *β*_{c} recovers from its minimum value and remains at a high value regardless of the value of [*X*]_{0} insofar as it exceeds about 30%. This means that the system can maintain its degree of resilience even when the initial fraction of failed nodes reaches 100%! This contrasts squarely with the behavior in the Markovian case, where the system resilience is reduced dramatically even when only about 40% of the nodes failed initially. In this sense, we say that a non-Markovian type of memory effect makes the network system more resilient against failure propagation.

While the behavior in Fig. 5c is counterintuitive, a heuristic reason is as follows. For an initial state with many initial *X*-type nodes, the few remaining nodes will switch from being active to the *Y*-type and back. All the initial *X*-type nodes will have to wait for the time period *τ*_{1} to recover. At that time, the system becomes one with only a few failed nodes—effectively equivalent to one with small [*X*]_{0} value and requiring a larger *β*_{c} value to evolve into the high-failure state. In a range of small [*X*]_{0}, a smaller *β*_{c} can already cause more active nodes to become *Y*-type, helping maintain the system in a high-failure state as described for Fig. 5a. Theoretical support for the behavior is provided by the PA analysis and mean-field theory, as shown in Fig. 5c.
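
The heuristic argument can be illustrated with a delay-based mean-field sketch. The code below is not the theory developed in "Methods": it assumes a binomial closure for the external-failure probability, treats recovery in the NMR case as a queue in which every newly failed fraction is released exactly *τ*_{1} or *τ*_{2} later, and lets the initially failed *X*-type fraction wait the full *τ*_{1}. All function names and the step size are choices made for illustration.

```python
from collections import deque
from math import comb

K, M, GRID = 35, 15, 2000
# Assumed binomial closure for the external-failure probability, tabulated.
E_TAB = [sum(comb(K, j) * (1 - g / GRID) ** j * (g / GRID) ** (K - j)
             for j in range(M + 1)) for g in range(GRID + 1)]

def e_of(i_frac):
    return E_TAB[min(GRID, max(0, round(i_frac * GRID)))]

def evolve_mr(x0, beta1=0.004, beta2=2.0, mu1=0.01, mu2=1.0,
              dt=0.05, t_max=1500.0):
    """Markovian recovery: failed fractions decay at constant rates."""
    x, y = x0, 0.0
    for _ in range(int(t_max / dt)):
        a = 1.0 - x - y
        x, y = (x + dt * (beta1 * a - mu1 * x),
                y + dt * (beta2 * e_of(x + y) * a - mu2 * y))
    return x, y, 1.0 - x - y

def evolve_nmr(x0, beta1=0.004, beta2=2.0, tau1=100.0, tau2=1.0,
               dt=0.05, t_max=1500.0):
    """Non-Markovian recovery: each failed fraction recovers after a
    fixed delay (tau1 for X-type, tau2 for Y-type)."""
    n1, n2 = round(tau1 / dt), round(tau2 / dt)
    # Queues of scheduled recoveries; the initial X mass waits the full tau1.
    qx = deque([0.0] * (n1 - 1) + [x0])
    qy = deque([0.0] * n2)
    x, y = x0, 0.0
    for _ in range(int(t_max / dt)):
        a = 1.0 - x - y
        new_x = beta1 * a * dt
        new_y = beta2 * e_of(x + y) * a * dt
        qx.append(new_x)
        qy.append(new_y)
        x += new_x - qx.popleft()   # failures in, delayed recoveries out
        y += new_y - qy.popleft()
    return x, y, 1.0 - x - y
```

With [*X*]_{0} = 0.9 and *β*_{1} = 0.004, `evolve_mr(0.9)` ends in the high-failure state, while `evolve_nmr(0.9)` returns to the low-failure state once the initially failed nodes recover together at *t* ≈ *τ*_{1}, which is the mechanism described in the heuristic argument above.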

In addition to the different time evolution in the MR and NMR models, there are also cases where the same initial conditions [*X*]_{0}, [*Y*]_{0}, and [*A*]_{0} lead to different final states. Figure 6 shows the final states starting from any [*X*]_{0} and [*Y*]_{0} in the [*X*]_{0} − [*Y*]_{0} plane (the basin structure), with *β*_{1} = 0.004, *β*_{2} = 2.0, *μ*_{1} = 0.01, and *μ*_{2} = 1.0. The results from the mean-field theory (Fig. 6a) and direct simulations (Fig. 6b) show essentially the same features. (Results from the initial-condition setting [*X*]_{0} ≠ 0 and [*Y*]_{0} = 0.0 are presented in Supplementary Fig. 2 of Supplementary Note 2.) It is useful to contrast the final states of the MR and NMR models. From Fig. 3, an initial state, e.g., [*X*]_{0} = [*Y*]_{0} = 0.5, will evolve into a high-failure state in the MR model, but it will end up in a low-failure state in the NMR model. This means that the NMR process can make the system more resilient to failures. (More examples can be found in Supplementary Fig. 3 of Supplementary Note 2 where different steady states from the two models are presented.)

### MR and NMR dynamics on heterogeneous networks

So far our analysis and simulations have been carried out for MR and NMR dynamics on random regular networks. We find that altering the network structure causes little change in the qualitative results. For example, we have carried out simulations on scale-free networks of size *N* = 3 × 10^{4} with degree range \(\left[k_{\rm min},\sqrt{N}\right]\) and degree distribution *P*(*k*) ~ *k*^{−γ}. Figure 7 shows the results of *β*_{c} versus [*X*]_{0} for the MR and NMR dynamics for networks with *γ* = 3. Because of the heterogeneity in the nodal degree distribution, the threshold on external failure is given in terms of the fraction of failed neighbors, set to one-half.
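
As a rough sketch of this setup, the snippet below samples a power-law degree sequence on the stated range and applies a fractional threshold. Several details are assumptions rather than the procedure used here: the value of `k_min` passed by the caller is hypothetical, the degrees are drawn independently without constructing the network, and the threshold is read as "at least one-half of the neighbors have failed".

```python
import random
from math import isqrt

def sample_degrees(n, gamma, k_min, rng):
    """Draw n degrees from P(k) ~ k**(-gamma) on [k_min, floor(sqrt(n))]."""
    k_max = isqrt(n)
    ks = list(range(k_min, k_max + 1))
    weights = [k ** -gamma for k in ks]
    return rng.choices(ks, weights=weights, k=n)

def externally_vulnerable(n_failed, degree, frac=0.5):
    """Assumed fractional rule: a node can fail externally when at least
    a fraction frac of its neighbors have failed."""
    return degree > 0 and n_failed >= frac * degree
```

A call such as `sample_degrees(30000, 3.0, 3, random.Random(0))` gives a degree sequence that could then be passed to a configuration-model style generator.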

Comparing results with Fig. 4c for MR dynamics and Fig. 5c for NMR dynamics in random regular networks, we see that the key features are similar when the underlying random regular networks are replaced by scale-free networks. We have also carried out numerical simulations on four additional types of synthetic and empirical networks: (a) networks with degree–degree correlation, (b) networks with a community structure, (c) empirical arenas-email network, and (d) empirical friendship-hamster network, with results presented in Supplementary Notes 4 and 5 for the former and latter two cases, respectively. These results, together with Fig. 7, suggest that, for heterogeneous networks, a non-Markovian process tends to enhance the network resilience against large-scale failures.

## Discussion

The intrinsic memory effect associated with non-Markovian processes makes it challenging to analyze the underlying network dynamics, but new and surprising phenomena can arise. Most previous studies treated Markovian processes through either a mean-field type of theory^{60,61} or an effective degree approach^{59}. For non-Markovian processes, the mean-field approximation can still be applied^{29,31,32,33}, but it is necessary to invoke a higher-order theory such as the PA analysis. Our work presents such an example in the context of failure propagation in complex networks.

Our study has demonstrated that, in both models, the network can evolve into a low-failure or a high-failure state, with the latter corresponding to the undesired state of large-scale failure. Both the mean-field and PA theories are capable of predicting the dynamical behaviors of failure propagation, and the performances of the theories are gauged by simulation results, revealing that the more laborious pair approximation gives results in better quantitative agreement with the numerics. Our systematic computations on different complex networks and two types of theoretical analyses have uncovered a striking phenomenon: the non-Markovian memory effect in the nodal recovery can counter-intuitively make the network more resilient against large-scale failures.

Our finding also calls for the incorporation of non-Markovian memory into the design of communication, computer, and infrastructure networks in various engineering disciplines. We hope our work will stimulate interest in examining and exploiting non-Markovian processes in various network dynamical processes. We have also carried out a systematic study of the effects of Markovian versus non-Markovian recovery on network synchronization using the paradigmatic Kuramoto network model, with the main finding that non-Markovian recovery makes the network more resilient against large-scale breakdown of synchronization (Supplementary Note 6).

## Methods

### Mean-field theory for MR dynamics

Let [*A*]_{t}, [*X*]_{t}, and [*Y*]_{t} be the fractions of *A*-type, *X*-type, and *Y*-type nodes in the system at time *t*, respectively. A hierarchical set of dynamical equations for the MR model can be constructed to include increasingly longer spatial correlation. The equations for the evolution of the fractions of different types of nodes are

\[\frac{\mathrm{d}[X]_t}{\mathrm{d}t}=\beta_1[A]_t-\mu_1[X]_t \tag{1}\]

and

\[\frac{\mathrm{d}[Y]_t}{\mathrm{d}t}=\beta_2[A]_tE_t-\mu_2[Y]_t, \tag{2}\]

where the first term in each equation gives the supply to [*X*] ([*Y*]) due to internal (external) failures and the second term represents the drop in [*X*] ([*Y*]) due to recovery. Note that, because of the relation

\[[A]_t+[X]_t+[Y]_t=1, \tag{3}\]

an equation for [*A*]_{t} is unnecessary. The quantity *E*_{t} is the probability that an *A*-type node has *j* ≤ *m* *A*-type neighbors at time *t*, in which case the node fails externally at the rate *β*_{2}.

In general, the quantity *E*_{t} involves the correlation between two neighboring nodes. To connect Eqs. (1) and (2) so as to retain the simplicity of a single-node theory, we use the approximation

\[E_t\approx\sum_{j=0}^{m}C_k^{k-j}\,[I]_t^{\,k-j}\left(1-[I]_t\right)^{j}, \tag{4}\]

where [*I*]_{t} = [*X*]_{t} + [*Y*]_{t} is the fraction of failed nodes and \({C}_{{k}}^{{k-j}}=k!/(j!(k-j)!)\). Equations (1)–(4) form a closed set of equations, from which the fractions of different types of nodes can be solved. This is the simplest single-site mean-field approximation for the MR dynamics that ignores any spatial correlation. Despite its simplicity, it is capable of revealing the key features in the stationary state, in which Eqs. (1) and (2) require the fraction of failed nodes [*I*] to satisfy

\[[I]=\left(1-[I]\right)\left(\frac{\beta_1}{\mu_1}+\frac{\beta_2}{\mu_2}E\right), \tag{5}\]

which can be solved for [*I*] self-consistently with Eq. (4). Equation (5) implies that [*I*] depends only on the ratios *β*_{1}/*μ*_{1} and *β*_{2}/*μ*_{2} within the mean-field approximation, and so do the other fractions [*A*], [*X*], and [*Y*].
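The mean-field description above can be explored numerically. The following is a minimal sketch, assuming a forward-Euler integration of the two rate equations for [*X*] and [*Y*] with the binomial approximation for *E*; the parameter values (*k* = 10, *m* = 4, *β*_{1} = 0.02, *β*_{2} = 5, *μ*_{1} = *μ*_{2} = 1) and all function names are hypothetical and chosen only to exhibit a low-failure and a high-failure stationary state.

```python
from math import comb


def external_exposure(a, k, m):
    """Mean-field E: probability of at most m active neighbors out of k."""
    return sum(comb(k, j) * a ** j * (1.0 - a) ** (k - j) for j in range(m + 1))


def mean_field_steady_state(beta1, beta2, mu1, mu2, k, m,
                            x0=0.0, y0=0.0, dt=0.005, steps=20000):
    """Integrate d[X]/dt and d[Y]/dt by forward Euler and return ([X], [Y])."""
    x, y = x0, y0
    for _ in range(steps):
        a = 1.0 - x - y  # active fraction from normalization
        e = external_exposure(a, k, m)
        x, y = (x + dt * (beta1 * a - mu1 * x),
                y + dt * (beta2 * a * e - mu2 * y))
    return x, y


def stationarity_residual(x, y, beta1, beta2, mu1, mu2, k, m):
    """[I] - (1 - [I]) (beta1/mu1 + beta2 E / mu2); zero in a stationary state."""
    i = x + y
    e = external_exposure(1.0 - i, k, m)
    return i - (1.0 - i) * (beta1 / mu1 + beta2 * e / mu2)


if __name__ == "__main__":
    params = dict(beta1=0.02, beta2=5.0, mu1=1.0, mu2=1.0, k=10, m=4)
    for x0, y0 in [(0.0, 0.0), (0.0, 0.9)]:  # mostly active vs mostly failed start
        x, y = mean_field_steady_state(x0=x0, y0=y0, **params)
        print(x0, y0, x + y, stationarity_residual(x, y, **params))
```

Starting from a mostly active network and from a mostly failed one selects different branches of the self-consistency condition, which is one way to see the coexistence of the two stationary states within this approximation.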

### Effect of nodal correlation: pairwise approximation for the MR model

Our PA-based analysis begins by defining [*U**V*]_{t} as the fraction of *U**V*-type links in the system at time *t*, where *U*, *V* ∈ {*A*, *X*, *Y*}. Each link emanating from a node can be classified by type. For example, for a node whose current state is *A*, each of its links is of the *A**A*, *AX*, or *A**Y* type, depending on the state of the node at the other end of the link. Taking into account every link from every node, we have that the fractions of links satisfy

\[[AA]_t+[XX]_t+[YY]_t+2[AX]_t+2[AY]_t+2[XY]_t=1, \tag{6}\]

with [*U**V*]_{t} = [*V**U*]_{t} for *U* ≠ *V*.

In general, the equations of single-node quantities, e.g., Eq. (2), necessarily involve quantities of more extensive spatial correlation because of the interplay between the failure of a node and the states of its neighboring nodes. Since [*A**I*]_{t}/[*A*]_{t} = ([*AX*]_{t} + [*A**Y*]_{t})/[*A*]_{t} is the probability that a neighbor of an *A*-type node is inactive, regardless of whether it is of *X* or *Y* type, the probability that there are exactly *j* neighbors of *A*-type and (*k* − *j*) inactive neighbors of either *X* or *Y* type is

\[C_k^{k-j}\left(\frac{[AI]_t}{[A]_t}\right)^{k-j}\left(1-\frac{[AI]_t}{[A]_t}\right)^{j}, \tag{7}\]

where *k* is the degree of the node. The quantity *E*_{t} in Eq. (2), as schematically depicted in Fig. 8a, is thus given by

\[E_t=\sum_{j=0}^{m}C_k^{k-j}\left(\frac{[AI]_t}{[A]_t}\right)^{k-j}\left(1-\frac{[AI]_t}{[A]_t}\right)^{j}, \tag{8}\]

which indicates explicitly that the dynamics of single-node quantities are governed by the two-node quantity [*A**I*]_{t}. This is reminiscent of the BBGKY (Bogoliubov-Born-Green-Kirkwood-Yvon) hierarchy of equations for the distribution functions in a system consisting of a large number of interacting particles in statistical physics^{70}. Only under the approximation [*A**I*]_{t} ≈ [*A*]_{t}[*I*]_{t} (so that the two-node correlation can be neglected) does Eq. (8) reduce to Eq. (4), which closes the set of single-node mean-field equations.

To proceed, we derive the dynamical equations for [*U**V*]_{t} that will in general involve more extensive spatial correlation. For example, a link of the type *A**A* would evolve into a different type depending on the neighborhoods of the two nodes, effectively a small cluster of nodes. To develop a manageable approximation, we retain the two-node correlation and decouple any longer spatial correlation in terms of one-node and two-node functions. This is the idea behind PA for obtaining a closed set of equations. In particular, the dynamical equations for [*AX*]_{t} and [*A**A*]_{t} are

\[\frac{\mathrm{d}[AX]_t}{\mathrm{d}t}=\beta_1[AA]_t+\mu_1[XX]_t+\mu_2[XY]_t-\left(\mu_1+\beta_1+\beta_2E^{\prime}_t\right)[AX]_t \tag{9}\]

and

\[\frac{\mathrm{d}[AA]_t}{\mathrm{d}t}=2\mu_1[AX]_t+2\mu_2[AY]_t-2\left(\beta_1+\beta_2E^{\prime\prime}_t\right)[AA]_t, \tag{10}\]

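where

\[E^{\prime}_t=\sum_{j=0}^{m}C_{k-1}^{k-1-j}\left(\frac{[AI]_t}{[A]_t}\right)^{k-1-j}\left(1-\frac{[AI]_t}{[A]_t}\right)^{j} \tag{11}\]

is the probability of an *A*-type node having *j* ≤ *m* *A*-type neighbors among its (*k* − 1) neighbors, given that one neighbor is inactive, and

\[E^{\prime\prime}_t=\sum_{j=0}^{m-1}C_{k-1}^{k-1-j}\left(\frac{[AI]_t}{[A]_t}\right)^{k-1-j}\left(1-\frac{[AI]_t}{[A]_t}\right)^{j} \tag{12}\]
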
is the probability of an *A*-type node having *j* ≤ *m* − 1 *A*-type neighbors among its (*k* − 1) neighbors, given that one neighbor is active. Figure 8 illustrates the meanings of *E*_{t}, \({E}_{{t}}^{\prime}\), and \({E}_{{t}}^{^{\prime\prime} }\) schematically. The terms in Eqs. (9) and (10) account for how the recovery and failure processes affect the fractions of *AX*-type and *A**A*-type links. The complete set of dynamical equations is listed in Supplementary Note 1. Given an initial condition, it can be solved iteratively to yield the time evolution of the fractions of the node and link types. The steady-state quantities can be obtained through a sufficiently large number of iterations.
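To make the iterative solution concrete, the following is a minimal sketch of a pair-approximation integrator for the MR model on a *k*-regular network. The full equation set is in Supplementary Note 1 and is not reproduced here; the six link equations in `pa_rhs` are instead written down under the assumption that each node fails internally at rate *β*_{1}, fails externally at rate *β*_{2}*E*′ or *β*_{2}*E*″ (depending on whether the neighbor across the link is inactive or active), and recovers at rate *μ*_{1} or *μ*_{2}, with link fractions counted over ordered pairs. Parameter values and function names are hypothetical.

```python
from math import comb


def at_most_active(q, n, j_max):
    """P(at most j_max active among n neighbors), each inactive with prob. q."""
    return sum(comb(n, j) * (1.0 - q) ** j * q ** (n - j)
               for j in range(min(j_max, n) + 1))


def pa_rhs(state, beta1, beta2, mu1, mu2, k, m):
    """Time derivatives of ([AA], [AX], [AY], [XX], [XY], [YY])."""
    aa, ax, ay, xx, xy, yy = state
    a = aa + ax + ay                      # [A] on a regular network
    q = (ax + ay) / a if a > 0 else 1.0   # [AI]/[A]
    e1 = at_most_active(q, k - 1, m)      # E': known neighbor is inactive
    e2 = at_most_active(q, k - 1, m - 1)  # E'': known neighbor is active
    d_aa = 2 * mu1 * ax + 2 * mu2 * ay - 2 * (beta1 + beta2 * e2) * aa
    d_ax = beta1 * aa + mu1 * xx + mu2 * xy - (mu1 + beta1 + beta2 * e1) * ax
    d_ay = beta2 * e2 * aa + mu1 * xy + mu2 * yy - (mu2 + beta1 + beta2 * e1) * ay
    d_xx = 2 * beta1 * ax - 2 * mu1 * xx
    d_xy = beta1 * ay + beta2 * e1 * ax - (mu1 + mu2) * xy
    d_yy = 2 * beta2 * e1 * ay - 2 * mu2 * yy
    return d_aa, d_ax, d_ay, d_xx, d_xy, d_yy


def integrate_pa(x0, y0, beta1, beta2, mu1, mu2, k, m, dt=0.005, steps=20000):
    """Forward-Euler iteration from an uncorrelated start, [UV] = [U][V]."""
    a0 = 1.0 - x0 - y0
    state = (a0 * a0, a0 * x0, a0 * y0, x0 * x0, x0 * y0, y0 * y0)
    for _ in range(steps):
        rhs = pa_rhs(state, beta1, beta2, mu1, mu2, k, m)
        state = tuple(s + dt * d for s, d in zip(state, rhs))
    return state


def inactive_fraction(state):
    """[I] = 1 - [A], with [A] = [AA] + [AX] + [AY]."""
    return 1.0 - (state[0] + state[1] + state[2])


if __name__ == "__main__":
    params = dict(beta1=0.02, beta2=5.0, mu1=1.0, mu2=1.0, k=10, m=4)
    for x0, y0 in [(0.0, 0.0), (0.0, 0.9)]:
        print(x0, y0, inactive_fraction(integrate_pa(x0, y0, **params)))
```

Two properties serve as sanity checks on the assumed closure: the weighted sum of link fractions stays equal to one, and summing the *A**A*, *AX*, and *A**Y* equations recovers the single-node equation for [*A*] because *E* = (1 − *q*)*E*″ + *q**E*′ with *q* = [*A**I*]/[*A*].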

### Pairwise approximation theory for the NMR model

For the NMR model, we let \({[{U}^{l}]}_{{t}}\) be the fraction of nodes of type *U* at time *t* that became type *U* from some other type *l* time steps ago, and \({[{U}^{{l}_{1}}{V}^{{l}_{2}}]}_{{t}}\) be the fraction of *U**V*-type links whose end nodes became the labeled types *l*_{1} and *l*_{2} time steps ago, respectively. The time evolution of the fraction of *X*-type nodes in the NMR model is given by

The first line in Eq. (13) gives the new supply due to internal failure of *A*-type nodes in the time duration [*t*, *t* + Δ*t*). The second line accounts for the nodes which were inactive for a duration *l* − Δ*t* at time *t* but have not reached the time for recovery at time *t* + Δ*t*. The third line states that all *X*-type nodes that came to existence *τ*_{1} earlier have been recovered. Similarly, the time evolution of the fraction of *Y*-type nodes is given by

where *E*_{t} is defined in Eq. (8) and [*A**I*]_{t} = [*AX*]_{t} + [*A**Y*]_{t}. The fractions of *X*-type and *Y*-type nodes, regardless of how long they have been in the corresponding state, are given by \({[X]}_{{t}}=\mathop{\sum }\nolimits_{l = 0}^{{\tau }_{1}}{[{X}^{l}]}_{{t}}\) and \({[Y]}_{{t}}=\mathop{\sum }\nolimits_{l = 0}^{{\tau }_{2}}{[{Y}^{l}]}_{{t}}\), respectively. The fraction of active nodes follows from [*A*]_{t} = 1 − [*X*]_{t} − [*Y*]_{t}.

To develop a PA analysis for failure propagation dynamics with NMR, we construct the equations for the time evolution of *U**V*-type links and retain spatial correlation up to two neighboring nodes. Our derivation of the counterparts of Eqs. (13) and (14) in the MR case suggests the necessity to examine the history of the inactive node(s) associated with a link. For example, the time evolution of the links in \({[A{X}^{l}]}_{{t}}\) is governed by

where \(E^{\prime}\) is defined in Eq. (11). The first line represents the new supply to *AX*-type links due to an internal failure in one of the active nodes associated with a link of the *A**A*-type, and an internal failure together with a recovery of an inactive node in links of the *X**A* and *Y**A* types. The second line includes the supply to *AX*^{l}-type links due to recoveries from *X**X* and *Y**X* types as well as the links of *AX*^{l−Δt} type that became *AX*^{l} type in the recent duration Δ*t*. The last line comes from the fact that an *X*-type node must recover after a time *τ*_{1} since it became inactive. The fraction of links of *AX*-type, regardless of how long the inactive node in the link has been in the *X* state, is given by \({[AX]}_{{t}}=\mathop{\sum }\nolimits_{l = 0}^{{\tau }_{1}}{[A{X}^{l}]}_{{t}}\). The fraction of *A**A*-type links then evolves in time as

where \({E}_{t}^{^{\prime\prime} }\) is defined in Eq. (12). Equations for other types of links can also be constructed (Supplementary Note 1). Equations (15) and (16) are analogous to Eqs. (9) and (10) in the MR model. The number of equations is determined by the divisions of *τ*_{1} and *τ*_{2} into the small time steps Δ*t*, which increases rapidly when *Δ**t* is small compared with the other time scales in the NMR dynamics.

A crude approximation analogous to the mean-field theory can be developed for the NMR model by retaining only the fractions of nodes in the equations, which can be done by decoupling the two-node quantities such as [*A**I*]_{t} by [*A**I*]_{t} ≈ [*A*]_{t}[*I*]_{t}. The resulting equations governing the fractions of different types of nodes become

and

where *E*_{t} takes on the approximate form in Eq. (4). Equations (17), (18), and (4) form a set of equations that can be solved to yield the fractions of different types of nodes. The first two terms in Eqs. (17) and (18) correspond to the increase in inactive nodes due to failure and due to those remaining inactive, and the last term corresponds to recovery. The number of equations, again, depends on the choice of *Δ**t*. This is the mean-field approximation for the NMR model that ignores any spatial correlation.
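The bookkeeping behind such a mean-field iteration can be sketched as follows. This is not a transcription of Eqs. (17) and (18); it is a minimal illustration that assumes inactive nodes are stored in age bins of width Δ*t*, that new failures in a step are *β*_{1}Δ*t*[*A*] (internal) and *β*_{2}Δ*t*[*A*]*E* (external), and that a node recovers once it has been inactive for *τ*_{1} or *τ*_{2}. Parameter values and names are hypothetical.

```python
from collections import deque
from math import comb


def external_exposure(a, k, m):
    """Mean-field E: probability of at most m active neighbors out of k."""
    return sum(comb(k, j) * a ** j * (1.0 - a) ** (k - j) for j in range(m + 1))


def nmr_mean_field(beta1, beta2, tau1, tau2, k, m, dt=0.01, t_max=50.0):
    """Iterate age-binned fractions of X- and Y-type nodes; return ([X], [Y])."""
    n1, n2 = round(tau1 / dt), round(tau2 / dt)  # number of age bins
    x_bins = deque([0.0] * n1)  # leftmost bin holds the oldest failures
    y_bins = deque([0.0] * n2)
    for _ in range(round(t_max / dt)):
        a = 1.0 - sum(x_bins) - sum(y_bins)  # active fraction
        e = external_exposure(a, k, m)
        # Nodes that have been inactive for the full delay recover now.
        x_bins.popleft()
        y_bins.popleft()
        # Newly failed nodes enter the youngest bin.
        x_bins.append(beta1 * dt * a)
        y_bins.append(beta2 * dt * a * e)
    return sum(x_bins), sum(y_bins)


if __name__ == "__main__":
    x, y = nmr_mean_field(beta1=0.02, beta2=5.0, tau1=1.0, tau2=1.0, k=10, m=4)
    print(x, y, 1.0 - x - y)
```

In a stationary state of this sketch every age bin holds the same amount, so [*X*] = *β*_{1}*τ*_{1}[*A*]; the number of stored variables is *τ*_{1}/Δ*t* + *τ*_{2}/Δ*t*, which grows as Δ*t* is reduced.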

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

The source data underlying Figs. 2–7 and Supplementary Figs. 1–12 are available at https://github.com/zhlin2328/Codes-for-NCOMMS-19-1125220.

## Code availability

C++ codes to reproduce the data in the main text and the Supplementary Information are available at https://github.com/zhlin2328/Codes-for-NCOMMS-19-1125220.

## References

1. Motter, A. E. & Lai, Y.-C. Cascade-based attacks on complex networks. *Phys. Rev. E* **66**, 065102 (2002).
2. Zhao, L., Park, K. & Lai, Y.-C. Attack vulnerability of scale-free networks due to cascading breakdown. *Phys. Rev. E* **70**, 035101(R) (2004).
3. Zhao, L., Park, K., Lai, Y.-C. & Ye, N. Tolerance of scale-free networks against attack-induced cascades. *Phys. Rev. E* **72**, 025104(R) (2005).
4. Galstyan, A. & Cohen, P. Cascading dynamics in modular networks. *Phys. Rev. E* **75**, 036109 (2007).
5. Bialek, J. W. Why has it happened again? Comparison between the UCTE blackout in 2006 and the blackouts of 2003. In *Power Tech 2007 IEEE Lausanne* 51–56 (IEEE, 2007).
6. Dobson, I., Carreras, B. A., Lynch, V. E. & Newman, D. E. Complex systems analysis of series of blackouts: Cascading failure, critical points, and self-organization. *Chaos* **17**, 026103 (2007).
7. Gleeson, J. P. Cascades on correlated and modular random networks. *Phys. Rev. E* **77**, 046117 (2008).
8. Rosato, V. et al. Modelling interdependent infrastructures using interacting dynamical models. *Int. J. Crit. Infrastruct.* **4**, 63–79 (2008).
9. Huang, L., Lai, Y.-C. & Chen, G. Understanding and preventing cascading breakdown in complex clustered networks. *Phys. Rev. E* **78**, 036116 (2008).
10. Simonsen, I., Buzna, L., Peters, K., Bornholdt, S. & Helbing, D. Transient dynamics increasing network vulnerability to cascading failures. *Phys. Rev. Lett.* **100**, 218701 (2008).
11. Yang, R., Wang, W.-X., Lai, Y.-C. & Chen, G. Optimal weighting scheme for suppressing cascades and traffic congestion in complex networks. *Phys. Rev. E* **79**, 026112 (2009).
12. Takayasu, M., Watanabe, T. & Takayasu, H. *Econophysics Approaches to Large-Scale Business Data and Financial Crisis: Proceedings of Tokyo Tech-Hitotsubashi Interdisciplinary Conference and APFA7* (Springer, 2010).
13. Huang, L. & Lai, Y.-C. Cascading dynamics in complex quantum networks. *Chaos* **21**, 025107 (2011).
14. Wang, W., Lai, Y.-C. & Armbruster, D. Cascading failures and the emergence of cooperation in evolutionary-game based models of social and economical networks. *Chaos* **21**, 033112 (2011).
15. Liu, R.-R., Wang, W.-X., Lai, Y.-C. & Wang, B.-H. Cascading dynamics on random networks: crossover in phase transition. *Phys. Rev. E* **85**, 026110 (2012).
16. Li, D. et al. Percolation transition in dynamical traffic network with evolving critical bottlenecks. *Proc. Natl Acad. Sci. USA* **112**, 669–672 (2015).
17. Parshani, R., Buldyrev, S. V. & Havlin, S. Critical effect of dependency groups on the function of networks. *Proc. Natl Acad. Sci. USA* **108**, 1007–1010 (2011).
18. Watts, D. J. A simple model of global cascades on random networks. *Proc. Natl Acad. Sci. USA* **99**, 5766–5771 (2002).
19. Dodds, P. S. & Watts, D. J. Universal behavior in a generalized model of contagion. *Phys. Rev. Lett.* **92**, 218701 (2004).
20. Simonsen, I., Buzna, L., Peters, K., Bornholdt, S. & Helbing, D. Transient dynamics increasing network vulnerability to cascading failures. *Phys. Rev. Lett.* **100**, 218701 (2008).
21. Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. *Nature* **464**, 1025 (2010).
22. Ganin, A. A. et al. Resilience and efficiency in transportation networks. *Sci. Adv.* **3**, e1701079 (2017).
23. Nudo, R. J. Recovery after brain injury: mechanisms and principles. *Front. Hum. Neurosci.* **7**, 887 (2013).
24. Shang, Y. Impact of self-healing capability on network robustness. *Phys. Rev. E* **91**, 042804 (2015).
25. Hu, F., Yeung, C. H., Yang, S., Wang, W. & Zeng, A. Recovery of infrastructure networks after localised attacks. *Sci. Rep.* **6**, 24522 (2016).
26. White, S. R. et al. Autonomic healing of polymer composites. *Nature* **409**, 794 (2001).
27. Toohey, K. S., Sottos, N. R., Lewis, J. A., Moore, J. S. & White, S. R. Self-healing materials with microvascular networks. *Nat. Mater.* **6**, 581 (2007).
28. Desmurget, M., Bonnetblanc, F. & Duffau, H. Contrasting acute and slow-growing lesions: a new door to brain plasticity. *Brain* **130**, 898–914 (2007).
29. Majdandzic, A. et al. Spontaneous recovery in dynamical networks. *Nat. Phys.* **10**, 34 (2014).
30. Podobnik, B. et al. Network risk and forecasting power in phase-flipping dynamical networks. *Phys. Rev. E* **89**, 042807 (2014).
31. Podobnik, B. et al. Predicting the lifetime of dynamic networks experiencing persistent random attacks. *Sci. Rep.* **5**, 14286 (2015).
32. Podobnik, B. et al. The cost of attack in competing networks. *J. R. Soc. Interface* **12**, 20150770 (2015).
33. Majdandzic, A. et al. Multiple tipping points and optimal repairing in interacting networks. *Nat. Commun.* **7**, 10850 (2016).
34. Council, N. R. et al. *Disaster Resilience: A National Imperative* (The National Academies Press, Washington DC, 2012).
35. Gao, J., Barzel, B. & Barabási, A.-L. Universal resilience patterns in complex networks. *Nature* **530**, 307–312 (2016).
36. Ganin, A. A. et al. Operational resilience: concepts, design and analysis. *Sci. Rep.* **6**, 1–12 (2016).
37. Linkov, I. & Trump, B. D. *The Science and Practice of Resilience* (Springer, 2019).
38. Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. *Rev. Mod. Phys.* **87**, 925 (2015).
39. Wang, W., Tang, M., Stanley, H. E. & Braunstein, L. A. Unification of theoretical approaches for epidemic spreading on complex networks. *Rep. Prog. Phys.* **80**, 036603 (2017).
40. de Arruda, G. F., Rodrigues, F. A. & Moreno, Y. Fundamentals of spreading processes in single and multilayer complex networks. *Phys. Rep.* **756**, 1–60 (2018).
41. Barabasi, A.-L. The origin of bursts and heavy tails in human dynamics. *Nature* **435**, 207 (2005).
42. González, M. C., Hidalgo, C. A. & Barabási, A. L. Understanding individual human mobility patterns. *Nature* **453**, 779–782 (2008).
43. Simini, F., González, M. C., Maritan, A. & Barabási, A. L. A universal model for mobility and migration patterns. *Nature* **484**, 96–100 (2012).
44. Zhao, Z.-D. et al. Emergence of scaling in human-interest dynamics. *Sci. Rep.* **3**, 3472 (2013).
45. Zhao, Z.-D., Huang, Z.-G., Huang, L., Liu, H. & Lai, Y.-C. Scaling and correlation of human movements in cyber and physical spaces. *Phys. Rev. E* **90**, 050802(R) (2014).
46. Pappalardo, L. et al. Returners and explorers dichotomy in human mobility. *Nat. Commun.* **6**, 8166 (2015).
47. Zhao, Y.-M., Zeng, A., Yan, X.-Y., Wang, W.-X. & Lai, Y.-C. Unified underpinning of human mobility in the real world and cyberspace. *N. J. Phys.* **18**, 053025 (2016).
48. Yan, X.-Y., Wang, W.-X., Gao, Z.-Y. & Lai, Y.-C. Universal model of individual and population mobility on diverse spatial scales. *Nat. Commun.* **8**, 1639 (2017).
49. Bratsun, D., Volfson, D., Tsimring, L. S. & Hasty, J. Delay-induced stochastic oscillations in gene regulation. *Proc. Natl Acad. Sci. USA* **102**, 14593–14598 (2005).
50. Scalas, E., Kaizoji, T., Kirchler, M., Huber, J. & Tedeschi, A. Waiting times between orders and trades in double-auction markets. *Phys. A* **366**, 463–471 (2006).
51. Vazquez, A., Racz, B., Lukacs, A. & Barabasi, A.-L. Impact of non-Poissonian activity patterns on spreading processes. *Phys. Rev. Lett.* **98**, 158702 (2007).
52. Iribarren, J. L. & Moro, E. Impact of human activity patterns on the dynamics of information diffusion. *Phys. Rev. Lett.* **103**, 038702 (2009).
53. Van Mieghem, P. & Van de Bovenkamp, R. Non-Markovian infection spread dramatically alters the susceptible-infected-susceptible epidemic threshold in networks. *Phys. Rev. Lett.* **110**, 108701 (2013).
54. Jo, H.-H., Perotti, J. I., Kaski, K. & Kertész, J. Analytically solvable model of spreading dynamics with non-Poissonian processes. *Phys. Rev. X* **4**, 011041 (2014).
55. Kiss, I. Z., Röst, G. & Vizi, Z. Generalization of pairwise models to non-Markovian epidemics on networks. *Phys. Rev. Lett.* **115**, 078701 (2015).
56. Starnini, M., Gleeson, J. P. & Boguñá, M. Equivalence between non-Markovian and Markovian dynamics in epidemic spreading processes. *Phys. Rev. Lett.* **118**, 128301 (2017).
57. Sherborne, N., Miller, J., Blyuss, K. & Kiss, I. Mean-field models for non-Markovian epidemics on networks. *J. Math. Biol.* **76**, 755–778 (2018).
58. Feng, M., Cai, S.-M., Tang, M. & Lai, Y.-C. Equivalence and its invalidation between non-Markovian and Markovian spreading dynamics on complex networks. *Nat. Commun.* **10**, 3748 (2019).
59. Valdez, L. D., DiMuro, M. A. & Braunstein, L. A. Failure-recovery model with competition between failures in complex networks: a dynamical approach. *J. Stat. Mech.* **2016**, 093402 (2016).
60. Böttcher, L., Nagler, J. & Herrmann, H. J. Critical behaviors in contagion dynamics. *Phys. Rev. Lett.* **118**, 1–5 (2017).
61. Böttcher, L., Luković, M., Nagler, J., Havlin, S. & Herrmann, H. J. Failure and recovery in dynamical networks. *Sci. Rep.* **7**, 41729 (2017).
62. Keeling, M., Rand, D. & Morris, A. Correlation models for childhood epidemics. *Proc. R. Soc. Lond. Ser. B* **264**, 1149–1156 (1997).
63. ben-Avraham, D. & Köhler, J. Mean-field (n, m)-cluster approximation for lattice models. *Phys. Rev. A* **45**, 8358 (1992).
64. Mata, A. S. & Ferreira, S. C. Pair quenched mean-field theory for the susceptible-infected-susceptible model on complex networks. *EPL* **103**, 48003 (2013).
65. Gross, T., D’Lima, C. J. D. & Blasius, B. Epidemic dynamics on an adaptive network. *Phys. Rev. Lett.* **96**, 208701 (2006).
66. Ji, M., Xu, C., Choi, C. W. & Hui, P. M. Correlation and analytic approaches to co-evolving voter models. *N. J. Phys.* **15**, 113024 (2013).
67. Zhang, W., Xu, C. & Hui, P. M. Spatial structure enhanced cooperation in dissatisfied adaptive snowdrift game. *Eur. Phys. J. B* **86**, 196 (2013).
68. Zhang, W., Li, Y. S., Du, P., Xu, C. & Hui, P. M. Phase transitions in a coevolving snowdrift game with costly rewiring. *Phys. Rev. E* **90**, 052819 (2014).
69. Choi, C. W., Xu, C. & Hui, P. M. Adaptive cyclically dominating game on co-evolving networks: numerical and analytic results. *Eur. Phys. J. B* **90**, 190 (2017).
70. Harris, S. *An Introduction to the Theory of the Boltzmann Equation* (Courier Corporation, 2004).

## Acknowledgements

The authors would like to thank Zhenhua Wang for helpful discussions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 11975099, 11575041, 11675056 and 11835003), the Natural Science Foundation of Shanghai (Grant No. 18ZR1412200), and the Science and Technology Commission of Shanghai Municipality (Grant No. 14DZ2260800). Y.C.L. would like to acknowledge support from the Vannevar Bush Faculty Fellowship program sponsored by the Basic Research Office of the Assistant Secretary of Defense for Research and Engineering and funded by the Office of Naval Research through Grant No. N00014-16-1-2828.

## Author information

### Contributions

Z.-H.L., M.T., and Z.H.L. designed research; Z.-H.L. performed research; Z.-H.L., M.F., M.T., Z.H.L., C.X., and P.M.H. contributed analytic tools; Z.-H.L., M.F., M.T., Z.H.L., C.X., P.M.H., and Y.-C.L. analyzed data; Z.-H.L., M.T., Z.H.L., P.M.H., and Y.-C.L. wrote the paper.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Peer review information** *Nature Communications* thanks Francisco Rodrigues, Igor Linkov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Lin, ZH., Feng, M., Tang, M. *et al.* Non-Markovian recovery makes complex networks more resilient against large-scale failures. *Nat Commun* **11**, 2490 (2020). https://doi.org/10.1038/s41467-020-15860-2
