Different from the direct contact in epidemics spread, overload failures propagate through hidden functional dependencies. Many studies focused on the critical conditions and catastrophic consequences of cascading failures. However, to understand the network vulnerability and mitigate the cascading overload failures, the knowledge of how the failures propagate in time and space is essential but still missing. Here we study the spatio-temporal propagation behaviour of cascading overload failures analytically and numerically on spatially embedded networks. The cascading overload failures are found to spread radially from the centre of the initial failure with an approximately constant velocity. The propagation velocity decreases with increasing tolerance, and can be well predicted by our theoretical framework with one single correction for all the tolerance values. This propagation velocity is found similar in various model networks and real network structures. Our findings may help to predict the dynamics of cascading overload failures in realistic systems.
Resilience of individual components in networks is determined not only by their own intrinsic properties, but also by their functional interactions with other components. For example, a failure of one component in a network may lead to overloads and failures of other components. Starting with a localized failure, such interactions between components can ignite a domino-like cascading failure, which may result in catastrophes such as those observed in many realistic networks1,2,3,4,5. The devastating consequences of these cascading failures have stimulated extensive studies6,7,8,9.
Many studies provided deep insight on the conditions and outcome of cascading failures10,11,12,13,14,15,16,17, which may help to evaluate the system resilience. However, to predict and mitigate the failure spreading in a network, understanding their spatio-temporal propagation properties is essential but still missing. Different from structural cascading failures caused by direct causal dependencies18,19,20, overload failures21,22,23,24 usually propagate through invisible paths as a result of cooperative interactions in the system. Actually, a fundamental question has rarely been posed: how overload failures propagate with time in space during the cascading process? Indeed, predicting the spatio-temporal propagation of cascading failures could determine the timing and resource allocation of an effective mitigation strategy in corresponding self-healing technologies.
In this paper, we aim to understand the spatio-temporal propagation of the cascading overload failures in spatially embedded systems25. We define two basic quantities to describe the spatio-temporal propagation properties of cascading failures. The first quantity, rc(t), is the average Euclidean distance of the failures appearing at cascading step t from the centre of the initial failure. The second quantity, Fr(t), is the number of node failures that occur at cascading step t. While rc can help system regulators to set a ‘firewall’ at suitable locations before the failure arrives, Fr can suggest the ‘height’ of the firewall. In our current study, the propagation of cascading overloads is found to follow an approximately constant velocity. This propagation velocity decreases as the system tolerance increases, which can be well predicted by our theoretical analysis with one single constant correction. The propagation velocity is also found here to be similar in various model networks and real network structures.
Propagation properties of cascading overload failures
In this study, we focus on the cascading overloads on spatially embedded networks caused by localized attacks, which are common in natural disaster and malicious attacks26,27. In Fig. 1, we show snapshots of the simulated cascading failures, where the failures spread almost radially from the initial attack region and finally spread over the whole system. As can be seen from Fig. 1, the nodes with spatial location closer to the initial failure begin to fail first and form approximately a ‘ring of failures’. The ring begins to grow and expand with time until it reaches the system’s boundary.
Considering the ring shape of the cascading overload failures (originating from the initial location), it is reasonable to quantify rc(t) and Fr(t) at cascading step t. Figure 2a shows that the propagation radius of cascading failures is increasing almost linearly with time for different system sizes. This means that the failures spread during the cascade process with an almost constant speed (slope of rc(t)). It can be seen that as the system linear size L increases, the propagation velocity of overload failures increases. The propagation radius of failures is increasing with time and becomes saturated near the boundary. The propagation size of failure Fr (number of new failures at each instant) increases until a certain time step and then decreases (Fig. 2b). The behaviour at large t is due to the finite size of the system, when rc(t) reaches the order of L the amount of damage can only decrease. Note that for different system sizes, the propagation size of failure Fr reaches the maximum at similar instants, which results from the higher velocity in larger system sizes.
To further understand the effect of system size and tolerance α on failure propagation rc(t), we rescale it in Fig. 2c by the system linear size L. We find that the curves of rescaled rc collapse into a single curve for different system sizes at a given tolerance α, suggesting that failures spread in the same relative velocities for different system sizes. When the size of failures, Fr(t), is rescaled by the system size, L × L, the different curves of Fr(t)/L2 also collapse into a single curve (Fig. 2d). It can be seen that the spreading time needed to reach the maximum Fr(t)/L2 is determined by the tolerance α, which implies that large tolerance can postpone the collapse of the system.
In contrast to the dynamics such as epidemics28 that propagate owing to nearest-neighboring interactions such as contact infection, cascading overloads propagate as a result of the global interaction between all the flows contributed by the whole system. Surprisingly, although the interactions are global, the propagation dynamics in the model network and realistic networks (see Supplementary Figs 1–6 and Supplementary Notes 1 and 2 for more examples) are found rather local. Here the local propagation means that there is a finite characteristic distance (Δ(rc)) between the successive overloads. This characteristic distance is the value of propagation velocity, which increases non-linearly with decreasing tolerance. The nearest-neighbor propagation of overloads usually assumed in some complex network models only corresponds to the limiting case of this local propagation (Δ(rc)=1).
Propagation results based on theoretical analysis
To further explore these propagation behaviours of cascading failures found in simulations and their relations with tolerance, we develop the theory (see Fig. 3, Methods section and Supplementary Methods for more details) to describe the cascading overloads.
As can be seen in Fig. 4, our theoretical analysis reproduces well the spatio-temporal propagation features of cascading overloads found in the simulations (Fig. 2). Specifically, as shown in Fig. 4a,c, the velocity of the failure propagation is almost constant in most of the time t, and decreases with increasing tolerance or decreasing system size. As shown in Fig. 4b,d, the number of failures increases with time and then begins to drop after reaching the peak, which is reduced by increasing the tolerance α. The instant for the maximal failure size is independent of the system size, but increases with the system tolerance. Moreover, similar to simulation results, both the radius and size of failures in the cascades can also be rescaled with system size (L and L2, respectively) as seen in Fig. 4c,d, where the different curves for different system sizes collapse into a single curve at a given α. Note that both rc(t) and Fr(t) demonstrate a small slope in the few initial steps, which is caused by the extremely small initial failure considered in the theory. As shown in Supplementary Fig. 7, if the initial size of attack is large, both quantities show a higher slope also in the few initial cascading steps.
As seen in Supplementary Fig. 8 (see Supplementary Note 3 for more details), there exists a critical value of α, αc, above which no spreading occurs. Moreover, we find the theoretical relation (Supplementary Fig. 9) between the system tolerance and the critical initial failure size, where the system can sustain a larger size of initial failure for a larger α.
Comparison between simulation and theory
To test our theory, we perform a quantitative comparison between the simulation and the theoretical model. As shown in Fig. 5a, the propagation velocity in different model networks and real networks can be well predicted by the theory with the same constant correction for all the values of tolerances. This constant correction, close to 2π, is a result of the anisotropic propagation of overloads in the simulation, which is assumed isotropic in the theory. These good agreements between theory and simulations support the validity of our proposed theoretical framework for cascading overloads.
The local propagation found in the simulation and theory is due to the mechanism of overload redistribution. In the simulation of the Motter–Lai model in Fig. 5b, the overloads propagate rather locally (with a characteristic distance) after an initial damage due to the redistribution of optimal paths between existing pairs of nodes. The optimal paths that passed through the previous damaged area will now mainly be redistributed close to this area, leading to the local overloads in spite of global interactions. Similar to our findings in the Motter–Lai model, the overloads in the theory (as shown in Fig. 5b and Supplementary Fig. 10) are found here mainly distributed close to the previous damage area, which causes the local propagation of failures (also as shown in Supplementary Fig. 11 and Supplementary Note 4). Note the agreement in Fig. 5b between theory and simulations (with the same constant correction of 2π, due to the anisotropic nature of the failure spreading, which is assumed isotropic in the theory). The excellent agreement between simulations and theory of the overload (Fig. 5b) as a function of distance from the original failure also supports our theoretical approach.
It is worth noting that the analysis of our theoretical framework is not limited by the computational complexity of calculating optimal paths. This is in contrast to simulations based on betweenness (number of optimal paths passing through a node), which require heavy computations and therefore are limited to relatively small systems. Our theoretical framework thus provides an efficient way to explore and understand the cascading behaviour of failures for any system size.
Universality of propagation velocity
To explore the possibility of universal feature in overload propagation, here we analyse the propagation of cascading overload failures on different network models and realistic networks. Besides the model networks including lattice, circular lattice, planar graph and circular plate, we also study the overload propagation on the realistic road networks29. We initiate a local attack in the geographical centre of these spatially embedded networks and analyse the spatio-temporal evolution of rc and Fr. As shown in Fig. 5a (see Supplementary Figs 1–6 for more examples), we find that propagation velocity of overloads is similar in various model and real network structures, which can be well predicted by our theory. The propagation velocity is independent of detailed network structures, which makes our findings more applicable. This universal propagation of overloads can be attributed to the common mechanism of overload redistribution in different networks, independent of structure difference between these networks. Furthermore, both theory and simulations suggest that for a given system size and a given tolerance, the size of initial failure does not influence the spreading velocity (Supplementary Figs 12 and 13 and Supplementary Notes 5 and 6).
Cascading failures represent the manifestation of non-linear butterfly effect in infrastructure networks, which can cause catastrophic damages due to a small local disturbance. The Motter–Lai-type overload cascade models are an important class of cascading failure dynamics, characterized by the nonlocal—in contrast to epidemic spreading-type local cascade models—interactions, and have been studied extensively for the last decade. Given the inherent global interactions in the mechanism of overload formation, it is of interest that the overloads spread rather locally from the initial attack region, at velocity that increases non-linearly with decreasing tolerance. Here the local propagation means that there is a finite characteristic distance between the successive overloads, which is the value of propagation velocity. The nearest-neighboring propagation of overloads usually assumed in some complex network models only corresponds to the limiting case of this local propagation (characteristic distance is 1).
For different model and realistic networks, our results suggest the existence of universal propagation features of cascading overloads, which are characterized by a finite linear propagation velocity. This velocity can be predicted by our theory with the same constant correction for all the values of tolerances. This constant correction, close to 2π, is a result of the anisotropic propagation of overloads in the simulation, which is assumed isotropic in the theory. This universal behaviour comes from the common global mechanisms of overload redistributions in different networks. When certain extreme heterogeneous networks like embedded scale free networks are considered, a revision of the theoretical framework may be needed. While the focus of this manuscript is on the spatio-temporal propagation of the cascading overload failures in spatially embedded systems, the overloads may spread very fast in general non-spatially embedded sparse network due to its small diameter.
The present study may help to bridge the longstanding gap between the overload model16,22 and the model of dependency links proposed by Buldyrev et al.13 and Parshani et al.18, in particular the lattice version of the model19,20 where dependency links can have a characteristic length r. Indeed, as can be seen in Figs 2 and 4, overload failures propagate in a nearly constant speed, which suggests a characteristic dependency distance, between successive overload failures. Furthermore, this speed or characteristic distance is found to increase with decreasing tolerance. This suggests a possible mapping between systems with overload failures and networks with dependency links, where networks with different characteristic length of dependency links can serve as a suitable model to describe cascading overload failures. This mapping can be useful since overload models usually require heavy computations and are therefore limited to small systems, while dependency models require significantly less computations, and large systems can be easily analysed.
When a disturbance is detected in networks, the knowledge of spatio-temporal propagation properties of cascading failures is essential for predicting and mitigating the cascading failures. Meanwhile, realistic cascading failures are usually the result of the collective interactions between different processes including overloads and other system operation procedures30,31,32. The universal features of overload propagation found here across different spatially embedded networks may help to better mitigate realistic cascading failure, if combined with the detailed knowledge of other processes including system operations and planning procedures.
Simulation of model
To study the propagation properties of cascading overload failures, we model the spatially embedded network as a randomly weighted L × L lattice with periodic boundaries, where L is the linear length of the lattice. The weight of each link is taken from a Gaussian distribution N(μ, σ2) with mean weight μ and the disorder is represented by the standard deviation σ. In this model22, the load on a node i, Li, represents the number of optimal paths between all pairs of nodes passing through this node. A node i will fail when its load Li is more than (1+α) times its original load, where α represents the system tolerance to overloads. A randomly localized region of the system is initially removed to trigger the cascading overloads. This kind of initial failure is motivated by the fact that natural disasters (like earthquake or floods) or malicious attacks (like electromagnetic pulse (EMP)) usually occur in specific geographical locations and destroy initially localized regions of the network. The initial failure is located in a randomly selected square of l × l nodes (l<<L), which are removed initially. In addition to being realistic, since failures are usually localized, this configuration can also help us to follow and analyse the spatio-temporal propagation pathway of the cascading failures. This local failure may trigger failures of other nodes if their load value exceeds the tolerance threshold due to the load redistribution across the entire network. Given the periodic boundary conditions, we can position the initial attack region at the centre of the lattice.
Cascading overload failures due to an initial local failure are produced by the redistribution of loads in the network. From the observations of simulations on weighted network, we find (Fig. 1) that failures spread in a ring shape from the centre of the initial damage. This inspires us to assume in our theory that the network is embedded in a two-dimensional circular plate (Fig. 3), the initial failure is within a (red) circle of radius a, and the main overload due to traffic after the initial failure is located in the (cyan) ring adjacent to the initial failure (see the cyan ring in Fig. 3), whose size will be determined by the theory. The overload is reflected by the increase of the number and lengths of shortest paths passing through this ring. If the overload exceeds the capacity tolerance of a node ((1+α) times its original load) within the adjacent ring, the node will fail in the next step, causing the cascading failure to propagate forward. Owing to the initial failed area, shortest paths from a given node A to destinations located in the shadow area s (the dotted area in Fig. 3) would be affected and become longer, since they now have to surround the failed area. Specifically, the shortest paths from A to nodes in s (for example, AF) across the failed area can be separated into two parts in the adjacent ring (for example, BC and DE). For the first part, its length within the ring changes from BC to KG. As for the second part, its overload on the ring can be calculated from symmetry by switching A and F (source and target). Finally, the integration from r=a to r=1 covers all the overloads added to the adjacent ring (cyan) due to the initial failure, can be written as
where the length of KG is for r>b and for r≤b, s(a, r) is the area of s, that is, the number of A’s destinations in the shadow and v(a, b, r) is the average length of the first part of shortest paths (within the adjacent ring) before the failure. Similarly, we can obtain the initial load of nodes located within the circle centred at O with radius b as Lini(b) (Supplementary Methods). Then, for a node on the circle centred O with radius b, the overload produced by the failure is
and its initial load is
For each functional node in the network, the critical condition for failure can be written as α=ΔL/L0. Specifically, if α≥ΔL/L0, it survives, otherwise it fails. More details for the solution of the theoretical model can be found in Supplementary Figs 14–19 and Supplementary Methods.
How to cite this article: Jichang, Z. et al. Spatio-temporal propagation of cascading overload failures in spatially embedded networks. Nat. Commun. 7:10094 doi: 10.1038/ncomms10094 (2016).
We thank the support from Collaborative Innovation Center for industrial Cyber-Physical System. D.L. is also supported by the National Natural Science Foundation of China (Grant 61104144) and the National Basic Research Program of China (2012CB725404). S.H. thanks DTRA, ONR, the LINC and the Multiplex (No. 317532) EU projects, the DFG, and the Israel Science Foundation for support. J.Z. was partially supported by NSFC (Grant Nos 71501005 and 71531001) and 863 Program (Grant No. SS2014AA012303).
Supplementary Figures 1-19, Supplementary Notes 1-6, Supplementary Methods and Supplementary References
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/