Introduction

As crises such as blackouts1, congestion2 and bankruptcies3 have demonstrated, a localized perturbation in different networks4,5,6,7,8,9 can lead to a domino-like cascade of failures10, which represents a major threat to network robustness. For example, the August 2003 cascading blackout across a large portion of the northeastern U.S. and eastern Canada had a direct cost on the order of 10 billion dollars. While research has focused mostly on the critical conditions and consequences of cascading failures11,12,13,14,15, an important question has rarely been raised: what is the propagation behavior of cascading failures? Indeed, it is difficult to develop an efficient protection or mitigation strategy against cascading failures without understanding how they propagate.

Propagation of failures can be classified mainly into two categories: propagation through structural dependencies16,17,18 and propagation through functional overloads19,20. In cascading structural failures, damage spreads directly via structural dependencies in the network connections, which bind the failure of one network element directly to the failure of others. In contrast, many cascading failures spread through the propagation of overloads21,22. Such a cascade occurs when a perturbation to the system reroutes flow through alternative paths, which may cause some nodes to become overloaded and fail. This may eventually develop into a series of cascading failures that can disable the entire network.

In marked contrast to cascading structural failures, overload failures usually interact and propagate globally through the network under cooperative overload pressure, without a visible or direct causal relation. The many-body effects of failure interactions and the lack of real failure data make the study of the propagation behavior of cascading overload failures a major challenge. Here we use real data that we collected to study the correlations that characterize the interaction of cascading overload failures, together with their evolution. We explain the mechanisms behind our findings using a unified framework that combines percolation and an overload model.

Results

Spatial correlation of failures

We begin by studying the spatial pattern of cascading failures in our collected real data on traffic congestion and a power grid blackout. As can be seen in both the traffic and power grid data shown in Fig. 1a–b, the spatial pattern of cascading failures seems neither compact, as expected from typical models of cascade dynamics23, nor purely random, as in percolation theory24. To quantify these unexpected spatial patterns, we use the spatial correlation25,26, C(r), which measures the relation between failures separated by a distance r. The spatial correlation is defined as:

$$C(r) = \frac{1}{\sigma^2}\,\frac{\sum_{i \in F,\, j} (x_i - \bar{x})(x_j - \bar{x})\,\delta(r_{ij} - r)}{\sum_{i \in F,\, j} \delta(r_{ij} - r)}$$

Here, xi represents the failure value of site i, which is 1 if component i has failed and 0 otherwise. $\bar{x}$ is the average of xi over the whole network. $\sigma^2 = \sum_{j \in F} (x_j - \bar{x})^2 / N_F$ is the variance, where F is the set of cascading failed components, NF is the total number of cascading failures and rij is the Euclidean distance between i and j. The δ function selects pairs of nodes at distance r. This definition guarantees the normalization C(r) = 1 at r = 0, independent of the number of functional nodes. Positive values of C(r) indicate positive correlations (i.e., a tendency of failures to be close to each other), while negative values imply anti-correlations.
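The estimator of C(r) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: i runs over failed sites and j over all sites, the δ function is replaced by distance bins of width bin_width, and the function name spatial_correlation and its parameters are hypothetical rather than taken from the original analysis.

```python
import math
from collections import defaultdict


def spatial_correlation(coords, failed, bin_width=1.0):
    """Estimate C(r) from site positions and 0/1 failure indicators.

    coords: list of (x, y) positions, one per site (assumed layout).
    failed: list of 0/1 values x_i, 1 if site i has failed.
    Returns a dict mapping each binned distance r to C(r).
    """
    n = len(coords)
    fail_idx = [i for i in range(n) if failed[i]]
    if not fail_idx or len(fail_idx) == n:
        raise ValueError("need at least one failed and one functional site")
    mean = sum(failed) / n  # average of x_i over the whole network
    # Variance taken over the failed set F.
    var = sum((failed[j] - mean) ** 2 for j in fail_idx) / len(fail_idx)
    num = defaultdict(float)
    cnt = defaultdict(int)
    for i in fail_idx:  # i runs over failed sites
        for j in range(n):  # j runs over all sites
            r = math.dist(coords[i], coords[j])
            b = round(r / bin_width)  # distance bin standing in for delta
            num[b] += (failed[i] - mean) * (failed[j] - mean)
            cnt[b] += 1
    return {b * bin_width: num[b] / (cnt[b] * var) for b in sorted(cnt)}
```

With this normalization the bin at r = 0 contains only the pairs i = j, so C(0) = 1 whenever at least one site has failed and at least one is functional.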

Figure 1

Spatial correlation of cascading failures in real data.

(a) Spatial pattern of cascading failures in traffic in Beijing at 8:10 on 25th September 2013. Congested roads are shown in red (e.g. expressways with speeds below 20 kilometers per hour); functional roads with higher speeds are marked in yellow or green, depending on their velocities. More details are presented in Supplementary Information. (b) Spatial pattern of cascading failures in a power grid (failures are red nodes and links; functional components are marked in green). This blackout occurred in the Western Systems Coordinating Council area on 2nd July 1996. More details are presented in Supplementary Information. (c) Spatial correlation of cascading failures in city traffic at different times. A power law spatial correlation is found at rush hour (8:20), while correlations decay faster at off-peak hours. The resolution of the network distance r for the congestion data is around 50 meters in geographical distance. Results are averaged over 30 days in April, May, September, October and November 2013. (d) Spatial correlation of cascading failures (red dots) during blackouts in the power grid; a similar spatial correlation is found in the model (green dots) of cascading failures in a small-world network (rewiring probability 0.002). The resolution of the network distance r for the blackout data is around 5 kilometers in geographical distance. The result for the small-world model (averaged over 50 realizations) is shifted down for comparison.

Strikingly, we find below in the real data of congestions and blackouts that these cascading failures have long-range correlations characterized approximately by a power law decay:

$$C(r) \sim r^{-\gamma},$$

where γ quantifies the strength of spatial correlation between failures. Strong correlations correspond to small γ, while the short-range correlated case is represented by γ ≥ d, where d is the space dimension (here d = 2)25,27. The correlation values we find in real data (see below) suggest that failures tend to ‘attract’ each other in a long-range power law pattern.
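One common way to estimate the exponent γ of a power law decay from measured correlation values is a least-squares fit of log C against log r. The Python sketch below shows that approach; it is an assumption that a plain log-log regression is adequate here, the source does not state which fitting procedure was used, and the function name fit_power_law_exponent is hypothetical.

```python
import math


def fit_power_law_exponent(r_values, c_values):
    """Return gamma from a least-squares fit of log C(r) = a - gamma * log r.

    Points with r <= 0 or C <= 0 are skipped because their logarithm
    is undefined.
    """
    pts = [(math.log(r), math.log(c))
           for r, c in zip(r_values, c_values) if r > 0 and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with r > 0 and C > 0")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    if sxx == 0:
        raise ValueError("all distances are identical")
    return -sxy / sxx  # the slope is -gamma
```

A fitted value below the space dimension d = 2 would fall in the long-range correlated regime described above.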

Spatial correlation of congestion

From the viewpoint of physics, transportation systems are considered non-equilibrium systems composed of interacting vehicles, which undergo a phase transition between free flow and congestion28. This transition is usually the result of a collective effect, in which certain spatial patterns of traffic congestion can form. A wide moving jam on a highway is a typical spatial pattern of traffic congestion, which appears as a local region of low speed and high density29.

However, for city traffic on a global scale, the spatial pattern of interacting jams is poorly understood, mainly because of the lack of data at the network scale. Here we study our recently collected data records spanning the city of Beijing, which cover the whole road network with real-time traffic conditions (see SI). Our traffic data cover 30 full days (24 hours each, at 10-minute resolution) in different months of 2013. In our study, we treat congested roads (e.g. expressways with velocity below 20 km/h) as failures (jams), which are marked in red in a typical snapshot of the real-time traffic situation shown in Fig. 1a. As seen in Fig. 1c (see Fig. S1 in SI for another example), the spatial correlations of traffic jams at rush hour (e.g. at 8:20) seem to decay as a power law with distance. The scaling exponent γ is found to be close to 0.6, which suggests strong correlations between congestion events in network traffic. At off-peak hours (e.g. at 6:00 or 10:00) the correlations decay faster with distance. Thus, at the network scale, unlike the local spatial pattern of highway jams, congestion is long-range correlated.

Spatial correlation of blackout

Similar to traffic congestion, cascading failures are also common in power grids, and their devastating effects have been manifested in many major blackouts. In our collected data on cascading failures in power grids (see Fig. 1b and SI for more details), failures consist mainly of line and generator trips during the blackout in the Western Systems Coordinating Council (WSCC) area of the United States in July 1996. As in city traffic, we find that failures in the blackout are spatially correlated with a power law decay. However, the spatial correlations of failures in the blackout are weaker than those found in traffic congestion, with an exponent γ between 0.9 and 1.0 (see Fig. 1d and Fig. S2 in SI for more examples).

Model

Overload models6,7 are usually invoked to explain these cascading failures in networks, but they refer mostly to the limiting case of a malicious attack on network hubs. However, many cascading overload failures are initiated in power transmission systems by disturbances (e.g. component aging, weather or faulty operations), or in transportation networks and the Internet by daily congestion instabilities. Furthermore, while most previous cascading-failure models lack a parameter that can be tuned to criticality, our analysis of real traffic data shows that the long-range scaling of failure correlations found in realistic cascading failures is present only at criticality. To understand the origin of the surprising long-range correlations found in real systems near criticality, we incorporate the percolation framework24 into an overload model19 (see SI for details), representing the initial disturbances as a fraction (1-p) of nodes initially removed from the network. The overload model assumes that each network component has a load and a tolerance parameter α. The load quantifies the amount of flow that a node is requested to transmit, which is assumed to be proportional to the total number of shortest paths passing through it. When a component's current load Li reaches or exceeds (1 + α) times its original load L0, the component fails and its load is redistributed to other components (see also ref. 21). After the initial step, in which a fraction (1-p) of nodes fails at random, the loads of the failed nodes are redistributed and new nodes become overloaded. At this instant the number of cascading steps is increased by one. The newly overloaded nodes are removed, and further nodes become overloaded in the next iteration step. The iterations continue until the loads of all remaining nodes are below their tolerated capacity.

Note that p in the percolation framework and the tolerance parameter α in the overload model are the main control parameters of this model, and their combined effect determines the propagation pattern of cascading failures.
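The cascade procedure described above can be sketched in Python for a small square lattice. This is a minimal sketch and not the authors' implementation: it assumes open lattice boundaries, measures load as shortest-path betweenness computed with a Brandes-style breadth-first search, and applies the failure condition Li ≥ (1 + α) L0. The names lattice_adjacency, shortest_path_load and run_cascade are hypothetical, and the details given in the SI may differ.

```python
import random
from collections import deque


def lattice_adjacency(size):
    """Neighbors of each site on a size x size lattice with open boundaries."""
    adj = {}
    for x in range(size):
        for y in range(size):
            cand = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            adj[(x, y)] = [(a, b) for a, b in cand
                           if 0 <= a < size and 0 <= b < size]
    return adj


def shortest_path_load(adj, alive):
    """Betweenness-style load of every surviving node (Brandes-style BFS)."""
    load = dict.fromkeys(alive, 0.0)
    for s in alive:
        sigma = dict.fromkeys(alive, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        dist = {s: 0}
        preds = {v: [] for v in alive}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in alive:
                    continue
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(alive, 0.0)
        for w in reversed(order):  # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                load[w] += delta[w]
    return load


def run_cascade(size, p, alpha, seed=0):
    """Remove a fraction (1 - p) of nodes at random, then iterate overloads.

    Returns (alive, failed, removed, steps), where failed holds the
    overloaded nodes and steps counts the cascading iterations.
    """
    rng = random.Random(seed)
    adj = lattice_adjacency(size)
    nodes = sorted(adj)
    load0 = shortest_path_load(adj, set(nodes))  # original loads L0
    removed = set(rng.sample(nodes, round((1 - p) * len(nodes))))
    alive = set(nodes) - removed
    failed = set()
    steps = 0
    while alive:
        load = shortest_path_load(adj, alive)  # loads after redistribution
        over = {v for v in alive if load[v] >= (1 + alpha) * load0[v]}
        if not over:
            break
        alive -= over
        failed |= over
        steps += 1
    return alive, failed, removed, steps
```

Recomputing the loads from scratch at every step is the simplest choice and is adequate for small lattices; for systems as large as those shown in Fig. 2 a faster betweenness routine would be needed.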

We begin by studying how the main control parameters p and α affect network robustness under cascading failures on a square lattice. The robustness of the network is quantified by the relative size, G, of the giant percolation cluster. As shown in Fig. 2a, G decreases as we decrease either α or p. As we reduce p for a given α, the system undergoes a phase transition that separates a functional phase from a failure phase. For small α values (e.g. α = 1), G decreases abruptly after the removal of 1% of nodes, meaning that the full lattice is near criticality for small α. For very large α values (e.g. α = 500), overload failures are rare because of the high tolerance, so G decreases smoothly, with pc approaching 0.5927, as expected for classical percolation on a 2d lattice24. For intermediate α values (e.g. α = 10, 15), G decreases in a more complicated manner, suggesting a possible hybrid of the two limiting cases above. In Fig. 2b, we identify the critical threshold pc of the phase transition for each α as the p value with the maximal number of cascading steps13. The critical value pc decreases with increasing α, since a higher overload tolerance reduces the overloads and therefore the failures in the system. In our model we study the spatial pattern of cascading failures mainly near criticality, since historical data and modeling suggest that cascading failures in power grids1,22 and transportation systems28,29,30 occur at criticality.
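The two quantities used in this analysis, the relative giant-cluster size G and the threshold pc taken as the p with the most cascading steps, can be sketched in Python as follows. The sketch assumes sites are labeled by integer (x, y) pairs on a size × size lattice with nearest-neighbor links; the names giant_component_fraction and critical_p are hypothetical.

```python
from collections import deque


def giant_component_fraction(alive, size):
    """Relative size G of the largest cluster of surviving lattice sites."""
    alive = set(alive)
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        count = 0
        while queue:  # breadth-first search over one cluster
            x, y = queue.popleft()
            count += 1
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, count)
    return best / (size * size)


def critical_p(steps_by_p):
    """Return the p value whose cascade took the largest number of steps.

    steps_by_p maps each tested p to its (averaged) number of cascading steps.
    """
    return max(steps_by_p, key=steps_by_p.get)
```

In practice both quantities would be averaged over many realizations for each pair (p, α) before locating the maximum.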

Figure 2

Spatial correlations of cascading failures in model.

(a) The relative size of the giant component G in the network as a function of p for different α. Results are averaged over 100 realizations. (b) The number of cascading steps as a function of p for different α. The maximal number of cascading steps corresponds to the phase transition threshold, pc. Results are averaged over 100 realizations. (c) Spatial pattern of cascading failures in the model near criticality. One realization of the model on a lattice of size 10,000 is shown here, with α = 10 and p = 0.92. (d) Spatial pattern of randomly distributed failures with the same number of failures as in (c). (e) Spatial correlations of cascading failures as a function of r in the model for different p values with α = 10. Note that the correlation decays as a power law at criticality (at pc) and decays faster when p is away from pc. Results are averaged over 50 realizations. (f) The values of the correlation exponent γ as a function of α in the model and in real traffic data during rush hours (08:00, 08:30, 18:00). Model results are averaged over 50 realizations. Traffic results are averaged over 30 days in April, May, September, October and November 2013 (Beijing).

Cascading failures in the model also follow a distinct spatial pattern at criticality, as in the typical example shown in Fig. 2c. This pattern is very different from that of random failures with the same concentration (Fig. 2d). As Fig. 2e shows, the spatial correlations of failures in the model seem to follow a power law at criticality, while away from criticality the correlations decay faster. Note that the correlations in the random case (Fig. 2d) are zero. We also find in the model that the correlations of the cascading failures at criticality show universal behavior (Fig. 2f), since the γ values seem to be independent of α. Moreover, the scaling exponent γ found above for traffic congestion is very close to that found in the model (Fig. 2f; see also Fig. S5 in SI). This suggests that the long-range correlations are a universal feature of the phase transition, which does not depend on the detailed differences between the road network of Beijing and the lattice assumed in the model.

In contrast to the above results for traffic congestion and the model, the spatial correlation in the blackout is weaker, with an exponent γ close to 1.0 (Fig. 1d; see also Fig. S2 in SI). This may be due to the presence of long-range connections (which do not appear in the traffic network), such as high-voltage transmission lines in the power grid, which allow redistribution of load to distinct areas. Indeed, when we add long-range connections to the lattice model by randomly rewiring a small fraction of lattice links (see Fig. S6 in SI), we find that the spatial correlation becomes weaker as the rewiring probability increases. With a small number of long-range connections, the model shows a spatial correlation similar to that of the blackout (Fig. 1d and Fig. S6 in SI).
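The rewiring step can be illustrated with a short Python sketch. The exact rewiring rule is given in the SI and is not reproduced here, so the rule below is an assumption: each lattice link is rewired with probability q by keeping one endpoint and moving the other to a uniformly chosen node, and self-loops and duplicate links are rejected. The name rewire_lattice and the parameter q are hypothetical.

```python
import random


def rewire_lattice(size, q, seed=0):
    """Rewire each link of a size x size lattice with probability q.

    Returns (edges, rewired): the set of undirected links stored as ordered
    pairs (u, v) with u < v, and the number of links actually rewired.
    """
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(size) for y in range(size)]
    edges = set()
    for x, y in nodes:
        if x + 1 < size:
            edges.add(((x, y), (x + 1, y)))
        if y + 1 < size:
            edges.add(((x, y), (x, y + 1)))
    rewired = 0
    for u, v in sorted(edges):  # iterate over a snapshot of original links
        if rng.random() < q:
            w = rng.choice(nodes)  # candidate new endpoint
            new = (u, w) if u < w else (w, u)
            if w != u and new not in edges:
                edges.remove((u, v))
                edges.add(new)
                rewired += 1
    return edges, rewired
```

Because every accepted rewiring removes one link and adds one new link, the total number of links stays at 2 × size × (size − 1).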

Evolution of correlation length

Until now, we have studied the spatial correlation of cascading failures mainly near criticality. However, to protect real systems against cascading failures or to mitigate their effects, it is essential to understand how these long-range correlations emerge and evolve as a system approaches criticality. Therefore, we study in both the real data and the model the evolution of the spatial range of correlations as the system approaches criticality. The length ξ of spatial correlations is defined as the distance at which the correlations become zero25,27, i.e. $C(\xi) = 0$.
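Given sampled values of C(r), the correlation length can be estimated as the first distance at which the correlation reaches zero. The Python sketch below assumes linear interpolation between neighboring sample points, which is an illustrative choice rather than the procedure used in the original analysis; the name correlation_length is hypothetical.

```python
def correlation_length(r_values, c_values):
    """Smallest r at which C(r) reaches zero, or None if it never does.

    The zero crossing between two neighboring samples is located by
    linear interpolation.
    """
    pairs = sorted(zip(r_values, c_values))
    for (r0, c0), (r1, c1) in zip(pairs, pairs[1:]):
        if c0 == 0:
            return r0
        if c0 > 0 and c1 <= 0:
            # Interpolate between (r0, c0) and (r1, c1) to find C = 0.
            return r0 + (r1 - r0) * c0 / (c0 - c1)
    return None
```

For noisy data it may be preferable to smooth C(r) before locating the crossing, since a single fluctuation below zero would otherwise shorten the estimate.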

We study the evolution of the correlation length ξ in real traffic over the course of a day. As seen in Fig. 3a, the correlation length ξ increases dramatically as time evolves towards the morning or evening rush hour, while at off-peak hours the correlation length is extremely small. Moreover, as seen in Fig. 3a, the number of jams, NF, closely follows the correlation length and reaches its maximum at the same time (see Fig. S4 for more examples). This dynamical behavior of the correlation length suggests that the traffic system evolves towards criticality, which occurs during rush hours on a typical day. This suggestion is further supported by the model (Fig. 3b and Fig. S3 in SI), where the correlation length ξ grows and reaches its maximum at criticality together with the number of cascading failures.

Figure 3

Evolution of correlations in real traffic congestion and model.

(a) Evolution of the number of failures NF and the correlation length ξ in traffic data, where the two maxima correspond to the rush periods of a day. Results are averaged over 9 days in September (Beijing). The maximum of ξ reaches the diameter of the main part of Beijing. (b) Evolution of the number of failures NF and the correlation length ξ in the model with α = 10; results are averaged over 10 realizations. The unit of ξ is the same as that of the network distance r in the traffic data or the model.

Discussion

In summary, cascading failures in real congestion and blackouts are found here to show long-range spatial correlations, whose length grows dramatically and reaches its maximum as the system approaches criticality. The long-range correlations and their time evolution reported here help us to understand and predict the collapse process of such systems, which may help to evaluate and improve existing protection and mitigation strategies against cascading failures. For example, in power grids, different types of relays and fuses are installed to detect overloads automatically and to isolate the functional part of the system from cascading failures. However, these electromechanical protection methods mainly protect nearby components from failures and do not capture the long-range correlations between failures found here. These approaches can lead to systemic risks, as indicated by the fact that the frequency of large blackouts in the United States is not decreasing31. Our findings point to the possibility of rescuing systems from cascading failures through global protection and mitigation strategies, which should decouple the long-range correlations between failures in order to localize failures and stop them from spreading. Faced with the challenge of robust control and reliable management of networked infrastructures, we believe that the spatial pattern of cascading failures and its evolution found here will be useful for the realization and improvement32 of future Intelligent Transportation Systems and the Smart Grid.