Optimizing the robustness of electrical power systems against cascading failures

Zhang, Yingrui; Yağan, Osman

doi:10.1038/srep27625

Download PDF

Article
Open access
Published: 21 June 2016

Optimizing the robustness of electrical power systems against cascading failures

Yingrui Zhang¹ &
Osman Yağan¹

Scientific Reports volume 6, Article number: 27625 (2016) Cite this article

3063 Accesses
33 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Electrical power systems are one of the most important infrastructures that support our society. However, their vulnerabilities have raised great concern recently due to several large-scale blackouts around the world. In this paper, we investigate the robustness of power systems against cascading failures initiated by a random attack. This is done under a simple yet useful model based on global and equal redistribution of load upon failures. We provide a comprehensive understanding of system robustness under this model by (i) deriving an expression for the final system size as a function of the size of initial attacks; (ii) deriving the critical attack size after which system breaks down completely; (iii) showing that complete system breakdown takes place through a first-order (i.e., discontinuous) transition in terms of the attack size; and (iv) establishing the optimal load-capacity distribution that maximizes robustness. In particular, we show that robustness is maximized when the difference between the capacity and initial load is the same for all lines; i.e., when all lines have the same redundant space regardless of their initial load. This is in contrast with the intuitive and commonly used setting where capacity of a line is a fixed factor of its initial load.

Network isolators inhibit failure spreading in complex networks

Article Open access 25 May 2021

Understanding Braess’ Paradox in power grids

Article Open access 14 September 2022

Fragmentation of outage clusters during the recovery of power distribution grids

Article Open access 30 November 2022

Introduction

Electrical power systems are one of the most critical national infrastructures affecting all areas of daily life^1,2. They also provide crucial support for other national infrastructures such as telecommunications, transportation, water supply systems and emergency services^3,4,5. Besides daily life, the destruction of power systems would also weaken or even disable our defense and economic security⁶. Thus, ensuring the robustness of electrical power systems and maintaining the continuous availability of power supply are of utmost priority. Despite its great importance, concerns about the robustness of the power grid have grown recently because of several large-scale outages that took place in different parts of the world. For example, in the 2012 India blackout, 600 million people, nearly a tenth of the world’s population, were left without power⁷. The blackout spread across 22 states in Northern, Eastern and Northeast India and is the largest power outage in history⁸.

These blackouts often start with natural hazards such as lightning shorting a line or with malicious attacks and affect only a small portion of the power system initially. But due to the long range nature of electricity, the redistribution of power loads may affect not only geographically co-located lines but also other parts of the system far from the initial affected area. A typical example is the Western Systems Coordinating Council (WSCC) system outage on August 10, 1996⁹, where long range failures have been observed. The large-scale blackouts are often attributed to this initial shock getting escalated due to the intricate dependencies within a power system. For example, when a line is tripped, the flow on all other lines will be updated and some lines may end up with a total flow (initial plus redistributed after failures) exceeding their capacity. All lines with flows exceeding their capacity will in turn fail and flows on other lines will be updated again, possibly leading to further failures and so on. This process may continue recursively and lead to a cascade of failures, which may potentially breakdown the entire system. For instance, on August 10, 1996, an electrical line sagged in summer heat in Southern Oregon and this initiated a chain reaction that cut power to more than four million people in eleven Western States^10,11.

Since electrical power systems are among the largest and most complex technological systems ever developed¹², it is often hard to have a full understanding of their inter- and intra-dependencies and therefore it is hard to predict their behavior under external attacks or random failures. In this work, we aim to shed light on the robustness of power systems using a simple yet useful model. In particular, we assume that when a line fails, its load (i.e., flow) is redistributed equally among all other lines. The equal load redistribution model has the ability to capture the long-range nature of the Kirchhoff’s law, at least in the mean-field sense, as opposed to the topological models where failed load is redistributed only locally among neighboring lines^13,14. This is particularly why this model received recent attention in the context of power systems first in the work by Pahwa et al.¹⁵ and then in Yağan¹⁶; the model is originally inspired by the democratic fiber-bundle model¹⁷ that is used extensively for studying the rupture of fiber-bundles under increasing external force.

Assuming that each of the N lines in the system is assigned independently an initial load L_i and a redundant space S_i – meaning that line capacity equals L_i + S_i – from a joint distribution , we study the robustness of this systems against random attacks; see Model Definitions for the details of our model. In particular, we characterize the final fraction n_∞(p) of alive (i.e., non-failed) lines at the steady-state, when p-fraction of the lines are randomly failed. We identify the critical attack size p* after which the system breakdowns entirely; i.e., n_∞(p) = 0 if p > p*. We show that the transition of the system around p* is always first-order (i.e., discontinuous). However, depending on the distribution P_LS, this may take place with or without a preceding second-order (i.e., continuous) divergence of n_∞(p) from the 1 − p line. Finally, under the constraints that mean load and mean free space are fixed, we show that assigning every line the same free space regardless of its load is optimal in the sense of maximizing the robustness n_∞(p) for all attack sizes p. This provably optimal strategy is in sharp contrast with the commonly used^{13,18,19,20,21,22} setting where free-space S_i is set to be a constant factor of a line’s initial load, e.g., S_i = αL_i for all i. This hints at the possibility that existing power systems are not designed optimally and that their robustness may be significantly improved by reallocating the line capacities (while keeping the total capacity unchanged). Our analytic results are validated via extensive simulations, using both synthetic data for load-capacity values as well as realistic data from IEEE test-case data sets.

We believe that our results provide interesting insights into the dynamics of cascading failures in power systems. In particular, we expect our work to shed some light on the qualitative behavior of real-world power systems under random attacks and help design them in a more robust manner. The results obtained here may have applications in fields other than power systems as well. A particularly interesting application is the study of the traffic jams in roads, where the capacity of a line can be regarded as the traffic flow capacity of a road^23,24.

Model Definitions

Equal load-redistribution model

We consider a power system with N transmission lines with initial loads (i.e., power flows) L₁, …, L_N. The capacity C_i of a line defines the maximum power flow that it can sustain and is given by

where S_i denotes the free-space (or, redundancy) available to line . The capacity of a line can be defined as a factor of its initial load, i.e.,

with α_i > 0 defining the tolerance parameter for line . Put differently, the free space S_i is given in terms of the initial load L_i as S_i = α_iL_i; it is very common^13,18,19,20 to use a fixed tolerance factor for all lines in the system, i.e., to use α_i = α for all i. It is assumed that a line fails (i.e., outages) if its load exceeds its capacity at any given time. The key assumption of our model is that when a line fails, the load it was carrying (right before the failure) is redistributed equally among all remaining lines.

Throughout we assume that the pairs (L_i, S_i) are independently and identically distributed with for each i = 1, …, N. The corresponding (joint) probability density function is given by . Throughout, we let L_min and S_min denote the minimum values for load L and free space S; i.e.,

we assume that L_min, S_min > 0. We also assume that the marginal densities p_L(x) and p_S(y) are continuous on their support.

The equal load redistribution rule takes its roots from the democratic fiber bundle model^17,25, where N parallel fibers with failure thresholds C₁, …, C_N (i.e., capacities) share an applied total force F equally. There, it has been of interest to study the dynamics of recursive failures in the bundle as the applied force F increases; e.g., see^26,27,28,29. This model has been recently used by Pahwa et al.³⁰ in the context of power systems, with F corresponding to the total load shared equally by N power lines. The relevance of the equal load-redistribution model for power systems stems from its ability to capture the long-range nature of the Kirchhoff’s law, at least in the mean-field sense, as opposed to the topological models where failed load is redistributed only locally among neighboring lines^13,14. In particular, the equal load redistribution model is expected to be a reasonable assumption under the DC power flow model, which approximates the standard AC power flow model when the phase differences along the branches are small and the bus voltages are fixed³⁰; in fact, power flow calculations based on the DC model^31,32 are known to give accurate results that match the AC model calculations in many cases. Therefore, we expect our work to shed some light on the qualitative behavior of real-world power systems under random attacks.

Problem definition

Our main goal is to study the robustness of power systems under the equal load redistribution rule. In this work, we assume that failures are initiated by a random attack that results with a failure of a p-fraction of the lines; of course, all the discussion and accompanying results do hold for the robustness against random failures as well. The initial failures lead to redistribution of power flows from the failed lines to alive ones (i.e., non-failed lines), so that the load on each alive line becomes equal to its initial load plus its equal share of the total load of the failed lines. This may lead to the failure of some additional lines due to the updated flow exceeding their capacity. This process may continue recursively, generating a cascade of failures, with each failure further increasing the load on the alive lines and may eventually result with the collapse of the entire system. Throughout, we let n_∞(p) denote the final fraction of alive lines when a p-fraction of lines is randomly attacked. The robustness of a power system will be evaluated by the behavior n_∞(p) for all attack sizes 0 < p < 1. Of particular interest is to characterize the critical attack size p* at which n_∞(p) drops to zero.

The problem formulation considered here was introduced by Yağan¹⁶. This formulation differs from the original democratic fiber-bundle model (and its analog³⁰ introduced for power systems) in that (i) it does not assume that the total load of the system is fixed at F; and (ii) it allows for power lines to carry different initial loads unlike the democratic fiber bundle model where all lines start with the same initial load. Since power lines in real systems are likely to have different loads at the initial set-up, we believe our formulation is more suitable for studying cascading failures in power systems. In addition,³⁰ is concerned with failures in the power system that are triggered by increasing the total force (i.e., load) applied. Instead, our formulation allows analyzing the robustness of the system against external attacks or random line failures, which are known to be the source of system-wide blackouts in many interdependent systems^5,33,34.

A word on notation in use: The random variables (rvs) under consideration are all defined on the same probability space . Probabilistic statements are made with respect to this probability measure and we denote the corresponding expectation operator by . The indicator function of an event A is denoted by 1_[A].

Results

Final system size as a function of the attack size, n_∞(p)

Our first main result characterizes the robustness of power systems under any initial load-space distribution P_LS and any attack size p. Let L and S denote generic random variables following the same distribution with initial loads L₁, …, L_N and free spaces S₁, …, S_N, respectively. Then, with x* denoting the smallest solution of

over the range x* ∈ (0, ∞), the final system size n_∞(p) at attack size p is given by

This result, proved in Methods, provides a complete picture about a power system’s robustness against random attacks of arbitrary size under our model. In particular, it helps understand the response n_∞(p) of the system to attacks of varying magnitude.

For a graphical solution of n_∞(p), one shall plot as a function of x (e.g., see Fig. 1(a)) and draw a horizontal line at the height on the same plot. The leftmost intersection of these two lines gives the operating point x*, from which we can compute . When there is no intersection, we set x* = ∞ and understand that n_∞(p) = 0.

We see from this result that an adversarial attack aimed at a certain part of the electrical power grid may lead to failures in other parts of the system, possibly creating a recursive failure process also known as cascading failures. This will often result with a damage in the system much larger than the initial attack size p. However, in most cases some” part of the system is expected to continue its functions by undertaking extra load; e.g., with n_∞(p) > 0. In such cases, although certain service areas are affected, the power grid remains partially functional. The most severe situations arise when cascading failures continue until the complete breakdown of the system where all lines fail; e.g., when n_∞(p) = 0. This prompts us to characterize the critical attack size p*, defined as the largest attack size that the system can sustain.

The “critical” attack size

Of particular interest is to derive the critical attack size p* such that for any attack with size p > p*, the system undergoes a complete breakdown leading to n_∞(p) = 0; on the other hand for p < p*, we have n_∞(p) > 0. More precisely, we define p* as

The critical attack size can be derived from the previous results (3–4) that characterize n_∞(p). Namely, for any load-free space distribution P_LS(x, y), the maximum attack size p* can be computed from the global maximum of the function . In particular, we have

A proof of this result is given in Methods.

Understanding the “phase transition”: Conditions for abrupt rupture

It is of significant interest to understand the behavior of the system near the phase transition; i.e., when the attack size is very close to but smaller than the critical value p*. One main questions here is whether n_∞(p) decays to zero continuously (i.e., through a second-order transition), or discontinuously (i.e., through a first-order transition). The practical significance of this is that continuous transitions suggest a more stable and predictable system behavior with respect to attacks, whereas with discontinuous transitions system behavior becomes more difficult to predict, for instance, from past data.

Our analysis shows that under the equal-load redistribution model considered here the total breakdown of the system will always be through a first-order (i.e., discontinuous) transition; see Methods for a proof. Namely, we have

while by definition it holds that , for any ε > 0 arbitrarily small. This means that regardless of the attack size and the distribution of load and capacity, the transition point where the system has a total breakdown (i.e., where the fraction of alive lines drops to zero) is always discontinuous. These cases are reminiscent of the real-world phenomena of unexpected large-scale system collapses; i.e., cases where seemingly identical attacks/failures leading to entirely different consequences.

Now that we showed that the breakdown of the power system takes place through a first-order transition, an interesting question arises as to whether this first-order rupture at p* has any early indicators at smaller attack sizes; e.g., a diverging failure rate leading to a non-linear decrease in n_∞(p). Otherwise, an abrupt first-order transition is said to take place if the linear decay of n_∞(p) (of the form 1 − p) is followed by a sudden discontinuous jump to zero at p*; i.e., we say that the system exhibits an abrupt rupture when it holds that

In Fig. 1(b) we demonstrate the distinction between an abrupt rupture and a rupture with preceding divergence from the 1 − p line.

We now present our result that reveals the necessary and sufficient condition for an abrupt rupture to take place. We show (in Methods) that the system goes through an abrupt first-order breakdown (e.g., see the below line shown in red in Fig. 1(b)), if and only if the function reaches its maximum at x = S_min, where S_min is the minimum value the extra space S can take. Namely, an abrupt first-order rupture (without a preceding divergence) takes place if and only if

Otherwise, if , then a preceding divergence from the 1 − p line will be observed before n_∞(p) drops to zero; e.g., see the above line shown in blue in Fig. 1(b). More precisely, it will hold that n_∞(p) < 1 − p for some p < p*. A detailed analysis of conditions for these two types of ruptures is presented in Methods.

Figure 1 demonstrates different types of transitions that the system can exhibit in relation to the behavior of h(x). In Fig. 1(a), we plot h(x) in two different cases: the red (i.e., lower) line reaches its maximum at S_min, while the blue (i.e., upper) line continues to increase after S_min and reaches its maximum later. Since the function h(x) represented by the blue line does not satisfy the abrupt rupture condition (8), we see in Fig. 1(b) that the corresponding final system size goes through a diverging transition (from the 1 − p line) before entirely breaking down through a first-order transition. On the other hand, we see that h(x) represented by the red curve reaches its maximum at S_min. As expected from our results, we see that the corresponding final system size exhibits an abrupt breakdown without any preceding divergence from the 1 − p line.

Achieving optimal robustness

The most important question from a system design perspective is concerned with deriving the universally optimum distribution of initial loads L₁, …L_N and free spaces S₁, …, S_N that leads to maximum robustness under the constraints that and are fixed. We believe that the answer to this problem would be very useful in designing real-world power grids with optimum robustness, i.e. with the final system size n_∞(p) maximized for any attack size p. The motivation for the constraints on the mean load and mean free space are as follows. The total load carried by the system is likely to be dictated by system requirements in most real-world cases, which also determines the average load per line. In addition, the total capacity (or, total free space) available to the system is likely to be bounded due to the costs associated with using high-capacity lines.

Our results concerning this important problem are presented next. First, we focus on maximizing the critical attack size p*. We show in Methods that the critical attack size always satisfies

Namely, regardless of the distribution P_LS that generates load-capacity pairs, the system will always go into a complete breakdown if more than -fraction of lines are attacked; i.e., the system can never sustain a random attack of size exceeding the ratio of mean free space to mean capacity. Next, we show that this critical attack size is in fact attainable under any load distribution by a Dirac delta distribution for the free-spaces, i.e., by giving every line the same free space. More precisely, let denote the critical attack size when , where the probability density p_L(x) of the initial loads L₁, …, L_N is arbitrary. We show in Methods that

Combined with (9) this shows that assigning every line the same free space (regardless of the initial loads) maximizes the largest attack that the system can sustain.

More can be said regarding the optimality of equal free-space allocation. Let denote the maximum critical attack size as established above, i.e., . In view of the fact that we always have n_∞(p) ≤ 1 − p, the next result firmly establishes that using the Dirac delta distribution for free space optimizes the robustness of the system uniformly for any attack size p. In particular, if , then the corresponding final system size n_∞,dirac(p) satisfies

Namely, the setting maximizes the final system size n_∞(p) uniformly for all p.

This result shows that as far as the random attacks are concerned, the system’s robustness can be maximized under the constraints of fixed and fixed (and hence fixed ), by giving each line an equal free space , irrespective of how the initial loads are distributed. Put differently, the robustness will be maximized by choosing a line’s capacity C_i through no matter what its load L_i is. In view of (2), this then leads to a tolerance factor

meaning that the optimal robustness is achieved when lines with larger initial loads are given smaller tolerance factors. This result is rather counter-intuitive because one might think that lines with large initial loads shall receive extra protection (in the form of larger free space or tolerance factor) given the potentially detrimental effect of their failure to the overall system. Our result proves this intuition incorrect and show that robustness is maximized if all lines share the fixed total free-space equally. In fact, numerical results presented in Results show that the standard choice of setting free-space to be a constant factor of initial load (i.e., using the same tolerance factor for all lines) may lead to significantly worse robustness than the optimal choice given at (11).

A possible explanation to this counter-intuitive result is as follows. When all lines have the same extra space, we ensure that the system never goes through a cascade of failures. In other words, when p-fraction of the lines are attacked, we will have either n_∞(p) = 1 − p or n_∞(p) = 0 depending on whether or not, respectively, the total load of failed lines divided by 1 − p is less than the common free space S. In addition, if the attack size is large enough that total load of failed lines, i.e., , is larger than the total free space available in the rest of the system, then regardless of the distribution P_LS(x, y), the system will collapse. Collectively, these explain why assigning equal free-space to all lines ensures that system will go through an abrupt rupture, but only at the optimal critical attack size .

The optimality results presented here shed light on the recent findings by Yağan¹⁶ who investigated the model considered here in the special case where S_i = αL_i for all lines; i.e., the case where p_LS is degenerate with p_LS(x, y) = p_L(x)δ(y − xα). There, they found that the optimal robustness is achieved (i.e., n_∞(p) is maximized uniformly across all p), if the load L follows a Dirac delta distribution; i.e., the system is most robust when each line carries the same initial load. On the other hand, our results show that the distribution of load has very little to do with optimizing the robustness and in fact robustness is maximized under any initial load distribution if the free-space is distributed equally. This shows that Yağan’s result of the optimality of equal load distribution is merely a coincidence. It only arises under the assumption that a line’s free space is a constant factor of its load, so that equal allocation of initial loads is equivalent to that of free-space.

Numerical results

We now confirm our theoretical findings via numerical simulations, using both synthetic and real-world data. We focus on the former case first and consider various commonly known distributions for the load and free-space variables.

Synthetic data

Throughout, we consider three commonly used families of distributions: (i) Uniform, (ii) Pareto and (iii) Weibull. These distributions are chosen here because they cover a wide range of commonly used and representative cases. In particular, uniform distribution provides an intuitive baseline. Distributions belonging to the Pareto family are also known as a power-law distributions and have been observed in many real-world networks including the Internet, the citation network, as well as power systems¹⁵. It often leads to interesting behavior due to its long-tail property and ability to lead to infinite variance. Weibull distribution exhibits rich behavior and contains several widely-used distributions as special cases; e.g., Exponential, Rayleigh and Dirac-delta. The probability density functions of these distributions are defined below for a generic random variable L.

• Uniform Distribution: L ~ U(L_min, L_max). The density is given by

• Pareto Distribution: L ~ Pareto(L_min, b). With L_min > 0 and b > 0, the density is given by

To ensure that is finite, we also enforce b > 1. For any b ≤ 2, the variance of L is infinity.

• Weibull Distribution: L ~ Weibull(L_min, λ, k). With λ, k, L_min > 0, the density is given by

The case k = 1 corresponds to the exponential distribution and k = 2 corresponds to Rayleigh distribution. If k grows arbitrary large, Weibull distribution converges to the Dirac-delta distribution centered at its mean; i.e., . Here, the mean is given by , where Γ(⋅) is the gamma-function given by .

First, we confirm our results presented in Results concerning the response of the system to attacks of varying size; i.e. concerning the final system size n_∞(p) under different load-extra space distributions including its transition behavior around the critical attack size p*. In all simulations, we fix the number of lines at N = 10⁶ and for each set of parameters being considered (e.g., the distribution p_LS(x, y) and attack size p) we run 200 independent experiments. The results are shown in Fig. 2 where symbols represent the empirical value of the final system size n_∞(p) (obtained by averaging over 200 independent runs for each data point) and lines represent the analytical results computed from (3) and (4). We see that theoretical results match the simulations very well in all cases.

The specific distributions used in Fig. 2 are as follows: From left to right, we have (i) L is Weibull with L_min = 10, λ = 100, k = 0.4 and S = αL with α = 1.74; (ii) L is Uniform over [10, 30] and S is Uniform over [1, 5]; (iii) L is Weibull with L_min = 10, λ = 10.78, k = 6 and S is Uniform over [5, 10]; (iv) L is Pareto with L_min = 10, b = 2 and S = αL with α = 0.7; (v) L is Uniform over [10, 30] and S is Uniform over [10, 60]; and (vi) L is Weibull with L_min = 10, λ = 10.78, k = 6 and S is Uniform over [20, 100]. Thus, the plots in Fig. 2 demonstrate the effect of the load-free space distribution on the robustness of the resulting power system. We see that both the family that the distribution belongs to (e.g., Uniform, Weibull, or Pareto) as well as the specific parameters of the family affect the behavior of n_∞(p). For instance, the curves representing the two cases where L and S follow a Uniform distribution demonstrate that both abrupt ruptures and ruptures with a preceding divergence are possible in this setting, depending on the parameters L_min, L_max, S_min and S_max. In cases where the load follows a Pareto distribution and S = αL, only abrupt ruptures are possible as shown in¹⁶. Finally, we see that the Weibull distribution gives rise to a richer set of possibilities for the transition of n_∞(p). Namely, we see that not only we can observe an abrupt rupture, or a rupture with preceding divergence (i.e., a second-order transition followed by a first-order breakdown), it is also possible that n_∞(p) goes through a first-order transition (that does not breakdown the system) followed by a second-order transition that is followed by an ultimate first-order breakdown; see the behavior of the orange circled line in Fig. 2. We remark that these cases occur when h(x) has a local maximum at x = S_min, while its global maximum occurs at a later point x > S_min; see¹⁶ for a more detailed discussion of this matter.

In our second set of simulations we seek to verify the results presented in Results, namely the optimality of assigning the same free space to all lines (regardless of how initial loads are distributed) in terms of maximizing the robustness. In the process, we also seek to compare the robustness achieved under equal free-space distribution versus the commonly used strategy of setting S_i = αL_i for each line. We note that the latter setting with a universal tolerance factor α is commonly used in relevant research literature^13,18,19,20 as well as in industrial applications^21,22; therein, the term (1 + α) is sometimes referred to as the Factor of Safety. The results are depicted in Fig. 3 where lines represent the analytical results given in Results and symbols are obtained by averaging over 200 independent experiments with N = 10⁶ lines. In all cases we fix the mean load at and mean free-space at . With load distributed as Uniform (Fig. 3(a)), Weibull (Fig. 3(b)), or Pareto (Fig. 3(c)), we either let S_i = 10 for all lines, or use S_i = αL_i with , the latter choice making sure that the mean free-space is the same in all plots.

We see in all cases that there is an almost perfect agreement between theory and simulations. We also confirm that regardless of how initial load is distributed, the system achieves uniformly optimal robustness (i.e., maximum n_∞(p) for all p) as long as the free-space is distributed equally; e.g., see Fig. 3(d) that combines all plots in Fig. 3(a–c). In other words, we confirm that (10) holds with the critical attack size p* matching the optimal value . Finally, by comparing the robustness curves under equal free-space and equal tolerance factor, we see the dramatic impact of free-space distribution on the robustness achieved. To give an example, we see from Fig. 3(d) that regardless of how initial load is distributed, the system can be made robust against random attacks that fail up to 25% of the lines; as already discussed this is achieved by distributing the total free-space equally among all lines. However, if the standard approach of setting the free-space proportional to the initial load is followed, the system robustness can be considerably worse with attacks targeting as low as 10% of the lines being able to breakdown the system.

Real world data

Thus far, our analytical results are tested only on synthetic data; i.e., simulations are run when load-free space variables are generated randomly from commonly known distributions. To get a better idea of the real-world implications of our work, we also run simulations on power flow data from the IEEE power system test cases³⁵; the IEEE test-cases are widely regarded as realistic test-beds and used commonly in the literature. Here, we consider four power flow test cases corresponding to the 30-bus, 57-bus, 118-bus and 300-bus systems. For each test case, we take the load values directly from the data-set³⁵. Since the data-set does not contain the line capacities, we allocate all lines an equal free-space, S = 10; clearly, most of the discussion here would follow with different free-space distributions.

Figure 4 presents the results from the IEEE data set simulations, where blue circles represent the final system size n_∞(p) under original load data from each test case; each data point is obtained by averaging the result of 200 independent random attack experiments. As we compare these circles with our analytical results (represented by solid red lines) we see that the overall tendency of n_∞(p) is in accords with the theoretical analysis. However, the agreement of theory and simulations is significantly worse than that observed in Figs 2 and 3. This is because our mean field analysis relies on the number of lines N being large, while the IEEE test case data represent very small systems; e.g., the underlying systems have 30, 57, 118 and 300 lines in Fig. 4(a–d), respectively. In order to verify that the mismatch is due to the small system size (rather than the load distribution being different from commonly known ones), we re-sample 10⁵ load values from the empirical load distribution obtained from the data-set in each case; the Inset in each figure shows the corresponding empirical distribution P_L(x). The simulation results with these N = 10⁵ load values are shown in Fig. 4 with red triangles. This time with the number of lines increased, we obtain a perfect match between analysis and simulations. This confirms our analysis under realistic load distributions as well. We also see that although analytical results fail to match the system robustness perfectly when N is very small, they still capture the overall tendency of the robustness curves pretty well. In fact, they can be useful in predicting attack sizes that will lead to a significant damage to the system; e.g., in all cases we see that the analytically predicted critical attack size p*, ranging from 0.42 in Fig. 4(a) to 0.07 in Fig. 4(d), leads to the failure of more than 50% of all lines in the real system.

Discussion

Our results provide a complete picture of the robustness of power systems against random attacks under the equal load-redistribution model. Namely, with initial load L_i and extra space S_i of each line being independently and identically distributed with p_LS(x, y), our analysis explains how the final system size n_∞(p) will behave under attacks with varying size p. We also demonstrate different types behavior that n_∞(p) can exhibit near and around the critical attack size p*, i.e., the point after which n_∞(p) = 0 and the system breaks down completely. We show that the final breakdown of the system is always first-order (i.e., discontinuous) but depending on p_LS(x, y), this may (i) take place abruptly meaning that n_∞(p) follows the 1 − p line until its sudden jump to zero; or (ii) be preceded by a second-order (i.e., continuous) divergence from the 1 − p line. We also demonstrate the possibility of richer behavior where n_∞(p) drops to zero through a first-order, second-order and then a first-order transition. The discontinuity of the final system size at p* makes it very difficult to predict system behavior (in response to attacks) from previous data. In fact, this is reminiscent of the real-world phenomena of unexpected large-scale system collapses; i.e., cases where seemingly identical attacks/failures lead to entirely different consequences. On the other hand, the cases that exhibit a preceding second-order transition are less severe, since the deviation from the 1 − p line may be taken as an early warning that the current attack size is close to p* and that the system is not likely to sustain attacks much larger than this.

From a design perspective, it is desirable to maximize the robustness of the electrical power system under certain constraints. In our analysis, we address this problem and derive the optimal load-free space distribution p_LS(x, y) that maximizes the final system size n_∞(p) uniformly for all attack sizes p. Namely, we show that under the constraints that and are fixed, robustness is maximized by allocating the same free space to all lines and distributing the initial loads arbitrarily; i.e. the distribution maximizes robustness for arbitrary p_L(x). We show that this optimal distribution leads to significantly better robustness than the commonly used strategy of assigning a universal tolerance factor α, i.e., using p_LS(x, y) = p_L(x)δ(y − αx).

Our theoretical results are verified via extensive simulations using both synthetic data and real world data. We show that our results are in perfect agreement with numerical simulations when the system size N is large; in most cases it suffices to have N = 10⁴ to N = 10⁵. However, we see from our simulations with the IEEE test-cases that when N is very small (we considered N = 30, N = 57, N = 118 and N = 300), our theory fails to yield the same prediction accuracy. Nevertheless, we see that our results capture the overall tendency of n_∞(p) pretty well and thus can serve as a useful predictor of the critical attack size.

An important direction for future work would be to relax the simplifications and assumptions used here for modeling the failures in an electrical power system. For example, the equal redistribution rule is used here to capture the long-range effect of the Kirchhoff Law, i.e., that the failure of a line may impact the system globally. Future work may consider different types of global load redistribution rules that are not based on equal redistribution; e.g., load may be redistributed randomly or according to some other rules. Hybrid approaches where a fraction of the load is redistributed only locally to the neighboring lines (as often adopted in topology-based redistribution models²⁰), while the rest being redistributed globally might be considered. Another interesting direction for future work would be to consider the case of targeted attacks, rather than random attacks studied here. A good starting point in that direction would be to study possible attack strategies that a capable adversary might use; e.g., given L₁, …, L_N and S₁, …, S_N, which k lines should an adversary attack in order to minimize the final system size n_∞? A preliminary analysis of this problem can be found in³⁶, with partial results indicating that optimal attack strategies may be computationally expensive to derive – i.e., that the problem is NP-Hard.

Methods

Understanding the cascade dynamics

Our proofs are based on a mean-field analysis of the cascading failure dynamics under the equal redistribution model; see Model Definitions for details. Assume that failures take place in discrete time steps t = 0, 1, … and are initiated at time t = 0 by the random failure of a p-fraction of the lines. For each t = 0, 1, …, let f_t denote the fraction of lines that have failed until stage t. The number of links that are still alive at time t is then given by N_t = N(1 − f_t); e.g., f₀ = 0 and N₀ = N(1 − p). Also, we find it useful to denote by Q_t the total extra load per alive line at (the end of) stage t. In other words, Q_t is given by the total load of all f_tN failed lines until this stage divided by (1 − f_t)N; e.g., since the initial attack is random.

Our main goal is to derive the final system size n_∞(p) as a function of the attack size p. With the above definitions in place, we clearly have n_∞(p) = 1 − f_∞. Thus, the derivation of n_∞(p) passes through an understanding of the behavior of f_t as t → ∞. Here, we will achieve this by first deriving recursive relations for f_t, Q_t and N_t for each t = 0, 1, … and then analyzing the steady-state behavior of the recursions. This method has already proven successful by Yağan¹⁶, who studied the same problem in a special case where

with α > 0 defining the universal tolerance factor. Put differently, the work¹⁶ considers the special case where

for arbitrary distribution of initial load p_L(x). Here, we start our discussion from the recursions derived by Yağan [16, Eqn. 6] for this special case. Namely, with f₀ = p, (and Q₋₁ = 0), they showed for each t = 1, 2, … that

An inspection of their derivation reveals that these recursive relations do hold in the general case as well with αL replaced by the random variable S. Namely, with no constraints imposed on the distribution p_LS(x, y) (other than those stated in Model Definitions), we have

These equations can also be validated intuitively. First of all, given that Q_t defines the extra load per alive line at the end of stage t, we know that for a line to fail exactly at stage t + 1, it must have a free space smaller than Q_t but larger than Q_t−1, with the latter condition ensuring that the line does not fail at any previous stages. So, the fraction of lines that fail at stage t + 1 among those that survive stage t is intuitively given by . Rewriting (15), we get

confirming this intuitive argument. In fact, it is clear that for a line to survive stage t + 1, it must (i) survive the initial attack (which happens with probability 1 − p) and (ii) have a free space S > Q_t. Given the independence of the initial attack from other variables, we thus have . This explains the denominator of (16) since Q_t+1 gives the additional load per alive line at this stage. The nominator of (16) should then give the mean total load of the lines that have failed until (and including) stage t + 1, normalized with N; the normalization is required given that the denominator term is also normalized with N. To calculate the total load of the failed lines at this stage, first we note that p fraction of the lines fail randomly as a result of the initial attack, giving the term in the nominator of (16). In addition, among the remaining 1 − p fraction of the lines, those with free space satisfying S ≤ Q_t will fail, leading to the second term in the nominator of (16).

Returning to the recursions (12)–(14), we see that cascading failures will stop and a steady-state will be reached when f_t+2 = f_t+1. From (15), we see that this occurs if

or, equivalently if

upon using (16). In order to simplify this further, we let x := Q_t and realize that

With these in place, the condition for cascades to stop (18) gives

It is now clear how to obtain the fraction n_∞(p) of power lines that are still alive at the end of the cascading failures. First, we shall find the smallest solution x* of (19) that gives the equilibrium value Q_∞ at which cascades will stop. Then the final-fraction n_∞ of alive lines is given by

The last relation follows from the fact that for each t = 0, 1, …. This can be established in the following manner. Applying (15) repeatedly, we get

which gives

where Q₋₁ = 0 as before. Since Q_t is monotone increasing in t, i.e., Q_t+1 ≥ Q_t for all t, we get

as we recall that f₀ = p. We now seek to simplify the cascade stop condition given at (19). For notational convenience, let

Then, (19) becomes

which holds if either one of the following is satisied:

1. x ≥ g(x); or,

2. x < g(x) and .

The next result (proved in the Supplementary Material) shows that it suffices to consider only the first case for the purposes of our discussion.

Claim 1 Let x* be the smallest solution of (23) and x** be the smallest solution of x ≥ g(x). Then, we have

The proof of this important technical result is given in the Supplementary File.

Rewriting the inequality x ≥ g(x) and using Claim 1, we now establish the first main result of the paper given at (3) and (4). Namely, with x* denoting the smallest solution of

the final system size n_∞(p) is given by . If (24) has no solution, we set leading to n_∞(p) = 0. ■

Critical attack size

Characterizing the critical attack size p* is now a simple matter from the discussion above. Recall that critical attack size is defined via

Since always holds, we know that whenever there exists an x* < ∞ satifying (24), we must have that as otherwise the function would be zero conflicting with (24). Therefore, we have n_∞(p) > 0 for any p for which (24) has a solution; and we get n_∞(p) = 0 only if (24) has no solution. With these in mind, it is clear that p* will be given by the supremum of p values for which (24) has a solution. Equivalently, p* is given by the value of p for which equals to the global maximum of g(x). More precisely, we have

This establishes (5). ■

Order of phase transition and condition for abrupt break-downs

Order of transition

We now establish the fact that the final breakdown of the system will always be through a first-order (i.e., discontinuous) transition. This amounts to establishing (6), namely that ; this then implies a discontinuity in n_∞(p) at , since by definition of the critical attack size we have for any ε > 0.

We now establish . From (25), we see that when p = p* the cascade stopping condition (24) will be satisfied by x_* that maximizes . In other words, we have

where

we argue that by contradiction. Suppose that . With x_* denoting the global maximizer of , this would imply that

However, with 0 ≤ x < S_min, we have contradicting with (27). Therefore, we must have and the desired conclusion immediately follows. ■

Condition for an abrupt first-order transition

An abrupt rupture is said to take place if the linear decay of n_∞(p) (in the form of 1 − p) is followed by a sudden discontinuous jump to zero at p*; i.e., when it holds that

we now show that this occurs if and only if the function takes its global maximum at . First of all, implies that in view of (26). This immediately gives . For the reverse direction, observe that h(x) is linearly increasing on the range 0 ≤ x ≤ S_min. In fact, we have

Therefore, h(x) reaches at least a local maximum at x = S_min, implying that . Collecting, we establish as a necessary condition for an abrupt rupture (i.e., (28)) to take place. Next we show that this is also a sufficient condition. If meaning that h(x) is maximized at S_min, then the minimum solution for the inequality , when exists, will appear at x* that satisfies . This implies that the final system size is either 1 − p (when a solution to the inequality exists), or zero (when no solution to exists). In other words, when the final system size will be of the form (28), i.e., an abrupt rupture will take place. Combining, we conclude that (28) takes place if and only if

In the Supplementary File, we explore this issue further and provide a necessary (but not sufficient) condition for (30) to hold. This leverages the fact that for the maximum of h(x) to take place at x = S_min, the derivative of h(x) must change its sign to negative at this point; this would ensure a local maximum of g(x) take place at S_min. The resulting necessary condition, given here for convenience, is

Optimal P_LS distribution that maximize robustness

We now seek to find the optimal load-free space distribution P_LS(x, y) that maximizes the robustness of the power system, when and are fixed. First, we focus on maximizing the critical attack size p*. Recall (25) and observe that

where we use the Markov inequality at (31), i.e. that . Reporting this into (25) we get

This means that regardless of our choice of the joint distribution P_LS(x, y) the critical attack size can never be larger than .

Next, we show that this upper bound is in fact achievable by a Dirac-delta distribution of free space S. Assume that

where the load distribution P_L(x) is arbitrary; this is equivalent to having . Let denote the corresponding critical attack size. With , we have and , so that

Thus, we have , which immediately gives

Since the lower bound (32) holds for any distribution, we conclude that

This shows that a degenerate distribution on the extra space S leads to optimal (i.e., maximum) critical attack size . ■

We now show that a Dirac-delta distribution of free space not only maximizes p* but it maximizes the final system size n_∞(p) uniformly across all attack sizes. It is clear that n_∞(p) ≤ 1 − p for any distribution P_LS(x, y) and any attack size p. Thus, our claim will be established if we show that the Dirac-delta distribution (33) leads to an abrupt rupture and thus the resulting final system size is in the form given at (28). More precisely, we will get the desired result

upon establishing the abrupt rupture condition (30), i.e., that h(x) takes its maximum at x = S_min. This follows immediately upon realizing that we have

under the distribution (33). ■

Additional Information

How to cite this article: Zhang, Y. and Yağan, O. Optimizing the robustness of electrical power systems against cascading failures. Sci. Rep. 6, 27625; doi: 10.1038/srep27625 (2016).

References

Rinaldi, S. M., Peerenboom, J. P. & Kelly, T. K. Identifying, understanding and analyzing critical infrastructure interdependencies. Control Systems, IEEE 21, 11–25 (2001).
Article Google Scholar
Brummitt, C. D., D’Souza, R. M. & Leicht, E. Suppressing cascades of load in interdependent networks. Proceedings of the National Academy of Sciences 109, E680–E689 (2012).
Article CAS ADS Google Scholar
O’Rourke, T. D. Critical infrastructure, interdependencies and resilience. Bridge-Washington-National Academy Of Engineering, 37, 22 (2007).
Google Scholar
Dobson, I., Carreras, B. A., Lynch, V. E. & Newman, D. E. Complex systems analysis of series of blackouts: Cascading failure, critical points and self-organization. Chaos: An Interdisciplinary Journal of Nonlinear Science 17, 026103 (2007).
Article Google Scholar
Yağan, O., Qian, D., Zhang, J. & Cochran, D. Optimal allocation of interconnecting links in cyber-physical systems: Interdependence, cascading failures and robustness. Parallel and Distributed Systems, IEEE Transactions on 23, 1708–1720 (2012).
Article Google Scholar
Foundations, C. Protecting america’s infrastructures: The report of the president’s commission on critical infrastructure protection (1997).
Romero, J. J. Blackouts illuminate india’s power problems. Spectrum, IEEE 49, 11–12 (2012).
Google Scholar
Tang, Y., Bu, G. & Yi, J. Analysis and lessons of the blackout in indian power grid on july 30 and 31, 2012. In Zhongguo Dianji Gongcheng Xuebao(Proceedings of the Chinese Society of Electrical Engineering), vol. 32, 167–174 (Chinese Society for Electrical Engineering, 2012).
Kosterev, D. N., Taylor, C. W., Mittelstadt, W. et al. Model validation for the august 10, 1996 wscc system outage. Power Systems, IEEE Transactions on 14, 967–979 (1999).
Article ADS Google Scholar
Carreras, B., Newman, D., Dobson, I. & Poole, A. Evidence for soc in electric power system blackouts. In hicss, 2016 (IEEE, 2001).
Sachtjen, M., Carreras, B. & Lynch, V. Disturbances in a power transmission system. Physical Review E 61, 4877 (2000).
Article CAS ADS Google Scholar
Anghel, M., Werley, K., Motter, A. E. et al. Stochastic model for power grid dynamics. In System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference on, 113–113 (IEEE, 2007).
Crucitti, P., Latora, V. & Marchiori, M. Model for cascading failures in complex networks. Phys. Rev. E 69, 045104 (2004).
Article ADS Google Scholar
Wang, J.-W. & Rong, L.-L. Cascade-based attack vulnerability on the {US} power grid. Safety Science 47, 1332–1336 (2009).
Article Google Scholar
Pahwa, S., Hodges, A., Scoglio, C. & Wood, S. Topological analysis of the power grid and mitigation strategies against cascading failures. In Systems Conference, 2010 4th Annual IEEE, 272–276 (IEEE, 2010).
Yağan, O. Robustness of power systems under a democratic-fiber-bundle-like model. Phys. Rev. E 91, 062811 (2015).
Article ADS Google Scholar
Daniels, H. The statistical theory of the strength of bundles of threads. i. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 183, 405–435 (The Royal Society, 1945).
Article ADS MathSciNet Google Scholar
Motter, A. E. & Lai, Y.-C. Cascade-based attacks on complex networks. Phys. Rev. E 66, 065102 (2002).
Article ADS Google Scholar
Wang, W.-X. & Chen, G. Universal robustness characteristic of weighted networks against cascading failure. Phys. Rev. E 77, 026101 (2008).
Article ADS Google Scholar
Mirzasoleiman, B., Babaei, M., Jalili, M. & Safari, M. Cascaded failures in weighted networks. Physical Review E 84, 046114 (2011).
Article ADS Google Scholar
Bernstein, A., Bienstock, D., Hay, D., Uzunoglu, M. & Zussman, G. Power grid vulnerability to geographically correlated failures—analysis and control implications. In INFOCOM, 2014 Proceedings IEEE, 2634–2642 (IEEE, 2014).
Kinney, R., Crucitti, P., Albert, R. & Latora, V. Modeling cascading failures in the north american power grid. The European Physical Journal B-Condensed Matter and Complex Systems 46, 101–107 (2005).
Article CAS ADS Google Scholar
Su, Z. et al. Robustness of interrelated traffic networks to cascading failures. Scientific reports 4 (2014).
Scala, A., Lucentini, P. G. D. S., Caldarelli, G. & D’Agostino, G. Cascades in interdependent flow networks. Physica D: Nonlinear Phenomena (2015).
Andersen, J. V., Sornette, D. & Leung, K.-t. Tricritical behavior in rupture induced by disorder. Physical Review Letters 78, 2140 (1997).
Article CAS ADS Google Scholar
da Silveira, R. Comment on “tricritical behavior in rupture induced by disorder”. Physical review letters 80, 3157 (1998).
Article CAS ADS Google Scholar
Sornette, D., Leung, K.-T. & Andersen, J. Conditions for abrupt failure in the democratic fiber bundle model. arXiv preprint cond-mat/9712313 (1997).
Pradhan, S. & Chakrabarti, B. K. Failure properties of fiber bundle models. International Journal of Modern Physics B 17, 5565–5581 (2003).
Article ADS Google Scholar
Roy, C., Kundu, S. & Manna, S. Fiber bundle model with highly disordered breaking thresholds. Physical Review E 91, 032103 (2015).
Article ADS Google Scholar
Pahwa, S., Scoglio, C. & Scala, A. Abruptness of cascade failures in power grids. Scientific reports 4 (2014).
Overbye, T. J., Cheng, X. & Sun, Y. A comparison of the ac and dc power flow models for lmp calculations. In Proceedings, 37th Hawaii International Conference on System Sciences (2004).
Stott, B., Jardim, J. & Alsac, O. Dc power flow revisited. Power Systems, IEEE Transactions on 24, 1290–1300 (2009).
Article ADS Google Scholar
Rosato, V. et al. Modelling interdependent infrastructures using interacting dynamical models. International Journal of Critical Infrastructures 4, 63–79 (2008).
Article Google Scholar
Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–1028 (2010).
Article CAS ADS Google Scholar
Christie, R. University of washington power systems test case archive (last accessed on 25 nov 2013)(1999). URLhttp://www.ee.washington.edu/research/pstca.
Chatziafratis, E., Zhang, Y. & Yağan, O. On the robustnxess of power systems: optimal load-capacity distributions and hardness of attacking. In Information Theory and Applications Workshop (ITA), 2016 (2016).

Download references

Acknowledgements

This research was supported in part by the National Science Foundation through grant CCF #1422165 and in part by the Department of Electrical and Computer Engineering at Carnegie Mellon University (CMU). O. Yağan also acknowledges the Berkman Faculty Development Grant from CMU.

Author information

Authors and Affiliations

Department of ECE, Carnegie Mellon University, Pittsburgh, 15213, USA
Yingrui Zhang & Osman Yağan

Authors

Yingrui Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Osman Yağan
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Y.Z. and O.Y. have contributed equally to all parts of the paper.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Zhang, Y., Yağan, O. Optimizing the robustness of electrical power systems against cascading failures. Sci Rep 6, 27625 (2016). https://doi.org/10.1038/srep27625

Download citation

Received: 29 February 2016
Accepted: 23 May 2016
Published: 21 June 2016
DOI: https://doi.org/10.1038/srep27625

This article is cited by

Nonlinear model of cascade failure in weighted complex networks considering overloaded edges
- Chao-Yang Chen
- Yang Zhao
- Harry Eugene Stanley
Scientific Reports (2020)
Limits of Predictability of Cascading Overload Failures in Spatially-Embedded Networks with Distributed Flows
- A. Moussawi
- N. Derzsy
- G. Korniss
Scientific Reports (2017)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.