Fragmentation of outage clusters during the recovery of power distribution grids

The understanding of recovery processes in power distribution grids is limited by the lack of realistic outage data, especially large-scale blackout datasets. By analyzing data from three electrical companies across the United States, we find that the recovery duration of an outage is connected with the downtime of its nearby outages and blackout intensity (defined as the peak number of outages during a blackout), but is independent of the number of customers affected. We present a cluster-based recovery framework to analytically characterize the dependence between outages, and interpret the dominant role blackout intensity plays in recovery. The recovery of blackouts is not random and has a universal pattern that is independent of the disruption cause, the post-disaster network structure, and the detailed repair strategy. Our study reveals that suppressing blackout intensity is a promising way to speed up restoration.


Supplementary Note 1. Outage data
The outage data come from four open-source websites with real-time outage reports provided by Eversource, National Grid, and Entergy, three electric companies in the United States. The four websites are listed below. Over more than a year of monitoring, we recorded a total of 682,733 outages affecting 19,384,168 customers. The information shared by all four datasets includes the start and end time of each outage, the number of customers affected, latitude and longitude, county, and the block and its population.
Dataset 1, from Eversource, covers 78,512 outages in Massachusetts (MA) from 2018/11/20 to 2020/04/29. It is the only one of the four datasets that does not include the cause of failure. Datasets 2 and 3 cover the same monitoring period as dataset 1. Dataset 2 is also from MA but is reported by National Grid, whose power distribution systems are entirely different from Eversource's. Dataset 3, although also provided by National Grid, contains data from New York (NY). Datasets 2 and 3 record 48,441 and 131,563 outages, respectively. Dataset 4, from Entergy, runs from 2019/07/13 to 2020/08/12 and includes 424,217 outages.
All datasets contain two parts: daily data and blackout data. The daily data include outages that typically appear as individual or small-scale events and can be repaired within a short time period (usually less than 24 hours). The blackout data include outages that occur during the lifetime of a blackout, and account for roughly 0.05% of the total observed outages. The blackouts discussed are presented in Table S1, where the start and end times of each blackout are both taken at 12 o'clock in the morning. Over the course of a blackout, the number of outages quickly climbs to a peak value and then slowly returns to the normal state.
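The peak in the number of outages is what the abstract calls blackout intensity. As a minimal sketch, the snippet below computes that peak from a list of outage start and end times with a sweep over the time-ordered events. It assumes that "number of outages" means outages active at the same moment and that times are plain numbers (for example, hours since the start of the blackout); the function name `blackout_intensity` and the input layout are hypothetical and not taken from the datasets.

```python
def blackout_intensity(outages):
    """Peak number of simultaneously active outages.

    `outages` is an iterable of (start, end) pairs with start < end.
    This reading of "peak number of outages" is an assumption of the sketch.
    """
    events = []
    for start, end in outages:
        events.append((start, 1))   # an outage begins
        events.append((end, -1))    # an outage is repaired
    # Tuples sort by time first; at equal times -1 sorts before +1, so an
    # outage ending exactly when another starts is not counted as overlapping.
    events.sort()
    active = 0
    peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    sample = [(0, 5), (1, 3), (2, 6), (5, 7)]
    print(blackout_intensity(sample))
```

Sorting the events once keeps the cost at O(n log n) in the number of outages, which matters for datasets with hundreds of thousands of records.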

Supplementary Note 2. Size distribution of outage clusters
This implies that the probability that a cluster fragments into two sub-clusters is independent of the sizes of the two parts. Given that only one cluster of size $N$ exists at moment $t = 0$, i.e., $S_N(0) = 1$, the evolution of the cluster of size $N$ follows Eq. (2). Similarly, for $k = N - 1$, the kinetic equation for outage clusters of size $N - 1$ is Eq. (3). Again, at moment $t = 0$ only the initial cluster of size $N$ exists, and the number of clusters of every other size is zero, i.e., $S_k(0) = 0$ for any $1 \le k < N$. Using this initial condition, substituting (2) into (3), and applying the general solution of a first-order non-homogeneous linear differential equation [1], we obtain $S_{N-1}(t)$. Continuing the iteration for $k = N - 2$, $k = N - 3$, $\dots$, we obtain a general formula for $S_k(t)$. This result is the number of outage clusters of size $k$, i.e., the size distribution, in the network.
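The iteration described above can be made concrete under an assumed model. The sketch below assumes binary fragmentation at a constant rate $a$ per possible split, so that a cluster of size $k$ breaks at total rate $a(k-1)$ and each larger cluster produces size-$k$ fragments at rate $2a$. The symbol $a$ and this specific form of the kinetic equation are assumptions of the sketch and may differ from the model used in this note.

```latex
% Assumed kinetic equation (constant rate a per split; not taken from the source)
\frac{dS_k}{dt} = -a\,(k-1)\,S_k(t) + 2a \sum_{j=k+1}^{N} S_j(t),
\qquad 1 \le k \le N .

% k = N, with S_N(0) = 1
\frac{dS_N}{dt} = -a\,(N-1)\,S_N
\;\Longrightarrow\;
S_N(t) = e^{-a(N-1)t} .

% k = N-1, with S_{N-1}(0) = 0: first-order linear ODE with a source term
\frac{dS_{N-1}}{dt} = -a\,(N-2)\,S_{N-1} + 2a\,e^{-a(N-1)t}
\;\Longrightarrow\;
S_{N-1}(t) = 2\,e^{-a(N-2)t}\bigl(1 - e^{-at}\bigr) .

% Iterating down to general k < N
S_k(t) = e^{-a(k-1)t}\bigl(1 - e^{-at}\bigr)
\Bigl[\,2 + (N-k-1)\bigl(1 - e^{-at}\bigr)\Bigr],
\qquad 1 \le k < N .
```

Under these assumptions, two quick consistency checks hold: $S_1(t) \to N$ as $t \to \infty$, and $\sum_{k} k\,S_k(t) = N$ at all times.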
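The significance level of the two-dimensional KS test is approximated by

$$P = Q_{KS}\left(\frac{\sqrt{N}\,D}{1 + \sqrt{1 - r^2}\,\left(0.25 - 0.75/\sqrt{N}\right)}\right),$$

where $D$ is the maximum difference (ranging both over data samples and over quadrants) of the four corresponding integrated probabilities, $r$ is the coefficient of correlation, $N$ is the sample size with form

$$N = \frac{N_1 N_2}{N_1 + N_2},$$

with $N_1$ and $N_2$ the sizes of the two samples, and $Q_{KS}(\cdot)$ is a function defined as

$$Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}.$$

When the p-value is larger than 0.2, the two data samples are not significantly different and can be accepted as coming from the same underlying distribution [4].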
Because the two-dimensional KS test can only compare two samples, we divide the blackout and daily operation data listed in Tables S1 and S2 into two groups: Eversource data and National Grid data from NY in group A, and Entergy data and National Grid data from MA in group B. The two-dimensional KS test is then used to test whether the two dataset distributions within each group differ. From Table S3, the null hypothesis that the two samples in group A (B) come from the same distribution cannot be rejected. Therefore, we can treat the data in each group as one sample and further test whether the data from groups A and B share the same distribution.
Again, the null hypothesis cannot be rejected. To sum up, the four datasets can be regarded as drawn from the same distribution.
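The two-stage procedure above (test within each group, pool, then test between groups) can be sketched as follows. This is a minimal illustration, assuming the widely used two-sample formulation in which $D$ is averaged over origins taken from each sample and the correlation term combines the Pearson coefficients of both samples; whether the analysis here used exactly this variant is an assumption. All function names are hypothetical, and the 0.2 threshold is the one stated in the text.

```python
import math


def q_ks(lam, max_terms=100_000, tol=1e-12):
    """Series Q_KS(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)."""
    if lam <= 0.0:
        return 1.0
    total, sign = 0.0, 1.0
    for j in range(1, max_terms + 1):
        term = math.exp(-2.0 * j * j * lam * lam)
        total += sign * term
        if term < tol:  # remaining terms are negligible
            return min(1.0, max(0.0, 2.0 * total))
        sign = -sign
    return 1.0  # no convergence means lam is tiny, where Q_KS is ~1


def quadrant_fractions(x0, y0, pts):
    """Fractions of pts in the four quadrants around the origin (x0, y0)."""
    counts = [0, 0, 0, 0]
    for x, y in pts:
        if y > y0:
            counts[0 if x > x0 else 1] += 1
        else:
            counts[3 if x > x0 else 2] += 1
    return [c / len(pts) for c in counts]


def max_quadrant_gap(origins, a, b):
    """Largest |fraction_a - fraction_b| over all origins and quadrants."""
    gap = 0.0
    for x0, y0 in origins:
        fa = quadrant_fractions(x0, y0, a)
        fb = quadrant_fractions(x0, y0, b)
        gap = max(gap, max(abs(p - q) for p, q in zip(fa, fb)))
    return gap


def pearson_r(pts):
    """Pearson correlation of a list of (x, y) points (0.0 if degenerate)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def ks2d_two_sample(a, b):
    """Return (D, p) for two lists of (x, y) points."""
    # Average the statistic over origins drawn from each sample (assumed variant).
    d = 0.5 * (max_quadrant_gap(a, a, b) + max_quadrant_gap(b, a, b))
    r1, r2 = pearson_r(a), pearson_r(b)
    rr = math.sqrt(max(0.0, 1.0 - 0.5 * (r1 * r1 + r2 * r2)))
    root_n = math.sqrt(len(a) * len(b) / (len(a) + len(b)))
    lam = d * root_n / (1.0 + rr * (0.25 - 0.75 / root_n))
    return d, q_ks(lam)


def four_samples_consistent(group_a, group_b, threshold=0.2):
    """Test within each group, then pool each group and test between groups."""
    pooled = []
    for first, second in (group_a, group_b):
        if ks2d_two_sample(first, second)[1] <= threshold:
            return False
        pooled.append(list(first) + list(second))
    return ks2d_two_sample(pooled[0], pooled[1])[1] > threshold
```

Each sample is a list of `(x, y)` pairs, and each group is a pair of samples, for example `four_samples_consistent((sample_1, sample_3), (sample_4, sample_2))`. The quadrant counting is quadratic in the sample size, so very large samples may need subsampling or a faster counting scheme.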