Recovery coupling in multilayer networks

The increased complexity of infrastructure systems has resulted in critical interdependencies between multiple networks: communication systems require electricity, while the normal functioning of the power grid relies on communication systems. These interdependencies have inspired an extensive literature on coupled multilayer networks, which assumes a hard interdependence, where a component failure in one network causes failures in the other network, resulting in a cascade of failures across multiple systems. While empirical evidence of such hard failures is limited, the repair and recovery of a network requires resources typically supplied by other networks, resulting in documented interdependencies induced by the recovery process. In this work, we explore recovery coupling, which captures the dependence of the recovery of one system on the instantaneous functional state of another system. If the support networks are not functional, recovery will be slowed. Here we collect data on the recovery time of millions of power grid failures, finding evidence of universal nonlinear behavior in recovery following large perturbations. We develop a theoretical framework to address recovery coupling, predicting quantitative signatures different from those of multilayer cascading failures. We then rely on controlled natural experiments to separate the role of recovery coupling from other effects, such as resource limitations, offering direct evidence of how recovery coupling affects a system's functionality.

As critical infrastructure systems have increased in size and complexity, so has the interdependence between them: communication systems require electricity from the power grid, whose operation and maintenance, however, rely on communication systems. Both networks rely on the transportation system for repairs, and in turn transportation needs both electrical power and a functioning communication system. These multiple interdependencies, and their consequences for resilience, have inspired an extensive literature on coupled multilayer networks that crosses disciplinary boundaries [1][2][3][4][5][6][7][8][9][10].
The common hypothesis behind the current multilayer network modeling framework is one of hard coupling, where a node or link failure in one network causes node or link failures in another network, which in turn may induce additional failures in the original network, resulting in a domino-like cascade of failures across multiple systems [3] (Fig. 1a). Despite the many modeling insights it has offered, evidence of such hard cascading failures remains limited in real systems. For example, while communications and some transit networks do depend directly on electricity, failures in these networks rarely cause electrical failures [8]. Furthermore, while cascading failures within the electric grid are well documented [11][12][13][14][15][16], and despite a decade-long body of literature on the subject, we still lack convincing empirical evidence that they induce domino-like failures in other infrastructure systems.
While direct evidence of hard coupling across multiple networks is limited, there are multiple accounts of interdependencies not considered by current modeling frameworks: those induced by the recovery process [8]. Indeed, the repair and recovery of a network following a local or global failure require resources typically supplied by other networks.
For example, restoring failed power components requires that the repair crews have access to transportation (road networks) and coordination through communications (cellular networks and the internet). If the support networks are not fully functional, the delivery of resources critical for recovery will be slowed or impaired (Fig. 1b). Indeed, while a blocked road or an internet outage in a given location will not cause a power outage, it may delay the repair of power outages in the affected area. Because damage may continue to accumulate regardless of the system's ability to recover, impaired recovery could eventually lead to the system's collapse. Such recovery-based interdependencies were well documented in the aftermath of Hurricane Sandy: at least 85 incidents of recovery interdependence were reported, including the dependency of the power grid's recovery on other networks [17].
Here we show how recovery coupling affects a network's functionality, finding that its signatures and dynamics differ from those of the much-studied multilayer cascading failures.
To empirically test the developed framework, we collected data on millions of power grid failures in the contiguous United States, finding evidence of striking nonlinear behavior in recovery following large perturbations, consistent with the model predictions.
Consider two infrastructure systems $X$ and $Y$, each composed of $N$ elements (nodes).
Each network is described by its adjacency matrix, $X_{ij}$ and $Y_{ij}$, and we label the nodes geographically so that co-located nodes $x_i$ and $y_i$ have the same index $i$. At any moment, each node can be either functional ($x_i = 1$, $y_i = 1$) or non-functional ($x_i = 0$, $y_i = 0$). A non-functional node can cause secondary damage, either by isolating its neighbors from the rest of the network or via cascading mechanisms [16]. Though a single node or link failure can render other parts of the network non-functional, once the initial failure is repaired, the secondary failures typically also return to functionality [18]. For example, though a downed power line may cut off power to many homes, once the line is repaired, power is restored to each home without the need to repair each component individually.
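The distinction between primary and secondary failures can be illustrated with a short Python sketch. It assumes, as a simplification in the spirit of the percolation treatment used later in this section, that a node is functional only if it has not failed itself and belongs to the largest connected component of the surviving nodes. The function `functional_nodes` and the toy graph are hypothetical.

```python
from collections import deque


def functional_nodes(adj, failed):
    """Return the set of functional nodes under a hypothetical rule: a node works
    only if it has not failed itself and lies in the largest connected component
    of the surviving (non-failed) nodes. `adj` maps node -> list of neighbours."""
    alive = set(adj) - set(failed)
    seen = set()
    best = set()
    for start in alive:
        if start in seen:
            continue
        # Breadth-first search restricted to surviving nodes
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


# Toy chain 0-1-2-3-4-5: a single primary failure at node 2 isolates nodes 0 and 1
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(sorted(functional_nodes(adj, failed={2})))    # [3, 4, 5]
# Repairing node 2 restores nodes 0 and 1 without repairing them individually
print(sorted(functional_nodes(adj, failed=set())))  # [0, 1, 2, 3, 4, 5]
```

Nodes 0 and 1 are secondary failures here: they never failed themselves, and they return to functionality as soon as the primary failure is repaired.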
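Assuming a constant damage rate $\gamma^d_\mu$ and a constant repair rate $\gamma^r_\mu$, the fraction of primary failed nodes $f_\mu$ in each network ($\mu \in \{X, Y\}$) evolves in time as

$$\frac{df_\mu}{dt} = \gamma^d_\mu \left(1 - f_\mu\right) - \gamma^r_\mu f_\mu, \qquad (1)$$

reaching the equilibrium damage fraction

$$f^*_\mu = \frac{\gamma^d_\mu}{\gamma^d_\mu + \gamma^r_\mu}. \qquad (2)$$

The damage rate $\gamma^d_\mu$ is largely exogenous and determined by weather, accidents or component failures. The repair rate $\gamma^r_\mu$, in contrast, is determined by the resources available for repair, such as crews and supplies.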
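The uncoupled dynamics can be checked numerically with a minimal Python sketch. It assumes Eq. (1) has the form $df/dt = \gamma^d (1 - f) - \gamma^r f$ and integrates it with a forward-Euler step. The rate values and the function names `integrate_damage` and `equilibrium` are hypothetical.

```python
def integrate_damage(gamma_d, gamma_r, f0=0.0, dt=0.01, t_max=50.0):
    """Forward-Euler integration of df/dt = gamma_d * (1 - f) - gamma_r * f."""
    f = f0
    for _ in range(round(t_max / dt)):
        f += dt * (gamma_d * (1.0 - f) - gamma_r * f)
    return f


def equilibrium(gamma_d, gamma_r):
    """Fixed point f* = gamma_d / (gamma_d + gamma_r)."""
    return gamma_d / (gamma_d + gamma_r)


gamma_d, gamma_r = 0.02, 0.5  # hypothetical rates per unit time
print(round(integrate_damage(gamma_d, gamma_r), 4))          # 0.0385
print(round(integrate_damage(gamma_d, gamma_r, f0=0.9), 4))  # 0.0385
print(round(equilibrium(gamma_d, gamma_r), 4))               # 0.0385
```

Whether the system starts undamaged or with 90% of its nodes failed, it relaxes to the same equilibrium. An uncoupled network therefore always recovers.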
Equations (1)-(2) predict a linear relationship between the number of damaged nodes and the number of repairs executed within a given time window: a window of length $\Delta t$ sees $\gamma^r_\mu f_\mu N \Delta t$ repairs. This is analogous to the elastic balance between displacement and restoring forces in stress-strain relationships in materials science [19,20]. A constant damage rate $\gamma^d_\mu$ leads to approximately $\gamma^d_\mu N$ new failures per unit time while damage is sparse ($f_\mu \ll 1$). Temporal variability can be modeled by replacing the constant $\gamma^d_\mu$ with a stochastic variable drawn from a representative distribution (see Supp. Sec. 2).
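The linear damage-repair relationship survives a fluctuating damage rate, as a minimal stochastic Python sketch shows. In each time window, every working node fails with probability $\gamma^d$, which is redrawn each window from a uniform distribution (a hypothetical choice standing in for the "representative distribution"), and every failed node is repaired with a fixed probability $\gamma^r$. Node counts, rates and function names are illustrative assumptions.

```python
import random


def simulate_windows(n_nodes=2000, gamma_r=0.2, n_windows=400, seed=1):
    """Discrete-time sketch of Eqs. (1)-(2) with a stochastic damage rate.
    Returns (nodes damaged at window start, repairs during window) pairs."""
    rng = random.Random(seed)
    failed = 0
    records = []
    for _ in range(n_windows):
        gamma_d = rng.uniform(0.0, 0.05)  # hypothetical damage-rate distribution
        repairs = sum(rng.random() < gamma_r for _ in range(failed))
        new_failures = sum(rng.random() < gamma_d for _ in range(n_nodes - failed))
        records.append((failed, repairs))
        failed += new_failures - repairs
    return records


def slope_through_origin(records):
    """Least-squares slope of repairs versus damage for a line through the origin."""
    num = sum(d * r for d, r in records)
    den = sum(d * d for d, _ in records)
    return num / den


records = simulate_windows()
print(round(slope_through_origin(records), 2))  # close to gamma_r = 0.2
```

The damage level fluctuates from window to window, but repairs track it linearly. The fitted slope recovers the constant repair probability, which is the quantity read off the empirical damage-repair curves.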
To empirically test the validity of elastic recovery, we built an Outage Observatory, a suite of continually running web crawlers that record live-updating outage maps [21][22][23] from electrical utilities around the United States (Fig. 2a). During 2019 the Observatory recorded over 5 million power outages, capturing the geographic location and time of each outage and the repair time for each incident (Fig. 2e). By comparing the number of repairs and outages occurring in a utility at any time, we can construct the damage-repair curve of each utility (Figs. 2b and 2c). For most utilities the recovery follows the linear response of Eq. (1) 95% of the time, and the slope provides the repair rate (see Supp. Sec. 4 for details). However, we also observed multiple large disruptions for which the number of repairs systematically and significantly deviates from the linear pattern that characterizes elastic behavior (Fig. 2d). We were able to link many of these to large events such as severe winds, rainfall, snowfall and fires. For example, a derecho that struck the Northern Midwest on July 19, 2019 [24] caused over 55,000 outages, resulting in over 60 million lost customer hours. Each perturbation affects the power grid and its support systems in different ways, hence the precise deviation from linearity cannot be inferred from the number of outages alone. Yet, although each large failure has its own cause and recovery dynamics, placing all perturbations on the same graph reveals a remarkable universality: all large events collapse onto a single nonlinear curve (Fig. 2d).
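The construction of a damage-repair curve can be sketched in Python. The sketch assumes a hypothetical record format, one `(start, end)` pair in hours per outage for a single utility, and fixed one-hour windows. The deviation criterion in `fit_and_flag` is an illustrative choice, not necessarily the procedure of Supp. Sec. 4: it flags windows whose repairs fall more than `z` Poisson-like standard deviations below the fitted line.

```python
import math


def damage_repair_curve(outages, window=1.0):
    """For each window [t, t + window) return (outages active at t,
    repairs completed in the window). `outages` holds (start, end) hours."""
    t_last = max(end for _, end in outages)
    curve = []
    for k in range(math.ceil(t_last / window) + 1):
        t = k * window
        active = sum(1 for s, e in outages if s <= t < e)
        repairs = sum(1 for s, e in outages if t <= e < t + window)
        curve.append((active, repairs))
    return curve


def fit_and_flag(curve, z=3.0):
    """Slope of repairs vs. active outages through the origin (elastic repair
    rate per window) and indices of windows far below the linear prediction."""
    num = sum(a * r for a, r in curve)
    den = sum(a * a for a, _ in curve)
    slope = num / den if den else 0.0
    flagged = [i for i, (a, r) in enumerate(curve)
               if r < slope * a - z * math.sqrt(slope * a + 1.0)]
    return slope, flagged


# Twenty ordinary windows plus one heavily damaged, slowly repaired window
slope, flagged = fit_and_flag([(10, 10)] * 20 + [(100, 5)])
print(flagged)  # [20]
```

Because the slope is fitted over all windows, large events pull it down. Fitting on ordinary windows only would be a natural refinement.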
The loss of elasticity during extreme perturbations indicates that the hypothesis of a constant repair rate is not sufficient to explain the system's behavior. Given that the repair process requires resources from other networks, we hypothesize that a multi-network approach could explain the observed deviation. To model the observed dependency, we allow the repair rate $\gamma^r_{X,i}$ of the primary network $X$ (e.g., the power grid) at node $i$ to depend on the state of the support network $Y$ (e.g., the road or communication network) at the same location (see Fig. 1b), obtaining

$$\gamma^r_{X,i} = g\left(\langle y_i \rangle\right), \qquad (3)$$

where $g(x)$ is an unknown function that represents the dependence of the repair rate of system $X$ on the state of network $Y$ around site $i$. We assess this state with the network average

$$\langle y_i \rangle = \frac{\sum_j Y_{ij}\, y_j}{\sum_j Y_{ij}}, \qquad (4)$$

to capture the fact that the repair resources are drawn from the neighborhood of the failure.
Denoting with $\gamma^{r,0}_X = g(1)$ the elastic repair rate and with $\alpha = g'(1)/g(1)$ the coupling strength, we expand $g$ to first order around the fully functional state, obtaining

$$\gamma^r_{X,i} \approx \gamma^{r,0}_X \left[1 - \alpha\left(1 - \langle y_i \rangle\right)\right], \qquad (5)$$

which describes the expected behavior of $g(x)$ to first order under the assumption that $\alpha \in [0, 1]$. Specifically, we assume that damage in $Y$ will not improve repair in $X$ ($g'(1) \geq 0 \rightarrow \alpha \geq 0$) and that the repair rate must remain non-negative ($|g'(1)| \leq g(1) \rightarrow \alpha \leq 1$).

If damage is sporadic and uncorrelated across both systems, the simultaneous failure of $x_i$ and $y_i$ for a given $i$ is rare, and when the failures are limited to a single network, recovery is not impaired (Fig. 1b). However, if damage in $X$ and $Y$ is correlated in time or space, simultaneous damage of nearby sites in $X$ and $Y$ will occur more frequently and, based on Eq. (5), we expect a reduction in the repair rate. Such correlations are often caused by severe weather events, the main source of disruptions to all infrastructure systems in the United States [25,26]. These events are highly localized in time and space, simultaneously damaging the electric, communications and transportation networks. Hurricane Sandy, for example, induced failures across the power grid and communications networks (downed lines, flooded control centers) and transportation networks (flooded roads). These simultaneous failures led to recovery delays, as power outages could not be repaired because roads were flooded. At times the coupling was bidirectional: some flooded roads had pumping systems for drainage, which could not be operated without electricity [17].
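The effect of correlated damage on the repair rate can be illustrated with a minimal Python sketch. It assumes three things: the local state $\langle y_i \rangle$ is the average over the $Y$-neighbors of site $i$; the repair rate takes the first-order form $\gamma^{r,0}[1 - \alpha(1 - \langle y_i \rangle)]$; and only primary failures in $Y$ count, so secondary isolation is ignored. The ring-shaped support network, the damage sizes, $\alpha = 1$ and all function names are hypothetical.

```python
import random


def local_average(adj, state, i):
    """Network average <y_i>: mean state of the neighbours of site i.
    `adj` is a 0/1 adjacency matrix; state[j] is 1 if node j works."""
    neighbours = [j for j, a in enumerate(adj[i]) if a]
    return sum(state[j] for j in neighbours) / len(neighbours)


def linear_repair_rate(y_avg, gamma0, alpha):
    """First-order repair rate gamma0 * [1 - alpha * (1 - <y_i>)], 0 <= alpha <= 1."""
    return gamma0 * (1.0 - alpha * (1.0 - y_avg))


def ring_adjacency(n, r=2):
    """Hypothetical support network: each site linked to r neighbours per side."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for d in range(1, r + 1):
            adj[i][(i + d) % n] = adj[(i + d) % n][i] = 1
    return adj


def mean_repair_rate(adj, failed_x, failed_y, gamma0=1.0, alpha=1.0):
    """Average repair rate over failed X sites (Y: primary failures only)."""
    y_state = [0 if j in failed_y else 1 for j in range(len(adj))]
    rates = [linear_repair_rate(local_average(adj, y_state, i), gamma0, alpha)
             for i in failed_x]
    return sum(rates) / len(rates)


n, m = 1000, 100
adj = ring_adjacency(n)
rng = random.Random(0)
# Uncorrelated damage: independent random sets of m sites fail in X and in Y
uncorrelated = mean_repair_rate(adj, set(rng.sample(range(n), m)),
                                set(rng.sample(range(n), m)))
# Correlated damage: a storm knocks out the same contiguous block in both networks
block = set(range(m))
correlated = mean_repair_rate(adj, block, block)
print(round(uncorrelated, 2), round(correlated, 3))
```

With the same amount of damage in each network, scattered failures leave the repair rate near its elastic value, whereas co-located failures nearly halt repair inside the damaged block.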
When there are many outages at once, repair times can also be affected by resource limitations, such as a limited number of repair crews and trucks. Yet resource limitations are expected to affect the whole service area equally. If, however, the slowdown is limited to regions where the support infrastructure is damaged, recovery coupling is the main driving factor. To distinguish between these two mechanisms, we relied on natural experiments, in which exogenous shocks simultaneously affected the electrical network and its support networks. In September 2019, Tropical Depression Imelda caused widespread power outages and flooding in Houston, Texas and the surrounding area (Fig. 3a). We analyzed the duration of all power outages in the vicinity of flooded roads, using areas without flooding as a control. This allowed us to test whether the slowdown in outage repairs was due to system-wide drains on resources or to the dependence of the repair rate on the road network.
We also considered a temporal control, inspecting the repair times of outages reported over the previous 60 days in the same area (see Fig. 3b, 3e).We find that the slowdown in outage restoration is heavily localized in both space and time around the flooded roads: while more than 95% of the outages located more than 30 km from the flooded roads were repaired within 10 hours, 40% of the failures occurring within 5 km of a flooded road remained unrepaired after 10 hours.Furthermore, even during the storm, outages far from flooded roads were repaired at the same rate as without a storm (spatial control, Fig. 3e).The observed separation of outage survival curves at different distances from flooded roads offers direct evidence of multilayer recovery coupling, illustrating how damage in a non-electrical infrastructure impacts the functionality of the electrical infrastructure.
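The spatial comparison can be sketched in Python. The sketch makes four assumptions. Outages are hypothetical `(lat, lon, duration_hours)` records. Flooded roads are represented by sample points. Distances are great-circle (haversine) distances on a spherical Earth. Every outage in the sample was eventually repaired, so no censoring correction is applied. The 5 km and 30 km thresholds follow the text; the group labels and function names are illustrative.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_group(outage, flood_points, near_km=5.0, far_km=30.0):
    """Classify an outage by its distance to the nearest flooded-road point."""
    d = min(haversine_km(outage[0], outage[1], lat, lon) for lat, lon in flood_points)
    if d <= near_km:
        return "near"
    if d > far_km:
        return "far"
    return "intermediate"


def survival(durations, t):
    """Fraction of outages still unrepaired t hours after they began."""
    return sum(d > t for d in durations) / len(durations)


def unrepaired_by_group(outages, flood_points, t=10.0):
    """Return {distance group: fraction of outages unrepaired after t hours}."""
    groups = defaultdict(list)
    for lat, lon, duration in outages:
        groups[distance_group((lat, lon), flood_points)].append(duration)
    return {g: survival(ds, t) for g, ds in groups.items()}
```

Evaluating `survival` over a grid of times for each group gives the survival curves compared in the text. A separation between the near and far groups is the signature of recovery coupling.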
Further evidence of the proposed phenomenon is provided by the coexistence of elastic behavior far from the flooded roads with inelastic behavior near them (Figs. 3c and 3d).
We note that the repair amounts are not only below the elastic prediction but also decrease with increasing damage. This is in line with the prediction that the deviations from elasticity are not caused by resource constraints, which would instead lead to a saturation of repairs per unit time (Supp. Sec. 1 and [27]).
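The contrast between the two mechanisms can be made concrete with a minimal mean-field sketch in Python. It assumes that resource limits cap repairs at a fixed `capacity` per unit time. For recovery coupling it assumes that the support network's functionality is approximated by $1 - D/N$ for $D$ damaged sites out of $N$, so that repairs scale as $\gamma^{r,0} D (1 - \alpha D/N)$. Both forms and all parameter values are illustrative assumptions, not fits to the data.

```python
def elastic(d, gamma0):
    """Elastic response: repairs per unit time proportional to damage d."""
    return gamma0 * d


def resource_limited(d, gamma0, capacity):
    """Hypothetical crew limit: at most `capacity` repairs per unit time."""
    return min(gamma0 * d, capacity)


def recovery_coupled(d, gamma0, alpha, n):
    """Mean-field sketch: support-network functionality approximated by 1 - d/n."""
    return gamma0 * d * (1.0 - alpha * d / n)


n, gamma0, alpha, capacity = 1000, 0.1, 1.0, 30.0
for d in range(100, 1000, 200):
    print(d, elastic(d, gamma0), resource_limited(d, gamma0, capacity),
          recovery_coupled(d, gamma0, alpha, n))
```

The resource-limited curve flattens but never decreases. The recovery-coupled curve peaks at $D = N/(2\alpha)$ and then declines, matching the qualitative pattern described above.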
To understand the implications of recovery coupling for multilayer network resilience, we consider the symmetric case in which the network structure, damage and recovery parameters are the same in both systems. Since the two systems support each other, we let the repair rate of $Y$ depend on the state of $X$ in the same manner as Eq. (5): $\gamma^r_{Y,i} = g(\langle x_i \rangle)$. In the symmetric case $f_X = f_Y = f$, leading to a single equation that governs the state of the system. If the failures are uniformly distributed, we can use percolation theory [28,29] to analytically derive the equation that governs the expected fraction of primary failures in the coupled system,

$$\frac{df}{dt} = \gamma^d \left(1 - f\right) - \gamma^{r,0}\left[1 - \alpha\, u(1 - f)\right] f, \qquad (6)$$

where $u(x)$ is the probability that a link does not lead to the largest connected component when a random fraction $1 - x$ of the nodes is removed, and is determined by the network topology. Equation (6) has one or two stable solutions depending on the value of the control parameter $\gamma^{r,0}/\gamma^d$. In contrast, the uncoupled case, Eq. (2), which we recover from Eq. (6) for $\alpha = 0$, has a single stable solution. The new solution describes a stable fixed point at $f = 1$ (all nodes failed), which persists even for high recovery rates $\gamma^{r,0}/\gamma^d$ (see Fig. 4a). The coexistence of two stable solutions for the same recovery rate $\gamma^{r,0}/\gamma^d$ indicates that, for a wide range of conditions, recovery coupled networks are resilient: on the functional branch they display functionality comparable to the uncoupled case, returning to full functionality following small perturbations [30,31].
However, a sufficiently large perturbation can force the system to cross the unstable branch, pushing it into a dynamically stable non-functional state (Fig. 4a). This behavior analytically predicts a "catch-22" phase following a sufficiently large disaster: infrastructure system $X$ cannot be repaired because it requires resources from $Y$, and $Y$ cannot be repaired because it requires resources from $X$. The fact that the collapsed state persists even for high repair rates and low damage rates predicts that it is harder to bootstrap a broken system than to maintain the functionality of one that is damaged but still working. Computing elastic residual curves for the model (Fig. 4c), analogous to the observations in Fig. 2d, we find that full coupling, $\alpha = 1$, reproduces the shape of the observed curve, while lower values of $\alpha$ do not, providing further evidence that the general deviation from elasticity is consistent with recovery coupling.
The September 27th, 2003 blackout in Italy is often used to illustrate how the interdependence of communications and electrical infrastructure can cause cascading failures [3,32]. However, a closer look at the sequence of events indicates that failures in the communication network did not trigger a domino-like cascade of failures in the power grid, but hindered the ability of the operators to communicate, prolonging the recovery [33]. Here we have shown that such recovery coupling can by itself lead to a collapse of functionality. More importantly, we found that the signatures of recovery coupling are directly observable during severe weather events, indicating that the proposed mechanisms have direct relevance to real multilayer networks. Domino-like dependencies, which could co-occur, would further amplify this danger.
The ability to identify specific instances of impaired resilience due to recovery coupling opens the door to studying interdependent infrastructure as it is.For example, we find that while the set of flooded roads as a whole caused slowdowns in power outage repairs, some impaired roads had much stronger effects than others.The roads in downtown Houston caused only minor delays when flooded, while in Beaumont and Northeast Houston flooded roads caused severe delays (see Fig. 3a).By highlighting actual cases where interdependence did and did not play a role, we open the door to addressing the pressing problems of infrastructure vulnerability, an acute issue in light of aging infrastructure and climate change.
Recovery coupling has relevance for other systems affected by multiple networks.A pertinent example is frailty, as living organisms have multiple repair mechanisms supported by different biological networks designed to cope with ongoing stress and damage [34,35].
These systems also display a fundamental asymmetry between damage and repair: damage is typically caused by external factors (oxidants, pathogens, shocks, etc.), while repair is endogenous and governed by multiple coupled networks (regulatory, metabolic and signaling) requiring diverse resources (nutrients, oxygen, immune cells, etc.). If the damage is sufficiently widespread to impair the repair mechanisms, the organism becomes frail, losing its capacity to recover from shocks it could tolerate under normal conditions [36]. As our understanding of the fundamental biological processes and the availability of data improve, we expect the conceptual framework developed here to play a role in unraveling the vexing puzzle of senescence and aging.
Node $x_1$ is disabled exogenously, triggering a repair process, with resources delivered through $y_1$. If $y_1$ is functional, together with $y_2$ and $y_3$, which support $y_1$, recovery is prompt.

FIG. 1: Damage and Recovery in Interdependent Networks. (a) Under the hard coupling model, when node $x_1$ fails, it causes a cascade across both networks that eventually disables the entire system. (b) With recovery coupling on the same network, when node $x_1$ fails it is repaired using resources from network $Y$ delivered through node $y_1$. Therefore failures in network $Y$ will impair the repair process.

FIG. 4: Recovery Coupling in Multilayer Networks. (a) Comparison between the functionality of uncoupled networks and recovery coupled networks. The uncoupled case (blue line) has a single solution for any repair-to-damage ratio $\gamma^{r,0}/\gamma^d$, implying that it can recover its functionality after an arbitrarily large perturbation. With recovery coupling, the system can function at levels similar to the uncoupled case (orange line), but the non-functional collapsed state persists as an attractor (red line), implying that after sufficiently large damage the system can reach a permanently collapsed state. (b) If we inspect the recovery per unit time as a function of the concurrent amount of damage, we observe a behavior similar to elasticity in materials science. Recovery coupling leads to a sublinear, or inelastic, behavior, predicting the loss of resilience under heavy damage. (c) The elastic residual plot from bidirectionally coupled random networks shows the same pattern as observed for the real data in Fig. 2d.