Abstract
Self-healing smart grids are characterized by fast-acting, intelligent control mechanisms that minimize power disruptions during outages. The corrective actions adopted during outages in power distribution networks include reconfiguration through switching control and emergency load shedding. The conventional decision-making models for outage mitigation are, however, not suitable for smart grids due to their slow response and computational inefficiency. Here, we present a graph reinforcement learning model for outage management in the distribution network to enhance its resilience. The distinctive characteristic of our approach is that it explicitly accounts for the underlying network topology and its variations with switching control, while also capturing the complex interdependencies between state variables (along nodes and edges) by modeling the task as a graph learning problem. Our model learns the optimal control policy for power restoration using a Capsule-based graph neural network. We validate our model on three test networks, namely the 13-, 34-, and 123-bus modified IEEE networks, where it is shown to achieve near-optimal, real-time performance. The resilience improvement of our model in terms of loss of energy is 607.45 kWs and 596.52 kWs for the 13- and 34-bus networks, respectively. Our model also demonstrates generalizability across a broad range of outage scenarios.
Introduction
Resilience enhancement of power distribution networks (DNs) has gained considerable recognition in recent years. It was often overlooked in the past because DNs were perceived as merely a link between the transmission network and consumers. A key factor for this shift is the realization that 90% of customer disruptions during extreme events can be attributed to the failure of components within the distribution network itself^{1}. Additionally, the increasing presence of distributed energy resources (DERs) and the resulting decentralization of power generation have spurred the notion of DNs as autonomous entities that can operate independently from the main grid^{2}. Consequently, the DN is now considered capable of retaining its functionality even during the loss of connectivity to the transmission network.
Concurrently, modernization of the power grid and the shift toward smart grids have been driving the deployment of intelligent and automated technologies in the DN^{3}. Distribution automation has been implemented through the deployment of line monitors, fault indicators, remote-controlled switches, and reclosers in the DN^{4}. An important characteristic of the smart grid is its self-healing capability, which includes implementing intelligent control actions through automation to minimize power disruptions, thus enabling the recovery of network operations during outages in real time^{5}. Therefore, the key requirements of a self-healing tool include autonomy, quick response, and online adaptability, which are indeed the salient features of our model discussed in this paper.
The transformation of the grid to a smart grid is driven by a bottom-up approach^{6}, with distribution feeders interacting at the transmission level. This paper specifically explores the intricacies of the lower-level component of the smart grid, the distribution network. The smart grid typically operates as an independent entity governed by an independent system operator (ISO). Inter-grid operations are challenging due to differing protocols, communication systems, and regulatory jurisdictions among ISOs. Additionally, exploring new frontiers in smart grid operation is constrained by the ongoing development of communication infrastructure standardization and interoperability. The operation and control of the distribution networks within the smart grid are mostly autonomous, with their aggregated impact visible at the transmission level^{7}. However, inter-grid operations are seldom employed during extreme events, driven by concerns about potential cascading failures between independent entities.
In the face of power disruptions caused by extreme weather events or cyber-physical attacks, a self-healing DN warrants the automatic detection of faulty components, their isolation, and system restoration (fully or partially) using intelligent control algorithms. This process is referred to as FLISR, which stands for fault location, isolation, and service restoration^{5}, and is addressed using task-specific techniques. Restoration, or the recovery of DN operation, can be achieved using different control actions, such as network reconfiguration, load management, DER control, energy storage control, and reactive power resource control. The preliminary control action often adopted in such circumstances is reconfiguration (or switching control), followed by load shedding^{8,9}. Distribution network reconfiguration (DNR) by controlling the status of the network switches is a commonly used strategy to control DN operation for varying objectives such as loss minimization, reliability enhancement, load balancing, increasing penetration of renewable resources, improvement of voltage profile, and service restoration^{10,11,12}. The purpose of feeder reconfiguration is twofold: (1) to quickly and efficiently reroute power from the functional part of the DN to the isolated section^{13,14}, and (2) to form intentional islands around the grid-forming DERs when there exists no connectivity to the main grid^{2,15}. In the existing body of knowledge, these two reconfiguration strategies have been addressed separately and have been largely considered as two distinct domains. However, a comprehensive restoration strategy suitable for various outage scenarios must efficiently utilize both grid-forming and grid-feeding DERs and consider all possible reconfiguration (or switching) options^{16}. Hence, our framework considers both reconfiguration strategies by simultaneously including the grid-connected and off-grid modes of operation.
Additionally, DNR alone may not be sufficient as a restorative action during catastrophic events, as the network remains vulnerable to voltage collapse and system blackouts^{9,17}. Therefore, load shedding becomes necessary as an emergency control mechanism^{18} to minimize voltage violations in the DN.
Furthermore, power distribution networks are typically unbalanced and radial in nature, with a unidirectional power flow from the substation to the consumers. Besides the nonlinearity in power flow, the optimization of modern-day DN operation has also been made challenging by the integration of DERs^{19}. The DN restoration is an NP-hard, nonlinear combinatorial optimization problem that aims to maximize energy supply while considering network connectivity and operational constraints^{20}. Various methods have been used in the literature to solve the traditional reconfiguration problem, falling into heuristic^{21,22}, metaheuristic^{23,24,25}, and mixed-integer programming^{26,27,28} techniques. In line with the increasing penetration of DERs, researchers have also explored islanding strategies using mixed-integer programming models to expand the zone of DER operation^{2,29}. Load management during outages has also been previously investigated as an emergency control strategy^{16,30}. Despite these efforts, solutions incorporating both the grid-connected and islanding (off-grid) reconfiguration schemes for outage management are limited in the literature, and the problem remains complex and challenging to solve. The multitude of restorative options depends on the number of controllable devices (switches, loads) and the operational modes of DERs. Although the proliferation of remote-controlled elements in the DNs widens the horizons of automated network control, it also increases the complexity of the underlying nonlinear combinatorial optimization problem^{31}. The commonly used mixed-integer nonlinear programming (MINLP) methodologies for restoration problems face issues of scalability, computational tractability, and real-time decision-making capability^{16}.
Apart from these, the existing linear programming approximation models in the literature are not designed to address restoration in three-phase unbalanced DNs with sectionalizing switches, tie switches, and various types of DERs (grid-forming and grid-feeding). Heuristic and metaheuristic techniques, although explored, tend to be computationally expensive and time-consuming. Moreover, traditional methods rely heavily on a comprehensive description of the DN model and network parameters, making them model-dependent. Considering the uncertainty in network conditions during outages, it is desirable to develop a model that can adapt to varying circumstances and be deployed online. Here, we present a model based on reinforcement learning to provide online decision support during outages.
Reinforcement learning (RL) methods have been increasingly adopted in recent years for power system applications that require autonomous control^{32}. This is because RL methods are quite effective in solving high-dimensional, combinatorial, stochastic optimization problems, besides providing fast-acting control. The latter is imperative for rapid responsiveness during outages, which is otherwise not possible with conventional optimization-based decision support. Deep RL is being increasingly employed for voltage control in active DNs in recent literature. In ref. ^{33}, the DER inverters and static VAR compensators were controlled to achieve the desired voltage levels in the network using a combination of graph-based network representation learning, a surrogate model of power flow, and the soft actor-critic algorithm. In ref. ^{34}, the distributed energy storage devices have been treated as agents, and multi-agent deep RL was utilized for voltage regulation with the capability to respond to topology changes as well. In another study^{35}, multi-agent deep RL was applied to perform optimal scheduling of various DERs, energy storage systems, and flexible loads within the network. In this context, the inverters associated with DERs and energy storage can be considered as individual agents. The role of such devices in voltage regulation aligns with the distributed nature of their control mechanism. Conversely, outage management using reconfiguration and load control relies on wide-area measurements at the control center to facilitate switching operations. Particularly with regard to reconfiguration, RL-based models^{36,37} have been developed to perform dynamic DNR during normal operation for loss minimization and voltage improvement. These methods specifically used deep Q-learning with neural networks and trained the off-policy RL network using a historical network operation dataset.
The exploration problem that may arise in these models has been addressed by a NoisyNet Q-learning model^{38} developed to perform DNR for similar objectives.
Another approach^{39} utilized a batch-constrained soft actor-critic algorithm to learn the control policy for loss minimization during normal DN operation. As opposed to the DNR during normal operation considered in these studies, extreme operating conditions are more challenging considering the high-impact, low-probability occurrence of such events. Therefore, historical datasets of network operation, as used in previous studies, may not be available. Although researchers have explored using RL models for DNR^{40,41} to improve network resilience, such works do not consider the feasibility of network operation based on voltage monitoring and DER operational modes during reconfiguration. Additionally, in methods based on the Q-learning approach, the policy network determines the optimal/near-optimal configuration or the spanning forest, rather than individually controlling each switch. This approach would require enumerating all feasible configurations to define a Q-probability matrix, which is impractical due to the exponential increase in state and action space with network size, possible outage scenarios, and the number of devices. Since these methods are not scalable and require significant storage and computational capabilities for exploration, policy gradient methods are more suitable for learning in outage conditions^{39}. We, therefore, employ proximal policy optimization (PPO), a policy gradient method, for learning outage management in the DN. In another work^{42}, a deep Q-learning-based RL approach was employed to dynamically form microgrids in response to outages. However, this method necessitates the compilation of all feasible radial structures before the learning process and does not encompass both forms of reconfiguration. Similarly, ref. ^{43} utilized a Q-learning-based strategy for reconfiguration and load shedding.
Lastly, in addition to load and switch control, deep RL could also be used for optimal dispatch of DERs in islanded mode as demonstrated in ref. ^{44}.
Our approach builds on the intrinsic graph representation of power distribution networks. The DN is viewed as a graph where nodes are the buses (i.e., substation, load, or DERs) and edges are the lines or transformers. The state variables of the DN, including demand/generation estimates and voltage/current measurements, can be considered as data superimposed on a graph. The state variables exhibit complex interdependencies, necessitating the extraction of meaningful representations that accurately capture the structure of the DN connectivity. Moreover, outage management, particularly reconfiguration, involves altering the DN connectivity by switching network lines on or off (binary actions) and hence requires consideration of the underlying combinatorial network structure. Therefore, we present a Graph RL (GRL) approach for simultaneous real-time control of network topology and loads, ensuring sustained network operations during failures that are caused by extreme events. GRL uses a graph neural network (GNN) as a policy model (as is the case here) and/or “value” model, as it allows more effective capturing of the combinatorial nature of network-based state information (involving both binary and continuous variables). This advantage is demonstrated in our case studies through comparison with baseline RL-based solutions that use a standard multi-layered perceptron (MLP) based policy model.
Specifically, we use a Graph Capsule (GCAPS) neural network to learn optimal control policies in power network resilience problems. Compared to other GNNs such as Graph Convolutional Networks (GCN), the capsule-based GNN has been shown by refs. ^{45,46,47} to better capture the structural information of a graph (the DN in this work) as a graph embedding. In GCAPS, each intermediate feature of the state is represented as a vector, whereas in GCN and Graph Attention Networks (GAT), for example, it is represented as a scalar, so GCAPS provides an enhanced state representation. This enhanced state representation helps in computing better actions compared to simpler feature abstraction networks such as the multi-layered perceptron (MLP). Experimental validation of our trained GCAPS-based model on test networks demonstrates the generalizability and real-time control capability with near-optimal performance that is desirable in a self-healing tool for DNs.
Results
Reconfiguration and load shedding as emergency response
During extreme events in the DN, the occurrence of outages due to component failures can be addressed by a combination of control actions, including reconfiguration and load shedding. We assume that a real-time outage detection and protection system, responsible for detecting, locating, and isolating faulty components, operates as a preliminary step to the work discussed in this paper.
Line switches in the DN are typically divided into two categories: switches associated with normally-closed sectionalizing lines and those with normally-open tie lines. During emergency conditions, when component failures disrupt the power supply to the network loads, reconfiguring the DN through control actions on these switches can help maintain network functionality. The objective in such situations is to maximize the energy supplied to the loads (or, equivalently, to minimize the loss of energy), despite the network failure, while ensuring operational stability. The optimal switching control depends on factors such as the network state (voltage, branch flow, etc.), network operational limits, and the location and extent of the outage in the network.
Besides this, the presence of DERs, particularly grid-forming DERs, plays a pivotal role in providing uninterrupted supply to loads following outages. In the off-grid mode, the formation of a self-sustained entity comprising loads and DERs is only possible with the assistance of grid-forming DERs. These grid-forming DERs generate the reference voltage and frequency for the isolated network section, while grid-feeding DERs follow this reference and inject active/reactive power into the grid^{48}. While the detailed modeling of these DERs is beyond the scope of this work, they are represented as voltage sources when operating in the grid-forming mode, and this characterization is incorporated in the DN model within the environment.
Reconfiguration is often used as an umbrella term for any change in normal operating network topology using switching control. On the other hand, intentional islanding has long been recognized as a resilience enhancement technique and is a subset of the reconfiguration problem. In scenarios where the outage is extensive and the availability of tie switches is limited, intentional islanding around gridforming DERs may be adopted to ensure a continuous power supply. Figure 1 illustrates the different switching actions that may be employed based on the extent of the outage. Different outage scenarios are portrayed in Fig. 1 with mitigation strategies representing the possible solutions we considered while designing the environment.
Network topology control through switching actions alone cannot guarantee the operational feasibility of the energized sections in the network. Therefore, to ensure sustainable network operation, emergency load shedding is also considered to maintain network voltage within safe operational limits. Each load is modeled as an equivalent load at the distribution transformer in the primary distribution system and can be disconnected from the network through switching actions.
DN representation as a graph
Outage management in the DN using switching control can be largely viewed as a task of learning the associated network topology, which is our motivation to reformulate the problem in graph-theoretic terms. Consequently, we represent the DN as a graph \(\mathcal{G}=(\mathbf{N},\mathbf{E})\), with a set of nodes N interconnected by a set of edges E. The nodes in the graph represent the buses in the DN, including the substation, load, DER, and zero-power injection buses. The edges represent the distribution lines and in-line transformers. These lines (edges) consist of both switchable (sectionalizing and tie) and non-switchable lines. The node variables comprise both forecasted or estimated variables and measured variables. These variables include the estimated or forecasted values for active power demand (or generation) and reactive power demand (or generation), and the three-phase voltage measured at each bus. The edge variable considered is the measured power flow through the branches. To obtain these measured signals, we utilize a power flow simulator in our synthetic approach.
Network reconfiguration in the graph domain essentially involves determining the status (open or closed) of the switchable edges in the DN. Emergency load shedding at the primary DN level is indicated using a binary variable associated with the nodes representing switchable loads.
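The effect of switch statuses on the graph can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the environment code: the function name `network_components`, the edge tuple layout, and the toy five-bus feeder are hypothetical, and only the standard library is used in place of a graph package. Each switchable edge conducts only when its binary status is 1, and the connected components of the resulting graph are the independent network sections.

```python
from collections import defaultdict


def network_components(nodes, edges, switch_status):
    """Return the connected sections of the network for given switch statuses.

    nodes: iterable of bus names
    edges: list of (bus_u, bus_v, switch_id); switch_id is None for a
           non-switchable line (hypothetical encoding for this sketch)
    switch_status: dict switch_id -> 1 (closed) or 0 (open)
    """
    adjacency = defaultdict(set)
    for u, v, switch_id in edges:
        # Non-switchable lines always conduct; switchable ones only if closed.
        if switch_id is None or switch_status.get(switch_id, 0) == 1:
            adjacency[u].add(v)
            adjacency[v].add(u)

    visited, components = set(), []
    for start in nodes:
        if start in visited:
            continue
        # Depth-first search collects every bus reachable from `start`.
        stack, component = [start], set()
        while stack:
            bus = stack.pop()
            if bus in component:
                continue
            component.add(bus)
            stack.extend(adjacency[bus] - component)
        visited |= component
        components.append(component)
    return components


# Toy feeder: "s1" is a sectionalizing switch, "t1" a tie switch.
NODES = ["sub", "a", "b", "c", "der"]
EDGES = [
    ("sub", "a", None),
    ("a", "b", "s1"),
    ("b", "c", None),
    ("c", "der", None),
    ("a", "c", "t1"),
]
```

With both switches open, the toy feeder splits into two sections; closing the tie switch `t1` reconnects all five buses.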
A Markov decision process over graphs
The emergency response during outages in the DN is formulated as a Markov Decision Process (MDP) in the graph domain, denoted as \({{{{{{{\mathcal{M}}}}}}}}=({{{{{{{\mathcal{S}}}}}}}},{{{{{{{\mathcal{A}}}}}}}},{{{{{{{{\mathcal{P}}}}}}}}}_{tr},{{{{{{{\mathcal{R}}}}}}}})\). The tuple denotes the state, action, transition probability, and reward (in the respective order), which are defined as follows:

(1) State (\(\mathcal{S}\)): the state is composed of relevant observations from the DN that represent the current operating condition of the network. It includes node variables, edge variables, network topology, and other system variables, denoted as \(\mathcal{S}=[P_{d}^{N},Q_{d}^{N},P_{g}^{N},Q_{g}^{N},V^{N},V_{\rm{viol}},l^{E},\mathcal{T},E_{\rm{supp}},\mathcal{O},\mu ]\). Here, \(P_{d}^{N},Q_{d}^{N}\) represent the estimated or forecasted active and reactive power demand at the nodes, while \(P_{g}^{N},Q_{g}^{N}\) correspond to the active and reactive power generation at the nodes. The three-phase voltage measured at the buses (graph nodes) is represented as V^{N}, and V_{viol} indicates the voltage violation in the network. The edge variable includes the power flow through the network branches, denoted as l^{E}. The operating topology of the network is \(\mathcal{T}\), and the total energy supplied in the network is represented by E_{supp}. The variable \(\mathcal{O}\) in the state encapsulates the outage scenario, i.e., the multi-line failures in the network, including switch outages. The inoperability of the outaged switches is addressed by using a masking mechanism that suppresses the corresponding switching action, represented by the state variable μ.

(2) Action (\(\mathcal{A}\)): the control actions for emergency response include switching and load shedding. Therefore, the action space is represented as \(\mathcal{A}=[\delta_{1}^{sw},\delta_{2}^{sw},\ldots,\delta_{N_{S}}^{sw},\delta_{1}^{ld},\delta_{2}^{ld},\ldots,\delta_{N_{L}}^{ld}]\). Here, N_{S} represents the number of switchable lines, which includes both the sectionalizing and tie lines. The number of switchable loads in the network is denoted as N_{L}. Line switching is represented by a binary variable δ^{sw}, where 0 and 1 represent the opening and closing of the switch, respectively. The status of the loads is also represented by a binary variable δ^{ld}, where 1 and 0 correspond to load served and load shed, respectively.

(3) Transition probability (\(\mathcal{P}_{tr}\)): the transition probability captures the dynamic nature of the network with emergency response, denoted as \(\mathcal{P}(s_{t+1}^{\prime} \mid s_{t},a_{t})\). This represents the transition from network state s at time step t to state \(s^{\prime}\) at step t + 1 given that action a is implemented at time step t. The transition probability is learned by the agent from its interactions with the environment.

(4) Reward (\(\mathcal{R}\)): the reward guides the GRL algorithm to take optimal control actions for mitigating outages in the DN, which is formulated as follows:
$$r(s,a)=\left\{\begin{array}{ll}E_{\rm{supp}}-V_{\rm{viol}}, & {\rm{if}}\ C_{\rm{viol}}=0,\\ 0, & {\rm{otherwise}}.\end{array}\right.$$ (1)
The reward reflects the goal of improving resilience in the DN by maximizing the energy supplied E_{supp} while minimizing violations of voltage constraints. To account for the network being ill-conditioned under specific outage conditions and switching actions, a term C_{viol} is introduced into the reward. The DN, subject to topology changes due to outages and switching actions, may consist of multiple independent sections (network components), each housing various active components (transformers, regulators, generators, loads, etc.) with corresponding state variables. In some scenarios, the isolation of these components from a robust slack (substation) renders the network ill-conditioned, making it difficult to achieve nodal power balance within a preset mismatch tolerance. This lack of balance in certain sections of the DN leads to non-convergence of the power flow, identifiable through flags in the solver. Such cases are assigned a zero reward, since the actual impact of switching on the network state is indeterminate when the solver fails to accurately reflect the network behavior. On the other hand, network operation with large voltage violations is infeasible, as it leads to immediate network collapse. To discourage the agent from pursuing actions that lead to such invalid states, the reward is augmented with a penalty term, V_{viol}. The goal here is to maintain the voltage levels within an acceptable range, ensuring that the network operation is sustainable. The voltage violations for each bus i ∈ N beyond its upper limit (\(\overline{V}\)) and lower limit (\(\underline{V}\)) are evaluated after power flow estimation as follows:
$$\Delta V_{\rm{max}}^{i}=\left\{\begin{array}{ll}\sum_{j\in \phi}\left(V_{j}^{i}-\overline{V}\right), & {\rm{if}}\ V_{j}^{i} > \overline{V},\\ 0, & {\rm{otherwise}},\end{array}\right.$$ (2)
$$\Delta V_{\rm{min}}^{i}=\left\{\begin{array}{ll}\sum_{j\in \phi}\left(\underline{V}-V_{j}^{i}\right), & {\rm{if}}\ V_{j}^{i} < \underline{V},\\ 0, & {\rm{otherwise}},\end{array}\right.$$ (3)
where ϕ denotes the set of phase connections for the bus. The voltage measurements and the energy supplied are expressed in per unit (pu), calculated with respect to the base voltage, kV_{base}, and base power, MVA_{base}, of the corresponding network. Per-unit calculations in power systems eliminate the issue of units and are equivalent to normalizing quantities by their base values:
where ∣N∣ is the cardinality of the set of network buses, and ΔV_{max} and ΔV_{min} represent the violations over maximum and minimum desirable voltage limits, respectively.
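The action masking and reward evaluation described in this formulation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the environment code: the function names, the voltage limits of 0.95 and 1.05 pu, the choice to force a masked switch open, and the averaging of the per-bus violations over ∣N∣ to obtain V_{viol} are all assumptions made for this example. The convergence flag stands in for the condition C_{viol} = 0.

```python
def mask_action(action, n_switches, switch_mask):
    """Split a flat binary action into switch and load commands.

    action: list of 0/1 values, switches first, then loads
    switch_mask: list of 0/1 per switch; 0 marks an outaged switch
                 (hypothetical encoding of the mask variable mu)
    """
    if len(switch_mask) != n_switches or len(action) < n_switches:
        raise ValueError("mask and action sizes are inconsistent")
    # Assumption: an outaged switch is forced open whatever the agent requests.
    switches = [a if m == 1 else 0
                for a, m in zip(action[:n_switches], switch_mask)]
    loads = list(action[n_switches:])
    return switches, loads


def voltage_violation(bus_voltages, v_min=0.95, v_max=1.05):
    """Aggregate per-phase voltage excursions outside [v_min, v_max].

    bus_voltages: dict bus -> list of per-phase voltage magnitudes in pu.
    Assumption: the per-bus violations are averaged over the number of buses.
    """
    if not bus_voltages:
        raise ValueError("at least one bus is required")
    total = 0.0
    for phases in bus_voltages.values():
        total += sum(v - v_max for v in phases if v > v_max)  # over-voltage
        total += sum(v_min - v for v in phases if v < v_min)  # under-voltage
    return total / len(bus_voltages)


def reward(energy_supplied, v_viol, power_flow_converged):
    """Energy supplied minus the voltage penalty; zero if the solver failed."""
    return energy_supplied - v_viol if power_flow_converged else 0.0
```

For instance, an action that requests closing a failed switch is corrected by the mask before it reaches the simulator, and a non-converged power flow yields a reward of zero regardless of the energy supplied.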
The outage management tool is applied to power distribution networks where the distribution system operator (DSO) or substation agents are responsible for regulating the power balance and controlling the resources to ensure safe and stable operation. In this study, the test feeders under consideration feature a single substation supplying power to loads while integrating distributed energy resources. Consequently, we adopt a centralized approach for outage management, treating the DSO or substation agent as an autonomous decision-making entity.
The formulation of our approach for outage management is tailored to align with the control architecture found in real-world distribution networks, instead of defaulting to a decentralized approach. Besides this, a multi-agent system (MAS) based approach may prove unsuitable for reconfiguration, which relies on wide-area measurements, especially in networks where observability is limited and local information is constrained. Additionally, the MAS, while computationally efficient, encounters challenges in consistently achieving optimal results^{49}. On the other hand, the developed GCAPS with centralized control can achieve near-optimal results by integrating global (wide-area) and local properties into the learning model. It is crucial to highlight that the primary focus of this study does not revolve around designing an MAS architecture, as seen in other works^{50,51}. Our objective is not to prescribe the control flow within the smart grid, and we operate under the assumption that the existing control architecture, with a DSO (in this case, an autonomous agent), is already established. While acknowledging the evolving nature of control architectures in smart grids, with a potential shift toward distributed control, it is essential to note the current lack of clear standards in this domain.
Environment and learning architecture
The distribution network models are implemented and simulated using the open-source distribution system simulator (OpenDSS)^{52}. DERs are modeled using generic generator and solar photovoltaic (PV) elements in OpenDSS. Switches are defined on lines with associated switching controls, while the disable/enable property of the loads is used for shedding or picking up load. OpenDSSDirect^{53} is employed as the Python-based API to handle circuit modifications, I/O operations, and network topology extraction. The equivalent graph is constructed for the circuit using the NetworkX module. The overall framework of the environment is presented in Fig. 2. The implementation of specific switching actions may lead to the formation of multiple components within the network. These components are then translated into isolated DN sections within the DSS circuit. Furthermore, intentional islands created by grid-forming DERs are considered a potential solution to tackle outages. To enable power flow evaluation in the isolated DSS circuit section, a virtual slack or reference bus is defined at the location of the grid-forming DERs. This requires assigning a voltage source element to the selected buses (i.e., nodes).
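The assignment of a virtual slack bus to each isolated section can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used with OpenDSS: the function name `assign_reference_buses` is hypothetical, the network components are assumed to be available as sets of bus names, and the rule of picking the alphabetically first grid-forming DER when a section contains several is an arbitrary choice made here for determinism.

```python
def assign_reference_buses(components, substation, grid_forming_ders):
    """Choose one reference (slack) bus for every network section.

    components: list of sets of bus names, one set per isolated section
    substation: name of the substation bus
    grid_forming_ders: iterable of buses hosting grid-forming DERs
    Returns a dict mapping the section index to its reference bus, or to
    None when the section has no source and therefore stays de-energized.
    """
    der_buses = set(grid_forming_ders)
    references = {}
    for index, component in enumerate(components):
        if substation in component:
            # The grid-connected section keeps the substation as its slack.
            references[index] = substation
            continue
        # Off-grid section: a grid-forming DER must supply the reference.
        candidates = sorted(component & der_buses)
        references[index] = candidates[0] if candidates else None
    return references
```

In this sketch, a section mapped to `None` has neither a substation connection nor a grid-forming DER, so no voltage source would be assigned to it.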
The learning architecture utilizes a policy gradient-based GRL algorithm, where the policy network is derived from a Graph Neural Network (GNN). Each node i in the DN graph has properties such as active/reactive power demand, generation, and three-phase voltage measurements, denoted as \(\gamma_{i}=[P_{d}^{i},Q_{d}^{i},P_{g}^{i},Q_{g}^{i},V^{i}]\). The policy network takes the state information as input and produces an action. The policy network consists of three main components: (1) A GNN that is used to compute the graph node embeddings for the DN graph. (2) A feedforward network that is used to compute a feature vector, referred to as the context embedding. This vector incorporates information that cannot be naturally represented in the graph structure, such as the energy supplied, voltage violations, and power flow through the edges. (3) An MLP that takes the node embeddings from the GNN and the context embeddings from the feedforward network as input. It computes a final feature vector that encompasses the entire state space information. Figure 2 shows the overall structure of the policy network, which includes the GNN-based feature abstraction.
Initially, the node properties γ_{i}, i ∈ N, are projected to a higher-dimensional space using a linear transformation: \(F_{\rm{init}}^{i}=W_{\rm{init}}\times \gamma_{i}+b_{\rm{init}}\), where \(W_{\rm{init}}\in \mathbb{R}^{|\gamma_{i}|\times h_{0}}\) and b_{init} are learnable weights and biases, respectively. The cardinality of a vector or set is denoted by ∣ ⋅ ∣, and h_{0} represents the projection length. Let F_{init} be a matrix (\(\in \mathbb{R}^{|N|\times h_{0}}\)) that represents all \(F_{\rm{init}}^{i},i\in N\), i.e., \(F_{\rm{init}}=[F_{\rm{init}}^{1},F_{\rm{init}}^{2},\ldots ,F_{\rm{init}}^{|N|}]\).
Node embeddings: Each feature vector \({F}_{{{{{{{{\rm{init}}}}}}}}}^{i},i\in {{{{{{{\bf{N}}}}}}}}\), is then passed through a series of Graph capsule layers. These layers utilize a graph convolutional filter of polynomial form to compute a matrix \({f}_{p}^{(l)}({{{{{{{\mathcal{X}}}}}}}},{{{{{{{\mathcal{L}}}}}}}})\), defined as:
Here, \(\mathcal{L}\) represents the graph Laplacian, p is the order of the statistical moment, K is the degree of the convolutional filter, \(F_{(l-1)}(\mathcal{X},\mathcal{L})\) denotes the output from layer l − 1, and \(F_{(l-1)}(\mathcal{X},\mathcal{L})^{\circ p}\) represents p times element-wise multiplication of \(F_{(l-1)}(\mathcal{X},\mathcal{L})\). Here, \(F_{(l-1)}(\mathcal{X},\mathcal{L})\in \mathbb{R}^{N_{n}\times h_{l-1}p}\) and \(W_{pk}^{(l)}\in \mathbb{R}^{h_{l-1}p\times h_{l}}\). The variable \(f_{p}^{(l)}(\mathcal{X},\mathcal{L})\in \mathbb{R}^{N_{n}\times h_{l}}\) is a matrix, where each row is an intermediate feature vector for each node i ∈ N, infusing nodal information from L_{e} × K-hop neighbors, for a value of p. The output of layer l is obtained by concatenating all \(f_{p}^{(l)}(\mathcal{X},\mathcal{L})\), as given by:
Here, \(\mathcal{P}\) is the highest order of statistical moment, and h_{l} is the node embedding length of layer l. We consider all the values of h_{l}, l ∈ [0, L_{e}], to be the same throughout the paper. Equations (5) and (6) are computed for L_{e} layers, where each layer uses the output from the previous layer (\(F_{l-1}(\mathcal{X},\mathcal{L})\)). Increasing the number of layers (L_{e}) and raising the value of K can enhance the learning of the overall structure of the graph by aggregating nodal neighborhood features from neighbors up to L_{e} × K hops away. However, this improvement comes at the expense of having more learnable parameters in the policy, which becomes a drawback as the problem size increases. A larger value of h_{l} is beneficial as it enables the computation of a more detailed and comprehensive nodal state representation, both at the final stage and in intermediate steps. Similarly, a larger value of \(\mathcal{P}\) assists in a better encoding of intermediate states using a vector representation (described in Eq. (6)) for each intermediate feature. This richer structural embedding is expected to be more effective than the scalar embedding used in Graph Convolutional Networks (GCN). However, it is important to note that both higher h_{l} and higher \(\mathcal{P}\) come with additional training costs. The final node embeddings are computed using a linear transformation of \(F_{l=L_{e}}(\mathcal{X},\mathcal{L})\):
where W_{F} is a learnable weight matrix of size \({h}_{{L}_{e}}{{{{{{{\mathcal{P}}}}}}}}\times {h}_{{L}_{e}}\).
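The layer computation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gcaps_layer`, the `tanh` activation, the unnormalized Laplacian, and the toy 4-node path graph are all hypothetical choices made only for the example. It applies a degree-K polynomial filter in the Laplacian to the p-th element-wise power of the previous layer's output, for each moment order p, and concatenates the results.

```python
import numpy as np

def gcaps_layer(F_prev, lap, weights, activation=np.tanh):
    """One graph-capsule-style layer (illustrative sketch).

    F_prev : (N, d_in) node features from the previous layer.
    lap    : (N, N) graph Laplacian.
    weights: nested list; weights[p-1][k] has shape (d_in, h) for
             moment order p = 1..P and filter power k = 0..K.
    Returns an (N, h * P) matrix: the P moment outputs concatenated.
    """
    n_nodes = F_prev.shape[0]
    outputs = []
    for p, w_p in enumerate(weights, start=1):
        moment = F_prev ** p                    # element-wise p-th power
        lap_k = np.eye(n_nodes)                 # L^0
        acc = np.zeros((n_nodes, w_p[0].shape[1]))
        for w_pk in w_p:                        # k = 0..K
            acc += lap_k @ moment @ w_pk        # L^k F^{(p)} W_pk
            lap_k = lap_k @ lap                 # next power of L
        outputs.append(activation(acc))         # activation is an assumption
    return np.concatenate(outputs, axis=1)

# Toy example: 4-node path graph, h_0 = 3, P = 2 moments, K = 2, h = 5.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
lap = np.diag(adj.sum(axis=1)) - adj
F0 = rng.normal(size=(4, 3))
weights = [[rng.normal(size=(3, 5)) for _ in range(3)] for _ in range(2)]
F1 = gcaps_layer(F0, lap, weights)
print(F1.shape)
```

Because the P moment outputs are concatenated, the input width of the next layer is P times the embedding length, which matches the weight-matrix size stated for W_{F}.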
The final graph embedding is computed by passing the node embeddings matrix F_{Nodes} through a series of Linear layers, followed by taking the mean:
where \(W_{g1} \in \mathbb{R}^{h_{L_{e}} \times |N|}\), \(W_{g2} \in \mathbb{R}^{h_{L_{e}} \times h_{L_{e}}}\), and \(F_{\rm graph} \in \mathbb{R}^{h_{L_{e}}}\). For ease of representation, the bias terms are omitted here.
Context: In addition to the graph-based information, certain state space variables cannot be directly represented as nodes in the graph. These variables include the energy supplied E_{supp}, the voltage violation V_{viol}, and the power flow through the edges l^{E}. The measured impact of a control action on distribution network performance serves as the context that trains the model to adopt control policies that are both operationally feasible and safe. In power networks, voltage violations can lead to severe consequences. The objective during steady-state operation is to uphold the network voltage to prevent undervoltage and the ensuing blackout. Additionally, switching alters the network state and consequently shifts the supplied energy. This impact is also considered as contextual information for the learning model. Similarly, the power flow through the branches, which is representative of the DN state and line status (on, off, or outage), is included in the context. To incorporate this information, a feature vector called the context is constructed:
Final MLP layer: The final state embedding F_{final} (\({{\mathbb{R}}}^{{h}_{{L}_{e}}}\)) is computed by adding F_{graph} and F_{context} and passing it through an MLP layer:
The \({{{{{{{\rm{Logits}}}}}}}}\in {{\mathbb{R}}}^{ {{{{{{{\mathcal{A}}}}}}}} }\) across all available actions are computed by passing F_{final} through a Feedforward layer. The Logits of the switches that need to be masked are set to negative infinity. Using the Logits, a Bernoulli probability distribution is computed for all available actions, with the probabilities computed using a Sigmoid function as e^{Logits}/(1 + e^{Logits}). The final switching action is determined using a greedy policy. If the mean of an action element (switch) is greater than 0.5, the switch position is set as on (or a value of 1).
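The action-selection rule described above (mask, sigmoid, threshold at 0.5) can be sketched as follows. This is a hedged illustration rather than the authors' code; the function name `select_switch_actions` and the example logits are hypothetical.

```python
import math

def select_switch_actions(logits, masked):
    """Greedy MultiBinary action from per-switch logits (illustrative).

    masked[i] is True when switch i must not be operated (e.g. it sits on
    an outage line); its logit is forced to -inf, so its probability is 0.
    """
    actions, probs = [], []
    for logit, is_masked in zip(logits, masked):
        if is_masked:
            logit = -math.inf
        # Numerically stable sigmoid: e^x / (1 + e^x)
        if logit >= 0:
            prob = 1.0 / (1.0 + math.exp(-logit))
        else:
            e = math.exp(logit)          # exp(-inf) evaluates to 0.0
            prob = e / (1.0 + e)
        probs.append(prob)
        actions.append(1 if prob > 0.5 else 0)   # greedy: on if mean > 0.5
    return actions, probs

# Hypothetical logits for four switches; the third one is masked.
actions, probs = select_switch_actions([2.0, -1.0, 3.0, 0.0],
                                       [False, False, True, False])
print(actions)
```

Setting a masked logit to negative infinity drives its Bernoulli mean to exactly zero, so the greedy rule can never close a masked switch regardless of what the network outputs for it.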
The predicted value of the state is computed by passing F_{final} through another feedforward layer, which approximates the value of the state.
For this policy to be implemented on power networks of different sizes, the only change required is in the Feedforward layer used to compute the “context” vector, because the size of this layer depends on the size of the state variables l^{E} and E_{supp}, which varies with the power network size. The structure of the GCAPS encoder and the final MLP layer does not need to change; hence, the GCAPS encoder and the final MLP layer trained for a smaller-sized network could also be used as a warm start to train for a larger-sized network. This is a significant fundamental advantage of the GNN architecture chosen to embody the network reconfiguration policy.
Training process
The training process involves generating samples on the distribution network to simulate different outage scenarios. This is accomplished by introducing line failures, adjusting load and generation operating points, and considering various outage scenarios. The outage events in the network are primarily caused by distribution line failures, which are simulated using a graph-based approach (discussed in the “Methods” section). The power network operating points (i.e., the load demand and power generation) are randomly drawn out of an annual profile made available in OpenDSS. To train the policy network, we employ Proximal Policy Optimization (PPO)^{54}. Here, the PPO training algorithm has been implemented using the stable-baselines3^{55} Python library. On-policy algorithms such as PPO are usually preferred over off-policy algorithms for environments with a discrete action space. An additional advantage of using the PPO implementation is its ability to support all the available data types in stable-baselines3^{55}. This is particularly useful to address the action space in our problem, which is represented in terms of the MultiBinary data type. The training process involves collecting experience in the form of tuples containing the state, action, reward, and the next state. PPO operates based on rollout operations, where each operation consists of a fixed number of steps, denoted as N_{steps}. The weight updates occur after completing a rollout operation, in batches of size N_{batch} (≤ N_{steps}). The weight update is performed via backpropagation, aiming to minimize a cost function comprising the policy gradient loss and the state value approximation loss. The policy network was trained for a total of N_{total} steps. To evaluate the performance of the proposed model, as well as to assess the impact of local and global structural information in the encoding process, we conducted comparative experiments with another learning-based framework called MLP. 
This framework utilizes the PPO algorithm, with a policy network based on a simple Multi-Layer Perceptron (MLP) architecture. To ensure a fair and unbiased comparison, MLP was trained using the same settings as GCAPS. Both the MLP-based policy and the GCAPS-based policy are trained on an Intel Xeon Gold 6330 CPU (28 cores) with 512 GB RAM and an NVIDIA A100 GPU. Note that this is expected to be a one-time or few-run offline investment for any given or existing network. Moreover, such (or even better) computing resources are readily available nowadays, making the training process a reasonable offline investment for training a real-time decision-support system (the policy models) for outage management. This solution strategy is particularly attractive considering that the real-time models are much faster than current baselines, as seen from the comparisons with baselines in the “Results” section.
Figure 3 shows the training history in terms of the average episodic reward after each rollout, while training GCAPS and MLP for the 13-, 34-, and 123-bus systems. The average episodic reward is computed as the average of the episodic rewards for all the episodes in each rollout operation. The training history curves in Fig. 3 show that GCAPS consistently achieves a higher reward than MLP for the 13-bus, 34-bus, and 123-bus systems. For the 13-bus network, the average episodic reward for MLP converges to a value slightly lower than its peak, while for GCAPS, the average episodic rewards are much higher than those of MLP but did not fully converge within 2 million steps. For the 34-bus and 123-bus networks, GCAPS converges faster than MLP. This observation demonstrates the superior performance of GCAPS in effectively managing outages and optimizing the distribution network’s operational state. The codes for training can be found in ref. ^{56}.
Case study on 13-bus network
The proposed model for outage management is validated using a modified version of the IEEE 13-bus distribution test network. This network incorporates switches and DERs and serves as the basis for validating the effectiveness of the proposed model, as shown in Fig. 4a. The quantity, positions, and specifications of the switches within the 13-bus test network are based on established studies that have previously validated the technical viability of these components within the circuit. Specifically, for the 13-bus network, we refer to the details presented in refs. ^{57,58} to define the sectionalizing and tie switches. Our model assumes that switches are pre-installed in the network with their data available for our decision-making tool. However, optimizing switch locations and quantities falls within a planning study and requires a techno-economic analysis, which is beyond the scope of this paper. Our focus is on evaluating the model for enhancing operational resilience in power networks. Two grid-forming DERs of 1000 kW are considered at buses 634 and 680, while the buses 645, 675, and 684 are equipped with grid-feeding DERs rated at 40 kW, 500 kW, and 100 kW, respectively. The total connected load of the network is 3.5 MW. In the normal configuration of the network, the sectionalizing switches are closed, while the tie switches remain open. This initial setup establishes the baseline operational state for the network. To systematically evaluate the developed model and its performance, two traditional optimization techniques, namely mixed-integer second-order conic programming (MISOCP) and binary particle swarm optimization (BPSO), are employed for all case studies in addition to the previously discussed MLP model. In the testing phase of the models, we select the number and location of the line outages deliberately, as opposed to the graph-based approach used during training. 
Additionally, the load and generating points are not drawn out of the representative annual profile discussed in training; rather, a randomly generated multiplying factor is used to set the network operating point.
Scenario 1 in the 13-bus network involves the failure of a single line of importance, determined by its high edge betweenness in the normal configuration. Specifically, this scenario represents the outage of the line connecting buses 670–671. The status of the decision variables, which include both the switches and dispatchable loads, obtained from the different models for scenario 1 is depicted in Fig. 5a. Notably, both the traditional optimization models, namely the MISOCP and the BPSO, yield the same solution for scenario 1. An important observation from analyzing the statuses of the switches and loads is that the reinforcement learning models demonstrate generalizability by providing distinct solutions for the two different scenarios. It is worth mentioning that the MLP model generates different solutions for the same test case, while the GCAPS model solution is reproducible for a specific test case. The voltage plot of the 13-bus network, after implementing the GCAPS solution for managing outage scenario 1, is illustrated in Fig. 6a. The GCAPS solution reroutes the power from the substation to the affected downstream section through an alternate path. Due to this switching action in scenario 1, the resulting network configuration maintains a robust connection to the substation, ensuring that the voltages at all active phases of connected buses are between 0.99 and 1.10 pu, thus operating well within the desirable bounds.
Scenario 2 involves the outage of two switchable lines connecting 632–670 and 646–684. This scenario aims to test the capability of the proposed model to enforce the inoperability of the outage switch in decision support. The status of decision variables, including the switches and dispatchable loads, obtained from the different models for scenario 2, is shown in Fig. 5b. Once again, the MISOCP and the BPSO solutions for scenario 2 are identical. Upon inspecting the decision variables, it is noticeable that the MLP-based RL model violates the non-switchable condition of the outage line 646–684 (sw3) for scenario 2, as it mistakenly closes the switch. The voltage plot of the 13-bus network, after implementing the GCAPS solution for managing outage scenario 2, is shown in Fig. 6b. In scenario 2, the GCAPS outage mitigation solution ensures a functional network with bus voltages ranging from 0.99 pu to 1.10 pu. This solution also does not isolate any components of the network from the substation, thereby resulting in a more strongly connected network. Additionally, the diversity in solutions across different outage scenarios is indicative of the generalizing capability of the model.
Case study on 34-bus network
The validation of the proposed model and baselines is conducted on a modified 34-bus distribution test network, which incorporates switches and DERs. The details regarding the switches in the 34-bus network are adopted from ref. ^{59}, albeit presented in a different ordering of sectionalizing and tie switches here. The total connected load of the network is 2.04 MW. Three grid-forming DERs with capacities of 146 kW, 144 kW, and 200 kW are connected at buses 890, 844, and 816, respectively, while a grid-feeding DER with a capacity of 96 kW is connected at bus 820, as shown in Fig. 4b. Under normal operating conditions, the five sectionalizing switches are closed, while the four tie switches are open.
Scenario 1 involves multiple line outages at the connections between buses 858–834, 888–890, 814–828, and 828–830. The lines connecting the buses 814–828 and 828–830 are switchable lines (switches 9 and 4, respectively), while the line 858–834 has a high edge betweenness centrality measure in the downstream section of the feeder. Figure 5c presents the status of the decision variables, including switchable lines and loads, obtained from the different models for scenario 1. Both the MISOCP and BPSO yield similar results for scenario 1 on the 34-bus network. The results demonstrate the ability of RL models to differentiate between various scenarios and generalize during decision-making. However, the MLP-based RL model produces an invalid control action in scenario 1 by closing switch 9 on the outage line. The switching action from the GCAPS forms two network components. One is connected to the substation, and hence the voltage measurements at these buses are within the desirable limits, as seen in Fig. 6c. The other network section is formed around the DER at bus 890. However, this DER is not a grid-forming DER and therefore, the loads at these buses remain unsupplied. This is observed by the inactive or zero voltage for certain buses in the voltage profile plot (Fig. 6c). As shown in the figure, the voltage at bus 890 violates the safe operational limits. However, this is because of the grid-feeding DER at bus 890. Grid-feeding DERs are generally equipped with island detection modules that turn off the DER when isolated. The voltages at all the other active buses are found to be within the limits of 0.95–1.10 pu.
Scenario 2 considers multiple line failures at 832–858, 834–860, and 854–852 in the network. The lines 832–858 and 854–852 are in close proximity, while the line 834–860 is a switchable sectionalizing line (switch 2). Figure 5d presents the status of the decision variables, including the switchable lines and loads, obtained from the different models for scenario 2. In scenario 2, the switching action by the GCAPS model results in a configuration that remains connected to the substation, with a small section disconnected (inactive) from the main network. The voltage plot for the 34-bus network, derived by implementing the GCAPS solution for scenario 2, is presented in Fig. 6d. The buses disconnected from the network by the switching action are characterized as inactive (or zero voltage from OpenDSS), as seen in Fig. 6d. It is observed that the GCAPS solution for scenario 2 ensures voltages at all active phases of connected buses are well within the range of 0.90–1.10 pu.
Case study on 123-bus network
To assess the scalability of the proposed learning-over-graphs model, we applied the developed outage management tool to a modified IEEE 123-bus test network. This network has been modified by the inclusion of 13 sectionalizing and 9 tie switches, as shown in Fig. 4c. The specifications of the switches are obtained from ref. ^{58}, albeit with a different arrangement in our implementation. The DERs with a capacity of 250 kW are connected at buses 39, 46, 71, 75, 79, 96, and 108, while grid-feeding DERs sized at 80 kW are introduced at buses 11, 33, 56, 82, 91, and 104, as detailed in ref. ^{60}. During normal operating conditions, the sectionalizing switches are in the closed position and the tie switches are open. Two outage scenarios have been considered to test the GCAPS model, taking into account the network centrality metrics and associated vulnerabilities.
In scenario 1, outages have been considered on lines connecting buses 13–18, 51–151, and 65–66. Notably, the edge 13–18 exhibits the highest current-flow betweenness centrality, while nodes 51 and 151 have high current-flow closeness centrality. Additionally, the edge 65–66 is located at the end of a lateral feeder section. Figure 7a presents the status of the decision variables, including switching lines and loads, acquired from the different methods for scenario 1. The MISOCP yields the optimal result. The BPSO here, however, does not produce the same result as MISOCP (unlike in the other case studies) and seems to be stuck at a local optimum (clarified in Fig. 8a). There are no invalid switching actions in this scenario. The GCAPS switching action, when implemented on the network suffering from an outage, results in improved performance with the voltage profile shown in Fig. 6e. The phases disconnected by switching and inactive phases are indicated as 0 when evaluating the network circuit in OpenDSS. Hence, only the voltages measured at the active phases of all the buses are plotted in Fig. 6e. It is observed that the bus voltages are well within the desirable limits following outage management by GCAPS.
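The active-phase voltage check described in the case studies can be expressed as a short helper. This is a sketch under stated assumptions: the function name `voltage_violations`, the dictionary layout, and the example readings are hypothetical; the default bounds of 0.95 and 1.10 pu are taken from the limits quoted for the 34-bus scenario 1 and are exposed as parameters because other ranges are quoted elsewhere.

```python
def voltage_violations(bus_voltages_pu, v_min=0.95, v_max=1.10):
    """Return {bus: [violating voltages]} over active phases only.

    A per-unit magnitude of 0 marks a disconnected or inactive phase
    and is skipped rather than counted as an undervoltage.
    """
    violations = {}
    for bus, phases in bus_voltages_pu.items():
        bad = [v for v in phases if v != 0.0 and not (v_min <= v <= v_max)]
        if bad:
            violations[bus] = bad
    return violations

# Hypothetical per-phase readings (pu); not taken from the paper's figures.
readings = {
    "bus_a": [1.02, 0.0, 0.98],   # phase b inactive
    "bus_b": [0.93, 1.00, 0.0],   # phase a below the lower bound
}
print(voltage_violations(readings))
```

Skipping zero readings is what separates an intentionally de-energized section from a genuine undervoltage on an energized phase.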
In scenario 2, multiple outages at lines connecting buses 151–300, 57–60, 67–72, and 67–97 are considered, and among these, the first three lines are associated with switches (sw15, sw5, and sw6, respectively). The last line connects end nodes with high betweenness centrality. Figure 7b illustrates the status of the decision variables, encompassing switchable lines and loads, output by the different models for scenario 2. The MLP model is found to operate outage switches, thus producing invalid actions. The results for the two outage scenarios in the 123-bus network exhibit the ability of the proposed GRL model to differentiate between scenarios and generalize during decision-making. The GCAPS solution on the 123-bus network with outages results in improved network performance, and the corresponding voltage plot is displayed in Fig. 6f. As seen in the figure, for this specific case, the voltages at the buses (for active phases) are within desirable bounds using the GCAPS switching control.
Comparison of the proposed model with baselines
We compare the developed GCAPS-based GRL model with the baseline models to evaluate the performance and the estimated energy served during outage conditions. Figure 8a, b presents the estimated equivalent energy served when implementing the control decisions in the distribution test networks for scenarios 1 and 2 using the different models, respectively. In the 13-bus network, as expected, the energy supplied is optimal for the MISOCP and BPSO models. Our GCAPS model shows near-optimal decision-making capability for both scenarios. In scenario 1, the MLP model is inferior as it provides the minimum energy supply among all the models, while it becomes invalid in scenario 2 due to the operation of the outage switch. In the case of the 34-bus network, our GCAPS model exhibits near-optimal performance, closely approaching the optimal energy supply estimated by the MISOCP and BPSO models. On the other hand, the MLP model performs worse than the other models and also produces an invalid control action for scenario 1. As observed in the figure, for the 123-bus network the MISOCP generates the optimal results, while the BPSO is near-optimal in scenario 1 and optimal in scenario 2. The GCAPS solution closely approaches the optimal solution produced by the exact method. Conversely, the MLP model performs inadequately and results in invalid control actions in scenario 2.
The performance of our GCAPS model is compared with the baselines by testing different scenarios in the 13-, 34-, and 123-bus networks. The computation time required to obtain the outage mitigation solution is presented in Table 1. The table reports the mean of 5 test runs for the two scenarios using the models across different networks. It can be observed that the response time for the two RL-based models, namely GCAPS and MLP, is in the order of milliseconds, and is mostly agnostic to the increase in the size of the network from the 13-bus to the 34-bus system, demonstrating real-time performance. In comparison, the optimization-based methods, BPSO and MISOCP, have a delay in computing those decisions. Specifically, BPSO and MISOCP are respectively about 5 and 2 orders of magnitude more expensive than the learned RL-based policies. Although the computational complexity of the proposed model is contingent on the number of switches, the study in ref. ^{61} found that the optimal number of remote-controlled line switches is 8 to 9 for a 37-node network and 15 to 22 for a 137-node network. Our research aligns with these findings, as we have considered this when defining switches in the networks (nine sectionalizing and tie switches for the 34-bus network and twenty-two switches for the 123-bus network). This approach closely reflects real-world conditions and constraints, as switches are typically not deployed along all lines within the distribution network.
In Fig. 8c–f, we illustrate the performance of the DN and its evolution with time when implementing the decisions provided by the different models during outages. Specifically, the proposed GCAPS-based GRL model is compared with the MLP-based RL model, which does not consider the underlying topology, and the MISOCP method (conventionally used for solving such problems). The BPSO, despite producing results similar to the MISOCP, is not suitable for resilience decision support, as is evident from the delayed response shown in Table 1. Outage scenario 1 in the 13-bus network and outage scenario 2 in the 34-bus network are used to exemplify the impact of the model response on DN performance. The excluded scenarios in the two networks are not suitable for comparison owing to the invalid switching decisions provided by the MLP model. As observed in Fig. 8c, e, the buses 652 and 890 in the 13- and 34-bus networks, respectively, experience undervoltage due to the disruption. The voltage violation persists for tens of cycles in the 13- and 34-bus DNs when MISOCP is used for decision support, whereas the RL models mitigate the voltage violation through outage management almost instantaneously. The continued operation of the network in the disrupted state also increases the risk of cascaded failures and widespread blackouts. Meanwhile, the loss of energy due to delayed decision-making by the MISOCP with respect to the GCAPS is 607.45 kWs and 596.52 kWs for the 13- and 34-bus networks, respectively. In Fig. 8g, the performance of various models on a logarithmic scale of time across different test networks is illustrated. Test runs of the models for different networks are performed to collect the computation time. A sample size of 5 is employed here as the computation times for BPSO models are prohibitively large.
Discussion
We have presented a real-time outage management model for distribution networks based on a reinforcement learning over graphs framework. In our outage management model, we have considered the grid-forming and grid-feeding modes of the DER, and hence both grid-connected and islanding reconfiguration schemes have been incorporated into the solution. The load shedding adopted in the mitigation strategy ensures that the network has operational feasibility and is not vulnerable to voltage collapse. The learning model employs an on-policy RL algorithm and adopts the Graph Capsule (GCAPS) neural networks for integrating information about the DN topology into the learning framework. By leveraging GCAPS neural networks, the model has been shown to effectively integrate nodal properties, and local and global structural information into the learning process.
We have evaluated our model on modified versions of the IEEE 13-bus, 34-bus, and 123-bus distribution test networks, which include distributed energy resources (DERs) and sectionalizing/tie switches. Two traditional models based on MISOCP and BPSO, and the RL with MLP as policy network, have been used as baselines to compare the real-time decision-making and network resilience improvement capability, where the energy served under disruption (see Fig. 8) can be perceived as a measure of resilience. The results have demonstrated that the proposed model achieves near-optimal performance in real-time outage management for different networks and outage scenarios. Additionally, the model has been found to effectively capture the DN topology in decision-making, as indicated by the improved performance and constraint adherence when compared with the MLP-based approach. Above all, our model has also provided time-sensitive decision support for outage mitigation, thereby making it a suitable self-healing tool in the current smart-grid landscape.
As demonstrated in this paper, the rapid decision-making capability, in contrast to traditional methods, is a key strength of our model. Unlike conventional approaches, our model maintains real-time response times as the network size increases, making it well-suited for online deployment on large distribution networks. However, it is important to note that dealing with larger networks presents challenges during the training phase, demanding advanced computational resources to adequately train the learning-over-graphs model. This limitation is encountered during the offline phase and can be resolved by allocating adequate resources for training, considering the benefit of operational resilience. From the results in our prior studies on applying related graph-based GRL for Multi-Robot Task Allocation^{46,47}, we have found that the computational memory requirement for training on larger graphs (more than 200 nodes) is very high and often hinders the training task. Our prior results^{46,47} have demonstrated the capability of the GNN-based policy network to learn policies that can be applied to larger-sized, mostly homogeneous networks with simple near-linear state transitions (without training), while still demonstrating comparable performance with respect to more traditional approaches. More work is required to explore if these advantages will also translate to applications such as the DN topology reconfiguration that involves heterogeneous networks and nonlinear flow properties that affect the state transition. It is also crucial to model and evaluate the impact of communication breakdowns on resolving power network outages, since those can be an associated artifact attributed to the natural or anthropogenic hazard that caused the power grid breakdown. This, however, necessitates intricate coupled cyber-physical modeling of the communication network, and formulation of communication recovery as in ref. ^{62}. 
Addressing the modeling and control of coupled communication and power networks as a unified effort poses significant challenges. A potential extension of our work involves modeling the interconnected power and communication networks as multilayered graphs and evaluating the impact of communication failure on power network recovery.
Methods
Graph-based scenario generation
The training scenarios used for the GRL model were generated from the graph equivalent of the DN. The failure of the components, such as lines, can be approximated by disconnecting them from the DN^{63}. The model developed is not specific to any particular type of extreme weather event, and hence a generalized and intuitive approach is adopted for simulating outages during training. The outages in the DN often originate from localized failures that can lead to cascading effects. To emulate this behavior, a subgraph method for randomized edge removal is employed, similar to the approach described in ref. ^{64}. This method involves randomly selecting nodes N_{s} ∈ N from the graph representation of the DN, and creating subgraphs centered around these nodes with varying radii R_{s} ≤ R_{max} (maximum radius). We consider \(R_{\rm max} = \frac{G_{\rm dia}}{2}\), where G_{dia} is the diameter of the graph. Within each selected subgraph, a fraction of the edges F_{s} ∈ E is randomly removed to simulate the localized impact of contingencies. The fraction of edge failures is gradually increased from 0 to 50%. By varying N_{s}, R_{s}, and F_{s}, scenarios with multi-line failures can be generated for training the model. Furthermore, within each scenario, load multipliers and generating points are varied by randomly selecting multipliers from an annual profile available in the OpenDSS package with an hourly resolution.
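The subgraph-based edge removal described above can be sketched with the standard library alone. This is an illustrative approximation, not the authors' generator: the function name `sample_outage`, the adjacency-dictionary representation, the rounding-down of the failure count, and the toy 6-node feeder are all assumptions made for the example.

```python
import random
from collections import deque

def sample_outage(adjacency, radius, fraction, rng):
    """Sample one localized multi-line outage (illustrative sketch).

    Picks a random centre node, collects the subgraph within `radius`
    hops of it, and fails int(fraction * #edges) of that subgraph's
    edges chosen uniformly at random.
    """
    centre = rng.choice(sorted(adjacency))
    # Breadth-first search limited to `radius` hops from the centre.
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Undirected edges with both endpoints inside the subgraph.
    edges = sorted({tuple(sorted((u, v)))
                    for u in dist for v in adjacency[u] if v in dist})
    n_fail = int(fraction * len(edges))
    return centre, rng.sample(edges, n_fail)

# Hypothetical 6-node radial feeder (not one of the IEEE test networks).
feeder = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
centre, failed = sample_outage(feeder, radius=2, fraction=0.5,
                               rng=random.Random(0))
print(centre, failed)
```

Sweeping `radius` up to half the graph diameter and `fraction` from 0 to 0.5 would mirror the ranges stated in the text; passing an explicit `random.Random` instance keeps each sampled scenario reproducible.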
Mixed-integer programming formulation
Outage management in an unbalanced distribution network is an optimization problem that is both combinatorial and nonlinear in nature. The problem can be effectively formulated as an optimal power flow problem, leveraging branch flow equations with angle and conic relaxations as in ref. ^{19}. The decision variables include switching and load shedding, while the control variables corresponding to power flow are also considered in the problem formulation.
For the distribution network with \({\mathbb{L}}\) set of loads and \(\widetilde{{\mathbb{L}}}\) set of switchable loads, the active/reactive power consumption with load pickup or shedding is modeled using δ^{L} as follows:
where \({P}_{i}^{D}\) and \({Q}_{i}^{D}\) represent the active and reactive power demand of the load i, respectively.
On the other hand, considering the set of grid-feeding generators \({\mathbb{G}}_{fd}\) in the DN, the active and reactive power generation is estimated using:
where \(P_{\rm avail}^{G}\) and \(Q_{\rm avail}^{G}\) are the total active and reactive power available from the generator with ∣G_{ph}∣ phase connections, and θ^{*} is the set of active phases of the generator, considering θ = (a, b, c).
The total active and reactive power consumption by loads is constrained by the total generation in the DN as follows:
The total power generation in the DN is given as follows:
where, in addition to the grid-feeding generators, the set of grid-forming generators \({\mathbb{G}}_{s}\), including the substation, is considered.
The power supplied by the grid-forming generators and the substation is constrained to be within its maximum capacity as follows:
Adopting three-phase branch flow formulations with relaxations as in ref. ^{19}, \(\mathcal{V}\) and \(\mathcal{I}\) are used to denote the square of voltage and current, respectively. The voltages at all buses except the slack buses are constrained within upper and lower limits as follows:
Here, \({\mathbb{B}}\) and \({{\mathbb{B}}}_{s}\) denote the set of buses and the set of slack buses in the network, respectively. The voltage square at the substation (or slack) bus on the other hand is equated to 1.04 per unit.
For the set of power delivery elements \({\mathbb{E}}\), the set of switchable elements (lines) \({{\mathbb{E}}}_{sw}\), and the line switch status δ^{sw}, the power flow P^{E} through the elements are constrained as follows:
In a similar manner, the reactive power flow Q^{E} and the square of branch current square \({{{{{{{{\mathcal{I}}}}}}}}}^{E}\) through the elements are also constrained within its limits. The power flow through the outage lines defined in set \({\mathbb{O}}\) is, however, equated to zero as shown below:
The reactive power flow and the square of branch current through outage lines are also equated to zero. The balance of active and reactive power flow through the elements is formulated as follows:
where \({{\mathbb{R}}}_{L}(b)\) is the set of loads and \({{\mathbb{R}}}_{G}(b)\) is the set of generators connected to the receiving bus of element b. In Eqs. (19) and (20), \({\mathbb{C}}(b)\) represents the child elements of b.
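The roles of \({\mathbb{C}}(b)\), \({{\mathbb{R}}}_{L}(b)\), and \({{\mathbb{R}}}_{G}(b)\) can be illustrated on a small radial feeder. The sketch below computes the active power entering each element as the sum of the flows into its child elements plus the load minus the generation at its receiving bus. It deliberately neglects line losses, which the full formulation may include, and all names and numbers are hypothetical.

```python
def element_flow(b, children, loads_at, gens_at, p_load, p_gen):
    """Active power (kW) entering element b in a lossless radial network.

    children: element -> list of child elements, i.e. C(b).
    loads_at: element -> loads at its receiving bus, i.e. R_L(b).
    gens_at:  element -> generators at its receiving bus, i.e. R_G(b).
    Assumption: line losses are ignored.
    """
    downstream = sum(
        element_flow(c, children, loads_at, gens_at, p_load, p_gen)
        for c in children.get(b, ())
    )
    demand = sum(p_load[i] for i in loads_at.get(b, ()))
    supply = sum(p_gen[g] for g in gens_at.get(b, ()))
    return downstream + demand - supply


# Hypothetical feeder: element e1 feeds elements e2 and e3.
children = {"e1": ["e2", "e3"]}
loads_at = {"e1": ["L0"], "e2": ["L1"], "e3": ["L2"]}
gens_at = {"e3": ["G1"]}
p_load = {"L0": 2.0, "L1": 10.0, "L2": 5.0}
p_gen = {"G1": 3.0}

flows = {
    b: element_flow(b, children, loads_at, gens_at, p_load, p_gen)
    for b in ("e1", "e2", "e3")
}
print(flows)
```

The same recursion applies to reactive power by swapping in the reactive demand and generation.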
Additionally, Kirchhoff’s voltage equation is modeled as:
Here, the parameters \(\hat{R}\) and \(\hat{X}\) denote the element’s modified resistance and reactance, respectively. In Eq. (21), \({\mathbb{S}}(b)\) and \({\mathbb{R}}(b)\) denote the sending and receiving bus of the element b, respectively. For elements (lines) with a switch, Eq. (21) is modified to an inequality constraint using the big-M method^{19} and bounded between \(-(1-{\delta }_{l}^{sw})M\) and \((1-{\delta }_{l}^{sw})M\).
The second-order conic inequality constraint using convex relaxation is formulated as follows:
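The effect of the big-M bounds can be seen in a short Python sketch. Here the residual is taken to be the difference between the two sides of the voltage equation for a switchable line; a closed switch (δ^{sw} = 1) forces the residual to zero, while an open switch (δ^{sw} = 0) relaxes it to the interval [−M, M]. The function names and the value of M are hypothetical.

```python
def kvl_residual_bounds(delta_sw, big_m):
    """Return (lower, upper) bounds on the KVL residual of a switchable line."""
    slack = (1 - delta_sw) * big_m
    return -slack, slack


def kvl_holds(residual, delta_sw, big_m, tol=1e-9):
    """Check whether a residual satisfies the big-M relaxed constraint."""
    lower, upper = kvl_residual_bounds(delta_sw, big_m)
    return lower - tol <= residual <= upper + tol


M = 10.0  # hypothetical constant, chosen larger than any feasible residual
closed_ok = kvl_holds(0.05, delta_sw=1, big_m=M)  # closed line must satisfy KVL
open_ok = kvl_holds(0.05, delta_sw=0, big_m=M)    # open line decouples its end buses
print(closed_ok, open_ok)
```

In practice M should be as small as the voltage limits allow, since overly large values weaken the relaxation that a mixed-integer solver works with.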
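For reference, in the single-phase branch flow model of ref. ^{19}, the relaxation replaces the equality \(P^{2}+Q^{2}={\mathcal{V}}{\mathcal{I}}\) with the inequality \(P^{2}+Q^{2}\le {\mathcal{V}}{\mathcal{I}}\), where \({\mathcal{V}}\) is the squared voltage at the sending bus. This is equivalent to the second-order cone \(\Vert (2P,2Q,{\mathcal{V}}-{\mathcal{I}})\Vert_{2}\le {\mathcal{V}}+{\mathcal{I}}\). The Python sketch below checks both forms numerically. It is a single-phase illustration with hypothetical function names, and the three-phase constraint used in this work may differ in detail.

```python
import math


def rotated_form_ok(p, q, v_sq, i_sq, tol=1e-9):
    """Relaxed branch flow inequality: P^2 + Q^2 <= V * I."""
    return p * p + q * q <= v_sq * i_sq + tol


def cone_form_ok(p, q, v_sq, i_sq, tol=1e-9):
    """Equivalent standard cone: ||(2P, 2Q, V - I)||_2 <= V + I."""
    lhs = math.sqrt((2 * p) ** 2 + (2 * q) ** 2 + (v_sq - i_sq) ** 2)
    return lhs <= v_sq + i_sq + tol


# Hypothetical per-unit values: P = 0.3, Q = 0.4, V = 1.0.
for i_sq in (0.2, 0.25, 0.5):
    print(i_sq, rotated_form_ok(0.3, 0.4, 1.0, i_sq), cone_form_ok(0.3, 0.4, 1.0, i_sq))
```

The two checks agree for every non-negative \({\mathcal{V}}\) and \({\mathcal{I}}\), and the standard cone form is the one that conic solvers accept directly.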
The objective function maximizes the total power supply in the network with control actions during outages and is formulated as follows:
Training details
The training process is allocated a maximum of 36 h, and the total number of steps is set to 2 million. For the 13-bus network, both GCAPS and MLP successfully completed the training with 2 million steps. However, for the 34-bus network, MLP could only be trained for 1.5 million steps within the 36-h time frame, while the network for the 123-bus system could only be trained for 500,000 steps. To ensure a fair comparison, we utilize the trained weights of GCAPS and MLP at 1.5 million steps for the 34-bus network, and at 500,000 steps for the 123-bus network. Here we implement a squared exponential decreasing learning rate strategy \({\rho }_{t}={\rho }_{{\rm{init}}}\times {e}^{{(1-t)}^{2}\times {D}_{R}}\), where t represents the fraction of the current step to the total number of steps for learning, ρ_{t} is the learning rate at t, ρ_{init} is the initial learning rate, and D_{R} is the decay rate. We used ρ_{init} = 1e−5 and D_{R} = 3. This strategy leads to smoother convergence and likely mitigates getting stuck in local minima. Table 2 shows the training details, including the hyperparameter setting for PPO.
Simulation setup
The proposed model and all the other baselines are tested on a system with an Intel Core i7-1365U 1.80 GHz processor and 16 GB of memory. The OpenDSSDirect API, along with Python version 3.9.12 and NetworkX version 2.6.3, is used in our simulations. The mixed-integer programming is performed with gurobipy using Gurobi Optimizer version 9.5.2.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The figure/table data generated in this study are provided in the Source Data file. Source data are provided with this paper.
Code availability
Code for this article is available publicly from: https://zenodo.org/records/11188543.
References
Campbell, R. J. & Lowry, S. Weather-related Power Outages and Electric System Resiliency (Congressional Research Service, Library of Congress Washington, DC, 2012).
Kirthiga, M. V., Daniel, S. A. & Gurunathan, S. A methodology for transforming an existing distribution network into a sustainable autonomous microgrid. IEEE Trans. Sustain. Energy 4, 31–41 (2012).
Bouhouras, A. S., Andreou, G. T., Labridis, D. P. & Bakirtzis, A. G. Selective automation upgrade in distribution networks towards a smarter grid. IEEE Trans. Smart Grid 1, 278–285 (2010).
U.S. Department of Energy. 2020 Smart Grid System Report (U.S. Department of Energy, 2022).
Arefifar, S. A., Alam, M. S. & Hamadi, A. A review on self-healing in modern power distribution systems. J. Mod. Power Syst. Clean Energy 11, 1719–1733 (2023).
Distribution intelligence. https://www.smartgrid.gov/the_smart_grid/distribution_intelligence.html.
Fan, Z., Mao, Y. & Horger, T. What smart grid means to an ISO/RTO? In IEEE PES T&D 2010, 1–8 (IEEE, 2010).
Wang, Y. et al. Coordinating multiple sources for service restoration to enhance resilience of distribution systems. IEEE Trans. Smart Grid 10, 5781–5793 (2019).
Fan, D. et al. Restoration of smart grids: current status, challenges, and opportunities. Renew. Sustain. Energy Rev. 143, 110909 (2021).
Baran, M. E. & Wu, F. F. Network reconfiguration in distribution systems for loss reduction and load balancing. IEEE Power Eng. Rev. 9, 101–102 (1989).
Jacob, R. A. & Zhang, J. Distribution network reconfiguration to increase photovoltaic hosting capacity. In 2020 IEEE Power & Energy Society General Meeting (PESGM), 1–5 (IEEE, 2020).
Jacob, R. A. & Zhang, J. Outage management in active distribution network with distributed energy resources. In 2020 52nd North American Power Symposium (NAPS), 1–6 (IEEE, 2021).
Al Owaifeer, M. & Al-Muhaini, M. MILP-based technique for smart self-healing grids. IET Gener. Transm. Distrib. 12, 2307–2316 (2018).
Botea, A., Rintanen, J. & Banerjee, D. Optimal reconfiguration for supply restoration with informed A* search. IEEE Trans. Smart Grid 3, 583–593 (2012).
Xu, Y., Liu, C.-C., Schneider, K. P., Tuffner, F. K. & Ton, D. T. Microgrids for service restoration to critical load in a resilient distribution system. IEEE Trans. Smart Grid 9, 426–437 (2016).
Poudel, S., Dubey, A. & Schneider, K. P. A generalized framework for service restoration in a resilient power distribution system. IEEE Syst. J. 16, 252–263 (2020).
Bakar, N. N. A., Hassan, M. Y., Sulaima, M. F., Na’im Mohd Nasir, M. & Khamis, A. Microgrid and load shedding scheme during islanded mode: a review. Renew. Sustain. Energy Rev. 71, 161–169 (2017).
Liu, H., Chen, X., Yu, K. & Hou, Y. The control and analysis of self-healing urban power grid. IEEE Trans. Smart Grid 3, 1119–1129 (2012).
Farivar, M. & Low, S. H. Branch flow model: relaxations and convexification—part I. IEEE Trans. Power Syst. 28, 2554–2564 (2013).
Sekhavatmanesh, H. & Cherkaoui, R. A novel decomposition solution approach for the restoration problem in distribution networks. IEEE Trans. Power Syst. 35, 3810–3824 (2020).
Shirmohammadi, D. Service restoration in distribution networks via network reconfiguration. IEEE Trans. Power Deliv. 7, 952–958 (1992).
Zidan, A. & El-Saadany, E. Network reconfiguration in balanced and unbalanced distribution systems with variable load demand for loss reduction and service restoration. In 2012 IEEE Power and Energy Society General Meeting, 1–8 (IEEE, 2012).
Rao, R. S., Narasimham, S. V. L., Raju, M. R. & Rao, A. S. Optimal network reconfiguration of large-scale distribution system using harmony search algorithm. IEEE Trans. Power Syst. 26, 1080–1088 (2010).
Wu, Y.-K., Lee, C.-Y., Liu, L.-C. & Tsai, S.-H. Study of reconfiguration for the distribution system with distributed generators. IEEE Trans. Power Deliv. 25, 1678–1685 (2010).
Pathan, M. I., Al-Muhaini, M. & Djokic, S. Z. Optimal reconfiguration and supply restoration of distribution networks with hybrid microgrids. Electr. Power Syst. Res. 187, 106458 (2020).
Sekhavatmanesh, H. & Cherkaoui, R. Analytical approach for active distribution network restoration including optimal voltage regulation. IEEE Trans. Power Syst. 34, 1716–1728 (2018).
de Quevedo, P. M., Contreras, J., Rider, M. J. & Allahdadian, J. Contingency assessment and network reconfiguration in distribution grids including wind power and energy storage. IEEE Trans. Sustain. Energy 6, 1524–1533 (2015).
Li, Y., Xiao, J., Chen, C., Tan, Y. & Cao, Y. Service restoration model with mixed-integer second-order cone programming for distribution network with distributed generations. IEEE Trans. Smart Grid 10, 4138–4150 (2018).
Chen, C., Wang, J., Qiu, F. & Zhao, D. Resilient distribution system by microgrids formation after natural disasters. IEEE Trans. Smart Grid 7, 958–966 (2015).
Wang, F. et al. A multi-stage restoration method for medium-voltage distribution system with DGs. IEEE Trans. Smart Grid 8, 2627–2636 (2016).
Sultana, B., Mustafa, M., Sultana, U. & Bhatti, A. R. Review on reliability improvement and power loss reduction in distribution system via network reconfiguration. Renew. Sustain. Energy Rev. 66, 297–310 (2016).
Cao, D. et al. Reinforcement learning and its applications in modern power and energy systems: a review. J. Mod. Power Syst. Clean Energy 8, 1029–1042 (2020).
Cao, D. et al. Physics-informed graphical representation-enabled deep reinforcement learning for robust distribution system voltage control. IEEE Trans. Smart Grid 15, 233–246 (2023).
Xiang, Y., Lu, Y. & Liu, J. Deep reinforcement learning based topology-aware voltage regulation of distribution networks with distributed energy storage. Appl. Energy 332, 120510 (2023).
Lu, Y. et al. Deep reinforcement learning based optimal scheduling of active distribution system considering distributed generation, energy storage and flexible load. Energy 271, 127087 (2023).
Gao, Y., Shi, J., Wang, W. & Yu, N. Dynamic distribution network reconfiguration using reinforcement learning. In 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), 1–7 (IEEE, 2019).
Kundačina, O. B., Vidović, P. M. & Petković, M. R. Solving dynamic distribution network reconfiguration using deep reinforcement learning. Electr. Eng. 104, 1–15 (2021).
Wang, B., Zhu, H., Xu, H., Bao, Y. & Di, H. Distribution network reconfiguration based on NoisyNet deep Q-learning network. IEEE Access 9, 90358–90365 (2021).
Gao, Y., Wang, W., Shi, J. & Yu, N. Batch-constrained reinforcement learning for dynamic distribution network reconfiguration. IEEE Trans. Smart Grid 11, 5357–5369 (2020).
Abdelmalak, M. et al. Network reconfiguration for enhanced operational resilience using reinforcement learning. In 2022 International Conference on Smart Energy Systems and Technologies (SEST), 1–6 (IEEE, 2022).
Gautam, M., Abdelmalak, M., Mansour-Lakouraj, M., Benidris, M. & Livani, H. Reconfiguration of distribution networks for resilience enhancement: a deep reinforcement learning-based approach. In 2022 IEEE Industry Applications Society Annual Meeting (IAS), 1–6 (IEEE, 2022).
Igder, M. A. & Liang, X. Service restoration using deep reinforcement learning and dynamic microgrid formation in distribution networks. IEEE Trans. Ind. Appl. 59, 5453–5472 (2023).
Ferreira, L. R., Aoki, A. R. & Lambert-Torres, G. A reinforcement learning approach to solve service restoration and load management simultaneously for distribution networks. IEEE Access 7, 145978–145987 (2019).
Du, Y. & Wu, D. Deep reinforcement learning from demonstrations to assist service restoration in islanded microgrids. IEEE Trans. Sustain. Energy 13, 1062–1072 (2022).
Verma, S. & Zhang, Z. L. Graph capsule convolutional neural networks. https://doi.org/10.48550/arXiv.1805.08090 (2018).
Paul, S., Ghassemi, P. & Chowdhury, S. Learning scalable policies over graphs for multi-robot task allocation using capsule attention networks. In 2022 International Conference on Robotics and Automation (ICRA), 8815–8822 (IEEE, 2022).
Paul, S. et al. Efficient planning of multi-robot collective transport using graph reinforcement learning with higher order topological abstraction. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 5779–5785 (IEEE, 2023).
Vinayagam, A., Swarna, K. S. V., Khoo, S. Y., Oo, A. M. T. & Stojcevski, A. PV based microgrid with grid-support grid-forming inverter control (simulation and analysis). Smart Grid and Renewable Energy 8, 1–30 (2017).
Sujil, A., Verma, J. & Kumar, R. Multi agent system: concepts, platforms and applications in power systems. Artif. Intell. Rev. 49, 153–182 (2018).
Elmitwally, A., Elsaid, M., Elgamal, M. & Chen, Z. A fuzzy-multiagent service restoration scheme for distribution system with distributed generation. IEEE Trans. Sustain. Energy 6, 810–821 (2015).
Rohbogner, G., Fey, S., Benoit, P., Wittwer, C. & Christ, A. Design of a multiagent-based voltage control system in peer-to-peer networks for smart grids. Energy Technol. 2, 107–120 (2014).
Dugan, R. C. & McDermott, T. Reference Guide. The Open Distribution System Simulator (OpenDSS) (EPRI, 2016).
Krishnamurthy, D. Opendssdirect.py. Tech. Rep. (National Renewable Energy Lab (NREL), 2017).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. https://doi.org/10.48550/arXiv.1707.06347 (2017).
Raffin, A. et al. Stable-baselines3: reliable reinforcement learning implementations. J. Mach. Learn. Res. 22, 1–8 (2021).
Jacob, R. A., Paul, S., Chowdhury, S., Gel, Y. R. & Zhang, J. Real-time outage management in active distribution networks using reinforcement learning over graphs. https://zenodo.org/records/11188543 (2024).
Kersting, W. The simulation of loop flow in radial distribution analysis programs. In 2014 IEEE Rural Electric Power Conference (REPC), B3–1 (IEEE, 2014).
Quintero-Duran, M., Candelo, J. E. & Soto-Ortiz, J. A modified backward/forward sweep-based method for reconfiguration of unbalanced distribution networks. Int. J. Electr. Comput. Eng. 9, 85–101 (2019).
Gangwar, P., Singh, S. N. & Chakrabarti, S. Network reconfiguration for the DG-integrated unbalanced distribution system. IET Gener. Transm. Distrib. 13, 3896–3909 (2019).
Arif, A. & Wang, Z. Networked microgrids for service restoration in resilient distribution systems. IET Gener. Transm. Distrib. 11, 3612–3619 (2017).
Jooshaki, M., Karimi-Arpanahi, S., Lehtonen, M., Millar, R. J. & Fotuhi-Firuzabad, M. An MILP model for optimal placement of sectionalizing switches and tie lines in distribution networks with complex topologies. IEEE Trans. Smart Grid 12, 4740–4751 (2021).
Wang, X., Kang, Q., Wei, X., Guo, L. & Liang, Z. Resilience assessment and recovery of distribution network considering the influence of communication network. Int. J. Electr. Power Energy Syst. 152, 109280 (2023).
Danielsson, A. M. Deep Learning for Power System Restoration. Ph.D. thesis (2018).
Bush, B., Chen, Y., Ofori-Boateng, D. & Gel, Y. R. Topological machine learning methods for power system responses to contingencies. In Proceedings of the Innovative Applications of Artificial Intelligence Conference, 35, 15278–15285 (2021).
Acknowledgements
This material is based upon work sponsored by the Department of the Navy, Office of Naval Research under ONR award number N00014-21-1-2530 (J.Z., S.C., and Y.G.). Part of this material is also based upon work supported by (while Y.G. was serving at) the NSF. The United States Government has a royalty-free license throughout the world in all copyrightable material contained herein. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research and the National Science Foundation.
Author information
Contributions
R.A.J. and S.P. conceptualized the code, conducted experiments, and performed analysis. S.C. supervised the development of the learning framework and J.Z. supervised the power network control and evaluation. Y.G. contributed to discussions and provided supervision of the work. R.A.J. and S.P. drafted the manuscript. S.C., Y.G., and J.Z. edited the manuscript. All authors contributed to manuscript revisions and provided feedback.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Abdollah Younesi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jacob, R.A., Paul, S., Chowdhury, S. et al. Real-time outage management in active distribution networks using reinforcement learning over graphs. Nat Commun 15, 4766 (2024). https://doi.org/10.1038/s41467-024-49207-y
DOI: https://doi.org/10.1038/s41467-024-49207-y