# Power of Ensemble Diversity and Randomization for Energy Aggregation

## Abstract

We study an ensemble of diverse (inhomogeneous) thermostatically controlled loads aggregated to provide demand response (DR) services in a district-level energy system. Each load in the ensemble is assumed to be equipped with a random number generator that switches heating/cooling on or off with a Poisson rate, r, when the load leaves the comfort zone. Ensemble diversity is modeled through inhomogeneity/disorder in the deterministic dynamics of loads. Approached from the standpoint of statistical physics, the ensemble represents a non-equilibrium system driven away from its natural steady state by the DR. An ensemble that mixes faster, i.e., relaxes faster to its steady state after being used for DR, is advantageous. We analyze the trade-off between the level of the aggregator’s control, commanding the devices to lower the rate r, and the phase-space-oscillatory deterministic dynamics. Then, we study the effect of the load diversity, investigating four different disorder probability distributions (DPDs) ranging from the Gaussian DPD to the uniform DPD with finite support. We show that stronger regularity of the DPD results in faster mixing, which is similar to the Landau damping in plasma physics. Our theoretical analysis is supported by extensive numerical validation.

## Introduction

Demand response (DR) is a popular modern way to balance power systems1. It can also be used more broadly to improve control of large engineered systems, such as natural gas systems, district heating systems, water systems, traffic systems, and so forth. Many of the infrastructure systems were originally devised assuming a clear separation of roles between, for example, loads and generators in a power system—loads consume the electric power as they need it without much coordination with generators, while the generators balance the system as fast and accurately as possible to keep the system balanced and running. However, this traditional paradigm is challenged by many modern additions to power systems, such as wind and solar renewable generation, which involve much more uncertainty and fluctuations than the system experienced in the past, thus making the system less reliable and stable. The essence of DR is fixing this problem by breaking the traditional split of roles between generators and loads by involving the loads in the system control and coordination. In spite of its relatively short history, DR has now become widely discussed and implemented, mainly through control of large flexible loads (see, e.g., review2 and references therein).

Another complication associated with the collective functioning of many loads, noticed already in3,4,5, is the so-called cold load pickup, which occurs at the conclusion of the DR service interruption. In this case, sufficiently long involvement of loads in the DR services leads to load synchronization—many thermostatically controlled loads, typically subject to the bang-bang control switching the loads on/off when they reach the endpoints of the comfort zone, are moving along their path in the phase space together, thus resulting in long undesirable oscillations of the ensemble cumulative consumption.

Stochastic effects, associated with uncontrolled and short-correlated temporal fluctuations of loads as well as inhomogeneity of loads within the ensemble, destroy the synchronization eventually and the system mixes into a statistically steady state. However, natural stochasticity and inhomogeneity are typically weak, thus resulting in unacceptably slow mixing. (The slow mixing translates into a delay constraint on the next use of the ensemble in the DR).

As argued in8,9, acceleration of mixing can be achieved by adding a controlled random component to the load dynamics. This “randomize for better mixing” idea was brought into the context of the aggregator model in10, where it was suggested to allow the loads to deviate from the bang-bang control. When the loads leave the comfort zone, their state (on or off) is not changed instantaneously but instead after a random delay generated independently by each load: switching is a Poisson process with rate r, so the delay is exponentially distributed with mean 1/r. The rate r is the only parameter that is broadcast to the loads by the aggregator.

The methodology of10, utilizing the Fokker–Planck (FP) formalism of statistical physics brought into the DR literature in6,11, was limited to a homogeneous ensemble and to a dynamical load model that was too complex to allow analytic analysis of the mixing conditions. In this manuscript, we correct for these limitations, thus extending and improving the approach of10. We pay special attention to extracting, confirming in simulations, and explaining special features of the disorder, as well as effects observed (but not elaborated on) in10.

## Results

We study the effects of the load inhomogeneity (which we also call disorder, following the statistical physics jargon) on operations of the ensemble, specifically in terms of the ensemble’s ability to recover fast from a perturbation after its use by the aggregator for the DR. The ensemble is assumed to be controlled by an aggregator in a communication-minimal way: the aggregator broadcasts the same on/off switching rate to all the consumers simultaneously. We are mainly interested in the regime where both the control and the ensemble variability are weak. The two main messages of the paper (put here in a colloquial format and then quantified formally later) are as follows:

1. There exists an optimal switching rate corresponding to the fastest recovery. Any deviation (increase or decrease) of the rate leads to a slower recovery.

2. Increase of the ensemble variability is advantageous for faster recovery/mixing.

Because temporal evolution of the ensemble is at the core of this manuscript’s analysis, let us define relevant timescales and then restate our main results in a more technical way. We assume that by default (without aggregator), each customer follows a standard bang-bang operation, switching on (off) the cooling device when the temperature exceeds (becomes less than) a preset threshold. We assume that the outside temperature is significantly higher than the switch-on threshold, thus resulting in cycling of the device with its natural timescale $$\tau$$. The aggregator changes this natural cycling by requesting the consumers to switch on/off with a random delay generated by a Poisson process with rate r, i.e., an exponentially distributed delay with mean 1/r. (Each device is assumed equipped with a random number generator). By default, i.e., without aggregator control, $$r=+\,\infty$$. Weak aggregator control means that $$r\tau \gg 1$$. To account for variability within the ensemble, one assumes that devices may have slightly different $$\tau$$. Formally, one considers $$\tau$$ as the disorder (variability) parameter distributed according to the disorder probability distribution (DPD), $$g(\tau )$$, characterized in terms of its typical value (mean), $${\tau }_{0}$$, and the distribution width, $${\rm{\Delta }}$$. In the following we define and consider four different forms of $$g(\tau )$$ (Gaussian, Lorentzian, Laplace, and finite-support uniform) parameterized by $${\tau }_{0}$$ and $${\rm{\Delta }}$$, always assuming (when analyzing the disordered case) that the typical control is weak, i.e., $$r{\tau }_{0}\gg 1$$, and that the disorder is also weak, i.e., $${\rm{\Delta }}\ll {\tau }_{0}$$.

With the timescales and two small dimensionless parameters, $${(r{\tau }_{0})}^{-1}$$ and $${\rm{\Delta }}/{\tau }_{0}$$, defined, we are ready to provide the following more technical, though still qualitative and intuitive, explanations for our main results.

1. When $$r=\infty$$, the system does not decay, and the temporal evolution of the probability distribution function (PDF) of a device temperature, x, averaged over the ensemble is periodic in time, ~$$\exp (\,\pm \,i{\lambda }_{I}t)$$, with the angular frequency $${\lambda }_{I}=2\pi /\tau$$, i.e., with the period $$\tau$$. Decrease of r leads to decrease of $${\lambda }_{I}$$ and simultaneous increase (from zero at $$r=\infty$$) of the decay rate, $${\lambda }_{R}$$. In this regime of oscillations with decay, the temporal behavior of the correction to the stationary probability distribution becomes ~$$\exp (\,-\,\lambda t)$$, $$\lambda ={\lambda }_{R}\pm i{\lambda }_{I}$$, with $${\lambda }_{I}\ne 0$$ and $${\lambda }_{R} > 0$$, where ± reflects emergence of two complex-conjugated solutions. At a certain critical value, $$r={r}_{c}$$, $${\lambda }_{I}$$ becomes zero, i.e., the two complex-conjugated solutions merge into one (degenerate) solution such that close to the merging point $$\lambda ={\lambda }_{c}(1\pm c\sqrt{1-r/{r}_{c}}+O(1-r/{r}_{c}))$$, where $$c=O(1)$$ and $${\lambda }_{c}$$ is the critical value of $${\lambda }_{R}$$ achieved at $$r={r}_{c}$$. The main conclusion of this straightforward qualitative estimate is that the lowest of the two eigenvalues (corresponding to ±1 $$\to$$ −1 and thus to the slowest asymptotic decay) achieves its maximum as a function of r at $${r}_{c}$$.

2. In the default regime (no aggregator control), a set of devices with exactly the same $$\tau$$, i.e., when we set $${\rm{\Delta }}$$ to zero, would not mix at all, i.e., the correction to the stationary probability distribution oscillates and does not decay. Introduction of a small but finite $${\rm{\Delta }}$$ results in a decay that is largely controlled by $$g(\tau )$$ in the vicinity of its maximum, i.e., at $$\tau \approx {\tau }_{0}$$. Specifically, decay of the temperature probability to its stationary value in time is controlled by the shifted Fourier transform of the DPD, $$\int \,d\tau g(\tau )\,\exp (\,\pm \,2\pi it/\tau )\approx \exp (\,\pm \,2\pi it/{\tau }_{0})\,\int \,d\varsigma g({\tau }_{0}+\varsigma )\,\exp (\,\mp \,2\pi i\varsigma t/{\tau }_{0}^{2})$$. Obviously, details of the decay depend on the shape of the DPD. Of the four model DPDs considered in this manuscript, the Gaussian DPD results in the fastest decay (shifted Gaussian in time), the Lorentzian DPD is a bit slower (exponential in time), and the Laplace DPD and uniform finite-support DPD are the slowest, with asymptotic $$1/{t}^{2}$$ and $$1/t$$ decays, respectively. This hierarchy, illustrated in Fig. 1, and the Fourier-transform interpretation suggest that the speed of decay is linked to the regularity of the DPD around its central part. (This phenomenon is reminiscent of the mathematically similar analysis of the Landau damping in plasma physics described by the Vlasov equation [see12,13 and references therein]. Specifically, we refer here to the fact that the regularity of the initial velocity distribution influences the Landau mixing/damping speed).
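The approximation behind the shifted Fourier transform in item 2 can be made explicit with one worked step. This is a sketch under the weak-disorder assumption $$|\varsigma |\ll {\tau }_{0}$$ over the bulk of the DPD, with $$\varsigma \dot{=}\tau -{\tau }_{0}$$:

$$\frac{1}{\tau }=\frac{1}{{\tau }_{0}+\varsigma }=\frac{1}{{\tau }_{0}}-\frac{\varsigma }{{\tau }_{0}^{2}}+\frac{{\varsigma }^{2}}{{\tau }_{0}^{3}}-\ldots \quad \Rightarrow \quad \exp \left(\pm \frac{2\pi it}{\tau }\right)\approx \exp \left(\pm \frac{2\pi it}{{\tau }_{0}}\right)\,\exp \left(\mp \frac{2\pi i\varsigma t}{{\tau }_{0}^{2}}\right).$$

The first neglected term contributes a phase of order $$2\pi t{\varsigma }^{2}/{\tau }_{0}^{3}$$, so for $$|\varsigma |\sim {\rm{\Delta }}$$ the linearization is expected to hold as long as $$t\ll {\tau }_{0}^{3}/{{\rm{\Delta }}}^{2}$$.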

These main focal points are detailed and extended in the remainder of the manuscript. Models of the statistical ensemble and of the ensemble inhomogeneity are formulated in “Formulation” section. The basic model of the homogeneous ensemble is analyzed in “Basic Homogeneous Model” section. Effects of the disorder/inhomogeneity are studied in “Disorder” section, where we also compare analytic and numerical results. “Discussions” section is reserved for conclusions and discussion of the path forward.

### Formulation of the problem

One characterizes a load by the continuous parameter, x, standing for the temperature, and by the discrete/binary parameter, $$\sigma =\uparrow \,,\downarrow$$, indicating whether the air conditioning system/device of the load (one considers cooling for concreteness) is switched on, $$\sigma =\uparrow$$, or off, $$\sigma =\downarrow$$. Conditioned to σ, the dynamics of x follow the deterministic rule

$$\frac{dx}{dt}=v(x,\sigma ),$$
(1)

where $$v(x|\sigma )$$ describes the rate of temperature change as a function of the current temperature, x, conditioned to the state of the load’s air conditioning device (later in the text referred to simply as “device”). Our basic model is

$$\underline{{\rm{Basic}}\,{\rm{Model}}}:\,v(x,\sigma )=\{\begin{array}{ll}-u, & \sigma =\uparrow \\ u, & \sigma =\downarrow \,,\end{array}$$
(2)

where u is a positive constant. The model is a simplification of a bit richer popular model, e.g., used in10, where u is not a constant as in Eq. (2) but a linear function of x. σ in Eqs (1) and (2) is modeled as the following Markovian binary (two-level) stochastic process:

$$\forall \,t:\,\sigma (t+dt)=\left\{\begin{array}{ll}\downarrow , & {\rm{with}}\,{\rm{probability}}\,r\times dt\,{\rm{if}}\,\sigma (t)=\uparrow \,\& \,x < {x}_{\downarrow }\\ \uparrow , & {\rm{with}}\,{\rm{probability}}\,r\times dt\,{\rm{if}}\,\sigma (t)=\downarrow \,\& \,x > {x}_{\uparrow }\\ \sigma (t), & {\rm{otherwise}}\end{array}\right.,$$
(3)

where dt is the time step (of the properly discretized continuous time limit), r is the rate of exponential (Poisson) switching, and $${x}_{\downarrow }$$ and $${x}_{\uparrow }$$, with $${x}_{\downarrow } < {x}_{\uparrow }$$, mark the boundaries of the temperature band within which no switching occurs.

As set above, the basic model has two timescales: one describing deterministic evolution, $$\tau =2({x}_{\uparrow }-{x}_{\downarrow })/u$$, which is the time it takes for a device to make a full cycle through the combined $$(x,\sigma )$$ phase space illustrated in Fig. 2, and 1/r, the typical time of a stochastic jump from $$\sigma =\uparrow$$ to $$\sigma =\downarrow$$ or vice versa.

Notice that the selection of the model in Eq. (2) as the basis for this manuscript’s analysis is dictated not only by its realism but also by considerations of simplicity and our ability to derive analytic results. Specifically, for the case of the asymptotic uniform ensemble consisting of an infinite number of devices with the same characteristics (the same u) and following the same switching protocol (the same r), we are interested in computing and analyzing the evolution in time of the PDFs governed by the system of coupled FP equations following directly from the model definition, given by Eqs (1), (2) and (3):

$$({\partial }_{t}\,(\begin{array}{cc}1 & 0\\ 0 & 1\end{array})- {\mathcal L} )\,P(x|t,\tau ,r)=0,\,P(x|t,\tau ,r)\dot{=}(\begin{array}{c}{P}_{\uparrow }(x|t,\tau ,r)\\ {P}_{\downarrow }(x|t,\tau ,r)\end{array}),$$
(4)
$${\mathcal L} \dot{=}u{\partial }_{x}\,(\begin{array}{cc}1 & 0\\ 0 & -1\end{array})-r\,(\begin{array}{cc}\theta ({x}_{\downarrow }-x) & -\theta (x-{x}_{\uparrow })\\ -\theta ({x}_{\downarrow }-x) & \theta (x-{x}_{\uparrow })\end{array}),$$
(5)

where θ(y) is unity if $$y > 0$$ and zero otherwise. We are seeking a properly normalized solution of Eq. (4):

$${N}_{\uparrow }(t,\tau ,r)+{N}_{\downarrow }(t,\tau ,r)=1,\,{N}_{\uparrow ,\downarrow }(t,\tau ,r)\dot{=}\int \,dx{P}_{\uparrow ,\downarrow }(x|t,\tau ,r),$$
(6)

where $${N}_{\uparrow ,\downarrow }(t,\tau ,r)$$ counts proportions of devices that are switched on and off, respectively.

As shown in the section “Basic Homogeneous Model”, the solution of the system of FP equations, Eq. (4), can be presented explicitly as a spectral expansion in terms of the Lambert-W functions for any initial ($$t=0$$) distribution. This analytic expression will allow us to analyze the temporal evolution of the basic homogeneous ensemble in much more detail than was possible in10 for a more complex model, with $$v(x,\sigma )$$ in Eq. (1) dependent linearly on x.

However, devices constituting realistic ensembles are not necessarily the same in terms of their cooling/heating strength. To model the ensemble diversity, i.e., a non-uniform ensemble, one introduces disorder in $$\tau$$. We assume that $$\tau$$ characterizing a device is drawn independently from one of the following four model DPDs: Gaussian, Lorentzian, Laplace, and uniform (finite support),

$${g}_{G}(\tau |{\tau }_{0},{\rm{\Delta }})=\frac{1}{\sqrt{2\pi }{\rm{\Delta }}}{e}^{-\frac{{(\tau -{\tau }_{0})}^{2}}{2{{\rm{\Delta }}}^{2}}},$$
(7)
$${g}_{Lr}(\tau |{\tau }_{0},{\rm{\Delta }})=\frac{{\rm{\Delta }}}{\pi }\frac{1}{{(\tau -{\tau }_{0})}^{2}+{{\rm{\Delta }}}^{2}},$$
(8)
$${g}_{Lp}(\tau |{\tau }_{0},{\rm{\Delta }})=\frac{1}{2{\rm{\Delta }}}\exp (-\frac{|\tau -{\tau }_{0}|}{{\rm{\Delta }}}),$$
(9)
$${g}_{u}(\tau |{\tau }_{0},{\rm{\Delta }})=\{\begin{array}{cc}{(2{\rm{\Delta }})}^{-1}, & {\tau }_{0}-{\rm{\Delta }}\le \tau \le {\tau }_{0}+{\rm{\Delta }}\\ 0, & {\rm{otherwise}}\end{array},$$
(10)

representing different extremes (e.g., in terms of the asymptotics). We parameterize these DPDs in a similar way, via their center (mean/maximum), $${\tau }_{0}$$, and width, $${\rm{\Delta }}$$, to facilitate comparisons. In general, we will assume that $${\rm{\Delta }}\le {\tau }_{0}$$, and in terms of the asymptotic analysis, we will be interested most in the regime of weak disorder, $${\rm{\Delta }}\ll {\tau }_{0}$$. (Notice that the negative values of $$\tau$$, $$\tau < 0$$, are not physical. Therefore, when performing asymptotic analysis for the disorder distributions with formally defined infinite support, described by Eqs (7), (8) and (9), one needs to make sure that the fictitious $$\tau < 0$$ regime does not contribute to the asymptotic results). Then, the following quantities, averaged over the DPDs,

$$\overline{{P}_{\uparrow ,\downarrow }(x|t,{\tau }_{0},{\rm{\Delta }},r)}\dot{=}\int \,d\tau g(\tau ){P}_{\uparrow ,\downarrow }(x|t,\tau ,r),$$
(11)
$$\overline{{N}_{\uparrow ,\downarrow }(t,{\tau }_{0},{\rm{\Delta }},r)}\dot{=}\int \,d\tau g(\tau ){N}_{\uparrow ,\downarrow }(t,\tau ,r),$$
(12)

will be the focus of our analysis of the inhomogeneous ensembles represented by Eqs (7), (8), (9) and (10). Equation (12) defines the total proportion of devices observed at time t in the state $$\uparrow$$ or $$\downarrow$$ (for all $$\tau$$); Eq. (11) defines a similar but more detailed object, which in addition resolves the current temperature, x, of the devices.
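The disorder averages of Eqs (11) and (12) can be estimated numerically by sampling $$\tau$$ from the chosen DPD. The sketch below is a minimal illustration using only the Python standard library. The function names `sample_tau` and `disorder_average` are hypothetical, and the rejection of non-positive draws is an assumption that mirrors the remark that $$\tau < 0$$ is not physical, so the sampled distribution is the truncated DPD.

```python
import math
import random


def sample_tau(kind, tau0, delta, rng):
    """Draw one cycle time tau from a DPD centred at tau0 with width delta.

    kind is "gaussian", "lorentzian", "laplace" or "uniform" (Eqs (7)-(10)).
    Non-physical draws with tau <= 0 are rejected and redrawn (an assumption).
    """
    while True:
        u = rng.random()
        if u == 0.0:
            continue  # avoid log(0) / tan(-pi/2) at the edge of [0, 1)
        if kind == "gaussian":
            tau = rng.gauss(tau0, delta)
        elif kind == "lorentzian":
            # inverse CDF of the Lorentzian (Cauchy) distribution
            tau = tau0 + delta * math.tan(math.pi * (u - 0.5))
        elif kind == "laplace":
            # inverse CDF of the Laplace distribution
            s = u - 0.5
            tau = tau0 - delta * math.copysign(1.0, s) * math.log(1.0 - 2.0 * abs(s))
        elif kind == "uniform":
            tau = tau0 + delta * (2.0 * u - 1.0)
        else:
            raise ValueError("unknown DPD: " + kind)
        if tau > 0.0:
            return tau


def disorder_average(f, kind, tau0, delta, n_samples=20000, seed=0):
    """Monte Carlo estimate of the DPD average of f(tau), cf. Eqs (11)-(12)."""
    rng = random.Random(seed)
    total = sum(f(sample_tau(kind, tau0, delta, rng)) for _ in range(n_samples))
    return total / n_samples
```

For example, `disorder_average(lambda tau: math.cos(2 * math.pi * t / tau), "gaussian", 1.0, 0.05)` estimates the real part of the shifted Fourier transform of the Gaussian DPD at a given time `t`.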

In this manuscript, we pose and answer the following two related questions:

• Qualitative Question about the Inhomogeneous Ensemble: Does the disorder accelerate or slow down mixing, i.e., relaxation of the ensemble probability distribution to its steady state?

• Quantitative Question about the Inhomogeneous Ensemble: How does the relaxation look depending on the system parameters and the parameters characterizing the PDF of the disorder?

Our choice of the basic model in Eq. (2), resulting in analytic expression for $$P(x|t,\tau ,r)$$ stated in terms of the explicit spectral series in the section “Basic Homogeneous Model”, allows us in the section “Disorder” to answer the quantitative question explicitly and then to use the analytic solution to reach qualitative conclusions.

### Analytic Solution for the Basic Homogeneous Model

The solution of Eq. (4) can be written in terms of the following explicit spectral expansion:

$$\begin{array}{rcl}P(x|t,\tau ,r) & = & \sum _{k=-\infty }^{+\infty }({a}_{k;-}(\tau ,r){\xi }_{k;-}(x|\tau ,r){e}^{-{\lambda }_{k;-}(\tau ,r)t}\\ & & +\,{a}_{k;+}(\tau ,r){\xi }_{k;+}(x|\tau ,r){e}^{-{\lambda }_{k;+}(\tau ,r)t}),\end{array}$$
(13)
$${\mathcal L} {\xi }_{k;\pm }(x|\tau ,r)=-\,{\lambda }_{k;\pm }(\tau ,r){\xi }_{k;\pm }(x|\tau ,r),$$
(14)
$${\lambda }_{k;\pm }(\tau ,r)\dot{=}\frac{r}{2}\,(1-\frac{{W}_{k}(\pm \frac{r\tau }{4}{e}^{\frac{r\tau }{4}})}{r\tau /4}),$$
(15)
$${\xi }_{k;\pm }(x|\tau ,r)\dot{=}\{\begin{array}{l}(\begin{array}{c}\exp \,(\tfrac{\tau x(r-{\lambda }_{k;\pm })}{2({x}_{\uparrow }-{x}_{\downarrow })})\\ \tfrac{r}{r-2{\lambda }_{k;\pm }}\,\exp \,(\tfrac{\tau x(r-{\lambda }_{k;\pm })}{2({x}_{\uparrow }-{x}_{\downarrow })})\end{array}),x < {x}_{\downarrow }\\ (\begin{array}{c}\exp \,(\tfrac{\tau (r{x}_{\downarrow }-{\lambda }_{k;\pm }x)}{2({x}_{\uparrow }-{x}_{\downarrow })})\\ \tfrac{(r-2{\lambda }_{k;\pm })}{r}\,\exp \,(\tfrac{\tau (r{x}_{\downarrow }+{\lambda }_{k;\pm }(x-2{x}_{\uparrow }))}{2({x}_{\uparrow }-{x}_{\downarrow })})\end{array}),{x}_{\downarrow } < x < {x}_{\uparrow },\\ (\begin{array}{c}\exp \,(\tfrac{\tau ({\lambda }_{k;\pm }-r)x}{2({x}_{\uparrow }-{x}_{\downarrow })})\,\exp \,(\tfrac{\tau (r({x}_{\downarrow }+{x}_{\uparrow })-2{\lambda }_{k;\pm }{x}_{\uparrow })}{2({x}_{\uparrow }-{x}_{\downarrow })})\\ \tfrac{(r-2{\lambda }_{k;\pm })}{r}\,\exp \,(\tfrac{\tau ({\lambda }_{k;\pm }-r)x}{2({x}_{\uparrow }-{x}_{\downarrow })})\,\exp \,(\tfrac{\tau (r({x}_{\downarrow }+{x}_{\uparrow })-2{\lambda }_{k;\pm }{x}_{\uparrow })}{2({x}_{\uparrow }-{x}_{\downarrow })})\end{array}),x > {x}_{\uparrow }\end{array}$$
(16)

where Eq. (15) solves the spectral equation $$r-2{\lambda }_{k;\pm }=\pm \,(r{e}^{{\lambda }_{k;\pm }\tau /2})$$, and Wk(z) with $$z\in {\mathbb{C}}$$ and $$k\in {\mathbb{Z}}$$ denote all the analytic in z solutions of the Lambert-W transcendental equation, $${W}_{k}(z){e}^{{W}_{k}(z)}=z$$. (The Lambert-W function is called $$\mathrm{ProductLog}\,[k,z]$$ in Mathematica14. See15 for details of the Lambert-W function analysis, including asymptotics).
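That Eq. (15) solves the spectral equation can be checked by direct substitution. The shorthand $$\beta \dot{=}r\tau /4$$ and $$W\dot{=}{W}_{k}(\pm \beta {e}^{\beta })$$ is used here only for this check:

$$\lambda =\frac{r}{2}\left(1-\frac{W}{\beta }\right)\quad \Rightarrow \quad r-2\lambda =\frac{rW}{\beta },\qquad {e}^{\lambda \tau /2}={e}^{\beta -W},$$

$$r-2\lambda =\pm \,r{e}^{\lambda \tau /2}\quad \iff \quad \frac{W}{\beta }=\pm \,{e}^{\beta }{e}^{-W}\quad \iff \quad W{e}^{W}=\pm \,\beta {e}^{\beta }.$$

For the “+” sign and $$k=0$$ this gives $${W}_{0}(\beta {e}^{\beta })=\beta$$ and hence $${\lambda }_{0;+}=0$$, which is the stationary mode.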

To complete the description of the spectral decomposition, one also needs to define the adjoint eigenfunctions of $${\mathcal L}$$:

$${ {\mathcal L} }^{\dagger }{\xi }_{k;\pm }^{\dagger }(\tau ,r)=-\,{\lambda }_{k;\pm }^{\ast }(\tau ,r){\xi }_{k;\pm }^{\dagger }(\tau ,r),$$
(17)
$${\xi }_{k;\pm }^{\dagger }(x|\tau ,r)\dot{=}\tfrac{\tau }{2({x}_{\uparrow }-{x}_{\downarrow })((r-2{\lambda }_{k;\pm }^{\ast })\tau +4)}\,\{\begin{array}{l}(\begin{array}{c}(r-2{\lambda }_{k;\pm }^{\ast })\,\exp \,(-\tfrac{\tau (r{x}_{\downarrow }+{\lambda }_{k;\pm }^{\ast }(x-2{x}_{\downarrow }))}{2({x}_{\uparrow }-{x}_{\downarrow })})\\ {(r-2{\lambda }_{k;\pm }^{\ast })}^{2}\,\exp \,(-\tfrac{\tau (r{x}_{\downarrow }+{\lambda }_{k;\pm }^{\ast }(x-2{x}_{\downarrow }))}{2({x}_{\uparrow }-{x}_{\downarrow })})\end{array}),x < {x}_{\downarrow }\\ (\begin{array}{c}(r-2{\lambda }_{k;\pm }^{\ast })\,\exp \,(-\tfrac{r\tau {x}_{\downarrow }-{\lambda }_{k;\pm }^{\ast }\tau x}{2({x}_{\uparrow }-{x}_{\downarrow })})\\ r\,\exp \,(-\tfrac{\tau (r{x}_{\downarrow }+{\lambda }_{k;\pm }^{\ast }(x-2{x}_{\uparrow }))}{2({x}_{\uparrow }-{x}_{\downarrow })})\end{array}),{x}_{\downarrow } < x < {x}_{\uparrow ,}\\ (\begin{array}{c}(r-2{\lambda }_{k;\pm }^{\ast })\,\exp \,(-\tfrac{r\tau {x}_{\downarrow }-{\lambda }_{k;\pm }^{\ast }\tau x}{2({x}_{\uparrow }-{x}_{\downarrow })})\\ r\,\exp \,(-\tfrac{r\tau {x}_{\downarrow }-{\lambda }_{k;\pm }^{\ast }\tau x}{2({x}_{\uparrow }-{x}_{\downarrow })})\end{array}),x > {x}_{\uparrow }\end{array}$$
(18)

where $${ {\mathcal L} }^{\dagger }$$, the adjoint of $${\mathcal L}$$, and the standard L2 scalar product between two vectors P and G are defined according to

$${ {\mathcal L} }^{\dagger }\dot{=}-\,u{\partial }_{x}\,(\begin{array}{cc}1 & 0\\ 0 & -1\end{array})-r\,(\begin{array}{cc}\theta ({x}_{\downarrow }-x) & -\theta ({x}_{\downarrow }-x)\\ -\theta (x-{x}_{\uparrow }) & \theta (x-{x}_{\uparrow })\end{array}),$$
(19)
$$\langle G, {\mathcal L} \,P\rangle \dot{=}\int \,{({G}^{\ast })}^{\top } {\mathcal L} \,P\,dx=\int \,{({ {\mathcal L} }^{\dagger }{G}^{\ast })}^{\top }P\,dx=\langle { {\mathcal L} }^{\dagger }G,P\rangle .$$
(20)

It is straightforward to check that the eigenvectors, defined by Eqs (16) and (18), are normalized and orthogonal, i.e., $$\langle {\xi }_{{k}_{1};{\varsigma }_{1}}^{\dagger },{\xi }_{{k}_{2};{\varsigma }_{2}}\rangle ={\delta }_{{k}_{1},{k}_{2}}{\delta }_{{\varsigma }_{1},{\varsigma }_{2}}$$. Now closing the loop in Eq. (13) and linking the a coefficients there to the initial condition, $${P}_{0}(x)\dot{=}P(0;x)$$, one derives

$${a}_{k;\pm }=\langle {\xi }_{k;\pm }^{\dagger },{P}_{0}\rangle .$$
(21)

Substituting Eq. (13) into Eq. (6), one discovers that $${N}_{\uparrow ,\downarrow }(t,\tau ,r)$$, i.e., the total density/proportion of devices that are switched on/off, is represented by the spectral series with only “−” modes contributing (coefficients of the “+” modes are exactly zero, i.e., $$\int \,dx{\xi }_{\uparrow ,\downarrow ;+}=0$$).

We discuss consequences of the analysis on special features of the spectral problem, long time analysis (of the gap), and sensitivity of the asymptotic solution to the parameters in the following three subsections.

#### Features of the Spectral Problem

We find it useful to identify a number of significant features of the spectral problem defined by Eqs (13)–(21):

1. $${\lambda }_{0;+}=0$$ and all other eigenvalues have a positive real part that grows with |k|, i.e., the spectrum is discrete, positive, and ordered.

2. At $$r\tau > C=4{W}_{0}({e}^{-1})\simeq 1.11386$$, no real eigenvalues exist except the zero one, i.e., in the “high switching rate” regime, the solution shows oscillatory decay in time.

3. At $$r\tau \le C$$, there are only two other real eigenvalues besides zero, $${\lambda }_{0;-}$$ and $${\lambda }_{-1;-}$$, where $${\lambda }_{0;-}\le {\lambda }_{-1;-}$$. All other eigenvalues (with a nonzero imaginary part) have a real part that is larger than $${\lambda }_{-1;-}$$. Therefore, in the “low switching rate” regime, the solution decays with time (no oscillations).

4. When one of the two parameters, $$\tau$$ or r, is fixed, one finds that the largest value of $${\lambda }_{0;-}$$ is achieved at the bifurcation point, where $$\beta \dot{=}r\tau /4$$ reaches $${\beta }_{c}=C/4$$, i.e., given fixed r and changing $$\tau$$, or fixed $$\tau$$ and changing r, mixing is the fastest at $$\tau =C/r$$.

5. Moreover, given r is fixed, $$d\,{\rm{Re}}({\lambda }_{0;-})/d\tau$$, i.e., the rate of change with $$\tau$$ of the real part of the leading eigenvalue, is positive/negative when $$r\tau$$ is smaller/larger than C.

6. When r is fixed and $$\beta =r\tau /4$$ is sent to zero, one finds that $${\lambda }_{0;-}\to r$$. Indeed, in this regime, all devices that are in the allowed range, $$x\in [{x}_{\downarrow },{x}_{\uparrow }]$$, move fast to their respective boundaries; thus, at small $$\tau$$, PDF decay is controlled primarily by the Poisson jumps/switchings.

7. When r is fixed and $$\beta =r\tau /4$$ is sent to $$\infty$$ (or alternatively when $$\tau$$ is fixed and $$\beta$$ is sent to $$\infty$$), one arrives at the following asymptotic: $${\lambda }_{k;\pm }\sim \frac{r}{2}\,(1-\frac{\mathrm{ln}(\,\pm \,\beta {e}^{\beta })+2\pi ik-\mathrm{ln}(\mathrm{ln}(\,\pm \,\beta {e}^{\beta })+2\pi ik)}{\beta })$$, which means, in particular, that $${\lambda }_{k,-}\to {0}^{+}-2i\pi (2k+1)/\tau$$ and $${\lambda }_{k,+}\to {0}^{+}-4i\pi k/\tau$$. One concludes that in the asymptotic regime of the “highest switching rate”, the temporal evolution of the PDF becomes oscillatory and relaxation to the steady state slows down asymptotically to zero.

Evolution of the (four) leading non-zero eigenvalues (those with the smallest real part) with the dimensionless parameter $$\beta =r\tau /4$$ at fixed $$\tau$$ is illustrated in Fig. 3. It is worth noting that the behavior of our system described by Eqs (1)–(6) is qualitatively similar to what would be observed in a damped harmonic oscillator with the natural frequency and damping coefficient scaling respectively as 1/$$\tau$$ and 1/$$({\tau }^{2}r)$$.
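Features 3, 4 and 6 of the list above can be illustrated numerically in the “low switching rate” regime, where $${\lambda }_{0;-}$$ is real. The following is a minimal sketch using only the standard library: it evaluates the principal Lambert-W branch by bisection rather than by a library call, and the names `BETA_C` and `lambda0_minus_tau` are hypothetical. It relies on the identity $${\lambda }_{0;-}\tau =2(\beta -{W}_{0}(-\beta {e}^{\beta }))$$, which follows from Eq. (15) with $$\beta =r\tau /4$$.

```python
import math


def _bisect(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# beta_c solves beta * exp(beta) = exp(-1), i.e. beta_c = W_0(1/e) = C / 4.
BETA_C = _bisect(lambda b: b * math.exp(b) - math.exp(-1.0), 0.0, 1.0)


def lambda0_minus_tau(beta):
    """Dimensionless lambda_{0;-} * tau for 0 < beta <= beta_c (real branch)."""
    if not 0.0 < beta <= BETA_C:
        raise ValueError("a real lambda_{0;-} exists only for 0 < beta <= beta_c")
    z = -beta * math.exp(beta)
    # On [-1, 0) the map w -> w * exp(w) is increasing, so W_0(z) is its root.
    w = _bisect(lambda w: w * math.exp(w) - z, -1.0, 0.0)
    return 2.0 * (beta - w)
```

At fixed $$\tau$$ the returned value grows with $$\beta$$ up to the bifurcation point, and dividing it by $$4\beta =r\tau$$ gives $${\lambda }_{0;-}/r$$, which approaches 1 as $$\beta \to 0$$.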

#### Long Time Asymptotic Analysis: Gap Condition

Let us now clarify the conditions under which one can limit analysis of the PDF mixing to the leading $$k=0$$, “−” mode and complex conjugate, thus approximating

$$\delta {N}_{\uparrow }(t,\tau ,r)\dot{=}{N}_{\uparrow }(t,\tau ,r)-{N}_{\uparrow }^{{\rm{st}}}(\tau ,r)\approx \exp \,(\phi (\tau ,r)-{\lambda }_{0;-}(\tau ,r)t),$$
(22)
$$\phi (\tau ,r)\dot{=}\,\mathrm{log}\,({a}_{0;-}(\tau ,r)\,\int \,dx{\xi }_{\uparrow ;0;-}(x|\tau ,r)),$$
(23)

where $${N}_{\uparrow }^{{\rm{st}}}(\tau ,r)\dot{=}{\mathrm{lim}}_{t\to \infty }\,{N}_{\uparrow }(t,\tau ,r)=1/2$$ is the stationary solution achieved at $$t\to \infty$$. The approximation is valid when $${\rm{Re}}\,({\lambda }_{1;+}(\tau ,r)-{\lambda }_{0;-}(\tau ,r))\,t\gg 1$$, i.e., when the relaxation time is larger than the inverse gap between real parts of the two leading eigenvalues.

$${a}_{0;-}$$, and thus $$\phi$$, depend on the initial condition. For $${P}_{0}=(\delta (x-{x}_{\downarrow }),0)$$, corresponding to the “worst case” (least mixed) initial condition, one derives

$$\lambda (\tau ,r)\dot{=}{\lambda }_{0;-}(\tau ,r)=\frac{r}{2}\,(1-\frac{4{W}_{0}(\,-\,\frac{r\tau {e}^{\frac{r\tau }{4}}}{4})}{r\tau }),$$
(24)
$$\phi (\tau ,r)=\,\mathrm{log}\,(\frac{2r(r-2\lambda )}{\lambda (r-\lambda )\,(\tau (r-2\lambda )+4)}),$$
(25)

where here and below $$\lambda =\lambda (\tau ,r)$$ is a shortcut notation for λ0;−.

#### Asymptotic Sensitivity

The analytic solution discussed above in the main body of this section allows us to analyze the sensitivity of $${\lambda }_{0;-}(\tau ,r)$$ and $$\phi (\tau ,r)$$, defined in Eqs (24) and (25), to changes in the parameter $$\tau$$. (The analysis can also be extended to study sensitivity to changes of r. We focus on the $$\tau$$ sensitivity because $$\tau$$ is user-dependent and thus uncertain, whereas r is aggregator-defined and thus well controlled and certain). Specifically, we are interested in analyzing the coefficients of the Taylor expansion about (the typical) $${\tau }_{0}$$, at $${\beta }_{0}\dot{=}r{\tau }_{0}/4 > {\beta }_{c}$$, for the dynamic characteristics of interest:

$$\phi (\tau ,r)=\phi +(\tau /{\tau }_{0}-1)\phi ^{\prime} +O({(\tau /{\tau }_{0}-1)}^{2}),$$
(26)
$$\lambda (\tau ,r)=\lambda +(\tau /{\tau }_{0}-1)\lambda ^{\prime} +O({(\tau /{\tau }_{0}-1)}^{2}),$$
(27)

where $$\phi ,\phi ^{\prime} ,\lambda ,\lambda ^{\prime}$$ are the shortcut notations for $$\phi (\tau ,r)$$, $$\tau {\partial }_{\tau }\phi (\tau ,r)$$, $$\lambda (\tau ,r)$$, and $$\tau {\partial }_{\tau }{\lambda }_{0;-}(\tau ,r)$$, respectively, evaluated at $$\tau ={\tau }_{0}$$. The coefficients of interest show the following asymptotics at small $$\varepsilon \dot{=}1/(r{\tau }_{0})$$:

$$\lambda {\tau }_{0}=-\,2i\pi (1-4\varepsilon +16{\varepsilon }^{2})+16{\pi }^{2}{\varepsilon }^{2}+O({\varepsilon }^{3}),$$
(28)
$$\lambda ^{\prime} {\tau }_{0}=2i\pi (1-8\varepsilon +48{\varepsilon }^{2})-48{\pi }^{2}{\varepsilon }^{2}+O({\varepsilon }^{3}),$$
(29)
$$\phi =-\,\mathrm{log}(\,-\,i\pi )-2\pi i(\varepsilon -8{\varepsilon }^{2})-2{\pi }^{2}{\varepsilon }^{2}+O({\varepsilon }^{3}),$$
(30)
$$\phi ^{\prime} =2i\pi (\varepsilon -16{\varepsilon }^{2})+4{\pi }^{2}{\varepsilon }^{2}+O({\varepsilon }^{3}).$$
(31)
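The small-$$\varepsilon$$ expansion of Eq. (28) can be cross-checked against a direct numerical solution of the spectral equation. The sketch below is one way to do this, under the assumption that Newton iteration started from the $$r\to \infty$$ limit converges to the leading “−” mode (this is expected only in the weak-control regime, $$\varepsilon \ll 1$$). With $$s\dot{=}\lambda {\tau }_{0}/2$$, the spectral equation $$r-2\lambda =-r{e}^{\lambda {\tau }_{0}/2}$$ becomes $$1-4\varepsilon s+{e}^{s}=0$$. The function names are hypothetical.

```python
import cmath
import math


def leading_eigenvalue_tau(eps, iters=50):
    """Return lambda_{0;-} * tau0 for eps = 1 / (r * tau0) by Newton iteration.

    Solves g(s) = 1 - 4*eps*s + exp(s) = 0 for s = lambda * tau0 / 2,
    starting from s = -i*pi, the r -> infinity limit where exp(s) = -1.
    """
    s = complex(0.0, -math.pi)
    for _ in range(iters):
        g = 1.0 - 4.0 * eps * s + cmath.exp(s)
        dg = -4.0 * eps + cmath.exp(s)
        s -= g / dg
    return 2.0 * s


def eq28(eps):
    """Right-hand side of Eq. (28) without the O(eps^3) remainder."""
    real = 16.0 * math.pi ** 2 * eps ** 2
    imag = -2.0 * math.pi * (1.0 - 4.0 * eps + 16.0 * eps ** 2)
    return complex(real, imag)
```

Comparing `leading_eigenvalue_tau(eps)` with `eq28(eps)` for decreasing `eps` shows the difference shrinking, as expected for a remainder of order $${\varepsilon }^{3}$$.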

### Basic Model with Disorder

Averaging over the disorder according to Eq. (11) with only the leading $$k=0$$, “−” term in Eq. (13) is justified when the spectral gap condition (see the section “Long Time Asymptotic Analysis: Gap Condition”) is satisfied. Furthermore, we assume $${\rm{\Delta }}/{\tau }_{0}\ll 1$$, so that the integral in Eq. (11) is dominated by $$\tau$$ close to $${\tau }_{0}$$, i.e., $$|\tau -{\tau }_{0}|\ll {\tau }_{0}$$. Then, taking into account the large time asymptotic (Eq. (22)) and assuming that the Taylor series expansion (Eqs (26) and (27)) is legitimate (when $$r{\tau }_{0} > C$$), one arrives at

$$\overline{\delta {N}_{\uparrow }(t,\tau ,r)}\approx \exp \,(\phi -\lambda t)\,\overline{\exp \,((\tau /{\tau }_{0}-1)\,(\phi ^{\prime} -t\lambda ^{\prime} ))}.$$
(32)

Versions of Eq. (32) for the four example probability distributions of the disorder (Eqs (710)) computed for small disorder $${\rm{\Delta }}/{\tau }_{0}\ll 1$$ are

$$\begin{array}{rcl}\overline{\delta {N}_{\uparrow }(t,\tau ,r)} & \approx & \exp \,(\phi -\lambda t)\,\times \\ & & \begin{array}{ll}({\rm{G}}): & \exp \,(\frac{{{\rm{\Delta }}}^{2}{(\phi ^{\prime} -t\lambda ^{\prime} )}^{2}}{2})\end{array}\end{array}$$
(33)
$$\begin{array}{lll} & & \begin{array}{ll} & \mathop{\to }\limits_{\varepsilon \to 0}\,\exp \,(\,-\,2{\pi }^{2}\frac{{{\rm{\Delta }}}^{2}{t}^{2}}{{\tau }_{0}^{2}}),\end{array}\end{array}$$
(34)
$$\begin{array}{lll} & & \begin{array}{ll}({\rm{Lr}}): & {e}^{i{\rm{\Delta }}(\lambda ^{\prime} t-\phi ^{\prime} )}\end{array}\end{array}$$
(35)
$$\begin{array}{lll} & & \begin{array}{ll} & \mathop{\to }\limits_{\varepsilon \to 0}\,\exp \,(\,-\,2\pi \frac{{\rm{\Delta }}t}{{\tau }_{0}^{2}}),\end{array}\end{array}$$
(36)
$$\begin{array}{lll} & & \begin{array}{ll}({\rm{Lp}}): & \frac{1}{1-{{\rm{\Delta }}}^{2}{(\phi ^{\prime} -\lambda ^{\prime} t)}^{2}}\end{array}\end{array}$$
(37)
$$\begin{array}{lll} & & \begin{array}{ll} & \mathop{\to }\limits_{\varepsilon \to 0}\,\frac{1}{1+{(2\pi {\rm{\Delta }}t/{\tau }_{0}^{2})}^{2}},\end{array}\end{array}$$
(38)
$$\begin{array}{lll} & & \begin{array}{ll}({\rm{u}}): & \frac{\sinh \,{\rm{\Delta }}(\lambda ^{\prime} t-\phi ^{\prime} )}{{\rm{\Delta }}(\lambda ^{\prime} t-\phi ^{\prime} )}\end{array}\end{array}$$
(39)
$$\begin{array}{lll} & & \begin{array}{ll} & \mathop{\to }\limits_{\varepsilon \to 0}\,{\tau }_{0}^{2}\,\frac{\sin \,(2\pi {\rm{\Delta }}t/{\tau }_{0}^{2})}{2\pi {\rm{\Delta }}t}.\end{array}\end{array}$$
(40)

The expressions are justified in their respective asymptotic limits. In particular, the Gaussian DPD result, Eq. (33), derived via a saddle-point analysis, is valid at $$t\ll {\tau }_{0}^{3}/{{\rm{\Delta }}}^{2}$$. For the Laplacian and uniform DPDs, we used the Laplace method to derive Eqs (37) and (39), which are valid at small disorder. In the case of the Lorentzian DPD, we used the Cauchy integral and integration around the pole at $$\tau ={\tau }_{0}-i{\rm{\Delta }}$$ of Eq. (8) to compute Eq. (35), again valid at small disorder. Note that in this case the slow decay of the tails in Eq. (8) makes the truncation at negative τ in Eq. (32) negligible only at $$t\ll {\tau }_{0}^{2}/{\rm{\Delta }}$$. Later in time, after the $${\tau }_{0}^{2}/{\rm{\Delta }}$$ threshold is reached, Eq. (32) transitions to an asymptotic, 1/t, decay originating from the DPD discontinuity.
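To make the comparison between the four DPDs concrete, the limiting ($$\varepsilon \to 0$$) disorder factors of Eqs (34), (36), (38) and (40) can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function name `mixing_envelopes` and the default parameter values are illustrative assumptions, and the formulas are transcribed as printed above.

```python
import math


def mixing_envelopes(t, tau0=1.0, delta=0.1):
    """Evaluate the epsilon -> 0 disorder factors of Eqs (34), (36), (38), (40).

    The formulas are transcribed as printed in the text; the function name and
    the default values of tau0 and delta are illustrative assumptions.
    """
    a = 2.0 * math.pi * delta * t / tau0**2  # argument shared by (36), (38), (40)
    gaussian = math.exp(-2.0 * math.pi**2 * delta**2 * t**2 / tau0**2)  # Eq. (34)
    lorentzian = math.exp(-a)                                          # Eq. (36)
    laplace = 1.0 / (1.0 + a**2)                                       # Eq. (38)
    # Eq. (40) is sin(a)/a; use its limiting value 1 at a = 0.
    uniform = math.sin(a) / a if a != 0.0 else 1.0
    return {"G": gaussian, "Lr": lorentzian, "Lp": laplace, "u": uniform}


if __name__ == "__main__":
    for t in (0.0, 2.0, 4.0):
        print(t, mixing_envelopes(t))
```

With these illustrative values, the Gaussian factor is the smallest of the four at $$t=4{\tau }_{0}$$, followed by the Lorentzian, Laplacian and uniform factors. Note that the uniform factor oscillates in sign, so this ordering refers to a particular time rather than to all t.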

#### Particle Simulations and Comparison with the Theory

To test our analytic results, we performed particle simulations of the dynamics of Eqs (1), (2) and (3). One first associates with each of the N devices its own relaxation time, τ, drawn i.i.d. from one of the DPDs defined by Eqs (7), (8), (9) and (10). (Negative values of τ are rejected.) Initially, at $$t=0$$, all devices are set to $$x={x}_{\downarrow }$$ and $$\sigma =+\,1$$, corresponding to the “worst case”, i.e., least mixed, initial distribution. The dynamics are then advanced in discrete time steps, independently for each device: at the beginning of each time interval, t, the state of each of the N devices, characterized by σ and x, is advanced in time according to the first-order (in time) version of Eqs (1) and (3). At each t, we monitor $${N}_{\uparrow }(t)$$, the total number of devices in the state $$\sigma =+\,1$$, which also corresponds to the instantaneous energy consumption of the ensemble (under the model assumption that each device, when switched on, consumes the same amount of energy).
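The simulation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for Fig. 4. Because Eqs (1)–(3) are not reproduced in this section, the sketch assumes a stand-in model: the comfort zone is rescaled to $$[0,1]$$ with $${x}_{\downarrow }=0$$, each device moves at a constant speed $$2/\tau$$ in the direction σ, and a device outside the comfort zone switches with probability $$r\,dt$$ per time step. The function name, the Gaussian DPD choice and all default values are hypothetical.

```python
import random


def simulate_fraction_on(n_devices=400, tau0=1.0, delta=0.1, rate=10.0,
                         dt=0.005, t_max=8.0, seed=0):
    """First-order-in-time particle simulation of an ensemble of loads.

    Assumed stand-in dynamics (NOT Eqs (1)-(3) of the paper): comfort zone
    [0, 1], constant speed 2/tau in direction sigma, and Poisson switching
    with rate `rate` once a device has left the comfort zone.
    Returns the fraction of devices with sigma = +1 at each time step.
    """
    rng = random.Random(seed)

    # Draw tau i.i.d. from a Gaussian DPD, rejecting non-positive values.
    taus = []
    while len(taus) < n_devices:
        tau = rng.gauss(tau0, delta)
        if tau > 0.0:
            taus.append(tau)
    speeds = [2.0 / tau for tau in taus]

    # "Worst case" initial condition: all devices at x_down with sigma = +1.
    x = [0.0] * n_devices
    sigma = [1] * n_devices

    n_steps = int(round(t_max / dt))
    fraction_on = [1.0]
    for _ in range(n_steps):
        n_on = 0
        for i in range(n_devices):
            x[i] += sigma[i] * speeds[i] * dt  # explicit Euler step
            outside = (sigma[i] == 1 and x[i] > 1.0) or \
                      (sigma[i] == -1 and x[i] < 0.0)
            # Poisson switching: probability rate*dt per step when outside.
            if outside and rng.random() < rate * dt:
                sigma[i] = -sigma[i]
            if sigma[i] == 1:
                n_on += 1
        fraction_on.append(n_on / n_devices)
    return fraction_on


if __name__ == "__main__":
    frac = simulate_fraction_on()
    print("initial:", frac[0], "final:", frac[-1])
```

In this stand-in model the on and off phases are symmetric, so the fraction of switched-on devices should relax toward 1/2, up to finite-size fluctuations. The small default ensemble size keeps the pure-Python loop fast; a vectorized implementation would be preferable for ensembles as large as those used in the paper.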

The results of the straightforward particle simulations are illustrated in Fig. 4 for four different DPDs. To facilitate comparison, we juxtapose the results of the simulations with the corresponding analytic predictions given by Eqs (33), (35), (37) and (39). We observe very good agreement between the theory and the simulations at short and intermediate times. This conclusion is based on a comparison of the amplitude and the frequency of the oscillations and the relaxation rate of $${N}_{\uparrow }(t)$$. Note that the theory results are derived in the asymptotic, weak disorder regime described by Eq. (32) and its complex-conjugated expression corresponding to the same (worst case) initial condition as in the simulations; hence, there are no fitting parameters. We also observe that at sufficiently large t, controlled by the finite (not infinite) size of the ensemble, the theory and the simulations start to deviate. Indeed, when $$|{N}_{\uparrow }(t)-1/2|$$ becomes of the order of $$1/\sqrt{N}$$, fluctuations associated with the finiteness of the ensemble start to dominate the results of the simulations. In the simulations with $$N={10}^{5}$$, this threshold is reached at $$|{N}_{\uparrow }(t)-1/2|=O({10}^{-3})$$.
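The order of magnitude of this threshold can be checked with a short estimate. As a simplifying assumption (not a statement from the derivation above), suppose that in the mixed state the N devices are independent and each is switched on with probability 1/2. The fraction of switched-on devices then has the binomial standard deviation

$$\sqrt{\frac{(1/2)\,(1-1/2)}{N}}=\frac{1}{2\sqrt{N}}\approx 1.6\times {10}^{-3}\quad {\rm{for}}\quad N={10}^{5},$$

which is consistent with the $$O({10}^{-3})$$ level quoted above.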

Comparing the four subfigures in Fig. 4 with each other is useful because it illustrates the dependence of the ensemble mixing on the type of disorder.

## Discussions

The main conclusion of the manuscript is that both types of randomization, smoothing out the bang-bang control via Poisson-delayed switching and introducing diversity of loads in the ensemble, result in acceleration of the mixing/recovery following a heavy DR use of the ensemble. Specifically, we have shown via rigorous analysis and numerical simulations that (a) increasing the level of control (decreasing the switching rate) is advantageous only at sufficiently large rates, $$r > {r}_{c}$$; and (b) diversity of the devices’ natural timescale (speed of cooling/heating), which is more “regular” (e.g., distributed according to the Gaussian DPD), is advantageous in leading to a faster mixing (more efficient recovery). In this paper we study quantitatively the effect of the load diversity on the speed of the ensemble restoration to normal (steady distribution) following a significant DR perturbation. To the best of our knowledge, this is the first systematic analysis showing and explaining that disorder does provide a significant acceleration. It is also worth mentioning that similar analysis and effects (acceleration of relaxation to the steady state due to disorder) are expected in other applications where an ensemble of particles follows the same stochastic signal. This expectation is consistent with numerical observations made in the context of signal tracking in16, where it was reported that diversity of loads results in a faster tracking. Moreover, many models of population dynamics17,18, and more generally of non-equilibrium statistical mechanics19, have a mathematical structure resembling the one analyzed in the manuscript, thus suggesting that there should be a wide range of possible applications where disorder in the model parameters results in faster mixing.

Encouraged by the reported results, we plan to extend the study in the following directions:

• Complex Modeling. We envision considering more complex models of both the individual device dynamics and the ensemble compilation. For the former, different switching rates (for switching on and off) and a more general dependence of the relaxation speed u on x are two practical complications that can be included in the analysis. For the latter (richer disorder), the most significant generalization corresponds to adding disorder/inhomogeneity in other model parameters, such as switching on/off temperatures. Our working hypothesis is that these modifications/generalizations will lead to (possibly significant) quantitative but not qualitative changes in the predictions.

• Mean-Field, Nonlinear Control. The switching rate, r, communicated by the aggregator to consumers, was constant in the model discussed above. It is interesting to experiment with changing the rate, in particular allowing it to depend on the current state of the ensemble, i.e., on the instantaneous probability distribution in the (x, σ) space. This mean-field control greatly improves the relaxation time, as shown by the team in20. This intricate scenario is related to developing and extending the study to the so-called mean-field games and control21.

• Optimal Control. This manuscript has focused primarily on analysis of the stochastic ensemble with a control. However, the control in this setting was not optimal but rather preset. The natural evolution of this analysis (which would also complicate it) consists of a two-level formulation where the solution of the problem analyzed here is also optimized. For example, one minimizes a cumulative cost including DR tasks (such as tracking a time-evolving consumption signal from the system operator) and the mixing/recovery characteristics of the ensemble investigated above.

• Discrete Phase Space. Given practical constraints in the device resolution, it is natural to reduce the hybrid (continuous-discrete) state space of the analyzed model to a purely discrete space simply by binning the temperature. Moreover, following the logic of22,23, it is practically appropriate to also consider the resulting Markov Process (MP) model in discrete time. In fact, this MP formulation is also practically advantageous for analysis of the aforementioned optimal control, where the problem becomes of the Markov decision process (MDP) type, as in9,24,25. We would argue that the MP and MDP approaches are naturally appropriate and algorithmically attractive to account for the randomization effects analyzed in the manuscript.

• Data-Driven Control. Individual devices included in the aggregation may change their behavior, which then should be accounted for through data-driven identification of device and ensemble parameters26. To track changes in real time and then account for them in the control, one would naturally resort to the data-driven approaches of the reinforcement learning type27, combining learning and control and aimed at developing on-line algorithms for optimal control.

## References

1. Demand response, https://en.wikipedia.org/wiki/Demand_response.
2. Lampropoulos, I., Kling, W. L., Ribeiro, P. F. & van den Berg, J. History of demand side management and classification of demand response control schemes. In 2013 IEEE Power Energy Society General Meeting, 1–5, https://doi.org/10.1109/PESMG.2013.6672715 (2013).
3. McDonald, J. E. & Bruning, A. M. Cold load pickup. IEEE Transactions on Power Apparatus and Systems PAS-98, 1384–1386, https://doi.org/10.1109/TPAS.1979.319340 (1979).
4. Chong, C. Y. & Debs, A. S. Statistical synthesis of power system functional load models. In 1979 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, vol. 2, 264–269, https://doi.org/10.1109/CDC.1979.270177 (1979).
5. Ihara, S. & Schweppe, F. C. Physically based modeling of cold load pickup. IEEE Transactions on Power Apparatus and Systems PAS-100, 4142–4150, https://doi.org/10.1109/TPAS.1981.316965 (1981).
6. Chong, C.-Y. & Malhame, R. P. Statistical synthesis of physically based load models with applications to cold load pickup. Power Apparatus and Systems, IEEE Transactions on PAS-103, 1621–1628, https://doi.org/10.1109/TPAS.1984.318643 (1984).
7. Callaway, D. & Hiskens, I. Achieving controllability of electric loads. Proceedings of the IEEE 99, 184–199, https://doi.org/10.1109/JPROC.2010.2081652 (2011).
8. Angeli, D. & Kountouriotis, P. A. A stochastic approach to dynamic-demand refrigerator control. IEEE Transactions on Control Systems Technology 20, 581–592, https://doi.org/10.1109/TCST.2011.2141994 (2012).
9. Bušić, A. & Meyn, S. Distributed randomized control for demand dispatch. In 2016 IEEE 55th Conference on Decision and Control (CDC), 6964–6971, https://doi.org/10.1109/CDC.2016.7799342 (2016).
10. Chertkov, M. & Chernyak, V. Ensemble of thermostatically controlled loads: Statistical physics approach. Scientific Reports 7, 8673 (2017).
11. Callaway, D. S. Tapping the energy storage potential in electric loads to deliver load following and regulation, with application to wind energy. Energy Conversion and Management 50, 1389–1400, https://doi.org/10.1016/j.enconman.2008.12.012 (2009).
12. Villani, C. Landau damping, Notes for a course given in Cotonou, Benin, and in CIRM, Luminy (2010).
13. Mouhot, C. & Villani, C. On Landau damping. Acta Math. 207, 29–201, https://doi.org/10.1007/s11511-011-0068-9 (2011).
14. Wolfram Research, Inc. Mathematica, Version 11.3. (Champaign, IL, 2018).
15. Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J. & Knuth, D. E. On the Lambert W function. Advances in Computational Mathematics 5, 329–359, https://doi.org/10.1007/BF02124750 (1996).
16. Espinosa, L. A. D., Almassalkhi, M., Hines, P. & Frolik, J. Aggregate modeling and coordination of diverse energy resources under packetized energy management. In Decision and Control (CDC), 2017 IEEE 56th Annual Conference on, 1394–1400 (IEEE, 2017).
17. Eftimie, R., De Vries, G. & Lewis, M. Complex spatial group patterns result from different animal communication mechanisms. Proceedings of the National Academy of Sciences 104, 6974–6979 (2007).
18. Eftimie, R., de Vries, G., Lewis, M. A. & Lutscher, F. Modeling group formation and activity patterns in self-organizing collectives of individuals. Bulletin of Mathematical Biology 69, 1537, https://doi.org/10.1007/s11538-006-9175-8 (2007).
19. Mallmin, E., Blythe, R. A. & Evans, M. R. Exact spectral solution of two interacting run-and-tumble particles on a ring lattice. Journal of Statistical Mechanics: Theory and Experiment 2019, 013204 (2019).
20. Métivier, D. & Chertkov, M. Mean Field Control for Efficient Mixing of Energy Loads. ArXiv e-prints, 1810.00450 (2018).
21. Huang, M., Malhame, R. P. & Caines, P. E. Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inf. Syst. 6, 221–252 (2006).
22. Kamgarpour, M. et al. Modeling options for demand side participation of thermostatically controlled loads. In 2013 IREP Symposium Bulk Power System Dynamics and Control, 1–15, https://doi.org/10.1109/IREP.2013.6629396 (2013).
23. Paccagnan, D., Kamgarpour, M. & Lygeros, J. On the range of feasible power trajectories for a population of thermostatically controlled loads. In 2015 54th IEEE Conference on Decision and Control (CDC), 5883–5888, https://doi.org/10.1109/CDC.2015.7403144 (2015).
24. Bušić, A. & Meyn, S. Ordinary differential equation methods for Markov decision processes and application to Kullback–Leibler control cost. SIAM Journal on Control and Optimization 56, 343–366, https://doi.org/10.1137/16M1100204 (2018).
25. Chertkov, M., Chernyak, V. & Deka, D. Ensemble control of cycling energy loads: Markov decision approach. In Meyn, S., G., S., H., I., S., J. & Samad, T. (ed.) Energy Markets and Responsive Grids: Modeling, Control and Optimization (Springer, Series: Institute of Mathematics and Applications, 2018).
26. El-Ferik, S. & Malhame, R. P. Identification of alternating renewal electric load models from energy measurements. IEEE Transactions on Automatic Control 39, 1184–1196, https://doi.org/10.1109/9.293178 (1994).
27. Reinforcement learning, https://en.wikipedia.org/wiki/Reinforcement_learning.

## Acknowledgements

The work at LANL was carried out under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy under Contract No. DE-AC52-06NA25396. The work was partially supported by DOE/OE/GMLC and LANL/LDRD/CNLS projects.

## Author information

M.C. conceived the project, D.M., I.L. and M.C. conducted the work. All authors reviewed the manuscript.

Correspondence to David Métivier.

## Ethics declarations

### Competing Interests

The authors declare no competing interests.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
