Limits of Risk Predictability in a Cascading Alternating Renewal Process Model

Most risk analysis models systematically underestimate the probability and impact of catastrophic events (e.g., economic crises, natural disasters, and terrorism) by not taking into account interconnectivity and interdependence of risks. To address this weakness, we propose the Cascading Alternating Renewal Process (CARP) to forecast interconnected global risks. However, assessments of the model’s prediction precision are limited by lack of sufficient ground truth data. Here, we establish prediction precision as a function of input data size by using alternative long ground truth data generated by simulations of the CARP model with known parameters. We illustrate the approach on a model of fires in artificial cities assembled from basic city blocks with diverse housing. The results confirm that parameter recovery variance exhibits power law decay as a function of the length of available ground truth data. Using CARP, we also demonstrate estimation using a disparate dataset that also has dependencies: real-world prediction precision for the global risk model based on the World Economic Forum Global Risk Report. We conclude that the CARP model is an efficient method for predicting catastrophic cascading events with potential applications to emerging local and global interconnected risks.

Alternating Renewal Process. The definition of Alternating Renewal Process (ARP) originated in renewal probability theory. A simple renewal process alternates between two states: the normal state, which accounts for its operational time and the abnormal (failure or repair) state, which accounts for its holding (non-operational) time. Each state is governed by a Poisson process specific to this state 15,16 . The ARP can be used to find the best strategy for replacing worn-out machinery 17,18 .
The CARP model expands ARP in several ways. The most important extension is to allow for defining risks as a network of risks with the given set of weighted edges representing interdependencies and probabilities of state transitions associated with each edge. Since state transitions are often observed without distinction for the inner or external, edge induced transitions, both probabilities are treated as hidden variables, since only their joined effect is observed directly. Another generalization is not restricting the number of states to two. Finally, we associate with CARP an MLE procedure for recovery of CARP parameters from historical data of the system evolution over time.
Model Structures. The city modeled is diverse. A small fraction of houses is large and placed sparsely with low fire propagation probabilities and high recovery rates. A larger fraction of houses with a medium size have a higher spatial density and fire propagation probabilities. Finally, the most densely packed small houses have the highest fire propagation probabilities, and lowest recovery rates.
Houses are grouped into blocks stitched together into a continuous city as shown in Fig. 1(a). Large and small houses sit on the West and East boundaries of the city, respectively. The North and South boundaries border all three types of houses. The houses are placed on rectangular lots whose size is commensurate with the size of the house. A large house occupies a square lot of one by one unit size. There are four such squares horizontally laid out with no space in between. Each medium house has a rectangular lot of half unit length vertically by one unit length horizontally. Each block holds 8 medium houses. Finally, repeating a similar pattern, small houses sit on a rectangular lot of one by a quarter unit size. One block contains 16 small houses. Distances between two adjacent houses vary from 1 for large houses to 1/2 for medium and 1/4 for small houses. A basic block holds 28 houses on 12 square units of land. A city of arbitrary large size can be built by repeatedly adding blocks to it. Figure 1(c) shows the degree of each house, which is defined as the number of houses within a fixed distance. As shown in Fig. 1(a), all houses within a fixed distance may catch fire from the burning center house. Hence, the small houses have a relative higher degree because of the high density of nearby houses. The surrounding houses on the boundary of the city and the border of different communities experience a slightly smaller degree compared with other same-size houses. As we explain later, we assume that, the likelihood of a house catching fire is determined by two factors: the materials the house is made of and the density of its neighbors.

Results
Unlike real-life, in simulations, we can arbitrarily vary the length of time over which we collect historical data and produce many variants of such data to measure the prediction precision of our model. We start with a mixed model with 8 large houses, 16 medium houses and 32 small houses, for a total of 56 houses. The parameter recovery is applied at seven different intervals: 100, 200, 400, 800, 1600, 3200 and 6400 time steps. The parameter recover is run on 50 versions of historical data created by simulations running with different seeds for the random number generator to account for the randomness of the stochastic processes. To quantify the accuracy of parameter recovery, we compute the relative error between the recovered values and the target values used for creating ground truth data.
As shown in Fig. 2, the blue vertical dots represent the relative error of parameter recovery in one realization. There are 50 blue dots for each particular length of the training dataset. The scattering of blues dots represents the variation of parameter recovery accuracy. The red curve is the average value among these 50 realizations, which is very close to zero. Obviously, the variance is a better measurement for the precision since variance considers a Resilience has a variety of definitions, but the common definition is the planning, preparation, absorption, recovery, and adaptation of systems under negative events 11 . Predicting the limits of risk is consistent with the common definition.
Scientific RePoRtS | (2017) 7:6699 | DOI:10.1038/s41598-017-06873-x the positive and negative errors instead of canceling them out. The average of relative errors tends asymptotically to zero and the variance shrinks according to the power law with an increase in sample size which is the length of historical data series in this case. This trend is consistent with the asymptotic behavior of the MLE method 19 . When the sample size is very large, the relative error of realization follows a normal distribution with 0 mean value and a finite yet small variance. More data decreases relative error, and therefore it is useful to find a balance between run time and prediction accuracy.
The relative errors of α and βˆ are larger than that of other parameters. This is due to the combined effects of two Poisson processes causing the same transition. As defined, α and β represent the intensity of internal and external fire ignition processes respectively. In real life, it is often hard to determine the actual reason. During the parameter recovery, these two parameters influence chances of each other to start a fire, which impacts the computation of likelihood function. This nonlinear effect tangles the errors of α and βˆ together as larger value of α can be compensated by smaller value of βˆ and vice versa.
We also compute the standard deviation of relative error of recovered parameters. Figure 3(a) shows that the distribution of the standard deviation follows a power law. The standard deviation decreases very quickly and  Table 2. then slowly as historical data size increases. The double-logarithmic plot in Fig. 3(b) has slope close to −0.5, which shows that the standard deviation decreases in a power-law fashion as the training data size increases. For real-world case processes, it is impossible to get multiple historical datasets for parameter recovery, yet recovery error based on single dataset may be different from the one based on the average of such recovered values on multiple datasets. Hence, we study how sensitive our methodology is to imperfect input datasets. In Intensity for the process of starting fire internally in house i: Intensity for the process of externally transferring fire from house j to house i:     Using parameter values shown in Table 2, parameter recovery was run on 50 different historical datasets generated by simulations with different seeds for the random number generator; the results of these runs are represented by blue dots. The red dashed curves show the average values of the relative error. The visible trend is that the average of relative errors tends asymptotically to zero and the variance exhibits power law decrease as the number of time steps increases, which means more training data improves the performance of parameter recovery.
Scientific RePoRtS | (2017) 7:6699 | DOI:10.1038/s41598-017-06873-x order to test this sensitivity, we compare four cases of simulations in which we record the average length of time in each state and the number of emerging fires during four simulation periods: 400, 800, 3200 and 6400. The first case is using the target values of parameters. The second case is using the averaged value of parameters recovered from 50 independent realizations. The third case employs adding one standard deviation σ to the average value of recovered parameters. The last case is subtracting one standard deviation σ from the average value of recovered parameters. The four sets of parameters employed in our simulation are listed in Table 1.
The average values of recovered parameters come from 50 independent realizations with 6400 time steps of training data. The standard deviations are: σ α = 0.00053, σ β = 0.000372, σ γ = 0.00026, σ δ = 0.0007. When adding one standard deviation σ to average value of α, we subtract σ from βˆ and vice versa. We assume complementary recovery errors on α and β since both of them contribute to the fire triggering process so the maximum value of one is likely to be reached at the minimum of the other. This corresponds to the assumption that the unique historical dataset produces parameter values within one σ of their average values, more stringent assumption might require using broader interval around average values of parameters. In simulation, we record how long each house stays fully operational, on-fire and in recovery and record the number of new fires during the simulation. We generate over 50 different historical datasets and gather results upon reaching five simulation periods: 400, 800, 1600, 3200 and 6400. With each dataset, we run 50 independent realizations to assess the predictability of our methodology. There is a trade-off between the running time and precision. To save computational effort, we set the number of independent realizations at a moderate value of 50. Figure 4 shows that all four parameter estimations have similar precision. In Fig. 4(a), the average duration of the fully operational state is almost the same for four cases since two parameters α and β have opposite influences on fully operational houses. In Fig. 4(b,c), the gap between the cases of estimated parameters ±σ is very small. In Fig. 4(d), the number of emerging fires for all cases are increasing linearly as a function of time. The largest relative error is in predicting the length of fire, and even that is just below 2%. For simulation efficiency, we set the target values of parameters in such a way that the duration of staying in a state is short and of the same order for all states, while in reality they are different, for example for any house staying on fire is orders of magnitude shorter than getting rebuilt. We intentionally set large and comparable target values to generate sufficient state transitions and test the quality of parameter recovery in an arbitrary case. Table 1 shows that the estimated parameters approach the target values very well, even if the target values are of the same order as the initial settings.
In addition to the parameter recovery from different lengths of historical datasets, we study how the precision of parameter recovery varies against the complexity of the system and the standard deviation of number of fires starting each day. Figure 5(a,b) show the relative errors of recovered parameters for various city sizes (number of houses): 28, 56, 112, 224 and 448. Here, we compare two cases. In the first case, we keep three types of houses and use the parameter values shown in Table 1. The blue dots in Fig. 5(a,b) represent the results of the first case. In the second case, we initialize all houses with the same value of N 1,i = 0.3 and N 2,i = 0.3, which are equal to the values for middle houses. The red dots in Fig. 5(a,b) represent the results from this case. Therefore, we remove the influence of house types on the results. In both cases, the length of historical data is 1600 time steps and we run over 20 different datasets. Only two parameters (α and βˆ) are shown in this figure since they are involved with the fire igniting process and another two parameters have similar trend. We find that the parameter recovery in a larger city has a smaller mean value and variance of relative errors. As the city's size increases, there are more state transitions within specific periods, leading to a more precise recovery of our parameters. To study the impact of intensity of emerging fires on recovery precision, we change the values of N 1,i , which define the intensities of fires. We compare three cases: N i 1, in red, N 1,i in blue and N i 1, 2 in cyan. Figure 5(c,d) show the relative error of recovered

Parameter Recovery Precision in Global Risk Network.
Here, we show how to apply the presented approach to a disparate, real-life dataset of the global risk network, which, as with fires in cities, exhibits spreading risk activation 1 . Using CARP, we estimate hidden parameters of global risks previously modeled using an Alternating Renewal Process 1 . Experts from the World Economic Forum 2013 Global Risks Report 20 define the properties of 50 global risks grouped into five categories: economic, environmental, geopolitical, societal, and technological. These assessments include the likelihood, impact of materialization, and connections of each risk. We take the advantage of this crowd-sourcing assessment to build an interconnected network to simulate risk propagation through the system.
In the global risk network, each risk has binary states (normal and active), and the state transitions follow Poisson processes. The difference is that in the global risk network, there are three state transitions instead of four in the fire-propagation model: 1, internally igniting fire process transitioning the risk from normal to active state; 2, externally starting fire process also transitioning risk from normal to active state; and 3, starting recovery process transitioning risk from active to normal state. Hence, we have three control parameters (α, β, γ) and recover the optimal parameter values from historical events. Using data collected over a 156-month period and Maximum Likelihood Estimation, we recover the following parameter values: α = 0.003038, β = 0.00117 and γ = 3.5561. They control internal risk materialization (α), external risk materialization (β) and recovery (γ) processes, and their detailed definitions can be found in ref. 1 . Once we obtain the parameter values, we can simulate the stochastic processes driving model evolution. As in the case of the fire-propagation model presented here, we use these recovered parameters as the ground truth parameters for establishing the parameter recovery precision.
We apply the same method used in Fig. 4 to this global risk network. For the precision evaluation, we obtain multiple recovered-parameters from different lengths of alternative historical data to predict the number of risk materialization for a selected period repeatedly with different random generator seeds. First, based on the ground truth parameters recovered from the real historical data (α, β, γ), we generate 120, 240, 480 and 960 monthly time steps of alternative historical data (representing 10, 20, 40 and 80 years) for parameter recovery. Then, 125 cases of parameter recovery (α i , β i , γ i , i = 1 … 125) are finished for each period of time. Next, using the recovered parameters from both real and alternative historical data, we complete 20 realizations for a prediction of 4 periods: 120, 240, 480 and 960. In each period, we use the average value of risk materialization among all 20 realizations to measure the performance of each simulated case. After that, we compute the relative error of the average number of risk materializations between the case with alternative recovered-parameters (α i , β i , γ i , i = 1 … 125) and the ground truth parameters (α, β, γ). In the end, we remove the 39 simulation cases with the worst performance (with the largest absolute relative error). The remaining 86 cases (68.8% of 125 cases) of results determine the ±σ boundary for the performance of the system with estimated parameters. Figure 6(a) shows a histogram of the number of materializations in each case. Dash curve represents a Gaussian fitting over the histogram. As the length of simulation period increases, the distribution of the number of materialization gradually approaches a normal distribution. Meanwhile, the distribution shifts to the right and gets close to a steady level. In the case with longer time steps for parameter recovery and prediction simulation, the predictability is more consistent, and variance shrinks faster. Figure 6(b) shows the boundary of ±σ performance for the number of materializations in each period. It is evident that the distance between the boundaries is decreasing as we increase the length of the period, which implies a higher precision of prediction. Figure 6(c) shows the number of materializations for each risk in the case of 960 time steps. The difference between three cases is slight indicating a consistent prediction of estimated parameters.

Discussion
The CARP model is used to simulate and then recover parameters of heterogeneous stochastic processes. First, we created a model of fires in the cities that we use to illustrate our approach. Using assumed parameters values, we generated several historical datasets and used them to measure parameter recovery precision. The results confirmed that the accuracy of our method increases as the amount of data increases even in the presence of parameters hidden from direct observations.
Applying our approach to real-life cases, we started with the recovery of the model parameter values based on unique and limited real-life ground truth data. Then, using these values as ground truth, we finished simulations to create many alternative historical datasets. Then using these historical datasets, the parameters are recovered by applying MLE method. Next, we compared the results to the assumed ground truth values to measure the accuracy of recovery. The standard deviation of relative error of parameter recovery exhibits a power-law decay with an exponent value of −0.5 as the training data size increases. The resulting statistics enable us to verify the reliability of predictions based on originally recovered parameters. We did so by comparing original predictions to predictions based on parameters differing from their average values by the desired multiple of their standard deviations as was demonstrated for the city model. We record the duration of each state (normal, on-fire, and recovery) and the number of emerging fires within the simulated period. The largest relative error of these variables is just below 2%.
In conclusion, we showed that the CARP model is a novel approach to predict and simulate risks. It is particularly useful for modeling cascading catastrophic events and thus has potential applications for analyzing local and global risks. Local risks were demonstrated using simulated fires in cities. However, the CARP model was also successfully used to model global risks in earlier work 1 . A better understanding of globally networked risks is critical to predicting and mitigating them 21 . Most of the world's critical infrastructure forms a complex, interconnected network prone to cascading failures with potentially devastating consequences to global stability 22 . Quantifying the limits of risk prediction, which are bounded by the amount of data, may inform earlier planning and thus potential mitigation of danger of risks spreading and its adverse consequences.

Methods
Discrete Model. Based on the structure of the fire-propagation model, we can simulate the fire cascades throughout the entire network using CARP. At time t, each house is either in state −1 (recovery), state 0 (fully operational) or state 1 (on-fire). Houses in the recovery state are under reconstruction and are immune to fire. Hence only fully operational houses are susceptible to fire. The burning (on-fire) state with certain probability switches to a state of recovery. Each house alternates between these three states.
The state transition is invoked by four types of Poisson processes. A house i transits from state 0 to 1 (on-fire) for internal reasons according to a Poisson process with the intensity λ i int . In this event, a fire starts inside of the house, caused by events such as overheating of electrical appliances, unattended stoves or other possible accidents. This house can make this transition if its neighbor j ignites the fire externally through a Poisson process with the intensity λ ji ext . A transition from state 1 (on-fire) to −1 (recovery) also follows a Poisson process, with the intensity λ i etg (extinguishing the fire). Finally, transition from state −1 to 0 is caused by a Poisson process with intensity λ i rec (completing the recovery process). For all the events discussed above the exact time of occurrence is not known, hence we use a discrete time step to accommodate this uncertainty and round up the event time to the nearest integer step. Hence, the evolution of the system can be viewed as a discrete-time series of stochastic processes with three states. For convenience, we assume here that each time unit represents one day of the real world. Consequently, all fires starting on the same day are considered to be starting simultaneously. As shown in ref. 1 for real time t measured in finite time units (days here) for events generated by a Poisson process, an event happening in at most ⌈ ⌉ t time units is identical for the corresponding Bernoulli process. The states of all houses at time t can be represented by a state vector → S t ( ). Here, like in ref. 1 , we assume that experts provide assessments of each house's fire resistance. The value of N 1,i represents the likelihood of house i to catch fire internally or externally, and N 2,i represents the likelihood that the house fire is extinguished and house is rebuilt over large period of time. In the following, we define control parameters for state transitions with details listed in Table 2. The internal risk materialization is controlled by parameter α. The external risk materialization is controlled by parameter β. However, we assume that only the state transition from fully operational to on-fire state is observable, without knowing the actual reason for it. So, impact of an individual parameter α or β is hidden from direct observation. In contrast, parameters γ, controlling recovery and δ controlling fire extinguishing are independent of each other, so impact of each of the two can be observed in the changes in the evolution of corresponding underlying process. We list events and parameters in the order compatible to the way they were ordered in the World Economic Forum model 1 in which only three events (int, ext and rec) and parameters (α, β, γ) exist, as defined in Table 2. We set N 1,i = 0.4 for large, 0.3 for medium and 0.2 for small houses and N 2,i = 0.2 for large, 0.3 for medium houses and 0.4 for small houses. We assign target values to our parameters α = 0.08, β = 0.012, γ = 0.016 and δ = 0.032.
To summarize, the dynamics progress in discrete steps t = 1, …, T and the probability of transition in each step is defined by the intensity of the corresponding Poisson process as shown in Table 2. The dynamics described above and shown in Fig. 1(b) imply that the state of the system at time t depends only on its state at time t − 1, and therefore the evolution of the state vector → S t ( ) is Markovian. The definitions of parameters and equations mapping them into intensities of Poisson processes are listed in Table 2.
With the assumed expert assessments and the created ground truth parameter values for the model, we simulate the evolution of house states for a particular period and record the frequency of emerging fires. In Fig. 1(d), we show the fraction of time each house is on-fire during one million time steps. The range of this on-fire fraction varies from 0.12 to 0.32. The fraction increases when the size or degree of the house increases, but areas of higher density housing suffer higher on-fire fractions than indicated by their degree.

Precision Limit of Maximum Likelihood Estimation (MLE). In our approach, we use Maximum
Likelihood Estimation (MLE) to recover parameters from ground truth data. The main reason for this choice is that state transitions are governed by independent inhomogeneous probability distributions for which MLE delivers consistency and asymptotic normality with sufficient amounts of observed data 26 . The historical data represents the combined effects of four Bernoulli processes. Our purpose is to recover the unknown parameters α, β, γ and δ mapping the expert assessments into event probabilities for Bernoulli processes of our model. We denote α β γˆ, , and δˆ to be the recovered (estimated) values of each parameter. The use of MLE to find the values of hidden parameters from observed events has not been studied, yet ref. 23 indicates that it is feasible. In our approach we split one of the parameters of directly observable events into a pair of hidden (and tangled) parameters of two processes and recover these two parameters from observed events.
We denote the unknown parameters as θ. Given the n observations x 1 , x 2 , … x n , the likelihood function of this set of observations is defined as: When the distribution is discrete, f is a frequency function, defined explicitly by Eq. 7, and the likelihood function L(θ) shows the probability of observing the given data. The Maximum Likelihood method 5 finds the values of parameters that yield the maximal probability of observing the given data. Logarithms are monotonic and therefore the likelihood and its logarithm have maximum at the same argument. Since the observed data comes from independent distributions, the logarithm of likelihood function can be written as: i n i 1 For continuous and smooth likelihood functions, which is the case here, we can scan the parameter space in order to find the maximum point for θ ln ( )  . The historical data size is limited, so we need to study how this limitation affects the precision of results. In this section, we derive an estimation of the optimal parameters for large sample sizes. Under appropriate smoothness conditions, the estimate is consistent with large data sets and obeys the asymptotic normality. Detailed description of these conditions can be found in ref. 19 . These conditions can be summarized as: the first three derivatives of the Let θ 0 denote the true value of the parameter θ and θˆ to be its recovered value. Once the smoothness conditions are met, the recovered value θˆ converges to the true value θ 0 as n → ∞. If we normalize θˆ, we obtain an approximation from a normal distribution:σ Given the definition of Fisher information 24 (I(θ)): The asymptotic normality of MLE can be written as: n 0 0 The variance of the normalized estimate decreases as I(θ 0 ) increases. This asymptotic variance to some extent measures the quality of MLE. Although it is hard to compute the variance analytically in our model, we know Scientific RePoRtS | (2017) 7:6699 | DOI:10.1038/s41598-017-06873-x there exists an estimation limit and the performance of MLE becomes better as the volume of given data increases. In the next section, we demonstrate that the variance indeed decreases as the size of the training dataset increases and the mean values of θˆ approaches θ 0 with the error approaching 0.
Parameter Estimation in Fire-propagation Model. Given the historical data about each house transition in a finite period, we use Maximum Likelihood 5 to find values of model parameters that make the model optimally match historical data. State transitions are independent functions with unknown parameter values. Since the historical data is generated from the discrete stochastic process, the likelihood of observing a particular sequence of risk materializations can be written as: In the training process, we compute the probability of state transitions using p i int and p ji ext for transition into fire, p i etg for transition into recovery and p i rec for transition into the fully operational state. Correspondingly, there are three cases of state transitions in the historical data.
A transition from the fully operational state to the on-fire state (0 → 1) happens when a house catches fire due to internal or external reasons. The probability of internal ignition is p i int . For external influence, we compute first the probability that none of the neighbors ignited this house, and then take the complement of this value. Neighbor j fails to ignite house i with probability − p 1 ji ext . Taking the product over all on-fire neighbors gives us the probability that there are no externally triggered fires: where N i is the set of all on-fire neighbors of house i. The complement of this product defines the probability that at least one neighbor ignites house i. Adding the probability of such external ignition to the probability of internal ignition, we obtain the total probability of a house catching fire: The maximum likelihood parameters are obtained by summing the logarithms of corresponding probabilities. After scanning the potential ranges of the model parameters, we find the globally optimal values that maximize the likelihood of the historical data. The closeness of the recovered parameters to their values set in the simulations measures how precisely our model captures the dynamics of the system.