Introduction

Given the essential role transportation plays in emergency response, provision of essential services, and maintenance of economic well-being1, the resilience of urban road networks to natural hazards, especially flooding events, has received increasing attention2, 3. Floodwaters in urban networks propagate over time and space, inducing a great deal of spatial–temporal uncertainty vis-a-vis protective actions, such as evacuation, and rapid emergency response4. Developing effective prediction tools to forecast the characteristics of flooding events is critical to the enhancement of urban road network resilience5.

Multiple studies have explored the spatial–temporal properties of floods in urban networks, including impact evaluation of environmental stress6,7,8 and cascading effects in road networks9, 10. In particular, empirical studies adopting remote sensors11, hydraulic data12, or satellite images13 have attempted to capture the properties of urban flooding. Temporal evolution of flood status is driven by the time-dependent profile of environmental stress, such as the duration of rainfall in hurricanes12. This temporal information facilitates identification of the outbreak and inflection points for flooding in affected networks. Flooding also exhibits high spatial correlation14 in which the co-located road segments are more likely to be flooded in the immediately succeeding time increments, indexed to digital timestamps on the model15. Specifically, a hydrologic study has shown that proximity to flooded areas is a significant predictor of regional flood frequency based on regionalized flood quantiles for 575 Austrian catchments16. While empirical studies illustrated the complex spatial–temporal dynamics of floods, their capabilities for flood prediction often rely on various types of hydro-geomorphological monitoring datasets and intensive computation17. Due to delay and computational cost issues, the existing physics-based hydrodynamic models may not be conducive to providing timely and reliable predictions for the spatial–temporal spread of floods and the failure of road segments within short time periods18.

To overcome the limitations in empirical models, machine learning techniques have been proposed and tested for predicting the spread of floodwaters in urban areas19, 20. Compared to the hydrodynamic methods, machine learning models, such as multiple linear regressions21, deep neural networks22, and Bayesian forecasting models23 require fewer input parameters, so that the models can be easily trained on historical flood event data. For example, Khosravi et al. tested four decision tree-based machine learning models—logistic model trees, reduced error pruning trees, naïve Bayes trees, and alternating decision trees—for flood mapping24. The results show that, with adequate training, these models can achieve greater than 80% accuracy for predicting the flooded locations. Youssef et al. integrated frequency ratio and logistic regression models to evaluate the correlation between flood occurrence and various potential factors and developed a model providing an acceptable prediction accuracy14. Although existing machine learning models can achieve good predictive performance to capture the flood propagation in urban networks, these models are limited due to their dependence on large sets of historical data for model training. In addition, existing machine learning models are designed to capture only the propagation of flood in urban areas. The flood recession process, which is also important for assessing the resilience of urban networks, is often ignored by the existing machine learning models.

Recognizing the limitations of existing models, there is a real need for mathematical models that can capture the spatial–temporal evolution of flooding without relying on a variety of input parameters and historical data such as the volume of waters and the width of the roads. Recent studies have demonstrated a surprisingly significant similarity among spreading processes in different systems, including the spread of traffic congestion in transportation, the contagion of infectious disease in populations, the diffusion of ideas in social networks25, as well as the evolution of flooding in urban road networks26, 27. Motivated by these studies, our goal in this research was to describe the floodwater spreading process using generalized mathematical contagion models, such as classical epidemic models28. Existing epidemic models offer an analytical and numerical framework to quantify and forecast multiple spread phenomena in a variety of contexts. In particular, the popular susceptible-infectious-recovered (SIR) model created the basic building blocks of epidemic modeling using infectious and recovery rates. These mathematical models have two fundamental hypotheses: compartmentalization, in which each entity is associated with a state or compartment; and homogenous mixing, in which each entity has the same chance of contacting an inflected entity29. In the context of flooding, each road cell is associated with a state, functional-flow or flooded. Hence, the flooding propagation problem fits this compartmentalization hypothesis. While these hypotheses simplify the modeling of contagion by eliminating the need to know the structure of the networks, the mathematical models can still capture the temporal evolution of the fraction of infected entities in the networks very well.

Flood risk prediction is a task that should take into account both the temporal and spatial natures of the floodwater in road networks. Urban flood risk characterization requires not only knowledge of the fraction of flooded roads at each timestamp, but also needs to identify the geographic locations of flooded roads as flooding unfolds. Hence, pure mathematical models are not able to satisfy these requirements. To this end, the network percolation process has gained attention recently because it enables to capture of the propagation process through the topological connectivity in networks30. As defined in percolation theory, the spread of infection relies on the probability of the infected neighbors, in which the heterogenous mixing assumption is held in local components of the networks31. Specifically, the infection spreads from an initial node along edges of the percolated network32. Hence, the percolation process reflects the “amplification” effect of neighbors and weakens global network interactions. An infection is more likely to be transmitted to those the node comes encounters. This characteristic is essential for flood propagation prediction in urban networks since it considers the spatial co-location and constraints of urban networks. Without the temporal information about the fraction of flooded roads, however, the percolation process would fail to capture the temporal evolution of flooding in urban networks.

This study proposed and tested a network percolation-based contagion model that integrates the mathematical framework and network percolation process to predict the spatial propagation and temporal evolution of flooding in urban road networks. The mathematical framework fits the temporal dynamics of the flood situations, and the percolation process identifies locations of flooded road segments. To illustrate the performance of the model, we applied it to a case study of flood evolution in Harris County road networks during Hurricane Harvey. Potential control strategies are also identified based on the outcomes of the model in the case study.

The network percolation-based contagion model

The proposed model is composed of four components: road network modeling, flood spread characterization, flood percolation process, and model evaluation. This epidemic-like model of flood spread process in urban road networks considers both global dynamics of flood scales and local probability of affecting other co-located roads within a set of neighbors.

Road network modeling

Road networks contain hundreds of thousands of road segments, most of which are quite short, about 100–800 m each. The traffic status information collected at the road segment level provides sufficient spatial resolution to precisely estimate the scale of flooding in road networks.

Definition 1.

A road segment is a basic unit with a starting point and an ending point, which can be assembled into a whole road in the order of the points.

Each point is associated with a longitude and a latitude so that it can be located on a geographical map. Then, a road segment can be represented as \(\left( {lat_{s} , lng_{s} , lat_{e} , lng_{e} } \right)\), where \(lat_{s}\) and \(lng_{s}\) are the latitude and longitude of the starting point, while \(lat_{e}\) and \(lng_{e}\) are the latitude and longitude of the ending point.

Although road segments enable good resolution for understanding the situation in urban networks, flooded areas are usually not restricted in a banded road segment. Floodwaters tend to start from a point and spread in all directions. In addition, massive segments and their complex connections in the networks would also cause intensive computational cost. Hence, in modeling road networks, segment-to-segment modeling is limited to follow the nature of flood spread and achieve efficient computation. To this end, grid decomposition, a commonly used method33, is adopted in this study to generate equal-sized cells and divide the study area into small regions (see Fig. 1).

Figure 1
figure 1

A schema of converting flooding status from a road network to grid.

Definition 2.

A road cell is a square over a rectangular projection of the geographical map.

The spatial boundary of the cell can be represented as a set of geo-coordinates \(\left( {lat_{bl} ,lng_{bl} ,lat_{ur} , lng_{ur} } \right)\), where \(lat_{bl}\) and \(lng_{bl}\) are the latitude and longitude of the bottom left corner, while \(lat_{ur}\) and \(lng_{ur}\) are the latitude and longitude of the upper right corner. To covert the road segments to grid, we apply the following criteria to the road network:

$$s_{i} \in grid_{j} ,\,{\text{if}}\,lat_{bl} \le \frac{{lat_{s} + lat_{e} }}{2} \le lat_{ur} \,{\text{and}}\;lng_{bl} \le \frac{{lng_{s} + lng_{e} }}{2} \le lng_{ur}$$
(1)

After grid decomposition, the road network assembled from road segments could be described as a series of cells. In practice, multiple factors would influence the outcome of grid decomposition. For example, a large cell would include many segments, which might lead to losing spatial resolution. Accordingly, the model would further lose the capability of capturing the spatial spread of flood. On the other hand, a small grid that cannot cover at least one road segment will partition the network into discontinuous components, and subsequently increase the computational cost. Hence, grid decomposition requires pre-testing to ensure that a cell in the grid is able to maintain spatial resolution without unduly burdening computation.

Once the road segments are assigned to cells, we remove cells lacking segments to reduce computational cost. The remaining cells form a network in which the cells are considered as nodes, and their shared borderlines are considered as links. By doing so, we can construct an undirected grid network \({\mathcal{G}}\) with average degree of \(k\) to represent the topology of the road networks. Average degree of a road cell is the average number of adjacent cells per cell in the road network. To model the flood propagation and process, we then associate a dynamical binary state variable \(x\) to each of the \(N\) cells (also called nodes) of the grid network \({\mathcal{G}}\), such that \(x_{i} \left( t \right) \in \left\{ {0,1} \right\}\) represents the flood status of node \(i\) at time \(t\). Using a standard notation, we divide the cells into two classes, functional flow \(\left( F \right)\) and flooded \(\left( C \right)\), corresponding respectively to the values 0 and 1 of the status variable \(x\). Functional flow is a state in which traffic can utilize a road (regardless of traffic level). In the context of flooding spread process, the status \(C\) represents the cells which have been flooded. At each time \(t\), the macroscopic flooding situation is given by the fraction of flooded cells \(c\left( t \right) = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} x_{i} \left( t \right)\).

Flood spread characterization

The flood propagation and recession process are temporally and spatially variant. To capture the temporal nature of flood evolution in urban road networks, the proposed model considers macroscopic characteristics to predict the temporal evolution of floodwater spread in urban networks.

In the first step, we define four flooding statuses for a cell: functional flow, exposed, flooded, and recovered, statuses by including the temporal attributes. \(C\left( t \right)\) represents the number of flooded cells in the network at time \(t\); \(F\left( t \right)\) represents the number of functional flow cells at time \(t\); \(E\left( t \right)\) represents the number of cells that are in flood incubation stage (i.e., roads in the path of approaching floodwater but on which traffic still moves) at time \(t\); and \(R\left( t \right)\) represents the number of cells that have recovered from flooding at time \(t\). Given a grid network \({\mathcal{G}}\) with an average degree of \(k\) and \(N\) nodes, each cell in the network has functional flow at time \(t = 0\). That is,\({ }F\left( {t = 0} \right) = N\) and \(C\left( {t = 0} \right) = 0\). The flood initially occurs at a set of nodes and then propagates throughout the network. From a macroscopic perspective, a cell in the undirected network is on average connected to \(k\) other cells. The neighbors of a flooded cell are exposed to flood at a rate of \(\beta\). In modeling the temporal evolution, the connections of the cells are assumed to be homogeneous, which forms the basis to formulate a general differential equation system. Then, the probability of a flooded cell being connected to a functional-flow link is \(F\left( t \right)/N{ }\) at time \(t\). Therefore, a flooded cell comes into contact with \(kF\left( t \right)/N\). Since \(C\left( t \right)\) flooded cells have water flowing, each at a rate \(\beta\), and the exposed cells become flooded at a fixed rate \(\alpha\), the average number of new exposed cells \(dE\left( t \right)\) during a unit timeframe \(dt\) is determined as follows:

$$\frac{dE\left( t \right)}{{dt}} = \beta k\frac{{C\left( t \right)\left( {N - C\left( t \right) - E\left( t \right) - R\left( t \right)} \right)}}{N} - \alpha E\left( t \right)$$
(2)

The exposed cells are not completely inundated, and traffic can still pass through, but they could be flooded within the next few timestamps. We call this stage, flood incubation stage. The defined four statuses of the road cells are the only statuses that a road cell could have. Hence, the sum of the four status variables should be \(N\). That is, \(N = C\left( t \right) + E\left( t \right) + F\left( t \right) + R\left( t \right)\). To capture the fraction of the cells in each status, we use the following variables: \(c\left( t \right)\) to represent the fraction of flooded cells in the grid network at time \(t\); \(f\left( t \right)\) to represent the fraction of functional-flow cells in which all road segments are accessible at time \(t\); \(r\left( t \right)\) to represent the fraction of recovered cells from flooding at time \(t\); and \(e\left( t \right)\) to represent the fraction of cells that are exposed to flooding but still in flood incubation at time \(t\). Then, the differential equation for the change rate of exposed cells can be derived as follows:

$$\frac{de\left( t \right)}{{dt}} = \beta kc\left( t \right)\left( {1 - c\left( t \right) - e\left( t \right) - r\left( t \right)} \right) - \alpha e\left( t \right)$$
(3)

where, the product of \(\beta k\) is called transmission rate or transmissibility. The transmissibility can be used to measure the capability of floodwater to spread in an urban network under the same propagation rate. This also allows us to understand the effects of the topological structure of an urban network on the transmission of floodwaters.

Since a fraction of the functional-flow cells become exposed at each timestamp, the decreasing rate of the fraction of functional-flow cells can be represented by:

$$\frac{df\left( t \right)}{{dt}} = - \beta kc\left( t \right)\left( {1 - c\left( t \right) - e\left( t \right) - r\left( t \right)} \right)$$
(4)

Simultaneously, in a unit timeframe, a fraction of exposed nodes would be flooded at a rate \(\alpha\), and some of the flooded cells recover at a rate \(\mu\). Hence, the changing rate of the fraction of flooded cells and the fraction of recovered cells can be formulated as:

$$\frac{dc\left( t \right)}{{dt}} = - \mu c\left( t \right) + \alpha e\left( t \right)$$
(5)
$$\frac{dr\left( t \right)}{{dt}} = \mu c\left( t \right)$$
(6)

Evidently, in a large-scale network where \(N\) is large, the probability of a flooded cell being connected to a functional-flow cell could be close to zero. At the macroscopic scale, the assumption of homogeneous mixing makes the prediction more tractable and robust. It should also be noted that the model does not include mortality (i.e., significantly damaged roads that would not be functional after the floodwater recedes). Since it is not often the case that a flooded road cell is severely damaged, excluding the mortality is reasonable and realistic. Based on the former constructs, the model component derived for capturing the temporal dynamics of flooding scale using macroscopic characteristics is established (Fig. 2A).

Figure 2
figure 2

The network percolation-based contagion model of flood propagation and recession in urban road networks. (A) The contagion model for capturing the rate of flooding cell; and (B) a schema of network percolation-based flood propagation and receding process.

Network percolation process

The spatial nature of floodwater spread in urban networks is modeled using a network percolation process. The network percolation process describes the contagion effects of a flooded cell on its network neighbors34. Specifically, the cells whose neighbors are flooded are more likely to be flooded than the cells whose neighbors are not flooded29. Similarly, floodwater is more likely to recede first from the cells whose neighbors are not flooded compared to the flooded cells whose neighbors are also flooded. To characterize this spatial aspect of flood spread, we define the probability of a node to be flooded or to be recovered in the next timestamp based on the number of flooded neighbors. The propagation and recession processes are modeled as described below, respectively (see Fig. 2B).

In the propagation process, the percolating cluster is a set of road cells which are flooded next to another bounded by functional-flow cells at a specific time interval. Since the extreme event, Hurricane Harvey in this study, affected a large area, multiple percolating clusters presented in the road network. They were spatially scattered across the network at the early stage of flooding. As the event progressed, some clusters broke through the bound and formed larger percolating clusters. We can obtain the number of flooded cells \(N_{c}^{\left( t \right)}\) at a unit timestamp \(t\) by the predicted fraction of flooded nodes \(c\left( t \right)\) and the total number of cells in the networks. The calculation can be formulated as:

$$N_{c}^{\left( t \right)} = N \cdot c\left( t \right)$$
(7)

The flooded cells at the unit timestamp \(t\) are composed of flooded cells \(N_{c}^{{\left( {t - 1} \right)}}\) at the last timestamp, plus the additional flooded cells \(N_{c}^{\left( t \right)}\) at timestamp \(t\), excluding the recovered cells. In reality, however, the fraction of recovered cells is negligible during the propagation period before reaching the flooding peak. Hence, in our model, the number of additional flooded cells, \(N_{p}^{\left( t \right)}\), in the current timestamp \(t\) is obtained by:

$$N_{p}^{\left( t \right)} = N_{c}^{\left( t \right)} - N_{c}^{{\left( {t - 1} \right)}}$$
(8)

As discussed earlier, the spatial pattern of the propagation process (\(F \to E \to C\)) is controlled by the fraction of flooded neighbors of a cell. Hence, we assign the probability of flooding (i.e., the fraction of flooded cells among all neighbors) to the unflooded cells. Each unflooded cell will be assigned a probability \(p_{i}^{\left( t \right)} \in \left\{ {p_{1}^{\left( t \right)} ,p_{2}^{\left( t \right)} , \cdots \cdots ,p_{k}^{\left( t \right)} } \right\}\), where \(k \le N\). Then the cells are sorted based on their probabilities of flooding from high to low. The additional flooding cells at the timestamp \(t\) are identified from the cells with high probabilities of flooding, subject to the number of additional flooded cells, \(N_{p}^{\left( t \right)}\). Hence, the percolation threshold is selected based on the number of additional flooded cells predicted by the proposed contagion model and the sorted probabilities of flooding to other cells.

Like the propagation process, flood recession (\(C \to R\)) can also be modeled in a way similar to the network percolation process. The percolating cluster is the set of road cells with functional flow next to another bounded by flooded cells. We first calculate the number of recovered cells \(N_{r}^{\left( t \right)}\) at the timestamp \(t\) based on the value of r \(\left( t \right)\) obtained from the flood dynamics model:

$$N_{r}^{\left( t \right)} = N \cdot r\left( t \right)$$
(9)

Floodwater starts receding after the peak of flooding; in actual experience, additional flooded cells usually do not occur. Hence, in the spatial prediction, the number of flooded cells \(N_{c}^{\left( t \right)}\) at the current timestamp \(t\) is equal to the number of flooded cells \(N_{c}^{{\left( {t - 1} \right)}}\) at the last timestamp \(t - 1\), minus the number of recovered cells \(N_{r}^{\left( t \right)}\) at the current timestamp \(t\). The calculation can be formulated as:

$$N_{c}^{\left( t \right)} = N_{c}^{{\left( {t - 1} \right)}} - N_{r}^{\left( t \right)}$$
(10)

In the next step, we assign the probabilities of flooding (i.e., the fraction of flooded cells among all neighbors) to the flooded cells. Each flooded cell is assigned a probability \(p_{i}^{\left( t \right)} \in \left\{ {p_{1}^{\left( t \right)} ,p_{2}^{\left( t \right)} , \cdots \cdots ,p_{k}^{\left( t \right)} } \right\}.\) That is, the probability of floodwater receding is \(1 - p_{i}^{\left( t \right)}\). In a manner different from the propagation process (in which the cells are sorted based on their probabilities of flooding from high to low), we sort the cells based on their probabilities of flooding from low to high (i.e., probabilities of floodwater receding from high to low). The recovered cells at timestamp \(t\) are assigned a low probability of flooding.

Using the above calibration in the percolation process, we can mitigate the homogeneous mixing assumptions in the flood spread characterization model by adopting local heterogenous flood probabilities and achieve high accuracy in predicting the spatial distribution of flooded cells in urban networks.

Model evaluation

We employed two metrics to evaluate the performance of the model for predicting the temporal evolution and spatial propagation of flooding spread in urban networks. The first component of the model is a system of differential equations to capture the magnitude of flooded cells in networks. The objective of this component is analogous to addressing a fitting curve. Hence, we use the root mean square error (RMSE)35 to measure the error of the model:

$$RMSE = \sqrt {\mathop \sum \limits_{i = 1}^{n} \frac{{\left( {\hat{y}_{i} - y_{i} } \right)^{2} }}{n}}$$
(11)

where, \(\hat{y}_{i}\) is the predicted value, \(y_{i}\) is the observed value, and \(n\) is the number of observations. Ignoring the division by \(n\) under the square root, the formula can be considered as a formula for Euclidean distance between the vectors of predicted values and observed values. Hence, the RMSE is a normalized distance between the predicted outcomes and the observations which can be used to evaluate the model accuracy. This can also serve as a heuristic for a training model, which will be used in a pattern search algorithm to obtain the numerical solution for the model (see “Results” section).

The spatial nature of the flooding propagation and receding is captured by the outcomes of the flood spread characterization model and the network percolation process. The intersection between the set of predicted flooded road cells and the set of observed flooded road cells indicates the precision and recall of the model36. They are formulated as follows:

$$Precision = \frac{True positive}{{True positive + False positive}}$$
(12)
$$Recall = \frac{True positive}{{True positive + False negative}}$$
(13)

where true positive is an outcome where the model correctly predicts the flooded class, false positive is an outcome where the model incorrectly predicts the flooded class, and false negative is an outcome where the model incorrectly predicts the unflooded class. The calculated precision and recall allow us to assess the performance of the model by identifying the specific correctly predicted road cells.

Results

Study context and data collection

To illustrate the application and performance of the proposed network percolation-based contagion model for flood spread, we tested the model using high-resolution data related to flooded roads in Harris County during Hurricane Harvey in 2017. Hurricane Harvey was a Category 4 storm that made landfall in Houston (Harris County) on August 26, 2017, dissipated inland August 30, 201737. The torrential rainfall brought by Harvey caused intensive flooding in Harris County, where the floodwaters damaged more than 290 roads and highways38. The flooding occurred August 27 through on September 4, 2017. We collected the traffic data for 19,712 roads in Harris County from INRIX, a private company providing location-based data and analytics. The INRIX traffic data covers all available road roads—from interstates to intersections, country roads to neighborhoods. The data includes the average speed for each road segment in 5-min intervals. The timeframe of data is August 1 to September 30, covering the entire flooding period. In this case study, we used squares with a length of 400 m for generating the grid network.

In the INRIX dataset, the flooded road segments can be identified by a designation of NULL for average speed, meaning there was no vehicle driving through the segment. Comparing the data before and after Hurricane Harvey, we found that the NULL average speed appears only during the flooding period. By cross-checking the flooded roads on government reports, most of the roads with NULL speed were flooded at that time. Although the average speed data was collected in 5-min intervals, the flood situation did not evolve significantly in such a short time period. To better capture the flood propagation and receding process, we aggregated the data at 4-, 8-, and 12-h intervals to test the performance of the model. Each interval was assigned a binary value of flooding status to the road segments based on their average speed. As documented in the model section, the value would be 0 if there is no NULL speed record, and the value would be 1 if there is a NULL speed record in the dataset (indicating functional flow status for the roads).

Pattern search for parameter estimation

It is often the case that the analytical solution to the differential equation system cannot be generated. That is because the functions are usually not continuous or differentiable39. To efficiently estimate the parameters in the proposed model, we applied a global pattern search algorithm40 as a derivative-free numerical optimization method to fit the curves for each variable (i.e., \(f\left( t \right)\), \(e\left( t \right)\), \(c\left( t \right)\), and \(r\left( t \right)\))26. The objective function in this optimization process is the RMSE, which will be minimized by searching for the optimal propagation rate \(\beta\), recovery rate \(\mu\), and exposed rate \(\alpha\). To start with this algorithm, we first specified initial values for the three parameters. The change of the parameters could be either increased or decreased in each step. The algorithm would compute the RMSE for each step until it finds an optimal point at which the current RMSE is smaller than the previous one. If the algorithm cannot find the optimal point using the current step size, it will use half of the current step size and repeat the computing process. Then all three parameters will move to the optimal values. The algorithm runs iteratively until one of the stopping criteria is met: the maximum number of iterations is reached or the step size is smaller than a certain threshold. In this study, we set the maximum number of iterations to be 2,000, and the threshold for the step size is 0.0001. The initial values for models with time intervals are the same, \(\beta = 1\), \(\alpha = 1\), and \(\mu = 0\). Through our tests, the selection of values around these initial values would yield similar results for the parameters.

Table 1 shows the estimated values for the parameters and the optimized RMSE for three models. The RMSE increases a little bit with the increase of the length of the time intervals. That is because the fraction of road cells that are flooded (i.e., \(c\left( t \right)\)) is greater in longer time intervals than in shorter time intervals. But, all three models fit the flood spread patterns in road cells well since the RMSE value accounts for only a very small proportion of the peak value of \(c\left( t \right)\). In addition, the propagation rate \(\beta\) remains the same across three models. This result indicates that the length of the time interval does not affect the capability of the model to predict flooding propagation. The flood incubation rate \(\alpha\) and recovery rate \(\mu\), however, decrease with the increase in the length of the time interval. That is because the number of flooded road cells increases with the increase in time intervals. Hence, a great number of road cells would be in the flood incubation state, which leads to a low rate of \(\alpha\) (Fig. 3D–F). As such, the recovery rate must be small to represent a large number of flooded road cells. This result reveals that the length of the time interval has an influence on parameter estimation, but it does not significantly affect the fitting performance.

Table 1 Estimated values for the parameters in the contagion model.
Figure 3
figure 3

Four-state model of flood propagation and recession in the road network, analogous to the susceptible-exposed-infected-recovered model subject to a time-varying disaster profile. Day 0 is August 26, 2017; and Day 9 is September 4, 2017. (A,D) are the results from the model for 4-h time interval; (B, E) are the results from the model for 8-h time interval; and (C, F) are the results from the model for 12-h time interval.

Prediction results

Once the optimal values for the parameters are obtained, we can capture the temporal evolution of flood propagation and recession in urban network by showing the spread curves for each variable (i.e., \(f\left( t \right)\), \(e\left( t \right)\), \(c\left( t \right)\), and \(r\left( t \right)\)) (Fig. 3D–F).

Basic reproduction number \(R_{0} = \beta /\mu\) is used as a measure of the number of secondary flooded cells generated by the first flooded cell over the course of the flooding unfolding41. Mathematically, when \(R_{0}\) is smaller than 1, the propagation will not occur as the recovery rate is greater than the propagation rate42. In the Hurricane Harvey case study, the estimated \(R_{0}\) is greater than 1 across all three models and increases with the length of time interval (see Table 1). This result explains how rapidly floodwaters spread. The closer the value of \(R_{0}\) to 1, the more stable the flooding situation is. In this case, we can observe that the flood situation would change more slightly if the time interval is shorter than 4 h, since \(R_{0}\) would be close to 1. When the time interval increases from 4 to 8 h, the value of \(R_{0}\) jumps from 1.20 to 2.45, meaning that in every 4- to 8-h increment, the flood situation would change more drastically.

In addition, the changes in \(c\left( t \right)\) allow us to predict the extent of flood in the networks and critical timestamps for breakout and peak (see Fig. 3A–C). The figures show the flood spread characterization in Eqs. (3)–(6) are fit to the empirical flooding data very well. We can observe that at the beginning of the hurricane (the hourly timestamps in day 1 and day 2), the number of flooded cells grew exponentially between every 4-, 8-, or 12-h increment, and reached the peak in the middle of day 3. After that timestamp on day 3, the flood receded gradually. Using the results, we can estimate the time that the urban road network needs to recover from flooding. In this case, evidently, the recovery rate was slow, which led to a long period for recovery at some severely flooded locations.

In the next step, we examined the predictive performance for both spatial spread and temporal evolution for the flooding, shown in Fig. 4. All models perform very well in terms of predicting the specific locations of flooded segments based on the flooding data at the last timestamp. The best performance of these models appears during the peak and receding period, while the performance is not particularly good at the beginning of the flooding. That is because the locations of the initial flooded cells depend on the rainfall magnitude and spatial patterns of precipitation in different areas. The emergence of initial flooded cells is difficult to be identified by the model without additional information about the precipitation pattern. After a few hours, however, the flood propagation and recession closely follow the percolation process since the majority of the initial flooded cells were identified correctly at that time. Hence, the model could predict the spatial spread of flooding after the locations of initial flooding are identified. In terms of the precision metric, the best model is the model based on a 4-h time interval, and with the increase of the length of the time interval, the performance of the model decreases a little bit. In terms of the recall metric, these three models show similar performance and show promise for special and temporal floodwater characteristics, especially during the peak and recession periods.

Figure 4
figure 4

Prediction performance of the proposed model for flood propagation and receding. (A) Precision of the model for different time intervals; and (B) Recall of the model for different time intervals. Day 0 is August 26, 2017; and Day 9 is September 4, 2017.

The three sets of figures in Fig. 5 show the true positive and false positive results for the road segments in the network. Here we converted the cells back to their segments. All segments in the flooded cells are considered flooded segments. We plotted three figures for each model at different timestamps: beginning, peak, and recession periods. As we observed in the figure, the flooding initially occurred at different locations in the timestamp of the beginning period (Fig. 5, left panel). With the continuous rainfall, the floodwater propagated from the initial flooded road segments to their neighbors (middle panel of Fig. 5). After Hurricane Harvey dissipated on August 30, the flood started to recede from the road segments at the edge of the flooded area (Fig. 5, right panel). The figures show that more than 90% of the flooded segments are captured by the proposed model, and the predictive performance decreases a bit with the increase of the length of the time interval. These findings support that the proposed model can accurately predict the spatial and temporal characteristics of flooding spread in urban networks.

Figure 5
figure 5

Examples of three prediction models for different phases of the flooding period (i.e., propagating, peak, and receding phases). (A) the model for predicting the situation in next 4 h; (B) the model for predicting the situation in next 8 h; and (C) the model for predicting the situation in next 12 h. The true positive (yellow) and false positive (red) road segments are all actual flooded road segments from the empirical INRIX data in a specific time interval.

Adaptation of the model

The proposed network percolation-based contagion model could be used to model flood spread in other regions and flooding events. Different extreme events have different attributes such as varying durations, intensities and geographical scales, which result in varying numbers and durations of flooded roads. To demonstrate the adaptation of the model in other contexts, we conducted further controlled experiments to examine the adaptation of the model to different topological properties of urban networks (average degree), the propagation rate which reflects the stress of rainfall, and flood incubation rate and recovery rate. We change the value of one parameter that was learned from the INRIX data and control other parameters to be constant as shown in Table 1. The results show the extent to which different parameters could affect the predicted outcomes. This experiment also provides evidence for supporting the generalizability of the proposed contagion model for different cities and various intensities of flooding.

First, we tested the effect of average degree of an urban network on the flood dynamics by keeping the same values for other parameters. Figure 6A shows the changes in model results for predicting the flood status in 4-h time intervals. With the increase of the average degree \(k\) of the networks, the growth rate of the flooded cells increases significantly during the propagation phase. That is because the network with a large \(k\) will allow the flooded road cells to infect more neighbors, which expedites the spread of flood even when the propagation rate remains the same. In addition, a large \(k\) would decrease the time to reach the peak of flooding since the flooded cell would have more connected neighbors. In contrast, a small \(k\) would lead to a longer period of flooding, although the fraction of flooded cells will not reach to the same flooding scale as the urban networks with a large \(k\).

Figure 6
figure 6

Adaptation of the model to different topological structures of urban road networks (A); different propagation rates (B); different incubation rates (C); and different recovery rates (D), by keeping the other parameters constant.

We further tested the effects of the three macroscopic parameters (i.e., \(\beta\), \(\alpha\), and \(\mu\)) on the predicted flood spread (see Fig. 6B–D). On the one hand, similar to the impact of the average degree, the increase in the propagation rate \(\beta\) and the flood incubation rate \(\alpha\) would decrease the time to reach the peak of flooding and the number of flooded cells at the peak, but increase the time to recover to the normal situation. On the other hand, an increase in the recovery rate \(\mu\) significantly decreases the number of flooded cells at the peak, but the time to reach the peak and the time to recover to a normal situation are not changed significantly. The results of these experiments show that by adjusting parameters, the model can be adapted to different scenarios with various intensities of flood events and topological structures of urban networks.

Discussion and concluding remarks

We have presented a predictive model which integrates flood spread characterization and network percolation processes to forecast the propagation and recession of flood in urban road networks. The model is formulated as a system of ordinary differential equations relying on three characteristics: flood propagation rate \(\beta\), flood incubation rate \(\alpha\), and recovery rate \(\mu\), analogous to the SEIR model. Using the output of the model, the network percolation process is obtained to model the spatial patterns of the flooded road cells over time. The study showed the application of the proposed model in an empirical case study of urban flooding in Harris County road networks during Hurricane Harvey in 2017. The application of the model using empirical data informs about some key findings and implications for urban flood risk prediction.

First, the extent to which flooding builds up in a network and how fast it recovers are shown to be dependent on the basic reproduction number \(R_{0} = \beta /\mu\), which can help us infer the effective time period in response to the flooding in road networks. In the case study, we found that the basic reproduction number changed significantly when the time interval increased from 4 to 8 h. This result indicates that flood situation evolves slightly in time intervals shorter than 4 h, but, the situation changes dramatically for time intervals that are longer than 4 h. Hence, to better monitor the flood in road networks, the status of roads should be observed in every 4 h. While urban networks in different cities may have different topological properties and experience different rainfall magnitudes, the proposed model can be adapted by adjusting the parameters. Results can inform decision-makers about the spatial propagation and temporal evolution of flooding.

Second, the spatial mechanisms of flood propagation and receding in urban networks is captured based on a network percolation process. This finding highlights the contagion effects of a flooded cell on its neighbors. Since the road cells are spatial networks, different from social contact networks, the spread of flood would be limited by geographical constraints. Hence, the local contagion from a cell to another cell through the link connecting them would be the main spreading pattern. Practically, this finding provides an important implication for flood control in urban systems. One effective control strategy would be to increase drainage capacity and to build retention ponds around the roads with a high likelihood of flooding. This strategy can reduce the proportion of flooded cells among the unflooded cell neighbors, which can contribute to reducing the probability of flooding in the next timestamp.

Third, the model and its application show good performance in predicting the scale and locations of flooded road cells in disasters. In addition to the response strategies, this model could also support proactive strategies for coping with future flood events. In particular, the proposed model can be incorporated in an early warning system. The system can help officials and the public be aware of the flood situations in the coming hours so that they can perceive flooding risks surrounding them and make proactive preparations. For example, cars and buses drove through floodwater during Hurricane Harvey43. With the predicted outcomes of this contagion model, people can be informed about roads at a high risk of flooding in the next few hours. This predictive information could significantly contribute to reducing the economic loss of transportation agencies, and possible loss of life of residents.

Finally, the model we introduced in this paper is simple and robust and can be adapted to various phenomena. Unlike hydrodynamic models and machine learning models that rely on significant data and computational resources, the proposed mathematical model provides a simple but powerful tool for predicting the spatial–temporal evolution of flooding in urban networks. Also, the proposed network percolation-based contagion model can be adapted for modeling other network spread phenomena. Future studies can further investigate the proposed model in other general predictive tasks, such as the spread of traffic congestions in road networks44, infectious diseases in human contact networks45, and innovations in global communities46. This model also has some limitations; for instance, for initial flooded segments without flooded neighbors, it is usually difficult to predict these segments without other information. Future studies can focus on improving the model to precisely predict initial flooded road segments based on rainfall magnitude and capacity of urban drainage systems. In addition, the model considers the parameters such as propagation rate being constant throughout the process, in order to provide a simple but fairly accurate tool for flood prediction. Future studies could extend our model by considering the dynamics of the beta to improve the performance of the model.