Introduction

Natural disasters suspend normal human activities and cause severe community-level disruptions that exceed capacity in multiple ways1. Recovering the current conditions to pre-disaster conditions in many perspectives is called a post-disaster recovery (PDR) process2. There are two reasons why the PDR process needs high budgets. At first, large-scale catastrophes and subsequent high damages require high budgets3 and workforce4 for the PDR process. The other reason for demanding high budgets and workforce is that boosting the PDR process is affected by various costly but influential factors: (1) a functional state of physical infrastructure5,6,7,8, (2) a functional state of social vulnerability1,9,10,11,12,13, and (3) socioeconomic attributes14,15,16,17.

The PDR process is usually modeled as a complex system due to the interconnected influential factors5,7,9,18,19,20,21,22. Previous studies have constructed the PDR process consisting of the physical network and the social network interwoven by influential elements5,6,7. However, it is hard to capture the functionality of each layer and the linkages across them. Instead, previous studies have used the proxies of two layers. The proxies of physical infrastructure are the moonlight-adjusted nighttime light (NTL) data23,24, electricity data5, and water deficit data19. The other proxy of social infrastructure is normalized visit density (NVD) to certain types of points of interest (POIs) using mobile phone location data19,25,26 coming from the number of visits to POIs. NVD of POIs serves as an indicator of the community recovery, as community activity necessitates visits to POIs27. Previous studies have noted that the aggregation of human mobility is a key factor for the urban vitality index28, which in turn is related to community activity, a main building block of the effective disaster recovery of communities29. Thus, the number of visits to POIs determines the community recovery. This study builds upon the literature and models the PDR process using two proxies of influential factors, NTL data and NVDs of POIs.

Previous studies have focused on correlations between influential factors but did not model causal relationships between influential factors1,8,11,12,13. The limitation of correlation analysis is that it often gives spurious outcomes in the complex systems. Spurious results occur when predicted results and tested counterfactual scenarios do not come from causal relationships30,31,32 in the complex system33. This is mainly because correlation analysis does not capture confounders32. The confounder is a variable that simultaneously affects both a dependent and an independent variable. If we only revealed the associations among factors, we could not determine whether the associated factors are causally related due to the confounding effect. Consequently, we obtain ambiguous relationships among factors that produce spurious results. On the other hand, a causal relationship represents which node affects the other neighbors of the nodes and explains the directed relationship from the cause to the effect. The confounder can be captured by revealing directed relationships among influential factors. For example, in the second section of Fig. 1a, a confounder node Xi1 only reveals the undirected node associations, but we can easily observe the confounding effects in the causally related nodes with directed links. Recent studies have revealed that confounding analysis is indispensable in the era of big data30 and complex systems33, especially in urban areas34, due to many interconnected factors.

Fig. 1: Spatiotemporally sequential recovery plans by causal discovery methods and a framework of this study.
figure 1

a Descriptions of spatiotemporal recovery policy. A spatiotemporal recovery policy is suitable for disproportional recovery impacts between different types of counties. b Descriptions of associations and causal relationships. Spatiotemporal recovery policy requires causal relationships to gain the practical implications of the spatiotemporal recovery policy. c Data preprocessing of POI and NTL data. This study uses the POI dataset and NTL dataset to obtain the NVDs and brightness aggregated by counties, types of POIs, and different time scales. d Descriptions of causal discovery and two grouping methods. We utilize the causal discovery method to obtain causal links. We use two grouping methods to obtain internally validated causal links, the POI-level and county-level node aggregation and simple block bootstrapping merging causal graphs at different time scales. e Model interpretations. We interpret the graph union from three different perspectives: (1) interpreting delays within two nodes to validate the causal links, (2) node centrality to identify the spatial heterogeneity, and (3) hierarchical structures to reveal the causal relationships between different types of counties. Finally, we suggest spatiotemporally sequential recovery sequences.

Through the causal relationship in the complex system, previous studies in several fields have inferred regionally heterogeneous outcomes, such as earth systems35,36,37, human body35,38, local sustainable development39, and urban heat island40. These studies provided perspectives on the usage of causal discovery methods. Causal discovery is a method of revealing the causal relationship between various datasets without randomized controlled experiments, used in the machine learning community30,41,42. Causal discovery considers multivariate features in the time series data with causal graphs represented by nodes (variables in time series data) and directed links (causal relationships between nodes). Through the causal graphs, we can trace the direct and indirect impacts of the nodes, as causal relationships inherently include the time-delayed relationship among regions35,43. This also holds true when we subsample or aggregate the time series data43,44,45. Previous studies have revealed potential causal links between regions and assessed the performance of causal discovery methods by comparing true positive correlation rates. Various methods have been employed, such as Granger causality for human brain regions38, the PC algorithm for country-level air temperature and the elements of the human heart35, and the graphical Lasso regression for continent-level precipitation37 and country-level finance interactions46.

Although different areas explored regional properties through causal discovery methods, previous studies in the PDR management area have not compared urban and rural properties with respect to heterogeneous spatiotemporal government PDR policies. Government PDR policy is a policy for the recovery action or effort from the local or federal governments to mitigate disaster damages. Through the government PDR policy, governments adjust the timing and intensity of the support efforts to meet the effective, efficient, and equitable recovery process2. Some examples missed the appropriate recovery period and suffered recovery failure47,48,49.

Another critical aspect of government recovery policy is the recovery sequences between regions. In this paper, the PDR sequence is defined by the recovery sequence to determine the recovery order by county. Currently, the urban area owns the top priority of the recovery due to the damage difference between urban and rural areas. Specifically, the physical and social infrastructure recovery cost of urban areas is higher than that of rural areas50,51,52. Sometimes, rural areas began the recovery with insufficient resources50,53 by disaster resilience index-based approaches25. This leads to slower recovery in rural areas compared to urban areas, a discrepancy inherently arising from economic inequality54. Despite the importance of timing, intensity, and spatially sequential recovery plans, previous studies have not modeled the sequential and spatial impact of government policies on the PDR process.

Therefore, the spatiotemporal sequential recovery plan should be supported by the directed spatial relationships to consider confounding impacts. The sequential recovery plan is important in different types of areas, including urban, suburban, and rural counties. However, associations between regions cannot represent spatiotemporally sequential recovery plans, while causal relationships can represent spatial directions. In this work, we verify the effects of recovery sequences between urban, suburban, and rural areas with causal analysis. The general logic of the spatiotemporally sequential recovery plan is expressed in Fig. 1a.

In summary, this work has two motivations. First, methodologically, previous studies regarding PDR did not consider the causal relationships between various factors. Previous studies have focused only on intra-level associations between influential factors. Second, practically, previous studies have not focused on spatial heterogeneity in the disaster recovery process based on a data-driven spatiotemporal propagation analysis of PDR impacts.

To bridge these gaps, this study uses causal discovery methods to infer the PDR sequential recovery plans in the PDR process with disaster-related datasets from Hurricane Irma, 2017, for six months from the first day of July (see Fig. 2a). This article finds the causal graphs of different PDR-related datasets: (1) the NVD of certain types of POIs (see Fig. 2c) using mobile phone location data and (2) moonlight-adjusted nightlight data (see Fig. 2d). We use Structural Analytical Modeling (SAM), one of the causal discovery techniques, to infer the directed causal graph. Subsequently, this study uses two grouping methods, POI-level and county-level node aggregation and simple block bootstrapping at different time scales, to obtain the internally validated causal links of the different causal graphs (see Fig. 2b). Based on these results, we reveal the recovery directions between different types of POIs and find how spatiotemporal impacts affect the PDR process. We also obtain data-driven proof of the spatially sequential recovery plans by uncovering the existence of hierarchical structures in a causal graph union and extract the spatiotemporal propagation of the PDR impacts. This framework is illustrated in Fig. 1b.

Fig. 2: Descriptions of datasets and grouping methods.
figure 2

a Descriptions of the spatiotemporal scope of Hurricane Irma, 2017. The black line represents the path of Hurricane Irma. Colored regions identify different types of counties, urban, suburban, and rural counties. The coordinate of the center of this image is N 2832'1” E 27833'5” with CRS EPSG:4269 - NAD83. b Explanations of two grouping methods. We use two grouping methods. One is the county-level node aggregation to obtain internally validated links within one county. The other method is the simple block time series bootstrapping to have internally validated links between different counties. If the number of the same links is not less than 5, we treat it as an internally validated link for stable results. c 10 focused types of POIs and brightness. We select ten different types of POIs and brightness related to disaster recovery. d County-level time series data. The upper figure describes county-level time series data of normalized visit density on restaurants. The lower plot describes county-level time series data of brightness.

Results

We use SAM to model a causal graph \({{{{\mathscr{G}}}}}_{t{p}_{m}}=({{{{\mathscr{V}}}}}_{tp},{{{{\mathscr{E}}}}}_{tp})\) denoted by a set of nodes \({{{{\mathscr{V}}}}}_{tp}\) and a set of links \({{{{\mathscr{E}}}}}_{tp}\). Note that a node represents a county, and a link represents a causal relationship from one node to the other. Specifically, this paper defines the link from node 1 to node 2 as the recovery sequence from node 1 to node 2, meaning that node 1 begins to recover before node 2, and changes in node 1 directly affect node 2 (see Results section for details). We obtain causal graphs at different time scales in Fig. 3b–f. The upper part of each panel describes the results of county-level aggregation, and the lower part depicts the results of POI-level aggregation. This study uses aggregation of POI nodes at the county level and aggregation of county nodes at the POI level. The county-level node aggregation groups ten different POIs’ NVDs into one county node. On the other hand, the POI-level node aggregation groups seven different county nodes into one POI node. In the aggregation step, we obtain internally validated links that imply the recovery sequence after Hurricane Irma, where the direction of a link represents the recovery sequence from a node that recovers first to a node that recovers later. An internally validated link from the grouping methods is defined as an integrated causal link from one node to the other within different causal graphs representing repeated robust results. We draw the internally validated link based on graph union if and only if there are at least five links with the same node and relationships. Threshold five is selected to determine the internally validated link, eliminating 30.6% (56) of the causal links.

Fig. 3: Causal graphs at different time scales.
figure 3

a A causal graph union aggregated from b to f. b A causal graph for July 1st, 2017 - December 31st, 2017 (tp1). c A causal graph for July 1st, 2017 - September 10th, 2017 (tp2). d A causal graph for September 11th, 2017 - September 30th, 2017 (tp3). e A causal graph for October 1st, 2017 - December 31st, 2017 (tp4). f A causal graph for September 11th, 2017 - December 31st, 2017 (tp5). Panels (b) to (f) in this figure have two parts. The upper part shows the county-level aggregation results, and the lower part describes the POI-level aggregation results. g The proportion of types of causal links of causal graphs from panels (a) to (f). Intra-regional links represent the links within a certain county. On the other hand, inter-regional links represent the links between two counties.

We observe two general patterns of internally validated links using POI types. The first pattern is the county-level intra-regional homogeneity which explains the homogeneity of POI trends in one county. We find that a causal graph union has 18.2% of the intra-regional internally validated links, and all causal graphs have more than 10% of the intra-regional internally validated links within eleven nodes (ten POIs and one brightness) from Fig. 3g. Note that the probability of drawing the intra-regional link in the random graph is 13.1%. The high proportions of intra-regional internally validated links indicate that the patterns of NVDs within one county are similar to each other. For further analysis, see Supplementary Table 1 in the supplementary material to check the spatial homogeneity by the correlation coefficients and Supplementary Table 2 in the supplementary material to check the Granger causality test results between types of POIs.

The second pattern is the consistent relationships between different types of POIs, (1) from school POIs to childcare services POIs, and (2) from restaurant POIs to entertainment POIs in Fig. 3a to f. The first type of relationship is interpretable since, during and after the natural disasters, school POIs can substitute childcare services POIs, but childcare services POIs cannot take the role of school POIs supported by the current role of childcare services in some public schools55,56. Therefore, policymakers recovered schools before childcare services. On the other hand, the possible explanation for the interdependence from restaurant to entertainment POIs is that restaurant POIs are an important type of POIs, while entertainment POIs are not necessary during and immediately after Hurricane Irma. Note that restaurant POIs are the necessary social capital for living, and entertainment POIs do not have the first priority for the disaster recovery process of the social capital. This relationship also holds before and after Hurricane Irma (See Supplementary Figs. 2 and 3).

Figure 4b also supports the delayed relationship between restaurant POIs and entertainment POIs during and after the disaster. The red trajectory represents the period around Hurricane Irma from September 8th to September 15th. This red line forms a polygon whose upper part shows the period before and during Hurricane Irma and whose lower part shows the period after Hurricane Irma. Specifically, if the red line has a linear relationship, it represents the prompt and same responses between restaurant and entertainment POIs before, during, and after Hurricane Irma. However, the formation of a polygon indicates a relationship change between two different periods: the pre-disaster period and the post-disaster period. Specifically, in Fig. 4b, the two types of POIs simultaneously decrease in the pre-disaster period, which can be attributed to the equal impact of the hurricane on the two types of POIs. Subsequently, in the post-disaster period, the recovery of restaurant POIs is faster than that of entertainment POIs, suggesting a delayed relationship between restaurant and entertainment POIs. Figure 3a to f further support the delayed relationship between restaurant and entertainment POIs.

Fig. 4: Model interpretations.
figure 4

a Node centrality of the graph union in Fig. 3a. We compare the in-degree and out-degree of the graph union. b A trajectory plot over time between NVD on restaurants and entertainment in Orange County. We compare two different types of POIs in Orange County to validate the causal relationship. A red line represents the points during and right after the disaster. c A trajectory plot over time between NVD on restaurants in Orange County (Urban) and Polk County (Suburban). d A trajectory plot over time between NVD on restaurants in Orange County (Urban) and Lake County (Rural). e NVD time series data on restaurants by three different types of counties.

We obtain universal patterns of the spatial relationships between county nodes based on the graph union (Fig. 3a) over different periods (Fig. 3b–f). We obtain 13 internally validated links from the graph union in Fig. 3a. These internally validated links represent causal relationships in different counties and describe the propagation of the recovery process between counties.

The county nodes are classified into three different types by population density, (1) urban, (2) rural, and (3) suburban counties (see Methods section). We can clearly see the heterogeneity by the types of counties. For example, Fig. 3a shows the interdependence of Orange County with Seminole, Osceola, Brevard, Polk, and Pasco counties. Specifically, we observe the spatial pattern that urban counties’ out-degree centrality is greater than or equal to the in-degree centrality, while rural counties have a higher in-degree centrality than out-degree centrality in Fig. 4a and c also support the spatial heterogeneity of county-level recovery patterns over time. Also, the polygons in Fig. 4c and d explain the interdependence between two counties, like the explanations in the 4b.

The last pattern we observe is the independence between the NTL data and the POI data. The correlation coefficients between POI and NTL data are less than or equal to 0.2 (see Supplementary Table 1 in the supplementary material). Also, the Granger causality test between POI and NTL data does not make any causal links between them (see Supplementary Table 2 in the supplementary material). It implies that the NTL and POI data are uncorrelated. Likewise, the causal discovery methods do not give any implications between NTL data and POI data. This independence implies that the proxies of the physical and social infrastructure are irrelevant. The contrasting independence can be explained for two reasons. The first reason is that the NTL dataset is not a good indicator of representing daily changes in physical infrastructure. For example, the NTL dataset only visualizes the intensity of daily night light, so the data can be unstable due to cloudy conditions and cannot quickly represent the recovery states.

The other reason for the contrasting independence is that Hurricane Irma did not severely damage the county compared to the other hurricanes. For example, counties in Florida overcame most of the power outages caused by Hurricane Irma in one week, compared to Puerto Rico counties that were severely damaged by Hurricane Irma and Maria consequently23. We have discussed the two reasons in detail in the Discussion section.

The above findings on spatial heterogeneity of recovery only compared the in-degree and out-degree centrality between different types of counties. However, they did not give us the overall spatially sequential recovery plans among the types of regions. We find the hierarchical structure of the causal graph union by network motif analysis to reveal the overall spatially sequential recovery plans. Network motif analysis is used to find structural properties by looking at the statistically frequent triads of the entire network57,58,59. Previous studies have stated that the high frequencies of certain types of network motifs imply the existence of a hierarchical structure in the network57. There are three network motifs positively related to the hierarchical structures in Fig. 5a: (1) double-dominant (DD), (2) double-subordinate (DS), and (3) transitive triad (TT)57. On the other hand, frequent pass-along (PA) network motifs support the nonexistence of the hierarchical structure.

Fig. 5: A hierarchical structure in the causal graph union.
figure 5

a Network motifs of hierarchical structures. There are four types of network motifs related to the hierarchical structure. PA is negatively related to the hierarchical structure, and the others are positively related to the hierarchical structure. b Normalized z-score on network motif analysis. We extract all types of triad network motifs and compared them to the networks generated by the Erdos-Renyi model. Red boxes are network motifs related to the hierarchical structure. c A hierarchical structure in a graph union. We relocate all nodes by different types of counties and observe the hierarchical structure with few exceptions in the suburban counties.

We compare the frequencies of the network motifs with the mean frequencies of the network motifs of 100 null graphs generated from the directed Erdos-Renyi random network model with 7 nodes and link probabilities of 0.260. From61, we use the normalized z-score to compare the frequencies of network motifs as follows statistically:

$${Z}_{i}=\frac{{F}_{target,i}-{\mu }_{null,i}}{{\sigma }_{null,i}},$$
(1)

where Zi is the normalized z-score of the i-th network motif, Ftarget,i is the frequency of the i-th network motif on the target graph, μnull,i is the mean of network motifs on the null graph, and σnull,i is the standard deviation of network motifs on the null graph model. The results of network motif analysis in Fig. 5b show the high z-score of DD and DS and the low z-score of PA. It supports the existence of a hierarchical structure. Figure 5c is an example of a hierarchical structure in the graph union. The graph union can be classified into urban, suburban, and rural counties. The urban county class emits the causal link to the other classes of counties, suburban and rural counties. The rural county class receives the link from the urban and suburban classes. The suburban county class interacts with both classes. The overall direction of the graph union goes from urban counties to suburban and rural counties.

(1) Intra-regional homogeneity, (2) inter-regional heterogeneity, and (3) the hierarchical structure revealed from the causal graph union motivate us to establish the data-driven spatiotemporal PDR sequences and propagation of spatiotemporal recovery policy. First, Orange and Seminole counties are the most important counties for PDR based on the hierarchical structure in the graph union. Changes in Orange and Seminole counties directly affect almost all nodes except Lake, and all trends are caused by Orange in this study. Second, the recovery order of Osceola and Lake is relatively low compared to the other counties. Third, Pasco, Polk, and Brevard counties interacted with urban and rural counties. Therefore, we can set the recovery sequence as {Orange, Seminole, (Pasco, Polk, Brevard), Osceola, and Lake}. Note that a brace represents a sequence that has an order within elements. For example, {A, B, C} means that A starts recovery first, and B and C start recovery sequentially. Note that the elements of the brace in the inner parentheses can be interchangeable. On the other hand, we can also set up the recovery sequences by types of POIs, {(Restaurants, Schools), (Entertainment, Childcare services)} since changes in restaurants affect entertainment POIs and changes in schools highly influence childcare service POIs.

Discussion

This study utilized SAM, one of the causal discovery methods, to reveal the spatiotemporal relationships between counties. This study extracted a causal graph of NVD to certain types of POIs and NTL data for periods before, during, and after Hurricane Irma in 2017. This study has shown (1) intra-regional similarity, (2) inter-regional dissimilarity, and (3) a hierarchical structure of PDR between regions to help policymakers set up spatiotemporally sequential recovery plans in PDR.

First, results show the intra-regional similarity in the county-level aggregation. Universal trends of POIs in each county are similar, although they have different properties by the types of POIs. Specifically, the intensity and timing of the PDR process by NVD within the same county are similar in Fig. 4b, supported by the previous studies27,62. Second, we found the inter-regional similarity by types of POIs. Each type of POIs has similar patterns between all the selected counties. For example, restaurant POIs between different counties share similar patterns in Fig. 2d. Universal POI interdependence from restaurant to entertainment POIs and from school to childcare service POIs, is observed in Fig. 3a to f and 4b. It is also supported by previous studies in different disasters, e.g., Hurricane Harvey in 201727. The universal interdependence among POIs implies that the prioritization and necessity of various types of POIs are consistent in the PDR process, as illustrated by the recovery sequences for different types of POIs in Results section. The interdependence can serve as a foundation for developing disaggregated spatiotemporal recovery plans tailored to specific types of POIs.

Third, we found the regional dissimilarity between urban, suburban, and rural counties in the PDR process. This regional dissimilarity is a basis for explaining the urban-rural differences of the PDR process. We observed the regional dissimilarity of in-degree and out-degree by types of counties in Fig. 4a. Also, it is supported by Fig. 4b and e by explaining the spatial heterogeneity and the delays of county-level recovery patterns. It can be identified by the distinct dissimilarity of socioeconomic attributes between urban, suburban, and rural counties in Table 1, such as the populations and budgets they allocated for Hurricane Irma. This observation can be assisted by the previous studies and perspectives of urban-rural differences53,63,64. The delayed relationships among three types of counties extend the interdependence among them to the regional dissimilarity in the PDR period, shown by Fig. 4c and d. Therefore, the regional dissimilarity and the delayed relationships support the internally validated causal graphs in the PDR process, indicating that the NVDs in urban counties began recovering before the NVDs in rural counties.

Table 1 Sociodemographic attributes of selected counties.

Furthermore, this study has shown the existence of the hierarchical structure in the causal graph union. The network motif analysis in Fig. 5b inferred the existence of the hierarchical structure. Figure 5c showed a candidate of the hierarchical structure classified by urban, suburban, and rural county levels. The graph union also has internally validated causal links from urban to suburban and rural counties. This direction implies the temporal dependence of POI recovery between county types.

The independence between the NTL data and the POI data can be explained for the following reasons. Since Hurricane Irma did not cause severe damage to the physical infrastructure system of selected counties due to disaster preparedness, the selected counties did not suffer severe power outages and quickly recovered the original electrical functionality. Therefore, NTL data, proxies of electricity outage, appear independent of POI data, a proxy of social infrastructure. Regarding hurricanes with severe damage, we can observe the evident interdependence between the NTL data and the disaster recovery process1,23.

The findings of this study have significant implications for setting up spatiotemporally sequential PDR plans. The recovery sequence plan is a good foundation for efficient and fast recovery policies. Targeted recovery policies can be developed when policymakers allocate the budget to a group of the nodes affecting a certain county (parents). For example, an increase in NVD in Orange and Polk counties promotes the recovery of Pasco County. Furthermore, Osceola County is directly affected by Orange, Seminole, and Polk counties, but not Brevard or Pasco County. Accordingly, to boost the recovery of Osceola County, we can promote the recovery of interconnected counties - Orange, Seminole, and Polk counties, including Osceola County itself.

This study suggests two different directions of spatiotemporal recovery plans from the above findings. The first direction is to build upon the current recovery policies with special considerations for rural areas. In other words, this policy starts the PDR from urban counties to suburban and rural counties. Simultaneously, this policy allocates PDR support to the selected suburban and rural areas that are highly likely that insufficient recovery actions will be provided. Adjusting the recovery speed by types of counties is necessary to alleviate the consequent dangers of hurricanes between regions. The results of this study, the causal graph unions, identify the suburban or rural areas that are not spatially related to the urban areas in the PDR process. Those suburban or rural areas will be candidates to be considered with special considerations for rural areas. Also, policymakers can establish recovery priorities when the resilience index and the extent of damage in suburban and rural areas are similar. The amounts of budgets to be allocated for the suburban or rural areas are the future direction. Merging considerations for rural areas with current recovery policies is straightforward as policymakers already allocate the budgets based on the fundamental disaster-related statistics and vulnerability indices of counties, such as population density, hurricane damage estimates, and social vulnerability index65, and it naturally yields the urban area first recovery policy. Currently, this pattern can be managed through proactive strategies in the suburban and rural areas during the pre-disaster preparedness66,67, but no such strategies exist in the PDR process. Adding considerations for rural areas to the current PDR process might simplify its implementation.

However, there exists room for more efficient recovery strategies than the first direction. For example, we do not know the holistic recovery impact of one county from the other counties through the first direction. When a certain county is severely damaged, we usually focus on the county itself and ignore the impact of other counties. However, certain types of POIs are interconnected within the county and the other neighboring counties. It implies that supporting the recovery of other neighboring counties indirectly helps the county boost the recovery.

Therefore, we propose an alternative to enhance the PDR of the targeted county by leveraging the indirect recovery impacts from neighboring counties. To boost the recovery of the targeted county, policymakers should initiate the recovery of counties directly affecting the targeted county, similar to the spatiotemporally sequential PDR plans. There are three steps to establishing the spatiotemporal sequential PDR plans. The first step is to identify the counties impacting the recovery of POIs by using the causal graph union. Second, policymakers select the county in urgent need of the PDR process. The final step is to allocate the recovery budgets for the targeted county as well as counties affecting the targeted counties. By considering the indirect impacts of neighboring counties, we can boost the PDR process of the target county. This approach will help policymakers mitigate the recovery equity issues in PDR68,69,70.

This work is suitable for organizing the county-level PDR process in an era of frequent natural hazards. Recent Intergovernmental Panel on Climate Change (IPCC) reports have warned of the concurrent and repeated natural hazards of climate change driven by human activities71. Simultaneously, IPCC stated that the impacts of natural hazards disproportionately affect people and cities vulnerable to natural hazards71, leading to the regional difference among various types of regions and countries. United Nations Office for Disaster Risk Reduction (UNDRR) also highlighted that disadvantaged groups disproportionately incur inequality due to the vicious cycle of climate-related disasters72, making disadvantaged groups vulnerable to natural hazards. However, IPCC and UNDRR cannot organize the fast PDR process in sustainable and resilient infrastructure systems and communities at the county level. This study provides spatiotemporally sequential PDR plans which can boost the PDR process by using mobile phone location data to estimate the state of POIs. Since frequent natural hazards require stakeholders to achieve a faster PDR process than before, spatiotemporally sequential PDR plans can serve as an alternative to current recovery policies.

This study elucidates the new usage of mobile phone location datasets that have not been used for understanding the hierarchy of the PDR. Current opportunities of mobile phone location datasets for the natural hazards are population displacements and evacuation modeling, analyses on migration and recovery, and damage estimates73. This study inspects the spatiotemporal relationships of POIs by counties, which are easily overlooked but are still crucial in the complex dynamical systems of the PDR process. The usage of the mobile phone location dataset empowers revealing the spatiotemporal relationships that cannot be detected before.

This study scientifically contributes to understanding spatiotemporal heterogeneity and how urban counties affect suburban and rural counties in PDR. In particular, state-of-the-art causal discovery methods are used to identify causal relationships between regions. The causal discovery method is also applicable to modeling any inter-regional dependence from the mobile phone location data. Using this method, we can identify how the PDR process spatially disseminates over time. This work also addresses the importance of selecting suburban counties to explain spatial heterogeneity clearly. Furthermore, this study proposes two node grouping methods to clarify the causal relationships and obtain internally validated causal links. These grouping methods help us manage large time series datasets and raw causal links. They also enable the creation of causal graph unions, categorized by different POIs and time scales. Moreover, we reveal the hierarchical structure in the PDR process by network motif analysis. We derive suggestions for spatiotemporally sequential PDR plans from the causal graph union and the hierarchical structure.

We summarize the validation of this study from similar case studies. First, we found the county-level intra-regional and POI-related patterns from Hurricane Harvey27. The relationships between childcare services and schools are supported by public school cases55,56. Second, a systematic review of urban-rural differences in PDR supports the regional recovery heterogeneity among types of counties53. Third, the recovery priority from urban to rural areas is emerging due to the damage difference affected by hurricanes50,51,52.

There are still some limitations to this study. First, this study lacks a measure for model effectiveness as it does not include a ground truth for the causal graph. Using data fusion on the survey dataset regarding the causal relationships between POI types, we can obtain a sample of a ground-truth causal graph and estimate the model. Second, this study focuses solely on counties with minor damage, excluding those with severe or complete damage to POIs and housing. Although Hurricane Irma was one of the most damaging storms, Orlando and its neighboring counties were not severely damaged due to their preparedness for hurricanes based on previous critical experiences on hurricanes. As the objective of this study is to explain the urban-rural differences, we limited our dataset to counties with minor damages. With a dataset on severely damaged urban and rural counties, we can further examine whether the interdependence exists between physical infrastructure and social networks from the causal discovery method. Third, we do not model and quantify the intra-regional and inter-regional recovery impacts. This work only suggests a candidate for the spatiotemporally sequential PDR plans based on time series data. Future research will focus on developing metrics for regional interdependence and integrating them into data-driven PDR planning models.

This research serves as one of the building blocks of the inter-regional PDR dynamics to evaluate the spatiotemporally sequential recovery plans between regions quantitatively. Furthermore, this work will help policymakers develop spatiotemporally sequential PDR plans with limited budgets and a limited workforce.

Methods

Disaster information and spatiotemporal scopes

We select Hurricane Irma as a target disaster to explain the causal relationships between urban and rural counties. Hurricane Irma was a Category 5 hurricane that hit Florida in the United States on September 10, 2017, and dissipated on September 13, 2017. FEMA provided a housing assistance program by Hurricane Irma in Florida for homeowners worth $414 million3. We handle datasets covering the six months from July to December 2017 to investigate the impacts of Hurricane Irma. We point out that the six-month period is sufficient to describe the PDR process from the mobility patterns of POIs27. Mobility patterns from the previous disasters returned to normal within two months - see Supplementary Figure 1.

We select a cluster of urban and rural areas to analyze the inter-regional recovery dependence. Orlando is the largest city affected by Hurricane Irma in central Florida. Therefore, seven counties are selected including Orlando and its surrounding counties: Polk, Orange, Osceola, Seminole, Pasco, Lake, and Brevard counties.

We choose urban, suburban, and rural counties based on the population density of 1,000 people per square mile shown in Table 174. If the population density of a county is greater than 1,000, that county is considered an urban county. Rural counties are selected when the population density is less than 500. The urban counties are Seminole and Orange counties, and the rural counties are Lake and Osceola. Suburban counties in this paper are defined as counties that have both urban and rural counties’ characteristics. We especially treat Pasco and Brevard counties as suburban counties since most of these counties are densely populated, but the county’s population density is less than 1,000. On the other hand, Polk County is also selected as the suburban county since most of the county is sparsely populated, but part of this county contains a highly densely populated area.

Datasets

Daily moonlight-adjusted nighttime light (NTL) data represent the daily nighttime brightness of the surface by Suomi National Polar-orbiting Platform/Visible Infrared Imaging Radiometer Suite (SNPP/VIIRS), preprocessed by National Aeronautics and Space Administration23,24. This dataset contains daily NTL monitoring results at a 500-meter resolution. We utilize NTL data as the brightness of certain regions at night and as an indicator of electricity after the disaster for the physical network. The lower part of Fig. 2d shows the county-level aggregated NTL data over time.

We use the daily visit dataset of POIs to illustrate the social network in the multilayer network structure. The POI visit dataset is provided by Safegraph Inc. (https://www.safegraph.com). POI datasets are commonly used to infer daily human activity26,75. The POI visit dataset is made up of the GPS location information dataset of mobile phones and smartphones in the United States through various apps. It consists of detailed information on POIs, including name, location, types, and brand of POIs based on North American Industry Classification System (NAICS) codes, and the number of visits by day based on GPS location data by counting the number of visits only if the visitor stays there for more than four minutes.

We select 10 different classes of POI to represent PDR: child care services (children), colleges, construction, entertainment, gasoline, groceries, hospitals, religious places (religion), restaurants, and schools19,27. We use J to represent ten different types of POIs.

We convert the number of visits of devices by the census block group to the normalized visit density (NVD) by county. First, we normalize the number of daily visits of mobile devices by county and POI information:

$${X}_{ijk}^{Target}=\frac{{X}_{ijk}}{\sqrt{A(i)}}\times \frac{P(l)}{D(l)},$$
(2)

where \({X}_{ijk}^{Target}\) is the visit density of the k-th POI in the j-th POI class in the i-th county, Xijk is a sequence of the number of daily visits of mobile devices to the k-th POI in the j-th POI class in the i-th county, A(k) is the area of the k-th POI, P(l) is the population in the l-th census block group, D(l) represents the number of devices in the l-th census block group.

Subsequently, we normalize the visit density (\({X}_{ijk}^{Target}\)) by the POI classes:

$${X}_{ijk}^{Norm}=\frac{{X}_{ijk}^{Target}}{{\sum }_{j\in J}{X}_{ijk}^{Base}/n(J)},$$
(3)

where \({X}_{ijk}^{Norm}\) is a sequence of NVD of the k-th POI in the j-th POI class in the i-th county, \({X}_{ijk}^{Base}\) is the normalization factor for \({X}_{ijk}^{Norm}\) by the j-th class of POIs in the i-th county, n(j) is the number of POIs in the j-th class, and J is a set of the selected classes of POIs. Normalization is composed of an aggregation of data from the j-th class of POIs in the i-th county in March, July, and August 2017.

Finally, we use aggregated POIs \({X}_{ij}^{Norm}\) by types of POIs:

$${X}_{ij}:= {X}_{ij}^{Norm}=\mathop{\sum}\limits_{k\in K}\frac{{X}_{ijk}^{Norm}}{n(k)}.$$
(4)

Here, \({X}_{ij}^{Norm}\) is the mean visit density to j-th POI class in the i-th county. We simply suggest Xij as the abbreviation of \({X}_{ij}^{Norm}\). We let K = {k1, k2, k3, . . . , k10} represent a set of POIs of the same class.

Causal discovery (CD) methods

Causal discovery (CD) methods are defined as an alternative to forward causal inference. Conventional forward causal inference aims to quantify causal effects using average treatment effects (E[yt = 1] − E[yt = 0]), where t = 1(0) denotes a treatment (control). The input data for forward causal inference have features (X), treatment (t), and outcomes (y). Furthermore, forward causal inference does not necessarily have ground-truth causal graphs regarding the dataset.

On the other hand, the CD method aims to select causal links among X by checking whether Xi causes Xj, Xi → Xj, Xi, XjX. CD can infer causal graph \({{{\mathscr{G}}}}\) with high-dimensional observational time series data without experimenting with treatment t. It would be helpful to give a ground-truth causal graph to evaluate the model, but it is not required to acquire that graph. This study aims to select causal links among various PDR-related datasets, so we choose the CD method to obtain causal links between regions and POI types.

There are four assumptions for the CD methods to rigorously match the causal graph with the real causal structures41. The first assumption is the causal Markov assumption, meaning that the noise variables of all nodes are assumed to be independent of each other except for the direct causes (called parents) of each node. The second assumption is the causal faithfulness assumption, implying that causal graphical structures must include all conditional independence relationships. The third assumption is the causal sufficiency assumption, which means that there is no external confounder outside of the causal graph. The last assumption is the acyclicity of the causal graph. The causal graph must be a directed acyclic graph that does not have a cycle in the causal graph.

In this paper, we assume the underlying generative model of the data is a Functional Causal Model (FCM)32. Let X = {X1, …, Xd} denote a set of continuous random variables whose causal relationships are of interest. We let FCM be defined by a pair \(({{{\mathcal{G}}}},f)\). Here, \({{{\mathcal{G}}}}\) is a directed acyclic graph (DAG) with nodes denoted by X and f = (f1, …, fd) is a set of d causal mechanisms. In FCM, f can be non-linear or non-parametric. For each variable XjX, we assume it follows the distribution

$${X}_{j}={f}_{j}({X}_{Pa(j;{{{\mathcal{G}}}})},{E}_{j}),$$
(5)

where \(Pa(j;{{{\mathcal{G}}}})\) is the set of parents of Xj in \({{{\mathcal{G}}}}\) and Ej is a random noise variable characterizing the effect of non-observed variables. If Xi is a parent of Xj, i.e., \({X}_{i}\in Pa(j;{{{\mathcal{G}}}})\) and Xi → Xj, variable Xi (the cause) has a causal influence on Xj (the effect) and vice versa.

To learn the DAG representing the acyclic FCM from observational data, we use SAM with the Causal Discovery Toolbox containing the latest CD algorithms41,42. Based on adversarial neural networks, SAM leverages both conditional independencies and distributional asymmetries induced by the causal direction, achieving trade-offs between model complexity and data fitting. The input of SAM is high-dimensional time series data representing each node of the causal graph, and the output of SAM is the weighted adjacency matrix to represent the averaged strength of the causal relationship41,76. The global score for this model has a minimax penalized optimization problem considering model complexity and data fitting loss. This score is executed by a stochastic gradient algorithm with respect to the set of parameters and structural gates to compensate for both model complexity and data fitting loss problems that are commonly raised in the rest of the models. Finally, the algorithm produces an acyclic causal graph (for more details, see41 and Supplementary Figure 4 for the parameter selection process of SAM).

A significant difference of SAM from the other CD methods is the utilization of a generative adversarial neural network to fit the loss function. Putting the variable in the structural gates, SAM designs the conditional generative neural network for the causal mechanisms of a set of nodes. SAM is adequate and accurate in identifying nonlinear causal structures between continuous variables from observational data.

Node aggregation

There are two types of node aggregation, county-level node aggregation and POI-level node aggregation. The upper side of Fig. 2b describes the county-level node aggregation. The raw results of the causal graph yield 11 different nodes in one county, including 10 types of POIs and brightness. These raw results show unclear relationships between regions since some links only represent specific causal relationships between nodes. Therefore, we aggregate all types of POIs and brightness into one county node and draw internally validated links.

First, we use mathematical expressions to explain county-level node aggregation. Define Xi,all as the node representing the i-th county and Xi,all = {Xij, jJ}, where i indicates the i-th county and j means the j-th POI class or brightness, and J is the set of different classes of POIs. When we see at least two links from X1j to X2j for any jJ, we draw an internally validated link from X1,all to X2,all.

On the other hand, POI-level node aggregation uses the same process as county-level node aggregation but utilizes seven different county nodes into one POI node. Specifically, we aggregate all county nodes into one POI node. Taking mathematical expressions, Xall,j is defined by a node to represent j-th POI class or brightness, and Xall,j = {Xij, iI}, where I is the set of counties. When we see at least two links from Xi1 to Xi2 for any iI, we draw an internally validated link from Xall,1 to Xall,2.

We use this node aggregation method in two ways. First, we use this step to visualize the results of the causal graph. Fig. 3b to f represent the visualized results. We choose 2 as the threshold for the number of links to draw the internally validated causal links. On the other hand, we apply this step to the graph union in Fig. 3a. Note that the threshold value 5 is used for the graph union in Fig. 3a. Details of the threshold value are expressed in Simple block bootstrapping of causal graphs at different time scales subsection.

Simple block bootstrapping of causal graphs at different time scales

We perform a simple block bootstrapping step to obtain internally validated results. The lower side of Fig. 2b explains how we aggregate raw results using different time scales to obtain internally validated links. In Fig. 3b, we include the result with data from the entire period. We split the datasets into five different time scales in Fig. 3b to f: (1) the entire period related to Hurricane Irma (from July 1st to December 31st, 2017) expressed by time period tp1, (2) the pre-disaster period (from July 1st to September 10th, 2017) represented by time period tp2, (3) the short-term PDR period from September 11th to September 30th, 2017 by time period tp3, (4) the mid-term PDR period from October 1st to December 31st, 2017 by time period tp4, and (5) the aggregated PDR period from September 11th to December 31st, 2017 by time period tp5. These results are not internally validated when we do not divide the datasets into different time scales. Specifically, it is highly likely that we will obtain causal links that are only applicable to specific time scales or types of POIs.

Let \({G}_{t{p}_{m}}=({{{{\mathscr{V}}}}}_{t{p}_{m}},{{{{\mathscr{E}}}}}_{t{p}_{m}})\) be the causal graph for the m-th time period tpm, denoted by a set of nodes \({{{{\mathscr{V}}}}}_{t{p}_{m}}\) and a set of links \({{{{\mathscr{E}}}}}_{t{p}_{m}}\). Note that each causal graph contains the node aggregation at the county level and POI level, that is, \({{{{\mathscr{V}}}}}_{t{p}_{m}}=\{{X}_{i,all},\forall i\in I\}\cup \{{X}_{all,j},\forall j\in J\}\).

In Fig. 3a, we present the graph union of the raw model output graphs at all time scales. Let \({G}_{U}=({{{{\mathscr{V}}}}}_{U},{{{{\mathscr{E}}}}}_{U})\) be the causal graph union, where \({{{{\mathscr{V}}}}}_{U}\)=\({{{{\mathscr{V}}}}}_{t{p}_{1}}\) and \({{{{\mathscr{E}}}}}_{U}=\{{e}_{{i}_{1},{i}_{2},t{p}_{m}}| {\sum }_{m\in M}n({e}_{{i}_{1},{i}_{2},t{p}_{m}})\ge 5,\forall {i}_{1},{i}_{2}\in I,{e}_{{i}_{1},{i}_{2},t{p}_{m}}\in {E}_{t{p}_{m}},m\in M:= \{1,2,3,4,5\}\}\). This means that when we observe not less than five links from \({v}_{{i}_{1}}\) to \({v}_{{i}_{2}}\) in the causal graphs by different time periods, we add the link \({e}_{{i}_{1},{i}_{2}}\) to the causal graph union. We set 5 as the threshold for the occurrence of links from \({v}_{{i}_{1}}\) to \({v}_{{i}_{2}}\) to fortify internal validation. Note that the threshold (\(th{r}_{{{{{\mathscr{E}}}}}_{U}}\)) for the occurrence of links is calculated by \(th{r}_{{{{{\mathscr{E}}}}}_{U}}=1+median(Occ({{{{\mathscr{E}}}}}_{U}))\), where \(Occ({{{{\mathscr{E}}}}}_{U})\) means the occurrence of links in \({{{{\mathscr{E}}}}}_{U}\), and median(S) represents the median of the occurrence of links S. Fig. 3a represents the causal graph union grouped by simple block bootstrapping77.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.