A spatiotemporal data mining study to identify high-risk neighborhoods for out-of-hospital cardiac arrest (OHCA) incidents

Out-of-hospital cardiac arrest (OHCA) is a worldwide health problem. The aim of the study is to utilize the territory-wide OHCA data of Hong Kong in 2012–2015 to examine its spatiotemporal pattern and identify high-risk neighborhoods. Three techniques for spatiotemporal data mining (SaTScan’s spatial scan statistic, Local Moran’s I, and Getis Ord Gi*) were used to extract high-risk neighborhoods of OHCA occurrence and identify local clusters/hotspots. By capitalizing on the strengths of these methods, the results were then triangulated to reveal “truly” high-risk OHCA clusters. The final clusters for all ages and the elderly 65+ groups exhibited relatively similar patterns. Clusters for the all-ages group were mainly distributed in the urbanized neighborhoods throughout Kowloon. A more diverse distribution, primarily in less accessible areas, was observed among the elderly group. All outcomes were further converted into an index for easy interpretation by the general public. The spatial mismatches between hospitals and ambulance depots (representing supply) and high-risk neighborhoods (representing demand) should be addressed, along with public education and a strategic ambulance deployment plan, to shorten response time and improve the OHCA survival rate. This study offers policymakers and EMS providers essential spatial evidence to assist with emergency healthcare planning and informed decision-making.

and Emergency (A&E) services, (ii) widespread availability of AED, and (iii) enhanced knowledge and ability of bystanders to offer first-aid. Realizing that it would be difficult if not impossible to improve on response time due to worsening traffic congestion in Hong Kong 6 , a practical approach to increasing service efficiency would involve identifying high-risk OHCA hotspots and strategically increasing the number of ambulance depots in these neighborhoods. There is also the need to raise public awareness and knowledge about cardiopulmonary resuscitation (CPR) 11,12 .
The primary goal of this study is to identify OHCA clusters (areas with high OHCA incidents) and high-risk neighborhoods in Hong Kong. Various spatiotemporal data mining/statistical techniques, also known as spatial cluster analyses, were employed in this study: SaTScan's spatial scan statistic 13 , Local Moran's I 14 , and Getis Ord Gi* 15 . High-risk OHCA neighborhoods are defined as local clusters or hotspots, where the frequency of OHCA occurrence is higher than expected. To derive OHCA clusters, the Getis-Ord Gi* makes use of a self-defined threshold distance instead of the inverse distance measure employed by Moran's I or the statistical scanning technique used by SaTScan. We note that different spatiotemporal data mining approaches would yield different patterns of OHCA clusters. Without a consensus on the best clustering method and considering the uneven landscape of Hong Kong, characterized by hilly lands interspersed with waterbodies, the combined use of three methods would capitalize on individual strengths and allow impartial identification of high-risk neighborhoods. Previous literature has also recommended such an approach, and it has become common practice to integrate different methods into one new index so as to improve the results and produce optimum output 16,17 .
The spatiotemporal data mining was stratified by age (all ages vs elderly 65+) and calendar year (2012 to 2015 inclusive). The outcomes of the three clustering methods were subsequently compared and triangulated in order to identify "credible" high-risk OHCA clusters, i.e. those recognized by at least two analytical methods as potential sites for community-based improvements. These "credible" high-risk neighborhoods for all ages and the elderly were further displayed as risk maps to visually evaluate accessibility and sufficiency of medical facilities. Identification of OHCA clusters will assist the Government and EMS providers in better appreciating any misfits in healthcare planning and resource allocation, which is particularly relevant to Hong Kong with its rapidly ageing population.

Methods
Study area. The Hong Kong Special Administrative Region (SAR) of China (Fig. 1) is situated at longitude 114° 15′ E and latitude 22° 15′ N, covering a total land area of 1106.66 km 2 . Hong Kong is one of the most densely populated megacities in the world with a high density of tall buildings and skyscrapers. Its 2020 total population was approximately 7.5 million, which was equivalent to 6,890 persons per km 2 (Census and Statistics Department of Hong Kong, https://www.censtatd.gov.hk/). Its typical subtropical hot and humid weather, along with compact urban living with limited open space, has resulted in severe thermal discomfort and high energy consumption of the city. Figure 1 illustrates a map of Hong Kong, with administrative boundaries of 209
Spatial analysis and statistics. Spatial and spatiotemporal statistics are widely used in disease surveillance to identify geographic areas of elevated disease risk and for early detection of disease outbreaks 13 (SaTScan: version 8.0. Albany, NY, USA, 2018). The study employed SaTScan's spatial scan statistic, Local Moran's I, and Getis Ord Gi* to identify high-risk OHCA clusters. Figure 2 shows that each analysis method was computed separately for each calendar year in 2012-2015 for all-ages cases (boxes shaded in cyan). With a substantial 75.25% of the OHCA cases belonging to the elderly population, a subset of cluster analyses was also computed for this age group (boxes shaded in pink). The results of statistical cluster analyses with hotspots were spatially overlaid and triangulated to yield "credible" high-risk OHCA neighborhoods separately for all ages and elderly groups. These results can be visualized and displayed in map form.
SaTScan's spatial scan statistics. SaTScan™ is free software widely employed for analyzing spatial and temporal data to detect disease clusters, such as HIV 20 , Legionnaires' disease 21 , OHCA cases 22 and tick-borne disease in the U.S. 23 . The study applied a discrete Poisson-based model in SaTScan to examine OHCA cases in circular areas of varying diameters across Hong Kong 20 , assuming that the number of cases in a geographical location follows the Poisson distribution based on a known underlying population at risk 24 . Relative risks were calculated to reveal the risk level of each OHCA cluster. A spatial cluster with a high concentration of point locations would be regarded as a high-risk OHCA area 20 . Monte Carlo simulation using 1000 replications was used to detect statistically significant (p < 0.05) clusters 22,25 . The Gini coefficient was also calculated for each set of clusters to obtain a more refined collection of non-overlapping clusters 26 . These "Gini clusters" were ranked by statistical significance and each was assigned a numerical score. The OHCA clusters defined in this manner were then visualized using ArcMap 10.7.1 for subsequent spatial analysis and triangulation.
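As an illustration of the discrete Poisson scan statistic described above, the following minimal sketch computes the log-likelihood ratio and relative risk of a single candidate window, plus a rank-based Monte Carlo p-value. It is not SaTScan's implementation: the function names and arguments are hypothetical, and it assumes that expected counts have been scaled so that they sum to the total number of cases in the study area.

```python
import math

def poisson_llr(c, e, total):
    """Log-likelihood ratio of one candidate window under the discrete Poisson model.

    c     -- observed cases inside the window
    e     -- expected cases inside the window (assumed scaled to sum to `total`)
    total -- all cases in the study area
    """
    if c <= e:
        # Only windows with more cases than expected count as high-rate clusters.
        return 0.0
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside

def relative_risk(c, e, total):
    """Risk inside the window divided by the risk outside it."""
    if c == total:
        return math.inf  # no cases outside the window
    return (c / e) / ((total - c) / (total - e))

def monte_carlo_p(observed_max, simulated_max):
    """Rank of the observed maximum LLR among the simulated maxima."""
    rank = 1 + sum(s >= observed_max for s in simulated_max)
    return rank / (len(simulated_max) + 1)
```

In a full scan, the window with the largest log-likelihood ratio over all centers and radii is the most likely cluster, and its significance is judged against the maxima obtained from data sets simulated under the null hypothesis.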
Local Moran's I statistic. Local Moran's I, also known as Local Indicators of Spatial Association (LISA), measures similarity among areas and defines neighborhoods with similar values as a cluster 14 . It also identifies dissimilar neighborhoods or spatial outliers. This study used ArcMap 10.7.1 to compute Local Moran's I by considering the spatial distribution of OHCA cases and isolating areas with significantly higher or lower number of clusters than expected. The approach applied the inverse distance weights in establishing spatial relationships for the clustering analysis 14 .
The Moran's I index lies within the [−1, 1] range to denote negative/positive spatial clustering effects, with the zero value denoting a random distribution. Using a 95% confidence level, a statistically significant and positive z-score implies that the surrounding neighborhoods share similar OHCA rates. An area surrounded by neighborhoods with similar high z-score values yields a "High-High" (HH) cluster (i.e., a hotspot). Conversely, an area surrounded by neighborhoods with similar low z-scores is regarded as a "Low-Low" (LL) cluster (i.e., a coldspot). A high-value area surrounded by low-value neighborhoods is a "High-Low" (HL) outlier. In this study, neighborhoods with HH and HL clusters were determined as high-risk OHCA clusters/neighborhoods.
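The quadrant logic behind the HH, HL, LH and LL labels can be sketched as follows. This is a simplified illustration rather than the ArcMap procedure: it assumes row-standardised inverse distance weights between distinct area centroids, uses one common form of the local statistic, and omits the significance test. All function and variable names are hypothetical.

```python
import numpy as np

def inverse_distance_weights(coords):
    """Row-standardised inverse distance weights; centroids are assumed distinct."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.zeros_like(d)
    mask = d > 0                      # an area is not its own neighbour
    w[mask] = 1.0 / d[mask]
    return w / w.sum(axis=1, keepdims=True)

def local_morans_i(values, w):
    """Return the local I of each area and its quadrant label."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()                  # deviation from the global mean
    m2 = (z ** 2).sum() / len(x)      # second moment used for scaling
    lag = w @ z                       # weighted mean deviation of the neighbours
    local_i = z * lag / m2
    labels = np.where(z > 0,
                      np.where(lag > 0, "HH", "HL"),
                      np.where(lag > 0, "LH", "LL"))
    return local_i, labels
```

A positive local I arises when an area and its neighbours deviate from the mean in the same direction (HH or LL), and a negative value marks an outlier (HL or LH).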
Getis-Ord Gi* statistic. The Getis-Ord Gi* statistic (Gi* statistic) measures the degree of spatial clustering with statistical significance at different spatial scales 15,16 . Using ArcMap 10.7.1, the Gi* calculation produces z-scores (GiZScore) and associated p-values (GiPValue) to indicate the statistical significance of a particular neighborhood as part of a spatial cluster of either high or low values. A larger z-score denotes more intense clustering of high OHCA incidents (i.e., a hotspot) whereas a smaller z-score signifies more intense clustering of low OHCA incidents (i.e., a coldspot). Neighborhoods in which hotspots were identified at 90% confidence were selected as high-risk OHCA clusters/neighborhoods.
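For illustration, the Gi* z-score can be computed with the standard formula as sketched below. The sketch assumes binary weights within a fixed threshold distance (each area counts as its own neighbour), a threshold small enough that no area neighbours every other area, and a two-sided 90% cut-off of z > 1.645 for hotspots. The names and the cut-off constant are assumptions made for this example, not values taken from the ArcMap runs.

```python
import numpy as np

def getis_ord_gi_star(values, coords, threshold):
    """Gi* z-score of each area using a fixed distance band."""
    x = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(x)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= threshold).astype(float)     # includes the area itself (d = 0)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wsum = w.sum(axis=1)
    w2sum = (w ** 2).sum(axis=1)
    numerator = w @ x - xbar * wsum
    denominator = s * np.sqrt((n * w2sum - wsum ** 2) / (n - 1))
    return numerator / denominator

Z_90 = 1.645  # assumed two-sided 90% critical value

def hotspots(values, coords, threshold):
    """Boolean mask of areas flagged as hotspots at the assumed cut-off."""
    return getis_ord_gi_star(values, coords, threshold) > Z_90
```

The choice of threshold distance controls the spatial scale of the analysis: a larger band smooths over more neighbours and tends to produce fewer, broader hotspots.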

High-risk index.
A high-risk OHCA neighborhood/TPU was denoted as "credible" if the TPU was rated high-risk by at least two of the three spatiotemporal clustering techniques described above (see also Fig. 2). The spatial clusters for all ages and the elderly groups were processed separately for each calendar year and triangulated using spatial overlay in ArcMap. TPUs rated as high-risk by all three techniques for a given year were denoted as Level 1 TPUs and those rated as high-risk by two of the three techniques were labeled as Level 2 TPUs.
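The triangulation rule can be expressed as a simple vote count. The sketch below is only an illustration of that rule for one age group and one calendar year. It assumes each method's output has already been reduced to a set of flagged TPU identifiers, whereas the study performed the overlay spatially in ArcMap. The function name and identifiers are hypothetical.

```python
def triangulate(satscan, moran, gi_star):
    """Map each "credible" TPU to its level for one year and one age group.

    Each argument is the set of TPU identifiers flagged as high-risk by
    one method. Level 1: flagged by all three methods. Level 2: flagged
    by exactly two. TPUs flagged by a single method are not "credible".
    """
    levels = {}
    for tpu in satscan | moran | gi_star:
        votes = (tpu in satscan) + (tpu in moran) + (tpu in gi_star)
        if votes == 3:
            levels[tpu] = 1
        elif votes == 2:
            levels[tpu] = 2
    return levels
```

Running this once per calendar year and per age group gives the per-year levels from which the counts of Level 1 and Level 2 neighborhoods can be tallied.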
To facilitate easy interpretation by the general public, Level 1 and Level 2 TPUs were assigned scores of 1 and 0.5 respectively. By performing spatial overlay of the clustering results of all four years (2012-2015), the total score for a "credible" high-risk neighborhood/TPU would range between 0.5 and 4, hereafter referred to as the high-risk index of OHCA occurrence. The index was further translated into four ordinal classes (0.5-1.0 = low, 1.5-2.0 = medium, 2.5-3.0 = high, and 3.5-4.0 = extremely high). A high-risk index of "extremely high" (e.g. TPUs identified as high-risk by all three clustering methods in each of the four years) thus denotes "credible" high-risk neighborhoods with a more urgent need for service upgrade. Similarly, a high-risk index of "low" (e.g. TPUs rated as high-risk by only two of the three clustering methods in one or two of the four years) denotes "credible" high-risk neighborhoods of lower priority for service improvement according to available resources.

Results
Table 1 summarizes characteristics of the 19,658 OHCA cases distributed in 209 TPUs in 2012-2015. The mean age of OHCA patients was 75.67 and more than three-quarters were elderly 65+. Males accounted for 55.22% and females for 44.32% of the sample. The majority of OHCA events were not related to trauma (n = 18,747; 95.37%). The rate of "No ROSC" (no return of spontaneous circulation) was high (n = 18,269; 92.93%), and the majority of OHCA events occurred at home (n = 10,383; 52.82%). Nearly 30% (n = 5851) of the OHCA cases resided in a home for the aged (HFA) or a nursing home for the elderly. Table 2 summarizes the number of high-risk neighborhoods/TPUs detected by the three spatial clustering methods. It also shows the number of Level 1 and Level 2 "credible" high-risk neighborhoods. The results show that different methods identified differing numbers of spatial clusters.
These spatial clusters were mapped to examine the degree of agreement or discrepancy in terms of spatial locations. Figure 3 shows results of the spatial overlay by three clustering methods for all ages (column a) and elderly 65+ (column b) for each year in the study period. Significant clusters computed by SaTScan were shaded in maroon; the HH (high clusters neighbored by high clusters) and HL (high clusters neighbored by low clusters) clusters produced by Local Moran's I were represented using gray slanting lines; and outcomes from Getis Ord Gi* (hotspots identified at 90% confidence) were displayed in orange-red crossed pattern.

Analysis of findings by spatial clustering methods irrespective of age.
Referring to Table 2 and Fig. 3, the spatial distribution of high-risk clusters for all ages by different clustering methods appeared to have some agreement as evidenced by the spatial overlap. It was observed that "credible" high-risk neighborhoods for each year tended to concentrate in the Kowloon area (including districts of Yau Tsim Mong, Sham Shui Po, and Kowloon City) with statistically significant high-risk clusters concentrating in Kwai Chung, Kwun Tong, and Kowloon East (refer to Fig. 4 for geographic positions). These places are not only fast paced and densely populated but also local economic and political centers. There exist older neighborhoods with poor road infrastructure and severe traffic congestion from intensified urban development that collectively account for the increase in the overall ambulance response time within the region.
Analysis of findings by three spatial clustering methods for elderly 65+. The distribution of high-risk clusters for the elderly group by different clustering methods yielded quite different results compared with the all-ages group, both in terms of the number of high-risk clusters (Table 2) and the spatial locations (Fig. 3). Close examination of the spatial distribution of OHCA clusters for elderly 65+ revealed a more dispersed distribution in 2014, compared with earlier years, whereby significant clusters were found in the New Territories covering suburban and rural areas. Similar patterns were observed in 2015, except for significant clusters appearing in the Kowloon region. The suburban and rural areas were not within the spatial coverage of the 12-min ambulance service area targeted by the FSD (see Fig. 1). Several "credible" high-risk clusters were detected for elderly 65+ but the number was much smaller compared with that of the full sample. These significant and high-risk clusters were concentrated in Kwai Ching, Tsuen Wan, Yuen Long, and rural villages. The locations of these high-risk OHCA clusters suggest that elderly 65+ individuals tended to reside in older neighborhoods with narrow streets (e.g., Kwai Ching) and less accessible rural areas. The longer EMS response time thus increased the risk of adverse OHCA outcomes.
Table 2. Number of high-risk neighborhoods by spatial clustering methods (SaTScan, Local Moran's I, and Getis Ord Gi*). Level 1 denotes "credible" high-risk neighborhoods identified by all three clustering methods; Level 2 denotes "credible" high-risk neighborhoods identified by two of three clustering methods.

Analysis of findings based on high-risk index.
Figure 4 illustrates results of the high-risk indices for all ages and elderly 65+. The darker shading denotes "credible" high-risk neighborhoods with "extremely high" likelihood of OHCA occurrence. These neighborhoods should receive priority attention for service upgrade if EMS was deemed insufficient. The lighter shading shows "credible" high-risk neighborhoods with lower risks of OHCA occurrence. These neighborhoods are of lower priority for EMS service upgrade, to be considered when resources become available.
The locations of A&E departments and ambulance depots were also displayed in Fig. 4 to reveal current levels of service provision within the 4-min ideal and 12-min targeted spatial coverage. The spatial distribution of A&E departments and ambulance depots is geographically uneven, which is particularly problematic for people living in high-risk neighborhoods. "Credible" high-risk neighborhoods/TPUs including San Po Kong, Kowloon City, and Sham Shui Po lack EMS within the vicinity of the 4-min ideal response time for OHCA 18 .
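The high-risk index mapped in Fig. 4 follows from the scoring defined in the Methods: Level 1 counts as 1, Level 2 counts as 0.5, and the scores are summed over the four years before being grouped into four ordinal classes. A minimal sketch of that arithmetic is given below. It assumes the per-year levels are available as dictionaries keyed by TPU identifier, and the function names and input format are hypothetical.

```python
LEVEL_SCORE = {1: 1.0, 2: 0.5}  # Level 1 and Level 2 scores from the Methods

def high_risk_index(levels_by_year):
    """Sum the yearly scores of each TPU.

    levels_by_year -- one dict per calendar year mapping TPU id to level (1 or 2)
    """
    index = {}
    for levels in levels_by_year:
        for tpu, level in levels.items():
            index[tpu] = index.get(tpu, 0.0) + LEVEL_SCORE[level]
    return index

def index_class(score):
    """Translate a total score into the four ordinal classes."""
    if score >= 3.5:
        return "extremely high"
    if score >= 2.5:
        return "high"
    if score >= 1.5:
        return "medium"
    if score >= 0.5:
        return "low"
    return None  # never rated "credible" in any year
```

For example, a TPU rated Level 1 in all four years scores 4 and falls in the "extremely high" class, whereas a TPU rated Level 2 in two years scores 1 and falls in the "low" class.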

Discussion
Drawing on the strengths of individual spatial clustering methods and integrating the three sets of clustering results, "credible" high-risk neighborhoods can be determined with a high degree of confidence. The study also reveals that the SaTScan and Local Moran's I approaches yielded highly consistent results even for Hong Kong, a metropolitan area known for its high spatial diversity of hilly and interrupted urban landscapes. This integrative approach can be adapted to examine community-based problems and facility distribution in other cities.
The study also shows the need to examine specific population groups, such as elderly 65+, to detect spatial variation of hotspots from the general population. Our findings showed that the OHCA hotspots for elderly 65+ were distributed mostly in older neighborhoods and the less accessible suburban and rural areas where more affordable nursing homes were located 27,28 . This is particularly problematic as first aid assistance for OHCA should be provided within 4 min 18 to increase the survival rate of patients. There is thus a need to improve EMS provision for elderly 65+. Besides strategically increasing the number of EMS facilities 29 in high-risk neighborhoods, other community-based interventions such as improvement of public CPR awareness and education 30 may ultimately improve the survival rate of OHCA, as shown in previous literature 31 .
This study has several limitations. First, the anonymized OHCA records provided by EMS did not include any personal risk factors (e.g. daily activity, socioeconomic status, occupation), health condition or medical history before the cardiac arrest that would have influenced the association between the location of arrest and patient outcome. Second, prior treatment records of OHCA patients were not accessible to paramedics whilst en route to hospitals or upon arrival at hospitals to enable effective treatment that could improve the OHCA survival rate. Third, there is no official and centralized database about locations and conditions of AED in Hong Kong