An analysis of the dynamic spatial spread of COVID-19 across South Korea

The first case of coronavirus disease 2019 (COVID-19) in South Korea was confirmed on January 20, 2020, approximately three weeks after the first COVID-19 case was reported in Wuhan, China. By September 15, 2021, the cumulative number of cases in South Korea had reached 277,989. Given this sustained growth, it is important to better understand geographical transmission and to design effective local-level pandemic plans across the country over the long term. We conducted a spatiotemporal analysis of weekly COVID-19 cases in each administrative region of South Korea from February 1, 2020, to May 30, 2021. For the spatial domain, we first covered the entire country and then focused on the metropolitan areas of Seoul, Gyeonggi-do, and Incheon. Moran’s I and spatial scan statistics were used for the spatial analysis, and the temporal variation and dynamics of COVID-19 cases were investigated with various statistical visualization methods. Using this range of methods, we found time-varying clusters of COVID-19 in South Korea. In the early stage, the spatial hotspots were concentrated in Daegu and Gyeongsangbuk-do; in December 2020, metropolitan areas were detected as hotspots. Our time-varying spatial analysis of COVID-19 across the whole of South Korea over a long-term period provides a powerful approach to demonstrating the current dynamics of spatial clustering and to understanding the dynamic effects of policies on COVID-19 across the country.

www.nature.com/scientificreports/
COVID-19 control policies at the local administrative level have been implemented based on the volume of cases. Spatial and temporal analysis of confirmed COVID-19 cases at the local administrative level makes it possible to visualize the dynamics of the disease, which may help us understand epidemics of this newly emergent infectious disease. In many countries, various spatial or spatiotemporal analyses of COVID-19 have been performed to understand the characteristics of epidemics and evaluate public health policies. In China, the spatial spread of COVID-19 cases at the early stage was investigated 3,4, and the spatiotemporal characteristics of COVID-19 transmission in 31 provincial-level regions and 337 prefecture-level cities were examined 5. In the United States, the dynamic spatial spread of COVID-19 at the state level was analyzed using metric geometry 6, and spatiotemporal clusters of county-level daily COVID-19 cases were detected from January 22 to March 27, 2020 7. Additionally, comparisons of COVID-19 cases in rural and urban areas showed different temporal and spatial distributions 8,9. In the UK, the spatial distribution of COVID-19 cases was explored and regional outbreaks were detected 10. The spatiotemporal distribution of COVID-19 infection has also been explored using unaggregated data 11. Daily COVID-19 cases and deaths in Brazil were used to explore their spatial patterns 12. The spatiotemporal distribution of local-level COVID-19 cases in Italy was modeled, revealing a significant impact of strict control policies on the spread 13.
Several studies have examined spatially dependent effects or detected spatial clusters using Moran's I statistics and spatial scan statistics in China 3-5 and Iran 25,26. Additionally, the spatial association between COVID-19 and the government response in South Korea was assessed for the early stage, from January 20 to May 31, 2020 27. Since the COVID-19 outbreak in 2020, the spatial diffusion and patterns of COVID-19 have varied dynamically, depending mainly on control policies, human mobility, and the epidemic mechanism. When outbreaks or high-risk spatial clusters grew, the government may have implemented stronger social distancing policies at the national level or in high-risk areas to control transmission and reduce the spread of the virus. It is therefore important to investigate the dynamic spatial patterns of COVID-19 over a longer period.
In this study, we conducted a spatiotemporal analysis of confirmed COVID-19 cases across South Korea from February 18, 2020, to May 31, 2021, to investigate the spatial and temporal variations in COVID-19 and identify the temporally varying spatial cluster patterns of COVID-19 in South Korea.

Data and methods
Data sources. To investigate the spatial dynamics of COVID-19 cases across South Korea, we needed the number of daily or weekly COVID-19 cases at the district level (si/gun/gu). However, a district-level COVID-19 dataset covering all of South Korea was not publicly available, and no nationwide district-level figures had been published. Thus, we used the official daily confirmed COVID-19 cases by district obtained from the Korea Disease Control and Prevention Agency. In this study, we analyzed weekly district-level cases from February 18, 2020, to May 31, 2021, in 250 districts across South Korea. The daily statistics of COVID-19 cases in South Korea indicate whether each case was infected outside or inside the country. Because we focused on local transmission within the community, imported cases were excluded from the study. All methods were performed in accordance with relevant guidelines and regulations as reviewed and approved by the Institutional Review Board of Hanyang University Seoul Hospital (HYU-2019-04-021).
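The preprocessing described above (dropping imported cases and summing daily counts into weekly district totals) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(report_date, district, n_cases, imported)`, the helper name `weekly_local_counts`, and the choice of consecutive 7-day bins starting on February 18, 2020, are all assumptions, because the format of the KDCA data is not described.

```python
from collections import defaultdict
from datetime import date

# Assumption: weeks are consecutive 7-day bins starting on the first study day.
STUDY_START = date(2020, 2, 18)
STUDY_END = date(2021, 5, 31)


def weekly_local_counts(records, start=STUDY_START, end=STUDY_END):
    """Aggregate daily district-level records into weekly counts of local cases.

    records: iterable of (report_date, district, n_cases, imported) tuples,
    where `imported` is True for cases infected outside the country.
    Returns a dict mapping (week_index, district) -> number of local cases.
    """
    weekly = defaultdict(int)
    for report_date, district, n_cases, imported in records:
        if imported:
            continue  # exclude imported cases; keep community transmission only
        if not (start <= report_date <= end):
            continue  # outside the study period
        week = (report_date - start).days // 7
        weekly[(week, district)] += n_cases
    return dict(weekly)


records = [
    (date(2020, 2, 18), "District A", 3, False),
    (date(2020, 2, 24), "District A", 2, False),  # day 6 -> still week 0
    (date(2020, 2, 25), "District A", 4, False),  # day 7 -> week 1
    (date(2020, 2, 20), "District A", 5, True),   # imported -> dropped
    (date(2020, 2, 19), "District B", 1, False),
]
print(weekly_local_counts(records))
# {(0, 'District A'): 5, (1, 'District A'): 4, (0, 'District B'): 1}
```

Anchoring the weeks at the study start keeps every week a full seven days; calendar (e.g. ISO) weeks would be an equally valid choice if the weekly figures were reported that way.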

Research methods. Global Moran's I.
Moran's I statistic measures spatial autocorrelation 28 and is defined as follows:

$$I = \frac{N}{\sum_{i}\sum_{j} W_{ij}} \cdot \frac{\sum_{i}\sum_{j} W_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i} (X_i - \bar{X})^2}$$

where $N$ is the number of areas, $i$ and $j$ are region indices, and the element $W_{ij}$ is the adjacency between areas $i$ and $j$. We set $W_{ij}$ to 1 if areas $i$ and $j$ shared a border and 0 otherwise (with $W_{ii} = 0$). The variables $X_i$ and $X_j$ denote the number of new confirmed cases in areas $i$ and $j$, respectively, and $\bar{X}$ is the average number of new confirmed cases across areas. Under no spatial autocorrelation, the expected value of Moran's I is $-1/(N-1)$, which is close to 0 for large $N$; a value near 0 therefore indicates spatial randomness. A positive value indicates clustering of similar values, whereas a negative value indicates that neighboring areas tend to have dissimilar values. A large absolute value of Moran's I implies strong spatial autocorrelation. The formula resembles the Pearson correlation coefficient, but Moran's I is not bounded in [−1, 1]. Alternative versions of Moran's I have been proposed to account for heterogeneous populations or to use other weight functions 29-31.
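To make the definition concrete, the Python sketch below computes global Moran's I with a binary border-sharing weight matrix. The paper used the R package 'ape'; this is an independent illustration, not that package's code. The permutation test is a swapped-in alternative for obtaining a p value, not necessarily the test 'ape' performs, and the function names are hypothetical.

```python
import random


def morans_i(x, w):
    """Global Moran's I for values x (length N) and an N x N weight matrix w.

    w[i][j] = 1 if areas i and j share a border and 0 otherwise (w[i][i] = 0).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    w_total = sum(sum(row) for row in w)  # sum of all weights W_ij
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    if w_total == 0 or den == 0:
        raise ValueError("Moran's I is undefined with no neighbours or constant values")
    return (n / w_total) * (num / den)


def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided permutation p value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    values = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # randomly reassign the observed values to areas
        if morans_i(values, w) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Four areas in a row: 1-2, 2-3 and 3-4 share borders.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], W))  # about 0.333: neighbours have similar values
print(morans_i([1, 0, 1, 0], W))  # about -1.0: values alternate between neighbours
```

For these four areas the expected value under no autocorrelation is −1/3, so the first example sits well above it and the second well below it.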
In this study, we focused mainly on the spatial autocorrelation among COVID-19 cases without adjusting for population size. Distance-based weight functions require a geographic distance between areas, which is not clearly defined for our irregularly shaped district-level data. Thus, the original Moran's I with the adjacency weight function was used in the analysis.

Spatial scan statistic. The spatial scan statistic is a standard statistic for spatial cluster detection 32. The scan statistic $\lambda$ is defined through the likelihood ratio as follows:

$$\lambda = \max_{z \in Z} \frac{\sup_{\theta \in H_a} L(\theta \mid z)}{\sup_{\theta \in H_0} L(\theta \mid z)}$$

where $z$ and $Z$ denote a scanning window in the spatial domain and the collection of all scanning windows, respectively, and $L(\theta \mid z)$ is the likelihood function. The null hypothesis $H_0$ is that no spatial cluster exists in the spatial domain; the alternative hypothesis $H_a$ is that a cluster exists. The size of the scanning windows can vary but usually does not exceed 50% of the study domain 33. Various probability distributions can be assumed, depending on the data. Because our COVID-19 data have excess zeros in some weeks, this study assumed a zero-inflated Poisson distribution if the number of areas with zero cases exceeded 30% of the total, and a Poisson distribution otherwise. The maximum size of the scanning window was set to 20% of the study domain. The scanning window $z$ attaining the maximum likelihood ratio was defined as the most likely cluster. Monte Carlo hypothesis testing is widely used to obtain the p value of the most likely cluster; we simulated 999 random datasets for this test. Additionally, we retained the most likely cluster as the final spatial cluster only if the number of cases in each of its areas was above the 90th percentile. For the analysis, we used R statistical software (version 3.6.3; https://www.r-project.org/) with the 'SpatialEpi' 34 and 'scanstatistics' 35 packages for the spatial scan statistic and the 'ape' package 36 for Moran's I statistic.
In addition, all the figures were created using R software.
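The scan-statistic procedure described in the Methods (candidate windows, likelihood ratio, most likely cluster, Monte Carlo p value) can be sketched for the Poisson case as below. This is a simplified illustration of Kulldorff's Poisson scan statistic, not the 'SpatialEpi' or 'scanstatistics' API. Several choices are assumptions: windows are built from each area plus its nearest neighbours by centroid distance, the 20% limit is applied to the number of areas, the expected counts are supplied by the user (e.g. proportional to population), and the zero-inflated variant and the 90th-percentile filter are not shown.

```python
import math
import random


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a window with c observed and
    e expected cases (total = all cases); only elevated-risk windows score > 0."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def nearest_neighbour_windows(coords, max_frac=0.2):
    """Circular windows: each area plus its nearest areas, up to max_frac of all areas."""
    n = len(coords)
    max_size = max(1, int(max_frac * n))
    windows = set()
    for xi, yi in coords:
        order = sorted(range(n), key=lambda j: (coords[j][0] - xi) ** 2 + (coords[j][1] - yi) ** 2)
        for k in range(1, max_size + 1):
            windows.add(frozenset(order[:k]))
    return list(windows)


def most_likely_cluster(cases, expected, windows):
    """Return the window with the largest log-likelihood ratio and that ratio."""
    total = sum(cases)
    best, best_llr = None, 0.0
    for win in windows:
        c = sum(cases[i] for i in win)
        e = sum(expected[i] for i in win)
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best, best_llr = win, llr
    return best, best_llr


def monte_carlo_p(cases, expected, windows, n_sim=999, seed=1):
    """Monte Carlo p value: redistribute the total cases under H0 in proportion
    to the expected counts and compare the maximum statistics."""
    rng = random.Random(seed)
    total = sum(cases)
    _, observed = most_likely_cluster(cases, expected, windows)
    n = len(cases)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * n
        for i in rng.choices(range(n), weights=expected, k=total):
            sim[i] += 1
        if most_likely_cluster(sim, expected, windows)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


coords = [(i, 0) for i in range(10)]        # ten hypothetical areas on a line
cases = [20, 18, 2, 2, 2, 2, 1, 1, 1, 1]
expected = [sum(cases) / 10] * 10           # hypothetical: equal expected counts
windows = nearest_neighbour_windows(coords, max_frac=0.2)
cluster, llr = most_likely_cluster(cases, expected, windows)
print(sorted(cluster))                      # [0, 1]
print(monte_carlo_p(cases, expected, windows, n_sim=99))
```

Because the maximum is taken over all windows in both the observed and the simulated data, the Monte Carlo p value already accounts for the many windows that were scanned.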

Ethical approval.
No human or animal samples were included in the research presented in this article; therefore, ethical approval was not necessary for this research.

Table 1). Gyeonggi, and Incheon, which accounted for 68% of the cases in the entire country during the period. Weekly case counts have not fallen below 3000 since April 2021.
To investigate the geographical distribution of the number of cases, we mapped the cumulative cases for the 250 administrative areas of South Korea (Fig. 2a) and for the 77 administrative areas of the three metropolitan regions of Seoul, Gyeonggi, and Incheon (Fig. 2b). Cumulative cases were highest around the metropolitan areas and Daegu. Moreover, a strong spatial dependency was uncovered, and most areas in Seoul had more than 1000 cases.

We calculated the global Moran's I statistic for each week over the entire area to check the spatial association in the number of confirmed cases. In Fig. 3, the black and red lines indicate the statistic and its p value, respectively. The p values of Moran's I were less than 0.0001 in 61 weeks (approximately 91% of the time domain), showing highly significant spatial autocorrelation. In another 5 weeks, the p values were between 0.005 and 0.025, indicating moderately significant spatial autocorrelation. When the number of new cases increased dramatically, the statistic also tended to increase, such as in August 2020.

In addition to Moran's I, we counted, for each week, the number of areas whose case count exceeded a threshold (5, 10, 15, 20, or 25 cases) to investigate the spatial diffusion, as shown in Fig. 4. A larger number of such areas indicates more active spatial spread. The left y-axis denotes the number of areas, and the right y-axis indicates that number divided by the total number of areas (250). All five lines show a temporal tendency similar to that of the Moran's I statistic in Fig. 3, indicating that the virus spread actively during the peak seasons in South Korea. For example, before August 2020, fewer than 20% of the 250 areas had more than five cases; in contrast, after November 2020, over 50% of the 250 areas had more than five cases.
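The threshold summary behind Fig. 4 reduces to a simple count per week. The sketch below is illustrative only: the helper name `areas_above_thresholds` is hypothetical, and "more than t cases" is read as a strict inequality, matching the phrase "more than five cases".

```python
def areas_above_thresholds(week_counts, thresholds=(5, 10, 15, 20, 25)):
    """For one week, count districts whose case count exceeds each threshold.

    week_counts: list with one weekly case count per district (e.g. 250 values).
    Returns {threshold: (number_of_districts, share_of_all_districts)}.
    """
    n_areas = len(week_counts)
    summary = {}
    for t in thresholds:
        k = sum(1 for c in week_counts if c > t)  # "more than t cases"
        summary[t] = (k, k / n_areas)
    return summary


print(areas_above_thresholds([0, 6, 12, 30, 3]))
# {5: (3, 0.6), 10: (2, 0.4), 15: (1, 0.2), 20: (1, 0.2), 25: (1, 0.2)}
```

Applied week by week to the 250 district counts, the first element of each pair gives the left axis of the plot and the second element gives the right axis.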
To detect spatial clusters with elevated risk, we applied the spatial scan statistic to two peak seasons: the first from February 18, 2020, to mid-March 2020, and the second from December 1 to December 28, 2020. During the first peak season, areas in Daegu were detected as clusters (Fig. 5, Table 2); the areas with black borderlines in Fig. 5 represent the clusters. During this period, new infections occurred mainly in Daegu and Gyeongsangbuk-do.
Unlike the first peak, all the clusters were in metropolitan areas in December 2020 (Fig. 6, Table 3). Most of them were in Seoul, and some were in Gyeonggi and Incheon. As shown in the maps, the number of cases was

Spatiotemporal analysis over metropolitan areas. The population of the metropolitan areas of South Korea was approximately 25,674,800 as of 2018, more than 50% of the total population, and metropolitan areas have accounted for the largest share of cases since April 2020. Before cluster detection, we calculated the global Moran's I statistic for each week to examine the spatial spread in metropolitan areas (Fig. 7). The statistic was significant in many periods, such as August 2020 and May 2021. In addition, we counted the number of metropolitan areas whose case count exceeded each threshold (Fig. 8). In August 2020, the share of areas with more than five cases increased dramatically to over 80% of all metropolitan areas, and this share has not fallen below 80% since December 2020. This indicates that spatial spread occurred in metropolitan areas, supporting the need for a spatial investigation of the number of cases there. Using the scan statistic, we detected spatial clusters with elevated risk in metropolitan areas from August to September 2020 (Fig. 9, Table 4). Most of the clustered districts were in Seoul, and only a few were in Gyeonggi.
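Restricting the weekly Moran's I analysis to the metropolitan areas amounts to keeping only the rows and columns of the adjacency matrix for the selected districts, so borders with districts outside the region are ignored. The sketch below assumes this is how the subset is formed; the index list `keep` and the helper names are hypothetical, and Moran's I is re-implemented so that the block runs on its own.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a binary adjacency matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    w_total = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    if w_total == 0 or den == 0:
        raise ValueError("Moran's I is undefined with no neighbours or constant values")
    return (n / w_total) * (num / den)


def subset_adjacency(w, keep):
    """Restrict an adjacency matrix to the areas listed in `keep` (indices)."""
    return [[w[i][j] for j in keep] for i in keep]


def weekly_morans_i(weekly_counts, w, keep):
    """Moran's I for each week using only the areas in `keep`.

    weekly_counts: list of weeks, each a list of counts for all areas.
    """
    w_sub = subset_adjacency(w, keep)
    return [morans_i([week[i] for i in keep], w_sub) for week in weekly_counts]


# Five hypothetical areas in a row; areas 0-2 stand in for the metropolitan region.
W5 = [[0, 1, 0, 0, 0],
      [1, 0, 1, 0, 0],
      [0, 1, 0, 1, 0],
      [0, 0, 1, 0, 1],
      [0, 0, 0, 1, 0]]
weeks = [[3, 3, 0, 9, 9], [1, 0, 1, 5, 5]]
print(weekly_morans_i(weeks, W5, keep=[0, 1, 2]))  # approximately [-0.25, -1.0]
```

One consequence of this design is that a district whose only neighbours lie outside the region gets an all-zero row, so it still enters the mean and variance but no neighbour products.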
The cluster sizes detected in May 2021 were larger than those detected in August-September 2020, and the number of cases in the detected clusters increased accordingly (Fig. 10, Table 5).

Discussion
In this study, we conducted a spatiotemporal analysis to investigate the spatial spread and time-varying clusters of COVID-19 in South Korea. Along with the Moran's I results, we presented various time series plots to examine the temporal pattern and produced choropleth maps to visually check the spatial association. To explore spatial clusters, scan statistics and visualization methods were used. In general, a p value depends on both the sample size and the strength of the association 37: it is possible to obtain small p values in large datasets with weak associations, or large p values in small datasets with strong associations. Thus, we used various visualization methods in addition to statistical tests to investigate the spatial dynamics of COVID-19. We found areas in Daegu to be clusters in the early stage, which may be due to mass infection in the Shincheonji religious group 38,39. Later, metropolitan areas were detected as hotspots in December 2020, when various cluster infections were reported in long-term care hospitals, public saunas, and prisons 40.
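The dependence of the p value on sample size can be illustrated numerically. The sketch below uses a Pearson correlation and Fisher's z approximation purely as a stand-in for any test statistic; it is not part of the paper's analysis, and the correlation values and sample sizes are hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p value for H0: rho = 0 using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


print(correlation_p_value(0.1, 1000))  # weak association, large sample: p about 0.0015
print(correlation_p_value(0.5, 10))    # stronger association, small sample: p about 0.15
```

The weaker association is "significant" only because of the larger sample, which is why the visual summaries were used alongside the tests.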
Previous studies on the dynamics of the spatial patterns of COVID-19 have focused on estimating spatially dependent effects or detecting spatial clusters, mainly using Moran's I statistics and spatial scan statistics 3-5,25-27. The spatial spread of COVID-19 in China at the very early stage, from January 16, 2020, to February 6, 2020, was first examined using confirmed-case data for 31 province-level regions 3. The spatial patterns of COVID-19 in China from January 10, 2020, to March 5, 2020, were also studied 4, and the dynamic spatial association of COVID-19 in 31 province-level regions and 337 prefecture-level cities in China from January to October 2020 was examined 5. In Iran, the spatial association and spatial hotspots of COVID-19 at the early stage (March and April 2020) were examined 25, and the spatiotemporal patterns of COVID-19 from February 18 to October 21, 2020, were analyzed 26. In South Korea, a previous study covered approximately four months of the epidemic, from January 20 to May 31, 2020 27. These studies mapped the spatial patterns and linked the clusters in the early epidemic, and their results may have contributed to knowledge of COVID-19 epidemics, especially while information about the virus was lacking. Our study covers a longer period of 16 months, including recent dates with more cases, which makes it better suited to demonstrating the current dynamics of spatial clustering across South Korea.
A spatiotemporal dataset may contain excess zero counts owing to its fine spatiotemporal units, and this property should be considered in the analysis. Here, we accounted for excess zero counts by using a zero-inflated Poisson distribution in the scan statistic. We used several spatiotemporal methods together and compared their results, which provided a more comprehensive picture than any single method would have. In this study, we conducted weekly spatial analyses to investigate the real-time spatial dynamics of COVID-19 cases across South Korea; because the analyses were intended to reflect week-by-week surveillance, we did not apply multiple-testing adjustments to the p values 41.

Despite the many strengths of this study, it has some limitations. First, we did not investigate possible confounding factors affecting COVID-19 spread. For example, the Korean government implemented many social distancing policies and regulations, and accounting for the effects of these nonpharmaceutical interventions might yield more precise results. In addition, we did not investigate the spatial association between COVID-19 and confounding factors such as air pollution, weather, population mobility, and demographic characteristics. Future research should therefore investigate the effects of these factors on COVID-19 at the regional level in South Korea using statistical models.
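The zero-inflated Poisson (ZIP) model and the 30% zero-share rule from the Methods can be written down briefly. The sketch below shows the ZIP probability mass function and the model-selection rule only; it does not reproduce how 'scanstatistics' fits the ZIP scan statistic, and the function names are hypothetical.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1 - pi) * poisson


def choose_scan_model(week_counts, zero_share=0.30):
    """Model rule from the Methods: ZIP if more than 30% of areas report zero cases."""
    zeros = sum(1 for c in week_counts if c == 0)
    return "zip" if zeros / len(week_counts) > zero_share else "poisson"


# With lam = 2 and pi = 0.3, zeros are far more likely than under Poisson(2) alone.
print(zip_pmf(0, 2.0, 0.3), zip_pmf(0, 2.0, 0.0))  # about 0.395 vs 0.135
print(choose_scan_model([0, 0, 0, 0, 1, 2, 3, 4, 5, 6]))  # zip
```

The extra probability mass at zero, P(Y = 0) = π + (1 − π)e^(−λ), is what lets the ZIP model absorb weeks in which many districts report no cases without distorting the Poisson rate for the remaining districts.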
Second, we used the official number of COVID-19 cases to study the spatial dynamics of COVID-19 in South Korea. However, the official numbers might be underestimated owing to limited testing capacity, unexpected false negatives, overcrowded hospitals, and unprepared health systems 42-48. The spatial dynamics derived from official numbers might therefore differ from those of the true numbers of infections. It may thus be of interest to conduct a spatiotemporal analysis of COVID-19 that accounts for the underestimation of cases.

Conclusion
To the best of our knowledge, this is the first study to conduct a spatiotemporal analysis using long-term COVID-19 data in South Korea. Here, we showed that spatial spread of the coronavirus occurred, especially in metropolitan areas. A timely spatiotemporal analysis would be helpful for identifying hotspots and preventing spatial transmission of the virus during the pandemic.

Data availability
The data that support the findings of this study are available from the Korea Disease Control and Prevention Agency, but restrictions apply to the availability of these data, which were used under collaboration for the current study and are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the Korea Disease Control and Prevention Agency.