Introduction

After the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China, in late December 20191, the number of cases of COVID-19 in most countries, including China, dramatically increased. The World Health Organization (WHO) reported 95,324 confirmed cases with 3281 deaths globally by March 5, 2020, and declared COVID-19 a pandemic on March 11, 20202.

South Korea was one of the first countries to announce and respond to a COVID-19 case. On January 20, 2020, the first confirmed case was reported, and 30 confirmed cases were reported by February 17, 2020. After the 31st patient, who had attended Shincheonji church services in Daegu (located in the country’s southeast region), was identified on February 18, 2020, the number of newly confirmed patients dramatically increased to approximately 750 by February 24, 2020, only 6 days later. As of September 15, 2021, 277,989 confirmed cases and 2380 deaths from COVID-19 had been reported in South Korea (https://kdca.go.kr).

The Korean government implemented various strategies, including rapid diagnostic testing, social distancing, and wearing face masks nationwide, to control and reduce the spread of COVID-19. In addition, different COVID-19 control policies at the local administrative level have been implemented based on the volume of cases. It is possible to visualize the dynamics of the disease from the results of spatial and temporal analysis of COVID-19 confirmed cases at the local administrative level, which may help us understand epidemics of the newly emergent infectious disease.

In many countries, various spatial or spatiotemporal analyses of COVID-19 have been performed to understand the characteristics of epidemics and evaluate public health policies. In China, the spatial spread of COVID-19 cases at the early stage was investigated3,4, and the spatiotemporal characteristics of COVID-19 transmission in 31 provincial-level regions and 337 prefecture-level cities were examined5. In the United States, the dynamic spatial spread of COVID-19 at the state level using metric geometry was analyzed6, and spatiotemporal clusters of county-level daily COVID-19 cases were detected from January 22nd to March 27th, 20207. Additionally, the patterns of COVID-19 cases in rural and urban areas were compared, showing different temporal and spatial distributions8,9. In the UK, the spatial distribution of COVID-19 cases was explored and regional outbreaks were detected10. The spatiotemporal distribution of COVID-19 infection using unaggregated data was explored11. Daily COVID-19 cases and deaths in Brazil were used to explore their spatial patterns12. The spatiotemporal distribution of local-level COVID-19 cases in Italy was modeled and a significant impact of strict control policies on the spread was found13.

The spatiotemporal dynamics of COVID-19 may be influenced by various local confounding factors14. For example, any intervention effects, such as social distancing or vaccination rates, may be related to COVID-19 spread15,16,17. Several studies have investigated the effects of air pollution, climate, and weather-related factors, such as temperature, wind, and humidity, on COVID-19 spread18,19,20,21,22,23. In addition, the spatial associations between COVID-19 and population mobility and demographic characteristics have been discussed14,24.

Several studies have examined spatially dependent effects or detected spatial clusters using Moran’s I statistics and spatial scan statistics in China3,4,5 and Iran25,26. Additionally, the spatial association between COVID-19 and the government response in South Korea at the early stage, from January 20 to May 31, 2020, was assessed27. Since the COVID-19 outbreak in 2020, the spatial diffusion and patterns of COVID-19 have varied dynamically, depending mainly on the control policy, human mobility, and epidemic mechanism. When outbreaks grew or the size of high-risk spatial clusters increased, the government might have implemented a stronger social distancing policy at the national level or in high-risk areas to control COVID-19 transmission and reduce the spread of the virus. Thus, it is important to investigate the dynamic spatial patterns of COVID-19 over a longer period.

In this study, we conducted a spatiotemporal analysis of confirmed COVID-19 cases across South Korea from February 18, 2020, to May 31, 2021, to investigate the spatial and temporal variations in COVID-19 and identify the temporally varying spatial cluster patterns of COVID-19 in South Korea.

Data and methods

Data sources

To investigate the spatial dynamics of COVID-19 cases across South Korea, the district-level (called si/gun/gu) number of daily or weekly COVID-19 cases was needed. However, a district-level COVID-19 dataset covering all of South Korea was not publicly available, and the true number of infections is unknown. Thus, we used the official daily confirmed COVID-19 cases by district obtained from the Korea Disease Control and Prevention Agency. In this study, we analyzed district-level weekly cases from February 18, 2020, to May 31, 2021, in 250 districts across South Korea. The daily statistics of COVID-19 cases in South Korea include information on whether the case was infected outside or inside the country. Because we focused on local transmission within the community, cases imported from foreign countries were excluded from the study. All methods were performed in accordance with relevant guidelines and regulations as reviewed and approved by the Institutional Review Boards of Hanyang University Seoul Hospital (HYU-2019-04-021).

Research methods

Global Moran’s I

Moran’s I statistic measures spatial autocorrelation28 and is defined as follows:

$$I=\frac{n\sum_{i,j}{W}_{ij}({X}_{i}-\overline{X })({X}_{j}-\overline{X })}{\sum_{i\ne j}{W}_{ij}{\sum }_{i}{\left({X}_{i}-\overline{X }\right)}^{2}},$$

where \(i\) and \(j\) are the region indices, \(n\) is the number of areas, and the element \({W}_{ij}\) is the adjacency between areas \(i\) and \(j\). We set \({W}_{ij}\) to 1 if areas \(i\) and \(j\) shared a border and 0 otherwise. The variables \({X}_{i}\) and \({X}_{j}\) denote the number of new confirmed cases in areas \(i\) and \(j\), respectively, and \(\overline{X}\) indicates the average number of new confirmed cases across all areas. A value close to 0 implies spatial randomness in the data. A Moran’s \(I\) value larger than 0 indicates the clustering of similar values, whereas a negative value indicates that dissimilar values tend to be adjacent. A large absolute value of Moran’s \(I\) implies a strong spatial autocorrelation. The mathematical formula of the statistic is similar to the Pearson correlation coefficient, but Moran’s \(I\) is not bounded in \([-1, 1]\). Some alternative versions of Moran’s \(I\) have been proposed to account for heterogeneous populations or to consider various weight functions29,30,31.

In this study, we focused mainly on the spatial autocorrelation among COVID-19 case counts, without adjusting for population size. For distance-based weight functions, the definition of the geographic distance between our irregularly shaped districts is not clear. Thus, the original Moran’s \(I\) with the adjacency weight function was used in the analysis.
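As an illustration of the formula above, the following is a minimal Python sketch of the global Moran’s \(I\) with a binary adjacency matrix. It is not the code used in the study (the analysis relied on the R ‘ape’ package); the function name `morans_i` and the four-area chain used as a toy example are assumptions made purely for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    `weights[i][j]` is 1 if areas i and j share a border and 0 otherwise
    (hypothetical toy input; the diagonal is assumed to be 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    # Numerator: weighted cross-products of deviations over all pairs (i, j).
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    # Denominator: total weight over i != j times the sum of squared deviations.
    total_w = sum(weights[i][j]
                  for i in range(n) for j in range(n) if i != j)
    ss = sum(d * d for d in dev)
    return n * cross / (total_w * ss)


# Toy example: four areas arranged in a chain 0-1-2-3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 10, 0, 0]    # similar values are adjacent
alternating = [10, 0, 10, 0]  # dissimilar values are adjacent

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
```

In this toy setting the clustered pattern gives a positive statistic and the alternating pattern a negative one, matching the interpretation given above. The sketch does not handle the degenerate case in which all areas have the same count, where the denominator is zero.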

Spatial scan statistic

The spatial scan statistic is a typical statistic for spatial cluster detection32. The scan statistic \({\lambda }_{z}\) is defined using the likelihood function as follows:

$${\lambda }_{z}=\frac{\underset{z\in Z, {H}_{a}}{\max}\,L(\theta |z)}{\underset{z\in Z, {H}_{0}}{\max}\,L(\theta |z)}=\underset{z\in Z}{\max}\,LR(z),$$

where \(z\) and \(Z\) denote a scanning window in the spatial domain and the collection of all scanning windows, respectively. Here, \(L(\theta |z)\) is the likelihood function. The null hypothesis \({H}_{0}\) is that no spatial cluster exists in the spatial domain, and the alternative hypothesis \({H}_{a}\) is that a certain cluster does exist. The size of the scanning windows can vary and usually does not exceed 50% of the study domain33. An appropriate probability distribution can be assumed for the data. Our COVID-19 data contain excess zeros in some weeks. Thus, this study assumed a zero-inflated Poisson distribution if the number of areas with zero cases exceeded 30% of the total and the Poisson distribution otherwise. The maximum size of the scanning window was set to 20%. We defined the scanning window \(z\) that attains the maximum \({\lambda }_{z}\) as the most likely cluster. Monte Carlo hypothesis testing is widely used to obtain the p value of the most likely cluster. We simulated 999 random datasets for Monte Carlo testing. Additionally, we retained the most likely cluster as the final spatial cluster only if the number of cases in each of its areas was above the 90th percentile.
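The three ingredients described above (the choice of distribution, the likelihood ratio for one window, and the Monte Carlo p value) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study’s implementation, which used the R ‘SpatialEpi’ and ‘scanstatistics’ packages. The function names are hypothetical, the likelihood ratio shown is the standard Poisson form for elevated-risk windows only (the zero-inflated variant is not shown), and the expected counts are assumed to be supplied by the analyst and to sum to the total number of cases.

```python
import math


def choose_model(counts, zero_share_cutoff=0.30):
    """Pick the distribution from the share of areas with zero cases."""
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    if zero_share > zero_share_cutoff:
        return "zero-inflated Poisson"
    return "Poisson"


def poisson_log_lr(cases_in, expected_in, total_cases):
    """Log likelihood ratio of one window under the Poisson model.

    Only windows with more cases than expected count as candidate
    clusters; all other windows get a log ratio of 0.
    Assumes 0 < expected_in < total_cases.
    """
    if cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    log_lr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        log_lr += cases_out * math.log(cases_out / expected_out)
    return log_lr


def monte_carlo_p(observed, simulated):
    """Rank-based p value: share of datasets at least as extreme."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Toy numbers (hypothetical): 20 cases observed where 10 were expected,
# out of 100 cases in total, compared against 999 simulated maxima.
model = choose_model([0, 0, 0, 0, 2, 3, 4, 5, 6, 7])
log_lr = poisson_log_lr(20, 10, 100)
p_value = monte_carlo_p(5.0, [1.0] * 999)
```

With 999 simulated datasets, the smallest attainable p value under this rank-based rule is 1/1000, which is one reason the number of simulations is usually chosen as one less than a round number.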

For analysis, we used R statistical software (version 3.6.3; https://www.r-project.org/) using the ‘SpatialEpi’34 and ‘scanstatistics’35 packages for the spatial scan statistic. We used the ‘ape’ package for Moran’s \(I\) statistic36. In addition, all the figures were created using R software.

Ethical approval

No human or animal samples were included in the research presented in this article; therefore, ethical approval was not necessary for this research.

Results

Weekly incidence of COVID-19 cases

Figure 1 presents the time series plots of the newly confirmed cases and the cumulative confirmed cases for each week. Bars indicate the weekly new cases (left axis), and the blue line indicates the cumulative cases (right axis). Within the temporal domain of the study, from February 18, 2020, to May 31, 2021, the highest count of new cases was 6887 between December 15 and December 21, 2020 (inclusive). A total of 132,060 patients were diagnosed with COVID-19 during the study period. To understand and compare temporal patterns of the weekly number of cases, we divided the dataset into six periods based on the number of cases. A new period began when the number of cases in a week rose above (or fell below) the mean plus (or minus) one standard deviation of the number of cases in the previous three weeks, provided that the period length was greater than 4 weeks. Table 1 provides summary statistics for each period. The number of new cases was the highest from November 10, 2020, to January 18, 2021 (weekly mean of 4290 cases) and the lowest from April 7 to August 10, 2020 (weekly mean of 138 cases).
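The period-splitting rule can be read in more than one way, so the following Python sketch shows one possible interpretation rather than the authors’ exact procedure. It assumes the sample standard deviation of the previous three weeks, and it reads the length condition as “the current period has already lasted more than 4 weeks”. The function name `split_into_periods` and the toy series are hypothetical.

```python
from statistics import mean, stdev


def split_into_periods(weekly_cases, lookback=3, min_length=4):
    """Return the index of the first week of each period.

    A new period starts at week t when the count at t lies outside
    mean +/- one standard deviation of the previous `lookback` weeks
    and the current period has lasted more than `min_length` weeks
    (both readings are assumptions, see the text above).
    """
    starts = [0]
    for week in range(lookback, len(weekly_cases)):
        window = weekly_cases[week - lookback:week]
        centre = mean(window)
        spread = stdev(window)  # sample standard deviation (assumption)
        outside = abs(weekly_cases[week] - centre) > spread
        long_enough = week - starts[-1] > min_length
        if outside and long_enough:
            starts.append(week)
    return starts


# Toy series: six low weeks, six high weeks, then one low week.
toy_series = [10] * 6 + [100] * 6 + [10]
period_starts = split_into_periods(toy_series)
```

In the toy series the rule opens new periods at the jump up and at the drop back down, while the minimum-length condition prevents a further split in the week right after the jump.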

Figure 1

Time series plot for weekly confirmed cases and cumulative confirmed cases of COVID-19 in South Korea from February 18, 2020, to May 11, 2021 (The bar colors distinguish the six temporal periods based on Table 1).

Table 1 Summary statistics for the number of weekly cases in six periods in South Korea.

After February 18, 2020, the number of confirmed cases increased dramatically until the beginning of March 2020. During this period, mass transmission occurred in Daegu and Gyeongsangbuk-do. From February 18 to March 9, 2020, a total of 7021 patients were diagnosed with COVID-19 in Daegu and Gyeongsangbuk-do, accounting for 90% of all COVID-19 patients in South Korea in this period. The number of new infections increased sharply again from November 2020, mainly in metropolitan areas, including Seoul, Gyeonggi, and Incheon. From December 2020 to May 2021, a total of 68,952 cases were reported in Seoul, Gyeonggi, and Incheon, accounting for 68% of the cases in the entire country in that period. Since April 2021, the weekly number of cases has not fallen below 3000.

To investigate the geographical distribution of the number of cases, we produced a map of the cumulative cases for the 250 administrative areas of South Korea (Fig. 2a) and the 77 administrative areas of the metropolitan areas of Seoul, Gyeonggi, and Incheon (Fig. 2b). Case counts were highest around the metropolitan areas and Daegu. Moreover, a strong spatial dependency was apparent, and most of the areas in Seoul had more than 1000 cases.

Figure 2

Map of the cumulative confirmed cases of COVID-19 in South Korea from February 18, 2020, to May 31, 2021.

Spatiotemporal analysis over the entire area

We calculated the global Moran’s \(I\) statistic for each week over the entire area to check the spatial association in the number of confirmed cases. In Fig. 3, the black and red lines indicate the statistic and its p value, respectively. The p values of Moran’s \(I\) were less than 0.0001 in 61 weeks (approximately 91% of the time domain), showing highly significant spatial autocorrelation. Additionally, the p values in 5 weeks were between 0.005 and 0.025, indicating moderately significant spatial autocorrelation. When the number of new cases increased dramatically, as in August and November 2020, the statistic also tended to increase. This implies that the coronavirus spread spatially when the number of new infections increased. In 2021, the statistic tended to increase from March onward.

Figure 3

Time series plot for global Moran’s \(I\) statistic (black line) and p value (red line) of COVID-19 cases for each week in South Korea from February 18, 2020, to May 11, 2021.

In addition to Moran’s \(I\), we calculated, for each week, the number of areas in which the number of cases exceeded a threshold (5, 10, 15, 20, and 25 cases) to investigate the spatial diffusion, as shown in Fig. 4. The larger the number of areas is, the more active the spatial spread. The left side of the \(y\)-axis denotes the number of areas, and the right side indicates the number of areas divided by the total number of areas (250 areas). All five lines show a temporal tendency similar to that of Moran’s \(I\) statistic in Fig. 3. This pattern indicates that the virus spread actively during the peak seasons in South Korea. For example, before August 2020, fewer than 20% of the 250 areas had more than five cases. In contrast, after November 2020, over 50% of the 250 areas had more than five cases.
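The counting behind these curves is simple enough to sketch in a few lines of Python. The function name `areas_over_thresholds` and the eight-area toy input are hypothetical; “more than five cases” is read as a strict inequality.

```python
def areas_over_thresholds(weekly_counts, thresholds=(5, 10, 15, 20, 25)):
    """For one week, count the areas whose cases exceed each threshold.

    Returns {threshold: (number of areas, share of all areas)}.
    """
    n_areas = len(weekly_counts)
    result = {}
    for t in thresholds:
        n_over = sum(1 for c in weekly_counts if c > t)  # strictly more than t
        result[t] = (n_over, n_over / n_areas)
    return result


# Toy week with eight areas instead of the 250 districts in the study.
toy_week = [0, 3, 6, 12, 30, 5, 26, 11]
summary = areas_over_thresholds(toy_week)
```

Applying the function to each week in turn and plotting the counts (left axis) and shares (right axis) over time would give curves of the kind described for Fig. 4.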

Figure 4

Time series plot for the number of areas with COVID-19 cases over a threshold in South Korea from February 18, 2020, to May 11, 2021.

To detect spatial clusters with elevated risks, we used the spatial scan statistic for two peak seasons: the first from February 18, 2020, to mid-March 2020, and the second from December 1 to December 28, 2020. During the first peak season, the areas in Daegu were detected as clusters (Fig. 5, Table 2); the areas with black borderlines in Fig. 5 represent the clusters. During this period, new infections occurred mainly in Daegu and Gyeongsangbuk-do.

Figure 5

Cluster maps of COVID-19 in South Korea from February 18 to March 10, 2020.

Table 2 Cluster information of COVID-19 in South Korea from February 18 to March 10, 2020.

Unlike in the first peak, all the clusters in December 2020 were in metropolitan areas (Fig. 6, Table 3). Most of them were in Seoul, and some were in Gyeonggi and Incheon. As shown in the maps, cases were concentrated in metropolitan areas at the beginning of December, and the number increased in other areas as the coronavirus spread geographically.

Figure 6

Cluster maps of COVID-19 in South Korea in December 2020.

Table 3 Cluster information of COVID-19 in South Korea in December 2020.

Spatiotemporal analysis over metropolitan areas

The population of the metropolitan areas in South Korea was approximately 25,674,800 as of 2018, making up more than 50% of the total population. Cases in metropolitan areas have been dominant since April 2020. Before cluster detection, we calculated the global Moran’s \(I\) statistic for each week to examine the spatial spread in metropolitan areas (Fig. 7). The statistic was significant in many periods, such as August 2020 and May 2021. In addition, we counted the number of metropolitan areas in which the number of cases exceeded a threshold (Fig. 8). In August 2020, the share of areas with more than five cases increased dramatically to over 80% of all metropolitan areas. This share has not dropped below 80% since December 2020. This implies that spatial spread occurred in metropolitan areas, supporting the need for a spatial investigation of the number of cases in metropolitan areas.

Figure 7

Time series plot for Moran’s \(I\) statistic and p value for COVID-19 cases in each week in metropolitan areas in South Korea from February 18, 2020, to May 11, 2021 (Moran’s \(I\) was not calculated for the week of April 28 to May 4, 2020, owing to a lack of data).

Figure 8

Time series plot for the number of metropolitan areas with the number of COVID-19 cases over a threshold in South Korea from February 18, 2020, to May 11, 2021.

We detected spatial clusters with elevated risks using a scan statistic for metropolitan areas from August to September 2020 (Fig. 9, Table 4). Most of the districts were in Seoul, and only some were in Gyeonggi.

Figure 9

Cluster maps of COVID-19 cases for metropolitan areas in South Korea from August to September 2020.

Table 4 Cluster information of COVID-19 for metropolitan areas in South Korea from August to September 2020.

The cluster sizes detected in May 2021 were larger than those detected in August–September 2020, and the number of cases in the detected clusters increased accordingly (Fig. 10, Table 5).

Figure 10

Cluster maps of COVID-19 cases for metropolitan areas in South Korea in May 2021.

Table 5 Cluster information of COVID-19 for metropolitan areas in South Korea in May 2021.

Discussion

In this study, we conducted a spatiotemporal analysis to investigate the spatial spread and time-varying clusters of COVID-19 in South Korea. Along with Moran’s I results, we presented various time series plots to examine the temporal pattern and produced choropleth maps to visually check the spatial association. To explore spatial clusters, scan statistics and visualization methods were used. In general, the p value depends on the sample size as well as the strength of the association37. It is possible to obtain small p values in large datasets with weak associations or large p values in small datasets with strong associations. Thus, we considered various visualization methods as well as statistical tests to investigate the spatial dynamics of COVID-19.

We found that the areas in Daegu formed clusters in the early stage. This result may be due to mass infection in the Shincheonji religious group38,39. Later, in December 2020, metropolitan areas were detected as hotspots. It was reported that various cluster infections occurred in long-term hospitals, public saunas, and prisons in December 202040.

Previous studies on the dynamics of the spatial patterns of COVID-19 have focused on examining spatially dependent effects or detecting spatial clusters, mainly using Moran’s I statistics and spatial scan statistics3,4,5,25,26,27. The spatial spread of COVID-19 in China at the very early stage, from January 16, 2020, to February 6, 2020, was first examined using confirmed case data from 31 province-level regions3. The spatial patterns of COVID-19 in China from January 10, 2020, to March 5, 2020, were also studied4. The dynamic spatial association of COVID-19 in 31 province-level regions and 337 prefecture-level cities in China from January to October 2020 was examined5. In Iran, the spatial association and spatial hotspots of COVID-19 at the early stage (March and April 2020) were examined25, and the spatiotemporal patterns of COVID-19 from February 18 to October 21, 2020, were analyzed26. Approximately 4 months of the COVID-19 epidemic in South Korea, from January 20 to May 31, 2020, were covered27. These studies mapped the spatial patterns and linked the clusters in the early epidemics, and the results may have contributed to knowledge on COVID-19 epidemics, especially during the period in which information about the virus was lacking. Our study covered a longer period of 16 months, including more recent dates with more cases, and can therefore show the more recent dynamics of spatial clustering across South Korea.

A spatiotemporal dataset may contain excess zero counts, depending on the spatiotemporal units, and this property should be considered in the analysis. Here, we accounted for the excess zero counts by utilizing a zero-inflated Poisson distribution in the scan statistic. We used several spatiotemporal methods together and compared their results, which provided a more comprehensive picture than any single method. In this study, we conducted weekly spatial analysis to investigate the real-time spatial dynamics of COVID-19 cases across South Korea. Thus, we did not consider the use of multiple tests with p value adjustments41.

Despite the strengths of this study, it has some limitations. First, we did not investigate the effects of possible confounding factors on COVID-19 spread. For example, the Korean government has implemented many social distancing policies and regulations. Accounting for these nonpharmaceutical interventions might yield more precise results. In addition, we did not investigate the spatial association between COVID-19 and confounding factors, such as air pollution, weather, population mobility, and demographic characteristics. Thus, future research should investigate the effects of confounding factors on COVID-19 at the regional level in South Korea using statistical models.

Second, we used the official number of COVID-19 cases to study the spatial dynamics of COVID-19 in South Korea. However, the official numbers might underestimate the true numbers owing to limited testing capacities, unexpected false negatives, overcrowding of hospitals, and unprepared health systems42,43,44,45,46,47,48. The spatial dynamics of COVID-19 based on the official numbers might therefore differ from those based on the true numbers. Thus, it may be of interest to conduct a spatiotemporal analysis of COVID-19 that accounts for the underestimation of COVID-19 cases.

Conclusion

To the best of our knowledge, this is the first study to conduct a spatiotemporal analysis using long-term COVID-19 data in South Korea. Here, we showed that spatial spread of the coronavirus occurred, especially in metropolitan areas. A timely spatiotemporal analysis would be helpful for identifying hotspots and preventing spatial transmission of the virus during the pandemic.