## Introduction

COVID-19 is an infectious disease whose main symptoms were breathing, coughing and sneezing. The median incubation period is 3 days and the longest can be up to 24 days1. The COVID-19 had spread to the world from the beginning of 2020. By December 29, 2021, the cumulative number of diagnosed cases worldwide has exceeded 280 million, and the cumulative number of deaths has exceeded 5 million2. It took only 2 or 3 months from the outbreak of COVID-19 to containment in mainland China. The arduous path of epidemic prevention in mainland China has provided valuable experience for all countries in the world.

Various work3,4,5,6,7,8,9 has analyzed the pathological and epidemiological characteristics of COVID-19. Literature10,11 have used infectious disease dynamic models (SEIR, LES) to predict the spread of COVID-19, which show that COVID-19 infection is spatially dependent and can spread to nearby areas. Literature12,13 have paid attention to the risk of COVID-19 spreading in urban and rural areas, and find that COVID-19 epidemic is composed of sub-epidemics with different temporal dynamics and spatial patterns. Some scholars have studied the regional transmission mechanism of COVID-19 through models such as GWR and metric geometry14,15, and find that the pattern of spatial differentiation of COVID-19 is a transitional pattern of parallel bands from east to west, and also an epitaxial radiation pattern centered in the Wuhan 1 + 8 urban circle. Some scholars also use geographic information systems and spatial autocorrelation analysis to uncover the geographic distribution characteristics of COVID-19. For example: Su16 uses spatial autocorrelation analysis at multiple time points to study the spatial aggregation of the cumulative number of confirmed cases in various regions of China. Liu17 assesses the spread of the epidemic in Henan province through spatial autocorrelation analysis and relative risk coefficients. Jian18 describes the spatial pattern and distribution characteristics of the epidemic situation in Henan province from the perspective of planning. These studies have achieved important results, explaining the epidemic mechanism of COVID-19 in time and space.

COVID-19 was detected and spread in Wuhan, Hubei province in January 2020. Because of the huge population of Wuhan and the proximity of the Chinese Lunar New Year, COVID-19 quickly spread to all provinces in mainland China. Among them, Hubei province and its surrounding provinces are the most affected. The closure strategy adopted by Wuhan has significantly contained the risk of epidemics caused by COVID-19. This study focuses on the spatio-temporal distribution of COVID-19 in mainland China when the epidemic just broke out. At present, there are relatively few studies on the correlation of the epidemic in the two dimensions of time and space during this time period in mainland China. This study uses spatio-temporal autocorrelation analysis and spatio-temporal scanning statistical methods to detect the spatio-temporal aggregation of COVID-19 and provide a theoretical basis for scientifically formulating epidemic prevention measures.

## Results

### Analysis of inverse distance weighted interpolation

The spatial distribution characteristics of the cumulative number of diagnosed cases with COVID-19 in mainland China on January 23 and February 11 are analyzed by inverse distance weighted interpolation, and the distribution map is drawn. The results are shown in Fig. 1. The spatial distribution map showed that the epidemic situation had basically spread to the whole country on January 23. Compared with February 11, most of the high incidence areas (hot spots) and low incidence areas (cold spots) of the epidemic were the same, and there was no larger-scale spread, indicating that the epidemic has been better controlled during this period. On the whole, the effect to Tibet, Yunnan, Northwest China, and Northeast China were less dramatic than Hubei and its neighboring regions. The cumulative number of diagnosed cases obeyed the characteristics of Hubei’s geographic proximity and network proximity, which was consistent with relevant research conclusions17.

### Spatio-temporal autocorrelation analysis

#### Global spatio-temporal autocorrelation analysis

The global spatio-temporal Moran’s I index can be used to describe the degree of spatio-temporal autocorrelation at two different moments in the study area. In Table 1 and Fig. 2, we display the global spatio-temporal autocorrelation analysis of the number of new diagnosed cases from January 23 to February 11, computed in (2), with the lag order k=1. From January 23 to January 31, the daily global spatio-temporal Moran’s I index was almost positive with the p-value less than 0.1, and showed a downward trend, with the most obvious decline in the first 5 days. It showed that with the onset of the incubation period cases with a history of contact in Hubei, the epidemic was increasing rapidly during this period, and there was a spatio-temporal aggregation, but it was quickly under control. From February 1 to February 4, the daily global spatio-temporal Moran’s I index was negative, and the value was small and close to 0. The significance test indicated that the daily number of new diagnosed cases passed the peak and began to show a weak downward trend, and there was still spatio-temporal aggregation. From February 5 to February 11, the global spatio-temporal Moran’s I index continued to be negative, with a relatively large value. It failed the significance test, indicating that the epidemic began to get under control, and the daily number of new diagnosed cases showed a significant downward trend, which was randomly distributed in time and space.

#### Local spatio-temporal autocorrelation analysis

Local spatio-temporal autocorrelation analysis is used to identify the correlation patterns of different spatio-temporal locations, and local spatio-temporal non-stationarity can be observed. In Fig. 3, we display the local spatio-temporal autocorrelation analysis, computed in (4), with the lag order k=1. Figure 3 shows the local spatio-temporal aggregation of new diagnosed cases at three different time points after the city was closed, and all of them passed the significance test.

High-high clustering pattern means that the neighboring areas of the area with a large number of new diagnosed cases at the current moment will have more new diagnosed cases at the next moment. Low-low clustering pattern means that the neighboring areas of the area with few new diagnosed cases at the current moment will also have fewer new diagnosed cases at the next moment. High-low clustering pattern means that the neighboring areas of the area with a large number of new diagnosed cases at the current moment will have fewer new diagnosed cases at the next moment. Low-high clustering pattern means that the number of new diagnosed cases will increase in the neighbouring areas of the area with few new diagnosed cases at the current moment24.

The high-high clustering pattern was mainly concentrated in Hubei and Hunan in the early epidemic, and their neighboring provinces such as Anhui and Jiangxi were more affected. With the deepening of national epidemic prevention work, the number of new diagnosed cases in Hubei began to decrease and transformed into a high-low cluster mode on February 1. The daily number of new diagnosed cases in Hunan, Chongqing, Anhui, Jiangxi and other regions began to decline gradually, which was consistent with the results of global spatio-temporal autocorrelation analysis. The low-low clustering pattern mainly focused on regions from Xinjiang, Inner Mongolia, Gansu, and Ningxia in the early epidemic, and successively joining some provinces such as Sichuan and Hainan, suggesting that the fundamentals of the national epidemic have gradually improved during this period.

### Spatio-temporal scanning statistics analysis

The results of the spatio-temporal scanning statistics analysis are shown in Table 2 and Fig. 4, computed by Eq. (5). The results show that there are three spatio-temporal clusters for the number of new diagnosed cases in the country from January 23 to February 11. If the relative risk index (RR) is more than 1, the cluster is a high-clustering area, otherwise, the cluster is a low-clustering area. The p-values of the significance test results are all less than 0.001, which are statistically significant. The high-clustering area was Hubei, the clustering time was from February 1 to February 10, and the relative risk index was 76.84. There were two low-clustering areas. The first cluster included Xinjiang, Qinghai, Ningxia, Gansu, Shaanxi, Tibet, Sichuan, Guizhou, Chongqing, Yunnan, Beijing, Shanxi, Hebei and Inner Mongolia, with a relative risk index of 0.095. The second cluster included Shanghai, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, Shandong and Henan, with a relative risk index of 0.42, and the clustering time was from January 23 to February 1. The results of the spatio-temporal scanning show that the effect to neighboring provinces of Hubei and developed coastal areas were more dramatic than the northwest China, southwest China and northern China in the first 10 days of early epidemic. From the 10th day to the 20th day, the number of new diagnosed cases in other regions of the country except Hubei was randomly distributed, and there was no clustering area in time and space.

## Methods

### Data source

The provincial geographic map data of mainland China is obtained from the China Basic Geographic Database (scale 1:1 million), provided by the National Geographic Information Catalog Service Center (http://www.webmap.cn/). The data on the cumulative number of diagnosed cases of COVID-19 is obtained from National Health Commission and the websites of the provincial health commissions, which is obtained through crawler technology provided by python. For some provinces such as Xinjiang and Tibet which are difficult to obtain data on the websites of the health commission, the data is also acquired through web news. The number of new diagnosed cases per day is equal to the difference between the cumulative data of two consecutive days. Since February 12, the national diagnostic criteria for diagnosed cases changed, and “clinical diagnosis” was added to case diagnosis classification in Hubei, the most severely affected province19, resulting in inconsistent statistics data caliber. In addition, the onset of cases with a history of contact in Hubei was almost all within 20 days after leaving Hubei, so the research time range of this article is from January 23 to February 11.

Inverse distance weighted interpolation is implemented by arcgis 10.2, Spatio-temporal autocorrelation analysis is done by GeoDa 1.12.1, Spatio-temporal scanning statistics is done by SaTScan 9.5, Time-series diagram of spatio-temporal Moran’s I index changes is drawn by excel 2013, other maps are drawn by arcgis 10.2.

### Model

#### Inverse distance weighted interpolation

The spatial interpolation method is a statistical method that uses known sample data points to estimate unknown data, which can more comprehensively reflect the spatial distribution characteristics of the data. The inverse distance weighted interpolation is the most commonly used. It uses the inverse proportion of the spatial distance between the estimated point and the known data point as the weight for interpolation. The larger the distance, the smaller the weight. The calculation formula is20 :

\begin{aligned} Z=\frac{(\sum _{i=1}^n{\frac{Z_i}{d_i^m}})}{(\sum _{i=1}^n{\frac{1}{d_i^m}})}. \end{aligned}
(1)

In Eq. (1): Z is the value of the point to be estimated, which is unknown; $$Z_i$$ is the value of the i-th known sample point around Z; $$d_i$$ is the distance between Z and $$Z_i$$; n is the number of known sample points around Z; m is the power value of the inverse distance. The greater the parameter m, the more the point to be estimated will be affected by nearest sample point, and the rougher the space will be. On the contrary, the more distant sample point will be affected and the space will be smoother. Generally, m=2 by default.

#### Spatio-temporal autocorrelation analysis

Spatio-temporal autocorrelation analysis is a special case of bivariate spatial autocorrelation analysis21. Spatio-temporal autocorrelation analysis introduces the time dimension on the basis of traditional spatial autocorrelation analysis, which can systematically reflect the spatio-temporal change characteristics and aggregation trends of variables. The spatio-temporal Moran’s I index is a measure of spatio-temporal autocorrelation analysis22. The global spatio-temporal Moran’s I index represents the influence of the change of variable i at time t-k on the surrounding variables at time t (k is the lag order). The calculation formula is:

\begin{aligned} STI_k=\frac{\sum _{i=1}^n\sum _{j=1}^n\omega _{t-k,t}\omega _{ij}(x_{i,t-k}-{\overline{x_{t-k}}})(x_{j,t}-\bar{x_t})}{\sqrt{\sum _{i=1}^n(x_{i,t-k}-{\overline{x_{t-k}}})^2}\sqrt{\sum _{i=1}^n(x_{i,t}-\bar{x_t})^2}}. \end{aligned}
(2)

In Eq. (2): n is the number of regions; $$\omega _{t-k,t}$$ is the time weight, which indicates the degree of influence at time t-k on time t, in our results, we always set k=1; $$\omega _{ij}$$ is the spatial weight; $$\ x_{i,t}$$ and $$\ x_{j,t}$$ and $$\ x_{i,t-k}$$ represent the number of new diagnosed cases in the region at that time, $$\overline{x_{t-k}}$$ and $$\overline{x_{t}}$$ represents the average number of new diagnosed cases in all regions at that moment. Due to the lack of epidemic data in Hong Kong, Macao and Taiwan, n=31 in this article. The spatial weight matrix W uses the Queen adjacency type, and the neighboring area of Hainan province is set as Guangdong province. Queen matrix is a kind of spatial weight matrix, which expresses the neighbor relationship between spatial units. If there are n units, queen matrix W can be expressed as follows:

\begin{aligned} W= & {} \begin{bmatrix} \omega _{11} &{} \omega _{12} &{} \dots &{} \omega _{1n}\\ \omega _{21} &{} \omega _{22} &{} \dots &{} \omega _{2n}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \omega _{n1} &{} \omega _{n2} &{} \dots &{} \omega _{nn} \end{bmatrix}\\ \omega _{ij}= & {} {\left\{ \begin{array}{ll} 1 &{} if\ i \ is \ adjacent \ to \ j \\ 0 &{} otherwise \end{array}\right. }. \end{aligned}

The global spatio-temporal Moran’s I index range is $$-\,1 \le \ STK_{i}\le 1$$. If it is greater than 0, there is a positive spatio-temporal relationship. If it is less than 0, there is a negative spatio-temporal relationship. If it is equal to 0, there is no spatio-temporal correlation. Under the assumption of normal distribution, the significance test is performed on the null hypothesis that variables do not have spatio-temporal autocorrelation in time and space, that is, they are randomly distributed in time and space. The statistic is the standardized Z value:

\begin{aligned} Z=\frac{STI_k-E(STI_k)}{\sqrt{Var(STI_k)}}. \end{aligned}
(3)

The significance level is determined by the p-value calculated by the standardized statistic Z. At the significance level of 0.1, if p < 0.1, it indicates that there is a spatio-temporal aggregation trend. If p > 0.1, it indicates that there is no spatio-temporal aggregation trend, and the variables are randomly distributed in time and space. The global spatio-temporal Moran’s I index is used to detect the overall spatio-temporal correlation. The local aggregation trend and non-stationary information are described by the local spatio-temporal Moran’s I index. The calculation formula is:

\begin{aligned} PSTI_{i,k}=\frac{n\omega _{t-k,t}(x_{i,t-k}-{\overline{x_{t-k}}})\sum _{j=1}^n\omega _{ij}(x_{j,t}-\bar{x_t})}{\sqrt{\sum _{i=1}^n(x_{i,t-k}-{\overline{x_{t-k}}})^2}\sqrt{\sum _{i=1}^n(x_{i,t}-\bar{x_t})^2}}. \end{aligned}
(4)

The local spatio-temporal Moran’s I index represents the influence of the number of new diagnosed cases in a local area at time t-k on the number of new diagnosed cases in the surrounding area at time t. Its variables, symbol definitions, value ranges and correlation explanations correspond to the global spatio-temporal Moran’s I index, and the hypothesis test method is also consistent with the global spatio-temporal Moran’s I index test method.

#### Spatio-temporal scanning statistics

The spatio-temporal scanning statistical method is uesd to analyze the data aggregation state based on the moving scanning window of a cylinder23. Its circular window is used to scan the spatial area, and the height reflects the time weight information. The cylindrical window moves on the spatio-temporal coordinate system, scanning each spatio-temporal area, and reflecting the state of spatio-temporal aggregation through overlapping cylinders of different sizes and shapes in the spatio-temporal area. This article uses the poisson model’s spatio-temporal scanning statistics to detect the spatio-temporal aggregation area of the daily new diagnosed cases. The significance level p-value calculated according to the actual statistics is used to determine whether the area is a spatio-temporal clustering area. The log-likelihood ratio (LLR) is the grading basis, and the relative risk(RR) represents the risk of an epidemic in the clustering area relative to the surrounding areas. The calculation formula is:

\begin{aligned} LLR=log\bigg(\frac{c}{\mu }\bigg)\bigg(\frac{C-c}{C-\mu }\bigg)^{(C-c)}. \end{aligned}
(5)

In Eq. (5), C is the total number of cases in each region, and c is the number of cases in the scanning window. $$\mu$$ is the expected value of the number of cases in the scanning window. Hypothesis testing is performed on the LLR, and the p-value is calculated by the Monte Carlo method. When p < 0.05, it is considered that there is a significant difference in the RR inside and outside the scan window. The value of LLR corresponds to the possibility of spatio-temporal aggregation in the scanning window, and the scanning window with the largest LLR value corresponds to the most likely aggregation area.

### Ethics declarations

This study used public data from the official website of the National Health Commission of China, all experiments were performed in accordance with relevant guidelines and regulations, and all participants provided written informed consent.