Statistical inference using GLEaM model with spatial heterogeneity and correlation between regions

Tan, Yixuan; Zhang, Yuan; Cheng, Xiuyuan; Zhou, Xiao-Hua

doi:10.1038/s41598-022-18775-8

Published: 05 October 2022

Scientific Reports volume 12, Article number: 16630 (2022)

Abstract

A better understanding of various patterns in the coronavirus disease 2019 (COVID-19) spread in different parts of the world is crucial to its prevention and control. Motivated by the previously developed Global Epidemic and Mobility (GLEaM) model, this paper proposes a new stochastic dynamic model to depict the evolution of COVID-19. The model allows spatial and temporal heterogeneity of transmission parameters and involves transportation between regions. Based on the proposed model, this paper also designs a two-step procedure for parameter inference, which utilizes the correlation between regions through a prior distribution that imposes graph Laplacian regularization on transmission parameters. Experiments on simulated data and real-world data in China and Europe indicate that the proposed model achieves higher accuracy in predicting the newly confirmed cases than baseline models.

Introduction

The outbreak of coronavirus disease 2019 (COVID-19) has significantly impacted all aspects of the world for a long time. As of 26 Oct 2021, over 243 million confirmed cases of COVID-19 have been reported, including over 4 million deaths¹. Therefore, it is essential to study the spread of COVID-19 for better prediction and prevention of the disease. This paper proposes a new stochastic dynamical model that can describe different spread patterns of COVID-19 in multiple regions. We also develop an algorithm to estimate the corresponding transmission parameters and their posterior distributions. Our model is inspired by the Global Epidemic and Mobility (GLEaM) model proposed in Ref.². GLEaM is a stochastic dynamic model that depicts the spread of epidemics, integrating multiple data layers. The GLEaM model involves 3362 subpopulations in 220 countries obtained from Voronoi tessellation, centered around major airports. These subpopulations are connected by a multi-layered mobility network composed of processes from short-range commuting between nearby subpopulations to international flights. In each subpopulation, the transmission of epidemics is modeled by a variant of a Susceptible-Exposed-Infected-Removed (SEIR) compartmental model³. Please see “Review of the SEIR model and GLEaM model” for a more detailed review of the SEIR model and GLEaM model.

In the vast majority of GLEaM’s applications^{4,5,6,7,8,9,10,11}, the parameters are estimated based on Ref.¹². Reference¹² performed the maximum likelihood analysis of the reproduction number $R_0$ in the seed region, Mexico. For each value of the reproduction number $R_0$, the method generated the distribution of the arrival time of the influenza A(H1N1) in 12 countries produced by $2\times 10^3$ GLEaM simulations. Then, the optimal reproduction number $R_0$ was chosen by maximizing the likelihood function of arrival time. Reference¹² and subsequent works following its settings^{4,5,6,7,8,9,10,11} assumed that the epidemic was seeded from one region and the transmission parameters or the other parameters (like the introduction date and location) were estimated through the maximum likelihood analysis of arrival time or other events. In particular, the method in Ref.¹² was adopted in Ref.⁶ to estimate the posterior distribution of the reproduction number $R_0$ of COVID-19, which was assumed to be uniform for all subpopulations at all times. However, this setting is unsuitable for the current scenario of the COVID-19 pandemic since COVID-19 has lasted for a long time, and community transmission has been widespread in most countries in the world¹³. To model the spread of COVID-19, both spatial and temporal heterogeneity of the transmission parameters are needed, rather than directly modeling the reproduction number $R_0$ solely as a periodic function of time as in Ref.¹². This is because the social behaviors, containment measures, medical conditions, and other elements that affect the spread of COVID-19 may vary among different countries and over time.

Recently, Ref.¹⁴ improved the inference method in Ref.¹² based on the GLEaM model by involving spatial heterogeneity. Specifically, Ref.¹⁴ estimated the initially infected individuals in each subpopulation through microblogging data from Twitter and also estimated the reproduction number $R_0$ for USA, Italy, and Spain separately. However, international travel was not considered in this study, and GLEaM was applied to each of the aforementioned countries (as an isolated system) independently. The transmission rates in all subpopulations of these countries were again presumed to be homogeneous. Furthermore, Ref.¹⁴ also assumed that the initially infected individuals for each subpopulation were proportional to the total number of Twitter users in that subpopulation. Thus it still assumed that the severity of the pandemic at the initial outbreak of COVID-19 was uniform over the country, which is not the case for COVID-19.

In addition to the abovementioned issues, other potential concerns exist in applying GLEaM to model the spread of COVID-19. As mentioned in the last section of Ref.¹⁵, GLEaM can be used to simulate the spread of the epidemic under normal conditions since it uses the “steady-state” mobility data around the world. However, since the outbreak of COVID-19, the social order has been disrupted, and travel has been restricted in most countries. Thus GLEaM might not work well with its multi-layered mobility networks. Furthermore, parameter estimation with GLEaM relies on a large number of simulations to explore the parameter space, which can be computationally expensive² when the epidemic parameters to be estimated are spatially heterogeneous. In addition, although the social behavior, medical conditions, and other factors that affect the spread of COVID-19 may vary among different regions, these factors for regions that are geographically close or have similarities in other aspects still bear some resemblance. Hence, the transmission rates for COVID-19 should not only have their own heterogeneity but also be correlated with each other. To the best of our knowledge, neither of the features is reflected in GLEaM or most of its applications.

As a consequence of the possible constraints of GLEaM described above, most of the papers using GLEaM to model epidemics focus on estimating only the transmission parameter in the seed region at the very beginning of the outbreak. However, for the current long-lasting spread of the COVID-19 pandemic all over the globe, the spatial and temporal heterogeneity of the transmission parameters needs to be taken into full consideration.

In this paper, we propose a new stochastic model that incorporates transportation between regions and at the same time enables spatial and temporal heterogeneity of transmission parameters. We model n regions as a graph having n nodes, and the transportation pattern between the regions is encoded as n-by-n matrices. Our graphical model of epidemic dynamics is a general abstract one motivated by and simplified from the GLEaM framework. Figure 1 shows a diagram of the proposed model. In contrast to most applications of GLEaM, which mainly focus on the initial outbreak, our proposed model is able to model the long-lasting spread of epidemics. For the inference of model parameters, we introduce an optimization algorithm that utilizes the correlation between districts. Furthermore, the posterior distribution of parameters is estimated by a Markov chain Monte Carlo (MCMC) sampling procedure, where we set the initial value of the Markov chain as the optimal parameter obtained by the optimization algorithm. This approach can potentially accelerate the convergence of MCMC sampling.

In summary, the main contributions of our paper are:

We propose a new stochastic model to describe the epidemic’s long-lasting spread, allowing spatial and temporal heterogeneity of transmission parameters and transportation between districts.
Based on the proposed model, we also design an algorithm that first makes inference for the parameters through a two-step procedure and then estimates the posterior distribution efficiently by MCMC sampling with the estimated parameters as the initial points. The parameter inference combines the information of correlation between districts, which is equivalent to imposing graph Laplacian regularization on the transmission parameters.
We compare the performance of the proposed model with the baseline models on both simulated and real-world data.

- For the simulated data, the results show that combining heterogeneity and transportation into the model helps improve the performance of trajectory prediction and parameter estimation. Moreover, our inference algorithm that integrates the correlation of districts leads to further improvement in predicting the future trajectories.

- For the real-world data in China and Europe, the proposed model outperforms the baselines in trajectory prediction.

A strength of the proposed model resides in introducing spatial and temporal heterogeneity of transmission parameters. We compare with more related works and comment on the differences and relations in “More related works”. Datasets used in this paper are publicly available at Refs.^16,17,18. Our work focuses on the methodology development and we aim at a new stochastic dynamic model that is generally applicable.

We list the default notations and parameters used throughout the paper in Table 1. The rest of the paper is structured in the following way: In “Methods”, we introduce the stochastic dynamic model and the corresponding inference algorithm. In “Experimental results for simulated data”, we compare the performance of trajectory prediction and parameter estimation of the models with or without mobility, heterogeneity, and using correlation information in the inference part for the simulated data. Section “Experimental results on COVID-19 data” describes the real-world data used in this paper, and presents the results and findings of applying the proposed model to the COVID-19 data in China and Europe. We discuss the limitations and possible extensions in “Discussion”.

Review of the SEIR model and GLEaM model

In this section, we provide a more detailed introduction to the SEIR model and the GLEaM model to give the background of our study and support the discussion that follows.

To depict the evolution of epidemics, Ref.¹⁹ proposed the celebrated Susceptible-Infected-Removed (SIR) model and characterized the development of the pandemic with a deterministic ordinary differential equation (ODE). There are many extensions of the SIR model, including the Susceptible-Exposed-Infected-Removed (SEIR) model for diseases with a latent period, the Susceptible-Infected-Susceptible (SIS) model for diseases that do not confer immunity after recovery, etc.

These deterministic transmission models are constructed under certain assumptions, including that the population is large, closed, and homogeneous. Due to the random nature of the transmission process, many stochastic dynamic models have been developed^20,21,22. Under certain rather general conditions, the deterministic models can be seen as the mean-field equations of the corresponding stochastic processes. However, this approximation may not hold when the size of the outbreak has not grown to the same order as the total population, which is the case in many applications²³. More details can be found in Ref.²⁴ and the references therein.

The Global Epidemic and Mobility (GLEaM) model proposed in Ref.² used a meta-population scheme which balanced between the agent-based stochastic models and the deterministic compartmental models. Specifically, Ref.² adapted a high-resolution population database that divided the surface of the earth with cells of 15 min $\times$ 15 min of arc, and then used Voronoi tessellation to assign each cell to one of the major airports around the world. The obtained subdivisions were then called subpopulations.

The stochastic dynamic in the subpopulations was then coupled with two layers of mobility flows apart from the infection dynamic within each subpopulation. The first layer was the worldwide airport network between the airports in the subpopulations, which could be seen as a weighted graph whose edges represented the number of passengers between each pair of airports. This layer was integrated into the model through stochastic transportation between subpopulations. The second layer was the commuting network that connected geographically close subpopulations. This layer was integrated by using it to compute the effective population and infection in each subpopulation. More details can be found in Ref.².

More related works

Several recent works also involved different levels of heterogeneity in their models in various ways. References^25,26 utilized randomness in reproduction numbers to reflect the heterogeneity of the population, using a plate model with a Bayesian method and heterogeneous well-mixed theory²⁷ with an age-of-infection method¹⁹, respectively. References^28,29 used functional data analysis tools. Specifically, Ref.²⁸ captured two different epidemic patterns in different regions of Italy using the probKMA algorithm³⁰, and Ref.²⁹ revealed different patterns of the epidemic across countries with functional principal component analysis. In addition, Refs.^{31,32,33,34,35} adapted SEIR / Susceptible-Exposed-Infected (SEI) / Susceptible-Infected (SI) compartmental models similar to this paper. Among these works, Refs.^31,32,33 considered heterogeneity in the aspects of age groups, social links, and vaccination status, respectively. References^34,35 bore more similarity with our paper since they also allowed transmission parameters to be spatially heterogeneous and involved transportation between different regions. However, Ref.³⁴ only considered intracounty data, and the transportation was used to compute the effective size of compartments and did not affect the dynamic model. Furthermore, the transmission rates in Ref.³⁴ were determined by an SDE whose parameters were to be fitted. Therefore, Ref.³⁴ focused on a different scope from our study. The settings of compartments in Ref.³⁵ were more realistic than the one considered in our paper by considering reporting rates. Nevertheless, compared with the model and inference algorithm described in “Methods”, transmission rates in Ref.³⁵ did not have temporal heterogeneity or correlation with each other. Both Refs.^34,35 used the Ensemble Kalman Filter, which samples particles in the state space according to the prior distribution and obtains the posterior distribution in the process of moving particles at each time step. This might be computationally less efficient than directly applying MCMC according to the posterior distribution with the initial point maximizing the posterior distribution, as implemented in this paper.

Methods

Ethics statement

The medical record data in China and Europe used in this paper are publicly available and can be found on the official websites of the National Health Commission of the People’s Republic of China¹⁶, the Chinese Center for Disease Control and Prevention¹⁷, and European Centre for Disease Prevention and Control¹⁸. The collection of data is performed in compliance with local government regulations. More details about data sources can be found in “Data sources”.

Model description

Compartmental model over multiple regions

In GLEaM² and other epidemic models involving transportation^34,35, the whole area is usually divided into subdivisions. For example, the GLEaM model divides the total area of 220 countries into over 3300 subpopulations centered around major airports, and Ref.³⁴ divided Milwaukee County and Dane County in the state of Wisconsin into several regions. In this paper, we consider abstract subdivisions in the whole area, which will be referred to as “regions” hereinafter until further specifications in the later experiment sections. We denote n as the number of regions.

In our model, we use continuous time $t \in [0,T]$, where it is assumed that the evolution of the epidemic lasts within a period of T time units. The unit of time is fixed as one day throughout this paper. Note that when we introduce the transportation model below, the traveling matrix is assumed to be constant within each day, and the observed data is also collected on a daily basis. Thus we will use the notation of discrete time (days) from $1,\dots ,T$ hereinafter; however, the evolution dynamic itself is modeled over continuous time.

For each region, we consider the following epidemiology compartments adapted from the SEIR model:

$S_k(t)$: Susceptible.
$E_k(t)$: Exposed and infectious.
$H_k(t)$: Hospitalized.
$R_k(t)$: Removed (recovered or dead).

The subscript k of the states denotes that they belong to the k-th region, and the dependence on the continuous time t is addressed through expressing the states as functions of $t\in [0, T]$.

At time t, we use $N_k(t) = S_k(t) + E_k(t) + H_k(t) + R_k(t)$ to denote the total population in the k-th region. The population $N_k(t)$ is allowed to be time-varying due to the inter-region mobility, especially for the days before the implementation of travel restrictions. However, since the traveling volume is small relative to the total population in a region, the fluctuation of the total population in a region is minor. In this paper, we assume that $N = \sum _{k=1}^n N_k(t)$ remains constant over time, which means that we consider a closed system, where exported/imported cases are not considered. However, it is worth noting that we do allow the transportation of active virus carriers between regions within our system. We remark in advance that this assumption is reasonable for the real-world data sets considered in this paper. From January to February 2020, strict international travel restrictions were imposed in China. For the data in Europe, from May to August 2020, the local spread of the epidemic had reached a relatively high level, and the imported cases were few relative to the indigenous cases. We also denote $(N_a)_k(t) = S_k(t) + E_k(t) + R_k(t)$ as the total population that is permitted to move in the k-th region, excluding the hospitalized ones.

Transportation between regions and the stochastic model

Transportation plays an essential role in the spread of COVID-19. Actually, Refs.^36,37 indicated that the travel restrictions were remarkably important in mitigating the transmission of COVID-19, especially in the early stage of the pandemic. Recently, as detailed in Ref.³⁸, the Omicron variant had spread to 110 countries and had become dominant in many of them by 22 December 2021, only one month after its first report from South Africa on 24 November 2021. This motivates us also to take transportation into consideration in this paper. In our model, we introduce the transportation between regions via a traveling matrix, which is similar to the notation in the GLEaM model². Specifically, we denote $(w_l)_{kj}$ as the traveling volume from region k to j on the l-th day ($l=1,\dots ,T$). Then the traveling matrix $W_l$ on the l-th day can be written as

$$\begin{aligned} W_l=\begin{pmatrix} (w_l)_{11} &{} (w_l)_{12} &{} \cdots &{} (w_l)_{1n}\\ (w_l)_{21} &{} (w_l)_{22} &{} \cdots &{} (w_l)_{2n}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ (w_l)_{n1} &{} (w_l)_{n2} &{} \cdots &{} (w_l)_{nn}\\ \end{pmatrix}. \end{aligned}$$

(2.1)

Given the transportation matrix, we describe below the stochastic model of the dynamic of the compartments over n regions, denoted as $\{ (S_k(t), E_k(t), H_k(t), R_k(t)), k =1, \dots , n \}$, where $t\in [0,T]$ is the continuous time, and the variables $S_k(t)$, $E_k(t)$, $H_k(t)$, and $R_k(t)$ take integer values from 0 to N. The proposed stochastic model is illustrated in Fig. 1.

Transmission in the k-th region: A case from $E_k(t)$ chooses an individual from $N_k(t)$ randomly at Poisson rate $\lambda _k$ ($\{\lambda _k\}_{k=1}^n$ are allowed to be spatially heterogeneous), and the individual chosen is infected if it is of state $S_k(t)$. Note that this is different from traditional SEIR models, since we assume that for the COVID-19 case, the pre-symptomatic patients from $E_k(t)$ can be contagious.
Hospitalization in the k-th region: Each individual in $E_k(t)$ will be hospitalized with Poisson rate $\delta$.
Recovery or death in the k-th region: Each individual in $H_k(t)$ will transfer into $R_k(t)$ with Poisson rate $\gamma _k$. The rate $\gamma _k$ is spatially heterogeneous due to the uneven distribution of medical resources.
Transportation between regions: At the end of the l-th day, all the individuals in region k except the ones in $H_k(l)$ have the same probability of traveling from region k to region j, and the total traveling volume from region k to region j is $(w_l)_{kj}$. We assume that there are no transmissions happening during the transportation between regions. If we denote $\{\xi _{kj}^{[S,l]}, \xi _{kj}^{[E,l]}, \xi _{kj}^{[R,l]}\}$ as the number of people transported from $\{S_k(l), E_k(l), R_k(l)\}$ to $\{S_j(l), E_j(l), R_j(l)\}$ at the end of the l-th day, then $\{\xi _{kj}^{[S,l]}, \xi _{kj}^{[E,l]}, \xi _{kj}^{[R,l]}\}$ follows a multinomial distribution. Specifically,
$$\begin{aligned} P\left( \left\{ \xi _{kj}^{[S,l]}, \xi _{kj}^{[E,l]}, \xi _{kj}^{[R,l]}\right\} \right) = \frac{\left( (w_l)_{kj}\right) !}{\left( \xi _{kj}^{[S,l]}\right) !\left( \xi _{kj}^{[E,l]}\right) !\left( \xi _{kj}^{[R,l]}\right) !} \left( \frac{S_k(l)}{(N_a)_k(l)}\right) ^{\xi _{kj}^{[S,l]}} \left( \frac{E_k(l)}{(N_a)_k(l)}\right) ^{\xi _{kj}^{[E,l]}} \left( \frac{R_k(l)}{(N_a)_k(l)}\right) ^{\xi _{kj}^{[R,l]}}, \end{aligned}$$
(2.2)
with $\xi _{kj}^{[S,l]}+\xi _{kj}^{[E,l]}+\xi _{kj}^{[R,l]}=(w_l)_{kj}.$
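The multinomial draw in (2.2) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `sample_travelers`, the argument names, and the example numbers are assumptions introduced here. Each of the $(w_l)_{kj}$ travelers is assigned to S, E, or R with probability proportional to the size of that compartment in region k, which is one way to realize the multinomial distribution.

```python
import random
from collections import Counter

def sample_travelers(S_k, E_k, R_k, w_kj, rng=random):
    """Draw (xi_S, xi_E, xi_R) for w_kj travelers going from region k to region j.

    Each traveler independently falls in S, E, or R with probabilities
    S_k/(N_a)_k, E_k/(N_a)_k, R_k/(N_a)_k, matching the multinomial in Eq. (2.2).
    Hospitalized individuals are excluded because they do not travel.
    """
    if w_kj == 0:
        return 0, 0, 0
    # random.choices normalizes the weights, so dividing by (N_a)_k is implicit.
    draws = rng.choices(("S", "E", "R"), weights=(S_k, E_k, R_k), k=w_kj)
    counts = Counter(draws)
    return counts["S"], counts["E"], counts["R"]

# Hypothetical numbers, for illustration only.
rng = random.Random(0)
xi_S, xi_E, xi_R = sample_travelers(S_k=9000, E_k=50, R_k=950, w_kj=200, rng=rng)
```

Because the multinomial samples with replacement, a draw can in principle exceed the size of a small compartment when the traveling volume is large relative to it; a full simulator would need to handle that case explicitly.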

As a consequence, $N_k(t)$ is a piece-wise constant function of t which only changes at the end of each day. Specifically, for any time $t \ge 0$,

$$\begin{aligned} N_k(t) = N_k(0) + \sum _{l=1}^{\lfloor t\rfloor } \sum _{i=1}^n ((w_l)_{ik} - (w_l)_{ki}). \end{aligned}$$

(2.3)
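As a small worked example of (2.3), the following Python sketch accumulates the net daily inflow into a region. The function name `population`, the 0-based region index, and the two-region travel matrices are hypothetical and serve only to show that $N_k(t)$ changes at the end of each day while the system-wide total stays fixed.

```python
import math

def population(N0, W, k, t):
    """N_k(t) from Eq. (2.3).

    N0: list of initial populations N_k(0)
    W:  list of daily travel matrices; W[l-1][i][j] is the volume from i to j on day l
    k:  region index (0-based here), t: continuous time in days
    """
    n = len(N0)
    total = N0[k]
    # Only completed days contribute, hence the floor of t.
    for l in range(1, math.floor(t) + 1):
        W_l = W[l - 1]
        total += sum(W_l[i][k] - W_l[k][i] for i in range(n))
    return total

# Hypothetical two-region system observed over two days.
N0 = [1000, 500]
W = [[[0, 30], [10, 0]],
     [[0, 5], [20, 0]]]
```

Here region 0 loses 30 and gains 10 travelers at the end of day 1, so its population is 1000 for $t<1$ and 980 for $1 \le t < 2$.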

A more comprehensive stochastic dynamic model has been previously developed in Ref.²⁴. However, the work did not consider transportation between regions, which is a focus of this study.

Table 1 List of notations and parameters.


Differential equation with spatial heterogeneity

Following Refs.^39,40, we derive the corresponding mean-field differential Eq. (2.4) of the stochastic dynamic introduced in “Transportation between regions and the stochastic model”, which is continuous in time, and the compartments $(\widetilde{S}_k(t), \widetilde{E}_k(t), \widetilde{H}_k(t), \widetilde{R}_k(t))$ take real values.

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{d\widetilde{S}_k(t)}{dt} = -\lambda _k\dfrac{\widetilde{S}_k(t)\widetilde{E}_k(t)}{\widetilde{N}_k(t)} + \sum _{i\ne k}\bigg [(w_{\lceil t\rceil })_{ik}\dfrac{\widetilde{S}_i(t)}{(\widetilde{N_a})_i(t)} - (w_{\lceil t\rceil })_{ki}\dfrac{\widetilde{S}_k(t)}{(\widetilde{N_a})_k(t)}\bigg ] \\ \dfrac{d\widetilde{E}_k(t)}{dt} = \lambda _k\dfrac{\widetilde{S}_k(t)\widetilde{E}_k(t)}{\widetilde{N}_k(t)} -\delta \widetilde{E}_k(t)+ \sum _{i\ne k}\bigg [(w_{\lceil t\rceil })_{ik}\dfrac{\widetilde{E}_i(t)}{(\widetilde{N_a})_i(t)} - (w_{\lceil t\rceil })_{ki}\dfrac{\widetilde{E}_k(t)}{(\widetilde{N_a})_k(t)}\bigg ]\\ \dfrac{d\widetilde{H}_k(t)}{dt} = \delta \widetilde{E}_k(t)-\gamma _k \widetilde{H}_k(t)\\ \dfrac{d\widetilde{R}_k(t)}{dt} = \gamma _k \widetilde{H}_k(t)+ \sum _{i\ne k}\bigg [(w_{\lceil t\rceil })_{ik}\dfrac{\widetilde{R}_i(t)}{(\widetilde{N_a})_i(t)} - (w_{\lceil t\rceil })_{ki}\dfrac{\widetilde{R}_k(t)}{(\widetilde{N_a})_k(t)}\bigg ] \\ {\dfrac{d\widetilde{N}_k(t)}{dt}=\sum _i ((w_{\lceil t\rceil })_{ik} - (w_{\lceil t\rceil })_{ki})} \\ \dfrac{d(\widetilde{C_a})_k(t)}{dt} = \delta \widetilde{E}_k(t) \\ \dfrac{d(\widetilde{R_a})_k(t)}{dt} = \gamma _k \widetilde{H}_k(t) \end{array}\right. } \end{aligned}$$

(2.4)

The first four equations of (2.4) describe the evolution of $S_k(t), E_k(t), H_k(t), R_k(t)$ in the deterministic version of our model, which is governed by the transition dynamic explained in “Transportation between regions and the stochastic model”. The fifth equation characterizes the deterministic total population, which is a piece-wise linear function of time t (since the traveling volume is a piece-wise constant function of t) and coincides with $N_k(t)$ expressed as in (2.3) when t takes integer values. The last two equations depict the evolution of accumulated confirmed and removed cases, denoted by $(\widetilde{C_a})_k(t)$ and $(\widetilde{R_a})_k(t)$ respectively, in the deterministic model. It is worth noting that in the calculation of $(\widetilde{C_a})_k(t)$ and $(\widetilde{R_a})_k(t)$, each case is counted only once. In (2.4), $\widetilde{S}_k(t)$ is the deterministic counterpart of $S_k(t)$, and the same holds for $\widetilde{E}_k(t), \widetilde{H}_k(t), \widetilde{R}_k(t), (\widetilde{N_a})_k(t)$.
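To make the structure of (2.4) concrete, the sketch below evaluates its right-hand side and integrates it with a plain forward-Euler scheme. The integrator, step size, function names (`rhs`, `simulate`), and all numerical values are assumptions chosen for illustration; the source does not state which ODE solver is used.

```python
def rhs(state, lam, gamma, delta, W_day):
    """Right-hand side of Eq. (2.4) for all regions.

    state: dict with lists "S", "E", "H", "R" (one entry per region)
    W_day: travel matrix of the current day, W_day[i][j] = volume from i to j
    Returns dS, dE, dH, dR, dCa, dRa as lists.
    """
    S, E, H, R = state["S"], state["E"], state["H"], state["R"]
    n = len(S)
    N = [S[k] + E[k] + H[k] + R[k] for k in range(n)]
    Na = [S[k] + E[k] + R[k] for k in range(n)]  # mobile (non-hospitalized) population

    def flow(X, k):
        # Net inflow of compartment X into region k through travel.
        return sum(W_day[i][k] * X[i] / Na[i] - W_day[k][i] * X[k] / Na[k]
                   for i in range(n) if i != k)

    dS, dE, dH, dR, dCa, dRa = [], [], [], [], [], []
    for k in range(n):
        infection = lam[k] * S[k] * E[k] / N[k]
        dS.append(-infection + flow(S, k))
        dE.append(infection - delta * E[k] + flow(E, k))
        dH.append(delta * E[k] - gamma[k] * H[k])
        dR.append(gamma[k] * H[k] + flow(R, k))
        dCa.append(delta * E[k])      # accumulated confirmed cases
        dRa.append(gamma[k] * H[k])   # accumulated removed cases
    return dS, dE, dH, dR, dCa, dRa

def simulate(state0, lam, gamma, delta, W, steps_per_day=100):
    """Forward-Euler integration over len(W) days; returns final state and daily new confirmed cases."""
    n = len(state0["S"])
    state = {key: list(val) for key, val in state0.items()}
    Ca = [0.0] * n
    dt = 1.0 / steps_per_day
    daily_new = []
    for W_day in W:
        Ca_start = list(Ca)
        for _step in range(steps_per_day):
            dS, dE, dH, dR, dCa, _dRa = rhs(state, lam, gamma, delta, W_day)
            for k in range(n):
                state["S"][k] += dt * dS[k]
                state["E"][k] += dt * dE[k]
                state["H"][k] += dt * dH[k]
                state["R"][k] += dt * dR[k]
                Ca[k] += dt * dCa[k]
        daily_new.append([Ca[k] - Ca_start[k] for k in range(n)])
    return state, daily_new

# Hypothetical two-region example: the epidemic is seeded in region 0 only.
state0 = {"S": [9990.0, 5000.0], "E": [10.0, 0.0], "H": [0.0, 0.0], "R": [0.0, 0.0]}
W = [[[0, 20], [20, 0]] for _ in range(5)]
final_state, daily_new = simulate(state0, lam=[0.5, 0.3], gamma=[0.1, 0.1], delta=0.14, W=W)
```

In this toy run, region 1 starts with no exposed individuals and acquires them only through the travel terms, which is the coupling that the transportation matrix introduces into the model.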

Furthermore, we assume that the accumulated confirmed and removed cases are available from data on a daily basis, which are denoted as $\{(C_a)_k(i)\}_{i=1}^T$ and $\{(R_a)_k(i)\}_{i=1}^T$, respectively. We also assume that $R_k(0)$ is 0, while $E_k(0)$ and $H_k(0)$ are left to be inferred for each $k=1,\dots ,n$. For inference of parameters, we further denote $(\widetilde{\Delta C_a})_k(i) = (\widetilde{C_a})_k(i) - (\widetilde{C_a})_k(i-1)$ as the deterministic newly confirmed cases on the i-th day determined by (2.4), and $(\Delta C_a)_k(i)$ as the newly confirmed cases computed from data, namely $(\Delta C_a)_k(i) = (C_a)_k(i) - (C_a)_k(i-1)$, $k=1,\dots ,n, i=2,\dots ,T$. The same convention holds for the definitions of $(\widetilde{\Delta R_a})_k(i)$ and $(\Delta R_a)_k(i)$. Note that the data $\{(C_a)_k(i), (R_a)_k(i), (\Delta C_a)_k(i), (\Delta R_a)_k(i)\}_{i=1}^T$ are random in nature.

Note that the model and the inference algorithm described below can be applied to estimate parameters as long as $\{(C_a)_k(i),k=1,\dots ,n\}_{i=1}^T$, $\{(R_a)_k(i),k=1,\dots ,n\}_{i=1}^T$, and $\{W_l\}_{l=1}^T$ are available. The availability of $\{(C_a)_k(i),k=1,\dots ,n\}_{i=1}^T$ and $\{(R_a)_k(i),k=1,\dots ,n\}_{i=1}^T$ is required in many works that use the SEIR model to estimate transmission rates of the epidemic^41,42,43, and the transportation network is also used in GLEaM² and its applications. However, in contrast to the works based on GLEaM^4,5,6,12,15, we allow parameters to possess both spatial and temporal heterogeneity, and further utilize the correlation between regions in the inference of parameters. We remark that the spatial and temporal heterogeneity is reflected in the fact that the transmission parameters $\{\lambda _k\}$ are allowed to vary in both space and time in our model. The temporal heterogeneity is introduced in more detail for the real-world data in “Model extension by allowing time-varying parameters”.

Estimation of model parameters

Based on the model described in “Model description”, the parameters that need to be specified are $\delta$ and $\Theta : = \{E_k(0), H_k(0), \lambda _k, \gamma _k\}_{k=1}^n$. Using a simplification in Remark 1, we fix the parameter $\delta$ in advance, and estimate the rest in a two-step procedure to be described in this section. As a brief summary,

Step 1. We first infer $\{\gamma _k\}_{k=1}^n$ by maximizing the likelihood of the observed newly removed cases. Details are given in “Step 1: Estimate $\{\gamma _k\}_{k=1}^n$”.
Step 2. After the estimation of $\{\gamma _k\}_{k=1}^n$, ${\overline{\Theta }} := \{E_k(0), H_k(0), \lambda _k\}_{k=1}^n$ is then estimated by maximizing the posterior probability, where we introduce a prior distribution combining the information of correlation between regions. Details are given in “Step 2: Estimate ${\overline{\Theta }} = \{E_k(0), H_k(0), \lambda _k\}_{k=1}^n$”.

Finally, at the end of Step 2, we introduce an MCMC sampling approach to estimate the marginal posterior distributions of ${\overline{\Theta }}$. This provides information about the uncertainty of the estimated parameters, like $\lambda _k$, which are of scientific interest. We summarize the two-step procedure in this section together in Algorithm 1.

Remark 1

Among the unknown parameters, we fix in advance the parameter $\delta$, the inverse of the average time for a person from being exposed to being hospitalized, to be 0.14 universally in the algorithm. According to Ref.⁴⁴, the mean duration of the incubation period is 5.2 days. Furthermore, we assume that the average time for an individual from showing symptoms to being hospitalized is 2 days^45,46. Thus, the mean duration for an individual from being exposed to being hospitalized is 7.2 days, whose inverse value is approximately 0.14.

Step 1: Estimate $\{\gamma _k\}_{k=1}^n$

We first estimate $\{\gamma _k\}_{k=1}^n$ by maximizing the likelihood

$$\begin{aligned} {P\left( \left\{ ( R_a)_k(i), ( C_a)_k(i), k=1,\dots ,n\right\} _{i=1}^T \bigg |\{\gamma _k\}_{k=1}^n \right) } \end{aligned}$$

over $\gamma _k$ for each k. We assume that the newly removed cases in one day follow a Poisson distribution whose mean equals the product of $\gamma _k$ and the number of hospitalized cases on the previous day (the difference between the accumulated confirmed cases and the accumulated removed cases, which is thus observable). Then, the likelihood of $\{\gamma _k\}_{k=1}^n$ can be written as

$$\begin{aligned} &P\left( \left\{ ( R_a)_k(i), ( C_a)_k(i), k=1,\dots ,n\right\} _{i=1}^T \bigg |\{\gamma _k\}_{k=1}^n \right) \\& \quad = \prod _{i=1}^{T} P\left( \left\{ (\Delta R_a)_k(i), k=1,\dots ,n\right\} \bigg |\{\gamma _k\}_{k=1}^n, \left\{ ( R_a)_k(i-1), ( C_a)_k(i-1), k=1,\dots ,n\right\} \right) \\& \quad = \prod _{i=1}^{T}\prod _{k=1}^{n}\text {Pois} \left( (\Delta R_a)_k(i) \big |\gamma _k\left( (C_a)_k(i-1)-(R_a)_k(i-1) \right) \right) \\& \quad = \prod _{k=1}^{n}\prod _{i=1}^{T} \frac{ \left( \gamma _k\left( (C_a)_k(i-1)-(R_a)_k(i-1) \right) \right) ^{ (\Delta R_a)_k(i) }}{ \left( (\Delta R_a)_k(i)\right) ! }\exp \left( - \gamma _k\left( (C_a)_k(i-1)-(R_a)_k(i-1) \right) \right) , \end{aligned}$$

where $\mathrm {Pois}(k\big |\beta )$ ($k\in \mathbb {N}, \beta >0$) denotes the probability that k occurrences are observed for a discrete random variable X having a Poisson distribution with mean $\beta$.

Then, we estimate $\gamma _k^* = \arg \max _{\gamma _k} \prod _{i=1}^T \text {Pois}\left( (\Delta R_a)_k(i) \big | ((C_a)_k(i-1) - (R_a)_k(i-1))\gamma _k\right)$ for each k separately.
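Under the Poisson assumption above, this per-region maximization has a closed form: setting the derivative of the log-likelihood with respect to $\gamma _k$ to zero gives $\gamma _k^* = \sum _i (\Delta R_a)_k(i) \big / \sum _i \left( (C_a)_k(i-1)-(R_a)_k(i-1)\right)$. This is the standard Poisson-rate result and is not stated explicitly in the source; the sketch below, with hypothetical function names and made-up counts indexed from day 0, checks it numerically.

```python
import math

def estimate_gamma(C_a, R_a):
    """Closed-form Poisson MLE of gamma_k for one region.

    C_a, R_a: cumulative confirmed / removed counts on days 0..T (assumed layout).
    """
    hospitalized = [C_a[i - 1] - R_a[i - 1] for i in range(1, len(C_a))]
    new_removed = [R_a[i] - R_a[i - 1] for i in range(1, len(R_a))]
    return sum(new_removed) / sum(hospitalized)

def log_likelihood(gamma, C_a, R_a):
    """Poisson log-likelihood of the newly removed cases for one region."""
    total = 0.0
    for i in range(1, len(C_a)):
        mean = gamma * (C_a[i - 1] - R_a[i - 1])
        x = R_a[i] - R_a[i - 1]
        total += x * math.log(mean) - mean - math.lgamma(x + 1)
    return total

# Made-up cumulative counts for a single region, for illustration only.
C_a = [10, 14, 20, 27, 35]
R_a = [0, 1, 3, 5, 8]
gamma_hat = estimate_gamma(C_a, R_a)
```

The sketch assumes that the hospitalized count is positive on every day used; days with zero hospitalized cases would need separate handling.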

Step 2: Estimate ${\overline{\Theta }} = \{E_k(0), H_k(0), \lambda _k\}_{k=1}^n$

Next, we estimate the remaining parameters ${\overline{\Theta }} = \{E_k(0), H_k(0), \lambda _k\}_{k=1}^n$, by finding ${\overline{\Theta }}$ that achieves maximum a posteriori probability (MAP).

Posterior distribution of ${\overline{\Theta }}$ and MAP estimate. We denote the posterior distribution of ${\overline{\Theta }}$ given data $\{(\Delta C_a)_k(i),k=1,\dots ,n\}_{i=1}^T$ as $\pi ({\overline{\Theta }})$. Then by Bayes' formula,

$$\begin{aligned} \pi ({\overline{\Theta }})=P\left( {\overline{\Theta }} \bigg | \left\{ (\Delta C_a)_k(i)\right\} _{k,i}\right) = \dfrac{1}{Z}P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i}\bigg |{\overline{\Theta }}\right) P({\overline{\Theta }}), \end{aligned}$$

(2.5)

where $P({\overline{\Theta }})$ is the prior distribution of ${\overline{\Theta }}$ to be determined and $Z = P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i}\right)$ is a constant irrelevant to ${\overline{\Theta }}$. We further denote

$$\begin{aligned} V({\overline{\Theta }}) =-\log \left( P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} \bigg | {\overline{\Theta }}\right) \right) - \log (P({\overline{\Theta }})), \end{aligned}$$

(2.6)

then

$$\begin{aligned} \pi ({\overline{\Theta }})=\dfrac{1}{Z}\exp (-V({\overline{\Theta }})). \end{aligned}$$

(2.7)

To fit the realistic evolution of the epidemic more precisely, ${\overline{\Theta }}$ is estimated as

$$\begin{aligned} {\overline{\Theta }}^* = \arg \max \pi ({\overline{\Theta }}) = \arg \min V({\overline{\Theta }}), \end{aligned}$$

(2.8)

with a reasonable prior distribution $P({\overline{\Theta }})$. Then, an MCMC sampling scheme starting from ${\overline{\Theta }}^*$ is applied to obtain the posterior distribution of ${\overline{\Theta }}$. Starting the chain at the MAP estimate may be computationally more efficient than choosing the initial point for MCMC randomly or empirically.

Next, we specify the formulas for the likelihood function $P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i}\bigg |{\overline{\Theta }}\right)$ and the prior distribution $P({\overline{\Theta }})$.

Likelihood function of ${\overline{\Theta }}$. Notice that the ODE system (2.4) is the mean-field version of our stochastic model, and that $\{(\widetilde{\Delta C_a})_k(i)\}_{k,i}$ are determined by the parameters ${\overline{\Theta }}$ ($\delta$ and $\{\gamma _k\}_{k=1}^n$ are treated as given). Thus, by the Markov property, $\{(\Delta C_a)_k(i)\}$ are all independent for $k=1,\dots , n$, $i=1,\dots ,T$ conditioned on the parameters ${\overline{\Theta }}$. Furthermore, we suppose that $(\Delta C_a)_k(i) \sim \text {Pois}((\widetilde{\Delta C_a})_k(i))$. Thus, the likelihood of ${\overline{\Theta }}$ can be written as

$$\begin{aligned} P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} | {\overline{\Theta }}\right) = \prod _{i=1}^{T} P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k=1}^n \bigg |{\overline{\Theta }}\right) = \prod _{i=1}^{T}\prod _{k=1}^{n}\dfrac{((\widetilde{\Delta C_a})_k(i))^{(\Delta C_a)_k(i)}}{((\Delta C_a)_k(i))!}e^{-(\widetilde{\Delta C_a})_k(i)}. \end{aligned}$$

(2.9)
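In practice, the product in (2.9) is evaluated on the log scale to avoid numerical underflow. The sketch below computes the negative log-likelihood for given mean-field predictions; the function name and the nested-list layout indexed as [region][day] are assumptions made for illustration.

```python
import math


def poisson_neg_log_likelihood(observed, expected):
    """Negative logarithm of (2.9).

    observed[k][i]: newly confirmed cases in region k on day i.
    expected[k][i]: mean-field prediction for the same entry; must be > 0.
    """
    nll = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for y, m in zip(obs_row, exp_row):
            # -log Pois(y | m) = m - y * log(m) + log(y!)
            nll += m - y * math.log(m) + math.lgamma(y + 1)
    return nll
```

The term $\log ((\Delta C_a)_k(i)!)$ does not depend on ${\overline{\Theta }}$, so it could be dropped during optimization; it is kept here so that the value matches (2.9) exactly.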

Choice of prior distribution of ${\overline{\Theta }}$. The remaining problem is to choose the prior distribution $P({\overline{\Theta }})$. Presuming that the transmission rates of more similar regions are closer to each other, $P({\overline{\Theta }})$ is designed to incorporate the correlations between regions. In particular, given a matrix A that characterizes the pairwise similarities between the regions, and denoting ${\lambda } = (\lambda _{1},\dots ,\lambda _{n})^T\in \mathbb {R}^n$, we set

$$\begin{aligned} P({\overline{\Theta }}) = \frac{1}{C_{\sigma ,A}} \exp (-{\lambda }^T (D-A){\lambda } - \sigma \Vert {\lambda }\Vert _2^2), \end{aligned}$$

(2.10)

where $D = \text {diag}\{d_1,\dots ,d_n\}$ is the degree matrix of A with $d_i = \sum _{j=1}^n A_{ij}$, and $C_{\sigma ,A} = \int _{\mathbb {R}^{n}} \exp (-{\lambda }^T (D-A){\lambda } - \sigma \Vert {\lambda }\Vert _2^2) d{\lambda }$ is a constant depending on $\sigma$ and A. Here, a small $\sigma$ is chosen so that $P({\overline{\Theta }})$ is a probability measure without imposing much restriction on $\lambda$. Then, by (2.6) and (2.10), writing $a_{ij}$ for the entries of A,

$$\begin{aligned} V({\overline{\Theta }}) =&-\log \left( P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} \bigg | {\overline{\Theta }}\right) \right) + {\lambda }^T (D-A){\lambda } +\sigma \Vert {\lambda }\Vert _2^2 + \log {C_{\sigma ,A}} \nonumber \\ =&-\log \left( P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} \bigg | {\overline{\Theta }}\right) \right) + \frac{1}{2}\sum _{i,j} a_{ij} (\lambda _{i}-\lambda _{j})^2 + \sigma \sum _i \lambda _i^2 + \log {C_{\sigma ,A}}. \end{aligned}$$

(2.11)
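The second equality in (2.11) uses the standard identity ${\lambda }^T (D-A){\lambda } = \frac{1}{2}\sum _{i,j} a_{ij} (\lambda _{i}-\lambda _{j})^2$, which holds for symmetric A. The short Python check below evaluates both sides on a small example; the function names and the numerical values of A and $\lambda$ are chosen purely for illustration.

```python
def laplacian_quadratic_form(A, lam):
    """lam^T (D - A) lam, with D the diagonal degree matrix of A."""
    n = len(lam)
    degree = [sum(A[i]) for i in range(n)]
    diag_part = sum(degree[i] * lam[i] ** 2 for i in range(n))
    cross_part = sum(
        A[i][j] * lam[i] * lam[j] for i in range(n) for j in range(n)
    )
    return diag_part - cross_part


def pairwise_penalty(A, lam):
    """0.5 * sum over all ordered pairs (i, j) of a_ij * (lam_i - lam_j)^2."""
    n = len(lam)
    return 0.5 * sum(
        A[i][j] * (lam[i] - lam[j]) ** 2 for i in range(n) for j in range(n)
    )


# Illustrative symmetric affinity matrix and transmission rates.
A_example = [[0.0, 1.0, 0.1], [1.0, 0.0, 0.1], [0.1, 0.1, 0.0]]
lam_example = [0.5, 0.47, 0.4]
lhs = laplacian_quadratic_form(A_example, lam_example)
rhs = pairwise_penalty(A_example, lam_example)
```

The pairwise form makes the role of the prior explicit: a large $a_{ij}$ penalizes a large difference between $\lambda _i$ and $\lambda _j$.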

The parameter estimation procedure could be extended to the case when $\{\lambda _k\}$ are time-varying by modifying (2.13), which we will introduce in more detail in “Model extension by allowing time-varying parameters” for the real-world data in China and Europe.

Construction of affinity matrix A appearing in prior distribution (2.10). Now, we specify the construction of affinity matrix A in (2.10) that reflects the similarity between regions. A is constructed from affinity matrix W by further addressing the correlation between regions with more similarities. To treat different data sets and W with a unified approach, we assume that $\max _{i,j}W_{ij}=1$ (W can be re-scaled entry-wise if necessary).

For a given $W=(W_{ij})_{i,j\in \{1,\dots ,n\}}$ whose choice is detailed later, the next step of attaining A is to divide the n regions into d groups ($D_1,...,D_d$, $\cup _{m=1}^d D_m = \{1,\dots ,n\}$, and $\forall i\ne j$, $D_i\cap D_j = \emptyset$ ) where the regions in the same groups have more similarities. Then, for given $\beta \in (0,1)$ and a given penalty factor $\mu >0$, $a_{ij}$ is constructed as follows:

$$\begin{aligned} {a_{ij} = \mu {\left\{ \begin{array}{ll} &{}W_{ij}, \quad \text {if regions } i \text { and } j\text { are in the same group},\\ &{}\beta W_{ij},\text { otherwise}. \end{array}\right. }} \end{aligned}$$

(2.12)

We remark that $\beta$ in (2.12) is taken to be 0.1 for all the experiments in this paper. By constructing A as in (2.12), correlations for regions in the same groups are further addressed, whose transmission parameters are imposed with stronger restrictions.
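The construction in (2.12) can be sketched in a few lines of Python. The function name and the encoding of the partition as a list of group labels are hypothetical choices made for this illustration.

```python
def build_affinity(W, groups, mu, beta=0.1):
    """Entries a_ij of A as in (2.12).

    W:      n x n affinity matrix (nested lists), rescaled so that max W_ij = 1.
    groups: groups[i] is the label of the group containing region i.
    mu:     penalty factor, mu > 0.
    beta:   down-weighting factor in (0, 1) for pairs in different groups.
    """
    n = len(W)
    return [
        [
            mu * W[i][j] * (1.0 if groups[i] == groups[j] else beta)
            for j in range(n)
        ]
        for i in range(n)
    ]
```

For example, with all entries of W equal to 1, `groups = [0, 0, 1, 1]` and `mu = 2`, the entry for regions 1 and 2 equals 2, whereas the entry for regions 1 and 3 equals 0.2.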

Now we specify the choice of W for the data sets analyzed later in this paper. For the simulated data and the real-world data in China, for which transportation data are available, we construct W from the traveling volume matrices $\{W_l\}_{l=1}^T$. Specifically, $W{:=}{\bar{W}}/\max _{i,j}{\bar{W}}_{ij}$, where ${\bar{W}} := \frac{1}{2} ( (\frac{1}{T}\sum _{l=1}^TW_l) + (\frac{1}{T}\sum _{l=1}^TW_l)^T)$. For the real-world data in Europe, however, we are not aware of publicly available traveling data that are sufficient for the proposed model, so W is simply the all-ones matrix. We remark that the affinity matrix W may also be obtained by means other than transportation data, as long as it reflects the similarities between regions.
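As an illustration of the transportation-based construction, the sketch below averages the daily traveling volume matrices over time, symmetrizes the result, and rescales it so that the largest entry equals 1. The function name and the input layout (a list of T matrices given as nested lists) are assumptions.

```python
def normalized_affinity(volumes):
    """Time-average, symmetrize and rescale travel matrices so that max W_ij = 1."""
    T = len(volumes)
    n = len(volumes[0])
    # Time average (1/T) * sum_l W_l.
    mean = [
        [sum(W_l[i][j] for W_l in volumes) / T for j in range(n)]
        for i in range(n)
    ]
    # Symmetrization: W_bar = (mean + mean^T) / 2.
    sym = [[0.5 * (mean[i][j] + mean[j][i]) for j in range(n)] for i in range(n)]
    peak = max(max(row) for row in sym)
    if peak <= 0:
        raise ValueError("travel volumes must contain a positive entry")
    return [[v / peak for v in row] for row in sym]
```

Symmetrizing matters because the graph Laplacian penalty treats the pair (i, j) and the pair (j, i) identically, even when the recorded flows in the two directions differ.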

Specified formula for MAP estimate of ${\overline{\Theta }}$. From the MAP estimate (2.8), definition of V (2.11) given prior distribution (2.10), and the definition of $a_{ij}$ in (2.12), the inference of ${\overline{\Theta }}$ can be equivalently written as follows

$$\begin{aligned} { {\overline{\Theta }}^*}&{= \arg \min V({\overline{\Theta }})} \nonumber \\&{= \arg \min \left( -\log \left( P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} \bigg | {\overline{\Theta }}\right) \right) + \frac{1}{2}\sum _{i,j} a_{ij} (\lambda _{i}-\lambda _{j})^2 + \sigma \sum _i \lambda _i^2\right) } \nonumber \\&= \arg \min \left( -\log \left( P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} \bigg | {\overline{\Theta }}\right) \right) + \frac{\mu }{2}\left( \sum _m\sum _{i,j\in D_m} W_{ij} (\lambda _{i}-\lambda _{j})^2 \right. \right. \nonumber \\&\quad \left. \left. + \beta \sum _{m_1\ne m_2}\sum _{i\in D_{m_1},j\in D_{m_2}} W_{ij} (\lambda _{i}-\lambda _{j})^2\right) + \sigma \sum _i \lambda _i^2 \right) , \end{aligned}$$

(2.13)

where the likelihood $P\left( \left\{ (\Delta C_a)_k(i)\right\} _{k,i} \bigg | {\overline{\Theta }}\right)$ is given in (2.9).

It can be seen that choosing the prior distribution as in (2.10) imposes an $l_2$ regularization term, which is intended to improve generalization.

Estimation of the marginal posterior distribution of ${\overline{\Theta }}$. Finally, after choosing $P({\overline{\Theta }})$, which is determined by A and $\sigma$, the optimization ${\overline{\Theta }}^* = \arg \min V({\overline{\Theta }})$ in (2.13) is carried out with a BFGS algorithm^47,48,49,50. To obtain the posterior distribution of ${\overline{\Theta }}$, we use a classical MCMC sampling scheme starting from the ${\overline{\Theta }}^*$ returned by the optimization.
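The text does not specify which MCMC variant is used. As one possible instance, the sketch below implements a random-walk Metropolis sampler that targets $\pi ({\overline{\Theta }}) \propto \exp (-V({\overline{\Theta }}))$ and is initialized at the minimizer of V. The function name, the step size, and the one-dimensional toy potential are all assumptions made for illustration; the toy potential stands in for (2.11) and is not the model's actual objective.

```python
import math
import random


def metropolis_from_map(V, theta_map, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis targeting pi(theta) proportional to exp(-V(theta)).

    The chain is initialized at theta_map (the minimizer of V), so it
    begins in a high-probability region of the target.
    """
    rng = random.Random(seed)
    theta = list(theta_map)
    v_curr = V(theta)
    samples = []
    for _ in range(n_samples):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        v_prop = V(proposal)
        # Accept with probability min(1, exp(V(theta) - V(proposal))).
        if v_prop <= v_curr or rng.random() < math.exp(v_curr - v_prop):
            theta, v_curr = proposal, v_prop
        samples.append(list(theta))
    return samples


def toy_V(theta):
    """Hypothetical potential: Gaussian with mean 0.4 and std 0.05."""
    return 0.5 * ((theta[0] - 0.4) / 0.05) ** 2


draws = metropolis_from_map(toy_V, [0.4], n_samples=5000, step=0.05)
posterior_mean = sum(d[0] for d in draws) / len(draws)
```

In an actual application, `V` would evaluate (2.11) by solving the ODE system (2.4) for the proposed parameters, and the proposal would additionally have to respect any bounds on the parameters.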

Prediction of the epidemic trajectories with the estimated parameters

Once the parameters $\Theta ^*=\{\{\gamma _k^*\}_{k=1}^n, {\overline{\Theta }}^*\}$ are estimated from the optimizations $\gamma _k^* = \arg \max _{\gamma _k} \prod _{i=1}^T \text {Pois}\bigg ((\Delta R_a)_k(i) \big | ((C_a)_k(i-1) - (R_a)_k(i-1))\gamma _k\bigg )$ and ${\overline{\Theta }}^* = \arg \min V({\overline{\Theta }})$ as described in “Estimation of model parameters”, the trajectories of newly confirmed cases can be simulated according to the stochastic dynamic process with $\Theta ^*$. Trajectories can also be sampled from the posterior distribution of $\Theta$ instead of using $\Theta ^*$ alone, which takes the randomness of ${\overline{\Theta }}$ into account. In particular, this is achieved by sampling ${\overline{\Theta }}$ with MCMC and then simulating trajectories with the sampled $\{\{\gamma _k^*\}_{k=1}^n, {\overline{\Theta }}\}$. Additionally, the deterministic trajectories determined by (2.4) can be computed with the explicit Euler method.

Experimental results for simulated data

Two specific cases are considered for simulated data. We first remark that the regions in “Methods” are called provinces in this section. Section “Four provinces case” considers four provinces separated into two groups (the provinces in the same group are assumed to have more similarities) and with traffic between each pair of the provinces. Section “Thirty provinces case” considers thirty provinces randomly separated into three groups, with the other settings similar to the previous case. Section “More details of experimental settings and sensitivity analysis” includes more details of the experimental settings and sensitivity analysis.

More details of experimental settings and sensitivity analysis

Experimental settings

The results in “Experimental results for simulated data” are for 100 replicas. In each replica, three random trajectories are sampled independently according to the stochastic model with prefixed parameters, part of which are treated as the ground truth training, validation, and testing trajectory, respectively (see more details in Sect. A.1.1 of Supplementary Information).

For each model, we first fit the parameters using the training trajectory and then predict the testing trajectory using the estimated parameters. The choice of hyper-parameters for the proposed model is detailed in Sect. A.2 of Supplementary Information. In particular, the penalty factor $\mu >0$ is chosen as the value minimizing the validation error, since for the simulated data the validation error has the same distribution as the testing error. The comparison of trajectory prediction is shown for one typical realization, for which we compare the ground truth training and testing trajectories with the fitted training and predicted testing trajectories for all the models. Additionally, parameter estimates and quantitative evaluations are reported as the mean and standard deviation over all 100 replicas. The detailed computations of training, validation, and testing errors can be found in Sect. B of Supplementary Information.

Sensitivity analysis of $\sigma$

Note that parameter inference with the proposed model involves the parameter $\sigma$, as shown in (2.13). Therefore, the sensitivity analysis for the parameter $\sigma$ in (2.13) is performed for the four provinces case. Specifically, the results for $\sigma$ varying from $10^{-6}$ to $10^0$ are presented and compared. Details can be found in “Results of parameter estimation” and “Further model evaluation”. Similar results are obtained for other data sets, and details are omitted.

Mismatched partition of regions

We also note that the graph Laplacian penalty of the proposed model depends on the partition of the regions into several groups, as described in “Estimation of model parameters”. Since the graph structure is usually not fully known, it is natural to ask whether our method still performs well without accurate prior knowledge. For the thirty provinces case, we report the results of the proposed model when the partition of the regions does not match the ground truth division; details can be found in “Thirty provinces case”.

Four provinces case

Data description

In this simulated study, we let $n=4$, $T=20$, and set the threshold $T_{th}$ separating training and testing data to be 10 (more details can be found in Supplementary Information S1). The other prefixed parameters are listed below:

For $k\in \{1,2,3,4\}$, $N_k(0) = 10^6$, $E_k(0)=30$, $H_k(0)=10$.
For $l\in \{1,\dots ,T\}$, $i,j\in \{1,2,3,4\}$ and $i\ne j$, $(W_l)_{ij} = 5\times 10^3$.
$\lambda _1=0.5$, $\lambda _2=0.47$, $\lambda _3=0.4$, $\lambda _4=0.37$, $\delta =\gamma _1=\cdots =\gamma _4=0.14$.

The four provinces are divided into two groups, with the first group consisting of Provinces 1 and 2 and the second group consisting of Provinces 3 and 4. The similarities within groups are reflected in the settings that the values of $\{\lambda _k\}$ are closer for provinces in the same group.

Models to compare

The proposed model and four baseline models. We first specify the models to be compared. The last one is the proposed model, and the first four serve as baselines with different settings.

1. The model with uniform prior distribution, without heterogeneity or migration.
2. The model with uniform prior distribution, without heterogeneity but with migration.
3. The model with uniform prior distribution, with heterogeneity but without migration.
4. The model with uniform prior distribution, with both heterogeneity and migration.
5. The model with prior distribution based on graph Laplacian, with both heterogeneity and migration.

For better illustration and comparison between the models in the experimental results, Models 1–5 are summarized in Table 2 below.

Table 2 Models to be compared when the transportation data are available.


First, the models with uniform prior distributions themselves (Models 1–4) are compared according to whether two key assumptions exist in the model:

(1) Whether the transmission rates $\{\lambda _k\}$ are allowed to vary over regions.
(2) Whether there exists transportation between regions.

Then, the model with prior distribution based on graph Laplacian (Model 5) is compared with those using uniform prior distributions (Models 1–4). The former utilizes the correlation between subpopulations by adding an $l_2$ regularization term to the model. In contrast, the latter impose only lower and upper bounds on the parameters and use no other prior information.

Parameter inference of the five models and sensitivity of $\sigma$. For Model 5, the proposed model, the parameters $\Theta = \{\{\gamma _k\}_{k=1}^n, {\overline{\Theta }}\}$ are estimated following the two-step procedure described in “Estimation of model parameters”, where ${\overline{\Theta }} = \{E_k(0), H_k(0), \lambda _k\}_{k=1}^n$. For the estimation of ${\overline{\Theta }}$, following the general formula (2.13) in “Estimation of model parameters”, the specific formula of ${\overline{\Theta }}^*$ for the four provinces case is as follows,

$$\begin{aligned} {\overline{\Theta }}^* =\arg \min (-\log p(y_{1:T_{th}}|{\overline{\Theta }}) + \mu \left( (\lambda _1-\lambda _2)^2 + (\lambda _3-\lambda _4)^2 + \beta \sum _{(i,j)\ne (1,2) \text {or} (3,4)}(\lambda _i-\lambda _j)^2\right) + \sigma \Vert \lambda \Vert _2^2 ), \end{aligned}$$

(3.1)

where $\beta$ is taken to be 0.1. For Model 5, we conduct the sensitivity analysis for parameter $\sigma$ in (3.1) and present results for $\sigma =10^{-6},10^{-3}$ and $10^0$ respectively in “Results of parameter estimation” and “Further model evaluation”.
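To make the regularization part of (3.1) concrete, the sketch below evaluates it for the four provinces case, where all off-diagonal entries of the rescaled W equal 1. It assumes that the sum over cross-group pairs runs over unordered pairs, which is what (2.11) yields in this setting; the function name is hypothetical.

```python
def four_province_penalty(lam, mu, beta=0.1, sigma=1e-6):
    """Regularization part of (3.1), summed over unordered province pairs."""
    # Provinces 1-2 and Provinces 3-4 form the two groups (0-based indices).
    same_group = {(0, 1), (2, 3)}
    total = 0.0
    for i in range(4):
        for j in range(i + 1, 4):
            weight = 1.0 if (i, j) in same_group else beta
            total += weight * (lam[i] - lam[j]) ** 2
    return mu * total + sigma * sum(x * x for x in lam)


# Ground-truth rates of the simulation, with mu = 1 and sigma = 0 for readability.
penalty = four_province_penalty([0.5, 0.47, 0.4, 0.37], mu=1.0, sigma=0.0)
```

At the ground-truth rates, the within-group terms contribute $2\times 0.03^2 = 0.0018$ and the cross-group terms contribute $0.1\times 0.0418 = 0.00418$, so `penalty` is approximately 0.00598 per unit of $\mu$.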

We remark that in Models 1–4, the estimation of $\Theta =\{\{\gamma _k\}_{k=1}^n, {\overline{\Theta }}\}$ still follows a two-step procedure similar to that of Model 5, and the first step of obtaining $\gamma _k^* = \arg \max _{\gamma _k} \prod _{i=1}^T \text {Pois}\big ((\Delta R_a)_k(i) \big | ((C_a)_k(i-1) - (R_a)_k(i-1))\gamma _k\big )$ remains formally the same. The difference lies in the optimization objective for ${\overline{\Theta }}$. First, the $l_2$ regularization term is replaced by prior knowledge of the parameters’ upper and lower bounds. Second, for models without heterogeneity of parameters, $\{\lambda _k\}_{k=1}^n$ are forced to be the same in the ODE system (2.4). Third, for models without transportation between regions, the terms involving $W_t$ disappear from (2.4). Additionally, for the other data sets considered in the following sections, the parameter estimation methods for the baseline models are similar and thus will not be repeated.

Finally, note that the model in Ref.¹² is similar to Models 1 and 2, since they all assume a spatially homogeneous transmission parameter. However, Ref.¹² assumed that the epidemic was seeded from one seed region, while Models 1 and 2 do not make such an assumption. Moreover, Ref.¹² focused more on the spread of the epidemic from the seed region at the early stage of the pandemic, and only the introduction dates in the other regions were utilized for the estimation of transmission parameters. In comparison, the estimation of the transmission rate in Models 1 and 2 exploits the data in all regions over the whole process.

Results of trajectory prediction

First, we remark that in Model 5, $\mu$ is chosen to be the minimizer, over a range of values of $\mu$, of the validation errors averaged over 100 replicas. The weighted (simply averaged) validation errors, MAE$^{[\mathrm Val]}_{(w)}$ and MSE$^{[\mathrm Val]}_{(w)}$ (MAE$^{[\mathrm Val]}_{(s)}$ and MSE$^{[\mathrm Val]}_{(s)}$), are defined as in Sect. B of Supplementary Information. The superscript $^{[\mathrm Val]}$ indicates that the error is computed on validation data, and the subscripts $_{(w)}$ and $_{(s)}$ denote that the errors are the weighted and simple average of relative errors over time respectively.

The averaged weighted validation errors MAE$^{[\mathrm Val]}_{(w)}$ and MSE$^{[\mathrm Val]}_{(w)}$ over replicas are shown in Supplementary Fig. S2, and the simply averaged counterparts are shown in Supplementary Fig. S3. For parameter inference using Model 5, $\mu$ is chosen to be $10^{2.7}$, at which all the averaged validation errors (MAE$^{[\mathrm Val]}_{(w)}$, MSE$^{[\mathrm Val]}_{(w)}$, MAE$^{[\mathrm Val]}_{(s)}$ and MSE$^{[\mathrm Val]}_{(s)}$) over 100 replicas are minimized, as can be seen from Supplementary Figs. S2 and S3.

The trajectories of a typical realization are plotted in Fig. 2 and the absolute errors of the fitted trajectories are shown in Fig. 3. As can be seen in these two figures, heterogeneity helps improve the prediction of testing data more than transportation, while introducing migration without heterogeneity of parameters worsens the estimate as can also be noticed from Table 4. More explanations can be found in “Further model evaluation”.

Additionally, Model 5 with prior distribution based on graph Laplacian lowers the absolute errors of predicted trajectories compared with Model 4.

As shown in Figs. 2 and 3, Models 1 and 2 have slightly better generalization accuracy than Model 5 for Province 3. On the one hand, for the data in this replica, the estimated $\lambda _3$ is 0.3860 using Model 5 and 0.4640 using Models 1 or 2 (recall that the ground truth $\lambda _3$ is 0.4). On the other hand, due to the randomness of the generated testing data, the sampled newly confirmed cases in Province 3 are considerably higher than the deterministic ones obtained by running (2.4) with the ground truth parameters. Hence, although all these estimates of $\lambda _3$ deviate from the ground truth 0.4, the estimates from Models 1 and 2, which are biased upward, lead to smaller absolute errors.

Results of parameter estimation

The mean and standard deviation of $\{\lambda _i\}$ estimated by the five models for the four provinces case are reported in Table 3. It can be observed that models allowing heterogeneity estimate the parameters more accurately, and that Model 5, which integrates the correlation, leads to a slightly better estimate of $\lambda _2$. Compared with Model 4, the estimates of the smaller $\lambda _k$’s (such as $\lambda _2,\lambda _3,\lambda _4$) become larger and the estimate of $\lambda _1$, which has the largest value, becomes smaller, since the graph Laplacian penalty tends to pull $\{\lambda _k\}_{k=1}^n$ closer to each other.

Moreover, we performed a sensitivity analysis for the hyper-parameter $\sigma$ to check that the results are robust to $\sigma$. The last two rows of Table 3 show the parameter estimation results for Model 5 with $\sigma =10^{-3}$ and $10^0$ respectively (more values of $\sigma \in [10^{-8}, 10^0]$ are tested and the results are similar as well). We observe that the variation of parameters estimated by Model 5 with $\sigma$ varying from $10^{-6}$ to $10^0$ does not exceed $1\%$. Therefore, the parameter estimation results are not sensitive to the choice of $\sigma$ as long as $\sigma$ is not too large.

Table 3 Estimated $\lambda _i$ with standard deviation using Models 1–5 for simulated data with four provinces.


Further model evaluation

The training and testing errors, MAE$^{[\mathrm Tr]}_{(w)}$, MAE$^{[\mathrm Te]}_{(w)}$, MSE$^{[\mathrm Tr]}_{(w)}$, MSE$^{[\mathrm Te]}_{(w)}$, as defined in Sect. B (Eq. (S5)) of Supplementary Information, are listed in Table 4 with mean and standard deviation. We remind the reader that the superscripts $^{[\mathrm Tr]}$ and $^{[\mathrm Te]}$ indicate that the errors are computed on training and testing data respectively, and the subscript $_{(w)}$ denotes that the error is the weighted average of the daily relative errors over time. Comparing the first four models in Table 4 shows that the presence of both heterogeneity and transportation helps reduce the training and testing errors. Comparing Model 4 and Model 5 shows that the graph Laplacian regularization leads to better prediction performance on average, although the improvement is not pronounced in this case because of the relatively large variance. The advantage of the proposed Model 5 is more evident when a larger number of regions is involved in the dynamic system, as shown in “Thirty provinces case”.

Table 4 Training and testing errors with standard deviation of Models 1–5 for simulated data with four provinces.


In addition, it can be seen from Table 4 that the errors increase greatly once transportation is included while heterogeneity remains absent. A possible explanation is that, without heterogeneity of parameters and without transportation between provinces, the estimated values of $\lambda _k$ are lower than the true values for group 1, so that the estimated newly confirmed cases are fewer than the true ones for provinces in group 1. For the same reason, the estimated newly confirmed cases are higher than the true ones in group 2. When transportation is considered, more confirmed cases are transferred from group 1 to group 2 than in the opposite direction. As a result, when the transmission parameters are homogeneous, migration between provinces worsens the prediction performance compared to the case without migration.

Furthermore, the last two rows of Table 4 report the training and testing errors for Model 5 with the same $\mu =10^{2.7}$ and with $\sigma =10^{-3}$ and $\sigma =10^0$ respectively. As a consequence of the robustness of the parameter estimation with respect to $\sigma$, the errors of Model 5 are also robust to $\sigma$. A similar analysis was performed for the other data sets with similar results, which we do not repeat. Hereinafter, the results are presented with $\sigma =10^{-6}$.

The plots of the mean of weighted and simply averaged testing errors MAE$^{[\mathrm Te]}_{(w)}$, MSE$^{[\mathrm Te]}_{(w)}$, MAE$^{[\mathrm Te]}_{(s)}$ and MSE$^{[\mathrm Te]}_{(s)}$ against varying $\mu$ are shown in Supplementary Figs. S2 and S3 respectively. Recall that the subscripts (w) and (s) denote the weighted and simple average respectively. Note that Model 5 with $\mu =10^{2.7}$, at which the averaged validation errors over replicas are minimized, achieves the minimal values of testing errors MAE$^{[\mathrm Te]}_{(w)}$ and MSE$^{[\mathrm Te]}_{(w)}$ (also MAE$^{[\mathrm Te]}_{(s)}$ and MSE$^{[\mathrm Te]}_{(s)}$). This is because validation and testing errors have the same distribution in this case.