Introduction

The coronavirus disease 2019 (COVID-19) is an ongoing pandemic that poses a global threat. As of March 26, 2020, more than 520,000 cases of COVID-19 have been reported in over 200 countries and territories, resulting in approximately 23,500 deaths1,2,3,4,5,6,7,8,9. In the United States, the first known positive case was identified in Washington state on January 20, 202010. By March 26, the epidemic had spread rapidly across many communities and was present in all 50 states plus the District of Columbia; the total number of confirmed cases in the United States had risen to 78,786, with 1137 deaths.

To combat the spread of COVID-19, the government has taken action on several fronts, including banning or discouraging domestic and international travel, announcing stay-at-home orders to curb non-essential interactions and thereby reduce the transmission rate, and urging commercial laboratories to increase testing capacity. To curb travel, on January 31, the United States government announced restrictions on travelers from China; on February 29, it announced a travel ban against Iran and advised caution when traveling to Europe11; on March 11, it announced travel restrictions on most European countries. To reduce human interactions, a national emergency was declared on March 13; as of March 28, 39 states had issued statewide or regional stay-at-home or shelter-in-place orders, requiring residents to stay indoors except for essential activities. To increase testing capacity, on February 4, the United States Food and Drug Administration (FDA) approved the test developed by the United States Centers for Disease Control and Prevention (CDC), which later proved inconclusive12; on February 29, the FDA relaxed its rules for some laboratories, allowing them to start testing before the agency granted its approval; on March 27, the FDA issued an Emergency Use Authorization to a medical device maker, Abbott Labs, for a coronavirus test that delivers rapid results13.

So far, since no treatment or vaccine for SARS-CoV-2 is available, these actions have been based largely on classic non-pharmaceutical epidemic controls. Studies evaluating similar measures in other countries, especially China, have started to emerge7,14,15. For example, the effect of travel restrictions on delaying the spread of the virus in China has been reported5,16. However, it is still unclear which control and intervention measures would have an actual effect, and to what extent, on abating the spread of COVID-19 in the United States. Because the United States differs markedly from China in its political, administrative, social, public health and medical systems, as well as in its culture, this remains a critical question to address, especially considering that some measures and policies come with extremely high economic and societal costs.

There have been numerous modeling studies projecting or predicting the trend of the COVID-19 pandemic regionally or globally17,18. Most of them apply a single global model to the entire study area, whether a region, a country, or the entire globe. The variation among different parts of an area and the interactions among those parts are rarely taken into consideration. However, a country like the United States is diverse in all aspects. On the one hand, the overall situation of the entire country emerges from local situations and their interactions, and thus ignoring the local interactions can hardly lead to a high-quality overall model; on the other hand, as all interventions and policies ultimately have to be adapted to the local situation, localized modeling will be much more relevant to real-world practices. Spatial and network-based epidemic models can describe the geographical spread of viral dynamics7,19,20,21. Recent studies have shown the importance of incorporating timely human mobility patterns derived from mobile phone big data and global flight networks into the epidemiological modeling process and into public health studies5,7,22,23,24,25,26,27,28,29,30. Without accurate models that incorporate human mobility patterns and spatial interactions26,27, it is rather challenging to quantify the sensitivity of parameters and to link the results to real-world practices in order to make sensible policy suggestions.

Accordingly, the core of the study is twofold. First, to localize the modeling, we developed a travel-network-based susceptible-exposed-infectious-removed (SEIR) mathematical compartmental model system that simultaneously characterizes the spatiotemporal dynamics of infections in 51 areas (50 states and the District of Columbia). Each state or district has its own model, and all models simultaneously take into account inflows and outflows of interstate travelers.

Second, to improve the practical relevance, we chose to use three parameters that can directly correspond to possible practical means to discover, combat, and control the spread of the disease, and quantify their impact on the final output of the model. The three parameters include: (1) the transmission rate b, which corresponds to the local social-distancing enforcement, e.g., the stay-home order; (2) the detection and reporting rate r, which corresponds to the testing capacity; and (3) the travel ratio \(\alpha _t\), which corresponds to the ratio of interstate travel volume compared to that of 2019 during the same period.

The modeling is a dynamic projection process (see the ‘Methods’ section). We employed daily and state-specific historical data to incrementally calibrate the model, and then used the calibrated model to predict future scenarios under different non-pharmaceutical control and intervention measures. During this process, we ran data assimilation methods to identify parameter values that optimally fit the current situation (see more details in the Methods and supplementary material). To project into the future, we set different values for the parameters to create different control and intervention scenarios, and then ran the simulation to see their impact on the model results. The final output of the model is the total number of confirmed cases in a state on a particular day. The current strategy in the United States is to isolate people who have the symptoms of COVID-19. An ideal scenario is to have a \(100\%\) reporting rate, i.e., every infected case is confirmed and thus isolated quickly. Another ideal setting is to have everyone who was in contact with the infected identified and isolated quickly as well. Our model incorporates these considerations and examines such direct isolation of the exposed compartment in detail. In particular, we investigated how the speed of such actions affects the outcome through mathematical modeling and scenario analysis.

A notable result from our modeling is that the impact of interstate travel restrictions on the model output is modest. This can be explained as follows: when the disease is already widespread in all states, the relatively small number of cases among travelers makes little difference to the local situation, compared with the effects of local social-distancing and isolation rules and of increased testing capacity.

Results

Figure 1 shows how the spatiotemporal dynamics of the infectious population across states respond to different coefficient configurations. An interactive map-based scenario simulation web dashboard is also available at https://geods.geography.wisc.edu/covid19/us_model. We set \(r = 1-\alpha _r(1-r_0)\) and \(b=\alpha _bb_0\), where \(r_0\) and \(b_0\) are the report rate and the transmission rate as of March 20, 2020, obtained from the data assimilation fit. By decreasing \(\alpha _r\) from 1 to 0, we increase the report rate from the original \(r_0\) to 1, and by decreasing \(\alpha _b\) we decrease the transmission rate. Most states, except a few such as NY, MI, and CA, see drastic improvement when the transmission rate is decreased and the testing (reporting) rate is increased, but the reduction of interstate traffic alone is not as effective. Our modeling reveals that once the epidemic in an area has reached a certain stage, the relatively small number of cases imported through interstate travel makes an insignificant difference to the local situation. According to our modeling, all states in the United States have reached that stage. Therefore, as long as travelers follow the social-distancing rules and the local government provides sufficient testing capacity, there is no apparent urgency to curb interstate travel. This is in line with the findings in16,28, in which the authors projected that the spread would pick up in other parts of China outside of Wuhan with a delay of about 3 days, and in the world outside China with a delay of 2–3 weeks, assuming no further screening is in place. Unlike in China, where the city of Wuhan was clearly the epicenter of the COVID-19 outbreak and the travel ban quickly brought the rest of China under control, most states in the United States already showed signs of community spread by March 20, 202031, and banning travel from other states will hardly make much difference to the local situation.
In addition, Fig. 2 shows the corresponding predicted time series of the infectious population in the top 15 states under two scenarios (see also Supplementary Fig. S14): (A) the report rate and the transmission rate remain unchanged from March 20, 2020, with \(\alpha _r = \alpha _b = 1\), in which case most states continue their exponential growth before reaching their peak; (B) \(\alpha _r = \alpha _b=0.1\), that is, when the transmission rate b is much smaller and the report rate r is much higher (closer to 1), we can “flatten the curve” (i.e., reduce the spread of the virus).
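The scenario scaling described above can be written as a short Python sketch. The function name `scenario_rates` and the baseline transmission rate `b0 = 1.2` are hypothetical choices for illustration; `r0 = 0.2266` is the average report rate quoted in the Methods, used here only as a convenient example value.

```python
def scenario_rates(b0, r0, alpha_b, alpha_r):
    """Map the scaling coefficients to scenario transmission and report rates."""
    b = alpha_b * b0                  # b = alpha_b * b_0
    r = 1.0 - alpha_r * (1.0 - r0)    # r = 1 - alpha_r * (1 - r_0)
    return b, r


b0 = 1.2      # hypothetical baseline transmission rate (inside the prior range [1, 1.5])
r0 = 0.2266   # average report rate over all states as of March 20 (from the Methods)

for alpha in (1.0, 0.5, 0.1, 0.0):
    b, r = scenario_rates(b0, r0, alpha_b=alpha, alpha_r=alpha)
    print(f"alpha_b = alpha_r = {alpha:.1f}: b = {b:.3f}, r = {r:.3f}")
```

Setting both coefficients to 1 recovers the fitted baseline, while \(\alpha _r = 0\) corresponds to every case being reported.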

We further investigate the effect of increased testing capacity and report rate. As shown in Fig. 3a, most states see drastic improvement when the report rate increases. By April 29, all states see a monotonic, exponential reduction in infections. The impact is strong in states such as MA, AZ, FL, and OR, but relatively weak in states such as NY, MI and IL. In Fig. 3b, we study the effect of \(\alpha _r\) and \(\alpha _b\) on the effective reproduction number \(R_e\) in NY (see other states in Supplementary Fig. S15). It can be seen that raising the report rate alone cannot bring \(R_e\) below 1. To mitigate the spread of COVID-19 in these states, a proactive approach needs to be taken: quick detection and isolation of the exposed population need to be in place, instead of being delayed until the onset of symptoms. This measure can prevent the exposed population from infecting other susceptible people. In Fig. 3c, we plot the increase of infections in terms of \(D_q\) (i.e., the temporal lag in putting a person into quarantine) for the states that are sensitive to changes in \(D_q\), including NY, NJ, IL, GA, MI, CO, WI, LA, TX, PA, MA, and TN. The longer one waits to inform and isolate the exposed population, the more infected people one observes. For example, there is a sharp transition for NY and MI: if the average detection and isolation time is more than 2 days, the total number of infections increases significantly.

The results again show the importance of sufficient testing and strong transmission-intervention measures such as social distancing and self-quarantine policies32. These policies can help quickly identify sources of infection and isolate them before they infect the remaining population. This measure presumably comes with a lower economic cost.

We finally investigate the robustness of our statements to the parameters chosen in the model. A number of parameters in the model are determined according to medical studies and thus necessarily carry uncertainty. One parameter, \(\gamma\), is especially hard to fix at a particular value due to the lack of medical evidence. This parameter reflects the level of infectiousness of the “exposed” compartment, a population that is presymptomatic. Recent studies indicate that presymptomatic patients seem to be more infectious than patients who have symptoms on site33. We therefore ran our model with different values of \(\gamma\) to assess the significance of this particular parameter. Our numerical results suggest that within a moderate range of \(\gamma\), our conclusions still hold. In particular, as shown in Fig. 4, when the “exposed” compartment is set to be more infectious than the “infected” compartment, the numerical solution shows the same trend. We still observe that, with a higher report rate, the non-infected population increases exponentially (i.e., fewer people would get infected), and that when a proactive approach is taken, meaning that the “exposed” compartment is quickly separated from the rest of the population, the non-infected population increases drastically as \(D_q\), the delay of the separation time, is shortened. This means that our conclusions are stable with respect to the parameter \(\gamma\), and the above statements are consistent.

We should emphasize that in our simulation, we do not differentiate patients with severe or mild symptoms. A more dedicated numerical experiment that separates the two categories could potentially give more detailed information. For example, in another agent-based modeling study34, researchers consider patients with mild to severe symptoms to evaluate the impacts of the timing of social distancing and adherence level on COVID-19 confirmed cases.

Discussion and conclusion

Modeling and analyzing the spread of COVID-19 and assessing the effect of various policies could be instrumental to national and international agencies for health response planning5,8,15,16,17,32. We show that the effect of interstate travel reduction is at most modest in the United States when the outbreak is already widespread in all states. On the other hand, strong transmission-reduction interventions and increased testing capacity and report rate are needed to contain the spread of the virus. The result is based on mathematical and statistical analyses of transmission control measures and is in agreement with previous findings2,3,5,14,15,16, which suggest that the effect of a travel ban at a later stage of the outbreak is rather modest. This is also in line with the fact that outbreaks still occurred in Europe despite the strong travel ban on the earlier epicenter of Wuhan and its surrounding cities in China. We also show quantitatively that transmission-reduction interventions, such as social-distancing and shelter-in-place policies, and an increased testing rate, which facilitates immediate isolation upon exposure, will significantly reduce the total infected population. This effect is most visible for the states of NY, NJ, MI, and IL. In particular, our modeling results show that for states such as NY and MI, achieving an optimal infection reduction requires a more proactive approach that quickly identifies the exposed population and isolates them within two days of exposure. The result is in agreement with previous findings7,8.

We do need to emphasize that the model itself does not distinguish between different modes of travel across states. Indeed, if interstate travel mostly involves transiting through busy airports and train stations, and the social-distancing policy is not strictly enforced, then the high population density at these places will raise the transmission rate b locally in space and time, leading to a higher infection rate. This is a severe consequence, but it should not be counted as a direct result of relaxing travel restrictions.

Moving forward, we estimate that the decline in travel has a modest effect on the mitigation of the pandemic. We need a stronger transmission-reduction intervention and increased detection and report rate in place to prevent the further spread of the virus. The results could potentially be used to design an optimal containment scheme for mitigating and controlling the spread of COVID-19 in the United States.

Methods

The mathematical model that simulates the spatiotemporal dynamics of state-level infections in the United States is a travel-network-based SEIR compartmental model, modified to take into account the variation among the 51 administrative units and their interactions14,35,36,37. It consists of 51 ordinary differential equation (ODE) systems, each characterizing the evolution of susceptible (S), exposed (E), reported (I), unreported (U) and removed (R) cases in one state (Supplementary Fig. S1; see more details in the supplementary material). The 51 ODE systems are coupled through the state-to-state travel network flows (see Supplementary Fig. S2), which were extracted from the aggregated SafeGraph mobility data and weighted by \(\alpha _t\)38,39. Unlike most other models, we also incorporate potential asymptomatic transmission, which changes the derivation of the basic reproduction number \(R_0\). In addition, each ODE system includes two unknown parameters: the transmission rate (b) and the report rate (r) of each state. The unknown parameters are inferred from the total number of confirmed cases in each state for the period of March 1–March 20, 2020. The source of the infection case data is the Center for Systems Science and Engineering at Johns Hopkins University9.

The parameters and model specification are defined as follows:

$$\begin{aligned} \left\{ \begin{aligned}&\frac{\mathrm {d} S_i}{\mathrm {d} t} = -\frac{b_i S_i(U_i+\gamma E_i)}{P_i} + \sum _{j\ne i}\alpha _t n_{ij}\frac{S_j}{P_j} -\sum _{j\ne i}\alpha _t n_{ji}\frac{S_i}{P_i} \\&\frac{\mathrm {d} E_i}{\mathrm {d} t} = \frac{b_i S_i(U_i+\gamma E_i)}{P_i} - \frac{E_i}{D_e} + \sum _{j\ne i}\alpha _t n_{ij}\frac{E_j}{P_j} -\sum _{j\ne i}\alpha _t n_{ji}\frac{E_i}{P_i} \\&\frac{\mathrm {d} I_i}{\mathrm {d} t} = r_i \frac{E_i}{D_e} - c_{I}\frac{I_i}{D_{c}} - (1-c_{I})\frac{I_i}{D_{l}} \\&\frac{\mathrm {d} U_i}{\mathrm {d} t} = (1-r_i)\frac{E_i}{D_e} - c_U\frac{U_i}{D_{c}} - (1-c_U)\frac{U_i}{D_{l}} + \sum _{j\ne i}\alpha _t n_{ij}\frac{U_j}{P_j} -\sum _{j\ne i}\alpha _t n_{ji}\frac{U_i}{P_i} \\&\frac{\mathrm {d} R_i}{\mathrm {d} t} = c_{I}\frac{I_i}{D_{c}} + (1-c_{I})\frac{I_i}{D_{l}} + c_U\frac{U_i}{D_{c}} + (1-c_U)\frac{U_i}{D_{l}} \end{aligned}\right. \,. \end{aligned}$$
(1)

The ODE system is equipped with the following initial data (\(t=0\) standing for March 1, 2020):

$$\begin{aligned} S_i(0) = N_i - E_{i0} - U_{i0}-I_{i0}\,, \quad E_i(0) = E_{i0}\,, \quad I_i(0) = I_{i0}\,, \quad U_i(0) = U_{i0}\,,\quad R_i(0) = 0. \end{aligned}$$
(2)

In Eq. (1), the unit of t is one day. \(N_i(t)\) is the total population of state i at time t, and \(P_i=S_i+E_i+U_i\) is the free population. \(n_{ij}\) is the inflow (number of travelers) from state j to state i. \(b_i\) and \(r_i\) are the transmission rate and reporting rate of state i. \(c_I\) (\(c_U\), resp.) is the proportion of reported cases I (unreported cases U, resp.) that are in critical condition. \(D_e\) is the latent period. \(D_{c}\) and \(D_{l}\) are the infectious periods of critical cases and mild cases. \(\alpha _t\) is a parameter that scales the traffic flow.
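To make the structure of Eq. (1) concrete, the right-hand side can be sketched in Python with NumPy, vectorized over states. This is a minimal illustration and not the implementation used in the study: the function name `seiur_rhs`, the matrix layout `n[i, j]` (inflow from state j to state i) and all numbers in the three-state example are assumptions made for illustration; the default parameter values are the ones listed in this section.

```python
import numpy as np


def seiur_rhs(S, E, I, U, n, b, r, alpha_t=0.5, gamma=0.5,
              D_e=5.2, D_c=2.3, D_l=6.0, c_I=0.1, c_U=0.2):
    """Time derivatives (dS, dE, dI, dU, dR) of Eq. (1) for all states at once.

    S, E, I, U, b, r are arrays with one entry per state;
    n[i, j] is the inflow from state j to state i.
    """
    n = np.array(n, dtype=float)
    np.fill_diagonal(n, 0.0)                 # the sums in Eq. (1) run over j != i
    P = S + E + U                            # free population P_i

    def net_flow(X):
        inflow = alpha_t * (n @ (X / P))            # sum_j alpha_t n_ij X_j / P_j
        outflow = alpha_t * n.sum(axis=0) * X / P   # sum_j alpha_t n_ji X_i / P_i
        return inflow - outflow

    new_exposed = b * S * (U + gamma * E) / P
    removed_I = c_I * I / D_c + (1 - c_I) * I / D_l
    removed_U = c_U * U / D_c + (1 - c_U) * U / D_l

    dS = -new_exposed + net_flow(S)
    dE = new_exposed - E / D_e + net_flow(E)
    dI = r * E / D_e - removed_I
    dU = (1 - r) * E / D_e - removed_U + net_flow(U)
    dR = removed_I + removed_U
    return dS, dE, dI, dU, dR


# Hypothetical three-state example; every number below is made up.
S = np.array([1.0e6, 2.0e6, 5.0e5])
E = np.array([100.0, 50.0, 10.0])
I = np.array([10.0, 5.0, 0.0])
U = np.array([40.0, 20.0, 5.0])
n = np.array([[0.0, 800.0, 100.0],
              [900.0, 0.0, 50.0],
              [120.0, 60.0, 0.0]])
b = np.full(3, 1.2)
r = np.full(3, 0.2)

derivs = seiur_rhs(S, E, I, U, n, b, r)
print("net change in total population:", sum(d.sum() for d in derivs))
```

Because travel only moves people between states and every flow out of one compartment enters another, the derivatives summed over all states and compartments should vanish up to rounding error, which is a convenient sanity check on any implementation.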

We emphasize two main differences in modeling compared with the existing literature. In7, the authors study inter-city traffic and its impact on the spread of COVID-19 in China. The situation in China and that in the US are very different. In China, the epicenter is clear: the city of Wuhan, Hubei province, where the outbreak started in mid-January 2020. The COVID-19 outbreak in the US, however, is multi-sourced. As a consequence, in the model of7 the initial condition for cities except Wuhan is clear: the latent, the reported and the unreported cases are all zero. In our model, however, the initial conditions \(E_{i0}\) are unclear for all states. The other major difference is that, according to clinical findings, the latent cases also have the potential to transmit the virus, and thus we add the interaction of \(E_i\) with \(S_i\) to the increment of \(E_i\)7,40,41.

The unknown parameters and state variables in the equation set are:

\(b_i\): the transmission rate, with non-informative prior range [1, 1.5];

\(r_i\): the report rate, with non-informative prior range [0.1, 0.3];

\(E_{i0}\): the initial data for the latent population, with non-informative prior range [0, 500];

\(U_{i0}\): the initial data for the unreported population, with non-informative prior range [0, 200];

\(S_{i0}\): the initial data for the susceptible population, defined by \(N_i-E_{i0}-I_{i0}-U_{i0}\).

Other parameters are:

\(\gamma\): the ratio of the transmission rate of the latent population to that of the unreported population. In the simulation we set it to 0.5.

\(D_c\): the average duration of infection for critical cases. We assume \(D_c = 2.3\) days42.

\(D_e\): the average latent period. According to43, \(D_e = 5.2\) days.

\(D_l\): the average duration of infection for mild cases. We assume \(D_l = 6\) days.

\(\alpha _t\): the ratio of interstate travel volume compared to that of 2019 during the same period. The travel flow information \(n_{ij}\) was extracted from the SafeGraph mobility data, and we set \(\alpha _t=0.5\) to represent the travel reduction observed in 2020.

\(c_{I}\): the proportion of critical cases among all reported cases. We choose \(c_{I} = 0.1\).

\(c_{U}\): the proportion of critical cases among all unreported cases. We assume \(c_{U} = 0.2\).

There is an essential assumption made in the model: homogeneity of the population. It means that the traffic flow is a good representation of the total population, without considering demographic and socioeconomic characteristics. The susceptible, exposed, and unreported populations move in and out of states at the same rate. This explains the \(\frac{S_i}{P_i}\), \(\frac{E_i}{P_i}\) and \(\frac{U_i}{P_i}\) terms in the equations for \(S_i\), \(E_i\) and \(U_i\).

The effective reproductive number \(R_e\) could be computed as

$$\begin{aligned} R_e = \frac{b}{E+U}\left[ \gamma D_e E + \frac{D_c D_l U}{c_U D_l + (1-c_U)D_c}\right] \,. \end{aligned}$$
(3)

\(R_e\) depends on time due to the time dependence of E and U.
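As a quick numerical check of Eq. (3), the following Python sketch evaluates \(R_e\) for given E and U. The function name `effective_reproduction_number` and the example inputs (b = 1.2, E = 100, U = 40) are hypothetical; the default parameter values are the ones listed above.

```python
def effective_reproduction_number(b, E, U, gamma=0.5, D_e=5.2,
                                  D_c=2.3, D_l=6.0, c_U=0.2):
    """Evaluate Eq. (3) for one state at one time instant."""
    # Mean infectious duration of an unreported case: the reciprocal of
    # the removal rate c_U / D_c + (1 - c_U) / D_l.
    mean_duration_U = D_c * D_l / (c_U * D_l + (1 - c_U) * D_c)
    return b / (E + U) * (gamma * D_e * E + mean_duration_U * U)


# Hypothetical inputs, for illustration only.
print(effective_reproduction_number(b=1.2, E=100.0, U=40.0))
```

The two limiting cases are instructive: with only latent cases (U = 0) the expression reduces to \(b\gamma D_e\), and with only unreported cases (E = 0) it reduces to b times the mean infectious duration of an unreported case.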

The COVID-19 transmission dynamics (the ODE system) were simulated using the Forward Euler method, with each day discretized into 24 smaller time steps to ensure numerical stability (see Supplementary Fig. S3). The parameter fitting was conducted under a Bayesian formulation that combines the underlying dynamics governed by the ODE system, serving as the prior knowledge, with the collected data, appearing in the likelihood function, to generate the posterior distribution that characterizes the behavior of the state variables, including SEIUR, as well as the two unknown parameters, b and r. For this classical data assimilation problem, we employed the Ensemble Kalman Filter method, which is derived from the Kalman filter and tailored to problems with high-dimensional state variables44,45. The method has proved effective when the measurement operator is linear and the underlying dynamics are Gaussian-like, and it has also been applied to a vast range of problems that do not strictly satisfy the Gaussianity requirement. To apply this method, we generated 2000 samples according to the prior distribution and evolved the samples through the dynamics of the ODE system. The samples were then corrected at the end of each day, using the announced number of confirmed cases, to tune the two unknown parameters b and r.
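The time stepping can be illustrated with a small Python sketch of the Forward Euler method using 24 sub-steps per day. To keep the example self-contained it uses a single state without travel terms, which is a simplification of Eq. (1); the function names, the initial values, and the choices b = 1.2 and r = 0.2 are hypothetical.

```python
import numpy as np


def rhs(y, b, r, gamma=0.5, D_e=5.2, D_c=2.3, D_l=6.0, c_I=0.1, c_U=0.2):
    """Single-state version of Eq. (1) with the travel terms dropped (assumption)."""
    S, E, I, U, R = y
    P = S + E + U
    new_exposed = b * S * (U + gamma * E) / P
    removed_I = c_I * I / D_c + (1 - c_I) * I / D_l
    removed_U = c_U * U / D_c + (1 - c_U) * U / D_l
    return np.array([-new_exposed,
                     new_exposed - E / D_e,
                     r * E / D_e - removed_I,
                     (1 - r) * E / D_e - removed_U,
                     removed_I + removed_U])


def euler_one_day(y, b, r, substeps=24):
    """Advance the state by one day using `substeps` Forward Euler steps."""
    h = 1.0 / substeps                 # step size in days
    for _ in range(substeps):
        y = y + h * rhs(y, b, r)
    return y


# Hypothetical initial state (S, E, I, U, R) and parameters.
y = np.array([1.0e6, 100.0, 0.0, 40.0, 0.0])
total0 = y.sum()
for day in range(20):
    y = euler_one_day(y, b=1.2, r=0.2)
print("state after 20 days:", y)
```

Forward Euler is explicit and first-order accurate, so the step size has to be small relative to the fastest rate in the system; an hourly step (1/24 day) is short compared with the 2.3-day shortest duration used here.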

At the beginning of the simulation, March 1, only a few states had non-zero confirmed cases. The true numbers of exposed people and unreported cases on that day, however, are unknown. These two numbers are also state variables that need to be inferred using the collected infection data. On March 1, we put non-informative priors with ranges [0, 500] and [0, 200] on the exposed (latent) population and the unreported infectious population in each state, respectively. Supplementary Figs. S4–S13 show the data assimilation results for different states, including the number of people in different compartmental groups and their temporal changes with \(95\%\) credible intervals. The average reporting rate r over all states, obtained through the data assimilation method, is 0.2266 at the end of March 20.
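The sampling of the priors and a single ensemble correction step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses the perturbed-observation variant of the Ensemble Kalman Filter with a scalar observation, the function name `enkf_update` is hypothetical, the observed value (20 confirmed cases) and observation variance (4) are made up, and the "forecast" of reported cases is a crude one-day surrogate rather than the full ODE evolution.

```python
import numpy as np


def enkf_update(ensemble, H, y_obs, obs_var, rng):
    """One EnKF analysis step with a scalar, linear observation H @ x.

    ensemble: (N, d) array of augmented states; H: (d,) observation vector.
    Uses perturbed observations (an assumed variant of the filter).
    """
    N = ensemble.shape[0]
    predicted = ensemble @ H                          # predicted observations, (N,)
    X_anom = ensemble - ensemble.mean(axis=0)         # state anomalies
    y_anom = predicted - predicted.mean()             # observation anomalies
    cov_xy = X_anom.T @ y_anom / (N - 1)              # cross-covariance, (d,)
    var_y = y_anom @ y_anom / (N - 1)                 # predicted-observation variance
    gain = cov_xy / (var_y + obs_var)                 # Kalman gain, (d,)
    perturbed = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=N)
    return ensemble + np.outer(perturbed - predicted, gain)


rng = np.random.default_rng(0)
N = 2000                                  # ensemble size used in the text
b = rng.uniform(1.0, 1.5, N)              # prior range of the transmission rate
r = rng.uniform(0.1, 0.3, N)              # prior range of the report rate
E0 = rng.uniform(0.0, 500.0, N)           # prior range of the latent population
U0 = rng.uniform(0.0, 200.0, N)           # prior range of the unreported population
I1 = r * E0 / 5.2                         # crude one-day surrogate forecast (assumption)

prior = np.column_stack([b, r, E0, U0, I1])
H = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # observe only the reported cases
posterior = enkf_update(prior, H, y_obs=20.0, obs_var=4.0, rng=rng)

print("prior mean    :", prior.mean(axis=0))
print("posterior mean:", posterior.mean(axis=0))
```

The update shifts every component that is correlated with the observed quantity, which is how the unobserved parameters and initial conditions are tuned from case counts alone. Note that this sketch does not constrain the updated samples to stay inside the prior ranges.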

For forecasting (see the supplementary material), we performed scenario studies of two types. First, we ran the mathematical model forward for 40 days from the initial data obtained as of March 20, with different configurations of \((b,r,\alpha _t)\). The simulation results from these settings were then compared with those from the baseline setting in which the three parameters remained unchanged for each state. To quantify and visualize the difference, we compared the percentage increase of the non-affected population when the measures of stay-at-home orders, increased testing rate, and travel bans were enacted.

The second scenario concerned a more ideal situation: every confirmed case would be isolated immediately, as would those who had been exposed to the confirmed cases, regardless of whether they had started to show symptoms. We built a new mathematical model that incorporated such isolation to study its effect. A new quarantined compartment (Q) was introduced into the model. Through the simulation, we examined the correlation between the average action-taking time (i.e., the temporal lag in putting a person into quarantine, denoted by \(D_q\)) and the increase of the non-infected population. In both scenario studies, the simulation was run with the Forward Euler ODE solver, with each day divided into 24 intervals to achieve numerical stability.

As an SEIR-type epidemic model, this model describes the dynamics of different compartments of the population and assumes homogeneity within each compartment. However, we should note that this assumption may not be valid in real-world scenarios with heterogeneous populations and infections. Indeed, when an individual contracts the disease, the status could be either mild or severe. In our model, this is absorbed by the report rate \(r_i\) but is not explicitly differentiated. A more sophisticated model should include such heterogeneities, but that would impose a significantly higher computational demand and require more detailed empirical or clinical data. We leave that to future research efforts.

Figure 1

The spatiotemporal distribution of the predicted infected population (in natural logarithm scale) across all states under different simulation scenarios: (A) \(\alpha _r = 1\) and \(\alpha _b = 1\), i.e., all parameters took the values of the initial configuration, obtained through the data assimilation method using the numbers of confirmed cases during March 1 – March 20, 2020; (B) the travel flow was reduced to \(\alpha _t = 0.05\), while the other parameter values remained unchanged; (C) \(\alpha _r = 0.1\) and \(\alpha _b=1\); (D) \(\alpha _r = 1\) and \(\alpha _b=0.1\); (E) \(\alpha _r = 0.1\), \(\alpha _b=0.1\). In the simulations, the transmission rate was set to \(b = \alpha _b b_0\) and the reporting rate was set to \(r = 1-\alpha _r(1-r_0)\), where \(r_0\) and \(b_0\) were the reporting rate and the transmission rate on March 20, 2020, inferred from the data assimilation step (Note: the maps were created using Esri’s ArcGIS 10.7 software).

Figure 2

The prediction time series of the total infected population in the 15 most affected states under two scenarios: (A) \(\alpha _r = \alpha _b = 1\), i.e., both the reported rate and the transmission rate remained unchanged; (B) \(\alpha _r = \alpha _b=0.1\), i.e., the transmission rate b was smaller and the reported rate r was larger (closer to 1) as \(r = 1-\alpha _r(1-r_0)\).

Figure 3

(A) Susceptible population (S) on April 29, 2020 as a function of \(\alpha _r\). \(S(\alpha _r = 1)\) is the susceptible population on April 29 computed with the report rate set to the original report rate inferred from the data assimilation step. In all states, S increases as \(\alpha _r\) decreases, meaning that more people stay unaffected when a higher report rate is achieved. (B) \(R_e\), the effective reproduction number, on April 29 for different \(\alpha _b\) and \(\alpha _r\) in NY. The red line is the level set \(R_e = 1\). It can be seen that increasing the report rate helps diminish the reproduction number, but cannot reduce \(R_e\) below 1 if the original transmission rate \(b_0\) is applied. (C) Susceptible population on April 29 for different \(D_q\). \(S(\alpha _r = 1)\) is the same as in (A). S depends significantly on the period from exposure to quarantine.

Figure 4

(A) Susceptible population (S) on April 4, 2020 as a function of \(\alpha _r\). The panels on the left and on the right are results from \(\gamma = 0.5\) and \(\gamma = 1.5\), respectively. For both values of \(\gamma\), S increases as \(\alpha _r\) decreases, meaning that more people stay unaffected when a higher report rate is achieved. (B) Susceptible population on April 4, 2020 for different \(D_q\) and different values of \(\gamma\). For those states whose susceptible population is much smaller than their total population due to a high infection rate (such as NY), S depends significantly on \(D_q\) for both \(\gamma <1\) and \(\gamma >1\).