Abstract
We combine COVID-19 case data with mobility data to estimate a modified susceptible-infected-recovered (SIR) model in the United States. In contrast to a standard SIR model, we find that the incidence of COVID-19 spread is concave in the number of infectious individuals, as would be expected if people have interrelated social networks. This concave shape has a significant impact on forecasted COVID-19 cases. In particular, our model forecasts that the number of COVID-19 cases would grow exponentially only for a brief period at the beginning of the contagion event or right after a reopening, but would quickly settle into a prolonged period of stable, slightly declining levels of disease spread. This pattern is consistent with observed levels of COVID-19 cases in the US, but inconsistent with standard SIR modeling. We forecast rates of new cases for COVID-19 under different social distancing norms and find that if social distancing is eliminated there will be a massive increase in the cases of COVID-19.
Introduction
The COVID-19 pandemic has caused great disruption. Over 43 million people have confirmed diagnoses of the disease, and over 1 million people have died from it^{1}. It has also had substantial impacts on daily lives and economic activities^{2,3}. Many studies have focused on measuring who is affected the most by COVID-19^{4,5}, or which therapies are appropriate at each stage of the disease^{6,7,8}. However, it is also crucial to understand how the spread of COVID-19 depends on preventive measures such as social distancing and how reopening may affect the spread.
The most common model used to study the spread of COVID-19 is the susceptible-infected-recovered (SIR) model. In such models, there is a susceptible population, which is assumed to be equal to the population of whichever region is being examined minus the number of people who have previously had the disease. Some of the susceptible individuals get infected in each period, where the rate of infection is a function of the number of infectious individuals as well as other factors that shift the rate of transmission. Finally, infectious individuals move to a state of recovery. In our analysis, we refer to anyone who was sick but is no longer infectious as “recovered,” although some of these people may still actually be sick, hospitalized, or have died. Thus, the recovered terminology is a shorthand for all post-infectious states. This model, and its variants, have been used extensively to study the growth of COVID-19. For example, a recent study estimates a Susceptible-Exposed-Infected-Confirmed-Removed (SEIQR) model, which augments the standard SIR model with a stage modeling susceptible people who become exposed to the virus and a stage modeling infected people who are confirmed to have the disease^{9}. The paper then applies this model to estimate COVID-19 transmission in Wuhan, China, showing that an earlier lockdown makes the outbreak worse in Wuhan but helps the rest of the world. The SEIQR model is also used to show that travel restrictions may have reduced the spread of COVID-19 from Wuhan, China, to other Chinese cities^{10,11}. As another variant of the SIR model, the SEIR model (which adds an exposure stage to the SIR model) has been applied to compute the rate of transmission both from animals to people and from people to people^{12}. While it may seem that having more stages in the model would make the SEIR model superior to the SIR model, it has been shown that the standard SIR model does a better job at predicting the spread of COVID-19, based on data from Wuhan, China^{13}.
In this paper, we use a modified version of the SIR model to measure the extent to which social distancing reduces the speed at which COVID-19 spreads. We then run simulations to forecast the rates of COVID-19 spread under different social distancing levels during the reopening. We find that COVID-19 spreads less than proportionately with the number of infectious individuals, a distinct difference from the assumption of standard models. We demonstrate that this pattern could be explained by the interconnectedness of people’s social networks. This pattern suggests that each additional infectious individual has less impact on the disease spread as more people become infected. One key implication of this finding is that the rate of disease growth can be slow and steady, rather than either exponential or falling quickly, as would be implied by the most commonly used models. This leads to more accurate predictions of the spread of COVID-19. We also observe that social distancing greatly reduces the spread of COVID-19.
Mathematically, we model transmission of COVID-19 as
\[y_{i,t}=R_{i,t}\,S_{i,t}\,\left( Y_{i,t-2}-Y_{i,t-8}\right) ^{\omega } \qquad (1)\]
where \(y_{i,t}\) is the number of new infections in county i on date t, \(R_{i,t}\) is the rate at which infectious individuals transmit the disease, \(S_{i,t}\) is the percentage of the county population that has not yet had COVID-19, and \(Y_{i,t}\) is the cumulative number of individuals who have been infected by date t. Correspondingly, the \(Y_{i,t-2}-Y_{i,t-8}\) term reflects our assumption that infected individuals are infectious from the second day after they catch the virus through the seventh day. This implies that the average serial interval is 4.5 days under the assumption that the level of infectiousness and the level of contact with susceptible individuals are constant during this time^{14}. This treatment of the infectious population is an approximation of the standard SIR model, where the infectious population is typically modeled as a stock that has a constant outflow rate. Discretizing the rate of transmission enables the estimation of a large number of county and date fixed effects in our model, and as a practical matter this assumption has little impact on our estimates of the contagion rate. As a robustness check, we obtain extremely similar COVID-19 forecasts if we take the time of infectiousness to be 14 days, \(Y_{i,t-2}-Y_{i,t-16}\), instead of 6 days, as presented in the appendix. The main difference between our model and the standard SIR model is the inclusion of the exponent \(\omega\) on the number of infectious individuals. This \(\omega\) allows the rate of growth of COVID-19 to be less than proportionate with the number of infectious individuals if \(\omega <1\). Such a result would be expected if infectious individuals expose many of the same unexposed individuals, which could occur if people have overlapping social connections. We see this directly when, for example, cases are clustered within households, nursing homes, or places of work. Thus, we can think of \(\omega\) as measuring the extent to which people’s networks are more interconnected with a tight-knit group of individuals relative to their level of connectedness to the population as a whole.
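To make the role of \(\omega\) concrete, the following minimal Python sketch evaluates an incidence function of the form described above, new infections = \(R \times S \times\) (infectious individuals)\(^{\omega}\). The function name and the values of \(R\) and \(S\) are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
def new_infections(R, S, infectious, omega=0.57):
    """Incidence of the form y = R * S * infectious**omega."""
    return R * S * infectious ** omega

# Hypothetical transmission rate and susceptible share, for illustration only.
R, S = 0.5, 0.95

for infectious in (100, 200, 400):
    concave = new_infections(R, S, infectious, omega=0.57)
    proportional = new_infections(R, S, infectious, omega=1.0)
    print(infectious, round(concave, 2), round(proportional, 2))

# Doubling the infectious population multiplies new infections by 2**omega,
# which is less than 2 whenever omega < 1.
ratio = new_infections(R, S, 200) / new_infections(R, S, 100)
print(round(ratio, 3))
```

With \(\omega = 0.57\), doubling the number of infectious individuals raises new infections by a factor of \(2^{0.57}\), roughly 1.48, rather than 2 as in the proportional case.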
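We also allow the transmission rate \(R_{i,t}\) to vary with a number of factors instead of treating it as a constant parameter:
\[R_{i,t}=\exp \left( \alpha _{i}+\beta _{t}+\lambda d_{i,t}+\mu m_{i,t}+\theta h_{i,t}+\varepsilon _{i,t}\right) \qquad (2)\]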
We use \(d_{i,t}\), \(m_{i,t}\), and \(h_{i,t}\) to represent the level of social distancing, temperature, and humidity in county i on date t, respectively, and \(\varepsilon _{i,t}\) is the statistical error term. The parameters \(\alpha\) and \(\beta\) are vectors of county and date fixed effects, where the ith element of \(\alpha\), \(\alpha _{i}\), represents the fixed effect for county i. Similarly, the tth element of \(\beta\), \(\beta _{t}\), represents the fixed effect for date t. These fixed effects measure the baseline transmission rate of each county and each date, respectively. The parameters \(\lambda\), \(\mu\), and \(\theta\) measure the impacts of social distancing, temperature, and humidity on transmission rates, respectively. In short, this specification allows transmission rates to differ across counties (through the county fixed effects), dates (through the date fixed effects), levels of social distancing, temperatures, and humidity. We note that the impacts of the last two factors have been debated in the literature^{15,16,17}. The county fixed effects account for differences in demographics across counties, such as the demographics shown in Table 2 below as well as other unobservable county-specific factors. The date fixed effects account for both day-of-the-week differences in the patterns of travel for people (e.g., the time away from the house to go to work or to go to the park, which may lead to different exposures to the disease) as well as differences in the rate of testing and reporting that occur across time. As a robustness check, we also include the state-level testing numbers directly in Eq. (2) during estimation. The results are statistically indistinguishable from the main results, as noted in “Results and simulation” below.
The social distancing measure, \(d_{i,t}\), is based on cellphone GPS location data that are provided by SafeGraph for free to researchers studying COVID-19. We measure social distancing as the first principal component of several daily measures for each county: the percentage of residents staying home, the percentage of residents working at their workplace full-time, the percentage of residents working at their workplace part-time, the median duration of residents staying home, and the median distance traveled by residents.
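As an illustration of how a single index can be extracted from several mobility measures, the sketch below computes the first principal component of a standardized data matrix with NumPy. The data are synthetic, and the function name, the latent factor, and the assumed sign of each measure are hypothetical; this is not the authors' code or the SafeGraph data.

```python
import numpy as np

def first_principal_component(X):
    """Return scores, loadings, and variance share of the first principal component."""
    # Standardize each column so that no measure dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardized data.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    scores = Z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Synthetic county-day observations driven by one hypothetical latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
signs = np.array([1.0, -1.0, -1.0, 1.0, -1.0])  # assumed direction of each measure
noise = rng.normal(scale=0.3, size=(500, 5))
X = latent[:, None] * signs + noise

scores, loadings, explained = first_principal_component(X)
print(loadings.round(2), round(float(explained), 3))
```

The sign of a principal component is arbitrary, so in practice the index would be oriented so that larger values correspond to more distancing.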
As noted earlier, the most crucial difference between our model and a standard SIR model is that a standard SIR model constrains the exponent \(\omega =1\). We instead find that \(\omega =0.57\). Thus, the marginal impact of one more infected person diminishes as more people are infected. Such a result would be expected if infectious individuals expose many of the same unexposed individuals within a clustered network of individuals. In the appendix we demonstrate that a networking model with contagion can yield \(\omega <1\).
Results and simulation
The estimated model appears in Table 1, with standard errors (s.e.) reported in parentheses. The estimated exponent on the number of infectious people, \(\omega\), is 0.57. Thus, the number of new infections is concave with respect to the number of infectious individuals. This level of concavity also implies that while initial outbreaks of COVID-19 expand exponentially, the daily number of new cases quickly stabilizes to a long-term plateau. We also find that social distancing has a large impact on the growth rate of COVID-19, while humidity has a smaller effect and temperature is insignificant. (When including daily testing numbers of each state in Eq. (2), the estimates of \(\omega\) and social distancing are 0.568 (s.e. = 0.014) and −0.816 (s.e. = 0.246), respectively.)
All county-level demographic factors remain constant over time in our analysis. While our main regression gives many insights, the impacts of these demographic factors on the spread of the virus are captured by the county fixed effects. In order to better understand how these factors affect the contagion rate, we next regress the county fixed effects \(\alpha\) on several demographic variables of each county. The coefficients from this regression should be thought of as the impacts of these demographics on the transmission rate. The results from this regression are reported in Table 2. We observe that contagion increases with population density and with the percentage of commuters who use public transportation. We also observe that contagion rates are higher in areas with a higher fraction of Black and Hispanic residents. Furthermore, the rate of spread is higher for seniors than for younger people, but children and non-senior adults do not seem to have statistically significantly different rates of contagion.
We next measure the out-of-sample prediction accuracy of our model using a holdout sample of 75 days (May 24–August 6) to see how well our model forecasts new cases. We use the observed county level of daily social distancing for our out-of-sample predictions. Nationally, this reflects an approximately 50–60% return-to-normalcy, but this varies quite a bit across the country. We define the percentage return-to-normalcy as \(\frac{SocialDistancingPeak_{i}-SocialDistancing_{i,t}}{SocialDistancingPeak_{i}-SocialDistancingBeforeCOVID_{i}}\), where \(SocialDistancingPeak_{i}\) is the social distancing level in county i at its peak (April 5–April 11, 2020), \(SocialDistancingBeforeCOVID_{i}\) is the observed lowest level of social distancing in February, and \(SocialDistancing_{i,t}\) represents the social distancing level on date t. For example, a 25% return-to-normalcy represents social distancing at the level of 0.25\(\times\)(minimum social distancing) + 0.75\(\times\)(maximum social distancing).
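The return-to-normalcy definition can be checked with a short calculation. The sketch below implements the ratio and its inverse (the distancing level implied by a given return-to-normalcy share); the function names and the numeric distancing levels are hypothetical and used only to confirm that the two expressions in the text agree.

```python
def return_to_normalcy(peak, current, before):
    """Share of the gap between peak and pre-COVID distancing that has been undone."""
    return (peak - current) / (peak - before)

def distancing_at(share, peak, before):
    """Distancing level implied by a given return-to-normalcy share."""
    return share * before + (1.0 - share) * peak

# Hypothetical index values: peak distancing 2.0, pre-COVID (minimum) level -1.0.
peak, before = 2.0, -1.0

level_25 = distancing_at(0.25, peak, before)   # 0.25 * minimum + 0.75 * maximum
share_back = return_to_normalcy(peak, level_25, before)
print(level_25, share_back)
```

By construction, a county at its peak level has a 0% return-to-normalcy and a county back at its February minimum has a 100% return-to-normalcy.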
Figure 1 shows the actual cumulative cases in the US along with out-of-sample forecasts from a model with \(\omega =0.57\) and a standard model with \(\omega =1\). The black hashed line represents the actual cumulative cases in the US. The green solid line and the red dashed line show the out-of-sample forecasts with \(\omega =0.57\) and \(\omega =1\), respectively. We readily observe that the model with \(\omega =0.57\) fits the data well while the model with \(\omega =1\) does not. Three states that had their Shelter-in-Place orders expire or struck down early are Florida, Georgia, and Wisconsin. To further evaluate our model’s accuracy in prediction, we repeat the same out-of-sample prediction comparisons for these three states in Figure 2. The figure again shows that the model with \(\omega =0.57\) has a much better fit than the model with \(\omega =1\).
We next simulate daily and cumulative cases from August 7 to October 31, 2020 under different levels of social distancing. When forecasting future cases, we use previous 5-year county temperature data and the May 2020 county average humidity. The top of Figure 3 shows three sets of forecast daily cases after August 6, corresponding to 75%, current, and 25% levels of return-to-normalcy. We observe that social distancing at the current 60% return-to-normalcy first leads to a slightly increasing but then slowly decreasing number of cases, going from around 55,000 cases per day in early August to 25,000 cases per day at the end of October. If the US practiced social distancing at the level reflecting a 25% return-to-normalcy for even a few weeks, new cases would drop to a much lower level of around 9,000 per day. On the other hand, a 75% return-to-normalcy would cause cases to surge for about two months. The pattern of the surge is consistent with recent studies on the relaxation of non-pharmaceutical interventions such as shelter-in-place orders^{18}. However, after two months cases would again reach a long-term plateau, although this would occur at a level almost double what would be experienced under the early-August level of social distancing. The bottom of Figure 3 depicts the corresponding cumulative cases for the same time period under 100%, 75%, current, 25%, and 0% levels of return-to-normalcy. The figure shows a consistent pattern where the cumulative cases look almost linear after the initial takeoffs. There would be substantially more cases if we returned to the pre-COVID level of social distancing.
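The plateau behavior can be reproduced qualitatively with a toy forward simulation. The sketch below iterates a single-region recursion in which daily new cases equal \(R \times S \times\) (cases from 2 to 7 days ago)\(^{\omega}\), holding \(R\) constant. The function name, the population, the seed, and the two \(R\) values are hypothetical and are not the paper's estimates or its county-level simulation.

```python
def simulate(R, omega, days, population, seed=10.0):
    """Iterate y_t = R * S_t * (Y_{t-2} - Y_{t-8})**omega for a single region."""
    # Eight days of cumulative-case history; the seed cases arrive on day 6.
    Y = [0.0] * 6 + [seed, seed]
    daily = []
    for _ in range(days):
        t = len(Y)
        infectious = Y[t - 2] - Y[t - 8]            # infected 2 to 7 days ago
        S = max(1.0 - Y[t - 1] / population, 0.0)   # share still susceptible
        y = min(R * S * infectious ** omega, population - Y[t - 1])
        daily.append(y)
        Y.append(Y[t - 1] + y)
    return daily

# Hypothetical parameters, chosen only to show the two qualitative regimes.
concave = simulate(R=7.0, omega=0.57, days=120, population=1e7)
linear = simulate(R=0.25, omega=1.0, days=120, population=1e7)

for day in (10, 30, 60, 90, 119):
    print(day, round(concave[day]), round(linear[day]))
```

Under these assumed values, the \(\omega = 0.57\) path rises quickly from the seed and then stays nearly flat, while the \(\omega = 1\) path keeps compounding until the susceptible pool starts to deplete.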
Methods
In this section, we detail the assumptions we make and the estimation procedure. The model is laid out in Eqs. (1) and (2) above. For simplicity, we rewrite Eq. (2) as \(R_{i,t}=\exp \left( X_{i,t}^{\prime }\Phi +\varepsilon _{i,t}\right)\), where \(X_{i,t}\) includes county dummy variables, date dummy variables, the measure of social distancing \(d_{i,t}\), and daily average temperature \(m_{i,t}\) and humidity \(h_{i,t}\). \(\Phi\) is the vector containing the parameters \(\alpha\), \(\beta\), \(\lambda\), \(\mu\), and \(\theta\), which measure the impact of each element in the vector \(X_{i,t}\) on the transmission rate \(R_{i,t}\). We assume that the errors \(\varepsilon _{i,t}\) are uncorrelated across counties. We further assume that \(\varepsilon _{i,t}\) is uncorrelated across time, although we cluster the standard errors by county.
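We estimate the model by taking the logarithm of both sides. After rearranging we get:
\[\log \left( y_{i,t}\right) -\log \left( S_{i,t}\right) =X_{i,t}^{\prime }\Phi +\omega \log \left( Y_{i,t-2}-Y_{i,t-8}\right) +\varepsilon _{i,t}\]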
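Note that \(y_{i,t}\), the diagnosed case number, is 0 for some counties on some dates. Therefore, we adjust this formula slightly by adding 1 to \(y_{i,t}\) so the logarithmic values are always well-defined:
\[\log \left( y_{i,t}+1\right) -\log \left( S_{i,t}\right) =X_{i,t}^{\prime }\Phi +\omega \log \left( Y_{i,t-2}-Y_{i,t-8}\right) +\varepsilon _{i,t}\]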
In some counties, \(Y_{i,t-2}-Y_{i,t-8}\) is 0 for some periods. We do not use those observations for estimation. Note that because this is a lagged variable, this is a selection based on independent variables and not based on dependent variables, and hence it does not bias our estimation.
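The log-linear estimation step can be sketched on synthetic data. The example below generates cases from a multiplicative model, drops observations with no infectious individuals, applies the add-one adjustment to the case count, and recovers \(\omega\) and the distancing coefficient by ordinary least squares. It is a simplified illustration under stated assumptions: a single intercept replaces the county and date fixed effects, temperature and humidity are omitted, and all variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_omega, true_lambda = 0.57, -0.8   # illustrative values for the synthetic data

infectious = rng.integers(0, 5000, size=n).astype(float)  # stands in for Y_{t-2} - Y_{t-8}
S = rng.uniform(0.8, 1.0, size=n)                          # susceptible share
d = rng.normal(size=n)                                     # social distancing index
eps = rng.normal(scale=0.2, size=n)
R = np.exp(3.0 + true_lambda * d + eps)
y = R * S * infectious ** true_omega

# Drop observations with no infectious individuals, since log(0) is undefined.
keep = infectious > 0
lhs = np.log(y[keep] + 1.0) - np.log(S[keep])              # add 1 so the log is defined
X = np.column_stack([np.ones(keep.sum()), d[keep], np.log(infectious[keep])])

coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
intercept_hat, lambda_hat, omega_hat = coef
print(round(float(lambda_hat), 3), round(float(omega_hat), 3))
```

Because the selection rule depends only on the lagged regressor, dropping those rows does not distort the slope estimates in this synthetic setting.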
One concern that can arise in estimating this model is that social distancing levels (and regulations) are not determined in a vacuum: people social distance more in areas that are hit harder by COVID-19. Thus, \(\varepsilon _{i,t}\) may be correlated with social distancing, causing a biased measurement of the impact of social distancing on the rate of contagion. We thus use an instrumental variables (IV) technique to control for this endogeneity bias, where the amount of rain is our instrument for social distancing. Specifically, we assume that rain directly shifts the level of social distancing, but is not correlated with \(\varepsilon _{i,t}\) conditional on the temperature and humidity. Several other papers have used rain as an instrument for social distancing^{19,20,21,22}. The first-stage F-statistic for the strength of rain as an instrument is 214.44, which is highly significant, indicating that rain is a strong instrument.
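The logic of the instrument can be illustrated with a hand-rolled two-stage least squares on synthetic data. In the sketch below, an unobserved shock raises both distancing and the outcome, so ordinary least squares is biased, while a rain variable that shifts distancing but is unrelated to the shock recovers the true coefficient. The data-generating process, function name, and coefficient values are hypothetical, and this is not the estimation code used in the paper.

```python
import numpy as np

def two_stage_least_squares(y, endog, instrument, controls):
    """Return the 2SLS coefficient on one endogenous regressor."""
    const = np.ones(len(y))
    # Stage 1: project the endogenous regressor on the instrument and controls.
    Z = np.column_stack([const, controls, instrument])
    gamma, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    endog_hat = Z @ gamma
    # Stage 2: regress the outcome on the fitted values and the same controls.
    X_hat = np.column_stack([const, controls, endog_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[-1]

rng = np.random.default_rng(2)
n = 20000
rain = rng.normal(size=n)    # instrument
temp = rng.normal(size=n)    # exogenous control
shock = rng.normal(size=n)   # unobserved local outbreak severity

# Distancing responds to rain and to the unobserved shock (the endogeneity).
distancing = 0.5 * rain + 0.3 * temp + 0.8 * shock + rng.normal(scale=0.5, size=n)
outcome = -0.8 * distancing + 0.1 * temp + shock + rng.normal(scale=0.5, size=n)

X_ols = np.column_stack([np.ones(n), temp, distancing])
ols_est = np.linalg.lstsq(X_ols, outcome, rcond=None)[0][-1]
iv_est = two_stage_least_squares(outcome, distancing, rain, temp)
print(round(float(ols_est), 3), round(float(iv_est), 3))
```

In this setup the OLS coefficient is pulled toward zero because harder-hit areas distance more, whereas the IV estimate stays close to the assumed value of −0.8.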
Data
Our data come from a multitude of sources. We detail the data sources at https://github.com/songyao21/covid_data_depot. There are a few non-standard issues to note. Our data on COVID-19 cases consist of county-level, officially confirmed daily case data for 2,924 US counties from February 1 to August 6 (with the last 75 days used as a holdout sample). COVID-19 has an incubation period of approximately 5 days^{23,24}. Because of this lag from infection to diagnosis, we assume that cases reported on a particular date actually measure the COVID-19 infections from 5 days earlier. We also assume that the true number of cases is approximately 10 times the number of diagnosed cases. We get this number by assuming that the Infection Fatality Rate (IFR) is 0.75%^{25}. We also assume that any deaths occur 14 days after the confirmed test results. On May 23, 2020, the last day of our estimation case data, there were 92,622 deaths in the US. On May 9, 2020, there were 1,304,726 officially diagnosed cases. We hence obtain the factor as (92,622/0.0075)/1,304,726 \(\approx\) 9.5. We round this number up to 10. This is consistent with Centers for Disease Control and Prevention (CDC) director Robert Redfield’s estimate of the ratio between actual and confirmed cases^{26}. Our estimates are not sensitive to the specific factor we use. When we run the simulations, we divide our model’s predicted case numbers by 10, which gives us the prediction of diagnosed cases.
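The under-reporting factor follows directly from the figures quoted above, and the arithmetic can be reproduced in a few lines. The variable and function names below are hypothetical; the numbers are those stated in the text.

```python
deaths_may_23 = 92_622        # US deaths on May 23, 2020
confirmed_may_9 = 1_304_726   # officially diagnosed cases 14 days earlier
ifr = 0.0075                  # assumed infection fatality rate (0.75%)

# Infections implied by the death count, then the ratio to confirmed cases.
implied_infections = deaths_may_23 / ifr
factor = implied_infections / confirmed_may_9
print(round(implied_infections), round(factor, 2))

def to_diagnosed(predicted_true_cases, scale=10):
    """Convert model-predicted true cases to predicted diagnosed cases."""
    return predicted_true_cases / scale

print(to_diagnosed(550_000))
```

The ratio comes out at about 9.47, which is reported as 9.5 and then rounded up to the factor of 10 used throughout.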
Conclusion
We use a modified SIR model to study the impacts of different factors on the spread of COVID-19. We find that the impact of each additional infectious individual decreases as more people become infected. A potential mechanism underpinning this finding is that infections are more likely to occur within interconnected networks. Understanding the shape of this relationship, and its nonlinear aspects, is important for understanding how COVID-19 spreads. Unlike previously estimated SIR models, our model allows for the possibility that the contagion process will grow or shrink at relatively steady levels, whereas traditional SIR models have contagion either taking off exponentially (if \(R>1\)) or falling quickly (if \(R<1\)).
We further find that social distancing helps to curb the speed of the spread. Consequently, we need to be cautious of breakouts in networks and maintain a reasonably high level of social distancing during the reopening of the economy. Taking the network effects and social distancing effects together gives more accurate forecasts of the timeline of the disease spread, and makes it possible to analyze and set policies about when to instate shelter-in-place restrictions or when to allow businesses to be open.
References
1. World Health Organization coronavirus disease (COVID-19) dashboard. https://covid19.who.int/. Accessed on October 27, 2020.
2. The Opportunity Insights economic tracker. https://tracktherecovery.org/. Accessed on October 27, 2020.
3. Google COVID-19 community mobility reports. https://www.google.com/covid19/mobility. Accessed on October 27, 2020.
4. Wu, Z. & McGoogan, J. M. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China. JAMA 323, 1239–1242. https://doi.org/10.1001/jama.2020.2648 (2020).
5. Chen, T. et al. Clinical characteristics of 113 deceased patients with coronavirus disease 2019: Retrospective study. BMJ 368, 1–12. https://doi.org/10.1136/bmj.m1091 (2020).
6. Turk, C., Turk, S., Malkan, U. & Haznedaroglu, I. Three critical clinicobiological phases of the human SARS-associated coronavirus infections. Eur. Rev. Med. Pharmacol. Sci. 24, 8606–8620 (2020).
7. Ajuria-Illarramendi, O., Martinez-Lorca, A. & del Prado Orduña-Diez, M. [18F]FDG-PET/CT in different COVID-19 phases. IDCases 21, 1–2. https://doi.org/10.1016/j.idcr.2020.e00869 (2020).
8. Alsuliman, T., Alasadi, L., Alkharat, B., Srour, M. & Alrstom, A. A review of potential treatments to date in COVID-19 patients according to the stage of the disease. Curr. Res. Transl. Med. 68, 93–104. https://doi.org/10.1016/j.retram.2020.05.004 (2020).
9. Sun, G.-Q. et al. Transmission dynamics of COVID-19 in Wuhan, China: effects of lockdown and medical resources. Nonlinear Dyn. 24, 1–13. https://doi.org/10.1007/s11071-020-05770-9 (2020).
10. Li, M. et al. Analysis of COVID-19 transmission in Shanxi province with discrete time imported cases. Math. Biosci. Eng. 17, 3710–3720. https://doi.org/10.3934/MBE.2020208 (2020).
11. Du, Z. et al. Risk for transportation of coronavirus disease from Wuhan to other cities in China. Emerging Infect. Diseases 26, 1049–1052. https://doi.org/10.3201/eid2605.200146 (2020).
12. Chen, T.-M. et al. A mathematical model for simulating the phase-based transmissibility of a novel coronavirus. Infect. Diseases Poverty 9, 24. https://doi.org/10.1186/s40249-020-00640-3 (2020).
13. Roda, W. C., Varughese, M. B., Han, D. & Li, M. Y. Why is it difficult to accurately predict the COVID-19 epidemic? Infect. Disease Model. 5, 271–281. https://doi.org/10.1016/j.idm.2020.03.001 (2020).
14. Nishiura, H., Linton, N. M. & Akhmetzhanov, A. R. Serial interval of novel coronavirus (COVID-19) infections. Int. J. Infect. Diseases 93, 284–286 (2020).
15. Wang, J., Tang, K., Feng, K. & Lv, W. High temperature and high humidity reduce the transmission of COVID-19. Working Paper (2020). Accessed on May 20, 2020 at SSRN: https://ssrn.com/abstract=3551767 or https://doi.org/10.2139/ssrn.3551767.
16. Oliveiros, B., Caramelo, L., Ferreira, N. C. & Caramelo, F. Role of temperature and humidity in the modulation of the doubling time of COVID-19 cases. Working Paper (2020). Accessed on May 20, 2020 at https://www.medrxiv.org/content/10.1101/2020.03.05.20031872v1.
17. Wang, M. et al. Temperature significant change COVID-19 transmission in 429 cities. Working Paper (2020). https://www.medrxiv.org/content/10.1101/2020.02.22.20025791v1. Accessed on May 20, 2020.
18. Li, Y. et al. The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number R of SARS-CoV-2: a modelling study across 131 countries. The Lancet Infectious Diseases 1–10, https://doi.org/10.1016/S1473-3099(20)30785-4 (2020).
19. Adda, J. Economic activity and the spread of viral diseases: evidence from high frequency data. Q. J. Econ. 131, 891–941 (2016).
20. Qiu, Y., Chen, X. & Shi, W. Impacts of social and economic factors on the transmission of coronavirus disease 2019 (COVID-19) in China. J. Popul. Econ. 33, 1127–1172. https://doi.org/10.1007/s00148-020-00778-2 (2020).
21. Kapoor, R. et al. God is in the rain: The impact of rainfall-induced early social distancing on COVID-19 outbreaks. Available at SSRN 3605549 (2020).
22. Holtz, D. et al. Interdependence and the cost of uncoordinated responses to COVID-19. Proc. Nat. Acad. Sci. 117, 19837–19843. https://doi.org/10.1073/pnas.2009522117 (2020).
23. Lauer, S. A. et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann. Internal Med. 172, 577–582 (2020).
24. Li, Q. et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1200–1207 (2020).
25. Meyerowitz-Katz, G. & Merone, L. A systematic review and meta-analysis of published research data on COVID-19 infection-fatality rates. Working Paper (2020).
26. Sun, L. H., Sun, L. H. & Achenbach, J. CDC chief says coronavirus cases may be 10 times higher than reported. Washington Post (2020). Accessed on June 25, 2020 at https://www.washingtonpost.com/health/2020/06/25/coronaviruscases10timeslarger.
Acknowledgements
We thank SafeGraph Inc. for providing the mobility data used in our analysis.
Author information
Contributions
M.L., R.T., and S.Y. formulated the model, crafted the data analysis strategy, and drafted and revised the manuscript. M.L. and S.Y. conducted the data analysis.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, M., Thomadsen, R. & Yao, S. Forecasting the spread of COVID-19 under different reopening strategies. Sci Rep 10, 20367 (2020). https://doi.org/10.1038/s41598-020-77292-8
DOI: https://doi.org/10.1038/s41598-020-77292-8