Introduction

The implementation of population-wide non-pharmaceutical socially-based suppressive measures, focused on lockdowns of whole communities, social distancing, travel restrictions, and increasingly the deployment of testing and contact tracing1,2,3, has led to remarkable success in dampening the initial waves of the ongoing severe acute respiratory syndrome-coronavirus 2 (SARS-CoV-2/COVID-19) pandemic globally (https://aatishb.com/covidtrends/). This has resulted in many countries attempting to lift these unprecedented behavioral measures to allow the re-opening of their economies while awaiting the arrival and mass administration of less socially-disruptive technologies, such as viable vaccines4. Health officials are, however, becoming increasingly concerned about the likely impacts that such re-openings could have on the subsequent transmission dynamics of the virus. One reason for this concern is that in many areas, easing of the above non-pharmaceutical interventions (NPIs) has taken place before the initial epidemics have reached their endpoints. Another is that herd immunity in communities that have been under these interventions has not developed to levels that would mitigate the possibility of infection resurgences5,6. These possibilities, including the economic and political imperatives for easing NPIs, have led to increased attention being paid to the identification and deployment of those social measures that will enable the containment of viral transmission to levels that would allow reopening of societies with minimal harmful health and social side-effects7,8,9,10.

A distinctive feature of the policy response to the management of the COVID-19 pandemic worldwide has been the role played by epidemiological modelling for evaluating the use of behavioral interventions exclusively for controlling epidemic outbreaks in populations8,9,10,11,12,13,14,15. These mathematical models, based primarily on extensions to the standard SEIR epidemic model, but also newer methods based on machine learning, network analysis, agent-based simulations, and empirical growth models based on incidence data16,17,18,19, have enabled predictions of the course of the epidemic to warn policy-makers of the gravity of potential impacts, as well as help them in making comparisons of the various social measures proposed for suppressing viral transmission in exposed communities. For example, these tools have played critical roles for evaluating the comparative effects of locking down communities versus allowing a portion of the population to be exposed and develop immunity as alternate strategies for containing both first and subsequent epidemic waves20,21.

However, while these models have been important for supporting generalized, scenario-based, policy planning, they can be less useful for simulating dynamics after the disease breaks out in real communities19. First, such models have little support for capturing the rapidly changing social contexts in which an epidemic evolves. This is an important need since for fast evolving phenomena, such as an epidemic, feedbacks from changing conditions need to be processed speedily to correct for the invariable modelling errors surrounding a dynamic forecast19. Second, addressing forecasts of the near future as an epidemic develops requires models that give accurate forecasts with the least variance at the lead times required by management22. Third, general models, even if supported by the available historical data, do not address the need for data assimilation in real-time forecasting. Such data assimilation, in which information regarding the extant transmission processes that are embedded in observational data is used to iteratively update the underlying dynamical principles represented by the structure and parameters of a model, has been shown to provide near-term forecasts of the state of a dynamical system which are better than could be obtained with just data or the model alone23,24,25,26,27,28. For forecasting epidemics over the near future, this data-model assimilation framework will allow shrinkage of forecast variance while also correcting for model bias and drift29. Finally, few currently available mathematical models of SARS-CoV-2 transmission provide predictions at lower administrative levels, such as the county30,31. Thus, they do not capture the spatial and social heterogeneity32,33 that drives epidemics in the real-world, and lack the resolution for evaluating interventions that take a fuller account of these heterogeneities at the lower spatial scales of an epidemic19.

Developing, testing and refining data-driven locally-applicable models for enabling reliable near- and long-term forecasts for decision-making can be a challenge especially when the objective is to control an ongoing epidemic15. One problem is the requirement that locally relevant observations are transmitted sufficiently rapidly to be useful for updating models within the lead times connected with making an effective public response. Fortunately, advances in communication systems, powerful data processing, storage software and hardware, and an increasing focus on the provision of standardized, open-source data for diseases, are now becoming available that are providing solutions to this problem34,35,36. In parallel, major progress has also been made in the development of iterative statistical data-model assimilation techniques, whereby data of diverse types and prior information regarding model structures and parameters can be used reliably to constrain model parameters or states in a setting, including supporting evaluations of forecast uncertainty over time22,23,25,37,38,39,40,41,42,43,44. Lastly, developments in cyberinfrastructures to automate the dynamic integration of new data and information to facilitate regular assessment of forecasts and active updating of models mean that the practical implementation of iterative data-driven locally-applicable epidemic forecasting is now increasingly becoming possible27,36,38,39,45,46,47,48,49,50,51,52,53.

This paper describes the efforts of our team to develop and use such an iterative data-model assimilation-based forecasting system for SARS-CoV-2, wherein we use a SEIR-type model updated sequentially with publicly available COVID-19 case and human movement data in order to provide predictions of the course of the pandemic under various social interventions at the county level in the United States. Here, we report on the use of these data from counties in the state of Florida to demonstrate how the developed data-driven modelling system can allow forecasts of the localized epidemic dynamics as well as enable evaluations of the relative impacts of the social interventions until mass vaccination strategies can come into play in the state. We also examine how gaining a better understanding of the geographical variation in the propagation of the virus and our attempts to curtail it can allow the derivation of a regionally-varying or tailored response that is effective at minimizing contagion whilst offering at the same time the advantages of less restrictive rules for parts of a state’s population.

Results

Model fits to data, validation, and assessment of geographic drivers of transmission

We modeled the observed SARS-CoV-2 outbreaks in the 67 counties of Florida (Supplementary Table S1) based on reported case and mortality data, and information regarding local social behaviors, through to September 30th, 2020. This was carried out by using a Bayesian melding data assimilation approach centered on the sequential calibration of our compartmental epidemic model to confirmed case and death time series data, and data on human movements, reported for each of these counties by the Johns Hopkins University Coronavirus Resource Center47 and the location data firm, Unacast (https://www.unacast.com/covid19/social-distancing-scoreboard), respectively (see “Methods”). The model predictions for daily confirmed cases compared to data for nine representative Florida counties are shown in Fig. 1. The results show that the 95% confidence interval bounds of the predictions from the ensemble of sequentially updated models for each county are able to envelop nearly all of the confirmed case data over the calibration period. The corresponding model fits to cumulative confirmed cases from all Florida counties are portrayed in Supplementary Fig. S1.
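The calibration idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simplified sampling-importance-resampling step in the spirit of Bayesian melding, a hypothetical exponential-growth `toy_model` in place of the compartmental model, a Poisson likelihood chosen for illustration, and synthetic data. All function names and parameter values are assumptions.

```python
import math
import random

def toy_model(growth_rate, initial_cases, days):
    """Hypothetical stand-in for the epidemic model: exponential case growth."""
    return [initial_cases * math.exp(growth_rate * t) for t in range(days)]

def log_likelihood(predicted, observed):
    """Poisson log-likelihood, dropping the term that does not depend on the model."""
    return sum(o * math.log(p) - p for p, o in zip(predicted, observed))

def resample_posterior(prior_samples, observed, initial_cases, n_keep, rng):
    """Weight prior draws by their fit to the data, then resample in proportion."""
    log_w = [log_likelihood(toy_model(r, initial_cases, len(observed)), observed)
             for r in prior_samples]
    max_lw = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - max_lw) for lw in log_w]
    return rng.choices(prior_samples, weights=weights, k=n_keep)

rng = random.Random(0)
true_rate = 0.10                                             # assumed, for synthetic data
observed = [round(c) for c in toy_model(true_rate, 10, 14)]  # one synthetic 14-day block
prior = [rng.uniform(0.0, 0.3) for _ in range(5000)]         # assumed uniform prior
posterior = resample_posterior(prior, observed, 10, 500, rng)
posterior_mean = sum(posterior) / len(posterior)
```

In a sequential setting, the resampled `posterior` from one block of data could be passed back in as `prior_samples` when the next block arrives, which is the sense in which the ensemble is updated over time.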

Figure 1

Model fits compared to confirmed daily case data in 9 representative Florida counties. Gray curves represent county-specific model predictions and red points represent confirmed case data obtained from Johns Hopkins University47. Fitting was started after at least 10 confirmed cases were reported. Results from model fits to representative counties, stratified by initial incidence growth rate in each county (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15) are shown. Group a counties: Alachua, Highlands, and Walton; group b: Hillsborough, Leon, and Sarasota; and group c: Broward, Miami-Dade, and Palm Beach.

Because the dynamics of COVID-19 are significantly influenced by changes in social behaviors, it is critical to iteratively calibrate transmission models, such as the present SEIR model, to new data and actively update the resulting predictions accordingly24,27. Supplementary Fig. S2 demonstrates the shift in model predictive performance as the model is updated sequentially with various lengths of incoming longitudinal data. Cross-validation analysis showed that carrying out such model calibrations to 2-week sequential blocks of data can keep the relative mean-square error consistently below 20%, while also being computationally feasible. Implementing this 2-week sequential updating procedure thus allowed us to incorporate information regarding changing transmission conditions in each county as effectively as possible into the model.
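As a rough illustration of the bookkeeping behind 2-week sequential updating, the sketch below splits a daily series into 14-day blocks and scores a forecast against each block. The exact definition of the relative mean-square error is not given in this section, so the version here (mean squared error divided by the mean squared observation) is an assumption, as are the function names and the synthetic data.

```python
def two_week_blocks(series, block_days=14):
    """Split a daily series into consecutive blocks (the last one may be shorter)."""
    return [series[i:i + block_days] for i in range(0, len(series), block_days)]

def relative_mse(predicted, observed):
    """Mean squared error scaled by the mean squared observation (assumed definition)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    scale = sum(o ** 2 for o in observed) / len(observed)
    if scale == 0:
        raise ValueError("observations are all zero; relative error is undefined")
    return mse / scale

observed = [100 + 5 * day for day in range(28)]   # hypothetical daily case counts
predicted = [value * 1.1 for value in observed]   # a forecast that runs 10% high
blocks = two_week_blocks(observed)
errors = [relative_mse(p, o) for p, o in zip(two_week_blocks(predicted), blocks)]
```

With this definition, a forecast that is uniformly 10% too high scores about 0.01 (1%) in every block, well under a 20% threshold.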

Supplementary Table S2 summarizes the values of the posteriors obtained by model fitting to the 14-day case data prior to September 30th for the social distancing parameter, d, which modifies the transmission rate, β, in each county, as well as for the fraction of the respective county population remaining under mobility restriction as estimated using the Unacast mobility data (see “Methods”). The results show that while the values of each of these two key social parameters varied between the present counties, the values of the social distancing parameter, d, appear to be less variable than those estimated for the lockdown fractions (Supplementary Table S2). This suggests that the between-county variations in SARS-CoV-2 outbreak dynamics, and the individual county-level response to the intervention scenarios, reported here (from October 1st 2020) may be a reflection of the combined effect of the initial incidences and variations in the numbers of the susceptible populations in a county that are released from restricted movement.
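To make the role of the social distancing parameter concrete, the following is a minimal sketch of a basic SEIR update in which d scales the transmission rate β. The multiplicative form β(1 − d), the Euler time step, and all parameter values are assumptions made for illustration; the model used in this study has additional compartments (for example asymptomatic, presymptomatic, hospital and ICU classes) and also accounts for the lockdown fraction, neither of which is represented here.

```python
def seir_step(state, beta, d, sigma, gamma, dt=1.0):
    """Advance (S, E, I, R) by one Euler step; d in [0, 1] scales transmission down."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * (1.0 - d) * s * i / n * dt   # assumed form of the d effect
    new_infectious = sigma * e * dt
    new_recovered = gamma * i * dt
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)

def peak_infected(d, days=365):
    """Run a hypothetical outbreak and return the peak of I and the final state."""
    state = (99_990.0, 0.0, 10.0, 0.0)   # hypothetical county of 100,000 people
    peak = state[2]
    for _ in range(days):
        state = seir_step(state, beta=0.5, d=d, sigma=1 / 5, gamma=1 / 7)
        peak = max(peak, state[2])
    return peak, state

peak_no_distancing, final_no_distancing = peak_infected(d=0.0)
peak_distancing, final_distancing = peak_infected(d=0.4)
```

Comparing `peak_distancing` with `peak_no_distancing` shows how a larger d lowers the peak in this simplified setting, while the total population is conserved at every step.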

Forecasting the epidemic and impacts of social interventions

We used the models updated using the infection/death data reported to September 30th 2020 in each county to simulate both the local epidemic dynamics and compare the dynamical impacts of six different social interventions as described in “Methods” and depicted in Supplementary Fig. S3. The model predictions for infected cases under each intervention scenario through the end of the year are shown in Figs. 2 and 3, stratified by initial incidence growth rate (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15; see “Methods”). Scenario 1 is the least aggressive option considered with no interventions put in place and a full release of strict stay-at-home orders after September 30th. The results for this scenario show that because the social restrictions in Florida were lifted before the first epidemics ended, resurgence of the epidemic will be inevitable in every county (Fig. 2). Indeed, an average of 27% (with a range of 15–47%) of the overall population across all counties is projected to become infected at the peaks of the resulting 2nd waves under this scenario (Fig. 2, Table 1, Supplementary Table S1). Note that these, and subsequent model forecasts, account for all infected cases, including those who are not yet infectious (exposed class), asymptomatic, presymptomatic, and symptomatic. The county level model predictions from this scenario also highlight the variation in the peak size of the 2nd waves that different counties could have faced with the lifting of all interventions, with these sizes ranging from 1459 cases in Lafayette to as high as 742,898 cases in Miami-Dade County (Table 1). The time course of the epidemic is also variable with higher incidence counties expected to see generally later peaks compared to lower incidence counties (Fig. 2). 
Figure 2d shows that the predicted sizes of the 2nd wave peaks are directly related to those of the first waves that occurred in each county, further underlining the impact that initial variation in local conditions of virus transmission can have on the size of subsequent county-level infection resurgences. The simulations also reveal that while the size of the 2nd waves will be large with the full release of lockdowns, the epidemics in each county will nonetheless, as expected, eventually end (Fig. 2), with the possibility that this will occur earlier in the low incidence counties than in the high incidence counties, where the corresponding 2nd waves are predicted to end much later.
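The stratification of counties by initial incidence growth rate can be sketched as follows. The thresholds (0.05 and 0.15) are those quoted in the text; the log-linear estimator of the growth rate and the assignment of a value of exactly 0.05 to Group b are assumptions, since the estimation procedure is described in “Methods” rather than here.

```python
import math

def initial_growth_rate(daily_cases):
    """Least-squares slope of log(cases) against day index (assumed estimator)."""
    if len(daily_cases) < 2 or min(daily_cases) <= 0:
        raise ValueError("need at least two strictly positive daily counts")
    logs = [math.log(c) for c in daily_cases]
    n = len(logs)
    mean_t = (n - 1) / 2
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(logs))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

def incidence_group(growth_rate):
    """Map a growth rate to Group a (< 0.05), b (0.05 to 0.15) or c (> 0.15)."""
    if growth_rate < 0.05:
        return "a"
    if growth_rate <= 0.15:
        return "b"
    return "c"

# Hypothetical early case series growing at exactly 0.2 per day.
early_cases = [10 * math.exp(0.2 * day) for day in range(10)]
rate = initial_growth_rate(early_cases)
group = incidence_group(rate)
```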

Figure 2

Epidemic forecasts for individual counties in the scenario where all interventions are lifted following the initial lockdown (S1). Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rate in each county (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15). The intervention scenario (also see Supplementary Fig. S3) represents a full release of lockdown and social distancing from October 1st. The y-axis is shown in log-scale to better visualize the difference between the size of the first and second epidemic waves. The range of epidemic peaks and epidemic ending dates for each group is shown by red and blue dotted vertical lines, respectively. The 2nd wave peaks of infected cases occur between September 27th–November 2nd for Group a, October 3rd–November 7th for group b, and October 3rd–November 2nd for Group c. The epidemic ending dates occur between December 2nd–January 21st, 2021 for Group a, December 2nd–January 25th, 2021 for group b, and January 16th, 2021–March 2nd, 2021 for Group c, respectively.

Figure 3

Epidemic forecasts for individual counties under four different social intervention scenarios. Note the differences in y-axis values when comparing scenarios. Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rates observed in each county (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15). The intervention scenarios are as follows (also see Supplementary Fig. S3): S2: maintain movement estimate and social distancing measures over 2 weeks from October 1st to October 14th; S3: maintain movement estimate and social distancing measures over 8 weeks from October 1st to November 30th; S4: maintain movement estimate and social distancing measures through end December 2020 + begin low intensity contact tracing and quarantine efforts on October 1st through March 2021; S5: maintain movement estimate and social distancing measures through end December 2020 + begin high intensity contact tracing and quarantine efforts on October 1st through March 2021. The red shading in each plot indicates the duration of movement and social distancing in each scenario, while the blue shading shows quarantining measures without movement restrictions or social distancing. The range of 2nd wave infection peaks for each county group is shown by dotted vertical lines. For scenario 2, the peak of infected cases occurs between October 29th–November 16th for Group a, October 29th–November 14th for group b, and November 2nd–November 13th for Group c. For scenario 3, the peak of infected cases occurs between December 17th–January 7th 2021 for Group a, December 16th–January 5th 2021 for group b, and December 17th–January 3rd 2021 for Group c. Epidemic ending dates for scenario 2 occur between January 2nd–February 17th 2021 for Group a, January 6th–February 21st 2021 for group b, and February 11th–February 27th, 2021 for Group c, respectively. For scenario 3, the corresponding end dates were between February 20th–March 2nd 2021 for Group a, and between February 22nd to beyond March 2nd for Groups b and c.

Table 1 Total median infected cases predicted at the epidemic peak in each county. The 90% confidence interval is given in brackets. Counties are stratified by initial county-level incidence growth rate.

The inclusion of social distancing measures through October and November in scenarios 2 and 3, respectively, is predicted to have two key effects (Fig. 3). First, these measures will not prevent the occurrence of sizable 2nd waves; however, they will shift the timing of the county-level 2nd epidemic peaks further into the future, with this shift more pronounced for scenario 3 (from the October/November peaks predicted for scenario 2 to the December/January peaks forecasted for scenario 3; see legend to Fig. 3). Second, these measures will significantly reduce the size of the 2nd epidemic peaks, with the more intensive scenario 3 bringing about an average case reduction of 76% (with a range of 37–83%) compared to scenario 1 (Table 1). Again, these outcomes will vary by county group, with shifts to 2nd peaks, and resolution of the epidemics, occurring generally later among the high incidence counties (Group c), while the greatest reduction in peak cases will occur for the low incidence counties (Fig. 3, Table 1).

Implementation of contact tracing and quarantine measures to prevent a fraction of the infectious population from spreading the disease (25% quarantine rate in scenario 4 and 50% quarantine rate in scenario 5 both through March 2021) along with sustained social distancing measures through December 2020, is predicted to have a uniformly high suppressive effect on the course of the 2nd waves in each county. Scenario 4 reduces the size of the averaged county-level epidemic peak to affect just 3% of the total population (with a range of 0–17%), while scenario 5 is predicted to reduce the average peak infection size to just 0.6% of the overall population in the state (with a range of 0–8.5%). This represents an average peak reduction compared to scenario 1 of 89% (with a range of 61–97%) for scenario 4 and 98% (with a range of 80–100%) for scenario 5. Furthermore, the results show that scenario 5 could even bring about breakage of epidemic transmission in some counties, particularly in the case of those that exhibited the lowest initial infection incidences (Table 1). Interestingly, these intervention scenarios also appear to generally lessen the between-county group variations in the timings and peaks of the predicted 2nd epidemics (Fig. 3).
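The peak reductions quoted relative to scenario 1 follow from simple arithmetic, sketched below. The county names and peak values are hypothetical, and averaging the county-level percentages (rather than computing one reduction from state-wide totals) is an assumption about how the summary figures were obtained.

```python
def peak_reduction_percent(baseline_peak, scenario_peak):
    """Percentage by which a scenario lowers the peak relative to the baseline."""
    if baseline_peak <= 0:
        raise ValueError("baseline peak must be positive")
    return 100.0 * (1.0 - scenario_peak / baseline_peak)

# Hypothetical peak infected counts per county: (scenario 1, alternative scenario).
hypothetical_peaks = {
    "County A": (2_000, 100),
    "County B": (50_000, 6_000),
    "County C": (400_000, 80_000),
}
reductions = {name: peak_reduction_percent(base, alt)
              for name, (base, alt) in hypothetical_peaks.items()}
mean_reduction = sum(reductions.values()) / len(reductions)
reduction_range = (min(reductions.values()), max(reductions.values()))
```

A reduction of 100% in this calculation corresponds to a scenario peak of zero, which is how an upper range limit of 100% would arise for counties where transmission is interrupted.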

The above scenario differences are also apparent in the predictions of the required hospitalizations at the county level (Fig. 4). However, the hospitalization forecasts also indicate that without the more aggressive interventions, such as those modeled in scenarios 4 and 5, there would be a high risk that predicted cases will exceed existing county-level hospital capacities (Table 2). This serious outcome will also vary significantly by geographic location.

Figure 4

Forecasts of corresponding hospitalized cases for individual counties under the four different social intervention scenarios. Hospitalized cases include both hospital and ICU model compartments. Note the differences in y-axis values when comparing scenarios. Each curve represents the median prediction for a given county. The results are stratified by initial county-level incidence growth rate (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15). The intervention scenarios are as follows (also see Supplementary Fig. S3): S2: maintain movement estimate and social distancing measures over 2 weeks from October 1st to October 14th; S3: maintain movement estimate and social distancing measures over 8 weeks from October 1st to November 30th; S4: maintain movement estimate and social distancing measures through end December 2020 + begin low intensity contact tracing and quarantine efforts on October 1st through March 2021; S5: maintain movement estimate and social distancing measures through end December 2020 + begin high intensity contact tracing and quarantine efforts on October 1st through March 2021. The red shading in each plot indicates the duration of movement and social distancing in each scenario, while the blue shading shows quarantining measures without movement restrictions or social distancing. The range of peak hospitalized cases for each group is shown by dotted vertical lines. For scenario 2, the peak of hospitalized cases occurs between November 9th–November 24th for Group a, November 7th–November 23rd for group b, and November 12th–November 22nd for Group c. For scenario 3, the peak of hospitalized cases occurs between December 25th–January 15th 2021 for Group a, December 27th–January 13th 2021 for group b, and January 2nd, 2021–January 18th 2021 for Group c.

Table 2 Total hospitalized cases predicted at the epidemic peak in each county. The 90% confidence interval is given in brackets. Bold cells denote situations when hospitalization cases are predicted to be below the corresponding county-level care (bed/ICU) capacity. Counties are stratified by initial county-level incidence growth rate.

Figure 5 shows the predictions arising from the simulation of the most intensive of the social intervention scenarios (scenario 6), viz. maintaining social distancing, lockdown, and a 25% quarantine rate from October 1st through the end of our simulation period, March 2021. The results show that this scenario is the only one among the six scenarios investigated that would have prevented the occurrence of a 2nd wave of COVID-19 in all the modeled counties. It would also hasten the ending of the epidemic locally, with all low incidence counties predicted to achieve elimination of their epidemics between November 1st 2020 and January 26th 2021, whereas high incidence counties will see their corresponding epidemics ending between December 20th 2020 and February 25th 2021.
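For readers who wish to reproduce the scenario comparison, the six intervention scenarios can be written down as a small configuration table. The sketch below is one possible encoding and is not taken from the authors' code: the class and field names are hypothetical, "end December 2020" is taken as December 31st, and "through March 2021" is taken as March 31st.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

START = date(2020, 10, 1)            # scenarios are compared from October 1st 2020
END_OF_HORIZON = date(2021, 3, 31)   # "through March 2021", assumed to mean March 31st

@dataclass(frozen=True)
class Scenario:
    name: str
    distancing_until: Optional[date]   # last day movement/distancing measures are kept
    quarantine_rate: float             # fraction of infectious people quarantined
    quarantine_until: Optional[date]   # last day of contact tracing and quarantine

SCENARIOS = [
    Scenario("S1", None, 0.0, None),                           # full release
    Scenario("S2", date(2020, 10, 14), 0.0, None),             # 2 weeks of distancing
    Scenario("S3", date(2020, 11, 30), 0.0, None),             # distancing to end November
    Scenario("S4", date(2020, 12, 31), 0.25, END_OF_HORIZON),  # plus 25% quarantine
    Scenario("S5", date(2020, 12, 31), 0.50, END_OF_HORIZON),  # plus 50% quarantine
    Scenario("S6", END_OF_HORIZON, 0.25, END_OF_HORIZON),      # all measures throughout
]

def active_measures(scenario, day):
    """Return (distancing_in_force, quarantine_rate_in_force) for a calendar day."""
    distancing = (scenario.distancing_until is not None
                  and START <= day <= scenario.distancing_until)
    quarantine = (scenario.quarantine_until is not None
                  and START <= day <= scenario.quarantine_until)
    return distancing, (scenario.quarantine_rate if quarantine else 0.0)
```

For example, `active_measures(SCENARIOS[3], date(2021, 1, 15))` reports that under scenario 4 distancing has ended by mid-January 2021 while the 25% quarantine rate is still in force.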

Figure 5

Epidemic forecasts for individual counties in the scenario where all interventions (movement restriction, social distancing, and contact tracing) are held through to March 2021 starting from October 1st 2020. Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rate (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15). The y-axis is shown in log-scale to better visualize the difference between the size of the first and second epidemic waves. Red vertical lines indicate the ranges of peak infected cases, while blue vertical lines indicate the typical ending times of the epidemic. The ending dates for the epidemic are predicted to occur between November 1st–January 26th 2021 for Group a, November 14th–February 25th 2021 for Group b, and December 20th–February 25th 2021 for Group c.

Figure 6 depicts the proportions of the populations recovering from infection and thus developing immunity to infection through time in each county for scenarios 1 (top panel) and 6 (bottom panel). These results show that these scenarios may bring about extinction of the epidemic in each group of counties via different mechanisms. In the case of scenario 1, the epidemics are ended through the development of high levels of herd immunity (between 88 and 97%) in the community, with, as expected, lower population-level immunity required to bring about epidemic extinction in the lower incidence counties. By contrast, the results for scenario 6 indicate that extinction can also be brought about by instituting strong long-duration social containment measures that can reduce transmission to sufficiently low levels to bring about epidemic fade-outs. However, it is important to note that this impact comes with the cost of generating very low levels of herd immunity in each community by the end of the epidemics, raising the possibility of the inevitable resurgence of transmission should new infected individuals bring the virus back into these communities.
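The link between unmitigated transmission and a high final recovered fraction can be illustrated with the classical final-size relation for a homogeneously mixing SIR-type epidemic, z = 1 − exp(−R0·z), where z is the fraction ultimately infected and R0 the basic reproduction number. This is a textbook simplification and not the calculation used for Fig. 6; the R0 values below are hypothetical and the function name is an assumption.

```python
import math

def final_size(r0, tol=1e-12, max_iter=10_000):
    """Solve z = 1 - exp(-r0 * z) for the positive root by fixed-point iteration."""
    if r0 <= 1.0:
        return 0.0   # no large outbreak is expected in this simple model
    z = 0.5
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Hypothetical reproduction numbers for illustration only.
sizes = {r0: final_size(r0) for r0 in (1.5, 2.0, 2.5, 3.0)}
```

In this simplified setting the final infected fraction rises with R0, which is consistent with the qualitative pattern of lower final immunity in the lower incidence counties under full release.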

Figure 6

Proportion of recovered individuals predicted over time for Scenario 1 (full release of interventions on October 1st) and Scenario 6 (lockdown, social distancing, and quarantining through to March 2021). Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rate in each county (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15).

Although scenario 1 can produce high levels of herd immunity (Fig. 6), it is clear that this would be associated with higher death tolls than in the case of scenario 6 (Fig. 7). Cumulative predicted deaths through the entire period of these simulations (i.e. October 1st 2020 to end of March 2021) ranged from 70 to 4223 in the low incidence counties to as high as 28,130 in the high incidence counties in the case of scenario 1. These were significantly lower in the case of scenario 6, with the corresponding cumulative deaths ranging from 6 to 300 in the low incidence counties and from 185 to 3330 in the high incidence counties (Fig. 7). These findings underscore the health-economy trade-offs involved in using an approach focused on the evolution of herd immunity as opposed to one based on the use of more socially-disruptive measures for containing the present pandemic. An immediate full release of social measures is the least economically disruptive option, but results in higher cases, hospitalizations, and deaths. By contrast, implementing longer periods of social distancing measures will maximize reductions in adverse health outcomes but will affect the working of the economy.

Figure 7

Cumulative number of deaths predicted over time for Scenario 1 (full release of interventions on October 1st) and Scenario 6 (lockdown, social distancing, and quarantining through March 2021). Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rate in each county (Group a < 0.05, Group b > 0.05–0.15, Group c > 0.15).

Discussion

Our goals in this work were two-fold. The first was to assess if it is possible to use publicly available longitudinal infection case and human movement data to derive reasonable mathematical models of SARS-CoV-2 transmission to allow the simulation and evaluation of the course of the ongoing pandemic at the local county level in the United States. The second goal was to evaluate how the viral contagion dynamics might interact with social options for controlling COVID-19, such that the results could be used to identify those measures that will enable the safe containment of the virus in the absence of a viable vaccine. We also attempted to determine the implications of variance in virus transmission risk across smaller spatial units within a region, such as local counties, for the design of the optimal social strategies for curbing the contagion.

Developing and using reliable data-driven models for forecasting live local epidemics is challenging given the need for both the locality-specific temporal data required for updating models, and the necessity that predictions have to be made within the lead times requisite for making effective public responses22. Here, we have addressed this problem via the implementation of an iterative data-model assimilation-based forecasting system that acquires and processes the latest data, updates our SEIR COVID-19 model, and generates new sequential forecasts over time. Key features of the system include procedures that leverage the availability of open source API-enabled case and mortality surveillance data that are reported daily by health departments at the county level47, the incorporation of independently-quantified county-wide non-essential movement data to serve as an estimator of the level of population mixing through time, and the Bayesian calibration of our model on a sequential basis. An additional recent feature of the developed system is the use of a continuous analysis framework to automate the computational pipeline to handle the various stages of converting the raw data into new forecasts, including: data assembly, modelling and forecasting, and presentation of the forecasts relevant to policy makers27,45. This makes it possible to generate high-quality forecasts for a large number of study settings much more effectively and speedily.

We have used our modelling system to examine the effectiveness of a range of likely SARS-CoV-2 social intervention scenarios for containing the contagion at local levels, starting with investigations of the impact of the state-wide phased easing of the community lockdown that was implemented in Florida between April 3 2020 and May 4 2020 (scenario 1). Figure 2 depicts the major outcome of this policy response, viz. the inevitability of the emergence of significant 2nd waves in all counties if all other social measures, such as social distancing (mask wearing and observation of physical distancing), are also discontinued before the 1st local epidemics are fully ended. While this prediction can be considered to be as expected and so unremarkable, a striking and possibly less commented upon feature is that the predicted sizes of the 2nd wave peaks or intensity of the 2nd waves will vary between counties as a positive function of variations in the size of their 1st wave peaks. This indicates that the subsequent intensity of virus transmission following the full release of all social protective measures in a community will depend fundamentally on the initial incidence at the time of epidemic establishment in a locality. It will also depend on the number of infected individuals remaining in each county following the ending of the state-wide lockdown that took place before the 1st waves had been fully controlled. These findings highlight the dangers of imposing a one-size-fits-all policy (here pertaining to the decision to ease lockdowns on a common date across Florida) for managing a spatially variable contagion.

The results pertaining to the numerical size of the 2nd peaks are shown in Table 1, and indicate that the health burden of the pandemic would also vary markedly between counties from as low as 1459 cases in Lafayette to as high as 742,898 cases for the most populous Miami-Dade county. This further underscores the fact that apart from variable resurgences in infection, spatial heterogeneity in infection burdens could also be expected if social measures are fully released in all counties. There would also be considerable variations in the course of the 2nd waves even within each incidence group (Fig. 2), although in general low incidence counties would present earlier and lower 2nd wave peaks compared to the later timings and higher peaks predicted for the corresponding medium and high incidence group of counties.

Full release of social measures would, however, as expected, result in the eventual extinction of the local epidemics in all counties, because of depletion of susceptible individuals and development of high proportions of immune individuals in each population (Fig. 6). Although this result supports the notion of permitting the development of herd immunity in populations as one way of controlling the pandemic, the predictions regarding the local and state-wide hospitalization cases (Table 2) and deaths (Fig. 7) point to the dangers of adopting such an option8,9,10,12.

Our simulations of the impacts of social control measures of varying strength and nature demonstrated overall the vital importance of continuing with these measures following phased lockdown release for containing the epidemic while waiting for more effective and less-socially disruptive pharmaceutical measures (Figs. 2, 3, 6; Table 1). According to our findings, while the two social measures investigated here, viz. maintaining current social distancing over a shorter (to October 14th) or longer (to November 30th) period, and testing, contact tracing and quarantining at a moderate (25%) or higher (50%) level, would not have prevented the emergence of 2nd waves in the majority of cases, they would nonetheless delay infection peaks and reduce the numbers of patients requiring admission to hospitals. Indeed, the inclusion of quarantine measures to March 2021, while ending social distancing by mid-October or end November 2020, for example, would have decreased the 2nd wave peak numbers of infection or hospitalizations required to over 90% of the respective 1st peak numbers in each county. Inclusion of strong contact tracing and quarantine (scenario 5) could have led to very low levels or even interruptions of epidemic transmission and zero hospitalization cases in those counties exhibiting the lowest incidence rates among the present counties (Tables 1 and 2). This is an important outcome as it suggests that if testing and contact tracing were ramped up, then counties would have been able to lift their social distancing measures and hence reopen their economies by end of 2021 without fearing that such an option would have led to an overwhelming of their hospital capacities.

We show that the most intensive social intervention modelled in this study, viz. phased lockdown release, maintenance of current social distancing measures and a 25% quarantine rate all maintained from October 1st to end of March 2021, was the only social option that would have not only prevented the occurrence of a 2nd wave, but also hastened the ending of the epidemic in all counties, with extinctions predicted to be possible as early as November 1st in some low-medium incidence counties (Fig. 5). However, like scenarios 4 and 5 but unlike the full release of social measures modelled in scenario 1, this scenario will also be marked by the low level of herd immunity that would develop in populations by the end of the local epidemics, leaving communities vulnerable to the real threat of future epidemic resurgences should the virus be re-introduced after the lifting of interventions (Fig. 6). This finding indicates that either maintaining continued vigilance and control by testing and contact tracing measures will be required to counter this prospect of epidemic resurgence in these communities over the foreseeable future, or that ultimately, control of the epidemic would only be achieved through effective vaccination of county populations. Our results show that in the latter case, vaccination rates (with a highly effective vaccine) will need to be above 85% and even above 90% in the medium–high incidence counties (Fig. 6) to accomplish the resolution of the pandemic, although note that if significant population heterogeneity underlies virus transmission within a county, much lower rates (to as low as 50%) might be sufficient to arrest the local epidemic6,21. This is provided that the developed immunity operates over the long-term.

Our county-level forecasts also suggest that a spatially-tailored response would be more effective at minimizing harmful effects in communities, not just in relation to health outcomes, but also in terms of minimizing the disruption to the local and global economic and other social systems. Thus, we show that while combining social distancing measures to end of December 2020 with high intensity contact tracing and quarantine to March 2021 (scenario 5) could have depressed hospitalization cases to within manageable levels across virtually all county incidence groups, it would have been possible to contain the pandemic in some low and medium incidence counties with a version of this scenario (scenario 4) that implements only low intensity quarantine (Fig. 4, Tables 1 and 2). This would allow reopening of the economies of these counties earlier than for high incidence counties, lessening the economic and other social disruptions faced by the populations of these counties. Similarly, our predictions of the impact of scenario 6, in which all interventions are implemented from October 1st 2020 onwards, indicate that resolutions of the epidemic would occur significantly earlier in low incidence counties than in the case of medium and high incidence counties, suggesting that a safe reopening of the state of Florida and indeed other US states could be effectively accomplished in a geographically phased manner that takes into account county-level variations in epidemic risk explicitly. Indeed, our web-based SEIRcast COVID-19 simulation tool (https://seircast.org/), which implements our iterative data-driven continuous integration modelling framework, is designed to provide policymakers with the means to devise precisely such spatially-explicit management plans.
We believe that including this spatial dimension into both models and in mitigation plans would not only provide for better predictions of the pandemic dynamics across a spatial domain, but would additionally result in significantly better overall social outcomes for state populations.

While our findings imply that social measures in general are highly effective in containing and curbing COVID-19 transmission, further work to address the rapidly changing transmission conditions affecting the pandemic and emerging interventions will undoubtedly be required to extend the applicability of the present results. Perhaps a first need is to address what impact the advent of vaccines would have on the need for continuing with the social measures investigated to allow the safe reopening of parts of the populations in Florida as early as possible. The key question here is whether NPI strategies will need to continue and indeed must remain the mainstay of our attempts to contain the contagion even with the roll out of vaccinations. Indeed, if the present vaccines can only be delivered in a phased age-targeted manner, are not perfect but instead reduce susceptibility by a fraction, and if the immunity induced is not long-term or is countered by virus mutants, then there is a need to investigate how best to adapt the social measures studied here along with vaccination to bring about the containment or resolution of the pandemic effectively54. We are currently extending our model to include these various vaccination scenarios to address this policy question.

It is also clear going forward that we need to consider the effects that between-county movement might have on the current model predictions. While personal movement was curtailed drastically by lockdown, and the phased ending of the lockdown has led to increased movement within counties—both of which we have been able to incorporate into our model via parameterization of the within-county movement data provided by Unacast—details of inter-county movement and its reliable incorporation into our model will be required if we are to better capture the impacts of state-wide policies that are beginning to focus on lifting of all restrictions fully55. Recently, Unacast52 has begun to publish population migration data in the US using cell-phone signals, which will provide a means to address this topic.

Our current model also does not represent the age-structure and health status of the county-level populations. Partly this is an outcome of our goal to develop a modelling system that would support the generation of forecasts for the contagion in all counties of the United States based on the data presently publicly available for facilitating model configurations, and these currently lack information on these variables47. Extending our SEIR model to include these features, however, would allow better treatments of the exposure, risk, and transmission conditions that are likely to underlie the spatial heterogeneity in epidemic dynamics observed at the county level18,19,31. The addition of population structure and health composition into our current SEIR model will require deriving and adding more compartments and the applicable contact matrices10,56,57, but also, as noted, the configuration data for parameterizing these additions appropriately. We are currently in the process of adapting the data from the POLYMOD study53,57 to begin the construction of the relevant social contact matrices and parameterizations required for accomplishing these major extensions to the model. Nonetheless, it is to be noted that our data-assimilation approach to estimating the transmission rate in each county (both the median and range of values from the ensemble of best-fit models) implicitly does allow capture of the contributions of age-structural and other differences in transmission between counties, suggesting we have been able to approximate the impacts of this factor to a reasonable degree on the results presented here.

Our sequential data-assimilation framework, while allowing the incorporation of longitudinal changes in transmission conditions into the model, has the outcome, as for all dynamical models, that prediction error will increase the further out of sample forecasts are made22,23,24,25,27,29. While we have attempted to reduce forecast errors by model fitting to two sets of variables (infection cases and deaths), obtaining new data on other currently latent states (e.g. the fraction of asymptomatic infected cases) would better constrain parameters and hence reduce forecast variance. However, this must be balanced by appropriately addressing the effects of parameter degeneracy and sample impoverishment, which would impact the ability of the model to fit novel data as transmission conditions change drastically over the near future37,58,59,60. To address this problem in the simulations reported here, we used a resampling approach whereby, at each sequential updating point, we blended 25% random samples from the initial priors into the posteriors obtained at the update made one time step (2 weeks) previously, which kept forecast error below 20%. However, future work might need to consider the development of appropriate adaptive approaches developed in the field of particle filtering61 to resolve this problem more effectively. Regardless, we note that while our forecasts beyond 2 weeks ahead could attain variances as high as 40%, and so can affect the peak sizes and extinction dates reported here, this will have lesser impacts on the conclusions reached regarding the comparative outcomes of the interventions investigated in this study.

Methods

Epidemic model

We simulated the ongoing SARS-CoV-2 outbreaks at the county level using a variation of the SEIR model. The model compartments and transitions are shown in Fig. 8. Full equations are also given in Supplementary Material. We assume each county is a closed population and ignore demographic changes such that the total population size remains constant. The population is divided into compartments representing various infection stages: susceptible (S), susceptible but removed from the transmission process via lockdown policies (R1), exposed (E), infectious asymptomatic (IA), infectious pre-symptomatic (IP), infectious with mild symptoms (IM), infectious with severe symptoms requiring hospitalization (IH), infectious with severe symptoms requiring intensive care including ventilation (IC), recovered and immune (R2), and deceased (D). The model equations describing the transitions in and out of each class are given in the Supplementary Material along with a description of all the model parameters and their prior values (Supplementary Table S3). Note the model considers the fraction of the population classified in each compartment (all compartments sum to 1), which is then scaled to the appropriate county population size to get counts for each compartment.

Figure 8

Compartmental model structure. Boxes or compartments represent host categories related to infection status, solid lines represent movement between compartments, and dashed arrows depict the states contributing to transmission. Full details of model structure and the full set of differential equations and parameters driving the model are given in Supplementary Table S3.

Data

To calibrate the model to the local county setting, we fitted the SEIR model sequentially (see below) to cumulative confirmed case and death data assembled from the start of the epidemic at the county level and published for public access by the Johns Hopkins University Coronavirus Resource Center47. The county population sizes are also made available via this database, which we use to scale the model predictions (see above). Hospital bed capacity in each county is provided by the Agency for Health Care Administration for the State of Florida, accessed on April 5th62. A 7-day moving average is applied to the case and death data to smooth out testing irregularities.
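The smoothing step can be illustrated with a minimal sketch. The text does not state whether the 7-day window is trailing or centered, or how the first days of a series are handled, so the trailing window and the shortened early windows used here are assumptions, and the function name `moving_average` is hypothetical.

```python
import numpy as np

def moving_average(daily, window=7):
    """Trailing moving average of a daily series.

    Assumption: the window looks backwards only, and the first
    window-1 days are averaged over the days available so far.
    """
    daily = np.asarray(daily, dtype=float)
    # Prefix sums let each window total be read off as a difference.
    csum = np.cumsum(np.insert(daily, 0, 0.0))
    out = np.empty_like(daily)
    for t in range(len(daily)):
        start = max(0, t - window + 1)
        out[t] = (csum[t + 1] - csum[start]) / (t + 1 - start)
    return out

# A county that reports 70 cases once a week and nothing in between
# is smoothed to a steady 10 cases per day once a full window is available.
weekly_batches = [0, 0, 0, 0, 0, 0, 70] * 2
smoothed = moving_average(weekly_batches)
```

A trailing window has the practical advantage that the most recent day can always be smoothed without waiting for future reports, at the cost of a lag of a few days relative to a centered window.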

Estimation of initial epidemic growth rate

The initial incidence growth rate, τ, was estimated by fitting a log-linear model to the daily new cases reported during the early exponential phase (generally the first 4 weeks) of the epidemic curve observed in each county63. The values estimated for τ in each county were used to stratify the counties in Florida into low (< 0.05), medium (> 0.05 to < 0.15) and high (> 0.15) initial incidence or epidemic groups.
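As a minimal sketch of this estimation, the slope of log daily cases against time can be obtained with an ordinary least-squares line. Several details are assumptions rather than the procedure of ref. 63: time is measured in days, days with zero reported cases are dropped before taking logs, and a value falling exactly on a threshold is assigned to the higher group. The function names are hypothetical.

```python
import numpy as np

def initial_growth_rate(daily_cases, n_days=28):
    """Slope of log(daily new cases) against day number over the first n_days."""
    y = np.asarray(daily_cases[:n_days], dtype=float)
    t = np.arange(len(y))
    keep = y > 0  # assumption: zero-count days are dropped, since log(0) is undefined
    slope, _intercept = np.polyfit(t[keep], np.log(y[keep]), 1)
    return slope

def incidence_group(tau):
    """Stratify a county by its growth rate using the thresholds quoted in the text."""
    if tau < 0.05:
        return "low"
    if tau < 0.15:
        return "medium"
    return "high"

# Synthetic check: a curve growing at exactly 0.1 per day is recovered
# and falls into the medium group.
days = np.arange(28)
tau = initial_growth_rate(5.0 * np.exp(0.1 * days))
group = incidence_group(tau)
```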

Bayesian melding data assimilation

We used a Monte-Carlo-based Bayesian melding framework to undertake the sequential updating of the model to the cumulative case and death data40,64,65. We began by first defining uniform prior distributions for each of the model parameters based on current understanding of SARS-CoV-2 transmission and disease characteristics. These initial parameter priors and relevant references are given in Supplementary Table S3. Note that the number of initial infected cases at the start of the simulation period is sampled as the parameter E0 (Supplementary Fig. S4), the number of exposed cases when the first cases began to be confirmed. We consider the start of the epidemic in each county to be when there are at least 10 cases reported. At this point, we sampled N = 50,000 parameter vectors from the initial priors and simulated the outbreak for 14 days forward. The resulting 50,000 model predictions of the epidemic were then compared to the confirmed case and death data observed during the 14-day forecast period using a modified root-mean-square error distance metric that normalizes a traditional RMSE by the standard deviations of these data. This facilitated the combination of prediction errors with respect to case and death data together despite their different orders of magnitude:

$$\mathrm{MRMSE}=\sqrt{\frac{1}{2n}\sum_{i=1}^{n}\left[\frac{{\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2}}{\mathrm{std}(\widehat{y})}+\frac{{\left({\widehat{x}}_{i}-{x}_{i}\right)}^{2}}{\mathrm{std}(\widehat{x})}\right]},$$

where n is the number of time points over which to compare the model predictions to data, ŷi is the model-predicted confirmed case count on a given date i, yi is the observed confirmed case count for the same date, x̂i is the model-predicted death count on date i, and xi is the observed death count for the same date. Based on this performance metric, the best-fitting 500 parameter vectors are retained as the most likely parameter sets to describe the local outbreak during the chosen 14-day window. For simulating the epidemic for the next 14-day period, another 50,000 parameter sets are sampled, of which 75% are randomly sampled from the posterior distribution of the most recent 14-day window, while the other 25% are sampled from the initial parameter priors to avoid sample depletion59,60. This set of blended parameter vectors is used to sequentially select the best-fitting models over time, and to forecast the impacts of the interventions described above. Different fitting windows were tried, and a 14-day window was found to be long enough to be computationally feasible for the entire dataset for all counties, while being short enough to capture the changing epidemic behavior and keep forecast error consistently low (below 20%).
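The scoring, selection and blending steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the distance follows the displayed equation literally (squared errors divided by the standard deviation of the predicted series, using the population standard deviation), posterior draws are taken by resampling the retained vectors with replacement, and the function names and the `sample_prior` callback are hypothetical.

```python
import numpy as np

def mrmse(pred_cases, obs_cases, pred_deaths, obs_deaths):
    """Modified RMSE combining case and death errors, as in the displayed equation.

    Note: a constant predicted series has zero standard deviation and
    would need special handling; that case is not treated here.
    """
    y_hat, y = np.asarray(pred_cases, dtype=float), np.asarray(obs_cases, dtype=float)
    x_hat, x = np.asarray(pred_deaths, dtype=float), np.asarray(obs_deaths, dtype=float)
    n = len(y)
    case_term = (y_hat - y) ** 2 / np.std(y_hat)
    death_term = (x_hat - x) ** 2 / np.std(x_hat)
    return np.sqrt(np.sum(case_term + death_term) / (2 * n))

def select_best(param_vectors, scores, n_keep=500):
    """Keep the n_keep parameter vectors (rows) with the smallest distance."""
    order = np.argsort(scores)[:n_keep]
    return param_vectors[order]

def blend_samples(posterior, sample_prior, n_total=50_000, prior_fraction=0.25, rng=None):
    """Build the next ensemble: 75% posterior draws, 25% fresh prior draws.

    Assumption: posterior draws are rows of the retained vectors picked
    uniformly with replacement; sample_prior(n, rng) returns n prior rows.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_prior = int(round(n_total * prior_fraction))
    n_post = n_total - n_prior
    rows = rng.integers(0, len(posterior), size=n_post)
    return np.vstack([posterior[rows], sample_prior(n_prior, rng)])
```

Re-injecting prior draws at every update keeps some parameter vectors far from the current posterior in play, so the ensemble can still follow the data if transmission conditions shift abruptly between fitting windows.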

The best-fitting parameter vectors are used to simulate future scenarios. The forecasts allow for the prediction of future waves of infection, which are defined as a sustained positive growth rate of cases, leading to a maximum. The subtleties of identifying the exact timing of waves due to trivial oscillations in data have been explored in other works66.
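One simple way to operationalize this definition is to flag a peak only when it follows a long enough run of day-on-day increases. The sketch below is an assumption-laden illustration, not the method of ref. 66: the 14-day minimum rise (`min_rise_days`) is a hypothetical threshold chosen only to filter out short oscillations, and the function name is likewise hypothetical.

```python
import numpy as np

def find_wave_peaks(cases, min_rise_days=14):
    """Indices of maxima that end at least min_rise_days consecutive days of growth."""
    diffs = np.diff(np.asarray(cases, dtype=float))
    peaks, run = [], 0
    for t, delta in enumerate(diffs):
        if delta > 0:
            run += 1  # still rising
        else:
            if run >= min_rise_days:
                peaks.append(t)  # cases[t] is the last value before the decline
            run = 0
    return peaks

# Two synthetic waves centred on days 30 and 90.
days = np.arange(121)
curve = np.exp(-((days - 30) / 10) ** 2) + 2 * np.exp(-((days - 90) / 10) ** 2)
peaks = find_wave_peaks(curve)  # [30, 90]
```

Applied to unsmoothed data, a rule of this kind would break a rising phase at the first reporting dip, so it is best used on smoothed forecasts or model output.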

Simulating interventions

We used the latest sequentially fitted model in each county to simulate the impacts of different social intervention scenarios on the course of the outbreak in the future (beyond October 1st 2020). We simulated six different scenarios, which are outlined graphically in Supplementary Fig. S3. Scenario 1 represents the least aggressive option where lockdown and social distancing measures (like modified behavior, physical distancing, mask wearing, and increased sanitization) are fully lifted after September 30th. Scenario 2 maintains lockdown in addition to keeping social distancing measures in place for 2 weeks from October 1st to October 14th. We consider scenarios 1 and 2 to mimic the State of Florida’s reopening plan (https://floridahealthcovid19.gov/plan-for-floridas-recovery/). Scenario 3 extends the social interventions (lockdown plus social distancing measures) by maintaining them over a longer 8 week period to November 30th. Scenarios 4 and 5 represent maintaining current social distancing and movement restrictions through the end of the year (December 2020) in addition to implementing contact tracing and quarantine efforts at either low (q = 0.25) or high (q = 0.50) intensity, respectively, from October 1st to end of March 2021. Finally, Scenario 6 represents the most intense intervention scenario, viz. maintaining social distancing, lockdown, and low quarantine starting from October 1st through to the end of March 2021. We considered this to be the most intense intervention because we maintain all three social interventions for the longest period of time. We implemented a low quarantine effort in this scenario to represent an incremental increase in the intensity of interventions relative to Scenario 5. This allowed us to also investigate if this was sufficient given the longer period of interventions to have the biggest impact on the pandemic among all the scenarios investigated in this paper.
Outbreaks were simulated under these conditions and the predicted number of cases are compared between each scenario and county group. The numbers of hospitalized individuals (IH + IC) are also forecasted to evaluate the potential resource needs under each scenario.

The effect of statewide lockdown measures in these simulations is implemented by adding a distinct susceptible class which is assumed to not contribute to disease transmission (R1). The proportion of this class that complies with strict stay-at-home orders is controlled through the ratio of parameters α and λ. This ratio is informed by the fraction of non-essential trips made by the population in each county as estimated by Unacast based on analyses of GPS mobility data52. Mobility data has been used as a predictive tool across many domains: it has been used to build mobility networks67, study the relationship between mobility and financial market performance68, and understand which mobility patterns lead to an increased risk of death due to COVID-19 infection69. We interpret the reduction in such non-essential trips from prior to the lockdown as a proxy for the proportion of the susceptible population remaining unexposed in a county during any time of a simulation, both during the lockdown and after lockdown measures were lifted. Note that, additionally, incorporating this fraction of protected susceptibles into the model by this means also allows us to address the question of self-isolation by individuals indirectly. Social distancing measures are modeled as a reduction in transmissibility of the pathogen through the parameter d, which is primarily based on the effectiveness of masks against transmission of similar diseases70. Although the parameter d could capture the effect of lockdown in addition to social distancing measures, we decided to model the population under lockdown separately via the use of the independent Unacast data to retain our ability to undertake investigations, if required, of the relative impacts of these measures on the course of the pandemic in future simulations.
Quarantine of infectious cases through contact tracing and/or testing is modeled simply by treating a proportion, q, of IA, IP, and IM as not contributing to transmission as a result of being detected and made to isolate themselves at home.
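To make the roles of α, λ, d and q concrete, the sketch below implements a deliberately reduced model. It is not the model of Fig. 8 or the Supplementary equations: the five infectious classes are collapsed into a single class I, susceptibles are assumed to move into the protected class R1 at rate α and back at rate λ, the force of infection is assumed to be β(1 − d)(1 − q)I, and every numerical value is illustrative only. The function name `simulate` and the parameters β, σ and γ are likewise hypothetical.

```python
import numpy as np

def simulate(days, beta=0.4, sigma=1 / 5, gamma=1 / 7,
             alpha=0.0, lam=0.1, d=0.0, q=0.0, e0=1e-4, dt=0.1):
    """Euler integration of a reduced S, R1, E, I, R2 model in population fractions.

    Returns the infectious fraction at the end of each day and the final state.
    """
    s, r1, e, i, r2 = 1.0 - e0, 0.0, e0, 0.0, 0.0
    steps_per_day = int(round(1 / dt))
    infectious = []
    for _ in range(days):
        for _ in range(steps_per_day):
            # d scales transmissibility; q removes a share of I from transmission.
            force = beta * (1 - d) * (1 - q) * i
            new_exposed = force * s
            ds = -new_exposed - alpha * s + lam * r1
            dr1 = alpha * s - lam * r1  # protected susceptibles (lockdown)
            de = new_exposed - sigma * e
            di = sigma * e - gamma * i
            dr2 = gamma * i
            s, r1, e, i, r2 = (s + dt * ds, r1 + dt * dr1, e + dt * de,
                               i + dt * di, r2 + dt * dr2)
        infectious.append(i)
    return np.array(infectious), (s, r1, e, i, r2)

baseline, _ = simulate(365)
distanced, _ = simulate(365, d=0.5)              # social distancing only
locked, _ = simulate(365, alpha=0.3, lam=0.1)    # lockdown only
```

In this reduced form d and q enter the force of infection identically, so they cannot be told apart. In the full model q acts only on IA, IP and IM, while hospitalized classes are unaffected, which is one reason the two interventions are parameterized separately.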