Main

During 2020, the United States documented more COVID-19 cases and deaths than any other country in the world1. The first US COVID-19 case was identified in Washington state on 20 January 20202. Over the course of the year, three pandemic waves took place: (1) a spring outbreak in select, mostly urban areas following the introduction of the virus to the United States; (2) a summer wave that predominantly affected the southern half of the country; and (3) an autumn–winter wave that remained pervasive until the spring of 2021. To understand the transmission of the virus and better control its progression in the future, it is vital that the epidemiological features that have supported these outbreaks are quantified and analysed in both space and time.

Here we use a county-resolved metapopulation model to simulate the transmission of SARS-CoV-2 within and between the 3,142 counties of the United States. The model depicts both documented and undocumented infections and is coupled with an iterative Bayesian inference algorithm, the ensemble adjustment Kalman filter, which assimilates observations of daily cases in each county, as well as population movement between counties9,10 (Supplementary Information). The Bayesian inference fits the model to case observations and estimates unobserved state variables (for example, population susceptibility within a county) and system parameters (for example, the ascertainment rate in each county). Synthetic tests indicate that the inference approach can recover key time-varying parameters across a diversity of simulation scenarios (Extended Data Fig. 1). The fitted model captures the three waves of the outbreak at the national scale (Fig. 1a), as well as in major metropolitan areas and at the county scale (Extended Data Fig. 2). These inference results are robust to parameter settings and model configurations (Extended Data Figs. 3, 4, Supplementary Information).
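The assimilation step can be illustrated for a single scalar observation. The sketch below is a minimal, textbook-style ensemble adjustment Kalman filter update and is not the authors' implementation; the function name `eakf_update`, the four-member toy ensembles and the observation error variance are all assumptions made for illustration.

```python
from statistics import mean, variance


def eakf_update(x_prior, y_prior, obs, obs_var):
    """One scalar ensemble adjustment Kalman filter update (illustrative).

    x_prior: ensemble of an unobserved quantity (e.g. an ascertainment rate)
    y_prior: ensemble of the modelled observation (e.g. simulated daily cases)
    obs, obs_var: the observation and its assumed error variance
    """
    y_mean = mean(y_prior)
    y_var = variance(y_prior)  # sample variance of the prior ensemble
    if y_var == 0:
        # A collapsed ensemble carries no spread to adjust.
        return list(x_prior), list(y_prior)

    # Gaussian product of the prior and the observation likelihood.
    post_var = y_var * obs_var / (y_var + obs_var)
    post_mean = post_var * (y_mean / y_var + obs / obs_var)

    # Shift and shrink the observed-variable ensemble deterministically.
    shrink = (post_var / y_var) ** 0.5
    y_post = [post_mean + shrink * (y - y_mean) for y in y_prior]

    # Regress the increments onto the unobserved variable.
    x_mean = mean(x_prior)
    cov_xy = sum((x - x_mean) * (y - y_mean)
                 for x, y in zip(x_prior, y_prior)) / (len(y_prior) - 1)
    gain = cov_xy / y_var
    x_post = [x + gain * (yp - y)
              for x, y, yp in zip(x_prior, y_prior, y_post)]
    return x_post, y_post


# Hypothetical toy ensembles (not data from the study).
cases_prior = [80.0, 100.0, 120.0, 140.0]
rate_prior = [0.10, 0.15, 0.20, 0.25]
rate_post, cases_post = eakf_update(rate_prior, cases_prior,
                                    obs=150.0, obs_var=400.0)
```

In this toy case the observation lies above the prior ensemble mean, so the modelled cases and the positively correlated rate are both shifted upwards while the ensemble spread narrows.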

Fig. 1: Model calibration and ascertainment rate.

a, Model fitting to daily case numbers (blue dots) in the United States and the New York metropolitan area (inset). Solid and dashed lines show the median estimate and 95% CIs, respectively. b, Comparison between inferred percentage cumulative infections and seroprevalence in ten locations adjusted for antibody waning. The inset shows residuals of inference (inferred percentage of infected population minus adjusted seroprevalence). Centres and whiskers show medians and 95% CIs, and colour indicates the sample collection date in each location. Distributions are obtained from n = 100 ensemble members. Details on the serological survey are provided in Supplementary Information. c, Distributions of estimated ascertainment rate in the United States and five metropolitan areas. The centre line shows the median, box bounds represent 25th and 75th percentiles, and whiskers show 2.5th and 97.5th percentiles. Monthly posterior estimates are presented for March to December 2020. Distributions are obtained from n = 100 ensemble members.

To further validate the fitting, we compared model estimates of cumulative infections to findings from US Centers for Disease Control and Prevention (CDC) seroprevalence surveys conducted at site and state levels3. The seroprevalence data, which provide an out-of-sample corroboration of the model fitting, were adjusted for the waning of antibody levels following adaptive immune response11,12 (Extended Data Fig. 5, Supplementary Information). Model estimates of cumulative infected percentages are well aligned with adjusted seroprevalence estimates from the CDC 10-site survey across sites and through time (Pearson’s r = 0.97, mean absolute error (MAE) = 1.31%) (Fig. 1b) and are similarly well matched to adjusted estimates at the state level (Extended Data Fig. 6). In addition, the seroprevalence generated using the estimated daily infections adjusted for seroreversion also matches the observed seroprevalence, and the results are robust to assumed use of a lower-sensitivity seroassay (Extended Data Fig. 6).
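The two agreement metrics quoted above can be computed as in the following sketch. The paired values are hypothetical placeholders rather than the study's estimates or the CDC survey results, and the function names are assumptions introduced for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_error(xs, ys):
    """Average absolute difference, in the same units as the inputs."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical percentages of the population infected at four sites.
inferred_pct = [1.0, 4.5, 7.2, 12.0]        # model estimates (made up)
seroprevalence_pct = [1.5, 4.0, 8.0, 11.0]  # adjusted survey values (made up)

r = pearson_r(inferred_pct, seroprevalence_pct)
mae = mean_absolute_error(inferred_pct, seroprevalence_pct)
```

A high r indicates that the two series rise and fall together, whereas the MAE reports the typical size of the discrepancy in percentage points; the two are complementary because a constant offset would leave r unchanged.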

A critical feature of SARS-CoV-2 is its ability to spread largely through individuals who have not been diagnosed with the virus4. The model structure and fitting enable estimation of the ascertainment rate, the percentage of infections confirmed diagnostically, at county scales. The national population-weighted ascertainment rate averaged for all of 2020 was 21.8% (95% CI: 15.9–30.3%), similar to an estimate derived from surveys on healthcare-seeking behaviours13. This national ascertainment rate increased from 11.3% (8.3–15.9%) during March 2020 to 24.5% (18.6–32.3%) during December 2020 (Fig. 1c). The increase through time is a likely by-product of increasing testing capacity, a relaxation of initial restrictions on test usage, and increasing recognition, concern and care-seeking among the public. We additionally focus on five metropolitan areas in the United States. Small differences in the ascertainment rate are evident across these areas: ascertainment rates for Phoenix and Miami were higher than the national average for much of the year, whereas those for New York City, Chicago and Los Angeles were consistently below the national average.
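A population-weighted national rate can be formed from county-level rates as sketched below. This is a minimal illustration that assumes a simple weighted mean; the exact aggregation used in the study is described in its Supplementary Information, and the county populations and rates here are hypothetical.

```python
def population_weighted_mean(rates, populations):
    """Weighted mean of county rates, using county population as the weight."""
    if len(rates) != len(populations):
        raise ValueError("rates and populations must have the same length")
    total_population = sum(populations)
    weighted_sum = sum(r * p for r, p in zip(rates, populations))
    return weighted_sum / total_population


# Three hypothetical counties (values are not from the study).
county_rates = [0.15, 0.25, 0.30]
county_populations = [2_000_000, 500_000, 100_000]

national_rate = population_weighted_mean(county_rates, county_populations)
```

Because the most populous county dominates the weights, the aggregate sits much closer to its rate than an unweighted average of the three counties would.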

At the national level, three pandemic waves were evident during spring, summer and autumn–winter (Fig. 1a); however, the structure differs among the five focus metropolitan areas, with New York and Chicago experiencing strong spring and autumn–winter waves but little activity during summer, Los Angeles and Phoenix undergoing summer and autumn–winter waves, and Miami experiencing all three waves (Extended Data Fig. 2). Los Angeles County, the most populous county in the United States, with a population of more than 10 million people, was particularly severely affected during autumn–winter. The differences in virus activity produced different cumulative infection numbers through time (Fig. 2a). Population susceptibility at the end of the year was 69.0% (63.6–75.4%) for the United States, and among the focal metropolitan areas it ranged from 47.6% (37.2–54.8%) in Los Angeles to 73.2% (68.3–77.8%) in Phoenix. Although there is variability among counties, a substantial portion of the US population (69.0%) had not been infected by the end of 2020; however, pockets of lower population susceptibility, which are evident in the southwest and southeast on 1 August 2020 (Fig. 2b), expanded considerably by 31 December 2020 (Fig. 2c). In particular, areas of the upper Midwest and Mississippi valley, including the Dakotas, Minnesota, Wisconsin and Iowa, are estimated to have population susceptibility below 40% as of 31 December 2020.

Fig. 2: Estimates of population susceptibility.

a, Estimated evolution of susceptibility to COVID-19 in the United States and five metropolitan areas. Solid lines show the median and the area between the dashed lines is the 95% CI. b, c, Estimated susceptibility in 3,142 US counties on 1 August (b) and 31 December (c) 2020. Colour shows median estimate.

The structure of the outbreak is evident in both incidence and prevalence estimates (Fig. 3, Extended Data Fig. 7). Incidence indicates the daily number of newly infectious individuals—both confirmed cases of COVID-19 and those whose infections remain undocumented. The majority of infections each month are undocumented (Fig. 3a), as indicated by the low ascertainment rates (Fig. 1c). For all of 2020, an estimated 78.2% of infections in the United States were undocumented. Estimates of daily prevalence provide a measure of the community infectious rate (CIR), the fraction of the population currently harbouring a contagious infection. The national SARS-CoV-2 CIR was 0.77% (0.60–0.98%) on 31 December 2020, indicating that roughly 1 in 130 people was contagious (a similar percentage, 0.83% (0.52–1.26%), was estimated to be latently infected—that is, infected but not yet contagious) (Fig. 3b). Among the 5 focal metropolitan areas, the CIR varied considerably: in mid-November, Chicago reached a CIR of 1.51% (1.27–1.82%); whereas in Miami CIR increased to 1.25% (1.03–1.53%) during July. Los Angeles was even more burdened at the end of 2020, with a CIR of 2.42% (2.05–2.86%) as of 31 December 2020 (Extended Data Fig. 7).
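The two conversions used in this paragraph are simple arithmetic and can be checked as follows. The helper names are assumptions introduced for illustration; the inputs are the national median values quoted in the text.

```python
def one_in_n(prevalence_fraction):
    """Express a prevalence fraction as '1 in N people', rounded to an integer."""
    return round(1 / prevalence_fraction)


def undocumented_percent(ascertainment_percent):
    """Share of infections never confirmed, given the ascertainment rate in %."""
    return round(100.0 - ascertainment_percent, 1)


national_cir = 0.0077           # 0.77% contagious on 31 December 2020
national_ascertainment = 21.8   # % of infections confirmed during 2020

people_per_contagious = one_in_n(national_cir)
undocumented = undocumented_percent(national_ascertainment)
```

The same functions apply to any of the metropolitan estimates, although the resulting figures inherit the width of the corresponding 95% CIs.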

Fig. 3: Estimated transmission and characteristics of COVID-19 in the United States.

a, Estimated monthly total infections (blue bars) and confirmed cases (orange bars) in the United States and the New York metropolitan area (inset). Distributions are obtained from n = 100 ensemble members. The blue bars represent the medians and whiskers show 95% CIs. b, Daily confirmed cases (blue line, 7-day moving average) and estimated prevalence of contagious infections (red line, median; red dashed lines, 95% CIs) in the United States. Inset, result for the New York metropolitan area. c, Estimated CFR (blue lines) and IFR (red lines) in the United States and five metropolitan areas. Solid and dashed lines show median estimate and 95% CIs, respectively.

The model fitting enables estimation of the case fatality rate (CFR) and the infection fatality rate (IFR). Using public line-list data from the CDC14, we estimated the distribution of time lag from case confirmation to death for each county and, using these estimates, deconvolved observed deaths to their date of case reporting15 (Extended Data Figs. 8, 9, Supplementary Information). CFR and IFR were then generated using these deconvolved death data. Both rates were highest nationally at the beginning of the spring wave: the CFR was 7.1% (4.8–9.8%) and the IFR was 0.77% (0.51–1.25%) during April (Fig. 3c). The national cumulative IFR up to 1 June was 0.69% (0.47–1.04%), in line with previous studies5,6,7 (Extended Data Fig. 2, Supplementary Information). Over the course of the year, with earlier diagnosis and treatment, improved patient care16,17,18 and (in the case of CFR) increased reporting of mild infections, the CFR and IFR dropped to 1.29% (0.98–1.68%) and 0.31% (0.22–0.44%) by December 2020, respectively. Both rates varied by location and over time; for instance, intermediate drops in CFR and IFR began for Chicago, Phoenix and Miami during the summer wave, in association with a decrease in the average age of hospitalized patients (Extended Data Fig. 8). During the winter of 2020, the CFR and IFR in most metropolitan areas increased slightly, possibly driven by greater hospitalization rates among older individuals (Extended Data Fig. 8) and strained healthcare resources19. Overall, these findings delineate the mortality risk associated with infection broadly. The national IFR during the latter half of 2020 hovered around 0.30%, well above estimates for both seasonal influenza20 (<0.08%) and the 2009 influenza pandemic21 (0.0076%). As COVID-19 deaths are likely to be under-reported, our estimate of IFR could be biased low.
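The CFR, IFR and ascertainment rate are linked by a simple identity: if the same deaths are divided by confirmed cases (CFR) and by all infections (IFR), then IFR = CFR × ascertainment rate. The sketch below checks that the December 2020 national medians quoted in the text are roughly consistent with this identity. It is an illustrative consistency check and not the authors' estimation procedure, which works with deconvolved deaths and full ensemble distributions, so the medians need not satisfy the relation exactly; the function name is an assumption.

```python
def ifr_from_cfr(cfr, ascertainment_rate):
    """IFR implied by a CFR and an ascertainment rate (all as fractions).

    deaths / infections = (deaths / cases) * (cases / infections)
    """
    return cfr * ascertainment_rate


# National median estimates for December 2020, taken from the text.
cfr_december = 0.0129           # 1.29%
ascertainment_december = 0.245  # 24.5%
ifr_reported = 0.0031           # 0.31%

ifr_implied = ifr_from_cfr(cfr_december, ascertainment_december)
relative_gap = abs(ifr_implied - ifr_reported) / ifr_reported
```

The implied value is close to 0.32%, within a few per cent of the reported median, which is consistent with the fall in CFR over the year being driven partly by rising ascertainment rather than by falling mortality risk alone.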

We further examined how the reproduction number, Rt, changed in response to changing local reported COVID-19 case numbers in five US regions (Northeast, Southeast, Midwest, Southwest and West) during the spring, summer and autumn–winter (Supplementary Information). Results indicate that communities with increasing cases showed greater reductions of Rt (Extended Data Fig. 10). However, the rate of reduction in Rt decreased over successive waves. These findings are potentially driven by a number of factors modulating the reproduction number, including changing compliance with non-pharmaceutical interventions22 and seasonal modulation of virus transmissibility23. A more thorough analysis of this preliminary finding is needed.

The United States experienced the highest numbers of confirmed COVID-19 cases and deaths in the world during 20201. Our findings provide quantification of the time-evolving epidemiological characteristics associated with successive pandemic waves in the United States, as well as conditions at the end of the year and prospects for 2021. Critically, despite more than 19.6 million reported cases by the end of 2020, an estimated 69% of the population remained susceptible to viral infection. Several factors will considerably alter population susceptibility in the coming months. First, ongoing transmission will infect naive hosts and continue to deplete the susceptible pool. Second, as more vaccine is distributed and administered, more individuals will be protected against symptomatic infection and the IFR will decrease. Finally, our model does not represent reinfection, either through waning immunity or immune escape; however, reinfection has been documented24,25, evidence of waning antibody levels exists26,27, and new variants of concern have emerged28,29 and will probably continue to do so. All these processes will affect population susceptibility over time and help to determine when society enters a post-pandemic phase, the pattern of endemicity the virus ultimately assumes and its long-term public health burden30.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.