Main

As SARS-CoV-2 continues its rapid global spread, increased understanding of the underlying level of transmission and infection severity is crucial for guiding the pandemic response. Although the testing of individuals with COVID-19 is a vital public health tool, variability in surveillance capacities, case definitions and health-seeking behaviour can cause difficulties in the interpretation of case data. Owing to more complete reporting, COVID-19-associated deaths are often seen as a more reliable indicator of the size of the pandemic. If reliably reported, the number of COVID-19-associated deaths can be used to infer the total number of SARS-CoV-2 infections using estimates of the infection fatality ratio (IFR; the ratio of COVID-19-associated deaths to the total number of SARS-CoV-2 infections). Estimates of the IFR derived from seroprevalence studies that carefully estimate the number of individuals with detectable antibodies can help to make the link between deaths and total infections as well as refine estimates of the relative burden in different age groups1. Although it is clear that infection severity increases substantially with age2,4, there remain key unanswered questions with regard to the consistency of mortality patterns across countries. Underlying heterogeneities in the age structure of the population or in the prevalence of comorbidities can contribute to differences in the levels of observed COVID-19-associated fatalities5. In addition, when looking at the total number of COVID-19-associated deaths, the level of transmission among the general population can be difficult to disentangle from large outbreaks in vulnerable populations such as those living in nursing homes and other long-term care settings.
For many countries, the COVID-19 pandemic has been characterized by a heavy burden in residents of nursing homes, with more than 20% of all reported COVID-19-associated deaths occurring in nursing homes in countries such as Canada, Sweden and the UK3. In other countries, such as South Korea and Singapore, few COVID-19-associated deaths have been reported in nursing homes3. In this context, simply comparing the total number of deaths across countries may provide a misleading representation of the underlying level of transmission. Focusing on COVID-19-associated death data in younger individuals, however, may provide more-reliable insights into the underlying nature of transmission.

Seroprevalence surveys provide valuable information on the proportion of the population that have ever experienced an infection6,7,8,9; however, they can be subject to a number of biases, and the variable performance of different assays can complicate comparisons of results across studies10. Here, we present a model framework that integrates age-specific COVID-19-associated death data from 45 countries with 22 national-level seroprevalence surveys, providing insights into the consistency of infection fatality patterns across countries (Fig. 1a). We use our model to produce ensemble IFR estimates by age and sex in a single harmonized framework as well as estimates of the proportion of the population that has been infected in each country.

Age-specific mortality patterns

Using population age structures and age-specific death data, we compare the relative number of deaths by age within each country, using 55–59-year-old individuals as the reference group. We find a very consistent pattern in the relative risk of death by age for individuals younger than 65 years of age across countries and continents, with a strong log-linear relationship between age and risk of death for individuals who are 30–65 years old (Fig. 1b and Supplementary Methods 1). The observed relative risk of death in older individuals appears to be substantially more heterogeneous across locations. Given the potential for important variability in mortality associated with outbreaks in nursing homes across countries, we first investigate mortality patterns specifically in the general population, using age-specific deaths of individuals aged 65 years and older from England, for which the granularity of the data enables us to remove deaths that occurred in nursing home populations. We find that the log-linear relationship between age and risk of death continues into older age groups (Fig. 1b). To assess the generalizability of the data from England to other countries, we use these estimates to reconstruct the number of deaths in the general population—that is, excluding nursing homes—in 13 other countries and find that the predictions are consistent with the reported numbers of deaths in the general population (Fig. 1c and Supplementary Methods 2).

To translate the relative risks of death by age to the underlying IFR, we combine age-specific death data with 22 seroprevalence surveys, representing 16 of the 45 countries (multiple studies are available for Belgium, England, Scotland, Sweden and the Netherlands) (Supplementary Table 1). We use daily time series of reported deaths to reconstruct the timing of infections and subsequent seroconversions. To limit biases that can be introduced by outbreaks in nursing homes and potentially variable reporting practices of fatalities among individuals aged 65 years and older, we fit our model investigating the relationship between seroconversion and mortality exclusively to death data from those younger than 65 years. To infer IFRs in age groups 65 years and older, we use our estimates of the relative risk of death derived from data from England of deaths that did not occur in nursing homes. As our baseline model, we use an ensemble model in which we include results from all national-level seroprevalence studies within a single framework. In addition, we consider separate models for which we use the results of each individual seroprevalence survey to investigate the consistency of estimates provided by different studies. As older individuals have fewer social contacts11 and are more likely to be isolated through shielding programmes, we assume a baseline relative infection attack rate of 0.7 for individuals aged 65 years and older, relative to those under 65, and assume equal infection attack rates across age groups under 65 years of age. We find that age-specific IFRs estimated by the ensemble model range from 0.001% (95% credible interval, 0–0.001%) in those aged 5–9 years old (range, 0–0.002% across individual national-level seroprevalence surveys) to 8.29% (95% credible interval, 7.11–9.59%) in those aged 80+ (range, 2.49–15.55% across individual national-level seroprevalence surveys) (Fig. 2a).
We estimate a mean increase in IFR of 0.59% with each five-year increase in age (95% credible interval, 0.51–0.68%) for ages of 10 years and older. We estimate that the risk of death if infected with SARS-CoV-2 is significantly higher for men than for women (Fig. 2a), particularly in older individuals, with ensemble IFR estimates of 10.83% for men aged 80+ (95% credible interval, 9.28–12.52%; range of each individual seroprevalence survey, 3.25–20.30%) and 5.76% for women aged 80+ (95% credible interval, 4.94–6.66%; individual seroprevalence survey range, 1.73–10.80%), which is consistent with previous findings12,13.

Consistency of IFRs across seroprevalence surveys

We use our model framework to facilitate robust comparisons of IFRs across settings, considering only age-specific deaths among those younger than 65 years old. Using country-specific demographic distributions (both age and sex), we estimate population-weighted IFRs for each country. Taking France as a reference population, the ensemble model estimates a population IFR of 0.79% (95% credible interval, 0.68–0.92%), although we find notable heterogeneity in IFR estimates as suggested by individual national-level seroprevalence studies, with a median range of 0.24–1.49% (Fig. 2b). In particular, seroprevalence studies from New York City (2.28%; 95% credible interval, 2.15–2.42%), Scotland (1.49%; 95% credible interval, 1.25–1.82%) and England (1.41%; 95% credible interval, 1.38–1.44%) suggest a significantly higher IFR, whereas studies in Kenya (0.24%; 95% credible interval, 0.23–0.25%), Slovenia (0.25%; 95% credible interval, 0.24–0.30%) and Denmark (0.26%; 95% credible interval, 0.24–0.32%) support a lower IFR than that of the ensemble model. We note that the application of age- and sex-specific IFR estimates suggested by individual national-level seroprevalence studies at the lower end of the scale (for example, Kenya, Slovenia and Denmark) to mortality data in highly affected settings would suggest attack rates over 100% (Supplementary Fig. 3). Potential explanations for the variable IFR estimates observed across settings include different prevalences of high-risk populations (for example, individuals with comorbidities), differences in methodology and representativeness of seroprevalence studies, heterogeneities in availability and quality of care or variations in reporting of COVID-19-associated deaths. We have fitted our model to seroprevalence data adjusted for reported assay sensitivity and specificity but find that using unadjusted estimates provides similar results (Supplementary Fig. 5).
As the duration of SARS-CoV-2 seropositivity among infected individuals is as yet unclear14, in sensitivity analyses we explore the potential effect of waning antibodies over time. In a scenario with assumed 5% exponential decay of seroconversions per month, the ensemble model estimates a population IFR of 0.65% in France (95% credible interval, 0.56–0.73%) (Supplementary Fig. 4). Furthermore, we demonstrate that our results are robust to different assumptions regarding the mean delay between infection and seroconversion (Supplementary Fig. 4). There may also be individuals who never seroconvert or only develop a T cell response, and those individuals would be missed in these studies15. Of the studies included in our analysis, we find that those conducted among blood donors (which exclude children and require individuals to be asymptomatic at the time of sampling) do not give significantly different results from those conducted among the general population (Supplementary Fig. 6). However, further comparisons are needed to fully understand the representativeness of different seroprevalence survey designs.

Considering the demographic structures of each country, we find that population-weighted IFR estimates by the ensemble model are highest for countries with older populations such as Japan (1.09%; 95% credible interval, 0.94–1.26%; individual seroprevalence survey range, 0.33–2.05%) and Italy (0.94%; 95% credible interval, 0.80–1.08%; individual seroprevalence survey range, 0.28–1.76%), whereas the lowest IFRs are for Kenya (0.09%; 95% credible interval, 0.08–0.10%; individual seroprevalence survey range, 0.03–0.17%) and Pakistan (0.16%; 95% credible interval, 0.14–0.19%; individual seroprevalence survey range, 0.05–0.31%) (Fig. 2c). Our ensemble model reproduces the reported seroprevalence values for the majority of studies including the temporal dynamics. However, consistent with substantial heterogeneity in IFR across countries, the ensemble model cannot fully reconcile the relationship between reported seroprevalence and age-specific death data in some locations (Fig. 3b). Of the 45 countries included in our analysis, which represent 3.4 billion people, we estimate that an average of 5.27% (95% credible interval, 4.51–6.20%; individual seroprevalence survey range, 2.80–13.97%) of these populations had been infected by 1 September 2020, ranging from 0.06% (95% credible interval, 0.04–0.09%; individual seroprevalence survey range, 0.02–0.20%) in South Korea to 62.44% (95% credible interval, 54.07–72.90%; individual seroprevalence survey range, 33.13–207.20%) in Peru. These results indicate that there has been large heterogeneity in the level of transmission across countries to date, with particularly high attack rates estimated in many Latin American countries. Given the underlying heterogeneity in IFR that could not be captured by the ensemble model, it is important to consider the full range of uncertainty in these estimates as suggested by individual seroprevalence studies (grey points in Fig. 3b). 
Estimates of high transmission levels in some Latin American countries are consistent with recent subnational seroprevalence studies16,17,18. Our estimates are also consistent with mathematical modelling efforts for individual countries, in which additional metrics of epidemic size (for example, numbers of cases, hospitalizations and/or admissions to intensive care units) have been considered12,19,20 (Supplementary Fig. 7). The medium- and longer-term implications for the pandemic in countries that have experienced high levels of infection remain unclear; in particular, whether sufficient immunity exists to halt the pandemic locally21.
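The population-weighted IFRs discussed in this section can be illustrated with a short sketch. It assumes that each age group is weighted by its expected share of infections (population size multiplied by the relative infection attack rate), which reduces to plain population weighting when all relative attack rates equal 1. The function name, the two coarse age groups and all numerical inputs except the relative attack rate of 0.7 for those aged 65 years and older are hypothetical and are not the fitted values of the model.

```python
def population_weighted_ifr(pop_by_age, ifr_by_age, rel_infection_by_age):
    """Average IFR over age groups, weighting each group by expected infections.

    Assumption: the weight of a group is population * relative attack rate.
    """
    infections = {a: pop_by_age[a] * rel_infection_by_age[a] for a in pop_by_age}
    deaths = sum(infections[a] * ifr_by_age[a] for a in pop_by_age)
    return deaths / sum(infections.values())


# Hypothetical two-group population (shares in %) and hypothetical IFRs.
pop = {"<65": 80.0, "65+": 20.0}
ifr = {"<65": 0.001, "65+": 0.05}
rel = {"<65": 1.0, "65+": 0.7}  # 0.7 is the baseline assumption stated in the text

overall_ifr = population_weighted_ifr(pop, ifr, rel)
print(f"population-weighted IFR: {overall_ifr:.4%}")
```

Because older groups carry much higher IFRs, shifting even a small share of the population into the oldest group raises the weighted value markedly, which is the mechanism behind the higher estimates for countries with older populations.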

Heterogeneities in mortality in individuals over 65

Using our model framework, we estimate the number of deaths expected in the absence of transmission in nursing homes in those aged 65 years and older, given the reported number of deaths in younger age groups, and compare them to the reported number of COVID-19-associated deaths in individuals aged 65 years and older (Fig. 4a). We find that many countries in Latin America had significantly fewer reported deaths in individuals aged 65 years and older than expected, consistent with the underreporting of COVID-19-associated deaths among older individuals. For example, we find that in Ecuador there are 220 fewer reported deaths per 100,000 individuals in those aged 65 years and older than expected (95% credible interval, 200–240), equivalent to approximately 2,800 missing deaths. Although lower infection attack rates in older populations owing to a reduced number of contacts or successful shielding policies may also explain lower mortality rates, in sensitivity analyses we show that for some countries unrealistically low infection attack rates among individuals aged 65 years and older compared to the rest of the population would be required to reconcile the reported number of deaths in these age groups (Supplementary Fig. 8).

By contrast, for many European countries, we observe a higher incidence of deaths in older individuals than expected (Fig. 4a). This is consistent with the large proportion of reported COVID-19-associated deaths attributable to outbreaks in nursing homes, highlighting the enormous burden experienced by these communities in many higher-income countries22,23. We use the age and sex distribution of residents in nursing homes to derive a population-weighted IFR of 22.25% (95% credible interval, 19.06–25.74%) among residents in nursing homes in France, assuming that individuals in nursing homes are 3.8 times more frail than individuals in the general population of the same age and sex, as has previously been estimated24 (Fig. 4b). Using this estimate of the IFR would suggest that 7.28% of the French nursing home population had been infected by 1 September 2020 (95% credible interval, 6.29–8.49%), a 1.70-fold higher infection attack rate than the general population (Supplementary Methods 3). In our baseline model, we derive IFR estimates among the general population (that is, excluding deaths that occurred in nursing homes) so as to facilitate robust comparisons of IFR and transmission in the general population across settings. However, we demonstrate that in cases in which high rates of infection have occurred among residents of nursing homes, overall IFRs will be significantly greater than in scenarios in which these populations have been successfully shielded or experienced little exposure (Fig. 4c). For example, in France, including deaths that occurred in nursing homes increases the IFR from 0.74% for the general population (95% credible interval, 0.64–0.86%) to 1.10% overall (95% credible interval, 0.95–1.28%). This highlights the complexity of comparing headline IFR estimates across populations in which very different levels of transmission may have occurred in these hyper-vulnerable communities.

Discussion

In our analysis, we assess the relationship between seroprevalence and the age-specific number of COVID-19-associated deaths across numerous settings. Accounting for population demographics and variable mortality burdens among older populations, we find considerable heterogeneity in the overall IFR of SARS-CoV-2 across settings, which suggests that there are additional important drivers of IFRs.

Seroprevalence surveys have, to date, shown inconsistent patterns in age-specific infection attack rates across settings (Supplementary Fig. 10) as contact patterns are likely to have changed substantially over the course of the pandemic. In sensitivity analyses, we find that our results are relatively consistent when using different assumed age-specific infection attack rates (Supplementary Fig. 9). Here we use data from national reporting systems of COVID-19-associated deaths. However, in some settings these may not capture all deaths associated with COVID-19. It has been estimated for a subset of countries (n = 6 out of 45) that the numbers of reported COVID-19-associated deaths ranged from undercounting of 40% to overcounting of 10% compared to excess death estimates25. Assuming that these differences occur equally across all age groups would result in a change in the mean IFR for these countries of 0.66% to 0.87%. However, this represents an extreme scenario, as most unaccounted-for deaths are likely to be in the oldest age groups, which would not affect our estimates25. There are a number of complexities in the interpretation of excess death data that can inhibit their direct use in assessments of IFR. Specifically, excess death estimates are highly sensitive to the reference time period used (Supplementary Figs. 11, 12), frequent negative excess deaths occur, especially in younger ages (Supplementary Fig. 12), and there is limited availability of excess death data for narrow age groups or outside higher-income countries. Although both seroprevalence and reported COVID-19-associated death data can be subject to potential limitations, considering these data across multiple settings in a harmonized framework has enabled us to robustly assess trends in the transmission and fatality rates of SARS-CoV-2 and derive global ensemble estimates.

Translating the number of COVID-19-associated deaths into estimates of the number of infections requires careful consideration of fatalities from outbreak events in highly vulnerable populations. By providing a benchmark of the expected number of deaths by age in older individuals, our approach allows us to identify countries in which excess transmission in nursing home populations has probably occurred. We demonstrate how outbreaks in nursing homes can drive overall population IFRs, through both increased attack rates and increased vulnerability. The results and modelling framework that we present here demonstrate how age-specific death data can be used to robustly reconstruct the underlying level of transmission. This approach could be applied at subnational scale and may be of particular use in settings in which resources to carry out large, representative seroprevalence studies are not available.

Methods

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Data

Age- and sex-specific COVID-19-related fatality data

We collated national-level age-stratified COVID-19-related death counts from official government and department of health webpages and reports for 45 countries. Where available, stratification by both age and sex was used. Subnational age-stratified death counts were additionally collated for four regions in which seroprevalence surveys had been conducted. For countries for which information on age was missing for a subset of deaths, we assumed the age distribution of the missing subset to be the same as that of the deaths with available age data. Information on age was missing for 29% of deaths in Spain. In addition, the time series of daily reported deaths from each country or region were obtained from the COVID-19 Data Repository by the Center for Systems Science and Engineering at Johns Hopkins University26. Age- and sex-specific population data were obtained from the United Nations 2019 World Population Prospects27,28.
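The treatment of deaths with missing age information described above amounts to a proportional reallocation, which can be sketched as follows. The function name and the counts are hypothetical and do not correspond to any country in the analysis.

```python
def redistribute_missing_age(deaths_by_age, n_missing_age):
    """Allocate deaths of unknown age in proportion to deaths of known age."""
    known_total = sum(deaths_by_age.values())
    if known_total == 0:
        raise ValueError("no deaths with known age to define a distribution")
    # Scaling every group by the same factor preserves the known age distribution.
    scale = (known_total + n_missing_age) / known_total
    return {age: count * scale for age, count in deaths_by_age.items()}


# Hypothetical counts: 500 deaths with known age, 100 with age missing.
known = {"0-39": 10, "40-64": 90, "65+": 400}
adjusted = redistribute_missing_age(known, n_missing_age=100)
print(adjusted)
```

The adjusted counts sum to the total number of reported deaths while keeping the relative shares of each age group unchanged.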

Seroprevalence studies

We used data from 25 SARS-CoV-2 seroprevalence surveys from 20 countries or regions for which the results were representative of the general population and for which age-stratified death data were also available; these results are shown in Fig. 1a and Supplementary Table 1. In the ensemble model, we consider only the 22 national-level seroprevalence surveys, which represent 16 countries. In cases in which estimates of seroprevalence reported by individual studies had not been adjusted for the performance of the seroprevalence assay, we used the reported values of assay sensitivity and specificity to adjust the reported seroprevalence values (Supplementary Table 1). Seroprevalence values from 24 out of 25 studies included in our analysis were adjusted for assay performance; the seroprevalence assay used by the one remaining study had not been reported at the time of publishing.
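The text does not state which formula was used to adjust reported seroprevalence for assay performance. One standard choice is the Rogan-Gladen estimator, sketched below purely as an assumption about how such an adjustment could be done; the function name and the input values are hypothetical.

```python
def adjust_seroprevalence(observed, sensitivity, specificity):
    """Rogan-Gladen correction of an observed proportion of positive tests.

    Assumption: this estimator stands in for the adjustment described in the
    text, which does not name a specific formula.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("assay must perform better than chance")
    adjusted = (observed + specificity - 1.0) / denom
    # Truncate to the valid range for a proportion.
    return min(1.0, max(0.0, adjusted))


# Hypothetical survey: 5% of samples positive, 90% sensitivity, 99% specificity.
adjusted_prev = adjust_seroprevalence(observed=0.05, sensitivity=0.90, specificity=0.99)
print(f"adjusted seroprevalence: {adjusted_prev:.4f}")
```

With a perfect assay (sensitivity and specificity both equal to 1) the function returns the observed value unchanged.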

Model

We combined age- and sex-specific COVID-19-related death data from 45 countries with data from 22 seroprevalence surveys, to jointly infer the age- and sex-specific IFRs and country-specific cumulative probabilities of infection. Age- and sex-specific IFRs were estimated in 5-year age groups, with individuals aged 80+ years considered as a single age group. Let $${N}_{c,a,s}$$ be the population size for the age group a of sex s in country c. The expected number of deaths for the age group a of sex s in country c, $${D}_{c,a,s}$$, is estimated as shown in equation (1), which we assume to follow a Poisson distribution. $${\varLambda }_{c}$$ denotes the cumulative probability of infection in country c, $${\delta }_{a}$$ the relative probability of infection in age group a and $${{\rm{IFR}}}_{a,s}$$ the infection fatality ratio of age group a and sex s. This assumes that age- and sex-specific IFRs are constant over the course of the pandemic. In cases in which improvements in COVID-19 outcomes have occurred over time, our estimates would represent the average probabilities to date.

$${D}_{c,a,s}={N}_{c,a,s}\hspace{.1pt}{\varLambda }_{c}{\delta }_{a}{{\rm{IFR}}}_{a,s}$$
(1)
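As a minimal illustration of equation (1) and the Poisson assumption, the sketch below computes the expected number of deaths for one age-sex stratum and the log-probability of an observed count. The function names and all input values are hypothetical; the relative probability of infection of 1.0 corresponds to the baseline assumption for age groups under 65 years.

```python
import math


def expected_deaths(population, cum_infection_prob, rel_infection, ifr):
    """Equation (1): D = N * Lambda * delta * IFR for one country-age-sex stratum."""
    return population * cum_infection_prob * rel_infection * ifr


def poisson_log_pmf(observed, mean):
    """Log-probability of an observed death count under a Poisson distribution."""
    return observed * math.log(mean) - mean - math.lgamma(observed + 1)


# Hypothetical stratum: 1,000,000 people, 5% infected, IFR of 0.1%.
mu = expected_deaths(1_000_000, cum_infection_prob=0.05, rel_infection=1.0, ifr=0.001)
loglik = poisson_log_pmf(observed=45, mean=mu)
print(f"expected deaths: {mu:.1f}, log-likelihood of 45 observed: {loglik:.3f}")
```

In a full fit, log-probabilities of this kind would be summed over all countries, age groups and sexes included in the likelihood.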

The expected numbers of deaths estimated for 5-year age groups were summed to match the corresponding age groups of observed deaths when reported in coarser age groups. We fit exclusively to the reported number of deaths for age groups under 65 years of age for each country (that is, including all age groups for which the upper bound is less than 65 years). IFRs for age groups aged 65 years and older were derived from age-specific death data reported by the Office for National Statistics in England29, which allow us to exclude the age-specific number of deaths among residents of nursing homes (Supplementary Methods 2). As an external validation, we apply these IFRs to reported death data for a subset of 13 countries for which an adjustment for deaths that occurred in nursing homes could be applied (Supplementary Methods 2).

To align estimates of the cumulative probability of infection, $${\varLambda }_{c}$$, with data from seroprevalence surveys, we used daily time series of reported deaths to infer the timing of infections and subsequent seroconversions. We assumed a gamma-distributed delay between onset and death30 with a mean of 20 days and a standard deviation of 10 days and a gamma-distributed delay between infection and onset31 with a mean of 6.5 days and standard deviation of 2.6 days. The delay between onset and seroconversion32 was assumed to be gamma-distributed with a mean of 10 days and a standard deviation of 8 days. We derive the approximated seroprevalence at a given survey period t, $${\lambda }_{c,t}$$, as shown in equation (2). Here, $${S}_{c,i}$$ is the inferred number of seroconversions in country c on day i, as inferred from the convolution of the death time series, $${D}_{c,i}$$ is the number of new deaths reported in country c on day i, and $${T}_{c}$$ is the date of the reporting of the age-stratified cumulative death data.

$${\lambda }_{c,t}={\varLambda }_{c}\frac{{\sum }_{i=1}^{t}{S}_{c,i}}{{\sum }_{i=1}^{{T}_{c}}{D}_{c,i}}$$
(2)
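The convolution behind equation (2) is not spelled out in detail, so the following is only one plausible sketch. It assumes that deaths are back-projected to symptom onset using the onset-to-death delay and then projected forward using the onset-to-seroconversion delay; the infection-to-onset delay is omitted because it does not affect the timing of seroconversion relative to onset. The discretization, function names and death series are hypothetical.

```python
import math
from collections import defaultdict


def gamma_delay_pmf(mean, sd, max_days=60):
    """Discretise a gamma delay with given mean and sd onto days 0..max_days."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean

    def pdf(x):
        return math.exp((shape - 1) * math.log(x) - x / scale
                        - math.lgamma(shape) - shape * math.log(scale))

    weights = [pdf(d + 0.5) for d in range(max_days + 1)]  # midpoint of each day
    total = sum(weights)
    return [w / total for w in weights]


def infer_seroconversions(daily_deaths, onset_to_death, onset_to_sero):
    """Back-project deaths to onset dates, then forward to seroconversion dates."""
    onsets = defaultdict(float)
    for day, deaths in enumerate(daily_deaths):
        for delay, p in enumerate(onset_to_death):
            onsets[day - delay] += deaths * p
    sero = defaultdict(float)
    for day, n in onsets.items():
        for delay, p in enumerate(onset_to_sero):
            sero[day + delay] += n * p
    return sero  # keyed by day index; includes days before the first death record


def approx_seroprevalence(cum_infection_prob, sero, daily_deaths, survey_day):
    """Equation (2): Lambda_c times the share of seroconversions up to the survey."""
    converted = sum(n for day, n in sero.items() if day <= survey_day)
    return cum_infection_prob * converted / sum(daily_deaths)


# Delay parameters from the text; the death series and Lambda_c are hypothetical.
onset_to_death = gamma_delay_pmf(mean=20, sd=10)
onset_to_sero = gamma_delay_pmf(mean=10, sd=8)
deaths = [0] * 10 + [5] * 20 + [0] * 10
sero = infer_seroconversions(deaths, onset_to_death, onset_to_sero)
lam_early = approx_seroprevalence(0.05, sero, deaths, survey_day=15)
lam_late = approx_seroprevalence(0.05, sero, deaths, survey_day=200)
print(f"seroprevalence at day 15: {lam_early:.4f}, at day 200: {lam_late:.4f}")
```

Because each delay distribution is normalized, the inferred seroconversions sum to the total number of deaths, so the approximated seroprevalence converges to the cumulative probability of infection once all seroconversions have occurred.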

We include all data from national-level seroprevalence studies in an ensemble model, in which the expected seroprevalence is assumed to follow a beta distribution with an unknown variance parameter, κ, as shown in equation (3). To investigate the contribution of different seroprevalence studies to the likelihood, the model was fitted separately to data from each individual seroprevalence survey, including an additional three subnational seroprevalence studies (Supplementary Table 1). For each seroprevalence survey, the expected number of seropositive individuals in country c at sampling period t, $${N}_{c,t}^{{\rm{positive}}}$$, is assumed to follow a binomial distribution33 as shown in equation (4), in which $${N}_{c,t}^{{\rm{samples}}}$$ is the number of serological samples taken in country c at time t.

$${\bar{\lambda }}_{c,t} \sim {\rm{beta}}\,({\lambda }_{c,t},\,\kappa ),$$
(3)
$${N}_{c,t}^{{\rm{positive}}} \sim \text{binomial}({N}_{c,t}^{{\rm{samples}}},\,{\lambda }_{c,t}).$$
(4)
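Equations (3) and (4) can be evaluated with a few lines of standard-library code. The parameterization of the beta distribution is not given in the text; the sketch below assumes a mean and concentration form, with shape parameters equal to the mean multiplied by κ and one minus the mean multiplied by κ. The function names and survey values are hypothetical.

```python
import math


def beta_log_pdf(x, mean, kappa):
    """Log-density of a beta distribution in mean/concentration form.

    Assumption: alpha = mean * kappa and beta = (1 - mean) * kappa.
    """
    a = mean * kappa
    b = (1.0 - mean) * kappa
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def binomial_log_pmf(k, n, p):
    """Log-probability of k seropositive samples out of n at seroprevalence p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


# Hypothetical survey: 50 positive samples out of 1,000.
ll_match = binomial_log_pmf(50, 1000, 0.05)
ll_mismatch = binomial_log_pmf(50, 1000, 0.10)
print(f"log-likelihood at 5%: {ll_match:.2f}, at 10%: {ll_mismatch:.2f}")
```

Larger values of κ in this parameterization concentrate the beta distribution around its mean, so κ controls how closely survey-level seroprevalence is expected to follow the value implied by the death data.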

All parameters were estimated in a Bayesian framework using RStan34 in R version 3.6.1. We assumed uniform priors on all parameters, between −50 and −0.001 on a logarithmic scale for all IFR estimates, and between −50 and 2 on a logarithmic scale for all estimates of the cumulative probability of infection. The model was run with 3 chains of 10,000 iterations each. The 95% credible intervals are calculated by taking the 0.025 and 0.975 quantiles of the posterior distribution.
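The quantile-based credible interval described above can be sketched as follows. The analysis itself was run in RStan; this Python version only illustrates the calculation, uses linear interpolation between order statistics as one of several possible quantile conventions, and operates on hypothetical draws.

```python
def credible_interval(draws, lower=0.025, upper=0.975):
    """Equal-tailed interval from posterior draws using interpolated quantiles."""
    ordered = sorted(draws)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)  # pos is non-negative, so int() acts as floor
        hi = min(lo + 1, len(ordered) - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(lower), quantile(upper)


# Hypothetical posterior draws: the integers 0..1000.
low, high = credible_interval(list(range(1001)))
print(f"95% credible interval: {low:.1f} to {high:.1f}")
```

In practice the draws would be pooled across the chains after discarding warm-up iterations.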

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.