## Main

COVID-19, as a result of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, has been the direct cause of hundreds of thousands of deaths in the world. The indirect effects of the pandemic and responses to it, acting through social, economic, environmental and healthcare pathways, can also be substantial1. Indirect effects include denied or delayed disease prevention and medical procedures for acute and chronic conditions; loss of jobs and income; disruption of social networks; increases in self-harm and crime, especially domestic abuse; changes in quantity and quality of food and the use of tobacco, alcohol and other drugs; and changes in other infectious diseases, road traffic crashes, other injuries and air pollution resulting from changes in social contacts, mobility and transportation1. How these developments affect mortality varies across countries, reflecting the sociodemographic characteristics of the population, the extent and timing of the epidemic and the response, the overall health status of the population, the resilience and agility of the health and social care system and the effectiveness of social and economic safety nets that support those in need. Knowledge of the total effect on mortality is needed to understand the true public health effects of the pandemic and the policy response. Comparative multi-country analyses2 offer insights into how responses can be made more effective and timely and how health and social care systems could be made more resilient. However, some politicians have rejected country benchmarking based on the argument that the data, methodology and timing of the analysis are not comparable across countries3. 
In this study, we developed and applied a probabilistic model averaging approach, using an ensemble of 16 Bayesian models, for comparable quantification of the weekly mortality effects of the first wave of the COVID-19 pandemic in 19 industrialized countries in central and western Europe, plus Australia and New Zealand. The models accounted for factors that affect death rates, including seasonality, temperature and public holidays, as well as for medium-term and long-term secular trends and the dependency of death rates in each week on those in preceding week(s). A summary of the main findings, limitations and policy implications of our study is shown in Table 1.

## Results

We selected countries for our analysis if their total population in 2020 was more than 4 million and if we could access weekly data on all-cause mortality divided by age group and sex that went back at least to 2015 and extended through late-May 2020. The 21 countries in our analysis were Australia, Austria, Belgium, Bulgaria, Czechia, Denmark, England and Wales, Finland, France, Hungary, Italy, Netherlands, New Zealand, Norway, Poland, Portugal, Scotland, Slovakia, Spain, Sweden and Switzerland. We used data on weekly deaths from the start of time series of data through mid-February 2020 to estimate the parameters of each model, which were then used to predict death rates for the subsequent 15 weeks as estimates of how many deaths would have occurred without the pandemic. These were then compared to reported deaths to calculate excess mortality due to the pandemic.
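
The comparison of reported deaths with model predictions can be sketched as follows. This is a minimal illustration, not the fitted ensemble: it assumes the no-pandemic counterfactual is available as a list of posterior draws of expected deaths, and the names `quantile`, `summarize_excess` and the simulated inputs are hypothetical.

```python
import random

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction

def summarize_excess(observed_deaths, expected_draws):
    """Posterior median and 95% credible interval of observed minus expected deaths."""
    excess = sorted(observed_deaths - draw for draw in expected_draws)
    return {
        "median": quantile(excess, 0.5),
        "lower": quantile(excess, 0.025),
        "upper": quantile(excess, 0.975),
    }

# Hypothetical inputs: simulated draws of no-pandemic deaths, not real data.
rng = random.Random(0)
expected_draws = [rng.gauss(10_000, 300) for _ in range(4_000)]
summary = summarize_excess(observed_deaths=11_500, expected_draws=expected_draws)
print(summary)
```

Each draw of expected deaths yields one draw of excess deaths, so the uncertainty in the counterfactual carries through directly to the credible interval of the excess.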

### Magnitude of excess deaths

We report the number of excess deaths, excess deaths per 100,000 people and relative (percent) increase in deaths together with their corresponding 95% credible intervals. For the purpose of reporting, we rounded results on number of deaths that are 1,000 or more to the nearest hundred to avoid giving a false sense of precision in the presence of uncertainty; results less than 1,000 were rounded to the nearest ten. We also report posterior probability that the observed change in deaths represents an increase or decrease in deaths compared to what would be expected if the pandemic had not occurred. Posterior probability represents the inherent uncertainty in how many deaths would have occurred in the absence of the pandemic. In a country and week in which the actual number of deaths is the same as the posterior median of the number expected in a no-pandemic counterfactual, an increase in deaths is statistically indistinguishable from a decrease; in such a situation, there is a 50% posterior probability of an increase and a 50% posterior probability of a decrease. Where the entire posterior distribution of the number of deaths expected without the pandemic is smaller than the actual number of deaths, there is a ~100% posterior probability of an increase and a ~0% posterior probability of a decrease and vice versa. For most countries, the posterior distribution of the number of deaths expected without the pandemic covers the observed number, but there is asymmetry in terms of whether much of the distribution is smaller or larger than the observed number. In such cases, there would be uneven posterior probabilities of an increase versus decrease in deaths, with the two summing to 100% (for example, 80% and 20%). Posterior probabilities more distant from 50%, toward either 0% or 100%, indicate more certainty.
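
The posterior probability of an increase and the rounding convention described above can be illustrated with a short sketch. It assumes posterior draws of expected deaths are available as a plain list. The function names are hypothetical, and applying the rounding rule to the absolute value (so that negative excess is handled symmetrically) is an assumption rather than something stated in the text.

```python
def probability_of_increase(observed_deaths, expected_draws):
    """Share of posterior draws in which expected deaths fall below observed deaths."""
    below = sum(1 for draw in expected_draws if draw < observed_deaths)
    return below / len(expected_draws)

def round_for_reporting(deaths):
    """Nearest hundred for values of 1,000 or more, nearest ten otherwise.

    Assumption: the threshold is applied to the absolute value.
    Note that Python's round() resolves exact halves to the even neighbour.
    """
    step = 100 if abs(deaths) >= 1_000 else 10
    return int(round(deaths / step) * step)

# Hypothetical draws of expected deaths for one country and week.
draws = [90, 95, 100, 105, 110]
print(probability_of_increase(102, draws))  # three of five draws lie below 102
print(round_for_reporting(206_040), round_for_reporting(947))
```

The probability of a decrease is the complement of the value returned by `probability_of_increase`, which is why the two always sum to 100%.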

Deaths in all these countries were at the levels that would be expected in the absence of the pandemic through the month of February but started to diverge to higher levels at various times in March in some (Fig. 1). From mid-February through the end of May 2020, an estimated 206,000 (95% credible interval 178,100–231,000) more people died in these 21 countries than would have been expected had the pandemic not occurred. This number is similar to the number of deaths from lung cancer in these countries in an entire year and more than twice the number of deaths from diabetes or breast cancer in an entire year4. Of these deaths, 105,800 (90,400–119,000) were in men and 100,000 (82,000–117,500) were in women (Extended Data Table 1). In relative terms, this amounts to an 18% (15–21%) increase in deaths over this period in these countries combined. Italy, Spain and England and Wales accounted for 24%, 22% and 28% of these excess deaths, respectively.

The posterior probability that there was a rise in deaths over the entire first wave of the pandemic was less than 50% (that is, a decline in deaths is more likely than an increase) for both sexes in Bulgaria, New Zealand, Australia, Slovakia, Czechia and Hungary and for women in Poland; 50–75% for women in Norway and Austria and men in Poland; 75–90% for men and women in Denmark and Finland, men in Norway and women in Switzerland; 90–99% for men and women in Portugal and men in Austria; and more than 99% for men in Switzerland and for both sexes in the Netherlands, France, Sweden, Belgium, Italy, Scotland, Spain and England and Wales (Fig. 2). In countries and sexes where mortality increased relative to the no-pandemic counterfactual with a posterior probability of at least 90%, the number of excess deaths per 100,000 people was lowest for men in Austria (14.3, −1.3 to 29.4), Switzerland (21.9, 7.6–34.9) and Portugal (27.4, 3.6–49.6), and for women in Portugal (28.7, 2.1–54.2) (Fig. 2). It was highest in Spain and England and Wales, with posterior median estimates for the two sexes ranging from 90 to 102 per 100,000 population. The posterior median increase was also more than 70 per 100,000 people for both sexes in Belgium, Italy and Scotland. Relative increases in deaths, compared to what would be expected in the absence of the pandemic, ranged from 10% or less in Austrian, Swiss and Portuguese men and Portuguese women to one quarter or more in Belgium, Italy, Scotland, Spain and England and Wales (Fig. 3). 
The largest rise in mortality for men was most likely to be in England and Wales (63% posterior probability of having the largest percent increase and 55% of having the largest number of deaths per 100,000 people), followed by Spain; for women, Spain was most likely to have experienced the largest rise in mortality (61% posterior probability of having the largest percent increase and 51% of having the largest number of deaths per 100,000 people), followed by England and Wales.
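
A posterior probability of having the largest increase can be obtained by ranking countries within each joint posterior draw. The sketch below shows one way to do this under the assumption that draws are stored per country in lists of equal length and aligned by draw index. The function name and the toy values are hypothetical.

```python
from collections import Counter

def probability_of_being_largest(draws_by_country):
    """For each country, the share of joint draws in which its value is the largest."""
    names = list(draws_by_country)
    n_draws = len(next(iter(draws_by_country.values())))
    wins = Counter()
    for i in range(n_draws):
        # Ties, if any, go to the country listed first.
        leader = max(names, key=lambda name: draws_by_country[name][i])
        wins[leader] += 1
    return {name: wins[name] / n_draws for name in names}

# Toy draws of percent increase for two hypothetical countries.
toy = {"A": [3, 1, 5, 2], "B": [2, 4, 1, 1]}
print(probability_of_being_largest(toy))
```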

Taken across all 21 countries, the number of excess deaths from all causes was 23% (7–38%) higher than the number of deaths assigned to COVID-19 as underlying cause of death (Extended Data Table 1). The difference between all-cause excess and COVID-19 deaths was largest in Spain and Italy, where all-cause excess deaths were 69% (47–90%) and 46% (14–77%), respectively, higher than deaths assigned to COVID-19. This difference might be due to a combination of undetected infections5,6, whether or not deaths from ‘suspected COVID-19’ (based on clinical symptoms) are assigned to COVID-197, and some increase in mortality from other diseases due to reductions in acute and chronic care8,9,10,11,12,13,14. In contrast to Italy and Spain, the overall (all-cause) number of excess deaths was smaller than deaths assigned to COVID-19 in France, Belgium and Switzerland. This situation might have arisen because some countries have assigned any death in a person with confirmed or suspected SARS-CoV-2 infection to COVID-19; some of these deaths might have been in patients with multiple existing chronic conditions who already had a high risk of dying7,15,16,17. Finally, there might have been a reduction in deaths from influenza and other respiratory infections because of reduced contact among people18,19 as well as a decline in traffic injuries, falls and violence as people spent more time at home20. As a result of these differences, although France and Spain have reported similar numbers of deaths assigned to COVID-19, all-cause mortality increased by twice as much in Spain as in France. These variations show the importance of using all-cause mortality to capture the true death toll due to the pandemic.

### Timing of excess deaths

Italian men were the first group to experience a rise in mortality, with the first week of March 2020 as the earliest week in which the posterior probability of an increase in deaths was more than 90%. This was followed by Italian women and Spanish men in the subsequent week (Fig. 4). Deaths in some countries with large early excess mortality returned to levels that would be expected in the absence of the pandemic in April—for example, in France, followed by Spain. Deaths remained above the levels expected in the absence of the pandemic in England and Wales and Sweden throughout the month of May, which resulted in longer periods of adverse effect. As a result, in countries and sexes where the posterior probability of an increase in deaths was more than 90%, the period of time when deaths were higher than would be expected in the absence of the pandemic ranged from 5 weeks in Austrian men to 9–10 weeks in men and women in England and Wales and Sweden, women in Scotland and men in Italy.

The large adverse effect of the pandemic in England and Wales and, to some extent, in Spain is a consequence of having both long durations and large weekly rises, with a more than 90% posterior probability that, in some weeks, deaths in men and women in Spain and men in England and Wales more than doubled. In contrast, Portugal, Switzerland and possibly France had smaller weekly rises, and for fewer weeks, and, hence, had overall increases between one quarter and one half of those in England and Wales and Spain. Sweden had the longest duration of excess deaths but had smaller weekly increases in deaths than countries such as England and Wales, Spain, Scotland, Italy and Belgium. As a result, the overall mortality toll in Sweden, in terms of relative increase and deaths per 100,000 people, fell between those of countries with low-to-moderate effects (for example, Portugal and Switzerland) and countries with extreme tolls (for example, Spain and England and Wales).

### Demographic distribution of excess deaths

Although it is widely quoted that more men die from COVID-1921,22,23,24, the number of excess deaths for all causes, excess deaths per 100,000 people and relative increase in deaths were similar between men and women in most countries (Fig. 5). In all 21 countries together, 105,800 (90,400–119,000) men died from any cause of death as a result of the pandemic compared to 100,000 (82,000–117,500) women. Furthermore, in many countries, the balance of excess deaths changed from male dominated early in the pandemic to being equal (for example, in England and Wales) or female dominated (for example, in Italy, Spain and France) later on.

When considered in terms of relative increase in deaths, male disadvantage was largest in the Netherlands (24% (16–31%) increase in male deaths compared to 15% (7–24%) increase in female deaths) and Switzerland (10% (3–17%) increase in male deaths compared to 5% (−3% to 13%) increase in female deaths). In contrast, in Belgium (25% (16–34%) increase in male deaths compared to 29% (18–40%) increase in female deaths) and Spain (37% (29–45%) increase in male deaths compared to 39% (29–50%) increase in female deaths), there was a slight female disadvantage in total mortality effects. A male disadvantage in pandemic-related excess deaths was more pronounced before 65 years of age, whereas, in older ages, the relative effects were similar in men and women (Fig. 5). For example, the pandemic led to an estimated 19% (9–29%) increase in deaths in males younger than 65 years compared to 2% (−7% to 10%) in females of the same age in Sweden; 15% (8–22%) and 9% (3–15%) in Italy; and 11% (5–17%) and −2% (−8% to 4%) in the Netherlands.

In absolute terms, the total mortality toll of the pandemic was overwhelmingly in those aged 65 years and older, who experienced 94% of all excess deaths. In relative terms, older people were also affected more, with mortality in these ages being ~40% higher than it would have been in the absence of the pandemic in Spain and England and Wales and ~30% higher in Belgium, Scotland and Italy. The largest effect on those younger than 65 years was in England and Wales (26% (20–32%) for males and 22% (17–28%) for females), followed by Scotland, Spain, Sweden and Italy. Among those younger than 65 years, there might have been a slight decline in deaths as a result of the pandemic in men and women in New Zealand and in men in Denmark and Slovakia, with posterior probabilities above 90% that the observed declines were true declines. In these ages, injuries are an important cause of death, especially for men. For example, in men younger than 65 years in New Zealand, Denmark and Slovakia, injuries account for 22%, 11% and 15% of all deaths, respectively4.

## Discussion

With our consistent and comparable analysis, we identified four groups of countries in terms of the overall death toll of the first wave of the COVID-19 pandemic. The first group comprises countries that have avoided a detectable rise (with a posterior probability of at least 90%) in all-cause mortality and includes Bulgaria, New Zealand, Slovakia, Australia, Czechia, Hungary, Poland, Norway, Denmark and Finland. The second and third groups of countries experienced a low-to-medium effect of the pandemic on overall deaths and include Austria, Switzerland and Portugal (low effect) and France, the Netherlands and Sweden (medium effect). The fourth group of countries, which experienced the highest mortality toll, consists of Belgium, Italy, Scotland, Spain and England and Wales.

The main strength of our study is the development and application of a method to systematically and consistently use time series data from 2010 to early 2020 to estimate how many deaths would be expected in the absence of the pandemic. The models incorporated important features of mortality, including seasonality of death rates, how mortality in one week might depend on previous week(s) and the seasonally variable role of temperature. This methodology not only allows more robust estimation of the total effects of the pandemic but also enables comparisons of excess deaths across countries on a real-time basis. The use of a modeling framework, as we have done, allowed us to make estimates by age group and sex, which, because of smaller numbers of deaths, might not be possible (or at least stable) otherwise. By modeling death rates rather than simply the number of deaths, as is done in most other analyses, we account for changes in population size and age structure. We used an ensemble of models that typically leads to more robust projections and better accounts for both the uncertainty associated with each individual model and model choice.

A limitation of our study is that we did not have data on underlying cause of death. Having a breakdown of deaths by underlying cause will help develop cause-specific models and understand which causes have exceeded or fallen below the levels expected. We also could not access age-specific and/or sex-specific data for several other countries, nor did we have data on total mortality by socio-demographic status to understand inequalities in the effects of the pandemic beyond deaths assigned to COVID-19 as the underlying cause of death. Releasing these data will allow more granular analysis of the effects of the pandemic, which can, in turn, inform resource allocation and a more targeted approach to mitigating both the direct and indirect effects of the COVID-19 pandemic. Finally, we are not yet in a position to provide an overall unified explanation for the observed quantitative differences among countries, if such a task is ever possible25. Rather, the reasons are likely to lie in complex interactions of the social, economic, environmental and health system features of each country and specific events and responses that promote or suppress transmission. We discuss some of these below together with lessons for subsequent waves of the pandemic.

The total death toll for the first wave of the COVID-19 pandemic in a country is affected by three key groups of determinants and the social and political factors that shape them26: the baseline characteristics of the population and communities they live in; the response policies that affect mortality positively by interrupting transmission and negatively by isolation and denial of essential services; and the preparedness, resilience and agility of the public health and health and social care systems. Information on some relevant characteristics is presented in Extended Data Table 4.

The first group of determinants comprises characteristics of individuals and communities that make them vulnerable or resilient to the spread and adverse health consequences of infection and those of the restrictions. These include baseline demography and health; social networks and inequalities; employment status and occupation; and environmental features, such as transport and housing. The risk of death from COVID-19 increases with age, with social and material deprivation and in the presence of long-term conditions such as obesity, diabetes and vascular and kidney diseases. Most countries in our analysis have an aging population, and none stands out as particularly older or younger than the others. For example, the share of the population that is older than 65 years ranges from 16% in Australia and New Zealand to 23% in Italy, but this share was only weakly correlated with excess mortality (correlation coefficient, 0.25) (Extended Data Table 4). Obesity and associated morbidities are higher in the United Kingdom, which experienced one of the largest effects, than in other European countries in our analysis27,28,29. But New Zealand and Australia, which had no detectable excess deaths, have an even higher prevalence of obesity than the United Kingdom, whereas Belgium, Italy and Spain, which have lower prevalence, also experienced large effects. Similarly, although reported multi-morbidity varies across Europe30, it is not correlated with excess mortality: Sweden and Denmark, which had different magnitudes of excess deaths, have low levels of multi-morbidity; Hungary, Spain and Italy, which also span the entire range of excess mortality, have some of the highest. Finally, although the United Kingdom has higher relative poverty than countries such as Norway, Denmark and Finland31, excess deaths were higher in Sweden (similar relative poverty to Denmark and Finland) than in New Zealand (similar relative poverty to the United Kingdom).
These findings suggest that these contextual factors, although important, are individually insufficient to lead to the massive cross-country variation in mortality observed here. Other important population characteristics lack consistent data across countries and, hence, remain unexplored. For example, in some countries, regional outbreaks have started among low-wage workers in poor working conditions, such as garment factories and food processing plants. The role of overcrowded social housing complexes and public transportation (and, more generally, frequency, routes and means of mobility) in the extent and geographical distribution of transmission is also unknown32.

The second determinant of the mortality toll of the pandemic is the policy and public health response, which has varied vastly across countries in timing, character and extent33. The timing of the lockdown in relation to when initial infections occurred34 affects the peak number of people who are infected, which drives both the number of deaths from COVID-19 and the pressure on the healthcare system that displaces routine care for other diseases. The stringency of the lockdown, together with the extent and effectiveness of testing, contact tracing and isolation, determines how long it takes for the number of cases to return to low levels and can therefore account for some of the variations in the intensity and duration of excess deaths observed here (Extended Data Table 4). Among the countries analyzed here, Bulgaria, New Zealand, Slovakia, Czechia, Hungary, Norway and Finland acted early in terms of putting in place various movement restrictions or lockdowns33,35 and kept the number of cases to such low levels that they could identify and isolate cases and their contacts through their existing public health systems. Austria and Denmark experienced an early rise in the number of cases but enacted lockdowns soon after and used effective testing, contact tracing and isolation to contain the epidemic and its mortality effect. At the other extreme, Italy, which was the initial European epicenter of the pandemic, Spain, the Netherlands, France and the United Kingdom put lockdown measures in place only after the number of cases and deaths had risen to such levels that the epidemic continued for weeks. For example, the United Kingdom, Spain, Italy, France and the Netherlands introduced lockdowns after a larger number of cases had been detected and after a longer period since the first few COVID-19 deaths occurred than did New Zealand and other countries in Europe, such as Denmark (Extended Data Table 4)33,34,36.
Sweden, the only country that did not put in place a mandatory lockdown and used only voluntary social distancing measures, had one of the longest durations of excess mortality. Extensive (and, at the extreme, universal) testing and effective contact tracing and isolation of cases and their contacts can also minimize transmission even without a lockdown37. Countries also varied in how extensively they conducted community testing, contact tracing and isolation of cases and their contacts at each stage of the pandemic, with Austria, Denmark, Finland, New Zealand and Norway introducing effective systems and Belgium, Spain, France and the United Kingdom being more limited in community testing and/or contact tracing for many months33, with some, like the United Kingdom, Spain and France, still not having a system that is able to respond to the dynamic geographical, demographic and social nature of the epidemic38,39,40.

Third, the preparedness and resilience of the public health infrastructure not only influence how well the spread of infection is controlled but also influence the choice of policy, as decision-makers assess what they think is possible with existing capacity41. Denmark and Austria (as well as Germany, for which data were not available for our analysis) were able to scale up testing rapidly because they had extensive and well-coordinated laboratory networks and public health infrastructure in place. Some central European countries had existing contact tracing infrastructure, a legacy of their more recent experience with infectious diseases such as tuberculosis. Others had more limited capacity but were able to scale it up rapidly based on the existing public health structures, such as New Zealand’s contact tracing system. In contrast, the United Kingdom and Spain had limited testing capacity (or ability to use capacity in non-governmental labs) and limited contact tracing systems early in the pandemic. As noted above, their testing, contact tracing and ability to persuade and support people to isolate when necessary are still not effective42,38,39,40. Countries also varied substantially in terms of how their healthcare system continued to provide life-saving services: those countries that had less capacity and were less able to rapidly enhance capacity, partly related to uneven health and social care spending, responded less effectively to healthcare needs. Notably, per capita spending is lower in the United Kingdom, Italy and Spain than in Austria, Norway, Sweden and Denmark43. One effect of financing variation is on the number of hospital beds, which, on a per capita basis in Austria, is nearly three times that of the United Kingdom44.
Where hospital beds are more limited (for example, in the United Kingdom, Spain and Hungary45), concerns about breaching capacity might have led to delaying admission of patients with COVID-19 and other patients until their health deteriorated and to early discharge of patients to long-term care facilities (care homes), often without systematic testing. The spread of infection within and between hospitals and care homes, and between them and the community, is itself an important determinant of infections and deaths in both the vulnerable groups and the general population46,47. Where infection rates were high and care homes were not appropriately safeguarded (namely in Spain, the United Kingdom, Belgium, Italy, France and Sweden), a large number of care home residents died from confirmed or probable COVID-1946. The initial seeding through discharge of infected patients to care homes was compounded by lack of testing and protective equipment for staff and residents and, especially in privately run care homes, regular movement of (temporary) staff across facilities48. Finally, some of the variations in excess deaths might be due to variation in community-based and primary care that affected preventive and pre-hospital care for patients with COVID-19 as well as for patients with other conditions.

Although our results demonstrate that countries with timely lockdowns had smaller numbers of excess deaths in the first wave of the epidemic, lockdowns have adverse short- and long-term health, psychosocial and economic effects. They might become needed, as a mechanism of last resort, as the number of cases increases, but they also require effective surveillance and agile operation, with sufficient geographical granularity to limit restrictions to as small an area as possible. Lockdowns, especially nationwide ones, can be avoided or be less stringent if countries can put in place comprehensive (and, in the extreme, universal) and effective testing and contact tracing systems; provide information to individuals and local public health bodies in a timely manner; create a sense of trust and responsibility; and put in place economic and social support that helps to increase participation in testing, contact tracing and adherence to isolation advice. In addition to controlling transmission, there is a need for integrated care pathways at the community and facility level that both manage milder COVID-19 cases and allow other acute and chronic conditions to be rapidly and appropriately triaged and cared for in community facilities as well as in health and long-term care facilities. For some countries, this might involve a re-allocation and re-direction of care resources and, for others, where there has been chronic underinvestment in health and social care, the more challenging task of rebuilding public health and health and social care systems that serve their entire population41.

## Methods

### Data sources

We included industrialized countries in our analysis if:

• We could access weekly data on all-cause mortality divided by age group and sex that extended through May 2020. We selected late-May 2020 to have a consistent period of analysis for all countries and because our results showed that, by this date, the probability that deaths were above the level that would be expected had the pandemic not occurred was within the 90% credible interval in the great majority of countries.

• The time series of data went back at least to 2015 so that model parameters could be reliably estimated. For countries with longer time series, we used data starting in 2010.

• Their total population in 2020 was more than 4 million. We excluded countries with data but with smaller populations (Estonia, Iceland, Latvia, Liechtenstein, Lithuania, Luxembourg and Montenegro) because, in many weeks, the number of deaths would be small or zero, especially for people younger than 65 years. This would, in turn, lead to either large uncertainty that would make it hard to differentiate between those places with and without an effect or unstable estimates because the model is fitted to many weeks with zero deaths.

The sources of population and mortality data are provided in Extended Data Table 2. We calculated weekly population through interpolation of yearly population, consistent with the approach taken by national statistical offices for intra-annual population calculation49. Population for 2020 was obtained through linear extrapolation from the last 5 years. We obtained data on temperature from ERA550, which uses data from global in situ and satellite measurements to generate a worldwide meteorological data set, with full space and time coverage over our analysis period. We used gridded temperature estimates measured four times daily at a resolution of 30 km to generate weekly temperatures for each first-level administrative region and gridded population data (https://sedac.ciesin.columbia.edu/data/collection/gpw-v4) to generate population estimates by first-level administrative region in each country. We weighted weekly temperature by population of each first-level administrative region to create national-level weekly temperature summaries.
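
The population and temperature preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions, not the processing pipeline used in the study: the extrapolation is taken to be an ordinary least-squares line through the last 5 yearly values, the interpolation is taken to run linearly between two consecutive yearly counts, and all function names and inputs are hypothetical.

```python
def extrapolate_linear(years, values, target_year):
    """Least-squares straight line through (year, value) pairs, evaluated at target_year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / sum(
        (x - mean_x) ** 2 for x in years
    )
    return mean_y + slope * (target_year - mean_x)

def interpolate_weekly_population(pop_start, pop_end, n_weeks=52):
    """Linearly interpolate between two yearly population counts, one value per week."""
    step = (pop_end - pop_start) / n_weeks
    return [pop_start + step * week for week in range(n_weeks)]

def population_weighted_temperature(region_temps, region_pops):
    """National weekly temperature as the population-weighted mean of regional values."""
    total_pop = sum(region_pops.values())
    return sum(region_temps[r] * region_pops[r] for r in region_pops) / total_pop

# Hypothetical inputs for illustration only.
pop_2020 = extrapolate_linear([2015, 2016, 2017, 2018, 2019], [100, 102, 104, 106, 108], 2020)
weekly_pop = interpolate_weekly_population(100, 152)
national_temp = population_weighted_temperature(
    {"north": 10.0, "south": 20.0}, {"north": 1, "south": 3}
)
print(pop_2020, weekly_pop[:3], national_temp)
```

Weighting by regional population means that the national temperature summary reflects the conditions experienced by most residents rather than the average over land area.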

### Statistical methods

The total mortality effect of the COVID-19 pandemic is the difference between the observed number of deaths from all causes and the number of deaths had the pandemic not occurred, which is not directly measurable. The most common approach to calculating the number of deaths had the pandemic not occurred has been to use the average number of deaths over previous years—for example, the most recent 5 years—for the corresponding week or month when the comparison is made51. This approach, however, does not take into account changes in population size and age structure, nor long- and short-term trends in mortality, which are particularly pronounced for some age groups52,53. Nor does this approach account for time-varying factors, such as temperature, that are largely external to the pandemic but also affect death rates.
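
A small numerical sketch shows why averaging death counts from previous years can mislead when the population is growing. The figures are hypothetical and chosen so that the death rate is constant. The rate-based comparison is included only to isolate the population effect; it is not the model used in this study and still ignores trends and temperature.

```python
def count_based_excess(observed, previous_deaths):
    """Observed deaths minus the mean count for the same week in previous years."""
    baseline = sum(previous_deaths) / len(previous_deaths)
    return observed - baseline

def rate_based_excess(observed, population, previous_deaths, previous_populations):
    """Observed deaths minus the mean previous death rate applied to current population."""
    rates = [d / p for d, p in zip(previous_deaths, previous_populations)]
    mean_rate = sum(rates) / len(rates)
    return observed - mean_rate * population

# Hypothetical week: constant rate of 10 deaths per 100,000, growing population.
previous_deaths = [1000, 1010, 1020, 1030, 1040]
previous_populations = [10_000_000, 10_100_000, 10_200_000, 10_300_000, 10_400_000]

print(count_based_excess(1050, previous_deaths))
print(rate_based_excess(1050, 10_500_000, previous_deaths, previous_populations))
```

In this example, the count-based average reports 30 apparent excess deaths even though the death rate has not changed, whereas the rate-based comparison reports approximately none.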

We developed an ensemble of 16 Bayesian mortality projection models that each make an estimate of weekly death rates that would have been expected if the COVID-19 pandemic had not occurred. We used multiple models because there is inherent uncertainty in the choice of model that best predicts death rates in the absence of pandemic. These models were formulated to incorporate features of weekly death rates as follows:

• First, death rates might have a medium-term to long-term trend that affects mortality in 2020 compared to earlier years. We developed two sets of models, one with no trend and one with a linear trend term over weekly deaths.

• Second, death rates have a seasonal pattern that varies by age group and sex54,55,56,57. We included weekly random intercepts for each week of the year. To account for the fact that seasonal patterns ‘repeat’ (that is, late December and early January are seasonally similar), we used a seasonal structure58,59 for the random intercepts. The seasonal structure allows the magnitude of the random intercepts to vary over time and implicitly incorporates time-varying factors, such as annual fluctuations in flu season.

• Third, death rates in each week might be related to rates in preceding week(s) due to short-term phenomena, such as severity of the flu season. We formulated four sets of models to account for this relationship. The weekly random intercepts in these models had a first-, second-, fourth- or eighth-order autoregressive structure58,59. The higher-order autoregressive models allow death rates in any given week to be informed by those in a progressively larger number of preceding weeks. Furthermore, trends not picked up by the linear or seasonal terms would be captured by these autoregressive terms.

• Fourth, beyond having a seasonal pattern, death rates depend on temperature and, specifically, on whether temperature is higher or lower than its long-term norm during a particular time of year60,61,62,63,64,65. The effect of temperature on mortality varies throughout the year and might be in opposite directions for different times of the year. We used two sets of models, one without temperature and one with a weekly term for temperature anomaly, defined as deviation of weekly temperature from the local average weekly temperature over the entire analysis period. The coefficients of temperature anomalies were specified as a random effect with a random walk prior of order one, so that temperature effects are more similar in adjacent weeks. The random effect had a circular structure so that late December and early January are treated as adjacent.

• Death rates might be different around major holidays, such as Christmas and New Year. We included effects (as fixed intercepts) for the week containing Christmas and New Year in all countries. For England and Wales and Scotland, we also included effects for the weeks containing other public holidays, because reported death rates in weeks that contain a holiday were different from other weeks. This term was tested but not included for other countries because the effect was negligible.

• We also tested, but did not include, terms for the weeks that coincided with a change to and from daylight saving time because the effect was negligible.
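As an illustration of the temperature anomaly used in the fourth feature above, the sketch below subtracts from each weekly temperature the average for the same week of the year over the whole series. Reading "local average weekly temperature" as a week-of-year average is an interpretation, and the function name and inputs are hypothetical.

```python
import numpy as np

def temperature_anomaly(weekly_temp, week_of_year):
    """Deviation of each week's temperature from the long-run mean
    for the same week of the year (assumed definition of the anomaly)."""
    weekly_temp = np.asarray(weekly_temp, dtype=float)
    week_of_year = np.asarray(week_of_year)
    anomaly = np.empty_like(weekly_temp)
    for w in np.unique(week_of_year):
        mask = week_of_year == w
        # Subtract the mean over all years for this week of the year.
        anomaly[mask] = weekly_temp[mask] - weekly_temp[mask].mean()
    return anomaly
```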

These choices led to an ensemble of 16 Bayesian models (2 trend options × 4 autoregressive options × 2 temperature options). The ensemble of models is shown in Extended Data Table 5. In each model, the number of weekly deaths follows a Poisson distribution:

$${\mathrm{deaths}}_{{\mathrm{week}}} \sim {\mathrm{Poisson}}\left( {{\mathrm{death}}\,{\mathrm{rate}}_{{\mathrm{week}}} \cdot {\mathrm{population}}_{{\mathrm{week}}}} \right).$$

Log-transformed death rates were modeled as a sum of components described above:

$$\begin{array}{l}\log \left( {{\mathrm{death}}\,{\mathrm{rate}}_{{\mathrm{week}}}} \right) = \alpha _0 + \alpha _{{\mathrm{holiday}}\left( {{\mathrm{week}}} \right)} + \beta \cdot {\mathrm{week}} + \zeta _{{\mathrm{week}}}^{\left( i \right)} + \theta _{{\mathrm{week}}}\\ + \left( {\gamma + {\upnu}_{{\mathrm{week}}\,{\mathrm{of}}\,{\mathrm{year}}}} \right) \cdot {\mathrm{temperature}}\,{\mathrm{anomaly}}_{{\mathrm{week}}} + \varepsilon _{{\mathrm{week}}}\end{array}$$

The term α0 denotes the overall intercept, and $${\alpha}_{{\mathrm{holiday}}\left( {{\mathrm{week}}} \right)}$$ is the holiday intercept, applied to weeks with a holiday. For example, if a week includes the 25th of December, then $$\alpha _{{\mathrm{holiday}}\left( {{\mathrm{week}}} \right)} = \alpha _{{\mathrm{Christmas}}}$$. For weeks that did not contain a holiday, this term did not appear in the above expression. All intercepts were assigned $${\cal{N}}\left( {0,1000} \right)$$ priors. The term β·week represents the linear time trend. The coefficient β was also assigned a $${\cal{N}}\left( {0,1000} \right)$$ prior. As described above, this term appeared in half of our models, whereas, in the other half, trends over time were captured by the remaining terms.

The models used different orders (first, second, fourth or eighth) of the autoregressive term $$\zeta _{{\mathrm{week}}}^{\left( i \right)}$$ with the superscript i denoting the order. The first-order autoregressive term is defined as $$\zeta _{{\mathrm{week}}}^{\left( 1 \right)} \sim {\cal{N}}\left( {\varphi \cdot \zeta _{{\mathrm{week}} - 1}^{\left( 1 \right)},\sigma _\zeta ^2} \right)$$ where the parameter φ lies between −1 and 1 and captures the degree of association between the number of deaths in each week and the preceding week. Hyperpriors were placed on the parameters $$\varkappa _1 = \log \left( {\left( {1 - \varphi ^2} \right)/\sigma _\zeta ^2} \right)$$ and $$\varkappa _2 = \log \left( {\left( {1 + \varphi } \right)/\left( {1 - \varphi } \right)} \right)$$, which were assigned logGamma(0.001, 0.001) and $${\cal{N}}\left( {0,1} \right)$$ distributions, respectively. Similarly, an ith-order autoregressive term is given by $$\zeta _{{\mathrm{week}}}^{\left( i \right)} = \varphi _1 \cdot \zeta _{{\mathrm{week}} - 1}^{\left( i \right)} + \cdots + \varphi _i \cdot \zeta _{{\mathrm{week}} - i}^{\left( i \right)} + \epsilon _{{\mathrm{week}}}$$ with $$- 1 < \phi _j < 1$$. The parametrization of these models was based on the partial auto-correlation function of the sequence ϕj66.
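To make the first-order reparametrization concrete, the sketch below maps between (φ, σ²) and the two transformed parameters on which the hyperpriors are placed, using only the definitions given above. The function names are hypothetical, and the inverse uses the identity (e^κ − 1)/(e^κ + 1) = tanh(κ/2).

```python
import math

def ar1_to_internal(phi, sigma2):
    """Map (phi, innovation variance) to the transformed parameters (kappa1, kappa2)."""
    kappa1 = math.log((1 - phi ** 2) / sigma2)   # log of the marginal precision
    kappa2 = math.log((1 + phi) / (1 - phi))     # maps (-1, 1) to the real line
    return kappa1, kappa2

def internal_to_ar1(kappa1, kappa2):
    """Invert the transformation above."""
    phi = math.tanh(kappa2 / 2)
    sigma2 = (1 - phi ** 2) / math.exp(kappa1)
    return phi, sigma2
```

Because the second transformed parameter ranges over the whole real line, a Gaussian prior on it keeps φ strictly between −1 and 1.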

The term θweek captures seasonality in mortality trends with a period of 52 weeks. The sums of every 52 consecutive terms $$\theta _{{\mathrm{week}}} + \theta _{{\mathrm{week}} + 1} + \cdots + \theta _{{\mathrm{week}} + 51}$$ were modeled as independent Gaussian with zero mean and variance $$\sigma _\theta ^2$$. We used a $${\mathrm{logGamma}}\left( {0.001,\,0.001} \right)$$ prior on the log-precision $$\text{log}\left( {1/\sigma _\theta ^2} \right)$$. Each week is assigned an index between 1 and 52 depending on which week of the current year it is (the incomplete week 53 is mapped to either index 1 or 52 depending on whether it has greater overlap with week 52 of the current year or week 1 of the next year).

The effect of temperature anomaly on death rates is captured by the two terms γ and $${\upnu}_{{\mathrm{week}}\,{\mathrm{of}}\,{\mathrm{year}}}$$. The term $${\upgamma} \cdot {\mathrm{temperature}}\,{\mathrm{anomaly}}_{{\mathrm{week}}}$$ is the overall association of temperature anomaly in a week. The term $${\upnu}_{{\mathrm{week}}\,{\mathrm{of}}\,{\mathrm{year}}} \cdot {\mathrm{temperature}}\,{\mathrm{anomaly}}_{{\mathrm{week}}}$$ captures deviations from the overall association for each week of the year. It has a circular first-order random walk with 52 terms so that temperature associations change smoothly throughout the year and so that they are similar in late December and early January65. The first-order random walk prior is defined via $$\nu _{{\mathrm{week}}\,{\mathrm{of}}\,{\mathrm{year}}} \sim {\cal{N}}\left( {\nu _{{\mathrm{week}}\,{\mathrm{of}}\,{\mathrm{year}} - 1},\sigma _{\upnu}^2} \right)$$, and the prior assigned to the log-precision is $$\log \left( {1/\sigma _{\upnu}^2} \right) \sim {\mathrm{logGamma}}\left( {0.001,0.001} \right)$$.
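One way to picture the circular structure is through the structure matrix of a cyclic first-order random walk, in which week 52 and week 1 are neighbours. The sketch below builds that matrix with NumPy. It is an illustration of the general technique under that assumption, not the internal representation used by the fitting software, and the function name is hypothetical.

```python
import numpy as np

def circular_rw1_structure(n=52):
    """Structure matrix Q of a cyclic first-order random walk with n nodes.

    For a vector x, x @ Q @ x equals the sum of squared differences between
    cyclic neighbours, so the prior density is proportional to
    exp(-x @ Q @ x / (2 * sigma2)).
    """
    Q = 2.0 * np.eye(n)
    idx = np.arange(n)
    Q[idx, (idx + 1) % n] = -1.0   # next week, wrapping 52 -> 1
    Q[idx, (idx - 1) % n] = -1.0   # previous week, wrapping 1 -> 52
    return Q
```

Each row of the matrix sums to zero, which reflects that the prior penalizes only differences between adjacent weeks and not the overall level; the overall level is carried by γ.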

Finally, the term εweek is a zero-mean term that accounts for additional variability. It is assigned an independent and identically distributed prior $$\varepsilon _{{\mathrm{week}}} \sim {\cal{N}}\left( {0,\sigma _{\upvarepsilon}^2} \right)$$, and a logGamma(0.001, 0.001) prior is placed on the log-precision $$\log \left( {1/\sigma _\varepsilon ^2} \right)$$.
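Taken together, the terms defined above combine additively on the log scale. The sketch below evaluates the expected number of deaths for given component values. The function and argument names are hypothetical, and the inputs would in practice be posterior draws of each component rather than fixed numbers.

```python
import numpy as np

def expected_deaths(population, alpha0, holiday=0.0, trend=0.0, zeta=0.0,
                    theta=0.0, gamma=0.0, nu=0.0, anomaly=0.0, eps=0.0):
    """Expected weekly deaths implied by the log-linear model.

    Components that a given model omits (trend, temperature) are left at zero.
    All arguments may be scalars or arrays of matching shape.
    """
    log_rate = (alpha0 + holiday + trend + zeta + theta
                + (gamma + nu) * anomaly + eps)
    return np.exp(log_rate) * population
```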

The components $$\alpha _0,\alpha _{{\mathrm{holiday}}\left( {{\mathrm{week}}} \right)}$$, θweek, εweek and $$\zeta _{{\mathrm{week}}}^{\left( i \right)}$$ (for each autoregressive order of i = 1, 2, 4 or 8) appear in the expression for $$\log \left( {{\mathrm{death}}\,{\mathrm{rate}}_{{\mathrm{week}}}} \right)$$ in all models. The remaining components appear in some models only. Extended Data Table 5 shows the terms included in each of the 16 models in the ensemble.
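The 16 combinations can be enumerated directly from the three modeling choices. The sketch below does so; the dictionary keys are hypothetical labels, not names from the analysis code.

```python
from itertools import product

TREND_OPTIONS = (False, True)        # no trend or linear trend
AR_ORDERS = (1, 2, 4, 8)             # order of the autoregressive term
TEMPERATURE_OPTIONS = (False, True)  # without or with temperature anomaly

def ensemble_specifications():
    """List every combination of trend, autoregressive order and temperature."""
    return [
        {"linear_trend": trend, "ar_order": order, "temperature": temp}
        for trend, order, temp in product(TREND_OPTIONS, AR_ORDERS, TEMPERATURE_OPTIONS)
    ]
```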

We used data on weekly deaths from the start of the time series of data through mid-February 2020 to estimate the parameters of each model, which were then used to predict death rates for the subsequent 15 weeks as estimates of the counterfactual death rates (that is, if the pandemic had not occurred). For the projection period, we used recorded temperature so that our projections take into consideration actual temperature in 2020. This choice of training and prediction periods assumes that the number of deaths that are directly or indirectly related to the COVID-19 pandemic was negligible through mid-February 2020 in these countries, but it allows for effects to have appeared in subsequent weeks.

We tested the sensitivity of the results to the choice of prior through the use of penalized complexity priors and found that the results were similar. All models were fitted using integrated nested Laplace approximation (INLA)67, implemented in the R-INLA software (version 20.03). We used a model-averaging approach to combine the predictions from the 16 models in the ensemble68,69. Specifically, we took 1,000 draws from the posterior distribution of sex- and age-specific deaths under each of the 16 models and pooled the 16,000 draws to obtain the posterior distribution of sex- and age-specific deaths if the COVID-19 pandemic had not occurred. This approach generates a distribution of estimates that has equal samples from that of each model in the ensemble and, hence, incorporates both the uncertainty of estimates from each model and the uncertainty in the choice of model. The reported credible intervals represent the 2.5th and 97.5th percentiles of the resultant posterior distribution of the draws from the entire ensemble. We also report the posterior probability that an estimated increase in deaths corresponds to a true increase (or decrease), which is described in the main paper. We also evaluated the sensitivity of our results to how the different models are weighted. Specifically, in the sensitivity analysis, the number of draws from each model was inversely proportional to the absolute error of prediction in the validation analyses described below. The results of the sensitivity analysis were virtually identical to those with equal draws, with median excess deaths estimates differing by 1.6% on average and by 0.5% when summed across all countries.
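The pooling step can be sketched as follows, assuming that each model's posterior draws of expected deaths are already available as arrays. The function names are hypothetical, and the summary is shown for a single observed total.

```python
import numpy as np

def pool_draws(draws_by_model):
    """Concatenate equal numbers of posterior draws from each model."""
    sizes = {len(d) for d in draws_by_model}
    if len(sizes) != 1:
        raise ValueError("each model must contribute the same number of draws")
    return np.concatenate([np.asarray(d, dtype=float) for d in draws_by_model])

def summarize_excess(observed_deaths, expected_draws):
    """Median, 95% credible interval and posterior probability of an increase."""
    excess = observed_deaths - np.asarray(expected_draws, dtype=float)
    lower, median, upper = np.percentile(excess, [2.5, 50, 97.5])
    return {
        "median": float(median),
        "lower": float(lower),
        "upper": float(upper),
        "prob_increase": float(np.mean(excess > 0)),
    }
```

With 1,000 draws from each of 16 models, `pool_draws` returns 16,000 values, and every model carries equal weight in the resulting intervals.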

We did all analyses separately by sex and age group (0–64 years and 65+ years) because death rates, and how they are affected by the pandemic, vary by age group and sex. To obtain estimates of excess deaths across age groups and both sexes, we summed draws from age- and sex-specific estimates.
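Summing draws, rather than summing point estimates and interval bounds, carries the uncertainty of each group through to the total. A minimal sketch follows, with hypothetical stratum labels and the assumption that every stratum supplies the same number of draws.

```python
import numpy as np

def combine_strata(draws_by_stratum):
    """Sum posterior draws across age-sex strata, draw by draw.

    draws_by_stratum maps a stratum label to an array of draws.
    All arrays must have the same length.
    """
    stacked = np.stack([np.asarray(d, dtype=float) for d in draws_by_stratum.values()])
    return stacked.sum(axis=0)
```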

### Validation of no-pandemic counterfactual weekly deaths

We tested how well our model ensemble estimates the number of deaths expected had the pandemic not occurred by withholding data for 15 weeks starting from mid-February (that is, the same projection period as used for 2020) for an earlier year and using the preceding time series of data to train the models. In other words, we created a situation akin to 2020 for an earlier year. We then projected death rates for the weeks with withheld data and evaluated how well the model ensemble projections reproduced the known-but-withheld death rates. We repeated this for three different years: 2017 (that is, the models were trained using data from January 2010 to mid-February 2017 and tested for the subsequent 15 weeks); 2018 (that is, trained using data from January 2010 to mid-February 2018 and tested for the subsequent 15 weeks); and 2019 (that is, trained using data from January 2010 to mid-February 2019 and tested for the subsequent 15 weeks). We performed these tests for all sexes and age groups used in the analysis. We report the projection error (which measures systematic bias) and the absolute forecast error (which measures any deviation from the withheld data). Additionally, we report coverage of the projection uncertainty; if projected death rates and their uncertainties are well estimated, the estimated 95% credible intervals should cover 95% of the withheld data.
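The three validation measures can be sketched as below. Several details are assumptions made for illustration: errors are expressed relative to the withheld values, the posterior median serves as the point projection, and the sign convention is projected minus observed.

```python
import numpy as np

def validation_metrics(observed, projected_draws):
    """Projection error, absolute error and 95% coverage for withheld weeks.

    observed has shape (n_weeks,); projected_draws has shape (n_draws, n_weeks).
    """
    observed = np.asarray(observed, dtype=float)
    draws = np.asarray(projected_draws, dtype=float)
    median = np.median(draws, axis=0)
    rel_error = (median - observed) / observed          # signed, so biases show up
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    covered = (observed >= lower) & (observed <= upper)
    return {
        "projection_error": float(rel_error.mean()),
        "absolute_error": float(np.abs(rel_error).mean()),
        "coverage": float(covered.mean()),
    }
```

Signed errors can cancel across weeks, so a projection error near zero indicates a lack of systematic bias, whereas the absolute error measures the typical size of the deviation.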

The results of model validation (Extended Data Table 3) show that the estimates of how many deaths would be expected had the pandemic not occurred from the Bayesian model ensemble were unbiased, with mean projection errors of 1% (between −3% and 6% in different age groups, sexes and years). The mean absolute error was between 4% and 9% in different age groups, sexes and years. Ninety-five percent coverage, which measures how well the posterior distributions of projected deaths coincide with withheld data, was 95% on average, which shows that the posterior distribution is well estimated.

### Comparison with other estimates

The Financial Times, The Economist and The New York Times have reported the number of weekly deaths for some of the same countries as we have and compared them with either averages of the past 5 years or projections based on a linear model with a seasonal term. These comparisons have been for both sexes combined and, in most cases, for all ages combined and have not accounted for the role of temperature. Countries with small, medium and large numbers of excess deaths are consistent between our analysis and these reports. There are, nonetheless, some differences. For example, we estimated a small number of excess deaths, with low posterior probabilities, for Denmark and Norway, whereas these sources reported a decline in deaths. We also estimated a slightly larger number of excess deaths for Portugal, Italy and Sweden than some of these sources. EuroMOMO fits a sinusoidal seasonal model to death counts but does not report country-specific excess deaths and, hence, could not be compared with our results. The United Kingdom Office for National Statistics (ONS) calculated several age-standardized measures of excess mortality for January to June 2020, for both sexes combined, for European countries70. The analysis did not account for temperature or holidays. Because the analysis began in January, it also covered the period before the pandemic had become widespread in Europe. The overall grouping of countries into small, medium and large effects was mostly similar to ours, but the ONS found a better performance (that is, lower excess mortality relative to other countries included) for France than we did. They also estimated a decline in mortality in Portugal and Switzerland, which contrasts with an increase in our analysis. Differences between our results and those of the ONS might be partly related to the fact that the ONS analysis also included the pre-pandemic months of 2020 and did not account for inter-annual variations in temperature. Most weeks during the period of January to March were warmer in 2020 than the average of the past 10 years.

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.