A framework for reconstructing SARS-CoV-2 transmission dynamics using excess mortality data

The transmission dynamics and burden of SARS-CoV-2 in many regions of the world is still largely unknown due to the scarcity of epidemiological analyses and lack of testing to assess the prevalence of disease. In this work, we develop a quantitative framework based on excess mortality data to reconstruct SARS-CoV-2 transmission dynamics and assess the level of underreporting in infections and deaths. Using weekly all-cause mortality data from Iran, we are able to show a strong agreement between our attack rate estimates and seroprevalence measurements in each province and find significant heterogeneity in the level of exposure across the country with 11 provinces reaching near 100% attack rates. Despite having a young population, our analysis reveals that incorporating limited access to medical services in our model, coupled with undercounting of COVID-19-related deaths, leads to estimates of infection fatality rate in most provinces of Iran that are comparable to high-income countries.

A ssessing the true burden of COVID-19 requires a comprehensive surveillance system to monitor outbreaks and a large capacity for diagnostic testing. However, this is particularly challenging for many low-and middle-income countries due to limited diagnostic capacities and access to healthcare [1][2][3][4][5][6] . An alternative approach to measuring the burden of COVID-19 is to monitor changes in all-cause mortality trends with respect to baseline levels based on historic trends (i.e., measuring excess mortality). While excess mortality during the current pandemic may not exactly match the true number of COVID-19-related deaths due to various reasons such as disruptions in treatment for other fatal diseases and socio-economic disparities in health care in different regions of a country, several works have shown that a significant portion of excess deaths in many countries are directly attributable to COVID-19 1,[7][8][9][10][11] . This enables us to use excess mortality as a proxy for estimating the number of infections and fatalities from COVID-19 and correct for under-reporting of cases and deaths that may arise due to limited testing from suspected cases and uncertainty in the number of fatalities attributable to COVID-19 12,13 .
Iran was among the first countries outside mainland China to report a large outbreak of SARS-CoV-2 in February and March 2020 and also acted as a major source for its spread in several countries in the Middle East and elsewhere 11,[14][15][16][17][18] . It was also one of the first countries to experience a second wave of infection after the relaxation of non-pharmaceutical interventions (NPIs) in July 2020 14,19 . By the end of year 2021, the country experienced three additional waves, two of which were driven by the Alpha and Delta variants of SARS-CoV-2, in May and September 2021. Despite a recent increase in the number of daily administered vaccine doses, Iran had a very slow start on its immunisation programme with only~3% of the population being fully vaccinated by the end of July 2021, during the surge in cases with the Delta variant 20 . Because the Ministry of Health and Medical Education (MoHME) of Iran stopped releasing province-level data on the number of confirmed COVID-19 cases and deaths from 2020-03-22, the transmission dynamics across the country have remained largely unknown. One way to estimate the number of infections and account for under-reporting of cases is by converting deaths attributable to COVID-19 to infections using a population-weighted infection fatality ratio (IFR) estimate (i.e., infections = deaths/IFR), taking into account the demographics of the population of interest 21 . However, such estimation of attack rates can be inaccurate or misleading if it does not take into account the impact of limited access to health care in municipalities with low socioeconomic status, uneven adoption of pharmaceutical and non-pharmaceutical intervention in different areas, and the increased chance of re-infection over time. In particular, it is clear from the SARS-CoV-2 epidemics arounds the world during 2020 and 2021 that the demand for intensive care beds and mechanical ventilators can place a severe strain on health systems. This has likely more profound consequences for low-income settings where the availability and quality of healthcare and related resources is typically limited 22 . Neglecting to account for these factors can result in an overestimate of SARS-CoV-2 attack rates and underestimation its potential for continued transmission, with immediate implications for assessments of the potential burden of SARS-CoV-2 variants of concern such as Omicron in a population.
In this work, we use the newly updated province-level agestratified weekly all-cause mortality data from the National Organization for Civil Registration (NOCR) of Iran and develop a mathematical framework to fully reconstruct the Iranian epidemic with the aim to estimate the attack rate, number of daily hospital admissions, deaths, and re-infection rates of SARS-CoV-2 across the country.

Results
Estimating province-level excess mortality in Iran. We obtain age-stratified weekly all-cause mortality data from NOCR and apply a statistical model to calculate the weekly excess mortality for all 31 provinces and across 17 age-groups during the Iranian epidemic (see Methods section). Figure 1a shows 5 distinct peaks in excess mortality with a temporal trend that is strongly associated with the nationwide reported COVID-19 deaths over time. Several provinces show significant levels of excess mortality from the first week of February 2020 suggesting that the Iranian epidemic likely started at least a month before that (Fig. 1b). This is in agreement with previous phylogenetic and epidemiological studies suggesting the epidemic started in late December 2019 to early January 2020 14,23 . Furthermore, during the first two waves of the Iranian epidemic, the excess mortality was roughly 2.5 times higher than reported COVID-19 fatalities. However, this dropped to 2.1 at the later stages suggesting that the testing capacity of the country to record infections and deaths gradually improved over time, but not to the extent to fully capture the majority of COVID-19-related deaths. This is also in agreement with statements from the Iranian health officials suggesting that Iran's COVID-19-related deaths could be twice the official numbers and also previous studies showing elevated levels of under-reporting of COVID-19 fatalities in Iran, particularly in the early stages of the pandemic 14,15 . We also find that the cumulative nationwide excess deaths by 2021-10-22 is 259,641 (95% CI: 230,493-288,790) which is nearly twice the 125,000 reported COVID-19 deaths at the time. Figure 1b shows the intensity of excess mortality as quantified by the number of excess deaths per 100,000 persons in each province over time. During the first wave in spring 2020, only a few provinces showed significant levels of excess mortality with Gilan, Qom, Mazandaran, and Golestan among the hardest-hit provinces. This is likely because these provinces usually attract a large number of tourists and pilgrims around the New Year's holiday in February/March which led to large superspreading events and uncontrolled outbreaks due to limited NPIs before the national lockdown on 2020-03-05 14 . As the epidemic progressed, the timing of the peak excess mortality between different provinces were more simultaneous ( Fig. 1b; Supplementary  Fig. 1). Most notably, during the third wave in October/ November 2020 and the fifth wave in August/September 2021 with the Delta variant of SARS-CoV-2, almost all provinces experienced significant number of fatalities around the same time.
Estimating excess mortality per age-group. We also carry out a similar analysis to estimate the temporal pattern of COVID-19 fatalities for different age-groups (Fig. 1c). We find that the majority of excess deaths are concentrated in the older age categories (>55) and that the third and fifth epidemic waves had the largest impact on younger age groups. Figure 1d shows the per capita ratio of excess mortality per age group with respect to the 40-49 age group. The reason for choosing the 40-49 age group as a baseline is because, unlike younger age groups, excess mortality in this age group is subject to relatively low fluctuations over time due to experiencing more deaths. It also serves as a baseline for measuring the impact of vaccination on lowering excess deaths in older age groups who have received vaccination at an earlier date (see Supplementary Fig. 2). We see that while there is a gradual increase in the ratio of deaths in the older agegroups during the first three waves, there is a two-fold drop in the 60-74 and 75+ age-groups before the start of the fifth wave. This is likely due to a higher vaccination coverage in these age-groups. While the exact number of vaccinated individuals per age-group in Iran is not known, given that the immunisation programme for individuals aged >75 started in May 2021 (first phase priority groups for vaccination in Iran are frontline healthcare workers as well as those aged 80 and above; see 20 for more information), and that Iran fully vaccinated roughly 5 times the population size of individuals aged 75 and above by 2021-09-07, it is likely that the drop in excess mortality in the older age-groups before and during the fifth wave is due to their high vaccination coverage.
Model validation. Assuming that lack of testing is the primary reason behind the discrepancy between excess mortality and official COVID-19 deaths 1,15,24,25 , we attribute the mean excess mortality during the pandemic period to COVID-19. This enables us to reconstruct the SARS-CoV-2 transmission dynamics across the country and infer the population fatality rate (PFR), IFR, and attack rate per province (see Methods section). We first estimate the national and provincial attack rates of SARS-CoV-2 under two different sets of assumptions of the IFR by age relationship and select the better-fitting model by comparing the modelled seroprevalence estimates to data (Supplementary Fig. 3; Supplementary Data 1-6). By tracking the number of people who require a general hospital or ICU bed over time, we were able to account for the impact of bed shortages on increasing IFR for COVID-19 patients who do not receive a bed in each province (Methods section). We find that the model with a higher IFR per age-group (and an overall lower estimated attack rate) that explicitly accounts for an increase in IFR when hospital capacity is reached provides a better fit to seroprevalence data. With a median value of approximately 0.50% (0.32-0.64%), our inferred province-level IFR is generally higher than a population-weighted IFR which only takes into account the demographics of a given province (Table 1; Methods section) and is roughly twice as high as the IFR estimates derived from serological studies in predominantly high-income countries 21 . This discrepancy is expected because disparities in healthcare access can be offset by the younger population age structure of many developing countries resulting in population IFRs that are similar to high-income countries 7,26,27 . Figure 2a, b show a nearly 75% attack rate of SARS-CoV-2 across the country with nearly 15% of all infections due to reinfections by 2021-10-22. We also find that the agedistribution of the modelled COVID-19 deaths are well-aligned with the pattern of excess mortality per age-group and both follow a clear log-linear relationship with respect to age, a signature characteristic of COVID-19 fatalities 28 , suggesting that the model successfully reconstructs age-dependent contact patterns for disease spread in Iran (Fig. 2c). There is also a strong association between the modelled and observed daily hospital admissions over time and across provinces, further indicating that the majority of excess deaths during the pandemic period are directly attributable to COVID-19 ( Supplementary Fig. 4). There is, however, a notably lower number of modelled daily hospital admissions compared to observed hospitalisations in some provinces (e.g., South Khorasan, Lorestan, and Yazd) during the last wave in comparison to earlier waves. To explore these patterns, we use a mixed-effects model to examine the relationship between the number of hospitalisations inferred from our model fits and the observed COVID-19 hospital admissions over time and across Fig. 1 Estimating excess mortality per province and age-group during the pandemic in Iran. a One-month average of weekly excess mortality (blue) and reported COVID-19 deaths (black) in Iran from 2019-12-28 to 2021-10-22. Central line and shaded area show the median and 95% forecast intervals in weekly excess mortality, respectively. b One-month average of weekly excess deaths per 100,000 persons per province over time. c Nationwide excess deaths per 100,000 persons per age group over time. d Ratio of weekly excess deaths per age-group to the 40-49 age-group. Vertical dashed lines show the start of the immunisation programme for individuals aged 75 and above on 2021-05-18 and 60 and above on 2021-07-14. The horizontal dashed lines show the ratio of excess deaths at the start of the immunisation programme for individuals aged 75 and above. epidemic waves while controlling for province-specific variation (see Methods section). Using this framework, we show that there was no significant decrease in the ratio of model inferred hospitalisations to observed hospitalisations during the last wave compared to previous waves, and that the majority of variation in hospitalisation patterns observed is explained by province-level random effects, most likely due to specific changes in treatment seeking or the age demographics being most affected in later waves ( Supplementary Fig. 5). However, a significant decrease in the ratio of model inferred hospitalisations to observed hospitalisations was observed since the start of the pandemic, which suggests that case fatality rates from hospitalised infections have decreased during this period, likely due to a combination of improved case management, increased number of infections resulting from reinfections, and higher vaccination coverage.
Estimating cumulative attack rate of SARS-CoV-2. Using our transmission model, we also calculate the cumulative attack rate by 2021-10-22 for each province (Fig. 2d). Our results show a high degree of heterogeneity in the estimated attack rates across the country with 11 provinces reaching close to and higher than 100% cumulative attack rates after the fifth wave (Supplementary Table 1). While in most regions, the majority of infections took place during the third and fifth waves, the Delta variant of SARS-CoV-2 was responsible for the highest re-infection rates during the fifth wave ( Supplementary Fig. 6). Qazvin, Qom, and East Azerbaijan had the highest per capita mortality rate with cumulative PFRs of nearly 0.4%, while Golestan, Khuzestan, and Qom had the highest re-infection rate with nearly 20% of infections being due to re-infection (Table 1; Supplementary Fig. 6). Figure S1 and Supplementary Fig. 4 also show that while some provinces such as Ardabil, Alborz, Semnan, Golestan, Hamedan, and Mazandaran experienced five distinct epidemic peaks, others like Bushehr, Qazvin, Gilan, Kohgiluyeh and Boyer Ahmad, and Hormozgan experienced sustained transmission of SARS-CoV-2 for an extended period of time throughout the pandemic. In late September 2021, official reports from MoHME indicate that COVID-19 deaths in Sistan and Baluchistan, Ardabil, Ilam, Bushehr, Chahaar Mahal and Bakhtiari, South Khorasan, North Khorasan, Zanjan, Kohgiluye and Boyer Ahmad, Lorestan, Hormozgan, and Hamedan have all dropped to below 5 individuals per day in recent weeks, an almost record-low since the first wave in March 2020 [29][30][31] . This is congruent with our estimates of high attack rates in these provinces which may have resulted in a (temporary) herd-immunity to be reached after the fifth wave.

Discussion
Since the start of the pandemic, the global heterogeneity in COVID-19 reporting systems globally has resulted in substantial debate about the true extent to which COVID-19 has spread in Table 1 The cumulative excess mortality, PFR, estimated IFR (estimated from the transmission model), population-weighted IFR (based on the age-distribution of each province), and percentage of the population ever exposed to SARS-CoV-2 in each province as of 2021-10-22.  death tolls have subsequently relied on excess mortality due to under-reporting of COVID-19 deaths in many parts of the world 2 . This study is the first to formally explore the suitability of excess mortality as a proxy for estimating under-reporting of COVID-19 infections and deaths, confirming the information contained within reliable excess mortality statistics through the broad agreement with representative seroprevalence estimates. Additionally, this analysis lends further support to evidence that IFR patterns estimated in high income countries may underestimate the IFR in settings with stretched healthcare capacity 27 , as experienced in many provinces in Iran at periods during the pandemic 34,35 . Even though in some provinces we find that the transmission model with a lower estimated IFR provides a better fit to seroprevalence estimates, such a model offers a poor fit across most provinces and often results in epidemiologically unrealistic outcomes such that the modelled transmission dynamics outruns the pool of susceptible individuals. We also note that while some factors such as misidentification of baseline all-cause mortality and deaths attributable to COVID-19, particularly in the younger age-groups with fewer COVID-19-related deaths, can contribute to varied attack rates in some provinces, other factors such as nonrepresentative serosurveys, varied time to seroreversion using different assays, and uncertainty in inferring the true contact matrix between different age-groups within the population can impact the inferred attack rates 36,37 . Absence of accurate vaccination data could be another potential source of bias that may influence our modelled attack rates in some provinces, particularly in provinces that experienced high transmission levels due to Delta during the last wave. For instance, if we overestimate the level of protection from vaccines, more individuals need to be infected in a given wave in order to produce an epidemic that correctly matches the observed mortality. Similarly, if more vaccines are distributed to younger individuals than we currently assume in the model, it could lead to an artificial increase in the IFR with those highest at risk not being protected.
Prior to the emergence of the variants of concern, the basic reproduction number of the SARS-CoV-2 virus was estimated to be around 2.8 which corresponds to a herd immunity threshold of about 64% 38 . This threshold does not represent the point at which the outbreak immediately stops, but rather the point where, on average, fewer secondary infections are produced. During unmitigated outbreaks with such reproduction numbers, the attack rate can reach as high as 90% 39,40 . Similarly, during unmitigated outbreaks with variants of concerns such as the Delta variant with increased reproduction numbers compared to the ancestral SARS-CoV-2 virus, the attack rate can reach even higher levels 41 . While the duration of antibody responses after coronavirus infection over longer timescales remains largely unknown, recent surveillance data and comparative studies have shown that re-infection with SARS-CoV-2 is possible after 3-6 months, and that older age-groups have a much lower protection against reinfection 42,43 . Previous studies on human coronaviruses and SARS-CoV have also shown that protective immunity typically starts declining after 5-8 months post infection [44][45][46] . Our findings of high attack rates in several provinces show that herd immunity through natural infection has not been achieved in the population even after nearly 20 months since the start of the Iranian epidemic. This is likely due to substantial reduction in protection against repeat infection over time either due to waning immunity, increased chance of reinfection with variants of concern, or a combination of both. We note that while systematic biases such as identifying the baseline level of all-cause mortality and overestimation of deaths attributable to COVID-19, particularly in the younger age-groups, can contribute to increased attack rates in some provinces, other factors such as heterogeneities in immune protection, limited access to health care in municipalities with low socioeconomic status, and uneven adoption of public health intervention in different regions can also contribute to varied attack rates across the country.

Methods
Excess mortality. We collect the weekly time-series data on all-cause mortality per province per age group from NOCR. We use data from the beginning of year 1394 to the end of summer 1398 in Solar Hijri calendar (SH) (from 2015-03-15 to 2019-09-22 in Common Era) to calculate background mortality. We find significant levels of excess mortality during several weeks (mostly concentrated in weeks 35 to 45) of autumn 2019 prior to the start of the COVID-19 pandemic ( Supplementary  Fig. 7). This has also been reported in a previous study which investigated some of the potential underlying causes of this significant excess mortality 15 . Therefore, in order to avoid biasing our expected mortality estimates from years prior to the pandemic, we exclude mortality data during autumn 2019 (from 2019-09-23 to 2019-12-21) from the calculation of background mortality as it may shift the baseline mortality upwards and, hence, result in the underestimation of excess mortality during the pandemic years in autumn 2020 and 2021. We calculate the expected mortality using a linear regression model that accounts for seasonality and trend, previously developed to track excess mortality across more than 100 countries and territories around the world 1 . As a robustness test, we also calculate expected mortality with the same model under the assumption of an over-dispersed Poisson to see if different methods of calculating excess mortality yields significantly different results. We find that the estimated excess mortality using both methods are virtually identical over time across all provinces ( Supplementary  Fig. 8). For example, we find that by 2021-10-22, the estimated total excess mortality at the national level based on the linear regression is 259,641 (95% CI: 230,493-288,790) persons while, based on the quasi-Poisson model, it is 258,472 (95% CI: 230,369-286,574) persons which shows that the two estimates are very similar to each other.
Transmission model. We use a previously published age-structured COVID-19 transmission model 47 and fitting framework 25 to fit the weekly estimated excess deaths in each province in Iran. In overview, the model is a population-based agestructured Susceptible-Exposed-Infected-Recovered model, which explicitly represents disease severity, passage through different healthcare levels and the roll out of vaccination. Model fitting was carried out within a Bayesian framework using a Metropolis-Hastings Markov Chain Monte Carlo (MCMC) based sampling scheme, which estimates the epidemic start date, R 0 and the time varying reproduction number, R t , using a series of pseudo-random walk parameters, which alter transmission every 2-weeks, given by: where f(x) = 2.exp(x)/(1 + exp(x)), i.e., twice the inverse logit function. Each parameter is introduced two weeks after the previous parameter, serving to capture changes in transmission every two weeks. The last change in transmission, ρ n , is maintained for the last 4 weeks prior to the current day to reflect our inability to estimate the effect size of this parameter due to the approximate 21-day delay between infection and death 22 . The MCMC sampling scheme includes adaptive tuning of the proposal implemented during sampling using the Johnstone-Chang optimisation algorithm 48 . All parameter inference results are based on 100,000 iterations, 10,000 of which are discarded as burn-in. MCMC chains are checked for convergence using standard Gelman-Rubin diagnostics 49 . Each model fit used province-specific demography, with the population size in 5-year age bands sourced from the Statistical Center of Iran 50 . The effective number of general hospital beds and intensive care beds were sourced from the Iran statistical yearbook published in 1396 in Solar Hijri calendar 51 . At the time of writing, MoHME has not published subnational daily vaccination data prior to 2021-10-26, nor the proportion of which vaccines are administered. As a result, we used multiple data sources to approximate the roll of vaccines subnationally. From 2021-10-26, MoHME started to release the total number of first and second vaccine doses subnationally, which we use in combination with national vaccination data 52 to infer the daily vaccination rate per province. A mix of vaccine types have been administered in Iran, with 80% of the population vaccinated with Sinopharm, 10% with AZ and the remaining 10% either Sputkink-V (1.5%) and CoViran Barekat (8.5%). Due to the mix of vaccine types with uncertain efficacy estimates, particularly related to Delta, we use vaccine efficacy estimates for first and second dose efficacy sourced from estimates for the Sinopharm vaccine and other inactivated whole virus vaccine platforms (Table 2). Efficacy estimates are weighted to produce an overall population efficacy by the proportion of vaccinated individuals who have received either a first dose or second dose, and by the assumed proportion of the Delta variant in the population, sourced from the sampling dates of Delta sequences in Iran from GISAID (gisaid.org), lagged by 14 days to reflect the delay from administering vaccine to protection being conferred.
Imposing a single IFR across the whole country may lead to biased attack rate estimates as it does not account for various factors such healthcare capacity or demographics of each province. To avoid this, we assume that the IFR in each province is dependent on the demographics of the province, such that provinces with older populations have, on average, higher IFRs. The model also tracks the number of people who require a general hospital bed or an ICU bed so that if there is a bed shortage during an epidemic peak in a given province then the mortality outcomes for individuals who do not receive a bed will increase. We explore three sets of model assumptions, which scan across the likely range of parameter estimates for key model parameters that impact the inferred attack rate. These parameters relate to the assumed number of general and ICU hospital beds available, the change in hospitalisation related to the Delta variant and the level of immune escape conferred by the Delta variant relative to previous infection by the Alpha variant and the original wild type strains. Here, the level immune escape only refers to immunity from previous infection, which acts by making a proportion of people who would have been protected against reinfection by the same variant susceptible to infection with Delta. With regards to immunity from vaccination, we assume that the impact of Delta is to reduce the level of protection conferred by the vaccines. These ranges are described in Table 2. Given uncertainty in the IFR and its impact of the inferred attack rate, we explore two alternative estimates of IFR by age 21,53 and conduct model comparison between the two references through comparison against province level estimated seroprevalence data 16 , with seroprevalence assumed to be described by a Binomial distribution. Lastly, to explore if IFR is changing over time, we explore the ratio of model inferred hospitalisations and observed daily hospitalisations against time using a mixed-effects model, controlling for variation between provinces using a random intercept and slope with respect to time.
To assess the impact of healthcare capacity constraints on increasing the mortality from COVID-19, we also calculate the mean population-weighted IFR 15 ,  53 , where L is the total number of age-groups, α i is the age-stratified IFR for age-group i, σ i is its standard error, and w i is the proportion of individuals within age-group i in the population. Unlike IFR estimates based on our transmission model which takes into account the impact of various factors such as province-level demographics, impact of healthcare capacity constraints on increased mortality rates, and increased chance of hospitalisation with Delta the population-weighted IFR only accounts for the province-level demographics which enables us to assess the influence of other factors on increasing IFR.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Raw weekly mortality data files in Iran are retrieved from https://www.sabteahval.ir/ avej/Page.aspx?mId=49826&ID=3273&Page=Magazines/SquareshowMagazine Number of active beds in hospital wards by province in year 1396 Solar Hijri (2018) is available at https://iranopendata.org/en/dataset/number-of-active-beds-in-hospitalwards-by-province-in-year-1396/resource/73e29d0b-7cc9-47c0-8b4e-b3e61cc0b800 Number of administered vaccine doses per province is retrieved from the official Telegram channel of the Statistics and Information Technology Center of the Ministry of Health and Medical Education of Iran https://t.me/it_behdasht/134 Daily hospital admission data is collected by the Ministry of Health and Medical Education of Iran. The data as of 22 November 2021 is retrieved from the mask.ir application official telegram channel https://t.me/mask_application/430 Province-specific demography, with the population size in 5-year age bands, is retrieved from the Statistical Center of Iran website https://www.amar.org.ir/english/Iran-Statistical-Yearbook/Iran-Statistical-Yearbook-2019-2020.

Code availability
All software code and analysis scripts are available in an R research compendium at https://github.com/OJWatson/iran-ascertainment 54 .