Caveats on COVID-19 herd immunity threshold: the Spain case

García-García, David; Morales, Enrique; Fonfría, Eva S.; Vigo, Isabel; Bordehore, Cesar

doi:10.1038/s41598-021-04440-z

Article
Open access
Published: 12 January 2022

Scientific Reports volume 12, Article number: 598 (2022)

Abstract

After a year of living with the COVID-19 pandemic and its associated consequences, hope looms on the horizon thanks to vaccines. The question is what percentage of the population needs to be immune to reach herd immunity, that is, to avoid future outbreaks. The answer depends on the basic reproductive number, R₀, a key epidemiological parameter measuring the transmission capacity of a disease. In addition to the virus itself, R₀ also depends on the characteristics of the population and its environment. Moreover, the estimate of R₀ depends on the methodology used, the accuracy of the data and the generation time distribution. This study reflects on the difficulties surrounding R₀ estimation and provides a herd immunity threshold for Spain, for which we considered the different combinations of all the factors that affect the R₀ of the Spanish population. Estimates of R₀ range from 1.39 to 3.10 for the ancestral SARS-CoV-2 variant, with the largest differences produced by the method chosen to estimate R₀. With these values, the herd immunity threshold (HIT) ranges from 28.1 to 67.7%, which would have made 70% a realistic upper bound for Spain. However, the dominance of the delta variant (B.1.617.2 lineage) in late summer 2021 may have expanded the range of R₀ to 4.02–8.96 and pushed the upper bound of the HIT to 90%.

Introduction

On 11 March 2020, the World Health Organization declared the COVID-19 pandemic, and by 11 March 2021, 2.63 million people had died because of it¹. However, although these are the published figures, there were probably many more undocumented virus-related deaths that were not recorded due to a lack of tests^2,3,4. After a year of struggle, restrictions to lessen the spread of the virus, an economic downturn and the cost in human lives, most people are wondering when the pandemic will end. The year 2020 ended with the hopeful approval of some vaccines⁵, but how many people must be vaccinated to return to pre-pandemic life? The answer is quite complicated, since vaccines neither provide 100% protection against infection^6,7 nor fully block the transmission of the virus^8,9,10. However, it is theoretically interesting to study when the herd immunity threshold (HIT) will be reached, if possible, under the assumptions that the immune population (recovered and vaccinated people) acquires permanent immunity against the different mutations of the SARS-CoV-2 virus and will not transmit the virus any further. In Spain, there is a general opinion that the HIT will be reached when 70% of the population becomes immune, which in real life is not equivalent to 70% of the population being vaccinated. Note that there is no single definition of HIT¹¹, and this can lead to misunderstandings. In this study, HIT refers to the minimum proportion of the immune population that produces a monotonic decrease of new infections, even if restrictions are lifted and society returns to a pre-pandemic level of social contact. The question is how realistic a HIT of 70% is for Spain.

The HIT is usually defined in terms of the effective reproduction number, R_e(t), which is the average number of secondary infections produced by an infected individual at time t. Any outbreak starts with R_e > 1, stabilizes with R_e = 1, and declines with R_e < 1. Therefore, the HIT will be reached when R_e = 1 and R_e < 1 afterwards. Given the number of susceptible individuals, that is, those that can get infected, R_e(t) can be estimated in an unmitigated epidemic as^12,13

$$ R_{e} \left( t \right) = R_{0} \cdot \frac{S\left( t \right)}{N}, $$

(1)

where S(t) is the number of susceptible individuals at time t; N is the total population size; and R₀ is the basic reproductive number, that is, the expected number of secondary infections produced by an infected individual in a population where all individuals are susceptible and there are no measures to reduce transmission^12,14. The proportion of susceptible individuals, S(t)/N, can be written as 1 − q, where q is the proportion of the population that is immune. Then, if R_e(t) = 1 (and R_e(t) < 1 afterwards), HIT equals q by definition. Substituting into Eq. (1) gives 1 = R₀ · (1 − HIT), and solving for HIT we get^15,16

$$ HIT = 1 - \frac{1}{{R_{0} }}. $$

(2)

Note the direct relationship: the larger the R₀, the larger the HIT; and that Eq. (2) makes sense only when R₀ > 1, since for values R₀ < 1 the disease will disappear naturally and the concept of HIT loses its sense. In Eq. (2) it is intrinsically assumed that recovered individuals cannot become susceptible again, that is, they cannot get re-infected nor transmit the virus after recovery. R₀ is used to quantify the transmissibility of the virus, which depends on the virus itself and the characteristics of the population that is being infected. Regarding other infectious diseases, typical values of R₀ are 0.9–2.1 for seasonal flu and 1.4–2.8 for the 1918 flu¹⁷, ~ 3 for SARS-CoV-1¹⁸ and < 0.8 for MERS¹⁹. For COVID-19 in 2020, a systematic review of 21 studies, mainly in China, found R₀ ranging from 1.9 to 6.5²⁰, which leads to HIT values between 47 and 84%. However, in 62% of these studies, the R₀ was between 2 and 3 (HIT between 50 and 67%). In Western Europe in 2020, an average R₀ was estimated at 2.2 (95% CI = [1.9, 2.6])²¹, with a HIT value of 55% (95% CI = [47, 62]). Therefore, 70% is an upper bound of HIT in 2020 in most of the cited cases, but not in all.
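As a quick check of Eq. (2), the following minimal sketch converts R₀ values into HIT values. The function name `herd_immunity_threshold` is ours, chosen for illustration; the inputs are the Western Europe estimate and its CI bounds quoted above.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Eq. (2): HIT = 1 - 1/R0, meaningful only for R0 > 1."""
    if r0 <= 1.0:
        raise ValueError("HIT is only defined for R0 > 1")
    return 1.0 - 1.0 / r0


# Western Europe estimate quoted in the text: R0 = 2.2 (95% CI = [1.9, 2.6]).
for r0 in (1.9, 2.2, 2.6):
    print(f"R0 = {r0:.1f} -> HIT = {100 * herd_immunity_threshold(r0):.1f}%")
```

Because Eq. (2) is increasing in R₀, applying it to the bounds of a CI for R₀ gives the corresponding bounds for the HIT.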

Theoretically, R₀ can only be observed at the very beginning of the pandemic, while the whole population is susceptible and no control measures are in force (e.g., social distancing, the use of masks, etc.). This is the case in the above-mentioned studies^20,21. However, during the COVID-19 pandemic the virus has mutated into more transmissible variants, with a higher R₀. In consequence, the HIT has been increasing during the course of the pandemic, but its estimated value cannot be directly updated because the new variants did not exist at the beginning of the pandemic when the R₀ should have been observed.

This study encompasses a detailed analysis of the HIT of the ancestral variant, which was the dominant variant at the beginning of the pandemic, from different approaches and quantifies the influence of three key factors: (1) source/quality of data; (2) infectiousness evolution over time; and (3) methodology to estimate R₀. Finally, we indirectly estimate the R₀ of the current dominant variants using Eq. (1) and comparisons between R_e values of several variants. The HIT values derived from these new R₀ estimates are discussed in the last section.

Data

Three COVID-19 daily infection datasets for Spain were used, from 1 January to 29 November 2020: (1) official infections published by the Instituto de Salud Carlos III (ISCIII,²²); and infections estimated with the REMEDID algorithm²³ from (2) official COVID-19 deaths²², and (3) excess all-cause deaths (ED) from the European Mortality Monitoring surveillance system (MoMo,²⁴). The REMEDID-derived infection data are more realistic than official infection data since they assimilate seroprevalence study data²⁵ and known dynamics of COVID-19 (see²³, for further discussion). As the last national longitudinal seroprevalence study in Spain finished on 29 November 2020, our REMEDID time series has been estimated up to that date. This is not a limitation for this study since only data up to March 2020 will be used (see next section).

Intrinsic growth rate

At the beginning of an outbreak the infections, I(t), increase exponentially^12,16 and can be fitted to the model

$$ I\left( t \right) = ae^{rt} + \varepsilon \left( t \right), $$

(3)

where $\varepsilon \left( t \right)$ accounts for errors in the fitting; t is time; a is a positive number determining the point where the function crosses the ordinate axis, and then depends on where the origin of time has been set; and r is a positive number called intrinsic growth rate or Malthusian number, that defines the increasing rate of the exponential growth. r is usually the first property that epidemiologists estimate in an outbreak. The higher the r, the higher the speed in the increase of cases. When comparing diseases, r is an indicator of contagiousness, as is R₀. In fact, with enough information about the latent and infectious periods, r (t⁻¹ units) can be used to estimate R₀ (dimensionless), although the relationship is not simple²⁶. In the latent period (exposed in a Susceptible-Exposed-Infected-Recovered (SEIR) model), an infected individual cannot produce a secondary infection, unlike in the infectious period, where secondary infections may be produced.

When estimating r, it must be kept in mind that I(n) (Fig. 1a), where n denotes time discretized in days, increases exponentially only during a short period of time. Consequently, the first problem is to determine the latest day, n₀, before I(n) departs from strictly exponential growth because the number of susceptible individuals diminishes. To estimate n₀, we use the property that during the exponential growth I(n) is not only rising, but is accelerating with an increasing acceleration. Then, n₀ is the day on which the first maximum of I″(n), the second (discrete) derivative of I(n), is reached. For REMEDID I(n), from both official and MoMo data, n₀ is 23 February 2020 (Fig. 1c). Figure 2 shows the least-squares best fit of Eq. (3) to REMEDID I(n) truncated at n₀, whose parameters are:

(1) a = 11.86 (95% CI = [11.01, 12.70]) and r = 0.1592 (95% CI = [0.1576, 0.1609]), when MoMo ED are used;
(2) a = 10.11 (95% CI = [9.25, 10.96]) and r = 0.1591 (95% CI = [0.1571, 0.1610]), when official deaths are used.
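The fit of Eq. (3) can be illustrated with a minimal sketch. It is not the authors' code: it assumes a log-linear ordinary least-squares fit (one possible linearization of the model) rather than a direct non-linear fit, and it uses a synthetic, noise-free series generated from the MoMo-based parameters above. The function name `fit_exponential` is hypothetical.

```python
import math


def fit_exponential(counts):
    """Fit I(n) = a * exp(r * n) by ordinary least squares on log(I(n)).

    `counts` holds strictly positive daily infections for n = 0, 1, 2, ...
    Returns (a, r). The linearised fit weights errors differently from a
    direct non-linear least-squares fit of Eq. (3).
    """
    xs = list(range(len(counts)))
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    r = sxy / sxx
    a = math.exp(y_mean - r * x_mean)
    return a, r


# Synthetic, noise-free series built from the MoMo-based estimates in the text.
synthetic = [11.86 * math.exp(0.1592 * n) for n in range(40)]
a_hat, r_hat = fit_exponential(synthetic)
doubling_time = math.log(2) / r_hat  # days needed for infections to double
```

With r = 0.1592 per day, the implied doubling time ln(2)/r is roughly 4.4 days, which gives an intuitive reading of the intrinsic growth rate.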

Applying the Bonferroni correction, the difference between the two estimates of r has a CI = [− 0.0034, 0.0038] with at least a 90% confidence level. Since the CI includes 0, there is no evidence that the two parameters differ. In addition, a linearization of the model allows a hypothesis test on r, which confirms that there is no significant discrepancy between the two estimates. Therefore, REMEDID I(n) will be estimated from MoMo ED hereafter. The same hypothesis test shows that the a parameters are significantly different. However, since the value of a is not relevant to the growth rate, which is our aim here, we do not discuss its estimates. If the same analysis were carried out with official I(n), which were not reliable at the beginning of the pandemic, we would get r = 0.2322 (95% CI = [0.2266, 0.2377]) and the end of the exponential growth on 5 March 2020. This value is significantly different, at least at the 90% confidence level after Bonferroni correction, from the r estimated from either REMEDID I(n), since the CI of their differences does not include 0. A hypothesis test confirms this discrepancy. Note that despite the larger value of r from official I(n), the fitted exponential is smaller than those estimated from REMEDID I(n) (Fig. 2) because of the horizontal shift due to differences in the a parameter. The end of the exponential growth has been estimated from 7-day running averages of I(n), I′(n), and I″(n) (Fig. 1a–c, respectively). Note also that at the beginning of the outbreak the official data underestimated the number of infections because of limited testing capacity.
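The criterion for n₀ (first maximum of the second discrete derivative of a 7-day smoothed series) can be sketched as follows. This is an illustration under assumptions: only I(n) is smoothed before differencing, the running mean is centred, and the test curve is a hypothetical logistic-shaped rise, not real data. The function names are ours.

```python
import math


def running_mean(series, window=7):
    """Centred running mean; returns only positions with a full window."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


def end_of_exponential_growth(infections, window=7):
    """Index of the first local maximum of the smoothed second difference."""
    half = window // 2
    smooth = running_mean(infections, window)
    # Central second difference: d2[k] belongs to smooth[k + 1].
    d2 = [smooth[k + 2] - 2 * smooth[k + 1] + smooth[k]
          for k in range(len(smooth) - 2)]
    for k in range(1, len(d2) - 1):
        if d2[k] > d2[k - 1] and d2[k] >= d2[k + 1]:
            return k + 1 + half  # map back to an index of `infections`
    return None


# Hypothetical logistic-shaped rise with its inflection at day 50 (not real data).
curve = [1000 / (1 + math.exp(-(n - 50) / 5)) for n in range(100)]
n0 = end_of_exponential_growth(curve)
```

For this synthetic curve the acceleration peaks several days before the inflection point, which is why truncating the fit at n₀ keeps only the part of the series that is still close to exponential.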

Estimates of R₀

Generation time

During the infectious period, an infected individual may produce a secondary infection. However, the individual’s infectiousness is not constant during the infectious period, but it can be approximated by the probability distribution of the generation time (GT), which accounts for the time between the infection of a primary case and the infection of a secondary case. Unfortunately, such a distribution is not as easy to estimate as that of the serial interval, which accounts for the time between the onset of symptoms in a primary case and the onset of symptoms in a secondary case. This is because the time of infection is more difficult to detect than the time of symptom onset. Ganyani et al.²⁷ developed a methodology to estimate the distribution of the GT from the distributions of the incubation period and the serial interval. Assuming an incubation period following a gamma distribution with a mean of 5.2 days and a standard deviation (SD) of 2.8 days, they estimated the serial interval from 91 and 135 pairs of documented infector-infectee in Singapore and Tianjin (China). Then, they found that the GT followed a gamma distribution with mean = 5.20 (95% CI = [3.78, 6.78]) days and SD = 1.72 (95% CI = [0.91, 3.93]) for Singapore (hereafter GT₁), and with mean = 3.95 (95% CI = [3.01, 4.91]) days and SD = 1.51 (95% CI = [0.74, 2.97]) for Tianjin (hereafter GT₂). Ng et al.²⁸ applied the same methodology to 209 pairs of infector-infectee in Singapore and determined a gamma distribution with mean = 3.44 (95% CI = [2.79, 4.11]) days and SD = 2.39 (95% CI = [1.27, 3.45]; hereafter GT₃). Figure 3 shows the probability density functions (PDF) of such distributions, f_GT. The differences between them are remarkable. For example, 54.5%, 81.0%, and 80.7% of the contagions are produced in a pre-symptomatic stage (in the first 5.2 days after primary infection) assuming GT₁, GT₂, and GT₃, respectively.
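The pre-symptomatic fractions quoted above correspond to the cumulative probability of each GT distribution at 5.2 days. The sketch below approximates that probability for GT₁ and GT₂; it assumes the gamma distribution is parameterised through shape = mean²/SD² and scale = SD²/mean, and it integrates the density with a simple trapezoidal rule instead of a library routine. Function names are illustrative.

```python
import math


def gamma_pdf(t, mean, sd):
    """Gamma density parameterised by its mean and standard deviation."""
    if t <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    log_pdf = ((shape - 1.0) * math.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def gamma_cdf(x, mean, sd, steps=20000):
    """Trapezoidal integration of the density from 0 to x."""
    h = x / steps
    total = 0.5 * (gamma_pdf(0.0, mean, sd) + gamma_pdf(x, mean, sd))
    total += sum(gamma_pdf(i * h, mean, sd) for i in range(1, steps))
    return total * h


INCUBATION_MEAN = 5.2  # days, mean incubation period assumed in the text
presymptomatic_gt1 = gamma_cdf(INCUBATION_MEAN, mean=5.20, sd=1.72)
presymptomatic_gt2 = gamma_cdf(INCUBATION_MEAN, mean=3.95, sd=1.51)
```

Under these assumptions the two values land close to the 54.5% and 81.0% reported for GT₁ and GT₂.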

Theoretically, assuming that the incubation periods of two individuals are independent and identically distributed, which is quite plausible, the expected values of the GT and the serial interval should be equal^29,30. The mean of the serial interval is easier to estimate than that of the GT. For that reason, we assume that the mean serial interval estimated from a meta-analysis of 13 studies involving a total of 964 pairs of infector-infectee, which is 4.99 days (95% CI = [4.17, 5.82])³¹, is more reliable than the aforementioned means of the GT. This value is within the error estimates of the means of GT₁ and GT₂, but not of GT₃. Then, we construct a theoretical distribution for the GT that follows a gamma distribution (hereafter GT_th) with mean = 4.99 days and SD = 1.88 days. This theoretical distribution can be seen in Fig. 3 and approximates the average PDF of three gamma distributions with mean = 4.99 and the SD of GT₁, GT₂, and GT₃. We assume a conservative CI = [1.51, 2.39] for the theoretical SD, defined by the minimum and maximum SD values of GT₁, GT₂, and GT₃. GT_th shows 63.1% of pre-symptomatic contagions.

R₀ from r

In theory, the basic reproduction number R₀ can be estimated provided that the intrinsic growth rate r and the distributions of both the latent and infectious periods are known^26,32,33,34. The latent period is the period during which an infected individual cannot infect other individuals. It is observed in diseases for which the infectious period starts around the end of the incubation period, as happens with influenza³⁵ and SARS³⁶. However, from Fig. 3 it is inferred that COVID-19 is transmissible from the moment of infection, and we will assume a null latent period. Then, if the GT follows a gamma distribution, R₀ can be estimated from the formulation of Anderson and Watson³², which was adapted to null latent periods by Yan²⁶ as

$$ R_{0} = \frac{{mean_{GT} }}{{1 - \left( {1 + mean_{GT} \cdot r \cdot \frac{1}{{shape_{GT} }}} \right)^{{ - shape_{GT} }} }} \cdot r, $$

(4)

where mean_GT is the mean GT and shape_GT is one of the two parameters defining the gamma distribution, which can be estimated as

$$ shape_{GT} = \frac{{\left( {mean_{GT} } \right)^{2} }}{{\left( {SD_{GT} } \right)^{2} }}. $$

(5)

For GT_th, we get R₀ = 1.50 (CI = [1.41, 1.61]) for REMEDID I(n) and R₀ = 1.76 (CI = [1.60, 1.94]) for official I(n). For the other three GT distributions, R₀ ranges from 1.39 (CI = [1.27, 1.58]) to 1.51 (CI = [1.34, 1.80]) for REMEDID I(n) and from 1.59 (CI = [1.40, 1.88]) to 1.78 (CI = [1.51, 2.23]) for official I(n) (Table 1). In all cases, the R₀ values from GT_th lie within those from the three known GT distributions and are indistinguishable from them within the error estimates. The lower (upper) bound of the CI is estimated as the minimum (maximum) R₀ obtained from all the possible combinations of 100 evenly spaced values covering the CI of r, mean_GT and SD_GT. Then, following the Bonferroni correction, the reported CI have at least an 85% confidence level for GT₁, GT₂, and GT₃, but this cannot be guaranteed for GT_th since the CI of its SD is unknown. In general, all these R₀ estimates are lower than those summarised by Park et al.²⁰.
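Equations (4) and (5), together with the grid procedure used for the CI, can be sketched as below. This is an illustrative reimplementation, not the authors' code: the helper names are ours, and the grid uses 11 points per parameter instead of the 100 described in the text, which is enough here because the grid includes the endpoints of each CI.

```python
from itertools import product


def r0_from_growth_rate(r, mean_gt, sd_gt):
    """Eq. (4), with the gamma shape parameter taken from Eq. (5)."""
    shape = (mean_gt / sd_gt) ** 2
    denom = 1.0 - (1.0 + mean_gt * r / shape) ** (-shape)
    return mean_gt * r / denom


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi, endpoints included."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def r0_interval(r_ci, mean_ci, sd_ci, n=11):
    """Minimum and maximum of Eq. (4) over a grid spanning the three CIs."""
    values = [r0_from_growth_rate(r, m, s)
              for r, m, s in product(linspace(*r_ci, n),
                                     linspace(*mean_ci, n),
                                     linspace(*sd_ci, n))]
    return min(values), max(values)


# GT_th (mean = 4.99 days, SD = 1.88 days) with the two growth rates in the text.
r0_remedid = r0_from_growth_rate(0.1592, 4.99, 1.88)
r0_official = r0_from_growth_rate(0.2322, 4.99, 1.88)
ci_remedid = r0_interval((0.1576, 0.1609), (4.17, 5.82), (1.51, 2.39))
```

The point estimates come out near 1.50 and 1.76, and the grid bounds for REMEDID I(n) near 1.41 and 1.61, in line with the values reported above.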

Table 1 R₀ and HIT values of the ancestral SARS-CoV-2 variant estimated from GT₁, GT₂, GT₃, and GT_th, and REMEDID and official infections. For date₀, “Dec.” means December 2019, and “Jan.” means January 2020.

Alternatively, R₀ can be estimated by applying the Euler–Lotka equation^29,33,

$$ R_{0} = \frac{1}{{\mathop \smallint \nolimits_{0}^{ + \infty } e^{ - rt} \cdot f_{GT} \left( t \right)dt}}. $$

(6)

In this case, we get values closer to previous estimates²⁰. In particular, for GT_th, we get R₀ = 2.12 (CI = [1.81, 2.48]) for REMEDID I(n) and R₀ = 2.92 (CI = [2.28, 3.75]) for official I(n). For the other three GT distributions, R₀ ranges from 1.63 (CI = [1.43, 1.90]) to 2.21 (CI = [1.59, 2.95]) for REMEDID I(n) and from 1.97 (CI = [1.59, 2.54]) to 3.11 (CI = [1.84, 4.90]) for official I(n) (Table 1). The CI are estimated as in Eq. (4).
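Equation (6) can be evaluated numerically, as in the sketch below. It assumes a gamma GT parameterised by its mean and SD and truncates the integral at 60 days. For comparison it also uses the closed form that follows from the Laplace transform of a gamma density, R₀ = (1 + r · scale)^shape; this closed form is a standard result that the text does not state, so it is included here only as a cross-check. Function names are illustrative.

```python
import math


def gamma_pdf(t, mean, sd):
    """Gamma density parameterised by its mean and standard deviation."""
    if t <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    log_pdf = ((shape - 1.0) * math.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def r0_euler_lotka(r, mean_gt, sd_gt, t_max=60.0, steps=60000):
    """Eq. (6) by trapezoidal integration of exp(-r t) * f_GT(t) on [0, t_max]."""
    h = t_max / steps

    def integrand(t):
        return math.exp(-r * t) * gamma_pdf(t, mean_gt, sd_gt)

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    total += sum(integrand(i * h) for i in range(1, steps))
    return 1.0 / (total * h)


def r0_euler_lotka_gamma(r, mean_gt, sd_gt):
    """Closed form of Eq. (6) when f_GT is a gamma density."""
    shape = (mean_gt / sd_gt) ** 2
    scale = sd_gt ** 2 / mean_gt
    return (1.0 + r * scale) ** shape


r0_remedid = r0_euler_lotka(0.1592, 4.99, 1.88)
r0_official = r0_euler_lotka_gamma(0.2322, 4.99, 1.88)
```

For GT_th both routes give values near 2.12 (REMEDID) and 2.92 (official), matching the estimates reported above.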

R₀ from a dynamical model

We designed a dynamic model with Susceptible-Infected-Recovered (SIR) as stocks that accounts for the infectiousness of the infectors. Such a model is a generalisation of the Susceptible-Exposed-Infected-Recovered (SEIR) model³⁷. Births, deaths, immigration and emigration are ignored, which seems reasonable since the timescale of the outbreak is too short to produce significant demographic changes. For the sake of simplicity, the recovered stock includes recoveries and fatalities, and it is denoted as R(t). A random mixing population is assumed, that is a population where contacts between any two people are equally probable. Time is discretized in days, so the real time variable t is replaced by the integer variable n. As a consequence, the derivatives in the differential equations defining the dynamic model explained below are discrete derivatives.

The size of the population is fixed at N = 100,000, and then, for any day n we get

$$ \tilde{S}\left( n \right) + \left( {\mathop \sum \limits_{k = 0}^{20} \tilde{I}\left( n-k \right)} \right) + \tilde{R}\left( n \right) = N, $$

(7)

where $\tilde{S}\left( n \right)$, $\tilde{I}\left( n \right)$, and $\tilde{R}\left( n \right)$ are the discretized versions of S(t), I(t), and R(t) and $\tilde{I}$ is assumed to be null for negative integers. The summation is a consequence of the infectiousness, which is approximated according to the GT, whose PDF is discretized as

$$ \widetilde{{f_{GT} }}\left( n \right) = \mathop \smallint \limits_{n - 1}^{n} f_{GT} \left( t \right) dt, $$

(8)

from n = 1 to 20. Figure 3 shows $\widetilde{{f_{GT} }}\left( n \right)$ for GT_th. Truncating at n = 20 accounts for 99.99% of the area below the PDF of all the GT. Then, an infected individual at day n₀ is expected to produce on average

$$ \widetilde{{R_{e} }}\left( {n_{0} + n} \right) \cdot \widetilde{{f_{GT} }}\left( n \right) $$

(9)

infections n days later, where $\widetilde{{R_{e} }}\left( n \right)$ is the discretized version of R_e(t). From this expression, it is obvious that values of $\widetilde{{R_{e} }}\left( n \right) < 1$ will produce a decline of infections. Conversely, infections at day n₀ are produced by all individuals infected during the previous 20 days as

$$ \tilde{I}(n_{0} ) = \tilde{R}_{e} \left( {n_{0} } \right) \cdot \left( {\mathop \sum \limits_{n = 1}^{20} \tilde{I}\left( {n_{0} - n} \right) \cdot \widetilde{{f_{GT} }}\left( n \right)} \right), $$

(10)

whose continuous version has been reported in previous studies^29,38. The expression in brackets is called total infectiousness of infected individuals at day n₀³⁹. According to Eq. (1), Eq. (10) can be expressed in terms of R₀ as

$$ \tilde{I}(n_{0} ) = R_{0} \cdot \frac{{\tilde{S}\left( {n_{0} } \right)}}{N} \cdot \left( {\mathop \sum \limits_{n = 1}^{20} \tilde{I}\left( {n_{0} - n} \right) \cdot \widetilde{{f_{GT} }}\left( n \right)} \right). $$

(11)

As we want a dynamic model capable of providing $\tilde{I}\left( {n_{0} } \right)$ from the stocks at time step n₀ − 1, we replaced $\tilde{S}\left( {n_{0} } \right)$ by $\tilde{S}\left( {n_{0} - 1} \right)$ in Eq. (11). This assumption makes sense in a discrete domain since the infections at time n₀ take place in the susceptible population at time n₀ − 1. Then, assuming that all stocks are set to zero for negative integers, our dynamic model can be expressed in terms of Eq. (7) and the following differential equations:

$$ \delta \tilde{I}(n_{0} ) = R_{0} \cdot \frac{{\tilde{S}\left( {n_{0} - 1} \right)}}{N} \cdot \left( {\mathop \sum \limits_{n = 1}^{20} \tilde{I}\left( {n_{0} - n} \right) \cdot \widetilde{{{\text{f}}_{GT} }}\left( n \right)} \right) - \tilde{I}(n_{0} - 1), $$

(12)

$$ \delta \tilde{S}\left( {n_{0} } \right) = {-}\tilde{I}\left( {n_{0} } \right), $$

(13)

$$ \delta \tilde{R}\left( {n_{0} } \right) = \tilde{I}\left( {n_{0} - 21} \right), $$

(14)

where $\delta \tilde{I}$, $\delta \tilde{S}$, and $\delta \tilde{R}$ are the (discrete) derivatives of $\tilde{I}$, $\tilde{S}$, and $\tilde{R}$, respectively. Applying the initial conditions $\tilde{S}\left( 0 \right) = N - 1$, $\tilde{I}\left( 0 \right) = 1$, and $\tilde{R}\left( 0 \right) = 0$, it is assumed that the outbreak was produced by only one infector. The latter is not true in Spain, since several independent introductions of SARS-CoV-2 were detected⁴⁰. However, for modelling purposes it is equivalent to introducing a single infection at day 0 or M infections produced by the single infection n days later. Then, the date of the initial time n = 0 is accounted as a parameter date₀, which is optimised, as well as R₀, to minimise the root-mean square of the residual between the model simulated $\tilde{I}\left( n \right)$ and the REMEDID and official I(n) for the period from date₀ to n₀.
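The recursion defined by Eqs. (8), (12) and (13) can be sketched in a few lines. This is a simplified illustration, not the Stella/R implementation used in the study: it tracks only new infections and susceptibles (the recovered stock follows from Eq. (7)), discretises the GT with a trapezoidal rule, and leaves out the optimisation of date₀ and R₀. Function names and the 200-day horizon are our choices.

```python
import math


def gamma_pdf(t, mean, sd):
    """Gamma density parameterised by its mean and standard deviation."""
    if t <= 0.0:
        return 0.0
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return math.exp((shape - 1.0) * math.log(t) - t / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def discretised_gt(mean, sd, days=20, sub=1000):
    """Eq. (8): probability mass of the GT on each interval [n - 1, n]."""
    h = 1.0 / sub
    masses = []
    for n in range(1, days + 1):
        vals = [gamma_pdf(n - 1 + i * h, mean, sd) for i in range(sub + 1)]
        masses.append(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
    return masses


def simulate(r0, f_gt, population=100_000, n_days=200):
    """Iterate Eqs. (12) and (13) with S(0) = N - 1 and I(0) = 1."""
    infections = [1.0]
    susceptible = [population - 1.0]
    for day in range(1, n_days + 1):
        # Total infectiousness: past infections weighted by the discretised GT.
        infectiousness = sum(infections[day - k] * f_gt[k - 1]
                             for k in range(1, len(f_gt) + 1) if day - k >= 0)
        new = r0 * susceptible[day - 1] / population * infectiousness
        infections.append(new)
        susceptible.append(susceptible[day - 1] - new)
    return infections, susceptible


f_th = discretised_gt(4.99, 1.88)  # GT_th
infections, susceptible = simulate(2.71, f_th)
```

Fitting would then consist of shifting the simulated series by a candidate date₀ and comparing it with the observed I(n), which is where the trade-off between R₀ and date₀ arises.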

The model was implemented in Stella Architect software v2.1.1 (www.iseesystems.com) and exported to R software v4.1.1 with the help of deSolve (v1.28) and stats (v4.1.1) packages, and the Brent optimisation algorithm was implemented. For REMEDID I(n) and GT_th, we obtained date₀ = 13 December 2019 and R₀ = 2.71 (CI = [2.33, 3.15]). Optimal solutions combine lower/higher R₀ and earlier/later date₀ (Fig. 4), which highlights the importance of providing an accurate first infection date to estimate R₀. When the other three GT distributions were considered, we obtained similar date₀, ranging from 12 to 17 December 2019, and R₀ values ranging from 2.08 (CI = [1.86, 2.42]) to 2.85 (CI = [2.05, 3.25]; see Table 1). For official infections, date₀ was set to 1 January 2020 for all cases, and R₀ ranged from 1.81 (CI = [1.64, 2.07]) to 2.41 (CI = [1.80, 2.91]). The CI are estimated as in Eq. (4).

Herd immunity threshold and discussion

The HIT of the ancestral variant was estimated from R₀ via Eq. (2). The values, shown in Table 1, range between 28.1 (CI = [21.3, 36.7]) and 64.9% (CI = [51.2, 69.2]) for REMEDID I(n) (hereafter HIT_R), and between 37.0 (CI = [28.4, 46.7]) and 67.8% (CI = [45.7, 79.6]) for official I(n) (hereafter HIT_O). The differences between the estimates are determined by three key factors: (1) source/quality of data; (2) GT distribution; and (3) methodology to estimate R₀.

In general, official infection data are of poor quality, but when death records and seroprevalence studies are available, the REMEDID algorithm provides more reliable infection time series²³. The maximum difference between HIT_R and HIT_O is 13.1 percentage points, corresponding to the Eq. (6) estimate, although this difference is not significant within the error estimates. Moreover, official data vary depending on the date of publication. For example, the maximum HIT_O is 67.7% from data available in February 2021, and 80.1% from data available a year before, in March 2020. The latter is similar to the 80.7% published by Kwok et al.⁴¹ in March 2020, which was obviously based on data available at that time. The February 2021 version of the data is more realistic than the March 2020 one, and the REMEDID-derived infections are more realistic than both of them²³. In consequence, results based on REMEDID data should be more reliable.

The most influential factor for estimating the HIT is the methodology to estimate R₀, which may produce differences of ~ 30 percentage points for HIT_R and ~ 20 points for HIT_O for the same dataset and GT distribution. Such differences are significant within the error estimates for all GT in HIT_R and only for GT_th in HIT_O. For each GT, the lowest HIT values were obtained from Eq. (4), but the largest HIT_R and HIT_O are obtained from the dynamic model and Eq. (6), respectively. The CI from Eq. (6) and the dynamic model are longer than those from Eq. (4), meaning that the former are more sensitive to errors in the involved parameters. Moreover, the largest errors are obtained from Eq. (6) for both HIT_R and HIT_O, although they are larger for HIT_O. It means that Eq. (6) is the methodology most sensitive to parameters and data quality. In general, results from Eq. (6) are reconcilable with the other two within the error estimates, but Eq. (4) and the dynamic model are only reconcilable for official data (Table 1).

The selection of a GT produces HIT differences of up to 6 percentage points when R₀ is estimated from Eq. (4); 18.7 from Eq. (6); and 13.7 from the dynamic model, although in no case are the differences significant within the error estimates. It is more difficult to estimate the GT than the serial interval. For that reason, many studies approximate the GT by the serial interval (e.g.^39,41). However, although the GT and the serial interval have the same mean, the serial interval has a larger variance³⁰, which leads to an underestimate of R₀ when using Eq. (6)²⁹. HIT values from Eq. (4) for any GT are included in the CI obtained for the other GT. On the contrary, although all the CI estimated from Eq. (6) overlap, only some HIT values are included in the CI estimated for other GT. This is also the situation for the HIT estimated from the dynamic model.

The influential factors should be kept in mind when interpreting R₀ estimates. For example, Locatelli et al.²¹ estimated an average R₀ of 2.2 (CI = [1.9, 2.6]) for Western Europe by using official data available in September 2020, a theoretical approximation of GT, and Eq. (6). For any GT in Table 1 it can be observed that: (1) official data produces the highest R₀ values for Eq. (6) with respect to Eq. (4), and the dynamic model; and that (2) the more realistic REMEDID data also produces lower R₀ values when Eq. (6) is used. Then, it could be conjectured that the R₀ reported by Locatelli et al.²¹ is in the upper bound of all the possible R₀ estimates for Western Europe.

In summary, accurately estimating HIT is quite complicated. In any case, assuming that REMEDID-derived infection data are more accurate than official data, 70% seems to be a good upper bound of HIT for the ancestral variant. However, the upper bound increases to 80% (accounting for the CI) if we rely on official data. Besides, the most important impediment to determining the value of the HIT is that it varies in time. The more transmissible new SARS-CoV-2 variants present higher R_e, and in consequence a higher (theoretical) associated R₀ and higher HIT values. One example is the B.1.1.7 lineage (also known as the alpha variant), which was first detected in England in September 2020⁴² and thereafter spread rapidly around the world. In Spain, at the beginning of January 2021, the alpha variant was ~ 30% of the circulating SARS-CoV-2 variants, but it was over 80% from March to May 2021⁴³. On the other hand, the B.1.617.2 lineage (also known as the delta variant), first detected in India in December 2020⁴⁴, has represented over 95% of the SARS-CoV-2 variants in Spain from late July to at least October 2021⁴³. Both alpha and delta displaced the previous variants because of their higher transmissibility. Although the R₀ cannot be directly estimated for these variants since they appeared in the middle of the pandemic, the R_e can. It has been estimated that the R_e of the alpha variant is ~ 70% higher than that of previously existing variants⁴⁵. In turn, the R_e of the delta variant is ~ 70% higher than that of the alpha variant⁴⁶. Following Eq. (1) and assuming that the variations of R_e from one variant to another are not produced by changes in the control measures, it can be inferred that the R₀ of the alpha and delta variants are 70% and 189% higher than that of the ancestral variant, respectively (1.7 × 1.7 = 2.89 for delta). Therefore, if we take the highest estimate of R₀ in Table 1 (R₀ = 3.11, CI = [1.84, 4.90]; for GT₁, Eq. (6), official data) as an upper bound of the R₀ of the ancestral variant, we get that 8.99 (CI = [5.32, 14.16]) is an upper bound estimate of the delta variant R₀. In that case, we can conclude that an upper bound of the HIT at present in Spain is 88.9% (CI = [81.2, 92.9]). For a more realistic upper bound, we could alternatively take the maximum R₀ for REMEDID data in Table 1 (R₀ = 2.85, CI = [2.05, 3.25]; for GT₁, dynamic model) as an upper bound, which would produce R₀ = 8.24 (CI = [5.92, 9.39]) and HIT = 87.9% (CI = [83.1, 89.4]) as an upper bound for the delta variant, in agreement with previous estimates⁴⁶. Then, a HIT of 90% seems to be realistic for Spain with a predominant delta variant, as in October 2021.
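The scaling argument for the delta variant is simple arithmetic and can be reproduced with a short sketch. The two ~70% increases are the values quoted in the text; the function and constant names are ours.

```python
ALPHA_OVER_ANCESTRAL = 1.7  # R_e of alpha ~70% higher than previous variants
DELTA_OVER_ALPHA = 1.7      # R_e of delta ~70% higher than alpha


def delta_r0(r0_ancestral: float) -> float:
    """Scale an ancestral-variant R0 to the delta variant via Eq. (1)."""
    return r0_ancestral * ALPHA_OVER_ANCESTRAL * DELTA_OVER_ALPHA


def herd_immunity_threshold(r0: float) -> float:
    """Eq. (2): HIT = 1 - 1/R0."""
    return 1.0 - 1.0 / r0


# Upper bounds from Table 1: official data (3.11) and REMEDID data (2.85).
for label, r0 in (("official", 3.11), ("REMEDID", 2.85)):
    scaled = delta_r0(r0)
    print(f"{label}: R0 = {scaled:.2f}, "
          f"HIT = {100 * herd_immunity_threshold(scaled):.1f}%")
```

The same scaling applied to the CI bounds of the ancestral R₀ gives the CI reported for the delta variant.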

The presented results are valid for a randomly mixing population with spread dynamics similar to those of Spain as a whole. However, even Spanish regions show different dynamics among themselves²³, which may lead to specific HIT values for each region. It should be kept in mind that none of the three vaccines administered in Spain completely prevents transmission of the virus. Hence, even with 90% of the population vaccinated, the HIT will probably not be reached. Nevertheless, the risk of infection is significantly reduced for vaccinated susceptible individuals⁶,⁷, which directly reduces the R₀. Moreover, in case of infection, transmission of the virus is also reduced⁸,⁹,¹⁰, which modifies the associated GT and reduces the R₀ and the HIT of a vaccinated population. So, even if vaccines do not completely prevent transmission, the greater the vaccinated proportion of the population, the lower the HIT. Therefore, the HIT of a highly vaccinated population is expected to be below the estimated 90% upper bound. However, all this may change with the emergence and spread of new variants with re-infection capacity⁴⁷.

In any case, even if the HIT is reached, it will not be a panacea. First, if the HIT is reached in most places in a country but some specific regions, or population subgroups within a region, have a percentage of immune individuals below the HIT, local outbreaks will remain possible in those regions or subgroups. Second, in a randomly mixing population with HIT = 70% and 90%, the final size of an epidemic is 95.9% and 99.9% of the population infected, respectively¹⁵,³⁷. This means that, had the ancestral variant not been replaced, the declining number of infections after reaching a HIT of 70% could still have produced a non-negligible additional 25.9% of infections, that is, 12.2 million infections in Spain. Third, HIT values must be interpreted carefully and overoptimistic messages should be avoided, as was learnt from Manaus, in the Brazilian state of Amazonas. In October 2020, Manaus was thought to have reached the HIT, with 76% of the population infected⁴⁸, which led to a relaxation of control measures. However, either because the percentage of infected population was not accurately estimated or because the new SARS-CoV-2 P.1 variant was capable of re-infecting, Manaus had a second wave in January 2021 with a higher mortality rate than the first⁴⁹. Therefore, health authorities should ensure adaptive and proactive management of the new situation after theoretical herd immunity is reached.
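The final-size figures in the second caveat can be checked with a minimal Python sketch. It assumes the classical final-size relation for a randomly mixing population, z = 1 − exp(−R₀ z), with R₀ recovered from the threshold as R₀ = 1/(1 − HIT). The function names, the bisection solver, and the population figure of 47.1 million (a rough value for Spain that is not stated in this section) are illustrative assumptions rather than the authors' implementation.

```python
import math


def r0_from_hit(hit: float) -> float:
    """Invert HIT = 1 - 1/R0 (hit given as a fraction in [0, 1))."""
    return 1.0 / (1.0 - hit)


def final_size(r0: float, tol: float = 1e-12) -> float:
    """Fraction z of the population eventually infected.

    Solves z = 1 - exp(-r0 * z) for its non-zero root by bisection.
    For r0 <= 1 the only root is z = 0 (no large outbreak).
    """
    if r0 <= 1.0:
        return 0.0
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # g(mid) = mid - (1 - exp(-r0 * mid)); negative below the root.
        if mid + math.expm1(-r0 * mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    population_millions = 47.1  # assumption: approximate population of Spain
    for hit in (0.70, 0.90):
        z = final_size(r0_from_hit(hit))
        overshoot = z - hit  # infections occurring after the HIT is reached
        print(
            f"HIT = {hit:.0%}: final size = {100 * z:.2f}%, "
            f"overshoot = {100 * overshoot:.2f}% "
            f"(about {overshoot * population_millions:.1f} million people)"
        )
```

The overshoot, the difference between the final size and the HIT, is the share of infections that still occurs while incidence declines after the threshold has been crossed.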

References

Our World in Data, https://ourworldindata.org/coronavirus-data-explorer. Accessed 1 Apr 2021.
Defunciones según la Causa de Muerte—Avance enero-mayo de 2019 y de 2020. Notas de prensa del Instituto Nacional de Estadística (2020). https://www.ine.es/dyngs/INEbase/es/operacion.htm?c=Estadistica_C&cid=1254736176780&menu=ultiDatos&idp=1254735573175. Accessed 16 Feb 2021.
Kung, S. et al. Underestimation of COVID-19 mortality during the pandemic. ERJ Open Res. 7(1), 00766. https://doi.org/10.1183/23120541.00766-2020 (2021).
Modi, C., Böhm, V., Ferraro, S., Stein, G. & Seljak, U. Estimating COVID-19 mortality in Italy early in the COVID-19 pandemic. Nat. Commun. 12, 2729. https://doi.org/10.1038/s41467-021-22944-0 (2021).
European Medicines Agency, https://www.ema.europa.eu/en/human-regulatory/overview/public-health-threats/coronavirus-disease-covid-19/treatments-vaccines/covid-19-vaccines. Accessed 1 Apr 2021.
Hall, V. J. et al. COVID-19 vaccine coverage in health-care workers in England and effectiveness of BNT162b2 mRNA vaccine against infection (SIREN): A prospective, multicentre, cohort study. The Lancet. 397(10286), 1725–1735. https://doi.org/10.1016/S0140-6736(21)00790-X (2021).
Thompson, M. G. et al. Interim estimates of vaccine effectiveness of BNT162b2 and mRNA-1273 COVID-19 vaccines in preventing SARS-CoV-2 infection among health care personnel, first responders, and other essential and frontline workers—Eight US Locations, December 2020–March 2021. MMWR Morb. Mortal. Wkly. Rep. 70, 495–500. https://doi.org/10.15585/mmwr.mm7013e3 (2021).
Levine-Tiefenbrun, M. et al. Initial report of decreased SARS-CoV-2 viral load after inoculation with the BNT162b2 vaccine. Nat. Med. 27, 790–792. https://doi.org/10.1038/s41591-021-01316-7 (2021).
Emary, K. R. W. et al. Efficacy of ChAdOx1 nCoV-19 (AZD1222) vaccine against SARS-CoV-2 variant of concern 202012/01 (B.1.1.7): An exploratory analysis of a randomised controlled trial. The Lancet. 397(10282), 1351–1362. https://doi.org/10.1016/S0140-6736(21)00628-0 (2021).
Shah, A. S. V. et al. Effect of vaccination on transmission of SARS-CoV-2. N. Engl. J. Med. https://doi.org/10.1056/NEJMc2106757 (2021).
Fine, P., Eames, K. & Heymann, D. L. “Herd immunity”: A rough guide. Clin. Infect. Dis. 52(7), 911–916. https://doi.org/10.1093/cid/cir007 (2011).
Anderson, R. M. & May, R. M. Infectious Diseases of Humans: Dynamics and Control (Oxford University Press, 1991).
Hannon, B. & Ruth, M. Dynamic Modeling of Diseases and Pests (Springer, 2009).
Heesterbeek, J. A. P. A brief history of R₀ and a recipe for its calculation. Acta. Biotheor. 50, 189–204 (2002).
Kermack, W. O. & McKendrick, A. G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A. 115, 700–721 (1927).
Keeling, M. & Rohani, P. Modeling Infectious Diseases in Humans and Animals (Princeton University Press, 2008). https://doi.org/10.2307/j.ctvcm4gk0.
Coburn, B. J., Wagner, B. G. & Blower, S. Modeling influenza epidemics and pandemics: Insights into the future of swine flu (H1N1). BMC Med. 7, 30. https://doi.org/10.1186/1741-7015-7-30 (2009).
Bauch, C. T., Lloyd-Smith, J. O., Coffee, M. P. & Galvani, A. P. Dynamically modeling SARS and other newly emerging respiratory illnesses: Past, present, and future. Epidemiology 16, 791–801. https://doi.org/10.1097/01.ede.0000181633.80269.4c (2005).
Breban, R., Riou, J. & Fontanet, A. Interhuman transmissibility of Middle East respiratory syndrome coronavirus: Estimation of pandemic risk. Lancet https://doi.org/10.1016/S0140-6736(13)61492-0 (2013).
Park, M., Cook, A. R., Lim, J. T., Sun, Y. & Dickens, B. L. A systematic review of COVID-19 epidemiology based on current evidence. J. Clin. Med. 9, 967. https://doi.org/10.3390/jcm9040967 (2020).
Locatelli, I., Trächsel, B. & Rousson, V. Estimating the basic reproduction number for COVID-19 in Western Europe. PLoS ONE 16(3), e0248731. https://doi.org/10.1371/journal.pone.0248731 (2021).
Instituto de Salud Carlos III, https://cnecovid.isciii.es/covid19/#documentaci%C3%B3n-y-datos. Accessed 12 Feb 2021.
García-García, D. et al. Retrospective methodology to estimate daily infections from deaths (REMEDID) in COVID-19: The Spain case study. Sci. Rep. 11, 11274. https://doi.org/10.1038/s41598-021-90051-7 (2021).
European Mortality Monitoring surveillance system, http://www.euromomo.eu. Accessed 16 Feb 2021.
Pollán, M. et al. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): A nationwide, population-based seroepidemiological study. The Lancet. 396(10250), 535–544. https://doi.org/10.1016/S0140-6736(20)31483-5 (2020).
Yan, P. Separate roles of the latent and infectious periods in shaping the relation between the basic reproduction number and the intrinsic growth rate of infectious disease outbreaks. J. Theor. Biol. 251, 238–252. https://doi.org/10.1016/j.jtbi.2007.11.027 (2008).
Ganyani, T. et al. Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data. Euro Surveill. 25, 2000257. https://doi.org/10.2807/1560-7917.ES.2020.25.17.2000257 (2020).
Ng, S. et al. Estimating transmission parameters for COVID-19 clusters by using symptom onset data, Singapore, January–April 2020. Emerg. Infect. Dis. 27(2), 582–585. https://doi.org/10.3201/eid2702.203018 (2021).
Britton, T. & Tomba, G. S. Estimation in emerging epidemics: Biases and remedies. J. R. Soc. Interface 16, 20180670. https://doi.org/10.1098/rsif.2018.0670 (2019).
Lehtinen, S., Ashcroft, P. & Bonhoeffer, S. On the relationship between serial interval, infectiousness profile and generation time. J. R. Soc. Interface. 18, 20200756. https://doi.org/10.1098/rsif.2020.0756 (2021).
Fonfría, E. S. et al. COVID-19 epidemiological parameters for clinical and mathematical modeling: Mini-review and meta-analysis from Asian studies during early phase of pandemic. Front. Med. https://doi.org/10.1101/2020.06.17.20133587v1 (2021).
Anderson, D. & Watson, R. On the spread of a disease with gamma distributed latent and infectious periods. Biometrika 67(1), 191–198. https://doi.org/10.1093/biomet/67.1.191 (1980).
Wallinga, J. & Lipsitch, M. How generation intervals shape the relationship between growth rates and reproductive number. Proc. R. Soc. B. 274, 599–604 (2007).
Roberts, M. G. & Heesterbeek, J. A. P. Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection. J. Math. Biol. 55, 803–816 (2007).
Lau, L. L. et al. Viral shedding and clinical illness in naturally acquired influenza virus infections. J Infect Dis. 201(10), 1509–1516. https://doi.org/10.1086/652241 (2010).
Peiris, J. S. et al. Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: A prospective study. Lancet 361(9371), 1767–1772 (2003).
Ma, J. & Earn, D. J. D. Generality of the final size formula for an epidemic of a newly invading infectious disease. Bull. Math. Biol. 68, 679–702. https://doi.org/10.1007/s11538-005-9047-7 (2006).
Park, S. W. et al. Forward-looking serial intervals correctly link epidemic growth to reproduction numbers. PNAS 118(2), e2011548118. https://doi.org/10.1073/pnas.2011548118 (2021).
Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178(9), 1505–1512. https://doi.org/10.1093/aje/kwt133 (2013).
Gómez-Carballa, A. et al. Phylogeography of SARS-CoV-2 pandemic in Spain: A story of multiple introductions, micro-geographic stratification, founder effects, and super-spreaders. Zool. Res. 41(6), 605–620. https://doi.org/10.24272/j.issn.2095-8137.2020.217 (2020).
Kwok, K. O., Lai, F., Wei, W. I., Wong, S. Y. S. & Tang, J. W. T. Herd immunity—Estimating the level required to halt the COVID-19 epidemics in affected countries. J. Infect. 80(6), e32–e33. https://doi.org/10.1016/j.jinf.2020.03.027 (2020).
Public Health England, “Investigation of novel SARS-COV-2 variant: Variant of Concern 202012/01” (2020); www.gov.uk/government/publications/investigation-of-novel-sars-cov-2-variant-variant-of-concern-20201201.
Actualización de la situación epidemiológica de las variantes de SARS-CoV-2 de preocupación (VOC) e interés (VOI) en salud pública en España, 18 de octubre de 2021, https://www.mscbs.gob.es/profesionales/saludPublica/ccayes/alertasActual/nCov/documentos/COVID19_Actualizacion_variantes_20211018.pdf. Accessed 22 Oct 2021.
Cherian S, Potdar V, Jadhav S, Yadav P, Gupta N et al. Convergent evolution of SARS-CoV-2 spike mutations, L452R, E484Q and P681R, in the second wave of COVID-19 in Maharashtra, India. BioRxiv. 2021. Preprint at https://doi.org/10.1101/2021.04.22.440932.
Davies, N. G. et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science 372, 6538. https://doi.org/10.1126/science.abg3055 (2021).
Liu, Y. & Rocklöv, J. The reproductive number of the Delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus. J. Travel Med. 28(7), taab124. https://doi.org/10.1093/jtm/taab124 (2021).
Uriu K, Kimura I, Shirakawa K, Takaori-Kondo A, Nakada T, Kaneda A, The genotype to phenotype Japan (G2P-Japan) consortium, So Nakagawa, Kei Sato. Ineffective neutralization of the SARS-CoV-2 Mu variant by convalescent and vaccine sera. bioRxiv; 2021. https://doi.org/10.1101/2021.09.06.459005
Buss, L. F. et al. Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic. Science 371, 288–292. https://doi.org/10.1126/science.abe9728 (2021).
Taylor, L. Covid-19: Is Manaus the final nail in the coffin for natural herd immunity?. BMJ 372, n394. https://doi.org/10.1136/bmj.n394 (2021).


Funding

This work was supported by the University of Alicante [COVID-19 2020-41.30.6P.0016 to CB] and the Montgó-Dénia Research Station (Agreement Ajuntament de Dénia-O.A. Parques Nacionales, Ministry of the Environment—Generalitat Valenciana -Conselleria de Agricultura, Desarrollo Rural, Emergencia Climática y Transición Ecológica, Spain, Spain) [2020-41.30.6O.00.01 to CB].

Author information

Authors and Affiliations

Department of Applied Mathematics, University of Alicante, Alicante, Spain
David García-García, Enrique Morales & Isabel Vigo
Multidisciplinary Institute for Environmental Studies “Ramon Margalef”, University of Alicante, Alicante, Spain
Eva S. Fonfría & Cesar Bordehore
Department of Ecology, University of Alicante, Alicante, Spain
Cesar Bordehore

Contributions

D.G-G. and C.B. designed the study, and D.G-G. wrote the first version of the manuscript. D.G-G., I.V. and E.M. performed the mathematical analysis. All authors reviewed the final version of the manuscript.

Corresponding author

Correspondence to Cesar Bordehore.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

García-García, D., Morales, E., Fonfría, E.S. et al. Caveats on COVID-19 herd immunity threshold: the Spain case. Sci Rep 12, 598 (2022). https://doi.org/10.1038/s41598-021-04440-z


Received: 26 July 2021
Accepted: 17 December 2021
Published: 12 January 2022
DOI: https://doi.org/10.1038/s41598-021-04440-z

This article is cited by

Estimates of the collective immunity to COVID-19 derived from a stochastic cellular automaton based framework
- Isaías Lima
- Pedro Paulo Balbi
Natural Computing (2022)
