Evolution of the lethality due to SARS-CoV-2 in Spain according to age group and sex

The emergence of SARS-CoV-2 in China in December 2019 posed a major challenge to health systems worldwide. One of the most relevant epidemiological measures to monitor during a pandemic is the proportion of cases that eventually die from the disease (case fatality ratio, CFR). Tracking the evolution of this indicator is of paramount importance because it allows assessing both variations in the lethality of the virus and the effectiveness of the control measures implemented by health authorities. One of the problems with estimating the CFR in practice is that the available data only show daily or weekly counts of new cases and deaths; there is no information on when each deceased patient was infected, and therefore it is not possible to know exactly how many cases there were at the time the patient became infected. Various approaches have been proposed for calculating the CFR by correcting for the time lag between infection and death. In this paper, we present a novel methodology for the non-parametric estimation of the evolution of the CFR that first identifies an optimal time lag between infections and deaths. The performance of this procedure is assessed by means of a simulation study, and the method is applied to the estimation of the CFR in Spain from July 2020 to March 2022.

least in those countries with advanced health care systems), estimation of prevalence usually requires serological surveys, which are expensive and time-intensive 12 and are therefore not suitable for rapid response to variations in virus lethality.
The case fatality ratio (CFR) is easier to obtain, but because many asymptomatic or mildly symptomatic people are never identified as cases, the CFR tends to overestimate the severity of the disease. However, Luo et al. 13 show that in locations that meet certain conditions (large-scale community transmission, extensive testing and no collapse of the health system, as is the case in Spain), the CFR computed from cases detected by PCR (polymerase chain reaction) tests can be considered a reliable indicator of the lethality of COVID-19. Traditionally, the fatality rate is assumed to be constant throughout the epidemic 14 , but this may not hold in rapidly evolving contexts [15][16][17] such as the SARS-CoV-2 pandemic. The simplest real-time estimate of the CFR is obtained by dividing the cumulative number of deaths by the cumulative number of cases over a period of time (usually 1 day or 1 week). This estimate is known as the crude (or naïve) CFR 18 . There are a number of reasons why this estimator may be biased 19,20 , but one of the most obvious is that the persons who die on a given day were infected much earlier, so the denominator of the CFR should be the total number of patients infected at the same time as those who died 21 . However, the available statistical records usually show only daily or weekly numbers of new cases and deaths; there is no information on when each deceased patient was infected, and therefore it is not possible to know exactly how many cases there were at the time the patient became infected.
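To make the crude estimator concrete, the following Python sketch computes a running crude CFR (cumulative deaths divided by cumulative cases) from two aligned weekly series. The counts are hypothetical and were constructed so that exactly 1% of each week's cases die one week later; the crude CFR nevertheless stays below 1% throughout, which illustrates the bias caused by the lag between infection and death.

```python
from typing import Sequence


def crude_cfr(cases: Sequence[int], deaths: Sequence[int]) -> list[float]:
    """Running crude (naive) CFR: cumulative deaths / cumulative cases.

    Both series must refer to the same reporting periods (e.g. weeks).
    Periods before the first reported case yield NaN.
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must have the same length")
    out: list[float] = []
    total_cases = total_deaths = 0
    for c, d in zip(cases, deaths):
        total_cases += c
        total_deaths += d
        out.append(total_deaths / total_cases if total_cases else float("nan"))
    return out


# Hypothetical weekly counts: deaths in week t are 1% of the cases of week t - 1.
weekly_cases = [100, 400, 800, 400, 100, 50]
weekly_deaths = [0, 1, 4, 8, 4, 1]
print([round(x, 4) for x in crude_cfr(weekly_cases, weekly_deaths)])
```

Because deaths lag cases, the denominator always includes recent cases whose deaths have not yet occurred, so the crude value only approaches the true 1% once the outbreak has passed.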
A number of procedures have been proposed to circumvent this problem by correcting the CFR for the time elapsed between infection and death. Wilson et al. 22 and Baud et al. 21 estimated corrected CFR values for China during February 2020 by dividing the number of deaths on a given day by the number of patients with confirmed COVID-19 infection 13 or 14 days before. These values were chosen based on previous studies showing that the median time from infection to death was in this range. Shim et al. 23 and Newall et al. 24 computed a delay-adjusted CFR for COVID-19 in South Korea up to June 2020 by fitting a gamma or lognormal distribution to the survival time of patients using previous case studies. Manisha et al. 25 estimated the CFR for China and for countries outside China before April 2020, also using a 14-day delay between cases and deaths; but instead of simply dividing deaths by delayed cases, they fitted a regression line to these values and estimated the CFR as the slope of the regression, in an attempt to reduce the remaining bias. Rothman et al. 26 calculated age-specific, time-corrected crude symptomatic CFR values for 7 countries using two independent time-to-fatality correction methods. Thomas et al. 18 use what they call the time-shifted distribution (TSD) analysis method: the CFR is obtained as the number of deaths on day $t$ divided by the number of cases on day $t - t_d$, where $t_d$ is the value that minimizes the root mean squared error of the linear regression of deaths on delayed cases for $t_d = 1, 2, \dots, 25$.
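The fixed-lag correction (as used by Wilson et al. and Baud et al.) and the lag-selection step of the TSD method, as summarized above, can be sketched in Python as follows. This is an illustrative reading, not the cited authors' code: we assume an ordinary least-squares line with intercept, an RMSE computed over the overlapping days only, and a synthetic daily series (hypothetical) in which deaths are exactly 2% of the cases 10 days earlier.

```python
import math
from typing import Sequence


def lagged_cfr(cases: Sequence[float], deaths: Sequence[float], lag: int) -> list[float]:
    """Fixed-lag corrected CFR: deaths on day t divided by cases on day t - lag."""
    return [
        deaths[t] / cases[t - lag] if cases[t - lag] > 0 else float("nan")
        for t in range(lag, len(deaths))
    ]


def tsd_best_lag(cases: Sequence[float], deaths: Sequence[float], max_lag: int = 25) -> int:
    """Lag t_d in 1..max_lag minimising the RMSE of an OLS fit deaths[t] ~ a + b * cases[t - t_d]."""
    best_lag, best_rmse = 0, math.inf
    for lag in range(1, max_lag + 1):
        x = cases[: len(cases) - lag]   # cases on day t - lag
        y = deaths[lag:]                # deaths on day t
        n = len(x)
        if n < 3:
            continue
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        a = my - b * mx
        rmse = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n)
        if rmse < best_rmse:
            best_lag, best_rmse = lag, rmse
    return best_lag


# Synthetic daily series (hypothetical): deaths are exactly 2% of cases 10 days earlier.
daily_cases = [round(1000 * math.exp(-(((t - 30) / 8) ** 2))) for t in range(60)]
daily_deaths = [0.0] * 10 + [0.02 * c for c in daily_cases[:50]]

lag = tsd_best_lag(daily_cases, daily_deaths)
cfr = [v for v in lagged_cfr(daily_cases, daily_deaths, lag) if not math.isnan(v)]
print(lag, round(min(cfr), 6), round(max(cfr), 6))
```

With a fixed, researcher-chosen lag (e.g. 13 or 14 days) only `lagged_cfr` is needed; the TSD variant differs in choosing the lag from the data.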
In this paper, a novel method is proposed to estimate the weekly progression of SARS-CoV-2 lethality (CFR) by age group and sex when the only available data are daily total counts of cases and deaths. Like Thomas et al. 18 , we propose to approximate the number of deaths occurring among the cases registered in week w by the number of deaths that occur δ weeks later. The CFR is then estimated with a log-additive model 27 , selecting for δ the value that minimizes the Akaike Information Criterion 28 . The performance of this procedure is assessed by means of a simulation study based on actual data. The method is applied to the estimation of the CFR in Spain from July 2020 to March 2022.

Material and methods
Data source. All data were obtained from the Carlos III Health Institute (ISCIII) 29 . This institute is the Public Research Organization of the Spanish Government responsible for funding and carrying out national biomedical research; it reports to the Ministry of Science, Innovation and Universities of Spain and is also attached to the Ministry of Health, Consumption and Social Welfare. The institute maintains a panel of open data obtained through the epidemiological surveillance network of the National Epidemiology Centre, which can be downloaded from its website 30 . The available data are daily counts of reported COVID-19 cases, deaths, people hospitalised and people admitted to the ICU, broken down by age group and sex.
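ISCIII publishes daily counts, whereas the analysis works with weekly counts over the 87 Monday-to-Sunday weeks from 20 July 2020 to 20 March 2022, so each sex and age stratum must first be aggregated. A minimal Python sketch, assuming the daily counts of one stratum are held in a dictionary keyed by date (a hypothetical layout, not the format of the ISCIII files); it also checks that the study period spans 87 weeks.

```python
from datetime import date, timedelta


def weekly_counts(daily: dict[date, int], start: date, end: date) -> list[int]:
    """Sum daily counts into consecutive Monday-to-Sunday weeks from start to end (inclusive)."""
    if start.weekday() != 0 or end.weekday() != 6:
        raise ValueError("start must be a Monday and end must be a Sunday")
    n_weeks = ((end - start).days + 1) // 7
    totals = [0] * n_weeks
    for day, count in daily.items():
        if start <= day <= end:               # days outside the study period are ignored
            totals[(day - start).days // 7] += count
    return totals


start, end = date(2020, 7, 20), date(2022, 3, 20)   # study period used in this paper
# Hypothetical stratum with exactly one reported case per day.
daily_cases = {start + timedelta(days=i): 1 for i in range((end - start).days + 1)}
weeks = weekly_counts(daily_cases, start, end)
print(len(weeks), set(weeks))   # 87 {7}
```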
Data on the number of infections in the first wave of the pandemic are not reliable, as only highly symptomatic cases were detected at that time 31 . For this reason, only data from the week of 20-26 July 2020 to the week of 14-20 March 2022 (87 weeks) were considered in this study. The study was carried out separately for three age groups: "less than 50 years", "from 50 to 69 years" and "70 years or over". For each of these groups, the data used in the analysis are the weekly counts of infections and deaths by sex. More specifically, they are of the form

$$ \left\{ \left(\mathrm{Sex},\, w,\, \mathrm{Cases}_{\mathrm{Sex},w},\, \mathrm{Deaths}_{\mathrm{Sex},w}\right) : \mathrm{Sex} \in \{\mathrm{Female}, \mathrm{Male}\},\ w = 1, \dots, 87 \right\}, $$

where $\mathrm{Cases}_{\mathrm{Sex},w}$ and $\mathrm{Deaths}_{\mathrm{Sex},w}$ denote the counts of registered SARS-CoV-2 cases and deaths by sex and week.

Statistical analysis. For each age group, we denote by $d_{\mathrm{Sex},w}$ the number of deaths among the subjects registered as SARS-CoV-2 cases in week $w$. These deaths will therefore occur over the following weeks. We assume that

$$ d_{\mathrm{Sex},w} \sim \mathrm{Binomial}\left(\mathrm{Cases}_{\mathrm{Sex},w},\, p_{\mathrm{Sex}}(w)\right), \tag{1} $$

where $p_{\mathrm{Sex}}(w)$ is the actual case fatality rate at week $w$, i.e. the probability of dying for the subjects registered as SARS-CoV-2 cases in week $w$.
In addition, we assume for $p_{\mathrm{Sex}}(w)$ the additive log-model 27 :

$$ \log p_{\mathrm{Sex}}(w) = \alpha + \beta_{\mathrm{Sex}} + s(w). \tag{2} $$

www.nature.com/scientificreports/

Here, "log" denotes the natural logarithm, $\beta_{\mathrm{Female}} = 0$ (female sex is taken as reference) and $s(w)$ is a smooth function of the week, which is estimated nonparametrically using cubic splines. The ISCIII data do not provide the values of $d_{\mathrm{Sex},w}$, so the CFR $p_{\mathrm{Sex}}(w)$ cannot be directly estimated. However, Fig. 1 suggests that for each sex there is a proportionality between the case count in week $w$ ($\mathrm{Cases}_{\mathrm{Sex},w}$) and the death count $\delta$ weeks later ($\mathrm{Deaths}_{\mathrm{Sex},w+\delta}$) for some delay $\delta$, which leads to the model

$$ \mathrm{Deaths}_{\mathrm{Sex},w+\delta} \sim \mathrm{Binomial}\left(\mathrm{Cases}_{\mathrm{Sex},w},\, \tilde{p}_{\mathrm{Sex}}(w)\right). \tag{3} $$

Here we assume that $\delta$ depends on sex but not on $w$. This parameter can be interpreted as the expected survival time (in weeks) of the cases who die.
To determine the delay $\delta$ for each sex, we fit model (3) for $\delta \in \{0, 1, 2, 3, 4, 5\}$, estimating $\tilde{p}_{\mathrm{Sex}}(w)$ nonparametrically by means of a cubic spline. For each $\delta$ we compute the Akaike Information Criterion 28 , AIC($\delta$), which measures the lack of fit of the model penalized by its complexity. We then take as optimal the value of $\delta$ that minimizes AIC($\delta$). Note that the optimal $\delta$ depends on the sex group.
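The δ-selection step can be sketched in Python under a simplifying assumption: instead of the cubic-spline log-additive model used in the paper (fitted in R), $\tilde{p}(w)$ is taken to be piecewise constant over blocks of consecutive weeks, for which the binomial maximum-likelihood estimate has a closed form. The block length, the function names and the synthetic data are hypothetical. To keep the AIC values comparable, every candidate δ is fitted on the same weeks $w$.

```python
import math
from typing import Sequence


def binom_loglik(k: int, n: int, p: float) -> float:
    """Log-probability of k deaths among n cases under Binomial(n, p)."""
    if k < 0 or k > n:
        return -math.inf
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def aic_for_delay(cases: Sequence[int], deaths: Sequence[int], delta: int,
                  max_delta: int, block: int = 6) -> float:
    """AIC of Deaths[w + delta] ~ Binomial(Cases[w], p(w)) with p(w) constant within blocks."""
    n_obs = len(cases) - max_delta            # same weeks w for every candidate delta
    x, y = cases[:n_obs], deaths[delta:delta + n_obs]
    loglik, n_par = 0.0, 0
    for s in range(0, n_obs, block):
        xs, ys = x[s:s + block], y[s:s + block]
        total = sum(xs)
        p_hat = sum(ys) / total if total else 0.0   # binomial MLE within the block
        loglik += sum(binom_loglik(k, n, p_hat) for k, n in zip(ys, xs))
        n_par += 1
    return 2 * n_par - 2 * loglik


def best_delay(cases: Sequence[int], deaths: Sequence[int], deltas=range(6)):
    """Return the delta with minimum AIC and the AIC value of every candidate."""
    max_delta = max(deltas)
    aics = {d: aic_for_delay(cases, deaths, d, max_delta) for d in deltas}
    return min(aics, key=aics.get), aics


# Synthetic weekly data (hypothetical): 3% of the cases of week w die in week w + 2.
cases = [int(20000 + 15000 * math.sin(w / 4)) for w in range(40)]
deaths = [0, 0] + [round(0.03 * c) for c in cases[:38]]
delta_opt, aics = best_delay(cases, deaths)
print(delta_opt)
```

A spline-based fit would replace the block estimate of `p_hat`; the selection logic (one AIC per candidate δ, keep the minimum) stays the same.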
We can expect most of the deaths among the cases registered in week $w$ to occur during week $w + \delta$. There will certainly be deaths among such cases occurring outside that week. Similarly, some of the deaths corresponding to cases registered in weeks close to $w$ will occur in week $w + \delta$. If we assume that the number of cases and $p_{\mathrm{Sex}}(w)$ vary smoothly from one week to the next, then the cases registered in $w$ that die outside $w + \delta$ are roughly offset by the deaths in $w + \delta$ corresponding to cases registered in weeks close to $w$. Thus, the smooth-variation assumption leads to the approximation $d_{\mathrm{Sex},w} \approx \mathrm{Deaths}_{\mathrm{Sex},w+\delta}$, and therefore the CFR $p_{\mathrm{Sex}}(w)$ in the log-additive model (2) can be estimated by $\tilde{p}_{\mathrm{Sex}}(w)$ in model (3). We examine the accuracy of this method by means of a simulation study in the next section.
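The offset argument can be checked with expected (noise-free) counts. In the Python sketch below, with hypothetical numbers, the deaths $d_w$ among the cases of week $w$ are spread over weeks $w$, $w+1$ and $w+2$ by a delay distribution symmetric around $\delta = 1$. When $d_w$ varies linearly, $\mathrm{Deaths}_{w+\delta}$ reproduces $d_w$ exactly; around a peak it does not.

```python
def deaths_by_week(d: list[float], f: list[float]) -> list[float]:
    """Expected deaths per calendar week: Deaths[t] = sum_j d[t - j] * f[j]."""
    out = [0.0] * (len(d) + len(f) - 1)
    for w, dw in enumerate(d):
        for j, fj in enumerate(f):
            out[w + j] += dw * fj   # share of week-w deaths occurring j weeks later
    return out


f = [0.25, 0.5, 0.25]   # hypothetical delay distribution: mean 1 week, variance 0.5
delta = 1

linear = [100 + 10 * w for w in range(10)]               # smooth trend in d_w
peaked = [1000 - 10 * (w - 5) ** 2 for w in range(10)]   # d_w with a peak at w = 5

for d in (linear, peaked):
    deaths = deaths_by_week(d, f)
    # interior weeks only, so that weeks w - 1, w and w + 1 all exist
    print([deaths[w + delta] - d[w] for w in range(1, 9)])
```

For a delay distribution symmetric around δ with variance $\sigma^2$, $\mathrm{Deaths}_{w+\delta} - d_w \approx \tfrac{1}{2}\sigma^2 d''(w)$, exactly so for quadratic trends: here $\tfrac{1}{2} \times 0.5 \times (-20) = -5$ at every week of the peaked series, which is why the approximation degrades when counts change abruptly.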
Once the models given in (1) are estimated for each age group, the Males:Females rate ratios (RR) are obtained as $\exp(\beta_{\mathrm{Male}})$ and reported with 95% confidence intervals (CI). The progression of the CFR is expressed as the expected number of deaths per 10,000 infections in each week, $10^4 \times p_{\mathrm{Sex}}(w)$.
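Under model (2), $p_{\mathrm{Male}}(w)/p_{\mathrm{Female}}(w) = \exp(\beta_{\mathrm{Male}})$ in every week, because $\alpha$ and $s(w)$ cancel. The following Python sketch turns a fitted coefficient and its standard error into the RR with a Wald-type 95% CI, and converts a fitted log-probability into deaths per 10,000 infections. The Wald interval and all numerical values are assumptions for illustration, not estimates from this study.

```python
import math

Z_975 = 1.959964  # 0.975 quantile of the standard normal distribution


def rate_ratio_ci(beta_male: float, se: float) -> tuple[float, float, float]:
    """RR = exp(beta_Male) with a Wald 95% CI exp(beta_Male -/+ 1.96 * SE)."""
    return (math.exp(beta_male),
            math.exp(beta_male - Z_975 * se),
            math.exp(beta_male + Z_975 * se))


def deaths_per_10k(alpha: float, beta_sex: float, s_w: float) -> float:
    """10^4 * p_Sex(w), with log p_Sex(w) = alpha + beta_Sex + s(w)."""
    return 1e4 * math.exp(alpha + beta_sex + s_w)


# Hypothetical fitted values (not results of this study).
alpha, beta_male, se_beta, s_w = math.log(0.01), 0.5, 0.05, 0.2
rr, lo, hi = rate_ratio_ci(beta_male, se_beta)
female = deaths_per_10k(alpha, 0.0, s_w)        # beta_Female = 0 (reference)
male = deaths_per_10k(alpha, beta_male, s_w)
print(f"RR = {rr:.3f} (95% CI {lo:.3f}-{hi:.3f}); per 10,000: F {female:.1f}, M {male:.1f}")
```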
Statistical significance was set at p < 0.05 . Data were analyzed using the R statistical language and environment, version 4.2.1 32 .

Simulation study
As input for this study, we used the actual weekly case counts for the cohort of males aged 70 years or older. The probability of death among those registered in week $w$ (theoretical CFR) was modelled by a function $p(w)$ with two peaks of lethality. This function is shown in Fig. 3.
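Steps 2 and 3 of the data-generating process listed below can be sketched in Python with the standard library. The weekly case counts and the two-peak $p(w)$ used here are hypothetical stand-ins (the study used the real counts for males aged 70 or over), and converting a continuous survival time into a week index by rounding is our assumption. Note that `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random


def simulate(cases: list[int], p: list[float], rng: random.Random,
             shape: float = 2.0, scale: float = 2.257) -> tuple[list[int], list[int]]:
    """Return (d, deaths): d[w] deaths among week-w cases; deaths[t] deaths occurring in week t."""
    n = len(cases)
    # Step 2: d_w ~ Binomial(Cases_w, p(w)), drawn as a sum of Bernoulli trials.
    d = [sum(rng.random() < p[w] for _ in range(cases[w])) for w in range(n)]
    deaths = [0] * n
    for w, dw in enumerate(d):
        for _ in range(dw):
            # Step 3: Weibull survival time in weeks; weibullvariate(scale, shape).
            t = w + round(rng.weibullvariate(scale, shape))
            if t < n:                     # deaths after the last study week are dropped
                deaths[t] += 1
    return d, deaths


weeks = range(30)
cases = [int(1500 + 1000 * math.sin(w / 5)) for w in weeks]          # hypothetical
p = [0.05 + 0.03 * math.exp(-((w - 8) / 3) ** 2)
     + 0.04 * math.exp(-((w - 22) / 3) ** 2) for w in weeks]         # two lethality peaks

rng = random.Random(2022)
d, deaths = simulate(cases, p, rng)
print("mean survival (weeks):", round(2.257 * math.gamma(1 + 1 / 2.0), 3))
```

The printed value reflects that a Weibull variable with shape 2 and scale 2.257 has mean $2.257\,\Gamma(1.5) \approx 2$ weeks. Step 5 (fitting the binomial model for each δ) is not reproduced here.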
The simulation proceeds as follows:
1. As the number of new cases in week $w$ ($\mathrm{Cases}_w$) we use the number that actually occurred in Spain between the week of 20-26 July 2020 and the week of 14-20 March 2022 (87 weeks).
2. For each week $w$ in which $\mathrm{Cases}_w$ infections have been recorded, the number of those who will die, $d_w$, is generated from a $\mathrm{Binomial}(\mathrm{Cases}_w, p(w))$ distribution.
3. Survival times for each patient are generated independently from a common Weibull distribution with shape parameter 2 and scale parameter 2.257. Thus, the expected survival time among those who are going to die is $2.257\,\Gamma(1 + 1/2) \approx 2$ weeks. (The estimate for the actual data is 1 week, but here we use 2 weeks because this increases the variance of the survival time and spreads the deaths over longer periods, making the CFR more difficult to estimate.)
4. Figure 2a shows the weekly counts of infections ($\mathrm{Cases}_w$) and of deaths ($\mathrm{Deaths}_w$) obtained in the simulation. Figure 2b shows the same data with deaths delayed by 2 weeks.
5. For $\delta \in \{0, 1, 2, 3, 4, 5\}$, we estimate the model $\mathrm{Deaths}_{w+\delta} \sim \mathrm{Binomial}(\mathrm{Cases}_w, \tilde{p}(w))$ with $\log \tilde{p}(w) = \alpha + s(w)$ (only males are considered), and denote by AIC($\delta$) the corresponding AIC value. Note that the optimal $\delta$ coincides with the expected survival time of the Weibull distribution.

Figure 3 plots the CFR estimated from these data in 100 simulations of the process, together with the theoretical CFR, and shows that the procedure described produces a good estimate of the CFR used to simulate the data.

Results

Table 1 summarizes the six study cohorts (three age groups for each sex). Over the entire follow-up period and for the three age groups, infection rates were similar in both sexes, but mortality rates were higher in men than in women. Figure 1 displays the weekly counts of cases and deaths together (note the different scales on the y-axis) for each age group and sex.
Note that the trajectories of deaths are shifted to the right relative to those of infections, and this shift appears to be constant over time. Such displacements would correspond to the expected survival times of the patients who die. Figure 4 shows the values of AIC($\delta$) for model (3) as a function of the delay, together with the optimal $\delta$. Note that the older the patients, the lower the value of $\delta$. When the trajectories of $\mathrm{Cases}_{\mathrm{Sex},w}$ and of the approximations of $d_{\mathrm{Sex},w}$ by $\mathrm{Deaths}_{\mathrm{Sex},w+\delta}$ are plotted together (Fig. 5), they appear to be in phase. Table 2 summarizes the estimation of the three binomial models, one for each age group. The estimated $s(w)$ curves of the additive models (2) are shown in Fig. 6. For each of the three age groups, three peaks were reached during the follow-up period. In addition, Fig. 7 displays the progression of the CFR by age group and sex.

Discussion
The results show how the SARS-CoV-2 case fatality ratio evolved during the course of the pandemic. Once Spain reached a stage of stable, extensive testing, this progression of lethality could be explained by the new virus variants that emerged during the observation period (see Table 3), as well as by the gradual introduction of vaccination against SARS-CoV-2 from the beginning of 2021 (see Fig. 8). Figure 7 shows that the CFR increases with age and is higher in men than in women in all age groups. These results are in line with those of ref. 33 , which studied lethality in the first wave of SARS-CoV-2 in Spain. This sex disparity is likely explained by a combination of biological sex differences (differences in chromosomes and related sex steroids) and gender-specific factors such as differential behaviors. Men are more likely to engage in poor health behaviors such as smoking and alcohol consumption, resulting in higher rates of pre-existing comorbidities (hypertension, cardiovascular disease, COPD) associated with a poor COVID-19 prognosis 34 .
Concerning smoking, current smokers have significantly upregulated ACE2 expression in the oral and lung epithelium compared with never smokers. Given that smoking is more prevalent among males, the higher expression of ACE2 associated with this risk factor could explain the worse outcomes of SARS-CoV-2 infection in males 35,36 . Additionally, women have higher antibody production and more efficacious vaccine responses overall. Furthermore, healthy females are known to have higher numbers of CD4+ T cells, greater CD4+/CD8+ ratios and increased numbers of activated T cells, cytotoxic T cells and B cells compared with males, resulting in a prompter response to infection 34,37 . Finally, low levels of testosterone in elderly men have been

Limitations and future work. The procedure described in this article estimates the CFR in week $w$ by assuming that, for each sex and age group, the number of deaths in week $w + \delta$ follows a $\mathrm{Binomial}(\mathrm{Cases}_w, \tilde{p}(w))$ distribution with $\tilde{p}(w)$ close to $p(w)$, the true CFR. This approximation is valid when both the number of cases and the lethality of the virus vary smoothly. When a peak occurs in the number of cases, the approximation does not work well: the simulation (Fig. 3) shows that the estimate of $p(w)$ is worse around week 70, when the irruption of the omicron variant caused a peak in the number of cases. Further refinements of this work should address the fact that the CFR may be subject to other biases 19,20 that we have not corrected for. Future work is also required to improve the approximation at times of rapid increase in the number of cases or in mortality. It would also be of interest to study how this CFR estimate can be used to improve estimates of the IFR, the other important lethality index, which is extensively used by health systems for pandemic management.

Conclusion
The procedure developed in this paper to estimate the CFR during a pandemic aims to correct for the bias due to the time lag between infection and death. Previous works make this correction by calculating the CFR as the number of deaths at $t + \delta$ divided by the number of cases at $t$. In those papers, several strategies are followed to choose an appropriate value of $\delta$: using values chosen subjectively by the researchers, using clinical data to estimate the distribution of survival time, or using the similarity between the curves of cases $C(t)$ and deaths $D(t)$ to find the value of $\delta$ for which the linear fit of $D(t + \delta)$ on $C(t)$ is best. Our procedure improves on this approximation by assuming that, for each sex and age group, $D(t + \delta)$ follows a binomial distribution with parameters $C(t)$ and $p(t)$, where $p(t)$ is estimated non-parametrically by means of a spline, which directly yields the evolution of the CFR. This procedure could be a promising tool for public health decision-making in a scenario in which vaccination coverage remains constant at a high level, so that changes in the CFR can be attributed mainly to changes in virus lethality.

Figure 6. Splines corresponding to the progress of the lethality rate (95% CI) according to age group.