Introduction

COVID-19, the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is responsible for a worldwide pandemic of enormous proportions, in terms of the numbers of both infections and deaths. SARS-CoV-2 infection may be asymptomatic or symptomatic, with severity ranging from a few flu-like symptoms to severe respiratory manifestations requiring supplemental oxygen and to multi-organ failure1,2,3. COVID-19 mortality has been widespread throughout the world, but with considerable temporal and geographical variations4,5,6,7. Although the reasons for these differences are not yet fully elucidated, according to a meta-analysis8 the predictors of mortality from COVID-19 that have been repeatedly observed in different case series from different countries are advanced age, male sex, and pre-existing comorbidities; in addition, some abnormal values of laboratory biomarkers have been associated with poor prognosis.

Italy was the first country after China to be heavily affected by the pandemic, and it has registered excess overall mortality and more than 127 thousand COVID-19-related deaths by July 25, 20219,10. The trends of cases and deaths, in most countries, have not been constant over time, but have fluctuated, with peaks of incidence called "waves". The first wave, in Italy and other European countries, began in January 2020 and lasted until summer 20207,11. Indeed, during the summer there were few cases and deaths, but since the fall the numbers of cases and deaths rose again with subsequent waves9, due in part to virus variants.

There is ongoing discussion about the effects of different waves on mortality in COVID-19 patients11. In particular, a few studies showed that mortality decreased after the first wave (i.e. after June 2020)12,13, but these results have not yet been confirmed by the analysis of country-wide mortality data. Therefore, we analyzed factors associated with COVID-19 mortality in the Italian national case series. In particular, we analyzed the effect of sex, age and period of infection on the overall survival of hospitalized and not hospitalized COVID-19 patients, using a Cox model that is the most commonly used method to deal with time to event data and analyze prognostic factors14.

Materials and methods

Ethical statement

Data used in this study were collected by the Istituto Superiore di Sanità (ISS; Rome, Italy), an agency of the Italian government, for the Italian national integrated COVID-19 surveillance program, and kept in a confidential database. Ethical approval of data collection was handled by ISS. Data were anonymized prior to release or use. The scientific dissemination of these data was authorized by the Italian Presidency of the Council of Ministers on February 27, 2020 (Ordinance n. 640) and August 4, 2020 (Ordinance n. 691). Data were provided to us upon request (protocol no. AOO-ISS-19/04/2921-0014810).

Dataset

Data on people infected with SARS-CoV-2 in Italy (Italian COVID-19 epidemiological surveillance data) were obtained from ISS after filling a request on April 15, 2021, at https://www.iss.it/richiesta-dati-covid19. The series included people whose PCR diagnosis of infection or first symptoms were recorded from January 28, 2020, to July 25, 2021. The data regarded each person’s age at diagnosis, sex, nationality, date of PCR-confirmed diagnosis, presence/absence of symptoms, date of hospitalization (if pertinent), the alive/dead status on the day of data sharing (i.e., July 25, 2021) and, if relevant, the date of death. Data on comorbidities were not provided.

Survival analyses

Survival was analyzed in the entire cohort and, separately, for non-hospitalized and hospitalized patients. For the entire cohort and for non-hospitalized persons, we assessed mortality within 30 days of the date of infection detection (positive PCR test). Instead, for hospitalized patients, we assessed mortality within 30 days of the date of hospitalization. In all analyses, we considered only those persons for whom complete data were available regarding age, sex, date of infection detection or of hospitalization (depending on the group), and status (alive or dead) after 30 days. Patients who died on the day of infection detection or hospitalization were assigned 1 day of follow-up. Patients whose date of death was erroneously reported in the dataset as being before the date of infection detection or hospital admission were excluded from analysis.

Survival analyses considered the effects of age, sex and pandemic wave. In the choice of age groups, we took as reference the age group 15–44 years, instead of the youngest class (0–14 years). The reasons for this decision were that the age group 15–44 years has undergone only marginal alterations in overall mortality during the pandemic in European countries15, and it is larger than the younger age class (0–14 years). Therefore, its use as reference provides good stability for risk estimates. For pandemic waves, we defined the first wave as that from the beginning of the dataset (January 28, 2020) until June 30, 2020. We also defined a second period starting on July 1, 2020 and ending on December 31, 2020, and a third period from this latter date (approximately when vaccination started in Italy) to July 25, 2021. These latter two periods are here called the second and third semesters.

Associations between demographic-clinical features (i.e. age, sex and pandemic wave) and survival were evaluated, using the survival package in R environment (R version 3.6.0), to draw Kaplan–Meier curves and run the log-rank test. The hazard proportionality assumptions were verified through the function “cox.zph()” of the survival package. The variables that were found to impact upon survival in log-rank test (with P < 0.05) were analyzed in a multivariable weighted Cox analysis to account for non-proportional hazards16,17. We did this using the coxphw R package applying the Average Hazards Ratio method, by setting the parameter template = ”AHR”. Cox and log-rank test P values < 0.05 (two-sided) indicated sufficient statistical significance.

Results

The Italian cohort of SARS-CoV-2-positive individuals included 4,333,014 persons of median age 46 years, with a slight predominance of female cases (Table 1). Overall, 50.4% of the cases were symptomatic and 415,390 were hospitalized, with 13.7% of them requiring intensive care. The median age of non-hospitalized patients was much lower than that of hospitalized cases (44 vs. 70 years). Of the entire case series, 240,850 were diagnosed during the first wave, 1,907,690 became infected between July 1, 2020 and December 31, 2020, and 2,104,894 were diagnosed in the first seven months of 2021. At the end of the study, 127,524 subjects were dead, with 35,837 having been diagnosed as infected before June 30, whereas 49,120 and 40,755 dead patients had been diagnosed in the second and third semesters, respectively (for 1812 subjects, information about the date of infection detection was missing). Over 31 thousand non-hospitalized people had died by July 25, 2021, while the total number of deaths among the hospitalized people was 95,907, including 25,320 during the first wave. The median age at death of non-hospitalized patients was 86 years, with 94% of them being ≥ 65 years old (not shown). Instead, the median age of deceased hospitalized patients was 81 years, with 90% of them ≥ 65 years old (not shown). Surprisingly, more than 38 thousand people who were hospitalized were listed in the dataset as being asymptomatic.

Table 1 Demographic and clinical features of 4,333,014 SARS-CoV-2-positive individuals in the Italian national database, from January 28, 2020 to July 25, 2021.

To investigate the factors affecting mortality after SARS-CoV-2 infection, we first drew Kaplan–Meier curves for the whole series and, separately, for non-hospitalized and hospitalized patients. For the whole series, the analysis was limited to 4,224,698 persons, after eliminating 108,316 persons with incomplete data (including 1744 persons for whom the date of death was erroneously listed as being before the date of diagnosis or hospitalization). For non-hospitalized and hospitalized persons, these numbers were 3,816,311 and 412,942, respectively. We tested the effects of age, sex, and pandemic wave (first wave, before June 30, 2020, was taken as reference) on the risk of death 30 days after infection detection or hospitalization. Highly significant associations (log-rank test, P < 2 × 10–16) for all three variables were observed, in the whole series and in the two subsets (Fig. 1). The probability of survival decreased with increasing age and was lower for males than females and for persons who were diagnosed in the first wave.

Figure 1
figure 1

Kaplan–Meier survival curves for the entire Italian series of SARS-CoV-2-positive persons (panels AC), for non-hospitalized persons (panels DF), and for hospitalized COVID-19 patients (panels GI) in the Italian national COVID-19 series, by age group, sex, and pandemic wave (1 = before June 30, 2020, 2 = from July 1 to December 31, 2020, and 3 = after December 31, 2020). Crosses denote censored samples. Numbers of patients at risk for each group, at each time point, are reported under each plot. Log-rank test P < 2 × 10–16, except for panel (E) for which P = 1.2 × 10–11.

Since the proportionality hazard assumptions were not verified in our series (Schoenfeld residuals test, P < 2 × 10–16), we carried out weighted multivariable Cox analyses to deal with non-proportional hazards. Multivariable analysis of mortality in the whole series demonstrated that age, male sex, and the first pandemic wave were highly significant, independent risk factors for death (Table 2). This analysis showed a sharp increase in mortality risk with increasing age, especially in persons ≥ 65 years old and with a tremendously high risk [hazard ratio (HR) = 691; 95% CI, 634 to 753] in the age ≥ 85 years group. Children (0–14 years old) showed a much lower risk of death (HR = 0.120) than both the reference age group and the older groups. Male sex was associated with a ~ twofold risk of death (HR = 2.05), compared to females. The first pandemic wave (before June 30, 2020) was associated with an excess risk of death of almost threefold, compared to the subsequent periods (HR = 0.379 and 0.346 in the second and third semesters, respectively).

Table 2 Factors associated with death in the whole Italian series of people infected by SARS-CoV-2.

Multivariable analysis of non-hospitalized individuals, 30 days after diagnosis, also showed that age, sex, and pandemic wave were all independent poor prognostic factors (Table 3). Again, age conferred the highest risk of death, with an HR > 130 in the groups of subjects 65 years or older and HR > 1400 for those 85 years and older. Indeed, the vast majority of deaths (94%) among non-hospitalized patients regarded patients ≥ 65 years old. The risk estimates associated with male sex and pandemic wave are similar to those for the whole series, except for non-hospitalized patients of the third semester, who had an even lower risk of mortality (HR = 0.21).

Table 3 Factors associated with death in non-hospitalized subjects in the Italian national database.

Finally, multivariable Cox analyses of the smaller subgroup of hospitalized patients also showed that age, sex, and pandemic wave were significantly associated with the risk of death 30 days after hospitalization (Table 4). The estimates of the risk of death in this subgroup were lower than in non-hospitalized patients, as evidenced by the smaller values of HR reaching 59 even in the oldest age group, when compared to the reference group. Finally, being hospitalized after the first wave was associated with a lower risk of death (HR was about 0.7 in both semesters after June 30, 2020), but this effect was less intense than that observed among non-hospitalized persons. Also, the poor prognostic role of male sex was confirmed in these hospitalized patients (HR = 1.44).

Table 4 Factors associated with death in hospitalized COVID-19 patients in the Italian national database.

Discussion

The Italian COVID-19 epidemiological surveillance dataset analyzed here contained information on over 4 million persons molecularly diagnosed with a SARS-CoV-2 infection until July 25, 2021. The dataset included information about age, sex, date of diagnosis, presence vs. absence of symptoms, date of hospitalization (if pertinent), date of death (if pertinent), and a few other data. Survival analyses on the whole series and on subsets of non-hospitalized and hospitalized patients strongly confirmed the pivotal role of age in the probability of survival of COVID-19 patients. The analysis by age category, adjusted for sex and pandemic wave, showed that age groups older than 65 had mortality risks that were hundreds of times greater than that of the 15- to 44-year-old reference class. The 0–14 years age group had a mortality risk that was about 10 times less than that of the reference class. Male sex was also confirmed to be a poor prognostic factor, but with a much smaller effect. Additionally, our analysis demonstrated that being diagnosed during the first pandemic wave (until June 2020) was associated with an approximately threefold higher mortality risk than being diagnosed later.

In non-hospitalized patients, the mortality risk associated with age was greater than that for the whole series. This difference might be explained by the observation that most deceased non-hospitalized patients were very old, with a median age of 86 years. As a possible interpretation, we suppose that some elderly persons deteriorated rapidly and died before they could be hospitalized.

In hospitalized patients, old age was associated with an excess risk of death, as in the whole series, although the statistical estimates were lower. For example, for the age group ≥ 85 years old, the HR was 58.7 for hospitalized patients and 687 for the whole series. The difference may be explained by the facts that hospitalized patients were much older than subjects of the whole series (median age, 70 vs. 46 years), and that hospitalization itself poses an excess risk of death, as age is a known risk factor for hospitalization18,19, including in our series (not shown).

Our finding of age being a risk factor for COVID-19 mortality is in agreement with that of a meta-analysis by Shi et al.8 on 27 studies (including 24 from China, two from the United States, and one from Italy) and a meta-analysis by Booth et al.20 on 66 studies with > 17 million patients from 14 countries. Both meta-analyses found an association between old age and excess risk of mortality from COVID-19, although the quantitative risk estimates differ. Of note, these meta-analyses did not report HRs associated with survival, since no Cox analyses were done. To the best of our knowledge, only one other nation-wide study, conducted in France by Semenzato et al.21, used Cox models to analyze the effects of age on the risk of mortality in a large number of hospitalized COVID-19 patients. Although the age groups differ between the two studies, the risk estimates are similar, with HRs > 50 in elderly patients in both studies.

Several immunological mechanisms responsible for the increased risk of death from COVID-19 in the elderly can be hypothesized. One study demonstrated that pre-existing T-cell immunity induced by circulating human alpha- and beta-coronaviruses is present in young adults but virtually absent in older adults22. Consequently, older adults had a minimal baseline frequency of cross-reactive T cells directed toward the novel SARS-CoV-2; for this reason, they may be at higher risk of severe COVID-19 disease and death. Moreover, the phenomenon of immunosenescence, which involves age-related changes in innate and adaptive immunity, has been imputed as being associated with the increased mortality of older adults infected with SARS-CoV-223. The elderly exhibit a deficient immunologic response to SARS-CoV-2 infection, which may be another reason for their increased risk of severe disease and death24.

In the Italian nationwide COVID-19 series, male sex was an unfavorable prognostic factor for survival, with a risk that was twofold higher than for females in the whole series, and > 80% and ~ 50% higher in non-hospitalized and hospitalized patients, respectively. This result is in agreement with those of several other studies8,20, although the quantitative risk estimates differ. The HRs for male sex calculated in this study, which range from 1.46 to 2.05, are similar to those reported by Semenzato et al.21. Additionally, in an analysis of the excess number of deaths, standardized by age in 29 high-income countries, men were more affected than women in almost all countries25.

The mechanism by which sex is an unfavorable prognostic factor for COVID-19 is not yet known. Most likely, several sex-related factors contribute to the higher risk of males for poorer COVID-19 outcomes. A study of 1683 Italian patients who underwent chest computed tomography at admission showed that men had a higher prevalence of cardiovascular comorbidities, more coronary calcifications, and a higher coronary calcium score than females26. Notably, the higher coronary calcific burden of men appeared to be associated with higher mortality. A study of about 3000 COVID-19 patients in a single center in China observed that the level of inflammatory cytokines in peripheral blood was higher in males than in females27. Also, the percentages of CD19 + B cells and CD4 + T cells were generally higher in female patients during the course of the disease. Overall, males had greater inflammation, lower lymphocyte counts, and lower and delayed antibody responses during SARS-CoV-2 infection and recovery than females. Finally, from the perspective of an immunological mechanism, it has been hypothesized that chronic, subclinical, systemic inflammation, characteristic of aging, and immunosenescence contribute to the excess risk of COVID-19 mortality in elderly men28.

Our multivariable analysis provides strong support for the hypothesis that mortality from COVID-19 was much greater during the first wave (January to June 30, 2020) than later. Indeed, taking the first wave as reference, in the subsequent periods we observed a ~ threefold reduced risk of death, both in the whole series and in non-hospitalized patients. In hospitalized patient, the excess risk of death was ~ 30% lower in the two semesters after the first wave. The excess risk of death associated with pandemic wave was first reported in an Italian study of hospitalized patients12, and then confirmed by studies of Massachusetts healthcare workers13, patients of the U.S. Veterans Affairs healthcare system29, and UK patients30. The reasons for this effect could include the initial lack of preparedness of national health systems for pandemic management, the lack of knowledge about the most effective therapies for COVID-19 patients with severe disease, and the possibility that frailer people were more affected at the beginning of the pandemic than the rest of the population31. Another possible explanation for a lower risk of mortality after June 30, 2020, and in particular, during the third semester of the pandemic, compared with the first semester, may be mass vaccination, which began in January 2021; indeed, vaccines are associated with a reduced risk of severe COVID-19 and mortality32.

In the Italian COVID-19 epidemiological surveillance dataset, more than 2 million infected persons were symptomatic (50.4% of all cases). Modeling studies on the prevalence of infection in different populations suggested that the total number of SARS-CoV-2-positive individuals exceeds symptomatic cases by an order of magnitude or more33,34,35. If this holds true for the Italian population, then ~ 20 million people in Italy have been infected by SARS-CoV-2, i.e., 10 times the 2 million symptomatic cases. Why some infections are asymptomatic and others lead to severe COVID-19 has not yet been elucidated. Cross-reactive immunity, pre-existing in individuals who had been exposed to other coronaviruses, could be one of the mechanisms for asymptomatic and moderate courses of SARS-CoV-2 infection in many individuals36.

A limitation of our study is the lack of data about COVID-19 patients’ comorbidities, which are important risk factors for outcome8. Other possible confounders that we cannot consider in our model are, for instance, demographic, geographic and environmental factors37,38. This lack of information prevented us from analyzing other risk factors for death. Moreover, the reasons why some hospitalized patients were classified as asymptomatic are not known, but their hospitalization may have been due to reasons other than COVID-19. For example, in 5432 cases, the date of SARS-CoV-2 infection detection was after the date of hospitalization, and in 13,144 patients the diagnosis was on the same day. An additional limitation could be an underestimation of the number of cases and deaths during the first wave, due to initial unpreparedness of the health system to deal with the crisis caused by the pandemic and, thus, the initially limited testing capacity39. This might have determined a poorer data collection in the first wave as compared to the following periods.

Overall, this study confirms that age and male sex are independent risk factors for COVID-19 mortality for both hospitalized and non-hospitalized patients. Because age was found to be the most impactful negative prognostic factor, it should be considered in pandemic management, by giving priority to strategies aimed at protecting elderly people. Additionally, this is the first country-wide study to demonstrate a higher risk of death during the first pandemic wave than later. Similar nation-wide studies in different countries, to the best of our knowledge, have not been published. Thus, we cannot compare our study with those from other nations with different mortality rates, and we cannot exclude that such differences are due to unequal pandemic management in the first wave, considering that Italy was the first Western nation to be affected. Our study also suggests that the medical research that started with the pandemic onset and that led to the development of increasingly more effective clinical protocols contributed to improving COVID-19 patient survival25. Despite the limitations of this study, principally due to the lack of some clinical data (e.g. about comorbidities and environmental factors), this study demonstrates the usefulness of a national database for studying a new disease such as COVID-19. Efforts should be made in Italy to create a more detailed national database like those of the United Kingdom40 and France41 that collect more data on demographics, symptoms, diagnostic tests and treatments. National health databases, especially when accompanied by a national biobank of blood samples, offer great possibilities for biomedical research. They allow the construction of cohorts with unparalleled statistical power and help study risk factors for common diseases, rare diseases, and new emerging diseases such as COVID-19. Their availability could impact treatment and public health. Therefore, the creation of such databases in countries that do not yet have them and the creation of European databases are desirable.