Introduction

The coronavirus disease of 2019 (COVID-19) pandemic, caused by the novel SARS-CoV-2 virus, was present on the west coast of the United States in either late 2019 or early 2020 [1]. Since then, there have been multiple waves of infection and ongoing viral mutations. According to the United States Centers for Disease Control and Prevention (CDC), as of April 2022, approximately 80 million people had contracted SARS-CoV-2 and 973,000 people had died [2]. In California alone, as of April 2022, there were more than 9,200,000 cases and 89,582 deaths [2]. The initial virus, termed the Founder variant, has since mutated into several other variants, namely Alpha, Delta, and Omicron. With each mutation, the virus has become increasingly transmissible, and Omicron has been associated with fewer hospitalizations and fewer severe lower respiratory tract complications, the primary cause of COVID-19 deaths, than the Delta variant [2].

Despite viral mutation, vaccination remains the best protection against hospitalization and death, and it lessens the risk of post-COVID conditions, or post-acute sequelae of SARS-CoV-2 infection (i.e. long COVID), defined as ongoing COVID-19 symptoms beyond the usual duration of acute disease [3,4,5]. According to the World Health Organization, approximately 63% of the US population is fully vaccinated [6], compared with 66–67% of the populations of the United Kingdom, Canada, and Australia [6]. There are several reasons why vaccine uptake in the US is low. Barriers to vaccination can be structural or attitudinal [7]. According to Fisk [7], structural barriers relate to access (e.g. cost, convenience, and supply chain issues), whereas attitudinal barriers are associated with a low perceived risk of acquiring the disease or of severe consequences from it, perceived risks of the vaccine, lack of trust in the agencies responsible for developing and distributing vaccines, misinformation, or misconceptions [7]. Ongoing research is therefore necessary to address the attitudinal barriers held by members of the population and so raise vaccine uptake to the 70–90% target needed to achieve herd immunity, thereby preventing and mitigating the ongoing morbidity and mortality, societal disruption, and long-term health consequences [8].

Severe COVID-19 infection may lead to viral pneumonia and acute respiratory distress syndrome (ARDS). Wu et al. [9] found that 42% of patients with COVID-19 pneumonia developed ARDS. Beyond acute complications such as ARDS, a recent publication from the National Health Service in the United Kingdom suggested that chronic cough, respiratory fatigue, and fibrotic lung disease complicate long-term recovery from COVID-19 [10]. The National Health Service [11] also examined long-term symptoms of diseases caused by other coronaviruses, such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), and found that up to 30% of patients had persistent lung abnormalities after recovering from the acute illness [11]. Furthermore, Huang et al. [12] linked respiratory symptom clusters with a higher risk of long-term, or “long-haul”, COVID-19. Collectively, these studies illustrate the importance of the respiratory system in SARS-CoV-2 infection and subsequent COVID-19 disease.

The purpose of this study was to assess the prevalence of upper and lower respiratory tract symptoms across SARS-CoV-2 variants, to determine the effect of vaccination status on the symptoms, and to evaluate the risks of mortality based on variant type and vaccination status. A retrospective review of 55,406 medical records from hospitalized patients with confirmed SARS-CoV-2 infection within the University of California Health COVID Research Data Set (UC CORDS) was performed.

Methods

University of California Health COVID Research Data Set (UC CORDS)

The UC CORDS data set comprises de-identified health data from all facilities in the University of California (UC) Health system, which encompasses 19 health professional schools, five academic medical centers, and 12 hospitals [13]. It contains the records of more than 700,000 patients, both hospitalized and outpatient, with de-identified information to enable safe and secure clinical research. The data in the UC CORDS database are stored in an online enclave and are not available to the general public; only those granted access by the University of California may use UC CORDS. The University of California, Irvine, Institutional Review Board deemed this study exempt from ethical approval. All methods were carried out in accordance with relevant university guidelines and regulations. Because the UC CORDS data set contains only de-identified data from individuals seeking care in the University of California Health System, the Institutional Review Board at the University of California, Irvine, waived the need for informed consent. No experimental protocols were used in this study of de-identified data; therefore, no approvals were sought from the University.

Variant and vaccination status

The UC CORDS data set did not report variant type; therefore, variants were identified based on the dates when each variant was dominant, as reported by the CDC data tracker [2]. Although the CDC data tracker contains national-level data, daily trends in the number of COVID-19 cases from the California Department of Public Health were not different enough to warrant focusing on California-specific data, as shown in Supplementary Information Figs. 1 and 2 [2,14]. Moreover, Wang et al. [3] previously used this method to classify COVID-19 patients in their analysis of outcomes in the US population; although it lacks the precision of laboratory-confirmed variant identification, it reflects the dominant variant waves that infected the US population at successive time points throughout the pandemic [3]. Accordingly, the date ranges extended from 01/01/2020 to 06/30/2020 for the Founder variant, 06/30/2020 to 05/31/2021 for the Alpha variant, 06/01/2021 to 11/30/2021 for the Delta variant, and 12/01/2021 to 04/26/2022 for the Omicron variant. Although COVID-19 vaccines were not available until December 2020, the variant waves that preceded vaccine availability were included, primarily to depict the evolution of symptom presentation with each variant rather than to compare fully vaccinated, partially vaccinated, and unvaccinated patients. Patients who received at least two doses of the vaccine before their positive test result were considered fully vaccinated, those who received one dose before their positive test result were considered partially vaccinated, and those who received no vaccine were considered unvaccinated.
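Because variant assignment here is purely date-based, it reduces to a simple lookup. The following Python sketch is illustrative and is not the authors' code: the names `VARIANT_WINDOWS`, `assign_variant`, and `vaccination_status` are hypothetical, the date bounds are treated as inclusive, 06/30/2020 (which appears in both the Founder and Alpha ranges) is assigned to Founder only because the ranges are checked in order, and a dose is assumed to count only if it was given strictly before the test date.

```python
from datetime import date

# Dominant-variant date windows as listed in the text, treated here as inclusive.
# 2020-06-30 appears in both the Founder and Alpha ranges; because the list is
# checked in order, this sketch assigns that date to Founder (an assumption).
VARIANT_WINDOWS = [
    ("Founder", date(2020, 1, 1), date(2020, 6, 30)),
    ("Alpha", date(2020, 6, 30), date(2021, 5, 31)),
    ("Delta", date(2021, 6, 1), date(2021, 11, 30)),
    ("Omicron", date(2021, 12, 1), date(2022, 4, 26)),
]


def assign_variant(test_date):
    """Return the presumed dominant variant on test_date, or None outside all windows."""
    for name, start, end in VARIANT_WINDOWS:
        if start <= test_date <= end:
            return name
    return None


def vaccination_status(dose_dates, test_date):
    """Classify a patient by the number of doses received strictly before the positive test."""
    doses_before = sum(1 for dose in dose_dates if dose < test_date)
    if doses_before >= 2:
        return "fully vaccinated"
    if doses_before == 1:
        return "partially vaccinated"
    return "unvaccinated"


print(assign_variant(date(2021, 8, 15)))  # Delta
print(vaccination_status([date(2021, 3, 1), date(2021, 3, 29)], date(2021, 8, 15)))  # fully vaccinated
```

Under this scheme a single-dose adenovirus recipient would be labeled partially vaccinated, which matches the two-dose definition used in the text.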

Inclusion and exclusion criteria and sample

The study involved a review of electronic health record data in the UC CORDS data set. Inclusion criteria were all patients, regardless of age, who had a positive test anywhere in the hospital setting (i.e. emergency department, intensive care unit, or any other hospital unit). For each positive RT-PCR test for SARS-CoV-2, respiratory feature data were included from 5 days before to 30 days after the test result. Exclusion criteria were data obtained from non-hospital (i.e. clinic-based) outpatient settings and a positive RT-PCR SARS-CoV-2 test outside the predetermined window.
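The feature window around each positive test can be expressed as a date-offset check. This is a hypothetical illustration (`in_feature_window` and `features_in_window` are not part of any UC CORDS tooling); it assumes each record is a (feature, date) pair and that both edges of the window are inclusive.

```python
from datetime import date

WINDOW_BEFORE_DAYS = 5   # features recorded up to 5 days before the positive test
WINDOW_AFTER_DAYS = 30   # ...and up to 30 days after it


def in_feature_window(feature_date, test_date):
    """True if a feature recorded on feature_date falls inside the inclusion window."""
    offset = (feature_date - test_date).days  # negative = before the test
    return -WINDOW_BEFORE_DAYS <= offset <= WINDOW_AFTER_DAYS


def features_in_window(records, test_date):
    """Keep the features from (feature, date) records that fall inside the window."""
    return [feature for feature, recorded in records if in_feature_window(recorded, test_date)]


records = [
    ("cough", date(2021, 7, 5)),       # 5 days before the test: included
    ("pneumonia", date(2021, 7, 20)),  # 10 days after: included
    ("dyspnea", date(2021, 8, 15)),    # 36 days after: excluded
]
print(features_in_window(records, date(2021, 7, 10)))  # ['cough', 'pneumonia']
```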

Demographic data, including age, gender, and race/ethnicity, were obtained by searching the data set for each variable of interest. Race and ethnicity data were included to ensure that the data set contained a representative sample of the general population in California. Comorbidity data were obtained by searching the data set for the 200 most common ICD-10 codes listed for the patients and then filtering further by consolidating duplicate terms (e.g. “chronic obstructive pulmonary disease (COPD)” and “COPD”) and removing terms that were not pertinent to this study (e.g. pregnancy).
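A minimal sketch of the consolidation step, assuming the top-200 comorbidity terms have already been retrieved as a list of raw terms per patient. `SYNONYMS`, `EXCLUDED`, and both function names are hypothetical; only the COPD pair and the pregnancy exclusion come from the text. As a design choice of this sketch, each canonical term is counted once per patient, so a patient coded with both COPD descriptions is not double-counted in the percentages.

```python
from collections import Counter

# Hypothetical synonym map: raw ICD-10 description -> canonical comorbidity term.
# Only the COPD pair is taken from the text; a real map would be curated by hand.
SYNONYMS = {
    "chronic obstructive pulmonary disease (COPD)": "COPD",
}

# Hypothetical exclusion list for terms not pertinent to the study (case-insensitive).
EXCLUDED = {"pregnancy"}


def canonical_terms(patient_terms):
    """Map one patient's raw terms to a de-duplicated set of canonical, pertinent terms."""
    merged = {SYNONYMS.get(term, term) for term in patient_terms}
    return {term for term in merged if term.lower() not in EXCLUDED}


def comorbidity_percentages(patients):
    """Percentage of patients with each comorbidity; each term counts once per patient."""
    counts = Counter()
    for terms in patients:
        counts.update(canonical_terms(terms))
    return {term: 100 * count / len(patients) for term, count in counts.items()}


patients = [
    ["COPD", "chronic obstructive pulmonary disease (COPD)"],
    ["hypertension", "Pregnancy"],
    ["COPD"],
    [],
]
print(comorbidity_percentages(patients))  # {'COPD': 50.0, 'hypertension': 25.0}
```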

Respiratory feature identification and extraction

The 40 most reported features across all body systems were extracted for each variant through a query of the UC CORDS data set. Forty features were chosen as an initial cutoff based on previous work that found the number of features may range from as few as 17 [15] to as many as 50 [16]. In this study, the total number of features per variant ranged from 27 during the Founder wave to 34 during the Delta wave. Because every variant yielded fewer than 40 features, the cutoff did not truncate the results, and no features remained undiscovered.

The preliminary search to identify prominent respiratory features involved running a query for the 40 most frequent ICD-10 codes among all patients in the data set. The term “features” is used in place of ICD-10 codes because the results differ in nature: some ICD-10 codes are medical diagnoses (e.g. acute respiratory failure or pneumonia), others are symptoms that a patient reports (e.g. cough or nasal congestion), and still others are signs that a medical provider assesses (e.g. dyspnea). To assess the face validity of the retrieved data and their consistency with prior work, the selected features were then compared with previous studies describing the most prevalent signs and symptoms (i.e. “features”) affecting people with acute and post-acute SARS-CoV-2 infection [15,16]. Among the most reported features for each variant, non-respiratory features were labeled “unclassified”, and the remaining respiratory features were classified as upper or lower respiratory features through expert consultation and discussion within the research group. All features, their ranking, and their classifications are provided in Supplementary Information Tables 1, 2, 3 and 4. The reported frequency of each feature was then normalized per 100 cases.
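The ranking, classification, and per-100 normalization steps can be sketched as follows. This is an illustration under stated assumptions rather than the study's query: the upper and lower respiratory sets are copied from the feature lists reported in the Results (in the study itself, classification was made by expert consensus), the function names are hypothetical, and the input counts in the example are invented.

```python
from collections import Counter

# Feature lists as reported in the Results; anything else is "unclassified".
UPPER = {"acute pharyngitis", "acute upper respiratory tract infection", "cough",
         "disorder of nasal cavity", "nasal congestion"}
LOWER = {"abnormal lung findings", "acute respiratory failure", "dyspnea",
         "hypoxemia", "pneumonia"}


def classify(feature):
    """Label a feature as upper, lower, or unclassified (non-respiratory)."""
    if feature in UPPER:
        return "upper"
    if feature in LOWER:
        return "lower"
    return "unclassified"


def top_features_per_100(feature_counts, n_cases, top_n=40):
    """Top-n features by count, each with its class and frequency per 100 cases."""
    return [
        (feature, classify(feature), 100 * count / n_cases)
        for feature, count in Counter(feature_counts).most_common(top_n)
    ]


# Invented counts for a hypothetical group of 400 cases:
print(top_features_per_100({"cough": 120, "fever": 90, "pneumonia": 60}, n_cases=400))
# [('cough', 'upper', 30.0), ('fever', 'unclassified', 22.5), ('pneumonia', 'lower', 15.0)]
```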

Statistical analysis

Odds ratios were used to estimate the risk of death for each variant by vaccination status, and adjusted odds ratios were calculated controlling for age and gender. Age was represented as a continuous variable, and men were the reference category for gender. Because group sizes between the fully vaccinated and partially vaccinated were disproportionate, the groups were balanced using the scikit-learn package (version 1.1.3) by randomly sampling patients from the larger group to match the size of the smaller group. The subsampled data were then fitted with a scikit-learn logistic regression model, with age and gender as covariates, to obtain adjusted odds ratios for mortality.

The two-tailed chi-square test was used to study the association between vaccination status and the respiratory symptoms of COVID-19 across variants. Chi-square tests assess the relationship between categorical variables using a contingency table; here, each table compared the number of patients who did and did not report a particular symptom between fully and not fully vaccinated patients, and separate tables were created for each variant. A p-value < 0.05 was considered statistically significant. Analyses were conducted using Python (version 3.6) and the SciPy package (version 1.8.0).
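The balancing and adjustment steps can be sketched in Python as below. This is not the authors' code: the study used scikit-learn, whereas this sketch fits the logistic model directly with NumPy by Newton-Raphson so that the coefficients are unpenalized maximum-likelihood estimates (scikit-learn's `LogisticRegression` applies an L2 penalty by default, with `C=1.0`, and the text does not say whether it was disabled). The function names, the 0/1 coding of the inputs, and the fixed random seed are assumptions. The adjusted odds ratio is the exponentiated coefficient of the vaccination indicator.

```python
import numpy as np


def balanced_indices(group, rng):
    """Indices that randomly downsample the larger of two groups (coded 0/1) to the smaller's size."""
    idx0 = np.flatnonzero(group == 0)
    idx1 = np.flatnonzero(group == 1)
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([
        rng.choice(idx0, size=n, replace=False),
        rng.choice(idx1, size=n, replace=False),
    ])
    return np.sort(keep)


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Unpenalized logistic regression by Newton-Raphson; returns [intercept, coefficients...]."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # predicted probabilities
        gradient = X.T @ (y - p)                         # score of the log-likelihood
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def adjusted_mortality_or(died, unvaccinated, age, female, seed=0):
    """Age- and gender-adjusted odds ratio of death, unvaccinated (1) vs comparison group (0).

    Inputs are NumPy arrays restricted to the two groups being compared. died, unvaccinated
    and female are 0/1 indicators (female = 0 means men, the reference category); age is
    continuous, in years.
    """
    rng = np.random.default_rng(seed)
    keep = balanced_indices(unvaccinated, rng)
    X = np.column_stack([unvaccinated[keep], age[keep], female[keep]])
    beta = fit_logistic(X, died[keep])
    return float(np.exp(beta[1]))  # beta[0] is the intercept; beta[1] the vaccination term
```

Because each call draws a different subsample for a different `seed`, repeating the estimate over several seeds would show how sensitive the adjusted odds ratio is to the particular draw.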

Results

Demographics and comorbidities

Tables 1, 2, 3 and 4 provide demographic information for the 55,406 included patients by variant and vaccination status (Founder, n = 2319; Alpha, n = 16,753; Delta, n = 7280; Omicron, n = 29,054). Across all variants, the fully vaccinated population was the oldest, followed by the partially vaccinated population, while the unvaccinated population was the youngest. Each major variant wave included more females than males, and the sample was predominantly White, followed by Hispanic or Latino, then Asian, African-American, Native Hawaiian or Pacific Islander, American Indian or Alaskan Native, and Unknown or other.

Table 1 Demographic information of patients with a positive SARS-CoV-2 result during the Founder variant wave (n = 2319 SARS-CoV-2 infections).
Table 2 Demographic information of patients with a positive SARS-CoV-2 result during the Alpha variant wave (n = 16,753 SARS-CoV-2 infections).
Table 3 Demographic information of patients with a positive SARS-CoV-2 result during the Delta variant wave (n = 7280 SARS-CoV-2 infections).
Table 4 Demographic information of patients with a positive SARS-CoV-2 result during the Omicron variant wave (n = 29,054 SARS-CoV-2 infections).

Tables 5, 6, 7 and 8 provide patient comorbidity data by variant. Across all variants, the fully vaccinated population had a higher frequency of chronic conditions such as anemia, atrial fibrillation, COPD, cancer, gastroesophageal reflux disease (GERD), heart failure, hypertension, immunocompromised status, kidney disease, obesity, and type 2 diabetes mellitus.

Table 5 Comorbidity information of patients with a positive SARS-CoV-2 result during the Founder variant wave (n = 2319 SARS-CoV-2 infections) expressed as frequency percentage.
Table 6 Comorbidity information of patients with a positive SARS-CoV-2 result during the Alpha variant wave (n = 16,753 SARS-CoV-2 infections) expressed as frequency percentage.
Table 7 Comorbidity information of patients with a positive SARS-CoV-2 result during the Delta variant wave (n = 7280 SARS-CoV-2 infections) expressed as frequency percentage.
Table 8 Comorbidity information of patients with a positive SARS-CoV-2 result during the Omicron variant wave (n = 29,054 SARS-CoV-2 infections) expressed as frequency percentage.

Respiratory feature frequency

Figures 1 and 4 show the normalized frequencies of aggregated respiratory features, upper respiratory features, and lower respiratory tract features by variant and vaccination status. Upper respiratory tract features included acute pharyngitis, acute upper respiratory tract infection, cough, disorder of nasal cavity, and nasal congestion. Lower respiratory tract features included abnormal lung findings, acute respiratory failure, dyspnea, hypoxemia, and pneumonia. Figures 2 and 5 show the normalized frequencies of individual upper respiratory features, and Figures 3 and 6 those of individual lower respiratory features, by variant and vaccination status. Of note, the frequency of lower respiratory tract features decreased with successive variants while upper respiratory tract features increased.

Figure 1 Comparison of frequency of most common respiratory features in COVID positive patients based on fully vaccinated or unvaccinated status.

Figure 2 Comparison of frequency of most common upper respiratory features in COVID positive patients based on fully vaccinated or unvaccinated status.

Figure 3 Comparison of frequency of most common lower respiratory features in COVID positive patients based on fully vaccinated or unvaccinated status.

Figure 4 Comparison of frequency of most common respiratory features in COVID positive patients based on partially vaccinated or unvaccinated status.

Figure 5 Comparison of frequency of most common upper respiratory features in COVID positive patients based on partially vaccinated or unvaccinated status.

Figure 6 Comparison of frequency of most common lower respiratory features in COVID positive patients based on partially vaccinated or unvaccinated status.

Mortality

Tables 9, 10 and 11 provide mortality data by vaccination status for all patients in the study across the four major variants. Table 9, which compares fully vaccinated and unvaccinated individuals, shows no statistically significant difference in mortality between the groups in the unadjusted analysis. In the adjusted analysis, however, unvaccinated individuals had significantly higher odds of death during the Delta and Omicron waves, with adjusted odds ratios of 1.21 and 1.17, respectively. There were also substantially more patients in the unvaccinated group (n = 1344) than in the fully vaccinated group (n = 158).

Table 9 Comparison of mortality rate and mortality odds ratio by variants between fully vaccinated and unvaccinated individuals.
Table 10 Comparison of mortality rate and mortality odds ratio by variants between partially vaccinated and unvaccinated individuals.
Table 11 Comparison of mortality rate and mortality odds ratio by variants between fully or partially vaccinated individuals and unvaccinated individuals.

Table 10 shows mortality data across all variants by partially vaccinated or unvaccinated status. In the unadjusted analysis, unvaccinated individuals had a significantly higher likelihood of death than partially vaccinated individuals during the Alpha and Delta waves (Alpha OR: 1.98, p = 0.022; Delta OR: 3.26, p = 0.009). In the adjusted analysis, however, there were no statistically significant differences in mortality between partially vaccinated and unvaccinated individuals. Again, there were substantially more individuals in the unvaccinated group (n = 1381) than in the partially vaccinated group (n = 51).

Table 11 shows mortality data across all variants for individuals who were either fully or partially vaccinated compared with unvaccinated individuals. In the unadjusted analysis, unvaccinated individuals had a higher risk of death during the Delta wave than individuals who were either partially or fully vaccinated (Delta OR: 1.62, p = 0.012). In the adjusted analysis, unvaccinated individuals were significantly more likely to die than individuals who received any vaccination (Delta adjusted OR: 1.17, p = 0.024; Omicron adjusted OR: 1.2, p = 0.003). Again, there were more individuals in the unvaccinated group (n = 1344) than in the combined partially or fully vaccinated group (n = 209).

Effect of vaccination status

Tables 12, 13 and 14 present the effect of vaccination status on respiratory features for each variant wave. Across all variants, a total of 11,556 individuals were fully vaccinated, 4207 were partially vaccinated, and 39,646 were unvaccinated. Tables 12 and 13 show that unvaccinated individuals had increased odds of many upper and lower respiratory features compared with fully and partially vaccinated individuals, respectively, and Table 14 shows the same pattern compared with individuals who received any vaccination.
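Each comparison in Tables 12, 13 and 14 rests on a 2×2 contingency table (vaccination group by presence or absence of a feature). The sketch below computes Pearson's chi-square statistic and the odds ratio for such a table using only the Python standard library; it is an illustration, not the study's code, the function names are hypothetical, and the counts in the example are invented. For one degree of freedom, the upper-tail probability of a chi-square statistic x equals `erfc(sqrt(x / 2))`, so no statistics package is needed. SciPy's `scipy.stats.chi2_contingency` returns the same statistic when called with `correction=False`; by default it applies Yates' continuity correction to 2×2 tables, and the text does not state which form the study used.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]: (a/b) / (c/d) = (a*d) / (b*c)."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for a 2x2 table.

    Rows: group 1 (e.g. unvaccinated) and group 2 (e.g. fully vaccinated).
    Columns: feature present, feature absent.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: 30 of 100 unvaccinated vs 10 of 100 vaccinated patients with a feature.
statistic, p = chi_square_2x2(30, 70, 10, 90)  # statistic = 12.5, p ≈ 0.0004
ratio = odds_ratio(30, 70, 10, 90)             # ≈ 3.86
```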

Table 12 Chi-square tests comparing feature significance in fully vaccinated versus unvaccinated individuals across the Alpha, Delta and Omicron waves.
Table 13 Chi-square tests comparing feature significance in partially vaccinated versus unvaccinated individuals across the Alpha, Delta and Omicron waves.
Table 14 Chi-square tests comparing feature significance in any vaccinated versus unvaccinated individuals across the Alpha, Delta and Omicron waves.

Discussion

Acute COVID-19 infection causes numerous respiratory disorders; as such, it is necessary to investigate its effects across the respiratory system as new variants emerge. Because of the predilection of SARS-CoV-2 for the lungs and respiratory system, this study focused on the direct respiratory consequences over the course of the pandemic by examining the frequency of the most common features associated with the dominant SARS-CoV-2 variants. Additionally, this study sought to assess the effect of vaccination status on respiratory features. Several observations warrant further discussion.

First, in the adjusted analysis, there was a statistically significant difference in mortality between fully vaccinated and unvaccinated individuals during the Delta and Omicron waves. In the unadjusted analysis, partially vaccinated individuals had significantly lower mortality than unvaccinated individuals during the Alpha and Delta waves, although this difference was not significant after adjustment. After combining fully and partially vaccinated individuals and comparing them with unvaccinated individuals, the adjusted analysis showed statistically significant reductions in mortality during the Delta and Omicron waves. The differences between the unadjusted and adjusted analyses suggest that age and gender confound the relationship between the independent variable (i.e. vaccination status) and the outcome variable (i.e. mortality). These findings also correspond to those in Tables 12 and 13, which show significant differences in many prominent features, such as pneumonia, acute respiratory failure, and hypoxemia, between fully or partially vaccinated individuals and unvaccinated individuals.

Second, ARDS and respiratory failure are major causes of death from COVID-19. We found that as the virus evolved, lower respiratory tract features such as pneumonia, hypoxemia, and acute respiratory failure became less frequent. This finding may be due to increasing vaccination rates, reduced virulence of successive variants, improved management of patients with acute COVID-19 infection, immunity acquired by those who were reinfected, or a combination of these factors. The drastic reduction in pneumonia across variants may best demonstrate this phenomenon: during the Founder wave, pneumonia was reported in 36.83 per 100 cases, whereas during the Omicron wave, its frequency fell to 3.59 per 100 cases in fully vaccinated patients and 4.42 per 100 cases in unvaccinated patients. Moreover, the statistically significant differences in pneumonia frequency between fully vaccinated, partially vaccinated, and unvaccinated patients support the substantial benefit of vaccination in preventing severe disease. The reduction in lower respiratory features alongside the increased mortality observed during the Delta period may appear contradictory, but only if one assumes that these patients died from ARDS. Unfortunately, the UC CORDS data set does not contain the specific cause of death, so it is difficult to determine whether a patient died of ARDS, another sequela of COVID-19 infection (e.g. myocardial infarction), or some other pathology altogether.
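Expressed as a relative reduction from the Founder-wave rate (a simple calculation on the per-100-case figures above, which ignores that the Founder wave preceded vaccine availability), pneumonia frequency during the Omicron wave was roughly 90% lower in fully vaccinated patients and 88% lower in unvaccinated patients:

```latex
\frac{36.83 - 3.59}{36.83} \approx 0.903 \quad (\text{fully vaccinated}),
\qquad
\frac{36.83 - 4.42}{36.83} \approx 0.880 \quad (\text{unvaccinated})
```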

Third, the respiratory features observed with each variant have evolved. In the current study, the incidence of lower respiratory tract features decreased with each successive variant, with a concurrent increase in upper respiratory tract features. For example, Figs. 1 and 4 show a general decrease in the frequency of lower respiratory features with each successive variant, whereas the overall frequency of upper respiratory features follows a more consistent pattern. Figures 2 and 5 show an uptick in the frequency of acute pharyngitis, acute upper respiratory infection, disorder of nasal cavity, and nasal congestion during the Delta and Omicron waves, and Figs. 3 and 6 depict steep declines in all lower respiratory features across successive variants. These findings are consistent with other studies, such as that by Wang et al. [3], which showed that the Omicron variant was associated with lower 3-day risks of emergency department visits, hospitalization, intensive care unit admission, and mortality, attributed to Omicron being less virulent in causing lung-related disease [3].

In contrast to the decreasing lower respiratory symptoms observed with later SARS-CoV-2 variants, upper respiratory symptoms showed a notable increasing trend in this study. In particular, acute pharyngitis, acute upper respiratory infection, and cough all either increased or remained elevated during the later stages of the pandemic. These findings suggest that infection with more recent SARS-CoV-2 variants produces more upper airway features than lower airway features. More research is needed to confirm this, but these findings are congruent with animal studies [17,18].

Limitations

This study has several limitations. First, the retrospective design is biased towards patients who sought care or required hospitalization for COVID-19; these data therefore do not include patients who did not seek care. Second, this study did not account for the timing of vaccination relative to the onset of COVID-19. Some patients may have received all vaccine doses but developed COVID-19 before they had mounted a sufficient antibody response. Moreover, UC CORDS does not record illness severity at the time of a positive test, so a patient who tested positive may have had few symptoms or many. Third, although the UC CORDS data set contains the records of over 2500 patients infected with the Founder variant, the virus was still novel at this stage of the pandemic and the UC CORDS database had not yet been fully set up; therefore, some records of patients infected with the Founder variant may be incomplete or missing. Fourth, symptom selection relied on expert identification of symptoms and the corresponding ICD-10 codes in the electronic health record, so some diagnoses may have existed but not been captured in the UC CORDS database. In addition, both “pneumonia” and “viral pneumonia” ranked among the top 40 features, and because of their similar presentation, “viral pneumonia” was combined with “pneumonia”; because of potential overlap between the two, the true frequency of pneumonia may have been slightly lower than reported in this study. Fifth, genomic data on the viral strains are unavailable in the UC CORDS data set, so it was necessary to rely on the CDC’s data tracker to determine the periods in which a particular strain was most likely dominant in the US; it is highly probable that strains overlapped as newer strains became dominant. Sixth, some patients were likely reinfected, so their records may have been counted twice, leading to inconsistencies in the total number of cases included in the study. Lastly, this study did not treat prior COVID-19 infection, or prior vaccination plus a booster dose, as conferring prior immunity; therefore, some patients in the not fully vaccinated group may have retained some level of immunity from a previous COVID-19 infection.

Future research

This study primarily focused on the most common respiratory features of patients with COVID-19 while accounting for vaccination status. Future studies should examine these frequency data while controlling for the timing of vaccine administration. Furthermore, because vaccination guidelines changed throughout the pandemic, the data should be re-examined under the latest vaccine guidelines (e.g. in this study, fully vaccinated was defined as at least two doses; however, some patients may have received an adenovirus vaccine, which initially required only one dose, while others may have received an mRNA vaccine, which initially required two doses). As guidelines continue to shift and the virus continues to mutate, future research should account for the changing guidelines and the evolving definition of “fully vaccinated”.

Additionally, as ongoing research demonstrates the significant long-term effects of COVID-19 infection in the form of post-acute sequelae of SARS-CoV-2 infection (PASC), it is imperative to assess how acute COVID-19 infection manifests in patients over the long term. Of particular concern is how the different variants may be associated with the development of PASC symptoms. Finally, as the COVID-19 pandemic continues to cause immense problems for patients and the healthcare system alike, it is essential to examine historical data to help inform present and future decision-making.

Conclusion

This retrospective study examined the frequency of respiratory features across the four major variants since the outset of the COVID-19 pandemic. Patients were also categorized by vaccination status, and mortality risk was assessed. This study found significant reductions in the risk of mortality for vaccinated patients during the Delta period. Later variants, such as Omicron, were associated with substantially fewer lower respiratory features, and as the frequency of lower respiratory features decreased, there was a substantial uptick in the frequency of upper respiratory features. This study also showed substantial benefits of full vaccination: compared with unvaccinated or only partially vaccinated patients, fully vaccinated patients experienced significantly fewer features involving the upper and lower respiratory tract. Overall, these findings indicate that, because of numerous factors including viral evolution, enhanced immunity, and likely improved treatment modalities, lower respiratory tract features are reported less frequently than in earlier stages of the pandemic.