Introduction

The coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected almost every corner of the world. As of November 10th, 2020, almost 50 million cases have been confirmed, including more than 1.2 million deaths, according to a report by the World Health Organization (WHO)1, and the numbers of cases and deaths are expected to continue to rise. The clinical spectrum of COVID-19 appears to be wide, encompassing asymptomatic infection, mild upper respiratory illness, neurological symptoms, renal and gastrointestinal complications, severe viral pneumonia with respiratory failure, multiple organ failure and even death2,3,4,5. Approximately 20–25% of patients will have a severe disease course.

Despite numerous studies showing a higher risk of severe COVID-19 in elderly patients, a substantial proportion of young patients also have an increased risk of developing a severe course. According to a report from the U.S. CDC, 47% of hospitalized patients are under the age of 65, as are 48% of those admitted to intensive care units (ICU)6. Although potential risk factors for mortality were reported to include advanced age, male gender, presence of comorbidities, the development of a cytokine storm and an immunocompromised status8,9, the risk factors for the development of a severe course specific to young patients (≤ 60 years old) remain under investigation. Of note, to date, a large proportion of studies applied logistic regression to analyze risk factors related to COVID-19 infection, assuming an underlying causal linear influence on the log odds2,3,4,5. However, given the intricate complexity of COVID-19 infection, statistical analysis with the consideration of non-linear relationships might provide more insightful information on COVID-19 related potential risk factors.

In an effort to fill these gaps, an important aim of this single-center study was to analyze clinical, demographic and treatment data of patients sequentially admitted into the Wuhan No.1 hospital, in an attempt to elucidate risk factors and main causes among young COVID-19 patients for experiencing a severe disease course. A further aim was to use the large amount of data available in this study to foster knowledge about general, age-independent, and non-linear relational risk factors for a severe disease course.

Methods

Study design and participants

This retrospective, single-center cohort study involved adult patients who were diagnosed with COVID-19 pneumonia between January 24th and March 27th, 2020, at Wuhan No.1 Hospital, the major government-designated hospital in Wuhan. The date of the last follow-up was April 8th, 2020. The primary outcome was severity at the end of the study period. All patients were residents of Wuhan, and the diagnostic criteria for COVID-19 were based on the Diagnosis and Treatment Protocol for the 2019 Novel Coronavirus Pneumonia published by the National Health Commission of China.

The newly diagnosed patients were required to meet one of the following conditions: (1) positive signals of COVID-19 nucleic acids detected in fluorescent real-time RT-PCR; (2) viral gene sequencing showing a high degree of homology with the new coronavirus COVID-19. Patients with mild symptoms were required to meet the following conditions: (1) history of epidemiology; (2) fever or other respiratory symptoms; (3) CT image abnormalities typical of viral pneumonia. Patients with a severe condition met one of the following conditions: (1) shortness of breath, respiratory rate ≥ 30 breaths/min; (2) oxygen saturation (resting state) ≤ 93%; (3) PaO2/FIO2 ≤ 300 mm Hg. Critically ill patients were required to meet one of the following conditions: (1) respiratory failure requiring mechanical ventilation; (2) shock; (3) organ failure requiring ICU monitoring.
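The severity criteria above can be read as a simple decision rule. The following Python sketch illustrates one possible encoding under stated assumptions: a missing measurement is treated as not meeting the corresponding criterion, criterion (1) for a severe condition is reduced to the respiratory rate threshold, and every confirmed patient who is neither severe nor critical is labelled mild (the mild criteria themselves are not checked). The names `Findings`, `classify` and `outcome` are hypothetical and do not stem from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    """Hypothetical record of clinical findings; field names are illustrative."""
    respiratory_rate: Optional[float] = None   # breaths/min
    spo2_resting: Optional[float] = None       # oxygen saturation at rest, %
    pao2_fio2: Optional[float] = None          # PaO2/FiO2, mm Hg
    mechanical_ventilation: bool = False       # respiratory failure requiring ventilation
    shock: bool = False
    organ_failure_icu: bool = False            # organ failure requiring ICU monitoring


def classify(f: Findings) -> str:
    """Return 'critical', 'severe' or 'mild' following the criteria in the text."""
    # Critical illness: any one of the three conditions is sufficient.
    if f.mechanical_ventilation or f.shock or f.organ_failure_icu:
        return "critical"
    # Severe condition: any one threshold is sufficient.
    # Assumption: a missing value (None) does not meet the criterion.
    severe = (
        (f.respiratory_rate is not None and f.respiratory_rate >= 30)
        or (f.spo2_resting is not None and f.spo2_resting <= 93)
        or (f.pao2_fio2 is not None and f.pao2_fio2 <= 300)
    )
    return "severe" if severe else "mild"


def outcome(f: Findings) -> str:
    """Binary outcome 'severe versus mild': severe and critical are merged."""
    return "mild" if classify(f) == "mild" else "severe"
```

For example, `outcome(Findings(spo2_resting=92))` evaluates to `"severe"`, because a resting oxygen saturation of 92% meets the ≤ 93% threshold.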

The following data were collected on admission: age, sex, symptoms from onset to hospital admission (fever, cough, dyspnea, myalgia, rhinorrhea, arthralgia, chest pain, headache, and vomiting), comorbidities (cardiovascular disease, chronic pulmonary disease, cerebrovascular disease and chronic neurological disorders, diabetes, malignancy, and smoking), vital signs (heart rate, respiratory rate, and blood pressure), laboratory values on admission (serum hemoglobin concentration, lymphocyte counts, platelet counts, diverse protein markers), treatment regime used for COVID-19 pneumonia (antiviral agents, antibacterial agents, and Chinese medicine), date of symptom onset, admission, virus testing, CT-scan, as well as condition improvement and living status. The study was approved by the Ethics Committee of Wuhan No.1 Hospital (No. 202008).

Treatment protocol for SARS-CoV-2 pneumonia

The treatment strategy for patients with COVID-19 pneumonia was based on the guidelines of the WHO10, which included symptom relief, treatment of underlying diseases, prevention of superimposed bacterial infections, active prevention of complications such as sepsis and acute respiratory distress syndrome (ARDS), and timely support of vital organ function. Oxygen supplementation was provided for patients with desaturation by means of high-flow oxygen via nasal prongs, non-invasive or invasive mechanical ventilation, or extracorporeal membrane oxygenation (ECMO) if required.

Statistical considerations

The outcome in this study was whether or not the patients experienced a severe to critical course of disease. This outcome will be denoted “severe versus mild” in the following. All eligible variables were considered as covariates potentially influencing the outcome in the statistical analysis (supplement section “In-depth description of the statistical analysis flow”). Univariable analysis was performed using logistic regression analysis, where p values were adjusted for multiple testing by means of the Benjamini–Hochberg procedure. The popular multivariate imputation by chained equations (MICE) approach11 was used to deal with missing values in the multivariable analyses, where 20 imputed data sets were used in each analysis (m = 20). The ratio of C-reactive protein (CRP) versus serum albumin (ALB) correlated very strongly with CRP, and the white blood cell count (WBC) correlated very strongly with absolute neutrophil count (ANC) (ρ > 0.9), which is why CRP and ANC were not considered in the multivariable analysis. The latter analysis was performed separately for the young patients and for all patients together (1) using logistic regression in combination with an automatic forward covariate selection procedure based on the Akaike information criterion (AIC)12 applicable to multiply imputed data13 and (2) random forest14. As a sensitivity analysis15, the Bayesian information criterion (BIC)16, which tends to select fewer covariates than the AIC, was also considered and backward selection was performed in addition, both when using the AIC and the BIC. The prediction performance of the models was estimated using 20 times repeated stratified K-fold cross-validation (K = 3, 4, 5), repeating the whole model selection process on each training set in each cross-validation iteration, excluding the corresponding test set17. Multiple imputation was performed separately on training and test sets18. 
The area under the curve (AUC) and the Brier score were used as measures of prediction performance. Random forest was also used to rank the covariates with respect to their importance for prognosis via the AUC covariate importance values19 and to estimate the influence forms of the covariates using partial dependence plots (PDPs)20. All statistical analyses were performed using the software R, version 3.6.3. All p values smaller than 0.05 were considered statistically significant, and all statistical tests were two-sided. For further details and explanations, the interested reader is referred to the detailed description of the statistical analysis flow in the supplement (section “In-depth description of the statistical analysis flow”).
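To illustrate the multiple-testing adjustment mentioned above, the following Python sketch implements the Benjamini–Hochberg procedure in its usual step-up form. It is not the code used in the study, which was carried out in R; the function name `benjamini_hochberg` and the example p values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest. Each raw value is
    # scaled by m / rank, and the running minimum keeps the adjusted
    # values monotone in the raw values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p values from four univariable tests (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

A covariate is then declared statistically significant when its adjusted p value, rather than its raw p value, falls below 0.05.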

Ethics, consent and permissions

The study was approved by the Ethics Committee of Wuhan No.1 Hospital (No. 202008). All methods were carried out in accordance with relevant guidelines and regulations.

Consent to publish

Informed consent was obtained from all patients for this study.

Results

Demographic and clinical features

The young cohort (age ≤ 60 years old) consisted of 762 patients (median age 47 years, interquartile range [IQR] 38–55, range 18–60; 55.9% female), who were admitted between January 2020 and March 2020 to Wuhan No.1 Hospital. As shown in Table 1, 400 (52.5%) of the young patients had a mild condition during hospitalization (mild subgroup), while 362 (47.5%) developed a severe or critical disease course (severe subgroup). The mean age was statistically significantly higher in the severe subgroup than in the mild subgroup (59.3 vs. 56.0, Student's t-test: p < 0.001). 145 (19.8%) patients were affected by underlying diseases, hypertension (14.3%) being the most common one (Table 1). The median body temperature on admission was 36.6 °C (IQR 35.6–37.2), and no difference in median temperature was seen between the two subgroups (mild vs. severe). Hospitalization in the severe subgroup was substantially longer than in the mild subgroup (20.0 days [IQR 15.0–25.0] vs. 8.0 days [IQR 5.0–13.0]). The majority of patients received Chinese traditional medicine (93.4%), antiviral treatments (86.7%), and antibiotics (75.3%). In this cohort, the most frequently applied oxygen therapy was usual oxygen care (UOC) (78.2%). All key laboratory findings of this cohort are listed in Table 1. Additionally, an elderly cohort of patients (median age 69 years, IQR 65–75, range 61–97; 52.8% female) was included, mainly for statistical comparisons with the young cohort.

Table 1 Clinical and demographic characteristics, treatment and key laboratory findings.

Potential risk factors associated with disease severity in the young patients

In the univariable analysis, elevated level of complement C3, systemic immune-inflammation index (SII), CRP, serum amyloid A (SAA), lactate dehydrogenase (LDH), the ratio of CRP versus ALB, and the ratio of neutrophil versus lymphocyte (NLR) were statistically significantly associated with an increased risk for the development of a severe disease course of COVID-19 infection in the young patients (Table 2; full results, including statistically non-significant results, are shown in Supplementary Table 1). In contrast, decreased levels of ALB and lymphocyte (LYM) were statistically significantly correlated with the development of a severe disease course. SAA had the largest covariate importance value, both for young and elderly patients. The covariate importance values (Fig. 1) and the PDPs (Fig. 2, Supplementary Figs. 1 to 4) obtained using random forest analysis suggest that complement C3 is only prognostic in young patients.

Table 2 Univariable logistic regression for the outcome “severe versus mild”.
Figure 1

AUC variable importance with respect to predicting the outcome “severe versus mild” for young and elderly patients calculated using random forests. The larger the importance value of a covariate is, the greater the improvement in prediction performance by including this covariate in prognosis. Complement C3, C4, LYM, the ratio of CRP versus ALB, and UA seem to influence the prognosis of the development of severity mostly in young patients. The bars show the medians of the 20 importance values calculated using the 20 imputed data sets from the multiple imputation. The error bars illustrate the variabilities of the importance values: the lower and upper ends show the first and third quartiles of the 20 importance values, that is, 25% of the importance values lie below the lower end and 25% lie above the upper end. To make the raw importance values comparable between young and elderly patients, the raw importance values were divided, separately for the young and for the elderly patients, by the mean of all importance values with positive sign.
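The scaling and summary described in this caption can be sketched in Python as follows. The sketch assumes that the scaling is applied within each imputed data set before the medians and quartiles are taken; the caption does not state the order of these two steps. The function names and the example values are hypothetical.

```python
from statistics import mean, median, quantiles
from typing import Dict, List, Tuple


def normalise(raw: Dict[str, float]) -> Dict[str, float]:
    """Divide each raw importance by the mean of the positive importances."""
    positive = [v for v in raw.values() if v > 0]
    scale = mean(positive)  # raises StatisticsError if no value is positive
    return {name: v / scale for name, v in raw.items()}


def summarise(
    per_imputation: List[Dict[str, float]],
) -> Dict[str, Tuple[float, float, float]]:
    """Return (first quartile, median, third quartile) per covariate.

    Assumption: one dictionary of raw importances per imputed data set,
    all with the same covariate names; at least two data sets are needed.
    """
    normalised = [normalise(d) for d in per_imputation]
    summary = {}
    for name in normalised[0]:
        values = [d[name] for d in normalised]
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[name] = (q1, median(values), q3)
    return summary


# Made-up raw importances for three imputed data sets (illustration only).
example = [
    {"SAA": 0.06, "C3": 0.03, "noise": -0.01},
    {"SAA": 0.05, "C3": 0.04, "noise": 0.00},
    {"SAA": 0.07, "C3": 0.02, "noise": -0.02},
]
bars = summarise(example)  # median gives the bar, quartiles the error bar
```

Because only positive importances enter the scaling factor, covariates with importance values at or below zero do not distort the scale on which the two age groups are compared.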

Figure 2

Partial dependence plots (PDPs) for young and elderly patients calculated using random forests. In simplified terms, a PDP shows the influence of a covariate on the outcome after adjusting for the influences of the other covariates. The PDPs for the five variables with largest AUC importance values in young patients and that for 'age' are shown. The light lines show the 20 individual PDPs calculated using the imputed data sets from the multiple imputation. The bold lines show averages over the 20 individual PDPs.
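The idea behind a PDP can be sketched in a few lines of Python: for each grid value of the covariate of interest, that value is assigned to every patient while all other covariates are left unchanged, and the model predictions are averaged. The sketch is generic and not the study's R code; `partial_dependence`, `average_curves` and the toy model are hypothetical, and the toy model is a made-up rule rather than a fitted random forest.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over all rows for each grid value of `feature`."""
    curve = []
    for value in grid:
        # Overwrite the covariate of interest in every row, keep the rest.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve


def average_curves(curves: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise mean of several curves, e.g. one curve per imputed data set."""
    return [mean(points) for points in zip(*curves)]


def toy_model(row: Row) -> float:
    # Made-up non-linear rule for illustration, not fitted to any data.
    return 0.8 if row["c3"] > 1.0 and row["age"] <= 60 else 0.2


rows = [{"c3": 0.9, "age": 45}, {"c3": 1.2, "age": 70}]
curve = partial_dependence(toy_model, rows, "c3", grid=[0.5, 1.5])
```

In this toy example the averaged prediction rises from 0.2 to 0.5 between the two grid values, because only the patient aged 45 responds to the higher covariate value. Applying `average_curves` to the curves from the individual imputed data sets corresponds to the bold lines in the figure.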

Generally, the covariate importance values reveal that, while the number of relevant risk factors is larger for the elderly patients, the difference in importance between the most important risk factors and the remaining risk factors is more pronounced for the young patients. Supplementary Figs. 5 to 10 show the corresponding variable importance values and PDPs obtained for all patients, irrespective of age. Many of the PDPs indicate complex influence forms of the covariates. While SAA is again associated with the largest importance values here, the importance values of complement C3 are relatively small. The latter supports the view that a higher level of complement C3 is probably a risk factor for severity only in young patients. After obtaining the latter result, additional analyses were performed in order to investigate whether the influence of complement C3 is different for specific subgroups of young patients (see supplement section "Subgroup analysis of the influence of complement C3 in young patients" for details). The results of these analyses did not suggest any relevant dependence of the influence of complement C3 in young patients on age, gender, or comorbidities, indicating that this risk factor is relevant for young patients independent of their specific characteristics (Supplementary Figs. 11 to 13).

Multivariable statistical model-analysis for disease severity

The results of the multivariable analysis using logistic regression in combination with the forward selection algorithm and the AIC criterion showed that the risk for the development of a severe disease course for COVID-19 in young patients was higher for combinations of elevated levels of complement C3, increased SAA and SII, and reduced levels of LYM, platelet-lymphocyte ratio (PLR) and uric acid (UA) (Table 3). Applying the backward selection algorithm, UA was not selected in the multivariable model due to lack of importance, but gender, hypertension, thyroid related disease, and blood urea nitrogen (BUN) were selected instead, indicating potential importance to the risk of developing a severe disease course in young patients (Supplementary Table 2). However, a sensitivity analysis presented in the supplement (section “Model stability analysis”) indicates that the results obtained using the forward selection algorithm (Table 3) may be more statistically stable. When using the BIC criterion instead of the AIC criterion, only complement C3 and SAA were shown to be relevant to the severity in these young patients (Supplementary Table 3).

Table 3 Multivariable logistic regression models for the outcome “severe versus mild” in young patients selected using the AIC criterion and forward selection.
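The reason why the BIC tends to select fewer covariates than the AIC can be made concrete with the standard definitions of the two criteria. The Python sketch below is illustrative only: the function names are hypothetical, and it ignores the adaptation of the selection procedure to multiply imputed data that was used in the study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return math.log(n_obs) * n_params - 2 * log_likelihood


def min_gain_to_add(n_obs: int) -> dict:
    """Log-likelihood gain one extra parameter needs to lower each criterion."""
    return {"AIC": 1.0, "BIC": math.log(n_obs) / 2}


# With the 762 young patients, an additional covariate must improve the
# log-likelihood by more than 1 under the AIC, but by more than
# ln(762) / 2 (roughly 3.3) under the BIC.
thresholds = min_gain_to_add(762)
```

Under the BIC, a forward selection step therefore stops earlier whenever the remaining candidates improve the fit only modestly, which is consistent with the smaller models reported for the BIC.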

The results of the multivariable logistic regression for all patients (young and elderly) using the AIC criterion and the forward selection algorithm differed partly from those obtained for the young patients. The risk of developing a severe disease course in all patients was higher for increased levels of complement C3, SAA, BUN, LDH and immunoglobulin G (IgG) and for decreased levels of ALB, PLR and immunoglobulin A (IgA) (Table 4). Note that the inclusion of complement C3 in the forward selection for all patients is very likely due only to its importance within the young cohort: in the univariable analysis, complement C3 was statistically significant only for the young patients and, as seen in Fig. 1, the covariate importance value of complement C3 is large only for the young patients. As revealed by the covariate importance values obtained through the random forest analysis (Fig. 1), increased levels of SAA seem to be similarly associated with the development of a severe disease course in elderly and in young patients. When using the backward selection algorithm, LYM and blood platelets were selected in addition, indicating potential relevance for disease severity (Supplementary Table 4). Using the BIC criterion together with the forward selection algorithm, SAA, BUN, ALB, and LDH were selected, where the corresponding odds ratios (Supplementary Table 5) were very similar to those in the model obtained using the AIC (Table 4). Applying the BIC criterion together with the backward selection algorithm delivered the same result.

Table 4 Multivariable logistic regression models for the outcome “severe versus mild” in all patients irrespective of age selected using the AIC criterion and forward selection.

In summary, the estimated prediction performances of the multivariable logistic regression models for all patients were better than those of the models obtained specifically for the young patients (Table 5). The random forest, which takes non-linear influences into account, performed best, and the model selected using the BIC (which included fewer covariates) performed better than that selected using the AIC. Supplementary Table 6 provides an overview of which covariates were selected in each of the models obtained using the AIC and BIC criteria with forward and backward selection.

Table 5 Performances of the models measured using stratified K-fold cross-validation.
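The two performance measures reported in Table 5 can be computed from predicted probabilities and observed outcomes as sketched below in Python. The sketch uses the standard definitions (the AUC as the proportion of severe/mild pairs that are ranked correctly, with ties counting one half, and the Brier score as the mean squared difference between predicted probability and outcome). The function names and example values are hypothetical and unrelated to the study data.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Proportion of (severe, mild) pairs ranked correctly; ties count 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up test-fold example: 1 = severe, 0 = mild.
y = [1, 1, 0, 0]
prob = [0.9, 0.4, 0.5, 0.1]
fold_auc = auc(y, prob)
fold_brier = brier_score(y, prob)
```

A larger AUC indicates better discrimination between severe and mild courses, whereas a smaller Brier score indicates predicted probabilities that lie closer to the observed outcomes. In a repeated cross-validation, both measures would be computed on each test fold and then averaged.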

Discussion

COVID-19, caused by the SARS-CoV-2 virus, is a public health event that poses a serious threat to human health. The number of COVID-19 cases in young adults is higher than expected: according to the US CDC report, almost half of the hospitalized patients are younger than 65, and young patients have a substantial risk of developing a severe disease course. Facing the rapidly evolving circumstances caused by the COVID-19 pandemic, it is essential to prioritize medical resources by effectively conducting clinical stratification of young COVID-19 patients. Although several studies have identified risk factors for severity and mortality in COVID-19 patients3,5,8, it remains critical to identify and investigate early the potential risk factors specific to the development of a severe or critical disease course in young COVID-19 patients, because there could be strong differences in the functional state of the immune system between the young and the elderly population21. In order to shed light on potential risk factors associated with a severe disease course in young patients, this study investigated clinical, demographic, treatment, and laboratory data from a group of COVID-19 patients using a comprehensive set of modern statistical methodologies. The univariable analysis suggested that potential risk factors for disease severity in young patients (age ≤ 60 years; n = 762) are in part different from those in elderly patients (age > 60 years; n = 714). Specifically, elevated levels of complement C3 and SAA were statistically significantly associated with a higher risk of a severe disease course only in the young patients. In contrast, increased levels of ANC, aspartate aminotransferase (AST), BUN, creatine kinase (CK), creatinine (CR), D-dimer, myo-hemoglobin, and PLR, and decreased levels of red blood cells (RBC) had a statistically significant influence on this risk only for elderly patients. 
Even though SAA lacked statistical significance in the elderly subgroup, the covariate importance values (Fig. 1) suggest that SAA is an important risk factor irrespective of patients' age. Although complement C4 missed statistical significance in both age groups, the covariate importance values and PDPs suggest that this covariate, in addition to complement C3, could be a particularly strong risk factor in the young cohort. Further analysis showed that the significance of complement C3 in young patients was independent of the common risk factors age, gender and the presence versus absence of comorbidities. Risk factors reported previously in COVID-19 infection, such as CRP7, lactate dehydrogenase3 and decreased lymphocyte counts8, were validated in both the young and the elderly patient cohorts. Only D-dimer and procalcitonin did not show significance in the young patient cohort. The PDPs confirmed the observed differences and revealed that the influences of many covariates on the log odds of disease severity of COVID-19 infection are strongly non-linear. In logistic regression, however, these influences are assumed to be linear, which is likely an important reason why the random forests outperformed the multivariable logistic regression models, both for the cohort of young patients and for all patients taken together.

The complement family is an integral component of the innate immune response to viruses, not only protecting the body from infectious agents such as viruses and bacteria, but also playing a key role in promoting inflammatory processes that trigger an inflammatory cytokine storm22. The abnormal activation of various innate immune pathways, such as the complement system, cytokine and thrombosis pathways, is considered the driver of ARDS and may lead to multi-organ dysfunction23,24. Activation of the complement system can also be found in patients who are infected with coronaviruses, such as MERS-CoV, SARS-CoV-1 and SARS-CoV-2, and who develop ARDS25. When mice with complement C3 deficiency were infected with SARS-CoV, the infiltration of neutrophils and inflammatory monocytes in the lungs was strongly reduced, and the levels of cytokines and chemokines in the lungs and serum were decreased, as was the incidence of respiratory failure. This suggests that the activation of the complement component C3 may aggravate SARS-CoV-related ARDS24.

Furthermore, complement C3, which is involved in the function of innate immunity, has been shown to play a role in the recovery of COVID-19 patients26, and critically low levels of this immune component have been shown to be connected with mortality following COVID-19 infection27. These results do not contradict the findings of our study. As the key initiator of innate immunity, complement C3 plays a major role in the activation of different immune cells including neutrophils and macrophages25. Critically low levels of complement C3 indicate an inability of the immune response to initiate, causing an immediate failure of anti-viral immune protection, whereas elevated levels of complement C3 may lead to excessive production of cytokines via diverse signaling pathways, causing a cytokine storm25. The majority of young patients (97.5%) in our cohort had a normal or elevated level of complement C3, reflecting the latter case. Young patients have fewer underlying diseases and a greater capacity for immune compensation. This may be the reason why many indicators, including D-dimer and procalcitonin, do not change strongly, even when patients progress to severe COVID-19. However, young patients have a more active immune function, which is why elevated levels of serum complement C3 may be a potential indicator of the severity of COVID-19 in these patients. A possible explanation for why complement C3 did not have the same prognostic role in elderly patients is the immunosenescence caused by aging21. Under immunosenescence, the hyperactivation of complement C3 alone does not suffice to induce the activities of different immune cells.

This study possesses limitations aside from those inherent to all retrospective cohort studies such as their lack of causal inference. First, it is a single-center study featuring a limited number of cases. Second, the patient data were collected within 3 days after hospital admission, leading to missing data in a number of variables, which were imputed with a standard statistical approach.

In summary, this study conducted a comprehensive statistical analysis with a focus on non-linear relationships to identify risk factors and possible pathogenesis for the development of a severe disease course during COVID-19 infection in young patients. However, large-scale and multi-center analysis is needed to further build on the knowledge obtained.