Complement C3 identified as a unique risk factor for disease severity among young COVID-19 patients in Wuhan, China

Because a substantial proportion of COVID-19 patients who develop a severe disease course are younger than 60 years, it is critical to understand the disease-specific characteristics of young COVID-19 patients. Risk factors for a severe disease course in young COVID-19 patients, and their possible non-linear influences, remain unknown. Data from COVID-19 patients with known clinical outcomes at a single hospital in Wuhan, China, were collected retrospectively from January 24th to March 27th, 2020, and analyzed. Clinical, demographic, treatment, and laboratory data were extracted from patients' medical records. Uni- and multivariable analyses using logistic regression and random forest, the latter allowing the study of non-linear influences, were performed to investigate the clinical characteristics associated with a severe disease course. A total of 762 young patients (median age 47 years, interquartile range [IQR] 38–55, range 18–60; 55.9% female) were included, as well as 714 elderly patients as a comparison group. Among the young patients, 362 (47.5%) had a severe/critical disease course, and the mean age was statistically significantly higher in the severe subgroup than in the mild subgroup (59.3 vs. 56.0, Student's t-test: p < 0.001). The uni- and multivariable analyses suggested that several covariates, such as elevated levels of serum amyloid A (SAA), C-reactive protein (CRP), and lactate dehydrogenase (LDH), and decreased lymphocyte counts, influence disease severity independently of age. Elevated levels of complement C3 (odds ratio [OR] 15.6, 95% CI 2.41–122.3; p = 0.039) were associated with the risk of developing severe COVID-19 specifically in young patients, whereas no such influence was apparent in elderly patients. Additional analyses suggest that the influence of complement C3 in young patients is independent of age, gender, and comorbidities.
Variable importance values and partial dependence plots obtained using random forests provided additional insights, in particular by indicating non-linear influences of risk factors on disease severity. This study identified increased levels of complement C3 as a unique risk factor for adverse outcomes specific to young COVID-19 patients.


Methods
Study design and participants. This retrospective, single-center cohort study involved adult patients diagnosed with COVID-19 pneumonia between January 24th and March 27th, 2020, at Wuhan No.1 Hospital, a major government-designated hospital in Wuhan. The date of the last follow-up was April 8th, 2020. The primary outcome was disease severity at the end of the study period. All patients were residents of Wuhan, and the diagnosis of COVID-19 was based on the Diagnosis and Treatment Protocol for the 2019 Novel Coronavirus Pneumonia published by the National Health Commission of China.
Newly diagnosed patients were required to meet one of the following conditions: (1) positive detection of SARS-CoV-2 nucleic acid by fluorescent real-time RT-PCR; (2) viral gene sequencing showing a high degree of homology with the novel coronavirus (SARS-CoV-2). Patients with mild disease were required to meet the following conditions: (1) an epidemiological history; (2) fever or other respiratory symptoms; (3) CT image abnormalities typical of viral pneumonia. Patients with severe disease met at least one of the following conditions: (1) shortness of breath with a respiratory rate ≥ 30 breaths/min; (2) resting oxygen saturation ≤ 93%; (3) PaO2/FiO2 ≤ 300 mmHg. Critically ill patients were required to meet at least one of the following conditions: (1) respiratory failure requiring mechanical ventilation; (2) shock; (3) organ failure requiring ICU monitoring.
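The severity criteria above amount to a simple decision rule. The following Python sketch encodes that rule under several stated assumptions:

- The record fields (`respiratory_rate`, `spo2_resting`, `pao2_fio2`, and the boolean flags) are hypothetical names.
- Shortness of breath is reduced to the respiratory-rate threshold.
- A missing measurement counts as not meeting a criterion.
- The COVID-19 diagnosis is taken as already confirmed.

It illustrates the stated thresholds and is not the study's own classification code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    """Hypothetical admission record; all field names are illustrative assumptions."""
    respiratory_rate: Optional[float] = None  # breaths/min
    spo2_resting: Optional[float] = None      # resting oxygen saturation, %
    pao2_fio2: Optional[float] = None         # PaO2/FiO2, mmHg
    mechanical_ventilation: bool = False      # respiratory failure requiring ventilation
    shock: bool = False
    organ_failure_icu: bool = False           # organ failure requiring ICU monitoring


def classify_severity(a: Assessment) -> str:
    """Return 'critical', 'severe' or 'mild' using the thresholds stated in the text.

    Any single criterion suffices; critical criteria are checked first.
    Missing measurements (None) are treated as not meeting the criterion.
    """
    if a.mechanical_ventilation or a.shock or a.organ_failure_icu:
        return "critical"
    if (
        (a.respiratory_rate is not None and a.respiratory_rate >= 30)
        or (a.spo2_resting is not None and a.spo2_resting <= 93)
        or (a.pao2_fio2 is not None and a.pao2_fio2 <= 300)
    ):
        return "severe"
    return "mild"


def severe_versus_mild(a: Assessment) -> int:
    """Binary outcome: 1 for a severe/critical course, 0 for a mild course."""
    return int(classify_severity(a) in ("severe", "critical"))
```

Because the critical criteria are checked first, a patient who meets both severe and critical criteria is classified as critical. Both categories map to 1 in the binary "severe versus mild" outcome.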
The following data were collected on admission: age, sex, symptoms from onset to hospital admission (fever, cough, dyspnea, myalgia, rhinorrhea, arthralgia, chest pain, headache, and vomiting), comorbidities (cardiovascular disease, chronic pulmonary disease, cerebrovascular disease and chronic neurological disorders, diabetes, malignancy, and smoking), vital signs (heart rate, respiratory rate, and blood pressure), laboratory values (serum hemoglobin concentration, lymphocyte counts, platelet counts, and diverse protein markers), the treatment regimen used for COVID-19 pneumonia (antiviral agents, antibacterial agents, and Chinese medicine), the dates of symptom onset, admission, virus testing, and CT scan, as well as condition improvement and vital status. The study was approved by the Ethics Committee of Wuhan No.1 Hospital (No. 202008).
Treatment protocol for SARS-CoV-2 pneumonia. The treatment strategy for patients with SARS-CoV-2 pneumonia was based on the WHO guidelines 10 , which include symptom relief, treatment of underlying diseases, prevention of superimposed bacterial infections, active prevention of complications such as sepsis and acute respiratory distress syndrome (ARDS), and timely support of vital organ function. Patients with desaturation received oxygen supplementation as required, by high-flow oxygen via nasal prongs, noninvasive or invasive mechanical ventilation, or extracorporeal membrane oxygenation (ECMO).
Statistical considerations. The outcome in this study was whether or not the patients experienced a severe to critical course of disease; this outcome is denoted "severe versus mild" in the following. All eligible variables were considered as covariates potentially influencing the outcome (supplement section "In-depth description of the statistical analysis flow"). Univariable analysis was performed using logistic regression, with p values adjusted for multiple testing by the Benjamini-Hochberg procedure. Missing values in the multivariable analyses were handled with multivariate imputation by chained equations (MICE) 11 , using 20 imputed data sets in each analysis (m = 20). The ratio of C-reactive protein (CRP) to serum albumin (ALB) correlated very strongly with CRP, and the white blood cell count (WBC) correlated very strongly with the absolute neutrophil count (ANC) (ρ > 0.9). CRP and ANC were therefore not considered in the multivariable analysis. The multivariable analysis was performed separately for the young patients and for all patients together, (1) using logistic regression combined with an automatic forward covariate selection procedure based on the Akaike information criterion (AIC) 12 that is applicable to multiply imputed data 13 , and (2) using random forest 14 . As sensitivity analyses 15 , the Bayesian information criterion (BIC) 16 , which tends to select fewer covariates than the AIC, was also considered, and backward selection was performed in addition, with both the AIC and the BIC. The prediction performance of the models was estimated using 20-times repeated stratified K-fold cross-validation (K = 3, 4, 5). In each cross-validation iteration, the whole model selection process was repeated on the training set, excluding the corresponding test set 17 . Multiple imputation was performed separately on training and test sets 18 .
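The splitting part of the cross-validation scheme described above can be sketched in Python as follows. The function names are hypothetical, and only index generation is shown. Each repetition reshuffles the patients and deals each outcome class round-robin into K folds, so that every test fold keeps roughly the overall proportion of severe cases. It then yields the train/test indices. In the scheme described in the text, imputation would be run separately on each training and test set, and covariate selection and model fitting would be rerun on each training set only.

```python
import random
from collections import defaultdict


def stratified_kfold(y, k, rng):
    """Split indices 0..len(y)-1 into k folds that preserve the class proportions of y."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        # Deal indices round-robin, continuing where the previous class stopped
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds


def repeated_stratified_cv(y, k=5, repeats=20, seed=1):
    """Yield (train, test) index lists for `repeats` independent stratified K-fold splits."""
    rng = random.Random(seed)
    n = len(y)
    for _ in range(repeats):
        for test in stratified_kfold(y, k, rng):
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            yield train, sorted(test)
```

With 10 severe and 20 mild patients and K = 5, for example, every test fold contains 2 severe and 4 mild patients. Over one repetition, the five test folds together cover every patient exactly once.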
The area under the receiver operating characteristic curve (AUC) and the Brier score were used as measures of prediction performance. Random forest was also used for two further purposes: to rank the covariates by their importance for prognosis via AUC-based covariate importance values 19 , and to estimate the influence forms of the covariates using partial dependence plots (PDPs) 20 . All statistical analyses were performed in R, version 3.6.3. All statistical tests were two-sided, and p values smaller than 0.05 were considered statistically significant. For further details, the interested reader is referred to the supplement (section "In-depth description of the statistical analysis flow").
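Both performance measures are simple to compute from predicted probabilities. The Python sketch below uses hypothetical function names. It computes the AUC as the probability that a randomly chosen severe patient receives a higher predicted risk than a randomly chosen mild patient, with ties counted as one half; this equals the area under the empirical ROC curve. It computes the Brier score as the mean squared difference between the predicted probability and the observed 0/1 outcome, where lower is better. The pairwise AUC loop is quadratic in the sample size, which is acceptable for an illustration but not for large data.

```python
def auc(y_true, scores):
    """ROC AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)
```

For example, take outcomes [0, 0, 1, 1] and predicted risks [0.1, 0.4, 0.35, 0.8]. Three of the four case-control pairs are ranked correctly, giving an AUC of 0.75, and the Brier score is 0.158125.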
Results
Potential risk factors associated with disease severity in the young patients. In the univariable analysis, the following were statistically significantly associated with an increased risk of a severe COVID-19 disease course in the young patients: elevated levels of complement C3, the systemic immune-inflammation index (SII), CRP, serum amyloid A (SAA), lactate dehydrogenase (LDH), the CRP-to-ALB ratio, and the neutrophil-to-lymphocyte ratio (NLR) (Table 2; full results, including statistically nonsignificant ones, are shown in Supplementary Table 1). In contrast, decreased levels of ALB and lymphocytes (LYM) were statistically significantly associated with a severe disease course. SAA had the largest covariate importance value in both young and elderly patients. The covariate importance values (Fig. 1) and the PDPs (Fig. 2, Supplementary Figs. 1 to 4) obtained from the random forest analysis suggest that complement C3 is prognostic only in young patients.
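The univariable p values behind these results were adjusted with the Benjamini-Hochberg procedure. A minimal Python sketch of that adjustment follows. The function name is hypothetical; in R, the same adjustment is available as `p.adjust(p, method = "BH")`. Each sorted p value is scaled by m/rank. A running minimum, taken from the largest p value downwards, keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, raw p values of 0.01, 0.04, 0.03, and 0.005 become 0.02, 0.04, 0.04, and 0.02 after adjustment.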
Generally, the covariate importance values reveal a contrast between the age groups. The number of relevant risk factors is larger for the elderly patients, but the gap in importance between the most important risk factors and the remaining ones is more pronounced for the young patients. Supplementary Figs. 5 to 10 show the corresponding variable importance values and PDPs obtained for all patients, irrespective of age. Many of the PDPs indicate complex influence forms of the covariates. While SAA again has the largest importance values here, the importance values of complement C3 are relatively small. This supports the conclusion that a higher level of complement C3 is probably a risk factor for severity only in young patients. Following this result, additional analyses were performed to investigate whether the influence of complement C3 differs between specific subgroups of young patients (see supplement section "Subgroup analysis of the influence of complement C3 in young patients" for details). These analyses did not suggest any relevant dependence of the influence of complement C3 in young patients on age, gender, or comorbidities. This indicates that the risk factor is relevant for young patients independently of their specific characteristics (Supplementary Figs. 11–13).
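A partial dependence curve for one covariate is obtained in three steps:

1. Fix that covariate at each value of a grid for every patient.
2. Leave all other covariates as observed.
3. Average the model's predictions.

The Python sketch below shows this computation for an arbitrary prediction function. Here `predict` is a hypothetical stand-in for a fitted random forest, and averaging predicted probabilities is an assumption. Implementations differ in the scale they plot (some use a logit-type scale for classification), so the sketch illustrates only the averaging idea.

```python
from typing import Callable, List, Sequence


def partial_dependence(
    predict: Callable[[List[List[float]]], List[float]],
    X: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction when column `feature` is set to each grid value for all rows."""
    curve = []
    for value in grid:
        X_mod = [list(row) for row in X]  # copy so the caller's data stay unchanged
        for row in X_mod:
            row[feature] = value  # overwrite this covariate; others keep observed values
        preds = predict(X_mod)
        curve.append(sum(preds) / len(preds))
    return curve
```

A flat curve suggests little influence of the covariate. A curve that bends, plateaus, or changes direction indicates the kind of non-linear influence that a single logistic regression coefficient cannot represent.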
Multivariable statistical analysis of disease severity. Multivariable logistic regression with the forward selection algorithm and the AIC criterion showed which factors raised the risk of a severe COVID-19 disease course in young patients. The risk was higher for combinations of elevated levels of complement C3, increased SAA and SII, and reduced levels of LYM, platelet-to-lymphocyte ratio (PLR), and uric acid (UA) (Table 3). With the backward selection algorithm, UA was not selected in the multivariable model. Instead, gender, hypertension, thyroid-related disease, and blood urea nitrogen (BUN) were selected, indicating their potential importance for the risk of a severe disease course in young patients (Supplementary Table 2). However, a sensitivity analysis presented in the supplement (section "Model stability analysis") indicates that the results obtained using the forward selection algorithm (Table 3) may be statistically more stable. When the BIC criterion was used instead of the AIC criterion, only complement C3 and SAA were selected as relevant to severity in these young patients (Supplementary Table 3).
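Forward selection by an information criterion can be illustrated with a self-contained Python sketch. The sketch proceeds as follows:

1. Fit logistic regression by Newton-Raphson.
2. Start from the intercept-only model.
3. Greedily add the covariate that most lowers the criterion, until no addition helps.

The criterion is -2 log-likelihood plus a penalty per parameter: 2 for the AIC and log(n) for the BIC. Because the BIC penalty exceeds 2 once n ≥ 8, the BIC tends to select fewer covariates. All function names are hypothetical. The sketch works on a single complete data set; it implements neither the selection procedure for multiply imputed data used in the study nor backward selection.

```python
import math


def _sigmoid(eta):
    # Numerically stable logistic function 1 / (1 + exp(-eta))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: rows starting with 1.0 for the intercept; y: 0/1 outcomes.
    Returns (coefficients, log-likelihood). Assumes the data are not separable.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [_sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X'(y - mu) and Fisher information X'WX with W = mu(1 - mu)
        score = [sum((y[i] - mu[i]) * X[i][j] for i in range(n)) for j in range(p)]
        info = [[sum(mu[i] * (1.0 - mu[i]) * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(eta)) evaluated without overflow
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik


def forward_select(columns, y, criterion="AIC"):
    """Greedy forward selection by AIC (penalty 2) or BIC (penalty log n) per parameter.

    columns: dict mapping covariate name -> list of values (complete data).
    Returns (names in order of entry, criterion value of the final model).
    """
    n = len(y)
    penalty = 2.0 if criterion == "AIC" else math.log(n)

    def ic(names):
        X = [[1.0] + [float(columns[c][i]) for c in names] for i in range(n)]
        _, loglik = fit_logistic(X, y)
        return -2.0 * loglik + penalty * (len(names) + 1)

    selected, best = [], ic([])
    while True:
        remaining = [c for c in columns if c not in selected]
        if not remaining:
            break
        trial = {c: ic(selected + [c]) for c in remaining}
        winner = min(trial, key=trial.get)
        if trial[winner] >= best:
            break  # no remaining covariate lowers the criterion
        selected.append(winner)
        best = trial[winner]
    return selected, best
```

A call such as `forward_select({"C3": c3, "SAA": saa}, outcome, criterion="BIC")` uses hypothetical variable names. Switching `criterion` changes only the penalty, which mirrors how the AIC and BIC variants were compared as sensitivity analyses.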
The results of the multivariable logistic regression for all patients (young and elderly) using the AIC criterion and the forward selection algorithm differed partly from those obtained for the young patients. The risk of a severe disease course in all patients was higher for increased levels of complement C3, SAA, BUN, LDH, and immunoglobulin G (IgG), and for decreased levels of ALB, PLR, and immunoglobulin A (IgA) (Table 4). The inclusion of complement C3 in the forward selection for all patients is very likely due only to its importance within the young cohort. In the univariable analysis, complement C3 was statistically significant only for the young patients, and, as seen in Fig. 1, its covariate importance value is large only for the young patients. The covariate importance values from the random forest analysis (Fig. 1) also show that increased levels of SAA seem to be similarly associated with a severe disease course in elderly and young patients. With the backward selection algorithm, LYM and blood platelets were selected in addition, indicating their potential relevance to disease severity (Supplementary Table 4). The results shown in Supplementary Table 5 were very similar to those of the model obtained using the AIC (Table 4), and applying the BIC criterion together with the forward selection algorithm delivered the same result. In summary, the estimated prediction performances of the multivariable logistic regression models for all patients were better than those of the models obtained specifically for the young patients (Table 5). The random forest, which takes non-linear influences into account, performed best. The model selected using the BIC (which included fewer covariates) performed better than that selected using the AIC.
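Odds ratios, such as the one reported for complement C3 in the abstract, are exponentiated logistic regression coefficients. The Python sketch below (function name hypothetical) converts a coefficient β and its standard error into three quantities: an odds ratio, a Wald confidence interval exp(β ± z·SE), and a two-sided Wald p value. It assumes a single standard error is already available. With multiply imputed data, that standard error would come from pooling across the imputed data sets, which is not shown here.

```python
import math
from statistics import NormalDist


def odds_ratio_summary(beta, se, level=0.95):
    """Odds ratio, Wald confidence interval and two-sided Wald p value for a
    logistic regression coefficient `beta` with standard error `se`."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(beta) / se))
    return odds_ratio, ci, p_value
```

For example, a coefficient of log 2 ≈ 0.693 corresponds to an odds ratio of 2: each one-unit increase in the covariate doubles the odds of a severe course, holding the other covariates fixed. The interval is symmetric on the log scale, so it is asymmetric around the odds ratio itself.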
Supplementary Table 6 provides an overview of which covariates were selected in each of the models obtained using the AIC and BIC criteria with forward and backward selection.

Discussion
COVID-19, caused by the SARS-CoV-2 virus, is a public health emergency that poses a serious threat to human health. The number of COVID-19 cases in young adults is higher than expected. According to a US CDC report, almost half of the patients are younger than 65, and young patients have a substantial risk of developing a severe disease course. Given the rapidly evolving circumstances of the COVID-19 pandemic, medical resources must be prioritized through effective clinical stratification of young COVID-19 patients. Several studies have identified risk factors for severity and mortality in COVID-19 patients 3,5,8 . Nevertheless, it remains critical to identify early, and to investigate, risk factors specific to a severe/critical disease course in young COVID-19 patients, because the functional state of the immune system may differ strongly between the young and elderly populations 21 . To shed light on potential risk factors for a severe disease course in young patients, this study analyzed clinical, demographic, treatment, and laboratory data from a cohort of COVID-19 patients using a comprehensive set of modern statistical methods. The univariable analysis suggested that potential risk factors for disease severity in young patients (age ≤ 60 years; n = 762) differ in part from those in elderly patients (age > 60 years; n = 714). Specifically, elevated levels of complement C3 and SAA were statistically significantly associated with a higher risk of a severe disease course only in the young patients. In contrast, some factors had a statistically significant influence on this risk only in elderly patients: increased levels of ANC, aspartate aminotransferase (AST), BUN, creatine kinase (CK), creatinine (CR), D-dimer, myoglobin, and PLR, and decreased red blood cell counts (RBC).
www.nature.com/scientificreports/
Even though SAA lacked statistical significance in the elderly subgroup, the covariate importance values (Fig. 1) suggest that SAA is an important risk factor irrespective of patients' age. Complement C4 did not reach statistical significance in either age group. Nevertheless, the covariate importance values and PDPs suggest that, in addition to complement C3, it could be a particularly strong risk factor in the young cohort. Further analysis showed that the significance of complement C3 in young patients was independent of the common risk factors age, gender, and the presence or absence of comorbidities. Previously reported risk factors in COVID-19, such as CRP 7 , lactate dehydrogenase 3 , and decreased lymphocyte counts 8 , were confirmed in both the young and elderly patient cohorts. Only D-dimer and procalcitonin did not show significance in the young patient cohort. The PDPs confirmed the observed differences and revealed that many covariates have strongly non-linear influences on the log odds of a severe disease course. Standard logistic regression, however, assumes that these influences are linear. This is likely an important reason why the random forests outperformed the multivariable logistic regression models, both for the cohort of young patients and for all patients taken together. The complement system is an integral component of the innate immune response to viruses. It not only protects the body from infectious agents such as viruses and bacteria, but also plays a key role in promoting inflammatory processes that can trigger a cytokine storm 22 . Abnormal activation of various innate immune pathways, such as the complement system, cytokines, and thrombosis pathways, is considered a driver of ARDS and may lead to multi-organ dysfunction 23,24 .
Activation of the complement system has also been observed in patients infected with coronaviruses such as MERS-CoV, SARS-CoV-1, and SARS-CoV-2 whose disease progresses to ARDS 25 . Complement C3-deficient mice infected with SARS-CoV showed several differences. The infiltration of neutrophils and inflammatory monocytes into the lungs was strongly reduced, the levels of cytokines and chemokines in the lungs and serum were decreased, and respiratory failure occurred less often. This suggests that activation of complement component C3 may aggravate SARS-CoV-related ARDS 24 .
Furthermore, complement C3, which is involved in innate immunity, has been shown to play a role in the recovery of COVID-19 patients 26 . Critically low levels of this immune component have been associated with mortality following COVID-19 infection 27 . These results do not contradict the findings of our study. As a key initiator of innate immunity, complement C3 plays a major role in the activation of different immune cells, including neutrophils and macrophages 25 . Critically low levels of complement C3 indicate that the immune response cannot be initiated, causing an immediate failure of antiviral immune protection. Elevated levels of complement C3, by contrast, may lead to excessive cytokine production via diverse signaling pathways, causing a cytokine storm 25 . The majority of young patients (97.5%) in our cohort had normal or elevated levels of complement C3, reflecting the latter case. Young patients have fewer underlying diseases and a greater capacity for immune compensation. This may be why many indicators, including D-dimer and procalcitonin, do not change strongly even when patients progress to severe COVID-19. At the same time, young patients have more active immune function, which may explain why elevated serum complement C3 could serve as an indicator of disease severity in young COVID-19 patients. A possible explanation why complement C3 did not have the same prognostic role for elderly patients is the immunosenescence caused by aging 21 .
Table 2. Univariable logistic regression for the outcome "severe versus mild". The p values were adjusted for multiple testing separately for the analysis of all patients, young patients, and elderly patients. Increased levels of complement C3 and SAA were associated with an increased risk for severity only in the young patient cohort. Full results are shown in Supplementary Table 1.
Hyperactivation of complement C3 alone is not sufficient to induce the activity of the different immune cells. This study has limitations beyond those inherent to all retrospective cohort studies, such as the inability to support causal inference. First, it is a single-center study with a limited number of cases. Second, the patient data were collected within 3 days after hospital admission, which led to missing data for a number of variables; these were imputed using a standard statistical approach.
Table 3. Multivariable logistic regression models for the outcome "severe versus mild" in young patients selected using the AIC criterion and forward selection.
In summary, this study conducted a comprehensive statistical analysis with a focus on non-linear relationships. Its aim was to identify risk factors and possible pathogenic mechanisms for the development of a severe disease course during COVID-19 infection in young patients. However, large-scale, multi-center studies are needed to build further on these findings.

Received: 11 July 2020; Accepted: 22 January 2021
Table 4. Multivariable logistic regression models for the outcome "severe versus mild" in all patients irrespective of age selected using the AIC criterion and forward selection.