Complement C3 identified as a unique Risk Factor for Disease Severity among Young COVID-19 Patients in Wuhan

Background Given that a substantial proportion of the COVID-19 patients who face a severe disease course are younger than 60 years, it is critical to understand the disease-specific characteristics of young COVID-19 patients. Risk factors for a severe disease course in young COVID-19 patients, and possible non-linear influences of these factors, remain unknown.
Methods Data of COVID-19 patients with clinical outcome in a designated hospital in Wuhan, China, collected retrospectively from Jan 24th to Mar 27th, were analyzed. Clinical, demographic, treatment and laboratory data were collected from patients' medical records. Uni- and multivariable analyses using logistic regression and random forest, with the latter allowing the study of non-linear influences, were performed to investigate the clinical characteristics of a severe disease course.
Results A total of 762 young patients (median age 47 years, interquartile range [IQR] 38 - 55, range 16 - 60; 55.9% female) were included, as well as 714 elderly patients as a comparison group. Among the young patients, 362 (47.5%) had a severe/critical disease course and the mean age was significantly higher in the severe subgroup than in the mild subgroup (59.3 vs. 56.0, Student's t-test: p < 0.001). The uni- and multivariable analyses suggested that several covariates, such as elevated levels of ASS, CRP and LDH and decreased lymphocyte counts, influence disease severity independent of age. Elevated levels of complement C3 (odds ratio [OR] 15.6, 95% CI 2.41-122.3; p=0.039) are particularly associated with the risk of developing a severe disease course specifically in young patients, whereas no such influence seems to exist for elderly patients. Additional analysis suggests that the influence of complement C3 in young patients is independent of age, gender, and comorbidities. Variable importance values and partial dependence plots obtained using random forests delivered additional insights, in particular indicating non-linear influences of risk factors on disease severity.
Conclusion In young patients with COVID-19, the levels of complement C3 correlated with disease severity and tended to be a good predictor of adverse outcome.

deal with missing values in the multivariable analyses, where 20 imputed data sets were used in each analysis (m = 20). The ratio of CRP versus ALB and CRP, as well as WBC and ANC correlated very strongly (ρ > 0.9), which is why CRP and ANC were not considered in the multivariable analysis. The latter analysis was performed separately for the young patients and for all patients together 1) using logistic regression in combination with an automatic forward covariate selection procedure based on the Akaike information criterion (AIC) 12 applicable to multiply imputed data 13 and 2) random forest 14 .
As a sensitivity analysis 15 , the Bayesian information criterion (BIC) 16 , which tends to select fewer covariates than the AIC, was also considered, and backward selection was performed in addition, both when using the AIC and when using the BIC. The prediction performance of the models was estimated using 20 times repeated stratified K-fold cross-validation (K = 3,4,5), repeating the whole model selection process on each training set in each cross-validation iteration, excluding the corresponding test set 17 .
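The resampling scheme described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the round-robin fold assignment, and the `fit_and_score` callback are assumptions made for illustration, and the toy labels only mimic the class sizes of the young cohort (400 mild, 362 severe). The point of the sketch is that every data-dependent step, including covariate selection, would be carried out inside `fit_and_score` using the training indices only.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, rng):
    """Split indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def repeated_stratified_cv(labels, k, repeats, fit_and_score, seed=0):
    """Average a score over `repeats` independent stratified k-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in stratified_kfold(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            # Hypothetical callback: select covariates and fit on train_idx,
            # then evaluate (e.g. AUC or Brier score) on test_idx only.
            scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


# Toy labels with the class sizes of the young cohort (0 = mild, 1 = severe).
labels = [0] * 400 + [1] * 362


def severe_fraction(train_idx, test_idx):
    # Placeholder "score": the share of severe cases in the test fold.
    return sum(labels[i] for i in test_idx) / len(test_idx)


mean_fraction = repeated_stratified_cv(labels, k=5, repeats=20,
                                       fit_and_score=severe_fraction)
print(round(mean_fraction, 3))
```

Because the folds are stratified, the share of severe cases in every test fold stays close to the overall share of 47.5%.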
Multiple imputation was performed separately on training and test sets 18 . The AUC and the Brier score were used as measures of prediction performance. Random forest was also used to rank the covariates with respect to their importance for prognosis via the AUC covariate importance values 19 and to estimate the influence forms of the covariates using partial dependence plots (PDPs) 20 . All statistical analyses were performed using the software R, version 3.6.3. All p-values smaller than 0.05 were considered statistically significant. For further details and explanations, the interested reader is referred to the detailed description of the statistical analysis flow in the supplement (section "In-depth description of the statistical analysis flow").
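For readers less familiar with the two performance measures, the following Python sketch computes them from their standard definitions. The function names and the four toy predictions are hypothetical and are not taken from the study data.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """Probability that a random positive case receives a higher predicted
    probability than a random negative case (ties count one half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = severe, 0 = mild) and predicted probabilities.
y_true = [0, 0, 1, 1]
p_pred = [0.1, 0.4, 0.35, 0.8]

print(auc(y_true, p_pred))          # 0.75
print(brier_score(y_true, p_pred))
```

The AUC measures only how well the predictions rank severe above mild cases, while the Brier score also penalizes poorly calibrated probabilities, which is why the two are reported together.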

Demographic and Clinical Features
The young cohort (age <= 60 years) consisted of 762 patients (median age 47 years, interquartile range [IQR] 38 - 55, range 16 - 60; 55.9% female), who were admitted between Jan. 2020 and Mar. 2020 to Wuhan No.1 Hospital. As shown in Table 1, 400 (52.5%) of the young patients had a mild condition during hospitalization (mild subgroup), while 362 (47.5%) developed a severe or critical disease course (severe subgroup). The mean age was significantly higher in the severe subgroup than in the mild subgroup (59.3 vs. 56.0, Student's t-test: p < 0.001). 145 (19.8%) patients were affected by underlying diseases, hypertension (14.3%) being the most common one (Table 1). The median body temperature on admission was 36.6°C (IQR 35.6 - 37.2), and no difference in median temperature was seen between the two subgroups (mild vs. severe). Hospitalization in the severe subgroup was significantly longer than in the mild subgroup (20.0 days [IQR 15.0 - 25.0] vs. 8.0 days [IQR 5.0 - 13.0]). The majority of patients received traditional Chinese medicine (93.4%), antiviral treatments (86.7%), and antibiotics (75.3%). In this cohort, the most frequently applied oxygen therapy was UOC (78.2%). All key laboratory findings of this cohort are listed in Table 1. Additionally, an elderly cohort of patients (median age 69 years, IQR 65 - 75, range 61 - 97; 52.8% female) was included, mainly for statistical comparisons with the young cohort.

Potential Risk Factors Associated with Disease Severity in the Young Patients
In the univariable analysis, elevated levels of complement C3, SII, CRP, ASS, LDH, the ratio of CRP versus ALB, and the ratio of neutrophils versus lymphocytes were significantly associated with an increased risk for the development of a severe disease course of COVID-19 infection in the young patients (  obtained using random forest analysis suggest that complement C3 is only prognostic in young patients. Generally, the covariate importance values reveal that, while the number of relevant risk factors is larger for the elderly patients, the difference in importance between the most important risk factors and the remaining risk factors is more pronounced for the young patients. Supplementary Figures 5 to 10 show the corresponding variable importance values and PDPs obtained for all patients, irrespective of age. Many of the PDPs indicate complex influence forms of the covariates. While ASS is again associated with the largest importance values here, the importance values of complement C3 are relatively small. The latter confirms that a higher level of complement C3 is probably a risk factor associated with severity only in young patients. After obtaining the latter result, additional analyses were performed in order to investigate whether the influence of complement C3 differs between specific subgroups of young patients (see supplement section "Subgroup analysis of the influence of complement C3 in young patients" for details). The results of these analyses did not suggest any relevant dependence of the influence of complement C3 in young patients on age, gender, or comorbidities, indicating that this risk factor is relevant for young patients independent of their specific characteristics (Supplementary Figures 11-13).

Multivariable Statistical Model-Analysis for Disease Severity
The results of the multivariable analysis using logistic regression in combination with the forward selection algorithm and the AIC criterion showed that the risk for the development of a severe disease course for COVID-19 in young patients was higher for combinations of elevated levels of complement C3, increase of ASS and SII, and reduced levels of LYM, PLR and UA (Table 3). Applying the backward selection algorithm, UA was not selected in the multivariable model due to lack of importance, but gender, hypertension, thyroid related disease, and BUN were selected instead, indicating potential importance to the risk for development of a severe disease course in the young patients (Supplementary Table 2). However, a sensitivity analysis presented in the supplement (section "Model stability analysis") indicates that the results obtained using the forward selection algorithm (Table 3) may be more statistically stable. When using the BIC criterion instead of the AIC criterion, only complement C3 and ASS were shown to be relevant to the severity in these young patients (Supplementary Table 3).
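The relation between the regression coefficients and the odds ratios reported for these models can be made explicit with a short Python sketch. It uses the standard identity OR = exp(coefficient) and the complement C3 coefficient that can be recovered from Table 3 (0.506456). The helper names are hypothetical, and the scale on which the covariate entered the model (raw or standardized units) is not stated here, so "one unit" should be read with that caveat.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase of the covariate."""
    return math.exp(coefficient)


def odds_ratio_for_increase(coefficient, delta):
    """Odds ratio for an increase of `delta` units of the covariate."""
    return math.exp(coefficient * delta)


# Complement C3 coefficient as recoverable from Table 3.
c3_coefficient = 0.506456

print(round(odds_ratio(c3_coefficient), 4))  # 1.6594, matching Table 3
# A two-unit increase multiplies the odds by the squared one-unit odds ratio.
print(round(odds_ratio_for_increase(c3_coefficient, 2.0), 4))
```

A positive coefficient therefore corresponds to an odds ratio above 1 (higher risk with increasing levels), and a negative coefficient to an odds ratio below 1.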
The results of multivariable logistic regression for all patients (including young and elderly) using the AIC criterion and the forward selection algorithm differed partly from those obtained for the young patients. The risk for the development of a severe disease course in all patients was higher for increased levels of complement C3, ASS, BUN, LDH and immunoglobulin G (IgG) and decreased levels of ALB, PLR and IgA (Table 4). Note that the inclusion of complement C3 in the forward selection for all patients is very likely due only to its importance within the young cohort: in the univariable analysis complement C3 was significant only for the young patients and, as seen in Figure 1, the covariate importance value of complement C3 is large only for the young patients. As revealed by the covariate importance values obtained through the random forest analysis (Figure 1), increased levels of ASS seem to be similarly associated with the development of a severe disease course in elderly patients as well as in young patients. When using the backward selection algorithm, LYM and blood platelets were selected in addition, indicating potential relevance for disease severity (Supplementary Table 4). Using the BIC criterion together with the forward selection algorithm, ASS, BUN, ALB, and LDH were selected, where the corresponding odds ratios (Supplementary Table 5) were very similar to those in the model obtained using the AIC (Table 4). Applying the BIC criterion together with the backward selection algorithm delivered the same result.
In summary, the estimated prediction performances of the multivariable logistic regression models for all patients were better than those of the models obtained specifically for the young patients (Table 5).
The random forest, which also considers non-linear influences, performed best, and the model selected using the BIC (which included fewer covariates) performed better than that selected using the AIC. Supplementary Table 6 provides an overview of which covariates were selected in each of the models obtained using the AIC and BIC criteria with forward and backward selection.
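Why the BIC tends to select fewer covariates than the AIC can be seen from the standard definitions of the two criteria. The following Python sketch is illustrative only: the log-likelihood values are hypothetical, the sample size of 762 is that of the young cohort, and the special handling of multiply imputed data used in the study is ignored.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: penalty of 2 per parameter."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(n_obs) per parameter."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


n_obs = 762  # size of the young cohort

# Hypothetical fits: adding one covariate raises the log-likelihood by 2.
small_model = {"log_likelihood": -500.0, "n_params": 3}
large_model = {"log_likelihood": -498.0, "n_params": 4}

aic_prefers_large = aic(**large_model) < aic(**small_model)
bic_prefers_large = (bic(n_obs=n_obs, **large_model)
                     < bic(n_obs=n_obs, **small_model))

print(aic_prefers_large)  # True: the gain outweighs the penalty of 2
print(bic_prefers_large)  # False: the gain does not outweigh ln(762)
```

With 762 observations the BIC penalty per additional covariate is ln(762), roughly 6.6, compared with 2 for the AIC, so a covariate must improve the fit considerably more to be retained under the BIC.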

Discussion
COVID-19, caused by the SARS-CoV-2 virus, is a public health event that poses a serious threat to human health, and the number of COVID-19 cases in young adults is higher than expected: according to the US CDC report, almost half of the patients are younger than 65, and young patients have a substantial risk of developing a severe disease course. Facing the rapidly evolving circumstances caused by the COVID-19 pandemic, it is essential to prioritize medical resources by effectively conducting clinical stratification of young COVID-19 patients. Therefore, it is critical to identify and investigate early the potential risk factors for the development of a severe/critical disease course during COVID-19 infection. In order to shed light on potential risk factors associated with a severe disease course in young patients, this study investigated clinical, demographic, treatment, and laboratory data from a group of COVID-19 patients using a comprehensive set of modern statistical methodologies.
The univariable analysis suggested that potential risk factors for disease severity in young patients (age<=60 years; n=762) are in part different from those in elderly patients (age>60 years; n=714).
Specifically, elevated levels of complement C3 and ASS were significantly associated with higher risks for a severe disease course only in the young patients. In contrast, increased levels of ANC, AST, BUN, CK, CR, D-dimer, Myo-hemoglobin, and PLR, and decreased levels of RBC had a significant influence on this risk only for elderly patients. Even though ASS lacked statistical significance in the elderly subgroup, the covariate importance values (Figure 1) suggest that ASS is an important risk factor irrespective of patients' age. While complement C4 missed significance in both age groups, the covariate importance values and PDPs suggest that complement C3 could be a particularly strong risk factor in the young cohort, whereas its influence was not significant in elderly patients. Further analysis showed that the significance of complement C3 in young patients was independent of the common risk factors age, gender and the presence versus absence of comorbidities.
Risk factors reported previously in COVID-19 infection, such as CRP 7 , lactate dehydrogenase 3 and decreased lymphocyte counts 8 , were validated in both the young and the elderly patient cohorts. Only D-dimer and procalcitonin did not show significance in the young patient cohort. The PDPs confirmed the observed differences and revealed that the influences of many covariates on the log odds of disease severity of COVID-19 infection are strongly non-linear. However, logistic regression assumes that these influences are linear, which is likely an important reason why the random forests outperformed the multivariable logistic regression models both for the cohort of young patients and for all patients taken together.
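How a partial dependence curve exposes a non-linear influence can be sketched in a few lines of Python. This is a generic illustration and not the study's random forest analysis: `toy_model`, the covariate names `x` and `z`, and the three data rows are hypothetical, and the curve is shown on the probability scale rather than the log-odds scale.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over the data with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    # Hypothetical non-linear risk model: x has no effect below a threshold
    # of 1.0 and increases the risk above it; z has a linear effect.
    score = -1.0 + 2.0 * max(0.0, row["x"] - 1.0) + 0.5 * row["z"]
    return 1.0 / (1.0 + math.exp(-score))


rows = [{"x": 0.5, "z": 0.0}, {"x": 1.5, "z": 1.0}, {"x": 2.0, "z": 0.0}]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]

curve = partial_dependence(toy_model, rows, "x", grid)
print([round(v, 3) for v in curve])
```

The resulting curve is flat up to the threshold and rises afterwards, a shape that a single linear term in a logistic regression cannot represent.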
The complement family is an integral component of the innate immune response to viruses. It not only protects the body from infectious agents such as viruses and bacteria, but also plays a key role in promoting inflammatory processes that trigger an inflammatory cytokine storm 21 . The abnormal activation of various innate immune pathways, such as the complement system, cytokine and thrombosis pathways, is considered the driver of acute respiratory distress syndrome (ARDS) and may lead to multi-organ dysfunction 22,23 . Activation of the complement system can also be found in patients infected with coronaviruses, such as MERS-CoV, SARS-CoV-1 and SARS-CoV-2, who develop ARDS 24 . When mice with complement C3 deficiency were infected with SARS-CoV, the infiltration of neutrophils and inflammatory monocytes in the lungs was significantly reduced, and the levels of cytokines and chemokines in the lungs and serum were decreased, as was the incidence of respiratory failure. This suggests that activation of complement component C3 may aggravate SARS-CoV-related ARDS 23 . Young patients have fewer underlying diseases and a greater capacity for immune compensation. This may be the reason why many indicators, including D-dimer and procalcitonin, may not change significantly even when young patients progress to severe COVID-19.
However, young patients have more active immune function, which is why serum complement C3 may be a potential indicator of COVID-19 severity in these patients.
This study possesses limitations aside from those inherent to all retrospective cohort studies such as their lack of causal inference. First, it is a single-center study featuring a limited number of cases.
Second, the patient data were collected within 3 days after hospital admission, leading to missing data in a number of variables, which were imputed with a standard statistical approach.
In summary, this study conducted a comprehensive statistical analysis with a focus on non-linear relationships to identify risk factors and possible pathogenesis for the development of a severe disease course during COVID-19 infection in young patients. However, large-scale, multi-center analyses are needed to consolidate these findings.

Funding
This study was funded by the Natural Science Foundation of Hubei Province (No. 2019CFB641) and by the German Science Foundation (DFG-Einzelförderung HO6422/1-2 to RH).

Role of the Funder/Sponsor
The funder had no role in any activities of this study aside from providing financial support.

Competing interests
All authors declare that there is no conflict of interest to report.

Ethics, consent and permissions
The study was approved by the Ethics Committee of Wuhan No.1 Hospital (No. 202008). All methods were carried out in accordance with relevant guidelines and regulations.

Consent to publish
For this study, informed consent was obtained from all subjects or, if subjects are under 18, from a parent and/or legal guardian.

Table 3: Multivariable logistic regression models for the outcome "severe vs. mild" in young patients selected using the AIC criterion and forward selection.
Covariate            Regression coefficient   Odds ratio
Intercept            0.191987                 -
Complement C3        0.506456                 1.6594
Immunoglobulin IgA   -0.13469

Table 5: Performances of the models measured using stratified K-fold cross-validation. For each value of K, the stratified K-fold cross-validation was repeated twenty times and the results averaged. Higher values of the AUC and smaller values of the Brier score are preferable.