Development and validation of prognostic model for predicting mortality of COVID-19 patients in Wuhan, China

Novel coronavirus 2019 (COVID-19) infection is a global public health issue that has now affected more than 200 countries worldwide and caused a second pandemic wave. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pneumonia is associated with a high risk of mortality. However, prognostic factors predicting poor clinical outcomes of individual patients with SARS-CoV-2 pneumonia remain under intensive investigation. We conducted a retrospective, multicenter study of patients with SARS-CoV-2 who were admitted to four hospitals in Wuhan, China from December 2019 to February 2020. Mortality at the end of the follow-up period was the primary outcome. Factors predicting mortality were also assessed, and a prognostic model was developed, calibrated and validated. The study included 492 patients with SARS-CoV-2 who were divided into three cohorts: the training cohort (n = 237), validation cohort 1 (n = 120), and validation cohort 2 (n = 135). Multivariate analysis showed that five clinical parameters were predictive of mortality at the end of the follow-up period: advanced age [odds ratio (OR), 1.1 per year increase (p < 0.001)], increased neutrophil-to-lymphocyte ratio [NLR; OR, 1.14 per unit increase (p < 0.001)], elevated body temperature on admission [OR, 1.53 per °C increase (p = 0.005)], increased aspartate transaminase [OR, 2.47 (p = 0.019)], and decreased total protein [OR, 1.69 (p = 0.018)]. Furthermore, the prognostic model derived from the training cohort was validated in validation cohorts 1 and 2, with comparable areas under the curve (AUC) of 0.912 (training cohort), 0.928 (validation cohort 1), and 0.883 (validation cohort 2). When individual survival probabilities were assessed, the model yielded a Harrell's C index of 0.758 for the training cohort, 0.762 for validation cohort 1, and 0.711 for validation cohort 2, which were comparable with each other. A validated prognostic model was developed to assist in determining the clinical prognosis for SARS-CoV-2 pneumonia. Using this established model, individual patients categorized in the high-risk group were associated with an increased risk of mortality, whereas patients predicted to be in the low-risk group had a higher probability of survival.

Novel coronavirus 2019 (COVID-19) infection is a global public health issue that has now affected more than 200 countries worldwide and caused a second pandemic wave 1 . Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pneumonia is associated with a high risk of mortality. However, factors that predict poor clinical outcomes of individual patients with SARS-CoV-2 pneumonia remain under intensive investigation. Current studies have shown that patients with SARS-CoV-2 pneumonia exhibit a wide range of symptoms such as fever, cough, myalgia, and fatigue [2][3][4][5] . Many patients experience a mild disease course, although approximately 15-25% develop more severe disease. Progression may result in acute respiratory distress syndrome (ARDS), multiple organ failure, and death 6 . Therefore, it is of utmost importance to identify the high-risk group of patients so that prompt medical intervention can be implemented to improve clinical outcomes.
The aim of this study was to establish and validate a prognostic model for increased risk of mortality and survival time among individual patients with SARS-CoV-2 pneumonia. Our validated model stratifies patients into those with high versus low risk of death before life-threatening complications develop. This knowledge could be used to inform and justify critical patient management decisions and promote optimal use of often limited medical resources during the COVID-19 pandemic.

[...] Fig. 1). Patients were followed until 18 September 2020. Patients were divided into three cohorts: the training cohort (TC) was used to establish a prognostic model, and two validation cohorts (VC1 and VC2) were used for external validation and to assess the robustness of the model. The TC included data collected from TJH between Jan 21st and Feb 16th, 2020. The VC1 consisted of patients from RHW and WNH admitted between Jan 23rd and Feb 16th, 2020. The VC2 included patients from WPH admitted between Jan 10th and Feb 27th, 2020. The primary outcome was mortality at the end of the study period. To be included in the study, participants had to meet the following diagnostic criteria for COVID-19 pneumonia: (1) confirmed diagnosis of SARS-CoV-2 pneumonia using RT-PCR on nasopharyngeal/oropharyngeal swab samples, and (2) computerised tomography (CT) evidence of viral pneumonia, defined as COVID-19. In addition, patient clinical outcome data had to be available. Exclusion criteria were: (1) death within 24 h after hospital admission with related health records unavailable, (2) no available data on clinical outcomes, (3) suspected cases lacking a positive result for the 2019-nCoV test, and (4) refusal to participate in this study.

Methods
Following informed consent, the following data were collected on admission: age, sex, symptoms from onset to hospital admission (fever, cough, dyspnea, myalgia, rhinorrhea, arthralgia, chest pain, headache, and vomiting), comorbidities (cardiovascular disease, chronic pulmonary disease, cerebrovascular disease and chronic neurological disorders, diabetes, malignancy, and smoking), vital signs (heart rate, respiratory rate, and blood pressure), laboratory values on admission (serum hemoglobin concentration, lymphocyte counts, platelet counts, diverse protein markers), treatment regimen used for COVID-19 pneumonia (antiviral agents, antibacterial agents, corticosteroids, and interferon therapy), dates of symptom onset, admission, virus testing, and CT scan, as well as changes in patient condition and living status. All methods were carried out in accordance with relevant guidelines and regulations and with the Declaration of Helsinki.
Treatment protocol and criteria for discharge from hospital for SARS-CoV-2 pneumonia. The treatment strategy for patients with COVID-19 pneumonia was based on the guidelines of the World Health Organization (WHO) 7 , and included symptom relief, treatment of underlying diseases, prevention of superimposed bacterial infections, active prevention of complications such as sepsis and ARDS, and timely support of vital organ function. Oxygen supplementation was provided for patients with reduced O2 saturation and was administered as high-flow oxygen via nasal prongs (< 300 mmHg), non-invasive or invasive mechanical ventilation (< 200 and < 150 mmHg, respectively), or extracorporeal membrane oxygenation (ECMO) if required.
The discharge criteria for patients with SARS-CoV-2 pneumonia included one of the following: (1) haemodynamically stable and afebrile for > 3 days, (2) radiological evidence of significant resolution of pneumonia on CT scan, (3) two sequential negative results for the 2019-nCoV test at least 1 day apart, and (4) no concurrent acute medical issues requiring transfer to another medical facility.
Statistical considerations. Survival time was calculated from the date of hospital admission until death due to SARS-CoV-2 pneumonia or until the date of the last follow-up. Death due to SARS-CoV-2 pneumonia was considered an event. Continuous variables were reported as means with standard deviations (SD) for normally distributed variables and as medians with interquartile ranges (IQR) for non-normally distributed variables. Categorical variables were reported as proportions.
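The survival-time definition above can be illustrated with a minimal sketch. It assumes a Kaplan-Meier estimator as the way to summarize such censored data; the function names `survival_days` and `kaplan_meier` and the example values are hypothetical and are not taken from the study, whose analyses were run in R.

```python
from datetime import date

def survival_days(admitted, ended):
    """Days from hospital admission to death or last follow-up."""
    return (ended - admitted).days

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : survival times in days
    events : 1 if the patient died (event), 0 if censored at last follow-up
    """
    curve = []
    surv = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Patients who are censored at a given time still count as at risk at that time, which is the usual convention for this estimator.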
According to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines 8 [...] were analyzed for a possible association between the life status (deceased versus living) and survival time using the least absolute shrinkage and selection operator (LASSO) for multivariable selection 9 . An iterative process combining forward and backward selection was applied to remove non-significant covariates. At each step of the iteration, the Akaike information criterion (AIC) was used to evaluate model fit 10 . The final model was the one with the minimum AIC value. The AUC was used to evaluate the accuracy of the prediction of vital status. Model calibration was performed to ensure robustness. Cox proportional hazards regression analysis was used to evaluate the performance of the prognostic model for individual survival times. The proportional hazards assumption of the Cox model was assessed using the Schoenfeld residuals test. To validate the prognostic model, two independent validation cohorts (VC1, VC2) were analyzed with the same discrimination method and survival function. The 95% confidence intervals (CIs) were estimated via 5000 bootstrap replicates. All statistical analyses were performed using R software version 3.6. A p < 0.05 was considered statistically significant.
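The discrimination measures named in this work (AUC for vital status, Harrell's C index for survival times) and the bootstrap confidence interval can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the authors' R code: the function names are hypothetical, the interval is a simple percentile bootstrap, and pairs with tied survival times are ignored in the C index.

```python
import random

def auc(scores, labels):
    """Probability that a deceased patient (label 1) scores higher than a survivor (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def harrell_c(times, events, risks):
    """Share of comparable pairs in which the earlier death has the higher risk score."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

def bootstrap_auc_ci(scores, labels, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples lacking one class are redrawn."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With real cohort data, `scores` would hold the model output per patient and `labels` the vital status at the end of follow-up; the default of 5000 replicates mirrors the number stated in the text.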
Role of funding. The funders were not involved in any activities of this study, aside from providing financing.
Consent to publish. Informed consent was obtained from participants for the purpose of publication.

Demographic and clinical features in training and validation cohorts.
Overall, 492 patients were recruited in this study. The demographic characteristics and clinical features of patients from the TC (n = 237; TJH), VC1 (n = 120; RHWU + WNH), and VC2 (n = 135; WPH) cohorts are listed in Table 1. The mortality rates in the three cohorts were 44.3% (TC), 25.8% (VC1) and 33.3% (VC2), respectively. A total of 105 events occurred in the TC, and 31 and 45 events occurred in VC1 and VC2, respectively. The median survival times were comparable among the three cohorts (15, 17 and 14 days for TC, VC1, and VC2, respectively). TC patients had a median age of 62 (IQR 50-70) and were older than those in VC1 (median age 46, IQR 37-66), but similar in age to those in VC2 (median age 63, IQR 52-70.5). There was no significant difference in sex distribution among the three cohorts. Most patients were non-smokers (Table 1). Leucocytosis was observed in 24.5% of patients in the TC, 21.7% in the VC1, and 20.7% in the VC2. Neutrophilia was observed in 34.6% of patients in the TC, 30.8% in the VC1, and 31.1% in the VC2.
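The reported mortality rates follow directly from the event counts and cohort sizes given above, as the short consistency check below shows. The cohort sizes and event counts are taken from the text; the variable and function names are illustrative only.

```python
# Cohort sizes and numbers of deaths (events) as reported in the text.
cohorts = {
    "TC": {"n": 237, "deaths": 105},
    "VC1": {"n": 120, "deaths": 31},
    "VC2": {"n": 135, "deaths": 45},
}

def mortality_pct(n, deaths):
    """Mortality rate in percent, rounded to one decimal place."""
    return round(100.0 * deaths / n, 1)

rates = {name: mortality_pct(c["n"], c["deaths"]) for name, c in cohorts.items()}
total_patients = sum(c["n"] for c in cohorts.values())

for name, pct in rates.items():
    print(f"{name}: {pct}%")
print(f"Total patients: {total_patients}")
```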
Potential risk factors associated with vital status for COVID-19. Univariate analysis revealed that advanced age, increased body temperature on admission, and the presence of underlying diseases were associated with a higher mortality rate in patients with COVID-19 infection (Table 2). Tachypnoea and hypertension, as well as treatment with antibiotics, corticosteroids or intravenous immunoglobulin, were also associated with increased mortality (Table 2). Several laboratory parameters, including serum bilirubin, D-dimer, potassium level, prothrombin time (s), lactate dehydrogenase, aspartate transaminase (AST), and urea, were also found to be associated with an increased risk of death. In addition, patients with lymphopenia, leukocytosis, or neutrophilia had an increased risk of death (Table 2). Of note, a decreased lymphocyte count, an increased neutrophil count, and an increased NLR were also significant risk factors for mortality.
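For readers unfamiliar with the NLR, it is simply the neutrophil count divided by the lymphocyte count from the same blood sample, so neutrophilia and lymphopenia both push it upward. The sketch below uses a hypothetical helper name and made-up example counts, not patient data from this study.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR from absolute counts given in the same unit (e.g. 10^9 cells/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes

# Hypothetical example values, for illustration only.
baseline_nlr = neutrophil_lymphocyte_ratio(6.0, 1.5)
lymphopenic_nlr = neutrophil_lymphocyte_ratio(6.0, 0.5)
```

With the same neutrophil count, the lower lymphocyte count in the second call yields the higher ratio.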

Construction of a prognostic model for vital status and survival in SARS-CoV-2.
For the TC, a multivariate analysis was performed to analyze the association between vital status, survival time, and all the covariates listed in Table 1 [...] Table 2). Based on the weights (coefficients) of these five significant covariates (Table 2), a prognostic model was constructed and applied to predict the vital status of the training cohort. This analysis yielded an AUC of 0.912 (95% CI 0.878-0.947; Fig. 1A), indicating that the prognostic model was able to effectively differentiate between patients with SARS-CoV-2 pneumonia who survived and were subsequently discharged [...] the 7th, 14th, 21st, and 28th day after admission (Fig. 1B). A nomogram was constructed to assess the impact of these factors (Supplementary Fig. 2). The predicted 30-day survival rates of the high- and low-risk subgroups in the training cohort (model scores ≥ 799 and < 799, respectively) are visualized in Fig. 1C. Here, 799 represented the cutoff in the model based on the average of minimum calculated scores among deceased patients.

[...] (Fig. 2). By applying the same model score cutoff, high-risk subgroups with lower survival rates were clearly differentiated from the low-risk subgroups in both validation cohorts (HR: 11.53 [95% CI 4.01-33.15] for VC1 and HR: 9.3 [95% CI 3.32-26.03] for VC2) (Fig. 2). Of note, the predicted 30-day survival rates in the high- and low-risk subgroups in both validation cohorts were similar to the observed survival rates in the training cohort (Fig. 2), thereby confirming the strength of the model for the prognosis of SARS-CoV-2 pneumonia.

[...] Table 3]. Finally, to aid in the current clinical management of SARS-CoV-2, a web-based application (http://82.165.167.23:8734/SIMTaskMaster/SARS2_Tool) was developed to enable broad testing and utilization of the developed prognostic model (Supplementary Fig. 3).
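Two ideas in this section can be made concrete with a small sketch: how a per-unit odds ratio relates to a model weight (coefficient), and how a fixed score cutoff splits patients into risk groups. The odds ratios are those reported in the abstract and the cutoff of 799 is the one stated above; everything else is an assumption for illustration. In particular, the formula that maps a patient's five parameters to the model score is not given in the text, so `risk_group` only applies the cutoff to a score that has already been computed elsewhere (for example by the nomogram).

```python
import math

# Per-unit odds ratios as reported in the abstract.
ODDS_RATIOS = {
    "age_per_year": 1.10,
    "nlr_per_unit": 1.14,
    "temperature_per_degC": 1.53,
}

def log_odds_coefficient(odds_ratio):
    """In a logistic model, the coefficient of a covariate is ln(OR)."""
    return math.log(odds_ratio)

def odds_ratio_for_increase(odds_ratio, units):
    """Odds ratio for an increase of `units`, assuming a linear effect on the log-odds."""
    return math.exp(log_odds_coefficient(odds_ratio) * units)

def risk_group(score, cutoff=799):
    """Assign the high-risk group when the model score reaches the cutoff."""
    return "high" if score >= cutoff else "low"

# Under the linearity assumption, ten additional years of age multiply
# the odds of death by 1.1 ** 10.
ten_year_or = odds_ratio_for_increase(ODDS_RATIOS["age_per_year"], 10)
```

The compounding shows why a modest-looking per-year odds ratio can still correspond to a substantial difference between patients several decades apart in age.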

Discussion
In this retrospective multicenter study of 492 hospitalised patients with SARS-CoV-2 pneumonia, we found that advanced age, high body temperature on admission, high NLR, elevated AST, and decreased total protein were associated with an increased risk of mortality. The prognostic model based on these five clinical parameters was robustly validated using two separate validation cohorts. The aim of applying the model is the early identification and prioritization of individual patients who require early administration of intensive treatment strategies.
The rapid transmission of the disease and the current second wave of the COVID-19 pandemic have created a public crisis on a global scale. To avoid overwhelming public health systems and exacerbating economic burdens, strategies to overcome this pandemic are being vigorously explored, including studies aimed at identifying the populations at greatest risk. Prior studies have reported various potential risk factors associated with mortality in the setting of SARS-CoV-2 pneumonia 4,5 . For instance, Chen and colleagues found that age, obesity, and comorbidity were three identifiable risk factors for mortality 5 . Wang and colleagues observed neutrophilia, lymphopenia, and elevated D-dimer and creatinine levels in non-survivors, implying that a cellular immune deficiency plus coagulation activation could potentially mediate disease severity 11 . In our study, the association between increased NLR and mortality suggests that altered immune cell function plays a critical role in the pathogenesis of SARS-CoV-2 pneumonia. This result is consistent with several recent independent studies [12][13][14] .
Advanced age has also been identified as an independent risk factor for COVID-19 5,15,16 . The underlying mechanisms could include changes in the anatomical structure of the respiratory system with aging 17 , immunosenescence 18 , and inflammaging 19 , which would, respectively, facilitate entry of SARS-CoV-2, weaken anti-viral immunity, and promote a cytokine storm, leading to multiple organ damage. Further, age-related alterations in metabolism are known to underlie changes in innate and adaptive immunity 20 , which also contribute to the weakening of immunity. Our finding that increased NLR in elderly patients is a major risk factor for mortality supports the role of inflammaging in COVID-19 pathogenesis.
Aspartate transaminase (AST) is an important clinical marker for the early diagnosis of various diseases, including the progression and/or metastatic potential of solid tumors 21,22 . Further, AST is an important enzyme involved in diverse metabolic pathways including purine metabolism 23 , steroid biosynthesis 24 , and synthesis of amino acids such as arginine 25 , phenylalanine 26 , tyrosine 27 , and others 28 . Thus, elevated serum AST levels are considered an indicator of metabolic dysfunction. Furthermore, hypoalbuminaemia, manifested in our study as reduced total protein, is often related to malnutrition, and recent studies have shown that diminished availability of metabolic nutrients directly leads to changes in immune responses [29][30][31][32] .
Because SARS-CoV-2 replication and pathogenesis are highly dependent on the host metabolism 33 , the decreased total protein strongly suggests a heightened viral burden and predicts a severe disease course. In total, the poorer outcomes observed in our patients with elevated AST levels and decreased total protein could be related to age- and/or virus-induced metabolic dysfunction in these individuals.

Lastly, an increase in body temperature is one clinical manifestation of pro-inflammatory cytokine production (e.g., TNFα, TNFβ, IL-1β) by activated macrophages and T-lymphocytes. Dysregulated production of such cytokines can lead to a "cytokine storm" that ultimately damages vital organs, including the lungs, contributing to ARDS. Dysregulated and sustained TNFα production in response to other viral infections (HIV/AIDS) also mediates cachexia (muscle wasting) and is characterized by changes in total protein 32 . Thus, an elevated body temperature in patients at risk for mortality from COVID-19 pneumonia may reflect an aberrant cytokine response to SARS-CoV-2 infection.
In sum, the five clinical parameters that, when combined, predict mortality in patients with COVID-19 pneumonia in our model are reflective of the status of host immunity. Further, because SARS-CoV-2, like other coronaviruses, does not possess its own metabolism, viral replication and pathogenesis are highly dependent on the host metabolism. Hijacking the host metabolism remains the only means of viral survival.
Our study has several strengths. First, while several studies have previously reported relevant risk factors associated with SARS-CoV-2 pneumonia 34-36 , our study combined such factors into a robust and validated prognostic model for the outcome of COVID-19 infection. Second, our model utilizes five commonly used clinical parameters that are routinely obtained on hospital admission and are not confounded by prior treatment, since treatment has not yet been initiated at that point. Third, our study involved a large number of patients and the prognostic model was fully validated with two large, independent external cohorts. Fourth, the model was also validated for age-specific cohorts. Specifically, the high AUC and C-indices for the prediction of vital status and survival in patients aged 50-70 years versus > 70 years indicate the suitability of the prognostic model for elderly patients. In total, this established prognostic model can assist clinicians in the identification and stratification of high-risk patients, thereby promoting the initiation of vital treatment strategies that can improve outcomes.
The major limitation of this study is that the model was developed and validated purely on the basis of a Chinese population. Therefore, its applicability to regions outside of China needs to be further determined. We speculate that the model could still reach a high prediction rate; however, the optimal model score cutoff of 799 might need to be adjusted correspondingly to cover a broader spectrum of disease trajectories. Further, our prognostic model excluded gender and presence of comorbidity due to low statistical significance. One of the reasons might be that this study was not able to include COVID-19 patients outside of China. Moreover, another study by our group has shown that the presence of comorbidity was an age-dependent risk factor for COVID-19 infection (under review).
In addition, owing to the observational nature of the study, potential confounders may exist that could affect the results. Therefore, further prospective international multicenter studies are needed to test the robustness of this model.

Conclusion
In this retrospective multicenter cohort study, a prognostic model was developed and validated to predict the outcome of individual patients suffering from SARS-CoV-2 pneumonia. We identified five common clinical parameters that are relevant to the outcome of COVID-19 infection. This model enables clinical patient stratification to efficiently prioritize medical resources in the treatment and management of patients with SARS-CoV-2 pneumonia. The model's clinical application may also inform treatment recommendations to save more lives in a high-risk group of patients while avoiding overtreatment in those at lower risk.

Data availability
The data that support the findings of this study are available on request from the corresponding author.