The sudden increase in COVID-19 cases is putting high pressure on healthcare services worldwide. At this stage, fast, accurate and early clinical assessment of the disease severity is vital. To support decision making and logistical planning in healthcare systems, this study leverages a database of blood samples from 485 infected patients in the region of Wuhan, China, to identify crucial predictive biomarkers of disease mortality. For this purpose, machine learning tools selected three biomarkers that predict the mortality of individual patients more than 10 days in advance with more than 90% accuracy: lactic dehydrogenase (LDH), lymphocyte and high-sensitivity C-reactive protein (hs-CRP). In particular, relatively high levels of LDH alone seem to play a crucial role in distinguishing the vast majority of cases that require immediate medical attention. This finding is consistent with current medical knowledge that high LDH levels are associated with tissue breakdown occurring in various diseases, including pulmonary disorders such as pneumonia. Overall, this Article suggests a simple and operable decision rule to quickly predict patients at the highest risk, allowing them to be prioritized and potentially reducing the mortality rate.
Outbreaks of the COVID-19 epidemic have been causing worldwide health concerns since December 2019. The virus causes fever, cough, fatigue and mild to severe respiratory complications, which, in the most severe cases, can lead to death. By 6 March 2020, 98,192 cumulative cases of infection and 3,045 deaths had been reported worldwide1. On 11 March 2020, the World Health Organization declared the outbreak a pandemic2. So far, it has been reported that 13.8–19.1% of COVID-19-infected patients in Wuhan, China, became severely ill3,4,5. Furthermore, recent reports have revealed a case fatality rate of 61.5% for critical cases, which increases sharply with age and in patients with underlying comorbidities6. The severity of cases is putting great pressure on medical services, leading to a shortage of intensive care resources.
Unfortunately, there is no currently available prognostic biomarker to distinguish patients that require immediate medical attention and to estimate their associated mortality rate. The capacity to identify cases that are at imminent risk of death has thus become an urgent yet challenging necessity. Under these circumstances, we retrospectively analysed the blood samples of 485 patients from the region of Wuhan, China, to identify robust and meaningful markers of mortality risk. A mathematical modelling approach based on state-of-the-art interpretable machine learning algorithms was devised to identify the most discriminative biomarkers of patient mortality. The problem was formulated as a classification task, where the inputs included basic information, symptoms, blood samples and the results of laboratory tests, including liver function, kidney function, coagulation function, electrolytes and inflammatory factors, taken from originally general, severe and critical patients (Table 1), as well as their associated outcomes corresponding to either survival or death at the end of the examination period. Through optimization, this classifier aims to reveal the most crucial biomarkers distinguishing patients at imminent risk, thereby relieving clinical burden and potentially reducing the mortality rate.
Medical records were collected by using standard case report forms that included epidemiological, demographic, clinical, laboratory and mortality outcome information (Table 2 and Supplementary Data 1). The clinical outcomes were followed up to 24 February 2020. The study was approved by the Tongji Hospital Ethics Committee.
The medical information of all patients collected between 10 January and 18 February 2020 was used for model development. Data originating from pregnant and breast-feeding women, patients younger than 18 years and records with less than 80% complete data were excluded from subsequent analysis. Among the 375 remaining patients, fever was the most common initial symptom (49.9%), followed by cough (13.9%), fatigue (3.7%) and dyspnoea (2.1%). The age of the patients was 58.83 ± 16.46 years, and 59.7% were male. The epidemiological history included Wuhan residents (37.9%), familial cluster (6.4%) and health workers (1.9%). The laboratory results are shown in Table 2. Of the 375 cases included in the subsequent analysis, 201 recovered from COVID-19 and were discharged from the hospital, while the remaining 174 died. Following this, 110 patients newly discharged or deceased between 19 February 2020 and 24 February 2020 were enrolled for analysis as an external test dataset.
The minimal, maximal and median follow-up times (from admission to hospital to death or discharge) for all 485 (375 + 110) patients are 0 days 02:01:58 (hours: minutes: seconds), 35 days 04:05:54 and 11 days 04:15:36, respectively. The high mortality rate seen in our study was related to the fact that Tongji Hospital admitted a higher rate of severe and critical cases in Wuhan. A patient’s severity was empirically assessed by medical doctors according to the criteria in Table 1 only at admission7. Figure 1 summarizes the outcome of patients in three different classes.
Development of a machine learning model
Most patients had multiple blood samples taken throughout their stay in hospital. However, the model training and testing uses only the data from the final sample as inputs to the model to assess the crucial biomarkers of disease severity, distinguish patients that require immediate medical assistance and accurately match corresponding features to each label. Nevertheless, the model can be applied to all other blood samples and the predictive potential of the identified biomarkers estimated (see Estimation of the prediction horizon section). Missing data were ‘−1’ padded. The model output corresponds to patient mortality. Patients that survived were assigned to class 0 and those that died to class 1.
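The input preparation described above (keep only the final blood sample, pad missing values with −1) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`time`, `values`), the helper name `last_sample_features` and the example numbers are all assumptions made for the sketch.

```python
from datetime import datetime

MISSING = -1.0  # sentinel used to pad missing measurements, as described in the text


def last_sample_features(samples, feature_names):
    """Return the feature vector of the most recent sample, padding gaps with -1."""
    latest = max(samples, key=lambda s: s["time"])
    return [
        MISSING if latest["values"].get(name) is None else float(latest["values"][name])
        for name in feature_names
    ]


# Hypothetical patient with two blood samples (values are made up)
samples = [
    {"time": datetime(2020, 2, 1, 8, 0), "values": {"LDH": 300.0, "hs-CRP": 20.0}},
    {"time": datetime(2020, 2, 5, 9, 30), "values": {"LDH": 410.0, "lymphocytes": 12.5}},
]
features = last_sample_features(samples, ["LDH", "lymphocytes", "hs-CRP"])
print(features)  # [410.0, 12.5, -1.0]
```

Note that only the latest sample is used: the hs-CRP value from the earlier sample is not carried forward, so it is padded with −1.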
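The performance of the models was evaluated by assessing the classification accuracy (ratio of true predictions over all predictions), the precision, sensitivity/recall and F1 scores (defined below in their standard per-class form):
\[ \mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{C} TP_i, \qquad \mathrm{Precision}_i = \frac{TP_i}{TP_i + FP_i}, \qquad \mathrm{Recall}_i = \frac{TP_i}{TP_i + FN_i}, \qquad \mathrm{F1}_i = \frac{2\,\mathrm{Precision}_i\,\mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \]
where \(i \in \{1, \ldots, C\}\) indexes the class, N is the number of all samples, C is the number of all classes, \(N_i\) is the number of samples in class i, and \(TP_i\), \(TN_i\), \(FP_i\) and \(FN_i\) stand for the numbers of true positives, true negatives, false positives and false negatives for class i, respectively. In total, 75 features were considered.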
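The scores named above (accuracy, precision, recall and F1, together with macro and weighted averages) can be computed from true and predicted labels as in the sketch below. It uses the standard textbook definitions; the function name and the toy labels are assumptions for illustration, not data from the study.

```python
def per_class_scores(y_true, y_pred, classes=(0, 1)):
    """Compute per-class precision/recall/F1 plus accuracy and averaged F1."""
    n = len(y_true)
    scores = {}
    correct = 0
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
        correct += tp  # correctly classified samples of class c
    accuracy = correct / n
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(classes)
    weighted_f1 = sum(s["support"] / n * s["f1"] for s in scores.values())
    return scores, accuracy, macro_f1, weighted_f1


# Toy labels (0 = survived, 1 = died); purely illustrative
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 1, 0, 1, 0]
scores, accuracy, macro_f1, weighted_f1 = per_class_scores(y_true, y_pred)
print(accuracy)  # 0.875
```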
This study uses a supervised XGBoost classifier8 as the predictor model. XGBoost is a high-performance machine learning algorithm that benefits from great interpretability potential due to its recursive tree-based decision system. In contrast, internal model mechanisms of black-box modelling strategies are typically difficult to interpret. The importance of each individual feature in XGBoost is determined by its accumulated use in each decision step in trees. This computes a metric characterizing the relative importance of each feature, which is particularly valuable to estimate features that are the most discriminative of model outcomes, especially when they are related to meaningful clinical parameters.
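One way to read "accumulated use in each decision step" is a split count: how often each feature is chosen as a split across all trees, normalized to sum to one. The sketch below implements that reading on hand-written toy trees. It is an assumption that this matches the importance type used in the study (XGBoost also offers gain- and cover-based importances), and the nested-dict tree format and feature names are hypothetical.

```python
from collections import Counter


def split_counts(node, counts=None):
    """Count how often each feature is used as a split in one tree."""
    counts = Counter() if counts is None else counts
    if "feature" in node:  # internal node; leaves carry no "feature" key
        counts[node["feature"]] += 1
        split_counts(node["left"], counts)
        split_counts(node["right"], counts)
    return counts


def relative_importance(trees):
    """Normalize accumulated split counts over an ensemble of trees."""
    total = Counter()
    for tree in trees:
        split_counts(tree, total)
    n_splits = sum(total.values())
    return {f: c / n_splits for f, c in total.items()} if n_splits else {}


# Two made-up trees: feature_a is used twice, feature_b once
TOY_TREES = [
    {"feature": "feature_a", "left": {"leaf": -0.2},
     "right": {"feature": "feature_b", "left": {"leaf": 0.1}, "right": {"leaf": 0.4}}},
    {"feature": "feature_a", "left": {"leaf": -0.1}, "right": {"leaf": 0.3}},
]
importance = relative_importance(TOY_TREES)
```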
XGBoost was originally trained with the following default parameter settings: maximum depth equal to 4, learning rate equal to 0.2, number of tree estimators set to 150, value of the regularization parameter α set to 1 and ‘subsample’ and ‘colsample_bytree’ both set to 0.9 to prevent overfitting for cases with many features and small sample size8. We refer to it as the ‘Multi-tree XGBoost algorithm’.
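For readers who want to reproduce this configuration, the settings can be written down as keyword arguments for the scikit-learn style `XGBClassifier`. This is a sketch under assumptions: the mapping of the regularization parameter α to `reg_alpha` is our reading of the text, and the helper `build_classifier` is hypothetical. It returns `None` when the `xgboost` package is not installed.

```python
# Settings quoted in the text, expressed with XGBClassifier keyword names.
# Assumption: the regularization parameter "alpha" corresponds to reg_alpha.
MULTI_TREE_PARAMS = {
    "max_depth": 4,
    "learning_rate": 0.2,
    "n_estimators": 150,
    "reg_alpha": 1,
    "subsample": 0.9,
    "colsample_bytree": 0.9,
}


def build_classifier(params):
    """Return an XGBClassifier if xgboost is available, otherwise None."""
    try:
        from xgboost import XGBClassifier
    except ImportError:
        return None
    return XGBClassifier(**params)


model = build_classifier(MULTI_TREE_PARAMS)
```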
Feature importance for an operable decision tree
To evaluate the markers of imminent mortality risk, we assessed the contribution of each patient parameter to decisions of the algorithm. Features were ranked by Multi-tree XGBoost according to their importance (Supplementary Figs. 1 and 2 and Supplementary algorithm 1). The performances of the model showed no improvement in area under the curve (AUC) scores when the number of top features increased to four. Hence, the number of key features was set to the following three: lactic dehydrogenase (LDH), lymphocytes and high-sensitivity C-reactive protein (hs-CRP).
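The stopping rule described here (stop adding top-ranked features once the AUC no longer improves) can be sketched as a small helper. The actual procedure is given in Supplementary algorithm 1, which is not reproduced here, so the function below is only an assumed reading of it, and the AUC values are placeholders chosen to illustrate the logic, not results from the study.

```python
def smallest_sufficient_k(auc_by_k, tolerance=0.0):
    """Smallest k for which adding one more feature does not raise AUC by more than tolerance."""
    ks = sorted(auc_by_k)
    for k, k_next in zip(ks, ks[1:]):
        if auc_by_k[k_next] - auc_by_k[k] <= tolerance:
            return k
    return ks[-1]  # AUC kept improving over the whole range


# Placeholder AUC scores per number of top-ranked features (NOT the paper's values)
placeholder_auc = {1: 0.90, 2: 0.94, 3: 0.97, 4: 0.97}
chosen = smallest_sufficient_k(placeholder_auc)
print(chosen)  # 3
```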
Table 3 summarizes the performances of the Multi-tree XGBoost model. The results show that the model is able to accurately identify the outcome of patients, regardless of their original diagnosis upon hospital admission. Notably, the performance of the external test set (detailed below) is similar to that of the training and validation sets, which suggests that the model captures the key biomarkers of patient mortality. The set of selected features is represented graphically for each patient in Supplementary Fig. 3, demonstrating a clear separability. Table 3 further emphasizes the importance of LDH as a crucial biomarker for patient mortality rate.
Development of a clinically operable decision tree
Following previous findings on the importance of LDH, lymphocytes and hs-CRP, we aimed to construct a simplified and clinically operable decision model. XGBoost algorithms are based on recursive decision tree building from past residuals and can identify those trees that contribute the most to the decision of the predictive model. Decision trees are simple classifiers consisting of sequences of binary decisions organized hierarchically. Hence, if the accuracy of a tree remains high, reducing the complexity of the model to such a structure has the potential to reveal a clinically portable decision algorithm. In the following, we refer to the latter as an ‘interpretable model’ or ‘single-tree XGBoost’.
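To make "sequences of binary decisions organized hierarchically" concrete, the sketch below evaluates a small decision tree stored as nested dictionaries. The tree layout, the marker names and the thresholds are invented for illustration only; they are not the tree shown in Fig. 2.

```python
def predict(node, sample):
    """Walk a binary decision tree and return the class label of the reached leaf."""
    while "label" not in node:
        branch = "below" if sample[node["feature"]] < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["label"]


# Toy tree with made-up markers and thresholds (not the clinical rule from the study)
TOY_TREE = {
    "feature": "marker_a", "threshold": 10.0,
    "below": {"label": 0},
    "at_or_above": {
        "feature": "marker_b", "threshold": 5.0,
        "below": {"label": 0},
        "at_or_above": {"label": 1},
    },
}

high_risk = predict(TOY_TREE, {"marker_a": 12.0, "marker_b": 7.0})
low_risk = predict(TOY_TREE, {"marker_a": 3.0, "marker_b": 99.0})
```

Each prediction needs at most two comparisons, which is what makes such a structure usable as a bedside rule.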
There were 24 patients with incomplete measurements for at least one of the three principal biomarkers in their last blood samples, leaving 351 patients to identify a single-tree XGBoost model. To identify the model, XGBoost was re-trained with the same parameters as described above, except for the following: number of tree estimators set to 1, values of the regularization parameters α and β both set to 0, and the subsample and max features both set to 1 as overfitting issues have been avoided based on previous modelling8. The interpretable decision tree was obtained by a random split of the 351 patients to training and validation datasets in the ratio 7:3. The resulting tree structure and performances are shown, respectively, in Fig. 2 and Supplementary Tables 1 and 2.
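A random 7:3 split of the 351 patients can be sketched with the standard library alone. The seed, the rounding rule and the absence of stratification by outcome are assumptions for this sketch; the text does not specify how the authors drew their split.

```python
import random


def split_70_30(patient_ids, seed=0):
    """Shuffle patient identifiers and split them 7:3 into training and validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # local generator keeps the split reproducible
    cut = round(0.7 * len(ids))
    return ids[:cut], ids[cut:]


train_ids, valid_ids = split_70_30(range(351))
print(len(train_ids), len(valid_ids))  # 246 105
```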
In addition, the performances of the interpretable model were estimated for the external test set on the latest blood samples of 110 patients, which were not part of the training or validation of the Single-tree XGBoost model (Table 4). The associated confusion matrix is presented in Supplementary Fig. 5, which shows 100% survival prediction accuracy and 81% mortality prediction accuracy. Overall, the scores for survival and death prediction, accuracy, macro and weighted averages are consistently over 0.90.
Finally, for benchmark purposes, the performances of the interpretable model were compared with other standard methods such as random forest and logistic regression9. The receiver operating characteristic curves and AUC scores are shown in Supplementary Table 3 and Supplementary Fig. 4.
Estimation of the prediction horizon
Most patients had multiple blood samples taken throughout their hospital stay. In total, there were 909 blood samples with complete measurements of these three features for all 485 patients used for training and validation, and 251 blood samples with complete measurements of these three features for the 110 patients in the external test set. The predictive potential of our model was evaluated on all blood tests for all 485 patients and 110 patients in the external test dataset (Fig. 3 and Supplementary Figs. 6 and 7). On average, the accuracy of our algorithm was 90%, further showing that the model could be applied to any blood sample, including those that were taken far ahead of the day of primary clinical outcome. On average, the model could predict the outcome of all true positive patients at about 10 days (11 days for patients in the external test set) in advance of outcome using all their blood samples (Fig. 3b,c). The model can even predict 18 days in advance with a cumulative accuracy above 90% (Fig. 3d,e). The accuracy of the prediction increases closer to the patient’s outcome. This prediction horizon analysis suggests that, where a patient’s condition deteriorates, the clinical route is able to give an early warning to clinicians a few days in advance.
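A prediction horizon analysis of this kind can be sketched by scoring every blood sample and grouping the results by how many days before the outcome the sample was taken. The definition of cumulative accuracy used below (accuracy over all samples taken at most d days before outcome) is an assumption, since the text does not spell it out, and the records are made-up examples.

```python
def cumulative_accuracy(records, max_days):
    """Accuracy over samples taken within d days of outcome, for d = 0..max_days.

    Each record is (days_before_outcome, predicted_class, actual_class).
    """
    curve = {}
    for d in range(max_days + 1):
        within = [(pred, actual) for days, pred, actual in records if days <= d]
        if within:
            curve[d] = sum(pred == actual for pred, actual in within) / len(within)
    return curve


# Made-up records: one early sample of a patient who later died is misclassified
records = [(0, 1, 1), (1, 1, 1), (3, 0, 1), (3, 0, 0), (7, 0, 0)]
curve = cumulative_accuracy(records, max_days=7)
print(curve[0], curve[3], curve[7])  # 1.0 0.75 0.8
```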
The significance of our work is twofold. First, it goes beyond providing high-risk factors4. It provides a simple and intuitive clinical test to precisely and quickly quantify the risk of death. For example, a routine sequential respiratory support therapy for patients with SpO2 below 93% comprises intranasal catheterization of oxygen, oxygen supply through a mask, high-flow oxygen supply through a nasal catheter, non-invasive ventilation support, invasive ventilation support and extracorporeal membrane oxygenation. Predicting that this sequential oxygen therapy will lead to unsatisfactory therapeutic effects for some patients could prompt physicians to pursue different approaches. The goal is for the model to identify high-risk patients before irreversible consequences occur. Second, the three key features, LDH, lymphocytes and hs-CRP, can be easily collected in any hospital. In crowded hospitals, and with shortages of medical resources, this simple model can help to quickly prioritize patients, especially during a pandemic when limited healthcare resources have to be allocated10.
The increase of LDH reflects tissue/cell destruction and is regarded as a common sign of tissue/cell damage. Serum LDH has been identified as an important biomarker for the activity and severity of idiopathic pulmonary fibrosis11. In patients with severe pulmonary interstitial disease, the increase of LDH is significant and is one of the most important prognostic markers of lung injury11. For critically ill patients with COVID-19, the rise in LDH level indicates an increase of the activity and extent of lung injury.
The increase of hs-CRP, an important marker for poor prognosis in acute respiratory distress syndrome12,13, reflects a persistent state of inflammation14. The result of this persistent inflammatory response is large grey-white lesions in the lungs of patients with COVID-19 (seen in autopsy)15. In tissue sections, a large amount of sticky secretion is also seen overflowing from the alveoli15.
Finally, our results also suggest that lymphocytes may serve as a potential therapeutic target. This hypothesis is supported by the results of clinical studies4,16. Lymphopenia is a common feature in patients with COVID-19 and might be a critical factor associated with disease severity and mortality17. Injured alveolar epithelial cells could induce the infiltration of lymphocytes, leading to persistent lymphopenia, as was seen in SARS-CoV-2 and MERS-CoV (they share similar alveolar penetrating and antigen presenting cell (APC) impairing pathways)18,19. A biopsy study has provided strong evidence of substantially reduced counts of peripheral CD4 and CD8 T cells, while their status was hyperactivated20. Also, Jing and colleagues have reported that the lymphopenia is mainly related to the decrease in CD4 and CD8 T cells21. It is thus likely that lymphocytes play distinct roles in COVID-19, which deserves further investigation.
This study has room for further improvement, which is left for future work. First, given that the proposed machine learning method is purely data-driven, our model may vary if starting from different datasets. As more data become available, the whole procedure can easily be repeated to obtain more accurate models. This is a single-centred, retrospective study, which provides a preliminary assessment of the clinical course and outcome of patients. We look forward to subsequent large-sample and multi-centred studies. Second, although we had a pool of more than 70 clinical features, our modelling principle is a trade-off between having a minimal number of features and the capacity of good prediction, therefore avoiding overfitting. Finally, this study strikes a balance between model interpretability and improved accuracy. Although clinical settings tend to prefer interpretable models, it is possible that a black-box model may lead to improved performance.
In summary, this study has identified three indicators (LDH, hs-CRP and lymphocytes), together with a clinical route (Fig. 2), for COVID-19 prognostic prediction. We have developed an XGBoost machine learning-based model that can predict the mortality rates of patients more than 10 days in advance with more than 90% accuracy, enabling detection, early intervention and potentially a reduction of mortality in patients with COVID-19.
Further information on research design is available in the Nature Research Reporting Summary linked to this Article.
World Health Organization. Coronavirus Disease 2019 (COVID-19) Situation Report 46, 6 March 2020 (2020); https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200306-sitrep-46-covid-19.pdf
World Health Organization. Coronavirus Disease 2019 (COVID-19) Situation Report 68, 28 March 2020 (2020); https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200328-sitrep-68-covid-19.pdf
Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020).
Chen, N. et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet 395, 507–513 (2020).
Novel Coronavirus Pneumonia Emergency Response Epidemiology Team. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China. Zhonghua Liu Xing Bing Xue Za Zhi 41, 145–151 (2020).
Yang, X. et al. Clinical course and outcomes of critically ill patients with SARS CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Resp. Med. 8, 475–481 (2020).
Diagnosis and Treatment of Pneumonia Infected by the New Novel Coronavirus (the trial fifth edition) Medical Letter from the National Health Office (National Health Commission of the People’s Republic of China, 2020).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
Lundberg, S. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
Truog, R. D., Mitchell, C. & Dalley, G. Q. The toughest triage—allocating ventilators in a pandemic. N. Engl. J. Med. https://doi.org/10.1056/NEJMp2005689 (2020).
Kishaba, T., Tamaki, H., Shimaoka, Y., Fukuyama, H. & Yamashiro, S. Staging of acute exacerbation in patients with idiopathic pulmonary fibrosis. Lung 192, 141–149 (2014).
Ridker, P. M. et al. Rosuvastatin to prevent vascular events in men and women with elevated C-reactive protein. N. Engl. J. Med. 359, 2195–2207 (2008).
Sharma, S. K. et al. Aetiology, outcomes & predictors of mortality in acute respiratory distress syndrome from a tertiary care centre in North India. Indian J. Med. Res. 143, 782–792 (2016).
Bajwa, E. K. et al. Plasma C-reactive protein levels are associated with improved outcome in ARDS. Chest 136, 471–480 (2009).
Liu, X. et al. A general report on the systematic anatomy of COVID-19. J. Forensic Med. 36, 1–3 (2020).
Wang, D. et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA 323, 1061–1069 (2020).
Chan, J. F. et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet 395, 514–523 (2020).
Li, F., Li, W., Farzan, M. & Harrison, S. C. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science 309, 1864–1868 (2005).
Ge, X. Y. et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538 (2013).
Xu, Z. et al. Pathological findings of COVID-19 associated with acute respiratory distress syndrome. Lancet Resp. Med. 8, 420–422 (2020).
Liu, J. et al. Longitudinal characteristics of lymphocyte responses and cytokine profiles in the peripheral blood of SARS-CoV-2 infected patients. EBioMedicine 55, 102763 (2020).
We would like to dedicate this paper to those who have devoted their lives to the battle with coronavirus.
The authors declare no competing interests.
Yan, L., Zhang, HT., Goncalves, J. et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell 2, 283–288 (2020). https://doi.org/10.1038/s42256-020-0180-7