Introduction

Patients who undergo open-heart surgery are commonly admitted to the intensive care unit (ICU) and face various complications such as acute kidney injury (AKI), sepsis, septic shock, chronic kidney disease (CKD), pneumonia, thrombocytopenia, and inflammatory responses1,2,3,4,5,6,7. Lysak et al8 showed that because AKI and CKD are prevalent and generate high expenditure, early diagnosis is necessary to prevent these comorbidities from deteriorating. Early prediction of comorbidities in critically ill patients after cardiac surgery is therefore vital for patients' prognosis and for doctors' decision making.

Compared with traditional clinical prediction models built with logistic regression (LR), machine learning prediction models offer higher accuracy and robustness. Traditional algorithms such as LR require researchers to manually select independent variables X that are highly related to the outcome, whereas modern machine learning algorithms can learn the relationship between X and Y automatically. Many researchers have constructed prediction models for patients who underwent cardiac surgery. Meyer et al9 used a recurrent neural network (RNN) to predict mortality, bleeding, and renal failure after heart surgery. Lei et al10, Tseng et al11, and Lee et al12 used acute kidney injury (AKI) as their primary outcome, and Kilic et al13 considered prolonged ventilation and reoperation as prediction objectives. Most of this work focused on the most common complications after cardiac surgery, including AKI and sepsis; research on other comorbidities such as septic shock, liver dysfunction, and severe thrombocytopenia has been limited.

Vardon-Bounes et al14 reported that thrombocytopenia, with a prevalence of 50%, is a common hemostatic disorder in the ICU and is associated with bleeding, high illness severity, organ failure, and poor prognosis15. Moreover, Kunutsor et al16 demonstrated that alanine transaminase (ALT) and aspartate transaminase (AST), as indicators of liver dysfunction, are inversely associated with coronary heart disease (CHD) and positively associated with stroke. In addition, Ambrosy et al17 showed that higher ALT and AST levels are associated with lower survival rates. Elevated transaminases often indicate that the body is in a state of hypoperfusion or hypoxemia; without timely intervention, patients may experience adverse outcomes such as AKI or even death. Font et al18 noted that during septic shock the body produces large amounts of inflammatory cytokines, causing multiple organ failure, including septic cardiomyopathy, acute respiratory distress syndrome, septic encephalopathy, and other complications. Therefore, early prediction of septic shock is particularly important to limit further deterioration of the patient's condition.

Therefore, in this study, we aimed to build multiple machine learning models to predict several prognostic outcomes after open-heart surgery. Our primary outcomes were all-cause 30-day mortality, septic shock, severe thrombocytopenia, and liver dysfunction (abnormal AST and ALT).

Results

Study population

Of the 6844 patients who underwent heart surgery, 5475 (80%) were randomly assigned to the training set and 1369 (20%) to the test set. Table 1 shows the differences in characteristics between the two sets; most variables showed no significant differences. Among the 6844 patients enrolled in this study, 219 (3.1%) died within 30 days after heart surgery. Septic shock, liver dysfunction, and severe thrombocytopenia occurred in 32 (0.5%), 248 (3.6%), and 202 (2.9%) patients, respectively. Table 2 shows that most input variables differed significantly between positive samples (ill patients) and negative samples (normal patients) (P < 0.05).

Table 1 Baseline characteristics and variables 1.
Table 2 Baseline characteristics and variables 2.

Machine learning models’ performance

The accuracy, area under the curve (AUC), F1 score, precision, and recall of the four models for all complications are shown in Table 3, and the ROC curves of the 4 primary outcomes are plotted in Fig. 1.

Table 3 Model evaluation.
Figure 1 ROC curves of the 4 outcomes.

XGBoost achieved the highest AUC and F1 score for every outcome (septic shock: AUC 0.99, F1 score 0.70; 30-day mortality: AUC 0.88, F1 score 0.58; thrombocytopenia: AUC 0.88, F1 score 0.55; liver dysfunction: AUC 0.89, F1 score 0.40), indicating that it was the most robust model.

Compared with the other algorithms, XGBoost showed better overall performance in terms of AUC, test accuracy, and F1 score. In Fig. 2, decision curve analysis shows that, in terms of net benefit, XGBoost and RF were better than LR and ANN, and XGBoost was slightly better than RF. Therefore, we selected the XGBoost model as the final model of this study and, based on the XGBoost model files, built a Windows 10 application to present our results, as shown in Fig. 3; the download link is available at https://github.com/Zhihua-PredictionModel/ML-Prediction-Model. The source files of the XGBoost models for the 4 outcomes, created with Scikit-learn, were also uploaded to the repository. Other researchers or programmers can load these trained model files (the ".model" files can be loaded with joblib, a Python package) for their own customized prediction tasks.
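As a minimal sketch of how such a ".model" file could be reused, the snippet below loads a saved classifier with joblib and scores a new patient. The file name and the all-zero feature vector are placeholders, not part of the repository; the real input must contain the 35 variables in the same order used during training.

```python
import joblib
import numpy as np

# Load a trained model file from the repository (file name is an assumption).
model = joblib.load("xgb_septic_shock.model")

# Placeholder input: 1 patient x 35 variables, in the training column order
# (all zeros here purely for illustration).
x_new = np.zeros((1, 35))

prob = model.predict_proba(x_new)[:, 1]   # predicted probability of the outcome
label = model.predict(x_new)              # 0/1 class prediction
print(prob, label)
```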

Figure 2 Decision curve analysis of the 4 outcomes.

Figure 3 Windows 10 software for patients who underwent open-heart surgery.

The top 5 predictors that influenced the decision making of XGBoost are shown in Table 4. The first, second, and third predictors for the 4 outcomes were as follows. 30-day mortality: vasopressin, pH, and creatinine; septic shock: hemoglobin, hematocrit, and lactate; severe thrombocytopenia: vasopressin, bicarbonate, and lactate; liver dysfunction: partial thromboplastin time, gender, and partial pressure of oxygen.

Table 4 Feature importance outputted by XGBoost.
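Rankings such as those in Table 4 come from the fitted model's feature importances; a brief sketch is shown below, assuming `xgb_model` is a fitted xgboost.XGBClassifier (see Methods) and `feature_names` holds the 35 input column names (both names are assumptions).

```python
import pandas as pd

# Gain-based importances from the fitted booster, labeled with column names.
importances = pd.Series(xgb_model.feature_importances_, index=feature_names)

# Top 5 predictors, analogous to the rankings reported in Table 4.
print(importances.sort_values(ascending=False).head(5))
```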

Discussion

In our study, four machine learning models were constructed and compared for 30-day mortality and 3 comorbidities after heart-related surgery. Other researchers have also developed prediction models for this patient population. Based on 2010 patients in the database of Seoul National University Hospital, Lee et al12 found that among machine learning algorithms including decision tree, support vector machine, and random forest, XGBoost (test accuracy: 0.74; AUC: 0.78) had the best performance for predicting AKI after cardiac surgery, and a website was created to process patients' data in real time. Kilic et al13 also applied XGBoost to predict multiple complications, including operative mortality (AUC: 0.771), renal failure (AUC: 0.776), prolonged ventilation (AUC: 0.739), reoperation (AUC: 0.637), stroke (AUC: 0.684), and deep sternal wound infection (AUC: 0.599), for adult patients after surgical aortic valve replacement in the Society of Thoracic Surgeons National Database. Most prior work focused on common complications such as AKI, sepsis, and hospital mortality, and research on other complications remains limited. Therefore, our study aimed to predict 30-day mortality, septic shock, liver dysfunction, and severe thrombocytopenia, which are also important for patients' prognosis.

Several predictors of the different comorbidities were output by XGBoost. According to Table 4, among the 4 primary outcomes, lactate and platelet count appeared 3 times and vasopressin and creatinine appeared 2 times, suggesting that they were important factors for our outcomes. The models built by Kilic et al13 also showed that creatinine is an important factor for predicting mortality and renal failure after heart surgery.

Our study has some limitations. First, all experiments were conducted on MIMIC-III, a clinical database of critically ill patients, which means our machine learning models may perform well for critically ill patients in the United States but may not generalize as well to populations in other regions. Further study is therefore needed to obtain as much data as possible from various databases to construct a more comprehensive model that works well on any population in any area.

Second, a class imbalance problem occurred in the experiment. There is a trade-off between accuracy, F1 score, precision, recall, and AUC, because medical data are usually highly imbalanced: among 100 patients, there may be only 3 positive samples and 97 negative (normal) samples. In this situation, machine learning algorithms tend to classify samples into the majority class. Therefore, we increased the weight of the positive samples of complications in the loss function, which improved the precision, recall, and F1 score of the models at the cost of reducing AUC and accuracy. By tuning other hyperparameters and using subsampling, we kept XGBoost well balanced between precision and recall. How to improve accuracy, F1 score, and AUC simultaneously on imbalanced medical data remains an open problem.
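The re-weighting idea can be sketched as follows: the positive class is up-weighted roughly by the negative-to-positive ratio via XGBoost's scale_pos_weight. The toy labels and the other parameter values are illustrative assumptions; the grid actually searched is listed in Table 5.

```python
import numpy as np
from xgboost import XGBClassifier

# Toy label vector mirroring the 3-in-100 example above.
y_train = np.array([0] * 97 + [1] * 3)
ratio = (y_train == 0).sum() / (y_train == 1).sum()   # ~32:1

# Up-weight positives in the loss; subsample and max_delta_step further
# stabilize training on the minority class (values are placeholders).
clf = XGBClassifier(scale_pos_weight=ratio, subsample=0.8, max_delta_step=1)
# clf.fit(X_train, y_train)  # X_train/y_train: the 80% training split
```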

In conclusion, four machine learning algorithms were trained to predict 30-day mortality and 3 comorbidities after open-heart surgery. The XGBoost model was the most robust, with the highest AUC, F1 score, and net benefit. A Windows 10 application was created and is available to clinical staff at the repository mentioned above. Moreover, the predictors output by the XGBoost model indicate the relevance of these factors to the comorbidities and generate hypotheses; whether these factors can serve as independent biochemical indexes remains an open question.

Methods

Data source and participants

The Medical Information Mart for Intensive Care (MIMIC-III) is a freely available database containing critically ill patients who were admitted to the ICU of the Beth Israel Deaconess Medical Center between 2001 and 201219. Patients who underwent coronary artery bypass surgery, aortic valve replacement, or insertion of an implantable heart assist system (ICD-9 codes 3961, 3615, 3612, 8872, 3521, 6311, 3522, 3614, 3733, and 3524) were enrolled in this study. A total of 6844 related samples were extracted from the MIMIC-III clinical database using PostgreSQL and Python 3 (version 3.7.8).
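A minimal cohort-selection sketch is shown below, assuming a local PostgreSQL installation of MIMIC-III with the standard `mimiciii` schema; the connection string and the exact joins used in the study are assumptions, and only the procedure-code filter mirrors the text above.

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection details are placeholders for a local MIMIC-III instance.
engine = create_engine("postgresql://user:password@localhost:5432/mimic")

codes = ('3961', '3615', '3612', '8872', '3521',
         '6311', '3522', '3614', '3733', '3524')

query = f"""
SELECT DISTINCT subject_id, hadm_id
FROM mimiciii.procedures_icd
WHERE icd9_code IN {codes}
"""
cohort = pd.read_sql(query, engine)   # admissions with a qualifying procedure
print(len(cohort))
```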

Definition and primary outcomes

The primary outcomes were 30-day mortality and three comorbidities, namely septic shock, liver dysfunction, and severe thrombocytopenia, after heart-related surgery. 30-day mortality was defined as death within 30 days after discharge from the ICU. A patient was labeled as having liver dysfunction if his/her first test values of aspartate transaminase (AST) and alanine transaminase (ALT) were normal (10–45 IU/L for ALT; 10–35 IU/L for AST) and any later value of ALT or AST exceeded the upper limit of normal (45 IU/L for ALT; 37 IU/L for AST)20. Severe thrombocytopenia was defined as a first platelet count above 50 K/uL with any later platelet count below 50 K/uL14. Because septic shock is a severe disease with acute symptoms, it was identified by its ICD-9 code (785.52) in the MIMIC-III database21. Only data from each patient's first ICU admission were considered.
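The laboratory-based labels can be written down directly from the definitions above. The sketch below assumes a per-patient, time-ordered DataFrame `labs` with columns 'alt', 'ast', and 'platelet' (column names are assumptions for illustration).

```python
import pandas as pd

def label_outcomes(labs: pd.DataFrame) -> dict:
    first = labs.iloc[0]
    later = labs.iloc[1:]

    # Liver dysfunction: first ALT/AST normal, any later value above the upper limit.
    normal_first = (10 <= first['alt'] <= 45) and (10 <= first['ast'] <= 35)
    liver_dysfunction = normal_first and (
        (later['alt'] > 45).any() or (later['ast'] > 37).any()
    )

    # Severe thrombocytopenia: first platelet > 50 K/uL, any later platelet < 50 K/uL.
    severe_thrombocytopenia = first['platelet'] > 50 and (later['platelet'] < 50).any()

    return {'liver_dysfunction': int(liver_dysfunction),
            'severe_thrombocytopenia': int(severe_thrombocytopenia)}
```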

Machine learning models

Logistic regression (LR) is a classic classification algorithm that makes a linear combination of the input variables and passes it through the sigmoid function to output a probability. The main LR hyperparameter is the regularization strength C.

Neurons in an artificial neural network (ANN) make a linear combination of the outputs of the previous layer's neurons, pass it through a sigmoid function, and output a value to the neurons of the next layer22. The width and depth of the hidden layers influence the performance of the ANN.

Compared with a single classifier, the ensemble learning algorithm random forest (RF), which merges multiple weak classifiers into a strong classifier, shows stronger performance in classification tasks23. Its main hyperparameters are n_estimators, max_depth, and max_leaf_nodes.

Extreme gradient boosting (XGBoost) is also an ensemble model of decision trees24. Its main hyperparameters are n_estimators, max_depth, reg_lambda, gamma, min_child_weight, scale_pos_weight (when samples are imbalanced, this parameter changes the weight of the positive samples in the loss function), max_delta_step, and subsample.
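For orientation, the four classifiers with the hyperparameters named above can be instantiated as sketched below; the concrete values are placeholders, and the candidate values actually searched are listed in Table 5.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    # LR: C controls regularization strength.
    'LR': LogisticRegression(C=1.0, max_iter=1000),
    # ANN: width/depth of hidden layers set via hidden_layer_sizes.
    'ANN': MLPClassifier(hidden_layer_sizes=(64, 32), activation='logistic'),
    # RF: number of trees, tree depth, and leaf count.
    'RF': RandomForestClassifier(n_estimators=200, max_depth=6, max_leaf_nodes=50),
    # XGBoost: the hyperparameters listed in the text (placeholder values).
    'XGBoost': XGBClassifier(n_estimators=200, max_depth=4, reg_lambda=1.0,
                             gamma=0.1, min_child_weight=1, scale_pos_weight=30,
                             max_delta_step=1, subsample=0.8),
}
```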

Statistical method

Thirty-five input variables, including demographics, use of vasopressin, laboratory variables, vital signs, comorbidities, and first-day urine output, and 4 output variables were extracted from the database, as shown in Table 1. Some important variables, such as the fraction of inspired O2 (FiO2), were excluded because of too much missing data; variables with less than 40% missing data were retained25. All missing values were filled with the mean value of the corresponding variable, and winsorization was used to handle outliers.
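A brief preprocessing sketch under these rules is shown below: drop variables with 40% or more missing values, impute the rest with column means, and winsorize numeric columns. The 1st/99th percentile limits are an assumption, as the paper does not state the winsorization cut-offs.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Keep variables with less than 40% missing data.
    df = df.loc[:, df.isna().mean() < 0.40]
    # Fill remaining missing values with the column mean.
    df = df.fillna(df.mean(numeric_only=True))
    # Winsorize each numeric column (assumed 1st/99th percentile limits).
    for col in df.select_dtypes('number').columns:
        df[col] = np.asarray(winsorize(df[col].to_numpy(), limits=(0.01, 0.01)))
    return df
```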

Figure 4 shows the flow chart of data processing. The 6844 samples were then randomly divided into training data (5475) and test data (1369) at a ratio of 80:20. The chi-square test and the Wilcoxon rank-sum test were used to compare differences in categorical and continuous variables, respectively. They were first applied to compare the training and test data to ensure that the distributions of the two datasets were as similar as possible; in Table 1, P > 0.05 was considered to indicate no significant distribution difference between training and test data. In Table 2, the same tests were used to compare positive and negative samples to examine the association between the independent variables X and the outcome variables Y, with P < 0.05 considered to indicate a significant association.
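The split and the two tests can be sketched as below, assuming `df` holds the input variables; the random seed and the example column names ('creatinine', 'gender') are assumptions for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency, ranksums
from sklearn.model_selection import train_test_split

# 80/20 random split (seed is an assumption).
train_df, test_df = train_test_split(df, test_size=0.20, random_state=42)

# Wilcoxon rank-sum test for a continuous variable (example column name).
stat, p_cont = ranksums(train_df['creatinine'], test_df['creatinine'])

# Chi-square test for a categorical variable (example column name).
table = pd.crosstab(
    pd.concat([train_df['gender'], test_df['gender']]),
    ['train'] * len(train_df) + ['test'] * len(test_df),
)
chi2, p_cat, dof, expected = chi2_contingency(table)
```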

Figure 4 Flow chart of data processing.

Machine learning model training

The training data were evenly split into 5 parts: 4 parts were used to train a model with a given set of hyperparameters, and the remaining part, the validation set, was used to test the performance of those hyperparameters. This process was repeated 5 times to obtain 5 validation scores, and the average score was used to evaluate the model. This method, fivefold cross-validation, is commonly used to select the best hyperparameters. The 4 machine learning algorithms LR, ANN, RF, and XGBoost were employed to fit the data, and each has several hyperparameters that must be specified, as shown in Table 5. Using a grid search, the candidate hyperparameters were searched automatically in Python and the best combination was selected for each model. By comparing the performance of the four models, the best one, XGBoost, was chosen as the final model of our study, and its hyperparameters were further fine-tuned manually to obtain better performance.
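A minimal sketch of the fivefold cross-validated grid search is shown below; the candidate grid and `X_train`/`y_train` are placeholders, and the candidates actually searched are listed in Table 5.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Placeholder candidate grid (the real candidates are in Table 5).
param_grid = {
    'n_estimators': [100, 200, 400],
    'max_depth': [3, 4, 6],
    'scale_pos_weight': [10, 30, 50],
}

search = GridSearchCV(XGBClassifier(subsample=0.8), param_grid,
                      cv=5, scoring='roc_auc', n_jobs=-1)
search.fit(X_train, y_train)   # X_train/y_train: the 80% training split
print(search.best_params_, search.best_score_)
```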

Table 5 Candidate parameters for grid search and fine-tune of parameters.

Evaluating a model only on the validation set and its score overestimates its performance; therefore, the test data set aside at the beginning were used to obtain the final scores presented in Table 3. In addition, decision curve analysis was applied to evaluate the models, as shown in Fig. 2. All machine learning experiments were conducted in Python (version 3.7.8).
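The final evaluation can be sketched as below, assuming `model`, `X_test`, and `y_test` come from the steps above; the net-benefit function follows the standard decision curve analysis formula, and the 0.5 classification threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score, f1_score,
                             precision_score, recall_score)

y_prob = model.predict_proba(X_test)[:, 1]
y_pred = (y_prob >= 0.5).astype(int)   # assumed classification threshold

print('accuracy :', accuracy_score(y_test, y_pred))
print('AUC      :', roc_auc_score(y_test, y_prob))
print('F1       :', f1_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at a probability threshold: TP/n - FP/n * pt/(1 - pt)."""
    n = len(y_true)
    pred_pos = y_prob >= threshold
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

thresholds = np.linspace(0.01, 0.99, 99)
nb_curve = [net_benefit(np.asarray(y_test), y_prob, t) for t in thresholds]
```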