Sepsis is a complicated medical condition caused by an infection that triggers a systemic inflammatory response1. It is the most common and dangerous cause of illness and death in critically ill people2. It is well known that sepsis frequently leads to acute kidney injury (AKI). AKI occurs in around 40% of people with severe sepsis, increasing the difficulty, cost, and likelihood of death during treatment3,4,5,6. Sepsis-associated acute kidney injury (SA-AKI) is a complicated condition, including multiple contributing factors associated with a worse prognosis, a longer length of hospital stay, and a more significant number of co-morbidities than in sepsis patients with no AKI5,6,7. It is crucial to accurately predict the prognosis for SA-AKI patients in the intensive care unit (ICU) due to their critical condition.

In critical care medicine, the prognosis of SA-AKI patients is a hot topic. Several scoring systems have been developed to predict outcomes in patients with SA-AKI; however, their performances have been disappointing due to low specificity and sensitivity. These scoring systems include the sequential organ failure assessment (SOFA) score, the simplified acute physiology score II (SAPS II), and the acute physiology and chronic health evaluation II8,9. Moreover, some multivariate prediction models for predicting the outcome of patients with SA-AKI have been developed. These models are based on standard statistical techniques like logistic regression and the Cox proportional risk model. Hu et al. used the Cox proportional risk model to construct a mortality prediction model for 2066 patients with SA-AKI, showing a preferable forecast performance9. However, the links between variables are intricate, including both linear and non-linear relationships; the Cox proportional risk model is, by default, calibrated to handle linear associations between dependent and independent variables, which may oversimplify more complex non-linear relationships. In addition, the Cox proportional risk model is susceptible to multicollinearity between variables, which might lower the model's performance. Consequently, it is crucial to investigate more effective and precise prediction techniques in the care of SA-AKI patients.

As statistical theory and computer technology have advanced, so has an interest in and acceptance of machine learning (ML) among medical professionals. Predictive models for various diseases have significantly benefited from cutting-edge ML approaches, outperforming their more conventional logistic and Cox regression-based counterparts10,11. The clinical applications of ML have ranged from diagnosis to prediction and have been utilized in various clinical domains12,13,14. ML methods have also been used to forecast the prognoses of critically ill patients, with results that are superior to those obtained using the more conventional methods of logistic regression and Cox regression analysis15,16,17,18. However, the advantage of ML algorithms in predicting mortality in SA-AKI patients has not yet been demonstrated. This research tried to create and verify ML models for early predicting in-hospital mortality in SA-AKI patients.


Database introduction

The Medical Information Mart for Intensive Care IV (MIMIC IV) database is an integrated, de-identified, and full clinical dataset that covers all patients who were hospitalized in the ICUs at Beth Israel Deaconess Medical Center in Boston, Massachusetts, between the years 2008 and 201919. We acquired the certificate necessary to apply for database access after passing the exam to ensure the safety of human study participants (No. 35970146). Patient permission and an ethical approval statement were unnecessary because the experiment would not have affected clinical care, and all patient data had already been de-identified20. This research was carried out in conformity with the principles of the 2013 Helsinki Declaration.

Study population

This study enrolled adults with sepsis who developed AKI within 48 h of ICU admission. Patients with sepsis were identified within 24 h of ICU admission using the Sepsis-3 criteria, which required the presence of both a probable infectious cause and a SOFA score ≥ 221. AKI was diagnosed using the Kidney Disease: Improving Global Outcomes Clinical Practice Guideline’s (2012) suggested criteria of serum creatinine (Scr) and urine output22. The first available Scr following ICU admission was used as the baseline value if no Scr was available prior to admission23. Due to the importance of maintaining data independence, only the initial ICU hospitalization was included in the study if the patient had multiple admissions. Patients under the age of 18 or with ICU stays shorter than 48 h were excluded.

Data collection

We gathered data on the patient’s demographic features, chronic disease history, vital signs, laboratory results, Treatments, illness severity scores, and outcomes.

Demographic features obtained for the research consisted of age, sex, and weight. Chronic disease history included chronic pulmonary disease, peptic ulcer disease, peripheral vascular disease, myocardial infarction, cerebrovascular disease, diabetes, acquired immune deficiency syndrome, renal disease, dementia, rheumatic disease, paraplegia, liver disease, cancer, and congestive heart failure. We gathered mean values for vital information, such as heart rate, mean arterial pressure, respiration rate, body temperature, and SpO2, in the first 24 h after ICU admission. The highest values for a variety of laboratory results were collected in the first 20 h following ICU admission. These tests included the Scr, serum glucose, serum chloride, serum calcium, hematocrit, hemoglobin, platelets, anion gap, white blood cell, international normalized ratio, bicarbonate, serum sodium, blood urea nitrogen, serum potassium, prothrombin time, and partial thromboplastin time. We also recorded the quantity of urine passed in the first 24 h after ICU admission. Treatments included using renal replacement therapy, vasopressors, and mechanical ventilation during the first 24 h after ICU admission. In the first 24 h following ICU admission, we analyzed the initial SOFA score and SAPS II to determine the severity of the patient’s conditions.

Preprocessing of data

In this study, missing values for all variables were fewer than 20% (See Supplementary Table S1). When dealing with missing data, we employed the multiple imputation method implemented in the Python ‘miceforest’ package, widely acknowledged as a superior strategy for missing variables. We identified potential mortality-related variables using the least absolute shrinkage and selection operator (LASSO) analysis to reduce overfitting.

Statistical analysis

The median and interquartile range (IQR) were used to describe the normal distribution of continuous variables, whereas numbers and percentages were used to describe categorical variables. If applicable, The Mann–Whitney or Student’s t-test. Comparing continuous variables between groups using the U test. Apply either the Pearson chi-squared or Fisher's exact test to evaluate the significance of group differences in categorical variables.

R version 4.2.1 and Python version 3.9.12 were used for all statistical analyses. A two-tailed P value below 0.05 was considered statistically significant.

Machine learning

In this study, all participants were randomly divided into two sets a training set (consisting of 80% of patients) and a validation set (consisting of 20% of patients) (See Supplementary Table S2). The dimension of the features was reduced using the LASSO technique. Six different ML methods—logistic regression, support vector machine (SVM), k-nearest neighbor (KNN), decision tree, random forest (RF), and extreme gradient boosting (XGBoost)—were used to create and test models for predicting risks of in-hospital mortality. The ML model prediction ability was measured using accuracy, area under curve (AUC), sensitivity, specificity, and average precision. After considering accuracy and AUC, we settled on our final candidate model. To compare the predictive power between the models, we performed decision curve analysis (DCA) and plotted calibration curves. Using SHapley Additive exPlanations (SHAP) values, we characterized the crucial characteristics that affect mortality risk in the best ML model to study further each characteristic’s significance to the optimal model’s output. At last, the Local Interpretable Model-Agnostic Explanations (LIME) technique is used to fit the model's expected behavior. Finally, a sensitivity analysis of the results was performed.

Ethics approval and consent to participate

MIMIC IV was set up with the approval of the Institutional Review Board at the Massachusetts Institute of Technology. All participant data were anonymized to safeguard their privacy. Due to the use of anonymized health records, ethical approval and informed consent were not required. This study adheres to the ethical criteria outlined in the Helsinki Declaration of 1964.



The total number of people with SA-AKI who were considered for inclusion was 16,473. However, 2269 were ruled out because they had more than one ICU hospitalization, and 6592 excluded for less than 48 h in ICU. In the end, 8129 patients qualified for the study (Fig. 1). The prevalence of death in hospitals was 20% (1629/8129). The median age of these patients was 68.7 (IQR: 57.2–79.6) years, and 57.9% (4708/8129) were male. The top three comorbidities were congestive heart failure (2831/8129, 34.8%), diabetes (2566/8129, 31.6%), and chronic pulmonary illness (2358/8129, 29.0%). Table 1 provides a summary of the base characteristics of the dataset.

Figure 1
figure 1

The flowchart of patient selection. MIMIC IV Medical information mort for intensive care IV, ICU Intensive care unit, SA-AKI Sepsis-associated acute kidney injury.

Table 1 Demographic and clinical characteristics at baseline.

Predictor selection

Multiple imputations were employed to fill in missing values for each variable. On the first day after being admitted to the ICU, 44 variables were gathered and analyzed using LASSO regression. Twenty-four variables were found to be statistically significant predictors of mortality after feature selection with LASSO analysis (See Supplementary Fig. S1), and these are listed in detail in Supplementary Table S3.

Model development and validation

The total number of patients was 8129, and they were divided randomly between a training group of 6503 (80%) and a validation cohort of 1626 (20%). There were no significant differences in the baseline features between the training and validation sets. Using LASSO regression’s selected 24 variables, we built six ML models: logistic regression, SVM, KNN, decision tree, RF, and XGBoost. The XGBoost model achieved the highest AUC (0.794) in the validation cohort, outperforming logistic regression (0.730), SVM (0.680), KNN (0.601), decision tree (0.585), and RF (0.778) (Fig. 2A). In order to dig even deeper into the performance of the six models, we also measured their accuracy, sensitivity, specificity, and average precision, and the outcomes are tabulated in Table 2. Other clinical disease severity ratings [SOFA score (AUC: 0.701); SAPS II (AUC: 0.706)] did not perform as well as the XGBoost model (Fig. 2B). The DCA curves and calibration curves show that the XGBoost model performs best among the six models (See Supplementary Figs. S2 and S3).

Figure 2
figure 2

ROC curves for the ML models and the traditional severity of illness scores to predict in-hospital mortality. (A) ROC curves for the six ML models used to predict in-hospital mortality; (B) ROC curves for the traditional severity of disease scores used to predict in-hospital mortality. ROC Receiver operating characteristic, SVM Support vector machine, KNN k-nearest neighbors, AUC Area under the curve, SOFA Sequential organ failure assessment, SAPS II Simplified acute physiology score II.

Table 2 Performance comparison of the machine learning models in the validation set.

Model explainability

With SHAP values, we hoped to provide more insight into how the XGBoost model predicts deaths. For the XGBoost model, the SHAP summary graphic reveals that the SOFA score, respiratory rate, SAPS II, and age are the four most significant parameters (Fig. 3). In addition, we used SHAP dependence analysis to illustrate the impact of a single input variable had on the XGBoost prediction model’s final results (Fig. 4). Figure 5 displays the findings of a more in-depth analysis of the four most influential clinical features on the XGBoost prediction model's output.

Figure 3
figure 3

The top 20 important features derived from the XGBoost model. SHAP indicates the importance ranking of features. The significance of each covariate in the construction of the final predictive model is represented by the matrix plot. SHAP SHapley additive explanation, SOFA Sequential organ failure assessment, SAPS II Simplified acute physiology score II, INR International normalized ratio, PTT Partial thromboplastin time, BUN Blood urea nitrogen.

Figure 4
figure 4

SHAP summary plot of the top 20 features of the XGBoost model. The greater the SHAP value of a characteristic, the greater the likelihood of death development. The abscissa represents the SHAP value, and each line represents a feature. Red dots indicate greater feature values, whereas blue dots indicate lower feature values. SHAP SHapley additive explanation, SOFA Sequential organ failure assessment, SAPS II Simplified acute physiology score II, INR International normalized ratio, PTT Partial thromboplastin time, BUN Blood urea nitrogen.

Figure 5
figure 5

SHAP dependence plot of the XGBoost model. (A) SOFA score; (B) Respiratory rate; (C) SAPS II; (D) Age. Certain SHAP levels surpass zero indicates an elevated risk of death. SHAP SHapley additive explanation, SOFA Sequential organ failure assessment, SAPS II Simplified acute physiology score II.

We then took two random samples from the validation set and ran them through the LIME algorithm to shed light on the individual mortality forecast. Figure 6A depicts the case of death reported by the LIME algorithm. 76% was the expected probability of death according to the XGBoost model. The XGBoost model found a SOFA score of 17, a SAPS II of 77, a temperature of 36.27 °C, a history of malignancy, and a hemoglobin level of 9.6 g/dL were all associated with an elevated risk of mortality. Scr levels below 3.2 mg/dL and the absence of a prior history of cerebrovascular illness or paraplegia were found to reduce mortality risk. Both the XGBoost model and the actual outcome for this patient were death. Similarly, Fig. 6B illustrates a survival example utilizing the LIME technique. 10% was the expected probability of death according to the XGBoost model. The patient's age of 86.42 years, the history of cerebrovascular illness, and the hemoglobin level of 9 g/dL increase the risk of mortality, whereas a SOFA score of 4, the absence of a history of cerebrovascular disease, a respiratory rate of 14.98 beats per minute, the absence of a history of paraplegia, and the absence of a history of liver disease decrease the risk of death. Both the actual and expected outcomes confirmed the XGBoost model's prediction of the patient’s survival.

Figure 6
figure 6

LIME algorithm for explaining individual’s prediction results. Screenshot of the death prognosis for SA-AKI patients. (A) Utilizing the LIME method, show a death case. (B) Present a case of survival using the LIME method. The left portion of the picture depicts expected LIME findings. The center section lists, from highest to lowest, the eight variables that had the greatest impact on survival or death. The length of the bar for each feature reflects the weight of that feature in the prediction. A longer bar represents a characteristic that contributes more to survival or mortality. The right panel displays the crucial values of these eight factors at which they had the greatest influence on survival or death. SA-AKI Sepsis-associated acute kidney injury, SOFA Sequential organ failure assessment, SAPS II Simplified acute physiology score II, SpO2 Oxygen saturation, PTT Partial thromboplastin time, WBC White blood cell.

Sensitivity analyses

For patients without renal disease (N = 1232), the XGBoost model remained robust in predicting mortality in these patients (AUC: 0.808). Detailed results are shown in Supplementary Fig. S4.


This study developed and verified six ML methods for estimating the risk of in-hospital death among patients with SA-AKI. In predicting mortality in patients with SA-AKI, the XGBoost models outperformed other ML models (such as logistic regression, SVM, KNN, decision tree, and RF models) and conventional risk scores (such as the SOFA score and SAPS II). After doing a feature importance analysis, we found that the SOFA score, respiratory rate, SAPS II, and age were the top 4 features of the XGBoost model in terms of their ability to predict mortality. Also, we have documented how these factors influenced the XGBoost model. Finally, we use the LIME algorithm for personalized forecasting.

SA-AKI is frequent in critically ill patients and is characterized by rapid clinical deterioration and a considerably greater mortality rate than patients without AKI or those with other variables producing AKI24. Clinicians require accurate prediction models to evaluate the risk of dying and make appropriate therapy choices for severely ill patients with SA-AKI. For outcome prediction in intensive care settings, generic metrics such as the SOFA score and SAPS II are widely applied. The SOFA and SAPS II scoring systems have a number of shortcomings compared to ML models, including unsatisfactory predictive performance, poor specificity and sensitivity, a wide range of variability, and a laborious procedure10. According to the results of this research, the conventional severity scoring methods performed poorly compared to the ML model. This could be due to the two reasons listed below. First, the risk of bad outcomes in critically sick patients was assessed using the SOFA and SAPS II scoring systems, which relied heavily on the practitioner’s prior expertise25. Second, unlike multivariate models, these scoring systems cannot analyze a large number of potentially valuable variables, reducing their predictive power26.

Consistent with previous studies, our results show that the XGBoost model outperforms the other ML models in predicting death in SA-AKI patients. Liu et al. found that the XGBoost model predicted mortality in AKI patients better than logistic regression, SVM, and RF27. Zhu et al. evaluated the prediction of hospital mortality for patients on mechanical ventilation and discovered that the XGBoost model performed better than the RF, logistic regression, decision tree, and KNN models28. It is possible that multiple factors contributed to the boost in prediction abilities seen in XGBoost models. Firstly, the XGBoost method, derived from the gradient tree boosting framework, is highly skilled at fitting high-order interactions, discontinuities, and non-linearities. Second, the XGBoost method is resistant to outliers in the predictor variables and multicollinearity among those variables.

To enhance the interpretability of the model, we employ the SHAP value to explain and show the most influential elements of the prediction results. This study's analysis of the XGBoost model’s summary of feature importance found that the SOFA score was the most important predictor of mortality in patients with SA-AKI. The SOFA score quantifies organ impairment by measuring the burden of organ malfunction. The SOFA score was found to have a significant correlation with clinical outcomes, with a high SOFA score typically suggesting a critical condition and poor prognosis. Despite this, none of the prior models predicted the probability of mortality for SA-AKI patients using this crucial factor9,28. The SOFA score was the most heavily weighted variable in the XGBoost model, showing its significance in predicting the mortality of patients with SA-AKI in the present study. Additionally, we found that the rate of breathing is significantly correlated with the risk of dying from SA-AKI. Some research has demonstrated a correlation between breathing rate and inferior outcomes until the present day29. The SAPS II was an additional influential predictor of SA-AKI outcomes. SAPS II is a measure of disease severity, and greater SAPS II scores are linked to higher in-hospital death in critically sick patients30. Our results also showed that age was a significant risk factor for death among critically ill patients identified with SA-AKI. In the absence of co-morbidities and advanced age, the mortality rate from sepsis is less than 5%, according to a study conducted in Australia and New Zealand31.

However, this study also had some shortcomings. First, our study may be subject to selection bias because of its retrospective and observational design. Second, the current study was limited in its ability to conclude cause and effect because it was a retrospective modeling study conducted at a single location utilizing the MIMIC IV database. That being said, more prospective randomized clinical trials are needed to verify our model's efficacy. Third, we estimated specific missing data using the filling method, which may result in divergence from the genuine value. Finally, this study only validated the models internally; therefore, multicenter external validation is still needed to assess the models’ predictive ability. Finally, this study only validated the models internally; therefore, multicenter external validation is still required to assess the predictive potential of the models.


We built and verified ML models that excel in early mortality risk prediction in SA-AKI. The XGBoost model is the most effective of all the algorithms. Further SHAP values and LIME method indicated that SOFA score, respiratory rate, SAPS II, and age were played as the marked contributors for the prediction of death in SA-AKI patients. These findings would be helpful for clinical prediction. However, multi-center studies are still necessary to ensure that if this ML model are broadly applicable and generalizable to various settings and associated with improved clinical decision-making and outcomes.