Introduction

Sepsis is a life-threatening disease, accounting for approximately 11 million deaths per year1. Multiple organ dysfunction can caused by sepsis, and kidney is a target organ that most vulnerable to inflammatory damage. Observational studies have demonstrated an acute kidney injury (AKI) incidence of about 40% in patients with sepsis2. Conversely, sepsis contribute nearly half of the AKI occurring, and was reported as the most frequent aetiology of AKI3,4.

Renal replacement treatment (RRT) plays an important role in managing AKI. RRT includes methods such as intermittent hemodialysis (IHD) and continuous RRT (CRRT), with the latter typically involving continuous veno-venous hemodialysis (CVVHD) or hemodiafiltration (CVVHDF). CRRT is more popular used in intensive care settings, as it can better maintain hemodynamic stability5. Although RRT is widely used in AKI, there are still some questions yet to be answered6. Identifying the patient most likely to benefit from CRRT stands as a primary concern. Someone believes RRT may positively influence sepsis patients because RRT can modulate inflammation7. However, despite the removal of sepsis-related cytokines during RRT8, no reduction in mortality has been observed in sepsis patients, neither with high cut-off membrane filters9 nor with absorbing filters10. This highlights the importance of precise patient selection for CRRT to improve treatment outcome.

The initiation timing of CRRT is another question to be debated. A single center trial demonstrated that early RRT, compared with delayed initiation of RRT, reduced mortality over the first 90 days11, but other multi-center studies couldn’t find mortality decrease by early start of RRT in AKI12 or sepsis related AKI patients13. Thus, accurate evaluation of RRT benefits in S-AKI patients and initiating RRT treatment timely is essential for improving treatment outcomes.

Uplift modeling is an artificial intelligence technique that can identify the individual treatment effect (ITE) of an intervention in a population of individuals. Typically used in the field of marketing, it has been employed to identify the most vulnerable individuals who may be affected by online advertising or coupons14. In this study, we applied this method to estimate the ITE of S-AKI who may benefit from RRT and to identify the features of these patients.

Material and methods

Setting

This retrospective study utilized the Medical Information Mart for Intensive Care (MIMIC-III), a publicly available, large-scale critical care database. The database comprised 61,532 intensive care unit (ICU) admissions from 46,520 patients at the Beth Israel Deaconess Medical Center in Boston, MA, between 2001 and 2012. The data were collected from all patient admissions to various types of Intensive Care Units (ICUs) at the center during the specified period. The average length of stay in ICU was 2.1 (1.2–4.6) days, and the hospital mortality rate was 11.5%15. The MIMIC-III database integrates non-identifiable, comprehensive clinical data, including demographics, hourly vital signs, clinical measurements, laboratory results, treatments, and the International Classification of Diseases Ninth Revision (ICD-9) codes of diagnoses and procedures.

The data in MIMIC-III has been non-identifiable, and the institutional review boards of the Massachusetts Institute of Technology (No. 0403000206) and Beth Israel Deaconess Medical Center (2001-P-001699/14) both approved the use of the database for research. All data analysis and reporting has been performed in accordance with institutional guidelines and regulations.

Inclusion and exclusion criteria

The inclusion criteria: (1) diagnosed as sepsis; (2) suffering from AKI. Exclusion criteria: (1) not admitted to ICU for the first time. (2) Patients age < 18. (3) End-stage renal disease (ESRD). 4. Blood potassium > 6.5 mmol/L.

To diagnose sepsis, we utilized the third sepsis definition (sepsis-3), which defines sepsis as a life-threatening condition characterized by organ dysfunction caused by a dysregulated host response to infection16. We screened patients with documented or suspected infection and an acute change in total Sequential Organ Failure Assessment (SOFA) score of ≥ 2. This method closely aligns with the sepsis-3 definition and has been demonstrated to be effective in the Medical Information Mart for Intensive Care III (MIMIC-III) database17.

AKI was defined in accordance with the Kidney Disease: Improving Global Outcomes (KDIGO) criteria18, which classify AKI into three stages based on urine output and serum creatinine levels. Diagnosis of AKI was confirmed if the highest KDIGO stage during the ICU stay was greater than or equal to 1. In addition to recording the baseline KDIGO stage of each patient, we also continuously documented the KDIGO stage until the initiation of RRT.

We excluded all patients with blood potassium > 6.5, which is thought to be one of the most important reasons for urgently initiating CRRT6. We ruled out patients not on their first admission to avoid multiple records of the same patient.

The study population was divided into two groups: the RRT group and non-RRT group, based on whether they received RRT treatment during their ICU stay.

Primary endpoint

The primary endpoint of the study was 28-day mortality, which encompassed both in-hospital and out-of-hospital deaths.

Statistical analysis

Continuous variables were expressed as the mean ± SD and interquartile ranges (IQR) when the data as appropriate. Student's t-test was used for normally distributed variables, while the Mann–Whitney U test was used for skewed variables. Categorical variables were presented as counts and percentages, and compared using either the chi-square test. The estimation of sample size was carried out using a power analysis based on a two-sample t-test.

To estimate the association between RRT and outcomes, as well as to select the best-matched patients for further artificial intelligence analysis, propensity score matching (PSM) was employed in our study using a greedy nearest neighbor matching algorithm. Patients were matched at a 1:1 ratio, with each RRT patient matched to a non-RRT patients, based on estimated propensity scores. The efficacy of PSM in reducing between-group differences was assessed by calculating the standardized mean difference (SMD).

Uplift modeling

The uplift model aims to forecast the difference in outcomes between individuals who receive treatment and those who do not, while also identifying patients who are more likely to benefit from treatment. Upon reviewing previous research that employed s-learner and t-learner methods19,20, we found that the intrinsic logic of these studies was to indirectly model uplift. We utilized a class transformation method to create a new variable Z, where Z = 1 if a patient was in the RRT group and survived for 28 days, Z = 1 if a patient was in the non-RRT group and died within 28 days, Z = 0 if a patient was in the RRT group and died within 28 days, and Z = 0 if a patient was in the non-RRT group and survived for 28 days. We can prove that if the number of cases in the RRT group and non-RRT group was equal, then P (Z = 1│Xi) had a linear correlation with the uplift score, which can be calculated as Uplift Score = 2P (Z = 1│Xi) − 1. This method is applicable to cases where both treatment and outcome are binary variables, and single model prediction is used to achieve the conversion of prediction goals.

The validity of different models was evaluated using the adjusted qini curve, which was obtained by connecting proportion points of the adjusted qini index in different groups. The adjusted qini index was defined as following formula.

$${\text{Q}}\left(\mathrm{\varphi }\right)=\frac{{{\text{n}}}_{{\text{t}},1}\left(\mathrm{\varphi }\right)}{{{\text{N}}}_{{\text{t}}}}-\frac{{{\text{n}}}_{{\text{c}},1} (\mathrm{\varphi }){{\text{n}}}_{{\text{t}}} (\mathrm{\varphi })}{{{\text{N}}}_{{\text{t}}}{{\text{n}}}_{{\text{c}}} (\mathrm{\varphi })}$$

\(\mathrm{\varphi }\) is defined as the proportion of patients ranked from highest to lowest based on their Uplift Score in either the treatment or non-treatment group. For instance, \(\mathrm{\varphi }\)= 0.3 represents for the top 30% uplift score patients in treatment group or none treatment group. \({{\text{n}}}_{{\text{t}},{\text{y}}=1} (\mathrm{\varphi })\) represents the number of patients in the treatment group who are predicted to survive among the given percentage of patients.\({{\text{n}}}_{{\text{c}},{\text{y}}=1} (\mathrm{\varphi })\) represents the number of patients in the non-treatment group who are predicted to survive among the same percentage of patients. \({\mathrm{ N}}_{{\text{t}}}\mathrm{ and }{{\text{N}}}_{{\text{c}}}\) represents the total sample size of the treatment and non-treatment groups. The Area under the Uplift Curve (AUUC) is the area between the uplift curve and the random line, which serves as an indicator of model validity. A high AUUC corresponds to a high validity of the model.

The candidate predictors of our model included demographics, vital signs, SOFA and qSOFA scores, Kdigo stage, comorbidities and laboratory tests in the first 24 h after ICU admission. Predictor contributions were evaluated using the Shapley additive explanations (SHAP) strategy.

After modeling, we divided the patients in the validation cohort into a high benefit group and a low benefit group, and confirmed the characteristics of the high benefit group. We labeled the validation cohort according to the model, which allowed us to create a nomogram for patients who may benefit from RRT.

Ethical approval

The clinical data used for this research was obtained from publicly available non-identifiable database, Medical Information Mart for Intensive Care (MIMIC-III), and does not require a separate ethics approval or consent obtaining process. The Institutional Review Board at the Chinese PLA General Hospital waived the review of the research plan as the data was obtained from a publicly available database with no potential violation or infringement of the ethical regulations.

Results

Cohort selection process

Inclusion: 1. Total Admissions: Reviewed all 61,532 ICU admissions from the MIMIC III database. 2. Sepsis Identification: Identified 25,834 admissions with sepsis using Sepsis-3 Criteria. 3. AKI Definition: Defined AKI as KDIGO ≥ 1 and 12,760 (49.39%) admissions were identified as S-AKI. Exclusion: 1. Repeat ICU Admissions: Excluded 3250 patients. 2. Underage Patients: Excluded 229 patients below 18 years. 3. ESRD Diagnosis: Excluded 403 patients based on ICD-9 codes. 4. High Blood Potassium: Excluded 589 patients with blood potassium > 6.5 mmol/L, indicative of an emergency need for RRT.

Total Eligible Admissions: 8289 ICU stays. 591 patients received RRT (RRT group) and 7698 did not received RRT (non-RRT group) (Fig. 1).

Figure1
figure 1

Flow chart of the study. AKI acute kidney injury, RRT renal replacement therapy, ESRD end-stage renal disease.

Cohort characterization

The baseline characteristics of the two groups are shown in Table 1. Overall, the RRT group had a higher severity of illness than the non-RRT group. The SOFA score in the RRT group was 9 (IQR 6, 11) compared to 5 (IQR 3, 7) in the non-RRT group (p < 0.0001). On the first day after ICU entry, the RRT group was more likely to use mechanical ventilator (80.4% vs. 72.1%, p < 0.0001) and vasopressors (70.7% vs. 54.1%, p < 0.0001). The RRT group had higher levels of serum creatinine (364.21 ± 242.22 vs. 122.88 ± 87.52 mmol/l, p < 0.0001) and potassium (5.01 ± 0.81 vs. 4.79 ± 0.74 mmol/l, p < 0.0001) than the non-RRT group.

Table 1 Patients characters of RRT group, non-RRT group and non-RRT patients after PSM.

To correct for imbalances between the groups, we applied 1:1 PSM. Following PSM, the sample sizes were equalized to 458 patients in each of the RRT and non-RRT groups. This adjustment resulted in a significant reduction in discrepancies post-PSM, as shown in Table 1.

Based on a sample size of 458 in each group, with mortality rates of 0.393 and 0.323 respectively, and using an alpha level of 0.05, our sample size yields a test power exceeding 99.9%.

Primary endpoint

Prior to propensity score matching (PSM), the 28-day mortality rate was significantly higher in the RRT group compared to the non-RRT group (32.83% vs. 14.61%, p < 0.0001). After PSM, the 28-day mortality rate in the RRT group was 32.3% compared to 39.3% in the non-RRT group (p = 0.033).

Uplift modeling

The enrolled patients were divided into a development cohort consisting of 640 patients and a validation cohort consisting of 276 patients. Each cohort had an equal proportion of RRT and non-RRT patients. This division, maintaining a ratio of approximately 7:3 between the training and validation cohorts, is a common practice in such studies. After performing uplift modeling in the development cohort, several parameters were included in the model. Figure 2 shows that urine output, fluid input, mean blood pressure, body temperature, and lactate were the top five factors that most influenced the RRT effect. In addition, other predictors such as age,, SpO2, white blood cell count, ph, platelet count and potassium also had an impact on the RRT effect.

Figure 2
figure 2

SHAP value-based predictor contribution to the ITE prediction model. Feature importance is ranked based on SHAP values. In this figure, each point represented a single observation. The horizontal location showed whether the effect of that value was associated with a positive (a SHAP value greater than 0) or negative (a SHAP value less than 0) impact on prediction. Color showed whether the original value of that variable was high (in red) or low (in blue) for that observation. For example, a low white blood cell (WBC) value had a negative impact on the outcome of patient who was receiving RRT; the “low” came from the blue color, and the “negative” impact was shown on the horizontal axis.

Modeling validity

The modified qini curve was used to compare the predicting ability of our model with conventional scoring systems including SOFA and Kdigo-stage. As shown in Fig. 3, the area under the uplift curve (AUUC) of the class transformation model was 0.068, the AUUC of SOFA was 0.018, and the AUUC of Kdigo-stage was 0.050. The class transformation model was more efficient in predicting the individual treatment effect of RRT than conventional scores.

Figure 3
figure 3

Qini curve. The area under the uplift curve (AUUC) represents the area between a specific model curve and the random line, and serves as an indicator of model validity. A higher AUUC indicates higher validity, and an AUUC > 0 indicates model validity greater than random selection. In this study, the AUUC for the class transformation model was 0.074, for SOFA it was 0.022, and for Kdigo-stage it was 0.031.

Characters difference between high benefit and low benefit patients

We classified the patients in the validation group into high benefit group (n = 138) and low benefit group (n = 138), using class transformation. The high benefit individuals possessed different characteristics from low benefit individuals. As shown in Fig. 4, the high benefit group and low benefit group differed in renal-related indicators, including urine output, creatinine, BUN, pH, and chloride. Additionally, white blood cell count was found to be related to the effect of RRT.

Figure 4
figure 4

Different characteristics between RRT benefit and non-benefit patients. The seven parameter had statistic difference (p < 0.05) between two groups were lactate, body temperature, creatinine, urine output of first 24h admission in ICU, BUN, white blood count (WBC) and ph.

We constructed a logistic regression model with the benefit status of patients (benefit vs. non-benefit) as the dependent variable. Six factors with statistical significance between groups were selected as the independent variables, including urine output, creatinine, lactate, white blood cell count, glucose, respiratory rate (Table 2). We then built a nomogram of the risk predictive value of the model. As showed in Fig. 5. The nomogram illustrates the contribution rate of each risk index by the length of the corresponding line segment, providing a visual and practical representation of our model.

Table 2 Logistic regression analysis of S-AKI patients benefit from RRT.
Figure 5
figure 5

Nomogram for RRT benefit probability. Each parameter has a range of values that can be used to find different corresponding numerical values on the line segment. The sum of all the parameter values yields a score which can be used to calculate the probability of benefit from RRT therapy. wbc white blood cell.

Discussion

The optimal approach for administering RRT to AKI patients remains a critical issue in critical care medicine6. RRT has been suggested to down regulate immune responses in sepsis, introducing a promising dimension in the therapeutic approach for S-AKI patients. In this study, we classified S-AKI patients into RRT and non-RRT groups and compared their baseline characteristics and 28-day mortality rates, which were initially unbalanced. As expected, the RRT group had greater severity and had higher 28-day mortality rates. This reflecting the clinical reality that clinicians often tend to administer RRT to more severe patients. After PSM, the imbalance between the two groups was significantly reduced. These findings suggest the need for a more precise characterization of patients who can benefit from RRT. To address this issue, we utilized the uplift technique to model the treatment effect of RRT and identify S-AKI patients who may derive the most benefit from RRT. Our model outperformed traditional clinical indicators, such as the Kdigo grade, which are typically used to determine the need for RRT in S-AKI patients. Our model identified several parameters that were associated with the benefit from RRT. Notably, our findings suggest that certain inflammation-related factors, such as White Blood Cell (WBC) count is associated with improved outcomes in patients receiving RRT. Moreover, we developed a nomogram to assist in determining whether to initiate RRT in S-AKI patients.

In recent years, several randomized controlled studies (RCT) have been conducted to investigate the optimal timing of RRT in patients with AKI12,13,22, with one study specifically focusing on patients with septic AKI13. The findings from most of these studies indicate no significant difference in mortality between patients who received early RRT and those who received late RRT. For instance, the IDEAL-ICU trial recruited septic shock patients with AKI and showed no significant difference in overall mortality at 90 days between patients assigned to early or delayed RRT strategies13. Although RCTs are the gold standard for assessing treatment effects, they have limitations, such as stringent inclusion criteria and high cost23. For instance, the IDEAL-ICU trial, which was conducted in 29 ICUs over four years, screened 3753 patients, of whom 1728 met the inclusion criteria, but only 488 were randomized, with 246 patients in the early RRT group and 242 patients in the delayed RRT group13. The difficulties RCTs face in enrolling critically ill patients make the use of Real-World Evidence (RWE) data for analysis even more meaningful. RWE data, as a vital complement to RCTs, has been acknowledged in medical research24. In our study, we focused on S-AKI, which is a life-threatening subtype of AKI associated with sepsis. Our study employed real-world big data and enrolled 458 patients in the RRT group, and an equal number of 458 patients in the non-RRT group. Our study aims to supplement the gaps in RCTs and suggest possible approaches for administering RRT to S-AKI patients.

Prospective research has also investigated the indications for RRT in AKI25,26,27,28. However, we identified major selection bias in these studies. All these studies analyzed only patients who received RRT, patients who recovered from severe acute kidney injury without ever receiving RRT were excluded. As these patients might be more likely to have a good prognosis compared with those who undergo RRT, omission of these patients constitutes a major selection bias. Some clinicians have also raised this point29. This selection bias is also present in research that uses machine learning algorithms30. The models built by such studies can only predict the patients who have received RRT but not all patients who may need RRT. In our study, we included all patients meeting the criteria of Sepsis-3 and KDIGO stage > 1, regardless of whether they received RRT or not. The model we built has a wider scope of use than other prospective studies and can predict the expected treatment effect of all S-AKI patients.

The majority of studies on the timing of RRT have used renal function as the primary indication for early treatment, based on Kdigo12,22 or RIFLE staging13. However, relying solely on renal function as an indication may not provide a comprehensive understanding of the intrinsic nature of S-AKI. It is widely acknowledged among clinicians that modifying the immune response through RRT may have a positive impact on sepsis, as demonstrated by several studies31,32. We excluded all patients with blood potassium levels greater than 6.5, aiming to extensively eliminate those who urgently require RRT treatment. We believe this exclusion criterion enhances the practical significance and ethical alignment of our model. However, we did not exclude patients with volume overload, as this is a common state in sepsis treatment due to the essential role of fluid resuscitation.

Our research suggests that in addition to renal function, other indicators such as inflammation markers should be considered when making a decision about RRT for S-AKI. Figure 4 demonstrates that, in addition to renal-related indicators, several other factors are associated with the response to RRT. These include disease severity, as indicated by SOFA scores; volume overload, reflected in SpO2 levels; and inflammation level, evidenced by white blood cell count. This result indicates that when making a decision about starting RRT, factors such as the severity of sepsis and inflammation level should also be taken into account.

Applying the results of an RCT to individual patients should be done with caution, as there may be heterogeneity between individual patients that was not captured in the RCT33,34. Critical ill patients have a high degree of heterogeneity in disease states, which can generate heterogeneity in their response to treatment35. As shown in Table 1, patients who received RRT were much more severe, reflecting the real-world situation in which doctors are more likely to initiate RRT in severe cases. After balancing the differences between the RRT and non-RRT groups using PSM, we utilized the data for ITE analysis to identify characteristics related to treatment effects for each individual in high heterogeneity diseases like AKI. Uplift modeling has been used in AKI patients to identify those who can benefit from electronic alerts36. Our uplift modeling approach also produced a nomogram of the factors influencing the RRT effects on S-AKI patients. This work may be useful to clinicians when making decisions about whether to perform RRT on a S-AKI individual.

Randomized data can be applied to Uplift Modeling, such as data from RCTs, which can accurately distinguish the potential effects of treatments on different individuals37. However, randomized data is not always available. Non-randomized data, after being processed, can also be applied to uplift modeling38. For example, undersampling is an approach used in this context. It works by reducing the number of observations in the majority class, thus addressing the imbalance between different treatment groups. PSM is a method of under sampling and a common approach for applying non-randomized data in uplift modeling. Our study employed PSM to balance the baseline characteristics between the RRT and non-RRT groups, allowing this real-world data set to be applied to uplift modeling. This approach addresses the issue of insufficient randomized data for RRT treatment. Nonetheless, it is essential to note that applying non-randomized data to uplift modeling requires careful consideration of the potential impact that imbalances between groups may have on the results.

Our research has some limitations. Firstly, the training and test sets were all taken from the MIMIC database, which may limit the generalizability of our results. We looked into other public databases, such as the eICU Collaborative Research Database39 but found that they did not have sufficient data to enroll the model. Secondly, the indicators included in our model were conventional parameters, and some novel biomarkers associated with AKI, such as NGAL and KIM-140, were not studied. More research is needed to investigate the relationship between biomarkers and RRT treatment effects in the future.

Conclusion

Uplift modeling can better predict the ITE of RRT on S-AKI patients than conventional score systems such as Kdigo and SOFA. These findings are highly relevant to clinical practice, as they offer valuable insights into the identification of subgroups of S-AKI patients who are most likely to benefit from RRT.