Introduction

Fluid overload, a frequent and unintended consequence of the resuscitation process in critically ill adults, may result in increased rates of acute kidney injury and invasive mechanical ventilation initiation, prolonged intensive care unit (ICU) stay, and mortality1,2. Timely de-resuscitation to remove excess fluid is associated with improved outcomes3,4,5,6. While the predictors of volume responsiveness are well-established7,8, particularly in patients with sepsis9,10, the predictors for ICU fluid overload remain unclear. Development of rigorous fluid overload prediction algorithms could shorten the time to the implementation of fluid overload mitigation strategies [e.g., concentration of intravenous (IV) fluid products, discontinuation of maintenance fluids, administration of diuretics] and improve outcomes.

Non-diuretic ICU medication use may affect fluid overload risk; preliminary data suggest the medication regimen complexity-ICU (MRC-ICU) score is associated with both fluid overload and fluid balance11. This score has also been shown to predict mortality, length of stay, and the medication interventions needed to optimize a patient’s pharmacotherapy regimen12,13,14,15,16,17,18,19. Therefore, quantifying patient-specific, medication-related data is likely an important consideration in the prediction of fluid overload in critically ill adults2,20,21.

Event prediction in the ICU remains a perennial area of research given the many challenges that exist for clinicians to accurately predict clinical outcomes in the highly complex and dynamic critical care environment22,23. Artificial intelligence and machine learning techniques have been proposed as a method to improve ICU clinical outcome prediction given their unique ability to handle multi-dimensional problems and identify novel patterns within the vast troves of continuously-generated patient data21,24,25,26. However, to some ICU clinicians, the use of artificial intelligence/machine learning approaches to predict clinical events may have a ‘black-box effect,’ which can ultimately preclude implementation. The rigorous evaluation of whether artificial intelligence-based approaches predict clinical events better than traditional regression models (or clinical expertise alone) remains a key question in critical care practice27,28,29,30,31.

In this study, we sought to compare the ability of machine learning approaches with that of traditional regression models to predict fluid overload and to identify the individual predictors for its occurrence in critically ill adults. We hypothesized that advanced machine learning techniques would perform better than traditional regression models in predicting fluid overload and that the predictors for fluid overload identified through machine learning approaches may be different.

Methods

We conducted a retrospective, observational study of adults admitted to ICUs at the University of North Carolina Health System (UNCHS), an integrated health system, who had fluid overload data available. The protocol for this study was approved with waivers of informed consent and HIPAA authorization granted by the UNCHS Institutional Review Board (approval number: Project00001541; approval date: October 2021). Procedures followed in the study were in accordance with the ethical standards of the UNCHS Institutional Review Board and the Helsinki Declaration of 1975, as most recently amended32. The reporting of this study adheres to the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement33.

Population

A random sample of 1000 adults (≥ 18 years) admitted to an ICU at UNCHS between October 2015 and October 2020 was generated. Patients on their index ICU admission with fluid balance data available for the first 72 h were included (Supplemental Digital Content (SDC) Fig. S1). Patients were excluded if the admission was not their index ICU admission.

Data collection and outcomes

De-identified UNCHS electronic health record (EHR) data (Epic Systems, Verona, WI) housed in the Carolina Data Warehouse (CDW) were extracted by a trained CDW data analyst. The primary outcome was the presence of fluid overload at 48–72 h (i.e., day 3) after ICU admission. Fluid overload was defined as a positive fluid balance in milliliters (mL) greater than or equal to 10% of the patient’s admission body weight in kilograms (kg)2,34. For example, a patient with a body weight of 100 kg at ICU admission having a positive fluid balance at 72 h of 12,000 mL (or 12 kg) would be considered to have fluid overload. A secondary outcome was the amount of fluid overload as a function of body weight. For example, the aforementioned patient would have a fluid overload amount of 12%.
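The outcome definition above can be written as a short calculation. The following is a minimal sketch rather than the study's own code; the function names are hypothetical, and it assumes 1 mL of fluid weighs approximately 1 g, which is what the worked example (12,000 mL, or 12 kg) implies.

```python
def fluid_overload_percent(net_balance_ml: float, admission_weight_kg: float) -> float:
    """Net fluid balance expressed as a percentage of admission body weight.

    Assumption: 1 mL of fluid weighs about 1 g, so 1,000 mL is about 1 kg.
    """
    if admission_weight_kg <= 0:
        raise ValueError("admission weight must be positive")
    balance_kg = net_balance_ml / 1000.0
    return 100.0 * balance_kg / admission_weight_kg


def has_fluid_overload(net_balance_ml: float, admission_weight_kg: float,
                       threshold_percent: float = 10.0) -> bool:
    """Primary outcome: positive balance of at least 10% of admission weight."""
    return fluid_overload_percent(net_balance_ml, admission_weight_kg) >= threshold_percent


# Worked example from the text: 100 kg patient, +12,000 mL at 72 h.
print(fluid_overload_percent(12_000, 100))  # 12.0
print(has_fluid_overload(12_000, 100))      # True
```

The same percentage serves as the secondary (continuous) outcome, and thresholding it at 10% gives the primary (dichotomous) outcome.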

Following a literature review, and through investigator consensus, potential predictor variables for fluid overload were defined2,35,36,37,38. A total of 28 potential predictors were identified: 1) ICU baseline: age ≥ 65 years, sex, admission to a medical (vs. surgical) ICU, primary ICU admission diagnosis (i.e., cardiac, chronic kidney disease, heart failure, hepatic, pulmonary, sepsis, trauma), and select co-morbidities (i.e., chronic kidney disease, heart failure); 2) 24 h after ICU admission: APACHE II and SOFA score (using worst values in the 24 h period), use of supportive care devices (i.e., renal replacement therapy, invasive mechanical ventilation), serum laboratory values (i.e., albumin < 3 mg/dL, bicarbonate < 22 mEq/L or > 29 mEq/L, chloride ≥ 110 mEq/L, creatinine ≥ 1.5 mg/dL, lactate ≥ 2 mmol/L, potassium ≥ 5.5 mEq/L, sodium ≥ 148 mEq/L or < 134 mEq/L), fluid balance (mL), and presence of acute kidney injury (as defined by need for renal replacement therapy or serum creatinine greater than or equal to two times baseline); 3) Medication data at 24 h: MRC-ICU score, vasopressor use in the first 24 h, use of continuous medication infusions, and the number of continuous medication infusions.

Data analysis

Data missingness

Due to the hypothesis-generating nature of our study and the lack of published data on ICU fluid overload prediction, no attempt was made to estimate a study sample size. The 391 patients were split into training and testing datasets using an 80:20 ratio. We assumed data were missing at random (MAR) (i.e., related to observed, not unobserved values) and therefore chose Multiple Imputation by Chained Equations (MICE), rather than complete case analysis or simple imputation, as the most appropriate approach to address missingness. Ten imputations per variable were therefore applied for all missing data in the training and testing datasets to generate multiple imputed training and testing datasets (SDC Fig. S1).
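As an illustration of the 80:20 split described above, the sketch below shuffles patient identifiers and partitions them. It is not the study's code: the function name, the fixed seed, the rounding rule, and the 100-patient example cohort are all assumptions made for illustration, and the imputation step itself is not shown.

```python
import random


def train_test_split_ids(patient_ids, train_fraction=0.8, seed=0):
    """Randomly partition patient IDs into training and testing sets.

    Assumptions: a simple (unstratified) random split, a fixed seed for
    reproducibility, and rounding to the nearest whole patient.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical cohort of 100 patient IDs.
train_ids, test_ids = train_test_split_ids(range(100))
print(len(train_ids), len(test_ids))  # 80 20
```

With a rare outcome, a stratified split (preserving the outcome proportion in both sets) is a common alternative; the text does not state which was used.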

Machine learning models

We employed Random Forest, SVM, and XGBoost for the task of modeling the presence of fluid overload39,40,41. During model training on each of the ten imputed training sets, fivefold cross validation was applied for Random Forest, SVM, and XGBoost, using the corresponding R packages42,43,44, to choose the hyperparameters that resulted in the highest prediction accuracy. For Random Forest, two hyperparameters were tuned (number of trees and number of variables randomly sampled as candidates at each split). For SVM, the linear kernel and cost of constraints violation were tuned. For XGBoost, two hyperparameters were tuned (maximum depth of a tree and maximum number of boosting iterations). Each of these models was then fitted on the corresponding imputed training set, and predictions for probability of fluid overload were made on each of the ten imputed testing sets using the corresponding optimal model. For each model, ten different predictions were thus generated on ten different imputed test sets. These predictions of the probability for fluid overload were averaged as the final prediction.
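The final averaging step can be sketched as follows. This is an illustrative outline only (the analysis itself was carried out in R); the function name and the example probabilities are hypothetical, and the sketch assumes each imputed test set lists the same patients in the same order.

```python
from statistics import fmean


def average_predictions(per_imputation_probs):
    """Average predicted probabilities across imputed test sets.

    per_imputation_probs: list of m lists; each inner list holds one
    predicted probability per test patient, in the same patient order.
    Returns one averaged probability per patient.
    """
    if not per_imputation_probs:
        raise ValueError("need at least one set of predictions")
    n_patients = len(per_imputation_probs[0])
    if any(len(p) != n_patients for p in per_imputation_probs):
        raise ValueError("all prediction lists must cover the same patients")
    # zip(*...) regroups the values patient by patient.
    return [fmean(patient_probs) for patient_probs in zip(*per_imputation_probs)]


# Hypothetical example: three imputations, two test patients.
final = average_predictions([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
print([round(p, 3) for p in final])  # [0.3, 0.7]
```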

For the degree of fluid overload, we built models with the amount of fluid overload at 72 h. Since this is a continuous variable, we employed the regression versions of the above machine learning models: Random Forest regression, SVM regression, and XGBoost regression. For XGBoost, feature importance was measured as the frequency with which a feature was used in the trees. For Random Forest, feature importance was measured by mean decrease in node impurity. Because a separate model was fitted on each of the ten imputed datasets, ten different feature importance lists were generated for each model type. A subsequent analysis modeling fluid overload as a continuous variable (percent of net milliliters of fluid by body weight) instead of as the dichotomous presence or absence of fluid overload was performed (see SDC Additional Methods S1).
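Because each imputed dataset yields its own importance ranking, one simple way to summarize the ten lists is to count how often each feature appears among the top-ranked features. The sketch below illustrates that idea; it is an assumption about how such lists could be combined, not a description of the study's procedure, and the function name, feature names, and importance values are hypothetical.

```python
from collections import Counter


def top_feature_counts(importance_by_imputation, k=5):
    """Count how often each feature ranks in the top k across imputations.

    importance_by_imputation: list of dicts mapping feature name to an
    importance value (larger means more important), one dict per
    imputed dataset.
    """
    counts = Counter()
    for importances in importance_by_imputation:
        ranked = sorted(importances, key=importances.get, reverse=True)
        counts.update(ranked[:k])
    return counts


# Hypothetical importance values from three imputed datasets.
lists = [
    {"fluid_balance_24h": 0.40, "sofa_24h": 0.30, "mrc_icu_24h": 0.20, "age_ge_65": 0.10},
    {"fluid_balance_24h": 0.35, "mrc_icu_24h": 0.30, "sofa_24h": 0.25, "age_ge_65": 0.10},
    {"fluid_balance_24h": 0.50, "sofa_24h": 0.20, "age_ge_65": 0.18, "mrc_icu_24h": 0.12},
]
print(top_feature_counts(lists, k=2).most_common())
# [('fluid_balance_24h', 3), ('sofa_24h', 2), ('mrc_icu_24h', 1)]
```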

Traditional regression models

Subsequently, a full logistic regression model was built for the presence of fluid overload for each of the ten complete training sets. We then applied backward elimination to select the final model. The initial set of variables for the variable selection was determined by the significance of variables in the ten full models by multivariate Wald testing45. Linear regression models for the degree of fluid overload were built in a similar manner on the ten completed training sets. In order to compare these models with the MRC-ICU only model, we also built logistic regression and linear regression models with MRC-ICU as the sole predictor in the ten training sets. After model fitting, model fits were pooled using Rubin’s method46. Using the pooled models, odds ratios (OR) and their 95% confidence intervals (CI) were reported.
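For a single coefficient, pooling by Rubin's rules combines the within-imputation variance (the mean of the squared standard errors) and the between-imputation variance (the spread of the estimates across imputations). The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the confidence interval uses a normal approximation (z = 1.96) rather than the t reference distribution with Rubin's degrees of freedom that statistical packages typically apply.

```python
import math
from statistics import fmean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")
    pooled = fmean(estimates)
    within = fmean([se ** 2 for se in std_errors])  # mean within-imputation variance
    between = variance(estimates)                   # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


def odds_ratio_ci(log_odds, std_error, z=1.96):
    """Odds ratio and approximate 95% CI from a pooled log-odds coefficient."""
    return (math.exp(log_odds),
            math.exp(log_odds - z * std_error),
            math.exp(log_odds + z * std_error))


# Hypothetical log-odds estimates from three imputations.
beta, se = pool_rubin([0.50, 0.60, 0.55], [0.20, 0.22, 0.21])
print(odds_ratio_ci(beta, se))
```

The pooled standard error is never smaller than the within-imputation component alone, which is how the uncertainty introduced by imputation is carried into the reported intervals.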

For each regression model, ten different predictions were generated on ten different imputed test sets as well. These predictions of the probability for fluid overload were averaged as the final prediction. We compared the variables selected through backward selection with the top five variables chosen by Random Forest (see SDC Additional Methods S1). To further evaluate our results in those patients with high APACHE II (≥ 25) and high SOFA (≥ 10) scores, we generated predictions using the backward selection model (see SDC Additional Methods S1).

Ethical approval

The protocol for this study was approved with waivers of informed consent and HIPAA authorization granted by the UNCHS Institutional Review Board (approval number: Project00001541; approval date: October 2021).

Results

A total of 49 (12.5%) of the 391 included patients had fluid overload on ICU day 3. The degree of day 3 fluid overload was significantly greater in the fluid overload (vs. non-overload) patients (16.6% vs. 2.2%, p < 0.01). Overall, the mean APACHE II score was 15.7 ± 6.6, mean SOFA score was 8.3 ± 3.3, and mean MRC-ICU score was 11.8 ± 8.7. A significantly greater proportion of fluid overload patients (vs. those without) had an elevated serum lactate ≥ 2 mmol/L (32.7% vs. 14.9%, p = 0.01) and AKI (28.6% vs. 10.5%, p < 0.001) at 24 h and positive fluid balance (1,840 mL vs. 390 mL, p < 0.001) on ICU day 3. All model covariates are summarized in Table 1. At ICU day 3, patients with fluid overload (vs. those without) were more likely to be dead (20.4% vs. 7.3%, p = 0.01), have AKI (34.7% vs. 15.8%, p < 0.001), and remain on mechanical ventilation (12.7% vs. 4.2%, p = 0.05).

Table 1 Study cohort characteristics by presence of fluid overload within 72 h of ICU admission.

Among the machine learning models, XGBoost demonstrated the highest AUROC (0.78) compared to SVM (0.69) and RF (0.76) and was associated with a PPV of 0.27 and NPV of 0.94. Notably, all models had relatively poor PPV. In comparison, stepwise logistic regression had an AUROC of 0.70, PPV of 0.26, and NPV of 0.94. Full results are reported in Table 2, and AUROC curves for all models are provided in SDC Supplemental Fig. S2. Results of the full logistic regression are reported in SDC Supplemental Table S1. Stepwise regression resulted in a more parsimonious model (7 variables vs. 31 variables) but demonstrated similar performance to the machine learning models (SDC Supplemental Table S2). The stepwise regression model comprised presence of sepsis, male sex, the SOFA score at 24 h, and the 24 h serum sodium and bicarbonate (Table 2). In an analysis of MRC-ICU as a single predictor for fluid overload, the model had an AUROC of 0.74 (0.60–0.84), sensitivity 0.62 (0.35–0.85), specificity 0.70 (0.63–0.77), PPV 0.16 (0.08–0.27), and NPV 0.96 (0.90–0.98).
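For reference, the performance measures reported above can be computed from predicted probabilities as sketched below. This is an illustrative outline, not the study's code: the function names and example data are hypothetical, and the 0.5 classification threshold is an assumption, since the threshold used to derive sensitivity, specificity, PPV, and NPV is not stated here. AUROC is computed as the probability that a randomly chosen patient with fluid overload receives a higher predicted probability than one without (ties count as one half).

```python
def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, PPV, and NPV at a given probability threshold."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and actual == 1:
            tp += 1
        elif predicted and actual == 0:
            fp += 1
        elif not predicted and actual == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = fluid overload) and predicted probabilities.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.4, 0.35, 0.2, 0.6]
print(confusion_metrics(y_true, y_prob))
print(auroc(y_true, y_prob))
```

PPV and NPV depend on outcome prevalence as well as on the model, so with fluid overload present in 12.5% of patients, a modest PPV alongside a high NPV is not unexpected.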

Table 2 Performance of presence of fluid overload prediction models, mean (confidence interval).

Feature importance graphs were plotted for XGBoost (Fig. 1), RF (SDC Supplemental Fig. S3) and SVM (SDC Supplemental Fig. S4). Among the 10 different feature importance lists generated for each model, differences between top features were noted. For example, for two of the machine learning models, XGBoost (Fig. 2) and RF, the top five most important features were fluid balance at 24 h, SOFA score at 24 h, MRC-ICU at 24 h, APACHE II at 24 h, and the number of continuous infusions at 24 h. While the stepwise regression model found fluid balance at 24 h and APACHE II at 24 h to be top features, the SOFA score at 24 h, the MRC-ICU at 24 h and the number of continuous infusions were not found to be model features. The full regression results for predicting the amount of fluid overload at 72 h are reported in SDC Supplemental Table S3. For stepwise regression, twelve variables were included with fluid balance, laboratory values, and severity of illness being significant predictors (SDC Supplemental Table S4). All models demonstrated similar performance as measured by MSE (SDC Supplemental Table S5). Feature importance graphs are presented in SDC Supplemental Figs. S5–S7.

Figure 1 Feature importance for presence of fluid overload prediction with XGBoost.

Figure 2 Most common features for presence of fluid overload prediction with XGBoost imputations.

When variables selected through backward selection were compared with the variables chosen by the Random Forest model, we found MRC-ICU at 24 h to be highly correlated with male sex; the number of IV continuous infusions to be highly correlated with male sex and age ≥ 65 years; and fluid balance at 24 h (mL) to be highly correlated with an admission diagnosis of sepsis/septic shock, serum bicarbonate, and age ≥ 65 years (SDC Supplemental Tables S6 and S7). These results indicate that the backward selection and Random Forest variables share substantial explanatory power. The vast majority of cases of fluid overload occurred in patients with both high APACHE II and SOFA scores (SDC Supplemental Table S8).

Discussion

Although machine learning models have been shown to outperform traditional regression models in a variety of settings47,48, the potential benefits of machine learning in critical care remain an open field of exploration, in part due to a current lack of rigorous comparison in high quality ICU datasets29,49,50. Our analysis represents the first published comparison of machine learning approaches with traditional regression methods to predict fluid overload using a novel dataset with granular medication data.

We report that machine learning and logistic regression analyses demonstrate a similar predictive power to identify patients with fluid overload on day 3 of their ICU stay. Although use of machine learning did not appear to improve predictive performance over regression analysis, it expanded the number of variables critical to fluid overload prediction and highlights the importance of further artificial intelligence-based exploration in this area. This analysis of individual predictors may help bedside clinicians better understand how the machine learning models work and may help overcome their ‘black box’ hesitancy to trust machine learning-generated results51,52. For example, feature importance graphs for the machine learning analyses found complexity of the daily ICU medication regimen (i.e., MRC-ICU score), which includes the number of intravenous medication infusions (the primary method to administer medications in this population and a primary source of fluids to a patient), to be an important contributor to fluid overload. In comparison, in the traditional multivariable regression, the MRC-ICU score was not associated with fluid overload. This may be because machine learning analyses better account for severity of illness and the tendency of clinicians to respond to this severity by administering more medication infusions, leading to a more complex daily medication regimen; however, the methods applied, including feature importance, preclude causal inference at this juncture. As such, our results highlight the unique power of machine learning to identify complex relationships that can be further elucidated via machine learning-based causal inference modeling and other designs aimed at causation2,20.

Optimizing fluid management (or fluid stewardship) has been previously defined by the ROSE model of Resuscitation, Optimization, Stabilization, and dE-resuscitation35. After an initial 24–48 h period characterized by overt volume resuscitation (e.g., a crystalloid bolus) and IV medication initiation (e.g., antibiotics), and the associated fluid administration, the care priority shifts from volume administration to volume removal. While comprehensive fluid stewardship management strategies including reduced fluid use and diuretic administration can effectively reduce fluid overload and its sequelae, they are often deployed too late1,2. Interestingly, some reports have indicated that ‘hidden fluids’ (defined as blood products, enteral nutrition, flushes, and intravenous medications) were significantly associated with the development of fluid overload. During critical illness, many of these ‘hidden fluids’ are necessary (e.g., blood products). However, given that intravenous medications account for over 40% of total fluid intake in this analysis, interventions such as concentrating intravenous medications, employing oral formulations when feasible, careful evaluation of maintenance fluids, and antibiotic de-escalation are potentially still viable, even in high illness severity, and can reduce this complication. Weighing the risks and benefits associated with these interventions in this context may be aided by more quantitative prediction data56,57. Overall, de-resuscitation and fluid stewardship can be deceptively complex53. In a patient with shock, balancing the dueling forces of volume responsiveness assessment and timely volume resuscitation with the risks associated with fluid overload represents a highly complex Goldilocks scenario that requires clinicians to have high clinical precision, essentially pivoting ‘on a dime’ from a strategy of aggressive volume expansion to one of rapid volume removal36,54,55.

Despite the complexities of this decision process, limited prediction tools for fluid overload are available to assist clinicians at the ICU bedside. As such, real-time recognition identifying when to make the shift from resuscitation to de-resuscitation has the potential to improve bedside management. However, to go beyond the hourly assessment of ‘Ins and Outs’ would require accurate prediction of future fluid overload risk and the adverse events associated with it, in the time-dependent context of intervention delivery (e.g., diuretics). In such a scenario, an algorithm would be able to accurately interpret a septic patient who is 3 L positive 24 h after fluid resuscitation initiation as being in a ‘green zone’ (i.e., appropriately resuscitated). However, 24 h later, if the same patient is 4 L positive while off vasopressors and with down-trending sepsis markers the algorithm could alert clinicians that the patient is now in a 'yellow zone' where interventions like diuretic therapy and fluid reductions are required to reduce acute kidney injury and intubation risk. This type of real-time predictive capability could support continuous clinician decision-making but requires evaluation outside the scope of our current study.

Fluid overload also presents an important test case for exploring and adapting artificial intelligence methods to ICU problems, particularly those related to ICU medication use. Fluid overload represents a uniquely intervenable event in the ICU. Intervenable events share three key characteristics: they are predictable, preventable, and otherwise associated with poor outcomes. The results of our study, and others, indicate that fluid overload can be predicted with modeling of some kind, especially given its ability to be quantitatively defined56,57,58. Fluid overload has been associated with poor outcomes including acute kidney injury, delirium, poor respiratory outcomes, prolonged length of stay, and potentially increased mortality2,37,59,60,61,62. Evidence demonstrates that the timely recognition and management of fluid overload is feasible and is associated with reduced mortality and time in the ICU3,5,63,64. Notably, fluid stewardship has been adopted by critical care pharmacists as a key component of comprehensive medication management5,6,65. As such, these results may support other investigations as they identify patients in whom it is safe to initiate de-resuscitation or who, importantly, never needed that degree of fluid volume initially. At the bedside, they may prompt clinicians to be more targeted in the therapies initiated or more aggressive in curtailing early ‘hidden’ fluids to avoid the complications of fluid overload and/or the need for a highly interventional period of de-resuscitation (e.g., diuretics, dialysis). Artificial intelligence may be particularly well suited to bolster these efforts, and thus while feature importance analyses cannot provide a foundation for causal inference, they may guide such future investigations.

Our study has limitations. Our patient sample may have been too small to demonstrate superiority of the machine learning approaches compared to traditional regression, and no validation in a separate, external dataset was undertaken at this juncture66. Future studies applying this approach to alternative, larger datasets (e.g., MIMIC-III) should be considered to examine the external validity of our findings. Although MICE is the established approach to address missingness in cohort studies that include variables that are a composite of several individual patient-specific values (e.g., SOFA), it is possible that some of the values in the imputed datasets that represented our new ground truth may not have been accurate67. Bias may exist due to which patients had fluid balance data available. Other predictors for fluid overload not included in our models may exist68. By relying on prediction data derived in the first 24 h of ICU admission, we did not fully capture the dynamic nature of critical illness over the entire three-day ICU period before fluid overload occurred. Future time-dependent evaluations of changing features employing unsupervised learning techniques may yield novel insights.

Conclusion

Fluid overload is an important, intervenable event in the ICU population. Incorporation of medication-related variables and artificial intelligence has demonstrated promise to improve prediction that may ultimately guide timely intervention and mitigation of this ICU complication; however, further evaluation of its comparative advantages over traditional modeling techniques remains warranted.