Acute kidney injury detection using refined and physiological-feature augmented urine output

Acute kidney injury (AKI) is common in the intensive care unit, where it is associated with increased mortality. AKI is often defined using creatinine and urine output criteria. The creatinine-based definition is more reliable but less expedient, whereas the urine output based definition is rapid but less reliable. Our goal is to examine the urine output criterion and augment it with physiological features for better agreement with creatinine-based definitions of AKI. The objectives are threefold: (1) to characterize the baseline agreement of urine output and creatinine definitions of AKI; (2) to refine the urine output criteria to identify the thresholds that best agree with the creatinine-based definition; and (3) to build generalized estimating equation (GEE) and generalized linear mixed-effects (GLME) models with static and time-varying features to improve the accuracy of a near-real-time marker for AKI. We performed a retrospective observational study using data from two independent critical care databases, MIMIC-III and eICU, for critically ill patients who developed AKI in intensive care units. We found that the conventional urine output criterion (6 hr, 0.5 ml/kg/h) has specificity and sensitivity of 0.49 and 0.54 for MIMIC-III database; and specificity and sensitivity of 0.38 and 0.56 for eICU. Secondly, urine output thresholds of 12 hours and 0.6 ml/kg/h have specificity and sensitivity of 0.58 and 0.48 for MIMIC-III; and urine output thresholds of 10 hours and 0.6 ml/kg/h have specificity and sensitivity of 0.49 and 0.48 for eICU. Thirdly, the GEE model of four hours duration augmented with static and time-varying features can achieve a specificity and sensitivity of 0.66 and 0.61 for MIMIC-III; and specificity and sensitivity of 0.66 and 0.64 for eICU. 
The GLME model of four hours duration augmented with static and time-varying features can achieve a specificity and sensitivity of 0.71 and 0.55 for MIMIC-III; and specificity and sensitivity of 0.66 and 0.60 for eICU. The GEE model performs better than the GLME model; however, the GLME model is more reflective of the variables as fixed effects or random effects. The significant improvement in performance, relative to current definitions, when augmenting with patient features suggests the need to incorporate these features when detecting disease onset and to model at the window level rather than the patient level.

Pre-processing and inclusion/exclusion criteria. Patients with fewer than four hours of urine output measurements were excluded. Of those with more than four hourly measures, we excluded any patients with a normalized urine output less than or equal to 0.5 ml/kg/h during the first 6 h of admission, given that they would require data collected prior to ICU admission, which the current databases do not capture. As urine output measures occurred at irregular intervals, we estimated the urine output at the end of the sixth hour, when the measure was not recorded, using interpolation between the two nearest measures. Lastly, we excluded the first urine measurement, which inconsistently includes urine output in the Emergency Department, in the operating room or the hospital ward prior to ICU admission. We excluded part of the database from analyses because we are concerned only with patients with sufficient data who developed AKI during their ICU stay. The data went through two stages of filtering as illustrated in Fig. 1. The two cohorts resulting from the two stages are the Analyses cohort and subsequently the GEE/GLME cohort.
The Analyses cohort is used in characterizing the baseline symmetry between the urine output and creatinine criteria of AKI, and in evaluating the performance of various combinations of time and volume thresholds. It included only patients who had normal kidney function at ICU admission. Therefore, we excluded patients if they had undergone dialysis prior to ICU admission, or if they had a first creatinine measure greater than 1.2 mg/dl, or had an average urine output less than 0.5 ml/kg/h for the first 6 h. Additionally, we excluded patients who had missing data, and those with too few observations to reliably extract information from (e.g. fewer than four measurements of urine output data).
The GEE/GLME cohort is used in identifying a urine output based model that is augmented with other static and dynamic features to predict AKI onset. This cohort is a subset of the Analyses cohort but additionally excluded any patient with missing values for the static and dynamic features used in the model. These features are: age, gender, use of diuretics, use of vasopressors, average MAP, and fluid intake.
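The interpolation step described in the pre-processing paragraph can be illustrated with a minimal sketch. It assumes linear interpolation on cumulative urine volume between the two charted measurements that bracket the target time; the text states only that interpolation between the two nearest measures was used, so the linear form, the cumulative representation, the function name `interpolate_at`, and the sample values are all illustrative assumptions.

```python
from bisect import bisect_left

def interpolate_at(times_h, values, t):
    """Linearly interpolate a value at time t (hours) from the two charted
    measurements that bracket t. times_h must be sorted and strictly
    increasing. Returns None when t is outside the charted range, because
    this sketch does not extrapolate."""
    if not times_h or t < times_h[0] or t > times_h[-1]:
        return None
    i = bisect_left(times_h, t)
    if times_h[i] == t:  # a measurement was recorded exactly at t
        return values[i]
    t0, t1 = times_h[i - 1], times_h[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Hypothetical cumulative urine output (ml) charted at irregular times
# (hours since ICU admission); no value was recorded at hour 6.
times_h = [1.0, 3.5, 5.0, 7.0]
cum_uo_ml = [60.0, 200.0, 290.0, 410.0]

uo_at_6h = interpolate_at(times_h, cum_uo_ml, 6.0)
```

In this toy series the estimate at hour 6 lies halfway between the values charted at hours 5 and 7. Dividing the cumulative volume by weight and elapsed hours would then give the normalized rate compared against the 0.5 ml/kg/h threshold.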
Baseline symmetry and time/volume refinement. All three AKI standards (RIFLE, AKIN, and KDIGO) have similar criteria for their lowest levels of AKI classification. Stage 1 of KDIGO and AKIN and the risk stage of RIFLE characterize AKI by urine output with time and volume thresholds of 6 h and 0.5 ml/kg/h, and by a creatinine level greater than 1.5 × the baseline. The creatinine-based criterion for classifying patients as having AKI ( AKI Cr ) is based on the creatinine measurements within the first 48 h of ICU admission, where we define AKI as either (1) an increase in creatinine greater than or equal to 0.3 mg/dl from the hospital stay minimum, or (2) a 50% or more increase from the hospital stay minimum 16 . The urine output based criterion ( AKI UO ) classifies patients as having AKI if any time window of a given length threshold has an average weight-normalized urine output less than the volume threshold. We investigated the baseline symmetry between the creatinine and urine criteria of AKI. In particular, we determined the classification performance, as indicated by sensitivity and specificity, of time and volume thresholds of 6 h and 0.5 ml/kg/h with the creatinine-based definition of AKI as reference. We also refined the choice of time and volume threshold combinations to find those that allowed for the greatest overlap between AKI UO and AKI Cr . The time thresholds we investigated ranged from 2 to 12 h in increments of 2, while the volume thresholds ranged from 0 to 1 ml/kg/h in increments of 0.1. For each combination of thresholds, we calculated specificity, sensitivity, J-point distance, and net reclassification index (NRI). The J-point is the point on the ROC curve that has the least Euclidean distance to 100% sensitivity and specificity.
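The window-based urine output criterion and the threshold grid described above can be sketched as follows. This is an illustration under stated assumptions rather than the study code: urine output is assumed to be already resampled to hourly, weight-normalized values, and the function names and the four-patient toy cohort are hypothetical.

```python
import math

def aki_uo(hourly_uo, window_h, volume_threshold):
    """True if any run of window_h consecutive hours has a mean
    weight-normalized urine output (ml/kg/h) below volume_threshold."""
    for start in range(len(hourly_uo) - window_h + 1):
        window = hourly_uo[start:start + window_h]
        if sum(window) / window_h < volume_threshold:
            return True
    return False

def sensitivity_specificity(predicted, reference):
    """Sensitivity and specificity of boolean predictions against a
    boolean reference (here, the creatinine-based label)."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def j_point_distance(sensitivity, specificity):
    """Euclidean distance to the ideal point of 100% sensitivity and specificity."""
    return math.hypot(1.0 - sensitivity, 1.0 - specificity)

# Hypothetical cohort: (hourly urine output in ml/kg/h, creatinine-based AKI label).
cohort = [
    ([1.0] * 8, False),
    ([1.0] + [0.4] * 6 + [1.0], True),
    ([0.6] * 8, True),
    ([0.3] * 8, False),
]
reference = [label for _, label in cohort]

def evaluate(window_h, volume_threshold):
    predicted = [aki_uo(series, window_h, volume_threshold) for series, _ in cohort]
    se, sp = sensitivity_specificity(predicted, reference)
    return se, sp, j_point_distance(se, sp)

# Conventional criterion: 6 h and 0.5 ml/kg/h.
se_std, sp_std, d_std = evaluate(6, 0.5)

# Grid from the text: 2 to 12 h in steps of 2, 0 to 1 ml/kg/h in steps of 0.1.
grid = [(w, k / 10) for w in range(2, 13, 2) for k in range(0, 11)]
best_w, best_v = min(grid, key=lambda wv: evaluate(*wv)[2])
```

Ranking by J-point distance, as in this sketch, selects the threshold pair closest to the ideal corner of the ROC plane; ranking by NRI instead would use the sum of sensitivity and specificity.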
Multivariable modeling. Urine output is time-varying, with future values correlated to past values. This makes standard generalized linear modeling approaches invalid. To address this, we employed a generalized estimating equation (GEE), which estimates the parameters of a generalized linear model without any assumptions about the covariance structure of the data, allowing us to use multiple correlated urine observations for model parameter estimation.
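For reference, the estimating equation solved by a GEE can be written in its standard textbook form. The logit link and the notation below are assumptions for illustration: the text does not state the link function or the working correlation used, although a logit link is consistent with the odds ratios reported for the models.

```latex
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad
V_i = A_i^{1/2} \, R(\alpha) \, A_i^{1/2},
\qquad
\operatorname{logit}(\mu_{ij}) = x_{ij}^{\top} \beta
```

Here $y_i$ collects the window-level AKI Cr labels of patient $i$, $x_{ij}$ is the feature vector of window $j$, $A_i$ is the diagonal matrix of variances $\mu_{ij}(1 - \mu_{ij})$, and $R(\alpha)$ is a working correlation matrix. The estimate of $\beta$ remains consistent even when $R(\alpha)$ is misspecified, which is the sense in which the method does not depend on the true covariance structure of the repeated urine observations.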
The following features were included in the GEE model to predict AKI onset according to the creatinine criteria: age, having diabetes, having heart disease, having cancer, prior diuretic use, prior vasopressor use, first creatinine measure, lean body mass (LBM), time-averaged mean arterial pressure, and fluid balance. All these variables are considered as fixed effects in the GEE model. In comparison, for the GLME model, we consider age, prior vasopressor use, first creatinine measure, LBM, time-averaged mean arterial pressure, and fluid balance to be fixed effects, and diabetes, heart disease, cancer, and prior diuretic use to be random effects. This better representation could potentially lead to greater agreement with the creatinine-based definition. The GLME model integrates out the random effects, but is limited to categorical variables. The extended GLMM model 43 is able to model continuous random effects using Monte Carlo simulation and expectation maximization, which makes it computationally infeasible for the size of the database we are using.
We computed fluid balance within a certain time window by subtracting the total urine output within the window from the adjusted fluid intake and normalizing it by the patient's first measured weight. The adjusted fluid intake is the sum of fluid intake up to and including during the time window minus the total urine output up to the start of the time window.
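The fluid balance definition above can be made concrete with a short sketch. The event representation (lists of time and volume pairs), the function name, and the choice of inclusive or exclusive window boundaries are assumptions made for illustration; the text does not specify how boundary measurements are handled.

```python
def fluid_balance(intake_events, urine_events, window_start_h, window_end_h,
                  first_weight_kg):
    """Weight-normalized fluid balance (ml/kg) for one time window.

    intake_events and urine_events are lists of (time_h, volume_ml).
    Boundary handling is an assumption: the window is treated as
    [window_start_h, window_end_h], inclusive at both ends.
    """
    # Fluid intake up to and including the window.
    intake_to_end = sum(v for t, v in intake_events if t <= window_end_h)
    # Urine output before the window starts.
    uo_before = sum(v for t, v in urine_events if t < window_start_h)
    # Urine output inside the window.
    uo_within = sum(v for t, v in urine_events
                    if window_start_h <= t <= window_end_h)

    adjusted_intake = intake_to_end - uo_before
    return (adjusted_intake - uo_within) / first_weight_kg

# Hypothetical patient: times in hours since ICU admission, volumes in ml.
intake = [(0.5, 500.0), (3.0, 250.0), (6.5, 250.0), (9.0, 500.0)]
urine = [(1.0, 100.0), (2.0, 80.0), (5.0, 60.0), (7.0, 40.0), (9.5, 50.0)]

balance = fluid_balance(intake, urine, window_start_h=4.0, window_end_h=8.0,
                        first_weight_kg=80.0)
```

In this toy case the intake through hour 8 is 1000 ml, the urine output before hour 4 is 180 ml, and the urine output within the window is 100 ml, giving a balance of 720 ml, or 9 ml/kg for an 80 kg patient. Written this way, the quantity is the cumulative net balance at the end of the window.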
As in our refinement analyses, we explored various time window lengths and observed their impact on model performance in prediction of AKI onset with reference to creatinine based AKI criteria. Specifically, we explored time thresholds ranging from 2 to 12 h in increments of 2.
We generated the GEE model using the GEEQBOX toolkit 36 and the GLME model using Matlab's GLME function. Both were fitted on a randomly selected training set comprising two-thirds of the GEE/GLME cohort, and we tested the performance of the fitted models by predicting AKI Cr on the unseen test set (one-third of the GEE/GLME cohort). We plotted the receiver operating characteristic (ROC) curve for each of the six models (one model for each time window), and examined the model coefficients, odds ratios, 95% confidence intervals, and p-values for each model.
For each model, we calculated the area under the ROC curve (AUC), J-point specificity and sensitivity, J-point distance, and net reclassification index (NRI). For computing the NRI for the various models, we binarized the prediction of AKI for the validation set using the probability threshold of the J-point. For the time-varying features, (1) MAP, (2) fluid balance, and (3) use of vasopressors, we used the normalized start time of each window. For the MAP, we obtained the median value one to three hours prior. For the fluid balance, we obtained the difference between fluid input and output and normalized it by weight. For the vasopressors, we checked to see if any vasopressor was used prior to the start time of the window.
To obtain AKI Cr for each window, we labeled each creatinine measurement with 0 or 1 (0: no AKI, 1: has AKI) based on the AKI Cr definition. We also removed any UO window that overlaps with a serum creatinine measurement (because it is difficult to know which measurement it would belong to) and any window after the last measurement. We labeled each window based on the next nearest creatinine measurement.
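The window labeling rule above can be sketched as follows. The data layout (windows as start and end hours, creatinine draws as a sorted list of times with 0/1 labels) and the function name are hypothetical; the sketch treats a window as overlapping a measurement when the draw time falls within the window, which is one possible reading of the text.

```python
from bisect import bisect_left

def label_windows(windows, creatinine_times_h, creatinine_labels):
    """Assign each urine output window the label of the next creatinine
    measurement after it.

    windows: list of (start_h, end_h).
    creatinine_times_h: measurement times, sorted ascending.
    creatinine_labels: 0 (no AKI) or 1 (AKI) for each measurement.
    Windows containing a creatinine draw, and windows after the last
    draw, are dropped.
    """
    labeled = []
    for start, end in windows:
        # Drop windows that overlap a creatinine measurement.
        if any(start <= t <= end for t in creatinine_times_h):
            continue
        # Index of the first measurement taken after the window ends.
        i = bisect_left(creatinine_times_h, end)
        if i == len(creatinine_times_h):
            continue  # no later measurement: window is after the last draw
        labeled.append(((start, end), creatinine_labels[i]))
    return labeled

# Hypothetical patient: 4 h windows over 32 h, three creatinine draws.
windows = [(s, s + 4) for s in range(0, 32, 4)]
creatinine_times_h = [5.0, 14.0, 26.0]
creatinine_labels = [0, 0, 1]

labeled = label_windows(windows, creatinine_times_h, creatinine_labels)
```

In this toy example the windows containing the draws at hours 5, 14, and 26 are removed, the final window is removed because no later measurement exists, and the two windows between hours 16 and 24 inherit the positive label of the draw at hour 26.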
Net reclassification index. In order to measure the improvement in performance of the various refinements in time and volume thresholds and GEE/GLME models with respect to the standard urine output threshold of 0.5 ml/kg/h for a duration of at least 6 h, we computed their net reclassification improvement (NRI) 44,45 . NRI is the difference between the probability of correct reclassification and the probability of incorrect reclassification. It is also the difference between the sum of the sensitivity and specificity of the new model and the sum of the sensitivity and specificity of the old model.
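The two equivalent forms of the NRI stated above can be written out explicitly for a binary classifier, where "up" and "down" denote reclassification to AKI and to no AKI by the new criterion. The worked line uses the rounded MIMIC-III sensitivities and specificities reported in this paper for the 12 h, 0.6 ml/kg/h thresholds against the standard 6 h, 0.5 ml/kg/h thresholds; because the inputs are rounded, the result is only approximate.

```latex
\mathrm{NRI}
  = \bigl[ P(\text{up} \mid \text{AKI}) - P(\text{down} \mid \text{AKI}) \bigr]
  + \bigl[ P(\text{down} \mid \text{no AKI}) - P(\text{up} \mid \text{no AKI}) \bigr]
  = (Se_{\text{new}} + Sp_{\text{new}}) - (Se_{\text{old}} + Sp_{\text{old}})

% MIMIC-III, 12 h and 0.6 ml/kg/h versus 6 h and 0.5 ml/kg/h:
\mathrm{NRI} \approx (0.48 + 0.58) - (0.54 + 0.49) = 1.06 - 1.03 = 0.03
```

This agrees, up to rounding of the inputs, with the NRI of 0.027 reported for that threshold combination.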
Use of experimental animals, and human participants. This is a retrospective study using openly available datasets and does not deal with human participants or groups. Therefore, need for consent is not applicable. Only computational methods were used and no clinical or experimental methods were carried out. All methods were carried out in accordance with relevant guidelines and regulations.

Results
Characteristics of patients and population sizes for the Primary cohort, Analyses cohort, and cohort of best performing GEE/GLME model for the MIMIC-III and eICU databases are shown in Table 1. We note that the GEE/GLME cohort differs from the Primary cohort in all characteristics in both databases with the exception of cancer indicator, use of diuretics, height, and age in MIMIC-III; and age in eICU. This is to be expected as we only include patients with specific characteristics from the general and heterogeneous patient population.
We also note a significant difference between the MIMIC and eICU databases in the number of patients who have heart disease (MIMIC: 68%, eICU: 11.4%) and cancer (MIMIC: 17%, eICU: 2.3%). The heart disease and cancer categories for MIMIC and eICU include a similarly diverse set of diagnoses. Johnson et al. 38 had similar statistics for the percentage of patients with heart disease (71.4%) and Pollard et al. 39 mentioned that 11.15% and 4.7% of the patients in the eICU had heart disease and cancer respectively, similar to our findings. Supported by existing work, the differences in the percentages of patients with diseases between the MIMIC and eICU datasets suggest that the two sets of patients are significantly different.
Additionally, there was a noticeable drop in the percentage of patients who meet the creatinine-based definition of AKI in the eICU database between the Primary and Analyses cohorts (from 56.4% to 34.4%). This drop is due to a large intersection between the patients with abnormal kidney function at ICU admission and those who meet the definition of developing creatinine-based AKI. When the patients with prior abnormal kidney function were filtered out of the Primary cohort, a significant portion of the patients who had a further increase in creatinine during their ICU stay were also excluded, resulting in the sharp decrease.
The congruence between creatinine-based definition of AKI and mortality has a sensitivity of 0.61 and specificity of 0.48 for MIMIC-III; and sensitivity of 0.47 and specificity of 0.67 for eICU. The baseline symmetry between the standard AKI ( AKI UO ) definition of urine output less than 0.5 ml/kg/h for 6 h and the reference AKI ( AKI Cr ) definition based on creatinine levels has a sensitivity of 0.54 and specificity of 0.49, with a distance of 0.68 from 100% sensitivity and specificity for the MIMIC-III database; and a sensitivity of 0.56 and specificity of 0.38, with a distance of 0.76 from 100% sensitivity and specificity for the eICU database.
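The distances quoted above follow from the Euclidean distance between the operating point and the ideal point of 100% sensitivity and specificity. The worked lines below use the rounded sensitivities and specificities reported for the standard urine output definition, so small discrepancies in the last digit are to be expected.

```latex
d = \sqrt{(1 - Se)^2 + (1 - Sp)^2}

% eICU: Se = 0.56, Sp = 0.38
d = \sqrt{0.44^2 + 0.62^2} = \sqrt{0.1936 + 0.3844} = \sqrt{0.5780} \approx 0.76

% MIMIC-III: Se = 0.54, Sp = 0.49
d = \sqrt{0.46^2 + 0.51^2} = \sqrt{0.2116 + 0.2601} = \sqrt{0.4717} \approx 0.69
```

The eICU value matches the reported 0.76. The MIMIC-III value differs from the reported 0.68 only in the last digit, which is plausibly explained by rounding of the reported sensitivity and specificity.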
The results of refining AKI urine output and time thresholds are depicted in Fig. 2 and supplementary Table S1. For each of the two databases MIMIC-III and eICU, there are volume and time threshold combinations for the urine-based AKI definition that have better congruence with the creatinine-based AKI definition than the standard volume and time thresholds of 0.5 ml/kg/h and 6 h.
For the MIMIC-III database, ranking based on J-point distance results in the optimal time and volume thresholds of AKI UO as UO less than 0.6 ml/kg/h for 12 h. This combination has a sensitivity of 0.48, specificity of 0.58, J-point distance of 0.67, and NRI of 0.027. Ranking the threshold combinations based on NRI values results in the same optimal time and volume thresholds of AKI UO . For the eICU database, ranking based on J-point distance results in the optimal time and volume thresholds of AKI UO as UO less than 0.6 ml/kg/h for 10 h. This combination has a sensitivity of 0.48, specificity of 0.49, distance of 0.73 from 100% sensitivity and specificity, and NRI of 0.026. Ranking the threshold combinations based on NRI values results in the optimal time and volume thresholds of AKI UO as UO less than 1 ml/kg/h for 2 h. This combination has a sensitivity of 0.92, specificity of 0.074, distance of 0.93 from 100% sensitivity and specificity, and NRI of 0.046.
The mortality percentage of patients meeting the volume and duration thresholds of urine-based definition of AKI decreases as the normalized urine output threshold increases and increases as the time duration threshold increases as shown in Fig. 3.
The area under the ROC curve (AUC) for the GEE/GLME multivariable models augmented with physiological features for two partitions is plotted in Fig. 4. The GEE model performs better than the GLME model for both the MIMIC and eICU databases. However, we include the GLME model as it is more reflective of fixed and random effects, integrating out random effects.
For the best performing model according to AUC (4 h of data), the odds ratio, and 95% confidence intervals for significant features are tabulated in Table 2.
First creatinine measurement, LBM, prior vasopressor use, and fluid balance were found to exhibit a statistically significant association with AKI Cr in both MIMIC-III and eICU. Additionally, heart disease was a significant indicator in MIMIC-III in the GEE model, while diuretic use and MAP were significant features in eICU in the GEE model. Specifically, increased first creatinine measurement, positive fluid balance, and decreased LBM showed a positive association with AKI. In MIMIC-III, heart disease and vasopressor use showed a negative association with AKI. In eICU, use of diuretics and use of vasopressors showed a positive association, whereas mean arterial pressure showed a negative association.
Table 1. Study population characteristics. Representation of binary and continuous properties of primary, analyses, and GEE/GLME cohorts. Properties include LOS (length of stay), LBM (lean body mass), and AKI Cr (acute kidney injury based on creatinine). Binary properties are indicated with percentages of positive cases, and continuous properties are indicated with median and interquartile ranges.
A summary of performance across the various non-parametric and parametric models is tabulated in Table 3. For both databases, MIMIC-III and eICU, the J-point distance is reduced for the non-parametric model over the standard urine-based AKI definition. Additionally, the distance is substantially reduced for the parametric GEE and GLME models over the non-parametric model.
We also tested the MIMIC-trained model on eICU and vice versa using both GEE and GLME models. The significantly lower performance compared to models trained and tested on the same database leads to the conclusion that there are significant differences between the patient cohorts not captured in the databases.

Discussion
Over the past 3 decades, the incidence of AKI has increased over 20-fold, making it an important problem in critical care medicine. The purpose of this paper was to investigate the complex factors mediating the relationship between urine output and creatinine in AKI, and to develop a time varying multivariable model that identifies factors mediating the relationship based on augmentation of urine output with physiological features. For the diagnosis of AKI, serum creatinine remains the AKI reference in practice. Creatinine, however, reflects kidney function and not kidney damage. This is problematic because functional changes tend to occur only after the kidney has suffered significant damage 10 . Recent studies have shown that other biomarkers, which are not readily measured, have the potential to be better predictors of AKI 33,46 . Indeed, it has been reported that kidney damage may begin up to 48 hours before it is detected by changes in creatinine. This fact was the motivation for the development of urine output criteria of AKI in the first place 46 .
In the realm of urine output criteria, the congruence between urine output and creatinine-based AKI is greater in MIMIC-III than in eICU. This may be a result of a much larger portion of patients that meet the creatinine-based AKI definition in MIMIC-III (54% in MIMIC-III vs 34% in eICU). Additionally, the optimal time and volume threshold combinations, both according to J-point distance and according to NRI, had only slightly better agreement with the creatinine-based AKI definition than the standard urine-output based definition. We argue that the additional 4 or 6 hours of data required for this modified threshold does not merit the small improvement in classification performance.
In actuality, the relationship between urine output and creatinine is likely confounded by multiple factors. Fluctuations in urine output are also likely to be driven independently by variables completely unrelated to AKI. Overall, low urine output may translate into AKI in some patients but not in others, and potentially confounding clinical factors should be considered before urine output is used to make a diagnosis. Although it is known to be less accurate, there are known advantages to using the urine output criteria.
Ultimately AKI is a highly heterogeneous disease 29 and it may be naïve to assume that a single feature (be it urine or creatinine) will correctly predict the same ailment for all patients. As suggested by De Corte, one future path forward may be to condition the definition of AKI on the population in question 10 . Our work presented here is a step towards incorporating this heterogeneity through physiological features.
We saw a significant improvement in the predictive performance of feature-augmented time varying GEE and GLME models with a window of 6 h (time duration of standard urine output) compared to the standard urine output based AKI definition in terms of sensitivity, specificity, and J-point distance in both databases.
Additionally, all the feature-augmented time varying models consistently outperformed the original urine output based definition of AKI and any refinement of its time and volume thresholds according to all of the metrics used (sensitivity, specificity, J-point distance, and NRI). Importantly, there is no trade-off between any of these metrics, such as an increase in specificity at the expense of sensitivity. This suggests that having a time varying model augmented with static and dynamic features is necessary for significantly improved prediction of AKI.
Furthermore, our results provide insight into features other than urine output that might improve the prediction performance of AKI. In both MIMIC and eICU, first creatinine measurement, fluid balance, and LBM were significantly associated with creatinine-based AKI. First, a higher baseline creatinine was associated with future rise in creatinine. This is a noteworthy finding as we specifically excluded patients with "abnormal" baseline creatinine; thus even a "high normal" baseline creatinine is associated with AKI. Second, positive fluid balance was associated with future rise in creatinine. It is worth noting here that we did not directly investigate the type of fluid received by the patients, which has been reported as a potential driver of AKI by others in the literature 47 . Third, a greater LBM decreased the probability of developing creatinine-based AKI. This finding is substantiated by the work of Liu et al. 48 where they found that underweight patients had a greater chance of developing AKI in ICU as adequate nutritional intake is thought to reduce ICU length of stay and improve chances of recovery 49,50 .
In MIMIC-III, use of vasopressors and heart disease are associated with decreased risk of AKI. In MIMIC-III, more than 60% of the patients have heart disease. Of those patients, 39% were given vasopressors; only 20% of the patients without heart disease were given vasopressors. Vasopressors stabilize the abnormally low blood pressure and blood perfusion caused by heart disease and restore end-organ perfusion, leading to better outcomes.
In eICU, use of diuretics was associated with increased chance of developing AKI. This may be due to forced diuresis leading to volume overload 51 . Also, low average MAP within a time window was associated with a future rise in creatinine, as expected from decreased renal perfusion. Additionally, use of vasopressors was associated with the development of AKI, as also previously noted 52 . Reduction of blood flow to tissues for patients with increased fluid overload can cause harm 53 .
It is interesting to note that the direction of association of a given feature depends on the underlying population. Vasopressor use was negatively associated with AKI in MIMIC whereas it was positively associated in eICU, as MIMIC has significantly more patients with heart disease, diabetes, and cancer than eICU. This emphasises the importance of taking into account the patient population characteristics when making treatment decisions.
Even prior to the development of the RIFLE criteria 14 and the AKIN modification 19 , experts remarked "none of the definitions (of AKI) used to date take into account the modifying effects of age, gender, and race on creatinine generation" 54 . Even the most recent clinical practice guidelines state that the urine output criteria are not well validated, require further investigation, and that the effects of fluid balance and other factors should be considered. One recent study of 2171 patients performs such an adjustment based on fluid balance 55 , but our work here considers fluid balance in addition to multiple other factors suggested by prior investigators.
Our findings that UO alone is not a powerful indicator of AKI but UO along with other features such as blood pressure and use of vasopressors can be a sensitive indicator are supported by Prowle et al. 56  Both studies sought to determine if changes in UO could be a sensitive marker of AKI using creatinine-based definition as the gold standard. However, both studies used urine output and not fluid balance to detect AKI, which is necessary as increase in fluid intake while maintaining the same UO does raise concerns about kidney function. Additionally, they used summary statistics such as mean, median and interquartile range (IQR) for continuous variables and percentages and CI for categorical variables rather than utilizing higher resolution of variables. Our modeling at a window-level rather than a patient level allows use of the appropriate corresponding values. It allows accounting for the time difference between events such as use of vasopressors and change in fluid balance as the impact of drugs lessens over time.
The last decade's research on the topic of AKI has focused primarily on the discovery of more reliable biomarkers for laboratory diagnosis of AKI. Several biomarkers can give an indication before serum creatinine rises, but unfortunately they may perform no better than standard criteria in unselected populations, and have not been linked to improved outcomes 29,46 . Additionally, the biomarkers are not readily measured, making it impossible to perform large retrospective studies on them.
With the advent of digital health records, we have the opportunity to re-calibrate consensus definitions and clinical guidelines traditionally based on expert opinion, and/or data from relatively small sample populations. This allows us to test the robustness of physiologic concepts developed based on animal experiments or studies on healthy human volunteers in the setting of critical illness. When AKIN first created a definition of AKI, large databases that relate creatinine to hourly urine output, like the Multiparameter Intelligent Monitoring in Intensive Care III database (MIMIC-III) and Collaborative Research Database (eICU), were not as readily available. Using two independent large retrospective clinical archives with significantly different patient populations we have re-examined the agreement between the two components of this definition.
While our results are robust, this improved detection cannot replace the measurement of creatinine for the definition of AKI. In the future, other definitions, and even guidelines, based on expert opinion and existing data should be revisited in this manner, based on new repositories of patient data linked with clinical outcomes, and we believe that our work presented here can serve as a prototype for this approach.

Conclusion
In this paper, we refined the urine-based definition of AKI by optimizing urine volume and duration criteria, and also introduced a time varying detection model that incorporated physiological features that confound the relationship between hourly urine output measurements and creatinine. This was conducted using two independent data sets with different patient populations. In both data sets we consistently showed that a model which monitors repeated urine output measures in addition to other covariates (such as average MAP) has enhanced associations with future rise in creatinine, as compared to applying a fixed criterion of 0.5 ml/kg/hour of urine for 6 hours or any of its refinements. Thus, urine output and other patient characteristics could be continuously monitored in real time by a bedside algorithm. Once the multivariable definition of AKI is met in a given patient, critical steps (such as interventions to treat AKI, or adjusting the dose of medications cleared by the kidneys) could be undertaken.