Early Recognition of Burn- and Trauma-Related Acute Kidney Injury: A Pilot Comparison of Machine Learning Techniques

Severely burned and non-burned trauma patients are at risk for acute kidney injury (AKI). The study objective was to assess the theoretical performance of artificial intelligence (AI)/machine learning (ML) algorithms to augment AKI recognition using the novel biomarker neutrophil gelatinase associated lipocalin (NGAL), combined with contemporary biomarkers such as N-terminal pro B-type natriuretic peptide (NT-proBNP), urine output (UOP), and plasma creatinine. Machine learning approaches including logistic regression (LR), k-nearest neighbor (k-NN), support vector machine (SVM), random forest (RF), and deep neural networks (DNN) were used in this study. The AI/ML algorithm predicted AKI a mean (SD) of 61.8 (32.5) hours earlier than the Kidney Disease: Improving Global Outcomes (KDIGO) criteria in burned and non-burned trauma patients. NGAL was analytically superior to traditional AKI biomarkers such as creatinine and UOP. With ML, the AKI predictive capability of NGAL was further enhanced when it was combined with NT-proBNP or creatinine. AI/ML could be employed with NGAL to accelerate detection of AKI in at-risk burned and non-burned trauma patients.


Methods
We developed, validated, and compared five ML algorithms for early recognition of AKI in a combined population of burn and non-burned trauma surgery patients, following the Cross Industry Standard Process for Data Mining (CRISP-DM) guidelines. The selected features were NGAL, creatinine, NT-proBNP, and UOP, chosen for their significance and relevance in clinical practice. The study focused on ML prediction within the first 24 hours because burn- and/or trauma-related shock is a common mechanism of AKI. The algorithms were first trained and validated on a retrospective burn AKI dataset. We then determined their generalizability in a second dataset containing a mix of burned and non-burned trauma surgery patients. The study was approved by the University of California, Davis Institutional Review Board (Study Cohort A: Protocol #214836; Study Cohort B: Protocol #1085450). All methods were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained for all subjects.
Retrospective burn study population (Cohort A). The retrospective quality database consisted of 50 adult (age ≥18 years) patients with ≥20% total body surface area (TBSA) burns at risk for AKI, as reported previously 14 . This database was derived from a hospital clinical laboratory project to validate a commercially available plasma neutrophil gelatinase associated lipocalin (NGAL) enzyme-linked immunosorbent assay (Bioporto, Inc, Denmark). NGAL testing was performed on residual plasma chemistry samples collected at the time of burn intensive care unit admission. Briefly, NGAL is a novel AKI biomarker that is released by neutrophils during inflammation and cleared by the kidneys 6,7 . During AKI, decreases in glomerular filtration rate (GFR) increase plasma concentrations of NGAL. Unique to NGAL, renal tubular cells also produce the biomarker during AKI, increasing both plasma and urine concentrations of NGAL.
In addition to NGAL, we included natriuretic peptide testing, given that AKI can lead to acute heart dysfunction manifesting as cardiorenal syndrome 6,7,18 . Specifically, N-terminal pro B-type natriuretic peptide (NT-proBNP) was also measured (Roche Diagnostics, Indianapolis, IN) in the same plasma samples. Paired with the NGAL and NT-proBNP results, we also recorded UOP, plasma creatinine results, and vital signs from the electronic medical record (EMR). Chart review was used to determine which patients experienced AKI during the first week of burn intensive care unit admission based on KDIGO criteria.
Prospective burn and trauma population (Cohort B). The second dataset consisted of 51 adult patients with ≥20% TBSA burns or non-burn trauma-related injuries requiring surgery. Inclusion of a non-burned trauma population served to determine the generalizability of each ML model. These patients were prospectively enrolled to obtain residual clinical plasma samples within the first 24 hours of admission for testing with the same NGAL and NT-proBNP assays to predict AKI. Neither NGAL nor NT-proBNP results were used for patient care. Again, chart review was performed to obtain paired UOP and plasma creatinine results, as well as patient history, vital signs (i.e., mean arterial pressure, central venous pressure), and demographic data. KDIGO criteria 17 were used to determine AKI status within the first week of stay.
ML algorithms. Five ML approaches were evaluated to differentiate AKI from non-AKI patients (Fig. 1).
Cohort A was used for initial training and testing, followed by Cohort B as a means to evaluate the overall generalizability of our best-performing ML algorithms. These ML approaches included: (a) logistic regression (LR), (b) k-nearest neighbor (k-NN), (c) random forest (RF), (d) support vector machine (SVM), and (e) our multi-layer perceptron (MLP) deep neural network (DNN) (Fig. 1). Scikit-learn version 0.20.2 was used to construct the models for all five algorithms. Briefly, LR is a traditional statistical technique generally used to identify predictors of a binary outcome (i.e., AKI vs. no AKI). k-NN is a non-parametric pattern recognition algorithm used for classification and regression 19 . A new sample is classified according to its k nearest neighbors in the training data, with proximity typically measured by Euclidean distance (d). In contrast, random forest, a form of ensemble learning, uses a multitude of constructed decision trees for classification and regression 20 . Next, SVM classifies data by defining the hyperplane that best separates two groups (i.e., AKI vs. non-AKI patients), namely the hyperplane with the largest possible margin (distance) between them 21 . Although the separating boundary is linear, SVM can also be applied to nonlinear data by using kernels to map the data features into a higher-dimensional space (e.g., three dimensions). For this study, our SVM model incorporated a radial basis function (RBF) kernel, which allows better classification and differentiation of the groups of interest (e.g., AKI versus no AKI). Lastly, DNN utilizes artificial neural networks with multiple hidden layers between the input and output layers.
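The k-NN and RBF-kernel ideas above can be illustrated with a small, dependency-free sketch. The function names (`euclidean`, `knn_predict`, `rbf_kernel`), the toy two-feature data, and the choices k = 3 and gamma = 0.5 are hypothetical and do not reflect the study's configuration. In scikit-learn, the corresponding estimators are `KNeighborsClassifier` and `SVC(kernel="rbf")`.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance d between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` the majority label among its k nearest training samples."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2), the similarity an RBF SVM uses."""
    return math.exp(-gamma * euclidean(a, b) ** 2)


# Hypothetical standardized features: (NGAL z-score, creatinine z-score); not study data.
train_X = [(-1.0, -0.8), (-0.6, -1.1), (-0.9, -0.2), (1.2, 0.9), (0.8, 1.4), (1.5, 0.3)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = AKI, 0 = no AKI

print(knn_predict(train_X, train_y, (1.0, 1.0), k=3))
print(rbf_kernel((0.0, 0.0), (1.0, 1.0)))
```

Because both methods depend on distances between feature vectors, they illustrate why the features must be put on a common scale before training.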
Ultimately, the multi-layer perceptrons (MLP) within the DNN identify the mathematical transformation that converts an input into an output. Our custom multi-layer neural network grid search in the scikit-learn library uses the "Adam" solver (a stochastic gradient-based optimizer) to generate the multi-layer neural networks. Varying the number of hidden layers, the alpha (penalty regularization) parameter, and the tol value (tolerance for the optimization), together with two activation functions, ReLU (the rectified linear unit function) and tanh (the hyperbolic tangent function), allowed us to build and find the best-performing multi-layer neural network for each category among thousands of uniquely constructed ML models 22,23 . Because these ML algorithms are sensitive to unscaled data, variables were standardized with a standard scaler method that transforms each feature to a mean of 0 and a standard deviation of 1 14 .
Cross validation studies. Cross validation was also performed for the LR, RF, k-NN, SVM, and DNN methods using the scikit-learn cross validation grid search tool. This technique, together with the grid search hyperparameter variations noted above, enabled us to build and compare unique models, yielding a total of 68,100 ML models.
www.nature.com/scientificreports
Statistical analysis. JMP software (SAS Institute, Cary, NC) was used for statistical analysis. Descriptive statistics were calculated for patient demographics. The Shapiro-Wilk test and histogram analysis were used to assess normality. Normally distributed continuous variables were summarized as means (standard deviation [SD]) and compared using the two-sample t-test, while categorical variables were compared using the chi-square test. Non-normally distributed continuous data were summarized as medians (interquartile range [IQR]) and compared using the Mann-Whitney U test, when appropriate.
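The Mann-Whitney U statistic and the ROC AUC used to compare biomarkers are closely linked: AUC = U / (n_AKI × n_non-AKI). This is the probability that a randomly chosen AKI patient has a higher marker value than a randomly chosen non-AKI patient. The study computed its statistics in JMP. The functions and marker values below are hypothetical and only illustrate this relationship, assuming that higher marker values indicate AKI.

```python
def mann_whitney_u(positives, negatives):
    """U for positives vs negatives: pairs where the positive value is higher, ties count 1/2."""
    u = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def roc_auc(positives, negatives):
    """ROC AUC as the normalized Mann-Whitney U statistic."""
    return mann_whitney_u(positives, negatives) / (len(positives) * len(negatives))


# Hypothetical marker values (arbitrary units) for AKI and non-AKI patients; not study data.
aki = [310, 250, 480, 190]
no_aki = [120, 95, 200, 150, 180]

print(mann_whitney_u(aki, no_aki), roc_auc(aki, no_aki))
```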
Multivariate logistic regression was used to determine predictors of AKI, with age and burn size as covariates. Repeated measures analysis of variance was used for time series data. A p-value < 0.05 was considered statistically significant. Receiver operating characteristic (ROC) analysis was also performed to compare AKI biomarker performance.
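The standard-scaling step and the hyperparameter grid search described under ML algorithms and Cross validation studies can be sketched with the Python standard library. The feature values and grid below are hypothetical. The key names mirror scikit-learn's `MLPClassifier` parameters (`hidden_layer_sizes`, `activation`, `alpha`, `tol`), but the study's actual value lists are not given here, so this grid's size does not reproduce the 68,100 models reported.

```python
import itertools
import statistics


def standardize(column):
    """Z-score a feature column to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)  # population SD, as scikit-learn's StandardScaler uses
    return [(x - mean) / sd for x in column]


# Hypothetical NGAL values (ng/mL); not study data.
ngal = [150.0, 220.0, 480.0, 90.0, 310.0]
z = standardize(ngal)

# Hypothetical MLP hyperparameter grid (the solver would be fixed to "adam").
grid = {
    "hidden_layer_sizes": [(8,), (16,), (8, 8), (16, 16)],
    "activation": ["relu", "tanh"],
    "alpha": [1e-4, 1e-3, 1e-2],
    "tol": [1e-4, 1e-3],
}
# Every combination of values is one candidate model configuration.
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

print([round(v, 3) for v in z])
print(len(combos))
```

In scikit-learn, one would typically place `StandardScaler` and `MLPClassifier(solver="adam")` in a `Pipeline` and pass such a grid, with step-prefixed keys, to `GridSearchCV`. This refits the scaler within each cross-validation fold. Each combination is fit once per fold, so the number of fitted models grows multiplicatively with the grid size and the number of folds.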

Results
Patient demographics and biomarker comparisons between study cohorts (A vs. B, AKI vs. non-AKI, and burned vs. non-burned groups) are shown in Table 1. Briefly, 50% of patients (25/50) in Cohort A experienced AKI within the first week of hospital stay, as reported previously. Five patients experienced fluid overload manifesting as compartment syndrome. Again, Cohort A served as the dataset for our training phase and initial validation testing. In contrast, 21.6% (11/51) of patients in Cohort B experienced AKI within the same timeframe. Eight patients experienced over-resuscitation presenting with compartment syndrome (n = 2), pulmonary edema (n = 2), or both compartment syndrome and pulmonary edema (n = 4). Because it shared some population characteristics with Cohort A while differing in others, Cohort B was used as our secondary testing dataset to assess the generalizability of the models generated from Cohort A. Receiver operating characteristic analysis showed NGAL to be the best AKI biomarker (area under the curve [AUC]: 0.93, P = 0.023), followed by NT-proBNP (0.85), plasma creatinine (0.68), and UOP (0.57). The area under the ROC curve for each biomarker was significantly (P = 0.038) larger among non-burned patients than among burned patients. Table 2 summarizes the mean accuracy of the AI/ML models during the initial validation phase using Cohort A. For the generalization phase (Cohort B), Fig. 2 illustrates the mean accuracy for each biomarker combination using each AI/ML technique. Models using only NGAL and NT-proBNP achieved the highest accuracy (92%) and AUC (0.92) with either DNN or LR. The model using only NGAL and creatinine achieved a generalization accuracy of 90% and an AUC of 0.91 with LR. Excluding NGAL while retaining the other biomarkers markedly reduced predictive performance in all five ML platforms, DNN, LR, k-NN, SVM, and RF (generalization accuracy of 55%, 49%, 55%, 41%, and 22% and AUC of 0.71, 0.68, 0.68, 0.63, and 0.50, respectively).
Notably, in the absence of NGAL, the highest generalization accuracy and AUC were achieved by our RF model using only creatinine and UOP (71% and 0.75, respectively) and by our DNN model using the combination of creatinine, UOP, and NT-proBNP (55% and 0.71, respectively). Figure 3 compares the average area under the ROC curve of the model with the best average accuracy within each ML method for various biomarker combinations including NGAL and/or NT-proBNP. In contrast, Fig. 4 shows ROC curves for each ML method using traditional AKI biomarkers and excluding NGAL and NT-proBNP.

Discussion
This study evaluates the generalizability of a burn population-derived ML algorithm for predicting AKI in a mixed burn and non-burned trauma population. Overall, ML offers distinct advantages in the context of AKI. These include the potential to be highly automated through electronic medical record systems and, as observed in previous studies and the current one, the ability to classify subtle changes early to predict AKI [24][25][26][27] . Kate et al. used LR, SVM, decision trees, and naïve Bayes to detect undiagnosed AKI in a large population of hospitalized elderly (age >60 years) patients 25 . The study reported areas under the ROC curve ranging from 0.66 to 0.74. More recent studies compared ML with physician prediction of AKI based on KDIGO criteria, reporting areas under the ROC curve of 0.75 and 0.80, respectively, for data available at ICU admission 27 . Optimal performance was achieved with data collected after 24 hours, with areas under the ROC curve of 0.89 and 0.95, respectively. The study also suggested that ML outperformed NGAL, but it did not include NGAL in the model despite its reported benefit 28 . Our study is unique in that it evaluates ML in high-risk burn patients and incorporates NGAL into the predictive model rather than comparing against it. Moreover, five ML methods with unique hyperparameter combinations were used to determine which model provides optimal accuracy across the burn-trauma population and generalizes to a mixed burn and non-burned population of varying disease severity.
As predicted, NGAL was predictive of AKI in both burn and trauma surgery populations, even without ML. NGAL remains highly relevant to this work because it is presently used in Europe and is expected to become available for clinical use in the United States in the near future 14 . Higher baseline NGAL levels in our burn patients may be due to their systemic inflammatory response to injury. Including natriuretic peptide testing (i.e., NT-proBNP) with NGAL and other biomarkers aided the evaluation of AKI by leveraging the cardiorenal axis 6,7,14 . Notably, NT-proBNP values were higher in both AKI and non-AKI burn patients in Cohort B. Previous studies have shown natriuretic peptides to be useful for predicting over-resuscitation. In our study, mean NT-proBNP values were higher in Cohort B burned patients because of their more severe complications associated with fluid overload. In contrast to NGAL, UOP has previously been shown to perform poorly as an AKI marker, especially in burn critical care 6,7,27,29 . The same holds true for plasma creatinine, which exhibits high biological variability and suboptimal inter-assay precision 30,31 .
Figure 2. Bar graphs illustrate the accuracy of each of the five AI/ML techniques with differing combinations of NGAL, UOP, plasma creatinine, and NT-proBNP. Data are based on Cohort B (n = 51) severely burned or non-burned trauma patients. Notably, the best-performing models using NGAL alone achieved an accuracy of 92%, a sensitivity of 73%, and an AUC of 0.85 in four of the five algorithms. The best-performing models using NGAL in combination with NT-proBNP (LR and DNN) achieved an accuracy of 92%, a sensitivity of 91%, and an AUC of 0.92. Standard deviations are shown as error bars.
In our study, although median creatinine and UOP were significantly different, they were clinically similar based on established acceptable values. The creatinine reference interval at our institution ranges from 0.60 to 1.30 mg/dL, while the urine output target is ≥0.5 mL/kg/h 17 , which corresponds to >30 mL/h in most patients.
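The KDIGO definition used to label AKI, and the urine-output arithmetic above (0.5 mL/kg/h × 60 kg = 30 mL/h), can be expressed as a simplified check. This is a sketch under stated assumptions, not the chart-review procedure used in the study. `meets_kdigo_aki` is a hypothetical helper that ignores KDIGO staging. It treats the urine criterion as every hour in a 6-hour window falling below 0.5 mL/kg/h, and it rounds creatinine comparisons to two decimals to avoid floating-point artifacts.

```python
def uop_threshold_ml_per_h(weight_kg, rate_ml_per_kg_h=0.5):
    """Hourly urine volume corresponding to the 0.5 mL/kg/h KDIGO cut-off."""
    return weight_kg * rate_ml_per_kg_h


def meets_kdigo_aki(baseline_scr, scr_48h_max, scr_7d_max, hourly_uop_ml, weight_kg):
    """Simplified KDIGO AKI check (hypothetical helper; no staging).

    AKI if any of:
    - serum creatinine rise >= 0.3 mg/dL within 48 h,
    - serum creatinine >= 1.5 x baseline within 7 days,
    - urine output < 0.5 mL/kg/h for 6 consecutive hours.
    """
    # Round to 2 decimals (creatinine precision) so 1.2 - 0.9 counts as 0.3.
    creatinine_rise = round(scr_48h_max - baseline_scr, 2) >= 0.3
    creatinine_ratio = round(scr_7d_max / baseline_scr, 2) >= 1.5
    threshold = uop_threshold_ml_per_h(weight_kg)
    # Any 6-hour window in which every hourly volume is below the threshold.
    oliguria = any(
        all(v < threshold for v in hourly_uop_ml[i:i + 6])
        for i in range(len(hourly_uop_ml) - 5)
    )
    return creatinine_rise or creatinine_ratio or oliguria


print(uop_threshold_ml_per_h(60))  # 0.5 mL/kg/h for a 60 kg patient
print(meets_kdigo_aki(0.9, 1.2, 1.2, [50] * 12, 70))
```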
Our study highlights the potential of ML to enhance the performance of AKI biomarkers in a high-risk population and emphasizes the importance of conducting generalization studies across different models. Specifically, our data showed that ML enhanced the predictive capability and clinical sensitivity of NGAL when it was used in combination with other known biomarkers (e.g., NT-proBNP or creatinine). Not surprisingly, the generalization performance of NGAL alone was high, with 92% generalization accuracy, 73% sensitivity, 97% specificity, and 85% AUC in four of our five ML platforms (DNN, LR, SVM, and RF). However, our DNN and LR models provided the best generalization accuracy, sensitivity, specificity, and AUC using NGAL with NT-proBNP, achieving an accuracy of 92%, sensitivity of 91%, specificity of 93%, and AUC of 92%. Similar performance was noted using NGAL with creatinine in our LR model, which provided an accuracy of 90%, sensitivity of 91%, specificity of 90%, and AUC of 91%. k-NN using the same biomarker combination (NGAL and creatinine) achieved slightly lower performance (84% accuracy, 91% sensitivity, 82% specificity, and 87% AUC). Differences in ML model performance between Cohorts A and B must also be noted. For any ML model, there is a fine balance between over- and under-fitting the data, and either extreme increases the error rate and bias 32 . In particular, DNN outperformed all other models in Cohort A (Table 2) but did not achieve the same advantage when tested in Cohort B (Fig. 2). This observation suggests that over-fitting may have played a role. Equally important, Cohort B contained non-burned trauma patients who differed substantially from the burn population, which could also reduce the overall performance of DNN. Ultimately, this highlights the importance of evaluating ML model performance with secondary datasets to assess the degree of fitting and overall generalizability.
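The accuracy, sensitivity, and specificity values reported above all derive from a confusion matrix, as the minimal sketch below shows. The counts are illustrative only. With Cohort B's 11 AKI and 40 non-AKI patients, a hypothetical 10 true positives and 37 true negatives give values close to those reported for the NGAL plus NT-proBNP models (accuracy about 92%, sensitivity about 91%, specificity 92.5%). The study does not report its confusion matrices, so these counts should not be read as its results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts (AKI = positive)."""
    sensitivity = tp / (tp + fn)  # fraction of AKI patients correctly flagged
    specificity = tn / (tn + fp)  # fraction of non-AKI patients correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts for 11 AKI and 40 non-AKI patients; not reported by the study.
acc, sens, spec = classification_metrics(tp=10, fn=1, tn=37, fp=3)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity and specificity each use only one class as the denominator. This is why accuracy in a cohort with 21.6% AKI prevalence is dominated by the non-AKI group.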
In summary, both of our most generalizable ML models (DNN and LR) required the inclusion of NGAL, which is expected to become available in the United States. The best-performing model without NGAL showed lower performance measures than models that included NGAL as a parameter. Specifically, this was an RF model relying on a combination of creatinine and UOP only, with an accuracy of 71%, sensitivity of 82%, specificity of 68%, and AUC of 75%. Thus, NGAL may be a transformative biomarker for AKI prediction. Recent ML studies have not included NGAL or similar biomarkers [24][25][26][27] . Interestingly, our ML models performed better than those in these studies, which is likely explained by the inclusion of NGAL in combination with the algorithms tested (including our DNN model).
Although k-NN was not the most generalizable model in this study, the performance of the k-NN model in Cohort B was similar to that reported in a previous burn-focused study based on Cohort A 14 . Interestingly, NT-proBNP alone or in combination with only creatinine and/or UOP reduced accuracy in our DNN, LR, k-NN, SVM, and RF models. However, adding NT-proBNP to NGAL led to our most generalizable models (DNN and LR), which suggests that both markers should be included to maintain optimal predictive performance.
In addition to the above performance enhancements, our ML algorithms predicted AKI an average of 61.8 (32.5) hours (approximately 2.6 days) before patients met KDIGO criteria. This finding suggests that AI/ML could also be considered for use in pre-hospital settings (e.g., ambulances, combat casualty evacuations) to augment point-of-care testing, especially once NGAL becomes available in the United States (Fig. 5) 33 .
Figure 5. Point-of-care (POC) testing with AI/ML could be used to enhance diagnostic power in pre-hospital settings. The figure illustrates a conceptual diagram in which POC creatinine and NT-proBNP testing is performed at a pre-hospital time point (t₋ₙ) and augmented by AI/ML (green pathways). Point-of-care testing data may then be transmitted to an AI/ML algorithm to predict AKI before hospital admission. Alternatively, AI/ML may be employed as early as the first day of admission, denoted as t₁. In contrast, traditional workflows (red pathways) relying on urine output and creatinine delay recognition of AKI.
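The lead time reported above is a per-patient difference between when the model flagged AKI and when KDIGO criteria were met, summarized as mean (SD). A minimal sketch with hypothetical timestamps (not study data) shows the calculation, assuming the SD is a sample SD. The last step converts the reported 61.8-hour mean to days (61.8 / 24 ≈ 2.6).

```python
import statistics


def lead_times_hours(prediction_times_h, kdigo_times_h):
    """Per-patient lead time: hours from the model's AKI call to KDIGO criteria being met."""
    return [k - p for p, k in zip(prediction_times_h, kdigo_times_h)]


# Hypothetical timestamps (hours since admission) for four AKI patients; not study data.
predicted_at = [6, 12, 18, 24]
kdigo_met_at = [60, 110, 48, 96]

lt = lead_times_hours(predicted_at, kdigo_met_at)
mean_h = statistics.mean(lt)
sd_h = statistics.stdev(lt)  # sample SD; an assumption about how the SD was computed
print(f"lead time {mean_h:.1f} ({sd_h:.1f}) hours = {mean_h / 24:.1f} days")

# Converting the reported mean lead time to days.
reported_days = 61.8 / 24
print(f"reported: {reported_days:.2f} days")
```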