Machine learning-based prediction of acute kidney injury after nephrectomy in patients with renal cell carcinoma

The precise prediction of acute kidney injury (AKI) after nephrectomy for renal cell carcinoma (RCC) is important because AKI is associated with subsequent kidney dysfunction and high mortality. Herein, we addressed whether machine learning (ML) algorithms could predict postoperative AKI risk better than conventional logistic regression (LR) models. A total of 4104 RCC patients who had undergone unilateral nephrectomy from January 2003 to December 2017 were reviewed. ML models, namely support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine (LightGBM), were developed, and their performance based on the area under the receiver operating characteristic curve, accuracy, and F1 score was compared with that of the LR-based scoring model. Postoperative AKI developed in 1167 patients (28.4%). All the ML models had higher performance index values than the LR-based scoring model. Among them, the LightGBM model had the highest area under the receiver operating characteristic curve, 0.810 (0.783–0.837). The decision curve analysis demonstrated a greater net benefit of the ML models than the LR-based scoring model over all the ranges of threshold probabilities. The application of ML algorithms improves the predictability of AKI after nephrectomy for RCC, and these models perform better than conventional LR-based models.

www.nature.com/scientificreports/ for irreversible kidney dysfunction [18][19][20] . Furthermore, there is increasing concern that the transition to chronic kidney disease after nephrectomy is associated with both all-cause 21,22 and cancer-specific mortality 23 .
Although previous studies have focused on postoperative kidney function after nephrectomy in the short or intermediate-to-long term 13,14,[16][17][18][19], few models for predicting postoperative AKI have been developed. Moreover, these studies included patients who underwent certain types of surgery (e.g., laparoscopic or robot-assisted laparoscopic) rather than all kinds of operations 15,20. Preparing for AKI beforehand may not be easy because several conditions in addition to operative settings have interactive and complex effects on the risk. The heterogeneous features of patients may also make precise prediction difficult. A previous logistic regression (LR) model (e.g., the simple postoperative AKI risk [SPARK] index) has suitable performance in predicting the risk of postoperative AKI in noncardiac surgery, but its performance has not been validated in urologic surgery 24. To overcome these limitations, we aimed to apply several machine learning models to predict AKI after nephrectomy for RCC and compared their performance with that of conventional LR models.

Methods
Patient and study design. A total of 4659 patients who were diagnosed with RCC and had undergone unilateral partial nephrectomy (PN) or radical nephrectomy (RN) between January 2003 and December 2017 were retrospectively reviewed. Patients were excluded if they met any of the following criteria: younger than 18 years (n = 11); metastatic RCCs (clinical T stage = 4; N stage > 0; and M stage > 0) (n = 331); previous history of nephrectomy (n = 3); kidney transplant recipients (n = 13); staged nephrectomy due to bilateral RCCs (n = 6); congenital single kidney before surgery (n = 4); presence of postoperative complications requiring re-operation (n = 3); and incomplete laboratory information (n = 184). Accordingly, 4104 patients were analyzed in the present study. The study was approved by the institutional review boards (IRBs) of Seoul National University Hospital (H-1904-005-1021) and Seoul National University Bundang Hospital (B-1905-538-404) and was conducted in accordance with the principles of the Declaration of Helsinki. The requirement to obtain informed consent from the patients was waived by both IRBs.
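The cohort arithmetic above can be checked with a short sketch. The counts are those reported in the text; the dictionary labels are illustrative only, and the sketch assumes that each excluded patient was counted under exactly one criterion.

```python
# Exclusion counts as reported in the text; the labels are illustrative,
# not variable names from the study. Assumes no patient is counted twice.
screened = 4659
exclusions = {
    "younger than 18 years": 11,
    "metastatic RCC": 331,
    "previous nephrectomy": 3,
    "kidney transplant recipient": 13,
    "staged nephrectomy for bilateral RCC": 6,
    "congenital single kidney": 4,
    "re-operation for postoperative complication": 3,
    "incomplete laboratory information": 184,
}

excluded = sum(exclusions.values())
analyzed = screened - excluded
print(f"excluded={excluded}, analyzed={analyzed}")  # excluded=555, analyzed=4104
```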
Study variables. Patient demographic, clinical, and laboratory data were recorded. Preoperative and intraoperative data (such as age, sex, body mass index, smoking status, hypertension, diabetes mellitus, histories of myocardial infarction, stroke, peripheral vascular disease, chronic hepatitis B and C, and other cancers, use of angiotensin-converting enzyme inhibitors and angiotensin receptor blockers, type of operation, total and ischemic time of operation, estimated blood loss, and intraoperative transfusion) and tumor-specific data (such as tumor size and clinical T stage) were extracted from electronic medical records. Blood laboratory data, such as preoperative serum creatinine, blood urea nitrogen, albumin, and hemoglobin, were obtained. For serum creatinine, postoperative values were also obtained. The estimated glomerular filtration rate (eGFR) was calculated using the Chronic Kidney Disease Epidemiology Collaboration equation 25. Proteinuria was defined as ≥ 1+ on a dipstick test.
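As an illustration of the eGFR calculation, the following is a minimal sketch that assumes the 2009 creatinine-based CKD-EPI equation is the one meant by reference 25. The function name is hypothetical, and the race coefficient of the 2009 equation is omitted here as a simplifying assumption, not as a statement of what the study did.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m^2) from serum creatinine in mg/dL.

    Assumption: 2009 CKD-EPI creatinine equation, race coefficient omitted.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scaling
    alpha = -0.329 if female else -0.411    # exponent applied below kappa
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    return egfr


# Example: a 60-year-old man with a serum creatinine of 1.0 mg/dL.
print(round(ckd_epi_2009(1.0, 60, female=False), 1))
```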
The primary outcome was postoperative AKI, defined as an increase in serum creatinine level of ≥ 0.3 mg/dL within 48 h or to ≥ 1.5 times baseline within 7 days after operation according to the Kidney Disease Improving Global Outcomes guideline 26. If the serum creatinine decreased to within the non-AKI range and was at least 0.3 mg/dL below the peak level, the cases were defined as recovered AKI 27.
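The creatinine-based outcome definition above can be sketched as a simple rule. The function name and input layout are hypothetical, and the sketch assumes that the preoperative value serves as the baseline and that creatinine is reported to two decimal places.

```python
from typing import Sequence


def has_postoperative_aki(
    baseline: float,
    values_within_48h: Sequence[float],
    values_within_7d: Sequence[float],
) -> bool:
    """Serum creatinine criteria (mg/dL) as described in the text.

    values_within_7d is assumed to include the values measured within 48 h.
    """
    # Absolute rise of >= 0.3 mg/dL within 48 h; rounding avoids
    # floating-point artifacts such as 1.2 - 0.9 == 0.29999999999999993.
    absolute_rise = any(round(v - baseline, 2) >= 0.3 for v in values_within_48h)
    # Relative rise to >= 1.5 times baseline within 7 days.
    relative_rise = any(v >= 1.5 * baseline for v in values_within_7d)
    return absolute_rise or relative_rise


print(has_postoperative_aki(0.9, [1.2], [1.2, 1.1]))  # True (absolute rise)
print(has_postoperative_aki(1.0, [1.1], [1.1, 1.2]))  # False
```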
Statistical analysis. All analyses were implemented using R software (version 3.6.3; R Foundation for Statistical Computing). Comparisons of baseline characteristics were performed with the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. The patients were randomly assigned to training (70%) and testing (30%) datasets. Using the training dataset, we developed machine learning models, including support vector machine (SVM), random forest, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), to predict the risk of AKI. As a reference model, we used multivariable LR analysis (herein termed the LR-scoring model). Variables with a P value of < 0.2 in the univariate model were entered into the multivariable model using stepwise selection. The logistic coefficients were converted to clinical scores by assigning points in proportion to each coefficient and rounding to the nearest integer. As another reference, we used the SPARK index, which had been validated in patients undergoing noncardiac operations 24.
SVM constructs a hyperplane in a high-dimensional space that can be used for classification. Random forest is an ensemble of decision trees created by using bootstrap samples of the training dataset and random selection in tree induction 28. For the random forest model, we used a grid search strategy to identify the best combination of hyperparameters with the caret package. XGBoost is an ensemble approach with a gradient descent-boosted decision tree algorithm 29. We selected a low learning rate (0.0001), interaction depth of 5, and a maximum of 3000 iterations. LightGBM is an improved framework based on the gradient descent-boosted decision tree algorithm and is more efficient than XGBoost, with faster training and lower memory usage 30. To minimize potential overfitting in the above machine learning models, we used tenfold cross-validation and out-of-bag estimation during development.
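The conversion of logistic coefficients into integer points can be illustrated as follows. This is only a sketch: the coefficients are made-up values, not those of the LR-scoring model, and scaling by the smallest absolute coefficient is an assumed convention, since the text does not state the reference used for the proportional assignment.

```python
import math

# Hypothetical coefficients for illustration; not the study's estimates.
coefficients = {
    "male sex": 0.40,
    "diabetes mellitus": 0.35,
    "radical nephrectomy": 1.60,
    "intraoperative transfusion": 0.75,
}


def to_points(coefs: dict) -> dict:
    """Scale each coefficient by the smallest absolute one and round half up."""
    reference = min(abs(b) for b in coefs.values())
    return {name: math.floor(b / reference + 0.5) for name, b in coefs.items()}


def total_score(points: dict, present_factors: list) -> int:
    """Sum the points of the risk factors present in one patient."""
    return sum(points[name] for name in present_factors)


points = to_points(coefficients)
print(points)
print(total_score(points, ["male sex", "radical nephrectomy"]))  # 1 + 5 = 6
```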
The model performance was assessed with the area under the receiver operating characteristic curve (AUROC), accuracy, and F1 score in the testing dataset. To calculate the performance of the SPARK index, we used the best threshold point of the curve. The DeLong test was used to compare AUROCs 31. The net benefit over a specified range of threshold probabilities in outcome was evaluated using decision curve analysis 32,33. The Hosmer-Lemeshow test was used to assess calibration. Two-sided P values less than 0.05 were considered significant.

Results
Model performance in predicting AKI. When stepwise selection was applied, several factors, such as male sex, diabetes mellitus, hypertension, RN, large tumor size, long operation time, intraoperative transfusion, and low eGFR, were selected as risk factors for AKI in the LR-scoring model (Table S1). The corresponding clinical scores in this LR model are presented in Fig. S1. We set up two LR-based models, the SPARK index and the LR-scoring model, as references for comparison with the machine learning models. Among the models developed, the LightGBM model had the highest AUROC value (0.810 [0.783-0.837]), whereas the SPARK index showed the lowest AUROC value (0.626 [0.607-0.644]) (Table 2). All the machine learning models had higher AUROC values than the SPARK index. The LightGBM model had a higher AUROC value than the LR-scoring model with marginal significance. Corresponding curves supported these results (Fig. 1). When other performance indices, such as accuracy and F1 score, were examined, the XGBoost model had the best performance, and the LR-based models, including the SPARK index and the LR-scoring model, had the poorest performance. In decision curve analysis (Fig. 2), the net benefit was greater for machine learning models than for the SPARK index over all the ranges of threshold probabilities. The LightGBM, XGBoost, and SVM models had the highest net benefits among the models. The LR-scoring model had a negative net benefit at threshold probabilities > 0.6. The LightGBM, XGBoost, random forest, and LR-scoring models were well calibrated (all P > 0.05), but the other models were not (all P < 0.05) (Fig. 3). Based on these results, the LightGBM model was chosen as the best model for predicting postoperative AKI.
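For readers who want to see how the three performance indices are obtained, the following is a minimal sketch on toy data. The labels and predicted probabilities are invented for illustration, the function names are hypothetical, and the 0.5 classification threshold is an assumption rather than the threshold used in the study. The AUROC is computed through its rank interpretation: the probability that a randomly chosen AKI case receives a higher score than a randomly chosen non-AKI case.

```python
def auroc(y_true, y_score):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Ties count as half a correctly ranked pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, y_score, threshold=0.5):
    """Accuracy and F1 score after dichotomizing scores at `threshold`."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, f1


# Toy data (1 = AKI, 0 = no AKI); not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(auroc(y_true, y_score))            # 0.9375
print(accuracy_and_f1(y_true, y_score))  # (0.75, 0.75)
```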

Variable ranking analysis.
To estimate the contribution of each variable to predicting the risk of AKI, variable ranking analysis was performed (Fig. 4). Relative values ranged from 0 to 1 and indicated the proportional contribution of each variable to predicting AKI. Accordingly, type of operation, sex, tumor size, operation time, and baseline eGFR were highly ranked as the top predictors.
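One way to obtain relative values between 0 and 1 is to divide each variable's raw importance by the total, as sketched below. This normalization is an assumption about how the proportional contributions were derived, and the raw importance values are invented for illustration; they are not the values shown in Fig. 4.

```python
# Hypothetical raw importance values (e.g., total gain); not study results.
raw_importance = {
    "type of operation": 40.0,
    "sex": 25.0,
    "tumor size": 20.0,
    "operation time": 10.0,
    "baseline eGFR": 5.0,
}


def relative_importance(raw: dict) -> list:
    """Return (variable, share of total importance) sorted from high to low."""
    total = sum(raw.values())
    shares = {name: value / total for name, value in raw.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


ranking = relative_importance(raw_importance)
for name, share in ranking:
    print(f"{name}: {share:.2f}")
```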

Discussion
Precise prediction of AKI in patients undergoing nephrectomy for RCC has become more important because surviving patients with AKI may subsequently develop chronic kidney disease and other adverse outcomes. The present study first applied machine learning algorithms to accomplish the precise prediction of postoperative AKI, and the performance and calibration of these models were better than those of the LR-based reference models. Based on ranking analysis, certain variables were noted to contribute more to the predictive performance of the models. These results indicate that the precise prediction of postoperative AKI is achievable by machine learning despite the complex and interactive relationships of several variables.
A meta-analysis of 71 studies suggested that machine learning algorithms did not improve discriminative power over traditional LR-based models in predicting various clinical outcomes such as diabetes mellitus, infection, heart failure, and cancer 34. Nevertheless, one study reported the superiority of machine learning models over the LR model in predicting AKI after minimally invasive (laparoscopic or robot-assisted laparoscopic) nephrectomy for RCC 15. The present study, which included all operation types, supports this result with better model performance. In particular, the performance of the LightGBM model may be sufficient to alert clinicians to the risk of postoperative AKI.
Decision curve analysis takes into account the weights of different misclassification types with a direct clinical interpretation of the net benefit (i.e., the trade-off between undertreatment and overtreatment in the model) 32,33 . It is useful to compare models where the default strategies predict all-or-none outcomes such as AKI. All the machine learning models had greater net benefit over the range of threshold probabilities than the SPARK index. The LR-scoring model had a negative value of net benefit in a high range of threshold probabilities. These results provide clues on how machine learning models will be applicable to clinical practice.
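The net benefit used in decision curve analysis can be written out explicitly. At a threshold probability p_t, the net benefit of a model is TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the numbers of true and false positives among n patients when everyone with a predicted risk ≥ p_t is flagged. The sketch below applies this standard formula to invented toy data; the function names are hypothetical and the numbers are not study results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Default strategy that flags every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (1 = AKI, 0 = no AKI); not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

for pt in (0.2, 0.5, 0.7):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

The treat-none strategy always has a net benefit of zero, so a model with a negative net benefit at a given threshold performs worse than flagging no one at that threshold.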
The ranking analysis showed that certain variables, such as nephrectomy type, patient characteristics (e.g., age and sex), and laboratory findings (e.g., eGFR and hemoglobin), contributed to the model performance. These results support the findings of previous large cohort studies focusing on postoperative AKI [14][15][16][17][18][19]. One or two variables alone may not be enough to accomplish a perfect prediction. Accordingly, if another model is developed in an independent population, it should include at least the top variables obtained from the ranking analysis.
Although the results were informative, some limitations should be discussed. The study design was retrospective, which may have introduced selection bias. The study identified the most important variables with respect to predicting AKI, but we could not quantify the degree of risk, such as the relative risk, which is a common limitation of machine learning algorithms. The study results may not be applicable to some specific populations such as patients with metastasis or kidney transplant recipients. Concerns could be raised regarding other issues such as the absence of external validation and the effects of unidentified factors.
The application of machine learning algorithms improves the predictability of AKI after nephrectomy for RCC, and these models performed better than conventional LR-based models. If machine learning-based prediction models are successfully applied in clinical practice, overall patient outcomes may improve through earlier management. Future studies will explore whether machine learning is also applicable to predicting other outcomes after nephrectomy and will validate the results in independent cohorts.