A validation study comparing existing prediction models of acute kidney injury in patients with acute heart failure

Acute kidney injury (AKI) is a common complication in acute heart failure (AHF) and is associated with prolonged hospitalization and increased mortality. The aim of this study was to externally validate existing prediction models of AKI in patients with AHF. Data for 10,364 patients hospitalized for acute heart failure between 2008 and 2018 were extracted from the Chang Gung Research Database and analysed. The primary outcome of interest was AKI, defined according to the KDIGO definition. The area under the receiver operating characteristic curve (AUC) was used to assess the discrimination performance of each prediction model. Five existing prediction models were externally validated, and the Forman risk score and the prediction model reported by Wang et al. showed the most favourable discrimination and calibration performance. The Forman risk score had AUCs for discriminating AKI, AKI stage 3, and dialysis within 7 days of 0.696, 0.829, and 0.817, respectively. The Wang et al. model had AUCs for discriminating AKI, AKI stage 3, and dialysis within 7 days of 0.73, 0.858, and 0.845, respectively. The Forman risk score and the Wang et al. prediction model are simple and accurate tools for predicting AKI in patients with AHF.

Study population. We analysed the records of patients who had emergency department visits and were subsequently admitted due to acute heart failure (ICD-9-CM diagnostic code: 428; ICD-10-CM diagnostic code: I50) between January 1, 2008, and December 31, 2018, at all 7 Chang Gung Memorial Hospital branches located in Linkou, Taipei, Taoyuan, Keelung, Yunlin, Chiayi, and Kaohsiung, which span northern to southern Taiwan. When a patient had multiple AHF episodes, the first episode of AHF hospitalization between 2008 and 2018 was selected as the index hospitalization. The admission date of the index AHF hospitalization was used as the index date. The first record of laboratory examination results was used as the baseline laboratory data; the first records of vital sign data and medication treatment within 48 h after admission were also collected for further analysis.
Patients without sufficient data for AKI assessment, such as those without baseline creatinine data or a second creatinine examination within 7 days of admission, were not included. The remaining patients were excluded if they met any of the following criteria: (1) were younger than 18 years old, (2) had end-stage renal disease or were undergoing maintenance dialysis, (3) had follow-up of less than 24 h, (4) received extracorporeal membrane oxygenation during the index admission, (5) were anticipated to undergo cardiac transplantation, (6) received a nephrotoxic agent (including contrast agents, nonsteroidal anti-inflammatory drugs, aminoglycoside, and vancomycin) within 4 weeks of admission or during the index admission, (7) had obstructive nephropathy, or (8) had acute coronary syndrome with inotropic agents used during the index admission. After exclusion, 10,364 patients remained eligible for the study, of whom 1483 (14.3%) had AKI (Fig. 1).

Existing prediction models and covariates.
A review of previous studies found five prediction models or risk factor studies for AKI prediction in patients with AHF. All five were externally validated in this study.
The prediction models were as follows: the Forman risk score, reported in 2004 and based on four prediction factors, including underlying diseases as well as clinical and laboratory parameters 13; the Basel risk score, reported in 2011 and using chronic kidney disease, bicarbonate level, and outpatient diuretic treatment as indices 1; the prediction model reported by Wang et al. in 2013, based on a data analysis of 1709 patients and using 8 prediction factors 15; the prediction model reported by Zhou et al. in 2016, which combines clinical parameters and novel urine biomarkers for AKI prediction in AHF patients 16, although we were unable to include the NT-proBNP and urine biomarkers used by Zhou et al. because the CGRD did not include these data; and the study by Verdiani et al. in 2010 investigating the prediction of AKI in hospitalized AHF patients 14.
The study populations, publication years, heart failure criteria, and full lists of predictors of the prediction models are summarized in Table 1.
Outcome definition. The primary outcome was the development of AKI within 7 days after admission. The first record of the creatinine level during emergency department admission was used as the baseline creatinine level, and AKI was defined as an increase in serum creatinine by 0.3 mg/dl [...] according to the KDIGO Clinical Practice Guidelines for Acute Kidney Injury 20. This study also validated the performance of existing prediction models in predicting serious AKI events, including AKI stage 3 and dialysis. Stage 3 AKI was defined as a ≥ 200% increase in serum creatinine, a serum creatinine concentration of ≥ 4 mg/dl, or the initiation of dialysis within 7 days of study enrolment, according to the KDIGO guidelines. Urine output was not used to define AKI because these data were not complete in the CGRD.
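The creatinine-based staging described above can be illustrated with a minimal sketch. The stage 3 thresholds follow the text; the stage 1 and stage 2 ratio thresholds (1.5 and 2.0 times baseline) are taken from the published KDIGO guideline rather than from this section, and the function name `kdigo_stage` is hypothetical. The sketch ignores the guideline's time windows and urine output criteria, so it is an approximation and not the procedure used in the study.

```python
def kdigo_stage(baseline_cr, peak_cr, dialysis_started=False):
    """Return 0 (no AKI) or a KDIGO stage 1-3 from serum creatinine in mg/dl.

    Simplifying assumptions: time windows (48 h / 7 days) are ignored and
    urine output is not used.
    """
    # Round the absolute rise to avoid floating-point noise around 0.3 mg/dl.
    rise = round(peak_cr - baseline_cr, 2)
    ratio = peak_cr / baseline_cr

    has_aki = dialysis_started or rise >= 0.3 or ratio >= 1.5
    if not has_aki:
        return 0
    # Stage 3: >= 200% increase (3x baseline), creatinine >= 4 mg/dl, or dialysis.
    if dialysis_started or ratio >= 3.0 or peak_cr >= 4.0:
        return 3
    # Stage 2: 2.0-2.9 times baseline (KDIGO guideline threshold).
    if ratio >= 2.0:
        return 2
    return 1


if __name__ == "__main__":
    for baseline, peak in [(1.0, 1.2), (1.0, 1.3), (1.0, 2.5), (3.5, 4.0)]:
        print(baseline, peak, kdigo_stage(baseline, peak))
```

Under these assumptions, a patient whose creatinine rises from 3.5 to 4.0 mg/dl is classified as stage 3 even though the relative change is small, which is the behaviour the limitations section alludes to for patients with high baseline creatinine.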
To identify the clinical end point of acute kidney injury, we also evaluated the development of major adverse kidney events (MAKEs) on or after the 8th day following the index date. MAKEs were defined as the composite of chronic kidney disease (an estimated glomerular filtration rate [eGFR] decline of > 25% from baseline), end-stage renal disease (ESRD) requiring chronic renal replacement therapy, and all-cause mortality 23,24. We assessed MAKEs within 1 year of AKI diagnosis and from the index date to the final visit date, the date of death, the date of event occurrence, or December 31, 2018, whichever came first. Only patients with a follow-up duration of > 7 days were included in the MAKE analysis.
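As a small illustration of the composite end point, the MAKE definition can be written as a single predicate. The function and argument names are hypothetical, and the sketch leaves out the timing rules (events on or after the 8th day, censoring dates) described above.

```python
def is_make(baseline_egfr, followup_egfr, esrd_on_rrt, died):
    """Return True if any component of the MAKE composite is met.

    Components, as defined in the text: eGFR decline of more than 25% from
    baseline, ESRD requiring chronic renal replacement therapy, or death.
    """
    egfr_decline = (baseline_egfr - followup_egfr) / baseline_egfr
    return egfr_decline > 0.25 or esrd_on_rrt or died


if __name__ == "__main__":
    # eGFR falling from 60 to 44 is a decline of about 26.7%, so this is a MAKE.
    print(is_make(60, 44, esrd_on_rrt=False, died=False))
```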
Statistical analysis. Due to the presence of a substantial amount of missing data, we imputed the data using the single expectation maximization (EM) method for the primary analysis in this study. To test the robustness of the results, only patients with complete data were retained and used in the sensitivity analysis. It was noted that the missing rate of bicarbonate data was particularly high (59%).
The characteristics of the patients in the AKI and non-AKI groups were compared using the Mann-Whitney U-test for continuous variables (due to the lack of normality) and the chi-square test for categorical variables. The discrimination ability of individual scores in predicting an outcome of interest (i.e., AKI, AKI stage 3, or dialysis) in patients with AHF was determined using the area under the receiver operating characteristic curve (AUC). Optimal cut-off points were determined using the Youden index, and the corresponding sensitivity and specificity were calculated. The AUCs among the existing prediction models were compared in a pairwise manner using the DeLong test. In addition, the calibration performance of each score was assessed using the Hosmer-Lemeshow (HL) goodness-of-fit test, with smaller statistics (chi-square) indicating a smaller discrepancy between the predicted probability and observed AKI event for the prediction models. The patients were divided into two subgroups according to the optimal cut-off of each score. The risk of MAKEs on or after the 8th day following the index date was compared between the higher and lower cut-offs of each score using the Cox proportional hazards model.
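The discrimination and cut-off steps can be sketched in a few lines. The example below is illustrative only: the analyses in this study were run in SPSS, the function names are hypothetical, and the toy scores are invented for demonstration. The AUC is computed as the probability that a randomly chosen patient with the outcome has a higher score than one without it (ties count one half), and the Youden index is J = sensitivity + specificity − 1, maximized over candidate cut-offs.

```python
def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) maximizing the Youden index.

    A patient is classified as high risk when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best


if __name__ == "__main__":
    # Invented toy data: integer risk scores and AKI outcome (1 = AKI).
    toy_scores = [0, 1, 1, 2, 3, 3, 4, 5]
    toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
    print(auc(toy_scores, toy_labels))
    print(youden_cutoff(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of patients; for a cohort of this size a rank-based implementation would be preferable, but the pairwise form keeps the definition visible.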
All tests were 2-tailed, and P < 0.05 was considered statistically significant. Data analyses were conducted using SPSS 25 (IBM SPSS Inc., Chicago, Illinois).
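For readers unfamiliar with the Hosmer-Lemeshow statistic used for calibration, the following is a minimal sketch of the chi-square computation. It assumes each score has already been converted to a predicted probability of AKI, groups patients into equal-sized bins by predicted risk, and omits the p-value. The function name and toy data are hypothetical, and SPSS may group patients slightly differently.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    For each group: (observed - expected)^2 / (n * mean_p * (1 - mean_p)).
    Smaller values indicate closer agreement between predicted and observed.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / n_g
        denom = n_g * mean_p * (1 - mean_p)
        if denom > 0:  # skip degenerate groups with predicted risk 0 or 1
            stat += (observed - expected) ** 2 / denom
    return stat


if __name__ == "__main__":
    # Invented toy data: ten patients at 10% risk, ten at 50% risk.
    probs = [0.1] * 10 + [0.5] * 10
    well_calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5
    print(hosmer_lemeshow(probs, well_calibrated, groups=2))
```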

Ethics approval and consent to participate. The Institutional Review Board of Chang Gung Memorial Hospital approved the study (approval number: CGMHIRB No. 202000915B0) and waived the need for informed consent due to the retrospective nature of the study, which did not compromise the privacy of any patients.

Results
Baseline characteristics. The patients' characteristics at baseline are presented in Table 2. A total of 10,364 patients were included in the analysis, of whom 1483 (14.3%) developed AKI. The median age and sex distribution were similar in the AKI and non-AKI groups. Of the total patient population, 42.2% had been diagnosed with congestive heart failure, 36.8% with diabetes mellitus, 46.0% with chronic kidney disease, and 56.9% with hypertension. The AKI group showed a significantly higher prevalence of diabetes mellitus (43.2%), chronic kidney disease (56.1%), and hypertension (61.9%). A total of 1581 patients in the total population exhibited severe heart failure symptoms and were categorized as New York Heart Association (NYHA) functional class IV. The AKI group also had a significantly higher percentage of patients categorized as NYHA functional class IV (22.3%) than the non-AKI group (14.1%).
Regarding the clinical parameters, the AKI group patients had significantly higher systemic blood pressure upon admission. The AKI group also exhibited significantly lower haemoglobin, lymphocyte percentage, serum albumin, and bicarbonate levels as well as higher creatinine, blood urea nitrogen, potassium, lactic acid, and BNP levels. A higher percentage of AKI group patients showed positive proteinuria results via dipstick tests. The AKI group received higher dosages of loop diuretics during their AHF admission period; however, there was no significant difference between the groups in outpatient diuretic treatment strategy. The AKI group was more likely to receive calcium channel blockers but less likely to use digoxin during hospitalization (Table 2).
Validation of existing prediction models for AKI. The performance of predicting AKI events in patients with AHF was externally validated for each existing prediction model, as summarized in Table 3. The AUC discrimination ability was highest for the Wang et al. model [...] (Table 3). The pairwise comparison results for the AUCs showed that all of the AUCs differed significantly between any two prediction models, except for the Basel risk score and Verdiani et al. model (Table 4).
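The pairwise AUC comparisons rely on the DeLong test for two scores measured on the same patients. A compact sketch of that test is given below, under the assumption that higher scores indicate higher risk. Function names and the toy data are hypothetical, and this is not the SPSS routine used in the study. The variance of the AUC difference is estimated from the per-patient "placement" values of each score.

```python
import math
from statistics import mean, variance


def _psi(x, y):
    """Pairwise kernel: 1 if the event score is higher, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """Placement values for each event (v10) and each non-event (v01)."""
    v10 = [mean(_psi(p, n) for n in neg) for p in pos]
    v01 = [mean(_psi(p, n) for p in pos) for n in neg]
    return v10, v01


def delong_test(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, z, two_sided_p) for two correlated AUCs."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    v10_a, v01_a = _components([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _components([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])
    auc_a, auc_b = mean(v10_a), mean(v10_b)
    # Var(AUC_a - AUC_b) from the paired differences of placement values.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = variance(d10) / len(d10) + variance(d01) / len(d01)
    if var == 0:
        raise ValueError("zero variance: the test is undefined for these data")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


if __name__ == "__main__":
    # Invented toy data: three patients with the outcome, three without.
    labels = [1, 1, 1, 0, 0, 0]
    score_a = [3, 4, 5, 0, 1, 2]
    score_b = [1, 4, 5, 0, 2, 3]
    print(delong_test(score_a, score_b, labels))
```

Working with paired differences of the placement values is algebraically the same as combining the variances and covariance of the two AUC estimates, and it keeps the sketch short.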

Extension of models for predicting AKI stage 3 and dialysis.
Among the 1483 patients with AKI, 519 (35%) were stage 1, 96 (6%) were stage 2, and 868 (58%) were stage 3. Among the 868 patients with stage 3 AKI, 509 (59%) did not undergo dialysis, and 359 (41%) had AKI requiring dialysis. We extended the scores of the prediction models to examine their ability to predict AKI stage 3 and dialysis. The results showed that the [...] (Table 3 and Fig. 2B). Similar to the results for predicting AKI stage 3, the discrimination performance of the Wang et al. model (AUC = 84.5%) and Forman risk score (81.7%) for dialysis was satisfactory, and these models had relatively low HL chi-square statistics (Table 3 and Fig. 2C). In addition, all of the AUC pairwise comparisons differed significantly in predicting AKI stage 3 or dialysis.
Extension of models for predicting MAKEs. We next analysed MAKEs within 1 year of AKI diagnosis and MAKEs from the index day to the end of follow-up. A total of 6137 (62.8%) patients experienced MAKEs, of whom 5437 (55.6%) developed CKD, 1298 (13.3%) developed ESRD requiring chronic renal replacement therapy, and 2684 (27.5%) died. The patients were separated into two groups according to the cut-off value determined by the Youden index in predicting AKI and AKI stage 3. The results showed that the group with higher risk scores had a significantly greater risk of MAKEs than did the group with lower scores according to all five models, with hazard ratios ranging from 1.56 to 1.76 for 1-year follow-up and from 1.52 to 1.70 for the index date to the end of follow-up (Fig. 3A,B).

Additional analysis.
We conducted an additional analysis that excluded patients with a creatinine level > 3.5 mg/dl on arrival for the index admission because those patients may not be right at the cusp of developing severe AKI and may be lower on the nonlinear creatinine curve. The results were similar to the overall results in that the discrimination and calibration performance of the Wang et al. model and Forman risk score was superior to that of the other three scores (Supplemental Table 1). In addition, the results were generally consistent with the overall results when using the complete data set without any missing values (Supplemental Table 2).

Discussion
In the present study, we externally validated five existing models for predicting the risk of AKI in patients with AHF. The Forman risk score and Wang et al. model showed superior discrimination and calibration performance compared with the three other models. The development of AKI in patients with AHF leads to prolonged hospitalization, increased readmission rates, and increased short- and long-term all-cause mortality and cardiovascular mortality. Coexisting AKI and AHF also lead to higher health care costs for patients with heart failure 4-6. In the past two decades, many studies have focused on the early identification of patients with AHF who are at high risk of AKI development to initiate intervention earlier and improve their clinical outcomes. Some of these studies have used clinical parameters as risk predictors, and others have introduced or added novel urine biomarkers for AKI prediction 1,13-16. However, the widely varying definition and classification of AKI (or worsening renal function [WRF] in some studies) as well as differences in the observed time-at-risk and heterogeneity of study populations have hindered the cross-comparison of published data. For this reason, AKI in the present study was defined according to the KDIGO Clinical Practice Guidelines for Acute Kidney Injury published in 2012 20, which are currently the most widely accepted and used criteria. To our knowledge, this is the first multi-institution validation study to use the KDIGO guidelines to compare existing prediction models of AKI in patients with AHF. Among the AKI prediction models for patients with AHF, the Forman risk score, which was the first to be published, utilizes 4 factors (i.e., congestive heart failure history, diabetes mellitus, systolic blood pressure over 160 mmHg during admission, and elevated creatinine).
The study introducing the risk score showed predictive ability for AKI in AHF, but it did not report any area under the ROC curve 13,15. The subsequently developed Basel risk score sought to use fewer predictive factors to achieve better prediction ability. Chronic kidney disease, bicarbonate level, and outpatient diuretic treatment were used for AKI prediction, and the AUC was reported to be 0.71 in the original article. However, a few years later, Wang et al. found no difference in discrimination ability between the Basel and Forman risk scores, both of which had an AUC of 0.65 according to externally validated results 15. In 2013, Wang et al. reported a prediction score derived from a larger patient number and, for the first time, included proteinuria as one of the risk factors for AKI prediction in the AHF population. Since then, proteinuria has been increasingly reported to be not only a predictive factor but also an aggravating factor in AKI 25-27. [...] Although these markers might provide more information in AKI prediction, they have not been widely examined and are more expensive to assess. This means that using these prediction models is more costly. To offer an easier and more cost-effective choice, we externally validated these five prediction models in the present study.
Table 4. Pairwise comparisons of area under the receiver operating characteristic curve between the prediction models. *Indicates P < 0.05; † DeLong's test.
Our current study not only externally validated these five prediction models in terms of AKI prediction but also estimated their performance in predicting serious AKI events, including AKI stage 3 and dialysis. As Table 3 shows, the AUCs of these prediction models for AKI prediction ranged from 0.543 to 0.73. Better performance was noted in AKI stage 3 and dialysis prediction, with AUCs of 0.565-0.858 and 0.539-0.845, respectively. All five prediction models showed favourable ability in long-term outcome prediction, with significantly higher incidences of MAKEs in the high-score groups than in the low-score groups. Thus, these prediction models can not only predict AKI events in AHF patients during hospitalization but also predict long-term adverse events in AHF patients.
Of the five prediction models we validated, the Forman risk score and Wang et al. model showed superior discrimination and calibration. The AKI risk score for AHF derived using the Wang et al. model had the best performance; its AUC was 0.73 in AKI prediction, and its AUCs for AKI stage 3 and dialysis were 0.858 and 0.845, respectively. This scoring system showed favourable calibration in predicting all three outcomes. The Forman risk score also showed good performance and calibration in AKI, AKI stage 3, and dialysis prediction, with AUCs of 0.696, 0.829, and 0.817, respectively.
Although the pairwise comparison of AUCs revealed significant differences between the Wang et al. model and Forman risk score (Table 4), both had excellent discrimination (AUC of 0.8-0.9 by general definition 28) for AKI stage 3 and dialysis. Considering this, the Forman risk score may be seen as a relatively easier and more convenient tool for predicting AKI in AHF patients clinically because it requires only 4 clinical factors.
Much current research is being conducted to identify serum or urine biomarkers for early AKI prediction. However, these biomarkers are more costly to utilize and have not yet been widely examined in general laboratory settings. Some recent studies have reported that adding urine biomarkers to clinical prediction models yielded no significant performance improvement 29-33, and Törnblom et al. even reported that new statistical methods no longer support using uNGAL to predict AKI in certain patient groups 32. Taking this into consideration, prediction models based on clinical parameters seem to offer a faster, cheaper, and easier means of AKI prediction, thus increasing the likelihood of AKI prevention and early intervention. The current study demonstrated that a clinical prediction model alone can provide excellent discrimination for serious AKI events in AHF patients, with AUCs above 0.80 for AKI stage 3 and dialysis.

Strengths and limitations.
Our study has several notable strengths. First, this is the first multi-institution validation study to compare existing prediction models of AKI in AHF patients based on the KDIGO Clinical Practice Guidelines. Second, our study further evaluated the performance of these prediction models in predicting serious AKI events and revealed that these prediction models also offer high discriminative power for predicting AKI stage 3 and dialysis. Third, this study not only assessed the short-term renal outcomes of patients with AHF but also evaluated their long-term outcomes. We demonstrated that patients with scores above the cut-off value had poorer long-term outcomes (defined by MAKE incidence) than did patients with lower scores.
This study also has some limitations. First, this was a retrospective analysis, and the inherent drawbacks of this design cannot be avoided. Second, the first record of the creatinine level upon emergency department admission was used as the baseline creatinine level, and AKI was defined by the subsequent change in creatinine. Thus, our study could only examine predictive ability in terms of AKI development during admission and not AKI at admission. For patients with higher baseline creatinine levels, small changes in eGFR could lead to a 0.3 mg/dL increase according to the KDIGO guideline definition. Third, data limitations prevented some prediction factors from being validated, including NT-proBNP, uNGAL, and uAGT. Last, the present study was based on CGRD data, so the enrolled patients were relatively homogeneous. The results of external validation in this study might not be applicable to other populations.

Conclusion
We externally validated five existing prediction models for AKI in patients with AHF. The Forman risk score and Wang et al. model showed favourable discrimination and calibration in predicting AKI, AKI stage 3, and dialysis. The Forman risk score, as it comprises only 4 prediction factors, may offer the easiest and fastest means of individual risk prediction as well as risk stratification. By utilizing appropriate prediction models, clinicians can assess the risk of AKI in patients with AHF earlier and thus plan and initiate adequate disease management for these patients in a timelier manner.