Introduction

Acute heart failure (AHF), characterized by abrupt-onset dyspnoea, a sensation of suffocation, and sometimes pink frothy expectoration, is a leading cause of hospitalization. Acute kidney injury (AKI), or worsening renal function (WRF), the term used in some previous studies, is a common complication among patients with AHF, with an incidence of 21–45%1,2,3. Previous studies have revealed that the development of AKI in patients with AHF results in longer hospital stays, higher readmission rates, and increased short- and long-term mortality4,5,6,7. Smith et al. further reported that even a slightly increased creatinine level (≥ 0.2 mg/dL) increases the risk of mortality among patients with AHF8. In the past few years, investigators have reported that AHF patients might experience “congestion”, a term describing the signs and symptoms of extracellular fluid accumulation that results in increased cardiac filling pressure, and “renal congestion” has been recognized as part of systemic congestion. Renal congestion, resulting from lower cardiac output, tubuloglomerular feedback, increased intra-abdominal pressure and increased venous pressure, has been viewed as a contributor to renal function impairment in AHF9. Nevertheless, evidence has indicated that in patients who were decongested at discharge, in-hospital WRF was not associated with worse outcomes10,11. However, there is no reliable clinical or laboratory marker to distinguish WRF caused by renal congestion from “true” acute kidney injury9,11,12. Therefore, the early recognition of patients who are at high risk of developing AKI remains essential for early prevention and treatment in patients with AHF.

For these reasons, many studies have focused on identifying relevant risk factors, and some have derived AKI prediction models for patients with AHF1,13,14,15,16,17. The Forman risk score, first reported in 2004, was initially based on hospitalized heart failure patients but was later externally validated in AHF patients, and it is arguably the best-known prediction model worldwide13. Following the subsequently developed Basel risk score, prediction models were also proposed by Wang et al. and Zhou et al. between 2011 and 20161,15,16. However, the definition of AKI or WRF varies across these studies because the AKI classification has changed in the past few years from the RIFLE (Risk, Injury, Failure, Loss of kidney function, and End-stage kidney disease) classification and AKIN (Acute Kidney Injury Network) criteria to the KDIGO (Kidney Disease: Improving Global Outcomes) guidelines18,19,20. In addition to the changing AKI definition and classification over time, these existing prediction models also vary by population, region, sample size and research methods. Considering the importance of early identification, prevention, and intervention of AKI in patients with AHF, revalidating the performance and discrimination of these prediction models together and according to the current AKI definition seems necessary. Therefore, we aimed to externally validate the existing prediction models for AKI in patients with AHF based on the KDIGO Clinical Practice Guidelines for Acute Kidney Injury.

Methods

Data source

This study was based on the electronic medical records of the Chang Gung Research Database (CGRD) from the Chang Gung Medical Foundation. The database incorporates data from the nationwide Chang Gung Memorial Hospital system, which is the largest health care system of its kind in Taiwan, comprising two medical centres, two regional hospitals, and three district hospitals. The CGRD consists of clinical epidemiological data, laboratory data, inpatient and outpatient records, emergency medical records, pathology reports, and disease category data. The overall coverage rates of the CGRD are approximately 20% for outpatients and 12% for inpatients for the entire Taiwanese population. More detailed information about the CGRD has been reported in previous studies21,22. Its disease diagnoses are coded using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) for records before 2016 and the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) for those thereafter. The Institutional Review Board of Chang Gung Memorial Hospital approved the study (approval number: CGMHIRB No. 202000915B0) and waived the need for informed consent due to the retrospective nature of the study, which did not compromise the privacy of any patients. In this study, all the methods were performed in accordance with the Declaration of Helsinki.

Study population

We analysed the records of patients who had emergency department visits and were subsequently admitted due to acute heart failure (ICD-9-CM diagnostic code: 428; ICD-10-CM diagnostic code: I50) between January 1, 2008, and December 31, 2018, at all 7 Chang Gung Memorial Hospital branches located in Linkou, Taipei, Taoyuan, Keelung, Yunlin, Chiayi, and Kaohsiung, which span northern to southern Taiwan. When a patient had multiple AHF episodes, the first episode of AHF hospitalization between 2008 and 2018 was selected as the index hospitalization. The admission date of the index AHF hospitalization was used as the index date. The first record of laboratory examination results was used as the baseline laboratory data; the first records of vital sign data and medication treatment within 48 h after admission were also collected for further analysis.

Patients without sufficient data for AKI assessment, such as those without baseline creatinine data or without a second creatinine examination within 7 days of admission, were not included. The remaining patients were excluded if they met any of the following criteria: (1) were younger than 18 years old, (2) had end-stage renal disease or were undergoing maintenance dialysis, (3) had follow-up of less than 24 h, (4) received extracorporeal membrane oxygenation during the index admission, (5) were anticipated to undergo cardiac transplantation, (6) received a nephrotoxic agent (including contrast agents, nonsteroidal anti-inflammatory drugs, aminoglycoside, and vancomycin) within 4 weeks of admission or during the index admission, (7) had obstructive nephropathy, or (8) had acute coronary syndrome with inotropic agents used during the index admission. After exclusion, 10,364 patients remained eligible for the study, of whom 1483 (14.3%) had AKI (Fig. 1).
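The selection rules above can be read as a single eligibility filter. The following Python sketch is illustrative only: the `Admission` record and all of its field names are hypothetical and do not correspond to CGRD variables, and each flag is assumed to have been derived upstream from the medical record.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical per-patient flags for the index admission (not CGRD column names).
    age_years: int
    has_baseline_creatinine: bool
    has_repeat_creatinine_within_7d: bool
    esrd_or_maintenance_dialysis: bool
    followup_hours: float
    received_ecmo: bool
    anticipated_heart_transplant: bool
    nephrotoxic_exposure: bool
    obstructive_nephropathy: bool
    acs_with_inotropes: bool


def is_eligible(a: Admission) -> bool:
    """Apply the data-sufficiency rule, then exclusion criteria (1) to (8)."""
    # Patients lacking the creatinine data needed to assess AKI are not included.
    if not (a.has_baseline_creatinine and a.has_repeat_creatinine_within_7d):
        return False
    exclusions = (
        a.age_years < 18,                # (1)
        a.esrd_or_maintenance_dialysis,  # (2)
        a.followup_hours < 24,           # (3)
        a.received_ecmo,                 # (4)
        a.anticipated_heart_transplant,  # (5)
        a.nephrotoxic_exposure,          # (6)
        a.obstructive_nephropathy,       # (7)
        a.acs_with_inotropes,            # (8)
    )
    return not any(exclusions)
```

Keeping each criterion as a separate boolean makes it straightforward to count how many patients each rule removes, which is the information a flow chart of patient selection reports.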

Figure 1 Flow chart for patient selection.

Existing prediction models and covariates

A review of previous studies found five prediction models or risk factor studies for AKI prediction in patients with AHF. All five were externally validated in this study.

The prediction models were as follows: the Forman risk score, reported in 2004 and based on four prediction factors, including underlying diseases as well as clinical and laboratory parameters13; the Basel risk score, reported in 2011 and using chronic kidney disease, bicarbonate level, and outpatient diuretic treatment as indices1; the prediction model reported by Wang et al. in 2013 based on a data analysis of 1709 patients and using 8 prediction factors15; the prediction model reported by Zhou et al. in 2016 that combines clinical parameters and novel urine biomarkers for AKI prediction in AHF patients16, though we were unable to include the NT-proBNP and urine biomarkers used by Zhou et al. because the CGRD did not include these data; and the study by Verdiani et al. in 2010 investigating the prediction of AKI in hospitalized AHF patients14.

The study populations, publication years, heart failure criteria, and full lists of predictors of the prediction models are summarized in Table 1.

Table 1 Existing prediction models for acute kidney injury in patients with acute heart failure.

Outcome definition

The primary outcome was the development of AKI within 7 days after admission. The first record of the creatinine level during emergency department admission was used as the baseline creatinine level, and AKI was defined as an increase in serum creatinine of ≥ 0.3 mg/dL within 48 h or a ≥ 50% increase in serum creatinine within 7 days, in accordance with the KDIGO Clinical Practice Guidelines for Acute Kidney Injury20. This study also validated the performance of existing prediction models in predicting serious AKI events, including AKI stage 3 and dialysis. Stage 3 AKI was defined as a ≥ 200% increase in serum creatinine, a serum creatinine concentration of ≥ 4 mg/dL, or the initiation of dialysis within 7 days of study enrolment, according to the KDIGO guidelines. Urine output was not used to define AKI because these data were not complete in the CGRD.
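To make the creatinine-based outcome definition concrete, the following Python sketch classifies one patient from a baseline value and a series of follow-up measurements. It is a minimal illustration under stated assumptions, not the study's code: the function name and input layout are hypothetical, stage 2 (2.0 to 2.9 times baseline) is taken from the KDIGO staging table rather than from the text above, and the ≥ 4 mg/dL criterion is applied only once the AKI criteria are met.

```python
from typing import Sequence, Tuple


def kdigo_stage(baseline: float,
                followups: Sequence[Tuple[float, float]],
                dialysis_within_7d: bool = False) -> int:
    """Return 0 (no AKI) or the highest KDIGO stage (1-3), creatinine only.

    baseline  -- first serum creatinine at admission (mg/dL)
    followups -- (hours since admission, serum creatinine in mg/dL) pairs
    """
    if dialysis_within_7d:
        return 3  # initiation of dialysis counts as stage 3
    stage = 0
    for hours, scr in followups:
        if hours > 7 * 24:
            continue  # only the first 7 days are assessed
        ratio = scr / baseline
        rise = round(scr - baseline, 2)  # rounding avoids float noise at 0.3
        meets_aki = (hours <= 48 and rise >= 0.3) or ratio >= 1.5
        if not meets_aki:
            continue
        if ratio >= 3.0 or scr >= 4.0:   # >= 200% increase or >= 4 mg/dL
            current = 3
        elif ratio >= 2.0:
            current = 2
        else:
            current = 1
        stage = max(stage, current)
    return stage
```

For example, `kdigo_stage(1.0, [(24, 1.4)])` gives stage 1 through the absolute 0.3 mg/dL rule, whereas the same value first measured at 72 h gives 0 because neither the 48-h rule nor the 50% rule is met. A patient with a baseline of 3.0 mg/dL and 4.1 mg/dL at 24 h is classified as stage 3, which illustrates how a high baseline creatinine makes the stage 3 threshold easy to reach.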

To identify the clinical end point of acute kidney injury, we also evaluated the development of major adverse kidney events (MAKEs) on or after the 8th day following the index date. MAKEs were defined as the composite of chronic kidney disease (an estimated glomerular filtration rate [eGFR] decline of > 25% from baseline), end-stage renal disease (ESRD) requiring chronic renal replacement therapy, and all-cause mortality23,24. We assessed MAKEs within 1 year of AKI diagnosis and from the index date to the final visit date, the date of death, the date of event occurrence, or December 31, 2018, whichever came first. Only patients with a follow-up duration of > 7 days were included in the MAKE analysis.
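The composite MAKE definition can be sketched as a small Python function. The function and argument names are hypothetical, and the sketch assumes that the eGFR values and event flags have already been restricted to the relevant follow-up window (on or after the 8th day following the index date).

```python
def egfr_decline_fraction(baseline_egfr: float, followup_egfr: float) -> float:
    """Fractional eGFR decline from baseline (0.30 means a 30% decline)."""
    return (baseline_egfr - followup_egfr) / baseline_egfr


def has_make(baseline_egfr: float,
             followup_egfr: float,
             esrd_on_chronic_rrt: bool,
             died: bool) -> bool:
    """Composite of CKD (eGFR decline > 25%), ESRD on chronic RRT, or death."""
    ckd = egfr_decline_fraction(baseline_egfr, followup_egfr) > 0.25
    return ckd or esrd_on_chronic_rrt or died
```

Because the outcome is a composite, a patient counts once even when several components occur, so the component counts can sum to more than the number of patients with a MAKE.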

Statistical analysis

Because a substantial amount of data was missing, we imputed the data using the single expectation maximization (EM) method for the primary analysis. The missing rate was particularly high for bicarbonate (59%). To test the robustness of the results, a sensitivity analysis was performed that retained only patients with complete data.

The characteristics of the patients in the AKI and non-AKI groups were compared using the Mann–Whitney U-test for continuous variables (due to the lack of normality) and the chi-square test for categorical variables. The discrimination ability of individual scores in predicting an outcome of interest (i.e., AKI, AKI stage 3, or dialysis) in patients with AHF was determined using the area under the receiver operating characteristic curve (AUC). Optimal cut-off points were determined using the Youden index, and the corresponding sensitivity and specificity were calculated. The AUCs among the existing prediction models were compared in a pairwise manner using the DeLong test. In addition, the calibration performance of each score was assessed using the Hosmer–Lemeshow (HL) goodness-of-fit test, with smaller statistics (chi-square) indicating a smaller discrepancy between the predicted probability and observed AKI event for the prediction models. The patients were divided into two subgroups according to the optimal cut-off of each score. The risk of MAKEs on or after the 8th day following the index date was compared between the higher and lower cut-offs of each score using the Cox proportional hazards model.
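As an illustration of the discrimination measures described above, the following Python sketch computes the AUC as the probability that a randomly chosen patient with the outcome has a higher score than one without it (ties count one half), and selects the cut-off that maximizes the Youden index J = sensitivity + specificity − 1. This is not the SPSS procedure used in the study; the function names are hypothetical, a score at or above the cut-off is assumed to mean "high risk", and the pairwise loop is written for clarity rather than speed.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison of event (1) and non-event (0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximizing the Youden index."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        # Patients with score >= t are classified as high risk.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]
```

The cut-off returned by `youden_cutoff` is what would split patients into the higher- and lower-score subgroups used in the subsequent Cox analysis.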

All tests were 2-tailed, and P < 0.05 was considered statistically significant. Data analyses were conducted using SPSS 25 (IBM SPSS Inc., Chicago, Illinois).

Results

Baseline characteristics

The patients’ characteristics at baseline are presented in Table 2. A total of 10,364 patients were included in the analysis, of whom 1483 (14.3%) developed AKI. The median age and sex distribution were similar in the AKI and non-AKI groups. Of the total patient population, 42.2% had been diagnosed with congestive heart failure, 36.8% with diabetes mellitus, 46.0% with chronic kidney disease, and 56.9% with hypertension. The AKI group showed a significantly higher prevalence of diabetes mellitus (43.2%), chronic kidney disease (56.1%), and hypertension (61.9%). A total of 1581 patients in the total population exhibited severe heart failure symptoms and were categorized as New York Heart Association (NYHA) functional class IV. The AKI group also had a significantly higher percentage of patients categorized as NYHA functional class IV (22.3%) than the non-AKI group (14.1%).

Table 2 Baseline characteristics of patients with and without AKI.

Regarding the clinical parameters, the AKI group patients had significantly higher systemic blood pressure upon admission. The AKI group also exhibited significantly lower haemoglobin, lymphocyte percentage, serum albumin, and bicarbonate levels as well as higher creatinine, blood urea nitrogen, potassium, lactic acid, and BNP levels. A higher percentage of AKI group patients showed positive proteinuria results via dipstick tests. The AKI group received higher dosages of loop diuretics during their AHF admission period; however, there was no significant difference between the groups in outpatient diuretic treatment strategy. The AKI group was more likely to receive calcium channel blockers but less likely to use digoxin during hospitalization (Table 2).

Validation of existing prediction models for AKI

The performance of each existing prediction model in predicting AKI events in patients with AHF was externally validated, as summarized in Table 3. Discrimination, as measured by the AUC, was highest for the Wang et al. model (AUC = 73%), followed by the Forman risk score (69.6%), Basel risk score (59.7%), Verdiani et al. model (58.8%), and Zhou et al. model (54.3%) (Fig. 2A). Regarding calibration, the HL chi-square statistic was smallest for the Wang et al. model, followed by the Forman risk score, Zhou et al. model, Basel risk score, and Verdiani et al. model (Table 3). The pairwise comparisons showed that the AUCs differed significantly between any two prediction models, except between the Basel risk score and Verdiani et al. model (Table 4).
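For readers unfamiliar with the calibration measure, the HL chi-square statistic can be sketched in Python as follows. This is an illustrative implementation under assumptions, not the study's SPSS procedure: it takes predicted probabilities as input (for a risk score, these would typically come from a logistic regression of the outcome on the score), groups patients into equal-sized bins by predicted risk, and sums the squared difference between observed and expected events in each bin, scaled by the binomial variance.

```python
from typing import Sequence


def hosmer_lemeshow(probs: Sequence[float],
                    outcomes: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / n_g
        variance = n_g * mean_p * (1 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

A statistic of 0 means that observed and expected event counts agree in every group; larger values indicate poorer calibration. Converting the statistic to a P value requires a chi-square distribution with (groups − 2) degrees of freedom, which is omitted here.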

Table 3 Prediction model performance in discrimination and calibration for the outcomes of interest.

Figure 2 Discrimination ability assessed by the area under the receiver operating characteristic curve (AUC) for AKI (A), AKI stage 3 (B), and dialysis (C). AKI acute kidney injury; CI confidence interval.

Table 4 Pairwise comparisons of area under the receiver operating characteristic curve between the prediction models.

Extension of models for predicting AKI stage 3 and dialysis

Among the 1483 patients with AKI, 519 (35%) had stage 1, 96 (6%) had stage 2, and 868 (58%) had stage 3 AKI. Among the 868 patients with stage 3 AKI, 509 (59%) did not undergo dialysis, and 359 (41%) had AKI requiring dialysis. We extended the scores of the prediction models to examine their ability to predict AKI stage 3 and dialysis. The results showed that the Wang et al. model and Forman risk score demonstrated satisfactory discrimination performance (AUC = 85.8% and 82.9%, respectively) for AKI stage 3 and relatively low HL chi-square statistics (Table 3 and Fig. 2B). Similar to the results for predicting AKI stage 3, the discrimination performance of the Wang et al. model (AUC = 84.5%) and Forman risk score (81.7%) for dialysis was satisfactory, and these models had relatively low HL chi-square statistics (Table 3 and Fig. 2C). In addition, all of the pairwise AUC comparisons differed significantly in predicting AKI stage 3 or dialysis.

Extension of models for predicting MAKEs

We next analysed MAKEs within 1 year of AKI diagnosis and MAKEs from the index date to the end of follow-up. A total of 6137 (62.8%) patients experienced MAKEs, of whom 5437 (55.6%) developed CKD, 1298 (13.3%) developed ESRD requiring chronic renal replacement therapy, and 2684 (27.5%) died. The patients were separated into two groups according to the cut-off value determined by the Youden index in predicting AKI and AKI stage 3. The results showed that, for all five models, the group with higher risk scores had a significantly greater risk of MAKEs than did the group with lower scores, with hazard ratios ranging from 1.56 to 1.76 for 1-year follow-up and from 1.52 to 1.70 for the period from the index date to the end of follow-up (Fig. 3A,B).

Figure 3 Forest plot showing the association between higher risk scores (above the optimal cutoff) and the risk of MAKEs during 1-year follow-up (A) and at the end of follow-up (B). MAKEs major adverse kidney events.

Additional analysis

We conducted an additional analysis that excluded patients with a creatinine level > 3.5 mg/dL on arrival at the index admission, because those patients may not be at the cusp of developing severe AKI and may lie lower on the nonlinear creatinine curve. The results were similar to the overall results in that the discrimination and calibration performance of the Wang et al. model and Forman risk score was superior to that of the other three scores (Supplemental Table 1). In addition, the results were generally consistent with the overall results when using the complete data set without any missing values (Supplemental Table 2).

Discussion

In the present study, we externally validated five existing models for predicting the risk of AKI in patients with AHF. The Forman risk score and Wang et al. model showed superior discrimination and calibration performance compared with the three other models.

The development of AKI in patients with AHF leads to prolonged hospitalization, increased readmission rates, and increased short- and long-term all-cause mortality and cardiovascular mortality. Coexisting AKI and AHF also lead to higher health care costs for patients with heart failure4,5,6. In the past two decades, many studies have focused on the early identification of patients with AHF who are at high risk of AKI development to initiate intervention earlier and improve their clinical outcomes. Some of these studies have used clinical parameters as risk predictors, and others have introduced or added novel urine biomarkers for AKI prediction1,13,14,15,16. However, the widely varying definition and classification of AKI (or WRF in some studies) as well as differences in the observed time-at-risk and heterogeneity of study populations have hindered the cross-comparison of published data. For this reason, AKI in the present study was defined according to the KDIGO Clinical Practice Guidelines for Acute Kidney Injury published in 201220, which are currently the most widely accepted and used criteria. To our knowledge, this is the first multi-institution validation study to use the KDIGO guidelines to compare existing prediction models of AKI in patients with AHF.

Among the AKI prediction models for patients with AHF, the Forman risk score, which was the first to be published, utilizes 4 factors (i.e., congestive heart failure history, diabetes mellitus, systolic blood pressure over 160 mmHg during admission, and elevated creatinine). The study introducing the risk score showed predictive ability for AKI in AHF, but it did not report any area under the ROC curve13. The AUC for AKI prediction was externally validated as being 0.65 by Breidthardt et al. in 20111 and Wang et al. in 201315. The subsequently developed Basel risk score sought to use fewer predictive factors to achieve better prediction ability. Chronic kidney disease, bicarbonate level, and outpatient diuretic treatment were used for AKI prediction, and the AUC was reported to be 0.71 in the original article. However, a few years later, Wang et al. found no difference in discrimination ability between the Basel and Forman risk scores, both of which had an AUC of 0.65 according to externally validated results15. In 2013, Wang et al. reported a prediction score derived from a larger patient number and, for the first time, included proteinuria as one of the risk factors for AKI prediction in the AHF population. Since then, proteinuria has been increasingly reported to be not only a predictive factor but also an aggravating factor in AKI25,26,27. The Wang et al. prediction model had a high sensitivity of 70.0%, specificity of 70.6%, and AUC of 0.76 in predicting AKI in AHF patients. Subsequently, Zhou et al. derived the first scoring system combining clinical risk factors and novel kidney injury biomarkers (uNGAL and uAGT)16. Zhou et al. reported the AUC separately; the AUC for the clinical model alone was 0.765, close to that of the Wang et al. model, while the AUC for the prediction model was 0.87416. These five AKI prediction models each have their own advantages and disadvantages in clinical application. 
The Forman risk score uses only four factors, each of which is easily and widely measured in clinical practice. Like the Basel risk score, which includes only three prediction factors, it is easy for clinicians to use. Wang et al. and Zhou et al. published prediction models that include laboratory parameters reported as AKI aggravating factors or novel AKI biomarkers. Although these markers might provide more information for AKI prediction, they are not widely measured and are more expensive to assess, which makes these prediction models more costly to use. To help identify an easier and more cost-effective choice, we externally validated these five prediction models in the present study.

Our current study not only externally validated these five prediction models in terms of AKI prediction but also estimated their performance in predicting serious AKI events, including AKI stage 3 and dialysis. As Table 3 shows, the AUCs of these prediction models for AKI prediction ranged from 0.543 to 0.73. Better performance was noted in AKI stage 3 and dialysis prediction, with AUCs of 0.565–0.858 and 0.539–0.845, respectively. All five prediction models showed favourable ability in long-term outcome prediction, with significantly higher incidences of MAKEs in the high-score groups than in the low-score groups. Thus, these prediction models can not only predict AKI events in AHF patients during hospitalization but also predict long-term adverse events in AHF patients.

Of the five prediction models we validated, the Forman risk score and Wang et al. model showed superior discrimination and calibration. The AKI risk score for AHF derived using the Wang et al. model had the best performance; its AUC was 0.73 in AKI prediction, and its AUCs for AKI stage 3 and dialysis were 0.858 and 0.845, respectively. This scoring system showed favourable calibration in predicting all three outcomes. The Forman risk score also showed good performance and calibration in AKI, AKI stage 3, and dialysis prediction, with AUCs of 0.696, 0.829, and 0.817, respectively.

Although the pairwise comparison of AUCs revealed significant differences between the Wang et al. model and Forman risk score (Table 4), both had excellent discrimination (AUC of 0.8–0.9) for AKI stage 3 and dialysis by general definition28. Considering this, the Forman risk score may be seen as a relatively easier and more convenient tool for predicting AKI in AHF patients clinically because it requires only 4 clinical factors.

Much current research aims to identify serum or urine biomarkers for early AKI prediction. However, these biomarkers are more costly to utilize and are not yet widely measured in general laboratory settings. Some recent studies have reported that adding urine biomarkers to clinical prediction models yielded no significant performance improvement29,30,31,32,33, and Törnblom et al. even reported that new statistical methods no longer support using uNGAL to predict AKI in certain patient groups32. Taking this into consideration, prediction models based on clinical parameters seem to offer a faster, cheaper, and easier means of AKI prediction, thus increasing the likelihood of AKI prevention and early intervention. The current study demonstrated that a clinical prediction model alone can provide excellent discrimination ability for serious AKI events in AHF patients, with AUCs above 0.80 for AKI stage 3 and dialysis.

Strengths and limitations

Our study has several notable strengths. First, this is the first multi-institution validation study to compare existing prediction models of AKI in AHF patients based on the KDIGO Clinical Practice Guidelines. Second, our study further evaluated the performance of these prediction models in predicting serious AKI events and revealed that these prediction models also offer high discriminative power for predicting AKI stage 3 and dialysis. Third, this study not only assessed the short-term renal outcomes of patients with AHF but also evaluated their long-term outcomes. We demonstrated that patients with scores above the cut-off value had poorer long-term outcomes (defined by MAKE incidence) than did patients with lower scores.

This study also has some limitations. First, this was a retrospective analysis, and the inherent drawbacks of this design cannot be avoided. Second, the first record of the creatinine level upon emergency department admission was used as the baseline creatinine level, and AKI was defined by the subsequent change in creatinine. Thus, our study could only examine predictive ability in terms of AKI development during admission and not AKI at admission. For patients with higher baseline creatinine levels, small changes in eGFR could lead to a 0.3 mg/dL increase according to the KDIGO guideline definition. Third, data limitations prevented some prediction factors from being validated, including NT-proBNP, uNGAL, and uAGT. Last, the present study was based on CGRD data, so the enrolled patients were relatively homogeneous. The results of external validation in this study might not be applicable to other populations.

Conclusion

We externally validated five existing prediction models for AKI in patients with AHF. The Forman risk score and Wang et al. model showed favourable discrimination and calibration in predicting AKI, AKI stage 3, and dialysis. The Forman risk score, as it comprises only 4 prediction factors, may offer the easiest and fastest means of individual risk prediction as well as risk stratification. By utilizing appropriate prediction models, clinicians can assess the risk of AKI in patients with AHF earlier and thus plan and initiate adequate disease management for these patients in a much timelier manner.