Introduction

Despite the steady decline in the number of performed cardiac surgeries, every year more than 300,000 coronary artery bypass graft (CABG) and valve operations are performed in the United States1. Among patients undergoing cardiac surgery, atrial fibrillation (AF) prevalence is estimated at 11.5%, which makes it one of the most common co-morbidities in this group2. AF is often considered an indicator of high-risk patients and a predictor of higher mortality rates and potentially fatal postoperative complications. As a consequence of the loss of the atrial systolic contribution, greater morbidity rates for stroke and renal failure, prolonged ventilation time, higher reoperation rates, and deep sternal wound complications have been reported. Moreover, patients with pre-operative AF experience a higher adjusted long-term risk of all-cause death and cumulative risk of stroke and systemic embolism3,4.

The European System for Cardiac Operative Risk Evaluation (EuroSCORE) II was developed to reflect a more current dataset and evidence-based improvements in cardiac surgery. In the United States, the Society of Thoracic Surgeons (STS) risk score is more widely accepted owing to its relatively high predictive value, although it is less user-friendly and cannot be applied to some cardiac procedures. Because EuroSCORE II covers numerous procedures, it provides more flexibility than the STS score for complex operations. Unlike the STS risk score, EuroSCORE II does not include AF as a risk factor, which often leads to the underestimation of risk in patients with a higher risk profile.

The current analysis aimed to validate EuroSCORE II in a large cohort of heart surgery patients with underlying AF from a contemporary nationwide registry.

Methods

Registry design

All data were retrieved from the Polish National Registry of Cardiac Surgery Procedures (KROK). Data collection methods and definitions are available at https://krok.csioz.gov.pl. Initially, our data set consisted of 45,050 adult cardiac surgical patients operated on between 1 January 2017 and 1 January 2020 at 37 enrolling centres. Patients were excluded if they met any of the following criteria: more than one missing EuroSCORE II predictor or missing in-hospital mortality data (N = 512); age under 18 years (N = 226); or age over 90 years (N = 140), because data on nonagenarians were insufficient when the EuroSCORE II risk model was created. EuroSCORE II was recalculated for every patient enrolled in the study with the interactive calculator (available at https://www.euroscore.org). Depending on the calculated score, patients were classified as being at low (≤ 2%), mild (> 2% to ≤ 5%), moderate (> 5% to ≤ 10%), high (> 10% to ≤ 20%), or very high (> 20%) risk of perioperative in-hospital mortality. The diagnosis of any type of pre-operative AF was based on the patient's medical history.
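The risk bands described above can be written as a simple lookup. The following is only an illustrative sketch: the function name `risk_category` is hypothetical, and it assumes that the mild and high bands are open at their lower bound (> 2% and > 10%), so that every score falls into exactly one band.

```python
def risk_category(euroscore_ii_percent: float) -> str:
    """Map a EuroSCORE II predicted mortality (in %) to a risk band.

    Assumption: upper bounds are inclusive, lower bounds exclusive,
    so the five bands do not overlap.
    """
    if euroscore_ii_percent < 0:
        raise ValueError("predicted mortality cannot be negative")
    # (inclusive upper bound in %, label), checked in ascending order
    bands = [(2.0, "low"), (5.0, "mild"), (10.0, "moderate"), (20.0, "high")]
    for upper, label in bands:
        if euroscore_ii_percent <= upper:
            return label
    return "very high"


if __name__ == "__main__":
    for score in (1.9, 2.0, 3.5, 7.0, 20.0, 26.0):
        print(score, risk_category(score))
```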

Data collection

A detailed questionnaire, defined according to standard definitions, including demographic data, history, physical findings, management, imaging studies, and outcomes, was developed. Data were collected either at presentation or by physician review of the hospital records and were forwarded to the KROK registry. For patients undergoing heart surgery, we considered and reported the variables according to EuroSCORE II definitions. The exact type of surgery was also recorded. The study was approved by the Institutional Board of Central Clinical Hospital of the Ministry of Interior, Centre of Postgraduate Medical Education, Warsaw, Poland, and adheres to the Declaration of Helsinki as revised in 2013. Because the registry data were anonymized, the requirement for patient informed consent was waived by the same Institutional Board.

The primary endpoint was in-hospital mortality as per EuroSCORE II definition, together with 30- and 90-day mortality.

Statistical analysis

Normal distribution was assessed using a Shapiro–Wilk test. Descriptive analyses were represented as a median (Me) with interquartile range (IQR) for continuous variables, and for categorical variables as a number (N) of occurrences (%). The statistical significance of differences between the two groups was determined using the χ2, Mann–Whitney U and Dunn's multiple comparisons tests when appropriate. The association between mortality, EuroSCORE II and atrial fibrillation (AF) was assessed using univariable and multivariable logistic regression.

Model calibration was evaluated using the Hosmer–Lemeshow test (data collapsed into deciles of estimated probability) and calibration plots, which compared the expected (E) predicted probability with the observed (O) proportion of outcomes for 50 equally sized groups in two cohorts defined by the presence or absence of AF. The expected mortality rate was compared with the observed mortality rate in the overall cohort and in clinically defined sub-groups according to mortality risk and AF. The estimated survival probability was presented graphically by Kaplan–Meier curves5.
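For readers unfamiliar with the Hosmer–Lemeshow statistic, a minimal sketch of how it can be computed is given below. It is not the code used in this study (the analyses were run in XLSTAT and Stata); the function names are hypothetical, and the sketch assumes equally sized groups, predicted probabilities strictly between 0 and 1, and the conventional groups − 2 degrees of freedom. The p-value uses the closed-form chi-square tail that holds only for an even number of degrees of freedom, which is the case for 10 groups.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; exact for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form below holds only for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(died, predicted, groups=10):
    """Return (chi2, p_value) for outcomes (0/1) and predicted probabilities."""
    if len(died) != len(predicted) or len(died) < groups:
        raise ValueError("need matching inputs with at least one patient per group")
    pairs = sorted(zip(predicted, died))  # order patients by predicted risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        mean_p = expected / size
        chi2 += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return chi2, chi2_sf_even_df(chi2, groups - 2)
```

A large statistic with a small p-value indicates that observed and expected deaths disagree in at least some risk groups. In very large cohorts the test becomes sensitive to small deviations, which is one reason to inspect calibration plots alongside it.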

Discriminative performance was assessed by receiver operating characteristic (ROC) curves and by computing the area under the curve (AUC) with a 95% confidence interval (95% CI). The AUCs were compared using the DeLong test. The tests were performed in the overall study group and in the subgroups of patients with AF regardless of the type of surgery. For all analyses, we set the level of statistical significance at P < 0.05. All statistical analyses were performed using XLSTAT (Addinsoft, 2022, version 2022.1.02.1251, New York, NY, USA) and Stata Statistical Software (StataCorp, 2022, version 17, TX, USA).
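As an illustration of what the AUC measures, the sketch below computes it from the Mann–Whitney rank statistic: the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name is hypothetical, and neither the confidence interval nor the DeLong comparison is implemented here.

```python
def auc_from_scores(died, scores):
    """AUC via midranks (Mann-Whitney U divided by n_pos * n_neg)."""
    if len(died) != len(scores):
        raise ValueError("inputs must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(1 for y in died if y == 1)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    rank_sum_pos = sum(r for r, y in zip(ranks, died) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data: two survivors and two deaths with made-up predicted risks
    print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```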

Results

The final cohort for analysis consisted of 44,172 patients (Fig. 1). The median age of the population was 67 years (IQR 60–72), 30.8% (N = 13,604) of patients were female, and 13.4% (N = 5906) had AF. Detailed characteristics with a comparison between AF (+) and AF (−) groups are shown in Table 1. Patients with AF had a higher rate of all major complications, i.e., prolonged mechanical ventilation, surgical-site infection, bleeding, reoperation, stroke, and acute kidney injury, with the exception of acute coronary syndromes. Overall, the in-hospital mortality rate was 4.14% (N = 1830) and the 30-day mortality rate was 5.21% (N = 2303) (Table S1, supplementary materials).

Figure 1 Flow chart of study design. AF, atrial fibrillation.

Table 1 Characteristics of the screened population.

The median hospital length of stay in the overall population was 3.7 days (IQR 0.6–13.7 days). Compared with survivors, patients who died in hospital were older [69 (62–75) vs. 66 (60–75) years, P < 0.001], were more often women [37.9% (N = 694) vs. 30.5% (N = 12,910), P < 0.001], and more often had atrial fibrillation [20.4% (N = 374) vs. 13.1% (N = 5532), P < 0.001]. Their EuroSCORE II was higher [7.0 (3.1–26) vs. 1.9 (1.2–3.5), P < 0.001], with significant differences in all parameters included in the EuroSCORE II model (Table S2, supplementary materials). The detailed list of performed procedures is presented in Table S3.

Association with in-hospital mortality

In univariable analysis, EuroSCORE II was associated with in-hospital mortality with an unadjusted odds ratio (OR) of 1.38 (95% CI 1.36–1.40), P < 0.001. Similarly, AF was associated with mortality with an unadjusted OR of 1.71 (95% CI 1.52–1.92), P < 0.001. In a multivariable analysis including all the variables of the EuroSCORE II scale, AF was no longer associated with in-hospital mortality (OR 0.99, 95% CI 0.86–1.14, P = 0.87) (Table S4, supplementary materials). The multivariable analysis by EuroSCORE II thresholds showed no statistically significant impact of AF on in-hospital mortality, although the impact of AF was numerically most pronounced in the highest risk groups (≥ 10%; Table S5, supplementary materials).
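The unadjusted OR for AF can be cross-checked from the counts reported in the Results, because for a single binary predictor the univariable logistic regression OR equals the cross-product ratio of the 2 × 2 table, and the Wald interval equals the Woolf (log) interval. The sketch below is illustrative only; the function name is hypothetical, and the cell counts are derived by subtraction from the reported totals (1830 deaths among 44,172 patients; AF present in 374 deaths and 5532 survivors).

```python
import math


def odds_ratio_wald_ci(exp_events, exp_nonevents, unexp_events, unexp_nonevents, z=1.96):
    """Cross-product odds ratio with a Wald (Woolf) confidence interval."""
    odds_ratio = (exp_events * unexp_nonevents) / (exp_nonevents * unexp_events)
    se_log_or = math.sqrt(
        1 / exp_events + 1 / exp_nonevents + 1 / unexp_events + 1 / unexp_nonevents
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken or derived from the Results section
total_patients, deaths = 44172, 1830
deaths_af, survivors_af = 374, 5532
survivors = total_patients - deaths        # 42,342
deaths_no_af = deaths - deaths_af          # 1456
survivors_no_af = survivors - survivors_af  # 36,810

or_af, ci_low, ci_high = odds_ratio_wald_ci(
    deaths_af, survivors_af, deaths_no_af, survivors_no_af
)
print(f"OR = {or_af:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Under these assumptions the result agrees with the reported unadjusted OR of 1.71 (95% CI 1.52–1.92).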

The Kaplan–Meier survival analysis showed a significantly higher 30-day mortality rate in the AF (+) group than in patients without AF (P < 0.001), but the effect was apparent only in patients with EuroSCORE II less than 5% (Fig. 2 and Table S6).

Figure 2 Kaplan–Meier 30-day (inset graphs: 90-day) survival analysis in relation to perioperative risk and atrial fibrillation (AF). Patients are stratified into: (A) total population; (B) low-risk patients; (C) mild-risk patients; (D) moderate-risk patients; (E) high-risk patients. Significant differences in survival according to the presence of AF are evident in all groups at 30-day follow-up (p < 0.001) and in patients with risk ≤ 5% at 90-day follow-up.

In the survival analysis in relation to the type of surgery and AF, the largest differences in mortality were observed in CABG and three-procedure surgeries, as opposed to single non-CABG and two-procedure surgeries (Fig. 3).

Figure 3 Kaplan–Meier 30-day survival analysis in relation to type of surgery and atrial fibrillation (AF). (A) Coronary artery bypass grafting (CABG); (B) single non-CABG; (C) 2 procedures; (D) 3 procedures. Significant differences in survival according to the presence of AF are evident in all groups (p < 0.001).

Model calibration

The clinical performance of EuroSCORE II was tested in groups of patients with different predicted mortality risk. The overall population was divided into five risk groups according to EuroSCORE II, and observed and predicted in-hospital mortality were compared within each group. In the total cohort, the EuroSCORE II expected mortality rate was 4.01%, giving an observed to expected (O:E) ratio of 1.03. We observed underprediction of mortality for mild, moderate, and high-risk patients (O:E 1.1, 1.16, and 1.04, respectively), in contrast to low and very high-risk patients (O:E 0.91 and 0.96, respectively). In the AF (+) subgroup, EuroSCORE II performed well overall (O:E 0.99), although the greatest overestimation of mortality was observed in its low and very high-risk populations (O:E 0.89 and 0.9). On the other hand, the largest underprediction was observed in mild and moderate-risk patients in the AF (−) subgroup (O:E 1.11 and 1.18, respectively). Detailed characteristics are shown in Table S7. The calibration plot for AF (+) and AF (−) patients demonstrates overprediction of mortality by the EuroSCORE II model for the low and very high-risk patients, as shown in Fig. 4 and Table 2.
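The O:E ratio used throughout this section is simply the observed mortality divided by the mortality expected from the model; values above 1 indicate underprediction and values below 1 overprediction. The short sketch below, with hypothetical function names, reproduces the ratio for the total cohort from the rates reported above (4.14% observed, 4.01% expected).

```python
def observed_expected_ratio(observed_rate, expected_rate):
    """O:E ratio from observed and model-expected mortality (same units)."""
    if expected_rate <= 0:
        raise ValueError("expected mortality must be positive")
    return observed_rate / expected_rate


def interpret_oe(ratio):
    """Direction of miscalibration implied by an O:E ratio."""
    if ratio > 1:
        return "underprediction"
    if ratio < 1:
        return "overprediction"
    return "observed equals expected"


overall = observed_expected_ratio(4.14, 4.01)
print(f"O:E = {overall:.2f} ({interpret_oe(overall)})")
```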

Figure 4 Calibration plot: comparison between observed mortality and mortality predicted by EuroSCORE II. AF, atrial fibrillation.

Table 2 Calibration parameters in subgroups according to EuroSCORE II risk and presence or absence of atrial fibrillation.

All models failed the Hosmer–Lemeshow test (χ2 = 444.6, 62.7, and 363.2 for the total cohort, AF (+), and AF (−) groups, respectively; P < 0.001 for all groups) (Table S6).

ROC analysis

EuroSCORE II showed good discrimination in the overall population, with an AUC of 0.807 (95% CI 0.796–0.817). The AUC was lower in the AF (+) group (0.791, 95% CI 0.767–0.816) than in the AF (−) group (0.805, 95% CI 0.793–0.817), P < 0.001 (Fig. 5).

Figure 5 Receiver operating characteristic curves: (A) AF (+) vs. AF (−); (B) studied population for different in-hospital mortality risks; (C) AF (+) for different in-hospital mortality risks; (D) AF (−) for different in-hospital mortality risks. P for comparison between AF (−) and AF (+): CABG group (< 0.001), single non-CABG (0.16), 2 procedures (0.01), 3 procedures (0.44). AUC, area under curve; CI, confidence interval.

In the AF (+) cohort, EuroSCORE II discriminated best for two-procedure surgery (AUC 0.831, 95% CI 0.792–0.871). Its discriminative power was worst for CABG (AUC 0.746, 95% CI 0.676–0.817), which was lower than in the AF (−) population (AUC 0.798, 95% CI 0.774–0.822), P < 0.001. Discrimination in the different in-hospital mortality risk groups for the overall cohort and the AF subgroups is presented in Fig. 6.

Figure 6 Receiver operating characteristic curves: (A) studied population for different in-hospital mortality risks; (B) AF (+) population for different in-hospital mortality risks; (C) AF (−) population for different in-hospital mortality risks. P for comparison between AF (−) and AF (+): low (< 0.001), mild (< 0.001), moderate (< 0.001), high (< 0.001), very high (0.01). AUC, area under curve; CI, confidence interval.

Discussion

The current analysis is the first to validate EuroSCORE II on such a scale in patients with underlying AF undergoing heart surgery. The main finding of the current work is that EuroSCORE II has good predictive value in the overall population; however, its calibration was better in patients with concomitant AF. Moreover, EuroSCORE II significantly underestimated mortality in mild and moderate-risk groups of patients. On the other hand, mortality was numerically overestimated in low and very high-risk groups, particularly in patients with AF. In this group, the model performed poorly for risk stratification in isolated coronary artery bypass graft surgery and in both the highest and lowest risk patients. Therefore, EuroSCORE II should be used with caution in these groups of patients.

A number of risk calculators are available to estimate the risk of mortality and complications following heart surgery, both in-hospital and long-term6. EuroSCORE II has been accepted and widely adopted for numerous reasons, with ease, readiness, and bedside use being the most important. A recent review of external validations of cardiovascular clinical prediction models (CPMs) reported that EuroSCORE II is the third most validated CPM. The median AUC of 65 validations was 0.76 (IQR 0.68–0.81), indicating good discriminative performance7. In many analyses of reliability in predicting perioperative mortality in cardiac surgery patients, EuroSCORE II has provided results similar to the STS score, with both outperforming EuroSCORE I and ACEF (age, creatinine, ejection fraction)6,8. Interestingly, in the aforementioned review of CPMs by Wessler et al., EuroSCORE II was found to have better discriminative power than the STS score and ACEF, but not EuroSCORE I7.

However, EuroSCORE II has its shortcomings. It was reported that the model may not be reliable in non-elective surgeries and in patients undergoing valvular interventions9,10,11,12,13. Grant et al., in their analysis of 3,343 emergency procedures, found that the risk tended to be underpredicted in lower-risk patients and overpredicted in higher-risk patients10. This raised the concern that a patient could be denied emergency cardiac surgery because of an inappropriately high risk score. On the other hand, Paparella et al., in their external validation in a prospective registry, reported that in urgent and emergent surgery the observed-to-expected mortality ratios were 1.43 and 1.45, respectively, suggesting significant underestimation in such cases11. However, both studies consistently found good overall prediction of in-hospital mortality in non-emergent cases. Our results are contrary to other studies that reported underestimation of the expected mortality rates among low- and high-risk patients11,14,15. Moreover, our study does not support the current data suggesting good calibration of EuroSCORE II among patients with mild or moderate risk11,16. There may be several sources of these discrepancies. Firstly, our procedural characteristics differ from those reported in the EuroSCORE II and validation studies. The main differences are higher rates of combined surgeries and different rates of valvular interventions. Secondly, given the risk of multicollinearity and overfitting, it was shown for EuroSCORE I that limiting the number of included variables resulted in better calibration and clinical performance17. Finally, EuroSCORE II was based on data from 43 countries, including 16 non-European countries. Given the differences in quality of care, comorbidities, and risk profiles between nations, this heterogeneity may have affected the accuracy of estimations, and the results may not be generalizable to all populations13.

Pre- and post-operative AF is a well-known risk factor for adverse short- and long-term outcomes, including higher mortality, for both cardiac and non-cardiac surgery4,18,19,20. Our study showed not only that AF is associated with higher peri-operative mortality rates, but also that such patients were significantly more susceptible to all of the analyzed complications except acute coronary syndrome. The lower risk of myocardial infarction among patients with pre-operative AF is consistent with the recent analysis by Prasada et al. of 8,635,758 individuals who underwent non-cardiac surgery19. Several mechanisms have been proposed to explain the detrimental effect of AF in the peri-operative period, including low-cardiac-output syndrome and impaired bypass graft flow21. A meta-analysis of 35 studies revealed that perioperative AF was associated with an increased risk of stroke and mortality, both in the short and the long term3. In addition, AF may contribute to the development of heart failure, its exacerbations, and increased bleeding events due to the necessity for chronic oral anticoagulation22.

One of the important novel findings of our study is that EuroSCORE II provided even better prediction in the cohort of patients with AF. However, its discriminative power was lower in this group, reaching the lowest value in isolated CABG surgery. In the Kaplan–Meier 30-day survival analysis in relation to perioperative risk, there was a cross-over of mortality curves at the EuroSCORE II threshold of 5%, which may reflect differences in calibration between the two groups. A potential explanation is that the group of patients without AF had a higher rate of urgent, emergency and salvage operations, which are known to carry increased peri-operative risk. Moreover, EuroSCORE II was reported to underestimate operative risk in non-elective cases, which may be reflected in the underprediction in patients with EuroSCORE II ≥ 5% in our study11. The observed differences were no longer significant in the long-term follow-up.

In the survival analysis in relation to the type of surgery, AF significantly worsened the prognosis in all of the analyzed procedures, which is consistent with the higher prevalence of peri-operative complications. The worst outcomes were reported for CABG, and the curves diverged after 10 days, which may be partially explained by the previously proposed influence of AF on early graft failure. Surgical ablation should be considered in such cases, as a significant improvement in prognosis was previously reported, especially in lower-risk patients23.

Our study shows that, in the group of patients with AF, EuroSCORE II overestimated mortality in low- and very-high-risk patients. Its discriminative power is significantly lower in the group of patients with AF, particularly those undergoing CABG. Future efforts in the development of EuroSCORE III should take into consideration minimally invasive approaches in cardiac surgery, e.g., transcatheter aortic valve implantation (TAVI), off-pump coronary artery bypass (OPCAB) or minimally invasive mitral valve surgery (MIMVS). There are also a few more alarming outcomes that require further investigation. This analysis demonstrated that AF is not a benign co-morbidity, but a serious condition that deeply affects prognosis after cardiac procedures. In the future, more emphasis should be placed on research focusing on the prevention of the most common complications. With epicardial left atrial appendage closure and Cox-maze ablation, modern surgery offers effective treatment options that may improve short- and long-term outcomes in patients with concomitant AF.

Limitations

The limitations of the EuroSCORE II scoring system are inherent and apply to the current analysis as well. Because EuroSCORE II does not include factors such as neurological condition, blood counts, BMI, race, or degree of coronary stenosis, the general accuracy of the model is lower in specific subsets of patients who were underrepresented in the initial EuroSCORE II study cohort. First, the KROK registry lacks detailed information on AF: we could not stratify patients by the type of AF (paroxysmal vs. permanent) or by AF duration, nor could we assess the association of outcomes with the type, dose, duration of, and adherence to oral anticoagulation (OAC). Second, differences in protocols and patient management, particularly during the intensive care unit (ICU) stay, exist across participating centres. We attempted to minimize institutional bias by choosing a study time frame that best represents contemporary surgical and ICU practice but does not overlap with the COVID-19 pandemic, which has made early diagnosis and access to heart surgery care more difficult in the last 3 years. However, this resulted in the inclusion of a relatively small group of high and very high-risk patients. Third, the current analysis does not assess long-term outcomes; such an analysis could shed further light on the impact of initial EuroSCORE II on out-of-hospital outcomes. Finally, in our analysis all models failed the Hosmer–Lemeshow test. Given the known concerns about the Hosmer–Lemeshow test, we also used calibration plots in our analysis.

Conclusions

The main findings of this study are that while EuroSCORE II is a good predictor of outcomes for the general population, it is more accurate for patients with concomitant AF. However, EuroSCORE II underestimated mortality rates for patients with low-to-moderate risk. Additionally, its ability to distinguish between high- and low-risk patients was lower for those with AF, especially those undergoing coronary artery bypass grafting, indicating that it should be used with caution in these groups.