Introduction

Oral anticoagulation (OAC) is highly effective in reducing thromboembolism and mortality in patients with atrial fibrillation (AF)1,2. Given that OAC confers a risk of bleeding, various clinical risk scores have been proposed to aid risk stratification3, such as HAS-BLED, ATRIA, HEMORR2HAGES and, more recently, ORBIT (Supplementary Table 1).

One of the advantages of HAS-BLED is that it is simple yet retains useful predictive capability for bleeding in users of vitamin K antagonists (VKAs) and non-VKA anticoagulants, aspirin or no antithrombotic therapy4,5. Importantly, HAS-BLED draws attention to the potentially modifiable bleeding risk factors, such as uncontrolled hypertension (‘H’ in HAS-BLED), concomitant use of aspirin/non-steroidal anti-inflammatory drugs (NSAIDs) or alcohol excess (‘D’ in HAS-BLED)6,7. HAS-BLED also takes into consideration the quality of anticoagulation control amongst VKA users (i.e. the ‘labile INRs’ criterion, often defined by a time in therapeutic range [TTR] <65% or similar indices, e.g. proportion of INRs in range [PINRR])8. In VKA users, good quality anticoagulation control is a cornerstone, given that the efficacy and safety of VKA are closely related to TTR4,9. Indeed, both major bleeding and mortality rates are significantly higher with low TTR10,11. Despite the introduction of the Non-VKA Oral Anticoagulants (NOACs), the VKAs remain very widely used worldwide, and a simple bleeding risk score that is valid for VKA and non-VKA anticoagulants, aspirin or no antithrombotic therapy allows clinical application in all parts of the AF patient management pathway.

Other risk scores, derived from clinical trial cohorts, have been proposed as valid in both VKA and NOAC users because they do not consider ‘labile INR’ as a criterion. In clinical trials, however, patients are often carefully selected and followed up regularly, whereas AF patients in ‘real world’ clinical practice tend to be older, with associated comorbidities and polypharmacy.

In the present study, we first compared the four AF-validated bleeding risk schemas in a large ‘real world’ cohort of AF patients over a long period of follow-up. Second, we tested whether the predictive values of the ATRIA, ORBIT and HEMORR2HAGES scores, and their clinical usefulness, could be improved by adding a labile INR criterion (defined as TTR <65%).

Methods

From May 1, 2007 to December 1, 2007 we recruited consecutive patients with paroxysmal, persistent or permanent AF who had been stable on OAC with VKA (INR 2.0–3.0) for at least 6 months, in our single anticoagulation center at a tertiary hospital in Murcia (Southeastern Spain). At baseline, all patients were receiving anticoagulation therapy with acenocoumarol (the commonest VKA used in Spain) and had consistently achieved an INR between 2.0 and 3.0 during the previous 6 months (hence, TTR 100% for this cohort at entry; this ensured baseline homogeneity and avoided the bias produced by a low TTR at entry or initially unstable INRs, especially in an inception cohort). Patients with prosthetic heart valves or AF due to mitral valve stenosis, recent acute coronary syndrome (ACS), stroke (ischemic or embolic), or any hemodynamic instability that led to hospital admission or surgical intervention in the preceding 6 months were excluded.

At baseline, a complete medical history was recorded and stroke risk (CHADS2) and bleeding risk (HEMORR2HAGES) were calculated. Other risk scores (CHA2DS2-VASc for stroke risk; HAS-BLED, ATRIA and ORBIT for bleeding risk) were calculated retrospectively using the clinical variables available in our (prospectively collected) dataset. The TTR at 6 months after entry was calculated using the linear interpolation method of Rosendaal12. Good anticoagulation control was defined as a TTR >65%, based on recommendations of the National Institute for Health and Care Excellence (NICE)13. Anemia was defined as hemoglobin <13 g/dL in men and <12 g/dL in women.
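The linear interpolation approach to TTR can be illustrated with a minimal sketch. It assumes that the INR changes linearly between consecutive measurements and that the therapeutic range is 2.0–3.0; the function name `rosendaal_ttr` and the sample data are hypothetical and are not taken from the study dataset or software.

```python
from datetime import date


def rosendaal_ttr(measurements, low=2.0, high=3.0):
    """Percentage of follow-up days with an interpolated INR in [low, high].

    measurements: list of (date, INR) tuples, at least two on different days.
    Assumes the INR changes linearly between consecutive measurements.
    """
    measurements = sorted(measurements)
    days_in_range = 0.0
    total_days = 0.0
    for (d0, inr0), (d1, inr1) in zip(measurements, measurements[1:]):
        span = (d1 - d0).days
        if span <= 0:
            continue  # skip duplicate dates
        total_days += span
        if inr0 == inr1:
            # Flat segment: entirely in range or entirely out of range.
            fraction = 1.0 if low <= inr0 <= high else 0.0
        else:
            # Share of the INR change that lies inside the therapeutic range
            # equals the share of time, because the change is linear.
            seg_low, seg_high = min(inr0, inr1), max(inr0, inr1)
            overlap = max(0.0, min(seg_high, high) - max(seg_low, low))
            fraction = overlap / (seg_high - seg_low)
        days_in_range += fraction * span
    if total_days == 0:
        raise ValueError("need at least two INR measurements on different days")
    return 100.0 * days_in_range / total_days


# Hypothetical patient: in range for the first interval, above range afterwards.
example = [(date(2007, 1, 1), 2.0), (date(2007, 1, 11), 3.0), (date(2007, 1, 21), 4.0)]
example_ttr = rosendaal_ttr(example)
```

With a TTR computed in this way, a patient would meet the ‘labile INR’ criterion used in this study when the value falls below 65%.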

Follow-up was performed by personal interview at each visit to the anticoagulation clinic and through medical records. During this period we recorded all bleeding events, which were categorized as major bleeding (primary endpoint) if they met the following 2005 International Society on Thrombosis and Haemostasis (ISTH) criteria14: fatal bleeding, and/or symptomatic bleeding in a critical area or organ, such as intracranial, intraspinal, intraocular, retroperitoneal, intra-articular or pericardial, or intramuscular with compartment syndrome, and/or bleeding causing a fall in hemoglobin level of 20 g.L−1 (1.24 mmol.L−1) or more, or leading to transfusion of two or more units of whole blood or red cells. Bleeding events, as well as other clinical outcomes, were identified, confirmed and recorded by the investigators.
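The criteria listed above can be expressed as a simple rule, sketched here for illustration only. The `BleedingEvent` record and its field names are hypothetical, and the sketch assumes that a recorded site already refers to a symptomatic bleed; in the study, events were identified and confirmed by the investigators rather than by an automated rule.

```python
from dataclasses import dataclass

# Critical areas or organs named in the 2005 ISTH definition quoted above.
CRITICAL_SITES = {
    "intracranial",
    "intraspinal",
    "intraocular",
    "retroperitoneal",
    "intra-articular",
    "pericardial",
    "intramuscular with compartment syndrome",
}


@dataclass
class BleedingEvent:
    """Hypothetical record of one bleeding event."""
    fatal: bool = False
    site: str = ""                       # site of symptomatic bleeding, if any
    hemoglobin_drop_g_per_l: float = 0.0
    units_transfused: int = 0            # whole blood or red cells


def is_major_bleeding(event: BleedingEvent) -> bool:
    """Return True if any of the ISTH major bleeding criteria is met."""
    return (
        event.fatal
        or event.site.lower() in CRITICAL_SITES
        or event.hemoglobin_drop_g_per_l >= 20.0
        or event.units_transfused >= 2
    )


# Hypothetical events for illustration.
gi_bleed_with_transfusion = BleedingEvent(site="gastrointestinal", units_transfused=2)
minor_nosebleed = BleedingEvent(site="nasal", hemoglobin_drop_g_per_l=5.0)
```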

This observational registry was approved by the Ethical Committee from University Hospital Morales Meseguer and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. All patients gave informed consent to participation in the study.

Statistical analysis

Categorical variables are presented as counts and percentages. Continuous variables were tested for normality by the Kolmogorov-Smirnov test and presented as mean ± standard deviation (SD) or median and interquartile range (IQR), as appropriate. The Chi-squared test was used to compare proportions. Cox regression models were fitted to determine the association between higher values of the bleeding risk scores (or high risk and medium/high risk categories, when scores were analyzed categorically) and the occurrence of major bleeding.

Kaplan-Meier estimates and the log-rank test were used to assess differences in event-free survival distributions between subgroups of bleeding risk categories. Receiver operating characteristic (ROC) curves were applied to evaluate the predictive ability (expressed as c-indexes) of the four AF-validated bleeding risk scores. ROC curves were compared using the method of DeLong et al.15. Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were calculated according to the methods described by Pencina et al.16. Additional analyses were carried out by adding one point for TTR <65% to the ATRIA, ORBIT and HEMORR2HAGES scores (as ‘labile INR’ was already included within the HAS-BLED score), in order to determine whether this resulted in an improvement of the predictive ability for major bleeding.
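As an illustration of the last two steps, the sketch below computes a c-index for a binary outcome as the proportion of event/non-event pairs in which the patient with the event has the higher score (ties count one half), and adds one point to a score when TTR is below 65%. The function names and the toy data are hypothetical; the study itself used SPSS, MedCalc and STATA, and this is not a reproduction of those routines.

```python
def c_index(scores, events):
    """Area under the ROC curve for a score against a binary outcome.

    scores: risk score per patient; events: 1/True if major bleeding occurred.
    """
    event_scores = [s for s, e in zip(scores, events) if e]
    nonevent_scores = [s for s, e in zip(scores, events) if not e]
    if not event_scores or not nonevent_scores:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for s_event in event_scores:
        for s_nonevent in nonevent_scores:
            if s_event > s_nonevent:
                concordant += 1.0
            elif s_event == s_nonevent:
                concordant += 0.5  # tied pairs count as half
    return concordant / (len(event_scores) * len(nonevent_scores))


def add_labile_inr_point(score, ttr_percent, threshold=65.0):
    """Modified score: original score plus one point when TTR < threshold."""
    return score + (1 if ttr_percent < threshold else 0)


# Hypothetical toy data: original scores, TTR (%) and major bleeding outcome.
original = [3, 1, 2, 2]
ttr = [80.0, 90.0, 50.0, 70.0]
bled = [1, 0, 1, 0]

modified = [add_labile_inr_point(s, t) for s, t in zip(original, ttr)]
c_original = c_index(original, bled)
c_modified = c_index(modified, bled)
```

In this toy example the tie between the third and fourth patients is broken by the labile INR point, which is the mechanism by which the modification can raise the c-index.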

Goodness of fit of the new bleeding risk models was evaluated using the Hosmer-Lemeshow test. Finally, clinical usefulness and net benefit of the new predictive models were estimated using decision curve analyses (DCAs)17,18. DCA assesses how well one risk score identifies patients who will have a major bleeding event when compared with another score. The x-axis shows threshold values for major bleeding risk, while the y-axis represents the net benefit for the different threshold values of major bleeding risk. The prediction models that are farthest away from the slanted dashed grey line (i.e., assume all patients have major bleeding) and the horizontal black line (i.e., assume no patient has major bleeding) demonstrate the higher net benefit.
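The net benefit plotted in a decision curve is commonly computed as the true-positive fraction minus the false-positive fraction weighted by the odds of the threshold probability. The sketch below follows that standard formulation; the function names and the toy data are hypothetical and do not come from the study software.

```python
def net_benefit(predicted_risks, events, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(events)
    true_pos = sum(1 for r, e in zip(predicted_risks, events) if r >= threshold and e)
    false_pos = sum(1 for r, e in zip(predicted_risks, events) if r >= threshold and not e)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_assume_all(events, threshold):
    """Reference curve that treats every patient as if they will bleed."""
    prevalence = sum(1 for e in events if e) / len(events)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and outcomes for four patients.
risks = [0.1, 0.4, 0.6, 0.8]
bled = [0, 0, 1, 1]

nb_model = net_benefit(risks, bled, threshold=0.25)
nb_all = net_benefit_assume_all(bled, threshold=0.25)
nb_none = 0.0  # the 'assume none' reference is zero at every threshold
```

As the threshold approaches zero, the ‘assume all’ curve approaches the event prevalence, which is why the overall risk of major bleeding can be read from where that line meets the y-axis.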

A p value < 0.05 was accepted as statistically significant. Statistical analyses were performed using SPSS v. 19.0 (SPSS, Inc., Chicago, IL, USA), MedCalc v. 16.4.3 (MedCalc Software bvba, Ostend, Belgium) and STATA v. 12.0 (Stata Corp., College Station, TX, USA) for Windows.

Results

We included 1361 patients (48.7% male; median age 76, IQR 71–81 years), followed up for a median of 6.5 years (IQR 4.3–7.9). A summary of clinical characteristics at baseline is shown in Table 1. Median CHA2DS2-VASc was 4 (IQR 3–5) and the median TTR at 6 months after entry was 80% (IQR 66–100), although 24.2% of patients had a TTR <65%.

Table 1 Baseline clinical characteristics.

Median (IQR) values in our cohort for HAS-BLED, ATRIA, ORBIT and HEMORR2HAGES were 2 (IQR 2–3), 3 (IQR 1–3), 1 (IQR 1–2) and 2 (IQR 1–3), respectively. Based on HAS-BLED, 44.7% of patients were categorised as being at ‘high risk’ for bleeding, whilst for HEMORR2HAGES, ATRIA and ORBIT, the corresponding proportions categorised as ‘medium/high risk’ (i.e. the ‘action needed’ threshold) were 74.8%, 22.3% and 23.4%, respectively.

During the 6.5 years (IQR 4.3–7.9) follow-up, there were 250 (18.4%) major bleeding events (i.e. 2.82%/year), of which 78 (5.7%, i.e. 0.88%/year) were intracranial bleeds and 97 (7.1%, i.e. 1.09%/year) were gastrointestinal bleeds. Fatal bleeds occurred in 52 (3.8%, i.e. 0.59%/year).

Relationship to comorbidities

Diabetes mellitus, heart failure, coronary artery disease and prior malignancy were generally more prevalent in high risk groups in all scores (Table 1). Patients at medium/high risk of bleeding using HEMORR2HAGES were more frequently female (p = 0.004), but there was no sex association with other scores. Anaemia was more prevalent with high risk HAS-BLED (p = 0.001). Patients at medium/high risk according to ORBIT and ATRIA more commonly had previous stroke/TIA. As expected, thromboembolic risk according to CHA2DS2-VASc score was higher amongst high or medium/high bleeding risk categories (p < 0.001 for all scores).

Median TTR was significantly lower in the HAS-BLED high risk group (p < 0.001) and in the ATRIA (p = 0.028) and ORBIT (p = 0.003) medium/high risk groups. The proportion with poor anticoagulation control (TTR <65%) was significantly increased in the high risk HAS-BLED (p < 0.001) and medium/high ORBIT (p = 0.019) categories.

Bleeding events

Of the 250 major bleeding events, 65.2% occurred in the HAS-BLED high risk category and 82.4% in the HEMORR2HAGES medium/high risk category; in contrast, most major bleeds occurred in the ‘low risk’ ATRIA and ORBIT categories, with only 29.6% and 34.0% of major bleeds occurring in their respective ‘medium/high risk’ categories. Odds ratios (OR) for major bleeds using the four bleeding risk scores were calculated. The HAS-BLED high risk category [OR 2.00 (1.51–2.63); p < 0.001] showed the highest value compared with the medium/high risk categories for ATRIA [OR 1.63 (1.20–2.22); p = 0.002], ORBIT [OR 1.93 (1.43–2.60); p < 0.001] and HEMORR2HAGES [OR 1.72 (1.21–2.45); p = 0.002] (Table 2).
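For readers unfamiliar with the measure, an odds ratio and its 95% confidence interval can be obtained from a 2×2 table as sketched below, using the usual log-scale (Woolf) standard error. The text does not state how the reported ORs were estimated, so this crude calculation is an assumption for illustration only; the function name and the counts are hypothetical and are not the study data.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: higher-risk category with event     b: higher-risk category without event
    c: lower-risk category with event      d: lower-risk category without event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to show the calculation.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(a=20, b=80, c=10, d=90)
```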

Table 2 Distribution of major bleeding events according to the bleeding risk scores.

Univariate Cox regression analysis also showed a significant association between the four bleeding risk scores and major bleeds, whether analysed as continuous or categorical variables (Table 3). Survival analysis demonstrated that patients categorized at high risk or medium/high risk showed an increased risk of major bleeding (HAS-BLED: Log-Rank 40.24, p < 0.001; ATRIA: Log-Rank 25.82, p < 0.001; ORBIT: Log-Rank 40.88, p < 0.001 and HEMORR2HAGES: Log-Rank 21.33, p < 0.001) (Fig. 1).

Table 3 Univariate Cox regression analysis between bleeding risk scores and major bleeding events.
Figure 1

Event free survival for major bleeding according to risk categories for each bleeding risk score.

Receiver operating characteristic (ROC) curve analysis showed that all scores predicted major bleeding in patients with AF, with c-indexes of 0.62 (p < 0.001) for HAS-BLED, and 0.54 (p = 0.004), 0.56 (p < 0.001) and 0.54 (p = 0.007) for ATRIA, ORBIT and HEMORR2HAGES, respectively (Supplementary Table 2), with HAS-BLED having the best predictive value. Comparison of the ROC curves according to DeLong et al.15 demonstrated that HAS-BLED had the best performance of the four scores (Table 4).

Table 4 Comparison of the ROC curves, IDI and NRI of the four bleeding risk scores.

When labile INR or poor anticoagulation control (i.e. TTR <65%) was added to ATRIA, ORBIT and HEMORR2HAGES, this modification significantly increased their discriminative ability and predictive values (Table 5). Comparison of the original and modified scores demonstrated significant improvements in c-indexes for the ATRIA, ORBIT and HEMORR2HAGES modified scores (p < 0.001 for the three scores). Reclassification analysis showed an improvement in sensitivity and significant positive reclassification of the modified scores compared with the original, based on the IDI and NRI (Table 5; Fig. 2). Based on the p values of the Hosmer-Lemeshow test, the new predictive models that include poor anticoagulation control (TTR <65%) were properly calibrated (ATRIA, p = 0.981; ORBIT, p = 0.569 and HEMORR2HAGES, p = 0.294).
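The idea behind the NRI can be sketched for the simplest case of two risk categories: among patients with an event, moving up a category counts as an improvement, whereas among patients without an event, moving down counts as an improvement. The sketch below assumes this two-category form; whether the study used a categorical or a continuous NRI is not stated here, and the function name and toy data are hypothetical.

```python
def categorical_nri(old_high, new_high, events):
    """Two-category net reclassification improvement.

    old_high / new_high: True if the patient is in the higher-risk category
    under the original / modified score; events: True if major bleeding occurred.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for old, new, event in zip(old_high, new_high, events):
        moved_up = new and not old
        moved_down = old and not new
        if event:
            n_event += 1
            up_event += int(moved_up)
            down_event += int(moved_down)
        else:
            n_nonevent += 1
            up_nonevent += int(moved_up)
            down_nonevent += int(moved_down)
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


# Hypothetical toy data: one bleeder moves up, one non-bleeder moves down.
old = [False, False, True, True]
new = [True, False, False, True]
bled = [True, True, False, False]
nri = categorical_nri(old, new, bled)
```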

Table 5 Comparison of the ROC curves, IDI and NRI of the modified bleeding risk scores (by addition of labile INR defined as time in therapeutic range <65%).
Figure 2

ROC curves for original and modified bleeding risk scores (adding TTR <65%).

Finally, decision curve analysis (DCA) graphically demonstrates that the overall risk of major bleeding is 19%, based on the intersection of the y-axis and the slanted dashed grey line. As they are farthest away from the slanted dashed grey line (i.e., assume all patients have major bleeding) and the horizontal black line (i.e., assume no patient has major bleeding), the modified ATRIA, ORBIT and HEMORR2HAGES scores (that include labile INR) demonstrate improved clinical usefulness and a higher net benefit compared to the original scores (Fig. 3).

Figure 3

Decision curves for the original and modified bleeding risk scores (adding TTR <65%).

Discussion

In this ‘real world’ study, our principal finding was that in AF patients taking VKAs, HAS-BLED, ATRIA, ORBIT and HEMORR2HAGES scores are all associated with major bleeding, although HAS-BLED had the best predictive ability. Second, adding labile INR (TTR <65%) to ATRIA, ORBIT and HEMORR2HAGES scores significantly improved their predictive value for major bleeding, suggesting that these three scores would perform suboptimally in VKA users by not considering ‘labile INR’ as a criterion for bleeding. Indeed, the modified ATRIA, ORBIT and HEMORR2HAGES scores (that include labile INR) demonstrated improved clinical usefulness and a higher net benefit compared to the original scores.

Given that the VKAs are the commonest OACs in use worldwide, our findings have major implications for bleeding risk assessment in relation to OAC use. Bleeding risk is also not ‘static’, and patients require re-evaluation at every opportunity over the course of the patient pathway19. The appropriate use of bleeding risk scores has been discussed, and these scores are intended to ‘flag up’ patients potentially at risk of bleeding for more careful review and follow-up. The ATRIA and ORBIT scores categorise most patients as ‘low risk’ and hence would not have ‘flagged up’ patients potentially at risk of bleeding; indeed, most major bleeding events occurred in patients in the ‘low risk’ categories of the ATRIA and ORBIT scores.

Given that bleeding risk can be modified, appropriate use of bleeding scores should be to focus attention on reversible bleeding risk factors, such as uncontrolled hypertension, excess alcohol and concomitant antiplatelet therapy or NSAID use, as well as labile INRs in a patient taking VKA19. These features are fulfilled by HAS-BLED which has been validated in patients on anticoagulants (whether VKA or non-VKA), aspirin or no antithrombotic therapy – hence, the validity of using this bleeding score in all steps of the patient management pathway.

In the present study, all four bleeding risk scores were associated with major bleeding, although HAS-BLED had the best predictive performance, based on the c-index and NRI. Indeed, HAS-BLED has previously demonstrated better prediction than ATRIA and HEMORR2HAGES, even for intracranial haemorrhage5,20,21,22,23,24,25,26.

The ATRIA (Anticoagulation and Risk Factors in Atrial Fibrillation) score was proposed in 2011 to predict bleeding associated with warfarin27, but none of the risk score criteria includes assessment of quality of anticoagulation control or the concomitant use of antiplatelet therapy. As previously described, the HAS-BLED score has a better performance than ATRIA amongst anticoagulated patients with AF, whether with VKA23,25 or non-VKA anticoagulants28, as well as amongst non-anticoagulated patients29. More recently, the ORBIT score was derived from an industry-sponsored registry and proposed as a simple score to assess the risk of bleeding in patients with AF regardless of the type of anticoagulant, whether VKA or non-VKA30. Although the ORBIT score predicted major bleeding in the large cohort of the ROCKET-AF trial31,32, the ORBIT score ignores the quality of anticoagulation control as a criterion and has been shown to be inferior to HAS-BLED in predicting bleeding amongst AF patients on VKA and non-VKA anticoagulants33,34,35, as well as those who are non-anticoagulated29.

The association between the TTR and adverse events has been shown in numerous studies. For example, an increased risk of major bleeds has been consistently shown in patients on VKAs with a TTR below 65%36,37,38,39. Many other clinical factors have been added into bleeding risk stratification schemes, but these have been based on complex scoring systems derived from multivariate analyses and are thus difficult to apply in daily clinical practice6. Whilst it is undoubtedly interesting and necessary to develop scores that assess the risk of bleeding irrespective of the anticoagulant, and to make these scores as simple as possible, the VKAs are still the most commonly used OAC worldwide; thus, anticoagulation control cannot be ignored when supporting appropriate clinical decision making. Given the close relationship of bleeding to labile INRs and poor TTR, attention to this clinical factor amongst those patients taking a VKA is crucial7. Of note, ‘labile INR’ can also be easily defined using other simple (and easily accessible) parameters, such as the proportion of INRs in range, INR variability, time above range, INR >5 twice, INR >8 once, or INR <2 twice, etc.13,40. The results of the present study reinforce this perspective, since the inclusion of ‘labile INR’ into the ATRIA, ORBIT and HEMORR2HAGES scores significantly increased their predictive ability and clinical usefulness. Importantly, this suggests that these scores perform suboptimally in VKA patients unless labile INR is considered. Hence, these findings observed in ‘real world’ AF patients support the results from clinical trial cohorts33,35.

Limitations

This study is limited by its single centre design and its Caucasian-based population. The dataset was collected prospectively, although we calculated some risk scores (CHA2DS2-VASc, HAS-BLED, ATRIA and ORBIT) and performed the analyses retrospectively, since at the time of patient inclusion these newer scores had not yet been described and hence were not used to ‘clinically manage’ these patients. At the beginning of the study all patients were treated with acenocoumarol, which has a shorter half-life than other VKAs. However, one strength of our study is the inclusion of consecutive AF patients who were stable on VKA (INR 2.0–3.0) for at least 6 months. Follow-up was also done in an anticoagulation clinic, where at the beginning of OAC therapy patients are carefully followed, according to a standardized care protocol. This aspect may have reduced the number of bleeding events and may limit generalizability to other settings with less intense follow-up.

Conclusions

In AF patients taking VKAs, the HAS-BLED score had the best predictive ability. Adding labile INR (TTR <65%) to ATRIA, ORBIT and HEMORR2HAGES scores improved their predictive value for major bleeding leading to improved clinical usefulness and a higher net benefit compared to the original scores. This suggests that these three scores would perform suboptimally in VKA users by not considering ‘labile INR’ as a criterion for bleeding.