Introduction

Oral anticoagulant (OAC) therapy is the cornerstone of management in patients with atrial fibrillation (AF) for the prevention of stroke and thromboembolism1,2,3. Bleeding risk is the main downside of anticoagulation, with a reported major bleeding incidence of 2.0 per 100 patient-years4. Therefore, simple and practical bleeding risk stratification tools are needed to aid clinical decision-making in AF patients5.

The HAS-BLED score6 is a simple, clinical risk factor-based score that has been well validated in various cohorts7,8,9,10,11,12,13. Two other bleeding risk scores, ATRIA14 and HEMORR2HAGES15, have been developed from large observational studies of AF populations and subsequently validated. The HAS-BLED score has been shown to be as good as, or superior to, these other (and arguably more complicated) bleeding risk scores11,16,17,18. More recently, the ORBIT bleeding score was derived from a large contemporary prospective AF registry19,20, with the aim of providing a simpler score for assessing bleeding risk in AF patients, irrespective of the type of OAC used. The limitations of the ORBIT score have recently been discussed21.

However, one major limitation of the ORBIT score and the other bleeding risk scores (apart from HAS-BLED) is that they do not include labile anticoagulation control (as reflected by time in therapeutic range [TTR]) amongst vitamin K antagonist (VKA, e.g. warfarin) users, despite the very strong association of poor TTR with major bleeding3,22,23. VKAs remain in widespread clinical use as OAC therapy worldwide, so clinically useful bleeding risk scores also need to be applicable to VKA users.

Recently, we investigated the impact of adding TTR to the ORBIT and ATRIA bleeding risk scores, compared with HAS-BLED, in predicting ‘clinically relevant bleeding’ in the AMADEUS trial cohort24. Adding TTR improved the predictive ability of both scores, although this analysis was hampered by a short follow-up period, a low number of adverse events and a broad definition of ‘any clinically relevant bleeding’ rather than a focus on major bleeding24. A separate validation study in an independent cohort, adequately powered for major bleeding, is therefore needed to confirm our initial observations.

The objectives of the present analysis were: (i) to perform a comprehensive comparison of the four AF-validated bleeding risk scores (HAS-BLED, ORBIT, ATRIA and HEMORR2HAGES) in a large cohort of non-valvular AF patients; and (ii) to investigate whether the predictive value of the bleeding risk scores other than HAS-BLED could be improved by incorporating TTR in VKA users.

Methods

We tested the HAS-BLED, ORBIT, ATRIA and HEMORR2HAGES bleeding scores in patients receiving warfarin in the pooled dataset from the Stroke Prevention using an Oral Thrombin Inhibitor in patients with atrial Fibrillation (SPORTIF) III and V studies25,26,27. The SPORTIF trials were two multicentre phase III clinical trials comparing the efficacy and safety of the direct thrombin inhibitor ximelagatran with warfarin for the prevention of thromboembolic stroke in non-valvular AF patients.

De-identified patient-level datasets for the SPORTIF trials were obtained directly from AstraZeneca, and all analyses were performed independently of the company. All patients assigned to the warfarin treatment arms with available data for the clinical variables needed to calculate the four bleeding risk scores were included in the present analysis. Detailed methods for the evaluation of anticoagulation control, the assessment of the HAS-BLED, ORBIT, ATRIA and HEMORR2HAGES scores and the study outcome definitions are reported in the web-only Supplementary Material.
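
The methods for evaluating anticoagulation control are deferred to the Supplementary Material. Purely as a hedged illustration, the sketch below computes TTR by linear interpolation between consecutive INR measurements (the Rosendaal approach commonly used in VKA trials). Whether SPORTIF used exactly this method, the 2.0–3.0 target range and the function name `ttr_rosendaal` are all assumptions made here, not details taken from this paper.

```python
def ttr_rosendaal(measurements, low=2.0, high=3.0):
    """Percentage of days with an interpolated INR inside [low, high].

    measurements: list of (day, inr) tuples sorted by day.
    Assumption: INR changes linearly between consecutive measurements,
    and one interpolated value is evaluated per day (daily-step version).
    """
    days_in_range = 0
    total_days = 0
    for (d0, inr0), (d1, inr1) in zip(measurements, measurements[1:]):
        span = d1 - d0
        if span <= 0:
            continue  # skip duplicate or out-of-order dates
        for offset in range(span):
            inr = inr0 + (inr1 - inr0) * offset / span
            total_days += 1
            if low <= inr <= high:
                days_in_range += 1
    if total_days == 0:
        raise ValueError("need at least two INR measurements on different days")
    return 100.0 * days_in_range / total_days


# Hypothetical INR series rising from 1.5 to 3.5 over 20 days.
print(ttr_rosendaal([(0, 1.5), (10, 2.5), (20, 3.5)]))  # 55.0
```

Interpolating day by day keeps the logic transparent; an exact version would instead compute the fraction of each interval that the straight line spends inside the range.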

We considered ‘major bleeding’ events in two distinct ways: (i) “investigator level” events, i.e. the crude number of all major bleeding events reported by any investigator at every study site; and (ii) “adjudicated events”, i.e. the final major bleeding events confirmed after the independent central adjudication committee had evaluated all reported events. This distinction allowed us to analyse the ability of the bleeding scores to correctly identify patients at low risk of bleeding when accounting for bleeding events that could occur in the “real-life” management of AF patients (i.e. at the ‘prescriber’ or investigator level), as opposed to events meeting the strictly defined adjudication criteria of the trial protocol.

Statistical Analysis

All continuous variables were tested for normality with the Shapiro-Wilk test. Normally distributed variables were expressed as mean and standard deviation and compared using a t-test. Non-normally distributed variables were expressed as median and interquartile range (IQR) and compared using the Mann-Whitney U test. Categorical variables, expressed as counts and percentages, were analysed by the chi-squared test. Logistic regression analysis was performed to investigate the association between the “low risk” bleeding category and the clinical endpoint of “absence of major bleeding”, using the “investigator level” major bleeding events.
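
As a minimal illustration of the categorical comparisons, the sketch below computes Pearson's chi-squared statistic (without continuity correction) for a 2×2 table and its p-value with one degree of freedom, using only the Python standard library. The counts and the function name are hypothetical; the study itself used SPSS, MedCalc and R.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    e.g. rows = low / high risk category, columns = female / male.
    Returns (statistic, p-value with 1 degree of freedom)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df the statistic is a squared standard normal,
    # so P(chi2 > x) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(stat / 2))
    return stat, p


# Hypothetical counts, for illustration only.
stat, p = chi2_2x2(10, 20, 30, 40)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```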

Event-free survival analysis, using an intention-to-treat approach, was performed according to bleeding risk categories, and differences in survival distributions between subgroups were analysed using the log-rank test. Cox proportional hazards regression was used to evaluate the occurrence of major bleeding (“adjudicated events”) according to bleeding risk categories.
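
The two-group log-rank comparison can be sketched in plain Python as below. At each distinct event time it accumulates observed minus expected bleeds in one group and the hypergeometric variance, then refers the statistic to a chi-squared distribution with one degree of freedom. This is a minimal sketch on hypothetical follow-up data, not the software implementation used in the study, and the Cox model is not reproduced.

```python
from math import erfc, sqrt


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if a major bleed occurred, 0 if censored.
    Returns (chi-squared statistic, p-value with 1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # observed minus expected bleeds in group A
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        at_risk_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        bleeds_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        bleeds_b = sum(1 for ti, e, g in data if g == 1 and ti == t and e)
        n = at_risk_a + at_risk_b
        d = bleeds_a + bleeds_b
        o_minus_e += bleeds_a - d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    stat = o_minus_e ** 2 / variance
    return stat, erfc(sqrt(stat / 2))


# Hypothetical data: group A bleeds early, group B later.
stat, p = log_rank([1, 2], [1, 1], [3, 4], [1, 1])
print(f"log-rank chi2 = {stat:.3f}, p = {p:.3f}")  # chi2 = 49/17, about 2.882
```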

Receiver operating characteristic (ROC) curves were constructed for all risk scores, using the “adjudicated events” major bleeding group, to evaluate the predictive ability of each model. ROC curves were compared using the method of DeLong, DeLong and Clarke-Pearson28. Continuous net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were computed using the “PredictABEL” R package, according to the methods described by Pencina et al.29.
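
For integer-valued scores such as these, the ROC area under the curve (AUC) equals the probability that a randomly chosen patient with an adjudicated bleed scores higher than a randomly chosen patient without one, with ties counted as one half (the Mann-Whitney formulation). The minimal sketch below uses hypothetical scores and a hypothetical function name; the DeLong variance needed to compare correlated AUCs is not reproduced.

```python
def auc_from_scores(scores_bleed, scores_no_bleed):
    """ROC AUC as the Mann-Whitney probability P(score_bleed > score_no_bleed),
    counting tied pairs as 0.5 (ties are frequent with integer risk scores)."""
    if not scores_bleed or not scores_no_bleed:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for s1 in scores_bleed:
        for s0 in scores_no_bleed:
            if s1 > s0:
                concordant += 1.0
            elif s1 == s0:
                concordant += 0.5
    return concordant / (len(scores_bleed) * len(scores_no_bleed))


# Hypothetical HAS-BLED-like scores: 10 of 12 pairs concordant.
print(auc_from_scores([3, 4, 2], [1, 2, 3, 0]))  # about 0.833
```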

To evaluate the impact of poor anticoagulation control, one point for TTR < 65% was added to the ORBIT, ATRIA and HEMORR2HAGES scores, to determine whether this improved their predictive performance for major bleeding.
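
The TTR modification amounts to one extra point for VKA users whose TTR is below 65%. The sketch below applies it and, for illustration only, re-categorises an ORBIT score using the published ORBIT cut-offs as we understand them (0–2 low, 3 medium, ≥4 high). Whether the modified scores kept these cut-offs is not stated in this section and is an assumption; the function names are hypothetical.

```python
def add_ttr_point(score, ttr_percent, on_vka=True):
    """Return the score plus 1 point when a VKA user has TTR < 65%."""
    if on_vka and ttr_percent < 65.0:
        return score + 1
    return score


def orbit_category(score):
    """ORBIT risk category (assumed cut-offs: 0-2 low, 3 medium, >=4 high)."""
    if score <= 2:
        return "low"
    if score == 3:
        return "medium"
    return "high"


original = 2  # hypothetical ORBIT score
modified = add_ttr_point(original, ttr_percent=50.0)
print(orbit_category(original), "->", orbit_category(modified))  # low -> medium
```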

Two-sided p values < 0.05 were considered statistically significant. All analyses were performed using SPSS v. 22.0 (IBM, NY, USA), MedCalc v. 15.6 (MedCalc Software, Belgium) and R for Mac OS X v. 3.2.1 (The R Foundation for Statistical Computing).

Results

In the original combined SPORTIF dataset, 3,665 patients were assigned to the warfarin treatment arm; data to calculate the bleeding risk scores were available for 3,551 patients (96.9%). Most patients were male (69.5%), and the median [IQR] age was 72 [66–77] years. A total of 706 (19.9%) patients were concomitantly treated with aspirin, and only 20.1% were VKA-naïve at baseline. The median [IQR] CHA2DS2-VASc score was 3 [2–4], with 3,074 (86.6%) patients categorised as ‘high risk’.

Bleeding Risk in Overall Population

The median [IQR] HAS-BLED score in the overall cohort was 3 [2–4], while the median ORBIT, ATRIA and HEMORR2HAGES scores were 1 [0–2], 1 [1–3] and 1 [1–2], respectively. The distribution of patients across the scores of each bleeding risk schema is shown in Fig. 1 (Panels A–D). High bleeding risk according to HAS-BLED (score ≥ 3) was seen in 71.0% of patients, whereas 7.5% of patients were at medium/high bleeding risk according to ORBIT. The proportions of medium/high risk patients were 2.5% for ATRIA and 41.9% for HEMORR2HAGES.

Figure 1

Distribution of the cohort across the scores of each bleeding risk scheme.

Clinical Characteristics at Baseline

The distribution of clinical characteristics according to the risk categories of each bleeding scheme is reported in eTable 1. Higher proportions of females were found in the high or medium/high risk categories (p = 0.001 for HAS-BLED and p < 0.001 for the other scores). Coronary heart disease was more frequent in the high or medium/high risk categories for HAS-BLED (p < 0.001), ORBIT (p < 0.001) and HEMORR2HAGES (p = 0.026), but not for ATRIA (p = 0.13). No significant difference in previous stroke/transient ischemic attack was found between the low and medium/high ATRIA categories (p = 0.065). Similarly, there was no difference in heart failure across the ORBIT or ATRIA score categories (p = 0.091 and p = 0.477, respectively), or in hypertension across the ORBIT score categories (p = 0.874). Thromboembolic risk increased progressively across risk categories for all scores.

The proportion of patients with good anticoagulation control (TTR > 70%) decreased progressively with increasing risk category for all scores except ATRIA (p = 0.424). The HAS-BLED low risk category had the highest proportion of patients with good anticoagulation control (64.7% vs 38.7% in the high risk category, p < 0.001).

Follow-Up Analysis

Over a median [IQR] follow-up of 1.6 [1.3–1.8] years (5,002 patient-years of observation), 162 “investigator level” major bleeding events were recorded. Of these, 127 were confirmed as “adjudicated events”, giving an overall incidence of 2.5 per 100 patient-years. Event rates according to each bleeding score and risk category are shown in Table 1.
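
The crude incidence follows directly from the event counts and total observation time given above. The short check below reproduces the reported 2.5 per 100 patient-years for adjudicated events and, for comparison, the corresponding investigator-level rate; the helper name is hypothetical.

```python
def rate_per_100_patient_years(events, patient_years):
    """Crude incidence rate per 100 patient-years."""
    return 100.0 * events / patient_years


patient_years = 5002
adjudicated = rate_per_100_patient_years(127, patient_years)
investigator = rate_per_100_patient_years(162, patient_years)
print(f"adjudicated: {adjudicated:.2f}, investigator level: {investigator:.2f}")
# adjudicated: 2.54, investigator level: 3.24
print(f"share of investigator-level events confirmed: {127 / 162:.1%}")  # 78.4%
```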

Table 1 Major bleeding event rates according to the bleeding risk scores.

Using the HAS-BLED score, major bleeding rates in both the “investigator level” and “adjudicated event” groups increased progressively with the bleeding risk score (and across bleeding risk categories). Conversely, for both the ORBIT and ATRIA scores, event rates decreased as the bleeding risk score increased and from the low to the medium/high bleeding risk category.

Based on the HEMORR2HAGES score, most events occurred in the ‘low risk’ category. Of the 127 adjudicated major bleeding events, 21.3% occurred in the ‘low risk’ HAS-BLED category (1.8 per 100 patient-years), compared with 87.4% in the ‘low risk’ category for ORBIT, 96.6% for ATRIA and 52.0% for HEMORR2HAGES (approx. 2.5 per 100 patient-years) (Table 1).

Regression and Survival Analyses

Logistic regression analysis showed that the HAS-BLED ‘low risk’ category was associated with the absence of any ‘investigator level’ major bleeding event (low risk vs. high risk, adjusted odds ratio [OR]: 1.46, 95% confidence interval [CI]: 1.00–2.13, p = 0.050). None of the other scores showed a significant association with the absence of ‘investigator level’ major bleeding.

When analysed as continuous variables and adjusted for sex and AF type, all scores were significantly associated with adjudicated major bleeding events (eTable 2). Survival analysis showed that patients in the high or medium/high risk category had a higher risk of adjudicated major bleeding for both HAS-BLED (log-rank: 5.147, p = 0.023) and ORBIT (log-rank: 5.247, p = 0.022). Log-rank analyses showed no significant differences between risk categories for the ATRIA and HEMORR2HAGES scores.

After adjustment for sex and AF type, Cox regression analyses showed that the high or medium/high risk category was significantly associated with adjudicated major bleeding for HAS-BLED (hazard ratio [HR]: 1.62, 95% CI: 1.06–2.48, p = 0.026) [Fig. 2, Panel A] and ORBIT (HR: 1.83, 95% CI: 1.08–3.09, p = 0.024) [Fig. 2, Panel B], but not for ATRIA (HR: 1.36, 95% CI: 0.50–3.69, p = 0.544) [Fig. 2, Panel C] or HEMORR2HAGES (HR: 1.41, 95% CI: 0.99–2.00, p = 0.057) [Fig. 2, Panel D].
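
A reported ratio and its 95% CI can be cross-checked against the quoted p value by assuming a Wald interval that is symmetric on the log scale. The sketch below is such a consistency check (the function name is hypothetical); it is not how the models were fitted.

```python
from math import erfc, log, sqrt


def p_from_ratio_ci(ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald p-value recovered from a ratio (HR or OR)
    and its 95% CI, assuming the CI is symmetric on the log scale."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(ratio) / se
    return erfc(abs(z) / sqrt(2))  # two-sided normal tail probability


for label, hr, lo, hi in [
    ("HAS-BLED", 1.62, 1.06, 2.48),
    ("ORBIT", 1.83, 1.08, 3.09),
    ("ATRIA", 1.36, 0.50, 3.69),
    ("HEMORR2HAGES", 1.41, 0.99, 2.00),
]:
    print(f"{label}: p ~ {p_from_ratio_ci(hr, lo, hi):.3f}")
```

For HAS-BLED and ORBIT this recovers the reported p = 0.026 and 0.024, and the same check on the logistic regression OR of 1.46 (1.00–2.13) gives approximately 0.050. Small discrepancies for the other scores are expected because the published HRs and CIs are rounded.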

Figure 2

Event-free survival for “adjudicated event” major bleeding according to the risk categories of each bleeding risk score.

Panel (A) HAS-BLED Solid Line = High Risk; Dashed Line = Low Risk; Panel (B) ORBIT Solid Line = Medium/High Risk; Dashed Line = Low Risk; Panel (C) ATRIA Solid Line = Medium/High Risk; Dashed Line = Low Risk; Panel (D) HEMORR2HAGES Solid Line = Medium/High Risk; Dashed Line = Low Risk.

Performance and Reclassification Analysis

ROC curve analysis (eTable 3) showed that all four scores were able to identify AF patients who experienced an adjudicated major bleeding event. Comparison of ROC curves showed that HEMORR2HAGES performed significantly worse than ATRIA (z: 2.521, p = 0.012), with a borderline difference versus ORBIT (z: 1.923, p = 0.054). No other statistically significant differences were found between the scores. Using the methods of Pencina et al.29, there was significant negative reclassification with HEMORR2HAGES compared with ORBIT and ATRIA (NRI: −0.2164, p = 0.016 and NRI: −0.3128, p < 0.001, respectively), and a loss of discrimination compared with all the other bleeding scores on IDI analyses (Table 2).
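
To make the reclassification metrics concrete, the sketch below computes the category-free (continuous) NRI and the IDI from two sets of predicted bleeding probabilities, following the definitions of Pencina et al. The probabilities are hypothetical (in practice they would come from models fitted on each score), the function name is hypothetical, and the PredictABEL implementation is not reproduced. A negative NRI, as reported for HEMORR2HAGES, means that the new model moves more patients in the wrong direction than in the right one.

```python
def continuous_nri_idi(p_old, p_new, events):
    """Category-free NRI and IDI comparing a new model against an old one.

    p_old, p_new: predicted bleeding probabilities per patient.
    events: 1 if the patient had an adjudicated major bleed, else 0."""
    up_e = down_e = up_ne = down_ne = 0
    n_e = n_ne = 0
    diff_e = diff_ne = 0.0
    for old, new, y in zip(p_old, p_new, events):
        if y:
            n_e += 1
            up_e += new > old    # correct direction for events
            down_e += new < old
            diff_e += new - old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old  # correct direction for non-events
            diff_ne += new - old
    if n_e == 0 or n_ne == 0:
        raise ValueError("need both events and non-events")
    # NRI: net upward movement in events plus net downward movement in non-events.
    nri = (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
    # IDI: change in mean predicted risk in events minus that in non-events.
    idi = diff_e / n_e - diff_ne / n_ne
    return nri, idi


nri, idi = continuous_nri_idi(
    [0.2, 0.3, 0.1, 0.2, 0.3, 0.4],   # old model (hypothetical)
    [0.4, 0.5, 0.05, 0.1, 0.35, 0.4], # new model (hypothetical)
    [1, 1, 0, 0, 0, 0],
)
print(f"NRI = {nri:.3f}, IDI = {idi:.3f}")  # NRI = 1.250, IDI = 0.225
```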

Table 2 Reclassification analysis for the various bleeding risk scores.

Impact of TTR

The modified scores, analysed as continuous variables, were significantly associated (p < 0.001) with adjudicated major bleeding events (eTable 4). As categorical variables, medium/high risk vs. low risk was also significantly associated with adjudicated events (p < 0.001 for ORBIT and ATRIA; p = 0.013 for HEMORR2HAGES).

Comparisons between the ROC curves of the modified scores and those of the original scores are summarised in Table 3 and show a significant difference for HEMORR2HAGES (p = 0.028), with borderline significance for ATRIA (p = 0.052). Reclassification analysis demonstrated that adding TTR < 65% produced a significant improvement in reclassification and discrimination for all three modified scores, with significant NRI and IDI values relative to the original scores without TTR (Table 3). No significant difference in AUC was found between any of the modified scores (including TTR) and HAS-BLED (full data not shown).

Table 3 Comparison of ROC curves and reclassification analysis for modified bleeding scores with TTR.

Discussion

Our principal finding was that the bleeding scores differed in their ability to discriminate major bleeding in anticoagulated AF patients; specifically, both HAS-BLED and ORBIT appropriately categorised adjudicated major bleeding events into low risk and high or medium/high risk patients, but the majority of adjudicated major bleeding events occurred in the ‘low risk’ ORBIT category. Second, adding a labile INR criterion (TTR < 65%) to ORBIT, ATRIA and HEMORR2HAGES improved their predictive performance for major bleeding compared with the original scores. Thus, both the ATRIA and ORBIT scores may perform suboptimally in identifying serious bleeding risk in patients on warfarin unless they are re-calibrated to take labile INRs (or TTRs) into account.

In contrast, the HAS-BLED score already includes ‘labile INR’ as one of its criteria, which applies only to VKA users (the L criterion does not apply if a non-VKA oral anticoagulant [NOAC] is used). The HAS-BLED score has been well validated in predicting major bleeding in various clinical settings11,12,13,30. It has been tested in patients who were untreated, on aspirin or on VKA11, and in non-VKA anticoagulant settings16,17,31, as well as in AF and non-AF cohorts. HAS-BLED also predicts major bleeding during bridging therapy13 and in the setting of acute coronary syndrome and percutaneous coronary intervention30. Previous direct comparisons with HEMORR2HAGES15 and ATRIA14 have shown that HAS-BLED is as good as (or even superior to) these scores in evaluating bleeding risk11,17,32, including in ‘real world’ settings33 and in predicting intracranial haemorrhage (ICH)34.

The ORBIT score was derived from a large industry-sponsored observational registry that enrolled more than 7,400 AF patients19. The authors proposed that this new score would allow bleeding risk in AF patients to be evaluated more easily than with other scores, using clinical variables readily obtained from the clinical history. The score was validated in the ROCKET-AF trial of rivaroxaban vs. warfarin in high risk AF patients (only patients with a CHADS2 score ≥2 were included, and those with a score of 2 were capped at 10%)35. In the published results, the ORBIT score performed in the validation cohort similarly to the derivation cohort, and statistically better than the other validated scores, HAS-BLED and ATRIA20. As highlighted in the accompanying Editorial21, statistical significance and clinical applicability need to be balanced36. For example, a 40-year-old man with prior stroke, labile INRs on warfarin (e.g. TTR 50%), concomitant use of non-steroidal anti-inflammatory drugs and abnormal liver function would have an ORBIT score of 0 (i.e. low risk) but a HAS-BLED score of 4 (high risk)21. As recommended in guidelines, the responsible physician would ‘flag up’ this patient with a high HAS-BLED score and, in accordance with good clinical practice, would strive to control blood pressure, optimise the TTR (or switch to a NOAC) and reduce the use of concomitant drugs. The ORBIT score would neither ‘flag up’ such a patient (relevant to automated ‘alert flags’ used in electronic health records) nor draw attention to the reversible bleeding risk factors.
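
The illustrative case can be scored with simplified versions of both schemes, as sketched below. The components follow the published HAS-BLED (H, A, S, B, L, E, D) and ORBIT (O, R, B, I, T) acronyms as we understand them; the yes/no inputs, the field names, the age and eGFR thresholds, and the helper names are simplifying assumptions, not definitions taken from this paper.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Simplified yes/no inputs (hypothetical field names); the real
    # definitions use blood pressure, laboratory values and thresholds.
    age: int
    uncontrolled_hypertension: bool = False
    abnormal_renal_function: bool = False
    abnormal_liver_function: bool = False
    prior_stroke: bool = False
    bleeding_history: bool = False
    labile_inr: bool = False
    antiplatelet: bool = False
    nsaid: bool = False
    alcohol_excess: bool = False
    anaemia: bool = False
    egfr_below_60: bool = False


def has_bled(p: Patient) -> int:
    points = [
        p.uncontrolled_hypertension,  # H
        p.abnormal_renal_function,    # A (renal)
        p.abnormal_liver_function,    # A (liver)
        p.prior_stroke,               # S
        p.bleeding_history,           # B (predisposition not modelled here)
        p.labile_inr,                 # L (VKA users only)
        p.age > 65,                   # E: elderly
        p.antiplatelet or p.nsaid,    # D: drugs
        p.alcohol_excess,             # D: alcohol
    ]
    return sum(points)


def orbit(p: Patient) -> int:
    return (
        (1 if p.age >= 75 else 0)           # O: older age
        + (2 if p.anaemia else 0)           # R: reduced haemoglobin/anaemia
        + (2 if p.bleeding_history else 0)  # B: bleeding history
        + (1 if p.egfr_below_60 else 0)     # I: insufficient kidney function
        + (1 if p.antiplatelet else 0)      # T: antiplatelet treatment
    )


case = Patient(age=40, prior_stroke=True, labile_inr=True, nsaid=True,
               abnormal_liver_function=True)
print(has_bled(case), orbit(case))  # 4 0
```

None of the four risk factors in this case appears among the ORBIT components, which is why the two scores diverge so sharply.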

Our study appears to confirm the illustrative case above: HAS-BLED appropriately categorised adjudicated major bleeding events into low risk and high risk patients, whilst the majority of major bleeding events occurred in patients categorised as ‘low risk’ by the ORBIT score. Also, HAS-BLED risk category was associated with the absence of any “investigator level” major bleeding event, whereas ORBIT risk category was not.

Consideration of poor anticoagulation control is crucial when evaluating bleeding risk. Our data show that adding TTR < 65% as a measure of poor anticoagulation control improves the association with adjudicated major bleeding events, the risk stratification, and the reclassification and discriminatory performance of the ORBIT, ATRIA and HEMORR2HAGES scores. These results confirm and strengthen our previous analysis in the AMADEUS trial cohort, where adding TTR to the ORBIT and ATRIA bleeding risk scores improved their predictive ability for clinically relevant bleeding24. However, because the AMADEUS trial was stopped early and recorded few major bleeding endpoints during its short follow-up, that analysis focused only on ‘any clinically relevant bleeding’.

Our study reinforces the concept that neglecting anticoagulation control, expressed by TTR, in bleeding assessment leads to reduced performance of bleeding prediction scores in VKA-treated patients. This has important clinical implications, as it suggests that some bleeding scores (ORBIT and ATRIA) are suboptimal in identifying serious bleeding risk in patients on VKA unless they are re-calibrated to take labile INRs (or TTRs) into account. VKAs are still very commonly used worldwide as oral anticoagulants; despite the introduction of NOACs, over-simplified bleeding risk scores that are advocated for both VKAs and NOACs but ignore TTR in VKA users could underestimate bleeding risk (and potentially lead to serious bleeding events). Of note, the HAS-BLED score already assigns 1 point for ‘labile INR’, and there was no difference in AUC between the modified scores (ORBIT, ATRIA and HEMORR2HAGES with TTR) and HAS-BLED.

Limitations

This study is mainly limited by the post-hoc, retrospective nature of our analysis and by the relatively short follow-up period, even though our study population was drawn from two well-conducted prospective randomised trials with adjudicated endpoints. Moreover, not all the factors required for score assessment (e.g. genetic factors for HEMORR2HAGES) were available in our dataset. In addition, we assessed renal function using only the Cockcroft-Gault creatinine clearance, which differs from the definitions used in the original score schemes. Also, study patients were closely followed up according to the trial protocol, risk factors would have been proactively managed, and some patients at very high bleeding risk were excluded by the trial exclusion criteria. These factors could account for the low overall bleeding rates and the low event rates even amongst ‘high risk’ patients. Nonetheless, the HAS-BLED score was developed to ‘flag up’ patients potentially at high risk of bleeding for more careful review and follow-up, so that reversible risk factors can be addressed (rather than waiting for bleeding events to occur).

Conclusion

In conclusion, the HAS-BLED score had the best predictive value in identifying those at ‘low risk’ of major bleeding amongst VKA-treated patients. Indeed, the majority of major bleeds occurred in patients categorised as ‘low risk’ by the ORBIT, ATRIA and HEMORR2HAGES scores. Second, adding labile INR (i.e. TTR < 65%) to the ORBIT, ATRIA and HEMORR2HAGES scores improved their predictive performance for major bleeding.

Additional Information

How to cite this article: Proietti, M. et al. Major Bleeding in Patients with Non-Valvular Atrial Fibrillation: Impact of Time in Therapeutic Range on Contemporary Bleeding Risk Scores. Sci. Rep. 6, 24376; doi: 10.1038/srep24376 (2016).