Major Bleeding in Patients with Non-Valvular Atrial Fibrillation: Impact of Time in Therapeutic Range on Contemporary Bleeding Risk Scores

Bleeding risk is a major concern in anticoagulated patients with atrial fibrillation (AF). Several bleeding prediction scores have been described: HAS-BLED, ATRIA, HEMORR2HAGES and ORBIT. Of these, only HAS-BLED considers the quality of anticoagulation control amongst vitamin K antagonist (VKA) users. We hypothesised that the predictive value of bleeding risk scores other than HAS-BLED could be improved by incorporating time in therapeutic range (TTR) in warfarin-treated patients. Of the 127 adjudicated major bleeding events, 21.3% occurred in the ‘low-risk’ HAS-BLED category (1.8 per 100 patient-years), compared with higher proportions (≥50% of events; ~2.5 per 100 patient-years) in the ‘low-risk’ categories of the other scores. Only the ‘low-risk’ HAS-BLED category was associated with the absence of investigator-defined major bleeding events (OR: 1.46; 95% CI: 1.00–2.15). The ‘high’ or ‘medium/high’ risk categories of the HAS-BLED (p = 0.023) or ORBIT (p = 0.022) scores, respectively, conferred a significant risk of adjudicated major bleeding events. On Cox regression analysis, adjudicated major bleeding was associated only with the HAS-BLED (HR: 1.62; 95% CI: 1.06–2.48) and ORBIT (HR: 1.83; 95% CI: 1.08–3.09) ‘high-risk’ categories. Adding ‘labile INR’ (TTR < 65%) to ORBIT, ATRIA and HEMORR2HAGES significantly improved their reclassification and discriminatory performance. In conclusion, HAS-BLED appropriately categorised adjudicated major bleeding events into low-risk and high-risk patients, whilst ORBIT and ATRIA categorised most major bleeds into their ‘low-risk’ categories. Adding TTR to ORBIT, ATRIA and HEMORR2HAGES improved their predictive performance for major bleeding.

K antagonist (VKA, e.g. warfarin) users, despite the very strong association of poor TTR with major bleeding 3,22,23. VKAs are still very widely used as OAC therapy worldwide, and clinically useful bleeding risk scores therefore also need to be applicable to VKA users.
Recently we investigated the impact of adding TTR to the ORBIT and ATRIA bleeding risk scores, compared with HAS-BLED, in predicting 'clinically relevant bleeding' in the AMADEUS trial cohort 24. Adding TTR to both the ORBIT and ATRIA scores improved their predictive ability. However, this analysis was hampered by a short follow-up period, a low number of adverse events and a broad definition of 'any clinically relevant bleeding' rather than a focus on major bleeding 24. A separate validation study in an independent cohort, adequately powered for major bleeding, is therefore needed to confirm our initial observations.
The objectives of the present analysis were: (i) to perform a comprehensive comparison of the four AF-validated bleeding risk scores (HAS-BLED, ORBIT, ATRIA and HEMORR2HAGES) in a large cohort of non-valvular AF patients; and (ii) to investigate whether the predictive value of bleeding risk scores other than HAS-BLED could be improved by incorporating TTR in VKA-treated patients.

Methods
We tested the HAS-BLED, ORBIT, ATRIA and HEMORR2HAGES bleeding scores in patients receiving warfarin in the pooled dataset from the Stroke Prevention using an Oral Thrombin Inhibitor in patients with atrial Fibrillation (SPORTIF) III and V studies [25][26][27]. The SPORTIF trials were two multicentre phase III clinical trials comparing the efficacy and safety of the direct thrombin inhibitor ximelagatran with those of warfarin in preventing thromboembolic stroke in non-valvular AF patients.
De-identified datasets with patient-level information for the SPORTIF trials were obtained directly from AstraZeneca, and all analyses were performed independently of the company. All patients assigned to the warfarin treatment arms who had data available for the clinical variables used to calculate the four bleeding prediction scores were included in the present analysis. Detailed methods for the evaluation of anticoagulation control, the assessment of the HAS-BLED, ORBIT, ATRIA and HEMORR2HAGES bleeding scores and the definition of study outcomes are reported in the web-only Supplementary Material. We considered 'major bleeding' events in two distinct ways: (i) "investigator level" events, i.e. the crude number of all major bleeding events reported by any investigator at any study site; and (ii) "adjudicated events", i.e. the final trial-adjudicated major bleeding events after the independent central adjudication committee had evaluated all reported events. We made this distinction to assess how well the bleeding scores identified patients at low risk of bleeding when all bleeding events that could occur in the "real-life" management of AF patients were counted (i.e. at the prescriber or investigator level), as opposed to only those events meeting the strict adjudicated trial protocol criteria.
Statistical Analysis. All continuous variables were tested for normality with the Shapiro-Wilk test. Normally distributed variables were expressed as mean and standard deviation and compared using the t-test. Non-normally distributed variables were expressed as median and interquartile range (IQR) and compared using the Mann-Whitney U test. Categorical variables, expressed as counts and percentages, were analysed using the chi-squared test.
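The categorical comparisons described above can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not the SPSS procedure used in the study. The function names and the 2 × 2 counts are hypothetical. The sketch computes the uncorrected Pearson chi-squared statistic for a 2 × 2 table (for example, low vs. high risk category by sex). It then obtains the 1-degree-of-freedom p-value from the identity P(χ² > x) = erfc(√(x/2)).

```python
import math


def chi2_1df_p(stat: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-squared test for a 2 x 2 table [[a, b], [c, d]].

    Rows could be risk category (low, high) and columns a characteristic
    (present, absent). Returns (statistic, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, chi2_1df_p(stat)


# Hypothetical counts (not study data): females / males in low vs high risk.
stat, p = pearson_chi2_2x2([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A continuity (Yates) correction is deliberately omitted here. Whether one was applied in the study is not stated.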
A logistic regression analysis was performed to investigate the association between the "low risk" bleeding category and the clinical endpoint of "absence of major bleeding". This analysis was performed according to the "investigator level" major bleeding events group.
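The logistic model above can be illustrated with a hedged sketch. With a single binary predictor, the logistic-regression odds ratio equals the crude odds ratio from the 2 × 2 table, and its Wald interval equals Woolf's log-scale interval. The odds ratios reported in this study were adjusted, so this unadjusted example shows only the meaning and direction of the estimate. The function name and counts are hypothetical. An OR > 1 means the 'low risk' category is associated with the absence of major bleeding.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio for *absence* of bleeding, low vs high risk, with Woolf 95% CI.

    a: low risk, no major bleed    b: low risk, major bleed
    c: high risk, no major bleed   d: high risk, major bleed
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the log-scale interval undefined")
    odds_ratio = (a * d) / (b * c)  # (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts only.
print(odds_ratio_woolf(20, 10, 10, 20))
```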
Event-free survival analysis, assessed by an intention-to-treat approach, was performed according to bleeding risk categories and differences in survival distributions between subgroups were analysed using the log-rank test. A Cox proportional hazards analysis was used to evaluate the occurrence of major bleeding according to bleeding risk categories, based on "adjudicated events".
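The Cox model can be illustrated with a minimal, unadjusted sketch. It fits a single binary covariate (1 = high or medium/high risk category, 0 = low risk) by Newton-Raphson on the partial likelihood. Tied event times are handled with Breslow's approximation, and a Wald 95% CI is reported. The models in this study were adjusted for sex and AF type; the function name and the toy data below are hypothetical.

```python
import math


def cox_single_covariate(times, events, x, iters=50, tol=1e-10):
    """Unadjusted Cox model with one covariate, fitted by Newton-Raphson.

    Ties are handled with Breslow's approximation. Returns
    (hazard ratio, lower 95% CI, upper 95% CI) using a Wald interval.
    """
    n = len(times)
    event_times = sorted({t for t, e in zip(times, events) if e})
    if not event_times:
        raise ValueError("no events")
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for t in event_times:
            risk = [i for i in range(n) if times[i] >= t]              # still at risk at t
            failed = [i for i in risk if times[i] == t and events[i]]  # events at t
            w = [math.exp(beta * x[i]) for i in risk]
            s0 = sum(w)
            s1 = sum(wi * x[i] for wi, i in zip(w, risk))
            s2 = sum(wi * x[i] ** 2 for wi, i in zip(w, risk))
            d = len(failed)
            score += sum(x[i] for i in failed) - d * s1 / s0  # partial-likelihood score
            info += d * (s2 / s0 - (s1 / s0) ** 2)             # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Hypothetical toy data: follow-up (years), 1 = major bleed, 1 = high-risk category.
times = [0.4, 1.1, 1.6, 0.9, 1.4, 1.8]
events = [1, 1, 0, 1, 0, 0]
high_risk = [1, 1, 1, 0, 0, 0]
print(cox_single_covariate(times, events, high_risk))
```

Adjusting for sex and AF type would require a multivariable extension, with a score vector and an information matrix in place of the scalar quantities used here.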
Receiver operating characteristic (ROC) curves were constructed for all risk scores using the "adjudicated events" major bleeding group, in order to evaluate the predictive ability of each model. ROC curves were compared using the method of DeLong, DeLong and Clarke-Pearson 28. Continuous net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were computed using the "PredictABEL" R package, according to the methods described by Pencina et al. 29.
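The DeLong comparison of two correlated ROC curves (both scores measured on the same patients) can be sketched as follows. This is an illustrative pure-Python version of the published method, not the software used in the study. Function names are hypothetical, and the pairwise loops are O(m·n): adequate for a few thousand patients, but not optimised. The helper `two_sided_p` converts a z statistic into a two-sided p-value. Applied to the z values reported in the Results (1.923 and 2.521), it gives p ≈ 0.054 and p ≈ 0.012.

```python
import math


def _psi(case, control):
    # Mann-Whitney kernel: 1 if the case scores higher, 0.5 for ties, else 0.
    return 1.0 if case > control else (0.5 if case == control else 0.0)


def auc_components(scores, labels):
    """AUC plus DeLong placement values (labels: 1 = major bleed, 0 = none)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]  # per case
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]  # per control
    return sum(v10) / len(pos), v10, v01


def _cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2.0))


def delong_test(scores_1, scores_2, labels):
    """Compare two correlated AUCs; returns (auc1, auc2, z, two-sided p)."""
    auc1, v10_1, v01_1 = auc_components(scores_1, labels)
    auc2, v10_2, v01_2 = auc_components(scores_2, labels)
    m, n = len(v10_1), len(v01_1)
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, z, two_sided_p(z)


print(two_sided_p(1.923))  # ≈ 0.054
print(two_sided_p(2.521))  # ≈ 0.012
```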
To evaluate the impact of poor anticoagulation control on the bleeding scores, we performed an additional analysis in which one point for TTR < 65% was added to the ORBIT, ATRIA and HEMORR2HAGES scores, to determine whether this improved their predictive performance for major bleeding.
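Once TTR is known, the labile-INR modification reduces to a one-line rule. The method used to compute TTR is described in the Supplementary Material rather than here. The sketch below assumes the commonly used Rosendaal linear-interpolation method with a target INR range of 2.0–3.0. Both function names are hypothetical.

```python
def rosendaal_ttr(inr_series, low=2.0, high=3.0):
    """Percentage of days with interpolated INR in [low, high] (Rosendaal method).

    inr_series: list of (day, INR) tuples, sorted by day (integer days).
    """
    in_range = 0
    total_days = 0
    for (day0, inr0), (day1, inr1) in zip(inr_series, inr_series[1:]):
        span = day1 - day0
        for k in range(span):
            # Linear interpolation between consecutive INR measurements.
            inr = inr0 + (inr1 - inr0) * k / span
            if low <= inr <= high:
                in_range += 1
        total_days += span
    return 100.0 * in_range / total_days if total_days else float("nan")


def add_labile_inr_point(score, ttr, threshold=65.0):
    """Hypothetical helper: +1 point when TTR is below the threshold (65% here)."""
    return score + (1 if ttr < threshold else 0)


ttr = rosendaal_ttr([(0, 1.0), (10, 3.0), (20, 3.0)])
print(ttr, add_labile_inr_point(2, ttr))
```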
Two-sided p values < 0.05 were considered statistically significant. Analyses were performed using SPSS v. 22, except for the NRI and IDI computations, which used the PredictABEL R package.

Results
In the original combined SPORTIF dataset, a total of 3,665 patients were assigned to the warfarin treatment arm; data to calculate the bleeding risk scores for the present analyses were available for 3,551 patients (96.9%). The majority of patients were male (69.5%) and the median [ Fig. 1 (Panels A-D). High bleeding risk according to HAS-BLED (score ≥ 3) was seen in 71.0% of patients, whilst 7.5% of patients were at medium/high bleeding risk using the ORBIT score. The proportion of medium/high risk patients was 2.5% for ATRIA and 41.9% for HEMORR2HAGES.
Clinical Characteristics at Baseline. The distribution of clinical characteristics according to risk category for the various bleeding scores is reported in eTable 1. Higher proportions of females were found in the high or medium/high risk categories (p = 0.001 for HAS-BLED and p < 0.001 for the other scores). Coronary heart disease was more frequent in the high or medium/high risk categories for HAS-BLED (p < 0.001), ORBIT (p < 0.001) and HEMORR2HAGES (p = 0.026), but not for ATRIA (p = 0.13). No significant difference in previous stroke/transient ischemic attack was found between the low and medium/high ATRIA categories (p = 0.065). Similarly, heart failure did not differ across the ORBIT or ATRIA score categories (p = 0.091 and p = 0.477, respectively), and hypertension did not differ across the ORBIT score categories (p = 0.874). Thromboembolic risk progressively increased across risk categories for all scores.
The proportion of patients with good anticoagulation control (TTR > 70%) progressively decreased with increasing risk category for all scores except ATRIA (p = 0.424). The HAS-BLED low risk category had the highest proportion of patients with good anticoagulation control (64.7% vs 38.7% in the high risk category, p < 0.001).

Follow-Up Analysis. A median [IQR] follow-up of 1.6 [1.3–1.8] years yielded a total of 5,002 patient-years of observation. A total of 162 "investigator level" major bleeding events were recorded. Of these, 127 were validated as "adjudicated events", giving an overall incidence of 2.5 per 100 patient-years. Event rates by bleeding score and risk category are shown in Table 1.
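The crude incidence quoted above is simply the number of events divided by total follow-up. The sketch below reproduces it from the reported figures. As consistency checks, it also computes the share of investigator-level events confirmed by adjudication and the share of warfarin-arm patients included (96.9%, as reported above). It assumes a constant event rate over follow-up, and the helper name is hypothetical.

```python
def rate_per_100_py(events: int, patient_years: float) -> float:
    """Crude incidence rate per 100 patient-years (assumes a constant rate)."""
    return 100.0 * events / patient_years


overall = rate_per_100_py(127, 5002)   # adjudicated major bleeds
adjudicated_share = 100.0 * 127 / 162  # % of investigator-level events confirmed
included_share = 100.0 * 3551 / 3665   # % of warfarin-arm patients analysed
print(f"{overall:.1f} per 100 patient-years")   # 2.5 per 100 patient-years
print(f"{adjudicated_share:.1f}% adjudicated")  # 78.4% adjudicated
print(f"{included_share:.1f}% included")        # 96.9% included
```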
Using the HAS-BLED score, major bleeding rates in both the "investigator level" and "adjudicated event" groups progressively increased with increasing score and risk category. Conversely, for both the ORBIT and ATRIA scores, event rates decreased as the score increased and from the low to the medium/high risk categories.
Based on the HEMORR2HAGES score, most of the events occurred in the 'low risk' category. Of the 127 adjudicated major bleeding events, 21.3% occurred in the 'low risk' HAS-BLED category (1.8 per 100 patient-years). By comparison, 87.4% occurred in the low risk category for ORBIT, 96.6% for ATRIA and 52.0% for HEMORR2HAGES (approx. 2.5 per 100 patient-years) (Table 1).
Regression and Survival Analyses. On logistic regression analysis, the HAS-BLED 'low risk' category was associated with the absence of any 'investigator level' defined major bleeding events (low risk vs. high risk, adjusted odds ratio [OR]: 1.46, 95% CI: 1.00–2.15). Analysis of the bleeding scores as continuous variables, adjusted for sex and AF type, showed a significant association with adjudicated major bleeding events for all scores (eTable 2). Survival analysis showed that patients in the high or medium/high risk category had a higher risk of adjudicated major bleeding events for both the HAS-BLED (Log-Rank: 5.147, p = 0.023) and ORBIT (Log-Rank: 5.247, p = 0.022) scores. Log-rank analyses showed no significant differences between risk categories for either the ATRIA or the HEMORR2HAGES score.
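The log-rank statistics above are referred to a χ² distribution with 1 degree of freedom. A minimal two-group log-rank implementation is sketched below, using the standard library only; function names are hypothetical. Applying its p-value helper to the reported statistics gives p ≈ 0.023 for 5.147 and p ≈ 0.022 for 5.247, matching the values above.

```python
import math


def chi2_1df_p(stat):
    """Upper-tail p-value for chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (major bleed) or 0 (censored)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, group A
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk, group B
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)  # hypergeometric variance
    if var == 0:
        return 0.0, 1.0
    stat = o_minus_e ** 2 / var
    return stat, chi2_1df_p(stat)


# Converting the reported log-rank statistics to p-values:
print(chi2_1df_p(5.147))  # ≈ 0.023 (HAS-BLED)
print(chi2_1df_p(5.247))  # ≈ 0.022 (ORBIT)
```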
After adjustment for gender and type of AF, Cox regression analyses demonstrated that a high or medium/high risk category was significantly associated with adjudicated major bleeding for both HAS-BLED (HR: 1.62, 95% CI: 1.06–2.48) and ORBIT (HR: 1.83, 95% CI: 1.08–3.09).
Performance and Reclassification Analysis. Receiver operating characteristic (ROC) curve analysis (eTable 3) showed that all four scores were able to identify AF patients who experienced an adjudicated major bleeding event. On comparison of ROC curves, HEMORR2HAGES performed worse than ATRIA (z: 2.521, p = 0.012), with a borderline difference versus ORBIT (z: 1.923, p = 0.054). No other statistically significant differences were found between the risk scores. Using the method of Pencina et al. 29, we found a significant negative reclassification with the HEMORR2HAGES score compared with ORBIT and ATRIA (NRI: −0.2164, p = 0.016 and NRI: −0.3128, p < 0.001, respectively). IDI analyses also showed a loss in sensitivity for HEMORR2HAGES compared with all the other bleeding scores (Table 2).

Impact of TTR.
Modified scores, as continuous variables, were significantly associated (p < 0.001) with adjudicated major bleeding events (eTable 4). When considered as categorical variables, medium/high risk vs. low risk was also significantly associated with adjudicated events (p < 0.001 for ORBIT and ATRIA; p = 0.013 for HEMORR2HAGES).
Comparisons between the ROC curves of the modified scores and those of the original scores are summarised in Table 3. They show a significant difference for HEMORR2HAGES (p = 0.028) and borderline significance for the modified ATRIA score (p = 0.052). Reclassification analysis demonstrated that, after adding TTR < 65%, all three modified scores showed a significant improvement in reclassification and discrimination, with significant differences in NRI and IDI compared with the original scores that did not include TTR (Table 3). No significant difference in AUC was found for any of the modified scores (that included TTR) when compared with HAS-BLED (full data not shown).
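For orientation, the category-free NRI and the IDI of Pencina et al. can be written directly from their definitions. The study computed them with the PredictABEL R package. The Python below is an independent, hypothetical re-expression, not PredictABEL's code. It takes predicted risks from an 'old' and a 'new' model: for example, logistic models fitted on an original score and on its TTR-modified version (an assumption for illustration). It omits the standard errors and p-values that PredictABEL reports.

```python
def continuous_nri_idi(p_old, p_new, events):
    """Category-free NRI and IDI comparing predicted risks from two models.

    p_old, p_new: predicted probabilities of major bleeding per patient.
    events: 1 if the patient had an adjudicated major bleed, else 0.
    """
    ev = [(o, n) for o, n, y in zip(p_old, p_new, events) if y == 1]
    ne = [(o, n) for o, n, y in zip(p_old, p_new, events) if y == 0]

    def share(pairs, moved_up):
        # Fraction of patients whose predicted risk moved up (or down).
        if moved_up:
            return sum(1 for o, n in pairs if n > o) / len(pairs)
        return sum(1 for o, n in pairs if n < o) / len(pairs)

    def mean(values):
        return sum(values) / len(values)

    # Events should move up, non-events should move down.
    nri = (share(ev, True) - share(ev, False)) + (share(ne, False) - share(ne, True))
    # IDI: change in mean risk among events minus change among non-events.
    idi = ((mean([n for _, n in ev]) - mean([o for o, _ in ev]))
           - (mean([n for _, n in ne]) - mean([o for o, _ in ne])))
    return nri, idi


# Hypothetical predicted risks for four patients (two with a major bleed).
print(continuous_nri_idi([0.1, 0.2, 0.3, 0.4], [0.05, 0.1, 0.5, 0.6], [0, 0, 1, 1]))
```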

Discussion
Our principal finding was that the bleeding scores differed in their capacity to discriminate major bleeding in anticoagulated AF patients. Specifically, both HAS-BLED and ORBIT appropriately categorised adjudicated major bleeding events into low risk and high or medium/high risk patients, but the majority of adjudicated major bleeding events occurred in the 'low risk' ORBIT category. Second, adding a labile INR criterion (TTR < 65%) to ORBIT, ATRIA and HEMORR2HAGES improved their predictive performance for major bleeding compared with the original scores. Thus, both the ATRIA and ORBIT scores may perform suboptimally in identifying serious bleeding risk in a patient on warfarin, unless they are re-calibrated to take labile INRs (or TTRs) into consideration. In contrast, the HAS-BLED score already includes 'labile INR' as one of its criteria, which applies only to VKA users (the L criterion is not applicable if a NOAC is used). The HAS-BLED score has been well validated in predicting major bleeding in various clinical settings [11][12][13]30. It has been tested in untreated, aspirin, VKA 11 and non-VKA anticoagulant settings 16,17,31, as well as in AF and non-AF cohorts. HAS-BLED is also predictive of major bleeding during bridging 13 and in the setting of acute coronary syndrome and percutaneous coronary intervention 30. Previous direct comparisons with HEMORR2HAGES 15 and ATRIA 14 have shown that HAS-BLED was as good as (or even superior to) these scores in the evaluation of bleeding risk 11,17,32, including in 'real world' settings 33 and in predicting intracranial haemorrhage (ICH) 34.
The ORBIT score was derived from a large industry-sponsored observational registry that enrolled more than 7,400 AF patients 19. The authors proposed that this new bleeding score would allow bleeding risk in AF patients to be evaluated more easily than with other scores, using clinical variables readily obtained from the clinical history. The score was validated in the ROCKET-AF trial, a trial of anticoagulation with rivaroxaban vs. warfarin in high risk AF patients (only those with a CHADS2 score ≥ 2 were included, and those with a score of 2 were capped at 10%) 35. According to the published results, the ORBIT score performed similarly to its derivation cohort and statistically better than the other validated scores, HAS-BLED and ATRIA 20. As highlighted in the accompanying Editorial 21, statistical significance and clinical applicability need to be balanced 36   (e.g. TTR 50%), concomitant use of non-steroidal anti-inflammatory drugs, abnormal liver function would have an ORBIT score of 0 (i.e. low risk), but would have a HAS-BLED score of 4 (high risk) 21. As recommended in guidelines, the responsible physician would 'flag up' this patient with a high HAS-BLED score. In accordance with good clinical practice, the physician would then strive to control blood pressure, optimise the TTR (or switch to a NOAC) and reduce concomitant drug use. The ORBIT score would neither 'flag up' such a patient (relevant to automated 'alert flags' used in electronic health records) nor draw attention to the reversible bleeding risk factors.
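To make the contrast concrete, the sketch below encodes the HAS-BLED and ORBIT point allocations as commonly published. HAS-BLED assigns one point each for uncontrolled hypertension, abnormal renal function, abnormal liver function, stroke, bleeding, labile INR, age > 65 years, antiplatelet/NSAID use and alcohol excess. ORBIT assigns 1 point for age ≥ 75 years, 2 for reduced haemoglobin/anaemia, 2 for bleeding history, 1 for eGFR < 60 mL/min/1.73 m² and 1 for antiplatelet treatment. The exact definitions used in this analysis are given in the Supplementary Material, and the code should be checked against them. The field names are hypothetical. The example patient is one hypothetical profile consistent with the scores quoted above (ORBIT 0, HAS-BLED 4), not the case described in the cited Editorial.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    uncontrolled_hypertension: bool = False  # HAS-BLED H
    abnormal_renal: bool = False             # HAS-BLED A (renal)
    abnormal_liver: bool = False             # HAS-BLED A (liver)
    prior_stroke: bool = False               # HAS-BLED S
    bleeding_history: bool = False           # HAS-BLED B, ORBIT B
    labile_inr: bool = False                 # HAS-BLED L (VKA users only)
    antiplatelet: bool = False               # HAS-BLED D, ORBIT T
    nsaid: bool = False                      # HAS-BLED D
    alcohol_excess: bool = False             # HAS-BLED D (alcohol)
    low_hb_or_anaemia: bool = False          # ORBIT R
    egfr_below_60: bool = False              # ORBIT I


def has_bled(p: Patient) -> int:
    criteria = [p.uncontrolled_hypertension, p.abnormal_renal, p.abnormal_liver,
                p.prior_stroke, p.bleeding_history, p.labile_inr, p.age > 65,
                p.antiplatelet or p.nsaid, p.alcohol_excess]
    return sum(int(c) for c in criteria)


def orbit(p: Patient) -> int:
    return (int(p.age >= 75) + 2 * int(p.low_hb_or_anaemia)
            + 2 * int(p.bleeding_history) + int(p.egfr_below_60) + int(p.antiplatelet))


def has_bled_category(score: int) -> str:
    return "high" if score >= 3 else "low"  # high risk = score >= 3, as in Results


def orbit_category(score: int) -> str:
    return "low" if score <= 2 else ("medium" if score == 3 else "high")


patient = Patient(age=60, uncontrolled_hypertension=True, abnormal_liver=True,
                  labile_inr=True, nsaid=True)
print(has_bled(patient), has_bled_category(has_bled(patient)))  # 4 high
print(orbit(patient), orbit_category(orbit(patient)))           # 0 low
```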
Our study appears to confirm the illustrative case above. HAS-BLED appropriately categorised adjudicated major bleeding events into low risk and high risk patients, whilst the majority of major bleeding events occurred in patients categorised as 'low risk' by the ORBIT score. Also, the HAS-BLED risk category was associated with the absence of any "investigator defined" major bleeding events, whilst risk categorisation using the ORBIT score was not.
Consideration of poor anticoagulation control is crucial when evaluating bleeding risk. When TTR < 65% was added as a measure of poor anticoagulation control, our data show a stronger association with adjudicated major bleeding events and better risk stratification for the ORBIT, ATRIA and HEMORR2HAGES scores, with improved reclassification and discriminatory performance. These results confirm and extend our previous analyses: in the AMADEUS trial cohort, adding TTR to the ORBIT and ATRIA bleeding risk scores improved their predictive ability for clinically relevant bleeding 24. However, because the AMADEUS trial was stopped early and few major bleeding endpoints occurred during its short follow-up period, that analysis focused only on 'any clinically relevant bleeding'.
Our study reinforces the concept that neglecting anticoagulation control, expressed as TTR, in bleeding risk assessment would lead to reduced performance of bleeding prediction scores in patients treated with VKAs. This has important clinical implications, as it suggests that some bleeding scores (ORBIT and ATRIA) perform suboptimally in identifying serious bleeding risk in a patient on a VKA, unless they are re-calibrated to take labile INRs (or TTRs) into consideration. VKAs are still very commonly used as oral anticoagulants worldwide, despite the introduction of the NOACs. Over-simplified bleeding risk scores that are advocated for both VKA and NOAC users, but that ignore TTR in VKA users, could therefore underestimate bleeding risk and potentially lead to serious bleeding events. Of note, the HAS-BLED score already assigns 1 point for 'labile INR', and there was no difference in AUC between the modified scores (HEMORR2HAGES, ORBIT and ATRIA with TTR added) and HAS-BLED.
Limitations. This study is mainly limited by the post-hoc, retrospective nature of our analysis and by the relatively short follow-up period, although our analysis was ancillary to a well-conducted prospective randomised trial with adjudicated endpoints. Moreover, not all the factors required for score assessment (e.g. genetic factors for HEMORR2HAGES) were present in our dataset. In addition, we assessed renal function using only the Cockcroft-Gault creatinine clearance calculation, which differs from the definitions used in the original score schemes. Also, because study patients were carefully followed up as per the clinical trial protocol, risk factors would have been proactively managed. Some patients at very high bleeding risk were also excluded by the trial exclusion criteria. These factors could account for the low bleeding rates observed, including the low event rates even amongst 'high risk' patients. Nonetheless, the HAS-BLED score was developed to 'flag up' patients potentially at high risk of bleeding for more careful review and follow-up, so that reversible risk factors can be addressed before bleeding events actually occur.

Conclusion
In conclusion, the HAS-BLED score had the best predictive value in identifying VKA-treated patients at 'low risk' of major bleeding. In contrast, the majority of major bleeds occurred in patients categorised as 'low risk' using the ORBIT, ATRIA and HEMORR2HAGES scores. Second, adding labile INR (i.e. TTR < 65%) to the ORBIT, ATRIA and HEMORR2HAGES bleeding risk scores improved their predictive performance for major bleeding.