Introduction

Multiple clinical trials have demonstrated that longer atrial high-rate episodes (AHREs) are associated with an increased risk of atrial fibrillation (AF), ischemic stroke, and adverse cardiovascular outcomes1,2,3,4,5. Therefore, the current European Society of Cardiology (ESC) guidelines recommend comprehensive cardiovascular evaluation, including the stroke risk, medical comorbidities, and risk factors, in patients with AHREs detected by implanted devices. Notably, in patients with a longer duration of AHREs, more intensive monitoring can be more useful6,7. Previous clinical studies have demonstrated multiple predictors of AHREs in patients with pacemakers1,4,8,9,10. However, these inconsistent predictors paradoxically indicate that the etiology is ambiguous. Therefore, precisely estimating clinically relevant AHREs is an important part of optimal patient management after pacemaker implantation. Machine learning (ML) offers a computational and alternative approach to standard predictive modeling that recognizes complex characteristics within data11. To date, there have been several investigations of ML algorithms for coronary artery disease genetics12, cardiac resynchronization therapy outcomes, such as mortality and heart failure (HF)-related hospitalization13, HF with preserved ejection fraction14, and estimation of ventricular tachycardia recurrence and mortality after catheter ablation15 in the cardiology era. We hypothesized that ML algorithms can produce a predictive model for clinically relevant AHREs in individual patients that can be more useful than previously reported predictors. Herein, we sought to determine which class of ML algorithms has the highest predictive accuracy using data from a prospective multicenter registry.

Materials and methods

Study design

The cohort of patients in this study was derived from the evaluation of the Atrial Fibrillation occurrence in patients after Pacemaker implantation (AF-Pacemaker study), a prospective, multicenter, observational registry study performed in patients with AF aged > 18 years attending any of the 11 tertiary hospital centers comprising all geographical regions of Republic of Korea. The study enrollment period started in September 2017 and ended in July 2020.

The AF-Pacemaker study aimed to investigate the occurrence and management (including ablation therapy) of device-detected AF episodes in patients with pacemaker implants through a prospective, non-randomized, non-blinded, observational, multicenter design. The study was conducted in compliance with the ethical rules of the Declaration of Helsinki as a statement of ethical principles for medical research involving human subjects by the World Medical Association and approved by the Institutional Review Board of Yonsei University Health System (1-2017-0008). This study was registered at ClinicalTrials.gov (NCT03303872, First posted on October 6, 2017). The ethics committees of all 11 tertiary hospitals (Severance Hospital, Seoul National University Bundang Hospital, Seoul National University Hospital, Donga University Medical Center, Keimyung University Hospital, Ewha Womans University Medical Center, Daegu Catholic University Medical Center, Korea University Medical Center, Eulji University Hospital, CHA Bundang Medical Center, and Kangneung Asan Medical Center) approved this study, and all patients provided informed consent for their inclusion.

Study population

The study population included patients (1) eligible for permanent pacemaker implantation according to the guidelines on cardiac pacemaker implantation for sick sinus syndrome (sinus bradycardia, sinus pause of ≥ 3 s, tachy-brady syndrome, sinus node dysfunction, and chronotropic incompetence) or atrioventricular block (high-degree/complete atrioventricular block), (2) with an atrial sensing capability.

A total of 816 consecutive patients who were implanted with a St. Jude Medical dual-chamber rate-adaptive pacemaker (Assurity PM2240) with stored electrogram capabilities were enrolled. The pacemakers were incorporated with bipolar atrial and ventricular leads (Tendril MRI LPA1200M, Isoflex Optim 1944/1948, and Tendril ST Optim 1888TC) in all patients. The atrial and ventricular leads were placed in the right atrial appendage and right ventricular apex, respectively. We excluded patients with severe liver dysfunction (aspartate aminotransaminase/alanine aminotransferase level ≥ 3 times the normal upper limit) or severe renal dysfunction (serum creatinine level of ≥ 3.5 mg/dL or creatinine clearance of ≤ 30 mL/min), including conditions requiring dialysis; pregnant or lactating patients; or those malignant cancer, dilated cardiomyopathy, hypertrophic cardiomyopathy, severe valvular heart disease, or life expectancy of ≤ 12 months from enrollment. Further, 95 patients with missing data for analysis were excluded (Fig. 1). Based on the available data from 721 patients567 patients were used for model development and 154 for validation.

Figure 1
figure 1

Study population selection process.

Data collection, Pacemaker programming and AHREs detection

For the AF-Pacemaker study, data were collected by independent clinical research coordinators via Web-based case report forms on the Internet-based Clinical Research and Trial management system (iCReaT), a data management system established by the Centers for Disease Control and Prevention, Ministry of Health and Welfare, Republic of Korea (iCReaT Study No. C170004). Each center could see its data and those of the other participating centers.

An AHRE detection rate of 220 beats/min was programmed, and storage of up to four atrial electrograms of 12-s duration on automatic detection of AHREs was activated. At every visit, the longest duration of all AHREs was ascertained. The longest AHRE (> 6 min) was defined as clinically relevant AHREs. Device interrogation information was obtained at regular clinic visits every 6 months after pacemaker implantation. An interval of up to 3 months before and after each clinic visit was allowed. The AHREs were interrogated to compare electrocardiogram (ECG) traces and pacemaker AHREs data at the end of the ambulatory monitoring period. The clinicians were blinded to the atrial diagnostic data. A detailed study protocol has been published previously16.

Model development

Predictive feature selection

We collected 28 variables available before the time of pacemaker implantation, including baseline patient demographic characteristics, clinical information, medications, and 12-lead ECG, laboratory examination, Holter monitoring, treadmill test, and transthoracic echocardiography findings. The predictors were age, sex, body mass index, current or former smoking status, current or former alcohol consumption, baseline heart rate, baseline systolic blood pressure, baseline diastolic blood pressure, indications for pacemaker implantation (sinus node dysfunction or atrioventricular node disease), estimated glomerular filtration rate (eGFR), left atrium (LA) diameter, left ventricular ejection fraction (LVEF), QRS duration, corrected QT (QTc) interval, presence of HF, hypertension, diabetes mellitus, prior stroke or transient ischemic attack (TIA), vascular disease, chronic kidney disease, presence of ventricular premature contraction, presence of atrial premature contraction, dyslipidemia, and medications, including renin‒angiotensin‒aldosterone system blockers, beta adrenergic receptor blockers, calcium channel blockers, diuretics, and statins. Continuous variables were evaluated using point-biserial correlation coefficients17 in relation to the clinically relevant AHREs, and the correlation of discrete variables was measured using Cramér’s V18. Considering the two indicators, variables showing a low degree of correlation value using the knee point with the clinically relevant AHREs were excluded from the modeling process (Supplementary Table 1 and Supplementary Fig. 1). In addition, all continuous variables were normalized using Z-score normalization. Finally, the predictive models were constructed with subset data consisting of 10 variables based on model performance index for maximum prediction performance, excluding 18 variables, as shown in Supplementary Table 2.

Imbalanced data preprocessing

When a predictive model is trained on imbalanced data, it tends to classify patterns biased into majority classes and ignores the characteristics of minority classes. Considering the imbalance problem of the study population, we applied a balancing method using the synthetic minority oversampling technique (SMOTE)19, and the minority class (clinically relevant AHREs) was oversampled. Clinically relevant AHREs were noted in 14.4% of the original dataset but in 30.8% of the SMOTE-balanced dataset after oversampling.

Model derivation and algorithms

To develop ML algorithms, we divided the study population into a training set, in which the prediction algorithm for clinically relevant AHREs was derived, and a test set, in which the algorithm was evaluated based on hospital units. The training set was derived from the patient data of seven hospitals and the test set from the patient data of the four remaining hospitals. We developed three ML-based predictive models: the random forest (RF)20 algorithm, support vector machine (SVM)21 algorithm, and extreme gradient boosting (XGB)22 algorithm, in comparison to conventional logistic regression (Fig. 2).

Figure 2
figure 2

Flow diagram for the modeling process.

RF algorithm is an ensemble learning method that creates multiple prediction results by multiple decision trees and determine the outcome by majority vote from decision tree results. We developed our RF model with 300 decision trees (Supplementary Fig. 2) and the average tree depth is 16 ± 2.1 (mean ± SD). SVM is an algorithm that finds a hyperplane that divides two categories of data in a multi-dimensional space of data. Lastly, XGB is an ensemble algorithm to create a strong decision tree classifier by combining weak tree classifiers built sequentially that each subsequent tree is trained from residuals of the previous tree to reduce its error.

We applied five repetitions of the tenfold cross validation (CV)23 method to validate predictive models with hyperparameters tuned by a grid search algorithm for RF and SVM, and Bayesian optimization24 algorithm for XGB. To determine how each variable affects the prediction of outcome, we calculated the feature importance of individual variables using the average absolute deviation by sensitivity analysis for each model25.

Statistical analysis

Continuous variables were presented as means ± standard deviations for normally distributed values or medians and interquartile intervals for non-normally distributed values and categorical variables as numbers and percentages in each group. The baseline characteristics of the two groups were compared using Student’s t-test or Wilcoxon test for continuous variables and Pearson’s χ2 test or Fisher’s exact test for categorical variables. To evaluate the model discrimination and accuracy, we measured the area under the receiver operating characteristic (AUROC) curve, area under the precision–recall curve (AUPRC), F1-score, accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) and calculated the Brier score26, including reliability, resolution, and uncertainty. All analyses were performed using R (version 4.0.2; R Foundation for Statistical Computing, Vienna, Austria). The dedicated R packets and versions to validate and apply the 3 ML models has been published in GitHub (RF, R package version 4.6–14 https://CRAN.R-project.org/package=randomForest; SVM, R package version 0.9–29 https://CRAN.R-project.org/package=kernlab; XGB, R package version 1.4.1.1. https://CRAN.R-project.org/package=xgboost). Statistical significance was set at p values of < 0.05.

Results

Between September 2017 and July 2020, 721 patients were eligible for inclusion; their median age was 73 years, and 61.5% were women. The median follow-up duration was 18 months. Atrioventricular node disease (56.7%) was the most common indication for cardiac implantable electronic device implantation, followed by sinus node dysfunction (43.3%). A total of 104 patients (14.4%) experienced AHREs lasting for > 6 min, which were defined as clinically relevant AHREs in this study. The patients who experienced clinically relevant AHREs were significantly more likely to have a higher rate of sinus node dysfunction for pacemaker implantation, experience > 1% atrial premature contraction on pre-procedural Holter monitoring, use beta-adrenergic receptor blockers, and have a shorter QRS duration than those who did not experience clinically relevant AHREs (Table 1). The derivation set consisted of 567 patients from seven hospitals, and the validation set consisted of 154 patients from four hospitals. Supplementary Table 3 shows the characteristics of these two sets. The prevalence of former or current smokers and alcohol consumers and dyslipidemia, baseline systolic blood pressure, and LA diameter on echocardiography were higher in the derivation set. Further, this set of patients had lower baseline eGFR and LVEF and shorter QRS duration and QTc interval on electrocardiography.

Table 1 Baseline characteristics of the patients with and without clinically relevant AHREs.

Model performance

In the ML-based models, improvement in discrimination was achieved using the same data of the validation set (Table 2). The AUROCs achieved by each model was 0.742 for the RF algorithm, 0.675 for the SVM algorithm, and 0.745 for the XGB algorithm, which were numerically higher than 0.669 for logistic regression. The AUPRC and F1-score also improved, especially those for the XGB algorithm. The ROC and PRC curve plots are shown in Fig. 3. Using the Brier score, we achieved better performance in all three models based on the reliability, with lower values indicating higher agreement between observed and predicted risks, in comparison with logistic regression. The XGB algorithm had a higher resolution value, which indicated a prediction more accurate than that of the other algorithms across the spectrum risk.

Table 2 Performance characteristics of models in the validation set for predicting clinically relevant AHREs in patients with pacemaker.
Figure 3
figure 3

Receiver operating characteristic curve analysis (A) and precision–recall curve analysis (B) for each model.

Feature importance of the individual variables

We observed that beta-adrenergic receptor blocker use, prior stroke or TIA, LA diameter, and > 1% atrial premature contraction on Holter monitoring before pacemaker implantation consistently influenced the modeling process for predicting clinically relevant AHREs using the three ML-based models (Table 3 and Fig. 4).

Table 3 Feature importance in the three machine learning-based models.
Figure 4
figure 4

Feature importance index plot for each model.

Discussion

In this prospective, multicenter, observational registry study, we found that the three ML algorithms (RF, SVM, and XGB) were better at identifying individuals who will develop clinically relevant AHREs among those with pacemaker implants. The XGB algorithm showed better performance and Brier scores than did the other algorithms. We also defined and calculated the index of feature importance of the variables in all three ML-based models. We found consistent feature importances for beta-adrenergic receptor blocker use, prior stroke or TIA, > 1% atrial premature contraction on Holter monitoring before pacemaker implantation, and LA diameter on echocardiography.

The detection of clinically relevant AHREs with cardiac implantable electronic devices is becoming increasingly attractive for predicting the progression to AF or reducing the risk of embolic stroke. Although various predictors of AHREs4,8,27,28 have been identified, they did not show highly consistent results. ML is powerful and has become ubiquitous and indispensable for solving complex medical problems. It offers an improved description and development of decision support tools to predict clinical events and encourage steps forward. Although it has been reported that AHREs ≥ 30 s or ≥ 2 min is associated with cerebrovascular events in previous studies29,30, other studies have reported the more relevant diagnostic points for AHREs (> 6 min)1,31,32. The current ESC guidelines also suggest that a minimum of 5 to 6 min of AHREs is associated with progression to AF, ischemic stroke, and major adverse cardiovascular events3,5,6.

To our knowledge, this is the first study to apply an ML algorithm to predict clinically relevant AHREs in patients with pacemaker implants and demonstrate improved prediction compared with that obtained with traditional statistical methods.

We selected the RF, SVM, and XGB algorithms and applied them to each predictive model in comparison to the logistic regression model. Because our dataset was relatively small and imbalanced, we used the five repeated tenfold CV method, instead of specifying a validation set in a training set separately, and the SMOTE to oversample the data set. In this study, we assessed the AUROC, AUPRC, F1-score, accuracy, sensitivity, specificity, NPV, and PPV and calculated the Brier score to evaluate the performance metrics of each model. The model with the XGB algorithm achieved the best performance. The predictive models also provide an opportunity to understand the features that may contribute to clinically relevant AHREs. This approach can identify the quantitation of feature importance for each variable, and we found consistent (beta-adrenergic receptor blocker use, prior stroke or TIA, > 1% atrial premature contraction on Holter monitoring before pacemaker implantation, and LA diameter) factors in the predictive models. Some of these features (prior stroke or TIA and LA diameter) have been described previously8,16.

This study had several strengths. The predictive models were based on simple and readily available clinical characteristics. Although the models appear only as mathematical exercise, it can provide information that is indirectly helpful to clinicians. Recently, Perino et al.33 reported that with increasing AHREs lasting for > 6 min to > 24 h, the stroke risk increased in those who did not receive anticoagulation treatment and mostly decreased in those who did. Vergara et al.34 investigated the temporal association between AHREs and the risk of ventricular arrhythmias (VA). AHREs that precede VA increased the risk of VA recurrence. In these regards, the predicted probability for AHREs lasting for > 6 min could be shared with the patient, and anticoagulation or antiarrhythmic therapy could serve as a critical element to prevent ischemic stroke or the recurrence of VA. The temporal association was observed We compared the model evaluation performance metrics, and the XGB model yielded greater discrimination and accuracy than did the other models and logistic regression. Meanwhile, the SVM model exhibited relatively worse performance metrics than did the RF and XGB models because the ensemble model tends to have better performance and robustness than the single model. Our study showed that ML algorithms may play a role in precision cardiology. Although the baseline characteristics of the derivation and validation sets were slightly different, the model performance metrics showed acceptable results.

This study has some limitations. As the data sets, especially the validation set, were relatively small and contained limited features, the current analysis findings may not necessarily be representative of the predictors of clinically relevant AHREs in patients with implanted pacemakers. However, the patients were prospectively enrolled from 11 tertiary centers, which can yield some degree of generalizability. In this study, 61.5% were women. Many women were enrolled, but the ratio of men and women was comparable to similar studies in the same region35,36. Further assessment and improvement of the applicability of the predictive models are necessary for larger and different-race populations.

Finally, the models developed herein used only clinically relevant AHREs as the outcome data, not progression to clinical AF, ischemic stroke, MACEs, and death. Further improvements in multiple outcome prediction with higher accuracy should be explored.

Conclusion

Our study illustrated the utility of ML algorithms in estimating clinically relevant AHREs in patients with implanted pacemakers using easily obtainable preimplantation features. Classification of patients using these models can support clinical decisions for anticoagulation therapy to prevent adverse outcomes in selected patients. From this perspective, models need to be built and validated individually for each diagnosis. More high-quality evidence can be obtained by applying ML algorithms; consequently, the data obtained can aid in the optimal management of these patients with shared decision-making.