Optimal Multi-Stage Arrhythmia Classification Approach

Zheng, Jianwei; Chu, Huimin; Struppa, Daniele; Zhang, Jianming; Yacoub, Sir Magdi; El-Askary, Hesham; Chang, Anthony; Ehwerhemuepha, Louis; Abudayyeh, Islam; Barrett, Alexander; Fu, Guohua; Yao, Hai; Li, Dongbo; Guo, Hangyuan; Rakovski, Cyril

doi:10.1038/s41598-020-59821-7

Download PDF

Article
Open access
Published: 19 February 2020

Optimal Multi-Stage Arrhythmia Classification Approach

Jianwei Zheng¹,
Huimin Chu ORCID: orcid.org/0000-0002-8621-0708²,
Daniele Struppa¹,
Jianming Zhang³,
Sir Magdi Yacoub⁴,
Hesham El-Askary¹,
Anthony Chang⁵,
Louis Ehwerhemuepha^1,5,
Islam Abudayyeh ORCID: orcid.org/0000-0002-6366-4205⁶,
Alexander Barrett¹,
Guohua Fu²,
Hai Yao⁷,
Dongbo Li²,
Hangyuan Guo³ &
…
Cyril Rakovski¹

Scientific Reports volume 10, Article number: 2898 (2020) Cite this article

10k Accesses
56 Citations
2 Altmetric
Metrics details

Subjects

Abstract

Arrhythmia constitutes a problem with the rate or rhythm of the heartbeat, and an early diagnosis is essential for the timely inception of successful treatment. We have jointly optimized the entire multi-stage arrhythmia classification scheme based on 12-lead surface ECGs that attains the accuracy performance level of professional cardiologists. The new approach is comprised of a three-step noise reduction stage, a novel feature extraction method and an optimal classification model with finely tuned hyperparameters. We carried out an exhaustive study comparing thousands of competing classification algorithms that were trained on our proprietary, large and expertly labeled dataset consisting of 12-lead ECGs from 40,258 patients with four arrhythmia classes: atrial fibrillation, general supraventricular tachycardia, sinus bradycardia and sinus rhythm including sinus irregularity rhythm. Our results show that the optimal approach consisted of Low Band Pass filter, Robust LOESS, Non Local Means smoothing, a proprietary feature extraction method based on percentiles of the empirical distribution of ratios of interval lengths and magnitudes of peaks and valleys, and Extreme Gradient Boosting Tree classifier, achieved an F₁-Score of 0.988 on patients without additional cardiac conditions. The same noise reduction and feature extraction methods combined with Gradient Boosting Tree classifier achieved an F₁-Score of 0.97 on patients with additional cardiac conditions. Our method achieved the highest classification accuracy (average 10-fold cross-validation F₁-Score of 0.992) using an external validation data, MIT-BIH arrhythmia database. The proposed optimal multi-stage arrhythmia classification approach can dramatically benefit automatic ECG data analysis by providing cardiologist level accuracy and robust compatibility with various ECG data sources.

A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients

Article Open access 12 February 2020

Computer-aided detection of arrhythmogenic sites in post-ischemic ventricular tachycardia

Article Open access 27 April 2023

ECG-based machine-learning algorithms for heartbeat classification

Article Open access 21 September 2021

Introduction

ECGs represent the filtered electrical activity generated by the heart. An ECG from lead II presents a normal heartbeat under sinus rhythm that has a characteristic shape with three features, a P-wave presenting the atrial depolarization process, a QRS complex denoting the ventricular depolarization process, and a T-wave representing the ventricular repolarization. The normal feature sequence of the cardiac cycle is P-wave, QRS complex, and T-wave with sections between them called segments. Three such major segments are the PR, ST, and TP segments. Important periods within and between ECG waves are the PR, QT, and RR intervals.

Damage to the heart muscle or nerves can change the electrical activity of the heart and induce a corresponding change in the shape of the ECGs. Thus, ECG is a major clinical diagnostic tool for various heart abnormalities. Arrhythmias are a family of conditions characterized by aberrations from the normal rate or rhythm of the heartbeats. There are several dozen classes of arrhythmia with various distinct manifestations, excessively slow or fast heartbeats such as sinus bradycardia and atrial tachycardia, irregular rhythm with missing or distorted wave segments and intervals, or both. Arrhythmias have a wide and significant impact on public health, quality of life, and medical expenditures. For example, the common type of arrhythmia, atrial fibrillation (AFIB), is associated with a significant increase in the risk of cardiac dysfunction and stroke. According to the American Heart Association¹, in 2015 AFIB was the underlying cause of death in 23,862 people and was listed on 148,672 US death certificates. The estimates of the prevalence of AFIB in the United States ranged from 2.7 million to 6.1 million in 2010. Further, the AFIB prevalence is expected to rise to 12.1 million in 2030 as the average population age increases. In the European Union, the prevalence of AFIB in adults older than 55 years was estimated to be 8.8 million (95% CI, 6.5 –12.3 million) in 2010 and was projected to rise to 17.9 million in 2060 (95% CI, 13.6 –23.7 million). The weighted prevalence of AFIB in the Chinese population aged 35 years or older was 0.71%².

According to the existing screening and diagnostic practice, cardiologists review ECG data, establish the diagnosis, and begin implementing subsequent treatment plans such as anticoagulation or radiofrequency catheter ablation. However, the demand for high-accuracy automatic heart condition diagnoses has recently increased sharply in parallel with the public health policy of implementing wider screening procedures, and the adoption of ECG enabled wearable devices. Such classification methods have to properly account for the inter-person and intra-personal variability of ECG signals, distortion from noise, missing feature waves and intervals in many arrhythmia cases. A variety of algorithms have been proposed for removing noise from raw ECG data, extracting salient features from the smoothed ECG signals, and feeding them into an optimal classification method.

Some previous studies^3,4,5 have focused on the separation between AFIB and sinus rhythm (SR). These studies achieved a high accuracy of classification rate. Kennedy et al.³ proposed Random Forest (RF) and K Nearest Neighbors (KNN) to classify AFIB and SR by the coefficient of sample entropy (CoSEn), the coefficient of variance (CV), root mean square of the successive differences (RMSSD), and median absolute deviation (MAD). Zhu et al.⁴ suggested using maximum margin clustering with an immune evolutionary algorithm and features of wave and segment measurements for classifying ectopic heartbeats by the database from MIT laboratories at Boston’s Beth Israel Hospital (MIT-BIH). Asgari et al.⁵ proposed to use features of peak-to-average power ratio and log-energy entropy to detect AFIB by support vector machine (SVM) model. A high precision classification of a more extensive set of arrhythmia classes has been achieved with extensive neural network classification⁶. However, a complete comparison of the classification accuracy of multiple analytical algorithms and accompanying noise reduction and feature selection techniques for a large number of arrhythmia classes has not been performed yet.

In this work, we employed several signal noise reduction techniques, proposed a novel ECG feature extraction method, designed and implemented and a large computational comparison study across thousand of competing classification schemes based on new, proprietary, expertly labeled data. According to clinical relevance, 11 rhythms labeled by certified physicians were merged into 4 groups (SB, AFIB, GSVT, SR), SB only included sinus bradycardia, AFIB consisted of atrial fibrillation and atrial flutter (AFL), GSVT contained supraventricular tachycardia, atrial tachycardia, atrioventricular node reentrant tachycardia, atrioventricular reentrant tachycardia and wandering atrial pacemaker, and SR included sinus rhythm and sinus irregularity. These 4 group labels were used for training and testing of our models. The pipeline of the proposed multi-stage scheme is presented in Fig. 1. We utilized the Butterworth Low-pass filter to remove high-frequency noise, the Robust LOESS to eliminate baseline wandering and Non Local Means (NLM) to remove the remaining noise. The features extracted from ECGs included measurements of wave and segments provided by ECG machine and relation measurements among peaks and valleys, producing up to 39,830 features. In order to study the classification reliability of features, we defined 11 distinct feature combinations with respect to the type of features and the lead of the ECGs. This feature combination setting aimed to compare the performance of classification schemes using 12-lead and single-lead ECG data, and to evaluate the classification capacity of different feature combinations. Moreover, aiming to evaluate the additional cardiac conditions impact for the rhythm classification, we separated a small subset without such conditions from the entire dataset. Sequentially, these two datasets, with and without additional cardiac conditions, generated 22 datasets by 11 distinct feature combinations as mentioned above. As a common practice in machine learning, we rescaled the subject’s raw ECG signals to have maximum peak values of 1. Thus, we generated the new 22 datasets by rescaling the original 22 datasets. That allowed us to assess the effect of rescaling on classification accuracy as well. Using these 44 datasets, we carried out an exhaustive grid search spanning the ranges of all tuning hyperparameters for nineteen base classification algorithms and they combined with five optimal strategies such as bagging average, Adaboost, OneVsRest, OneVsOne, and Error-Correcting Output-Codes. The hyperparameters tuning, model fitting and optimal strategy evaluating were deployed on each dataset respectively. Thus, we compared thousands of competing strategies to discover the optimal multi-stage arrhythmia classification routine. The base classification algorithms that we studied were Decision Tree (DT), K Nearest Neighbors (KNN), Nearest Centroid (NC), Gaussian Naive Bayesian (GNB), Multinomial Naive Bayesian (MNB), Complement Naive Bayesian (CNB), Bernoulli Naive Bayesian (BNB), Linear Classifier (LC), Quadratic Discriminant Analysis (QDA), Multinomial Logistic Regression (MLR), Multi-layer Perceptron Neural Net (MPN), Ridge Regression Classifier (RRC), Linear Classifiers with Stochastic Gradient Descent (LCSGD), Passive Aggressive Classifier (PAC), Linear SVC (SVC), Random Forest (RF), Extremely Randomized Trees (ERT), Gradient Boosting Tree (GBT) and Extreme Gradient Boosting Tree (EGBT). Finally, EGBT and GBT models achieved the best classification performance and with details presented in the Results section. A presentation of complete results that include all competing schemes comparisons is shown in Supplementary sections C and D.

Results

We used confusion matrices and normalized confusion matrices to evaluate the performance of classification models and weighted average F₁-Score defined in 1 as criteria for selection of the best hyperparameters and models.

$$Weighted\ average\ {F}_{1}=\frac{\mathop{\sum }\limits_{j=1}^{n}{F}_{1j}\ast {N}_{j}}{\mathop{\sum }\limits_{j=1}^{n}{N}_{j}}$$

(1)

where n presents the number of different labels that will be classified, N_j is the total number of observations with label j and F_1j presents the F₁-Score associated with label j.

$${F}_{1}-Score=\frac{2\ast (precision\ast recall)}{precision+recall}$$

(2)

The F₁-Score, confusion matrix, and normalized confusion matrix presented below are the average results from 10-fold cross-validation with 20% testing data and 80% training data.

Firstly, EGBT model using Feature Group 5 dataset of patients without additional cardiac conditions attained the highest weighted average F₁-Score of 0.988 (shown in Table 1). GBT model using Feature Group 8 dataset of patients with additional cardiac conditions attained the highest weighted average F₁-Score of 0.97 (shown in Table 2). The confusion matrix and normalized confusion matrix for each model were presented in Figs. 2, 3, 4, and 5 respectively. For the dataset of patients without additional cardiac conditions, the average F₁-Score shown in Table 3 of the models using features in group 1 that were provided by the ECG machine is 0.021 lower than that of models using features in group 2 that includes engineered features on lead II. That is, the engineered features proposed by this work had higher classification capacity than the features measured by ECG machine. The full comparison results as mentioned in the introduction section is presented in Supplementary sections D.

Table 1 Report of EGBT with Feature Group 8 dataset of patients without additional cardiac conditions.

Full size table

Table 2 Report of GBT with Feature Group 5 dataset of patients with additional cardiac conditions.

Full size table

Table 3 F₁-Score comparison for different feature groups.

Full size table

Secondly, our results show that the presence of conduction findings such as premature ventricular contraction (PVC), right bundle branch block (RBBB), left bundle branch block (LBBB) and atrial premature contraction (APC) negatively impacted the accuracy of the arrhythmia classification algorithms. In particular, based on the same feature group, the average F₁-Score of the ECG dataset with these conditions was lower than that of datasets without them by 0.017 to 0.034 respectively (shown in Table 3). Furthermore, the multi-classification strategy interacted with the feature groups to provide scenario specific optimal approaches. The best models associated with each feature group are presented in Table 4. Table 4 shows that EGBT and GBT models dominate the highest classifiers for most scenarios.

Table 4 The best classification model list for each feature group.

Full size table

Thirdly, we tested rescaling effects by the best performance classification models and feature groups reported in Tables 1 and 2. The results show that for the dataset of patients with additional cardiac conditions, weighted average F₁-Score of the non-rescaling method is 0.001 lower than that of the rescaling method, while for the dataset of patients with additional cardiac conditions F₁-Score of the non-rescaling method is 0.0016 lower than that of the rescaling method. For each model mentioned in Tables 1 and 2, the confusion matrix and normalized confusion matrix associated with the non-rescaling method are shown in Figs. 6, 7, 8, and 9 respectively. The effect of rescaling the subject’s raw ECG signals to have maximum peak values of 1 has a very small positive effect on the classification accuracy of arrhythmia types. The idea of this rescaling approach is similar to the inclusion of random effects in linear models. Morever, rescaling is a generally recommended preprocessing procedure in nonparametric classification methods such as neural networks and boosting trees.

Lastly, we ascertained the performance advantage of our method consisting of noise reduction methods, feature extraction scheme and Extreme Gradient Boosting Tree classification model to classify normal heart beat and four conduction conditions (shown in Table 5) in the MIT-BIH database⁷. Two RR intervals close to each heartbeat were used to extract features. The approach we proposed attains an F₁-Score of 0.992 that is the weighted average score of 10-fold cross-validation with 10% testing data and 90% training data. Compared with available studies^{8,9,10,11,12,13,14,15,16,17}, the approach we proposed achieved the highest accuracy score by using all the data files in MIT-BIH database.

Table 5 Report for the classification of normal heart beat and four conduction conditions. /: Paced beat; L: Left bundle branch block beat. N: Normal beat; R: Right bundle branch block beat. V: Premature ventricular contraction.

Full size table

Discussion

We designed and implemented a large scale study aimed at finding the best multi-stage arrhythmia classification scheme. We carried out an extensive accuracy comparison among a range of 98 competing methods that are manifested in Supplementary section C. These multi-stage schemes consisted of a sequential application of denoising techniques, feature extraction methods and classification algorithms. We have provided methodological advancements to each of these steps. We propose a novel, three stage denoising method that includes Butterworth Low-pass filter to remove high-frequency noise (above 50 Hz), the Robust LOESS to eliminate baseline wandering and Non Local Means (NLM) to remove residual noise. We designed a novel, robust and optimal feature extraction strategy based on the magnitudes and lengths of peaks and valleys and distributional characteristics of their transformations. In particular, for each pair of peaks or valleys, we assessed the empirical frequency distribution of the ratio between the differences of heights and distances of the time, the ratio between the differences of widths and the distances of the times, as well as the ratio between the differences of the prominences and the distance of the time. The newly obtained features reveal the relationship between attributes of wave and time duration, which is a central key for recognizing possible rhythms. Thus, the feature extraction strategy in this project is more transparent and interpretable than the one that has been obtained via the use of deep neural networks⁶ and other automatic feature extraction methods⁵ where features are uninterpretable. We performed an extensive grid search of the classification method’s hyperparameters. We have shown that the optimal multi-stage classification approach as described above consisted of a three-stage noise reduction process, the new empirical frequency distribution feature extraction strategy, and extreme gradient boosting tree classification model combined with hyperparameters tuned via an exhaustive grid search attains arrhythmia classification accuracy that exceeds the level of professional cardiologists.

Our computational study compared 98 approaches that were trained on a new, expertly labeled high-quality data on 40,258 patients from the Shaoxing People’s Hospital (Shaoxing Hospital Zhejiang University School of Medicine) and Ningbo First Hospital of Zhejiang University. In total, 22 cardiologist and physician experts labeled and reviewed the rhythms and additional cardiac findings. This is a new, large size database of 12-lead ECGs and comprehensive rhythms and conditions labels. Previous related studies^3,4,5,6 were limited in the degree of novelty of the methodological approaches, the size of the samples and the diversity of cardiac conditions considered. We have made our database accessible to the scientific community for further scientific endeavors.

We assessed, for the first time, the additional classification accuracy attributable to analyzing 12-lead ECGs vs single lead ECGs. We have found that the accuracy based on 12-lead data increases the F₁-Score by 1.4%. We have also compared the algorithm’s ECG classification accuracy for patients with and without additional heart conditions. The accuracy decreases by 2% on average for patients with additional cardiac conditions such as PVC, APC, RBBB, and LBBB. The detrimental effect of these conditions on arrhythmia classification precision has not been previously studied^3,4,5,6. For patients without additional cardiac condition, EGBT model that fed by rescaled features extracted from lead II ECGs produced the highest accuracy rate. Given these two results, the approach can have important arrhythmia classification applications to data collected from wearable devices such as Apple watch.

Lastly, we used our method to achieve the highest classification accuracy (average 10-fold cross-validation F₁-Score of 0.992) using an external validation data, MIT-BIH. The proposed optimal multi-stage arrhythmia classification approach can dramatically benefit automatic ECG data analysis by providing cardiologist level accuracy and robust compatibility with various ECG data sources.

Methods

Study design and patients selection

Our novel data consisted of 40,258 12-lead ECGs, including 22,599 males and 17,659 females. The study participants were randomly chosen from over 120,000 subjects who visited the Shaoxing People’s Hospital (Shaoxing Hospital Zhejiang University School of Medicine) and the Ningbo First Hospital of Zhejiang University between 2013 and 2018. The institutional review board of Shaoxing People’s Hospital (Shaoxing Hospital Zhejiang University School of Medicine) and Ningbo First Hospital of Zhejiang University approved this study and granted the waiver of the requirement to obtain informed consent. The data contain 20% normal SR and 80% abnormal readings. The age groups with the highest prevalence were 51–60, 61–70, and 71–80 years representing 19.8%, 24%, and 17.3% respectively.

Each patient’s ECG data were collected over 10 seconds at a sampling rate of 500 Hz and labeled by cardiologist-supervised physicians. The data labels included 11 types of rhythm and 67 additional cardiac findings such as PVC, RBBB, LBBB and APC. A detailed description of the enrolled participants’ baseline characteristics and rhythm frequency distribution is presented in Table 6. Since some rare rhythms only have single unit readings, according to a suggestion from cardiologists, we have hierarchically merged several rare cases to upper-level arrhythmia types. After re-grouping labels of the dataset, this new setting of classes can significantly contribute to the training of the best approach. It also complies with and benefits the daily clinical practice. Thus, 11 rhythms were merged into 4 groups (SB, AFIB, GSVT, SR), SB only included sinus bradycardia, AFIB consisted of atrial fibrillation and atrial flutter (AFL), GSVT contained supraventricular tachycardia, atrial tachycardia, atrioventricular node reentrant tachycardia, atrioventricular reentrant tachycardia and wandering atrial pacemaker, and SR included sinus rhythm and sinus irregularity. Referring to guidelines^18,19,20. that recommend AFIB and AFL often coexist, in the present study any ECG with a rhythm of AFIB or AFL was classified into AFIB group. Merging sinus rhythm and sinus irregularity to SR group helps with distinguishing such a combination from the GSVT group, and sinus irregularity can be easily separated from sinus rhythm later by one single criterion, RR interval variation. Supraventricular tachycardia actually is a general term used in the daily ECG screening. For example, if the cardiologists cannot confirm atrial tachycardia or atrioventricular node reentrant tachycardia purely by ECG, they will give the general name supraventricular tachycardia. Therefore, the practice of merging all tachycardia originating from supraventricular locations to GSVT group was adopted in this work. Figures 10, 11, 12, and 13 depict 12-lead ECGs of randomly selected patients from the SR, AFIB, GSVT and SB groups respectively. The detailed definition of rhythm groups subseuently used for classification and the definition of rhythms labeled by certified physicians are presented in Supplementary section A.

Table 6 Rhythm information and baseline characteristics of the enrolled participants.

Full size table

We also investigated a separate and important issue in ECG analysis, the evaluation of the impact of additional cardiac conditions such as PVC, LBBB, RBBB, or APC on the rhythm classification accuracy. We created a subset of the data containing ECGs of subjects without such conditions consisting of 20,766 samples. Details on the distribution of arrhythmia types in the two datasets and the p-values comparing their prevalence are shown in Table 7. The differences between the sample prevalence of SB, SR, AFIB, and GSVT are statistically significant with magnitudes of 4.4%, 10.3%, −7.3% and −7.4% respectively.

Table 7 Participants with and without additional cardiac conditions.

Full size table

Noise reduction

When the ECG data was collected, the major noise contamination sources were power line interference, electrode contact noise, motion artifacts, skeletal muscle contraction, baseline wandering, and random noise. The baseline wandering could be induced by respiration. The frequency of the power line interference is 50-60 Hz while the frequency of the baseline wander is less than 0.5 Hz. The currently available noise reduction methods have both pros and cons. Adaptive Filter²¹ possesses desirable performance, but the reference signal is hard to get. Wavelet and Band Pass Filter²² need predetermined thresholds, the Morphology Technique²³ with dilation and erosion operation has similar issues. In the subsequent analyses, we implemented the Butterworth Low-pass filter to remove high-frequency noise (above 50 Hz), the Robust LOESS to eliminate baseline wandering and Non Local Means (NLM) to remove the remaining noise²⁴.

Non local means denoising

The NLM algorithm was introduced to address the preservation of repeated structures in digital images²⁵. Later, NLM was used to remove noise from ECG data²⁶ and further combined with the Empirical Mode Decomposition (EMD)²⁷.

NLM denoising reconstructs the true signal S(i) at all time points i through weighted averaging of all points D(j) within predefined range. The weights are determined by a similarity measure between D(i + δ) and D(j + δ), δ ∈ Δ.

$$S(i)=\frac{1}{Z(i)}\sum _{j\in N(i)}w(i,j)D(j)$$

(3)

where Z(i) = ∑_j w(i, j) and

$$w(i,j)=exp(-\frac{{\sum }_{\delta \in \Delta }{[D(i+\delta )-D(j+\delta )]}^{2}}{2{L}_{\Delta }{\lambda }^{2}})$$

(4)

wherein λ is a smoothness control parameter, and Δ represents a local patch of samples containing L_Δ samples. Thus, at each point, the NLM smoothing borrows information from all points that have similar patterns within the search range. The similarity measure determines how many periods will be included and averaged. We used the Gaussian kernel as a weight function in the smoothing step of our analysis. In this work, we included all data points in the patient 10-second ECG data, and set the patch window size to 10 and the smoothness control parameter λ to 1.5 times the estimated standard deviation of the noise σ. The median absolute deviation (MAD) method was used to estimate the variability of the noise.

$$\sigma =1.4826\ast MAD(R)=1.4826\ast median(| D-median(D)| )$$

(5)

where median denotes the median operator, and D = D(1), …, D(l), …, D(L) is the set of the local residuals of the selected homogeneous region of length L in the noisy signal D_n with D(l) = (2D_n(l) − (D_n(l − 1) + D_n(l + 1)))/$\sqrt{6}$. After passing through the above three denoising filters, the high frequency noise and baseline wandering are removed from the raw ECG data.

Features extraction

In previous studies, neural network models have been successfully employed in arrhythmia classification⁶. These models used sequential transformations of the raw data as features that were ultimately fed into a multinomial logistic regression classifier (softmax unit). The architecture of neural networks allows an infinite number of such models, and properly training even one of them requires large data and long computation time. Another common strategy is to extract features such as peak magnitudes, duration, distances between peaks, and their variability in the four major components of beats, P-wave, Q-wave, T-wave, and QRS complex. However, these features do not provide sufficient information for high accuracy classification of several arrhythmia types, especially the ones characterized by distortion or complete omission of some components. For instance, the P-waves of AFIB and AFL are commonly replaced by multiple flutter and fibrillation waves that are lower than a normal P-wave in amplitude and do not correspond to the QRS rhythm. Further, using Wavelet or Fast Fourier Transformation to extract frequency features will neglect the time domain information.

The major challenges of feature extraction are the variability in wave morphology among and within individuals and the distortion from various conditions. Moreover, individuals with different gender, age and race will have different ECGs in both amplitude and frequency. Thus, as a preliminary data manipulation step, we rescaled the ECG data using the maximum-minimum algorithm to unify the amplitude scale. We evaluated the rescaling influence for classification, and the performance of rescaling is discussed in the Results section.

In this project, we designed a novel and interpretable feature extraction method. As a part of our comparison of competing multi-stage classification schemes, we carried out an analysis of feature selection approaches that included a total of 11 distinct scenarios. The first and simplest set of features only included 11 basic characteristics of the signal while the last and most exhaustive set included 39,830 features. We added age and gender as features due to their importance in almost all medical data analyses. Other meaningful features such as the mean and variance of the RR intervals as well as RR interval counts that are only computed in lead II ECG were also included. Table 8 shows other features, the Feature Group 1 includes ventricular rate in beats per minute (BPM), atrial rate in BPM, QRS duration in millisecond, QT interval in millisecond, R axis, T axis, QRS count, Q onset, Q offset, totally 11 variables. The Feature Group 2 in Table 8 includes mean and variance of RR intervals, RR interval count, mean and variance of height, width, and prominence of QRS complex, non-QRS peaks, and valleys in lead II ECG, totally 23 variables. As depicted in Fig. 14, peaks and valleys here represent the local maxima and minima. The prominence of a peak or a valley measures how much the peak or valley stands out due to its intrinsic height and its location relative to neighbor peaks or valleys. Thus, the prominence is defined as the vertical distance between the peak point and its lowest contour line. For instance, the prominence of peak P2 in Fig. 14 is the vertical distance between point P2 and contour line CL02, rather than the distance between P2 and contour line CL01. The peaks and valleys were assigned to 3 subsets, QRS complex, non-QRS peaks, and Valleys. So that the relationship among peaks and valleys were measured on 6 distinct pairwise combinations, which consist of QRS complex Vs QRS complex, non-QRS peaks Vs non-QRS peaks, valleys Vs valleys, QRS complex Vs non-QRS peaks, QRS complex Vs valleys, and non-QRS peaks Vs valleys. Sequentially, for the 6 distinct pairwise combinations mentioned above, we computed the ratio of width difference over time difference, the ratio of height difference over time difference, and the ratio of prominence difference over time difference. However, such ratios cannot be directly used as feature inputs to the classification model, since each patient will have a different number of such ratios. Therefore, we formed an empirical frequency distribution table spanning 100 groups between the maximum value and minimum value of ratios. The same empirical frequency distribution table was constructed for the attributes of peaks and valleys (height, width, and prominence), and the location difference between peaks and valleys in 6 distinct pairwise combinations. For instance, in Figs. 15, 16, 17 the frequencies of each variable (height, width and prominence) that extracted from Lead II can be used as features feeding into a classification model and each variable has uniform 100 length. The full demonstration for feature extraction from 12 leads ECG can be found in Supplementary section B. Thus, the Feature Group 6 is designed for lead II ECG and consists of a total of 900 frequencies of height, width, prominence for QRS complex, non-QRS peaks, and valleys; a total of 600 frequencies of location difference; and a total of 1800 frequencies of the ratio between width difference and time difference, the ratio between height difference and time difference, and the ratio between prominence difference and time difference. The remaining feature groups derive from the features in group 1, 2, and 6. From what has been discussed above, we proposed a feature extraction method that can fully reveal the empirical frequency distribution of P, Q, R, S, T and the segments among them, the key factors to identify rhythms, and the results discussed later testified such strategy is reliable and robust.

Table 8 Feature groups table.

Full size table

Grid search for hyperparameters

We carried out the additional analysis that focused on the optimal selection of hyperparameters via a comprehensive grid search. We selected the optimal values of the hyperparameters based on the maximum average F₁-Score over 10-fold validation datasets. The hyperparameters and the corresponding classification algorithms implemented in the scikit-learn package²⁸ are presented in Table 9.

Table 9 Hyperparameters table.

Full size table

Ensemble classification methods

After completing the identification of the optimal hyperparameters, ensemble machine learning methods based on multiple sampling can be used to improve classification results. We studied two families of ensemble methods: averaging, and boosting. The first method consists of building numerous classifiers that are independently trained on different observed samples, and the individual results are averaged. This approach has the computational advantage of carrying out the independent training steps in parallel. In contrast, the second method builds a set of classification models that will work sequentially. A boosting model i is trained to classify observations. The misclassified samples from model i are added to the training samples for model i + 1. This process continues until a quasi-optimal model with the lowest misclassification probability is obtained. In this work we compared Bagging Average²⁹, Random Forest³⁰, AdaBoost³¹, GBT, EGBT³², and ERT³³.

Multi-classification problems can be decomposed into multiple binomial classification problems. In this study, we compared several strategies for the above decomposition such as, One Vs Rest, One Vs One, and Error-Correcting Output-Codes. After combining ensemble methods and the multi-classification strategies with meta classifiers including DT, KNN, NC, GNB, MNB, CNB, BNB, LC, QDA, MLR, MPN, RRC, LCSGD, PAC, SVC, RF, ERT, GBT and EGBT, 98 different combinations were compared in this study. Using 10-fold cross-validation, we found the best hyperparameters through an exhaustive grid search method. These optimal values attained the highest weighted average F₁-Score for the validation datasets.

Gradient boosting tree classifier

Gradient boosting tree classifier is an additive model that assembles a certain number of weak classifiers such as decision trees. The boosting procedure optimizes a cost function to find the best group of decision trees. Explicit regression gradient boosting algorithms^34,35 were developed by Jerome H. Friedman. Unlike popular stochastic gradient decent optimization, gradient boosting tree classifier needs to learn both best-fit functions and hyperparameters. The boosting tree model is a sum of M decision trees, which can be formulated as following:

$${f}_{M}(x)=\mathop{\sum }\limits_{m=1}^{M}T(x;{\theta }_{m})$$

(6)

where

$$T(x;\theta )=\mathop{\sum }\limits_{j=1}^{J}{\gamma }_{j}I(x\in {R}_{j})$$

(7)

with parameters $\theta ={\{{R}_{j},{\gamma }_{j}\}}_{1}^{J}$. Decision tree partitions the space of all joint predictor values into disjoint regions R_j, j = 1, 2, . . . , J. A constant γ_j is assigned to each such region according to the rule x ∈ R_j → f(x) = γ_j. Therefore, after training data is given, the learning object is to minimize the cost or loss function to find θ_m. Since directly minimizing the loss function L(y_i − f_M(x)) is difficult, it can be approximated in a forward stagewise boosting fashion by minimizing loss function iteratively in (6) at a time.

$$L({y}_{i},{f}_{m-1}({x}_{i})+T({x}_{i};{\theta }_{m}))$$

(8)

where ${\theta }_{m}={\{{R}_{jm},{\gamma }_{jm}\}}_{1}^{{J}_{m}}$.

If the region R_jm are given, finding the optimal constants γ_jm in each region is typically straightforward:

$${\widehat{\gamma }}_{jm}=argmi{n}_{{\gamma }_{jm}}\sum _{{x}_{i}\in {R}_{jm}}L({y}_{i},{f}_{m-1}({x}_{i})+{\gamma }_{jm})$$

(9)

Thus, the key question turns to finding proper regions R_jm and the one approximated solution is to fit the m^th iteration tree function T(x; θ_m) as close as possible to the negative gradient g_im = I(y_i = C_k) − P_k(x) where P_k(x) is the probability of the outcome variable that belongs to the ${K}_{}^{th}$ class C_k.

$${P}_{k}(x)=\frac{{e}^{{f}_{k}(x)}}{{\sum }_{l=1}^{K}{f}_{l}(x)}$$

(10)

Finally, a pseudo gradient boosting algorithm is given as following³⁶:

1.
Start with a constant model f_k0, k = 1, 2, …, K;
2.
For m = 1 to M:
1. 2.1
  For k = 1, 2, …, K:
  1. 2.1.1
    compute r_ikm = y_ik − p_k(x_i), i = 1, 2, …, N;
  2. 2.1.2
    Fit a regression tree to the targets r_ikm, i = 1, 2, …, N, giving terminal regions R_jkm, j = 1, 2, …, J_m;
  3. 2.1.3
    For j = 1, 2, …, J_m compute ${\gamma }_{jkm}=\frac{K-1}{K}\ast \frac{\sum _{{x}_{i}\in {R}_{jkm}}{r}_{ikm}}{\sum _{{x}_{i}\in {R}_{jkm}}| {r}_{ikm}| (1-| {r}_{ikm}| )}$;
  4. 2.1.4
    Update ${f}_{km}(x)={f}_{k,m-1}(x)+\mathop{\sum }\limits_{j=1}^{{J}_{m}}{\gamma }_{jkm}I(x\in {R}_{jkm})$;
3.
Output ${\widehat{f}}_{k}(x)={f}_{kM}(x),k=1,2,\ldots ,K.$;

The GBT and EGBT models used in this work were both tree boosting models but with different numerical implementations. EGBT enhances the boosting optimization by the Newton-Raphson method and provides more hyperparameters for the penalization of trees and the shrinking of the leaf nodes.

Codes availability

The source code of converter tool that converts ECG data file from XML format to CSV format can be found https://github.com/zheng120/ECGConverter, which contains both binary executable files, source code, and the user manual. The MATALB program for ECG denoising was put under https://github.com/zheng120/ECGDenoisingTool.

Data availability

Data records presented in this work consist of four parts: raw ECG data, denoised ECG data, diagnostic file, and attributes dictionary file. These files are available online from figshare (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.c.4560497.v1).

References

Emelia, J. & Benjamin, A. E. A. Heart disease and stroke statistics-2018 update: A report from the American Heart Association. Circ. 137, e67–e492 (2018).
Google Scholar
Zengwu, W. et al. The disease burden of atrial fibrillation in china from a national cross-sectional survey. Am. J. Cardiol. 122, 793–798 (2018).
Article Google Scholar
Kennedy Alan, G. D. B. R. M. J. D., Finlay, Dewar & Kieran, M. Finlay Dewar and Kieran, M. Automated detection of atrial fibrillation using r-r intervals and multivariate based classification. J. Electrocardiol. 49 (2016).
BohuiZhu, Y. D. & Hao, K. A novel automatic detection system for ecg arrhythmias using maximum margin clustering with immune evolutionary algorithm. Comp. Math. Methods Medicine 453402 (2013).
AsgariShadnaz, M. A. & Maryam, M. Automatic detection of atrial fibrillation using stationary wavelet transform and support vector machine. Comput. Biol. Med. 60, 132–142 (2015).
Article CAS Google Scholar
A.Y., Hannun et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Medicine 25, 65–69 (2019).
Article Google Scholar
Moody, G. & Mark, R. The impact of the mit-bih arrhythmia database. IEEE Eng in Med and Biol 20, 40–50 (2001).
Article Google Scholar
Yeh, Y.-C., Wang, W.-J. & Chiou, C. W. A novel fuzzy c-means method for classifying heartbeat cases from ecg signals. Meas. 43, 1542–1555 (2010).
Article Google Scholar
Özbay, Y., Ceylan, R. & Karlik, B. Integration of type-2 fuzzy clustering and wavelet transform in a neural network based ecg classifier. Expert. Syst. with Appl. 38, 1004–1010 (2011).
Article Google Scholar
Gothwal, H., Kedawat, S. & Kumar, R. Cardiac arrhythmias detection in an ecg beat signal using fast fourier transform and artificial neural network. J. Biomed. Sci. Eng. 4, 289–296 (2011).
Article Google Scholar
Shen, C.-P. et al. Detection of cardiac arrhythmia in electrocardiograms using adaptive feature extraction and modified support vector machines. Expert. Syst. with Appl. 39, 7845–7852 (2012).
Article Google Scholar
Martis, R. J., Acharya, U. R., Mandana, K., Ray, A. & Chakraborty, C. Application of principal component analysis to ecg signals for automated diagnosis of cardiac health. Expert. Syst. with Appl. 39, 11792–11800 (2012).
Article Google Scholar
Zadeh, A. E., Khazaee, A. & Ranaee, V. Classification of the electrocardiogram signals using supervised classifiers and efficient features. Comput. Methods Programs Biomed. 99, 179–194 (2010).
Article Google Scholar
Faezipour, M. et al. A patient-adaptive profiling scheme for ecg beat classification. IEEE Transactions on Inf. Technol. Biomed. 14, 1153–1165 (2010).
Article Google Scholar
Zidelmal, Z. & Amirou, A. Ould-Abdeslam, D. andMerckle, J. Ecg beat classification using a cost sensitive classifier. Comput. Methods Programs Biomed. 111, 570–577 (2013).
Article CAS Google Scholar
Kiranyaz, S., Ince, T., Pulkkinen, J. & Gabbouj, M. Personalized long-term ecg classification: A systematic approach. Expert. Syst. with Appl. 38, 3220–3226 (2011).
Article Google Scholar
Sayadi, O., Shamsollahi, M. B. & Clifford, G. D. Robust detection of premature ventricular contractions using a wave-based bayesian framework. IEEE Transactions on Biomed. Eng. 57, 353–362 (2010).
Article Google Scholar
January, C. T. et al. 2014 aha/acc/hrs guideline for the management of patients with atrial fibrillation. J. Am. Coll. Cardiol. 64, e1–e76 (2014).
Article Google Scholar
Page, R. L. et al. 2015 acc/aha/hrs guideline for the management of adult patients with supraventricular tachycardia. J. Am. Coll. Cardiol. 67, e27–e115 (2016).
Article Google Scholar
Kirchhof, P. et al. 2016 ESC Guidelines for the management of atrial fibrillation developed in collaboration with EACTS. Eur. Hear. J. 37, 2893–2962 (2016).
Article Google Scholar
Jané, R., Laguna, P., Thakor, N. & Caminal, P. Adaptive baseline wander removal in the ecg: Comparative analysis with cubic spline technique. Proc. Comput. Cardiol. 143–146 (1992).
Lin, H.-Y, S.-Y., Liang, Y.-L., Hob & H.-P., Maa Discrete-wavelet-transform-based noise removal and feature extraction for ecg signals. IRBM 35 (2014).
SunYan, L. C. K. & Shankar, M. K. Ecg signal conditioning by morphological filters. Comput. biology medicine 32, 465–79 (2002).
Article CAS Google Scholar
Rajendra Acharya, U., Jasjit, J. A. S.Suri, S. & Krishnan, S. M. (eds.) Advances in Cardiac Signal Processing (Springer, Berlin, Heidelberg, Berlin, Heidelberg, Germany, 2007).
Buades, A., Coll, C. B. & MOREL, J. M. A review of image denoising algorithms, with a new one. MULTISCALE MODEL. SIMUL 4, 490–530 (2005).
Article MathSciNet Google Scholar
Tracey, B. H. & Miller, E. L. Nonlocal means denoising of ecg signals. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING 59, 9 (2012).
Article Google Scholar
Tian, X. et al. Electrocardiogram signal denoising using extreme-point symmetric mode decomposition and nonlocal means. Sensors 16, 1584 (2016).
Article Google Scholar
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
MathSciNet MATH Google Scholar
Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
MATH Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
Article MathSciNet Google Scholar
Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. CoRR abs/1603.02754 (2016).
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42 (2006).
Article Google Scholar
Friedman, J. Greedy function approximation: A gradient boosting machine. The Annals Stat. 29, 1189–1232 (2001).
Article MathSciNet Google Scholar
Friedman, J., Hastie, T. & Tibshirani, R. Additive logistic regression: a statistical view of boosting. Annals Stat. 28, 2000 (1998).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Springer Series in Statistics (Springer New York Inc., New York, NY, USA, 2001).

Download references

Acknowledgements

This project has received funding from the Kay Family Foundation Data Analytic Grant. This project has received funding from 2018 Shaoxing Medical and Hygiene Research Grant, ID 2018C30070. We are grateful for the support of Shaoxing People’s Hospital (Shaoxing Hospital Zhejiang University School of Medicine) ECG department and Dr. Lingyun Ding from Ningbo First Hospital of Zhejiang University ECG department. Kyle Anderson and Sidy Danioko from Chapman University provided great comments and suggestions for this study. We are grateful for the medical device support from Zhejiang Cachet Jetboom Medical Devices CO.LTD. We received the software engineering support provided by Kelvin Zheng and Terence Wang from Global Customer Support of Schneider Electric Software.

Author information

Authors and Affiliations

Chapman University, Orange, USA
Jianwei Zheng, Daniele Struppa, Hesham El-Askary, Louis Ehwerhemuepha, Alexander Barrett & Cyril Rakovski
Ningbo First Hospital of Zhejiang University, Hangzhou, China
Huimin Chu, Guohua Fu & Dongbo Li
Shaoxing People’s Hospital (Shaoxing Hospital Zhejiang University School of Medicine), Shaoxing, China
Jianming Zhang & Hangyuan Guo
Imperial College London, London, USA
Sir Magdi Yacoub
CHOC Children’s Hospital, Orange, USA
Anthony Chang & Louis Ehwerhemuepha
Loma Linda University Health, Loma Linda, USA
Islam Abudayyeh
Zhejiang Cachet Jetboom Medical Devices CO.LTD, Hangzhou, China
Hai Yao

Authors

Jianwei Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Huimin Chu
View author publications
You can also search for this author in PubMed Google Scholar
Daniele Struppa
View author publications
You can also search for this author in PubMed Google Scholar
Jianming Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Sir Magdi Yacoub
View author publications
You can also search for this author in PubMed Google Scholar
Hesham El-Askary
View author publications
You can also search for this author in PubMed Google Scholar
Anthony Chang
View author publications
You can also search for this author in PubMed Google Scholar
Louis Ehwerhemuepha
View author publications
You can also search for this author in PubMed Google Scholar
Islam Abudayyeh
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Barrett
View author publications
You can also search for this author in PubMed Google Scholar
Guohua Fu
View author publications
You can also search for this author in PubMed Google Scholar
Hai Yao
View author publications
You can also search for this author in PubMed Google Scholar
Dongbo Li
View author publications
You can also search for this author in PubMed Google Scholar
Hangyuan Guo
View author publications
You can also search for this author in PubMed Google Scholar
Cyril Rakovski
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.W.Z., G.H., C.R., L.E., I.A., H.E. and D.S. designed and performed experiments, analyzed the data, and wrote the manuscript; H.C., J.Z., M.Y., A.C., I.A., G.F. D.L. and G.H. reviewed the data labels, and provided medical consultant; J.W.Z., A.B., C.R. and H.Y. implemented data analytic coding, and prepared figures; All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to Hangyuan Guo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Zheng, J., Chu, H., Struppa, D. et al. Optimal Multi-Stage Arrhythmia Classification Approach. Sci Rep 10, 2898 (2020). https://doi.org/10.1038/s41598-020-59821-7

Download citation

Received: 31 July 2019
Accepted: 04 February 2020
Published: 19 February 2020
DOI: https://doi.org/10.1038/s41598-020-59821-7

This article is cited by

Reduced lead ECG multi-label classification with higher generalization using 2D SEResnets with self attention
- Danish Sheikh
- Himanshu Verma
- Naveen Chauhan
Multimedia Tools and Applications (2024)
Robust electrocardiogram delineation model for automatic morphological abnormality interpretation
- Siti Nurmaini
- Annisa Darmawahyuni
- Anggun Islami
Scientific Reports (2023)
Detection of bradycardia from electrocardiogram signals using feature extraction and snapshot ensembling
- Subhadeep Sengupta
- Veena Mayya
- S. Sowmya Kamath
International Journal of Information Technology (2022)
Deep learning for single-lead ECG beat arrhythmia-type detection using novel iris spectrogram representation
- Ali Mohammad Alqudah
- Amin Alqudah
Soft Computing (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Discussion

Methods

Study design and patients selection

Noise reduction

Non local means denoising

Features extraction

Grid search for hyperparameters

Ensemble classification methods

Gradient boosting tree classifier

Codes availability

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links