Towards better heartbeat segmentation with deep learning classification

Silva, Pedro; Luz, Eduardo; Silva, Guilherme; Moreira, Gladston; Wanner, Elizabeth; Vidal, Flavio; Menotti, David

doi:10.1038/s41598-020-77745-0

Download PDF

Article
Open access
Published: 26 November 2020

Towards better heartbeat segmentation with deep learning classification

Pedro Silva¹,
Eduardo Luz¹,
Guilherme Silva²,
Gladston Moreira¹,
Elizabeth Wanner³,
Flavio Vidal⁴ &
…
David Menotti⁵

Scientific Reports volume 10, Article number: 20701 (2020) Cite this article

7361 Accesses
19 Citations
3 Altmetric
Metrics details

Subjects

Abstract

The confidence of medical equipment is intimately related to false alarms. The higher the number of false events occurs, the less truthful is the equipment. In this sense, reducing (or suppressing) false positive alarms is hugely desirable. In this work, we propose a feasible and real-time approach that works as a validation method for a heartbeat segmentation third-party algorithm. The approach is based on convolutional neural networks (CNNs), which may be embedded in dedicated hardware. Our proposal aims to detect the pattern of a single heartbeat and classifies them into two classes: a heartbeat and not a heartbeat. For this, a seven-layer convolution network is employed for both data representation and classification. We evaluate our approach in two well-settled databases in the literature on the raw heartbeat signal. The first database is a conventional on-the-person database called MIT-BIH, and the second is one less uncontrolled off-the-person type database known as CYBHi. To evaluate the feasibility and the performance of the proposed approach, we use as a baseline the Pam-Tompkins algorithm, which is a well-known method in the literature and still used in the industry. We compare the baseline against the proposed approach: a CNN model validating the heartbeats detected by a third-party algorithm. In this work, the third-party algorithm is the same as the baseline for comparison purposes. The results support the feasibility of our approach showing that our method can enhance the positive prediction of the Pan-Tompkins algorithm from $97.84\%$/$90.28\%$ to $100.00\%$/$96.77\%$ by slightly decreasing the sensitivity from $95.79\%$/$96.95\%$ to $92.98\%/$ $95.71\%$ on the MIT-BIH/CYBHi databases.

Delineation of the electrocardiogram with a mixed-quality-annotations dataset using convolutional neural networks

Article Open access 13 January 2021

Contactless facial video recording with deep learning models for the detection of atrial fibrillation

Article Open access 07 January 2022

A foundational vision transformer improves diagnostic performance for electrocardiograms

Article Open access 06 June 2023

Introduction

False alarms may pose significant problems in intensive care units (ICU)¹. Excessive false events lead hospital staff to lose confidence in such types of equipment, and thus, alarms from real problems may go undetected. The excessive false alarm problem has been extensively studied in the literature^{1,2,3,4,5,6,7}.

Most of these approaches rely on the electrocardiogram (ECG) signal. The ECG is a vital signal, used to evaluate the electrical activity of the heart^8,9. Its structure is composed of six fiducial points, presented in Fig. 1. Each fiducial point represents an event during the contraction/relaxation of the heart. Due to the information provided by this type of signal, ECG serves as the primary source for calculation of heartbeat rate and also for the detection and classification of cardiovascular diseases, such as arrhythmia¹⁰.

A competition considering the importance of the false alarm rate detection was promoted, the 2015 PhysioNet/CinC Challenge¹². The 2015 PhysioNet/CinC Challenge was focused on five life-threatening arrhythmias. An arrhythmia can be described in two manners^13,14: (i) a sequence of irregular beats, or (ii) as a unique irregular cardiac beat. Whether being harmful or not, the arrhythmia requests attention. Some cases need to be treated immediately, and in others, just a precautionary measure is required. For example, the tachycardia is one kind of life-threatening arrhythmia and needs to be treated immediately. From the works carried during the competition¹², Plesinger et al.³ reported an outstanding score of 81.39% (the used competition score is a different weighted accuracy, i.e., $score = 100\cdot (TP + TN)/(TP + TN + FP + 5 \cdot FN)$). They proposed an approach based on the information of multiple channels, which were filtered by a cut-off frequency between 5 and 20 Hz, besides spectral features and descriptive residue statistics along with heuristic rules to achieve the reported metrics.

Arrhythmia detection is a straightforward application of the ECG signal. Therefore it relies heavily on the quality of the signal and also on the QRS detection algorithm (segmentation). Applications based on the ECG signal are commonly divided into four stages: pre-processing (filtering), ECG signal segmentation (QRS complex detection), signal representation using pattern recognition techniques, and classification algorithms. A failure in the segmentation stage propagates the error to the subsequent stages and directly affects the classification efficiency. Furthermore, the correct segmentation of the ECG signal and the identification of fiducial points are of paramount importance to reduce false alarms. However, many works in the literature^15,16 focus on reducing false alarms in the classification stage, neglecting the error propagated by false alarms in the segmentation stage. Thus the motivation of this work arises: to reduce false alarms in the segmentation stage by using state-of-the-art pattern recognition techniques (a.k.a deep learning). It is important to note that convolutional neural networks (CNNs) have been applied to classify electrocardiogram (ECG) heartbeats in the diagnosis of arrhythmia^17,18,19, which is a underlying subject to the scope of this work.

Several authors have worked on this problem (reducing false alarms during the segmentation stage), and one promising approach is the signal quality assessment, such as in Behar et al.¹⁵. The authors used machine learning to decide whether the signal is of good or bad quality. In according to Behar et al.¹⁵, the ECG signal is manually annotated in two classes (good quality signal and bad quality signal), down-sampled to 125 Hz, and seven quality indexes, which were used as a feature vector to train a support vector machine (SVM) classifier. The experiments were conducted in three databases, the Physionet, from the Computing in Cardiology (CinC) Challenge 2011²⁰, the MIT-BIH arrhythmia database²¹, and the MIMIC II²² database. Behar et al.¹⁵ reported improvements in reducing false alarm for ectopic beats, tachycardia rhythms, atrial fibrillation, and on sinus rhythm.

Other authors employed a multi-modal approach, as in¹⁶, in which multiple ECG leads were used along with the invasive blood pressure wave. Quality indices, which resemble handcrafted feature extraction, were used in conjunction with a Kalman filtering algorithm. The method is evaluated with the MIMIC II²² database, and external noises were artificially added to the signals. Also, in²³, a multi-modal approach is used to combine multiple ECG leads with pulse-oximeter (PPG) and arterial blood pressure (ABP) curves. A peak detection algorithm was proposed for each type of curve and improved by a quality assessment method. According to the authors, results showed a robust peak detection algorithm. The approach was evaluated on the Physionet Challenge 2015 database¹².

In this work, a different approach is proposed based on deep learning techniques. The approach consists of a deep learning model validating the QRS complexes patterns detected by a third-party algorithm. Rather than relying on signal quality or the noise associated with it, we detect the ECG wave pattern, i.e., we detect (or validate) a heartbeat only by its shape. One advantage of this approach is to benefit from hardware accelerators for deep learning. Nowadays, there are many off-the-shelf deep learning accelerators, which means easy and effective integration with real equipment. Besides that, the proposed approach could be constantly improved by means of online learning. As the third-party algorithm, we select the well-known Pan-Tompkins algorithm²⁴, since it is prevalent both in industry and academy. Moreover, it does not require significant computing resources. In summary, the main contributions of this work are:

An efficient method for heartbeat pattern classification that operates in real-time to improve heartbeat segmentation.
A CNN architecture for heartbeat classification.
A proposal of a cyber-physical embedded system for heartbeat segmentation.

This work extends the one presented in the 23rd Iberoamerican Congress on Pattern Recognition (CIARP 2018)²⁵ as follows:

It presents an improved methodology, in particular, regarding the criterion for the selection of negative samples for training the deep learning model.
It presents a more detailed evaluation and includes another challenging database (off-the-person category), i.e., the CYBHi database.
It improved the experimental methodology by combining the CNN model with a popular QRS detection algorithm²⁴.
It adds a proposal to employ our approach in an embedded system context.

The obtained results show the effectiveness of the proposed approach to improve the QRS detection algorithms. Our approach enhances the Pan-Tompkins algorithm²⁴ positive prediction from $97.84$ to $100.00\%$ in the MIT-BIH database and $91.81\%$ to $96.36\%$ in CYBHi. Though, there is a trade-off regarding sensitivity, and once there is a reduction from $95.79$ to $92.98\%$ in the MIT-BIH database and $95.86\%$ to $95.43\%$ in CYBHi.

In that sense, the proposed approach is feasible for real applications, since it allows the reduction of the false positive rate. The computational cost for the CNN inference has become increasingly attractive, since it is possible to embed the model in dedicated hardware, such as the Nvidia Jetson TX2 (available on https://developer.nvidia.com/embedded/jetson-tx2) and Field Programmable Gate Array (FPGA)²⁶, for instance. This scenario facilitates the process of including this approach in Cyber-Physical/embedded systems, which is the case of medical equipment²⁷.

Methods

In this section, we present the methodology used to train CNN for ECG heartbeat recognition. Our method aims to validate the response of a well-known QRS complex detector from the literature. One may treat the QRS complex detector as an R-peak detection or heartbeat detection.

The proposed approach is seen in Fig. 2 and can be divided into six main steps: (1) database split, (2) pre-processing, (3) train CNN, (4) R-peak detector, (5) validation of the R-peaks detected, and (6) evaluation. The database split is the process of separating it into train and test subsets. The pre-processing depends on the nature of the data and consists of dividing the original signal into several segments and apply data augmentation techniques. Step 3 is conducted using the training database to train a CNN. The R-peak detector consists of using some algorithm to detect the R-peak. The validation is given by authenticating whether the signal is a heartbeat (QRS complex) or not. In the last step (step 6), we report the metrics used to compare the algorithms.

Database split and pre-processing

This step aims to divide the database into training and testing partitions. The first one is used exclusively to train a CNN and the latter for the testing phase. This process is necessary to avoid over-fitting and an overestimation of the proposed approach.

The pre-processing stage includes several steps and an adjustment of the input data size. Furthermore, since CNN requires a specific input size, all the segments must have a specific shape and a fixed sample window in time. Then, the input has been standardized to have 300 samples, in a 360 Hz sample frequency signal, resulting in 833 ms length. As a result, for a database sampled in 1MHz, the correspondent samples in 833 ms (833 samples) must be reshaped to 300 samples. Thus any 833 ms/300 samples-length segments are feed-forward into the network without any specific filter pre-processing.

Data augmentation

This step also includes the application of data augmentation techniques for positive and negative samples. CNN benefits from this technique once it increases the amount of data and helps in convergence. Worth highlighting that, since CNN needs extra data only for training, we only apply the data augmentation techniques over the training sets.

According to Silva et al.²⁵, to construct the positive samples, simple data augmentation is applied by considering the centralized R-peak and heartbeat signal shifted by exactly $\pm 5$ samples. For the negative samples, the heartbeat is shifted by exactly $\pm 30$, $\pm 50$, $\pm 80$, and $\pm 120$. A similar scenario presented in²⁵ is considered: binary classification between segments with a heartbeat (positive samples) and without (negative samples). However, in this work, a different data augmentation approach is used to feed the deep learning model and this model applied with a different purpose. For the positive samples we use:

1.
Centralized R-peak.
2.
Shifted R-peak by ± 5 samples.
3.
Shifted R-peak by ± 10 samples.
4.
Shifted R-peak by ± 15 samples.
5.
Centralized R-peak with P-wave (375 ms before the R-peak) attenuated by 30%.
6.
Centralized R-peak with T-wave (375 ms after the R-peak) attenuated by 30%.
7.
Centralized R-peak with a reduction of 20% over the entire segment.
8.
Centralized R-peak with a reduction of 40% over the entire segment.

For the negative samples, all data between two R-peaks have used: 50 samples after the first R-peak and 50 samples before the second R-peak. This range marks the beginning and ending points of the sliding window, which is shifted by five-step stride (there is an overlapping among the samples within the two R-peaks). Figure 3 illustrates the data augmentation applied to the positive samples (sliding window, and wave manipulation) and Fig. 4 illustrates the construction of the negative ones. Since the QRS complex is the wave with the greatest amplitude within a heartbeat, it is less susceptible to noise. In contrast, the T and P waves have smaller amplitudes and usually a longer period of time and thus are more affected by all sources of noise. Thus, we propose a data augmentation attenuating the T and P waves, in order to force the model to be more immune to changes in the patterns of these waves.

R-peak detector

In this step, a third-party algorithm is used to detect the R-peak along with the segment. Essentially, the QRS-detector method in this stage should be fast and have low computational power consumption. This stage is an essential step for our approach, once the amount of R-peak segments detected impacts on the time required by our approach to finish the process. For each R-peak detected, the CNN trained is used to infer if it is a real heartbeat or not.

The process starts with the ECG signal as the input for the R-peak detector. Then, this ECG signal is processed by the algorithm. At this point, each R-peak detector may apply a specific pre-processing that best fits its needs. The response of the method is the sample with an R-peak location. Some methods, such as the Pan-Tompkins²⁴, may return a delay, which gives a range in where each R-peak may be.

In²⁴, the authors designed the method using integer arithmetic, aiming a reduction in the computing consumption power to be as lowest as possible. A digital band-pass filter is applied by composing high and low-pass filters to reduce the impact of the noise over the signal, followed by one differentiation step and further a squaring step to intensify the slope and reduce the false-positives caused by the T waves. To detect the R-peak, Pan and Tompkins²⁴ applied a sliding window along with an adaptive threshold, which results in an efficient and robust approach to discard noises. Therefore, it reduces the false-positive samples. To reduce the false-negative samples, those authors used a scheme with a dual-threshold, in which one is twice smaller than the other, and both have a continuous adaptation according to the current signal state.

Pan and Tompkins²⁴ outlined a strategy based on the periodicity of the R-peaks on an ECG record. For the case in which an R-peak is not found within 166% of the current average interval, the maximal point in this interval, which lies between two thresholds, is considered as an R-peak, and as a consequence, a heartbeat or QRS complex. The authors highlight that this technique is only feasible for individuals with a regular heartbeat (without arrhythmia). For arrhythmic individuals, the authors proposed a reduction of both thresholds by half, to raise the sensitivity of R-Peak detection.

The authors also added two essential constraints regarding R-peak detection: (1) the next R-peak must occur at least 200 ms at a physiological point of view, and (2) an R-peak detection approach needs to adapt parameters to each patient continuously.

CNN training

The CNN model/architecture used here is the same used in our preliminary work²⁵. It is composed of four convolutional layers, two fully-connected layers, a dropout layer to reduce over-fitting, and a final fully-connected layer with two neurons for binary classification: (1) this segment has an R-peak centered in the segment, and (2) segment without R-peak centered. Figure 5 shows such CNN architecture.

Different from our previous work²⁵, in this work, the deep learning model is used as a second judge for a well-known R-peak detector algorithm. The present approach aims to enhance the result from the R-peak detection algorithm, aligned with state-of-the-art trends²⁸.

Beforehand, to train CNN, a set of data is separated and labeled, usually by a human expert. This data is then used to generate positive and negative samples. Those samples are used to train the CNN as a simple binary classification problem: the output is a heartbeat or no heartbeat.

Validation of R-peaks detected

In this step, any algorithm presented in the literature which aims an R-peak detection can be used. However, this step needs three inputs: an ECG signal, the R-peaks location detected by an R-peak detector algorithm, and a machine learning model. The output of this step is a set of all R-peaks locations in which the machine learning model agrees with the R-peak detector.

In this step, an 833-ms window centered in each R-peak detected is feed-forwarded through the CNN. The CNN confirms whether it is an R-peak in the center of the segment or not.

Evaluation

To evaluate one database, a set of data is reserved as a testing partition. With the R-peaks validated by the machine learning model, the metrics used to compare the approaches are calculated. We compare the right and wrong detections of both approaches: the third-party algorithm by itself and the proposed approach, with the CNN as a validator. A correct heartbeat detection is considered when an R-peak is within the center of a segment with a tolerance of the shifts used in data augmentation described. A wrong detection occurs when an R-peak is not in this range.

Results

In this section, the experiments are described in detail. Also, the results reached with the proposed approach are presented as well as the discussion.

Experiment details

Database

To report the results presented in this section, we used two databases to train a CNN model: CYBHi¹¹ (off-the-person) and MIT-BIH²⁹ (on-the-person). To conduct fair experiments, we split both databases into two sets without patient intersection.

As the MIT-BIH database has a group of signals with healthy individuals and another group with individuals who have cardiac problems (arrhythmia), we decided to use only the first group to avoid impacts on the R-peak detectors algorithms and, therefore, in the final metrics reported. The healthy group has a total of 23 records, and each record received a numerical identification on the dataset. The records are: 100, 102, 104, 106, 108, 112, 114, 116, 118, 122, 124, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123. All heartbeats (approximately 110,000) available with the MIT-BIH database have its R-peak location annotated by two cardiologists in a separate manner and disagreements were resolved by a third person. The annotations are available on the Physionet website. As CNN model benefits of more data, we decided to use the odd records to train and even for testing. We stress that each record belongs to a single subject and that there is no overlap of subjects on both training and test sets.

The CYBHi has more registers when compared to the MIT-BIH database with 126 records. As the database is captured with an off-the-person device, it suffers more with noise. The data acquisition is made using two differential lead electrodes at hand palms and fingers, as shown in Fig. 6.

We have discarded 12 records from the CYBHi because our specialists weren’t able to detect the heartbeats due to excess of noise. Therefore, we have no ground truth to train the model. Figure 7 presents a 10-s-segment which should have approximately 10 R-peaks, however, it is hard to detect them and, subsequently, label them. The remaining 114 records are used for training and testing. For the construction of the CYBHi database, the data acquisition happens in two distinct 2-min sessions with 63 subjects, into the range of 3 months.

For each one of 63 subjects, two sessions were acquired in two different setups: Short-term signals and Long-term signals. Only the latest one is used in this work since it is a more challenging scenario³⁰. The CYBHi database’s authors did not provide the R-peak location annotation. Thus, this annotation was made by the researchers of this work and will be provided along with the source code. According to Luz and Menotti³¹, the data from a patient must be on the training set or in testing, not both. Upon this fact, we ignored the natural division of CYBHi database and used both sessions of an individual only to train the CNN or to test. We selected half of the subjects to training and half for the test set, randomly, and for reproducibility, the records are made available at https://github.com/ufopcsilab/qrs-better-heartbeat-segmentation.

Resulting data augmentation

Table 1 presents the total of training samples with and without Data Augmentation (see detail on how data augmentation is performed in “Methods” section). As one may see, the number of negative samples (No R-peak) is the same independently whether the Data Augmentation is used or not. The main difference is the number of positive samples (R-peak), which turns possible to train a CNN model.

Table 1 Total training samples with the presence or absence of the Data Augmentation used.

Full size table

To train the models, we allocate 70% of a register’s data (data of one individual) for the training partition and the remaining 30% to the validation partition which is used only for network optimization. Thus, from the total number of images presented in Table 1, 70% is used for training and 30% for validation. The list of records selected for both training (train/validation) and test partitions, for both databases, are available at https://github.com/ufopcsilab/qrs-better-heartbeat-segmentation.