Introduction

Heart failure (HF) is a prevalent cardiovascular condition and a considerable public health problem1,2. Half of the patients with HF have systolic cardiac contractile dysfunction, which is usually confirmed by ejection fraction (EF) measured with echocardiography3. Although the echocardiographic diagnosis of systolic HF is precise, this modality is relatively time-consuming and costly in comparison with electrocardiography4. By contrast, a 12-lead electrocardiogram (12-lead ECG) is a convenient and inexpensive tool that provides comprehensive information on cardiac electrical dynamics. Consequently, its use is highly desirable for the early screening of suspected systolic HF.

In recent years, artificial intelligence (AI) has been widely used in the medical field5,6. Numerous AI ECG-based feature detection approaches have been employed, such as the use of artificial neural networks4, image classification–based convolutional neural networks (CNNs)7, time-relative recurrent neural networks8, and the unsupervised method9. Along with the rapid development of AI, studies on using ECG signals to prescreen for specific diseases based on AI algorithms are evolving rapidly. Crucial research on systolic HF combine 12-lead ECG for classification was markedly progressed. However, a single ECG lead or combination of leads primarily drove accuracy of EF discrimination is unknown. In this study, we applied the continuous wavelet transform (CWT) to convert the 1D-ECG signals to 2D spectra for 2D-CNN classification. The contribution of individual ECG leads to the classification result was evaluated, and a comprehensive scoring method was designed to improve outcomes.

Methods

Dataset

This study was approved by the Research Ethics Committee for Human Subject Protection, National Taiwan University Hospital, Hsinchu Branch, Hsinchu, Taiwan (IRB number:108-073-E), and each person enrolled were all over 18 years old and gave written informed consent to participate. All authors confirm that all the experiments were performed in accordance with relevant guidelines and regulations. Two datasets were used in this study. One included 1090 systolic HF patients with an EF of < 50%. The other included 10,000 individuals with an EF of > 50% and without HF symptoms. The EF was measured by echocardiography performed by cardiologists, and 12-lead ECG data were acquired at clinics or during hospitalization. Both datasets were provided by National Taiwan University Hospital, Hsinchu and Biomedical Park Branch. The 12-lead ECG data of all participants were obtained within one week after echocardiography identified their left ventricular EF (LVEF) greater than 50% or not. Each 12-lead ECG recording was from a single participant, without duplication.

Patient selection

The patient selection process is presented in Fig. 1. Among the 1090 patients with reduced EF, 12-lead ECG data with excessive noise were excluded from this research. ECG data with excessive noise was attributed to interferences from baseline wander, power line interference, electromyography noise, and R peak detection error. Examples of noise illustrations were shown in Fig. 2. Excessive noise in the ECG signal resulted in a splitting error. Splitting errors may generate an atypical waveform map, which could mislead our model for finding EF features. Thus, such ECG signals were excluded from our study. The remaining 12-lead ECG data for 900 patients with systolic HF was used as the patient training dataset. The corresponding 900 age-matched and EF-normal individuals were selected from the dataset with 10,000 individuals from health examination. Information for the patients with systolic HF and the individuals without HF are presented in Table 1. A total of 214 individuals were excluded due to having ECG data with signal splitting errors. A total of 186 testing data were randomly selected from the remaining 772 patients with systolic HF and 814 individuals without HF. The data of the systolic HF patients and the individuals without HF (total 1400) were randomly separated into two groups: 90% of data were used for training (n = 1260) and 10% of data were used for validation (n = 140). All patient’s original EF values were measured with echocardiograph. The systolic HF patient (EF < 50%) and individuals without systolic HF (EF > 50%) were divided in two classes, and compared to the AI prediction class.

Figure 1
figure 1

Data selection flowchart. A total of 900 patients with systolic HF were included in the study. For comparison, 900 age-matched individuals without systolic HF were included in the research. After ECG preprocessing, noisy data and data with ECG splitting errors were excluded. The remaining 700 patients with systolic HF and the corresponding individuals without HF were combined into one dataset. These data were then separated into groups of 1260 for training data, 140 for validation data, and 186 for testing data.

Figure 2
figure 2

Signals with excessive ECG noise, such as R peak detect error, electromyogram noise, baseline wandering and power line artifact, were excluded from our study. These ECG signals are severely affected by noise and therefore cannot be processed to obtain the correct ECG compound. They were excluded from our research to avoid misleading the neural network model during training.

Table 1 Information on individuals with and without systolic HF.

Electrocardiogram extraction

The flow chart for the whole experiment, including ECG extraction, CWT, and 2D-CNN classification, is depicted in Fig. 3. Because the 12-lead ECG data were recorded as a JPG image, the ECG signals had to be extracted from the image. The extraction procedure involved processing the JPG image through image binarization and signal extraction to obtain pure ECG signals. Then, the ECG image was cut vertically into four parts, followed by searching for black pixels on each of three vertical line to reconstruct the original ECG signal. The reconstructed ECG signal was then normalized and calibrated. Each ECG line was cut between two R peaks to obtain three small segments, and the middle segment was selected as the single-beat ECG compound. This procedure generated 12 single-lead ECG compounds for further processing. The details of the ECG extraction process are demonstrated in Fig. 4.

Figure 3
figure 3

Illustration of the research process employed in this study. The first step was to extract the ECG signal from the JPG images. The second step was to transform each single-lead ECG signal into CWT spectra. In the final step, the spectra were trained separately in 12 models for the 12-leads, and the softmax layer output scores (ranging from 0 to 1) were recorded and applied for the comprehensive scoring method. Four comprehensive scoring methods were considered, including one where equal weights were given to the 12-leads and the key leads close to the left ventricle (Leads I, V5, and V6).

Figure 4
figure 4

12-lead ECG signal preprocessing. Details are provided in seven steps. Before converting the 12-lead ECG images into the signal, it was necessary to confirm the target area of the ECG signal and then crop the region containing the ECG image and transform the RGB image into a grayscale image. Image binarization could convert the color of the signal and background into black or white. Then, the pixel bits of each lead signal were regarded as coordinates: X, regarded as time, and Y, regarded as relative size, were independently accessed. The squares on the 12-lead ECG images were used as the basis for determining the length of time required to reconstruct the original signal. Finally, the interpolated signal was used to compare the relative position of the pixel to the time required to reconstruct the electrical signal and the signal’s size.

Continuous wavelet transform

The 12-lead ECG signals were transformed by CWT to 2D spectra. Wavelet transform can be used to analyze time series in different frequencies that contain nonstationary power. In this research, Daubechies CWT (db8) was used to transform the ECG signals because it has a favorable balance between time and frequency localization. The CWT and Daubechies wavelet formulas are shown in Eqs. (1) and (2).

CWT equation:

$$ {\text{CWT}}_{{\text{x}}}^{\uppsi } \left( {\tau ,s} \right) = \frac{1}{\sqrt s }\int x \left( t \right)\uppsi ^{*} \left( {\frac{t - \tau }{s}} \right){\text{d}}t $$
(1)

Daubechies equation:

$$ {\text{H}}_{\emptyset } \left( {{\text{e}}^{{{\text{j}}\upomega }} } \right) = \sqrt 2 \left( {\frac{{1 + {\text{e}}^{{ - j\upomega }} }}{2}} \right)^{p} R\left( {{\text{e}}^{{{\text{j}}\upomega }} } \right) $$
(2)

In the CWT Eq. (1), τand s correspond to the translation and scale parameters, respectively. ψ(t) is the transforming function, which also represents the mother wavelet. Matlab CWT toolbox was employed in the CWTs. In the Daubechies Eq. (2), \(p\) represents a vanishing moment10. The Matlab wavelet toolbox was used for the implementation of these two equations.

CNN structure

The neural network programming was based on the Python and Keras application programming interfaces. The 2D-CNN structure was modified from the Visual Geometry Group (VGG) network5 for the 12-lead ECG CWT spectra classification. The 12-lead ECG CWT spectra were first resized to 200 × 200 × 3 pixels and then passed to the 2D-CNN as inputs. A 14-layer 2D-CNN was constructed with 6 convolution layers, 6 max-pooling layers, 1 flatten layer and 1 dense output layer with softmax function. The rectified linear unit, batch normalization, and dropout functions were used after each convolution layer was applied. Binary cross-entropy was defined as the loss function. An Adam optimizer was employed as the learning guide for 2D-CNN learning, and its learning rate was set as 10–4. For detailed 2D-CNN structure and hyperparameter information, please refer to Tables S1 and S2. In this research, the 12-lead ECG spectra were separately passed to 12 identical 2D-CNNs.

Comprehensive 12-lead ECG scoring

The comprehensive 12-lead ECG scoring method is based on the 2D-CNN output layers with a softmax formula, which is displayed in Eq. (3).

$$ {\text{S}}\left( {{\text{yi}}} \right) = \frac{{{\text{e}}^{{{\text{yi}}}} }}{{\sum\limits_{j} {e^{{y_{j} }} } }}\quad for\;i \, = \, 1, \ldots j $$
(3)

The logits vector from the 2D-CNN flatten layer proceeds through the softmax layer, the output class probability score of i(yi), and the class probability score summation [\(\sum\nolimits_{j} {e^{{y_{j} }} }\) (for j from 0 to 1)]6. In our research, the 12-lead ECG CWT spectra were separately passed to 12 2D-CNNs, which generated 12 probability scores in the individual without HF class. The 12 probability scores were employed in our comprehensive 12-lead ECG scoring method.

This method integrates 12 scores into one key diagnostic index for detecting systolic HF. V5, and V6 are the three ECG leads physically closest to the left ventricle, and they may have more relevance to EF detection than other ECG leads. Also, the Leads I result shows higher accuracy (82%) than Lead II and Lead III, so this lead had been considered in our comprehensive method. The scores from these three leads were selected in our comprehensive scoring method. The four types of scoring method were designed to obtain four diagnostic indices. The first one is the average value of the 12-lead output score, named “12-lead with equal weighting;” the second index is the average of three crucial lead scores, Leads I, V5, and V6, named “Lead I, V5, and V6”; the third index is the average value of Lead I and V6 scores, named “Lead I and V6;” and the fourth index is the average value of the V5 and V6 scores, named “V5 and V6.” The 12 output prediction scores from 12 neural network softmax layers were summed and used the cutoff value of 0.5. If the summation score is greater than or equal to 0.5, the individual EF is in the normal range ( 50%). By contrast, a summation score below 0.5 indicates that the individual has low EF (< 50%).

Statistical method

Descriptive continuous data were presented as mean ± standard deviation if normal distributed or otherwise as median/IQR. Ejection fraction measured by echocardiography were compared with comprehensive scoring and 12-lead 2D-CNN scoring predictions using accuracy, sensitivity, specificity, and f1 score formulas. These formulas can be used to evaluate the predictive capability of the 12-lead ECG, 2D-CNN model as well as the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) rates11. These formulas are written as follows: accuracy = (TP + TN)/(TP + TN + FP + FN), sensitivity = TP/ (TP + FN), specificity = TP/(TP + FN), precision = TP/(TN + FP), recall = TP/(TP + FN) and f1 score = (2 × precision × recall)/(precision + recall).

The receiver operating characteristic curve (ROC) was used as an evaluation method in this study. The ROC curve is a common analysis method for evaluating deep learning models. Using ROC curves, the graphical display of true positives (as the y-axis) versus false positives (as the x-axis) can be observed and compared directly. The area under the ROC curve (AUC) represents the equivalent of the probability when randomly selecting a sample. The classifier ranks a randomly chosen positive sample higher than a randomly chosen negative sample. The AUC value ranges between 0 and 1, and if the AUC value is in the range of 0.5 < AUC < 1, it means the classifier has more effective predicting ability than random guesses12.

Results

Systolic HF prediction results

The 12-lead ECG data of the 900 HF patients and the 900 individuals without HF were transformed into CWT spectra. The baseline data of individuals with and without systolic HF are listed in Table 1. Furthermore, Table 2 presents the characteristics of the training, validation, and testing sets. In the testing dataset (n = 186), the mean LVEF was 32.6 ± 3.4%, and 73 (39.2%) patients had myocardial infarction, 82 (44.1%) had hypertension, and 69 (37.1) had diabetes. Those values were similar to those in the training dataset (LVEF = 32.3 ± 4.6, p = 0.25; myocardial infarction 36.7%, p = 0.16; hypertension 48.5%, p = 0.09, and diabetes 37.0%, p = 0.11). Figure 5 illustrates the original ECG data from the JPG image and the CWT spectra. The CWT spectra can concentrate the unobvious ECG linear features into 2D image, which can enhance specific features of HF for machine learning classification. Also, the tenfold cross validation had been applied to our model, and demonstrated that the V6 had the highest average accuracy of 89.07% (Table S3).

Table 2 Characteristics of the training, validation, and testing sets.
Figure 5
figure 5

Illustration for the CWT of the 12-lead ECG. The original image of the ECG JPG file and CWT spectra were showed.

Systolic HF prediction results for individual leads

A total of 1400 ECG training data and 186 ECG testing data were used in this study. The accuracy, sensitivity, specificity, and f1 score of the test dataset are revealed in Table 3. The ECG results for individual ECG leads were favorable for the classification of patients with systolic HF. Each lead had accuracy ranging from 0.71 to 0.93. In particular, lead V6 exhibited the highest accuracy (0.93), specificity (0.97), and f1 score (0.94). The full results are presented in Table 3A.

Table 3 2D-CNN classification results. (A) reveals the results of the 12-lead ECG scoring methods. (B) presents the results of four comprehensive 12-lead ECG scoring methods.

ROC analysis and comprehensive scoring results

The ROC curve of our 12-lead ECG (with individual lead results) and comprehensive scoring results are detailed in Fig. 6. In Fig. 6A, the individual lead 2D-CNN models reveal AUC values between 0.76 and 0.96. Figure 6B reveals that the four AUC values for comprehensive scoring are between 0.96 and 0.98; these are higher than for any individual leads. The purpose of the comprehensive 12-lead ECG scoring method is to obtain one precise diagnostic index for systolic HF classification from 12 CNN models. Thus, the four comprehensive scoring methods were designed. The comprehensive scoring method’s accuracy, sensitivity, specificity, and f1 score are illustrated in Table 3B. The four comprehensive scoring method results include the “12-leads with equal weighting,” “Lead I, V5, and V6,” “Lead I and V6,” and “Lead V5 and V6.” In Table 3B, Lead V5 and V6 reveals the highest accuracy of 0.94, sensitivity of 0.97, specificity of 0.89, and an f1 score of 0.94. The four comprehensive scoring methods all showed higher accuracy, sensitivity, specificity, and f1 score than the average 12-lead ECG results.

Figure 6
figure 6

The ROC curves were presented. (A) presents a comparison of the 12-lead results for each lead and (B) presents the comprehensive scoring method results.

Discussion

A pre-screening systolic HF was established in this study. However, prospective testing of this method is still needed. The novel and comprehensive 12-lead ECG scoring method can achieve higher classification performance than individual leads. The individual leads had an average accuracy, sensitivity, and specificity of 0.75 and an f1 score of 0.83. However, the accuracy (0.94), specificity (0.97), sensitivity (0.89), and f1 score (0.94) all improved when the comprehensive 12-lead ECG scoring method was used.

In the individual lead results, V6 had higher accuracy than the other 11 leads. The screening results revealed that V6 was the representative lead of the 12-leads for pre-screening patients for systolic HF. This might be because V6 is the lead physically closest to the left ventricle. In comprehensive scoring, the four methods all had high classification capability. Among them, V5 and V6 had better classification capability than the other comprehensive scoring methods. In the ROC curve, the comprehensive scoring method exhibited significantly improved AUC compared with the AUC of the 12 individual leads. Therefore, V6 single lead and the comprehensive scoring method can be highly effective for screening patients for systolic HF.

In previous EF prediction research, many studies used ECG features and physiological parameters to predict EF13. However, by restricting the ECG features in the method, researchers may have ignored other important features. A previous study applied AI for predicting EF using the entire 12-lead ECG signal as a matrix and feeding into 2D-CNN for EF prediction14. But the crucial ECG leads were not identified.

In many ECG AI studies, researchers used 1D-CNN to classify ECG signals, such as for atrial fibrillation classification4,15. We compare 1D and 2D-CNNs in this study by establishing three simple 1D-CNN models and applied the 1D-CNN structure of the two papers on our dataset16,17 (Tables S4S7). According to the above comparison, our 2D-CNN shows the highest accuracy in predicting systolic heart failure compared to the other three 1D-CNN models. Although such a result may be limited by the amount of training data, it still reveals that the 2D-CNN model may perform better than the 1D-CNN models when classifying using a single ECG compound.

Another study applied ECG CWT in a 2D-CNN for atrial arrhythmia detection but also focused on only one lead feature18. These studies have revealed that ECG CWT is a powerful tool for identifying abnormal ECG features. In our research, all ECG leads were used to expand all ECG features to the 2D-CNN to enhance the systolic HF classification capability. A key lead, useful for pre-screening patients for systolic HF, was identified by separately training the 12-leads in identical 2D-CNN models. The key lead cannot be identified without training the 12-leads concurrently.

Several ECG features have been used to assess the LV function, and the presence of prolonged QRS duration is a strong marker for diminished LV systolic function19. Our study also supported these findings. Under our algorithm, the widening of QRS and lower QRS amplitude imply the high probability of poor LVEF. As shown in supplemental Fig. 1, we also found other specific features that suggested poor contractility, including p wave amplitude, T amplitude and ST interval, which were also the possible indicators for poor LV contractility20,21,22. Our AI-assisted algorithm could combine those features and assess the possibility of LVEF of < 50% with good accuracy. On the other hand, our algorithm aimed to screen patients with reduced LVEF (LVEF < 50%) by using only 12-lead ECG, which is low-cost and easily-feasible. In many rural areas and developing countries, the difficult access to cardiologic care and imaging could cause under-diagnosis and treatment for heart failure. Our algorithm by converting 12-lead ECG to 2D images rather than raw data provides a portable, inexpensive test for ventricular systolic dysfunction. The early diagnosis of left ventricular dysfunction could permit early institution of effective therapies, such as beta-blockers, angiotensin receptor antagonists and implantable devices. Along with the smartphone-enable electrodes, the single-lead ECG could be acquired by using mobile applications. Our algorithm could also be incorporated into those applications to assess the ventricular function.

Study limitation

The limitations of this research are deficiencies for patients with systolic HF and noise in all ECG data. For future studies to further improve screening accuracy, it is imperative to collect training data from hospitals to enhance the dataset continually. The new data can be used to improve our model’s performance. Furthermore, ECG signals should be kept clear and noise-free to prevent ECG slicing issue. Also, in this study, the prevalence of disease in our study cohort does not reflect prevalence in the general population. Thus, further research is needed to assess the utility of the given cutoffs in a general, ostensibly healthy population. At last, our dataset excluded patients with heart failure symptoms but with normal left ventricular systolic function. Therefore, heart failure patients with preserved LVEF may not be identified by using our algorithm.

Conclusion

In this research, we revealed that ECG CWT spectra can expand all ECG features for 2D-CNN classification. With the comprehensive 12-lead ECG scoring method, systolic HF screening obtained an accuracy of over 0.94 in Lead V5 and V6 and an AUC of 0.98 in Lead I and V6. In addition, we found that the V6 lead is vital for detecting systolic HF. Overall, this study provided an effective and accurate screening method for predicting cardiac contractile dysfunction using 12-lead ECG images.