ECG-based machine-learning algorithms for heartbeat classification

Electrocardiogram (ECG) signals represent the electrical activity of the human heart and consist of several waveforms (P, QRS, and T). The duration and shape of each waveform and the distances between different peaks are used to diagnose heart diseases. In this work, to better analyze ECG signals, a new algorithm that exploits the two-event related moving-averages (TERMA) and fractional Fourier transform (FrFT) algorithms is proposed. The TERMA algorithm specifies certain areas of interest in which to locate the desired peaks, while the FrFT rotates ECG signals in the time-frequency plane to reveal the locations of the various peaks. The proposed algorithm outperforms state-of-the-art algorithms. Moreover, to automatically classify heart disease, the estimated peaks, the durations between different peaks, and other ECG signal features were used to train a machine-learning model. Most available studies use the MIT-BIH database, which contains only 48 records. In this work, however, the recently reported Shaoxing People’s Hospital (SPH) database, which contains more than 10,000 patients, was used to train the proposed machine-learning model, making the classification more realistic. The distinguishing feature of our proposed machine-learning model is cross-database training and testing, which gave promising results.

www.nature.com/scientificreports/

samples before and after the detected R peaks, including the R peak samples, are set to zero depending on the RR interval. This algorithm provides acceptable peak-detection results. In this paper, to address the drawbacks of the above-mentioned algorithms, we propose an algorithm based on the fusion of TERMA and the fractional Fourier transform (FrFT) that produces better results. Moving averages are used in economics to detect events in trading and, more generally, help detect signals that contain specific events. These averages can therefore also be applied to ECG signals, which contain recurring events (the P wave, QRS complex, and T wave) that repeat at certain time intervals. Time-frequency analysis is likewise relevant because of the large variations in the P wave, QRS complex, and T wave. In this paper, we demonstrate how moving averages and time-frequency analysis can be exploited to detect these waves, and we show that the proposed algorithm performs significantly better than existing algorithms.
Our second objective is to classify the CVD of a given ECG signal, if any. Classification involves two steps: feature extraction and classifier model selection. Many researchers have worked on the classification of ECG signals using the MIT-BIH arrhythmia database. Different preprocessing techniques, feature extraction methods, and classifiers have been used in previous studies, some of which are discussed here. In [14], features such as the R peak and RR interval were extracted using the discrete wavelet transform (DWT), and a multilayer perceptron (MLP) was used for ECG classification. The obtained accuracy was 99.9%, but a total of 301 features were used for classification. Similarly, in [15], the R peak location and RR interval were extracted using the db4 DWT, and a feed-forward neural network (FFNN) trained with backpropagation was used to classify the ECG signals. The sensitivity, specificity, and accuracy achieved by the FFNN were 90%, 90%, and 95%, respectively. In [16, 17, 18, 19, 20], different classifiers such as naive Bayes, AdaBoost, support vector machines (SVM), and neural networks were used for classification.
Our contribution. Our contributions are as follows:

• Our proposed FrFT-based algorithm exploits FrFT to detect the P, QRS, and T waveform peaks. It is simple, less complex than other algorithms, and outperforms the recently proposed TERMA algorithm in detecting P, QRS, and T peaks. The detection performance of TERMA depends on the CVD; in contrast, our proposed algorithm is more generic and outperforms TERMA for any CVD.
• The second contribution concerns CVD classification. The PR and RT durations, calculated from the P, R, and T peak locations estimated in the first contribution, are used as features. Together with AR coefficients, these features significantly reduce the number of features required to classify CVDs. We tried different features and improved the classification accuracy using MLP and SVM classifiers.
• Features chosen for one database do not necessarily perform well on another. We show that the PR and RT durations, together with age and sex, perform very well across different databases, and that the required computational complexity is significantly lower than that of state-of-the-art algorithms.
• We trained our model on the MIT-BIH arrhythmia database [21] and then tested it on two different databases, INCART [22] and SPH [23]. The attained accuracies were 99.85% and 68%, respectively.

The proposed algorithm can be used in future cardiologist-free and probe-less systems, as shown in Fig. 2. In such a system, probe-less ECG sensors are placed on the patient's body, and the signals are transmitted via Bluetooth to a processing device such as a mobile phone. The received signal can then be processed and passed to the proposed machine-learning algorithm for automatic CVD diagnosis.
Paper organization. The rest of the paper is organized as follows. Section 2 describes some techniques used in the proposed algorithm, and Sect. 3 describes the methodology used for peak detection in detail. Section 4 describes feature extraction and classification using machine learning, and Sect. 5 presents the results of the proposed algorithm, which was validated over a variety of signals from two different databases. Finally, Sect. 6 concludes the paper.

Some preliminaries
In this section, we discuss some important techniques that are used in the proposed methodology.
Discrete wavelet transform. The approximation and detail coefficients of the DWT of a function $x(t)$ are defined, respectively, as follows [24]:
$$W_\varphi(j_0,k) = \int x(t)\,\varphi_{j_0,k}(t)\,dt,$$
$$W_\psi(j,k) = \int x(t)\,\psi_{j,k}(t)\,dt,$$
where $j \ge j_0$, $j_0$ is the starting scale, $\varphi_{j,k}(t)$ is the scaling function, and $\psi_{j,k}(t)$ is the wavelet function. The inverse discrete wavelet transform (IDWT) for given approximation and detail coefficients is defined as follows:
$$x(t) = \sum_k W_\varphi(j_0,k)\,\varphi_{j_0,k}(t) + \sum_{j=j_0}^{\infty}\sum_k W_\psi(j,k)\,\psi_{j,k}(t).$$

Moving averages. Moving averages smooth out short-term fluctuations while highlighting longer-term trends. In trading, two moving averages of different lengths are used together, and their crossovers mark trading events. The same idea can be used to detect P, QRS, and T waves. Moving averages are also numerically efficient and simple to implement. Therefore, using two moving averages is a promising approach for analyzing biomedical signals.
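The two-moving-average idea can be illustrated with a short sketch. It assumes centered windows of odd length and zero padding at the signal edges (via `np.convolve(..., mode="same")`); the names `centered_moving_average` and `crossover_blocks` are hypothetical, and this is not the TERMA implementation itself. A block starts where the short average rises above the long one and ends where it falls back below it.

```python
import numpy as np

def centered_moving_average(x, w):
    """Centered moving average with odd window w (zero-padded at the edges)."""
    if w % 2 == 0:
        raise ValueError("window length must be odd")
    return np.convolve(np.asarray(x, dtype=float), np.ones(w) / w, mode="same")

def crossover_blocks(x, w_short, w_long):
    """Return (start, end) pairs (end exclusive) where the short MA exceeds the long MA."""
    above = (centered_moving_average(x, w_short) >
             centered_moving_average(x, w_long)).astype(int)
    # Pad with zeros so blocks touching either edge are still closed.
    edges = np.diff(np.concatenate(([0], above, [0])))
    starts = np.flatnonzero(edges == 1)   # upward crossovers
    ends = np.flatnonzero(edges == -1)    # downward crossovers
    return list(zip(starts.tolist(), ends.tolist()))
```

For a short rectangular "event" in an otherwise flat signal, the short average exceeds the long one only around the event, so a single block is returned. The short window tracks the event; the long window approximates the local background.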
Fractional Fourier transform. The FrFT is a generalization of the classical Fourier transform with an order parameter $\alpha$ [25]. It was first introduced in the mathematical literature and has mainly been used to solve differential equations in quantum physics, but it can also be used to interpret optics problems. In recent years, the use of the FrFT in optical applications has been increasing, and many new signal-processing applications have been proposed because of its useful characteristics in the time-frequency plane. The FrFT of a signal $x(t)$ can be defined as follows [26]:
$$X_\alpha(u) = F^\alpha[x(t)](u) = \int_{-\infty}^{\infty} x(t)\,K_\phi(t,u)\,dt,$$
where $\alpha$ is the order of the FrFT and $\phi = \alpha\pi/2$ is the angle of rotation. $F^\alpha(\cdot)$ denotes the FrFT operator, and $K_\phi(t,u)$ is the kernel of the FrFT, defined as
$$K_\phi(t,u) = \begin{cases} \sqrt{\dfrac{1 - j\cot\phi}{2\pi}}\,\exp\!\left(j\,\dfrac{t^2 + u^2}{2}\cot\phi - j\,t u\csc\phi\right), & \phi \neq n\pi,\\ \delta(t-u), & \phi = 2n\pi,\\ \delta(t+u), & \phi + \pi = 2n\pi, \end{cases}$$
where $n$ is an integer.
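The kernel definition can be checked numerically with a small sketch. It relies on a standard property: the Gaussian $e^{-t^2/2}$ is an eigenfunction of the FrFT with eigenvalue 1, so $X_\alpha(u) = e^{-u^2/2}$ for every $\alpha$. For $\alpha = 1$ the kernel reduces to the ordinary unitary Fourier transform. The function `frft_continuous` is hypothetical and evaluates the integral by direct quadrature. This is not the discrete FrFT the paper applies with $\alpha = 0.01$, which is not specified here. For such small orders $\cot\phi$ is very large, the chirp oscillates rapidly, and a dedicated discrete FrFT algorithm would be needed instead of this quadrature.

```python
import numpy as np

def frft_continuous(x_func, alpha, u, t_max=12.0, dt=1e-3):
    """Evaluate X_alpha(u) = integral of x(t) K_phi(t, u) dt by the rectangle rule.

    Only the chirp branch of the kernel (phi not a multiple of pi) is handled.
    """
    phi = alpha * np.pi / 2
    if np.isclose(np.sin(phi), 0.0):
        raise ValueError("phi is a multiple of pi: the kernel is a delta function")
    cot, csc = np.cos(phi) / np.sin(phi), 1.0 / np.sin(phi)
    t = np.arange(-t_max, t_max + dt, dt)
    amp = np.sqrt((1 - 1j * cot) / (2 * np.pi))    # principal complex square root
    xt = x_func(t)
    out = []
    for uk in np.atleast_1d(u):
        kernel = amp * np.exp(1j * (t**2 + uk**2) / 2 * cot - 1j * t * uk * csc)
        out.append(np.sum(xt * kernel) * dt)
    return np.array(out)
```

Usage: `frft_continuous(lambda t: np.exp(-t**2 / 2), 0.5, np.linspace(-3, 3, 13))` should reproduce $e^{-u^2/2}$ on that grid to high accuracy.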

Proposed methodology
As mentioned earlier, artifacts and noise must be removed from the signals for accurate detection of the P, QRS, and T waves. Figure 3 shows the block diagram of the proposed three-step methodology: (1) the conventional wavelet-transform-based filtering method is used to remove noise and artifacts; (2) TERMA and FrFT are fused to improve the detection of the P, QRS complex, and T waveforms; and (3) machine-learning algorithms are applied to classify the ECG signals and determine the CVD, if any. The individual tasks are discussed in detail in the following subsections.
Signal filtering. ECG signals are non-stationary, i.e., their frequency content changes with time. Similarly, the noise and artifacts contaminating the ECG signal are non-linear, and their probability distribution function is time-dependent. Conventional Fourier-transform techniques do not provide time localization, whereas the DWT does; therefore, the DWT can better deal with non-stationary signals. The first step is to remove the baseline drift using the DWT [27]. For this purpose, the central frequency $F_c$ (also called the $F_c$ factor) of the wavelet is first calculated; it ranges from 0 to 1 depending on the similarity between the signal and the chosen wavelet. For ECG signals, the Daubechies-4 (db4) wavelet has the highest $F_c$ factor, approximately 0.7. Next, the pseudo-frequency $F_a$ is calculated at each scale using the expression [27]
$$F_a = \frac{F_c\,F_s}{a}, \qquad (2)$$
where $a$ and $F_s$ represent the scale and the sampling frequency of the ECG signal, respectively. The baseline drift is mostly localized around 0.5 Hz [28]. For the MIT-BIH database, $F_s = 360$ Hz, so the scales corresponding to different pseudo-frequencies can easily be calculated using (2). Decomposition is required up to level 9 (dyadic scale $a = 2^9$), which corresponds to $F_a \approx 0.5$ Hz. Therefore, the ECG signal is decomposed into approximation and detail coefficients using the db4 wavelet up to level 9. The approximation coefficients corresponding to the baseline drift are removed, and the signal is reconstructed using the IDWT to obtain a baseline-drift-free signal [29].
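Equation (2) can be turned into a small level-selection helper. This sketch assumes dyadic scales $a = 2^j$ at decomposition level $j$ and uses the text's approximate value $F_c \approx 0.7$ for db4. The helper names are hypothetical. PyWavelets provides `pywt.central_frequency` and `pywt.scale2frequency` if you prefer not to hard-code $F_c$.

```python
def pseudo_frequency(level, fs, fc=0.7):
    """Equation (2): F_a = F_c * F_s / a, with dyadic scale a = 2**level."""
    return fc * fs / 2**level

def baseline_level(fs, cutoff=0.5, fc=0.7):
    """Smallest decomposition level whose pseudo-frequency is <= cutoff (Hz)."""
    level = 1
    while pseudo_frequency(level, fs, fc) > cutoff:
        level += 1
    return level

# Pseudo-frequency at each level for MIT-BIH (F_s = 360 Hz).
for level in range(1, 10):
    print(level, round(pseudo_frequency(level, 360), 3))
```

For $F_s = 360$ Hz, level 9 gives $0.7 \times 360 / 512 \approx 0.49$ Hz, which matches the text. Under the same assumption, the 500 Hz SPH recordings would need one more level (level 10) to reach 0.5 Hz.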
Once the baseline-drift-free signal is obtained, the next step is to remove high-frequency noise. It was reported in [30] that most of the energy of the QRS complex is concentrated between 8 and 20 Hz. The SNR was calculated at different decomposition levels, which showed that decomposition up to level 6 is required to capture the QRS complex. Therefore, the signal is reconstructed using the detail coefficients of levels 4, 5, and 6 and the approximation coefficients of level 6.
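The band-selection step (decompose, zero the unwanted coefficients, reconstruct) can be sketched with NumPy alone. Several assumptions apply. Haar filters replace db4 so the code stays short and easy to verify. The signal length must be a multiple of $2^{\text{levels}}$. `wavelet_filter` is a hypothetical helper. With PyWavelets, the same idea is usually expressed with `pywt.wavedec`/`pywt.waverec` and zeroed coefficient arrays.

```python
import numpy as np

def haar_step(x):
    """One analysis step: returns (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_inverse_step(a, d):
    """Exact inverse of haar_step."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def wavelet_filter(x, levels, keep_details, keep_approx=True):
    """Decompose to `levels`, zero unwanted bands, and reconstruct.

    keep_details: set of detail levels to retain (1 = finest).
    keep_approx: False removes the deepest approximation (baseline drift).
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2**levels:
        raise ValueError("signal length must be a multiple of 2**levels")
    a, details = x, []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)                  # details[j - 1] holds level j
    if not keep_approx:
        a = np.zeros_like(a)
    for j in range(levels, 0, -1):         # coarse-to-fine reconstruction
        d = details[j - 1] if j in keep_details else np.zeros_like(details[j - 1])
        a = haar_inverse_step(a, d)
    return a
```

The high-frequency recipe described in the text would correspond to `wavelet_filter(x, 6, keep_details={4, 5, 6})`. Baseline removal would correspond to `wavelet_filter(x, 9, keep_details=set(range(1, 10)), keep_approx=False)`.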
The detail coefficients of levels 1, 2, and 3 contain the highest-frequency components of the signal (for $F_s = 360$ Hz, their nominal dyadic bands span roughly 22.5 to 180 Hz), which are attributed to muscle-contraction noise. Therefore, the details at these levels are discarded and the approximations are retained to remove high-frequency noise. Because these high-frequency detail coefficients are discarded, the resulting signal has the highest SNR among the tested reconstructions. Figure 4 shows the signal after removal of the baseline drift and high-frequency noise. In the TERMA algorithm, to detect peaks, the artifact- and noise-free signal is squared to enhance the peak values, a block of interest (BOI) is generated for each wave, and thresholding is finally applied. In the following subsection, we show how the detection performance of TERMA can be improved by exploiting FrFT.

R peak detection using the fusion algorithm. In the ECG signal, the maximum change in frequency occurs at the R peak. Taking the Fourier transform of the ECG signal would lose time localization. Therefore, in this step, FrFT was applied to the noise-free signal to rotate it in the time-frequency plane [31]. As seen in the preliminaries, the FrFT operation comprises a chirp multiplication, followed by a chirp convolution, and finally another chirp multiplication. Rotating the signal with a higher value of $\alpha$ moves it closer to its frequency-domain representation, while a lower value of $\alpha$ keeps it closer to the time domain. In R-peak detection, time localization is very important [32]. By trial and error, we found that $\alpha = 0.01$ appropriately enhances the R peaks and makes them easy to detect. After applying FrFT, the R peaks were further enhanced by squaring each sample, yielding the enhanced signal $y(n)$. Two moving averages, based on the event and the cycle, were then calculated as follows:
$$MA_{\text{event}}(n) = \frac{1}{W_1}\sum_{k=-(W_1-1)/2}^{(W_1-1)/2} y(n+k),$$
$$MA_{\text{cycle}}(n) = \frac{1}{W_2}\sum_{k=-(W_2-1)/2}^{(W_2-1)/2} y(n+k),$$
where $W_1$ depends on the duration of the QRS complex and $W_2$ depends on the heartbeat duration.
The mean $\mu$ of the enhanced signal is calculated and multiplied by a factor $\beta$, whose optimum value was chosen by trial and error. The result, $\gamma = \beta\mu$, is added to $MA_{\text{cycle}}$ to generate the threshold values. Each value of $MA_{\text{event}}$ is compared with the corresponding threshold: if $MA_{\text{event}}(n)$ is greater than the $n$th threshold, one is assigned; otherwise, zero is assigned in a new vector. In this way, a train of nonuniform rectangular pulses is generated. Finally, the pulses whose widths are at least $W_1$ are the blocks that contain the desired event, as shown in Fig. 5a. In each block, the maximum value of the corresponding enhanced signal is taken as the R peak. This process is explained in detail in [12]. Figure 6a shows that the R peaks were accurately detected after applying the proposed algorithm.
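The thresholding and block logic above can be sketched as follows. Several assumptions apply. The FrFT rotation ($\alpha = 0.01$) is omitted, so the input is taken to be the already filtered (or already rotated) signal. The window durations (97 ms and 611 ms) and $\beta = 0.08$ are illustrative placeholders, not values stated in this paper. Blocks at least $W_1$ samples wide are kept. `terma_r_peaks` is a hypothetical name.

```python
import numpy as np

def centered_ma(y, w):
    """Centered moving average with odd window w (zero-padded at the edges)."""
    return np.convolve(y, np.ones(w) / w, mode="same")

def terma_r_peaks(x, fs, beta=0.08, qrs_s=0.097, beat_s=0.611):
    """Return R-peak sample indices using two moving averages and an offset threshold."""
    y = np.asarray(x, dtype=float) ** 2           # squaring enhances the R peaks
    w1 = int(round(qrs_s * fs)) | 1               # odd window ~ QRS duration (W1)
    w2 = int(round(beat_s * fs)) | 1              # odd window ~ one heartbeat (W2)
    ma_event, ma_cycle = centered_ma(y, w1), centered_ma(y, w2)
    gamma = beta * y.mean()                       # offset gamma = beta * mu
    above = (ma_event > ma_cycle + gamma).astype(int)
    edges = np.diff(np.concatenate(([0], above, [0])))
    peaks = []
    for s, e in zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)):
        if e - s >= w1:                           # keep blocks at least W1 wide
            peaks.append(int(s + np.argmax(y[s:e])))  # max of enhanced signal in block
    return np.array(peaks, dtype=int)
```

On a synthetic train of narrow Gaussian "beats" spaced 1 s apart at 360 Hz, the sketch should return the beat centers. On real ECG data, $\beta$ and the windows would need tuning, as the text notes.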
P and T peak detection using the fusion algorithm. To detect P and T peaks, TERMA uses a complicated threshold. We reduced the overall computational complexity of the algorithm by applying a simplified threshold. The first step is to remove the R peaks so that the P and T peaks become prominent. Thus, in the noise-free signal, 30 samples (0.083 s) before each R peak and 60 samples (0.167 s) after it were set to zero. Within this interval, P and T waves are expected to be almost absent for any CVD. After removing the QRS interval, the signal was rotated in the time-frequency plane using FrFT to enhance the P and T peaks. As in the previous subsection, blocks of interest were generated, as shown in Fig. 5b, using two moving averages defined as follows, where $y(n)$ here denotes the rotated signal:
$$MA_1(n) = \frac{1}{W_3}\sum_{k=-q}^{q} y(n+k), \qquad MA_2(n) = \frac{1}{W_4}\sum_{k=-r}^{r} y(n+k),$$
where $W_3$ depends on the P-wave duration, $W_4$ depends on the QT interval, $q = (W_3-1)/2$, and $r = (W_4-1)/2$. For a normal, healthy person, the P-wave duration is about (100 ± 20) ms, whereas the QT interval is about (400 ± 40) ms. To detect P waves, a window smaller than the normal P-wave duration was chosen to account for special cases of arrhythmia. Here, in contrast to R-peak detection, the threshold values are simply the values of the second moving average: if the first moving average is greater than the corresponding value of the second moving average, one is assigned; otherwise, zero is assigned in a new vector. In this way, a train of nonuniform rectangular pulses is generated. Finally, a threshold based on the PR, RR, and RT intervals is applied to separate the blocks that contain P and T peaks from the other generated blocks. If the distance between the maximum of a block and the nearest R peak is within the predefined PR interval, that maximum is taken as the P peak.
If the distance between the maximum of a block and the nearest R peak is within the predefined RT interval, that maximum is taken as the T peak. Figures 6b and c show that the P and T peaks were accurately detected after applying the proposed algorithm. In this work, similar to the TERMA algorithm, we detected normal and merged T peaks only. T peaks with other shapes, such as inverted, biphasic negative-positive, and biphasic positive-negative T waves, therefore still need to be investigated. Moreover, other types of moving averages could help in further analysis of ECG signals. The algorithm is not designed to handle the additional U wave after the T peak. These aspects will be investigated in our future work.
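Two steps of the P/T procedure lend themselves to a short sketch: blanking the QRS neighborhood (30 samples before and 60 after each R peak, R sample included) and labeling block maxima by their distance to the nearest R peak. The PR and RT ranges below are hypothetical placeholders, since the paper's predefined intervals are not given here. The helper names are also hypothetical. Generating the blocks themselves would follow the same $MA_1 > MA_2$ comparison as the moving-average steps above.

```python
import numpy as np

def blank_qrs(x, r_peaks, before=30, after=60):
    """Zero `before` samples before and `after` samples after each R peak (R included)."""
    y = np.array(x, dtype=float)
    for r in r_peaks:
        y[max(r - before, 0): r + after + 1] = 0.0
    return y

def label_block_maxima(candidates, r_peaks, fs, pr_range=(0.12, 0.30), rt_range=(0.10, 0.50)):
    """Label block maxima 'P' or 'T' by their lag (s) to the nearest R peak.

    pr_range and rt_range are illustrative intervals, not the paper's values.
    """
    r_peaks = np.asarray(r_peaks)
    labels = {}
    for c in candidates:
        nearest = r_peaks[np.argmin(np.abs(r_peaks - c))]
        lag = (nearest - c) / fs             # > 0: candidate precedes the R peak
        if pr_range[0] <= lag <= pr_range[1]:
            labels[int(c)] = "P"
        elif rt_range[0] <= -lag <= rt_range[1]:
            labels[int(c)] = "T"
    return labels
```

A candidate that matches neither interval is left unlabeled, which mirrors the text's use of the intervals to reject blocks that contain neither a P nor a T peak.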

Classification of the ECG signal
In this section, machine learning is applied to classify the given ECG signal according to its CVD. In supervised machine learning, training data with corresponding labels are fed to an algorithm, features are extracted from each example, and a model is built to predict the labels of test data. Supervised learning supports automatic decision-making by building models from sample data. Training involves two steps, feature extraction and classification, which are discussed in the following subsections.
Feature extraction. Different features can be extracted from the ECG signal. For example, the estimated peak locations can be used to find the time intervals between different peaks. Because these time intervals reflect different cardiac conditions, they can be used as features. Moreover, auto-regressive (AR) model coefficients of the ECG signal can be used as features [33]. The AR model of order $p$, AR($p$), can be defined as follows:
$$x(n) = \sum_{i=1}^{p} a(i)\,x(n-i) + e(n),$$
where $a(i)$ is the $i$th coefficient of the AR model, $e(n)$ is zero-mean white noise, and $p$ is the model order. The optimum order of the AR model depends on several factors: a higher-order model yields a more accurate model of the signal, but at the cost of higher computational complexity in calculating the coefficients. Similarly, other features, such as wavelet-transform coefficients, mean, variance, age, sex, and cumulants, can be extracted to classify the CVD of the ECG signal. Feature extraction is very important because it determines which inputs best represent the signal. In this work, signals from the MIT-BIH arrhythmia [21] and SPH [34] databases were used.
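AR coefficients can be estimated in several ways, and the paper does not say which method it uses. The sketch below uses the Yule-Walker equations (biased autocorrelation plus a Toeplitz linear solve) with the sign convention $x(n) = \sum_i a(i)\,x(n-i) + e(n)$. The function name `ar_coefficients` is hypothetical. In the comparison discussed later, four AR coefficients ($p = 4$) per heartbeat are used.

```python
import numpy as np

def ar_coefficients(x, p):
    """Yule-Walker estimate of a(1..p) for x(n) = sum_i a(i) x(n - i) + e(n)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates r(0), ..., r(p).
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])
    # Symmetric Toeplitz matrix R[i, j] = r(|i - j|).
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])
```

Applied to a long synthetic AR(2) process, the estimates should land close to the true coefficients. For single heartbeats (a few hundred samples), the estimates are noisier.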

Feature matrix.
The feature matrix contains the features of ECG beats taken from different records of the arrhythmia database, with each row holding the features of a single heartbeat. For example, if four AR coefficients, $n$ coefficients from the FrFT of the heartbeat, and the PR and RT intervals are used as features, the feature vector can be written as $\{a_1, a_2, a_3, a_4, f_1, f_2, \ldots, f_n, PR, RT\}$. The feature matrix is formed by stacking such rows.
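The four-feature rows used later (PR, RT, age, sex) can be assembled as in the sketch below. It assumes that PR and RT are computed as peak-to-peak durations from the estimated P, R, and T sample indices. The numeric sex encoding (0/1) and the helper name `beat_features` are hypothetical choices.

```python
import numpy as np

def beat_features(p_idx, r_idx, t_idx, fs, age, sex):
    """One row per beat: [PR (s), RT (s), age, sex]; sex encoding (0/1) is hypothetical."""
    p_idx, r_idx, t_idx = (np.asarray(v, dtype=float) for v in (p_idx, r_idx, t_idx))
    pr = (r_idx - p_idx) / fs        # P-peak to R-peak duration
    rt = (t_idx - r_idx) / fs        # R-peak to T-peak duration
    n = len(r_idx)
    return np.column_stack([pr, rt, np.full(n, age, dtype=float), np.full(n, sex, dtype=float)])
```

Rows from many patients can then be stacked with `np.vstack` to form the feature matrix passed to the classifiers.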
Supervised machine learning algorithms. The classification of ECG signals is an important and challenging task. It can provide substantial information about a patient's CVDs without the involvement of a cardiologist: only a technician is required to attach the probes, and the machine-learning-based solution can automatically diagnose the patient's CVDs. This technique can immediately prioritize patients who need urgent medical attention [35]. In this work, the SVM and MLP supervised learning algorithms were used for classification; they are briefly discussed in the following subsections.
Support vector machine classifier. The SVM algorithm can be used for both classification and regression problems [36]. In SVM, each sample is represented as a point in an $l$-dimensional space, where $l$ denotes the number of features. Classification is then performed by finding a hyperplane that separates the classes. The hyperplane is optimized by maximizing the margin: among candidate hyperplanes, the one with the largest distance to the closest data points is chosen. The SVM solves the following quadratic problem:
$$\max_{\alpha}\ \sum_{i}\alpha_i - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j\,y_i y_j\,K(X_i, X_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0,$$
where $X_i, X_j$ are input feature vectors, $y_i, y_j$ are class labels, $\alpha_i \ge 0$ are Lagrange multipliers, $C$ is a constant, and $K(X_i, X_j)$ is a kernel function [37]. A very common kernel function is the Gaussian radial basis function:
$$K(X_i, X_j) = \exp\!\left(-\frac{\lVert X_i - X_j\rVert^2}{2\sigma^2}\right).$$
The SVM is very effective in high-dimensional spaces, including when the number of dimensions exceeds the number of samples.
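The RBF kernel and the resulting decision function $f(X) = \sum_i \alpha_i y_i K(X_i, X) + b$ can be written in a few lines. This sketch does not solve the quadratic program: the support vectors, multipliers, and bias in the example are hand-set for illustration. In practice, scikit-learn's `sklearn.svm.SVC(kernel="rbf", C=..., gamma=...)` performs the optimization. Note that $\gamma = 1/(2\sigma^2)$. The values used later ($C = 65536 = 2^{16}$, $\gamma = 2.44 \times 10^{-4} \approx 2^{-12}$) correspond to $\sigma \approx 45.3$. The names `rbf_gram` and `svm_decision` are hypothetical.

```python
import numpy as np

def rbf_gram(X, Y, gamma):
    """K[i, j] = exp(-gamma * ||X_i - Y_j||^2), with gamma = 1 / (2 sigma^2)."""
    X = np.atleast_2d(X).astype(float)
    Y = np.atleast_2d(Y).astype(float)
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

def svm_decision(X_new, X_sv, y_sv, alpha_sv, b, gamma):
    """f(X) = sum_i alpha_i y_i K(X_i, X) + b; the predicted class is sign(f)."""
    return rbf_gram(X_new, X_sv, gamma) @ (alpha_sv * y_sv) + b
```

With one support vector per class, points near the positive support vector receive a positive score and points near the negative one a negative score.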

Multi-layer perceptron classifier.
Artificial neural network (ANN) algorithms classify regions of interest using a methodology that mimics functions of the human brain, such as understanding, learning, solving problems, and making decisions. An ANN architecture consists of three types of layers. The first is the input layer, whose number of neurons is determined by the number of input parameters. The last is the output layer, whose number of neurons equals the number of output classes. The layers between the input and output layers are called hidden layers [38]. In this work, an MLP, which is a subclass of feedforward ANNs, was used.
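The layer structure described above can be made concrete with a forward pass. This sketch assumes ReLU hidden layers and a softmax output with one neuron per class. That is a common MLP configuration, but this paper does not state its activations or layer sizes. The weights are untrained random values, and `mlp_forward` is a hypothetical name.

```python
import numpy as np

def mlp_forward(X, weights, biases):
    """Fully connected MLP: ReLU hidden layers, softmax output layer."""
    h = np.asarray(X, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)               # hidden layer with ReLU
    logits = h @ weights[-1] + biases[-1]
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)          # per-class probabilities

# Example: 4 input features (e.g., PR, RT, age, sex), 16 hidden units, 6 classes.
rng = np.random.default_rng(0)
sizes = [4, 16, 6]
Ws = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]
probs = mlp_forward(rng.standard_normal((5, 4)), Ws, bs)
```

Training (backpropagation) is omitted. The input and output widths follow directly from the number of features and classes, as the text describes.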

Simulation results and discussion
This section is divided into three parts, which are dedicated respectively to peak detection, classification, and cross-database training and testing.
Detection of ECG peaks. In the first part of the simulation, the P, R, and T peaks are detected using our proposed FrFT-based algorithm, which is validated over all 48 records of the MIT-BIH database. Lead II (MLII) data are used in this paper. Because our algorithm works independently of the waveform amplitude, data from any lead can be used for peak detection. The performance is assessed using metrics reported in the literature, namely sensitivity (Se), positive predictivity (+P), and error rate (Err), defined as follows [39, 40]:
$$Se = \frac{TP}{TP+FN}\times 100\%, \qquad +P = \frac{TP}{TP+FP}\times 100\%, \qquad Err = \frac{FP+FN}{TP+FN},$$
where TP denotes true positives; FN denotes false negatives, defined as annotated peaks not detected by the algorithm; and FP denotes false positives, defined as peaks detected by the algorithm that are not actually present. A peak detected within 30 ms of an annotated peak is counted as a TP. To assess the performance of the algorithm, we counted the TPs, FNs, and FPs. In Table 1, the R-peak detection performance of our proposed algorithm is compared with that of the TERMA algorithm; both algorithms were tested over the 48 records of the MIT-BIH arrhythmia database. As seen, the proposed algorithm performed slightly better than TERMA. Similarly, the P- and T-wave detection performance of the proposed algorithm was compared with that of TERMA, as shown in Table 2. In Table 2, we compared against the performance of TERMA reported in [13], where only 10 records of the MIT-BIH database were selected; our proposed algorithm outperforms TERMA. Both algorithms were also tested on the remaining 38 records of the MIT-BIH database (Table 2). Here, a significant difference can be seen between the two algorithms. For P-peak detection, our proposed algorithm achieved an Se of 75.8% and an Err of 0.40, compared with an Se of 67.5% and an Err of 0.51 for TERMA.
For T-peak detection, the proposed algorithm achieved an Se of 59.2% and an Err of 1.04, compared with an Se of 42.8% and an Err of 1.15 for TERMA, as shown in Table 2. This shows that the detection performance of TERMA is limited to a few CVDs, whereas our proposed algorithm also performs better for the other CVDs in the MIT-BIH database.
Overall, it was found that our proposed algorithm performs better than the TERMA algorithm and other previously presented algorithms.
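The counting behind Se, +P, and Err can be sketched as a tolerance-window matcher. Two assumptions apply. "Within 30 ms" is interpreted as ±30 ms. Each detection may match at most one annotation, using a greedy nearest-first rule. The function names are hypothetical, and the published evaluations may use a different matching rule.

```python
def match_peaks(detected, annotated, fs, tol_s=0.030):
    """Greedy one-to-one matching within +/- tol_s seconds; returns (TP, FP, FN)."""
    tol = tol_s * fs
    free = sorted(detected)                      # detections not yet matched
    tp = 0
    for a in sorted(annotated):
        candidates = [d for d in free if abs(d - a) <= tol]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - a))
            free.remove(best)                    # a detection matches at most once
            tp += 1
    return tp, len(detected) - tp, len(annotated) - tp

def detection_metrics(tp, fp, fn):
    """Se and +P as fractions, and Err = (FP + FN) / (TP + FN)."""
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    err = (fp + fn) / (tp + fn)
    return se, ppv, err
```

Because Err is normalized by the number of annotated peaks, it can exceed 1 when false positives are numerous, which is consistent with the T-peak values of 1.04 and 1.15 reported above.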

Classification of CVDs.
In the second part of the simulation, we classify the ECG signals according to their CVDs. For all simulations, 70% of the feature data were used to train the machine-learning model and 30% were kept for testing [37]. Different features were extracted from the signals for classification and then passed to the SVM and MLP classifiers to classify the input ECG heartbeats as normal, PVC, APC, LBBB, RBBB, or paced (PACE). To compare the performance of the proposed classifier with that of existing ones, the following performance metrics were used:
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \quad \text{Precision} = \frac{TP}{TP+FP}, \quad \text{Recall} = \frac{TP}{TP+FN}, \quad F_1 = \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}},$$
where TN denotes a true negative, i.e., the patient has a CVD and the classifier also predicts that the patient is not normal.
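For multiple classes, these metrics are usually computed per class from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes. Here "accuracy" is the fraction of correctly classified beats (trace divided by total), the multiclass counterpart of the $(TP+TN)/(\ldots)$ formula. `per_class_metrics` is a hypothetical helper. scikit-learn's `confusion_matrix` and `precision_recall_fscore_support` provide the same quantities. A class that is never predicted would produce a division by zero (NaN) here.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of beats of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # column sums: all beats predicted as the class
    recall = tp / cm.sum(axis=1)      # row sums: all beats truly of the class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy
```

For example, a two-class matrix `[[8, 2], [1, 9]]` gives recalls of 0.8 and 0.9 and an overall accuracy of 0.85.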
The MIT-BIH database contains a limited number of ECG signals (48 records), and for machine-learning algorithms the quantity of data is crucial. Therefore, we also tested the proposed algorithms for classification on the recently reported Shaoxing SPH database [23]. This database contains 12-lead ECG signals from 10,646 patients. In contrast to the MIT-BIH sampling rate of 360 samples/s, the SPH ECG signals are sampled at 500 samples/s. The dataset consists of four folders containing raw ECG data, denoised ECG data, diagnoses, and attributes. The database covers 11 common rhythms and 67 additional cardiovascular conditions. Each of the 12 lead signals is 10 s long, i.e., 5000 samples per lead. In this database, the 11 rhythms are merged into four groups: SB, AFIB, GSVT, and SR. The SB group includes only sinus bradycardia; the AFIB group consists of atrial fibrillation and atrial flutter (AF); the GSVT group contains supraventricular tachycardia, atrial tachycardia, atrioventricular node reentrant tachycardia, atrioventricular reentrant tachycardia, and sinus atrium to atrial wandering rhythm; and the SR group includes sinus rhythm and sinus irregularity.
For the first classification simulation, the extracted features were passed to the SVM classifier. The parameters $C$ and $\gamma = 1/(2\sigma^2)$ were set to 65536 and $2.44 \times 10^{-4}$, respectively [37]. The scikit-learn Python library was used to build the machine-learning models [41]. In [37], 36 features are extracted to classify an ECG signal: 32 DWT (db4) coefficients and 4 AR model coefficients. In contrast, the proposed classifier uses a feature matrix with only four features: the PR and RT intervals, derived from the estimated P, R, and T peaks, together with age and sex.
Both classifiers were trained and tested on records of the MIT-BIH and SPH databases. For the MIT-BIH database, the numbers of heartbeats extracted from the Normal, LBBB, RBBB, PACE, PVC, and APC records were 2237, 2490, 2165, 2077, 992, and 1382, respectively. For SPH, features were extracted from all heartbeats of the 10,646 patients. The corresponding performance of both classifiers on the MIT-BIH and SPH databases is shown in Table 3. For the MIT-BIH database, the overall accuracy of the classifier proposed in [37] […] to 37.1%. In contrast, the accuracy of our proposed classifier with only four features was 82.2% on the MIT-BIH database and 84.2% on the SPH database, so it is much more stable across databases than the classifier proposed in [37]. Table 3 also compares the computational complexity of feature extraction for both classifiers. The computational complexity of finding the AR coefficients is $O(p^3) + O(p^2N)$ and that of the DWT is $O(LN)$, where $L$ is the number of decomposition levels and $N$ is the number of samples in one heartbeat; $\alpha$ denotes the computational complexity of finding the R peaks. In our algorithm, finding the R peaks using FrFT has a computational complexity of $O(N\log_2 N)$. In [37], annotated R peaks were used instead of estimated ones, so there is an additional computational cost, denoted by $\eta$, that depends on the R-peak detection algorithm used. Assuming the same computational complexity for estimating R peaks, the computational complexity of our proposed classifier is lower by $O(p^3) + O(p^2N)$, the computational cost of the AR model. Table 3 also shows the accuracy and computational complexity obtained when a few other features are added. In terms of both computational complexity and accuracy, PR, RT, age, and sex are the most promising features across databases.
In the second simulation, the steps of the first simulation were repeated with the MLP classifier; the results are also shown in Table 3. Here again, on the MIT-BIH database the accuracy of the MLP classifier with 36 features was 99.8%, but on SPH it decreased to 38.2%. With our proposed four features, the accuracy was 80% on the MIT-BIH database and 90.7% on the SPH database. Therefore, our proposed classifier is more stable with respect to database changes than the other classifiers. Table 4 compares the performance of SVM and MLP on the MIT-BIH and SPH databases in terms of precision, recall, and $F_1$-score for individual CVDs. MLP performed much better than SVM on the SPH database, while for some diseases SVM performed slightly better than MLP on the MIT-BIH database. Therefore, MLP is the better choice across both databases. The confusion matrix for MIT-BIH with the MLP classifier is shown in Table 5; confusion matrices for the other classifiers can be computed in the same way.
Cross-database training and testing. In the third part of the simulation, the MLP classifier was trained on the MIT-BIH arrhythmia database and then tested on the St. Petersburg INCART [22] and SPH [23] databases to classify Normal, RBBB, and PVC heartbeats. Because the three databases have different sampling rates, all signals were resampled to 128 Hz for simplicity. The data extracted from these databases were already free of baseline wander and noise, so no preprocessing was needed. Features such as age, sex, and the PR and RT intervals were extracted. The overall accuracy of the trained model on the INCART and SPH databases was 99.85% and 68%, respectively. The detailed performance of the classifier for the various CVDs in terms of precision, recall, and $F_1$-score is shown in Table 6.
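The paper does not state how the signals were resampled to 128 Hz. The sketch below uses plain linear interpolation (`np.interp`) onto a uniform 128 Hz grid. This approach has no anti-aliasing filter, so it is only reasonable for signals that, like the denoised data described above, have little energy above 64 Hz. A polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, 16, 45)` (360 Hz to 128 Hz) or `resample_poly(x, 32, 125)` (500 Hz to 128 Hz) would be a safer choice. `resample_linear` is a hypothetical helper.

```python
import numpy as np

def resample_linear(x, fs_in, fs_out=128.0):
    """Resample x (sampled at fs_in) onto a uniform fs_out grid by linear interpolation."""
    x = np.asarray(x, dtype=float)
    t_in = np.arange(len(x)) / fs_in
    n_out = int(np.floor(len(x) / fs_in * fs_out))   # samples that fit in the record
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, x)
```

A 10 s MIT-BIH segment (3600 samples at 360 Hz) becomes 1280 samples at 128 Hz, and a slow sinusoid is reproduced with small interpolation error.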
For the SPH database, as shown in Table 6, the classifier was unable to correctly classify RBBB and PVC heartbeats because our algorithm cannot detect inverted, biphasic negative-positive, and biphasic positive-negative T peaks, which may be present in RBBB and PVC. This degrades the overall accuracy of the classifier. There is also a drawback associated with cross-database processing: the classifier works only when the disease features are normalized and the normal-patient features are not, in both training and testing. If normalization is applied to all training and testing data, the accuracy of the classifier degrades further. However, this condition is not realistic and needs further investigation.
In the future, we plan to work on this problem to further increase the overall prediction accuracy.

Conclusion
In this work, a fusion algorithm based on FrFT and TERMA was proposed to detect R, P, and T peaks. Conventional wavelet-transform methods were used to denoise the signals, and the use of FrFT within the TERMA algorithm significantly improved peak-detection performance. We applied the proposed peak-detection algorithm to the MIT-BIH arrhythmia database: it performed slightly better than TERMA in detecting R peaks and significantly better in detecting P and T waves. Moreover, in contrast to TERMA, its performance was independent of the CVD. After peak detection, the results were used to compute the PR and RT intervals as two features of the ECG signal for classification. We used two classifiers with different features and found that MLP performs better than SVM for a variety of ECG signals. Both classifiers were tested on two databases. Finally, we designed a classifier for cross-database training and testing. This is a challenging task, and as far as we know, no previous work has addressed it. Our initial results are promising, and further improving them will be our future work. A demo of the work can be seen at https://www.youtube.com/watch?v=3tfin4sSBFQ. In the first part of the demo video, the algorithm is explained; in the second part, an initial wireless ECG diagnosis system is presented. The ECG signal from the AD8232 ECG module is transmitted using an Arduino and a Bluetooth transmitter and is received by the Bluetooth receiver of an Android mobile phone running an app that displays the signal on the screen. The initial version of the app only displays the raw signal; the algorithms proposed in this paper will be added to the app in ongoing work.