Classification of ECG signals using multi-cumulants based evolutionary hybrid classifier

Every human being has a different electrocardiography (ECG) waveform, and this waveform provides information about the well-being of the human heart. An ECG waveform can therefore be used as an effective identification measure in biometrics and many other applications of human identification. To achieve fast and accurate identification of human beings using ECG signals, a novel, robust approach is introduced here. The ECG databases utilized during experimentation are the MLII, UCI repository arrhythmia and PTBDB databases. All of these databases are imbalanced; hence, resampling techniques are used to balance them. Noise removal is performed with the discrete wavelet transform (DWT), and features are obtained in terms of multi-cumulants; the approach rests mainly on these multi-cumulant features extracted from the ECG data. The multi-cumulant features are classified using a kernel extreme learning machine (KELM), and the parameters of both the multi-cumulants and the KELM are optimized using a genetic algorithm (GA). An excellent classification rate is achieved: 100% accuracy on the MLII and UCI repository arrhythmia databases and 99.57% on the PTBDB database. Comparison with existing state-of-the-art approaches is also performed to demonstrate the efficacy of the proposed approach. The classification process in the proposed approach is named the evolutionary hybrid classifier.

This concept of fuzzy logic was utilized again when its extended version, type-2 fuzzy clustering, was implemented with the wavelet transform and a neural network for the classification of ECG signals14. In 2001, Lena Biel et al. experimented and concluded that a single lead can be sufficient for extracting features from an ECG and recognizing a person15. Mohamed I. Owis et al. proposed a model using features obtained from the Lyapunov exponent and correlation dimension for the successful detection and classification of ECG signals16. An ECG signal coder with low computational complexity was designed in 2004 by applying N-PR cosine-modulated filter banks to achieve a low bit rate17. M.G. Tsipouras et al. exploited the RR interval in ECG waveforms for beat and episode classification, achieving 98% and 94% accuracy respectively18. ECG is also helpful in identifying an individual: in 2005, Steven A. Israel et al. computed features from the fiducial points of filtered ECG data and characterized the uniqueness of an individual19. The work on knowledge-based ECG interpretation was further extended when S. Mitra generated a rule-based, rough-set-based ECG classification in 2006. They introduced an offline system for ECG data acquisition which produced noisy data; therefore, proper noise removal and baseline correction were performed before applying the proposed method, and the respective peaks in the ECG waveforms were detected for classification20. A method for noise removal was also proposed by B. N. Singh and A. K. Tiwari in 2006, which utilized a mother wavelet basis function for denoising ECG signals while retaining the ECG peaks as in the noisy data21. In the same year, a comparative study of ECG descriptors for heartbeat classification was proposed, contrasting morphological with time-frequency descriptors.
Morphological features include QRS pattern recognition, while computing expansion coefficients using the matching pursuits algorithm gave the time-frequency correlation. The heartbeats were taken from the MIT-BIH arrhythmia database and four local sets of GLS, and classified using a k-nearest neighbour classifier22,23. Very good accuracy was achieved with both descriptors24. Yeong Pong Meau et al. introduced a novel technique for ECG classification in 2006: a hybrid of the extended Kalman filter and a neuro-fuzzy system, helpful in distinguishing various abnormal ECG signals. Due to the use of a multi-layer perceptron network in the neuro-fuzzy system, the technique is iterative and hence its computational complexity is high25. In further research, the DWT was utilized to decompose the ECG into the time and frequency domains to compute wavelet coefficients, and classification of ECG beats was performed using a multiclass support vector machine26. Independent component analysis was also implemented to decompose ECG signals into a weighted sum of basic components that are statistically mutually independent; a feature vector was formed by combining these components with the RR interval and classified using various classifiers such as Bayes, minimum-distance and neural network classifiers27,28. When these independent component analysis and RR interval features were combined with wavelet transform features, 99.3% accuracy was achieved using an SVM on 16 classes of the MIT-BIH database29. In 2008, Argyro Kampouraki et al. utilized statistical analysis for feature extraction on two ECG databases: young versus elderly ECG signals, and normal versus abnormal ECG signals. The classification was performed using an SVM even at very low signal-to-noise ratios30. The parameters of the SVM, namely the Gaussian radial basis function (RBF) parameter and the penalty parameter, were also optimized using a genetic algorithm (GA) for ECG arrhythmia classification31.
The authors then performed the same task with the optimization algorithm changed from GA to PSO, giving better results than the earlier approach; PSO-based optimization is also faster than GA32. In medical applications, ECG was introduced for age classification by M. Wiggins et al. with the help of a genetically optimized Bayesian classifier achieving 86.25% AUC, better than other existing methods1. Turker Ince et al. proposed a method for ECG pattern recognition applying the wavelet transform for feature extraction and PCA for dimensionality reduction; classification was performed using a neural network optimized with particle swarm optimization (PSO), and the method achieved high accuracy even on larger databases33. PCA was also combined with linear discriminant analysis (LDA) for feature reduction, and using a probabilistic neural network classifier, ECG arrhythmias were classified with 99.71% accuracy34. In 2009, Walter Karlen et al. combined the fast Fourier transform and an artificial neural network to classify sleep and wake states in ECG signals obtained from wearable sensors; 86.7% accuracy was achieved on multiclass data, a satisfactory performance2. Sleep apnea was also detected by Baile Xie and Hlaing Minn in 2012 using the saturation of peripheral oxygen and a combination of various classifiers35. Comparisons of the DWT, continuous wavelet transform (CWT) and discrete cosine transform (DCT) were performed on the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) database36 using a neural network and SVM by Hamid Khorrami and Majid Moavenian in 201037. Then, Yüksel Özbay and Gülay Tezel introduced a neural network with an adaptive activation function for ECG classification; the accuracy achieved over ECGs of 92 patients was 98.19%, which is quite good38,39. The Teager energy function was first utilized for ECG beat classification by C. Kamath in 2011.
The advantage of using the Teager energy function for ECG is that it models the energy of the source in such a way that the activity of the heart is easily visible in the function40. ECG recordings can also be used for recognizing the emotions of a person: this was introduced by Guo Xianhai, who utilized a radial basis function neural network and achieved an accuracy of 91.67%41. In the present work, features are extracted from the ECG signals in terms of multi-cumulants66, which have never before been utilized in ECG analysis. The features obtained from the second-, third- and fourth-order cumulants are concatenated to form a feature vector, which is then used for classification. The classification is performed with a non-iterative machine learning method, the kernel extreme learning machine (KELM)67, whose parameters are optimized using an optimization algorithm. This hybrid of the KELM and the optimization algorithm is termed here the evolutionary hybrid classifier. The remainder of the paper is organised as follows: "Preliminaries" gives a brief idea of the preliminaries and databases used during experimentation with the proposed approach, "Proposed method" describes the proposed method of ECG analysis, and "Experimental results and analyses" discusses the experimental results and analyses, followed by the conclusion in "Conclusion and future scope".

Preliminaries
The proposed method for the recognition of ECG signals consists of three steps: pre-processing, feature extraction and classification. A basic overview of a general recognition system is shown in Fig. 2; a detailed block diagram is explained in the next section. The pre-processing step includes balancing of the ECG databases utilized during experimentation and noise filtering of the ECG signals, performed with the help of resampling techniques and the wavelet transform respectively. Feature extraction is done using cumulants, and a feature vector of multi-cumulant features is obtained. The pre-processing and feature extraction stages are combined and referred to as feature detection in this approach. Finally, the ECG signals are classified into their respective classes using the evolutionary hybrid classifier. A brief overview of all the preliminaries used is given as follows:

Resampling techniques. The first step of pre-processing in the proposed approach is balancing of the database. An uneven number of samples across the classes of a database is quite common; however, it increases the chance of error during classification when the number of samples in one class is very large compared to the other classes, since the approach utilized for sample recognition becomes biased towards the class holding the majority of samples. Data balancing is therefore very important for such uneven or unbalanced databases. Resampling techniques are the techniques used for balancing the number of samples in each class of a database. Many such techniques exist, but they are commonly classified into over-sampling, under-sampling and a hybrid of the two (known as importance resampling)68.
Random oversampling technique. The Random OverSampling Technique (ROST) is a resampling technique utilized for balancing the unbalanced data in a database. It is a non-recursive approach: it randomly copies data from the class having the fewest samples (the minor class) until its number of samples equals that of the class having the most samples in the database (the major class). This is shown in Fig. 3a for better understanding. As can be seen from the figure, before applying ROST, Class 1 is the major class, containing a very large number of samples compared to Class 2 (the minor class); ROST copies samples of Class 2 randomly until it is equal in size to Class 1. This resampling technique is very effective in improving recognition results in machine learning, because duplicating data in the minor classes supports good training of the machine learning approach so that an efficient model can be formed. Along with this advantage, ROST can also result in over-fitting of the data, a substantial drawback of the technique, which is rectified at the classification stage of our proposed method.
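The duplication step of ROST can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the function name and interface are our own:

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Balance a dataset by randomly duplicating minority-class samples
    until every class has as many samples as the largest class."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]            # keep every original sample
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = target - count
        if extra > 0:                     # minor class: draw random copies
            keep.append(rng.choice(idx, size=extra, replace=True))
    sel = np.concatenate(keep)
    return X[sel], y[sel]
```

Because the appended rows are exact copies of existing minority samples, the class counts become equal while the feature distribution of each class is unchanged, which is also why ROST can over-fit.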

Figure 2. Basic overview of a general recognition system (databases, pre-processing, feature extraction and multiclass classification, with pre-processing and feature extraction together forming feature detection).

Importance resampling technique. The Importance ReSampling Technique (IRST) is a hybrid of ROST and RUST (the Random UnderSampling Technique, which discards samples from the major class). IRST overcomes the limitations of ROST and RUST by combining the advantages of both techniques into one. This technique uses importance, or weight, information about the data and reframes the data according to its importance in the database: the weights attached to the data ensure that only the least important data is removed.
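The paper does not give the exact weighting scheme of IRST, but the idea can be sketched as follows, assuming each sample carries an importance weight and each class is resampled to the mean class size (the function name, interface and target-size choice are our own assumptions):

```python
import numpy as np

def importance_resample(X, y, weights, rng=None):
    """Resample every class to the mean class size, drawing samples in
    proportion to their importance weights: minority classes are
    oversampled, while in majority classes the least important samples
    are the most likely to be dropped."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    weights = np.asarray(weights, float)
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(counts.mean()))
    sel = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        p = weights[idx] / weights[idx].sum()
        # replace=True lets minority classes grow past their original size
        sel.append(rng.choice(idx, size=target, replace=True, p=p))
    sel = np.concatenate(sel)
    return X[sel], y[sel]
```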

Since RUST discards an enormous amount of data (as shown in Tables 1 and 2), only ROST and IRST have been utilized for balancing the number of samples in the classes of the databases for the proposed approach. After balancing the data, noise removal is performed on the ECG signals using the wavelet transform.
Wavelet transform. The wavelet transform (WT) takes its origin from the Fourier transform. In the Fourier transform, signals are transformed into the frequency domain so that analysis of the ECG signal can be done easily; this is because computations in the time domain are difficult compared to computations in the frequency domain (for example, convolution in the time domain becomes simple multiplication in the frequency domain). A general equation for the WT (assuming a mother wavelet with finite energy and zero mean) is:

WT(Γ, σ) = (1/√σ) ∫ x(t) Φ*((t − Γ)/σ) dt

Here, WT represents the wavelet coefficients of the convolution of the signal x(t) with the mother wavelet function Φ(t); Γ is the measure of time known as the translation parameter, and σ is the measure of frequency known as the scaling parameter. By taking different combinations of Γ and σ, various shifted and scaled versions of the mother wavelet are generated. There are different families of wavelets: Haar, symlet, coiflet, Daubechies, Mexican hat, B-splines, and many more. In the discrete-time domain, the discrete wavelet transform is defined as:

A[n] = Σ_k x[k] r_l[2n − k],  D[n] = Σ_k x[k] r_h[2n − k]

This is nothing but the decomposition of the signal by successive filtering with low- and high-pass filters. A is the approximation coefficient and D is the detail coefficient; they are obtained by dyadic decomposition of the signal using successive low-pass and high-pass filtering respectively, followed by downsampling by two. r_h and r_l are the high- and low-pass filters of the dyadic DWT, each stage having half the cut-off frequency of the previous one.
The scaling and wavelet functions in the discrete WT are represented mathematically as:

φ_{m,n}(t) = 2^{−m/2} φ(2^{−m} t − n),  ψ_{m,n}(t) = 2^{−m/2} ψ(2^{−m} t − n)

where m, n ∈ Z. Figure 4 shows the decomposition of a signal on the basis of the WT. The WT provides a multi-resolution system: a signal containing a discontinuity requires many coefficients of large magnitude under the Fourier transform, but the WT generates only a few significant coefficients around the discontinuity and sets the rest to zero. Hence, better results are achieved in nonlinear approximation with the WT during reconstruction of signals. Because of this advantage, the WT is also helpful in achieving good accuracy in compression and denoising of signals; therefore, the WT is utilized here for denoising the ECG signals. The features are then computed from these denoised signals with the help of cumulants.
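One level of the dyadic decomposition described above can be illustrated with the simplest wavelet, the Haar wavelet (for illustration only; the proposed method uses the db6 wavelet at decomposition level 10):

```python
import numpy as np

def haar_dwt(x):
    """One level of the dyadic DWT with the orthonormal Haar filters:
    the low-pass branch r_l gives the approximation coefficients A,
    the high-pass branch r_h gives the detail coefficients D."""
    x = np.asarray(x, float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation A
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail D
    return a, d

def haar_idwt(a, d):
    """Inverse transform: rebuild the even/odd samples and interleave."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x
```

Applying `haar_dwt` repeatedly to the approximation A yields the multi-level dyadic decomposition, and `haar_idwt` reconstructs the signal exactly when no coefficients are discarded.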

Cumulants.
Higher-order statistics are stated in terms of moments (n_m) and cumulants (K_m). The cumulants K_m are the set of components generated from non-linear combinations of the moments69. The generating function f(t) is also helpful in defining the K_m; for a random variable Y, f(t) is represented as:

f(t) = ln E[e^{tY}]

where E is the statistical expectation, defined for a random variable Y having probability distribution function g(y) as:

E[Y] = ∫ y g(y) dy

The cumulants K_m are obtained from the power series expansion of the cumulant generating function:

K_m = d^m f(t)/dt^m evaluated at t = 0

Therefore, K_m can be defined through the Maclaurin series expansion in which the mth-order cumulant is the mth derivative of f(t) at t = 0. The cumulants K_m can then be represented using combinations of the moments n_m. The moments up to order l of a signal z(m), whose moments depend only on the time differences ǫ_1, ǫ_2, ..., ǫ_{l−1} (with m = 0, ±1, ±2, ±3, ±4, ...), are given in terms of E as:

n_l(ǫ_1, ǫ_2, ..., ǫ_{l−1}) = E[z(m) z(m + ǫ_1) ... z(m + ǫ_{l−1})]

Hence, using this equation, the first-order cumulant K_1 is

K_1 = n_1 = E[z(m)]

which is equal to the first-order moment, the mean value of the signal z(m). Similarly, the second-order cumulant K_2(ǫ) is

K_2(ǫ) = n_2(ǫ) − n_1^2

Here n_2(ǫ) is the second-order moment, defined as the autocorrelation, and K_2(ǫ) denotes the second-order cumulant, which at zero lag is the variance. For zero-mean variables, K_2(ǫ) = n_2(ǫ). The third-order cumulant K_3(ǫ_1, ǫ_2) is

K_3(ǫ_1, ǫ_2) = n_3(ǫ_1, ǫ_2) − n_1[n_2(ǫ_1) + n_2(ǫ_2) + n_2(ǫ_2 − ǫ_1)] + 2n_1^3

in which n_3(ǫ_1, ǫ_2) is the third-order moment. K_3(ǫ_1, ǫ_2) describes the skewness of the signal; for a symmetric signal, K_3(ǫ_1, ǫ_2) becomes zero, and for zero-mean variables the cumulants are equal to the moments up to third order, i.e. K_3(ǫ_1, ǫ_2) = n_3(ǫ_1, ǫ_2).
The fourth-order cumulant K_4(ǫ_1, ǫ_2, ǫ_3) is required because, even under the zero-mean condition, both the fourth- and second-order moments are needed to compute it:

K_4(ǫ_1, ǫ_2, ǫ_3) = n_4(ǫ_1, ǫ_2, ǫ_3) − n_2(ǫ_1) n_2(ǫ_3 − ǫ_2) − n_2(ǫ_2) n_2(ǫ_3 − ǫ_1) − n_2(ǫ_3) n_2(ǫ_2 − ǫ_1)

where n_4(ǫ_1, ǫ_2, ǫ_3) is the fourth-order moment and the signal is assumed to have zero mean. The fourth-order cumulant describes the kurtosis of the signal. If the cumulants are considered in the frequency domain, they are obtained by taking the Fourier transform of the time-domain cumulants. The Fourier transform of the third-order cumulant is

S(ϕ_1, ϕ_2) = Σ_{ǫ_1} Σ_{ǫ_2} K_3(ǫ_1, ǫ_2) e^{−j(ϕ_1 ǫ_1 + ϕ_2 ǫ_2)} = Z(ϕ_1) Z(ϕ_2) Z*(ϕ_1 + ϕ_2)

where S(ϕ_1, ϕ_2) is the bispectrum of z(m), K_3(ǫ_1, ǫ_2) is the third-order cumulant and Z(ϕ) is the Fourier transform of z(m).
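The zero-mean formulas above translate directly into sample-average estimators. The sketch below is ours, not the paper's code, and it assumes the direct lags e1, e2, e3 are non-negative (lag differences are handled through the symmetry of the autocorrelation):

```python
import numpy as np

def _lagged_mean(z, lags):
    """Sample mean of the product z(m)·z(m+lag_1)·... over all m for
    which every shifted copy stays inside the signal (lags >= 0)."""
    n = len(z) - max(lags)
    prod = z[:n].copy()
    for lag in lags:
        prod *= z[lag:lag + n]
    return prod.mean()

def cumulants(z, e1, e2, e3):
    """Estimates of K_2(e1), K_3(e1, e2) and K_4(e1, e2, e3) for a
    signal forced to zero mean, using the zero-mean relations
    K_2 = n_2, K_3 = n_3 and the moment expansion of K_4."""
    z = np.asarray(z, float)
    z = z - z.mean()                      # enforce the zero-mean condition
    k2 = lambda e: _lagged_mean(z, [abs(e)])   # n_2 is symmetric in the lag
    K2 = k2(e1)
    K3 = _lagged_mean(z, [e1, e2])
    K4 = (_lagged_mean(z, [e1, e2, e3])
          - k2(e1) * k2(e3 - e2)
          - k2(e2) * k2(e3 - e1)
          - k2(e3) * k2(e2 - e1))
    return K2, K3, K4
```

At zero lags these reduce to the familiar variance, third central moment (skewness numerator) and excess-kurtosis numerator E[z^4] − 3·Var(z)^2.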
Similarly, the Fourier transform of the fourth-order cumulant defines the trispectrum:

Q(ϕ_1, ϕ_2, ϕ_3) = Σ_{ǫ_1} Σ_{ǫ_2} Σ_{ǫ_3} K_4(ǫ_1, ǫ_2, ǫ_3) e^{−j(ϕ_1 ǫ_1 + ϕ_2 ǫ_2 + ϕ_3 ǫ_3)}

where Q(ϕ_1, ϕ_2, ϕ_3) represents the trispectrum of z(m) and K_4(ǫ_1, ǫ_2, ǫ_3) the fourth-order cumulant. A generalization of the cumulant parameters, with respect to the maximum lag to be computed, is shown in Table 3.
Cumulants have never before been used in the classification of ECG signals. In the approach of V. Sharmila et al., the 3rd-order cumulant was utilized to capture the symmetry of the signal, which was then used in AR modelling to enhance the ECG signal70. This symmetry property of a signal can also be used for classification: it can help in classifying the non-stationary beats in ECG signals, and it is therefore exploited here to obtain features from the ECG signals. For the classification itself, an advanced version of the neural network is used, which is explained in the next sub-section.
Kernel extreme learning machine. The kernel extreme learning machine (KELM) is an extension of the extreme learning machine (ELM). ELM was introduced by G. B. Huang in 200671. It is a non-linear mapping process modelled for a single-hidden-layer feedforward neural network72. Unlike a traditional neural network, ELM is a non-iterative approach, targeting minimization of both the training error and the norm of the output weights. ELM has been an active research topic for over a decade66,67,73-76 because it fuses binary and multiclass classification, can perform both regression and classification, is easy to implement and achieves a high recognition rate.
An ELM model is defined for an r-dimensional input vector and W training samples as Y = {(y_w, τ_w) | w = 1, 2, ..., W}, where the input vector y_w = [y_w1, y_w2, ..., y_wr] has a corresponding target class vector (for c classes) τ_w = [τ_w1, τ_w2, ..., τ_wc]. The ELM model for P neurons in the hidden layer is

f(y_w) = Σ_{p=1}^{P} µ_p κ(ω_p · y_w + β_p)

where µ_p is the weight on the output of the pth hidden node κ_p, and P, ω_p and β_p represent the number of neurons in the hidden layer, the weight vector on the pth neuron of the hidden layer, and the bias on the pth neuron of the hidden layer, respectively. This equation can be re-written in matrix form as

Kµ = T

Thus, the output weight µ on the hidden nodes is given by the pseudo-inverse of K, which in regularized form yields the ELM model

µ = K^T (I/C_R + K K^T)^{-1} T

Here, C_R is the regularization coefficient, a constant whose value must be selected properly for generalized performance of the model. The ELM model has the advantage of low computational complexity, being non-iterative, and minimum error is achieved with proper training. However, the problems of local minima and over-fitting are present in ELM. These problems are overcome by using a kernel matrix with ELM, based on Mercer's condition74,77:

Ω = K K^T,  Ω_{γδ} = κ(y_γ) κ(y_δ)^T

Modifying the ELM model accordingly gives

f(y) = [ψ(y, y_1), ..., ψ(y, y_W)] (I/C_R + Ω)^{-1} T

where κ(y) is the hidden-node output mapping the input data to the hidden-layer feature space. For two samples, say the γth and δth input samples, the kernel function is

ψ(y_γ, y_δ) = κ(y_γ) κ(y_δ)^T

There are various kernel functions which can be used in the kernel-based ELM: the polynomial, Laplacian, sigmoid, wavelet and RBF kernels. Equations for these kernel functions are shown in Table 4.
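Because the KELM solution is closed-form, training reduces to one regularized linear solve. A minimal NumPy sketch with an RBF kernel follows (our own illustration, with one-hot target rows T; function names and defaults are assumptions, not the paper's code):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix psi(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kelm_train(X, T, C_R=100.0, gamma=1.0):
    """Output weights of the kernel ELM: beta = (I / C_R + Omega)^(-1) T,
    where Omega is the kernel matrix over the training samples."""
    omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C_R + omega, T)

def kelm_predict(Xnew, X, beta, gamma=1.0):
    """Class scores [psi(y, y_1), ..., psi(y, y_W)] @ beta;
    argmax over the columns gives the predicted class."""
    return rbf_kernel(Xnew, X, gamma) @ beta
```

Note that training involves no iterations at all, which is the speed advantage the proposed method relies on; the regularization coefficient C_R and the kernel parameter gamma are exactly the two variables the GA later optimizes.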
Any of these kernel functions can be utilized with KELM depending upon the requirement and hence, the kernel based ELM model is defined as the kernel extreme learning machine (KELM). Its architecture is shown in Fig. 5.
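The kernel equations of Table 4 can be written out directly for two feature vectors x and y (a sketch; the parameter defaults n, σ, β, c and gamma are illustrative choices, not values from the paper):

```python
import numpy as np

# Scalar kernel functions from Table 4; x and y are 1-D feature vectors.
def polynomial_kernel(x, y, n=2):
    return (np.dot(x, y) + 1.0) ** n

def laplacian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) / sigma)

def sigmoid_kernel(x, y, beta=1.0, c=0.0):
    return np.tanh(beta * np.dot(x, y) + c)

def rbf_kernel_scalar(x, y, gamma=1.0):
    # RBF (Gaussian) kernel
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)
```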
KELM can be utilized for both binary and multiclass classification78. Here, it is used for multiclass classification of ECG signals. This classification is optimized for minimum percentage error rate using an optimization algorithm, which is explained in the next sub-section. In KELM, the regularization coefficient and the kernel parameter are the two variables whose optimized values affect the recognition of ECG signals.
Optimization algorithm. An optimization algorithm helps in selecting values of the parameters, or variables, at which the percentage error rate is minimized, in order to achieve a good rate of ECG signal classification. This is performed here using the genetic algorithm (GA). GA was introduced by Holland, and further developed by Goldberg, using the concepts of genetics and Darwin's theory79.
In GA, a fitness function is used to check for the best solution: it takes a candidate solution (chromosome) as input and provides a fitness value as output. Various combinations of the parameters are formed and tested as solutions to the problem. These combinations are produced using the three basic steps of GA: parent selection, crossover and mutation. A basic structure representing the GA is shown in Fig. 6.
Table 4. Kernel functions with their equations: polynomial ψ(x, y) = (x^T y + 1)^n; Laplacian ψ(x, y) = e^{−‖x−y‖/σ}; sigmoid ψ(x, y) = tanh(βx^T y + c).

Selection. Selection is the process in GA by which initial variables are selected as parent variables to be mated and recombined to produce their off-spring, or children. It is a very important step, as a good selection of the parent variables helps in generating better off-spring and thus better solutions. Selection can be done in various ways: fitness proportionate selection, tournament selection, stochastic uniform sampling, roulette wheel selection, random selection and rank selection. An appropriate selection method is a must for achieving better and fitter solutions: improper selection leads to premature convergence to a suboptimal solution by getting stuck in a local minimum. This problem may also arise because of a small population size. Thus, a good selection of initial variables is necessary so that better off-spring can be generated, leading to better results.
Crossover. Crossover is the step after selection, in which one of several crossover operators is applied to the selected parents and off-spring are produced from the genetic properties of the parents. These operators include uniform crossover, partially mapped crossover, Davis' order crossover, shuffle crossover, whole arithmetic recombination, ring crossover, order-based crossover, one-point crossover and multi-point crossover. Various combinations of parent chromosomes are formed to obtain the child chromosomes using these operators.
Mutation. Mutation is a fine adjustment in a child chromosome that produces a new chromosome. It is performed to maintain diversity in the genetic population so that the search space can be explored widely, and it is an essential step for the convergence of GA. It also uses operators, commonly including swap mutation, inversion mutation, scramble mutation, random resetting and bit-flip mutation; these are chosen according to the requirements of the problem to be solved. In GA, the population is initialized either randomly or with some other heuristic, and parent chromosomes are selected for mating. The value of the fitness function (or objective function) is computed; then the crossover and mutation operators are applied to the parent chromosomes to produce child chromosomes, for which the fitness function value is computed again. The two values are compared, and the chromosomes giving the best solution generate the chromosomes of the next generation. This step repeats until the termination criterion is reached.
The termination criterion is very important for ending a GA run. Some conditions that can be utilized to stop a GA run are: when the number of iterations (or generations) reaches a maximum, when the population size becomes equal to the number of chromosomes validated, or when the best fitness function value becomes equal to the mean of the fitness function values over all iterations.
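The selection, crossover and mutation steps described above can be sketched as one compact loop. This is a generic real-coded GA for minimization, written as an illustration (the operator choices and rates are ours; the paper's own GA uses uniform crossover and random resetting, as described in the next section):

```python
import numpy as np

def genetic_algorithm(fitness, bounds, pop_size=20, generations=50, rng=None):
    """Minimal real-coded GA (binary tournament selection, uniform
    crossover, random-resetting mutation) minimizing `fitness`.
    `bounds` is a (dim, 2) array of lower/upper limits per gene."""
    rng = np.random.default_rng(rng)
    bounds = np.asarray(bounds, float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    best, best_fit = None, np.inf
    for _ in range(generations):
        fits = np.array([fitness(c) for c in pop])
        if fits.min() < best_fit:                  # remember the best so far
            best_fit, best = fits.min(), pop[fits.argmin()].copy()
        def tournament():
            i, j = rng.integers(pop_size, size=2)  # binary tournament
            return pop[i] if fits[i] < fits[j] else pop[j]
        children = []
        for _ in range(pop_size):
            a, b = tournament(), tournament()
            mask = rng.random(dim) < 0.5           # uniform crossover
            child = np.where(mask, a, b)
            reset = rng.random(dim) < 0.1          # random-resetting mutation
            child = np.where(reset, rng.uniform(lo, hi), child)
            children.append(child)
        pop = np.array(children)
    return best, best_fit
```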
Databases used. Three ECG databases are used during the experimentation: the MLII, PTBDB and UCI repository arrhythmia databases. In the UCI repository arrhythmia database, each sample carries 280 attributes. The first four attributes represent general details about the sample, viz. age, sex, height and weight, while the next 275 attributes are parametric details of the ECG signal, including the duration of the QRS complex, the duration between the onset of the P wave and the Q wave, between the Q wave and the offset of the T wave, the duration between two consecutive P waves, etc.; the 280th attribute gives the class to which the sample belongs60. Of the attributes, 206 are linear-valued and 73 are nominal. The values are average durations in milliseconds, taken from a 12-lead ECG recording81. The names of the arrhythmia classes with their respective numbers of samples are given in Table 5. In this database, the 11th to 15th attributes contain missing values in each class82. These missing values are filled in by a process given and explained in the next section.

Proposed method
The proposed method is a novel and robust approach to ECG signal classification based on feature vectors obtained with the help of cumulants. The 2nd-, 3rd- and 4th-order cumulants are utilized as the statistical approach to feature extraction: as already stated, the 2nd-order cumulant is helpful in computing the autocorrelation of the signal, and similarly the 3rd- and 4th-order cumulants give the skewness and kurtosis of the signal respectively. These are very useful properties of non-stationary signals such as the ECG, because any small variation in the health of a person appears as a variation in their ECG; such variations can be computed statistically and help in recognizing different types of ECG signals. For speed, a non-iterative method is used for classification. This non-iterative method is hybridized with an optimization algorithm and hence forms the evolutionary hybrid classifier. A block diagram of the proposed approach is shown in Fig. 7.
Here, three databases are utilized to check the robustness of the proposed approach; they are explained in "Databases used". The MLII and PTBDB databases contain proper ECG signals, with 1000 fragments in MLII, and the PTBDB signals cropped and down-sampled to a dimension of 188. The UCI repository database contains ECG parameters as stated above in "UCI repository arrhythmia database". As this database has missing values in the 11th to 15th attributes, to maintain the relevance and reliability of the arrhythmia database it is first necessary to fill these missing values by pre-processing the database. In some earlier research, these missing values were handled by directly removing the rows containing them; researchers also removed the 13th class, which contains unrecognized data of uncertain class. We have retained this data as a separate 13th class. The missing values are dealt with in a pre-processing step, as shown in Fig. 8: the missing values in the 11th to 15th attributes of the UCI repository arrhythmia database are replaced by the standard deviation of all remaining attributes of the respective class. The database containing these corrected values in place of the missing attributes is termed the UCI repository arrhythmia corrected database. The MLII and PTBDB ECG databases do not have any such attributes or missing values, so no correction is required for these two. There are now three databases of three different types: one of complete ECG signals with 1000 fragments (MLII), a second of ECG data down-sampled to a dimension of 188 (PTBDB), and a third of parameter values of ECG signals. All three databases are unbalanced, with one class in the majority compared to the other classes; therefore, all three are balanced using resampling techniques, with ROST and IRST utilized here.
RUST is avoided because it removes samples from the classes and reduces the database to a very small size. Table 6 shows the number of samples in the classes of the various databases used during experimentation, before and after applying the resampling techniques. After balancing the databases, noise removal is performed on the signals of the ECG databases utilizing the DWT. The biology and shape of the ECG signal help in selecting the mother wavelet and the required level of decomposition84. As the Daubechies (db6) wavelet resembles the ECG signal the most, it is used with decomposition level 10. Many disturbances are present in a raw ECG signal, due to motion artifacts, power-line interference and skin-electrode contact85. The ECG signal is first normalized so that the DC offset (125 Hz) can be reduced86 and the variance of the amplitude eliminated.
After that, the signal is denoised using the Daubechies wavelet with six vanishing moments and ten levels of decomposition. The WT decomposes the ECG signal into detail and approximation coefficients as shown in Fig. 4. High-frequency noise is then removed from the signal by discarding the detail coefficients D1-D2, and low-frequency noise is removed by eliminating the low-frequency coefficient A10; this is done with the help of an automatic soft-computing technique. The ECG signal is regenerated by combining the remaining coefficients. One more noise component is still present in the signal: baseline wander, in the range 0.15-0.8 Hz, caused by electrode impedance and respiration in the human body86. It is removed with the help of a moving average filter, and the signal is thereby smoothed. This pre-processing step of the proposed approach is shown in Fig. 7.
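The moving-average step for baseline wander can be sketched as follows. This is our own illustration: the window length (one second, long relative to a heartbeat) and the short smoothing window are assumed values, not parameters stated in the paper:

```python
import numpy as np

def remove_baseline_wander(sig, fs, window_s=1.0):
    """Estimate the slowly varying baseline with a moving-average filter
    whose window is long relative to a heartbeat, subtract it, then
    smooth the result with a short moving average."""
    sig = np.asarray(sig, float)
    w = max(1, int(window_s * fs))
    baseline = np.convolve(sig, np.ones(w) / w, mode="same")
    detrended = sig - baseline            # baseline wander removed
    smooth = np.convolve(detrended, np.ones(5) / 5, mode="same")
    return smooth
```

A long window passes only the sub-hertz drift into the baseline estimate, so subtracting it leaves the QRS morphology largely intact while flattening the wander.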
The smoothed ECG data is then utilized for extracting the features, which are statistical measures in terms of cumulants: the 2nd-, 3rd- and 4th-order cumulants are computed from the smoothed, noise-removed ECG signals. If any noise or disturbance is still present, these features remain helpful, because the 3rd- and 4th-order cumulants are insensitive to Gaussian noise; combining the pre-processing step of noise removal with these higher-order cumulants makes the proposed method more robust for the ECG signals used. The 2nd-, 3rd- and 4th-order cumulants are applied in this method because the 2nd-order cumulant gives the autocorrelation of the signal, the 3rd-order cumulant the skewness, and the 4th-order cumulant the kurtosis of the ECG signal. The 2nd-order cumulant, or autocorrelation, does not contain any phase information70, so it is sufficient only for minimum-phase signals; there are types of phase coupling associated with nonlinear signals, such as the ECG, that cannot be correctly identified with 2nd-order cumulants alone, and in such conditions the higher-order cumulants are useful. The 3rd-order cumulant, or skewness, is a measure of the asymmetry of a distribution about its mean70. A positive value of skewness indicates that the tail of the ECG signal is longer and thinner towards the right side compared to the left, a negative value indicates the opposite, and zero skewness corresponds to a signal symmetric about its mean. Zero skewness can also occur in asymmetric signals in which the asymmetry balances, one tail being short and thick and the other long and thin. In the ECG waveforms, some asymmetry is observed among the four types of ECG datasets used.
The 4th-order cumulant, or kurtosis, of the signal is a measure of the peakedness of its distribution; the peakedness of an ECG waveform is defined by the width of its peaks 70 . Higher kurtosis means that more of the variance is the result of infrequent extreme deviations. The Fourier transforms of the 3rd- and 4th-order cumulants give the bispectrum and trispectrum of the signal, respectively, which can also be used as features. Hence, 2nd-, 3rd- and 4th-order cumulants are used for feature extraction to achieve better accuracy and classification results. The size of the feature vector (N_K) obtained using a cumulant is 87 :

N_K = 2 m_l + 1, (26)

where m_l is the maximum number of lags of the cumulant that needs to be used. The classification is performed with the help of the evolutionary hybrid classifier, a hybrid of an optimization algorithm (GA) and a non-iterative algorithm (KELM). The algorithm for the evolutionary hybrid classifier is also given.
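As an illustration of this feature-extraction step, the sketch below computes the 2nd-order cumulant and the diagonal slices of the 3rd- and 4th-order cumulants of a zero-mean signal over lags -m_l..m_l, and concatenates them into a 3(2 m_l + 1)-dimensional vector. This is a minimal interpretation of the multi-cumulant features, not the authors' code; the circular-shift estimator and the test signal are assumptions.

```python
import numpy as np

def cumulant_features(x, m_l):
    """2nd-, 3rd- and 4th-order (diagonal-slice) cumulants of a zero-mean
    signal over lags -m_l..m_l, concatenated into one feature vector."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    lags = range(-m_l, m_l + 1)

    def shifted(tau):
        return np.roll(x, -tau)  # circular shift as a simple lag estimator

    # C2(tau) = E[x(n) x(n+tau)]  (autocorrelation)
    c2 = np.array([np.mean(x * shifted(t)) for t in lags])
    # C3(tau, tau) = E[x(n) x(n+tau)^2]  (diagonal slice, skewness-related)
    c3 = np.array([np.mean(x * shifted(t) ** 2) for t in lags])
    # C4(tau, tau, tau) = E[x(n) x(n+tau)^3] - 3 C2(tau) C2(0)
    var = np.mean(x * x)
    c4 = np.array([np.mean(x * shifted(t) ** 3) - 3 * c2[i] * var
                   for i, t in enumerate(lags)])
    return np.concatenate([c2, c3, c4])  # length 3 * (2*m_l + 1)

# Hypothetical test signal in place of an ECG beat.
x = np.sin(np.linspace(0, 8 * np.pi, 200))
features = cumulant_features(x, m_l=4)   # 3 * (2*4 + 1) = 27 values
```

At lag zero the 2nd-order entry reduces to the signal variance, which gives a quick sanity check on the estimator.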
In the evolutionary hybrid algorithm, the parameters of KELM are optimized using GA. For this, a population of the parameters ( m_l, C_R, P_K ) is initialized, and the termination criterion is set together with the lower and upper limits of C_R and P_K. The fitness function for the algorithm is the error rate computed using the classifier KELM:

Ϝ = ErrorRate_KELM

ErrorRate_KELM is defined as the total number of incorrect predictions divided by the total number of data samples in the database. Using the confusion matrix, it is defined as

ErrorRate_KELM = (FP + FN) / (TP + TN + FP + FN) × 100,

where the confusion matrix is

                    Predicted positive    Predicted negative
Actual positive            TP                    FN
Actual negative            FP                    TN

Here, TP (true positive) is a correct positive prediction, FN (false negative) an incorrect negative prediction, FP (false positive) an incorrect positive prediction, and TN (true negative) a correct negative prediction. The fitness function value is computed for the initial population and the best fit is obtained from it. The best fit is the set of parameter values ( m_l, C_R, P_K ) giving the minimum Ϝ. These best-fit chromosomes then become the parents for the next generation. The next-generation population is obtained using crossover and mutation operations. A uniform crossover operator is applied, in which each gene is treated separately without dividing the chromosomes into segments; a representation is shown in Fig. 9. After that, a mutation operator, here the random resetting operator, is applied to the resultant chromosomes. In this, one or more genes are selected and their values are replaced with other random values within the given range, as shown in Fig. 10. Again, the fitness function is computed for the new generation and the best fit (minimum Ϝ) is determined. The same process is repeated until the termination criterion is reached or the best fit (minimum Ϝ) becomes equal to the mean value of Ϝ over the last generated population, at which point the run is terminated. Hence, the optimized values of the parameters m_l, C_R and P_K are obtained together with the corresponding percentage error rate (best fit or minimum Ϝ).
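The GA operators described above (uniform crossover and random-resetting mutation) can be sketched as follows. A toy fitness function stands in for ErrorRate_KELM, and the parameter limits and GA settings are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
LOW  = np.array([1.0, 0.01, 0.1])     # lower limits for (m_l, C_R, P_K), illustrative
HIGH = np.array([50.0, 100.0, 10.0])  # upper limits, illustrative

def fitness(chrom):
    """Placeholder for ErrorRate_KELM: the real fitness trains and tests KELM."""
    return np.sum((chrom - (LOW + HIGH) / 2) ** 2)  # toy: distance to midpoint

def uniform_crossover(p1, p2):
    """Each gene is inherited independently from either parent."""
    mask = rng.random(p1.shape) < 0.5
    return np.where(mask, p1, p2)

def random_resetting(chrom, rate=0.2):
    """Selected genes are replaced by fresh random values within the limits."""
    mask = rng.random(chrom.shape) < rate
    fresh = LOW + rng.random(chrom.shape) * (HIGH - LOW)
    return np.where(mask, fresh, chrom)

def evolve(pop_size=20, generations=30):
    pop = LOW + rng.random((pop_size, 3)) * (HIGH - LOW)
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        parents = pop[np.argsort(scores)[:pop_size // 2]]  # best fit become parents
        children = [random_resetting(uniform_crossover(
                        parents[rng.integers(len(parents))],
                        parents[rng.integers(len(parents))]))
                    for _ in range(pop_size)]
        pop = np.vstack([parents[0], *children[:pop_size - 1]])  # keep the elite
    scores = np.array([fitness(c) for c in pop])
    return pop[np.argmin(scores)], scores.min()

best, best_score = evolve()
```

Keeping the current best chromosome in each generation (elitism) guarantees that the minimum Ϝ never increases from one generation to the next.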

Experimental results and analysis
The proposed approach to ECG signal classification undergoes various steps in experimentation. For achieving better results in identifying the signals, pre-processing and feature extraction are performed, which add precision to the proposed approach. The experimental analysis is explained as follows. As already mentioned in "Databases used", three different types of ECG signals are utilized for the analysis of the proposed approach. All these databases suffer from a large difference between the numbers of samples in the majority and minority classes. The imbalance ratio for each ECG database used during experimentation is given in Table 6. The MLII, UCI Repository Arrhythmia Corrected and PTBDB databases have imbalance ratios of 283:10, 245:2 and 309:119, respectively. Imbalanced databases make the classification biased towards the majority class. Therefore, resampling techniques are applied on these databases to balance the number of samples in the majority and minority classes. A comparison of the number of samples per class in the balanced and imbalanced databases is presented in Tables 1 and 2. It can be seen from Tables 1 and 2 that RUST is not an appropriate resampling technique, as it reduces the number of samples, leading to a loss of data which makes the system unreliable. The ROST and IRST techniques do not suffer from loss of data, as they add samples to the classes. Therefore, ROST and IRST are utilized in the proposed approach for balancing the number of samples in the classes of the databases. ROST has a limitation of over-fitting because of the addition of a large number of samples; this problem is overcome by using the evolutionary hybrid classifier. After data balancing, pre-processing is performed with the help of DWT.
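The ROST-style balancing step can be sketched as random oversampling with replacement: minority-class samples are duplicated until every class reaches the majority-class count. This is a generic sketch, not the authors' implementation; the toy data are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """ROST-style balancing: duplicate randomly chosen minority-class samples
    (with replacement) until every class matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xb, yb = [X], [y]
    for cls, cnt in zip(classes, counts):
        if cnt < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - cnt, replace=True)
            Xb.append(X[extra])
            yb.append(y[extra])
    return np.concatenate(Xb), np.concatenate(yb)

# Imbalanced toy data: 8 samples of class 0, 2 of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
Xb, yb = random_oversample(X, y)
```

Because the duplicates are exact copies, oversampling risks over-fitting, which is why the text pairs it with a regularized, non-iterative classifier.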
ECG signals are affected by several types of noise, as explained in the previous section ("Proposed method"). Therefore, noise is removed from these signals using a db6, level-10 WT. The DWT decomposition of an ECG signal is shown in Fig. 11, where the detail (D1-D10) and approximation (A1-A10) coefficients are represented. The decomposition segregates the signal into various frequency bands so that its time-frequency information can be extracted. As can be seen in Fig. 11, A10 contains the lowest frequency band of the ECG signal and D1 the highest. These bands contain the noise in the signal and are hence removed by filtering them out; A10 is targeted for low-frequency noise removal.
For high-frequency noise removal, D1 and D2 are removed. Baseline wander (0.15-0.8 Hz) and powerline interference (60 Hz) are also removed. The ECG signal obtained after removing all these noises is reconstructed as the noise-removed ECG signal, as shown in Fig. 12. In this figure, five samples of original ECG signals are shown together with the corresponding noise-removed ECG signals. After filtering the noise, the next step is to compute features from these noise-free ECG signals; here the 2nd-, 3rd- and 4th-order cumulants come into action as feature extractors.
The 2nd-, 3rd- and 4th-order cumulants are used here for obtaining the statistical features from the ECG signals, as shown for sample signals in Fig. 13. Five samples of noise-free ECG signals are shown with their corresponding 2nd-, 3rd- and 4th-order cumulants. As can be seen from the figure, there are variations in the curves obtained by applying these multi-cumulants. These variations arise from variations in the ECG signal due to various types of arrhythmia. These arrhythmia problems create disturbances in the ECG of a subject, which are reflected in the 2nd-, 3rd- and 4th-order cumulants. The figure also shows that a single type of cumulant is not sufficient as a feature vector, as it is unable to differentiate between different types of ECG signals; the 3rd- and 4th-order cumulants work better for non-linear signals like ECG. Therefore, concatenating the 2nd-, 3rd- and 4th-order cumulants gives a feature vector whose size is computed by multiplying Eq. (26) by 3 (for the concatenation of the three cumulants' features) for the various ECG signals in the databases. These feature vectors are then utilized to classify the ECG signals with the help of the evolutionary hybrid classifier. This classifier uses KELM, which is a non-iterative algorithm and overcomes the problem of overfitting generated by ROST. The kernel function selected for the experimentation is the RBF kernel; its equation is shown in Table 4. The evolutionary part performs the optimization of the parameters m_l, C_R and P_K with the help of GA. Based on this proposed approach, the classification of the different types of ECG signals is now presented according to the databases used for the experimentation.
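A minimal KELM with an RBF kernel can be sketched as a single linear solve, which is what makes the classifier non-iterative: the output weights β are obtained from (I/C + K)β = T, where K is the kernel matrix and T the one-hot targets. The toy data and the C and gamma values below are illustrative assumptions, not the paper's optimized settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

class KELM:
    """Kernel extreme learning machine: a non-iterative, closed-form fit."""
    def __init__(self, C=100.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        T = np.eye(y.max() + 1)[y]                 # one-hot targets
        K = rbf_kernel(X, X, self.gamma)
        # Output weights from one linear solve: (I/C + K) beta = T
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + K, T)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X, self.gamma) @ self.beta

# Toy two-class data: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
y = np.array([0, 0, 0, 1, 1, 1])
model = KELM().fit(X, y)
pred = model.predict(X).argmax(axis=1)
error_rate = np.mean(pred != y) * 100   # the quantity minimised by the GA
```

In the evolutionary hybrid classifier, the GA would repeatedly call a fit/predict cycle like this with candidate (m_l, C_R, P_K) values and use the resulting error rate as the fitness Ϝ.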

MLII ECG database.
After applying balancing, pre-processing and feature extraction to the MLII ECG Database, classification of the ECG signals is performed based on the obtained feature vectors of size 2703 units, computed using Eq. (26). The original signal (size 3600 units) is thus reduced to 2703 units; hence, dimension reduction is also obtained with the proposed approach. As shown in Table 1, after balancing using the IRST technique, each of the 17 classes of the MLII database has 59 samples, giving a total of 1003 samples. These are divided according to Train-Test ratios. The combinations of ratios taken for this database are: 20-80, 30-70, 40-60, 50-50, 60-40, 70-30, 90-10 and 95-05. For each of these ratios in turn, KELM is trained, after which testing is performed on the test samples. The values of the KELM parameters ( C_R, P_K ) and the feature vector parameter m_l are obtained by optimizing KELM using GA. All the results obtained for this database (for IRST-based balancing) are shown in Table 7, together with the values of m_l, C_R and P_K. Similarly, the experiment is performed on the MLII ECG Database when data balancing is performed using ROST. In that case, 283 ECG signals are present in each class, with 4811 signals in total. The same sequence of Train-Test ratios is taken here as well, and results are computed by training KELM and optimizing m_l, C_R and P_K using GA. The percentage error rates with the corresponding optimized values of m_l, C_R and P_K are shown in Table 7.
The results obtained in this case are excellent, as a zero error rate is achieved on the MLII ECG Database. Zero error rate is achieved for the 50-50 Train-Test ratio and whenever the training data is more than 50%. For the 40-60 Train-Test ratio, a 2.63% error rate is achieved, meaning 16 ECG signals are misclassified out of 602.
UCI repository arrhythmia database. The UCI repository arrhythmia database is different from the MLII ECG database. As discussed in "Databases used", the UCI repository arrhythmia database contains parametric observations of the ECG signals, such as the P, Q, R, S, T peak values, their time durations, and the distances between various peaks. After performing the same operations of the proposed approach on this database, two balanced databases (IRST- and ROST-based) are obtained, and the Train-Test ratios taken are 10-90, 20-80, 30-70, 40-60, 50-50, 60-40, 70-30, 80-20 and 90-10. Here too, KELM is trained with each training set and testing is performed with the remaining test set. The results are shown in terms of percentage error rate in Table 8. The values of the KELM parameters ( C_R, P_K ) and the feature extraction parameter m_l are likewise obtained by optimizing KELM using GA to achieve the minimum error rate. Since the error rate reduces with increasing training data, the 90-10 Train-Test ratio provides the best result, a 10.96% error rate, in classifying the UCI repository arrhythmia corrected database when IRST is used for data balancing. This means 5 arrhythmias are misclassified out of 46.
Similarly, for the 70-30 Train-Test ratio, a 16.92% error rate is achieved. In the second case, when ROST is utilized for resampling the database, the results achieved are excellent. Here, 3185 samples are present in the database, with 245 samples of each class. A zero error rate is achieved when 30% or more of the data is used for training; that is, with only 30% of training data, a zero error rate is achieved on this database.

PTBDB ECG database. For the PTBDB ECG database balanced using IRST, the results obtained with the proposed approach are shown in Table 9, where the values of the parameters m_l, C_R and P_K are also given for each Train-Test ratio with the corresponding percentage error rate. A similar operation is performed with the ROST-based balanced PTBDB ECG Database. After processing through the pre-processing and feature extraction steps, the database is fed to the evolutionary hybrid classifier. In this case, 21,010 samples are present, with 10,505 in each class. Although the database becomes quite large, excellent results are still achieved when it is experimented on with the proposed approach. Sequentially varying the Train-Test ratio from 10-90 to 90-10, the database is divided into train and test sets. The results are shown in Table 9. The best result is achieved with the 90-10 Train-Test ratio, i.e., a 0.43% error rate, meaning nine signals are misclassified out of 2101 signals of the PTBDB Database. Similarly, with the 70-30 ratio, a 0.35% error rate is achieved, giving 22 misclassifications out of 6303. Considering the size of this database, the results achieved with the proposed evolutionary hybrid classifier are excellent.
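The reported misclassification counts follow directly from the percentage error rates and test-set sizes; a quick arithmetic check of the figures quoted above:

```python
# Each misclassification count is simply error_rate x test-set size, rounded.
def misclassified(error_rate_percent, n_test):
    return round(error_rate_percent / 100 * n_test)

checks = [
    (2.63, 602, 16),     # MLII, 40-60 split
    (10.96, 46, 5),      # UCI arrhythmia (IRST), 90-10 split
    (0.43, 2101, 9),     # PTBDB, 90-10 split
    (0.35, 6303, 22),    # PTBDB, 70-30 split
]
results = [misclassified(e, n) == expected for e, n, expected in checks]
```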
Comparison with other approaches. The results obtained on the MLII, UCI and PTBDB ECG databases using the proposed approach are compared with existing state-of-the-art approaches in Table 10. The performance measure selected for reporting the ECG classification is percentage accuracy, and it can be seen that excellent results have been achieved on the utilized databases.
On the MLII ECG database, 100% accuracy is obtained with the proposed evolutionary hybrid classifier with multi-cumulants as the feature extraction step. The proposed method outperforms the other state-of-the-art approaches, viz. KICA + LIBSVM 88 , 1-D CNN 65 , PCAnet + SVM 89 , CNN + LSTM 90 , Evolutionary-Neural System based on SVM 64 , WT-HMM model 91 and Ensemble SVM 92 . These approaches used the MLII database with 4 or 5 classes, merging the samples depending upon their category of cardiac disorders [88][89][90][91][92] . Only 64,65 utilized the MLII database with all 17 classes and achieved accuracies of 90% and 91.33% respectively, obtained with a 70-30 Train-Test ratio. In our proposed approach, however, 100% accuracy is achieved on the MLII database (with ROST balancing) with a 50-50 Train-Test ratio, which also signifies that better results are achieved even with less training data. On the UCI repository arrhythmia database, the existing approaches 101 have achieved quite promising results, as shown in Table 10, except the GWMD-DE technique 59 . In 59 , with only a 50-50 Train-Test ratio, an accuracy of 96% was achieved in classifying the ECG signals of the UCI arrhythmia database. With the proposed approach, we have attained 100% accuracy on this database as well, for Train-Test ratios of 30-70 and above, as shown in Table 8; this result uses ROST for balancing the UCI database. With IRST as the balancing technique, the proposed approach outperforms all other techniques in Table 10 except 59 , achieving 80.19% accuracy with the 50-50 Train-Test ratio.
Finally, the proposed approach is compared on the PTBDB ECG database, having 13 classes, with some of the latest state-of-the-art works on ECG classification. These include Naïve Bayes 94 , RBF SVM 95 , Convolutional Neural Network 96 , Third-order tensor based analysis 50 , Deep Neural Network 98 , CNN with and without feature extraction 57 , Wavelet KELM 99 and DEA-ELM 100 . All these researchers considered 2 classes in the PTBDB ECG database, viz. normal ECG and abnormal ECG (i.e., with some cardiac disorder). The proposed approach outperforms all these techniques by achieving 99.57% accuracy in classifying the ECG signals.
The reason behind achieving excellent results with the proposed approach is the use of pre-processing and feature extraction steps before classification. All three ECG databases utilized during experimentation face the problem of imbalance (Table 6), which is overcome by ROST and IRST. Noise removal is performed using the Daubechies (db6) wavelet with decomposition level 10. Features are extracted in terms of statistical parameters using higher-order cumulants. In the classification step as well, the evolutionary hybrid classifier optimizes the various parameters to achieve precise results with the proposed approach.

Table 10. Comparison of the proposed approach with existing state-of-the-art approaches. KNN, k-nearest neighbour; RBF, radial basis function; SVM, support vector machine; KICA + LIBSVM, kernel independent component analysis + library for SVM; CNN, convolutional neural network; PCAnet, principal component analysis network; LSTM, long short-term memory; GWMD-DE, Gabor wavelet multi-linear discriminant based data extraction; WT-HMM, wavelet transform-hidden Markov model; DEA-ELM, extreme learning machine using differential evolution algorithm.
Conclusion and future scope. For the classification of ECG signals, a novel and robust approach has been introduced. Refining ECG signals using pre-processing techniques and extracting features using statistical measures has proved to be a precise and efficient approach for the classification of ECG signals. The proposed approach provides excellent results irrespective of the signal type, whether complete ECG signals, ECG data consisting of parameters obtained from the signals, or down-sampled ECG signals. The use of the evolutionary hybrid classifier also helps in computing the results more precisely. This approach, based on feature extraction using multi-cumulants, gives 100% accurate results for the complete ECG signals of the MLII database and for the UCI repository arrhythmia database, in which the data are parameters obtained from ECG signals. The results obtained on the PTBDB database are also very good, with maximum percentage accuracies of 99.24% (with IRST) and 99.57% (with ROST). The use of ROS techniques for balancing the databases, however, slows the processing by increasing the execution time, as the size of the database becomes very large; here, this problem of speed is compensated by using the non-iterative classifier (KELM). The results obtained are better than those of the existing state-of-the-art approaches, as shown in the previous section. As future work, the proposed method can be tested on live ECG databases. More resampling techniques can also be tested for data balancing, since the size of the database becomes very large with ROST and creates an issue of slow processing. Further refinement can be done on down-sampled ECG databases like the PTBDB database to achieve even more accurate results.