The hidden waves in the ECG uncovered revealing a sound automated interpretation method

A novel approach for analysing cardiac rhythm data is presented in this paper. Heartbeats are decomposed into the five fundamental P, Q, R, S and T waves plus an error term that accounts for artifacts in the data, which provides a meaningful, physical interpretation of the heart's electric system. The morphology of each wave is concisely described using four parameters that allow all the different patterns in heartbeats to be characterized and thus differentiated. This multi-purpose approach addresses such questions as the extraction of interpretable features, the detection of the fiducial marks of the fundamental waves, the generation of synthetic data and the denoising of signals. Yet the greatest benefit of this new discovery will be the automatic diagnosis of heart anomalies, as well as other clinical uses, with great advantages compared to the rigid, vulnerable and black-box machine learning procedures widely used in medical devices. The paper shows the enormous potential of the method in practice; specifically, the capability to discriminate subjects, characterize morphologies and detect the fiducial marks (reference points) is validated numerically using simulated and real data, showing that it outperforms its competitors.


Introduction
The importance of the ECG signal in the diagnosis and prediction of cardiovascular diseases is well established. The process recorded in the ECG is the periodic electrical activity of the heart. This activity represents the contraction and relaxation of the atria and ventricles, processes related to the crests and troughs of the ECG waveform, labelled P, Q, R, S and T (see Figure 1 (a)). The main features used in medical practice are related to the locations and amplitudes of these waves. A standard ECG signal is registered using twelve leads calculated from different electrodes, with Lead II being the reference one.
Mere visual inspection of ECG signals, even by an experienced expert, is not enough to discover the diversity of abnormalities and the specific characteristics of the morphology of each ECG. Moreover, it requires an enormous amount of expert human resources. Therefore, a rigorous automatic analysis of digitized ECG signals can be of great help. However, although the question has received a lot of attention in the literature over the last decades, there is still no suitable mathematical model or computational approach that accurately describes the spectrum of morphologies in ECG signals, as noted in recent references on this topic, such as [1], [2], [3], [4] or [5], among others.
The literature addressing the automatic interpretation of the ECG is so extensive that it is difficult to include a complete review here. The most widely used model-based approach describes the main waves of a single or average beat with a combination of basic functions, Gaussians being the preferred ones. A precursor model was proposed by [6] and was more recently considered by [7] and [8], among others, who proposed improvements in the formulation and estimation algorithms. [9] also recently used this approach for the predictive modelling of drug effects on ECG signals. These approaches have important shortcomings. In particular, Gaussian functions fail to reproduce the morphology of the waves in a simple way, especially for atypical and noisy ECGs, where the complexity as well as the risk of overfitting increase. Moreover, most of the parameters do not have a specific morphological meaning. Other examples of model-based proposals are those by [10], [11], [12], [13] or [14]. These approaches may be suitable for studying some specific questions, but they are far from being multi-purpose methods.
However, many of the recent papers are contributions to computational and machine learning approaches. Some of the many references are [15], [16], [17], [18], [19], [20] or [21]. Also, the papers by [4], [22], [23], [24], [25] and [26] extend the list of procedures and discuss their pros and cons for the automatic analysis of ECGs. In general, the success of machine learning approaches depends heavily on the training set, the selection of diagnostic groups, the preprocessing and the database. Furthermore, they are rigid, black-box procedures that are susceptible to adversarial attacks [27].
The approach presented in this paper, called FMMecg, is just the opposite.
This novel approach combines a physically meaningful formulation with good statistical and computational properties. FMMecg is a multicomponent model in which each component is a single FMM (Frequency Modulated Möbius) oscillator and specific ECG parameter restrictions are included. Single FMM models were recently proposed by [28] to predict oscillatory signals in several different fields, from biology to astrophysics. The distinguishing feature of the FMM model is that it is formulated in terms of the phase, which is the angular variable that represents the periodic movement of the oscillation. Specifically, the FMMecg model is defined as the combination of exactly five oscillatory components, referred to as waves, $W_J(\cdot)$, $J = P, Q, R, S, T$, which correspond to the fundamental waves in a heartbeat, plus an error term that accounts for artefacts in the data. Four parameters characterize each wave, and a Maximization-Identification (MI) algorithm is designed to estimate them. This algorithm alternates iteratively between a maximization M-step and a wave-identification I-step. While the model proposal is valid for signals registered elsewhere, the I-step is lead-specific. Nevertheless, the I-step can be easily adapted to signals registered in other regions.
The main virtues of the novel approach can be summarized in five points, which are validated in the paper. Firstly, the FMMecg model is physically meaningful, representing the conduction of the electrical signal by the combination of the five main waves present in a normal heartbeat. Therefore, alterations in a specific wave identify the part of the heart responsible. Secondly, for each wave, four parameters are extracted, measuring amplitude, location, scale and shape. These parameters are able to characterize, reproduce and identify the variety of morphologies observed in real ECG signals. In addition, other interesting features are easily derived from these main parameters. Thirdly, the MI algorithm provides accurate and robust estimates of the model parameters while avoiding overfitting problems. Fourthly, the approach does not depend on a training set and is valid for any registered ECG signal, independently of the preprocessing, frequency or scale. Finally, the approach has strong theoretical properties: it is maximum-likelihood based under the assumption of Gaussian errors, the parameters are identifiable and the estimators are consistent.
The validation of the FMMecg approach is not simple, as there are many properties that the model is supposed to satisfy. Moreover, there is no multi-purpose approach in the literature similar to FMMecg. Therefore, the main properties of FMMecg are validated against diverse alternative approaches. On the one hand, for global goodness of fit, consistency, robustness and discriminative power, FMMecg is compared with a model-based approach that considers a combination of Gaussian components, similar to that proposed by [8]. On the other hand, the ability to detect fiducial marks is compared with several recent machine learning approaches, in particular those considered by [18]. In this paper, we deal with signals from Lead II and close to it. Simulated data and publicly available data from databases in Physionet (www.physionet.org) [29] are used. Very promising results have been obtained from real data. For example, Figure 1 shows the result of applying FMMecg to data from patient sel106 in the MIT-BIH database, a representative, typical pattern used by many authors. The waves drawn in Figure 1 (a) have not been artificially generated, but are simply the estimators provided by the MI algorithm for the five waves.

Overview of the FMMecg model
Suppose $X(t_i)$, $t_1 < \dots < t_n$, are observations from one beat. Without loss of generality, we assume that $t_i \in [0, 2\pi]$ (in any other case, transform the observed time points as in [28]).
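For $J \in \{P, Q, R, S, T\}$, let $\upsilon_J = (A_J, \alpha_J, \beta_J, \omega_J)'$ be the four-dimensional parameters describing the waveforms in such a way that

$$W_J(t, \upsilon_J) = A_J \cos\left(\beta_J + 2\arctan\left(\omega_J \tan\left(\frac{t - \alpha_J}{2}\right)\right)\right).$$

Then, the FMMecg model is defined as a parametric additive signal plus error model as follows:

Definition 1.

$$X(t_i) = \mu(t_i, \theta) + e(t_i), \qquad \mu(t, \theta) = M + \sum_{J \in \{P, Q, R, S, T\}} W_J(t, \upsilon_J), \qquad i = 1, \dots, n,$$

where $\theta = (M, \upsilon_P, \upsilon_Q, \upsilon_R, \upsilon_S, \upsilon_T)$ satisfies

$$M \in \mathbb{R}, \qquad \upsilon_J \in \mathbb{R}^{+} \times [0, 2\pi] \times [0, 2\pi] \times [0, 1], \qquad \alpha_P \le \alpha_Q \le \alpha_R \le \alpha_S \le \alpha_T,$$

and

$$(e(t_1), \dots, e(t_n))' \sim N_n(0, \sigma^2 I).$$

The circular order restrictions among the $\alpha$'s represent the ordered movement of the stimulus from the sinus node to the ventricles, passing through the atria, thus giving the model physical interpretability. The restrictions guarantee the identifiability of the parameters once the main wave R is located.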
The parameter $M$ is an intercept and the components of $\upsilon_J$ describe different aspects of the morphology of wave $J$. Specifically, the parameter $A_J$ measures the wave amplitude, a zero value indicating that the corresponding wave is not present. The parameter $\alpha_J$ is a location parameter. In addition, $\beta_J$ and $\omega_J$ measure skewness and kurtosis, respectively. More specifically, assuming $\alpha_J = 0$, values of $\beta_J$ close to $\pi$ (or $2\pi$) represent a unimodal symmetric wave (or an inverse unimodal symmetric wave); as $\beta_J$ moves away from these values, the patterns become more asymmetric, and values of $\beta_J$ equal to $\pi/2$ or $3\pi/2$ describe a wave with both crest and trough and a completely asymmetric pattern. The parameter $\omega_J$ measures the sharpness of the peak: $\omega_J = 1$ corresponds to an exact sinusoidal shape and, as $\omega_J$ approaches zero, the sharpness becomes more pronounced (see [28] for more details on parameter interpretation).
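The model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the single-wave form $W(t) = A\cos(\beta + 2\arctan(\omega\tan((t-\alpha)/2)))$ of the FMM model in [28], and the function names (`fmm_wave`, `fmm_ecg_mean`, `alphas_in_circular_order`) are hypothetical. The arctangent is evaluated through `atan2`, which gives the same phase modulo $2\pi$ and avoids the pole of the tangent.

```python
import math

WAVES = ("P", "Q", "R", "S", "T")

def fmm_wave(t, A, alpha, beta, omega):
    """One FMM wave: A*cos(beta + 2*arctan(omega*tan((t - alpha)/2)))."""
    half = (t - alpha) / 2.0
    # Same angle as 2*arctan(omega*tan(half)) modulo 2*pi, without the tan pole.
    phase = beta + 2.0 * math.atan2(omega * math.sin(half), math.cos(half))
    return A * math.cos(phase)

def fmm_ecg_mean(t, M, params):
    """Signal part of the model: intercept plus the five waves.

    params maps each wave label to a tuple (A, alpha, beta, omega).
    """
    return M + sum(fmm_wave(t, *params[j]) for j in WAVES)

def alphas_in_circular_order(params):
    """Check alpha_P <= alpha_Q <= alpha_R <= alpha_S <= alpha_T on the circle.

    Read cyclically, a circularly ordered sequence has at most one descent.
    """
    a = [params[j][1] % (2.0 * math.pi) for j in WAVES]
    descents = sum(1 for i in range(5) if a[i] > a[(i + 1) % 5])
    return descents <= 1
```

With $\omega = 1$ the wave reduces to the sinusoid $A\cos(\beta + t - \alpha)$, and with $\beta = \pi$ the crest sits at $t = \alpha + \pi$, which agrees with the parameter interpretation given in the text.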
Other features extracted from the main parameters are the marks for crest ($t^U_J$) and trough ($t^L_J$) times, defined for $J \in \{P, Q, R, S, T\}$ as follows:

$$t^U_J = \alpha_J + 2\arctan\left(\frac{1}{\omega_J}\tan\left(-\frac{\beta_J}{2}\right)\right), \qquad t^L_J = \alpha_J + 2\arctan\left(\frac{1}{\omega_J}\tan\left(\frac{\pi - \beta_J}{2}\right)\right).$$

Moreover, measurements of inter-wave intervals, such as those in Figure 1 (a), are calculated using angular distances between these marks, and other features, such as those used in the literature on ECG interpretation, can be easily derived from the main parameters. However, while the estimation of features proposed in the literature often depends on the algorithm and on voltage measurements [4], FMMecg provides systematic and reliable measurements.
In the estimation process, additional conditions are imposed to improve wave identification when atypical patterns are observed.
The dependence of the signal, waves and model on the parameters $\theta$ or $\upsilon$ is omitted throughout the paper when it causes no confusion.
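As an illustration of how these marks follow from the parameters, the sketch below computes crest and trough times by solving for the times at which the phase $\beta + 2\arctan(\omega\tan((t-\alpha)/2))$ equals $0$ and $\pi$ (modulo $2\pi$). It assumes the FMM wave form of [28]; the function names are hypothetical and results are reduced to $[0, 2\pi)$.

```python
import math

TWO_PI = 2.0 * math.pi

def fmm_wave(t, A, alpha, beta, omega):
    """Assumed FMM wave form, used here only to check the marks."""
    half = (t - alpha) / 2.0
    return A * math.cos(beta + 2.0 * math.atan2(omega * math.sin(half), math.cos(half)))

def crest_time(alpha, beta, omega):
    """t_U = alpha + 2*arctan(tan(-beta/2)/omega), reduced to [0, 2*pi)."""
    h = -beta / 2.0
    return (alpha + 2.0 * math.atan2(math.sin(h), omega * math.cos(h))) % TWO_PI

def trough_time(alpha, beta, omega):
    """t_L = alpha + 2*arctan(tan((pi - beta)/2)/omega), reduced to [0, 2*pi)."""
    h = (math.pi - beta) / 2.0
    return (alpha + 2.0 * math.atan2(math.sin(h), omega * math.cos(h))) % TWO_PI

def angular_distance(t_from, t_to):
    """Counter-clockwise angular distance from t_from to t_to, in [0, 2*pi)."""
    return (t_to - t_from) % TWO_PI
```

Evaluating the wave at `crest_time` returns $A$ and at `trough_time` returns $-A$, which is a quick sanity check on any set of fitted parameters. An inter-wave interval in angular units can then be obtained with `angular_distance` between two marks.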

Validation
Three different validation analyses have been performed. The first two refer to the QT database [30] and the third is a simulation experiment, which is deferred to the Supplementary Information. The QT database was chosen because it has been used recently by several authors and provides a wide range of morphologies associated with healthy and pathological ECGs. The database contains 105 ECG records, each with signals from two leads. For each patient, we analyse the segment in which the T or P waves have been manually annotated, using the signal closest to Lead II (in most cases, the first signal). For patient sel42, the data from the first signal are not reliable; instead, the inverse of the second signal is analysed, as it represents a signal closer to Lead II.
A total of 3,623 single-beat signals have been analysed. The validation covers the global fit of the model, the identifiability of the parameters, the accuracy and consistency of the estimators, the robustness of the model against noise and the capability to characterize different morphologies, as well as the performance in specific tasks of practical interest, such as subject discrimination or the determination of the fiducial marks of the T and P waves.

Analysis of QT database signals. Graphical and analytical results.
For each single beat, the coefficient of determination, which measures the proportion of the total variance explained by the model, is denoted by $R^2$ and is obtained as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(X(t_i) - \mu(t_i, \hat{\theta})\right)^2}{\sum_{i=1}^{n} \left(X(t_i) - \bar{X}\right)^2}, \qquad (1)$$

where $\bar{X}$ is the mean of the observations in the beat. The $R^2$ values are very high across patients, the global mean (SD) of $R^2$ being 0.98 (0.02). Among the patterns in Figure 2, the first five correspond to the most frequent categories in Physionet's classification of heartbeats by morphology: NORMAL (typical pattern), PACE (paced beat), RBBB (right bundle branch block beat), APC (atrial premature beat) and PVC (premature ventricular contraction); a NOISY pattern is also considered. The NOISY pattern exhibits both low and high frequency noise, as the zoom in the corresponding plot shows. The category-specific mean of $R^2$ is 0.92 for the NOISY pattern and higher than 0.98 for the others.
It is interesting to observe how the specific shapes of the five main waves contribute to drawing the observed pattern of the different morphologies, as shown in Figure 3. The estimated values of the parameters, recorded on the right side of the plots, quantify and describe the patterns and explain the differences between the morphologies.
The potential of the FMMecg parameters to solve the problem of subject identification is also shown. A Fisher linear discriminant analysis is applied, using as predictors $A_J$, $\omega_J$, $\beta_J$; $J = P, Q, R, S, T$ (where missing values are replaced with the median value for the corresponding patient), and the leave-one-out rule is used to estimate the error rate. Only 8.6% of the 3,623 beats are not assigned to the true patient. This error rate is very low, taking into account the difficulty of discriminating among 105 patients. As far as we know, this is the first time that this milestone has been achieved for the QT database, since other authors consider specifically selected sets of patients of a much smaller size ([31] and references therein). Moreover, a complete analysis is provided in the Supplementary Information, including patient-specific plots and statistics for the main FMMecg parameters; see Figures S4-S10 and Table S4, respectively. The results reveal the consistency and reliability of the estimators and a great potential for individual identification tasks.
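The coefficient of determination described above is the usual one minus the ratio of residual to total sum of squares. A minimal sketch follows, where `observed` holds the beat samples and `fitted` the model predictions at the same time points; both names are illustrative.

```python
def r_squared(observed, fitted):
    """Proportion of the total variance of a beat explained by the fit."""
    n = len(observed)
    mean = sum(observed) / n
    sse = sum((x - m) ** 2 for x, m in zip(observed, fitted))  # residual sum of squares
    sst = sum((x - mean) ** 2 for x in observed)               # total sum of squares
    return 1.0 - sse / sst
```

A perfect fit gives 1, and a constant prediction equal to the beat mean gives 0.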

Analysis of QT database signals. P and T wave annotations.
The annotation of P and T waves is still a challenge, as [32], [33], [34] or [35], among others, confirm. Let $t^{FI}_J$, $J = T, P$, be the fiducial FMMecg marks. Whether wave $J$ is positive ($t^{FI}_J = t^U_J$) or negative ($t^{FI}_J = t^L_J$) is determined by $\beta_J$ and $\mu(t^{FI}_J)$. In order to perform a fair comparison with alternative approaches, we follow the analysis in [18]. Several measures are calculated to assess the wave detection: sensitivity ($Se = \frac{TP}{TP + FN}$), positive predictive value ($PPV = \frac{TP}{TP + FP}$), detection error rate ($DER = \frac{FP + FN}{TP + FN}$) and F1 score ($F1 = \frac{2TP}{2TP + FP + FN}$), where TP is the number of true positive detections, FN is the number of false negative detections and FP is the number of false positive detections, that is, detections whose fiducial mark lies outside the range of ±75 ms. Table 1 shows the results, along with those of the four best methods in [18], i.e., Martinez PT, Martinez WT+templates, Martinez WT+PT and Martinez PT+templates. FMMecg gives the best results for all the validation measures and for both P and T peak/trough detection. It is especially striking that the DER is less than half that of the other methods for both T and P wave detection. The accurate detection of waves provided by FMMecg is all the more valuable because the algorithm has not been specifically designed for this task and also serves other purposes. Patient-specific measures are given in Tables S5 and S6. In addition, Figures S11-S16 show cases where the FMMecg annotation is correct but is mistakenly counted as FN or FP. In some of those cases, the Physionet annotation uses information from the second signal or from a nearby beat. In other cases, the FMMecg annotation, although different, is more reasonable than, or at least as reasonable as, the Physionet annotation. These cases indicate that the good FMMecg results in Table 1 could be even better.
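The four measures can be computed directly from the counts of true positives, false positives and false negatives. The sketch below is illustrative only: the greedy one-to-one matching of detected marks to reference annotations within ±75 ms is an assumption made for this example and may differ from the exact counting protocol of [18]; function names are hypothetical.

```python
def match_annotations(detected_ms, reference_ms, tol_ms=75.0):
    """Count TP, FP, FN by matching each reference mark to the nearest
    unused detection within the tolerance (assumed matching rule)."""
    detected = sorted(detected_ms)
    used = [False] * len(detected)
    tp = 0
    for ref in sorted(reference_ms):
        best = None
        for i, d in enumerate(detected):
            if used[i] or abs(d - ref) > tol_ms:
                continue
            if best is None or abs(d - ref) < abs(detected[best] - ref):
                best = i
        if best is not None:
            used[best] = True
            tp += 1
    fn = len(reference_ms) - tp   # reference marks with no detection in range
    fp = len(detected) - tp       # detections matching no reference mark
    return tp, fp, fn

def detection_scores(tp, fp, fn):
    """Se, PPV, DER and F1 as defined in the text."""
    return {
        "Se": tp / (tp + fn),
        "PPV": tp / (tp + fp),
        "DER": (fp + fn) / (tp + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }
```

For instance, detections at 100, 400 and 900 ms against references at 110, 500, 880 and 1300 ms give two matches, one unmatched detection and two missed references.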

Discussion
From the methodological point of view, this paper proposes two contributions that have not been described before in the literature. On the one hand, a regression model with multiple oscillatory components is considered, which is formulated in terms of angular variables that represent the periodic movement of the waves and which incorporates restrictions among the parameters. On the other hand, an original MI estimation algorithm is designed. These methodological contributions have been shown here to be very relevant for the description of the cardiac rhythm, but their potential is greater, as they are likely to be able to solve problems in other fields.
As for the contributions to the automatic diagnosis of cardiovascular diseases and other clinical uses, the highlight of our approach is that it provides a set of new parameters and features with high descriptive potential, which give a concise analytical description of the morphology of the five main waves; specifically, its high capacity for subject identification has been demonstrated. Moreover, it is also very reliable even for abnormal and poor-quality ECGs, it does not use training data and it works independently of preprocessing, scale and frequency.
The FMMecg parameters can be very useful for generating an automatic diagnosis that imitates the recognition skills of human beings, because the values estimated under a given condition can be compared with reference values. In addition, the influence of such factors as age, gender, physical condition, medication, and anatomic or genetic differences can be taken into account. In fact, current automatic diagnosis proposals fail for two main reasons: firstly, because different and unreliable measurements are used; secondly, because problems that differ in origin generate partially similar morphologies and, conversely, a given anomaly is not associated with a single pattern. Using personalized reference ranges avoids false positives in diagnosis and is in line with the global trend towards personalized medicine.
Moreover, the new parameters can be used in experimental trials to test medical and preventive strategies, to study the evolution of the heart's functioning, or in biometric identification.
The limitations of the approach, which are also challenges and extensions for future research, are sketched out next.
Firstly, a catalogue of interesting patterns, together with their parametric characterization, must be elaborated in collaboration with an expert. This question is partially addressed here, but a much more precise and detailed study is needed. This task should incorporate identification algorithms for other leads.
Secondly, there are a few patterns, such as Atrial Flutter, that do not fit well into the five-main-wave paradigm, but for which it is possible to design a specific algorithm. The analysis of multiple leads would also facilitate the wave identification task and provide more accurate results.
Finally, the incorporation of covariates and the definition of multivariate and dynamic models are statistical extensions to be studied that have several clinical applications. Specifically, the covariates would serve to assess the influence of medication or the effect of interventions, and multivariate and dynamic models would serve to describe spatio-temporal behaviours and to model relationships between biological processes.

Methods
The application of our method to the QT database analysis and to the simulations assumes that QRS annotations are provided. The detection of the QRS complex is a highly researched and well-solved problem; interesting references on the subject are [36], [37], [38], [39] and [40], among others. The QRS annotations and RR values (distances between consecutive QRS annotations) provided by Physionet are used to select the specific segment corresponding to a single beat in our data analysis. For a given QRS annotation, $t_{QRS}$, let $RR^-$ and $RR^+$ be the RR values obtained from the previous and the next QRS annotation, respectively. Then, the input for the analysis of a single beat consists of the observations $X(t_i)$, where $t_i \in [t_{QRS} - 0.4\,RR^-,\ t_{QRS} + 0.6\,RR^+]$, $i = 1, \dots, n$, which, before entering the algorithm, pass through a trend removal step to reduce the influence of low-frequency noise, if necessary.
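The beat segmentation rule can be sketched as follows. The window fractions (40% of the previous RR interval before the QRS annotation, 60% of the next one after it) come from the text; the function names are hypothetical, the linear rescaling of the selected time points to $[0, 2\pi]$ is an assumption standing in for the transformation of [28], and the trend removal step is not shown.

```python
import math

def beat_window(qrs_times, k, frac_before=0.40, frac_after=0.60):
    """Time window of the beat around the k-th QRS annotation."""
    if k <= 0 or k >= len(qrs_times) - 1:
        raise ValueError("a previous and a next QRS annotation are required")
    rr_prev = qrs_times[k] - qrs_times[k - 1]   # RR^-
    rr_next = qrs_times[k + 1] - qrs_times[k]   # RR^+
    return qrs_times[k] - frac_before * rr_prev, qrs_times[k] + frac_after * rr_next

def extract_beat(times, signal, start, end):
    """Samples whose time stamp falls inside [start, end]."""
    pairs = [(t, x) for t, x in zip(times, signal) if start <= t <= end]
    return [t for t, _ in pairs], [x for _, x in pairs]

def to_phase(times):
    """Linear map of the beat's time points onto [0, 2*pi] (assumed transform)."""
    t0, t1 = times[0], times[-1]
    return [2.0 * math.pi * (t - t0) / (t1 - t0) for t in times]
```

With QRS annotations at 0.0, 1.0 and 1.8 s, the window for the middle beat runs from 0.6 s to 1.48 s.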
The MI algorithm, described below, uses these input data to derive predicted values for the voltage and features.

MI Algorithm
Consider the model in Definition 1. The estimation problem reduces to solving the following optimization problem:

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} \left(X(t_i) - \mu(t_i, \theta)\right)^2,$$

where $\Theta$ is the parametric space. For a typical ECG pattern, $\Theta$ is simply defined as in Definition 1 through the restrictions among the $\alpha$'s. However, in order to arrive at a correct assignment of letters in the atypical patterns found in real practice, additional restrictions are needed. Mathematically, this means that $\Theta$ is reduced; in practice, the additional restrictions are incorporated as thresholds in the algorithm.
The optimization problem above is computationally intensive and is solved using an iterative algorithm that alternates M and I steps, which provide successive estimators for $W_J$, $J = P, Q, R, S, T$. The M step provides $K \ge 5$ oscillatory components using a backfitting algorithm, and the I step assigns letters to, at most, five of these components. Typically, $K = 5$; however, in the presence of significant noise or when the morphology is pathological, the waves of interest may sometimes be null or be hidden in the sixth or seventh component (very exceptionally in others). For each component, the FMM parameter values and the percentage of explained variance, PV, are computed. The latter is defined as follows:

$$PV_k = R^2_{1,\dots,k} - R^2_{1,\dots,k-1},$$

where $R^2_{1,\dots,k}$, defined in (1), refers to a multicomponent FMM model with $K = k$ components. For atypical patterns, the identification is done using thresholds that have been checked over many previous fits to a wide variety of ECG patterns in Physionet.
The initial values for the components to start the backfitting are those of the waves assigned so far and zero for the rest. The algorithm finishes when there is no significant increase in the percentage of variance explained or when a maximum number of iterations is reached. A threshold of 0.01% for the increase in the percentage of variance explained and a maximum of 10 iterations have been used in the analysis of the QT database.
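The bookkeeping around the explained variance and the stopping rule can be sketched as below. Two points are assumptions of this example rather than statements from the paper: the per-component PV is taken as the increment in $R^2$ when the component is added (with $R^2 = 0$ for the empty model), and the 0.01% threshold is read as 0.01 percentage points of explained variance between consecutive iterations. Function names are hypothetical.

```python
def explained_variance_increments(r2_cumulative):
    """PV_k as the gain in R^2 when component k is added to components 1..k-1."""
    pv, previous = [], 0.0
    for r2 in r2_cumulative:
        pv.append(r2 - previous)
        previous = r2
    return pv

def should_stop(r2_history, min_gain_pct=0.01, max_iter=10):
    """Stop when the iteration budget is spent or the gain is negligible.

    r2_history holds the R^2 of the full fit after each MI iteration.
    """
    if len(r2_history) >= max_iter:
        return True
    if len(r2_history) < 2:
        return False
    gain_pct = 100.0 * (r2_history[-1] - r2_history[-2])
    return gain_pct < min_gain_pct
```

A loop over MI iterations would append the current $R^2$ to `r2_history` after each I step and break as soon as `should_stop` returns true.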

M step:
The backfitting algorithm fits a single FMM component successively to the residuals. To fit a single component, an algorithm adapted from that in [28] is used. The number of backfitting passes depends on the initialization. In the first M step, up to 5 full turns of the backfitting are made.
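To make the backfitting idea concrete, here is a simplified pure-Python sketch. It is not the authors' algorithm: the single-component fit is replaced by a coarse grid search over $(\alpha, \omega)$, using the fact that for fixed $(\alpha, \omega)$ the wave $M + A\cos(\beta + \varphi)$ is linear in $(M, A\cos\beta, -A\sin\beta)$, so those three values follow from ordinary least squares. The grid sizes, function names and the absence of any refinement step or ECG-specific restriction are all assumptions of this example.

```python
import math

def phase_shift(t, alpha, omega):
    """2*arctan(omega*tan((t - alpha)/2)), via atan2 to avoid the tan pole."""
    half = (t - alpha) / 2.0
    return 2.0 * math.atan2(omega * math.sin(half), math.cos(half))

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination; None if near-singular."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return None
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x

def fit_single_fmm(t, y, n_alpha=48, omegas=(0.02, 0.05, 0.1, 0.2, 0.4, 0.7, 1.0)):
    """Grid search on (alpha, omega); (M, A, beta) by linear least squares."""
    best = None
    ones = [1.0] * len(t)
    for ia in range(n_alpha):
        alpha = 2.0 * math.pi * ia / n_alpha
        for omega in omegas:
            ph = [phase_shift(ti, alpha, omega) for ti in t]
            cols = [ones, [math.cos(p) for p in ph], [math.sin(p) for p in ph]]
            gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
            rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
            sol = solve3(gram, rhs)
            if sol is None:
                continue
            m0, d, g = sol  # d = A*cos(beta), g = -A*sin(beta)
            wave = [d * c + g * s for c, s in zip(cols[1], cols[2])]
            sse = sum((yi - m0 - wi) ** 2 for yi, wi in zip(y, wave))
            if best is None or sse < best["sse"]:
                best = {"sse": sse, "M": m0, "A": math.hypot(d, g), "alpha": alpha,
                        "beta": math.atan2(-g, d) % (2.0 * math.pi),
                        "omega": omega, "wave": wave}
    return best

def backfit(t, y, n_components=5, n_passes=2):
    """Refit each component in turn to the residual left by the others."""
    waves = [[0.0] * len(t) for _ in range(n_components)]
    fits = [None] * n_components
    for _ in range(n_passes):
        for k in range(n_components):
            resid = [yi - sum(w[i] for j, w in enumerate(waves) if j != k)
                     for i, yi in enumerate(y)]
            fits[k] = fit_single_fmm(t, resid)
            waves[k] = fits[k]["wave"]
    # The intercept of the last refit completes the current full model.
    return fits[-1]["M"], fits, waves
```

The fitted signal is the returned intercept plus the sum of the returned waves. In practice a local optimizer would refine each grid solution, and the MI algorithm would start the backfitting from the waves assigned so far, as described in the text.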

I step:
The R wave is assigned first. It corresponds to the component, among the top five, with the highest PV among components close to $t_{QRS}$, with $\pi/2 < \beta < 5\pi/3$ (a crest, not a trough), $\omega < 0.12$ (sharp) and maximum $\mu(t^U_J)$ (exceptionally, the second maximum). Next, P, Q, S and T are preassigned to the free components among the first five using $\alpha_P \le \alpha_Q \le \alpha_R \le \alpha_S \le \alpha_T$. This preassignment is the definitive assignment for typical patterns. Successive steps are needed when the preassigned components do not exhibit the expected wave morphology features known from the literature; this can be due to the absence of a wave or to the presence of noisy components. New assignments of letters to components are made using thresholds on the FMM parameters that encode this prior knowledge. For instance, the thresholds to decide between P and Q are derived assuming that Q lies between P and R ($\alpha_P \le \alpha_Q \le \alpha_R$), that Q is often sharper ($\omega_Q < \omega_P$), and that Q has a trough while P has a crest. Noisy components are detected by their small PV and $\omega$ values.
The outputs will be considered satisfactory (OK) only when the five letters are assigned and the parameters of the corresponding components describe the expected morphology.

Figure 1: (a) The five waves P, Q, R, S, T derived from the FMMecg model and some of the main features that are derived in a simple way from the parameters of the model. (b) Observed signal (black points) and FMMecg fit (blue). Data from patient sel106 from the MIT-BIH Arrhythmia Database from Physionet (http://www.physionet.org).

Figure 4 shows a flowchart of the algorithm, where different colours are used for the M and I steps. The R code to implement the algorithm is available from the corresponding author on reasonable request.

Table 1: Summary of performance measures for P and T wave detection from the first signal of the QT database.