Personalized Monitoring and Advance Warning System for Cardiac Arrhythmias

Each year more than 7 million people die from cardiac arrhythmias. Yet no robust solution exists today to detect such heart anomalies at the moment they occur. The purpose of this study was to design a personalized health monitoring system that can detect early occurrences of arrhythmias from an individual's electrocardiogram (ECG) signal. We first modelled the common causes of arrhythmias in the signal domain as a degradation of normal ECG beats into abnormal beats. Using the degradation models, we performed abnormal beat synthesis, which creates potential abnormal beats from the average normal beat of the individual. Finally, a Convolutional Neural Network (CNN) was trained using real normal and synthesized abnormal beats. As a personalized classifier, the trained CNN can monitor ECG beats in real time for arrhythmia detection. Over 34 patients' ECG records with a total of 63,341 ECG beats from the MIT-BIH arrhythmia benchmark database, we show that the probability of detecting one or more abnormal ECG beats among the first three occurrences is higher than 99.4%, with a very low false-alarm rate.


ABS Filter Design
An abnormal beat synthesis (ABS) filter models the degradation transformation of a regular normal beat into an abnormal beat. This degradation represents one of the common causes of cardiac arrhythmias that physically degrades a healthy heart (producing regular normal beats) into an unhealthy one that produces abnormal beats. Our main assumption is that ABS filters are linear and time-invariant (LTI). With this assumption, one can write the input-output expression as,

$\mathbf{b} = \mathbf{h} * \mathbf{a}, \qquad (1)$

where $\mathbf{a}$ and $\mathbf{b}$ are N-length input and output signals corresponding to the (regular) normal and abnormal beats, respectively, and $\mathbf{h}$ holds the M filter coefficients of the LTI system. In the frequency domain, Eq. (1) becomes $B(f) = H(f)A(f)$, where $A(f)$, $B(f)$ and $H(f)$ are the DFTs of $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{h}$, respectively. One can thus derive $\mathbf{h}$ from the IDFT of the ratio between the DFTs of $\mathbf{b}$ and $\mathbf{a}$:

$\mathbf{h} = \mathrm{IDFT}\!\left( \frac{B(f)}{A(f)} \right).$

However, the singularities and low values of $A(f)$ caused by noise make it infeasible to compute $\mathbf{h}$ accurately. Instead, we shall derive $\mathbf{h}$ directly using Least-Squares (LS) optimization. One can write the linear convolution as,

$b[n] = \sum_{m=0}^{M-1} h[m]\, a[n-m].$

Note that the full convolution output has length N+M-1; we only consider the first N samples since the output signal (abnormal beat) has the same length as the input. This can be written as a matrix equation,

$\mathbf{b} = \mathbf{A}\mathbf{h},$

where $\mathbf{A}$ is the $N \times M$ Toeplitz (convolution) matrix formed from the samples of $\mathbf{a}$, and the LS solution minimizing $\|\mathbf{A}\mathbf{h}-\mathbf{b}\|^2$ is

$\mathbf{h}_{LS} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{b}. \qquad (6)$

However, $\mathbf{A}$ may not be of full rank, i.e., $\mathrm{rank}(\mathbf{A}) = r < M$, in which case $\mathbf{A}^T\mathbf{A}$ is singular and its inverse cannot be computed. To address this problem, we make use of the Singular Value Decomposition (SVD) of $\mathbf{A}$,

$\mathbf{A} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^T,$

where $\mathbf{U}$ and $\mathbf{V}$ are $N \times N$ and $M \times M$ orthogonal matrices which hold the eigenvectors of the square matrices $\mathbf{A}\mathbf{A}^T$ and $\mathbf{A}^T\mathbf{A}$, respectively, as their column vectors.
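To make the matrix form concrete, here is a minimal numpy sketch (the helper name `conv_matrix` and the toy signals are ours, not from the paper) that builds the $N \times M$ convolution matrix $\mathbf{A}$ and checks that $\mathbf{A}\mathbf{h}$ reproduces the first N samples of $\mathbf{h} * \mathbf{a}$:

```python
import numpy as np

def conv_matrix(a, M):
    """Build the N x M Toeplitz matrix A such that (A @ h) equals the
    first N samples of the linear convolution h * a."""
    N = len(a)
    A = np.zeros((N, M))
    for n in range(N):
        for m in range(M):
            if 0 <= n - m < N:
                A[n, m] = a[n - m]
    return A

# Toy signals standing in for a normal beat (a) and a known filter (h).
rng = np.random.default_rng(0)
a = rng.standard_normal(16)      # N = 16
h = rng.standard_normal(5)       # M = 5
b = np.convolve(h, a)[:len(a)]   # "abnormal beat": first N samples of h * a

A = conv_matrix(a, M=len(h))
assert np.allclose(A @ h, b)     # matrix form matches the convolution
```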
The $N \times M$ matrix $\boldsymbol{\Sigma}$ can be expressed as $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0)$, where $\sigma_1 \ge \cdots \ge \sigma_r > 0$ are the non-zero singular values of $\mathbf{A}$. The LS solution can then be written as

$\mathbf{x}_{LS} = \sum_{i=1}^{r} \frac{\mathbf{u}_i^T \mathbf{b}}{\sigma_i}\, \mathbf{v}_i. \qquad (9)$

However, the LS solution $\mathbf{x}_{LS}$ can still yield large values, the so-called "explosion" of the LS solution, due to noisy values in matrix $\mathbf{A}$ (the input), in vector $\mathbf{b}$ (the output), or both, since in general they are both susceptible to noise. A crucial disadvantage therein is that the smaller non-zero singular values will result in an even larger explosion of $\mathbf{x}_{LS}$. In order to prevent this, we shall regularize the LS solution by optimizing the LS error together with the magnitude of the LS solution as,

$\mathbf{x}_{LS}^{reg} = \arg\min_{\mathbf{x}} \left( \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda \|\mathbf{x}\|^2 \right),$

where $\lambda$ is the regularization parameter. It is straightforward to show that this joint optimization can be expressed as an ordinary LS problem over the augmented system,

$\tilde{\mathbf{A}}\mathbf{x} = \tilde{\mathbf{b}}, \qquad \tilde{\mathbf{A}} = \begin{bmatrix} \mathbf{A} \\ \sqrt{\lambda}\,\mathbf{I} \end{bmatrix}, \quad \tilde{\mathbf{b}} = \begin{bmatrix} \mathbf{b} \\ \mathbf{0} \end{bmatrix},$

where $\tilde{\mathbf{A}}$ is now an $(N+M) \times M$ full-rank matrix ($r = M$); therefore, the LS solution over $\tilde{\mathbf{A}}$ can be obtained by using Eq. (6) as $\mathbf{x}_{LS}^{reg} = (\tilde{\mathbf{A}}^T\tilde{\mathbf{A}})^{-1}\tilde{\mathbf{A}}^T\tilde{\mathbf{b}}$. Using the orthogonality of the eigenvectors, one can write the eigenvector decomposition of $\tilde{\mathbf{A}}^T\tilde{\mathbf{A}} = \mathbf{A}^T\mathbf{A} + \lambda\mathbf{I}$ and its inverse as follows:

$\tilde{\mathbf{A}}^T\tilde{\mathbf{A}} = \mathbf{V}\,(\boldsymbol{\Sigma}^T\boldsymbol{\Sigma} + \lambda\mathbf{I})\,\mathbf{V}^T, \qquad (\tilde{\mathbf{A}}^T\tilde{\mathbf{A}})^{-1} = \mathbf{V}\,(\boldsymbol{\Sigma}^T\boldsymbol{\Sigma} + \lambda\mathbf{I})^{-1}\,\mathbf{V}^T.$

Substituting this decomposition back yields the regularized LS solution,

$\mathbf{x}_{LS}^{reg} = \sum_{i=1}^{r} \frac{\sigma_i}{\sigma_i^2 + \lambda}\,(\mathbf{u}_i^T \mathbf{b})\, \mathbf{v}_i. \qquad (15)$
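The $\sigma_i/(\sigma_i^2+\lambda)$ shrinkage can be sketched in a few lines of numpy; `regularized_ls` is a hypothetical helper of ours, and the random $\mathbf{A}$ and $\mathbf{b}$ only stand in for a real normal-abnormal beat pair:

```python
import numpy as np

def regularized_ls(A, b, lam):
    """Tikhonov-regularized LS: x = sum_i sigma_i/(sigma_i^2 + lam) (u_i^T b) v_i."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeff = s / (s**2 + lam)           # shrinks the 1/sigma_i "explosion"
    return Vt.T @ (coeff * (U.T @ b))

rng = np.random.default_rng(1)
A = rng.standard_normal((16, 5))       # stand-in convolution matrix
x_true = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(16)

x0 = regularized_ls(A, b, lam=0.0)     # plain LS (lam = 0)
x1 = regularized_ls(A, b, lam=0.1)
# lam = 0 recovers the ordinary LS solution:
assert np.allclose(x0, np.linalg.lstsq(A, b, rcond=None)[0])
# regularization can only shrink the solution norm:
assert np.linalg.norm(x1) <= np.linalg.norm(x0) + 1e-12
```

The same `x1` also solves the normal equations $(\mathbf{A}^T\mathbf{A} + \lambda\mathbf{I})\mathbf{x} = \mathbf{A}^T\mathbf{b}$, which is a quick consistency check on the SVD form.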

Figure 1: Illustration of two ABS filter designs: the top one is for a V-beat in single-beat representation and the bottom one is for an S-beat in a beat-trio representation.
Comparing the regularized LS solution in Eq. (15) with the LS solution in Eq. (9), the explosive effects of the noise-like singular values on the solution can now be significantly suppressed with a practical choice of $\lambda$, e.g., $\lambda = 0.1$. As a result, one can design an ABS filter using Eq. (15) for each pair of normal-abnormal ECG beats and, as illustrated in Figure 1, a library of ABS filters can be designed for each representation (single-beat and beat-trio).

ABS Filter Selection
Two filter selection models were applied to eliminate similar and all-pass (impulsive) filters. The former case occurs if the abnormal beats of the subject are similar to each other; one or a few representative filters then suffice to model abnormal beat syntheses for that patient. The latter occurs when the abnormal beat is similar to the average normal beat (ANB). Especially in the single-beat representation, some S-beats can have the same pattern as an N-beat. Since the aim is to model the synthesis of abnormal beats from a normal beat, those "all-pass" filters can be left out. Both selections are performed by evaluating the mean-normalized variance of the filter coefficients: the filters that yield the highest variances are selected into the ABS filter library. Over the training partition of the database we selected 464 filters to form the ABS filter library and used them to synthesize abnormal beats for the subjects in the test partition. We set $\lambda = 0.1$, and the minimum variances for filter selection are 0.1 and 0.15 for the S and V type abnormalities, respectively. The reason for the different variance settings is the scarcity of S beats compared to V beats; hence we balance the final number of filters for each type. There is no selection for Q and F type anomalies since only a few (13+7=20) of them exist in the training partition, so all ABS filters for Q and F type abnormal beats are kept in the library.
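The variance-based selection can be sketched as follows, assuming "mean-normalized variance" means the variance of the coefficients divided by their squared mean (the exact normalization is not spelled out above); the helper names, example filters, and threshold usage are illustrative:

```python
import numpy as np

def mean_normalized_variance(h):
    """Variance of the filter coefficients normalized by the squared mean
    (one plausible reading of 'mean-normalized variance')."""
    return np.var(h) / (np.mean(h) ** 2)

def select_filters(filters, min_var):
    """Keep only filters whose mean-normalized variance exceeds min_var."""
    return [h for h in filters if mean_normalized_variance(h) > min_var]

# Hypothetical library: a near-constant (all-pass-like) filter is dropped,
# while a strongly shaped filter survives the threshold.
flat   = 0.5 * np.ones(8) + 1e-3 * np.arange(8)          # low variance -> rejected
shaped = np.array([2.0, -1.0, 0.5, 3.0, -2.0, 1.0, 0.0, -0.5])
kept = select_filters([flat, shaped], min_var=0.1)
assert len(kept) == 1
```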
Figure 2 presents the ABS filters formed for the V beats from the patient-specific training data of Subjects 118 and 119. Note that for Subject 119 only 4 out of 81 ABS filters are selected due to the high similarity among the filter coefficients, while all filters are selected for Subject 118.
The reason we selected this many filters, although there are only a few dozen common causes of arrhythmia, is threefold: 1) It is desirable to have more filters for a common cause that is more frequent among arrhythmia patients. This provides more samples for that type of arrhythmia, so the personal classifier has more positive samples for training. 2) This number of abnormal beats yields a balanced training dataset once the normal beats from the first 5 minutes (i.e., 300-400 beats) are inserted. 3) Even when multiple filters emerge from the same degradation type (common cause), the characteristics of the degradation that occurred may differ, and each filter can capture a distinct degradation instance. For example, an S type beat typically occurs with an early activation of the QRS waveform; however, the timing of the early activation may differ from subject to subject. Therefore, it makes sense to capture and model different S type occurrences with several ABS filters.

1D Convolutional Neural Networks
In this study, we use the 1D Convolutional Neural Network (CNN) architecture shown in Figure 3. There are two types of layers in the proposed 1D CNN: 1) CNN-layers where both 1D convolutions and sub-sampling occur, and 2) Fully-connected layers that are identical to the hidden and output layers of a typical Multi-Layer Perceptron (MLP).
Let $w_{ik}^{l-1}$ denote the kernel weight from the i-th neuron at layer l-1 to the k-th neuron at layer l. With such an "adaptive" design, the number of hidden CNN layers can be set to any practical number, because the sub-sampling factor of the output CNN layer (the hidden CNN layer just before the first fully-connected layer) is set to the dimension of its input map; e.g., in Figure 3, if layer l+1 were the output CNN layer, its sub-sampling factor would automatically be set to ss = 8, since the input map dimension is 8 in this illustration. Besides the sub-sampling, note that the dimension of the input maps gradually decreases due to the convolution without zero padding; i.e., in Figure 3, the dimension of the neuron output is 22 at layer l-1 and is reduced to 20 at layer l. As a result, the dimension of the input maps of the current layer is reduced by K-1, where K is the size of the kernel.
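The dimension bookkeeping above can be checked with a few lines of Python, reproducing the 22 → 20 → 10 → 8 map sizes quoted for Figure 3 (kernel size K = 3 and sub-sampling factor ss = 2 are inferred from those numbers, not stated explicitly):

```python
def valid_conv_len(n_in, K):
    # 'valid' 1D convolution (no zero padding) shrinks a map by K - 1 samples
    return n_in - K + 1

n = 22                    # neuron output length at layer l-1
n = valid_conv_len(n, 3)  # convolution at layer l:   22 -> 20
n = n // 2                # sub-sampling by ss = 2:   20 -> 10
n = valid_conv_len(n, 3)  # convolution at layer l+1: 10 -> 8
ss_out = n                # 'adaptive' output layer: ss set to its input dimension
assert n == 8 and n // ss_out == 1   # each output map collapses to a scalar
```

Setting `ss_out` to the full map dimension is what lets the fully-connected layers receive a fixed-size input regardless of network depth.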

Back-Propagation Training for 1D CNNs
We shall now briefly formulate the back-propagation (BP) steps while skipping the detailed derivations due to space limitations. The BP of the error starts from the output layer. Let l = 1 and l = L be the input and output layers, respectively. For an input vector p with target vector $\mathbf{t}$ and the corresponding output vector $[y_1^L, \ldots, y_{N_L}^L]$, the error in the output layer can be written as,

$E = \sum_{i=1}^{N_L} \left( y_i^L - t_i \right)^2.$

We wish to compute the derivative of this error with respect to an individual weight $w_{ik}^{l-1}$ (connected to neuron k) and the bias $b_k^l$ of neuron k, so that we can apply gradient descent to minimize the error accordingly. Once all delta errors in each fully-connected layer are determined by BP, the weights and bias of each neuron can be updated by the gradient descent method. Specifically, the delta of the k-th neuron at layer l, $\Delta_k^l$, is used to update the bias of that neuron and all weights of the neurons in the previous layer connected to that neuron, as:

$w_{ik}^{l-1} \leftarrow w_{ik}^{l-1} - \varepsilon\, \Delta_k^l\, y_i^{l-1}, \qquad b_k^l \leftarrow b_k^l - \varepsilon\, \Delta_k^l,$

where $\varepsilon$ is the learning rate. Once the first BP is performed from the next layer, l+1, to the current layer, l, it can be further back-propagated to the input delta, $\Delta_k^l$. Let the zero-order up-sampled map be $us_k^l = up(s_k^l)$; then one can write,

$\Delta_k^l = up(\Delta s_k^l)\,\beta\, f'(x_k^l),$

where $\beta = 1/ss$ compensates for the sub-sampling by the factor ss.
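The gradient-descent update for a fully-connected neuron can be sketched as follows; `update_neuron`, the learning rate value, and the toy numbers are illustrative, not from the paper:

```python
import numpy as np

def update_neuron(w_prev, bias_k, delta_k, y_prev, lr):
    """Gradient-descent update for neuron k at layer l:
    each incoming weight moves by -lr * delta_k * y_i^{l-1},
    and the bias moves by -lr * delta_k."""
    w_new = w_prev - lr * delta_k * y_prev
    b_new = bias_k - lr * delta_k
    return w_new, b_new

w = np.array([0.2, -0.4, 0.1])   # weights from layer l-1 into neuron k
y = np.array([1.0, 0.5, -1.0])   # outputs of layer l-1
w2, b2 = update_neuron(w, bias_k=0.05, delta_k=0.1, y_prev=y, lr=0.5)
assert np.allclose(w2, [0.15, -0.425, 0.15])
assert np.isclose(b2, 0.0)
```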

Performance Evaluation Metrics
As mentioned earlier, abnormal beat detection is a binary classification problem that assigns beats to either the normal (N) or an abnormal (S, V, Q, or F) class. Since this is originally a 5-class problem, in order to compute the two binary detection performance metrics, abnormal beat detection accuracy (Acc) and false alarm rate (FAR), we first accumulate the 5x5 confusion matrix (CM) of each run and then deduce a binary (2x2) CM from the accumulated 5x5 CM. Let the accumulated CM be parametrized as in Table I, where the ground-truth numbers are shown in the columns. The reduction of this 5x5 CM to a 2x2 CM is basically the fusion of the abnormal classes (S, V, F and Q) into a single abnormal class (A), while keeping normal (N) as the other class. The deduced CM is shown in Table II. The number of true-negatives (TN) is the number of N beats correctly classified, which is equivalent to N0 in the 5x5 CM. The number of false-positives (FP) is the number of misclassified N beats; therefore, FP = N1 + N2 + N3 + N4. The number of false-negatives (FN) is the number of misclassified abnormal beats, or equivalently the total number of abnormal beats classified as N beats; therefore, FN = S1 + V1 + F1 + Q1. Finally, the number of true-positives (TP) is the number of correctly classified (detected) abnormal beats; TP is, therefore, the sum of all entries in the red block in Table I. Note that we ignore misclassifications among the abnormal classes as long as an abnormal beat is assigned to an abnormal class.
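The fusion of the 5x5 CM into binary counts can be sketched as follows; the synthetic confusion matrix and the helper name `fuse_confusion` are ours, and the class order N, S, V, F, Q (ground truth in columns, predictions in rows) is assumed from the description above:

```python
import numpy as np

def fuse_confusion(cm5):
    """Fuse a 5x5 CM (ground truth in columns, class order N,S,V,F,Q)
    into binary Normal-vs-Abnormal counts."""
    TN = cm5[0, 0]             # N beats classified as N
    FP = cm5[1:, 0].sum()      # N beats classified as any abnormal class
    FN = cm5[0, 1:].sum()      # abnormal beats classified as N
    TP = cm5[1:, 1:].sum()     # abnormal beats classified as any abnormal class
    return TP, TN, FP, FN

# Tiny synthetic CM (rows = predicted, columns = ground truth):
cm5 = np.array([[90,  2,  1, 0, 0],
                [ 3, 40,  1, 0, 0],
                [ 1,  1, 30, 1, 0],
                [ 0,  0,  0, 5, 0],
                [ 1,  0,  0, 0, 4]])
TP, TN, FP, FN = fuse_confusion(cm5)
assert (TP, TN, FP, FN) == (82, 90, 5, 3)
```

Note that cross-confusions among S, V, F and Q (e.g., an S beat labelled V) land in `TP`, mirroring the convention that inter-abnormal misclassifications are ignored.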
With these definitions, we can define the standard performance metrics as follows: Accuracy is the ratio of the number of correctly classified patterns to the total number of patterns classified, Acc = (TP+TN)/(TP+TN+FP+FN); Sensitivity is the rate of correctly classified events among all events, Sen = TP/(TP+FN); Specificity is the rate of correctly classified non-events among all non-events, Spe = TN/(TN+FP); and Positive Predictivity is the rate of correctly classified events among all detected events, Ppr = TP/(TP+FP). Since there is a large variation in the number of beats from different classes in the training/testing data, Sensitivity, Specificity, and Positive Predictivity are more relevant performance criteria for medical diagnosis applications. In particular, Sensitivity can be interpreted as the average probability of detecting an abnormal beat accurately; obviously, 1-Sen is then the probability of missing the detection of an abnormal beat. Therefore, the probability of missing all n consecutive abnormal beats is $P_n = (1-\mathrm{Sen})^n$. Finally, $1-P_n$ is, therefore, the probability of detecting at least one abnormal beat among n consecutive occurrences.
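These metrics and the n-beat detection probability can be computed directly from the binary counts; the counts below are illustrative, not results from the paper:

```python
def detection_metrics(TP, TN, FP, FN):
    acc = (TP + TN) / (TP + TN + FP + FN)
    sen = TP / (TP + FN)       # probability of detecting an abnormal beat
    spe = TN / (TN + FP)
    ppr = TP / (TP + FP)
    return acc, sen, spe, ppr

def detection_within_n(sen, n):
    """Probability of catching at least one of n consecutive abnormal beats:
    1 - P_n with P_n = (1 - Sen)^n."""
    return 1.0 - (1.0 - sen) ** n

acc, sen, spe, ppr = detection_metrics(TP=82, TN=90, FP=5, FN=3)
assert abs(sen - 82 / 85) < 1e-12
# Even a moderate Sen compounds quickly over consecutive beats:
assert detection_within_n(0.84, 3) > 0.995
```

This compounding is why a per-beat sensitivity well below 100% can still yield the >99.4% three-beat detection probability quoted in the abstract.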