Accurate detection of paroxysmal atrial fibrillation with certified-GAN and neural architecture search

This paper presents a novel machine learning framework for detecting paroxysmal atrial fibrillation (PxAF), a pathological characteristic of the electrocardiogram (ECG) that can lead to fatal conditions such as heart attack. To enhance the learning process, the framework involves a generative adversarial network (GAN) along with a neural architecture search (NAS) in the data preparation and classifier optimization phases. The GAN is invoked to overcome the class imbalance of the training data by producing synthetic ECGs for the PxAF class in a certified manner. The effect of the certified GAN is statistically validated. Instead of using a general-purpose classifier, the NAS automatically designs a highly accurate convolutional neural network architecture customized for the PxAF classification task. Experimental results show that the proposed framework reaches an accuracy of 99.0%, which not only improves on the state of the art by up to 5.1%, but also improves the classification performance of two widely accepted baseline methods, ResNet-18 and Auto-Sklearn, by 2.2% and 6.1%, respectively.


Introduction
Recent progress in artificial intelligence and Deep Learning (DL) methods has created a leap toward automatic decision-making in various domains, including health and medicine. Sophisticated DL methods have been proposed for classifying biological signals [20], including heart sound [21,22] and electrocardiogram [13]. An electrocardiogram (ECG) is a recording of the electrical activity of the heart, reflecting possible disorders in heart function. Paroxysmal Atrial Fibrillation (PxAF) is a disorder in the electrical activity of the heart that can lead to adverse events such as cardiac stroke [42]. Screening patients with PxAF is currently performed by physicians in their clinical practice, and a reliable system for automated detection of PxAF is needed by any healthcare system. Several methods have been proposed for detecting PxAF from the ECG signal [13,23,45,50], of which the DL-based ones are considered the state of the art [45,52]. Nevertheless, accurate detection of PxAF is still an open research question [45,52].
We hypothesize that two issues could have led to inaccurate PxAF diagnosis. Firstly, class imbalance is common in most public ECG databases, where the class with PxAF arrhythmia is far smaller than the class with normal cases. Secondly, the backbone architectures used in state-of-the-art studies may not be optimal, as they were manually designed for image classification tasks.
One solution to tackle the first issue is to increase the group size of the minority class, i.e., the PxAF class [29], by producing synthetic data from the real ones. Patients' real data are being recorded electronically by healthcare providers and private industries. However, the recorded data is hardly accessible to scientists due to patient privacy concerns. Even when researchers are able to access high-quality data, they must ensure that the data is properly used and protected in a legal and ethical manner, which is a time-consuming process [24].
Generating synthetic medical data has been broadly explored for various sorts of medical data, including physiological signals [55]. Synthetic ECG data has been the case study in several reports (Section 3.2). Recently, Generative Adversarial Networks (GANs) have demonstrated impressive performance in medical data augmentation. However, the synthetic ECGs generated by GANs are mostly too immature to be used as training data due to morphological irrelevance, and thus leveraging them in the training process can mislead the classifier. As we will see in the following sections, this important point is carefully addressed by the proposed method.
Neural Architecture Search (NAS), an automated technique for designing artificial neural networks, has recently received attention from researchers and engineers. It provides a solid tool for obtaining an optimized architecture for a given machine learning problem. The applicability of this technique has been explored in different domains such as biomedical engineering, in which the classification of physiological signals is an important challenge [16,34,38,41].
In this paper, we propose an original framework for detecting PxAF arrhythmia based on an enhanced combination of GAN and NAS. The framework is composed of three compartments: 1) data enrichment, 2) signal processing, and 3) machine learning. The proposed framework introduces innovative ideas in the methodologies employed for this important research question. It proposes the use of a GAN architecture for data enrichment in a new manner, named certified-GAN, in conjunction with original signal processing and machine learning methods. The performance of the framework is statistically evaluated both holistically and independently for each compartment. The accuracy of the framework in detecting PxAF was estimated to be 99%, a considerable improvement over the state of the art.
To the best of our knowledge, this paper is the first study proposing an automatic methodology for certified synthetic data generation and for designing an accurate CNN architecture for PxAF detection. We name this combination of certified-GAN and NAS for PxAF detection Deep-PxAF. The contributions of this paper are:
• A novel data enrichment method is proposed that enables the generation of certified synthetic PxAF samples based on the recommendations of an expert physician (Section 4.2).
• A novel data pre-processing approach is proposed to improve the detection performance (Section 4.3).
• A cell-based neural architecture search method is employed to design a specialized CNN architecture for the PxAF detection task (Section 4.4).
• We provide extensive experiments to demonstrate the effectiveness of Deep-PxAF (Section 6). In addition, we discuss the reproducibility of the results of the proposed method (Section 7).
Results show that Deep-PxAF achieves higher accuracy compared to handcrafted DL architectures and automated machine learning (AutoML) tools on the PhysioNet PxAF database [8]. Moreover, Deep-PxAF shows stable results with marginal differences over multiple repetitions, confirming the reproducibility of the results. The database of certified labels is open-access and can be used by any researcher for scientific purposes.

2 Preliminaries

Paroxysmal Atrial Fibrillation
An ECG is a recording of the electrical activity of heart cells. A normal ECG is a cyclic signal composed of several waves and peaks within each cycle, of which the QRS complex, T-wave, and P-wave are mostly regarded as the indicative patterns of the signal. Fig. 1a depicts a normal ECG signal along with the indicative patterns occurring in a certain order in time. The cyclic behavior of the ECG signal comes from the fact that heart muscles have two phases of activity: contraction and relaxation. A contraction is normally followed by a relaxation, where the contraction is initiated from the right atrium down to the ventricles and returns to its initiating point to create a self-stimulating activity through the heart muscles with a rhythmic behavior. This rhythmic action is projected onto the ECG signal. The P-wave and QRS complex coincide with the atrial and ventricular contraction, respectively, while the T-wave results from the ventricular relaxation. In cardiac investigation, a complete relaxation followed by a left ventricle contraction is known as the cardiac cycle. However, for simplicity in computerized ECG signal processing, a cardiac cycle can be defined as the interval between two successive R-peaks.
The morphology of an ECG signal conveys important information about the heart's electrical activity and, to a lesser extent, about its mechanical activity. This includes not only the duration of the QRS complex and the time intervals between the waves and the complex, but also the amplitude of the patterns. Deviation from the typical characteristics of an ECG can result either from a physiological condition such as sinus arrhythmia or from pathological conditions, e.g., arrhythmia. Sinus arrhythmia is predominantly caused by respiration. Paroxysmal Atrial Fibrillation (PxAF) is a pathological condition of the electrical heart action that can happen when the atrial contraction is performed inappropriately. PxAF can initiate an arrhythmia and requires medical consideration and sometimes appropriate management. As can be seen in Fig. 1, the cardiac cycles of the normal case show a physiological variation of sinus rhythm with clearly visible P-waves in all the cycles. In contrast, in the PxAF case, the P-waves show noticeable alterations over the cycles along with the arrhythmia. An association between PxAF and mortality has been previously demonstrated [19]. It has also been shown that timely detection of PxAF can improve survival in this patient group through appropriate medical management [19].

Generative Adversarial Networks
Generative Adversarial Networks (GANs) are a class of deep learning architectures that have been successfully used to generate synthetic images, time-series data, and other data modalities [7,28]. In general, GANs comprise two sub-networks: the generator (G) and the discriminator (D). G generates synthetic data that is as close as possible to the real data, while D determines whether a given sample is real or generated. These two sub-networks compete with each other in a two-player minimax game with a value function V(G, D) (Eq. 1). The goal of solving the optimization problem in Eq. 1 is to reach a Nash equilibrium [27].
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \quad (1)$$
Here, x is a sample drawn from the real data distribution p_data, z is a noise vector drawn from the prior p_z, and the probability D(x) determines whether x is generated data or real data.
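As a rough numerical illustration of the minimax game, the sketch below estimates the value function from discriminator outputs. It assumes the standard GAN objective V(G, D) = E[log D(x)] + E[log(1 - D(G(z)))]; the function name `gan_value` and the toy probabilities are hypothetical and do not come from the paper.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(G, D) from discriminator outputs.

    d_real: D(x) for real samples, each strictly between 0 and 1
    d_fake: D(G(z)) for generated samples, each strictly between 0 and 1
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot separate real from generated data outputs 0.5
# everywhere, which gives V = -log(4).
v_undecided = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V toward its upper bound of 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

In this reading, D tries to increase the value while G tries to decrease it, so the undecided case corresponds to a generator whose samples the discriminator can no longer tell apart from real ones.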
3 Related Works

PxAF Diagnosis Using DL Methods
Previous DL-based studies have paid less attention to PxAF detection than to other forms of arrhythmia [13]. Pourbabaee et al. [45] proposed a method for identifying patients with PxAF. Their method takes raw ECG data as input and uses a CNN with one fully-connected layer to learn a discriminative pattern of the data in the time domain. In addition, they manually tweaked various classification methods to achieve maximum performance. [50] proposed an attention-based DL method for detecting PxAF episodes from a synthetic database composed of 24-hour Holter ECG recordings. Time-frequency representations of 30-second windows are fed sequentially into the CNN. Then, the extracted features are presented to a bidirectional recurrent neural network with an attention layer. [23] constructed a new long-term ECG database (24 to 96 hours) for the purpose of detecting PxAF. After careful analysis by a cardiologist, 250 AF onsets of PxAF were detected. They proposed a CNN followed by a bidirectional Gated Recurrent Unit (GRU) network for PxAF detection. The network was trained to distinguish between RR intervals that precede an AF onset and RR intervals distant from any AF. They concluded that RR intervals contain information about the incoming AF episode. [56] proposed to predict the occurrence of PxAF by combining wavelet decomposition and a CNN classifier. [52] aimed to detect PxAF episodes before occurrence, leveraging a CNN to process normalized heart rate variability features, which resulted in 87.76% accuracy and an 87.50% F1-score.

Synthetic Data Generation for ECGs
Medical data tend to be highly sensitive by nature and are often subject to severe usage restrictions. As a result, it is difficult for researchers to collect and share this data. A possible alternative to address the problem of data scarcity is to generate realistic synthetic data [7]. [39,47] proposed mathematical dynamical models to generate continuous ECG signals. These models, however, were limited to one-lead signals and did not provide any insight into the mechanism of disease.
Recent studies have demonstrated that GANs are extremely effective at synthesizing ECG waveforms based on a prior distribution of data. Prior works are mainly focused on efficient GAN architectures [1,3,10,31,54,57,60]. [10] studied various GAN architectures by leveraging an LSTM or BiLSTM as the generator and a CNN discriminator with single or multiple Convolution-ReLU-Pooling layer(s). Results show that a BiLSTM GAN with a single Convolution-ReLU-Pooling layer provides the best performance. [60] used a BiLSTM-CNN GAN model to generate synthetic ECG signals. A GAN architecture based on a four-layer generator and a five-layer fully-connected discriminator is proposed in [49]. [3] proposed a multi-GAN method to generate ECG waveforms for atrial fibrillation arrhythmia by combining the output of GAN models. [54] proposed two GAN architectures, WaveGAN* and Pulse2Pulse, with the ability to generate synthetic 10-s ECG waveforms. Pulse2Pulse, which is based on a U-Net generative model, is superior in producing realistic ECGs. [31] was the first to propose a transformer-based conditional GAN architecture, named TTS-CGAN, to generate synthetic time-series with sequences of arbitrary length. Compared to popular RNN or LSTM-based GANs for generating time-series [10,15,59], TTS-CGAN has no difficulties in producing long synthetic sequences. Subsequently, [57] proposed TCGAN, an architecture that combines a transformer generator with a CNN discriminator.
Despite the success of these methods, they do not guarantee that the generated data is trustworthy, which can cause classifiers to fail to make accurate predictions. This paper sheds light on the fact that synthesizing high-quality artificial data plays a crucial role in accurate predictions. Thus, we propose a novel physician-certified synthetic data generation method that provides ECG samples indistinguishable from real ones.

Neural Architecture Search for ECG
Several DL models have been developed for detecting a variety of cardiac arrhythmias. However, increasing the complexity of manually designed networks does not always lead to better performance. Moreover, the introduced deep neural networks mostly require a cumbersome phase of trial-and-error, which results in enormous computational costs [35]. Recent advances in Neural Architecture Search (NAS) have enabled the design of scalable and resource-efficient neural architectures. Inspired by the remarkable success of NAS in the computer vision domain [14], several techniques have very recently been proposed to leverage NAS for designing accurate architectures for arrhythmia detection [16,17,34,38,41].
Fayyazifar et al. [17] studied the impact of manually tweaking deep neural networks for cardiac abnormality classification. Additionally, they used wavelet decomposition to enhance the classification performance on the PhysioNet Challenge 2020 [2]. [16] employed a NAS method for AF classification, achieving an accuracy of 84.15% on the PhysioNet Challenge 2017 [8]. Heart-Darts [38] proposed a heartbeat classification method by automatically designing an efficient CNN architecture with a differentiable NAS method. Heart-Darts provides state-of-the-art performance on the MIT-BIH arrhythmia database [40]. [34] developed a NAS-based learning method to detect cardiovascular diseases in 12-lead ECG data. In particular, they proposed a novel search strategy that optimizes different attention modules of the same network synchronously. EExNAS [41] designed energy-efficient CNN architectures for Myocardial Infarction (MI) detection and Human Activity Recognition (HAR) on wearable devices.
These methods utilize NAS to design an efficient arrhythmia classifier; however, they are limited to optimizing the feature extraction part. Further, it is not conclusive that the findings of the prior studies are reproducible, especially since no comprehensive evaluation is found in their reports [32].

Method Overview
We propose a novel method with three phases: 1) Synthetic Data Generation, 2) ECG Signal Processing, and 3) CNN Architecture Search. Fig. 2 depicts a bird's-eye view of the proposed method. In the first phase, we generate synthetic ECGs for the PxAF class using a GAN model. After the GAN creates synthetic ECGs, an expert physician evaluates them to identify high-quality training data. The second phase of the method employs the wavelet transform of an ECG signal along with the recurrence graph. Rhythmic information of an ECG within short windows of 4 seconds is preserved in a recurrence graph. The outcome of this phase is a sequence of two-dimensional images, each incorporating the rhythmic contents of a 4-second interval of an input ECG. In the last phase, a CNN is trained to classify the images, where the architecture of the CNN is found using NAS. As we will see, the combination of these innovations noticeably improves the classification performance.
Phase 1: Certified Synthetic Data Generation. Public ECG databases mostly contain a heavy class imbalance for the arrhythmia classes. Machine learning methods trained on such databases will consequently be biased toward the normal classes. In order to cope with the shortage of signals from the minority class, i.e., the PxAF class, a GAN is invoked to create synthetic ECGs for the PxAF class. Inappropriate synthetic ECGs can mislead the classifier. Therefore, the synthetic ECGs created by the GAN are evaluated by an expert physician in terms of quality using a clearly defined protocol. The disqualified ECGs are discarded from training, and the synthetic ECGs certified by the expert physician are used in the learning process (Section 4.2).
Phase 2: ECG Signal Processing. An ECG signal in its raw form is contaminated by different sources of noise and disturbance, such that the PxAF information can be fully concealed. In order to extract the discriminant contents of PxAF from the pathological signals, a level of signal processing is required to purify the indicative signal contents (Section 4.3). This processing delivers a sequence of 2D images, each containing the dynamics of a few seconds of the signal, to a CNN architecture, in which the ultimate classification is performed.
Phase 3: CNN Architecture Search. Manual design of task-specific neural architectures requires tremendous human effort and domain expertise. In addition, the knowledge learned from designing a network cannot be directly transferred to another person. Neural Architecture Search (NAS) is the process of automatically optimizing a neural network architecture. NAS research has shown significant progress in enabling accurate neural architectures for computer vision applications [6,14,35,36]. Motivated by this progress, we leverage NAS with the aim of improving the accuracy of PxAF detection (Section 4.4).
In this paper, we used the Pulse2Pulse GAN model proposed by [54]. Here, we briefly present the generator and discriminator architectures. Then, we present the procedure for certifying the quality of the generated data with the help of an expert physician.
Generator. The architecture of the generator is inspired by the U-Net architecture. The U-Net implementation uses 1D convolutional layers for ECG signal generation. The network takes a 2×5000 noise vector to generate a 2-lead signal, which is equal to the dimension of the output layer. The noise is passed through six down-sampling blocks followed by six up-sampling blocks. Each down-sampling block consists of a 1D-convolution layer followed by a Leaky ReLU activation. Each up-sampling block is built from a series of four layers, applied consecutively: an up-sampling layer, a constant padding layer, a 1D-convolution layer, and a ReLU activation function.

Discriminator.
The discriminator takes an ECG as input and outputs a score indicating how close it is to a fake ECG. The architecture is composed of seven convolutional layers that follow the Convolution+Leaky ReLU+Phase Shuffle order. Using the phase shuffle operation, each feature map's phase is uniformly perturbed [11]. The training specification is reported in Table 2.

Synthetic Data Certification.
We observe that not all GAN-generated synthetic ECGs can be used as training segments due to their improper morphology, and thus leveraging all GAN-generated segments in the training process will negatively affect the classification accuracy. Based on the morphological characteristics of the ECG signal for PxAF cases, an expert physician manually verified all the synthetic ECGs and certified the valid ones based on the directives listed in Table 1. In this table, the bizarre shape refers to the condition in which the sequence of the ECG peaks and waves, and/or their shapes, fundamentally differ from the ones seen in clinical practice. This condition might be seen in a segment (directive 2) or in the entire synthetic ECG. It was also observed that the QRS complexes of some synthetic ECGs are inconsistent, or accompanied by additional abnormal morphology (directives 3, 4). The PxAF characteristics were inconsistently seen in some of the data, which would affect the learning process, and these data were therefore eliminated (directive 5).

ECG Signal Processing
Fig. 4 shows the major steps of the proposed signal processing pipeline. As shown, the input ECG signal is first decomposed into its constitutive components using the wavelet transform up to the 10th level with the Daubechies 3 wavelet family. The detail contents of the wavelet transform at levels 2, 3, and 4, along with the approximation contents of the 10th level, are reconstructed and added together to eliminate the noise and disturbances contaminating the signal. The resulting signal is then normalized by its largest absolute value. Next, the Shannon energy of the normalized signal is calculated using the following formula:
$$E(t) = -x^2(t) \log\left(x^2(t)\right) \quad (2)$$
where x(t) is the normalized ECG signal, which is positively biased to secure non-zero values. An envelope of the resulting Shannon energy signal is found by using a non-overlapping temporal window of length 100 ms, which slides over the signal. Lastly, a 2D recurrence function of the envelope is obtained. Calculational details of finding the recurrence plot are found in [25]. The output of the signal processing algorithm is a 2D representation of the input signal, which is discriminant for the PxAF and the normal classes. A CNN then uses these 2D images for classification.
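The last three steps of this pipeline (Shannon energy, envelope, recurrence plot) can be sketched in plain Python as follows. This is only an illustrative sketch under several assumptions that the text does not confirm: the Shannon energy is taken in its common form -x² log(x²), a small constant `eps` stands in for the positive bias, the envelope is the mean over each non-overlapping window, and the recurrence plot is a thresholded distance matrix with a hypothetical `threshold`. The wavelet denoising step is omitted, and a sine wave replaces the ECG.

```python
import math

FS_HZ = 128  # sampling frequency of the database used in this paper

def normalize(x):
    """Scale the signal by its largest absolute value."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]

def shannon_energy(x, eps=1e-12):
    """Per-sample Shannon energy; eps avoids log(0) (assumed stand-in for the bias)."""
    return [-(v * v) * math.log(v * v + eps) for v in x]

def envelope(energy, window):
    """Mean of each non-overlapping window (averaging is an assumption)."""
    return [sum(energy[i:i + window]) / window
            for i in range(0, len(energy) - window + 1, window)]

def recurrence_matrix(env, threshold):
    """Binary recurrence plot: 1 where two envelope values differ by at most threshold."""
    return [[1 if abs(a - b) <= threshold else 0 for b in env] for a in env]

# Toy 4-second segment (512 samples) in place of a denoised ECG.
toy_signal = [math.sin(2 * math.pi * 1.2 * n / FS_HZ) for n in range(512)]
window = round(0.1 * FS_HZ)  # 100 ms is 12.8 samples, rounded to 13
env = envelope(shannon_energy(normalize(toy_signal)), window)
rp = recurrence_matrix(env, threshold=0.05)
```

With these choices, a 512-sample segment yields 39 envelope values and therefore a 39×39 image; the image size used in the paper may differ.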

CNN Architecture Search
In general, the learning proficiency of CNNs improves with an increasing number of network layers. However, simply stacking network layers may cause accuracy degradation, since deeper networks encounter the vanishing/exploding gradient problem. Neural Architecture Search (NAS) methods aim to help engineers design highly efficient neural networks from scratch [33,35,36].
The NAS pipeline typically begins with a pre-defined space of network operators. Since the search space is often enormous (e.g., containing $10^{24}$ or even more possible architectures [36]), an exhaustive search is unlikely to be tractable. Thus, heuristic search methods are widely applied to speed up the search process. In early NAS methods, each sampled architecture undergoes an individual training process from scratch, and thus the overall computational overhead is large, on the order of hundreds of GPU-days or more (e.g., [53] requires 3800 GPU-days).
To alleviate the computing cost of NAS methods, researchers proposed to share computation among the sampled architectures, with the key idea of reusing network weights trained previously [5,36] or starting from a well-trained super-network [44]. These efforts shed light on the one-shot NAS methods, which require training the super-network only once, and therefore run more efficiently (e.g., 2-3 orders of magnitude faster than conventional approaches).
One-shot NAS methods jointly formulate architecture search and network training [6,12,33]. Differentiable NAS methods solve this problem using gradient-based algorithms such as Stochastic Gradient Descent (SGD). DARTS [33] is a well-known differentiable NAS method that constructs a super-network with all possible operators. DARTS utilizes a cell-based design space to search for a well-behaved cell architecture [12,33]. Then, the cell may be stacked any number of times to meet the resource requirements of various hardware devices. In this paper, we utilize DARTS [33] to design CNN architectures because it significantly reduces the notoriously long design time of neural networks.
Mathematically, the final DARTS architecture is a function, f(x; ω, α), where x is the input, ω denotes the network parameters (e.g., convolutional kernels), and α denotes the architectural parameters (e.g., indicating the importance of each operator between each pair of layers). f(x; ω, α) is differentiable with respect to both ω and α, so both can be optimized using the SGD algorithm. f(x; ω, α) is composed of a few cells, where each cell of DARTS is defined by a directed acyclic graph with a pre-defined number of layers and a limited set of neural operators. Each cell contains N nodes, and there is a pre-defined set, E, which indicates connected pairs of nodes. For each connected node pair (i, j) with i < j, node j takes the output of node i, x_i, as input, propagates it through a pre-defined operator set, O, and sums up all outputs (Eq. 3). O supports separable convolution (3 × 3, 5 × 5), dilated convolution (3 × 3, 5 × 5), max/average-pooling (3 × 3), and Identity operators.
$$y^{(i,j)}(x_i) = \sum_{o \in O} \frac{\exp\left(\alpha_o^{(i,j)}\right)}{\sum_{o' \in O} \exp\left(\alpha_{o'}^{(i,j)}\right)} \, o(x_i) \quad (3)$$
The normalization is performed by computing the Softmax function over the architectural weights. α and ω are optimized alternately in each search iteration. Afterward, the operator o with the maximum architectural weight is preserved for each edge (i, j), and all other network parameters ω are discarded. In DARTS, each cell is either a normal cell for feature extraction or a reduction cell for both feature extraction and dimension reduction. After designing the optimal cell, we assemble the final network by stacking 18 normal cells with two reduction cells, where every six normal cells are followed by one reduction cell [37]. Lastly, the final architecture is re-trained from scratch to fine-tune the network parameters.
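The softmax-weighted mixing of operators on one edge, and the final selection of the strongest operator, can be illustrated with a small sketch. The three scalar functions in `OPS` are hypothetical stand-ins for the convolution, pooling, and identity operators, and the names `mixed_op` and `discretize` are chosen here for illustration; none of this is the DARTS code itself.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of architectural weights."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scalar operators standing in for the real candidate operators on an edge.
OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alpha):
    """Softmax-weighted sum of all candidate operators on one edge (i, j)."""
    weights = softmax([alpha[name] for name in OPS])
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretize(alpha):
    """After the search, keep only the operator with the largest weight."""
    return max(alpha, key=alpha.get)

# Equal weights: the edge output is the plain average (3 + 6 + 0) / 3.
y_uniform = mixed_op(3.0, {"identity": 0.0, "double": 0.0, "zero": 0.0})

# Hypothetical weights after a search: "double" dominates and is kept.
chosen = discretize({"identity": 0.1, "double": 1.5, "zero": -0.3})
```

During the search the edge output is a blend of all operators, which is what makes the architecture differentiable in α; the discretization step then reduces each edge to a single operator before re-training.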

Database Preparation
Deep-PxAF identifies individuals who are at risk of PxAF. To this end, we utilized the PhysioNet PxAF prediction challenge database [8]. This database includes two-channel ECG recordings. The ECG signals were digitized with a 128 Hz sampling frequency, 16 bits per sample, and nominally 200 A/D units per millivolt. The database is divided into training and testing sets. The original training set consists of 100 records with a duration of 30 minutes, collected from normal individuals and PxAF patients, each with an equal number of recordings. The test set contains 50 records of 30 minutes duration, in which 28 subjects are at risk of PxAF and 22 subjects are healthy individuals. We completely isolate the training and testing sets. We also did not create a separate validation set to evaluate training performance since the size of the database is relatively small.
In this paper, we partitioned each 30-minute ECG signal into segments of four seconds duration, resulting in 512 samples/segment. To build the original database (D_Original), we randomly select 4231, 906, and 906 segments for training, validation, and testing, respectively. We consider two classes for the training and testing sets: normal (healthy) and PxAF patients. D_Original contains 3545 and 2498 samples for the normal and PxAF classes, respectively. The ECG data labeling tool will be released alongside the code upon acceptance.
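The segment length follows directly from the sampling rate: 4 s × 128 Hz = 512 samples. A minimal sketch of the partitioning is given below, assuming non-overlapping segments and dropping any incomplete remainder (neither is stated explicitly in the text); `segment_record` is a hypothetical helper name.

```python
FS_HZ = 128                                  # sampling frequency of the database
SEGMENT_SECONDS = 4
SEGMENT_SAMPLES = FS_HZ * SEGMENT_SECONDS    # 512 samples per segment

def segment_record(signal, segment_samples=SEGMENT_SAMPLES):
    """Split a 1-D signal into non-overlapping segments, dropping any remainder."""
    n_segments = len(signal) // segment_samples
    return [signal[i * segment_samples:(i + 1) * segment_samples]
            for i in range(n_segments)]

# Placeholder for one channel of a 30-minute record (all zeros, not real ECG).
record = [0.0] * (30 * 60 * FS_HZ)
segments = segment_record(record)
```

Under these assumptions, one 30-minute channel yields 450 segments. The reported split sizes are also self-consistent: 4231 + 906 + 906 = 6043 = 3545 + 2498.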
We generate 10000 synthetic segments for the PxAF class using the GAN. Since the PxAF class is the under-represented one, we only synthesize PxAF segments. The original training database augmented with all 10000 synthetic segments is denoted D_GAN. Because most of the generated segments are not of high quality, an expert physician evaluated all synthetic data and certified 539 segments containing PxAF. Then, we add the certified synthetic PxAF segments to the original training set to make the final synthetic database (D_CGAN). The synthetic data generation takes ≈ 42 GPU hours on a single NVIDIA® GTX 1080 Ti, which produces ≈ 4.3 kg of CO2 [30].

Configuration Setup
Table 2 summarizes the configuration setup of the experiments. In this paper, each DARTS cell consists of seven nodes equipped with a depth-wise concatenation operation as the output node. The convolutional operations follow the Convolution+Batch Normalization+ReLU order. The network design time (search + re-training) takes ≈ 9 GPU hours on a single NVIDIA® GTX 1080 Ti, which produces ≈ 0.97 kg of CO2 [30]. The rest of the setup follows [33].

Baseline for Comparison
Auto-Sklearn [18]. Auto-Sklearn is a state-of-the-art library for automated machine learning (AutoML) that is compatible with the scikit-learn library [43]. Auto-Sklearn automatically selects appropriate hyperparameters for a given database by leveraging Bayesian optimization [48]. Auto-Sklearn considers the past performance on similar databases and constructs ensembles from the machine learning models evaluated during the optimization to improve the optimization quality. Due to the high efficiency of Auto-Sklearn in customizing the machine learning pipeline [9,46], we consider Auto-Sklearn as the second comparison baseline.
Deep Residual Network (ResNet) [26]. ResNet is a family of handcrafted architectures that won the ILSVRC competition in 2015. ResNet is constructed from several back-to-back residual blocks connected to a final linear fully-connected layer. In this study, we used ResNet as the third comparison baseline since ResNet has been widely used in automated clinical diagnosis of various diseases [4,13,51,58].

Performance Measurement
This section introduces common quantitative metrics used for presenting how well synthetic data generation and classification methods work.
GAN Performance. For evaluating the performance of the GAN, we use a database containing GAN output data and original data to train a model, which is then tested on a held-out set of true examples. This requires the generated data to have labels, so an expert physician provides labels for the GAN output data. We statistically analyze the distributions of real ECGs and fake ECGs using the Kolmogorov-Smirnov test (K-S test). In addition, we show the Q-Q plot to examine the skewness of the fake data relative to the real data.
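For intuition, the two-sample K-S statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only this statistic in plain Python; the function name and the toy heart-rate values are hypothetical and are not results from the paper. In practice, a library routine such as `scipy.stats.ks_2samp`, which also returns a p-value, would normally be used.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Toy heart-rate samples in beats per minute (illustrative only).
real_hr = [60, 62, 64, 66]
close_hr = [61, 63, 65, 67]
far_hr = [90, 92, 94, 96]

d_close = ks_statistic(real_hr, close_hr)
d_far = ks_statistic(real_hr, far_hr)
```

A smaller statistic indicates that the synthetic distribution lies closer to the real one, which is the sense in which the certified and uncertified GAN outputs are compared.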
Classifier Performance. The classification metrics are computed from the confusion-matrix counts, where TP, TN, FP, and FN denote True Positives, True Negatives, False Positives, and False Negatives, respectively.
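Since the exact list of metrics is not recoverable from the text, the following sketch uses the standard definitions of accuracy, sensitivity, specificity, precision, and F1-score as an assumption. The function name and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Illustrative counts only.
metrics = classification_metrics(tp=45, tn=50, fp=2, fn=3)
```

For these counts the accuracy is 95/100 = 0.95 and the sensitivity is 45/48 = 0.9375.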
6 Experimental Results

The Synthetic ECGs
The previously described GAN is trained for 8000 epochs with a learning rate of 0.0001. Fig. 5 depicts the loss functions of the generator and the discriminator of the GAN. Both losses converge to a similarly low value, suggesting that the training converged. The outputs of the GAN generator constitute our synthetic ECGs. The quality of the synthetic ECGs is evaluated based on statistical measures, separately applied to the entire original and synthetic populations, once using the outputs of the certified-GAN and once using the GAN without accreditation by the expert physician. In both cases, the fidelity of the synthetic ECGs is evaluated by using two PxAF-related parameters of the ECG: heart rate and R-peak to R-peak interval (RR interval). These two parameters are independently calculated for the populations using the signal processing algorithm described in Section 4.3. It is worth noting that these two parameters reflect the variation of the cardiac cycle and heart rate that is linked to arrhythmia.
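Given R-peak positions, the two parameters are related by simple arithmetic: an RR interval is the time between successive R-peaks, and the instantaneous heart rate is 60 divided by that interval in seconds. The sketch below assumes the R-peak sample indices are already available (the detection step of Section 4.3 is not reproduced), and the indices shown are hypothetical.

```python
FS_HZ = 128  # sampling frequency of the database

def rr_intervals_seconds(r_peak_indices, fs=FS_HZ):
    """Time between successive R-peaks, in seconds."""
    return [(b - a) / fs for a, b in zip(r_peak_indices, r_peak_indices[1:])]

def heart_rates_bpm(rr_seconds):
    """Instantaneous heart rate in beats per minute for each RR interval."""
    return [60.0 / rr for rr in rr_seconds]

# Hypothetical R-peak sample indices inside one 4-second segment.
r_peaks = [10, 138, 250, 394]
rr = rr_intervals_seconds(r_peaks)   # 1.0 s, 0.875 s, 1.125 s
hr = heart_rates_bpm(rr)
```

An irregular rhythm shows up as a wide spread in both lists, which is why these two parameters are natural choices for comparing real and synthetic PxAF populations.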
In total, 10000 synthetic ECGs were generated using the previously described GAN, of which 539 were accredited by the expert physician. Fig. 6 illustrates the histograms of the two PxAF-related parameters for the real ECGs and the synthetic ECGs resulting from the certified-GAN. The synthetic and real populations show clearly similar modes. To explore the fidelity of the synthetic ECGs, descriptive statistics are calculated over three populations with the PxAF condition: the real ECGs, the GAN output, and the certified-GAN output. Table 3 presents the mean, standard deviation, and percentile values for the three populations. At the population level, the two PxAF-related parameters of the certified synthetic ECGs fit the population of real ECGs very well, with a deviation of less than 2% in the mean value. This deviation is almost 4% for the data from the plain GAN. The deviation of the percentile values is less than 10%. The certified-GAN provides clear improvements in all the statistics except the 2.5% percentile, which corresponds to the outlier data. To better understand why the certified-GAN performs better, the quantile distributions of the real and synthetic data are compared in a Q-Q plot (Fig. 7). The certified-GAN clearly provides a statistical distribution closer to the real one than the plain GAN does. This is also examined using the Kolmogorov-Smirnov test.
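The population-level comparison above can be sketched as follows. This is an illustration under assumptions, not the procedure used to produce Table 3: the sample standard deviation and linear interpolation for percentiles are assumed choices, the helper names are hypothetical, and the heart-rate values are made up.

```python
from statistics import mean, stdev


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(values):
    """Mean, sample standard deviation, and 2.5% / 97.5% percentiles."""
    return {
        "mean": mean(values),
        "std": stdev(values),
        "p2.5": percentile(values, 2.5),
        "p97.5": percentile(values, 97.5),
    }


def relative_deviation_pct(synthetic, real):
    """Deviation of a synthetic statistic from the real one, in percent."""
    return abs(synthetic - real) / abs(real) * 100.0


# Made-up heart rates in BPM, not values from the paper.
real_hr = [70.0, 80.0, 90.0, 100.0, 110.0]
synth_hr = [72.0, 81.0, 91.0, 101.0, 114.0]
real_stats = summarize(real_hr)
synth_stats = summarize(synth_hr)
mean_dev = relative_deviation_pct(synth_stats["mean"], real_stats["mean"])
```

The same `relative_deviation_pct` call can be applied to each row of the summary to compare the plain GAN and the certified-GAN against the real population.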
Table 4 presents the results of the Kolmogorov-Smirnov (K-S) test for heart rate. As seen in the table, the certified-GAN improves both the K-S statistic and the p-value relative to the plain GAN, indicating a distribution closer to the real population and confirming the effectiveness of the certified-GAN.
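To make the K-S statistic concrete, the sketch below computes the two-sample statistic as the largest gap between two empirical distribution functions. It is a plain-Python illustration with made-up heart rates, not the code behind Table 4, and it omits the p-value; in practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up heart rates in BPM, for illustration only.
real = [60, 70, 80, 90]
certified = [62, 71, 79, 91]
plain = [75, 85, 95, 105]
d_certified = ks_statistic(real, certified)
d_plain = ks_statistic(real, plain)
```

A smaller statistic means the synthetic sample follows the real distribution more closely; in this toy example the hypothetical certified sample yields the smaller value.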

Method | PhysioNet Classification Accuracy (%)
Pourbabaee et al. [45] ‡ | 91.0
Surucu et al. [52] | 93.88

Method | D_Original (%) | D_CGAN (%)
ResNet-18 [26] | 95.2 | 97.0
Auto-Sklearn [18] | 92.

This study proposed an accurate method for screening PxAF. In this application, the trade-off between sensitivity and specificity is made by setting the threshold of the output layer, where sensitivity and specificity are defined as:
• Sensitivity is the probability of a positive classification result when the PxAF condition is present
• Specificity is the probability of a negative classification result when the condition is normal
The Receiver Operating Characteristic (ROC) curve is a plot of sensitivity against (1 - specificity), in which the optimal point is the one that maximizes both sensitivity and specificity. Fig. 8 illustrates the ROC curve for the proposed method in comparison with the ResNet-18 classification method. As can be seen in Fig. 8

This is because larger kernel sizes (5×5) improve the representational power of the network. In contrast, the reduction cell has many average pooling operations for compressing the information across the spatial dimension. This is because pooling operations can increase the nonlinear representation ability of the network. Referring to the recurrence graphs, in which the rhythmic contents of the ECG are preserved within squares of 4 seconds (see Fig. 2), one can intuitively understand that an optimal kernel size is one that can include rhythms. A small kernel size can negatively impact the learning quality because it fails to capture rhythmic content.
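Returning to the sensitivity/specificity trade-off described above, the following sketch shows how sweeping the output threshold produces the points of an ROC curve. The scores and labels are made up, the function names are hypothetical, and choosing the threshold that maximizes the sum of sensitivity and specificity (Youden's J) is an assumed reading of the "optimal point", not necessarily the criterion used in this study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify as PxAF when score >= threshold; labels: 1 = PxAF, 0 = normal."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs, one per candidate threshold."""
    points = []
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, t)
        points.append((1.0 - spec, sens))
    return points


def best_threshold(scores, labels):
    """Threshold maximizing sensitivity + specificity (assumed criterion)."""
    return max(sorted(set(scores)),
               key=lambda t: sum(sensitivity_specificity(scores, labels, t)))


# Made-up classifier outputs and ground truth, for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
labels = [0, 0, 1, 1, 1]
curve = roc_points(scores, labels)
threshold = best_threshold(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold)
```

Lowering the threshold moves along the curve toward higher sensitivity at the cost of specificity, which is the trade-off a screening application has to settle.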

Discussion
This study suggested an original framework for PxAF classification using a novel combination of a GAN and NAS in conjunction with an advanced signal processing method. The GAN plays an important role in enriching the training data by generating valid synthetic ECGs through a certified procedure, and the NAS acts as a reliable architecture designer to boost the classification performance. The resulting classification method was optimized and implemented to detect patients with PxAF arrhythmia, a case study of vital clinical importance. The proposed method improved the screening accuracy by 6.1% compared to the state-of-the-art automated machine learning method [18]. The baselines for comparison were ResNet-18 and Auto-Sklearn, which are well-known machine learning benchmarks. Both benchmarks were noticeably outperformed by the proposed method.
Synthetic Data Generation. This study employed a GAN architecture to generate synthetic ECGs and invoked an expert physician to accredit the synthetic data. The application of GANs to generating synthetic ECGs has already been explored [54,60]; however, the effectiveness of the generated ECGs in the training process is questionable, since inappropriate synthetic data can mislead the classifier. The certified-GAN proposed in this study effectively pruned the inappropriate signals. Results showed a noticeable improvement in the learning process using the certified-GAN. We will make these synthetic signals publicly available to researchers for any scientific purpose.
Another interesting aspect of this study is the statistical techniques employed to study the fidelity of the synthetic ECGs. Heart rate and RR interval were employed as the measures for PxAF. The statistical techniques mainly perform population-based evaluations, which fit well into the scope of the learning process. The certified-GAN was unable to generate appropriate outliers, as reflected by the 2.5% percentile in Table 3. Such outlier data cannot play an important role in the learning process performed by the proposed deep learning architecture.
ECG Signal Processing. In this study, the rhythmic contents of the heartbeats are innovatively preserved at the feature extraction level through signal processing and the recurrence images. As with other methods that rely on temporal features, there are a number of design parameters associated with the method at this level, such as the window length for obtaining the recurrence graph as well as the wavelet transformation. These parameters were obtained empirically based on prior knowledge of the signal. Integrating the search for optimal values of these design parameters into the optimization process might provide further improvements.
CNN Architecture Search. Although several NAS methods have been proposed to detect various arrhythmias [16,17,34,38,41], the design of an efficient PxAF detection method based on an optimized CNN architecture remains unexplored. Moreover, the optimization process was not performed at the feature extraction level.

We learned the dynamic variation of the heartbeats at the feature learning level by designing customized architectures for recurrence images. Several design parameters are associated with the method at this level, such as the number of training epochs. We empirically obtained these parameters based on prior knowledge about the neural architecture search. Deep-PxAF yields higher performance compared to conventional machine learning techniques that are automatically tuned by Auto-Sklearn. This primarily results from the higher feature extraction performance of our custom-designed CNN architecture. On the other hand, manually tuning a generic CNN architecture [45] may result in lower accuracy than Auto-Sklearn.
Statement of Reproducibility. To foster reproducibility:
• Reproducibility analysis. Many works on NAS have issues regarding reproducibility due to intrinsic stochasticity. Our project, code, and labeled datasets will be open-sourced upon acceptance to ensure reproducibility.
• Availability of database. In this study, we evaluated our networks using the PhysioNet PxAF database [8]. Thus, this work does not involve any new data collection or human subject evaluation. The generated ECGs with the corresponding ground truth labels can be downloaded after paper acceptance.

Conclusion & Future Work
This paper suggested an original combination of certified synthetic data generation and the NAS method for classifying a vital pathological sign of the ECG signal: Paroxysmal Atrial Fibrillation (PxAF). To overcome privacy and ethical concerns for data sharing, a GAN model was used to generate synthetic data. The synthetic ECGs were screened by an expert physician to discard the irrelevant ones. We employed a CNN for the classification, for which the optimal architecture was found by the NAS. The input images to the CNN were extracted from the ECGs using recurrence graphs of the wavelet transform. The proposed framework offers a noticeable improvement in classification performance compared to the state-of-the-art as well as the existing benchmarks. In future work, the performance of the resulting classifier will be explored in practice on the general population after it has been implemented on an appropriate wearable ECG platform.

Fig. 1.b shows a PxAF condition versus a normal sinus rhythm.

Figure 1: (a) Illustration of a sinus rhythm condition. Heart rate variation within 60-100 beats per minute. (b) PxAF condition. Heart rate variability in the form of arrhythmia and P-wave alterations.

Figure 2: The bird's-eye view of the proposed method.

1. Bizarre shape (Fig. 3.b)
2. Distorted PxAF: There are distorted segments of the signal with a bizarre shape (Fig. 3.c)
3. Inconsistent QRS-complex: Heartbeats exist, but the QRS-complexes are inconsistent across beats (Fig. 3.d)
4. Redundant/Noisy R peaks: Extra and noisy R peaks in the segment (Fig. 3.e)
5. Partial PxAF: The segment only partially includes the PxAF pattern (Fig. 3.f)

Figure 3: (a) A certified synthetic PxAF sample. (b)-(f) Synthetic PxAF samples rejected by an expert physician due to (b) bizarre shape, (c) distorted PxAF, (d) inconsistent QRS-complex, (e) redundant/noisy R peaks (shown as red points), and (f) a partially existing PxAF pattern in the segment. The sampling frequency is 128 Hz.

Figure 4: Illustration of the proposed signal processing pipeline.

Figure 5: Loss of the discriminator and generator during GAN training.

Figure 6: Distribution of (left) heart rates and (right) RR intervals in all 539 certified segments (D_CGAN) compared to the original database (D_Original).

Figure 7: Illustration of the Q-Q plot for (left) heart rate and (right) RR interval.

Figure 8: Comparison of the ROC curves of Deep-PxAF trained on D and D_CGAN with that of the ResNet-18 baseline trained on D_CGAN.

Table 1: Directives for rejecting improper synthetic ECG segments.

as the search method. Auto-Sklearn uses four data preprocessing techniques, 14 feature preprocessing techniques, 15 classifiers, and a structured hypothesis space with 110 hyper-parameters.

Table 2: The configuration setup of the signal processing and neural architecture search hyper-parameters.

Table 3: Mean, standard deviation (STD), and 2.5% and 97.5% percentiles of the HR and RR interval parameters in real and synthetic ECGs. BPM stands for beats per minute.

Table 5 compares the results of Deep-PxAF with the state-of-the-art and state-of-practice classification methods. Results show that Deep-PxAF provides the most accurate classification result, with 99% accuracy, compared to all counterparts.

Table 5: Comparing the results of Deep-PxAF with state-of-the-art and state-of-practice methods.
Using the same search space as DARTS.
‡ Reporting the best results by CNN architecture with a K-nearest neighbor (KNN) classifier.