A new deep learning algorithm of 12-lead electrocardiogram for identifying atrial fibrillation during sinus rhythm

Atrial fibrillation (AF) is the most prevalent arrhythmia and is associated with increased morbidity and mortality. Its early detection is challenging because of the low detection yield of conventional methods. We aimed to develop a deep learning-based algorithm to identify AF during normal sinus rhythm (NSR) using 12-lead electrocardiogram (ECG) findings. We developed a new deep neural network to detect subtle differences in paroxysmal AF (PAF) during NSR using digital data from standard 12-lead ECGs. Raw digital data of 2,412 12-lead ECGs were analyzed. The artificial intelligence (AI) model showed that the optimal interval to detect subtle changes in PAF was within 0.24 s before the QRS complex in the 12-lead ECG. We allocated the enrolled ECGs to the training, internal validation, and testing datasets in a 7:1:2 ratio. Regarding AF identification, the AI-based algorithm showed the following values in the internal and external validation datasets: area under the receiver operating characteristic curve, 0.79 and 0.75; recall, 82% and 77%; specificity, 78% and 72%; F1 score, 75% and 74%; and overall accuracy, 72.8% and 71.2%, respectively. The deep learning-based algorithm using 12-lead ECG demonstrated high accuracy for detecting AF during NSR.

Atrial fibrillation (AF) is one of the most important public health problems and a significant cause of increasing health care costs worldwide 1 . AF is the most common form of arrhythmia and is reported to increase mortality and the risk of ischemic stroke, heart failure, and dementia in patients 2,3 . AF is confirmed based on 12-lead electrocardiogram (ECG) findings; however, it is difficult to identify AF, especially paroxysmal AF (PAF), from ECGs acquired during normal sinus rhythm (NSR) because of low detection by conventional methods and the silent nature of PAF 4 . Conventional methods, such as Holter ECG monitoring and event recorder examination, rely on the detection of symptoms over a relatively short period. ECG patches, such as smartwatches, have recently shown a diagnostic AF yield of 34% 5 . It has been reported that ECG monitoring with an implantable loop recorder (ILR) was superior to conventional follow-up for detecting AF after cryptogenic stroke 3,6 . However, smartwatches and ILRs are not widely available because of their cost and invasiveness, making them less accessible to some patients and doctors. These methods also have insurance issues on a case-by-case basis. Therefore, a new cost-effective strategy to meet the "unmet need" and improve AF detection is needed in the future. Meanwhile, the progression of AF can cause electrical and structural changes, manifesting as subtle changes on normal ECGs 7,8 . However, even for cardiologists, it is impossible to distinguish the NSR of a patient with PAF from that of a healthy person without AF on an ECG. A recent report showed good performance of artificial intelligence (AI) using a convolutional neural network for point-of-care identification of AF using ECGs acquired during NSR in patients with PAF 9 . We hypothesized that we could identify the subtle ECG changes present in a standard 12-lead ECG during NSR in patients with PAF using a deep learning algorithm. 
To evaluate this hypothesis, we trained, validated, and tested a recurrent neural network (RNN) deep learning algorithm using NSR ECGs from patients with PAF and healthy individuals in a tertiary hospital.
Application of ECG interpretation using deep learning analysis. After internal validation of the RNN-based deep learning algorithm, we developed an RNN-based AI application that can be used for real-time analyses on computers in our hospital. The application revealed interesting findings in the NSR ECGs. When multiple serial ECGs from the same patient were assessed, an ECG acquired close to the date of documented AF or when an AF symptom was present tended to yield a high AF detection probability, whereas an ECG acquired in the absence of AF symptoms tended to yield a low probability. As shown in Fig. 3, the probability of PAF estimated by the deep learning program could change according to the date of ECG acquisition. For example, in a 72-year-old man diagnosed with PAF, the AI program calculated AF probabilities of 90% and 100% for NSR ECGs acquired after his AF episode had terminated and probabilities of 6.3% and 10% for ECGs acquired in the absence of AF symptoms.

Discussion
We analyzed the predictive value and the optimal section in an ECG for identifying AF during NSR using a deep learning algorithm. The AI-based deep learning algorithm developed to estimate the probability of PAF during NSR using a 12-lead ECG was excellent for identifying PAF (recall of 82%, specificity of 78%, F1 score of 75%, and overall accuracy of 72.8%). The suggested model showed a reliable harmonic mean of precision and recall (F1 score) for identifying PAF during NSR compared with the models used in recently published studies 9,10 . The model showed that the optimal interval to detect subtle changes of PAF was within 0.24 s before the QRS complex in a 12-lead ECG.
Deep learning models usually require access to large and accurate datasets 11 . Despite the relatively small size of our dataset compared with those in previous studies, our model showed favorable recall and accuracy 9,10 . This could be attributed to the use of accurate ECG data, verified by two electrophysiologists, for training and validation of the deep learning model and to the detection of optimal intervals for AF detection. It has been reported that P-wave analysis calculated on a standard surface ECG could be used to identify patients with PAF 12-14 . Through this approach, we intended to recognize the subtle but significant differences between PAF-NSR and healthy-NSR ECGs despite the relatively small data size. It is expected that the use of this model would greatly reduce the amount of data required for a diagnosis, making it easy to apply to actual clinical trials.
Opportunistic screening for AF in patients aged ≥ 65 years during other examinations, such as blood pressure checks, has detected AF in approximately 1.4% of patients 15 . The detection rates of AF using repeated snapshot handheld ECG devices and continuous recordings, such as patches or ILRs, were 1-2.5% per day (3.8% per week) and 22-34% per year, respectively 16,17 . However, these monitoring devices are invasive and expensive 18 . Although it is difficult to perform a head-to-head comparison among these various modalities for AF detection because of different techniques used and heterogeneity of patients enrolled, AI using ECG could have a good performance to detect patients with PAF using a single 12-lead ECG, which is a rapid, simple, and inexpensive point-of-care test. Currently, there is an unmet need for a method to increase AF detection with good sensitivity. Our algorithm showed excellent performance for recall of identifying AF. ECG and Holter monitoring are short-term monitoring methods that usually show NSR in one or more tests, even in patients with AF. However, patients' preferences for intensive long-term monitoring pose limitations for AF detection. Therefore, the use of AI to increase the accuracy of AF diagnosis would be very useful in pre-screening, as it would save unnecessary inspection time and cost. With continuous ECG monitoring over extended periods for people aged 65 years and older, one-fourth to one-third of them would have brief AF episodes. The use of our model in this population could be a cost-effective alternative for AF detection.
Data from a Swedish registry helped identify two major gaps in AF-related stroke prevention, representing 33% of all ischemic strokes 19 . AF was not detected before the stroke in 9% of all stroke cases 15 . In these patients, AF screening and stroke prevention, such as appropriate anticoagulation prescription, would be needed for the prevention of recurrent strokes. Pre-screening for AF using AI could be helpful in reducing the evidence-practice gap of oral anticoagulant (OAC) prescriptions in these populations.
The cost-effectiveness is likely to be a result of earlier diagnosis of AF and initiation of treatment to reduce stroke risk, as stroke is a severe event with a high economic burden 20 . Early rhythm-control therapy was associated with a lower risk of adverse cardiovascular outcomes compared with usual care in patients diagnosed early with AF according to findings from the EAST-AFNET 4 trial 21 . Accurate early diagnosis and proper clinical management of AF are expected to contribute to improving patient and population health outcomes by ensuring that patients receive appropriate treatment. We expect that if AI performance becomes more accurate in the near future, it will play a role in this first step of AF screening.
It has been demonstrated that the maintenance of AF provokes ion channel changes and leads to a marked shortening of the atrial effective refractory (AER) period, a reversion of its physiological rate adaptation, and an increase in rate, inducibility, and stability of AF; all these changes were completely reversible within 1 week of sinus rhythm 22 . AF is a progressive disease associated with progressive electrical and structural remodeling and a gradual increase in AF burden 23 . Delayed recurrence of AF after AF ablation might be related to AF progression 24 . The time-course of AER involves a transitional period associated with the progression and maintenance of AF 25 . AF progression shows patient-specific patterns of the atrial activation rate 26 . Our model showed differences in acquired AF probabilities of NSR ECG according to AF episodes in the same patients (Fig. 3). 
It suggested that the duration between an AF episode and the length of PAF-normal ECG recording might be associated with subtle changes in ECG, indicating AF progression. Additionally, our study showed that AI could identify this subtle difference even in NSR. This could explain the reflected reversible electrical remodeling when there were AF symptoms or episodes. In this study, patients with PAF-NSR had higher HR and prolonged PR interval, QRS duration, and corrected QT interval than healthy persons who have no AF recorded. Previous studies showed that several ECG changes had been identified in patients with AF, including prolonged PR interval, P wave duration, and QT interval, and left ventricular hypertrophy 23,26,27 . Although we cannot understand and interpret these ECG changes because of a so-called "black box" limitation of a deep learning algorithm in terms of the approach to the decision for detecting AF, we have assumed that these changes related to atrial remodeling might have influenced our deep learning decision process.
We have evaluated an AI-based deep learning algorithm for the identification of AF during NSR. Further studies should evaluate the hypothesis that combining ECG analysis with AI and clinical comorbidities could enhance AF prediction. Such algorithms could be useful stratification tools for patients at risk for developing new-onset AF, especially in those with cryptogenic stroke. While there have been many reported traditional predictors of PAF after cryptogenic stroke, only a few studies focused on ECG analysis using AI 9,10,28 . Attia and colleagues reported that AI could enable point-of-care identification of individuals with AF by using 649,931 ECGs acquired during NSR 9 . Raghunath and colleagues reported that a deep learning network could help in identifying new-onset AF and AF-related stroke using 1.6 million 12-lead ECGs 10 . In other words, the intensive analysis of ECG provided by the deep neural network might detect subtle and multifaceted perturbations of ECG and identify AF-related stroke patients predicted to be at a high risk of AF. Although further research is needed, these deep learning algorithms may be able to identify a high-risk subset of patients with potential stroke who may benefit from empirical anticoagulant therapy. Improved risk stratification would allow more patient-centered intervention and patient-tailored decision making for better AF management. Increasing awareness and detection of undiagnosed AF and administering OAC for thromboprophylaxis remain ongoing issues. The increased detection yield of AF by AI could lead to the establishment of effective thromboprophylaxis with OACs to overcome the treatment gap between aspirin and OACs 29 . In the future, efforts should be directed at the primary prevention of AF to provide the basis for fine-tuning patient-tailored decision making. 
Furthermore, the algorithm could be used to predict responsiveness to AF treatments, such as electrical and pharmacological cardioversions and AF catheter ablation.
Several limitations in this study should be considered. First, as this was a retrospective study conducted in one tertiary hospital in Korea, it is necessary to validate the model with patients from other hospitals and countries. The enrollment duration of dataset B differed from that of dataset A because of the limitations of a retrospective study. Baseline characteristics and electrocardiographic findings differed between datasets A and B. However, patients with PAF-NSR in both datasets A and B had similar characteristics and patterns of electrocardiographic findings. A prospective study is warranted to establish the usefulness of the algorithm in AF patients as a new feasible and non-invasive screening tool. The interpretation of deep learning models is challenging, and benefiting from deep learning models requires access to large datasets. Therefore, future studies to improve the interpretability of the developed deep learning models and to identify the right size of the training and test datasets are warranted. Second, despite focusing on a specific area on the ECG, the exact rationale behind AI decision making remains unknown because of the nature of AI; therefore, this needs to be further explored. Recently, explainable AI has been studied, and the automatic detection of bias and the ability to explain its decision making process could be made possible in the near future 30 . Furthermore, we focused on developing screening tools for AF based on 12-lead ECGs. Despite the favorable performance of our deep learning algorithm, overcoming false positives and negatives to identify the optimal treatment and predict prognosis remains an important issue. 
Nonetheless, health professionals can be alerted to potential occurrences of AF in the population with a higher risk of AF by the suggested algorithm, and additional evaluation with ECG monitoring might be warranted. Although it is difficult to rely on AI as a direct factor in clinical decisions involving the administration of drugs, such as novel OACs or antiarrhythmics, the algorithm can predict AF with high sensitivity to improve AF detection, which is an unmet need in the field, especially for patients with cryptogenic stroke.
In conclusion, the deep learning-based algorithm using 12-lead ECGs may discriminate "hidden" AF during NSR. Further studies are needed to evaluate their possible use in future prognostic models for precise decision making in daily practice.

Methods
Study design and population. This retrospective cohort study included adult participants (age ≥ 18 years) with standard 12-lead ECGs recorded at the Inha University Hospital; ECGs had to be acquired at least twice every 3 months to ensure accurate classification into the PAF-NSR and healthy-NSR groups. Raw ECG data in XML format, which can be accessed and extracted for AI use, have been stored in our hospital since 2015. Dataset A, acquired from March 2015 to March 2019, was used for development and internal validation, and dataset B, acquired from April 2019 to April 2020 after AI development, was used for external validation. All ECGs were acquired at a sampling rate of 500 Hz using a GE-Marquette ECG machine (Marquette Tools, Milwaukee, WI, USA), with the raw data stored as XML documents using the MUSE data management system in relational databases. We defined PAF as episodes of AF lasting < 48 h, which terminated spontaneously within 7 days or terminated following electric or pharmacological cardioversion within 48 h 14 . The first recorded ECG with AF was defined as the index ECG; subsequent NSR ECGs were defined as PAF-NSR ECGs. We identified healthy-NSR ECGs as the NSR ECGs of healthy individuals on the health screening list at our hospital, with NSR ECGs acquired at least twice every 3 months and no AF rhythms recorded, to ensure accurate classification into the healthy-NSR group. Patients who continued to use antiarrhythmic drugs for > 3 months were excluded to rule out antiarrhythmic effects. Two electrophysiologists reviewed all the ECGs, with corrections made to the diagnostic labels as necessary. Figure 1 shows the dataset creation and analysis strategy, which was devised to ensure a robust and reliable dataset for training, validating, and testing the network. The study protocol was approved by the Institutional Review Board of the Inha University Hospital (2018-630 and 2019-10-038) and complied with the principles of the Declaration of Helsinki. 
The need for obtaining patients' informed consent was waived owing to impracticality and minimal risk of harm.
Development of the AI algorithm for identifying AF during NSR. The AI algorithm was developed using an RNN to manage sequential data reflecting the ECG characteristics 31,32 . An ECG is a graphical display of the heart's electrical activity, depicting changes in voltage over time as recorded through electrodes. These electrodes detect subtle electrical changes resulting from cardiac muscle depolarization and repolarization during each cardiac cycle. Changes in the normal ECG pattern occur in numerous heart abnormalities, including arrhythmia. Among deep neural networks, we chose the RNN, which has advantages in dealing with time-series data, such as ECG data 33 . A bi-directional connection was added so that time flow could be considered in forward and backward passes, and long short-term memory was used to maintain a series of information in the short and long terms. We extracted and analyzed XML data from the MUSE data management system to minimize artifacts. All data files were stored in the XML format on a GE MAC5500 machine (GE Healthcare, Chicago, IL, USA). The ECGs were originally measured on 12 leads, but because of the deviceʼs data storage method, the XML files stored data from only eight leads, excluding leads III, aVR, aVL, and aVF. The data from these four leads can be calculated with simple arithmetic expressions, and it is common practice to approximate the data with these operations. Therefore, in this study, only the eight measured signals of leads I, II, V1, V2, V3, V4, V5, and V6 were used. The signals were measured for 10 s on each lead simultaneously. When the Base64-encoded values were read, eight one-dimensional arrays were obtained for each XML file. As a 10-s signal contains multiple beats and the heart rate varies from person to person, approximately 10 or more beats can be obtained per person. 
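The decoding and lead arithmetic described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names `decode_lead` and `derive_limb_leads` are hypothetical, and the assumption that each Base64 string holds little-endian signed 16-bit samples is made here for illustration and is not stated in the Methods. The four unstored leads follow from leads I and II through the standard Einthoven and Goldberger relations (III = II − I, aVR = −(I + II)/2, aVL = I − II/2, aVF = II − I/2).

```python
import base64

import numpy as np


def decode_lead(b64_text: str) -> np.ndarray:
    """Decode one Base64 waveform string into a 1-D array of samples.

    Assumption (not from the paper): samples are little-endian int16.
    """
    raw = base64.b64decode(b64_text)
    return np.frombuffer(raw, dtype="<i2").astype(np.float64)


def derive_limb_leads(lead_i, lead_ii) -> dict:
    """Compute the four unstored leads from leads I and II."""
    lead_i = np.asarray(lead_i, dtype=np.float64)
    lead_ii = np.asarray(lead_ii, dtype=np.float64)
    return {
        "III": lead_ii - lead_i,           # Einthoven's law
        "aVR": -(lead_i + lead_ii) / 2.0,  # Goldberger augmented leads
        "aVL": lead_i - lead_ii / 2.0,
        "aVF": lead_ii - lead_i / 2.0,
    }


# Toy example: encode four synthetic samples, then decode them again.
samples = np.array([0, 100, -50, 200], dtype="<i2")
encoded = base64.b64encode(samples.tobytes()).decode("ascii")
lead_i = decode_lead(encoded)
lead_ii = 2.0 * lead_i  # synthetic lead II, for illustration only
derived = derive_limb_leads(lead_i, lead_ii)
```

Because the derived leads are linear combinations of leads I and II, they add no independent information, which is consistent with training on the eight measured leads only.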
For training, we used all the individual beats sampled from each recording after discarding a few highly noisy signals. Validation and testing were performed by grouping the beats from a single recording. The final result was computed as the ratio of the beats detected as PAF-NSR to the total number of beats in the recording. We separated the training, validation, and test sets so that the recordings of a given group of patients were included in only one set. AF is characterized by atrial fibrillatory waves and an irregular ventricular rate on 12-lead ECG. We hypothesized that the vicinity of the P-wave before the QRS complex would be important for differentiating AF during NSR. To verify this hypothesis, we tested the accuracy of the binary classification by assessing whether each case was a PAF-NSR or healthy-NSR ECG; then, the data were evaluated once with five-fold cross-validation. The results demonstrated that the accuracy of PAF detection starts to improve when approximately ≥ 100 samples are used in the test. The experiment for the optimal sample size to identify AF was performed at the specific range of the R-R interval that was significantly related to PAF 12,34 . We reweighted the input ECG signal f(t) using the window function g(t). For the optimal interval to detect subtle changes in PAF-NSR, we used a rectangular function as g(t), and the reweighted signal h(t) was computed as h(t) = f(t) × g(t). This process clarified the value ranges that particularly affect the trained model along the time-axis. The bi-directional connection was added so that time flow could be considered in forward and backward passes, and long short-term memory was used to maintain a series of information in the short and long terms (Fig. 4).
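The reweighting h(t) = f(t) × g(t) and the beat-level aggregation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the function names are hypothetical, the QRS onset index is assumed to be supplied by a separate detector, and the window is placed over the 0.24 s immediately preceding that onset at the 500 Hz sampling rate reported in the Methods.

```python
import numpy as np

FS_HZ = 500  # sampling rate reported in the Methods


def rectangular_window(n_samples: int, start: int, stop: int) -> np.ndarray:
    """g(t): 1 inside [start, stop), 0 elsewhere."""
    g = np.zeros(n_samples)
    g[max(start, 0):min(stop, n_samples)] = 1.0
    return g


def reweight_before_qrs(f, qrs_onset_idx: int,
                        window_s: float = 0.24, fs_hz: int = FS_HZ) -> np.ndarray:
    """h(t) = f(t) * g(t), keeping only the interval just before the QRS onset.

    qrs_onset_idx is a hypothetical input from a separate QRS detector.
    """
    f = np.asarray(f, dtype=float)
    width = int(round(window_s * fs_hz))  # 0.24 s at 500 Hz -> 120 samples
    g = rectangular_window(f.size, qrs_onset_idx - width, qrs_onset_idx)
    return f * g


def recording_score(beat_is_paf) -> float:
    """Ratio of beats classified as PAF-NSR to all beats in one recording."""
    beats = np.asarray(beat_is_paf, dtype=bool)
    if beats.size == 0:
        raise ValueError("recording contains no beats")
    return float(beats.mean())


# Toy example: a constant signal of 400 samples with QRS onset at sample 300.
h = reweight_before_qrs(np.ones(400), qrs_onset_idx=300)
score = recording_score([True, True, False, True])
```

In this toy example only samples 180 to 299 survive the reweighting, and three of four beats flagged as PAF-NSR give a recording-level score of 0.75.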
A ROC curve was created to test and validate the datasets and assess the AUC of the AI-enabled ECGs acquired during NSR to determine whether AF was present. Using the ROC curve for the small internal validation set, the probability threshold was set and applied to the testing dataset to derive the accuracy, sensitivity, specificity, and F1 score of the testing dataset. After the internal validation of this RNN-based deep learning algorithm, we developed AI applications that can be used on computers in our hospital. Continuous ECG data since 2019 were gathered and analyzed in real-time. Figure 5 describes the schematic representation of the AI algorithm and its application for detecting PAF.
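The threshold selection and the derived metrics can be sketched as follows. The Methods do not state which criterion was used to set the probability threshold from the validation ROC curve, so this sketch assumes Youden's J statistic (recall + specificity − 1) purely for illustration; the function names and the toy labels and scores are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels (1 = PAF-NSR)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn


def metrics(y_true, y_pred) -> dict:
    """Accuracy, recall (sensitivity), specificity, and F1 score."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1}


def pick_threshold(y_val, scores_val) -> float:
    """Pick the validation threshold maximizing Youden's J (an assumption)."""
    scores_val = np.asarray(scores_val, dtype=float)
    best_t, best_j = 0.5, -1.0
    for t in np.unique(scores_val):
        m = metrics(y_val, scores_val >= t)
        j = m["recall"] + m["specificity"] - 1.0
        if j > best_j:  # ties keep the lower threshold
            best_t, best_j = float(t), j
    return best_t


# Toy example: set the threshold on validation data, apply it to test data.
threshold = pick_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
test_scores = np.array([0.9, 0.3, 0.2, 0.5, 0.6])
test_metrics = metrics([1, 1, 0, 0, 1], test_scores >= threshold)
```

Fixing the threshold on the validation set before touching the test set keeps the reported test metrics free of threshold tuning on the test data.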
Statistical analysis. Continuous variables are reported as means ± standard deviations or medians and interquartile ranges, and categorical variables are expressed as percentages and frequencies. Comparisons between the two groups were performed using the independent sample t-test or chi-square test. The performance of the AI model was measured using the ROC curve and AUC, together with accuracy, recall (sensitivity), specificity, and F1 score. Recall is the ratio of correctly predicted positive observations to all actual positive observations. F1 score (balanced F-score) is the harmonic mean of precision and recall. A two-sided value of P ≤ 0.05 was considered statistically significant. Statistical analyses were performed using SPSS statistical software (SPSS version 21.0 for Windows, SPSS Inc., Armonk, NY, USA).
Figure 4. The experiment for the optimal sample size to identify AF was performed at a certain range of the R-R interval, where we reweighted the input ECG signal f(t) using the window function g(t) (b). The reweighted signal h(t) is computed by the equation h(t) = f(t) × g(t) and illustrated by the red dotted curve (c). This process clarifies the value ranges that particularly affect the trained model along the time-axis. The bi-directional connection is added so that the time flow can be considered in forward and backward passes, and long short-term memory is used to maintain a series of information in the short and long terms (d). NSR normal sinus rhythm; PAF paroxysmal atrial fibrillation; LSTM long short-term memory.

Data availability
The data collected from the Inha University Hospital during this study are patient data obtained under the institutional review board's ethical approval. The corresponding author agrees to share de-identified individual participant data, the study protocol, and the statistical analysis plan with academic researchers following completion of a data use agreement specifying that this information cannot be shared. The code used to train the AI model is dependent on annotation, infrastructure, and hardware and therefore cannot be released.
Received: 16 March 2021; Accepted: 7 June 2021
Figure 5. Description of the artificial intelligence algorithm for detecting PAF. All raw ECG data were stored as XML documents using the MUSE data management system in a relational database server. PAF probability is calculated through our developed AI algorithm using an RNN with two-dimensional convolution (red box). AI artificial intelligence; ECG electrocardiogram; LSTM long short-term memory; PAF paroxysmal atrial fibrillation; RNN recurrent neural network.