Long-term PM2.5 exposure and the clinical application of machine learning for predicting incident atrial fibrillation

The clinical impact of fine particulate matter (PM2.5) air pollution on incident atrial fibrillation (AF) has not been well studied. We used integrated machine learning (ML) to build several incident AF prediction models that include average hourly measurements of PM2.5 for 432,587 subjects from the Korean general population. We compared these models using the c-index, net reclassification improvement index (NRI), and integrated discrimination improvement index (IDI). ML using the boosted ensemble method exhibited a higher c-index (0.845 [0.837–0.853]) than existing traditional regression models using CHA2DS2-VASc (0.654 [0.646–0.661]), CHADS2 (0.652 [0.646–0.657]), or HATCH (0.669 [0.661–0.676]) scores (each p < 0.001) for predicting incident AF. As feature selection algorithms identified PM2.5 as a highly important variable, we added PM2.5 to the prediction of incident AF and constructed scoring systems. Prediction performance increased significantly compared with models without PM2.5 (c-indices: boosted ensemble ML, 0.954 [0.949–0.959]; PM-CHA2DS2-VASc, 0.859 [0.848–0.870]; PM-CHADS2, 0.823 [0.810–0.836]; PM-HATCH, 0.849 [0.837–0.860]; each interaction, p < 0.001; NRI and IDI were also positive). ML combining readily available clinical variables and PM2.5 data predicted incident AF better than models without PM2.5, and even better than established risk prediction approaches, in a general population exposed to high air pollution levels.

, CHA 2 DS 2 -VASc 4 , and HATCH 5 scores; however, their prediction accuracies are not sufficient for wide application. Although epidemiological studies have suggested that an elevated level of ambient particulate matter < 2.5 μm in aerodynamic diameter (PM 2.5 ) is consistently associated with adverse cardiac events 6 and arrhythmias 7 , including AF 8 , the role of PM 2.5 on incident AF remains to be investigated. Recently, data-driven analyses using machine learning (ML) methods have been introduced to identify some blood biomarkers that are risk factors of AF prevalence (not incidence) 9 , and they were considered non-inferior to traditional analyses 9,10 . However, it was not clear whether these data-driven approaches could find correlations between PM 2.5 and incident AF, or if they could predict incident AF better than traditional analysis in clinical practice.
Although some studies from Western countries did not show a correlation between short-term exposure to PM 2.5 and incident AF 11,12 , the air pollution levels in those areas were much lower than the levels in Asian countries; therefore, the effect sizes could be low in those studies. Our previous study, performed in the general population of an Asian country, showed correlations between PM 2.5 exposure and increased AF incidence 8 . Thus far, the identification of AF risk factors has been hypothesis driven, and most studies performed analyses based on the selection of several cardiovascular risk factors. To perform a data-driven analysis for revealing AF risk factors, we used 27 readily available parameters including PM 2.5 level in the Korean general population. All subjects without a history of previous AF were included in our population to identify the risk factors for incident AF. We also analyzed already revealed clinical risk factors to determine which risk factors best predict incident AF in this population. We investigated the robust risk factors for incident AF by using both the traditional regression method and the ML algorithm.

Scientific Reports | (2020) 10:16324 | https://doi.org/10.1038/s41598-020-73537-8 | www.nature.com/scientificreports/

Methods
In this nationwide cohort study, we investigated the relationship between long-term exposure to PM 2.5 and incident AF by using ML methods. The study protocol adhered to the ethical guidelines of the 1975 Declaration of Helsinki. The protocol was approved by the Institutional Review Board of Yonsei University College of Medicine, which waived the need for informed consent.  (Fig. 1). For the purpose of analysis, the subjects were divided chronologically in an approximately 7:3 ratio for a conventional discovery-validation approach.
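The chronological discovery-validation split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `exam_date` field and the helper name `chronological_split` are hypothetical, and the text does not state which date was used for ordering.

```python
from datetime import date

def chronological_split(records, discovery_fraction=0.7):
    """Order subjects by date and cut once: earliest ~70% form the
    discovery cohort, the remainder the validation cohort."""
    ordered = sorted(records, key=lambda r: r["exam_date"])  # hypothetical field
    cut = round(len(ordered) * discovery_fraction)
    return ordered[:cut], ordered[cut:]

# Toy data: ten subjects with made-up examination dates.
subjects = [
    {"id": i, "exam_date": date(2009, 1 + i % 12, 1 + i % 28)}
    for i in range(10)
]
discovery, validation = chronological_split(subjects)
print(len(discovery), len(validation))
```

Splitting by time rather than at random means the validation cohort never precedes the discovery cohort, which is closer to how a model would be used prospectively.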
Air pollution measurements. During the study period, PM 2.5 levels, temperature, and humidity were measured hourly at the 313 sites of the Korean Nationwide Meteorological Observatory by the Korean Department of Environmental Protection. The entire Korean peninsula is divided into 256 residential ZIP codes including 74 metropolitan areas (average 73 km 2 ). To assess long-term PM 2.5 exposure effects, each of the 256 residential ZIP codes was matched with its nearest monitoring facility, and that facility's data were used to assess the average annual pollutant levels for each study subject 16 . For each subject, the long-term average PM 2.5 level and the meteorological variables (temperature and humidity) were calculated as geographically based averages of the hourly measurements at the matched site over that subject's total study period. The Korean National Ambient Air Quality Standards (NAAQS) and PM 2.5 measurement methods are described in Supplementary Table 2.
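The exposure assignment can be illustrated with a short sketch. It assumes, beyond what the text states, that "nearest" means great-circle distance from a residence coordinate and that missing hourly readings are simply skipped; the site names, coordinates, and readings are made up.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance metric)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_site(residence, sites):
    """Return the monitoring site closest to a (lat, lon) residence."""
    return min(
        sites,
        key=lambda s: haversine_km(residence[0], residence[1], s["lat"], s["lon"]),
    )

def long_term_mean(hourly_values):
    """Average hourly readings, ignoring missing hours (an assumption)."""
    valid = [v for v in hourly_values if v is not None]
    return sum(valid) / len(valid)

# Hypothetical sites and readings, for illustration only.
sites = [
    {"name": "site_A", "lat": 37.57, "lon": 126.98, "pm25": [20.0, None, 30.0, 25.0]},
    {"name": "site_B", "lat": 35.18, "lon": 129.08, "pm25": [15.0, 18.0, 12.0, 15.0]},
]
matched = nearest_site((37.46, 126.71), sites)
exposure = long_term_mean(matched["pm25"])
print(matched["name"], exposure)
```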
Primary outcome. The primary outcome was the incidence of AF according to the PM 2.5 level. AF was diagnosed on the basis of hospital admission or at least two outpatient visits for AF 17,18 . The cohort was followed up to the time of an AF incident, the time of disqualification from the NHIS (death or immigration), or the end of the study (December 31, 2013).
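The outcome definition and censoring rule above translate into two small checks. The sketch below is illustrative only: the claim fields `af_code` and `setting` are hypothetical and do not reflect the actual NHIS data layout.

```python
from datetime import date

STUDY_END = date(2013, 12, 31)  # end of study stated in the text

def has_incident_af(claims):
    """AF is counted after one hospital admission or at least two
    outpatient visits carrying an AF diagnosis."""
    admissions = sum(1 for c in claims if c["af_code"] and c["setting"] == "inpatient")
    outpatient = sum(1 for c in claims if c["af_code"] and c["setting"] == "outpatient")
    return admissions >= 1 or outpatient >= 2

def follow_up_end(af_date, disqualification_date):
    """Follow-up stops at the earliest of AF onset, disqualification
    from the NHIS, or the end of the study."""
    candidates = [d for d in (af_date, disqualification_date, STUDY_END) if d is not None]
    return min(candidates)

one_visit = [{"af_code": True, "setting": "outpatient"}]
two_visits = one_visit * 2
print(has_incident_af(one_visit), has_incident_af(two_visits))
```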
Machine learning. Twenty-six readily available clinical parameters and PM 2.5 data were used for variable selection. The supervised ML classifiers included support vector machine (SVM), decision tree, random forest, naïve Bayes, deep neural network, and extreme gradient boosting models. The SVM model differentiated subjects who developed new-onset AF from those who did not by computing the hyperplane that separates the two categories most effectively (the optimal cost and gamma parameters were found with a radial kernel) 19 . The decision tree model, built as a recursive tree structure from computationally selected parameters, differentiates features step by step by creating appropriate splits (recursive partitioning). Decision trees were also combined by ensemble algorithms to construct better prediction models: gradient boosting (500 iterations, with root mean square error as the evaluation metric) and random forest (ten decision trees combined to construct the best model) 20 . Artificial neural networks are mathematical models that mimic human neural networks and can be trained to discriminate different patterns of disease; we selected a three-layered deep neural network model using the Keras framework with a TensorFlow backend 21 . The entire ML process consisted of automated feature selection by an information gain attribute ranking algorithm 22 , model construction with a boosted ensemble algorithm, and tenfold cross-validation to reduce overfitting 23 .
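To make the information gain ranking step concrete, the sketch below scores categorical features by how much they reduce the entropy of the outcome. It is a generic textbook version, not the authors' pipeline; the feature names and toy values are invented, and continuous variables such as PM 2.5 would first need to be discretized (an assumption here).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the outcome minus its entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(table, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: 1 = incident AF, 0 = no AF; feature values are made up.
af = [1, 1, 1, 0, 0, 0, 0, 0]
table = {
    "high_pm25": [1, 1, 1, 0, 0, 0, 0, 0],  # perfectly aligned with outcome
    "male_sex": [1, 0, 1, 0, 1, 0, 1, 0],   # weakly related
}
ranking = rank_features(table, af)
print(ranking)
```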
On the basis of supervised ML methods to construct prediction models, we used a sequential method of feature construction and automated selection by information gain ranking to identify predictive risk factors from the various health examination parameters 22 . Variable selection with an entry criterion of p < 0.05 was applied, and data-driven approaches were used to identify the smallest number of variables required for each prediction model. By using each model, selected variables including PM 2.5 were modeled for their association with incident AF in the discovery cohort, and subsequently evaluated in the validation cohort (Fig. 1). New-onset AF events were analyzed with the geographically based long-term average PM 2.5 level during the study period for each subject. Bootstrapping and tenfold cross-validation were used to adjust the model coefficient to avoid overfitting in the discovery sample. Model accuracy was calculated in the validation sample (30% of the original dataset), and loss was calculated using binary or categorical cross-entropy. The area under the receiver operating characteristic curve or c-index was used for each constructed model, and the net reclassification improvement index (NRI) and integrated discrimination improvement index (IDI) were calculated to assess the additional discriminative ability of these models.
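For a binary outcome, the c-index equals the area under the ROC curve: the probability that a randomly chosen subject with AF receives a higher predicted risk than a randomly chosen subject without AF. The sketch below computes it by direct pair counting. It is a simplified illustration that ignores censoring (a time-to-event c-index would handle it), and the scores are made up.

```python
def c_index(scores, outcomes):
    """Fraction of case-control pairs in which the case has the higher
    score; ties count as half. Quadratic in sample size, fine for a demo."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

outcomes = [1, 1, 0, 0, 0]
risk = [0.6, 0.3, 0.5, 0.2, 0.1]  # hypothetical predicted risks
value = c_index(risk, outcomes)
print(round(value, 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of subjects with and without AF.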
Networks for training and validation were constructed with TensorFlow (version 1.10) using the Keras framework (version 2.1.6), and all statistical analyses were performed using R (version 3.5) and Python (version 3.6). Training and validation were performed on an Intel Xeon Scalable Gold 6126 central processing unit with two Nvidia RTX 2080Ti graphics processing unit (GPU) devices (CUDA version 9.0), and the constructed models were saved for further analysis.
Statistical analyses. The baseline characteristics of subjects with and without AF in both the discovery and validation cohorts were compared. We assumed that the study subjects were exposed to ambient air pollution within their residential ZIP codes during the study period 16 . Individual subjects were matched with the average air pollution levels and meteorological information during the study period obtained from the nearest monitoring facilities (according to the subjects' residential address). The relationship between incident AF and PM 2.5 level was analyzed by Cox proportional-hazards regression, using a generalized estimating equation approach with a random-effect analysis 24 . The proportional-hazards assumption was checked with a log-minus-log graph and a test on the Schoenfeld residuals, and the test results were found to be valid for each lifestyle factor. In the Cox regression analysis, the included subjects were followed from their national health examination until the development of new-onset AF, disqualification (death or immigration), or the end of the study. A two-tailed p-value of < 0.05 was considered statistically significant.
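For reference, the standard form of the Cox model and the reason a log-minus-log plot checks its assumption can be written out as below. This is the generic textbook formulation, with x standing for the covariate vector (including PM 2.5) and β for the coefficients; it is not taken from the paper's supplementary material.

```latex
% Cox proportional-hazards model: baseline hazard h_0(t) scaled by covariates
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Hazard ratio for a one-unit increase in covariate x_j, others held fixed
\mathrm{HR}_j = \exp(\beta_j)

% Survival function implied by the model
S(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)}

% Log-minus-log transform: covariates shift the curve by a constant,
% so curves for different groups should be parallel over time
\log\!\bigl(-\log S(t \mid x)\bigr) = \log\!\bigl(-\log S_0(t)\bigr) + \beta^{\top} x
```

If the log-minus-log curves of two exposure groups cross or diverge markedly, the constant-shift relationship in the last line does not hold and the proportional-hazards assumption is in doubt.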

Results
Baseline characteristics. There were no significant differences in body mass index (BMI), smoking history, socioeconomic status, and follow-up duration between the groups (Table 1). Subjects who developed new-onset AF during the follow-up were older, included a higher proportion of men, were more likely to have comorbidities, and were exposed to higher average PM 2.5 levels than those without AF (Table 1). Subjects with AF had higher use of antiplatelet agents, beta-blockers, and statins than those without AF.
Application of PM 2.5 to traditional regression analysis improves the prediction of incident AF. The air pollution and meteorological measurements are described in detail in Supplementary Results Table 3). We have previously reported on the association between increased exposure to long-term average PM 2.5 and increased incidence of AF 8 . Table 3). The total scores ranged from 0 to 10, 0 to 7, and 0 to 8 points, respectively. The scores showed good discrimination with c-indices of 0.859 (0.848-0.870), 0.823 (0.810-0.836), and 0.849 (0.837-0.860), respectively (Table 2). These scoring systems showed significantly better performances for predicting incident AF than each existing score (CHA 2 DS 2 -VASc, CHADS 2 , and HATCH), and their NRI and IDI were also positive (Table 2).
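A point-based score of this kind can be sketched as follows. The CHA 2 DS 2 -VASc item weights are the standard published ones (maximum 9). The PM 2.5 item is an assumption: a single extra point is only inferred from the reported 0 to 10 range, and the cut-off `pm_threshold` is a hypothetical parameter whose value is not given in this text.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes, stroke_tia, vascular):
    """Standard CHA2DS2-VASc score (0 to 9)."""
    score = 0
    score += 1 if chf else 0            # C: congestive heart failure
    score += 1 if hypertension else 0   # H: hypertension
    score += 1 if diabetes else 0       # D: diabetes mellitus
    score += 2 if stroke_tia else 0     # S2: prior stroke/TIA
    score += 1 if vascular else 0       # V: vascular disease
    score += 1 if female else 0         # Sc: sex category (female)
    if age >= 75:                       # A2: age 75 or older
        score += 2
    elif age >= 65:                     # A: age 65 to 74
        score += 1
    return score

def pm_cha2ds2_vasc(pm25, pm_threshold, **clinical):
    """Illustrative PM-augmented score: adds ONE assumed point when the
    long-term PM2.5 average reaches a caller-supplied threshold."""
    return cha2ds2_vasc(**clinical) + (1 if pm25 >= pm_threshold else 0)

# Made-up subject; 25.0 is a placeholder threshold, not the paper's cut-off.
subject = dict(age=78, female=True, chf=True, hypertension=True,
               diabetes=True, stroke_tia=True, vascular=True)
print(pm_cha2ds2_vasc(pm25=30.0, pm_threshold=25.0, **subject))
```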
Comparing ML models with the traditional regression model. To estimate the crude accuracies of the ML models for predicting incident AF, we compared the six ML models and the traditional regression analysis model with age, sex, and BMI as input variables (Table 4). The c-indices of the six ML models (SVM, decision tree, random forest, naïve Bayes, deep neural network, and extreme gradient boosting models) Table 4). The extreme gradient boosting model showed the highest c-index for predicting incident AF among these models (Table 4).
Validation by ML models and the application of PM 2.5 for predicting incident AF. We used several ML models and performed analyses for predicting incident AF in our cohort. We used the 27 variables listed in Table 1 as input variables for the ML models and performed training using a discovery cohort of 302,811 subjects including 2444 with incident AF (0.8%) that developed over the 5-year follow-up period (Table 1) Table 5). For the random forest and extreme gradient boosting ML models, which are based on decision trees, the rank of variable importance is determined by the frequency with which a variable is selected as a decision node, whereas SVM uses the sensitivity of generalization error bounds with respect to a variable and neural networks use the overall weighting of the variable within the model 26 . PM 2.5 was also highly ranked and other variables were also selected, as described in Supplementary Figure and Table 5.
After applying the tenfold cross-validation algorithm, the best ML model was the extreme gradient boosting model of the boosted ensemble algorithm with a c-index of 0.845 (0.837-0.853) (Table 2). After adding PM 2.5 as
Table 1. Baseline characteristics of the study population (n = 432,587). AF atrial fibrillation, BMI body mass index (kg/m 2 ), CKD chronic kidney disease (eGFR lower than 60 mL/min estimated by serum creatinine using CKD-EPI formula) 37 , COPD chronic obstructive pulmonary disease, DBP diastolic blood pressure, eGFR estimated glomerular filtration rate (mL/min), HDL high density lipoprotein, LDL low density lipoprotein, MI myocardial infarction, PM 2.5 particulate matter < 2.5 μm in diameter, SBP systolic blood pressure, TIA transient ischemic attack. *Socioeconomic status was divided into two groups: higher (≥ 51% of income level) and lower (< 51% of income level).
3 and Table 2). All NRI and IDI were also positive (Table 2).
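The two reclassification measures reported as positive can be computed from paired predicted risks. The sketch below uses the usual definitions of the category-free NRI and the IDI; the probabilities are invented, and `p_old` and `p_new` stand for a model without and with PM 2.5 respectively.

```python
def category_free_nri(p_old, p_new, outcomes):
    """Net proportion of events whose risk moved up, plus net proportion
    of non-events whose risk moved down (range -2 to 2)."""
    def net_up(idx):
        up = sum(1 for i in idx if p_new[i] > p_old[i])
        down = sum(1 for i in idx if p_new[i] < p_old[i])
        return (up - down) / len(idx)
    events = [i for i, y in enumerate(outcomes) if y == 1]
    nonevents = [i for i, y in enumerate(outcomes) if y == 0]
    return net_up(events) - net_up(nonevents)

def idi(p_old, p_new, outcomes):
    """Mean risk change among events minus mean risk change among non-events."""
    def mean(xs):
        return sum(xs) / len(xs)
    event_gain = mean([p_new[i] - p_old[i] for i, y in enumerate(outcomes) if y == 1])
    nonevent_gain = mean([p_new[i] - p_old[i] for i, y in enumerate(outcomes) if y == 0])
    return event_gain - nonevent_gain

outcomes = [1, 1, 0, 0]
p_old = [0.4, 0.5, 0.3, 0.2]  # hypothetical risks without PM2.5
p_new = [0.6, 0.7, 0.2, 0.2]  # hypothetical risks with PM2.5
nri_value = category_free_nri(p_old, p_new, outcomes)
idi_value = idi(p_old, p_new, outcomes)
print(nri_value, round(idi_value, 3))
```

Positive values of both indices indicate that the new model moves predicted risk in the right direction on balance: upward for subjects who develop AF and downward for those who do not.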

Discussion
There have been few studies on the correlation between PM 2.5 and incident AF, and the clinical significance of PM 2.5 for predicting AF incidence has not been investigated. In this study, we investigated the clinical impact of PM 2.5 on predicting AF incidence by using ML methods in the general population of an Asian country affected by high air pollution levels. ML methods identified the clinically important variables for predicting incident AF, and PM 2.5 was also identified as a highly ranked important variable. With the addition of the PM 2.5 variable, the prediction performance significantly improved with both traditional regression analysis and ML methods. Additionally, based on traditional regression analysis, we constructed scoring systems for predicting incident AF.
Although some studies, including our previous study 8 , have shown relationships between air pollution exposure and AF development in patients with known cardiac diseases 27 , some studies from Western countries did not show a relationship between PM 2.5 exposure and incident AF 11,12 . However, these studies were performed in European countries and the United States, where the air pollution levels were much lower than the levels in Asian countries. Therefore, the effect sizes could be low in those studies. In our nationwide dataset, to facilitate the data-driven analysis for revealing AF risk factors, we used 27 readily available parameters among the general population, and PM 2.5 was identified as a highly ranked variable. Adding information on PM 2.5 exposure to known clinical risk factors can enable a better prediction of incident AF in the general population. Additionally, we attempted to apply this information about PM 2.5 exposure to predicting incident AF in clinical practice by constructing relevant risk scores based on the Korean NAAQS, which might add some information when managing patients with AF risk factors.
Further prospective studies using these new risk scoring systems will be needed to determine whether upstream medical therapy is beneficial for preventing incident AF in the general population.
The adverse health effects related to air pollution have been studied since 1993 28 , even for arrhythmias 29 . One suggested mechanism is the occurrence of myocardial repolarization abnormalities contributing to arrhythmias 30 caused by systemic inflammatory cytokines produced by pulmonary inflammatory responses after inhaling particles 6 . Another suggested mechanism includes alteration of the cardiac autonomic nervous system that occurs with the inhalation of particles mediated by reactive oxygen species 31,32 ; these adverse inhalation effects can be diminished in patients with chronic lung parenchymal diseases 33,34 .
As an advanced computing technology for artificial intelligence, ML is increasingly used in cardiology to meaningfully process data that exceed the capacity of the human brain 35 . Unlike traditional statistical analyses, ML models can accept enormous amounts of data as input variables and can improve prediction performance through a repetitive training process, thus offering prediction models that are more applicable to external datasets 20,36 . This computing technology is widely accessible and can rapidly construct models through an automatic training process; consequently, it can offer better prediction models than traditional models built manually 35 . However, ML is highly data-dependent ("garbage-in, garbage-out") and hard to interpret, and it often develops overfitting problems 35 .
To the best of our knowledge, this is the first cohort study with 1,666,528 person-years of follow-up to assess the prediction performance of long-term PM 2.5 exposure for incident AF. Additionally, long-term PM 2.5 exposure was identified as a highly important variable for predicting incident AF by using ML methods. After adding the PM 2.5 variable to established AF prediction scoring systems, the prediction performances for incident AF significantly improved.
Our study suggests that applying long-term average PM 2.5 measurements in clinical practice could better predict the development of AF in patients. Additionally, ML using boosted ensemble methods can predict incident AF better, with readily available subject characteristics, than traditional regression analysis. The detailed characteristics of the subjects in this study allowed blood pressure measurements, blood test results including fasting glucose and cholesterol profiles, and smoking and alcohol intake habits to be integrated in these analyses.
On the basis of these findings, we constructed scoring systems for predicting incident AF by adding the PM 2.5 variable to existing risk prediction approaches: PM-CHA 2 DS 2 -VASc, PM-CHADS 2 , and PM-HATCH, which showed better prediction performances than the established scoring systems.
Table 4. Performance of predictive models for incident AF risk during follow-up period in overall general population (age, sex, and BMI-adjusted models). AF atrial fibrillation, BMI body mass index, CI confidence interval, IDI integrated discrimination improvement index, NRI category-free net reclassification improvement index. *Age, sex, and BMI were used for constructing these predictive models (age, sex, and BMI were adjusted for traditional regression analysis, and these variables were used as input variables for training the listed machine learning models).
Although our findings from traditional regression analysis and novel ML methods were similar, there are some limitations. Although we set a disease-free baseline period (7 years: 2002-2008) by excluding subjects with a previous AF history, the existence of selection bias cannot be ruled out. However, the diagnostic accuracy of AF defined in this manner was previously validated in our NHIS database 17 . Although we excluded subjects who changed residence within the study period, subjects' air pollutant exposure or specific locations could not be fully reflected during the period. In addition, as our data were from the National Health Insurance administrative claims database, the exact hour of AF development could not be identified. Therefore, we thought that the analysis of the effects of acute exposure might yield somewhat biased results, and further investigations are needed 8 . Although we used previously established risk scoring systems such as CHA 2 DS 2 -VASc 4 , CHADS 2 3 , and HATCH 5 scores, they were not originally designed for predicting incident AF.
However, these scores include important clinical comorbidities that affect AF development. We also assessed the prediction performances of traditional regression analyses and ML methods using readily available clinical risk factors, and both showed better prediction performances with PM 2.5 than without PM 2.5 (Table 2 and Fig. 3). Our prediction models did not adjust for some confounders such as echocardiographic parameters (left atrial size, left ventricular ejection fraction, and ventricular chamber size), other chronic diseases (liver diseases, dementia, and chronic systemic inflammatory diseases), and some exposure confounders (occupational aspects, such as whether subjects usually work inside or outside) that affect AF development or air pollution exposure. As we did not investigate the associations between air pollution and myocardial repolarization or inflammatory markers, the mechanism behind the relationship between exposure to air pollution and AF remains unclear. Although statistical approaches including cross-validation were applied to minimize overfitting when constructing the ML models, and they supported our main results, external validation is needed, especially in Western countries where the air pollution level is low.
Table 2) as input variables, TR2 traditional regression analysis model using 12 clinical variables (adjusted variables were the same as those of TR2 (model 2) in Table 2), XGBM extreme gradient boosting model.

Conclusions
Data-driven approaches suggested long-term exposure to PM 2.5 air pollution as a risk factor robustly associated with incident AF. Such ML models combining readily available clinical characteristics and PM 2.5 measurements were found to predict incident AF better than traditional statistical models or even established risk prediction approaches in the Korean general population exposed to high levels of air pollution.
Further external validation is warranted especially in Western countries affected by low levels of air pollution.