Factors associated with chronic obstructive pulmonary disease exacerbation, based on big data analysis

Preventing exacerbation in chronic obstructive pulmonary disease (COPD) patients is crucial, but requires identification of the exacerbating factors. To date, no integrated analysis of patient-derived and external factors has been reported. To identify factors associated with COPD exacerbation, we collected data, including smoking status, lung function, and COPD assessment test scores, from 594 COPD patients in the Korean COPD subgroup study (KOCOSS), and merged these data with patients’ Korean Health Insurance Review and Assessment Service data for 2007–2012. We also collected primary weather variables, including levels of particulate matter <10 microns in diameter and daily minimum ambient temperature, as well as respiratory virus activities and the logs of web queries on COPD-related issues. We then assessed the associations between these patient-derived and external factors and COPD exacerbations. Univariate analysis showed that patient factors, air pollution, various types of viruses, temperature, and the number of COPD-related web queries were associated with COPD exacerbation. Multivariate analysis revealed that the number of exacerbations in the preceding year, female sex, COPD grade, influenza virus detection rate, and lowest temperature were significantly associated with exacerbation. Our findings may help predict when exacerbations are likely in COPD patients and enable intervention as early as possible.

Chronic obstructive pulmonary disease (COPD) characteristically involves an airflow limitation that is not fully reversible. Its worldwide prevalence is increasing, and the Global Burden of Disease Study has estimated that COPD will be the fourth leading cause of death by 2030 1 . Although pharmacotherapies for COPD have improved, many patients still experience exacerbations of COPD, during which respiratory symptoms worsen acutely, and which determine disease-associated morbidity, mortality, resource burden, and healthcare costs 2 . After an exacerbation, recovery of symptoms and pulmonary function takes several weeks, and the patient's quality of life may be seriously degraded. In a large study of commercially insured COPD patients, the total medical and pharmaceutical costs per patient admitted to an emergency department or as a hospital inpatient were approximately $2,000-$40,000 3 . Hence, it is extremely important to prevent acute exacerbation in COPD patients, but this requires identification of the factors associated with exacerbation.
To date, most research on COPD-exacerbating factors has relied on cohort data. Moreover, most studies have focused on factors inherent to the COPD patients themselves, rather than external factors [4][5][6] . Although external factors, such as air pollution and viral infection, are known to contribute to COPD exacerbation [7][8][9] , few studies have analysed the factors associated with COPD based on cohort as well as external data.

Patient characteristics associated with COPD acute exacerbation during a 5-year follow-up period.
Univariate analysis was used to examine the influence of patient characteristics on COPD acute exacerbations during a 5-year follow-up period, and the results are shown in Supplementary Table S2. Old age, female sex, smoking status, high CAT score, low FEV1, higher grade of COPD, number of exacerbations during the previous year (2007), and number of visits to the ER during the previous year (2007) were all associated with COPD acute exacerbation.
Meteorological factors associated with COPD acute exacerbation during a 5-year follow-up period.
Table 1 shows the influence of environmental factors on COPD acute exacerbation in terms of univariate analysis. Humidity, variation of diurnal temperature, lowest temperature, and cumulative amount of rainfall during the 7 days before an acute exacerbation were associated with acute exacerbations. Figure 1 shows the effect of the lowest temperature 1 day prior to a COPD acute exacerbation. The lowest temperature was also correlated with PM10 and the virus detection rate (as detailed in Supplementary Table S3).

Air pollution factors associated with COPD acute exacerbation during a 5-year follow-up period.
The results of univariate analyses, showing the relationship between air pollution and COPD exacerbation, are shown in Table 2. PM10 levels 1 day before acute exacerbation were associated with acute exacerbation. The monthly mean incidence rate of COPD acute exacerbation followed a trend similar to that of PM10 levels 1 day prior to the acute exacerbation (Fig. 2).
Web query data associated with COPD acute exacerbation during a 5-year follow-up period.
The results of GEE model analysis of the relationship between web search data and COPD exacerbations are shown in Supplementary Table S4. In univariate analysis, the number of logs of online web search queries about COPD, containing the words "flu", "dyspnea", "asthma", "COPD", "acute exacerbation", "emphysema" and "chronic bronchitis" at 1 and 2 weeks prior to exacerbation correlated positively with COPD exacerbation.
Viral factors associated with COPD acute exacerbation during a 5-year follow-up period.
Univariate analysis results indicating the relationship between viral factors and COPD acute exacerbation are shown in Table 3. The detection rates of IFV, hCoV, and hRV correlated positively with COPD acute exacerbation. Figure 3 shows the relationship between the detection rate of viruses and COPD acute exacerbation.
Table 4 shows multivariate associations between potential predictor variables and exacerbations. Female sex, number of exacerbations in the baseline year, COPD grade, and detection rate of IFV 2 weeks before acute exacerbation were significantly positively correlated with COPD acute exacerbation. Multivariate analysis revealed that the number of exacerbations in the preceding year, female sex, COPD grade, IFV detection rate, and lowest temperature were significantly associated with exacerbation.

Discussion
We merged clinical cohort data and 6 years of follow-up claims data with external factors, such as air pollution and viral infection rates, to identify factors that were associated with acute COPD exacerbation. The possible factors associated with exacerbation were female sex, a history of frequent exacerbations in the previous year, a higher COPD grade, the IFV detection rate in the period before an acute exacerbation, and a low lowest temperature before an acute exacerbation (Fig. 4).
Many other predictive studies have focused only on factors inherent to COPD patients [4][5][6]11 , or only on external factors, such as environmental or viral factors 8,9,[12][13][14][15] . In the present study, we integrated factors related to the COPD patients with external potentially contributing factors.
Previous studies on predictors of COPD exacerbation have commonly identified airway obstruction (FEV1% predicted, or FEV1, or GOLD stage), previous exacerbations, age, and smoking as the causes of COPD exacerbation 16 . Similar to previous studies, a higher grade of COPD and the number of exacerbations were identified as significant predictors; older age and smoking status were significant in the present univariate, but not multivariate analyses.
A lower lowest temperature and the viral detection rate were related to COPD exacerbation. COPD exacerbations were more common in the winter months, with colder temperatures, when respiratory viral infections are more prevalent in the community 9,17 . A two-fold increase in the COPD exacerbation rate in winter has previously been reported 18,19 , and a previous nationwide study has shown that cold temperature increased COPD exacerbation 14 , in agreement with the present findings. A previous study reported that viral infection played a role in 48% of COPD exacerbations; based on polymerase chain reaction analyses, rhinovirus infections were the most common, followed by IFVs, accounting for 23% of virus-associated exacerbations 22 . The present univariate analysis implicated various viruses in COPD exacerbation, as in previous studies. However, in multivariate analysis, the correlation between viruses other than IFV and COPD exacerbation was not statistically significant, perhaps due to the confounding effect of temperature, which is a very powerful factor in COPD exacerbation.
In this study, PM10 was identified as a significant predictor in univariate analysis, and the monthly mean trend of PM10 was similar to that of the COPD exacerbation incidence rate. Due to the rapid urbanization of the world population, air pollution has become a major health problem. Outdoor air pollution also seems to be an important environmental trigger for acute exacerbation of COPD 23 . A few studies have reported the relationship between air pollution, such as particulate matter (PM10, PM2.5) and harmful gases (NO2, SO2, and O3), and COPD exacerbation 7,24-26 . The deposition of PM in the respiratory tract depends on the size of the particles, and its insufficient clearance may cause a chronic, low-grade inflammatory response, which is known to cause COPD exacerbations 27 . Currently, there is insufficient evidence that air pollution is a causative factor of COPD; further studies on this topic are therefore recommended. Although multivariate analysis showed that the correlation between PM10 and COPD exacerbation was not statistically significant in this study, this discrepancy could also be attributed to the powerful factor of temperature.
Lower temperature, a higher virus detection rate, and a higher concentration of PM10 were significantly correlated with COPD exacerbation in the present study. This result suggested that cold weather eventually increases viral infection and air pollution. Multivariate analysis showed that COPD exacerbations are not related to viruses other than influenza, and air pollution, but this is assumed to have been due to the temperature factor, which affects viral infection and air pollution, rather than due to their actual irrelevance.
Our study demonstrated that the number of exacerbations during the previous year is a significant predictive factor, which was in accord with the findings of previous studies. Make et al. reported that a history of prior exacerbations and more exacerbations during a previous year were strong predictors for future exacerbations 28 . Müllerová et al. also reported that patients with only one prior moderate exacerbation were at increased risk of future exacerbation events 29 . A large observational cohort study also demonstrated that the single best predictor of exacerbations was a history of previous exacerbations 30 . The findings of these studies were consistent with those reported in the present study.
Our study had several strengths. First, this study analysed both patient factors and external factors. Furthermore, we merged our cohort data with claims data. Cohort data contained detailed and accurate information regarding COPD, such as lung function and CAT score, while claims data included medical reimbursement records for the entire Korean population, and allowed us to obtain the exact date of COPD exacerbation. Using this information, we could precisely match COPD exacerbations with external factors, which included information on the weather, air pollution, and viruses. Moreover, no previous study had analysed COPD patients using nationwide big data or long-term follow-up (5 years) data.
The study had some limitations. First, COPD exacerbation might be under- or overestimated in this study, because the exacerbation data analysed were based on claims data, rather than being acquired via an exacerbation diary or patient interview. Second, since the concentration of PM and the detection rate of viruses vary according to region, consideration should be given to the local distribution of patients. We had, in fact, attempted to adjust for the local distributions of patients based on claims data. It is possible to identify the location of the hospital from the HIRA data; however, this adjustment is not perfect, since the patient may visit a hospital far from home. A further study that considers the regional distribution is therefore required. Third, some of the baseline data were adopted from the cohort data, and there was a time difference between cohort data and HIRA data. Thus, CAT score, pulmonary function test (PFT) results, BMI, and smoking status were actually measured after the follow-up period. Also, the difference in data measurement periods (e.g., daily mean PM10 and weekly viral infection status) should be considered a limitation of this study. Finally, we did not apply weights to any variables in the multivariate analysis. To our knowledge, the methodology developed in this study is a first attempt of its kind, and no reference was available on how to weight variables in the equation. Further studies with larger numbers of patients will be needed.
In conclusion, in this study, we merged factors inherent to COPD patients and external factors, such as environmental, viral, and web search data, to identify factors associated with acute exacerbation of COPD. Univariate analysis showed that patient factors, air pollution, various types of viruses, temperature, and the number of web queries about COPD were associated with COPD exacerbation. We demonstrated that the number of exacerbations in the preceding year, being female, a high grade of COPD, the IFV detection rate, and a low lowest temperature were significantly associated with future exacerbation events, according to multivariate analysis. Since exacerbations can negatively impact health status and disease progression 31,32 , these findings may help to identify the COPD patients who are at risk of exacerbation and to provide intervention as early as possible.

Methods
Study design and data source. We investigated data from 594 COPD patients who were enrolled in the Korean COPD subgroup study (KOCOSS) cohort between December 2011 and March 2014. Patients were eligible if they had been diagnosed with COPD by a pulmonologist, were aged ≥40 years, had post-bronchodilator FEV1/FVC < 0.7, and had respiratory symptoms, such as cough, sputum, or dyspnoea. Detailed information regarding the KOCOSS cohort is described in a previous publication 33 .
We obtained four data items from the KOCOSS cohort, i.e., smoking status, lung function, body mass index (BMI), and COPD assessment test (CAT) scores. We then merged these items with the patients' HIRA data from between 2007 and 2012; the latter data included details on comorbidity, medication, and health care utilization. A moderate exacerbation was defined as an outpatient clinic visit by a COPD patient with an ICD-10 code of COPD (J43.x-J44.x, except J43.0, as the primary diagnosis or up to the fifth secondary diagnosis) at which systemic steroid medication, with or without antibiotics, was prescribed. A severe exacerbation was defined as an emergency room visit or hospital admission by a COPD patient with an ICD-10 code of COPD at which steroid medication, with or without antibiotics, was prescribed. A high COPD grade was arbitrarily defined as applying to patients who (1) received tertiary hospital care and met the above definition of COPD, and (2) regularly used triple inhaler therapy (inhaled corticosteroids [ICS] + long acting beta-2 agonists [LABA] + long acting muscarinic antagonists [LAMA]) or used systemic steroid therapy at least twice per year together with COPD inhaler therapy (LAMA or ICS + LABA) 34 . The HIRA 2007-2012 data were used to analyse utilization of COPD medication, comorbidity, and hospital admission. The 2007 data were used to identify the history of COPD exacerbation over the past year, and the 2008-2012 data were used to analyse the relationship between COPD exacerbation and putative predictive factors.
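The claims-based exacerbation definitions above can be illustrated with a minimal Python sketch. The `Claim` record, its field names, and the setting labels are hypothetical and do not reflect the actual HIRA data layout. The sketch also assumes that "up to the fifth secondary diagnosis" means the primary diagnosis plus five secondary positions, and that the same diagnosis-position rule and the systemic-steroid requirement apply to severe events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    # Hypothetical claim record; real HIRA fields are not described in the text.
    setting: str                 # "outpatient", "emergency", or "inpatient"
    diagnosis_codes: List[str]   # ICD-10 codes, primary diagnosis first
    systemic_steroid: bool       # systemic steroid prescribed at this visit


def is_copd_code(code: str) -> bool:
    """True for J43.x-J44.x, excluding J43.0."""
    c = code.upper().replace(".", "")
    return (c.startswith("J43") or c.startswith("J44")) and c != "J430"


def classify_exacerbation(claim: Claim, max_positions: int = 6) -> Optional[str]:
    """Return "moderate", "severe", or None for a single claim."""
    # Assumption: primary diagnosis plus the first five secondary diagnoses.
    codes = claim.diagnosis_codes[:max_positions]
    if not claim.systemic_steroid or not any(is_copd_code(c) for c in codes):
        return None
    if claim.setting in ("emergency", "inpatient"):
        return "severe"
    if claim.setting == "outpatient":
        return "moderate"
    return None
```

Antibiotic prescriptions are deliberately ignored here, since both definitions apply with or without antibiotics.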
We also collected primary weather variables, including levels of particulate matter <10 microns in diameter (PM10, µg/m³), daily minimum ambient temperature (°C), and daily precipitation for 2007-2012. Daily local meteorological weather data, measured at the local weather stations, were provided by the Korean Ministry of the Environment. All the information amassed was matched to the patients' addresses. Matching was performed at province level. The area of the respective provinces (n = 16) ranged from 501 to 19,031 km² (median: 6,628 km², interquartile range: 827-10,362 km²).
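As an illustration of how daily province-level weather data could be aligned with exacerbation dates, the following Python sketch derives the lagged exposures referred to in the Results (lowest temperature and PM10 one day before an event, and cumulative rainfall over the preceding 7 days). The data structure, key names, and function name are assumptions made for this example, not the procedure actually used in the study.

```python
from datetime import date, timedelta


def lagged_exposures(weather, province, event_date, rain_window=7):
    """Return lagged exposures for one exacerbation date.

    `weather` is a hypothetical dict mapping (province, date) to a dict
    with keys "tmin" (deg C), "pm10" (ug/m3), and "rain" (mm).
    Returns None if any required day is missing.
    """
    previous_day = weather.get((province, event_date - timedelta(days=1)))
    rain_days = [
        weather.get((province, event_date - timedelta(days=k)))
        for k in range(1, rain_window + 1)
    ]
    if previous_day is None or any(day is None for day in rain_days):
        return None
    return {
        "tmin_lag1": previous_day["tmin"],   # lowest temperature 1 day prior
        "pm10_lag1": previous_day["pm10"],   # PM10 1 day prior
        "rain_cum7": sum(day["rain"] for day in rain_days),  # 7-day rainfall
    }
```

Returning `None` for incomplete windows keeps missing measurements from being silently treated as zero rainfall.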
The Korean National Institute of Health (KNIH) monitors the trends of virus activity at the national level. KNIH has surveyed respiratory viral infection status since 2000 and provided information to the public on a weekly basis (detection rate, %). Data on the activities of adenovirus, parainfluenza virus, respiratory syncytial virus, influenza virus (IFV), human coronavirus (hCoV), human rhinovirus (hRV), human bocavirus, and human enterovirus were collected from the KNIH database for the 2007-2012 period.
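Because the virus detection rates are weekly while exacerbations are dated by day, a lagged lookup is needed to relate, for example, the IFV detection rate 2 weeks before an event to that event. The sketch below shows one possible approach in Python. Keying the weekly data by ISO year and week is an assumption, as the week convention used by KNIH is not stated here, and all names are hypothetical.

```python
from datetime import date, timedelta


def iso_week_key(d):
    """Return (ISO year, ISO week number) for a calendar date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def detection_rate_lagged(weekly_rates, virus, event_date, lag_weeks=2):
    """Look up the weekly detection rate (%) `lag_weeks` before an event.

    `weekly_rates` is a hypothetical dict: virus name -> {(year, week): rate}.
    Returns None when no rate is recorded for that virus and week.
    """
    key = iso_week_key(event_date - timedelta(weeks=lag_weeks))
    return weekly_rates.get(virus, {}).get(key)
```

Subtracting whole weeks from the event date before computing the week key handles year boundaries without special cases.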
Additionally, we tracked the trends of online web search queries (normalized search volume index, 0-100) containing the key terms "flu", "dyspnea", "asthma", "COPD", "acute exacerbation", "emphysema", and "chronic bronchitis" on Naver (www.naver.com; the biggest search engine site in Korea) for the period 2007 to 2012.
Statistical analysis. Descriptive statistics were used to characterize the study population (mean and standard deviation [SD], and percent). Independent t-tests were used to compare continuous variables and chi-square tests were used for categorical variable comparisons. Because the longitudinal data for these patients were measured repeatedly, they violate the independence assumption of general statistical techniques. Therefore, the occurrence of acute exacerbation was investigated using generalized estimating equations (GEEs) 35 . Univariate analysis was performed using GEE, relating COPD exacerbation to each factor one at a time. Then, multivariable analysis was performed with all factors included. Factors included in multivariable analysis were time variables, sex, smoking history, CAT score, FEV1(%), prescription of COPD medication, ICD-10 code for comorbidities, number of hospital admissions during the previous year (2007), number of ER visits during the previous year (2007), the history of acute exacerbations during the previous year (2007), grade of COPD, activities of viruses, cumulative exposures to PM10, meteorological factors, and online web search query logs. Subsequently, backward elimination was performed to select the significant variables. Pearson's correlation coefficient and Spearman's correlation coefficient were used to assess the relationships between lowest temperature and other variables. All analyses were performed with SAS software, version 9.4 (SAS Institute, Cary, NC).
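For orientation, the general form of a GEE for a repeatedly measured binary outcome can be written as follows. The notation is introduced here for illustration only. The logit link is an assumption based on the reporting of odds ratios, and the working correlation structure R(α) is left unspecified because it is not stated in the text.

```latex
% Assumed notation: Y_{it} = 1 if patient i has an exacerbation in period t,
% x_{it} = covariate vector (including an intercept term), N = number of patients.
\begin{align}
\mu_{it} &= \mathbb{E}\left[ Y_{it} \mid x_{it} \right], \qquad
\operatorname{logit}(\mu_{it}) = x_{it}^{\top} \beta \\
0 &= \sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr), \qquad
D_i = \frac{\partial \mu_i}{\partial \beta}, \qquad
V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2} \\
\mathrm{OR}_k &= \exp(\beta_k)
\end{align}
% A_i is the diagonal matrix with entries \mu_{it}(1 - \mu_{it});
% \phi is a dispersion parameter; R(\alpha) is the working correlation matrix.
```

Under these assumptions, each odds ratio describes the multiplicative change in the odds of exacerbation per unit increase in the corresponding factor, averaged over the patient population.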