Understanding the role of the chromosome 15q25.1 in COPD through epigenetics and transcriptomics

Chronic obstructive pulmonary disease (COPD) is a major health burden in adults and cigarette smoking is considered the most important environmental risk factor for COPD. The chromosome 15q25.1 locus is associated with both COPD and smoking. Our study aims at understanding the mechanism underlying the association of chromosome 15q25.1 with COPD through epigenetic and transcriptional variation in a population-based setting. To assess whether COPD-associated variants in 15q25.1 are methylation quantitative trait loci, epigenome-wide association analysis of four genetic variants previously associated with COPD (P < 5 × 10−8) in the 15q25.1 locus (rs12914385:C>T-CHRNA3, rs8034191:T>C-HYKK, rs13180:C>T-IREB2 and rs8042238:C>T-IREB2) was performed in the Rotterdam study (n = 1489). All four variants were significantly associated (P < 1.4 × 10−6) with blood DNA methylation of IREB2, CHRNA3 and PSMA4, of which two, IREB2 and PSMA4, were also differentially methylated between COPD cases and controls (P < 0.04). Additive and multiplicative effects of smoking were further evaluated and no significant effect was observed. To evaluate whether these four genetic variants are expression quantitative trait loci, transcriptome-wide association analysis was performed in 1087 lung samples. All four variants were also significantly associated with differential expression of the IREB2 3’UTR in lung tissues (P < 5.4 × 10−95). We conclude that regulatory mechanisms affecting the expression of the IREB2 gene, such as DNA methylation, may explain the association between genetic variants in chromosome 15q25.1 and COPD, largely independent of smoking.


Introduction
Chronic obstructive pulmonary disease (COPD) affects over 300 million people and is the third leading cause of death worldwide, which makes it a major public health burden [1]. COPD is characterised by airflow limitation and chronic, inflammatory response of the airways to cigarette smoke, occupational exposures, air pollution etc. [2]. Systemic inflammation and complications, together with comorbid conditions, add to its complexity [3].
COPD is determined by both genetic and environmental factors. Genetic factors explain 20-40% of the variance in the disease [4], while the most important environmental risk factor is smoking. Smokers have a 10-fold increased incidence of COPD compared to never-smokers [5]. Genome-wide association studies (GWASs) have revealed genetic variants associated with COPD and lung function [6,7]. From a genetic-epidemiological perspective, the chromosome 15q25.1 locus is of interest, harbouring three nicotinic receptor genes (CHRNA3, CHRNA5 and CHRNB4) and other genes that could have a potential impact on COPD, including IREB2, PSMA4 and HYKK [8][9][10][11]. The 15q25.1 region has also been associated with smoking [12,13] and lung cancer [14] in large GWASs. Because smoking is a risk factor for both COPD and lung cancer, the association of this locus with COPD and lung cancer might be mediated through smoking [15], which is in line with the only longitudinal study investigating this hypothesis thus far [16].
The mechanism through which these single-nucleotide polymorphisms (SNPs) and smoking are involved in COPD and related outcomes remains obscure. SNPs in the 15q25.1 locus are shown to be cis-expression quantitative trait loci (cis-eQTLs) in blood [17], brain [18], sputum [19] and lungs [20,21]. This raises the question of whether the variants are involved in COPD through the regulatory mechanisms. DNA methylation is a heritable, dynamic, epigenetic mark that plays a critical role in the regulation of gene expression [22]. Despite having a strong genetic component, DNA methylation is known to respond to changes in environmental factors [23], and its role in mediating genetic risk effect and the interaction with environmental exposure has been widely proposed [24].
Recent studies have shown associations of genome-wide patterns of DNA methylation variation with smoking behaviour [25,26] and COPD [22,27]. Although differentially methylated sites (CpGs) in CHRNA3 (15q25.1) were associated with COPD status and lung cancer [28], the role of DNA methylation as a mechanism through which the genetic variants may be involved in COPD and related outcomes remains unexplored.
In this study we selected all SNPs in the 15q25.1 region associated with COPD in GWASs: rs12914385:C>T (CHRNA3), rs8034191:T>C (HYKK, CHRNA3, CHRNA5), rs13180:C>T (IREB2) and rs8042238:C>T (IREB2) [8][9][10][11]. Rs12914385:C>T is a top hit in the largest GWAS of COPD (odds ratio (OR) = 1.39, P = 2.7 × 10−16) [8]. SNPs rs12914385:C>T and rs8034191:T>C are in moderate linkage disequilibrium (LD; r² = 0.723), while rs13180:C>T and rs8042238:C>T are in near-perfect LD (r² = 0.997). Rs8034191:T>C is estimated to explain 12.2% of COPD risk in the general population and 14.3% in current smokers [11]. Rs13180:C>T is associated with COPD, independently of smoking and of rs8034191:T>C [10,15,21]. Evidence suggests that the association of COPD with rs8034191:T>C may be mediated by smoking, while the association with rs13180:C>T is independent of smoking [21,29]. We examined whether SNPs in the 15q25.1 locus are also associated with differential DNA methylation in the population-based Rotterdam study (RS). Further, we tested whether methylation patterns associated with these variants are associated with COPD and FEV1/FVC (the ratio of the forced expiratory volume in the first second (FEV1) to the forced vital capacity of the lungs (FVC)). We also tested whether the variants are associated with differential expression in non-tumour lung tissue from the Lung eQTL study (LES).

Study population
Participants of the discovery and replication cohorts were part of the RS, a prospective, population-based study designed to investigate the occurrence and determinants of diseases in the elderly, as described elsewhere [30]. The discovery cohort of our epigenetic analysis is a random sample of 723 participants from RS with complete phenotype, genome-wide genotype and methylation data available. An independent sample of 766 participants from RS was included as a replication cohort. RS is part of the Biobanking and Biomolecular Resources Research Infrastructure for The Netherlands (BBMRI-NL), BIOS (Biobank-based Integrative Omics Studies) project [31]. The epigenome-wide association study (EWAS) data of RS were made publicly available as a Rainbow Project (RP3; BIOS) of the BBMRI-NL (data access link: http://wiki.bbmri.nl/wiki/BIOS_bios). Results of this study are available through dbGaP (accession number phs000930, https://www.ncbi.nlm.nih.gov/gap).
Detailed information on spirometry measures, COPD diagnosis, COPD SNP selection, genotyping and DNA methylation assessment in RS and RNA array in LES is provided in the Supplementary information. LES included patients from three participating sites: University of Groningen (GRN), Laval University (Laval) and University of British Columbia (UBC).

Statistical analyses
First, we tested the association of the four selected SNPs with COPD and FEV1/FVC in our discovery and replication cohorts using logistic and linear regression models, respectively, adjusted for age and sex in model 1 and additionally for current smoking and pack-years in model 2. Results from the two cohorts were then meta-analysed using fixed effects models with the "rmeta" package in R. Further, in the RS discovery cohort (n = 723), we performed four EWASs to assess the relationship between dosages of each SNP, as independent variable, and epigenome-wide DNA methylation in blood as dependent variable. We applied linear regression methods using two models. One was adjusted for age, sex, technical covariates to correct for batch effects (array number and position on array), and white blood cell types to correct for the cellular heterogeneity of blood (number of lymphocytes, monocytes and granulocytes) (model 1). The other was adjusted additionally for current smoking and pack-years, the number of cigarette packs smoked in 1 year (model 2). A false discovery rate (FDR) of <0.05 was used to declare epigenome-wide significance. Significant sites were tested in the replication cohort (n = 766) using the same models. Since the 15q25.1 region is also associated with smoking behaviour, significant CpG sites were also tested, per cohort, in a third model including "SNP × current smoking" and "SNP × pack-years" interaction terms to assess possible genetic-environment interaction between the tested SNPs and smoking. Per-cohort results were meta-analysed using fixed effects models with the "rmeta" package in R. Associations of the identified CpG sites with COPD and FEV1/FVC were further tested using logistic and linear regression, respectively, adjusted for age, sex, technical covariates and white blood cell counts in both the discovery and replication cohorts, and meta-analysed as mentioned above.
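The fixed-effects pooling of per-cohort estimates described above can be sketched as an inverse-variance weighted average. The study itself used the "rmeta" package in R; the Python function below is only an illustrative stand-in. The function name, the two example estimates and the normal-approximation P-value are assumptions made for this sketch, not the authors' code or data.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect pooling (illustrative sketch).

    betas: per-cohort effect estimates (e.g. regression coefficients)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided P-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

# Hypothetical discovery and replication estimates, not values from the study
beta, se, z, p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

With equal standard errors the pooled estimate is simply the mean of the two cohort estimates; when one cohort is more precise, its estimate dominates the pooled value.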
Finally, we assessed whether the identified SNPs were acting as eQTLs in lung tissue. We performed genome-wide eQTL analysis in 1087 samples from GRN, UBC and Laval, in contrast to Nguyen et al. [20], who used Laval (N = 420) as the discovery cohort and the UBC and GRN samples for replication. First, cohort-specific (GRN, Laval and UBC) principal components (PCs) explaining at least 1% of the variance were calculated based on residuals from linear regression models on genome-wide 2-log transformed gene expression levels (of each probe separately) adjusted for COPD status, age, sex and smoking status. Second, in each cohort separately, linear regression analysis was used to test for association between the SNPs and genome-wide 2-log transformed gene expression levels. SNPs were tested in an additive genetic model and the models were adjusted for disease status (COPD, alpha-1 antitrypsin deficiency, idiopathic pulmonary fibrosis, pulmonary hypertension, cystic fibrosis and other disease), age, sex, smoking status and the cohort-specific number of PCs (14 PCs for GRN and Laval, and 16 for UBC). Finally, SNP effect estimates of the three cohorts were meta-analysed using fixed effects models with effect estimates weighted by the reciprocal of the estimated variance. We used FDR < 0.05 to correct for multiple testing.
Table 1 notes: Data presented as percent or mean ± SD; for COPD status, which was not available for all participants, the valid percentage is denoted in brackets (percent of all); in the discovery cohort 68 patients and in the replication cohort 82 patients were excluded from the association analyses with COPD due to possible asthma. All: all participants included in EWAS; COPD: chronic obstructive pulmonary disease cases; P-value: P-value of the difference between discovery and replication cohorts. *P-value of the difference of COPD status in discovery and replication cohorts. a Pack-years calculated in current and ex-smokers only.
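The per-cohort eQTL test under an additive genetic model, as described in the statistical analyses, amounts to regressing 2-log expression on the allele dosage (0, 1 or 2 copies). The following is a minimal sketch of that idea only: it fits a single-predictor least-squares line and deliberately omits the covariates (disease status, age, sex, smoking status and PCs) that the study adjusted for. The function name and all data values are hypothetical.

```python
import math

def additive_model_slope(dosages, expression):
    """Least-squares fit of expression on allele dosage (no covariates).

    Returns (slope, intercept, slope_se); the slope is the estimated change
    in expression per additional copy of the tested allele.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the standard error of the slope
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se

# Hypothetical dosages and 2-log expression values, not data from the study
dosages = [0, 1, 2, 0, 1, 2, 1, 0]
log2_expr = [7.9, 8.6, 9.0, 8.1, 8.4, 9.1, 8.5, 8.0]
slope, intercept, slope_se = additive_model_slope(dosages, log2_expr)
```

The slope and its standard error from each cohort are the quantities that an inverse-variance weighted fixed effects meta-analysis would then combine.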

Results
The discovery set comprised 723 participants of RS, with genotype and DNA methylation data from whole blood, including 114 COPD cases and 541 controls (68 excluded due to possible asthma). The replication set comprised 766 independent participants of RS, with genotype and DNA methylation data, including 93 COPD cases and 591 controls (82 excluded due to possible asthma). The characteristics of the discovery and replication cohorts are shown in Table 1. COPD cases were more often male and smokers and had smoked on average more pack-years compared to controls. Three of the four selected SNPs (rs12914385:C>T, rs13180:C>T and rs8042238:C>T) on chromosome 15q25.1 were nominally associated with COPD in RS (n = 1339), while only rs12914385:C>T was nominally associated with COPD in a considerably smaller dataset from LES (n = 512) (Table S1). None of the SNPs were associated with FEV1/FVC (Table S1). To determine whether the four SNPs (rs12914385:C>T, rs8034191:T>C, rs13180:C>T and rs8042238:C>T) in chromosome 15q25.1 are methylation QTLs (meQTLs), we performed an EWAS in RS for each SNP. Significant associations (FDR < 0.05) were detected at 14 unique CpG sites (Fig. 1), 12 sites in cis (within a window of 400 kb) (Fig. 2) and two on other chromosomes (6 and 12) (Table 2, model 1). Of these 14 CpG sites, 10 were significantly replicated in the independent sample from RS at a significance level of P < 0.0019, corresponding to the Bonferroni correction for the number of tests performed in the replication sample (n = 26) (Table 2, model 1). All four SNPs were significantly associated with differential methylation at three CpG sites (cg18825076, cg04882995 and cg04140906) in IREB2, CHRNA3 and PSMA4, respectively (Table 2, Fig. 2). Addition of smoking as a confounder did not change the results (Table 2, model 2), suggesting that the effects of the SNPs on DNA methylation are independent of smoking.
However, a significant genetic-environment interaction between pack-years and rs12914385:C>T (CHRNA3) was observed for DNA methylation levels at two CpG sites: the top hit, cg18825076 (IREB2; P interaction = 5.0 × 10−4), and cg00540400 (ADAMTS7-MORF4L1) (Table S2). The association between rs12914385:C>T (CHRNA3) and cg18825076 and cg00540400 remained significant, albeit slightly weaker (Table S2). However, the direction of the effect of both significant interactions was opposite in the discovery and replication cohorts (Table S3), suggesting that this is likely a false positive finding. Interestingly, these two CpG sites, i.e., cg18825076 and cg00540400, and a third one, cg04140906 in PSMA4, were nominally associated (P < 0.04) with COPD (Table 3, Table S4, model 1). When correcting for smoking (Table 3, Table S4, model 2) the association between cg18825076 and COPD disappeared (P = 0.16), while the association of the other two CpG sites (cg00540400 and cg04140906) became stronger (P < 0.03). None of the CpGs were associated with FEV1/FVC (Table S5).
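The two multiple-testing criteria used in these analyses can be illustrated with a short sketch. The Bonferroni threshold for the 26 replication tests is 0.05/26, which rounds to the reported 0.0019. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg step-up procedure below is an assumption, and the function names and example P-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans, in the input order, marking the P-values
    declared significant at FDR level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * q
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant

# 26 tests were performed in the replication sample
replication_threshold = bonferroni_threshold(0.05, 26)

# Hypothetical P-values, not results from the study
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
```

In this toy example only the two smallest P-values pass the FDR criterion, because 0.039 exceeds its rank-specific bound of 0.03 and no larger P-value meets its own bound.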
Results involving differential DNA methylation at IREB2 are of special interest, as we detected associations both with the COPD risk allele (rs12914385:C>T) and with COPD status. These relationships are illustrated in Fig. 3. We found a higher frequency of the T allele among the COPD cases than among the controls in our dataset (Fig. 3a). Further, the T allele was associated with lower DNA methylation at cg18825076 (Fig. 3b). Lastly, COPD cases showed lower DNA methylation at cg18825076 compared to controls (Fig. 3c).
Finally, in the eQTL analysis in 1087 lung tissue samples from LES, significant association (FDR < 0.05) was observed at 15 expression probe sets both in cis and trans (Table 4). All four SNPs were significantly associated with probe set 100154936_TGI_at (3' untranslated region (UTR) of the IREB2 gene, P < 3.2 × 10−98) (Fig. 4). The T allele of rs12914385:C>T was associated with higher expression of this probe. Trans-eQTL effects were observed in chromosomes 2, 3, 4, 13 and 14 (Table 4). Rs8034191:T>C was associated with gene expression of a FAM13A intron in chromosome 4 (100158626_TGI_at), a very well-known gene involved in COPD [10].
Figure caption fragment: The x-axis shows all genes in the region as well as the position of SNPs of interest. The y-axis shows the negative logarithm of the P-values of the associations of CpGs with the corresponding SNP (distinguished by different colours). In case a CpG associates with more than one SNP, the smaller P-value is taken into account.

Discussion
In the current study we show that four COPD-associated SNPs (rs12914385:C>T, CHRNA3; rs8034191:T>C, HYKK; rs13180:C>T and rs8042238:C>T, IREB2) in the 15q25.1 locus are also blood meQTLs that regulate mostly nearby DNA methylation levels. All variants are associated with differential DNA methylation of the IREB2, CHRNA3 and PSMA4 genes, independently of smoking. We further show that DNA methylation at two sites in the genes IREB2 and PSMA4, together with the site between ADAMTS7 and MORF4L1, is associated with COPD. Finally, we show that all four SNPs are also lung cis- and trans-eQTLs, affecting the expression of several genes including IREB2, PSMA4, CHRNA3, CHRNA5, HYKK, FAM13A, KLC1 and TRIM13. Our results demonstrate that COPD-SNPs shape the epigenetic regulatory landscape in the 15q25.1 locus in blood and lung tissues, and suggest that the genetic risk of these SNPs on COPD might be mediated and/or modified by DNA methylation levels in this region. Overall, our findings put forward the role of DNA methylation in COPD as an important mechanism in the complex regulation of the 15q25.1 locus. Rs12914385:C>T and rs8034191:T>C are in moderate LD (Table S6), and hence, as expected, they yielded multiple overlapping results. Rs13180:C>T and rs8042238:C>T are in near-perfect LD (Table S6) and thus showed almost the same effects. We show that rs12914385:C>T, rs8034191:T>C and rs13180:C>T are associated with COPD in RS, but only rs12914385:C>T in LES, which is expected given the lower sample size. While rs12914385:C>T has not been extensively studied, rs8034191:T>C is a well-studied SNP with regard to COPD and lung function in different populations [32][33][34][35]. It has been shown that rs8034191:T>C is associated with an increased risk of COPD independent of rs13180:C>T and pack-years of smoking [33]. In contrast, the same study shows that the association of rs13180:C>T with severe COPD is possibly driven by moderate LD with rs8034191:T>C (r² = 0.21).
However, a study in a Chinese population shows association of both SNPs with lung function but not with COPD, and an association of rs8034191:T>C with pack-years in COPD cases [35]. A study including 3424 COPD cases and 1872 controls showed that the association of rs8034191:T>C with COPD is 30% mediated by pack-years, and that this mediation increases to 42% when adjusted for rs13180:C>T [15]. In our study we show that rs8034191:T>C yielded more epigenome-wide significant results and that some overlapped with those of rs13180:C>T (Fig. 2). The most interesting finding of this study is that all four SNPs influence the three CpG sites in the IREB2, CHRNA3 and PSMA4 genes. Furthermore, the same site in PSMA4 is also associated with COPD, independent of smoking. The site in IREB2, our top hit, is also associated with COPD, but this association is attenuated after correcting for smoking. This suggests that the four genetic variants may influence COPD susceptibility through changes in DNA methylation of IREB2, PSMA4 and CHRNA3.
We focused on chromosome 15q25.1 region hits from COPD GWASs. However, this region has also been reported in association with smoking by several large smoking genetics consortia [12,13]. They showed that the locus in 15q25.1, represented by rs16969968:G>A, and other SNPs are mostly associated with smoking quantity. Saccone et al. [36] showed no significant association between rs16969968:G>A and COPD in smokers, adjusted for cigarettes per day. This SNP was in high LD with our SNPs, rs12914385:C>T (r² = 0.84) and rs8034191:T>C (r² = 0.93), but in this study we show that the proposed pathway, in which SNPs act as meQTLs, is mainly independent of smoking. Additionally, none of our four SNPs were found to be significant meQTLs in the brain in a study of nicotine dependence [37]. Despite the well-described role of smoking behaviour in shaping the methylome in multiple tissues [25,26], our SNP-DNA methylation associations could not be explained by exposure to smoking, making a mediating effect of smoking improbable. In line with our observations, studies of the role of smoking in DNA methylation have failed to show an effect in this region [25,38-41]. In a recent large smoking EWAS meta-analysis [41], none of our 10 replicated sites were associated with smoking, strengthening our claims. Yet, the failure to detect an association in these analyses does not necessarily represent absence of a true effect; it may rather reflect a lack of statistical power to detect a true interaction. Further larger studies are needed to elucidate this question.
In blood, rs12914385:C>T, rs13180:C>T and rs8042238:C>T have been previously associated with differential expression of IREB2 and PSMA4, while rs8034191:T>C was associated with the expression of IREB2, PSMA4 and CHRNA5 [17]. However, in the present study we examined gene expression in lung tissue and found that all four SNPs were also lung eQTLs for multiple genes in cis, including IREB2, PSMA4, CHRNA3, CHRNA5 and HYKK, as well as for other genes in trans involved in COPD pathogenesis, such as FAM13A (chromosome 4) [10]. Compared to Nguyen et al. [20], who only used a part of this dataset as discovery, we showed that in addition to CHRNA3, CHRNA5 and PSMA4, the four SNPs were also associated with differential expression of IREB2 and HYKK. Again, all SNPs were associated with differential expression of the IREB2 gene, suggesting that the genetic variants are possibly involved in the pathogenesis of the disease through differential DNA methylation and regulation of expression. Based on our results in relation to IREB2, we propose a disease model in which the COPD-risk allele (rs12914385:C>T, CHRNA3) exerts its risk by lowering the DNA methylation level at the IREB2 gene and subsequently increasing its expression in COPD patients. The lower level of IREB2 DNA methylation in blood from COPD cases and the positive effect of the risk allele on gene expression in lungs support this scenario. As we do not have methylation and expression data in the same tissue, we were not able to validate this hypothesis directly, but future integrative studies in lung tissue should elucidate this further.
Table footnote: Model 1 adjusted for age, sex, technical covariates and different white blood cellular proportions; model 2 additionally adjusted for current smoking and pack-years; nominally significant results are shown in bold. β: coefficient estimates from the logistic regression models; P: P-value of the significance.
The IREB2 gene encodes an RNA-binding protein that binds to iron-responsive elements and can regulate the expression of transferrin receptor and ferritin by changing its own protein expression, thereby regulating iron metabolism, which is important in the pathogenesis of lung diseases [42]. IREB2 has been shown to interact with the MYC and MAX genes, which are involved in the regulation of gene transcription through epigenetic changes [43].
The strength of this study is that our findings are based on a large and unique sample of patients who are characterised in depth genomically and epigenetically, and that our findings on the role of the genetic variants in blood are consistent with the changes in the transcriptome in lung. However, there are some limitations to our study. The first and main limitation is the use of blood in the DNA methylation analysis as a proxy for clinically and biologically relevant changes that develop in the lungs. In the absence of lung tissue DNA methylation measurements, blood is the most reasonable surrogate for examination of methylation changes related to COPD and smoking. This is because the disease, apart from affecting lung tissue, also induces systemic changes and has been associated with elevation in markers of systemic inflammation [44]. Furthermore, studies comparing the DNA methylation patterns in multiple tissues confirm that there is a great overlap in patterns, encouraging us to believe that blood is a good surrogate to study differences that occur in lungs [45,46]. The second limitation is the use of a COPD definition based on pre-bronchodilator spirometry. This measure demonstrates the variability of the smooth muscle contraction, while post-bronchodilator spirometry allows us to observe the irreversibility of the airflow limitation, the main characteristic of COPD [47].
Fig. 3 Interplay between genetics, DNA methylation and COPD status at the IREB2 gene. a The frequency of T and C alleles of rs12914385:C>T (y-axis) among COPD cases and controls (x-axis) (N = 1339, P = 0.043). Homozygotes were counted as carrying two and heterozygotes as carrying one copy of a given allele. b Differences in cg18825076 DNA methylation levels (y-axis) between rs12914385:C>T genotypes (x-axis) (N = 1489, P = 1.05 × 10−125). c Differences in cg18825076 DNA methylation levels (y-axis) in COPD cases and controls (x-axis) (N = 1339, P = 0.04).
However, in an attempt to minimise potential misclassification, we identified and excluded all possible asthmatic patients. In addition, some epidemiological studies show that both pre- and post-bronchodilator spirometry predict mortality related to COPD with a similar degree of accuracy [48]. The third limitation of this study is the use of whole lung tissue, which is very heterogeneous, for gene expression analysis, instead of identifying the source cells for our eQTL signals. Finally, we formulated our hypothesis based on results obtained from a cross-sectional study, in which inferences on the directionality of effects are complicated. Future results based on longitudinal studies will help to support the role of DNA methylation and gene expression as regulatory mediators of COPD genetic risk. In addition, further replication in other ethnicities and more diverse studies may corroborate our results. Since genetic variants identified through GWASs usually map to non-coding intergenic and intronic regulatory regions, their functional role is often unclear [49,50]. They are more likely to modulate gene expression through regulatory mechanisms and epigenetic modifications (e.g., DNA methylation), as we show in this study. We do not show a significant association of all SNPs with the disease, but this is expected since our sample size is considerably smaller than that used in the original GWAS. Our study did not aim to prioritise between the four variants in terms of relevance to COPD, but to investigate whether the variants in the 15q25.1 region associated with COPD are involved in regulatory mechanisms. The changes in DNA methylation levels that we observed in blood are small, most likely because we assess differential methylation in a variety of cell types. More substantial changes may be found in lung tissue in future integrative studies.
In summary, we found evidence suggesting that genetic variations underlying IREB2, HYKK and CHRNA3 act as cis meQTLs and eQTLs. They all affect the DNA methylation and expression of IREB2, which also contributes to the risk of having COPD. We did not find evidence that smoking mediates these relationships, although this should be corroborated in larger samples. This finding is compatible with the hypothesis that the genetic variants are involved in the pathogenesis of COPD through differential methylation and regulation of expression. Future integrative studies quantifying both DNA methylation and gene expression in lung tissue, as well as functional studies, are needed to confirm the suggested hypothesis.

Ethics approval and consent to participate
The Rotterdam Study has been approved by the Medical Ethics Committee of the Erasmus MC and by the Ministry of Health, Welfare and Sport of the Netherlands, implementing the Population Studies Act: Rotterdam Study. All participants provided written informed consent to participate in the study and to obtain information from their treating physicians. Patients from LES provided written informed consent and the study was approved by the ethics committees of the Institut universitaire de cardiologie et de pneumologie de Québec and the UBC-Providence Health Care Research Institute Ethics Board for Laval and UBC, respectively. The study protocol was consistent with the Research Code of the University Medical Centre Groningen and Dutch national ethical and professional guidelines.
Fig. 4 3'UTR IREB2 expression plot with regard to the genotypes of the four SNPs. Plots of genotype-specific mean residuals (with 95% confidence intervals) from the linear regression models on gene expression levels adjusted for disease status, age, sex, smoking status and the cohort-specific number of PCs. The association of residuals of the expression levels (probe set 100154936_TGI_at, 3'UTR IREB2) on the y-axis with genotypes of a rs12914385:C>T, b rs8034191:T>C, c rs13180:C>T and d rs8042238:C>T on the x-axis; the risk/tested allele is given in brackets; β is the regression coefficient estimate, P is the P-value of the significance, TT/TC/CC are the different genotypes of the given SNP and the number of carriers is given in brackets.