Main

Measures of pulmonary function, such as FEV1 and FEV1/FVC ratio, are important predictors of population morbidity and mortality1,2,3,4 as well as forming the basis for the diagnosis of COPD. It is well established that pulmonary function is partially genetically determined. Twin studies in European and US populations give heritability estimates for FEV1 as high as 0.77 (refs. 5,6). Longitudinal studies in families suggest that genetic effects are consistent over time7. Genetic determinants of pulmonary function seem to operate, at least in part, independent of disease status (such as asthma) and smoking status8, suggesting that population-based association studies are a viable way to identify key genetic determinants of lung function.

Adequately powered genome-wide association studies (GWAS) using hundreds of thousands of common SNPs can identify loci associated with common diseases and the quantitative traits that underlie them. Collaborative studies achieving sample sizes in excess of 10,000 have been able to identify associations with common genetic variants with typically modest effect sizes (usually <0.1 s.d.)9. In the past year, GWAS have reported association between an intergenic locus at chromosome 4q31 and FEV1/FVC ratio and COPD, but no large-scale collaborative GWAS have yet been undertaken for lung function10,11.

If common SNPs underlying lung function have modest effects, very large sample sizes will be required to identify them. We therefore established the SpiroMeta consortium to facilitate large-scale meta-analysis of GWAS of lung function. Here we report a meta-analysis of GWAS in the SpiroMeta consortium, comprising 20,288 individuals of European ancestry, that tested association between cross-sectional lung function measures and 2.5 million genotyped or imputed SNPs (stage 1). We followed up SNPs drawn from the most significantly associated loci in up to 32,184 individuals by direct genotyping (stage 2a) and using in silico summary association data relating to a further 22,092 individuals (stage 2b). These studies confirm the previously reported association at 4q31 and show that five previously unreported loci are robustly associated with lung function.

Results

Genome-wide association with lung function (stage 1)

We included 14 studies of individuals of European ancestry, with sample sizes totaling 20,288 (Table 1). All individuals had measures of FEV1 and FVC and smoking status recorded. FEV1 and (separately) FEV1/FVC measures were adjusted for age, age², sex, height and ancestry principal components within each study. Genome-wide genotyping was undertaken with a variety of platforms, and standard quality control measures were used (Online Methods and Supplementary Table 1). Genotypes were imputed for 2.5 million autosomal SNPs from HapMap CEU data and tested for association separately for the inverse-normal transformed residuals of FEV1 and FEV1/FVC under an additive genetic model. We carried out meta-analysis of study-specific test statistics using inverse variance weighting. We applied genomic control at the study and meta-analysis levels to avoid inflation of test statistics owing to population structure or relatedness. Test statistic inflation before applying genomic control at the meta-analysis level was modest (λGC = 1.046 for FEV1 and 1.035 for FEV1/FVC). The plots of meta-analysis test statistics against expected values under the null hypothesis showed an excess of extreme values even after exclusion of the previously reported11 4q31 locus near HHIP, indicative of additional loci associated with lung function (Supplementary Fig. 1a,b).

Table 1 Study characteristics

We observed independent regions of association at 17 loci with P < 1 × 10−5 for FEV1 and 23 for FEV1/FVC (Figs. 1a,b and 2), including three regions (4q24 in GSTCD, 4q31 near HHIP and 15q23 in THSD4) that reached P < 5 × 10−8 in the stage 1 GWAS data alone, corresponding to a threshold of P < 0.05 after adjusting for 1 million independent tests12. SNP rs12504628, which was associated with both FEV1/FVC (P = 6.48 × 10−13; Fig. 2c and Table 2) and FEV1 (P = 1.50 × 10−10; Table 3), lies in an intergenic region upstream of HHIP and spanning 300 kb at 4q31 that has been associated with lung function11, COPD11 and height9. Our top SNP rs12504628 was in strong linkage disequilibrium (LD; r2 = 0.97) with the previously reported SNP associated with lung function, rs13147758 (P = 5.30 × 10−10 for FEV1 and P = 1.11 × 10−12 for FEV1/FVC in our data), and with SNPs associated previously with height (rs6854783, r2 = 0.55; rs2055059, r2 = 0.48), suggesting a role in skeletal growth and development. The hedgehog gene family, of which HHIP is a member, encodes signaling molecules involved in regulating lung morphogenesis, suggesting other mechanisms underlying these associations13. This intergenic region also contains multiple ESTs in human fetal lung (UCSC Browser).

Figure 1: Manhattan plots of association results for FEV1 and FEV1/FVC (analysis stage 1).

(a,b) Manhattan plots ordered by chromosome position. SNPs for which −log10P > 5 are indicated in red. The six loci indicated by arrows showed association with FEV1 (a) or FEV1/FVC (b; P < 5 × 10−8) in the meta-analysis of data from stages 1, 2a and 2b.

Figure 2: Regional association plots of six lung function–associated loci.

(a–f) Statistical significance of each SNP on the −log10 scale as a function of chromosome position (NCBI build 36) in the meta-analysis of stage 1 data alone. The sentinel SNP at each locus is shown in blue; the correlations (r2) of each of the surrounding SNPs to the sentinel SNP are shown in the indicated colors. The six loci included are those that showed association with FEV1 or FEV1/FVC (P < 5 × 10−8) in the meta-analysis of data from stages 1, 2a and 2b. The combined P values for all stages are indicated by arrows. The relevant trait (FEV1 or FEV1/FVC ratio) is indicated for each plot. For rs12504628, the plot shows only the association with FEV1/FVC; this SNP was associated (P < 5 × 10−8) with both FEV1 and FEV1/FVC. Fine-scale recombination rate is plotted in blue39. For rs3995090, the combined P value is from stages 1 and 2a only; this SNP had low imputation quality in the CHARGE Consortium data and so was not included in stage 2b.

Table 2 Loci associated with lung function
Table 3 Relation of SNPs at genome-wide significant loci to FEV1, FVC and FEV1/FVC, and impact of adjustment for smoking in stage 1 (SpiroMeta GWAS) data

Follow-up analyses (stage 2)

To validate potential associations with lung function, we selected 10 SNPs for further genotyping in additional studies of European ancestry (stage 2a, 32,184 individuals; Supplementary Table 2) and 30 SNPs for in silico follow-up (stage 2b; Supplementary Table 3). We obtained the in silico association results from the Health 2000 study (883 individuals) and from the CHARGE Consortium (21,209 individuals). Meta-analysis of the association results across stages 1, 2a and 2b showed five novel loci reaching genome-wide significance (P < 5 × 10−8): 2q35 in TNS1, 4q24 in GSTCD, 5q33 in HTR4, 6p21 in AGER and 15q23 in THSD4 (Table 2 and Fig. 2). A further locus, 6p21 in DAAM2, which was not selected for further genotyping follow-up in stage 2a, fell just below the threshold for genome-wide significance for association with FEV1/FVC after meta-analysis across stages 1 and 2b (rs2395730, P = 7.98 × 10−8; Supplementary Table 3 and Table 2).

The strongest association with FEV1 was at 4q24 in GSTCD (rs10516526, P = 2.18 × 10−23; Table 2 and Fig. 2b). Relatively little is known about GSTCD, but the presence of the C-terminal α-helical domain common to the glutathione S-transferase (GST) family of enzymes suggests this protein is involved in cellular detoxification by catalyzing conjugation of glutathione to products of oxidative stress14. GST enzymes also show glutathione peroxidase activity regulating the synthesis of prostaglandins and leukotrienes14. To explore the potential function of GSTCD, we conducted a protein homology search and identified homology with chloride intracellular channels 1, 3, 4, 5 and 6, suggesting a role for GSTCD beyond the GST enzyme family. Genes in the region also include INTS12 and NPNT. INTS12 associates with RNA polymerase II and mediates 3′-end processing of small nuclear RNA15.

The second locus associated with FEV1 was at 2q35, localized to the TNS1 gene (nonsynonymous coding SNP rs2571445, P = 1.11 × 10−12; Table 2 and Fig. 2a). The protein this encodes, tensin-1, is an actin-binding protein that contains Src homology 2 domains, suggesting a role in linking cytoskeletal changes with signal transduction16. Tensin-1 may be functionally involved in cell migration17.

Multiple genes potentially underlie the third locus associated with FEV1 at 5q33. The most strongly associated SNPs in this region, rs3995090 and rs6889822 (P = 4.29 × 10−9 and P = 8.17 × 10−9; Table 2 and Fig. 2d), are located in an intron in HTR4 and are part of a cluster of associated SNPs also spanning a SPINK5-like gene, SPINK7, SPINK9 and FBXO38. HTR4, which encodes 5-hydroxytryptamine receptor-4, is expressed in neurons in the respiratory pre-Bötzinger complex. Activation of this G protein–coupled receptor protects spontaneous respiratory activity18. Notably, selective antagonism of HTR4 in human bronchial strips in vitro attenuates the facilitation of electric field–stimulated cholinergic contraction by 5-hydroxytryptamine, suggesting a role for HTR4 in mediating airway caliber19. HTR4 expression has recently been confirmed in airway epithelial type II cells, where receptor stimulation seems to regulate cytokine responses20. The SPINK family of serine protease inhibitors may have a role in antimicrobial protection of mucous epithelia21. F-box protein-38 (encoded by FBXO38) is a member of a family of proteins that are believed to mediate protein-protein interactions and protein degradation22.

The strongest association with FEV1/FVC was at 6p21, a gene-rich region of the major histocompatibility complex (MHC). The extended LD in this region of the MHC prevented accurate localization of the association signal. However, we observed the peak of association for a nonsynonymous coding SNP in AGER (rs2070600, P = 3.07 × 10−15; Table 2 and Fig. 2e), which is a plausible candidate for causal association. AGER, also known as RAGE, is a multiligand receptor of the immunoglobulin superfamily23. AGER is highly expressed in the lung, in particular alveolar epithelial cells24, with a potential role in epithelium–extracellular matrix interactions. Reduced AGER expression has been identified in individuals with idiopathic pulmonary fibrosis25, and Ager−/− mice develop age-related pulmonary fibrosis26. Another candidate in this region is the nearby gene NOTCH4, a member of the family of transmembrane receptors involved in cell fate decisions27. Notch4 is expressed in endothelial cells of the adult mouse lung, where it is believed to regulate angiogenesis28.

The second identified association with FEV1/FVC was at 15q23, encompassing the THSD4 gene (rs12899618, P = 7.24 × 10−15; Table 2 and Fig. 2f). THSD4 shows homology with members of the thrombospondin family of extracellular calcium-binding proteins that modulate cellular attachment, proliferation and migration and have been implicated in wound healing, inflammation and angiogenesis29.

For each of the loci we reported, the estimated effect sizes were broadly consistent across the GWAS (Fig. 3).

Figure 3: Forest plots of the stage 1 meta-analysis for the six lung function–associated loci.

Each of the SNPs included in the figure showed genome-wide significant association (P < 5 × 10−8) with either FEV1 or FEV1/FVC in the data from stages 1, 2a and 2b. The plots show the meta-analysis of the stage 1 data for each sentinel SNP. The contributing effect (transformed beta) from each study is shown by a square, with confidence intervals indicated by horizontal lines. The contributing weight of each study to the meta-analysis is indicated by the size of the square. The combined meta-analysis estimate in the stage 1 data is shown at the bottom of each graph.

Association of variants with FVC

We tested the top SNP at each of the loci showing genome-wide significant association (P < 5 × 10−8) with FEV1 or FEV1/FVC for association with the other of the two traits, and with FVC in the stage 1 studies (Table 3). In addition to being associated with FEV1, rs10516526 in GSTCD was associated with FVC (P = 2.53 × 10−7) but showed no discernible effect on FEV1/FVC.

Effect of smoking on SNP associations

Adjustment for ever-smoking status in the stage 1 data (Table 3) did not show materially different effect-size estimates for the associations with the sentinel SNPs in TNS1, GSTCD, HTR4, AGER, THSD4 or HHIP. Similarly, adjustments for a quantitative measure of lifetime smoking exposure (pack-years) did not show substantially different effect-size estimates for the identified SNP associations (data not shown). We also tested the associations of the top SNPs in TNS1, GSTCD, HTR4, AGER and THSD4 separately in ever-smokers and never-smokers (Supplementary Table 4); all P values were >0.05 for tests of interaction between smoking status and these SNPs on lung function.

Gene expression

We determined the mRNA expression profiles of GSTCD, HHIP, THSD4, TNS1, HTR4, AGER and NOTCH4 in human lung tissue and a series of primary cells. We detected all transcripts in lung tissue (Supplementary Fig. 2a) and bronchial epithelial cells (Supplementary Fig. 2b); six transcripts (excluding NOTCH4) were present in human airway smooth muscle cells. We also detected GSTCD, TNS1, HTR4, AGER and NOTCH4 transcripts in peripheral blood mononuclear cells (Supplementary Fig. 2b). For AGER, we noted the presence of two PCR products suggesting an unreported splice variant; we confirmed the presence of the splice variant by sequencing.

Discussion

Our study reports a meta-analysis of GWAS results from 20,288 participants and follow-up analyses in 54,276 participants, identifying five novel genome-wide significant loci for pulmonary function. The regions identified were 4q24 (GSTCD), 2q35 (TNS1) and 5q33 (HTR4) for FEV1, and 6p21 (AGER) and 15q23 (THSD4) for FEV1/FVC. In addition, we identified a region suggestive of association with FEV1/FVC at 6p21 in DAAM2. The companion manuscript from the CHARGE Consortium, which reports a GWAS of lung function in 20,890 participants, also identifies genome-wide significant associations at GSTCD, HTR4 and AGER30. Both SpiroMeta and CHARGE confirmed the previously reported association between FEV1 and FEV1/FVC and the 4q31 locus upstream of HHIP11.

Our findings have several important implications. First, the loci identified were observed in the whole population studied and were not specific to smokers. The presence of genetic determinants of lung function that do not depend on prior smoking exposure has been suggested by previous studies of heritability8. This does not rule out a possible subset of genetic determinants with effects on lung function that are partially or wholly dependent on smoking exposure.

We have also attempted to address the issue of genetic factors that influence smoking behavior. We did not observe any association with the CHRNA3-CHRNA5-CHRNB4 locus previously reported to be associated with cigarette smoke exposure, lung cancer, peripheral arterial disease31 and COPD10 (rs1051730, P = 0.23 for FEV1 and 0.56 for FEV1/FVC). The associations we show in GSTCD, TNS1, HTR4, THSD4 and AGER do not seem to be attenuated by qualitative or quantitative adjustment for smoking exposure. None of these loci have been implicated in published GWAS of smoking quantity, although a recent report suggested a role for THSD4 variants in smoking cessation32.

SNPs showing association with height could also show association with lung function measures because of incomplete adjustment for height, or because of SNP effects on skeletal growth with consequences for both height and lung function. The 4q31 locus near HHIP has shown convincing association with height33. An association was recently reported between height and rs185819 at 6p21 (ref. 34). Although this association signal was broad, reflecting the extended LD across this region of the MHC, rs185819 was in weak LD (r2 = 0.069) with rs2070600 (the sentinel SNP we reported for FEV1/FVC in AGER). These findings leave open the possibility of shared genetic determinants of growth of pulmonary function and height, but they do not suggest that our findings are primarily accounted for by inadequate adjustment for height.

The level of FEV1 at a given time point in an individual depends on two potentially independent processes: the maximum lung function obtained during development, and the rate of decline of lung function with age. Lung function reaches a maximum by age 25–35 years35. The populations studied in SpiroMeta cover a wide range of ages except the very elderly; as expected, FEV1 and FVC values were much lower in children. At least for the loci we identify, there was little evidence for age-specific effects, suggesting that the genetic risk factors identified operate across the age ranges; these findings again are in keeping with those of previous epidemiological studies7. Our analyses were based on cross-sectional measures of lung function; additional studies in cohorts with longitudinal data will be needed to identify determinants of the gradients of development and decline in lung function with age.

The magnitude of the estimated effect on untransformed FEV1 of rs10516526 in GSTCD was 52 ml per copy of the G allele (frequency, 0.06). This is equivalent to about 3 years of FEV1 decline in the nonsmoking population35. Allelic effect sizes on FEV1 of the more common variants (minor allele frequencies of approximately 0.4) were 19–23 ml for rs3995090 in HTR4 and rs2571445 in TNS1. Individually, the five loci we describe account for a small proportion (0.07%–0.14%) of the variance in FEV1 and in FEV1/FVC (Table 2) in the general population.
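As a rough illustration of how a per-SNP proportion of variance can be approximated, the sketch below uses the common additive-model formula 2p(1 − p)β², where p is the allele frequency and β is the per-allele effect in s.d. units. This assumes Hardy-Weinberg equilibrium and a standardized trait; it is not necessarily the calculation used for Table 2, and the effect size in the example is hypothetical.

```python
def variance_explained(allele_freq: float, beta_sd: float) -> float:
    """Approximate fraction of trait variance explained by one biallelic SNP.

    Assumes an additive model, Hardy-Weinberg equilibrium and a trait
    scaled to unit variance, so that the genotype variance is 2p(1 - p).
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2


# Hypothetical inputs: allele frequency 0.06 and an effect of 0.10 s.d.
example_fraction = variance_explained(0.06, 0.10)
```

Under these assumed inputs the fraction is about 0.11%, which illustrates why an allele with a comparatively large per-copy effect can still explain little population variance when it is infrequent.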

After exclusion of the locus near HHIP and the five reported regions, meta-analysis test statistics still showed an excess of extreme values compared with expected values under the null, particularly for FEV1. Although we cannot rule out the possibility of residual population stratification, this indicates the potential to detect further loci associated with lung function (Supplementary Fig. 1a,b). We have provided a list of the top 2000 associations for FEV1 and for FEV1/FVC (Supplementary Table 5) as a resource to other investigators.

We imputed nongenotyped SNPs using two software implementations36,37 that share similar underlying population genetic models38. This methodology facilitates meta-analysis across different marker sets and improves coverage across the genome, and its utility has been empirically shown in several large GWAS. However, the power to detect associations with rare alleles is limited. The loci we report include two relatively infrequent SNPs, GSTCD (rs10516526, minor allele frequency 0.06) and AGER (rs2070600, minor allele frequency 0.05); these SNPs were directly genotyped in the majority of stage 1 subjects (16,514 and 15,386 individuals, respectively).

The associations we report relate to the general population but were of comparable magnitude after the exclusion of documented cases of asthma or COPD (data not shown). Although pulmonary function is an important predictor of morbidity and mortality per se, it will be important to investigate, in appropriately powered studies, whether the risk alleles in the genes identified in this study act as independent susceptibility markers for COPD or influence the development of airway obstruction in other diseases, such as asthma.

Our expression profiling studies identified expression of all of the candidate genes in relevant tissues. Further work is required to elucidate the mechanisms underlying the novel association signals we describe. In broad terms, however, it is notable that the most probable candidate genes in the regions identified seem to be involved either in developmental pathways important for lung growth or in tissue remodeling pathways that might be expected to alter airway architecture.

In conclusion, the results presented here from the SpiroMeta consortium, together with those reported by the CHARGE Consortium30, provide strong evidence for newly identified genetic loci that act as important determinants of pulmonary function.

Methods

Study design.

The study consisted of two stages. In stage 1, a meta-analysis was conducted on directly genotyped and imputed SNPs from 14 studies of individuals of European ancestry, with a total sample size of 20,288. Details of these studies are given in Table 1. This meta-analysis provided loci for further genotyping in up to 32,184 individuals of European origin (stage 2a) and in silico comparisons in 22,092 individuals of European origin (stage 2b).

Stage 1 samples.

The SpiroMeta consortium consists of 14 GWAS: ALSPAC, B58C-T1DGC, B58C-WTCCC, EPIC (obese and population-based substudies), the EUROSPAN studies (Korcula, NSPHS, ORCADES and Vis), FTC (incorporating the FinnTwin16 and Finnish Twin Study on Aging), KORA S3, NFBC1966, SHIP and TwinsUK (see Table 1 for definitions of acronyms). The primary analyses on FEV1 and FEV1/FVC included 20,288 individuals of European descent. The measurements of FEV1 and FVC are described in the Supplementary Note.

Genome-wide genotyping and quality control.

The platforms used were Affymetrix 500K GeneChip array (four studies), Illumina HumanHap 550 Beadchip (one study), Illumina 317K (four studies), Affymetrix Genome-Wide SNP6.0 (one study), Illumina Hap370cnv (one study), Illumina Hap300 v1 (one study) and Illumina Hap300 v2 (two studies). Each individual study applied quality-control criteria as described in Supplementary Table 1.

Imputation.

Imputation of nongenotyped SNPs was undertaken with MACH36 or IMPUTE37 with preimputation filters and parameters as shown in Supplementary Table 1. SNPs were excluded if the imputation information, assessed using r2.hat (MACH) or .info (IMPUTE), was <0.3. In total, 2,705,257 autosomal SNPs were analyzed.
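The exclusion rule above can be sketched as a simple filter. The record type, field names and SNP identifiers here are hypothetical; only the 0.3 threshold and the two quality metrics (r2.hat from MACH, .info from IMPUTE) come from the text.

```python
from typing import List, NamedTuple


class ImputedSnp(NamedTuple):
    """Hypothetical record for one imputed SNP in one study."""
    rsid: str
    quality: float  # r2.hat (MACH) or .info (IMPUTE)


MIN_IMPUTATION_QUALITY = 0.3  # SNPs with quality below this are excluded


def filter_snps(snps: List[ImputedSnp],
                threshold: float = MIN_IMPUTATION_QUALITY) -> List[ImputedSnp]:
    """Keep SNPs whose imputation quality is not below the threshold."""
    return [snp for snp in snps if snp.quality >= threshold]


# Placeholder identifiers, not real SNPs.
kept = filter_snps([
    ImputedSnp("snp_a", 0.29),
    ImputedSnp("snp_b", 0.30),
    ImputedSnp("snp_c", 0.95),
])
```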

Transformation of data and genotype-phenotype association analysis.

FEV1 (milliliters) and FEV1/FVC (percentage) were each regressed on age, age², sex, height and ancestry principal components using linear regression. The residuals were transformed to ranks and subsequently to normally distributed z scores, and were then used as the phenotype for association testing under an additive genetic model using software specified in Supplementary Table 1. Appropriate tests for association in related individuals were applied where necessary, as described in the Supplementary Note.
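A minimal sketch of the rank-based inverse-normal transformation of residuals follows. The text does not state which rank offset was used or how ties were handled; this example assumes the probability (rank − 0.5)/n and assigns tied values their average rank.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_normal_transform(residuals: Sequence[float],
                             offset: float = 0.5) -> List[float]:
    """Map residuals to ranks, then to standard-normal z scores.

    The offset of 0.5 is an assumption; other conventions exist.
    """
    n = len(residuals)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    standard_normal = NormalDist()
    denominator = n - 2.0 * offset + 1.0  # equals n when offset is 0.5
    return [standard_normal.inv_cdf((r - offset) / denominator) for r in ranks]


# Hypothetical residuals from the covariate regression.
z_scores = inverse_normal_transform([3.2, -1.0, 0.4])
```

The transformed values depend only on the ordering of the residuals, which makes the phenotype robust to outliers and comparable across studies with different spirometry distributions.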

Meta-analysis of stage 1 data.

All stage 1 study effect estimates were corrected using genomic control40 and were oriented to the forward strand of the NCBI build 36 reference sequence of the human genome, consistently using the alphabetically higher allele as the coded allele. Study-specific lambda estimates are shown in Supplementary Table 1. The pooled effect-size estimate and s.e.m. were computed using inverse variance weighting, and genomic control was applied to the pooled effect-size estimates. To describe the effect of imperfect imputation on power, we report 'N effective', the sum of the study-specific products of the sample size and the imputation quality metric. Meta-analysis statistics and figures were produced using R version 2.7.0.
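The steps in this paragraph can be sketched as follows. The analyses were run in R; the Python below is only an illustration with hypothetical function names. It assumes that λ is estimated as the median association chi-square divided by the median of a 1-d.f. chi-square distribution, that the correction inflates standard errors by √λ, and that no correction is applied when λ < 1.

```python
import math
from statistics import median
from typing import Sequence, Tuple

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def genomic_control_lambda(betas: Sequence[float],
                           ses: Sequence[float]) -> float:
    """Estimate lambda from the median of the per-SNP (beta/se)^2 statistics."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN


def apply_genomic_control(se: float, lam: float) -> float:
    """Inflate a standard error by sqrt(lambda); assumed unchanged if lambda < 1."""
    return se * math.sqrt(max(lam, 1.0))


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float]:
    """Pooled effect and standard error with weights 1/se^2 (fixed effects)."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


def n_effective(sample_sizes: Sequence[int],
                imputation_qualities: Sequence[float]) -> float:
    """Sum over studies of sample size times imputation quality metric."""
    return sum(n * q for n, q in zip(sample_sizes, imputation_qualities))


# Hypothetical per-study estimates for a single SNP.
pooled_beta, pooled_se = inverse_variance_meta([0.10, 0.20], [0.10, 0.10])
```

In this scheme, genomic control would be applied to each study's standard errors before pooling, and then again to the pooled standard error using λ estimated from the meta-analysis statistics.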

Selection of SNPs for stage 2.

Ten leading SNPs were selected for stage 2a genotyping follow-up (Supplementary Table 2). Thirty leading SNPs were selected for stage 2b in silico exchange, according to P value (under the threshold of 5 × 10−5), N effective (≥70% of the total sample size) and evidence from supporting SNPs (Supplementary Table 3).
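The two quantitative selection criteria can be expressed as a filter, sketched below with a hypothetical record type and placeholder SNP names. The third criterion, evidence from supporting SNPs, requires inspection of the regional association signal and is not modeled here.

```python
from typing import List, NamedTuple


class Stage1Result(NamedTuple):
    """Hypothetical summary of one SNP from the stage 1 meta-analysis."""
    rsid: str
    p_value: float
    n_effective: float


def eligible_for_follow_up(result: Stage1Result, total_n: int,
                           p_threshold: float = 5e-5,
                           min_n_eff_fraction: float = 0.70) -> bool:
    """True if P is under the threshold and N effective is at least 70% of N."""
    return (result.p_value < p_threshold
            and result.n_effective >= min_n_eff_fraction * total_n)


def rank_candidates(results: List[Stage1Result],
                    total_n: int) -> List[Stage1Result]:
    """Eligible SNPs ordered from smallest to largest P value."""
    eligible = [r for r in results if eligible_for_follow_up(r, total_n)]
    return sorted(eligible, key=lambda r: r.p_value)


# Placeholder results; 20,288 is the stage 1 sample size reported in the text.
candidates = rank_candidates([
    Stage1Result("snp_a", 3e-6, 15000.0),
    Stage1Result("snp_b", 1e-9, 19000.0),
    Stage1Result("snp_c", 2e-7, 12000.0),  # fails the N effective criterion
    Stage1Result("snp_d", 1e-3, 20000.0),  # fails the P value criterion
], total_n=20288)
```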

Stage 2a samples (follow-up genotyped data).

We genotyped 10 SNPs in up to 32,184 individuals from the ADONIX, BHS, BRHS, BWHHS, Gedling, GS:SFHS, HCS, KORA F4, NFBC1986, Nottingham Smokers and NSHD studies. The characteristics of the studies are summarized in Table 1, and stage 2a study information is provided in the Supplementary Note.

Stage 2b samples (in silico data).

The CHARGE Consortium includes four population-based studies with data on FEV1 and FEV1/FVC: the Atherosclerosis Risk in Communities (ARIC) study, the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS) and the Rotterdam Study (RS). Details are provided in the companion paper in this issue from the CHARGE Consortium30. Given differences between the analysis approaches for GWAS adopted by the SpiroMeta and CHARGE consortia, the CHARGE analyses were undertaken using the analysis approach adopted by the SpiroMeta consortium (21,209 individuals; larger than the sample in the companion paper, which excluded subjects with missing or incomplete pack-years data). We also included 883 population-based subjects from the Health 2000 study in the stage 2b analysis.

Combined analysis of stage 1 and 2 samples.

Meta-analysis of data from stages 1, 2a and 2b was conducted using inverse variance weighting. We described associations as genome-wide significant if P < 5 × 10−8.
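As an illustration of the significance criterion, the sketch below converts a pooled effect estimate and its standard error into a two-sided P value and compares it with 5 × 10−8, which corresponds to 0.05 divided by 1 million independent tests. The use of a normal (Wald) approximation is an assumption, and the function names are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # 0.05 adjusted for 1 million independent tests


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided P value for beta/se under a standard normal null."""
    z = abs(beta / se)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)); erfc stays accurate
    # in the extreme tail where 1 - Phi(z) would lose precision.
    return math.erfc(z / math.sqrt(2.0))


def is_genome_wide_significant(beta: float, se: float) -> bool:
    """True if the two-sided P value is below the genome-wide threshold."""
    return two_sided_p(beta, se) < GENOME_WIDE_ALPHA


# Hypothetical pooled estimates with z = 6 and z = 5, respectively.
strong_signal = is_genome_wide_significant(0.06, 0.01)
weaker_signal = is_genome_wide_significant(0.05, 0.01)
```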

Secondary analyses.

To examine the effect of smoking on the causal pathway between the SNPs and the traits of interest, an adjustment for smoking was applied. The subgroups of 'ever-smokers' and 'never-smokers' were analyzed separately, and the stratum-specific estimated effects were combined within each individual study using inverse variance weights before meta-analyzing over studies. The analyses were then repeated with additional adjustment for pack-years among the ever-smokers for whom these data were available.
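The two-step combination described here can be sketched as below: stratum-specific estimates are pooled within each study, and the study-level results are then pooled across studies. The data layout and study names are hypothetical, and genomic control is omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

Estimate = Tuple[float, float]  # (beta, standard error)


def pool(estimates: List[Estimate]) -> Estimate:
    """Inverse-variance pooled effect and standard error."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def smoking_stratified_meta(studies: Dict[str, Dict[str, Estimate]]) -> Estimate:
    """Pool ever- and never-smoker estimates within each study, then across studies."""
    per_study = [pool(list(strata.values())) for strata in studies.values()]
    return pool(per_study)


# Hypothetical stratum-specific estimates for one SNP in two studies.
example_studies = {
    "study_a": {"ever": (0.10, 0.10), "never": (0.30, 0.10)},
    "study_b": {"ever": (0.20, 0.20), "never": (0.20, 0.20)},
}
adjusted_beta, adjusted_se = smoking_stratified_meta(example_studies)
```

Because inverse variance weights are additive, pooling in two steps gives the same fixed-effects estimate as pooling all strata at once; the two-step form simply mirrors the order of operations described in the text.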

PCR expression profiling.

The mRNA expression profiles of GSTCD, HHIP, THSD4, TNS1, HTR4, AGER and NOTCH4 were determined in human lung tissue and primary cell samples using RT-PCR, including RNA from lung (Ambion/ABI), brain, airway smooth muscle cells41 and human bronchial epithelial cells (Clonetics42). Peripheral blood mononuclear cells were isolated from whole blood using 6% (w/v) dextran and 42%–51% (v/v) Percoll gradients (Sigma). Ethical approval for the use of primary cells was obtained from the local ethics committees. Total RNA was extracted from samples using an RNeasy kit (Qiagen) as directed by the manufacturer. cDNA was generated from 1 μg of RNA template using random hexamers and a SuperScript kit (Invitrogen) as directed by the manufacturer. PCR assays were designed to cross intron-exon boundaries and, where splice variation was known, to detect all variants. Primer sequences are given in Supplementary Table 6. All PCR was done using Platinum Taq High Fidelity (Invitrogen) with 100 ng of cDNA template in a 25-μl reaction. Cycling conditions were as follows: 94 °C for 3 min, followed by 35 cycles of 94 °C for 45 s, 55 °C for 30 s and 72 °C for 90 s.

URLs.

UCSC browser, http://genome.ucsc.edu/.