Main

Asthma is a complex disease affecting hundreds of millions of people worldwide. The prevalence of asthma varies across populations and ancestral origins; for example, in the US, the prevalence ranges from 3.9% in Mexican Americans to 12.5% in African Americans1. The contribution of genetic factors to asthma risk has been demonstrated in family studies, in which heritability estimates range from 25% to 80% (ref. 2). The high variability in prevalence and heritability estimates reflects the roles of environmental exposure in the disease risk and phenotypic heterogeneity that are hallmarks of asthma. These features may explain why genome-wide association studies (GWAS) have identified a smaller number of asthma loci than have been found in similarly sized studies of other multifactorial diseases3. Indeed, at the time of analysis, only 21 loci have been associated with asthma per se in 20 studies, and these loci explain only part of the genetic risk. Although an exome-array study has shown no evidence of low-frequency or rare variants with large effects on asthma risk4, the role of rare noncoding variants in asthma remains unknown. Future studies based on whole-genome sequencing may clarify the respective influence of common and rare variants on asthma risk. To generate larger sample sizes for GWAS meta-analysis of asthma and thereby enable the discovery of new common risk loci, we established the Trans-National Asthma Genetic Consortium (TAGC), comprising worldwide groups of investigators, which has analyzed genome-wide data available in 142,000 individuals of diverse ancestries. We constructed a comprehensive catalog of asthma risk variants that are robust across populations and environmental-exposure conditions. By combining TAGC meta-analysis results with data from existing databases, we assessed the genetic architecture of asthma risk alleles with respect to functional effects and shared effects with other diseases.

Results

Meta-analysis of asthma GWAS

We combined data from asthma GWAS with high-density genotyped and imputed SNP data (2.83 million SNPs) in the following populations: European ancestry (19,954 asthma cases, 107,715 controls), African ancestry (2,149 asthma cases, 6,055 controls), Japanese ancestry (1,239 asthma cases, 3,976 controls), and Latino ancestry (606 asthma cases, 792 controls) (Supplementary Table 1). After extensive quality control of summary data provided by each participating group (Methods, Supplementary Note and Supplementary Table 2), we conducted ancestry-specific meta-analyses, then performed a multiancestry meta-analysis of all populations (23,948 asthma cases, 118,538 controls) to identify additional loci with panancestry effects. Because childhood-onset asthma may be distinct from later-onset asthma5 and may represent a more homogeneous subgroup, we also performed analyses on the pediatric subgroup (asthma onset ≤16 years; 8,976 asthma cases, 18,399 controls). Meta-analyses of SNP effect sizes obtained from each asthma GWAS were performed with fixed-effects (significance of the combined SNP effect size summarized in P fixed) and random-effects (P random) models (Methods), and a conventional P random (or P fixed) threshold of 5 × 10−8 was used to define genome-wide significance. The results were consistent between methods for detecting loci with at least one SNP significantly associated with asthma. We therefore present the results from the random-effects analysis for the European-ancestry and multiancestry meta-analyses, which included the largest number of studies and allowed for an accurate estimate of the between-study variance, and the results from the fixed-effects analysis for the African-ancestry, Japanese-ancestry, and Latino-ancestry meta-analyses. 
We observed little evidence of inflation in the test statistics in either the ancestry-specific (European ancestry, λ = 1.031; African ancestry, λ = 1.014; Japanese ancestry, λ = 1.021; Latino ancestry, λ = 1.044) or multiancestry (λ = 1.046) meta-analyses (Supplementary Fig. 1).
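The difference between the two pooling models can be illustrated with a minimal sketch. It assumes inverse-variance weighting for the fixed-effects model and the DerSimonian–Laird moment estimator for the between-study variance, which is one common choice; this section does not state which estimator was used, and the function names and return keys below are illustrative rather than taken from the study's software.

```python
import math


def meta_analyze(betas, ses):
    """Pool per-study log-odds ratios (betas) with their standard errors (ses).

    Returns the fixed-effects estimate, a DerSimonian-Laird random-effects
    estimate, Cochran's Q, and the estimated between-study variance tau^2.
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Fixed effects: weight each study by the inverse of its variance.
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    se_fixed = math.sqrt(1.0 / sw)

    # Cochran's Q measures the spread of study estimates around the pooled one.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1

    # DerSimonian-Laird moment estimate of tau^2 (assumed estimator),
    # truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau^2 to every study variance before weighting.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sw_re = sum(w_re)
    beta_random = sum(wi * b for wi, b in zip(w_re, betas)) / sw_re
    se_random = math.sqrt(1.0 / sw_re)

    return {
        "beta_fixed": beta_fixed, "se_fixed": se_fixed,
        "beta_random": beta_random, "se_random": se_random,
        "q": q, "df": df, "tau2": tau2,
    }


def two_sided_p(beta, se):
    """Two-sided P value of the Wald test beta / se under a standard normal."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))
```

When the study estimates agree (Q no larger than its degrees of freedom), tau^2 is zero and the two models coincide. Otherwise the random-effects standard error is wider, which is why P random is the more conservative of the two.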

We identified 673 genome-wide-significant SNPs (P random ≤5 × 10−8) at 16 loci in European-ancestry populations (Fig. 1a, Table 1 and Supplementary Tables 3 and 4; locus definition in Methods). No genome-wide-significant risk loci were detected in African-ancestry, Japanese-ancestry, or Latino-ancestry populations (Supplementary Fig. 2 and Supplementary Tables 5–7), possibly because of a lack of power. In the combined multiancestry meta-analysis, 205 additional SNPs were significant (P random ≤5 × 10−8), including 12 SNPs at two loci not detected in the European-ancestry analysis (Fig. 1b, Table 1 and Supplementary Tables 3 and 8). Altogether, 878 SNPs at 18 loci reached genome-wide significance, of which 69% were significant in both European-ancestry and multiancestry meta-analyses, 23% were significant in only the multiancestry meta-analysis, and 8% were significant in only the European-ancestry meta-analysis (Supplementary Tables 4 and 8; regional plots of the 18 loci in Supplementary Fig. 3). All 18 loci remained genome-wide significant after further genomic control correction of the test statistics, thus confirming the robustness of these results (Supplementary Table 9).
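The inflation factor λ and a genomic control correction can be sketched as follows. The sketch uses the standard definition of λ as the median observed 1-df chi-square statistic divided by its expected median under the null. Exactly how the correction was applied in this study is not described here, so the rescaling step is an assumption, and the function names are illustrative.

```python
import math
import statistics

# Median of a chi-square distribution with 1 df, obtained as the squared
# 75th percentile of the standard normal distribution (about 0.455).
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square divided by its null expectation."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_correct(z_scores, lam):
    """Deflate Z scores by sqrt(lambda); statistics are left as-is if lambda <= 1."""
    if lam <= 1.0:
        return list(z_scores)
    scale = math.sqrt(lam)
    return [z / scale for z in z_scores]
```

Dividing each Z score by the square root of λ is equivalent to dividing each chi-square statistic by λ, so a locus remains genome-wide significant after correction only if its deflated statistic still passes the 5 × 10−8 threshold.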

Fig. 1: Manhattan plots of the results of European-ancestry and multiancestry random-effects meta-analyses of asthma risk.

a, The European-ancestry meta-analysis pertains to 19,954 asthma cases and 107,715 controls. b, The multiancestry meta-analysis pertains to 23,948 asthma cases and 118,538 controls. Each locus is annotated according to its cytogenetic-band location. The x axis represents chromosomal location, and the y axis represents –log10(P value) for tests of association between SNPs and asthma. Black, previously known loci; red, new loci identified in the European-ancestry meta-analysis; blue, additional new loci identified in the multiancestry meta-analysis. The dashed horizontal line denotes P = 5 × 10−8.

Table 1 Genetic loci associated with asthma in European-ancestry and multiancestry meta-analyses

The 18 chromosomal regions included five new loci associated with asthma at 5q31.3, 6p22.1, 6q15, 12q13.3, and 17q21.33; two new associations at 6p21.33 and 10p14 that were independent of previously reported signals at these loci in ancestry-specific populations (Latino6 and Japanese7 ancestries, respectively); two associations at 8q21.13 and 16p13.13 that were previously reported for asthma plus hay fever but not for asthma alone in a study of European-ancestry populations8; and nine previously identified asthma loci.

None of the lead SNPs at the 18 loci showed evidence of heterogeneity in effect sizes across studies except for the lead variant at 9p24.1 (P het for Cochran’s Q test9 = 0.008 across European-ancestry studies and P het = 0.02 across multiancestry studies; Table 1 and Supplementary Fig. 4). There was also significant evidence of heterogeneity in the ancestry-specific effect sizes (P ethnic = 0.003) for the 6p22.1 lead SNP rs1233578, which consequently did not reach significance in the multiancestry analysis (Table 1 and Supplementary Table 3). The meta-analysis of the pediatric subgroup showed evidence of association (P random ≤5 × 10−8) at five of the 18 loci (2q12, 5q31, 6p21.33, 9p24.1, and 17q12-21) (Supplementary Figs. 5 and 6 and Supplementary Table 10). No loci specific to the pediatric subgroup were identified.

The results provided genome-wide-significant confirmation of nine previously reported loci in both the European-ancestry and multiancestry meta-analyses (Table 1 and Supplementary Figs. 3b and 4). Our results allowed for detailed analysis of the broad 17q12-21 locus. Notably, the lead SNP (rs2952156) at this locus was within ERBB2 (P random = 2.2 × 10−30 in multiancestry meta-analysis), at least 180 kb from the previously recognized asthma-associated signals at the GSDMB/ORMDL3 haplotype block3 (Supplementary Fig. 7). This result was attributable to effect-size heterogeneity across studies (0.001 ≤ P het ≤ 0.05) that extended over a 200-kb region including ORMDL3 and GSDMB (Supplementary Table 11). This heterogeneity was partly due to the age of asthma onset, as previously reported5. Indeed, in the pediatric group, the 17q12-21 SNPs did not show heterogeneity (P het ≥0.09), and the lead SNP rs8069176 was 3.6 kb proximal to GSDMB (P random = P fixed = 4.4 × 10−26), in agreement with results from previous studies3,5. The SNP effect sizes in the pediatric and nonpediatric studies showed a significant difference for rs8069176 at the GSDMB locus (P het = 7.4 × 10−4) but no difference for rs2952156 at the ERBB2 locus (P het = 0.11). These two SNPs were in only moderate linkage disequilibrium (LD) (r 2 = 0.30), and each was in strong LD (r 2 >0.9) with missense variants localized in ERBB2 for the proxy of rs2952156 and in ZPBP2 and GSDMB for the proxies of rs8069176. Moreover, both rs2952156 and rs8069176 are associated with expression of GSDMB and ORMDL3 in blood10,11,12,13, and with the expression of GSDMA, CDK12, GSDMB, and ORMDL3 in whole lung tissue12,14. However, only rs2952156 is associated with PGAP3 expression in the lung12,14 (Supplementary Table 12a). 
Further exploration of expression quantitative trait loci (eQTL) data from Genotype-Tissue Expression (GTEx)12 indicated that rs8069176 accounted for a large part of the association of the most significant SNP with ORMDL3 expression in the blood, whereas rs2952156 accounted for a large part of the association of the most significant SNP with PGAP3 expression in the lungs (Supplementary Table 12b), thus suggesting that the asthma-associated signals near the PGAP3/ERBB2 and ORMDL3/GSDMB blocks may affect asthma risk through the expression of different genes in different tissues.

Finally, of the 21 published asthma loci, 12 did not reach genome-wide significance in TAGC (Supplementary Table 13). The most significant SNPs in the GWAS catalog3 at seven of those loci had P values >0.01 in TAGC analyses. Among these seven nonreplicated loci, two (4q31.21 (ref. 7) and 8q24.11 (ref. 15)) have been reported in Japanese individuals, three (4q12, 9p23, and 10q24.2)16 had SNPs with low minor allele frequency (MAF ≤2%) and have been reported in a childhood-onset asthma study, and two (1q31.3 (ref. 17) and 5q12.1 (ref. 18)) have been reported in children of European ancestry with asthma defined by current or persistent asthma symptoms with regular use of medication. The most significant SNPs at the remaining five loci had P values ≤5 × 10−4 in at least one TAGC meta-analysis, thus providing some replication. Among these five loci, (i) the 1q23.1 locus is specific to African-ancestry populations19; (ii) the 12q13.2 SNP, reported in a study of Japanese individuals7, showed heterogeneity in the TAGC Japanese-ancestry meta-analysis as well as the European-ancestry and multiancestry meta-analyses (P het ≤0.05); and (iii) the 7q22.3 SNP, previously reported in European-ancestry populations20, was associated with a severe form of childhood asthma and also showed heterogeneity across studies in the original publication20 (in which the P random value did not reach significance) as well as in our study (European-ancestry, multiancestry, and pediatric meta-analyses, 0.006 ≤ P het ≤ 0.03). Finally, SNPs at the 1q21.3 and 22q12.3 loci, previously reported in European-ancestry populations21,22, did not show significant evidence of heterogeneity across TAGC studies in the European-ancestry and multiancestry meta-analyses (0.11 ≤ P het ≤ 0.19). 
When we repeated these two meta-analyses under a fixed-effects model and separately considered the set of TAGC datasets that were part of the original publication (set P) and the set of remaining TAGC datasets (set R), both 1q21.3 and 22q12.3 SNPs had higher effect sizes in set P than in set R. These differences in effect sizes did not reach significance for the 1q21.3 SNP (P het for Cochran’s Q test of 0.13 and 0.20 in the European-ancestry and multiancestry analyses, respectively) and were borderline significant for the 22q12.3 SNP (0.04 ≤ P het ≤ 0.06) (Supplementary Table 14). Altogether, these results suggested that the lack of replication was mainly due to heterogeneity attributable to various factors, such as ancestry, specificity of clinical phenotypes, or other factors, as further discussed below.
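Comparing pooled effect sizes between two sets of studies (for example, set P versus set R, or pediatric versus nonpediatric studies) amounts to Cochran's Q test with two groups, which has one degree of freedom. The sketch below assumes that the two pooled estimates are independent; the function name is illustrative.

```python
import math


def subgroup_difference(beta_a, se_a, beta_b, se_b):
    """Cochran's Q test for two independent pooled estimates (1 df).

    With two groups, Q reduces to the squared difference of the estimates
    divided by the sum of their variances.
    """
    q = (beta_a - beta_b) ** 2 / (se_a ** 2 + se_b ** 2)
    # Survival function of a chi-square with 1 df, written with erfc.
    p_het = math.erfc(math.sqrt(q / 2.0))
    return q, p_het
```

A larger effect size in one set therefore translates into a small P het only when the difference is large relative to the combined uncertainty of both estimates, which is consistent with a visible difference in effect sizes not reaching significance.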

To investigate whether the 18 asthma loci identified in this study contained multiple distinct signals, we performed approximate conditional regression analysis, based on summary statistics, for all loci (Methods), except for the 9p24.1 region, which showed heterogeneity in SNP effect size across studies over the entire locus. For the 17q12-21 locus, this analysis was restricted to the pediatric subgroup in which there was no heterogeneity. After conditioning on the lead SNP in each investigated region, four secondary signals (2q12, 5q22.1, 5q31, and 6p21.32) remained significant (P fixed ≤ 5 × 10−8) (Supplementary Table 15), thus yielding 22 distinct genome-wide-significant signals.

To provide biological insight into our findings, we conducted a comprehensive bioinformatic assessment of the asthma-association signals. To pinpoint the most likely candidate genes at the nine loci with new associations with asthma per se, we interrogated the results of six eQTL studies in tissues relevant to asthma: blood (including peripheral blood11,12, lymphoblastoid cell lines10,13, and monocytes23), and whole lung tissue12,14. We also searched for missense variants potentially tagged by the association signals, using the HaploReg v4.1 tool (URLs). To assess the degree of overlap of asthma associations with susceptibility loci for other phenotypes, we interrogated the GWAS catalog3 while varying the strength of association with asthma (thresholds from 5 × 10−8 to 10−3). To obtain greater insight into how asthma-associated variants might functionally influence disease, we interrogated the Roadmap/Encyclopedia of DNA Elements (ENCODE) functional genomics data generated from a wide range of human cell types24. Finally, the degree of connectivity among the asthma-associated loci was assessed through text mining25. The results are described below.

Candidate genes at the nine loci showing new associations

A summary of the eQTL analysis for these nine loci is described in Table 2 and Supplementary Table 16; regional plots are shown in Supplementary Fig. 3a.

Table 2 Main characteristics of the nine loci showing new associations with asthma

New asthma susceptibility loci

Five new loci were identified in this study. The strongest new signal in both the European-ancestry (P random = 8.6 × 10−13) and multiancestry (P random = 2.2 × 10−12) meta-analyses was for SNP rs2325291 in an intron of BACH2 at 6q15, which was strongly correlated with rs10455168 (r 2 = 0.91), a cis eQTL altering expression of BACH2 in the blood11. BACH2 encodes a ZIP transcription factor that regulates nucleic-acid-triggered antiviral responses in human cells26. The second-strongest signal in the European-ancestry and multiancestry analyses was for rs17637472 (P random = 3.3 × 10−9 and 6.6 × 10−9), which is located between ZNF652 and PHB at 17q21.33 and is a strong cis eQTL for GNGT2 (173 kb from rs17637472) in the blood10,11,13,23. GNGT2 interacts with β-arrestin 1 and consequently promotes G-protein-dependent AKT signaling in NF-κB activation27.

Among the other new signals, the lead SNP rs1233578 at 6p22.1 (P random = 5.3 × 10−9 in European-ancestry populations) was located between TRIM27 and GPX5. This SNP was not associated with gene expression in the blood or lungs but was in LD (r 2 = 0.6 in European-ancestry populations) with rs7766356 (312 kb from rs1233578), which is a cis eQTL for ZSCAN12 in the blood13 and ZSCAN31 in the lungs14. These genes encoding zinc-finger proteins are associated with lung function28. The two SNPs rs1233578 and rs7766356 represented the same association signal in European-ancestry populations (the association with rs7766356 became nonsignificant after conditioning on the lead SNP rs1233578). The 12q13.3 lead SNP (rs167769), which was significant only in the multiancestry analysis (P random = 3.9 × 10−9), was located within an intron of STAT6 and is strongly associated with STAT6 expression in the blood10,11,13 and lungs14. STAT6 is a transcription factor that is essential for the TH2-lymphocyte functional responses mediated by IL-4 and IL-13 (ref. 29). This result established the association of STAT6 with asthma risk that has been disputed in candidate-gene studies30. The 5q31.3 lead SNP rs7705042 (P random = 7.9 × 10−9 in multiancestry analysis) was located within an intron of NDFIP1 and is associated with NDFIP1 expression in the blood11,12,13. NDFIP1 is a potent inhibitor of the antiviral response31 and inflammation processes32.

New asthma signals at loci reported in specific populations

Two associations in our study were with new SNPs at loci previously reported to be associated with asthma in people of Latino6 and Japanese7 ancestry. The first one, at 6p21.33, has previously been reported in an admixture mapping study in Latino individuals6. The lead TAGC SNP rs2855812 (P random = 8.9 × 10−12 in the multiancestry analysis; P random = 1.7 × 10−8 in the European-ancestry meta-analysis) was located within an intron of MICB. This SNP was not correlated (r 2 = 0) with any of the SNPs reported in the study of Latino individuals6. The 6p21.33 region contains many genes whose transcripts are associated with TAGC asthma signals, including TNF, LST1, HLA-C, and LTA in the blood10,11,13, and MICB in the lungs12,14. These genes are involved in immunologically related mechanisms. This 6p21.33 locus is approximately 600 kb from the previously reported 6p21.32 locus that spans HLA class II genes. Intensive sequencing efforts will be needed to further clarify the HLA-region associations. The second association was at the 10p14 locus, where a GWAS in Japanese individuals7 has reported an association (lead SNP rs10508372) with adult asthma. We detected a new signal, rs2589561, in European-ancestry (P random = 1.4 × 10−8) and multiancestry meta-analyses (P random = 3.5 × 10−9) that was not correlated with rs10508372 in either European-ancestry or Japanese-ancestry populations. The SNP rs2589561 is in a gene desert, 929 kb from GATA3. However, recently published promoter-capture Hi-C data in hematopoietic cells33 have shown that two proxies of rs2589561 (r 2 >0.9) are located in a region that interacts with the GATA3 promoter, especially in CD4+ T cells. These findings suggest that the SNP may be in a distal regulator of GATA3, which encodes a transcription factor that is a master regulator of differentiation of TH2 cells and type 2 innate lymphoid cells (ref. 34).

Asthma signals reported for asthma plus hay fever

In one study of individuals of European ancestry, loci on chromosomes 8q21.13 and 16p13.13 have been associated with asthma plus hay fever but not with asthma alone8. In our results, the 8q21.13 lead SNP rs12543811 (P random = 3.4 × 10−8 and 1.1 × 10−10 in the European-ancestry and multiancestry analyses) was located between TPD52 and ZBTB10 and was in strong LD (r 2 = 0.79) with the previously reported asthma/hay fever SNP rs7009110. These two SNPs represented the same signal, because the association with rs12543811 became nonsignificant after conditioning on rs7009110. Thus, the 8q21.13 locus is likely to be associated with allergic asthma. A functional analysis of the asthma/hay fever locus pinpointed PAG1 as a promising candidate35. The chromosome 16p13.13 SNP rs17806299 is within an intron of CLEC16A (P random = 2.1 × 10−10 and 2.7 × 10−10 in European-ancestry and multiancestry meta-analyses). Although it was in moderate LD (r 2 = 0.66) with the previously reported asthma/hay fever signal (rs62026376)8, the association of asthma with rs17806299 was removed after conditioning on rs12935657 (r 2 = 0.96 with rs62026376), thus indicating that these SNPs represented the same signal and that 16p13.13 was probably also an allergic asthma locus. The SNP rs17806299 is strongly associated with the expression of a nearby gene, DEXI, in the blood11,23. Similar observations of associations of CLEC16A SNPs with autoimmune diseases and expression of DEXI together with chromosome-conformation-capture experiments have implicated DEXI as the most likely candidate gene associated with autoimmune diseases36. The potential relevance of DEXI in allergic diseases has also been previously discussed8.

Notably, the lead SNPs at the nine new asthma-associated loci were located in noncoding regions and did not tag missense variants.

Overlap of loci associated with asthma and other phenotypes

We next explored whether the nine loci bearing new signals for asthma per se overlapped with GWAS loci reported for allergy-related phenotypes, lung-function phenotypes, or other immunologically related diseases, by using the GWAS catalog3. Six of these nine asthma loci showed overlapping associations with allergy-related phenotypes, and eight showed overlapping associations with autoimmune diseases or infection-related phenotypes (Table 2). Moreover, three asthma loci overlapped with associations with lung-function phenotypes.

We expanded our search of overlap between the asthma-association signals with multiancestry P random <10−4 in this study and GWAS signals with all phenotypes and diseases in the GWAS catalog3. We examined 4,231 unique trait–loci combinations (Methods) and used the disease classification from Wang et al.37 to group traits. We summarized the overlap with GWAS-catalog signals as the proportion of catalog SNPs with asthma P values <10−4 in our analysis. The results showed significant overlap with autoimmune disease (49 out of 480 catalog SNPs (10%) showed evidence for asthma association), in agreement with the hypothesized shared susceptibility38,39; moderate overlap with diseases with an inflammatory component (cardiovascular diseases, cancers, and neuropsychiatric diseases); and little to no overlap with other diseases (Table 3). When investigating specific diseases and traits (Supplementary Table 17), we observed the most significant overlap with allergic phenotypes. There was little to no overlap with other phenotypes that appeared most frequent in the GWAS catalog (for example, no shared associations with type 2 diabetes).

Table 3 Overlap of TAGC asthma-associated SNPs with GWAS-catalog association signals by disease group

When we broadened our analysis to a larger set of SNPs in the GWAS catalog to identify loci for diseases with potentially shared genetic architecture with asthma (i.e., SNPs associated with asthma at P random ≤10−3 in our multiancestry meta-analysis), additional pleiotropic signals emerged (Supplementary Table 18). This larger set of associations suggested a broader picture of asthma risk, with a wide range of shared effects with traits ranging from lung cancer and multiple sclerosis (with rs3817963 in BTNL2) to coronary heart disease (with rs1333042 near CDKN2B). This analysis also generated an extended set of candidate asthma-associated genes. Indeed, there were 210 SNPs in the GWAS catalog that were associated with asthma in TAGC at a threshold of 10−3, and the proportion of false positives among these was smaller than 1%.

Enrichment of asthma risk loci in epigenetic marks

Because nearly all lead SNPs at the 18 loci identified by this study, except for the IL13 missense variant (rs20541), were located in noncoding sequences, we investigated whether the asthma-associated variants and their proxies (r 2 ≥0.80) might be concentrated in cis-regulatory DNA elements. We explored only 16 of 18 identified asthma loci, excluding the two loci spanning the HLA region because of the region’s high variability and extensive LD. We interrogated the 111 Roadmap and 16 ENCODE reference epigenomes in a wide range of human cell types24, focusing on histone marks characterizing enhancers and promoters assayed in all 127 epigenomes and DNase I–hypersensitive sites available in 51 cell types. To assess enrichment of the asthma risk variants for colocalization with these regulatory elements, we used the Uncovering Enrichment through Simulation (UES) pipeline40. This approach generates random SNP sets that match the characteristics of the original asthma-associated SNPs (distance from the nearest transcription start site, number of LD partners, and MAF). Empirical P values for enrichment were calculated by comparing the observed frequency of colocalization of SNPs with a given type of regulatory element in the original asthma-associated SNP set to the co-localization-frequency distribution obtained from the 10,000 random SNP sets generated. Benjamini–Hochberg false discovery rate (FDR) values were then computed to correct for multiple testing (Methods).
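The last two steps of this procedure, empirical P values from matched random SNP sets followed by Benjamini–Hochberg adjustment, can be sketched as follows. This is not the UES pipeline itself. The function names are illustrative, and the add-one correction in the empirical P value is an assumption that keeps the estimate from being exactly zero.

```python
def empirical_p(observed, null_values):
    """One-sided empirical P value for enrichment.

    observed:    colocalization frequency in the asthma-associated SNP set
    null_values: the same frequency in each matched random SNP set
    The add-one correction (an assumption here) avoids a P value of zero.
    """
    at_least_as_extreme = sum(1 for v in null_values if v >= observed)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 10,000 random sets, the smallest attainable empirical P value under this convention is 1/10,001. One P value would be computed per regulatory mark and cell type, and the adjustment would then be applied across those tests.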

Although the asthma-associated variants were strongly enriched for colocalization with enhancer marks, there was only weak enrichment in promoter marks (Table 4 and Supplementary Table 19). This enrichment was highest in leukocytes (27 leukocytes, of which 19 (70%) were lymphocytes and monocytes). For example, an FDR ≤5% for enrichment of asthma loci in active enhancers was observed in 100% of leukocytes compared with 50% of all cell types. The enrichment of asthma risk variants for colocalization with DNase I–hypersensitive sites was intermediate between the enrichments in promoters and enhancers and was again elevated in blood cells (FDR ≤5% in 40% of leukocytes and 12% of all cell types) (Table 4 and Supplementary Table 20).

Table 4 Enrichment of asthma risk loci in promoter and enhancer marks and DNase I–hypersensitive sites

The strong enrichment of asthma loci in enhancer marks, especially in immune cells, indicated that the associated genetic variants are likely to be involved in the regulation of immunologically related functions. This finding also suggested that epigenetic mechanisms may be key to promoting asthma, as evidenced by IgE levels, an asthma-associated phenotype41.

Connectivity among asthma-associated loci

To characterize the degree of connectivity among the 18 asthma-associated loci, we applied the Gene Relationships Across Implicated Loci (GRAIL) text-mining approach25. Genes at 11 of these loci showed connections with a GRAIL score P GRAIL <5% (and seven of them were highly connected, with P GRAIL <10−3) (Fig. 2 and Supplementary Table 21). These genes were connected through keywords such as ‘asthma’, ‘allergy’, ‘atopic’, ‘interleukin’, ‘cytokines’, ‘airway’, and ‘inflammation’, thus confirming the central role of immunologically related mechanisms accounting for these connections.

Fig. 2: GRAIL circle plot of connectivity among genes at asthma risk loci.

The 17 asthma risk loci are along the outer ring (the 10p14 locus was ignored because it corresponds to a gene desert); the internal ring represents the genes at these loci. The widths of the lines drawn between genes correspond to the strength of the literature-based connectivity, with thicker lines representing stronger connections.

Discussion

In this meta-analysis of worldwide asthma GWAS in ethnically diverse subjects, we identified nine new loci influencing asthma risk. Our findings confirm that immunologically related mechanisms are prominent in asthma susceptibility and provide new insights that may open new avenues for future asthma research. The asthma-associated loci identified by TAGC are enriched in enhancer marks and are likely to be involved in gene regulation. Although these findings were observed in immune cells, asthma-associated genes (e.g., IL1RL1, TSLP, IL33, and ORMDL3/GSDMB) are also expressed in the airway epithelium, where they modulate airway inflammation. Investigation of epigenetic marks in airway epithelial cells may provide additional insight. The best candidates at many loci are involved in immune responses to viruses or bacteria, thereby underscoring the importance of infections in asthma risk. This study further provides evidence of an overlap of asthma loci with loci underlying autoimmune diseases and other diseases with an inflammatory component, thereby strengthening the growing understanding of the importance of pleiotropy in multifactorial diseases.

Our meta-analysis doubles the number of asthma cases analyzed in prior genome-wide studies21,22 at the time of analysis. We identified 878 SNPs corresponding to 22 distinct association signals at 18 loci meeting criteria for genome-wide significance in European-ancestry and/or multiancestry populations. Pooling data from ethnically diverse populations can increase the power to detect new loci (in this study, two loci reached the genome-wide threshold only in the multiancestry analysis) but may also increase heterogeneity. Beyond differences in the genetic background, varying environmental-exposure conditions can modify genetic risk and result in heterogeneity in SNP effect size, and consequently make the power of multiancestry analysis lower than that of ancestry-specific analysis. If asthma prevalence is assumed to be 10%, the variance in asthma liability explained by the 22 distinct genome-wide-significant variants in this study was estimated to be 3.5% (95% confidence interval 2.0–5.4%), of which 72% was accounted for by the known loci, and 28% was accounted for by the new loci. Notably, the current study was based on HapMap2-imputed data, which were shared within the TAGC consortium and thus allowed for detection of associations with common genetic variants (MAF ≥1%).

The overall relative paucity of asthma risk loci detected by large-scale GWAS, as compared with the number of risk loci identified for other common diseases, may be due to the clinical heterogeneity of asthma and the important etiological role of differing environmental-exposure conditions. Asthma is thought to be not a single disease but a syndrome that varies according to many characteristics42, including the age of asthma onset, the severity of disease, the type of cellular inflammatory infiltrates, occupational exposure, and the varying response to treatment. It is thus possible that additional asthma loci may be identified by studies targeting more specific asthma subphenotypes and/or considering environmental exposure.

In conclusion, future discoveries might result from exploring more complex models of asthma phenotypes and from joint analysis of asthma and other immunologically mediated and inflammatory diseases. The central role of gene-regulatory mechanisms highlighted by our study might prompt genome-wide exploration of the epigenome in immune cells and the respiratory epithelium while integrating information on genetic variation and environmental-exposure histories.

URLs

National Human Genome Research Institute (NHGRI) and European Bioinformatics Institute (EBI) catalog of published genome-wide association studies, https://www.ebi.ac.uk/gwas/; 1000 Genomes Project Consortium Phase 3, http://www.internationalgenome.org/; Genome-wide Complex Trait Analysis (GCTA), http://cnsgenomics.com/software/gcta/; Blood eQTL browser, https://omictools.com/blood-eqtl-browser-tool; GTEx, http://www.gtexportal.org/; Multiple Tissue Human Expression Resource (MuTHER) database, http://www.muther.ac.uk/; eQTL database in lymphoblastoid cell lines from MRCA and MRCE families, https://www.hsph.harvard.edu/liming-liang/software/eqtl/; GHS-Express, http://genecanvas.ecgene.net/; HaploReg v4.1, http://archive.broadinstitute.org/mammals/haploreg/haploreg.php/; Roadmap and ENCODE epigenomics data, http://egg2.wustl.edu/roadmap/web_portal/; UES pipeline, https://github.com/JamesHayes/uesEnrichment/; GRAIL, https://software.broadinstitute.org/mpg/grail/; Visualizing GRAIL connections (VIZ-GRAIL), http://software.broadinstitute.org/mpg/grail/vizgrail.html; LocusZoom, http://locuszoom.org/.

Methods

GWAS and shared data

All 66 GWAS from the TAGC consortium are described in the Supplementary Note and are summarized in Supplementary Table 1. These GWAS included 56 studies of individuals of European ancestry (19,954 asthma cases, 107,715 controls), seven studies of individuals of African ancestry (2,149 asthma cases, 6,055 controls), two studies of individuals of Japanese ancestry (1,239 asthma cases, 3,976 controls), and one study of individuals of Latino ancestry (606 asthma cases, 792 controls), with a total of 23,948 asthma cases and 118,538 controls. There were 27 studies including only childhood-onset asthma (defined as asthma diagnosed at or before 16 years of age), thus allowing us to separately analyze a pediatric subgroup (8,976 asthma cases, 18,399 controls). All subjects provided informed consent to participate in genetic studies, and the local ethics committee for each individual study approved the study protocol. The definition of asthma was based on physicians’ diagnoses and/or standardized questionnaires (details in Supplementary Note). The samples were genotyped on a variety of commercial arrays, as detailed in the Supplementary Note and Supplementary Table 2. GWAS were performed on imputed SNP data that were generated with HapMap2 as the reference panel in one of the commonly used imputation programs (Supplementary Note and Supplementary Table 2). In each dataset, the effect of each individual SNP on asthma, assuming an additive genetic model, was estimated through a logistic-regression-based approach and is expressed in terms of a regression coefficient with its standard error; the detailed methodology and software used for analysis in each study can be found in the Supplementary Note and Supplementary Table 2.

Imputation, quality control (including adjustments for population stratification), and analysis were performed by each group independently, and data on a predefined set of 3,952,683 autosomal SNPs were shared. These SNPs were those of the HapMap Phase 2, release 21 panel in subjects of European, Asian, and African ancestry that were filtered through SNP annotation from build 37.3 of the reference sequence and dbSNP b135 (31,587 SNPs (0.8% of all SNPs) from previous annotations that showed discrepancies with the chosen annotation were deleted). The variables that were shared contained the study name; general information on SNPs (rsID, chromosome, position, and alleles, i.e., the baseline and effect alleles as used in the analysis by each study); SNP status (imputed or genotyped SNP and whether the SNP genotype or imputed value was used in computation); quality control (QC) indicators (call rate and P value for the Hardy–Weinberg (HW) equilibrium test for genotyped SNPs, software used for imputation, and imputation quality score for imputed SNPs); allele frequencies in individuals with asthma and control individuals; and information on association statistics (regression coefficient for SNP effect, standard error of regression coefficient, Z scores, and P values associated with Z-score statistics).

Quality control of shared data

For each SNP, the alleles on the HapMap2 template (reference and alternate alleles on the positive strand) were compared with the alleles (baseline and effect alleles) used in the analysis by each group. When necessary, the association variables (allele frequencies, regression coefficient for SNP effect, and Z score) were switched to match the reference/alternate alleles of the template. Data for each SNP showing any ambiguity or error in assignment to the template were set to missing. In addition, several QC checks were performed regarding the name, format, range of possible values for all shared variables mentioned in the previous paragraph, and consistency across variables. Any problem or inconsistency was corrected; otherwise, the data for that SNP were set to missing. After this first stage of QC, association statistics for at least one SNP in at least one study were available for 2.83 million autosomal SNPs. Strict QC criteria were used for inclusion of a SNP in the analysis. When a SNP genotype was used in the study analysis, these criteria were call rate ≥99%, P value for HW test ≥10−6, and MAF ≥0.01 in both controls and affected individuals. When an imputed SNP value was used in the analysis, the criteria were imputation quality score ≥0.5 and MAF ≥0.01 in both controls and asthma cases. The distribution of the summary statistics (regression coefficient for SNP effect, standard error, and Z score) of all SNPs passing QC was examined for each study; SNPs that still showed extreme Z scores (≥7 or ≤–7) after QC were excluded.
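
The inclusion criteria above can be sketched as a per-SNP, per-study filter. This is only an illustration: the record type `SnpRecord`, its field names, and the function `passes_qc` are hypothetical and are not taken from the TAGC pipeline, and the extreme-Z-score exclusion, which the text applies after inspection of each study's distribution, is folded into the same function for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpRecord:
    """Summary data for one SNP in one study (hypothetical field names)."""
    genotyped: bool                      # True if the genotype (not an imputed value) was analyzed
    maf_cases: float                     # minor allele frequency in asthma cases
    maf_controls: float                  # minor allele frequency in controls
    z_score: float                       # association Z score
    call_rate: Optional[float] = None    # genotyped SNPs only
    hwe_p: Optional[float] = None        # Hardy-Weinberg test P value, genotyped SNPs only
    imputation_quality: Optional[float] = None  # imputed SNPs only


def passes_qc(snp: SnpRecord) -> bool:
    """Apply the thresholds stated in the text; missing indicators fail QC."""
    # MAF >= 0.01 is required in both cases and controls for all SNPs.
    if snp.maf_cases < 0.01 or snp.maf_controls < 0.01:
        return False
    # SNPs with extreme Z scores (>= 7 or <= -7) are excluded.
    if abs(snp.z_score) >= 7:
        return False
    if snp.genotyped:
        # Genotyped SNPs: call rate >= 99% and HW test P value >= 1e-6.
        return (snp.call_rate is not None and snp.call_rate >= 0.99
                and snp.hwe_p is not None and snp.hwe_p >= 1e-6)
    # Imputed SNPs: imputation quality score >= 0.5.
    return snp.imputation_quality is not None and snp.imputation_quality >= 0.5
```

In such a sketch, a SNP failing the filter in one study would simply not contribute that study's estimate to the meta-analysis.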

Meta-analysis of asthma GWAS

We conducted fixed-effects meta-analysis with inverse variance weighting and random-effects meta-analysis, using the DerSimonian and Laird43 estimator of the between-study variance, when the meta-analyses included a large number of studies (European-ancestry, multiancestry and pediatric-subgroup meta-analyses), thus allowing for an accurate estimate of the between-study variance. We used a fixed-effects model for the meta-analyses of the African-ancestry, Japanese-ancestry, and Latino-ancestry populations. For all these meta-analyses, we used the SNP regression coefficient and standard error from each study for which the SNP passed QC. All meta-analyses were done with Stata version 14.1. To minimize false-positive findings and to obtain robust results, we examined the combined results only for SNPs for which at least two-thirds of the studies contributed to a meta-analysis. Tests of significance of the combined effect sizes were performed by using a standard normal distribution. We applied a threshold of P random (or P fixed) of 5 × 10−8 to declare a combined SNP effect as genome-wide significant. To verify the robustness of the results, we applied a genomic control correction to the association test statistics. The lead SNP at a locus was the variant with the strongest evidence of association in the European-ancestry or multiancestry meta-analysis. We defined a support interval around the lead SNP designated as ‘locus’; the bounds of this interval were the positions of the two most extreme SNPs among all SNPs that were located within 500 kb on each side of the lead SNP and had P random (or P fixed) ≤10−6. The heterogeneity of per-SNP effect sizes across all studies in a meta-analysis was assessed with Cochran’s Q test9. Differences among the four ancestry-specific summary effects were also tested with Cochran’s Q statistic.
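
For readers who want to see the arithmetic, the following is a minimal sketch of inverse-variance fixed-effects pooling, Cochran's Q, and the DerSimonian and Laird random-effects estimate for a single SNP. The analyses in this study were run in Stata; the Python functions below (`fixed_effects`, `cochran_q`, `random_effects_dl`, `two_sided_p`) are illustrative names, not the authors' code.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effects estimate."""
    beta_fixed, _ = fixed_effects(betas, ses)
    return sum((b - beta_fixed) ** 2 / se ** 2 for b, se in zip(betas, ses))


def random_effects_dl(betas, ses):
    """DerSimonian-Laird pooled effect, standard error, and tau^2."""
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    c = total - sum(w ** 2 for w in weights) / total
    # Method-of-moments estimate of the between-study variance, truncated at 0.
    tau2 = max(0.0, (cochran_q(betas, ses) - (k - 1)) / c) if c > 0 else 0.0
    star = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(w * b for w, b in zip(star, betas)) / sum(star)
    return beta, math.sqrt(1.0 / sum(star)), tau2


def two_sided_p(beta, se):
    """Two-sided P value from a standard normal test of beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))
```

When the estimated between-study variance is zero, the random-effects and fixed-effects results coincide; otherwise the random-effects standard error is larger, which makes the test more conservative.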

Conditional analysis of asthma-associated loci

GCTA software44 (URLs) was used to perform approximate conditional analysis for all loci with at least one SNP reaching the genome-wide-significance level. This approximate conditional analysis is based on the summary meta-analysis statistics obtained under a fixed-effects model and takes into account the correlations among SNPs that are estimated from a large reference population included in the meta-analysis. Approximate conditional analysis was performed in only the European-ancestry group, which could be assumed to share a similar LD pattern and was both the largest ancestry-specific dataset and the only one showing genome-wide-significant results. Because this analysis assumes no heterogeneity in SNP effect size across studies, the 9p24.1 and 17q12-21 loci, which showed significant heterogeneity (P het ≤0.05, Cochran’s Q test) for a large portion of each locus, were not investigated. However, for the 17q12-21 locus, where there was no heterogeneity in the pediatric subgroup, GCTA was restricted to the European-ancestry pediatric subgroup. We used the large ECRHS dataset as the reference sample to estimate LD. This dataset was genotyped with the Illumina Human610Quad array and included 2,101 unrelated individuals after QC22. Imputation was performed with MACH software45 and the HapMap2, release 21 panel; only well-imputed SNPs (imputation quality score rsq >0.8) with MAF ≥1% were retained in this reference sample. For each asthma-associated locus, the region explored by conditional analysis extended 500 kb on each side of the two extreme SNPs defining the support interval around the lead SNP (described in preceding paragraph). However, we decreased that extension to 250 kb for the 6p21.33 and 6p21.32 loci to avoid overlap. The length of the regions explored by conditional analysis varied from 1.01 Mb to 1.63 Mb. 
Within each region investigated by conditional analysis, summary effects for SNPs belonging to that region were adjusted for the lead SNP by using the --cojo-cond option; tests for the adjusted SNP effects were based on the two-sided Wald test. If there was an additional SNP meeting the Bonferroni-corrected threshold for the total number of SNPs over all regions investigated by GCTA (P = 4.1 × 10−6), after adjustment for the lead SNP, we performed an additional round including both SNPs. If the remaining SNPs had P >4.1 × 10−6, no further analysis was performed. The results of this analysis are reported in Supplementary Table 15.
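
The way a region was delimited for conditional analysis can be sketched as follows: the support interval is bounded by the most extreme SNPs within 500 kb of the lead SNP that have P ≤ 10−6, and the region then extends a further 500 kb (250 kb for the two 6p21 loci) on each side. The function names and the `(position, p_value)` input format are assumptions made for illustration only.

```python
def support_interval(lead_pos, snps, window=500_000, p_threshold=1e-6):
    """Bounds of the locus around a lead SNP.

    snps: iterable of (position, p_value) pairs on the lead SNP's chromosome.
    """
    positions = [pos for pos, p in snps
                 if abs(pos - lead_pos) <= window and p <= p_threshold]
    positions.append(lead_pos)  # the lead SNP always belongs to its own locus
    return min(positions), max(positions)


def conditional_region(interval, extension=500_000):
    """Extend the support interval on each side for conditional analysis."""
    start, end = interval
    return max(0, start - extension), end + extension


# Illustrative positions and P values, not data from the study.
snps = [(900_000, 1e-7), (1_200_000, 5e-7), (1_450_000, 1e-3), (400_000, 1e-9)]
locus = support_interval(1_000_000, snps)
region = conditional_region(locus)
```

With these made-up inputs the locus spans 900,000 to 1,200,000 and the explored region is 1.3 Mb long; the SNP at 400,000 is ignored because it lies more than 500 kb from the lead SNP.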

Identification of cis eQTLs at new asthma risk loci

To obtain greater insight into the genes potentially driving the association signals at the new asthma loci, we defined a list of SNPs to be interrogated that included the lead SNPs, the secondary signals identified by conditional analysis, and all SNPs in LD with these SNPs (r 2 between 0.5 and 1). To search for cis eQTLs within up to 1 Mb of each investigated SNP, we interrogated six publicly available eQTL databases, giving priority to cell types more likely to be involved in asthma biology (blood cell types and lung tissue): (i) a meta-analysis of the transcriptional profiles from peripheral blood cells of 5,311 individuals of European ancestry (the blood eQTL browser11); (ii) gene expression data from 777 lymphoblastoid cell lines from the MuTHER database10; (iii) transcriptional profiles of 405 and 550 lymphoblastoid cell lines from UK asthma (MRCA) and eczema (MRCE) family members, respectively13; (iv) eQTL data from monocytes from 1,490 individuals included in the GHS-express database23; (v) GTEx eQTL Browser data from multiple tissues including the blood and lungs12; and (vi) transcriptional profiles from the lung tissues of 1,111 subjects14 (URLs).

Search for missense variants at new asthma risk loci

To complement the eQTL analysis, we searched whether the lead asthma-associated SNPs and secondary signals were in LD (r 2 >0.5) with missense variants by using the HaploReg v4.1 tool (URLs).

Overlap of loci associated with asthma and other phenotypes

Overlap of new asthma risk loci with associations with allergy-related phenotypes/diseases and immunologically related diseases as well as lung-function phenotypes was first annotated by using the 24 March 2015 version of the NHGRI–EBI GWAS catalog3 (URLs). We then used this catalog to systematically investigate the overlap of asthma signals with P random ≤10−4 in the multiancestry meta-analysis with association signals of all diseases and traits in the catalog. That version of the catalog comprised 19,080 SNP entries, 16,047 of which had a TAGC asthma-association P value. To investigate pleiotropy, we filtered out SNPs associated with asthma in the database, SNPs with a reported GWAS P value >10−7 (with the intent of removing some of the potential false positives in the catalog) and SNPs that were duplicated (i.e., to remove disease-SNP duplications). This procedure decreased the number of entries to 5,927. Notably, this process did not remove either SNPs in perfect LD associated with the same disease or SNPs that were present multiple times in the database because of their association with different phenotypes. For some diseases or quantitative traits, there were multiple SNPs in the same region reported in the catalog, thus potentially yielding redundant information. Some of the SNPs might have been in strong LD, whereas others might have reflected independent signals. To avoid possible duplication of signals, we retained only unique trait–loci combinations, as reflected by the variables ‘disease trait’ and ‘region’ in the catalog. There were 4,231 unique entries remaining after this filtering step. Diseases/traits in the GWAS catalog were grouped according to the classification from Wang et al.37. We summarized the overlap of GWAS-catalog signals with asthma signals according to the proportion of catalog SNPs with asthma P values <10−4 in our analysis. 
The significance of overlap was estimated as the binomial-tail probability for observing the number of TAGC SNPs with P random ≤10−4 among the number of SNPs reported in the GWAS catalog for a group of diseases. The significance threshold for enrichment in shared associations between a disease group and asthma was set to 0.05 divided by the number of disease groups investigated, through a Bonferroni correction. Finally, we examined a larger set of SNPs in the GWAS catalog that showed an association with asthma at P random ≤10−3 in TAGC multiancestry meta-analysis and estimated the proportion of false positives among those SNPs.
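
The binomial-tail calculation can be sketched as below. The text does not state the per-SNP success probability used under the null hypothesis; taking it to be the P-value threshold itself (10−4) is an assumption of this sketch, as are the function names and the illustrative counts.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for large n."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(1.0, total)


def bonferroni_threshold(n_groups, alpha=0.05):
    """Significance threshold: alpha divided by the number of disease groups."""
    return alpha / n_groups


# Hypothetical disease group: 5 of 300 catalog SNPs have asthma P <= 1e-4.
# The null probability of 1e-4 per SNP is an assumption, not stated in the text.
p_overlap = binomial_upper_tail(5, 300, 1e-4)
enriched = p_overlap < bonferroni_threshold(n_groups=10)
```

Note that this treats catalog SNPs within a disease group as independent trials, which is why the filtering to unique trait–locus combinations described above matters.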

Enrichment of asthma risk loci in epigenetic marks

To obtain greater insight into the functional role of the genetic variants at the new and known asthma loci identified in this study, we investigated whether the lead SNPs and their proxies (r 2 ≥0.80) were concentrated in cis-regulatory DNA elements. We used the UES pipeline40 (URLs) that was adapted to the current study. This approach tests whether GWAS-identified SNPs are enriched in particular functional annotations through use of Monte Carlo simulations. The original set of asthma-associated SNPs included the lead SNPs at each asthma risk locus (i.e., one SNP per asthma-associated locus, as recommended by Hayes et al.40). We excluded the two associated loci spanning the HLA region (6p21.33 and 6p21.32), because of the high amount of variability and LD in that region. Each of the original lead SNPs was categorized according to its distance from the nearest transcription start site (TSS) and the number of LD partners (r 2 ≥0.8). Quartiles for both the TSS distance and LD-partner count were calculated, and the initial SNPs were binned accordingly. Then, SNPs from the entire set of imputed SNPs used for analysis were binned according to the original SNP criteria (distance from the closest TSS, number of LD partners, and MAF). Random SNP sets were chosen, matching the original bin frequencies. LD partners (r 2 ≥0.8) for both the original lead SNPs and random SNPs were retrieved. The SNP data, including the original and random sets of SNPs and their corresponding LD partners (r 2 ≥0.8), were intersected with the cell-specific epigenome tracks of regulatory elements with BedTools intersectBed46, to determine which SNPs colocalized with a given type of regulatory elements (for example, enhancers or promoters). The resultant SNPs were then collapsed into loci that colocalized with marks according to LD structure. 
We computed an empirical P value for a specific track by using 10,000 random SNP sets. This P value was equal to r loci/n, where r loci is the number of instances in which the frequency of colocalization of the random SNP sets with the regulatory feature was greater than or equal to the frequency of colocalization with the feature for the original SNP set, and n is the number of random SNP sets generated (here, 10,000). We used Benjamini–Hochberg FDRs to correct for multiple testing. We interrogated the functional data from 111 Roadmap reference epigenomes and 16 additional epigenomes from ENCODE that are available in a wide range of human cell and tissue types24 (URLs). We focused on enhancers and promoters that were defined with the ChromHMM 15-state model and assayed in all 127 epigenomes. We also examined enrichment in DNase I–hypersensitive sites that were available in 51 cell types.
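
The empirical P value and the Benjamini–Hochberg adjustment can be written compactly. The sketch below is not the UES pipeline itself; `empirical_p` and `benjamini_hochberg` are illustrative names, and the inputs are assumed to be colocalization frequencies already computed for the original and random SNP sets.

```python
def empirical_p(observed, random_values):
    """Fraction of random SNP sets whose colocalization frequency is
    greater than or equal to that of the original SNP set (r / n)."""
    r = sum(1 for value in random_values if value >= observed)
    return r / len(random_values)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

As defined in the text, an empirical P value of 0 is possible when no random set matches the observed frequency; with 10,000 random sets this indicates P < 10−4 rather than an exact zero.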

Connectivity among asthma-associated loci

We used GRAIL25 to assess the relatedness among asthma-associated loci. As previously described in detail25, to define the genes near each SNP, GRAIL finds the furthest neighboring SNPs in the 3′ and 5′ direction that are in LD (r 2 >0.5) and proceeds outward in each direction to the nearest recombination hotspot. All genes that overlap that interval are considered to be implicated by the SNP. If there are no genes in that region, the interval is extended by 250 kb in either direction. We used the genome-wide-significant signals identified by this study as seeds and queried loci to investigate the biological connectivity among those loci. The connectivity between genes belonging to these loci was assessed through text mining of PubMed abstracts. Each gene at each locus was scored for enrichment in GRAIL connectivity to genes located at the other loci by using statistical text-mining methods, as previously described25. The interconnectivity among genes at asthma risk loci was visualized using VIZ-GRAIL47 (URLs).

Variance explained by the asthma-associated genetic variants

We estimated the variance in asthma liability explained by the 22 distinct genome-wide-significant SNPs (18 lead SNPs plus four secondary signals identified by approximate conditional analysis) at the 18 asthma-associated loci, by using a method based on the liability threshold model48 and assuming a prevalence of asthma of 10%. The variance in asthma liability explained by individual SNPs was summed over all 22 significant variants. For the loci that included two SNPs (lead SNP and secondary signal), we used the SNP effect sizes estimated by approximate joint analysis by using GCTA44. We also estimated the variance in asthma liability explained by the nine lead SNPs at the nine new asthma loci and by the 13 distinct genome-wide-significant signals at the nine known loci.
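
One common formulation of a liability-threshold calculation is sketched below: genotype-specific disease risks are derived from the per-allele odds ratio, the risk-allele frequency, and the prevalence, and each genotype is assumed to shift the mean of a unit-variance normal liability. Whether this matches the cited method48 in every detail is an assumption, and the function names, the Hardy–Weinberg assumption, and the example odds ratios and allele frequencies are illustrative only.

```python
from statistics import NormalDist

_NORMAL = NormalDist()


def genotype_penetrances(odds_ratio, raf, prevalence):
    """Genotype frequencies (HWE assumed) and disease risks for 0, 1, 2 risk alleles."""
    q, p = raf, 1.0 - raf
    freqs = (p * p, 2.0 * p * q, q * q)

    def risks(f0):
        odds0 = f0 / (1.0 - f0)
        out = []
        for n_alleles in range(3):
            odds = odds0 * odds_ratio ** n_alleles  # multiplicative odds per allele
            out.append(odds / (1.0 + odds))
        return out

    # Bisection on the baseline risk so that the population prevalence is matched.
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if sum(f * r for f, r in zip(freqs, risks(mid))) < prevalence:
            lo = mid
        else:
            hi = mid
    return freqs, risks((lo + hi) / 2.0)


def liability_variance_explained(odds_ratio, raf, prevalence=0.10):
    """Proportion of liability variance explained by one SNP."""
    freqs, risks = genotype_penetrances(odds_ratio, raf, prevalence)
    threshold = _NORMAL.inv_cdf(1.0 - risks[0])
    # Mean liability of each genotype relative to the non-carrier genotype.
    means = [threshold - _NORMAL.inv_cdf(1.0 - r) for r in risks]
    overall = sum(f * m for f, m in zip(freqs, means))
    var_g = sum(f * (m - overall) ** 2 for f, m in zip(freqs, means))
    return var_g / (1.0 + var_g)


# Summing over independent signals (hypothetical odds ratios and frequencies).
signals = [(1.10, 0.30), (1.20, 0.30)]
total_explained = sum(liability_variance_explained(o, f) for o, f in signals)
```

Summing per-SNP contributions, as done in the text, treats the signals as independent; for loci with two signals, jointly estimated effect sizes would be supplied in place of marginal ones.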

Life Sciences Reporting Summary

Further information on experimental design is available in the Life Sciences Reporting Summary.

Data availability

The summary statistics of the meta-analysis that support the findings of this study are available through a link from the GWAS Catalog entry for the TAGC study on the EMBL–EBI (European Bioinformatics Institute) website (https://www.ebi.ac.uk/gwas/downloads/summary-statistics).