Ethnic-specific associations of rare and low-frequency DNA sequence variants with asthma

Common variants at many loci have been robustly associated with asthma but explain little of the overall genetic risk. Here we investigate the role of rare (<1%) and low-frequency (1–5%) variants using the Illumina HumanExome BeadChip array in 4,794 asthma cases, 4,707 non-asthmatic controls and 590 case–parent trios representing European Americans, African Americans/African Caribbeans and Latinos. Our study reveals one low-frequency missense mutation in the GRASP gene that is associated with asthma in the Latino sample (P=4.31 × 10−6; OR=1.25; MAF=1.21%) and two genes harbouring functional variants that are associated with asthma in a gene-based analysis: GSDMB at the 17q12–21 asthma locus in the Latino and combined samples (P=7.81 × 10−8 and 4.09 × 10−8, respectively) and MTHFR in the African ancestry sample (P=1.72 × 10−6). Our results suggest that associations with rare and low-frequency variants are ethnic specific and not likely to explain a significant proportion of the ‘missing heritability’ of asthma.

Asthma is a common, chronic inflammatory disease of the airways with typical onset in childhood. Heritability estimates indicate that approximately half the variation in risk is attributable to genetic factors [1][2][3]; yet, the common variants identified by genome-wide association studies (GWAS) account for very little of the genetic risk [4][5][6][7]. This so-called 'missing heritability' in asthma and other common diseases has been attributed to many potential causes that generally fall into two categories 8 . First, the genetic variants interrogated by GWAS may not capture all relevant variation. In particular, rare variants, which comprise the bulk of genetic variation in the human genome and are predicted to have larger phenotypic effects than common variants, are not represented or tagged by the single-nucleotide polymorphisms (SNPs) that are included on the genotyping arrays typically used in GWAS. Second, the statistical approaches used in GWAS may be overly simplistic and not adequately model the genetic architecture of asthma, which is probably polygenic and includes many gene-environment interactions 9 .
In this study, we begin to address the first limitation of GWAS by investigating the role of rare (minor allele frequency (MAF) <1%) and low-frequency (MAF 1-5%) variants in asthma. We conducted this meta-analysis study in the largest ethnically diverse sample of asthma cases, non-asthmatic controls and case-parent trios, which include 13 studies 4,10,11 from 8 groups of investigators who compose the EVE Consortium on Asthma Genetics 4,10 . We conducted analyses separately in each of the 13 study samples and used meta-analyses to first combine results within each of three major ethnic/racial groups (European American, African American/African Caribbean and Latino) and then across all ethnic/racial groups to yield measures of association for the combined sample. Here we report associations with a low-frequency functional variant and lower risk for asthma in Latinos in a novel gene, GRASP, and with functional variants in the MTHFR gene in an African ancestry sample. We also report an association in Latinos and in the combined sample with functional variants in GSDMB, a gene at the 17q12-21 asthma locus 4,5,[12][13][14] , which is attributed to a putatively damaging common missense mutation.

Results
Single-variant association analyses. We first evaluated the individual effects of low-frequency, putatively functional exonic variants on asthma risk in 11,225 subjects representing European Americans, African Americans, African Caribbeans, Mexican Americans, Mexicans and Puerto Ricans (Table 1). We tested each low-frequency variant, defined as having a MAF between 1% and 5% within each ethnic group. For the meta-analysis in the combined sample, we excluded variants that were either common (MAF ≥5%) in all three ethnic groups or rare (<1%) in all three ethnic groups (Supplementary Table 1 and Supplementary Table 2). This definition allows us to include SNPs that may be common in one population but rare in the other(s). The quantile-quantile plots for the analyses of low-frequency variants indicated that there was no inflation of test statistics (Supplementary Fig. 1). One missense mutation (c.C1093G, p.L365V) in the GRASP gene was associated with asthma risk in the Latinos after correcting for multiple testing (9,519 tests; Genome-wide Efficient Mixed Model Association (GEMMA) 15 /Family Based Association Test (FBAT) 16 meta-analysis, P=4.31 × 10−6; N=4,554; odds ratio (OR)=1.25; MAF=0.012). Six additional low-frequency variants were modestly associated at suggestive levels of significance (P<10−4) in at least one of the three groups or in the combined sample (Table 2).
Of these, five missense mutations (one each in FAM134B, NPC1L1, IKZF1, SLC26A3 and GSDMB) showed evidence of association with asthma in the combined sample, two of which (IKZF1 and SLC26A3) were more significant in the African ancestry sample. The low-frequency variants at FAM134B (c.G1145C, p.S382T; transcript accession NM_001034850), NPC1L1 (c.C2920T, p.P974S) and IKZF1 (c.G1009A, p.G337S; transcript accession NM_006060) were all associated with lower risk for asthma, whereas the variants in SLC26A3 (c.A241G, p.I81V) and GSDMB (c.A365G, p.E122G; transcript accession NM_001042471) were associated with increased risk for asthma. An additional nonsense mutation (c.C919T, p.R307X) in PDE4DIP was also associated with asthma risk in the African ancestry sample. The latter result is noteworthy for two reasons. First, an exome-sequencing study in a European American, non-Hispanic white nuclear family with two asthmatic and two non-asthmatic children also reported a missense mutation in PDE4DIP (c.A907C, p.I303L) segregating with asthma 17 . Second, this gene encodes a protein that interacts with PDE4D, a gene that was identified in a GWAS of asthma 18 . Our results further implicate this pathway in asthma pathogenesis.

Table 1
European American
  ALHS        776    802    859    719    1,578
  CAG         186    209    172    223    395
  CAMP*       376    706    588    494    1,082
  CHS         415    640    546    509    1,055
  COAST       134    119    150    103    253
  Total       1,511  376*   1,770  2,315  2,048  4,363
African Ancestry
  CAG         274    200    170    304    474
  GRADD-AA    236    288    199    325    524
  GRADD-AC†   227    297    250    274    524
  SAPPHIRE    553    233    278    508    786
  Total       1,290  1,018  897    1,411  2,308
Latino
  CHS         581    775    713    643    1,356
  GALA II-MA  568    604    568    604    1,172
  GALA II-PR  844    540    731    653    1,384
  MCAAS*      214    428    337    305    642
  Total       1,
The missense mutation in GSDMB is also intriguing, because this gene is at the well-replicated 17q12-21 asthma locus 4,5 ; this signal is largely attributable to the GALA II Mexican American sample (GEMMA, P=1.1 × 10−4; N=1,172). This variant showed striking MAF differences between the groups, ranging from 0.19 in the Latinos to 0.0031 in the African Americans and 0.0023 in the European Americans, but was included because it was neither rare nor common in all three groups (see Methods).
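The inclusion rule for the combined-sample meta-analysis can be illustrated with a minimal sketch. The function name and the two hypothetical contrast variants are assumptions made for illustration; only the 1% and 5% cut-offs and the MAFs for rs12450091 come from the text.

```python
def include_in_combined(mafs, rare=0.01, common=0.05):
    """Apply the combined-sample inclusion rule to per-group MAFs.

    A variant is dropped only if it is rare (< rare) in every group
    or common (>= common) in every group.
    """
    rare_everywhere = all(m < rare for m in mafs)
    common_everywhere = all(m >= common for m in mafs)
    return not (rare_everywhere or common_everywhere)


# MAFs reported in the text for rs12450091
# (Latinos, African Americans, European Americans).
print(include_in_combined([0.19, 0.0031, 0.0023]))  # True: kept

# Hypothetical variants, shown for contrast only.
print(include_in_combined([0.002, 0.004, 0.001]))   # False: rare in all groups
print(include_in_combined([0.12, 0.30, 0.08]))      # False: common in all groups
```

Under this rule a variant such as rs12450091, which is common in one group and rare in the other two, is retained for the combined analysis.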
Conditional analyses at the 17q12-21 locus. To determine whether the association with this variant (c.A365G, p.E122G; rs12450091) in Latinos and the combined sample represents independent risk for asthma or is in linkage disequilibrium (LD) with the common variants at the 17q12-21 locus that have been associated with asthma in previous GWAS 4,5,19,20 , we performed conditional analyses of rs12450091 in the case-control samples only (case-parent trio studies were excluded and meta P-value recalculated: Latino GEMMA meta-analysis P=6.89 × 10−5, N=3,912; combined GEMMA meta-analysis P=5.52 × 10−5, N=9,501). For this analysis, we first identified three common variants (one synonymous, one missense and one intronic) at this locus that were included on the exome array and associated with asthma in previous reports (G at rs907092 (refs 4,5,19), G at rs2305480 (refs 5,20) and T at rs7216389 (refs 5,21,22)). In Latinos, the risk allele at rs12450091 (C) tags 28% of the high-risk GGT haplotypes (GGCT frequency 18.09%, GGTT frequency 45.84%) and is not in LD with any of the known expression quantitative trait loci reported at this locus 12,[21][22][23][24] . We then repeated the association studies with rs12450091 (c.A365G, p.E122G) in GSDMB in three separate analyses, including each of the common variants as a covariate in each analysis to 'remove' the effects of the common variant on the association signal. In each of these three analyses, the evidence for association between rs12450091 and asthma was reduced from P=6.89 × 10−5 to P≤0.022 in Latinos and from P=5.52 × 10−5 to P≤0.019 in the combined sample (Table 3).
These results suggest that although most of the evidence for association with this variant was shared with the associated common variants, the residual signal in conditional analyses indicates that this missense substitution or a variant in strong LD with the missense substitution contributes additional, independent risk to asthma in Latinos.
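The 28% figure for the share of high-risk GGT haplotypes tagged by the C allele follows directly from the two haplotype frequencies quoted for Latinos. A short check, with variable names chosen here for illustration:

```python
# Haplotype frequencies in Latinos as reported in the text (percent).
freq_ggct = 18.09  # high-risk GGT background carrying C at rs12450091
freq_ggtt = 45.84  # high-risk GGT background carrying T at rs12450091

# Share of all high-risk GGT haplotypes that carry the C allele.
tagged_fraction = freq_ggct / (freq_ggct + freq_ggtt)
print(f"{tagged_fraction:.1%}")  # 28.3%
```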
Power analyses for low-frequency variants. After testing nearly 33,000 functional low-frequency exonic variants in 11,225 subjects, only 1 variant remained statistically significant in Latinos after correcting for testing of 9,519 variants and only 6 were modestly associated at P-values <10−4 in the African ancestry or the combined sample. To assess whether the paucity of associations was due to lack of power, we performed simulations to determine the power of our studies to detect associations with asthma for low-frequency variants. We considered ORs of 1.5, 2, 2.5, 3 and 4, within the expected ranges of effect sizes for […] Fig. 2). Thus, the lack of significant associations with low-frequency variants in our study could be due to low power in the ethnic/racial groups, small effect sizes of variants in this frequency range, or both.

Table 2 | Results of meta-analyses of single SNP association studies of low-frequency functional variants and asthma risk with association P-values <10−4.
Gene-based analyses. Studies of rare variants have virtually no power to detect individual associations in samples of even many thousands of individuals [25][26][27] . Rather, gene-based tests that combine the evidence for association among all variants within each gene provide more power to detect associations. To test the hypothesis that the overall burden of functional variants at each gene confers risk for asthma, we performed a gene-based association test as implemented in MetaSKAT-O 28 for each of the 10,208 genes represented on the exome array that had three or more functional variants present in at least two case-control samples. We included in these analyses all putatively functional variants on the array regardless of allele frequency. We performed two sets of analyses. In the first we weighted all variants equally, and in the second we gave larger weights to rarer variants (MetaSKAT-O default). For each of those analyses, we considered all potentially functional variants (missense, nonsense and splice site) and then considered only variants predicted to be 'probably damaging' by Polymorphism Phenotyping v2 (PolyPhen-2) 29 .
In the analysis with equal weights and defining functional variants more broadly, associations with GSDMB and ZPBP2, two genes at the 17q12-21 asthma locus, were significant in the combined (MetaSKAT-O: P=6.10 × 10−10, 16 variants and P=1.34 × 10−6, 7 variants, respectively; N=9,501) and Latino (MetaSKAT-O: P=1.02 × 10−6, 12 variants and P=1.73 × 10−6, 7 variants, respectively; N=3,912) samples after Bonferroni correction for 9,534 and 10,439 genes, respectively. In addition, the MTHFR gene was significant (MetaSKAT-O: P=1.72 × 10−6, 11 variants; N=2,308) in the African ancestry sample after multiple testing correction for 10,342 genes. When we considered only variants with PolyPhen-2 prediction of 'probably damaging', the association with GSDMB remained significant (combined MetaSKAT-O: […]), but the associations with the ZPBP2 and MTHFR genes did not. No genes were significant when larger weights were applied to rare variants in either of the analyses ([…]).
[…] (Table 5). Surprisingly, the proportion of functional variants on the array was similar in the three ethnic […] Thus, overall, the rare and low-frequency functional variation at the 17q12-21 locus was reasonably well covered by the array.

Discussion
Our study is the largest to date to assess the effects of rare variants on asthma risk in ethnically diverse individuals. Although we were limited by the variation present on the array and by relatively small samples within each ethnic/racial group, our results provide several insights into the genetic architecture of asthma. First, associations with rare or low-frequency variants are likely to be ethnic specific. The few significant associations that we detected were all limited to a single ethnic group, despite having the greatest power to detect associations in the combined sample. Two novel associations, one with a missense mutation in the GRASP gene in Latinos and one with the many rare missense mutations in the MTHFR gene in African Americans/African Caribbeans, were revealed. The MTHFR gene harbours 11 missense mutations in African Americans, only one of which is common (rs1801133), and showed a gene-level association with asthma. Although most mutations in this gene were not predicted to be 'damaging' by PolyPhen-2 scores, the sheer number of potentially functional variants and the strong association with asthma in the African ancestry sample make this gene a good candidate for further studies. Common variants in MTHFR, encoding methylenetetrahydrofolate reductase, have been associated with many common conditions, including neural tube defects, vascular disease and pregnancy complications, among others 18,31-33 . There have been conflicting reports as to whether the common C677T variant in this gene is associated with asthma or allergic diseases in European populations 4,5,34-36 , although maternal folate levels in pregnancy have been associated with risk of early asthma outcomes in children 37,38 . The missense mutation in GRASP that is associated with asthma in Latinos occurs at a frequency of 0.012 in this population, at higher frequency (0.034) in European Americans and at very low frequency (0.006) in the African ancestry samples. 
The fact that the association was confined to Latinos even though we had greater power to detect associations in the European Americans suggests that the effects of this variant on asthma risk might involve interactions with other variants or environmental exposures that are more common in Latinos, but this single-variant association requires replication in an independent Latino sample or functional validation of this variant's role in asthma risk. GRASP encodes GRP1 (general receptor for phosphoinositides-1)-associated scaffold protein and has been shown to be highly expressed in the brain with lower levels of expression in the lung, heart, embryo, kidney and ovary 39 . Although little is known about the function of this gene or its potential role in asthma, scaffold proteins have been implicated in asthma signalling pathways 40,41 . Second, among the six low-frequency variants showing suggestive evidence for association with asthma (Table 2), three variants were associated with decreased risk and three were associated with increased risk. The discovery of potentially protective alleles in three genes, FAM134B (family with sequence similarity 134, member B1), NPC1L1 (Niemann-Pick disease, type C1, gene-like 1) and IKZF1 (IKAROS family zinc finger 1), is intriguing because elucidating the function of these variants could identify novel therapeutic targets for asthma.
Third, the GSDMB (gasdermin B) locus on chromosome 17q12-21 is within an LD block that has been robustly associated with asthma 4-6,12,13 ; yet, the causal gene and its function are still unclear. The SNPs discovered by GWAS that are associated with asthma are also associated with the expression of both GSDMB and its close neighbour ORMDL3 in white blood cells 12 , peripheral blood mononuclear cells 21 , lymphoblastic cell lines 42 , CD4+ T cells 23,24 and lung tissue 22 , and it has been assumed both that the causal SNPs at this locus are expression quantitative trait loci for these genes and that one or both of these genes are involved in asthma pathogenesis. Our study revealed several lines of evidence of association with GSDMB and extensive functional variation within this gene, including a common missense mutation (c.G871A, p.G291R; rs2305479) that is predicted to be 'probably damaging' by PolyPhen-2. This common variant was identified in the previous EVE meta-analysis of GWAS 4 (P=1.21 × 10−12 in the combined sample) and is in strong LD with the GWAS-identified SNP, but its potential as a causal candidate has not previously been discussed. Our study suggests rs2305479 as a good candidate for the causal variant at this important locus and implicates GSDMB more directly in asthma risk. We identified a second potential causal variant, a common SNP within a splicing site in GSDMB (rs11078928; c.661-2A>G; transcript accession NM_001165959). This SNP was previously shown to have a genotype-dependent effect on GSDMB transcript levels, whereby the asthma risk allele (A) was associated with increased expression of an alternate transcript lacking exons 5-8 (ref. 43). Although not much is known about the function of GSDMB or of its alternative splice forms, other gasdermin genes are involved in apoptosis in epithelial cells 44 , processes and cells that are relevant to asthma. 
It is additionally notable that in contrast to GSDMB, there was only one rare missense mutation in ORMDL3 in Latinos (rs200735199, c457T, p.T96M; transcript accession NP_644809; MAF 0.005) in the whole-genome sequences (WGS) from 278 asthmatics, and this variant was not included on the exome array.
In summary, we report significant associations of asthma with rare or low-frequency variants in two novel genes, GRASP and MTHFR, in Latinos and the African ancestry sample, respectively, and with a potentially damaging common missense mutation in GSDMB in Latinos and the combined sample. Based on the WGS from 278 asthmatic subjects, our study included over 60% of all putatively functional rare and low-frequency variation. Thus, even if we discovered only half of all truly associated variants, it is unlikely that rare and low-frequency variants will account for a significant portion of the heritability of asthma.

Methods
Study subjects. The subjects in the study were from 13 individual studies from eight investigators that comprise the EVE Consortium on the Genetics of Asthma. This sample has large overlap with our previous asthma meta-analyses 4,10 , but includes three additional studies: GALA II, COAST and ALHS. Descriptions of each of the studies can be found in prior published work 4,10,11 or in Supplementary Note 1. The results reported here are based on analyses of 4,794 asthma cases, 4,707 asthma controls and 590 case-parent trios (Table 1).
[…] Table 5). We excluded from analyses variants with genotype call rates <95%, Hardy-Weinberg equilibrium P-values <10−4 in any of the studied populations (African American, African Caribbean, European American and Mexican/Mexican American or Puerto Rican) and 333 caution sites reported as problematic by the exome array design group (http://genome.sph.umich.edu/wiki/Exome_Chip_Design). Among the variants passing quality control, 197,339 (79.61%) were variable in the individuals under study. In the combined sample, 23,206 (11.76%) variants were common (MAF ≥5%) to all ethnic groups, 36,413 (18.45%) were low-frequency (MAF 1-5%) and 137,720 (69.79%) were rare (MAF <1%) in all three groups. Private variants for each group and counts of variants tested are shown in Supplementary Table 1 and Supplementary Table 2. Additional probe quality and genotype-calling evaluations were performed and are described in Supplementary Note 2.
Sample quality control. Six hundred and seventy-nine subjects were excluded from analysis (Supplementary Table 6), resulting in a final sample size of 11,225 individuals. Sample exclusions included gender discordances, sample duplications, Mendelian errors in trios, as implemented in PLINK 45 , missing case-control status and incomplete trios. In addition, we completed principal component analysis using the R function prcomp to identify ancestry outliers (Supplementary Figs 7-9). Principal component analysis was carried out by using the ancestry informative markers present on the exome array and including HapMap 46 CEU, YRI, CHB, JPT and MEX subjects. Fifty-six samples were excluded because one or both of the first two principal components were more than six s.d. from the mean of the expected ethnic group. These inferred principal components were applied as covariates in the gene-based tests.
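The six-s.d. exclusion rule on the first two principal components can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function name is hypothetical, the toy coordinates are invented, and the group mean and s.d. are assumed to come from a separate reference set for the expected ethnic group.

```python
from statistics import mean, stdev


def ancestry_outliers(samples, reference, n_sd=6.0):
    """Return indices of samples lying more than n_sd standard deviations
    from the reference mean on PC1 or PC2.

    samples, reference: lists of (pc1, pc2) tuples.
    """
    centers = [mean(axis) for axis in zip(*reference)]
    spreads = [stdev(axis) for axis in zip(*reference)]
    flagged = []
    for i, point in enumerate(samples):
        # Flag the sample if either component exceeds the cut-off.
        if any(abs(x - c) > n_sd * s
               for x, c, s in zip(point, centers, spreads)):
            flagged.append(i)
    return flagged


# Invented toy coordinates for one expected ethnic group (assumption).
reference = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.05), (-0.05, -0.05)]
samples = [(0.02, 0.03), (1.0, 0.0), (0.0, -0.9)]
print(ancestry_outliers(samples, reference))  # [1, 2]
```

Computing the mean and s.d. from a reference set avoids the situation where an extreme sample inflates the s.d. used to judge it.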
Single variant analyses. We subjected functional low-frequency variants to single variant association analysis for each of the three ethnic groups (European American, African American and Latino) and for the combined sample. Low-frequency variants (1-5%) were included in each of the ethnic-specific meta-analyses (8,249 in the European Americans, 17,861 in the African ancestry and 9,519 in the Latinos) and the variants that were neither common (≥5%) in all three ethnic groups nor rare (<1%) in all three ethnic groups (32,681 variants) were meta-analysed in the combined sample (Supplementary Table 2). To control for population stratification and relatedness, single SNP association tests for case-control studies were completed using the linear mixed model GEMMA 15 . For the samples comprising case-parent trios, we used the FBAT 16 . Z-score statistics obtained from GEMMA and FBAT were combined in a meta-analysis as a weighted sum to reflect the power of each independent study. The weight (w) used for each SNP is a function of allele frequency (p), proportion of cases (v) and sample size (N) of each study 10 . P-values were calculated using standard normal approximations and Bonferroni significance thresholds were applied (α=0.05) to correct for the number of variants tested within each group and in the combined sample. ORs were obtained for samples analysed with GEMMA as a linear combination of log ORs with weights proportional to the s.e. of the log ORs using the R package meta.
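A weighted-sum combination of per-study Z-scores can be sketched as below. The exact weight formula is not given in the text, so `study_weight` uses an assumed form, sqrt(2p(1-p) · v(1-v) · N), chosen only because it depends on the three quantities the text names. The per-study inputs are invented, and only the Bonferroni check at the end uses numbers from the text.

```python
from math import sqrt
from statistics import NormalDist


def study_weight(p, v, n):
    """Assumed weight (not from the source): grows with allele-frequency
    variance 2p(1-p), case/control balance v(1-v) and sample size n."""
    return sqrt(2 * p * (1 - p) * v * (1 - v) * n)


def meta_z(z_scores, weights):
    """Weighted sum of Z-scores, scaled to unit variance under the null."""
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided P-value from the standard normal approximation."""
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical per-study inputs: (Z-score, MAF, case proportion, N).
studies = [(2.1, 0.012, 0.43, 1356), (1.8, 0.011, 0.48, 1172), (2.4, 0.013, 0.61, 1384)]
weights = [study_weight(p, v, n) for _, p, v, n in studies]
z = meta_z([s[0] for s in studies], weights)
print(round(z, 3), two_sided_p(z))

# Bonferroni check with numbers from the text: 9,519 tests in Latinos.
threshold = 0.05 / 9519
print(4.31e-6 < threshold)  # True
```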
Power analyses for single-variant analyses. For each ethnic group and for the combined sample, we performed the following simulations. We randomly selected 1,000 variants with MAF between 0.5% and 5%, and simulated genotypes 1,000 times for each of 5 ORs (OR=1.5, 2, 2.5, 3 and 4). We used the same study structure (sample sizes and case-control or parent-offspring designs) and observed MAF distributions that were present in our data. Z-scores were then meta-analysed with the weighted approach described in the single-variant meta-analysis methods and power was calculated with a predefined α of 0.05. We plotted power estimates versus MAF and implemented a local linear regression fit with the R function loess (Supplementary Fig. 2).
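The idea behind these simulations can be illustrated with a much simpler stand-in: a single case-control study analysed with an allelic two-proportion Z-test. This is not the GEMMA/FBAT meta-analysis pipeline described above. The function name, sample sizes, number of replicates and seed are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(maf, odds_ratio, n_cases, n_controls,
                   alpha=0.05, n_sims=200, seed=1):
    """Estimate power by simulating allele counts in cases and controls
    and applying a two-sided two-proportion Z-test at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Allele frequency in cases implied by an allelic odds ratio.
    case_af = odds_ratio * maf / (1 - maf + odds_ratio * maf)
    n1, n0 = 2 * n_cases, 2 * n_controls  # alleles per group
    hits = 0
    for _ in range(n_sims):
        a = sum(rng.random() < case_af for _ in range(n1))
        b = sum(rng.random() < maf for _ in range(n0))
        pooled = (a + b) / (n1 + n0)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
        # se is zero only if no minor alleles were drawn at all.
        if se > 0 and abs(a / n1 - b / n0) / se > z_crit:
            hits += 1
    return hits / n_sims


# The five ORs considered in the text, at a hypothetical MAF of 1%.
for odds_ratio in (1.5, 2, 2.5, 3, 4):
    print(odds_ratio, simulate_power(0.01, odds_ratio, 500, 500))
```

Fixing the seed makes each estimate reproducible; a larger `n_sims` narrows the Monte Carlo error at the cost of run time.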
Gene-based analyses. Functional variant sets were defined two ways: (1) missense, nonsense or variants within splice sites or (2) coding variants with PolyPhen-2 (ref. 29) prediction scores ≥0.957 (probably damaging variants). PolyPhen-2 predicts possible impact of an amino acid substitution on the structure and function of human proteins. In the first functional definition set, variant pruning for each gene was completed within each ethnic group by randomly selecting one variant for each pair of functional variants with LD r² >0.9 (r² and D′ values were calculated with PLINK). Requiring each gene to comprise at least 3 functional variants, we considered a total of 140,986 functional coding variants (8,933 genes) in European Americans, 148,677 functional variants (10,342 genes) in African Americans, 149,236 functional variants (10,439 genes) in Latinos and 151,006 functional variants (9,534 genes) in the combined sample. In the second functional definition set, we considered a total of 20,077 functional coding variants (1,977 genes) in European Americans, 22,625 functional variants (2,465 genes) in African Americans, 22,644 functional variants (2,427 genes) in Latinos and 22,248 functional variants (2,316 genes) in the combined sample. We performed a meta-analysis of gene-based association tests applying the first five principal components as implemented in MetaSKAT-O 28 . Analyses were completed with two sets of variant weights: either larger weights on rare variants (MetaSKAT-O default parameter) or equal weights for all variants. Bonferroni significance thresholds were applied (α=0.05) to correct for the number of genes tested within each group and in the combined sample.
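The Bonferroni thresholds implied by the gene counts above can be checked against the gene-based P-values reported in the Results. The gene counts and P-values below are taken from the text; the variable and function names are illustrative.

```python
ALPHA = 0.05

# Genes tested under the broad functional definition (from the text).
GENES_TESTED = {"combined": 9534, "Latino": 10439, "African ancestry": 10342}

# Gene-based MetaSKAT-O P-values reported in the Results (equal weights).
REPORTED = [
    ("GSDMB", "combined", 6.10e-10),
    ("ZPBP2", "combined", 1.34e-6),
    ("GSDMB", "Latino", 1.02e-6),
    ("ZPBP2", "Latino", 1.73e-6),
    ("MTHFR", "African ancestry", 1.72e-6),
]


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test significance threshold for n_tests independent tests."""
    return alpha / n_tests


for gene, group, p_value in REPORTED:
    threshold = bonferroni_threshold(GENES_TESTED[group])
    print(f"{gene:6s} {group:17s} P={p_value:.2e} "
          f"threshold={threshold:.2e} significant={p_value < threshold}")
```

Each reported P-value falls below its group-specific threshold (roughly 5 × 10−6), consistent with the significance claims in the Results.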
Assessment of functional variants on the Illumina exome array. WGS for the 278 asthmatics was carried out at Complete Genomics (CGI) and resulted in an average read depth of 95.4% at 20× or greater across the exome target and 91.4% across the genome. We excluded singletons from this analysis that were discovered in the 278 WGS but not present in dbSNP137 or in the Exome Sequencing Project variant server. We selected functional variants (missense, nonsense and splice site mutations) that were of low-frequency or rare (MAF <5%) in any of the ethnic groups in the WGS and identified the overlap with the variants present on the exome array.