Introduction

Studying how environmental factors interact with single nucleotide polymorphisms (SNPs) to moderate or modify their effects on disease risk can advance our understanding of disease etiology. The genome–environment-wide interaction study (GEWIS) strategy has been used to agnostically search for SNP × environment interactions over the entire genome [1]. However, this strategy suffers from a large multiple testing burden. Additionally, to maintain comparable power, the detection of interactions generally requires sample sizes around four times greater than those required to test main effects [2]. Thus, most studies are underpowered to perform genome-wide scans of SNP × environment interactions.

Table 1 Empirical type I error for each choice of STT for the Gamma Method

Methods have been proposed to combine gene–environment (GE) interactions across a gene to test for interaction of an environment with a gene, rather than a SNP, of interest [3,4,5]. This approach not only reduces multiple testing, but can also increase power by aggregating multiple small SNP-level GE effects. Gene-level interaction tests can be further combined into gene-set (or pathway) tests to evaluate whether genetic variation in a given gene-set/pathway interacts with the environment to influence the outcome.

Gene-set analyses have been used extensively to study the main effects of genes, with methods for testing two different types of hypotheses: competitive and self-contained [6]. In a competitive gene-set analysis (GSA), data across the entire genome are used to assess whether observed associations cluster in the gene-set of interest more than in the rest of the genome. In contrast, self-contained GSAs only consider data for a given pathway of interest to test whether genetic variants in the gene-set are associated with the outcome. In the context of self-contained analysis of genetic main effects, two-step approaches were shown to be more powerful than one-step approaches [6]. In a two-step approach, gene-level p-values are first computed, and then the gene-level p-values within a pathway are combined using strategies such as Fisher’s Method or the Gamma Method [7]. A key assumption of these p-value combination methods is that the gene-level p-values are independent. However, because gene-level p-values may be correlated due to SNPs in genes within the same pathway being in linkage disequilibrium (LD), this assumption may be incorrect, resulting in an inflated probability of a type I error for the pathway-level test. To avoid this problem, permutation techniques are typically used to control type I error in pathway-level tests of genetic main effects.

There has been a lot of progress in developing methods to study genetic main effects in a gene-set [6,7,8]. One popular pathway tool, MAGMA [9], can also jointly test genetic effects and GE interactions. However, no methods have been proposed for solely assessing GE interactions in a gene-set. In this paper, we address the challenges of assessing GE interactions in a two-step self-contained GSA (GE-GSA). We consider the analysis of pre-defined gene-sets, for example, a set of genes within a specific biological pathway defined in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/pathway.html) or the Gene Ontology (GO) Consortium database (http://www.geneontology.org). We apply a two-step approach in which gene-level p-values are combined using the Gamma Method [7]. While permutation approaches can be used to obtain a valid pathway-level p-value in a GSA of genetic main effects, in a GE interaction analysis, it is typically not possible to construct an exact permutation test. Bužková et al. [10] showed that a parametric bootstrap, originally proposed by Efron [11], can be used instead when testing GE interactions at the gene level. Here we extend this approach to develop a test for GE interactions for a gene-set, and perform simulations to evaluate the type I error and power of our method.

We demonstrate the proposed method by testing whether the risk of obesity associated with bipolar disorder (BD) is modified by genetic variation in the Wnt signaling pathway using data from the Genetic Association Information Network (GAIN) study of BD. A previous analysis of the GAIN data revealed a genome-wide significant association of BD with the interaction between body mass index (BMI) and an SNP in the gene TCF7L2 [12]. TCF7L2 codes for the transcription factor TCF/LF, which plays a role in the Wnt signaling pathway [13]. The Wnt signaling pathway has crucial implications in neurodevelopment, neurogenesis, and neuroplasticity, and is involved in mechanisms of action of medications used to treat BD [14, 15]. Therefore, here, we extend the prior findings of the association of BD with the TCF7L2 SNP × BMI interaction, to investigate interaction effects with other genetic variants in the Wnt signaling pathway. We run the GE-GSA including and excluding TCF7L2 from the gene-set to study interactions involving genetic variation in the pathway and to assess whether observed associations at the pathway level are due solely to the previously described TCF7L2 interaction.

Methods

Assume n unrelated subjects are genotyped for a set of variants in a gene-set or pathway consisting of K genes, with \(p_k\) SNPs in the kth gene and \(P = \mathop {\sum }\nolimits_{k = 1}^K p_k\) total SNPs. Let the genotypes (coded, for example, as the number of copies of the minor allele) at the \(p_k\) SNPs for the kth gene for the ith subject be denoted as \({\boldsymbol{G}}_i^{\left( k \right)} = \left( {G_{i1}^{\left( k \right)}, \ldots ,G_{ip_k}^{\left( k \right)}} \right)\). Denote the phenotype (e.g., disease or quantitative trait), covariates, and environment for the ith subject as \(Y_i\), \({\boldsymbol{X}}_i\), and \(E_i\), respectively. We model the GE interactions for both continuous and binary outcomes in a generalized linear model (GLM) framework. In particular, we model continuous traits using linear regression and binary (disease) traits using logistic regression. Thus, we consider the model:

$$g\left( {\mu _i} \right) = \alpha _0 + {\boldsymbol{\alpha }}_1{\boldsymbol{X}}_i + \alpha _2E_i \!+\! \mathop {\sum }\limits_{k = 1}^K {\boldsymbol{\alpha }}_3^{\left( k \right)}{\boldsymbol{G}}_i^{\left( k \right)} \!+\! \mathop {\sum }\limits_{k = 1}^K {\boldsymbol{\beta }}^{\left( k \right)}({\boldsymbol{G}}_i^{\left( k \right)} \!\times\! E_i),$$
(1)

where g(·) is the canonical link function for the mean, \(\mu _i = E\left( {Y_i\left| {{\boldsymbol{X}}_i,E_i,{\boldsymbol{G}}_i^{\left( 1 \right)}, \ldots ,{\boldsymbol{G}}_i^{\left( K \right)}} \right.} \right)\), of the phenotype, and \(({\boldsymbol{G}}_i^{\left( k \right)} \times E_i) = (G_{i1}^{(k)}E_i, \ldots ,G_{ip_k}^{(k)}E_i)\) is the vector of \(p_k\) GE interactions. We are interested in testing the null hypothesis \(H_0{:}\;{\boldsymbol{\beta }}^{\left( k \right)} = 0\) for all k = 1,…,K.

A two-step GE-GSA begins by testing for GE interaction in each gene within the gene-set. There are many options for performing a gene-level GE interaction test [5]. Here we use principal components (PCs) to reduce the dimensionality of SNP data within a gene: for the kth gene, we calculate PCs using all \(p_k\) SNPs within the gene and retain the first \(q_k\) components (\(q_k < p_k\)) that explain 80% of the variance within the gene. We then test the GE interactions for the kth gene by performing a score test of the \({\boldsymbol{\beta }}^{\left( k \right)}\) term in the following model:

$$g\left( {\mu _i} \right) \!=\! \alpha _0^{(k)} \!+ {\boldsymbol{\alpha }}_1^{(k)}{\boldsymbol{X}}_i + \alpha _2^{(k)}E_i + {\boldsymbol{\alpha }}_3^{(k)}{\boldsymbol{PC}}_i^{\left( k \right)} \!+ {\boldsymbol{\beta }}^{\left( k \right)}\left( {{\boldsymbol{PC}}_i^{\left( k \right)} \!\times\! E_i} \right),$$
(2)

where \({\boldsymbol{PC}}_i^{\left( k \right)}\) is the set of \(q_k\) PCs in the kth gene for the ith person and \(({\boldsymbol{PC}}_i^{\left( k \right)} \times E_i) = (PC_{i1}^{(k)}E_i, \ldots ,PC_{iq_k}^{(k)}E_i)\) is the vector of \(q_k\) PC × E interactions. By using PCs to model the effects of SNPs in a gene, the collinearity of correlated SNPs within a gene is removed, and the number of parameters in the model is reduced by \(2(p_k - q_k)\).
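The choice of the number of retained components can be illustrated with a small sketch. It assumes the eigenvalues of a gene's SNP covariance (or correlation) matrix have already been computed elsewhere; the function names and the example eigenvalues are hypothetical and are not taken from the authors' code. Each SNP or PC in the model carries one main-effect and one interaction coefficient, which is where the factor of 2 in the parameter count comes from.

```python
def n_components_for_variance(eigenvalues, target=0.80):
    """Smallest q such that the q largest eigenvalues reach `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for q, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return q
    return len(ordered)


def parameters_saved(p_k, q_k):
    """Main-effect plus interaction coefficients removed by using q_k PCs instead of p_k SNPs."""
    return 2 * (p_k - q_k)


# Hypothetical eigenvalues for a gene with p_k = 6 SNPs in strong LD.
eigs = [3.0, 1.5, 0.75, 0.45, 0.2, 0.1]
q_k = n_components_for_variance(eigs)
print(q_k, parameters_saved(len(eigs), q_k))
```

In this toy example the first two components explain 75% of the variance, so a third is needed to reach the 80% target.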

Gamma Method

Once the K gene-level GE interaction p-values (pval1, … pvalK) are computed, we propose aggregating them to the gene-set using the Gamma Method. The test statistic for the Gamma Method can be written as

$$T = \mathop {\sum }\limits_{k = 1}^K G_{{\mathrm{\omega }},1}^{ - 1}\left( {1 - pval_k} \right),$$
(3)

where \(G_{{\mathrm{\omega }},1}^{ - 1}( \cdot )\) is the inverse cumulative distribution function (quantile function) of a gamma distribution with shape parameter ω and scale parameter 1. More emphasis can be given to p-values below a particular threshold, referred to as a soft truncation threshold (STT), by varying ω; the two are linked through \(\omega = {\mathrm{G}}_{{\mathrm{\omega }},1}^{ - 1}(1 - STT)\) [6]. The uncorrected p-value Pval for a pathway can be calculated for the test statistic T using its null distribution; when the p-values are independent and have a standard uniform distribution, T is a sum of K independent Gamma(ω, 1) variables, so its null distribution is a Gamma distribution with shape parameter Kω and scale parameter 1. When ω = 1, each term of T reduces to the negative natural logarithm of the corresponding p-value, and 2T follows a χ2 distribution with degrees-of-freedom (df) equal to 2K. This specific case of the Gamma Method, which corresponds to an STT of 1/e, was first described by Fisher [16] and is known as Fisher’s method. Because the optimal STT is unknown for a given dataset, it may be beneficial to search over multiple STTs and use the STT that leads to the smallest p-value. However, the minimum p-value resulting from this search (minGamma) will have an inflated probability of a type I error unless the multiple testing is correctly taken into account. Additionally, the Gamma Method assumes that gene-level GE interaction p-values are independent. Violation of this assumption can lead to an inflated probability of a type I error. Typically, permutations are used to control type I error when evaluating genetic main effects at a pathway level, but this approach is not applicable when assessing interactions [10]. Thus, we propose using a parametric bootstrap approach to control the type I error.
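As an illustration of the uncorrected (asymptotic) combination step, the following sketch covers only the Fisher special case, ω = 1. In that case each term of T is the negative log of a gene-level p-value, T follows a Gamma distribution with shape K and scale 1 under independence, and the upper tail has a closed form for integer shape, so only the standard library is needed. The function name is hypothetical; other values of ω would additionally require a gamma quantile function from a statistics library, which is not shown here.

```python
import math


def fisher_gene_set_pvalue(gene_pvalues):
    """Combine independent gene-level p-values with Fisher's method (omega = 1).

    T = -sum(ln pval_k) ~ Gamma(K, 1) under the null, and for integer shape K
    the upper tail is exp(-T) * sum_{j=0}^{K-1} T**j / j!.
    """
    if not gene_pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in gene_pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(gene_pvalues)
    t = -sum(math.log(p) for p in gene_pvalues)
    # Accumulate T**j / j! iteratively to avoid large factorials.
    term, tail_sum = 1.0, 1.0
    for j in range(1, k):
        term *= t / j
        tail_sum += term
    return math.exp(-t) * tail_sum


# Hypothetical gene-level interaction p-values for a three-gene set.
print(fisher_gene_set_pvalue([0.01, 0.20, 0.65]))
```

With a single gene the combined p-value equals the input p-value, which is a convenient sanity check. The result is valid only under independence of the gene-level p-values, which is the assumption the parametric bootstrap is meant to relax.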

Parametric bootstrap

Bužková et al. [10] proposed a parametric bootstrap to test for GE interaction at the gene-level. For a GE-GSA, the parametric bootstrap can be applied to estimate p-values using the following steps:

  1. Compute the uncorrected gene-set p-value (Pval) for the original data as described above in the section on the Gamma Method.

  2. Obtain parameter estimates of SNP and environment effects from the original data by fitting the gene-set model under the null hypothesis \(H_0{:}\;{\boldsymbol{\beta }}^{\left( k \right)} = 0\) for all k.

  3. For all subjects, simulate responses Y(b) from the model obtained in Step 2.

  4. Using the data simulated in Step 3, perform a two-step GE-GSA using the same PC-Gamma gene-set level GE test that was used in the original data, i.e., compute gene-level GE interaction p-values using the simulated response, and then combine the gene-level p-values in the gene-set using the Gamma Method with a chosen STT (or the minGamma method) to obtain the simulated gene-set p-value Pval(b).

  5. Repeat Steps 3–4 B times to approximate the distribution of the p-values under the null hypothesis.

  6. Compute the corrected (parametric bootstrap) p-value by comparing the uncorrected p-value from Step 1 with the simulated null p-values: PvalPB = \(\frac{1}{B}\mathop {\sum }\nolimits_{b = 1}^B I(Pval^{(b)} \le Pval)\), where I(·) is the indicator function, so that PvalPB is the proportion of simulated null p-values that are at least as small as the observed one. The standard error of PvalPB can be calculated using the binomial distribution: \(\sqrt {Pval_{PB}\left( {{\mathrm{1 - }}Pval_{PB}} \right){\mathrm{/}}B}\).
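The final comparison step can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the corrected p-value is taken as the share of null-simulated gene-set p-values that are at or below the observed one, and the two callables passed to `run_bootstrap` are hypothetical placeholders for the null-model simulation (Step 3) and the PC-Gamma gene-set test (Step 4). For minGamma, both the observed and the simulated values would be the minimum p-value over the STTs searched.

```python
import math


def parametric_bootstrap_pvalue(observed_pval, null_pvals):
    """Share of null-simulated p-values at or below the observed one, with binomial SE."""
    b = len(null_pvals)
    if b == 0:
        raise ValueError("need at least one bootstrap replicate")
    hits = sum(1 for p in null_pvals if p <= observed_pval)
    p_pb = hits / b
    se = math.sqrt(p_pb * (1.0 - p_pb) / b)
    return p_pb, se


def run_bootstrap(observed_pval, simulate_null_response, gene_set_pvalue, n_boot):
    """Steps 3-6 in outline; both callables are placeholders supplied by the caller."""
    null_pvals = [gene_set_pvalue(simulate_null_response()) for _ in range(n_boot)]
    return parametric_bootstrap_pvalue(observed_pval, null_pvals)


# Toy example with four hypothetical null p-values.
print(parametric_bootstrap_pvalue(0.03, [0.01, 0.02, 0.5, 0.9]))
```

Because the estimate is a proportion out of B replicates, its resolution is limited by B, which is one reason a large number of replicates is used for small p-values.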

Fitting the gene-set model under the null hypothesis (i.e., Step 2 above) requires estimating effects for the environment, covariates, and all SNPs in the gene-set. Since the number of SNPs in a gene-set is typically very large (P > n), this model usually cannot be fit using a standard GLM. Even if the parametric bootstrap were based on modeling the effects of PCs (rather than SNPs) under the null hypothesis of no GE effects, problems with estimation would still likely be encountered unless very few PCs were used, which may not appropriately model the main effects. Instead, we propose fitting the null model by treating the genetic main effects as random effects in the following model:

$$g\left( {\mu _i} \right) = \alpha _0 + {\boldsymbol{\alpha }}_1{\boldsymbol{X}}_i + \alpha _2E_i + \gamma _i,$$
(4)

where \(\gamma _i\) is the ith element of \(\gamma \sim {\mathrm{MVN}}(0,\sigma _G^2{\mathbf{GG}}^T)\) and

$${\mathbf{G}} = \left[ {\begin{array}{*{20}{c}} {{\mathbf{G}}_1^{(1)}} & {{\mathbf{G}}_1^{(2)}} & \cdots & {{\mathbf{G}}_1^{(K)}} \\ {{\mathbf{G}}_2^{(1)}} & {{\mathbf{G}}_2^{(2)}} & \cdots & {{\mathbf{G}}_2^{(K)}} \\ \vdots & \vdots & \ddots & \vdots \\ {{\mathbf{G}}_n^{(1)}} & {{\mathbf{G}}_n^{(2)}} & \cdots & {{\mathbf{G}}_n^{(K)}} \end{array}} \right],$$

which is an n × P matrix. This model is similar to that used in Genome-wide Complex Trait Analysis (GCTA) but is confined to only the SNPs in the gene-set of interest [17]. This formulation of our model avoids estimating the proportion of variance explained by each SNP and instead estimates the total proportion of variation explained by all of the SNPs in the gene-set. It can be shown that the above model is equivalent to using a ridge penalty to penalize the genetic main effects [18, 19]. With this parameterization, the approximate null gene-set model requires fitting only 3 + L parameters, where L is the number of covariates. This model can be fit using the R package GMMAT [20]. Responses can then be simulated from this null model (Step 3 above) using the best linear unbiased prediction. Code for performing the described parametric bootstrap approach can be found at https://github.com/bcoombes/Parametric_Bootstrap.
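The covariance structure of the random effect can be motivated with a short derivation. It is a sketch under an assumption added here for illustration: the P SNP main effects, collected in a vector \({\boldsymbol{u}}\) (notation introduced here), are independent draws from a normal distribution with common variance \(\sigma _G^2\).

$$\gamma = {\mathbf{G}}{\boldsymbol{u}},\quad {\boldsymbol{u}} \sim {\mathrm{MVN}}(0,\sigma _G^2{\mathbf{I}}_P)\quad \Rightarrow \quad {\mathrm{Var}}(\gamma ) = {\mathbf{G}}\,{\mathrm{Var}}({\boldsymbol{u}})\,{\mathbf{G}}^T = \sigma _G^2{\mathbf{GG}}^T.$$

Under this view a single variance component replaces the P separate SNP coefficients. For a continuous trait with residual variance \(\sigma _e^2\) (also notation introduced here), the standard linear mixed model result is that the best linear unbiased predictor of \({\boldsymbol{u}}\) coincides with the ridge estimate with penalty \(\lambda = \sigma _e^2/\sigma _G^2\), which is the sense of the ridge equivalence noted above.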

Simulations

To illustrate the proposed approach, we analyzed Wnt signaling pathway data in a subset of 388 BD cases and 1020 controls from the GAIN study. We considered models with BD as the outcome predicted by gene–BMI interactions, and with BMI as the outcome predicted by gene–BD interactions. Before analyzing the BD data, we performed simulations to study type I error and power of the proposed approach. For the simulations, we generated data with properties similar to the GAIN-BD Wnt signaling pathway data.

To simulate each replicate dataset, we first sampled genotypes in the Wnt signaling pathway from the GAIN-BD data without replacement N = 1000 times. Sex was independently sampled from a Bernoulli distribution with Pr(Male) = 0.5. We used the GAIN Wnt signaling pathway data to estimate each SNP effect and SNP × BD interaction effect on BMI, using regression models with scaled BMI as the outcome and with sex as a covariate. The directionality of the main and interaction effects for the top SNP in each gene was recorded. We randomly selected J different “causal” SNPs (J = 2, 20, or 100) from this list to have a main effect and an interaction in the simulations. We then simulated a quantitative trait conditional on sex, genotypes from the gene set, and BD status sampled from the real data using a linear model with a standard normal error term. The proportion of variation in the quantitative trait explained by the chosen causal SNPs and their interactions was held fixed at 30% throughout the simulations, so that under models with more causal SNPs the per-SNP effects would be smaller. We also analyzed a binary trait derived by dichotomizing the quantitative outcome at the median, to study the methods’ properties when the outcome is binary.

We calculated gene-level p-values using the PC method described above using the simulated quantitative or dichotomized quantitative variable as the outcome variable in the model and testing for the presence of G × BD interactions. Sex was included as a covariate in both analyses. We combined the gene-level results using the Gamma Method with STT equal to 0.01, 0.05, 0.10, 0.15, 0.2, or 1/e (Fisher’s method) as well as the minGamma approach, and computed uncorrected as well as parametric bootstrap gene-set level p-values (i.e., Pval and PvalPB, respectively) for each simulated dataset, in order to evaluate the performance of the proposed method in the absence or presence of GE interactions.

For type I error estimation, we simulated data where the top J SNPs had a main effect and no GE interactions. To estimate power, for the same J SNPs with main effects, we included interaction terms with BD. Type I error was estimated at the α = 0.05 and 0.01 levels based on 10,000 null replicate datasets, while power was estimated at the α = 0.05 level based on 1000 replicate datasets under each scenario.
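Empirical type I error and power are both rejection proportions over replicate datasets, so the same small helper covers either. The sketch below is illustrative only; the function names are hypothetical, and the Monte Carlo standard error uses the usual binomial approximation.

```python
import math


def empirical_rejection_rate(pvalues, alpha=0.05):
    """Proportion of replicate p-values at or below alpha, with its Monte Carlo SE."""
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one replicate")
    rate = sum(1 for p in pvalues if p <= alpha) / n
    return rate, math.sqrt(rate * (1.0 - rate) / n)


def nominal_se(alpha, n_replicates):
    """Monte Carlo SE expected when the true rejection rate equals alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_replicates)


# Precision available from 10,000 null replicates at the two levels used here.
print(nominal_se(0.05, 10000), nominal_se(0.01, 10000))
```

With 10,000 null replicates the expected standard error is roughly 0.0022 at α = 0.05 and roughly 0.0010 at α = 0.01, which gives a sense of how far an empirical type I error can drift from the nominal level by chance alone.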

GE-GSA of the GAIN-BD dataset

To illustrate the proposed GE-GSA approach, we used genetic data from the Wnt signaling pathway in a subset of subjects from the GAIN-BD study with available BMI data. The dataset [21] was obtained from dbGaP (phs000017.v3.p1), and includes the same set of subjects as the paper that originally reported the genome-wide significant association between BD and the TCF7L2-BMI interaction [12]. The analyzed gene-set was the Wnt canonical pathway as defined by KEGG. Gene regions were defined as 20 kb up/downstream of the RefSeq transcription start/end sites according to the Human Genome Browser build hg18 [37]. The analysis included 143 genes in the gene-set that had more than one SNP mapping to the gene; 3767 SNPs in these genes were included in our analysis. For each gene, we computed PCs accounting for 80% of the variation, which resulted in a total of 580 PCs in the pathway. We used either BD or log BMI as the outcome testing for either G × BMI or G × BD interactions in the Wnt pathway. BMI was log-transformed to create an approximately normal distribution. Sex was included as a covariate in the model.

Results

Type I error

We first assessed the empirical type I error of the proposed GE-GSA strategies for α = 0.05 and 0.01 (Table 1). We present results for a scenario with J = 2 SNPs that have a main effect but no GE interaction. Other choices of J yielded similar results (not shown). Results in Table 1 demonstrate that the uncorrected (asymptotic) Gamma Method, which assumes the gene-level tests are independent, has inflated probability of a type I error. As expected, the type I error for the minGamma method is even more inflated due to the search over multiple STTs. Using our parametric bootstrap strategy, the type I error for the Gamma and minGamma methods was controlled at the correct level, for either choice of α.

Table 2 Top ten gene-level interaction p-values for the GAIN dataset using either BD status (G × BMI p-value) or log BMI (G × BD p-value) as the outcome

Empirical power

We next compared the power of the GE-GSA strategies that maintained type I error. To evaluate empirical power at α = 0.05, we varied the number of SNPs with a main effect and an interaction with BD status (J = 2, 20, or 100). Figure 1 shows the power for the Gamma Method using either the largest or smallest choice of STT searched as well as the minGamma method across different choices of J. For J = 2, the smallest values of STT were most powerful, and as J increased, larger values of STT performed better. The minGamma method was robust to the choice of J and remained powerful across all scenarios. Fisher’s method (STT = 1/e) was more powerful than the other choices of STT when J was very large. As expected, using a dichotomized outcome resulted in a loss of power.

Fig. 1 Empirical power at the α = 0.05 level for different choices of STT and varying number of causal genes. Pathway analysis was performed using a quantitative trait (top row) or dichotomous trait (bottom row) as the outcome. The y-axis shows the empirical power.

GSA of the Wnt signaling pathway in GAIN-BD data

To test for GE interaction in the Wnt signaling pathway in the GAIN-BD data, we first computed gene-level p-values using the PC method. The gene-level p-values from the G × BMI or G × BD models were highly correlated (\(r^2 = 0.95\)). The top genes with interaction are listed in Table 2. The majority of gene-level results were more significant when testing for G × BD association with BMI rather than G × BMI association with BD; TCF7L2 was the second most significant gene in both analyses.

Table 3 GE gene-set (Wnt signaling pathway) analysis of the GAIN dataset

We next combined the gene-level results for each model using the strategies described in the Methods section. For the parametric bootstrap, 10,000 p-values were generated from the null model to generate an empirical p-value for the observed data. We report the gene-set GE interaction p-values for all choices of STT in Table 3. The asymptotic Gamma Method p-values are provided to show how the parametric bootstrap adjusts the p-value to control the type I error. When testing for association of BD with G × BMI interactions, significant evidence of interaction in the Wnt pathway was only obtained with Fisher’s method when TCF7L2 was included in the gene-set. When we tested for association of BMI with G × BD interaction, the Gamma Method with STT > 0.1 and the minGamma method provided significant evidence of interaction in the pathway regardless of whether TCF7L2 was included or excluded. Fisher’s method produced the smallest p-value throughout our analyses, which indicates that there are likely many genes with small GE interaction effects that contributed to the significant pathway-level interaction result. It is not surprising that the pathway-level results were more significant for models with BMI as the outcome, because the gene-level p-values for this model were typically smaller than the p-values for the model with BD as the outcome.

Discussion

In this paper, we proposed a variation of the parametric bootstrap to test for GE interaction in a gene-set. In a GE-GSA, the null model requires estimating a large number of genetic main effects. Instead of fitting all of the genetic main effects, we treated them as random effects, which allowed us to substantially reduce the number of parameters in the model. Treating the genetic main effects as random effects can be shown to be equivalent to penalizing the effects with a ridge penalty. The proposed parametric bootstrap technique corrected the inflated type I error probability associated with using an asymptotic Gamma Method in a GE interaction GSA. The parametric bootstrap also maintained correct type I error when searching over multiple values of STT, which provides a GE-GSA test that is powerful across a range of different underlying situations.

In our analysis of the GAIN-BD data using the parametric bootstrap, we found statistically significant evidence that the association of BD with BMI is modified by genetic variation in the Wnt signaling pathway. For this analysis, Fisher’s method produced the smallest p-value, which indicates that this pathway may contain many genes with G × BD interaction effects on BMI. Our results also suggest that the previously reported relationship between BD, BMI, and TCF7L2 genetic variation [12] may be reversed, with obesity risk being affected by interactions between BD status and the SNPs in the Wnt pathway rather than BD risk being affected by G × BMI interactions. This is in agreement with our simulation results, where we generated BMI as dependent on sex, genetics, BD, and G × BD interactions and found that the model that treated BMI as the outcome performed better than the model with BD as the outcome. However, directionality of the BD-BMI association cannot be conclusively determined from these analyses, and other approaches such as Mendelian Randomization would be required to assess causality.

One limitation of the parametric bootstrap is that, like all sampling-based techniques, it can be computationally intensive. Our analysis of one candidate pathway required us to analyze 10,000 samples generated from the null model. If we were interested in testing GE interactions for many gene-sets in a hypothesis-generating context, the parametric bootstrap would become computationally infeasible. In this situation, we recommend using the much faster uncorrected (asymptotic) Gamma or minGamma method, followed by analysis of the top resulting gene-sets using the parametric bootstrap to ensure proper control of type I error. Another limitation of the parametric bootstrap is that it assumes the null model is correct, and thus a severe misspecification of the null model could result in an inflated probability of a type I error. However, the parametric bootstrap is currently one of the only approaches that can be used to simulate a null distribution for GE interaction tests. Further investigation of the impact of null model misspecification and ongoing method development are therefore needed to address this issue. The current simulations also only considered the case where G and E are independent. More study is necessary to investigate GE dependence. Finally, while our simulations only used one type of gene-level GE interaction test, there are other gene-level GE interaction tests that could be implemented in this framework. In our future work, we plan to incorporate the proposed parametric bootstrap to explore which of these gene-level GE interaction tests are most powerful in a two-step pathway analysis.

In this paper, we used a parametric bootstrap approach to derive a valid GE-GSA test with correct type I error. It was previously shown that the parametric bootstrap can be a useful tool when the asymptotic distribution of a test statistic is difficult or impossible to derive. However, the parametric bootstrap requires fitting the null model, which sometimes poses a challenge. The parametric bootstrap approach that we proposed for GE-GSA overcame this challenge, and may prove useful in other “big data” applications, such as tests of interactions among SNPs in a gene-set, and gene-level tests of interactions when the number of SNPs in a gene is very large.