Introduction

It is generally accepted that both genetic and environmental factors contribute to the development of complex diseases. Thus, gene–environment (G × E) interaction is a hot topic in human genetics, and there are great expectations for potential applications. Personalized medicine or individualized lifestyle recommendations based on the genetic profile are being promoted as the future of public health, and these expectations are used to justify the substantial funds devoted to studying the genetics of human diseases. However, up to now there are only a few replicated, biologically plausible and methodologically sound examples of G × E interactions with proven clinical relevance1, 2 and even fewer are used in routine clinical practice.3 The extent to which G × E interactions are of general importance for the development of common, complex diseases is currently unknown, even though important examples exist. Formal genetic evidence for G × E interaction can consist of the observation that a certain exposure has different effects in different populations or ethnic groups, or in people with different genetically determined phenotypes. One example is exposure to sunlight, which raises the risk of melanoma much more in fair-skinned than in dark-skinned people; that is, there is an interaction between ultraviolet light and skin pigmentation.4

Constant advances in genotyping technology now enable genome-wide association studies, and researchers are tempted to investigate their data as comprehensively as possible, including G × E interactions. In this review, we present the perspectives for clinical applications, clarify definitions, and discuss the range of applications as well as the design and required sample size of epidemiological G × E studies. We conclude with some cautionary remarks on the methodological challenges of such studies.

Potential applications of G × E interactions in public health and clinical care

The most important area of application for G × E interactions is personalized medicine, both in prevention and in treatment (pharmacogenetics). Regarding prevention, personalized recommendations could be developed if the effect of an environmental risk factor strongly depends on an identified genetic polymorphism. In this sense, assessing the effects of genotypes in different exposure strata, or conversely the effects of environmental exposures on disease risk in different genotype groups, might be useful even without a priori knowledge of the precise biological mechanisms underlying the statistical interaction. However, even a strong interaction does not imply that high-risk individuals can be easily identified for a targeted intervention, as many other factors will usually be important in disease development. This is the case, for example, for most so-called ‘sporadic cancers’, where a strong stochastic element is presumably involved in carcinogenesis, making accurate prediction of individual disease risk almost impossible.2 Moreover, most study designs will not yield unbiased effect estimates; the influence of the investigated risk factors is often overestimated.5, 6 From a public health perspective, the idea of personalized recommendations and targeted intervention has been questioned, as the overall benefit of small changes at the population level may be larger than that of large changes in high-risk individuals.7 Whenever the interaction results only in a stronger or weaker detrimental effect of an exposure in the different genotype groups, all individuals may benefit from avoiding the exposure, provided that the exposure is causally related to the disease.
It is precisely in this situation that general recommendations, such as those regarding exercise, smoking and diet, are advisable.8 Personalized recommendations, however, may be considered reasonable when an exposure has a null or harmful effect in one genotype group and a protective effect in another.

The second area of application, pharmacogenetics,9 also relies on the existence of such strong G × E interactions. It implicitly assumes that individuals with different genotypes will benefit from different medications in a predictable manner.3 Although it is plausible that patients' different reactions to drugs depend on their individual genetic ‘make-up’, the systematic study of such interactions is still in its infancy. A prerequisite for widespread use in clinical practice is that the genetic variant is a sufficiently strong predictor of harm or benefit.5 One example is anticoagulant treatment, where warfarin clearance is known to depend on the genotype of the metabolizing enzyme cytochrome P-450 2C9 (CYP2C9). About one-third of Caucasian patients carry one of the polymorphisms that require a reduced maintenance dose of warfarin to avoid adverse side effects. Before genetic information is integrated into clinical practice, randomized controlled trials will be required to demonstrate the benefit of including CYP2C9 genotype in warfarin dosing (together with other covariates) compared with traditional dose-finding methods.10, 11 For a more detailed view of the potential impact of pharmacogenetics on public health, we refer to a review by Goldstein et al.12

Definition and meaning of interaction

When reviewing the literature, one will often notice that statisticians, clinicians, biologists and geneticists use the term interaction with different connotations and for different concepts.13, 14 Frequently, a precise definition is omitted altogether, which may lead to confusion and controversy between scientists of different disciplines. In general contexts, ‘G × E interaction’ is quite commonly used in a very loose sense, meaning some sort of interplay between genetic and environmental factors, without implying a specific mode of joint action or a certain relationship between statistical risks. Sometimes it is even used merely to express that several factors contribute to disease risk, without excluding the possibility that they act completely independently. In such cases, a term such as ‘joint action’ would be preferable. If ‘interaction’ is used in a narrower sense, it can refer to either the biological (causal) or the statistical level. We define both below, introduce commonly used statistical terminology and finally distinguish interaction from confounding.

Biological interaction is defined as the joint effect of two factors that act together in a direct physical or chemical reaction, or as the coparticipation of two or more factors in the same causal mechanism of disease development.15 It is also referred to as causal or mechanical interaction. An example of biological interaction is the direct reaction of an exposure with an enzyme whose detoxification ability depends on the genotype of a certain gene. A good overview of possible causal relationships and interaction mechanisms is given by Ottman.16 Such etiological mechanisms have to be explored by functional studies.

On the other hand, there is statistical interaction, which does not imply any inference about particular biological modes of action. Statistical interaction (or heterogeneity of effects) is usually defined as ‘departure from additivity of effects on a specific outcome scale’.14 If only one factor is present, its effect on the risk of disease is called the main effect. When two or more risk factors are present, the marginal effect of a risk factor is its average effect across all levels of the other risk factors. The risk factors are said to interact if the effect of one risk factor depends on the level of the other (Table 1). Several equivalent terms for statistical interaction exist, such as non-additivity, effect measure modification or heterogeneity of effects. The joint effect of two risk factors comprises both their marginal effects and their interaction effect. It can range from less than additive (subadditive) to more than multiplicative (supramultiplicative) with respect to the individual marginal effects. Theoretical models for such interaction relationships have been explored especially for cancer development, where carcinogens act at different stages.17

Table 1 Example of additive and multiplicative models of relative risks for an environmental and a genetic risk factor
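The two reference models can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the relative risks are hypothetical values (not the entries of Table 1), and the two departure measures it reports, the relative excess risk due to interaction (RERI) for the additive scale and the ratio of observed to expected joint relative risk for the multiplicative scale, are common choices added here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeRisks:
    """Relative risks versus the doubly unexposed group (G=0, E=0, RR=1)."""
    rr_g: float   # genotype only (G=1, E=0)
    rr_e: float   # exposure only (G=0, E=1)
    rr_ge: float  # both factors (G=1, E=1), as observed


def expected_joint_additive(rr: RelativeRisks) -> float:
    # Additive model: excess risks add, so RR_GE = RR_G + RR_E - 1
    return rr.rr_g + rr.rr_e - 1.0


def expected_joint_multiplicative(rr: RelativeRisks) -> float:
    # Multiplicative model: RR_GE = RR_G * RR_E
    return rr.rr_g * rr.rr_e


def reri(rr: RelativeRisks) -> float:
    # Relative excess risk due to interaction; 0 means no departure from additivity
    return rr.rr_ge - rr.rr_g - rr.rr_e + 1.0


def multiplicative_interaction(rr: RelativeRisks) -> float:
    # Observed / multiplicatively expected joint RR; 1 means no departure
    return rr.rr_ge / (rr.rr_g * rr.rr_e)


if __name__ == "__main__":
    example = RelativeRisks(rr_g=2.0, rr_e=3.0, rr_ge=6.0)  # hypothetical values
    print(expected_joint_additive(example))        # 4.0
    print(expected_joint_multiplicative(example))  # 6.0
    print(reri(example))                           # 2.0
    print(multiplicative_interaction(example))     # 1.0
```

In this hypothetical example the joint effect is exactly multiplicative (ratio 1.0) yet more than additive (RERI 2.0), so whether the data "show an interaction" depends entirely on the chosen scale, which is why the outcome scale should always be stated.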

Interactions are sometimes divided into removable and nonremovable:18 if a monotone transformation (eg taking logarithms or square roots of quantitative phenotypes) exists that removes the interaction19 (Figure 1), it is called removable. This implies that there is an additive relationship between the variables, just on a different scale. Therefore, nonremovable interactions are usually of greater interest. To complete the terminology, nonremovable interaction effects are also called crossover effects20 or qualitative interactions (as opposed to quantitative, ie removable interactions).

Figure 1

Examples of main and interaction effects. Phenotypic values depending on genotype G (two groups, eg under a dominant genetic model) and exposure E (also two groups, exposed (dotted line) and unexposed (solid line)). (a) Neither G nor E has a main effect and there is no interaction; (b) G has a main effect, E has no main effect, no interaction; (c) E has a main effect, G has no main effect, no interaction; (d) both G and E have main effects, no interaction; (e) G and E have main effects and there is an interaction (which could be removed by changing the phenotype scale, eg to a logarithmic scale); (f) G and E have main effects and there is an interaction (which cannot be removed by any monotone transformation).
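The distinction between removable and nonremovable interaction can be checked mechanically for a 2 × 2 layout of mean phenotypes like those in Figure 1. The sketch below is a minimal illustration under stated assumptions: the cell means are hypothetical, the phenotype is quantitative, and all four means are distinct. The `is_removable` criterion (no crossover in either direction) is a restatement of the definition for this simple 2 × 2 case only, not a general test.

```python
import math
from typing import Callable


def interaction_contrast(m00: float, m01: float, m10: float, m11: float,
                         transform: Callable[[float], float] = lambda x: x) -> float:
    """E effect in genotype group G=1 minus E effect in G=0, on the scale given
    by `transform`; 0 means no interaction on that scale.
    m<g><e> is the mean phenotype for genotype group g and exposure group e."""
    t = transform
    return (t(m11) - t(m10)) - (t(m01) - t(m00))


def is_removable(m00: float, m01: float, m10: float, m11: float) -> bool:
    """For four distinct cell means, a strictly increasing transformation that
    removes the interaction exists exactly when there is no crossover: the E
    effect has the same direction in both genotype groups and the G effect has
    the same direction in both exposure groups."""
    def sign(x: float) -> int:
        return (x > 0) - (x < 0)

    same_e_direction = sign(m01 - m00) == sign(m11 - m10)
    same_g_direction = sign(m10 - m00) == sign(m11 - m01)
    return same_e_direction and same_g_direction


if __name__ == "__main__":
    # Hypothetical means: exposure doubles the phenotype in both genotype groups
    print(interaction_contrast(1, 2, 3, 6))            # 2 (interaction on the raw scale)
    print(interaction_contrast(1, 2, 3, 6, math.log))  # approximately 0 on the log scale
    print(is_removable(1, 2, 3, 6))                    # True
    # Hypothetical crossover: exposure raises the phenotype for G=0 but lowers it for G=1
    print(is_removable(1, 3, 2, 0.5))                  # False
```

The first pattern corresponds in spirit to panel (e), where taking logarithms restores additivity; the second is a qualitative (crossover) interaction as in panel (f), which no monotone rescaling can remove.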

Furthermore, it is necessary to distinguish between interaction and confounding of environmental and genetic factors. Confounding refers to a mixing of extraneous effects with the effect of interest,14 for example, a (true but unmeasured) risk factor for disease that is correlated with the investigated risk factor and thereby produces a noncausal association. In the context of interactions, this could primarily be a correlation between the genetic and environmental risk factors, which could be misinterpreted as an interaction if the statistical model does not account for the correlation but treats the factors as independent. Such a gene–environment correlation can occur in samples with latent population substructure (eg unintentionally including groups of different ethnicity) where both risk allele frequencies and exposure frequencies vary between subpopulations. It can also result from the influence of genes on behaviors such as alcohol consumption or food and satiety responsiveness, which in turn are related to diseases such as coronary heart disease or obesity. In many other contexts, confounding would not be a serious concern, as genotype and environmental risk factors will usually be independent: genotypes are fixed throughout life and are thus not influenced by or associated with environmental exposures (cf. the concept of ‘Mendelian randomization’21, 22). At the data level, confounding and interaction may lead to similar patterns, especially in partial collection designs such as the case-only design. An identified interaction should therefore be interpreted carefully, considering whether confounding could explain part of the observed effect.
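To see how latent substructure alone can create a gene–environment correlation, consider a sample pooled from subpopulations in each of which genotype and exposure are independent. The sketch below (all frequencies hypothetical) computes the resulting G–E odds ratio in the pooled sample exactly; any value different from 1 is an association produced purely by the mixing.

```python
from typing import Iterable, Tuple


def pooled_ge_odds_ratio(subpopulations: Iterable[Tuple[float, float, float]]) -> float:
    """Odds ratio between a binary genotype G and a binary exposure E in a sample
    pooled from subpopulations in each of which G and E are independent.

    Each subpopulation is given as (weight, p_g, p_e): its share of the pooled
    sample, its genotype frequency and its exposure frequency (weights sum to 1).
    """
    p11 = p10 = p01 = p00 = 0.0
    for weight, p_g, p_e in subpopulations:
        # Independence within a subpopulation: joint probability = product of marginals
        p11 += weight * p_g * p_e
        p10 += weight * p_g * (1 - p_e)
        p01 += weight * (1 - p_g) * p_e
        p00 += weight * (1 - p_g) * (1 - p_e)
    return (p11 * p00) / (p10 * p01)


if __name__ == "__main__":
    # A single homogeneous population: no G-E association
    print(round(pooled_ge_odds_ratio([(1.0, 0.3, 0.4)]), 6))  # 1.0
    # Two equally sized subpopulations with different G and E frequencies
    print(round(pooled_ge_odds_ratio([(0.5, 0.1, 0.2), (0.5, 0.5, 0.6)]), 2))  # 2.19
```

With these illustrative frequencies, pooling induces a G–E odds ratio of about 2.2 although G and E are independent within each subpopulation; a model that assumed independence in the pooled sample could mistake this for an interaction, particularly in a case-only analysis.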

When should G × E interactions be investigated?

The analysis of G × E interactions in genetic epidemiology can be performed at different stages of the research process and with varying scope. Research questions that could be addressed by a G × E interaction study include the identification of new disease genes, the characterization of gene effects, and the clinical relevance and public health impact of a G × E interaction.

In the phase of identifying genetic risk factors, accounting for a G × E interaction might increase the power to detect genes with small marginal effects,23, 24, 25 especially if the effect of a gene is relevant only in an etiological subgroup of patients defined by a certain exposure. Here, the interaction is not of specific interest per se. Especially for high-throughput genotyping of polymorphisms in hundreds of candidate genes, or for genome-wide association studies with several hundred thousand polymorphisms, the inclusion and testing of interactions greatly increases the number of statistical tests and thus the need to correct for multiple testing. Joint tests of marginal and interaction effects25 may retain reasonable power over a wide range of unknown true situations. However, in the absence of a very strong interaction, tests for marginal gene effects are still the most powerful way to identify a disease-related gene.
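For binary G and E, the joint (2 degree-of-freedom) test of a genetic main effect and a G × E interaction has a convenient closed form. The sketch below assumes a case–control table stratified by exposure and a logistic model in which E is always included; under these assumptions both the null and the alternative models are saturated within exposure strata, so the likelihood-ratio statistic equals the sum of the two stratum-specific likelihood-ratio (G) statistics for genotype–disease association. The counts are hypothetical, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

# Rows: cases, controls; columns: G=1, G=0
Table2x2 = Sequence[Sequence[float]]


def g_statistic(table: Table2x2) -> float:
    """Likelihood-ratio (G) statistic for independence in a 2 x 2 table."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][c] + table[1][c] for c in range(2)]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            observed = table[r][c]
            expected = row_sums[r] * col_sums[c] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                stat += 2.0 * observed * math.log(observed / expected)
    return stat


def joint_test_g_and_gxe(unexposed: Table2x2, exposed: Table2x2) -> Tuple[float, float]:
    """2-df likelihood-ratio test of H0: no G main effect and no G x E interaction.

    With binary G and E and E kept in the model, the statistic is the sum of the
    G statistics within the two exposure strata.
    """
    stat = g_statistic(unexposed) + g_statistic(exposed)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for 2 df
    return stat, p_value


if __name__ == "__main__":
    unexposed = [[20, 80], [40, 160]]  # same carrier odds in cases and controls
    exposed = [[60, 40], [30, 70]]     # carriers over-represented among exposed cases
    stat, p = joint_test_g_and_gxe(unexposed, exposed)
    print(f"LR statistic = {stat:.2f} on 2 df, p = {p:.2g}")
```

In a genome-wide setting, each marker's joint p-value would then be compared with a threshold adjusted for the number of markers tested rather than with a nominal 5% level.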

Alternatively, a G × E study can be part of the detailed characterization of genes that have already been shown to be involved in disease etiology but whose effects may vary across environmental strata. In this case, the interaction itself is of interest, and the aim of an initial study may be primarily hypothesis generating (exploratory), possibly investigating several environmental factors or different polymorphisms within one gene to provide effect size estimates. The next step would be to establish the clinical relevance of a detected G × E interaction, which involves confirmatory testing of one specific a priori hypothesis within the clinical population and under the circumstances proposed for later application. It also includes estimating the strength of the interaction (effect size, eg odds ratio). Ideally, such investigations will be part of a randomized controlled (phase III) trial. Finally, assessments of the public health impact of an established G × E interaction depend on the strength of the interaction and on the exposure and allele frequencies. More importantly, however, the ascertainment strategy and the study design require careful consideration if the study results are to be generalized.

Study designs for G × E

Common family- and population-based designs for association studies can be extended to G × E interactions. Table 2 lists different designs with their respective advantages and disadvantages and the research situations in which each design would be suitable. Family-based designs protect against bias due to population stratification, that is, subgroups that differ in both exposure and genotype distributions. In population-based designs, data on a quantitative trait or a disease phenotype are collected from unrelated individuals, either prospectively (cohort) or retrospectively (case–control). If a large prospective cohort exists, a nested case–control study can reduce selection and possibly stratification biases and may be a good compromise between cost and efficiency.29 For the relative merits of cohort and case–control designs, see also the discussion started by Clayton and McKeigue,21 who argue that case–control studies are more feasible and cost-efficient than cohort studies for modest disease risks and that exposure misclassification bias is not a serious threat in the case of G × E interactions. Others, however, stress this possible bias and emphasize the merit of cohorts for studying multiple end points, and especially different diseases, in one sample.30, 31, 32, 33

Table 2 Study designs for genetic association studies that can include G × E interactions with their main advantages and disadvantages and the situations in which these designs are most suitable

If interest is limited to the G × E interaction, the special ‘case-only’ design can be used, which has the practical advantage that no controls need to be collected.34 This design is based on the assumption that genotype and environmental exposure are independent in the population from which the case sample is drawn, so that exposure should not differ among subgroups defined by genotype. Since, in the presence of a G × E interaction, specific combinations of genotype and exposure lead to an increased risk of disease and are thus more prevalent among cases, differences in exposure will be observable between genotype groups among the cases. Because it exploits the independence assumption, the case-only design is more efficient than the traditional case–control design, but this assumption cannot be checked in the case sample alone. The design is therefore prone to bias and confounding, especially if there is exposure misclassification (keeping in mind that lifetime environmental exposures in particular are not as accurately measurable as genotypes).35, 36, 37, 38 Another drawback is that although the G × E interaction can be estimated, the joint effect of exposure and genotype cannot,39 even though the latter is usually of greater importance for the public health aspect of a G × E investigation. As a consequence, the practical applicability of this design is limited and it is rarely applied. The case–control design is better suited to address the relevant research questions,40 and if one is willing to assume gene–environment independence, analysis methods exist that also exploit this assumption.39, 41
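The logic of the case-only design can be verified with exact expected frequencies. The sketch below assumes a multiplicative (log-linear) risk model, P(D | g, e) = r0 · RR_G^g · RR_E^e · RR_GE^(g·e), with binary G and E; all parameter values and function names are hypothetical. Under this model the G–E odds ratio among cases equals the population G–E odds ratio multiplied by RR_GE, so it recovers the interaction only when G and E are independent in the population (under a logistic model the equality is approximate and relies on the disease being rare).

```python
from itertools import product
from typing import Dict, Tuple

Cells = Dict[Tuple[int, int], float]  # keyed by (g, e), with g and e in {0, 1}


def independent_population(p_g: float, p_e: float) -> Cells:
    """Joint (G, E) distribution when genotype and exposure are independent."""
    return {(g, e): (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            for g, e in product((0, 1), repeat=2)}


def case_distribution(pop: Cells, r0: float, rr_g: float, rr_e: float, rr_ge: float) -> Cells:
    """Expected (G, E) distribution among cases under the multiplicative risk model."""
    weights = {}
    for g, e in product((0, 1), repeat=2):
        risk = r0 * rr_g**g * rr_e**e * rr_ge**(g * e)
        if not 0 < risk <= 1:
            raise ValueError(f"risk {risk} for cell {(g, e)} is not a probability")
        weights[(g, e)] = pop[(g, e)] * risk
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def odds_ratio(cells: Cells) -> float:
    """G-E odds ratio of a 2 x 2 (G, E) distribution."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


if __name__ == "__main__":
    params = dict(r0=0.01, rr_g=1.5, rr_e=2.0, rr_ge=2.5)
    # G and E independent in the population: the case-only OR recovers RR_GE
    print(round(odds_ratio(case_distribution(independent_population(0.3, 0.4), **params)), 3))  # 2.5
    # G and E associated in the population (OR about 2.19): the case-only OR is biased
    correlated = {(1, 1): 0.16, (1, 0): 0.14, (0, 1): 0.24, (0, 0): 0.46}
    print(round(odds_ratio(case_distribution(correlated, **params)), 3))  # 5.476
```

The second call shows how a modest gene–environment correlation in the source population, which the case sample alone cannot reveal, more than doubles the apparent interaction.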

Two special, nonstandard applications of G × E interactions occur in infectious disease and pharmacogenetic studies. In infectious diseases, only individuals exposed to the infectious agent can contract the disease; thus, the environmental factor is a necessary causal factor. Genes may modify the risk of infection (or disease severity) in those exposed.42, 43, 44 Examples are the CCR5 gene and HIV infection,45 heterozygosity for sickle cell anemia and malaria,46 and a polymorphism in codon 129 of the prion protein gene PRNP and variant Creutzfeldt–Jakob disease.47 In these examples, individuals with certain genotypes have a much lower risk of infection or progression to serious disease. Infectious disease studies usually include only individuals at high risk of infection (assumed to be exposed). Here, the aim is to investigate potential differences in disease prevalence between genotype groups, similar to the usual genetic association or linkage studies, without explicit consideration of G × E interaction in the statistical analysis. Such differences can then be interpreted as G × E interactions, since the genotype alone cannot lead to an infectious disease. Similarly, some pharmacogenetic studies of licensed drugs aim to identify individuals at risk of serious side effects or with increased efficacy by including only drug-treated patients. In this design, it is impossible to distinguish between genetic effects and G × E interaction. A more suitable design includes pharmacogenetic aspects in randomized clinical trials, with placebo or active drug allocated randomly within genotype strata.48, 49

Sample size and power

Depending on the strength of the interaction and on exposure and allele frequencies, the sample size required to detect a statistically significant G × E interaction may be substantially larger than that required to identify a G or E marginal effect. Some illustrative examples for association studies of a candidate gene are shown in Figure 2, which gives the required sample sizes for four different study designs (case–control, trio, case-only and cohort) for varying effect sizes of the G × E interaction. Only for very weak marginal effects (OR=1.2, Figure 2a) and at least moderate interactions (OR>1.5) is the interaction detectable with a smaller sample size than the marginal effect. But even for slightly larger marginal effects (OR=1.5, Figure 2b) and weak to moderate interactions (OR<2), the sample size required to detect the interaction can be several-fold higher than that required to detect the marginal genetic effect. These examples are based on a significance level (0.01) that might be used in a confirmatory study testing one well-defined a priori hypothesis (eg one polymorphism within one gene). Sample sizes would be much higher for exploratory studies such as genome-wide association scans with hundreds of thousands of markers, as the correction for multiple testing requires much smaller significance levels and thus much larger samples. In addition, these studies rely on linkage disequilibrium between the genotyped markers and potentially untyped disease alleles, and such indirect association studies may need much larger sample sizes.50 Especially for G × E interactions that might realistically be even weaker, large cohorts such as UK Biobank (planned with 500 000 individuals over 10 years33), EPIC51 and the Multi-ethnic Cohort52 will be necessary.
Although a sample size of 500 000 might be sufficient for common diseases such as type II diabetes, it will still be insufficient for rarer diseases with a prevalence below approximately 1%, for which case–control studies might be the only feasible approach.21

Figure 2

Sample size requirements for 80% power to detect a gene–environment (G × E) interaction for different study designs depending on the strength of the interaction. Sample sizes for case–control, case–parent trio, and case-only designs were calculated using Quanto59 (http://hydra.usc.edu/gxe), assuming an analysis by (conditional) logistic regression. For the cohort design, sample sizes were estimated using Power58 (http://dcegqa.cancer.gov/bb/tools/power), which is based on a prospective binary response model. Shown are the numbers of individuals required to detect a significant G × E interaction effect at α=0.01 with a power of 80%. Solid lines represent the case–control design, dotted lines the trio design, dashed lines the case-only design and dotted-dashed lines the cohort design. The horizontal solid line represents the sample size required for 80% power to detect a genetic main effect using a case–control design. The interaction odds ratio was varied between 1.25 and 3, whereas the main effects of the genetic and environmental risk factors were 1.2 (a) and 1.5 (b). The disease model was defined by a recessive disease allele with frequency 0.3. The environmental risk factor had a prevalence of 30%. The baseline risk of the disease was 10%. The sample sizes to detect the genetic main effect, which were constant in the two scenarios, were 43 045, 16 196 and 19 860 in (a) for the cohort, case–control and trio design, and 7712, 3070 and 3423 in (b), respectively. For a dominant disease allele, similar relations between required sample sizes are observed for the different designs.

Note that sample size and power calculations are also possible for other study designs, for example for association studies of quantitative traits,53 categorical or continuous exposure variables54 as well as for pharmacogenetic study designs.55, 56, 57 Freely available software programs such as Power,58 Quanto59 or a Stata program by Saunders et al60 may be used if required.
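For a quick plausibility check before turning to such programs, a large-sample Wald approximation can be written in a few lines. The sketch below is not the method implemented in Quanto or Power and will not reproduce Figure 2. It assumes binary G (eg carrier status) and binary E, independence of G and E in the source population, a multiplicative risk model P(D | g, e) = r0 · RR_G^g · RR_E^e · RR_GE^(g·e), equal numbers of cases and controls, and a saturated logistic model in which the variance of the estimated log interaction odds ratio is approximately the sum of the reciprocal expected counts of the eight case and control cells. All parameter values and function names are hypothetical.

```python
import math
from itertools import product
from statistics import NormalDist
from typing import Dict, Tuple

Cells = Dict[Tuple[int, int], float]  # keyed by (g, e)


def expected_cell_proportions(p_g: float, p_e: float, r0: float, rr_g: float,
                              rr_e: float, rr_ge: float) -> Tuple[Cells, Cells]:
    """Expected (G, E) proportions among cases and among controls."""
    cases: Cells = {}
    controls: Cells = {}
    for g, e in product((0, 1), repeat=2):
        pop = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)  # G, E independent
        risk = r0 * rr_g**g * rr_e**e * rr_ge**(g * e)          # multiplicative model
        cases[(g, e)] = pop * risk
        controls[(g, e)] = pop * (1 - risk)
    for table in (cases, controls):
        total = sum(table.values())
        for cell in table:
            table[cell] /= total
    return cases, controls


def _z_squared(alpha: float, power: float) -> float:
    # (z_{1-alpha/2} + z_{power})^2 for a two-sided test
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * z


def n_cases_interaction(p_g, p_e, r0, rr_g, rr_e, rr_ge, alpha=0.01, power=0.8) -> int:
    """Approximate number of cases (and of controls) for the G x E Wald test."""
    cases, controls = expected_cell_proportions(p_g, p_e, r0, rr_g, rr_e, rr_ge)

    def or_g(e: int) -> float:
        return (cases[(1, e)] * controls[(0, e)]) / (cases[(0, e)] * controls[(1, e)])

    log_psi = math.log(or_g(1) / or_g(0))  # interaction on the odds-ratio scale
    var_per_subject = sum(1 / p for p in cases.values()) + sum(1 / p for p in controls.values())
    return math.ceil(_z_squared(alpha, power) * var_per_subject / log_psi**2)


def n_cases_marginal_g(p_g, p_e, r0, rr_g, rr_e, rr_ge, alpha=0.01, power=0.8) -> int:
    """Approximate number of cases (and of controls) for the marginal G Wald test."""
    cases, controls = expected_cell_proportions(p_g, p_e, r0, rr_g, rr_e, rr_ge)
    a = cases[(1, 0)] + cases[(1, 1)]        # carriers among cases
    c = controls[(1, 0)] + controls[(1, 1)]  # carriers among controls
    b, d = 1 - a, 1 - c
    log_or = math.log((a * d) / (b * c))
    var_per_subject = 1 / a + 1 / b + 1 / c + 1 / d
    return math.ceil(_z_squared(alpha, power) * var_per_subject / log_or**2)


if __name__ == "__main__":
    scenario = dict(p_g=0.3, p_e=0.3, r0=0.01, rr_g=1.5, rr_e=1.5)
    print("marginal G:", n_cases_marginal_g(**scenario, rr_ge=1.5))
    for rr_ge in (1.25, 1.5, 2.0, 3.0):
        print("G x E, RR_GE =", rr_ge, ":", n_cases_interaction(**scenario, rr_ge=rr_ge))
```

Under these assumptions, and at α=0.01 as in Figure 2, the interaction test with RR_GE = 1.5 needs considerably more subjects than the marginal genetic test, which matches the qualitative pattern described above; the absolute numbers should not be used for planning a real study.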

Methodological challenges and perspectives

In summary, the methodological requirements for a G × E interaction study are largely determined by the research question. We therefore conclude by addressing five common caveats that need to be considered: the study aims, the conduct of a study, the reporting and interpretation of results, the extension of inferences, and clinical relevance.

First, one should distinguish between primarily exploratory (ie hypothesis-generating) and confirmatory (hypothesis-testing) study aims. In our opinion, genome-wide association studies and small initial studies can only be considered exploratory. The latter will often be performed, for example because of difficult or time-consuming phenotyping, limited availability of the required biological material (eg tissue samples) or financial constraints. Both approaches are important and valid first steps in research, but their exploratory nature has to be kept in mind. Such smaller studies are valuable for generating hypotheses that should then be tested for confirmation in adequately powered, presumably larger studies. However, inadequate sample sizes lead to underpowered studies that give rise to both false-negative and false-positive findings, especially at the hypothesis-generating stage. Biological relationships cannot be inferred from genetic–epidemiological studies; further functional experiments are necessary for this.

Second, a well-designed confirmatory study of G × E interaction should be based on a justifiable a priori hypothesis of an interaction between a plausible or established gene with known function and a known environmental risk factor with some link to gene function, for which a reasonable biological interaction mechanism exists. Only hypotheses and statistical tests prespecified before data collection can be interpreted as confirmatory. Ideally, there is evidence from formal genetic studies (eg twin studies or segregation analyses) of an interaction between the exposure and genetic factors. Next, an appropriate study design (see above) must be chosen, and a sufficiently large sample needs to be phenotyped and genotyped. Finally, an adequate statistical analysis is needed, including a multiple comparison procedure to control the type I error if more than one statistical test is conducted.
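As one example of a multiple comparison procedure, the sketch below implements Holm's step-down procedure next to a plain Bonferroni correction. Both control the family-wise type I error rate, and Holm rejects at least as many hypotheses as Bonferroni. The choice of Holm is made here only for illustration, and the p-values are hypothetical.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure; results are returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection; larger p-values are retained
    return reject


if __name__ == "__main__":
    # e.g. four prespecified G x E tests
    p = [0.001, 0.04, 0.015, 0.3]
    print(bonferroni_reject(p))  # [True, False, False, False]
    print(holm_reject(p))        # [True, False, True, False]
```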

Third, reporting and interpretation of detected G × E interactions should be faithful and balanced. Reporting should focus on the range of true effects that would be compatible with the observed data (using confidence intervals of effect estimates), and it should be discussed whether these effects could be of clinically relevant size. Less emphasis should be placed on the results of significance tests (P-values), as these will be misleading if the reader is unaware of the multiple tests performed. To avoid publication bias, all test results (or at least the number of tests performed) must be reported, not only interactions that are nominally significant (eg at the 5% level). Overreporting and overinterpretation of results will lead to inconsistent and inconclusive findings.2, 61 Even when results are described carefully, effect estimates in initial reports tend to be biased5, 6 and may vary between populations with different allele and exposure frequencies.
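Reporting a G × E interaction together with its confidence interval is straightforward for a binary genotype and a binary exposure. The sketch below computes the interaction odds ratio ψ = OR(G | E=1) / OR(G | E=0) from a case–control 2 × 2 × 2 table, with a large-sample Wald interval on the log scale; the counts are hypothetical and every cell is assumed to be nonzero.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple

Counts = Dict[Tuple[int, int], int]  # keyed by (g, e), with g and e in {0, 1}


def interaction_or_ci(cases: Counts, controls: Counts,
                      level: float = 0.95) -> Tuple[float, float, float]:
    """Return (psi, lower, upper) for the G x E interaction odds ratio."""
    def or_g(e: int) -> float:
        # Genotype odds ratio within exposure stratum e
        return (cases[(1, e)] * controls[(0, e)]) / (cases[(0, e)] * controls[(1, e)])

    log_psi = math.log(or_g(1) / or_g(0))
    # Wald standard error: square root of the summed reciprocal counts of all 8 cells
    se = math.sqrt(sum(1 / n for n in cases.values()) + sum(1 / n for n in controls.values()))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_psi), math.exp(log_psi - z * se), math.exp(log_psi + z * se)


if __name__ == "__main__":
    cases = {(0, 0): 200, (0, 1): 100, (1, 0): 100, (1, 1): 100}
    controls = {(0, 0): 400, (0, 1): 200, (1, 0): 200, (1, 1): 100}
    psi, lower, upper = interaction_or_ci(cases, controls)
    print(f"psi = {psi:.2f}, 95% CI {lower:.2f} to {upper:.2f}")  # psi = 2.00, 95% CI 1.25 to 3.20
```

For these made-up counts the point estimate is ψ = 2.0, but the interval runs from roughly 1.25 to 3.2; presenting that range, rather than only a P-value, shows readers how weak or how strong an interaction remains compatible with the data.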

Fourth, if some evidence for a G × E interaction is observed, its biological plausibility should be critically discussed and potential confounders or intermediate pathways have to be explored. Here, one has to keep in mind that conclusions dealing with a certain biological mechanism cannot be confirmed or rejected by statistical arguments based on epidemiological data alone.20 Only in light of additional lines of evidence, such as functional experiments, may the inferences toward causality be extended.

Finally, even though the potential clinical relevance or impact of a reported G × E interaction may be discussed, these implications should be evaluated in subsequent studies designed for that specific purpose. At this stage, the choice of the appropriate phenotype(s) is of special importance: clinically relevant end points and disease-related phenotypes, such as myocardial infarction, need to be studied before study results are embedded in public health programs or exploited for personalized medicine and individualized lifestyle recommendations.1, 5 Note that physiological and biochemical phenotypes (endophenotypes), such as lipid or IgE levels, may be closer to the underlying gene action and may thus be more appropriate for elucidating the biological mechanism underlying an interaction. Such biomarkers are, however, at most surrogate risk factors for a disease. Clinical relevance, by contrast, requires the predictive or discriminative power of the genotype for the clinically defined disease (eg death due to myocardial infarction) or for treatment success (eg extended survival time) to be sufficiently high. This will predominantly be the case for strong qualitative interactions.

When these challenging requirements are fulfilled, research on G × E interactions can yield valuable insights into the etiology of complex diseases. Ultimately, this knowledge may contribute to more effective strategies for prevention and treatment.