Tremendous activity in the development of methodology has now rendered the exhaustive search for pairwise genetic interactions computationally routine, but addressing the statistical problems of detecting epistasis remains a major challenge.
Most published reports of epistasis influencing human complex traits raise concerns about their validity and do not follow the strict protocols that are in place for reporting additive effects.
There is mounting evidence against the existence of pairwise epistatic effects influencing human complex traits that are sufficiently large for detection in standard single-sample genome-wide association studies (GWASs). If epistatic effects do influence complex traits, then each interaction effect will probably be small, as is observed with additive effects.
The majority of robust additive effects are found only when GWASs are carried out using huge sample sizes and good single-nucleotide polymorphism coverage, often as a result of multistudy meta-analyses. Similar approaches are necessary if epistatic effects are also to be robustly detected, although neither the methodology nor attempts at implementation have yet surfaced.
Methods have emerged for estimating the total contribution of additive effects across the whole genome; similar methods for estimating the total contribution of genetic interactions would be valuable but have not yet been developed.
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although this approach is evidently useful, it is often argued that it is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
The authors are grateful to three anonymous reviewers for help in improving the manuscript. W.-H.W. acknowledges financial support from the Higher Education Funding Council for England (HEFCE) and the Medical Research Council. G.H. is grateful for support from the Medical Research Council (MC_UU_12013/1-9) and from the US National Institutes of Health (GM057091). C.S.H. is grateful for financial support from the UK Medical Research Council and the Biotechnology and Biological Sciences Research Council.
The authors declare no competing financial interests.
- Complex traits
Traits for which variation between individuals is controlled by several or many genes and different environmental effects, potentially with interactions between these different effects.
- Mutational target size
The fraction of the genome in which new mutations can potentially cause variation for a trait. For most complex traits this is large, thus suggesting that many loci can influence trait variation.
- Causal variants
Genetic variants that directly modify a phenotype and/or cause a change of disease risk. Owing to the limited amount of variation interrogated by single-nucleotide polymorphism (SNP) genotyping microarrays, SNPs in genome-wide association studies typically merely tag the causal region rather than being the causal variants themselves.
- Genetic architecture
The complete description of the genetic factors influencing trait variation, such as the number of genetic loci, their effects, allele frequencies, actions and interactions.
- Epistasis
Statistical interactions between loci in their effect on a trait such that the impact of a particular single-locus genotype depends on the genotype at other loci.
- Narrow-sense heritability
(h²). The proportion of variation due to the additive effects of genes.
- Broad-sense heritability
(H²). The proportion of variation due to all genetic effects (that is, both additive and non-additive, including dominance and epistasis).
- Exhaustive search
A search of all possible pairwise combinations of loci for evidence of epistatic interactions.
- Bonferroni correction
The simplest method to control the family-wise error rate (α), and perhaps the most conservative when the number of independent hypothesis tests (n) is large; the corrected threshold is P_corrected = α/n.
- Hypothesis-free
An analysis in which no assumption is made about the loci involved in epistasis or their effects and so all possible pairs of single-nucleotide polymorphisms are tested (that is, an exhaustive search).
- Hypothesis-driven
An analysis that limits the combinations of loci tested for epistasis according to some prior hypothesis (for example, only loci with a marginal effect or loci involved in a particular biological pathway should be tested).
- Quantitative traits
Phenotypes that vary continuously (for example, height), in contrast to qualitative traits in which phenotypes are discrete (for example, diseased or healthy).
- Saturated and reduced models
There are nine joint genotypes for a pair of single-nucleotide polymorphisms (SNPs) each with three genotypes (for example, AA, Aa and aa). These can be modelled in full using nine parameters: one as the baseline (for example, aa/aa), two for each SNP (for example, AA/aa and Aa/aa) and four for interactions (for example, AA/Aa, AA/AA, Aa/Aa, Aa/AA). The saturated model fits all the nine parameters, whereas the reduced model fits the first five parameters and excludes the four interaction parameters.
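The parameter counts in this definition can be made concrete with a small sketch that builds one row of an indicator-coded design matrix for a pair of SNPs. The coding is an assumption made for illustration: genotypes are written as 0, 1 and 2 (aa, Aa and AA), with aa/aa as the baseline, and `design_row` is a hypothetical helper rather than part of any package.

```python
from itertools import product


def design_row(g1: int, g2: int, saturated: bool = True) -> list[int]:
    """Indicator-coded row for two SNP genotypes coded 0, 1, 2.

    Reduced model: baseline + 2 indicators per SNP (5 parameters).
    Saturated model: adds the 4 interaction indicators (9 parameters).
    """
    if g1 not in (0, 1, 2) or g2 not in (0, 1, 2):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    main = [1, int(g1 == 1), int(g1 == 2), int(g2 == 1), int(g2 == 2)]
    if not saturated:
        return main
    # One indicator for each non-baseline joint genotype.
    interactions = [int(g1 == a and g2 == b) for a, b in product((1, 2), repeat=2)]
    return main + interactions


n_saturated = len(design_row(0, 0))
n_reduced = len(design_row(0, 0, saturated=False))
print(n_saturated, n_reduced, n_saturated - n_reduced)
```

The difference of four parameters between the two codings corresponds to the four interaction terms that a comparison of the saturated and reduced models would assess.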
- Hardy–Weinberg equilibrium
(HWE). A principle stating that allele and genotype frequencies of variants in a population will remain constant from one generation to the next in the absence of evolutionary disturbing factors such as mutation and genetic drift.
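In practice, HWE is often checked by comparing observed genotype counts with the proportions p², 2pq and q² expected under equilibrium. The sketch below computes those expected counts and a simple chi-square statistic; the proportions are the standard textbook result rather than something stated in this glossary, and the function names are hypothetical.

```python
def hwe_expected_counts(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Expected genotype counts under HWE given the observed counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele a
    return (n * p * p, 2 * n * p * q, n * q * q)


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic comparing observed with expected counts."""
    observed = (n_AA, n_Aa, n_aa)
    expected = hwe_expected_counts(n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


print(hwe_chi_square(25, 50, 25))  # counts that match HWE exactly
print(hwe_chi_square(40, 20, 40))  # heterozygote deficit
```

A large statistic suggests departure from equilibrium; converting it to a P value would require a chi-square distribution function, which is left out of this sketch.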
- Marginal effects
(Also known as main effects). The average effect of a locus across all other loci and environmental effects.
- Linkage disequilibrium
(LD). The nonrandom association of alleles of two or more loci in a population owing to limited recombination. LD is often used to measure the relationship of genetic markers of the loci: a high LD means the markers are closely related (that is, co-occurring) so the genotype at one marker is predictive of the genotype at another.
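One common way to quantify LD between two biallelic loci is the squared correlation r², derived from the disequilibrium coefficient D = p_AB − p_A·p_B. The glossary does not name a specific measure, so the choice of r² here is an assumption, and `ld_r_squared` is a hypothetical helper that takes haplotype and allele frequencies as inputs.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between alleles A and B at two loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    for name, value in (("p_a", p_a), ("p_b", p_b)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


print(ld_r_squared(0.5, 0.5, 0.5))   # alleles always co-occur
print(ld_r_squared(0.25, 0.5, 0.5))  # alleles independent
```

Values near 1 indicate that the genotype at one marker is highly predictive of the genotype at the other, and values near 0 indicate little or no association.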
- Haplotype
A combination of alleles (DNA sequences) inherited from a single parent. A haplotype can be within one locus or across multiple loci, with or without physical coupling on the DNA strand.
- Linkage phase
(Also known as gametic phase). The information of combinations of DNA alleles in a diploid individual inherited from the mother or father.
- Polygenic architecture
A trait genetic architecture under which many genes of small effect contribute to trait variation.
- Covariates
Variables that may confound the outcome variable of a statistical model; for example, age is a covariate of human height.
- Bayes' theorem
A theorem in probability theory, attributed to Thomas Bayes, for calculating conditional probabilities from prior distributions of the parameters in a model and the observed experimental data.
- Variance heterogeneity
Difference in the variance of a quantitative trait between the three possible genotypes of a biallelic single-nucleotide polymorphism (SNP) in the presence of genetic interactions; it can therefore be used to screen for potential interacting SNPs.
- Publication bias
A bias that arises owing to only certain types of results (for example, those that successfully reject the null hypothesis) being much more likely to be published than others, leading to a disproportionate representation in the literature.
- Large P small N problem
A statistical challenge to estimate a large number of parameters based on a small number of samples.
- Multifactor dimensionality reduction
A data-mining algorithm that can reduce a high-dimensional multilocus model of multifactorial classes (that is, single-nucleotide polymorphism genotype combinations) into a one-dimensional model of one variable of either high-risk (potential interacting) or low-risk classes based on the ratio of cases and controls in each class. The algorithm uses cross-validation iteratively to define the best classification.
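The pooling step described above can be sketched as follows: each joint genotype cell is labelled high-risk or low-risk according to its ratio of cases to controls. This is only the labelling step under stated assumptions; the cross-validation loop is omitted, the threshold of 1.0 is an assumed default suited to balanced data, and `mdr_risk_labels` is a hypothetical name rather than the interface of any MDR software.

```python
from collections import defaultdict


def mdr_risk_labels(genotype_cells, is_case, threshold=1.0):
    """Label each multilocus genotype cell as 'high' or 'low' risk.

    genotype_cells: one hashable genotype combination per individual
    is_case: True for cases and False for controls, in the same order
    threshold: assumed case:control ratio above which a cell is high-risk
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for cell, case in zip(genotype_cells, is_case):
        if case:
            cases[cell] += 1
        else:
            controls[cell] += 1
    labels = {}
    for cell in set(cases) | set(controls):
        # Comparing counts directly avoids dividing by zero controls.
        high = cases[cell] > threshold * controls[cell]
        labels[cell] = "high" if high else "low"
    return labels


cells = [(0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 1)]
status = [True, True, False, False, False, True]
labels = mdr_risk_labels(cells, status)
print(labels)
```

The resulting one-dimensional high/low variable is what would then be evaluated as a classifier across cross-validation folds.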
- Tree-based methods
Model-free or non-parametric machine-learning approaches for regression and classification analyses by recursive partitioning of variables into tree structures. Popular applications in epistasis studies include random forest, random jungle, classification and regression trees.
- Entropy-based methods
Entropy is a key measure of uncertainty associated with a random variable in information theory. Entropy-based methods examine the information entropy difference between different models with and without interactions to detect epistasis.
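One way to express such an entropy difference is as an information gain: the information that two SNPs jointly carry about the phenotype minus the information each carries alone. The sketch below is one possible formulation, assumed for illustration and not necessarily the statistic used by any particular published method; the XOR-style toy data are invented to show a pure interaction with no marginal signal.

```python
from collections import Counter
from math import log2


def entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(x1, x2, y) -> float:
    """Information about y from the pair beyond the two single loci."""
    joint = list(zip(x1, x2))
    return (mutual_information(joint, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy data (invented): the phenotype depends only on the combination.
snp1 = [0, 0, 1, 1]
snp2 = [0, 1, 0, 1]
trait = [0, 1, 1, 0]
gain = interaction_gain(snp1, snp2, trait)
print(gain)
```

In this toy example neither SNP alone is informative about the trait, so all of the information appears in the interaction term.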
- Imputation
Statistical inference of unobserved single-nucleotide polymorphism (SNP) genotypes based on a reference panel of known haplotypes in a population (for example, the 1000 Genomes Project). Imputation can greatly reduce the distance between genotyped SNPs and causal variants, and thus increase the power to detect associations.
- Pleiotropic epistasis
Statistical interaction signals shared in multiple traits.
- Expression quantitative trait locus
(eQTL). A locus that controls variation in expression of a particular gene. An eQTL may lie adjacent to the gene being controlled (cis-acting control) or some distance away (trans-acting control).
- Wellcome Trust Case–Control Consortium
(WTCCC). One of the first large collaborative genome-wide association studies that included eight disease traits. This study has become a role model for subsequent studies, and the data set has been subjected to additional analyses, including for epistasis.
- Endophenotypes
Heritable traits that are genetically correlated with disease traits. They are often traits (such as the level of a metabolite or transcript) that can be measured in all individuals (both diseased and healthy) and that can potentially provide a predictor of disease status.
- Observed scale
Measurement of a binary phenotype in terms of whether the participant exhibits the phenotype or not.
- Liability scale
An unobserved underlying risk of a binary phenotype or disease that is measured on a continuous scale and that is likely to be influenced by many genetic and environmental factors.
- Binary phenotypes
Disease traits that have two major states on the observed scale: diseased or healthy. They may nonetheless be complex traits in which transition to the disease state is influenced by continuous variation on an underlying liability scale for disease that is controlled by many genetic loci and environmental effects.
Cite this article
Wei, WH., Hemani, G. & Haley, C. Detecting epistasis in human complex traits. Nat Rev Genet 15, 722–733 (2014). https://doi.org/10.1038/nrg3747