Genome-wide association (GWA) studies have typically focused on the analysis of single markers, which often lacks the power to uncover the relatively small effect sizes conferred by most genetic variants. Recently, pathway-based approaches have been developed, which use prior biological knowledge on gene function to facilitate more powerful analysis of GWA study data sets. These approaches typically examine whether a group of related genes in the same functional pathway are jointly associated with a trait of interest. Here we review the development of pathway-based approaches for GWA studies, discuss their practical use and caveats, and suggest that pathway-based approaches may also be useful for future GWA studies with sequencing data.
This article provides a background introduction to pathway-based approaches for analyzing genome-wide association (GWA) studies. An example is shown to illustrate that many genes in a susceptibility pathway may show evidence of association, although not genome-wide significance, in any given GWA study.
A brief overview of published studies that use pathway approaches for interpreting data from GWA studies is given.
A summary and classification of the currently available pathway approaches is provided. The differences in their statistical approaches and analytical procedures are described.
A discussion of the challenges and pitfalls for using pathway approaches for analyzing GWA studies is then provided.
An outline of the future research directions that could further mine information from existing GWA study data sets is given. The extension of pathway approaches to next-generation sequencing data is also discussed.
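The central idea in the points above, that several genes in one pathway can each show modest association without any of them reaching genome-wide significance, can be illustrated with a minimal sketch. It is not one of the methods reviewed in the article; it is a simple over-representation test based on the hypergeometric distribution. The gene names, p-values, pathway membership and the 0.05 threshold are all hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are 'hits'."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def pathway_enrichment(gene_pvalues, pathway_genes, threshold=0.05):
    """Return (number of hits in the pathway, enrichment p-value)."""
    tested = set(gene_pvalues)
    members = tested & set(pathway_genes)  # ignore genes with no p-value
    hits = {g for g, p in gene_pvalues.items() if p < threshold}
    k = len(hits & members)
    return k, hypergeom_upper_tail(k, len(hits), len(members), len(tested))


# Hypothetical toy data: 20 genes, none close to genome-wide significance.
gene_pvalues = {f"G{i}": 0.5 for i in range(1, 21)}
gene_pvalues.update({"G1": 0.01, "G2": 0.02, "G3": 0.03,
                     "G4": 0.04, "G5": 0.2, "G6": 0.01})
pathway = ["G1", "G2", "G3", "G4", "G5"]  # assumed pathway membership

hits_in_pathway, enrichment_p = pathway_enrichment(gene_pvalues, pathway)
print(hits_in_pathway, enrichment_p)
```

This sketch treats genes as independent and ignores gene size and linkage disequilibrium between markers, so it should be read as an illustration of the reasoning rather than as a recommended analysis.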
We thank D. C. Thomas (University of Southern California) for his helpful critiques which greatly improved the manuscript.
The authors declare no competing financial interests.
- Permutation
A strategy for assessing the probability of observing a value of a statistic at least as extreme as the one obtained. The data are randomly shuffled and the statistic is recomputed from the shuffled data many times; the resulting values are then compared with the value of the statistic obtained from the non-shuffled data.
- Multi-marker test
A statistical method that measures the strength of association between a trait and multiple SNP markers.
- SNP ascertainment
Identification of SNPs that should be placed on a genotyping array to ensure representative coverage of the genome.
- Linkage disequilibrium
The non-random association of alleles at two or more closely linked loci.
- Genomic inflation
An excess of false-positive results, measured as the ratio of the median of the empirically observed distribution of the test statistic to its expected median.
- Type I error
The probability of a false-positive result from a statistical hypothesis test.
- Bonferroni correction
A multiple comparison adjustment approach that tests each individual hypothesis at a significance threshold divided by n, when n hypotheses are being tested.
- False Discovery Rate
A multiple comparison adjustment approach to control the expected proportion of incorrectly rejected null hypotheses in a list of rejected hypotheses.
- Genotype imputation
A statistical method that predicts individual genotypes at ungenotyped markers from genotypes of other nearby markers, usually using the HapMap data as a reference.
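Several of the glossary entries above describe short computations. The following is a minimal sketch of four of them (permutation, Bonferroni correction, false discovery rate and genomic inflation) using only the Python standard library. The function names, the choice of a difference in group means as the test statistic, the Benjamini-Hochberg procedure as the FDR method, and the 1-degree-of-freedom chi-square statistic are assumptions made for illustration; the toy numbers are hypothetical.

```python
import random
from statistics import mean, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def permutation_pvalue(cases, controls, n_perm=2000, seed=0):
    """One-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = mean(cases) - mean(controls)
    pooled = list(cases) + list(controls)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        stat = mean(pooled[:len(cases)]) - mean(pooled[len(cases):])
        if stat >= observed:
            extreme += 1
    return (1 + extreme) / (1 + n_perm)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold when n_tests hypotheses are tested."""
    return alpha / n_tests


def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            cutoff_rank = rank  # largest rank that meets its threshold
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


def genomic_inflation(chi2_stats):
    """Ratio of the observed median test statistic to the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical toy inputs.
perm_p = permutation_pvalue([5, 6, 7, 8], [1, 2, 3, 4])
threshold = bonferroni_threshold(0.05, 500_000)
bh = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
lam = genomic_inflation([0.1, 0.4549364, 2.0])
print(perm_p, threshold, bh, lam)
```

Adding 1 to both the numerator and the denominator of the permutation p-value keeps the estimate away from zero when no shuffled statistic reaches the observed one.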