Review Article

Meta-analysis methods for genome-wide association studies and beyond

Nature Reviews Genetics volume 14, pages 379–389 (2013)

Abstract

Meta-analysis of genome-wide association studies (GWASs) has become a popular method for discovering genetic risk variants. Here, we provide an overview of both widely applied and newer statistical methods for GWAS meta-analysis, including issues of interpretation and assessment of sources of heterogeneity. We also discuss extensions of these meta-analysis methods to complex data. Where possible, we provide guidelines for researchers who are planning to use these methods. Furthermore, we address special issues that may arise for meta-analysis of sequencing data and rare variants. Finally, we discuss challenges and solutions surrounding the goals of making meta-analysis data publicly available and building powerful consortia.

Key points

  • Meta-analysis of genome-wide association studies has contributed to the discovery of most of the recently identified genetic risk factors for complex diseases.

  • Common meta-analytical approaches have been successfully applied; however, novel methods have been proposed that may have some advantages and disadvantages.

  • Heterogeneity in meta-analysis can be introduced from various sources and should not be disregarded. Several methods have been proposed that may optimize power in the presence of heterogeneity from known or unknown sources.

  • Next-generation sequence data will boost the study of rare variants; however, larger sample sizes are required. Several techniques have been developed for the meta-analysis of rare variants. Tools other than P values may be useful for inference.

  • Scientists will benefit from publicly available data sets and collaboration between consortia that will facilitate a wide range of methodological and applied research.

Acknowledgements

E.E. is partially funded by the GEFOS (FP7-HEALTH-F2-2008-201865-GEFOS) and the TREATOA (FP7-HEALTH-F2-2008-200800-TREATOA) projects.

Author information

Affiliations

  1. Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina 45110, Greece.

    • Evangelos Evangelou
  2. Stanford Prevention Research Center, Department of Medicine and Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California 94305–5411, USA.

    • John P. A. Ioannidis
  3. Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California 94305–5411, USA.

    • John P. A. Ioannidis

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to John P. A. Ioannidis.

Glossary

Meta-analysis

A statistical method for the combination of different studies to provide a summary result.
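As an illustration of how per-study results can be combined into a summary result, the sketch below assumes the inverse-variance fixed-effect model, which is one common choice and not necessarily the only approach the article has in mind. It also reports Cochran's Q and the I² statistic as rough indicators of between-study heterogeneity. The function name `fixed_effect_meta` and the input values are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect summary of per-study effect estimates.

    betas: per-study effect estimates (for example, log odds ratios)
    ses:   their standard errors (all must be positive)
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the summary estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1

    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": beta / se, "Q": q, "I2": i2}


# Two hypothetical studies with equal precision.
summary = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
print(summary)
```

Because only effect estimates and standard errors are needed, this kind of calculation can run on summary data without access to individual-level genotypes.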

Summary data

Data that present summary statistics of a population and are used in meta-analysis approaches without granting access to individual-level data.

Imputation

In genetics, the inference of genotypes of markers that have not been directly genotyped by making use of information from haplotype reference panels such as the HapMap or the 1000 Genomes panels.

Genome-wide significance

The significance threshold for rejecting the null hypothesis in genome-wide association studies.

Minor allele frequency

(MAF). The frequency of the less common allele of a polymorphic locus. It has a value that lies between 0 and 0.5 and can vary between populations.
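A minimal sketch of the calculation, assuming a biallelic marker with known genotype counts; the function name and the counts are hypothetical.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Return the MAF of a biallelic marker from genotype counts.

    n_aa, n_ab, n_bb: numbers of individuals with genotypes AA, AB and BB.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")

    # Each AA individual carries two A alleles, each AB individual one.
    freq_a = (2 * n_aa + n_ab) / total_alleles

    # The minor allele is whichever of A or B is less common.
    return min(freq_a, 1.0 - freq_a)


print(minor_allele_frequency(360, 480, 160))
```

Taking the minimum of the two allele frequencies is what keeps the value between 0 and 0.5.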

Hardy–Weinberg equilibrium

A principle stating that the genetic variation in a population will remain constant from one generation to the next in the absence of disturbing factors.
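In practice, the principle is often used as a quality check: under equilibrium, genotype frequencies at a biallelic marker with allele frequencies p and q are expected to be p², 2pq and q². The sketch below, with a hypothetical function name and counts, compares observed genotype counts with these expectations using a simple chi-square statistic; exact tests are usually preferred when counts are small.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Expected genotype counts under Hardy-Weinberg equilibrium and the
    chi-square statistic (1 degree of freedom) for departure from them."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")

    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B

    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip cells with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2


expected, chi2 = hwe_chi_square(400, 400, 200)
print(expected, chi2)
```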

Bayesian approaches

Fully probabilistic methods for describing models, parameters and data. They are so called because extensive use is made of Bayes' theorem to compute the probability distribution of model parameters given the experimental data.

Bonferroni correction

A method to counteract the problem of multiple comparisons. It is the simplest and most conservative approach to control for type I error.
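A minimal sketch of the correction follows. The figure of one million independent tests is an illustrative assumption, often quoted for common variants, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    type I error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumed: an overall alpha of 0.05 spread over one million tests.
print(bonferroni_threshold(0.05, 1_000_000))
print(bonferroni_adjust([0.01, 0.04, 0.5]))
```

With these assumed inputs, the threshold works out to 5 × 10⁻⁸, the value conventionally used for genome-wide significance.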

Type I error

The probability of rejecting the null hypothesis when it is true. For genetic association studies, type I errors reflect false-positive findings of associations between allele or genotype and disease.

Linkage disequilibrium

The nonrandom association of alleles of different linked polymorphisms in a population.

Population stratification

The presence of several population subgroups that show limited interbreeding. When such subgroups differ both in allele frequency and in disease prevalence, this can lead to erroneous results in association studies.

Principal components

Composite variables, each of which summarizes the variation across a larger number of variables, each represented by a column of a matrix.

Main effects

The effects of a variable assuming no dependency or conditionality on other variables.

Bivariate meta-analysis

Joint synthesis of two phenotypes by using their correlation.

Asymptotic assumptions

Assumptions that hold as the sample size in a data set grows indefinitely, in which case the distribution of the estimators becomes approximately normal.

2 × 2 tables

Tables that describe the cross-classification of data that are divided into two groups with two categories in each.
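For a case-control comparison, such a table is commonly summarized by an odds ratio. The sketch below assumes the usual cell layout (carriers and non-carriers among cases and controls) and the standard large-sample standard error of the log odds ratio; the function name and counts are hypothetical.

```python
import math


def odds_ratio_2x2(a, b, c, d):
    """Odds ratio and standard error of its logarithm from a 2 x 2 table.

    a: cases carrying the allele       b: cases not carrying it
    c: controls carrying the allele    d: controls not carrying it
    """
    if min(a, b, c, d) <= 0:
        # With a zero cell the estimate and its standard error are undefined.
        raise ValueError("all four cells must be positive")

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return odds_ratio, se_log_or


print(odds_ratio_2x2(20, 80, 10, 90))
```

The explicit check for empty cells reflects a practical limitation: with rare variants, zero cells are common, and this large-sample formula no longer applies without modification.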

Collapsing approach

Statistical methods for association analysis in which multiple low-frequency or rare variants are collapsed into a single locus.
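One simple form of collapsing is a carrier indicator: an individual is coded 1 if they carry at least one rare allele anywhere in the region and 0 otherwise. The sketch below is an illustration under that assumption and is not a specific published method; the function name, the 1% frequency cut-off and the data are hypothetical.

```python
def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants in a region into one carrier indicator.

    genotypes: one list per individual, holding minor-allele counts
               (0, 1 or 2) for each variant in the region
    mafs:      minor allele frequency of each variant, in the same order
    """
    # Keep only the variants considered rare under the assumed cut-off.
    rare_idx = [j for j, maf in enumerate(mafs) if maf < maf_threshold]

    # 1 if the individual carries any rare allele, otherwise 0.
    return [1 if any(row[j] > 0 for j in rare_idx) else 0 for row in genotypes]


genotypes = [
    [0, 1, 0],
    [0, 0, 2],
    [0, 0, 0],
    [1, 0, 0],
]
mafs = [0.005, 0.2, 0.008]  # the second variant is common and is ignored
print(collapse_rare_variants(genotypes, mafs))
```

The resulting indicator can then be tested for association like a single marker.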

Lambda inflation factor

A metric used in genetic association studies to correct for spurious associations (which may arise owing to population stratification) by estimating the extent of inflation in the statistical evidence and appropriately down-weighting this inflation.
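A minimal sketch of the usual calculation, assuming 1-degree-of-freedom chi-square test statistics: lambda is the median observed statistic divided by the median expected under the null hypothesis (about 0.455). Flooring lambda at 1 before correcting is a common convention adopted here as an assumption; the function names and inputs are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Lambda: ratio of the observed median test statistic to the
    median expected under the null hypothesis."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide every test statistic by lambda (never by less than 1)."""
    lam = max(1.0, genomic_inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]


stats = [0.1, 0.5, 0.9098728, 2.0, 5.0]
print(genomic_inflation_factor(stats))
print(genomic_control(stats))
```

In a meta-analysis setting, a correction of this kind may be applied to each contributing study, to the combined results, or both.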

About this article

DOI

https://doi.org/10.1038/nrg3472