  • Review Article

Bayesian statistical methods for genetic association studies

Key Points

  • p-values are commonly used as summaries of evidence for association between a genetic variant and a phenotype, but they have an important limitation: they cannot quantify how confident one should be that a given SNP is truly associated with the phenotype.

  • Bayesian methods provide an alternative approach to assessing associations. We show that Bayesian analyses are not too difficult and can be rewarding — for example, unlike p-values, a Bayesian probability of association is comparable across SNPs and across studies.

  • For a Bayesian analysis of single-SNP association in a case–control study, we discuss genetic models that can form an alternative to the null hypothesis of no association, in addition to effect-size distributions for the parameters of these models. An alternative Bayesian analysis derives a posterior distribution for effect size, without reference to a null hypothesis.

  • We give an example of a multi-SNP Bayesian analysis for fine-scale mapping and discuss Bayesian approaches to multiple testing and meta-analysis.

  • Broad guidelines are suggested for editors and reviewers of Bayesian analyses.

Abstract

Bayesian statistical methods have recently made great inroads into many areas of science, and this advance is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focusing on single-SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequentist) approaches in this setting and provide a tutorial on basic analysis steps, including practical guidelines for appropriate prior specification. We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analyses and provide guidance for refereeing manuscripts that contain Bayesian analyses.

Figure 1: Dependence of the Bayes factor on minor allele count and on the prior standard deviation of the effect size.
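The kind of dependence named in this caption can be explored with a short numerical sketch. This is not the calculation behind Figure 1. It assumes a normal approximation in which the estimated log odds ratio `beta_hat` has standard error `se`, and the prior on the true effect under the alternative is normal with mean zero and standard deviation `prior_sd`. The function name and all example numbers are hypothetical.

```python
import math

def approx_bayes_factor(beta_hat, se, prior_sd):
    """Bayes factor for H1 (effect ~ N(0, prior_sd^2)) against H0 (effect = 0).

    Assumption: beta_hat ~ N(effect, se^2), so the marginal distribution of
    beta_hat is N(0, se^2) under H0 and N(0, se^2 + prior_sd^2) under H1.
    The Bayes factor is the ratio of these two normal densities at beta_hat.
    """
    v = se ** 2                    # sampling variance of the estimate
    w = prior_sd ** 2              # prior variance of the effect size
    z2 = (beta_hat / se) ** 2      # squared z-statistic
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z2 * shrink)

# Hypothetical estimate: log odds ratio 0.3 with standard error 0.1.
for prior_sd in (0.05, 0.2, 1.0, 5.0):
    print(prior_sd, approx_bayes_factor(0.3, 0.1, prior_sd))
```

Under this approximation the minor allele count enters only through `se`, which for a fixed sample size tends to be larger for rarer alleles. Very small and very large values of `prior_sd` both reduce the Bayes factor relative to intermediate values.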

Acknowledgements

We thank C. Hoggart for providing R code to compute the normal-exponential-gamma probability density function and J. Wakefield for helpful discussions and critical reading of an early draft. We thank R. Krauss for access to the CRP genotype and phenotype data that we analysed here. We are also grateful to W. Astle, A. Ramasamy, L. Bottolo, L. Coin, P. O'Reilly and H. Eleftherohorinou for discussions. The authors' work is supported in part by National Institutes of Health grants HL084689 (to M.S.) and EP/C533542 (to D.J.B.).

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Matthew Stephens' homepage

David J. Balding's homepage

BIMBAM

SNPTEST

WinBUGS

Nature Reviews Genetics series on Modelling

Nature Reviews Genetics series on Genome-wide Association Studies

Glossary

Frequentist

A statistical school of thought in which inferences about unknowns are justified not with reference to probabilities for the inferred value, but on the basis of measures of performance under imaginary repetitions of the procedure that was used to make the inference.

Population association

Also known as true association. An association between a SNP and a phenotype that is present in the population from which a sample is taken. A population association can arise owing to population structure, but for simplicity we assume here that this possibility has been eliminated (for example, by covariate adjustment) and hence that population associations are caused by a functional SNP, either directly or through linkage disequilibrium.

p-value

The probability, if the null hypothesis were true, that an imaginary future repetition of the study would generate evidence for association at least as strong as that actually observed. A p-value is conventionally interpreted as measuring the strength of evidence for association, but there is no simple relationship between a p-value and the probability that the association is genuine.

Power

For a given population association, the power of a statistical test is the probability that the null hypothesis is rejected under imaginary repetitions of the study.

Bayesian

A statistical school of thought that holds that inferences about any unknown parameter or hypothesis should be encapsulated in a probability distribution, given the observed data. Computing this posterior probability distribution usually proceeds by specifying a prior distribution that summarizes knowledge about the unknown before the observed data are considered, and then using Bayes' theorem to transform the prior distribution into a posterior distribution.

Meta-analysis

The combination of the results of multiple scientific studies that address the same, or similar, hypotheses.

Posterior probability of association

The probability that a SNP is truly associated with a phenotype. The posterior probability of association depends on modelling assumptions that should be made explicit in a careful analysis.

Likelihood ratio

The ratio of the probabilities of the observed data for two different values of the unknown parameter(s) under a given statistical model.

Odds

The probability of the occurrence of a particular event (for example, the onset of disease) divided by the probability of the event not occurring. It is often mathematically convenient to transform a probability, which must lie between zero and one, to odds, which can take any positive value.
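As a minimal sketch, the conversion between probability and odds can be written in a few lines. The last function applies Bayes' theorem in its odds form (posterior odds equal the Bayes factor multiplied by the prior odds), which is a standard identity rather than something stated in this glossary entry. The function names and the example prior and Bayes factor are hypothetical.

```python
def prob_to_odds(p):
    """Convert a probability in [0, 1) to odds in [0, infinity)."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)

def posterior_prob(prior_prob, bayes_factor):
    """Posterior probability of H1, given its prior probability and the
    Bayes factor for H1 against H0 (Bayes' theorem in odds form)."""
    posterior_odds = bayes_factor * prob_to_odds(prior_prob)
    return odds_to_prob(posterior_odds)

# Hypothetical values: a small prior probability and a large Bayes factor.
print(posterior_prob(prior_prob=1e-4, bayes_factor=1e5))
```

Working on the odds scale keeps the update a simple multiplication, and the result is converted back to a probability only at the end.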

Bonferroni correction

When multiple hypotheses are tested, the Bonferroni correction to the overall desired significance level (α) is obtained by dividing it by the number of tests (k), so that each hypothesis is rejected if p-value < α/k.
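The rule above can be sketched directly. The function name and the p-values are hypothetical, chosen only to illustrate the threshold α/k.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if that hypothesis is rejected
    at the Bonferroni-corrected threshold alpha / k."""
    k = len(p_values)
    threshold = alpha / k
    return [p < threshold for p in p_values]

# Four hypothetical tests at overall level 0.05, so the threshold is 0.0125.
print(bonferroni_reject([0.001, 0.02, 0.04, 0.5], alpha=0.05))
```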

False discovery rate

For a sequence of hypothesis tests, the false discovery rate is the expected proportion of tests for which H0 is true among those tests for which H0 is rejected.

Odds ratio

The odds ratio comparing, for example, two genotypes is the odds for individuals with the first genotype divided by the odds for individuals with the second genotype.
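A small worked example may help. The sketch below computes the odds ratio from a two-by-two table of counts; the function name, argument names and counts are all hypothetical.

```python
def odds_ratio(cases_g1, controls_g1, cases_g2, controls_g2):
    """Odds ratio comparing genotype 1 with genotype 2.

    The odds for each genotype are estimated as cases / controls
    among the individuals carrying that genotype.
    """
    odds_g1 = cases_g1 / controls_g1
    odds_g2 = cases_g2 / controls_g2
    return odds_g1 / odds_g2

# Hypothetical counts: 30 cases and 70 controls with the first genotype,
# 20 cases and 80 controls with the second.
print(odds_ratio(30, 70, 20, 80))
```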

Logistic regression

A regression model for binary outcomes (such as case and control) in which the logarithm of the odds is related linearly to one or more predictors, such as SNP minor allele count(s).
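The linear relationship on the log-odds scale can be illustrated with a short sketch. It assumes a single predictor, the minor allele count (0, 1 or 2), with an additive effect on the log odds. The function name and parameter values are hypothetical, and no model fitting is shown.

```python
import math

def case_probability(intercept, log_or_per_allele, minor_allele_count):
    """Probability of being a case under a logistic model in which the
    log odds are intercept + log_or_per_allele * minor_allele_count."""
    log_odds = intercept + log_or_per_allele * minor_allele_count
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical parameters: baseline log odds -1.0, per-allele log odds ratio 0.2.
for count in (0, 1, 2):
    p = case_probability(-1.0, 0.2, count)
    print(count, p, p / (1.0 - p))  # count, probability, odds
```

Under this additive assumption, each extra copy of the minor allele multiplies the odds by the same factor, `exp(log_or_per_allele)`.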

Laplace approximation

A method for approximating the integral of a (possibly multidimensional) probability density, normalized or not, based on replacing that density by a Gaussian density that is centred at its mode and has a variance–covariance matrix determined by the curvature of the log density at the mode.
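A one-dimensional sketch shows the idea. It uses the common form of the approximation in which the Gaussian is centred at the mode of the integrand and its variance is minus the reciprocal of the second derivative of the log integrand at the mode. The test integrand, the integral of x^n e^(−x) over positive x (which equals n!), is chosen here for illustration, and the function name is hypothetical.

```python
import math

def laplace_integral(log_f, mode, second_derivative_at_mode):
    """Approximate the integral of exp(log_f(x)) in one dimension.

    The integrand is replaced by a Gaussian with the same height at the
    mode and with variance -1 / (second derivative of log_f at the mode).
    """
    variance = -1.0 / second_derivative_at_mode
    return math.exp(log_f(mode)) * math.sqrt(2.0 * math.pi * variance)

n = 10

def log_f(x):
    # log of x**n * exp(-x); maximized at x = n, second derivative -n / x**2
    return n * math.log(x) - x

approx = laplace_integral(log_f, mode=n, second_derivative_at_mode=-1.0 / n)
exact = math.factorial(n)
print(approx / exact)  # a ratio close to 1 indicates a good approximation
```

For this integrand the result coincides with Stirling's formula for n!, and the approximation improves as n grows and the integrand becomes more nearly Gaussian.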

Maximum-likelihood estimate

The maximum-likelihood estimate of an unknown parameter in a statistical model is the value of the parameter that maximizes the probability under the model of the observed data.

Statin

A class of drugs that is used to lower cholesterol levels in people with, or at risk of, cardiovascular disease.

Genotype imputation method

A method for estimating ('imputing') the unobserved genotypes of study subjects, both for individuals with missing or unreliable genotypes at a genotyped SNP and for all individuals at an ungenotyped SNP.

Hardy–Weinberg equilibrium

This holds at a given locus in a given population when the two alleles carried by each individual in the population are mutually independent.
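For a SNP with two alleles, this independence implies the familiar genotype proportions, as in the sketch below. The allele labels `A` and `a`, the function name and the allele frequency are hypothetical.

```python
def hwe_genotype_frequencies(minor_allele_freq):
    """Expected genotype frequencies at a biallelic locus when the two
    alleles of an individual are drawn independently."""
    q = minor_allele_freq        # frequency of the minor allele 'a'
    p = 1.0 - q                  # frequency of the major allele 'A'
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

# Hypothetical minor allele frequency of 0.2.
print(hwe_genotype_frequencies(0.2))
```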

About this article

Cite this article

Stephens, M., Balding, D. Bayesian statistical methods for genetic association studies. Nat Rev Genet 10, 681–690 (2009). https://doi.org/10.1038/nrg2615

  • DOI: https://doi.org/10.1038/nrg2615
