Technical Report

Efficient Bayesian mixed-model analysis increases association power in large cohorts

Nature Genetics volume 47, pages 284–290 (2015)

Abstract

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN)-time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
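The gap between O(MN²) and O(MN) per iteration comes from never forming the N × N genetic relationship matrix. As a minimal sketch of that general idea (not BOLT-LMM's actual implementation), assume an infinitesimal model with covariance V = σg² XXᵀ/M + σe² I, where X is an N × M matrix of standardized genotypes. Conjugate gradient can then solve V x = y using only products with X and Xᵀ, so each iteration costs O(MN). The function name `solve_v_inverse_y` and its parameters are hypothetical, and the Bayesian mixture prior is not modeled here.

```python
import numpy as np


def solve_v_inverse_y(X, y, sigma2_g, sigma2_e, tol=1e-8, max_iter=500):
    """Solve V x = y with V = sigma2_g * X X^T / M + sigma2_e * I by conjugate gradient.

    Hypothetical helper for illustration. X is an N x M standardized genotype
    matrix; V is never built explicitly.
    """
    N, M = X.shape

    def v_times(v):
        # Two products with X cost O(MN); the N x N matrix X X^T is never formed.
        return sigma2_g * (X @ (X.T @ v)) / M + sigma2_e * v

    x = np.zeros(N)
    r = np.array(y, dtype=float)  # residual y - V x for the starting point x = 0 (copied)
    p = r.copy()                  # first search direction
    rs_old = r @ r
    threshold = (tol * np.linalg.norm(y)) ** 2
    for _ in range(max_iter):
        if rs_old <= threshold:   # relative residual small enough: stop
            break
        Vp = v_times(p)
        alpha = rs_old / (p @ Vp)
        x += alpha * p
        r -= alpha * Vp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next V-conjugate direction
        rs_old = rs_new
    return x


# Usage on simulated data: compare against a dense solve that does form V.
rng = np.random.default_rng(0)
N, M = 200, 1000
X = rng.standard_normal((N, M))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = rng.standard_normal(N)
x_cg = solve_v_inverse_y(X, y, sigma2_g=0.5, sigma2_e=0.5)
V = 0.5 * X @ X.T / M + 0.5 * np.eye(N)
print(np.allclose(x_cg, np.linalg.solve(V, y), atol=1e-6))
```

Because V is symmetric positive definite (σe² > 0), conjugate gradient converges, and a well-conditioned V needs only a modest number of iterations, each costing O(MN).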

Acknowledgements

We are grateful to M. Lipson, S. Simmons, A. Gusev, K. Galinsky, J. Yang, P. Visscher, Z. Zhu and D. Gudbjartsson for helpful discussions. This research was supported by US National Institutes of Health grant R01 HG006399 and US National Institutes of Health fellowship F32 HG007805. H.K.F. was supported by the Fannie and John Hertz Foundation. The WGHS is supported by grants HL043851 and HL080467 from the National Heart, Lung, and Blood Institute and grant CA047988 from the National Cancer Institute, by the Donald W. Reynolds Foundation and by the Fondation Leducq, with collaborative scientific support and funding for genotyping provided by Amgen.

Author information

Affiliations

  1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    • Po-Ru Loh
    • George Tucker
    • Bjarni J Vilhjálmsson
    • Alkes L Price
  2. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

    • Po-Ru Loh
    • Brendan K Bulik-Sullivan
    • Bjarni J Vilhjálmsson
    • Rany M Salem
    • Benjamin M Neale
    • Nick Patterson
    • Alkes L Price
  3. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • George Tucker
    • Hilary K Finucane
    • Bonnie Berger
  4. Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA.

    • George Tucker
    • Bonnie Berger
  5. Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA.

    • Brendan K Bulik-Sullivan
    • Benjamin M Neale
  6. Department of Endocrinology, Children's Hospital Boston, Boston, Massachusetts, USA.

    • Rany M Salem
  7. Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.

    • Daniel I Chasman
    • Paul M Ridker
  8. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    • Alkes L Price

Authors

  1. Po-Ru Loh
  2. George Tucker
  3. Brendan K Bulik-Sullivan
  4. Bjarni J Vilhjálmsson
  5. Hilary K Finucane
  6. Rany M Salem
  7. Daniel I Chasman
  8. Paul M Ridker
  9. Benjamin M Neale
  10. Bonnie Berger
  11. Nick Patterson
  12. Alkes L Price

Contributions

P.-R.L., N.P. and A.L.P. designed experiments. P.-R.L. performed experiments. P.-R.L., G.T., B.K.B.-S., B.J.V., H.K.F. and A.L.P. analyzed data. D.I.C. and P.M.R. provided data. All authors wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Po-Ru Loh or Alkes L Price.

Supplementary information

PDF files

  1. Supplementary Text and Figures: Supplementary Figures 1–7, Supplementary Tables 1–15 and Supplementary Note.

About this article

DOI

https://doi.org/10.1038/ng.3190