Advantages and pitfalls in the application of mixed-model association methods

Abstract

Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of the mixed-linear-model association (MLMA) method include the prevention of false positive associations due to population or relatedness structure and an increase in power obtained through the application of a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design and provide recommendations for the application of these methods in practical settings.
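For orientation, the standard mixed linear model behind MLMA can be written out as below. This is a generic textbook sketch rather than the authors' exact formulation. The symbols are assumptions introduced here: $y$ (phenotype vector), $x_j$ (genotypes at candidate marker $j$), $g$ (polygenic random effect), $A$ (genetic relationship matrix built from $M$ markers with allele frequencies $p_m$), and the variance components $\sigma_g^2$ and $\sigma_e^2$.

```latex
% Model: fixed effect of the candidate marker plus a polygenic random effect
y = \mu \mathbf{1} + x_j \beta_j + g + e, \qquad
g \sim N(0, A\sigma_g^2), \qquad
e \sim N(0, I\sigma_e^2)

% Genetic relationship between individuals i and k
A_{ik} = \frac{1}{M} \sum_{m=1}^{M}
         \frac{(x_{im} - 2p_m)(x_{km} - 2p_m)}{2p_m(1 - p_m)}

% Phenotypic covariance and generalized least squares estimate, X = [\mathbf{1}, x_j]
V = A\sigma_g^2 + I\sigma_e^2, \qquad
\hat{\beta} = (X^{\top} V^{-1} X)^{-1} X^{\top} V^{-1} y, \qquad
\operatorname{var}(\hat{\beta}) = (X^{\top} V^{-1} X)^{-1}

% One-degree-of-freedom association test for marker j
\chi^2_1 = \frac{\hat{\beta}_j^2}{\operatorname{var}(\hat{\beta}_j)}
```

In this form, the correction for sample structure enters through $V$: related or stratified individuals have nonzero off-diagonal entries in $A$, so their phenotypes are not treated as independent observations.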

Figure 1: MLMe increases power and MLMi decreases power compared to linear regression.
Figure 2: Effectiveness of mixed linear models using random or top associated markers in correcting for stratification.
Figure 3: Effectiveness of mixed linear models using top associated markers in increasing study power.
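
Figure 1 contrasts MLMi and MLMe, which in the paper denote MLMA with the candidate marker included in or excluded from the genetic relationship matrix (GRM). The following is a minimal sketch of that distinction only, not the GCTA implementation: the function names `standardize` and `grm`, the `exclude` parameter and the toy genotype matrix are all hypothetical, and a real analysis would typically exclude a whole region or chromosome around the candidate marker rather than a single column.

```python
from math import sqrt


def standardize(counts):
    """Center and scale one marker's allele counts (0/1/2) by its sample allele frequency."""
    n = len(counts)
    p = sum(counts) / (2 * n)  # sample allele frequency
    sd = sqrt(2 * p * (1 - p))
    if sd == 0:
        raise ValueError("monomorphic marker cannot be standardized")
    return [(x - 2 * p) / sd for x in counts]


def grm(genotypes, exclude=()):
    """Return A = ZZ'/M for an individuals x markers matrix, skipping markers in `exclude`."""
    skip = set(exclude)
    n = len(genotypes)
    kept = [m for m in range(len(genotypes[0])) if m not in skip]
    if not kept:
        raise ValueError("no markers left to build the GRM")
    # One standardized column per retained marker
    z_cols = [standardize([row[m] for row in genotypes]) for m in kept]
    return [
        [sum(z[i] * z[k] for z in z_cols) / len(kept) for k in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): 4 individuals, 3 markers, candidate marker at index 0
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
grm_included = grm(toy_genotypes)               # MLMi-style: candidate in the GRM
grm_excluded = grm(toy_genotypes, exclude=[0])  # MLMe-style: candidate left out
```

When the candidate marker is in the GRM, its signal is partly absorbed by the random effect as well as tested as a fixed effect. That double fitting is the intuition for the loss of power under MLMi reported in the Figure 1 caption.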

Acknowledgements

We are grateful to N. Patterson, D. Heckerman, J. Listgarten, C. Lippert, E. Eskin, B. Vilhjalmsson, P. Loh, T. Hayeck, T. Frayling, A. McRae, L. Ronnegard, O. Weissbrod, G. Tucker and the GIANT Consortium for helpful discussions and to A. Gusev and S. Pollack for assistance with the multiple sclerosis and ulcerative colitis data sets. We are grateful to two anonymous referees for their helpful comments. This study makes use of data generated by the Wellcome Trust Case Control Consortium and data from the database of Genotypes and Phenotypes (dbGaP) under accessions phs000090.v2.p1 and phs000091.v2.p1 (see the Supplementary Note for the full set of acknowledgments for these data). This research was supported by US National Institutes of Health (NIH) grants R01 HG006399, P01 GM099568 and R01 GM075091, by the Australian Research Council (DP130102666) and by the Australian National Health and Medical Research Council (APP1011506 and APP1052684).

Author information

Contributions

All authors conceived the project and designed the analyses. J.Y., N.A.Z. and A.L.P. performed the analyses. J.Y., M.E.G. and P.M.V. provided the theoretical derivations. J.Y. wrote the GCTA software. J.Y., N.A.Z. and A.L.P. wrote the manuscript with edits from all authors.

Corresponding authors

Correspondence to Peter M Visscher or Alkes L Price.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1, Supplementary Tables 1–11 and Supplementary Note. (PDF 434 kb)

About this article

Cite this article

Yang, J., Zaitlen, N., Goddard, M. et al. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46, 100–106 (2014). https://doi.org/10.1038/ng.2876

  • DOI: https://doi.org/10.1038/ng.2876
