Advantages and pitfalls in the application of mixed-model association methods

Yang, Jian; Zaitlen, Noah A; Goddard, Michael E; Visscher, Peter M; Price, Alkes L

doi:10.1038/ng.2876

Perspective
Published: 29 January 2014

Nature Genetics volume 46, pages 100–106 (2014)

Abstract

Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of the mixed-linear-model association (MLMA) method include the prevention of false positive associations due to population or relatedness structure and an increase in power obtained through the application of a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design and provide recommendations for the application of these methods in practical settings.
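
To make the idea of a structure-specific correction concrete, the following is a minimal sketch of a mixed-linear-model association test. It is not the authors' implementation. It assumes genotypes coded 0/1/2, a genetic relationship matrix built from standardized markers, and a variance proportion `h2` that is already known (real MLMA software estimates the variance components, typically by restricted maximum likelihood). The function names `grm` and `mlma_chi2` are hypothetical and chosen only for illustration. Leaving the candidate marker out of the markers passed to `grm` corresponds, as we understand the labels in Figure 1, to MLMe; including it corresponds to MLMi.

```python
import numpy as np


def grm(genotypes):
    """Genetic relationship matrix from an (n_samples, n_markers) 0/1/2 array.

    Markers are centered by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; monomorphic markers are dropped.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    freqs = genotypes.mean(axis=0) / 2.0
    keep = (freqs > 0) & (freqs < 1)
    p = freqs[keep]
    z = (genotypes[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def mlma_chi2(y, x, K, h2):
    """1-df Wald chi-square for candidate marker x under V = h2*K + (1-h2)*I.

    Assumption: h2 is supplied by the caller rather than estimated.
    With h2 = 0 the statistic reduces to the squared t-statistic of
    ordinary linear regression of y on x.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    V = h2 * K + (1.0 - h2) * np.eye(n)
    X = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])

    # Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv_y)

    # Residual scale and the sampling variance of the marker effect
    resid = y - X @ beta
    sigma2 = resid @ np.linalg.solve(V, resid) / (n - 2)
    var_beta = sigma2 * np.linalg.inv(XtVinvX)[1, 1]
    return beta[1] ** 2 / var_beta
```

A call such as `mlma_chi2(y, G[:, j], grm(np.delete(G, j, axis=1)), h2)` would test marker `j` with that marker excluded from the relationship matrix. Recomputing the matrix for every marker is costly, which is one reason the reduction of computational cost receives so much attention in this literature.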

**Figure 1: MLMe increases power and MLMi decreases power compared to linear regression.**

**Figure 2: Effectiveness of mixed linear models using random or top associated markers in correcting for stratification.**

**Figure 3: Effectiveness of mixed linear models using top associated markers in increasing study power.**

Acknowledgements

We are grateful to N. Patterson, D. Heckerman, J. Listgarten, C. Lippert, E. Eskin, B. Vilhjalmsson, P. Loh, T. Hayeck, T. Frayling, A. McRae, L. Ronnegard, O. Weissbrod, G. Tucker and the GIANT Consortium for helpful discussions and to A. Gusev and S. Pollack for assistance with the multiple sclerosis and ulcerative colitis data sets. We are grateful to two anonymous referees for their helpful comments. This study makes use of data generated by the Wellcome Trust Case Control Consortium and data from the database of Genotypes and Phenotypes (dbGaP) under accessions phs000090.v2.p1 and phs000091.v2.p1 (see the Supplementary Note for the full set of acknowledgments for these data). This research was supported by US National Institutes of Health (NIH) grants R01 HG006399, P01 GM099568 and R01 GM075091, by the Australian Research Council (DP130102666) and by the Australian National Health and Medical Research Council (APP1011506 and APP1052684).

Author information

Jian Yang and Noah A Zaitlen: These authors contributed equally to this work.
Michael E Goddard, Peter M Visscher and Alkes L Price: These authors jointly directed this work.

Authors and Affiliations

Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
Jian Yang & Peter M Visscher
University of Queensland Diamantina Institute, University of Queensland, Princess Alexandra Hospital, Brisbane, Queensland, Australia
Jian Yang & Peter M Visscher
Department of Medicine, Lung Biology Center, University of California, San Francisco, San Francisco, California, USA
Noah A Zaitlen
Faculty of Land and Food Resources, University of Melbourne, Parkville, Victoria, Australia
Michael E Goddard
Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA
Alkes L Price
Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
Alkes L Price
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
Alkes L Price

Contributions

All authors conceived the project and designed the analyses. J.Y., N.A.Z. and A.L.P. performed the analyses. J.Y., M.E.G. and P.M.V. provided the theoretical derivations. J.Y. wrote the GCTA software. J.Y., N.A.Z. and A.L.P. wrote the manuscript with edits from all authors.

Corresponding authors

Correspondence to Peter M Visscher or Alkes L Price.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1, Supplementary Tables 1–11 and Supplementary Note. (PDF 434 kb)

About this article

Cite this article

Yang, J., Zaitlen, N., Goddard, M. et al. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46, 100–106 (2014). https://doi.org/10.1038/ng.2876

Received: 23 August 2013
Accepted: 30 December 2013
Published: 29 January 2014
Issue Date: February 2014
DOI: https://doi.org/10.1038/ng.2876
