An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations

Segura, Vincent; Vilhjálmsson, Bjarni J; Platt, Alexander; Korte, Arthur; Seren, Ümit; Long, Quan; Nordborg, Magnus

doi:10.1038/ng.2314

Technical Report
Published: 17 June 2012

Nature Genetics volume 44, pages 825–830 (2012)

Abstract

Population structure causes genome-wide linkage disequilibrium between unlinked loci, leading to statistical confounding in genome-wide association studies. Mixed models have been shown to handle the confounding effects of a diffuse background of large numbers of loci of small effect well, but they do not always account for loci of larger effect. Here we propose a multi-locus mixed model as a general method for mapping complex traits in structured populations. Simulations suggest that our method outperforms existing methods in terms of power as well as false discovery rate. We apply our method to human and Arabidopsis thaliana data, identifying new associations and evidence for allelic heterogeneity. We also show how a priori knowledge from an A. thaliana linkage mapping study can be integrated into our method using a Bayesian approach. Our implementation is computationally efficient, making the analysis of large data sets (n > 10,000) practicable.
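The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a multi-locus mixed model can be organised: SNPs are added one at a time as fixed-effect cofactors, and the variance components are re-estimated after each addition. The function names `fit_h2` and `forward_mlmm`, the grid search over the variance ratio, and the fixed number of steps are all assumptions made for illustration and are not the authors' implementation. Backward elimination, model-selection criteria and P-value computation are omitted.

```python
import numpy as np

def fit_h2(y, X, eigvals, U, grid=np.linspace(0.01, 0.99, 99)):
    """Grid-search h2 = sg2 / (sg2 + se2) by profile maximum likelihood.

    Hypothetical helper: the model is y = X b + u + e with
    u ~ N(0, sg2 * K) and e ~ N(0, se2 * I), where K = U diag(eigvals) U^T.
    """
    n = len(y)
    yr, Xr = U.T @ y, U.T @ X            # rotate so the covariance is diagonal
    best_ll, best_h2 = -np.inf, grid[0]
    for h2 in grid:
        d = h2 * eigvals + (1.0 - h2)    # diagonal covariance, up to a scale
        w = 1.0 / np.sqrt(d)             # whitening weights
        beta, *_ = np.linalg.lstsq(Xr * w[:, None], yr * w, rcond=None)
        resid = (yr - Xr @ beta) * w
        rss = resid @ resid
        # profile log-likelihood with additive constants dropped
        ll = -0.5 * (n * np.log(rss / n) + np.sum(np.log(d)))
        if ll > best_ll:
            best_ll, best_h2 = ll, h2
    return best_h2

def forward_mlmm(y, snps, K, max_steps=3):
    """Forward inclusion of SNPs as cofactors in a mixed model (sketch).

    y: (n,) phenotype; snps: (n, m) genotypes; K: (n, n) kinship matrix.
    Returns the column indices of the selected SNPs, in order of inclusion.
    """
    n, m = snps.shape
    eigvals, U = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negatives
    X = np.ones((n, 1))                     # start with an intercept only
    selected = []
    for _ in range(max_steps):
        # re-estimate the variance ratio given the current cofactors
        h2 = fit_h2(y, X, eigvals, U)
        w = 1.0 / np.sqrt(h2 * eigvals + (1.0 - h2))
        yw = (U.T @ y) * w
        Xw = (U.T @ X) * w[:, None]
        Sw = (U.T @ snps) * w[:, None]
        # project the current cofactors out of the phenotype and every SNP
        Q, _ = np.linalg.qr(Xw)
        y_res = yw - Q @ (Q.T @ yw)
        S_res = Sw - Q @ (Q.T @ Sw)
        # drop in residual sum of squares from adding each SNP on its own;
        # ranking by this is the same as ranking by the one-df F statistic
        num = (S_res.T @ y_res) ** 2
        den = np.sum(S_res ** 2, axis=0)
        gain = np.where(den > 1e-10, num / np.maximum(den, 1e-10), 0.0)
        if selected:
            gain[selected] = 0.0            # never pick the same SNP twice
        best = int(np.argmax(gain))
        selected.append(best)
        X = np.column_stack([X, snps[:, best]])
    return selected
```

Rotating by the eigenvectors of the kinship matrix once and reusing them at every step keeps each step to weighted least squares, which is one reason this style of computation scales to large samples. A real analysis would also need a stopping rule rather than a fixed `max_steps`.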

**Figure 1: A GWAS for a simulated trait with two causal SNPs randomly chosen from a real *A. thaliana* SNP data set.**

**Figure 2: Power and FDR in 100-locus model simulations for four different mapping methods: LM, SWLM, MM and MLMM.**

**Figure 3: GWAS for LDL levels in the NFBC1966 data set.**

**Figure 4: GWAS for sodium accumulation in *A. thaliana*.**

**Figure 5: An example of Bayesian MLMM for the analysis of *FLC* expression in *A. thaliana*.**

Acknowledgements

We acknowledge the NFBC1966 Study investigators for allowing us to use their phenotype and genotype data in our study. The NFBC1966 Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with the Broad Institute, the University of California, Los Angeles (UCLA), the University of Oulu and the National Institute for Health and Welfare in Finland. This manuscript was not prepared in collaboration with the investigators from the NFBC1966 Study and does not necessarily reflect the opinions or views of these investigators or those at the collaborating institutes. We thank N.B. Freimer and S.K. Service for their help in pre-processing the NFBC1966 data. We would also like to thank P. Forai for excellent information technology and cluster support at GMI, the INRA MIGALE bioinformatics platform for additional computational resources and D.V. Conti, D.J. Balding and S. Srivastava for useful discussions on the topic. Finally, we would like to thank the anonymous reviewers for their helpful comments on the manuscript. This work was supported by grants from the Ecologie des Forêts, Prairies et milieux Aquatiques (EFPA) department of INRA to V.S. and Deutsche Forschungsgemeinschaft (DFG) to A.K. and by grants from the US National Institutes of Health (P50 HG002790) and the European Union Framework Programme 7 (TransPLANT, grant agreement 283496) to M.N., as well as by the Austrian Academy of Sciences through GMI.

Author information

Vincent Segura and Bjarni J Vilhjálmsson: These authors contributed equally to this work.

Authors and Affiliations

Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna, Austria
Vincent Segura, Bjarni J Vilhjálmsson, Alexander Platt, Arthur Korte, Ümit Seren, Quan Long & Magnus Nordborg
Institut National de la Recherche Agronomique (INRA), UR0588, Orléans, France
Vincent Segura
Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California, USA
Bjarni J Vilhjálmsson, Alexander Platt & Magnus Nordborg

Contributions

All authors contributed to designing the study. V.S. and B.J.V. ran the simulations and analyzed the data. V.S., B.J.V. and M.N. wrote the manuscript with input from A.P., A.K., Ü.S. and Q.L.

Corresponding author

Correspondence to Magnus Nordborg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1, Supplementary Figures 1–11 and Supplementary Note (PDF 1167 kb)

About this article

Cite this article

Segura, V., Vilhjálmsson, B., Platt, A. et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet 44, 825–830 (2012). https://doi.org/10.1038/ng.2314

Received: 16 November 2011
Accepted: 04 May 2012
Published: 17 June 2012
Issue Date: July 2012
DOI: https://doi.org/10.1038/ng.2314
