An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations

Abstract

Population structure causes genome-wide linkage disequilibrium between unlinked loci, leading to statistical confounding in genome-wide association studies. Mixed models have been shown to handle the confounding effects of a diffuse background of large numbers of loci of small effect well, but they do not always account for loci of larger effect. Here we propose a multi-locus mixed model as a general method for mapping complex traits in structured populations. Simulations suggest that our method outperforms existing methods in terms of power as well as false discovery rate. We apply our method to human and Arabidopsis thaliana data, identifying new associations and evidence for allelic heterogeneity. We also show how a priori knowledge from an A. thaliana linkage mapping study can be integrated into our method using a Bayesian approach. Our implementation is computationally efficient, making the analysis of large data sets (n > 10,000) practicable.
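The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a multi-locus mixed model could be run: forward selection in which the variance components are re-estimated each time a SNP is added as a cofactor. The function names (`fit_h2`, `forward_scan`), the grid search over the heritability-like ratio `h2`, the kinship construction and the simulated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_h2(y, X, K, grid=np.linspace(0.0, 0.99, 34)):
    """Grid-search h2 for y ~ N(X b, s2 * (h2*K + (1-h2)*I)) by profile ML.
    (Assumed estimator; the abstract does not say how variances are fitted.)"""
    n = len(y)
    best_ll, best_h2 = -np.inf, 0.0
    for h2 in grid:
        V = h2 * K + (1.0 - h2) * np.eye(n)
        L = np.linalg.cholesky(V)
        yw, Xw = np.linalg.solve(L, y), np.linalg.solve(L, X)  # whiten
        beta = np.linalg.lstsq(Xw, yw, rcond=None)[0]
        rss = np.sum((yw - Xw @ beta) ** 2)
        logdet = 2.0 * np.sum(np.log(np.diag(L)))
        ll = -0.5 * (n * np.log(rss / n) + logdet)  # up to a constant
        if ll > best_ll:
            best_ll, best_h2 = ll, h2
    return best_h2

def forward_scan(y, G, K, max_steps=3):
    """Hypothetical forward selection: at each step refit h2 with the current
    cofactors, test every SNP by generalized least squares, add the top SNP."""
    n, _ = G.shape
    X = np.ones((n, 1))  # intercept only at the start
    chosen = []
    for _ in range(max_steps):
        h2 = fit_h2(y, X, K)
        L = np.linalg.cholesky(h2 * K + (1.0 - h2) * np.eye(n))
        yw, Xw, Gw = (np.linalg.solve(L, a) for a in (y, X, G))
        Q, _ = np.linalg.qr(Xw)            # project out current cofactors
        yr = yw - Q @ (Q.T @ yw)
        Gr = Gw - Q @ (Q.T @ Gw)
        num = (Gr.T @ yr) ** 2
        den = np.sum(Gr ** 2, axis=0)
        drop = np.divide(num, den, out=np.zeros_like(num), where=den > 1e-10)
        dof = n - X.shape[1] - 1
        F = drop / ((yr @ yr - drop) / dof)  # per-SNP F statistic
        if chosen:
            F[chosen] = -np.inf            # never reselect a cofactor
        best = int(np.argmax(F))
        chosen.append(best)
        X = np.column_stack([X, G[:, best]])
    return chosen

# Simulated example (made-up data): two causal SNPs at columns 3 and 17.
rng = np.random.default_rng(0)
G = rng.integers(0, 2, size=(200, 50)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
K = Z @ Z.T / G.shape[1]                   # simple marker-based kinship
y = 2.0 * G[:, 3] + 2.0 * G[:, 17] + rng.normal(scale=0.5, size=200)
selected = forward_scan(y, G, K, max_steps=2)
print(selected)
```

A fixed `max_steps` stands in for a real stopping rule; in practice one would stop on a model selection criterion or a significance threshold, and the dense Cholesky factorization used here would not scale to the sample sizes the abstract mentions.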

Figure 1: A GWAS for a simulated trait with two causal SNPs randomly chosen from a real A. thaliana SNP data set.
Figure 2: Power and FDR in 100-locus model simulations for four different mapping methods: LM, SWLM, MM and MLMM.
Figure 3: GWAS for LDL levels in the NFBC1966 data set.
Figure 4: GWAS for sodium accumulation in A. thaliana.
Figure 5: An example of Bayesian MLMM for the analysis of FLC expression in A. thaliana.

Acknowledgements

We acknowledge the NFBC1966 Study investigators for allowing us to use their phenotype and genotype data in our study. The NFBC1966 Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with the Broad Institute, the University of California, Los Angeles (UCLA), the University of Oulu and the National Institute for Health and Welfare in Finland. This manuscript was not prepared in collaboration with the investigators from the NFBC1966 Study and does not necessarily reflect the opinions or views of these investigators or those at the collaborating institutes. We thank N.B. Freimer and S.K. Service for their help in pre-processing the NFBC1966 data. We would also like to thank P. Forai for excellent information technology and cluster support at GMI, the INRA MIGALE bioinformatics platform for additional computational resources and D.V. Conti, D.J. Balding and S. Srivastava for useful discussions on the topic. Finally, we would like to thank the anonymous reviewers for their helpful comments on the manuscript. This work was supported by grants from the Ecologie des Forêts, Prairies et milieux Aquatiques (EFPA) department of INRA to V.S. and Deutsche Forschungsgemeinschaft (DFG) to A.K. and by grants from the US National Institutes of Health (P50 HG002790) and the European Union Framework Programme 7 (TransPLANT, grant agreement 283496) to M.N., as well as by the Austrian Academy of Sciences through GMI.

Author information

Contributions

All authors contributed to designing the study. V.S. and B.J.V. ran the simulations and analyzed the data. V.S., B.J.V. and M.N. wrote the manuscript with input from A.P., A.K., Ü.S. and Q.L.

Corresponding author

Correspondence to Magnus Nordborg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1, Supplementary Figures 1–11 and Supplementary Note (PDF 1167 kb)

About this article

Cite this article

Segura, V., Vilhjálmsson, B., Platt, A. et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet 44, 825–830 (2012). https://doi.org/10.1038/ng.2314

  • DOI: https://doi.org/10.1038/ng.2314
