Mixed linear model approach adapted for genome-wide association studies

Abstract

Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods, both independently and in combination, to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness of these methods in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
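The compression idea described in the abstract can be illustrated with a minimal sketch. It is not the TASSEL implementation: the function names `cluster_individuals` and `group_kinship` are hypothetical, the clustering is a naive average-linkage procedure chosen for brevity, and taking the mean pairwise kinship (including self-kinship) as the group-level kinship is an assumption made here for illustration.

```python
import numpy as np

def cluster_individuals(kinship, n_groups):
    """Naive average-linkage clustering that treats kinship as similarity.

    Hypothetical helper: repeatedly merges the two clusters with the
    highest mean pairwise kinship until n_groups clusters remain.
    """
    n = kinship.shape[0]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_groups:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = kinship[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, best_pair = sim, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for g, members in enumerate(clusters):
        labels[members] = g
    return labels

def group_kinship(kinship, labels):
    """Group-level kinship as the mean kinship between members of two groups
    (an assumption of this sketch; other summaries are possible)."""
    groups = np.unique(labels)
    s = len(groups)
    reduced = np.empty((s, s))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            reduced[i, j] = kinship[np.ix_(labels == gi, labels == gj)].mean()
    return reduced

# Toy data: two unrelated families of three individuals each.
family = np.full((3, 3), 0.5)
np.fill_diagonal(family, 1.0)
kinship = np.kron(np.eye(2), family)

labels = cluster_individuals(kinship, n_groups=2)
K_group = group_kinship(kinship, labels)
compression_level = len(labels) / K_group.shape[0]  # individuals per group
```

In this toy case the six individuals collapse into two groups, so the random effect is modelled on a 2 x 2 kinship matrix rather than a 6 x 6 one.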

Figure 1: The forms of MLM classified by the random effect size and types of kinship.
Figure 2: Quantile-quantile plots of type I error (false positive) rates of association tests using the compressed MLM under different compression levels.
Figure 3: The performance of the compressed MLM under different compression levels (horizontal axis).
Figure 4: The P values and statistical power of association tests obtained by using the one-step MLM with the full optimization (full OPT) for all unknown parameters compared to P3D on a maize phenotype simulated with different epistatic effects (E).
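The contrast between full optimization and P3D in Figure 4 can be made concrete with a small sketch. Under P3D the variance components are estimated once, under a null model with no marker, and then held fixed while each marker is tested. The code below is an illustration under stated assumptions, not the published or TASSEL implementation: the function names `null_heritability` and `p3d_scan` are hypothetical, the variance ratio is found by a simple maximum-likelihood grid search (swapped in here for brevity), and the simulated data are invented for the example.

```python
import numpy as np

def null_heritability(y, X, K, grid=None):
    """Grid-search the ML estimate of h2 under the null model (no marker).

    Assumes Var(y) is proportional to h2*K + (1 - h2)*I with K positive
    semi-definite.
    """
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    n = len(y)
    evals, U = np.linalg.eigh(K)
    yt, Xt = U.T @ y, U.T @ X  # rotation makes the covariance diagonal
    best_h2, best_ll = None, -np.inf
    for h2 in grid:
        d = h2 * evals + (1.0 - h2)
        w = 1.0 / d
        beta = np.linalg.solve(Xt.T @ (w[:, None] * Xt), Xt.T @ (w * yt))
        r = yt - Xt @ beta
        sigma2 = np.sum(w * r**2) / n
        ll = -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(np.log(d)) + n)
        if ll > best_ll:
            best_h2, best_ll = h2, ll
    return best_h2

def p3d_scan(y, X, K, markers):
    """Test each marker by generalized least squares with h2 fixed at its
    null-model estimate, so no variance components are re-estimated."""
    h2 = null_heritability(y, X, K)
    evals, U = np.linalg.eigh(K)
    w = 1.0 / (h2 * evals + (1.0 - h2))
    yt, Xt = U.T @ y, U.T @ X
    stats = []
    for m in markers.T:
        Z = np.column_stack([Xt, U.T @ m])
        A = Z.T @ (w[:, None] * Z)
        beta = np.linalg.solve(A, Z.T @ (w * yt))
        r = yt - Z @ beta
        sigma2 = np.sum(w * r**2) / (len(y) - Z.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(A)[-1, -1])
        stats.append(beta[-1] / se)  # t-like statistic for the marker effect
    return h2, np.array(stats)

# Simulated example (all values invented for illustration).
rng = np.random.default_rng(0)
n = 200
background = rng.integers(0, 3, size=(n, 100)).astype(float)
B = (background - background.mean(axis=0)) / background.std(axis=0)
K = B @ B.T / B.shape[1]  # marker-based kinship from background loci
markers = rng.integers(0, 3, size=(n, 20)).astype(float)
causal = 7
y = 1.5 * markers[:, causal] + rng.normal(size=n)
X = np.ones((n, 1))  # intercept only

h2, t_stats = p3d_scan(y, X, K, markers)
```

The eigendecomposition of `K` and the estimate of `h2` are computed once, so each marker test costs only a small weighted least-squares fit. That is the source of the time saving relative to re-optimizing the variance components for every marker.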

Acknowledgements

This study was supported by the US National Science Foundation (NSF)–Plant Genome Program (DBI-0321467, 0703908 and 0820619), NSF–Plant Genome Comparative Sequencing Program (DBI-06638566), US National Institutes of Health (1R21AR055228-01A1), National Heart, Lung, and Blood Institute (U 01 HL72524, HL54776 and 5U01HL072524-06), US Department of Agriculture Research Service (53-K06–5-10 and 58–1950-9–001), USDA–Cooperative State Research, Education and Extension Service National Research Initiative (2006-35300-17155), Morris Animal Foundation (D04CA-135), WALTHAM Centre for Pet Nutrition, Cornell Advanced Technology in Biotechnology and the Collaborative Research Program in the Cornell Veterinary College. The authors would like to thank K. Zhao for providing the source code to compute kinship and L. Rigamer Lirette, A.L. Ingham and S. Myles for editing of the manuscript.

Author information

Z.Z. conceptualized the study, performed the data analyses and wrote the manuscript. E.E., M.A.G. and J.Y. participated in the data analyses and wrote the manuscript. P.J.B. implemented the two new methods in the TASSEL software package. C.L., H.K.T., D.K.A. and J.M.O. provided the human data and supervised its analyses. R.J.T. provided the dog data and supervised its analyses. E.S.B. designed and supervised the project. All authors edited the manuscript.

Correspondence to Zhiwu Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Note (PDF 1425 kb)

About this article

Cite this article

Zhang, Z., Ersoz, E., Lai, C. et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42, 355–360 (2010). https://doi.org/10.1038/ng.546
