Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness in genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods, both independently and in combination, to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness of these methods in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
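The compression idea described above, replacing individuals with groups so that the random-effect dimension shrinks, can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or how group-level kinship is derived, so the choices here (greedy average-linkage merging, and group kinship taken as the mean of the member kinships) are assumptions for illustration only. The function names `mean_kinship` and `compress` are hypothetical and are not part of TASSEL.

```python
from itertools import combinations


def mean_kinship(K, a, b):
    """Average kinship between the members of groups a and b."""
    return sum(K[i][j] for i in a for j in b) / (len(a) * len(b))


def compress(K, n_groups):
    """Cluster individuals into n_groups groups (illustrative assumption:
    greedy average-linkage merging on a symmetric kinship matrix K).

    Returns the groups (lists of individual indices) and the reduced
    group-by-group kinship matrix.
    """
    groups = [[i] for i in range(len(K))]
    while len(groups) > n_groups:
        # Merge the pair of groups with the highest mean kinship.
        a, b = max(
            combinations(range(len(groups)), 2),
            key=lambda p: mean_kinship(K, groups[p[0]], groups[p[1]]),
        )
        merged = groups[a] + groups[b]
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [merged]
    # Assumption: group kinship is the mean kinship over member pairs.
    group_K = [[mean_kinship(K, g, h) for h in groups] for g in groups]
    return groups, group_K


# Toy kinship matrix: individuals 0-1 and 2-3 are close relatives.
K = [
    [1.0, 0.5, 0.0, 0.1],
    [0.5, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.6],
    [0.1, 0.0, 0.6, 1.0],
]
groups, group_K = compress(K, 2)
```

In this toy case the 4 x 4 kinship matrix is reduced to a 2 x 2 group kinship matrix, which is what makes the downstream mixed-model computations cheaper. Under P3D, as the abstract describes it, the variance components would then be estimated once and reused for each marker test rather than re-estimated; that step is not shown here.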
This study was supported by the US National Science Foundation (NSF)–Plant Genome Program (DBI-0321467, 0703908 and 0820619), NSF–Plant Genome Comparative Sequencing Program (DBI-06638566), US National Institutes of Health (1R21AR055228-01A1), National Heart, Lung, and Blood Institute (U 01 HL72524, HL54776 and 5U01HL072524-06), US Department of Agriculture Research Service (53-K06–5-10 and 58–1950-9–001), USDA–Cooperative State Research, Education and Extension Service National Research Initiative (2006-35300-17155), Morris Animal Foundation (D04CA-135), WALTHAM Centre for Pet Nutrition, Cornell Advanced Technology in Biotechnology and the Collaborative Research Program in the Cornell Veterinary College. The authors would like to thank K. Zhao for providing the source code to compute kinship and L. Rigamer Lirette, A.L. Ingham and S. Myles for editing of the manuscript.
The authors declare no competing financial interests.
About this article
Cite this article
Zhang, Z., Ersoz, E., Lai, C. et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42, 355–360 (2010). https://doi.org/10.1038/ng.546