We thank Sul and Eskin (Mixed models can correct for population structure for genomic regions under selection. Nature Reviews Genetics 26 Feb 2013 (doi:10.1038/nrg2813-c1))¹ for carefully examining and confirming the limitation of standard mixed model association methods that we identified in our Progress article (New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics 11, 459–463 (2010))² and for developing an interesting new way to address it.
In our article², we investigated the limits of mixed model methods by considering an extreme simulation in which most markers had low population differentiation (FST = 0.01), but a small fraction of markers were unusually differentiated (allele frequency difference = 0.6). We found that standard mixed model methods³ did not fully correct for population structure, but mixed models with principal component covariates⁴ did fully correct for population structure. We stated that “population structure is a fixed effect, and spurious associations might result if it is modelled as a random effect based on overall covariance”.
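A simulation of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the code used in our article: the two-population design, the Balding-Nichols beta draw for the background markers, the sample size, the marker count, the 1% fraction of differentiated markers and the function name `simulate_genotypes` are all hypothetical choices; only FST = 0.01 and the allele frequency difference of 0.6 come from the text above.

```python
import numpy as np

def simulate_genotypes(n_per_pop=100, n_markers=1000, frac_diff=0.01,
                       fst=0.01, freq_diff=0.6, seed=0):
    """Simulate diploid genotypes (0/1/2) for two populations.

    Most markers are weakly differentiated (Balding-Nichols model with the
    given FST); a small fraction differ in allele frequency by `freq_diff`.
    All defaults except fst and freq_diff are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_diff = int(round(frac_diff * n_markers))
    n_bg = n_markers - n_diff

    # Background markers: each population's frequency is drawn from a beta
    # distribution centred on a shared ancestral frequency.
    anc = rng.uniform(0.1, 0.9, size=n_bg)
    a = anc * (1 - fst) / fst
    b = (1 - anc) * (1 - fst) / fst
    p_bg = rng.beta(a, b, size=(2, n_bg))

    # Unusually differentiated markers: fixed frequencies 0.5 +/- freq_diff/2
    # (0.8 versus 0.2 for freq_diff = 0.6).
    p_hi = np.tile(np.array([[0.5 + freq_diff / 2],
                             [0.5 - freq_diff / 2]]), (1, n_diff))

    freqs = np.hstack([p_bg, p_hi])             # shape (2, n_markers)
    labels = np.repeat([0, 1], n_per_pop)       # population of each sample
    genotypes = rng.binomial(2, freqs[labels])  # shape (2*n_per_pop, n_markers)
    is_diff = np.arange(n_markers) >= n_bg      # flags differentiated markers
    return genotypes, labels, is_diff
```

The returned `is_diff` mask records which markers were simulated as unusually differentiated, so that methods can be compared separately on the two marker classes.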
Sul and Eskin¹ have confirmed that, in this extreme simulation, standard mixed model methods do not fully correct for population structure and that mixed models with principal component covariates do fully correct for population structure. They also investigated a new approach, which is to use a mixed model with two kinship matrices: one computed using unusually differentiated markers identified by their spatial ancestry analysis (SPA) method⁵, and one computed using the remaining markers. They reported that this approach also fully corrects for population structure in this simulation. Thus, population stratification (a fixed effect in this simulation) can be addressed using random effects in a way that we had not previously considered: our review considered only mixed models with a single random effect based on overall covariance³,⁴,⁶,⁷,⁸ but did not consider mixed models with multiple random effects¹.
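The construction of the two kinship matrices can be illustrated with a short sketch. It assumes that the set of unusually differentiated markers is already available as a boolean mask; the SPA method itself is not implemented here. The standardized-genotype kinship estimator and the function names `standardized_kinship` and `two_kinship_matrices` are our illustrative assumptions and are not taken from Sul and Eskin.

```python
import numpy as np

def standardized_kinship(genotypes):
    """Kinship estimate K = Z Z^T / M from column-standardized genotypes."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]                # drop monomorphic markers
    z = (g - g.mean(axis=0)) / g.std(axis=0)   # standardize each marker
    return z @ z.T / z.shape[1]

def two_kinship_matrices(genotypes, is_differentiated):
    """Return (K_diff, K_rest) for a given marker partition.

    `is_differentiated` is a boolean mask over markers; how it is obtained
    (for example, by SPA) is outside the scope of this sketch.
    """
    g = np.asarray(genotypes)
    mask = np.asarray(is_differentiated, dtype=bool)
    return standardized_kinship(g[:, mask]), standardized_kinship(g[:, ~mask])
```

Each matrix would then enter the mixed model as the covariance structure of its own random effect, with a separate variance component estimated for each.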
Another possibility, very similar to the Sul and Eskin¹ approach, is to use a mixed model with two kinship matrices: one computed from principal component 1, and one computed using the remaining principal components; this approach is based on the natural decomposition of a kinship matrix into its principal components⁹. This would also fully correct for population structure in this extreme simulation, as Sul and Eskin¹ showed that using a single kinship matrix computed from principal component 1 fully corrects for population structure.
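The decomposition referred to here can be written as K = Σᵢ λᵢ vᵢ vᵢᵀ, where λᵢ and vᵢ are the eigenvalues and eigenvectors of the kinship matrix K. A minimal sketch of the split follows; the choice of a standardized-genotype kinship estimator and the function names `kinship` and `split_by_top_pc` are illustrative assumptions, and fitting the resulting two-component mixed model is not shown.

```python
import numpy as np

def kinship(genotypes):
    """Kinship estimate K = Z Z^T / M from column-standardized genotypes."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]                # drop monomorphic markers
    z = (g - g.mean(axis=0)) / g.std(axis=0)   # standardize each marker
    return z @ z.T / z.shape[1]

def split_by_top_pc(K):
    """Split K into the part explained by principal component 1 and the rest.

    Returns (K_pc1, K_rest, v1) with K_pc1 + K_rest equal to K, where
    K_pc1 = lambda_1 * v1 v1^T is the rank-one term of the eigendecomposition.
    """
    eigvals, eigvecs = np.linalg.eigh(K)       # eigenvalues in ascending order
    lam1, v1 = eigvals[-1], eigvecs[:, -1]     # top eigenpair
    K_pc1 = lam1 * np.outer(v1, v1)
    K_rest = K - K_pc1                         # sum of the remaining terms
    return K_pc1, K_rest, v1
```

Because the eigenvectors are orthogonal, `K_rest` carries no variance along principal component 1, so the two matrices partition the overall covariance without overlap.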
A broader question is whether the limitation of standard mixed model methods that arises in this extreme simulation is a major concern in empirical studies. In our article², we stated that standard mixed model methods are an appealing and simple approach and are sufficient to correct for stratification in many settings. Sul and Eskin¹ indicated that the limitation we described did not arise in the Finnish and UK data sets that they analysed. We agree that mixed models with a single random effect based on overall covariance will probably be sufficient to correct for population structure fully in most settings.
Finally, we note that recent work has raised additional points about mixed model methods, including inclusion versus exclusion of the candidate marker in the kinship matrix, use of only a small subset of markers in computing the kinship matrix and effects of case–control ascertainment¹⁰,¹¹,¹²,¹³. We believe that these are important points that merit further investigation, but they are outside the scope of the current Correspondence.
1. Sul, J. H. & Eskin, E. Mixed models can correct for population structure for genomic regions under selection. Nature Rev. Genet. 26 Feb 2013 (doi:10.1038/nrg2813-c1).
2. Price, A. L., Zaitlen, N. A., Reich, D. & Patterson, N. New approaches to population stratification in genome-wide association studies. Nature Rev. Genet. 11, 459–463 (2010).
3. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nature Genet. 42, 348–354 (2010).
4. Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nature Genet. 42, 355–360 (2010).
5. Yang, W. Y., Novembre, J., Eskin, E. & Halperin, E. A model-based approach for analysis of spatial structure in genetic data. Nature Genet. 44, 725–731 (2012).
6. Segura, V. et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature Genet. 44, 825–830 (2012).
7. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nature Genet. 44, 821–824 (2012).
8. Vilhjalmsson, B. J. & Nordborg, M. The nature of confounding in genome-wide association studies. Nature Rev. Genet. 14, 1–2 (2013).
9. Janss, L., de Los Campos, G., Sheehan, N. & Sorensen, D. Inferences from genomic models in stratified populations. Genetics 192, 693–704 (2012).
10. Sawcer, S. et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219 (2011).
11. Lippert, C. et al. FaST linear mixed models for genome-wide association studies. Nature Methods 8, 833–835 (2011).
12. Listgarten, J. et al. Improved linear mixed models for genome-wide association studies. Nature Methods 9, 525–526 (2012).
13. Mefford, J. & Witte, J. S. The covariate's dilemma. PLoS Genet. 8, e1003096 (2012).
We are grateful to E. Eskin, P. Visscher, J. Yang and M. Goddard for helpful discussions.
The authors declare no competing financial interests.
About this article
Cite this article
Price, A., Zaitlen, N., Reich, D. et al. Response to Sul and Eskin. Nat Rev Genet 14, 300 (2013). https://doi.org/10.1038/nrg2813-c2