
Testing for genetic associations in arbitrarily structured populations


We present a new statistical test of association between a trait and genetic markers, which we theoretically and practically prove to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as those measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called a 'genotype-conditional association test' (GCAT), shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and non-genetic contributions to the trait. We demonstrate the proposed method on a large simulation study and on the Northern Finland Birth Cohort study. In the Finland study, we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods, such as the linear mixed-model and principal-component approaches.
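The abstract does not spell out the mechanics of the test, so the following is only a minimal sketch of one way a genotype-conditional test could look, under stated assumptions rather than the authors' implementation. It assumes that each genotype (0, 1 or 2 copies of an allele) is modelled as Binomial(2, p), with logit(p) linear in population-structure covariates and the trait, and that association is assessed by a likelihood-ratio test on the trait coefficient. The structure covariate `H` is taken as known here, whereas in practice it would have to be estimated from genome-wide genotype data. All function names (`solve`, `max_loglik`, `genotype_conditional_pvalue`, `simulate`) and the simulation settings are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def max_loglik(X, g, iters=30):
    """Newton-Raphson fit of g_j ~ Binomial(2, p_j), logit(p_j) = X_j . beta.

    Returns the maximized log-likelihood, dropping the constant
    binomial-coefficient term (it cancels in the likelihood ratio).
    """
    d = len(X[0])
    beta = [0.0] * d

    def eta_of(xj):
        # Clamp the linear predictor to keep exp() well behaved.
        return max(-30.0, min(30.0, sum(b * x for b, x in zip(beta, xj))))

    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for xj, gj in zip(X, g):
            p = 1.0 / (1.0 + math.exp(-eta_of(xj)))
            w = 2.0 * p * (1.0 - p)
            for a in range(d):
                grad[a] += (gj - 2.0 * p) * xj[a]
                for c in range(d):
                    hess[a][c] += w * xj[a] * xj[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return sum(gj * eta_of(xj) - 2.0 * math.log1p(math.exp(eta_of(xj)))
               for xj, gj in zip(X, g))

def genotype_conditional_pvalue(g, y, H):
    """Likelihood-ratio test of the trait term in the genotype model.

    g: genotypes (0/1/2); y: trait values; H: per-individual lists of
    structure covariates (assumed known in this sketch).
    """
    null_X = [[1.0] + list(h) for h in H]
    full_X = [row + [yj] for row, yj in zip(null_X, y)]
    stat = max(0.0, 2.0 * (max_loglik(full_X, g) - max_loglik(null_X, g)))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def simulate(n, effect, seed):
    """Toy data: one latent variable drives both allele frequency and trait."""
    rng = random.Random(seed)
    g, y, H = [], [], []
    for _ in range(n):
        h = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-0.8 * h))
        gj = int(rng.random() < p) + int(rng.random() < p)
        g.append(gj)
        y.append(effect * gj + h + rng.gauss(0.0, 1.0))
        H.append([h])
    return g, y, H

g, y, H = simulate(n=1000, effect=0.5, seed=1)
print(genotype_conditional_pvalue(g, y, H))
```

In this toy setup, setting `effect=0.0` gives a marker that is correlated with the trait only through the shared latent variable, which is the situation where conditioning on structure is meant to prevent spurious associations.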


Figure 1: Rationale for the proposed test of association.
Figure 2: Performance of the association testing methods.




This research was supported in part by US National Institutes of Health grant R01 HG006448. The NFBC data were collected by the STAMPEED: Northern Finland Birth Cohort 1966 (NFBC1966) GWAS and made available through the database of Genotypes and Phenotypes (dbGaP), study accession phs000276.v2.p1. A full list of contributors to the STAMPEED study can be found on its dbGaP website.

Author information




J.D.S. designed the study and wrote the manuscript. J.D.S. and M.S. developed statistical theory and methods. W.H., J.D.S. and M.S. designed the simulations. W.H. analyzed the data and developed the software.

Corresponding author

Correspondence to John D Storey.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Figures 1–18 and Supplementary Tables 1 and 2. (PDF 14034 kb)


Cite this article

Song, M., Hao, W. & Storey, J. Testing for genetic associations in arbitrarily structured populations. Nat Genet 47, 550–554 (2015).
