Principal components analysis corrects for stratification in genome-wide association studies

Abstract

Population stratification—allele frequency differences between cases and controls due to systematic ancestry differences—can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
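The abstract does not give the computation itself, so the following is only a minimal sketch of how a PCA-based adjustment of this kind can be set up. It infers axes of variation from a genotype matrix, removes the ancestry-explained part of both the candidate marker and the phenotype, and then tests the residuals for association. The function names, the per-SNP standardization, the default of two axes, and the (n - k - 1) r^2 statistic are assumptions chosen for illustration and are not taken from the text above.

```python
import numpy as np

def ancestry_axes(genotypes, k=2):
    """Top-k axes of variation from an (n_samples, n_snps) matrix of 0/1/2 counts."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)              # center each SNP
    sd = g.std(axis=0)
    keep = sd > 0                       # drop monomorphic SNPs
    g = g[:, keep] / sd[keep]           # assumed normalization: unit variance per SNP
    # Left singular vectors give one coordinate per sample on each axis.
    u, _, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :k]

def residualize(x, axes):
    """Remove from x the part explained by an intercept plus the ancestry axes."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x)), axes])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef

def adjusted_statistic(snp, phenotype, axes):
    """Association statistic after adjusting marker and phenotype for ancestry.

    Assumed form: (n - k - 1) times the squared correlation of the residuals.
    """
    g_adj = residualize(snp, axes)
    p_adj = residualize(phenotype, axes)
    r = np.corrcoef(g_adj, p_adj)[0, 1]
    dof = len(g_adj) - axes.shape[1] - 1
    return dof * r ** 2
```

Because the marker is residualized on the same axes as the phenotype, the size of the adjustment depends on how strongly that particular marker varies along the inferred ancestry axes, which is the marker-specific behaviour the abstract describes. A marker with little frequency variation across ancestries is left almost unchanged.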

Figure 1: The EIGENSTRAT algorithm, illustrated on simulated data.
Figure 2: The top two axes of variation of European American samples.

References

  1. Lander, E.S. & Schork, N.J. Genetic dissection of complex traits. Science 265, 2037–2048 (1994).
  2. Lohmueller, K. et al. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat. Genet. 33, 177–182 (2003).
  3. Freedman, M. et al. Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36, 388–393 (2004).
  4. Marchini, J. et al. The effects of human population structure on large genetic association studies. Nat. Genet. 36, 512–517 (2004).
  5. Helgason, A. et al. An Icelandic example of the impact of population structure on association studies. Nat. Genet. 37, 90–95 (2005).
  6. Campbell, C.D. et al. Demonstrating stratification in a European American population. Nat. Genet. 37, 868–872 (2005).
  7. Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95–108 (2005).
  8. Thomas, D.C. et al. Recent developments in genomewide association scans: a workshop summary and review. Am. J. Hum. Genet. 77, 337–345 (2005).
  9. Reich, D. & Goldstein, D. Detecting association in a case-control study while allowing for population stratification. Genet. Epidemiol. 20, 4–16 (2001).
  10. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
  11. Devlin, B. et al. Genomic control to the extreme. Nat. Genet. 36, 1129–1130 (2004).
  12. Pritchard, J.K. et al. Association mapping in structured populations. Am. J. Hum. Genet. 67, 170–181 (2000).
  13. Satten, G. et al. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am. J. Hum. Genet. 68, 466–477 (2001).
  14. Setakis, E., Stirnadel, H. & Balding, D.J. Logistic regression protects against population structure in genetic association studies. Genome Res. 16, 290–296 (2006).
  15. Pritchard, J.K. et al. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
  16. Serre, D. & Paabo, S. Evidence for gradients of human genetic diversity within and among continents. Genome Res. 14, 1679–1685 (2004).
  17. Jackson, J.E. A User's Guide to Principal Components (John Wiley & Sons, New York, 2003).
  18. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978).
  19. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. Demic expansions and human evolution. Science 259, 639–646 (1993).
  20. Johnstone, I. On the distribution of the largest eigenvalue in principal components analysis. Ann. Stat. 29, 295–327 (2001).
  21. Soshnikov, A. A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices. J. Stat. Phys. 108, 1033–1056 (2002).
  22. Baik, J., Ben Arous, G. & Peche, S. Phase transition of the largest eigenvalue for non-null complex sample covariance matrices. Ann. Probab. 33, 1643–1697 (2005).
  23. Rosenberg, N.A. et al. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genetics 1, 660–671 (2005).
  24. Pritchard, J.K. & Donnelly, P. Case-control studies of association in structured or admixed populations. Theor. Popul. Biol. 60, 227–237 (2001).
  25. Balding, D.J. & Nichols, R.A. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96, 3–12 (1995).
  26. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton Univ. Press, Princeton, New Jersey, 1994).
  27. Nicholson, G. et al. Assessing population differentiation and isolation from single-nucleotide polymorphism data. J. R. Statist. Soc. (B) 64, 695–715 (2002).
  28. Bersaglieri, T. et al. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74, 1111–1120 (2004).
  29. Armitage, P. Tests for linear trends in proportions and frequencies. Biometrics 11, 375–386 (1955).
  30. Enattah, N.S. et al. Identification of a variant associated with adult-type hypolactasia. Nat. Genet. 30, 233–237 (2002).
  31. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
  32. Cimmino, M.A. et al. Prevalence of rheumatoid arthritis in Italy: the Chiavari study. Ann. Rheum. Dis. 57, 315–318 (1998).
  33. Rosati, G. The prevalence of multiple sclerosis in the world: an update. Neurol. Sci. 22, 117–139 (2001).
  34. Panza, F. et al. Shifts in angiotensin I converting enzyme insertion allele frequency across Europe: implications for Alzheimer's disease risk. J. Neurol. Neurosurg. Psychiatry 74, 1159–1161 (2003).
  35. Bernardi, F. et al. Contribution of factor VII genotype to activated FVII levels. Differences in genotype frequencies between northern and southern European populations. Arterioscler. Thromb. Vasc. Biol. 17, 2548–2553 (1997).
  36. Angastiniotis, M. & Modell, B. Global epidemiology of hemoglobin disorders. Ann. NY Acad. Sci. 850, 251–269 (1998).
  37. Clayton, D.G. et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat. Genet. 37, 1243–1246 (2005).
  38. Wright, S. The genetical structure of populations. Ann. Eugen. 15, 323–354 (1951).
  39. Benito-Garcia, E. et al. Dietary caffeine does not affect methotrexate efficacy in rheumatoid arthritis patients. J. Rheumatol. (in the press).

Acknowledgements

The authors are grateful to B. Blumenstiel, M. DeFelice, M. Parkin, R. Barry, W. Winslow, C. Healy and S. Gabriel for generation of the Affymetrix genotype data. We are grateful to the BRASS study participants, the BRASS study team, and our rheumatology colleagues at the Brigham and Women's Hospital Arthritis Center. We thank C. Campbell and J. Hirschhorn for helpful comments and sharing data from their paper (ref. 6). The BRASS study was supported by a grant from Millennium Pharmaceuticals. D.R. is supported in part by a Burroughs Wellcome Career Development Award in the Biomedical Sciences.

Author information

Corresponding author

Correspondence to Alkes L Price.

Ethics declarations

Competing interests

M.E.W. serves as a consultant to Millennium Pharmaceuticals; the BRASS study, which produced a data set described in the paper, was supported by a grant from Millennium Pharmaceuticals.

Supplementary information

Supplementary Fig. 1

P-P plot of EIGENSTRAT test statistics. (PDF 429 kb)

Supplementary Table 1

Simulations using K axes of variation. (PDF 58 kb)

Supplementary Table 2

Simulations using M SNPs. (PDF 66 kb)

Supplementary Table 3

Simulations of Pritchard and Donnelly. (PDF 68 kb)

Supplementary Table 4

Simulations with no stratification and n subpopulations. (PDF 73 kb)

Supplementary Table 5

Stratification correction at rs10511418 using M SNPs. (PDF 73 kb)

Supplementary Note (PDF 207 kb)

About this article

Cite this article

Price, A., Patterson, N., Plenge, R. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904–909 (2006). https://doi.org/10.1038/ng1847
