  • Letter
Interpreting principal component analyses of spatial population genetic variation

Abstract

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions [1]. They interpreted gradient and wave patterns in these maps as signatures of specific migration events [1–3]. These interpretations have been controversial [4–7], but influential [8], and the use of PCA has become widespread in analysis of population genetics data [9–13]. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
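
The sinusoidal behavior described in the abstract can be illustrated with a small numerical sketch. This is not the authors' analysis: it assumes a toy one-dimensional habitat in which the covariance between demes decays with distance, built here from a path-graph Laplacian with a hypothetical smoothing parameter `alpha`. The function names are illustrative only. No migration event is modeled, yet the leading principal components come out as cosine waves.

```python
import numpy as np


def spatial_covariance(n_pops: int, alpha: float = 25.0) -> np.ndarray:
    """Toy covariance among demes on a line (an assumption, not the paper's model)."""
    # Path-graph Laplacian: each deme is linked to its neighbours,
    # and the two end demes have a single neighbour.
    lap = 2.0 * np.eye(n_pops) - np.eye(n_pops, k=1) - np.eye(n_pops, k=-1)
    lap[0, 0] = lap[-1, -1] = 1.0
    # Inverting (I + alpha * L) gives positive covariances that shrink with distance.
    return np.linalg.inv(np.eye(n_pops) + alpha * lap)


def principal_components(cov: np.ndarray, n_pcs: int) -> np.ndarray:
    """Leading eigenvectors of the double-centred covariance (one PC per column)."""
    n = cov.shape[0]
    centre = np.eye(n) - np.full((n, n), 1.0 / n)
    _, vecs = np.linalg.eigh(centre @ cov @ centre)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :n_pcs]


def cosine_wave(n_pops: int, k: int) -> np.ndarray:
    """Unit-norm cosine with k half-periods across the habitat (a DCT-II basis vector)."""
    wave = np.cos(np.pi * k * (np.arange(n_pops) + 0.5) / n_pops)
    return wave / np.linalg.norm(wave)


n_pops = 30
cov = spatial_covariance(n_pops)
pcs = principal_components(cov, n_pcs=3)

# Absolute inner product between each PC and the matching cosine (1.0 = same shape).
matches = [abs(pcs[:, k - 1] @ cosine_wave(n_pops, k)) for k in (1, 2, 3)]
print(np.round(matches, 3))
```

In this toy model PC1 is a half-period cosine (a gradient) and PC2 a full-period cosine (a wave), because the cosines are eigenvectors of the path-graph Laplacian. Other distance-decaying covariances would be expected to give similar, though not identical, shapes.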


Figure 1: Comparison of PC maps of ref. 3 with theoretical and empirical predictions.
Figure 2: Results of PCA applied to data from a one-dimensional habitat.

References

  1. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978).

  2. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. Demic expansions and human evolution. Science 259, 639–646 (1993).

  3. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton University Press, Princeton, New Jersey, USA, 1994).

  4. Rendine, S., Piazza, A. & Cavalli-Sforza, L.L. Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128, 681–706 (1986).

  5. Sokal, R.R., Oden, N.L. & Thomson, B.A. A problem with synthetic maps. Hum. Biol. 71, 1–13 (1999).

  6. Rendine, S., Piazza, A. & Cavalli-Sforza, L.L. A problem with synthetic maps: Reply to Sokal et al. Hum. Biol. 71, 15–25 (1999).

  7. Currat, M. & Excoffier, L. The effect of the Neolithic expansion on European molecular diversity. Proc. Biol. Sci. 272, 679–688 (2005).

  8. Jobling, M., Hurles, M. & Tyler-Smith, C. Human Evolutionary Genetics (Garland Science, New York, 2004).

  9. Hanotte, O. et al. African pastoralism: genetic imprints of origins and migrations. Science 296, 336–339 (2002).

  10. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  11. Patterson, N., Price, A. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

  12. Bauchet, M. et al. Measuring European population stratification with microarray genotype data. Am. J. Hum. Genet. 80, 948–956 (2007).

  13. Linz, B. et al. An African origin for the intimate association between humans and Helicobacter pylori. Nature 445, 915–918 (2007).

  14. Ahmed, N., Natarajan, T. & Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. C-23, 90–93 (1974).

  15. Brillinger, D.R. Time Series: Data Analysis and Theory (Holt, Rinehart, and Winston, New York, 1975).

  16. Podani, J. & Miklos, I. Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 83, 3331–3343 (2002).

  17. Richman, M.B. Rotation of principal components. J. Climatol. 6, 293–335 (1986).

  18. Heidemann, G. The principal components of natural images revisited. IEEE Trans. Pattern Anal. Mach. Intell. 28, 822–826 (2006).

  19. Freiberger, W., ed. The International Dictionary of Applied Mathematics (D. Van Nostrand Co., Princeton, New Jersey, USA, 1960).

  20. Diaconis, P., Goel, S. & Holmes, S. Horseshoes in multidimensional scaling and kernel methods. Ann. Appl. Stat. (in the press).

  21. Zhao, K. et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4 (2007).

  22. Irwin, D.E., Bensch, S., Irwin, J.H. & Price, T.D. Speciation by distance in a ring species. Science 307, 414–416 (2005).

  23. Handley, L.J.L., Manica, A., Goudet, J. & Balloux, F. Going the distance: human population genetics in a clinal world. Trends Genet. 23, 432–439 (2007).

  24. Semino, O. et al. Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the Neolithization of Europe and later migratory events in the Mediterranean area. Am. J. Hum. Genet. 74, 1023–1034 (2004).

  25. Haak, W. et al. Ancient DNA from the first European farmers in 7500-year-old Neolithic sites. Science 310, 1016–1018 (2005).

  26. Pinhasi, R., Fort, J. & Ammerman, A.J. Tracing the origin and spread of agriculture in Europe. PLoS Biol. 3, e410 (2005).

  27. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

  28. Zhu, X., Zhang, S., Zhao, H. & Cooper, R.S. Association mapping, using a mixture model for complex traits. Genet. Epidemiol. 23, 181–196 (2002).

  29. Pritchard, J.K. & Rosenberg, N.A. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65, 220–228 (1999).

  30. Wakefield, J. Disease mapping and spatial regression with count data. Biostatistics 8, 158–183 (2007).

Acknowledgements

We thank G. Coop, M. Przeworski, D. Reich, N. Patterson, Y. Guan, M. Barber and C. Becquet for helpful comments and D. Witonsky for pointing out the connection to Lissajous figures. Funding for this research was provided by a US National Science Foundation postdoctoral research fellowship in bioinformatics (J.N.) and US National Institutes of Health grant RO1 HG02585-01 (M.S.).

Author information

Contributions

J.N. and M.S. jointly designed the analyses and interpreted results. J.N. performed the analyses. J.N. and M.S. wrote the paper.

Corresponding author

Correspondence to Matthew Stephens.

Supplementary information

Supplementary Text and Figures

Supplementary Methods, Supplementary Note and Supplementary Figures 1–9 (PDF 2227 kb)

About this article

Cite this article

Novembre, J., Stephens, M. Interpreting principal component analyses of spatial population genetic variation. Nat Genet 40, 646–649 (2008). https://doi.org/10.1038/ng.139
