Letter

Interpreting principal component analyses of spatial population genetic variation

Nature Genetics volume 40, pages 646–649 (2008)

Abstract

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions [1]. They interpreted gradient and wave patterns in these maps as signatures of specific migration events [1–3]. These interpretations have been controversial [4–7], but influential [8], and the use of PCA has become widespread in analysis of population genetics data [9–13]. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
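The claim that sinusoidal patterns arise generally when PCA is applied to spatial data can be illustrated with a small numerical sketch. It is not the authors' analysis: it assumes a toy setting of populations on a one-dimensional line whose genetic covariance decays geometrically with distance, and the function names (`spatial_pcs`, `cosine_basis`, `pc_cosine_similarity`) and the parameters `n_sites` and `rho` are hypothetical choices made only for this illustration. The sketch compares the leading principal components of that covariance with plain cosine waves.

```python
import numpy as np


def spatial_pcs(n_sites=50, rho=0.99, n_pcs=3):
    """Leading PCs of a toy 1-D spatial covariance (hypothetical helper)."""
    idx = np.arange(n_sites)
    # Assumption: covariance between sites decays geometrically with distance.
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    # Centre rows and columns, mimicking mean removal before PCA.
    h = np.eye(n_sites) - np.ones((n_sites, n_sites)) / n_sites
    cov_centered = h @ cov @ h
    # eigh returns eigenvalues in ascending order; reverse to get the top PCs.
    _, vecs = np.linalg.eigh(cov_centered)
    return vecs[:, ::-1][:, :n_pcs]


def cosine_basis(n_sites, k):
    """Cosine wave with k half-periods across the line, scaled to unit length."""
    x = (np.arange(n_sites) + 0.5) / n_sites
    c = np.cos(np.pi * k * x)
    return c / np.linalg.norm(c)


def pc_cosine_similarity(n_sites=50, rho=0.99, n_pcs=3):
    """Absolute cosine similarity between PC k and the k-half-period cosine."""
    pcs = spatial_pcs(n_sites, rho, n_pcs)
    return [
        abs(float(pcs[:, k] @ cosine_basis(n_sites, k + 1)))
        for k in range(n_pcs)
    ]


if __name__ == "__main__":
    for k, sim in enumerate(pc_cosine_similarity(), start=1):
        print(f"PC{k} vs cosine with {k} half-period(s): |similarity| = {sim:.3f}")
```

Under these assumptions no migration event is simulated at all, yet the first component behaves like a gradient (half a cosine period) and the following ones like waves of increasing frequency, which is the kind of pattern the abstract cautions against over-interpreting.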

References

  1. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978).
  2. Demic expansions and human evolution. Science 259, 639–646 (1993).
  3. The History and Geography of Human Genes (Princeton University Press, Princeton, New Jersey, USA, 1994).
  4. Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128, 681–706 (1986).
  5. A problem with synthetic maps. Hum. Biol. 71, 1–13 (1999).
  6. A problem with synthetic maps: Reply to Sokal et al. Hum. Biol. 71, 15–25 (1999).
  7. The effect of the Neolithic expansion on European molecular diversity. Proc. Biol. Sci. 272, 679–688 (2005).
  8. Human Evolutionary Genetics (Garland Science, New York, 2004).
  9. African pastoralism: genetic imprints of origins and migrations. Science 296, 336–339 (2002).
  10. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
  11. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
  12. Measuring European population stratification with microarray genotype data. Am. J. Hum. Genet. 80, 948–956 (2007).
  13. An African origin for the intimate association between humans and Helicobacter pylori. Nature 445, 915–918 (2007).
  14. Discrete cosine transform. IEEE Trans. Comput. C-23, 90–93 (1974).
  15. Time Series: Data Analysis and Theory (Holt, Rinehart, and Winston, New York, 1975).
  16. Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 83, 3331–3343 (2002).
  17. Rotation of principal components. J. Climatol. 6, 293–335 (1986).
  18. The principal components of natural images revisited. IEEE Trans. Pattern Anal. Mach. Intell. 28, 822–826 (2006).
  19. The International Dictionary of Applied Mathematics (D. Van Nostrand Co., Princeton, New Jersey, USA, 1960).
  20. Horseshoes in multidimensional scaling and kernel methods. Ann. Appl. Stat. (in the press).
  21. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4 (2007).
  22. Speciation by distance in a ring species. Science 307, 414–416 (2005).
  23. Going the distance: human population genetics in a clinal world. Trends Genet. 23, 432–439 (2007).
  24. Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the Neolithization of Europe and later migratory events in the Mediterranean area. Am. J. Hum. Genet. 74, 1023–1034 (2004).
  25. Ancient DNA from the first European farmers in 7500-year-old Neolithic sites. Science 310, 1016–1018 (2005).
  26. Tracing the origin and spread of agriculture in Europe. PLoS Biol. 3, e410 (2005).
  27. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
  28. Association mapping, using a mixture model for complex traits. Genet. Epidemiol. 23, 181–196 (2002).
  29. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65, 220–228 (1999).
  30. Disease mapping and spatial regression with count data. Biostatistics 8, 158–183 (2007).

Acknowledgements

We thank G. Coop, M. Przeworski, D. Reich, N. Patterson, Y. Guan, M. Barber and C. Becquet for helpful comments and D. Witonsky for pointing out the connection to Lissajous figures. Funding for this research was provided by a US National Science Foundation postdoctoral research fellowship in bioinformatics (J.N.) and US National Institutes of Health grant RO1 HG02585-01 (M.S.).

Author information

Author notes

    • John Novembre

    Present address: Department of Ecology and Evolutionary Biology, University of California Los Angeles, 621 Charles E. Young Dr. South, Los Angeles, California 90095, USA.

Affiliations

  1. Department of Human Genetics, University of Chicago, 920 E. 58th Street, CLSC 5th floor, Chicago, Illinois 60637, USA.

    • John Novembre
    • Matthew Stephens
  2. Department of Statistics, University of Chicago, 5734 S. University Ave., Chicago, Illinois 60637, USA.

    • Matthew Stephens

Contributions

J.N. and M.S. jointly designed the analyses and interpreted results. J.N. performed the analyses. J.N. and M.S. wrote the paper.

Corresponding author

Correspondence to Matthew Stephens.

Supplementary information

  1. Supplementary Text and Figures (PDF): Supplementary Methods, Supplementary Note and Supplementary Figures 1–9

About this article

DOI

https://doi.org/10.1038/ng.139