Abstract
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions [1]. They interpreted gradient and wave patterns in these maps as signatures of specific migration events [1–3]. These interpretations have been controversial [4–7], but influential [8], and the use of PCA has become widespread in analysis of population genetics data [9–13]. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
References
1. Menozzi, P., Piazza, A. & Cavalli-Sforza, L.L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978).
2. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. Demic expansions and human evolution. Science 259, 639–646 (1993).
3. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton University Press, Princeton, New Jersey, USA, 1994).
4. Rendine, S., Piazza, A. & Cavalli-Sforza, L.L. Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128, 681–706 (1986).
5. Sokal, R.R., Oden, N.L. & Thomson, B.A. A problem with synthetic maps. Hum. Biol. 71, 1–13 (1999).
6. Rendine, S., Piazza, A. & Cavalli-Sforza, L.L. A problem with synthetic maps: Reply to Sokal et al. Hum. Biol. 71, 15–25 (1999).
7. Currat, M. & Excoffier, L. The effect of the Neolithic expansion on European molecular diversity. Proc. Biol. Sci. 272, 679–688 (2005).
8. Jobling, M., Hurles, M. & Tyler-Smith, C. Human Evolutionary Genetics (Garland Science, New York, 2004).
9. Hanotte, O. et al. African pastoralism: genetic imprints of origins and migrations. Science 296, 336–339 (2002).
10. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
11. Patterson, N., Price, A. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
12. Bauchet, M. et al. Measuring European population stratification with microarray genotype data. Am. J. Hum. Genet. 80, 948–956 (2007).
13. Linz, B. et al. An African origin for the intimate association between humans and Helicobacter pylori. Nature 445, 915–918 (2007).
14. Ahmed, N., Natarajan, T. & Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. C-23, 90–93 (1974).
15. Brillinger, D.R. Time Series: Data Analysis and Theory (Holt, Rinehart, and Winston, New York, 1975).
16. Podani, J. & Miklos, I. Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 83, 3331–3343 (2002).
17. Richman, M.B. Rotation of principal components. J. Climatol. 6, 293–335 (1986).
18. Heidemann, G. The principal components of natural images revisited. IEEE Trans. Pattern Anal. Mach. Intell. 28, 822–826 (2006).
19. Freiberger, W., ed. The International Dictionary of Applied Mathematics (D. Van Nostrand Co., Princeton, New Jersey, USA, 1960).
20. Diaconis, P., Goel, S. & Holmes, S. Horseshoes in multidimensional scaling and kernel methods. Ann. Appl. Stat. (in the press).
21. Zhao, K. et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4 (2007).
22. Irwin, D.E., Bensch, S., Irwin, J.H. & Price, T.D. Speciation by distance in a ring species. Science 307, 414–416 (2005).
23. Handley, L.J.L., Manica, A., Goudet, J. & Balloux, F. Going the distance: human population genetics in a clinal world. Trends Genet. 23, 432–439 (2007).
24. Semino, O. et al. Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the Neolithization of Europe and later migratory events in the Mediterranean area. Am. J. Hum. Genet. 74, 1023–1034 (2004).
25. Haak, W. et al. Ancient DNA from the first European farmers in 7500-year-old Neolithic sites. Science 310, 1016–1018 (2005).
26. Pinhasi, R., Fort, J. & Ammerman, A.J. Tracing the origin and spread of agriculture in Europe. PLoS Biol. 3, e410 (2005).
27. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
28. Zhu, X., Zhang, S., Zhao, H. & Cooper, R.S. Association mapping, using a mixture model for complex traits. Genet. Epidemiol. 23, 181–196 (2002).
29. Pritchard, J.K. & Rosenberg, N.A. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65, 220–228 (1999).
30. Wakefield, J. Disease mapping and spatial regression with count data. Biostatistics 8, 158–183 (2007).
Acknowledgements
We thank G. Coop, M. Przeworski, D. Reich, N. Patterson, Y. Guan, M. Barber and C. Becquet for helpful comments and D. Witonsky for pointing out the connection to Lissajous figures. Funding for this research was provided by a US National Science Foundation postdoctoral research fellowship in bioinformatics (J.N.) and US National Institutes of Health grant R01 HG02585-01 (M.S.).
Author information
Contributions
J.N. and M.S. jointly designed the analyses and interpreted results. J.N. performed the analyses. J.N. and M.S. wrote the paper.
Supplementary information
Supplementary Text and Figures
Supplementary Methods, Supplementary Note and Supplementary Figures 1–9 (PDF 2227 kb)
About this article
Cite this article
Novembre, J., Stephens, M. Interpreting principal component analyses of spatial population genetic variation. Nat Genet 40, 646–649 (2008). https://doi.org/10.1038/ng.139