Abstract
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions [1]. They interpreted gradient and wave patterns in these maps as signatures of specific migration events [1,2,3]. These interpretations have been controversial [4,5,6,7], but influential [8], and the use of PCA has become widespread in analysis of population genetics data [9,10,11,12,13]. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
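The claim that sinusoidal patterns arise generally when PCA is applied to spatial data can be illustrated with a small numerical sketch. This is not the authors' analysis: it assumes a hypothetical one-dimensional habitat of 50 sampling sites whose covariance decays as `rho ** distance`, and the names `spatial_pcs` and `cosine_basis` and the parameter values are chosen only for illustration. It compares the leading eigenvectors of that covariance matrix with the cosine functions of the discrete cosine transform.

```python
import numpy as np

def spatial_pcs(n_sites=50, rho=0.95, n_pcs=4):
    """Leading eigenvectors of a covariance matrix that decays with distance."""
    idx = np.arange(n_sites)
    # Assumption: covariance between sites i and j on a line is rho ** |i - j|.
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_pcs]   # keep the largest n_pcs
    return eigvecs[:, order]

def cosine_basis(n_sites, n_funcs):
    """Unit-norm cosine functions cos(pi * k * (i + 0.5) / n), k = 0..n_funcs-1."""
    idx = np.arange(n_sites)
    basis = np.cos(np.pi * np.outer(idx + 0.5, np.arange(n_funcs)) / n_sites)
    return basis / np.linalg.norm(basis, axis=0)

pcs = spatial_pcs()
cosines = cosine_basis(50, 4)
# Absolute cosine similarity between each PC and the matching cosine function
# (eigenvector signs are arbitrary, hence the absolute value).
similarity = np.abs(np.sum(pcs * cosines, axis=0))
print(np.round(similarity, 3))
```

Under these assumptions, each leading component closely tracks a cosine of increasing frequency even though no migration event is built into the covariance model. The first component is a gradient-free bump, the second a single gradient, and later ones are waves.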
References
1. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978).
2. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. Demic expansions and human evolution. Science 259, 639–646 (1993).
3. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton University Press, Princeton, New Jersey, USA, 1994).
4. Rendine, S., Piazza, A. & Cavalli-Sforza, L.L. Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128, 681–706 (1986).
5. Sokal, R.R., Oden, N.L. & Thomson, B.A. A problem with synthetic maps. Hum. Biol. 71, 1–13 (1999).
6. Rendine, S., Piazza, A. & Cavalli-Sforza, L.L. A problem with synthetic maps: Reply to Sokal et al. Hum. Biol. 71, 15–25 (1999).
7. Currat, M. & Excoffier, L. The effect of the Neolithic expansion on European molecular diversity. Proc. Biol. Sci. 272, 679–688 (2005).
8. Jobling, M., Hurles, M. & Tyler-Smith, C. Human Evolutionary Genetics (Garland Science, New York, 2004).
9. Hanotte, O. et al. African pastoralism: genetic imprints of origins and migrations. Science 296, 336–339 (2002).
10. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
11. Patterson, N., Price, A. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
12. Bauchet, M. et al. Measuring European population stratification with microarray genotype data. Am. J. Hum. Genet. 80, 948–956 (2007).
13. Linz, B. et al. An African origin for the intimate association between humans and Helicobacter pylori. Nature 445, 915–918 (2007).
14. Ahmed, N., Natarajan, T. & Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. C-23, 90–93 (1974).
15. Brillinger, D.R. Time Series: Data Analysis and Theory (Holt, Rinehart, and Winston, New York, 1975).
16. Podani, J. & Miklos, I. Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 83, 3331–3343 (2002).
17. Richman, M.B. Rotation of principal components. J. Climatol. 6, 293–335 (1986).
18. Heidemann, G. The principal components of natural images revisited. IEEE Trans. Pattern Anal. Mach. Intell. 28, 822–826 (2006).
19. Freiberger, W., ed. The International Dictionary of Applied Mathematics (D. Van Nostrand Co., Princeton, New Jersey, USA, 1960).
20. Diaconis, P., Goel, S. & Holmes, S. Horseshoes in multidimensional scaling and kernel methods. Ann. Appl. Stat. (in the press).
21. Zhao, K. et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4 (2007).
22. Irwin, D.E., Bensch, S., Irwin, J.H. & Price, T.D. Speciation by distance in a ring species. Science 307, 414–416 (2005).
23. Handley, L.J.L., Manica, A., Goudet, J. & Balloux, F. Going the distance: human population genetics in a clinal world. Trends Genet. 23, 432–439 (2007).
24. Semino, O. et al. Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the Neolithization of Europe and later migratory events in the Mediterranean area. Am. J. Hum. Genet. 74, 1023–1034 (2004).
25. Haak, W. et al. Ancient DNA from the first European farmers in 7500-year-old Neolithic sites. Science 310, 1016–1018 (2005).
26. Pinhasi, R., Fort, J. & Ammerman, A.J. Tracing the origin and spread of agriculture in Europe. PLoS Biol. 3, e410 (2005).
27. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
28. Zhu, X., Zhang, S., Zhao, H. & Cooper, R.S. Association mapping, using a mixture model for complex traits. Genet. Epidemiol. 23, 181–196 (2002).
29. Pritchard, J.K. & Rosenberg, N.A. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65, 220–228 (1999).
30. Wakefield, J. Disease mapping and spatial regression with count data. Biostatistics 8, 158–183 (2007).
Acknowledgements
We thank G. Coop, M. Przeworski, D. Reich, N. Patterson, Y. Guan, M. Barber and C. Becquet for helpful comments and D. Witonsky for pointing out the connection to Lissajous figures. Funding for this research was provided by a US National Science Foundation postdoctoral research fellowship in bioinformatics (J.N.) and US National Institutes of Health grant RO1 HG02585-01 (M.S.).
Author information
Author notes
- John Novembre
Present address: Department of Ecology and Evolutionary Biology, University of California Los Angeles, 621 Charles E. Young Dr. South, Los Angeles, California 90095, USA.
Affiliations
Department of Human Genetics, University of Chicago, 920 E. 58th Street, CLSC 5th floor, Chicago, Illinois 60637, USA.
- John Novembre & Matthew Stephens
Department of Statistics, University of Chicago, 5734 S. University Ave., Chicago, Illinois 60637, USA.
- Matthew Stephens
Contributions
J.N. and M.S. jointly designed the analyses and interpreted results. J.N. performed the analyses. J.N. and M.S. wrote the paper.
Corresponding author
Correspondence to Matthew Stephens.
Supplementary information
PDF files
1. Supplementary Text and Figures: Supplementary Methods, Supplementary Note and Supplementary Figures 1–9