Interpreting principal component analyses of spatial population genetic variation


Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions [1]. They interpreted gradient and wave patterns in these maps as signatures of specific migration events [1-3]. These interpretations have been controversial [4-7], but influential [8], and the use of PCA has become widespread in analysis of population genetics data [9-13]. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
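
The sinusoidal behavior described above can be illustrated with a minimal numerical sketch. It is not the analysis performed in this paper. It assumes a hypothetical one-dimensional habitat of 50 demes in which the covariance between demes i and j decays as rho ** |i - j| (with rho = 0.95 chosen arbitrarily), and it compares the leading principal components with cosine waves. The function names spatial_pcs and cosine_wave, and all parameter values, are illustrative assumptions.

```python
import numpy as np


def spatial_pcs(n_demes=50, rho=0.95, n_pcs=2):
    """Return the leading PCs of a covariance that decays with distance.

    Assumption (not taken from the paper): demes lie on a line and the
    covariance between demes i and j is rho ** |i - j|.
    """
    positions = np.arange(n_demes)
    cov = rho ** np.abs(positions[:, None] - positions[None, :])

    # Double-center the covariance, mirroring the mean-centering step of PCA.
    centering = np.eye(n_demes) - np.ones((n_demes, n_demes)) / n_demes
    centered = centering @ cov @ centering

    # eigh returns eigenvalues in ascending order; keep the largest n_pcs.
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1][:n_pcs]
    return eigvecs[:, order]


def cosine_wave(n_demes, k):
    """Cosine with k half-periods across the habitat (a DCT-II basis vector)."""
    i = np.arange(n_demes)
    return np.cos(np.pi * k * (i + 0.5) / n_demes)


if __name__ == "__main__":
    n = 50
    pcs = spatial_pcs(n_demes=n)
    for k in (1, 2):
        # The sign of an eigenvector is arbitrary, so compare absolute correlation.
        r = abs(np.corrcoef(pcs[:, k - 1], cosine_wave(n, k))[0, 1])
        print(f"PC{k} vs cosine with {k} half-period(s): |r| = {r:.3f}")
```

Under these assumptions, PC1 behaves like a gradient across the habitat and PC2 like a single wave, even though the covariance model contains no migration event of any kind. Only distance-dependent similarity is built in.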

Figure 1: Comparison of PC maps of ref. 3 with theoretical and empirical predictions.
Figure 2: Results of PCA applied to data from a one-dimensional habitat.


We thank G. Coop, M. Przeworski, D. Reich, N. Patterson, Y. Guan, M. Barber and C. Becquet for helpful comments and D. Witonsky for pointing out the connection to Lissajous figures. Funding for this research was provided by a US National Science Foundation postdoctoral research fellowship in bioinformatics (J.N.) and US National Institutes of Health grant RO1 HG02585-01 (M.S.).

J.N. and M.S. jointly designed the analyses and interpreted results. J.N. performed the analyses. J.N. and M.S. wrote the paper.

Correspondence to Matthew Stephens.

