Visualizing spatial population structure with estimated effective migration surfaces

Journal name: Nature Genetics
Volume: 48
Pages: 94–100
DOI: 10.1038/ng.3464

Abstract

Genetic data often exhibit patterns broadly consistent with 'isolation by distance'—a phenomenon where genetic similarity decays with geographic distance. In a heterogeneous habitat, this may occur more quickly in some regions than in others: for example, barriers to gene flow can accelerate differentiation between neighboring groups. We use the concept of 'effective migration' to model the relationship between genetics and geography. In this paradigm, effective migration is low in regions where genetic similarity decays quickly. We present a method to visualize variation in effective migration across a habitat from geographically indexed genetic data. Our approach uses a population genetic model to relate effective migration rates to expected genetic dissimilarities. We illustrate its potential and limitations using simulations and data from elephant, human and Arabidopsis thaliana populations. The resulting visualizations highlight important spatial features of population structure that are difficult to discern using existing methods for summarizing genetic variation.
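As a rough illustration of how a barrier can accelerate differentiation between neighboring groups, the sketch below treats migration rates as edge conductances on a small graph of demes and computes the resistance distance between demes, in the spirit of the 'isolation by resistance' and resistance-distance literature cited in the reference list. Using resistance distance as a stand-in for expected genetic dissimilarity is an assumption made for this example, as are the function name `resistance_distance`, the four-deme chain and the rate values; it is not presented as the model used by the method.

```python
def resistance_distance(n, conductances, i, j):
    """Effective resistance between nodes i and j of a weighted graph.

    `conductances` maps an undirected edge (a, b) to its weight.
    Here the weight plays the role of a migration rate (an assumption
    made for illustration only).
    """
    # Build the graph Laplacian.
    lap = [[0.0] * n for _ in range(n)]
    for (a, b), w in conductances.items():
        lap[a][a] += w
        lap[b][b] += w
        lap[a][b] -= w
        lap[b][a] -= w

    # Inject one unit of current at i and extract it at j.
    rhs = [0.0] * n
    rhs[i] += 1.0
    rhs[j] -= 1.0

    # Ground the last node (potential 0) and solve the reduced system
    # by Gauss-Jordan elimination with partial pivoting.
    m = n - 1
    aug = [lap[r][:m] + [rhs[r]] for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    potentials = [aug[r][m] / aug[r][r] for r in range(m)] + [0.0]

    # The potential difference for unit current is the effective resistance.
    return potentials[i] - potentials[j]


# Hypothetical chain of four demes: 0 - 1 - 2 - 3.
uniform = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
barrier = {(0, 1): 1.0, (1, 2): 0.1, (2, 3): 1.0}  # low rate in the middle

d_uniform = resistance_distance(4, uniform, 0, 3)
d_barrier = resistance_distance(4, barrier, 0, 3)
d_same_side = resistance_distance(4, barrier, 0, 1)
```

In this toy setting the end-to-end distance grows from about 3 under uniform rates to about 12 with the low-rate middle edge, while the distance between the two demes on the same side of the barrier is unchanged. That is the qualitative pattern described above: dissimilarity increases faster across regions of low effective migration.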

Figures

  1. A schematic overview of EEMS (Estimated Effective Migration Surfaces), using African elephant data for illustration.

    (a–c) Setting up the population grid. (a) Samples are collected at known locations across a two-dimensional habitat; green and orange represent two species of African elephant, forest and savanna, respectively. (b) A dense triangular grid is chosen to span the habitat. (c) Each sample is assigned to the closest deme on the grid. (d–f) EEMS analysis. (d) Migration rates vary according to a Voronoi tessellation that partitions the habitat into 'cells' with constant migration rate; colors represent relative rates of migration, ranging from low (orange) to high (blue). (e) Each edge has the same migration rate as the cell into which it falls. Cell locations and migration rates are adjusted, using Bayesian inference, so that expected genetic dissimilarities under the EEMS model match observed genetic dissimilarities. (f) The EEMS is a color contour plot produced by averaging draws from the posterior distribution of the migration rates, interpolating between grid points. Here and in all other figures, log(m) denotes the effective migration rate on a log10 scale, relative to the overall migration rate across the habitat. (Thus, log(m) = 1 corresponds to effective migration that is tenfold faster than the average.) The main feature of the EEMS for the African elephant is a barrier of low effective migration that separates the habitats of the two species: forest elephants to the west and savanna elephants to the north, south and east.

  2. Simulations comparing EEMS and PCA.

    For each method, we show results for two migration scenarios—representing uniform migration and a barrier to migration—and three different sampling schemes. (a,b) The true underlying migration rates for the uniform (a) and barrier (b) scenarios; colors represent relative migration rates. (c) The three sampling schemes used; the size of the circle at each node is proportional to the number of individuals sampled at that location, and locations are color-coded to facilitate cross-referencing the EEMS and PCA results. (d) PCA results. (e) EEMS results. In contrast to PCA, EEMS is robust to sampling scheme and shows clear qualitative differences between the estimated effective migration rates under the two scenarios, reflecting the underlying simulation truth.

  3. Simulations illustrate that EEMS infers effective migration rates rather than actual steady-state migration rates.

    (a) Individuals have uniform migration rates, but the central area of the habitat has a lower population density (demes in this region have fewer individuals, as represented by smaller circles in gray). Thus, fewer migrants are exchanged per generation in the central area, producing an effective barrier to gene flow that is reflected in the EEMS. (b) A simple population split scenario: migration was initially uniform, but at some time in the past a complete barrier to migration arose in the central area (represented by dashed edges). Under this scenario, the groups on either side of the central region have diverged, which creates a barrier in the EEMS.

  4. EEMS analysis of African elephant data.

    (a) African elephant samples are collected from two species in five biogeographic regions: the forest elephant (in green) inhabits the western and central regions, and the savanna elephant (in orange) inhabits the northern, eastern and southern regions. (b) Estimated effective migration rates for forest and savanna samples analyzed jointly. (c,d) Estimated effective migration rates for savanna (c) and forest (d) samples analyzed separately.

  5. EEMS analysis of human population structure in Western Europe and in sub-Saharan Africa.

    (a) Effective migration rates in Western Europe, estimated using geo-referenced data from the POPRES (Population Reference Sample) project (ref. 34). (b) Effective migration rates in sub-Saharan Africa, estimated using geo-referenced data from two previously published studies (refs. 35, 36). Population abbreviations are defined in the Supplementary Note.

  6. EEMS analysis of A. thaliana data from the RegMap panel.

    (a) Estimated effective migration rates in North America, Europe and across the Atlantic Ocean. (b) Estimated effective migration rates within North America. (c) Estimated effective migration rates within Europe.
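The grid and tessellation steps described in the Figure 1 caption, (c) through (e), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the EEMS implementation: the function names, the toy coordinates and the rule that an edge "falls" into the Voronoi cell containing its midpoint are all hypothetical choices made for the example. Rates are stored as log10 values relative to the habitat-wide average, following the log(m) convention in the caption.

```python
import math


def nearest(point, candidates):
    """Index of the candidate location closest to `point` (Euclidean)."""
    return min(range(len(candidates)),
               key=lambda k: math.dist(point, candidates[k]))


def assign_samples(samples, demes):
    """Step (c): map each sample location to its closest deme."""
    return [nearest(s, demes) for s in samples]


def edge_log_rates(edges, demes, seeds, seed_log_rates):
    """Steps (d)-(e): give each edge the rate of the Voronoi cell it falls in.

    A Voronoi cell is the set of points closest to one seed, so cell
    membership reduces to a nearest-seed lookup. Using the edge midpoint
    to decide membership is an assumption of this sketch.
    """
    rates = {}
    for a, b in edges:
        mid = ((demes[a][0] + demes[b][0]) / 2,
               (demes[a][1] + demes[b][1]) / 2)
        rates[(a, b)] = seed_log_rates[nearest(mid, seeds)]
    return rates


# Hypothetical toy habitat: four demes on a line, three Voronoi seeds.
demes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]
samples = [(0.1, 0.2), (2.9, -0.1), (1.2, 0.3)]
seeds = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)]
seed_log_rates = [0.0, -1.0, 0.0]  # log10 rate relative to the average

sample_demes = assign_samples(samples, demes)
rates = edge_log_rates(edges, demes, seeds, seed_log_rates)

# Convert a log10 relative rate back to a fold change.
fold_change_middle = 10 ** rates[(1, 2)]
```

With these toy inputs the middle edge inherits the log rate of -1 from the central cell, that is, a tenfold lower rate than the average, while the two outer edges stay at the average. In the method itself the cell locations and rates are not fixed like this but are adjusted by Bayesian inference, as the caption notes.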

References

  1. Li, J.Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008).
  2. Reich, D., Thangaraj, K., Patterson, N., Price, A.L. & Singh, L. Reconstructing Indian population history. Nature 461, 489–494 (2009).
  3. Beaumont, M.A. & Balding, D.J. Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 13, 969–980 (2004).
  4. Becquet, C. & Przeworski, M. A new approach to estimate parameters of speciation models with application to apes. Genome Res. 17, 1505–1519 (2007).
  5. Teeter, K.C. et al. Genome-wide patterns of gene flow across a house mouse hybrid zone. Genome Res. 18, 67–76 (2008).
  6. Kronforst, M.R., Young, L.G., Blume, L.M. & Gilbert, L.E. Multilocus analyses of admixture and introgression among hybridizing Heliconius butterflies. Evolution 60, 1254–1268 (2006).
  7. Hinch, A. et al. The landscape of recombination in African Americans. Nature 476, 170–175 (2011).
  8. Gonder, M.K. et al. Evidence from Cameroon reveals differences in the genetic structure and histories of chimpanzee populations. Proc. Natl. Acad. Sci. USA 108, 4766–4771 (2011).
  9. Wasser, S.K. et al. Assigning African elephant DNA to geographic region of origin: applications to the ivory trade. Proc. Natl. Acad. Sci. USA 101, 14847–14852 (2004).
  10. Yang, W.-Y., Novembre, J., Eskin, E. & Halperin, E. A model-based approach for analysis of spatial structure in genetic data. Nat. Genet. 44, 725–731 (2012).
  11. Campbell, C.D. et al. Demonstrating stratification in a European American population. Nat. Genet. 37, 868–872 (2005).
  12. Price, A.L., Zaitlen, N.A., Reich, D. & Patterson, N. New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet. 11, 459–463 (2010).
  13. Pritchard, J.K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
  14. Guillot, G., Estoup, A., Mortier, F. & Cosson, J.F. A spatial statistical model for landscape genetics. Genetics 170, 1261–1280 (2005).
  15. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
  16. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
  17. Rousset, F. Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics 145, 1219–1228 (1997).
  18. Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98–101 (2008).
  19. Novembre, J. & Stephens, M. Interpreting principal component analyses of spatial population genetic variation. Nat. Genet. 40, 646–649 (2008).
  20. McVean, G. A genealogical interpretation of principal components analysis. PLoS Genet. 5, e1000686 (2009).
  21. DeGiorgio, M. & Rosenberg, N.A. Geographic sampling scheme as a determinant of the major axis of genetic variation in principal components analysis. Mol. Biol. Evol. 30, 480–488 (2013).
  22. Manni, F., Guérard, E. & Heyer, E. Geographic patterns of (genetic, morphologic, linguistic) variation: how barriers can be detected by using Monmonier's algorithm. Hum. Biol. 76, 173–190 (2004).
  23. Manel, S. et al. A new individual-based spatial approach for identifying genetic discontinuities in natural populations. Mol. Ecol. 16, 2031–2043 (2007).
  24. Duforet-Frebourg, N. & Blum, M.G. Nonstationary patterns of isolation-by-distance: inferring measures of local genetic differentiation with Bayesian kriging. Evolution 68, 1110–1123 (2014).
  25. Beerli, P. & Felsenstein, J. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA 98, 4563–4568 (2001).
  26. McRae, B.H. Isolation by resistance. Evolution 60, 1551–1561 (2006).
  27. Hanks, E. & Hooten, M. Circuit theory and model-based inference for landscape connectivity. J. Am. Stat. Assoc. 108, 22–33 (2013).
  28. Kimura, M. & Weiss, G.H. The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 49, 561–576 (1964).
  29. Hudson, R.R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002).
  30. Wasser, S.K. et al. Genetic assignment of large seizures of elephant ivory reveals Africa's major poaching hotspots. Science 349, 84–87 (2015).
  31. Beaumont, M.A. & Nichols, R.A. Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. Lond. B 263, 1619–1626 (1996).
  32. Georgiadis, N. et al. Structure and history of African elephant populations: I. eastern and southern Africa. J. Hered. 85, 100–104 (1994).
  33. Comstock, K.E. et al. Patterns of molecular genetic variation among African elephant populations. Mol. Ecol. 11, 2489–2498 (2002).
  34. Nelson, M.R. et al. The Population Reference Sample, POPRES: a resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet. 83, 347–358 (2008).
  35. Xing, J. et al. Toward a more uniform sampling of human genetic diversity: a survey of worldwide populations by high-density genotyping. Genomics 96, 199–210 (2010).
  36. Henn, B.M. et al. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl. Acad. Sci. USA 108, 5154–5162 (2011).
  37. Wang, C., Zöllner, S. & Rosenberg, N.A. A quantitative comparison of the similarity between genes and geography in worldwide human populations. PLoS Genet. 8, e1002886 (2012).
  38. Lao, O. et al. Correlation between genetic and geographic structure in Europe. Curr. Biol. 18, 1241–1248 (2008).
  39. Tian, C. et al. Analysis and application of European genetic substructure using 300 K SNP information. PLoS Genet. 4, e4 (2008).
  40. Nordborg, M. et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, e196 (2005).
  41. Platt, A. et al. The scale of population structure in Arabidopsis thaliana. PLoS Genet. 6, e1000843 (2010).
  42. Horton, M.W. et al. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat. Genet. 44, 212–216 (2012).
  43. O'Kane, S. & Al-Shehbaz, I. A synopsis of Arabidopsis (Brassicaceae). Novon 7, 323–327 (1997).
  44. McRae, B.H., Dickson, B.G., Keitt, T.H. & Shah, V.B. Using circuit theory to model connectivity in ecology, evolution, and conservation. Ecology 89, 2712–2724 (2008).
  45. Felsenstein, J. A pain in the torus: some difficulties with models of isolation by distance. Am. Nat. 109, 359–368 (1975).
  46. Lawson, D.J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012).
  47. Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).
  48. Cavalli-Sforza, L.L. & Edwards, A.W. Phylogenetic analysis. Models and estimation procedures. Am. J. Hum. Genet. 19, 233–257 (1967).
  49. Felsenstein, J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am. J. Hum. Genet. 25, 471–492 (1973).
  50. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
  51. Pickrell, J.K. & Pritchard, J.K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
  52. McCullagh, P. Marginal likelihood for distance matrices. Stat. Sin. 19, 631–649 (2009).
  53. Bahlo, M. & Griffiths, R.C. Coalescence time for two genes from a subdivided population. J. Math. Biol. 43, 397–410 (2001).
  54. Hey, J. A multi-dimensional coalescent process applied to multi-allelic selection models and migration models. Theor. Popul. Biol. 39, 30–48 (1991).
  55. Klein, D. & Randić, M. Resistance distance. J. Math. Chem. 12, 81–95 (1993).
  56. Babić, D., Klein, D., Lukovits, I., Nikolić, S. & Trinajstić, N. Resistance-distance matrix: a computational algorithm and its application. Int. J. Quantum Chem. 90, 166–176 (2002).
  57. Light, A. & Bartlein, P. The end of the rainbow? Color schemes for improved data graphics. Eos 85, 385 (2004).

Author information

Affiliations

  1. Department of Statistics, The University of Chicago, Chicago, Illinois, USA.

    • Desislava Petkova &
    • Matthew Stephens
  2. Wellcome Trust Centre for Human Genetics, Oxford, UK.

    • Desislava Petkova
  3. Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA.

    • John Novembre &
    • Matthew Stephens

Contributions

M.S. and J.N. conceived the project. D.P., J.N. and M.S. developed and refined methods. D.P. implemented methods. D.P., J.N. and M.S. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (11,060 KB)

    Supplementary Figures 1–17 and Supplementary Note.
