  • Technical Report

A model-based approach for analysis of spatial structure in genetic data

Abstract

Characterizing genetic diversity within and between populations has broad applications in studies of human disease and evolution. We propose a new approach, spatial ancestry analysis, for the modeling of genotypes in two- or three-dimensional space. In spatial ancestry analysis (SPA), we explicitly model the spatial distribution of each SNP by assigning an allele frequency as a continuous function in geographic space. We show that the explicit modeling of the allele frequency allows individuals to be localized on the map on the basis of their genetic information alone. We apply our SPA method to a European and a worldwide population genetic variation data set and identify SNPs showing large gradients in allele frequency, and we suggest these as candidate regions under selection. These regions include SNPs in the well-characterized LCT region, as well as at loci including FOXP2, OCA2 and LRP1B.
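The idea of treating each SNP's allele frequency as a continuous function of location, and then placing an individual on the map from genotypes alone, can be illustrated with a small sketch. The abstract does not state the functional form, so the logistic surface used here is an assumption, as are the Hardy-Weinberg binomial likelihood, the grid search, and every function name and parameter (`allele_frequency`, `localize`, `gradient_steepness`, `demo`). It is a toy under those assumptions, not the authors' implementation.

```python
import math
import random


def allele_frequency(location, slope, intercept):
    """Assumed logistic surface: f(x) = 1 / (1 + exp(-(a . x + b)))."""
    z = sum(a * x for a, x in zip(slope, location)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(location, genotypes, snps):
    """Binomial log-likelihood (up to a constant) of 0/1/2 allele counts,
    treating SNPs as independent given the location."""
    total = 0.0
    for g, (slope, intercept) in zip(genotypes, snps):
        f = allele_frequency(location, slope, intercept)
        f = min(max(f, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


def localize(genotypes, snps, lo=-3.0, hi=3.0, steps=41):
    """Return the grid point in [lo, hi]^2 with the highest likelihood."""
    best_location, best_ll = None, -math.inf
    for i in range(steps):
        for j in range(steps):
            candidate = (lo + (hi - lo) * i / (steps - 1),
                         lo + (hi - lo) * j / (steps - 1))
            ll = log_likelihood(candidate, genotypes, snps)
            if ll > best_ll:
                best_location, best_ll = candidate, ll
    return best_location


def gradient_steepness(slope):
    """Euclidean norm of the slope vector; one plausible (hypothetical)
    way to rank SNPs by how sharply their frequency changes in space."""
    return math.sqrt(sum(a * a for a in slope))


def demo(seed=0, n_snps=500):
    """Simulate one individual at a known location and try to recover it."""
    rng = random.Random(seed)
    # Hypothetical per-SNP parameters: 2D slope vector and intercept.
    snps = [((rng.gauss(0, 1), rng.gauss(0, 1)), rng.gauss(0, 1))
            for _ in range(n_snps)]
    true_location = (1.05, -0.45)
    # Each genotype is two independent allele draws at the local frequency.
    genotypes = [
        sum(rng.random() < allele_frequency(true_location, a, b)
            for _ in range(2))
        for a, b in snps
    ]
    return true_location, localize(genotypes, snps)


if __name__ == "__main__":
    truth, estimate = demo()
    print("true location:", truth)
    print("estimated location:", estimate)
```

In this toy setting the per-SNP parameters are known in advance. With real data they would have to be estimated as well, and a numerical optimizer would normally replace the exhaustive grid search.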

Figure 1: Examples of the allele frequency slope model.
Figure 2: Model-based mapping convergence with random initialization.
Figure 3: Mapping spatial structure on a globe using HGDP data.
Figure 4: The distribution of SPA scores representing allele frequency gradients.
Figure 5: Selection results of six methods in two chromosomes.

Acknowledgements

W.-Y.Y. and E.E. are supported by grants from the US National Science Foundation (0513612, 0731455, 0729049, 0916676 and 1065276) and the US National Institutes of Health (K25 HL080079, U01 DA024417, P01 HL30568 and PO1 HL28481). J.N. is supported by a National Science Foundation grant (0933731) and by the Searle Scholars Program. E.H. is a faculty fellow of the Edmond J. Safra Program at Tel Aviv University and was supported in part by the Israeli Science Foundation (grant 04514831) and by the IBM open collaborative research award program.

Author information

Contributions

W.-Y.Y., J.N., E.E. and E.H. designed the methods and experiments. W.-Y.Y. implemented the methods. W.-Y.Y., J.N., E.E. and E.H. jointly performed the analysis. All authors discussed the results and contributed to the writing of the manuscript.

Corresponding author

Correspondence to Eleazar Eskin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Tables 1–4 and Supplementary Note (PDF 652 kb)

About this article

Cite this article

Yang, WY., Novembre, J., Eskin, E. et al. A model-based approach for analysis of spatial structure in genetic data. Nat Genet 44, 725–731 (2012). https://doi.org/10.1038/ng.2285

  • DOI: https://doi.org/10.1038/ng.2285
