Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups (refs 1–3). Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected (including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas), the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.
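As a rough illustration of the linkage disequilibrium (LD) quantity referred to above, the following minimal sketch computes the standard pairwise r² statistic between two biallelic SNPs from phased haplotypes. It is not the authors' pipeline: the function name `r_squared`, the 0/1 allele coding, and the toy haplotype lists are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Return r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b) of alleles coded 0 or 1
    (a hypothetical coding chosen for this sketch).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Frequency of allele 1 at each SNP.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Frequency of the haplotype carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # Disequilibrium coefficient D and its normalisation.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Toy data (hypothetical): complete association versus independence.
perfect_ld = [(0, 0)] * 5 + [(1, 1)] * 5
no_ld = [(0, 0), (0, 1), (1, 0), (1, 1)]

if __name__ == "__main__":
    print(r_squared(perfect_ld))
    print(r_squared(no_ld))
```

Under a serial founder model, one would expect values of a statistic like this, averaged over SNP pairs at a fixed physical distance, to be higher in populations farther from Africa; the toy lists here only show the two extremes of the statistic.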



Primary accessions

Gene Expression Omnibus

Data deposits

The array data described in this paper are deposited in the Gene Expression Omnibus under accession number GSE10331.


  1. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005)

  2. et al. Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079 (2005)

  3. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006)

  4. et al. A human genome diversity cell line panel. Science 296, 261–262 (2002)

  5. Counting alleles with rarefaction: private alleles and hierarchical sampling designs. Conserv. Genet. 5, 539–543 (2004)

  6. , & Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587 (2003)

  7. , & The genetic structure of human populations studied through short insertion–deletion polymorphisms. Ann. Hum. Genet. 70, 658–665 (2006)

  8. et al. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 1, e70 (2005)

  9. et al. Genetic structure of human populations. Science 298, 2381–2385 (2002)

  10. , , & Going the distance: human population genetics in a clinal world. Trends Genet. 23, 432–439 (2007)

  11. et al. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc. Natl Acad. Sci. USA 102, 15942–15947 (2005)

  12. & Homozygosity and linkage disequilibrium. Genetics 160, 1707–1719 (2002)

  13. et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genet. 38, 1251–1260 (2006)

  14. et al. The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002)

  15. et al. Linkage disequilibrium in the human genome. Nature 411, 199–204 (2001)

  16. & Implications of biogeography of human populations for ‘race’ and medicine. Nature Genet. 36, S21–S27 (2004)

  17. A genealogical interpretation of linkage disequilibrium. Genetics 162, 987–991 (2002)

  18. et al. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74, 1111–1120 (2004)

  19. et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nature Genet. 39, 31–40 (2007)

  20. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007)

  21. et al. A comprehensive analysis of common copy-number variations in the human genome. Am. J. Hum. Genet. 80, 91–104 (2007)

  22. et al. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am. J. Hum. Genet. 79, 275–290 (2006)

  23. et al. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77, 78–88 (2005)

  24. et al. Challenges and standards in integrating surveys of structural variation. Nature Genet. 39, S7–S15 (2007)

  25. & Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 3, e114 (2007)

  26. & Genome-wide tagging for everyone. Nature Genet. 38, 1227–1228 (2006)

  27. , , & Allele frequency matching between SNPs reveals an excess of linkage disequilibrium in genic regions of the human genome. PLoS Genet. 2, e142 (2006)

  28. & A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644 (2006)

  29. & CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801–1806 (2007)

  30. , , , & Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet. Genome Res. 115, 205–214 (2006)



We thank the Biological Resource Center at the Fondation Jean Dausset – CEPH for preparing HGDP–CEPH diversity panel DNA samples, and S. Chanock and A. Hutchinson for assistance with the DNAs. This work was supported in part by NIH grants, by a postdoctoral fellowship from the University of Michigan Center for Genetics in Health and Medicine, by grants from the Alfred P. Sloan Foundation and the Burroughs Wellcome Fund, by the National Center for Minority Health and Health Disparities, and by the Intramural Program of the National Institute on Aging. The study used the Biowulf Linux cluster at the National Institutes of Health.

Author Contributions N.A.R. and A.B.S. wish to be regarded as joint last authors.

Author information

Author notes

    Mattias Jakobsson, Sonja W. Scholz & Paul Scheet

    These authors contributed equally to this work.


  1. Center for Computational Medicine and Biology,

    Mattias Jakobsson, Paul Scheet, Jenna M. VanLiere, Zachary A. Szpiech, James H. Degnan & Noah A. Rosenberg

  2. Department of Human Genetics,

    Mattias Jakobsson, James H. Degnan & Noah A. Rosenberg

  3. Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA

    Paul Scheet & Noah A. Rosenberg

  4. Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland 20892, USA

    Sonja W. Scholz, J. Raphael Gibbs, Hon-Chung Fung, Rita Guerreiro, Jose M. Bras, Jennifer C. Schymick, Dena G. Hernandez, Bryan J. Traynor, Javier Simon-Sanchez, Mar Matarin, Angela Britton, Joyce van de Leemput, Ian Rafferty & Andrew B. Singleton

  5. Department of Molecular Neuroscience and Reta Lila Weston Institute of Neurological Studies, Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK

    Sonja W. Scholz, J. Raphael Gibbs, Joyce van de Leemput & John A. Hardy

  6. Department of Neurology, Chang Gung Memorial Hospital and College of Medicine, Chang Gung University, Taipei 10591, Taiwan

    Hon-Chung Fung

  7. Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA

    Kai Wang & Maja Bucan

  8. Center for Neurosciences and Cell Biology, Faculty of Medicine, University of Coimbra, 3004-504 Coimbra, Portugal

    Rita Guerreiro & Jose M. Bras

  9. University of Oxford, Department of Clinical Neurology, John Radcliffe Hospital, Oxford OX3 9DU, UK

    Jennifer C. Schymick

  10. Neurogenetics Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892, USA

    Bryan J. Traynor

  11. Unidad de Genética Molecular, Departamento de Genómica y Proteómica, Instituto de Biomedicina de Valencia-CSIC, 46010, Valencia, Spain

    Javier Simon-Sanchez

  12. Fondation Jean Dausset – Centre d’Étude du Polymorphisme Humain (CEPH), 27 rue Juliette Dodu, 75010 Paris, France

    Howard M. Cann

  13. Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia 22908, USA

    Andrew B. Singleton



Corresponding authors

Correspondence to Noah A. Rosenberg or Andrew B. Singleton.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains extensive Supplementary Information with Supplementary Notes, Supplementary Data, Supplementary Tables S1-S17, Supplementary Figures S1-S30 with Legends and additional references.
