
The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

Nature volume 538, pages 201–206 (13 October 2016)

Abstract

Here we report the Simons Genome Diversity Project data set: high-quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

Main

To obtain a complete picture of human diversity, it is necessary to sequence the genomes of many individuals from diverse locations. To date, the largest whole-genome sequencing survey, the 1000 Genomes Project, analysed 26 populations of European, East Asian, South Asian, American, and sub-Saharan African ancestry1. However, this and most other sequencing studies have focused on demographically large populations. Such studies tend to ignore smaller populations that are also important for understanding human diversity. In addition, many of these studies have sequenced genomes to only 4–6-fold coverage. Here, we report the Simons Genome Diversity Project (SGDP): deep genome sequences of 300 individuals from 142 populations chosen to span much of human genetic, linguistic, and cultural variation (Supplementary Data Table 1).

Data set and catalogue of novel variants

We sequenced the samples to an average coverage of 43-fold (range 34–83-fold) at Illumina Ltd; almost all samples (278) were prepared using the same PCR-free library preparation (https://support.illumina.com/content/dam/illumina-marketing/documents/services/FastTrackServices_Methods_Tech_Note.pdf). We aligned reads to the human reference genome hs37d5/hg19 using BWA-MEM (BWA-0.7.12)2 (Supplementary Information section 1). We genotyped each sample separately using the Genome Analysis Toolkit (GATK)3, with a modification to eliminate bias towards genotypes matching the reference (Supplementary Information section 1). We developed a filtering procedure that generates a sample-specific mask. At ‘filter level 1’, which we recommend for most analyses, we retain an average of 2.13 Gb of sequence per sample and identify 34.4 million single nucleotide polymorphisms (SNPs) and 2.1 million insertion/deletion polymorphisms (indels) (Supplementary Information section 2). We have made the GATK-processed data available in a file small enough to download by FTP, along with software to analyse these data (Supplementary Information section 3). The SGDP data set highlights the incompleteness of current catalogues of human variation: the fraction of heterozygous positions not discovered by the 1000 Genomes Project is 11% in the KhoeSan and 5% in New Guineans and Australians (Extended Data Fig. 1; Supplementary Data Table 1). We used FermiKit4 to assemble the short reads, store the assemblies in a compressed form that retains all the information required for polymorphism discovery and analysis, and identify SNPs by comparing the assemblies against the human reference. We find that FermiKit has comparable sensitivity and specificity to GATK for SNP discovery and genotyping, and is more accurate for indels (Supplementary Information section 4).
FermiKit also identified 5.8 Mb of contigs that are present in the SGDP but absent from the human reference genome, presumably because they are deleted there; these contigs, which we have made publicly available, can be used as ‘decoys’ to improve read mapping (Supplementary Information section 5). Finally, we called copy number variants5 and used lobSTR6,7 to genotype 1.6 million short tandem repeats (STRs) (Supplementary Information section 6). The high quality of the STR genotypes (r² = 0.92 against capillary sequencing calls) is evident from their accurate reconstruction of population relationships, even for difficult-to-genotype mononucleotide repeats (Extended Data Fig. 2).
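Two summary numbers in this paragraph, the fraction of heterozygous positions missing from the 1000 Genomes catalogue and the r² concordance of STR calls with capillary sequencing, reduce to simple computations once genotypes are available. The Python sketch below is illustrative only: it assumes heterozygous sites are represented as (chromosome, position) tuples and that each STR genotype is summarized as a single number (for example, the summed length of its two alleles). The function names and toy data are hypothetical; the actual procedures are those in Supplementary Information sections 2 and 6.

```python
def novel_het_fraction(het_sites, catalogue):
    """Fraction of an individual's heterozygous sites absent from a variant catalogue.

    Both arguments are iterables of (chromosome, position) tuples (an assumed format).
    """
    het = set(het_sites)
    if not het:
        raise ValueError("no heterozygous sites supplied")
    novel = het - set(catalogue)
    return len(novel) / len(het)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists of numeric calls."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when either list is constant")
    return sxy * sxy / (sxx * syy)


# Toy data (invented): 10 heterozygous sites, one of them absent from the catalogue.
hets = [("1", 10_000 + i) for i in range(10)]
catalogue = hets[:9]
print(novel_het_fraction(hets, catalogue))  # 0.1

# Toy STR calls (invented) from a short-read caller and from capillary sequencing.
short_read_calls = [20, 22, 24, 30, 18]
capillary_calls = [20, 22, 25, 30, 18]
print(round(r_squared(short_read_calls, capillary_calls), 3))
```

Using sets makes the novelty check independent of site order and of duplicate entries in either input.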

The structure of human genetic diversity

To obtain an overview of population relationships, we carried out ADMIXTURE8 (Extended Data Fig. 3) and principal component analysis9 (Extended Data Fig. 4a). We also built neighbour-joining trees based on pairwise divergence per nucleotide (Fig. 1a) and FST (Extended Data Fig. 4b), whose topologies are consistent with previous findings that the deepest splits among human populations are among Africans. We computed heterozygosity (the proportion of diallelic genotypes per base pair) and recapitulate previous findings that the highest genetic diversity is found in sub-Saharan Africa and that the ratio of X-to-autosome diversity is much lower in non-Africans than in Africans (Fig. 1b)10. A surprise is that African ‘pygmy’ hunter–gatherers have reduced X-to-autosome diversity ratios relative to all other sub-Saharan Africans. This pattern is just as strong after we remove the third of chromosome X known to be subject to the strongest natural selection, suggesting that it is driven by demographic history rather than by natural selection (Supplementary Information section 7). It has been suggested that the reduced X-to-autosome heterozygosity ratio in non-Africans is due to ongoing male-driven admixture10,11. Male non-pygmy admixture into pygmies is well documented12,13, so the same process could explain the reduced ratio in pygmies.
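The heterozygosity and X-to-autosome ratio plotted in Fig. 1b amount to dividing counts of heterozygous genotypes by the amount of callable sequence. A minimal Python sketch follows, assuming per-sample counts are already available after masking; the counts and function names are invented for illustration. The X-chromosome calculation assumes a female sample, since males carry a single X and have no heterozygous X genotypes outside the pseudoautosomal regions. Excluding the most strongly selected third of chromosome X, as in the robustness check described above, would simply mean recomputing both X counts over the remaining sequence.

```python
def heterozygosity(n_het, callable_bp):
    """Heterozygous (diallelic) genotypes per callable base pair."""
    if callable_bp <= 0:
        raise ValueError("callable sequence length must be positive")
    return n_het / callable_bp


def x_to_autosome_ratio(het_x, callable_x, het_auto, callable_auto):
    """Ratio of chromosome X heterozygosity to autosomal heterozygosity.

    Assumes a female sample, so that X genotypes can be heterozygous.
    """
    return heterozygosity(het_x, callable_x) / heterozygosity(het_auto, callable_auto)


# Invented counts for one female genome after applying a sample-specific mask.
h_auto = heterozygosity(1_800_000, 2_000_000_000)  # 9e-4 per bp
h_x = heterozygosity(60_000, 100_000_000)  # 6e-4 per bp
ratio = x_to_autosome_ratio(60_000, 100_000_000, 1_800_000, 2_000_000_000)
print(f"autosomal H = {h_auto:.2e}, X H = {h_x:.2e}, X/A = {ratio:.3f}")
```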

Figure 1: Genetic variation in the SGDP.

a, Neighbour-joining tree of relationships based on pairwise divergence. b, Plot of autosomal heterozygosity against the X-to-autosome heterozygosity ratio, showing the reduction in this ratio in non-Africans and pygmies. c, Estimate of Neanderthal ancestry with a heat map scale of 0–3%. d, Estimate of Denisovan ancestry with a heat map scale of 0–0.5% to bring out subtle differences in mainland Eurasia (Oceanian groups with as much as 5% Denisovan ancestry are saturated in bright red).

Comparisons of ancient to present-day human genomes have shown that all non-Africans today possess Neanderthal ancestry14, with more in eastern non-Africans15,16, and that Australo-Melanesians, and to a lesser extent other eastern non-Africans, possess Denisovan ancestry17,18,19. However, these studies analysed genomes from only a handful of populations. We computed statistics informative about Neanderthal and Denisovan ancestry, which provide a fine-scale view of these ancestry distributions worldwide (Fig. 1c, d; Supplementary Data Table 1; Supplementary Information section 8). We do not detect any population with a higher proportion of Neanderthal ancestry than is present in East Asians. However, we do find suggestive evidence of an excess of Denisovan ancestry in some South Asians compared to other Eurasians. This signal may not have been detected before because earlier surveys of archaic introgression largely excluded South Asians (Fig. 1d; Supplementary Data Table 1).
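The paragraph does not spell out which ancestry statistics were used (see Supplementary Information section 8). As an illustration of the general approach only, the sketch below implements the f4-ratio estimator introduced in ref. 32, α = f4(A, O; X, C) / f4(A, O; B, C), where X is the test population, B is related to the introgressing archaic group, A is a second archaic genome, O is an outgroup and C is a population assumed to lack that archaic ancestry. The population roles, function names and simulated frequencies are assumptions, not a reproduction of the SGDP analysis.

```python
import random


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(a)
    if n == 0 or not (n == len(b) == len(c) == len(d)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    return sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)) / n


def f4_ratio(a, o, x, b, c):
    """alpha = f4(A, O; X, C) / f4(A, O; B, C): share of X's ancestry from B's side."""
    return f4(a, o, x, c) / f4(a, o, b, c)


# Simulated allele frequencies (invented). A and B are sister "archaic" groups,
# O is an outgroup, C is unadmixed, and X is built as an exact 2% mixture of B and C.
rng = random.Random(1)
n_snps = 5_000
A = [rng.random() for _ in range(n_snps)]
O = [rng.random() for _ in range(n_snps)]
B = [min(1.0, max(0.0, p + rng.gauss(0, 0.1))) for p in A]
C = [rng.random() for _ in range(n_snps)]
alpha_true = 0.02
X = [alpha_true * pb + (1 - alpha_true) * pc for pb, pc in zip(B, C)]

alpha_hat = f4_ratio(A, O, X, B, C)
print(round(alpha_hat, 4))
```

Because X − C = α(B − C) at every site in this toy construction, the ratio recovers α exactly; with real genomes the archaic source is only related to B, and the estimate carries sampling noise that is usually quantified with a block jackknife.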

The time course of human population separation

We studied demographic history by leveraging the fact that variation across the genome in the density of divergent sites per base pair can be used to reconstruct population size changes and separations. We used the pairwise sequentially Markovian coalescent (PSMC)20 to reconstruct population size changes, and the multiple sequentially Markovian coalescent (MSMC)21 to study the time course of population separations. We infer that the population ancestral to all present-day humans began to develop substructure at least 200 thousand years ago (kya), which is most apparent when comparing the ancestors of some present-day African hunter–gatherers (southern African KhoeSan and central African Mbuti pygmies) to other populations (Fig. 2a). However, it is also clear that this substructure developed slowly, as all pairs of present-day populations, including African hunter–gatherers, share a substantial subset of their ancestors as recently as 100 kya22,23,24,25. Quoting the time at which MSMC infers that more than 50% (25–75%) of lineages for a pair of populations are descended from the same ancestral population, we estimate that non-Africans separated substantially from the KhoeSan 131 (82–173) kya and almost as anciently from the Mbuti around 112 (66–171) kya. Within Africa (Fig. 2a, b), we infer that the Yoruba separated substantially from the KhoeSan 87 (58–120) kya; from the Mbuti 56 (32–84) kya; and from the Dinka 19 (9–25) kya. We estimate a relatively rapid 21 (21–26) kya separation of northern and southern KhoeSan23,26, potentially reflecting isolation since the last glacial maximum, and a 38 (27–44) kya separation between western (Biaka) and eastern (Mbuti) pygmies, confirming very old substructure between these two central African hunter–gatherer groups27. Outside Africa, the most ancient structure dates to around 50 kya (Fig. 2c), during or shortly after the deepest part of the shared non-African bottleneck 40–60 kya, consistent with the archaeological evidence of the dispersal of modern humans into Eurasia during this period. We are not confident about the estimates of the date of separation of Australians, New Guineans and Andamanese from other populations, because these inferences change depending on the computational method we use for phasing, probably because these populations are not represented in the 1000 Genomes haplotype reference panel (Supplementary Information section 9). We caution that the date estimates also do not take into account uncertainty about the true value of the human mutation rate, which could plausibly be 30% higher or lower than the point estimate we use28.
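The 25%, 50% and 75% dates quoted above are read off an MSMC relative cross-coalescence curve, and because coalescent dates are obtained by dividing scaled times by the assumed mutation rate, they scale inversely with that rate. The Python sketch below illustrates both steps under stated assumptions: the curve is given on an increasing grid of times, crossing points are found by linear interpolation (an assumption; MSMC itself reports piecewise-constant rates per time interval), and the curve values and function names are invented. Under this inverse scaling, a 131 kya point estimate would become about 101 kya if the true mutation rate were 30% higher, or about 187 kya if it were 30% lower.

```python
def crossing_time(times_kya, rccr, threshold):
    """Most recent time at which the relative cross-coalescence rate reaches threshold.

    times_kya is increasing (recent to ancient); rccr[i] is the relative
    cross-coalescence rate at times_kya[i]. Linear interpolation is used
    between grid points; returns None if the threshold is never reached.
    """
    if len(times_kya) != len(rccr):
        raise ValueError("times and rates must have the same length")
    for i, (t, r) in enumerate(zip(times_kya, rccr)):
        if r >= threshold:
            if i == 0:
                return t
            # The previous point lies below the threshold, so r > r0 here.
            t0, r0 = times_kya[i - 1], rccr[i - 1]
            return t0 + (threshold - r0) * (t - t0) / (r - r0)
    return None


def rescale_for_mutation_rate(t_kya, rate_factor):
    """Rescale a coalescent date when the mutation rate is multiplied by rate_factor."""
    if rate_factor <= 0:
        raise ValueError("rate_factor must be positive")
    return t_kya / rate_factor


# Invented curve: near 0 in the recent past (separated), near 1 in the distant past.
times = [0, 20, 40, 60, 80, 100, 150, 200]
curve = [0.0, 0.05, 0.1, 0.2, 0.3, 0.45, 0.7, 0.95]
for q in (0.25, 0.5, 0.75):
    print(q, round(crossing_time(times, curve, q), 1))

print(round(rescale_for_mutation_rate(131, 1.3)), round(rescale_for_mutation_rate(131, 0.7)))
```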

Figure 2: Cross-coalescence rates and effective population sizes for selected population pairs.

a–c, Cross-coalescence rates as a function of time in thousands of years ago (kya) estimated using MSMC, with four haplotypes per pair. In each subfigure legend, we give the point estimate of the date at which 25%, 50% and 75% of lineages in the pair of populations have coalesced into a common ancestral population. We generated these plots using data phased with the 1000 Genomes reference panel (method PS1 described in Supplementary Information section 9), but only show pairs of populations for which the cross-coalescence rates are relatively insensitive to the phasing approach. a, Selected African cross-coalescence rates. b, Central African rainforest hunter–gatherer cross-coalescence rates. c, Ancient non-African cross-coalescence rates. d–f, Effective population sizes inferred using PSMC, using one diploid genome per population, for the same populations that we used in a–c.

Early modern human dispersals contributed little to non-Africans

There is intense debate about whether present-day Australians, New Guineans and Asian ‘Negrito’ populations are descended from the same source population as mainland Eurasians, or whether they also derive some ancestry from an early, independent dispersal of modern humans into Asia29,30,31. To explore this scenario rigorously, we fit an admixture graph32—a phylogenetic tree incorporating mixture events—to the allele frequency correlations among Neanderthals, Denisovans, Upper Paleolithic Europeans, East Asians, New Guineans, Australians, and Andamanese. We obtain a good fit to the data if we include known Neanderthal and Denisovan introgression and model all modern human ancestry in New Guineans, Australians and Andamanese as part of an eastern clade together with mainland East Asians (Supplementary Information section 11; Fig. 3). Furthermore, when we manually introduce a deeply diverging modern human lineage contributing ancestry to Australians, New Guineans, and Andamanese (or when we repeat the analysis in a model without Andamanese), no position or proportion of the deep lineage improves the fit. If this putative source population branched off the main lineage leading to non-Africans more than about 10–20 thousand years before the separation of European and East Asian ancestors, we obtain an upper bound of a few per cent for the possible contribution to Australians and New Guineans (Fig. 3 inset; Supplementary Information section 11). These results are at odds with an inference of substantial early dispersal ancestry in a previous analysis of an Australian genome31; however, that study used a less complete model that, notably, did not include the known Denisovan admixture into Australo-Melanesians17. 
The findings for Australians are also unlikely to be due to some unusual feature of the individuals we sequenced, as when we compared three different groups of Australian samples for which there is published genome-wide data, we found them all to be consistent with descending from a common homogeneous population since separation from New Guineans (Supplementary Information section 10). These results are not in conflict with skeletal and archaeological evidence of an early modern human presence outside of Africa29,33, as early migrations could have occurred but not contributed substantially to present-day populations. The possibility of populations that once flourished but did not contribute substantially to living groups is especially plausible now that ancient DNA from the ~45 kya Ust’-Ishim28 and the ~40 kya Oase 1 individuals34 has documented their existence.
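Saying that a graph fits all f-statistic relationships to within 2.1 standard errors means that, for every statistic, the difference between the observed value and the value predicted by the fitted graph is at most 2.1 times that statistic's standard error. The Python sketch below shows only this final comparison; the statistic labels, values, standard errors and function names are invented, and in practice the predicted values come from graph-fitting software (such as qpGraph in the ADMIXTOOLS package that accompanies ref. 32), not from hand-entered numbers.

```python
def residual_z_scores(observed, fitted, std_err):
    """Z-score of (observed - fitted) for each named f-statistic."""
    z = {}
    for name, obs in observed.items():
        se = std_err[name]
        if se <= 0:
            raise ValueError(f"standard error for {name!r} must be positive")
        z[name] = (obs - fitted[name]) / se
    return z


def worst_residual(z_scores):
    """Name and absolute Z of the statistic the graph fits least well."""
    name = max(z_scores, key=lambda k: abs(z_scores[k]))
    return name, abs(z_scores[name])


# Invented statistics, purely to show the bookkeeping.
observed = {"f4(A,B;C,D)": 0.0021, "f4(A,C;B,D)": -0.0004, "f3(X;A,B)": 0.0150}
fitted = {"f4(A,B;C,D)": 0.0018, "f4(A,C;B,D)": -0.0001, "f3(X;A,B)": 0.0149}
std_err = {"f4(A,B;C,D)": 0.0002, "f4(A,C;B,D)": 0.0003, "f3(X;A,B)": 0.0005}

z = residual_z_scores(observed, fitted, std_err)
name, worst = worst_residual(z)
print(name, round(worst, 2), "fits within 2.1 SE:", worst <= 2.1)
```

Comparing alternative graphs (for example, with and without a deep early-dispersal lineage) then amounts to asking whether adding the extra edge reduces the worst residual or improves the model likelihood.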

Figure 3: Present-day populations have negligible ancestry from an early dispersal of modern humans out of Africa.

Best-fitting admixture graph model of relationships among Australians, New Guineans, Andamanese and other diverse populations. Present-day populations are shown in blue, ancient samples in red, and select inferred ancestral nodes in green. Dotted lines indicate admixture events, all of which involve archaic humans. All f-statistic relationships are accurately fit to within 2.1 standard errors. Inset, results of adding putative early dispersal admixture to the graph model for different assumptions about when the early lineage split off. We specify the split time in terms of the genetic drift above the ‘Non-African’ node, with 0.01 units of drift representing on the order of ten thousand years. The (approximate) model likelihood is maximized with zero early dispersal ancestry, and no more than a few per cent is consistent with the data.

Accelerated mutation accumulation in non-Africans

The SGDP data provide an opportunity to compare the rates at which mutations have accumulated across populations. We restricted our analyses to samples for which our genotypes are likely to be most reliable (this included restricting to samples that were all processed in the same way), and we used the highest level of filtering (‘level 9’) (Supplementary Information section 7). We pooled samples by region to increase power, and for all pairs of regions computed the expected number of positions where, if we picked a random chromosome from each, region A would mismatch chimpanzee and region B would be identical to chimpanzee (or vice versa). If mutations have accumulated at the same rate on both lineages since the two populations diverged, these numbers are expected to be equal35. However, when we compute the ratio of mutations on one lineage to the other since separation, we find a subtle (average of 0.5%) but significant excess of mutations in non-Africans relative to sub-Saharan Africans (3.3 < |Z| < 9.4 standard errors from zero; Extended Data Table 1). Because any difference must reflect events since non-African/African population divergence, which is less than a tenth of average genetic divergence (Fig. 2a), the whole excess must have accumulated within that short interval; this implies an approximately tenfold larger difference in mutation accumulation rates since population divergence (~5%). We were concerned that these results might be biased by the fact that the human genome reference sequence is more closely related to non-Africans than to Africans, or by higher levels of heterozygosity in Africans, as both of these issues could make detection of divergent sites in Africans more difficult.
However, we replicated the findings after remapping to chimpanzee, which is equally distant from all present-day populations, and after restricting analyses to the X chromosome in males (as males have only a single X chromosome, this procedure avoids bias due to different error rates in detecting heterozygous genotypes in populations with different rates of heterozygosity) (Extended Data Fig. 5). These observations are most likely to be explained by acceleration in the rate of mutation accumulation in non-Africans, since the same signal appears in comparisons to sub-Saharan Africans related in different ways to non-Africans (Extended Data Table 1). It is known that the rate of CCT > CTT mutations differs across human populations. However, this particular mutation class was found to be enriched in Europeans, but not in East Asians, relative to Africans, and thus cannot explain our signal36. One of several possible explanations for these findings is a decrease in the generation interval in non-Africans compared to Africans since separation37.
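The mismatch statistic described above can be written directly in terms of derived-allele frequencies (derived meaning different from chimpanzee): for pooled regions A and B with frequencies pA and pB at a site, the probability that a random chromosome from A mismatches chimpanzee while one from B matches it is pA(1 − pB), and the reverse is pB(1 − pA). The Python sketch below sums these over sites, takes their ratio, and attaches a standard error from a delete-one block jackknife. The function names, the contiguous equal-size blocks and the unweighted jackknife are simplifying assumptions (the analyses behind Extended Data Table 1 may differ), and the frequencies are simulated. The last function restates the arithmetic in the text: a 0.5% excess confined to less than a tenth of the genealogy implies a rate difference of roughly 5% over that interval.

```python
import random
from math import sqrt


def mismatch_ratio(p_a, p_b):
    """Ratio of expected 'A derived, B ancestral' to 'B derived, A ancestral' sites.

    p_a and p_b are per-site derived-allele frequencies (derived = differs from
    chimpanzee) in the two pooled regions. A ratio above 1 means extra mutations
    on the A lineage.
    """
    n_ab = sum(pa * (1 - pb) for pa, pb in zip(p_a, p_b))
    n_ba = sum(pb * (1 - pa) for pa, pb in zip(p_a, p_b))
    return n_ab / n_ba


def block_jackknife_se(p_a, p_b, n_blocks):
    """Standard error of mismatch_ratio from an unweighted delete-one block jackknife."""
    n = len(p_a)
    size = n // n_blocks
    estimates = []
    for k in range(n_blocks):
        start = k * size
        stop = n if k == n_blocks - 1 else start + size
        estimates.append(mismatch_ratio(p_a[:start] + p_a[stop:], p_b[:start] + p_b[stop:]))
    mean = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return sqrt(variance)


def rate_difference_since_split(genome_wide_excess, split_fraction):
    """Excess rescaled to the post-split part of the genealogy."""
    return genome_wide_excess / split_fraction


# Simulated frequencies: region A carries slightly more derived alleles than B.
rng = random.Random(7)
p_b = [rng.random() for _ in range(20_000)]
p_a = [min(1.0, p + 0.002) for p in p_b]

ratio = mismatch_ratio(p_a, p_b)
se = block_jackknife_se(p_a, p_b, n_blocks=50)
print(f"ratio = {ratio:.4f}, Z = {(ratio - 1) / se:.1f}")
print(round(rate_difference_since_split(0.005, 0.1), 3))  # 0.05, i.e. ~5%
```

For genome-wide data a weighted jackknife, with blocks defined by physical distance rather than by SNP count, is the more usual choice; equal-size blocks are used here only to keep the sketch short.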

No species-wide sweeps in modern humans

Finally, we used the SGDP data set to address the hypothesis that the widespread appearance of modern human behaviour in the archaeological record after ~50 kya was driven by one or a few changes in neurological genes that swept through the population shortly before this time38. We first applied the 3P-CLR method39 to search for locations in the genome with low allele frequency differentiation between KhoeSan and other modern humans, combined with high differentiation between modern and archaic (Neanderthal and Denisovan) humans, as might be expected from a selective sweep in the ancestors of all modern humans (Supplementary Information section 12; Extended Data Fig. 6). We found no strong outlier signals, although a caveat is that the scan has limited power and we could not apply it to filtered sections of the genome. We also applied the PSMC method20 to estimate the average time since the most recent common ancestor (TMRCA) of individuals’ two chromosomes in the genomic regions within the largest 3P-CLR peaks (38 peaks corresponding to the top 0.1%). In none of the regions did we infer that the great majority of all pairs of modern humans share a common ancestor <100 kya, as would be expected for a sweep just before ~50 kya (Supplementary Data Table 2).

As a second approach to scanning for species-wide selective sweeps, we applied the PSMC to infer TMRCA for SGDP samples across the entire genome. This analysis found no regions where the great majority of pairs of human genomes are inferred to share a common ancestor <100 kya (the largest fraction seen anywhere in the genome is 68%; Extended Data Fig. 7).
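Both scans ask, region by region, what fraction of pairwise TMRCA estimates fall below 100 kya, and flag regions where that fraction is very high. A Python sketch of that bookkeeping follows, assuming the PSMC-based TMRCA estimates (one per individual's pair of chromosomes) have already been computed for each region; the 0.9 threshold standing in for "the great majority", the function names, the region names and the values are all hypothetical.

```python
def fraction_recent(tmrca_by_region, cutoff_kya=100.0):
    """Fraction of pairwise TMRCA estimates below cutoff_kya, per region."""
    fractions = {}
    for region, values in tmrca_by_region.items():
        if not values:
            raise ValueError(f"no TMRCA estimates for region {region!r}")
        fractions[region] = sum(1 for t in values if t < cutoff_kya) / len(values)
    return fractions


def sweep_candidates(fractions, min_fraction=0.9):
    """Regions where at least min_fraction of pairs coalesce more recently than the cutoff."""
    return sorted(region for region, f in fractions.items() if f >= min_fraction)


# Invented TMRCA estimates (kya), one per individual, for three genomic regions.
tmrca = {
    "region_1": [450, 620, 310, 880, 95],
    "region_2": [60, 85, 120, 240, 70],
    "region_3": [40, 55, 90, 75, 65],
}
fractions = fraction_recent(tmrca)
print(fractions)
print(sweep_candidates(fractions))  # ['region_3']
```

In a genome-wide scan, the quantity to report is then the maximum of these fractions across all regions, which the text gives as 68%.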

Taken together, these results do not rule out the possibility that genetic changes contributed in a meaningful way to changes in human behaviour after 50 kya; for example, changing selection can shift the frequencies of pre-existing mutations to bring a population to a new and advantageous set-point for a phenotype, as occurred in the case of height differences between northern and southern Europeans40. For polygenic selection, however, genetics is not a creative force; instead, it responds to selection pressures imposed by novel environmental conditions or lifestyles. Thus, our results provide evidence against a model in which one or a few mutations were responsible for the rapid developments in human behaviour in the last 50,000 years. Instead, changes in lifestyles due to cultural innovation or exposure to new environments are likely to have been the driving forces behind these rapid transformations41,42.

Accessions

Primary accessions

European Nucleotide Archive

Data deposits

Raw data for 279 genomes for which the informed consent documentation is consistent with fully public data release are available through the EBI European Nucleotide Archive under accession numbers PRJEB9586 and ERP010710. For the remaining 21 genomes (designated by code ‘Y’ in the seventh column of Supplementary Data Table 1), data are deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001001959. Data for these 21 genomes can be obtained by submitting to the EGA Data Access Committee a signed letter containing the following text: “(a) I will not distribute the data outside my collaboration; (b) I will not post the data publicly; (c) I will make no attempt to connect the genetic data to personal identifiers for the samples; and (d) I will not use the data for any commercial purposes.” Compact versions of the SGDP dataset and software for accessing it are available at (http://genetics.med.harvard.edu/reichlab/Reich_Lab/Datasets.html). The short tandem repeat (STR) genotypes are available through dbVar under accession number nstd128 (http://www.ncbi.nlm.nih.gov/dbvar).

References

  1. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
  2. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010)
  3. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010)
  4. FermiKit: assembly-based variant calling for Illumina resequencing data. Preprint at (2015)
  5. Global diversity, population stratification, and selection of human copy-number variation. Science 349, aab3761 (2015)
  6. Profiling short tandem repeats from short reads. Methods Mol. Biol. 1038, 113–135 (2013)
  7. lobSTR: A short tandem repeat profiler for personal genomes. Genome Res. 22, 1154–1162 (2012)
  8. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009)
  9. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006)
  10. Accelerated genetic drift on chromosome X during the human dispersal out of Africa. Nat. Genet. 41, 66–70 (2009)
  11. Can a sex-biased human demography account for the reduced effective population size of chromosome X in non-Africans? Mol. Biol. Evol. 27, 2312–2321 (2010)
  12. Sociocultural behavior, sex-biased admixture, and effective population sizes in Central African Pygmies and non-Pygmies. Mol. Biol. Evol. 30, 918–937 (2013)
  13. The framework of central African hunter-gatherers and neighbouring societies. African Study Monographs Suppl. 28, 57–79 (2003)
  14. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010)
  15. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012)
  16. Higher levels of neanderthal ancestry in East Asians than in Europeans. Genetics 194, 199–209 (2013)
  17. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010)
  18. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014)
  19. Archaic human ancestry in East Asia. Proc. Natl Acad. Sci. USA 108, 18301–18306 (2011)
  20. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011)
  21. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014)
  22. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 43, 1031–1034 (2011)
  23. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338, 374–379 (2012)
  24. An early divergence of KhoeSan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal resequencing data. Mol. Biol. Evol. 29, 617–630 (2012)
  25. Archaic lineages in the history of modern humans. Genetics 156, 799–808 (2000)
  26. The genetic prehistory of southern Africa. Nat. Commun. 3, 1143 (2012)
  27. Inferring the demographic history of African farmers and pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet. 5, e1000448 (2009)
  28. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014)
  29. Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. 24, 149–164 (2015)
  30. Testing modern human out-of-Africa dispersal models and implications for modern human origins. J. Hum. Evol. 87, 95–106 (2015)
  31. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011)
  32. Ancient admixture in human history. Genetics 192, 1065–1093 (2012)
  33. The earliest unequivocally modern humans in southern China. Nature 526, 696–699 (2015)
  34. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015)
  35. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat. Genet. 47, 126–131 (2015)
  36. Evidence for recent, population-specific evolution of the human mutation rate. Proc. Natl Acad. Sci. USA 112, 3439–3444 (2015)
  37. Determinants of mutation rate variation in the human germline. Annu. Rev. Genomics Hum. Genet. 15, 47–70 (2014)
  38. The dawn of human culture (Wiley, 2002)
  39. Testing for ancient selection using cross-population allele frequency differentiation. Genetics 202, 733–750 (2015)
  40. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012)
  41. The revolution that wasn’t: a new interpretation of the origin of modern human behavior. J. Hum. Evol. 39, 453–563 (2000)
  42. Prehistory: the Making of the Human Mind (Modern Library, 2009)
  43. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12, 246 (2011)
  44. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015)
  45. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007)


Acknowledgements

We thank the volunteers who donated samples. We thank H. Blanche, N. Boivin, H. Cann (deceased), E. Eichler, H. Greely, M. Petraglia, K. Prüfer, A. Rogers, M. Steinrücken, U. Stenzel and P. Sudmant for comments, critiques, discussions, or advice on assembling samples. We thank S. Fan for uploading 21 genomes to the European Genome-phenome Archive. The sequencing was funded by the Simons Foundation (SFARI 280376) and the US National Science Foundation (BCS-1032255). I.M. was supported by a Long Term Fellowship grant LT001095/2014 from the Human Frontier Science Program. P.S. was supported by the Wenner-Gren Foundation and the Swedish Research Council (VR grant 2014-453). T.W. and M.G. were supported by NIJ grant 2014-DN-BX-K089. Y.E. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and by NIJ grant 2014-DN-BX-K089. D.L. was supported by the Natural Sciences and Engineering Research Council of Canada. T.K. was supported by ERC Starting Investigator grant FP7-261213. R.S. received support from the Russian Foundation for Basic Research (#15-04-02543). S.D. received support from the Russian Foundation for Basic Research (#16-34-00599). R.K., E.K. and S.L. were supported by the Russian Foundation for Basic Research (11-04-00725-a). E.B. was supported by the Russian Foundation for Basic Research (16-06-00303). O.B. was supported by the Russian Scientific Fund (14-04-00827) and by the Russian Foundation for Basic Research (16-04-00890). D.M.B., H.S., E.M., R.V. and M.M. were supported by Institutional Research Funding from the Estonian Research Council IUT24-1 and by the European Regional Development Fund (European Union) through the Centre of Excellence in Genomics to Estonian Biocentre and University of Tartu. D.C. was supported by the Spanish MINECO grant CGL-44351-P. L.B.J. and W.S.W. were supported by NIH grant GM59290. S.A.T. was supported by NIH grants 5DP1ES022577 05, 1R01DK104339-01, and 1R01GM113657-01. C.T.-S. and Y.X. were supported by The Wellcome Trust grant 098051. C.M.B. was supported by NSF grants 0924726 and 1153911. K.T. was supported by CSIR Network Project grant (GENESIS: BSC0121). J.P.S. and Y.S.S. were supported in part by NIH grant R01-GM094402 and a Packard Fellowship for Science and Engineering. G.R., J.K. and S.P. were funded by the Max Planck Society. N.P. and D.R. were supported by NIH grant GM100233, and D.R. is a Howard Hughes Medical Institute investigator.

Author information

Author notes

    Heng Li, Mark Lipson & Iain Mathieson

    These authors contributed equally to this work.

    Sriram Sankararaman & Lalji Singh

    Present addresses: Department of Computer Science, University of California at Los Angeles, California 90095, USA and Department of Human Genetics Science, University of California at Los Angeles, California 90095, USA (S.S.); Genome Foundation, Hyderabad 500076, India (L.S.).

Affiliations

  1. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Swapan Mallick
    • , Mark Lipson
    • , Iain Mathieson
    • , Mengyao Zhao
    • , Niru Chennagiri
    • , Susanne Nordenfelt
    • , Arti Tandon
    • , Pontus Skoglund
    • , Iosif Lazaridis
    • , Sriram Sankararaman
    • , Qiaomei Fu
    • , Nadin Rohland
    •  & David Reich
  2. Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

    • Swapan Mallick
    • , Heng Li
    • , Melissa Gymrek
    • , Mengyao Zhao
    • , Niru Chennagiri
    • , Susanne Nordenfelt
    • , Arti Tandon
    • , Pontus Skoglund
    • , Iosif Lazaridis
    • , Sriram Sankararaman
    • , Qiaomei Fu
    • , Nadin Rohland
    • , Nick Patterson
    •  & David Reich
  3. Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Swapan Mallick
    • , Mengyao Zhao
    • , Niru Chennagiri
    • , Susanne Nordenfelt
    •  & David Reich
  4. Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA

    • Melissa Gymrek
  5. Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA

    • Melissa Gymrek
  6. New York Genome Center, New York, New York 10013, USA

    • Melissa Gymrek
    • , Yaniv Erlich
    • , Thomas Willems
    •  & William Klitz
  7. Department of Integrative Biology, University of California, Berkeley, California 94720-3140, USA

    • Fernando Racimo
  8. Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, IVPP, CAS, Beijing 100044, China

    • Qiaomei Fu
  9. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D-04103 Leipzig, Germany

    • Gabriel Renaud
    • , Svante Pääbo
    •  & Janet Kelso
  10. Department of Computer Science, Columbia University, New York, New York 10027, USA

    • Yaniv Erlich
  11. Center for Computational Biology and Bioinformatics, Columbia University, New York, New York 10032, USA

    • Yaniv Erlich
  12. Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Thomas Willems
  13. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima 15102, Perú

    • Carla Gallo
    • Giovanni Poletti
  14. Computational Biology Graduate Group, University of California, Berkeley, California 94720, USA

    • Jeffrey P. Spence
  15. Computer Science Division, University of California, Berkeley, California 94720, USA

    • Yun S. Song
  16. Department of Statistics, University of California, Berkeley, California 94720, USA

    • Yun S. Song
  17. Department of Mathematics and Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA

    • Yun S. Song
  18. Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK

    • Francois Balloux
  19. Institute of Linguistics, University of Bern, Bern CH-3012, Switzerland

    • George van Driem
  20. Department of Human and Clinical Genetics, Postzone S5-P, Leiden University Medical Center, 2333 ZA Leiden, Netherlands

    • Peter de Knijff
  21. School of Biological Sciences, Nanyang Technological University, 637551 Singapore

    • Irene Gallego Romero
  22. Lee Kong Chian School of Medicine, Nanyang Technological University, 636921 Singapore

    • Irene Gallego Romero
  23. Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA

    • Aashish R. Jha
    • Anna Di Rienzo
    • Choongwon Jeong
  24. Estonian Biocentre, Evolutionary Biology group, Tartu 51010, Estonia

    • Doron M. Behar
    • Hovhannes Sahakyan
    • Ene Metspalu
    • Jüri Parik
    • Richard Villems
    • Sergey Litvinov
    • Toomas Kivisild
    • Mait Metspalu
  25. Laboratorio de Genética Molecular Poblacional, Instituto Multidisciplinario de Biología Celular (IMBICE), CCT-CONICET La Plata/CIC Buenos Aires/Universidad Nacional de La Plata, La Plata B1906APO, Argentina

    • Claudio M. Bravi
  26. Department of Zoology, University of Oxford, Oxford OX1 3PS, UK

    • Cristian Capelli
  27. Department of Clinical Science, University of Bergen, Bergen 5021, Norway

    • Tor Hervig
  28. National Laboratory of Genomics for Biodiversity (LANGEBIO), CINVESTAV, Irapuato, Guanajuato 36821, Mexico

    • Andres Moreno-Estrada
  29. Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk 630090, Russia

    • Olga L. Posukh
  30. Novosibirsk State University, Novosibirsk 630090, Russia

    • Olga L. Posukh
  31. Research Centre for Medical Genetics, Moscow 115478, Russia

    • Elena Balanovska
    • Oleg Balanovsky
  32. Vavilov Institute for General Genetics, Moscow 119991, Russia

    • Oleg Balanovsky
  33. Moscow Institute for Physics and Technology, Dolgoprudniy 141700, Russia

    • Oleg Balanovsky
  34. Department of Medical Genetics, National Human Genome Center, Medical University Sofia, Sofia 1431, Bulgaria

    • Sena Karachanak-Yankova
    • Draga Toncheva
  35. Laboratory of Ethnogenomics, Institute of Molecular Biology, National Academy of Sciences of Armenia, Yerevan 0014, Armenia

    • Hovhannes Sahakyan
    • Levon Yepiskoposyan
  36. The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

    • Chris Tyler-Smith
    • Yali Xue
  37. RIPAS Hospital, Bandar Seri Begawan, Brunei

    • M. Syafiq Abdullah
  38. Department of Genetics, Evolution and Environment, University College London WC1E 6BT, UK

    • Andres Ruiz-Linares
  39. Department of Anthropology, Case Western Reserve University, Cleveland, Ohio 44106-7125, USA

    • Cynthia M. Beall
  40. Laboratory of Human Molecular Genetics, Institute of Molecular and Cellular Biology, Siberian Branch of Russian Academy of Sciences, Novosibirsk 630090, Russia

    • Elena B. Starikovskaya
    • Stanislav Dryomov
    • Rem Sukernik
  41. Department of Evolutionary Biology, University of Tartu, Tartu 51010, Estonia

    • Ene Metspalu
    • Richard Villems
  42. Estonian Academy of Sciences, Tallinn 10130, Estonia

    • Richard Villems
  43. Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York 11794, USA

    • Brenna M. Henn
  44. NextBio, Illumina, Santa Clara, California 95050, USA

    • Ugur Hodoglugil
  45. Gladstone Institutes, San Francisco, California 94158, USA

    • Robert Mahley
  46. Department of Forensic Medicine, University of Helsinki, Helsinki 00014, Finland

    • Antti Sajantila
  47. Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA

    • George Stamatoyannopoulos
  48. National Cancer Centre Singapore, 169610 Singapore

    • Joseph T. S. Wee
  49. Institute of Biochemistry and Genetics, Ufa Research Centre, Russian Academy of Sciences, Ufa 450054, Russia

    • Rita Khusainova
    • Elza Khusnutdinova
    • Sergey Litvinov
  50. Department of Genetics and Fundamental Medicine, Bashkir State University, Ufa 450074, Russia

    • Rita Khusainova
    • Elza Khusnutdinova
    • Sergey Litvinov
  51. Jaramogi Oginga Odinga University of Science and Technology, Bondo 40601, Kenya

    • George Ayodo
  52. Institut de Biologia Evolutiva (CSIC-UPF), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona 08003, Spain

    • David Comas
  53. ARL Division of Biotechnology, University of Arizona, Tucson, Arizona 85721, USA

    • Michael F. Hammer
  54. Division of Biological Anthropology, University of Cambridge, Fitzwilliam Street, Cambridge CB2 1QH, UK

    • Toomas Kivisild
  55. Basic Research Laboratory, Center for Cancer Research, NCI, Leidos Biomedical Research, Inc., Frederick National Laboratory, Frederick, Maryland 21702, USA

    • Cheryl A. Winkler
  56. CHU Sainte-Justine, Pediatrics Departement, Université de Montréal, Québec H3T 1C5, Canada

    • Damian Labuda
  57. Department of Pediatrics, University of Washington, Seattle, Washington 98119, USA

    • Michael Bamshad
  58. Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA

    • Lynn B. Jorde
  59. Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA

    • Sarah A. Tishkoff
  60. Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA

    • W. Scott Watkins
  61. Department of Paleolithic Archaeology, Institute of Archaeology and Ethnography, Siberian Branch of Russian Academy of Sciences, Novosibirsk 630090, Russia

    • Stanislav Dryomov
  62. Altai State University, Barnaul 656000, Russia

    • Rem Sukernik
  63. CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500 007, India

    • Lalji Singh
    • Kumarasamy Thangaraj

Authors

  1. Swapan Mallick
  2. Heng Li
  3. Mark Lipson
  4. Iain Mathieson
  5. Melissa Gymrek
  6. Fernando Racimo
  7. Mengyao Zhao
  8. Niru Chennagiri
  9. Susanne Nordenfelt
  10. Arti Tandon
  11. Pontus Skoglund
  12. Iosif Lazaridis
  13. Sriram Sankararaman
  14. Qiaomei Fu
  15. Nadin Rohland
  16. Gabriel Renaud
  17. Yaniv Erlich
  18. Thomas Willems
  19. Carla Gallo
  20. Jeffrey P. Spence
  21. Yun S. Song
  22. Giovanni Poletti
  23. Francois Balloux
  24. George van Driem
  25. Peter de Knijff
  26. Irene Gallego Romero
  27. Aashish R. Jha
  28. Doron M. Behar
  29. Claudio M. Bravi
  30. Cristian Capelli
  31. Tor Hervig
  32. Andres Moreno-Estrada
  33. Olga L. Posukh
  34. Elena Balanovska
  35. Oleg Balanovsky
  36. Sena Karachanak-Yankova
  37. Hovhannes Sahakyan
  38. Draga Toncheva
  39. Levon Yepiskoposyan
  40. Chris Tyler-Smith
  41. Yali Xue
  42. M. Syafiq Abdullah
  43. Andres Ruiz-Linares
  44. Cynthia M. Beall
  45. Anna Di Rienzo
  46. Choongwon Jeong
  47. Elena B. Starikovskaya
  48. Ene Metspalu
  49. Jüri Parik
  50. Richard Villems
  51. Brenna M. Henn
  52. Ugur Hodoglugil
  53. Robert Mahley
  54. Antti Sajantila
  55. George Stamatoyannopoulos
  56. Joseph T. S. Wee
  57. Rita Khusainova
  58. Elza Khusnutdinova
  59. Sergey Litvinov
  60. George Ayodo
  61. David Comas
  62. Michael F. Hammer
  63. Toomas Kivisild
  64. William Klitz
  65. Cheryl A. Winkler
  66. Damian Labuda
  67. Michael Bamshad
  68. Lynn B. Jorde
  69. Sarah A. Tishkoff
  70. W. Scott Watkins
  71. Mait Metspalu
  72. Stanislav Dryomov
  73. Rem Sukernik
  74. Lalji Singh
  75. Kumarasamy Thangaraj
  76. Svante Pääbo
  77. Janet Kelso
  78. Nick Patterson
  79. David Reich

Contributions

S.M., Y.E., Y.S.S., S.P., J.K., N.P. and D.R. supervised the study. S.N., N.R., C.G., G.P., F.B., G.D., I.G.R., A.R.J., P.D., D.M.B., C.M.B., C.C., T.H., A.M.-E., O.L.P., E.B., O.B., S.K.-Y., H.S., D.T., L.Y., C.T.-S., Y.X., M.S.A., A.R.-L., C.B., A.D.R., C.J., E.B.S., E.M., J.P., R.V., B.M.H., U.H., R.W.M., A.S., G.S., J.T.S.W., R.K., E.K., S.L., G.A., D.C., M.H., T.K., W.K., C.A.W., D.L., M.B., L.B.J., S.A.T., W.S.W., M.M., S.D., R.S., L.S., K.T. and D.R. assembled samples. S.M., H.L., M.L., I.M., M.G., F.R., J.P.S., M.Z., N.C., A.T., P.S., I.L., S.S., Q.F., G.R., Y.S., N.P. and D.R. performed analyses. S.M., H.L., M.L., I.M., M.G., F.R., M.Z., N.P. and D.R. wrote the manuscript with help from all co-authors.

Competing interests

U.H. is employed by NextBio, a division of Illumina Ltd.

Corresponding authors

Correspondence to Swapan Mallick or David Reich.

Reviewer Information: Nature thanks P. Bellwood and S. Ramachandran and the other anonymous reviewer(s) for their contribution to the peer review of this work.


Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Text and Data, Supplementary Tables, Supplementary Figures and additional references (see Contents for details).

Excel files

  1. Supplementary Table 1

    This file shows the data for each sample studied.

  2. Supplementary Table 2

    This table shows the top hits from the 3P-CLR run.

About this article

DOI

https://doi.org/10.1038/nature18964
