Here we report the Simons Genome Diversity Project data set: high-quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.
European Nucleotide Archive
Raw data for 279 genomes for which the informed consent documentation is consistent with fully public data release are available through the EBI European Nucleotide Archive under accession numbers PRJEB9586 and ERP010710. For the remaining 21 genomes (designated by code ‘Y’ in the seventh column of Supplementary Data Table 1), data are deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001001959. Data for these 21 genomes can be obtained by submitting to the EGA Data Access Committee a signed letter containing the following text: “(a) I will not distribute the data outside my collaboration; (b) I will not post the data publicly; (c) I will make no attempt to connect the genetic data to personal identifiers for the samples; and (d) I will not use the data for any commercial purposes.” Compact versions of the SGDP dataset and software for accessing it are available at http://genetics.med.harvard.edu/reichlab/Reich_Lab/Datasets.html. The short tandem repeat (STR) genotypes are available through dbVar under accession number nstd128 (http://www.ncbi.nlm.nih.gov/dbvar).
We thank the volunteers who donated samples. We thank H. Blanche, N. Boivin, H. Cann (deceased), E. Eichler, H. Greely, M. Petraglia, K. Prüfer, A. Rogers, M. Steinrücken, U. Stenzel and P. Sudmant for comments, critiques, discussions, or advice on assembling samples. We thank S. Fan for uploading 21 genomes to the European Genome-phenome Archive. The sequencing was funded by the Simons Foundation (SFARI 280376) and the US National Science Foundation (BCS-1032255). I.M. was supported by a Long Term Fellowship grant LT001095/2014 from the Human Frontier Science Program. P.S. was supported by the Wenner-Gren Foundation and the Swedish Research Council (VR grant 2014-453). T.W. and M.G. were supported by NIJ grant 2014-DN-BX-K089. Y.E. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and by NIJ grant 2014-DN-BX-K089. D.L. was supported by the Natural Sciences and Engineering Research Council of Canada. T.K. was supported by ERC Starting Investigator grant FP7-261213. R.S. received support from the Russian Foundation for Basic Research (#15-04-02543). S.D. received support from the Russian Foundation for Basic Research (#16-34-00599). R.K., E.K. and S.L. were supported by the Russian Foundation for Basic Research (11-04-00725-a). E.B. was supported by the Russian Foundation for Basic Research (16-06-00303). O.B. was supported by the Russian Scientific Fund (14-04-00827) and by the Russian Foundation for Basic Research (16-04-00890). D.M.B., H.S., E.M., R.V. and M.M. were supported by Institutional Research Funding from the Estonian Research Council IUT24-1 and by the European Regional Development Fund (European Union) through the Centre of Excellence in Genomics to Estonian Biocentre and University of Tartu. D.C. was supported by the Spanish MINECO grant CGL-44351-P. L.B.J. and W.S.W. were supported by NIH grant GM59290. S.A.T. was supported by NIH grants 5DP1ES022577 05, 1R01DK104339-01, and 1R01GM113657-01. C.T.-S. and Y.X. were supported by The Wellcome Trust grant 098051. C.M.B. was supported by NSF grants 0924726 and 1153911. K.T. was supported by CSIR Network Project grant (GENESIS: BSC0121). J.P.S. and Y.S.S. were supported in part by NIH grant R01-GM094402 and a Packard Fellowship for Science and Engineering. G.R., J.K. and S.P. were funded by the Max Planck Society. N.P. and D.R. were supported by NIH grant GM100233; D.R. is a Howard Hughes Medical Institute investigator.