High-coverage whole-genome sequence studies have so far focused on a limited number1 of geographically restricted populations2,3,4,5, or been targeted at specific diseases, such as cancer6. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history7,8,9 and refuelled the debate on the mutation rate in humans10. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record11, and admixture between AMHs and Neanderthals predating the main Eurasian expansion12, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.
European Nucleotide Archive
The newly sequenced genomes are part of the Estonian Biocentre Human Genome Diversity Panel (EGDP). They were deposited in the ENA archive under accession number PRJEB12437 and are also freely available through the Estonian Biocentre website (www.ebc.ee/free_data).
Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
Lachance, J. et al. Evolutionary history and adaptation from high-coverage whole-genome sequences of diverse African hunter-gatherers. Cell 150, 457–469 (2012)
Pagani, L. et al. Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991 (2015)
Clemente, F. J. et al. A selective sweep on a deleterious mutation in CPT1A in Arctic populations. Am. J. Hum. Genet. 95, 584–589 (2014)
Gudbjartsson, D. F. et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435–444 (2015)
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013)
Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011)
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014)
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012)
Scally, A. & Durbin, R. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012)
Grove, M. et al. Climatic variability, plasticity, and dispersal: a case study from Lake Tana, Ethiopia. J. Hum. Evol. 87, 32–47 (2015)
Kuhlwilm, M. et al. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature 530, 429–433 (2016)
Groucutt, H. S. et al. Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. 24, 149–164 (2015)
Liu, W. et al. The earliest unequivocally modern humans in southern China. Nature 526, 696–699 (2015)
Reyes-Centeno, H. et al. Genomic and cranial phenotype data support multiple modern human dispersals from Africa and a southern route into Asia. Proc. Natl Acad. Sci. USA 111, 7248–7253 (2014)
Mellars, P., Gori, K. C., Carr, M., Soares, P. A. & Richards, M. B. Genetic and archaeological perspectives on the initial modern human colonization of southern Asia. Proc. Natl Acad. Sci. USA 110, 10699–10704 (2013)
Prugnolle, F., Manica, A. & Balloux, F. Geography predicts neutral genetic diversity of human populations. Curr. Biol. 15, R159–R160 (2005)
Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010)
Reich, D. et al. Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. Am. J. Hum. Genet. 89, 516–528 (2011)
Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014)
Fu, Q. et al. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553–559 (2013)
Fu, Q. et al. The genetic history of Ice Age Europe. Nature 534, 200–205 (2016)
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012)
Petkova, D., Novembre, J. & Stephens, M. Visualizing spatial population structure with estimated effective migration surfaces. Nat. Genet. 48, 94–100 (2016)
Hellenthal, G. et al. A genetic atlas of human admixture history. Science 343, 747–751 (2014)
Chapman, N. H. & Thompson, E. A. A model for the length of tracts of identity by descent in finite random mating populations. Theor. Popul. Biol. 64, 141–150 (2003)
Wall, J. D. et al. Higher levels of Neanderthal ancestry in East Asians than in Europeans. Genetics 194, 199–209 (2013)
Posth, C. et al. Pleistocene mitochondrial genomes suggest a single major dispersal of non-Africans and a late glacial population turnover in Europe. Curr. Biol. 26, 827–833 (2016)
1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
Migliano, A. B., Vinicius, L. & Lahr, M. M. Life history trade-offs explain the evolution of human pygmies. Proc. Natl Acad. Sci. USA 104, 20216–20219 (2007)
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005)
Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012)
Behar, D. M. et al. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012)
Soares, P. et al. The archaeogenetics of Europe. Curr. Biol. 20, R174–R183 (2010)
Support was provided by: Estonian Research Infrastructure Roadmap grant no. 3.2.0304.11-0312; Australian Research Council Discovery grants (DP110102635 and DP140101405) (D.M.L., M.W. and E.W.); Danish National Research Foundation; the Lundbeck Foundation and KU2016 (E.W.); ERC Starting Investigator grant (FP7 - 261213) (T.K.); Estonian Research Council grant PUT766 (G.C. and M.K.); EU European Regional Development Fund through the Centre of Excellence in Genomics to Estonian Biocentre (R.V.; M.Me. and A.Me.), and Centre of Excellence for Genomics and Translational Medicine Project No. 2014-2020.4.01.15-0012 to EGC of UT (A.Me.) and EBC (M.Me.); Estonian Institutional Research grant IUT24-1 (L.S., M.J., A.K., B.Y., K.T., C.B.M., Le.S., H.Sa., S.L., D.M.B., E.M., R.V., G.H., M.K., G.C., T.K. and M.Me.) and IUT20-60 (A.Me.); French Ministry of Foreign and European Affairs and French ANR grant number ANR-14-CE31-0013-01 (F.-X.R.); Gates Cambridge Trust Funding (E.J.); ICG SB RAS (No. VI.58.1.1) (D.V.L.); Leverhulme Programme grant no. RP2011-R-045 (A.B.M., P.G. and M.G.T.); Ministry of Education and Science of Russia; Project 6.656.2014/K (S.A.F.); NEFREX grant funded by the European Union (People Marie Curie Actions; International Research Staff Exchange Scheme; call FP7-PEOPLE-2012-IRSES-number 318979) (M.Me., G.H. and M.K.); NIH grants 5DP1ES022577 05, 1R01DK104339-01, and 1R01GM113657-01 (S.Tis.); Russian Foundation for Basic Research (grant N 14-06-00180a) (M.G.); Russian Foundation for Basic Research; grant 16-04-00890 (O.B. and E.B.); Russian Science Foundation grant 14-14-00827 (O.B.); The Russian Foundation for Basic Research (14-04-00725-a), The Russian Humanitarian Scientific Foundation (13-11-02014) and the Program of the Basic Research of the RAS Presidium “Biological diversity” (E.K.K.); Wellcome Trust and Royal Society grant WT104125AIA & the Bristol Advanced Computing Research Centre (http://www.bris.ac.uk/acrc/) (D.J.L.); Wellcome Trust grant 098051 (Q.A.; C.T.-S. and Y.X.); Wellcome Trust Senior Research Fellowship grant 100719/Z/12/Z (M.G.T.); Young Explorers Grant from the National Geographic Society (8900-11) (C.A.E.); ERC Consolidator Grant 647787 ‘LocalAdaptatio’ (A.Ma.); Program of the RAS Presidium “Basic research for the development of the Russian Arctic” (B.M.); Russian Foundation for Basic Research grant 16-06-00303 (E.B.); a Rutherford Fellowship (RDF-10-MAU-001) from the Royal Society of New Zealand (M.P.C.).
The authors declare no competing financial interests.
Reviewer Information Nature thanks R. Dennell and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Map of location of samples highlighting the diversity/selection sets. b, Sample-level heterozygosity is plotted against distance from Addis Ababa. The trend line represents only non-African samples. The inset shows the waypoints used to arrive at the distance in kilometres for each sample. c, ADMIXTURE plot (K = 8 and 14) which relates general visual inspection of genetic structure to studied populations and their region of origin. d, Box plots were used to visualize the Denisova (red), Altai (green) and Croatian Neanderthal (blue) D distribution for each regional group of samples. Oceanian Altai D values show a remarkable similarity with the Denisova D values for the same region, in contrast with the other groups of samples where the Altai box plots tend to be more similar to the Croatian Neanderthal ones. Boxes show median, first and third quartiles, with 1.5× interquartile range whiskers and black dots as outliers.
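The waypoint-based distance described for panel b can be sketched as a sum of great-circle legs. This is only an illustration under stated assumptions: the haversine formula, the mean Earth radius, and the waypoint and sample coordinates below are hypothetical choices, not the values or method used for the figure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_distance_km(origin, waypoints, sample):
    """Sum the legs origin -> each waypoint in order -> sample."""
    stops = [origin, *waypoints, sample]
    return sum(haversine_km(p, q) for p, q in zip(stops, stops[1:]))


ADDIS_ABABA = (9.03, 38.74)            # approximate coordinates
HYPOTHETICAL_WAYPOINT = (30.0, 31.0)   # illustrative only
HYPOTHETICAL_SAMPLE = (59.4, 24.7)     # illustrative only

direct = path_distance_km(ADDIS_ABABA, [], HYPOTHETICAL_SAMPLE)
routed = path_distance_km(ADDIS_ABABA, [HYPOTHETICAL_WAYPOINT], HYPOTHETICAL_SAMPLE)
print(f"direct: {direct:.0f} km, via waypoint: {routed:.0f} km")
```

Routing through waypoints can only lengthen the path relative to the direct great-circle distance, which is why the choice of waypoints matters for the x axis of such a plot.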
a, b, Concordance of DNA sequencing (Complete Genomics Inc.) and DNA genotyping (Illumina genotyping arrays) data (ref-ref; het-ref-alt and hom-alt-alt, see Supplementary Information 1.6) from chip (a) and sequence data (b). c, Coverage (depth) distribution of variable positions, divided by DNA source (blood or saliva) and Complete Genomics calling pipeline (release version). d, Genome-wide distribution of transition/transversion ratio subdivided by DNA source (saliva or blood) and by Complete Genomics calling pipeline. e, Genome-wide distribution of transition/transversion ratio subdivided by chromosomes. f, Inter-chromosome differences in observed heterozygosity in 447 samples from the diversity set. g, Inter-chromosome differences in observed heterozygosity in a set of 50 unpublished genomes from the Estonian Genome Center, sequenced on an Illumina platform at an average coverage exceeding 30×. h, Inter-chromosome differences in observed heterozygosity in phase 3 of the 1000 Genomes Project. The total number of observed heterozygous sites was divided by the number of accessible base pairs reported by the 1000 Genomes Project.
ChromoPainter and FineSTRUCTURE results, showing both inferred populations and the underlying (averaged) number of haplotypes that an individual in a population receives (rows) from donor individuals in other populations (columns). 108 populations are inferred by FineSTRUCTURE. The dendrogram shows the inferred relationship between populations. The numbers on the dendrogram give the proportion of MCMC iterations for which each population split is observed (where this is less than 1). Each ‘geographical region’ has a unique colour from which individuals are labelled. The number of individuals in each population is given in the label; for example, ‘4Italians; 3Albanians’ is a population of size 7 containing 4 individuals from Italy and 3 from Albania.
a, The MSMC split times estimated between each sample and a reference panel of nine genomes were linearly interpolated to infer the broader square matrix. b, c, Summary of outgroup f3 statistics for each pair of non-African populations or an ancient sample using Yoruba as an outgroup. Populations are grouped by geographic region and are ordered with increasing distance from Africa (left to right for columns and bottom to top for rows). Colour bars at the left and top of the heat map indicate the colour coding used for the geographical region. Individual population labels are indicated at the right and bottom of the heat map. The f3 statistics are scaled to lie between 0 and 1, with a black colour indicating those close to 0 and a red colour indicating those close to 1. Let m and M be the minimum and maximum f3 values within a given row (that is, focal population). That is, for focal population X (on rows), $m = \min_{Y \neq X} f_3(X, Y; \mathrm{Yoruba})$ and $M = \max_{Y \neq X} f_3(X, Y; \mathrm{Yoruba})$. The scaled f3 statistic for a given cell in that row is given by $f_3^{\mathrm{scaled}} = (f_3 - m)/(M - m)$, so that the smallest f3 in the row has value $f_3^{\mathrm{scaled}} = 0$ (black) and the largest has value $f_3^{\mathrm{scaled}} = 1$ (red). By default, the diagonal has value $f_3^{\mathrm{scaled}} = 1$ (red). The heat map is therefore asymmetric, with the population closest to the focal population at a given row having value $f_3^{\mathrm{scaled}} = 1$ (red colour) and the population farthest from the focal population at a given row having value $f_3^{\mathrm{scaled}} = 0$ (black colour). Therefore, at a given row, scanning the columns of the heat map reveals the populations with the most shared ancestry with the focal population of that row in the heat map.
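The row-wise scaling defined in this caption can be illustrated with a short sketch. The function name and the toy 4×4 matrix are hypothetical and the values are made up; only the scaling rule (minimum and maximum taken over off-diagonal entries of each row, diagonal set to 1) follows the caption.

```python
def scale_f3_rows(f3):
    """Row-wise min-max scaling of a square outgroup-f3 matrix.

    f3[i][j] holds f3(X_i, X_j; outgroup). Diagonal entries are ignored
    when finding the row minimum and maximum and are set to 1 in the output.
    """
    scaled = []
    for i, row in enumerate(f3):
        off_diagonal = [v for j, v in enumerate(row) if j != i]
        m, M = min(off_diagonal), max(off_diagonal)
        if M == m:
            raise ValueError(f"row {i} has no spread; scaling is undefined")
        scaled.append(
            [1.0 if j == i else (v - m) / (M - m) for j, v in enumerate(row)]
        )
    return scaled


# Toy symmetric matrix with made-up values; None marks the unused diagonal.
toy_f3 = [
    [None, 0.20, 0.10, 0.15],
    [0.20, None, 0.12, 0.18],
    [0.10, 0.12, None, 0.30],
    [0.15, 0.18, 0.30, None],
]

scaled = scale_f3_rows(toy_f3)
for row in scaled:
    print([round(v, 3) for v in row])
```

Although the input matrix is symmetric, the output is not: each row is rescaled by its own minimum and maximum, so the cell (i, j) and the cell (j, i) generally differ, which matches the asymmetry noted in the caption.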
Isolation by distance pattern across areas of high genetic gradient, using Europe as a baseline. The samples used in each analysis are indicated by coloured lines on the maps to the right of each plot. a–d, The panels show FST as a function of distance across the Himalayas (a), the Ural mountains (b), and the Caucasus (c) as reported on the colour-coded map (d). e, Effect of creating gaps in the samples in Europe. f, g, We tested the effect of removing samples from stripes, either north to south (f) or west to east (g), to create gaps comparable in size to the gaps in samples in the dataset. h, Effective migration surfaces inferred by EEMS.
a, Bar plot comparing frequency distributions of functional variants in Africans and non-Africans. The distribution of exonic SNPs according to their functional impact (synonymous, missense and nonsense) as a function of allele frequency. Note that the data from both groups were normalized for a sample size of n = 21 and that the Africans show significantly (χ² P < 1 × 10⁻¹⁵) more rare variants across all site classes. b, Result of 1,000 bootstrap replicates of the RX/Y test for a subset of pigmentation genes highlighted by genome-wide association studies (GWAS, n = 32). The horizontal line provides the African reference (x = 1) against which all other groups are compared. The blue and red marks show the 95th and the 5th percentile of the bootstrap distributions, respectively. If the 95th percentile is below 1, then the population shows a significant excess of missense variants in the pigmentation subset relative to the Africans. Note that this is the case for all non-Africans except the Oceanians. c, Pools of individuals for selection scans. The fineSTRUCTURE-based co-ancestry matrix was used to define twelve groups of populations for the downstream selection scans. These groups are highlighted in the plot by boxes with broken line edges. The number of individuals in each group is reported in Supplementary Table 1:3.2-I.
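The bootstrap percentile logic of panel b can be sketched as follows. The caption does not define the RX/Y statistic, so the ratio used here (alleles seen in X but not Y over alleles seen in Y but not X, summed across sites) is an assumption based on one common formulation, and the function names, the resampling unit (sites) and the toy frequencies are hypothetical.

```python
import random


def r_xy(sites):
    """Assumed R_X/Y: sum of fx*(1-fy) divided by sum of fy*(1-fx).

    sites: list of (fx, fy) allele-frequency pairs for populations X and Y.
    """
    x_not_y = sum(fx * (1 - fy) for fx, fy in sites)
    y_not_x = sum(fy * (1 - fx) for fx, fy in sites)
    return x_not_y / y_not_x


def bootstrap_percentiles(sites, n_boot=1000, seed=0):
    """Resample sites with replacement; return (5th, 95th) percentiles of R_X/Y."""
    rng = random.Random(seed)
    stats = sorted(
        r_xy([rng.choice(sites) for _ in sites]) for _ in range(n_boot)
    )
    return stats[int(0.05 * (n_boot - 1))], stats[int(0.95 * (n_boot - 1))]


# Made-up frequencies for 32 sites, for illustration only.
toy_rng = random.Random(1)
toy_sites = [(toy_rng.uniform(0.05, 0.4), toy_rng.uniform(0.3, 0.9)) for _ in range(32)]

p5, p95 = bootstrap_percentiles(toy_sites)
print(f"5th percentile: {p5:.3f}, 95th percentile: {p95:.3f}")
print("95th percentile below 1:", p95 < 1)
```

Under this sketch, a group is flagged only when the whole upper tail of its bootstrap distribution lies on one side of the reference value 1, mirroring the decision rule stated in the caption.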
Extended Data Figure 7 Length of haplotypes assigned as African by fineSTRUCTURE as a function of genome proportion.
a, 447 Diversity Panel results, showing label averages (large crosses) along with individuals (small dots). b, Relative excluded Diversity Panel results, to check whether including related individuals affects African genome fraction. Individuals that shared more than 2% of genome fraction were forbidden from receiving haplotypes from each other, and the painting was re-run on a large subset of the genome (all run of homozygosity (ROH) regions from any individual). c, ROH-only African haplotypes. To guard against phasing errors, we analysed only regions for which an individual was in a long (>500 kb) run of homozygosity using the PLINK command ‘--homozyg-window-kb 500000 --homozyg-window-het 0 --homozyg-density 10’. Because there are so few such regions, we report only the population average for populations with two or more individuals, as well as the standard error in that estimate. Populations for which the 95% confidence interval passed 0 were also excluded. Note the logarithmic axis. d, Ancient DNA panel results. We used a different panel of 109 individuals which included three ancient genomes. We painted chromosomes 11, 21 and 22 and report as crosses the population averages for populations with two or more individuals. The solid thin lines represent the position of each population when modern samples only are analysed. The dashed lines lead off the figure to the position of the ancient hominins and the African samples.
a–c, The examined Central Asian (a), East African (b), and African–American (c) genomes yielded a signature of MSMC split time (truth, left-most column) that could be recapitulated (reconstruction, second left-most column) as a linear mixture of other MSMC split times. The admixture proportions inferred by our method (top of each admixture component column) were remarkably similar to the ones previously reported from the literature. d, MSMC split times calculated after re-phasing an Estonian and a Papuan (Koinanbe) genome together with all the available West African and Pygmy genomes from our dataset to minimize putative phasing artefacts. The cross coalescence rate curves reported here are quantitatively comparable with the ones of Fig. 2a, hence showing that phasing artefacts are unlikely to explain the observed past-ward shift of the Papuan–African split time. e, Box plot showing the distribution of differences between African–Papuan and African–Eurasian split times obtained from coalescent simulations assembled through random replacement to make 2,000 sets of 6 individuals (to match the 6 Papuans available from our empirical dataset), each made of 1.5 Gb of sequence. The simulation command line used to generate each chromosome made of 5 Mb was as follows, where x is the variable for the divergence time used; x = 0.064, 0.4 or 0.8 for the xOoA, Denisova (Den) and Divergent Denisova (DeepDen) cases, respectively. ms0ancient2 10 1 .065 .05 -t 5000. -r 3000. 5000000 -I 7 1 1 1 1 2 2 2 -en 0. 1 .2 -en 0. 2 .2 -en 0. 3 .2 -en 0. 4 .2 -es .025 7 .96 -en .025 8 .2 -ej .03 7 6 -ej .04 6 5 -ej .060 8 3 -ej .061 4 3 -ej .062 2 1 -ej .063 3 1 -ej x 1 5.
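The idea in panels a–c, that an admixed genome's split-time profile can be reconstructed as a linear mixture of other profiles, can be illustrated with a minimal two-source least-squares sketch. This is not the authors' method, whose details are not given in the caption: the restriction to two sources, the clipping of the proportion to [0, 1], the function names and all numbers are assumptions made for illustration.

```python
def fit_two_source_mixture(target, source_a, source_b):
    """Least-squares alpha for target ~ alpha*source_a + (1 - alpha)*source_b.

    Each argument is a list of split times against the same reference panel.
    The estimate is clipped to [0, 1] so it reads as an admixture proportion.
    """
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical; alpha is not identifiable")
    return min(1.0, max(0.0, num / den))


def reconstruct(alpha, source_a, source_b):
    """Rebuild the mixed profile from the fitted proportion."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]


# Made-up split times (arbitrary units) against three reference genomes.
source_a = [10.0, 20.0, 30.0]
source_b = [50.0, 60.0, 90.0]
target = [40.0, 50.0, 75.0]  # constructed as 0.25*source_a + 0.75*source_b

alpha = fit_two_source_mixture(target, source_a, source_b)
print("estimated proportion from source_a:", alpha)
print("reconstruction:", reconstruct(alpha, source_a, source_b))
```

Comparing the reconstruction with the observed profile, as the "truth" and "reconstruction" columns do in the figure, shows how well a simple mixture explains the admixed sample.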
a, Joint distribution of haplotype lengths and derived allele count, showing the median position of each cluster and all haplotypes assigned to it in the maximum a posteriori (MAP) estimate. Note that although a different proportion of points is assigned to each in the MAP, the total posterior is very close to 1/K for all. The dashed lines show a constant mutation rate. Haplotypes are ordered by mutation rate from low to high. b, Residual distribution comparison between the two-component mixture using EUR.AFR and EUR.PNG (left), and the three-component mixture including xOoA (using the same colour scale) (right). The root mean square error (RMSE) residuals without xOoA are larger (RMSE = 0.0055 compared to RMSE = 0.0018) but more importantly, they are also structured. c, Assuming a mutational clock and a correct assignment of haplotypes, we can estimate the relative age of the splits from the number of derived alleles observed on the haplotypes. This leads to an estimate of 1.5 times older for xOoA compared to the Eurasian–Africa split.
A schematic illustrating, as suggested by the results presented here, a model of an early, extinct Out-of-Africa (xOoA) signature in the genomes of Sahul populations at their arrival in the region. Given the overall small genomic contribution of this event to the genomes of modern Sahul individuals, we could not determine whether the documented Denisova admixture (question marks) and putative multiple Neanderthal admixtures took place along this extinct OoA. We also speculate (question mark) that people who migrated along the xOoA route may have left a trace in the genomes of the Altai Neanderthal, as reported by Kuhlwilm and colleagues12.
About this article
Cite this article
Pagani, L., Lawson, D., Jagoda, E. et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538, 238–242 (2016). https://doi.org/10.1038/nature19792