Genomic analyses inform on migration events during the peopling of Eurasia

Journal name: Nature
Volume: 538
Pages: 238–242
DOI: 10.1038/nature19792

High-coverage whole-genome sequence studies have so far focused on a limited number1 of geographically restricted populations2, 3, 4, 5, or been targeted at specific diseases, such as cancer6. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history7, 8, 9 and refuelled the debate on the mutation rate in humans10. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record11, and admixture between AMHs and Neanderthals predating the main Eurasian expansion12, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.

At a glance

Figures

  1. Figure 1: Genetic barriers across space.

    Spatial visualization of genetic barriers inferred from genome-wide genetic distances, quantified as the magnitude of the gradient of spatially interpolated allele frequencies (value denoted by colour bar; grey areas were land during the last glacial maximum but are currently underwater). Here we used a spatial kernel smoothing method based on the matrix of pairwise average heterozygosity and a MATLAB script that plots the hexagons of the grid with a colour coding to represent gradients. Inset, partial correlation between magnitude of genetic gradients and combinations of different geographic factors, elevation (E), temperature (T) and precipitation (R), for genetic gradients from fineSTRUCTURE (red) and allele frequencies (blue). This analysis (see Supplementary Information 2.2.2 for details) shows that genetic differences within this region display some correlation with physical barriers such as mountain ranges, deserts, forests, and open water (such as the Wallace line).

  2. Figure 2: Evidence of an xOoA signature in the genomes of modern Papuans.

    a, MSMC split times plot. The Yoruba–Eurasia split curve shows the mean of all Eurasian genomes against one Yoruba genome. The grey area represents top and bottom 5% of runs. We chose a Koinanbe genome as representative of the Sahul populations. b–d, Decomposition of Papuan haplotypes inferred as African by fineSTRUCTURE. b, Semi-parametric decomposition of the joint distribution of haplotype lengths and non-African derived allele rate per SNP, showing the relative proportion of haplotypes in K = 20 components of the distribution, ordered by non-African derived allele rate, relative to the overall proportion of haplotypes in each component. The four datasets produced by considering haplotypes inferred as (African/Denisova) in (Europeans/Papuans) are shown with our inferred ‘extra Out-of-Africa’ (xOoA) component. AFR, African; DEN, Denisova; PNG, Papuans; EUR, Europeans. c, The properties of the components in terms of non-African derived allele rate, on which the components are ordered, and length. d, The reconstruction of haplotypes inferred as African in the genomes of Papuan individuals, using a mixture of all other data (red) and with the addition of the xOoA signature (black).

  3. Extended Data Fig. 1: Sample Diversity and Archaic signals.

    a, Map of location of samples highlighting the diversity/selection sets. b, Sample-level heterozygosity is plotted against distance from Addis Ababa. The trend line represents only non-African samples. The inset shows the waypoints used to arrive at the distance in kilometres for each sample. c, ADMIXTURE plot (K = 8 and 14) which relates general visual inspection of genetic structure to studied populations and their region of origin. d, Box plots were used to visualize the Denisova (red), Altai (green) and Croatian Neanderthal (blue) D distribution for each regional group of samples. Oceanian Altai D values show a remarkable similarity with the Denisova D values for the same region, in contrast with the other groups of samples where the Altai box plots tend to be more similar to the Croatian Neanderthal ones. Boxes show median, first and third quartiles, with 1.5× interquartile range whiskers and black dots as outliers.

  4. Extended Data Fig. 2: Data quality checks and heterozygosity patterns.

    a, b, Concordance of DNA sequencing (Complete Genomics Inc.) and DNA genotyping (Illumina genotyping arrays) data (ref-ref; het-ref-alt and hom-alt-alt, see Supplementary Information 1.6) from chip (a) and sequence data (b). c, Coverage (depth) distribution of variable positions, divided by DNA source (blood or saliva) and complete genomic calling pipeline (release version). d, Genome-wide distribution of transition/transversion ratio subdivided by DNA source (saliva or blood) and by complete genomic calling pipeline. e, Genome-wide distribution of transition/transversion ratio subdivided by chromosomes. f, Inter-chromosome differences in observed heterozygosity in 447 samples from the diversity set. g, Inter-chromosome differences in observed heterozygosity in a set of 50 unpublished genomes from the Estonian Genome Center, sequenced on an Illumina platform at an average coverage exceeding 30×. h, Inter-chromosome differences in observed heterozygosity in the phase 3 of the 1000 Genomes Project. The total number of observed heterozygous sites was divided by the number of accessible base pairs reported by the 1000 Genomes Project.

  5. Extended Data Fig. 3: FineSTRUCTURE shared ancestry analysis.

    ChromoPainter and FineSTRUCTURE results, showing both inferred populations and the underlying (averaged) number of haplotypes that an individual in a population receives (rows) from donor individuals in other populations (columns). 108 populations are inferred by FineSTRUCTURE. The dendrogram shows the inferred relationship between populations. The numbers on the dendrogram give the proportion of MCMC iterations for which each population split is observed (where this is less than 1). Each ‘geographical region’ has a unique colour from which individuals are labelled. The number of individuals in each population is given in the label; for example, ‘4Italians; 3Albanians’ is a population of size 7 containing 4 individuals from Italy and 3 from Albania.

  6. Extended Data Fig. 4: MSMC genetic split times and outgroup f3 results.

    a, The MSMC split times estimated between each sample and a reference panel of nine genomes were linearly interpolated to infer the broader square matrix. b, c, Summary of outgroup $f_3$ statistics for each pair of non-African populations or an ancient sample using Yoruba as an outgroup. Populations are grouped by geographic region and are ordered with increasing distance from Africa (left to right for columns and bottom to top for rows). Colour bars at the left and top of the heat map indicate the colour coding used for the geographical region. Individual population labels are indicated at the right and bottom of the heat map. The $f_3$ statistics are scaled to lie between 0 and 1, with a black colour indicating those close to 0 and a red colour indicating those close to 1. Let $m$ and $M$ be the minimum and maximum $f_3$ values within a given row (that is, focal population). That is, for focal population $X$ (on rows), $m = \min_{Y,\, Y \neq X} f_3(X, Y; \mathrm{Yoruba})$ and $M = \max_{Y,\, Y \neq X} f_3(X, Y; \mathrm{Yoruba})$. The scaled $f_3$ statistic for a given cell in that row is given by $f_3^{\mathrm{scaled}} = (f_3 - m)/(M - m)$, so that the smallest $f_3$ in the row has value $f_3^{\mathrm{scaled}} = 0$ (black) and the largest has value $f_3^{\mathrm{scaled}} = 1$ (red). By default, the diagonal has value $f_3^{\mathrm{scaled}} = 1$ (red). The heat map is therefore asymmetric, with the population closest to the focal population at a given row having value $f_3^{\mathrm{scaled}} = 1$ (red colour) and the population farthest from the focal population at a given row having value $f_3^{\mathrm{scaled}} = 0$ (black colour). Therefore, at a given row, scanning the columns of the heat map reveals the populations with the most shared ancestry with the focal population of that row in the heat map.

  7. Extended Data Fig. 5: Geographical patterns of genetic diversity.

    Isolation by distance pattern across areas of high genetic gradient, using Europe as a baseline. The samples used in each analysis are indicated by coloured lines on the maps to the right of each plot. a–d, The panels show FST as a function of distance across the Himalayas (a), the Ural mountains (b), and the Caucasus (c) as reported on the colour-coded map (d). e, Effect of creating gaps in the samples in Europe. f, g, We tested the effect of removing samples from stripes, either north to south (f) or west to east (g), to create gaps comparable in size to the gaps in samples in the dataset. h, Effective migration surfaces inferred by EEMS.

  8. Extended Data Fig. 6: Summary of positive selection results.

    a, Bar plot comparing frequency distributions of functional variants in Africans and non-Africans. The distribution of exonic SNPs according to their functional impact (synonymous, missense and nonsense) as a function of allele frequency. Note that the data from both groups were normalized for a sample size of n = 21 and that the Africans show significantly ($\chi^2$ $P < 1 \times 10^{-15}$) more rare variants across all site classes. b, Result of 1,000 bootstrap replicates of the $R_{X/Y}$ test for a subset of pigmentation genes highlighted by Genome Wide Association Studies (GWAS, n = 32). The horizontal line provides the African reference (x = 1) against which all other groups are compared. The blue and red marks show the 95th and the 5th percentile of the bootstrap distributions respectively. If the 95th percentile is below 1, then the population shows a significant excess of missense variants in the pigmentation subset relative to the Africans. Note that this is the case for all non-Africans except the Oceanians. c, Pools of individuals for selection scans. A fineSTRUCTURE-based co-ancestry matrix was used to define twelve groups of populations for the downstream selection scans. These groups are highlighted in the plot by boxes with broken line edges. The number of individuals in each group is reported in Supplementary Table 1:3.2-I.

  9. Extended Data Fig. 7: Length of haplotypes assigned as African by fineSTRUCTURE as a function of genome proportion.

    a, 447 Diversity Panel results, showing label averages (large crosses) along with individuals (small dots). b, Relative-excluded Diversity Panel results, to check whether including related individuals affects the African genome fraction. Individuals that shared more than 2% of genome fraction were forbidden from receiving haplotypes from each other, and the painting was re-run on a large subset of the genome (all run-of-homozygosity (ROH) regions from any individual). c, ROH-only African haplotypes. To guard against phasing errors, we analysed only regions for which an individual was in a long (>500 kb) run of homozygosity using the PLINK command ‘--homozyg-window-kb 500000 --homozyg-window-het 0 --homozyg-density 10’. Because there are so few such regions, we report only the population average for populations with two or more individuals, as well as the standard error in that estimate. Populations for which the 95% confidence interval passed 0 were also excluded. Note the logarithmic axis. d, Ancient DNA panel results. We used a different panel of 109 individuals which included three ancient genomes. We painted chromosomes 11, 21 and 22 and report as crosses the population averages for populations with two or more individuals. The solid thin lines represent the position of each population when modern samples only are analysed. The dashed lines lead off the figure to the position of the ancient hominins and the African samples.

  10. Extended Data Fig. 8: MSMC Linear behaviour of MSMC split estimates in presence of admixture.

    a–c, The examined Central Asian (a), East African (b), and African–American (c) genomes yielded a signature of MSMC split time (truth, left-most column) that could be recapitulated (reconstruction, second left-most column) as a linear mixture of other MSMC split times. The admixture proportions inferred by our method (top of each admixture component column) were remarkably similar to the ones previously reported from the literature. d, MSMC split times calculated after re-phasing an Estonian and a Papuan (Koinanbe) genome together with all the available West African and Pygmy genomes from our dataset to minimize putative phasing artefacts. The cross coalescence rate curves reported here are quantitatively comparable with the ones of Fig. 2a, hence showing that phasing artefacts are unlikely to explain the observed past-ward shift of the Papuan–African split time. e, Box plot showing the distribution of differences between African–Papuan and African–Eurasian split times obtained from coalescent simulations assembled through random replacement to make 2,000 sets of 6 individuals (to match the 6 Papuans available from our empirical dataset), each made of 1.5 Gb of sequence. The simulation command line used to generate each chromosome made of 5 Mb was as follows, where x is the variable for the divergence time used. x = 0.064, 0.4 or 0.8 for the xOoA, Denisova (Den) and Divergent Denisova (DeepDen) cases, respectively. ms0ancient2 10 1. 065.05 -t 5000. -r 3000. 5000000 -I 7 1 1 1 1 2 2 2 -en 0. 1 .2 -en 0. 2 .2 -en 0. 3 .2 -en 0. 4 .2 -es .025 7.96 -en .025 8.2 -ej.03 7 6 -ej.04 6 5 -ej.060 8 3 -ej.061 4 3 -ej.062 2 1 -ej.063 3 1 -ej x 1 5.

  11. Extended Data Fig. 9: Modelling the xOoA components with FineSTRUCTURE.

    a, Joint distribution of haplotype lengths and derived allele count, showing the median position of each cluster and all haplotypes assigned to it in the maximum a posteriori (MAP) estimate. Note that although a different proportion of points is assigned to each in the MAP, the total posterior is very close to 1/K for all. The dashed lines show a constant mutation rate. Haplotypes are ordered by mutation rate from low to high. b, Residual distribution comparison between the two-component mixture using EUR.AFR and EUR.PNG (left), and the three-component mixture including xOoA (using the same colour scale) (right). The root mean square error (RMSE) residuals without xOoA are larger (RMSE = 0.0055 compared to RMSE = 0.0018) but more importantly, they are also structured. c, Assuming a mutational clock and a correct assignment of haplotypes, we can estimate the relative age of the splits from the number of derived alleles observed on the haplotypes. This leads to an estimate of 1.5 times older for xOoA compared to the Eurasian–Africa split.

  12. Extended Data Fig. 10: Proposed xOoA model.

    A schematic illustrating, as suggested by the results presented here, a model of an early, extinct Out-of-Africa (xOoA) signature in the genomes of Sahul populations at their arrival in the region. Given the overall small genomic contribution of this event to the genomes of modern Sahul individuals, we could not determine whether the documented Denisova admixture (question marks) and putative multiple Neanderthal admixtures took place along this extinct OoA. We also speculate (question mark) that people who migrated along the xOoA route may have left a trace in the genomes of the Altai Neanderthal as reported by Kuhlwilm and colleagues12.
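The row-wise scaling of outgroup f3 values described in the caption of Extended Data Fig. 4 can be illustrated with a minimal sketch. The function name `scale_f3_rows` and the toy matrix are hypothetical and are not taken from the authors' code; the sketch only assumes that the f3 values are available as a square matrix whose rows are focal populations.

```python
def scale_f3_rows(f3):
    """Min-max scale each row of a square f3 matrix, ignoring the diagonal.

    For focal population i, m and M are the minimum and maximum f3 over all
    other populations; each off-diagonal cell becomes (f3 - m) / (M - m).
    The diagonal is set to 1 by convention, as in the caption.
    """
    n = len(f3)
    scaled = []
    for i, row in enumerate(f3):
        others = [row[j] for j in range(n) if j != i]
        m, M = min(others), max(others)
        if M == m:
            raise ValueError("row %d has no spread; scaling is undefined" % i)
        scaled.append(
            [1.0 if j == i else (row[j] - m) / (M - m) for j in range(n)]
        )
    return scaled


# Toy symmetric f3 matrix with made-up values (diagonal entries are ignored).
toy_f3 = [
    [0.00, 0.20, 0.10, 0.15],
    [0.20, 0.00, 0.12, 0.18],
    [0.10, 0.12, 0.00, 0.25],
    [0.15, 0.18, 0.25, 0.00],
]
scaled = scale_f3_rows(toy_f3)
```

Because each row is scaled against its own minimum and maximum, the result is asymmetric even when the input is symmetric: in the toy data, the cell for row 0, column 3 differs from the cell for row 3, column 0.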
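The idea in Extended Data Fig. 8a–c, that the split-time profile of an admixed genome can be recapitulated as a linear mixture of other split-time profiles, can be sketched for the simplest case of two sources. This is an illustration under stated assumptions, not the authors' method: the function names, the two-source restriction, the least-squares criterion and all numbers are hypothetical.

```python
def mixture_weight(target, source_a, source_b):
    """Least-squares weight w in [0, 1] such that
    w * source_a + (1 - w) * source_b best matches target.

    Each argument is a list of split times measured against the same
    reference panel, in the same order.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("sources are identical; the weight is not identifiable")
    num = sum((t - b) * d for t, b, d in zip(target, source_b, diff))
    # The objective is a one-dimensional convex quadratic in w, so clamping
    # the unconstrained optimum to [0, 1] gives the constrained optimum.
    return min(1.0, max(0.0, num / denom))


def reconstruct(w, source_a, source_b):
    """Mixture profile implied by weight w."""
    return [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]


# Made-up split times (arbitrary units) against a hypothetical 4-genome panel.
source_a = [10.0, 20.0, 60.0, 65.0]
source_b = [50.0, 55.0, 15.0, 25.0]
# Simulated admixed profile: 70% source A, 30% source B.
target = [0.7 * a + 0.3 * b for a, b in zip(source_a, source_b)]

w = mixture_weight(target, source_a, source_b)
rebuilt = reconstruct(w, source_a, source_b)
```

With more than two candidate sources, the same idea would need a constrained least-squares solver with non-negative weights that sum to one; that extension is not shown here.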

Accession codes

Primary accessions

European Nucleotide Archive: PRJEB12437

References

  1. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
  2. Lachance, J. et al. Evolutionary history and adaptation from high-coverage whole-genome sequences of diverse African hunter-gatherers. Cell 150, 457–469 (2012)
  3. Pagani, L. et al. Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991 (2015)
  4. Clemente, F. J. et al. A selective sweep on a deleterious mutation in CPT1A in Arctic populations. Am. J. Hum. Genet. 95, 584–589 (2014)
  5. Gudbjartsson, D. F. et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435–444 (2015)
  6. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013)
  7. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011)
  8. Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014)
  9. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012)
  10. Scally, A. & Durbin, R. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012)
  11. Grove, M. et al. Climatic variability, plasticity, and dispersal: a case study from Lake Tana, Ethiopia. J. Hum. Evol. 87, 32–47 (2015)
  12. Kuhlwilm, M. et al. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature 530, 429–433 (2016)
  13. Groucutt, H. S. et al. Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. 24, 149–164 (2015)
  14. Liu, W. et al. The earliest unequivocally modern humans in southern China. Nature 526, 696–699 (2015)
  15. Reyes-Centeno, H. et al. Genomic and cranial phenotype data support multiple modern human dispersals from Africa and a southern route into Asia. Proc. Natl Acad. Sci. USA 111, 7248–7253 (2014)
  16. Mellars, P., Gori, K. C., Carr, M., Soares, P. A. & Richards, M. B. Genetic and archaeological perspectives on the initial modern human colonization of southern Asia. Proc. Natl Acad. Sci. USA 110, 10699–10704 (2013)
  17. Prugnolle, F., Manica, A. & Balloux, F. Geography predicts neutral genetic diversity of human populations. Curr. Biol. 15, R159–R160 (2005)
  18. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010)
  19. Reich, D. et al. Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. Am. J. Hum. Genet. 89, 516–528 (2011)
  20. Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014)
  21. Fu, Q. et al. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553–559 (2013)
  22. Fu, Q. et al. The genetic history of Ice Age Europe. Nature 534, 200–205 (2016)
  23. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012)
  24. Petkova, D., Novembre, J. & Stephens, M. Visualizing spatial population structure with estimated effective migration surfaces. Nat. Genet. 48, 94–100 (2016)
  25. Hellenthal, G. et al. A genetic atlas of human admixture history. Science 343, 747–751 (2014)
  26. Chapman, N. H. & Thompson, E. A. A model for the length of tracts of identity by descent in finite random mating populations. Theor. Popul. Biol. 64, 141–150 (2003)
  27. Wall, J. D. et al. Higher levels of Neanderthal ancestry in East Asians than in Europeans. Genetics 194, 199–209 (2013)
  28. Posth, C. et al. Pleistocene mitochondrial genomes suggest a single major dispersal of non-Africans and a late glacial population turnover in Europe. Curr. Biol. 26, 827–833 (2016)
  29. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
  30. Migliano, A. B., Vinicius, L. & Lahr, M. M. Life history trade-offs explain the evolution of human pygmies. Proc. Natl Acad. Sci. USA 104, 20216–20219 (2007)
  31. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005)
  32. Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012)
  33. Behar, D. M. et al. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012)
  34. Soares, P. et al. The archaeogenetics of Europe. Curr. Biol. 20, R174–R183 (2010)

Author information

  1. These authors contributed equally to this work.

    • Luca Pagani,
    • Daniel John Lawson,
    • Evelyn Jagoda,
    • Alexander Mörseburg,
    • Anders Eriksson,
    • Richard Villems,
    • Eske Willerslev,
    • Toomas Kivisild &
    • Mait Metspalu

Affiliations

  1. Estonian Biocentre, 51010 Tartu, Estonia

    • Luca Pagani,
    • Georgi Hudjashov,
    • Lauri Saag,
    • Mari Järve,
    • Monika Karmin,
    • Alena Kushniarevich,
    • Bayazit Yunusbayev,
    • Kristiina Tambets,
    • Chandana Basu Mallick,
    • Hovhannes Sahakyan,
    • Gyaneshwer Chaubey,
    • Sergei Litvinov,
    • Doron M. Behar,
    • Ene Metspalu,
    • Richard Villems,
    • Toomas Kivisild &
    • Mait Metspalu
  2. Department of Archaeology and Anthropology, University of Cambridge, Cambridge CB2 1QH, UK

    • Luca Pagani,
    • Evelyn Jagoda,
    • Alexander Mörseburg,
    • Florian Clemente,
    • Alexia Cardona,
    • Sarah Kaewert,
    • Charlotte Inchley,
    • Christiana L. Scheib,
    • Florin Mircea Iliescu,
    • Christina A. Eichstaedt &
    • Toomas Kivisild
  3. Department of Biological, Geological and Environmental Sciences, University of Bologna, Via Selmi 3, 40126 Bologna, Italy

    • Luca Pagani
  4. Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol BS8 2BN, UK

    • Daniel John Lawson
  5. Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA

    • Evelyn Jagoda
  6. Integrative Systems Biology Lab, Division of Biological and Environmental Sciences & Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia

    • Anders Eriksson
  7. Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK

    • Anders Eriksson &
    • Andrea Manica
  8. Estonian Genome Center, University of Tartu, 51010 Tartu, Estonia

    • Mario Mitt,
    • Reedik Mägi,
    • Evelin Mihailov &
    • Andres Metspalu
  9. Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia

    • Mario Mitt &
    • Andres Metspalu
  10. Institut de Biologie Computationnelle, Université Montpellier 2, 34095 Montpellier, France

    • Florian Clemente
  11. Department of Psychology, University of Auckland, Auckland 1142, New Zealand

    • Georgi Hudjashov &
    • Monika Karmin
  12. Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, 4442 Palmerston North, New Zealand

    • Georgi Hudjashov &
    • Murray P. Cox
  13. Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA

    • Michael DeGiorgio
  14. Institute for Human Genetics, University of California, San Francisco, California 94143, USA

    • Jeffrey D. Wall
  15. MRC Epidemiology Unit, University of Cambridge, Institute of Metabolic Science, Box 285, Addenbrooke’s Hospital, Hills Road, Cambridge CB2 0QQ, UK

    • Alexia Cardona
  16. School of Life Sciences, Arizona State University, Tempe, Arizona 85287, USA

    • Melissa A. Wilson Sayres
  17. Center for Evolution and Medicine, The Biodesign Institute, Tempe, Arizona 85287, USA

    • Melissa A. Wilson Sayres
  18. Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia

    • Monika Karmin,
    • Lehti Saag,
    • Ene Metspalu &
    • Richard Villems
  19. Mathematical Sciences, University of Southampton, Southampton SO17 1BJ, UK

    • Guy S. Jacobs
  20. Institute for Complex Systems Simulation, University of Southampton, Southampton SO17 1BJ, UK

    • Guy S. Jacobs
  21. Division of Biological Sciences, University of Montana, Missoula, Montana 59812, USA

    • Tiago Antao
  22. Institute of Genetics and Cytology, National Academy of Sciences, BY-220072 Minsk, Belarus

    • Alena Kushniarevich
  23. The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK

    • Qasim Ayub,
    • Chris Tyler-Smith &
    • Yali Xue
  24. Institute of Biochemistry and Genetics, Ufa Scientific Center of RAS, 450054 Ufa, Russia

    • Bayazit Yunusbayev,
    • Alexandra Karunas,
    • Sergei Litvinov,
    • Rita Khusainova,
    • Vita Akhmetova,
    • Irina Khidiyatova &
    • Elza K. Khusnutdinova
  25. Kuban State Medical University, 350040 Krasnodar, Russia

    • Elvira Pocheshkhova
  26. Scientific Research Center of the Caucasian Ethnic Groups, St. Andrews Georgian University, 0162 Tbilisi, Georgia

    • George Andriadze
  27. Center for GeoGenetics, University of Copenhagen, 1350 Copenhagen, Denmark

    • Craig Muller,
    • Rasmus Nielsen &
    • Eske Willerslev
  28. Research Centre for Human Evolution, Environmental Futures Research Institute, Griffith University, Nathan, Queensland 4111, Australia

    • Michael C. Westaway &
    • David M. Lambert
  29. Center of Molecular Diagnosis and Genetic Research, University Hospital of Obstetrics and Gynecology, 1000 Tirana, Albania

    • Grigor Zoraqi
  30. Center of High Technology, Academy of Sciences, 100047 Tashkent, Uzbekistan

    • Shahlo Turdikulova
  31. Institute of Bioorganic Chemistry Academy of Science, 100047 Tashkent, Uzbekistan

    • Dilbar Dalimova
  32. L.N. Gumilyov Eurasian National University, 010008 Astana, Kazakhstan

    • Zhaxylyk Sabitov
  33. Centre for Advanced Research in Sciences (CARS), DNA Sequencing Research Laboratory, University of Dhaka, Dhaka-1000, Bangladesh

    • Gazi Nurun Nahar Sultana
  34. Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6145, USA

    • Joseph Lachance
  35. School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA

    • Joseph Lachance
  36. Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6313, USA

    • Sarah Tishkoff
  37. DNcode laboratories, 117623 Moscow, Russia

    • Kuvat Momynaliev
  38. Institute of Molecular Biology and Medicine, 720040 Bishkek, Kyrgyzstan

    • Jainagul Isakova
  39. Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia

    • Larisa D. Damba,
    • Marina Gubina,
    • Daria V. Lichman,
    • Mikhail Voevoda &
    • Ludmila P. Osipova
  40. Mongolian Academy of Medical Sciences, 210620 Ulaanbaatar, Mongolia

    • Pagbajabyn Nymadawa
  41. Northern State Medical University, 163000 Arkhangelsk, Russia

    • Irina Evseeva
  42. Anthony Nolan, The Royal Free Hospital, Pond Street, London NW3 2QG, UK

    • Irina Evseeva
  43. V. N. Karazin Kharkiv National University, 61022 Kharkiv, Ukraine

    • Lubov Atramentova &
    • Olga Utevska
  44. Evolutionary Medicine group, Laboratoire d’Anthropologie Moléculaire et Imagerie de Synthèse, UMR 5288, Centre National de la Recherche Scientifique, Université de Toulouse 3, Toulouse 31073, France

    • François-Xavier Ricaut,
    • Nicolas Brucato &
    • Thierry Letellier
  45. Genome Diversity and Diseases Laboratory, Eijkman Institute for Molecular Biology, 10430 Jakarta, Indonesia

    • Herawati Sudoyo
  46. Department of Molecular Genetics, Yakut Scientific Centre of Complex Medical Problems, 677027 Yakutsk, Russia

    • Nikolay A. Barashkov &
    • Sardana A. Fedorova
  47. Laboratory of Molecular Biology, Institute of Natural Sciences, M.K. Ammosov North-Eastern Federal University, 677027 Yakutsk, Russia

    • Nikolay A. Barashkov &
    • Sardana A. Fedorova
  48. Genos DNA laboratory, 10000 Zagreb, Croatia

    • Vedrana Škaro
  49. University of Osijek, Medical School, 31000 Osijek, Croatia

    • Vedrana Škaro &
    • Dragan Primorac
  50. Center for Genomics and Transcriptomics, CeGaT, GmbH, D-72076 Tübingen, Germany

    • Lejla Mulahasanović
  51. St. Catherine Specialty Hospital, 49210 Zabok and 10000 Zagreb, Croatia

    • Dragan Primorac
  52. Eberly College of Science, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

    • Dragan Primorac
  53. University of Split, Medical School, 21000 Split, Croatia

    • Dragan Primorac
  54. Laboratory of Ethnogenomics, Institute of Molecular Biology, National Academy of Sciences, Republic of Armenia, 7 Hasratyan Street, 0014 Yerevan, Armenia

    • Hovhannes Sahakyan &
    • Levon Yepiskoposyan
  55. Department of Applied Social Sciences, University of Winchester, Sparkford Road, Winchester SO22 4NR, UK

    • Maru Mormina
  56. Thoraxklinik Heidelberg, University Hospital Heidelberg, 69120 Heidelberg, Germany

    • Christina A. Eichstaedt
  57. Novosibirsk State University, 630090 Novosibirsk, Russia

    • Daria V. Lichman,
    • Mikhail Voevoda &
    • Ludmila P. Osipova
  58. RIPAS Hospital, Bandar Seri Begawan, BE1518 Brunei

    • Syafiq Abdullah
  59. National Cancer Centre Singapore, 169610 Singapore

    • Joseph T. S. Wee
  60. Department of Genetics and Fundamental Medicine, Bashkir State University, 450000 Ufa, Russia

    • Alexandra Karunas,
    • Sergei Litvinov,
    • Rita Khusainova,
    • Natalya Ekomasova,
    • Irina Khidiyatova &
    • Elza K. Khusnutdinova
  61. Department of Genetics and Bioengineering, Faculty of Engineering and Information Technologies, International Burch University, 71000 Sarajevo, Bosnia and Herzegovina

    • Damir Marjanović
  62. Institute for Anthropological Researches, 10000 Zagreb, Croatia

    • Damir Marjanović
  63. Research Centre for Medical Genetics, Russian Academy of Sciences, Moscow 115478, Russia

    • Elena Balanovska &
    • Oleg Balanovsky
  64. Genetics Laboratory, Institute of Biological Problems of the North, Russian Academy of Sciences, 685000 Magadan, Russia

    • Miroslava Derenko &
    • Boris Malyarchuk
  65. Institute of Internal Medicine, Siberian Branch of Russian Academy of Medical Sciences, 630009 Novosibirsk, Russia

    • Mikhail Voevoda
  66. Leverhulme Centre for Human Evolutionary Studies, Department of Archaeology and Anthropology, University of Cambridge, Cambridge CB2 1QH, UK

    • Marta Mirazón Lahr
  67. Research Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK

    • Pascale Gerbault &
    • Mark G. Thomas
  68. Department of Archaeology, University of Papua New Guinea, University PO Box 320, 134 NCD, Papua New Guinea

    • Matthew Leavesley
  69. College of Arts, Society and Education, James Cook University, PO Box 6811, Cairns, Queensland 4870, Australia

    • Matthew Leavesley
  70. Department of Anthropology, University College London, London WC1H 0BW, UK

    • Andrea Bamberg Migliano
  71. Max Planck Institute for the Science of Human History, Kahlaische Strasse 10, D-07743 Jena, Germany

    • Michael Petraglia
  72. Vavilov Institute for General Genetics, Russian Academy of Sciences, 119333 Moscow, Russia

    • Oleg Balanovsky
  73. Department of Integrative Biology, University of California Berkeley, Berkeley 94720, California, USA

    • Rasmus Nielsen
  74. Estonian Academy of Sciences, 6 Kohtu Street, Tallinn 10130, Estonia

    • Richard Villems

Contributions

R.V., E.W., T.K. and M.Me. conceived the study. A.K., K.T., C.B.M., Le.S., E.P., G.A., C.M., M.W., D.L., G.Z., S.T., D.D., Z.S., G.N.N.S., K.M., J.I., L.D.D., M.G., P.N., I.E., L.At., O.U., F.-X.R., N.B., H.S., T.L., M.P.C., N.A.B., V.S., L.A., D.Pr., H.Sa., M.Mo., C.A.E., D.V.L., S.A., G.C., J.T.S.W., E.Mi., A.Ka., S.L., R.K., N.T., V.A., I.K., D.M., L.Y., D.M.B., E.B., A.Me., M.D., B.M., M.V., S.A.F., L.P.O., M.Mi., M.L., A.B.M., O.B., E.K.K., E.M., M.G.T. and E.W. conducted anthropological research and/or sample collection and management. J.L. and S.Ti. provided access to data. L.P., D.J.L., E.J., A.Mo., A.E., M.Mi., F.C., G.H., M.D., L.S., J.W., A.C., R.M., M.A.W.S., S.K., C.I., C.L.S., M.J., M.K., G.S.J., T.A., F.M.I., A.K., Q.A., C.T.-S., Y.X., B.Y., C.B.M., T.K. and M.Me. analysed data. L.P., D.J.L., E.J., A.Mo., L.S., M.K., K.T., C.B.M., Le.S., G.C., M.Mi., P.G., M.L., A.B.M., M.P., E.M., M.G.T., A.Ma., R.N., R.V., E.W., T.K. and M.Me. contributed to the interpretation of results. L.P., D.J.L., E.J., A.Mo., A.E., F.C., G.H., M.D., A.C., M.A.W.S., B.Y., J.L., S.Ti., M.Mi., P.G., M.L., A.B.M., M.P., M.G.T., A.Ma., R.N., R.V., E.W., T.K. and M.Me. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

The newly sequenced genomes are part of the Estonian Biocentre Human Genome Diversity Panel (EGDP). They were deposited in the ENA archive under accession number PRJEB12437 and are also freely available through the Estonian Biocentre website (www.ebc.ee/free_data).

Reviewer Information Nature thanks R. Dennell and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figures

  1. Extended Data Figure 1: Sample Diversity and Archaic signals. (727 KB)

    a, Map of location of samples highlighting the diversity/selection sets. b, Sample-level heterozygosity is plotted against distance from Addis Ababa. The trend line represents only non-African samples. The inset shows the waypoints used to arrive at the distance in kilometres for each sample. c, ADMIXTURE plot (K = 8 and 14) which relates general visual inspection of genetic structure to studied populations and their region of origin. d, Box plots were used to visualize the Denisova (red), Altai (green) and Croatian Neanderthal (blue) D distribution for each regional group of samples. Oceanian Altai D values show a remarkable similarity with the Denisova D values for the same region, in contrast with the other groups of samples where the Altai box plots tend to be more similar to the Croatian Neanderthal ones. Boxes show median, first and third quartiles, with 1.5× interquartile range whiskers and black dots as outliers.

  2. Extended Data Figure 2: Data quality checks and heterozygosity patterns. (365 KB)

    a, b, Concordance of DNA sequencing (Complete Genomics Inc.) and DNA genotyping (Illumina genotyping arrays) data (ref-ref; het-ref-alt and hom-alt-alt, see Supplementary Information 1.6) from chip (a) and sequence data (b). c, Coverage (depth) distribution of variable positions, divided by DNA source (blood or saliva) and Complete Genomics calling pipeline (release version). d, Genome-wide distribution of transition/transversion ratio subdivided by DNA source (saliva or blood) and by Complete Genomics calling pipeline. e, Genome-wide distribution of transition/transversion ratio subdivided by chromosomes. f, Inter-chromosome differences in observed heterozygosity in 447 samples from the diversity set. g, Inter-chromosome differences in observed heterozygosity in a set of 50 unpublished genomes from the Estonian Genome Center, sequenced on an Illumina platform at an average coverage exceeding 30×. h, Inter-chromosome differences in observed heterozygosity in phase 3 of the 1000 Genomes Project. The total number of observed heterozygous sites was divided by the number of accessible base pairs reported by the 1000 Genomes Project.

  3. Extended Data Figure 3: FineSTRUCTURE shared ancestry analysis. (716 KB)

    ChromoPainter and FineSTRUCTURE results, showing both inferred populations and the underlying (averaged) number of haplotypes that an individual in a population receives (rows) from donor individuals in other populations (columns). 108 populations are inferred by FineSTRUCTURE. The dendrogram shows the inferred relationship between populations. The numbers on the dendrogram give the proportion of MCMC iterations for which each population split is observed (where this is less than 1). Each ‘geographical region’ has a unique colour from which individuals are labelled. The number of individuals in each population is given in the label; for example, ‘4Italians; 3Albanians’ is a population of size 7 containing 4 individuals from Italy and 3 from Albania.

  4. Extended Data Figure 4: MSMC genetic split times and outgroup f3 results. (1,027 KB)

    a, The MSMC split times estimated between each sample and a reference panel of nine genomes were linearly interpolated to infer the broader square matrix. b, c, Summary of outgroup $f_3$ statistics for each pair of non-African populations or an ancient sample using Yoruba as an outgroup. Populations are grouped by geographic region and are ordered with increasing distance from Africa (left to right for columns and bottom to top for rows). Colour bars at the left and top of the heat map indicate the colour coding used for the geographical region. Individual population labels are indicated at the right and bottom of the heat map. The $f_3$ statistics are scaled to lie between 0 and 1, with a black colour indicating those close to 0 and a red colour indicating those close to 1. Let $m$ and $M$ be the minimum and maximum $f_3$ values within a given row (that is, focal population). That is, for focal population $X$ (on rows), $m = \min_{Y,\, Y \neq X} f_3(X, Y; \mathrm{Yoruba})$ and $M = \max_{Y,\, Y \neq X} f_3(X, Y; \mathrm{Yoruba})$. The scaled $f_3$ statistic for a given cell in that row is given by $f_3^{\mathrm{scaled}} = (f_3 - m)/(M - m)$, so that the smallest $f_3$ in the row has value $f_3^{\mathrm{scaled}} = 0$ (black) and the largest has value $f_3^{\mathrm{scaled}} = 1$ (red). By default, the diagonal has value $f_3^{\mathrm{scaled}} = 1$ (red). The heat map is therefore asymmetric, with the population closest to the focal population at a given row having value $f_3^{\mathrm{scaled}} = 1$ (red colour) and the population farthest from the focal population at a given row having value $f_3^{\mathrm{scaled}} = 0$ (black colour). Therefore, at a given row, scanning the columns of the heat map reveals the populations with the most shared ancestry with the focal population of that row in the heat map.

  5. Extended Data Figure 5: Geographical patterns of genetic diversity. (730 KB)

    Isolation by distance pattern across areas of high genetic gradient, using Europe as a baseline. The samples used in each analysis are indicated by coloured lines on the maps to the right of each plot. a–d, The panels show FST as a function of distance across the Himalayas (a), the Ural mountains (b), and the Caucasus (c) as reported on the colour-coded map (d). e, Effect of creating gaps in the samples in Europe. f, g, We tested the effect of removing samples from stripes, either north to south (f) or west to east (g), to create gaps comparable in size to the gaps in samples in the dataset. h, Effective migration surfaces inferred by EEMS.

  6. Extended Data Figure 6: Summary of positive selection results. (379 KB)

    a, Bar plot comparing frequency distributions of functional variants in Africans and non-Africans. The distribution of exonic SNPs according to their functional impact (synonymous, missense and nonsense) as a function of allele frequency. Note that the data from both groups were normalized for a sample size of n = 21 and that the Africans show significantly ($\chi^2$ $P < 1 \times 10^{-15}$) more rare variants across all site classes. b, Result of 1,000 bootstrap replicates of the $R_{X/Y}$ test for a subset of pigmentation genes highlighted by Genome Wide Association Studies (GWAS, n = 32). The horizontal line provides the African reference (x = 1) against which all other groups are compared. The blue and red marks show the 95th and the 5th percentile of the bootstrap distributions respectively. If the 95th percentile is below 1, then the population shows a significant excess of missense variants in the pigmentation subset relative to the Africans. Note that this is the case for all non-Africans except the Oceanians. c, Pools of individuals for selection scans. A fineSTRUCTURE-based co-ancestry matrix was used to define twelve groups of populations for the downstream selection scans. These groups are highlighted in the plot by boxes with broken line edges. The number of individuals in each group is reported in Supplementary Table 1:3.2-I.

  7. Extended Data Figure 7: Length of haplotypes assigned as African by fineSTRUCTURE as a function of genome proportion. (565 KB)

    a, 447 Diversity Panel results, showing label averages (large crosses) along with individuals (small dots). b, Relative-excluded Diversity Panel results, to check whether including related individuals affects African genome fraction. Individuals that shared more than 2% of genome fraction were forbidden from receiving haplotypes from each other, and the painting was re-run on a large subset of the genome (all runs of homozygosity (ROH) regions from any individual). c, ROH-only African haplotypes. To guard against phasing errors, we analysed only regions for which an individual was in a long (>500 kb) run of homozygosity using the PLINK command ‘--homozyg-window-kb 500000 --homozyg-window-het 0 --homozyg-density 10’. Because there are so few such regions, we report only the population average for populations with two or more individuals, as well as the standard error in that estimate. Populations for which the 95% confidence interval passed 0 were also excluded. Note the logarithmic axis. d, Ancient DNA panel results. We used a different panel of 109 individuals which included three ancient genomes. We painted chromosomes 11, 21 and 22 and report as crosses the population averages for populations with two or more individuals. The solid thin lines represent the position of each population when modern samples only are analysed. The dashed lines lead off the figure to the position of the ancient hominins and the African samples.

  8. Extended Data Figure 8: Linear behaviour of MSMC split estimates in presence of admixture. (255 KB)

    a–c, The examined Central Asian (a), East African (b), and African–American (c) genomes yielded a signature of MSMC split time (truth, left-most column) that could be recapitulated (reconstruction, second left-most column) as a linear mixture of other MSMC split times. The admixture proportions inferred by our method (top of each admixture component column) were remarkably similar to the ones previously reported from the literature. d, MSMC split times calculated after re-phasing an Estonian and a Papuan (Koinanbe) genome together with all the available West African and Pygmy genomes from our dataset to minimize putative phasing artefacts. The cross coalescence rate curves reported here are quantitatively comparable with the ones of Fig. 2a, hence showing that phasing artefacts are unlikely to explain the observed past-ward shift of the Papuan–African split time. e, Box plot showing the distribution of differences between African–Papuan and African–Eurasian split times obtained from coalescent simulations assembled through random replacement to make 2,000 sets of 6 individuals (to match the 6 Papuans available from our empirical dataset), each made of 1.5 Gb of sequence. The simulation command line used to generate each chromosome made of 5 Mb was as follows, where x is the variable for the divergence time used. x = 0.064, 0.4 or 0.8 for the xOoA, Denisova (Den) and Divergent Denisova (DeepDen) cases, respectively. ms0ancient2 10 1. 065.05 -t 5000. -r 3000. 5000000 -I 7 1 1 1 1 2 2 2 -en 0. 1 .2 -en 0. 2 .2 -en 0. 3 .2 -en 0. 4 .2 -es .025 7 .96 -en .025 8 .2 -ej .03 7 6 -ej .04 6 5 -ej .060 8 3 -ej .061 4 3 -ej .062 2 1 -ej .063 3 1 -ej x 1 5.

  9. Extended Data Figure 9: Modelling the xOoA components with FineSTRUCTURE. (690 KB)

    a, Joint distribution of haplotype lengths and derived allele count, showing the median position of each cluster and all haplotypes assigned to it in the maximum a posteriori (MAP) estimate. Note that although a different proportion of points is assigned to each in the MAP, the total posterior is very close to 1/K for all. The dashed lines show a constant mutation rate. Haplotypes are ordered by mutation rate from low to high. b, Residual distribution comparison between the two-component mixture using EUR.AFR and EUR.PNG (left), and the three-component mixture including xOoA (using the same colour scale) (right). The root mean square error (RMSE) residuals without xOoA are larger (RMSE = 0.0055 compared to RMSE = 0.0018) but more importantly, they are also structured. c, Assuming a mutational clock and a correct assignment of haplotypes, we can estimate the relative age of the splits from the number of derived alleles observed on the haplotypes. This leads to an estimate that the xOoA split is 1.5 times older than the Eurasian–African split.

  10. Extended Data Figure 10: Proposed xOoA model. (218 KB)

    A schematic illustrating, as suggested by the results presented here, a model of an early, extinct Out-of-Africa (xOoA) signature in the genomes of Sahul populations at their arrival in the region. Given the overall small genomic contribution of this event to the genomes of modern Sahul individuals, we could not determine whether the documented Denisova admixture (question marks) and putative multiple Neanderthal admixtures took place along this extinct OoA. We also speculate (question mark) that people who migrated along the xOoA route may have left a trace in the genomes of the Altai Neanderthal as reported by Kuhlwilm and colleagues12.
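
The row-wise scaling of outgroup f3 statistics described in the caption of Extended Data Figure 4 can be illustrated with a minimal Python sketch. The function name `scale_f3_rows`, the toy matrix values and the handling of a row whose off-diagonal values are all equal are assumptions made for illustration only; they are not taken from the study or its supplementary code.

```python
def scale_f3_rows(f3):
    """Row-wise min-max scaling of a square matrix of outgroup f3 values.

    f3[i][j] holds f3(X_i, Y_j; outgroup). For each row (focal population),
    the minimum m and maximum M are taken over the off-diagonal cells only,
    each cell is mapped to (f3 - m) / (M - m), and the diagonal is set to 1.
    """
    n = len(f3)
    scaled = []
    for i in range(n):
        off_diagonal = [f3[i][j] for j in range(n) if j != i]
        m, M = min(off_diagonal), max(off_diagonal)
        row = []
        for j in range(n):
            if j == i:
                row.append(1.0)  # diagonal is 1 by default
            elif M == m:
                row.append(0.0)  # assumption: a flat row maps to 0
            else:
                row.append((f3[i][j] - m) / (M - m))
        scaled.append(row)
    return scaled


# Hypothetical f3 values for four populations (symmetric input matrix).
toy = [
    [0.00, 0.20, 0.10, 0.15],
    [0.20, 0.00, 0.12, 0.18],
    [0.10, 0.12, 0.00, 0.30],
    [0.15, 0.18, 0.30, 0.00],
]
scaled = scale_f3_rows(toy)
for row in scaled:
    print([round(value, 3) for value in row])
```

Because m and M are computed separately for each row, the scaled matrix is in general asymmetric even when the input is symmetric: in the toy data the cell for populations 1 and 4 scales to about 0.5 in row 1 but to 0 in row 4.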

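The caption of Extended Data Figure 8 describes reconstructing a vector of MSMC split times as a linear mixture of other split-time vectors. The sketch below shows one simple way such a mixture could be fitted for the two-source case, using ordinary least squares with the proportion clipped to the interval from 0 to 1. This is an assumed simplification, not the authors' procedure (which is described in their Supplementary Information); the function name `two_source_mixture` and all numbers are hypothetical.

```python
def two_source_mixture(target, source_a, source_b):
    """Fit target ~ alpha * source_a + (1 - alpha) * source_b.

    Each argument is a list of split times against the same reference
    genomes. Returns the least-squares alpha, clipped to [0, 1], and the
    reconstructed vector.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    resid = [t - b for t, b in zip(target, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("source vectors are identical; alpha is undefined")
    alpha = sum(r * d for r, d in zip(resid, diff)) / denom
    alpha = min(1.0, max(0.0, alpha))
    reconstruction = [alpha * a + (1 - alpha) * b
                      for a, b in zip(source_a, source_b)]
    return alpha, reconstruction


# Hypothetical split times (arbitrary units) against three reference genomes.
source_a = [10.0, 20.0, 30.0]
source_b = [40.0, 50.0, 90.0]
target = [32.5, 42.5, 75.0]  # built as 0.25 * source_a + 0.75 * source_b

alpha, reconstruction = two_source_mixture(target, source_a, source_b)
print(alpha, reconstruction)
```

With more than two candidate sources, the same idea would need a constrained solver so that the proportions stay non-negative and sum to 1.
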
Supplementary information

PDF files

  1. Supplementary Information (45.6 MB)

    This file contains Supplementary Text and Data, Supplementary Figures, Supplementary Tables and additional references (see Contents for more details).

Excel files

  1. Supplementary Tables (7 MB)

    This file contains Supplementary Tables.

Additional data