The genomic landscape of Neanderthal ancestry in present-day humans

Journal name: Nature
Volume: 507
Pages: 354–357
DOI: 10.1038/nature12961

Genomic studies have shown that Neanderthals interbred with modern humans, and that non-Africans today are the products of this mixture [1, 2]. The antiquity of Neanderthal gene flow into modern humans means that genomic regions that derive from Neanderthals in any one human today are usually less than a hundred kilobases in size. However, Neanderthal haplotypes are also distinctive enough that several studies have been able to detect Neanderthal ancestry at specific loci [1, 3–8]. We systematically infer Neanderthal haplotypes in the genomes of 1,004 present-day humans [9]. Regions that harbour a high frequency of Neanderthal alleles are enriched for genes affecting keratin filaments, suggesting that Neanderthal alleles may have helped modern humans to adapt to non-African environments. We identify multiple Neanderthal-derived alleles that confer risk for disease, suggesting that Neanderthal alleles continue to shape human biology. An unexpected finding is that regions with reduced Neanderthal ancestry are enriched in genes, implying selection to remove genetic material derived from Neanderthals. Genes that are more highly expressed in testes than in any other tissue are especially reduced in Neanderthal ancestry, and there is an approximately fivefold reduction of Neanderthal ancestry on the X chromosome, which is known from studies of diverse species to be especially dense in male hybrid sterility genes [10–12]. These results suggest that part of the explanation for genomic regions of reduced Neanderthal ancestry is Neanderthal alleles that caused decreased fertility in males when moved to a modern human genetic background.
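
The statement that introgressed regions are usually shorter than a hundred kilobases can be checked with a back-of-the-envelope calculation: after t generations of recombination, a segment has an expected genetic length of roughly (100 cM per Morgan)/t. The sketch below uses the figure of about 2,000 generations given in the caption of Extended Data Fig. 1. The conversion rate of about 1 cM per Mb is a round number assumed here for illustration and is not taken from the article; the function names are hypothetical.

```python
def expected_segment_cm(generations: float) -> float:
    """Expected genetic length (in cM) of an introgressed segment:
    (100 cM per Morgan) / (generations since gene flow)."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


def cm_to_kb(length_cm: float, cm_per_mb: float = 1.0) -> float:
    """Convert a genetic length to a physical length in kb.

    cm_per_mb=1.0 is an ASSUMED genome-wide average recombination rate,
    used only for illustration; it is not a value from the article.
    """
    if cm_per_mb <= 0:
        raise ValueError("cm_per_mb must be positive")
    return length_cm / cm_per_mb * 1000.0


length_cm = expected_segment_cm(2000)  # about 2,000 generations ago
length_kb = cm_to_kb(length_cm)        # physical size under the assumed rate
print(f"{length_cm:.2f} cM, about {length_kb:.0f} kb")
```

Under these assumptions the expected size is about 0.05 cM, or roughly 50 kb, which is consistent with segments that are usually under a hundred kilobases. Local recombination rates vary widely, so the physical size at any given locus can differ substantially.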

At a glance

Figures

  1. Figure 1: Maps of Neanderthal ancestry.

    a, Individual maps; the marginal probability of Neanderthal ancestry for one European-American, one east-Asian and one sub-Saharan-African phased genome across chromosome 9. b, Population maps; estimate of the proportion of Neanderthal ancestry in European individuals (red) and east-Asian individuals (green), averaged across all individuals from each population in non-overlapping 100-kb windows on chromosome 9. The black bar denotes the coordinates of the centromere. The plot is limited to segments of the chromosome that pass filters (see Supplementary Information section 8). CEU, residents of Utah, US, with northern and western European ancestry (from the Centre d’Etude du Polymorphisme Humain (CEPH) collection); CHB, Han Chinese in Beijing, China; LWK, African Luhya in Webuye, Kenya.

  2. Figure 2: Functionally important regions are deficient in Neanderthal ancestry.

    The median of the proportion of Neanderthal ancestry (estimated as the average over the marginal probability of Neanderthal ancestry assigned to each individual allele at a SNP) within quintiles of a B statistic that measures proximity to functionally important regions (1-low, 5-high). We show results on the autosomes and the X chromosome, and in European and east-Asian populations.

  3. Extended Data Fig. 1: Three features used in the Conditional Random Field for predicting Neanderthal ancestry.

    Top (feature 1), patterns of variation at a single SNP. Sites at which a panel of sub-Saharan-African individuals carry the ancestral allele and in which the sequenced Neanderthal and the test haplotype carry the derived allele are likely to be derived from Neanderthal gene flow. Middle (feature 2), haplotype divergence patterns. Genomic segments in which the divergence of the test haplotype to the sequenced Neanderthal is low, whereas the divergence to a panel of sub-Saharan-African individuals is high, are likely to be introgressed. Bottom (feature 3), we searched for segments that have a length consistent with what is expected from Neanderthal-to-modern-human gene flow approximately 2,000 generations ago, corresponding to a size of about 0.05 cM = (100 cM per Morgan)/(2,000 generations).

  4. Extended Data Fig. 2: Map of Neanderthal ancestry in 1000 Genomes European and east-Asian populations.

    For each chromosome, we plot the fraction of alleles confidently inferred to be of Neanderthal origin (probability >90%) in non-overlapping 1-Mb windows in Europeans (red) and in east Asians (green). Black bars denote the coordinates of the centromeres. We plot traces in non-overlapping 10-Mb windows that pass filters. We label 10-Mb-scale windows that are deficient in Neanderthal ancestry (e1–e9 (e, European), a1–a17 (a, Asian)) (see Supplementary Information section 8 for details).

  5. Extended Data Fig. 3: Tiling path from confidently inferred Neanderthal haplotypes.

    a, Example tiling path at the BNC2 locus on chromosome 9 in European individuals. Red, confidently inferred Neanderthal haplotypes in a subset of these individuals; blue, resulting tiling path. We identified Neanderthal haplotypes by scanning for runs of consecutive SNPs along a haplotype with a marginal probability >90% and requiring the haplotypes to be at least 0.02 cM long. b, Distribution of contig lengths obtained by constructing a tiling path across confidently inferred Neanderthal haplotypes. On merging Neanderthal haplotypes in each of the 1000 Genomes European and east-Asian populations, we reconstructed 4,437 Neanderthal contigs with median length 129 kb.
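
The haplotype scan and tiling path described in the caption of Extended Data Fig. 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy input values are hypothetical, and the tiling path is approximated here as a simple union of overlapping intervals, which may differ from the procedure actually used. The thresholds (marginal probability >90%, length of at least 0.02 cM) are the ones stated in the caption.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in cM along one chromosome


def confident_haplotypes(
    positions_cm: Sequence[float],
    probs: Sequence[float],
    min_prob: float = 0.90,
    min_len_cm: float = 0.02,
) -> List[Interval]:
    """Find runs of consecutive SNPs on one haplotype whose marginal
    probability exceeds min_prob, keeping runs spanning >= min_len_cm."""
    if len(positions_cm) != len(probs):
        raise ValueError("positions_cm and probs must have the same length")
    runs: List[Interval] = []
    start = None  # index of the first SNP in the current run, if any
    for i, p in enumerate(probs):
        if p > min_prob:
            if start is None:
                start = i
        elif start is not None:
            runs.append((positions_cm[start], positions_cm[i - 1]))
            start = None
    if start is not None:  # close a run that reaches the last SNP
        runs.append((positions_cm[start], positions_cm[-1]))
    return [(s, e) for s, e in runs if e - s >= min_len_cm]


def tiling_path(haplotypes: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals pooled across individuals."""
    merged: List[Interval] = []
    for s, e in sorted(haplotypes):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Toy example with made-up values for two haplotypes at the same SNPs.
positions = [0.00, 0.01, 0.03, 0.04, 0.05, 0.07]
hap_a = confident_haplotypes(positions, [0.95, 0.97, 0.99, 0.50, 0.95, 0.40])
hap_b = confident_haplotypes(positions, [0.20, 0.30, 0.95, 0.96, 0.97, 0.98])
print(tiling_path(hap_a + hap_b))
```

In the toy example, the single high-probability SNP at 0.05 cM on the first haplotype is discarded by the length filter, and the two surviving haplotypes merge into one contig because they share an endpoint.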

Tables

  1. Extended Data Table 1: Gene categories enriched or depleted in Neanderthal ancestry
  2. Extended Data Table 2: Neanderthal-derived alleles that have been associated with phenotypes in genome-wide association studies
  3. Extended Data Table 3: Recall of the CRF as a function of the effective population size
  4. Extended Data Table 4: Unbiased estimate of the proportion of Neanderthal ancestry as a function of the B statistic
  5. Extended Data Table 5: Recall of the CRF on the X chromosome versus the autosomes
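
The kind of summary shown in Figure 2 and Extended Data Table 4, Neanderthal ancestry as a function of the B statistic, can be illustrated with a small sketch. It assumes one B value and one ancestry estimate per site and uses rank-based quintiles; the function name, the binning rule and the toy numbers are assumptions made for illustration and are not taken from the article.

```python
from statistics import median
from typing import List, Sequence


def median_by_quintile(
    b_values: Sequence[float], ancestry: Sequence[float]
) -> List[float]:
    """Median ancestry within rank-based quintiles of B.

    Index 0 of the result is the lowest-B quintile, index 4 the highest.
    """
    if len(b_values) != len(ancestry):
        raise ValueError("b_values and ancestry must have the same length")
    n = len(b_values)
    if n < 5:
        raise ValueError("need at least five sites to form quintiles")
    # Rank sites by B, then assign equal-count bins by rank.
    order = sorted(range(n), key=lambda i: b_values[i])
    bins: List[List[float]] = [[] for _ in range(5)]
    for rank, i in enumerate(order):
        bins[rank * 5 // n].append(ancestry[i])
    return [median(b) for b in bins]


# Toy values, NOT data from the study.
b = [0.95, 0.10, 0.55, 0.30, 0.75, 0.20, 0.85, 0.40, 0.65, 0.05]
anc = [0.020, 0.004, 0.012, 0.008, 0.016, 0.006, 0.018, 0.010, 0.014, 0.002]
for q, m in enumerate(median_by_quintile(b, anc), start=1):
    print(f"quintile {q}: median ancestry {m:.4f}")
```

Using the median within each bin keeps the summary robust to a few windows with unusually high ancestry, which is one plausible reason to prefer it over the mean.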

References

  1. Green, R. E. et al. A draft sequence of the Neanderthal genome. Science 328, 710–722 (2010)
  2. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014)
  3. Abi-Rached, L. et al. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science 334, 89–94 (2011)
  4. Mendez, F. L., Watkins, J. C. & Hammer, M. F. A haplotype at STAT2 introgressed from Neanderthals and serves as a candidate of positive selection in Papua New Guinea. Am. J. Hum. Genet. 91, 265–274 (2012)
  5. Mendez, F. L., Watkins, J. C. & Hammer, M. F. Neanderthal origin of genetic variation at the cluster of OAS immunity genes. Mol. Biol. Evol. 30, 798–801 (2013)
  6. Yotova, V. et al. An X-linked haplotype of Neanderthal origin is present among all non-African populations. Mol. Biol. Evol. 28, 1957–1962 (2011)
  7. Wall, J. D. et al. Higher levels of Neanderthal ancestry in East Asians than in Europeans. Genetics 194, 199–209 (2013)
  8. Lachance, J. et al. Evolutionary history and adaptation from high-coverage whole-genome sequences of diverse African hunter-gatherers. Cell 150, 457–469 (2012)
  9. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
  10. Tucker, P. K., Sage, R. D., Wilson, A. C. & Eicher, E. M. Abrupt cline for sex chromosomes in a hybrid zone between two species of mice. Evolution 46, 1146–1163 (1992)
  11. Good, J. M., Dean, M. D. & Nachman, M. W. A complex genetic basis to X-linked hybrid male sterility between two species of house mice. Genetics 179, 2213–2228 (2008)
  12. Presgraves, D. C. Sex chromosomes and speciation in Drosophila. Trends Genet. 24, 336–343 (2008)
  13. Lafferty, J., McCallum, A. & Pereira, F. C. N. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proc. 18th Int. Conf. Machine Learn. 282–289 (2001)
  14. Sankararaman, S., Patterson, N., Li, H., Pääbo, S. & Reich, D. The date of interbreeding between Neanderthals and modern humans. PLoS Genet. 8, e1002947 (2012)
  15. Hellenthal, G. & Stephens, M. msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots. Bioinformatics 23, 520–521 (2007)
  16. Paten, B., Herrero, J., Beal, K., Fitzgerald, S. & Birney, E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 18, 1814–1828 (2008)
  17. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012)
  18. Durand, E. Y. Neanderthal Ancestry Estimator White paper 23-05 http://23andme.https.internapcdn.net/res/pdf/hXitekfSJe1lcIy7-Q72XA_23-05_Neanderthal_Ancestry.pdf (23andMe, 2011)
  19. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009)
  20. The SIGMA Type 2 Diabetes Consortium. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature http://dx.doi.org/10.1038/nature12828 (25 December 2013)
  21. McVicker, G., Gordon, D., Davis, C. & Green, P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471 (2009)
  22. Reich, D. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010)
  23. Coyne, J. A. & Orr, H. A. in Speciation and Its Consequences (eds Otte, D. & Endler, J. A.) 180–207 (Sinauer Associates, 1989)
  24. Wu, C.-I. & Davis, A. W. Evolution of postmating reproductive isolation: the composite nature of Haldane's rule and its genetic basis. Am. Nat. 142, 187–212 (1993)
  25. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)
  26. Keinan, A., Mullikin, J. C., Patterson, N. & Reich, D. Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nature Genet. 39, 1251–1255 (2007)
  27. Bhatia, G. et al. Genome-wide scan of 29,141 African Americans finds no evidence of selection since admixture. Preprint at http://arxiv.org/pdf/1312.2675.pdf (2013)
  28. Orr, H. A. & Turelli, M. The evolution of postzygotic isolation: accumulating Dobzhansky-Muller incompatibilities. Evolution 55, 1085–1094 (2001)
  29. Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321–324 (2005)
  30. Sutton, C. & McCallum, A. in Introduction to Statistical Relational Learning (eds Getoor, L. & Taskar, B.) Ch. 4, 93–128 (MIT Press, 2007)
  31. Byrd, R. H., Nocedal, J. & Schnabel, R. B. Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming 63, 129–156 (1994)
  32. Gravel, S. Population genetics models of local ancestry. Genetics 191, 607–619 (2012)
  33. Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316–1323 (2009)
  34. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)
  35. Prüfer, K. et al. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8, 41 (2007)
  36. Percival, D. B. & Walden, A. T. Wavelet Methods for Time Series Analysis. (Cambridge Univ. Press, 2005)
  37. Kunsch, H. R. The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 1217–1241 (1989)
  38. Hudson, R. R. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002)
  39. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)


Author information

Affiliations

  1. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Sriram Sankararaman,
    • Swapan Mallick,
    • Nick Patterson &
    • David Reich
  2. Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA

    • Sriram Sankararaman,
    • Swapan Mallick,
    • Nick Patterson &
    • David Reich
  3. Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany

    • Michael Dannemann,
    • Kay Prüfer,
    • Janet Kelso &
    • Svante Pääbo
  4. Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA

    • David Reich

Contributions

S.S., N.P., S.P. and D.R. conceived of the study. S.S., S.M., M.D., K.P., J.K. and D.R. performed analyses. J.K., S.P., N.P. and D.R. supervised the study. S.S. and D.R. wrote the manuscript with help from all co-authors.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

The tiling path of confidently inferred Neanderthal haplotypes, as well as the Neanderthal introgression map, can be found at http://genetics.med.harvard.edu/reichlab/Reich_Lab/Datasets.html.



Supplementary information

PDF files

  1. Supplementary Information (5.5 MB)

    This file contains Supplementary Figures, Supplementary Tables and Supplementary Text and Data - see Contents for more information.
