The evolution and population diversity of human-specific segmental duplications

Abstract

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (N = 80 genes from 33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed ‘core duplicons’ and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (such as TCAF1/TCAF2), we highlight ten gene families (for example, ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.

Figure 1: Identification of HSDs.
Figure 2: Timing of HSDs.
Figure 3: Human CN diversity.
Figure 4: CN polymorphism across diverse populations of TCAF1 and TCAF2 HSDs.
Figure 5: Complex models of HSD evolutionary history.

Acknowledgements

We would like to thank many individuals that contributed to the results described here. We thank B. Coe for assistance in statistical analyses, T. Brown for manuscript editing, and L. Vives, T. Wang and B. Xiong for technical assistance with MIP sequencing. We would also like to thank B. Dumont, C. Campbell, K. Meltz Steinberg, S. Girirajan, C. Payan, C. Alkan and E. Karakoc for helpful discussion and advice. We thank M. Kremitzki for technical support in generating sequence maps and contigs of BACs not currently included in the human reference sequence. For DNA samples used in MIP sequencing, we would like to thank the investigators and families participating in the 1KG Project, Autism Speaks, TASC, and SSC. Additionally, we would like to thank the principal investigators involved in the SSC (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). Approved researchers can obtain the SSC population data set described in this study (https://sfari.org/resources/autism-cohorts/simons-simplex-collection) by applying at https://base.sfari.org. The BAC clones from the complete hydatidiform mole were derived from a cell line created by U. Surti. This work was supported, in part, by US National Institutes of Health (NIH) grants from NINDS (R00NS083627 to M.Y.D.), NIMH (R01MH101221 to E.E.E.) and NHGRI (R01HG002385 to E.E.E., and U41HG007635 to R.K.W. and E.E.E.), as well as The Paul G. Allen Family Foundation (11631 to E.E.E.). S.C. is supported by a National Health and Medical Research Council (NHMRC) CJ Martin Biomedical Fellowship (#1073726). E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Contributions

M.Y.D. and E.E.E. conceived and designed the experiments. M.Y.D., L.H., B.J.N., O.P., S.C., J.H., F.A., L.D., K.M. and C.E. performed the experiments. M.Y.D., L.H., B.J.N., O.P., S.C. and J.H. analysed data. M.Y.D., K.P., A.R., C.B., M.M., N.J. and K.H. provided technical support. H.A.F.S., X.N., T.A.L.G., R.K.W. and E.E.E. provided materials or analysis tools. M.Y.D., L.H., B.J.N., O.P., S.C. and E.E.E. wrote the paper.

Corresponding author

Correspondence to Evan E. Eichler.

Ethics declarations

Competing interests

E.E.E. is on the scientific advisory board (SAB) of DNAnexus, Inc., is a consultant for Kunming University of Science and Technology (KUST) as part of the 1000 China Talent Program (2014–2016), and was an SAB member of Pacific Biosciences, Inc. (2009–2013).

Supplementary information

Supplementary Information

Supplementary Figures 1–26; Supplementary Tables 7, 12, 13, 15, 17 and 23; Supplementary Note; Supplementary Discussion; Supplementary Methods; and Supplementary References. (PDF 4818 kb)

Supplementary Tables

Supplementary Tables 1–6, 8–11, 14, 16 and 18–22. (XLSX 1403 kb)

Supplementary Dataset 1

Duplication sequence data including: (1) multiple-sequence alignment (MSA) of HSDs (fastas; labelled 1 through 19 corresponding to labels in Supplementary Dataset 2); (2) visualization of pairwise alignments from MSAs (Align_slider.pdf); and (3) contigs of HSD regions of CH17 BACs not included in reference genome (CH17_contigs.fasta). (ZIP 7776 kb)

Supplementary Dataset 2

Evolutionary analyses of HSDs (XLSX 2749 kb)

About this article

Cite this article

Dennis, M., Harshman, L., Nelson, B. et al. The evolution and population diversity of human-specific segmental duplications. Nat Ecol Evol 1, 0069 (2017). https://doi.org/10.1038/s41559-016-0069

  • DOI: https://doi.org/10.1038/s41559-016-0069
