Abstract

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (N = 80 genes from 33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed ‘core duplicons’ and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (such as TCAF1/TCAF2), we highlight ten gene families (for example, ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.

Acknowledgements

We would like to thank many individuals that contributed to the results described here. We thank B. Coe for assistance in statistical analyses, T. Brown for manuscript editing, and L. Vives, T. Wang and B. Xiong for technical assistance with MIP sequencing. We would also like to thank B. Dumont, C. Campbell, K. Meltz Steinberg, S. Girirajan, C. Payan, C. Alkan and E. Karakoc for helpful discussion and advice. We thank M. Kremitzki for technical support in generating sequence maps and contigs of BACs not currently included in the human reference sequence. For DNA samples used in MIP sequencing, we would like to thank the investigators and families participating in the 1KG Project, Autism Speaks, TASC, and SSC. Additionally, we would like to thank the principal investigators involved in the SSC (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). Approved researchers can obtain the SSC population data set described in this study (https://sfari.org/resources/autism-cohorts/simons-simplex-collection) by applying at https://base.sfari.org. The BAC clones from the complete hydatidiform mole were derived from a cell line created by U. Surti. This work was supported, in part, by US National Institutes of Health (NIH) grants from NINDS (R00NS083627 to M.Y.D.), NIMH (R01MH101221 to E.E.E.) and NHGRI (R01HG002385 to E.E.E., and U41HG007635 to R.K.W. and E.E.E.), as well as The Paul G. Allen Family Foundation (11631 to E.E.E.). S.C. is supported by a National Health and Medical Research Council (NHMRC) CJ Martin Biomedical Fellowship (#1073726). E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Affiliations

  1. Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, California 95616, USA

    • Megan Y. Dennis
  2. Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA

    • Megan Y. Dennis
    • Lana Harshman
    • Bradley J. Nelson
    • Osnat Penn
    • Stuart Cantsilieris
    • John Huddleston
    • Kelsi Penewit
    • Laura Denman
    • Archana Raja
    • Carl Baker
    • Kenneth Mark
    • Maika Malig
    • Nicolette Janke
    • Claudia Espinoza
    • Holly A. F. Stessman
    • Xander Nuttle
    • Kendra Hoekzema
    • Evan E. Eichler
  3. Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA

    • John Huddleston
    • Archana Raja
    • Evan E. Eichler
  4. Dipartimento di Biologia, Università degli Studi di Bari “Aldo Moro”, Bari 70125, Italy

    • Francesca Antonacci
  5. McDonnell Genome Institute at Washington University, Washington University School of Medicine, St Louis, Missouri 63108, USA

    • Tina A. Lindsay-Graves
    • Richard K. Wilson

Authors

  1. Megan Y. Dennis
  2. Lana Harshman
  3. Bradley J. Nelson
  4. Osnat Penn
  5. Stuart Cantsilieris
  6. John Huddleston
  7. Francesca Antonacci
  8. Kelsi Penewit
  9. Laura Denman
  10. Archana Raja
  11. Carl Baker
  12. Kenneth Mark
  13. Maika Malig
  14. Nicolette Janke
  15. Claudia Espinoza
  16. Holly A. F. Stessman
  17. Xander Nuttle
  18. Kendra Hoekzema
  19. Tina A. Lindsay-Graves
  20. Richard K. Wilson
  21. Evan E. Eichler

Contributions

M.Y.D. and E.E.E. conceived and designed the experiments. M.Y.D., L.H., B.J.N., O.P., S.C., J.H., F.A., L.D., K.M. and C.E. performed the experiments. M.Y.D., L.H., B.J.N., O.P., S.C. and J.H. analysed data. M.Y.D., K.P., A.R., C.B., M.M., N.J. and K.H. provided technical support. H.A.F.S., X.N., T.A.L.G., R.K.W. and E.E.E. provided materials or analysis tools. M.Y.D., L.H., B.J.N., O.P., S.C. and E.E.E. wrote the paper.

Competing interests

E.E.E. is on the scientific advisory board (SAB) of DNAnexus, Inc., is a consultant for Kunming University of Science and Technology (KUST) as part of the 1000 China Talent Program (2014–2016), and was an SAB member of Pacific Biosciences, Inc. (2009–2013).

Corresponding author

Correspondence to Evan E. Eichler.

Supplementary information

PDF files

  1. Supplementary Information

    Supplementary Figures 1–26; Supplementary Tables 7, 12, 13, 15, 17 and 23; Supplementary Note; Supplementary Discussion; Supplementary Methods; and Supplementary References.

Excel files

  1. Supplementary Tables

    Supplementary Tables 1–6, 8–11, 14, 16 and 18–22.

  2. Supplementary Dataset 2

    Evolutionary analyses of HSDs

Zip files

  1. Supplementary Dataset 1

    Duplication sequence data including: (1) multiple-sequence alignment (MSA) of HSDs (fastas; labelled 1 through 19 corresponding to labels in Supplementary Dataset 2); (2) visualization of pairwise alignments from MSAs (Align_slider.pdf); and (3) contigs of HSD regions of CH17 BACs not included in reference genome (CH17_contigs.fasta).

About this article

DOI

https://doi.org/10.1038/s41559-016-0069
