Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution



Human segmental duplications are hotspots for nonallelic homologous recombination leading to genomic disorders, copy-number polymorphisms and gene and transcript innovations. The complex structure and history of these regions have precluded a global evolutionary analysis. Combining a modified A-Bruijn graph algorithm with comparative genome sequence data, we identify the origin of 4,692 ancestral duplication loci and use these to cluster 437 complex duplication blocks into 24 distinct groups. The sequence-divergence data between ancestral-derivative pairs and a comparison with the chimpanzee and macaque genomes support a 'punctuated' model of evolution. Our analysis reveals that human segmental duplications are frequently organized around 'core' duplicons, which are enriched for transcripts and, in some cases, encode primate-specific genes undergoing positive selection. We hypothesize that the rapid expansion and fixation of some intrachromosomal segmental duplications during great-ape evolution has been due to the selective advantage conferred by these genes and transcripts embedded within these core duplications.
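As an illustration of what a sequence-divergence measurement between an ancestral locus and a derivative duplicate might look like, the sketch below computes Kimura's two-parameter distance for two pre-aligned sequences. It is a minimal example under stated assumptions: the choice of this particular estimator, the function name `k2p_distance`, and the handling of gaps are illustrative and are not taken from the authors' pipeline.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions;
# every other mismatch between A, C, G, T is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two pre-aligned sequences.

    Hypothetical helper for illustration only. Columns containing a gap
    or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites in the alignment")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences

    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")

    # K = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    return -0.5 * math.log(term)
```

Calling `k2p_distance(ancestral, derivative)` on each aligned pair would give one divergence value per duplication event; a histogram of those values is one way to look for the clustering in time that a punctuated model predicts.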


Figure 1: Ancestral-state determination of duplication blocks.
Figure 2: Ancestral-state determination of 2p11 region.
Figure 3: Flowchart of computational analysis.
Figure 4: Definition of the ancestral loci by reciprocal best hit.
Figure 5: Validation of duplicons by comparative FISH analysis.
Figure 6: Nonrandom distribution of sequence divergence.
Figure 7: Genome-wide hierarchical clustering of duplication blocks and core structure.





We thank P. Green, J. Felsenstein, T. Newman, C. Alkan and Z. Bao for useful comments and valuable discussions in the preparation of this manuscript, and E. Tüzün and Z. Cheng for computational assistance. This work was supported by a US National Institutes of Health grant GM58815 to E.E.E. and a Rosetta Inpharmatics fellowship (Merck Laboratories) to Z.J. T.M.-B. is a research fellow supported by Departament d'Educació i Universitats de la Generalitat de Catalunya. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Z.J. performed the analyses and drafted the manuscript. H.T. implemented the program package and performed part of the analyses. M.V. and M.F.C. performed the FISH validation experiment. T.M.-B. performed the positive selection analysis on the core genes. X.S. was involved in part of the fusion gene analysis. P.A.P. and E.E.E. designed the study, and E.E.E. finalized the manuscript.

Correspondence to Evan E Eichler.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4, Supplementary Note (PDF 900 kb)

Supplementary Table 1 (XLS 23 kb)

Supplementary Table 2 (XLS 911 kb)

Supplementary Table 3 (XLS 63 kb)

Supplementary Table 4 (XLS 73 kb)

Supplementary Table 5 (XLS 22 kb)
