  • Review Article

Computational approaches to unveiling ancient genome duplications

Key Points

  • Many, if not most, eukaryotic model organisms have had their genome duplicated, sometimes more than once, in their evolutionary past. Such large-scale gene duplication events might explain major leaps in evolution and adaptive radiations of species. Owing to its putative impact on evolution, the search for traces of such events has received much attention of late.

  • Evidence for ancient large-scale gene duplication events comes from the detection and delineation of 'blocks' or 'segments' in the genome that are homologous; that is, that contain a set of homologous genes.

  • Extensive gene loss and gene translocations can obscure homology between two segments. In particular, after tens or hundreds of millions of years of evolution, too few homologous gene pairs might remain in close proximity to detect statistically significant collinearity. More sophisticated bioinformatics approaches are then needed to uncover homology between two segments.

  • One approach to uncover homology in the 'twilight zone' is to combine the gene content and gene order of multiple segments and build so-called genomic profiles. These profiles can then be used as more sensitive probes to sweep the rest of the genome to uncover additional homology.

  • Another strategy for uncovering duplicated segments that have become unrecognizable because of differential gene loss is to compare the gene order of a genome with that of the genome of another species. The comparative approach has proved effective, but it relies on the assumption that gene order is largely conserved between the genomes of different species.

  • Dating duplication events provides another useful means of finding evidence for large-scale gene duplication events. If it can be shown that many gene duplicates were created at about the same time, this can be considered strong evidence that most paralogous genes were created by a single event. Absolute dating can be based on synonymous substitution rates, or on the construction and analysis of linearized phylogenetic trees.

  • Currently, more genomic sequences are being determined from species that probably carry remnants of whole-genome duplication events. On top of that, they might have experienced lineage-specific segmental duplications. Therefore, it is anticipated that more large-scale gene duplication events in eukaryotic genomes will be unveiled and that the detection of such events will soon become standard procedure.
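
As a rough illustration of the dating strategy mentioned in the Key Points above, the sketch below bins the synonymous distances (Ks) of paralogous gene pairs and reports bins that stand out as secondary peaks, which is one simple way to ask whether many duplicates were created at about the same time. The function names, the bin width, the saturation cut-off and the minimum count are assumptions made for this example and are not taken from the article.

```python
from collections import Counter


def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count duplicate pairs per Ks bin.

    Pairs with Ks >= max_ks are dropped here on the assumption that
    synonymous sites are close to saturation at that distance.
    """
    counts = Counter()
    for ks in ks_values:
        if 0.0 <= ks < max_ks:
            counts[int(ks / bin_width)] += 1
    n_bins = int(round(max_ks / bin_width))
    return [counts.get(i, 0) for i in range(n_bins)]


def secondary_peaks(hist, min_count=2):
    """Indices of interior bins that are strictly higher than both neighbours.

    Bin 0 is never reported: it holds the youngest duplicates, which are
    expected from continuous small-scale duplication.
    """
    return [
        i
        for i in range(1, len(hist) - 1)
        if hist[i] >= min_count and hist[i] > hist[i - 1] and hist[i] > hist[i + 1]
    ]
```

A peak found this way is only suggestive. Real analyses would also need a model of the background birth and death of duplicates and a correction for saturation.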

Abstract

Recent analyses of complete genome sequences have revealed that many genomes have been duplicated in their evolutionary past. Such events have been associated with important biological transitions, major leaps in evolution and adaptive radiations of species. Here, we consider recently developed computational methods to detect such ancient large-scale gene duplication events. Several new approaches have been used to show that large-scale gene duplications are more common than previously thought.

Figure 1: Gene homology matrix (GHM).
Figure 2: Hidden duplications and transitive homology.
Figure 3: Detection of genomic homology using a genomic profile.
Figure 4: Construction of a gene duplication landscape.
Figure 5: Age distributions of duplicated genes.

Acknowledgements

I would like to thank all members of the Bioinformatics and Computational Biology Division for their interest and stimulating discussions. I am particularly grateful to Klaas Vandepoele, Cedric Simillion, Dirk Gevers, Jeroen Raes and Kathleen Marchal for critical readings of the manuscript, and to the Department of Plant Systems Biology for continuous support. I also gratefully acknowledge the constructive comments of the three anonymous reviewers.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Related links

FURTHER INFORMATION

BLAST

Yves Van de Peer's laboratory

Glossary

SYMPATRIC SPECIATION

Genetic divergence that leads to species formation in the same habitat.

BLOCK OR SEGMENTAL DUPLICATIONS

A duplication of several genes at the same time. The result is two genomic segments that share a similar set of genes and are therefore homologous.

TANDEM DUPLICATION

Duplication of (single) genes that create tandem repeats in the genome. The HOX genes are a well-known example of a gene family generated through tandem duplication.

ANCHOR POINT

A pair of homologous genes in a duplicated segment. Several anchor points in close proximity form strong evidence for a block duplication.

COLLINEARITY

Conservation in gene order and gene content between two genomic segments.
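
A minimal sketch of how anchor points and collinearity relate, assuming each segment is given as an ordered list of gene-family labels (a representation chosen for this example). It lists all anchor points between two segments and then finds the longest chain of anchors that advances along both segments with only small gaps. Only the forward orientation is handled, and the names `anchor_points`, `best_collinear_chain` and `max_gap` are hypothetical.

```python
def anchor_points(seg_a, seg_b):
    """All (i, j) positions where the two segments carry genes of the same family."""
    return [
        (i, j)
        for i, fam_a in enumerate(seg_a)
        for j, fam_b in enumerate(seg_b)
        if fam_a == fam_b
    ]


def best_collinear_chain(anchors, max_gap=3):
    """Longest chain of anchors in which each step moves forward in both
    segments by at most max_gap positions (simple dynamic programming)."""
    anchors = sorted(anchors)
    if not anchors:
        return []
    best_len = [1] * len(anchors)
    prev = [None] * len(anchors)
    for k, (i, j) in enumerate(anchors):
        for p in range(k):
            pi, pj = anchors[p]
            close = 0 < i - pi <= max_gap and 0 < j - pj <= max_gap
            if close and best_len[p] + 1 > best_len[k]:
                best_len[k] = best_len[p] + 1
                prev[k] = p
    # Trace back from the anchor that ends the longest chain.
    k = max(range(len(anchors)), key=best_len.__getitem__)
    chain = []
    while k is not None:
        chain.append(anchors[k])
        k = prev[k]
    return chain[::-1]
```

The length of the chain is only a raw count. Whether a given number of anchors is statistically significant depends on segment size and gene-family sizes, which this sketch does not model.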

POLYPLOIDY

A polyploid organism has more than two sets of chromosomes.

E-VALUES

The expect value (E) is a parameter that describes the number of hits one can expect to see by chance when searching a database of a particular size. The lower the E-value, the more significant the match is, and the more probable that two sequences are homologous.
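
For reference, the relationship usually quoted for BLAST statistics (the Karlin-Altschul formula) can be written as below. The symbols are introduced here for illustration and do not appear in this glossary: m and n are the (effective) lengths of the query and the database, S is the raw alignment score, S' is the bit score, and K and λ are parameters that depend on the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S}
\qquad\text{or, equivalently,}\qquad
E = m \, n \, 2^{-S'}, \quad S' = \frac{\lambda S - \ln K}{\ln 2}
```

Because E grows with the product of m and n, the same alignment score gives a larger E-value against a larger database.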

PARALOGUES

Homologous genes that have originated through gene duplication events; that is, by tandem, block or whole-genome duplication events.

GRAPH-CLUSTERING ALGORITHM

This is applied to separate, sparsely-connected, dense subgraphs (here gene families). This means that the graph is partitioned in such a way that the distance between the subgraphs (clusters) is maximized, whereas the sum of the distances within each subgraph is minimized.

NON-FUNCTIONALIZATION

The process by which one of the two duplicate genes accumulates deleterious mutations in coding or regulatory sequences that ultimately render the gene non-functional.

TRANSITIVE HOMOLOGY

When homology between two genes or genomic segments can only be inferred through a third gene or segment.

MULTIPLICON

A set of homologous genomic segments. The multiplication level of a multiplicon refers to the number of homologous segments that the multiplicon consists of.
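
The sketch below shows one way to group segments into multiplicons from pairwise homology calls, using a union-find structure so that transitive homology is followed automatically. It assumes that homology between segments can be treated as fully transitive, which is a simplification, and the function name and input format are hypothetical.

```python
def multiplicons(segments, homologous_pairs):
    """Group segments that are linked, directly or transitively, by homology.

    Returns a list of groups, largest first; the length of each group is
    its multiplication level.
    """
    parent = {s: s for s in segments}

    def find(s):
        # Follow parent links to the group representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in homologous_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for s in segments:
        groups.setdefault(find(s), []).append(s)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

With pairwise calls S1-S2 and S2-S3, segments S1 and S3 end up in the same multiplicon even though no direct homology between them was detected.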

GENOMIC PROFILES

A genomic profile is built by combining the gene order and gene content of multiple homologous genomic segments. Such a profile can then be used as a more sensitive probe to identify more concealed forms of homology elsewhere in the genome.
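
To make the idea concrete, here is a deliberately simplified sketch that uses gene content only. Gene order, which real genomic profiles also combine, is ignored here. Segments and the genome are assumed to be lists of gene-family labels, and the names `build_profile`, `scan_with_profile`, `window` and `min_hits` are hypothetical.

```python
def build_profile(segments):
    """Union of the gene families found in a set of homologous segments."""
    profile = set()
    for seg in segments:
        profile.update(seg)
    return profile


def scan_with_profile(genome, profile, window=6, min_hits=3):
    """Slide a window along the genome and report every window that shares
    at least min_hits gene families with the profile."""
    hits = []
    for start in range(len(genome) - window + 1):
        shared = profile.intersection(genome[start:start + window])
        if len(shared) >= min_hits:
            hits.append((start, sorted(shared)))
    return hits


# Two known homologous segments, each sharing only two families with a
# third, more degenerate region of the genome.
seg1 = ["A", "B", "C", "D"]
seg2 = ["A", "B", "E", "F"]
genome = ["p", "q", "C", "x", "E", "y", "F", "D", "r"]

alone_1 = scan_with_profile(genome, set(seg1))                   # no window reaches 3 hits
alone_2 = scan_with_profile(genome, set(seg2))                   # no window reaches 3 hits
combined = scan_with_profile(genome, build_profile([seg1, seg2]))
```

In this toy case neither segment alone passes the threshold, whereas the combined profile does, which is the sense in which a profile acts as a more sensitive probe.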

PARALOGON

Homologous genomic segments created by partial or complete genome duplication.

PARANOME

The complete set of duplicated genes in a genome. The paranome can be formed by both small-scale and large-scale gene duplication events.

SYNTENY

Two loci are called syntenic when they are located on the same chromosome.

SILENT SUBSTITUTIONS

Nucleotide substitutions that do not lead to amino-acid replacements. They are considered to be neutral and to occur in a clocklike manner.

SYNONYMOUS SITE

One at which a nucleotide change does not alter the amino acid encoded.
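
A small sketch of the distinction, assuming the standard genetic code and two aligned, in-frame, gap-free coding sequences. It only classifies codons that differ as synonymous or non-synonymous. It is not an estimator of synonymous substitution rates, which would also need to count synonymous sites and correct for multiple substitutions. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts over codons that differ."""
    syn = nonsyn = 0
    for k in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        a, b = seq_a[k:k + 3], seq_b[k:k + 3]
        if a == b:
            continue
        if is_synonymous(a, b):
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```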

TETRAPLOID

A tetraploid organism has four sets of homologous chromosomes, instead of the usual two.

AUTOTETRAPLOID

Tetraploidy, in which all the chromosomes come from the same species; that is, a tetraploid is formed by the doubling of its own genome.

ALLOTETRAPLOID

An allotetraploid originates by the fusion of the genomes of two different, but closely-related species.

DIPLOIDIZATION

The evolutionary process whereby a polyploid species becomes a diploid again. The molecular basis of diploidization is not yet known.

LINEARIZED TREE

Linearized trees are phylogenetic trees that assume equal evolutionary rates in different lineages since their divergence from a common ancestor. As such trees assume a clocklike behaviour of the underlying molecular marker, a timescale can be superimposed on them.

MOLECULAR CLOCK

The hypothesis that, in any given gene or DNA sequence, mutations accumulate at an approximately constant rate in all evolutionary lineages as long as the gene or the DNA sequence retains its original function.
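
Under this hypothesis, the age of a duplication can be estimated from the synonymous distance between the two copies. In the worked relation below, K_S is the number of synonymous substitutions per synonymous site between the paralogues, r is the synonymous substitution rate per site per year, and T is the time since duplication. The factor 2 arises because both copies accumulate substitutions independently. The numerical values are illustrative assumptions and are not taken from the article.

```latex
K_S = 2 r T \quad\Longrightarrow\quad T = \frac{K_S}{2r}

\text{Example: } K_S = 0.8,\ r = 1 \times 10^{-8}\ \text{per site per year}
\ \Longrightarrow\ T = \frac{0.8}{2 \times 10^{-8}} = 4 \times 10^{7}\ \text{years}
```

The estimate is only as good as the assumed rate, and it becomes unreliable once synonymous sites approach saturation.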

ISOZYME

Different forms of the same enzyme (synonymous with allozymes), which were used as some of the first biochemically-based genetic markers.

QUADRUPLICATE PARALOGY

Quadruplicate homology, where homology is found between four different genomic segments, is often considered to be evidence for two rounds of large-scale gene duplication (for example, Hox clusters).

About this article

Cite this article

Van de Peer, Y. Computational approaches to unveiling ancient genome duplications. Nat Rev Genet 5, 752–763 (2004). https://doi.org/10.1038/nrg1449

  • DOI: https://doi.org/10.1038/nrg1449
