Natural history and evolutionary principles of gene duplication in fungi


Gene duplication and loss is a powerful source of functional innovation. However, the general principles that govern this process are still largely unknown. With the growing number of sequenced genomes, it is now possible to examine these events in a comprehensive and unbiased manner. Here, we develop a procedure that resolves the evolutionary history of all genes in a large group of species. We apply our procedure to seventeen fungal genomes to create a genome-wide catalogue of gene trees that determine precise orthology and paralogy relations across these species. We show that gene duplication and loss is highly constrained by the functional properties and interacting partners of genes. In particular, stress-related genes exhibit many duplications and losses, whereas growth-related genes show selection against such changes. Whole-genome duplication circumvents this constraint and relaxes the dichotomy, resulting in an expanded functional scope of gene duplication. By characterizing the functional fate of duplicate genes we show that duplicated genes rarely diverge with respect to biochemical function, but typically diverge with respect to regulatory control. Surprisingly, paralogous modules of genes rarely arise, even after whole-genome duplication. Rather, gene duplication may drive the modularization of functional networks through specialization, thereby disentangling cellular systems.
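As an illustration of how a gene tree determines orthology and paralogy, the sketch below applies the standard convention that two genes are orthologues if their last common ancestor in the tree is a speciation event, and paralogues if it is a duplication event. It is not the procedure developed in this work. The `Node` class, the event labels, and the gene names (`A1`, `B1`, `A2`, `B2`) are hypothetical and chosen only for this toy example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a toy gene tree (hypothetical structure for illustration)."""
    event: str                      # "speciation", "duplication", or "leaf"
    gene: Optional[str] = None      # gene name, set only on leaves
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, gene: str) -> List[Node]:
    """Return the list of nodes from root down to the leaf named `gene`."""
    if root.gene == gene:
        return [root]
    for child in root.children:
        sub = path_to(child, gene)
        if sub:
            return [root] + sub
    return []


def relation(root: Node, a: str, b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    if a == b:
        raise ValueError("expected two distinct genes")
    pa, pb = path_to(root, a), path_to(root, b)
    if not pa or not pb:
        raise KeyError("gene not found in tree")
    lca = None
    for x, y in zip(pa, pb):
        if x is y:          # same node object on both paths
            lca = x
        else:
            break
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Toy tree: an ancestral duplication followed by a speciation in each copy.
# Species A carries A1 and A2; species B carries B1 and B2.
tree = Node("duplication", children=[
    Node("speciation", children=[Node("leaf", "A1"), Node("leaf", "B1")]),
    Node("speciation", children=[Node("leaf", "A2"), Node("leaf", "B2")]),
])

print(relation(tree, "A1", "B1"))  # last common ancestor is a speciation
print(relation(tree, "A1", "B2"))  # last common ancestor is the duplication
```

In this toy tree, `A1` and `B1` descend from the same copy and split at a speciation, whereas `A1` and `B2` trace back to the ancestral duplication, so the second pair is paralogous even though the genes come from different species.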

Figure 1: The SYNERGY algorithm.
Figure 2: A gene ancestry catalogue for Ascomycota fungi.
Figure 3: A functional dichotomy of uniform, persistent and volatile orthogroups.
Figure 4: Evolutionary profiles correspond to the hierarchical modular organization of the yeast transcriptional system.
Figure 5: Functional conservation and innovation of paralogues in classes and networks.




A.R. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and by NIGMS. N.F. was supported by the Israel Science Foundation. We thank E. S. Lander for discussions and D. Peer, A. Tanay and O. Rando for their comments on previous drafts of this manuscript. We are also grateful to the members of the FAS Center and the Broad Institute for their scientific and technical support, especially A. Daneau, M. Ethier and B. Mantenuto.

Author information

Corresponding author

Correspondence to Aviv Regev.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Figures 1-12 and Legends and Supplementary Notes 1-5. The notes contain: quality of genomic data sources; summary of bootstrap method; accuracy measures as compared to curated resources and simulated data; summary of selected essential S. cerevisiae genes not present in all species; and discussion of coherence in gene duplication and loss (PDF 2716 kb)

Supplementary Table 1

This file contains Supplementary Table 1 which summarizes the significant enrichments of all orthogroup classes tested (different categories are on separate tabs). (XLS 764 kb)

Supplementary Table 2

This file contains Supplementary Table 2 which includes a list of each S. cerevisiae gene’s transcription module membership. (XLS 6152 kb)

Supplementary Table 3

This file contains Supplementary Table 3 which summarizes the enrichments of all gene classes tested for the transcription modules (different categories are on separate tabs). (XLS 112 kb)

Supplementary Table 4

This file contains Supplementary Table 4 which lists the extended copy number variation profile coherences in each orthogroup class tested (different categories are on separate tabs). (XLS 766 kb)

Supplementary Table 5

This file contains Supplementary Table 5 which summarizes the gene class migrations and protein interaction network statistics for each pair of paralogous S. cerevisiae genes. (XLS 148 kb)

Supplementary Table 6

This file contains Supplementary Table 6 which lists the GEO accession numbers and references for the gene expression assays included in all our analyses. (PDF 30 kb)

About this article

Cite this article

Wapinski, I., Pfeffer, A., Friedman, N. et al. Natural history and evolutionary principles of gene duplication in fungi. Nature 449, 54–61 (2007).
