Abstract
Gene duplication and loss is a powerful source of functional innovation. However, the general principles that govern this process are still largely unknown. With the growing number of sequenced genomes, it is now possible to examine these events in a comprehensive and unbiased manner. Here, we develop a procedure that resolves the evolutionary history of all genes in a large group of species. We apply our procedure to seventeen fungal genomes to create a genome-wide catalogue of gene trees that determine precise orthology and paralogy relations across these species. We show that gene duplication and loss is highly constrained by the functional properties and interacting partners of genes. In particular, stress-related genes exhibit many duplications and losses, whereas growth-related genes show selection against such changes. Whole-genome duplication circumvents this constraint and relaxes the dichotomy, resulting in an expanded functional scope of gene duplication. By characterizing the functional fate of duplicate genes we show that duplicated genes rarely diverge with respect to biochemical function, but typically diverge with respect to regulatory control. Surprisingly, paralogous modules of genes rarely arise, even after whole-genome duplication. Rather, gene duplication may drive the modularization of functional networks through specialization, thereby disentangling cellular systems.
Acknowledgements
A.R. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and by NIGMS. N.F. was supported by the Israel Science Foundation. We thank E. S. Lander for discussions and D. Peer, A. Tanay and O. Rando for their comments on previous drafts of this manuscript. We are also grateful to the members of the FAS Center and the Broad Institute for their scientific and technical support, especially A. Daneau, M. Ethier and B. Mantenuto.
Ethics declarations
Competing interests
Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests.
Supplementary information
Supplementary Information
This file contains Supplementary Figures 1–12 and Legends and Supplementary Notes 1–5. The notes contain: quality of genomic data sources; summary of bootstrap method; accuracy measures as compared to curated resources and simulated data; summary of selected essential S. cerevisiae genes not present in all species; and discussion of coherence in gene duplication and loss. (PDF 2716 kb)
Supplementary Table 1
This file contains Supplementary Table 1 which summarizes the significant enrichments of all orthogroup classes tested (different categories are on separate tabs). (XLS 764 kb)
Supplementary Table 2
This file contains Supplementary Table 2 which includes a list of each S. cerevisiae gene’s transcription module membership. (XLS 6152 kb)
Supplementary Table 3
This file contains Supplementary Table 3 which summarizes the enrichments of all gene classes tested for the transcription modules (different categories are on separate tabs). (XLS 112 kb)
Supplementary Table 4
This file contains Supplementary Table 4 which lists the extended copy number variation profile coherences in each orthogroup class tested (different categories are on separate tabs). (XLS 766 kb)
Supplementary Table 5
This file contains Supplementary Table 5 which summarizes the gene class migrations and protein interaction network statistics for each pair of paralogous S. cerevisiae genes. (XLS 148 kb)
Supplementary Table 6
This file contains Supplementary Table 6 which lists the GEO accession numbers and references for the gene expression assays included in all our analyses. (PDF 30 kb)
About this article
Cite this article
Wapinski, I., Pfeffer, A., Friedman, N. et al. Natural history and evolutionary principles of gene duplication in fungi. Nature 449, 54–61 (2007). https://doi.org/10.1038/nature06107
DOI: https://doi.org/10.1038/nature06107