Genome sequencing has led to the discovery of tens of thousands of potential new genes. Six years after the sequencing of the well-studied yeast Saccharomyces cerevisiae and the discovery that its genome encodes ∼6,000 predicted proteins, more than 2,000 have not yet been characterized experimentally, and determining their functions seems far from a trivial task. One crucial constraint is the generation of useful hypotheses about protein function. Using a new approach to interpret microarray data, we assign likely cellular functions with confidence values to these new yeast proteins. We perform extensive genome-wide validations of our predictions and offer visualization methods for exploration of the large numbers of functional predictions. We identify potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules. We present experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing. Our methodology can be applied to a variety of genomics data types and organisms.
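The idea of assigning a function with a confidence value from expression clusters can be illustrated with a small sketch. This is not necessarily the procedure used in this work: it assumes a generic "guilt by association" scheme in which a hypergeometric tail probability scores how strongly an annotated category is enriched in a cluster, and every uncharacterized gene in an enriched cluster inherits that category. The function names (`hypergeom_tail`, `predict_functions`), the threshold `alpha`, and the toy gene labels are all hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def predict_functions(clusters, annotations, all_genes, alpha=0.05):
    """Assign categories to unannotated genes from (possibly overlapping) clusters.

    clusters:    list of sets of gene names
    annotations: dict mapping category -> set of annotated gene names
    all_genes:   collection of every gene on the array (defines the background size)
    Returns a dict: gene -> {category: smallest enrichment P value seen}.
    """
    N = len(all_genes)
    annotated = set().union(*annotations.values())
    predictions = {}
    for cluster in clusters:
        n = len(cluster)
        for category, members in annotations.items():
            k = len(cluster & members)
            if k == 0:
                continue
            p = hypergeom_tail(k, n, len(members), N)
            if p <= alpha:
                # Every gene in the cluster without any annotation inherits the category.
                for gene in cluster - annotated:
                    scores = predictions.setdefault(gene, {})
                    scores[category] = min(p, scores.get(category, 1.0))
    return predictions


# Toy data (hypothetical): 20 genes, 4 of them annotated to one category.
all_genes = [f"g{i}" for i in range(20)]
annotations = {"rRNA processing": {"g0", "g1", "g2", "g3"}}
clusters = [
    {"g0", "g1", "g2", "g10"},            # 3 of 4 members annotated: enriched
    {"g3", "g11", "g12", "g13", "g14"},   # 1 of 5 members annotated: not enriched
]
predictions = predict_functions(clusters, annotations, all_genes)
print(predictions)
```

In this toy case only `g10` receives a prediction, because only the first cluster passes the threshold. Keeping the smallest P value per gene and category is one simple way to combine evidence when a gene appears in several overlapping clusters; other combination rules are equally plausible.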
We thank M. Boguski, S. Friend, L. Hartwell and A. W. Murray for support, advice and encouragement; J. Burchard, J. Castle, Y. He, M. Margarint and E. Tan for help with BLAST and clustering; L. Garwin, M. Groudine, J. Johnson, P. Linsley, P. Lum, D. Marks, C. Roberts, M. Roth, C. Sander, E. Schadt and S. Tapscott for comments and useful discussions on this work; and B. Blencowe and S. McCracken for lab space, reagents and assistance with experiments in Fig. 6. This work was supported by Rosetta Inpharmatics, a CIHR Operating Grant to T.R.H. and the Ontario Premier's Research Excellence Award to T.R.H.
The authors declare no competing financial interests.
Wu, L., Hughes, T., Davierwala, A. et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat Genet 31, 255–265 (2002). https://doi.org/10.1038/ng906