Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters

This article has been updated


Genome sequencing has led to the discovery of tens of thousands of potential new genes. Six years after the sequencing of the well-studied yeast Saccharomyces cerevisiae and the discovery that its genome encodes 6,000 predicted proteins, more than 2,000 have not yet been characterized experimentally, and determining their functions seems far from a trivial task. One crucial constraint is the generation of useful hypotheses about protein function. Using a new approach to interpret microarray data, we assign likely cellular functions with confidence values to these new yeast proteins. We perform extensive genome-wide validations of our predictions and offer visualization methods for exploration of the large numbers of functional predictions. We identify potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules. We present experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing. Our methodology can be applied to a variety of genomics data types and organisms.
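The abstract does not spell out how confidence values are attached to predictions, so the following is only a minimal sketch of one common way to do it, not the authors' procedure. It assumes (this is not stated above) that each cluster is scored for over-representation of a functional category with a hypergeometric tail probability, and that an uncharacterized gene keeps the best score it receives across all overlapping clusters it belongs to. The function names, gene identifiers, and the category label are hypothetical.

```python
from math import comb


def hypergeom_tail(n_genes, n_in_category, cluster_size, n_hits):
    """P(X >= n_hits) when cluster_size genes are drawn without replacement
    from n_genes, of which n_in_category carry the category."""
    upper = min(n_in_category, cluster_size)
    favourable = sum(
        comb(n_in_category, i) * comb(n_genes - n_in_category, cluster_size - i)
        for i in range(n_hits, upper + 1)
    )
    return favourable / comb(n_genes, cluster_size)


def predict_functions(clusters, annotations, n_genes):
    """Return {gene: {category: best p-value}} for cluster members that
    lack the category. Clusters may overlap; the smallest p-value wins.
    Assumption: hypergeometric enrichment stands in for the paper's scoring."""
    # Genome-wide size of every category among annotated genes.
    category_sizes = {}
    for categories in annotations.values():
        for category in categories:
            category_sizes[category] = category_sizes.get(category, 0) + 1

    predictions = {}
    for members in clusters:
        members = set(members)
        # Count how many cluster members already carry each category.
        hits = {}
        for gene in members:
            for category in annotations.get(gene, ()):
                hits[category] = hits.get(category, 0) + 1
        for category, n_hits in hits.items():
            p = hypergeom_tail(n_genes, category_sizes[category], len(members), n_hits)
            for gene in members:
                if category in annotations.get(gene, ()):
                    continue  # already annotated with this category
                best = predictions.setdefault(gene, {})
                best[category] = min(p, best.get(category, 1.0))
    return predictions


if __name__ == "__main__":
    # Hypothetical toy genome of 10 genes; three are annotated "rRNA processing".
    known = {g: {"rRNA processing"} for g in ("geneA", "geneB", "geneC")}
    overlapping = [["geneA", "geneB", "geneX"], ["geneA", "geneB", "geneC", "geneX"]]
    print(predict_functions(overlapping, known, n_genes=10))
```

Taking the minimum over clusters is a deliberately simple design choice; it does not correct for the number of clusters or categories tested, which a real analysis would need to address.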


Figure 1: Overview of prediction and validation approach.
Figure 2: Exploratory visualization of annotated clusters.
Figure 3: Functional prediction for POL30.
Figure 4: Validation of predictions using known annotations, for a variety of parameters and choices shown in Fig. 1.
Figure 5: Cellular Role category predictions for 1,644 genes (2,368 annotations) previously unclassified by Cellular Role.
Figure 6: Northern-blot analysis of total RNA extracted from strains with TET promoters integrated upstream of genes predicted with high confidence to function in rRNA processing and modification.

Change history

  • 19 June 2002

    added supplementary figure callouts



We thank M. Boguski, S. Friend, L. Hartwell and A. W. Murray for support, advice and encouragement; J. Burchard, J. Castle, Y. He, M. Margarint and E. Tan for help with BLAST and clustering; L. Garwin, M. Groudine, J. Johnson, P. Linsley, P. Lum, D. Marks, C. Roberts, M. Roth, C. Sander, E. Schadt and S. Tapscott for comments and useful discussions on this work; and B. Blencowe and S. McCracken for lab space, reagents and assistance with experiments in Fig. 6. This work was supported by Rosetta Inpharmatics, a CIHR Operating Grant to T.R.H. and the Ontario Premier's Research Excellence Award to T.R.H.

Author information


Corresponding author

Correspondence to Steven J. Altschuler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Cite this article

Wu, L., Hughes, T., Davierwala, A. et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat Genet 31, 255–265 (2002).
