Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters

Wu, Lani F.; Hughes, Timothy R.; Davierwala, Armaity P.; Robinson, Mark D.; Stoughton, Roland; Altschuler, Steven J.

doi:10.1038/ng906

Article
Published: 24 June 2002

Nature Genetics volume 31, pages 255–265 (2002)

This article has been updated

Abstract

Genome sequencing has led to the discovery of tens of thousands of potential new genes. Six years after the sequencing of the well-studied yeast Saccharomyces cerevisiae and the discovery that its genome encodes ∼6,000 predicted proteins, more than 2,000 have not yet been characterized experimentally, and determining their functions seems far from a trivial task. One crucial constraint is the generation of useful hypotheses about protein function. Using a new approach to interpret microarray data, we assign likely cellular functions with confidence values to these new yeast proteins. We perform extensive genome-wide validations of our predictions and offer visualization methods for exploration of the large numbers of functional predictions. We identify potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules. We present experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing. Our methodology can be applied to a variety of genomics data types and organisms.

**Figure 1: Overview of prediction and validation approach.**

**Figure 2: Exploratory visualization of annotated clusters.**

**Figure 3: Functional prediction for *POL30*.**

**Figure 4: Validation of predictions using known annotations, for a variety of parameters and choices shown in Fig. 1.**

**Figure 5: Cellular Role category predictions for 1,644 genes (2,368 annotations) previously unclassified by Cellular Role.**

**Figure 6: Northern-blot analysis of total RNA extracted from strains with TET promoters integrated upstream of genes predicted with high confidence to function in rRNA processing and modification.**

Change history

19 June 2002
added supplementary figure callouts

Acknowledgements

We thank M. Boguski, S. Friend, L. Hartwell and A. W. Murray for support, advice and encouragement; J. Burchard, J. Castle, Y. He, M. Margarint and E. Tan for help with BLAST and clustering; L. Garwin, M. Groudine, J. Johnson, P. Linsley, P. Lum, D. Marks, C. Roberts, M. Roth, C. Sander, E. Schadt and S. Tapscott for comments and useful discussions on this work; and B. Blencowe and S. McCracken for lab space, reagents and assistance with experiments in Fig. 6. This work was supported by Rosetta Inpharmatics, a CIHR Operating Grant to T.R.H. and the Ontario Premier's Research Excellence Award to T.R.H.

Author information

Lani F. Wu & Steven J. Altschuler
Present address: Bauer Center for Genomics Research, Harvard University, Cambridge, Massachusetts, USA
Lani F. Wu and Timothy R. Hughes: These authors contributed equally to this manuscript.

Authors and Affiliations

Rosetta Inpharmatics, Kirkland, Washington, USA
Lani F. Wu, Timothy R. Hughes, Roland Stoughton & Steven J. Altschuler
Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
Timothy R. Hughes, Armaity P. Davierwala & Mark D. Robinson

Corresponding author

Correspondence to Steven J. Altschuler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Web Figure A

Web Figure B

Web Figure C

Web Figure D

Web Figure E

About this article

Cite this article

Wu, L., Hughes, T., Davierwala, A. et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat Genet 31, 255–265 (2002). https://doi.org/10.1038/ng906

Received: 16 January 2002
Accepted: 14 May 2002
Published: 24 June 2002
Issue Date: July 2002
DOI: https://doi.org/10.1038/ng906
