Abstract
Several widely used methods for predicting functional associations between proteins are based on the systematic analysis of genomic context. Efforts are ongoing to improve these methods and to search for novel aspects in genomes that could be exploited for function prediction. Here, we use gene expression data to demonstrate two functional implications of genome organization: first, chromosomal proximity indicates gene coregulation in prokaryotes independent of relative gene orientation; and second, adjacent bidirectionally transcribed genes (that is, 'divergently' organized coding regions) with conserved gene orientation are strongly coregulated. We further demonstrate that such bidirectionally transcribed gene pairs are functionally associated, and from this we derive a novel genomic context method that reliably predicts links between >2,500 pairs of genes in ∼100 species. Around 650 of these functional associations are supported by other genomic context methods. In most instances, one gene encodes a transcriptional regulator and the other a nonregulatory protein. In-depth analysis in Escherichia coli shows that the vast majority of these regulators both control transcription of the divergently transcribed target gene/operon and auto-regulate their own biosynthesis. The method thus enables the prediction of target processes and regulatory features for several hundred transcriptional regulators.
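The core genomic-context idea above (find adjacent genes transcribed away from each other, then ask whether that arrangement recurs across genomes and whether the two genes are coexpressed) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: each genome is assumed to be a list of genes with coordinates and strand, genes are assumed to already carry orthology labels shared across genomes (e.g. COG identifiers; that labeling step is not shown), chromosomes are treated as linear, and conservation is a plain count of genomes against a hypothetical `min_genomes` threshold. The abstract does not say which coexpression measure was used; Pearson correlation is shown only as a common choice.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Gene:
    name: str    # orthology label shared across genomes (assumed precomputed)
    start: int   # leftmost chromosomal coordinate
    end: int     # rightmost chromosomal coordinate
    strand: str  # '+' or '-'


def orientation(left: Gene, right: Gene) -> str:
    """Relative orientation of two neighbouring genes; `left` has the smaller coordinate."""
    if left.strand == right.strand:
        return "codirectional"  # -> ->  or  <- <-
    if left.strand == "-" and right.strand == "+":
        return "divergent"      # <- ->  transcribed away from a shared intergenic region
    return "convergent"         # -> <-


def adjacent_pairs(genes: list[Gene]) -> list[tuple[Gene, Gene]]:
    """Neighbouring genes in chromosome order (linear chromosome; wrap-around ignored)."""
    ordered = sorted(genes, key=lambda g: g.start)
    return list(zip(ordered, ordered[1:]))


def divergent_pairs(genes: list[Gene]) -> set[frozenset[str]]:
    """Unordered label pairs of adjacent, bidirectionally transcribed genes in one genome."""
    return {
        frozenset((a.name, b.name))
        for a, b in adjacent_pairs(genes)
        if orientation(a, b) == "divergent"
    }


def conserved_divergent_pairs(
    genomes: dict[str, list[Gene]], min_genomes: int = 2
) -> dict[frozenset[str], int]:
    """Label pairs arranged divergently in at least `min_genomes` genomes (hypothetical cutoff)."""
    counts: Counter = Counter()
    for genes in genomes.values():
        # divergent_pairs returns a set, so each genome counts a pair at most once
        counts.update(divergent_pairs(genes))
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two expression profiles measured over the same conditions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy genomes with hypothetical gene labels: a regulator "regR" next to a target "tgtA".
    genomes = {
        "A": [Gene("regR", 100, 900, "-"), Gene("tgtA", 1000, 2000, "+"),
              Gene("tgtB", 2100, 3000, "+"), Gene("other", 3100, 3500, "-")],
        "B": [Gene("x", 0, 400, "+"), Gene("regR", 500, 1300, "-"),
              Gene("tgtA", 1400, 2400, "+")],
        "C": [Gene("regR", 0, 800, "+"), Gene("tgtA", 900, 1800, "+")],
    }
    hits = conserved_divergent_pairs(genomes)
    print({tuple(sorted(p)): n for p, n in hits.items()})  # {('regR', 'tgtA'): 2}
    print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0
```

Pairs are stored as unordered label sets because the divergent arrangement, not which gene lies on the left, is the property that should be compared between genomes: reading a chromosome in the opposite direction swaps left and right but keeps a divergent pair divergent. A fuller treatment might also weight conservation by how distantly related the genomes are, since shared gene order among close relatives carries little independent evidence.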
Acknowledgements
This work was supported by the Bundesministerium für Forschung und Bildung, Germany. We would like to thank Aidan Budd, Toby Gibson, Florian Raible, David Ussery and members of the Bork group for helpful discussions—and in particular the former group members Martijn Huynen and Berend Snel for invaluable input in the early phase of this project.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Korbel, J., Jensen, L., von Mering, C. et al. Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol 22, 911–917 (2004). https://doi.org/10.1038/nbt988
DOI: https://doi.org/10.1038/nbt988
This article is cited by
- Deciphering microbial gene function using natural language processing. Nature Communications (2022)
- Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study. BMC Bioinformatics (2020)
- Whole-genome-based phylogeny of Bacillus cytotoxicus reveals different clades within the species and provides clues on ecology and evolution. Scientific Reports (2019)
- Genome-wide analysis of fitness data and its application to improve metabolic models. BMC Bioinformatics (2018)
- Variations on a theme: evolution of the phage-shock-protein system in Actinobacteria. Antonie van Leeuwenhoek (2018)