  • Analysis

Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs

Abstract

Several widely used methods for predicting functional associations between proteins are based on the systematic analysis of genomic context. Efforts are ongoing to improve these methods and to search for novel aspects in genomes that could be exploited for function prediction. Here, we use gene expression data to demonstrate two functional implications of genome organization: first, chromosomal proximity indicates gene coregulation in prokaryotes independent of relative gene orientation; and second, adjacent bidirectionally transcribed genes (that is, 'divergently' organized coding regions) with conserved gene orientation are strongly coregulated. We further demonstrate that such bidirectionally transcribed gene pairs are functionally associated and derive from this a novel genomic context method that reliably predicts links between >2,500 pairs of genes in 100 species. Around 650 of these functional associations are supported by other genomic context methods. In most instances, one gene encodes a transcriptional regulator, and the other a nonregulatory protein. In-depth analysis in Escherichia coli shows that the vast majority of these regulators both control transcription of the divergently transcribed target gene/operon and auto-regulate their own biosynthesis. The method thus enables the prediction of target processes and regulatory features for several hundred transcriptional regulators.
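
To make the notion of a 'divergently' organized gene pair concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes a hypothetical gene table (name, start, end, strand) for a single linear replicon and reports adjacent genes whose upstream member lies on the minus strand and whose downstream member lies on the plus strand, so that the two are transcribed away from each other. The gene names, the coordinates and the helper divergent_pairs are invented for illustration. The conservation-of-orientation test across species described in the abstract is not implemented here.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str    # hypothetical identifier
    start: int   # leftmost coordinate on the replicon
    end: int     # rightmost coordinate on the replicon
    strand: str  # '+' or '-'


def divergent_pairs(genes: List[Gene]) -> List[Tuple[str, str]]:
    """Return adjacent gene pairs transcribed in opposite, diverging directions.

    Assumption: the replicon is treated as linear, so the pair formed by the
    last and first gene of a circular chromosome is not examined.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Divergent: left gene reads leftwards ('-'), right gene reads rightwards ('+').
        if left.strand == "-" and right.strand == "+":
            pairs.append((left.name, right.name))
    return pairs


# Hypothetical toy annotation.
genes = [
    Gene("geneA", 100, 900, "+"),
    Gene("geneB", 1000, 1800, "-"),
    Gene("geneC", 2000, 2900, "+"),
    Gene("geneD", 3000, 3500, "+"),
    Gene("geneE", 3600, 4200, "-"),
]

print(divergent_pairs(genes))  # [('geneB', 'geneC')]
```

In this toy example only geneB and geneC qualify: geneA/geneB and geneD/geneE are convergent, and geneC/geneD are co-directional. A context method of the kind summarized above would additionally require that the divergent arrangement of orthologous genes is conserved across genomes before predicting a functional link.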

Figure 1: Evolutionary conservation of the orientation of gene neighbors across prokaryotic lineages.
Figure 2: Conserved organization of DT-pairs.
Figure 3: Correlated gene expression of adjacent divergently transcribed E. coli genes.
Figure 4: Combining homology and context information for function prediction.
Figure 5: Different genomic context methods and their relative coverage.
Figure 6: Genomic vicinity indicates gene coexpression.

Acknowledgements

This work was supported by the Bundesministerium für Forschung und Bildung, Germany. We would like to thank Aidan Budd, Toby Gibson, Florian Raible, David Ussery and members of the Bork group for helpful discussions—and in particular the former group members Martijn Huynen and Berend Snel for invaluable input in the early phase of this project.

Author information

Corresponding author

Correspondence to Peer Bork.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Korbel, J., Jensen, L., von Mering, C. et al. Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol 22, 911–917 (2004). https://doi.org/10.1038/nbt988

  • DOI: https://doi.org/10.1038/nbt988
