Global probabilistic annotation of metabolic networks enables enzyme discovery

Abstract

Annotation of organism-specific metabolic networks is one of the main challenges of systems biology. Importantly, owing to the inherent uncertainty of computational annotations, predictions of biochemical function need to be treated probabilistically. We present a global probabilistic approach to annotate genome-scale metabolic networks that integrates sequence homology and context-based correlations under a single principled framework. The developed method for global biochemical reconstruction using sampling (GLOBUS) not only provides annotation probabilities for each functional assignment but also suggests likely alternative functions. GLOBUS is based on statistical Gibbs sampling of probable metabolic annotations and is able to make accurate functional assignments even in cases of remote sequence identity to known enzymes. We apply GLOBUS to genomes of Bacillus subtilis and Staphylococcus aureus and validate the method's predictions by experimentally demonstrating the 6-phosphogluconolactonase activity of YkgB and the role of the Sps pathway for rhamnose biosynthesis in B. subtilis.
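
To make the idea of Gibbs sampling over annotations concrete, the following is a minimal toy sketch in Python. It is not the GLOBUS implementation: the scoring function, the function name `gibbs_annotate`, the parameters, and the example genes and functions are all illustrative assumptions. Each gene's function is resampled in turn from a conditional distribution that combines a homology log-score with a bonus for context-linked genes whose current functions are adjacent in the network; the fraction of sweeps a gene spends in each function is then read off as an approximate annotation probability.

```python
import math
import random
from collections import Counter


def gibbs_annotate(candidates, homology, context, adjacent,
                   sweeps=2000, burn_in=500, seed=0):
    """Toy Gibbs sampler over gene -> function assignments (illustrative only).

    candidates: dict gene -> list of candidate functions
    homology:   dict (gene, function) -> log-score from sequence similarity
    context:    dict gene -> list of (other_gene, weight); assumed symmetric
    adjacent:   set of frozenset({f1, f2}) for functions adjacent in the network
    """
    rng = random.Random(seed)
    # Start from a random assignment.
    state = {g: rng.choice(fs) for g, fs in candidates.items()}
    counts = {g: Counter() for g in candidates}

    for sweep in range(sweeps):
        for g, fs in candidates.items():
            # Unnormalized log-probability of each candidate function,
            # conditioned on the current functions of all other genes.
            log_w = []
            for f in fs:
                score = homology.get((g, f), 0.0)
                for other, w in context.get(g, []):
                    if frozenset((f, state[other])) in adjacent:
                        score += w
                log_w.append(score)
            m = max(log_w)  # subtract the max for numerical stability
            weights = [math.exp(x - m) for x in log_w]
            state[g] = rng.choices(fs, weights=weights, k=1)[0]
        if sweep >= burn_in:
            for g in candidates:
                counts[g][state[g]] += 1

    kept = sweeps - burn_in
    return {g: {f: counts[g][f] / kept for f in candidates[g]}
            for g in candidates}


# Hypothetical example: geneA has two equally homologous candidates, but its
# context partner geneB performs F3, which is adjacent to F2 in the network.
candidates = {"geneA": ["F1", "F2"], "geneB": ["F3"]}
homology = {("geneA", "F1"): 0.0, ("geneA", "F2"): 0.0, ("geneB", "F3"): 1.0}
context = {"geneA": [("geneB", 3.0)], "geneB": [("geneA", 3.0)]}
adjacent = {frozenset(("F2", "F3"))}

probs = gibbs_annotate(candidates, homology, context, adjacent)
# Analytically, P(geneA = F2) = e^3 / (1 + e^3), roughly 0.95; the sampled
# frequency in probs["geneA"]["F2"] should be close to that value.
```

In this toy setting, homology alone cannot separate F1 from F2 for geneA, and the context term is what shifts probability toward F2. The returned dictionary also keeps the lower-probability alternative, which mirrors the abstract's point that a probabilistic annotation can suggest alternative functions rather than a single assignment.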

Figure 1: Overview of the GLOBUS method.
Figure 2: GLOBUS precision-recall performance.
Figure 3: In vitro biochemical assays used to characterize activities of SpsI and SpsJ using high-precision MS.
Figure 4: In vitro biochemical assays used to characterize the 6-phosphogluconolactonase activity of YkgB.

Acknowledgements

We thank R. Hüttenhain for technical assistance and measuring the peptide samples. We thank P. Eichenberger from the Biology Department at New York University for providing mutant strains. This work was supported in part by US National Institutes of Health grant GM079759 to D.V. and National Centers for Biomedical Computing grant U54CA121852 to Columbia University.

Author information

Contributions

G.P., T.-L.H. and D.V. performed computational research and data analysis. T.F. performed experimental research and analysis. D.V. directed computational research. D.V. and U.S. directed experimental research. G.P., T.F., T.-L.H. and D.V. wrote the manuscript. All authors read and edited the manuscript.

Corresponding author

Correspondence to Dennis Vitkup.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Methods and Supplementary Results (PDF 1949 kb)

Supplementary Data Set 1

Overview of detected peptides validated by PeptideProphet with 1% error rate cutoff for YkgB band (XLSX 30 kb)

About this article

Cite this article

Plata, G., Fuhrer, T., Hsiao, TL. et al. Global probabilistic annotation of metabolic networks enables enzyme discovery. Nat Chem Biol 8, 848–854 (2012). https://doi.org/10.1038/nchembio.1063

  • DOI: https://doi.org/10.1038/nchembio.1063
