Abstract
Annotation of organism-specific metabolic networks is one of the main challenges of systems biology. Importantly, owing to the inherent uncertainty of computational annotations, predictions of biochemical function need to be treated probabilistically. We present a global probabilistic approach to annotating genome-scale metabolic networks that integrates sequence homology and context-based correlations under a single principled framework. The developed method for global biochemical reconstruction using sampling (GLOBUS) not only provides annotation probabilities for each functional assignment but also suggests likely alternative functions. GLOBUS is based on statistical Gibbs sampling of probable metabolic annotations and is able to make accurate functional assignments even in cases of remote sequence identity to known enzymes. We apply GLOBUS to the genomes of Bacillus subtilis and Staphylococcus aureus and validate the method's predictions by experimentally demonstrating the 6-phosphogluconolactonase activity of YkgB and the role of the Sps pathway in rhamnose biosynthesis in B. subtilis.
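The general idea of Gibbs sampling over functional assignments can be illustrated with a toy sketch. This is not the GLOBUS implementation: the scoring function, the data structures and every name below (gibbs_annotate, homology, linked, adjacent, coupling) are assumptions made purely for illustration. Each gene is resampled in turn from its conditional distribution, which here combines a hypothetical homology log-score with a bonus when a context-linked gene currently holds a function adjacent in the network; the fraction of sweeps in which a gene holds each function is then read as an annotation probability.

```python
import math
import random
from collections import Counter


def gibbs_annotate(genes, functions, homology, linked, adjacent,
                   coupling=1.0, sweeps=2000, burn_in=200, seed=0):
    """Toy Gibbs sampler over gene -> function assignments (illustrative only).

    homology[g][f] : hypothetical log-score for gene g performing function f
    linked         : pairs of genes related by genomic context (assumed input)
    adjacent       : frozensets of function pairs that neighbor in the network
    """
    rng = random.Random(seed)
    state = {g: rng.choice(functions) for g in genes}  # random initial annotation
    counts = {g: Counter() for g in genes}

    def context_score(gene, func):
        # Reward assignments that put context-linked genes on adjacent functions.
        score = 0.0
        for a, b in linked:
            if gene in (a, b) and a != b:
                other = state[b] if gene == a else state[a]
                if frozenset((func, other)) in adjacent:
                    score += coupling
        return score

    for sweep in range(sweeps):
        for g in genes:
            # Unnormalized conditional probability of each function for gene g,
            # given the current assignments of all other genes.
            weights = [math.exp(homology[g].get(f, 0.0) + context_score(g, f))
                       for f in functions]
            state[g] = rng.choices(functions, weights=weights, k=1)[0]
        if sweep >= burn_in:
            for g in genes:
                counts[g][state[g]] += 1

    kept = sweeps - burn_in
    return {g: {f: counts[g][f] / kept for f in functions} for g in genes}


# Hypothetical example: gA has strong homology to F1; gB has only weak,
# ambiguous homology but is context-linked to gA, and F1 neighbors F2.
genes = ["gA", "gB"]
functions = ["F1", "F2", "F3"]
homology = {"gA": {"F1": 3.0}, "gB": {"F2": 0.5, "F3": 0.4}}
linked = [("gA", "gB")]
adjacent = {frozenset(("F1", "F2"))}

probs = gibbs_annotate(genes, functions, homology, linked, adjacent, coupling=2.0)
for gene, dist in probs.items():
    print(gene, {f: round(p, 2) for f, p in dist.items()})
```

In this made-up example the weakly homologous gene gB is pulled toward F2 because its context partner gA is confidently assigned to the neighboring function F1, and the remaining probability mass on F1 and F3 plays the role of the "likely alternative functions" mentioned above.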
Acknowledgements
We thank R. Hüttenhain for technical assistance and measuring the peptide samples. We thank P. Eichenberger from the Biology Department at New York University for providing mutant strains. This work was supported in part by US National Institutes of Health grant GM079759 to D.V. and National Centers for Biomedical Computing grant U54CA121852 to Columbia University.
Author information
Contributions
G.P., T.-L.H. and D.V. performed computational research and data analysis. T.F. performed experimental research and analysis. D.V. directed computational research. D.V. and U.S. directed experimental research. G.P., T.F., T.-L.H. and D.V. wrote the manuscript. All authors read and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Methods and Supplementary Results (PDF 1949 kb)
Supplementary Data Set 1
Overview of detected peptides validated by PeptideProphet with 1% error rate cutoff for YkgB band (XLSX 30 kb)
About this article
Cite this article
Plata, G., Fuhrer, T., Hsiao, TL. et al. Global probabilistic annotation of metabolic networks enables enzyme discovery. Nat Chem Biol 8, 848–854 (2012). https://doi.org/10.1038/nchembio.1063
DOI: https://doi.org/10.1038/nchembio.1063
This article is cited by
- Growth promotion and antibiotic induced metabolic shifts in the chicken gut microbiome. Communications Biology (2022)
- Addressing uncertainty in genome-scale metabolic model reconstruction and analysis. Genome Biology (2021)
- Nontargeted in vitro metabolomics for high-throughput identification of novel enzymes in Escherichia coli. Nature Methods (2017)
- Long-term phenotypic evolution of bacteria. Nature (2015)
- Predicting network functions with nested patterns. Nature Communications (2014)