Genome-scale models (GEMs) of bacterial strains’ metabolism have been formulated and used over the past 20 years. Recently, as the number of available genome sequences has increased exponentially, multi-strain GEMs have proved valuable for defining the properties of a species. Here, we extend the original Protocol for generating a GEM of a single strain to multi-strain GEMs through four major stages: (i) obtain or generate a high-quality model of a reference strain; (ii) compare the genome sequences of the reference strain and the target strains to generate a homology matrix; (iii) generate draft strain-specific models from the homology matrix; and (iv) manually curate the draft models. These multi-strain GEMs can be used to study pan-metabolic capabilities and strain-specific differences across a species, thus providing insights into its range of lifestyles. Unlike the original Protocol, this procedure is scalable and can be partly automated with the Supplementary Jupyter notebook Tutorial. This Protocol Extension joins other comparable methods for generating models, such as CarveMe and KBase. Completing it takes from weeks to several months, depending on the availability of a suitable reference model.
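Stages (ii) and (iii) can be illustrated in a few lines. The example below is a minimal, stdlib-only sketch and is not the code from the Supplementary Tutorial. It makes several assumptions:

- the homology matrix stores the percent identity of each reference gene's best match in each target strain;
- reactions carry gene–protein–reaction (GPR) rules written as `and`/`or` strings, as in COBRA-format models;
- gene IDs are valid Python identifiers.

All gene, reaction and strain names are hypothetical. A gene is retained when its identity meets the threshold. A reaction is retained when its GPR rule can still be satisfied by the retained genes.

```python
import ast

# Hypothetical homology matrix: percent identity of each reference gene's best hit per strain.
HOMOLOGY: dict[str, dict[str, float]] = {
    "b0001": {"strainA": 98.5, "strainB": 45.0},
    "b0002": {"strainA": 91.2, "strainB": 88.0},
    "b0003": {"strainA": 30.0, "strainB": 95.1},
}

# Hypothetical reference reactions and their GPR rules ("" = no gene association).
REACTIONS: dict[str, str] = {
    "RXN1": "b0001 and b0002",  # enzyme complex: both subunits required
    "RXN2": "b0001 or b0003",   # isozymes: either gene suffices
    "RXN3": "b0003",
    "EX_glc": "",               # exchange reaction, kept unconditionally
}


def gpr_is_active(gpr: str, present: set[str]) -> bool:
    """Return True if the GPR rule is satisfied by the genes in `present`."""
    if not gpr.strip():
        return True

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            results = [evaluate(v) for v in node.values]
            # `and` requires every operand; `or` requires at least one.
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            return node.id in present
        raise ValueError(f"Unsupported GPR element: {ast.dump(node)}")

    return evaluate(ast.parse(gpr, mode="eval").body)


def draft_model(homology, reactions, strain, threshold=80.0):
    """Binarize one strain's column of the homology matrix, then prune reactions."""
    present = {gene for gene, row in homology.items() if row.get(strain, 0.0) >= threshold}
    kept = {rxn for rxn, gpr in reactions.items() if gpr_is_active(gpr, present)}
    return present, kept


for strain in ("strainA", "strainB"):
    genes, rxns = draft_model(HOMOLOGY, REACTIONS, strain)
    print(strain, sorted(genes), sorted(rxns))
```

In practice, a COBRA toolkit such as COBRApy would perform equivalent pruning on the full reference model. The explicit evaluator here only makes the retention rule visible. Reactions with no gene association (for example, exchange reactions) are always retained, and they are among the reactions that stage (iv) curation should review.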
Code availability statement
The source code is provided in the Supplementary Tutorial. The code in this Protocol Extension has been peer-reviewed.
This research was supported by NIH grant 1-U01-AI124316 and by the Novo Nordisk Foundation Center for Biosustainability and the Technical University of Denmark (grant NNF10CC1016517).
The authors declare no competing interests.
Key references using this protocol
Thiele, I. & Palsson, B. Ø. Nat. Protoc. 5, 93–121 (2010): https://doi.org/10.1038/nprot.2009.203
Monk, J. M. et al. Nat. Biotechnol. 35, 904–908 (2017): https://doi.org/10.1038/nbt.3956
Seif, Y. et al. Nat. Commun. 9, 3771 (2018): https://doi.org/10.1038/s41467-018-06112-5
Norsigian, C. J. et al. Front. Cell. Infect. Microbiol. 9, 161 (2019): https://doi.org/10.3389/fcimb.2019.00161
This protocol is an extension to: Nat. Protoc. doi: 10.1038/nprot.2009.203
Integrated supplementary information
The number of genes retained in each strain-specific model depends on the threshold used to binarize the homology matrix. The effect of the threshold also depends on how closely related the target strains are to the reference strain. For example, among the strains in the Supplementary Tutorial notebooks, CU651637.1 and CP002167.1 are less similar to the reference model iML1515. This is shown by their steeper drop-off in retained genes as the threshold increases. When comparing strains of the same species, we suggest a threshold of 80% to ensure that only genes with sufficient sequence similarity are included in the draft models.
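The threshold effect can be explored by binarizing the homology matrix at several cut-offs and counting the retained genes per strain. This is a minimal sketch that assumes percent-identity values are held in a nested dictionary. The gene names, strain labels and identity values are invented for illustration and do not reproduce the Supplementary Figure data.

```python
# Hypothetical percent-identity matrix: reference gene -> {strain: identity}.
HOMOLOGY = {
    "g1": {"close": 99.0, "distant": 97.0},
    "g2": {"close": 98.0, "distant": 85.0},
    "g3": {"close": 96.0, "distant": 72.0},
    "g4": {"close": 92.0, "distant": 60.0},
    "g5": {"close": 81.0, "distant": 0.0},  # 0.0 = no hit found
}


def binarize(homology, threshold=80.0):
    """1 if the gene's identity in a strain meets the threshold, else 0."""
    return {
        gene: {strain: int(identity >= threshold) for strain, identity in row.items()}
        for gene, row in homology.items()
    }


def retained_gene_counts(homology, thresholds):
    """Number of reference genes retained in each strain at each threshold."""
    strains = sorted({s for row in homology.values() for s in row})
    return {
        strain: [
            sum(1 for row in homology.values() if row.get(strain, 0.0) >= t)
            for t in thresholds
        ]
        for strain in strains
    }


thresholds = [50, 70, 80, 90]
for strain, counts in retained_gene_counts(HOMOLOGY, thresholds).items():
    print(strain, dict(zip(thresholds, counts)))
```

In this toy matrix, the less similar strain loses genes faster as the threshold rises. This is the steeper drop-off pattern described above. The closely related strain stays nearly flat until the highest cut-off.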
To investigate the effect of sequencing coverage on overall assembly statistics (N50 and number of contigs), we randomly subsampled reads from strain BOP27, which was sequenced to very high coverage (400×). Comparing the resulting assemblies at coverages from 10× to 100× shows that assembly quality largely saturates at 70×. We therefore recommend that included genomes have at least 70× coverage.
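The subsampling design and the N50 metric can be illustrated as follows. This is a minimal sketch, not the authors' pipeline. It makes three simplifying assumptions:

- reads are modeled as a plain list;
- each read is kept independently with probability target/full coverage, so 70× from a 400× dataset keeps about 17.5% of reads;
- N50 is computed from a list of contig lengths.

The assembly step itself is outside the scope of the sketch.

```python
import random


def subsample_fraction(full_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to reduce full_coverage to target_coverage."""
    if not 0 < target_coverage <= full_coverage:
        raise ValueError("target coverage must be positive and <= full coverage")
    return target_coverage / full_coverage


def subsample_reads(reads, fraction: float, seed: int = 0):
    """Keep each read independently with probability `fraction` (reproducible via seed)."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


def n50(contig_lengths) -> int:
    """Length L such that contigs of length >= L contain at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Fractions needed to sweep 10x..100x from a 400x dataset.
sweep = {c: subsample_fraction(400, c) for c in range(10, 101, 10)}
print(sweep[70])
print(n50([100, 80, 60, 40, 20]))
```

For paired-end data, the keep/discard decision should be made per read pair rather than per read, so that mates stay together. A fixed seed makes each subsampled coverage level reproducible.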
Supplementary Figs. 1 and 2, Supplementary Tables 1 and 2, and Supplementary Methods.
Three Jupyter notebooks detailing the entire Protocol Extension as laid out in the Procedure. The first covers sequence comparison to generate the homology matrix. The second covers generation of multi-strain models from the homology matrix. The third covers an initial investigation of strain-specific capabilities using the draft models.
About this article
Cite this article
Norsigian, C.J., Fang, X., Seif, Y. et al. A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat Protoc 15, 1–14 (2020). https://doi.org/10.1038/s41596-019-0254-3