A workflow for generating multi-strain genome-scale metabolic models of prokaryotes

Norsigian, Charles J.; Fang, Xin; Seif, Yara; Monk, Jonathan M.; Palsson, Bernhard O.

doi:10.1038/s41596-019-0254-3

Protocol Extension
Published: 20 December 2019

Nature Protocols volume 15, pages 1–14 (2020)

Abstract

Genome-scale models (GEMs) of bacterial strains’ metabolism have been formulated and used over the past 20 years. Recently, as the number of sequenced genomes has increased exponentially, multi-strain GEMs have proved valuable for defining the properties of a species. Here, we extend the original Protocol for generating a GEM of a single strain so that it produces multi-strain GEMs through four major stages: (i) obtain or generate a high-quality model of a reference strain; (ii) compare the genome sequences of the reference strain and the target strains to generate a homology matrix; (iii) generate draft strain-specific models from the homology matrix; and (iv) manually curate the draft models. These multi-strain GEMs can be used to study pan-metabolic capabilities and strain-specific differences across a species, thus providing insights into its range of lifestyles. Unlike the original Protocol, this procedure is scalable and can be partly automated with the Jupyter notebooks of the Supplementary Tutorial. This Protocol Extension joins other comparable model-generation methods, such as CarveMe and KBase. Completing it takes from weeks to several months, depending on the availability of a suitable reference model.
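Stages (ii) and (iii) can be illustrated with a minimal, standalone Python sketch. It assumes the homology matrix holds the percent identity (PID) of each reference gene's best hit in each target strain, binarizes that matrix at a threshold, and keeps a reference reaction in a strain's draft model only if its gene-protein-reaction (GPR) rule still evaluates to true. The gene IDs, reaction IDs, PID values and helper names (`binarize`, `eval_gpr`, `draft_reactions`) are hypothetical, and GPR rules are assumed to use lowercase `and`/`or` with gene IDs that are valid Python identifiers. A full implementation would operate on a complete model (for example with COBRApy) rather than on plain dictionaries.

```python
import ast

# Hypothetical homology matrix: PID of each reference gene's best hit per strain
# (0.0 = no hit found).
PID = {
    "b0001": {"strainA": 98.5, "strainB": 0.0},
    "b0002": {"strainA": 91.2, "strainB": 76.0},
    "b0003": {"strainA": 45.0, "strainB": 88.3},
}

# Hypothetical GPR rules copied from the reference model.
GPR = {
    "RXN1": "b0001",
    "RXN2": "b0002 and b0003",  # enzyme complex: both genes required
    "RXN3": "b0002 or b0003",   # isozymes: either gene suffices
    "RXN4": "",                 # no gene association
}


def binarize(pid, threshold=80.0):
    """Map each PID value to True (gene present) or False (gene absent)."""
    return {gene: {strain: value >= threshold for strain, value in row.items()}
            for gene, row in pid.items()}


def eval_gpr(rule, present):
    """Evaluate a boolean GPR string against the set of genes present."""
    if not rule.strip():
        return True  # assumption: reactions without genes are retained

    def visit(node):
        if isinstance(node, ast.BoolOp):
            values = [visit(child) for child in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id in present
        raise ValueError(f"unsupported GPR element: {ast.dump(node)}")

    return visit(ast.parse(rule, mode="eval").body)


def draft_reactions(pid, gpr, strain, threshold=80.0):
    """Return the sorted IDs of reference reactions kept in a strain's draft model."""
    matrix = binarize(pid, threshold)
    present = {gene for gene, row in matrix.items() if row.get(strain, False)}
    return sorted(rxn for rxn, rule in gpr.items() if eval_gpr(rule, present))


if __name__ == "__main__":
    for strain in ("strainA", "strainB"):
        print(strain, draft_reactions(PID, GPR, strain))
```

With these toy values, strainA keeps RXN1, RXN3 and RXN4; RXN2 is dropped because b0003 falls below the 80% threshold and the complex requires both genes. Keeping gene-free reactions is a simplifying assumption, and whether to keep them is a decision for the manual curation in stage (iv).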

**Fig. 1: Applications of multi-strain GEMs.**

Code availability statement

The source code is provided in the Supplementary Tutorial. The code in this Protocol Extension has been peer-reviewed.

References

Medini, D., Donati, C., Tettelin, H., Masignani, V. & Rappuoli, R. The microbial pan-genome. Curr. Opin. Genet. Dev. 15, 589–594 (2005).
Tettelin, H., Riley, D., Cattuto, C. & Medini, D. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 11, 472–477 (2008).
Rasko, D. A. et al. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J. Bacteriol. 190, 6881–6893 (2008).
Bordbar, A., Monk, J. M., King, Z. A. & Palsson, B. O. Constraint-based models predict metabolic and associated cellular functions. Nat. Rev. Genet. 15, 107–120 (2014).
O’Brien, E. J., Monk, J. M. & Palsson, B. O. Using genome-scale models to predict biological capabilities. Cell 161, 971–987 (2015).
Thiele, I. & Palsson, B. Ø. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc. 5, 93–121 (2010).
Monk, J. M. et al. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc. Natl Acad. Sci. USA 110, 20338–20343 (2013).
Fang, X. et al. Escherichia coli B2 strains prevalent in inflammatory bowel disease patients have distinct metabolic capabilities that enable colonization of intestinal mucosa. BMC Syst. Biol. 12, 66 (2018).
Seif, Y. et al. Genome-scale metabolic reconstructions of multiple Salmonella strains reveal serovar-specific metabolic traits. Nat. Commun. 9, 3771 (2018).
Norsigian, C. J., Kavvas, E., Seif, Y., Palsson, B. O. & Monk, J. M. iCN718, an updated and improved genome-scale metabolic network reconstruction of Acinetobacter baumannii AYE. Front. Genet. 9, 121 (2018).
Fouts, D. E. et al. What makes a bacterial species pathogenic?: Comparative genomic analysis of the genus Leptospira. PLoS Negl. Trop. Dis. 10, e0004403 (2016).
Bosi, E. et al. Comparative genome-scale modelling of Staphylococcus aureus strains identifies strain-specific metabolic capabilities linked to pathogenicity. Proc. Natl Acad. Sci. USA 113, E3801–E3809 (2016).
Monk, J., Nogales, J. & Palsson, B. O. Optimizing genome-scale network reconstructions. Nat. Biotechnol. 32, 447–452 (2014).
Monk, J. M. et al. iML1515, a knowledgebase that computes Escherichia coli traits. Nat. Biotechnol. 35, 904–908 (2017).
Fang, X. et al. Metagenomics-based, strain-level analysis of Escherichia coli from a time-series of microbiome samples from a Crohn’s disease patient. Front. Microbiol. 9, 2559 (2018).
King, Z. A. et al. BiGG models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515–D522 (2016).
Ganter, M., Bernard, T., Moretti, S., Stelling, J. & Pagni, M. MetaNetX.org: a website and repository for accessing, analysing and manipulating metabolic networks. Bioinformatics 29, 815–816 (2013).
Le Novère, N. et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res. 34, D689–D691 (2006).
Lieven, C. et al. Memote: a community driven effort towards a standardized genome-scale metabolic model test suite. Preprint at bioRxiv https://doi.org/10.1101/350991 (2018).
Wattam, A. R. et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 42, D581–D591 (2014).
Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 40, D13–D25 (2012).
Lewis, N. E., Nagarajan, H. & Palsson, B. O. Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10, 291–305 (2012).
Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: constraints-based reconstruction and analysis for Python. BMC Syst. Biol. 7, 74 (2013).
Heirendt, L. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat. Protoc. 14, 639–702 (2019).
Palsson, B. Ø. Systems Biology: Constraint-Based Reconstruction and Analysis (Cambridge Univ. Press, Cambridge, UK, 2015).
Becker, S. A. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat. Protoc. 2, 727–738 (2007).
Orth, J. D., Thiele, I. & Palsson, B. Ø. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
Gelius-Dietrich, G., Desouki, A. A., Fritzemeier, C. J. & Lercher, M. J. Sybil – efficient constraint-based modelling in R. BMC Syst. Biol. 7, 125 (2013).
Schellenberger, J. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 6, 1290–1307 (2011).
Orth, J. D. & Palsson, B. Ø. Systematizing the generation of missing metabolic knowledge. Biotechnol. Bioeng. 107, 403–412 (2010).
Pan, S. & Reed, J. L. Advances in gap-filling genome-scale metabolic models and model-driven experiments lead to novel metabolic discoveries. Curr. Opin. Biotechnol. 51, 103–108 (2018).
Karp, P. D., Weaver, D. & Latendresse, M. How accurate is automated gap filling of metabolic models? BMC Syst. Biol. 12, 73 (2018).
Reed, J. L. et al. Systems approach to refining genome annotation. Proc. Natl Acad. Sci. USA 103, 17480–17484 (2006).
Ekblom, R. & Wolf, J. B. W. A field guide to whole-genome sequencing, assembly and annotation. Evol. Appl. 7, 1026–1042 (2014).
Dominguez Del Angel, V. et al. Ten steps to get started in Genome Assembly and Annotation. F1000Res. 7, 148 (2018).
Magnúsdóttir, S. et al. Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota. Nat. Biotechnol. 35, 81–89 (2017).
Feist, A. M., Herrgård, M. J., Thiele, I., Reed, J. L. & Palsson, B. Ø. Reconstruction of biochemical networks in microorganisms. Nat. Rev. Microbiol. 7, 129–143 (2009).
Frey, H. C. & Patil, S. R. Identification and review of sensitivity analysis methods. Risk Anal. 22, 553–578 (2002).
Schmelling, N. Reciprocal Best Hit BLAST v1 (protocols.io.grnbv5e). protocols.io https://doi.org/10.17504/protocols.io.grnbv5e.
Hulsen, T., Huynen, M. A., de Vlieg, J. & Groenen, P. M. A. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 7, R31 (2006).
Lachance, J.-C. et al. BOFdat: generating biomass objective functions for genome-scale metabolic models from experimental data. PLoS Comput. Biol. 15, e1006971 (2019).
Edwards, D. J. & Holt, K. E. Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data. Microb. Inform. Exp. 3, 2 (2013).
Van Domselaar, G. H. et al. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res. 33, W455–W459 (2005).
Tanizawa, Y., Fujisawa, T. & Nakamura, Y. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics 34, 1037–1039 (2018).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).

Acknowledgements

This research was supported by NIH grant 1-U01-AI124316 and by the Novo Nordisk Foundation Center for Biosustainability and the Technical University of Denmark (grant NNF10CC1016517).

Author information

These authors contributed equally: Charles J. Norsigian, Xin Fang.

Authors and Affiliations

Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
Charles J. Norsigian, Xin Fang, Yara Seif, Jonathan M. Monk & Bernhard O. Palsson
Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
Bernhard O. Palsson
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
Bernhard O. Palsson

Contributions

C.J.N. and X.F. prepared the manuscript. C.J.N. and J.M.M. prepared the Supplementary Tutorial. Y.S., J.M.M. and B.O.P. reviewed and edited the manuscript.

Corresponding author

Correspondence to Bernhard O. Palsson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 Genes retained per strain at increasing percent identity (PID) thresholds.

The number of genes retained in each strain-specific model depends on the threshold used to binarize the homology matrix. The effect of the threshold also depends on how closely related the target strains are to the reference strain. For example, among the strains in the Supplementary Tutorial notebooks, CU651637.1 and CP002167.1 are more dissimilar to the reference model iML1515, as shown by their steeper drop-off in retained genes as the threshold increases. We suggest using a threshold of 80% when comparing strains of the same species, so that a gene is included in a draft model only when its similarity to the reference gene is sufficiently high.
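The threshold sweep behind this figure can be sketched by counting, for each strain, how many reference genes meet each PID cutoff. This is a hedged illustration rather than the tutorial's code: the strain names, PID values and the helper `genes_retained` are hypothetical.

```python
# Hypothetical PID values: a closely related strain and a more distant one.
PID = {
    "g1": {"close": 99.0, "distant": 92.0},
    "g2": {"close": 97.0, "distant": 81.0},
    "g3": {"close": 95.0, "distant": 70.0},
    "g4": {"close": 90.0, "distant": 55.0},
}


def genes_retained(pid, thresholds):
    """For each strain, count the genes whose PID is at or above each threshold."""
    strains = sorted({strain for row in pid.values() for strain in row})
    return {
        strain: [sum(1 for row in pid.values() if row.get(strain, 0.0) >= t)
                 for t in thresholds]
        for strain in strains
    }


if __name__ == "__main__":
    thresholds = list(range(50, 101, 10))  # 50%, 60%, ..., 100%
    for strain, counts in genes_retained(PID, thresholds).items():
        print(strain, dict(zip(thresholds, counts)))
```

In this toy example the distant strain loses genes at nearly every step (4, 3, 3, 2, 1, 0 genes from 50% to 100%), while the close strain keeps all four genes up to 90%. At the suggested 80% cutoff, the two strains retain four and two genes, respectively.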

Supplementary Fig. 2 Resulting assembly statistics at various coverages.

To investigate the effect of sequencing coverage on two assembly statistics, N50 and number of contigs, we randomly subsampled reads from the BOP27 strain, whose extremely high coverage (400×) made this analysis possible. Comparing these metrics across assemblies at coverages from 10× to 100× shows that assembly quality largely saturates at 70×; we therefore recommend that included genomes have at least 70× coverage.
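The two quantities in this analysis can be sketched in Python: the N50 of an assembly, and the fraction of reads to keep when subsampling to a lower coverage. The legend does not state how reads were sampled, so this sketch assumes uniform, independent sampling, under which expected coverage scales linearly with the fraction kept (for paired-end data, mates would need to be sampled together). The function names are hypothetical, and an assembler would still be needed to turn the subsampled reads into contigs.

```python
import random


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L cover half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def subsample_fraction(full_coverage, target_coverage):
    """Fraction of reads to keep to reach the target coverage (uniform sampling)."""
    if not 0 < target_coverage <= full_coverage:
        raise ValueError("target coverage must be in (0, full_coverage]")
    return target_coverage / full_coverage


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    print(n50([100, 200, 300, 400, 500]))
    print(subsample_fraction(400, 70))
```

Going from 400× to 70× keeps 70/400 = 17.5% of the reads in expectation; repeating this for each target coverage from 10× to 100× and assembling each subsample yields the series of N50 and contig counts to compare.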

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2, Supplementary Tables 1 and 2, and Supplementary Methods.

Reporting Summary

Supplementary Tutorial

Three Jupyter notebooks detailing the entire Protocol Extension as laid out in the Procedure. The first details the sequence comparison used to generate the homology matrix, the second details the generation of multi-strain models from the homology matrix, and the third details an initial investigation of strain-specific capabilities using the draft models.

About this article

Cite this article

Norsigian, C.J., Fang, X., Seif, Y. et al. A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat Protoc 15, 1–14 (2020). https://doi.org/10.1038/s41596-019-0254-3

Received: 16 May 2019
Accepted: 08 October 2019
Published: 20 December 2019
Issue Date: January 2020
DOI: https://doi.org/10.1038/s41596-019-0254-3