
A workflow for generating multi-strain genome-scale metabolic models of prokaryotes

Abstract

Genome-scale models (GEMs) of bacterial strains' metabolism have been formulated and used over the past 20 years. Recently, with the number of genome sequences increasing exponentially, multi-strain GEMs have proved valuable for defining the properties of a species. Here, we extend the original Protocol for generating a GEM of a single strain to multi-strain GEMs in four major stages: (i) obtain or generate a high-quality model of a reference strain; (ii) compare the genome sequences of the reference strain and the target strains to generate a homology matrix; (iii) generate draft strain-specific models from the homology matrix; and (iv) manually curate the draft models. The resulting multi-strain GEMs can be used to study pan-metabolic capabilities and strain-specific differences across a species, thus providing insights into its range of lifestyles. Unlike the original Protocol, this procedure is scalable and can be partly automated with the Supplementary Jupyter notebook Tutorial. This Protocol Extension joins other comparable methods for generating models, such as CarveMe and KBase. Depending on whether a suitable reference model is already available, the extension takes from weeks to several months to complete.
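Stage (iii) can be illustrated with a minimal sketch, assuming that gene-protein-reaction (GPR) rules are stored as plain strings of `and`/`or` over gene IDs that are valid Python identifiers (such as E. coli b-numbers). A reference reaction is kept in a strain's draft model only if its GPR still evaluates to true when genes lacking a homolog in that strain are treated as absent. The function names and toy reactions are hypothetical; in practice this step would be applied to a full reference model, for example with COBRApy, and followed by the manual curation of stage (iv).

```python
import ast


def gpr_is_active(gpr: str, present: set[str]) -> bool:
    """Evaluate a GPR string such as 'b0001 and (b0002 or b0003)'.

    Genes in `present` count as True; every other gene counts as False.
    Assumes gene IDs are valid Python identifiers.
    """
    if not gpr.strip():
        # Reactions without a GPR (e.g. exchange reactions) are kept.
        return True

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            # 'and' = enzyme complex (all genes needed); 'or' = isozymes (any gene).
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id in present
        raise ValueError(f"Unsupported GPR element: {ast.dump(node)}")

    return evaluate(ast.parse(gpr, mode="eval").body)


def draft_reactions(reference_gprs: dict[str, str], present: set[str]) -> set[str]:
    """Return the IDs of reference reactions retained in a strain's draft model."""
    return {rxn for rxn, gpr in reference_gprs.items() if gpr_is_active(gpr, present)}


# Toy reference model: reaction ID -> GPR rule (hypothetical examples).
reference_gprs = {
    "RXN_A": "b0001 and b0002",  # complex: both subunits required
    "RXN_B": "b0003 or b0004",   # isozymes: either gene suffices
    "RXN_C": "b0005",
    "EX_glc": "",                # exchange reaction without a GPR
}

# Reference genes that have a homolog in the target strain (from the binarized matrix).
target_genes = {"b0001", "b0002", "b0004"}

print(sorted(draft_reactions(reference_gprs, target_genes)))
# ['EX_glc', 'RXN_A', 'RXN_B']
```

RXN_C is dropped because its only gene has no homolog in the target strain, while RXN_B survives through its isozyme.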

Fig. 1: Applications of multi-strain GEMs.
Fig. 2

Code availability statement

The source code is provided in the Supplementary Tutorial. The code in this Protocol Extension has been peer-reviewed.

Acknowledgements

This research was supported by NIH grant 1-U01-AI124316 and by the Novo Nordisk Foundation Center for Biosustainability and the Technical University of Denmark (grant NNF10CC1016517).

Author information

Contributions

C.J.N. and X.F. prepared the manuscript. C.J.N. and J.M.M. prepared the supplementary tutorial. Y.S., J.M.M. and B.O.P. reviewed and edited the manuscript.

Corresponding author

Correspondence to Bernhard O. Palsson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Thiele, I. & Palsson, B. Ø. Nat. Protoc. 5, 93–121 (2010): https://doi.org/10.1038/nprot.2009.203

Monk, J. M. et al. Nat. Biotechnol. 35, 904–908 (2017): https://doi.org/10.1038/nbt.3956

Seif, Y. et al. Nat. Commun. 9, 3771 (2018): https://doi.org/10.1038/s41467-018-06112-5

Norsigian, C. J. et al. Front. Cell. Infect. Microbiol. 9, 161 (2019): https://doi.org/10.3389/fcimb.2019.00161

This protocol is an extension to: Nat. Protoc. doi: 10.1038/nprot.2009.203

Integrated supplementary information

Supplementary Fig. 1 Genes retained per strain at increasing percent identity (PID) thresholds.

The number of genes retained in each strain-specific model depends on the PID threshold used to binarize the homology matrix. The effect of the threshold also depends on how closely related the target strains are to the reference strain. For example, among the strains in the Supplementary Tutorial notebooks, CU651637.1 and CP002167.1 are less similar to the reference model iML1515, as shown by their steeper drop-off in retained genes. We suggest a threshold of 80% when comparing strains of the same species, so that a gene is included in a draft model only if its homolog is sufficiently similar.
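The binarization step and the threshold sweep summarized in Supplementary Fig. 1 can be sketched as follows, assuming the homology matrix stores, for each reference gene and target strain, the PID of the best homolog (0 where none was found) and that a gene is kept when its PID is at or above the threshold (whether the cut-off is inclusive is an assumption). The strain names and PID values are made up for illustration and are not the data behind the figure; the function names are hypothetical.

```python
# Hypothetical homology matrix: reference gene -> {target strain -> best-hit PID (%)}.
# A PID of 0.0 means no homolog was detected in that strain.
homology = {
    "b0001": {"strain_1": 99.5, "strain_2": 85.0},
    "b0002": {"strain_1": 97.0, "strain_2": 72.0},
    "b0003": {"strain_1": 0.0, "strain_2": 0.0},
    "b0004": {"strain_1": 91.0, "strain_2": 79.9},
}


def binarize(matrix, threshold):
    """Return {strain: set of reference genes with PID >= threshold}."""
    strains = {s for row in matrix.values() for s in row}
    return {
        s: {gene for gene, row in matrix.items() if row.get(s, 0.0) >= threshold}
        for s in sorted(strains)
    }


def genes_retained(matrix, thresholds):
    """Count the genes kept per strain at each threshold."""
    return {t: {s: len(genes) for s, genes in binarize(matrix, t).items()} for t in thresholds}


for t, counts in genes_retained(homology, [70, 80, 90]).items():
    print(t, counts)
# 70 {'strain_1': 3, 'strain_2': 3}
# 80 {'strain_1': 3, 'strain_2': 1}
# 90 {'strain_1': 3, 'strain_2': 0}
```

In this toy example, strain_2 behaves like the more distant strains in the figure: its gene count falls off steeply as the threshold rises, whereas the closely related strain_1 is unaffected.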

Supplementary Fig. 2 Resulting assembly statistics at various coverages.

To investigate how sequencing coverage affects assembly statistics (N50 and number of contigs), we randomly subsampled reads from strain BOP27, which was sequenced to very high coverage (400×). Comparing assemblies at coverages from 10× to 100× shows that assembly quality largely saturates at about 70×; we therefore recommend that included genomes have at least 70× coverage.
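The two quantities in this analysis can be made concrete with a short sketch. N50 is the length of the shortest contig such that contigs of that length or longer together cover at least half of the total assembly length. Subsampling a 400× read set down to a target coverage can be approximated by keeping each read with probability target/400; this simple Bernoulli scheme is an assumption, not necessarily how the reads were sampled here. The contig lengths and function names are hypothetical, and a real analysis would use the assembler's output and a dedicated read-sampling tool.

```python
import random


def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def subsample_fraction(full_coverage, target_coverage):
    """Fraction of reads to keep to go from full_coverage to target_coverage."""
    if not 0 < target_coverage <= full_coverage:
        raise ValueError("target coverage must be in (0, full_coverage]")
    return target_coverage / full_coverage


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (reproducible via seed)."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


contigs = [500_000, 1_200_000, 300_000, 2_000_000, 100_000]
print(n50(contigs))                 # 1200000
print(subsample_fraction(400, 70))  # 0.175
```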

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2, Supplementary Tables 1 and 2, and Supplementary Methods.

Reporting Summary

Supplementary Tutorial

Three Jupyter notebooks covering the entire Protocol Extension as laid out in the Procedure. The first performs the sequence comparison that generates the homology matrix, the second generates the multi-strain models from the homology matrix, and the third begins the investigation of strain-specific capabilities using the draft models.
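The sequence comparison behind the homology matrix can be sketched as follows. The sketch shows one common way to pair reference genes with their counterparts in a target strain, not the notebooks' exact code: run protein searches in both directions (reference against target and target against reference), keep only reciprocal best hits, and record the PID of each pair. The `Hit` tuples stand in for parsed BLAST tabular output, and all gene and protein IDs are hypothetical.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pid: float       # percent identity of the alignment
    bitscore: float


def best_hits(hits: Iterable[Hit]) -> dict[str, Hit]:
    """Keep the highest-bitscore hit for each query sequence."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


def rbh_column(reference_genes, forward, reverse) -> dict[str, float]:
    """One homology-matrix column: reference gene -> PID of its reciprocal best hit (0.0 if none)."""
    fwd = best_hits(forward)  # reference gene -> best target protein
    rev = best_hits(reverse)  # target protein -> best reference gene
    column = {}
    for gene in reference_genes:
        hit = fwd.get(gene)
        back = rev.get(hit.subject) if hit is not None else None
        column[gene] = hit.pid if back is not None and back.subject == gene else 0.0
    return column


forward = [
    Hit("b0001", "t_100", 98.0, 900.0),
    Hit("b0001", "t_200", 60.0, 300.0),
    Hit("b0002", "t_200", 75.0, 500.0),
    Hit("b0003", "t_200", 88.0, 700.0),
]
reverse = [
    Hit("t_100", "b0001", 98.0, 900.0),
    Hit("t_200", "b0003", 88.0, 700.0),  # t_200's best match is b0003, not b0002
    Hit("t_200", "b0002", 75.0, 500.0),
]
print(rbh_column(["b0001", "b0002", "b0003", "b0004"], forward, reverse))
# {'b0001': 98.0, 'b0002': 0.0, 'b0003': 88.0, 'b0004': 0.0}
```

Computing one such column per target strain gives the full homology matrix, which is then binarized at the chosen PID threshold. Requiring reciprocity keeps b0002 from being matched to a protein whose best counterpart is another reference gene.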

About this article

Cite this article

Norsigian, C.J., Fang, X., Seif, Y. et al. A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat Protoc 15, 1–14 (2020). https://doi.org/10.1038/s41596-019-0254-3

  • DOI: https://doi.org/10.1038/s41596-019-0254-3
