  • Protocol

Defining informative priors for ensemble modeling in systems biology

Abstract

Ensemble modeling in molecular systems biology requires the reproducible translation of kinetic parameter data into informative probability distributions (priors), as well as approaches that sample parameters from these distributions without violating the thermodynamic consistency of the overall model. Although a number of pioneering frameworks for ensemble modeling have been published, the issue of generating informative priors has not yet been addressed. Here, we present a protocol that aims to fill this gap. This protocol discusses the collection of parameter values from a diverse range of sources (literature, databases and experiments), assessment of their plausibility, and creation of log-normal probability distributions that can be used as informative priors in ensemble modeling. Furthermore, the protocol enables sampling from the generated distributions while maintaining thermodynamic consistency. Once all parameter values have been retrieved from literature and databases, the protocol can be implemented within ~5–10 min per parameter. The aim of this protocol is to facilitate the design and use of informative distributions for ensemble modeling, especially in fields such as synthetic biology and systems medicine.
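As a rough illustration of the sampling requirement described in the abstract, the Python sketch below draws two binding parameters from log-normal distributions and derives the third from the relation K_D = k_off / k_on, so that every sampled triple satisfies the constraint by construction. The function name, the location and scale values, and the choice to derive k_off rather than sample it are assumptions made for this example; the protocol's own MATLAB functions are not reproduced here and may treat interdependent parameters differently.

```python
import math
import random


def sample_binding_parameters(n, mu_kd, sigma_kd, mu_kon, sigma_kon, seed=0):
    """Draw n (K_D, k_on, k_off) triples that satisfy K_D = k_off / k_on.

    mu_* and sigma_* are the mean and standard deviation of the natural
    logarithm of each parameter (the usual log-normal parameterization).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(n):
        kd = rng.lognormvariate(mu_kd, sigma_kd)
        kon = rng.lognormvariate(mu_kon, sigma_kon)
        # Assumption of this sketch: k_off is not sampled independently but
        # computed from the other two, which enforces the constraint exactly.
        koff = kd * kon
        samples.append((kd, kon, koff))
    return samples


# Illustrative (made-up) values, not taken from the protocol.
samples = sample_binding_parameters(
    1000, mu_kd=math.log(1e-3), sigma_kd=0.5, mu_kon=math.log(1e2), sigma_kon=1.0
)
for kd, kon, koff in samples[:3]:
    print(f"KD={kd:.3g}  kon={kon:.3g}  koff={koff:.3g}  koff/kon={koff / kon:.3g}")
```

Because ln k_off = ln K_D + ln k_on, the derived parameter is itself log-normally distributed when the two sampled parameters are independent, with log-scale mean μ_KD + μ_kon and log-scale variance σ_KD² + σ_kon².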

Fig. 1: Workflow of the protocol from parameter collection to generation of probability distributions and confirmation of thermodynamic consistency for interconnected parameters.
Fig. 2: Calculation of the weighted median of two unitless parameter values (modes), \(10^3\) and \(10^6\) (log-transformed to 6.9078 and 13.82, respectively), with varying weights and uncertainty.
Fig. 3
Fig. 4
Fig. 5: Estimated probability distributions for parameters \(K_{D1}\) and \(k_1^-\), plotted on a log scale.
Fig. 6: The bivariate distribution for parameters \(k_1\) and \(k_{on1}\) and the two marginal distributions.
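
The weighted median named in the Fig. 2 caption can be sketched in Python as follows. This is the standard textbook definition applied to log-transformed values, using the two example modes from the caption; it ignores the per-value uncertainty that the caption also mentions, and the tie-handling rule (averaging the two neighbouring values when the cumulative weight is exactly one half) is an assumption of this sketch rather than a description of the protocol's implementation.

```python
import math


def weighted_median(values, weights):
    """Return the weighted median of values (all weights must be positive)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for i, (value, weight) in enumerate(pairs):
        cumulative += weight
        if math.isclose(cumulative, half):
            # Exactly half the weight lies on each side: average the neighbours.
            return 0.5 * (value + pairs[i + 1][0])
        if cumulative > half:
            return value


# The two modes from the Fig. 2 caption, transformed to natural-log space.
log_values = [math.log(1e3), math.log(1e6)]

for w in [(1, 1), (3, 1), (1, 3)]:
    median_log = weighted_median(log_values, w)
    print(w, round(median_log, 4), math.exp(median_log))
```

With equal weights the result is the midpoint in log space, which corresponds to \(10^{4.5}\) on the original scale; as soon as one value carries more than half of the total weight, the weighted median equals that value.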

Acknowledgements

We thank F. Del Carratore and the Synthetic Biology Research Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM) for providing technical support. This work received funding from the UK Biotechnology and Biological Sciences Research Council (BB/M000354/1, BB/M017702/1 (R.B.)) and the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement no. 720793, the H2020 TOPCAPI project (R.B.)).

Author information

Contributions

A.T. designed and developed the mathematical strategy and the MATLAB functions of the protocol and wrote the manuscript. A.H. designed the first step of the protocol, concerning the criteria of the assignment of weights; performed tests with diverse case studies; and provided feedback for the improvement of the protocol and the manuscript. M.U. tested the protocol and provided insightful advice on the improvement of the manuscript and the computational functions. R.B. supervised the project and wrote the manuscript.

Corresponding author

Correspondence to Rainer Breitling.

Ethics declarations

Competing interests

The authors declare that they have no competing interests as defined by Nature Research, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

1. Achcar, F. et al. PLOS Comput. Biol. 8, e1002352 (2012): https://doi.org/10.1371/journal.pcbi.1002352

2. Achcar, F., Barrett, M. P. & Breitling, R. FEBS J. 280, 4640–4651 (2013): https://doi.org/10.1111/febs.12436

3. Tsigkinopoulou, A., Baker, S. M. & Breitling, R. Trends Biotechnol. 35, 518–529 (2017): https://doi.org/10.1016/j.tibtech.2016.12.008

Integrated supplementary information

Supplementary Figure 1 Properties of the standard normal and log-normal distributions (μ = 0 and σ = 1).

For the normal distribution, the standard deviation (σ) is additive, and 68.27% of the probability density is contained within the confidence interval [μ−σ, μ+σ]. For the log-normal distribution, the geometric standard deviation is multiplicative and describes a confidence interval around the geometric mean of the distribution (which equals its median) that contains 68.27% of the probability density. The Spread (or multiplicative standard deviation) describes the confidence interval around the mode of the distribution that contains the same fraction of the density. The geometric standard deviation and the Spread are equally valid ways to describe our uncertainty about a parameter, and each has its advantages for some applications. For the protocol, the main advantage of using the Mode and the Spread is that the Spread is symmetric around the most likely value (mode), in the same way as the standard deviation of a normal distribution (i.e., the probability density at each endpoint of the interval is identical). This is not the case for the geometric standard deviation, as shown in the figure. As a result, it is more intuitive to specify and communicate our uncertainty about a parameter by using the confidence interval around the mode, rather than the one around the median. As can be seen in the figure, the most likely parameter values might not even be included in the confidence interval around the median, which is clearly undesirable when specifying the range of plausible values.
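
The relationship between these quantities can be checked numerically. The Python sketch below uses the standard log-normal distribution from the figure (μ = 0, σ = 1), computes the mode and the geometric standard deviation from their closed forms, and finds the Spread by bisection as the factor s for which the interval [mode/s, mode·s] holds the same probability mass as [μ−σ, μ+σ] of a normal distribution. The function names and the bisection approach are assumptions of this sketch; the protocol may compute the Spread differently.

```python
import math


def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mass_around_mode(spread, mu, sigma):
    """Probability mass inside [mode / spread, mode * spread]."""
    mode = math.exp(mu - sigma ** 2)
    return lognormal_cdf(mode * spread, mu, sigma) - lognormal_cdf(mode / spread, mu, sigma)


def solve_spread(mu, sigma, target=math.erf(1.0 / math.sqrt(2.0))):
    """Find the Spread by bisection on a log scale (target is about 0.6827)."""
    lo, hi = 1.0, 10.0
    while mass_around_mode(hi, mu, sigma) < target:
        hi *= 10.0  # widen the bracket until it contains the solution
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if mass_around_mode(mid, mu, sigma) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


mu, sigma = 0.0, 1.0
median = math.exp(mu)             # geometric mean and median coincide
geometric_sd = math.exp(sigma)    # multiplicative factor around the median
mode = math.exp(mu - sigma ** 2)  # most likely value
spread = solve_spread(mu, sigma)

print("interval around median:", median / geometric_sd, median * geometric_sd)
print("interval around mode:  ", mode / spread, mode * spread)
print("pdf at mode-interval endpoints:",
      lognormal_pdf(mode / spread, mu, sigma), lognormal_pdf(mode * spread, mu, sigma))
print("pdf at median-interval endpoints:",
      lognormal_pdf(median / geometric_sd, mu, sigma),
      lognormal_pdf(median * geometric_sd, mu, sigma))
```

For σ = 1 the mode, exp(μ − σ²), sits exactly on the lower endpoint of the interval around the median, exp(μ − σ); for any σ > 1 it falls outside that interval, which is the situation the legend describes as undesirable.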

Supplementary Figure 2 Schematic representation of the model.

Blue arrows correspond to the maintenance of the ATP/ADP ratio by direct assignment (in combination with reaction 13), rather than by differential equations, in the published model. Likewise, the cytosolic glycerol levels are kept at zero by direct assignment, corresponding to rapid export of glycerol.

Supplementary Figure 3 Effect of reducing TPI on the steady-state flux of glucose, pyruvate and glycerol.

Replicated results matching Fig. 3b of the published model (Helfert et al., Biochem. J. (2001)).

Supplementary Figure 4 Plots of the initial priors (red lines) and the samples from the final distributions (green histograms), along with the P values of the K-S test.

For the parameter Km+ the adjusted distribution is also included (blue line).
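
A comparison of this kind can be sketched in Python with a one-sample Kolmogorov-Smirnov (K-S) test of sampled values against a log-normal prior. The legend does not state which K-S variant or software was used, so the one-sample form, the asymptotic P value formula with its small-sample correction, and all names and numeric values below are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal with log-scale mean mu and standard deviation sigma."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def ks_statistic(samples, cdf):
    """Largest distance between the empirical CDF of samples and cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov P value with a common small-sample correction."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; P is ~1
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


# Illustrative prior and samples (made-up values, fixed seed).
mu, sigma = math.log(50.0), 0.8
rng = random.Random(1)
draws = [rng.lognormvariate(mu, sigma) for _ in range(500)]

d = ks_statistic(draws, lambda x: lognormal_cdf(x, mu, sigma))
print("D =", d, " P =", ks_pvalue(d, len(draws)))
```

A large P value indicates that the sampled values are compatible with the prior, whereas a small one suggests that the final distribution has moved away from it.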

Supplementary Figure 5

Pairwise correlations between the sampled parameter values.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Software 1–7, Supplementary Results 1 and 2, and Supplementary Tables 1–7

Supplementary Software 8

Supplementary Software scripts

About this article

Cite this article

Tsigkinopoulou, A., Hawari, A., Uttley, M. et al. Defining informative priors for ensemble modeling in systems biology. Nat Protoc 13, 2643–2663 (2018). https://doi.org/10.1038/s41596-018-0056-z

  • DOI: https://doi.org/10.1038/s41596-018-0056-z
