Ensemble modeling in molecular systems biology requires the reproducible translation of kinetic parameter data into informative probability distributions (priors), as well as approaches that sample parameters from these distributions without violating the thermodynamic consistency of the overall model. Although a number of pioneering frameworks for ensemble modeling have been published, the issue of generating informative priors has not yet been addressed. Here, we present a protocol that aims to fill this gap. This protocol discusses the collection of parameter values from a diverse range of sources (literature, databases and experiments), assessment of their plausibility, and creation of log-normal probability distributions that can be used as informative priors in ensemble modeling. Furthermore, the protocol enables sampling from the generated distributions while maintaining thermodynamic consistency. Once all parameter values have been retrieved from literature and databases, the protocol can be implemented within ~5–10 min per parameter. The aim of this protocol is to facilitate the design and use of informative distributions for ensemble modeling, especially in fields such as synthetic biology and systems medicine.
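The abstract says parameters are sampled without violating thermodynamic consistency but does not give the procedure here. The sketch below is a simplified stand-in, not the protocol's own method. For a single reversible uni-uni Michaelis–Menten reaction, it samples kcat_f, Km_S and Km_P from log-normal priors. It then computes kcat_r from the Haldane relationship K_eq = (kcat_f · Km_P)/(kcat_r · Km_S), so every sampled set is consistent. The function names, the equilibrium constant and the prior values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_haldane_consistent(n, keq, kcat_f, km_s, km_p, seed=0):
    """Draw n parameter sets for a reversible uni-uni Michaelis-Menten reaction.

    kcat_f, km_s and km_p are (mu, sigma) pairs of log-normal priors, given in
    natural-log space (hypothetical example values below). kcat_r is not sampled
    independently: it is computed from the Haldane relationship
    Keq = (kcat_f * km_p) / (kcat_r * km_s), so every set satisfies it exactly.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        kf = rng.lognormvariate(*kcat_f)
        ks = rng.lognormvariate(*km_s)
        kp = rng.lognormvariate(*km_p)
        kr = kf * kp / (keq * ks)  # dependent parameter fixed by thermodynamics
        draws.append({"kcat_f": kf, "km_s": ks, "km_p": kp, "kcat_r": kr})
    return draws


def implied_kcat_r_prior(keq, kcat_f, km_s, km_p):
    """(mu, sigma) of the log-normal that kcat_r follows under this scheme.

    ln kcat_r = ln kcat_f + ln km_p - ln km_s - ln Keq is a sum of independent
    normal variables, so kcat_r is itself log-normal and the log-variances add.
    """
    mu = kcat_f[0] + km_p[0] - km_s[0] - math.log(keq)
    sigma = math.sqrt(kcat_f[1] ** 2 + km_s[1] ** 2 + km_p[1] ** 2)
    return mu, sigma


# Hypothetical priors: modes and spreads are illustrative, not from the protocol.
priors = dict(kcat_f=(math.log(100.0), 0.5), km_s=(math.log(0.1), 0.3), km_p=(math.log(1.0), 0.3))
for draw in sample_haldane_consistent(3, keq=10.0, **priors):
    print(draw)
print(implied_kcat_r_prior(10.0, **priors))
```

In this design kcat_r is no longer free. Its distribution is fully determined by the other three priors, and it is wider in log space than any of them because the log-variances add. That distribution may also conflict with independent information about kcat_r.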
We thank F. Del Carratore and the Synthetic Biology Research Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM) for providing technical support. This work received funding from the UK Biotechnology and Biological Sciences Research Council (BB/M000354/1, BB/M017702/1 (R.B.)) and the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement no. 720793, the H2020 TOPCAPI project (R.B.)).
The authors declare that they have no competing interests as defined by Nature Research, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Key references using this protocol
1. Achcar, F. et al. PLOS Comput. Biol. 8, e1002352 (2012): https://doi.org/10.1371/journal.pcbi.1002352
2. Achcar, F., Barrett, M. P. & Breitling, R. FEBS J. 280, 4640–4651 (2013): https://doi.org/10.1111/febs.12436
3. Tsigkinopoulou, A., Baker, S. M. & Breitling, R. Trends Biotechnol. 35, 518–529 (2017): https://doi.org/10.1016/j.tibtech.2016.12.008
Integrated supplementary information
Supplementary Figure 1 Properties of the standard normal and log-normal distributions (μ = 0 and σ = 1).
For the normal distribution, the standard deviation (σ) is additive, and 68.27% of the probability mass is contained within the confidence interval [μ−σ, μ+σ]. For the log-normal distribution, the geometric standard deviation is multiplicative. It describes a confidence interval around the geometric mean of the distribution (which, for a log-normal distribution, equals the median), and that interval contains 68.27% of the probability mass. The Spread (or multiplicative standard deviation) describes the confidence interval around the mode of the distribution that contains the same fraction of the probability mass. The geometric standard deviation and the Spread are equally valid ways to describe our uncertainty about a parameter, and each has advantages for particular applications. For the protocol, the main advantage of using the Mode and the Spread is that the Spread is symmetric around the most likely value (the mode), in the same way as the standard deviation of a normal distribution: the probability density at the two endpoints of the interval is identical. As the figure shows, this is not the case for the geometric standard deviation. It is therefore more intuitive to specify and communicate our uncertainty about a parameter by using the confidence interval around the mode rather than the one around the median. The figure also shows that the most likely parameter values might not even be included in the confidence interval around the median, which is clearly undesirable when specifying the range of plausible values.
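The caption defines the Spread through a 68.27% interval around the mode, with equal densities at both ends. The sketch below assumes that interval is [Mode/Spread, Mode × Spread]; this is our reading of the caption, not a formula quoted from the protocol. Under that assumption, converting a Mode and Spread into the usual log-normal parameters μ and σ needs one numerical solve, because μ cancels out of the interval probability. The code bisects on σ, sets μ from the mode, and then checks both properties stated in the caption. All function names are hypothetical.

```python
import math

# Probability mass within one standard deviation of a normal mean (about 68.27%).
TARGET = math.erf(1.0 / math.sqrt(2.0))


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_pdf(x, mu, sigma):
    return math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)) / (x * sigma * math.sqrt(2.0 * math.pi))


def mass_around_mode(sigma, spread):
    """P(mode/spread <= X <= mode*spread) for a log-normal X with log-sd sigma.

    In log space the mode sits at mu - sigma**2, so the interval is
    [mu - sigma**2 - a, mu - sigma**2 + a] with a = ln(spread); mu cancels.
    """
    a = math.log(spread)
    return norm_cdf((a - sigma ** 2) / sigma) - norm_cdf((-a - sigma ** 2) / sigma)


def lognormal_from_mode_spread(mode, spread, tol=1e-12):
    """Return (mu, sigma) with the given mode and ~68.27% of mass in [mode/spread, mode*spread]."""
    if mode <= 0 or spread <= 1:
        raise ValueError("mode must be positive and spread greater than 1")
    lo, hi = 1e-9, 1.0
    while mass_around_mode(hi, spread) > TARGET:  # mass shrinks as sigma grows
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:  # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if mass_around_mode(mid, spread) > TARGET:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    mu = math.log(mode) + sigma ** 2  # because mode = exp(mu - sigma**2)
    return mu, sigma


mu, sigma = lognormal_from_mode_spread(mode=2.0, spread=3.0)
# Interval around the mode: equal densities at both endpoints.
print(lognormal_pdf(2.0 / 3.0, mu, sigma), lognormal_pdf(6.0, mu, sigma))
# Interval around the median (geometric SD): density ratio is exp(2*sigma), not 1.
print(lognormal_pdf(math.exp(mu - sigma), mu, sigma) / lognormal_pdf(math.exp(mu + sigma), mu, sigma))
```

Bisection is used because the interval probability decreases monotonically as σ grows, so a bracketing search cannot miss the root. Any one-dimensional root finder would work equally well.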
Blue arrows correspond to the maintenance of the ATP/ADP ratio by direct assignment (in combination with reaction 13), rather than by differential equations, in the published model. Likewise, the cytosolic glycerol levels are kept at zero by direct assignment, corresponding to rapid export of glycerol.
Supplementary Figure 3 Effect of reducing TPI on the steady-state flux of glucose, pyruvate and glycerol.
Replicated results matching Fig. 3b of the publication describing the model (Helfert et al., Biochem. J. (2001)).
Supplementary Figure 4 Plots of the initial priors (red lines) and the samples from the final distributions (green histograms), along with the P values of the K-S test.
For the parameter Km+ the adjusted distribution is also included (blue line).
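The caption reports P values of a Kolmogorov–Smirnov (K-S) test comparing the final samples with the initial priors. The source does not say which implementation was used, so the sketch below is illustrative only. It computes a one-sample K-S statistic against a log-normal prior using the standard library. The P value comes from the asymptotic Kolmogorov series with a commonly used small-sample correction to √n. The prior, the sample sets and the function names are hypothetical.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n):
    """Approximate P value from the asymptotic Kolmogorov distribution."""
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d
    if lam < 0.2:
        return 1.0  # the series is effectively 1 here and converges slowly
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return min(max(p, 0.0), 1.0)


# Hypothetical prior, and two sets of "final" samples to compare against it.
mu, sigma = math.log(0.5), 1.0


def prior_cdf(x):
    return lognormal_cdf(x, mu, sigma)


rng = random.Random(42)
same = [rng.lognormvariate(mu, sigma) for _ in range(2000)]
shifted = [rng.lognormvariate(mu + 0.5, sigma) for _ in range(2000)]
for name, s in (("same", same), ("shifted", shifted)):
    d = ks_statistic(s, prior_cdf)
    print(name, round(d, 4), ks_pvalue(d, len(s)))
```

For small samples, the asymptotic P value is only approximate. A library routine such as SciPy's `scipy.stats.kstest` is the more practical choice there.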
Pairwise correlations between the sampled parameter values.
About this article
Cite this article
Tsigkinopoulou, A., Hawari, A., Uttley, M. et al. Defining informative priors for ensemble modeling in systems biology. Nat Protoc 13, 2643–2663 (2018). https://doi.org/10.1038/s41596-018-0056-z