  • Brief Communication
EnzymeML: seamless data flow and modeling of enzymatic data

Abstract

The design of biocatalytic reaction systems is highly complex because the estimated kinetic parameters depend on the enzyme, the reaction conditions, and the modeling method. Consequently, reproducing enzymatic experiments and reusing enzymatic data are challenging. We developed the XML-based markup language EnzymeML to enable storage and exchange of enzymatic data such as reaction conditions, the time course of the substrate and the product, kinetic parameters and the kinetic model, thus making enzymatic data findable, accessible, interoperable and reusable (FAIR). The feasibility and usefulness of the EnzymeML toolbox are demonstrated in six scenarios, for which data and metadata of different enzymatic reactions are collected and analyzed. EnzymeML serves as a seamless communication channel between experimental platforms, electronic lab notebooks, tools for modeling of enzyme kinetics, publication platforms and enzymatic reaction databases. EnzymeML is open and transparent, and invites the community to contribute. All documents and code are freely available at https://enzymeml.org.
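
To make the idea of a single exchangeable document concrete, the following minimal Python sketch bundles the kinds of data named above (reaction conditions, a substrate time course, a kinetic model and its parameters) into one XML string and reads it back. It is an illustration only: the element and attribute names, the `EnzymaticRecord` class and the example values are all hypothetical and do not reproduce the actual EnzymeML schema or any EnzymeML tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class EnzymaticRecord:
    """Hypothetical container for one experiment; NOT the EnzymeML schema."""
    enzyme: str
    temperature_c: float
    ph: float
    times_s: list
    substrate_mm: list
    kinetic_model: str = "v = vmax * S / (km + S)"
    parameters: dict = field(default_factory=dict)


def to_xml(record):
    """Serialize a record to an XML string (illustrative tag names)."""
    root = ET.Element("record", enzyme=record.enzyme)
    ET.SubElement(root, "conditions",
                  temperature_c=str(record.temperature_c), ph=str(record.ph))
    course = ET.SubElement(root, "timeCourse", species="substrate", unit="mM")
    for t, s in zip(record.times_s, record.substrate_mm):
        ET.SubElement(course, "point", t=str(t), value=str(s))
    model = ET.SubElement(root, "kineticModel", equation=record.kinetic_model)
    for name, value in record.parameters.items():
        ET.SubElement(model, "parameter", name=name, value=str(value))
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild a record from the XML string produced by to_xml."""
    root = ET.fromstring(text)
    cond = root.find("conditions")
    points = root.find("timeCourse").findall("point")
    model = root.find("kineticModel")
    return EnzymaticRecord(
        enzyme=root.get("enzyme"),
        temperature_c=float(cond.get("temperature_c")),
        ph=float(cond.get("ph")),
        times_s=[float(p.get("t")) for p in points],
        substrate_mm=[float(p.get("value")) for p in points],
        kinetic_model=model.get("equation"),
        parameters={p.get("name"): float(p.get("value"))
                    for p in model.findall("parameter")},
    )


# Made-up example values, chosen only to exercise the round trip.
example = EnzymaticRecord(
    enzyme="example hydrolase",
    temperature_c=25.0,
    ph=7.0,
    times_s=[0.0, 60.0, 120.0],
    substrate_mm=[10.0, 8.1, 6.5],
    parameters={"km": 2.0, "vmax": 0.05},
)
document = to_xml(example)
restored = from_xml(document)
```

Keeping measurements, conditions and model in one document means that a modeling tool or database receiving `document` needs no side channel to interpret the time course, which is the property the abstract attributes to EnzymeML.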

Fig. 1: Overview of the workflow and tools that were implemented in this work.

Data availability

The availability of all data is listed in Supplementary Table 11.

Code availability

The availability of all code is listed in Supplementary Table 11.

Acknowledgements

S.L., H.D., J.P.R. and J.P. were supported by the German Research Foundation under Germany’s Excellence Strategy (EXC 2075, grant 390740016) and by the German Federal Ministry of Education and Research (grant 01DG17027). J.D.S. was supported by the German Research Foundation under Germany’s Excellence Strategy (EXC 2186, grant 390919832). A.W. and U.W. were supported by the Klaus Tschira Foundation and the German Federal Ministry of Education and Research within de.NBI (031A540). T.K. and S.N. were supported by the National Research Foundation of South Africa (grants 105889 and 112099). J.M.R. was supported by the National Research Foundation of South Africa (grant 120859). A.H. and J.W. were partially funded by the Sino-Danish Center for Education and Research and the Technical University of Denmark. C.E.L. and A.S.B. were supported by the US Food and Drug Administration, Center for Drug Evaluation and Research, and Office of Pharmaceutical Quality, through grant U01FD006484. C.E.L. also acknowledges funding by the US National Science Foundation through the Graduate Research Fellowship Program (grant DGE-1650044). F.T.B. was supported by the German Federal Ministry of Education and Research within de.NBI (031L0104A). STRENDA and STRENDA DB are funded by the Beilstein-Institut. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

S.M., J.D.S., M.F.P., C.E.L., A.V.H. and S.N. contributed data to the scenarios. D.R., P.M., A.S.B., J.M.W. and T.K. supervised the scenarios. F.T.B. and J.M.R. contributed their kinetic modeling platforms. D.I., A.W. and U.W. contributed database platforms. C.K., N.S. and S.S. contributed to the conceptualization of EnzymeML and to the development of the protocols. S.L., H.D. and J.R. implemented the EnzymeML workflows and analyzed data. J.P. supervised the development and application of EnzymeML workflows and prepared the draft of the manuscript with input from all authors. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Jürgen Pleiss.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Shelley Copley, Kenneth Johnson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lauterbach, S., Dienhart, H., Range, J. et al. EnzymeML: seamless data flow and modeling of enzymatic data. Nat Methods 20, 400–402 (2023). https://doi.org/10.1038/s41592-022-01763-1

  • DOI: https://doi.org/10.1038/s41592-022-01763-1
