MEMOTE for standardized genome-scale metabolic model testing

Recommended Citation Lieven, Christian; Beber, Moritz E; Olivier, Brett G; Bergmann, Frank T; Ataman, Meric; Babaei, Parizad; Bartell, Jennifer A; Blank, Lars M; Chauhan, Siddharth; Correia, Kevin; Diener, Christian; Dräger, Andreas; Ebert, Birgitta E; Edirisinghe, Janaka N; Faria, José P; Feist, Adam M; Fengos, Georgios; Fleming, Ronan M T; García-Jiménez, Beatriz; Hatzimanikatis, Vassily; van Helvoirt, Wout; Henry, Christopher S; Hermjakob, Henning; Herrgård, Markus J; Kaafarani, Ali; Kim, Hyun Uk; King, Zachary; Klamt, Steffen; Klipp, Edda; Koehorst, Jasper J; König, Matthias; Lakshmanan, Meiyappan; Lee, Dong-Yup; Lee, Sang Yup; Lee, Sunjae; Lewis, Nathan E; Liu, Filipe; Ma, Hongwu; Machado, Daniel; Mahadevan, Radhakrishnan; Maia, Paulo; Mardinoglu, Adil; Medlock, Gregory L; Monk, Jonathan M; Nielsen, Jens; Nielsen, Lars Keld; Nogales, Juan; Nookaew, Intawat; Palsson, Bernhard O; Papin, Jason A; Patil, Kiran R; Poolman, Mark; Price, Nathan D; Resendis-Antonio, Osbaldo; Richelle, Anne; Rocha, Isabel; Sánchez, Benjamín J; Schaap, Peter J; Malik Sheriff, Rahuman S; Shoaie, Saeed; Sonnenschein, Nikolaus; Teusink, Bas; Vilaça, Paulo; Vik, Jon Olav; Wodke, Judith A H; Xavier, Joana C; Yuan, Qianqian; Zakhartsev, Maksim; and Zhang, Cheng, "MEMOTE for standardized genome-scale metabolic model testing." (2020). Articles, Abstracts, and Reports. 2926. https://digitalcommons.psjhealth.org/publications/2926


Supplementary Figures: Arranged as they appear in the manuscript
To see the figures in context, please refer to the corresponding sections below. The following text is a copy of memote's documentation at the time of publication, modified to work in print. For an updated version, please check the latest memote documentation.

Understanding the reports
Memote returns one of four possible outputs. If your preferred workflow is to benchmark one or several genome-scale metabolic models (GSM), memote generates either a snapshot report (Figure 20) or a diff report (Figure 21), respectively. For the reconstruction workflow, the primary output is a history report (Figure 22). These three reports are only produced if the provided input models are formatted correctly in the Systems Biology Markup Language (SBML). If a provided model is not a valid SBML file, memote instead composes an error report enumerating errors and warnings from the SBML validator in the order of appearance. To better understand the output of the error report, we refer the reader to the corresponding section of the SBML documentation. In this section, we will focus on how to understand the snapshot, diff and history reports.

Toolbar
In all three reports, the blue toolbar at the top shows (from left to right) the memote logo, a button which expands and collapses all test results, a button which displays the readme, and the GitHub icon which links to memote's GitHub page. On the snapshot report, the toolbar also displays the identifier of the tested GEM and a timestamp showing when the test run was initiated.

Main Body
The main body of the reports is divided into an independent section to the left and a specific section to the right.
The tests in the independent section are agnostic of the type of modeled organism, preferred modeling paradigms, the complexity of a genome-scale metabolic model (GEM) or the types of identifiers that are used to describe its components. The tests in this section focus on testing adherence to fundamental principles of constraint-based modeling: mass, charge and stoichiometric balance as well as the presence of annotations. The results in this section can be normalized, and thus enable a comparison of GEMs. The Score at the bottom of the page summarises the results to further simplify comparison. While calculating an overall score for this section allows for the quick comparison of any two given models at a glance, we recommend a thorough analysis of all results with respect to the desired use case.
The specific section on the right provides model-specific statistics and covers aspects of a metabolic network that cannot be normalized without introducing bias. For instance, dedicated quality control of the biomass equation only applies to GEMs which are used to investigate cell growth, i.e., those for which a biomass equation has been generated. Some tests in this section are also influenced by whether the tested GEM represents a prokaryote or a eukaryote. Therefore, the results cannot be generalized and direct comparisons ought to take bias into account.

Test Results
Test results are arranged in rows with the title visible to the left and the result on the right. The result is displayed as white text in a coloured rectangle detailed below in the subsection Color.
By default, only the minimum information is visible, as indicated by an arrow pointing down to the right of the result. Clicking anywhere in the row expands the result, revealing a description of the concept behind the test, its implementation and a brief summary of the result. In addition, there is a text field which contains plain text representations of Python objects which can be copied and pasted into Python code for follow-up procedures.
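As a minimal sketch of such a follow-up procedure, the snippet below assumes that the text field held a Python list of identifiers. The identifiers and variable names are hypothetical placeholders and are not taken from an actual report.

```python
import ast

# Hypothetical text copied from the text field of an expanded result.
pasted = "['rxn_c', 'rxn_a', 'rxn_b', 'rxn_a']"

# literal_eval parses Python literals only, so the pasted text is never executed.
flagged = ast.literal_eval(pasted)

# Deduplicate and sort the identifiers before inspecting them further.
unique_flagged = sorted(set(flagged))
print(len(unique_flagged), unique_flagged)
```

Parsing with `ast.literal_eval` rather than pasting directly into source code is a design choice for this sketch: it keeps the copied text as data, which is convenient when results from several reports are collected in files.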
Some tests carry out one operation on several parameters and therefore deviate slightly from the descriptions above. Expanding the title row reveals only the description, while rows of the individual parameters reveal the text fields.
In the history report, scatterplots replace the text fields and show how the respective metrics developed over the commit history for each branch of a repository. Clicking an entry in the legend toggles its visibility in the plot.

Interpretation
The variety of constraint-based modeling approaches and the differences between organisms complicate the assessment of GSMs. While memote facilitates model assessment, it can only do so within limitations. Please bear in mind the diversity of paradigms described below, which challenge some of memote's results.

Snapshot Report
Results without highlights are kept in the main blue color of the memote color scheme. Scored results (Figure G1) are marked with a gradient ranging from red to green, denoting a low or a high score, respectively.
Figure 23: Snapshot Report Score Gradient

Diff Report
The colour in the Diff Report (Figure G2) depends on the ratio of the sample minimum to the sample maximum. Result sets where the sample minimum and the sample maximum are identical will be coloured in the main blue color of the memote color scheme. Result sets where the sample minimum is very small relative to the sample maximum will appear red. This ratio is calculated as 1 - (Min / Max). This is then mapped to the following gradient.
Figure 24: Diff Report Ratio Gradient
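The following is a minimal sketch of the ratio, assuming it takes the form 1 - (min / max), which is consistent with the description above: the value is 0 when minimum and maximum are identical and approaches 1 when the minimum is very small relative to the maximum. The function name and the handling of all-zero results are assumptions made for illustration and do not reproduce memote's own code.

```python
def diff_ratio(values):
    """Return 1 - (min / max) for a set of non-negative results."""
    if not values:
        raise ValueError("at least one result is required")
    lowest, highest = min(values), max(values)
    if highest == 0:
        # All results are zero and therefore identical.
        return 0.0
    return 1.0 - lowest / highest


identical = diff_ratio([42, 42, 42])   # same result in every model
divergent = diff_ratio([1, 50, 100])   # minimum is small relative to maximum
print(identical, divergent)
```

Under this assumption, the first call corresponds to the blue end of the gradient and the second to the red end.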

Score
Each test in the independent section provides a relative measure of completeness with regard to the tested property. The final score is the weighted sum of all individual test results normalized by the maximally achievable score, i.e., all individual results at 100%. Individual tests can be weighted, but it is also possible to apply weighting to entire subsections. Hence the final score is calculated as:
$$\text{score} = \frac{\sum_{s} w_s \sum_{t \in s} w_t \, r_t}{\sum_{s} w_s \sum_{t \in s} w_t}$$
where $r_t$ is the result of test $t$ (between 0 and 1), $w_t$ is the weight of that test, and $w_s$ is the weight of the subsection $s$ that contains it.
Weights for sections and individual tests are indicated by a white number inside a magenta badge. No badge means that the weight defaults to 1.
The subsections "Consistency" and "Annotation - SBO" have weights of 3 and 2, respectively. The test "Stoichiometric Consistency" itself is weighted 3 times stronger than the remaining tests in the "Consistency" subsection. The remaining subsections and tests which cover annotations of metabolites, reactions and genes have weights of 1 (Supplementary Figure G1).
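The weighting described above can be illustrated with a short sketch. It assumes that subsection weights and test weights multiply, and that the maximally achievable score is obtained by setting every result to 1. The number of tests per subsection and all result values are hypothetical; only the weights 3, 2 and 1 are taken from the text.

```python
def total_score(sections):
    """Weighted sum of results divided by the maximally achievable score.

    sections: list of (section_weight, [(test_weight, result), ...]),
    where each result lies between 0 and 1.
    """
    achieved = 0.0
    maximum = 0.0
    for section_weight, tests in sections:
        for test_weight, result in tests:
            achieved += section_weight * test_weight * result
            maximum += section_weight * test_weight  # result at 100%
    return achieved / maximum


# Hypothetical results; weights follow the description in the text.
sections = [
    (3, [(3, 1.0), (1, 0.5), (1, 0.5)]),  # "Consistency", first test weighted 3
    (2, [(1, 0.5)]),                      # "Annotation - SBO"
    (1, [(1, 0.5), (1, 1.0)]),            # an annotation subsection, default weights
]
score = total_score(sections)
print(round(score, 3))
```

In this example the achieved weighted sum is 14.5 out of a maximum of 19, so a perfect result in the heavily weighted "Stoichiometric Consistency" test contributes far more to the total than the annotation tests do.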

"Reconstructions" and "Models"
Some authors publish metabolic networks which are parameterized and ready to run flux balance analysis (FBA); these are referred to simply as 'models'. Others publish unconstrained metabolic knowledge bases (referred to as 'reconstructions'), from which several models can be derived by applying different constraints. Both can be encoded in SBML. By providing an independent test section, we attempt to make 'models' and 'reconstructions' comparable, although users should be aware that this difference exists and is subject to some discussion. Please note that some tests in the specific section may raise errors for a reconstruction, as they require initialization.

"Lumped" and "Split" Biomass Reaction
There are two basic ways of specifying the biomass composition. The most common is a single lumped reaction containing all biomass precursors. Alternatively, the biomass equation can be split into several reactions, each focusing on a different macromolecular component, for instance: a (1 gDW ash) + b (1 gDW phospholipids) + c (free fatty acids) + d (1 gDW carbohydrates) + e (1 gDW protein) + f (1 gDW RNA) + g (1 gDW DNA) + h (vitamins/cofactors) + x ATP + x H2O -> 1 gDCW biomass + x ADP + x H + x Pi. The benefit of either approach depends very much on the use cases, which are discussed by the community. Memote employs heuristics to identify the type of biomass, which may fail to distinguish edge cases.

"Average" and "Unique" Metabolites
A metabolite consisting of a fixed core with variable branches such as a membrane lipid is sometimes implemented by averaging over the distribution of individual lipid species. The resulting pseudometabolite is assigned an average chemical formula, which requires scaling of stoichiometries of associated reactions to avoid floating point numbers in the chemical formulae. An alternative approach is to implement each species as a distinct metabolite in the model, which increases the total count of reactions. Memote cannot yet distinguish between these paradigms, which means that results in the specific sections that rely on the total number of reactions or scaling of stoichiometric parameters may be biased.
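To make the averaging and scaling step concrete, the following sketch computes an average formula for a hypothetical pool of two species and the smallest integer factor that removes fractional element counts. Two fatty acids stand in for lipid species here; the pool composition and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def average_formula(species):
    """Average element counts over (mole fraction, formula) pairs."""
    average = {}
    for fraction, formula in species:
        for element, count in formula.items():
            average[element] = average.get(element, Fraction(0)) + fraction * count
    return average


def integer_scale(average):
    """Smallest integer factor that makes every averaged count whole."""
    denominators = [count.denominator for count in average.values()]
    return reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)


# Hypothetical pool: 25% palmitic acid (C16H32O2), 75% stearic acid (C18H36O2).
pool = [
    (Fraction(1, 4), {"C": 16, "H": 32, "O": 2}),
    (Fraction(3, 4), {"C": 18, "H": 36, "O": 2}),
]
average = average_formula(pool)  # carbon count is 35/2, a non-integer
scale = integer_scale(average)
scaled = {element: int(count * scale) for element, count in average.items()}
print(scale, scaled)
```

In this sketch, the pseudometabolite with the scaled formula stands for `scale` units of the average species, so the stoichiometric coefficients of associated reactions would be divided by the same factor to keep the reactions mass-balanced.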

Supplementary Note 2: Validation against experimental data
To compare model predictions to experimental measurements, a researcher would typically write a short script. The reproducibility of this script may be limited by the original author's style of writing code, whether the code has been rigorously checked for errors, and whether it is dependent on obsolete libraries. The latter, so-called software rot, arises from a lack of active maintenance (Beaulieu-Jones and Greene 2017).
In contrast, with memote researchers may optionally define a configuration file (in YAML format) in which they can set the medium and FBA objective. This file can be used by researchers without prior programming experience. It configures memote to execute clearly defined, formulaic operations, which are unit tested. Lastly, it transfers the burden of maintenance to the memote community represented through this consortium. This not only distributes the need for funding across many shoulders, but also increases the likelihood of the codebase keeping up with advances in its core dependencies, i.e., keeping software rot at bay. The development of the COBRAToolbox (Heirendt et al. 2017) and cobrapy (Ebrahim et al. 2013) are pertinent examples of community projects that operate on a similar strategy. Moreover, frequent versioning ensures that users can return to previous versions to re-run analyses.
Setting up a version-controlled model repository not only allows researchers to publish a 'default', unspecific GEM of the investigated organism, but also reproducible instructions on how to obtain a model that is specific to the organism in a defined experimental context and validated against the data supporting this context. This formulaic approach of deriving a condition-specific form of a GEM supports Heavner and Price's (Heavner and Price 2015) call for more transparency and reproducibility in metabolic network reconstruction.
Figure 25: Experimental tests can be tailored to a specific condition through the use of one or several configuration files (configs). (a) To validate GEMs against experimental data measured in specific conditions, researchers usually write their own scripts which constrain the model. This is problematic as scripts can vary a lot and, unless actively maintained, are susceptible to software rot. (b) With memote, user-defined configuration files replace scripts, which allows the experimental validation of GEMs to be unified and formalized. Bundling the model, configuration files, and experimental data within a version-controlled repository (indicated by the blue asterisk *) facilitates reproducibility.

Supplementary Note 3: Integration in third party tools and services
Memote's core functions are available through a Python API, and the online service is available through either a web interface or a programmatic REST API. We have integrated memote in KBase (Arkin et al. 2018) as an app and in OptFlux (Rocha et al. 2010) (version 3.4) as a plug-in, and we link to it from the BiGG Models Database (King et al. 2015). We plan to integrate it with BioModels (Li et al. 2010) and the RAVEN toolbox (Agren et al. 2013).

Supplementary Note 4: Discussion of alternatives to memote
The cloud-based, distributed version control for GEMs encoded as SBML3FBC is only one possible implementation approach for version control and collaboration. Alternatives include Pathway Tools (Karp et al. 2009), which internally stores organism data in the form of a database, and AuReMe (Aite et al. 2018), which allows users to interact with a database through wikis. Although databases offer greater capacity and speed than single, large data files, the programmatic or form-based interaction and more complex setup procedure required for databases may not be easily accessible to a broad community. We see memote in combination with GitHub, GitLab, or BioModels as a means of version control that is simple to set up and easy to manage.
For quality control, alternatives include rBioNet (Thorleifsson and Thiele 2011), an extension to the COBRAToolbox (Heirendt et al. 2017). It primarily focuses on guiding reconstruction by flagging operations which violate standard operating procedures (SOPs), but also provides functions which print basic information such as the number of model components and dead-end metabolites. Memote may be more widely adopted because it does not require a MATLAB license. gsmodutils (Gilbert et al. 2019) is another option, but it is less accessible to a wider community because it requires proficiency in Python. We note that, owing to the exchange format of SBML, memote is fully compatible with rBioNet and gsmodutils.

Supplementary Note 5: Outlook
In future, memote could be extended to provide support for tests based on multi-omics data (Hackett et al. 2016). Moreover, to distribute all files of a model repository together, the model, supporting data and scripts could be automatically bundled into one ZIP-based archive file (so-called COMBINE archive) (Bergmann et al. 2014). These archives can include a formal description of simulation experiments to ensure exchangeability and reproducibility (Waltemath et al. 2011).
The tests that memote offers only apply to stoichiometric models. However, the underlying principles behind memote could be applied to other modeling paradigms, e.g., to models of metabolism and expression (ME-models) (O'Brien et al. 2013), kinetic models (Vasilakou et al. 2016), or even systems pharmacological models (Thiel et al. 2017).

To simplify interpretation, the following figures are grouped by the sections of their corresponding test cases as they appear in a snapshot report. The code that was used to generate the data and figures has been deposited on GitHub https://github.com/biosustain/memote-meta-study.

Tested models
We tested models from seven GEM collections comprising manually and (semi-)automatically reconstructed GEMs. In order to respect the limited resources on the DTU high performance computing infrastructure, we set a maximum time limit for running the memote test suite. This introduced a bias against large models. Additionally, certain models failed the testing procedure. Table 1 lists the total size of each collection as well as the final number of tested models.

Clustering
In order to perform the clustering analyses, we used all normalized test metrics, excluding some particular cases. Sections 7.3.4.2 & 7.3.4.6 are excluded because the basic information only contains unnormalized model dimensions and because a biomass formulation is not present in all models. We further removed individual biomass-related test cases, as well as the metabolic coverage, since it is not properly normalized. Additionally, test cases that contained errors were penalized with the worst metric of one.
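The penalization of errored test cases can be sketched as follows. Representing an errored test case as `None` and the test names used here are assumptions made for illustration; the sketch does not reproduce the code deposited with the study.

```python
WORST_METRIC = 1.0  # normalized metrics range from 0 (best) to 1 (worst)


def penalize_errors(metrics):
    """Replace errored test cases (None) with the worst metric of one."""
    return {
        name: WORST_METRIC if value is None else value
        for name, value in metrics.items()
    }


# Hypothetical normalized metrics for one model; test_b raised an error.
raw = {"test_a": 0.25, "test_b": None, "test_c": 0.0}
features = penalize_errors(raw)
print(features)
```

Filling errors with the worst value keeps every model in the feature matrix, at the cost of treating a failed test the same as a test with the poorest possible result.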
To determine the most relevant tests to discriminate between model collections, we built a classifier using a random forest (Breiman 2001) over the collections and normalized test results (0.99 accuracy and 0.01% out-of-bag (OOB) error). Then, the importance of each variable, i.e., test case, was ranked with the Mean Decrease in Accuracy (MDA) (Louppe et al. 2013). This metric measures the total decrease in accuracy, averaged over all trees of the forest, when the value of a given variable is permuted in the OOB samples. Figure 29 represents the 15 most discriminant features on average (see last column) and their independent relevance by collection. The higher the decrease in accuracy, the higher the relative contribution of such a test to differentiate among collections. Thus, the five most discriminant tests are purely metabolic reactions, transport reactions, dead-end metabolites, orphan metabolites, and the presence of a non-growth associated maintenance reaction. The range of importance nevertheless varies by collection: for CarveMe, transport reactions and orphan metabolites are more relevant; for KBase, transport reactions; and for Ebrahim et al., purely metabolic reactions. For a detailed study of the clustering properties, please refer to the Supplementary Clustering Analysis notebook.
Figure 28: Depicted are the distances between models in higher order space given by the normalized test features reduced to two dimensions using UMAP.
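The idea behind a permutation-based decrease in accuracy can be sketched in a few lines. This is a simplified illustration, not the analysis used in the study: a fixed threshold rule stands in for the random forest, all samples are used instead of OOB samples, and the toy data, function names and parameters are assumptions.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_features, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [
                row[:j] + (value,) + row[j + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / repeats)
    return importances


# Toy data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def predict(row):
    """Stand-in classifier: a fixed threshold on feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(predict, rows, labels, n_features=2)
print(importances)
```

Shuffling the informative feature breaks its link to the label and lowers the accuracy, whereas shuffling the noise feature leaves the predictions unchanged; ranking features by this drop mirrors how the test cases are ranked above.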

Test Suite
The database identifiers referenced throughout the Annotation sections belong to common biochemical databases that are listed in Table 2. Memote failed to identify a biomass reaction for only a minority of models in BiGG, Ebrahim et al., and OptFlux Models (Figure 142).