Identification of flux checkpoints in a metabolic pathway through white-box, grey-box and black-box modeling approaches

Lo-Thong, Ophélie; Charton, Philippe; Cadet, Xavier F.; Grondin-Perez, Brigitte; Saavedra, Emma; Damour, Cédric; Cadet, Frédéric

doi:10.1038/s41598-020-70295-5

Download PDF

Article
Open access
Published: 10 August 2020

Identification of flux checkpoints in a metabolic pathway through white-box, grey-box and black-box modeling approaches

Ophélie Lo-Thong^1,2,
Philippe Charton ORCID: orcid.org/0000-0003-3062-2743^1,2,
Xavier F. Cadet³,
Brigitte Grondin-Perez⁴,
Emma Saavedra⁵,
Cédric Damour⁴ &
…
Frédéric Cadet^1,2

Scientific Reports volume 10, Article number: 13446 (2020) Cite this article

2981 Accesses
10 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Metabolic pathway modeling plays an increasing role in drug design by allowing better understanding of the underlying regulation and controlling networks in the metabolism of living organisms. However, despite rapid progress in this area, pathway modeling can become a real nightmare for researchers, notably when few experimental data are available or when the pathway is highly complex. Here, three different approaches were developed to model the second part of glycolysis of E. histolytica as an application example, and have succeeded in predicting the final pathway flux: one including detailed kinetic information (white-box), another with an added adjustment term (grey-box) and the last one using an artificial neural network method (black-box). Afterwards, each model was used for metabolic control analysis and flux control coefficient determination. The first two enzymes of this pathway are identified as the key enzymes playing a role in flux control. This study revealed the significance of the three methods for building suitable models adjusted to the available data in the field of metabolic pathway modeling, and could be useful to biologists and modelers.

Integrating glycolysis, citric acid cycle, pentose phosphate pathway, and fatty acid beta-oxidation into a single computational model

Article Open access 02 September 2023

Sylwester M. Kloska, Krzysztof Pałczyński, … Tadeusz A. Wysocki

Reconstruction, simulation and analysis of enzyme-constrained metabolic models using GECKO Toolbox 3.0

Article 18 January 2024

Yu Chen, Johan Gustafsson, … Eduard J. Kerkhoven

Diverse classes of constraints enable broader applicability of a linear programming-based dynamic metabolic modeling framework

Article Open access 14 January 2022

Justin Y. Lee & Mark P. Styczynski

Introduction

Entamoeba histolytica is a protozoan parasite responsible for the development of amoebiasis in humans. This disease is a worldwide public health problem that causes over 100 000 deaths per year¹. Indeed, a recent report estimates the prevalence of E. histolytica infection at 42% in Mexico, 41% in China and 34% in South Africa². So far, no vaccine exists to prevent the infection, but patients who suffer from amoebiasis can be treated with different drugs such as metronidazole or tinidazole. However, intolerances to these treatments and potential appearance of drug resistance^2,3,4,5 reveal the urgency of the situation and the need to find new therapies. Previous studies have focused on the identification of new drug targets in E. histolytica glycolysis^6,7,8, since the parasite depends completely on glycolysis to produce ATP⁹.

While drug research and development is time consuming and expensive, the use of computational approaches might help to speed up the process. Lately, the combination of in vitro reconstitution and in silico modeling of the glycolysis pathway in E. histolytica highlighted the possibility of using modeling for predicting flux and metabolite concentrations under given conditions⁷ and for appraising the effect of the addition of alternative routes⁸. Pathway modeling can be done through many statistical or knowledge driven approaches¹⁰. The first one only uses experimental data to understand relationships between biological variables, whereas the second uses pathway information (metabolic reactions, thermodynamic and kinetic parameters) to design complete detailed metabolic pathway reconstructions. Artificial Neural Network (ANN) can be classified among the data-driven approaches and is based on the creation of a network whose structure and functioning are similar to those of a biological neural network¹¹. Traditionally, this method is employed to identify new biomarkers of diseases such as cancer¹¹ or to predict the bioavailability of drugs in patients^12,13.

The recent model of E. histolytica glycolysis applies a knowledge-based method called metabolic network to each part of the pathway: the first part from glucose to dihydroxyacetone phosphate and the second part (Fig. 1) from 3-phosphoglycerate (3PG) to pyruvate (Pyr)⁸. Interestingly, these studies found that 3-phoshoglycerate mutase (PGAM) was the main controlling factor in the second part of glycolysis, whereas pyruvate phosphate dikinase (PPDK) exerted the lowest flux control. This result comes in conflict with previous research⁶, which identified PGAM and PPDK as important flux control steps of amoebal glycolysis. This difference is explained by the use of inappropriate enzyme proportions in the in vitro reconstitution experiments, not identical to those determined in amoebas, in the first study. Moreover, here our study is based on the experimental results of Moreno-Sanchez⁸.

It should be noted that obtaining a solid knowledge-based model relies, as the name suggests, upon an advanced understanding of the cell system, including physiological metabolite concentrations and enzyme activities, kinetic parameters and the type of mechanism involved, as well as thermodynamic constants of the pathway reactions. However, this knowledge is often not available in the literature or is highly complex to model, as seen with the kinetic mechanism of PPDK^8,14.

In the present study, our objective is to contribute to overcome the lack of knowledge and the complexity of kinetic modeling (white-box modeling), by testing two new modeling approaches: a data-driven approach (black-box modeling) which uses ANN model, and a hybrid-based approach (grey-box modeling) which uses a traditional kinetic-based model with an added adjustment term. For this purpose, these three modeling approaches are applied to an experimental example: the second part of E. histolytica glycolysis, using the experimental results previously published by Moreno-Sanchez et al.⁸

Our analysis shows that the different models predict correctly the final flux values in the second part of E. histolytica glycolysis pathway. The ANN model presents great predictive and generalization abilities; however, its complexity, through high Akaike Information Criterion value (AIC), ranks it among the less satisfactory models. The COPASI models provide satisfactory predicted fluxes, as well as the ANN model, with a marked preference for the grey-box approach. Subsequently, the flux control coefficients of the enzymes (${C}_{E}^{J}$) are calculated and allow the identification of the key enzymes involved in flux control^15,16,17. Taken together, these models enable the construction of the pathway from experimental data and the determination of the main controlling enzymes in the system, revealing the relevance of both the traditional white-box approach and the novel grey- and black-box approach. Such approaches could be extended to further biological pathway modeling, as they provide models adapted to various backgrounds.

Materials and methods

Second part of glycolysis experimental data

Experimental data of PGAM, ENO and PPDK activities and pathway flux (J_obs) are obtained from plots of a previous study⁸. The free online software WebPlotDigitizer (Version 4.1, https://automeris.io/WebPlotDigitizer/) is used to extract data from plots. These data are available in Tables S1 and S2.

Artificial neural networks (ANNs)

ANNs functioning mimics that of biological neurons, the networks consist of many layers allowing input reception and processing and output delivery. This technique can be used for solving classification or regression problems¹⁸. To build the second part of glycolysis in ANNs, different types of software are employed: RStudio (Version 1.1.456), an open-source integrated development environment for R¹⁹ and two packages: NeuralNet (Version 1.44.2) and Nnet (Version 7.3–12)^20,21.

Complex pathway SImulator (COPASI) metabolic networks

A first metabolic network of the studied pathway was kindly provided by the authors of a previous study⁸. This model is developed on GEPASI²², an old software for metabolic pathway modeling, replaced by COPASI since 2002.

The second part of the glycolysis is also modeled by using the open source software called COPASI (Version 4.24)²³. This software is used for metabolic network design, analysis and optimization. The resulting metabolic networks are based on the use of enzyme properties (kinetic parameters and mechanism-based rate equations).

Ethics approval and consent to participate

Not applicable.

Methodology

Black- white- and grey-box approach procedure

To conduct the present study, a specific methodology, different from that envisaged in the original article⁸, is defined (Fig. 2). In the first case of the black-box approach, ANN models are built with the experimental data concerning the relationship of pathway flux versus enzyme activity in the pathway in vitro reconstruction. Then, in the second and third case of the white- and grey-box approach, metabolic networks are built with enzyme parameters measured experimentally, and rate equations²⁴ according to the type of kinetic mechanism described for each enzyme. Once the models are designed, a comparison of their final flux and product concentrations is made. Also, for each approach, two different models are designed: one reaching a pseudo-steady-state flux through lactate and another at physiological metabolite concentrations. Subsequently, calculations of flux control coefficient for each of these models are made, allowing the determination of the main flux controlling enzyme.

Black-box approach

Artificial neural networks (ANNs) design

Typical feed-forward networks are designed and consist of three layers of neurons: an input layer, a single hidden layer and an output layer (Fig. 3). Input data are connected to the neurons and weights (w_i and w’_j) are assigned to each connection. When input data are processed by the neuron, the latter computes the weighted sum of its inputs, then applies an activation function (f). The activation function makes it possible to convert input into output and decides whether the neuron is activated or not. There are several activation functions, including the non-linear activation functions: logistic (log) and hyperbolic tangent (tanh). If the resulting output is higher than the set threshold, the neuron is considered as being activated, otherwise not. Lastly, the hidden layer leads to the final output result, displayed in the output layer.

Optimization of ANNs is ensured through the back-propagation method²⁵ in the NeuralNet package and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method²⁶ in the Nnet package. For detailed information on ANN functioning, see²⁷. In the ANN models, the inputs are the activities of each enzyme (PGAM, ENO and PPDK) used in the in vitro experiment (Table S1,⁸), and the output is the predicted pathway flux (J_pred). Also, each weight in the ANN is assigned automatically by RStudio. Given the small amount of experimental data (Table S1), ANN models are built with a training set made up of the complete Tables S1 or S2 datasets (the data from the experimental dots or data from the fitting curves, respectively), then optimized through a Leave-One-Out cross validation (LOOcv) procedure. Then, since we needed a separate test set to prevent overfitting, the models are evaluated on a different test set generated by the grey-box COPASI model (Table S3).

ANN selection and performance evaluation

The number of artificial neurons (or units) in the hidden layer is selected based on:

the root-mean-square error (RMSE):
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}{({\widehat{Y_{i}}}-{Y}_{i})}^{2}}$$
(1)
with ${Y}_{i}$ and ${\widehat{Y_{i}}}$ respectively the observed and predicted values, n the total number of values, and i = 1, 2…n;
the mean absolute error (MAE) calculations:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|{Y}_{i}-{\widehat{Y_{i}}}\right|$$
(2)
with |…| symbolizing the absolute value;
and a specific equation estimating a range of numbers of HUs^28,29:
$${N}_{h} =\frac{{N}_{s}}{\alpha *({N}_{i}+{N}_{o})}$$
(3)

with N_h the number of HUs, N_s the number of samples in the training data, N_i the number of input units, N_o the number of output units and $\alpha$ an arbitrary scaling factor, usually 2–10.

RMSE and MAE are statistical metrics commonly used to evaluate the model performance^30,31,32,33.

White-box approach

Complex Pathway SImulator (COPASI) metabolic network design

The metabolic networks built in this study use the enzyme properties (kinetic parameters and kinetic rate equations), which are summarized in Tables 1, 2, and metabolite concentrations defined in Table 3. Furthermore, several models are built using either V_max or k_cat and E and pseudo-steady state metabolite concentrations or physiological metabolite concentrations. All simulations are carried out during the first hour, as was done in the experimental procedure⁸.

Table 1 Kinetic parameters of the enzymes in the second part of the glycolysis.

Full size table

Table 2 Kinetic equations of the enzymes in the second part of the glycolysis.

Full size table

Table 3 Metabolite concentrations used in the models.

Full size table

As in the previous study, for establishing a quasi steady-state and calculating the flux control coefficients during modeling, a last reaction is added: Lac formation from Pyr (Fig. 1). The kinetic equation of LDH is $k \times [Pyr]$, with the rate constant $k=2,000$ min⁻¹, and the Lac concentration is fixed at 300 µM.

Metabolic network refinement and validation

To enhance the COPASI model predictions, changes to their contents are carried out. First of all, the PPDK kinetic equation is modified and a more accurate one describing the full rate equation is used, the Uni Uni Bi Bi Ping-Pong (UUBB) mechanism (Eq. 4) as previously determined¹⁴:

$${v}_{2}=\frac{{V}_{f}{V}_{r}\left(ABC-\frac{PQR}{{K}_{eq}}\right)}{D}$$

(4)

with the denominator \(\text{D} = {V}_{r}{K}_{iB}{K}_{C}\text{A}+{V}_{r}{K}_{C}\text{AB}+{V}_{r}{K}_{B}\text{AC}+\frac{{V}_{f}}{{K}_{eq}}{K}_{iR}{K}_{Q}\text{P}+\frac{{V}_{f}}{{K}_{eq}}{K}_{R}\text{PQ}+{V}_{r}{K}_{iB}\frac{{K}_{C}}{{K}_{iQ}}\text{AQ}+\frac{{V}_{f}}{{K}_{eq}}{K}_{Q}\text{PR}+\frac{{V}_{f}}{{K}_{eq}}{K}_{P}\text{QR}+\frac{{V}_{f}}{{K}_{eq}}{K}_{Q}\text{PR}+ {V}_{r}{K}_{A}\text{BC}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{iR}{K}_{Q}}{{K}_{iC}}\text{CP}+{V}_{r}\frac{{K}_{c}}{{K}_{iQ}}\text{ABQ}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{R}}{{K}_{iA}}\text{APQ}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{P}}{{K}_{CB}}\text{BQR}+{V}_{r}\frac{{K}_{A}}{{K}_{iP}}\text{ACP}+{V}_{r}\frac{{K}_{A}}{{K}_{iR}}\text{BCR}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{P}}{{K}_{iB}}\text{BQR}+{V}_{r}\frac{{K}_{C}}{{K}_{iQ}{K}_{iiC}}\text{ABCQ}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{Q}}{{K}_{iC}}\text{CPR}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{Q}}{{K}_{iC}{K}_{iiC}}\text{CPR}+{V}_{r}\frac{{K}_{A}}{{K}_{iR}{K}_{iiC}}\text{BCQR}+{V}_{r}\text{ABC}+\frac{{V}_{f}}{{K}_{eq}}\text{PQR}+\frac{{V}_{f}}{{K}_{eq}}\frac{{K}_{iR}{K}_{Q}}{{K}_{iA}}\text{AP};\) A, B and C and P, Q and R are respectively the concentrations of the substrates PEP, AMP and PP_i and of the products Pyr, P_i and ATP of PPDK reaction; K is the Michaelis constant; K_i and K_ii are respectively the dissociation constant of the substrate or product and the inhibitor constant that affects the intercept (1/V_max). The experimental and fitted constants are listed in Table 4.

Table 4 K_i, K_ii and K_{PPi_AMP} used in the UUBB equation.

Full size table

Also, the estimation of kinetic parameters is made with COPASI Parameter Estimation task. With this task, a range of parameters is tested by COPASI, which predicts the final flux or the product concentrations and compares them to the experimental data. The process relies on the minimization of the cost function (5), i.e. the minimization of the error between the experimental values and the corresponding predicted values.

$$E\left(P\right)= \sum_{i,j}{\omega }_{j}.{({x}_{i,j}-{y}_{i,j}\left(P\right))}^{2}$$

(5)

with E the calculated error, P the tested parameter, ${\omega }_{j}$ is the calculated weight for each experimental data column, ${x}_{i,j}$ a point in the dataset and ${y}_{i,j}\left(P\right)$ the corresponding predicted value; i and j are the rows and columns in the experimental dataset. The weight calculation method was the mean square: ${\omega }_{j}=\frac{1}{\overline{{{x}^{2}}_{j}}}$, with $\overline{{{x}^{2}}_{j}}$ the mean of squared data from one column. The software provides a list of optimization methods, to find optimized values for the estimated parameters (https://copasi.org/Support/User_Manual/Methods/Optimization_Methods/).

Again, two types of estimations are carried out:

one estimating one or several parameters with one target value and
the other estimating one or several parameters with many target values.

The models obtained constitute the white-box approach, with known enzymatic parameters and equations.

Grey-box approach

In the specific case of the grey-box approach, to improve the COPASI model predictions, the kinetic equation of PPDK is changed to a ter-reactant reversible equations⁸ which was modified as follows (6):

$${v}_{1}=\frac{{V}_{f}\left(ABC-\frac{PQR}{{K}_{eq}}\right)}{{K}_{mA}B+{K}_{mB}A+{K}_{mC}B+{K}_{mB}C+\frac{{V}_{f}{K}_{mQ}P}{{V}_{r}{K}_{eq}}+\frac{{V}_{f}{K}_{mP}Q}{{V}_{r}{K}_{eq}}+\frac{{V}_{f}{K}_{mQ}R}{{V}_{r}{K}_{eq}}+\frac{{V}_{f}{K}_{mR}Q}{{V}_{r}{K}_{eq}}+ABC+\frac{{V}_{f}PQR}{{V}_{r}{K}_{eq}}+\alpha \left|{V}_{f}-{V}_{f0}\right|}$$

(6)

with the adjustment term $\alpha \left|{V}_{f}-{V}_{f0}\right|$ in the denominator, $\alpha$ is a defined number, ${V}_{f0}$ is the PPDK maximum rate in the forward direction used in the in vitro reconstitution and ${V}_{f}$ is the PPDK maximum rate in the forward direction in the model.

This particular model was built because, although the previous model could predict fairly well the final flux when PGAM and ENO activities were varied, it overestimated the flux when PPDK activity was varied. However, the previous model predicted the flux well, with the enzyme parameters used in the in vitro reconstitution. Therefore, an adjustment term should be added, in order to decrease PPDK rate with $\alpha$. Also, as ${V}_{f}$ of PPDK is equal to ${V}_{f0}$ when PGAM’s or ENO’s activity is varied, $\alpha$ is multiplied by ${V}_{f}-{V}_{f0}$, so that the adjustment term to be zero when ${V}_{f}={V}_{f0}$ and the flux predictions are not modified in these two cases mentioned above. Also, to ensure that the adjustment term is positive, we used the absolute value $\left|{V}_{f}-{V}_{f0}\right|$.

To determine the value of $\alpha$, first a range of values from 0 to 4*10⁶ with steps of 10⁶ is assessed. Then the range and the steps are reduced, from 10⁶ to 1, until we obtain better results for RMSE, and coefficient of determination (R²) between the predicted and experimental data. The equation for R² is given below:

$${R}^{2}= 1- \frac{\sum_{i=1}^{n}{({Y}_{i}-{\widehat{Y_{i}}})}^{2}}{\sum_{i=1}^{n}{({Y}_{i}-\overline{Y})}^{2}}$$

(7)

with ${Y}_{i}$ and ${\widehat{Y_{i}}}$ respectively the observed and predicted values, n being the total number of values and i = 1, 2…n.

It is important to note that this parameter $\alpha$ has no biological significance and is determined by a data-driven learning method, hence the name “grey box” for this model.

Model comparison

To compare accuracy of the models, RMSE, R² and AIC are assessed for the experimental dataset (Table S2). The same statistical metrics are used to evaluate their generalization ability with the test dataset (Table S3).

AIC measures the quality of the model by taking into account its complexity. Additionally, as the ratio “number of data-number of parameters” is less than 40, a corrected AIC is calculated as follows^20,34:

$$AIC = 2*k+n*ln\left(\frac{SSE}{n}\right)+\frac{2*k*\left(k+1\right)}{n-k-1}$$

(8)

with k being the number of parameters, SSE the Sum of Square Errors and n the number of data.

Furthermore, to assess the generalization ability of the models, a comparison of RMSE, R² and MAE is made on the previous test set (Table S3).

Flux control analysis

For purposes of analyzing the pathway flux control and identifying the key enzymes involved in the flux control, the flux control coefficient of each enzyme (${C}_{E}^{J}$) is calculated with each model (ANNs and metabolic networks). This measure, generally used in Metabolic Control Analysis (MCA), allows us to assess quantitatively the impact of the enzyme on the pathway flux^15,16,17. Here, ${C}_{E}^{J}$ is determined in an analytical way using the formula mentioned below (9):

$${C}_{E}^{J}= \frac{\partial J}{\partial x}*\frac{{x}_{0}}{{J}_{0}}$$

(9)

where $J$ is the flux and x is either the enzyme activity in the case of ANNs or the rate of the reaction catalyzed by the enzyme in the case of metabolic networks (COPASI), multiplied by a scalar factor $\frac{{x}_{0}}{{J}_{0}}$ which represents the reference values of enzyme activity/reaction rate and pathway flux.

Application and results

ANN modeling of the second part of glycolysis

First, we model the second part of E. histolytica glycolysis using the black-box modeling approach with ANN models and the first experimental dataset (Table S1, Fig. 4A,B) or the second experimental dataset (Table S2, Fig. 4C–F). For the first dataset, the evaluation of RMSE in cross-validation (cvRMSE) and MAE in cross-validation (cvMAE) shows a fluctuation of the error values when the number of HUs is varied and allows the identification of the best ANN model, presenting the lowest cvRMSE and cvMAE values. Also, the calculation of N_h gives a maximum value of 4 ($\alpha$ = 2), making it possible to identify the best model, regarding cvRMSE and cvMAE, with a number of HU equal to 1 (Fig. 4A). By comparing the ANN predicted fluxes with the experimental ones, we observe that this model can predict rather well the flux of the pathway for the training set, especially at high values of flux (Fig. 4B), and even if the calculated errors remain high (cvRMSE = 4.23 nmol·min⁻¹, cvMAE = 2.78 nmol·min⁻¹). The prediction of the test set shows that the model predicts the flux better when PGAM or ENO activity is varied, than when PPDK’s activity is varied. This can be explained by the small experimental data number in the training set, which is derived from experimentally controlled conditions. We built other ANN models with the NeuralNet package and tanh activation function and Nnet package, but the predictions are less good than those of previous models, with lower R² in cross-validation (cvR²) and respectively, cvRMSE = 4.47 nmol·min⁻¹ and cvMAE = 2.84 nmol·min⁻¹, for the first one and cvRMSE = 4.56 nmol·min⁻¹ and cvMAE = 2.66 nmol·min⁻¹ for the second one (Fig. S1).

Afterwards, we built another ANN model, this time using the second dataset, corresponding to the data from the fitting curves obtained from the experimental points in the first dataset. From the two packages used, we notice that, with NeuralNet and tanh activation function, it is easier to identify the optimal number of HUs, which is 18, but this is not the case with the Nnet package, where the models with 22 and 23 HUs present a better cvMAE or a better cvRMSE (Fig. 4C,D). As RMSE is the most used model selection criterion of both, we use 23 HUs for the second model with the Nnet package. The comparison of these two models shows their ability to simulate the metabolic pathway, with better results for the Nnet model (Fig. 4E,F). Also, the calculation of N_h gives a maximum value of 23 ($\alpha$ = 2); thus, both models comply with the limit set by the equation.

However, in order to select the best model and ensure that it is not too specific to our second dataset, we used the test set from the most performing COPASI model (Table S3), and predicted the final flux with our two ANN models. The NeuralNet model produced better results, with RMSE = 1.61 nmol·min⁻¹ and MAE = 1.37 nmol·min⁻¹, compared to the Nnet model. These results suggest that this novel black-box approach, using ANN, is relevant for constructing metabolic pathways from experimental data, with better predictions when working with bigger datasets, whether it be with NeuralNet or Nnet package.

Design of metabolic network with the white-box approach

After the modeling phase using the black-box method approach, we focused on the white-box approach and designed mechanistic models with COPASI. The first COPASI model we used was that of Moreno-Sanchez⁸; although it was created in GEPASI, we were able to work with this model on COPASI (Fig. S2A-C). The steady-state flux predicted with this model converged around 16.6 nmol·min⁻¹ for the three enzymes, with a flux that decreased for PGAM and increased for ENO and PPDK during simulation time (Fig. S2A). This result was lower than the experimentally measured result (27 nmol·min⁻¹)⁸. As for the prediction of metabolite concentrations, after one hour simulation time, 2PG was at 139.78 µM, PEP at 6.08 µM and Pyr at 8.31*10^–3 µM (Fig. S2B). Here also the predicted concentrations were higher than the experimentally measured results, with a concentration of 2PG at 58 ± 29 µM and PEP at 37 ± 16 µM (Pyr experimental concentration was not available) in the previous work⁸. Furthermore, analysis of the predicted flux when enzyme activities were varied showed quite good prediction of the flux for PGAM and PEP, but not for PPDK, which showed RMSE of 4.33 nmol·min⁻¹ (Fig. S2C).

The results of this first model clearly indicate that the studied metabolic pathway can be modeled with COPASI as a biochemical network using different kinetic parameters and equations, but it needs to be fine-tuned to be more accurate in terms of predictions. The primary modification made in this model concerned the V_max values and the metabolite concentrations. Indeed, we replaced these values with those used in the experimental conditions at a pseudo steady-state (see Tables 1–3 and Fig. 5A–C). These changes have the effect of increasing the predicted fluxes and metabolite concentrations, in particular with a flux of 25.2 nmol·min⁻¹ closer to the experimental value (Fig. 5A). As for the metabolite concentrations, they were still higher than those measured experimentally (Fig. 5B). The comparison between the predicted and observed fluxes revealed an enhancement of the predictive capability of our model with RMSE = 3.39 nmol·min⁻¹ and R² = 0.88 (Fig. 5C), emphasizing the importance of using appropriate parameters in the model.

However, this second model presents a poorer ability to predict the flux when PPDK activity is varied. For this reason, we decided to improve it by modifying the PPDK kinetic equation only and replace the Bi Bi Ping Pong kinetic equation used in the preceding models with the more precise Uni Uni Bi Bi kinetic equation defined by Varela-Gómez et al.¹⁴ (Fig. 5D–F). As some kinetic parameters (K_i and K_ii) were not characterized experimentally, they were arbitrarily fixed at 1,000 µM (see Table 4). This last model yielded a slight decline of reaction fluxes to around 22 nmol·min⁻¹ and higher metabolite concentrations than experimentally determined (Fig. 5D,E). Interestingly, we noted an improvement of flux predictions when enzyme activities were varied (RMSE = 2.43 nmol·min⁻¹ and R² = 0.94), in particular in the case of PPDK activity variation (Fig. 5F). Therefore, this second attempt to refine the COPASI model revealed that beyond the use of appropriate parameters, our model has to include precise kinetic equations to be more efficient.

As we said before, some parameters are not yet defined experimentally; therefore, we use COPASI Parameter Estimation task to estimate these kinetic parameters. The best results are obtained with the Particle Swarm optimization method, with a cost function of 771.135; the optimized values of K_i and K_ii are presented in Table S4. It is worth noting that the cost function value remains high, suggesting a failure of COPASI to estimate parameters better. This could be due to the high number of values to be parameterized and the low number of experimental data. Besides, these parameterized values have no physiological meaning, since they are in the molar range, and could be explained by the negligible impact of the parameterization with COPASI. Simulations run for one hour and fluxes and concentrations are analyzed again (Fig. 5G–I). We notice no significant change between the initial model and the optimized one. For the most part, the fluxes are increased: PGAM flux is at 23.4 nmol·min⁻¹ and ENO flux at 22.9 nmol·min⁻¹, except for PPDK flux which is at 21.3 nmol·min⁻¹, while metabolite concentrations are greater than their experimental values (Fig. 5G,H). In general, we notice a minor enhancement of flux predictions with this optimized model (Fig. 5I). These findings suggest that the white-box modeling approach, through COPASI modeling, stands as a conventional method of choice to build consistent in silico models of metabolic pathways and this, despite the fact that, in our case, metabolite concentrations are poorly predicted even after the parameterization of the kinetic constants.

Besides, other approximative models, with lin-log approximation kinetics and Liebermeister kinetics, could have been evaluated^35,36. Consequently, we built a model including the approximative lin-log equation (see modeling details in the legend of Fig. S3). Despite simplifying the rate equation by using lin-log kinetics, the model gives results comparable to the previous white-box model, with RMSE = 4.8 nmol·min⁻¹ and R² = 0.78 (Fig. S3C). Another model using the simpler modular rate law from Liebermeister³⁶ is built (see modeling details in the legend of Fig. S4). This model has the immediate effect of simplifying the rate equation for PPDK and allows good prediction of flux (26 nmol·min⁻¹) in the experimental conditions (Fig. S4A). However, results show that metabolite concentrations are still overestimated and the model presents a lower predictive capacity compared to the previous models, with RMSE = 4.03 nmol·min⁻¹ and R² = 0.87 (Fig. S4B,C). Both models, with lin-log approximation kinetics or Liebermeister kinetics, display the same dynamics, with better flux predictions when PGAM’s or ENO’s activity is varied than when PPDK’s activity is varied. Together, these results reveal that there are some aspects of PPDK kinetics that are not completely modeled by these different mechanistic approaches.

The grey-box modeling approach

Based on our previous experiences, the major hurdle in the second part of glycolysis modeling is the third reaction catalyzed by PPDK. Then, we investigate the use of a novel approach called the grey-box modeling approach, consisting of using an adjustment term ($\alpha \left|{V}_{f}-{V}_{f0}\right|$) in the kinetic equation of PPDK. In order to define the optimal value of $\alpha$ in the adjustment term, we test a range of values from 0 to 5*10⁶ and identify the best value $\alpha$ around 3.09*10⁶; below this value, the flux is overestimated, and above, the final flux is underestimated (Fig. 6A). Also, no changes are made to the predicted flux when PGAM or ENO activity is varied (Fig. S5).

Again, simulations were performed over one hour with COPASI and the results of prediction are shown (Fig. 6B–D). We observed that the fluxes were around 25 nmol·min⁻¹ as in a previous model (Figs. 5A and 6B), and consequently closer to the experimental value. In regards to the metabolite concentration predictions, they were also similar to those predicted with the previous model and were still higher than expected (Fig. 5B and 6C). Remarkably, a significant improvement of flux predictions was achieved, notably when PPDK activity was varied, compared to all other models analyzed before (Figs. 5C, F, I and 6D). Collectively, these results validate the use of the adjustment term in the kinetic equation to improve the metabolic pathway model built with COPASI.

Model comparison and reliability

Following the design of the second part of glycolysis using three modeling approaches, we assess the reliability of each approach and proceed to their comparison. Also, for an easier understanding of the following results, the properties of each model are summarized in Table 5.

Table 5 List of the main properties of each model.

Full size table

By comparing the predicted fluxes to their experimental values, we found that all models, from Models 1–6, worked well for predicting the final flux when activity of PGAM varies (Fig. 7A). When ENO activity is varied, we notice that Model 2 does not perform well, particularly for the low values, for which the model overestimates the final flux (Fig. 7B). Besides, for these two enzymes we note that Models 1, 4 and 5 from the white-box approach and Model 6 from the grey-box approach underestimate the flux when activity of PGAM or ENO is varied, with a gap that seems smaller in the case of the grey-box approach. As expected, dots from Model 3 are practically aligned with the first bisector, suggesting an almost perfect flux prediction with this model (Fig. 7A,B). Lastly, the variation of PPDK activity shows the greatest effect on model prediction. We observe that Model 2, as well as Model 1, are the two models that have the most difficulty in predicting flux under these conditions (Fig. 7C). Indeed, they overestimate the flux when PPDK activity is varied; this was also the case for Models 4 and 5, but with a smaller difference between the predicted and observed values. In contrast, fluxes are closely predicted with Models 3, 5 and 6. These results indicate that these models are suitable to simulate our studied metabolic pathway and that we can count on their reliability for the analysis of the flux in the second part of glycolysis, at least for an overall flux ranging from 0 to 30 nmol·min⁻¹.

The analysis of the statistics for each model reinforced the results obtained before (Table 6). Indeed, all models exhibited a fairly low RMSE under 3 nmol·min⁻¹ and a high R², around 0.98, when PGAM activity was varied. When ENO activity was varied, almost all models predicted the flux with a good RMSE under 3 nmol·min⁻¹ and R² above 0.97, except for Model 2. However, when PPDK activity was varied, Models 0, 1 and 2 showed the weakest results, with RMSE above 5 nmol·min⁻¹ and a R² under 0.9. Only the three models mentioned above (Models 3, 5 and 6) yielded good results with a low RMSE and a high R² value. These results corroborated those obtained earlier. Interestingly, the calculation of AIC allows the establishment of a ranking of models (from the best to less good): Model 2 > 3 > 6 > 5 > 4 > 1 > 0 (Table 6). Model 2, which has the lowest AIC, proved to be a poor model for flux prediction. Conversely, Model 3, that gives the best results in terms of RMSE, MAE and R² presents a good AIC. We also notice that the second-best model in flux prediction (Model 6) also presents a low AIC value.

Table 6 Comparative table of statistical metrics of each model for the training set (Table S2).

Full size table

Subsequently, in order to evaluate the generalization ability of our models, we predict the flux with the test set (Table 7). Many models do not have an adequate ability of generalization; nevertheless, Model 6 from the grey-box approach stands out from the others. Indeed, it is the only model able to predict the flux very well from new data, regardless of the enzymatic activity that is varied. Model 0 and 1 can predict the flux well, except when PPDK activity is varied. Also, AIC calculations identify Model 6 as the best one to generalize (AIC = − 486.7), since Model 3 presents higher RMSE, MAE and AIC value (AIC = 539.06). These results confirm the reliability of the three approaches for the analysis of the flux in the second part of glycolysis, with a preference for Model 6, which offers the best compromise between precision and complexity.

Table 7 Comparative table of statistical metrics of each model for the test set (Table S3).

Full size table

Identification of the main controlling enzymes of the pathway

After establishing three types of models for the considered metabolic pathway, we determined the enzyme ${C}_{E}^{J}$ with each model. These coefficients are calculated at a pseudo steady-state flux to Lac (Table 8) or at physiological metabolite concentrations (Table S5) at the reference or basal level of enzyme activity of 75 mU PGAM, 328.5 mU ENO and 196.5 mU PPDK. Each ${C}_{E}^{J}$ provides a quantitative measurement of the enzyme effect on the pathway flux. The closer the coefficient is to 1, the higher the enzyme impact on the flux. Thus, this coefficient differs from the concept of rate-limiting enzyme, which is commonly defined as the enzyme which catalyzes the slowest step in the pathway and corresponds to a qualitative evaluation of the enzyme impact on the pathway flux^15,16,17.

Table 8 Flux control coefficient determination.

Full size table

As we can see, at a pseudo steady-state flux to Lac, the enzyme that exerted the greatest control on the final flux is PGAM (0.65 ± 0.2), then ENO (0.18 ± 0.04) and PPDK (0.07 ± 0.1) which showed the least control on the flux (Table 8). The predicted values by the different models are within the same interval as those experimentally determined by pathway reconstitution⁸. Similar results were obtained with all models at physiological metabolite concentrations (Table S5). From these findings, we can conclude that the main controlling enzymes of the second part of glycolysis in E. histolytica are PGAM and, to a lesser extent, ENO and PPDK exert low or no control over the pathway flux.

In addition, we varied the enzyme activity from 0 to 400 mU and observed the final flux during the first hour of simulation using the COPASI model with the adjustment term (Fig. 8). When PGAM was varied, the flux went from 0 to 90.93 nmol·min⁻¹ (Fig. 8A) and when ENO was varied, the flux went to 26.26 nmol·min⁻¹. By contrast, PPDK activity variation did not affect the final pathway flux very much, which went to 24.13 nmol·min⁻¹ at 400 mU of PPDK. These results were consistent with previous ${C}_{E}^{J}$ calculations showing that PGAM and ENO are indeed the two main controlling enzymes of the pathway.

Discussion

Relevance of the white- grey- and black-box approach for the modeling of metabolic pathways

In this work, we model the second part of the glycolysis pathway of E. histolytica using three approaches: the white-, grey- and black-box approach, and we highlight their ability to predict the final flux. Many comparative studies are made in other fields to evaluate the relevance of using either of the three methods, and point out that the method depends on the problems encountered^37,38,39. In the case of energy model building, Li and Wen showed that simplified grey-box models are better as practical building models, compared to white-box models that require numerous parameters³⁸. In another study, the black-box models outperformed the other two models for the modeling of thermal simulation in a particular environment³⁹.

Here, the first approach is based on the use of kinetic parameters and equations and is related to the widely used method known as kinetics-based (or dynamic) modeling for industrial applications such as the production of molecules of interest, development of de novo synthesis pathways or understanding of microorganism metabolism^40,41,42. This method can provide accurate predictions; however, it requires numerous parameters and good knowledge of mechanistic rate laws; hence the need to develop new strategies of modeling when we do not have access to this information^43,44.

Despite the use of a more complex kinetic equation in the kinetic models, the results were not satisfactory; consequently, we used a simplified kinetic equation with an adjustment term in the grey-box approach. This is the first time this method is applied to enhance performance of a metabolic pathway kinetic model. In other studies found in the literature, the unknown kinetic constants are parameterized or the kinetic equations can be approximated^36,45,46,47. The present approach has some major advantages as it needs less parameters than the white-box approach, and it uses simplified kinetic equations that are biochemically plausible.

Finally, we used a novel black-box approach and built an ANN model with experimental data. As previously mentioned, ANN is generally used in biology to solve classification problems, for example, to classify lung carcinomas⁴⁸, but it has rarely been used to model a metabolic pathway^49,50,51,52. A recent study applied a similar technique to model the first part of glycolysis, and showed the success of this technique for predicting the flux⁵³. This last approach is characterized by its rapidity; however, it requires a large number of experimental data to be sufficiently effective.

Together, the approaches we describe here may be beneficial for modeling other metabolic pathways, depending on background information including “raw” experimental data, kinetic parameters and kinetic equations.

Factors impacting model performance

During this study, we relied on three main statistical metrics (RMSE, MAE and AIC) to evaluate model performance. The results revealed that different criteria are important and impact the value of these metrics and thus the model performance itself. Among these criteria, we identified the size of the dataset, but also the choice of the activation function (log or tanh) and the number of HU in our ANN models. Indeed, having a large number of high-quality datasets is essential to obtain a good ANN, and one challenge here would be to avoid over-fitting^54,55. Other studies bring out the importance of the size of the input sequence during the analysis of the DNA sequence, to increase model performance⁵⁶. Also, they reveal the relevance of neural network architecture, proposing the design of multi-task neural networks with multiple output variables⁵⁷. These factors raise new questions about the use of ANN to model metabolic pathways, and can be subjected to further investigation concerning the number of inputs and outputs to include in our model, to make it more efficient.

As we said earlier, to predict accurate results COPASI models need extensive data, such as kinetic parameters and equations. Our results reveal the impact of the kinetic equation on the final flux prediction. The impact of the kinetic equation on the model predictions depends on the complexity of the model and on the flux control coefficient of the enzyme. When the enzyme has high ${C}_{E}^{J}$, variations of its rate equation or small variations in the kinetic constants or V_max greatly impact the predicted pathway flux (e.g. PGAM in Fig. 8). In contrast, rate equation variations of a low controlling enzyme (such as PPDK) have less impact on the flux. It would be interesting to test in the models the influence of the lack of regulatory feedback on the enzyme that has the highest control, as was done in the Moreno-Sanchez et al. study⁸ focusing on PGAM. As was described in that paper, the lack of those regulatory effects renders the predictive power of the model ineffective. Therefore, regulatory properties on high controlling enzymes can drastically modify the model predictions. Furthermore, the question of which kinetic equation to use in the pathway remains a real topic in research today. Kim et al. review all kinetic rate expressions used in the kinetic model, from mechanistic expressions (Michaelis–Menten and Hill rate laws equations) to approximate kinetic equations (lin-log kinetics, modular rate laws…)⁵⁸. These approximate kinetic equations have the advantage of simplifying the modeling, but they cannot help with estimating the parameters. Moreover, particular attention is given to the kinetic parameters that need to be as close as possible to in vivo kinetics. This can be done during enzyme analysis by bringing the in vitro conditions closer to the in vivo conditions⁵⁸. Therefore, the consideration of these different factors may impact the process of model design but also the upstream research that is done to study metabolic pathways in a particular organism.

Possible model optimizations

Although we have almost accurate prediction results, we can consider additional improvements of the different models. Actually, as this analysis is only made on the second part of glycolysis, it could be envisioned to merge it with the first part of this metabolic pathway to investigate the changes in terms of ${C}_{E}^{J}$ and pathway flux control, and then compare the results to the previous ones, where the parts were modeled separately⁸. It would be interesting to have a detailed kinetic model of glycolysis in E. histolytica combined with other major metabolic pathways (glycogen metabolism, pentose phosphate pathway)⁷, to highlight the need to inhibit or not the main controlling enzymes identified here, as was done for cancer cells⁵⁹. Also, the addition of genetic-level regulations could help to better understand parasite metabolism, as is done for E. coli⁶⁰. However, in order to do this, we still need experimental data on gene expression and regulation in the parasite under conditions of infection.

Also, another way to optimize the models could be by parameter estimation of the unknown kinetic parameters in the UUBB equation. Here, we tried to estimate these parameters, defined first arbitrarily, but the parameter estimation results in very little improvement of the flux prediction with the use of the new estimated values. Actually, parameterization of kinetic constants can provide a mathematical solution to the problem with unrealistic values likely to be physiologically unlikely. Hence, the importance of performing parameter estimation with constraints, within intervals that may be possible in enzymes and may have physiological meaning (e.g. K_m or K_i values not surpassing the lower mM interval). This emphasizes again the need for more experimental data concerning the PPDK mechanism in in vivo conditions. Additionally, in kinetic models, parameters can be determined in two ways, as we have done, either one at a time or collectively; the only difference being that some parameters are often set to measured values^43,58. We can also consider the use of different parameter estimation techniques. As demonstrated in a previous work, kinetic parameters can be estimated with the flux balance analysis constraint-based modeling approach, by integrating multi-omics data in the model (fluxomic, proteomic and metabolomics data)⁶¹. Consequently, additional work needs to be done involving this part of the modeling, to improve our white-box model using a UUBB equation; it would also be interesting to integrate the data from the grey-box approach into the next parameter estimation procedure.

Biological insights

With the MCA method (${C}_{E}^{J}$) and with all models, we identified PGAM as the main controlling enzymes of the second part of glycolysis in this parasite, with a slight contribution of ENO. These results are supported by other studies conducted on this particular pathway^7,8. Furthermore, it has been found by elasticity analysis, another experimental approach of MCA, that the group of enzymes from PPi-dependent phosphofructokinase to PPDK controls about 0.2–0.28 of the pathway flux of amoebal glycolysis⁶². Within this pathway section, PGAM is the enzyme with the lowest activity in the cell⁷, which may contribute to the better control observed. Additionally, novel enzyme inhibitors were recently identified and tested in vitro^63,64. Therefore, these models may be an interesting subject of future research in which the inhibitor effect on the flux can be assessed.

Conclusion

Be it for the purpose of designing new valuable enzymatic pathways for industrial-scale production of molecules of interest or designing new efficient drugs, metabolic pathway modeling remains a great challenge today^65,66,67. Different techniques of modeling exist, including kinetic modeling, based on the use of kinetic parameters and equations that are not necessarily known or experimentally measured. Moreover, several machine learning-based methods are emerging for analysis of metabolic pathway modeling^68,69.

In this study, our objective was to compare three different modeling approaches to model metabolic pathways and identify the main controlling enzymes of the pathway. To this end, we used an application example (lower part of glycolysis of a parasite) and obtained:

The white-box approach, with the use of all known kinetic information about PGAM, ENO and PPDK. This method gave better results after the modification of the PPDK kinetic equation from ter-reactant reversible equation to UUBB equation (Training: R² = 0.95, RMSE = 2.06 nmol·min⁻¹ and MAE = 1.7 nmol·min⁻¹ and AIC = 336.05 and Test: R² = 0.97, RMSE = 2.19 nmol·min⁻¹ and MAE = 2 nmol·min⁻¹).
The grey-box approach, with the kinetic equation with an added adjustment term for PPDK; this model was the best of our models (Training: R² = 0.98, RMSE = 1.71 nmol·min⁻¹, MAE = 1.47 nmol·min⁻¹ and AIC = 244.71 and Test: R² = 1, RMSE = 0.23 nmol·min⁻¹ and MAE = 0.13 nmol·min⁻¹).
The black-box method, using the ANN method to predict the pathway flux. This model presents a low capacity of generalization since its low AIC (124.21) makes it one of the least preferred models here. Nonetheless, the speed and the low cost of this method make it interesting to develop. The model had a good predictive ability with Training: cvR² = 1, cvRMSE = 0.52 nmol·min⁻¹, cvMAE = 0.37 nmol·min⁻¹ and AIC = 124.21 and Test: R² = 0.98, RMSE = 1.61 nmol·min⁻¹ and MAE = 1.37 nmol·min⁻¹.

Also, all these models identified the same enzymes as the main controlling enzymes of the pathway: PGAM and ENO, PPDK not having much influence on the flux in E. histolytica.

Despite the need for further improvement, these models showed the relevance of the different methods for their future application in the field of metabolic pathway modeling and drug design, for in silico design starting from various backgrounds.

Data availability

The datasets used in this study are fully included and described in the Additional file. The ANN and COPASI models built during the present study are available in the Github repository, https://github.com/ophelielt/Lo-Thong_et_al._White-box_grey-box_and_black-box_pathway_modeling.git. All data generated or analyzed during this study are included in this published article (and its Supplementary Information files).

References

Kantor, M. et al.Entamoeba histolytica: updates in clinical manifestation, pathogenesis, and vaccine development. Can. J. Gastroenterol. Hepatol.2018, 1–6 (2018).
Google Scholar
Shirley, D.-A.T., Farr, L., Watanabe, K. & Moonah, S. A review of the global burden, new diagnostics, and current therapeutics for amebiasis. Open Forum Infect. Dis.5, 161 (2018).
Google Scholar
Upcroft, P. & Upcroft, J. A. Drug Targets and Mechanisms of Resistance in the Anaerobic Protozoa. Clin. Microbiol. Rev.14, 150–164 (2001).
CAS PubMed PubMed Central Google Scholar
Duchêne, M. Metronidazole and the redox biochemistry of Entamoeba histolytica. In Amebiasis (eds Nozaki, T. & Bhattacharya, A.) 523–541 (Springer, Tokyo, 2015).
Google Scholar
Samarawickrema, N. Involvement of superoxide dismutase and pyruvate:ferredoxin oxidoreductase in mechanisms of metronidazole resistance in Entamoeba histolytica. J. Antimicrob. Chemother.40, 833–840 (1997).
CAS PubMed Google Scholar
Saavedra, E., Encalada, R., Pineda, E., Jasso-Chávez, R. & Moreno-Sánchez, R. Glycolysis in Entamoeba histolytica: biochemical characterization of recombinant glycolytic enzymes and flux control analysis. FEBS J.272, 1767–1783 (2005).
CAS PubMed Google Scholar
Saavedra, E. et al. Kinetic modeling can describe in vivo glycolysis in Entamoeba histolytica: modeling entamoeba glycolysis. FEBS J.274, 4922–4940 (2007).
CAS PubMed Google Scholar
Moreno-Sánchez, R., Encalada, R., Marín-Hernández, A. & Saavedra, E. Experimental validation of metabolic pathway modeling: an illustration with glycolytic segments from Entamoeba histolytica. FEBS J.275, 3454–3469 (2008).
PubMed Google Scholar
Saavedra, E. et al. Control and regulation of the pyrophosphate-dependent glucose metabolism in Entamoeba histolytica. Mol. Biochem. Parasitol.229, 75–87 (2019).
CAS PubMed Google Scholar
Hou, J., Acharya, L., Zhu, D. & Cheng, J. An overview of bioinformatics methods for modeling biological pathways in yeast. Brief. Funct. Genom.15, 95–108 (2016).
CAS Google Scholar
Lancashire, L. J., Lemetre, C. & Ball, G. R. An introduction to artificial neural networks in bioinformatics–application to complex microarray and mass spectrometry datasets in cancer studies. Brief. Bioinform.10, 315–329 (2008).
Google Scholar
Dorronsoro, I. et al. CODES/neural network model: a useful tool for in silico prediction of oral absorption and blood-brain barrier permeability of structurally diverse drugs. QSAR Comb. Sci.23, 89–98 (2004).
CAS Google Scholar
Thishya, K., Vattam, K. K., Naushad, S. M., Raju, S. B. & Kutala, V. K. Artificial neural network model for predicting the bioavailability of tacrolimus in patients with renal transplantation. PLoS ONE13, e0191921 (2018).
PubMed PubMed Central Google Scholar
Varela-Gómez, M., Moreno-Sánchez, R., Pardo, J. P. & Perez-Montfort, R. Kinetic mechanism and metabolic role of pyruvate phosphate dikinase from Entamoeba histolytica. J. Biol. Chem.279, 54124–54130 (2004).
PubMed Google Scholar
Fell, D. A. Metabolic control analysis: a survey of its theoretical and experimental development. Biochem. J.286, 313–330 (1992).
CAS PubMed PubMed Central Google Scholar
Saavedra, E., Gonzalez-Chavez, Z., Moreno-Sanchez, R. & Michels, P. A. M. Drug Target selection for Trypanosoma cruzi metabolism by metabolic control analysis and kinetic modeling. Curr. Med. Chem.26, 6652–6671 (2019).
CAS PubMed Google Scholar
Moreno-Sánchez, R., Saavedra, E., Rodríguez-Enríquez, S. & Olín-Sandoval, V. Metabolic control analysis: a tool for designing strategies to manipulate metabolic pathways. J. Biomed. Biotechnol.2008, 1–30 (2008).
Google Scholar
Puri, M. et al. Introduction to Artificial Neural Network (ANN) as a Predictive Tool for Drug Design, Discovery, Delivery, and Disposition. In Artificial Neural Network for Drug, Design Delivery and Disposition 3–13 (Elsevier, Amsterdam, 2016).
Google Scholar
RStudio Team (2015). RStudio: Integrated Development for R. (RStudio, Inc., Boston, MA) https://www.rstudio.com/.
Fritsch, S., Guenther, F. & Wright, M. N. Neuralnet: Training of Neural Networks. R package version 1.44.2. (2019).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S 4th edn. (Springer, New York, 2002).
MATH Google Scholar
Mendes, P. Biochemistry by numbers: simulation of biochemical pathways with Gepasi 3. Trends Biochem. Sci.22, 361–363 (1997).
CAS PubMed Google Scholar
Hoops, S. et al. COPASI–a COmplex PAthway SImulator. Bioinformatics22, 3067–3074 (2006).
CAS PubMed Google Scholar
Segel, I. H. Enzyme Kinetics (Wiley, New York, 1975).
Google Scholar
Rumelhart, D. E., Geoffrey, E. & Williams, R. J. learning representations by back propagating errors. Nature323, 533–536 (1986).
ADS MATH Google Scholar
Battiti, R. & Masulli, F. BFGS optimization for faster and automated supervised learning. In International neural network conference 757–760 (Springer, 1990).
Jain, A. K., Mao, J. & Mohiuddin, K. M. Artificial neural networks: a tutorial. Computer29, 31–44 (1996).
Google Scholar
Schultz, M. & Reitmann, S. Prediction of aircraft boarding time using LSTM network. In 2018 Winter Simulation Conference (WSC) 2330–2341 (IEEE, 2018). https://doi.org/10.1109/WSC.2018.8632532.
Hagan, M. T., Demuth, H. B., Beale, M. H. & De Jesús, O. Neural Network Design (Martin Hagan, Oklhoma, 2014).
Google Scholar
Vastrad, M. Performance analysis of neural network models for oxazolines and oxazoles derivatives descriptor dataset. Int. J. Inf. Sci. Technol.3, 1–15 (2013).
Google Scholar
Cakit, E., Durgun, B. & Cetik, O. A neural network approach for assessing the relationship between grip strength and hand anthropometry. Neural Netw. World25, 603–622 (2015).
Google Scholar
Küçükönder, H., Boyaci, S. & Akyüz, A. A modeling study with an artificial neural network: developing estimationmodels for the tomato plant leaf area. Turk. J. Agric. For.40, 203–212 (2016).
Google Scholar
Chai, T. & Draxler, R. R. Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci. Model Dev.7, 1247–1250 (2014).
ADS Google Scholar
Panchal, G., Ganatra, A., Kosta, Y. P. & Panchal, D. Searching most efficient neural network architecture using Akaike’s information criterion (AIC). Int. J. Comput. Appl.1, 54–57 (2010).
Google Scholar
Visser, D. & Heijnen, J. J. Dynamic simulation and metabolic re-design of a branched pathway using linlog kinetics. Metab. Eng.5, 164–176 (2003).
CAS PubMed Google Scholar
Liebermeister, W., Uhlendorf, J. & Klipp, E. Modular rate laws for enzymatic reactions: thermodynamics, elasticities and implementation. Bioinformatics26, 1528–1534 (2010).
CAS PubMed Google Scholar
Gernaey, K. V., van Loosdrecht, M. C. M., Henze, M., Lind, M. & Jørgensen, S. B. Activated sludge wastewater treatment plant modelling and simulation: state of the art. Environ. Model. Softw.19, 763–783 (2004).
Google Scholar
Li, X. & Wen, J. Review of building energy modeling for control and operation. Renew. Sustain. Energy Rev.37, 517–537 (2014).
ADS Google Scholar
Arendt, K., Jradi, M., Shaker, H. R. & Veje, C. T. Comparative Analysis of White-, Gray- and Black-Box Models For Thermal Simulation of Indoor Environment: Teaching Building Case Study 8 (2018).
Huang, H. & Buekens, A. Chemical kinetic modeling of de novo synthesis of PCDD/F in municipal waste incinerators. Chemosphere44, 1505–1510 (2001).
ADS CAS PubMed Google Scholar
Liu, J., Brazier-Hicks, M. & Edwards, R. A kinetic model for the metabolism of the herbicide safener fenclorim in Arabidopsis thaliana. Biophys. Chem.143, 85–94 (2009).
CAS PubMed Google Scholar
Petroll, K., Kopp, D., Care, A., Bergquist, P. L. & Sunna, A. Tools and strategies for constructing cell-free enzyme pathways. Biotechnol. Adv.37, 91–108 (2019).
CAS PubMed Google Scholar
Almquist, J., Cvijovic, M., Hatzimanikatis, V., Nielsen, J. & Jirstrand, M. Kinetic models in industrial biotechnology: improving cell factory performance. Metab. Eng.24, 38–60 (2014).
CAS PubMed Google Scholar
Saa, P. A. & Nielsen, L. K. Construction of feasible and accurate kinetic models of metabolism: a Bayesian approach. Sci. Rep.6, 29635 (2016).
ADS CAS PubMed PubMed Central Google Scholar
Costa, R. S., Hartmann, A. & Vinga, S. Kinetic modeling of cell metabolism for microbial production. J. Biotechnol.219, 126–141 (2016).
CAS PubMed Google Scholar
Rohwer, J. M. Kinetic modelling of plant metabolic pathways. J. Exp. Bot.63, 2275–2292 (2012).
CAS PubMed Google Scholar
del Rosario, R. C. H., Mendoza, E. & Voit, E. O. Challenges in lin-log modelling of glycolysis in Lactococcus lactis. IET Syst. Biol.2, 136 (2008).
PubMed Google Scholar
Brougham, D. F. et al. Artificial neural networks for classification in metabolomic studies of whole cells using ¹ H nuclear magnetic resonance. J. Biomed. Biotechnol.2011, 1–8 (2011).
Google Scholar
Mendes, P. & Kell, D. B. On the analysis of the inverse problem of metabolic pathways using artificial neural networks. Biosystems38, 15–28 (1996).
CAS PubMed Google Scholar
Voit, E. O. & Almeida, J. Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics20, 1670–1681 (2004).
CAS PubMed Google Scholar
Antoniewicz, M. R., Stephanopoulos, G. & Kelleher, J. K. Evaluation of regression models in metabolic physiology: predicting fluxes from isotopic data without knowledge of the pathway. Metabolomics2, 41–52 (2006).
CAS PubMed PubMed Central Google Scholar
Naushad, S. M. et al. Artificial neural network-based exploration of gene-nutrient interactions in folate and xenobiotic metabolic pathways that modulate susceptibility to breast cancer. Gene580, 159–168 (2016).
CAS PubMed Google Scholar
Ajjolli Nagaraja, A. et al. Flux prediction using artificial neural network (ANN) for the upper part of glycolysis. PLoS ONE14, e0216178 (2019).
PubMed PubMed Central Google Scholar
Oyetunde, T., Bao, F. S., Chen, J.-W., Martin, H. G. & Tang, Y. J. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing. Biotechnol. Adv.36, 1308–1315 (2018).
CAS PubMed Google Scholar
Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol.12, 878 (2016).
PubMed PubMed Central Google Scholar
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods12, 931–934 (2015).
CAS PubMed PubMed Central Google Scholar
Dahl, G. E., Jaitly, N. & Salakhutdinov, R. Multi-task neural networks for QSAR predictions. ArXiv14061231 Cs Stat (2014).
Kim, O. D., Rocha, M. & Maia, P. A review of dynamic modeling approaches and their application in computational strain optimization for metabolic engineering. Front. Microbiol.9, 1690 (2018).
PubMed PubMed Central Google Scholar
Marín-Hernández, Á et al. Inhibition of non-flux-controlling enzymes deters cancer glycolysis by accumulation of regulatory metabolites of controlling steps. Front. Physiol.7, 412 (2016).
PubMed PubMed Central Google Scholar
Khodayari, A. & Maranas, C. D. A genome-scale Escherichia coli kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat. Commun.7, 13806 (2016).
ADS CAS PubMed PubMed Central Google Scholar
Cotten, C. & Reed, J. L. Mechanistic analysis of multi-omics datasets to generate kinetic parameters for constraint-based metabolic models. BMC Bioinform.14, 32 (2013).
Google Scholar
Pineda, E. et al.In vivo identification of the steps that control energy metabolism and survival of Entamoeba histolytica. FEBS J.282, 318–331 (2015).
CAS PubMed Google Scholar
Othman, N., Saidin, S. & Noordin, R. In vitro testing of potential Entamoeba histolytica pyruvate phosphate dikinase inhibitors. Am. J. Trop. Med. Hyg.97, 1204–1213 (2017).
PubMed PubMed Central Google Scholar
Stephen, P., Vijayan, R., Bhat, A., Subbarao, N. & Bamezai, R. N. K. Molecular modeling on pyruvate phosphate dikinase of Entamoeba histolytica and in silico virtual screening for novel inhibitors. J. Comput. Aided Mol. Des.22, 647–660 (2008).
ADS CAS PubMed Google Scholar
Rajasethupathy, P., Vayttaden, S. J. & Bhalla, U. S. Systems modeling: a pathway to drug discovery. Curr. Opin. Chem. Biol.9, 400–406 (2005).
CAS PubMed Google Scholar
Eriksen, D. T., Lian, J. & Zhao, H. Protein design for pathway engineering. J. Struct. Biol.185, 234–242 (2014).
CAS PubMed Google Scholar
Church, G. M. & Regis, E. Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves (Basic Books, New York, 2012).
Google Scholar
Cuperlovic-Culf, M. Machine learning methods for analysis of metabolic data and metabolic pathway modeling. Metabolites8, 4 (2018).
PubMed Central Google Scholar
Costello, Z. & Martin, H. G. A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data. Npj Syst. Biol. Appl.4, 19 (2018).
PubMed PubMed Central Google Scholar

Download references

Acknowledgments

OLT is supported by a PhD grant from the Region Reunion and European Union (FEDER) under European Operational Program FEDER REUNION –2014/2020 file number 20171389, tiers 216275.

Author information

Authors and Affiliations

University of Paris, UMR_S1134, BIGR, Inserm, 75015, Paris, France
Ophélie Lo-Thong, Philippe Charton & Frédéric Cadet
DSIMB, UMR_S1134, BIGR, Inserm, Laboratory of Excellence GR-Ex, Faculty of Sciences and Technology, University of La Reunion, 97715, Saint-Denis, France
Ophélie Lo-Thong, Philippe Charton & Frédéric Cadet
PEACCEL, Artificial Intelligence Department, 6 square Albin Cachot, box 42, 75013, Paris, France
Xavier F. Cadet
LE2P, Laboratory of Energy, Electronics and Processes EA 4079, Faculty of Sciences and Technology, University of La Reunion, 97444, St Denis cedex, France
Brigitte Grondin-Perez & Cédric Damour
Departamento de Bioquímica, Instituto Nacional de Cardiología Ignacio Chávez, 14080, Mexico City, Mexico
Emma Saavedra

Authors

Ophélie Lo-Thong
View author publications
You can also search for this author in PubMed Google Scholar
Philippe Charton
View author publications
You can also search for this author in PubMed Google Scholar
Xavier F. Cadet
View author publications
You can also search for this author in PubMed Google Scholar
Brigitte Grondin-Perez
View author publications
You can also search for this author in PubMed Google Scholar
Emma Saavedra
View author publications
You can also search for this author in PubMed Google Scholar
Cédric Damour
View author publications
You can also search for this author in PubMed Google Scholar
Frédéric Cadet
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.C., C.D. and P.C. designed the method. OLT, CD, PC, BGP, XFC, ES and FC participated in the design of the study and performed the analysis. OLT wrote algorithms. OLT, CD, PC, XFC, ES and FC wrote and corrected the manuscript. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Frédéric Cadet.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Lo-Thong, O., Charton, P., Cadet, X.F. et al. Identification of flux checkpoints in a metabolic pathway through white-box, grey-box and black-box modeling approaches. Sci Rep 10, 13446 (2020). https://doi.org/10.1038/s41598-020-70295-5

Download citation

Received: 11 July 2019
Accepted: 27 July 2020
Published: 10 August 2020
DOI: https://doi.org/10.1038/s41598-020-70295-5

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Integrating glycolysis, citric acid cycle, pentose phosphate pathway, and fatty acid beta-oxidation into a single computational model

Reconstruction, simulation and analysis of enzyme-constrained metabolic models using GECKO Toolbox 3.0

Diverse classes of constraints enable broader applicability of a linear programming-based dynamic metabolic modeling framework

Introduction

Materials and methods

Second part of glycolysis experimental data

Artificial neural networks (ANNs)

Complex pathway SImulator (COPASI) metabolic networks

Ethics approval and consent to participate

Methodology

Black- white- and grey-box approach procedure

Black-box approach

Artificial neural networks (ANNs) design

ANN selection and performance evaluation

White-box approach

Complex Pathway SImulator (COPASI) metabolic network design

Metabolic network refinement and validation

Grey-box approach

Model comparison

Flux control analysis

Application and results

ANN modeling of the second part of glycolysis

Design of metabolic network with the white-box approach

The grey-box modeling approach

Model comparison and reliability

Identification of the main controlling enzymes of the pathway

Discussion

Relevance of the white- grey- and black-box approach for the modeling of metabolic pathways

Factors impacting model performance

Possible model optimizations

Biological insights

Conclusion

Data availability

References

Acknowledgments

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links