Reaction lumping in metabolic networks for application with thermodynamic metabolic flux analysis

Thermodynamic metabolic flux analysis (TMFA) can narrow down the space of steady-state flux distributions, but requires knowledge of the standard Gibbs free energy for the modelled reactions. The latter are often not available due to unknown Gibbs free energy change of formation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$, {\Delta }_{f} G^{0}$$\end{document},ΔfG0, of metabolites. To optimize the usage of data on thermodynamics in constraining a model, reaction lumping has been proposed to eliminate metabolites with unknown \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\Delta }_{f} G^{0}$$\end{document}ΔfG0. However, the lumping procedure has not been formalized nor implemented for systematic identification of lumped reactions. Here, we propose, implement, and test a combined procedure for reaction lumping, applicable to genome-scale metabolic models. It is based on identification of groups of metabolites with unknown \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\Delta }_{f} G^{0}$$\end{document}ΔfG0 whose elimination can be conducted independently of the others via: (1) group implementation, aiming to eliminate an entire such group, and, if this is infeasible, (2) a sequential implementation to ensure that a maximal number of metabolites with unknown \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\Delta }_{f} G^{0}$$\end{document}ΔfG0 are eliminated. Our comparative analysis with genome-scale metabolic models of Escherichia coli, Bacillus subtilis, and Homo sapiens shows that the combined procedure provides an efficient means for systematic identification of lumped reactions. We also demonstrate that TMFA applied to models with reactions lumped according to the proposed procedure lead to more precise predictions in comparison to the original models. The provided implementation thus ensures the reproducibility of the findings and their application with standard TMFA.

We note that G 0 j can be obtained from the standard Gibbs free energy of formation of metabolites, f G 0 , weighted by the respective stoichiometric coefficients with which the metabolites enter the reaction r j : This approach has been extended to consider the contribution of different chemical groups, termed group contribution method 20,21 , and later, the contribution of pseudoisomeric groups 22 . However, for many metabolites f G 0 is neither experimentally determined, due to the large experimental efforts and availability of chemical standards needed 23 , nor can be estimated by using the group contribution methods and extensions thereof 22 . The group contribution method is reported to not be applicable in the case of organic-inorganic complexes as well as for a small number but often encountered organic substructures 21 . As a result, the standard Gibbs free energy for more than a third of reactions in the entire KEGG database are missing since f G 0 for the included metabolites are not available 21,22 . Moreover, in the most recent version of the ModelSEED database, more than half of the included metabolites have unspecified f G 0 values 24 .
To overcome the challenge in TMFA, Henry et al. introduced the idea of determining a linear combination of reactions with undetermined G 0 , so-called lumping, to obtain reactions in which metabolites with unknown f G 0 are eliminated, i.e. enter with stoichiometric coefficients of zero 19 . Hence, G 0 of the resulting lumped reaction is fully specified. However, besides the idea and the list of lumped reactions, the process of lumping was not further specified in the original study.
Let the reactions be partitioned into two classes, J lumped , composed of all lumped reactions, and J model , consisting of the reactions comprising the original model. Therefore, the lumped reactions are only introduced to impose more thermodynamic constraints, while the steady-state flux space remains unaltered (by solving Sv = 0 for reactions in J model ). Reactions whose lumping leads to a lumped reaction r k can only be thermodynamically feasible if the lumped reaction itself is feasible 18 . This is ensured by the following constraints: where the binary variables y k and z j take the value 1 if the respective lumped reaction r k and the model reaction r j are thermodynamically feasible. Here, α denotes the linear combination of reactions which yield the lumped reaction and M is a big constant. Note that when y k = 0 , i.e. the lumped reaction is not thermodynamically feasible, then j∈J model α kj z j ≤ j∈J model α kj − 1 , implying that at least one of the reactions forming the lumped reaction r k is inactive. Despite advances in applications of TMFA across different organisms and for various purposes, from estimation of realistic flux distributions 19,25 to model reduction 26,27 , the procedure of reaction lumping has not been fully specified. We would like to note that the reaction lumping does not consider removal of reactions while retaining key functional properties, as applied in stoichiometric techniques for model reduction with application of thermodynamic constraints 27,28 or without them 29,30 .
Here we introduce and precisely formulate an approach for reaction lumping and provide an efficient implementation that is applicable with genome-scale metabolic models. The proposed formulation of the approach identifies a maximal subset of metabolites that can be eliminated by lumping, leading to automation of this step in the application of the constraint-based approaches based on TMFA.

Methods
The calculation of standard Gibbs free energy, G 0 , of reactions is hindered by the presence of metabolites with unknown f G 0 . Reaction lumping aims to identify a linear combination, α , of reactions involving at least one metabolite with unknown f G 0 , such that the corresponding stoichiometric coefficient in the resulting lumped reaction is zero. As a result, G 0 of the lumped reaction can be calculated, thus facilitating the application of TMFA. The proposed approach considers the potentially intertwined relationship of metabolites with unknown f G 0 within a metabolic network, and ensures that every lumped reaction that is eventually created includes only metabolites with known f G 0 . www.nature.com/scientificreports/ Lumping program. Here, all reversible reactions are split into two irreversible reactions. If metabolites with unknown f G 0 co-occur in the lumped reactions, a linear combination may not be able to eliminate all such metabolites of unknown f G 0 at once, resulting in a lumped reaction for which G 0 can still not be calculated.
To resolve this problem, we first define the notion of a group of metabolites with unknown f G 0 . We consider the submatrix R of the stoichiometric matrix S that involves reactions whose G 0 cannot be determined due to the involvement of metabolites with unknown f G 0 (Fig. 1a). These reactions can be represented by a bipartite graph composed of reaction and metabolite nodes. A metabolite node is connected to a reaction node if the corresponding metabolite participates in the reactions. To determine the groups of metabolites with unknown f G 0 , first the metabolites with available f G 0 are removed from the bipartite graph. The connected components in the resulting graph correspond to the groups of metabolites with unknown f G 0 . For instance, in Fig. 1a, metabolite A and B, with unknown f G 0 form a group since no other metabolites of unknown f G 0 participate in the reactions r 1 , r 2 and r 7 , that include these metabolites.
The resulting lumped reaction for a group, U , of metabolites with unknown f G 0 is given by the product y = Rα in which the stoichiometry of each metabolite u ∈ U , specified with y u , is constrained to 0: The formulation in Eq. (5) corresponds to minimizing the sum of absolute values of stoichiometric coefficients of the lumped reaction, implemented by a well-established transformation of variables 31 . We make sure that every reaction involving the metabolite u ∈ U (of unknown f G 0 ), which we aim to eliminate, is associated a positive coefficient in the linear combination (Eqs. (8) and (9)). The reactions that are combined in the lumped reaction are given by the support of α , with the difference of y + and y − denoting the stoichiometric coefficients of the lumped reaction (Eqs. (6) and (7)). Clearly, the lumping program can be iteratively applied for each metabolite with unknown f G 0 (i.e. |U|= 1), which we refer to as naïve iterative procedure, or for a group of metabolites with unknown G f (i.e. |U|> 1).
The set R of reactions to be lumped is specified by the user. In the tests we conduct, we exclude biomass and exchange reactions from R since the biomass reaction is synthetic and the exchange reactions are often poorly supported with evidence. Exclusion of the biomass reaction from the set R was done to prevent an infeasible lumped reaction that would lead to blocking of the biomass reaction (see constraint in Eq. (2), for y k = 0 ). In the provided implementation, we include the option to enable or disable lumping of (internal) transport reactions. The presented result allow lumping of transport reactions, whereby adjustment for transport across membrane is performed for lumped reactions that cross compartments.

Group implementation.
With the group implementation, we aim to eliminate all metabolites in a group at once by checking if the stoichiometric coefficients for the metabolites in the group can be set to zero by a linear combination that satisfies the constraint in Eqs. (6)- (9). If this is possible, a single linear program suffices to find a lumped reaction that eliminates the metabolites in the group (Fig. 1b-blue part). An example where the group procedure can be applied is given by the group U1 on Fig. 1a (bottom). Here, the lumping of r 1 , r 2 , and r 7 simultaneously eliminates the metabolites A and B, forming U1, by solving a single linear program.
The group lumping fails to find a lumped reaction if the feasible space of the linear program is empty, i.e. there exists no linear combination that can eliminate the metabolites with unknown f G 0 in the group U (Fig. 1b). If this is the case, we proceed with sequential lumping. For instance, the group U2, of metabolites E, G, and H, cannot be eliminated at once using the group implementation. The reason is that the linear system: does not have a solution which satisfies the constraint that α 3 is of value at least 1 since r 3 contains a metabolite from the group U2. However, metabolite G (corresponding to the second row) can be eliminated by using the same linear system, by a linear combination of reactions r 4 , r 5 , and r 6 . Sequential implementation. The sequential implementation starts each iteration with a single metabolite u with unknown f G 0 from a given group U of such metabolites. Here, we form a subset of metabolites with unknown f G 0 , denoted by U ′ which initially contains only u . The sequential implementation then aims to identify a linear combination of reactions that eliminates all metabolites in U ′ , which is updated iteratively. If such a linear combination exists, the sequential implementation checks if the lumped reaction involves any other www.nature.com/scientificreports/ The proposed procedure tests if all metabolites within a group can be eliminated at once; if no solution can be found each metabolite in the group is evaluated on its own. Each lution here is preliminary; it must be checked whether there are any other metabolites in the group still involved (which would hinder the G 0 calculation). If the latter hold true, such metabolites are added sequentially while each previous solution is placed in R. www.nature.com/scientificreports/ metabolite, say w , of unknown f G 0 . In such a case, the found lumped reaction y is added to the matrix R and metabolite w is added to the set U ′ . Clearly, then, we need to reiterate the solving of the linear program to ensure that all metabolites in the updated set U ′ are eliminated ( Fig. 1b-magenta part). Note that in solving the linear program in Eqs. (5)-(9), we do not consider the existence of several alternative solutions, to prevent backtracking. This is justified by our aim to eliminate a maximal rather than the maximum number of metabolites with unknown f G 0 via reaction lumping. However, this may have implications in constraints in TMFA applications. The sequential implementation necessitates solving of several linear programs with an increasing size of R . The latter is due to the fact that in each iteration R is extended with the currently found solution. For instance, let us consider group U2 consisting of three metabolites that cannot be eliminated by the group implementation. The sequential solution procedure would first search for a linear combination that eliminates metabolite E. While the linear combination y = r 3 + 2r 4 + r 5 removes metabolite E, it also involves metabolite G which is also of unknown f G 0 . Thus, the set U ′ is enlarged to now include metabolite E and G and the matrix R is augmented with the found solution y . The solution to the next linear program identified y + 2r 6 as a linear combination that eliminates metabolite G ; however, the solution includes the metabolite H of unknown f G 0 . The final iteration is performed with the set U ′ = {E, G, H} and the matrix R enlarged again by previously found linear combination; however, no solution that eliminates all three metabolites in U ′ can be identified in this case. Proceeding with metabolite G , two iterations are needed to identify the lumped reaction, r 4 + r 5 + r 6 , that eliminates G. Similarly to E, no linear combination eliminates metabolite H while constraining metabolite E and G to zero which are added to U ′ in the proceedings to find a lumped reaction. In total eight linear programs need to be solved to exhaustively check if any metabolites of group U2 can be eliminated. In contrast to the group implementation, the sequential implementation is capable to identify a lumped reaction that eliminates the metabolite G with unknown G f . Furthermore, in contrast to the group implementation, which eliminates the group U1 in a single linear program, the sequential implementation requires solving four linear programs to arrive at the same solution.
Combined procedure. The combined implementation applies first the group and then sequential implementation on each of the groups of metabolites resulting from the partition. Therefore, it ensures maximizing the number of metabolites with unknown G f that can be eliminated via lumping while solving a fewer number of linear programs (Fig. 1b). In effect, this approach takes the advantages of the speed of the group implementation, due to the reduced number of programs to be solved, and the exhaustive search of the sequential lumping.
TMFA and variability analysis. We implemented TMFA by allowing the concentrations to range between 1 µM and 20 mM. In the case of E. coli the range for Glycerophosphoglycerol (g3pg), Sn-Glycero-3-phosphoethanolamine (g3pe), and water in the cytosol had to be relaxed to 1 µM -1.4 * 10 55 M, 1 µM-6003 M and 14.92 pM-20 mM, respectively, for both the original model and the model with lumped reactions. Additionally, for the model with lumping, the upper boundary of Nicotinamide adeneine dinucleotide (nadh) had to be expanded to 41.2 mM to obtain 90% of optimal biomass from FBA. Note that similar relaxations need to be performed in applications of TMFA with other models 18,32 . We then determined the variability of G j for every reaction j , by calculating the minimum and maximum values it takes in the original model and the model with the lumped reactions at 90% of the optimal biomass from FBA.
Technical details of the implementation. The lumping procedure was implemented in MATLAB (v 9.6.0 33 ), requiring MATLAB's solver 'intlinprog' as it outperformed other solvers within a benchmark showcase 34 . All options in the solver were left to default apart from 'MaxTime' which was set to 120 s (which is roughly 10 times greater than the longest duration observed in all investigated cases). The time limit had no influence on the outcome as it never caused a premature stop. The statistical analysis was conducted in R (v. 4.0.2) requiring the packages ggplot2 35 , ggnewscale 36 , gridExtra 37 , RColorBrewer 38 , mdthemes 39 and cowplot 40 . Figures 1 and 3 were created using LucidChart 41 . All computations were done on a Desktop PC with AMD Ryzen 5 processor (6 × 3.60 GHz) and 32 GB DDR4 RAM.

Results
Our proposed lumping procedure is designed to identify lumped reactions for any model with incomplete thermodynamic data to enforce stricter thermodynamic constraints, see Eqs. (1)-(4). As stated above, the existing applications of TMFA 11,18 do not specify how to systematically find such combination of reactions that lead to elimination of metabolites of unknown f G 0 . The proposed combined procedure can be applied to genomescale models, independent of the extent of available information and its output can be directly used with existing implementation of TMFA 42 .
Lumping decreases the number of reactions with undetermined G 0 . We applied the combined lumping procedure with the genome-scale models of three organisms E. coli 43  All metabolites with unknown f G are split into 280 groups by following our implementation ("Methods"). By applying the proposed lumping procedure, we were able to investigate the extent to which metabolites with unknown f G 0 could be removed, thus leading to more reactions with specified G 0 . For all three models we could eliminate a sizeable proportion of metabolites with unknown f G 0 (Fig. 2). In the iJR904 model, elimination of 0.5% metabolites (with unknown f G 0 ) led to a decrease of 1.4% of reactions with unknown G 0 (the percentages are with respect to the total number of metabolites and reactions). The decrease was not due to being able to estimate the unknown G 0 , but rather to the inclusion of lumped reactions in the model for which G 0 could be readily determined from the provided thermodynamics data. For instance, inorganic triophosphate (PPPI) cannot be removed, by lumping, from the cytosol due to the reactions in which it occurs (Fig. 3a). In the model of iBsu1103, 15.4% of metabolites (with unknown f G 0 ) could be eliminated, leading to a decrease of 19.7% for the reactions with unspecified G 0 . Therefore, our procedure eliminated more than half of the metabolites with unknown f G 0 . For example, two metabolites with unknown f G 0 can be removed from this model by lumping five reactions (Fig. 3b). Similarly, for the redHuman model, 15.4% of metabolites with unknown f G 0 could be eliminated, leading to a decrease of 16.4% in reactions with known G 0 . Thus, irrespective of the relative amount of available information regarding f G 0 , our proposed lumping procedure identified combinations of reactions with unidentified G 0 which eliminates the maximal number of metabolites with unknown G f .

Figure 2.
Application of lumping on genome-scale metabolic models. The percentage of reactions with unknown G 0 is in three GEMs, (a) iJR904, (b) iBsu1103, and (c) redHuman, reduced after lumping due to elimination of metabolites with unknown ∆ f G 0 . Shown is the percentage of unknown G-values in relation to the total number of present metabolites (mets) and reactions, respectively, (rxns) before and after the lumping procedure was applied. The number of eliminated metabolites with unknown ∆ f G 0 is with respect to the total number of metabolites. The investigated genome-scale models have low, high and medium number of unknown G 0 and ∆ f G 0 , respectively, and pertain to organisms of increasing complexity. www.nature.com/scientificreports/ Combined procedure is more efficient in comparison to the sequential implementation alone. A lumping procedure that naively attempts to eliminate every metabolite of unknown f G 0 would rely solely on the iterative procedure. This approach would require solving at least one linear program per metabolite of unknown f G 0 to identify a lumped reaction that eliminates the metabolite. The novelty of our solution consists of reducing the number of linear programs to be solved, by proposing the combined procedure that relies on the group and sequential implementation executed on each group of metabolites (see "Methods"). The group implementation has the potential to speed up the elimination of the metabolites in a group by solving a single linear program. Here, we first investigated the advantages provided by the combined procedure measured by the difference between the total duration and number of linear programs solved in comparison to the naïve usage of the sequential procedure. We also compared the time spent on solving the linear programs within the group and sequential component of the combined procedure ( Table 1). The combined procedure for the iBsu1103 model required solving 1006 linear program in 196 s. The linear program which took the least amount of time required 0.09 s, while the slowest required 14.31 s ( Table 1). The lumping based on the sequential implementation only necessitated solving of three linear program fewer in comparison to the combined procedure, the duration of which ranges from 0.12 to 0.33 s. The entire procedure relying on the sequential implementation took only 179 s and was faster by 17 s in comparison to the combined ( Table 2).
Despite the better performance (in time) for the sequential procedure on the iBsu1103 model, the potential of the combined implementation was demonstrated on the case of the redHuman model. Here, the combined procedure required solving 2650 linear programs in 2659 s. The fastest linear program took 0.19 s, while the slowest required 13.59 s. The lumping based only on the sequential procedure required 4500 linear programs, with time that ranged from 0.19 to 0.9 s. In comparison to the combined procedure, the application of only the sequential took 772 s longer and required solving 1850 more linear programs (Table 2).  Table 1. Comparison of the combined and sequential implementation. The comparison is carried out on two genome-scale models, iBsu1103 and redHuman. Shown are the number of linear programs (LPs), total time for the execution of the procedure, and the minimum and the maximum time (in seconds) needed to solve a single linear program within the entire procedure.

No. LPs Total time (s) Min time of LP (s) Max time of LP (s) iBsu1103
Sequential www.nature.com/scientificreports/ Next, we investigated the contribution of the group and sequential implementation to the combined procedure. The proposed combined lumping procedure required 280 and 2370 linear programs for the group and the sequential implementation parts, respectively, taking all together ~ 20 min (Fig. 4a). An overall trend is apparent that, in most cases, the fewer linear programs for the group implementation took longer than the linear programs for individual linear programs in the sequential approach (Fig. 4a). The most prominent outliers in terms of the time needed to solve the linear programs corresponded to the largest groups, of size 200 for the redHuman Table 2. Properties of the investigated genome-scale metabolic models. Shown are the number of metabolites and the number of those with unknown ∆ f G 0 , the number of reactions and the number of those with unknown G 0 , the dimensions of the matrix R of reactions used in the lumping procedure, and the number of metabolite groups.

Model
No. metabolites (unknown f G 0 ) No. reactions (unknown G 0 ) R  The iBsu1103 model has a higher degree of missing thermodynamic data, leading to groups of sizes as large as 419 (out of 434) metabolites with missing f G 0 (Fig. 4b). As only seven groups were found, the trend of the group implementation taking longer to calculate was not as striking as for the redHuman model, but was still present with respect to the median time (0.199 vs 0.162). The group implementation was not feasible for the largest group, but the corresponding linear program took 13.59 s to solve. As a result, for each metabolite in this group, the sequential implementation had to be employed. The metabolites with unknown f G are eliminated stepwise compared to the sudden drop, displayed in the redHuman model (Fig. 4b). The linear programs for the group implementation in the case of the smaller, remaining groups are feasible (Fig. 4b). As there were only seven groups present, no reliable trend regarding group size and success could be further established.
The application of the combined procedure on the two models demonstrated that the group implementation provides speed-up due to a reduced number of linear programs to solve. In addition, our results showed that the group procedure adds marginal increase in running time, in case it needs to be followed by the sequential procedure.
Benefits of the lumping procedure in TMFA applications. To demonstrate the advantages of using the models that consider lumped reactions, based on the proposed procedures, we implemented TMFA for the models of E. coli and B. subtilis (see "Methods"). We then computed the range of values that G j takes for every reaction j in the original model and the model with lumped reactions at the optimal biomass obtained from TMFA. This analysis allowed us to classify the reactions into irreversible and reversible based on the sign that the maximum G j takes. The reversible reactions could further be divided into those whose range is reduced upon lumping and those whose ranges show shifts between the two models.
In the case of E. coli's model, we found that the lumping changed the number of irreversible reactions from 353 in the original model to 350 in the one that also considered the lumped reactions to impose thermodynamic constraints ( Table 3). As a result, the number of reversible reaction was reduced from 707 in the original to 704 in the model with lumped reactions ( Table 3). The reactions deemed irreversible in the E. coli model with lumped reactions include: UAG2E, DAPabc, and ACMAMUT (see Supplementary Figure S1). Expectedly, the consideration of lumped reactions decreased the ranges of 116 reversible reactions (see Supplementary Figure S2).
In the case of B. subtilis' model, we observed that there is no change in the number of irreversible and reversible reactions with and without consideration of lumped reaction (Table 3). However, the ranges of Gibbs free energy were substantially reduced for 101 reactions (see Supplementary Figure S3). Therefore, we conclude that the lumping procedure does have an effect on the findings from TMFA and can be effectively used to obtain more constrained predictions, particularly for metabolite concentrations (that are interlinked with the values of Gibbs free energy).

Discussion
The idea of reaction lumping has been introduced to provide additional constraints for reactions with unknown G 0 , but can be linearly combined into an overall lumped reaction whose G 0 can be easily determined based on the thermodynamic data for the modelled components. The applications of TMFA are in part hampered by the lack of good coverage of thermodynamic data 19,32,45 , leaving the flux solution space less constrained. Here, we proposed an algorithm which fills the gap between the introduction of the idea of reaction lumping and the TMFA approach that imposes special constraints to the lumped reactions. Our procedure starts with the identification of groups of metabolites with unknown f G 0 , identified by inspecting the connected components of the bipartite metabolite reaction graph. The identification of lumping reactions can thereby be solved independently on each of the groups using the combined lumping procedure. The combined lumping procedure consists of solving one linear program, for the group implementation, and a series of linear programs that are iteratively solved, for the iterative implementation. The group component can be interpreted as a shortcut, by testing whether all metabolites in a group can be eliminated at once. In addition, the proposed combined procedure maximizes the number of metabolites with unknown f G that can be eliminated via reaction lumping.
In general, the linear program in the group implementation takes a longer time to solve due to the larger number of constraints it includes. Although the sequential procedure starts with smaller linear programs, requiring shorter time to be solved, the iterative process leads to an increasing number of such program that include Table 3. Distribution of reversible/irreversible reactions with and without lumping. Shown are the numbers or reversible and irreversible reactions for the models iJ906 and iBsu1103 with and without lumping. A reaction is defined as irreversible if respective G-range is strictly negative, which is determined with a variability analysis ensuring optimal biomass. www.nature.com/scientificreports/ a larger number of both constraints and variables. The total duration, thus, depends on the particularities of the analyzed model. In our comparative analysis we presented a case when the combined procedure performs worse than the naive sequential implementation applied alone. In this case, only few groups of metabolites with unknown f G 0 were found, including a dominant group that includes almost all such metabolites. The situation differs in the scenario where the groups of metabolites with unknown G f are more evenly distributed, leading to the advantages of the combined procedure. The recently published toolbox to conduct TMFA, called matTFA, states that any reaction involving metabolites with unknown f G 0 "will not be constrained with thermodynamics" 42 . Our proposed group lumping procedure investigates whether such reactions can be constrained with thermodynamics. The shown decrease of reactions with unknown G 0 , with respect to the total number of reactions, corresponds to a gain in constraints which has the potential to further restrict the solution space without the need of any additional data.
Our comparative analysis on three genome-scale models that differ in terms of size and complexity of the modelled metabolic pathways shows that the lumping procedure is time-efficient, systematic, and results on reproducible findings. The proposed combined procedure clearly defines how lumped reactions are formed and ensures that G 0 can be calculated for each lumped reaction. In contrast, previous studies only provide a list of lumped reactions, without specifying how they were obtained, thus not ensuring reproducibility 18 . The reproducibility of our findings is further guaranteed by the provided implementation of the proposed procedure and accessibility of all data, following the FAIR principles. The implementation is general to allow the identification of lumped reactions in any metabolic model complying with the input specifications. Therefore, the systematic way for identification of lumped reactions by the proposed combined procedure has the potential to further propel the applications of TMFA, due to the increasing availability of quantitative metabolomics data 46 . Indeed, our application of TMFA with and without consideration of lumped reactions in two genome-scale models shows that the lumping procedure provides for more constrained predictions of metabolic phenotypes.
The provided procedure allows some flexibility with respect to the choice of reactions to be lumped. In our analysis, we did not consider synthetic and exchange reactions in the lumping, and future efforts will be dedicated to the consideration of these cases. In addition, follow-up studies will be dedicated to investigative the effect of alternative solution in the detection of lumped reactions on the possibility to maximize the number or metabolites with unknown f G 0 that can be eliminated by the proposed combined procedure.