A genome-scale metabolic model of Saccharomyces cerevisiae that integrates expression constraints and reaction thermodynamics

Eukaryotic organisms play an important role in industrial biotechnology, from the production of fuels and commodity chemicals to therapeutic proteins. To optimize these industrial systems, a mathematical approach can be used to integrate the description of multiple biological networks into a single model for cell analysis and engineering. One of the most accurate models of biological systems include Expression and Thermodynamics FLux (ETFL), which efficiently integrates RNA and protein synthesis with traditional genome-scale metabolic models. However, ETFL is so far only applicable for E. coli. To adapt this model for Saccharomyces cerevisiae, we developed yETFL, in which we augmented the original formulation with additional considerations for biomass composition, the compartmentalized cellular expression system, and the energetic costs of biological processes. We demonstrated the ability of yETFL to predict maximum growth rate, essential genes, and the phenotype of overflow metabolism. We envision that the presented formulation can be extended to a wide range of eukaryotic organisms to the benefit of academic and industrial research.


Implementing multiple expression systems
Since an important difference between bacterial and eukaryotic cells is the presence of multiple expression systems in the latter, we extended the original ETFL formulation to be able to consider multiple expression systems. Consider the following two constraints from the original formulation: where ! and ! represent the amounts of ribosome and RNA polymerase that are allocated to transcribe and translate the gene , respectively. Equation (1) implies that the sum of the RNA polymerases allocated to transcribe different peptides must be equal to total available RNA polymerases, i.e., "#$% . Similarly, Equation (2) implies that the sum of the ribosomes allocated to translate different peptides must be equal to total available ribosomes, i.e., &'( . These constraints assume that all the genes are transcribed and translated by a single pool of RNA polymerase and ribosome, respectively. However, there are multiple types of RNA polymerase and ribosome in eukaryotes. Without loss of generality and to make it clearer, we explain the extension for the case that there exist two types (pools) of RNA polymerase. Similar procedure can be applied to the cases that the number of RNA polymerases is different. Also, multiple types of ribosomes can be defined in the same way. Suppose that the cell has two RNA polymerases, RNAP ) and RNAP * .
Per each type of RNA polymerase, an enzymatic pool can be considered, which is represented as "#$% + , ∈ {1,2}. The genes can be divided into at most three sets: (i) the genes transcribed by RNAP ) , represented by ) , (ii) the genes transcribed by RNAP * , represented by * and (iii) the genes transcribed by both enzymes, represented by ),* . Instead of Equation (1), the following constraints are applied: The Equation (3) and Equation (4) ensure that, respectively, the sum of the allocated RNA polymerase to the genes in the set ) and * cannot exceed the total available RNA polymerase in the pool. The equality in Equation (1) is converted to inequality in these equations to let part of the RNA polymerases be allocated to the genes in the set ),* . Then, the Equation (5) enforces the equality between the total allocated RNA polymerases and the available enzyme in the two pools. By maximizing growth rate subject to these constraints, the cheaper RNA polymerase, i.e., the one with the less metabolic demand, takes over all the genes from the shared set ( ),* ).

Thermodynamic curation of the S. cerevisiae genome-scale model Yeast8
Including thermodynamic information in the S. cerevisiae GEM Yeast8 [1] allows us to constrain the reaction directionalities according to the second law of thermodynamics, that is, according to the Gibbs free energy of each reaction. Towards this end, we performed a thermodynamic curation of Yeast8. This curation was performed in two stages. In the first stage, we estimated the Gibbs free energy of formation of the compounds and the corresponding error for the estimation. First, we used MetaNetX (http://www.metanetx.org) [2] to annotate the compounds of Yeast8 with identifiers from public databases, such as KEGG [3], CHEBI [4], and SEED [5]. From these databases we obtained the structures of the compounds in the form of SMILES. For the compounds lacking SMILES, we manually derived them based on the structure and composition of the metabolites, including mainly lipids and phospholipids. We next used Marvin (version 18.1, 2018, ChemAxon http://www.chemaxon.com) to transform the SMILES into their major protonation state at pH 7 and to generate MDL Molfiles. These last ones were used in the Group Contribution Method (GCM) to estimate the standard Gibbs free energy of the formation of the compounds as well as the error of the estimation [6].
Following the described steps, we estimated the standard Gibbs free energy of formation for 1092 of 1326 total unique metabolites from Yeast8.
In the second stage, we estimated the Gibbs free energy of the reactions that are part of the Yeast8. First, we integrated in the Yeast8 model the thermodynamic properties of the compartments present in the model, including pH, ionic strength, membrane potentials and generic compartment concentration ranges (Table S5 and Table S6). Then, the standard Gibbs free energies of formation are adjusted following the Debye-Hükel approximation to the corresponding pH and ionic strength [7]. The corrected Gibbs free energies of formation were then used to compute Gibbs free energy of reactions. Considering a reaction with m components, its Gibbs free energy, ∆ 2 G 3 is: where = 1, … , . 4 is the stoichiometric coefficient of compound ; ∆ 5 G 4 36 is the standard Gibbs free energy of formation of compound ; 4 is the concentration of the compound ; is the ideal gas constant, = 8.31 • 10 :; <= < 76! , and is the temperature. In this case, = 298 .
In the case of transport reactions, the ∆ 2 G 3 is corrected to account for the differences in pH and membrane potential between the compartments [7].
It is worth noting here, that these calculations and estimations of Gibbs free energies were performed for compounds and reactions under aqueous conditions. The Yeast8 model contains compounds that are part of membranes, and therefore they are no longer in an aqueous environment. We did not estimate ∆ 2 G 3 for 1304 reactions that did not take place in aqueous environment, which resulted in a thermodynamic curation of 1880 out of 3991 reactions in the Yeast8 GEM.
In its current version, protons can freely circulate between the cytosol and the mitochondria in Yeast8. Seeking to have an improvement of the transport of protons across the mitochondrial membrane, we introduced in the Yeast8 GEM an additional compartment, to account for the mitochondrial inner-membrane space. This improvement allows to correctly capture the transport of protons in the electron transport chain (ETC) for the production of ATP by the enzyme ATP Synthase. Furthermore, this adjustment of proton transport will improve the estimation of the Gibbs free energy of the corresponding reactions as it will allow to account for the correct thermodynamic properties of the compartments involved (i.e., pH, ionic strength and membrane potential).
The thermodynamically curated Yeast8 model as well as the thermodynamic information (Gibbs free energy of formation for the compounds) are publicly available in Zenodo 10.5281/zenodo.4778047.

Comparing ETFL and the previous formulation of ME-models
In this section, we provide a general comparison between the ETFL [25] and the other MEmodel formulations [23,24].
A key difference between the two formulations is that ETFL enables the integration of thermodynamic constraints and metabolomics data. Moreover, ETFL can account for multiple expression systems. On the other hand, the stable RNA splicing, and transcription initiation are only formulated in the other ME-model formulations. These functions can be also included in ETFL by expanding the formulation around the related processes. The transcription initiation is not completely neglected in ETFL but instead lumped in the other processes. This lumping is similar to the lumping used for translation initiation and elongation in all ME-models. Since it was shown that including such processes does not impact the predictions of the model in the protein limitation studies [24], and the transcription initiation is more complex in eukaryotic organisms, we did not model them at this stage. Also, in the other formulations of ME-models, the formation of metalloproteins is explicitly considered. In ETFL, the ionic requirements are lumped in the biomass reaction as it was done in the FBA models.
There were two formulations in the other ME-models regarding the enzyme mass balances. In the first formulation, presented in O'brien et al. [23], the enzymes appeared as metabolites in the reactions, and to properly account for mass balances, two pseudometabolites per enzyme were added to the model. Consider this toy example, where the following reaction with flux v1 is catalyzed by enzyme E: To associate the enzyme abundance to the flux v1, the following reactions were added: @ : enzyme → enzyme A&'BC + coupling (7) ) : + enzyme A&'BC → + enzyme (8) where coupling and enzyme A&'BC are pseudometabolites introduced to facilitate the addition of constraints and = D E %&' . While the mass balance constraint associated with enzyme A&'BC is an equality, the mass balance for coupling is an inequality. Writing the mass balances for the enzyme and the pseudometabolites, the following constraints can be derived: ) − @ = 0 (11) where FGHI and JCK are respectively the rates of translation and degradation of the enzyme.
In the second formulation of ME-models, presented in Lloyd et al. [24], to simplify the solving process, the inequality constraint in Equation 12 was converted to an equality. This assumes that all the enzymes catalyze their reactions with the maximum possible rate at the optimal solution. This way, the two pseudometabolites and their corresponding reactions were removed.
In the ETFL formulation, no pseudometabolite is considered and the enzymes are not involved in the reactions as metabolites [25]. For the reaction presented by Equation 6, the following two constraints are imposed: where E is the total concentration of the enzyme and Equation

Supplementary Tables
Mass balances for the uncharged tRNAs Mass balances for the rRNAs    Table S3: definition of the variables, parameters and sets used in Table S2.

Symbol Definition
Specific growth rate