Introduction

Microbiome engineering is an upcoming discipline that aims to manipulate, complement or restore the functionality of existing damaged communities, e.g., contaminated soils, by adding specific new metabolic capabilities.1 A rational engineering approach requires a detailed understanding of general principles of the functioning of the microbial community and its physiological adaptations to perturbations, but such understanding is currently lacking and fragmentary.1,2 The technically most simple way to provide new metabolic capacities to an existing microbial community is by strain addition (what one could call an N+1 scenario).3,4 After an initial screening of the existing capacity of the microbial community, one or more preselected and well-characterized strains with the intended complementation could be prepared, formulated and inoculated into the community. Depending on the aims, such inoculants should maintain and reproduce for longer-term inside the resident community or only deploy their metabolic capacity transiently.3,4

Inoculation of preselected strains has been widely practised for pollutant bioaugmentation, using bacteria with particular metabolic capabilities that enable them to efficiently degrade and grow on common pollutants such as toxic aromatic compounds.5 However, even the simplest inoculations and N + 1-strategies are rarely effective because it is insufficiently understood what inoculants need to establish successfully within a (new) existing community, and how they need to adjust their physiology to meet the requirements of the new environment and degrade the desired toxic compound(s). Modeling strategies based on the integration of a variety of (nowadays more easily) accessible condition-specific omics data, would help to better understand and predict how cellular regulation and physiology at different growth conditions and environments interplay. However, the impact and advantage of such integrative analysis are not yet explored to its full extent.6 We propose and demonstrate here that combining comprehensive genome-wide transcriptomics, exometabolomics and metabolic modeling can better predict physiological adaptation.

Metabolic modeling has largely advanced through the development of GEnome-scale Metabolic models (GEMs) and constraint-based modeling techniques such as Flux Balance Analysis (FBA). GEMs can be built from the annotated genomes and they describe an organism’s metabolism as completely as possible, linking genotype to metabolic phenotypes.7 GEMs encompass metabolites, metabolic reactions, and genes coding for the enzymes catalyzing the reactions. Together with FBA, GEMs predict steady-state fluxes,8,9 and therefore, they can predict cellular physiology. While the genome specifies the complete set of biochemical reactions which the cell can potentially carry out, the actual enzymatic capacity at each physiological condition is orchestrated by regulatory networks in the cell. GEMs do not explicitly consider regulation, whose effects are better reflected in the global transcriptome and the metabolome.10,11,12,13,14 FBA approaches have been extended with RNAseq and metabolomics data to capture cell regulation and more accurately describe cellular metabolic behavior.15 For example, transcriptional regulation of gene expression has been linked to GEMs, either by taking into account the absolute expression values, scoring genes and subsequently reaction fluxes as active or non-active based on their expression,16,17,18 or by incorporating relative gene-expression.14,18 Use of relative gene expression is assuming that the relative changes between two conditions correlate with the resulting differential flux profiles. Both approaches can lead to condition-specific GEMs that are more effective for inferring the actual biochemical activity and the observed physiology of the microorganism.

As a study system for predicting physiology from an integrated GEM-transcriptome-metabolome approach, we here use Pseudomonas veronii 1YdBTEX2. Strain 1YdBTEX2 is capable of degrading a variety of mono-aromatic hydrocarbons such as benzene, toluene, ethylbenzene, and m- and p-xylene (BTEX).20,21,22 The ability of P. veronii 1YdBTEX2 to grow in contaminated environments makes it a promising candidate for rational complementation of microbial communities in contaminated soils.23 Based on an available manually curated high-quality genome,22 we reconstructed the first GEM for P. veronii 1YdBTEX2 (iPsvr). Genome-wide transcription changes and exometabolome compounds were measured during growth of P. veronii on toluene, in exponential and in stationary phase. Transcriptome and exometabolome data were integrated into the iPsvr using the recently developed tool REMI (Relative Expression and Metabolomics Integrations).19 REMI integrates gene expression, thermodynamic, and metabolite abundance data into metabolic models, and then maximizes the consistency between differential gene expressions, metabolite concentrations and levels of metabolic fluxes for any given pair of conditions. Therefore, by building a context-specific model for each condition, it allows elucidating metabolic states and predicting cellular phenotypes and flux distributions of condition-specific differences e.g., wild type vs mutation. Two obtained metabolic models representing exponential and stationary physiologies were then used to evaluate growth rates and the production of biomass precursors, and model predictions were compared to the experimentally observed values. Although the temporal variations of the growth rate cannot be predicted using GEMs,24 we showed that introducing the additional regulatory information from gene expression and metabolomics data into GEMs allows for consistently estimating growth rates at different growth phases. Finally, we incorporated into iPsvr previously published transcriptomics data of P. veronii transits from liquid culture to sand22 to understand its physiological adaptation in soil. Our work shows strong consistency of model outputs with the experimental data, manifesting that integration of condition-specific omics data into a curated GEM constitutes a major improvement for prediction of metabolic reprogramming during adaptation.

Results

Developing an integrated genomic-transcriptomic-metabolomic workflow

To develop a pipeline that integrates genomics with transcriptomic and metabolomic data we advanced in three stages: (1) Quantify the cellular states at each phenotype, by genome-wide transcriptomics and exometabolomic data from spent media composition (Fig. 1a); (2) Construct a GEM for P. veronii strain 1YdBTEX2 (iPsvr), gap-fill missing parts of the metabolism (compounds and reactions), complement genome annotation using the transcriptomics and exometabolomic data and estimate the steady-state growth rate using FBA (Fig. 1b); and (3) Link the interrelationships between growth phases and the differentially expressed genes and metabolite abundances by statistical inference and by REMI.

Fig. 1: Schematic overview of the integrated genomic-transcriptomic-metabolomic pipeline applied in this study.
figure 1

a Stage 1: Relative gene expression and exometabolomic data were determined and analyzed, and these data were used to gap-fill iPsvr at blocked reactions. b Stage 2: A genome-scale metabolic model (GEM) of P. veronii strain 1YdBTEX2, iPsvr, was reconstructed and flux-balance analysis (FBA) was performed to simulate the growth of the cell. c Stage 3: Additional relative differential gene expression and metabolite abundance data were integrated into the metabolic model with REMI and physiology-specific models were built. Here, the REMI methodology is illustrated on a section of inferred iPsvr glycine, serine and threonine metabolism and iPsvr-EXPO (exponential phase) and iPsvr-STAT (stationary phase) as the two physiology-specific models. Significantly differentially expressed genes (here, serB, glyA, DSD1 and GGAT) are outlined in boxes. The thickness of arrows designates the fold-change in estimated fluxes, where green arrows indicate consistency with the gene-expression fold-change values, and the red ones inconsistencies. Measured metabolite concentrations (here: pyr, sarcs, gly and glx) are indicated in green if the values are consistent with estimated fluxes and in red otherwise. Phosphoserine phosphatase, SerB; Serine hydroxymethyltransferase, GlyA; D-serine dehydratase, DSD1; Glyoxylate aminotransferase, GGAT; Pyruvate, pyr; Sarcosine, sarcs; Glycine, gly; Glyoxylate, glx.

In this work, we used the pipeline to: (i) generate two growth-phase-specific models of P. veronii growing on toluene, iPsvr-EXPO and iPsvr-STAT, which correspond to the exponential and the stationary phase, respectively, and employ them to predict quantitative and dynamic readouts of P. veronii metabolism in the two conditions (Fig. 1c); and (ii) build two context specific models, iPsvr-LIQ and iPsvr_SAND, for the growth on liquid medium and in a sand environment, respectively, by integrating previously published gene expression data22 into iPsvr. The models were then used to predict the physiology in a liquid-to-sand transition.

Genome-wide gene expression and metabolite formation over time

Whole-genome gene expression profiles and metabolite formation in the spent medium were analyzed in P. veronii cultures growing in liquid minimal medium with toluene as sole carbon and energy source, sampled at 0 h (T0h), 4 h (T4h, EXPO) and 24 h (T24 h, STAT) after inoculation.

Based on growth kinetics of P. veronii 1YdBTEX2 on toluene, we identified 4h and 24 h to be representative time points for P. veronii 1YdBTEX2 metabolic activity at the early exponential growth and late stationary phase, respectively. Genome-wide gene expression was quantified by mapping Illumina 100 nucleotide long single–end sequencing reads from deeply sequenced cDNA libraries to the protein coding genes in P. veronii genome (read numbers indicated in Table S1). For each sampling time point, four replicates clustered closely together, with slightly higher variability observed among the T24h replicates (Fig. S1A). A pair-wise comparison of expression levels showed that 1458 (818 up-regulated and 640 down-regulated) out of the total 6943 genes (21%) were significantly differentially expressed between EXPO and STAT phase cells, with at least 2-log fold-change induction (false discovery rate [FDR] < 0.05) (Fig. 2a, b).

Fig. 2: Genome-wide gene-expression and metabolite abundance differences in P. veronii 1YdBTEX in exponential (EXPO) relative to stationary (STAT) phase.
figure 2

a Smear-plot of global gene expression intensity (2 log counts per kilobase per million, CPKM) versus expression changes (2 log fold change) in EXPO vs. STAT. In gray, genes not statistically differentially expressed (logFC < 1, FDR > 0.05, P > 0.01); magenta, genes with lower, and dark purple, genes with higher expression in EXPO (+). b Differential expression per gene between EXPO and STAT plotted as a function of genomic location (chromosome 1, chr1; chromosome 2, chr2; and plasmid, plm; organized according to locus_tag number). Bars indicate 2log-fold change, dark purple and pink denote statistically significant higher and lower expressed genes in EXPO, respectively. c Heatmap of 164 metabolites annotated with their KEGG IDs showing clustering at different culture sampling time points. Metabolites are clustered (x-axis) in three major groups (left to right): (i) accumulation at T24h (STAT) (pink box), (ii) accumulation at T4 continued to T24 (yellow box), and (iii) accumulation at T4h (EXPO) followed at depletion at T24h (STAT) (purple box).

A priori, based on the genome annotation, a subset of 1241 “metabolic genes” were used in the GEM reconstruction (iPsvr). Out of these 1241, 300 (21%) were significantly differentially expressed in EXPO vs. STAT phase cells (FDR < 0.05) (Table S2). The transition to STAT phase in bacteria is characterized by growth arrest in response to several factors, such as carbon and nutrient depletion, the accumulation of toxic compounds and environmental stress, which decrease ribosomal activity and therefore protein synthesis. This is consistent with toluene becoming depleted in STAT phase. As anticipated, enriched GO terms for the category “Biological Process” among the differentially expressed genes between EXPO and STAT included “protein folding” (GO:0006457), “tRNA aminoacylation for protein translation” (GO:0006418), “intracellular protein transmembrane transport” (GO:0065002) and “regulation of transcription, DNA-templated” (GO:0006355) (Table S3), thus indicating cells to be more active in EXPO phase, as expected. Consistent with nutrients becoming depleted in STAT phase, the terms “benzoate catabolic process via hydroxylation” (GO:0043640) and “tricarboxylic acid cycle” (GO:0006099) (Table S3), important for aromatic compound catabolism, were under-represented in the STAT phase transcriptome.

The untargeted metabolomic analyses of the spent medium detected 1630 (positively charged) and 3509 (negatively charged) distinct ion species or metabolite features. Unsupervised principal component analysis yielded three distinct clusters indicating metabolic phenotype differentiation over time, from inoculum to stationary phase (Fig. S1B). Similar to the transcriptomics data, a greater variability was observed among the T24h replicates.

Temporal patterns of annotated metabolites by HMDB database (accurate mass) matching25 showed a significant increase in the spent media over time of the majority of the metabolites implicated in the toluene and benzene degradation pathways and central carbon pathways, including glycolysis, purine and pyrimidine metabolism and amino acid metabolism (Fig. 2c). This implies their production by the bacteria and progressive release into the media. One specific group of metabolites, including citrate (C00158), glutamate (C00025), glutamine (C00064), and aspartate (C00049), accumulated significantly in EXPO phase (T4h) in comparison to T0h, followed by lower levels in STAT phase (Fig. 2c). This suggests their excretion in EXPO phase and subsequent re-consumption when other nutrients became limiting (Fig. 2c).

Genome-scale metabolic model (GEM) of P. veronii strain 1YdBTEX2 (iPsvr)

A draft GEM was generated from the curated P. veronii genome22 by using the RAVEN toolbox26 (Fig. 1b). The draft GEM was gap-filled by iterative manual curation until we obtained a model able to carry non-zero flux through the biomass reaction at steady state. This signified cell ‘growth’, and indicated that the model was performing as expected for a biological system. The cell biomass composition and compartment information were derived from two available models of other Pseudomonads species: P. putida27,28,29 and P. stutzeri30 (see Methods).

We used Biolog growth data of P. veronii 1YdBTEX2 incubated with 26 different carbon substrates to further curate and refine the model. In comparison to a no-carbon control, Biolog indicated growth of strain 1YdBTEX2 on 15 out of the tested 26 substrates as the sole carbon source (Table S4). We confirmed the experimental observations using iPsvr for 20 substrates, whereas the model could not grow on the remaining six substrates (D-Xylose, D-Galacturonic Acid, L-Phenylalanine, D-Glucosaminic Acid, Itaconic Acid, and Putrescine). We hence gap-filled the model to include these additional metabolic tasks (Table S4). This gap-filling improved the metabolic network connectivity, and it resulted in a more versatile model that can be used for studies on an extended range of substrates.

The scope of iPsvr GEM was further widened by restoring the connectivity of the remaining ‘blocked’ reactions, i.e., isolated reactions that carry zero flux at any condition. To this end, we explicitly considered the empirical gene-expression and exometabolomics data. We first used a graph-based algorithm (see Methods) to decompose the iPsvr metabolic network into its main subnetworks of 1370 reactions and 23 blocked reactions/pathways of different lengths, with the longest blocked pathway consisting of seven reaction steps (Table S5). Out of 191 blocked reactions/pathways, we identified those associated with differentially expressed genes between the two growth conditions and the ones whose participating metabolites were present in the exometabolomic data. The identified reactions/pathways were next unblocked by gap-filling as described in the Methods section. Interestingly, we identified gap-filling reactions that had been annotated to P. veronii genes with RAVEN but had a lower score than the ones chosen as a baseline for the draft reconstruction of iPsvr. The gap-filling algorithm introduced in total 71 new metabolic reactions together with their corresponding 28 genes to iPsvr (Table 1 and Table S6). The reconstructed and gap-filled iPsvr accounted for 1243 genes, 1610 metabolic reactions and 1681 metabolites localized within two intracellular compartments, the cytosol and periplasm, and the extracellular environment (Table 1).

Table 1 iPsvr components in the final GEM.

The experimentally determined maximal specific growth rate of P. veronii strain 1YdBTEX2 in minimal medium with toluene as the sole carbon source was in the range of 0.25 h−1 to 0.35 h−1. In absence of proper empirical measurements of toluene uptake rate, we estimated a maximal uptake rate of toluene from a constrained maximal specific growth rate to 0.35 h−1 in the thermodynamics-based flux balance analysis (TFA).8,31,32 We computed that a toluene uptake rate of 0.45 mmol gDW–1 h–1 is required to achieve this growth rate.

Omics-based curation and gap-filling in iPsvr for toluene degradation and phenylalanine metabolism

Given the importance of toluene degradation by P. veronii 1YdBTEX2, we manually curated the predicted toluene (Fig. 3a) and phenylalanine metabolic pathways (Fig. 3b) to ensure they were fully functional in iPsvr. In P. veronii, toluene is converted via (1S, 2R)-3-methylcyclohexa-3,5-diene-1,2-diol to 3-methylcatechol, which is further degraded, according to the KEGG pathway database, through two pathways until the central carbon metabolism compounds are reached (Fig. 3a and Fig. S2). One of these pathways, which produces pyruvate and acetaldehyde in four reaction steps (1.13.11.2, 3.7.1.-, 4.2.1.8 and 4.1.3.39 in Fig. S2), has been experimentally shown to be the main toluene degradation route based on gene expression data.22 The iPsvr growth simulation on minimal media with toluene as the sole carbon source showed that this pathway was functional. In contrast, the second functional pathway (Fig. 3a),22 was blocked in iPsvr because it lacked the enzyme 2.8.3.6 (highlighted in red in Fig. 3a), and it was therefore disconnected from the main metabolic subnetwork. We found that three out of the six genes in this pathway (PVR_r1g5041, PVR_r1g5042, PVR_r1g1440) were more than 2-fold differentially expressed between the two growth conditions (highlighted in green in Fig. 3a), which further suggested that this pathway is indeed active in P. veronii. Homology-based BLAST searches33 of the gene sequences of 2.8.3.6 against the P. veronii genome identified the corresponding gene for catalyzing this reaction (PVE_r1g3867, e-value of 10−20, to scoA). Note that the default e-value in RAVEN is 10−50, which is why the reaction was not initially captured in the model from the genome annotation. Therefore, the missing reaction (2.8.3.6) was added to iPsvr, and the toluene pathway (Fig. 3a) was connected to the rest of the metabolic network through the Krebs cycle and carried flux.

Fig. 3: Gap filling of two blocked pathways in iPsvr based on the gene expression and exometabolomics data.
figure 3

a One of the two toluene degradation pathways in P. veronii (from the KEGG pathway database53) involved one missing enzyme (highlighted in red), and three genes (highlighted in green) differentially expressed between the exponential and stationary growth conditions. b The phenylalanine metabolic pathway involved three missing enzymes (highlighted in red) and four metabolites (highlighted in blue) identified in the exometabolomic data.

The phenylalanine metabolic pathway was gap-filled using the exometabolomic data (Fig. 3b). Three out of the eight reaction steps of this pathway (4.1.1.28, 1.14.14.54 and 1.14.13.-) were initially missing in iPsvr, leading to a dead-end pathway without flux. Four metabolites of phenylalanine metabolism were detected in the exometabolomics data (colored in blue in Fig. 2b), one of which, 2-hydroxyphenylacetate, was absent in the iPsvr GEM. This suggested that phenylalanine metabolism should proceed via 2-hydroxyphenylacetate in P. veronii (Fig. 3b). The enzyme producing 2-hydroxyphenylacetate from phenylacetaldehyde (1.14.14.54) was found by BLAST (PVE_r1g94, e-value of 10−20 to CYP504), added to iPsvr and could thus link both metabolites. The next step (1.14.13.-), yielding homogentisate (Fig. 3b) was also added, but no protein sequence could be assigned, as it corresponds to an orphan reaction (KEGG reaction ID: R05450). The first enzyme (4.1.1.28), which decarboxylates L-phenylalanine to phenethylamine, must be present, given the detection of both compounds in the exometabolomics data, although no P. veronii protein sequence could be identified through BLAST search.

Omics integration into iPsvr

REMI (Relative Expression and Metabolomics Integrations) was used to integrate the relative gene-expression and metabolite abundance data between EXPO and STAT into iPsvr. We explored three scenarios, depending on the type of data integrated into iPsvr: (i) REMI-TGex for the integration of relative gene-expression data only; (ii) REMI-TM for the integration of metabolomics data only; and (iii) REMI-TGexM for the simultaneous integration of both the gene-expression and metabolite abundance data into a thermodynamically curated model. The “T” in the three methods stands for the inclusion of thermodynamic constraints. Contextualized metabolic models, iPsvr-EXPO and iPsvr-STAT, coherent with omics data obtained at the exponential and stationary growth phase, respectively, were built, and the growth rates were again simulated (Table 2). 279 out of a total of 831 integratable gene-expression and metabolite abundance data points could be consistently included in REMI (Table 2, third row).

Table 2 Summary of the REMI results of the relative integration of exometabolomics (REMI-TM), gene expression (REMI-TGeX) and both datasets (REMI-TGeXM) into iPsvr.

We analyzed the EXPO and STAT phase using iPsvr-EXPO and iPsvr-STAT models. To this end, we used two variants of the REMI method: REMI-TGex, where we integrate thermodynamics and gene expression data, and REMI-TGexM, where we integrate thermodynamics along with both gene-expression and metabolomics data. Both REMI-TGex and REMI-TGexM analyses show that the growth rates at the STAT phase are considerably lower than at the EXPO phase, as observed experimentally. However, the integration of exometabolomics data alone (REMI-TM) was insufficient to predict a reduction in growh rate at the STAT phase. Both REMI-TGex and REMI-TGexM correctly estimated near-zero growth rates of 0.0056 h-1 and 0.0028 h-1, respectively, for the stationary phase (Table 2).

One of the distinguishable characteristics of REMI is the possibility of generating alternative solutions that correspond to the observed physiology (Methods). In this study, the alternative solutions allow investigating different flux distributions that result in the same simulated growth rate. Since REMI-TGexM can integrate the highest number of available experimental data as constraints (Table 2), we used this model in the study of alternative solutions.

REMI analysis revealed that there were 40 alternative solutions with the maximal consistency score (MCS) of 279 (Supplementary File. 1). We identified a set of 202 constaints, 162 from gene expression and 40 from exometabolomics data (Table S7), that were common amongst all alternative solutions. A total of 45 constaints varied amongst alternative outcomes (Table S7). Interestingly, the 40 constraints from exometabolomics data were common among all the 40 solutions and the difference (45 constraints) originated from the gene expression data.

We used the KEGG mapper tool to study those pathways that deregulate during the transition from exponential to stationary phase.34 The common reactions and metabolites amongst all the alternative solutions generated by REMI were mapped to KEGG pathways using KEGG mapper34 (Table S7). Biosynthesis of amino acids (KEGG map01230) with 41 deregulated reactions and metabolites scored as the most deregulated pathway. This was followed by Carbon metabolism (KEGG map01200) with 28 and Purine metabolism (KEGG map00230) with 21 deregulated reactions and metabolites. The complete list of the deregulated pathways is provided in Table S7. This analysis suggests that the deregulation of central carbon meatbolism and amino acid synthesis plays a critical role in growth arrest in the transition from exponential to stationary phase.

Analysis of the biomass precursor production at different growth phases

To further understand the underlying mechanisms of growth reduction when cells transit from EXPO to STAT, we identified the biomass precursors that may become limiting in the stationary phase, therefore leading to growth arrest. The 62 biomass precursors were grouped into seven groups of biomass building blocks (BBBs), namely carbohydrates, cofactors and vitamins, DNA nucleotides, lipids, minerals, amino acids, and RNA nucleotides. For each metabolite, we calculated the fold-change of their maximal production between EXPO and STAT using REMI-TGexM (Table S8).

For the BBB analysis, we used REMI-TGexM and investigated the variation of maximum possible BBB production across all the 40 alternative solutions (Supplementary Information 2), and through sampling, we identified the most representative solution (see Methods). Our results indicated large variations in the maximum production of BBBs between the two phases (Fig. 4), with the maximal relative change occurring in the production of cofactors and vitamins. The production of 52 out of the 62 BBBs (83%), was higher in iPsvr-EXPO than in iPsvr-STAT (Fig. 4, Table S8). The production of minerals remained unchanged, and rather surprisingly, it is possible to achieve higher production of 7 BBBs in STAT conditions (Fig. 4, Table S8). The maximum production of 19 out of the 20 precursors that were classified as cofactors and vitamins, was higher in iPsvr-EXPO than in iPsvr-STAT (Fig. 4, Table S6), and the only exception was Biotin. The maximum allowed production of four amino acids; L-Methionine, L-Cysteine, L-Leucine and L-Tyrosine was also higher in STAT (Fig. 4).

Fig. 4: Differential in silico production (mmol/gDW/h) of seven biomass building block groups between the exponential and stationary growth phase.
figure 4

The result of the integration of relative gene expression and metabolite abundance data (EXPO vs STAT) using REMI-TGexM. Each dot in the graph represents the individual biomass precursors within that category and all the dots represenet statistically significant changes (p = value < 0.05). For BBB group statistics, see Table S8.

P. veronii adaptation from liquid culture to the soil as growth environment

We analyzed the physiology for P. veronii during adaptation to the soil environment, using a previously published genome-wide transcriptome dataset of cells exposed for 1 h to liquid medium or to sand, with either toluene or succinate as carbon substrate.22

Integrating previously obtained transcriptomic data from liquid versus sand exposure22 into iPsvr using REMI-TGex, we produced two models, iPsvr-LIQUID for the liquid media and iPsvr-SAND for the sand environment. While TMCS was 612 for this dataset, we obtained a MCS of 246 and 265 for toluene and succinate carbon sources, respectively. We then generated alternative solutions for the maximum consistency score and simulated growth for the two media (Supplementary Information 1). In silico growth rates of 0.297 ± 0.002 h−1 and 0.44 ± 0.008 h−1 were predicted for toluene and succinate in liquid medium, repectively. Remarkably, the model predicted strong reduction of growth rate upon transition to the sand, i.e., from 0.297 h−1 to 0.161 ± 0.0001 h−1 (toluene) and 0.44 h−1 to 0.161 ± 0.002 h−1 (succinate). This suggests that cells have to adapt to soil as their new environment and have to reprogram their physiology before resuming growth.

To understand how the transition from liquid medium to sand impacts the intracellular metabolic fluxes and growth of P. veronii, we performed the analysis of alternative solutions. In the case of succinate, among the 195 alternative solutions, 152 reactions were in common and 88 reactions were varying between them, whereas for toluene, 134 reactions were common and 97 reactions varied across 84 different alternatives (Table S9). The KEGG mapper tool34 was used to identify the most deregulated pathways within the transition from liquid to sand, for both succinate and toluene (Table S9). Similar to the previous pathway analysis of transition from EXPO to STAT, the two most deregulated pathways were Carbon metabolism (KEGG map01200) and Biosynthesis of amino acids (KEGG map01230), for both Toluene and Succinate (Table S9). Fatty acid metabolism (KEGG map01212) was the third most deregulated pathway (with 20 deregulated reactions) for toluene, and the fifth (with 8 deregulated reactions) for succinate. Interestingly, fatty acid metabolism was not amongst the 25 top ranked deregulated pathways in transition from EXPO to STAT. Another difference from the EXPO to STAT pathway analysis was the deregulation of 2-Oxocarboxylic acid metabolism (KEGG map01210) as the third and the sixth deregulated pathways in toluene and succinate sand exposures, respectively. This suggests that although lower growth is observed in both transitions, i.e., from EXPO to STAT and from liquid to sand, the deregulated pathways underlying the growth reduction are not necessarily the same.

We further investigated the production of BBBs in both growth environments (Table S10). In contrast to the EXPO to STAT transition results, in the case of cells exposed to sand, the production of the majority of precursors was higher in sand compared to liquid media (Fig. S3a, b). This suggests that although cells grow much less in sand than in liquid, they still actively produce BBBs, and the maximum production can be even higher in sand than liquid media for certain BBBs (Table S10). The higher production of lipids and amino acids in sand for both toluene and succinate might indicate that cells reshape their membrane content and protein metabolism upon transition to sand.

Discussion

This work demonstrates how a metabolic-model-based (multi)omics data integration approach can accurately capture cell physiology and adaptation during growth or environmental transitions. Metabolic models are typically used for predicting optimal cell growth, but not for growth transitions and adaptation. We show here the usefulness of models with comprehensively integrated transcriptomics and exometabolomic to capture changes in growth rate during adaptation. As example, we studied transition of the toluene-degrading bacterium P. veronii 1YdBTEX2 from exponential growth to stationary phase, and during adaptation from liquid to a sand environment.

For the purpose of this study, we reconstructed and curated the first genome-scale metabolic model of P. veronii, iPsvr. iPsvr is useful for a wide range of applications in biological and biotechnological research, such as the integration of omics data, investigation of relationships among species in microbial communities, bioremediation strategies, design of metabolic engineering strategies, hypothesis-driven discovery, and analysis of metabolic network properties. As such, iPsvr may represent a valuable resource for the study of complex microbiomes and microbiome engineering.

GEMs usually contain inconsistencies that manifest as blocked reactions, i.e., the reactions that carry zero flux. Most of the available metabolic models are gap-filled to obtain a functional model that produces all the BBBs. Nevertheless, many reactions in GEMs are not directly connected to biomass formation and may remain blocked even after routine gap-filling. Roughly, 20–50% of the reactions of published GEMs are blocked reactions.35,36 For example, in the most recently and best-curated published E. coli model, iML1515, 10% of the reactions are blocked.37 We identified the blocked reactions/pathways in iPsvr and introduced a novel omics-based gap-filling approach, which targets the gap-filling of reactions/pathways by looking at differentially expressed genes or metabolites. We showed that the gap-filling step improved the metabolic model connectivity, thus increasing the number of functional pathways that can carry flux. The curation also increased the consistency of the metabolic model with the omics data, expressed as the number of genes/metabolites with available omics data that can be integrated into iPsvr. We illustrated through two gap-filling examples on toluene and phenylalanine metabolism how we could add the missing parts of metabolism and increase consistency. Gap-filling using metabolomics data not only improved the network connectivity, but also allowed integrating metabolites and uncovering missing reactions that must be part of the metabolic network but are missing due to unannotated and misannotated genes as well as promiscuous enzymes.38

Statistical analysis of gene expressions and metabolite levels revealed significant variations across the exponential and stationary phase, exemplifying the known distinct physiologies under these conditions. However, the system’s level physiology and the underlying adaptation mechanisms remained unrevealed.

It should be highlighted that in FBA, the growth rate is evaluated under the assumption that all metabolic fluxes in the cell are geared towards the maximal production of biomass at each moment of the cell growth and under any environmental condition. Nevertheless, inclusion of REMI,19 a method for the integration of omics data into metabolic models and for building context-specific models, produced predictions of P. veronii growth rates under dynamic conditions. Integration of the transcriptomic data (REMI-TGex and –TGexM) proved to be vital for predictions of growth rate dynamics, i.e., higher growth rate at exponential versus almost a growth arrest at the stationary phase, whereas inclusion of metabolomic data alone (REMI-TM) was insufficient. This might be expected, because only a small subset of metabolites could be putatively identified, such that their impact on the predictions was comparatively smaller. Overall, these results clearly demonstrated that integrating multi-omics data in GEMs significantly increases the consistency of the model predictions with experimental observations.

The analysis of alternative outcomes generated by REMI allowed us to better characterize the intracellular state of P. veronii growth by identifying the metabolic pathways whose fluxes were consistently changed across all alternatives thus identifying where the fluxome was mostly rewired. Interestingly, although the same growth reduction phenotype was observed for both the transition from EXPO to STAT and liquid to sand, the deregulated pathways were not completely identical.

REMI analysis of iPsvr-EXPO and iPsvr-STAT suggested P. veronii adapts to STAT phases primarily by limiting the metabolic fluxes for the production of the majority of cofactors and vitamins required for growth, with metabolism reshuffling being regulated at the transcriptional level. In addition, an increased production of specific amino acids in the STAT phase suggests the cell is preparing for starvation and survival. In contrast to the EXPO to STAT transition, REMI analysis of iPsvr-LIQUID and iPsvr-SAND indicated that, surprisingly, the maximum production of most of precursors from the cofactors and vitamins, and the BBBs amino acids actually increased in cells transited to sand, consistent for both sand with toluene or succinate as the sole carbon source. We hypothesize that the production of these BBBs and the function of their associated metabolic pathways are rewired to adapt to the local sand conditions. This finding resonates with previous work of Morales et al.,22 who concluded from Gene Ontology terminology that P. veronii cells inoculated in sand readjusted their metabolism during the first hour of contact.

Collectively, our results demonstrate the importance of integrating into metabolic models the contextualization of condition-specific gene-expression and metabolite-abundance data. This increases the value of growth rate predictions and improves the assessment of the relative changes of measurable metabolic phenotypes. The method as demonstrated here is thus an important advancement to explain, quantify or predict cellular responses to environmental or genetic perturbations, which is crucial for microbiome engineering.

Methods

Culture conditions for P. veronii transcriptome and exometabolomic studies

P. veronii 1YdBTEX2 was grown on solid 21C mineral medium39 with toluene in the vapor phase at 30 °C for 3 days. Colonies were inoculated into three 100 ml screw-cap conical flasks containing 25 ml of the mineral medium amended with 0.5 ml of a 1:19 mixture of toluene:tetradecane (Sigma-Aldrich ref: 34866; Aldrich ref: 87140). Flasks were incubated at 30 °C and at 180 rpm on an orbital shaker until a culture turbidity (OD600) of 0.7 (mid-exponential growth phase). At this point, the three cultures were pooled, and the bacterial cells were harvested by centrifugation (swing-out rotor A-4-44, Eppendorf; 3220 g, 8 min, 30 °C). The cell pellet was resuspended in the mineral medium and centrifuged again, as described above, to remove any residual carbon. After that, the cells were resuspended in mineral medium, diluted to obtain a starting OD600 of 0.16 and transferred into four replicate flasks, to which the toluene:tetradecane mix was added, as described above. The cultures were incubated at 30 °C and 180 rpm and regularly sampled for OD measurements. For transcriptomics, ~1 × 109 cells were harvested from an appropriate volume of the culture by centrifugation at 3500 g, for 6 min at 30 °C; snap-frozen in liquid nitrogen and stored at −80 °C until RNA extraction. Three quadruplicate sample sets were produced: the inoculum at time 0 (T0h), that is cells starved for ~30 min at 25 °C in mineral medium without a carbon source; exponentially growing cells (at OD600 ≈ 0.5, harvested after 4 h of incubation in the toluene-amended mineral medium); stationary phase cells (at OD600 ≈ 1.8–1.9 harvested after 24 h incubation). To study the composition of the exometabolome (metabolites in the culture medium excreted or leaked out of cells), 1 ml of the culture was sampled at the same time points. Cell culture was transferred into a 1.5 ml polypropylene microcentrifuge tube, which was clarified by 20 min centrifugation at 21,100 × g and 4 °C, after which 0.5 ml of spent media was transferred to a new polypropylene tube, stored at −80 °C and shipped on dry ice to the metabolomics facility.

Biolog data

P. veronii 1YdBTEX2 was plated from –80 °C on LB agar and incubated at 30 °C for 16 h to obtain isolated colonies. A single colony was subsequently restreaked on a minimal medium agar plate, which was incubated in a closed glass jar with a vial of toluene. Single colonies grown on toluene were transferred to closed glass bottles with 20 ml minimal medium to which 20 µl of toluene dissolved in 1 ml of tetradecane was added. Cultures were incubated with agitation at 30 °C and cells were harvested after 24 h by centrifugation, removing tetradecane carefully. Cells were resuspended in minimal medium to a density of 106 per 75 µl. Wells of BIOLOG EcoplatesTM containing triplicates of 31 carbon sources and one no-carbon control were filled with 75 µl minimal medium, after which 75 µl of P. veronii cell suspension was added and mixed. In total five Ecoplates were inoculated, covered with a sterile seal and incubated in the dark for 16 days at room temperature. Absorbance at 590 nm, representative for the respiration activity of the cells, was measured by spectrophotometry every second day.

RNA extraction, RNA-seq library preparation and sequencing

Total RNA from the frozen cell pellets was extracted using the RNA PowerSoil Total RNA Isolation Kit (MoBio Laboratories) as recommended by the manufacturer. Contaminating genomic DNA was removed by two cycles of TURBO DNase (Invitrogen) digestion and RNeasy MinElute Cleanup kit (QIAGEN) column purification. The quantity, purity and integrity of RNA samples were assessed using agarose gel electrophoresis, NanoDrop spectrophotometer (ThermoFisher Scientific) measurements and Agilent 2100 Bioanalyser (Agilent Technologies) profiling.

The completeness of DNA removal was verified by PCR using primers Pv_chr2_fw, ATCGGCTGTCCGACATCGGACG and Pv_chr2_rev, TCGAAGAGCTCCACCGAGAGCCGCC) and 1 pg of genomic DNA as a positive control, as described previously.22 Next, 4 µg of each RNA sample were depleted from ribosomal RNAs, converted to the reverse-complement stranded Illumina sequencing library using the ScriptSeq Complete Kit (Bacteria, Illumina) and indexed with ScriptSeq™ Index PCR primers set 1 (Epicentre, Illumina) following the standard protocol. The resulting directional RNA-seq libraries were sequenced using single-end 100-nt read chemistry on an Illumina HiSeq 2500 platform (Illumina) at the Lausanne Genomic Technologies Facility.

Untargeted LC-HRMS metabolomics

The spend media (100 µL) samples collected at different time points were extracted with 400 µL of ice-cold methanol to quench the metabolism, precipitate proteins and extract a broad range of polar metabolites. The extracted media were analyzed by HILIC-HRMS using an electrospray ionization source operating in both positive and negative mode. Pooled QC samples (representative of the entire sample set) were analyzed periodically (every 4 samples) throughout the entire analytical run in order to assess the quality of the data, correct the signal intensity drift and remove the peaks with poor reproducibility (CV > 30%) that can be considered chemical or bioinformatic noise. Data were acquired using a 1290 UHPLC system (Agilent Technologies) interfaced with a 6550 iFunnel Q-TOF mass spectrometer operating in a full-scan MS mode. In addition, pooled QC samples were analyzed in auto MS/MS mode (i.e., Data Dependent Analysis [DDA]) to acquire the MS/MS data for metabolite identification. In positive mode, chromatographic separation was carried out using an Acquity BEH Amide, 1.7 μm, 100 mm × 2.1 mm I.D. column (Waters, Massachusetts, US). The mobile phase was composed of A = 20 mM ammonium formate and 0.1% formic acid in water and B = 0.1% formic acid in acetonitrile. A linear gradient elution from 95% B (0–1.5 min) down to 45% B (17–19 min) was applied followed by 5 min for column re-equilibration to the initial gradient conditions. The flow rate was 400 μL/min, column temperature 30 °C and sample injection volume 2 µl. ESI source conditions were set as follows: dry gas temperature 290 °C and flow 14 L/min, fragmentor voltage 380 V, sheath gas temperature 350 °C and flow 12 L/min, nozzle voltage 0 V, and capillary voltage 2000 V. In negative mode, a SeQuant ZIC-pHILIC (100 mm, 2.1 mm I.D. and 5 μm particle size; Merck, Damstadt, Germany) column was used. The mobile phase was composed of A = 20 mM ammonium acetate and 20 mM NH4OH in water at pH 9.3 and B = 100% acetonitrile. The linear gradient elution ran from 90% (0–1.5 min) to 50% B (8–11 min) down to 45% B (12–15 min). Finally, the initial chromatographic conditions were established during a 9 min post-run for column re-equilibration. The flow rate was 300 μL/min, column temperature 30 °C and sample injection volume 2 µl. ESI source conditions were set as follows: dry gas temperature 290 °C and flow 14 L/min, fragmentor voltage 380 V, sheath gas temperature 350 °C, nebulizer 45 psi and flow 12 L/min, nozzle voltage 0 V, and capillary voltage 2000 V. The fragmentor voltage of 380 V has been optimized in both negative and positive using pure chemical standards in order to improve metabolite signal and minimize in-source fragmentation. Thus, this value is found as the best compromise between sensitivity and in-source fragmentation.

In the MS-only mode, the instrument was set to acquire over the m/z range 50–1200, with the MS acquisition rate of two spectra/s. Targeted MS/MS data for dysregulated metabolite features were acquired using the inclusion list with narrow isolation window (≈1.3 m/z), MS acquisition rate of 500 ms, and MS/MS acquisition rate of 500 ms.

Data processing and statistical analysis

Transcriptomics

Read mapping, sorting and formatting of the raw reads was done with Bowtie240 and Samtools,41 using the finalized gapless P. veronii 1YdBTEX2 genome sequence as in Morales et al.22 Mapped reads were counted with HTSeq,42 then further processed and analyzed with edgeR.43 Only reads counted more than once per million in at least three replicates were kept. After normalization of the counts, transcript abundances were compared in pairwise conditions in a modified Fischer exact test (as implemented in edgeR). Genes were called significantly differentially expressed between two EXPO and STAT when their false-discovery rate was <0.05 and their fold-change >2 and were subsequently interpreted using Gene Ontology (GO) analysis. GO terms of P. veronii genes were inferred using the program BLAST2GO.44 The same software was then used to analyze GO datasets of significantly differentially expressed genes in each pair-wise comparison, under the TopGO “Weight” algorithm.

Metabolomics

Raw LC–MS data were converted to mzXML files using ProteoWizard MS Convert. mzXML files were uploaded to XCMS for data processing including peak detection, retention time correction, profile alignment, and isotope annotation. Data were processed as a multi-group experiment, and the parameter settings were as follows: centWave algorithm for feature detection (Δm/z = 20 ppm, minimum peak width = 5 s and maximum peak width = 30 s, S/N threshold = 6, mzdiff = 0.01, integration method = 1, prefilter peaks = 3 prefilter intensity = 1000, noise filter = 0), obiwarp settings for retention time correction (profStep = 1), and parameters for chromatogram alignment, including mzwid = 0.015, minfrac = 0.5 and bw = 5.5. Preprocessed data (following the signal-intensity drift correction and noise removal with the “batchCorr” R package) were filtered according to the p-value (<0.05) and signal intensity (>1000 ion counts). The remaining table of metabolite features together with the most significant ion features selected from the loadings plot of the multivariate models in positive and negative ionization mode were subjected to metabolite identification as described below.

In the first instance, putative metabolite identification was performed by accurate mass and retention time (AMRT) matching against an in-house database (containing information on 600 polar metabolites from the Mass Spectrometry Metabolite Library Supplied by IROA Technologies, Sigma-Aldrich, characterized under the same analysis conditions). For that, raw data files (.d) were processed using Profinder B.08.00 software (Agilent Technologies) with the following parameter settings: mass tolerance 10 ppm, retention time tolerance 0.2 min, height filter 1000 counts, and peak spectrum obtained as an average of scans at 10% of the peak. In parallel, the XCMS output table of significantly different metabolite features was matched against the Human Metabolome Database (HMDB)25 based on accurate mass (AM) with Δppm = 10. The list of hits was further manually curated by taking into account the biological relevance of the hit (endogenous vs. exogenous metabolites) and a presence of the “true” peak shape (using the interactive XCMS Online interface45). Short listed ions of interest together with the metabolites identified by the in-house database were further subjected to targeted MS/MS validation. The metabolite identifications were validated by matching the MS/MS46 data acquired in the pooled samples (for each experiment in each LC–MS analysis mode) against the in-house PCDL database, METLIN (https://metlin.scripps.edu/) standard metabolite database47 or mzCloud (https://www.mzcloud.org/). Otherwise, if the MS/MS data quality did not allow for metabolite ID confirmation, the metabolite IDs remained putative, based only on accurate mass matching.

Genome-scale model reconstruction

The P. veronii 1YdBTEX2 genome-scale metabolic reconstruction process combines the automated draft metabolic network process with several manual refinements and curation procedures, in total involving four main steps: (i) functional annotation of the genome: functional annotation of the genome is required prior to the reconstruction of the GEMs, and consequently, the quality of a GEM highly depends on the availability of a gapless genome. The protein sequences (FASTA files) of P. veronii 1YdBTEX2 were acquired from a previously published study22 and were annotated to identify the associated reactions that enzymes catalyze to determine the stoichiometric matrix, done using the RAVEN Toolbox.26 The generation of the draft metabolic network followed the protocol detailed previously,26,48,49 and the output of this annotation process is summarized in Table S6. Version 1.07 of the RAVEN Toolbox and the version of KEGG database as of September 2017 were used. (ii) Compartmentalization, the definition of biomass reaction and uptakes and secretions: In the absence of information or experimental evidence about the compartmentalization and the cell content in P. veronii 1YdBTEX2, the cell compartment information (i.e., cytosol and periplasm), the transport mechanism between the compartments and the extracellular environment, the uptakes and secretions, and the biomass composition were obtained from two available models of closely related Pseudomonads: Pseudomonas putida27,28,29 and Pseudomonas stutzeri.30 The biomass reaction in a GEM designates the metabolic precursors that build the BBBs, i.e., DNA, RNA, amino acids, lipids and carbohydrates and their corresponding stoichiometric coefficients. (iii) Thermodynamic curation: In a thermodynamically curated model, the standard Gibb’s free energy of a reaction and consequently the directionality of the reactions, i.e., reversible or irreversible, are associated with the reaction as additional constraints, which allow the performance of TFA. We followed the established protocol to thermodynamically curate metabolic models31,32 to add thermodynamics constraints to iPsvr, where we could estimate the standard Gibb’s free energy of formation of 76% metabolites, using the group contribution method (GCM)50 and the standard Gibb’s free energy of 84% of metabolic reactions in iPsvr. This allowed us to perform TFA, simulate the growth and determine whether iPsvr was functional, with further evaluations if it correctly predicted the expected growth-associate phenotypes. (iv) Gap-filling of iPsvr: The draft metabolic network of iPsvr did not contain all the necessary reactions for the production of all the biomass building blocks, and thus, the model did not show any growth. This is a very common observation in the reconstruction process of GEMs, since the (automatic) genome annotations are often incomplete or erroneous, with significant proportion of predicted proteins having no functions attributed.51 Therefore, reactions without associated genes were included in order to obtain a functional model that simulates non-zero growth, done using a procedure called gap-filling. We followed the gap-filling procedure introduced previously,48 wherein the production of each biomass precursor is defined as a metabolic task26 and a mixed-integer linear programming (MILP) formulation is used to generate alternative groups of minimal number of reactions (borrowed from KEGG) that enable the production of the biomass precursor. The draft GEM was gap-filled by iterative manual curation until a model was obtained that was able to carry non-zero flux through the biomass reactions at steady state (i.e., signifying ‘growth’). Although gap-filling is a routine procedure in the curation of GEMs and most of the available GEMs are gap-filled to obtain a functional model that is able to grow under defined conditions, gaps in pathways not involved with biomass production are, however, mostly overlooked in GEM analysis. We performed a complementary second gap-filling step apart from the biomass reaction, where as a result, the consistency of iPsvr with the obtained experimental data on gene expression and metabolomics was increased. Available GEMs usually contain a large well-connected subnetwork, which encompasses the most of the central carbon metabolism and a part of the secondary metabolism, and many isolated reactions (or sets of reactions) are probably disconnected from the rest of the network because of misannotations or insufficiently known pathways. Such isolated reactions/pathways are blocked, i.e., cannot carry flux under any condition, and therefore one or more reactions must be added to connect the blocked reactions with the rest of the metabolic network. To this end, the metabolic network structure of iPsvr was decomposed to the main component and to the isolated (disconnected) reactions/pathways using a MATLAB graph-based built-in function (conncomp). Using the same gap-filling approach, the blocked reactions/pathways that were associated with differentially expressed genes (in the pair-wise comparison of exponential- and stationary-phase datasets) or measured metabolites in exometabolomic data were gap-filled to become functional in the model (carry non-zero flux). (v) Quality control of iPsvr: To assure the quality of the model, especially reaction balances, we used Memote,52 an automated genome-scale metabolic model test suite. Memote pinpointed reactions that were not balanced, and we manually balanced them, and commented the reasons for the reactions’ imbalance (Table S11).

Omics data integration

REMI19 was used for the integration of transcriptomics and exometabolomics data into iPsvr. REMI assumes that reaction fluxes associated with genes that are significantly differentially expressed are deregulated. Moreover, REMI also considers that the in vivo metabolite abundance ratios between the two conditions, e.g., the two growth phases can be used to constrain reaction fluxes associated with the metabolites that are differentially regulated. Expression/abundance-based ratios between the two conditions are formulated as flux perturbations for each reaction and are imposed as constraints on individual fluxes. Based on the data used, it translates into three different methods: (i) REMI-TGex allows the integration of relative gene expression data into a thermodynamically curated GEM, (ii) REMI-TM allows the integration of metabolomics data and (iii) REMI-TGexM integrates simultaneously both the gene expression and metabolite abundance data as additional constraints into the metabolic model.

REMI aims to maximize consistency between differential expression and fluxes as well as differential metabolite concentrations and fluxes. To study condition-specific differences in metabolism between two conditions (perturbed vs reference), REMI considers a separate metabolic model for each condition. Then, to integrate differential expression, REMI enforces a higher flux through a reaction in the perturbed condition (perturbed model) as compared to a reference condition (reference model) if the genes of the reaction are upregulated. For downregulated reactions, REMI enforces a lower flux as compared to a reference. To integrate extracellular metabolite concentrations, REMI assumes that if a metabolite is upregulated, then the production of the metabolite is forced to be higher as compared to a reference condition, and similarly, a lower production of a metabolite is forced if a metabolite is found to be downregulated. Then, an optimization problem is formulated to maximize the number of constraints imposed by the relative gene expression and metabolite abundances that can be integrated into the model while preserving a growth phenotype. Two scores are calculated: a theoretical maximum consistency score (TMCS), representing the number of genes/metabolites with available omics data, and the maximum consistency score (MCS), representing the number of genes/metabolites whose relative omics data are consistent with relative network fluxes and therefore can be integrated in the model.

The MILP formulation enables enumerating alternative sets (size equal to MCS) from a given set of constraints. The most consistent models are built by activating constraints that are overlapping and consistent between all the alternatives. Alternative solutions were computed for the maximal consistency score with the MILP solver in IBM CPLEX 12.7.2, and the time limit for the optimization was set to 3 h.

We used the transcriptomics and/or exometabolomics datasets measured at the exponential and stationary phase and also at the sand versus liquid environment, to derive additional flux constraints for the TFA problem, which were applied using REMI.

Thermodynamics-based flux analysis (TFA) and the analysis of the biomass building blocks

To determine fluxes and subsequently the growth rate at different growth and environmental conditions using iPsvr, we employed FBA, the most widely used constraint-based modeling technique for studying biochemical networks and cellular physiology. Previous studies show that the integration of appropriate thermodynamic constraints leads to more accurate metabolic model predictions and also a significant reduction in the ranges of the predicted fluxes (solution space).8,31 We perform TFA for estimating the growth rate before and after the integration of omics data.

We further identified the biomass building blocks (BBB), wherein a low/zero production limits growth upon the transition of cells from the exponential to stationary phase or from liquid to the soil environment (two examples that were discussed in this work). Each BBB was tested by defining the TFA objective function as the maximum production of that metabolite under the defined media condition (the same for all the BBBs). Then, the BBB production was compared and the limiting BBBs for each case were identified.