In silico method for modelling metabolism and gene product expression at genome scale

Lerman, Joshua A.; Hyduke, Daniel R.; Latif, Haythem; Portnoy, Vasiliy A.; Lewis, Nathan E.; Orth, Jeffrey D.; Schrimpe-Rutledge, Alexandra C.; Smith, Richard D.; Adkins, Joshua N.; Zengler, Karsten; Palsson, Bernhard O.

doi:10.1038/ncomms1928

Article
Open access
Published: 03 July 2012

Nature Communications volume 3, Article number: 929 (2012) Cite this article

Abstract

Transcription and translation use raw materials and energy generated metabolically to create the macromolecular machinery responsible for all cellular functions, including metabolism. A biochemically accurate model of molecular biology and metabolism will facilitate comprehensive and quantitative computations of an organism's molecular constitution as a function of genetic and environmental parameters. Here we formulate a model of metabolism and macromolecular expression. Prototyping it using the simple microorganism Thermotoga maritima, we show our model accurately simulates variations in cellular composition and gene expression. Moreover, through in silico comparative transcriptomics, the model enables the discovery of new regulons and the improvement of genome and transcription unit annotations. Our method presents a framework for investigating molecular biology and cellular physiology in silico and may allow quantitative interpretation of multi-omics data sets in the context of an integrated biochemical description of an organism.

Introduction

A goal of systems biology is to provide comprehensive biochemical descriptions of organisms that are amenable to mathematical enquiry¹. These models may then be used to investigate fundamental biological questions¹, guide industrial strain design² and provide a systems perspective for analysis of the expanding ocean of omics data³. Over the past decade, there has been steady progress in developing genome-scale models of metabolism (M-Models) for basic research and industrial applications^4,5,6. M-Models are stoichiometric representations of the enzymatic and spontaneous biochemical reactions associated with an organism's metabolic network at the genome scale; however, M-Models do not quantitatively describe gene expression (Fig. 1a). The lack of an explicit representation for enzyme production precludes quantitative interpretation of omics data and can result in biologically implausible predictions^7,8. Because M-Models do not contain chemical representations of transcription and translation, to date, it has only been possible to use omics data as ad hoc constraints for enzyme activities^9,10,11,12.

**Figure 1: Genome-scale modelling of metabolism and expression.**

A modelling approach that accounts for the production and degradation of a cell's macromolecular machinery will provide a full genetic basis for every computed molecular phenotype (Fig. 1b). Such computations in turn enable the direct comparison of simulation to omics data and the simulation of variable expression and enzyme activity^13,14. In other words, an integrated model of metabolism and macromolecular expression (ME-Model) affords a genetically consistent description of a self-propagating organism at the molecular level and moves us substantially closer to establishing a systems-level quantitative basis for biology.

Here, we developed an ME-Modelling approach for the relatively simple microorganism Thermotoga maritima, which metabolizes a variety of feedstocks into valuable products including H₂ (ref. 15). T. maritima possesses a number of characteristics conducive to systems-level investigations of the genotype–phenotype relationship: a compact 1.8-Mb genome¹⁶, a wealth of structural proteome data¹⁷, a limited repertoire of transcription factors (TFs)¹⁸ and reduced genome organizational complexity compared with other microbes (H.L. et al., unpublished data). Taken together, T. maritima's small set of TFs and reduced genome complexity impose a narrowed range of viable regulatory and functional states (H.L. et al., unpublished data). The existence of few regulatory states may simplify the addition of synthetic capabilities and facilitate metabolic engineering efforts by reducing unexpected and irremediable side-effects arising from genetic manipulation¹⁹. A combination of metabolic versatility and genomic simplicity makes T. maritima a promising candidate for investigating fundamental relationships between molecular and cellular physiology, both in silico and in vivo, and for the creation of a minimal chassis for chemical synthesis²⁰. Our T. maritima ME-Model simulates changes in cellular composition with growth rate, in agreement with previously reported experimental findings^21,22. We observed positive correlations between in silico and in vivo transcriptomes and proteomes for the 651 genes in our ME-Model with statistically significant (P<1×10⁻¹⁵ t-test) Pearson correlation coefficients (PCC) of 0.54 and 0.57, respectively. When we used our ME-Model as an exploratory platform for an in silico comparative transcriptomics study, we discovered putative TF-binding motifs and regulons associated with L-arabinose (L-Arab) and cellobiose metabolism, and improved functional and transcription unit (TU) architecture annotation. Overall, ME-Models provide a chemically and genetically consistent description of an organism; thus, they begin to bridge the gap currently separating molecular biology and cellular physiology.

Results

Genome-scale modelling of metabolism and expression

We developed a network reconstruction and modelling method that includes macromolecular synthesis and post-transcriptional modifications in addition to metabolism (Fig. 1c; Supplementary Methods). Specifically, our method accounts for the production of TUs, functional RNAs (that is, transfer RNAs (tRNAs), ribosomal RNAs (rRNAs) and so on) and peptide chains, as well as the assembly of multimeric proteins and dilution of macromolecules to daughter cells during growth. Based on available genomic, structural proteomic and biochemical literature we constructed an ME-Model for T. maritima that accounts for the functional activities of 50% of the annotated gene products and, more importantly, mechanistically links these enzyme activities to the genome.

To accurately model self-replicating cells at the molecular level, it is necessary to account for material dilution during cell division as a result of volume doubling, and to provide limits on the number of proteins that may be translated from a messenger RNA before the mRNA decays or is transmitted to a daughter cell. To approximate dilution of transcripts and proteins to daughter cells and prevent infinite translation of peptides from an mRNA, we devised a series of coupling constraints (Fig. 1d; Supplementary Methods). These constraints effectively provide upper limits on enzyme expression and activity and are a function of the organism's doubling time (T_d). These coupling constraints may be tuned for specific mRNAs or enzymes if their respective degradation rates or catalytic turnover constants (k_cat) are known.
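
The idea behind these coupling constraints can be illustrated with a small calculation. The sketch below is not the formulation from the Supplementary Methods; it assumes first-order mRNA decay plus dilution at the specific growth rate μ = ln 2 / T_d, and the function names and parameter values are hypothetical.

```python
import math


def growth_rate(doubling_time_h):
    """Specific growth rate (1/h) for exponential growth with doubling time T_d."""
    return math.log(2) / doubling_time_h


def max_proteins_per_mrna(translation_rate_per_h, mrna_half_life_h, doubling_time_h):
    """Upper limit on peptides translated from one mRNA molecule.

    Assumption: the mRNA is lost by first-order decay (k_deg) and by dilution
    to daughter cells (mu), so its mean lifetime is 1 / (k_deg + mu).
    """
    k_deg = math.log(2) / mrna_half_life_h
    mu = growth_rate(doubling_time_h)
    return translation_rate_per_h / (k_deg + mu)


# Hypothetical values chosen only for illustration.
for t_d in (1.0, 2.0, 4.0):
    limit = max_proteins_per_mrna(translation_rate_per_h=600.0,
                                  mrna_half_life_h=0.1,
                                  doubling_time_h=t_d)
    print(f"T_d = {t_d:.1f} h -> at most {limit:.1f} proteins per mRNA")
```

Under these assumptions a shorter T_d lowers the limit, because dilution removes the transcript sooner.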

Applications of M-Models often involve simulating log-phase cellular growth using flux balance analysis (FBA)^23,24. The organism's gross lipid, nucleotide, amino acid (AA) and cofactor content, as well as its growth-associated and maintenance ATP usage, are experimentally measured. Then, these measurements are integrated with the organism's T_d to define a biomass reaction that approximates the dilution of cellular materials during formation of daughter cells. However, cellular composition is known to vary as a function of T_d and medium²¹, with Schaechter et al. indicating that T_d is more influential than growth medium.
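
For reference, the FBA problem for an M-Model is conventionally written as the linear programme below. The symbols are standard notation rather than taken from this article: S is the stoichiometric matrix, v the flux vector, l and u the lower and upper flux bounds, and v_biomass the flux through the biomass reaction.

```latex
\begin{aligned}
\max_{v}\quad & v_{\text{biomass}} \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u .
\end{aligned}
```

At steady-state exponential growth the optimal biomass flux is read as the specific growth rate, which relates to the doubling time by μ = ln 2 / T_d.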

Our ME-Model explicitly describes transcription, translation and the dilution of gene products to daughter cells, thus it is unnecessary to use a gross biomass production reaction when simulating growth. Instead, ME-Models contain a structural reaction that accounts for the dilution of structural materials (that is, DNA, cell wall, lipids and so on) during division and the energy cost associated with cellular maintenance of the structure (Supplementary Table S1). Conceptually, this structural reaction approximates the production of a cell whose composition varies as a function of environment and growth rate (Fig. 2a).

**Figure 2: Comparison of M- and ME-Models objective functions and assumptions.**

Molecularly efficient simulation of cellular physiology

The RNA-to-protein mass ratio (r) has been observed to increase as a function of specific growth rate (μ)^21,22 and to decrease as a function of translation efficiency²². Schaechter et al. also observed an increase in the number of ribonucleoprotein particles with increasing μ, whereas the translation rate per ribonucleoprotein particle was relatively constant²¹. The increase in r and ribonucleoproteins may be due to the reduced number of translation events mediated by a ribosome as T_d decreases.

To ascertain whether our ME-Model recapitulated the observed increases in r, ribosomal RNA and proteins with increasing μ, we simulated a range of growth rates in a defined minimal medium²⁵ (Supplementary Table S2). To simulate the molecular physiology of T. maritima for a particular μ, we used FBA²⁴ subject to linear programming optimization²⁶ to identify the minimum ribosome production rate required to support a given μ (Fig. 2b). Ribosome production has been shown to be linearly correlated with growth rate in Escherichia coli^22,27,28. Assuming that efficient use of enzymes contributes to the fitness of an evolutionarily adapted lineage²⁹, we would expect a successful organism to produce the minimal amount of ribosomes required to support expression of the proteome.
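
The optimization described above can be sketched as a linear programme. This is a schematic reading of the text, not the exact formulation in the Supplementary Methods: v_ribosome is a placeholder name for the ribosome production flux, and the coupling constraints are summarized by an assumed matrix C that depends on T_d.

```latex
\begin{aligned}
\min_{v}\quad & v_{\text{ribosome}} \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u, \\
& C(T_d)\,v \le 0, \qquad T_d = \ln 2 / \mu .
\end{aligned}
```

In this reading the growth rate enters as a parameter of the constraints rather than as the objective, so each simulated μ corresponds to one such problem.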

Consistent with experimental observations^21,22, our ME-Model simulated an increase in r with increasing μ and with decreasing translation efficiency (Fig. 3a). We observed that the fraction of the transcriptome associated with ribosomal RNA in silico increased with μ (Fig. 3b). In addition, the ribosomal proteins account for a larger proportion of the total proteome as μ increases (Fig. 3c). These results indicate that it is possible to mechanistically model changes in cellular physiology that have only recently yielded to phenomenological modelling²².

**Figure 3: Simulation of variable cellular composition and efficient use of enzymes.**

With M-Models, the cellular macromolecular composition is constant, ergo they cannot reproduce the observed increases in r or ribosomes with increasing μ. Although it is possible to empirically determine a relationship between gross biomass composition and μ and then use this relationship to study variable composition in M-Models³⁰, the M-Models will compute a solution space where the range of activity for a number of enzymes may be rather broad and even infinite⁷, if not specifically constrained. The biologically implausible sections of the M-Model solution space are due, in large part, to unconstrained thermodynamically infeasible internal loops that can operate at an arbitrary flux level⁸. These arbitrary activities contradict previous observations that efficient organisms should maintain a minimal total flux through their biochemical network^29,31.

By explicitly accounting for enzyme expression and activity, ME-Model simulations should identify the set of proteins that will result in optimally efficient conversion of growth substrates into cells. To determine whether our ME-Model was more economic in terms of enzyme usage than the M-Model, we compared our ME-Model simulation to a random sampling of the M-Model solution space⁷. After we fit a normal distribution to the sampled M-Model space, we found that there is a small (2.1×10⁻⁵) probability of finding an M-Model solution as efficient as the ME-Model solution (Fig. 3d). Because ME-Models explicitly account for the costs of enzyme expression and dilution to daughter cells, the most efficient growth simulations will minimize the materials required to assemble the cell; that is, ME-Models will efficiently use enzymes when simulating a μ.
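
The comparison can be sketched as follows, assuming that efficiency is scored by one scalar per solution (here called total flux, with lower values being more efficient). The sample values and the ME-Model score are invented for illustration and are not the study's data.

```python
from statistics import NormalDist

# Hypothetical total-flux scores for randomly sampled M-Model solutions.
sampled_m_model_flux = [118.0, 131.5, 124.2, 140.8, 127.3, 135.1, 122.6, 129.9]

# Hypothetical score for the single ME-Model solution.
me_model_flux = 95.0

# Fit a normal distribution to the sampled scores (mean and sample std dev).
fitted = NormalDist.from_samples(sampled_m_model_flux)

# Probability that a draw from the fitted distribution is at least as efficient,
# that is, has a total flux no greater than the ME-Model value.
p_as_efficient = fitted.cdf(me_model_flux)

print(f"mean = {fitted.mean:.1f}, sd = {fitted.stdev:.1f}")
print(f"P(M-Model solution as efficient as ME-Model) = {p_as_efficient:.2e}")
```

The result depends on the normality assumption; a sampled distribution with heavy tails would call for an empirical quantile instead.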

To compare the range of permissible, that is, computationally feasible, activity for each metabolic enzyme in the ME-Model versus the M-Model, we performed flux variability analysis. Flux variability analysis identifies the flux range that each reaction may carry given that the model must also simulate the specified objective value, such as μ, with a set tolerance. The permissible enzyme activities for simulating efficient growth with a 1% tolerance tended to have smaller ranges in the ME-Model compared with the M-Model (Fig. 3e; Supplementary Data 1), highlighting the sharply reduced flexibility in the ME-Model solution space when simulating optimal growth.
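
In its usual formulation, flux variability analysis solves two linear programmes per reaction. The notation below is conventional and assumed here: Z* is the optimal objective value, c the objective vector and ε the tolerance (0.01 for the 1% tolerance used above), written for a maximized objective.

```latex
\begin{aligned}
v_i^{\min} = \min_{v}\; v_i, \qquad v_i^{\max} = \max_{v}\; v_i \\
\text{subject to}\quad S\,v = 0, \quad l \le v \le u, \quad
c^{\mathsf T} v \ge (1-\varepsilon)\,Z^{*} .
\end{aligned}
```

For a minimized objective, such as ribosome production, the last constraint would instead read c^T v ≤ (1+ε) Z*. The reported range for reaction i is then v_i^max − v_i^min.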

Our ME-Model contains gene products that carry out 142 of the 206 functions estimated as essential for a minimal organism³², whereas the M-Model contains only 65 of these core functions. With the ME-Model, 120 of the 142 functions were essential for ribosome production, whereas only 23 of the 65 functions in the M-Model were essential for biomass production (Supplementary Data 2). This broader coverage of cellular functions means that ME-Models may be used for in silico investigations of phenotypic states that are inaccessible to M-Models.

Gene product production and turnover alters pathway activity

In addition to simulating variable cellular composition and effectively eliminating the infinite catalysis problem, there are a number of metabolic activities that are required for optimally efficient growth with the ME-Model but not with the M-Model (Fig. 4). These differences are due to the ME-Model producing small metabolites as by-products of gene expression and explicitly accounting for the material and energy costs of macromolecule production and turnover. The ME-Model includes metabolic activities for recycling S-adenosylhomocysteine, which is a by-product of rRNA and tRNA methylation, and guanine, which is a by-product of queuosine modification of various tRNAs (Fig. 4a). The ME-Model also produces CTP from CMP that is produced during mRNA degradation (Fig. 4b). Interestingly, the M-Model does not require CDP production to simulate growth, whereas CDP production is essential in the ME-Model. The ME-Model exhibits frugality with respect to central metabolic reactions (Fig. 4c) and proposes the canonical glycolytic pathway during efficient growth, whereas the M-Model indicates that alternate pathways are as efficient. When the efficiency requirement is relaxed, these less-efficient pathways may be active in the ME-Model solution space (Supplementary Data 1). The genes associated with optimal activities tended to be strongly expressed (approximately 60th–90th percentile) in transcriptome data.

**Figure 4: Metabolic reactions required for efficient growth with the ME-Model but not the M-Model.**

These differences highlight the interplay between macromolecular synthesis and degradation, metabolism and salvage, and optimal use of the proteome. The ME-Models allow a fine resolution view of these processes and their simultaneous reconciliation. Not only can one analyse specific pathways in isolation, such as the three examples given above, but it is now possible to investigate in detail the coordination of functions within an organism's biochemical repertoire.

Simulation of systems-level molecular phenotypes

To assess our ME-Model's ability to simulate systems-level molecular phenotypes, we compared model predictions to substrate consumption, product secretion, AA composition, transcriptome and proteome measurements. With the only external constraints for the ME-Model being the experimentally determined μ during log-phase growth in maltose minimal medium at 80 °C, our model accurately predicted maltose consumption and acetate and H₂ secretion (Fig. 5a; Supplementary Table S3). Predicted AA incorporation was linearly correlated (0.79 PCC; P<4.1×10⁻⁵ t-test) with measured AA composition (Fig. 5b).

**Figure 5: The ME-Model accurately simulates molecular phenotypes during log-phase growth.**

FBA simulates reaction fluxes, whereas transcriptomics and proteomics technologies provide semiquantitative measurements of expressed gene product abundance. Thus, the simulated fluxes through the transcriptome and proteome do not directly approximate the respective omics measurements; however, for macromolecules there should be a positive correlation between gene and protein synthesis fluxes and the respective gene product abundances during log-phase growth. In other words, proteins and genes are relatively stable and when an organism is growing at steady state a relative increase in expression rate for a protein will effectively increase the quantity of that protein.
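
This argument can be made explicit with a simple mass balance. The balance below is an illustration rather than part of the model; it assumes first-order degradation with rate constant k_deg (a symbol introduced here) and dilution at the specific growth rate μ, with X the abundance of a gene product and v_syn its synthesis flux.

```latex
\frac{dX}{dt} = v_{\text{syn}} - (\mu + k_{\text{deg}})\,X = 0
\quad\Longrightarrow\quad
X = \frac{v_{\text{syn}}}{\mu + k_{\text{deg}}}
```

At a fixed μ, abundance scales linearly with synthesis flux for species that share a similar k_deg, which is why a positive correlation is expected even though fluxes and abundances are different quantities.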

Interestingly, when we compared the simulated transcriptome and proteome fluxes to transcriptome and proteome measurements, respectively, there were statistically significant (P<2.2×10⁻¹⁶ t-test) positive correlations for both the transcriptome (0.54 PCC; Fig. 5c) and the proteome (0.57 PCC; Fig. 5d). This degree of concordance was unexpected because the model does not account for transcriptional regulation or transcript-specific RNA degradation rates. However, this concordance may be the result of our simulation objective being aligned with T. maritima's regulatory programme, whereas a decreased concordance would be expected if the regulatory network was responding to a stress. We have previously observed a tendency to increase the expression of metabolically efficient pathways, and decrease inefficient alternatives, by E. coli after adaptive evolution under growth selection pressure³¹. Also, we have observed that T. maritima's genome is highly active with >89% of the protein-coding genes expressed in diverse conditions (H.L. et al., Unpublished data), which could indicate a general eschewal of complex and expensive circuitry within the global regulatory strategy.
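
As a rough check on the scale of these statistics, the sketch below computes a Pearson correlation coefficient and the corresponding t statistic. The toy data are invented, and the sample size of 651 assumes that every modelled gene entered the comparison, which the text does not state explicitly.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 d.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy data (hypothetical): simulated synthesis fluxes vs. measured abundances.
simulated = [0.2, 1.1, 0.7, 3.0, 2.2, 0.1]
measured = [1.0, 2.4, 1.1, 4.9, 5.2, 0.8]
print(f"toy PCC = {pearson(simulated, measured):.2f}")

# Reported transcriptome PCC with the assumed sample size of 651 genes.
print(f"t = {t_statistic(0.54, 651):.1f}")
```

With these inputs the t statistic is roughly 16 on 649 degrees of freedom, which is consistent with a very small P value.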

Approximately 30% of T. maritima's genome is not functionally annotated and 50% of the functionally annotated genes fall outside of the scope of our ME-Model. A number of genes not accounted for in our model were expressed in vivo (Supplementary Fig. S1), and the costs of their expression as well as their functional activities may contribute to the differences between simulation and measurement. In addition, unknown regulatory features might be responsible for irregularities observed when comparing simulation to measurement. For instance, ribosomal RNAs and proteins are expected to be expressed at stoichiometric ratios, as occurs with the simulation, yet there is sizable variability in their measured values (Fig. 5c,d, blue colouring). These results illustrate that it is possible to sketch a molecular description of a replicating organism solely from simple, but stoichiometrically accurate, chemical equations represented on a genome scale.

In silico gene expression profiling drives discovery

With our ME-Model it is now possible to compute the gene expression profile associated with growth in a specific condition or for a specific mutant. These gene expression profiles may then be compared to identify genes that are likely differentially regulated. The set of differentially expressed in silico genes may then be used to drive biological discovery or improve our model (Fig. 6).

**Figure 6: *In silico* transcriptome profiling drives biological discovery.**

Towards this end, we computed the transcriptome profiles for T. maritima grown in a minimal medium with either L-Arab or cellobiose as the sole carbon source (Fig. 6a). Our computations identified genes that were exclusively expressed and essential for growth with each carbon source. Because these genes are essential for growth on the respective substrate, they are conditionally essential genes. Conditionally essential genes are often subject to transcriptional regulation; however, they may be constitutively expressed. To assess whether the genes were differentially expressed in vivo, we measured the transcriptome of T. maritima growing in minimal medium with L-Arab or cellobiose as the carbon source. The genes with the strongest differential expression in vivo were among the set of differentially expressed genes in silico (Fig. 6b), providing supporting evidence for the presence of transcriptional regulation.
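
The in silico comparison step can be sketched as a set difference between two simulated profiles. The gene names, flux values and threshold below are placeholders, not results from the model.

```python
def exclusively_expressed(profile_a, profile_b, threshold=0.0):
    """Return genes with flux above threshold in profile_a but not in profile_b."""
    on_a = {gene for gene, flux in profile_a.items() if flux > threshold}
    on_b = {gene for gene, flux in profile_b.items() if flux > threshold}
    return on_a - on_b


# Hypothetical simulated transcription fluxes for two carbon sources.
l_arab = {"geneA": 0.8, "geneB": 0.0, "geneC": 1.2, "geneD": 0.5}
cellobiose = {"geneA": 0.0, "geneB": 0.6, "geneC": 1.1, "geneD": 0.0}

only_l_arab = sorted(exclusively_expressed(l_arab, cellobiose))
only_cellobiose = sorted(exclusively_expressed(cellobiose, l_arab))
print("L-Arab only:", only_l_arab)
print("cellobiose only:", only_cellobiose)
```

Genes expressed in both conditions, such as the placeholder geneC, drop out of both lists; those are the candidates for constitutive expression.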

Conditionally expressed genes may be regulated by the same TF³³. The presence of a common motif in the promoter regions of a set of genes may indicate regulation by a common TF. To identify potential TF-binding motifs, we scanned the promoter and upstream regions of the in silico differentially expressed genes with MEME (Multiple Expectation Maximization for Motif Elicitation)³⁴. Surprisingly, there was a high-scoring motif for the genes essential for growth on L-Arab and a high-scoring motif for the genes essential for growth on cellobiose (Fig. 6c). The motif found upstream of the L-Arab upregulated genes is similar to the AraR motif from Bacillus subtilis³⁵ (Supplementary Fig. S2). Also, the motif upstream of the cellobiose upregulated genes bears resemblance to catabolite-responsive elements (cres), known to have an important global role in catabolite repression through the binding of the CcpA protein in B. subtilis³⁶. Here, we term the motif the CelR motif, as the regulated genes are involved in cellobiose metabolism. These discoveries highlight how ME-Model simulations can guide discovery of new regulons.

After identifying the putative AraR and CelR motifs, we scanned T. maritima's genome for the presence of other members of the putative regulons. For the nondegenerate AraR motif 5′-GTACGTAC-3′, we identified a single additional instance in an intergenic region upstream of the TU containing genes TM0277, TM0278 and TM0279 (Fig. 6d). These genes were induced when L-Arab was the carbon source, but not when cellobiose or maltose served as the carbon source (Supplementary Fig. S3). L-Arab transport is an orphaned activity in our model, which means that T. maritima may import L-Arab; however, the responsible loci are not known. When we examined these genes using the SEED RAST server³⁷, TM0278 and TM0279 were classified as permeases of an ABC transporter putatively involved in L-Arab utilization, whereas TM0277 was not classified because it was annotated as containing an authentic frameshift³⁸. Recent resequencing of T. maritima's genome (H.L. et al., unpublished data) refutes the initial annotation that TM0277 contains a frameshift mutation, and the SEED RAST annotation for TM0277 is a predicted sugar-binding protein for an arabinoside ABC transporter. Interestingly, the TUs containing ABC transporters for maltose and chitobiose are organized in the same manner: a binding protein followed by two permeases. The presence of the AraR motif, the strong upregulation of the TM0277/TM0278/TM0279 TU in response to L-Arab in vivo, the SEED RAST classification and the resequenced genome strongly suggest that we have identified a functional L-Arab transport system in this organism. This discovery illustrates how in silico molecular biology at the genome scale can be used to expand regulons and improve genome annotation.

When we scanned T. maritima's genome for matches to the degenerate CelR motif TGWAAAYRTTTWCA, the promoter regions of TUs associated with cellobiose metabolism were identified. Interestingly, the promoter region of the TU containing TM1222, TM1221, TM1220, TM1219 and TM1218 did not contain a CelR motif (Fig. 6c,d). TM1222, TM1221, TM1220 and TM1219 encode a cellobiose ABC transporter, while TM1218 is annotated as a LacI family transcription regulator. However, the promoter region of the TU for TM1223, which is directly upstream of TM1222, contains the CelR motif. TM1223 encodes the cellobiose-binding protein that facilitates cellobiose transport. In the TU architecture of our model, there was a predicted Rho-independent terminator following TM1223 that resulted in a new TU starting with TM1222. However, no promoter was detected in the intergenic region between TM1223 and TM1222 using PromBase³⁹. This result leads us to believe that the initial assignment of TM1223 and TM1222 to separate TUs was incorrect (Fig. 6d). The presence of the cellobiose transport system in the updated TU, the strong CelR motif and the annotation of TM1218 as a TF suggest that TM1218 may encode CelR.
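
A genome scan of this kind can be sketched with a regular expression built from IUPAC nucleotide codes. This is an illustrative stand-in, not the tool used in the study; the function names and the toy sequence are hypothetical, and only the IUPAC codes needed for the two motifs in the text are included.

```python
import re

# IUPAC nucleotide codes used by the motifs in the text (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTWYRN", "TGCAWRYN")


def reverse_complement(motif):
    """Reverse complement of a motif written in IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return sorted 0-based start positions of motif matches on either strand."""
    hits = set()
    for variant in {motif, reverse_complement(motif)}:
        # A lookahead is used so that overlapping matches are also reported.
        pattern = re.compile("(?=" + "".join(IUPAC[base] for base in variant) + ")")
        hits.update(match.start() for match in pattern.finditer(sequence))
    return sorted(hits)


# Hypothetical sequence, not taken from the T. maritima genome.
toy = "CCGTACGTACTTTGAAAACGTTTTCAGG"
arar_hits = find_motif(toy, "GTACGTAC")
celr_hits = find_motif(toy, "TGWAAAYRTTTWCA")
print("AraR-like:", arar_hits)
print("CelR-like:", celr_hits)
```

Both motifs quoted in the text are their own reverse complements, so scanning one strand already covers both; the two-strand loop matters only for non-palindromic motifs.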

Discussion

Our ME-Modelling approach represents a fundamental advance in the evolution of genome-scale biochemical models of life and significantly broadens the scope of microbial systems biology. It is now possible to ask systems-level questions in silico beyond metabolism and quantitatively analyse, in a bottom-up and mechanistic manner, a variety of omics data in the context of a growing organism. For instance, we can use a systems perspective to identify the minimal number of genes required to support homeostasis and replication: 120 of the 142 functions that our model shares with the proposed minimal bacterial genome³² were essential for ribosome production in maltose minimal medium (Supplementary Data 2).

Not only can ME-Models predict global phenotypes that are traditionally employed with M-Models, such as maximal growth rate in a defined medium, but they can also be used to calculate whether the system has any material and energy reserves available for ancillary functions. For example, the measured maltose consumption rate was greater than the one that we calculated for economically efficient growth (Fig. 5a). This discrepancy between measurement and simulation could indicate that T. maritima does not strive for economic efficiency, or it could represent the portion of sugar used to support the activities of the unannotated genes or regulatory circuitry. Given that the gene products associated with the more efficient pathways were highly expressed (Fig. 4c), we are disposed towards the latter. Although the ME-Model does not account for regulatory events, the presence of a strong discordance between simulation and measurement would indicate that factors other than economic efficiency are influencing the expressome, thus informing hypothesis generation. For example, if a more expensive isozyme was expressed in vivo than in silico, then it would be possible to estimate the improvement in k_cat required for the expensive isozyme to offset its higher materials and energy costs.

Technological advances have contributed to an expanding ocean of omics data that has been under-explored³. Omics data have been under-analysed, in part, due to the lack of a mechanistic systems-level framework for analysing myriad molecular components in the context of cellular physiology. To date, with the notable exception of C13 metabolic flux analysis, it has only been possible to perform indirect comparative analysis between omics data and M-Models³¹ or to neglect the complexity of the genotype–phenotype relationship and use omics data as ad hoc constraints for M-Model enzyme activities^9,10,11,12. Because ME-Models explicitly represent gene expression, directly investigating omics data in the context of the whole is now feasible.

Viewing multi-omics data in the context of biochemically and genomically consistent ME-Models may allow us to extract more value from legacy and future omics data. Comparing in silico and in vivo transcriptomes, or proteomes, can highlight under-explored areas of molecular biology. For example, a set of genes highly expressed in silico but not expressed in vivo may indicate the presence of transcriptional regulation. Differential expression of a class of genes may indicate incompleteness in our knowledge of how those gene products interact or allude to heretofore unknown moonlighting functions. For instance, in the case of ribosomal proteins (Fig. 5c,d, blue) the model predicts uniform expression, whereas omics measurements exhibit variability. The model was designed based on evidence that ribosomal protein synthesis is highly coordinated⁴⁰, and does not account for feedback circuits affecting degradation rates that have yet to be fully elucidated^40,41.

Although there is a positive correlation between the simulated transcriptome fluxes and semiquantitative transcriptome data, there is still a substantial amount of dispersion (Fig. 5c). When comparing in silico and in vivo transcriptomes it is important to realize that both are approximations of the transcript levels in an organism, and that omics technologies have been inherently noisy to date⁴². Incomplete knowledge, such as a lack of specific translation efficacy for each protein and degradation rates for each mRNA, and the lack of signalling and regulatory circuitry will cause ME-Model simulations to deviate from reality. Similarly, probe-binding and sample-labelling efficacies, as well as other technical issues, serve as barriers to absolute quantitative transcriptome measurements⁴³.

Although it is a non-trivial endeavour to identify the source of all variation between the simulated and measured transcriptomes, it is possible to use the ME-Model for comparative transcriptomics approaches similar to two-channel DNA microarray studies. Despite the early technological limitations of DNA microarrays, biological discovery was enabled by performing comparative transcriptomics^44,45,46,47. Transcriptome profiling has been used extensively to identify genes that are differentially regulated as a function of genetics and environment⁴⁴. Analysis of differentially expressed genes has contributed to the identification of gene products responsible for unannotated enzymatic activities⁴⁵. In combination with sequence analysis, differential gene expression data can be used to investigate transcriptional regulation^46,47.

We devised and implemented a workflow for in silico comparative transcriptomics, which resulted in the discovery of new regulons and improved both genome and TU annotation (Fig. 6a–d). The similarities between the comparative transcriptomics in silico (Fig. 6a) and in vivo (Fig. 6b) studies are striking, given the variation observed between the simulated and measured transcriptomes (Fig. 5c)—this emphasizes that, in spite of its shortcomings, the ME-Modelling framework is a powerful tool for biological research.

Finally, ME-Models enable integrated molecular biology on a genome scale while accounting for the metabolic requirements, which partially fulfills the challenge of Project K⁴⁸ and moves us one step closer to a molecular representation of CellMap¹.

Methods

Network reconstruction procedure

The procedure and formalism are described in detail in the Supplementary Methods. Our method accounts for biochemical reactions associated with transcription of TUs, TU degradation, translation, protein maturation, RNA processing, protein complex formation, ribosomal assembly, rRNA modification, tRNA modification, tRNA charging, aminoacyl-tRNA synthetase charging, charging EF-Tu, cleavage of polycistronic TUs to release stable RNA products, sources, sinks and tRNA activation (EF-Tu) as well as metabolism. In our formalism, metabolic reactions are represented as multi-step processes including substrate binding by the enzyme and dissociation of the substrate–enzyme complex to enzyme and products. The metabolic content for our reconstruction was based on the previously published model¹⁷, with updates to correct errors and incorporate new data (Supplementary Data 3).

The molecular machinery (for example, proteins, genes, RNAs) involved in macromolecular synthesis was identified from the genome annotation¹⁶, SEED subsystem analysis⁴⁹, comparative genomics analysis of the E. coli model²⁸ and KEGG³⁸. The functions of each of the 159 proteins associated with macromolecular synthesis in T. maritima were determined from primary literature when available. When no primary literature was available, the UniProt⁵⁰ and SEED⁴⁹ databases were used to infer function by homology. All proteins currently believed to be used for macromolecular synthesis by T. maritima are enumerated in Supplementary Data 4, and 93% of these genes are mechanistically linked in our ME-Model.

The reactions associated with transcription and translation, including initiation, biopolymerization and termination, were generated from the genome sequence and a set of T. maritima template reactions (Supplementary Methods). In our modelling formalism, reversible reactions were represented as two unique reactions: one for the forward direction and one for the reverse.
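The splitting of reversible reactions into two one-directional reactions can be sketched as follows. This is an illustrative sketch only: the data layout, the `_forward`/`_reverse` suffixes and the toy reaction identifiers are assumptions, not the conventions used in the published model files.

```python
def split_reversible(reactions):
    """Represent each reversible reaction as a forward and a reverse reaction.

    reactions maps a reaction ID to (stoichiometry, is_reversible), where
    stoichiometry maps metabolite IDs to signed coefficients.
    """
    out = {}
    for rxn_id, (stoich, reversible) in reactions.items():
        if reversible:
            out[f"{rxn_id}_forward"] = dict(stoich)
            # The reverse reaction has every coefficient negated
            out[f"{rxn_id}_reverse"] = {met: -coef for met, coef in stoich.items()}
        else:
            out[rxn_id] = dict(stoich)
    return out


# Toy network with hypothetical identifiers
toy = {
    "PGI": ({"g6p": -1, "f6p": 1}, True),
    "PFK": ({"f6p": -1, "atp": -1, "fdp": 1, "adp": 1}, False),
}
split = split_reversible(toy)
```

With this representation every flux can be constrained to be non-negative, and the net flux through a reversible reaction is the forward flux minus the reverse flux.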

Protein complexes

For each functional protein, we used primary literature and the RCSB Protein Data Bank⁵¹ to determine whether the machine was a monomer or oligomer. The Protein Data Bank entries provided an opportunity to integrate 3D structural data into our reconstruction (this model includes structures for 32 additional open reading frames compared with Zhang et al.¹⁷). When data for multimeric state were unavailable for a protein of interest, state data for orthologs from closely related organisms were used; otherwise, the UniProt database⁵⁰ was consulted. In the absence of data providing insight into the multimeric state of the protein, we assumed that the functional protein was a monomer.

Genetic code determination

From inspection of tRNA sequences and structures downloaded from the transfer RNA database⁵², we determined that T. maritima uses uniform-GUC decoding with only 46 tRNA genes (see Supplementary Data 5). In both Archaea and Bacteria, but not in Eukarya, the conversion of C34 of a CAU-anticodon to lysidine (k2C) or analogue generates an anticodon for isoleucine⁵³. TMtRNA-Met-2 was assigned this role based on a strong sequence alignment to E. coli tRNAs containing k2C. The T. maritima genome encodes two additional tRNA genes with CAU anticodons, TMtRNA-Met-1 and TMtRNA-Met-3. Based on structural similarity⁵⁴ to those found in a crystal structure of E. coli's formyl-methionyl-tRNAfMet⁵⁵, TMtRNA-Met-1 may be involved in translation initiation; therefore, TMtRNA-Met-3 was designated to participate in translation elongation.

TU architecture determination

We assembled a draft TU architecture (Supplementary Data 6) for T. maritima based on a series of rules (Supplementary Methods). In short, we assumed all TUs start with a gene start and proceed until one of the following conditions is met: (1) two genes are found in convergent orientation on different strands, (2) two genes are found in divergent orientation on different strands, (3) a high-confidence Rho-independent transcription terminator is found separating two genes oriented in series on the same strand, (4) more than 55 base pairs separate two genes in series on the same strand or (5) experimental evidence indicates a TU boundary. Finally, to reflect the possibility of internal transcription start sites in TUs reconstructed using the rules above, we added an additional TU in cases where a high-confidence promoter was found in the region separating two genes oriented in series on the same strand.
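The boundary rules above can be sketched as a single pass over genes sorted by genomic position. This is a minimal sketch under stated assumptions: the `Gene` record, the function name, the way terminators and experimental boundaries are supplied, and the definition of the intergenic gap as `next.start - previous.end - 1` are all hypothetical choices for illustration. The additional TUs created for internal promoters are not covered here.

```python
from dataclasses import dataclass

MAX_GAP_BP = 55  # rule (4): a larger gap between same-strand genes ends the TU


@dataclass(frozen=True)
class Gene:
    name: str    # hypothetical locus tag
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # "+" or "-"


def split_into_tus(genes, terminator_after=frozenset(), boundary_after=frozenset()):
    """Group position-sorted genes into draft transcription units.

    terminator_after: names of genes followed by a high-confidence
        Rho-independent terminator (rule 3).
    boundary_after: names of genes followed by an experimentally
        supported TU boundary (rule 5).
    """
    tus, current = [], []
    for gene in genes:
        if current:
            prev = current[-1]
            strand_change = gene.strand != prev.strand  # rules (1) and (2)
            gap_too_large = gene.start - prev.end - 1 > MAX_GAP_BP  # rule (4)
            if (strand_change or gap_too_large
                    or prev.name in terminator_after
                    or prev.name in boundary_after):
                tus.append(current)
                current = []
        current.append(gene)
    if current:
        tus.append(current)
    return [[g.name for g in tu] for tu in tus]


# Toy genome segment with hypothetical genes
genes = [
    Gene("g1", 1, 300, "+"),
    Gene("g2", 310, 600, "+"),     # 9 bp gap: same TU as g1
    Gene("g3", 700, 900, "+"),     # 99 bp gap: new TU
    Gene("g4", 905, 1200, "-"),    # strand change: new TU
    Gene("g5", 1210, 1500, "-"),   # terminator after g4: new TU
]
tus = split_into_tus(genes, terminator_after={"g4"})
```

Adjacent genes on different strands always fall into separate TUs, so the convergent and divergent cases reduce to the same strand-change test in this sketch.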

In silico molecular biology

Log-phase growth simulations were performed using FBA²⁴. Linear programming was used to identify the maximum μ or minimum ribosome production flux supporting a particular μ from the components of the in silico minimal media. Because of the presence of fast (metabolic) and slow (macromolecular synthesis) timescale reactions, the parameters in the ME-Model span a wide range that can result in inaccurate simulations due to floating point limitations of currently available floating point linear programming software (Supplementary Methods). To remove the possibility of simulation results being artefacts arising from floating point limitations, we used the exact simplex routines available in the QSopt_ex package²⁶, with default parameter settings for ME-Model simulations. The predicted transcription level of a gene was determined by summing across the sink fluxes of TUs containing the gene, which is equivalent to the transcription fluxes less the TU degradation fluxes. Translation levels were reported as the sum across the relevant translation initiation fluxes, as many TUs can contribute to the production of a given protein. These values were compared with each other in the case of simulated nutrient shifts or to the abundances reported experimentally.
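The reporting of gene-level transcription can be illustrated with a short sketch. It assumes fluxes are available as exact rationals (Python's `fractions.Fraction` stands in for the rational output of an exact solver); the TU names, gene names and flux values are invented for illustration. The last lines only illustrate why floating point arithmetic can lose small terms next to large ones; they do not reproduce the exact simplex routines of QSopt_ex.

```python
from fractions import Fraction


def gene_transcription_levels(tu_genes, tu_sink_flux):
    """Sum TU sink fluxes over all TUs that contain each gene."""
    levels = {}
    for tu, genes in tu_genes.items():
        for gene in genes:
            levels[gene] = levels.get(gene, Fraction(0)) + tu_sink_flux[tu]
    return levels


# Hypothetical TU membership and fluxes (arbitrary units)
tu_genes = {"TU1": ["geneA", "geneB"], "TU2": ["geneB", "geneC"]}
transcription = {"TU1": Fraction(3, 1000), "TU2": Fraction(1, 1000)}
degradation = {"TU1": Fraction(1, 1000), "TU2": Fraction(1, 2000)}

# Sink flux of a TU = transcription flux less TU degradation flux
sink = {tu: transcription[tu] - degradation[tu] for tu in tu_genes}
levels = gene_transcription_levels(tu_genes, sink)

# Motivation for exact arithmetic: a small term vanishes in floating point
float_residual = (1e16 + 1.0) - 1e16
exact_residual = (Fraction(10**16) + 1) - Fraction(10**16)
```

In this toy case geneB is present on both TUs, so its predicted level is the sum of the two sink fluxes. Translation levels could be aggregated in the same way from translation initiation fluxes.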

In vivo methods

T. maritima MSB8 (ATCC: 43589) was grown in 500 ml serum bottles containing 200 ml of anoxic minimal media with 10 mM maltose, L-arabinose or cellobiose as the sole carbon source at 80 °C. All samples were collected during log-phase growth. Substrate uptake and by-product secretion rates, compositional analyses, and transcriptome and proteome measurements were performed as described in the Supplementary Methods. Transcriptome data have been submitted to the NCBI Gene Expression Omnibus (accession ID: GSE28822) and processed values are in Supplementary Data 7. Proteomics data are available through Pacific Northwest National Laboratory (http://omics.pnl.gov) and processed values are in Supplementary Data 8.

RNA modifications

A variety of post-transcriptional modifications of rRNAs are represented in our model. For 16S rRNA, there was experimental evidence for ten modifications⁵⁶ in this organism (Supplementary Table S4). The locations of pseudouridines, which are mass silent, were not available, but an 11th modification, U to Ψ at position 516, was included in the reconstruction because it is well conserved in bacteria and the alignment (Supplementary Data 9) supports its inclusion. An unusual derivative of cytidine, designated N-330, has been sequenced to position 1,404 (ref. 56) in the decoding region of the 16S rRNA. This modified nucleoside was excluded from the reconstruction as the exact chemical composition of the modification is unknown. We were unable to find organism-specific literature supporting modifications to the 5S and the 23S rRNA. Modifications to 5S rRNA are infrequent in bacteria⁵⁷. Attempting to extrapolate 23S rRNA modifications from E. coli was relatively unsuccessful as alignment via ClustalW2⁵⁸ showed significant differences near many of the putative modification sites (Supplementary Data 10). The alignment reveals that the 23S rRNA of T. maritima is significantly longer (>100 bp) than that of E. coli. Only three proteins with annotated roles in modifying the 23S rRNA (TM0940, TM0462 and TM1715) were added to the model, for a total of six modifications (Supplementary Table S5).

Post-transcriptional modification of tRNA also requires a significant investment in genes, enzymes, substrates and energy⁵⁹. We included a variety of modifications (Supplementary Table S6) in our model based on bioinformatics predictions and literature evidence (Supplementary Table S7).

Sensitivity analysis

To explore the influence of some of the newly introduced parameters on model output, the bulk parameters used for the coupling constraints (Supplementary Methods) were varied (two-, four- and eight-fold increases and decreases away from the parameter set used). The results are summarized in Supplementary Fig. S4.
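The fold-change grid used in such a scan can be sketched as below. The parameter name and baseline value are hypothetical, and including the unscaled baseline in the grid is an assumption. Whether parameters were varied one at a time or jointly is described in the Supplementary Methods rather than here, so the sketch simply scales each parameter independently.

```python
FOLDS = (2, 4, 8)  # two-, four- and eight-fold changes


def fold_change_scan(baseline):
    """Scale each baseline parameter by 1/8, 1/4, 1/2, 1, 2, 4 and 8."""
    factors = [1 / f for f in reversed(FOLDS)] + [1.0] + [float(f) for f in FOLDS]
    return {name: [value * k for k in factors] for name, value in baseline.items()}


# Hypothetical bulk coupling parameter with an arbitrary baseline value
scan = fold_change_scan({"hypothetical_coupling_parameter": 8.0})
```

Each scaled value would then be substituted into the coupling constraints and the model re-simulated to compare outputs against the baseline.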

File formats

Our final model is available as a Systems Biology Markup Language (SBML) XML file (Supplementary Data 11). The model is also available as an LP file (Supplementary Data 12) for use with linear programming solvers.

Additional information

Accession codes: Transcriptome data have been submitted to the NCBI Gene Expression Omnibus under accession code GSE28822.

How to cite this article: Lerman, J.A. et al. In silico method for modelling metabolism and gene product expression at genome scale. Nat. Commun. 3:929 doi: 10.1038/ncomms1928 (2012).

References

1. Brenner, S. Sequences and consequences. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 365, 207–212 (2010).
2. Otero, J. M. & Nielsen, J. Industrial systems biology. Biotechnol. Bioeng. 105, 439–460 (2010).
3. Palsson, B. & Zengler, K. The challenges of integrating multi-omic data sets. Nat. Chem. Biol. 6, 787–789 (2010).
4. Mahadevan, R., Palsson, B. O. & Lovley, D. R. In situ to in silico and back: elucidating the physiology and ecology of Geobacter spp. using genome-scale modelling. Nat. Rev. Microbiol. 9, 39–50 (2011).
5. Feist, A. M. & Palsson, B. O. The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat. Biotechnol. 26, 659–667 (2008).
6. Oberhardt, M. A., Palsson, B. O. & Papin, J. A. Applications of genome-scale metabolic reconstructions. Mol. Syst. Biol. 5, 320 (2009).
7. Reed, J. L. & Palsson, B. O. Genome-scale in silico models of E. coli have multiple equivalent phenotypic states: assessment of correlated reaction subsets that comprise network states. Genome Res. 14, 1797–1805 (2004).
8. Schellenberger, J., Lewis, N. E. & Palsson, B. O. Elimination of thermodynamically infeasible loops in steady-state metabolic models. Biophys. J. 100, 544–553 (2011).
9. Akesson, M., Forster, J. & Nielsen, J. Integration of gene expression data into genome-scale metabolic models. Metab. Eng. 6, 285–293 (2004).
10. Becker, S. A. & Palsson, B. O. Context-specific metabolic networks are consistent with experiments. PLoS Comput. Biol. 4, e1000082 (2008).
11. Colijn, C. et al. Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Comput. Biol. 5, e1000489 (2009).
12. Shlomi, T., Cabili, M. N., Herrgard, M. J., Palsson, B. O. & Ruppin, E. Network-based prediction of human tissue-specific metabolism. Nat. Biotechnol. 26, 1003–1010 (2008).
13. Allen, T. E. & Palsson, B. O. Sequence-based analysis of metabolic demands for protein synthesis in prokaryotes. J. Theor. Biol. 220, 1–18 (2003).
14. Thiele, I. Dissertation: A Stoichiometric Model of Escherichia coli's Macromolecular Synthesis Machinery and its Integration with Metabolism (ProQuest, Ann Arbor, MI, 2008).
15. Schröder, C., Selig, M. & Schönheit, P. Glucose fermentation to acetate, CO2, and H2 in the anaerobic hyperthermophilic eubacterium Thermotoga maritima: involvement of the Embden-Meyerhof pathway. Arch. Microbiol. 161, 460–470 (1994).
16. Nelson, K. E. et al. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399, 323–329 (1999).
17. Zhang, Y. et al. Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science 325, 1544–1549 (2009).
18. Kummerfeld, S. K. & Teichmann, S. A. DBD: a transcription factor prediction database. Nucleic Acids Res. 34, D74–81 (2006).
19. Andrianantoandro, E., Basu, S., Karig, D. K. & Weiss, R. Synthetic biology: new engineering rules for an emerging discipline. Mol. Syst. Biol. 2, 2006.0028 (2006).
20. Vickers, C. E., Blank, L. M. & Kromer, J. O. Grand challenge commentary: Chassis cells for industrial biochemical production. Nat. Chem. Biol. 6, 875–877 (2010).
21. Schaechter, M., Maaloe, O. & Kjeldgaard, N. O. Dependency on medium and temperature of cell size and chemical composition during balanced growth of Salmonella typhimurium. J. Gen. Microbiol. 19, 592–606 (1958).
22. Scott, M., Gunderson, C. W., Mateescu, E. M., Zhang, Z. & Hwa, T. Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099–1102 (2010).
23. Feist, A. M. & Palsson, B. O. The biomass objective function. Curr. Opin. Microbiol. 13, 344–349 (2010).
24. Orth, J. D., Thiele, I. & Palsson, B. O. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
25. Rinker, K. D. & Kelly, R. M. Growth physiology of the hyperthermophilic Archaeon Thermococcus litoralis: development of a sulfur-free defined medium, characterization of an exopolysaccharide, and evidence of biofilm formation. Appl. Environ. Microbiol. 62, 4478–4485 (1996).
26. Applegate, D. L., Cook, W., Dash, S. & Espinoza, D. G. Exact solutions to linear programming problems. Operations Res. Lett. 35, 693–699 (2007).
27. Gupta, R. S. & Schlessinger, D. Coupling of rates of transcription, translation, and messenger ribonucleic acid degradation in streptomycin-dependent mutants of Escherichia coli. J. Bacteriol. 125, 84–93 (1976).
28. Thiele, I., Jamshidi, N., Fleming, R. M. & Palsson, B. O. Genome-scale reconstruction of Escherichia coli's transcriptional and translational machinery: a knowledge base, its mathematical formulation, and its functional characterization. PLoS Comput. Biol. 5, e1000312 (2009).
29. Holzhutter, H. G. The principle of flux minimization and its application to estimate stationary fluxes in metabolic networks. Eur. J. Biochem. 271, 2905–2922 (2004).
30. Pramanik, J. & Keasling, J. D. Stoichiometric model of Escherichia coli metabolism: incorporation of growth-rate dependent biomass composition and mechanistic energy requirements. Biotechnol. Bioeng. 56, 398–421 (1997).
31. Lewis, N. E. et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models. Mol. Syst. Biol. 6, 390 (2010).
32. Gil, R., Silva, F. J., Pereto, J. & Moya, A. Determination of the core of a minimal bacterial gene set. Microbiol. Mol. Biol. Rev. 68, 518–537 (2004).
33. Browning, D. F. & Busby, S. J. The regulation of bacterial transcription initiation. Nat. Rev. Microbiol. 2, 57–65 (2004).
34. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–8 (2009).
35. Franco, I. S., Mota, L. J., Soares, C. M. & de Sa-Nogueira, I. Probing key DNA contacts in AraR-mediated transcriptional repression of the Bacillus subtilis arabinose regulon. Nucleic Acids Res. 35, 4755–4766 (2007).
36. Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. Evaluation and characterization of catabolite-responsive elements (cre) of Bacillus subtilis. Nucleic Acids Res. 28, 1206–1210 (2000).
37. Aziz, R. K. et al. The RAST Server: rapid annotations using subsystems technology. BMC Genom. 9, 75 (2008).
38. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
39. Rangannan, V. & Bansal, M. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes. BMC Res. Notes 4, 257 (2011).
40. Dennis, P. P. In vivo stability, maturation and relative differential synthesis rates of individual ribosomal proteins in Escherichia coli B/r. J. Mol. Biol. 88, 25–41 (1974).
41. Singer, P. & Nomura, M. Stability of ribosomal protein mRNA and translational feedback regulation in Escherichia coli. Mol. Gen. Genet. 199, 543–546 (1985).
42. Ji, H. & Liu, X. S. Analyzing 'omics data using hierarchical models. Nat. Biotechnol. 28, 337–340 (2010).
43. Canales, R. D. et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol. 24, 1115–1122 (2006).
44. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
45. Kharchenko, P., Vitkup, D. & Church, G. M. Filling gaps in a metabolic network using expression information. Bioinformatics 20 (Suppl 1), i178–85 (2004).
46. Sabatti, C., Rohlin, L., Oh, M. K. & Liao, J. C. Co-expression pattern from DNA microarray experiments as a tool for operon prediction. Nucleic Acids Res. 30, 2886–2893 (2002).
47. Rhodius, V. A. & LaRossa, R. A. Uses and pitfalls of microarrays for studying transcriptional regulation. Curr. Opin. Microbiol. 6, 114–119 (2003).
48. Crick, F. Project K: The Complete Solution of E. coli. Perspect. Biol. Med. 17, 67–70 (1973).
49. Overbeek, R. et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33, 5691–5702 (2005).
50. Wu, C. H. et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34, D187–91 (2006).
51. Rose, P. W. et al. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res. 39, D392–401 (2011).
52. Juhling, F. et al. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 37, D159–62 (2009).
53. Tong, K. L. & Wong, J. T. Anticodon and wobble evolution. Gene 333, 169–177 (2004).
54. Mandal, N., Mangroo, D., Dalluge, J. J., McCloskey, J. A. & Rajbhandary, U. L. Role of the three consecutive G:C base pairs conserved in the anticodon stem of initiator tRNAs in initiation of protein synthesis in Escherichia coli. RNA 2, 473–482 (1996).
55. Schmitt, E., Panvert, M., Blanquet, S. & Mechulam, Y. Crystal structure of methionyl-tRNAfMet transformylase complexed with the initiator formyl-methionyl-tRNAfMet. EMBO J. 17, 6819–6826 (1998).
56. Guymon, R., Pomerantz, S. C., Ison, J. N., Crain, P. F. & McCloskey, J. A. Post-transcriptional modifications in the small subunit ribosomal RNA from Thermotoga maritima, including presence of a novel modified cytidine. RNA 13, 396–403 (2007).
57. Szymanski, M., Barciszewska, M. Z., Erdmann, V. A. & Barciszewski, J. 5S Ribosomal RNA Database. Nucleic Acids Res. 30, 176–178 (2002).
58. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
59. Gustilo, E. M., Vendeix, F. A. & Agris, P. F. tRNA's modifications bring order to gene expression. Curr. Opin. Microbiol. 11, 134–140 (2008).

Acknowledgements

We thank Jan Schellenberger, Daniel Espinoza, Bill Cook and Michael Saunders for invigorating discussions on solving stiff LPs, and Heather Mottaz-Brewer for assistance in proteome sample processing. This work was supported in part by the US National Institute of Allergy and Infectious Diseases and the US Department of Health and Human Services through interagency agreement Y1-AI-8401-01, DOE Awards DE-FG02-09ER25917 and DE-FG02-08ER64686. Proteomic analyses were performed in the Environmental Molecular Sciences Laboratory, a US DOE BER national scientific user facility at Pacific Northwest National Laboratory. D.R.H. is supported in part by a Seed Award from the San Diego Center for Systems Biology funded by NIH/NIGMS (GM085764).

Author information

Joshua A. Lerman and Daniel R. Hyduke: These authors contributed equally to this work.

Authors and Affiliations

Department of Bioengineering, University of California–San Diego, PFBH Room 419, 9500 Gilman Drive, La Jolla, California 92093-0412, USA
Joshua A. Lerman, Daniel R. Hyduke, Haythem Latif, Vasiliy A. Portnoy, Nathan E. Lewis, Jeffrey D. Orth, Karsten Zengler & Bernhard O. Palsson
Biological Sciences Division, Pacific Northwest National Laboratory, Richland, 99352, Washington, USA
Alexandra C. Schrimpe-Rutledge, Richard D. Smith & Joshua N. Adkins

Contributions

Experiments and simulations were conceived and designed by J.A.L. and D.R.H. J.A.L. and J.O. led the network reconstruction. Transcriptomics experiments were performed by H.L. and V.A.P. Proteomics data were generated by A.C.S.-R. and J.N.A. Peptides were called and mapped by A.C.S.-R., J.N.A. and R.D.S. Data were normalized by J.A.L. and D.R.H. The manuscript was written by J.A.L. and D.R.H. with input from H.L., N.E.L., K.Z. and B.O.P.

Corresponding author

Correspondence to Bernhard O. Palsson.

Ethics declarations

Competing interests

A provisional patent application that includes portions of the research described in this manuscript was filed by the University of California, San Diego Technology Transfer Office on May 9, 2012 entitled “METHOD FOR IN SILICO MODELING OF GENE PRODUCT EXPRESSION AND METABOLISM”.

Supplementary information

Supplementary Figures, Tables, Methods and References

Supplementary Figures S1-S4, Supplementary Tables S1-S7, Supplementary Methods and Supplementary References (PDF 658 kb)

Supplementary Data 1

Flux variability analysis results for metabolic genes present in both metabolism and macromolecular expression models. (XLS 285 kb)

Supplementary Data 2

Essentiality analysis of minimal set of biological functions required to support life. (XLS 69 kb)

Supplementary Data 3

Corrections and additions to the published metabolism model. (XLS 34 kb)

Supplementary Data 4

Proteins and complexes represented in the macromolecular expression model. (XLS 69 kb)

Supplementary Data 5

Thermotoga maritima tRNAs and genetic code and comparison with E. coli. (XLS 24 kb)

Supplementary Data 6

Transcription unit architecture used in the macromolecular expression model. (XLS 137 kb)

Supplementary Data 7

Normalized transcriptome profiles. (XLS 547 kb)

Supplementary Data 8

Normalized proteome profile. (XLS 9749 kb)

Supplementary Data 9

CLUSTAL multiple sequence alignment for E. coli and T. maritima 16S rRNA sequences. (TXT 6 kb)

Supplementary Data 10

CLUSTAL multiple sequence alignment for E. coli and T. maritima 23S rRNA sequences. (TXT 12 kb)

Supplementary Data 11

Compressed SBML file of the T. maritima macromolecular expression model with coupling constraints to simulate growth in maltose minimal medium. (ZIP 2346 kb)

Supplementary Data 12

Compressed LP file (LP file format) of macromolecular expression model with coupling constraints to simulate growth in maltose minimal medium. (ZIP 2981 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

About this article

Cite this article

Lerman, J., Hyduke, D., Latif, H. et al. In silico method for modelling metabolism and gene product expression at genome scale. Nat Commun 3, 929 (2012). https://doi.org/10.1038/ncomms1928

Received: 28 March 2012
Accepted: 28 May 2012
Published: 03 July 2012
DOI: https://doi.org/10.1038/ncomms1928
