Cellular life is composed of networks of molecular components that are dynamic and adaptive. Life can only exist as a gradient of free energy, and metabolic networks can account for the uptake and excretion of substrates alongside the synthesis of key biomolecules1. Metabolic networks trace the routes that energy travels through cells, but genetic sequences and their associated enzymatic polymers2,3,4 specify which pathways will be constructed, governed, and maintained in any given cell. While a great deal of current research is focused on filling specific knowledge gaps at the biochemical level, studies of the overall cellular network can provide insights into how the details of biochemistry lead to the emergence of life’s foundational properties.

At the cellular level, analysis of metabolic networks has inspired decades of research into biochemical complexity5,6,7,8,9. These analyses have drawn connections between network attributes such as a heavy-tailed degree distributions and general complex behaviors such as resilience, adaptability, and modularity. Following initial studies of metabolic networks, there have been numerous descriptions of heavy-tailed networks at other biological levels such as ecological interactions, including theories of generative mechanisms that are relevant to characterizing the prebiotic world10,11,12,13,14. These generative mechanisms lead to continuous, heavy-tailed distributions consistent with a power law as an asymptotic sampling limit is approached; discontinuous or multimodal distributions are not explicitly prescribed by these physical models.

While the cellular level is usually the fundamental starting point to describe life properties, life manifests at different nested scales (e.g., cell, organism, ecosystem). At the largest biological levels of populations and ecologies, systems show emergent traits as multimodality or even discontinuities in their properties. One of the most studied properties showing discontinuities is size: depending on the ecosystem, organisms spanning certain size ranges will not exist. Examples are found in arboreal forest15, bird16, fish and plankton17, and mammal18 groupings that can exhibit multimodal or discontinuous size distributions. Ecological multimodal aggregations arise from a confluence of dynamic processes consisting of an overlapping combination of both positive feedback (e.g., overabundant energy resources available for consumption at one size or temporal scale but not at others) and negative feedback (e.g., predation pressures such as grazing that can make survival at specific scales extremely challenging, or few and highly variable available resources)19. Discontinuities and modalities in biology also appear to be idiosyncratic; gaps or troughs in one ecosystem do not necessarily map to those found in others, and persistent gaps can nevertheless vary and shift over time20. It is not clear whether these network-level attributes of biology ought to extend downward to the biomolecular scale.

Considering these scale-dependent aspects of biological networks, a compelling possibility is which (if any) topological features associated with complex high-level biological systems can be found at the cellular level. We hypothesized that these features, if they show up within a cellular system, necessitate an explicit description of the processes that shape chemical hierarchical structure. In ecosystems, these processes include mapping trophic relationships and sources of energy input. Therefore, we address this question at the cellular level by building a translation reaction network in Escherichia coli (E. coli), which we have combined with its metabolic and biosynthetic networks (Fig. 1). Translation is a molecular information decoding process that begins with a ribonucleotide sequence as input and a peptide sequence as output, with remarkable robustness, accuracy and adaptability21,22,23,24,25,26,27. The processes that allow translation to occur are carried out through the coordinated actions of numerous ribonucleic and protein polymers, which we refer to as the translation machinery (TM). The decoding process that converts triplets of RNA monomers into the attachment of a specific amino acid residue to a growing peptide sequence is remarkable in that it is the earliest (and possibly only) emergent system that processes discretized information and manifests semantic relationships entirely in chemical terms. The chemosynthetic nature of the TM (i.e., information processing via chemical reactions, as compared to electronic signals in the brain or in digital computers) enables its description as a chemical reaction network, which is amenable to systems-level analyses as has been done for metabolic networks5,28.

Fig. 1: Schematic representation of the integrated cellular chemosynthesis network, composed of translation, polymer biosynthesis and metabolism layers.
figure 1

We built a translation layer (1) that includes all the relevant steps of that process such as initiation, peptidic bond formation, translocation, and termination. Translation generates peptides that appear in a biosynthesis layer (A). The biosynthesis layer describes interactions between polymers and the formation of reported hetero-complex multimers (2) to carry out metabolic functions. Proteins in the biosynthesis layer act as enzymes or transporters enabling reactions (B) that generate energy and building blocks of translation such as ATP, GTP and charged tRNA within the metabolism layer (3). Finally, the free energy and building blocks of metabolism are employed as substrate inputs for translation (C).

Here, we present the translation reaction network that includes the core processes of translation (initiation, elongation, termination), detailed assembly steps required to replicate the ribosome, and other well-documented features of the TM that maintain translation functionality (Fig. 2a). To connect the translation and metabolic networks, we constructed a polymer biosynthesis network that explicitly links enzymatic components catalyzing the metabolic reactions with the mRNA reading and peptide synthesis steps of translation (Fig. 3a). Comprehensive mapping of the connections between translation and metabolism is necessary to investigate context-dependent feedbacks between the cellular environment, substrate/nutrient availability, and ribosomal protein activity.

Fig. 2: Overview of the translation portions of the integrated chemosynthesis network.
figure 2

a A simplified depiction of the ‘canonical’ translation process encoded within the translation network. b A detailed visualization of the complete translation chemical reaction layer using Gephi 0.9.2 and a Force Atlas 2 layout29 (settings: scaling = 5.0, gravity = 1.0, tolerance = 1.0, approximation = 1.2). c A depiction of the 30 S and 50 S component assembly steps that enable ribosomal replication to occur.

Fig. 3: Detailed and simplified visualizations of all reactions and compounds that compose the integrated chemosynthesis network.
figure 3

a Full map with Translation (T) reactions in orange, Polymer Biosynthesis (B) reactions in green, and Metabolic (M) reactions in blue. Compounds (in white) are scaled in size according to their number of connections within the network. Network depicted with Gephi version 0.9.2 using a ForceAtlas2 algorithm29 (settings: scaling = 5.0, gravity = 1.0, tolerance = 1.0, approximation = 1.2). b A simplified depiction of the number of compounds that connect reactions between (directed arrows) and within (circular arrows) each of the layers.


A network description of translation in the cellular context

Translation is a process described in textbooks in great detail, but there are still many unknowns about its function and modes of integration with other cellular systems. By reviewing the literature, we compiled the reactions of translation and generated a bipartite graph represented in Fig. 2b. This TM network includes the core of translation processes (initiation, elongation and termination) together with the assembly and recycling of the ribosome, and the response to amino acid starvation. A feature of this network is the high level of redundancy of some of the reactions, given that translation is based on parallel processes involving different tRNAs.

All chemical reactions for translation and polymer biosynthesis were then integrated with that for metabolism. The chemical object names in each network were screened for consistent annotation, their molecular masses and object classifications were cross-referenced or inferred (i.e., proteins, RNAs, codons, amino acids, energy molecules, and relevant ions). A full map of the integrated network, with node sizes weighted by node degree and reactions color-coded by network layer category is displayed in Fig. 3a. The overall network contains more than 8000 nodes and 24000 links. When nodes are placed using a graph-layout algorithm (Force-Atlas 2 in this case)29, each layer clusters as a separated group of nodes that are mostly distinct from one another. A breakdown of the numbers of objects connected within and between the different layers is displayed through the use of numbered arrows in Fig. 3b.

To understand how their components are connected within and outside each network, we applied Node2Vec30 non-supervised learning to generate multi-dimensional representations (embeddings) as vectors for each node in the network that reflect its neighborhood of connections. We then reduced these multi-dimensional vectors to two dimensions using a t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction technique. The resulting embeddings provide a representation of which nodes share a common neighborhood by placing them closer in a unitless 2D plane. The translation and the metabolic layers represent two different domains that can be easily distinguished (Supplementary Fig. 2), while the biosynthesis nodes appear scattered across these two different domains without any distinct clustering. Within the translation network, we find two large clusters of nodes and many smaller groups. Within these smaller groups there are nodes representing each of the amino acid processing cycles and the tRNA-synthetase process. In contrast, we do not find any cluster in the metabolic network containing distinct components of the TM.

Despite the large number of new connections mapped across the integrated network, the sparsity between translation and metabolic layers may be explained in part by the relatively few compounds joining them directly together (25 and 27 to and from each layer, Fig. 3b), which is limited to the molecules providing peptide bond formation energy and substrate molecules for RNA polymers. It seems likely that the reactions that form connections across the metabolism and translation layers might have a central role keeping the overall network together. We contracted reactions and compounds involved in identified modules (e.g., cofactor synthesis, elongation, etc.), and color-coded the resulting network using the betweenness centrality for each node of the contracted network in Supplementary Fig. 3. In this case, this metric represents how often a node is visited when traversing across the different layers of the integrated translation-metabolic network. The network depiction of these data shows a distinct separation between the translation and metabolism layers, with connections across the separation maintained through modules such as tRNA charging providing the raw materials for polypeptide elongation. Together with transport modules that provide the bulk of metabolic precursors for heterotrophic E. coli metabolism, tRNA charging is one of the most central modules of the cell.

Traits of multimodality and heterogeneity at the cellular level

An important attribute of large networks is their degree distribution: the probability of finding nodes with a given number of connections. In bipartite reaction networks, such as the integrated network assembled here, the degree of each compound represents the number of reactions in which it participates. Networks associated with complex systems usually show heterogeneous degree distributions, where a few nodes show a remarkable number of connections, while random networks are characterized by having more homogeneous degree distributions peaking around a mean number of connections14. We use the log-ratios of different statistical fits to determine the most likely degree distribution (exponential for homogeneous networks, power-law for heterogeneous networks). For objects that compose the metabolism and translation layers, their degree distributions follow statistically probable heavy-tailed distributions (Fig. 4b, Table 1), which is not the case of the polymer biosynthesis layer degree distribution (p > 0.05). The only object in the polymer biosynthesis network that is heavily connected to other compounds is a generic node ‘peptide’ (Fig. 3a), which represents intermediate peptide sequences that for various reasons have not completed the termination step, and thus have no distinct chemosynthetic or functional impact on the network. The overall degree distribution of the full network follows a heavy-tailed degree distribution with an exponent around 2.0 (Fig. 4a, Table 1).

Fig. 4: Statistical analysis of the degree and molecular weight data of the integrated chemosynthesis network and each of its layers.
figure 4

a Complementary cumulative density function (CCDF) of node connection plotted versus bins of node connection degree (k) and distribution model comparisons for the complete chemosynthesis network; heterogeneous (exponential) model shown in dashed line, and heterogeneous (power law) model shown in solid line. b CCDF plotted versus node connection degree (k) and distribution model comparisons for the Translation (top), Biosynthesis (middle) and Metabolism (bottom) layers; homogeneous (exponential) model shown in dashed line, and heterogeneous (power law) model shown in solid line, as in Panel A. c Raw scatter data of the connection degree (k) and molecular weight (Da) of all chemical objects in the network (upper panel) and same data depicted as a histogram of molecular weight bins (lower panel), color coded by molecule type. d Raw scatter data of the connection degree (k) and molecular weight (Da) of all chemical objects in the network (upper panel) and same data depicted as a histogram of molecular weight bins (lower panel), color coded by layer.

Table 1 A tabulated comparison of power law and exponential distributions (log-likelihood ratios), significance estimates (p-value according to one-sided log-likelihood test), minimal degree value at which we start fitting the distribution (kmin), and best-fit exponent values (γ) for the complementary cumulative distribution functions of the translation, biosynthesis and metabolism networks.

While the degree of a compound might reflect its relevance in shaping the chemical network, its mass correlates with the amount of energy required for its synthesis; larger molecules require more energy to synthesize than smaller ones31. Our objective is to assess the relationships, if any, between the connection degrees and the masses of the compounds in the network. A multimodal distribution can be discerned in the panels of Fig. 4c (categorized by chemical compound type) and Fig. 4d (categorized by network layer). Mass distribution concentrations (modes) across all network layers occur at around 240, 40,000 and 2,000,000 Daltons (Da). The central regions of these modes contain both the highest concentrations of compounds and the most highly connected compounds from the different layers. There are bins in troughs between the modes without reported molecules, which indicates that the modal mass distribution may also be discontinuous. All modes contain some objects from the TM network; enzymatic polymers that enable metabolism are almost entirely confined to the middle mode; and metabolic compounds compose the two smallest modes. The structural composition of the chemical compounds (small molecules, polypeptides, and polynucleotides) is the likeliest determinant of the mode regardless of the functions they perform or the layer they are assigned to within this network (Fig. 4c, d).


Bioinformatic databases are constructed and annotated in such a way as to enable the high-throughput reconstruction of many biosynthetic steps of metabolism and gene regulation32, and also to reconcile these steps with sequences found within an organism’s genome in an organized pipeline (gene → protein → function33,34). Prior network descriptions and analyses of translation have focused on its dynamic attributes afforded by its autocatalytic motifs35, genetic regulation of metabolic processes36, the links between TM function and organism-level cellular dynamics37 and a mapping of genotype to phenotype useful for interpreting a wealth of -omics cellular data38. The chemosynthetic function of the TM and the topological attributes of the overall network did not figure significantly into these prior efforts.

The integration of translation and metabolism reactions for E. coli describes a naturally occurring chemical network that is hierarchical and spans at least six orders of magnitude of compound mass. Hierarchy is most clearly resolved when degree per compound is plotted against compound mass (Fig. 4c, d, upper panels). By inference, the energy costs associated with synthesizing and replicating the largest TM components outstrip those of enzymes and metabolites by at least 1-2 orders of magnitude. The energy cost per molecular component demonstrates the hierarchical importance of the TM in specifying chemical functions made possible by biopolymers within the cell.

As cellular operation requires a hierarchy of different mass entities, it also involves modules with different connectivity patterns. The embedding analysis depicted in Supplementary Fig. 2 reveals that the metabolic and polymer biosynthesis reactions do not cluster apart from each other in any significant way, but the TM clusters in a distinct region of the graph. The metabolic and TM layers exhibit degree distributions that are likely heterogeneous, but the biopolymer layer does not. Taken together, this would indicate that protein biosynthesis is mostly shaped by the same structural constraints of the intracellular metabolism module and is not likely to exhibit independent (i.e., intra-layer) attributes associated with complexity.

Based on prior observations of biological networks across the hierarchy of life, we articulated three distinct possibilities for the distribution of compounds in the integrated translation network (see Supplementary Fig. 4). Each of these possibilities has distinct, predicted implications that may be compared to the observed mass distribution. In a parallel to ‘trophic hierarchy’, one hypothesized driver of hierarchical connection in cellular biochemistry is the ability to distribute energy and maximize power in a dynamic way across independently functioning layers. For this possibility, each distinct molecular layer is self-organizing and both feeds upon and into the others. The predicted pattern is that each layer contains distinct connectivity peaks, and the overall mass distribution is continuous. In an ‘energy availability’ distribution, the hierarchical structures of polymer biosynthesis and translation are built directly atop, and extensions of, energy pathways enabled by metabolism. The predicted pattern is that though the translation and polymer biosynthesis layers of the network may contain ‘heavy tails’, the bulk of compounds in each layer all coincide with a peak generated by metabolic compounds. In a distribution shaped by ‘molecular distinction’, the association of compounds with a given biofunctional layer is less significant than the compounds’ chemical attributes in the cellular environment. The predicted pattern for this distribution is that multimodality or discontinuity caused by distinct molecular properties would lead to sorting at the network level. Each of these predicted distributions may carry basic implications about the primordial relationship(s) between emergent metabolic, enzymatic and translation functions, and about identifying measurable network attributes that fundamentally distinguish abiotic and biotic systems. We predicted that energy availability would be the dominant organizing driver of cellular biochemistry, given that energy transduction is indispensable to the overall function of translation.

The observed multimodal (and perhaps discontinuous) mass-degree distribution of the network seems most consistent with the ‘distinct molecules’ prediction. The network’s multimodal mass-degree distribution is the first such distribution reported for biological systems at the cellular level. The multimodal mass distribution of the integrated chemical reaction network is independently supported by chemical assays of living E. coli cells. Non-targeted metabolomic assays of small molecular mass compounds using mass spectrometry (TOF-MS) and different preparatory analytic weighting techniques consistently show a peak in metabolite compounds at approximately 150-350 Da, and a tapering of compounds approaching ~2000 Da, even after accounting for decreased detector sensitivity with increasing mass39,40. This approximately aligns with the centroid and distribution range of the smallest mass mode. For polypeptides, proteomic assays conducted across taxa as diverse as E. coli, S. cerevisiae and H. sapiens consistently show a broad correlation between translated genomic length and in vivo enzyme length41. For E. coli, this specifically includes a median peak length of about 209 amino acids, which corresponds to a peak in the mass distribution of polypeptides of approximately 28,000 Da if the average amino acid mass is assumed to be 136 Da (the average molar mass of all 20 essential amino acids). This approximately aligns with the centroid of the second mass mode. A whole-cell assay of all E. coli enzymes, regardless of composition and function, show that highly connected objects central to both metabolism and translation (e.g., the ribosome, TCA cycle components, glycolysis and fermentation components, etc.) compose comparable protein mass fractions under a variety of growing conditions42. The ribosome itself is consistently composed of 2:1 mass fractions of RNA to protein43. In sum, independent raw cell molecular assays indicate sharply discretized compound mass modes that are attributable to the chemical properties of RNA polymers, proteinaceous polypeptides, and base metabolites, regardless of their cellular functions they carry out or the modules to which they may be assigned. Molecules that fall within the troughs between the modes are undoubtedly present in cells in some form, but would not seem to play essential roles in generating an E. coli’s chemosynthetic pathways, cellular physiology, or adaptive responses to stimuli. Multimodal distributions present a unique challenge to theoretical studies of network generation, and may require elaboration using multiplexes that can account for multimodality34.

An assessment of the possible drivers of multimodality and discontinuity in the chemical reaction networks of bacteria may open new ways of characterizing the evolvability of cellular systems, and for distinguishing them from complex abiotic systems. Different network analysis techniques (Figs. 3a, S2, S3) and raw plots of mass distribution modes (Fig. 4c, d) trace modularized systems of translation and metabolism that, by virtue of their chemical isolation from one another, are each able to independently adapt to conditions that the cell may encounter39. Modules can exhibit hierarchical properties and function as key units of adaptive evolution because organismal fitness depends on their performance7,44,45,46,47.

The E. coli metabolic network is only able to exist via the functions afforded by polymeric peptides. Accounting for these polymeric peptides in the chemical reaction network introduces the first modal trough (Fig. 4c) between the largest metabolites and the smallest polypeptides. This trough is also present in objects that compose the translation network that are found across this range of masses (Fig. 4d). From a purely physical perspective, there are no a priori reasons for a discontinuity of mass size between these compounds. A discontinuity may be understandable, though, in biofunctional terms if resources for synthesis are limited31. The synthesis of an extremely large metabolite necessitates the precise placement of assorted atoms and molecular groups, connected with different bond types. If there is a single placement error, or inadequate feedstock of starting materials, the resulting compound is unlikely to be functional. An expenditure of chemical resources to produce increasingly larger (and increasingly error-prone) metabolites has adverse effects on the persistence of the overall cellular system. The synthesis of a linear oligopeptide of comparable mass, by contrast, can make use of repetitive interconnections of a much smaller number of object types (i.e., amino acids), many of which also have similar chemical properties. A reduction of the number of building block types and of the kinds of bonds through which they are connected can increase production fidelity (and resultant functionality) of peptidic complexes compared to massive metabolites. The polypeptide assembly process cannot scale downward indefinitely, however, since there are natural limits to how small a chemically functional polypeptide may be. Oligopeptides must attain some minimal length to form structural folds, or to form a pocket in which specialized cofactor chemistry can occur that is isolated from the cellular environment46. The tradeoffs between synthesizing extremely large metabolites or extremely small polypeptides to ensure overall functionality can introduce a natural trough in mass distribution between these two chemical regimes that is attributable to both the constraints brought about by the chemical properties of their different building blocks and of the emergent functions they enable.

The trough between the second (polypeptide) and third (polynucleotide assemblies) modes appears to arise for entirely different reasons that are distinct to the TM. As with the first mode gap, there are no obvious physical or chemical reasons why there should not be an overlap between large polypeptide and small ribonucleotide polymer compounds, and the observed trough is counter to our predictions. The ribosome, however, has evolved to localize many of the TM’s protein-protein, protein-ribozyme, and ribosomal interactions on itself, thereby spatially isolating the functions of translation from the surrounding metabolic and physiological activities in the cell45. These functions are further isolated in time and space by regulatory mechanisms, such as via binding with the properly formed initiation factors48,49,50,51, by corresponding activated tRNA complexes52 and through spatial heterogeneity arising from complex physiological relationships with nucleoids53. The TM network description includes a rich array of intraribosomal subunit interactions, which we record as interactions involving an object with the mass of the ribosome. This creates a sharp, isolated peak in the distribution centered at the ribosome’s mass. The ready clustering of the TM in the network topology (Fig. 3a) and embedded vector and dimensionality reduction analysis (Supplementary Fig. 2) depict and corroborate the biochemical facets of translation’s modularity, as do in vitro experimental systems that employ cell-free protein synthesis54,55.

There are numerous examples of systems that may be characterized as complex self-organizing, adaptive phenomena, but which lack the evolvability, emergent novelty and agency of living systems. Far-from-equilibrium physical systems such as galactic, stellar, and planetary structures56, earthquakes57, solar flares58 and sandpile or avalanche dynamics59 all exhibit continuous scaling across many orders of magnitude, but few or no reported discontinuities. Discrete size classes observed in biology, on the other hand, are evidence of multiple dynamic regimes60 in which signaling, and the perturbative effects of novelty can propagate and be controlled in directions both up and down scales of a hierarchy61.

A multimodal biochemical distribution may be interpreted as indicating an evolved response of modularization to significant constraints placed on a complex adaptive chemical reaction network62. In biochemical systems, energy flows through the entire hierarchy mostly in the direction from metabolites up to the translation components, but modal gaps and troughs arise due to a complicated reconciliation process between the basic building blocks that compose large objects in the system, energy availability for synthesis, and optimization of functional traits that large objects can exhibit63. In conjunction with a heavy-tailed degree distribution, a multimodal mass distribution may be one of a select suite of indicators diagnostic of biological dynamics that can be observed across the entirety of life’s various spatiotemporal scales.

Implications for prebiotic chemistry

Mapping the full extent of chemosynthetic relationships between translation and metabolism (Fig. 3a, b) graphically captures long-recognized aspects of the hierarchical relationship between these two cellular functions. Translation of a transcripted mRNA directs the specificity of polymer-catalyzed metabolic chemistry, and this chemistry in turn supplies the required energy and materials for ribosomal replication and operation. At the same time, though, the existence of multiple distinct mass discontinuities that correlate more with differences in chemical composition (Fig. 4c) than with cellular function (Fig. 4d) opens new possibilities about the underlying dynamic circumstances in which translation arose. Chemical modes and discontinuities may be so fundamental as to have been contemporaneous with (or have preceded) the origins of cellular life. As an extension, it is possible that some information-processing attributes of the TM module are primordially rooted in complex feedbacks between ribonucleotides and metabolites, intra-network dynamics of the TM, or physicochemical reinforcement of modularization (such as spatiotemporal isolation) afforded by discrete compound class modes, as much as from the emergence of the ribosome’s large and small subunit mechanical activities.

Generating hierarchically connected networks across so many orders of magnitude, with modules that each contain at least one autocatalytic cycle35,64,65,66 presents a significant challenge to the field of prebiotic chemistry67,68,69. If the biochemical modality and modularity observed in this network are genuinely universal attributes of life, they may arise due to general physical phenomena such as percolation70. The observed coupling between translation, protein biosynthesis and metabolic networks presents a new model system in which the network phenomenon of percolation may be studied in a chemical context71 as an exemplary emergent system in which multiple networks with distinct topologies coexist and are connected via common elements72 and discontinuities stem from mechanisms that lead to network (dis)assortativity73,74. Theoretical analyses of such conditions show a possible connection between the number of distinct modes and the possibility of multiple discontinuous phase transitions enabled by multiplex networks75. Phase transitions have long been recognized as potentially important behavioral features that are correlated with complex adaptive systems76.

The relationships between complex adaptive systems and information processing attributes afforded by network topology are directly relevant to developing new theories of the origin of translation. The decoding process that links nucleotide sequence identity to peptide sequence functionality is highly optimized for error tolerance through a property known as code degeneracy, which can be described in informatic and computational terms without explicit reference to the underlying biochemistry. It is unclear how an inherently informatic, network-level property such as code degeneracy would arise on the basis of tracing the individual origins of the ribosome’s chemical components and interaction partners in isolation from this wider network77. Recent discoveries have shown that ‘biological’ attributes such as self-reproduction, adaptation and evolvability (which together characterize much of biological agency) are intrinsically linked to one another through coupling of Hopf and Turing instabilities within a small network of reactions with a cubic autocatalytic motif78. These attributes are observable even in the absence of the intricacies of macromolecular mechanisms or a diversified biochemical metabolic network. Further studies of coupled metabolic and translation chemical reaction network properties may help to develop new hypotheses about the origin of translation as an information-processing modular network, rather than about how translation can arise due to the chemical relationships between amino acids and nucleotides or through the specific biochemical activities of TM components. Specifically, one may test whether the modularization of information processing centered on the ribosome in modern cells can be decentralized across a primordial TM network.

Life exists, in part, by cohesively channeling energy from abundant, small-mass objects into the synthesis of a specific array of large-mass objects. These larger objects afford functions and activities that smaller objects, for various reasons, cannot perform. The inherent tension between utilizing available energy in an efficient way and ensuring diverse functions and activities will be reliably carried out can lead to multimodal mass distributions. Multimodality in this sense may be diagnostically biological, as an evolved trait that is exhibited by adaptive, self-organizing and self-assembling populations that can exhibit and propagate novel responses to stimuli and which are subject to natural selection. It is a property that has been observed in many different ecological systems. Our study of a combined network of metabolism and translation shows that mass multimodality can arise in biochemistry, with distinct drivers for different troughs between modes, even before physiological or gene regulatory mechanisms of cellular operation are considered. This extends multimodality down to the lowest levels at which biological population dynamics can exist: the chemistry within individual bacterial cells. Translation stands apart in the cell as a functional module that is as much informatic as it is chemosynthetic. Its modularity is achieved in part by concentrating many enzyme and ribozyme component interactions on the ribosomal complex, which has the effect of chemically isolating the translation function from much of the chemical activity going on elsewhere in the cell. Looking forward, it may be possible to use an integrated metabolism and translation network to study the emergence of the translation function as a network-level module in a way that is abstracted from the mechanistic biochemical activities carried out by the ribosome and its interaction partners.


Generation of the translation and polymer biosynthesis reaction databases

The purposes of this study are (i) to describe translation as a chemosynthetic process, (ii) to integrate this process with a chemical reaction network describing metabolism; (iii) to map network-level patterns between object mass and object connectivity across different chemical domains; and (iv) to assess whether network-level patterns observed at the highest levels of the biotic hierarchy are present at its lowest levels of biochemistry. These purposes circumscribe the scope of the resulting network’s component description. Objects and reactions that describe the chemical components of the cellular translation and metabolic systems will be transcribed and included in the network. Those reactions that do not alter the explicit biochemical components of an E. coli cell (i.e., gene regulatory interactions, sequence-specific motifs and their mutual interactions, isomerization reactions, etc.) may be otherwise important, but they fall outside the scope of this particular chemosynthetic study and will be excluded. To be sufficient for inclusion in the network, an object and its associated reactions must have been reported in peer-reviewed literature and be based on observable, experimental data; inferences based only on modeling alone are not sufficient.

To serve this purpose, the level of description for translation includes a chemical accounting of ribosome assembly and replication, as well as adjacent active site (A, P and E) interactions, since each of these interactions has downstream effects on the synthesis of biopolymers that enable autocatalytic replication of the entire system (Fig. 2b). The chemical reaction network is intended to focus on the cellular chemosynthetic process, and does not explicitly include sequence-specific or site-specific (physiological or anatomical) information of the cell.

A list of E. coli translation components was obtained from the EcoCyc database79 (GO: 006412). Supplementary Data 1 summarizes the components of translation used in this network, apart from the ions and metabolites (e.g. GTPs) required to operate. The reactions for tRNA aminoacylation by tRNA synthetases, rRNA maturation and modifications, ribosomal protein modifications and maturations and peptidyl tRNA hydrolysis were obtained from the EcoCyc79 database and were also confirmed by literature synthesis (Supplementary Table 1). The reactions for ribosomal subunit assembly, initiation, elongation and termination steps and stress response were built for E. coli. The generation of translation reactions focused on five processes: (a) elongation, which is largely conserved across all organisms; (b) initiation, where some variations have been described between and within domains of life; (c) termination; (d) translation machinery biosynthesis, which are the assembly steps for replicating a ribosome; and (e) the response to amino acid starvation. The reactions were manually annotated. To describe the elongation process in detail, the reaction model accounts for each of the three ribosome active sites. This includes a large number of reactions describing each of the distinct combinations of amino acids at each position. The metabolic and translation component masses were retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG) Application Programming Interface (API). The TM layer reaction network file is provided in Supplementary Table 1.

The polymer biosynthesis layer was constructed to link the metabolic and translation layers together by writing non-stoichiometric reactions that approximate the assembly of amino acids into the assembled enzymatic polypeptides. Each reaction within the metabolic layer associated with a specific enzyme resulted in a new reaction placed within the polymer biosynthesis layer network. For reactions that involved multiple protein components (i.e., the formation of multimer heterocomplexes), the logical relationships of the protein components that enabled specific reactions were automatically read and parsed to generate protein-protein interaction maps within the molecular biosynthesis layer (Supplementary Table 1).

The translation process is carried out as a repetition of steps for each of the proteins. Given that we are not considering sequence-specific processes or relationships, we must employ a placeholder “peptide” node that is used in chemosynthesis steps carried out by translation prior to completion of enzyme synthesis. It denotes any incomplete polypeptide sequence that is in the process of being assembled through the translation process. It does not possess any specific chemical function or identity that must be subsequently tracked in other portions of the chemical reaction network.

Metabolic reaction database

The metabolic layer was obtained from Feist et al.80, who reconstructed E. coli metabolism. We linked these reactions to their corresponding enzyme(s). We obtained chemical information about each of these enzymes through KEGG (Supplementary Table 1). We also considered metabolic models of alternative strains of E. coli obtained from BiGG81. The subtle differences between the strains do not lead to any significant difference in the degree distributions (see Supplementary Fig. 1), so we only consider one strain in this article (K12-MG1655)80 .

Chemical reaction network integration

All chemical reactions from each of the three layers (translation, polymer biosynthesis and metabolism) were uniformly processed into an aggregate network data file for graphing and analysis (Figs. 1, 3a). The aggregate network data file forms a bipartite graph with one class of nodes representing chemical compounds, and another class of nodes representing logical chemical relationships (i.e., chemosynthesis reactions and biosynthesis assembly processes) that describe generating biomolecules from underlying substrates. The integrated network is composed of 1433 reactions for translation, 1455 reactions for polymer biosynthesis and 2085 reactions for cellular metabolism, each with varying proportions of intra- and internetwork connectivity (Supplementary Table 1).

Network visualization, processing, and analyses

We define the degree of a compound based on the number of different reactions involving it (whether as reactant, product, transporter or catalyst). We have built upon prior network analysis efforts5 to explicitly consider enzymes and macromolecular assemblies in addition to metabolites. The complementary cumulative distribution function (CCDF), a binned list of the number of compounds with a degree value in a given network, was analyzed using the power-law Python library82 to determine the likeliest statistical model distribution. We performed discrete and continuous fits using models of exponential and power-law distributions, and we compared the log-likelihood ratios among them to determine if the degree distribution is homogeneous or heterogeneous83.

Network visualizations with many pieces of information tend to become cluttered, which can obscure observational insights. Techniques that reduce a network’s information density without altering its fundamental relationships are useful for recovering these insights. We simplified network visualizations by grouping sets of reactions into combined nodes (node-contracting) according to their module labels. These labels were either provided by the original metabolic data without further modification80 (e.g., transport, pyruvate synthesis, etc.), or manually assigned by the authors for the translation network (e.g., elongation, initiation, etc.). The term ‘module’ denotes a subset of nodes in a network that are densely connected to each other, while being sparsely connected or entirely disconnected from nodes in other modules62,84. Compounds that participate only in reactions within a single module are grouped within combined nodes with their module labels, while those compounds that interact across different modules are conserved and graphed as individual nodes. The resulting contracted network is still a bipartite network with modules connected by compounds. We contracted a network formed by the translation and metabolism layers, and we perform an analysis to find out which nodes are more relevant in the traversal of this contracted network using the betweenness centrality metric. This metric consists of measuring the number of times that a node appears in the shortest path between two different nodes:85

$${BC}\left(n\right)=\mathop{\sum}\limits_{s\ne n\ne t}\frac{{\sigma }_{{st}}\left(n\right)}{{\sigma }_{{st}}}$$

Where \({\sigma }_{{st}}\) is the number of paths joining the nodes s and t, and \({\sigma }_{{st}}(n)\) is the number of those paths that pass through the node n. Reaction directionality is accounted for when using this method. A different contraction approach is taken in Fig. 3b, where we collapse all the reactions belonging to the same layer, and we represent in the edges the number of compounds that participate within each layer or that connect the different layers.

The KEGG database provided the molecular structures and sequences required to compute the molecular weight of each of the chemical compounds (proteins, RNA, metabolites, etc.). In the case of complexes, we add the masses of their components but we disregard their stoichiometry of assembly due to a lack of experimental information.

Large networks are highly dimensional objects where similarity relationships are difficult to assess. Network embeddings allow the mathematical representation of network entities to ease their comparison86. The network embeddings are generated using the node2vec algorithm30 as implemented in PyTorch Geometric87. We use standard parameters to generate the embeddings. The resulting 128 dimensions of each of these embeddings were converted into 2D embeddings by using the t-distributed Stochastic Neighborhood Embedding (t-SNE) technique. Visualization of the integrated network and its individual layers was carried out in Gephi 0.9.288. The code employed to perform this analysis together with the data are provided in the article GitHub repository

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.