The minimum energy required to build a cell

Understanding the energy requirements for cell synthesis accurately and comprehensively has been a longstanding challenge. We introduce a computational model that estimates the minimum energy necessary to build any cell from its constituent parts. This method combines omics and internal cell compositions from various sources to calculate the Gibbs Free Energy of biosynthesis independently of specific metabolic pathways. Our public tool, Synercell, can be used with other models for minimum species-specific energy estimations in any well-sequenced species. The energies for synthesising the genome, transcriptome, proteome, and lipid bilayer of four cell types (Escherichia coli, Saccharomyces cerevisiae, an average mammalian cell and JCVI-syn3A) were estimated. Their modelled minimum synthesis energies at 298 K were 9.54 × 10⁻¹¹ J/cell, 4.99 × 10⁻⁹ J/cell, 3.71 × 10⁻⁷ J/cell and 3.69 × 10⁻¹² J/cell, respectively.
Gram for gram, synthesis of lipid bilayers requires the most energy, followed by the proteome, genome, and transcriptome. The average per-gram cost of biomass synthesis falls between roughly 310 and 355 J/g for all four cells. Implications for the generalisability of cell construction and applications to biogeosciences, cellular biology, biotechnology, and astrobiology are discussed.

www.nature.com/scientificreports/

In this work, we present a model that leverages the ongoing explosion in omics data availability to extend such calculations, assembling all of the biochemical building blocks of a cell into biomacromolecules. This includes not just proteins but also DNA, RNA, phospholipids and carbohydrates. This approach is independent of specific metabolic pathways, which vary from organism to organism, and variation in the chemical environment. This independence from conventional metabolic considerations represents considerable advantages over traditional approaches. Specifically, it can be generalised for settings where the full microbial community and associated biogeochemistry is incomplete or yet to be defined, or be applied to investigate the prospective habitability space for any well-sequenced species under the in situ thermodynamic conditions in more detail than was previously possible. This model, therefore, can serve as a means to sidestep frequently encountered bottlenecks in fields where considerable biouncertainty exists, such as cellular biology, biotechnology, the biogeosciences and, in particular, astrobiology.
Here we examine the case studies of Escherichia coli (E. coli), Saccharomyces cerevisiae (S. cerevisiae), an average mammalian cell, and JCVI-syn3A. The JCVI-syn3A cell is a synthetic organism characterised by its minimalistic genome, transcriptome, and proteome 23,24. It represents a streamlined cellular model with the bare minimum genetic and proteomic content required for life. This minimalistic design makes JCVI-syn3A an ideal subject for examining the fundamental lower energy boundaries required for cellular synthesis. This approach is especially crucial in fields such as astrobiology, where understanding the minimal energetic thresholds for life is essential in exploring the potential habitability of extraterrestrial environments and the prospects for synthetic life forms.

The minimum energy necessary to build a cell
The minimum energy needed to build a cell as defined here is the sum of the energy required to assemble all its components into their biomolecules. Based on the cell composition and biomolecule structure, we tailored group-contribution models which estimate the energy required to build a cell's genome, proteome, transcriptome and lipid bilayer (see "Methods" section). These algorithms build a virtual cell by reading a DNA and protein sequence associated with the cell type. A visual representation of the results can be found in Fig. 1, cell-specific values for E. coli in Table 1 and a comparison with other cell types and studies in Table 2. For the present study, these calculations were made at temperatures from 275 to 400 K for four different model cells: E. coli (Table 1), S. cerevisiae, an average mammalian cell and JCVI-syn3A (Table 2). At 298 K, the energy required to synthesise one single E. coli cell is 9.54 × 10⁻¹¹ J/cell (331 J/g) and 3.69 × 10⁻¹² J/cell (329 J/g) for JCVI-syn3A. For S. cerevisiae and a mammalian cell, the energy required is 5 × 10⁻⁹ J/cell (311 J/g) and 3.71 × 10⁻⁷ J/cell (354 J/g), respectively. A summary of the algorithm's workflow can be found in Fig. 2 and the units of the calculations in Table 3.
Preliminary results indicate, as has been noted previously 25, that minimal energy expenditure in life generally scales with mass by (1) the different contributions of the different cell constituents and (2) different concentrations of the metabolites. However, the synthesis cost of a gram of biomass of each of the four species is remarkably similar, indicating a consistent fundamental floor in the per-gram cost of biomass synthesis. On a per gram basis, synthesising DNA at 298 K requires 0.12 kJ/g, which is higher than the 0.10 kJ/g needed for RNA in E. coli. However, when we consider the cell's mass fraction, RNA, which constitutes a larger fraction of mass, has a higher net energy requirement when building each cell (0.019 kJ/g cells) compared to DNA (0.0037 kJ/g cells), accounting for 5.7% and 1.1% of the total synthesis energy, respectively, as detailed in Table 1. On the other hand, despite the lipid bilayer accounting for only 9% of the cell's mass fraction, it is the second most energy-intensive component, requiring 21% of the cell's total synthesis energy (2.099 × 10⁻¹¹ J/cell), as shown in Table 1. Table 2 also shows estimates of the energetic cost of biomass synthesis from other studies. Our results are lower, suggesting we have identified a thermodynamic minimum, but there are important caveats to consider when comparing these estimates ("Discussion" section).
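The difference between a per-gram-of-component cost and a per-gram-of-cells contribution can be made concrete with a short calculation. The sketch below uses only the rounded E. coli values quoted above at 298 K; the mass fractions it prints are back-calculated from those rounded numbers for illustration and are not values taken from Table 1. All function and variable names are illustrative.

```python
# Rounded E. coli values at 298 K quoted in the text, in kJ per gram.
TOTAL_KJ_PER_G_CELLS = 0.331  # 331 J per gram of cells

# name: (cost per gram of the component, contribution per gram of cells)
components = {
    "DNA": (0.12, 0.0037),
    "RNA": (0.10, 0.019),
}


def share_of_total(per_g_cells, total=TOTAL_KJ_PER_G_CELLS):
    """Percentage of the whole-cell synthesis energy due to one component."""
    return 100.0 * per_g_cells / total


def implied_mass_fraction(per_g_component, per_g_cells):
    """Mass fraction implied by the two rounded costs (an illustration only)."""
    return per_g_cells / per_g_component


for name, (per_g_component, per_g_cells) in components.items():
    print(
        f"{name}: {share_of_total(per_g_cells):.1f}% of total, "
        f"implied mass fraction {implied_mass_fraction(per_g_component, per_g_cells):.3f}"
    )
```

Although DNA is the more expensive polymer per gram, RNA dominates once each cost is weighted by how much of the cell it makes up.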
For all models at all temperatures, we estimated ΔG_f° for the respective biomolecules, which allowed us to obtain the standard Gibbs free energy of reaction ΔG_r°. We computed the mean molar Gibbs free energy (ΔG_r; Eq. (1)), correcting for the absolute intracellular concentrations of products and reactants reported in different studies (Supplementary Tables S1-S4) 26,27. The results suggest that the higher the temperature, the more expensive it is to synthesise a cell regardless of the organism (Fig. 1), although fully testing this universality would require repeated calculations over many more organisms and variations in internal cell composition with temperature. The energetic requirements across the temperature scale (275–400 K) vary by approximately 16% for E. coli (9.20 × 10⁻¹¹ J/cell to 1.097 × 10⁻¹⁰ J/cell), 15% for S. cerevisiae (4.87 × 10⁻⁹ J/cell to 5.73 × 10⁻⁹ J/cell), 8.8% (3.65 × 10⁻⁷ J/cell to 4 × 10⁻⁷ J/cell) for an average mammalian cell and 12% (3.58 × 10⁻¹² J/cell to 4.1 × 10⁻¹² J/cell) for JCVI-syn3A. This temperature range was chosen to cover known non-freezing habitable temperatures for low salinity fluids. The current maximum temperature observed for life is 395 K 28.
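The quoted temperature sensitivities can be checked against the endpoint energies with a few lines of Python. The sketch below assumes the percentage is the difference between the 400 K and 275 K values divided by the 400 K value. That normalisation is not stated in the text, but it reproduces the quoted figures to within the rounding of the inputs. The function name is illustrative.

```python
# Per-cell synthesis energies (J/cell) at 275 K and 400 K, as quoted in the text.
endpoint_energies = {
    "E. coli": (9.20e-11, 1.097e-10),
    "S. cerevisiae": (4.87e-9, 5.73e-9),
    "average mammalian cell": (3.65e-7, 4e-7),
    "JCVI-syn3A": (3.58e-12, 4.1e-12),
}


def relative_spread(energy_275k, energy_400k):
    """Fractional change across the range, normalised by the 400 K value.

    Dividing by the 400 K value is an assumption made for this sketch.
    """
    return (energy_400k - energy_275k) / energy_400k


for organism, (low_t, high_t) in endpoint_energies.items():
    print(f"{organism}: {100 * relative_spread(low_t, high_t):.1f}%")
```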

The minimum energy necessary to build a proteome
We used the GCA approach described in Higgins and Cockell 4 and Amend and Hegelson 19 to calculate the formation energy of every protein in the cell. The amino acid composition can be obtained from an input sequence for any organism. However, we do not consider the concentration of each protein in the cell and instead use an average value for proteins, consistent with these previous studies. To calculate the energy required to synthesise the proteome of a single cell, we first multiply this value by the fraction of the cell's dry mass made up by the proteome (which for E. coli is 0.55 22,29). We then divide the resulting energetic fraction by the number of cells (3.51 × 10¹²) in 1 g. The energy needed to synthesise the proteome of one E. coli cell is 5.86 × 10⁻¹¹ J at 298 K or 204.67 J per g of cells, as seen in Tables 1 and 2, respectively. The energy necessary to assemble the proteome of S. cerevisiae is 2.37 × 10⁻⁹ J, for an average mammalian cell is 2.24 × 10⁻⁷ J and for JCVI-syn3A is 1.73 × 10⁻¹² J.
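The two scaling steps described above (multiply by the proteome mass fraction, then divide by the number of cells per gram) can be sketched as follows. The per-gram-of-protein input of 372 J is the rounded E. coli value at 25 °C quoted in the Discussion; using it here is an assumption of this sketch, so the outputs differ slightly from the tabulated 204.67 J per g of cells and 5.86 × 10⁻¹¹ J per cell. The function name is illustrative and is not part of Synercell.

```python
def proteome_energy(j_per_g_protein, protein_mass_fraction, cells_per_gram):
    """Scale a per-gram-of-protein energy to per gram of cells and per cell."""
    per_g_cells = j_per_g_protein * protein_mass_fraction
    per_cell = per_g_cells / cells_per_gram
    return per_g_cells, per_cell


# E. coli inputs quoted in the text (rounded).
per_g_cells, per_cell = proteome_energy(
    j_per_g_protein=372.0,       # J per gram of protein at 25 °C
    protein_mass_fraction=0.55,  # proteome share of dry mass
    cells_per_gram=3.51e12,      # number of cells in 1 g
)

print(f"{per_g_cells:.1f} J per g of cells")
print(f"{per_cell:.2e} J per cell")
```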
The molar Gibbs free energy of synthesising one gram of proteins remains similar throughout the temperature scale (0.37 kJ/g at 275 K and 0.40 kJ/g at 400 K), varying by only 10%. Of all the biomolecular components, the ΔG_r needed to synthesise proteins is the second most stable throughout the temperature scale after the lipid bilayer, which varies by only 3%. The mean ΔG_r for protein polymerisation is of a similar order of magnitude to other estimates (∼0.5 kJ/g from Higgins and Cockell 4; 0.347 kJ/g from Amend et al. 8).

The minimum energy necessary to build a genome and a transcriptome
We extended the approach used for the proteome in this work and other previous studies 6,13,19,22,30 to calculate the energy required to synthesise a genome and transcriptome, thus capturing a larger and more representative percentage of the total cell mass. Details of the calculations can be seen in the "Methods" section.
The ΔG_r needed to synthesise the genome and transcriptome at 298 K is 1.06 × 10⁻¹² J and 5.44 × 10⁻¹² J, respectively, for an E. coli cell (Table 1), 6.71 × 10⁻¹² and 1.56 × 10⁻¹⁰ for S. cerevisiae, 1.44 × 10⁻⁹ and

Figure 1. Comparative analysis of minimum energetic costs for cellular synthesis across different organisms at different temperatures calculated using Synercell. The panels display the energy cost (horizontal axes) vs. temperature (vertical axes). Values noted below are at 298 K: (A) E. coli Mass-Specific Synthesis. Displays the energetic cost in Joules per gram for E. coli's DNA, RNA, proteins, and phospholipids. The energy to synthesise one gram of E. coli cells is 331 Joules. (B) E. coli Cell-Specific Synthesis. Shows the synthesis energy of one E. coli cell, including the contributions from its genome, transcriptome, proteome, and lipid bilayer. The energy necessary to synthesise one E. coli cell is 9.54 × 10⁻¹¹ J. (C) E. coli approximate millimolar amount of ATP required for syntheses in Panel B, assuming 35,000 J per mole of ATP. (D) S. cerevisiae Cell-Specific Synthesis. The energetic costs in building S. cerevisiae and its cellular components. The energy necessary to synthesise one cell is 4.99 × 10⁻⁹ J. (E) Average Mammalian Cell Synthesis. The energetic costs in building an average mammalian cell and its cellular components. The energy necessary to synthesise one average mammalian cell is 3.71 × 10⁻⁷ J. (F) JCVI-syn3A Cell Synthesis. The energetic costs in building a JCVI-syn3A cell and its cellular components. The energy necessary to synthesise one JCVI-syn3A cell is 3.69 × 10⁻¹² J. Cell compositions represent averaged values derived from various sources (Supplementary Table S1), aiming to capture a general representation across diverse growth phases. The average composition represents a generalised perspective of cell energetics, mainly reflecting conditions akin to the natural environmental lag phase.
Because the energetic contribution of intramolecular bonds can significantly influence a biomolecule's overall ΔG_f°, we also employed the GCA to calculate the energy requirement of the critical ester bond involved in DNA binding. Removing the standard free energy of formation of the ribose and a phosphate from ribose-5-phosphate

Table 1. E. coli composition and the energy necessary to synthesise its components at 298 K. The per gram values in this table are not adjusted to the respective fraction of mass. *The carbohydrates' energy was calculated from the average energy value of the computed biomolecules in order to adjust the energy necessary to synthesise one full cell or one gram of cells (see Supplementary information).

and a cost of 101 ATP per bp 33.
b The transcriptome's energy was calculated per Ref. 33 considering a cost of 46 ATP per nucleotide 33, an average RNA size of 1000 nts 36 and 9.73 × 10⁴ RNAs.
c The proteome's energy was calculated per Ref. 33 considering 2.68 × 10⁶ proteins with an average of 320 amino acids 37 and an average cost of 26 ATPs per AA 38.
d The phospholipids' energy was calculated per Ref. 33 considering a surface area of 4.42 µm² 39 and an average cost of glycerophospholipids of 367 ATP molecules 34 for a total of 1.72 × 10¹⁰ molecules of ATP. We also calculated the number of phospholipids based on the mass fraction of dry weight (9.3%) and the average molecular weight of a phospholipid (740 Da) for a total of 4.45×10

The minimum energy necessary to build a lipid bilayer
We developed a straightforward model to calculate the energy of a cell lipid bilayer composed entirely of 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC). This model was created with the assistance of the GCA and is based on the direct assembly of its building blocks 32 and a balanced reaction from metabolites available in the SUPCRT slop07 database (Table S4). As seen in Fig. 1, the energy to synthesise one gram of POPC at 298 K is 0.764 kJ/g, making it the most expensive component of the cell. However, this might be due to the simplicity of this methodology compared to the model for the proteome, genome and transcriptome. For an E. coli-sized cell, the energy required is 2.099 × 10⁻¹¹ J (Table 1), 5.71 × 10⁻¹⁰ J for an S. cerevisiae-sized cell, 1.18 × 10⁻⁷ J for a mammalian-sized cell and 1.42 × 10⁻¹² J for JCVI-syn3A (Fig. 1). This value is the most stable of all biomolecules considered throughout the temperature range, with a lipid bilayer being only 3% more expensive to synthesise at 400 K (2.08 × 10⁻¹¹ J) than at 275 K (2.025 × 10⁻¹¹ J) for E. coli. Despite methodological differences, our value (73.31 J/(g cell)) is similar to that calculated by McCollom and Amend 6 (89 J/(g cell) in anoxic conditions), as seen in Table 2. See Supplementary Document SD2 for the whole data set at different temperature points. Finally, it is worth noting that developing a universal thermodynamic model for carbohydrate synthesis is a significant challenge due to the inherent structural diversity and isomerism found in carbohydrates. The limited experimental data for complex polysaccharides adds to this challenge; as such, we used the mean from the other biomolecule models to simplify the calculations to 1 g, as disclosed in Table 1.

Discussion
Establishing a standardised methodology to calculate the minimum energetic requirements for cellular biosynthesis that defines the most efficient metabolic pathways across diverse cell types can provide insights into the energetic constraints of life in different environments and guide research in astrobiology, cellular biology, and biotechnology. This work offers a comprehensive, data-driven approach to elucidate the minimum energetic requirements of cellular biosynthesis. The model calculates organism- and environment-specific energy requirements, elaborating on other approaches which rely on model organisms, highly specific applications, or generalising across microbial communities. We have shown that, per gram of dry weight, mammalian cells, S. cerevisiae, E. coli, and even the 'minimal cell' JCVI-syn3A have similar minimum energetic costs of biosynthesis.
It is important to note that the "energy required for cellular synthesis" and the "minimum energy necessary for cellular synthesis" should be interpreted differently. The former refers to the typical energy expended in practice, accounting for specific metabolic pathways, environmental influences and biological inefficiencies. The latter, this study's focus, represents optimal conditions, yielding the lowest possible energy required to build a cell. While the in-practice energy reflects typical cellular operations and can be influenced by possible metabolic heterotrophic inefficiencies or the availability of partially constructed carbon sources, the minimum energy is a foundational, agnostic reference for highly efficient metabolisms. One method of quantifying the efficiency of metabolism is by calculating a Gibbs energy dissipation rate (in kJ (g cells)⁻¹ h⁻¹). This parameterises the energy which is not utilised in metabolism, and is lost as entropy, heat, or through other inefficiencies. It varies with growth rate and appears to plateau at high growth rates 40. As Synercell is integrated into microbial growth models, it may be used in the future to examine how much dissipation is caused by the difference between the two synthesis energies described above.
Historical research on biosynthesis has predominantly centred on the cell maintenance energy, biomass synthesis energy from metabolic models 41, exploring ATP requirements for specific synthesis pathways 5, energy dynamics within chemolithotrophic communities 6, or generalising biomass synthesis from inorganic precursors based on fixed stoichiometries 7. The approach presented here instead provides insights into a lower thermodynamic floor of energy required to build a cell, which could shed light on the ultimate biophysical limit of efficiency for microbial growth. This approach uses a wider variety of omics data than those listed above. It plays a vital role in lending specificity and variability to the biomolecules under consideration. The variability in composition and size given by input sequences is adjusted to fix reactants and products' concentration pools. Thus, the model can be deployed for any well-sequenced species, yielding cell-specific biosynthesis energy requirements for application in biogeosciences, cellular biology, biotechnology, or astrobiology.
Our minimum energy necessary for cellular synthesis can be tentatively compared with other estimates that used the various approaches above. Some comparisons for E. coli are listed in Table 2. The synthesis energies computed in this work for E. coli are significantly lower than these other estimates. This is due to the methodological differences between the studies, and to our goal in this work of finding a fundamental thermodynamic minimum. The largest difference is between this work and the estimates from Lynch and Marinov 33,34. For each cell component, these estimates are ∼20–40 times larger than our predictions. This is most likely because the Lynch and Marinov estimates include smaller building blocks than Synercell and because that model is tied to empirical data. For example, the majority of the Lynch and Marinov 33 synthesis energy ATP cost is associated with building nucleotides, with polymerisation a minor (of order percent) contributor 33, whereas Synercell focuses on building the helix structure. This likely accounts for one portion of the discrepancy, with the remainder associated with alternative inefficiencies such as the residual energy loss described above. McCollom and Amend 6 suggest that the actual observed energy expended on growth processes is approximately an order of magnitude larger than the thermodynamic cost of synthesising the constituent building blocks, and their column in Table 2 only characterises that process, not polymerisation. If the McCollom and Amend 6 column reflects the building block synthesis, and Synercell represents the polymerisation cost, then the remainder between the sum of these and the Lynch and Marinov 33,34 column may represent the cellular inefficiency in biomass production in nature. Higgins and Cockell 4 calculated that, for proteins at ≈25 °C, the synthesis of amino acids from organic precursors and the cost of polymerisation are approximately 700 and 500 J (g proteins)⁻¹, respectively. However, this begins to diverge with increasing temperature and amino acid synthesis becomes more energy intensive (≈4 times more expensive at 100 °C) 4. The Synercell proteome polymerisation estimate for E. coli at 25 °C is 372 J (g proteins)⁻¹, in broad agreement with Higgins and Cockell (≈500 J (g proteins)⁻¹) 4 and Amend et al. (347 J (g proteins)⁻¹) 8. To our knowledge, this is the first application of a GCA for DNA, RNA, and phospholipid polymerisation, so it is difficult to verify these results against other studies. Similar relationships are observed between the different components and the other studies noted above and in Table 2.
In Table 2, synthesis energies are also presented in mmoles of ATP per gram of cells in order to compare with other empirically validated results 33,34. However, the ATP energy yield is influenced by internal cell concentrations and physicochemical parameters like temperature and pressure, which can vary significantly among different organisms 20 and even within the same organism under different growth states. Consequently, while comparing ATP costs is a prevalent approach in the literature, this method does not always provide a straightforward comparison due to these variable internal and environmental factors. Our analysis, therefore, treats these ATP cost estimations as part of a broader, context-dependent framework rather than as absolute values for direct comparison. As such, conversion between the units of the studies in Table 2 may account for some of the discrepancy in energy synthesis values.

Our approach to estimating the minimal energy requirements for cell synthesis is an alternative to using biomass compositions derived from flux balance analysis (FBA). FBA, a well-recognised method for studying metabolic networks, often involves challenges in accurately capturing the stoichiometry of biomass reactions, a point highlighted by recent studies 42,43. These challenges stem from the difficulty in obtaining detailed experimental data for all major biomass components, compounded by the variability and complexity of metabolic networks. To sidestep the inherent uncertainty in the biomass reaction stoichiometry used in FBA models, we instead introduce variability in size and composition by reading input sequences. While FBA models are critical for understanding cellular metabolism, they often focus on the growth-associated maintenance (GAM) demand of ATP, making it hard to understand the minimum energy necessary to synthesise these components. Reported values for E. coli cell synthesis calculated with FBA include 23 mmol (g cells)⁻¹ 44, 59.81 mmol (g cells)⁻¹ 41, 53.81 mmol (g cells)⁻¹ 45 and 75.38 mmol (g cells)⁻¹ 46, which are larger than our estimates, and, as above, this difference likely characterises cellular inefficiencies and complexities such as GAM and energy dissipation. In contrast, Synercell aims to provide an energetic baseline while staying flexible to any cell with known genome and proteome sequences. This approach is particularly advantageous for analysing cells with less characterised metabolic networks, where detailed experimental data for biomass composition are unavailable.
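For orientation, a synthesis energy in joules can be converted to an ATP equivalent using the 35,000 J per mole of ATP assumed in the Fig. 1 caption. The sketch below applies that single fixed value to the E. coli results at 298 K. As noted above, the real ATP yield depends on intracellular conditions, so the outputs are rough equivalents and may not match the Table 2 entries exactly. Function names are illustrative.

```python
ATP_J_PER_MOL = 35_000.0   # energy per mole of ATP assumed in the Fig. 1 caption
AVOGADRO = 6.02214076e23   # molecules per mole (exact SI value)


def joules_to_mmol_atp(energy_j):
    """ATP equivalent, in mmol, of an energy given in joules."""
    return 1000.0 * energy_j / ATP_J_PER_MOL


def joules_to_atp_molecules(energy_j):
    """ATP equivalent, in molecules, of an energy given in joules."""
    return energy_j / ATP_J_PER_MOL * AVOGADRO


# E. coli values at 298 K quoted in the text.
print(f"{joules_to_mmol_atp(331.0):.2f} mmol ATP per g of cells")
print(f"{joules_to_atp_molecules(9.54e-11):.2e} ATP molecules per cell")
```

Under this fixed conversion, 331 J per gram of cells corresponds to roughly 9.5 mmol ATP per gram of cells, below the FBA-derived values listed above.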
Results for the JCVI-syn3A cell model can also be compared to some other calculations, albeit in a more limited way than E. coli. JCVI-syn3A is an interesting case study, because it was engineered to function as a 'minimal cell' 24. This makes it an ideal example to probe the fundamental minimal energy necessary to synthesise a cell. On a per-cell basis, the minimum synthesis energy of JCVI-syn3A was the lowest amongst our sample of four, but it was also the smallest cell, so that result alone is limited. On a per-gram basis, all four organisms examined in this work have a similar minimum synthesis cost, and any differences are likely caused primarily by differences in internal cell composition, and secondarily by genome and proteome complexity. Breuer et al. 24 provide some estimates of the ATP requirement to synthesise JCVI-syn3A DNA, RNA, and proteins (0.24, 0.14 and 21.2 mmol ATP, respectively), but those are based on E. coli-like synthesis costs, so direct applicability to this organism is limited. Synercell results were generated with the JCVI-syn3A internal metabolites composition, and genome and proteome sequences.
In this work, we have expanded the scope of existing methodologies for peptide synthesis 19,20 to include the energy calculations for a cell's DNA, RNA, and lipid content. This was not possible for carbohydrates owing to the extensive diversity in carbohydrate structures, their varied functional roles across different organisms, lack of standardised structural description 47 and the limited availability of thermodynamic data. The vast heterogeneity in carbohydrates implies that no single structure can adequately capture the essence of all cell types. Instead, we adopted an average value approach for the carbohydrate content, as has been previously done for all non-proteome components 4,8,19. This approach allowed us to integrate carbohydrates into our whole-cell calculations, ensuring a more comprehensive and representative model, albeit with an acknowledgement of the simplifications necessitated by the complexity of carbohydrate diversity.
Furthermore, our model employs POPC as a representative phospholipid to approximate the energetic costs associated with membrane synthesis. While POPC is a prevalent component in many cell types, this membrane simplification poses limitations in fully capturing the energetic nuances associated with synthesising more complex cell membranes. Cell membranes comprise a rich mixture of various lipid species and proteins, and in this model the latter part is calculated as part of the proteome algorithm. Approximately 30% of proteins in a cell are in the membrane 48. To get a closer approximation to the membrane value we need to consider that the energetic cost of this protein component would be approximately 30% of the proteome's value (58.57 J/(g cells) or 1.76 × 10⁻¹¹ J/cell for E. coli). The lipid bilayer cost is 69.95 J/(g cells) or 2.10 × 10⁻¹¹ J/cell, giving a total of 128.53 J/(g cells) or 3.86 × 10⁻¹¹ J/cell for an E. coli membrane (values from Table 2). Table 2 also summarises similar estimates of the cell constituents of E. coli from other studies using slightly different methods and chemical environments.
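The membrane estimate above is a simple sum of the membrane-protein share and the lipid bilayer cost. The sketch below reproduces it from the rounded values quoted in the text. The per-gram sum comes out at 128.52 rather than 128.53 J/(g cells), presumably because the published inputs were rounded before printing. Variable names are illustrative.

```python
# E. coli values quoted in the text: (J per g of cells, J per cell).
membrane_protein_cost = (58.57, 1.76e-11)  # about 30% of the proteome cost
lipid_bilayer_cost = (69.95, 2.10e-11)     # POPC bilayer

membrane_per_g_cells = membrane_protein_cost[0] + lipid_bilayer_cost[0]
membrane_per_cell = membrane_protein_cost[1] + lipid_bilayer_cost[1]

print(f"{membrane_per_g_cells:.2f} J per g of cells")
print(f"{membrane_per_cell:.2e} J per cell")
```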
Our model stands out due to its adaptability. It can be refined with additional thermodynamics and omics, allowing for species-specific energy estimates. Conversely, since our model's input requires biomolecule sequences to perform the calculations, it can only perform DNA, RNA and protein calculations based on omics data. Despite previous efforts to sequence phospholipids and carbohydrates 47,49, there is still a lack of standardised methodologies and data for these biomolecules. Therefore, our model only includes one 'handmade' generic model per biomolecule type. Consequently, since accurately calculating the minimum energy needed to synthesise a cell requires more thermodynamic information for phospholipids and carbohydrates, we provide an open-source tool for different applications that can be updated as data becomes available.
In the development of this model, we evaluated two key aspects: (1) the accuracy of the GCA in constructing a biomolecule's ΔG_f°, and (2) the use of different ΔG_f° standards for the building blocks (Table 4). First, we built the nucleotides in two ways: phosphate + deoxyribose + adenine (block method 1) and phosphate + deoxyadenosine (block method 2) (Supplementary Fig. S1). We obtained similar results when comparing the ΔG_r obtained with the different methodologies and the standard nucleotide's ΔG_f° from the SUPCRT slop07 database 50. Furthermore, we tested this method for chemical bonds and validated the results with experimental data (Supplementary Fig. S2), indicating this is a reliable method. Secondly, we examined thermodynamic and biological standards of ΔG_f° for the building blocks 18 to ensure consistency in results. Each standard estimated the same ΔG_r, likely due to the lack of H⁺ in the overall reaction and our assumption that ionic strength is close to zero 20. Furthermore, although the only physicochemical parameter considered here was temperature, the models could be corrected for chemical differences by considering the corresponding change in cellular content, if any. Our models show a broad floor when compared to results from other studies which examine a variety of chemical environments 4–6,19.

In the future, our model could benefit from integrating more variables. For example, variations in internal pH among organisms can influence cellular composition stability 51,52. Environmental shifts can also affect energy consumption in biomacromolecule synthesis. Higgins and Cockell 4 showed that rising temperatures make amino acid synthesis a leading energy expense in protein formation. Additionally, McCollom and Amend 6 found that anaerobic conditions are more conducive to building block synthesis than aerobic ones due to specific oxidation states. In anaerobic settings, the altered oxidation state affects the concentrations of crucial dissolved compounds, influencing biomolecule synthesis. Moreover, we have utilised cell compositions from diverse sources to approximate an average value, aiming to represent a broad spectrum of growth phases. While this approach provides a broad overview, future studies may benefit from analysing cell composition in specific growth phases to assess dynamic changes in energetic costs and maintenance requirements. This would enhance the granularity of our analysis and allow us to examine how changes in the absolute internal cell composition (both reactants and products) impact the overall energetic cost of cell synthesis.
When the model is deployed for analyses of microbial communities in situ, local instantaneous geochemical data should be leveraged to correct internal cell concentrations and their effect on the present biosynthesis calculations, unlocking faster and more robust biomass turnover calculations than are currently possible 20. Typically, the biological data which serve as input parameters for microbial models are inferred from culture-based studies which themselves are controlled and well-defined but may be time consuming to perform. The model presented here only depends on the omics data of any given organism and its internal composition, so only requires the latter to be updated using insights into the local geochemistry to generate site-specific energetic requirements of biomass synthesis. This could additionally be extended for analyses of habitability and growth through deep time, and to model how the energy requirement changes with its environment 53. This study's primary goal was determining the minimal energy necessary to assemble a cell, a key metric for understanding the basal energy requirements essential for life. The energetic requirement of biomass synthesis is a critical component of bioenergetic habitability models 4 and a controlling parameter in estimates of biomass turnover, which are pertinent to biosignature production and, by extension, constraining the feasibility of life detection on other worlds 20. Additionally, our findings have significant implications in biotechnology, offering a pathway to optimise energy efficiency in microbial production systems and synthetic biology applications. By establishing a benchmark for the minimum energy needed to construct cellular biomass, our model, Synercell, is a tool for identifying and enhancing energy-efficient pathways in various biotechnological processes 54–57. The potential integration of our model, Synercell, with other predictive models (e.g., amide bond synthesis 58) can enhance the accuracy of bioenergetic predictions across diverse environmental conditions.
In conclusion, this study introduces a comprehensive, data-driven model to understand the minimum energy requirements for cellular biosynthesis. It is a valuable tool for cellular biology, biotechnology, biogeosciences, and astrobiology and can be incorporated into other models. We anticipate its flexibility will encourage further research and data collection, particularly for thermodynamic data related to organisms other than those studied here, and their constituent biomolecules. Ultimately, our research contributes to understanding the energy constraints of life and the factors influencing the fundamental thermodynamic minimum energy requirements for cell construction. This understanding is crucial for exploring life's boundaries in extreme environments, optimising biotechnological processes, and probing the potential for life beyond Earth.

Methods
The Gibbs Free Energy (Eq. 1) represents the energy available to do work. By quantifying the Gibbs Free Energy of synthesis for proteins, DNA, RNA, and lipids, we can determine the work required to build these cellular components from their most direct building blocks under given conditions. This value reflects the energetic investment to maintain and/or replicate a cell and offers insights into the efficiency of cellular processes. To calculate the energy for each cellular component, we developed a model tailored to each biomolecule to calculate the standard Gibbs energy of formation ($\Delta G^\circ_f$) and then used internal cell concentrations to calculate the molar Gibbs energy ($\Delta G_r$) of each biomolecule type:
$$\Delta G_r = \Delta G^\circ_r + RT \ln Q \qquad (1)$$
where $\Delta G^\circ_r$ is the standard reaction Gibbs energy obtained from an average of every biomolecule in the cell, $R$ is the ideal gas constant, $T$ is the absolute temperature in kelvin, and $\ln Q$ is the natural logarithm of the reaction quotient between building blocks (reactants) and biomolecules (products). This equation is evaluated separately for each temperature at which a value is produced (Fig. 2).
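As a minimal sketch of Eq. 1, the Python function below evaluates the molar Gibbs energy from a standard reaction energy, a temperature and a reaction quotient. The function name and its arguments are illustrative assumptions and are not taken from Synercell.

```python
import math

R = 8.314  # ideal gas constant, J/(mol K)


def gibbs_reaction_energy(dg0_r, temperature_k, q):
    """Return dG_r = dG0_r + R*T*ln(Q) in J/mol.

    dg0_r         -- standard reaction Gibbs energy, J/mol
    temperature_k -- absolute temperature, K
    q             -- reaction quotient (dimensionless, must be > 0)
    """
    if q <= 0:
        raise ValueError("reaction quotient must be positive")
    return dg0_r + R * temperature_k * math.log(q)
```

When Q equals 1 the logarithmic term vanishes and the function returns the standard value unchanged, which is a convenient sanity check before supplying measured concentrations.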

Cell synthesis energy
The total energy to synthesise a cell ($\Delta G_{[synthesis]}$) is the sum of the energy required to synthesise its components: the proteome, genome, transcriptome, and other cellular biomolecules. Numerous approaches to calculating the maintenance energy have been proposed [4,59,60], and they typically depend upon some energy requirement of biomass synthesis for replacement. This work aims to compute synthesis or growth energy, using the Gibbs free energy to synthesise a cell, $\Delta G_{[synthesis]}$ (Eq. 2), from the building blocks of different biomolecule types. For this, we can break down a cell into:
$$\Delta G_{[synthesis]} = \Delta G_{[proteome]} + \Delta G_{[genome]} + \Delta G_{[transcriptome]} + \Delta G_{[other]} \qquad (2)$$
This equation encapsulates the comprehensive energy requirement for cell construction, where subscripts refer to the molecule types being synthesised.

Energy for maintenance and growth
Broadly speaking, organisms use their energy supply for either growth or maintenance [2] (Eq. 3). Growth processes depend on the energetic cost of biosynthesis (e.g., building proteins or DNA), whereas maintenance processes include all those which consume energy but are not necessarily related to growth [2,22]. Complete cellular energetic calculations can be performed when we gather enough information to calculate the energy to build all the structures of a cell with the cell simultaneously remaining viable [22]:
$$E_{total} = E_{growth} + E_{maintenance} \qquad (3)$$
This formula provides a view of the cell's energy budget, incorporating both the energy for biosynthesis (building biomolecules like proteins and DNA) and maintenance (energy-consuming processes that do not directly contribute to growth).

Model implementation
Because $\Delta G^\circ_f$ values for large biomolecules are hard to find in the literature, we estimated them using the group contribution algorithm (GCA), as similarly done by Mavrovouniotis [16,61]. We calculated the $\Delta G^\circ_f$ of proteins, DNA, RNA and a lipid bilayer by summing together the $\Delta G^\circ_f$ of their respective building blocks (e.g. amino acids, nucleotides, phospholipids, etc.) [13]. The flowchart in Fig. 2 summarises the procedure of Synercell's modules.
The modules obtain the $\Delta G^\circ_f$ of the different compounds used for the GCA from the slop07 database and SUPCRT92 [50]. SUPCRT92 is a software package that calculates the standard molar thermodynamic properties of minerals, gases, aqueous species, and reactions from 1 to 5000 bar and 0-1000 °C [50]. The data were accessed and calculated using the reaktoro package for chemical systems, using an implementation of the revised HKF equations [62,63]. Once the $\Delta G^\circ_f$ of each biomolecule is obtained, the standard Gibbs reaction energy ($\Delta G^\circ_r$) and the Gibbs reaction energy under cellular conditions ($\Delta G_r$) are obtained with the following:
$$\Delta G^\circ_r = \sum \nu \, \Delta G^\circ_f(\text{products}) - \sum \nu \, \Delta G^\circ_f(\text{reactants}) \qquad (4)$$
$$\Delta G_r = \Delta G^\circ_r + RT \ln Q$$
where $\nu$ denotes the stoichiometric coefficients, $R$ is the ideal gas constant (8.314 J/(mol K)), $T$ is the absolute temperature (K), and $\ln Q$ is the natural logarithm of the reaction quotient. Because $\Delta G_r$ calculations rely heavily on the absolute concentrations within a cell, we used the building blocks' (reactant) concentrations obtained experimentally by Bennett et al. [26] and Park et al. [27], and biomolecules' (product) concentrations estimated from the fraction of dry mass in the cell (Table S1). For the latter, the variability is adjusted with the help of the input sequences. For instance, if the input data contains five protein sequences, the program will obtain an average from those five protein sequences and adjust it to the final protein mass depending on the cell type.
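The two quantities that feed the reaction energy can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Synercell's code: the function names are hypothetical, each species is passed as a (stoichiometric coefficient, value) pair, and activities are approximated by concentrations.

```python
import math


def standard_reaction_energy(products, reactants):
    """Standard reaction energy from formation energies.

    products, reactants -- lists of (nu, dg0_f) pairs, where nu is the
    stoichiometric coefficient and dg0_f the formation energy in J/mol.
    """
    return (sum(nu * g for nu, g in products)
            - sum(nu * g for nu, g in reactants))


def ln_reaction_quotient(products, reactants):
    """Natural log of Q, assuming activity ~ concentration.

    products, reactants -- lists of (nu, concentration) pairs with
    concentrations in mol/L (all strictly positive).
    """
    return (sum(nu * math.log(c) for nu, c in products)
            - sum(nu * math.log(c) for nu, c in reactants))
```

Working with ln Q as a sum of logarithms, rather than forming Q first, avoids numerical underflow when many dilute species appear with large stoichiometric coefficients, as is the case for long polymers.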

Energy to synthesise the proteins
Synpro is the model within the program that implements the GCA approach from Higgins and Cockell [4] and Amend and Helgeson [19] to calculate the formation energy of a protein according to its amino acid composition. The $\Delta G^\circ_f$ of a protein is calculated as a group-contribution sum in which AABB represents the amino acid backbone (H2N-CH-COOH), $N_{AA}$ is the number of amino acids in the protein, $N_{Gly}$ is the number of glycines, PBB is the protein backbone (HN-CH-C=O) and R is the non-glycine functional group of each amino acid. $m_i$ acts as a counter for the number of occurrences of each non-Gly amino acid $i$ in the chain, such that $\sum_{i=1}^{19} m_i = N_{AA}$ [4]. Equation (4) is then used to calculate the standard Gibbs reaction energy ($\Delta G^\circ_r$). Synpro calculates the $\Delta G^\circ_f$ and $\Delta G^\circ_r$ of each protein within the input sequence. Next, the algorithm obtains an average $\Delta G^\circ_r$ to represent all the proteins in the proteome sequence. This average value is then used to calculate $\Delta G_r$, taking into account the absolute internal amino acid composition and the total protein concentration reported per cell type.
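The composition terms used by the group-contribution sum can be tallied directly from a sequence. The sketch below is illustrative only: `protein_composition` is a hypothetical helper, not part of Synpro, and it assumes one-letter codes for the 20 standard amino acids.

```python
from collections import Counter

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes


def protein_composition(sequence):
    """Return (n_aa, n_gly, m) for a protein sequence.

    n_aa  -- total number of residues
    n_gly -- number of glycine residues
    m     -- dict mapping each non-glycine residue to its count
    """
    sequence = sequence.upper()
    unknown = set(sequence) - STANDARD_AA
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(sequence)
    n_gly = counts.get("G", 0)
    m = {aa: n for aa, n in counts.items() if aa != "G"}
    return len(sequence), n_gly, m
```

Each count in `m` would then be multiplied by the formation energy of the corresponding side-chain group, with the backbone terms added separately.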

Energy to synthesise the nucleic acids
Syngen calculates the energy necessary to synthesise DNA and RNA based on an approach similar to that used in Synpro. Respecting its stoichiometry, we obtained the formation energy of a nucleic acid chain ($\Delta G^\circ_f[\text{NA chain}]$) using the GCA and, from it, $\Delta G^\circ_r$. The model calculates the energy for the double-stranded genome according to the input sequence. In parallel, the tool transcribes the sequence by analysing the open reading frames (ORFs) that it detects. Each ORF runs from the first available start codon (ATG) to the immediately following stop codon (TAA, TAG or TGA, in this order), and ORFs are transcribed one after another until the end of the sequence. No transcription factors or other transcription criteria are considered apart from the length of the potential transcript.
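The ORF scan described above can be sketched as follows. This is one plausible reading of the procedure, not Syngen's implementation: it assumes that only the forward strand is scanned, that the stop codon must be in frame with the start codon, and that the search resumes after each stop codon. The function names are hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def find_orfs(dna):
    """Return ORFs (start codon through stop codon) on the forward strand."""
    dna = dna.upper()
    orfs = []
    pos = dna.find("ATG")
    while pos != -1:
        end = None
        # Walk codon by codon, in frame with the start codon.
        for i in range(pos + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        if end is None:
            # No in-frame stop codon: try the next start codon instead.
            pos = dna.find("ATG", pos + 1)
            continue
        orfs.append(dna[pos:end])
        pos = dna.find("ATG", end)  # resume after the stop codon
    return orfs


def transcribe(orf):
    """Return the RNA transcript of a coding-strand DNA sequence."""
    return orf.upper().replace("T", "U")
```

Because the only criterion is the position of start and stop codons, the result is an upper bound on plausible transcripts and will include ORFs that a real cell would never express.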

Energy to synthesise the phospholipids
This model assumes all cell membranes are made of phosphatidylcholine (POPC), one of the most abundant phospholipids across cell types [32,64-66], arranged in a lipid bilayer. Proteins are not included in the bilayer because membrane proteins are already accounted for in the proteome calculations. We obtained the $\Delta G^\circ_f$ of POPC by adding up the formation energies of its building blocks. Due to the lack of thermodynamic data available for this biomolecule, we used a different approach to calculate $\Delta G^\circ_r$, working from metabolites as in Henry et al. [67] and Jankowski et al. [68]. The $\Delta G^\circ_f$ values of the metabolites involved were available in the slop07 database and were used to balance a theoretical condensation reaction to estimate the reaction energy. $N_{PL}$ is the number of phospholipids in the lipid bilayer, estimated by comparing the weight of POPC with the weight of the lipids in a cell. $\Delta G_r$ was calculated using the internal compositions of the metabolites. The stoichiometry of this reaction can be found in Supplementary Table S4.
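The estimate of the number of phospholipids can be illustrated with a short calculation. The sketch assumes that the comparison of weights amounts to dividing the lipid mass per cell by the molar mass of POPC (about 760.08 g/mol for C42H82NO8P) and multiplying by Avogadro's number. The function name is hypothetical, and the lipid mass must be supplied by the user.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
POPC_MOLAR_MASS = 760.08   # g/mol, C42H82NO8P


def number_of_phospholipids(lipid_mass_g):
    """Estimate N_PL from the lipid mass of one cell (in grams),
    assuming every lipid molecule is POPC."""
    if lipid_mass_g < 0:
        raise ValueError("lipid mass must be non-negative")
    return lipid_mass_g / POPC_MOLAR_MASS * AVOGADRO
```

Treating all lipids as POPC keeps the calculation to a single molar mass, at the cost of ignoring the variation in head groups and acyl chains found in real membranes.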

Assessment and validation of the GCA
To test the accuracy of the GCA up to the nucleotide scale, we broke down the nucleotides of E. coli's genome in three different ways when calculating the cost of synthesis (Supplementary Fig. S1). Using thermodynamic values available in the slop07 database [69] for the different building blocks, we calculated the energy to synthesise the E. coli genome. Despite minor discrepancies between the two alternative methods (0.12 kJ/g vs. 0.152 kJ/g), results were generally consistent. The synthesis energy for dAMP varied by about 10 kJ/mol between methods, hinting at potential intramolecular interactions. Additionally, the energy requirements for critical DNA-binding bonds, such as the ester bond in ribose-5-phosphate, resolved by this testing closely matched experimental values (approx. 22.17 kJ/mol) [31]. These results reinforce the reliability of our approach for determining the $\Delta G^\circ_f$ of nucleotides.

Additional notes and standardisation
The thermodynamic standard for $\Delta G^\circ_f$ used in this work should not be confused with the alternative biological standard $\Delta G'^\circ_f$. The thermodynamic standard represents conditions at a pH of 0 (i.e., a concentration of H+ equal to 1 M) and an ionic strength of zero. In biological standard conditions, pH is set to 7 and ionic strength is usually 0.1, but is often less rigorously defined (Table 4). Standards are converted to actual molal quantities as outlined above.
In this study, we used values for E. coli grown at 37 °C in aerobic, glucose-containing minimal medium at a doubling time of 40 min, and values for S. cerevisiae grown at 30 °C in aerobic, 0.5% glucose-containing minimal medium at a doubling time of 160 min. Values represent an average of reported values for various growth conditions. Energies of all building blocks used in our calculations, derived from thermodynamic and biological standards,

Figure 2. Workflow summary of Synercell. The tool first requires data input, including the type of cell (bacterial, yeast, mammalian or JCVI-syn3A), a genome sequence, a protein sequence and the temperature at which the user wants to calculate the energy. The tool creates a virtual cell from the omics data input, transcribing the genome and adjusting the concentration pool according to the cell type. Using the GCA, the tool applies tailored models for each biomolecule type to estimate its $\Delta G^\circ_f$ at the chosen temperature. Next, the tool obtains the $\Delta G^\circ_r$ following its stoichiometry. Finally, it calculates $\Delta G_r$ by combining experimental and theoretical concentration data for each constituent.
https://doi.org/10.1038/s41598-024-54303-6

Table 2. Cell synthesis cost comparison in J (g cell)⁻¹ and mmol ATP (g cell)⁻¹ at 298 K. See footnotes for details on unit conversion. Significant values are in bold. The per-gram values in this table are adjusted to their respective fraction of mass.
Sources compared: This work; Lynch and Marinov [33,34]; McCollom and Amend [6]; Stouthamer [5] (footnote e). Cell types: mammalian, S. cerevisiae, JCVI-Syn3A, E. coli.
Footnotes:
The genome energy was calculated per Ref. 33 considering a genome size of 4,608,319 bp.
[…] 10molecules of ATP. The value in the table is the average between both calculations. The values of ATP were obtained assuming that ATP yields 35 kJ/mol.
e: Stouthamer 1975 reported values in mol ATP (g cell)⁻¹. Conversion to J (g cell)⁻¹ was done assuming 1 mole of ATP yields 35 kJ, as above.
f: McCollom & Amend 2005 reported values in J (g cell)⁻¹. Conversion to mmol ATP (g cell)⁻¹ was done assuming 1 mole of ATP yields 35 kJ, as above.
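The unit conversions described in the footnotes reduce to a single factor. The sketch below assumes, as the footnotes do, that 1 mole of ATP yields 35 kJ. The function names are illustrative.

```python
ATP_YIELD_J_PER_MOL = 35_000.0  # 35 kJ per mole of ATP, as assumed in the footnotes


def joules_to_mmol_atp(joules_per_g):
    """Convert J (g cell)^-1 to mmol ATP (g cell)^-1."""
    return joules_per_g / ATP_YIELD_J_PER_MOL * 1000.0


def mol_atp_to_joules(mol_atp_per_g):
    """Convert mol ATP (g cell)^-1 to J (g cell)^-1."""
    return mol_atp_per_g * ATP_YIELD_J_PER_MOL
```

Since 35 kJ/mol is the same as 35 J/mmol, the first conversion is equivalent to dividing the value in joules by 35.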

Table 3. Thermodynamic, input and output data parameters and units used in our models.

Table 4. Conditions for each type of standard condition. The (thermodynamic) standard conditions refer to those typically used in chemistry and physics, while the biological standard recreates those found in typical intracellular physiological environments.