Abstract
Metabolic phenotypes are pivotal for many areas, but disentangling how evolutionary history and environmental adaptation shape these phenotypes is an open problem. Especially for microbes, which are metabolically diverse and often interact in complex communities, few phenotypes can be determined directly. Instead, potential phenotypes are commonly inferred from genomic information, and rarely were model-predicted phenotypes employed beyond the species level. Here, we propose sensitivity correlations to quantify similarity of predicted metabolic network responses to perturbations, and thereby link genotype and environment to phenotype. We show that these correlations provide a consistent functional complement to genomic information by capturing how network context shapes gene function. This enables, for example, phylogenetic inference across all domains of life at the organism level. For 245 bacterial species, we identify conserved and variable metabolic functions, elucidate the quantitative impact of evolutionary history and ecological niche on these functions, and generate hypotheses on associated metabolic phenotypes. We expect our framework for the joint interpretation of metabolic phenotypes, evolution, and environment to help guide future empirical studies.
Introduction
Metabolic reactions as well as entire metabolic networks establish function by yielding phenotypes in terms of metabolic flux distributions inside the cell and in the cell’s interaction with the environment. Such metabolic phenotypes of potentially complex cell communities impact many areas, including biogeochemical cycles1 and human health2. Understanding the drivers of metabolic functional diversity requires disentangling links between metabolic gene repertoires, realized metabolic phenotypes, taxonomy to represent evolutionary history, and environmental characteristics. However, inferring these links, and ultimately determining how all factors combined shape metabolic phenotypes, is an open problem3. One challenge is that, especially in complex microbial communities, few phenotypes can be determined directly4. Instead, potential phenotypes are often inferred from genomic information5. Analyses of the global ocean microbiome illustrate common approaches based on metagenomics data: to infer metabolic functions from gene repertoires6, or to use species-level functional annotations7, which are then associated with taxonomy and environment. However, this does not consider interdependencies of genes, the cellular networks their products establish, and phenotypes. Since network context shapes gene functions, and the whole network generates metabolic phenotypes, genetic epistasis8 and variable trait relations along the phylogeny3 indicate a need to incorporate interactions in cellular networks.
Genome-scale metabolic network models (GSMs) make these dependencies explicit. They can reliably predict metabolic phenotypes9, their topological analysis can predict environments10,11, and they were instrumental in analyzing enzyme evolution12, but all for single species. However, there are only a few studies that employed model-predicted phenotypes beyond the species level13,14,15. A recent, comparative GSM-based study of bacterial phenotype evolution13 did not make links to genotype or environment, while other comparative studies linked specific (minimal)15 or generic (sampled)14 environments to species and ecological relations, but not to detailed metabolic functions. To bridge the corresponding gaps, here we exploit the concept that genotype–phenotype relationships connect differences at the genomic level (and in environments) with differences in phenotypes16. Specifically, we quantify how perturbations in enzyme-catalyzed reactions affect metabolic fluxes to compare identical biochemical reactions and subsystems across species with varying metabolic network structures.
Results
Functional comparisons via sensitivities
Our framework uses structural sensitivity analysis17 to characterize perturbation effects in metabolic networks (Fig. 1a). It uses only the network structure (that is, the stoichiometry of metabolic reactions) to assess how perturbations of metabolic fluxes propagate through the network. Specifically, structural sensitivities measure the predicted adjustments to all fluxes required to return a network to steady-state when one or more reactions in the network is perturbed. The predictions assume that cells tend to minimally redistribute fluxes upon a perturbation; this assumption allowed more accurate predictions of bacterial growth rates upon a genetic knockout18. To link genes and enzymes to fluxes, we use gene-protein-reaction mappings, which are a common component of GSMs19 (see Methods for details). Here, we compute absolute sensitivities, which are obtained analytically (see Methods) and do not assume a specific operating state of the network or a specific environment17. These sensitivities are considered to capture adjustments to infinitesimal perturbations; they are valid unless a network’s operating state is exactly equal to specific constraints such as the availability of nutrients in a specific environment, which is unlikely. To compare two common reactions (reactions with identical biochemical formula) in two GSMs, we correlate the sensitivities of all common reactions to perturbations of these two reactions (Fig. 1a and Methods). Correspondingly, we use ‘function’ (‘functional similarity’) in the sense of (similarity of) flux responses to perturbations of the common reactions.
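The comparison step described above can be illustrated with a minimal sketch. It assumes that the sensitivities of all reactions to a perturbation of the same reaction are already available for two models as dictionaries keyed by reaction identifier; the function names, the toy values, and the dictionary layout are hypothetical and not taken from the original analysis.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)

def sensitivity_correlation(sens_l, sens_m):
    """Correlate two sensitivity profiles over their common reactions.

    sens_l, sens_m: dicts mapping reaction id -> sensitivity of that
    reaction to a perturbation of the same reaction k in models l and m.
    """
    common = sorted(set(sens_l) & set(sens_m))  # reactions present in both models
    if len(common) < 2:
        raise ValueError("need at least two common reactions")
    return pearson([sens_l[r] for r in common], [sens_m[r] for r in common])

# Hypothetical profiles for one perturbed reaction in two toy models.
profile_l = {"r1": 1.0, "r2": 2.0, "r3": 3.0, "r4": 9.0}
profile_m = {"r1": 2.0, "r2": 4.0, "r3": 6.0, "r5": -1.0}
rho = sensitivity_correlation(profile_l, profile_m)
```

Only the common reactions (here `r1` to `r3`) enter the correlation; reactions unique to one model are ignored, which is why the measure compares responses rather than repertoires.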
We first evaluated if Pearson correlations of sensitivities provide information on network similarity that is different from measures based on reaction presence / absence (the metabolic repertoire) such as the Jaccard index. For this purpose, we quantified the similarities of the neighborhoods of each common reaction in the Escherichia coli and Bacillus subtilis networks. Sensitivity correlations and Jaccard indices for 1-neighborhoods do not correlate (Fig. 1b, R2 = 0.003). In particular, many reactions have a low Jaccard index, but a high sensitivity correlation because sensitivities account for the whole network’s response to a perturbation; they can distribute over large graph distances (Fig. S1a). Jaccard indices do not capture this even when considering the 2-neighborhood of reactions (Fig. S1b).
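For contrast, the repertoire-based measure can be sketched in a few lines. The example assumes that the 1-neighborhood of a reaction is given as a set of reaction identifiers in each model; how neighborhoods are extracted from a GSM is not shown, and the identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty neighborhoods are treated as identical (a convention of this sketch).
    return len(a & b) / len(union) if union else 1.0

# Hypothetical 1-neighborhoods of the same reaction in two models.
neighbors_model_1 = {"r1", "r2", "r3"}
neighbors_model_2 = {"r2", "r3", "r4"}
similarity = jaccard(neighbors_model_1, neighbors_model_2)
```

The index depends only on which neighbors are present, so two reactions with few shared neighbors score low even if perturbing them has similar network-wide effects.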
To illustrate how sensitivity correlations capture the effects of network context on enzyme function, we consider ornithine carbamoyl transferase, which operates in the structurally similar, but not identical arginine biosynthesis pathways of E. coli and B. subtilis (Fig. 1c). Adding the two missing reactions to the E. coli GSM increases context similarity, increasing the sensitivity correlation from 0.61 to 0.74 (Fig. 1d). Sensitivity correlations can also pinpoint known structural differences between two GSMs. For example, the sensitivities to perturbing the reaction 5-amino-6-(5-phosphoribosylamino)uracil reductase in the riboflavin pathway are uncorrelated between B. subtilis and E. coli (R2 = 0) because B. subtilis can adapt by active riboflavin transport across its membrane, but E. coli lacks this transport20,21. Correspondingly, the correlation increased to 0.71 when augmenting the E. coli GSM with riboflavin exchange and transport (Fig. S1c). These examples and the wide spread of sensitivity correlations (Fig. 1d) suggest that our measure is sufficiently fine-grained to differentiate metabolic functions, in contrast to comparisons of metabolic repertoires alone.
Functional similarity
To assess the biological realism of sensitivity-based predictions, we characterized the functional similarity of metabolic subsystems (sets of reactions with related function) in E. coli and B. subtilis (Fig. 2a and Methods). Sensitivity correlations indicate that lipid and cell wall metabolism are the least similar, consistent with the bacteria’s different Gram status. We also observe a bimodal distribution for reactions in the coenzymes and prosthetic groups subsystem, where the mode with lowest similarity includes mostly reactions in riboflavin metabolism as above. The Jaccard index for each metabolic subsystem gives similar results (Fig. 2a), but it relies entirely on the subsystem classification of metabolic reactions and cannot reveal fine-grained differences at the reaction level.
To assess the plausibility of reaction-level predictions as well as the potential of comparing biological functions with sensitivity correlations in more complex networks, we next used human and yeast GSMs. We analyzed Enzyme Commission (EC) number similarities between pairs of enzymes, defined as the number of shared levels in the four-level EC number classification. As expected, enzyme pairs with the highest EC number similarities showed higher sensitivity correlations than more unrelated pairs (Fig. 2b). However, enzymes with identical EC numbers do not necessarily have high sensitivity correlations, reflecting their different network contexts. This context-dependence dominates over coarse classification of catalyzed chemical reactions because even a single difference in EC number abolishes correlations. Similarly, when we classified each gene pair of yeast and human as orthologous or not, orthologs had significantly higher correlations (Fig. 2c; one-sided t-test, \(P < {10}^{-10}\), n = 154 orthologs, n = 1,140,254 non-orthologs). In addition, the correlations span a large range of values, confirming that orthologs are not functionally equivalent22, despite often catalyzing the same biochemical reaction.
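The EC number similarity used here can be sketched as follows, assuming that "shared levels" means identical leading levels of the hierarchy and that unspecified levels (written `-`) never match; both are assumptions of this sketch, and the function name is hypothetical.

```python
def ec_similarity(ec_a, ec_b):
    """Number of leading EC levels shared by two EC numbers (0 to 4)."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first differing or unspecified level: the EC scheme is
        # hierarchical, so deeper levels are not comparable after a mismatch.
        if a != b or a == "-":
            break
        shared += 1
    return shared

# Ornithine carbamoyltransferase (2.1.3.3) vs. aspartate carbamoyltransferase (2.1.3.2).
score = ec_similarity("2.1.3.3", "2.1.3.2")
```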
Because Pearson correlations can be unreliable for highly skewed distributions23 such as here (e.g., Fig. 1d), we also computed copula correlations that are not affected by the underlying marginal distributions (Methods). The two measures can differ for individual reactions, but they are highly correlated (Fig. S1e, r2 = 0.60, linear correlation) and give identical results for the applications (Figs. 1c, d and 2 vs Fig. S2a–d). Also, reducing the number of reactions to calculate sensitivity correlations had only a small effect (Fig. S2e; albeit more pronounced for Pearson correlation, as expected). Hence, sensitivity correlations establish a detailed, biologically valid, and robust measure of functional similarity.
Functional alignments and phylogeny
Next, we aimed to align reactions in pairs of GSMs using our measure to evaluate its precision in general, and for only distantly related metabolic networks. This is possible because our sensitivity-based method yields a one-to-one reaction mapping for each pair of reactions in two networks (Methods). Functional alignment is challenging because even phylogenetically closely-related organisms can have very different metabolic repertoires24, and structurally similar network parts (e.g., parallel pathways) could have too similar functions to be resolved unambiguously by sensitivity correlations. A previous method for functional network alignment25 reported 85% correct alignments for 100% common reactions when aligning the yeast GSM iMM90426 with itself. In contrast, more than 92% of the metabolic reactions were correctly aligned even when using only 1% of the reactions to compute sensitivity correlations (Fig. 3a). Importantly, this indicates that discriminating reaction functions by our measure is insensitive to the number of common reactions, that is, the similarity of metabolic repertoires of two networks.
With alignments being at the basis of phylogenetic analyses, we hypothesized that the sensitivity concept could extend to such comparisons over multiple networks. We define the global similarity of two GSMs as the average sensitivity correlation of all common reactions (Methods). To validate this measure, we compared the yeast model with a randomly reduced version of itself. As expected, both the Pearson and copula correlations decrease as the number of deleted reactions increases (Fig. S3a, b). We then compared 16 manually curated GSMs retrieved from Metanetx27 that represent 15 species from all kingdoms of life. Consistent with a previous GSM-based analysis of phenotypic evolution13, average sensitivity correlations decrease with increasing species divergence time and they saturate at high divergence times (Fig. 3b). However, two groups with B. subtilis and Saccharomyces cerevisiae comparisons suggest higher similarities than expected by the general trend. We therefore clustered species hierarchically using the pairwise average sensitivity correlations (Methods).
The resulting species tree (Fig. 3c) is consistent with some aspects of phylogeny (e.g., in separating bacteria, eukaryotes, and the archaeon Methanosarcina barkeri), but not with others. For example, the metabolically extreme organisms M. barkeri (a methanogen) and Thermotoga maritima (which does not produce ATP when growing on sulfur) are outliers. Yeast clusters with Gram-positive bacteria (Mycobacterium tuberculosis and B. subtilis), and not with multicellular eukaryotes. Note that the two yeast models with different network coverage clustered together, indicating a certain robustness to GSM accuracy or completeness. We also confirmed that the inferred species tree is robust to the tree construction method (Fig. S3d) and the correlation measure used (Fig. S3f). In contrast, comparing metabolic repertoires via Jaccard indices (Methods) gives rather binary species distinctions (Fig. S3c). This leads to lower resolution at high divergence times (Fig. 3b) and a qualitatively different inferred tree where, for example, M. barkeri clusters with bacteria (Fig. S3e). Hence, in particular distinctions from phylogeny indicate that sensitivities may provide orthogonal information on species-specific metabolic functions and lifestyles.
Metabolic diversity across bacteria
To assess the potential orthogonal information in detail, and to exploit the ability to analyze pathways across multiple organisms, we addressed the open question of how habitat and taxonomy explain pathway differences. We performed an integrated analysis of metabolic repertoires, functions, lifestyles, and taxonomy, using an established collection of 321 (245 after filtering, see Methods) GSMs for bacteria13. The models cover a broad taxonomic and lifestyle (habitat and physiological) diversity (Fig. S4a) and represent the high diversity of metabolic repertoires in bacteria (Fig. S4b). However, more widespread use of a metabolic reaction across species is not associated with more similar function in the network context as characterized by sensitivity correlations (Fig. S4c), confirming orthogonality of our measure.
We quantify functional similarity across multiple bacteria using normalized biases (z-scores, that is, standard deviations from the mean) to aid interpretation and because of heteroskedasticity of sensitivity correlations per reaction (Fig. S4d, e and Methods). Metabolic reactions and subsystems with significant biases respond to perturbations differently than expected from an average (‘standard’) reaction or subsystem. We classify those with significant positive (negative) normalized bias as ‘conserved’ (‘variable’). This allows a fine-grained analysis of functional conservation across the 245 bacteria, as shown in Fig. 4a for KEGG28 annotations of subsystems and Pearson correlations. Importantly, this classification is stable over evolutionary distances between species (Fig. S5a–c). It is also robust to alternatives for averaging over reactions (Fig. S5d, e), to significance tests used (Fig. S5f, g), and within SEED29 subsystem annotations (Fig. S5h, i and Methods).
In this analysis, perturbations in conserved subsystems influence the operation of the entire metabolic networks of different species more similarly than expected. Conversely, we anticipate a conserved subsystem to have limited potential for evolutionary adaptation of its function, for example, because the function is essential for the entire cell, or network constraints enforce a specific function of the subsystem. As one would expect, sensitivity biases identify biomass formation and nucleotide metabolism as conserved (Fig. 4a and S6a, b). Conserved nucleotide metabolism is also predicted by Jaccard indices to assess metabolic repertoire similarity (Fig. S6c) and by comparative genomics30.
However, conserved function does not necessarily require a conserved metabolic repertoire. For exchanges with the environment (so-called pseudoexchanges), sensitivities indicate conservation (Fig. 4a), whereas the metabolic repertoire is variable (Fig. S6c). This suggests that, while bacteria have varying exchange repertoires, an exchange’s metabolic function is similar in the network context across bacteria. We find the inverse for cofactor and lipid metabolism: despite their conserved metabolic repertoires (Fig. S6c), they are functionally variable (Fig. 4a). Cofactor metabolism is enriched for essential genes31, but, for example, bacteria use diverse mechanisms for redox balancing32 and they can evolve new enzyme functions to achieve balance33. Functionally variable lipid metabolism may underlie the diversity of bacterial lipids34, especially when considered together with glycan metabolism for synthesis of the outer membrane.
Such distinctions between repertoire and function become more tangible considering that finer details appear, for example, in amino acid metabolism (Figs. 4a and S6c). We find conservation for histidine, as one would expect from one universal known biosynthesis pathway35, but functional conservation vs variable repertoire for arginine and proline. There, alternative biosynthesis pathways exist across species, but the functional requirement of providing the amino acids is conserved. We predict especially proline biosynthesis to be functionally conserved (Fig. S8a), consistent with its evolutionary conservation and proline’s important role in redox and stress protection36.
These results on functional conservation agree with previous analyses37 and copula correlations yield qualitatively very similar functional results (Fig. S7). However, as in prior GSM construction38 and analysis39, our analysis depends on manually curated subsystem classifications. Using the SEED classification29, only some functionally conserved (e.g., biomass, proline) and variable (e.g., cell wall, cofactors) subsystems agree (Fig. S8a), due to limited mapping between KEGG and SEED subsystems (Fig. S8b). For example, having few reactions in SEED nucleotide metabolism prevents reaching significance. This limits biological interpretation, but overall multiple lines of evidence support that sensitivity correlations consistently introduce an orthogonal functional dimension to comparative network analysis.
Drivers of metabolic diversity
To address how taxonomy and environment influence diversity of metabolic functions across bacterial species, we determined functional conservation or variability induced by single features (e.g., marine habitat). Note that these feature annotations based on widely used databases (see Methods) are incomplete and subject to uncertainties that may affect the analysis. Because taxonomy and environment almost always jointly contribute to predicted functional conservation, we used joint estimation by linear regression to untangle their influences (Methods and Supplementary Data 3). Distances between regression coefficients allowed us to cluster species, for example, by their phyla according to NCBI taxonomy (Fig. 4b). Spirochaetes and Tenericutes are outgroups in the resulting tree, presumably because they are under-represented (three GSMs each) and biased (only host-associated species; Fig. S4a). This tree also reveals aspects of phylogeny, such as proteobacteria not being monophyletic40 (Fig. S9a). All trees together show significant impact of both taxonomy and environment on metabolic function, consistent with metabolic diversity within bacterial phyla. With a phylogenetically consistent taxonomy41,42 (Fig. S9b, c) and other analysis alternatives (Fig. S10), we predict more pronounced taxonomic influences and higher functional similarity within taxa, but environmental factors remain significant for metabolic diversity. When subsampling sensitivity correlations to approximate the effect of inaccuracies in GSMs, we observed only small variations in taxonomic clustering (Fig. S11); the categories discussed next remained stable.
Environmental contributions to metabolic function cluster according to previously defined categories of environmental variability43, namely host-associated vs. fresh water and marine vs. soil and other (Fig. 4b). However, the estimated regression coefficients for the environments show that environmental variability does not dictate functional variability. We infer metabolic variability for host association, low-variability environments, and conservation for aquatic environments of intermediate environmental variability. In other words, bacteria in watery (host) environments tend to be more functionally similar (dissimilar) than their taxonomic peers. In particular, marine bacteria show a conservative bias, consistent with metabolic niches reducing metabolic variability across taxa7. Host environments with comparatively abundant resources and host-specific interactions may afford or even require more metabolic variability of bacteria (e.g., via auxotrophies).
Predicted metabolic functions for bacteria living at low-to-average temperature are conserved, but variable for thermophiles. Functional variability of thermophiles is consistent with previous analyses that showed metabolic networks of thermophiles to be less modular than those of other bacteria44,45, considering that reduced modularity implies reduced module function45. For oxygen requirements, we capture diversity of anaerobic metabolism, and facultative anaerobes separate as generalists. As a negative control, Gram status has no significant influence, as it is part of taxon definitions.
Variations of (metabolic) phenotypes according to evolutionary history (represented by taxonomy) and environment are largely unknown and corresponding experimental studies are rare4,5. Our approach predicts the experimentally determined impact of taxonomy (high variability in proteobacterial classes and Actinobacteria) vs. environment (low variability in soil) on metabolic phenotypes in soil ecosystems4 (Fig. 4b). It also suggests more details, namely that β- and γ-proteobacteria are more functionally similar to their peers than α-proteobacteria. As another example, among the dominant classes of functionally conserved psychrophiles (Fig. S4a), we expect higher similarity within γ-proteobacteria than within firmicutes (Fig. 4b). Novel hypotheses such as these are testable in experiments similar to ref. 4.
Diversity of metabolic subsystems
Estimates of normalized sensitivity biases adjusted for taxonomic and environmental influences (Fig. 4b) overall agree well with those inferred directly from sensitivity data (Fig. 4a), e.g., regarding variable cofactor and vitamin metabolism. They provide additional evidence for conservation of many metabolic subsystems, but also unexpected findings such as higher conservation of pyrimidine vs purine metabolism. Close connections to amino acid metabolism and cofactor synthesis, respectively, could explain this difference46.
For habitat and taxonomy influences on subsystems, 755 out of 2856 tested hypotheses were significant (Fig. 4b). These significant hypotheses remained stable under subsampling of sensitivity correlations to a large extent (Fig. S12); this holds for all hypotheses discussed in the following except for the taurine subsystem, for which the low number of data points prevented regressions with subsamples. For example, the gut microbiome influences host amino acid and glutathione metabolism in mice47; we predict these subsystems as significant for human-associated habitats (Fig. 4b). The most surprising predicted signatures of human-hosted bacteria are: (i) generally strong associations (large absolute biases), indicating high adaptation; (ii) strong conservation of phenylalanine, biotin, and glutathione metabolism, suggesting that adaptation to humans requires peculiar functions of these subsystems; and (iii) variability in glycosphingolipid synthesis exclusive to these bacteria, which is intriguing given the underexplored role of bacterial sphingolipids in human immunomodulation and metabolic disorders48.
To take marine habitats as another example, we predict conserved siderophore metabolism (Fig. 4b). To scavenge scarce iron efficiently in this environment, the release of iron-chelating siderophores appears essential for bacteria49, explaining functional conservation. The regressions also show higher variability of taurine metabolism (which is conserved on average) in marine environments. This could reflect both the importance of taurine as C- and N-source in general, and the depth-dependence of the availability of taurine and alternative nutrient sources50.
Finally, we suggest that functional variability facilitates the potentially costly evolution of metabolic cooperation between species. Intriguingly, experimental evolution of mutualism between the γ-proteobacteria Salmonella enterica and E. coli involved methionine and galactose51; those subsystems of γ-proteobacteria are variable among mostly conserved amino acid and carbohydrate metabolism (Fig. 4b). Cooperation via siderophores in marine bacteria, however, is a counter-example: it depends on physico-chemical characteristics of the environment that cannot be captured in GSMs49. Our hypotheses could primarily help define relevant metabolic phenotypes for experimental studies of individual microbial species as well as consortia.
Discussion
We propose sensitivity correlations as a measure to quantify effects of perturbations in metabolic networks to link metabolic repertoires, functions, and their relations to evolutionary history and environment. In contrast to prior work, we do not need to define15 or randomly sample14 environments, and we cover exchange fluxes as well as internal fluxes. Combined with our approach’s focus on the network context, it thereby enables consistent predictions that were previously inaccessible, for example, on functional conservation of metabolic subsystems. However, uncertainties in GSMs (which could be reduced by probabilistic approaches to network reconstruction52), ambiguities in widely used subsystem annotations, and potential biases in the collections of analyzed models limit the biological accuracy of these predictions. For example, detailed reaction-level comparisons of B. subtilis and E. coli required manual checking. Hypotheses generated by our integrated analysis of bacterial metabolic diversity therefore require empirical validation, and they can simultaneously guide corresponding studies.
We envisage different levels of empirical validation. First, recent advances that increase accuracy and throughput of 13C metabolic flux analysis53 can enable systematic testing of sensitivity-based predictions for individual enzymes, provided targeted (and sufficiently small, e.g., by drug dosing) perturbations can be introduced and a sufficient number of common reaction fluxes can be resolved. For example, one could investigate iso-enzymes predicted to have a most dissimilar effect on network operation in humans and (pathogenic) yeasts as candidates for novel antibiotics. Second, more indirect experiments could be designed that use targeted (e.g., via CRISPR/Cas) or untargeted (e.g., transposon mutagenesis) mutations and indirect readouts such as growth on different nutrient sources for different bacteria to test predictions on subsystem functional conservation with largest effect size. Finally, studies of bacterial ecology in different natural habitats4 could be designed on our corresponding predictions on drivers of metabolic diversity, for example, focusing on the pronounced predicted differences between β-/γ- and α-/ε-proteobacteria.
We consider the concept of sensitivity correlations as the main contribution of this work, and the range of applications presented as proofs-of-principle. Increased species diversity and thereby statistical power could increase taxonomic depth and functional specificity, for example, regarding ‘accessory’ genomes that enable intra-species metabolic exchanges in microbiomes54. Incorporating quantitative characteristics of environments could lead to finer ecological resolution55. By allowing such refinements directly, our framework will be instrumental for detailed and systematic studies of relations between metabolic repertoires, phenotypes, evolution, and environment.
Methods
Metabolic models and databases
GSMs of yeast (iMM904), human (Recon1), Bacillus subtilis (iYO844), and Escherichia coli (iJO1366) were retrieved from the BIGG database56. We added missing reactions to iYO844, namely a nitric oxide synthase reaction and exchange reactions for nitric oxide. 16 GSMs (Supplementary Data 1) were retrieved from Metanetx27. Orthologous pairs of human and yeast genes were from the OMA database57. We retrieved 321 automatically reconstructed bacterial GSMs (SEED models) as well as pairwise genetic distances between organisms from ref. 13. We obtained the reference bacterial phylogenetic tree from ref. 40.
Curation of SEED models
For every SEED model, we checked that the biomass reaction carries a strictly positive flux when exchange reactions allow uptake of every possible exchanged metabolite. Under this condition, whenever possible, we removed the structurally blocked reactions (reactions that cannot carry any flux) using flux variability analysis (if the minimum and maximum fluxes are both equal to zero, then the reaction is considered blocked)58. As a sanity check, we verified that the biomass production before and after the removal of the blocked reactions was identical. When this could not be verified, the original model instead of the reduced model was used (24 models out of 321). The cases where the model reduction did not work are indicated in Supplementary Data 1. Finally, we restricted the model set by two criteria: (i) to include only one representative per species, and (ii) to require a minimum of two species per taxon. The final set then comprised 245 GSMs for the NCBI taxonomy and 242 GSMs for the phylogenetically consistent GTDB taxonomy, as detailed in Supplementary Data 1.
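Two of the curation rules above lend themselves to a short sketch: flagging blocked reactions from flux variability ranges, and restricting the model set by the two stated criteria. The sketch assumes that the flux ranges have already been computed elsewhere; the numerical tolerance, the record layout, and the choice to keep the first model encountered per species are hypothetical and not taken from the original curation.

```python
from collections import defaultdict

def blocked_reactions(fva_ranges, tol=1e-9):
    """Reactions whose minimum and maximum fluxes are both zero.

    fva_ranges: dict reaction id -> (min_flux, max_flux).
    tol is a hypothetical numerical tolerance for "equal to zero".
    """
    return {r for r, (lo, hi) in fva_ranges.items()
            if abs(lo) <= tol and abs(hi) <= tol}

def select_models(models):
    """Apply criteria (i) and (ii) to a list of model records.

    models: list of dicts with keys 'id', 'species', 'taxon'.
    """
    # (i) one representative per species: keep the first model encountered.
    by_species = {}
    for m in models:
        by_species.setdefault(m["species"], m)
    kept = list(by_species.values())
    # (ii) keep only taxa that are represented by at least two species.
    species_per_taxon = defaultdict(set)
    for m in kept:
        species_per_taxon[m["taxon"]].add(m["species"])
    return [m for m in kept if len(species_per_taxon[m["taxon"]]) >= 2]

# Hypothetical toy inputs.
ranges = {"r1": (0.0, 10.0), "r2": (0.0, 0.0), "r3": (-5.0, 0.0)}
records = [
    {"id": "m1", "species": "A a", "taxon": "T1"},
    {"id": "m2", "species": "A a", "taxon": "T1"},
    {"id": "m3", "species": "A b", "taxon": "T1"},
    {"id": "m4", "species": "C c", "taxon": "T2"},
]
blocked = blocked_reactions(ranges)
selected_ids = [m["id"] for m in select_models(records)]
```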
Annotations for SEED models
We augmented SEED models by taxonomic annotations of bacterial species using the NCBI taxonomy59 as well as the GTDB taxonomy42. Mapping between models and the corresponding databases was performed by species names, including by species synonyms retrieved from the MACADAM database60. For habitat and physiology annotations, we used fusionDB61. For species with multiple habitat assignments, we automatically identified the main categories (‘fresh water’, ‘marine’, ‘host’ and ‘soil’), and subsumed entries that could not be categorized or of low frequency for SEED models under ‘other’. With available evidence (e.g., specific annotations for human gut), we classified ‘host’ as ‘human’ more specifically (see Supplementary Data 1). If not already available from fusionDB, we added annotations on Gram status from the Microbe Directory62,63. To annotate reactions with subsystems, we first updated the GSMs with current ModelSEED64 annotations (v2.6.1). To obtain corresponding KEGG28 subsystem annotations, we relied on reaction aliases provided by ModelSEED (Supplementary Data 2).
Graph distances
To compute graph distances between pairs of reactions in one GSM, we transformed the GSM to an adjacency reaction graph and applied Dijkstra’s algorithm.
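A minimal sketch of this step is given here, under two assumptions that the text does not spell out: two reactions are adjacent when they share at least one metabolite, and every edge has unit weight. The function names and the toy pathway are hypothetical.

```python
import heapq
from collections import defaultdict

def reaction_graph(reactions):
    """Build a reaction adjacency graph.

    reactions: dict reaction id -> set of metabolite ids it involves.
    Two reactions are adjacent if they share a metabolite (assumption).
    """
    by_metabolite = defaultdict(set)
    for rxn, mets in reactions.items():
        for met in mets:
            by_metabolite[met].add(rxn)
    graph = {rxn: set() for rxn in reactions}
    for rxns in by_metabolite.values():
        for rxn in rxns:
            graph[rxn] |= rxns - {rxn}
    return graph

def dijkstra(graph, source):
    """Shortest graph distances from source with unit edge weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor in graph[node]:
            nd = d + 1
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

# Toy linear pathway: r1 and r2 share A, r2 and r3 share B, r3 and r4 share C.
toy = {"r1": {"A"}, "r2": {"A", "B"}, "r3": {"B", "C"}, "r4": {"C"}}
distances = dijkstra(reaction_graph(toy), "r1")
```

With unit weights a breadth-first search would give the same distances; Dijkstra's algorithm is kept here because it is the method named in the text and generalizes to weighted edges.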
Mathematical notation
We denote the set of GSMs by \({ M}\), and correspondingly the number of GSMs by \(|{ M}|\). Denote by \(l,m\in { M}\) two models from this set. Denote by \({ R}\) a set of reactions, by \(k\in { R}\) a reaction in this set, and by \({ { R} }_{l}\) the set of reactions in model \(l\). Assume that we can identify which two reactions are identical (have identical biochemical formula) in two reaction sets. Then, the set of common reactions of the model pair \((l,m)\in {M} \times {M}\) is \({R}={ {R} }_{l}\cap { {R} }_{m}\). The notation extends to arbitrary subsets of reactions. In particular, we define a metabolic subsystem \(j\) as a subset of reactions in any model, \({{{S}}}_{j}\subseteq \mathop{\cup }\limits_{l\in {M} }\,{ {R} }_{l}\).
We denote subsets of models with respect to the presence of a particular reaction \(k\) as \({ {M} }_{k}:=\{l\in {M} \,|\,k\in { {R} }_{l}\}\). This directly leads to the definition of a reaction’s (relative) usage across models as the fraction of models that contain the reaction:
\(\varrho (k):=\frac{|{ {M} }_{k}|}{|{M}|}\)
Correspondingly, we define all pairs of models as \({{P}}:=\{(l,m)\in {M} \times {M},l\ne m\}\) and the subset of model pairs with reaction \(k\), \({{{P}}}_{k}\subseteq {{P}}\), as \({{{P}}}_{k}:=\{(l,m)\in {M} \times {M},l\ne m\,|\,k\in { {R} }_{l}\cap { {R} }_{m}\}\).
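The set definitions above translate directly into code. The sketch below assumes that reaction usage is the fraction of models containing a reaction and represents each model by its set of reaction identifiers; the model and reaction names are made up for illustration.

```python
from itertools import permutations

# Hypothetical models, each given by its set of reaction identifiers.
models = {
    "m1": {"r1", "r2", "r3"},
    "m2": {"r2", "r3"},
    "m3": {"r3", "r4"},
}

def usage(k, models):
    """Fraction of models that contain reaction k (assumed definition of usage)."""
    return sum(k in rxns for rxns in models.values()) / len(models)

def pairs_with_reaction(k, models):
    """Ordered model pairs (l, m), l != m, in which both models contain reaction k."""
    return [
        (l, m)
        for l, m in permutations(models, 2)
        if k in models[l] and k in models[m]
    ]

usage_r2 = usage("r2", models)
pairs_r2 = pairs_with_reaction("r2", models)
```

Ordered pairs are used here because the set of all model pairs is defined over \({M} \times {M}\) with \(l\ne m\).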
Structural sensitivity analysis
Structural sensitivity analysis17 quantifies how the perturbation of a reaction flux affects all fluxes in a GSM, assuming minimal total flux adjustment as in the minimization of metabolic adjustment (MOMA) method18. Here, we computed absolute structural sensitivities that neither require the definition of a reference flux distribution (as in MOMA18), nor the definition of an environment through constraints on exchange fluxes (as in flux variability analysis, FVA65). Hence, absolute structural sensitivities characterize network responses independent of a cell’s metabolic state or environment.
Specifically, we first characterized each reaction in each GSM using structural sensitivity analysis17. For model \(l\), we assume that the flux through reaction \(k\in { { R} }_{l}\) is perturbed with a disturbance \({\delta }_{k}\). The minimal adjustments of fluxes required to return to steady-state, \({{\bf{d}}}_{l}\), are obtained by singular value decomposition, which solves the minimization problem:
\(\mathop{\min }\limits_{{{\bf{d}}}_{l}}\,{\Vert {{\bf{d}}}_{l}\Vert }_{2}\quad {\rm{s.t.}}\quad {{\bf{N}}}_{l}\cdot {{\bf{d}}}_{l}={\bf{0}},\quad {d}_{l,k}={\delta }_{k},\)
where \({{{{{{\bf{N}}}}}}}_{l}\) is the stoichiometric matrix of the metabolic network and \({d}_{l,k}\) the element of vector \({{{{{{\bf{d}}}}}}}_{l}\) corresponding to the perturbed reaction \(k\).
The sensitivity represents the effect of a perturbation on any reaction relative to the strength of the perturbation. Correspondingly, we define the vector of sensitivities of each reaction of model \(l\) with respect to a perturbation of reaction \(k\) as:
\({\bf{s}}(k,l):=\frac{{{\bf{d}}}_{l}}{{\delta }_{k}}\)
This vector has elements \(s(i,k,l)\) for \(i\in { { R} }_{l}\). Here, we used \({\delta }_{k}=1\).
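A minimal NumPy sketch of this computation is given below. It assumes that the problem is to minimize the Euclidean norm of the adjustment vector subject to the steady-state condition \({\bf{N}}\cdot {\bf{d}}={\bf{0}}\) and a fixed perturbation of reaction \(k\); under this assumption the solution follows in closed form from an orthonormal null-space basis obtained by singular value decomposition. The function name, tolerance, and toy networks are hypothetical.

```python
import numpy as np

def structural_sensitivities(N, k, delta=1.0, tol=1e-10):
    """Sensitivities of all fluxes to a perturbation of reaction k.

    Assumed problem: minimize ||d||_2 subject to N @ d = 0 and d[k] = delta.
    """
    N = np.asarray(N, dtype=float)
    _, sing, Vt = np.linalg.svd(N)
    rank = int(np.sum(sing > tol))
    K = Vt[rank:].T                 # orthonormal basis of the null space of N
    row = K[k]                      # coordinates of reaction k in that basis
    norm2 = float(row @ row)
    if norm2 < tol:
        raise ValueError("reaction k is blocked: it carries no steady-state flux")
    d = delta * (K @ row) / norm2   # minimum-norm adjustment with d[k] = delta
    return d / delta                # sensitivities relative to the perturbation

# Toy networks (made up). Branch point: r0 produces A; r1 and r2 consume A.
s_branch = structural_sensitivities([[1, -1, -1]], k=0)
# Linear chain: r0 -> A -> r1 -> B -> r2.
s_chain = structural_sensitivities([[1, -1, 0], [0, 1, -1]], k=0)
```

In the branch-point example the perturbation is split evenly between the two consuming reactions, whereas in the linear chain it propagates unchanged. The explicit error for blocked reactions reflects why such reactions need to be removed before the analysis.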
To make sensitivity computations efficient, GSMs were pre-processed by removing blocked reactions. For reaction pairs in pairs of GSMs, we characterized functional similarity (distance) by correlations (lack of correlations) of absolute structural sensitivities over all common reactions after perturbing a single reaction. For gene-based analysis, we computed sensitivities by simultaneously perturbing all reactions associated with a gene (via the GSM’s gene-reaction associations) with the same perturbation magnitude.
Sensitivity-based correlations and distances
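To define the similarity of responses of two models \(l,m\in { M}\) to perturbations in a common reaction \(k\in { { R} }_{l}\cap { { R} }_{m}\), we use a correlation function \(\rho (\cdot,\cdot )\) applied to the sensitivities of the common reactions \({R}={ {R} }_{l}\cap { {R} }_{m}\):
\(S(l,m,k):=\rho \left({(s(i,k,l))}_{i\in {R}},\,{(s(i,k,m))}_{i\in {R}}\right),\)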
where we sort the two vectors in the same order of reactions. The sensitivity distance between the two models with respect to perturbation of a single reaction is then defined as:
We extend these concepts from a single reaction \(k\) to an arbitrary set of shared reactions \({ R} \subseteq { { R} }_{l}\cap { { R} }_{m}\) for sensitivity correlations by averaging over the set:
\(S(l,m,{R}):=\frac{1}{|{R}|}\sum_{k\in {R}}S(l,m,k)\)
This is a natural extension because \(S(l,m,\{k\})=S(l,m,k)\).
The average model similarity of two GSMs is then defined as the average sensitivity correlation over all common reactions:
\({S}_{M}(l,m):=S(l,m,{ {R} }_{l}\cap { {R} }_{m})\)
This directly yields an average dissimilarity for pairwise GSM combinations that is interpretable:
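To make these quantities concrete, the sketch below computes a sensitivity correlation for one perturbed reaction and its average over a set of reactions. It assumes the Pearson coefficient as the correlation function and a plain mean as the extension to reaction sets; the nested-dictionary layout and all numbers are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def sensitivity_correlation(sens_l, sens_m, k):
    """Correlate the responses of two models to a perturbation of reaction k.

    sens_x[k][i] holds the sensitivity of reaction i to reaction k in model x.
    Only reactions present in both models are compared, in the same order.
    """
    common = sorted(set(sens_l[k]) & set(sens_m[k]))
    x = [sens_l[k][i] for i in common]
    y = [sens_m[k][i] for i in common]
    return pearson(x, y)

def average_similarity(sens_l, sens_m, reactions):
    """Mean sensitivity correlation over a set of shared reactions (assumption)."""
    values = [sensitivity_correlation(sens_l, sens_m, k) for k in reactions]
    return sum(values) / len(values)

# Made-up sensitivities for two models sharing reactions r1, r2 and r3.
sens_l = {
    "r1": {"r1": 1.0, "r2": 0.5, "r3": 0.5, "r5": 0.2},
    "r2": {"r1": 0.5, "r2": 1.0, "r3": 0.0, "r5": 0.3},
}
sens_m = {
    "r1": {"r1": 1.0, "r2": 0.5, "r3": 0.5, "r4": 0.1},
    "r2": {"r1": 0.0, "r2": 1.0, "r3": 0.5, "r4": 0.2},
}
s_r1 = sensitivity_correlation(sens_l, sens_m, "r1")
s_r2 = sensitivity_correlation(sens_l, sens_m, "r2")
s_avg = average_similarity(sens_l, sens_m, ["r1", "r2"])
```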
Measures based on reaction sets
The Jaccard index quantifies the similarity of two sets, and we use it to compare the reaction contents of two GSMs \(l\) and \(m\):
\({J}_{M}(l,m):=\frac{|{ {R} }_{l}\cap { {R} }_{m}|}{|{ {R} }_{l}\cup { {R} }_{m}|}\)
To expand this concept to quantify the context similarity of a reaction \(k\) in two models, we define a reaction neighborhood via graph distances. Specifically, with \({D}_{G}(k,i)\) a function yielding the graph distance between reactions \(k\) and \(i\), and \(\delta\) a distance threshold, the neighborhood \(N(l,k)\subseteq { { R} }_{l}\) of reaction \(k\) in model \(l\) is:
The Jaccard index for reactions then becomes:
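The set-based measures of this section can be sketched in a few lines of Python. The example assumes that a neighborhood contains all reactions whose graph distance from the reaction of interest does not exceed the threshold, and that the reaction-level index is the Jaccard index of the two neighborhoods; the function names and the toy distances are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    # Convention chosen here: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0

def neighborhood(dist_from_k, threshold):
    """Reactions within a graph distance of at most `threshold` (assumption)."""
    return {i for i, d in dist_from_k.items() if d <= threshold}

# Reaction contents of two made-up models.
j_models = jaccard({"r1", "r2", "r3"}, {"r2", "r3", "r4"})

# Made-up graph distances from reaction r1 in each of the two models.
nb_l = neighborhood({"r1": 0, "r2": 1, "r3": 2}, threshold=1)
nb_m = neighborhood({"r1": 0, "r2": 1, "r4": 1}, threshold=1)
j_reaction = jaccard(nb_l, nb_m)
```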
Normalized similarities
To account for potential biases due to different sets of common reactions in GSM pairs, we compute normalized similarities as
\(\tilde{S}(l,m,k):=\frac{S(l,m,k)-\left({\hat{b}}_{0}+{\hat{b}}_{1}\,\varrho (k)\right)}{\sigma }\)
Here, \({\hat{b}}_{0}\) and \({\hat{b}}_{1}\) are the estimated intercept and slope of a linear regression of the average reaction similarity
\({S}_{R}(k):=\frac{1}{|{{{P}}}_{k}|}\sum_{(l,m)\in {{{P}}}_{k}}S(l,m,k)\)
as a function of reaction usage \(\varrho (k)\) jointly for all \(k\), and \(\sigma\) is the root mean squared error of the linear regression. Because reaction usages and sensitivity correlations are not significantly correlated in our data (Fig. S4c), these definitions amount to computing z-scores.
We expand this normalization to more general similarities by replacing \(S\) with \(\tilde{S}\) appropriately, leading to average normalized similarities for pairs of models, \({\tilde{S}}_{M}(l,m)\), and average normalized reaction similarities, \({\tilde{S}}_{R}(k)\). For conciseness, we term the latter ‘normalized bias’.
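The normalization can be illustrated with a small regression example. The sketch assumes that a similarity is normalized by subtracting the value predicted from reaction usage and dividing by the root mean squared error, and that the error is computed with the number of observations in the denominator; both choices, the function names, and the data are assumptions made for illustration.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares fit y = b0 + b1 * x; returns (b0, b1, rmse)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Assumption: RMSE with n in the denominator (no degrees-of-freedom correction).
    rmse = sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / n)
    return b0, b1, rmse

def normalize(similarity, usage_k, b0, b1, sigma):
    """Deviation from the usage-based expectation in units of the RMSE."""
    return (similarity - (b0 + b1 * usage_k)) / sigma

# Made-up reaction usages and average reaction similarities.
usages = [0.2, 0.4, 0.6, 0.8]
avg_similarities = [0.5, 0.7, 0.5, 0.7]
b0, b1, sigma = fit_line(usages, avg_similarities)
z = normalize(0.9, 0.5, b0, b1, sigma)
```

When the fitted slope is close to zero, the normalized value reduces to an ordinary z-score around the mean similarity.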
Subsystem analysis
To study variability of metabolic subsystems across all models, we partition the set of unique reactions into disjoint subsets \({{{S}}}_{j}\) according to subsystem classification and include an additional subset for reactions with unassigned subsystem.
For sensitivity-based analysis, the average normalized bias \({\tilde{S}}_{S}(j)\) of a subsystem \(j\) is defined as:
The classification of subsystems into categories relied on three alternative approaches:
(i) Subsystem mean by reaction: Distribution of \({\tilde{S}}_{R}(k)\) for all \(k\in {{{S}}}_{j}\).
(ii) Subsystem mean, alignments: Distribution of \({n}_{s}=100\) samples from \(\tilde{S}(l,m,k)\) for any \((l,m)\in {{{P}}}_{k}\) and any \(k\in {{{S}}}_{j}\).
(iii) Subsystem mean, bootstrap: Distribution of \({n}_{r}=100\) estimated averages of \({n}_{s}=100\) samples from \(\tilde{S}(l,m,k)\) for any \((l,m)\in {{{P}}}_{k}\) and any \(k\in {{{S}}}_{j}\).
For analyses based on reaction sets, we define a corresponding measure to compare subsystems \({{{S}}}_{j}\) as:
For a subsystem classification that is consistent with the sensitivity-based approach, we compute normalized Jaccard indices \(\tilde{{J}_{S}}(l,m,{{{S}}}_{j})\) by subtracting the average of \({J}_{M}(l,m)\) over \({{P}}\) and dividing by the corresponding standard deviation. We evaluate the distribution of \({n}_{s}=100\) samples from \(\tilde{{J}_{S}}(l,m,{{{S}}}_{j})\).
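The sampling schemes used for subsystem classification can be sketched as follows. The example assumes sampling with replacement from the pooled normalized similarities of one subsystem and uses a fixed seed for reproducibility; the function names and input values are made up.

```python
import random
from statistics import mean

def sample_values(values, n_s=100, rng=None):
    """Draw n_s values with replacement (assumption) from a pool of similarities."""
    rng = rng or random.Random()
    return rng.choices(values, k=n_s)

def bootstrap_means(values, n_s=100, n_r=100, seed=0):
    """Distribution of n_r averages, each over n_s resampled values."""
    rng = random.Random(seed)
    return [mean(sample_values(values, n_s, rng)) for _ in range(n_r)]

# Made-up normalized similarities pooled over one subsystem.
normalized = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
sampled = sample_values(normalized, rng=random.Random(1))   # direct sampling
averaged = bootstrap_means(normalized)                       # bootstrap of means
```

The distribution of bootstrap means is narrower than the distribution of individual samples, which makes the two approaches differently sensitive to outlying reactions.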
Enzyme similarity
To validate sensitivity-based predictions with independent measures, we relied on Enzyme Commission (EC) numbers. They provide a hierarchical numerical classification for enzyme functions composed of four levels (numbers) that represent a progressively finer classification. To compare two enzymes, we define an EC similarity level according to the maximal level up to which their EC numbers coincide. This leads to five levels of similarity between zero (no similarity) and four (completely identical EC numbers). When several EC numbers were mapped to the same enzyme, we used the maximal similarity level of all possible pairwise comparisons. We proceed identically when several EC numbers were mapped to one reaction through its catalyzing enzymes.
To globally characterize the aligned reactions in two GSMs \(l\) and \(m\) via their EC number similarity, we define an EC score akin to the Kullback-Leibler divergence:
where \({a}_{i}\) is the fraction of aligned reactions \({ R} \subseteq { { R} }_{l}\cap { { R} }_{m}\) with an EC number similarity level \(i\), and \({n}_{i}\) is the fraction of reaction pairs with EC number similarity \(i\) in the null model (all possible reaction pairs between two models).
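The EC-based comparison can be illustrated with a short Python sketch. The similarity level follows the description above; the score is written here as \(\sum_{i}{a}_{i}\,\mathrm{ln}({a}_{i}/{n}_{i})\), which is an assumed Kullback-Leibler-style form rather than a confirmed formula. Treating a dash in an incomplete EC number as a non-match is likewise an assumption.

```python
from math import log

def ec_similarity(ec_a, ec_b):
    """Number of leading EC levels (0 to 4) on which two EC numbers agree."""
    level = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":   # assumption: unspecified levels never match
            break
        level += 1
    return level

def max_ec_similarity(ecs_a, ecs_b):
    """Maximal similarity over all pairwise comparisons of two EC number lists."""
    return max(ec_similarity(a, b) for a in ecs_a for b in ecs_b)

def ec_score(aligned, null):
    """Assumed Kullback-Leibler-style score between two level distributions.

    aligned[i] and null[i] are the fractions of reaction pairs with EC
    similarity level i among aligned pairs and in the null model.
    Null fractions must be positive wherever aligned fractions are positive.
    """
    return sum(a * log(a / n) for a, n in zip(aligned, null) if a > 0)

level_close = ec_similarity("1.1.1.1", "1.1.1.2")
level_far = ec_similarity("1.1.1.1", "2.7.1.1")
level_multi = max_ec_similarity(["1.1.1.1", "2.7.1.1"], ["2.7.1.2"])
score = ec_score([0.5, 0.5], [0.25, 0.75])
```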
Alignment
Sensitivity distances between all possible pairs of reactions characterize all possible mappings between reactions in two GSMs. The assignment problem corresponds to identifying the best reaction mapping using solely sensitivity distances, with blinded reaction identities. We solved it using the Jonker-Volgenant algorithm66, which selects the set of pairs of reactions with total minimum sensitivity distance. Compared to ref. 25, our method is faster (30 min versus 48 h per alignment) because it requires only one optimization per reaction. It returns a set of mapped reactions with their individual sensitivity distances, and a set of unmapped reactions in the larger GSM. For validations, we aligned the yeast GSM with a copy of itself, after randomizing the order of reactions in the copy and varying the number of common reactions. We measured performance by the number of correctly aligned reactions.
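The validation described above can be mimicked on toy data. The sketch uses `scipy.optimize.linear_sum_assignment`, which SciPy documents as a modified Jonker-Volgenant algorithm, as a stand-in for the solver used in the study. Random profiles and Euclidean distances replace real sensitivity distances here, so the example only illustrates the assignment step; all names and numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_reactions(distance):
    """Return (row, column) index pairs that minimize the total distance."""
    rows, cols = linear_sum_assignment(distance)
    return list(zip(rows.tolist(), cols.tolist()))

rng = np.random.default_rng(0)
profiles = rng.normal(size=(6, 4))   # hypothetical per-reaction profiles
shuffle = rng.permutation(6)
shuffled = profiles[shuffle]         # copy of the model with reordered reactions

# Pairwise Euclidean distances stand in for sensitivity distances.
distance = np.linalg.norm(profiles[:, None, :] - shuffled[None, :, :], axis=2)

pairs = align_reactions(distance)
# A pair (i, j) is correct if column j of the copy is original reaction i.
n_correct = sum(int(shuffle[j] == i) for i, j in pairs)
```

For rectangular distance matrices, the same call leaves the surplus reactions of the larger model unmatched.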
Tree construction
We characterized the pairwise distance between GSMs by sensitivity-based average dissimilarity as well as Jaccard index distances. Trees for analyses of habitat and taxonomy used estimated regression coefficients (see below). We applied the Unweighted Pair-Group Method with Arithmetic mean (UPGMA) for all tree constructions. The phylogenetic tree based on divergence times was retrieved from TimeTree67.
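As a small illustration of the tree construction, the sketch below applies average linkage, which SciPy documents as the UPGMA algorithm, to a made-up distance matrix for four models. The use of SciPy is an assumption for illustration and does not reflect the original implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Made-up symmetric distance matrix for four models.
D = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.0, 5.0],
    [4.0, 4.0, 0.0, 2.0],
    [5.0, 5.0, 2.0, 0.0],
])

# linkage expects the condensed (upper-triangular) form of the matrix.
Z = linkage(squareform(D), method="average")
```

Each row of `Z` records one merge: the two cluster indices, the distance at which they were joined, and the size of the new cluster. Here, models 0 and 1 merge first, then models 2 and 3, and the two clusters join at the mean of their four between-cluster distances.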
Taxonomy, habitat, and physiology analysis
We analyze metabolic flux phenotypes with respect to five classes of features (habitat, temperature preference, Gram status, oxygen preference, and phylum). The features in each class can be mutually exclusive (e.g., Gram-positive or Gram-negative status), or not (e.g., a microbe may have more than one preferential habitat). Importantly, the feature classes are not independent (e.g., taxonomy definitions are often based on the bacteria’s Gram status). Therefore, all features have to be accounted for simultaneously in the analysis.
We encode features in row feature vectors \({\bf{F}}\), where \({{\bf{F}}}_{l}\) is the feature vector for model \(l\). The value of element \(p\) of \({{\bf{F}}}_{l}\), \({F}_{l,p}\), denotes whether an organism has feature \(p\) (1) or not (0), or whether the feature is undefined (−1). We restrict the analysis to those pairs of models where both models have identical and completely specified feature vectors, denoted as \({{{P}}}_{F}\subseteq {{P}}\). These model pairs are defined as:
\({{{P}}}_{F}:=\{(l,m)\in {{P}}\,|\,{{\bf{F}}}_{l}={{\bf{F}}}_{m}\,\wedge \,{F}_{l,p}\ne -1\;\forall p\}\)
To quantify how similarities depend on features, we estimate linear models of the form \({{{{{\bf{Y}}}}}}={{{{{\bf{X}}}}}}\cdot {{{{{\bf{b}}}}}}+\varepsilon\). We construct the design matrix X with one row \([1\,{{{{{{\bf{F}}}}}}}_{l}]\) for each \((l,m)\in {{{P}}}_{F}\).
To assess the impact of features on average normalized model similarity, the elements of the response matrix Y are \(\tilde{S}(l,m,{ { R} }_{l}\cap { { R} }_{m})\) for \((l,m)\in {{{P}}}_{F}\). The resulting coefficient estimates \({\hat{b}}_{p}\) are used for tree construction. Correspondingly, for the analysis of a specific subsystem \(j\), the elements of Y are \(\tilde{S}(l,m,{{{S}}}_{j}\cap { { R} }_{l}\cap { { R} }_{m})\) for \((l,m)\in {{{P}}}_{F}\).
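The regression step can be sketched with ordinary least squares in NumPy. The design matrix has a leading column of ones followed by the binary feature vector of each model pair, as described above; the feature values and coefficients are made up, and the noise-free response only serves to show that the coefficients are recovered.

```python
import numpy as np

def fit_feature_effects(features, similarities):
    """Least-squares estimate of b in Y = X b, with rows [1, F_l] in X."""
    F = np.asarray(features, dtype=float)
    X = np.column_stack([np.ones(len(F)), F])
    b, *_ = np.linalg.lstsq(X, np.asarray(similarities, dtype=float), rcond=None)
    return b

# Made-up feature vectors (two binary features) for five model pairs.
F_pairs = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0]]
true_b = np.array([0.1, 0.5, -0.3])          # intercept and two feature effects
Y = np.column_stack([np.ones(5), F_pairs]) @ true_b
b_hat = fit_feature_effects(F_pairs, Y)
```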
Statistical analysis
To assess significance of normalized biases for subsystem classification, two tests were used as indicated in text and figures: (i) a two-sided Wilcoxon signed rank test with \(\alpha=0.05\), and (ii) empirical p-value determination based on 10,000 repeats of sampling, with replacement, reactions within a given subsystem and reactions from the set of all reactions, according to the number of reactions per subsystem, and computing the average sample differences. Enrichment of conserved or variable subsystems in their parent classes was determined by one-sided tests via the hypergeometric cumulative distribution, again with significance level \(\alpha=0.05\).
For taxonomy, habitat, and physiology analysis, we evaluated the impact of features on subsystem variability via significant coefficient estimates of significant linear regressions. Significance of linear regressions was determined by F-tests against the constant null model without correction for multiple testing (\(\alpha=0.05\)). P-values for tests that coefficients are zero were based on the t-statistic (two-sided) and we used \(\alpha=0.05\) for significance.
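The one-sided enrichment test mentioned above can be written with exact integer arithmetic. The sketch computes the upper tail of the hypergeometric distribution; the parameter names and the example counts are hypothetical and not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail probability P(X >= observed) of a hypergeometric variable.

    population: total number of subsystems
    successes:  subsystems in the category of interest (e.g., conserved)
    draws:      subsystems in the parent class being tested
    observed:   subsystems in the parent class that are in the category
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / total

# Made-up counts: 40 subsystems, 10 conserved, a parent class with 8 members,
# 5 of which are conserved.
p_enrichment = hypergeom_enrichment_p(40, 10, 8, 5)
significant = p_enrichment < 0.05
```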
Implementation
All calculations were performed with MATLAB 2019b, node version (MathWorks, Natick, MA) and Gurobi Optimizer (version 8.1.1).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The datasets generated and/or analyzed during this study are available at https://doi.org/10.3929/ethz-b-000598615. All the data and code for reproducing figures in the main text and the supplementary information are provided in the GitLab repository (see Code availability). We used the following publicly available datasets:
1. Orthologues: retrieved using the OMA database, available at https://omabrowser.org/cgi-bin/gateway.pl?f=PairwiseOrthologs&p1=HUMAN&p2=YEAST&p3=EntrezGene
2. iMM904, iJO844, iJO1366, and Recon1 models were downloaded from BiGG (http://bigg.ucsd.edu/).
3. MetaNetX models were retrieved from MetaNetX (https://www.metanetx.org/).
4. SEED models were downloaded from the publication: Plata, G., Henry, C. & Vitkup, D. Long-term phenotypic evolution of bacteria. Nature 517, 369–372 (2015) (http://vitkuplab.c2b2.columbia.edu/phenotypes/).
5. NCBI taxonomy, available at https://www.ncbi.nlm.nih.gov/taxonomy
6. GTDB taxonomy, available at https://data.gtdb.ecogenomic.org/releases/latest/
7. Species synonyms (MACADAM database), available at http://macadam.toulouse.inra.fr/doc/MACADAMDatabase.zip
8. Habitat and physiology annotation (fusionDB), available at https://services.bromberglab.org/fusiondb/explore
9. Gram status (Microbe Directory), available at https://github.com/microbe-directory/microbe-directory/blob/master/data/microbe-directory.csv
10. ModelSEED annotations and reaction aliases, available at https://github.com/ModelSEED/ModelSEEDDatabase/blob/master/Biochemistry/
Code availability
Custom code for the analysis is available at https://doi.org/10.3929/ethz-b-000598615 and in the GitLab repository https://gitlab.com/csb.ethz/functionalcomparisonmetabnetworks/-/tree/main.
References
Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive earth’s biogeochemical cycles. Science 320, 1034–1039 (2008).
Cho, I. & Blaser, M. J. The human microbiome: at the interface of health and disease. Nat. Rev. Genet. 13, 260–270 (2012).
Martiny, J. B. H., Jones, S. E., Lennon, J. T. & Martiny, A. C. Microbiomes in light of traits: A phylogenetic perspective. Science 350, https://doi.org/10.1126/science.aac9323 (2015).
Morrissey, E. M. et al. Evolutionary history constrains microbial traits across environmental variation. Nat. Ecol. Evol. 3, 1064–1069 (2019).
Philippot, L. et al. The ecological coherence of high bacterial taxonomic ranks. Nat. Rev. Microbiol. 8, 523–529 (2010).
Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
Louca, S., Parfrey, L. W. & Doebeli, M. Decoupling function and taxonomy in the global ocean microbiome. Science 353, 1272–1277 (2016).
Hansen, T. F. Why epistasis is important for selection and adaptation. Evolution 67, 3501–3511 (2013).
Schellenberger, J. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 6, 1290–1307 (2011).
Weber Zendrera, A., Sokolovska, N. & Soula, H. A. Functional prediction of environmental variables using metabolic networks. Sci. Rep. 11, 12192 (2021).
Borenstein, E., Kupiec, M., Feldman, M. W. & Ruppin, E. Large-scale reconstruction and phylogenetic analysis of metabolic environments. Proc. Natl Acad. Sci., https://doi.org/10.1073/pnas.0806162105 (2008).
Aguilar-Rodríguez, J. & Wagner, A. Metabolic determinants of enzyme evolution in a genome-scale bacterial metabolic network. Genome Biol. Evol. 10, 3076–3088 (2018).
Plata, G., Henry, C. S. & Vitkup, D. Long-term phenotypic evolution of bacteria. Nature 517, 369–372 (2015).
Bernstein, D. B., Dewhirst, F. E. & Segrè, D. Metabolic network percolation quantifies biosynthetic capabilities across the human oral microbiome. eLife 8, e39733 (2019).
Zarecki, R., Oberhardt, M. A., Reshef, L., Gophna, U. & Ruppin, E. A novel nutritional predictor links microbial fastidiousness with lowered ubiquity, growth rate, and cooperativeness. PLOS Comput. Biol. 10, e1003726 (2014).
Orgogozo, V., Morizot, B. & Martin, A. The differential view of genotype–phenotype relationships. Frontiers in Genetics 6, https://doi.org/10.3389/fgene.2015.00179 (2015).
Uhr, M. & Stelling, J. Structural sensitivity analysis of metabolic networks. IFAC Proc. Vol. 41, 15879–15884 (2008).
Segre, D., Vitkup, D. & Church, G. M. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl Acad. Sci. USA 99, 15112–15117 (2002).
Lieven, C. et al. MEMOTE for standardized genome-scale metabolic model testing. Nat. Biotechnol. 38, 272–276 (2020).
Vogl, C. et al. Characterization of riboflavin (vitamin B2) transport proteins from Bacillus subtilis and Corynebacterium glutamicum. J. Bacteriol. 189, 7367–7375 (2007).
Garcia Angulo, V. A. et al. Identification and characterization of RibN, a novel family of riboflavin transporters from Rhizobium leguminosarum and other proteobacteria. J. Bacteriol. 195, 4611–4619 (2013).
Gharib, W. H. & Robinson-Rechavi, M. When orthologs diverge between human and mouse. Brief. Bioinform. 12, 436–441 (2011).
Kowalski, C. J. On the effects of non‐normality on the distribution of the sample product‐moment correlation coefficient. J. R. Stat. Soc.: Ser. C. (Appl. Stat.) 21, 1–12 (1972).
Bauer, E., Laczny, C. C., Magnusdottir, S., Wilmes, P. & Thiele, I. Phenotypic differentiation of gastrointestinal microbes is reflected in their encoded metabolic repertoires. Microbiome 3, 55 (2015).
Mazza, A., Wagner, A., Ruppin, E. & Sharan, R. Functional alignment of metabolic networks. J. Comput. Biol. 23, 390–399 (2016).
Mo, M. L., Palsson, B. O. & Herrgard, M. J. Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst. Biol. 3, 37 (2009).
Moretti, S. et al. MetaNetX/MNXref-reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic Acids Res. 44, D523–D526 (2016).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206–D214 (2013).
Nagies, F. S. P., Brueckner, J., Tria, F. D. K. & Martin, W. F. A spectrum of verticality across genes. PLoS Genet. 16, e1009200 (2020).
Xavier, J. C., Patil, K. R. & Rocha, I. Metabolic models and gene essentiality data reveal essential and conserved metabolism in prokaryotes. PLOS Comput. Biol. 14, e1006556 (2018).
Fuhrer, T. & Sauer, U. Different biochemical mechanisms ensure network-wide balancing of reducing equivalents in microbial metabolism. J. Bacteriol. 191, 2112–2121 (2009).
Auriol, C., Bestel-Corre, G., Claude, J.-B., Soucaille, P. & Meynial-Salles, I. Stress-induced evolution of Escherichia coli points to original concepts in respiratory cofactor selectivity. Proc. Natl Acad. Sci. 108, 1278 (2011).
Sohlenkamp, C. & Geiger, O. Bacterial membrane lipids: diversity in structures and pathways. FEMS Microbiol. Rev. 40, 133–159 (2016).
Winkler, M. E., Ramos-Montañez, S. & Stewart, V. Biosynthesis of Histidine. EcoSal Plus 3, https://doi.org/10.1128/ecosalplus.3.6.1.9 (2009).
Fichman, Y. et al. Evolution of proline biosynthesis: enzymology, bioinformatics, genetics, and transcriptional regulation. Biol. Rev. 90, 1065–1099 (2015).
Peregrin-Alvarez, J. M., Sanford, C. & Parkinson, J. The conservation and evolutionary modularity of metabolism. Genome Biol. 10, R63 (2009).
Singh, D. & Lercher, M. J. Network reduction methods for genome-scale metabolic models. Cell. Mol. Life Sci. 77, 481–488 (2020).
Marashi, S.-A., David, L. & Bockmayr, A. On flux coupling analysis of metabolic subsystems. J. Theor. Biol. 302, 62–69 (2012).
Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
Parks, D. H. et al. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat. Biotechnol. 38, 1079–1086 (2020).
Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
Parter, M., Kashtan, N. & Alon, U. Environmental variability and modularity of bacterial metabolic networks. BMC Evolut. Biol. 7, 169 (2007).
Takemoto, K., Nacher, J. C. & Akutsu, T. Correlation between structure and temperature in prokaryotic metabolic networks. BMC Bioinforma. 8, 303 (2007).
Kreimer, A., Borenstein, E., Gophna, U. & Ruppin, E. The evolution of modularity in bacterial metabolic networks. Proc. Natl Acad. Sci. 105, 6976–6981 (2008).
Liu, S., Hu, W., Wang, Z. & Chen, T. Production of riboflavin and related cofactors by biotechnological processes. Microb. Cell Fact. 19, 31 (2020).
Mardinoglu, A. et al. The gut microbiota modulates host amino acid and glutathione metabolism in mice. Mol. Syst. Biol. 11, 834 (2015).
Johnson, E. L. et al. Sphingolipids produced by gut bacteria enter host metabolic pathways impacting ceramide levels. Nat. Commun. 11, 2471 (2020).
Leventhal, G. E., Ackermann, M. & Schiessl, K. T. Why microbes secrete molecules to modify their environment: the case of iron-chelating siderophores. J. R. Soc. Interface 16, 20180674 (2019).
Clifford, E. L. et al. Taurine is a major carbon and energy source for marine prokaryotes in the North Atlantic Ocean off the Iberian Peninsula. Microb. Ecol. 78, 299–312 (2019).
Harcombe, W. R., Chacón, J. M., Adamowicz, E. M., Chubiz, L. M. & Marx, C. J. Evolution of bidirectional costly mutualism from byproduct consumption. Proc. Natl Acad. Sci. 115, 12000–12004 (2018).
Bernstein, D. B., Sulheim, S., Almaas, E. & Segrè, D. Addressing uncertainty in genome-scale metabolic model reconstruction and analysis. Genome Biol. 22, 64 (2021).
Long, C. P. & Antoniewicz, M. R. High-resolution 13C metabolic flux analysis. Nat. Protoc. 14, 2856–2877 (2019).
Goyal, A. Metabolic adaptations underlying genome flexibility in prokaryotes. PLOS Genet. 14, e1007763 (2018).
Gianoulis, T. A. et al. Quantifying environmental adaptation of metabolic pathways in metagenomics. Proc. Natl Acad. Sci. 106, 1374–1379 (2009).
King, Z. A. et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515–D522 (2016).
Altenhoff, A. M. et al. The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Res. 46, D477–D485 (2018).
Heirendt, L. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat. Protoc. 14, 639–702 (2019).
NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, D7–D17 (2014).
Le Boulch, M., Déhais, P., Combes, S. & Pascal, G. The MACADAM database: a MetAboliC pAthways DAtabase for Microbial taxonomic groups for mining potential metabolic capacities of archaeal and bacterial taxonomic groups. Database 2019, baz049 (2019).
Zhu, C., Mahlich, Y., Miller, M. & Bromberg, Y. fusionDB: assessing microbial diversity and environmental preferences via functional similarity networks. Nucleic Acids Res. 46, D535–D541 (2018).
Shaaban, H. et al. The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics. Gates Open Res. 2, 3–3 (2018).
Sierra, M. A. et al. The Microbe Directory v2.0: An Expanded Database of Ecological and Phenotypical Features of Microbes. bioRxiv, 2019.12.20.860569, https://doi.org/10.1101/2019.12.20.860569 (2019).
Seaver, S. M. D. et al. The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic Acids Res. 49, D575–D588 (2021).
Mahadevan, R. & Schilling, C. H. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab. Eng. 5, 264–276 (2003).
Jonker, R. & Volgenant, A. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38, 325–340 (1987).
Hedges, S. B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22, 2971–2972 (2006).
Acknowledgements
We thank Hans-Michael Kaltenbach, Sean Froese and Uwe Sauer for discussions and comments.
Author information
Contributions
C.R. and J.S. conceived the study. C.R. performed computations. C.R. and J.S. performed the analysis and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Germán Plata and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ramon, C., Stelling, J. Functional comparison of metabolic networks across species. Nat Commun 14, 1699 (2023). https://doi.org/10.1038/s41467-023-37429-5
DOI: https://doi.org/10.1038/s41467-023-37429-5