Metaboverse enables automated discovery and visualization of diverse metabolic regulatory patterns

Metabolism is intertwined with various cellular processes, including controlling cell fate, influencing tumorigenesis, participating in stress responses and more. Metabolism is a complex, interdependent network, and local perturbations can have indirect effects that are pervasive across the metabolic network. Current analytical and technical limitations have long created a bottleneck in metabolic data interpretation. To address these shortcomings, we developed Metaboverse, a user-friendly tool to facilitate data exploration and hypothesis generation. Here we introduce algorithms that leverage the metabolic network to extract complex reaction patterns from data. To minimize the impact of missing measurements within the network, we introduce methods that enable pattern recognition across multiple reactions. Using Metaboverse, we identify a previously undescribed metabolite signature that correlated with survival outcomes in early stage lung adenocarcinoma patients. Using a yeast model, we identify metabolic responses suggesting an adaptive role of citrate homeostasis during mitochondrial dysfunction facilitated by the citrate transporter, Ctp1. We demonstrate that Metaboverse augments the user’s ability to extract meaningful patterns from multi-omics datasets to develop actionable hypotheses.

Metabolism is intertwined with various cellular processes, including controlling cell fate, influencing tumorigenesis, participating in stress responses and more. Metabolism is a complex, interdependent network, and local perturbations can have indirect effects that are pervasive across the metabolic network. Current analytical and technical limitations have long created a bottleneck in metabolic data interpretation. To address these shortcomings, we developed Metaboverse, a user-friendly tool to facilitate data exploration and hypothesis generation. Here we introduce algorithms that leverage the metabolic network to extract complex reaction patterns from data. To minimize the impact of missing measurements within the network, we introduce methods that enable pattern recognition across multiple reactions. Using Metaboverse, we identify a previously undescribed metabolite signature that correlated with survival outcomes in early stage lung adenocarcinoma patients. Using a yeast model, we identify metabolic responses suggesting an adaptive role of citrate homeostasis during mitochondrial dysfunction facilitated by the citrate transporter, Ctp1. We demonstrate that Metaboverse augments the user's ability to extract meaningful patterns from multi-omics datasets to develop actionable hypotheses.
Metabolism plays a central role in many biological processes, including cell fate decisions, protein homeostasis, stress responses, energy production, cell signalling, DNA replication and silencing, and more [1][2][3][4][5][6][7][8][9][10][11][12][13] . The rigorous study of metabolism, including the use of high-throughput transcriptomics, proteomics and metabolomics, has generated a systematic map of metabolic reactions and their constituents. Large consortium projects, such as the Kyoto Encyclopedia of Genes and Genomes 14,15 , the Human Metabolome Database 16 and the Reactome Pathway Database [17][18][19] , have helped formalize systematic maps of metabolism. These resources and tools provide a more holistic understanding of metabolism.

Technical Report
https://doi.org/10.1038/s41556-023-01117-9 Metaboverse introduces a broad collection of algorithms that enable the rapid and automated discovery of various complex regulatory patterns (Extended Data Fig. 2a), including using information about reaction catalysts and inhibitors to identify more complex patterns. Generally, Metaboverse compares the inputs (substrates), outputs (products) and modifiers (catalysts and inhibitors) of each reaction to determine if there is a net change across the reaction (Supplementary Notes 2 and 3, equations (1)- (11)). It then returns an interactive list of reactions that pass the specified fold change and/or statistical thresholds. Users may also select annotated pathways in which that reaction is found for exploration and contextualization.

Data sparsity and the identification of complex patterns
Missing data points, particularly in metabolomics experiments, are frequent and make the identification of network regulatory patterns challenging 22,23 . Over 1,000 metabolites are known to participate in human metabolism, yet current metabolomics technologies limit the number of metabolites that are typically quantified. This results in gaps in the measured metabolic network and can confound pattern recognition across reactions. We developed algorithms that collapse up to three connected reactions with intermediate missing data points if they can be bridged with measured data on the distal ends of the reaction series (Extended Data Figs. 2b and 3 and Methods). Thus, if an intermediate reaction component is unmeasured in the user's dataset, but an input to the reaction producing the unmeasured component or an output to the reaction consuming the unmeasured component was quantified, these two reactions can be combined into one representative reaction. Similar concepts have been used to define amino acid-related metabolites 28 .

Metabolic signatures in LUAD
We first used public steady-state metabolomics data from early stage human lung adenocarcinomas (LUAD) 29 to test whether Metaboverse could capture the metabolic perturbations identified in this study as well as additional signatures. In this specific study, the authors asked which metabolites could be used as diagnostic markers to identify early stage adenocarcinomas (Fig. 1a). Metaboverse reliably prioritized patterns in nucleotide metabolism using collapsed and uncollapsed reaction representations ( Fig. 1d-g and Extended Data Fig. 4) that were consistent with the original study 29 and our manual re-analysis of the data using MetaboNet and DyMetaboNet 24 . Metaboverse also identified reaction patterns related to xanthine metabolism, a metabolite highlighted in the original study 29 (Fig. 1d,g and Extended Data Fig. 4a), and a collapsed reaction pattern driven by a reduced relative abundance of citric acid and an increased relative abundance of both glutamic acid and malic acid, with cross-pathway connections to a decrease in lysine (Fig. 1b,c and Extended Data Fig. 4b). Although changes in the concentrations of these metabolites were identified in the original study, their relevance to the study cohort and their connectedness within the metabolic network were not discussed 29 .
Comparing the substrate and product measurements of a reaction allows us to identify multi-dimensional patterns that provide further insight into metabolic behaviours 24 . Strikingly, the top-ranking Average reaction pattern (Supplementary Note 3, equation (2)) suggested spermine synthase activity (SMS) (Fig. 2a), catalysing a reaction connected to polyamine synthesis. The second top-ranking Average reaction pattern (Supplementary Note 3, equation (2)) suggested the activity of glycerate kinase (GLYCTK) (Extended Data Fig. 5), a pattern identified in our previous manual re-analysis of the data 24 , which could implicate perturbations in serine metabolism-a pathway that contributes to tumorigenesis 30 . These particular connections were missed in the original study but were quickly and automatically highlighted by Metaboverse.
To circumvent the challenges associated with the inherent complexity of metabolic networks, it is common to adopt reductionist approaches. Although such strategies are vital to advancing our biological understanding, they can conceal multi-dimensional properties of metabolism as biological perturbations lead to complex, cooperative effects, many of which may seem negligible in isolation. Thus, reductionism can limit insight 20,21 . Additional challenges arise with the sparsity of metabolomics datasets 22,23 and the use of different network visualization parameters, which each influence how effectively one can interpret metabolic data 24 .
To address these limitations, we created Metaboverse, an interactive application for exploring multi-omics data in the context of the metabolic network and for generating data-driven hypotheses. Metaboverse delivers four critical innovations or contributions that provide a powerful interface for the interpretation of data. First, Metaboverse uses a diverse library of possible metabolic patterns to search the metabolic network. Second, Metaboverse generates summarized reaction representations that span multiple reactions, enabling the discovery of patterns across sparse datasets and between pathways. Third, Metaboverse automates pre-processing and network curation tasks for a diverse set of model organisms. Fourth, Metaboverse enhances the contextualization of these patterns by providing a dynamic exploratory interface to facilitate hypothesis generation.
In this Technical Report, we detail these components and their integration into Metaboverse, present two vignettes in which we use Metaboverse to analyse public and newly generated datasets, and outline important patterns that were detected using Metaboverse but not by existing methods. We will present benchmarks between Metaboverse and other comparable tools, as well as sensitivity analysis detailing Metaboverse pattern recognition with sparse datasets. Metaboverse is available at https://github.com/Metaboverse.

Dynamic reaction visualization augments hypothesis creation
Metaboverse curates a reaction network database on the basis of a Reactome knowledgebase [17][18][19] , BiGG 25 or BioModels 26,27 network. Users provide any combination of transcriptomics, proteomics and/or metabolomics data, which are then integrated into the reaction network as log 2 (fold change) and statistical values for each measurement and sample comparison (Extended Data Fig. 1). An interactive data formatting aid is available for users requiring assistance to format their data for Metaboverse. Additional methods during data processing allow for the interpolation of protein complex measurements or protein measurements from upstream components, such as protein or gene measurements.
Once Metaboverse integrates user data onto the network, interactive tools help visualize and explore reactions and their components individually, by canonical pathway definitions or by nearest reaction neighbourhood networks for any given network component. Nearest reaction neighbourhood networks consequently aid in identifying upstream or downstream patterns that may occur between pathways 24 . We integrated visualization options to limit the display of metabolic hubs as detailed in our previous work 24 to assist the user's exploration of the data beyond the most familiar pathways.

Pattern recognition enables robust data interpretation
Previously, we described MetaboNet and DyMetaboNet, wherein we identified multi-dimensional, reaction-based patterns by manually comparing inputs and outputs of reactions and looking for general trends across a reaction 24 . While this approach identified several interesting patterns within datasets, it was time consuming and incomplete. Similarly, other existing tools may provide some pattern identification capabilities, but the scope of the patterns they can capture is limited and fails to capture vital metabolic signatures (Supplementary  Note 1 and Supplementary Table 1).

Metaboverse signatures correlate with patient prognosis
To assess whether the reaction patterns identified by Metaboverse could provide meaningful insights into the clinical outcomes, we analysed a LUAD cohort from The Cancer Genome Atlas (TCGA) using Cox regression analysis with an optimized expression cut-off [31][32][33] , and observed striking correlations between the expression of the gene encoding SMS and patient survival outcomes (optimized FPKM (fragments per kilobase of transcript per million) cut-off: 49.5413; Fig. 2b b and Extended Data Fig. 5). Notably, the log-rank P value for SMS gene expression ranked in the top 0.65% of all regressions (118 of 18,169 surveyed genes) (Fig. 2c). Alternatively, the log-rank P value for GLYCTK gene expression in the LUAD cohort was poor (optimized FPKM cut-off: 0.913; Extended Data Fig. 5), possibly due to the lower DepMap dependency score of GLYCTK compared with SMS gene expression in LUAD cell lines (Extended Data Fig. 6a) (refs. 34,35). Additionally, at the time of writing, SMS is the only spermine-producing enzyme annotated in humans (Reactome), whereas the product of the reaction catalysed by the GLYCTK enzyme, 3-phosphoglyceric acid, can also be produced by the glycolytic enzyme phosphoglycerate kinase 1 (PGK1). It is also interesting to note the modest correlation between the top-ranking LUAD reaction pattern enzymes and their corresponding survival statistics in LUAD gene expression data (Extended Data Fig. 6b).
SMS has been implicated in the silencing of Bim, which encodes a pro-apoptotic factor, in colon adenocarcinomas 36 , and SMS gene expression tends to correlate with more proliferative cell types in the lung (Extended Data Fig. 6c) (refs. 36,37). Similar patterns in SMS have also been identified in mouse xenografts of lung cancer 38 . Thus, Metaboverse-guided predictions may inform research directions and treatment strategies for this disease.

Metabolic signatures of respiratory impairment in yeast
To further demonstrate Metaboverse in the context of multi-omics and time course or multi-condition datasets, we analysed a model of mitochondrial fatty acid synthesis (mtFAS) deficiency in Saccharomyces cerevisiae. mtFAS is an evolutionarily conserved pathway that produces lipoic acid, a critical co-factor for many metabolic enzymes. Recent work has uncovered additional biological roles for this pathway,  including in the generation of acylated-mitochondrial acyl carrier protein, which controls the assembly and activation of mitochondrial oxidative phosphorylation complexes [39][40][41] . The discovery of patients with mutations in genes encoding key mtFAS enzymes further illustrates the physiological importance of this pathway 42 . Deletion of the MCT1 gene (homologous to the Malonyl-CoA-Acyl Carrier Protein Transacylase in humans) abolishes the activity of the mtFAS pathway 40 , allowing us to use an mct1Δ mutant to determine the effects of mtFAS pathway perturbation on metabolism. We utilized previously generated proteomics data in mct1Δ yeast after a shift from glucose-to raffinose-supplemented growth media, which triggers the biogenesis of mitochondria and tends to pronounce respiratory defects 40 , together with RNA sequencing (RNA-seq) at 0, 3 and 12 h and steady-state metabolomics at 0, 0.25, 0.5, 1, 3 and 12 h after this shift in growth media (Fig. 3a). By layering these multi-omics data onto the S. cerevisiae metabolic network, we could explore acute and chronic responses to mtFAS deficiency.
The top-ranked ModReg and TransReg reaction patterns (Supplementary Note 3, equations (4) and (5)) at 12 h predictably centred around the abundance of mitochondrial carrier proteins, Coenzyme A-related reactions and iron-sulfur cluster biogenesis ( Fig. 3b and Extended Data Fig. 7a-c,e). We also observed respiratory signatures consistent with previous studies 40, 43 , including the identification of patterns in electron transfer from ubiquinol to cytochrome C via Complex
The second expected pattern of interest was the general reduction in the abundances of tricarboxylic acid (TCA) cycle-related enzymes and intermediate metabolites (Extended Data Fig. 7d). We observed an increase in Dic1 protein abundance ( Fig. 4e and Extended Data Fig. 8), consistent with reports that DIC1 gene expression may be essential for growth in the presence of certain carbon sources due to its role in shuttling phosphate across the mitochondrial inner membrane in exchange for malate or succinate 44 . Yeast with respiratory defects may adapt by increasing Dic1 protein levels to facilitate substrate transport to maintain TCA cycle flux and mitochondrial respiration. Indeed, we found that whole-cell malate levels were elevated in mct1Δ mutants relative to wild type (WT) (Extended Data Figs. 8 and 9l).

Metaboverse signatures correlate with metabolic adaptations
One unexpected top-ranking reaction pattern identified using the 12 h multi-omics datasets involved the tricarboxylate transporter, Ctp1, which transfers citrate across the mitochondrial inner membrane ( Fig. 3c and Extended Data Fig. 7h) (ref. 45). This reaction was also identified at the 3 h timepoint using metabolomics and RNA-seq data (Fig. 3d). Earlier metabolomics measurements showed citrate levels initially decreasing then increasing over the time course (Fig. 3e).

Technical Report
https://doi.org/10.1038/s41556-023-01117-9 Since citrate is a key metabolite in the TCA cycle, we hypothesized that Ctp1 protein levels decrease in response to early respiratory stresses in mtFAS-deficient cells.
We hypothesized that if mct1Δ cells adaptively downregulate Ctp1, then overexpression of Ctp1 should cause growth defects. Indeed, we observed a specific sensitivity to Ctp1 overexpression in the mct1Δ background ( Fig. 4a and Extended Data Fig. 10a). We found that overexpression of a C-terminal green fluorescent protein (GFP)-CTP1 fusion vector ablated this growth defect, and verified that this did not result from spurious effects from protein overexpression, such as the formation of inclusion bodies ( Fig. 4a and Extended Data Fig. 10b). These data suggest that the GFP-Ctp1 is inactive, mislocalized or otherwise perturbed, and the observed growth defect was due to functional Ctp1 localizing to the mitochondria [46][47][48] .
We performed co-expression analysis of CTP1 gene expression using thousands of uniformly processed WT yeast RNA-seq experiments 49,50 and discovered that correlating gene sets (SpQN-normalized Pearson's r > 0.5) included programmes related to the biosynthesis of aspartate, lysine and other amino acids (Fig. 4b). Aspartate can be converted into fumarate, which might partially explain increased fumarate concentrations despite an impaired ETC, while lysine can be used as a substrate for the generation of acetyl-CoA, which plays a central role in the mtFAS pathway 40 . Metabolic rewiring of these and other amino acids was also apparent in the metabolomics (Fig. 4c) and proteomics datasets (Fig. 4e), potentially explaining the ability of Mct1Δ mutants to maintain growth despite metabolic dysfunction (Fig. 4a). However, CTP1 gene overexpression appears to disrupt this biosynthetic rewiring, disabling the ability of mct1Δ cells to tune their growth (Fig. 4a).
We observed increased protein abundance levels in components of anaplerotic pathways, namely Mae1, Pyc1, Pyc2, Cit2/Cit3 and Gdh2 (Fig. 4f). These enzymes sequentially catalyse the conversion of malate to pyruvate to oxaloacetate to citrate, respectively. PYC1 and CIT2 are also targets of the retrograde signalling pathway, which elicits mitochondrial-to-nuclear communication during mitochondrial stress 51 . Work on the retrograde pathway has suggested that elevated Cit2 expression functions to maintain metabolite pools for anabolic growth 51 , which we also see in the mct1Δ (Fig. 4f and Extended Data Fig. 10). Pyruvate dehydrogenase levels are reduced in mtFAS-deficient cells, so upregulation of Pyc1 and Pyc2 probably provides an alternative pathway for converting pyruvate to oxaloacetate, which can then be converted into citrate for biosynthetic fuel 52,53 (Source Data Fig. 4).

Metaboverse identifies meaningful and verifiable patterns
We evaluated how Metaboverse can offer a more comprehensive experience when analysing metabolism-related datasets for known and novel regulatory patterns compared with existing metabolic network exploration tools. The scope and performance of many of these tools are summarized in Supplementary Note 1 and Supplementary Table 1. However, we seek to emphasize that the key findings and hypotheses we present in this manuscript regarding SMS in LUAD and CTP1 in yeast were uniquely discovered, contextualized and/or prioritized using Metaboverse. For benchmarking purposes, we decided to focus our analysis on Metabolic Network Segmentation (MNS 54 ) and Ingenuity Pathway Analysis (IPA; Qiagen), the two options with operable code or a graphical user interface at the time of writing (Supplementary Table 1).
We evaluated the ability of MNS to prioritize verifiable or canonical signatures within the LUAD metabolomics dataset (Figs. 1 and 2). MNS was able to identify both SMS (ranked number 27 in MNS and number 1 in Metaboverse) and GLYCTK (ranked number 125 in MNS and number 2 in Metaboverse), but their relevance to the dataset and their roles within the metabolic network were opaque. While MNS identified xanthine dehydrogenase (XDH, ranked number 4 in MNS), Metaboverse, ranked this reaction as number 11 due to a poorer statistical value for hypoxanthine (Benjamini-Hochberg corrected P = 0.11) (Fig. 5a). Another challenge we experienced in using this software was the sheer number of genes per result, ranging from 1 to 447 for the top ten results from the dataset. The scope of these lists can greatly hamper the user's ability to generate actionable hypotheses (Source Data).
Next, we evaluated MNS prioritization of verifiable or canonical signatures from the mct1Δ model (Figs. 3 and 4). We were only able to integrate single timepoint metabolomics data from the mct1Δ experiments as MNS does not have any multi-omics analysis capabilities (we chose to evaluate the 12 h timepoint). ETC defects are strong hallmarks of mtFAS dysfunction 40  When comparing Metaboverse to IPA, we used the LUAD metabolomics data described previously 29 . The IPA graphical summary included 'Concentration of phosphotidylcholine', 'PPARGC1B' and 'ARNT', with no further context. IPA's pattern identification tools, 'Upstream regulators' and 'Causal networks', identified most of the SMS-and GLYCTK-related metabolites. However, IPA did not identify either enzyme (SMS or GLYCTK) and the 'Upstream regulators' analysis identified spermidine 184 times, spermine 0 times, 5 -methylthioadenosine 31 times, glyceric acid 1 time (but as regulator number 613) and 3-phosphoglyceric acid 65 times. 'Causal networks' analysis likewise identified spermidine 160 times, spermine 0 times, 5 -methylthioadenosine 31 times, glyceric acid 0 times and 3-phosphoglyceric acid 56 times. In either case, these instances rank from 1 to 934 out of 940 total results, and within large aggregate lists of other metabolites, which resemble overly general set enrichment analysis. This approach, therefore, requires non-trivial parsing of large lists to identify relevant patterns.

Metaboverse pattern analysis is robust against sparse data
To evaluate the ability of Metaboverse to identify relevant metabolic signatures under conditions of data sparsity, a frequent hallmark of metabolomics data 22 , we randomly removed 0-60% of quantified proteins and 0-90% of quantified metabolites from the two data vignettes presented herein. For each dataset, we highlighted the reaction pattern type where the most relevant patterns were found in the Metaboverse analyses (the Average pattern for the LUAD study and the ModReg and TransReg patterns for the mct1Δ study). The LUAD metabolomics study generated 183 consistent metabolite quantifications between paired adenocarcinoma and adjacent normal tissue 29 . We first evaluated the total number of possible reaction patterns (Methods) and found that the LUAD dataset contained 2,160/3,853 (56.06%) and 239/2,219 (10.77%) metabolic reactions with enough measured data with or without reaction collapsing, respectively. By randomly dropping out varying numbers of the 183 metabolites, we noticed a consistent decline in the number of reaction patterns Metaboverse was able to identify without collapsed reaction representations (Fig. 6a). With collapsed reaction representations, Metaboverse was more resilient against data sparsity (Fig. 6b). Metaboverse sometimes identified more reaction patterns with missing data (Fig. 6a). In these cases, reactions with more than one measured input or output can be detrimentally weighted by low values, thus lowering the reaction's   Fig. 6 | Metaboverse pattern recognition is resilient to missing data. a-d, Random analyte dropout datasets for each omics type (n = 6 per dropout). a, Box plots of the number of Average reaction patterns Metaboverse could identify within the LUAD metabolomics dataset with or without reaction collapsing with 0-90% of the original input metabolomics data missing. b, Heat maps of the proportion of replicates that identified each of the signature reaction patterns for the LUAD dataset, as described as part of Fig. 5a. c, Box plots of the number of `Modifier' reaction patterns Metaboverse could identify within the mct1Δ dataset within this study with or without reaction collapsing with 0-90% of the original input metabolomics data missing and 0%, 30% or 60% of the original input proteomics data missing. d, Heat maps of the proportion of replicates that identified each of the signature reaction patterns for the yeast dataset, as described as part of Fig. 5b score. This emphasizes the importance of exploring the data with a variety of the different reaction pattern types provided by Metaboverse to capture the breadth of metabolic signatures. Applying this evaluation design to the mct1Δ 12 h proteomics and metabolomics datasets, which contained 662/986 (67.14%) and 172/604 (28.48%) metabolic reactions with enough measured data with and without reaction collapsing, respectively. We again observed a steady decline in the number of identified reaction patterns. However, similar to the LUAD study, the use of collapsed reaction representations provided more resilience in the number of identified patterns (Fig. 6c). Strikingly, we observed that proteomics data appears to buffer Metaboverse pattern recognition despite missing data points (Fig. 6d). Reassuringly, in some cases, reaction patterns could be identified 100% of the time with up to 60% of the original metabolomics or proteomics data missing (Fig. 6d).

Discussion
Metaboverse provides an easy-to-use interactive visualization tool for exploring multi-omics data in the context of the metabolic network and guiding hypothesis generation (Extended Data Fig. 1). Importantly, Metaboverse introduces pattern recognition and data sparsity handling algorithms to identify metabolic signatures within a user's dataset (Extended Data Fig. 2).
Using public metabolomics data from LUADs, we demonstrated that Metaboverse could not only identify reaction patterns corresponding to known metabolic signatures (Fig. 1), but it identified biologically relevant patterns that were undetected by existing data-driven approaches. For example, Metaboverse identified a previously undescribed reaction pattern centred around SMS, and we subsequently determined a positive correlation of LUAD patient survival with SMS gene expression (Fig. 2).
Using multi-omics datasets in a yeast model of mtFAS dysfunction, we demonstrated that Metaboverse could identify important regulatory and compensatory mechanisms, including adaptations to respiratory defects that enable these mutants to maintain their growth (Figs. 3 and 4). Specifically, the insights from Metaboverse implicate altered Ctp1 activity and citrate homeostasis as part of the broader biosynthetic response to mitochondrial dysfunction, allowing for continued cellular growth and survival.
We benchmarked Metaboverse against existing tools with similar goals to Metaboverse and found that Metaboverse consistently ranked relevant reactions higher and provided an intuitive format to explore these patterns (Fig. 5). We tested the ability of Metaboverse to identify critical reaction patterns with missing data points and found that the number of identified reactions decreased with increasing data sparsity, but this was buffered by the use of collapsed reaction representations and multi-omics data integration (Fig. 6).
Metaboverse integrates multiple data layers onto the metabolic network across model organisms, allowing users to easily analyse data to identify interesting patterns. Thus, Metaboverse provides an integrated platform for pattern recognition and hypothesis generation, and can assist users in their design of future experiments with a more holistic mindset.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information, details of author contributions and competing interests, and statements of data and code availability are available at https://doi.org/10.1038/s41556-023-01117-9.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Network curation
Biological networks are curated using the current version of the Reactome knowledgebase [17][18][19] . In particular, the pathway records and Ensembl-and UniProt-Reactome mapping tables are integrated into the network database for Metaboverse. Additionally, the ChEBI and The Human Metabolome databases are also referenced for metabolite synonym mapping to accept more flexible metabolite input nomenclature from the user 16,56 . These data are used to generate a series of mapping dictionaries for entities to reactions and reactions to pathways for the curation of the total reaction network. Reaction annotations are additionally obtained from the Reactome knowledgebase [17][18][19] . At the time of writing, users can also provide BiGG 25 and BioModels 26,27 networks; however, full support cannot always be guaranteed due to the more bespoke nature of some network models from these sources. The resulting curation file is output as a pickle-formatted .mvdb file. For further details, we refer the reader to the accompanying Supplementary Note 2.
To overlay user data on the global network, first, user-provided gene expression, protein abundance and/or metabolite abundances' names are mapped to Metaboverse compatible identifiers. Metaboverse accepts any data input that can be appropriately mapped to a standard gene, protein or metabolite identifier or name. For components that Metaboverse is unable to map, a table is returned to the user so they can provide alternative names to aid in mapping. Second, provided data values are mapped to the appropriate nodes in the network. In cases where gene expression data are available, but protein abundance values are missing, Metaboverse will take the average of the available gene expression values to broadcast to the protein node. For complexes, the median of all available component values (metabolites, proteins and so on) is calculated (Supplementary Note 3, equation (12)). An aggregated P value is inferred by multiplying the geometric mean of the P values, as in refs. 57,58 (Supplementary Note 3, equation (13)).

Collapsing reactions with missing values
After data mapping is complete, Metaboverse will generate a collapsed network representation for optional viewing during later visualization. Metaboverse enforces a limit of up to three reactions that can be collapsed as data down a pathway should only be inferred so far. Reaction collapsing allows for partial matches between inputs and outputs of two reactions to account for key metabolic pathways where a metabolite that is output by one reaction may not be required for the subsequent reaction. To perform a partial collapse, Metaboverse operates by largely the same scheme as outlined below, but additionally if a perfect match between reactions is not available, checks for partial matches by filtering out high-degree nodes (quartile 98 of all non-reaction node degrees) and then checking if, by default, at least 30% of the nodes match with its neighbour. For further details, we refer the reader to the accompanying Supplementary Note 2 and Extended Data Fig. 3.

Regulatory pattern searches and prioritization
Metaboverse provides a variety of different regulatory patterns for the user to explore. To identify a reaction pattern is to compare some value that is computed from a reaction with a user-specified threshold. Equations for the reaction patterns available at the time of publication are included in the accompanying Supplementary Note 3 (equations (1)- (11)). Metaboverse provides a variety of reaction pattern sorting methods, which are detailed further in the accompanying Supplementary Notes 2 and 3. A complete and current list and description of all available reaction pattern modules can be found in the documentation at https://metaboverse.readthedocs.io.
Stoichiometry is not directly accounted for in reaction pattern identification as log 2 (fold change) values are utilized and factor out stoichiometric constants between experiment and control conditions and instead focus on relative magnitude changes of a given reaction component.

Nearest neighbourhood searches and prioritization
To visualize all connections to a given network component, a user can select an entity (a gene, protein or metabolite) and visualize all reactions in which the component is involved. By doing so, the user can visualize other downstream effects the change of one entity might have across the total network, which consequently aids in bridging and identifying any reaction that may occur between canonically annotated pathways. These neighbourhoods can be expanded to view multiple downstream reaction steps and their accompanying genes, proteins and metabolites by modulating the appropriate user option in the software.
The user can also limit which entities are shown by enforcing a degree threshold. By setting this value at 50, for example, the network would not show nodes that have 50 or more connections. One caveat, however, is that this feature will occasionally break synchronous pathways into multiple pieces if one of these high-degree nodes formed the bridge between two ends of a pathway.

Perturbation networks
Perturbation networks are generated by searching each reaction in the total reaction network for any reaction where at least one component is substantially perturbed. The user can modify the necessary criteria to base the search on the expression or abundance value or the statistical value and can choose the thresholding value to be used. For the expression thresholding, the provided value is assumed to be the absolute value, so a thresholding value of 3 would include any reactions where at least one component showed a greater than 3 measured change or less than −3 measured change, the value of which is dependent on the data provided by the user. Thus, these networks could represent reactions where a component was perturbed to a notable degree on a log 2 fold change scale, z-score scale, or another appropriate unit for that biological context. Once a list of perturbed reactions is collected, the network is constructed, including each of these reactions and their components. Perturbed neighbouring reactions that share components are thus connected within the network, and perturbed reactions that are not next to other perturbed reactions are shown as disconnected subnetworks.

Network visualization and exploration
Force-directed layouts of networks are constructed using D3 (https:// d3js.org) by taking a user-selected pathway or entity and querying the reactions that are components of the selected pathway or entity. All inputs, outputs, modifiers and other components of these reactions, along with edges where both source and target are found in the subnetwork as nodes, are included and displayed. Relevant metadata, such as user-provided data and reaction descriptions, can be accessed by the user in real time. To visualize a pathway, a user selects a pathway, and all component reactions and their substrates, products, modifiers and metadata are queried from the total reaction database. Super-pathways help categorize these pathways and are defined as any pathway containing more than 200 nodes.
Time course and multiple condition experiments are automatically detected from the user's input data. When users provide these data and specify the appropriate experimental parameters on the variable input page, they will have the option to provide timepoint or condition labels. Provided data should be listed in the data table in the same order that the labels are provided. Within all visualization modules, the data for each timepoint or condition can then be displayed using a slider bar, which will allow the user to cycle between timepoints or conditions.
Some performance optimization features are included by default to prevent computational overload. For example, nearest neighbour subnetworks with more than 1,500 nodes, or nodes with more than 500 edges, will not be plotted because the plotting of this information in real time can be prohibitively slow.

Software packaging
The Metaboverse application is packaged using Electron (https://electronjs.org). Back-end network curation and data processing are performed using Python (https://www.python.org/) and the NetworkX 59 , pandas 60,61 , NumPy 62 , SciPy 61,63 and Matplotlib 63 libraries. This back-end functionality is packaged as a single, operating system-specific executable using the PyInstaller library (https://www.pyinstaller.org) and is available to the app's visual interface for data processing. Front-end visualization is performed using Javascript and relies on the D3 (https:// d3js.org) and JQuery packages (https://jquery.com). Saving network representations to a PNG file is performed using the (https://github. com/edeno/d3-save-svg) and string-pixel-width (https://github. com/adambisek/string-pixel-width) packages. Documentation for Metaboverse is available at https://metaboverse.readthedocs.io. Continuous integration services are performed by GitHub Actions to routinely run test cases for each change made to the Metaboverse architecture. The Metaboverse source code can be accessed at https:// github.com/Metaboverse/metaboverse. The code used to draft and revise this manuscript, as well as all associated scripts used to generate and visualize the data presented in this manuscript, can be accessed at ref. 55.

Human LUAD metabolomics and analysis
Data were accessed from Metabolomics Workbench project PR000305 and processed as in our previous re-study of these data 24 . P values were derived using a two-tailed, homoscedastic Student's t-test and adjusted using the Benjamini-Hochberg correction procedure.
The initial Kaplan-Meier survival analysis was performed using tools and data hosted on The Human Protein Atlas (version 20.1; released 24 February 2021) (refs. [65][66][67]. Survival analysis as displayed in this manuscript was performed in R (version 4.0.3) using the survival (version 3.2-11) and survminer (version 0.4.9) packages. Correlation analysis between Metaboverse reaction pattern rank and survival statistic was performed using the Pearson correlation coefficient and a loess regression using the ggscatter() function from the ggpubr (version 0.4.0) package. TCGA FPKM gene expression data were obtained from the Human Protein Atlas project (https://www. proteinatlas.org/download/rna_cancer_sample.tsv.zip) and clinical patient data were obtained from TCGA (https://portal.gdc.cancer.gov/ projects/TCGA-LUAD). Clinical data were censored as 'Dead' or 'Alive', and 'Alive' patients were right censored using days since last follow-up. Patients were stratified into two gene expression groups (High, Low) using the optimized surv_cutpoint() function from the survminer package (version 0.4.9) with the minimum proportion for a group set at 0.2 (ref. 33).
DepMap data (21Q4 Public) were subsetted to include only non-small cell lung cancer cell lines. The scatter plot was generated from the DepMap online interface (https://depmap.org/).

RNA-seq
RNA-seq data were generated by growing S. cerevisiae biological replicates for strains mct1Δ (n = 4) and WT (n = 4). Briefly, cells were grown in glucose and switched to raffinose-supplemented growth medium (SR-URA) for 0, 3 and 12 h, such that at the time of collection, cultures were at OD 600 of 1. Cultures were flash frozen, and later total RNA was isolated using the Direct-zol kit (Zymo Research) with on-column DNase digestion and water elution. Sequencing libraries were prepared by purifying intact poly(A) RNA from total RNA samples (100-500 ng) with oligo(dT) magnetic beads, and stranded messenger RNA-seq libraries were prepared as described using the Illumina TruSeq Stranded mRNA Library Preparation Kit (RS-122-2101 and RS-122-2102). Purified libraries were qualified on an Agilent Technologies 2200 TapeStation using a D1000 ScreenTape assay (cat. nos. 5067-5582 and 5067-5583). The molarity of adaptor-modified molecules was defined by quantitative polymerase chain reaction (PCR) using the Kapa Biosystems Kapa Library Quant Kit (cat. no. KK4824). Individual libraries were normalized to 5 nM, and equal volumes were pooled in preparation for Illumina sequence analysis. Sequencing libraries (25 pM) were chemically denatured and applied to an Illumina HiSeq v4 single-read flow cell using an Illumina cBot. Hybridized molecules were clonally amplified and annealed to sequencing primers with reagents from an Illumina HiSeq SR Cluster Kit v4-cBot (GD-401-4001). Following the transfer of the flow cell to an Illumina HiSeq 2500 instrument (HCSv2.2.38 and RTA v1.18.61), a 50-cycle single-read sequence run was performed using HiSeq SBS Kit v4 sequencing reagents (FC-401-4002).
Sequence FASTQ files were processed using XPRESSpipe (version 0.6.0) (ref. 68). Batch and log files are available at ref. 55. Notably, reads were trimmed of adaptors (AGATCGGAAGAGCACACGTCTGAACTC-CAGTCA). On the basis of library complexity quality control, de-duplicated alignments were used for read quantification due to the high number of duplicated sequences in each library. Differential expression analysis was performed using DESeq2 (version 1.22.1) (ref. 69) by comparing mct1Δ samples with WT samples at the 12 h timepoint to match the steady-state proteomics data. log 2 (fold change) and false discovery rate (FDR; 'p-adj') values were extracted from the DESeq2 output.
Briefly, cells were grown in glucose and switched to SR-URA overnight and collected at the mid-log phase. Cells were resuspended, lysed and clarified. Proteins were then subjected to disulfide reduction and digested for 16 h with LysC (1:100 enzyme:protein ratio) at room temperature, followed by trypsin (1:100 enzyme:protein ratio) for 6 h at 37 °C. Proteins were again quantified and subjected to tandem mass tag (TMT)-11 labelling, after which samples were pooled. Pooled TMT-labelled peptide samples were fractionated using basic pH reversed-phase high-performance liquid chromatography (LC) (Agilent 1260 Infinity pump equipped with a degasser and a single wavelength detector set at 220 nm). Fractions were desalted via StageTip, dried via vacuum centrifugation and reconstituted in 5% acetonitrile, 5% formic acid for LC-tandem mass spectrometry (MS/MS) processing. MS data were collected using an Orbitrap Fusion Lumos mass https://doi.org/10.1038/s41556-023-01117-9 spectrometer (Thermo Fisher Scientific) equipped with a Proxeon EASY-nLC 1000 LC system (Thermo Fisher Scientific). The multi-notch MS3-based TMT method was used 70 . MS2 mass spectra were processed using the Sequest algorithm 71 . Spectra were converted to mzXML, followed by database searching using the yeast proteome downloaded from Uniprot (UniProt-Consortium, 2015) in both forward and reverse directions, along with common contaminating protein sequences. Peptide-spectrum matches were adjusted to a 1% FDR 72 . Linear discriminant analysis was used to filter peptide-spectrum matches, as described previously 73 . Each TMT channel was summed across all quantified proteins and normalized to enforce equal protein loading. Each protein's quantitative measurement was then scaled to 100.
For the analysis used within this manuscript, we compared the mct1Δ (n = 3) with the WT (n = 3) cell populations. log 2 (fold change) values and Benjamini-Hochberg-corrected P values were generated by comparing mct1Δ with the WT cells. P values were generated before correction using a two-tailed, homoscedastic Student's t-test.

Gas chromatography metabolomics
Metabolomics data were generated by growing the appropriate yeast strains in synthetic complete media supplemented with 2% glucose until they reached saturation (n = 6; except in one 3 h WT sample, where n = 5). Cells were then transferred to S-minimal medium containing 2% raffinose and leucine and collected after 0, 15, 30, 60 and 180 min (n = 6 per timepoint per strain, except for the 3 h WT samples, where n = 5) at OD 600 0.6-0.8.
A 75% boiling ethanol (EtOH) solution containing the internal standard d4-succinic acid (Sigma 293075) was then added to each sample. Boiling samples were vortexed and incubated at 90 °C for 5 min. Samples were then incubated at −20 °C for 1 h. After incubation, samples were centrifuged at 5,000g for 10 min at 4 °C. The supernatant was then transferred from each sample tube into a labelled, fresh 13 × 100 mm glass culture tube. A second standard was then added (d27-myristic acid CDN Isotopes: D-1711). Pooled quality control samples were made by removing a fraction of the collected supernatant from each sample, and process blanks were made using only extraction solvent and no cell culture. The samples were then dried en vacuo. This process was completed in three separate batches.
All gas chromatography-MS analysis was performed with an Agilent 5977b GC-MS MSD-HES and an Agilent 7693A automatic liquid sampler. Dried samples were suspended in 40 μl of 40 mg ml −1 O-methoxylamine hydrochloride (MP Bio no. 155405) in dry pyridine (EMD Millipore no. PX2012-7) and incubated for 1 h at 37 °C in a sand bath. Then, 25 μl of this solution was added to autosampler vials and 60 μl of N-methyl-N-trimethylsilyltrifluoracetamide (MSTFA) with 1% trimethylchlorosilane (TMCS) (Thermo Fisher no. TS48913) was added automatically via the autosampler and incubated for 30 min at 37 °C. After incubation, samples were vortexed, and 1 μl of the prepared sample was injected into the gas chromatograph inlet in the split mode with the inlet temperature held at 250 °C. A 10:1 split ratio was used for the analysis of the majority of metabolites. For those metabolites that saturated the instrument at the 10:1 split concentration, a split of 50:1 was used for the analysis. The gas chromatograph had an initial temperature of 60 °C for 1 min followed by a 10 °C min −1 ramp to 325 °C and a hold time of 5 min. A 30 m Phenomenex Zebron AB-5HT with a 5 m inert Guardian capillary column was employed for chromatographic separation. Helium was used as the carrier gas at a rate of 1 ml min −1 .
Data were collected using MassHunter software (Agilent). Metabolites were identified, and their peak area was recorded using Mass-Hunter Quant. These data were transferred to an Excel spreadsheet (Microsoft). Metabolite identity was established using a combination of an in-house metabolite library developed using pure purchased standards, and the NIST (https://www.nist.gov) and Fiehn libraries 74 . Resulting data from all samples were normalized to the internal standard d4-succinate. P values were derived using a homoscedastic, two-tailed Student's t-test and adjusted using the Benjamini-Hochberg correction procedure.

Liquid chromatography metabolomics
Metabolomics data were generated by growing the appropriate yeast strains in synthetic complete medium supplemented with 2% glucose until they reached saturation (n = 3). Cells were then transferred to S-minimal medium containing 2% raffinose and leucine and collected after approximately 8 h (n = 3) at OD 600 0.6-0.8.
The procedures for metabolite extraction were performed as previously described 75 . Yeast cultures were pelleted, snap-frozen and kept at −80 °C. Then 5 ml of 75% boiled EtOH was added to every frozen pellet. Pellets were vortexed and incubated at 90 °C for 5 min. All samples were then centrifuged at 5,000g for 10 min. Supernatants were transferred to fresh tubes, evaporated overnight in a Speed Vacuum and then stored at −80 °C until they were run on the mass spectrometer.
The conditions for LC are described in previous studies 76,77 . Briefly, a hydrophilic interaction LC method with an Xbridge amide column (100 × 2.1 mm, 3.5 μm) (Waters) was employed on a Dionex (Ultimate 3000 UHPLC) for compound separation and detection at room temperature. The mobile phase A was 20 mM ammonium acetate and 15 mM ammonium hydroxide in water with 3% acetonitrile, pH 9.0, and the mobile phase B was acetonitrile. The linear gradient was as follows: 0 min, 85% B; 1. MS was performed as described in previous studies 76,77 . Briefly, the Q Exactive MS (Thermo Scientific) is equipped with a heated electrospray ionization probe, and the relevant parameters are as listed: evaporation temperature, 120 °C; sheath gas, 30; auxiliary gas, 10; sweep gas, 3; spray voltage, 3.6 kV for positive mode and 2.5 kV for negative mode. The capillary temperature was set at 320 °C and S-lens was 55. A full scan range from 60 to 900 m/z was used. The resolution was set at 70,000. The maximum injection time was 200 ms. Automated gain control was targeted at 3,000,000 ions.
Data were collected, metabolites were identified and their peak area was recorded using El-Maven software (version 0.12.0) (refs. [78][79][80]. A pre-entered compound list of m/z values and corresponding metabolites was utilized to enable El-Maven EIC (extracted ion chromatogram) extraction of all samples. A manual visual examination of peaks selected by El-Maven was performed and misannotated peaks were manually corrected and exported as an Excel spreadsheet (Microsoft), as described in refs. 78,80. Metabolite identity was established using a combination of an in-house metabolite library developed using pure purchased standards, and the NIST (https://www.nist.gov) and Fiehn libraries 74 . P values were derived using a homoscedastic, two-tailed Student's t-test and adjusted using the Benjamini-Hochberg correction procedure.

Correlation analysis
To correct the expression bias arising from highly expressed genes, gene expression data were first corrected using spatial quantile normalization (SpQN; version 1.0.0) for each dataset with the first four principal components being removed for each dataset 50 . Genes were considered co-expressed in refine.bio datasets if SpQN-normalized Pearson's r > 0. 5  GO terms, the GO term with the highest fold change was used for the visualization. Enrichment FDRs and fold changes were visualized as bubble plots generated using seaborn (version 0.11.0) and Matplotlib (version 3.4.2) (refs. 63,85). Scatter plots of co-expressed genes against the gene of interest were generated using the regplot() function from seaborn (version 0.11.0) and Matplotlib (version 3.4.2) (refs. 63,85).

Sensitivity analysis
For the original input metabolomics datasets from Wikoff, et al. 29 and the 12 h timepoint mct1Δ yeast, six replicates each of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of metabolites missing were generated. For the proteomics dataset for the 12 h timepoint mct1Δ yeast, 0%, 30% or 60% of proteins missing were generated as above. The random seed for each replicate was distinct but consistent between re-analyses. Each of these datasets were then processed by Metaboverse version 0.10.0 using default parameters, and 'Average' patterns from the Wikoff dataset and 'ModReg' patterns from the mct1Δ were output. Box plots and heat maps of the resulting data were generated using pandas (version 1.

Other visualization
Heat maps were generated using the clustermap() function from seaborn (version 0.11.0) and Matplotlib (version 3.4.2) using custom gene, protein or metabolite lists 63,85 . Heat map values were mean centred at 0 (z score). Hierarchical clustering was performed where indicated by the linkage lines using a simple agglomerative (bottom-up) hierarchical clustering method (or unweighted pair group method with arithmetic mean (UPGMA)). Gene comparison box plots were generated using the swarmplot() and boxplot() functions from seaborn (version 0.11.0) and Matplotlib (version 3.4.2) (refs. 63,85). Single-cell dot plots were generated using scanpy (version 1.8.2) (ref. 86). The pathway images in Fig. 4d and Supplementary Fig. 7 were generated manually using Adobe Illustrator (https://www.adobe.com/). Hex values for each protein or metabolite were determined by converting the fold change values in Python https://www.python.org/) using Matplotlib 64 .

Biological materials availability
All biological materials related to this manuscript are available without restriction. Requests for biological materials should be directed to J.R. (rutter@biochem.utah.edu).

Statistics and reproducibility
P values associated with log 2 fold changes for Metaboverse integration were calculated using a two-tailed, homoscedastic Student's t-test and adjusted using the Benjamini-Hochberg correction procedure, except for RNA-seq data, which used DESeq2 (version 1.22.1) to determine FDRs 69 . Other details are listed in Methods. For yeast experiments, samples were prepared with separate and fresh preparations with three to six biological replicates in each experimental or control group, as detailed in Methods and elsewhere as appropriate within the manuscript. In the case of the refine.bio yeast cohort, the entire WT sample cohort was used as specified in the manuscript text. For the public human LUAD datasets, the Wikoff 2015 study 29 contained 39 tumour tissue samples and 39 paired normal tissue samples; and TCGA data contained 487 gene expression samples total that were relevant to this study. No statistical method was used to predetermine the sample size. Sample sizes for high-throughput data generated for this study were chosen on the basis of a first-principles estimated understanding of the number of samples needed to generate expected statistical distributions on the basis of the data type. Statistical values were then adjusted for false positives following the convention for the respective data type. Other data were previously generated for other studies. For survival analysis, TCGA data were right censored and then removed if no days to death or censored days to death were available. Metabolomics samples that did not pass basic quality control (n = 1) were excluded from further analysis. No additional data were excluded. All biological assays were repeated at least three times. All replication attempts were successful. Verification of plasmid construct expression by western blot was performed once as a simple validation that the construct was being overexpressed. Samples were randomized during sample preparation, but not during sample collection for the yeast RNA-seq and yeast metabolomics data, or were previously collected (human TCGA data, human metabolomics data and yeast proteomics data), or otherwise not amenable to randomization (yeast growth spot tests and so on). Yeast sample ordering and handling would have otherwise been randomized during sample processing. Data were either previously collected (human TCGA data, human metabolomics data and yeast proteomics data), blinded during sample preparation but not sample collection (yeast RNA-seq and yeast metabolomics) or were otherwise not amenable to blinding (yeast growth spot tests and so on). Yeast samples were additionally difficult to blind during growth and collection as exact growth rates need to be measured throughout, and often correlate with genetic background.
All analysis code, raw and processed data are available in the above-named repositories or at ref. 55. This archive is organized by the figure number, and the source data, references to source data and code needed to replicate associated figures are included.

Protocols
A step-by-step protocol describing use of Metaboverse can be found at Nature Protocol Exchange 87 .

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Fig. 1 | Metaboverse provides a simple, interactive user interface for processing and exploring multi-omics datasets and metabolic patterns. a. An overview of the graphical user interface of Metaboverse and a summary of analytical submodules contained within the platform. b. Overview of back-end metabolic network curation and data layering. .sbml refers to the systems biology markdown language-formatted resource containing the metabolic network information for the model organism. Colored circles represent subcellular compartments, biochemical reactions are stars, metabolites are circles, and proteins are squares. Grey arrows represent core reaction relationships and red arrows represent reaction inhibitor relationships. Progressively blue shades indicate decreased measurements between two conditions and progressively red shades indicate increased measurements between two conditions. The dashed box represents an identified reaction pattern, or net change across the measured components of a reaction. Fig. 2 | Metaboverse identifies a variety of metabolic patterns and enables pattern identification with sparse measurements. a. Examples of a selection of reaction patterns available in Metaboverse. Reactions are depicted as stars, metabolites as circles, protein complexes as squares, and proteins as diamonds. Core interactions (inputs, outputs) are depicted as grey arrows, reaction catalysts as green arrows, and reaction inhibitors as red arrows. Component measurements are depicted in a blue-to-red color map, where lower values are more blue and higher values are more red. b. Example sub-networks where a reaction collapse would occur. Measured components are depicted as red circles, unmeasured components as white circles, and reactions as stars. Core interactions (inputs, outputs) are depicted as grey lines and identical components that would form the bridge between two reactions are depicted as dashed black lines between circles. A collapsed reaction is depicted as a star with a dashed border and its summarized connections between measured components are dashed black lines between a measured component and a reaction node. https://doi.org/10.1038/s41556-023-01117-9

Extended Data
Extended Data Fig. 4 | Metaboverse identifies reaction patterns in xanthine and TCA metabolism. a. Identification of xanthine regulation by both the pattern recognition and perturbation analysis modules. b. Disruptions of TCA metabolism support canonical disruptions during adenocarcinoma development. Metabolomics values are shown as node shading, where an increasingly blue shade indicates downregulation, and an increasingly red shade indicates upregulation. Measured log 2 (fold change) and statistical values for each entity are displayed below the node name. A gray node indicates a reaction. A bold gray node with a purple border indicates a motif at this reaction. Circles indicate metabolites, squares indicate complexes, and diamonds indicate proteins. Gray edges indicate core relationships between reaction inputs and outputs. Green edges indicate a catalyst, and red edges indicate inhibitors. Dashed blue edges point from a metabolite component to the protein complex in which it is involved. Dashed orange edges point from a protein component to the protein complex in which it is involved. Protein complexes with dashed borders indicate that the values displayed on that node were inferred from the constituent protein and metabolite measurements. Hub limit was set at 30 during generation of the network visualization as shown in sub-panel b. Source numerical data are available at 64 . https://doi.org/10.1038/s41556-023-01117-9 Extended Data Fig. 9 | Metabolite relative abundance changes during CTP1 overexpression in mct1 Δ or wild-type backgrounds at 12 hours growth in raffinose. Boxplot overlaid with swarm plot for metabolites quantified by LC-MS in the mct1 Δ or wild-type background with either an empty vector or vector overexpressing CTP1 for a. glucose, b. fructose 6-phosphate (F6P), c. fructose 1,6-bisphosphate (F16BP), d. pyruvate, e. CoenzymeA species (CoA), f. citrate, g.α-ketoglutarate (a-KG), h. glutamine, i. glutamate, j. succinate, k. fumarate, l. malate, m. adenine, n. alanine, o. arginine, p. asparagine, q. aspartate, r. ATP, s. inosine, t. leucine, u. lysine, v. uracil, and w. valine. Cells were transferred from media with 2% glucose to media with 2% raffinose, allowed to grow for 12 hours, then harvested. All measurements were normalized using the average of the WT + EV samples for each metabolite. Each comparison group contained n = 3 samples. Center line represents data median, top and bottom lines represent 1.5x interquartile range. Individual data points are visualized as points. Source numerical data are available 64 .