Metabolic profiling of cancer cells reveals genome-wide crosstalk between transcriptional regulators and metabolism

Transcriptional reprogramming of cellular metabolism is a hallmark of cancer. However, systematic approaches to study the role of transcriptional regulators (TRs) in mediating cancer metabolic rewiring are missing. Here, we chart a genome-scale map of TR-metabolite associations in human cells using a combined computational-experimental framework for large-scale metabolic profiling of adherent cell lines. By integrating intracellular metabolic profiles of 54 cancer cell lines with transcriptomic and proteomic data, we unraveled a large space of associations between TRs and metabolic pathways. We found a global regulatory signature coordinating glucose- and one-carbon metabolism, suggesting that regulation of carbon metabolism in cancer may be more diverse and flexible than previously appreciated. Here, we demonstrate how this TR-metabolite map can serve as a resource to predict TRs potentially responsible for metabolic transformation in patient-derived tumor samples, opening new opportunities in understanding disease etiology, selecting therapeutic treatments and in designing modulators of cancer-related TRs.


Analysis of tissue signatures across four omics data types
Pair-wise similarity between cell lines was estimated by calculating mutual information 98 between transcriptome 1 , proteome 2 , drug sensitivity 3 , metabolite exchange rates 4 and intracellular metabolome profiles. Euclidian distance was used to estimate similarity in growth rates across cell line pairs. Receiveroperator-characteristic (ROC) curve analysis was then applied to test for similar molecular profiles across cell lines from the same tissue type (main text Figure 1c). To find metabolites featuring significant difference in relative abundances across tissue types we used one-way ANOVA analysis. Statistical significance (i.e. p-values) were corrected for multiple hypothesis testing using the Storey method 5 (i.e. q-values). Metabolites exhibiting a q-value below 0.05 for at least one tissue were considered significantly tissue-dependent.

Measurement of glucose and lactate exchange rates
Uptake of glucose and secretion of lactate was quantified in 54 adherent cell lines from the NCI-60 panel. Supernatants of growing cell cultures were collected every 24 hours for 5 days of cultivation. Samples between 20 and 80% confluence were retained for estimates of glucose uptake and lactate secretion. Residual glucose in culture supernatants was quantified using an enzymatic assay involving coupled glucose oxidase and peroxidase reactions (GOPOD format, K-GLUC, Megazyme, Bray, UK). For each sample, 5 µL supernatant was mixed with 150 µL assay reagent, and incubated at 40°C for one hour before reading absorbance at 510 nm. Lactate was quantified from FIA-TOFMS measurements (as described above) against a standard calibration curve comprising different lactate concentrations in constant RPMI1640 background. To estimate exchange rates, measured amounts of glucose and lactate were fitted with a linear model relating ion intensity to cell numbers over 3 biological replicates. For each cell line estimated curve slopes were multiplied by the corresponding cell line growth rates and normalized by the volume correcting factor (Supplementary Data 1).

RNA interference, siRNA transfection and HIF-1A knockdown
Transfection-and pooled siRNA reagents were obtained from Horizon Discovery Group Company, Lafayette, CO: DharmaFECT 1 siRNA Transfection Reagent (cat. no. T-2001), ON-TARGETplus Human HIF1A siRNA -SMARTpool (cat. no. L-004018-00) and ON-TARGETplus Non-targeting Pool (cat. no. D-001810-10). Upon receipt, siRNAs were resuspended in 1x siRNA buffer (B-002000-UB, diluted 1:5 in sterile-filtered and nucleasefree water) and stored at -20°C. To verify transfection of IGROV1 cells with these reagents, a fluorescent control siRNA (BLOCK-iT Alexa Fluor Red Fluorescent Control, cat. no. 14750100, Thermo Scientific) was transfected to monitor internalization of siRNA into cells using a Nikon Ti-E inverted microscope equipped with an LED illumination system (pE-2, CoolLED, Andover, UK) and a Hamamatsu ORCA (C4742-95-12ER) camera. Cells were counted manually in phase contrast and fluorescence images, respectively, and a transfection efficiency of 92% was calculated (see Supplementary Figure 6). For knockdown and dynamic metabolome profiling experiments, IGROV1 ovarian cancer cells were resuspended in RPMI1640 cell culture medium (5% dFBS, 2 g/L glucose, 2 mM glutamine) without antibiotics, seeded at low cell density (approx. 20% confluence) in 96-well plates and allowed to attach at 37°C and 5% CO2 for approximately 6 hours. The transfection mix was prepared as per manufacturer's instructions. For each well to be transfected, 0.25 µL transfection reagent were mixed with 9.75 µL serum-and antibiotics-free medium. In parallel, 5 µM stock solutions of siRNA were pre-diluted in a total volume of 10 µL serum-and antibiotics-free medium per well to achieve final concentrations at transfection of 10, 25 and 50 nM of HIF-1A siRNAs, and 25 nM of the non-targeting control siRNA. After incubating the separate components at room temperature for 5 minutes, both were mixed and incubated for 20 minutes at room temperature. Finally, 20 µL of the medium in each well was replaced by 20 µL transfection mix in three biological replicates per condition. Transfected cell cultures were then incubated at 37°C and 5% CO2 atmosphere for a total of 144 hours. Nine replicate 96-well plates were prepared: two for metabolomics sampling (using the protocol described in Supplementary Figure 1) at time point: 48, 64, 87 and 111 hours after transfection, and one plate for continuous monitoring of cell confluence in a TECAN Spark 10M plate-reader. Sampling time-points were selected to be between one and two population doublings after transfection (Supplementary Figure 6) to allow for sufficient reduction of HIF-1A protein levels by dilution during growth. The presence of fluorescent control siRNA in separate control cells was verified as described above at all sampling time-points to ensure sustained transfection. Cell extracts were analyzed by flow-injection TOFMS as described above. Analysis of time-dependent fold-changes relative to the non-targeting control was performed as described in detail in ref. 6 .

Associations between TR activity and drug action
In this analysis we used publicly available data (http://dtp.cancer.gov/mtargets/mt_index.html) on the sensitivity of the NCI-60 cell lines to 21738 compounds. Cell-line drug sensitivity is reported as the normalized Z-score of the GI50 values (e.g. the concentration of a compound that causes 50% growth inhibition, relative to the no drug control). Here we considered only the 130 FDA approved drugs (Supplementary Data 4). Some of the FDA-approved anticancer drugs were tested in the NCI-60 screen twice, resulting in the overall selection of 187 drug sensitivity profiles across the NCI-60 tumor cell panel. The influence of TRs on drug susceptibility is inferred using a multivariate statistical analysis to decouple the contribution of different tissues of origin from changes in TR activity. For each TR i and drug j, drug susceptibility (S) is modeled as a linear combination of changes in TR activity (TR α ) and tissue-specific offsets values (β): Supplementary eq. (1) λ and β values are estimated using least squares fitting. The significance of the association between changes in TR activity and drug susceptibility is estimated by calculating the p-value of the Spearman correlation between TR activity and drug susceptibility after removal of tissue-specific offset β values across the 54 cell lines (i.e. TR i d j − Lung − . . . − Breast ). The inferred slope λ, p-value significance and Z-score normalized βKidney values are used in Figure 3e-f, to visualize the inferred relationships between drug susceptibility and changes in VHL and HIF-1 activity. In order to find significant associations between TRs with common biological functions and drug modes of action (Figure 3d)

Prediction of metabolite-TR effectors
Because of the poor correlation between protein levels and NCA estimated TR activities (Supplementary Figure  5), we systematically investigate whether variation in TR activities across cell lines could be explained by alternative models. Instead of assuming a base model where the TR activity is a simple linear function of protein abundance, we model TR activity as a function of TR protein levels and the post-translational regulatory functions of metabolites and/or kinases, individually or in combination as follows: Where K1,2,3,4 are free parameters in the model, TR j is the relative protein abundance associated to TR j , Px represents the relative protein abundance of kinase x and Mi the level of metabolite i. For models in which we assume the non/inhibitory actions of a single metabolite or kinase, we use a non-linear model fitting scheme to find the best set of 3 parameters to describe changes in TR activity across cell lines. To minimize the probability of local minima, we adopted the GlobalSearch function of Matlab and 50,000 starting points. By using the same approach, we tested all possible combinations between pairs of metabolites and kinases that can act either as activators and/or inhibitors, to find the best set of 4 parameters that describe TR activity. For each pair or triplet of TRs and metabolites and/or kinases we estimate the Mean Squared Error (MSE) associated to each tested model. In addition, we estimated the MSE when fixing the model parameters and randomly permuting TR activity, TR protein, metabolite and kinase levels across cell lines (MSE e ). For each TR we then repeat model fitting, each time by randomly shuffling metabolite and kinases levels. We used this approach to empirically assess the significance of each model in better explaining TR activity. To this end, we calculated MSE values for each model in which kinases and metabolite levels were randomly permuted (MSERandom and MSE e Random ). Finally we retain only those models in which MSE and MSE e are both above the 0.1% of the respective distributions of MSERandom and MSE e Random (Supplementary Data 5). The proteome dataset (ArrayExpress, project accession: E-PROT-2) reports protein levels for 100 annotated TRs and 63 kinases, and to reduce computation time we here consider only differentially abundant metabolites annotated to KEGG identifiers (260 metabolites selected). Fitting analysis was performed in Matlab using the "fitnlm" function, on a cluster with 900 computational nodes.

Mathematical modeling of metabolic pathways and TR-metabolite correlations
Here, we describe a set of assumptions sufficient to formally explain and describe the existence of a dependency between metabolite abundance and TR activity. We assume that a TR has the ability to directly regulate the flux demand or supply of a linear metabolic pathway, which consists of 4 intermediate metabolites, 3 irreversible and 1 reversible reactions, which we assume to follow Michaelis-Menten kinetics. A schematic representation is illustrated in Supplementary Figure 5, panel a.
TRs can regulate reaction rates by changing the abundance of enzymes. In reactions that operate above maximum enzyme capacity regulation of enzyme abundance confers to the TR high control on pathway usage. In such a case, the flux through the reaction is largely proportional to the amount of enzyme (Ei), which we assume to be a monotonous function of TR activity. Specifically, we assume enzyme abundance to be a positive linear function of TR activity. The individual control coefficient that is defined as the slope of the log-log plot of flux versus TR activity reflects the influence of TR on the flux through the pathway 7 . In this particular example where the flux decreases by the same relative amount as the TR activity, the value of the control coefficient is 1. Hence, at steady state, flux through the pathway is set by and proportional to TR activity. where M represents metabolite abundance, vinput is the flux through the pathway at steady-state, Km, Kcat and Vmax are the Michaelis-Menten constant, the catalyst rate constant and maximum flux capacity, respectively. Notably, kinetic parameters (i.e. Kcat and Vmax) can be assumed to be largely conserved across different cell lines, and enzyme levels of reactions mainly regulated by substrate abundance are expected to exhibit little fluctuations. In support of this assumption, we observed that the median fluctuation of protein levels, estimated as the standard deviation of protein levels across cell lines divided by the corresponding average values (i.e. coefficient of variance), is below 50%. Hence, for pathways in which steady-state flux is directly regulated by activity of one TR, and intermediate reactions are governed by substrate abundances, one could expect to find a correlative signature between the relative TR activity and metabolite abundances across cell lines, similarly to the simplified metabolic toy model described here.
It is worth noting that the incidence of metabolic reactions that operate above or below maximum enzyme capacity is still a matter of debate. While recent experimental evidence demonstrates that nearly 70% of reactions in central carbon metabolism are likely to operate at enzyme saturation 8 (i.e. substrate abundances exceeding reactions Km), a growing body of evidence advocates for a scenario where a large array of posttranslation inhibitory mechanisms can increase the apparent Km of metabolic reactions 9,10 . This possibility might shift the balance in favor of metabolic reactions to be mainly governed by substrate abundances, supporting the large relevance that TR-metabolite associations might have in identifying functional regulation of TRs on cell metabolism.

TR regulatory activity in metabolic pathways -example associations
Tumor suppressor TP53. p53, one of the most well-studied tumor suppressor proteins 11,12 , significantly correlates with levels of intermediates in nucleotide metabolism (q-value < 0.05), such as purine biosynthesis and nucleotide sugar metabolism and arginine/proline metabolism. Both associations are consistent with previous experimental evidence 5-7 directly pointing to the well-known functional role of p53 in allocating metabolic resources for DNA repair, DNA replication and cell division [11][12][13] and regulation of arginine and proline metabolism 14 (see also next section).
Proline metabolism. In recent years, proline metabolism has received considerable attention as a source of NAD/NADP regeneration 15 , but also as a central integration point of signaling information 16 . While to date, only few cancer-related TRs have been associated to the regulation of proline metabolism, we here found TR-proline metabolism associations recapitulating known regulatory interactions (KEGG pathway enrichment q-value < 0.05), such as the induction of proline catabolism by tumor suppressor p53 14 (Figure 3a), as well as the regulation of proline metabolism by ATF4 as part of the amino acid starvation response pathway 17,18 . Overall, we found 197 TRs that can potentially impinge on proline metabolism (q-value < 0.05), such as EGR1 19,20 and EGR2 21 , REL-family transcriptional regulators 22 and ATF-family TRs 23 (full list in Supplementary Data 3).

MYC and its interactome.
MYC is a global regulator that takes part in a number of growth-promoting signaling pathways, many of which ultimately impinge on metabolism 24 . Discovered many years ago and recognized for its frequent hyper-activation in human cancers, MYC is among the most well-studied oncogenes 25,26 . MYC forms stable dimers with the MAX protein to gain specific DNA-binding activity 27 . The formation of MYC/MAX dimers is affected by MAX-binding proteins, most prominently from the MXD family of MAX dimerization proteins 28,29 . MXD proteins are transcriptional repressors that act as MYC antagonists not only by competing for available MAX with MYC, but MAX/MXD and MYC/MAX dimers also compete for DNA binding sites. MXD proteins and other members of the dense network of TRs around MYC (the so-called MYC interactome) can hence attenuate MYC-dependent gene expression by a plethora of homo-and hetero-dimerization events 28,29 , and affect the functional outcome of MYC activation/inhibition 30 . Consistent with the current working hypothesis of a MYC interactome, from which MYC activity emerges as a complex function of protein-protein interactions, we found that, while MYC itself exhibits rather weak associations with metabolic pathways (Supplementary Figure 6,  It is worth noting that the evidence provided here, based on direct measurements of metabolic intermediates and their associations with TR activity, is complementary to previous reports that were predominantly based on differential gene expression and knockdown experiments. Nuclear factor kappa B (NF-κB). Nuclear factor kappa B (NF-κB) complexes act as transcriptional regulators of general stress responses 22,31 , inflammatory responses and cellular homeostasis with widespread implications for cancerogenesis 32 . The role of NF-κB dimers extends into metabolism, where NF-κB has been associated with p53-mediated metabolic reprogramming of energy metabolism 33,34 . NF-κB complexes can be homo-or heterodimers of different subunits, including NF-κB1/p105, NF-κB2/p10, RelA/p65 and RelB, which exhibit different specificity for DNA elements 35,36 and can hence trigger transcription of specific sets of target genes. NF-κB activity is regulated by members of the inhibitor of κB (IκB) protein family that can affect NF-κB localization 37 by masking its nuclear localization signal sequence. Our metabolome-based TR-metabolite associations hint at metabolic functions specific of different NF-κB complexes (Supplementary Figure 6, panel c). In glycolysis and pentose phosphate pathway, we found strong associations of NF-κB2 and RelA. Similarly, Bcl-3, a member of the IκB protein family, exhibited a strong association with pentose phosphate pathway ( Figure 2 in the main text). These associations are consistent with the known gene targets of NF-κB1 and RelA ( Figure 2 in the main text), while interactions of NF-κB2, RelB and Bcl-3 with central carbon metabolism were not annotated in the TR-gene interaction network.
Nuclear receptor Rev-ErbA-Alpha (NR1D1). The NR1D1 gene encodes the heme-responsive nuclear receptor Rev-ErbAα, a sequence-specific transcription factor 38 . By acting predominantly as a transcriptional repressor, Rev-ErbA regulates a number of cellular processes, including lipid metabolism and the circadian rhythm 39,40 . In cancer, Rev-ErbA is often found co-expressed with ERBB2 41 , encoding the Her2 oncogene, and there has been considerable interest in targeting Rev-ErbA and the frequent disruption of the circadian clock in cancer for therapeutical purposes 42,43 . Among metabolites associated with Rev-ErbA-Alpha in our TR-metabolite association network, we find associations to lipid metabolism and membrane biogenesis (glycerophospholipid metabolism, choline metabolism), as well as to one-carbon-, carbohydrate-, purine-and amino acid metabolism (histidine, glutamine and glutamate). A similar association to lipid metabolism has been previously suggested based on changes in gene expression and altered lipid metabolism phenotypes in gene knock-down experiments 40 .

Mapping oncogenic TRs to in vivo metabolic reprogramming -renal, lung and colon cancer
In the following sections, we discuss our prediction of TR drivers responsible for metabolic rearrangements in cohorts of patients with renal, lung or colon cancer. The predictions are based on a novel TR-metabolite association network established in vitro. The interplay between transcriptional regulation and metabolism is investigated by direct experimental measurements of intracellular metabolite levels and changes in TR activity across 53 cell lines from the NCI60 panel of human tumor cell lines (see main text and Methods section for full detail). Our integrative analysis of TR-metabolite associations and in vivo metabolic changes between cancer-and healthy tissue can help to identify transcriptional regulators that directly contribute to the metabolic rearrangement and aid the therapeutically relevant subtyping of cancers and patients.

Clear-cell renal cell carcinoma.
Clear-cell renal cell carcinoma (ccRCC) constitutes more than 70% of all renal cancers 44 . Its genetic basis has been extensively characterized [45][46][47] , and paints a picture in which metabolism plays a crucial role in the development and manifestation of ccRCC 48 Table 3.
Lung cancer. Lung cancers are most frequently associated with mutations in TP53 (46% of cases) 50 , a type of oncogenic mutation that is shared with a wide range of cancer types.
In the cohort of matched lung cancer/normal fibroblast cells reported by Chaudhri et al. 51 , we matched 209 metabolites to our TR-metabolite association network, of which 80 were detected and quantified in the lung cancer cells, with mean absolute fold changes ranging from 1.005 (creatine) to 2.9 (phosphoenolpyruvate). A summary of the metabolites and the corresponding strength of association with the top 1% of TRs recapitulating the observed changes between lung cancer and normal fibroblasts in the dataset is given in Supplementary Figure 7, panel b. The median association strength to this set of metabolites was highest for NFYB, followed by YBX1, NFYC, GFI1, FOXN1, HOXC13 and AATR. Several metabolites exhibiting a high absolute fold change showed a consistently strong association to the seven TRs. This group of metabolites mainly included long-chain fatty acids such as arachidonate, eicosapentaenoate, adrenate and docosapentaenoate. A list of the top 15 predictive metabolites for each TR is provided in Supplementary Data Table 3. The fact that p53, despite being among the most frequently mutated genes in lung cancer 50 , is not among the top-ranking TRs likely indicates that there are downstream TRs which can more directly explain the observed metabolic reprogramming between lung cancer cells and normal fibroblasts. Notably, several of the top 1% most significant TRs found in our analysis are linked to p53, including GFI1 52,53 and AATF 54 . The top-ranked TRs NFYB and NFYC are two subunits of the heterotrimeric NF-Y complex, a regulator of cell cycle genes and proliferation 55 . NF-Y appears to act as a downstream effector of both wild-type 56 and mutant p53 57 , interestingly changing its role from transcriptional repressor to a transcriptional activator in the latter case. Recently, NF-Y has been described in context of alterations in metabolism 58 , including lipid and fatty acid metabolism, glycolysis as well as TCA cycle, where NF-Y binding sites are particularly frequent 58 . Several of the predicted TRs have been associated with lung cancers, including YBX1 (YB-1) 59 , GFI1 60 and HOXC13 61 . . CEBPE and POU2F2 showed minor associations to only 2 metabolites each. Clearly, given the small number of metabolites quantified in the colon cancer samples, our prediction of an involvement of these TRs would have to be substantiated with additional data (i.e. more quantified metabolites).
Here, we predict an involvement of RELB, a subunit of the NF-κB complex that is involved in inflammation and plays a critical role in malignant transformation and cancer progression 66 , with particular relevance in colitisassociated cancers 67,68 . Notably, several of the predicted TRs have previously been associated with signaling pathways that are de-regulated in colon cancer (e.g. Wnt 62,63 and SMAD 69 signaling), including CBX7 70,71 , MLLT10 72 and ZNF217 73,74 .

Metabolite and/or kinase effectors of TR activity -example predictions
In the following sections, we provide selected examples for regulatory interactions affecting TR activity, as predicted in silico using non-linear model analysis to integrate cellular information across three layers, i.e. metabolome, proteome and transcriptome (see main text and Methods section).
Choline. Among the most highly connected metabolites predicted by our model-based fitting analysis is choline (118 interactions with 36 TRs), substrate of the first step in the biosynthesis of phosphatidylcholine, a key component of eukaryotic cellular membranes. Recent studies have revealed a deep interplay between oncogenic signaling and choline metabolism, such that altered choline levels have become a metabolic hallmark of malignant transformation 75 . Our results support this key role, suggesting that choline availability could trigger pleiotropic transcriptional changes.
EIF2AK2/PKR activation by oxidative stress. Eukaryotic Translation Initiation Factor 2 Alpha Kinase 2 (EIF2AK2, also known as PKR) is a regulatory protein acting on eukaryotic initiation factors (eIFs) that mediate the initiation of protein synthesis 76 . In response to various stress signals 77 such as oxidative stress 78,79 , EIF2AK2/PKR phosphorylates the subunit α of eIF2. This prevents the assembly of the translation initiation complex, which effectively inhibits mRNA translation 76 . Furthermore, EIF2AK2 acts as part of a signaling pathway that can activate NF-κB 80,81 , an important stress response mediator, and inflammation 82 .
Here, we predict an interaction between EIF2AK2 and oxidized glutathione (GSSG), which is part of the intracellular redox buffer GSH/GSSG, in combination with several different kinases (Supplementary Data 5). As cells face oxidative stress, excess reactive oxygen species (ROS) are neutralized in part through oxidation of reduced glutathione (GSH), leading to the accumulation of GSSG while the GSH pool is depleted. Serving as a signal for oxidative stress, the predicted interaction may hence represent a mechanism underlying the oxidative stress-mediated activation of EIF2AK2/PKR. The only other interactions predicted for EIF2AK2 are with choline and N-acetylserotonin, in combination with CKB and CKM kinase, respectively.
Effectors of tumor suppressor/oncogene p53. The extensively studied tumor suppressor p53 11,83 has been associated with a plethora of cellular processes, from apoptosis induction 84 and the DNA damage response 85,86 to metabolism 87,88 . Here, our model fitting-based analysis indicates an influence of ATP and fatty acid metabolites such as lipoic acid and myristic acid, as well as the polyamine intermediate N-acetylspermine and the cholesterol metabolite hydroxycholesterol on p53 activity (Supplementary Data 5). ATP was previously found to directly bind and stabilize p53-DNA complexes 89 . Interestingly, the interaction with p53 was one of only two interactions we predicted for ATP, the second one being with interleukin enhancer-binding factor 2 (Ilf2). Lipoic acid has been attributed a potential involvement in preventing p53 protein degradation 90 , while myristic acid, as a representative of saturated fatty acids, can act as a negative regulator of the DNA damage response 91 to promote cell proliferation. N-acetylspermine is an intermediate in the conversion of spermine to putrescine, catalyzed by spermidine/spermine N(1)-acetyltransferase (SAT1). The polyamines spermine, spermidine and putrescine are required for cell growth 92 , and have been suggested to engage in regulatory interplay with p53 93,94 . Hydroxycholesterol is an oxygenation product of cholesterol 95 , which has been suggested to act on p53 activity through modulation of the estrogen receptor in breast cancer 96 . Taken together, we predict several small-molecule compounds that could potentially affect p53 activity. Given the drastically different roles of p53 in a wild-type versus mutant p53 setting 97 , the therapeutic relevance of these interactions remains to be elucidated. Searching for drug molecules chemically similar to predicted effectors of p53, we found palm oil to potentially interfere with p53 activity, an effect already suggested in the literature 98,99 .

Regulation of diverse cellular processes by hydroxycholesterol.
Cholesterol is an important structural component of mammalian cells 100

Image analysis: cell number quantification from bright-field images of adherent cells
Cell concentration and viability information can be used for monitoring proliferation rates, optimizing growth conditions and normalizing cell data, like in our case metabolomics data. Here, we describe the procedure used for processing time-lapse bright-field microscopy images of adherent cells during time-course growth experiments. With the proposed protocol, we aim at a rapid quantification of cell numbers in images of live adherent cell cultures. The procedure has been optimized to scale with the number of images collected during long-term time lapse microscopy experiments, consisting of typically thousands of frames, and to generalize to cell types with radically different morphologies, without customization from the operator.
Currently, multiple platforms are routinely used for cell number quantification. Several methods are destructive and require enzymatic cell detachment using trypsin, adding increased cost to researchers and limiting flexibility as well as throughput. Identifying cells directly from microscopy images (cell segmentation) is an attractive alternative, and several image processing analysis tool are now routinely used for quantitative single-cell biology [116][117][118][119][120] . However, most methods use fluorescence microscopy, can be computationally demanding and poorly scale with the number of images. Here we propose a label-free and non-destructive approach based on bright-field microscopy imaging.
The detection and segmentation of adherent cells from bright-field microscopy images represents a challenging task. This is because (i) intensity of the background is often similar to the intensity of the cells, (ii) illumination is uneven and cells often have a surrounding bright white area (i.e. halo), (iii) contrast is poor and makes cell boundaries poorly defined, especially among sets of spatially close cells.
Here we acquired and processed bright-field images using a TECAN Spark 10M plate reader. For each well, 25 images were acquired and then stitched together, with a spatial resolution of 1.3 µm per pixel (panel a in Supplementary Figure 9). The subsequent steps of image analysis are described in detail below.
Our automatic procedure consists of 3 main steps: (i) The image is preprocessed to improve contrast of cell borders (panels b-c in Supplementary Figure 9) using a series of filter masks in Matlab (e.g. imopen, imsharpen and imfilter). (ii) Then we combined two techniques to perform cell segmentation. The first one is an empirically derived image gradient threshold selection method named Empirical Gradient Threshold (EGT) from 117 . Subsequently, we refined the segmentation using the Maximally Stable Extremal Regions (MSER) approach 121,122 using a previously developed code 123 . To speed up image acquisition by the plate reader, only the center of the well is used during the auto-focus procedure. Hence, image quality is higher in the central area, while images become more blurry (and hence difficult to segment) further away from the center of the well (panels a and d in Supplementary Figure 9Error! Reference source not found.). Therefore, to reduce the computational complexity and improve the quality of the segmentation, we restricted the area of the image that is segmented to the central round area of each well (0.05 cm from the center of the well). (iii) In the last step, we aim at estimating the characteristic size of adherent cells. Given the poor contrast of cell borders, segmentation of dense clusters of cells can fail to accurately separate individual cells. This often results in few, but large, unified blobs of cells. Other typical segmentation artefacts are associated to false identification of cells in the halo region. Such errors are more frequent, and typically associate with small segmented areas. While these wrongly segmented cells are typically a minor fraction with respect to correctly segmented cells, they can significantly bias the number of estimated cells. Hence, to cope with these classical segmentation errors, instead of directly counting segmented cells, we estimated the most characteristic cell size area based on the distribution of observed cell sizes, and use this estimate to compute total cell number as follows. While there are several freely available image analysis tools to perform cell segmentation analysis, most methods are optimized for phase contrast microscopy images 125 , are computationally demanding 123 or necessitate parameter optimization based on the cell type imaged. The performance data presented here confirm that our methodology is capable of quantitatively estimating characteristic cell size and cell concentration across several adherent cell lines, without customization. Overall, our approach offers a rapid coarse-grained segmentation analysis, from which we can derive a rapid and accurate estimate of characteristic cell size area and cell number. It is worth noting that our image analysis tool can be easily adapted to other similar plate readers or similar microcopy setups. The code is implemented in Matlab and is available for download at http://www.imsb.ethz.ch/research/zampieri-group/resources.html.

Supplementary Figures
Supplementary Figure 1 acquisition (instrumental noise). Of note, the degree of variance observed within each cell line overlaps with the observed variation in cell confluence between the same three biological replicates (panel b). Since we collected samples at different time points during growth, the number of extracted cells has a major contribution in MS-signal intensity and will play a crucial role in the subsequent normalization of raw MS data (see Figure 1 in the main text).

Supplementary Figure 3. Comparison of NCI-60 metabolome profiles with other metabolome resources. (a-e)
Here we compare the metabolic profiles measured using our approach and previous metabolic profiles measured by the Metabolon company and available at the NCI DTP Molecular Target Program (https://wiki.nci.nih.gov/display/ncidtpdata/molecular+target+data). We detected 45 metabolites annotated to KEGG identifiers common between the Metabolon and our dataset. The second component is plotted against cell size. The significant (p-value<0.05) correlation with cell size suggest that these data are not normalized to take into account differences in cell size. (f) Here we compare our metabolome dataset with a previously published metabolomics dataset 127 measuring the concentrations of 296 metabolites in 11 different tissue types of mouse: testis, pancreas, plasma, heart, kidney, liver, lung, spleen, thymus, cerebella, cerebra. Three tissue types: kidney, lung and cerebra, are in common with renal, lung and CNS tissues of origin in the NCI-60 cell lines. Since metabolite concentrations in 127 are given in absolute amounts per gram tissue, and not per cell volume, we used our data before cell volume correction (i.e. relative abundance per cell) for the comparison.
To compare the relative values across tissues, we first take the mean values of metabolite intensities in NCI-60 cell lines of the same tissue types. Next, we calculate the median Pearson correlation for the 153 metabolites in common between the two datasets. Expected correlation values were estimated by randomly selecting mouse tissue types and average and standard deviation across 1000 permutations are reported (red line). To avoid artefacts related to sensitivity, we repeated this calculation for metabolites with an increasing minimum absolute concentration (nmol/g) as measured in ref. 127 . We found that for metabolites with a concentration in at least one tissue type above 260 nmol/g, the median correlation with our data is above 0.5 and was larger than random. (g) Here we compare the results of our quantitative measurements of glucose uptake-and lactate secretion rates (y-axis) with measurements obtained for 53 cell lines in the work of Jain and colleagues 4 (x-axis). Two are the main differences between the two datasets: Firstly, instead of measuring only end-point concentrations of glucose and lactate in the spent medium, we measured changes in concentrations every 24 hours for approximately 5 days of cell growth. A linear model relating ion intensity to growth-normalized time (cell number divided by growth rate) were fitted, allowing for a more robust and quantitative assessment of uptake-and secretion rates. Secondly, instead of reporting the flux per cell, as in ref. 4, we correct for differences in cell volumes using the correction factor derived in our metabolomics approach (Supplementary Figure 2). Hence, we derive a relative measure of rates for both glucose uptake and lactate secretion expressed in fmol/h/volume (AU). .9% confidence intervals were calculated using a bootstrapping approach. The cell lines in the metabolome data set were randomized by resampling 100 times with replacement, and Spearman correlation coefficients were calculated for each randomized data set. Correlation coefficients exceeding the maximum value among 99.5 and 99.9% (0.5 and 0.1% FDR, respectively) lowest correlation coefficients were obtained from the pooled list of absolute correlation coefficients. (l) Analysis of correlations between metabolite levels and TR activity profiles in relation to their distance between TR and enzyme targets in the stoichiometric network 131 . Only TR-target genes interactions listed in the TRRUST interaction database 132 were considered. The black line represents the average pairwise TR-metabolite distance (y-axis) at different levels of correlation between metabolite abundance and TR activity (x-axis). The blue line represents the average TR-metabolite distance in 10.000 randomized stoichiometric networks. Data points are color-coded to reflect the significance (q-value, corrected for multiple tests) of TR-metabolite proximity in the stoichiometric network as compared to the randomized networks. To test the ability of our TR-metabolite association network to resolve altered TR activity, we perturbed HIF-1A mRNA levels in IGROV1 ovarian cancer cells using RNAi. To that end, we transfected IGROV1 cells with three different siRNA concentrations (10, 25 and 50 nM). Control cultures were transfected with a non-targeting siRNA. Cell growth (i.e. cell confluence, y-axis) was monitored continuously using a TECAN Spark 10M plate reader for automated time-lapse microscopy (37°C, 5% CO2). (b) Replicate IGROV1 cultures were transfected with a red fluorescent siRNA (see Supplementary Methods) to monitor cationic lipid-mediated transfection throughout the experiment using a Nikon Ti-E inverted fluorescence microscope. Cells were imaged directly in the 96-well plate without fixation. The image shows a representative section at 87 hours post-transfection in an overlay of a phase contrast and a fluorescence image. Red cells contain the fluorescent control siRNA, indicating successful transfection of IGROV1 cells (transfection efficiency: 92%). (c) Twelve TRs yielding a higher prediction score than HIF-1A share a strong association with fatty acid metabolism and may reflect effects of lipid transfection. The association to TCA cycle, however, is confined to HIF-1A and only two other TRs with a higher score than HIF-1A, CTBP1 and CTNNBIP1. , where variance in TR activity across 53 cell lines was modeled as a function of the protein abundance of a given TR, and the additional activating or inhibiting action of a metabolite and/or a kinase, alone or in combinations. In total, we considered 100 TRs and 64 kinases/phosphatases for which protein abundance data was available, and 230 metabolites exhibiting largest variation across the 53 cell lines, resulting in 6,753,600 models. We tested for each model whether the interaction significantly improves the explained variance in TR activity across the 53 cell lines, and retained 1,888 network links (FDR ≤ 0.1%). (d) Probability density function (pdf) of the number of interactions reported for the metabolites in the predicted regulatory interaction network, as compared to all metabolites for which at least one allosteric interaction has been reported 9 . P-values report on the significance of the difference in interaction frequencies (Kolmogorov-Smirnov test) between the two groups of metabolites.