Tumor metabolism and associated serum metabolites define prognostic subtypes of Asian hepatocellular carcinoma

Treatment effectiveness in hepatocellular carcinoma (HCC) depends on early detection and precision-medicine-based patient stratification for targeted therapies. However, the lack of robust biomarkers, particularly a non-invasive diagnostic tool, precludes significant improvement of clinical outcomes for HCC patients. Serum metabolites are one of the best non-invasive means for determining patient prognosis, as they are stable end-products of biochemical processes in human body. In this study, we aimed to identify prognostic serum metabolites in HCC. To determine serum metabolites that were relevant and representative of the tissue status, we performed a two-step correlation analysis to first determine associations between metabolic genes and tissue metabolites, and second, between tissue metabolites and serum metabolites among 49 HCC patients, which were then validated in 408 additional Asian HCC patients with mixed etiologies. We found that certain metabolic genes, tissue metabolites and serum metabolites can independently stratify HCC patients into prognostic subgroups, which are consistent across these different data types and our previous findings. The metabolic subtypes are associated with β-oxidation process in fatty acid metabolism, where patients with worse survival outcome have dysregulated fatty acid metabolism. These serum metabolites may be used as non-invasive biomarkers to define prognostic tumor molecular subtypes for HCC.


Results
Tissue metabolites can distinguish between tumor and non-tumor tissue. The overall flowchart of the analysis strategy is illustrated in Fig. 1A, with a detailed analysis flowchart shown in Supplementary  Fig. S1. To test whether tissue metabolites can distinguish tumor and non-tumor tissues of 56 HCC patients, Principal Component Analysis (PCA) was used to classify tissue samples based on metabolites measured by an untargeted metabolomics approach (Fig. 1B). Based on the 500 metabolites with highest variability among samples, tumor and non-tumor tissues can be separated clearly. To visualize the extent of tissue metabolites and pathways that are dysregulated, differentially altered metabolites between tumor and non-tumor tissues were overlaid onto overall human metabolite map from KEGG 15 . The 562 known detected tissue metabolites can be converted to 352 KEGG compound IDs from the total of 1271 metabolites present on the map (accession number hsa01100). The affected metabolites in HCC fall into four main metabolic categories, i.e., nucleotide metabolism, amino acid metabolism, energy metabolism, and lipid metabolism (Fig. 1C).
Metabolic genes, tissue and serum metabolites can identify patients with worse survival outcome. We used metabolism-specific genes to further determine if they can classify HCC into prognostic subtypes. To ensure that the metabolic genes and tissue metabolites are tumor-specific, we incorporated the calculation of a paired t-test between tumor and adjacent non-tumor tissue gene expression. For the first step of the correlation analysis, the numbers of metabolic genes and metabolites that passed correlation thresholds from a permutation test and a paired t-test with an FDR-adjusted p-value of < 0.05 were 491 genes and 78 metabolites, respectively (Supplementary Table S1). Among 78 metabolites that passed the correlation coefficient thresholds (Supplementary Table S2), 40 metabolites passed an additional 1.5 fold-change cutoff between tumor and adjacent non-tumor tissues ( Fig. 2A, Supplementary Fig. S2A, and Supplementary Table S3). We performed hierarchical clustering analysis on Thai HCC patient samples by using consensus clustering algorithm 16 based on these 491 metabolic genes (MetGene) or 40 tumor-specific metabolites (TissueMet) which revealed three stable clusters based on metabolic genes or tumor-specific tissue metabolites, named MetGene-C1, MetGene-C2, and MetGene-C3, or TissueMet-C1, TissueMet-C2, and TissueMet-C3, respectively ( Fig. 2B-upper panel). The metabolic gene expression based on 491 genes shows distinct expression patterns between the three clusters, which are similar to previously identified common Asian subtypes ( Fig. 2B-middle panel) 13 . Similar to global transcriptome profiling analysis, there was no discernible pattern emerging from standard clinical and etiological parameters of HCC ( Fig. 2B-lower panel). These results suggest that metabolic genes can capture major molecular features of tumor samples. We also explored whether these subclasses were consistent with prognostic signatures published among HCC cohorts. To do so, we assessed S1-S3 signature 17 , stem cell signature 18 , EpCAM signature 19 , Metastatic signature 20 , CCA-like signature 21 , Cluster 2 signature 22 , proliferation signature 23 and found a concordance between the good and poor survival subclasses defined by the metabolic genes and those defined by published signatures. We also performed consensus clustering using the same set of metabolic genes in two other cohorts: Liver Cancer Institute (LCI) 20 cohort and The Cancer Genome Atlas (TCGA) 24 cohort on Asian patients only. Similar gene expression patterns were observed in both cohorts ( Supplementary  Fig. S3A, B). The resulting MetGene clusters from LCI and TCGA cohorts were then subjected to Subclass Mapping analysis to identify clusters that matched the original consensus clusters from the TIGER-LC cohort. MetGene-C1 and MetGene-C2 clusters matched the original clusters from TIGER-LC cohort ( Supplementary  Fig. S3C, D). To identify serum metabolites that are directly associated with tumor tissue metabolites, we performed a second step of correlation analysis between the 40 tumor-specific tissue metabolites and global serum metabolites, which resulted in a set of 75 correlated serum metabolites (Fig. 3A and Supplementary Table S4). Finally, to ascertain that the identified serum metabolites can stratify patients into similar subgroups as those defined by metabolic genes and tumor tissue metabolites, we performed consensus clustering analysis based on the 75 serum metabolites, which similarly yielded three stable clusters of the samples named SerumMet-C1, SerumMet-C2, and SerumMet-C3. Similar to our assessment of metabolic genes, we found a concordance between good and poor survival subclasses defined by the 75 serum metabolites and those defined by published signatures.
We also investigated the mutation status of two genes that often mutated in HCC, namely TP53 and CTNNB1. These two genes are not usually mutated together in the same patient. We found that TP53 mutation are predominantly in MetGene-C1, TissueMet-C1, and SerumMet-C1 clusters, whereas CTNNB1 mutation are predominantly found in MetGene-C3, TissueMet-C3, and SerumMet-C3 clusters (Fig. 2B-lower panel).
To determine whether the resulting patient subgroups are associated with patient outcome, survival analysis was performed. Survival analysis based on consensus clusters revealed that the C1 cluster from metabolic   [6][7][8][9][10][11][12] respectively. The same trends were also observed in the LCI and TCGA-Asian cohorts, however the C3 cluster was not present in either cohort ( Supplementary Fig. S3E, F). The C2 and C3 clusters from both metabolic genes and tissue metabolites of the TIGER-LC cohort have similar survival trajectories (MetGene-C2 and MetGene-C3 in Fig. 3B, and TissueMet-C2 and TissueMet-C3 in Fig. 3C), which should be expected since these metabolic genes and metabolites are correlated. The trajectory of the C3 cluster in serum metabolites, however, did not resemble those of metabolic  15 ) overlaid with metabolite ratios between tumor and adjacent non-tumor tissues from HCC patients. Each node represents a human metabolite in a metabolic pathway. The colors represent log2 fold-change of the tissue metabolites. Blue represents increased mean metabolite abundance from adjacent non-tumor tissues, and red represents higher mean metabolite abundance from tumor tissues. White nodes represent metabolites with no data available. Abbreviations: PCA-Principal Component Analysis; HCC-Hepatocellular Carcinoma. www.nature.com/scientificreports/ genes or tissue metabolites clusters (Fig. 3D). Traditional prognostic biomarker for HCC, i.e., AFP > 300 ng/ml, cannot separate patients into two subgroups with statistical significance in this cohort ( Supplementary Fig. S4).
Hazard Ratio [95%CI] for AFP as prognostic biomarker in this cohort is 2 [0.72-5.5]. Interestingly, the samples that were classified as SerumMet-C1 using serum metabolites were predominantly C1 samples identified by metabolic genes, tissue metabolites, and common Asian subtypes in our previous study 13 , and the samples from the C1 cluster classified by tissue metabolites are consistent with the C1 samples from serum metabolites ( Fig. 4A-upper panel). Similar to the consensus clusters identified using metabolic genes, there is no discernible pattern associated with any cluster defined by the serum metabolites and standard HCC etiology parameters ( Fig. 4A-lower panel). Overall, serum metabolites were representative of metabolic genes and tissue metabolites in their capacity to classify outcome-related molecular subgroups.
Identified serum metabolites are associated with fatty acid metabolism and secondary metabolism of microbial metabolites. We next ascertained the pathways that were present and/or altered among the metabolite-defined clusters. To do so, we performed pathway analysis based on the metabolic gene expression among the MetGene clusters. When comparing the 491 metabolic gene expression between samples from different MetGene clusters using Ingenuity Pathway Analysis (IPA) 25 Table S5). The results concurred with serum metabolite data, where short-chain and long-chain acylcarnitines were differentially abundant among the three serum metabolite clusters ( Fig. 4A and Supplementary Fig. S6).
To further confirm that these dysregulated pathways based on gene expression levels are truly associated with lipid metabolism, we determined the composition of the identified 75 serum metabolites by tabulating the metabolites according to their classes and subclasses. Almost half (46%) of the identified serum metabolites were lipids ( Supplementary Fig. S2B), which consisting mainly of acylcarnitines (28%), phospatidylcholine and lysophosphatidylcholine (PC and lysoPC-25%), while the remainder are other assorted lysolipids, steroids and fatty acids ( Supplementary Fig. S2C). One class of serum metabolites with highest abundance, namely acylcarnitines, is strongly correlated with tissue metabolites (Fig. 4C, D) and reflect the pathway analysis based on tumor and adjacent non-tumor matched tissue gene expression, with HNF4α as possible transcriptional regulator (Supplementary Figs. S7A, B and S8). Notable metabolites from this class are short-chain acylcarnitines (butyrylcarnitine, 2-methylmalonyl carnitine, and glutarylcarnitine) and long-chain acylcarnitines (laurylcarnitine, myristoleoylcarnitine, and oleoylcarnitine). The median relative abundance levels in log10 of these metabolites for SerumMet-C1, C2, and C3 are as follow: butyrylcarnitine-23. 32 metabolism, specifically in process of fatty acid transportation from the cytosol to the mitochondria during fatty acid β-oxidation whereby fatty acids are broken down. The next most abundant class of serum metabolites is lysoPC, which constitutes the majority of biological membranes and is easily obtained through the diet. Other metabolite classes were amino acids, xenobiotics, peptides, carbohydrates, cofactors and vitamins, and energy metabolites. One other notable class of metabolites observed among the 75-serum metabolite set is microbial metabolites, which consisted of p-cresol sulfate (PCS), 4-ethylphenyl sulfate (4-EPS), and 4-methylcathechol ( Fig. 4A Finally, to determine whether clinical variables were associated with various subtypes identified in this study, we performed a Fisher's Exact test between these clinical variables and the subtypes. We found that most of the clinical features are independent of these subtypes, with the two exceptions of AFP/MetGene subtypes and BMI/SerumMet subtypes (Supplementary Table S5). The samples with AFP level > 300 ng/ml are clustered in the MetGene-C1 subgroup, which has worse prognosis compared to other subgroups. The group with BMI > 24 (overweight patients) is linked with the SerumMet-C2 subgroup, which has better prognosis compared to other OV-Opisthorchis viverrini. Each colored box above heatmap represents one sample. The color representing "Good" prognosis and S3 signature is yellow, while "Poor" prognosis and S1 signature is represented by purple. S2 signature is represented by green box.

Discussion
In this study, we present a set of serum metabolites that can potentially be used as non-invasive prognostic biomarkers that are reflective of tumor-tissue status. The multiple levels of metabolite-related data can independently stratify Asian HCC patients into molecular subtypes that have similar patient outcomes. Compared to our previous study that utilized whole transcriptome analysis to identify HCC molecular subtypes 13 , here, we have considerably decreased the number of markers that are required for patient stratification without compromising the subgroup consistency. Furthermore, the identified metabolic genes and metabolites gave new insights into the molecular underpinnings of Asian HCC. Serum metabolite measurement is a potential non-invasive tool for early detection and prognosis determination of HCC. Compared to other noninvasive techniques, such as ultrasonography, computed tomography, or magnetic resonance imaging (MRI), serum metabolites are considerably less cumbersome in terms of implementation and results interpretation. Previous publications have identified several fatty acids and acylcarnitines in serum and plasma as early detection and prognostic biomarkers of HCC, indicating that fatty acid metabolism is universally dysregulated in HCC 12,[26][27][28][29] . However, to our knowledge, there is no study which directly correlates the chain of events between gene expression and metabolites from tumor tissues to serum. Our study therefore may present a more accurate picture of the molecular underpinnings of HCC.
The methodology employed in this study, namely two-step correlation analysis by first correlating tissue gene expression and metabolite and then correlating tissue metabolites and serum metabolites, has unique advantages over single step correlation. This approach combines permutation-based correlation threshold selection with correlation analysis to objectively identify correlated genes and metabolites. The approach also utilized tissue gene expression, tumor tissue and serum metabolites to identify the plausible causal chain of events that should directly reflect molecular processes from tumor tissues to tissue metabolites, and finally to serum metabolites. To our knowledge, this is the first study that utilized these data in this way. This method, combined with gene expression and metabolomics data, allows us to directly observe the chain of associations between gene expression and metabolites in tumor tissue, and between tumor tissue and serum metabolites. It enables us to see a convergence of evidence from multiple levels of data, from which patient subgroups resulting independently from metabolic gene expression, tissue, and serum metabolites had similar patient outcome.
The set of 75 serum metabolites was able to independently stratify patients into subgroups that were also consistent with patient subgroups from metabolic gene expression, tumor tissue metabolites, and whole transcriptome from our previous study 13 . The consistency in these clustering results is another indicator that the identified metabolic genes and metabolites are involved in the disease manifestation. Furthermore, the numbers of genes and metabolites involved in sample stratification were significantly lower compared to our previous study, which will be beneficial for further utilization of these genes and metabolites as prognostic biomarkers. Finally, serum metabolites are more accessible and less invasive than gene expression or metabolites from tumor tissues, therefore they are more suitable as less invasive prognostic biomarkers for Asian HCC patients. This set of serum metabolites can be validated in larger patient cohorts in the future to facilitate HCC patient prognostication in clinical settings.
Lipids and fatty acid metabolism play central roles in all cancer types, including HCC 30 . Pathway analyses based on the abundance of both tumor tissue and serum metabolites, combined with tissue gene expression, indicated that lipid and fatty acid metabolism are among the top dysregulated metabolic pathways in HCC. The roles of one class of fatty acid-related metabolites of interest, i.e., acylcarnitines, are slowly emerging in HCC. Acylcarnitines are known as intermediate metabolites in β-oxidation of fatty acids in mitochondria 31,32 . Shortand medium-chain acylcarnitines largely reside in peroxisomes and produced mainly by liver and muscle 33,34 . A recent study showed that acylcarnitine profiles can be used as biomarkers for obesity-driven HCC, pinpointed a serum metabolite, oleoylcarnitine (C18:1 carnitine), which was found in this study, as one of the biomarkers for NAFLD and HCC, and revealed the mechanism of actions of oleoylcarnitine through STAT3 pathway activation 31 . The same study also identified several other acylcarnitines that were also identified in this study, namely laurylcarnitine (C12:0) and myristoleoylcarnitine (C14:1), as possible HCC biomarkers. For short-and medium-chain acylcarnitines, the evidence of the involvement in HCC are limited. Possible master regulators for these metabolites are HNF4α, which were found in the pathway analysis and interacts with CPT1 and CPT2. HNF4α was found to be the main regulator for acylcarnitines in brown adipose tissue in mice, in which the brown tissue increases the uptake of acylcarnitines during the cold 35 . This will need further analyses to confirm the actual link between HNF4α and acylcarnitines in human HCC.
The gut microbiome has been shown to play several important roles in liver diseases. Several mechanisms involving the gut microbiome have been proposed, along with approaches to mitigate, prevent, or even reverse HCC 36 . Many essential metabolites and nutrients are produced by gut microbes to provide precursors for more complex molecules in human, such as hormones, vitamins, and other cofactors. Some metabolites identified in this study, i.e., 4-EPS and PCS, are microbial metabolites derived from the diet 37 . Several studies found that PCS is a highly potent uremic toxin 38 and can induce cell proliferation and migration of kidney cancer 39 . The roles of these compounds in HCC are still obscure. p-cresol is produced mainly by Clostridioides difficile (also known as Clostridium difficile or C. diff) 40 . A recent study predicted that there may be other microbes which can also produce p-cresol, but this needs to be confirmed 41 . p-cresol is metabolized in the liver through a conjugation process, i.e. sulfonation, to produce a less toxic form, PCS 42 . There are several possible mechanisms that might be specifically linked to PCS. It was found through molecular docking simulations that PCS can activate epidermal growth factor receptor (EGFR) via interdomain pocket interaction of extracellular EGFR, and finally induce kidney tissue remodeling 43 . A recent study showed that EGFR signaling promotes survival of HCC cell lines via interaction with E-cadherin and β-catenin, but only in late stages of the disease 44 . Another possible mechanism is that PCS might increase liver detoxification burden, especially sulfonation, affecting the ability of liver to detoxify www.nature.com/scientificreports/ other highly potent liver toxins, such as acetaminophen (paracetamol) or aflatoxin 45 . This study also proposed a less invasive means for measuring PCS through urine in HCC patients 45 . PCS may also be associated with the microbiome in that PCS-producing C. diff was found to be dominating in the gut of mice, due to toxicity of PCS to other bacterial species in the gut 46 and may be the mechanism of action of dysbiosis from C. diff infection. Therefore, PCS might be a key link between gut dysbiosis and HCC pathogenesis. Biomarker discovery for HCC is an active field of research. The metabolites mapped onto the KEGG's Human Metabolism Map in Fig. 1C were previously found to be dysregulated in HCC and have been proposed as either prognostic or diagnostic biomarkers, but independently they were not able to outperform traditional HCC biomarkers. To further explore these dysregulations, we have assessed tumor specific metabolic gene alterations and serum alterations and determined their association with clinical features including patient outcome. The specific alterations and how they apply to HCC tumorigenesis would require detailed molecular and functional studies which could be addressed in future studies. Numerous studies have been conducted to search for suitable serum biomarkers that can predict survival outcome of patients in different populations [47][48][49][50] . Currently, however, clinically accepted prognostic serum biomarkers are limited to only α-fetoprotein (AFP), AFP-L3, and Des-γ-carboxy prothrombin (DCP) 51,52 . Various acylcarnitines were suggested as serum or plasma biomarkers in several Asian HCC cohorts, but none of these biomarkers have been validated in any clinical trials 29,53,54 . The main reason for this may result from the inconsistency of the data from these studies. One study of PCS suggests that it could be a urine-based biomarker for differentiating HCC and cirrhotic patients. However, the study found that HCC patients have lower PCS level compared to cirrhotic patients and suggested that sulfonation alteration might be responsible mechanisms 55 . Thus, additional comprehensive studies are needed to determine the consistency and effectiveness of such biomarkers.
In conclusion, we have established stable metabolic subtypes based on metabolic gene expression, tissue metabolites and importantly, serum metabolites, through a two-step correlation analysis. Metabolic gene subtypes can stratify patients into better and worse prognosis based on the metabolic gene expression across different Asian patient cohorts. Tissue and serum metabolites can also consistently stratify patients into subgroups similar to the metabolic gene subtypes and our previously identified common molecular Asian transcriptome-defined subtypes. Some of the serum metabolites that we identified might serve as prognostic biomarkers after future clinical validation. Validation in other cohorts should be a priority to translate this finding into clinical use. Publicly available resource with both gene expression and serum metabolite measurement are not currently available to further validate our findings. If and when such resources are available, we should have clearer picture of the serum metabolites associated with HCC. Finally, future studies that link metabolic alterations with disease etiologies and clinical features would be of future interest to shed light on the underlying mechanisms contributing to HCC which could be utilized in clinical management.

Methods
Cohorts, clinical, and genomic data. Tissue gene expression of the TIGER-LC cohort 13 , LCI cohort 20 and TCGA liver cohort 24 , were used this analysis. The list of metabolic genes (3765 genes) were retrieved from Human Metabolic Reaction 2.0 (HMR2.0) database 56 . Tissue metabolite data were described previously among the TIGER-LC cohort 13 . In brief, metabolites were measured in tumor tissues and serum using Metabolon's Discover HD4 Platform, which is an untargeted metabolomics approach that combines positive and negative modes of liquid chromatography (LC+/LC−) and gas chromatography/mass spectrometry (GS/MS) to measure metabolite abundance. After data processing, 718 and 990 metabolites were detected from tumor tissues and serum samples, respectively. After excluding unidentified metabolites (metabolites that cannot be matched to standard library compounds), there were 562 and 615 metabolites remaining for further analyses from tumor tissues and serum, respectively. Clinical features data included in this study are 1. Mutation status of TP53 and CTNNB1 genes, determined in our previous study 13 2. Sex 3. Age, with two categories of < = 50 and > 50 years old 4. Body Mass Index (BMI), with two categories of BMI < = 24 and BMI > 24 5. Diabetes status, with two categories of no diabetes and with diabetes 6. CA-19-9 level, with two categories of CA-19-9 < = 37 U/ml and CA19-9 > 37 U/ml 7. AFP level, with two categories of AFP < = 300 ng/ml and AFP > 300 ng/ml  13 . Metabolic gene expression between tumor and adjacent non-tumor tissue were compared using a paired t-test. The resulting p-values were adjusted using Benjamini-Hochberg false discovery rate (FDR) procedure 60 .
Metabolic genes with FDR-adjusted p-values of less than 0.05 were retained for further correlation analysis. Spearman correlation coefficients between the expression of 2300 metabolic genes from tumor tissues and 562 metabolites from the TIGER-LC cohort were calculated. For correlation threshold selection, the gene names, metabolite names, and the sample names in both metabolic genes and metabolite matrices were permutated to destroy original relationships of the original data. Then, the correlation coefficient between permutated metabolic gene expression and metabolites were calculated and recorded with 10,000 permutations. The distribution of correlation coefficients from the original data was plotted against the average of distributions of correlation coefficients from the permutated data. Finally, the intersection points of correlation coefficients between the original data and the average of permutated data was determined to be -0.165 and 0.165 ( Supplementary Fig. S10) and were used as correlation coefficient thresholds to select the correlation coefficients between metabolic genes and metabolites for further analyses.
The metabolic genes and tissue metabolites were selected by calculating the average value of the sum of absolute correlation coefficients from the following equations: where |c ij | and |c ji | are absolute correlation coefficient between metabolic gene i and tissue metabolite j, I is total number of metabolic genes, J is total number of tissue metabolites, G i is the average value of sum of absolute correlation coefficients of metabolic gene i over tissue metabolites, and T j is the average value of sum of absolute correlation coefficients of tissue metabolite j over all metabolic genes. G and T that are higher than the correlation coefficient threshold identified by permutation test are then selected for further analyses.
For the first step of correlation, the number of metabolic genes and tissue metabolites from correlation coefficient threshold selection is 491 genes and 78 metabolites, respectively. Tissue metabolites that are tumorspecific was determined based on a metabolite fold-change of 1.5 between tumor and non-tumor adjacent tissues, which yielded 40 tumor-specific tissue metabolites that were used in the second step of correlation with serum metabolites.
For the second step of correlation, correlation thresholds used for correlation analysis between tissue and serum metabolites are − 0.15 and 0.15. These thresholds correspond to the sums of correlation coefficients between tumor tissue gene expression and tissue metabolites previously identified. Serum metabolite selection, the same strategy of average of the sum of absolute correlation coefficient is used, following the equation: where |c km | is absolute value of correlation coefficient between serum metabolite k and tumor-specific tissue metabolite m, M is total number of tumor-specific tissue metabolites, and S k is average value of sum of absolute correlation coefficient of serum metabolite k over all tumor-specific tissue metabolites. The final number of serum metabolites that passed the threshold is 75 metabolites. www.nature.com/scientificreports/ Consensus clustering analysis and subclass mapping analysis. To define consensus clusters for the patient samples among the Thai cohort, the matrix of 491 metabolic genes was used as the input for consensus clustering 61 . Two other cohorts were used to validate identified metabolic genes, namely TCGA-LIHC 24 and LCI 20 cohorts. The TCGA-LIHC cohort is comprised of liver cancer patients from across US cancer centers as a part of The Cancer Genome Atlas project. The LCI cohort is comprised of Chinese liver cancer patients from the Liver Cancer Institute. In our previous study we found that Asian patients had common molecular features, which are different than patients with European or African descent 13 . Therefore, in this study we only used Asian descent samples from TCGA-LIHC. The data matrices from these two cohorts were obtained by using the original 491 genes from the TIGER-LC cohort analysis and identifying the overlapping genes from the TCGA-LIHC or LCI data sets. The LCI cohort has 385 overlapping genes, while TCGA-LIHC cohort has 461 overlapping genes. Consensus clustering was performed by using ConsensusClusterPlus 16 package version 1.38 in R version 3.3.3. The reason for using consensus clustering in this study is to determine whether the sets of metabolic gene expression, tumor tissue, and serum metabolites can independently stratify patients into subgroups that have similar consistency amongst different molecular data types. The parameters used in all the analyses were as follows: maxK = 8, number of bootstrap = 20,000; item subsampling proportion = 0.8; feature subsampling proportion = 1; cluster algorithm = hc [hierarchical clustering]; inner linkage type = ward.D; final linkage type = ward.D; correlation method = spearman; seed = 3044. The final number of clusters (k) selected for further analyses is three. The clustering results from the LCI and TCGA-Asian cohorts were then subjected to Subclass Mapping (SubMap) 62 analysis to determine whether the clusters from these two cohorts matched the clusters from the TIGER-LC cohort. SubMap version 7.0 from GenePattern software suite 63 was used with default parameters. After two-step correlation analysis, the list of 40 tumor-specific tissue metabolites and 75 serum metabolites were also used for consensus clustering with the same parameters previously described to confirm that the metabolites identified by correlation analysis can independently and consistently stratify patients into similar subgroups. Submap analysis was also used to determine the if the TIGER-LC samples matched with other HCC signatures. The HCC signatures used in this study were S1-S3 signature based on WNT-TGFβ, AKT-MYC signaling pathway activation, and hepatocyte differentiation 17 , stem cell signature based on expression of stem cell gene set 18 , EpCAM signature based on a gene set that is co-expressed with EpCAM 19 , metastatic signature based on a gene set related to metastasis of HCC 20 , CCA-like signature based on a gene set from cholangiocarcinoma-like HCC 21 , Cluster 1 and 2 signatures based on 238-gene signature from CCA patients 22 , and proliferation signature based on copy number alterations and activation of oncogenic pathways 23 . Statistical and survival analyses. All statistical and survival analyses were performed using R statistical computing software version 3.3.3. Survival analysis on all cohorts was performed using a Kaplan Meier estimator and a Cox proportional hazards model was calculated between the worst survival group and the rest of the groups, accompanied with Log-rank p-value for each comparison. The Cox proportional hazard assumption was tested and met for all the tests carried out. The Paired t-test was used to calculate differential gene expression between tumor and adjacent non-tumor tissues. All p-values are two-sided p-values unless otherwise stated and adjusted for false discovery rate by Benjamini-Hochberg FDR procedure 64 .
Pathway analyses. The final 491 metabolic genes were subjected to pathway analyses using GSEA 65