A cluster of metabolism-related genes predict prognosis and progression of clear cell renal cell carcinoma

Clear cell renal cell carcinoma (ccRCC) has long been considered as a metabolic disease characterized by metabolic reprogramming due to the abnormal accumulation of lipid droplets in the cytoplasm. However, the prognostic value of metabolism-related genes in ccRCC remains unclear. In our study, we investigated the associations between metabolism-related gene profile and prognosis of ccRCC patients in the Cancer Genome Atlas (TCGA) database. Importantly, we first constructed a metabolism-related prognostic model based on ten genes (ALDH6A1, FBP1, HAO2, TYMP, PSAT1, IL4I1, P4HA3, HK3, CPT1B, and CYP26A1) using Lasso cox regression analysis. The Kaplan–Meier analysis revealed that our model efficiently predicts prognosis in TCGA_KIRC Cohort and the clinical proteomic tumor analysis consortium (CPTAC_ccRCC) Cohort. Using time-dependent ROC analysis, we showed the model has optimal performance in predicting long-term survival. Besides, the multivariate Cox regression analysis demonstrated our model is an independent prognostic factor. The risk score calculated for each patient was significantly associated with various clinicopathological parameters. Notably, the gene set enrichment analysis indicated that fatty acid metabolism was enriched considerably in low-risk patients. In contrast, the high-risk patients were more associated with non-metabolic pathways. In summary, our study provides novel insight into metabolism-related genes’ roles in ccRCC.

Patients with a higher risk score had a poorer clinical outcome. The Kaplan-Meier survival analysis was applied to determine OS in different subgroups of patients according to the risk score level. The results demonstrated that risk score may be a potential prognostic biomarker for patients with the following characteristics: aged ≤ 60 or > 60 years old (Fig. 4A The risk score system was associated with various clinicopathological parameters. We explored the correlation between the risk score and different clinicopathological factors. Chi-square test revealed that the risk scores were significantly associated with patients' gender, histological grade, TNM stage, T stage, N stage, M stage and Vital status (Table 1). Furthermore, subgroup analysis confirmed that higher risk scores were significantly correlated with higher TNM stage, higher tumor T stage, distant metastasis, lymph node metastasis, and higher histologic grade (Fig. 5A-F).

RT-qPCR and external validation of expression level.
We used online database to validate the expression level of the 10 genes constructing our prognostic model. Consistent with our results, ALDH6A1, FBP1, HAO2 and PSAT1 were found to be significantly downregulated in tumor samples compared with normal samples in both Oncomine and TIMER database (supplementary figure 3A,B); TYMP, HK3, and P4HA3 were found to be significantly overexpressed in tumor samples in both Oncomine and TIMER database, while IL4I1 and CYP26A1 were found over-expressed only in TIMER database (supplementary figure 4A,B). Meanwhile, RT-qPCR assays revealed that ALDH6A1, FBP1, HAO2 and PSAT1 were markedly under-expressed in tumor tissues in exceeding 60% cases, and that TYMP, HK3, P4HA3, IL4I1, CYP26A1 and CPT1B were over-expressed in tumor tissues in exceeding 70% cases ( Fig. 6A-J). Furthermore, we retrieved the HPA database and found that the protein expression of ALDH6A1, FBP1 and PSAT1 were downregulated in KIRC; the protein expression level of TYMP, P4HA3 and IL4I1 were upregulated in KIRC (Fig. 7).
Functional annotation of 10-gene signature. To investigate the potentially underlying mechanism between the two risk groups, GSEA was performed in TCGA_KIRC cohort and we found 34 significantly www.nature.com/scientificreports/   figure 5).

Discussion
Globally, RCC accounts for more than 2% of neoplasms in humans worldwide with the incidence and mortality persistently increasing 1 . Recently, gene signatures based on specific characteristic-related to predict prognosis have become a hotspot in cancer research [13][14][15] . In the present study, to the best of our knowledge, we constructed www.nature.com/scientificreports/ the first novel metabolism-related risk signature with the potential application of predicting the prognosis and progression of ccRCC. We identified a novel robust 10-gene metabolic prognostic signature based on the TCGA_KIRC dataset and further validated it in CPTAC_ccRCC dataset. Our risk score model could efficiently stratify patients' survival with various clinicopathological parameters and be an independent prognostic factor; moreover, the risk scores increased with the patients' increasing malignancy. Besides, the performance of our model in the prediction of the 5-year overall survival was superior to other parameters, indicating that it might facilitate the development of a long-term follow-up plan; meantime, the prediction ability of 1-, 3-and 5-years overall survival surpassed a single gene. Altogether, these results revealed our 10-gene metabolic prognostic signature has a great potential in ccRCC. However, in the future, validating our signature in more independent cohorts is still required.
We performed GSEA analysis to investigate the underlying molecular mechanism of the signature. Notably, we found out that fatty acid metabolism was significantly enriched in the low-risk patients, while the high-risk www.nature.com/scientificreports/  www.nature.com/scientificreports/ patients were more associated with the non-metabolic pathways. Research evidence reveals that abnormal accumulation of lipid droplets plays a crucial role in ccRCC progression 16 . Our study provided insights by suggesting that low-risk patients could benefit more from fatty acid metabolism targeted therapy in the future. Moreover, the gene annotation and analysis results revealed that overexpressed genes in the high-risk patients are involved in incretin synthesis, secretion, and inactivation, and that the down-regulated genes participate in complement and coagulation cascades, metabolism of xenobiotics by cytochrome P450, response to glucose. Hua et al. 17 identified an immune-related risk signature for predicting prognosis of ccRCC, they found that tumors from high-risk patients had higher relative abundance of T follicular helper cells, regulatory T cells than low-risk patients. Moreover, the time-dependent ROC analysis revealed that the AUC was 0.753, 0.686, and 0.637 at 1, 3, and 5 years. However, in our study, we extracted a cluster of metabolism-related genes and constructed a multi-gene signature, the AUC at 3 and 5 years were 0.731 and 0.780 respectively, which exhibit a better predictive performance.
Most of the genes in our signature have previously been shown to be involved in cancers. However, several genes' role in ccRCC remain unclear. The Fructose-Bisphosphatase 1 (FBP1) is a rate-limiting enzyme in gluconeogenesis, which is suppressed in kidney tumors and thus feeds ccRCC; FBP1 is associated with impaired cell proliferation, glycolysis and the pentose phosphate pathway in ccRCC in a catalytic-activity-independent manner via direct interaction with the HIF inhibitory domain 18 . PSAT1 is up-regulated in many cancers and acts as an oncogene that exerts a vital role in cancer progression and metastasis [19][20][21] . Interestingly, through mining public database and RT-qPCR, we found that PSAT1 in ccRCC patients is down-regulated compared with normal tissue. In the future, we will further investigate the underlying mechanism. Hydroxyacid oxidase 2 (HAO2) encodes peroxisomal proteins with 2-hydroxy acid oxidase activity, which participated in the production of reactive oxygen species and cellular breakdown 22 ; HAO2 has already been shown downregulated in hepatocellular carcinoma (HCC) tissues and ccRCC, overexpression of which restrained HCC and RCC cells proliferation by eliminating lipid droplet accumulation 23,24 . ALDH6A1 was found significantly reduced in metastatic prostate cancer according to the immunochemistry and western blot results 25 . Our results also revealed that ALDH6A1 is suppressed in ccRCC, which predicts poor prognosis. TYMP is a rate-limiting enzyme in the thymine catabolic pathway and contributes to tumor angiogenesis 26,27 . Prolyl 4-Hydroxylase Subunit Alpha 3 (P4HA3) is a critical enzyme in maintaining the stability of newly synthesized collagen. Song H et al. indicated that P4HA3 is upregulated in gastric cancer and epigenetically activated by slug 28 . In the present study, we demonstrated that P4HA3 is upregulated in ccRCC, and that its high expression reflected poor outcome. IL4I1, an immune-associated gene, inhibited CD8 + T-cells and thus exerted immunosuppressive functions 29 . Our results first revealed that IL4I1 was upregulated in ccRCC, and that low expression of IL4I1 indicated favorable prognosis. HK3 encodes hexokinase 3, upregulation of which is associated with epithelial-mesenchymal transition and may be a metabolic adaptation for colorectal cancer progression 30 . However, the HK3′s role in ccRCC remains unknown. CPT1B is the rate-controlling enzyme in the long-chain fatty acid beta-oxidation pathway in mitochondria. In bladder cancer, Vantaku et al. 31 demonstrated the suppression of CPT1B inhibits cell proliferation, metastasis in vivo. However, to the best of our knowledge, the role of CPT1B in ccRCC has not been understood. In our study, we showed that CPT1B was significantly upregulated in ccRCC, which was associated with poor prognosis. CYP26A1 (cytochrome P450, family 26, subfamily A, polypeptide 1), a retinoic acid-metabolizing enzyme, is a crucial regulator of cell proliferation, differentiation and apoptosis that is overexpressed in many types of tumors 32,33 . However, one major limitation of our study is that our research regarding the ten genes constructing www.nature.com/scientificreports/ www.nature.com/scientificreports/ our signature is insufficient, further intensive studies are required; moreover, despite the robust performance of our model in predicting survival, future validation in clinical practice is needed. In summary, for the first time, we identified a 10-gene risk signature related to metabolism to independently predict the prognosis of patients with ccRCC; this signature may provided guidance for targeted therapy and potential biomarkers in the future.

Materials and methods
Data acquisition. The messenger RNA (mRNA) sequencing data and corresponding clinical data of ccRCC patients were obtained from TCGA_KIRC Cohort of TCGA database. Similarly, the CPTAC protein expression profile of ccRCC and clinical data were downloaded from CPTAC database as validation dataset. The detailed characteristics of patients were presented in Table 1. Moreover, metabolism-related gene set was extracted from the Molecular Signature Database v5.1 (MSigDB) (https ://www.broad .mit.edu/gsea/msigd b). Only those genes that were common expressed in these three database were selected for further prognostic analysis.
The prognostic metabolic gene signature construction. 930 annotated metabolic protein-coding genes were utilized for the differentially expressed analysis with the "Limma" version-3.6.1 R package 34 . Then, we performed univariate Cox regression analysis and Lasso-cox regression analysis to identify the prognostic-related metabolic genes and construct a metabolic-related gene signature 35 . A P-value < 0.001 in univariate Cox regression analysis was treated as statistically significant. We then computed the risk score for each patients as follows: Risk score = (expression of mRNA1 *Coefficient mRNA1 ) + (expression of mRNA2 *Coefficient mRNA2 ) + (expression of mRNAn *Coefficient mRNAn ) 36 . According to the median risk value, patients in both TCGA_KIRC Cohort and CPTAC_ccRCC Cohort were separated into low-and high-risk group; The Log-rank test and Kaplan-Meier survival curve were used to assess the prognostic value. Furthermore, we investigated the time-dependent prognostic significance of the gene signature by the R package "survivalROC" 37 . Meanwhile the independence of the prognostic gene signature in TCGA_KIRC cohort was evaluated by multivariate analysis; the associations between risk score and various clinicopathological parameters were also assessed. A predictive nomogram was built including clinicopathological characteristics and risk score by the R package "rms" 38  Bioinformatics analysis. To further validate the expression of the prognostic genes constructing the gene signature, we retrieved the Oncomine database (https ://www.oncom ine.org/), Timer database (https ://cistr ome. shiya pps.io/timer /) and the human protein atlas (HPA) database (https ://www.prote inatl as.org/). To explore the possible underlying mechanism, we performed differentially expressed gene analysis between low risk patients and high risk patients with "edgeR" package 39 ; GSEA analysis were conducted to reveal enriched terms; upregulated genes and downregulated genes were put into the gene annotation and analysis resource "Metascape database" (https ://metas cape.org/gp/index .html#/main/step1 ) to obtain functional annotation of these genes.

Statistical analysis.
Chi-square test was conducted to detect the difference between two groups of patients with regard to clinicopathological features; differences in risk score between various cllinicopathological param-Scientific RepoRtS | (2020) 10:12949 | https://doi.org/10.1038/s41598-020-67760-6 www.nature.com/scientificreports/ eters were tested by Student's t test. All statistical analysis were performed using GraphPad Prism7.0, SPSS version 16.0 software (SPSS Inc., Chicago, IL, USA); all data for bioinformatic analysis in the study were processed by Software R (version 3.6.1, https ://www.r-proje ct.org/). Data are presented as the mean ± SEM. If not specified above, P < 0.05 was regarded as statistically significant.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.