Prognostic value and underlying mechanism of autophagy-related genes in bladder cancer

Bladder cancer (BLCA) is the most common malignancy whose early diagnosis can ensure a better prognosis. However, the predictive accuracy of commonly used predictors, including patients’ general condition, histological grade, and pathological stage, is insufficient to identify the patients who need invasive treatment. Autophagy is regarded as a vital factor in maintaining mitochondrial function and energy homeostasis in cancer cells. Whether autophagy-related genes (ARGs) can predict the prognosis of BLCA patients deserves to be investigated. Based on BLCA data retrieved from the Cancer Genome Atlas and ARGs list obtained from the Human Autophagy Database website, we identified prognosis-related differentially expressed ARGs (PDEARGs) through Wilcox text and constructed a PDEARGs-based prognostic model through multivariate Cox regression analysis. The predictive accuracy, independent forecasting capability, and the correlation between present model and clinical variables or tumor microenvironment were evaluated through R software. Enrichment analysis of PDEARGs was performed to explore the underlying mechanism, and a systematic prognostic signature with nomogram was constructed by integrating clinical variables and the aforementioned PDEARGs-based model. We found that the risk score generated by PDEARGs-based model could effectively reflect deteriorated clinical variables and tumor-promoting microenvironment. Additionally, several immune-related gene ontology terms were significantly enriched by PDEARGs, which might provide insights for present model and propose potential therapeutic targets for BLCA patients. Finally, a systematic prognostic signature with promoted clinical utility and predictive accuracy was constructed to assist clinician decision. PDEARGs are valuable prognostic predictors and potential therapeutic targets for BLCA patients.

. ARGs-based signature's predictive value has also been validated in glioblastoma 11 and breast cancer 12 . However, the clinical relevance and prognostic significance of ARGs-based signature in BLCA remain unknown.
In present study, we aimed to develop a reliable prognostic model for BLCA patients using multiple ARGs and investigate its clinical implications. Furthermore, we evaluated the correlation between present prognostic model and tumor microenvironment (TME) and explored the underlying mechanism. Finally, to facilitate clinical utility of the model mentioned above, a systematic prognostic signature was constructed by integrating clinical predictors and ARGs-based molecular biomarkers.

Materials and methods
Data source and preprocessing. Transcriptomic data (RNA-Seq FPKM) and clinical information were downloaded from The Cancer Genome Atlas (TCGA) portal (https:// portal. gdc. cancer. gov/), including 413 BLCA data and 19 non-tumor data. After integrating those data through ID numbers, the gene measured with multi-probes were replaced with their average via limma package (http:// www. bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ limma. html) 13 . Then, 371 patients with follow-up time > 90 d and complete data were selected for further analyses. All data were processed and analyzed with R software 3.6.0 (https:// www.r-proje ct. org/), and the flow diagram of present study was shown in Fig. 1.

Differential expression analysis and ARGs identification. DEGs between tumor tissues and normal
tissues were analyzed through Wilcoxon test. p-value was adjusted with FDR, and filter criteria was FDR < 0.05 and|log 2 fold-change [FC]|> 1. DEARGs were identified by matching the DEGs with the latest list of ARGs obtained from the Human Autophagy Database (HADb) website (http:// www. autop hagy. lu/ index. html). Univariate Cox regression analysis was used to identify possible PDEARGs (P < 0.05).
Prognostic signature construction and validation. Survival R package (https:// cran.r-proje ct. org/ web/ packa ges/ survi val/ index. html) was adopted to construct a PDEARGs-based prognostic model. To avoid overfitting, PDEARGs that correlated highly with other genes were deleted. Then, Cox proportional hazards regression was used to build a prognostic risk model, and the regression coefficients were used to weight the expression value of selected PDEARGs. The risk score of each patient was calculated using the following formula 14 : Individuals were separated into high-risk or low-risk groups according to the median risk score. Subsequently, survival analysis and receiver operating characteristic (ROC) analysis were performed as reported 6 . The expression level of the five selected PDEARGs between two risk groups was analyzed with R software 3.6.0. Quantitative real-time PCR (qRT-PCR) analysis. The expression level of five selected PDEARGs between tumor and adjacent tissues were further detected through qRT-PCR. Briefly, the BLCA samples used for present assay were collected from three BLCA patients who underwent radical cystectomy at Xijing Hospital, and corresponding paracancerous tissue were used as control. The clinical characterization of selected patients   Independent prognostic factors analysis. Univariate Cox regression analysis was performed to identify factors affecting the OS of BLCA patients. Then, multivariate Cox regression analysis was used to evaluate whether the risk score generated from the present prognostic model could be used as an independent prognostic factor. P < 0.05 was considered statistically significant.
Clinical utility of prognostic signature. The relationship between risk factors of the present model and certain clinical variables (i.e., age, gender, histological grade, pathological stage, and tumor-node-metastasis (TNM) status) was analyzed with t-test. The box plot was prepared with beeswarm R package (https:// cran.rproje ct. org/ web/ packa ges/ beesw arm/ index. html), and the impact of PDEARGs on BLCA prognosis was analyzed through survival R package (https:// cran.r-proje ct. org/ web/ packa ges/ survi valAn alysis/ index. html).
Correlation between risk score and TME. Tumor purity, infiltrating stromal and immune cells of TCGA-BLCA were assessed through ESTIMATE R package (https://r-forge.r-proje ct. org/ proje cts/ estim ate/) as previously reported 15 . The relative fraction of 22 TIICs types in each sample was quantified by the CIB-ERSORT method and LM22 signature matrix (https:// rdrr. io/ github/ singh a53/ amritr/ src/R/ suppo rtFunc_ ciber sort.R) 16,17 . The algorithm ran at 100 permutations with a threshold of P < 0.05 to select eligible patients for further analysis 18 . The correlation between risk score and TME was analyzed with the Pearson correlation coefficient test, and the impact of TME on clinical variables was evaluated as well.
Enrichment analysis of PDEARGs. GO function enrichment of the five selected PDEARGs was performed via clusterProfiler and enrichplot R package (http:// master. bioco nduct or. org/). FDR < 0.05 was considered statistically significant. Then, the top 10 terms involved with different GO functions were plotted. To better explain the relationship between PDEARGs and GO function, a chord plot of representative terms was constructed with Goplot R package (http:// wencke. github. io/).

Construction and validation of PDEARGs-based systematic prognostic signature. A systematic
prognostic signature was constructed through a similar method mentioned above by integrating seven clinical variables (i.e., age, gender, histological grade, pathological stage, and TNM status) with the signature discussed above. The variables highly correlated with others were deleted to avoid overfitting, and the regression coefficients were used to weight selected variables. The median of the systematic risk was used to separate the patients into two risk groups, and survival probability was analyzed using R software 3.6.0. Predictive accuracy of the novel systematic signature was evaluated with survivalROC R package (https:// cran.r-proje ct. org/ web/ packa ges/ survi valROC/ index. html). Finally, the systematic signature was visualized through a nomogram constructed by rms R package (https:// cran.r-proje ct. org/ web/ packa ges/ rms/ index. html).

Results
Identification of BLCA specific PDEARGs. We obtained 3126 differentially expressed genes (DEGs) based on the TCGA-BLCA dataset, among which 1223 genes were downregulated, and 1903 genes were upregulated in tumor tissues compared with normal tissues (false discovery rate (FDR) < 0.05, |log 2 FC|> 1; Fig Construction of PDEARGs-based prognostic signature. The regression coefficients were employed to construct a five PDEARGs-based prognostic risk model (Table 1). Subsequently, the risk score for each patient was calculated with the following computational formula: Individuals were sorted into a high-risk group (n = 185) and a low-risk group (n = 186) by the median risk score of 1.02. Survival analysis indicated that the prognosis was poorer in the high-risk group than in the lowrisk group (P < 0.001; Fig. 3A). Precisely, the 5-year OS rate in the high-risk group was 33.2%, while it was 59.5% in the low-risk group. Then, we analyzed the distribution of each patient's risk score and survival status, and a large amount of death existed in the high-risk group (Fig. 3B). As well, a heat map and a series of box plots were generated to depict the expression level of the five selected PDEARGs, among which the protective gene (APOL1) was downregulated, and the other four risk genes were upregulated in patients with high-risk score (Fig. 3B, D). The area under the curve (AUC) was 0.724, which was much higher than other clinical parameters, suggesting that the present model was more accurate in predicting BLCA patients' OS (Fig. 3C).
To validate the RNA-seq data, we analyzed the expression level of five selected PDEARGs using qRT-PCR. As shown in Fig. 3E and Supplementary Table 2, the protective gene (APOL1) was downregulated, while the other four risk genes were upregulated in tumor, compared with the adjacent normal tissue, which were consistent with the RNA-seq data above. Independent prognostic value of present signature. As shown in Fig. 4A, the variables of pathological stage, tumor-node-metastasis (TNM) status, and risk score were associated with the prognosis of BLCA patients (P < 0.05). Multivariate analysis showed that the risk score was an independent prognostic factor for OS (P < 0.01; Fig. 4B). The hazard ratio for the risk score was 1.695, indicating a high-risk score would predict a bad prognosis. The commonly used clinical variables, such as age, gender, pathological stage, and TNM status, were insufficient to serve as independent prognostic predictors (P > 0.05).
Clinical utility of present signature. The relationship between present model and clinical variables was analyzed ( Table 2). High-expression of APOL1 was associated with decreased pathological stage, N status, and M status (P < 0.05; Fig. 5B-D). Contrarily, as the value of risk score or the expression of the other four risk genes (DIRAS3, NAMPT, P4HB, and SPHK1) increased, the histological grade, pathological stage, T status, or N status of BLCA patients increased (P < 0.05; Fig. 5E-U). Furthermore, low-expression of APOL1 and increased risk score were observed in patients with age > 65 (P < 0.05; Fig. 5A-Q). Survival analysis showed that high expression  www.nature.com/scientificreports/ of the protective gene APOL1 indicated a good prognosis (P < 0.01; Fig. 6A), while increased expression of risk gene P4HB resulted in a poor prognosis (P < 0.05; Fig. 6D). The other three risk genes (DIRAS3, NAMPT, and SPHK1) had no significant effect on survival outcome (P > 0.05; Fig. 6B, C, and E).
Correlation between risk score and TME. As shown in Fig. 7A-D, with the increase of risk score generated by present prognostic signature, tumor-infiltrating stromal cells (i.e., stromal score) and estimate score (sum

GO enrichment analysis.
To further explore the relationship between PDEARGs and TAM, Gene Ontology (GO) enrichment analysis was employed. The five selected PDEARGs were associated with 52 biological processes (BPs), six molecular functions (MFs), and nine cellular components (CCs), among which immunologic process-related GO terms were significantly enriched, such as the activation of microglial cell, leukocyte, and macrophage as well as inflammatory response. Besides, oxidative stress and relevant apoptotic signal path-  www.nature.com/scientificreports/ way were significantly enriched (FDR < 0.05; Fig. 8A). Therefore, TAM might be regulated by PDEARGs. To better explain the relationship between PDEARGs and GO terms, a chord plot was constructed, and three essential genes (SPHK1, P4HB, and NAMPT) that might participate in macrophage regulation were found (Fig. 8B).

Construction of PDEARGs-based systematic signature. To promote clinical application and predic-
tive accuracy of the model mentioned above, we constructed a systematic prognostic signature by integrating seven clinical variables and the five PDEARGs-based signatures. The clinical variables (i.e., age, histological grade, and TNM status) correlated highly with the aforementioned risk score were deleted to avoid overfitting. Then, the regression coefficients were used to weight selected variables, and the systematic risk of each patient was calculated with the following computational formula ( Table 3): The median of the systematic risk was 0.96, and with the increase of systemic risk, a bad prognosis of BLCA patients was observed (P < 0.001; Fig. 9A, C). Precisely, the three-year and 5-year OS rate was 49.0% and 37.6% in the high-risk group, while it was 77.6% and 69.0% in the low-risk group, respectively. A nomogram was constructed accordingly, and the AUC value for systematic prognostic signature was 0.791, which was more accurate than the pure PDEARGs-based model (Fig. 9B, D).

External validation of both signatures.
Both PDEARGs-based prognostic signature and systematic signature were validated in external datasets. As shown in Fig. 10A and B, a significantly different survival outcome was observed between the two risk groups (P < 0.005, log-rank test). The OS rate at 5-year was 15.49% and 37.50% for PDEARGs-based high-risk and low-risk groups. The corresponding rate for systematic signature was 20.83% and 32.39%, indicating that both signatures could precisely predict BLCA prognosis regardless of internal and external cohorts. Besides, the prognostic value of five selected PDEARGs was estimated through survival analysis. Except for APOL1, the other four risk genes could effectively separate BLCA patients with different survival outcomes (P < 0.05, Fig. 10C). Though these results were not exactly consistent with Fig. 6, the general trend of survival curve between the high expressive and low expressive groups was similar.

Discussion
BLCA is a frequent malignant disease with rising incidence and high recurrence rates 19 . Although new diagnostic and therapeutic strategies have been carried out over the past several decades, the clinical outcome of BLCA patients remains unsatisfied 4,20 . Hence, there is an urgent need to develop better diagnostic methods to accurately identify asymptomatic and recurrent individuals at an early stage and propose novel treatment targets to improve prognosis. Previous studies have revealed that autophagy plays an essential role in TME regulation, resulting in tumor cell migration and invasion, tumor stem cell maintenance, and therapy resistance 21,22 . Therefore, ARGs may be an ideal biomarker or indicator to predict the progression and prognosis of BLCA.   www.nature.com/scientificreports/ Recently, many malignant tumor-related studies have shown that ARGs-based signature or multigene expression patterns can favorably predict cancer prognosis 9,10,12 . For example, P4HB retrieved from HADb website has been identified as a prognostic biomarker for BLCA, which could dramatically inhibit the invasion and proliferation of bladder cancer cells, providing novel insights for autophagy-mediated tumor progression 23 . However, whether multigene-based prognostic signature can predict the outcome of BLCA patients aroused our interest. In present study, we identified five optimal PDEARGs based on a comprehensive analysis, among which APOL1 was a protective gene, and the other four PDEARGs were risk genes. Then, we constructed a reliable PDEARGsbased prognostic model that could stratify TCGA-BLCA patients into two risk groups with statistically different survival outcomes. The risk score generated from the present signature could serve as an independent prognostic factor to predict patients' OS, and the predictive accuracy (AUC = 0.724) was much better than other clinical parameters. As reported, AUC > 0.60 was regarded as acceptable for predictions 24 ; thus, the present PDEARGsbased signature can precisely predict the prognosis of BLCA patients.
Subsequently, we analyzed the expression of the five PDEARGs in two risk groups and evaluated the relationship between present signature and certain clinical variables. Firstly, the expression of the protective gene APOL1 was downregulated in the high-risk group, and the other four risk genes were upregulated in the patient with an increased risk score. Consistently, the RNA level of five PDEARGs detected through qRT-PCR obtained similar results as RNA-Seq data, indicating its potential for clinical transformation. However, we didn't include a large sample in the present proof of concept study, which need further confirmation with prospective clinical study. Besides, APOL1 expression significantly decreased with the increase of pathological stage, N and M status. The high expression of risk genes was accompanied by bad histological grade, pathological stage, TNM status, and prognosis. Based on the above, the present signature constructed with low-expression of protective gene and high-expression of risk genes could accurately predict BLCA patients' progression, including histological grade, pathological stage, TNM status, and survival outcome. What's more, BLCA incidence increased with age 25 , and www.nature.com/scientificreports/ we found that APOL1 level was down-regulated and risk score increased in elderly patients (age > 65), which further confirmed the reliability and clinical consistency of present model. TME, which comprises recruited stromal cells and TIICs, has emerged as an important player in tumor progression, with the potential to be used in future treatment and diagnosis 26 . In present study, the TME of BLCA patients was analyzed with the ESTIMATE and CIBERSORT R package. With the increase of risk score generated from PDEARGs-based signature, tumor purity significantly decreased, and the proportion of stromal cells, neutrophils, and macrophage (M0 and M2 phenotypes) in tumor tissues significantly increased, which was correlated with bad pathological stage, detrimental TNM status, and poor prognosis. As reported, tumorassociated stromal cells can synthesize and secrete many pro-tumorigenic factors to promote cancer initiation, angiogenesis, invasion, and metastasis 27 . Additionally, emerging evidence indicates that elevated neutrophils are associated with detrimental outcomes in several solid tumors, and new strategies to decrease their presence and activity have shown efficacy in preclinical models 28,29 . TAM is another prominent component of TME, and the accumulation of M0 and M2 macrophages in tumors is most strongly associated with poor clinical outcomes 18 . Therefore, the ability to reflect the tumor-promoting microenvironment may explain why the present PDEARGsbased signature could precisely predict the prognosis of BLCA patients.
To explore the underlying mechanism by which the present prognostic model effectively stratified BLCA patients, GO enrichment analysis of the five PDEARGs was performed. As a result, several immune-related GO terms were significantly enriched, such as macrophage activation, inflammatory response, cellular response to decreased oxygen levels, and related apoptotic signaling pathway. It has been reported that ARGs are tightly related to hypoxia and hypoxia-induced metabolic reprogramming, among which P4HB is a crucial molecule that has been extensively studied 7,30 . The expression of P4HB significantly increased in several solid tumors, including bladder cancer 31 , kidney cancer 32 and prostate cancer 30 , which is consistent with the present study. Besides, hypoxia and oxygen stress-induced autophagy and apoptosis could be regulated by ARGs such as SPHK1, P4HB, and NAMPT 33 . During the process mentioned above, inflammatory response would be initiated, and TME could be changed to recruit and activate macrophages. NAMPT is regarded as a pleiotropic modulator governing monocyte/macrophage differentiation, polarization, and migration 34 . Therefore, we speculate that the process of "PDEARGs → autophagy → TME change → TAM recruitment and polarization → tumor progression" would provide insights for present prognostic signature and propose potential treatment targets for BLCA patients. Though the present model exhibits a promising value in predicting BLCA patients' prognosis, we intend to integrate clinical variables and risk scores generated from the model above to construct a systematic signature with promoted clinical utility and predictive accuracy. Notably, the novel systematic signature could more precisely predict the prognosis of TCGA-BLCA patients (AUC = 0.791). As reported, AUC > 0.75 was deemed to have an excellent predictive value 24 . The reliability of both PDEARGs-based signatures was also validated through external cohorts downloaded from the GEO database. A nomogram of the systematic signature was then prepared for clinicians to identify BLCA patients who need invasive therapy.
In conclusion, we constructed a valuable PDEARGs-based prognostic model that can precisely predict the prognosis of BLCA patients. This model's risk score can serve as an independent prognostic factor and can