Identification and validation of an autophagy-related gene signature for predicting prognosis in patients with esophageal squamous cell carcinoma

Esophageal squamous cell carcinoma (ESCC) is the main subtype of esophageal cancer. Since autophagy-related genes (ARGs) play a key role in the pathogenesis of many tumors, including ESCC, the purpose of this study is to establish an autophagy-related prognostic risk signature based on ARGs expression profile, and to provide a new method for improving prediction of clinical outcomes. We obtained the expression profiles of ESCC from public data (GSE53625) and extracted the portion of ARGs. Differential expression analysis and enrichment analysis were performed to confirm abnormal autophagy-related biological functions. Univariate and multivariate Cox regression analyses were performed on RNA microarray data (GSE53625) to construct a prognostic risk signature associated with autophagy. The performance of the model was evaluated by receiver operating characteristic (ROC) analysis, survival analysis and Brier score. The model was subjected to bootstrap internal validation. The potential molecular mechanism of gene signature was explored by gene set enrichment analysis (GSEA). Spearman correlation coefficient examined the correlation between risk score and immune status and ferroptosis. The expression levels of genes and proteins were validated by qRT-PCR and immunohistochemistry in ESCC cell lines and ESCC tissues. We constructed and validated an autophagy-related prognostic risk signature in 179 patients with ESCC. The long-term survival of patients in high-risk group was lower than that in low-risk group (log-rank, P value < 0.001). ROC analysis and Brier score confirmed the reliability of the signature. GSEA results showed significant enrichment of cancer- and autophagy-related signaling pathways in the high-risk ESCC patients and immunoregulatory signaling pathways in the low-risk ESCC patients. Correlation analysis showed that the risk signature can effectively predict the effect of immunotherapy. About 33.97% (71/209) ferroptosis-related genes were significantly correlated with risk scores. Finally, the results of qRT-PCR and immunohistochemistry experiments were consistent with bioinformatics analysis. In brief, we constructed a novel autophagy-related gene signature (VIM, UFM1, TSC2, SRC, MEFV, CTTN, CFTR and CDKN1A), which could improve the prediction of clinical outcomes in patients with ESCC.

Validation and assessment of the prognostic signature. The risk score of ESCC patients was calculated by using the prognostic signature. According to the median risk score, all ESCC patients were separated into high-risk group and low-risk group. The difference of overall survival between the two groups was analyzed by Kaplan-Meier survival curve. Receiver operating characteristic (ROC) curve and area under ROC (AUC) were used to evaluate the sensitivity and specificity of the prognostic signature. The Brier score and calibration curves were used to assess the calibration of the prognostic model.
In order to verify the effectiveness of the prognostic model, the bootstrap method based on 1000 resampling was used to obtain the testing set 18,19 . The training set was the original dataset. In the testing set, AUC and Brier score for the prognostic model were calculated using the R package of the "riskRegression". The relationship between the risk score and clinical factors was also analyzed to identify the validity of the risk signature, and the survival prediction ability of prognostic factors was further compared. By comparing the clinical traits, univariate and multivariate Cox regression analyses were used to confirm the independence of the prognostic model. At the same time, a nomogram was constructed using the Cox regression coefficients with the R package "rms", and its calibration curves were drawn.
Gene set enrichment analysis. GSEA, using the "clusterProfiler" package, was employed for assessing the possible mechanisms between high-risk and low-risk groups of ESCC patients based on the KEGG (v7.2) and Hallmark (v7.2) gene sets collections.
Relationship between the risk score and immune status and ferroptosis. The CIBERSORT algorithm was employed to calculate the relative proportion of 22 tumor-Infiltrating Immune Cells (TICs) in 179 ESCC samples 20,21 . After quality screening (P < 0.05), 107 cases of ESCC could be for subsequent analysis. The Spearman coefficient examined the correlations between risk score and 22 TICs proportion. Meanwhile, Spearman correlation was also applied to detect the relationship between the risk score and immunotherapy-related targets, such as programmed cell death 1(PDCD1, also known as PD-1) and programmed cell death 1 ligand 1(PD-L1, also known as CD274).
Ferroptosis is a newly introduced form of programmed cell death discovered in recent years. Accumulating studies have revealed ferroptosis could inhibit tumor growth or promote tumor proliferation in tumor  Table 3. IHC staining analysis was conducted as described previously 29 . The primary antibody used was anti-cortactin (CTTN) (ab81812, Abcam, 1:300).

Statistical analysis.
Statistical analyses were performed in R software (Version 4.0.2; https:// www.R-proje ct. org) 30 . Statistical significance was set at P-value < 0.05. The Kaplan-Meier method was used for survival analysis, and the log-rank test was used to compare the results. Wilcoxon test was used for two independent samples or the paired data. One-way analysis of variance (ANOVA) or Kruskal-Wallis test was used to test the difference among multiple groups. P < 0.05 and Benjamini-Hochberg adjusted P < 0.05 were considered as the cut-off criterion for enrichment analysis.
Ethics statement. The work was approved by the Ethics Committee of The First Affiliated Hospital of Xi'an Jiaotong University.

Results
Identification of 87 differentially expressed ARGs. Figure 1 presents the flow diagram of the research.
According to the criteria for screening differential genes, the expression levels of 87 of the 585 ARGs in ESCC were significantly changed compared with the normal controls, among which 44 and 43 ARGs were respectively up-regulated and down-regulated. Volcano and heatmap plots are shown ( Fig. 2A, B). GO biological process  Table 4) showed that they were mainly involved in the regulation of autophagy and catabolic processes (Fig. 2C). KEGG analysis showed that these genes were mainly involved in the IL-17 signaling pathway, Necroptosis, TNF signaling pathway and apoptosis signaling pathway (Fig. 2D). The enrichment analysis details are given in Supplementary Table 5.  Table 6). Subsequently, the prognostic signature based on 8 ARGs was established using step-  The risk score of ESCC patients was calculated by using the prognostic signature. According to the median risk score, all ESCC patients were separated into high-risk group (n = 89) and low-risk group (n = 90). The long-term survival of patients in high-risk group was lower than that in low-risk group (log-rank, P < 0.001) (Fig. 4A). The sensitivity and specificity of prognostic signature were evaluated by ROC curves of 1-, 3-and 5-year OS rates, with AUCs of 0.757, 0.764 and 0.813, respectively (Fig. 4B). The distribution of risk scores, patients' survival status and the expression values of the eight genes are shown in Fig. 4C.

Construction and validation
The calibration curves of prognosis signature for the prediction of 1-, 3-and 5-year OS rates demonstrated good agreement in the training set, with Brier scores of 0.15, 0.197, and 0.175, respectively (Fig. 4D). Through bootstrapping verification, the AUCs and brier scores of predicted 1-, 3-and 5-year OS rates were 0.714, 0.714, 0.751, 0.164, 0.222 and 0.203, respectively, indicating that the model had good discrimination and calibration (Fig. 4E).
Univariate Cox regression showed that advanced age, higher tumor stage, N stage, grade, and higher risk score were risk factors for poor prognosis (Fig. 5A). Meanwhile, multivariate Cox regression analysis revealed that the www.nature.com/scientificreports/ autophagy-related genes signature was an independent prognostic factor for ESCC patients, whereas the clinicopathologic features were not (Fig. 5B). To develop a quantitative method for predicting the prognosis of ESCC patients, we constructed a prognostic nomogram based on the risk score of autophagy and clinicopathological www.nature.com/scientificreports/ features, such as age, gender, stage and grade (Fig. 5C). The calibration curves for 1-, 3-, and 5-years OS showed excellent agreement between predicted and observed outcomes (Fig. 5D-F).
Gene set enrichment analysis. The GSEA results revealed that cancer-and autophagy-related signaling pathways were significantly enriched in the high-risk ESCC patients, including ECM receptor interaction, WNT signaling pathway, epithelial-mesenchymal transition, focal adhesion and TGF-β signaling pathway (Fig. 6A, www.nature.com/scientificreports/ C). Meantime, immunoregulatory pathways were significantly enriched in the low-risk ESCC patients, including Natural killer cell mediated cytotoxicity, Cytokine-cytokine receptor interaction, NOD like receptor signaling pathway, Interferon-alpha response, Interferon-gamma response and IL6 JAK STAT3 signaling pathway (Fig. 6B, D). The enrichment analysis details are given in Supplementary Table 8.
Immune cell infiltration analysis. Figure 7A shows the relative content distribution of 22 TICs in the ESCC cohort and their correlation with risk scores. The risk score had negative correlation with the levels of B cells naive (R = − 0.19, P = 0.044) and Plasma cells (R = − 0.35, P < 0.001). However, the risk score was positively correlated with the levels of T cells CD4 memory resting (R = 0.24, P = 0.011) and Dendritic cells resting (R = 0.19, P = 0.047) (Fig. 7B). The mRNA expression levels of PD-1(R = − 0.32, P < 0.001) and PD-L1(R = − 0.21, P = 0.0057) were negatively correlated with the risk score, indicating that patients with lower risk scores might have a better response to PD-L1 immunotherapy (Fig. 7C, D).

Identification of the ferroptosis correlation with the prognostic signature.
Among the 209 ferroptosis-related genes, 71 (33.97%) were significantly associated with risk scores, of which 25 were positively correlated with risk scores and 46 were negatively correlated with risk scores (Supplementary Table 9). As shown in Fig. 8, ACSL3, ATG7, ANO6, HIC1and ULK2 are the leading five ferroptosis -related genes that have a positive correlation with the risk score, while the top five ferroptosis -related genes that are negatively correlated with the risk score are ALOX12B, ALOXE3, HMOX1, HAMP and CHAC1.
Multiple database validation. Among the 8 ARGs in the prognostic signature, 2 genes (CTTN and CFTR) were identified as differentially expressed genes between tumors and normal tissues in the GSE53625 dataset (Fig. 9A, B). To reduce bias, we used a multi-database approach to determine the expression of 8 ARGs at the tissue and cell levels (Supplementary Table 10). The expression differences of CTTN and CFTR in TCGA and Oncomine databases were completely consistent with the above analysis, which showed that CTTN was up-regulated in ESCC, while CFTR was down-regulated in ESCC (Fig. 9C, D, Supplementary Fig. 3). In the Cancer Cell Line Encyclopedia database, CTTN was highly expressed in ESCC cell line, while CFTR was lowly expressed in ESCC cell line, which was consistent with the above differential expression pattern ( Supplementary  Fig. 4). In addition, the results of Kaplan-Meier Plotter database indicated that high expression level of CTTN was significantly associated with poor OS in ESCC patients (Fig. 9E), while no statistical difference was shown between CTFR expression and OS in ESCC patients (Fig. 9F). The remaining six genes in the prognostic signature had similar expression patterns in the four databases mentioned above ( Supplementary Fig. 5). www.nature.com/scientificreports/ Preliminary experimental validation. In order to verify the accuracy of bioinformatics analysis, we used ESCC cell line and ESCC tissue to detect the expression level of ARGs in prognostic signature. The qRT-PCR results were consistent with the above bioinformatics analysis. Compared with human normal esophageal epithelial cells (Het-1A), CTTN was increased and CFTR was down-regulated in ESCC cell lines (Fig. 9G, H). At the cellular level, the expression patterns of the other six genes were similar to those of previous bioinformatics analysis ( Supplementary Fig. 6). To further study the CTTN protein expression, we performed immunohistochemistry staining of 42 ESCC tumor samples and 10 normal esophageal samples. The results indicate the expression of CTTN in tumor tissues was significantly higher than that of normal esophageal tissues (Fig. 9I,  Supplementary Table 11). Moreover, Kaplan-Meier survival analysis showed that OS and progression-free survival were shorter in the group with high CTTN than in the group with low-negative CTTN (Fig. 9J, K). www.nature.com/scientificreports/

Discussion
In China, ESCC is the most common pathological subtype, and patients with ESCC have a poor prognosis due to unknown pathogenesis. Therefore, it is urgent to explore new and accurate molecular biomarkers for prognosis. Autophagy plays a dual role in tumors by inhibiting tumor growth or promoting tumor invasion, and has been reported to be involved in the diagnosis and treatment of esophageal squamous cell carcinoma 8,9,31 . Therefore, it is of great significance to search for biomarkers related to autophagy for individualized treatment and prognosis of ESCC patients.
In this study, we identified 87 differentially expressed ARGs by integrating autophagy-related genes from HADB and Molecular Signatures Database based on ESCC expression profile from GEO data. GO and KEGG analysis of differentially expressed ARGs showed that they were mainly involved in the regulation of autophagy, TNF signaling pathway and Apoptosis signaling pathway. We then used univariate Cox regression analysis to identify prognosis-related ARGs. Furthermore, a novel prognostic signature containing eight autophagy-related genes was established, which showed good prognostic performance and was a significant independent prognostic www.nature.com/scientificreports/ factor for predicting the long-term outcomes in patients with ESCC. Correlation analysis showed that the risk score of autophagy-related signature was associated with survival outcomes, tumor stage, and N stage. The GSEA results revealed the prognostic signature may be involved in the cancer-and autophagy-related signaling pathways and immunoregulatory signaling pathways. Notably, we also verified ARGs expression using ESCC cell lines, ESCC tissues and multiple databases. In summary, our research demonstrated that autophagy dysfunction was important during the process of ESCC. Most importantly, for the first time, we developed a robust ESCC www.nature.com/scientificreports/ prognostic risk signature based on ARGs, which may provide a more accurate assessment of the prognosis of ESCC patients and a more personalized treatment strategy. It has been reported that most of the 8 ARGs (VIM, UFM1, TSC2, SRC, MEFV, CTTN, CFTR and CDKN1A) are related to cancer. VIM is an important marker of epithelial-mesenchymal transition (EMT) that plays a key role in the progression and metastasis of a variety of tumors, including ESCC 32 . UFM1 is a potential marker protein that induces DNA double-strand breaks response, and is associated with breast cancer progression 33,34 . TSC2 is a negative regulator upstream of mTOR, and the TSC2-mTOR signaling pathway plays a crucial role in the regulation of tumor autophagy 35 . SRC was a prognostic factor for overall survival in patients with colorectal cancer and glioblastoma multiforme 36,37 . CTTN was closely correlated with the prognosis of head and neck cancer and glioma 38,39 . It has been reported that CFTR can be used as a prognostic biomarker of nasopharyngeal carcinoma, colorectal cancer and head and neck cancer [40][41][42] . CDKN1A has been reported to play a protective role in the prognosis of cervical cancer 43 . At present, the mechanism of these eight genes in ESCC has not been fully elucidated.
CIBERSORT analysis showed that the risk score was correlated with TICs, including B cells naive, Plasma cells, T cells CD4 memory resting and dendritic cells resting. Meanwhile, the mRNA expression levels of PD-1and PD-L1were negatively correlated with the risk score. These data suggest further studies will be interesting to explore specific mechanisms of risk signature and tumor-infiltrating immune cells and ARGs-related risk score may provide potential guidance for personalized treatments of ESCC.
Ferroptosis is a newly introduced form of programmed cell death discovered in recent years. Accumulating studies have revealed Ferroptosis could inhibit tumor growth or promote tumor proliferation in tumor development, and there is crosstalk with autophagy at the molecular level 22,23 Ferroptosis-related genes, such as GPX4 and HMOX1, were related to prognosis to the prognosis of esophageal squamous cell carcinoma 44 . In the present study, we found that more than 1/3 of the ferroptosis-related genes (33.97%, (71/209)) were associated with the risk score, which further provided more information for targeting ferroptosis therapy.
Nonetheless, there are still some limitations in this study. First, although we have used the bootstrap method to verify the robustness of the prognostic signature established in 179 ESCC patients, external datasets are still needed to further validate the prognostic signature. Second, although we performed qPCR and immunohistochemistry analysis, the role and mechanism of genes that constitute the prognostic signature of esophageal squamous cell carcinoma need to be explored deeply.
In brief, we constructed an autophagy-related prognostic signature for ESCC for the first time and successfully verified its reliability. Our study might provide a novel approach for effective prediction of clinical outcomes and individualized therapeutic strategies selection.

Data availability
Publicly available datasets were analyzed in this study. This data can be found at: https:// www. ncbi. nlm. nih. gov/ geo/ query/ acc. cgi.