Identification of tumor microenvironment-related prognostic genes in colorectal cancer based on bioinformatic methods

Colorectal cancer (CRC) ranks fourth among the deadliest cancers globally, and the progression is highly affected by the tumor microenvironment (TME). This study explores the relationship between TME and colorectal cancer prognosis and identifies prognostic genes related to the CRC microenvironment. We collected the gene expression data from The Cancer Genome Atlas (TCGA) and calculated the scores of stromal/immune cells and their relations to clinical outcomes in colorectal cancer by the ESTIMATE algorithm. Lower immune scores were significantly related to the malignant progression of CRC (metastasis, p = 0.001). We screened 292 differentially expressed genes (DEGs) by dividing CRC cases into high and low stromal/immune score groups. Functional enrichment analyses and protein–protein interaction (PPI) networks illustrated that these DEGs were closely involved in immune response, cytokine–cytokine receptor interaction, and chemokine signaling pathway. Six DEGs (FABP4, MEOX2, MMP12, ERMN, TNFAIP6, and CHST11) with prognostic value were identified by survival analysis and validated in two independent cohorts (GSE17538 and GSE161158). The six DEGs were significantly related to immune cell infiltration levels based on the Tumor Immune Estimation Resource (TIMER). The results might contribute to discovering new diagnostic and prognostic biomarkers and new treatment targets for colorectal cancer.

GO analysis, KEGG analysis, and PPI. The GO function analyses consisted of biological processes (BP), cellular components (CC), and molecular function (MF). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis explores significant pathways related to DEGs. The R packages "cluster profile" (version 3.17.0), "org.Hs.eg.db" (version 3.11.1), "enrichplot" (version 1.8.1), and "ggplot2" (version 3.3.0) were used to conduct function enrichment analyses with a cut-off criterion of p < 0.05. We used an open-source software platform, Search Tool for the Retrieval of Interacting Genes (STRING, https:// string-db. org/) 16 , to build the PPI network of DEGs. We selected the degree of interactions between proteins with a combined score > 0.9 and performed analysis using Cytoscape (https:// cytos cape. org/; version 3.7.1) 17 . We then used the MCODE 18 plug-in in the Cytoscape software to conduct modular analysis and selected the top two significant modules with more than five nodes for further analysis. We used Metascape (http:// metas cape. org/ gp/ index) to conduct enrichment analysis of module genes, which is an effective and efficient tool to provide a comprehensive gene list annotation in the big data era 19 . Survival analysis. To identify whether the TME-related DEGs relate to the prognosis of CRC patients, we performed Kaplan-Meier survival analysis using the R package "survival". We divided the patients into two groups for survival analysis according to the median value of the expression of each gene. And we verified the prognostic DEGs in the GSE17538 and GSE161158 data sets. For all results, p < 0.05 was considered statistically significant.

Results
Lower immune scores significantly related to malignant progression of CRC . We collected gene expression profile data and clinical information of 568 CRC patients from the TCGA data portal. We calculated that stromal scores ranged from -2173.94 to 1941.04 and immune scores from -952.87 to 2979.35. We conducted the difference analysis of immune/stromal scores according to different clinical characteristics. As shown in Fig. 1A-C, American CRC patients' immune scores decreased with increasing tumor metastasis, but it has no significant relationship with the size of the primary tumor. Besides, we found that stromal scores increased gradually as the T stage increased Fig. 1D. The correlation between immune/stromal scores and other clinical traits is in Fig. S1. The results indicated that stromal/immune scores might be helpful indicators to reflect the malignant degree of CRC. Expression analysis of differentially expressed genes (DEGs). We divided the CRC patients into high-and low-stromal/immune scores groups following median stromal (− 527.66) scores and immune (501.24) scores. We then ascertained the differentially expressed genes (DEGs) by calculating gene expression data between high and low immune/stromal scores groups using the limma package. The heatmaps of immune scores ( Fig. 2A) and stromal scores (Fig. 2B)  www.nature.com/scientificreports/ mainly concentrated in neutrophil activation involved in immune response, leukocyte migration, secretory granule membrane, receptor-ligand activity, immunoglobulin binding, signaling receptor activator activity, and immune receptor activity (Fig. 3A). The KEGG pathways were displayed in Fig. 3B, mainly enriched in cytokine-cytokine receptor interaction, phagosome, and chemokine signaling pathway. To further explored the potential relationship between the DEGs, we conducted a protein-protein interaction (PPI) network using the STRING tool with the minimum required interaction score > 0.900 (Fig. 3C). The Cytoscape software was used to reconstruct the PPI network. The network is formed from eight modules, including 145 nodes and 701 edges (Fig. 3D). We selected the top two significant modules with more than 5 nodes for further analysis. Module1, contains 24 nodes and 276 edges (Fig. 4A). The Metascape gene list analysis indicated that Module 1 was mainly involved in humoral immune response, dendritic cell chemotaxis, neutrophil chemotaxis, peptide ligand-binding receptors, and cAMP signaling pathway (Fig. 4B). Module 2 contains 20 nodes and 180 edges (Fig. 4C). The enrichment analysis revealed the 20 genes were mainly associated with neutrophil degranulation, antigen processing and presentation, hematopoietic cell lineage, lymphocyte activation, and immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell (Fig. 4D). The results further illustrated that these DEGs were closely associated with TME and immune response.
Mining and verification of DEGs with predictive value. The Kaplan-Meier survival analysis showed that 15 TME-related DEGs were significantly related to the prognosis of colorectal cancer (Fig. 5, p < 0.05). As shown in the figure, the expression levels of some genes were positively correlated with prognosis, while some genes were negative. Finally, to verify whether the 15 prognostic-related genes have prognostic significance in other independent databases, we downloaded and analyzed the GSE17538 and GSE161158 data sets from the GEO database. The results showed that six TME-related DEGs were associated with the overall survival rates of colorectal cancer patients (Figs. 6, 7, p < 0.05). And the higher expression of MMP12 (  www.nature.com/scientificreports/ the differences in the mRNA expression levels of the six DEGs in colorectal cancer patients, we also used the HPA database to explore the expression levels of the proteins encoding by the genes (Fig. 8). The results indicated that FABP4, ERMN, and CHST11 were overexpressed in CRC tumor tissues compared with normal tissues. However, there was no significant difference in the expression of TNFAIP6 between CRC tumor tissues and normal tissues. The protein level of MEOX2 was not detected in normal tissues and cancer tissues of the intestine, and the HPA database did not provide immunohistochemical information of MMP12.

Association of tumor-infiltrating immunocyte fraction with the 6 DEGs.
To further clarify the relationship between the expression of the identified DEGs and the immune infiltration of the CRC tumor microenvironment, we used the TIMER for exploratory analysis. The immune cells include B cell, CD4 + T cell, CD8 + T cell, macrophage, neutrophil, and dendritic cell in our research. As shown in Fig. 9, the expression of the six identified DEGs were positively correlated with the level of infiltration of different immune cells. These results indicated that the six prognostic-related DEGs might regulate the CRC immune microenvironment by affecting the infiltration of immune cells.

Discussion
This study calculated the scores of stromal/immune cells and their relations to prognosis and clinical outcomes in colorectal cancer by the ESTIMATE algorithm. Next, we screened 292 tumor microenvironment-related DEGs obtained from the intersection of stromal and immune score groups. We then conducted functional enrichment analysis and the PPI network further to understand the differentially expressed genes (DEGs). We identified 15 It has been reported that the tumor microenvironment (TME) has large correlations to the occurrence, evolution, and prognosis of colorectal cancer. TME provides an interaction place for the immune system and tumor. TME plays an important role in tumor occurrences and spreads. Therefore, the recognition of DEGs in the CRC microenvironment might help us formulate a more sensible management and treatment plan for CRC patients. Stromal cells and immune cells are the chief elements of the non-tumor component in TME, mainly in the evolution of cancers. ESTIMATE can compute stromal and immune scores to speculate tumor purity based on single-sample gene set enrichment analysis. The ESTIMATE algorithm, has been deployed in the researches of cutaneous melanoma 20 , lung cancer 21,22 , breast cancer 23 , ovarian cancer 24 , endometrial cancer 25 , renal cell carcinoma 26,27 , bladder cancer 28 , pancreatic adenocarcinoma 29 and osteosarcoma 30 , which shows good precision and practicality.
Our study first explored the relationship between stromal/immune cells scores and clinical characteristics in American CRC patients. The results indicated that immune scores in American CRC patients decreased with increasing stage and tumor metastasis. Besides, we found that stromal scores increased gradually as the T stage increased. The results were consistent with previous studies, which have indicated that lower immune scores were significantly associated with malignant progression in adrenocortical carcinoma 9 and osteosarcoma 30 .
Then we analyzed 292 differentially expressed genes with 290 upregulated and 2 downregulated genes. The GO analysis indicated that DEGs were mainly concentrated in neutrophil activation involved in immune response, www.nature.com/scientificreports/ leukocyte migration, secretory granule membrane, receptor-ligand activity, immunoglobulin binding, signaling receptor activator activity, and immune receptor activity. The KEGG pathways illustrated that they were mainly enriched in cytokine-cytokine receptor interaction, phagosome, and chemokine signaling pathway. Cytokines are expressed primarily by immune cells and tumor cells in the tumor microenvironment. They are also information carriers between tumors and immune cells, which affect the changes in tumor immunity 31 . Enrichment analysis found that most of the differential genes play the role of cytokines and cytokine receptors. In the future, we can explore the therapeutic direction of tumor immunotherapy by elucidating the mechanism of cytokine action. The results manifested that they all played critical roles in the microenvironment and the progression of tumors. The module analyses indicated that they were mainly correlated with humoral immune response, dendritic cell chemotaxis, neutrophil chemotaxis, peptide ligand-binding receptors, cAMP signaling pathway neutrophil degranulation, antigen processing and presentation, hematopoietic cell lineage, lymphocyte activation, and immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell. It has been reported that cAMP signaling is significantly related to cancer, and its targeting has shown specific antitumor effects 32 .
The results further illustrated that these DEGs were closely associated with TME and immune response. Moreover, we identified 15 prognosis-related genes by conducting survival analysis of the 292 DEGs. Then we verified 6 TME-related DEGs were associated with the overall survival rates of CRC patients based on the GSE17538 and GSE161158 data sets. And among these genes, the higher expression of MMP12 predicts a better prognosis, while CHST11, ERMN, FABP4, MEOX2, and TNFAIP6 predicted worse prognosis.
Matrix metalloproteinase 12 (MMP-12), also known as metalloelastase, is mainly expressed in macrophages 33 . Studies have shown that MMP-12 has a protective effect on CRC 34 . The expression of MMP-12 will reduce the expression of VEGF (vascular endothelial growth factor) and inhibit tumor angiogenesis, thereby increasing the survival rate of patients with colorectal cancer 35,36 . This study shows that the expression of MMP-12 is related to the infiltration of immune cells in the CRC microenvironment, which provides a new idea for the development of effective treatments for selective targeting of MMP12 in the future. Fatty acid-binding protein 4 (FABP4) has been reported to directly or indirectly regulate the growth and invasion of colorectal cancer through other channels 37 .
The other four candidate genes are CHST11, ERMN, MEOX2, and TNFAIP6. The expression of chondroitin sulfate (CS) predicts the poor prognosis of many human cancers [38][39][40] . However, a recent study has shown that chondroitin sulfate supplements can reduce the risk of CRC 41 . Carbohydrate sulfotransferase 11 (CHST11) is a crucial enzyme in the biosynthesis of chondroitin sulfate (CS). According to research findings, the expression   38,46 , but its specific role in colorectal cancer has not been studied. A study suggests that MEOX2 may play a role in the carcinogenic process of colon adenocarcinoma 47 , but the specific mechanism is still unclear. TNFα-stimulated gene-6 (TNFAIP6) plays a vital role in the prognosis of various tumors 48,49 , but few studies on the specific mechanism of TNFAIP6 in the progression of CRC. However, we are particularly interested in TNFAIP6 among these six prognostic-related genes. TNFAIP6 is a potential marker of active inflammatory bowel disease 50 , and inflammatory bowel disease is one of the risk factors for colorectal cancer. Therefore, it is necessary to clarify further the potential biological relationship between TNFAIP6 and colorectal cancer. We also used the HPA database to verify the encoded proteins expression levels of the genes. We also analyzed the correlation between the TME -related prognostic DEGs (CHST11, ERMN, FABP4, MEOX2, TNFAIP6, and MMP12) and immune cell infiltration. The results indicate that these DEGs may regulate the immune microenvironment by affecting various immune cell infiltration levels. However, the DEGs are identified by splitting samples based on the level of immune and stromal scores estimated by the ESTIMATE algorithm. Therefore, it is also possible that the genes are expressed by the infiltrating cells themselves. In tumors, immune cell infiltration is very complicated, which requires further in-depth research.
Our research may provide new ideas for analyzing the complex relationship between CRC and its tumor microenvironment. However, this study also has some limitations. First of all, the clinical information in the TCGA database is not too complete, such as the lack of grade data for colorectal cancer patients and follow-up www.nature.com/scientificreports/ information after surgery. This makes it impossible for us to conduct a comprehensive analysis of the prognosis of patients. Further research requires a large number of samples to verify the current results, which is also the focus of our subsequent work. Secondly, our research is based on public databases for bioinformatics analysis. At present, there is no functional experiment to verify the specific mechanism of the novel DEGs in the CRC microenvironment, so further investigations are needed to confirm in vivo and in vitro. In conclusion, we hope that this study will help discover new diagnostic and prognostic biomarkers and new treatment targets for colorectal cancer. www.nature.com/scientificreports/