Identification and verification of the molecular mechanisms and prognostic values of the cadherin gene family in gastric cancer

While cadherin (CDH) genes are aberrantly expressed in cancers, the functions of CDH genes in gastric cancer (GC) remain poorly understood. The clinical significance and molecular mechanisms of CDH genes in GC were assessed in this study. Data from a total of 1226 GC patients included in The Cancer Genome Atlas (TCGA) and Kaplan–Meier plotter database were used to independently explore the value of CDH genes in clinical application. The TCGA RNA sequencing dataset was used to explore the molecular mechanisms of CDH genes in GC. Using enrichment analysis tools, CDH genes were found to be related to cell adhesion and calcium ion binding in function. In TCGA cohort, 12 genes were found to be differentially expressed between GC para-carcinoma and tumor tissue. By analyzing GC patients in two independent cohorts, we identified and verified that CDH2, CDH6, CDH7 and CDH10 were significantly associated with a poor GC prognosis. In addition, CDH2 and CDH6 were used to construct a GC risk score signature that can significantly improve the accuracy of predicting the 5-year survival of GC patients. The GSEA approach was used to explore the functional mechanisms of the four prognostic CDH genes and their associated risk scores. It was found that these genes may be involved in multiple classic cancer-related signaling pathways, such as the Wnt and phosphoinositide 3-kinase signaling pathways in GC. In the subsequent CMap analysis, three small molecule compounds (anisomycin, nystatin and bumetanide) that may be the target molecules that determine the risk score in GC, were initially screened. In conclusion, our current study suggests that four CDH genes can be used as potential biomarkers for GC prognosis. In addition, a prognostic signature based on the CDH2 and CDH6 genes was constructed, and their potential functional mechanisms and drug interactions explored.


Results
Functional enrichment of cadherin genes. Through functional enrichment analysis, the main functions of cadherin genes were found to include calcium ion binding and participation in various biological processes involving cell adhesion (Fig. 1A).
Gene-gene interaction analysis revealed significant co-expression and pathway interactions among the CDH genes ( Fig. 1B,C). Subsequently, in order to verify the co-expression interaction relationship of these genes in GC tumor tissue, the cor function was used to calculate the co-expression correlation coefficient of these genes in R. These CDH genes were also found to have significant co-expression gene-gene interactions in GC tumor tissue (Fig. 2, Table S1).
Clinical significance of cadherin genes. A total of 408 GC-derived RNA-seq samples were obtained from TCGA website. These included 32 para-carcinoma and 375 tumor tissue samples. The pheatmap package in R was used to draw a heat map of the CDH gene expression distribution in GC para-carcinoma and tumor tissue samples (Fig. 3). In the CDH gene family, 12 genes were found to be differentially expressed between GC para-carcinoma and tumor tissue. Five of these genes were significantly down-regulated in the cancer tissue, while seven were significantly up-regulated (Fig. 4, Table 1). When adjusting for tumor stage and age in the subsequent multivariate Cox proportional hazard regression model of the CDH genes found that five CDH genes were significantly associated with GC prognosis. These five prognostic genes included CDH6, CDH10, CDH7, CDH2, and CDH13 (Table 1, Fig. 5A-E). High expression of each these five genes was significantly associated with poor GC prognosis. Among these five prognostic CDH genes, with a time-dependent area under the ROC curve of 0.680, CDH6 was found to have the highest accuracy in predicting 5-year survival in GC patients ( Fig. 6A-E). To verify the prognostic value of these five CDH genes in GC, GC patient data derived from the Kaplan-Meier plotter database were used as a validation cohort. CDH6, CDH10, CDH7 and CDH2 were found to be significantly correlated with GC prognosis, and that high expression of these genes were significantly correlated with poor clinical outcome in GC cases (P < 0.01 for all log-rank values, Fig. 7A-D). No significant correlation was found between CDH13 mRNA expression levels and GC prognosis (log-rank P = 0.47, Fig. 7E). An expression matrix of the TCGA cohort was therefore used for the CDH6, CDH10, CDH7 and CDH2 genes to construct the prognostic risk score model. Through step function screening, a prognostic risk score model was constructed based on the expression of CDH2 and CDH6. High-and low-risk GC patients were defined according to the median risk score value. The risk score calculation model was as follows: risk score = (0.0979 × CDH2 expression) + (0.1841 × CDH6 expression). In the analysis of the prognostic risk scores, it was observed that high-risk GC patients were significantly associated with poor prognosis (log-rank P < 0.001, adjusted P < 0.001, HR = 1.910, 95%CI 1.339-2.724, Fig. 8A,B), and that their median survival time of 675 days was shorter than lowrisk patients (1686 days). The construction of the prognostic risk score model based on CDH2 and CDH6 was observed to significantly improve the accuracy when predicting the 5-year survival of GC patients (AUC = 0.698, Fig. 8C). In order to evaluate the contribution of CDH genes to the prognosis of GC patients, CDH prognostic genes and associated risk scores were used to construct the nomogram models. Using the nomogram model, it was found that tumor stage had the greatest contribution to prognosis. Among the CDH prognostic genes, CDH6 had a greater contribution to GC prognosis than the other prognostic CDH genes (Fig. 9A). In the riskscore nomogram model, tumor stage was found to have had a greater contribution to the prognosis of GC than risk score, but risk score had a greater contribution to GC prognosis than single CDH prognostic genes (Fig. 9B).

Functional enrichment analysis of cadherin genes in GC.
In order to understand the potential biological mechanisms involved in the prognostic CHD genes and risk scores in GC, GSEA was used to perform enrichment analysis for different CDH expression levels or risk score phenotypes. GSEA of CDH2 in TCGA cohort found that a high CDH2 expression phenotype was significantly involved in several systems and pathways. These included the integrin1 pathway, VEGFA targets, tumor tumorigenesis, JNK signaling dn, metastasis epithelial-mesenchymal transition (EMT) up, PI3K cascade: FGFR2, metastasis, Wnt signaling pathway, TGF  Table S2). For CDH6, we found that the high CDH6 expression phenotype was significantly associated with the integrin1 pathway, and Wnt signaling pathway, VEGFA targets, focal adhesion, NF-κB signaling, metastasis up, calcium signaling pathway, tumorigenesis up, tumor vasculature up, vascular endothelial growth factor signaling pathway, transforming growth factor beta binding, cell cell adhesion via plasma membrane adhesion molecules, Wnt protein binding, cell cell adhesion mediated by cadherin, G protein coupled neurotransmitter receptor activity and phosphatidylinositol 3 kinase signaling ( Fig. 11A-P, Table S3). For CDH7, the differences between low-and high-CDH7 expression phenotypes were significant in metastasis dn, mTOR 4 pathway, TNF pathway, ERBB2/ERBB3 pathway, Notch signaling pathway, MAPK pathway, apoptosis, p53 downstream pathway, ERBB signaling pathway, NF-κB pathway, Akt pathway, signaling by EGFR, Ras pathway, G protein coupled glutamate receptor signaling pathway, cell cycle process and cell-cell recognition ( Fig. 12A-P, Table S4). The high CDH10 expression phenotype was significantly involved in calcium signaling pathway, metastasis, PI3K cascade: FGFR1, PI3K cascade: FGFR2, EZH2 targets, targets of CCND1 and CDK4 up, fibroblast growth factor receptor binding, phospholipase C activating G protein coupled receptor signaling pathway, cell cell adhesion via plasma mem-    Table S5). The high-risk score phenotype was significantly involved in Kras targets up, ECM receptor interaction, integrin1 pathway, Wnt signaling pathway, PI3K cascade: FGFR1, VEGFA targets, calcium signaling pathway, metastasis EMT up, NF-κB signaling, vascular endothelial growth factor signaling pathway, regulation of cell junction assembly and regulation of non canonical Wnt signaling pathway ( Fig. 14A-L, Table S6).
Drug screening for GC risk score model. In order to screen targeted therapeutic drugs for GC risk scores, edgeR was used. This enabled the screening of DEGs between high-and low-risk phenotypes. A total of 344 DEGs were obtained across high-and low-risk phenotypes (Fig. 15, Table S7). The heat map for these DEGs is shown in Fig. S1. A multivariate Cox proportional hazards regression model, corrected for tumor stage and age, was then used for prognostic analysis. A total of 45 DEGs were observed to be significantly correlated with GC prognosis in the TCGA cohort (Table S8, Fig. 16A). The three most significant DEGs included cerebellin 4 precursor (CBLN4, log-rank P = 0.00096, Fig. 16B), chorionic gonadotropin subunit beta 3 (CGB3, logrank P = 0.0029, Fig. 16C) and butyrylcholinesterase (BCHE, log-rank P = 0.004, Fig. 16D). Through functional enrichment analysis of the DEGs, it was found that these DEGs may participate in calcium signaling pathway, ECM-receptor interaction, cAMP signaling pathway, cGMP-PKG signaling pathway, gastric acid secretion, adenylate cyclase-activating G-protein coupled receptor signaling pathway, calcium-dependent protein binding and G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger function, was also found to be impacted. These biological processes and signaling pathways may be the potential mechanisms driving the differences in clinical outcome among GC patients across high-and low-risk phenotypes (Table S9). The DEGs were used to conduct targeted drug screening through the CMap online tool. Three potential small molecule compounds, anisomycin, nystatin and bumetanide, were found to be linked to the GC risk score. The chemical structures of the three drugs are shown in Fig. 17A-C, while the CMap analysis results are summarized in Fig. 17D. The STITCH online tool was used to construct the drug-gene interaction network. Bumetanide was found to potentially play a role in GC by targeting ERAS (ES cell expressed Ras) genes, while nystatin may play a role in GC by targeting SST (somatostatin) and ADRB3 (adrenoceptor Beta 3) genes (Fig. 18).  www.nature.com/scientificreports/

Discussion
In previous studies, the CDH gene family have reportedly played an important role in cancers, especially in its prognosis [6][7][8] 15 . In gallbladder cancer, CDH2 is not only closely related to a poor prognosis, but also to clinicopathological features such as tumor size, invasion,      www.nature.com/scientificreports/ up-regulated in patients with lymph node metastasis in oral squamous cell carcinoma (OSCC) and that patients have a poor prognosis 28 . It was further suggested that, when combined with other adhesion factor-related genes, CDH6 can be used as an important biomarker for OSCC lymph node metastasis and prognosis 28   www.nature.com/scientificreports/ migration in malignant melanomas 33 . CDH10 is a specific adhesion molecule of the blood-brain barrier, and plays a key role in the development and maintenance of this system 34 . A study using a TCGA dataset cohort observed that low expression of CDH10 is closely related to poor prognosis in breast cancer patients 35 , while sequencing on patients with lung squamous cell carcinoma (SQCC) and found that CDH10 is a high-frequency, SQCC-associated mutation 40 . In vitro experiments confirmed that CDH10 plays the role of a tumor suppressor gene in SQCC 40 . Collectively, CDH genes have been widely reported in a variety of cancers, with most of the studies investigating the biological role of these genes in cancer cell adhesion and their associated prognostic value. For functional analysis, GSEA results suggest that the functional mechanisms of the four prognostic CDH genes and their associated risk scores may be involved in multiple, classic, cancer-related signaling pathways in GC. These include the Wnt and PI3K signaling pathways [41][42][43] . For the drugs screened in this study, while anisomycin and bumetanide were found to have been reported for cancer therapy in previous studies, studies concerning the anti-tumor effect of nystatin were absent. In vivo experiments showed that bumetanide has antiangiogenic effects in CRC and that this drug may have clinical application value in CRC patients 44 . Similarly, photodynamic therapy combined with bumetanide was, through in vivo experiments, shown to significantly inhibit the growth of rat gliomas, reduce the peritumoral edema caused by simple photodynamic therapy, and thus improve the survival of rats 45 . Bumetanide can also enhance cisplatin-induced apoptosis of mesothelioma cells, thereby augmenting the sensitivity of chemotherapy drugs 46 . Anisomycin can have a direct killing effect in hepatocellular carcinoma and has an anti-tumor effect mediated by natural killer cell immunotherapy 47 . While anisomycin inhibits angiogenesis, proliferation and invasion in ovarian cancer cells by regulating the lncRNA-Meg3/miR-421/PDGFRA-Notch pathway axis 48 , a separate study found that lncRNA BACE1-AS is a new anisemycin target in ovarian cancer 49 . Anisomycin can also inhibit the proliferation of CRC cells and can also enhance the anti-tumor effect of 5-fluorouracil 50 . Similarly, anisomycin has an anti-tumor effect in osteosarcoma that can inhibit osteosarcoma cell line proliferation by blocking the cell cycle, inducing apoptosis by caspase-dependent inducement, and enhancing the patient's sensitivity to doxorubicin 51 . In summary, anisomycin has been reported to have anti-tumor effects in a variety of cancers, and can increase the sensitivity of chemotherapy drugs. However, the anti-tumor effect of anisomycin in gastric cancer has never been reported. Through bioinformatics, this study is the first to predict that this anti-tumor drug can act on gastric cancer. Regarding the research on tumor immune infiltration, we have not found any reports about CDH and tumor immune infiltration of GC in the previous studies. The present study is the first report to investigate the relationship between prognostic CDH genes mRNA expression level and GC tumor immune infiltration.
This study still has some limitations that need to be explained. First, in the Kaplan-Meier plotter database, patients in this cohort were derived from multiple Gene Expression Omnibus datasets. It was therefore challenging to use the same method to verify the target drugs found in the TCGA cohort against the verification cohort. Second, in vivo and in vitro experiments to verify the functional mechanism of the CDH family genes in relation to the drugs screened in this study are lacking. Third, since gastric cancer data from the Kaplan-Meier plotter cohort could not obtain integrated expression values, we could not use the Kaplan-Meier plotter cohort to verify the risk score model. Despite the above shortcomings, this is the first study to conduct a comprehensive analysis www.nature.com/scientificreports/ into the clinical significance, tumor immune infiltration and molecular mechanism of the CDH genes in relation to GC. Molecular mechanisms screened using the whole genome dataset is thus able to provide guidance and a theoretical basis for future CDH gene studies. Notably, multiple CDH genes that may be used as prognostic markers for GC were found, while three potential GC-targeted drugs were identified. Should the results of this study be verified in a large, multi-center cohort, it will be possible to change the treatment strategy of GC.

Conclusions
In the present study, CDH2, CDH6, CDH7 and CDH10 were identified and verified as being significantly associated with poor GC prognosis. A risk score signature which can significantly improve the accuracy of predicting the 5-year survival rate of GC patients was constructed based on CDH2 and CDH6. In addition, results from GSEA suggested that the functional mechanisms of the four prognostic CDH genes and their associated risk score may be involved in multiple, classic cancer-related signaling pathways in GC, including the Wnt and PI3K signaling pathways. Lastly, CMap screening identified three small molecule compounds (anisomycin, nystatin and www.nature.com/scientificreports/ bumetanide) that could be target drugs for risk score adjustment in GC. This study also revealed the relationship between prognostic CDH genes and GC tumor immune infiltration. Since this study is an in silico investigation, our results still need to be verified in future studies using in vivo and in vitro experiments.    60 . Kaplan-Meier plotter data from a total of 875 GC patients was used in the survival analysis validation cohort. All data in this study were obtained from public databases, and the authors were not involved in any animal or human experiments. Therefore, no additional ethical approval was required for this study.
Clinical significance of cadherin genes. The edgeR package was used to evaluate the differences in the distribution of CDH gene expression in GC tumor and para-carcinoma tissue. The TCGA test and Kaplan-Meier plotter verification cohorts were used to analyze the prognosis of CDH genes in GC patients. In addition, the survivalROC package was used to assess the accuracy of GC prognosis when using CDH genes as predictive markers. The step function was simultaneously implemented to screen the prognostic CDH genes so as to construct a prognostic signature with higher predictive accuracy. Lastly, the prognostic CDH gene and risk score model was combined with clinical parameters in order to construct two nomograms for individual prognostic GC patient scores. www.nature.com/scientificreports/ Functional enrichment analysis of cadherin genes in GC. In order to further understand the prognostic differences, biological function and mechanism among gastric cancer patients with different CDH gene expression levels, gene set enrichment analysis (GSEA, http:// softw are. broad insti tute. org/ gsea/ index. jsp) was used 61,62 . Functional differences across different risk score phenotypes were also analyzed using GSEA. GSEA results meeting the following criteria were considered to indicate significant differences between the two phenotypes: |normalized enrichment score (NES)|> 1, nominal P < 0.05, and a false discovery rate (FDR) < 0.25. Subsequently, and in order to discover potential therapeutic drugs for GC, the TCGA whole genome RNA sequencing dataset was used to screen differentially expressed genes (DEGs) between different risk score phenotypes, while the connectivity map online database (CMap, https:// porta ls. broad insti tute. org/ cmap/) was used for drug discovery. PubChem (https:// pubch em. ncbi. nlm. nih. gov) and STITCH (http:// stitch. embl. de/) were used to explore the drug chemical structure and gene-drug interaction network respectively. This was done so as to further understand the mechanisms leading to prognostic differences between various risk score phenotypes. The relationship between prognostic CDH genes expression and tumor immune infiltration abundance were carried out by Tumor IMmune Estimation Resource (TIMER: https:// cistr ome. shiny apps. io/ timer/) 63 .
Statistical analysis. Survival analysis was assessed using Kaplan-Meier curves and Cox proportional hazard regression models. Statistical analysis was performed using SPSS version 22.0 and R version 3.6.2. P < 0.05 was considered to indicate statistical significance.
Ethics approval and consent to participate. Since all datasets of gastric cancer included in the present study were downloaded from open access public database, and the authors were not involved in any animal or human experiments. Therefore, additional approval by an Ethics Committee was not needed.

Data availability
The datasets used during the present study are available from the corresponding author upon reasonable request. All raw data of gastric cancer, which were included in the current study, can be downloaded from TCGA