Identifying of biomarkers associated with gastric cancer based on 11 topological analysis methods of CytoHubba

Gastric cancer (GC) is one of the most common types of malignancy, and its molecular mechanism has not been clarified. In this study, we aimed to explore potential biomarkers and prognosis-related hub genes associated with GC. The gene chip dataset GSE79973 was downloaded from the GEO database, and the limma package was used to identify differentially expressed genes (DEGs). A total of 1269 up-regulated and 330 down-regulated genes were identified. The protein-protein interaction (PPI) network of the DEGs was constructed with the STRING V11 database, and 11 hub genes were selected by intersecting the results of 11 topological analysis methods in CytoHubba, a Cytoscape plug-in. All 11 selected hub genes were found in the highest-scoring module of the PPI network of all DEGs, as identified by the molecular complex detection (MCODE) clustering algorithm. To explore the role of the 11 hub genes, we performed GO function and KEGG pathway analysis and found that the genes were enriched in a variety of functions and pathways, among which cellular senescence, cell cycle, viral carcinogenesis and the p53 signaling pathway were the most closely associated with GC. Kaplan-Meier analysis revealed that 10 of the 11 hub genes were related to the overall survival of GC patients. Further, seven of the 11 selected hub genes (C3, CDK1, FN1, CCNB1, CDC20, BUB1B and MAD2L1) were found to be significantly correlated with GC by univariate or multivariate Cox models or by LASSO regression analysis. These seven genes may serve as potential prognostic biomarkers and therapeutic targets for GC.

www.nature.com/scientificreports/
Cox analysis and LASSO regression analysis. The correlations between DEG expression, other clinical features and overall survival (OS), as investigated by Cox analysis, are shown in Table 2. The results suggested that six clinical features (age, stage, grade, T, M and N) and six hub genes (C3, CDK1, FN1, CCNB1, CDC20 and BUB1B) were significantly correlated with OS (p-value < 0.05) in univariate or multivariate Cox analysis. A LASSO regression model was then fitted to further screen the 11 hub genes. By setting different values of the penalty parameter, the path of the regression coefficients was obtained (Fig. 7a). Each curve in the figure traces the path of one regression coefficient. Most regression coefficients were compressed to zero, which showed that the model performed well in dimensionality reduction and variable selection. Each point in Fig. 7b corresponds to a penalty value, and the position of the vertical dashed line indicates the number of genes selected under the optimal model. The figure shows that five genes were retained under the optimal model: C3, CCNB1, CDC20, FN1 and MAD2L1. Therefore, seven of the 11 selected hub genes (C3, CDK1, FN1, CCNB1, CDC20, BUB1B and MAD2L1) were verified through Cox analysis or LASSO regression analysis and could be taken as independent prognostic biomarkers for GC.
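The shrinkage of coefficients to zero described above can be illustrated with a minimal Python sketch. It assumes an orthonormal design, in which case each LASSO coefficient is the soft-thresholded unpenalized estimate. This is a simplification for illustration rather than the model fitted in the study, and the gene labels and coefficient values are hypothetical, not results from the paper.

```python
def soft_threshold(beta, lam):
    """Shrink beta toward zero by lam; return 0.0 once |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def lasso_path(unpenalized, lambdas):
    """Map each penalty value to the shrunken coefficients at that penalty."""
    return {
        lam: {name: soft_threshold(b, lam) for name, b in unpenalized.items()}
        for lam in lambdas
    }


# Hypothetical unpenalized coefficients (illustrative only, not study data).
unpenalized = {"geneA": 0.9, "geneB": -0.4, "geneC": 0.1}
path = lasso_path(unpenalized, lambdas=[0.0, 0.25, 0.5, 1.0])

for lam, coefs in path.items():
    # A gene counts as "selected" while its coefficient is non-zero.
    selected = [gene for gene, b in coefs.items() if b != 0.0]
    print(f"lambda={lam}: selected={selected}")
```

As the penalty grows, fewer coefficients remain non-zero, which is the pattern a coefficient path plot displays.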

Discussion
The study of molecular genetics and signal transduction pathways is helpful for further understanding the pathogenesis of GC and for its early diagnosis. Therefore, identifying DEGs for GC from transcriptome microarray datasets may contribute to early diagnosis and to the development of effective therapies. In our study, a total of 1599 DEGs were identified in the dataset GSE79973, of which 1269 were up-regulated and 330 were down-regulated. Based on these DEGs, a PPI network was constructed in which 1376 genes formed the network with 14,394 edges, and 11 hub genes were selected by intersecting the results of 11 topological analysis methods. The clustering module related to these 11 genes was obtained by MCODE. To explore the role of the 11 selected hub genes in the pathogenesis of GC, we performed GO function and KEGG pathway enrichment analysis and found that the genes were enriched in a variety of functions and pathways. Kaplan-Meier analysis revealed that 10 of the 11 hub genes were related to the overall survival of GC patients. Cox analysis and LASSO regression analysis showed that seven of the 11 selected hub genes were significantly correlated with GC.
The overall aim of this study was to identify hub genes that may serve as potential biomarkers for GC diagnosis and therapy, and to further explore the potential mechanisms of GC by integrated profiling analysis. In our study, seven of the 11 selected hub genes (C3, CDK1, CCNB1, BUB1B, MAD2L1, FN1 and CDC20) were considered the most likely independent prognostic biomarkers associated with GC. Five of them (C3, CDC20, CCNB1, BUB1B and MAD2L1) were newly identified compared with previously published results using the same dataset.
The relevant literature suggests, from a biological point of view, that most of these hub genes play important roles in GC. Kitano et al. showed the synthesis and secretion of C3 by all the tested GC-derived cell lines in response to TNF, which suggested that C3 may be secreted in the gastric wall as part of its normal physiology or as a result of tumour pathology, and may thereby participate in local immune or inflammatory responses 14 . Lee et al. found that high expression of CDK1 in GC patients may imply a strong capacity for tumor invasion and that CDK1 is a target gene of miR-490-5p. Down-regulation of miR-490-5p and up-regulation of CDK1 can promote the proliferation of GC cells and the G1/S phase transition 15 . A study by Kidokoro et al. found that CDC20 was often up-regulated in many types of tumors and was significantly inhibited by ectopic introduction of p53. Additionally, treatment of cancer cells with siRNA against CDC20 can induce G2/M arrest and inhibit cell growth 16 . CCNB1 knockdown by RNA interference was found to significantly inhibit the proliferation, migration and invasion of HCC cells 17 . Hudler et al. found that the expression of BUB1B in GC tissues was significantly higher than that in adjacent normal tissues, by approximately 8.875 ± 1.08 times 18 . Frio et al. found that at least one BUB1B mutation can result in autosomal recessively inherited susceptibility to gastrointestinal cancer, as do mutations in MUTYH and the mismatch repair genes 19 . FN1 is a significant regulatory factor promoting the development of various cancers, such as laryngeal and skin squamous carcinoma 20,21 and brain glioblastoma 22 . Zhang et al. found that miR-200c can inhibit the migration, proliferation and invasion of GC cells in vitro by directly binding to FN1, which indicated that miR-200c and FN1 may be potential biomarkers or therapeutic targets for GC 23 . A study by Wang et al. confirmed the prognostic value of two key mitotic checkpoint genes, MAD2L1 and BUB1, which have been included in multiple gene expression signatures for breast cancer prognosis. They also found that these genes were biologically relevant to breast cancer progression, as suppression of their expression was associated with reduced tumor cell growth, migration and invasion 24 .
This study provides important clues for exploring potential biomarkers and targets for the diagnosis, prognosis and treatment of GC. In future work, if conditions permit, we hope to conduct experiments to verify the relationship between these hub genes and GC from a biological point of view.

Conclusion
Through 11 topological analysis methods, we identified 11 hub genes for GC. We validated these hub genes through functional enrichment analysis, the clustering module with the highest score, the relevant literature, Kaplan-Meier analysis, Cox analysis and LASSO regression analysis. The results suggested that seven of the 11 selected hub genes (C3, CDK1, CCNB1, BUB1B, MAD2L1, FN1 and CDC20) may serve as potential prognostic biomarkers and therapeutic targets for GC. These results may provide a theoretical direction for future research on the molecular mechanisms of GC progression.
4,6,25-27 , where the authors mainly clarified the emerging role of long non-coding RNA (lncRNA) in cancer development, explored novel lncRNA candidates, identified key candidate genes and circRNA, and explored the molecular mechanism of GC through comprehensive analysis of mRNA and miRNA expression profiles. Here we aimed to identify potential prognostic biomarkers of GC based on 11 topological analysis methods and used the MCODE method and survival analysis to verify these hub genes.

GO function and KEGG pathway enrichment analysis.
To identify the potential functions of the selected hub genes, we used the clusterProfiler package in R to perform GO function and KEGG pathway enrichment analysis of these genes 31 . The clusterProfiler package is a Bioconductor package for R that performs statistical analysis and visualization of functional clustering of gene sets or gene clusters.
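Over-representation analysis of this kind is commonly based on a hypergeometric test. The following Python sketch shows that calculation for a single term. It is an illustration under stated assumptions, not the clusterProfiler implementation: the function name and all counts are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """Upper-tail probability P(X >= n_overlap).

    X is the number of term-annotated genes among n_selected genes drawn
    without replacement from n_background genes, n_term of which carry
    the annotation.
    """
    total = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 120 annotated to a pathway,
# 11 hub genes tested, 5 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 120, 11, 5)
print(f"enrichment p-value: {p_value:.3e}")
```

In practice the p-values for all tested terms would be adjusted for multiple comparisons before any term is reported as enriched.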
Survival analysis. The Kaplan-Meier plotter database (http://kmplot.com), an online tool for evaluating the prognostic value of genes in breast, ovarian, lung and gastric cancer patients, was applied to analyze the associations between the identified hub genes and overall survival 32 . The hazard ratio (HR) and its 95% confidence interval were calculated. p < 0.05 was used to indicate a statistically significant difference 33 .
Verification based on another dataset. To validate the efficiency of our method, another mRNA expression dataset (GSE19826) was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The mRNA profiles were based on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). The GSE19826 dataset included 15 paracancerous tissues and 12 tumor tissues. The same method was carried out on this dataset. First, the limma method was used to identify DEGs between GC tissues and paracancerous tissues. Second, a PPI network of the DEGs was constructed based on the STRING V11 database. Based on the genes in the network, we searched for hub genes using the 11 topological analysis methods in CytoHubba. Then, the clustering module related to these hub genes was constructed by CytoHubba, and the selected hub genes were verified by the MCODE method.
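The hub-gene selection step, taking the genes ranked highly by every topological method, can be sketched in Python as follows. This is a minimal illustration: the top-k cutoff, the gene labels and the scores are hypothetical, and only three method names are shown where the study used 11.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring gene names."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_hubs(method_scores, k):
    """Intersect the top-k gene sets across all scoring methods."""
    top_sets = [top_k(scores, k) for scores in method_scores.values()]
    if not top_sets:
        return set()
    return set.intersection(*top_sets)


# Hypothetical per-method node scores (illustrative only, not study data).
method_scores = {
    "Degree": {"g1": 9, "g2": 8, "g3": 2, "g4": 7},
    "MCC": {"g1": 50, "g2": 40, "g3": 45, "g4": 10},
    "Closeness": {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.5},
}
hubs = consensus_hubs(method_scores, k=2)
print(sorted(hubs))
```

Requiring agreement across all methods is conservative: a gene that ranks highly under most methods but not all of them is excluded.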
Cox analysis and LASSO regression analysis. The TCGA data for GC, with corresponding clinical features, were downloaded from the TCGA database (https://tcgadata.nci.nih.gov/tcga/) and contained 375 tumor tissue samples and 32 paracancerous tissue samples. Univariate and multivariate Cox proportional hazards models were used to calculate the hazard ratio (HR) and 95% confidence interval (CI) for the DEGs and clinical features, with the survival package used for statistical analysis. The univariate Cox model was used to assess the relationship between clinical features and survival rates. The multivariate Cox model was used to evaluate how gene expression and the clinical factors affect overall survival (OS). p < 0.05 was set as the threshold. LASSO regression analysis was applied to further confirm the selected hub genes. When fitting the regression model, LASSO penalizes the coefficients of the predictor variables and shrinks the coefficient estimates toward zero, which makes the model easier to interpret. The screening of related variables can be done with the cv.glmnet and glmnet functions in R.
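For reference, one common formulation of an L1-penalized (LASSO) Cox model is given below. This is a sketch under assumptions: the text does not state which model family was passed to glmnet, so the Cox family is assumed here, and tied event times are ignored. The symbols are introduced for illustration only: x_i is the covariate vector of patient i, \delta_i the event indicator, R(t_i) the set of patients still at risk at time t_i, p the number of covariates, and \lambda the penalty parameter.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \Big[ x_i^{\top}\beta
  - \log \sum_{k \in R(t_i)} \exp\!\big( x_k^{\top}\beta \big) \Big],
\qquad
\hat{\beta}(\lambda) = \arg\max_{\beta}
  \Big\{ \ell(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \Big\}
```

Software may scale the log partial likelihood differently (for example, by the sample size), which changes the numerical value of \lambda but not the form of the penalty. Larger values of \lambda set more coefficients exactly to zero, and cross-validation is a standard way to choose \lambda.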