Screening and predicted value of potential biomarkers for breast cancer using bioinformatics analysis

Breast cancer is the most common cancer and the leading cause of cancer-related deaths in women. An increasing number of molecular targets have been discovered for breast cancer prognosis and therapy. However, there is still an urgent need to identify new biomarkers. Therefore, we evaluated biomarkers that may aid the diagnosis and treatment of breast cancer. We searched three mRNA microarray datasets (GSE134359, GSE31448 and GSE42568) and identified differentially expressed genes (DEGs) by comparing tumor and non-tumor tissues using GEO2R. Functional and pathway enrichment analyses of the DEGs were performed using the DAVID database. The protein–protein interaction (PPI) network was constructed with STRING and visualized using Cytoscape. Module analysis of the PPI network was performed using MCODE. The associations between the identified genes and overall survival (OS) were analyzed using an online Kaplan–Meier tool. Redundancy analysis was conducted using DepMap. Finally, we verified the screened HUB genes at the protein level. A total of 268 DEGs were identified, which were mostly enriched in cell division, cell proliferation, and signal transduction. The PPI network comprised 236 nodes and 2132 edges. Two significant modules were identified in the PPI network. Elevated expression of the genes discs large-associated protein 5 (DLGAP5), aurora kinase A (AURKA), ubiquitin-conjugating enzyme E2 C (UBE2C), ribonucleotide reductase regulatory subunit M2 (RRM2), kinesin family member 23 (KIF23), kinesin family member 11 (KIF11), non-structural maintenance of chromosome condensin 1 complex subunit G (NCAPG), ZW10 interactor (ZWINT), and denticleless E3 ubiquitin protein ligase homolog (DTL) was associated with poor OS in breast cancer patients. The enriched functions and pathways included cell cycle, oocyte meiosis and the p53 signaling pathway. The DEGs in breast cancer have the potential to become useful targets for the diagnosis and treatment of breast cancer.

Protein level verification. We visualized the selected HUB genes through UALCAN 29, using protein expression data for 18 normal and 125 breast cancer samples from CPTAC (Office of Cancer Clinical Proteomics Research, https://proteomics.cancer.gov/programs/cptac).

Screening of DEGs.
A total of 1529, 1550, and 2188 DEGs were identified from the GSE134359, GSE31448, and GSE42568 datasets, respectively. Of these, 268 genes were present in all three datasets (Fig. 1A): 89 genes were consistently upregulated and 179 genes were consistently downregulated in all three datasets. The top 22 DEGs, selected by the criteria |log2 FC| > 3 and adjusted P < 0.05, are shown on the heatmap (Fig. 1B).
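The overlap step can be illustrated with a minimal sketch. It assumes each dataset has already been reduced to per-gene log2 fold changes and adjusted P values (for example, from a GEO2R export); the gene names, values, and thresholds below are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, List, NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float
    adj_p: float


def select_degs(rows: Iterable[GeneStat], min_abs_log2_fc: float,
                max_adj_p: float) -> Dict[str, str]:
    """Map each gene passing both thresholds to 'up' or 'down'."""
    degs = {}
    for row in rows:
        if abs(row.log2_fc) > min_abs_log2_fc and row.adj_p < max_adj_p:
            degs[row.gene] = "up" if row.log2_fc > 0 else "down"
    return degs


def common_degs(per_dataset: List[Dict[str, str]]) -> Dict[str, str]:
    """Keep genes that are DEGs with the same direction in every dataset."""
    first = per_dataset[0]
    shared = set(first)
    for degs in per_dataset[1:]:
        shared &= set(degs)
    return {g: first[g] for g in sorted(shared)
            if all(d[g] == first[g] for d in per_dataset)}


# Hypothetical toy input: three datasets, four placeholder genes.
toy = [
    [GeneStat("GENE_A", 2.5, 0.001), GeneStat("GENE_B", -1.8, 0.01),
     GeneStat("GENE_C", 0.2, 0.5), GeneStat("GENE_D", 1.5, 0.03)],
    [GeneStat("GENE_A", 3.1, 0.002), GeneStat("GENE_B", -2.2, 0.02),
     GeneStat("GENE_D", -1.2, 0.01)],
    [GeneStat("GENE_A", 1.9, 0.01), GeneStat("GENE_B", -1.1, 0.04),
     GeneStat("GENE_C", 2.0, 0.01), GeneStat("GENE_D", 1.3, 0.02)],
]
# The thresholds here (1.0 and 0.05) are illustrative assumptions only.
shared = common_degs([select_degs(rows, 1.0, 0.05) for rows in toy])
print(shared)
```

In this toy input, GENE_D passes the thresholds in every dataset but changes direction in one of them, so it is excluded from the consistent set.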
GO and KEGG pathway enrichment analysis. GO enrichment and KEGG pathway analyses were performed on the DEGs using the DAVID database. GO enrichment analysis covers three aspects: biological process, cellular component and molecular function (Fig. 2A). The upregulated genes were mainly related to mitotic cytokinesis, mitotic spindle assembly and microtubule-based movement, whereas the downregulated genes were mainly involved in cell adhesion, the response to mechanical stimuli and the response to glucose. The KEGG pathway analysis showed that the genes upregulated in tumors were enriched in cell cycle, oocyte meiosis and the p53 signaling pathway, while the downregulated genes were enriched in the PPAR signaling pathway, the AMPK signaling pathway, tyrosine metabolism and pathways in cancer, among others (Fig. 2B).
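The idea behind over-representation analysis can be sketched as a hypergeometric tail probability. DAVID itself reports a modified Fisher exact P value (the EASE score), so the function below is only a simplified stand-in, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_annotated: int,
                           n_degs: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, of which n_annotated carry the term."""
    upper = min(n_annotated, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated to a term,
# 3 DEGs, of which 2 carry the annotation.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p_value, 4))
```

A small P value indicates that the term is annotated to more DEGs than would be expected by chance. In practice the P values for all tested terms would also be corrected for multiple testing.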
PPI network construction and module selection. The PPI network was constructed from the 268 identified DEGs. The network contained densely connected regions, that is, modules of genes closely related to breast cancer (HUB genes). A total of 236 nodes and 2132 edges were selected to plot the PPI network, which consisted of 87 upregulated genes and 149 downregulated genes (Fig. 3A). Subsequently, a pivotal module of 53 genes (including CDK1, KIF11, DLGAP5 and KIF4A) was identified using MCODE with degree ≥ 10 as the cut-off value (Fig. 3B). Another important module of 8 genes, including both upregulated and downregulated genes, was also identified (Fig. 3C). The top 10 HUB genes were identified by cytoHubba (top 10 genes ranked by MCC). GO and KEGG analyses of these ten genes were conducted. The HUB genes are related to cell division and mitotic cytokinesis in Biological Process; spindle, nucleus and spindle microtubule in Cellular Component; and protein kinase binding and ATP binding in Molecular Function (Fig. 3D). They are also enriched in cell cycle, oocyte meiosis and the p53 signaling pathway, among others (Fig. 3E).
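The MCC ranking can be sketched in a few lines, assuming the usual definition of maximal clique centrality: the score of a node is the sum of (|C| - 1)! over all maximal cliques C that contain it. This is not the cytoHubba implementation; the edge list and node names are hypothetical, and the plain Bron-Kerbosch enumeration used here is only practical for small networks.

```python
from collections import defaultdict
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


def top_hubs(scores: Dict[str, int], n: int) -> List[str]:
    """Highest-scoring nodes; ties are broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:n]


# Hypothetical toy network: a triangle G1-G2-G3 plus a pendant node G4.
toy_adj = build_adjacency([("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "G4")])
toy_scores = mcc_scores(toy_adj)
print(top_hubs(toy_scores, 2))
```

In the toy network, G3 belongs to the triangle and to the edge G3-G4, so it receives the highest score.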
Survival and redundancy analyses. The ten HUB genes in the PPI network were evaluated for their prognostic value using the Kaplan-Meier plotter. The OS of breast cancer patients was compared according to the expression level of each gene (low vs. high), and all 10 genes showed potential for predicting survival based on their expression. As shown in Fig. 4, elevated expression of these genes was associated with poor OS. Analyzing the role of the HUB genes in breast cancer cell survival is important, because essential genes are potential therapeutic targets. We therefore analyzed the function of the HUB genes using the online DepMap tool, which is based on CRISPR and siRNA screening data. Two genes (KIF11, RRM2) are common essential in both CRISPR knockout and RNAi screens, and six genes (AURKA, CCNB1, DTL, KIF23, NCAPG, ZWINT) are common essential only in CRISPR knockout screens, indicating that these genes are not only diagnostic markers but also potential therapeutic targets (Fig. 5).
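The low vs. high comparison rests on the Kaplan-Meier estimator, which can be sketched as follows. The patient records and the cutoff are hypothetical, assigning values equal to the cutoff to the high group is an assumption, and the sketch does not reproduce the online tool's automatic cutoff selection or its log-rank test.

```python
from typing import List, Sequence, Tuple

# (expression, follow-up time, event), with event 1 = death and 0 = censored
Sample = Tuple[float, float, int]


def split_by_cutoff(samples: Sequence[Sample], cutoff: float):
    """Split patients into low and high expression groups of (time, event)."""
    low = [(t, e) for x, t, e in samples if x < cutoff]
    high = [(t, e) for x, t, e in samples if x >= cutoff]
    return low, high


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one death."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical cohort of six patients and an assumed cutoff of 5.0.
cohort = [(2.1, 60, 0), (3.4, 48, 1), (4.0, 72, 0),
          (6.5, 12, 1), (7.2, 30, 1), (8.8, 24, 0)]
low, high = split_by_cutoff(cohort, 5.0)
for label, group in (("low", low), ("high", high)):
    curve = kaplan_meier([t for t, _ in group], [e for _, e in group])
    print(label, curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at times where a death occurs.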

Discussion
Despite recent progress in the treatment of breast cancer, it has remained the most common cause of cancer-related deaths in women in the past few years. The high mortality rate of breast cancer is partly due to the lack of adequate screening methods with high sensitivity and specificity. Therefore, it is necessary to identify potential biomarkers for screening and early diagnosis of breast cancer. Microarray technologies and next-generation sequencing have become key tools for providing comprehensive genetic information on breast cancer samples and revealing the changes that occur during disease progression. In this study, we used established online bioinformatics tools to investigate possible biomarkers for the diagnosis of breast cancer. We identified a total of 268 DEGs common to all three GEO datasets, which included 89 upregulated genes and 179 downregulated genes.
The upregulated genes were mainly involved in three pathways, namely cell cycle, oocyte meiosis and the p53 signaling pathway, which are closely associated with cancer. The downregulated genes were mainly enriched in three other pathways: cell adhesion, the response to mechanical stimuli and the response to hormonal hypoxia. Among the identified DEGs, 87 showed high degrees in the PPI network. Further analysis revealed that the following 10 DEGs within these modules were closely associated with a shorter survival time of breast cancer patients: DLGAP5, AURKA, UBE2C, CCNB1, RRM2, KIF23, KIF11, NCAPG, ZWINT and DTL.
DLGAP5 is involved in Aurora A signaling and in the pathway by which the neurogenic locus notch homolog protein 3 (NOTCH3) intracellular domain regulates transcription. DLGAP5 overexpression is associated with poor prognosis of breast cancer 30. DLGAP5 is also associated with the prognosis of colorectal cancer, prostate cancer, and non-small cell lung cancer (NSCLC) [31][32][33][34][35]. One study identified the mitotic apparatus organizing protein DLGAP5 (HURP/DLG7) as a critical target of NOTCH3 signaling 36. DLGAP5, which is regulated by nucleolar and spindle associated protein 1 (NUSAP1), is associated with the proliferation, migration and invasion of invasive breast cancer 37. DLGAP5, required for AURKA-dependent, centrosome-independent mitotic spindle assembly, is essential for the survival and proliferation of SMARCA4/BRG1 mutant 38. One subpopulation of prostate cancer was associated with enhanced expression of DLGAP5 and decreased dependence upon androgen receptor signaling 39.
Figure. Prognostic estimation of the top 10 HUB genes. The top 10 HUB genes, including ZWINT, DLGAP5, DTL, NCAPG, CCNB1, AURKA, KIF23, KIF11, RRM2 and UBE2C, were identified by cytoHubba, followed by survival analysis. Breast cancer patients were divided into two groups according to the auto-selected best cutoff. Low, patients with gene expression lower than the best cutoff; high, patients with gene expression higher than the best cutoff.
AURKA plays an important role in cell cycle progression by promoting cell entry into mitosis, and is associated with an increased risk of developing breast cancer. AURKA can translocate to the nucleus and enhance the phenotype of breast cancer stem cells, promoting unique oncogenic properties in malignant cells 40. It has been reported that AURKA regulates the phenotype of breast cancer stem cells by stabilizing Drosha mRNA through m6A modification 41.
In addition, AURKA plays an important role in the treatment of drug-resistant breast cancer 42, and an Aurora kinase A inhibitor has been evaluated for safety and activity in a five-arm phase 2 study 43.
UBE2C participates in ubiquitination (Ub) mediated by the anaphase-promoting complex/cyclosome (APC/C) 44. High expression of UBE2C in breast cancer was reported to be an independent prognostic factor associated with an increased risk of disease recurrence and death. Thus, it is considered a potential therapeutic target for breast cancer [45][46][47].
Cyclin B1, the protein encoded by CCNB1, is a regulatory protein involved in mitosis. It is necessary for proper control of the G2/M transition of the cell cycle. One study showed that cyclin B1 and B2 transgenic mice are highly prone to tumors, including tumor types in which B-type cyclins serve as prognosticators 48. CCNB1 is associated with radiosensitivity in colorectal cancer 49. CCNB1 can also affect cavernous sinus invasion in pituitary adenomas through the epithelial-mesenchymal transition 50.
The gene RRM2 encodes ribonucleotide reductase regulatory subunit M2, one of two non-identical subunits of ribonucleotide reductase. One study reported that RRM2 acetylation at K95 suppresses tumor cell growth in vitro and in vivo, suggesting a potentially attractive strategy for cancer therapy 51. In a study that searched the GEO database for miRNA-mRNA or lncRNA-mRNA pairs as novel biomarkers for breast cancer, the miR-21/RRM2 axis was identified as a candidate biomarker for the diagnosis and treatment of breast cancer 52. Another study showed that a lincRNA, lincNMR, regulates tumor cell proliferation through a YBX1-RRM2-TYMS-TK1 axis governing nucleotide metabolism 53. In addition, RRM2 was reported to be associated with the prognosis of prostate cancer 54.
Kinesin family member 23, the protein encoded by KIF23, is a member of the kinesin-like protein family and is also known as MKLP1. MKLP1/KIF23 is the kinesin component of the centralspindlin complex 55. It was reported that KIF23 expression is high in the majority of primary and metastatic lung cancer tissues and cell lines, and that it is associated with poor survival 56. In a study that examined the association between members of the kinesin family and breast cancer, KIF23 and KIF11 were found to be associated with poor prognosis 57. KIF23 is regulated through the Wnt signaling pathway and is associated with recurrence of hepatocellular carcinoma 58.
Kinesin family member 11, the protein encoded by KIF11, is another member of the kinesin-like protein family. According to an Oncomine analysis of GEO and TCGA databases, KIF11 is a proto-oncogene associated with breast cancer and is significantly associated with poor prognosis 59. KIF11 is also regulated through the Wnt signaling pathway and is associated with recurrence of hepatocellular carcinoma.
NCAPG is a potential prognostic marker in HER2+ breast cancer, as well as a therapeutic target for effectively overcoming trastuzumab resistance 60. NCAPG has also been identified as a key gene in triple-negative breast cancer 61 and in hepatocellular carcinoma 62. Furthermore, it was reported that high expression of NCAPG is associated with poor prognosis in various tumor types, and that its overexpression may play an important role in regulating tumor-related pathways during tumor growth 63.
Currently, little is known about the role of ZW10 interactor (ZWINT) in breast cancer. Denticleless E3 ubiquitin protein ligase homolog (DTL) is associated with proliferation and appears to be a promising molecular therapeutic target in breast cancer 64. DTL may also be associated with poor prognosis of acral melanoma and gastric carcinoma 65,66.
Based on the redundancy analysis, two genes, KIF11 and RRM2, may serve as therapeutic targets or prognostic indicators. Protein-level verification confirmed that both genes are also differentially expressed. However, predicted data can differ considerably from clinical data, and the survival data derived from the Kaplan-Meier tool need to be validated. Future studies should pay more attention to clinical data from breast cancer patients. Breast cancer has many tumor subtypes, and it is necessary to define the biomarker characteristics of each subtype. In our future study, we intend to recruit a cohort of breast cancer patients to investigate the sensitivity and specificity of these biomarkers for early screening of breast cancer; the results should facilitate the clinical application of these biomarkers for the diagnosis of breast cancer.

Data availability
The datasets are available from the GEO database.