Introduction

Pancreatic cancer, one of the most malignant tumors of the digestive system, kills more than 200,000 people each year, making it the seventh leading cause of cancer death worldwide1. Its annual number of deaths nearly equals its number of new cases, and rates are highest in Europe and North America2. Among all cancer types, pancreatic cancer has the lowest 5-year survival rate, at 3%-15%, and by 2030 it is projected to become the second leading cause of cancer-related death in the United States3,4. Pancreatic ductal adenocarcinoma (PDAC) accounts for over 90% of pancreatic malignancies, making it by far the most prevalent type of pancreatic cancer5,6. Compared with lung, breast, colorectal and gastric cancers, PDAC has a lower incidence but higher mortality. The clinical prognosis of PDAC is generally poor, with one-year and five-year survival rates of only 24% and 9%, respectively7,8. Genetics, smoking, a high-fat diet and chronic pancreatitis are all closely related to the occurrence of PDAC9. At present, surgical resection remains the preferred option for the treatment of PDAC. However, because pancreatic cancer develops insidiously, most patients are diagnosed too late, when the tumor has already invaded surrounding tissue and formed distant metastases, which reduces the effect of surgical treatment10. Furthermore, the results of postoperative chemotherapy are unsatisfactory because of drug resistance11. Therefore, finding specific markers for early diagnosis and treatment is of great significance for improving the prognosis and survival rate of PDAC patients.

Over the past few years, high-throughput sequencing and gene chip technology have been applied in many fields of biology and medicine. For example, they have been used to discover gene variants and methylation modifications closely related to tumor progression, which helps to classify tumors according to histology and clinical data and to identify cancer-related genes and biological pathways12,13,14. Bioinformatics, an emerging discipline that integrates mathematics and biology, has made large-scale analysis of microarray data more convenient and effective by extracting gene expression profile information. It offers an effective way to systematically screen tumor-related genes and to explore the underlying molecular mechanisms15,16,17.

In this study, differentially expressed genes (DEGs) between normal and PDAC tissues were identified from datasets in the Gene Expression Omnibus (GEO) database. After Gene Ontology (GO) functional enrichment analysis and expression validation, four genes were considered potential biomarkers for predicting the survival of patients with pancreatic cancer.

Results

Identification of DEGs in pancreatic carcinoma

In the present study, we selected four GEO datasets covering 65 pancreatic ductal adenocarcinoma tissues and 50 normal pancreatic tissues. Using |logFC| ≥ 1 and P < 0.05 as the cut-off criteria, we obtained 1379, 1082, 730 and 4634 up-regulated genes and 203, 786, 3445 and 2470 down-regulated genes in GSE15471, GSE32688, GSE46234 and GSE46385, respectively. Using the Venn diagram software, we found 115 DEGs common to all four datasets, comprising 108 up-regulated and 7 down-regulated genes (Fig. 1 and Table 1).

Figure 1

Selection of 115 common DEGs from four datasets (GSE15471, GSE32688, GSE46234 and GSE46385). (A–D) Volcano plots of DEGs from the four datasets; (E) 108 DEGs are up-regulated (logFC ≥ 1); (F) 7 DEGs are down-regulated (logFC ≤ −1).

Table 1 A total of 115 common DEGs were screened from four profile datasets, including 108 up-regulated and 7 down-regulated genes in PDAC tissues compared to normal tissues.

Functional enrichment analysis of the DEGs

To gain insight into the functional properties of the DEGs, we performed gene functional analysis via DAVID and identified 115 significantly enriched categories: 57 in biological process (BP), 37 in cellular component (CC) and 21 in molecular function (MF). As shown in Fig. 2 and Supplementary Tables 1–3, in terms of BP the DEGs were mostly clustered in extracellular matrix organization, cell adhesion and cell migration. With regard to CC, the DEGs were particularly enriched in extracellular matrix, extracellular region, extracellular space and focal adhesion. As for MF, the DEGs were strongly enriched in protein binding, integrin binding, laminin binding and extracellular matrix structural constituent. Pathway analysis revealed that the DEGs were enriched in 11 pathways, including ECM-receptor interaction, the PI3K/Akt signaling pathway, focal adhesion and the p53 signaling pathway (Fig. 2D and Supplementary Table 4).

Figure 2

The top 20 GO terms and significantly enriched KEGG pathways. (A) BP; (B) CC; (C) MF; (D) KEGG pathways. The Y-axis shows the significantly enriched items and the X-axis shows the degree of enrichment; P-values are indicated by the color of the dots, and the size of the dots represents the number of genes enriched in the GO terms and KEGG pathways.

PPI network establishment and module analysis

Based on the STRING database, we built the PPI network from the 108 up-regulated and 7 down-regulated DEGs and visualized it with Cytoscape; the resulting network contained 113 nodes and 212 edges. Furthermore, we conducted cluster analysis and obtained 15 central nodes, all of which were up-regulated (Fig. 3 and Table 2).

Figure 3

PPI network construction and module analysis using STRING and Cytoscape. (A) PPI network; (B) Module 1; (C) Module 2. Every node represents a protein; edges represent protein interactions; red circles indicate up-regulated DEGs, while blue ones indicate down-regulated DEGs.

Table 2 Fifteen central genes selected from the PPI network using STRING and Cytoscape.

Selection of hub genes and validation of the expression levels

Using the GEPIA online tool, we investigated the effect of the 15 central genes on the overall survival of pancreatic cancer patients, as well as the expression levels of these genes. The data suggested that high expression of 11 genes, namely ASPM, CCNB1, CDK1, CENPK, DDX60, MELK, MX1, OAS1, OAS3, PTTG1 and TOP2A, was significantly associated with shorter overall survival in pancreatic cancer patients (Fig. 4), and that these genes were expressed at significantly higher levels in tumor tissues than in normal tissues (Fig. 5A–K). ASPM, CCNB1, CDK1, MELK, OAS3, PTTG1 and TOP2A were also significantly associated with shorter disease-free survival (Fig. 6). We performed the same survival analysis on the 15 genes using UALCAN and found that nine genes were significantly associated with the survival of pancreatic cancer patients (Fig. 7), whereas in the ENCORI pan-cancer analysis only CCNB1 expression differed significantly between tumor and normal tissues (Fig. 5L). Therefore, we took CCNB1 as the candidate target gene for mRNA-level detection. The qPCR assay indicated that CCNB1 expression was markedly increased in PANC-1, SW1990 and BxPC-3 cells compared with HPDE6-C7 cells (Fig. 8A). Furthermore, immunohistochemical results and patient clinical data obtained from the HPA database showed significantly higher in situ expression of CCNB1 in pancreatic cancer tissues than in normal pancreatic tissues (Fig. 8B).

Figure 4

Analysis of correlation between the expression of central genes and the overall survival of PDAC patients via GEPIA.

Figure 5

Expression analysis of central genes. (A–K) Expression analysis via GEPIA; (L) expression analysis via ENCORI.

Figure 6

Analysis of correlation between the expression of central genes and the disease-free survival of PDAC patients via GEPIA.

Figure 7

Analysis of correlation between the expression of central genes and the overall survival of PDAC patients via UALCAN.

Figure 8

Expression validation of CCNB1 at the mRNA and protein levels. (A) Relative mRNA expression of CCNB1 in PANC-1, SW1990 and BxPC-3 cells compared with HPDE6-C7 cells. (B) IHC staining of CCNB1 in normal and tumor tissues from the HPA database. The antibody used is CCNB1 (CAB003804). * indicates P < 0.05, ** indicates P < 0.01, *** indicates P < 0.001, **** indicates P < 0.0001.

Discussion

PDAC is one of the most aggressive and deadliest solid malignancies, with increasing incidence and mortality in recent years18. Evidence suggests that, within the next 10 years, PDAC will become the second leading cause of cancer death19. Without any treatment, PDAC patients have been reported to survive an average of 4 months. Because PDAC is highly aggressive, its treatment options are limited and it is often detected late, survival time has not been significantly prolonged even with treatment20. Therefore, early accurate diagnosis and effective targeted therapy for pancreatic cancer are particularly important.

Previous studies have identified several biomarkers associated with pancreatic cancer. In this study, we selected four datasets from the same platform, GPL570, all specific to the PDAC type of pancreatic cancer, in order to ensure the uniformity and reliability of the data analysis. We screened 108 up-regulated and 7 down-regulated DEGs, which were significantly enriched in 8 important pathways, including ECM-receptor interaction and the PI3K/AKT signaling pathway. Abnormal signaling pathways are important hallmarks of tumors and are crucial to their occurrence and development. The extracellular matrix (ECM) exists in both the basement membrane and the interstitial matrix. The main components of the basement membrane are collagen IV and laminin, which separate the epithelial or endothelial cell layer from the connective tissue layer21,22. The expression level of laminin β3 (LAMB3) is up-regulated in the ECM of many tumor tissues, including pancreatic, lung and colon cancers23,24. Furthermore, Zhang et al. found that inhibition of LAMB3 counteracted the cell proliferation, invasion and migration caused by activation of the PI3K/AKT signaling pathway in PDAC24. As a major component of the ECM, collagen plays a crucial role in processes within the PDAC tumor microenvironment, such as cell adhesion, migration, ECM remodeling and epithelial-mesenchymal transition (EMT)25. A wound-healing experiment showed that collagen type I is important in PDAC cell migration and metastasis26. During the malignant transformation from normal pancreas to PDAC, collagen type I induces MMP-2/9 activity and stimulates tumor invasion. A recent study demonstrated that serum levels of hyaluronan and the propeptide of type III collagen were higher at baseline in PDAC patients than in healthy subjects and were associated with poor survival27.

The PI3K-regulated signaling network can sense dynamic signals from the tumor microenvironment (TME) and can directly promote a variety of oncogenic processes or activate parallel, interconnected signaling nodes28. A growing body of research has confirmed that increased activation of the PI3K signaling pathway is closely related to poor overall survival in PDAC patients. Overexpression of AKT has been observed in 10–20% of PDAC patients, and AKT hyperphosphorylation and increased AKT activity have been found in approximately 60% of PDAC samples29. Studies have suggested that PI3K inhibitors combined with small-molecule attenuators of the downstream effector could effectively prevent the progression of PDAC. Urolithin A has been shown to inhibit the PI3K/AKT/mTOR signaling pathway and thereby reprogram the fibroinflammatory tumor stroma: it reduces immunosuppressive tumor-associated macrophages (TAMs) and increases the recruitment of T cells in the TME of PDAC, promoting an anti-tumor immune microenvironment30. Taken together, the pathway enrichment analysis in this study is consistent with the results of previous research.

Thereafter, we constructed the PPI network via STRING and Cytoscape and screened 11 candidate genes: ASPM, CDK1, CENPK, DDX60, MELK, MX1, OAS1, OAS3, PTTG1, TOP2A and CCNB1. We performed overall survival analysis on the candidate genes using GEPIA and UALCAN separately, took the intersection, and found that a total of 9 genes were associated with the survival and prognosis of PDAC patients. The GEPIA analysis showed that the expression levels of the 11 candidate genes were higher in PDAC tissues than in normal tissues, but the ENCORI data indicated that only CCNB1 expression differed significantly between PDAC and normal tissues. Subsequently, we examined the mRNA expression of CCNB1 in pancreatic cancer cell lines and its protein level via the HPA database. The results showed that CCNB1 expression was higher in the PDAC group than in the normal group at both the mRNA and protein levels. Given these results, we consider that CCNB1 may play an important role in PDAC tumorigenesis and progression.

As a member of the cyclin family, CCNB1 is an important cell cycle regulator involved in the regulation of the G2/M checkpoint. It interacts with cyclin-dependent kinase 1 (CDK1) to form a complex that phosphorylates its substrates and ensures that cells progress from the G1/S phase into the G2/M phase to promote mitosis31,32,33. Overexpression of CCNB1 has been reported in several tumors, such as renal, liver and breast cancers34,35,36. The oncogene c-myc can activate the transcription of the m7G methyltransferase WDR4, which enhances the translation of CCNB1 through transcriptional regulation, thereby promoting the progression of hepatocellular carcinoma36. 5MeOIndox can induce G2/M arrest of PDAC cells by reducing CDK1/CCNB1 levels, thereby leading to apoptosis37. An analysis of 107 samples showed that CCNB1 was associated with poor prognosis in patients with pancreatic neuroendocrine tumors38. Many studies have shown that CCNB1 is involved in the p53 signaling pathway. Silencing of CCNB1 can inhibit proliferation, decrease the proportion of S-phase cells and the expression level of MDM2, induce apoptosis and senescence, and increase the proportion of G0/G1-phase cells, suggesting that these effects may be caused by activation of the p53 signaling pathway.

Silencing of CCNB1 can inhibit the proliferation and promote the senescence of Capan-2 cells via activation of the p53 signaling pathway39. CCNB1 can promote the phosphorylation of PI3K and AKT in liver cancer and reduce p53 protein expression by promoting p53 ubiquitination36. This is consistent with the PI3K/AKT and p53 signaling pathways identified in the enrichment analysis above.

Our study has certain limitations, and further exploration is needed. On the one hand, only four datasets were analyzed, and the findings should be validated in more samples in the future. On the other hand, experiments such as knockdown assays should be performed to determine the mechanisms of these hub genes in PDAC progression.

Materials and methods

Collection of data

We downloaded the gene expression profiles of PDAC and normal (or adjacent) samples from the GEO database, a public repository that freely stores expression profiles, sequencing data and more, from which researchers can easily access, download and process raw data. Four datasets were selected: GSE15471, GSE32688, GSE46234 and GSE46385. They contain, respectively, 36 matched pairs of pancreatic tumor and adjacent non-tumor tissues; 25 pancreatic tumors and 7 normal pancreas tissues; 2 pancreatic tumors and 4 adjacent non-tumor tissues; and 2 pancreatic tumors and 3 adjacent non-tumor tissues. All datasets were based on the GPL570 platform, and all tumor tissues were pathologically identified as PDAC40,41,42,43 (Table 3).

Table 3 Information of datasets in the analysis of PDAC tissues vs. normal or adjacent tissues.

Identification of DEGs

GEO2R, an online analysis tool, was applied to compare groups and filter the DEGs present in the raw gene profile data. The cut-off criteria for statistical significance were |logFC| ≥ 1 and P value < 0.05, where the logFC of up-regulated genes was ≥ 1 and the logFC of down-regulated genes was ≤ −1. The online Venn tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to screen and display the DEGs shared by the four datasets.
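
The filtering and intersection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and values are invented toy data, the helper names `split_degs` and `common_degs` are hypothetical, and the sketch assumes that a common DEG must pass the cut-offs in the same direction in every dataset, which matches the separate up-regulated and down-regulated counts reported but is not spelled out in the text.

```python
from typing import Dict, Set, Tuple

# One dataset: gene symbol -> (logFC, P value). All values below are invented toy data.
DegTable = Dict[str, Tuple[float, float]]


def split_degs(table: DegTable, lfc_cut: float = 1.0,
               p_cut: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) gene sets using |logFC| >= lfc_cut and P < p_cut."""
    up, down = set(), set()
    for gene, (logfc, p_value) in table.items():
        if p_value >= p_cut:
            continue  # not significant
        if logfc >= lfc_cut:
            up.add(gene)
        elif logfc <= -lfc_cut:
            down.add(gene)
    return up, down


def common_degs(tables: Dict[str, DegTable]) -> Tuple[Set[str], Set[str]]:
    """Intersect the up and down sets across all datasets (the Venn overlap)."""
    if not tables:
        return set(), set()
    ups, downs = [], []
    for table in tables.values():
        up, down = split_degs(table)
        ups.append(up)
        downs.append(down)
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical two-dataset example; GENE_C fails the cut-offs in the first dataset.
toy_tables = {
    "dataset_1": {"GENE_A": (2.1, 0.001), "GENE_B": (-1.5, 0.01), "GENE_C": (0.4, 0.2)},
    "dataset_2": {"GENE_A": (1.3, 0.03), "GENE_B": (-1.0, 0.04), "GENE_C": (1.8, 0.001)},
}
common_up, common_down = common_degs(toy_tables)
print(sorted(common_up), sorted(common_down))
```

In practice the per-dataset tables would come from the GEO2R result files rather than from literals.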

Enrichment analysis of DEGs via Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway

To further investigate the functions of the DEGs and the relationships among them, the Database for Annotation, Visualization and Integrated Discovery (DAVID 2021 update, http://david.ncifcrf.gov)44 was applied to perform enrichment analysis covering Biological Process (BP), Cellular Component (CC), Molecular Function (MF) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. A P-value < 0.05 was considered statistically significant.
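
To illustrate what an enrichment P-value measures, the sketch below computes a plain one-sided hypergeometric tail probability for a single term. It is not DAVID's implementation: DAVID reports its own, more conservative statistic (the EASE score), so values from this sketch would not match DAVID's output exactly. The function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from a background of N genes,
    of which K are annotated to the term, and X is the number of
    annotated genes among the n drawn."""
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 12 of 115 DEGs fall in a term with 300 annotated genes,
# against an assumed background of 20,000 genes.
p_value = hypergeom_enrichment_p(k=12, n=115, K=300, N=20000)
print(f"enrichment P = {p_value:.3g}")
```

A term would then be kept if its P-value falls below the 0.05 threshold stated above.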

Construction of the PPI network and analysis of the module

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, https://cn.string-db.org/cgi/input.pl)45 was used to build the PPI network in order to analyze the relationships among the proteins encoded by the DEGs and to further clarify the specific relationship between genes and disease. Cytoscape (version 3.8.2)46 was used to analyze and visualize the PPI network. Molecular Complex Detection (MCODE), a Cytoscape plug-in, was used to decipher the protein associations in the network and to identify gene clusters (highly interconnected regions), with the following criteria: node score cutoff = 0.2, degree cutoff = 2, k-core = 2 and max. depth = 100.
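
The k-core setting above can be made concrete with a small sketch. A k-core is the largest subnetwork in which every node has at least k neighbors, obtained by repeatedly removing nodes of lower degree. The code below shows only this pruning idea on a toy edge list; it is not the MCODE algorithm, which additionally weights nodes and grows clusters from seed nodes. The function name and node labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def k_core(edges: Iterable[Tuple[str, str]], k: int = 2) -> Dict[str, Set[str]]:
    """Return the adjacency map of the k-core of an undirected network."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adjacency[u].add(v)
            adjacency[v].add(u)
    adjacency = dict(adjacency)

    # Repeatedly strip nodes with fewer than k remaining neighbors.
    changed = True
    while changed:
        changed = False
        low_degree = [node for node, nbrs in adjacency.items() if len(nbrs) < k]
        for node in low_degree:
            for neighbor in adjacency.pop(node):
                adjacency[neighbor].discard(node)
            changed = True
    return adjacency


# Toy network: a triangle (P1, P2, P3), a pendant node P4 and a separate pair P5-P6.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P5", "P6")]
core = k_core(toy_edges, k=2)
print(sorted(core))
```

With k = 2, only the triangle survives, which is the kind of densely connected region a module search starts from.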

Analysis of survival and expression validation of hub genes in TCGA dataset

The Gene Expression Profiling Interactive Analysis tool (GEPIA, http://gepia.cancer-pku.cn/)47 is a useful resource for analyzing data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Analyses of gene expression and interaction in normal and cancer tissues, of pathological stage and of prognosis were performed via GEPIA. UALCAN (http://ualcan.path.uab.edu/index.html)48 was applied to analyze the correlation between the survival prognosis of pancreatic cancer patients and hub gene expression. The Human Protein Atlas (HPA) (http://www.proteinatlas.org/) was used to validate the protein expression level of hub genes. The Encyclopedia of RNA Interactomes (ENCORI)49 (http://starbase.sysu.edu.cn/) is an open-source bioinformatics platform for studying RNA-RNA interactions, target prediction and signaling pathways, and for pan-cancer differential expression and survival analysis.
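
The survival comparisons performed by these web tools rest on grouping patients by gene expression and estimating a survival curve per group. The sketch below illustrates that idea with a Kaplan-Meier estimator and a median split. It is not the code behind GEPIA or UALCAN; the median cut-off is an assumption (the text does not state which cut-off was used), and the function names and patient data are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def split_by_median(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split patient IDs into (high, low) groups around the median expression."""
    cutoff = median(expression.values())
    high = [pid for pid, value in expression.items() if value > cutoff]
    low = [pid for pid, value in expression.items() if value <= cutoff]
    return high, low


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.
    events[i] is 1 for an observed death and 0 for a censored patient."""
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = removed = 0
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up times (months) and event flags for four patients.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
high_group, low_group = split_by_median({"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0})
print(toy_curve, high_group, low_group)
```

A log-rank test between the curves of the high and low groups would then give the P-value shown on such survival plots; that step is omitted here.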

Gene expression assay at mRNA level via quantitative real-time polymerase chain reaction

We extracted total RNA from cells using the EasyPure RNA Purification Kit (TransGen, China). Purity measurement confirmed that the A260/280 ratio was between 1.8 and 2.0 and that the concentration was 800-1000 ng/μl, both of which met the requirements for further experiments. Quantitative real-time PCR was performed with SYBR Green (Takara, Japan) on the LightCycler 96 Real-Time PCR System (Roche, Switzerland). All primers were designed with Primer 7.0 software, and the primer sequences are listed in Table 4. GAPDH was chosen as the internal reference.

Table 4 Primers used for real-time PCR analysis.
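
Relative mRNA expression against an internal reference is commonly computed with the 2^(−ΔΔCt) method. The sketch below assumes that method, which the text does not explicitly name, and uses invented Ct values; the function name is hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target gene in a sample versus a control,
    normalized to a reference gene, using 2^(-ddCt)."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control
    return 2.0 ** -(delta_sample - delta_control)


# Invented Ct values: target gene and reference gene (e.g. GAPDH) in a
# cancer cell line versus a control cell line.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold_change)  # 4.0 for these toy values
```

A fold change above 1 indicates higher expression in the sample than in the control; replicate measurements would be averaged before the statistical test.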

Statistical analysis

Each in vitro experiment was performed at least three times. SPSS 25.0 was used for the statistical analysis. P values of less than 0.05 (*) were considered statistically significant.

Conclusion

In summary, we used an integrative bioinformatics approach to screen 115 DEGs from four PDAC GEO datasets and identified CCNB1 as a gene that may serve as a potential diagnostic and therapeutic biomarker in PDAC patients. In the future, we will further study the molecular mechanisms underlying the occurrence and development of pancreatic cancer through biological experiments, by establishing cell models with stable gene silencing. These results may provide new strategies for the diagnosis, treatment and prognosis of PDAC patients.