GNG7 and ADCY1 as diagnostic and prognostic biomarkers for pancreatic adenocarcinoma through bioinformatic-based analyses

Pancreatic adenocarcinoma (PAAD) is one of the most lethal malignant tumors in the world. The GSE55643 and GSE15471 microarray datasets were downloaded to screen the diagnostic and prognostic biomarkers for PAAD. 143 downregulated genes and 118 upregulated genes were obtained. Next, we performed gene ontology (GO) and The Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis on these genes and constructed a protein–protein interaction (PPI) network. We screened out two important clusters of genes, including 13 upregulated and 5 downregulated genes. After the survival analysis, 3 downregulated genes and 10 upregulated genes were identified as the selected key genes. The KEGG analysis on 13 selected genes showed that GNG7 and ADCY1 enriched in the Pathway in Cancer. Next, the diagnostic and prognostic value of GNG7 and ADCY1 was investigated using independent cohort of the Cancer Genome Atlas (TCGA), GSE84129 and GSE62452. We observed that the expression of the GNG7 and ADCY1 was decreased in PAAD. The diagnostic receiver operating characteristic (ROC) analysis indicated that the GNG7 and ADCY1 could serve as sensitive diagnostic markers in PAAD. Survival analysis suggested that expression of GNG7, ADCY1 were significantly associated with PAAD overall survival (OS). The multivariate cox regression analysis showed that the expression of GNG7, ADCY1 were independent risk factors for PAAD OS. Our study indicated GNG7 and ADCY1 may be potential diagnostic and prognostic biomarkers in patients with PAAD.

tissue of PAAD patients and the paired adjacent non-malignant tissue. Next, we used Venn plots to query the overlapping DEGs obtained from GSE55643 and GSE15471. Then, the Database for Annotation, Visualization and Integrated Discovery (DAVID) database was used to perform the gene ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis on the overlapping DEGs 14 . We constructed the protein-protein interaction (PPI) network of DEGs via the Search Tool for the Retrieval of Interacting Genes (STRING) database to identify the highly interconnected protein molecules and visualized by Cytoscape software [15][16][17][18] . Finally, we investigated the association between the selected gene expression with clinical outcomes using independent PAAD patient data obtained from the Cancer Genome Atlas (TCGA) database, GSE84129, and GSE62452 datasets 19,20 . Through integrated bioinformatics approaches, we screened out two potentially valuable biomarkers for the diagnosis and prognosis of PAAD.

Methods
Acquisition and processing of microarray data. mRNA expression data from microarray files GSE55643 21 and GSE15471 22 were downloaded from the GEO database 23,24 . GSE55643 contains 45 primary and 8 adjacent non-malignant tissue samples from PAAD patients. GSE15471 contains 78 pairs of primary and paracancerous tissue samples from PAAD patients. We downloaded the normalized log2 ratio representing tumor/ normal of the GSE55643 and GSE15471 dataset and converted the probe identification numbers to gene symbols using the Whole Human Genome Microarray 4 · 44 K G4112F (Probe Name Version) and the Affymetrix Human Genome U133 Plus 2.0 Array, respectively.

Identification of DEGs.
We utilized the R software and built-in limma packages (version 3.22.1; Fred Hutchinson Cancer Research Center) to perform DEGs analysis between the primary tumor tissue of PAAD patients and the paired adjacent non-malignant tissue. DEGs must meet our two important criteria to be selected for the next stage of analysis: adjusted P value < 0.05, |log2 fold change (FC)|> 1.2 25 . KEGG and GO enrichment analysis of DEGs. The GO clarifies the functions of genes from three aspects: cellular component (CC), molecular function (MF), and biological process (BP) 26 . The KEGG provides data resources for understanding high-level functions and utilities of the biological metabolic pathways 27 . DAVID, a network server that provides a comprehensive set of functional annotation tools to determine the biological meaning of a large list of genes, was used for GO function annotation and KEGG pathway enrichment analysis 14 . P value < 0.05 was considered statistically significant.
Construction of PPI networks and identification of gene clusters. STRING database (version11.0), containing more than 2 billion interactions of 24.6 million proteins involving more than 5,000 organisms, was used to search for DEGs' encoded protein and PPI network information. We uploaded the overlapping DEGs from the microarray data GSE55643 and GSE15471 into the STRING database and the significance threshold was set to an interaction score of 0.900 (highest confidence). Then, we visualized the PPI network in Cytoscape software. Next, we used built-in Molecular Complex Detection (MCODE) of Cytoscape to screen out highly interconnected protein molecules as molecular clusters.

Survival and expression analysis of gene clusters and KEGG enrichment analysis. The Kaplan
Meier plotter is capable to assess the effect of 54 k genes (mRNA, miRNA, protein) on survival in 21 cancer types. Sources for the databases include GEO, EGA, and TCGA 19 . But for PAAD, the expression and survival data are all sources from the TCGA database (n = 177). The TNMplot database contains 56,938 unique multilevel quality-controlled samples of normal tissues, tumor tissues, and metastatic tissues from GEO, GTex/TCGA, and TARGET databases. We performed the Kaplan-Meier curve and log-rank test analyses to investigate the prognostic value of gene clusters in PAAD tissues. The TNMplot database (https:// www. tnmpl ot. com/) was used for the comparison of gene expression in normal, tumor and metastatic tissues of PAAD patients. Then, the member of gene clusters validated by the Kaplan-Meier plotters and TNMplot database were screened as selected genes. Finally, the DAVID was used to perform the KEGG enrichment analysis on the selected genes to explore the pathway that may regulate the occurrence and development of PAAD.
Diagnostic and prognostic significance of selected gene in TCGA and GEO database. We analyzed the expression of the selected genes in PAAD tissues and para carcinoma tissues from patients in the TCGA database and GEO database (GSE84129, and GSE62452 datasets). We performed the Kaplan-Meier curve and log-rank test and ROC analyses as well as multivariate Cox regression analysis to investigate the relationship between the expression of selected genes and the clinical outcome of PAAD patients. Statistical analysis. SPSS 21 (SPSS Inc., Chicago, IL, USA) was used to perform statistical analysis. Graph-Pad Prism 5.0 (GraphPad Software, Inc., San Diego, CA, USA) was used to draw figures. Pearson's chi-square test was used to compare the categorical variables. The student's t-test was used to analyze normally distributed continuous variables. Kaplan-Meier plots with the log-rank test were used to estimate survival differences. The diagnostic significance of selected genes for PAAD was evaluated by the ROC curve. P < 0.05 was considered statistically significant.

Identification of DEGs in PAAD.
We screened DEGs from 123 cases of primary tumor tissues using limma packages in R language, compared with 86 cases paired para-cancerous nonmalignant tissues. Using both adjusted P value < 0.05 and logFC (fold change) > 1.2 criteria, a total of 1228 genes were identified in GSE55643, including 529 up-regulated genes and 699 down-regulated genes. A total of 2280 genes were screened in GSE15471, including 1289 up-regulated genes and 991 down-regulated genes. Then, the overlapping DEGs in the microarray data GSE55643 and GSE15471 were analyzed by Venn plot (Fig. 1), and finally, 261 DEGs were obtained, including 118 up-regulated genes and 143 down-regulated genes (Supplementary Table 1).

KEGG and GO enrichment analysis of DEGs.
To further explore the function of the identified DEGs in PAAD, the online biological classification software DAVID was applied to perform GO and KEGG pathway analyses on PAAD. As shown in Supplementary Fig. 1, we list the top 6 significant terms for the BP, CC, MF, and KEGG pathways of DEGs, respectively (Supplementary Fig. 1). We list the annotation of the up-regulated genes and down-regulated genes. The up-regulated DEGs were mainly enriched for genes involved in the type I interferon signaling pathway, response to virus and extracellular matrix organization in the BP group. The downregulated DEGs were mainly enriched genes in the reactive oxygen species metabolic process, regulation of cell adhesion and multicellular organism aging. In the CC group, the up-regulated DEGs were mainly enriched for genes associated with the extracellular exosome, extracellular space and extracellular region. The downregulated DEGs were mainly enriched in the integral component of plasma membrane, extracellular exosome and integral component of membrane. In the MF group, the up-regulated DEGs were mainly enriched from genes associated with chemokine activity, laminin binding and calcium ion binding. The down-regulated DEGs were enriched in the lipid binding, phospholipid binding and serine-type endopeptidase inhibitor activity in the MF group. In addition, 3 significant KEGG pathways for up-regulated genes, such as ECM-receptor interaction, PI3K-Akt signaling pathway, focal adhesion. Moreover, 3 significant KEGG pathways for down-regulated genes, such as fat digestion and absorption, protein digestion and absorption, vitamin digestion and absorption (Supplementary Table 2).

Construction of PPI networks and identification of gene clusters.
The 118 up-and 143 downregulated genes were input into the STRING database to identify the significant interactions between proteins and then visualized by Cytoscape software. 186 nodes with 348 edges were selected to construct the PPI networks with a confidence score of > 0.900 (highest confidence). 2 gene clusters both including 9 nodes with 36 edges were identified by MCODE which was built in the Cytoscape (Fig. 2). Gene cluster 1 includes five down-regulated genes (C5, ADCY1, GNG7, CXCL12, LPAR3) and four up-regulated genes (CXCL3, CXCL5, CCL20, NMU). All genes of gene cluster 2 were up-regulated genes (IFIT2, IRF4, BST2, XAF1, OAS1, OAS2, IFI6, IFI27, RSAD2).  19 . We also used the TNMplot database for the comparison of these genes expression validated by Kaplan-Meier curve and log-rank test analy-  We downloaded the PAAD data set from the TCGA database to analyze the relationship between GNG7, ADCY1 mRNA expression and the clinical outcome. The mRNA expression of GNG7 and ADCY1 in the PAAD tissues was significantly lower than in the normal tissues. In addition, the GNG7 and ADCY1 mRNA expression www.nature.com/scientificreports/ was incrementally downregulated with increasing neoplasm histology grades as well as tumor stages (Fig. 6A-D). The low expression of GNG7 mRNA was related to the neoplasm histology grades (P = 0.045) and survival (P = 0.007) in PAAD patients. The low expression of ADCY1 mRNA was related to gender (P = 0.035), neoplasm histology grades (P = 0.008), and survival (P = 0.036) ( Table 2 Fig. 6E). Finally, we performed the Pearman correlation test analysis in the LinkedOmics database 28 to investigate the correlation of the mRNA expression between GNG7 and ADCY1 in the PAAD samples. The result exhibited a significantly positive correlation (r = 0.622, P = 0.000, Fig. 6F). Diagnostic and prognostic value of GNG7 and ADCY1 was investigated in the GSE84129 and GSE62452 datasets.

Survival and expression analysis of cluster of genes in the Kaplan
We downloaded two independent PAAD datasets from the GEO database (GSE84129 and GSE62452 datasets) to analyze the diagnostic and prognostic value of GNG7 and ADCY1. In the GSE84129 dataset, PAAD patients with high survival time (more than 24 months) have significantly higher mRNA expression of GNG7 and ADCY1 than patients with short survival time (no more than 24 months) (Fig. 7A,B). In addition, the ROC analysis showed that combined expression of GNG7 and ADCY1 have diagnostic value in distinguishing between PAAD and normal pancreas tissue (AUC = 0.863, P = 0.0008, Fig. 7C). In the independent GSE62452 dataset, the GNG7 and ADCY1 mRNA expression was incrementally downregulated with increasing tumor stages as well as neoplasm histology grades (Fig. 7D-G). Furthermore, PAAD patients with high expression of GNG7 and ADCY1 have a longer survival time than patients with low GNG7 and ADCY1 expression (Fig. 7H,I). We also performed    Fig. 7J). Moreover, the Multivariate Cox Regression analysis and the results showed that low expression of GNG7 and ADCY1 were independent risk factors for OS of PAAD patients (Fig. 7K). All these results demonstrated that the GNG7 and ADCY1 has diagnostic and prognostic value for PAAD.

Discussion
PAAD is one of the most malignant cancers. Due to the lack of sensitive early diagnosis biomarker, it is challenging to develop effective treatments to delay and reverse the progression of PAAD. The 5-year survival percentage of PAAD patients for all stages was only 9%, based on data in 2019 released by the American Cancer Society 29 . Therefore, more meaningful biomarkers should be discovered to improve the diagnostic and prognostic efficiency of PAAD patients 30,31 . At present, a variety of biomarkers with prognosis and prognosis for PAAD has been reported 32-34 . Jiang et al. suggest that CDK1 and CCNA2 may be potential diagnostic and prognostic biomarkers in patients with PAAD 35 . In addition, CDK1 and CCNA2 has been reported promoting progression of PAAD via regulating the P53 and cell cycle signaling pathway, which were subpathways of Pathway in cancer [35][36][37] . Georgiadou et al. reported that overexpression of VEGF and Id-1 was associated with high microvessel density and emerged as prognostic factors in patient survival in PAAD 38 . Additionally, the VEGF signaling pathways take an important position in Pathway in cancer (Fig. 6). This present study aimed to screen out the genes that has diagnostic and prognostic significance for PAAD by comparing the tumor tissues with adjacent non-malignant tissues. 118 up-regulated and 143 down-regulated overlapping DEGs were screened out from the microarray data GSE55643 and GSE15471. Then, we performed the GO and KEGG enrichment analysis for overlapping DEGs using the DAVID database. Subsequently, we constructed a PPI network with 186 nodes/genes and 348 edges in the STRING database and visualized it by Cytoscape software. From the PPI network, we screened out the two most core gene clusters, which including 13 up-regulated and 5 down-regulated genes. We analyzed the mRNA expression and the prognosis of survival of these genes in the TNMplot and Kaplan-Meier plotters database, respectively. By this process, 10 up-regulated Table 2. Correlation between GNG7, ADCY1 expression and clinical outcomes in PAAD (177cases, GDC TCGA pancreatic Cancer cohort).    www.nature.com/scientificreports/ and 3 down-regulated genes were screened. Next, we performed KEGG analysis on 13 selected genes in DAVID to explore the potential pathway and mechanism that 13 genes may regulate the occurrence and development of PAAD. We observed that two genes (GNG7, ADCY1) meaningfully enriched in the Pathways in Cancer. Additionally, we performed diagnostic and prognostic analysis of GNG7 and ADCY1 in the independent TCGA cohort, GSE84129 and GSE62452 datasets. The result showed that the mRNA expression of GNG7 and ADCY1 in the PAAD tissues was significantly lower than in the normal tissues. The Multivariate Cox Regression analysis showed that the low expression of GNG7 (HR (95%CI): 2.106 (1.357-3.269); P = 0.010) and ADCY1 (HR (95%CI): 1.881 (1.221-2.898); P = 0.004) were independent risk factors for overall survival of PAAD (Table 3). The ROC analysis showed that low combined expression of GNG7 and ADCY1 have diagnostic value in distinguishing between PAAD and normal pancreas tissue. Meanwhile, the Pearman correlation test analysis in the LinkedOmics database exhibited a significantly positive correlation between GNG7 and ADCY1 (r = 0.622, P = 0.000). We observed that the expression of GNG7 and ADCY1 in the TCGA database has no significant difference between adjacent normal tissue and tumors with early stages (stage 1) or tumor grades (grade 1). This lack of significance may relate to the small normal sample size (n = 4). After all, we notice a significantly different expression in the GSE62452 dataset between adjacent normal tissue and tumors with early stages or grades. However, the diagnostic efficiency of GNG7 and ADCY1 for early stage PAAD indeed requires further investigations. Therefore, all our results indicate that GNG7 and ADCY1 may as diagnostic and prognostic biomarker for PAAD. GNG7 are involved as a modulator or transducer in various transmembrane signaling systems. The beta and gamma chains are required for the GTPase activity, for replacement of GDP by GTP, and for G protein-effector interaction. Plays a role in the regulation of adenylyl cyclase signaling in certain regions of the brain 39 . A previous study showed that GNG7 was down-regulated in clear cell renal cell carcinoma tissues due to promoter methylation and frequent gene mutations, and negatively associated with overall survival 40 . Additionally, previous studies showed that reduced expression of GNG7 was associated with breast cancer, lung carcinogenesis, head and neck cancer and esophageal cancer [41][42][43][44][45] . However, rarely the previous study has addressed the role of GNG7 in PAAD and its prognostic value. This study is the first systematic investigation of diagnostic value, clinical significance of GNG7 in PAAD.
Adenylate cyclase 1 (ADCY1) encodes a member of the adenylate cyclase gene family that is primarily expressed in the brain 46,47 . This protein is regulated by calcium/calmodulin concentration and may be involved in brain development. ADCY1 can regulate drug resistance in lung cancer through participating in cAMP signaling pathways and was of great significance to be a novel prognostic biomarker 48 . It is worth noting that the cAMP signaling pathways was the subpathways of Pathway in cancer. Additionally, the upregulation of ADCY1 by regulating microRNA-127-3p exerts anti-tumor effects on colon cancer 49 . Y Li et al. showed that the expression of ADCY1 was downregulated in osteosarcoma compared with benign bone tumors, suggesting that ADCY1 may be potential biomarkers for osteosarcoma tumorigenesis and therapeutics 50 . Furthermore, previous studies reported that ADCY1 was a key candidate gene in melanoma and rectal adenocarcinoma metastasis 51,52 . However, Ma M et al. reported that ADCY1 was regulated by miR-23a-3p and plays a cancer-promoting role in mucosal melanoma 53 .
In summary, this study screened out 2 potential genes that has diagnostic and prognostic significance for PAAD through the systematic bioinformatic-based analyses. Of course, all of the DEG candidate genes we have screened out related to PAAD should be confirmed through molecular biology and cytology experiments.

Conclusion
The mRNA expression level of GNG7, ADCY1 was significantly down-regulated in PAAD tissues compared with adjacent normal tissues. Low GNG7, ADCY1 mRNA expression level was associated with poor clinical outcomes of PAAD patients. GNG7 and ADCY1 may be an oncogene that regulated tumorigenesis and development through pathways in cancer and could be used as a biomarker of the diagnostic and prognostic value of PAAD.