FN1, SPARC, and SERPINE1 are highly expressed and significantly related to a poor prognosis of gastric adenocarcinoma revealed by microarray and bioinformatics

Gastric adenocarcinoma (GAC), also known as stomach adenocarcinoma (STAD), is one of the most lethal malignancies in the world. It is vital to classify and detect the hub genes and key pathways participated in the initiation and progression of GAC. In this study, we collected and sequenced 15 pairs of GAC tumor tissues and the adjacent normal tissues. Differentially expressed genes (DEGs) were analyzed and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) analysis were used to annotate the unique biological significance and important pathways of enriched DEGs. Moreover, we constructed the protein-protein interaction (PPI) network by Cytoscape and conducted KEGG enrichment analysis of the prime module. We further applied the TCGA database to start the survival analysis of these hub genes by Kaplan-Meier estimates. Finally, we obtained total 233 DEGs consisted of 64 up-regulated genes and 169 down-regulated genes. GO enrichment analysis found that DEGs most significantly enriched in single organism process, extracellular region, and extracellular region part. KEGG pathway enrichment analysis suggested that DEGs most significantly enriched in Protein digestion and absorption, Gastric acid secretion, and ECM-receptor interaction. Furthermore, the PPI network showed that the top 10 hub genes in GAC were IL8, COL1A1, MMP9, SST, COL1A2, TIMP1, FN1, SPARC, ALDH1A1, and SERPINE1 respectively. The prime gene interaction module in PPI network was enriched in protein digestion and absorption, ECM receptor interaction, the PI3K-Akt signaling pathway, and pathway in cancer. Survival analysis based on the TCGA database found that the expression of the FN1, SERPINE1, and SPARC significantly predicted poor prognosis of GAC. Collectively, we identified several hub genes and key pathways associated with GAC initiation and progression by analyzing the microarray data on DEGs, which provided a detailed molecular mechanism underlying GAC occurrence and progression.

Microarray could be used to probe the key biomarkers and provide a better understanding of the molecular mechanisms involved in GAC. Until now, clinically applicable biomarkers are still lacking. Therefore, exploring novel and effective molecular biomarkers to elucidate effective therapeutic targets for GAC patients is still imperative. In this study, we focused on the different expression pattern between the GAC tumor tissues and matched normal tissues. To discover the hub genes and key pathways associated with the initiation and progression of GAC, we applied differential gene expression analysis and functional enrichment analysis. In conclusion, we identified a set of hub genes that participated in several cancer-relevant pathways and their abnormal expression are correlated with the clinical prognosis of GAC people by overall survival analysis.

Methods and Materials
Patients and samples. Tumor and matched normal tissues samples were obtained from the GAC patients at the Affiliated Hospital of Xuzhou Medical University in 2014. These tissues were stored in RNAlater (Ambion, Life Technologies, ThermoFisher Scientific, Waltham, MA, USA) at 4 °C until full penetration of RNAlater into the tissues and transferred to −80 °C for storage. The selection criteria were as follows: (1) the subject presented was diagnosed as GAC and no history of other tumors; (2) Complete demographic and clinical data including age, gender, clinical manifestations, tumor size, the extent of resection, and date of relapse and/or death have been collected. In order to get the formal permission of surgical procedures and the intelligent use of the resected tissues, the legal surrogates of those participants provided their Written informed consent. The National Regulations on the Use of Clinical Samples in China is as a guideline for human tissue acquisition and legitimate use. This study was approved by the Medical Ethics Committee of the Affiliated Hospital of Xuzhou Medical University. The demographic and clinical features of the patient were summarized in Table 1.
Profiling of gene expression. Total RNA was extracted from frozen tissues separately using EZNA ® HP Tissue RNA Kit (Omega Bio-Tek Inc., Norcross, GA, USA) according to the manufacturer's recommended procedure. Bioanalyzer 2100 (Agilent Technologies, Palo Alto, Calif.) was used to assess the quality and quantity of these total RNAs. Affymetrix microarray was used for mRNA profiling, which was performed by GeneChem (Shanghai Genechem Co., Ltd.). Briefly, after rRNA removal, biotinylated aRNA (cRNA) was prepared according to the manufacturer's protocol (3′ IVT Express Kit, Affymetrix 901228). PrimeView Human Gene Expression Array (cat. no. 901838; Affymetrix; Thermo Fisher Scientific, Inc.) were hybridized and scanned according to standard Affymetrix protocols. All samples were processed in technical duplicate. GeneChip Scanner 3000 (Affymetrix) was used to scan the completed arrays. Images were extracted with Affymetrix GeneChip Command Console (AGACC) and analyzed by using Expression Console Software (Affymetrix, CA, USA). Data were deposited in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?ac-c=GSE118916, GSE118916). Data preprocess and differentially expressed genes identification. Probes were converted to gene symbols according to the platform annotation information of the raw data. The expression value for a gene, which was mapped by multiple probes, was acquired by selecting the max value among those probes. Those invalided probes without any gene information were removed. The original CEL data was then started background correction, normalization, and expression calculation by using the R package "affy". The R package limma (http://www.R-project.org) was used to conduct data normality by log2 transformation. We applied the R package "limma" to identify the differentially expressed genes (DEGs) following the following criteria: (I) |logFC| > 2; (II) P-value < 0.05 and (III) false discovery rate (FDR) < 0.05.
Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted by using clusterProfiler 6 to reveal the unique biological significance and key pathways associated with GAC of the DEGs (criteria: p-value < 0.05, significantly enriched). Fisher's exact test 7 was used to identify the significant GO terms and pathways and corrected P-value was obtained by Benjamini and Hochberg (BH) false discovery rate (FDR) algorithm. Cytoscape 8 , Enrichment Map 9 , and Gephi 10 were used for visualization of the network.   www.nature.com/scientificreports www.nature.com/scientificreports/ The protein-protein interaction network construction. The Retrieval of Interacting Genes (STRING v10) 11 (http://string-db.org/) was used to analyze the interactive relationships among DEGs to construct protein-protein interaction (PPI) network and only experimentally validated interactions with a combined score >0.4 were selected as significant. Cytoscape was used to construct the PPI network and Gephi was used to network visualization. The plug-in Molecular Complex Detection (MCODE) was used to select the prime module from the PPI network. The criteria were set as follows: MCODE scores >2 and number of nodes >5. Then the KEGG pathway enrichment analysis of the DEGs from the module was conducted. P < 0.05 was considered to be significant.  www.nature.com/scientificreports www.nature.com/scientificreports/ TCGA data acquisition and processing. We searched the GAC cases with both clinical information and gene expression profile from The Cancer Genome Atlas (TCGA) database 12 by using the R package "OIsurv". The expression value of each hub genes was defined as either high (expression value >median value) or low (expression value <median value). Survival curves were analyzed by the Kaplan-Meier method 13 , and univariate survival analysis was performed using a log-rank test. Overall survival (OS) was defined as the period between the resection date and death for any cause.
TCGA data analysis. We downloaded the RNA-seq data and clinical phenotype information of the GAC-related tumor and the adjacent normal tissues from The Cancer Genome Atlas (TCGA) database (https:// xenabrowser.net/datapages/). We calculated the expression level of FN1, SERPINE1, and SPARC and conducted the difference significance test between the GAC tumor group and the normal group. We further did the correlation analysis between the hub gene and the disease stage.
Statistical analysis. The statistical data were processed using SPSS 19.0 software (IBM, SPSS, Chicago, IL, USA). The paired t-test was performed to evaluate the significant difference between tumor tissues and tumor-adjacent normal tissues. The statistical significance of the difference in patients' overall survival (OS) between the high expression group and low expression group was obtained using the two-sided log-rank test. All p-values were two-sided, and generally, p-values < 0.05 were considered to have statistical significance.

Results
Differentially expressed genes in gastric adenocarcinoma. Total 15 pairs of GAC tumor and normal samples were analyzed (Fig. 1A) and the basic global gene expression pattern among all samples was denoted by Pearson correlation matrix for calculation of pairwise correlation coefficient (Fig. 1B). Then the first step was to identify the DEGs between these two groups. The classical Bayesian algorithm was applied to identify the DEGs and volcano plots were used for visualizing the DGEs variation between tumor and normal tissues (Fig. 1C). In total, 233 mRNAs displayed differential expression in GAC, including 64 up-regulated mRNAs and 169 down-regulated mRNAs. Each of the top 10 upregulated DGEs and downregulated DGEs were listed in Table 2. Hierarchical clustering analysis showed the expression pattern of DEGs among samples, which suggested that the expression of genes in GAC tumor tissues significantly differ from those in adjacent-normal tissues (Fig. 1D). Detailed information of these up-and down-regulated probes is shown in Supplementary Tables 1 and 2. Differentially expressed genes enrichment analysis. Gene enrichment analysis uses predefined gene sets and ranks of genes to identify significant biological changes or gene co-expression pattern, which could assess the functional associations of target gene set derived from the set of experiments. KEGG analysis results showed that DEGs were mostly enriched in Protein digestion and absorption, Gastric acid secretion, and ECM-receptor interaction (Table 3). Figure 2A shows the TOP20 enriched KEGG pathways. The five top mostly enriched  www.nature.com/scientificreports www.nature.com/scientificreports/ pathways were got to construct a network (Fig. 3A). From the network, the up-regulated genes Human fibronectin gene (FN1), collagen α1 (I) gene (COL1A1) and collagen α2 (I) gene (COL1A2) were supposed to have a direct relationship with ECM-receptor interaction, while the down-regulated gene Somatostatin (SST) was supposed to be associated with Gastric acid secretion. GO analysis results showed that DEGs were mostly enriched in the single-organism process (belonging to a biological process), cellular process (belonging to a biological process), and cell (belonging to cellular component) ( Table 4). Figure 2B shows the TOP10 enriched function of molecular function (MF), biological process (BP) and cellular component (CC) respectively. The five top mostly enriched functions were also got to construct a network (Fig. 3B). The detailed information of KEGG and GO enrichment results are listed in Supplementary Tables 3 and 4 separately.

Discussion
GAC is one of the lethal of malignancies originating in the stomach, which is a kind of heterogeneous disease and consists of various phenotypes and genotypes 5 . Of all, cell cycle plays a key important role in cancer originating including mutation of signaling pathways or mutation of protein-encoding gene, and those relevant cell cycle proteins are thought to have a vital association with cancer progression. Making deeper understanding of the role of cell cycle proteins, including revealing their function mechanisms and finding their relevant upstream or downstream targets, could help us to discover more valuable biomarkers. Fortunately, the development of biotechnology, like microarray, transcriptome sequencing, and proteomics, has provided an advanced way to get the nearly all gene expression information during all stages for any human genomes and lots of potential therapeutic targets have been excavated, including AFP 14 , BCR-ABL 15 , BRCA1/BRCA2 16 , EGFR 17 , HER2 18 and so on.
In this study, we analyzed the gene expression profiles of 15 GAC tumor tissues and 15 adjacent none-tumor tissues by microarray. We finally identified 64 up-regulated DEGs and 169 down-regulated DEGs. The heatmap of all DEGs showed obviously different between the two groups (Fig. 1D), which demonstrated that each group has its specific gene expression pattern and the samples from the same group seemed to share the similar expression pattern and often participate in a parallel biological process (Fig. 1B). To better understand the interactions among DEGs, we further performed functional enrichment analysis. The GO term enrichment analysis shows that DGEs were mainly involved in cell, extracellular region, extracellular region part, extracellular space, and single organism process (Fig. 3B). As previous studies, the structure of cancer-relevant proteins, like HER2 19 , HER3 20 has a ligand-activated state in the extracellular region and could interact with other same receptors in the absence of direct ligand binding, which maybe indicate that many DEGs of this study might be transmembrane proteins and may involve in some extracellular signal transduction pathways. KEGG pathway enrichment analysis reveals that DEGs were mainly involved in several cancer-related pathways such as ECM-receptor interaction, protein digestion and absorption, gastric acid secretion and retinol metabolism (Fig. 3A). Gastric acid secretion has been reported to have an association with gastric mucosal healing 21 , which may be involved in the destruction of normal cell proliferation. The previous study pointed out that retinoic acid receptors are frequently having low expression in epithelial gastric cancer 22 .
Subsequently, in order to find the co-existence pattern among all the DEGs, we constructed the PPI network. From the network, 10 hub genes with high degrees were found (Fig. 4). Moreover, we extracted the main module from the PPI network and carried out KEGG pathway enrichment analysis (Fig. 5). The result showed that the family of collagen genes (CLO1A1, COL1A2, COL2A1, etc.) are tight clustered and have directed association with protein digestion and absorption, ECM-receptor interaction and P13K-Akt signaling pathway. Besides the other two pathways, P13K-Akt signaling pathway has been reported to be vital in cell cycle including cell proliferation and gene mutation and is activated in various cancers 23 . The FN1 gene, a glycoprotein involved in cellular adhesion and migration processes, was denoted to have an association with the pathway in cancer.
In order to find the relationship between the target genes and the GAC prognosis, we conducted survival analysis based on the TCGA data, which included 411 GAC cases. The result showed that FN1, SPARC, and SERPINE1 expression were significantly (p < 0.05) associated with GAC prognosis (Fig. 6). The high expression of these three genes often indicates a poor prognosis. The expression level of these three hub genes were all remarkably higher in GAC tumor tissues than the normal tissues (Fig. 7), which indicated these hub genes might have potential impact on the GAC cell progression.

Conclusion
We identified the hub genes and key pathways associated with GAC by analysis microarray data and provided more information about molecular mechanisms about GAC occurrence and progression, holding promise for acting as biomarkers and potential therapeutic targets. However, further molecular biological experiments including q-RT PCR, colony-formation assay, and flow cytometry analysis are required to confirm the specific functional role and mechanisms underlying the three crucial genes including FN1, SPARC, and SERPINE1 in GAC.