MicroRNA hsa-mir-3923 serves as a diagnostic and prognostic biomarker for gastric carcinoma

Gastric carcinoma (GC) is a common digestive system disease with a very high incidence. MicroRNA hsa-mir-3923 is a miRNA whose function has so far been investigated only in breast cancer, pancreatic cancer and pre-neoplastic stages of gastric cancer. It has not been studied or reported in gastric carcinoma, so this study examined the relationship between hsa-mir-3923 expression and the clinicopathological features of GC cases. Gastric carcinoma data in The Cancer Genome Atlas (TCGA) database were analyzed by data mining. A chi-squared test was performed to assess the relations of hsa-mir-3923 expression with clinical and pathological variables. The prognostic role of hsa-mir-3923 was assessed using Kaplan–Meier curves, receiver operating characteristic (ROC) analysis and Cox proportional hazards models. Gene set enrichment analysis (GSEA) was carried out on data from the Gene Expression Omnibus (GEO). In addition, common miRNA databases were compared to predict potential target genes, and co-expression analysis suggested that a regulatory network containing hsa-mir-3923 probably exists. To identify the cytological behaviors and pathways most tightly associated with hsa-mir-3923 in GC, the Database for Annotation, Visualization and Integrated Discovery (DAVID) and the KO-Based Annotation System (KOBAS) were used. Cytoscape, R and STRING were employed to map probable regulatory networks related to hsa-mir-3923. Lastly, 69 genes most tightly associated with hsa-mir-3923 were obtained, and their relationships were described with a Circos plot. The results showed that hsa-mir-3923 was up-regulated in gastric carcinoma and that its expression was associated with vital status, N stage and histologic grade. The target gene predictions suggested that 66 genes may be closely related to hsa-mir-3923 in gastric cancer. The co-expression data indicated that a small regulatory network of 4 genes probably exists. Our results indicate that high hsa-mir-3923 expression predicts poor prognosis in GC patients.


Materials and Methods
Data acquisition and collection. The RTCGA Toolbox package (version 3.5) in R (version 3.5.3) provided the data of gastric carcinoma cases and RNA-seq expression outcomes 9,10 . Additionally, hsa-mir-3923 expression data were obtained from TCGA for several digestive tumors, covering stomach, pancreas, liver, esophagus, colon and bile duct. The GEO database (https://www.ncbi.nlm.nih.gov/geo/) provided gene microarray data of cancer tissue (GSE13195 & GSE30727) 11 . All data used here were obtained from the mentioned databases in June 2019 12 .
Statistical analyses. SPSS software 23.0 (IBM Corporation, Armonk, NY, USA) was used for data analysis. The R/Bioconductor package edgeR was used to determine differentially expressed miRNAs in the TCGA STAD dataset 13 . The thresholds were set at an absolute log2(count + 1) fold change (tumor/normal) ≥ 2 and a false discovery rate (FDR) < 0.01. Boxplots were used for discrete variables to measure differences in expression, and the influence of clinicopathological characteristics on hsa-mir-3923 expression was studied with the Kolmogorov-Smirnov test (K-S test). Differences in expression between the respective groups were presented as scatter plots. χ² tests were used to examine the correlation between hsa-mir-3923 expression and clinical data 14 . GraphPad Prism 7.0 software (GraphPad Software, Inc.) was used to analyze the differential expression of hsa-mir-3923 in a range of tumor tissues 15 . Scatter plots and histograms were used for discrete parameters to measure differences in expression between tissues, and the influence of the tumor tissue of origin on hsa-mir-3923 expression was analyzed using the mean ± SD.
Receiver operating characteristic (ROC) curves were plotted with the "p-ROC package" (version 1.0.3) 16 to evaluate diagnostic ability. Cases were divided into groups with high and low hsa-mir-3923 expression by the best cutoff value for overall survival (OS), determined by the Youden index 17 . Correlation coefficient analyses were performed using R software; a correlation coefficient R > 0.5 was taken to indicate a strong correlation 18 . Kaplan-Meier curves were used to compare differences in overall survival and relapse-free survival using the survival package in R 19 . Univariate Cox analysis was used to select the related variables. Subsequently, multivariate Cox analysis was used to assess the effect of hsa-mir-3923 expression on the overall survival and relapse-free survival of cases 16 .
Gene set enrichment analysis (GSEA). GSEA is a computational approach that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. Here, GSEA was carried out with the GSEA software 3.0 from the Broad Institute 20 . The gene expression data were RNA-seq data from the GEO and TCGA-STAD databases. The gene sets "c2.cp.biocarta.v6.2.symbols.gmt", "c3.cp.biocarta.v6.2.symbols.gmt", "c5.cp.biocarta.v6.2.symbols.gmt" and "h.all.v6.2.symbols.gmt", which summarize and represent specific, well-defined biological states or processes, originated from the Molecular Signatures Database (http://software.broadinstitute.org/gsea/msigdb/index.jsp) 21 . The normalized enrichment score (NES) was calculated with 1,000 permutations. A gene set was considered significantly enriched at a nominal P-value of <0.05 and a false discovery rate (FDR) of <0.25.
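The split into high- and low-expression groups by the Youden index, described in the statistical analyses above, can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the function name, the expression values and the outcome labels are hypothetical, and the cutoff is simply chosen to maximise J = sensitivity + specificity - 1 over the observed values.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    scores: expression values; labels: 1 = event (positive), 0 = no event.
    A sample is called positive when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical expression values and outcomes, for illustration only.
expression = [1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 5.6, 6.3]
outcome = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(expression, outcome)
groups = ["high" if x >= cutoff else "low" for x in expression]
```

Samples at or above the returned cutoff form the high-expression group; in practice the cutoff would be derived from the ROC analysis on the real cohort.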
Gene enrichment and functional annotation evaluation. The Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/) 22 24 was used for pathway analysis 25 , and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) term enrichment analyses were carried out for the functional annotation of the co-expressed genes 26 . Three GO categories [molecular function (MF), cellular component (CC) and biological process (BP)] were used to identify the enrichment of target genes. GO terms and KEGG pathways with P-values <0.05 were considered statistically significant. The enrichment map of the annotation analysis was drawn with Cytoscape (version 3.3.1) (http://www.cytoscape.org/cy3.html) 27 .
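GO and KEGG enrichment P-values of the kind thresholded above are commonly obtained from an over-representation test. The sketch below uses a plain one-sided hypergeometric test as an assumption for illustration; DAVID itself reports a modified Fisher exact statistic (the EASE score), so this is not a reimplementation of either tool, and the counts are toy values.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers only: 20 background genes, 5 annotated, 4 queried, 2 hits.
p_value = enrichment_p(k=2, n=4, K=5, N=20)
is_significant = p_value < 0.05  # threshold used in the text
```

With these toy counts the term would not pass the P < 0.05 threshold; real analyses use genome-scale backgrounds and usually correct for multiple testing.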
Prediction of related genes. Several miRNA databases were compared, such as miRDB (http://www.mirdb.org/miRDB/), miRPathDB (https://mpd.bioinf.uni-sb.de/) and TargetScan (http://www.targetscan.org/).
Weighted co-expression network construction. The co-expression module of the "WGCNA" package in R was used to harvest genes co-expressed with hsa-mir-3923 and to build a weighted correlation network 31 . For WGCNA, the R package DCGL (version 2.1.2) was used to filter genes; genes with FPKM values >0.85 were taken for subsequent study. The adjacency matrix between genes was built with a soft thresholding power of 3 to decrease noise and false relations. In brief, weighted correlation matrices were transformed into matrices of connection strengths using a power function 32 . These connection strengths were subsequently used to calculate topological overlap, a robust and biologically ...
Genome map. After the gene expression matrices of GC cases in the three databases were integrated, the correlation coefficient between each gene and hsa-mir-3923 was calculated (see the statistical analyses for the method), and a Venn diagram of the results was drawn with GraphPad to verify again the feasibility of the results of the co-expression study. Alignments of gene co-expression maps to GRCh38.95 (reference genome version of Homo sapiens) (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.15_GRCh38.95) 35 identified the reference genomic regions making up 69 genome sets (the locations of the genes of interest in the human reference genome). To search for the most representative reference fragments, all GRCh38.95 loci present in the gene co-expression maps were derived and integrated, giving 69 reference donor fragments, which were placed in the outermost track. The number of gene regulation maps containing the tags of each fragment was then labeled in the second track, excluding double counts. In the inner layer, gene pairs showing regulatory relationships were linked. The sum of the genome map alignments across the entire genome was used to link the 69 gene regulatory maps in the Circos plot 36 .
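The soft-thresholding step described above for the weighted co-expression network turns pairwise correlations into connection strengths with a power function. A minimal sketch, assuming an unsigned network in which the adjacency is |cor(x_i, x_j)| raised to the power 3 (the power stated in the text), is given below; the gene names and expression profiles are hypothetical and the code does not call the WGCNA package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, power=3):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** power for i != j."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** power
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
adjacency = soft_threshold_adjacency(profiles, power=3)
```

Raising the correlation to a power shrinks weak correlations much more than strong ones (0.8 becomes 0.512, while 1.0 stays 1.0), which is how the step suppresses noise before topological overlap is computed.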

Results
Patients' characteristics. The Cancer Genome Atlas (TCGA) database provided both gene expression and clinical data for cases of gastric carcinoma. The overall number of cases was 452. In the first screening, 25 tumor samples and 3 normal samples were removed because of excessive missing or ambiguous data, leaving 39 normal samples and 385 tumor samples. The clinical characteristics, covering ethnic composition, pathological status, survival status, TNM stage, gender and age, are listed in Table 1.

MicroRNA hsa-mir-3923 high-expression in GC.
Boxplots were used to assess differences in hsa-mir-3923 expression between GC cases and normal controls. As shown in Figure 1, hsa-mir-3923 expression was remarkably higher in primary cancer tissues than in normal gastric tissues (P = 0.027; Fig. 1A). In addition, hsa-mir-3923 expression differed between groups defined by vital status (P = 0.050; Fig. 1D), T stage (P = 0.002; Fig. 1E), N stage (P = 0.037; Fig. 1F) and histologic grade (P = 0.034; Fig. 1I). Notably, differences in hsa-mir-3923 expression were identified according to patient age as well as TNM stage, gender, race and other clinicopathological parameters (Fig. 1). hsa-mir-3923 expression data in several common tumors were also obtained from the TCGA database (Fig. 2A). (Note: Because this miRNA was discovered only recently, expression data for it are lacking in numerous tumor types. The statistical analysis was conducted only on tumor types with expression data for this miRNA and a large sample size.) In this cross-tumor comparison, hsa-mir-3923 expression was up-regulated in tumors from a range of organs and tissues (Fig. 2B,C).

The relationship between hsa-mir-3923 expression and clinical features in GC.
Chi-square tests were used to study the relationship between clinical features and hsa-mir-3923 expression; the results are listed in Table 2. hsa-mir-3923 expression was significantly related to vital status (χ² = 4.829, P = 0.028).
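The reported statistic can be checked with a short calculation. The sketch below assumes a 2 × 2 table (high versus low expression against living versus deceased), hence one degree of freedom, and a Pearson chi-square without continuity correction; neither detail is stated in the text, and the counts passed to `chi2_stat_2x2` are hypothetical. Under the one-degree-of-freedom assumption, χ² = 4.829 corresponds to P of about 0.028, in line with the value given above.

```python
from math import erfc, sqrt


def chi2_stat_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_p_df1(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))


# P-value implied by the statistic reported in the text (assuming df = 1).
p_reported = chi2_p_df1(4.829)

# Hypothetical counts, only to show how the statistic is formed.
stat_example = chi2_stat_2x2(10, 20, 20, 10)
```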

GSEA identifies the hsa-mir-3923 related biological functions and proteins.
To identify the biological functions activated in gastric carcinoma, data were screened from microarrays (GSE13195 & GSE30727) in the GEO database. GSEA was performed between the high and low hsa-mir-3923 expression data sets. GSEA revealed significant differences (FDR < 0.25, P-value < 0.05) in the enrichment of the "MSigDB Collection"; Table 4 lists the specific contents. In GC, hsa-mir-3923 participates in the anabolism of RNA in tumor cells, covering NLS (nuclear localization sequence)-bearing protein import into the nucleus, piRNA metabolic process, regulation of alternative mRNA splicing via spliceosome, pre-mRNA binding and enhancement of RNA polymerase activity, etc. Moreover, hsa-mir-3923 probably participates in sperm formation, sperm maturation and the final fertilization process. Likewise, the KEGG pathway results indicate that hsa-mir-3923 participates in the anabolism of RNA (e.g., RNA degradation and RNA polymerase). It may be packaged in exosomes and could act as a marker for tumor detection. Furthermore, the miRNA may also be involved in the anabolism of proteins; these signaling pathways cover alanine, aspartate and glutamate metabolism, protein export, pyrimidine metabolism, cysteine and methionine metabolism, etc. Only the 20 most characteristic biological functions and signaling pathways were selected, as listed in Table 4.

Estimation of relevant genes and gene-enrichment and functional annotation studies.
Several miRNA databases (e.g., miRDB, miRPathDB, TargetScan, miRNAWalk and miRTarBase) were compared to find lncRNAs with regulatory relationships with hsa-mir-3923 and mRNAs that this miRNA may regulate. After comparison with the TCGA and GEO databases, the genes probably relevant in GC were estimated. As shown in Fig. 5, comparison of these five common miRNA databases identified 66 relevant target genes probably present in gastric cancer tissues, each of which is presented in Table 5. First, STRING was used to build the functional protein association network. After nodes with no additional links were eliminated, some small regulatory networks were found that probably exist in the whole system, as shown in Fig. 6. Significant GO terms and KEGG pathways were identified with KOBAS and DAVID. Cytoscape (version 3.3.1) was used for a visual enrichment study of genes up-regulated in the GO pathways and to build an interaction network for related genes (Fig. 7). Next, R was used to visualize the enriched KEGG pathways (Fig. 8) and GO terms (Fig. 7). Table 6 shows that these genes are critical in the following biological behaviors: molecular function (MF) (2,4-dichlorophenoxyacetate alpha-ketoglutarate dioxygenase activity, hypophosphite dioxygenase activity, sulfonate dioxygenase activity, procollagen-proline dioxygenase activity, C-20 gibberellin 2-beta-dioxygenase activity, C-19 gibberellin 2-beta-dioxygenase activity, DNA-N1-methyladenine dioxygenase activity, ion channel binding), biological process (BP) (proteasome-mediated ubiquitin-dependent protein catabolic process, peptidyl-proline hydroxylation, negative regulation of oxidative stress-induced intrinsic apoptotic signaling pathway, transmembrane transport, calcium ion transmembrane transport, mitotic spindle organization) and cellular component (CC) (nucleoplasm, neuron projection, spindle). Besides the GO terms, hsa-mir-3923 is tightly associated with KEGG pathways within GC cells (e.g., pantothenate and CoA biosynthesis, MAPK signaling pathway), as well as with the progression of arrhythmogenic right ventricular cardiomyopathy (ARVC).
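Comparing several target-prediction databases, as described above, amounts to keeping the genes that enough databases agree on. The text does not state the exact combination rule, so the sketch below assumes a simple vote count with a caller-chosen minimum; the gene symbols and database contents are hypothetical placeholders, not real predictions for hsa-mir-3923.

```python
from collections import Counter


def consensus_targets(predictions, min_databases):
    """Return genes predicted by at least `min_databases` databases, sorted.

    predictions: mapping of database name -> iterable of gene symbols.
    """
    counts = Counter(
        gene for genes in predictions.values() for gene in set(genes)
    )
    return sorted(g for g, c in counts.items() if c >= min_databases)


# Hypothetical predictions, for illustration only.
predictions = {
    "miRDB": {"GENE_A", "GENE_B", "GENE_C"},
    "TargetScan": {"GENE_A", "GENE_B"},
    "miRTarBase": {"GENE_A", "GENE_D"},
}
shared_by_two = consensus_targets(predictions, min_databases=2)
shared_by_all = consensus_targets(predictions, min_databases=len(predictions))
```

Setting `min_databases` to the number of databases gives a strict intersection; lower values trade specificity for a longer candidate list.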
Co-expression network construction. First, R's "edgeR" package was used to calculate differential expression (log fold change > 1, P-value < 0.05) among mRNAs, miRNAs and lncRNAs in the TCGA-STAD and GEO (GSE13195 & GSE30727) databases. Then, co-expression analysis was performed on the data again.
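The screening thresholds used in this step can be expressed as a simple filter over a table of differential-expression results. The sketch below assumes that the threshold applies to the absolute log fold change, as in the Methods, and uses hypothetical gene names and values; it does not call edgeR.

```python
def filter_differential(results, min_abs_logfc=1.0, max_p=0.05):
    """Keep genes with |logFC| above the threshold and P below the cutoff.

    results: list of dicts with keys "gene", "logFC" and "p_value".
    """
    return [
        row["gene"]
        for row in results
        if abs(row["logFC"]) > min_abs_logfc and row["p_value"] < max_p
    ]


# Hypothetical differential-expression table, for illustration only.
results = [
    {"gene": "gene_1", "logFC": 2.4, "p_value": 0.004},
    {"gene": "gene_2", "logFC": -1.3, "p_value": 0.03},
    {"gene": "gene_3", "logFC": 0.6, "p_value": 0.001},
    {"gene": "gene_4", "logFC": 2.1, "p_value": 0.2},
]
loose = filter_differential(results, min_abs_logfc=1.0, max_p=0.05)
strict = filter_differential(results, min_abs_logfc=2.0, max_p=0.01)
```

Tightening both thresholds shrinks the candidate list, which is the effect relied on when a loosely filtered network is too dense to interpret.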

The WGCNA package in R was used to analyze the relationships between lncRNAs, miRNAs and mRNAs (Power Value=0.85) (Fig. 9). Next, the genes associated with hsa-mir-3923 were taken to build the co-expression network. Because the initial screening conditions were overly loose (Fig. 10A), the resulting co-expression network was extremely complex. Accordingly, the screening criteria were tightened to build a more concise co-expression network: log fold change > 2, P-value < 0.05 (Fig. 10B) and log fold change > 2, P-value < 0.01 (Fig. 10C). Nevertheless, the result overlapped by more than 96% with the 66 genes predicted in Fig. 5, so to avoid duplication the functional enrichment analysis was not performed again. Instead, the scope of gene screening was narrowed, and redundant and interfering genes were eliminated. Cytoscape was used to build a possible co-expressed regulatory network, revealing that among these co-expressed genes there might be a small regulatory network covering only 4 genes (hsa-mir-3923, AC117402.1, AC009646.2 and OPRK1) (Table 5).

MicroRNA hsa-mir-3923 Target Gene Prediction
Gene expression and Co-expressed genome maps. The 69 genes probably associated with hsa-mir-3923 under the two estimation methods were integrated and considered together. Their expression changes in gastric cancer are shown in Fig. 11A. Alignment of the gene co-expression maps to GRCh38.95 identified the reference genomic regions making up 70 genome sets (hsa-mir-3923 is marked as a red dot). The co-expression results and the miRNA target prediction results for hsa-mir-3923 in GEO (GSE13195 & GSE30727) and TCGA-STAD were analyzed together. To find the most representative reference fragments, all GRCh38.95 loci present in the gene co-expression maps were derived and combined. Subsequently, 70 reference donor fragments were created and placed in the outermost track. Next, the number of gene co-expression maps containing the label of each fragment was marked in the second track, excluding duplicate counts. In total, 91 pairs of genes were found to be co-expressed. In the inner sector, these co-expressed gene pairs were linked with lines. The sum of the gene co-expression map alignments across the entire genome formed the connections between these maps in the Circos plot (Fig. 11B).

Discussion
MicroRNAs, as a group of genes, have been reported to show high or low expression in cancer. In addition, miRNAs acting as oncogenes can interact with mRNAs or lncRNAs, thereby modulating cancer development and regulating cytological behavior 37,38 . This study found that miRNA hsa-mir-3923 is of considerable significance in gastric carcinoma and that, as a biomarker, it could be applied to assessing the prognosis of GC. Analysis of hsa-mir-3923 expression in patients with GC identified factors associated with high hsa-mir-3923 expression, i.e., survival time, histologic grade, vital status, T stage, N stage, etc.
Thus far, few studies have addressed the significance of hsa-mir-3923, located at chromosome 3p12.3 7,8 . Although no publications on STAD have been available so far, hsa-mir-3923 was found to be significantly up-regulated in clinical STAD tissues compared with normal tissues. In this study, hsa-mir-3923 was highly expressed in gastric carcinoma, which is consistent with studies of other tumors. Note that hsa-mir-3923 expression was significantly elevated from T1 to T4, from histologic grade G1 to G3 and from clinical stage I to stage IV, suggesting its relevance to cancer progression. In addition, hsa-mir-3923 expression was higher in patients with survival time ≤ 3 years than in those with survival time > 3 years, revealing its relevance to cancer prognosis and the need for subgroup study. Moreover, hsa-mir-3923 was more highly expressed in deceased patients than in living patients, so its link with survival needs to be explored. For the M and N stages of GC, although the statistical results were not significant, the expression level of hsa-mir-3923 was higher in M1 and N1 than in M0 and N0. To exclude tissue-specific interference, gene expression data were collected for various cancer types recorded in the TCGA database. This cross-tumor comparison showed that the miRNA is highly expressed among common tumor types.
Some existing studies have also explored how hsa-mir-3923 affects the occurrence and development of tumors 7,39 . According to large-scale clinical statistics, it is markedly highly expressed during the development of some tumor cell lines 8,40 . The present study suggests that hsa-mir-3923 can affect the initiation and proliferation of tumors, which explains its clinical association with the TNM classification. hsa-mir-3923 is tightly associated with cancer prognosis. The higher the hsa-mir-3923 expression, the poorer the OS, particularly in the subgroups of age ≤ 55, female gender, advanced T stage (T3/4), M1 stage, advanced pathological stage (G3/4) and advanced clinical stage (stage 3/4). Cox analysis revealed an independent prognostic effect of hsa-mir-3923 on the OS of patients; therefore, it could serve as a biomarker for monitoring GC. Some studies have also suggested that, besides affecting some of the common biological functions of tumor cells, hsa-mir-3923 could also affect some specific cytological behaviors. Functional enrichment analysis showed that this miRNA was tightly related to NLS (nuclear localization sequence)-bearing protein import into the nucleus, piRNA metabolic process, regulation of alternative mRNA splicing via spliceosome, pre-mRNA binding and enhancement of RNA polymerase activity.
Figure 11. Circos plot derived from VENN and UpSet data. To explore the most represented reference fragments, all GRCh38.95 loci present in the gene co-expression maps were derived and merged. (A) 70 reference donor fragments were created and placed in the outermost track. Subsequently, the number of gene co-expression maps containing each of these fragments' labels was marked in the second track, excluding duplicate counts. 91 pairs of genes were found to be co-expressed. In the inner sector, these co-expressed gene pairs were linked with lines. The sum of gene co-expression map alignments across the whole genome acted as the links between these maps in the Circos plot. (B).
(2020) 10:4672 | https://doi.org/10.1038/s41598-020-61633-8 www.nature.com/scientificreports
To explore the biological role of hsa-mir-3923 in GC, we compared several miRNA databases (e.g., miRDB, miRPathDB, TargetScan, miRNAWalk and miRTarBase) and identified 66 genes that might be tightly related to hsa-mir-3923. First, the functional protein association network suggested that several small regulatory networks might exist in the whole system, indicating a probable hierarchy of regulatory networks involving hsa-mir-3923. Functional enrichment of these genes with GO and KEGG showed that they are critical to the following biological behaviors. First, hsa-mir-3923 could regulate the activity of many dioxygenases (e.g., 2,4-dichlorophenoxyacetate alpha-ketoglutarate dioxygenase activity and hypophosphite dioxygenase activity). Many of these enzymes are involved in the oxidative demethylation of biological molecules, which under normal conditions is one of the ways of maintaining normal methylation/demethylation 41 . If the activity of these enzymes is overly high, the degree of histone demethylation in some segments of the nucleus of tumor cells could decrease, making the chromosome structure more relaxed and facilitating gene transcription 42,43 . Enhanced activity of these enzymes can also maintain DNA demethylation at chromosomal sites containing CpG islands. These effects of hsa-mir-3923 can directly or indirectly promote the transcriptional activity of oncogenes by affecting chromosomal structure, DNA stability, the binding capacity of transcription factors, etc. In addition, according to the functional enrichment analysis, the molecule could participate in the oxidation of lipids and speed up lipid metabolism, reducing ROS damage to tumor cells and the autophagy and apoptosis attributed to ROS 44 . Furthermore, this oxidation process could facilitate the entry of many metabolic products (e.g., pantothenic acid and acetyl-CoA) into the mitochondria and take part in the Warburg effect to promote the energy supply of tumor cells 45 . The molecule could also affect energy-related pathways (e.g., the PI3K-MAPK signaling pathway) to regulate cAMP and calcium ion concentrations in cells, which may promote epithelial-mesenchymal transition (EMT) and induce tumor metastasis 46 .
Following the analysis of TCGA, the GEO data were analyzed for co-expression. A comprehensive analysis was then carried out on the genes co-expressed with hsa-mir-3923 in GEO and in TCGA. The WGCNA results in R showed that lncRNAs, miRNAs and mRNAs are associated with each other (Power Value=0.85) and identified the specific genes in the co-expression network of hsa-mir-3923. The possible co-expressed regulatory network showed that these co-expressed genes contain a small regulatory network of four genes. Review of several miRNA databases yielded 66 target genes with greater relevance in gastric cancer, and these genes could be found in a large regulatory network. Integration of data from GEO, TCGA and the miRNA databases yielded 69 genes of different types, such as lncRNAs and mRNAs, that are closely associated with hsa-mir-3923. Finally, these data were combined with gene expression to predict possible regulatory networks. GRCh38.95 was used for whole-genome mapping, covering the abovementioned 70 genes, which represent the regions with the largest expression differences in the cancer genome. In this way, the co-expressed regulatory network can be observed directly, without complicated calculations or assumptions.
To the best of our knowledge, the present study is the first to demonstrate the marked effect of hsa-mir-3923 on the prognosis of GC. Together with other related studies, it indicates that hsa-mir-3923 is very important in GC. However, future studies should verify these findings in clinical trials, to support the wide application of hsa-mir-3923 for the prognosis of GC.

Conclusion
Our study found that hsa-mir-3923 was noticeably up-regulated in GC patients and that its high expression was associated with several clinical features and poor prognosis, so hsa-mir-3923 could act as an effective biomarker for the prognosis of gastric carcinoma patients.

Data availability
The data analyzed in this study are available from The Cancer Genome Atlas-Stomach Adenocarcinoma (TCGA-STAD) and the Gene Expression Omnibus (GEO) (GSE13195 & GSE30727). The data used in this article were downloaded in June 2019.