Introduction

Renal cell carcinoma (RCC) is one of the most common urinary system tumors in adults, accounting for ~90% of all malignant renal tumors [1]. In 2017, there were about 63,990 new cases and 14,400 deaths in the United States [2]. The incidence of RCC has been rising at a rate of 2–4% a year [3]. Because RCC is often asymptomatic at an early stage, many patients are diagnosed at an advanced or terminal stage, when tumor cells have already metastasized to bone, lung, brain, and other important organs [4, 5]. The 5-year survival rate of patients with metastatic RCC is <10%, whereas that of early-stage patients is more than 70% [6, 7]. Traditional diagnostic methods have disadvantages such as low sensitivity, poor specificity, and invasiveness [8, 9]. Therefore, a new method for early diagnosis is urgently needed.

MicroRNAs (miRNAs) are small non-coding RNAs. Several miRNAs have been shown to play important roles in regulating gene expression at the post-transcriptional level [10] and can extensively regulate the stability and translation of mRNAs in multiple biological processes such as the cell cycle, proliferation, and apoptosis [11, 12].

Faragalla et al. [13] found that miR-21 is upregulated in clear cell RCC (ccRCC) and papillary RCC, and can distinguish between ccRCC and chromophobe RCC (chRCC) with 90% specificity and 83% sensitivity. When morphological features are unclear and histological features overlap among tumor subtypes, miR-21 may be a key factor in improving the accuracy of cancer diagnosis. In addition to early diagnosis, miRNAs can also serve as therapeutic targets for cancer. Gaudelot et al. [14] used an anti-miR-21-based silencing strategy to improve the efficacy of chemotherapy in ccRCC. PTEN is a potential inhibitor of the PI3K/Akt pathway; silencing miR-21 upregulates PTEN, which inhibits the PI3K/Akt pathway and, in turn, the development of RCC.

A growing number of studies have shown that miRNAs participate in the development of many diseases and can be useful biological molecules in the early diagnosis, therapeutic intervention and prognosis evaluation of tumors [15, 16].

To find possible biomarkers for the early diagnosis of RCC, we performed a computational analysis to identify miRNAs whose expression differs significantly between stage I renal cancer tissues and normal adjacent tissues. Several bioinformatic methods were then used to construct a biological network of miRNA-mediated target genes and to find key target genes and their corresponding miRNAs. Functional enrichment analysis of these target genes was then performed to find the roles they play in the signal transduction pathways that may participate in the development of RCC.

Materials and methods

Collection and pre-processing of miRNA expression data

Level 3 miRNA-seq isoform quantification data associated with RCC (including ccRCC and papillary RCC) were downloaded from The Cancer Genome Atlas (TCGA). miRNA expression data for clinical stage I (including phase IA and phase IB) were then selected, and the data from cancer tissues and normal adjacent tissues were paired according to patient ID for further processing. Clinical data from a total of 50 patients were used in this study, and the data for each patient include expression values for 1881 miRNAs.

For data preprocessing, miRNAs with missing values in more than 20% of all samples were first removed. In addition, the local least squares imputation (LLS impute) method was used to estimate the remaining missing values [17]. Lastly, the z-score method was used to replace abnormal values and standardize the data [18].
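The filtering and standardization steps can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `preprocess` and the input layout (a dictionary mapping each miRNA to a list of per-sample values, with `None` for missing entries) are assumptions made for illustration, and row-mean filling is used only as a simple stand-in for LLS imputation. The replacement of abnormal values is not covered.

```python
import math

def preprocess(matrix, max_missing=0.20):
    """Filter, impute and standardize a miRNA expression table.

    matrix: dict mapping a miRNA id to a list of per-sample values,
            where None marks a missing value (hypothetical layout).
    """
    cleaned = {}
    for mirna, values in matrix.items():
        n_missing = sum(v is None for v in values)
        # Remove miRNAs with missing values in more than 20% of samples.
        if n_missing / len(values) > max_missing:
            continue
        observed = [v for v in values if v is not None]
        fill = sum(observed) / len(observed)
        # Stand-in for LLS imputation: fill each gap with the row mean.
        filled = [fill if v is None else v for v in values]
        mu = sum(filled) / len(filled)
        var = sum((v - mu) ** 2 for v in filled) / max(len(filled) - 1, 1)
        sd = math.sqrt(var)
        # z-score standardization; constant rows are mapped to zeros.
        cleaned[mirna] = [(v - mu) / sd if sd > 0 else 0.0 for v in filled]
    return cleaned
```

After this step each retained miRNA has zero mean and unit sample standard deviation across samples, which makes expression values comparable between miRNAs.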

Computational identification of significant differentially expressed (SDE) miRNAs

To improve the accuracy with which SDE miRNAs were recognized, the Kolmogorov-Smirnov test (K-S test) [19, 20], volcano plot [21] and significance analysis of microarrays (SAM) software [22] were used to identify the SDE miRNAs between the tumor tissue samples and normal adjacent tissue samples. A Venn diagram was then used to take the intersection of the three results, and a heat map was used to show the expression patterns of the miRNAs.

The K-S test is a non-parametric test [19] that can test whether a single sample follows a given theoretical distribution, or whether the distributions of two populations differ significantly [20]. Accordingly, it can be divided into the single-sample K-S test and the two-sample K-S test. Here, the null hypothesis is that there is no significant difference between the distributions of the two populations represented by the two independent samples. The null hypothesis was tested at a significance level of α = 0.05: when D ≥ Dcrit (the critical value of D at significance level α), the null hypothesis is rejected, which indicates that the expression of the miRNA differs between the paired samples.

$$D = \max_{x} \left| F_{1}(x) - F_{2}(x) \right|$$

where F1(x) and F2(x) represent the empirical cumulative distribution functions of the miRNA expression values in the tumor tissue samples and the normal adjacent tissue samples, respectively.
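As an illustration of the statistic D, the following minimal sketch evaluates both empirical distribution functions at every observed value and takes the largest absolute gap. The function names are hypothetical. The critical value uses the common large-sample approximation Dcrit ≈ c(α)·sqrt((n1 + n2)/(n1·n2)) with c(α) = sqrt(−ln(α/2)/2), which is an assumption here, since the text does not state how Dcrit was obtained.

```python
import math

def ecdf(sample, x):
    """Empirical CDF: fraction of observations in sample that are <= x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(tumor, normal):
    """D = max over x of |F1(x) - F2(x)|, checked at all observed values."""
    points = sorted(set(tumor) | set(normal))
    return max(abs(ecdf(tumor, x) - ecdf(normal, x)) for x in points)

def ks_critical(n1, n2, alpha=0.05):
    """Large-sample approximation of the critical value Dcrit (assumed form)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))

def is_differential(tumor, normal, alpha=0.05):
    """True when D >= Dcrit, i.e. the null hypothesis is rejected."""
    return ks_statistic(tumor, normal) >= ks_critical(len(tumor), len(normal), alpha)
```

For two groups of 50 samples each, this approximation gives Dcrit of roughly 0.27 at α = 0.05.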

A volcano plot is a type of scatter plot that is used to quickly identify changes in large data sets composed of replicate data. Our selection criteria were |log2 fold change (FC)| ≥ 1 and p-value < 0.05.
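The two volcano plot thresholds amount to a simple filter, sketched below. The function names and the input layout are hypothetical; the p-values are taken as given because the statistical test behind them is not specified in the text, and the fold change helper assumes strictly positive mean expression values.

```python
import math

def log2_fold_change(tumor_mean, normal_mean):
    """log2 of the tumor/normal ratio; both means must be positive."""
    return math.log2(tumor_mean / normal_mean)

def volcano_select(stats, fc_cut=1.0, p_cut=0.05):
    """Split miRNAs passing both thresholds into up- and downregulated.

    stats: dict mapping a miRNA id to a (log2 FC, p-value) pair.
    """
    up, down = [], []
    for mirna, (log2fc, p_value) in stats.items():
        # Keep only |log2 FC| >= 1 and p-value < 0.05.
        if p_value >= p_cut or abs(log2fc) < fc_cut:
            continue
        (up if log2fc > 0 else down).append(mirna)
    return sorted(up), sorted(down)
```

A log2 FC of 1 corresponds to a two-fold change, so the filter keeps miRNAs whose mean expression at least doubles or halves in tumor tissue.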

SAM is a method that can identify SDE miRNAs and classify the up (or down)-regulated miRNAs directly in the results [22]. The Excel add-in SAM package (Stanford University, CA) was used, with FDR (false discovery rate) < 0.05 and |log2 FC| ≥ 1 as the cut-off criteria to select SDE miRNAs.

Afterwards, the R package “pheatmap” was used to build a heat map in order to visually display the SDE miRNAs [21].

Obtaining candidate biomarkers

The SDE miRNAs were obtained from the intersection of the above three methods, and MedCalc software (MedCalc Software, Mariakerke, Belgium) was used to build the receiver operating characteristic (ROC) curve and calculate the area under the curve (AUC) [23]. In this study, a miRNA with AUC > 0.85 was considered a candidate biomarker.
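The selection logic can be sketched as follows. This is an illustration rather than the MedCalc procedure: the AUC is computed through its rank interpretation (the probability that a randomly chosen tumor value exceeds a randomly chosen normal value, with ties counted as one half), and the function names and input layout are hypothetical. Taking max(AUC, 1 − AUC) so that downregulated miRNAs are scored in their own direction is also an assumption.

```python
def auc(tumor, normal):
    """AUC as P(tumor value > normal value), ties counted as 0.5."""
    wins = sum((t > n) + 0.5 * (t == n) for t in tumor for n in normal)
    return wins / (len(tumor) * len(normal))

def candidate_biomarkers(ks_hits, volcano_hits, sam_hits, expression, cutoff=0.85):
    """Intersect the three SDE lists, then keep miRNAs with AUC > cutoff.

    expression: dict mapping a miRNA id to a (tumor values, normal values) pair.
    """
    shared = set(ks_hits) & set(volcano_hits) & set(sam_hits)
    selected = {}
    for mirna in sorted(shared):
        tumor, normal = expression[mirna]
        score = auc(tumor, normal)
        # A downregulated miRNA separates the groups in the other direction.
        score = max(score, 1.0 - score)
        if score > cutoff:
            selected[mirna] = score
    return selected
```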

Bioinformatics analysis of candidate biomarkers

miRNAs can regulate gene expression at the post-transcriptional level by pairing with the 3′-untranslated region (UTR) of complementary mRNAs and inhibiting their translation, thereby influencing the expression level and function of genes involved in the occurrence and development of cancer [24, 25]. To analyze possible target genes of the candidate miRNA biomarkers, we used the target prediction algorithms of miRWalk2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/) and miRTarBase 6.0 (http://mirtarbase.mbc.nctu.edu.tw/php/index.php). Target genes that were predicted by six or more of the 12 online prediction tools of miRWalk, or that were experimentally validated with strong evidence in miRTarBase, were selected as significant target genes. To determine whether these candidate genes are associated with RCC, a kidney-related gene enrichment analysis was performed using the enrichment function module of DAVID (https://david.ncifcrf.gov/). Lastly, GO enrichment analysis and KEGG pathway enrichment analysis were performed using DAVID with a threshold of p < 0.05.
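The target selection rule combines a vote threshold with a validated set, which can be sketched as below. The function name and input layout are hypothetical; the sketch assumes that the miRWalk results have already been summarized as a count of supporting tools per gene and that the miRTarBase entries with strong evidence are available as a set.

```python
def significant_targets(mirwalk_votes, mirtarbase_strong, min_tools=6):
    """Select target genes by prediction consensus or experimental validation.

    mirwalk_votes: dict mapping a gene to the number of the 12 miRWalk
                   tools that predict it as a target.
    mirtarbase_strong: genes validated with strong evidence in miRTarBase.
    """
    # Genes predicted by six or more of the 12 tools.
    predicted = {gene for gene, votes in mirwalk_votes.items() if votes >= min_tools}
    # Union with the experimentally validated genes.
    return predicted | set(mirtarbase_strong)
```

Because the two criteria are joined by a union, a validated gene is retained even when few prediction tools support it.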

Construction and analysis of protein–protein interaction network

To evaluate the interactive relationships among the target genes associated with RCC, a protein–protein interaction (PPI) network was constructed using the STRING online tool (https://string-db.org/cgi/input.pl) with a combined score > 0.8, and was visualized using Cytoscape v3.5.1.
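The effect of the combined score threshold can be sketched with a small adjacency structure. This is not the STRING or Cytoscape workflow: the function names are hypothetical, and the sketch assumes the interactions have been exported as (protein, protein, score) triples with scores on a 0 to 1 scale.

```python
from collections import defaultdict

def build_ppi(edges, min_score=0.8):
    """Build an undirected PPI network from scored interactions.

    edges: iterable of (protein_a, protein_b, combined_score) triples,
           with combined_score assumed to lie in [0, 1].
    """
    graph = defaultdict(set)
    for a, b, score in edges:
        # Keep only interactions with a combined score above the cut-off.
        if score > min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def degrees(graph):
    """Number of interaction partners of each protein in the network."""
    return {node: len(neighbors) for node, neighbors in graph.items()}
```

Proteins whose interactions all fall at or below the threshold do not appear in the resulting network.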

Function and pathway enrichment analysis of key sub-networks

The Molecular Complex Detection (MCODE) module of Cytoscape was used to screen the PPI network for key sub-networks with important biological significance in RCC. The cut-off values were an MCODE score > 30 and a number of nodes > 30. DAVID was then used to perform GO enrichment and KEGG pathway analyses of the target genes in each key sub-network.

Survival analysis of the hub genes

The CentiScaPe 2.2 module of Cytoscape was used to analyze the topological characteristics of the PPI network and to search for key nodes and their target genes (hub genes). RCC samples from the TCGA database and mRNA z-score data from the cBioPortal database were then downloaded. For each hub gene, all samples were divided into a high-expression group and a low-expression group depending on whether expression was above or below the median in patients with RCC. An overall survival (OS) analysis was then performed with GraphPad Prism 6 software, and Kaplan-Meier survival curves were used to show the differences in OS rate between the high-expression and low-expression groups of each hub gene. The log-rank test was used to find the hub genes that significantly affected the OS rate of the two groups of patients (p-value < 0.01).
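The median split and the log-rank comparison can be sketched as follows. This is a minimal illustration, not the GraphPad Prism procedure: the function names and input layouts are hypothetical, samples with expression exactly equal to the median are assigned to the low-expression group (an assumption, since the text does not say how ties are handled), and the p-value comes from the standard two-group log-rank chi-square statistic with one degree of freedom.

```python
import math

def median_split(expression):
    """Split patients into (high, low) sets around the median expression.

    expression: dict mapping a patient id to an expression value.
    Values equal to the median go to the low group (assumption).
    """
    values = sorted(expression.values())
    n = len(values)
    median = values[n // 2] if n % 2 else (values[n // 2 - 1] + values[n // 2]) / 2
    high = {p for p, v in expression.items() if v > median}
    low = {p for p, v in expression.items() if v <= median}
    return high, low

def logrank_p(group1, group2):
    """Two-group log-rank test p-value.

    Each group is a list of (time, event) pairs, event = 1 for death
    and 0 for a censored observation.
    """
    event_times = sorted({t for t, e in group1 + group2 if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for time, _ in group1 if time >= t)  # at risk, group 1
        n2 = sum(1 for time, _ in group2 if time >= t)  # at risk, group 2
        d1 = sum(1 for time, e in group1 if time == t and e)
        d2 = sum(1 for time, e in group2 if time == t and e)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

Under these assumptions, a hub gene would be flagged as survival-related when `logrank_p` for its high-expression and low-expression groups is below 0.01.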

Biological pathway analysis of hub genes

To verify the relationship between hub genes and RCC, we used the online enrichment tool of KEGG (http://www.genome.jp/kegg/tool/map_pathway2.html) to perform pathway enrichment analysis of the hub genes. The results can also provide a better understanding of the mechanism of RCC.

Results

Identification of the candidate biomarkers

K-S test, volcano plot and SAM were used to identify the SDE miRNAs between tumor samples and normal adjacent tissues. In total, 271 SDE miRNAs were identified by the K-S test (p-value < 0.05). One hundred and thirteen miRNAs (58 upregulated and 55 downregulated) were identified by volcano plot (|log2 fold-change| ≥ 1, p-value < 0.05). Eighty-five miRNAs (80 upregulated and 5 downregulated) were identified by SAM (|score (d)| ≥ 1.5, |log2 fold-change| ≥ 1, q-value < 0.01). To improve accuracy, the miRNAs identified by all three methods were selected for later analysis (Fig. 1). Finally, ROC curves and AUC scores were used to screen potential biomarkers, and 13 miRNAs with AUC > 0.85 were selected (Fig. 2).

Fig. 1

Venn diagram of the three methods. Fifty-six SDE miRNAs (51 upregulated and 5 downregulated) were identified by all three methods

Fig. 2

ROC curves and AUC scores of the 13 potential biomarkers. The AUC scores of miR-21, miR-616, miR-362, and miR-155 are larger than 0.9, which indicates that they may have higher diagnostic value

A heat map analysis was applied to these 13 miRNAs to show their different expression patterns in the tumor tissue samples and normal adjacent tissue samples (Fig. 3).

Fig. 3

Heat map of the 13 potential biomarker miRNAs. The miRNA IDs are displayed on the right of the heat map, and the tissue numbers are displayed at the bottom of the heat map. Upregulation in metastasis is depicted as red squares and downregulation as green squares

Prediction and enrichment analysis of miRNA-mediated genes

To understand how these 13 miRNAs participate in the pathogenesis of RCC, we predicted their potential target genes. A total of 5357 target genes were predicted, of which 1967 were enriched in RCC. The online tools of DAVID were then used to conduct GO and KEGG pathway enrichment analyses on these 1967 genes. The five most enriched GO terms in biological process were modification-dependent macromolecule catabolic process, modification-dependent protein catabolic process, proteolysis involved in cellular protein catabolic process, cellular protein catabolic process and protein catabolic process. The most enriched GO terms in molecular function and cellular component were acid-amino acid ligase activity and ubiquitin ligase complex, respectively. The five most enriched KEGG pathway terms were ubiquitin mediated proteolysis, oocyte meiosis, progesterone-mediated oocyte maturation, Wnt signaling pathway, and RCC.

PPI network analysis of miRNA-mediated genes

To understand the interactions among these genes, the online analysis tool of STRING was used to construct a PPI network of the proteins encoded by these genes (combined score > 0.8). This network has 824 nodes and 3618 edges. To find the key nodes and their genes, we further selected 20 nodes as hub nodes using CentiScaPe (degree ≥ 35, betweenness centrality > average); the corresponding coding genes were taken as hub genes. The distinct modules of the whole network and their interacting proteins were then identified with the MCODE add-on of Cytoscape. Among these modules, a key sub-network with 38 nodes was selected (MCODE score > 30, nodes > 30) (Fig. 4), and an enrichment analysis was conducted using DAVID. The genes in this sub-network are mainly associated with proteasome, ECM–receptor interaction, protein digestion and absorption, and focal adhesion.
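The hub node criteria (degree ≥ 35 and betweenness centrality above the network average) can be sketched in a few lines. This is an illustration and not the CentiScaPe implementation: the function names are hypothetical, the network is assumed to be an unweighted, undirected adjacency dictionary in which every node appears as a key, and betweenness is computed with Brandes' algorithm without normalization, which does not change the comparison with the average.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph: dict mapping each node to the set of its neighbors.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected shortest path was counted from both of its endpoints.
    return {v: b / 2 for v, b in score.items()}

def hub_nodes(graph, min_degree=35):
    """Nodes with degree >= min_degree and above-average betweenness."""
    bc = betweenness(graph)
    mean_bc = sum(bc.values()) / len(bc)
    return sorted(v for v in graph if len(graph[v]) >= min_degree and bc[v] > mean_bc)
```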

Fig. 4

The key sub-network of the PPI network. The previously selected hub nodes are colored in red; in total, this sub-network contains 10 hub nodes

Survival analysis of the hub genes

To find the relationship between the expression of these 20 hub genes and the development of RCC, the samples were divided, for each hub gene, into a downregulated group and an upregulated group based on whether the expression level of the gene was below or above the median. The survival data of 499 RCC patients were then downloaded, and the OS analysis was conducted. The results showed that, among the 20 hub genes, 14 were significantly associated with the survival rate of RCC patients (Fig. 5).

Fig. 5

Overall survival curves of the 14 survival-related hub genes. Black curves represent the downregulated genes and red curves represent the upregulated genes. Among these 14 hub genes, three (ANAPC5, RAC1 and NCBP2) are upregulated, while the other 11 are downregulated

Integrated miRNA-mRNA-pathway analysis of RCC

Using the online tools of KEGG, a pathway enrichment analysis of the 14 survival-related hub genes was conducted. The results showed that only RNF111 did not participate in any pathway; the remaining 13 hub genes were involved in 147 biological pathways, including 40 signal transduction pathways and 107 metabolic pathways. To further demonstrate the regulatory relationships among the potential biomarker miRNAs, target genes and these biological pathways, an interaction network was constructed (Fig. 6).

Fig. 6

The interaction network of potential biomarker miRNAs, target mRNAs and biological pathways. The red nodes are miRNAs, the green nodes are target genes, the blue nodes are pathways, and the key pathways (with five or more genes enriched) are colored in yellow. Nodes with credible interactions are linked with black lines, and the interactions among key pathways, genes and miRNAs are colored in red. The 8 key pathways are pathways in cancer, Rap1 signaling pathway, prostate cancer, focal adhesion, proteoglycans in cancer, ubiquitin mediated proteolysis, colorectal cancer and PI3K-Akt signaling pathway

Discussion

Among the 13 potential biomarker miRNAs, 10 (miR-142, miR-15, miR-155, miR-185, miR-21, miR-340, miR-362, miR-210, miR-93, and miR-942) have been reported to be associated with RCC [26–32], and nine of them (all except miR-210) were involved in the regulation of the 147 pathways mentioned in the Results section. Among these 147 pathways, four of the top 15 have been reported to be involved in the regulation of RCC and other cancers: the PI3K/Akt pathway [33], the Wnt signaling pathway [34], the MAPK/ERK pathway [35], and ubiquitin mediated proteolysis [36]. Although the target genes of miR-210 are not involved in these 147 pathways, there are many reports that miR-210 is associated with the development of renal cancer [29].

Eight miRNAs (miR-21, miR-155, miR-185, miR-362, miR-15a, miR-93, miR-142, and miR-576) and the genes they regulate (EGFR, HSP90AA1, PIK3R3 and RAC1) are involved in the regulation of the PI3K/Akt pathway. It is known that the PI3K/Akt signaling pathway plays an important role in the development of tumors by inhibiting downstream apoptotic genes, activating anti-apoptotic genes, promoting cell survival, and mediating hematopoiesis and angiogenesis [35]. Initial stimulation by a growth factor causes activation of cell surface receptors and phosphorylation of PI3K. Activated PI3K then phosphorylates lipids on the cell membrane to form the second messenger PIP3 (phosphatidylinositol-3,4,5-trisphosphate). Akt, a serine/threonine kinase, is recruited to the cell membrane by interaction with these phosphatidylinositol docking sites. Activated Akt mediates downstream responses by phosphorylating a range of intracellular proteins such as Bad, Caspase-9, mTOR, NF-κB, GSK-3 and FKHR, thereby regulating multiple biological processes, including cell survival, cell growth, proliferation, cell migration and angiogenesis, which can eventually lead to cancer [37].

Seven miRNAs (miR-15a, miR-362, miR-142, miR-155, miR-340, miR-576, and miR-942) and the genes they regulate (BTRC, CTNNB1, RAC1, and SKP1) are involved in the regulation of the Wnt/β-catenin signaling pathway. Previous studies have found that abnormal activation of the Wnt signaling pathway is associated with the development of multiple cancers [38]. When the Wnt signaling pathway is activated, Wnt binds to its specific receptors, Frizzled and LRP, and the signal is transmitted from the extracellular space to the intracellular protein Dishevelled (Dsh or Dvl). Under the synergistic effect of Axin, Dsh inhibits the degradation of β-catenin. Subsequently, β-catenin enters the nucleus and interacts with TCF (T cell transcription factor)/LEF to activate the expression of downstream genes such as c-myc and cyclin D1, which eventually leads to malignant transformation [39].

Seven miRNAs (miR-21, miR-155, miR-15a, miR-93, miR-142, miR-185, and miR-576) and the genes they regulate (EGFR, MAPK1, and RAC1) are involved in the regulation of the MAPK signaling pathway. Extracellular growth factors (EGF) activate receptor tyrosine kinases (RTKs), which provide binding sites for the adapter protein Grb2 and recruit SOS proteins to the cell membrane. SOS proteins then activate Ras by consuming GTP. Ras-GTP binds directly to the serine/threonine kinase Raf, forming an instantaneous signal, and activated Raf kinases in turn phosphorylate and activate MEK [40]. Activated MEK phosphorylates ERK1/ERK2, and phosphorylated ERK enters the nucleus and phosphorylates transcription factors such as elk-1 and myc [41]. These factors then regulate the expression of genes related to cell growth and differentiation, promoting the proliferation and evolution of cancer cells [42].

Eight miRNAs (miR-93, miR-185, miR-15a, miR-362, miR-142, miR-576, miR-21, and miR-942) and the genes they regulate (ANAPC5, BTRC, CUL3, NEDD4, and SKP1) are involved in the regulation of ubiquitin mediated proteolysis. The ubiquitin (Ub)-proteasome degradation system can specifically recognize and target substrate proteins, participates in important processes such as cell growth, proliferation and apoptosis, and plays an important role in maintaining and regulating the dynamic balance of cells [43, 44]. The system consists of ubiquitin, ubiquitin-activating enzyme E1, ubiquitin-conjugating enzyme E2, ubiquitin ligase E3, the 26S proteasome and deubiquitinating enzymes (DUBs). In the presence of ATP, and through catalysis by E1, E2 and E3, ubiquitin is attached through an isopeptide bond to the ε-NH2 group of a lysine side chain of the target protein. The ubiquitin then serves as the recognition signal with which the target protein enters the 26S proteasome for degradation.

However, three miRNAs, miR-576, miR-616 and miR-133a-2, have not previously been reported to be associated with RCC.

Although miR-576 has not been reported to be associated with renal cancer, this study found that its target genes, CUL3 and RAC1, are involved in the regulation of several cancer-related biological pathways such as the PI3K/Akt signaling pathway and ubiquitin-proteasome degradation. More specifically, CUL3 is involved in the formation of the E3 ubiquitin ligase in the ubiquitin mediated proteolysis pathway, and RAC1 encodes the small GTPase Rac1, which participates in the PI3K/Akt signaling pathway. Thus, miR-576 may be a new early diagnostic biomarker.

MiR-616 has neither been reported to be associated with RCC nor been found to be involved in the regulation of any of the 147 biological pathways. However, this study found that two proteins encoded by the target genes of miR-616, ASB13 and FBXW2, are key nodes in the key sub-network of the PPI network, suggesting that they may interact with other proteins in the network. In addition, ASB13 has been reported to be associated with diffuse large B cell lymphoma [45] and breast cancer [46], and FBXW2 has been reported to be related to lung cancer [47, 48]. This evidence indicates that miR-616 may affect the development of renal cancer indirectly. Thus, it may also be considered a new early diagnostic biomarker.

The target genes of miR-133a-2 did not appear in the key sub-network, and miR-133a-2 has not been reported to be associated with RCC, so the relationship between miR-133a-2 and RCC needs to be further verified.

In conclusion, using multiple computational and bioinformatic methods, we identified several potential early diagnostic biomarkers and explored the relationship between these potential biomarker miRNAs and RCC. In total, 13 miRNAs (miR-142, miR-15, miR-155, miR-185, miR-21, miR-340, miR-362, miR-576, miR-93, miR-942, miR-616, miR-133a-2, and miR-210) were found in this study, including two new candidate early diagnostic biomarkers, miR-576 and miR-616, and they may have guiding significance for the early diagnosis of RCC. In addition, the mechanisms by which these miRNAs participate in the regulation of multiple cancer-related pathways may provide new ideas and theoretical support for the study of the mechanisms of kidney cancer and the prediction of therapeutic targets.