Prioritizing cancer-related microRNAs by integrating microRNA and mRNA datasets

MicroRNAs (miRNAs) are small non-coding RNAs regulating the expression of target genes, and they are involved in cancer initiation and progression. Even though many cancer-related miRNAs were identified, their functional impact may vary, depending on their effects on the regulation of other miRNAs and genes. In this study, we propose a novel method for the prioritization of candidate cancer-related miRNAs that may affect the expression of other miRNAs and genes across the entire biological network. For this, we propose three important features: the average expression of a miRNA in multiple cancer samples, the average of the absolute correlation values between the expression of a miRNA and expression of all genes, and the number of predicted miRNA target genes. These three features were integrated using order statistics. By applying the proposed approach to four cancer types, glioblastoma, ovarian cancer, prostate cancer, and breast cancer, we prioritized candidate cancer-related miRNAs and determined their functional roles in cancer-related pathways. The proposed approach can be used to identify miRNAs that play crucial roles in driving cancer development, and the elucidation of novel potential therapeutic targets for cancer treatment.


Methods
As shown in Fig. 1, we assumed that some genes and miRNAs, illustrated in the left part of the presented biological network, have many interactions and cause significant alterations of other genes/miRNAs, directly or indirectly. Therefore, they are more likely to contribute to the development of cancer than genes and miRNAs represented in the right part of the network, which are mainly affected by the previous category of genes and miRNAs, and are less likely to contribute to the cancer development. A schematic overview of our approach is presented in Fig. 2, while the details are described below.
Data collection. We obtained microarray and/or RNA-Seq datasets for GBM, ovarian cancer (OVC), prostate cancer (PRCA), and breast cancer (BRCA) from the TCGA data portal (http://cancergenome.nih.gov). Paired datasets of gene and miRNA expressions were obtained.
For microarray, gene and miRNA expression data were generated using Affymetrix HG-U133A and Agilent H-miRNA_8 × 15 for GBM and OVC, respectively. GBM dataset contains 12,042 genes and 470 mature miRNAs, obtained from 482 tumor samples, and OVC dataset contains 12,042 genes and 723 mature miRNAs obtained from 578 tumor samples. For RNA-Seq, gene and miRNA expression datasets were generated by IlluminaHiSeq_ RNASeqV2 and IlluminaHiSeq_miRNASeq, respectively, using OVC, PRCA and BRCA samples. OVC dataset contains 20,806 genes from 416 tumor samples, PRCA dataset contains 20,531 genes from 494 tumor samples, and BRCA dataset contains 20,532 genes from 461 tumor samples. Additionally, they commonly contain 1,046 miRNAs obtained from the paired samples with genes.
Predicted gene-miRNA interactions were collected from microCosms 3 , PicTar 4 , and TargetScans 5 . The information about miRNA-disease relationships was obtained from the Human microRNA Disease Database (HMDD) 28 . We collected OVC miRNA data using "Ovarian Neoplasm" term, GBM miRNAs using "Glioblastoma" or "Glioma" terms, PRCA miRNA data using "Prostatic Neoplasms" term, and BRCA miRNA data using "Breast Neoplasms" term.
Scientific RepoRts | 6:35350 | DOI: 10.1038/srep35350 Feature selection and analysis. We propose three miRNA features that may strongly affect the biological network. First, we assumed that highly expressed miRNAs show a high potential to affect the biological network. Therefore, the average miRNA expressions from all cancer samples are selected as the first feature (F1). F a = {a 1 , … , a i , … , a M }, where a i is the average expression of a miRNA i in the cancer samples, and M represents the number of miRNAs. Note that we only considered the expression levels of miRNAs in cancer cells, but not in the normal cells.
Additionally, we assumed that if a miRNA significantly affects some genes, the expressions of this miRNA and the genes may be highly correlated. A miRNA can directly regulate a set of genes, which may indirectly lead to the alterations in the expression of many other genes. Therefore, we considered all genes in the biological network and used the average of absolute Pearson correlation coefficients (PCCs) between miRNA and all gene expressions as the second feature (F2). F c = {c 1 , …, c i ,… c M }, where c i is the average of absolute PCC values between miRNA i and all genes in the cancer samples.
We further assumed that miRNAs that bind to many genes strongly affect the biological network. However, only a small fraction of miRNA target genes has been experimentally validated, and therefore, we used computationally predicted gene-miRNA interactions, based on sequence complementary. We obtained the predicted gene-miRNA interactions from microCosms 3 , PicTar 4 , and TargetScans 5 . All interaction pairs were extracted from these three databases and duplicated interaction pairs were removed. Furthermore, we counted the number of the predicted targets for each miRNA and these numbers can be considered the numbers of potential interacting genes, representing our third feature (F3). F t = {t 1 ,… , t i ,… , t M }, where t i is the number of target genes for miRNA i. Note that, because the number of predicted targets is determined by the sequence complementary information, gene-miRNA interactions for the third feature are the same, regardless of the cancer type used.

Integration of features.
We used order statistics to integrate the three features, F1, F2, and F3. In Fig. 3, a flowchart of feature integration process is presented. First, we computed the ranking ratios for the values of each feature. F a , F c , and F t values were ranked in a decreasing order and their ranking ratios were stored in RD 1 , RD 2 , and RD 3 , respectively. Let , where td i r is the ranking ratio of t i ∈ F t and < td td i r j r if t i > t j . Furthermore, for a given miRNA i, a Q statistic is computed based on joint cumulative distribution of order statistic of the three features 26 . The integration was approximated using a recursive solution proposed by Stuart et al. 29 . Figure 1. Biological network structure. Cancer-related miRNAs (dark orange rectangles) and genes (dark blue circles) affect other genes and miRNAs and ultimately the entire biological network. In contrast to this, white rectangles and circles represent miRNAs and genes that do not affect other genes and these miRNAs are mostly regulated by the previous category of genes and miRNAs. Black solid lines and green dotted lines represent the direct or indirect changes, respectively, of one node that affects linked nodes. Although we presented only one indirect edge for simplicity, there are many indirect relationships in a real biological network. Right nodes influence on the left nodes shown in blue dotted lines, however, influences from left to right are more abundant globally.   Following this, we computed ranking ratios by decreasing (RD) and increasing order (RI). Afterward, we computed Q statistics, Q 1 and Q 2 , and rankings, R Q1 and R Q2 . Finally, we determined the final rankings by the addition of these two rankings. where N = 3, the number of the features, r 0 = 0, and r k is the ranking ratio for the kth feature for the given miRNA i: = ∈ r ad RD , where qd i r is a ranking of qd i ∈ Q 1 and < qd qd i r j r if qd i < qd j . We assumed that a miRNA i with a smaller value of qd i (thus, with a higher ranking) is more highly related to cancer development, since we assumed that miRNAs with larger F a , F c , and F t values are more likely to be related to cancer, resulting in smaller qd i values. Therefore, R Q1 is a set of miRNA rankings, indicating their relevance to cancer development. However, some miRNAs are highly ranked in some features but not in others. In these cases, miRNA rankings can be dominantly determined by a single or a small number of features because the ranking in R Q1 becomes higher with a lower number of high ranking features. To reduce the effects produced by these few features, we incorporated the following step.
We calculated another Q statistic that measures the extent to which a miRNA is not related to cancer. When F a , F c , and F t feature values are small, it is unlikely that a miRNA is related to cancer. Hence, F a , F c , and F t values are ranked in an ascending order and their ranking ratios are stored in RI 1 , RI 2 , and RI 3 , respectively. Let Pathway analysis related to miRNA. In order to understand the functional roles of miRNAs in biological pathways, we performed a pathway enrichment analysis using KEGG pathways and gene ontology (GO) biological functions. We downloaded 186 KEGG pathways and 825 GO biological functions from Gene Set Enrichment Analysis (GSEA, http://www.broadinstitute.org/gsea), and extracted experimentally validated gene-miRNA interaction datasets from miRTarbase 30 . Strong evidence, such as reporter assays, western blotting, and qPCR and/ or weak evidence, such as microarrays, next generation sequencing (NGS), and pSILAC support these datasets. For each miRNA, we selected genes with expression levels highly correlated with the given miRNA expression levels (1% of genes in PCC values) or those interacting with that miRNA according to miRTarbase 30 . Finally, we performed a pathway enrichment analysis using a hypergeometric test and obtained p-values for each miRNA. p-values were corrected to q-values based on Benjamini & Hochberg correction 31 , in order to address multiple comparison issues. We considered pathways with q-value less than 0.05 as significant.

Survival analysis.
We performed a survival analysis to identify miRNAs that play crucial roles in cancer patient survival. Clinical information about GBM and OVC samples was obtained from TCGA. For each candidate miRNA, we divided these samples into two groups: an under-expressed group, where the expression levels of that miRNA belong to the bottom 10% of all values, and an over-expressed group, where the expression levels of the miRNA belong to the top 10% of values. We performed Kaplan-Meier survival analysis and obtained p-values.

Results
Feature selection and analysis. For each miRNA, we computed three feature values, F1, F2, and F3. To show that these three features are cancer-related, we ranked all miRNAs based on the feature values. We assigned a value of one to miRNAs if they were found in the HMDD database and zero if not. We computed a cumulative ratio by computing the average number of miRNAs that had the value of one for each ranking. Afterward, we determined whether the higher-ranked miRNAs have higher cumulative ratios. In Fig. 4, cumulative ratios of cancer-related miRNAs for the five cancer datasets are presented. Images on the left in Fig. 4 show that cumulative ratios of all features, F1, F2, and F3, have a similar tendency of steady decrease with the ranking, suggesting that highly ranked miRNAs are more likely to be related to cancer regardless of cancer type (GBM, OVC, PRCA, and BRCA) and platforms (microarray or RNA-Seq).

Integration of features.
We integrated the three features (F1, F2, and F3) using order statistics and ranked miRNAs based on R Q . As shown in Fig. 4, the cumulative ratios of cancer-related miRNAs were shown to be the highest when these three features were integrated, compared with single-feature or two-feature analyses. Additionally, we applied different integration methods: an average of ranking ratios of the three features, F1, F2, and F3 (e.g. (RD 1 + RD 2 + RD 3 )/3), an inverse normal transformation 32 , and two other order statistics, R Q1 and R Q2 . The comparison of the proposed integration method (R Q ) with the other tested methods showed that our method had the highest performance on average when GBM, OVC (microarray), OVC (RNA-Seq), PRCA, and BRCA samples were used (Supplementary Table 2). These results suggest that the integration of these features assists the identification of cancer-related miRNAs. Supplementary Tables 3, 4  x-axis represents miRNA rankings and y-axis represents a cumulative ratio of cancer-related miRNAs included in HMDD. (A-E) correspond to GBM, OVC (microarray), OVC (RNA-Seq), PRCA and BRCA samples, respectively. Left, cumulative ratios of cancer-related miRNAs are presented, which were obtained using a single feature. Right, cumulative ratios of cancer-related miRNAs, by integrating features, are presented. For most cancer types, cumulative ratios of cancer-related miRNAs increase with the rank. Additionally, the integration of more than one feature shows higher performance than when only one feature is used. Furthermore, integration of all features shows the highest performance for most cancer types. 100 miRNAs (64.0%) were shown to be related to GBM (Fig. 4A), which demonstrates a significant overlap with GBM-related miRNAs (p-value = 4.43e-20, in the hypergeometric test).
Furthermore, we investigated whether 98 previously known GBM-related genes 33,34 interact with miRNAs ranked high in our analysis. The target genes of the top 20 ranked miRNAs obtained from miRTarbase 30 included 3949 genes and 515 genes, found to interact with at least one of the miRNAs with some evidence and with strong evidence, respectively. Among these, 68 and 43 genes were common with the 98 GBM genes, with some evidence and strong evidence, respectively, showing statistically significant overlaps (p-values < 1.12e-13 in the hypergeometric test). Extracted gene-miRNA interactions are listed in Supplementary Table 8.
We further investigated whether previously known 379 OVC genes 33,34 interact with the highly ranked miR-NAs. The top 20 ranked miRNAs are known to interact with 4485 genes and 559 genes with some evidence and strong evidence in miRTarbase 30 , respectively. Of these genes, 183 and 89 genes were found among 379 OVC genes with some evidence and strong evidence, respectively, showing statistically significant overlaps (p-values < 5.37e-06 in the hypergeometric test). In Supplementary Table 9, a list of gene-miRNA interactions is presented.
PRCA and BRCA-related miRNAs. PRCA dataset contained 1046 miRNAs, and 128 (12.2%) are indicated in HMDD as well. In our analysis, 16 out of 20 miRNAs (80.0%) and 63 out of 100 miRNAs (63.0%) were shown to be PRCA-related (Fig. 4D). In the BRCA dataset, 229 (21.9%) miRNAs, of 1046, were found in HMDD as well. Our analysis, integrating two features, F1 and F3, showed the best performance in the identification of BRCA-related miRNAs. However, the integration of all features shows good performance in general. This analysis identified 19 out of 20 miRNAs (95.0%) and 75 out of 100 miRNAs (75.0%) as related to BRCA (Fig. 4E).

Pathway analysis.
We investigated biological pathways affected by cancer-related miRNAs identified in our study. For each miRNA, we calculated PCCs between the expression level of all genes and the given miRNA, and selected highly correlated genes, with PCC values within top 1%. These highly correlated genes and genes that are reported to interact with a specific miRNA in miRTarbase 30 are considered candidate genes related to the given miRNA.
Furthermore, we constructed reference cancer-related pathways for GBM and OVC. We performed a pathway enrichment test using 98 GBM genes curated from two previous reports 33,34 , and 379 OVC genes curated from Dragon Database For Exploration Of Ovarian Cancer Genes (DDOC) 35 against KEGG pathways. As a result, 61 and 73 pathways were shown to be significantly enriched (q-value < 0.05) and used as reference GBM and OVC, respectively, related pathways (Supplementary Tables 10 and 11).
In OVC analysis, 84 pathways were associated with the top 20 OVC candidate miRNAs ( Fig. 5B and Supplementary Table 13). Among them, 56 pathways represent reference cancer-related pathways for OVC, including previously known OVC related pathways, such as apoptosis 61 , ECM receptor interaction 62 , ErbB signaling pathway 63 , JAK-STAT signaling pathway 64 , p53 signaling pathway 65 , and TGF-β signaling pathway 66 . Additionally, in Fig. 5B, previously known associations between miRNAs and pathways are presented. MiR-21 and miR-17 were shown to be involved in OVC cell apoptosis 67,68 in p53 signaling pathway in OVC cells 69,70 .
Additionally, we performed a functional enrichment test for GO biological functions using the genes related to top 20 GBM and OVC candidate miRNAs. Among 1123 enriched terms for GBM, 290 terms were related to cancer hallmark signatures such as apoptosis, immune response, chromosome abnormalities, inflammation, and angiogenesis 71,72  patients, 290 were deceased and the average survival is 1088 days. Considering these data, GBM is more aggressive than OVC. For each miRNA, we divided the samples into 10% underexpressed group and 10% overexpressed group and performed the survival analysis. For GBM samples, 10 out of the top 20 miRNAs (50%) and 29 out of the top 100 miRNAs (29%) were shown to be significantly related to the survival of patients. In Fig. 6, survival curves for samples containing overexpressed or underexpressed miR-17-5p, miR-106a, and miR-181d, which function as tumor suppressors, are shown. Overexpressed groups show longer survival time compared with that of the underexpressed groups. These miRNAs were previously identified as tumor suppressors, as presented in Table 1. For OVC, only six out of the top 100 miRNAs (6%) were shown to be related to the survival of patients.
Although multiple studies showed that the highly ranked miRNAs identified in this study are related to OVC, for most of them, no significant difference in the survival time was observed. This may be a result of the higher survival ratio of patients with OVC compared with GBM patients and relatively small differences in the expression levels between the samples. Details of the survival analysis for the top 100 miRNA are described in Supplementary  Tables 16 and 17 for GBM and OVC samples, respectively.
Comparison with other methods. We compared performances of other methods with that of our method using GBM and OVC (microarray) datasets. We used differentially expressed miRNAs for comparison, because the identification of miRNAs based on differential expression between cancer samples and normal samples is the most commonly used approach for the selection of cancer-related miRNAs. We additionally downloaded eight and 10 unmatched normal GBM and OVC (microarray) samples, respectively, computed p-values using t-test, and ranked miRNAs based on p-values.
For comparison, we employed our previous approach 25 as well, in which gene and miRNA expressions and gene-gene interactions are integrated using a biclustering algorithm and a Gaussian Bayesian network framework, followed by construction of gene-miRNA modules. The inclusion of genes and miRNAs in the same module was explained through the direct regulation of genes by miRNAs and/or indirect regulation by transcription factors. We assumed here that the miRNAs included in the modules are cancer-related. We listed all miRNAs in the modules without duplication and obtained 95 GBM-related and 108 OVC-related candidate miRNAs.
In Fig. 7, each bar represents a cumulative ratio of GBM miRNAs in HMDD (Fig. 7A) and OVC miRNAs in HMDD (Fig. 7B) among miRNAs identified by each method, and on the x-axis are represented the numbers of candidate miRNAs selected based on the rankings in each method. Across all rankings ranging from 10 to 100, our method outperformed the other two investigated approaches. Although our method showed a slightly higher performance when 95 GBM candidate miRNAs were used, in comparison with our previous approach (66.3% vs. 61.1%, respectively), it significantly outperformed our previous approach when using 108 OVC candidate miRNAs (66.3% vs 48.1%, respectively).
An approach proposed by Zhang et al. 24 was also employed for comparison. Zhang et al. 24 identified differentially expressed genes and miRNAs, computed PCCs between them, and constructed a gene-miRNA network by intersecting negative PCC pairs and combination of experimentally and computationally derived gene-miRNA interactions. They ranked candidate cancer-related miRNAs by considering the number of genes uniquely regulated by miRNAs in the obtained network. When the method was applied to a GSE34933 dataset from NCBI GEO, containing the information about gene and miRNA expression for four PRCA samples and benign prostatic hyperplasia samples, 26, 39 and 69 PRCA candidate miRNAs were identified depending on the thresholds. In Fig. 7C, it is clearly demonstrated that our method outperforms the approach used by Zhang et al. 24 when the same PRCA dataset is used.

Discussion and Conclusion
Our approach can identify cancer-related miRNAs that may affect other molecules in the biological network based on three distinct features. For most of the analyzed cancer types, the integration of the three features led to the best performance in the identification of cancer-related miRNAs. However, the identification of BRCA-related miRNAs was improved when only two features, F1 and F3, were integrated, because F2 showed less power. Although the influence of these features can be controlled by assigning different weights to each feature, this requires prior knowledge, and therefore, the ability of generalization decreases. Here, we demonstrated that our order statistics-based method outperformed the average ranking ratio approach. To further investigate the influence of different weighting of features, we determined performance of feature integration by assigning different weights to each feature. These three features were integrated with various weights ranging from 0.1 to 1.0, resulting in a total of 1,000 test cases for each cancer type. We then selected the weight combinations providing the best performance, and the best combinations yielded performances similar to our approach for GBM, OVC and PRCA when selecting 100 candidate miRNAs, although our approach was outperformed in other cases. However, the best weight combinations were differed between different cancer types and the number of miRNAs selected as candidate cancer-related miRNA, confirming that it is hard to generalize optimal feature weights. Additionally, our order statistics approach significantly outperformed the worst weight combination, showing that our approach that considers ascending and descending rankings of the features can be generalized to any cancer type. Performances with various feature weights are described in Supplementary Table 18. For GBM and OVC, we investigated top 20 miRNAs in detail to understand their roles in the cancer development, because more highly ranked miRNAs are more likely to be related to cancer. Functional roles of the top 20 candidate GBM miRNAs are described in Table 1 and Supplementary Table 19. Among them, miR-9/9* (ranked second and seventh), miR-181a/a* (ranked third and 10 th ), miR-21 (ranked fourth), miR-93 (ranked fifth),  and overexpressed miR-17-5p, miR-106a, or miR-181d, respectively. These miRNAs function as tumor suppressors, and the survival rates of patients with the overexpression of these miRNAs are higher than those of patients with the underexpression of these miRNAs.
Scientific RepoRts | 6:35350 | DOI: 10.1038/srep35350 miR-34a (ranked sixth), miR-222 (ranked 13 th ), miR-17-5p/3p (ranked 14 th and 18 th ), and miR-181d (ranked 16 th ) were reported as important GBM-related miRNAs. MiR-9/9* function as onco-miRNAs or tumor suppressors, depending on the cellular environment. Schraivogel et al. 73 reported that miR-9/9* are highly abundant in GBM stem cells and function as oncogenes by repressing tumor suppressor CAMTA1. Wu et al. 57 reported that miR-9 is upregulated in glioma patients with high WHO grade (III-IV) and represents a useful prognostic factor for overall survival. Although these studies indicated that high miR-9/9* expression promotes tumor progression, Gomez et al. 74 showed that, in GBM driven by EGFR mutation (EGFRvIII), miR-9 suppression leads to the enhanced tumor growth, because this miRNA targets FOXP1 transcription factor, and its upregulation can be oncogenic. MiR-181a/a* act as tumor suppressors, and Shi et al. 46 showed that miR-181a inhibits cell growth and invasion and induces apoptosis in glioma cells. MiR-21 was reported to be an important biomarker in GBM. Chan et al. 75 showed that miR-21 functions as an antiapoptotic factor by repressing apoptosis-related genes in GBM cells. Gabriely et al. 76 showed that miR-21 promotes the activity of matrix metalloproteinases (MMPs) by targeting MMP inhibitors, which results in the increase of tumor invasiveness. Zhou et al. 52 showed that miR-21 is involved in the regulation of EGFR and AKT pathways, and the suppression of cell growth in GBM. Furthermore, it was demonstrated 77 that miR-93 targets integrin β8, affecting integrin β8-induced cell death in GBM. MiR-34a has been reported as an important regulator in GBM, and several studies 56,[78][79][80] reported that miR-34a functions as a tumor suppressor. MiR-34a targets several genes, such as c-Met, Notch-1, Notch-2, and CDK6, and regulates GBM-related pathways, e.g., p53 pathway. MiR-222 was shown 50 to inhibit cell apoptosis and induce cell survival through the direct targeting of p53-upregulated modulator of apoptosis (PUMA) in GBM. MiR-222 was shown to be involved in tumorigenesis 81 through the regulation of protein tyrosine phosphatase u (PTPu) expression in glioma cells. MiR-17-5p/3p were reported to regulate E2F1, PTEN, and MDM2. Srinivasan et al. 13 demonstrated that the expression of miR-17 is associated with survival time in GBM. Additionally, decreased expression level of E2F1 and cyclin D1, which represent miR-17 targets, were shown to correlate with longer patient survival. MiR-17-5p was shown to target PTEN and miR-17-3p targets MDM2 51 . MiR-17 overexpression was shown to increase the overall survival time. MiR-181d functions as tumor suppressor in GBM, and it may represent both a predictive biomarker in temozolomide therapy and a prognostic marker regulating MGMT expression in GBM 82 . Furthermore, Wang et al. 83 showed that miR-181d functions as a tumor suppressor by regulating K-ras and Bcl-2.
Additionally, several studies 49,58,[84][85][86][87] showed that miR-22 (ranked first), miR-29a (ranked eighth), miR-30a-5p (ranked ninth), miR-30c (ranked 12 th ), miR-106a (ranked 15 th ), and miR-15b (ranked 17 th ) are primarily associated with cell apoptosis and proliferation in GBM. MiR-125b (ranked 11 th ) was shown to have different functions depending on the environment, and targets E2F1 and inhibits the proliferation of CD133-positive glioma stem cells 88 . However, it was shown that miR-125b function as oncogene as well, promoting proliferation and inhibiting the apoptosis of GBM stem cells 89 . Let-7a was ranked 20 th in our analysis, and Wang et al. 90 showed that it directly targets K-ras and reduces glioma cell malignancy. Although let-7f (ranked 19 th ) was not described as GBM-related miRNA in HMDD, Yan et al. 91 showed that it is involved in the inhibition of proliferation, migration, and invasion of glioma cells.
Functional roles of the top 20 OVC candidate miRNAs in microarray dataset are described in Supplementary  Table 20. Among them, miR-21 (ranked third and 14 th ), miR-93 (ranked sixth), miR-20a (ranked 11 th ), miR-125b (ranked 13 th ), miR-16 (ranked 18 th ), and miR-27a (ranked 19 th ) are well-known OVC-related miRNAs. MiR-21 functions as an oncogene, and it is involved in tumorigenesis and tumor progression through the regulation of PTEN expression, an OVC-related tumor suppressor 70 . The suppression of miR-21 induces apoptosis of cisplatin-resistant OVC cells and the overexpression of miR-21 leads to the lowering of patient survival rate 67 . Similarly, miR-93 plays a key role in cell survival, since its suppression induces the apoptosis of OVC cells and it is involved in the determination of cisplatin chemosensitivity through the regulation of PTEN expression 92 . OVC cell survival was shown to be increased following miR-93 cell transfection 77 . Fan et al. 93 showed that the overexpression of miR-20a promotes proliferation and invasion through direct targeting of amyloid precursor protein (APP) in OVC cells. MiR-125b was reported to be a tumor suppressor, since it suppresses cell proliferation through the regulation of BCL3, a proto-oncogene 94 , while it also inhibits tumor angiogenesis by regulating HER2 and HER3 95 . Bhattacharya et al. 96 showed that miR-16 regulates BMI-1 expression and that the downregulation of BMI-1 leads to the inhibition of proliferation and clonal growth of OVC cells. MiR-27a functions as an oncogene, and it controls the expression of multi-drug resistance (MDR)1/P-glycoprotein by targeting homeodomain interacting protein kinase (HIPK)2 that acts as a tumor suppressor 97 . Multiple studies [98][99][100][101] showed that let-7b (ranked first), miR-29a (ranked fourth), miR-30c/30e* (ranked second and 17 th ) and miR-30b (ranked 20 th ) expression levels significantly differ between OVC tissues and controls. Furthermore, let-7b and miR-30c/30e* expression changes were associated with survival time, and the copy number of the mir-30b gene was shown to be increased in OVC 102 . MiR-29b/29c (ranked fifth and 16 th ) are involved in the development of cisplatin resistance 103 . Li et al. 104 showed that miR-22 (ranked seventh) inhibits cell migration and invasion, and plays a key role in OVC metastasis. MiR-17 (ranked eighth in our analysis) is involved in the regulation of OVC-related pathways, suppressing LKB1-p53-p21/WAF1 pathway, which results in the induction of OVC stem cell development 69 . Yang et al. 105 showed that miR-130a (ranked 10 th ) is involved in drug resistance mediated by DR1/P-glycoprotein in OVC cells.
Although miR-181a-2* (ranked ninth), miR-142-3p (ranked 12 th ), miR-24 (ranked 15 th ), and miR-29c (ranked 16 th ) are not indicated as OVC-related miRNAs in HMDD, Parikh et al. 106 showed that miR-181a plays a crucial role in OVC progression, by promoting TGF-β-mediated epithelial-to-mesenchymal transition through the suppression of Smad7. Furthermore, the expression of miR-142-3p was shown to be highly correlated with a set of genes, including some cancer-related genes, and a set of methylation sites in OVC 107 . MiR-24 acts as a tumor suppressor, inducing apoptosis in OVC cells 108 , while miR-29c was described previously 103 , together with miR-29b.
Let-7a (ranked fifth, sixth, and eighth), mir-200c (ranked 10 th ), and mir-25 (ranked 17 th ) are known as important OVC-related miRNAs. Let-7a-2/3/1 were reported as OVC prognosis markers. The methylation of let-7a-3 was identified in epithelial OVCs, and it affects the expressions of insulin-like growth factor 2 (IGF2) and patient survival 109 . Low let-7a expression and high LIN28B expression were correlated with poorer prognosis 110 . Two studies 111,112 showed that miR-200c acts as a regulator, reducing tumor burden and increasing the sensitivity to chemotherapy by regulating TUBB3 (class III β-tubulin gene). High miR-200c expression suppresses TUBB3 expression, which ultimately prolongs survival. Zhang et al. 113 showed that the overexpression of miR-25 induces cell proliferation by directly repressing the expressions of a pro-apoptotic protein Bim in OVC. Two studies 114,115 showed that miR-92a directly represses the expression of integrin α5, and it was shown that the increased expression of this integrin leads to the poorer survival of patients with stage III OVC. Moreover, Cai et al. 116 showed that let-7e (ranked seventh) is involved in the development of cisplatin resistance in OVC, while miR-30a/99b Figure 7. Comparison of our method with other methods. Our approach was compared with the approach based on DE miRNAs, Zhang et al. 24 , and our previous approach 25 . Each bar represents a cumulative ratio of cancer-related miRNAs in HMDD 28 and miRNAs identified by each of the presented methods for each cancer type. The number of miRNAs identified by these methods is represented on x-axis. In (A,B), GBM and OVC (microarray) datasets from TCGA were used while, in (C), PRCA dataset (GSE34933) 24 from NCBI GEO was used for comparison.
(ranked 12 th and 15 th , respectively), were shown to be differentially expressed between OVC tissues and controls 99,100 . Additionally, the functional roles of the top 20 PRCA and BRCA candidate miRNAs are described in Supplementary Tables 22 and 23.
We demonstrated here that our proposed approach outperforms the approaches based on the datasets showing differentially expressed miRNAs in the identification of cancer-related miRNAs. Therefore, our method can be widely applied to other cancer datasets and may contribute to the elucidation of cancer-related miRNA mechanisms. Furthermore, providing the ranking information, which allows the selection of a small number of highly relevant miRNAs, represent an advantage of our method.