Computing microRNA-gene interaction networks in pan-cancer using miRDriver

Bose, Banabithi; Moravec, Matthew; Bozdag, Serdar

doi:10.1038/s41598-022-07628-z

Article
Open access
Published: 08 March 2022

Scientific Reports volume 12, Article number: 3717 (2022)

Abstract

DNA copy number aberrated regions in cancer are known to harbor cancer driver genes and short non-coding RNA molecules, i.e., microRNAs. In this study, we integrated multi-omics datasets such as copy number aberration, DNA methylation, gene expression and microRNA expression. Using a LASSO-based regression approach, we identified signature microRNA-gene associations from frequently aberrated DNA regions across pan-cancer. We studied 7294 patient samples from eighteen cancer types in The Cancer Genome Atlas (TCGA) database. We identified several cancer-specific and common microRNA-gene interactions that were enriched in experimentally validated microRNA-target interactions. We highlighted several oncogenic and tumor suppressor microRNAs that were either specific to one cancer type or common to several cancer types. Our method substantially outperformed five state-of-the-art methods in selecting microRNA-gene interactions significantly enriched in known interactions in multiple cancer types. Several microRNAs and genes were associated with tumor survival and progression. Selected target genes were significantly enriched in cancer-related pathways, cancer hallmark gene sets and Gene Ontology (GO) terms. Furthermore, we discovered potential subtype-specific gene signatures in multiple cancer types.

Introduction

MicroRNAs (miRNAs) are small non-coding RNAs that modulate the expression of their target genes either by inhibiting translation or by promoting mRNA degradation¹. Several studies have found miRNAs to regulate cancer driver genes that promote tumor initiation, progression and proliferation^2,3,4.

Several state-of-the-art methods use miRNA and gene expression data to infer miRNA-gene regulatory networks. Among these, ARACNe⁵ and ProMISe⁶ use mutual information-based algorithms. HiddenICP⁷, idaFast⁸ and jointIDA⁹ instead use invariant causal relationships, i.e., direct or indirect effects of miRNAs on their targets, to infer these networks.

Several studies have found that DNA copy number aberrated regions, i.e., amplified and deleted regions, harbor cancer-driving genes^10,11 and miRNAs^12,13,14.

Several studies integrated copy number, DNA methylation and gene expression data to compute miRNA-gene regulatory networks in cancer using regression-based approaches^15,16. However, these studies mined miRNAs and target genes from all genomic locations rather than focusing on copy number aberrated regions.

In our previous study, we developed a computational pipeline called miRDriver based on the hypothesis that copy number data from cancer patient samples can be used to discover driver miRNAs of cancer¹⁷. miRDriver assumes that miRNAs located within an aberrated region regulate the expression of genes outside the aberration. These miRNAs thereby extend the effects of the aberration beyond the aberrated region and across the genome. Other factors can also influence the expression of the genes outside the aberration. Therefore, to select the regulatory miRNAs of these genes, miRDriver integrates several inputs along with the miRNAs: the DNA methylation and copy number aberration (CNA) of these genes, transcription factors (TFs), and the expression of the genes located inside an aberration¹⁷. First, we computed frequently aberrated chromosomal copy number regions, namely, GISTIC regions, among tumor patient samples (see Materials and Methods). Then, for each GISTIC region, we computed differentially expressed (DE) genes between the tumor samples with the aberration and the samples without it. We divided these DE genes into trans genes (genes outside the aberrated region) and cis genes (genes inside the aberrated region). Finally, we applied a LASSO-based¹⁸ regression model to select the miRNAs regulating the expression of the DE genes (Fig. 1).

miRDriver outperformed ARACNe, ProMISe, HiddenICP, ICP-PAM50, idaFast and jointIDA in retrieving miRNA-gene interactions that were significantly enriched in known miRNA-gene interactions. miRDriver discovered several potentially novel interactions in multiple cancer types. Several oncogenic and tumor suppressor miRNAs and genes were enriched in the computed miRNA-gene networks. Several miRNAs were associated with patients' survival and disease progression. Selected target genes were significantly enriched in cancer-related biological pathways and GO terms¹⁹. Furthermore, subtype-specific gene signatures were discovered in multiple cancer types.

In our previous publication, we demonstrated miRDriver’s statistical robustness by applying it to two cancer types. The current study makes several unique contributions. First, we present miRDriver as an R software package that offers various options for running our workflow. Second, we demonstrate its application and biological importance by running miRDriver on eighteen different cancer types. We present extensive results on these cancer types that were not part of our prior publication, together with pan-cancer-wide findings and their relevance to cancer. Third, we provide a resource of pan-cancer miRNA-gene interactions that will be useful to biologists, clinicians and scientists working on cancer research.

Results

In this study, we integrated CNA, DNA methylation, TF-gene interactions, gene, and miRNA expression datasets in the miRDriver tool to compute miRNA-gene interactions based on DNA copy number aberrated regions in eighteen different cancer types from TCGA. Table 1 shows the cohort sizes for each data modality, the number of all GISTIC regions, the count of trans genes in the LASSO step, and the computed miRNA-gene interactions in eighteen different cancer types.

Table 1 TCGA cancer types in the study with cohort sizes in different data modalities and results of miRDriver.

Computed miRNAs were significantly enriched in the experimentally-validated oncogenic miRNAs

We performed a two-sided Fisher's exact test to check the association between the cancer-related miRNAs in OncomiRDB (see Materials and Methods) and the miRNAs computed by miRDriver. For each cancer type, the background set of the test consisted of all TCGA miRNAs used in the LASSO step (see Materials and Methods) for that cancer type. In all cancer types, the computed miRNAs were significantly enriched (Fisher's exact test p-value < 0.05) in the oncogenic miRNAs in OncomiRDB (Table 1).
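The enrichment test above amounts to a 2×2 table. It crosses computed vs. not computed miRNAs with OncomiRDB vs. non-OncomiRDB miRNAs, inside the background of LASSO-step miRNAs. The minimal Python sketch below shows one way to build that table from sets and compute a two-sided Fisher's exact p-value with only the standard library. The function names are hypothetical, and this is not the code used in the study, which was run in R.

```python
from math import comb


def contingency_table(computed, known, background):
    """2x2 counts: (computed & known, computed only, known only, neither)."""
    computed, known = computed & background, known & background
    a = len(computed & known)
    b = len(computed - known)
    c = len(known - computed)
    d = len(background) - a - b - c
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables with
    the same margins that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))
```

For example, with eight background miRNAs, four computed and four in OncomiRDB, an overlap of three gives the table (3, 1, 1, 3) and a two-sided p-value of 34/70.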

Computed miRNA-gene interactions were enriched in the known miRNA-gene interactions

To check whether the miRNA-gene interactions computed by miRDriver were significantly enriched in known miRNA-gene interactions, we performed a hypergeometric test on each miRNA's computed target genes in each cancer type. We considered only those miRNAs for which at least one computed target was a known target in the ground truth data (i.e., known miRNA-gene interactions; see Materials and Methods). We labeled these as "Eligible miRNAs" for the hypergeometric test. The background set, i.e., the universe of the hypergeometric test, was the set of all trans genes (as HGNC symbols²⁰) that were also present in the ground truth data. In fourteen cancer types, at least 50% of the "Eligible miRNAs" had significant enrichment (p-value < 0.05) (Table 2). The complete list of computed miRNAs with their individual hypergeometric p-values for all eighteen cancer types is available in Supplemental Table S1.
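The per-miRNA test above can be sketched as follows. The universe is the set of trans genes present in the ground truth, the "successes" are a miRNA's known targets, and the "draws" are its computed targets. The sketch assumes the usual one-sided over-representation p-value P(X ≥ overlap). It skips miRNAs with no overlap, mirroring the "Eligible miRNAs" rule. All names are illustrative and not part of miRDriver.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / comb(N, n)


def target_enrichment(computed, known, universe):
    """computed, known: dict miRNA -> set of target genes.
    universe: trans genes that also appear in the ground truth data.
    Returns a hypergeometric p-value for every "Eligible miRNA"."""
    results = {}
    for mirna, targets in computed.items():
        drawn = targets & universe
        successes = known.get(mirna, set()) & universe
        overlap = len(drawn & successes)
        if overlap == 0:
            continue  # no known target among the computed ones: not eligible
        results[mirna] = hypergeom_upper_tail(
            overlap, len(universe), len(successes), len(drawn))
    return results
```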

Table 2 Target enrichment.

miRDriver outperformed five state-of-the-art methods in inferring significant miRNA-gene interactions

We compared miRDriver with five state-of-the-art methods, namely, ARACNe, ProMISe, HiddenICP, idaFast and jointIDA, by running them on eighteen different cancer types from TCGA. For all these methods, we used gene expression data to compute miRNA-gene interaction networks for the comparison (see Materials and Methods). We performed the hypergeometric test to measure the enrichment of each miRNA's computed targets in the known miRNA-gene interaction data. We considered only "Eligible miRNAs" (i.e., miRNAs with at least one known target in the ground truth data) for this test. For miRDriver and each comparable method, we computed the overlapping "Eligible miRNAs". Within this overlap, we counted the "Significant miRNAs" (i.e., miRNAs with target enrichment test p-value < 0.05) of each method. If miRDriver had more of them than the other method, miRDriver won; if it had fewer, miRDriver lost; if the counts were equal, there was a draw. For ACC, LUSC and THCA, miRDriver and the other methods had no common "Eligible miRNAs"; hence, we excluded these three cancer types from this comparison. miRDriver had more "Significant miRNAs" than all other methods in most cancer types. Table 3 summarizes the comparison results for all cancer types. Table 4 details the results for ovarian cancer (OV), with the number of "Eligible miRNAs" and "Significant miRNAs" for all methods. For a detailed comparison for all cancer types, see Supplemental Table S2.

We also compared miRDriver with Cupid²¹, a sequence-based competing endogenous RNA (ceRNA) prediction tool, on BRCA. miRDriver outperformed Cupid as well. Cupid predicts miRNAs that are also predicted to "mediate" ceRNA interactions. For TCGA BRCA, the authors of Cupid predicted 299K candidate miRNA–target interactions. We restricted this list to the 6504 input genes and 255 miRNAs that we used as miRDriver inputs for BRCA. To obtain highly confident interactions for our comparison, we considered the top 2437 (top 1 percentile) miRNA-gene interactions based on the scores reported by Cupid. Within the overlap, the count of "Significant miRNAs" was higher for miRDriver than for Cupid (see Supplemental Table S2).
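The win/loss/draw rule above can be written down directly. The hypothetical sketch below takes, for each method, a dict of "Eligible miRNA" -> enrichment p-value. It restricts both to their shared miRNAs and compares the number of "Significant miRNAs" (p < 0.05). It returns `None` when there is no overlap, the situation that led to excluding ACC, LUSC and THCA.

```python
def compare_methods(pvals_mirdriver, pvals_other, alpha=0.05):
    """Return "win", "loss" or "draw" for miRDriver, or None if no overlap.

    Both arguments map "Eligible miRNAs" to their target-enrichment p-values.
    """
    shared = pvals_mirdriver.keys() & pvals_other.keys()
    if not shared:
        return None  # no common eligible miRNAs: cancer type is skipped
    sig_mirdriver = sum(pvals_mirdriver[m] < alpha for m in shared)
    sig_other = sum(pvals_other[m] < alpha for m in shared)
    if sig_mirdriver > sig_other:
        return "win"
    if sig_mirdriver < sig_other:
        return "loss"
    return "draw"
```

Counting only within the overlap keeps the comparison fair when one method reports many more miRNAs than the other.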

Table 3 Comparison of miRDriver with other methods. We computed the overlapping miRNAs computed by miRDriver and each comparable method.

Table 4 Comparison results of miRDriver with five other methods in ovarian cancer.

Computed genes were enriched in biological pathways, cancer hallmark and GO terms

To evaluate the functional roles of the target genes computed by miRDriver for each cancer type, we checked whether these genes were enriched in biological pathways and GO terms¹⁹. For this purpose, we performed pathway enrichment analysis with the pathways in the REACTOME²² and KEGG²³ databases. We used the R package Pathfinder²⁴ for REACTOME pathway enrichment. We used the R package clusterProfiler²⁷ for KEGG pathway, MSigDB^25,26 hallmark gene set and GO enrichment. We selected the pathways and GO terms with significant enrichment (multiple-testing-corrected, i.e., adjusted p-value < 0.05). We found 213 unique REACTOME pathways spanning seventeen cancer types, twelve unique KEGG pathways in twelve cancer types and 224 unique enriched GO terms spanning fifteen cancer types. Table 5 shows the enriched pathways and GO terms that were common to multiple cancer types. The entire list of enriched pathways and GO terms for all cancer types is provided in Supplemental Table S3. Among these pathways, "Immune System"-related pathways have been reported to play essential roles in cancer^28,29. The G protein-coupled receptor (GPCR)-related REACTOME pathways, such as "Signaling by GPCR", "GPCR ligand binding" and "GPCR downstream signalling", have been implicated in several cancer-related studies. In our study, they were enriched in the computed target genes in more than ten cancer types. These pathways were found to play crucial roles in tumor development, invasion, migration, survival and metastasis^30,31. The GO terms "receptor ligand activity" and "receptor regulator activity" were enriched in at least five cancer types. Several cancer studies have highlighted these terms for their roles in drug toxicity, cell function and tumor growth^32,33,34. The computed target genes in each cancer type were also enriched in the cancer hallmark gene set (Table 6).
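The "adjusted p-value < 0.05" filter above depends on a multiple-testing correction. The paragraph does not name the procedure. The sketch below assumes the Benjamini–Hochberg (BH) false discovery rate adjustment, which is clusterProfiler's default `pAdjustMethod`. It re-implements BH in plain Python for illustration, and `enriched_terms` is a hypothetical helper.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(term_pvals, alpha=0.05):
    """Names of pathways/GO terms whose BH-adjusted p-value is below alpha."""
    names = list(term_pvals)
    adjusted = benjamini_hochberg([term_pvals[t] for t in names])
    return [t for t, q in zip(names, adjusted) if q < alpha]
```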

Table 5 Enriched pathways and GO terms in pan-cancer.

Table 6 Enriched cancer hallmark terms in pan-cancer for computed target genes.

Furthermore, miRDriver identified 22 common miRNAs, each shared by at least eight of the eighteen cancer types in the study (Table 7). The targets of these miRNAs could regulate common biological processes in cancer. Hence, we performed a GO enrichment test with the 1161 computed genes targeted by at least one of these 22 miRNAs across the eighteen cancer types and found 49 significantly enriched GO terms. Table 8 shows a few of these GO terms with their cancer-related citations; the entire list is provided in Supplemental Table S4.

Table 7 Twenty-two common miRNAs computed by miRDriver in multiple cancer types.

Table 8 Enriched GO terms with the cancer-related citations in the targets of the common miRNAs in Table 7.

Although there were common miRNAs across multiple cancer types, there were few common miRNA-gene interactions. This is because the number of trans genes was much higher than the number of miRNAs in this pan-cancer analysis. Table 9 presents fourteen miRNA-gene interactions shared by two cancer types among the 11,548 interactions selected across pan-cancer. Among these, the interaction between RSPO3 and miR-22 was selected in LAML (leukemia) and LUAD (lung cancer). Interestingly, RSPO3 was found to play a role in leukemia³⁵ and to promote tumors in lung cancer³⁶. miR-22 was found to play an anti-tumor role with therapeutic potential in acute myeloid leukemia³⁷ and to have roles in lung cancer via CNAs³⁸. Another interaction, between PAX5 and miR-5699, was found in BLCA (bladder cancer) and OV (ovarian cancer). PAX5 was found to have a role in bladder cancer³⁹, and in ovarian cancer⁴⁰ as a co-regulator of PAX8. miR-5699 was reported to have a role in the oxidative response to ovarian cancer treatment⁴¹. Table 9 also contains some miRNA-long noncoding RNA (lncRNA) interactions. lncRNAs are known to harbor binding sites for miRNAs and can also be direct or indirect targets of miRNAs^42,43. Several lncRNAs were found to be prevalent in cancer⁴⁴. In our case, the interaction between LINC01833 and miR-1226 was found in BRCA (breast cancer) and LGG (brain cancer). LINC01833 was listed among the top five lncRNAs in a prioritization of variation in ER-negative-associated lncRNAs in breast cancer⁴⁵. miR-1226 was found to regulate the expression of the mucin 1 oncoprotein and induce cell death in a breast cancer study⁴⁶.

Table 9 miRNA-gene interactions computed by miRDriver in multiple cancer types. The Cancer type column shows the cancer types in which each interaction is present.

Several cancer-related terms and pathways were enriched in the targets of the computed miRNAs

We checked the involvement of the computed miRNAs in cancer-related pathways. For this analysis, we collected all 556 miRNAs that were computed by miRDriver in at least one cancer type. For each of these miRNAs, we collected its computed target genes from all cancer types in which that miRNA was present. We then performed cancer hallmark gene set enrichment with these target genes. We found 38 unique enriched cancer hallmark terms (adjusted p-value < 0.05) for 134 miRNAs (Supplemental Table S5).

We also performed REACTOME pathway enrichment analysis with the collected target genes of each miRNA. We found 240 unique enriched REACTOME pathways (adjusted p-value < 0.05) for 69 miRNAs (Supplemental Table S5). Eleven of these enriched pathways were present among nineteen experimentally validated cancer-related pathways for miRNAs⁵⁷. These include "Epithelial-Mesenchymal Transition", "Hypoxia", "Inflammatory Response", "KRAS Signaling Up", "p53 Pathway", "PI3K AKT MTOR Signaling", "Xenobiotic Metabolism", "Apoptosis", "DNA Repair" and "Immune".

Furthermore, we performed an analysis to find cancer-driving miRNAs (i.e., tumor suppressors, oncogenes or both) using the enriched cancer hallmark terms (Supplemental Table S5). We hypothesized that a miRNA could be a candidate cancer-driving miRNA if its target genes that were enriched in cancer hallmark terms were also enriched in known cancer-driving genes. Hence, for each enriched cancer hallmark term, we gathered all the miRNAs for which that term was enriched, together with their target genes (Table 10). We downloaded a list of 83 cancer-driving genes that are frequently mutated in different cancer types. The list came from the Cancer Gene Census project of the Catalogue Of Somatic Mutations In Cancer (COSMIC) database⁵⁸. For each cancer hallmark term, we performed a hypergeometric test for the overlap between the gathered target genes and the 83 cancer-driving genes. The background gene set for this test consisted of all 5604 target genes computed by miRDriver in pan-cancer. We considered the miRNAs of the terms with hypergeometric p-value < 0.05 as candidate cancer-driving miRNAs, since their targets were enriched in known cancer-driving genes. Up- or down-regulation of a miRNA causes the inverse regulation of its target genes^59,60,61. We therefore specifically examined the target genes of these candidate miRNAs that had a negative LASSO regression coefficient computed by miRDriver in different cancer types (Table 11). Interestingly, all target genes in this group (Table 11) except OLIG2 were reported as oncogenes in previous studies^62,63,64,65,66,67,68. OLIG2 was found to work as a tumor-suppressor gene (TSG) in human glioblastoma⁶⁹. All the miRNAs except miR-5001 and miR-2276 were found to act as tumor suppressors in cancer in several studies^70,71,72,73,74. miR-5001 and miR-2276 showed evidence of working as oncogenes in endometrial cancer and colorectal cancer, respectively^75,76. These studies support miRDriver's findings of inversely related miRNA-gene pairs that may act as TSG-oncogene driver pairs in different cancer types.
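The two filters above can be sketched in Python. The first step is a hypergeometric test of each hallmark term's gathered targets against the driver gene list, within the pan-cancer background of computed targets. The second step keeps only candidate miRNA-gene pairs with a negative LASSO coefficient. The data layouts (dicts of sets, `(miRNA, gene, coefficient)` tuples) and all function names are assumptions for illustration, not miRDriver's format. Driver genes are intersected with the background before testing, which is also an assumption.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / comb(N, n)


def candidate_driver_mirnas(term_targets, term_mirnas, driver_genes, background, alpha=0.05):
    """term_targets: hallmark term -> target genes gathered for that term.
    term_mirnas: hallmark term -> miRNAs for which the term was enriched."""
    drivers = driver_genes & background
    candidates = set()
    for term, targets in term_targets.items():
        targets = targets & background
        overlap = len(targets & drivers)
        if overlap == 0:
            continue
        p = hypergeom_upper_tail(overlap, len(background), len(drivers), len(targets))
        if p < alpha:
            candidates |= term_mirnas[term]
    return candidates


def inverse_pairs(lasso_pairs, candidates):
    """Keep (miRNA, gene, coefficient) pairs of candidates with a negative coefficient."""
    return [(m, g, c) for m, g, c in lasso_pairs if m in candidates and c < 0]
```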

Table 10 Hallmark term-related target enrichment in cancer driver genes.

Table 11 miRNA targets with negative LASSO coefficient in different cancer types.

Computed target genes revealed the subtype-specific expression signature in multiple cancer types

We checked the subtype-specific association of the expression of computed target genes in BRCA, LGG, LUSC and PAAD. We used the R package TCGAbiolinks⁷⁷ to download the subtype labels for these cancer types. TPM (transcripts per million) values are normalized and comparable across samples. We therefore used RNA-Seq data in TPM for the TCGA samples whose subtype labels were available, taking log2(TPM + 1)-transformed values from the Cancer Dependency Map [https://depmap.org]. For each of these cancer types, we performed unsupervised clustering using the expression of these target genes. We compared the resulting clusters with the baseline (i.e., known) subtype clusters using the Rand Index (RI) and Uniform Manifold Approximation and Projection (UMAP)⁷⁸ plots.

For BRCA, we computed a UMAP plot using around 1000 BRCA samples and 106 high-degree genes (i.e., computed genes targeted by more than three miRNAs). We used it to examine the PAM50 gene-based subtypes⁷⁹, namely, Basal-like (BL), HER2-enriched (HER2+), LuminalA (LA), LuminalB (LB) and Normal-like (NL) (Fig. 2A). We also computed the UMAP plot using the PAM50 genes with the PAM50 gene-based subtypes (Fig. 2B). Both UMAP plots show a clear separation between the subtype-specific clusters. We also performed unsupervised k-means clustering on the BRCA cohort with the high-degree target genes (Fig. 2C) and with the PAM50 genes (Fig. 2D). For this clustering, we used the kmeans function of the R base package stats with k = 5 and default values for all other parameters. The RIs between the five known subtype labels and the five predicted clusters were 0.74 for the high-degree target genes and 0.82 for the PAM50 genes. This result shows that both the computed high-degree target genes and the PAM50 gene set were able to detect subtype structure in BRCA samples with high accuracy.
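The Rand Index used above is the fraction of sample pairs on which two labelings agree. A pair agrees when both labelings put the two samples in the same group, or both put them in different groups. The plain-Python sketch below computes it from a known-subtype vector and a cluster-label vector such as k-means output. It does not reproduce the k-means step itself, and the function name is illustrative.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two labelings agree
    (both 'same cluster' or both 'different cluster')."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have equal length")
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The index only compares co-membership, so cluster IDs need not match subtype names. For example, `rand_index(["LA", "LA", "BL"], [2, 2, 0])` is 1.0.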

Furthermore, we used the high-degree genes to classify the BRCA cohort into the five subtypes. For this purpose, we used the R package keras⁸⁰ (https://github.com/rstudio/keras) implementation of the Random Forest classifier. We trained the classifier on 80% of the samples with 10-fold cross-validation and held out the remaining 20% of the data to test the performance of the model. We achieved a high classification accuracy of 0.86. Classifying the same cohort with the PAM50 genes achieved an accuracy of 0.89. Figure 2E,F present the confusion matrices for both cases with F1 scores. The F1 scores for the classification with high-degree target genes were comparable to those of the PAM50-based classification. This suggests that these high-degree target genes can serve as potential markers for PAM50-based subtype signatures in BRCA.

For the other cancer types, we computed UMAP plots to check the baseline subtype clusters with the computed target genes. For LUSC and PAAD, fewer genes were targeted by more than three miRNAs, so we defined high-degree genes as the genes targeted by more than two miRNAs. For LGG, we used 402 samples with all 151 computed target genes, since no gene was targeted by multiple miRNAs (Fig. 2G). For LUSC, we used 178 patient samples with 75 high-degree target genes (Fig. 2H). For PAAD, we used 150 patient samples with 101 selected high-degree target genes (Fig. 2I). We also performed k-means clustering for all these cancer types. For LGG, LUSC and PAAD, the RIs between the known subtype clusters and the predicted clusters were 0.71, 0.62 and 0.70, respectively. For LGG and PAAD, which had higher RI values, the UMAP plots showed clear separation among the known subtype-specific clusters. For LUSC, although the RI value was lower, the "Basal" cluster was separated from the other clusters (Fig. 2H). These results show that the computed high-degree target genes could reveal subtype-specific expression signatures in multiple cancer types.

Computed miRNAs were found to be potential biomarkers for patients' survival and progression of the disease in each cancer type

We performed survival analysis with the computed miRNAs to assess their prognostic relevance as clinical biomarkers for patients' survival (Fig. 3). For each miRNA, we divided the patient cohort of each cancer type into two groups, namely, high-expression and low-expression groups for that miRNA. We considered the available clinical variables among age, race, gender, stage and grade as independent variables (see Materials and Methods). To adjust for the confounding effects of these factors, we used the Adjusted Kaplan–Meier Estimator. We computed adjusted survival curves by weighting the individual contributions with inverse probability weighting (IPW), using the R package IPWsurvival⁸². We considered four survival endpoints, namely, Overall Survival (OS), Progression Free Interval (PFI), Disease Specific Survival (DSS) and Disease Free Interval (DFI) (see Materials and Methods). Based on the Adjusted Kaplan–Meier survival plots, we found several prognostic miRNAs (adjusted log-rank test p-value < 0.05) in multiple cancer types. Figure 3 shows the survival plots for the common miRNAs in different cancer types. Among the 22 common miRNAs (Table 7), eighteen had significant survival differences between the high- and low-expression patient groups in at least one cancer type (Fig. 3). The survival plots for all miRNAs in the eighteen cancer types are provided in Supplemental Figures S1–S18.
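In an IPW-adjusted Kaplan-Meier curve, each patient contributes to the at-risk set and to the event counts with a weight instead of a count of one. For two groups, the usual weight is 1/p for patients in the high-expression group and 1/(1 - p) for the low-expression group. Here p is the patient's estimated probability of being in the high group given the covariates. The sketch below assumes these propensities were already estimated elsewhere (e.g., by a logistic regression on age, stage and the other covariates). It does not reproduce IPWsurvival's API or its adjusted log-rank test.

```python
def ipw_weights(in_high_group, propensity):
    """Inverse probability weights: 1/p for the high group, 1/(1-p) for the low group.

    propensity[i] is the (assumed, externally estimated) probability that
    patient i belongs to the high-expression group given their covariates.
    """
    return [1.0 / p if high else 1.0 / (1.0 - p)
            for high, p in zip(in_high_group, propensity)]


def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier curve as a list of (event time, survival)."""
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        died = removed = 0.0
        # Process all patients sharing the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            died += w if event else 0.0
            removed += w
            i += 1
        if died > 0:
            survival *= 1.0 - died / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients leave the risk set
    return curve
```

In practice, the curve would be computed separately for the high- and low-expression groups, each with its own weights. With all weights equal to 1 it reduces to the ordinary Kaplan-Meier estimator.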

miRDriver discovered several cancer-specific miRNAs

In this study, miRDriver discovered 240 cancer-specific miRNAs, i.e., miRNAs that were selected in only one cancer type. We used the R package OncoScore⁸³ to measure the association of these miRNAs with cancer based on their citation frequencies in cancer-related biomedical literature. About half of these miRNAs (121) were cited in cancer-related studies (Supplemental Table S6). Moreover, several of these miRNAs were prognostic, i.e., associated with patients' survival based on Adjusted Kaplan–Meier survival analysis (adjusted log-rank test p-value < 0.05) (Table 12).

Table 12 Cancer-specific miRDriver miRNAs with citation frequency.

The copy number changes of the computed miRNAs were predictive of their expression

For each cancer type, we computed the Spearman correlation between the copy number and the expression of each miRNA computed by miRDriver across all samples (Supplemental Figure S19). As expected, most miRNAs had a positive correlation between their copy number and expression. Some correlations were negative. This is not surprising, as miRNA expression also depends on regulatory factors beyond copy number events. Nevertheless, the positive medians of the correlation distributions across all cancer types support our hypothesis. That hypothesis is that the expression of miRNAs in copy number aberrated regions may be predictive of DE trans gene expression variation.
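The per-miRNA statistic above is a Spearman correlation, i.e., a Pearson correlation computed on ranks, with tied values given their average rank. The minimal standard-library sketch below computes it for one miRNA's copy number and expression vectors. It also takes the median across miRNAs, the summary used to judge whether a cancer type's correlations lean positive. The dict-of-lists input layout is an assumption for illustration.

```python
import math
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def median_cn_expression_correlation(copy_number, expression):
    """copy_number, expression: dict miRNA -> per-sample values (same sample order)."""
    return median(spearman(copy_number[m], expression[m]) for m in copy_number)
```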

Selected high-degree genes showed stronger potential than low-degree genes as biomarkers to predict prognosis in cancer patients in several cancer types

To obtain optimized lists of high-degree and low-degree target genes, we defined high-degree target genes as genes targeted by four or more miRNAs and low-degree target genes as genes targeted by only one miRNA. We computed the hazard ratio (HR) of each of these genes using multivariate Cox regression analysis⁸⁴. Because the high-degree group contained few genes, we computed the effect size as the r-value of the Mann–Whitney test on |ln(HR)|. A higher |ln(HR)| implies a stronger association between an increase or decrease in gene expression and the risk of an event. The r-value was negative if the |ln(HR)| values in the high-degree group were higher than those in the low-degree group and positive otherwise. We used OS, PFI, DSS and DFI as clinical endpoints in this analysis. We omitted the cancer types with no high-degree target gene (THCA and PRAD) and with no clinical endpoint (LAML), leaving fifteen cancer types. Our previous work¹⁷ had already discussed the significance of high-degree target genes in BRCA and OV. We therefore omitted these two cancer types as well, leaving thirteen cancer types for this analysis. The Wilcoxon rank-sum test p-values for the comparison between the two groups were not significant (p-value > 0.05). Nevertheless, we found negative r-values in most cancer types (see Fig. 4). The hazard ratio boxplots of all thirteen cancer types with r-values for the different clinical endpoints are provided in Supplemental Figures S20–S23. Table 13 shows the high-degree target genes for OS in the seven cancer types with negative r-values. According to OncoScore, a high percentage (≥ 50%) of the total biomedical literature citations of these genes were in cancer-related work. The entire list of high-degree genes with OncoScore frequencies is provided in Supplemental Table S7.
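A common definition of the Mann–Whitney effect size is r = Z / sqrt(N). Here Z is the normal approximation of the U statistic and N is the total number of observations. The sketch below applies it to |ln(HR)| values and follows the sign convention stated above: r is negative when the high-degree group tends to have larger |ln(HR)|. It assumes the HRs come from already-fitted Cox models and omits the tie correction for the variance of U. The function name is hypothetical.

```python
import math


def mann_whitney_r(high_hr, low_hr):
    """Effect size r = Z / sqrt(N) from a Mann-Whitney test on |ln(HR)|.

    Negative when the high-degree genes tend to have larger |ln(HR)|
    than the low-degree genes, positive otherwise.
    """
    high = [abs(math.log(hr)) for hr in high_hr]
    low = [abs(math.log(hr)) for hr in low_hr]
    n_low, n_high = len(low), len(high)
    # U statistic of the low-degree group: pairs it "wins" (ties count 1/2).
    u_low = sum((l > h) + 0.5 * (l == h) for l in low for h in high)
    mean_u = n_low * n_high / 2
    sd_u = math.sqrt(n_low * n_high * (n_low + n_high + 1) / 12)  # no tie correction
    z = (u_low - mean_u) / sd_u
    return z / math.sqrt(n_low + n_high)
```

For instance, high-degree HRs of 3.0, 0.2 and 4.0 against low-degree HRs near 1 give r ≈ -0.80.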

Table 13 Cancer types with negative r-values from the ^aMann-Whitney test between low-degree and high-degree gene groups; ^bHighly cited high-degree genes in these cancer types in cancer-related literature.

Materials and methods

All the experiments were conducted in accordance with relevant guidelines and regulations.

Running miRDriver on pan-cancer

In this study, we conducted a pan-cancer analysis in which we applied the miRDriver R package to identify copy-number-derived miRNA-gene interactions. We integrated gene expression, CNA, DNA methylation, TF-gene interactions and miRNA expression data from eighteen different cancer types (Table 1). miRDriver has four computational steps: the GISTIC Step, DE Step, REGULATOR Step and LASSO Step. In the following paragraphs, we describe the miRDriver R functions used to run these steps. The entire pipeline of miRDriver running on pan-cancer is illustrated in Fig. 1.

To mine miRNAs that reside in the aberrated chromosomal regions of cancer patients, the first step (i.e., the GISTIC Step) computed frequently aberrated chromosomal regions, namely, GISTIC regions, for the eighteen cancer cohorts. We used the segmented chromosomal copy number profiles of each cancer cohort as inputs to the GISTIC 2.0⁸⁵ tool on the GenePattern⁸⁶ web server. With a confidence level of 0.90, we computed the chromosomal regions that were frequently aberrated within each patient cohort. GISTIC regions with a log2 ratio above 0.1 were considered amplified, and GISTIC regions with a log2 ratio below −0.1 were considered deleted. We further processed the GISTIC regions of each cancer type using the getRegionWiseGistic function in the miRDriver R package. This function gathered the patients in each region with their aberration status (i.e., aberrated or non-aberrated).
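The per-region grouping described above can be illustrated with a small sketch. It applies the ±0.1 log2-ratio thresholds at the patient level. A patient counts as aberrated for an amplified (or deleted) GISTIC region if any of their copy number segments overlapping the region is above 0.1 (or below −0.1). This patient-level rule, the dict and tuple layouts, and all function names are assumptions for illustration. They are not taken from getRegionWiseGistic.

```python
# Hypothetical sketch of per-region patient grouping; not the miRDriver implementation.
AMP_THRESHOLD = 0.1    # log2 ratio above which a segment counts as amplified
DEL_THRESHOLD = -0.1   # log2 ratio below which a segment counts as deleted


def overlaps(seg_chrom, seg_start, seg_end, region):
    """True if a copy number segment overlaps the GISTIC region."""
    return (seg_chrom == region["chrom"]
            and seg_start <= region["end"]
            and seg_end >= region["start"])


def is_aberrated(segments, region):
    """segments: list of (chrom, start, end, log2_ratio) for one patient."""
    for chrom, start, end, ratio in segments:
        if not overlaps(chrom, start, end, region):
            continue
        if region["kind"] == "amp" and ratio > AMP_THRESHOLD:
            return True
        if region["kind"] == "del" and ratio < DEL_THRESHOLD:
            return True
    return False


def group_patients(region, patients):
    """Split patients (id -> segments) into aberrated and non-aberrated groups."""
    groups = {"aberrated": [], "non-aberrated": []}
    for patient_id, segments in patients.items():
        key = "aberrated" if is_aberrated(segments, region) else "non-aberrated"
        groups[key].append(patient_id)
    return groups
```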

In the second step (i.e., the DE Step), we computed DE genes for each GISTIC region. We computed these DE genes between the aberrated and non-aberrated patient sample groups in each cancer cohort using the getDifferentiallyExpressedGenes function in miRDriver with default parameters. This function employs the edgeR⁸⁷ package in R on mRNA raw counts. It computes DE genes between the two groups using absolute log fold change (logFC) ≥ 1 and adjusted p-value < 0.05. Using the makingCisAndTransGenes function, we annotated DE genes located inside the GISTIC region as cis genes and DE genes outside the GISTIC region as trans genes. This step also retrieved the miRNAs located in each GISTIC region (i.e., cis miRNAs). The number of cis miRNAs per GISTIC region was extremely low. Therefore, to avoid reducing the sensitivity and precision of our findings, we did not further filter cis miRNAs based on differential expression. The counts of trans genes, cis genes and cis miRNAs for each GISTIC region in the eighteen cancer types are available in Supplemental Table S8.
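The DE filter and cis/trans annotation described above can be sketched as follows. The sketch starts from already computed (gene, logFC, adjusted p-value) rows, such as edgeR output, which it does not reproduce. It keeps genes with |logFC| ≥ 1 and adjusted p < 0.05 and labels a gene cis if its coordinates fall within the region. Treating "inside" as full containment, the coordinate layout and the function names are all assumptions, not the makingCisAndTransGenes implementation.

```python
def inside(coords, region):
    """True if (chrom, start, end) lies entirely within the region (assumed rule)."""
    chrom, start, end = coords
    return chrom == region["chrom"] and start >= region["start"] and end <= region["end"]


def split_cis_trans(de_results, gene_coords, region, min_abs_logfc=1.0, max_padj=0.05):
    """de_results: iterable of (gene, logFC, adjusted p-value) for one GISTIC region."""
    cis, trans = [], []
    for gene, logfc, padj in de_results:
        if abs(logfc) < min_abs_logfc or padj >= max_padj:
            continue  # not differentially expressed
        (cis if inside(gene_coords[gene], region) else trans).append(gene)
    return cis, trans


def cis_mirnas(mirna_coords, region):
    """miRNAs located in the GISTIC region (no DE filtering, as in the text)."""
    return [m for m, coords in mirna_coords.items() if inside(coords, region)]
```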

In the REGULATOR Step (i.e., the third step) of miRDriver, we collected all the potential predictors, namely, cis genes, cis miRNAs, gene-centric copy number data, gene-centric DNA methylation beta values and TFs in each cancer type that could influence each DE trans gene's expression. We used the getTransGenePredictorFile function to gather all the predictors. This function only considered those trans genes that had at least one cis miRNA as a possible predictor.
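The REGULATOR Step above can be sketched as follows. For every DE trans gene, the sketch pools the cis miRNAs and cis genes of all regions in which it was DE, adds its known TFs, and records its own gene-centric copy number and methylation features. Regions without cis miRNAs are skipped, so trans genes without any cis miRNA predictor drop out, as the text describes. The data layout and the feature names are hypothetical; this does not reproduce getTransGenePredictorFile.

```python
def trans_gene_predictors(regions, tf_regulators):
    """Collect candidate predictors for every DE trans gene.

    regions: list of dicts with keys "cis_genes", "cis_mirnas", "trans_genes".
    tf_regulators: dict mapping a gene to the TFs assumed to regulate it.
    """
    predictors = {}
    for region in regions:
        if not region["cis_mirnas"]:
            continue  # only trans genes with at least one cis miRNA are kept
        for gene in region["trans_genes"]:
            entry = predictors.setdefault(gene, {
                "cis_mirnas": set(),
                "cis_genes": set(),
                "tfs": set(tf_regulators.get(gene, ())),
                # the gene's own gene-centric copy number and DNA methylation
                "own": {f"{gene}_copy_number", f"{gene}_methylation"},
            })
            entry["cis_mirnas"].update(region["cis_mirnas"])
            entry["cis_genes"].update(region["cis_genes"])
    return predictors
```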

In the LASSO Step, we computed the potential cis miRNAs that regulate the expression variation of the DE trans genes. We used the lassoParallelForTransGene function in the miRDriver R package, which uses the R package glmnet⁸⁸ to perform LASSO and compute miRNA regulators of the DE trans genes. This function considered the gene-centric copy number, gene-centric DNA methylation, TFs and miRNA expression as independent variables and the trans gene's expression as the response variable. For each trans gene, LASSO selected a set of predictors with non-zero coefficients out of all its candidate predictors (i.e., independent variables). The independent variables selected by LASSO have been shown to be inconsistent, especially when the sample size gets large⁸⁹. We therefore ran LASSO 100 times for each trans gene and kept the cis miRNAs selected by LASSO at least 70 times. A threshold of 70 yielded the most consistent set of potential regulator miRNAs for the computed miRNA-gene interaction networks in each cancer cohort (Supplemental Fig. S24). To optimize the regularization parameter λ of LASSO, in each of the 100 runs we applied 10-fold cross-validation. We then picked the λ that provided the simplest model with the minimum cross-validation error.
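The repeated-LASSO selection above can be sketched in Python. The study uses glmnet in R; the block below swaps in a plain coordinate-descent LASSO written with NumPy, so glmnet's exact defaults (λ path, standardization and intercept handling, convergence rule) are not reproduced. In each run, a random 10-fold split picks the λ with minimum mean CV error, taking the largest λ on ties. The model is then refit on all samples, and every predictor with a non-zero coefficient is counted. Only cis miRNAs selected in at least `min_count` runs are returned. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator for the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, beta=None, n_iter=500, tol=1e-7):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (X, y centred)."""
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            step = beta[j] - old
            if step != 0.0:
                resid -= X[:, j] * step
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def cv_lambda_min(X, y, lambdas, rng, n_folds=10):
    """Lambda with minimum mean CV error; random folds make runs differ."""
    folds = rng.permutation(len(y)) % n_folds
    errors = np.zeros((n_folds, len(lambdas)))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        x_mu, y_mu = X[train].mean(axis=0), y[train].mean()
        Xtr, ytr = X[train] - x_mu, y[train] - y_mu
        Xte, yte = X[test] - x_mu, y[test] - y_mu
        beta = None
        for i, lam in enumerate(lambdas):  # descending lambdas: warm starts
            beta = lasso_cd(Xtr, ytr, lam, beta)
            errors[k, i] = np.mean((yte - Xte @ beta) ** 2)
    # argmin returns the first minimum, i.e. the largest (simplest) lambda on ties
    return lambdas[int(np.argmin(errors.mean(axis=0)))]


def stable_mirna_regulators(X, y, names, cis_mirnas, n_runs=100, min_count=70, seed=0):
    """Cis miRNAs with non-zero coefficients in at least min_count of n_runs."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant predictor
    y = y - y.mean()
    lam_max = np.max(np.abs(X.T @ y)) / len(y)  # smallest lambda giving all zeros
    lambdas = lam_max * np.logspace(0, -2, 20)
    counts = dict.fromkeys(names, 0)
    for _ in range(n_runs):
        lam = cv_lambda_min(X, y, lambdas, rng)
        for name, b in zip(names, lasso_cd(X, y, lam)):
            if b != 0.0:
                counts[name] += 1
    return sorted(m for m in cis_mirnas if counts.get(m, 0) >= min_count)
```

The design choice is that the random fold assignment is the only source of run-to-run variation. The selection frequency therefore measures how robust each miRNA's selection is to the choice of λ.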

Although miRNAs typically downregulate their target genes⁵⁹,⁶⁰,⁶¹, miRDriver considers both positively and negatively correlated miRNA-target pairs in each cancer type. Because the miRNA-gene interactions computed by miRDriver can be either direct or indirect, a positive correlation between a miRNA and a gene is possible. Furthermore, positive correlations between miRNAs and their direct targets have also been reported⁹⁰,⁹¹,⁹²,⁹³. The computed miRNA-gene interactions for the eighteen cancer types are available in Supplemental Table S9.
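
Because both signs are retained, the direction of an interaction serves as a label rather than a filter. A minimal sketch, assuming the direction is read from the Pearson correlation between miRNA and gene expression across samples. The package may instead use another quantity, such as the sign of the LASSO coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def label_direction(mirna_expr, gene_expr):
    """Label, but do not discard, a miRNA-gene pair by the sign of its correlation."""
    r = pearson(mirna_expr, gene_expr)
    return ("negative" if r < 0 else "positive"), r


# Toy expression values across four samples (hypothetical).
inverse = label_direction([1, 2, 3, 4], [8, 6, 4, 2])
concordant = label_direction([1, 2, 3, 4], [2, 4, 6, 8])
```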

Running state-of-the-art methods

We compared miRDriver with five state-of-the-art methods, namely, ARACNe⁵, ProMISe⁶, hiddenICP⁷, idaFast⁸ and jointIDA⁹, by running them on datasets from eighteen cancer types in TCGA. Since these methods can only utilize expression data, we used gene and miRNA expression data to compute miRNA-gene interaction networks for this comparison. For ARACNe, ProMISe and hiddenICP, we used the same number of input genes and miRNAs that we used in miRDriver for each cancer type. Since idaFast and jointIDA have high computational complexity and therefore do not scale to large datasets, we ran these two methods with ≤ 50 top miRNAs and ≤ 1500 top genes selected by Feature Selection Based on The Most Variant Median Absolute Deviation (FSbyMAD)⁹⁴ for each cancer type. For ARACNe, we selected all miRNA-gene interactions with non-zero scores for comparison with our method. For ProMISe, hiddenICP, idaFast and jointIDA, we considered the top 3%, 3%, 3.5% and 3.5% of miRNA-gene interactions ranked by their reported scores, respectively. These thresholds were chosen based on our previous work with the breast cancer cohort to obtain high-confidence miRNA-gene interactions for comparison, and they were used for all eighteen cancer types. Details of running these methods can be found in our previous publication¹⁷.
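
Both reduction steps can be sketched in a few lines of Python. The first ranks features by median absolute deviation (MAD), as the name FSbyMAD indicates. The second keeps the top p% of scored interactions. Two choices here are assumptions: ranking interactions by absolute score, and rounding the cut-off up. Multiplying the MAD by a constant, such as the 1.4826 factor used by R's `mad()`, would not change the ranking.

```python
import math
import statistics


def mad(values):
    """Unscaled median absolute deviation."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def top_by_mad(expr, k):
    """expr: feature -> values across samples; keep the k features with the largest MAD."""
    return sorted(expr, key=lambda f: mad(expr[f]), reverse=True)[:k]


def top_percent(scores, pct):
    """scores: (miRNA, gene) -> score; keep the top pct% ranked by absolute score."""
    n_keep = math.ceil(len(scores) * pct / 100)
    return sorted(scores, key=lambda pair: abs(scores[pair]), reverse=True)[:n_keep]


# Toy inputs (hypothetical names and values).
expr = {"miR-1": [1, 1, 1, 1, 9], "miR-2": [1, 5, 9, 13, 17], "GENE_A": [2, 2, 3, 3, 3]}
most_variable = top_by_mad(expr, 1)
scores = {("miR-x", f"GENE_{i}"): -i if i % 2 else i for i in range(40)}
kept = top_percent(scores, 3)  # ceil(40 * 3 / 100) = 2 interactions
```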

Datasets to run miRDriver on pan-cancer

In this study, we used gene expression, CNA, DNA methylation, TF-gene interaction and miRNA expression data from eighteen different cancer types. We used the R Bioconductor package TCGAbiolinks⁷⁷ to download the genomic data of cancer patient samples from TCGA. For all cancer types, we retrieved gene expression quantification data as raw counts (Illumina HiSeq) and as FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values. TCGA gene expression data comprise mRNAs (i.e., messenger RNAs), lncRNAs and pseudogenes, and our analysis considered all of these RNAs.

We downloaded miRNA expression quantification data (file type hg19.mirbase20.mirna) and isoform expression quantification data (file type hg19.mirbase20.isoform) from the TCGA legacy data. For each cancer type, we retained the miRNAs with ≥ 0.01 RPM (reads per million mapped reads) in ≥ 30% of the cohort.
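
The expression filter is a per-miRNA check on the fraction of samples that pass the RPM cut-off. The sketch assumes that RPM values have already been extracted for each sample; the dictionary layout and the miRNA names are hypothetical.

```python
def filter_mirnas(rpm, min_rpm=0.01, min_frac=0.30):
    """rpm: miRNA -> list of RPM values, one per sample in the cohort.
    Keep miRNAs with RPM >= min_rpm in at least min_frac of the samples."""
    kept = []
    for mirna, values in rpm.items():
        frac = sum(v >= min_rpm for v in values) / len(values)
        if frac >= min_frac:
            kept.append(mirna)
    return kept


# Toy cohort of ten samples.
rpm = {
    "miR-a": [0.5] + [0.0] * 8 + [0.02],  # expressed in 2/10 samples -> dropped
    "miR-b": [0.01] * 3 + [0.0] * 7,      # expressed in 3/10 samples -> kept
    "miR-c": [1.0] * 10,                  # expressed in all samples -> kept
}
kept = filter_mirnas(rpm)
```

Comparing the fraction (rather than multiplying `min_frac` by the cohort size) avoids a floating-point edge case at exactly 30%.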

We retrieved masked copy number variation data (Affymetrix SNP Array 6.0) and computed hg38-compatible gene-centric copy number values using the R Bioconductor package CNTools⁹⁵.
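
CNTools converts segment-level copy number into a region-by-sample matrix, here a gene-by-sample matrix. The sketch below shows the idea for one sample and one gene. It summarizes the overlapping segments with an overlap-length-weighted mean. That reduction rule is an assumption and may differ from CNTools' own, and the segment values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int
    end: int      # inclusive coordinates
    value: float  # segment mean (e.g., log2 copy-number ratio)


def gene_copy_number(segments, chrom, start, end):
    """Overlap-weighted mean of segment values covering [start, end]; None if uncovered."""
    total, weight = 0.0, 0
    for s in segments:
        if s.chrom != chrom:
            continue
        overlap = min(end, s.end) - max(start, s.start) + 1
        if overlap > 0:
            total += s.value * overlap
            weight += overlap
    return total / weight if weight else None


# Toy segmentation of one sample.
segments = [Segment("chr1", 1, 1000, 0.5), Segment("chr1", 1001, 2000, -0.5)]
spanning = gene_copy_number(segments, "chr1", 901, 1100)  # 100 bp in each segment
contained = gene_copy_number(segments, "chr1", 100, 200)
uncovered = gene_copy_number(segments, "chr2", 100, 200)
```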

We downloaded DNA methylation data from the Infinium HumanMethylation27 BeadChip (27K) and Infinium HumanMethylation450 BeadChip (450K) platforms from TCGA and calculated gene-specific beta values separately for each platform. For the 450K platform, we used the average beta value of promoter-specific probes because of their role in transcriptional silencing⁹⁶. Given the lower coverage of the 27K platform, we used all probes and set the DNA methylation of a gene to the average beta value of all its probes.
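
The platform-specific averaging can be sketched for one sample as follows. The probe-to-gene map and the set of promoter probes are hypothetical inputs; in practice they would come from the array annotation.

```python
from collections import defaultdict


def gene_methylation(beta, probe_gene, promoter_probes, platform):
    """Average beta values per gene for one sample.

    beta: probe ID -> beta value; probe_gene: probe ID -> gene symbol;
    promoter_probes: set of probe IDs annotated as promoter probes.
    450K: only promoter probes are averaged; 27K: all probes of the gene are averaged.
    """
    if platform not in ("27K", "450K"):
        raise ValueError("platform must be '27K' or '450K'")
    sums, counts = defaultdict(float), defaultdict(int)
    for probe, value in beta.items():
        gene = probe_gene.get(probe)
        if gene is None or value is None:
            continue  # unmapped probe or missing measurement
        if platform == "450K" and probe not in promoter_probes:
            continue  # 450K: ignore non-promoter probes
        sums[gene] += value
        counts[gene] += 1
    return {gene: sums[gene] / counts[gene] for gene in sums}


# Toy probes (hypothetical IDs).
beta = {"p1": 0.8, "p2": 0.2, "p3": 0.6, "p4": 0.1}
probe_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_A", "p4": "GENE_B"}
promoter = {"p1", "p3"}
m450 = gene_methylation(beta, probe_gene, promoter, "450K")
m27 = gene_methylation(beta, probe_gene, promoter, "27K")
```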

We downloaded experimentally validated TF-gene interactions from the TRED and TRRUST databases to incorporate TF-gene interactions in the LASSO step. Table 1 shows the sample sizes of the different data modalities used in this study for the eighteen cancer types from TCGA.

Datasets to evaluate miRDriver

To check the correlation between the copy number and the expression of the miRNAs computed by miRDriver across all samples, we used TCGA's masked copy number variation data (Affymetrix SNP Array 6.0). We used the R Bioconductor package CNTools⁹⁵ to compute miRNA-centric copy number values from the miRNA coordinates extracted from TCGA's legacy data file type hg19.mirbase20.isoform.

To evaluate whether the miRNAs computed by miRDriver were enriched in cancer-related miRNAs, we downloaded a list of 351 known oncogenic miRNAs from the oncomiRDB database⁹⁷. Each miRNA listed in oncomiRDB is involved in at least one cancer-related phenotype or cellular process. We harmonized the oncomiRDB miRNA names according to the miRBase⁹⁸ database.

To check whether the miRNA-gene interactions computed by miRDriver were significantly enriched in known miRNA-gene interactions, we performed a hypergeometric test on the computed target genes of each miRNA. For this purpose, we compiled experimentally validated miRNA-gene interactions from the miRTarBase v6.1, TarBase v7.0 and miRWalk databases⁹⁹ as our ground truth data. Since miRDriver can compute both direct targets and indirect downstream targets (i.e., targets of the direct targets), we added potential indirect targets to the ground truth dataset: for each miRNA-gene interaction in which the gene was a known TF, we included the experimentally validated targets of this TF obtained from the TRED and TRRUST databases.
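
A sketch of the enrichment test for one miRNA. First, the ground truth is extended with the validated targets of any computed target that is a TF. Then the upper tail of the hypergeometric distribution gives the probability of an overlap at least as large as the one observed. Two elements are assumptions: the choice of gene universe (here, all genes considered in the cohort) and the names used.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def extended_truth(mirna, validated, tf_targets, computed_targets):
    """validated: miRNA -> validated targets; tf_targets: TF -> validated TF targets."""
    truth = set(validated.get(mirna, set()))
    for gene in computed_targets:
        if gene in tf_targets:  # computed target is a TF: its targets count as indirect
            truth |= tf_targets[gene]
    return truth


def target_enrichment(mirna, computed_targets, universe, validated, tf_targets):
    computed = set(computed_targets) & universe
    truth = extended_truth(mirna, validated, tf_targets, computed) & universe
    hits = len(computed & truth)
    return hits, hypergeom_sf(hits, len(universe), len(truth), len(computed))


# Toy example (hypothetical genes): G5 is a TF whose targets G3 and G4 join the truth set.
universe = {f"G{i}" for i in range(20)}
validated = {"miR-x": {"G0", "G1", "G2"}}
tf_targets = {"G5": {"G3", "G4"}}
hits, p_value = target_enrichment("miR-x", {"G0", "G1", "G5", "G3"}, universe, validated, tf_targets)
```

In the toy case, 3 of the 4 computed targets fall in a 5-gene truth set drawn from 20 genes, which gives p = 155/4845.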

To assess the prognostic relevance of the miRDriver-selected miRNAs as clinical biomarkers, we performed multivariate survival analysis⁸² and multivariate Cox regression⁸⁴. We downloaded the clinical data for the eighteen cancer types using TCGAbiolinks⁷⁷. We included age, race, gender, stage and grade as independent variables whenever they were available (see Table 14).

Table 14 Availability of clinical variables in TCGA.


We considered four different endpoints, namely, overall survival (OS), progression-free interval (PFI), disease-specific survival (DSS) and disease-free interval (DFI). For OS, patients who died from any cause were considered as dead, and all other patients were censored. For PFI, patients with a new tumor event (disease progression, local recurrence, distant metastasis or a new primary tumor, including new tumor events whose type was N/A) and patients who died with cancer without a new tumor event were considered as "event occurred", and all other patients were censored. DFI was defined similarly to PFI, except that patients with new primary tumors in other organs were included as censored, and patients who died with tumor without a new tumor event, as well as patients with stage IV disease, were excluded. For DSS, the disease-specific survival time in days, the last contact days or the death days, whichever was larger, was used to identify "event occurred" versus censored patients¹⁰⁰.
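
The OS and PFI event rules can be expressed as small encoders. The field names are hypothetical, and the OS time rule is an assumption: death days for deceased patients and last follow-up days otherwise. The clinical data resource cited above¹⁰⁰ distributes these endpoints precomputed, so this sketch only illustrates the logic.

```python
def os_endpoint(vital_status, days_to_death, days_to_last_followup):
    """OS: death from any cause is an event (1); everyone else is censored (0)."""
    if vital_status == "Dead":
        return days_to_death, 1
    return days_to_last_followup, 0


def pfi_event(new_tumor_event_type, died_with_tumor):
    """PFI: any new tumor event (progression, local recurrence, distant metastasis,
    new primary tumor, or a type recorded as 'N/A') or death with cancer is an event.
    new_tumor_event_type is None when no new tumor event was recorded."""
    return 1 if new_tumor_event_type is not None or died_with_tumor else 0


# Toy patients (hypothetical values).
dead = os_endpoint("Dead", 400, None)
alive = os_endpoint("Alive", None, 900)
```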

We checked the subtype-specific association of the expression of the computed target genes in BRCA, LGG, KIRC, LUSC and PAAD. We used the R package TCGAbiolinks⁷⁷ to download the subtype labels for these cancer types.

Discussion

We developed a computational pipeline called miRDriver, which integrates multi-omics datasets such as CNA, DNA methylation, TFs, and gene and miRNA expression to infer copy number-derived miRNA-gene interactions in cancer. In the current study, we implemented miRDriver as an R package and carried out a comprehensive and rigorous pan-cancer analysis of TCGA samples to infer miRNA-gene interaction networks by integrating multi-omics datasets. We focused on DNA aberration regions in 7294 cancer samples from eighteen cancer types, uncovering the tissue-specific omics interplay underlying miRNA-gene associations. miRDriver outperformed several existing methods in all cancer types used in the study. In each case, miRDriver selected many miRNA-gene interactions that were enriched in known miRNA-target databases. We also observed that the miRNAs selected by miRDriver were significantly enriched in known cancer-related miRNAs.

Several cancer-related biological pathways and GO terms were enriched in the computed genes. Among these, GPCR-related pathways, which play crucial roles in tumor development, invasion, migration, survival and metastasis, were enriched in ten or more cancer types. Based on OncoScore citation frequency, more than 40% of all computed genes have been cited in cancer-related studies, and at least 50% of these genes had more than ten cancer-related citations.

We highlighted 22 common miRNAs that were frequently selected in multiple cancer types and explored their prognostic roles. Several of these miRNAs showed significant survival differences between high- and low-expression patient groups. Among these, members of the let-7 family have been reported to act as both tumor suppressors and oncogenes¹⁰¹. miR-100, miR-149, miR-210, miR-31, miR-346, miR-34b, miR-486 and miR-675 had high OncoScore citation frequencies in cancer-related studies. We found several GO terms enriched in the computed targets of these 22 common miRNAs. Among these, GO terms such as "Regulation of gene silencing by miRNA" and "Regulation of post-transcriptional gene silencing" have been implicated in several cancer-related studies explaining the roles of miRNAs in cancer initiation and progression⁵³,¹⁰². The GO term "Chromatin silencing" has been linked to cancer⁴⁹,¹⁰³. The GO term "DNA replication-dependent nucleosome assembly" has been studied in relation to the regulation of cell fate and differentiation, and a recent study suggested exploring it in cancer¹⁰⁴.

We also assessed whether these common miRNAs could serve as non-invasive biomarkers, i.e., circulating miRNAs that are released by tumor cells and can be detected in body fluids. For this purpose, we queried MiRandola¹⁰⁵, a knowledge base of extracellular circulating miRNAs, with these 22 miRNAs to infer their relevance as non-invasive biomarkers. We found ten of the 22 common miRNAs, namely let-7b, miR-100, miR-1249, miR-149, miR-210, miR-31, miR-346, miR-34b, miR-486 and miR-675, to be potential non-invasive biomarkers.

Although several miRNAs were common across multiple cancer types, few miRNA-gene interactions were shared. Only fourteen of the ~ 10,000 computed interactions were shared by at least two cancer types. Given that far more target genes than miRNAs were used in this analysis, this finding was not surprising. We discussed several of these shared interactions that have been reported in experimental studies.

We identified several cancer driver genes targeted by multiple miRNAs (i.e., high-degree genes) across different cancer types. High-degree target genes also showed a strong association with the molecular subtypes in multiple cancer types, such as BRCA, LGG, LUSC and PAAD. Specifically, in BRCA, 106 high-degree genes (three of which are PAM50 genes) served as subtype-specific gene signatures with high classification accuracy relative to the baseline PAM50 gene-based subtypes. We also compared the prognostic significance of low-degree and high-degree target genes for disease progression and survival hazards, and found high-degree genes to be more significant prognostic factors than low-degree genes. These findings suggest that multiple miRNAs acting in coordination may affect gene expression more strongly than a single miRNA.

The pan-cancer analysis presented here, which identifies copy number aberration-influenced miRNA-target associations, may guide future experimental work to validate the roles of these miRNAs in context-specific gene regulation and to establish greater confidence in their tissue-specific associations. In the LASSO step, we integrated several potential co-regulators that can influence a trans gene's expression, namely CNA, DNA methylation, miRNA expression and TFs. Other potential regulators, such as histone modifications and chromatin accessibility (e.g., ATAC-seq), could also be integrated. miRDriver outperformed the existing sequence-based ceRNA inference tool Cupid. This suggests that this work could be extended by taking into account recognized target sites that contribute to gene regulation, and by using ceRNA interactions to improve the inferred miRNA-gene networks. miRDriver computes both direct and indirect targets of miRNAs, which helps decipher the downstream biological processes and pathways regulated by these miRNAs. To identify the direct targets of the selected miRNAs, one could apply sequence-based filtering.

Finally, in this study, we released miRDriver as an R software package that offers users a variety of options for running our workflow with their preferred settings. For example, users can run the tool with only up- or down-regulated genes, from amplified or deleted regions, or both. However, restricting the analysis in this way also restricts the context in which miRNA-gene interactions can be detected. To obtain the most comprehensive list of miRNA-gene interactions, we recommend that users evaluate all directions. The software also allows users to supply their own TF-target interactions and to filter cancer-related TF-target interactions from the DoRothEA gene set resource¹⁰⁶ by evidence-based confidence level. In this study, however, we used only the high-confidence TF-target interactions from TRED and TRRUST in the LASSO step, because using many predictors in LASSO could degrade its performance and cause false positive and false negative interactions. Furthermore, because gene expression is controlled at multiple levels, including transcriptional and post-transcriptional regulation, our software can run the LASSO step in two phases. In the first run, only the transcriptional predictors are used to explain the expression variation. In the second run, the post-transcriptional predictors are used as the independent variables and the residual of the first LASSO run as the dependent variable. Alternatively, if the user has both transcriptional and post-transcriptional expression change data, the two LASSO runs can be performed in either order. Details of all these options are given in the vignette of the miRDriver R package.
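
The two-phase option amounts to two LASSO fits, with the second fitted to the residual of the first. The sketch below is a self-contained illustration and not the package's implementation. It uses a plain coordinate-descent LASSO with fixed, hypothetical λ values rather than cross-validated ones, and the toy predictors are hypothetical.

```python
import math
import random


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """LASSO via coordinate descent on standardized columns.

    Minimizes (1/2n)||y - b0 - Z b||^2 + lam*||b||_1 and returns
    (coefficients on the standardized scale, fitted values).
    """
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / n) or 1.0 for j in range(p)]
    Z = [[(r[j] - mu[j]) / sd[j] for j in range(p)] for r in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft threshold
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    fitted = [a - r for a, r in zip(y, resid)]
    return beta, fitted


def two_phase_lasso(X_tx, X_ptx, y, lam1, lam2):
    """Phase 1: transcriptional predictors explain y.
    Phase 2: post-transcriptional predictors explain the phase-1 residual."""
    beta1, fitted1 = lasso_cd(X_tx, y, lam1)
    residual = [a - b for a, b in zip(y, fitted1)]
    beta2, _ = lasso_cd(X_ptx, residual, lam2)
    return beta1, beta2


# Toy data: one TF activates the gene, miR-a represses it, miR-b is unrelated.
rng = random.Random(7)
tf = [rng.gauss(0, 1) for _ in range(50)]
mir_a = [rng.gauss(0, 1) for _ in range(50)]
mir_b = [rng.gauss(0, 1) for _ in range(50)]
y = [1.5 * t - 1.0 * a + rng.gauss(0, 0.2) for t, a in zip(tf, mir_a)]
beta_tx, beta_ptx = two_phase_lasso(
    [[t] for t in tf], [[a, b] for a, b in zip(mir_a, mir_b)], y, lam1=0.05, lam2=0.05)
```

Swapping the two predictor sets gives the reverse order mentioned above; with real data, each phase would normally choose its own λ.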

Data availability

The miRDriver pipeline was developed as an R package. The source code of the package is available at https://github.com/bozdaglab/miRDriver under the Creative Commons Attribution Non Commercial 4.0 International Public License. The scripts for running the pipeline and the evaluation results are available in the supplementary documents. The datasets can be accessed from Figshare via https://figshare.com/s/7400ad8445b2e78e4636 .

References

He, L. & Hannon, G. J. MicroRNAs: Small RNAs with a big role in gene regulation. Nat. Rev. Genet. 5, 522–531 (2004).
Esquela-Kerscher, A. & Slack, F. J. Oncomirs—MicroRNAs with a role in cancer. Nat. Rev. Cancer 6, 259–269 (2006).
Liu, W., Lv, C., Zhang, B., Zhou, Q. & Cao, Z. MicroRNA-27b functions as a new inhibitor of ovarian cancer-mediated vasculogenic mimicry through suppression of VE-cadherin expression. RNA 23, 1019–1027 (2017).
Parikh, A. et al. microRNA-181a has a critical role in ovarian cancer progression through the regulation of the epithelial–mesenchymal transition. Nat. Commun. 5, 1–16 (2014).
Margolin, A. A. et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, S7 (2006).
Li, Y., Liang, C., Wong, K.-C., Jin, K. & Zhang, Z. Inferring probabilistic miRNA–mRNA interaction signatures in cancers: a role-switch approach. Nucleic Acids Res. 42, e76 (2014).
Pham, V. V. et al. Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction. BMC Bioinformatics 20, 143 (2019).
Williams, J. Causal inference using invariant prediction: identification and confidence intervals | Max Planck Institute for Intelligent Systems. https://is.tuebingen.mpg.de/.
Le, T. D. et al. Inferring microRNA–mRNA causal regulatory relationships from expression data. Bioinformatics 29, 765–771 (2013).
Shlien, A. & Malkin, D. Copy number variations and cancer. Genome Med. 1, 62 (2009).
Taylor, B. S. et al. Functional copy-number alterations in cancer. PLoS ONE 3, e3179 (2008).
Bertoli, G., Cava, C. & Castiglioni, I. MicroRNAs: New biomarkers for diagnosis, prognosis, therapy prediction and therapeutic tools for breast cancer. Theranostics 5, 1122–1143 (2015).
Calin, G. A. et al. MiR-15a and miR-16-1 cluster functions in human leukemia. Proc. Natl. Acad. Sci. 105, 5166–5171 (2008).
Zhang, L. et al. microRNAs exhibit high frequency genomic alterations in human cancer. Proc. Natl. Acad. Sci. USA 103, 9136–9141 (2006).
Setty, M. et al. Inferring transcriptional and microRNA-mediated regulatory programs in glioblastoma. Mol. Syst. Biol. 8, 605 (2012).
Li, Y., Liang, M. & Zhang, Z. Regression Analysis of Combined Gene Expression Regulation in Acute Myeloid Leukemia. PLOS Comput. Biol. 10, e1003908 (2014).
Bose, B. & Bozdag, S. miRDriver: A Tool to Infer Copy Number Derived miRNA-Gene Networks in Cancer. in Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics 366–375 (Association for Computing Machinery, 2019). https://doi.org/10.1145/3307339.3342172.
Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996).
Ashburner, M. et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Braschi, B. et al. Genenames.org: The HGNC and VGNC resources in 2019. Nucleic Acids Res. 47, D786–D792 (2019).
Chiu, H.-S. et al. Cupid: Simultaneous reconstruction of microRNA-target and ceRNA networks. Genome Res. 25, 257–267 (2015).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Ulgen, E., Ozisik, O. & Sezerman, O. U. pathfindR: An R package for comprehensive identification of enriched pathways in omics data through active subnetworks. Front. Genet. 10, 858 (2019).
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545 (2005).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Gonzalez, H., Hagerling, C. & Werb, Z. Roles of the immune system in cancer: From tumor initiation to metastatic progression. Genes Dev. 32, 1267–1284 (2018).
Nicolini, A., Ferrari, P., Diodati, L. & Carpi, A. Alterations of signaling pathways related to the immune system in breast cancer: New perspectives in patient management. Int. J. Mol. Sci. 19, 2733 (2018).
Arakaki, A. K. S., Pan, W.-A. & Trejo, J. GPCRs in cancer: Protease-activated receptors, endocytic adaptors and signaling. Int. J. Mol. Sci. 19, 1886 (2018).
Bar-Shavit, R. et al. G Protein-Coupled Receptors in Cancer. Int J Mol Sci 17, 1320 (2016).
Murray, I. A., Patterson, A. D. & Perdew, G. H. Aryl hydrocarbon receptor ligands in cancer: friend and foe. Nat. Rev. Cancer 14, 801–814 (2014).
van Waarde, A. et al. Potential applications for sigma receptor ligands in cancer diagnosis and therapy. Biochim. Biophys. Acta 10, 2703–2714. https://doi.org/10.1016/j.bbamem.2014.08.022 (2015).
Nguyen-Vu, T. et al. Liver × receptor ligands disrupt breast cancer cell proliferation through an E2F-mediated mechanism. Breast Cancer Res. 15, R51 (2013).
Salik, B. et al. Targeting RSPO3-LGR4 signaling for leukemia stem cell eradication in acute myeloid leukemia. Cancer Cell 38, 263-278.e6 (2020).
Gong, X. et al. Aberrant RSPO3-LGR4 signaling in Keap1-deficient lung adenocarcinomas promotes tumor aggressiveness. Oncogene 34, 4692–4701 (2015).
Jiang, X. et al. miR-22 has a potent anti-tumour role with therapeutic potential in acute myeloid leukaemia. Nat. Commun. 7, 11452 (2016).
Wang, J. et al. Molecular mechanisms and clinical applications of miR-22 in regulating malignant progression in human cancer (Review). Int. J. Oncol. 50, 345–355 (2016).
Mhawech-Fauceglia, P. et al. Pax-5 immunoexpression in various types of benign and malignant tumours: a high-throughput tissue microarray analysis. J. Clin. Pathol. 60, 709–714 (2007).
Adler, E. K. et al. The PAX8 cistrome in epithelial ovarian cancer. Oncotarget 8, 108316–108332 (2017).
Belotte, J. et al. The role of oxidative stress in the development of cisplatin resistance in epithelial ovarian cancer. Reprod. Sci. 21, 503–508 (2014).
López-Urrutia, E., BustamanteMontes, L. P., Ladrón de Guevara Cervantes, D., Pérez-Plasencia, C. & Campos-Parra, A. D. Crosstalk between long non-coding RNAs, micro-RNAs and mRNAs: Deciphering molecular mechanisms of master regulators in cancer. Front. Oncol. 9, 669 (2019).
Paraskevopoulou, M. D. & Hatzigeorgiou, A. G. Analyzing MiRNA-LncRNA interactions. Methods Mol Biol 1402, 271–286 (2016).
Jiang, M.-C., Ni, J.-J., Cui, W.-Y., Wang, B.-Y. & Zhuo, W. Emerging roles of lncRNA in cancer and therapeutic opportunities. Am. J. Cancer Res. 9, 1354–1366 (2019).
Zhang, J. et al. The transcriptional landscape of lncRNAs reveals the oncogenic function of LINC00511 in ER-negative breast cancer. Cell Death Dis. 10, 1–16 (2019).
Jin, C., Rajabi, H. & Kufe, D. miR-1226 targets expression of the mucin 1 oncoprotein and induces cell death. Int. J. Oncol. 37, 61–69 (2010).
Ballestar, E. & Esteller, M. The impact of chromatin in human cancer: linking DNA methylation to gene silencing. Carcinogenesis 23, 1103–1109 (2002).
Sarthy, J. F., Henikoff, S. & Ahmad, K. Chromatin bottlenecks in cancer. Trends Cancer 5, 183–194 (2019).
Brock, M. V., Herman, J. G. & Baylin, S. B. Cancer as a manifestation of aberrant chromatin structure. Cancer J. 13, 3–8 (2007).
Foglizzo, M. et al. A bidentate Polycomb Repressive-Deubiquitinase complex is required for efficient activity on nucleosomes. Nat. Commun. 9, 3932 (2018).
Lu, Y. et al. Epigenetic regulation in human cancer: the potential role of epi-drug in cancer therapy. Mol. Cancer 19, 79 (2020).
Perri, F. et al. Epigenetic control of gene expression: Potential implications for cancer treatment. Crit. Rev. Oncol. Hematol. 111, 166–172 (2017).
Oliveto, S., Mancino, M., Manfrini, N. & Biffo, S. Role of microRNAs in translation regulation and cancer. World J. Biol. Chem. 8, 45–56 (2017).
Peng, Y. & Croce, C. M. The role of MicroRNAs in human cancer. Signal Transduct. Target Ther. 1, 1–9 (2016).
Lemoine, N. R. Silencing RNA: A novel treatment for pancreatic cancer?. Gut 54, 1215 (2005).
DeOcesano-Pereira, C. et al. Post-Transcriptional Control of RNA Expression in Cancer. Gene Expression and Regulation in Mammalian Cells - Transcription From General Aspects (IntechOpen, 2018). https://doi.org/10.5772/intechopen.71861.
Dhawan, A., Scott, J. G., Harris, A. L. & Buffa, F. M. Pan-cancer characterisation of microRNA across cancer hallmarks reveals microRNA-mediated downregulation of tumour suppressors. Nat. Commun. 9, 5228 (2018).
Tate, J. G. et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Ritchie, W., Rajasekhar, M., Flamant, S. & Rasko, J. E. J. Conserved Expression Patterns Predict microRNA Targets. PLOS Comput. Biol. 5, e1000513 (2009).
Catalanotto, C., Cogoni, C. & Zardo, G. MicroRNA in control of gene expression: An overview of nuclear functions. Int. J. Mol. Sci. 17, 1712 (2016).
Valencia-Sanchez, M. A., Liu, J., Hannon, G. J. & Parker, R. Control of translation and mRNA degradation by miRNAs and siRNAs. Genes Dev. 20, 515–524 (2006).
Zhang, Z., Wang, Y., Zhang, J., Zhong, J. & Yang, R. COL1A1 promotes metastasis in colorectal cancer by regulating the WNT/PCP pathway. Mol. Med. Rep. 17, 5037–5042 (2018).
Duah, E. et al. Cysteinyl leukotriene 2 receptor promotes endothelial permeability, tumor angiogenesis, and metastasis. Proc. Natl. Acad. Sci. USA 116, 199 (2019).
Pellecchia, A. et al. Overexpression of ETV4 is oncogenic in prostate cells through promotion of both cell proliferation and epithelial to mesenchymal transition. Oncogenesis 1, e20–e20 (2012).
Ganaie, A. A. et al. Characterization of novel murine and human PDAC Cell models: Identifying the role of intestine specific homeobox gene ISX in hypoxia and disease progression. Transl. Oncol. 12(8), 1056–1071. https://doi.org/10.1016/j.tranon.2019.05.002 (2019).
Li, N.-F. et al. Genetic Variations in the KCNJ5 Gene in Primary Aldosteronism Patients from Xinjiang, China. PLoS ONE 8, e54051 (2013).
Yang, X. et al. NTRK1 is a positive regulator of YAP oncogenic function. Oncogene 38, 2778–2787 (2019).
Zhang, L. et al. SALL4, a novel marker for human gastric carcinogenesis and metastasis. Oncogene 33, 5491–5500 (2014).
Tabu, K. et al. A novel function of OLIG2 to suppress human glial tumor cell growth via p27Kip1 transactivation. J. Cell. Sci. 119, 1433–1441 (2006).
Pekow, J. et al. miR-4728-3p functions as a tumor suppressor in ulcerative colitis-associated colorectal neoplasia through regulation of focal adhesion signaling. Inflamm. Bowel Dis. 23, 1328–1337 (2017).
Yu, Q. et al. miRNA-346 promotes proliferation, migration and invasion in liver cancer. Oncol. Lett. 14, 3255–3260 (2017).
An, T. et al. Comparison of alterations in miRNA expression in matched tissue and blood samples during spinal cord glioma progression. Sci. Rep. 9, 9169 (2019).
Sun, C.-C. et al. The lncRNA PDIA3P interacts with miR-185-5p to modulate oral squamous cell carcinoma progression by targeting cyclin D2. Mol. Ther. Nucleic Acids 9, 100–110. https://doi.org/10.1016/j.omtn.2017.08.015 (2017).
Yan, W., Liu, Z., Yang, W. & Wu, G. miRNA expression profiles in Smad4-positive and Smad4-negative SW620 human colon cancer cells detected by next-generation small RNA sequencing. Cancer Manag. Res. 10, 5479–5490 (2018).
Canlorbe, G. et al. Identification of microRNA expression profile related to lymph node status in women with early-stage grade 1–2 endometrial cancer. Mod. Pathol. 29, 391–401 (2016).
Zhang, J., Luo, X., Li, H., Deng, L. & Wang, Y. Genome-wide uncovering of STAT3-mediated miRNA expression profiles in colorectal cancer cell lines. Biomed Res Int 2014, 187105 (2014).
Colaprico, A. et al. TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 44, e71 (2016).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform manifold approximation and projection. J. Open Sour. Softw. 3, 861 (2018).
Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27, 1160–1167 (2009).
Chollet, F. et al. R Interface to Keras. https://github.com/rstudio/keras (2017).
Collisson, E. A., Bailey, P., Chang, D. K. & Biankin, A. V. Molecular subtypes of pancreatic cancer. Nat. Rev. Gastroenterol. Hepatol. 16, 207–220 (2019).
Borgne, F. L. & Foucher, Y. IPWsurvival: Propensity Score Based Adjusted Survival Curves and Corresponding Log-Rank Statistic (2017).
Sano, L. D., Passerini, C. G., Piazza, R., Ramazzotti, D. & Spinelli, R. OncoScore: A tool to identify potentially oncogenic genes (Bioconductor version: Release (3.11), 2020). https://doi.org/10.18129/B9.bioc.OncoScore.
Bradburn, M. J., Clark, T. G., Love, S. B. & Altman, D. G. Survival analysis part II: Multivariate data analysis—An introduction to concepts and methods. Br. J. Cancer 89, 431–436 (2003).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Reich, M., Liefeld, T., Tamayo, P. & Mesirov, J. GenePattern 2.0. Nat. Genet. 38(5), 500–501 (2006).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. https://doi.org/10.18637/jss.v033.i01 (2010).
Tibshirani, R. J. The lasso problem and uniqueness. Electron. J. Statist. 7, 1456–1490 (2013).
Couzigou, J.-M. et al. Positive gene regulation by a natural protective miRNA enables arbuscular mycorrhizal symbiosis. Cell Host Microbe 21, 106–112 (2017).
Vasudevan, S. & Steitz, J. A. AU-rich-element-mediated upregulation of translation by FXR1 and Argonaute 2. Cell 128, 1105–1118 (2007).
Vasudevan, S., Tong, Y. & Steitz, J. A. Switching from repression to activation: MicroRNAs can up-regulate translation. Science 318, 1931–1934 (2007).
Xiao, M. et al. MicroRNAs activate gene transcription epigenetically as an enhancer trigger. RNA Biol 14, 1326–1334 (2017).
Xu, T. & Le, T. FSbyMAD: Biological feature (such as gene) selection based on the most... in CancerSubtypes: Cancer subtypes identification, validation and visualization based on multiple genomic data sets. https://rdrr.io/bioc/CancerSubtypes/man/FSbyMAD.html.
Zhang, J. CNTools: Convert segment data into a region by sample matrix to allow for other high level computational analyses. R package version 1.40.0 (2019).
Maunakea, A. K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253–257 (2010).
Wang, D., Gu, J., Wang, T. & Ding, Z. OncomiRDB: A database for the experimentally verified oncogenic and tumor-suppressive microRNAs. Bioinformatics 30, 2237–2238 (2014).
Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: MicroRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140–D144 (2006).
Karagkouni, D. et al. DIANA-TarBase v8: A decade-long collection of experimentally supported miRNA–gene interactions. Nucleic Acids Res. 46, D239–D245 (2018).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).
Chirshev, E., Oberg, K. C., Ioffe, Y. J. & Unternaehrer, J. J. Let-7 as biomarker, prognostic indicator, and therapy for precision medicine in cancer. Clin. Transl. Med. https://doi.org/10.1186/s40169-019-0240-y (2019).
Macfarlane, L.-A. & Murphy, P. R. MicroRNA: Biogenesis, function and role in cancer. Curr. Genomics 11, 537–561 (2010).
Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).
Serra-Cardona, A. & Zhang, Z. Replication-coupled nucleosome assembly in the passage of epigenetic information and cell identity. Trends Biochem. Sci. 43, 136–148 (2018).
Russo, F. et al. miRandola 2017: A curated knowledge base of non-invasive biomarkers. Nucleic Acids Res. 46, D354–D359 (2018).
Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).

Acknowledgements

This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133657.

Author information

Authors and Affiliations

Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
Banabithi Bose
Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI, 53201, USA
Matthew Moravec
Department of Computer Science and Engineering, University of North Texas, Denton, TX, 76203, USA
Serdar Bozdag

Contributions

B.B. and S.B. conceived the study. B.B. conducted the study, and S.B. supervised it. B.B. and M.M. developed the software. B.B. wrote the manuscript, and B.B. and S.B. reviewed and edited it.

Corresponding authors

Correspondence to Banabithi Bose or Serdar Bozdag.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1–34.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bose, B., Moravec, M. & Bozdag, S. Computing microRNA-gene interaction networks in pan-cancer using miRDriver. Sci Rep 12, 3717 (2022). https://doi.org/10.1038/s41598-022-07628-z

Received: 22 November 2021
Accepted: 18 February 2022
Published: 08 March 2022
DOI: https://doi.org/10.1038/s41598-022-07628-z
