MicroRNA–mRNA interactions underlying colorectal cancer molecular subtypes

Colorectal cancer (CRC) transcriptional subtypes have been recently identified by gene expression profiling. Here we describe an analytical pipeline, microRNA master regulator analysis (MMRA), developed to search for microRNAs potentially driving CRC subtypes. Starting from a microRNA–mRNA tumour expression data set, MMRA identifies candidate regulator microRNAs by assessing their subtype-specific expression, target enrichment in subtype mRNA signatures and network analysis-based contribution to subtype gene expression. When applied to a CRC data set of 450 samples, assigned to subtypes by 3 different transcriptional classifiers, MMRA identifies 24 candidate microRNAs, in most cases downregulated in the stem/serrated/mesenchymal (SSM) poor prognosis subtype. Functional validation in CRC cell lines confirms downregulation of the SSM subtype by miR-194, miR-200b, miR-203 and miR-429, which share target genes and pathways mediating this effect. These results show that, by combining statistical tests, target prediction and network analysis, MMRA effectively identifies microRNAs functionally associated to cancer subtypes.


Min number of DBs predicting the interactions
List of microRNAs with a significant number of targets in the signature(s) for the subtype(s) in which they are differentially expressed. For each microRNAs is reported the classifier and the class in which the microRNA was differential and the sign. Then number of target mRNAs in the signature, hypergeometric p-value, Bonferroni adjusted p-value, observed on expected ratio and the minimum number of databases supporting microRNA target prediction used for the analysis

Mirna
Classifier miRNA sign signature

MMRA validation and comparisons with alternative procedures
To test the robustness and reliability of the MMRA pipeline we performed the following analyses: 1. Pipeline validation in two independent datasets 2. Comparisons with variants of the pipeline

Comparison with other pipelines and methods
For the sake of space, all the comparisons presented here were made for the CCMS classifier 1 because it is the one with the highest number of signature genes per subtype and therefore the one giving the largest lists of candidate microRNAs in output. We want to emphasize that the absence or low size of a gene signature is not an obstacle for the pipeline application. In fact, there are many procedures that can be applied for signature construction starting from an annotated expression dataset. However, it is better to reconstruct the gene signature on a dataset independent from the one that is used to apply the MMRA pipeline, to avoid overfitting and to guarantee that the defined signature is effectively well representative of the studied phenotypes.

Pipeline validation in two independent datasets
We considered testing the MMRA pipeline on an independent dataset, but we weren't able to find a paired mRNA/microRNA CRC expression dataset of at least 100 samples needed to apply the pipeline. Therefore we randomly divided the TCGA dataset in two subsets using the R function "sample()" 2,3 . We compared the outputs obtained in the two datasets at each step of the pipeline. After the first step, we obtained 55 differentially expressed microRNAs in dataset one and 67 in dataset two, with an intersection of 44 microRNAs (best / worst validation rate = 80% / 66%). After the second step of target transcript enrichment analysis we obtained 22 microRNAs in dataset one and 18 in dataset two, with an intersection of 15 microRNAs (best / worst validation rate = 83% / 69%). Finally, after the network analysis phase, we obtained 17 microRNAs in dataset one and 14 microRNAs in dataset two, with an intersection of 11 microRNAs (best / worst validation rate = 79% / 65%). These results allowed us to estimate that in suboptimal conditions due to reduced size of the data subsets, the independent validation rate was between 65% and 80% at all steps of the analysis. Notably, all four microRNAs experimentally validated on cell lines were included in the 11 cross-validated microRNAs, showing that biologically relevant interactions emerge repeatedly also when lower size datasets are employed for MMRA.

Comparison with variants of the pipeline
Five variations of the MMRA pipeline were implemented and compared with the original procedure that, when applied on CCMS-classified samples, yielded 13 microRNA/subtype associations, of which 5 (40%) were validated in cell lines. Of note, the comparison was made considering microRNA/subtype associations, not only on the number of microRNAs: one microRNA may have more than one subtype association.
-Alternative pipeline 1: microRNA differential expression analysis followed only by target enrichment analyses, i.e. only steps 1 and 2 of MMRA. Given that steps 3 and 4 of MMRA filtered out only 7 microRNAs, we considered their removal from the pipeline. This yielded 24 microRNA/subtype associations, of which only 6 (25%) validated in cell lines. All 13 associations identified by MMRA were of course also found here. This pipeline variant can therefore be considered slightly more sensitive and noticeably less specific.
-Alternative pipeline 2: microRNA differential expression analysis and target enrichment analyses followed by Stepwise linear regression, to test the contribution of the network analysis step to the performances of MMRA. This variant yielded 12 microRNA/subtype associations, of which only 3 (25%) validated in cell lines. Of the 13 associations identified by MMRA, 8 were also found by this pipeline, of which 1 (13%) validated in cell lines. This pipeline can therefore be considered less sensitive and substantially less specific.
-Alternative pipeline 3: microRNA differential expression analysis followed by GSEA analysis on genes ranked by their correlation with the microRNA. The GSEA step is aimed at verifying, without the use of thresholds, whether, among all expressed genes, microRNA predicted targets belonging to the associated subtype gene signature are significantly more anticorrelated or correlated with the microRNA across the whole dataset. This variant yielded 69 microRNA/subtype associations, of which only 19 (28%) validated in cell lines.
All 13 associations identified by MMRA were also found by this pipeline, that can therefore be considered more sensitive but considerably less specific.
-Alternative pipeline 4: microRNA differential expression analysis and target enrichment analyses followed by GSEA analysis to verify if the genes belonging to the signature associated to the microRNA are significantly more anticorrelated or correlated with the microRNA in respect to all the expressed genes. This variant yielded 23 microRNA/subtype associations, of which only 3 (13%) validated in cell lines. Of the 13 associations identified by MMRA, 11 were also found by this pipeline, of which 3 (27%) validated in cell lines. This pipeline can therefore be considered slightly more sensitive and substantially less specific.
-Alternative pipeline 5: microRNA differential expression analysis followed only by stepwise linear regression analysis. This yielded 10 microRNA/subtype associations, of which only 2 (20%) validated in cell lines. 2 of the 13 associations identified by MMRA were also found here. This pipeline variant can therefore be considered slightly less sensitive and noticeably less specific.
-Alternative pipeline 6: microRNA differential expression analysis followed by stepwise linear regression analysis restricted to the signature genes that have a miRNA-target relationship. This variant yielded 7 microRNA/subtype associations, of which only 2 (29%) validated in cell lines. 3 of the 13 associations identified by MMRA were also found by this pipeline, that can therefore be considered considerably less sensitive and less specific.
These results show that every tested change to the MMRA pipeline always reduced its specificity, in some cases increasing sensitivity and in others also reducing sensitivity.

Comparisons with other pipelines and methods
-Alternative pipelines. We considered for comparison simpler pipelines originally developed to discover microRNA-mRNA interactions dysregulated in cancer vs. normal tissue. It must be noted that the differences between normal and transformed tissues are much wider than those between tumor subtypes. Two such pipelines are available online: The first, by Fu and colleagues 4 , is also at the basis of other pipelines, and involves microRNA and mRNA differential expression analysis, followed by anticorrelation analysis, leading to the selection of anticorrelated microRNA/mRNA pairs in which the mRNA is also a predicted target of the microRNA; the second, by Pizzini and colleagues 5 , follows the basic steps of the Fu pipeline, but in the final output also the microRNA-mRNA pairs in which the mRNA is not significantly differentially expressed are reported. Moreover, this pipeline also integrates the effects of transcription factors on these interactions, which adds an additional variable to the interaction analysis. For this reason we selected for comparison the basic Fu pipeline. In our dataset, the output of this pipeline was composed of broad lists of microRNA-mRNA interactions. Each microRNA was frequently associated to more than one subtype. No prioritization was made in the output based on the potentiality of the microRNA to be driver of the associated class. Finally, this kind of pipeline takes in account only those microRNA-target interactions supported by anticorrelation, even if has been recently observed that microRNAs can act also indirectly through the regulation of silencing complexes 6 . Overall, the number of microRNAs identified by the two pipelines is comparable, but the number of microRNA/class associations was higher for the Fu pipeline, because it associated each microRNA to more than one subtype in the same classifier. This in principle could happen also for MMRA, but it did not occur in the present analysis. Therefore, while the MMRA pipeline identified 13 microRNAs, each with one subtype association, the Fu pipeline identified 40 microRNA/subtype associations involving 23 microRNAs. The fraction of associations validated in cell lines was reduced to 32% in the Fu pipeline. However, of the 13 interactions identified by MMRA, 8 were also identified by the FU pipeline, and the validation rate in cell lines for these associations raised to 63% (5 out of 8). This result indicates that combined use of the two pipelines could result in shorter but more reliable lists of microRNA/subtype associations.
-Alternative method. The last question that we addressed is if a simple miRNA differential expression analysis followed by analysis of anticorrelation with subtype signature genes could bring to the identification of the same microRNAs obtained by MMRA. As a first point, the output of such alternative procedure would be a list of microRNAs ranked by their anticorrelation to subtype signature genes, after which the problem of choosing significance thresholds and estimating FDR would have to be addressed. To avoid choosing thresholds, we verified whether the top subtype signature anti-correlated MiRs were different from those identified by MMRA. As described above, MMRA applied to CCMS classification and signatures identified 13 microRNAs of which 12 associated to the CCMS4-UP signature and one associated to the CCMS1-DOWN signature, where "UP" and "DOWN" mean up-regulation and down-regulation in the subtype, respectively. Given that the microRNA associated to the CCMS1-DOWN signature is also the only one differentially expressed in this class, correlation analysis would not allow comparing ranks.
For this reason the comparison was performed only for CCMS4-UP-associated microRNAs. The number of differentially expressed microRNAs in CCMS4 subtype is 38.
For each differentially expressed microRNA, we computed the mean correlation with the genes of the CCMS4-UP signature. Then, microRNAs were ranked by increasing average correlation values, and the 12 microRNAs most anticorrelated with the CCMS4-UP signature were compared with the 12 microRNAs associated by MMRA to the CCMS4-UP signature. Only 7 microRNAs were present in both lists. Moreover miR-203, that we functionally validated in cell lines, was in position 17 of the anticorrelation ranking, therefore it would not be selected according to this kind of procedure. Interestingly the validation rate in cell lines of this alternative method is 33% while the validation rate of MMRA output restricted only to CCMS4 is 42%. This shows that a simple analysis of differential expression followed by correlation analysis couldn't capture the relationship between master MiRs and subtype profiles identified by MMRA. In fact some of the most anticorrelated microRNAs are not significant in the MMRA pipeline, and vice versa.