RBMMMDA: predicting multiple types of disease-microRNA associations

Accumulating evidence has shown that many miRNAs play fundamental and important roles in various biological processes, and that the deregulation of miRNAs is associated with a broad range of human diseases. However, the mechanisms underlying miRNA dysregulation are still not fully understood. All previous computational approaches can only predict binary associations between diseases and miRNAs. Predicting multiple types of disease-miRNA associations can further broaden our understanding of the molecular basis of diseases at the miRNA level. In this study, the model of Restricted Boltzmann machine for multiple types of miRNA-disease association prediction (RBMMMDA) was developed to predict four different types of miRNA-disease associations. Based on this model, we can obtain not only new miRNA-disease associations but also their corresponding association types. To our knowledge, RBMMMDA is the first model that can computationally infer the association types of miRNA-disease pairs. Leave-one-out cross validation was implemented for RBMMMDA, and the AUC of 0.8606 demonstrated its reliable and effective performance. In the case studies of lung cancer, breast cancer, and global prediction for all diseases simultaneously, 50, 42, and 45 of the top 100 predicted miRNA-disease association types, respectively, were confirmed by recent biological experimental literature.

Scientific Reports | 5:13877 | DOI: 10.1038/srep13877
RWRMDA and HDMP obtained reliable performance for miRNA-disease association prediction, but they cannot be applied to diseases without known related miRNAs. Furthermore, HDMP strongly relies on the choice of the number of neighbors considered in the model, and it does not set different values of this parameter for different diseases. Based on the assumption that if miRNAs are implicated in a specific tumor phenotype, their target genes will be aberrantly regulated, Xu et al. 4 constructed a miRNA-target dysregulated network (MTDN) by integrating target prediction results with miRNA and mRNA expression profiles from tumor and non-tumor tissues. Feature vectors were then extracted, and a support vector machine classifier was adopted to distinguish positive disease miRNAs from negative ones. However, this method requires information on known negative disease-related miRNAs. By integrating known miRNA-disease associations, disease-disease semantic similarity, and miRNA functional similarity, Chen et al. 57 further developed a semi-supervised prediction method (RLSMDA) within the framework of regularized least squares, based on the assumption that functionally similar miRNAs tend to be associated with similar diseases. RLSMDA achieved excellent performance in both cross validation and case studies of several important diseases. Notably, because RLSMDA is a semi-supervised method, it does not need negative samples. More importantly, RLSMDA demonstrated excellent predictive ability for diseases without any known related miRNAs.
However, all of these previous approaches can only predict binary associations between diseases and miRNAs. On the one hand, the rich information currently available about different types of miRNA-disease associations has not been well exploited for disease-related miRNA prediction. On the other hand, no previous computational method can predict the types of disease-miRNA associations. Predicting the different types of disease-miRNA associations can further broaden our understanding of the molecular basis of diseases at the miRNA level. In this paper, we developed the model of Restricted Boltzmann machine for multiple types of miRNA-disease association prediction (RBMMMDA) to predict different types of miRNA-disease associations. Based on this model, we can obtain not only new miRNA-disease associations but also their corresponding association types. The Restricted Boltzmann machine (RBM) has become a core building block of deep learning and has been successfully applied to various problems. We chose RBM to predict multiple types of miRNA-disease associations for the following reasons. Firstly, unlike the previous miRNA-disease association prediction methods outlined in the introduction, an RBM model can predict multi-type associations, and it has shown strong performance in multi-type drug-target interaction prediction 58 and neuroimaging data analysis 59 . Secondly, RBM provides a self-contained framework for obtaining competitive classifiers directly, without the need to collect biological features and perform feature selection as required when classical methods such as SVM are adopted 60 . Thirdly, RBM can capture strong high-order non-linear correlations between the activities of features in a layer, and has therefore demonstrated good predictive performance 58,60,61 . To our knowledge, RBMMMDA is the first model that can infer the association types of miRNA-disease pairs on a large scale.
Leave-one-out cross validation (LOOCV) was implemented for RBMMMDA based on the known experimentally verified multiple types of miRNA-disease associations obtained from HMDD. RBMMMDA achieved an AUC of 0.8606, demonstrating its reliable and effective performance. Furthermore, RBMMMDA was evaluated by case studies of lung cancer and breast cancer, in which 50 and 42 of the top 100 predicted miRNA-disease association types, respectively, were confirmed by recent biological experimental literature. In particular, RBMMMDA is a global approach that can predict miRNA-disease association types for all diseases simultaneously. We therefore applied RBMMMDA to all the diseases investigated in this study simultaneously and confirmed 45 of the top 100 predicted miRNA-disease association types. These confirmed associations involve as many as 10 important complex human diseases, such as breast cancer, hepatocellular cancer, non-small-cell lung cancer, and colorectal cancer. The performance in LOOCV and the case studies demonstrates the potential value of RBMMMDA for identifying miRNA-disease association types and detecting human disease biomarkers.

Results
Performance evaluation. In this paper, we implemented LOOCV on the known multiple types of miRNA-disease associations obtained from HMDD to evaluate the predictive performance of RBMMMDA. Here, we considered all diseases simultaneously to implement global LOOCV. As for the parameters, we set the learning rate to 0.01 and the number of iterations to 100, following a previous successful application of the Restricted Boltzmann machine (RBM) to drug-target interaction prediction 58 . Specifically, each known miRNA-disease association was left out in turn as the test association, and the remaining known multi-type miRNA-disease associations were used as seed associations. The RBMMMDA model was then trained and its predictions were obtained. The test association was then ranked relative to the candidate associations, which include all miRNA-disease pairs without known experimental evidence. If the test association was ranked above the given threshold, the RBM model was considered to have predicted it correctly.
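The ranking step of this LOOCV protocol can be sketched in Python as below. This is an illustrative sketch, not the authors' Java implementation: the function name `loocv_rank`, the flattening of predictions into one score array, and the tie rule (candidates with equal scores do not push the test association down) are assumptions. Re-training the model without the held-out association is assumed to have happened before the scores are passed in.

```python
import numpy as np

def loocv_rank(scores, known_mask, test_index):
    """Return the 1-based rank of a held-out association among the candidates.

    scores     -- predicted scores, any shape (e.g. miRNA x disease x type),
                  from a model re-trained WITHOUT the held-out association
    known_mask -- boolean array of the same shape; True for every known
                  association (including the held-out one)
    test_index -- index tuple of the held-out association
    """
    scores = np.asarray(scores, dtype=float)
    # Candidates are all pairs without known experimental evidence.
    candidate_scores = scores[~np.asarray(known_mask, dtype=bool)]
    test_score = scores[test_index]
    # Assumed tie rule: only strictly higher candidates outrank the test.
    return 1 + int(np.sum(candidate_scores > test_score))
```

With a top-k threshold, a held-out association would count as correctly predicted when `loocv_rank(...) <= k`.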
Finally, the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1-specificity), was drawn. Sensitivity is the percentage of test miRNA-disease associations ranked above the given threshold, and specificity is the percentage of candidate miRNA-disease associations ranked below the threshold. The area under the ROC curve (AUC) was then calculated to evaluate the performance of RBMMMDA: AUC = 1 indicates perfect performance, and AUC = 0.5 indicates random performance. RBMMMDA achieved an AUC of 0.8606 (See Fig. 1). Because RBMMMDA is the first method to predict multiple types of miRNA-disease associations, no existing method is available for a performance comparison. Nevertheless, the LOOCV result above indicates the good predictive ability of RBMMMDA.
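Under the definitions above (sensitivity: fraction of test associations ranked above a threshold; 1-specificity: fraction of candidates ranked above it), the ROC curve and AUC can be computed from the LOOCV ranks. The Python sketch below is one way to do this, not the paper's code. It assumes each test rank is first converted to a percentile `q`, the fraction of that round's candidates scored above the test association; sweeping the threshold over percentiles then makes the FPR at threshold τ equal to τ. The function names are hypothetical.

```python
import numpy as np

def rank_to_percentile(rank, n_candidates):
    """Fraction of candidates ranked above the test association (0 = top)."""
    return (rank - 1) / n_candidates

def roc_from_percentiles(q):
    """ROC points for percentile thresholds tau in [0, 1].

    FPR(tau) = tau (fraction of candidates above the threshold) and
    TPR(tau) = fraction of test associations with percentile <= tau,
    a step function that rises by 1/len(q) at each sorted q value.
    """
    q = np.sort(np.asarray(q, dtype=float))
    n = len(q)
    fpr = np.concatenate(([0.0], q, [1.0]))
    tpr = np.concatenate(([0.0], np.arange(1, n + 1) / n, [1.0]))
    return fpr, tpr

def auc_from_roc(fpr, tpr):
    """Area under the step-shaped ROC curve (TPR is constant between points)."""
    return float(np.sum(np.diff(fpr) * tpr[:-1]))
```

For this step-shaped curve the area equals 1 - mean(q), so a method that ranks every test association first gives AUC = 1, while uniformly random ranks give an expected AUC of 0.5.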
Case studies. Researchers in computational biology and machine learning are much concerned about overfitting, in which the training error keeps decreasing while the generalization error starts to increase. To examine whether RBMMMDA tends to overfit, case studies were implemented to validate the multiple types of miRNA-disease associations in the prediction list. All known multiple types of miRNA-disease associations in the gold standard dataset were used as training samples to predict potential miRNA-disease associations and their association types for several important diseases with RBMMMDA. The predictions were then checked against recent biological experimental results to demonstrate the predictive ability of RBMMMDA.
Breast cancer is currently the leading type of invasive cancer in women worldwide; it was estimated that approximately 231,840 new cases of invasive breast cancer and 40,290 breast cancer deaths would occur among US women in 2015 62 . The number of affected people is still increasing and has been predicted to reach nearly 3.2 million new cases per year by 2050 63 . About one in eight women in the United States will develop invasive breast cancer during her lifetime. Breast cancer can also be diagnosed in men, but at a much lower rate than in women 64 . The majority of breast cancer deaths occur in developing countries, where most women are diagnosed at late stages 65 . Recently, growing evidence has shown that several miRNAs are highly correlated with breast cancer and play important roles in its tumorigenesis. There are 176 miRNAs known to be related to breast cancer in the gold standard dataset, and the associations between breast cancer and its related miRNAs are categorized into four different types according to the supporting evidence. For example, mir-10b, which is up-regulated in metastatic breast carcinomas compared with benign breast lesions, targets E-cadherin to promote tumor cell invasion, while mir-122 is down-regulated in breast cancer cells and inhibits tumorigenesis by targeting IGF1R 66,67 . Associations of this kind are classified under the miRNA-target interaction evidence type. We applied RBMMMDA to prioritize candidate miRNAs without known relevance to breast cancer. As a result, among the top 10, 20 and 100 potential breast cancer-related miRNAs, 7, 13 and 42 miRNA-disease associations and their predicted association types, respectively, are supported by biological experimental literature (See Table 1 and Supplementary Table 1).
It is well known that the let-7 family mainly functions as tumor suppressors that inhibit breast cancer development and migration. Seven miRNAs from the let-7 family were ranked in the top 10 prediction list, and five of them have been confirmed by experimental literature. For example, let-7i and let-7b have both been confirmed to inhibit breast cancer invasion by targeting oncogenes and tumor migration-related genes, and to induce tamoxifen sensitivity in breast cancer by repressing estrogen receptor α 68-72 ; down-regulation of let-7g promotes breast cancer invasion by stimulating GAB2 and FN1 expression 73 ; androgen-induced let-7a expression contributes to ER-, PR-, AR+ breast cancer pathogenesis 74 ; and let-7f is a tumor-suppressor miRNA in breast cancer that is induced by aromatase inhibitor (AI) treatment to inhibit the aromatase gene and is also involved in low-dose metronomic (LDM) paclitaxel therapy by targeting thrombospondin-1 75,76 . Both mir-193b and mir-221 were ranked in the top 10 prediction list for breast cancer and have been confirmed by the literature. Mir-193b is decreased in breast cancer cells, which allows the expression of its target genes DNAJC13 and RAB22A and promotes breast cancer progression 77 . Plasma mir-221 accumulates in breast cancer patients 78 and may be a predictive biomarker of sensitivity to neoadjuvant chemotherapy in patients with breast cancer 79 .
Figure 1. ROC curve of RBMMMDA in global LOOCV. RBMMMDA achieved an AUC of 0.8606, demonstrating its reliable predictive ability. More importantly, RBMMMDA is the first method that can computationally predict multiple types of miRNA-disease associations.
According to the American Cancer Society, lung cancer is the most common cause of cancer death worldwide in both men and women, accounting for about 13% of all new cancers and 27% of all cancer deaths, more than colon, breast, and prostate cancers combined 80 . An estimated 1.4 million people die of lung cancer each year [80][81][82][83] . The most affected populations are in North America, Europe and East Asia. In particular, lung cancer has become the leading cause of death among people with malignant tumors in China, and the registered lung cancer mortality rate has increased by 464.84% 84 . The five-year survival rate of lung cancer is much lower than that of many other leading cancers, such as breast cancer and prostate cancer, because most lung cancer cases are diagnosed at a late stage 80,81,[85][86][87] . It is therefore important and urgent to study the mechanisms of lung cancer tumorigenesis and to screen for new biomarkers for early detection 80,81,[88][89][90] . Recently, many miRNAs have been shown to play critical roles in lung cancer development and progression [91][92][93] . In our gold standard dataset, there are 52 lung cancer-related miRNAs with various association types. For example, mir-101 is reduced in non-small cell lung cancer (NSCLC) and can suppress NSCLC development by targeting enhancer of zeste homolog 2 (EZH2) 94 ; in contrast, plasma miR-29c is significantly increased in NSCLC 95 . We then prioritized candidate miRNAs by the scores calculated with RBMMMDA. Half of the top 100 potential lung cancer-related miRNAs and their association types are confirmed by the literature. In particular, 90% of the predictions in both the top 10 and the top 20 lists have literature evidence (See Table 2 and Supplementary Table 2).
Among the top 10 potential lung cancer-related miRNAs, members of the mir-34 family (mir-34a and mir-34b) function as tumor-suppressive miRNAs that induce apoptosis and inhibit proliferation in lung cancer cells by directly targeting TGFβR2 and Met, and they are inactivated by CpG methylation at their promoter regions [96][97][98][99][100][101][102][103] ; mir-218, mir-133a and mir-143 are also tumor suppressors that inhibit tumor cell invasion by targeting tumorigenesis-related genes in lung cancer, such as N-cadherin and oncogenic receptors [104][105][106][107][108][109][110][111][112][113] . Confirmed lung cancer-related miRNAs of the third association type also exist; for example, sequence variants of mir-146a are associated with an increased risk of NSCLC. Only mir-16 in the top 10 list currently has no supporting evidence.
RBMMMDA is a global ranking method that can predict potential multiple association types of miRNA-disease pairs for all diseases simultaneously. Therefore, RBMMMDA was further applied to rank all candidate miRNA-disease associations simultaneously. As a result, 45 of the top 100 potential associations have experimental evidence (See Table 3 and Supplementary Table 3). Besides breast and lung cancer, many other diseases are involved in the top 100 list, such as hepatocellular carcinoma, stomach neoplasms, and melanoma. In the top 10 prediction list, apart from the 7 breast cancer-related miRNAs whose regulatory mechanisms are described above, the other 3 miRNAs all belong to the mir-34 family. MiR-34a has been shown to inhibit the growth and metastasis of gastric cancer by directly targeting Met and PDGFR 114 . The CpG island methylation frequency of miR-34b in hepatocellular carcinoma (HCC) is significantly higher in tumor tissue than in adjacent non-tumor tissue, which may correlate with the inactivation of miR-34b in HCC 115 .
MiR-34c is predicted to be related to prostatic neoplasms, for which there is currently no experimental evidence. In the top 100 associations, we can further observe that one miRNA may be related to one disease through different association types, and may also be associated with different diseases through the same association type. For example, miR-34a, whose methylation frequency is significantly higher in hepatocellular carcinoma than in non-tumor tissues, inhibits the invasion of both stomach neoplasms and hepatocellular carcinoma by targeting Met [114][115][116][117] .
In conclusion, 50, 42, and 45 of the top 100 predicted miRNA-disease association types for lung cancer, breast cancer, and global prediction, respectively, have been confirmed by recent biological experimental literature. These validation results suggest that RBMMMDA does not have a strong tendency to overfit.
Predicting novel multiple types of miRNA-disease associations. After confirming the reliable performance of RBMMMDA through LOOCV and case studies, we further applied RBMMMDA to predict potential human miRNA-disease associations under the four different association types for all the diseases investigated in this paper. All known multi-type miRNA-disease associations obtained from HMDD were used as training data. For each of the 174 diseases, we publicly released the top 100 potential related miRNAs under each of the four association types to facilitate experimental validation of human miRNA-disease associations. In the above case studies of breast cancer and lung cancer, a substantial fraction of the top 100 multi-type predictions (42 and 50, respectively) have been confirmed. It is therefore anticipated that other potential multi-type miRNA-disease associations predicted by RBMMMDA could be validated by future biological experiments.
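Producing the released lists amounts to ranking, for each disease and association type, the miRNAs without a known association by their predicted probability. A minimal Python sketch, assuming predictions have been collected into a hypothetical array `probs[miRNA, disease, type]` with a matching boolean `known` array (both names and the layout are illustrative, not the authors' data format):

```python
import numpy as np

def top_k_mirnas(probs, known, disease, assoc_type, k=100):
    """Rank miRNAs for one disease and one association type.

    probs -- predicted probabilities, shape (n_mirnas, n_diseases, n_types)
    known -- boolean array of the same shape marking known associations
    Returns up to k (miRNA index, probability) pairs, highest first,
    excluding miRNAs already known to have this association.
    """
    scores = probs[:, disease, assoc_type]
    candidates = np.flatnonzero(~known[:, disease, assoc_type])
    # Stable sort on negated scores: ties keep the original miRNA order.
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    return [(int(i), float(scores[i])) for i in order[:k]]
```

Calling this once per disease and per type, with k = 100, yields lists in the shape described above.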

Discussion
Identifying novel miRNA-disease associations and their corresponding association types is a vitally important goal in biology, as it plays a critical role in understanding disease pathogenesis at the miRNA level. In this paper, we proposed the first computational method, RBMMMDA, to predict different types of miRNA-disease associations on a large scale based on known multiple types of miRNA-disease associations derived from HMDD. By constructing an RBM model, RBMMMDA effectively encodes multiple types of miRNA-disease associations and can predict different types of miRNA-disease associations, including genetics, epigenetics, circulating miRNAs and miRNA-target interactions. The performance of RBMMMDA was evaluated by implementing LOOCV on the known experimentally verified multiple types of miRNA-disease associations, and the AUC of 0.8606 demonstrated its reliable and effective performance. Moreover, we implemented case studies of lung cancer and breast cancer for further evaluation, in which 50 and 42 of the top 100 predicted miRNA-disease association types, respectively, were confirmed by recent biological experimental literature. More importantly, RBMMMDA was applied to predict multiple types of miRNA-disease associations for all diseases simultaneously, and 45 of the top 100 results were confirmed. All of these results show the reliable performance of RBMMMDA. It is anticipated that RBMMMDA could be a valuable computational tool for miRNA-disease association prediction and for miRNA biomarker identification in human disease diagnosis, treatment, prognosis and prevention.
The reliable performance of RBMMMDA can largely be attributed to a combination of several factors. Firstly, RBMMMDA takes full advantage of the known multiple types of miRNA-disease associations obtained from HMDD to make predictions, which can help us further understand the molecular basis of diseases at the miRNA level under four different association types. Secondly, to our knowledge, whereas previous methods could only predict binary associations between miRNAs and diseases, RBMMMDA is the first computational approach for multiple types of miRNA-disease association prediction: it can predict not only potential miRNA-disease associations but also their corresponding association types. Finally, RBMMMDA can predict miRNA-disease association types for all diseases simultaneously. Of course, the current version of RBMMMDA also has some limitations. Firstly, how to choose appropriate parameter values for RBMMMDA has not yet been well solved. Secondly, the current version of RBMMMDA uses only the information of known multiple types of miRNA-disease associations. In the future, additional biological information, such as disease similarity and miRNA functional similarity, could be incorporated into the model to further improve its performance. However, the current RBM model only considers connections between the visible and hidden layers and does not allow connections within the same layer, so how to integrate disease similarity and miRNA similarity data still requires careful consideration. Thirdly, RBMMMDA is not applicable to diseases without any known miRNA-disease associations. Finally, RBMMMDA may be biased toward miRNAs with more known associated diseases. As more experimentally verified multiple types of human miRNA-disease associations become available, the performance of RBMMMDA is expected to improve further.
Table 3. RBMMMDA is a global ranking method that can predict potential multiple association types of miRNA-disease pairs for all diseases simultaneously. Therefore, RBMMMDA was further applied to rank all candidate miRNA-disease associations simultaneously. As a result, 13 of the top 20 potential associations have experimental evidence.

Methods
Multiple types of miRNA-disease associations. In this paper, we downloaded miRNA-disease association data from HMDD V2.0 (http://www.cuilab.cn/hmdd), constructed by Li et al. 44 , which provides a comprehensive resource of experimentally verified miRNA-disease associations and lays an important data foundation for further miRNA-related computational research. The new version of the database annotates miRNA-disease associations in more detail, distinguishing associations supported by miRNA-target interactions, circulation, epigenetics, and genetics. After removing duplicate associations supported by different evidence, we obtained 1680 distinct, high-quality, experimentally confirmed multi-type miRNA-disease associations involving 174 diseases, 322 miRNAs and 4 association types, which were used as training samples. Specifically, the data contain 682 miRNA-target interaction, 443 circulation, 199 epigenetics, and 356 genetics associations (see Supplementary Table 4). To our knowledge, this is the largest multi-type miRNA-disease training dataset to date. Because the training samples are incomplete and no previous computational model has addressed this problem, predicting multiple types of miRNA-disease associations is a difficult challenge. Nevertheless, RBMMMDA still obtained reliable predictive performance in both LOOCV (AUC of 0.8606) and the case studies of lung cancer, breast cancer, and global prediction for all diseases simultaneously. More associations becoming available in the future would further improve the predictive performance of RBMMMDA.
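The de-duplication and counting step can be sketched as follows. This assumes, hypothetically, that the downloaded records are available as `(miRNA, disease, evidence_type)` tuples and that a duplicate is the same triple reported more than once (for example by several papers); under that reading the per-type counts sum to the number of distinct multi-type associations, consistent with 682 + 443 + 199 + 356 = 1680. The type labels and the function name are illustrative, not HMDD's field names.

```python
from collections import Counter

# Hypothetical labels for the four HMDD evidence categories.
TYPES = ("target", "circulation", "epigenetics", "genetics")

def build_dataset(records):
    """Deduplicate (miRNA, disease, type) records and summarize them.

    Returns the sorted unique triples, a Counter of associations per type,
    and the sorted lists of miRNAs and diseases that occur in the data.
    """
    unique = sorted(set(records))
    for _, _, t in unique:
        if t not in TYPES:
            raise ValueError(f"unknown association type: {t!r}")
    counts = Counter(t for _, _, t in unique)
    mirnas = sorted({m for m, _, _ in unique})
    diseases = sorted({d for _, d, _ in unique})
    return unique, counts, mirnas, diseases
```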

RBMMMDA.
In this study, we developed the model of Restricted Boltzmann machine for multiple types of miRNA-disease association prediction (RBMMMDA) to predict different types of miRNA-disease associations (see Fig. 2; the model is motivated by the work of Wang and Zeng 58 ). Based on this model, we can obtain not only new miRNA-disease associations but also their corresponding association types. RBMs have been successfully applied in many important research fields 58,118,119 .
As shown in Fig. 2, an RBM is a two-layer undirected graphical model consisting of a layer of visible units and a layer of hidden units. In our RBM model, each visible unit represents a disease, and the hidden units represent unknown features describing miRNA-disease associations. There are no connections within the visible layer or within the hidden layer, and each visible unit is connected to all hidden units.
In the first step in Fig. 2, a simple example is provided to demonstrate how RBMs are constructed from a multidimensional miRNA-disease interaction network. This simple network includes two miRNAs and three diseases. Firstly, each miRNA-disease pair is associated with four binary variables, which indicate whether the pair has an association of each type (i.e., miRNA-disease associations supported by evidence from miRNA-target interactions, circulation, epigenetics and genetics, respectively). Then, a separate RBM is constructed for each miRNA. Here, two RBMs are constructed, and each contains three visible units representing the three diseases. The binary numbers inside the rectangles are the states of the visible units, indicating whether the disease and the miRNA are connected under each specific type. The RBM model captures the existing multi-type connections between diseases and miRNAs in the multidimensional interaction network to make further predictions. In this example, miRNA 1 has known miRNA-disease associations. Therefore, the three diseases send messages to the hidden units and update their states, yielding the hidden-unit states for miRNA 1. The hidden units then send messages back to the visible units and update their states. Based on this idea, the RBM is trained and potential multiple types of miRNA-disease associations are obtained.
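The per-miRNA input described above can be made concrete with a short Python sketch. The toy associations are hypothetical (they are not the ones drawn in Fig. 2), and encoding a missing association as 0 in the visible layer is an assumption of this sketch. Each miRNA gets an n_diseases x 4 binary matrix whose columns follow the type order given in the text, plus an indicator vector `r` marking the diseases with any known association.

```python
import numpy as np

# Column order follows the text: miRNA-target interactions, circulation,
# epigenetics, genetics.
TYPES = ["target", "circulation", "epigenetics", "genetics"]

def visible_matrix(mirna, diseases, triples):
    """Build the visible states for one miRNA's RBM.

    Returns v with shape (len(diseases), 4), where v[i, k] = 1 if the miRNA
    is associated with disease i under type k, and r with r[i] = 1 if the
    miRNA has any known association with disease i.
    """
    d_index = {d: i for i, d in enumerate(diseases)}
    v = np.zeros((len(diseases), len(TYPES)), dtype=np.int8)
    for m, d, t in triples:
        if m == mirna:
            v[d_index[d], TYPES.index(t)] = 1
    r = v.any(axis=1).astype(np.int8)
    return v, r

# Hypothetical toy network: 2 miRNAs, 3 diseases.
diseases = ["d1", "d2", "d3"]
triples = [("m1", "d1", "target"), ("m1", "d1", "genetics"),
           ("m1", "d3", "circulation"), ("m2", "d2", "epigenetics")]
v1, r1 = visible_matrix("m1", diseases, triples)
v2, r2 = visible_matrix("m2", diseases, triples)
```

Here `v1` has rows `[1, 0, 0, 1]`, `[0, 0, 0, 0]` and `[0, 1, 0, 0]`: miRNA m1 is linked to d1 through two types at once, which a single binary edge could not express.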
Formally, in the visible layer, $v_i^k \in \{0, 1\}$, $i = 1, \dots, n$, $k = 1, \dots, K$, denotes the state of the $i$-th visible unit (disease) under the $k$-th association type, where $n$ is the number of diseases and $K = 4$ is the number of association types. In the hidden layer, $h_j$, $j = 1, \dots, m$, denotes the state of the $j$-th hidden unit. Let $W_{ij}^k$ be the weight between visible variable $v_i^k$ and hidden variable $h_j$, and let $a_i^k$ and $b_j$ denote the bias weights of the visible and hidden units, respectively. To further formulate our RBM model, a binary indicator vector $\mathbf{r} = (r_1, \dots, r_n)$ is adopted, in which $r_i = 1$ if there is a known interaction between the input miRNA and the $i$-th disease, and $r_i = 0$ otherwise. $D_{ij}$ is a parameter describing the effect of $r_i$ on $h_j$.
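The energy of a joint configuration $(\mathbf{v}, \mathbf{h})$ is then defined by
$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{K} W_{ij}^{k} h_j v_i^{k} - \sum_{i=1}^{n}\sum_{k=1}^{K} a_i^{k} v_i^{k} - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} r_i D_{ij} h_j \tag{1}$$
and the probability of a joint configuration by
$$p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp\big(-E(\mathbf{v}, \mathbf{h})\big), \qquad Z = \sum_{\mathbf{v}', \mathbf{h}'} \exp\big(-E(\mathbf{v}', \mathbf{h}')\big), \tag{2}$$
where $Z$ is called the normalizing constant or partition function. The marginal distribution over the visible data $\mathbf{v}$ is obtained by summing over all possible configurations of $\mathbf{h}$:
$$p(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp\big(-E(\mathbf{v}, \mathbf{h})\big). \tag{3}$$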
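For a very small model, the joint and marginal distributions can be checked numerically by enumerating every configuration. The Python sketch below assumes the conditional-RBM form of the energy, E(v, h) = -Σ W_ij^k h_j v_i^k - Σ a_i^k v_i^k - Σ b_j h_j - Σ r_i D_ij h_j, with array shapes chosen for illustration. Brute-force enumeration is only feasible at toy sizes, which is why training relies on Contrastive Divergence instead of computing Z exactly.

```python
import itertools
import numpy as np

def energy(v, h, W, a, b, D, r):
    """Energy of one joint configuration.

    Shapes: v (n, K), h (m,), W (n, m, K), a (n, K), b (m,), D (n, m), r (n,).
    """
    return -(np.einsum("ijk,j,ik->", W, h, v)  # sum_ijk W_ij^k h_j v_i^k
             + np.sum(a * v)                   # sum_ik a_i^k v_i^k
             + b @ h                           # sum_j b_j h_j
             + r @ D @ h)                      # sum_ij r_i D_ij h_j

def brute_force_marginal(W, a, b, D, r):
    """Map each visible configuration (flattened 0/1 tuple) to p(v).

    Enumerates all 2^(n*K) visible and 2^m hidden states, so it only
    works for toy sizes.
    """
    n, m, K = W.shape
    hs = [np.array(bits, dtype=float)
          for bits in itertools.product([0, 1], repeat=m)]
    unnormalized = {}
    for bits in itertools.product([0, 1], repeat=n * K):
        v = np.array(bits, dtype=float).reshape(n, K)
        # Sum exp(-E) over all hidden configurations for this v.
        unnormalized[bits] = sum(np.exp(-energy(v, h, W, a, b, D, r)) for h in hs)
    Z = sum(unnormalized.values())  # partition function
    return {bits: w / Z for bits, w in unnormalized.items()}
```

Summing out h analytically gives p(v) ∝ exp(Σ a_i^k v_i^k) Π_j (1 + exp(b_j + Σ v_i^k W_ij^k + Σ r_i D_ij)), a convenient cross-check on such an enumeration.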
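According to equation (3), we can obtain the probability distribution over the input data. Because there are no intra-layer connections in the visible or hidden layer, the conditional probabilities factorize as
$$p(h_j = 1 \mid \mathbf{v}, \mathbf{r}) = \sigma\Big(b_j + \sum_{i=1}^{n}\sum_{k=1}^{K} v_i^{k} W_{ij}^{k} + \sum_{i=1}^{n} r_i D_{ij}\Big), \tag{4}$$
$$p(v_i^{k} = 1 \mid \mathbf{h}) = \sigma\Big(a_i^{k} + \sum_{j=1}^{m} h_j W_{ij}^{k}\Big), \tag{5}$$
where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function. However, the values of the parameters $W_{ij}^k$, $a_i^k$, $b_j$ and $D_{ij}$ are unknown. Therefore, a mean-field version of the Contrastive Divergence (CD) algorithm is adopted to train the RBM and obtain these parameters. In the CD algorithm, the following updates are applied in each training pass to incrementally adjust the weights and biases so as to maximize the likelihood of the visible data with respect to $W_{ij}^k$, $a_i^k$, $b_j$ and $D_{ij}$:
$$\Delta W_{ij}^{k} = \varepsilon\big(\langle v_i^{k} h_j\rangle_{\text{data}} - \langle v_i^{k} h_j\rangle_{T}\big), \tag{6}$$
$$\Delta a_i^{k} = \varepsilon\big(\langle v_i^{k}\rangle_{\text{data}} - \langle v_i^{k}\rangle_{T}\big), \tag{7}$$
$$\Delta b_j = \varepsilon\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{T}\big), \tag{8}$$
$$\Delta D_{ij} = \varepsilon\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{T}\big)\, r_i, \tag{9}$$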
where $\varepsilon$ is the learning rate, $\langle\cdot\rangle_{\text{data}}$ denotes the average over all input data for each update, and $\langle\cdot\rangle_{T}$ denotes the average over $T$ mean-field iterations. The parameters of the RBM model are obtained with the CD algorithm, and the trained model can then be used for prediction.
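The factorized conditionals and one mean-field CD training pass can be sketched together in NumPy. This is an illustrative version, not the jaRBM implementation the authors used, and it rests on assumptions the text does not state: each miRNA contributes one training example whose visible matrix uses 0 for unknown associations; the negative phase starts from the data-driven hidden probabilities and runs T mean-field steps; and the angle-bracket averages are taken over the miRNAs in the batch. Array shapes and function names are choices of this sketch.

```python
import numpy as np

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden(V, R, W, b, D):
    """p(h_j = 1 | v, r) for a batch: V (N, n, K), R (N, n) -> (N, m)."""
    return sigmoid(b + np.einsum("sik,ijk->sj", V, W) + R @ D)

def p_visible(H, W, a):
    """p(v_i^k = 1 | h) for a batch: H (N, m) -> (N, n, K)."""
    return sigmoid(a + np.einsum("sj,ijk->sik", H, W))

def cd_update(V, R, W, a, b, D, lr=0.01, T=1):
    """Apply one mean-field CD update in place.

    V -- visible data, shape (N, n, K), one example per miRNA
    R -- indicator vectors r, shape (N, n)
    W (n, m, K), a (n, K), b (m,), D (n, m) are float arrays updated in place.
    """
    if T < 1:
        raise ValueError("T must be at least 1")
    N = V.shape[0]
    # Positive phase: hidden probabilities driven by the data.
    H_data = p_hidden(V, R, W, b, D)
    # Negative phase: T alternating mean-field updates.
    H_T = H_data
    for _ in range(T):
        V_T = p_visible(H_T, W, a)
        H_T = p_hidden(V_T, R, W, b, D)
    # Parameter updates: <.>_data minus <.>_T, averaged over the batch.
    W += lr * (np.einsum("sik,sj->ijk", V, H_data)
               - np.einsum("sik,sj->ijk", V_T, H_T)) / N
    a += lr * (V.mean(axis=0) - V_T.mean(axis=0))
    b += lr * (H_data.mean(axis=0) - H_T.mean(axis=0))
    D += lr * np.einsum("si,sj->ij", R, H_data - H_T) / N
```

Each visible unit v_i^k gets its own logistic output rather than a softmax over the four types, so one miRNA-disease pair can be assigned several association types at once, matching the four independent binary variables per pair.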
Prediction and Implementation. To predict unknown interactions between diseases and miRNAs, we compute the following conditional probabilities after one mean-field iteration:
$$\hat{h}_j = p(h_j = 1 \mid \mathbf{v}, \mathbf{r}) = \sigma\Big(b_j + \sum_{i=1}^{n}\sum_{k=1}^{K} v_i^{k} W_{ij}^{k} + \sum_{i=1}^{n} r_i D_{ij}\Big), \tag{10}$$
$$p(v_i^{k} = 1 \mid \hat{\mathbf{h}}) = \sigma\Big(a_i^{k} + \sum_{j=1}^{m} \hat{h}_j W_{ij}^{k}\Big). \tag{11}$$
Because there are no intra-layer connections between any pair of visible or hidden units, given the input data we can obtain the states of the hidden units from equation (10). Then, using equation (11), we obtain the probability distribution of the visible units as our final prediction. We implemented the whole algorithm in Java using the jaRBM package. Following a previous successful application of RBM to potential drug-target interaction prediction 58 , we set the number of hidden units to $m = 100$ and the learning rate to $\varepsilon = 0.01$, and initialized $W_{ij}^k$, $h_j$, $a_i^k$ and $D_{ij}$ from a Gaussian distribution with a standard deviation of 0.1. For all other parameters, we used the default values defined in the jaRBM package.
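The prediction step and the reported initialization (m = 100 hidden units, Gaussian initialization with standard deviation 0.1) can be sketched in Python as follows. This is not the jaRBM code the authors used. The function names and the parameter dictionary are hypothetical, and because the text lists h_j among the initialized quantities, treating it as the hidden biases b_j and drawing them from the same Gaussian is an assumption of this sketch. The learning rate only matters during training, so it does not appear here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_diseases, n_hidden=100, n_types=4, std=0.1, seed=0):
    """Gaussian initialization (mean 0, standard deviation std) of all parameters."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, std, (n_diseases, n_hidden, n_types)),
        "a": rng.normal(0.0, std, (n_diseases, n_types)),
        "b": rng.normal(0.0, std, n_hidden),  # assumption: hidden biases
        "D": rng.normal(0.0, std, (n_diseases, n_hidden)),
    }

def predict(v, r, params):
    """One mean-field pass for a single miRNA.

    v -- known associations, shape (n_diseases, n_types); r -- shape (n_diseases,)
    Returns p(v_i^k = 1 | h_hat) for every disease i and type k.
    """
    W, a, b, D = params["W"], params["a"], params["b"], params["D"]
    h_hat = sigmoid(b + np.einsum("ik,ijk->j", v, W) + r @ D)  # hidden step
    return sigmoid(a + np.einsum("j,ijk->ik", h_hat, W))       # visible step
```

Entries of the returned matrix for disease-type pairs without known evidence would serve as the candidate scores to be ranked.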
Webserver of RBMMMDA. In addition, we built a web server that implements the prediction function of RBMMMDA. It is freely available at http://42.120.43.172/RBMMMDA/ and enables the prediction of multiple types of miRNA-disease associations based on RBMMMDA. When a visitor chooses a specific disease, potential miRNAs associated with that disease under the various association types are provided. The final prediction results are shown in a table that includes the miRNA name, association type, and potential association probability.