Introduction

MicroRNAs (miRNAs) are a class of short non-coding RNAs (21–25 nt)1,2,3. As important post-transcriptional regulators, miRNAs are widely involved in the biological processes of disease-related gene regulation and are closely related to human multi-gene diseases4,5,6,7. Increasing evidence has demonstrated that miRNAs play a critical role in the onset and progression of diseases8,9. Hence, revealing miRNA-disease associations is an efficient way to improve our understanding of disease pathology at the molecular level10,11,12,13.

As detecting miRNA-disease associations by experimental techniques is expensive and time-consuming, many computational methods for predicting miRNA-disease associations have been proposed. For example, Jiang et al.14 proposed a computational approach that infers potential miRNA-disease associations using the hypergeometric distribution and, for a given disease, prioritizes all human miRNAs. Jiang et al.15 further improved the calculation of the concordance score between a miRNA and a given disease. Chen et al.16 first presented RWRMDA, a prediction method based on global network similarity that predicts novel human miRNA-disease associations by performing a random walk on the miRNA functional similarity network. Xuan et al.17 then developed a reliable random-walk-based prediction method that assigns different weights to the miRNA transition matrix depending on whether a miRNA is associated with the given disease, thereby exploiting the prior information of nodes and topologies of different ranges. They also extended the walk to a miRNA-disease bipartite network to predict candidate miRNAs, especially for diseases without any known related miRNAs. Furthermore, Chen et al.18 developed WBSMDA, which infers miRNA-disease associations by integrating miRNA functional similarity, disease semantic similarity, known miRNA-disease associations and Gaussian interaction profile kernel similarity into a heterogeneous network. WBSMDA can handle both new diseases without any known associated miRNAs and new miRNAs without any known associated diseases. In 2016, Zeng et al.19 reviewed methods for predicting disease-miRNA associations based on biological interaction networks and, after comparing these methods in detail, pointed out the current challenges in this task.

In 2017, Liu et al.20 proposed a method to explore potential disease-related miRNAs by integrating multiple biological data sources. In recent years, recommendation algorithms have been successfully applied in many fields. Chen et al.21 presented HAMDA (hybrid approach for miRNA-disease association prediction), which is based on hybrid recommendation and combines available biological data with network-based inference. However, like the methods above, HAMDA prioritizes miRNAs using only the neighbor nodes at the same layer of miRNAs and diseases, rather than making use of the different structural and topological characteristics of the subgraphs of heterogeneous networks. Luo et al.22 proposed an effective prediction model that uses an unbalanced bi-random walk to improve prediction performance, fully exploiting the different topological and structural characteristics of the miRNA similarity network and the disease similarity network. This method improved prediction performance but ignored the prior information and the topological structure of the bipartite network. In 2018, Zeng et al.23 found that heterogeneous miRNA-disease networks yield better predictions than single disease similarity networks, miRNA similarity networks or known disease-gene association networks, and they therefore adopted a structural perturbation method to improve the accuracy of miRNA-disease association prediction.

We believe that the topological and structural features of heterogeneous networks contain important information for discovering more reliable miRNA-disease associations. In the present work, we develop an efficient computational method based on a hybrid recommendation approach and an unbalanced bi-random walk, called BRWHNHA (Bi-random Walk on Heterogeneous Network based on Hybrid Approach). BRWHNHA exploits the characteristics of nodes and the topological structure of the known miRNA-disease associations through the hybrid recommendation approach, and takes advantage of the different topological structures of the miRNA and disease similarity networks through a bi-random walk on the heterogeneous network. By computing the transition matrix of the bipartite network, the hybrid recommendation algorithm adds virtual edges to the heterogeneous network, so that the unbalanced bi-random walk on the new heterogeneous network can find potential disease-related miRNAs more efficiently. To validate the prediction ability of BRWHNHA, we adopted five-fold cross-validation and compared BRWHNHA with MIDPE17, HAMDA21 and BRWH22. The average AUC of BRWHNHA is 2.13, 0.69 and 2.20 percentage points higher than that of the three methods, respectively. In case studies on lung neoplasms and prostatic neoplasms, 49 and 46 of the top 50 predicted associations, respectively, were confirmed, further demonstrating the ability of BRWHNHA to discover potential disease-associated miRNAs.

Results

To evaluate how effectively BRWHNHA uncovers undiscovered miRNA-disease associations, we compared it with MIDPE17, HAMDA21 and BRWH22 using five-fold cross-validation repeated 100 times on the dataset obtained by Luo and Xiao22. For a given disease, we randomly divided its known related miRNAs into five equal-sized subsets. In each round, one subset was used as the testing set and the other four as the training set, and after the five rounds we calculated the average AUC. To reduce false positives, we recalculated the miRNA similarity in each round of prediction to obtain a brand-new similarity matrix. We then calculated the association probability between the given disease and each miRNA with BRWHNHA and ranked all candidate miRNAs by this probability; the higher the miRNAs in the testing set were ranked, the better the performance. Because most diseases have only a few experimentally confirmed miRNA associations, prediction performance cannot be evaluated accurately for them. Hence, following Luo and Xiao22, we only tested the 22 diseases associated with at least 60 miRNAs. We show recall-precision curves only for breast neoplasms and lung neoplasms. In addition, we analyzed the effect of the parameters on the performance of BRWHNHA.
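The per-disease protocol above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' evaluation code: `five_fold_auc` and `rank_auc` are hypothetical names, `predict` is a hypothetical callable that rebuilds the similarity matrices from the training matrix and returns an n_m x n_d score matrix, the held-out associations of each fold are scored against all miRNAs with no known association to the disease, and AUC is computed in its rank (Mann-Whitney) form with ties counted as one half.

```python
import numpy as np

def rank_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def five_fold_auc(A, j, predict, seed=0):
    """Average AUC over five folds for disease column j of the 0/1 matrix A.

    Assumes disease j has at least five known miRNAs so that no fold is empty.
    """
    rng = np.random.default_rng(seed)
    known = np.flatnonzero(A[:, j])          # miRNAs known to be related to disease j
    unknown = np.flatnonzero(A[:, j] == 0)   # candidates treated as negatives
    folds = np.array_split(rng.permutation(known), 5)
    aucs = []
    for test in folds:
        A_train = A.copy()
        A_train[test, j] = 0                 # hide this fold's associations
        scores = predict(A_train)[:, j]      # similarities are recomputed inside predict
        aucs.append(rank_auc(scores[test], scores[unknown]))
    return float(np.mean(aucs))
```

Calling `five_fold_auc` with 100 different seeds and averaging the results mirrors the 100 repetitions described above.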

Performance evaluation

The novelty of BRWHNHA lies in computing the transition probability matrix of the bipartite network with a hybrid recommendation algorithm and then performing a bi-random walk on the resulting heterogeneous network. Without the hybrid recommendation algorithm, the average AUC is 83.55%, which is 2.14 percentage points lower than with it. Constructing the transition probability matrix with the hybrid recommendation algorithm is therefore important: exploiting the prior information and topological structure of the bipartite network improves prediction accuracy. MIDPE17, HAMDA21 and BRWH22 were run on the same heterogeneous network, using the best parameters reported in the original papers: α = 0.9 and γ = 0.8 for MIDPE; σ = 0.7 and ρ = 0.8 for HAMDA; and λ = 0.6, α = 0.4, r = 2, l = 1 for BRWH.

As illustrated in Table 1, the average AUC values of MIDPE, HAMDA, BRWH and BRWHNHA over the 22 diseases are 83.55%, 85.00%, 83.49% and 85.69%, respectively. BRWHNHA performed best, with an average AUC 2.13, 0.69 and 2.20 percentage points higher than the other three methods. Moreover, BRWHNHA outperforms MIDPE and BRWH on all 22 diseases. Although HAMDA achieves a higher AUC than BRWHNHA on 7 of the 22 diseases, BRWHNHA performs better on most diseases. Because HAMDA reuses the known miRNA-disease association data when measuring miRNA similarity, its results may be overestimated. ROC curves of BRWHNHA and the other three methods corresponding to the maximum AUC value over the 100 repetitions of five-fold cross-validation are shown in Supplementary Fig. S1 in Additional file.

Table 1 Prediction results of MIDPE, HAMDA, BRWH and BRWHNHA in five-fold cross-validation.

In Fig. 1, we compare the recall-precision curves of BRWHNHA and the other three methods for breast neoplasms and lung neoplasms in five-fold cross-validation. Each curve was obtained by measuring recall and precision at the top k positions (k = 10, 20, …, 100). Our method achieves the highest precision and recall in the top 20. As k increases, the precision of BRWHNHA decreases while its recall increases, which suggests that the top-ranked associations are more likely to be true miRNA-disease associations. We also assessed the statistical significance of the differences in predictive ability between BRWHNHA and the other three methods with paired t-tests. The P-values, listed in Table 2, show that BRWHNHA performs significantly better than MIDPE, HAMDA and BRWH at the 0.05 significance level.
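Precision and recall at the top k positions can be computed as follows. This is a minimal sketch, assuming `scores` holds the predicted probabilities of the candidate miRNAs for one disease and `labels` marks the held-out true associations with 1 (at least one must be present); the function name `precision_recall_at_k` is hypothetical.

```python
import numpy as np

def precision_recall_at_k(scores, labels, ks=range(10, 101, 10)):
    """Return (precision, recall) of the top-k ranked candidates for each k in ks."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.cumsum(np.asarray(labels)[order])   # true positives among the top-k
    total = hits[-1]                              # number of true associations
    out = []
    for k in ks:
        tp = hits[min(k, len(hits)) - 1]
        out.append((float(tp / k), float(tp / total)))
    return out
```

Precision divides the hits by k while recall divides them by the fixed number of true associations, which is why precision falls and recall rises as k grows.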

Figure 1

Recall-precision curves of breast neoplasms and lung neoplasms by five-fold cross-validation.

Table 2 Pairwise comparison between BRWHNHA and each of the other methods by paired t-test on the AUC of prediction.

We also compared our method with SPM, the structural perturbation method of Zeng et al.23, on the dataset used in that study. In five-fold cross-validation with five equal-sized subsets, our method performed slightly better than SPM in most cases (results not shown).

Effect of parameters in BRWHNHA

Our method has four parameters: λ, α, r and l. The hybridization parameter λ balances the two resource-spreading processes of the HeatS and ProbS algorithms, and α controls the consistency between the predicted candidate miRNA-disease associations and the known associations. The parameters r and l are the maximal numbers of random-walk steps on the miRNA similarity network and the disease similarity network, respectively. We varied λ and α from 0 to 1 in steps of 0.1, and r and l from 0 to 5 in steps of 1, and calculated the average AUC under five-fold cross-validation. Table 3 shows the effects of λ, α, r and l on the cross-validation results on the miRNA-disease association dataset. BRWHNHA achieves the best performance when λ = 0.6, α = 0.4, r = 2 and l = 1.
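The parameter ranges above form a grid of 11 × 11 × 6 × 6 = 4356 settings. The sketch below enumerates it exhaustively; `grid_search` is a hypothetical name, `evaluate` is a hypothetical callable that would run five-fold cross-validation for one setting and return its average AUC, and the joint exhaustive search is an assumption, since the text does not say whether the parameters were varied jointly or one at a time.

```python
import itertools
import numpy as np

lambdas = np.round(np.arange(0.0, 1.01, 0.1), 1)   # 0.0, 0.1, ..., 1.0
alphas = np.round(np.arange(0.0, 1.01, 0.1), 1)
steps = range(0, 6)                                  # 0, 1, ..., 5 steps for r and l

def grid_search(evaluate):
    """Return (best_auc, (lambda, alpha, r, l)) over the whole grid."""
    best = (-np.inf, None)
    for lam, alpha, r, l in itertools.product(lambdas, alphas, steps, steps):
        auc = evaluate(lam, alpha, r, l)
        if auc > best[0]:                            # keep the first maximum found
            best = (auc, (float(lam), float(alpha), r, l))
    return best
```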

Table 3 Effects of parameters λ, α, r, l on prediction performance of BRWHNHA.

Case study

To further validate the effectiveness of BRWHNHA in discovering potential miRNA-disease associations, we conducted two case studies, on lung neoplasms and prostatic neoplasms. All known miRNA-disease associations released in June 2014 were used as the training set, and all other miRNA-disease pairs formed the set of candidate associations. The predictions for lung neoplasms and prostatic neoplasms were verified against the relevant literature and two important public databases, dbDEMC24 and MiR2Disease25.

Lung cancer is one of the malignant tumors with the highest morbidity and mortality and one of the greatest threats to human health and life. Over the past 50 years, many countries have reported significant increases in the incidence and mortality of lung cancer. The top 50 predicted miRNAs associated with lung neoplasms are shown in Table 4. Among the top 20 and top 50 predicted lung neoplasm-associated miRNAs, 20 and 49, respectively, were confirmed by dbDEMC, MiR2Disease or the literature. Although no database or published study has confirmed an association between hsa-mir-200 and lung neoplasms, the mir-200 family, which includes five members (miRNA-200a, miRNA-200b, miRNA-200c, miRNA-429 and miRNA-141), is associated with lung neoplasms in dbDEMC, so we have reason to believe that hsa-mir-200 is related to the disease. We also list the potential miRNAs ranked 51 to 100 (Supplementary Table S2 in Additional file 1).

Table 4 The first 50 potential miRNAs associated with lung neoplasms predicted by BRWHNHA.

Prostate neoplasms are an important malignant tumor in men. They usually cause no clinical symptoms at an early stage, and most patients are already at a late stage when admitted26. Early diagnosis is therefore an urgent problem. Considerable evidence has confirmed a link between miRNAs and prostate neoplasms, and regulating the expression of related miRNAs could be therapeutically useful for treating prostate neoplasms27,28. In the case study for prostate neoplasms, 18 of the top 20 and 46 of the top 50 predicted miRNAs were verified by dbDEMC, MiR2Disease or the literature (Table 5). However, hsa-mir-302f, hsa-mir-1915, hsa-mir-4257 and hsa-mir-1286 are not supported by dbDEMC, MiR2Disease or the literature. We also list the potential miRNAs ranked 51 to 100 (Supplementary Table S3 in Additional file 1).

Table 5 The first 50 potential miRNAs associated with prostate neoplasms predicted by BRWHNHA.

Conclusion

Taking full account of the different topological and structural characteristics of heterogeneous networks is a challenging but meaningful task in prioritizing potential disease-related miRNAs. In this paper, we first adopted an effective measure, suitable for miRNAs and diseases without known miRNA-disease associations, to estimate miRNA and disease similarity. We then presented BRWHNHA, a method based on a hybrid recommendation algorithm and an unbalanced bi-random walk, to predict potential disease-associated miRNAs. We made full use of prior information and topological structure by computing the transition probability matrix of the bipartite network with the hybrid recommendation algorithm, and we fully exploited the topologies and structures of the miRNA similarity network (MMS) and the disease similarity network (DDS) at different levels by performing an unbalanced bi-random walk on the heterogeneous network. To assess the performance of BRWHNHA, we compared it with MIDPE, HAMDA and BRWH on the dataset obtained by Luo and Xiao22. The results indicate that BRWHNHA has the best prediction ability among these methods: its average AUC was 2.13, 0.69 and 2.20 percentage points higher than those of MIDPE, HAMDA and BRWH, respectively. Furthermore, in case studies on lung neoplasms and prostatic neoplasms, 49 and 46 of the top 50 predicted miRNA-disease associations, respectively, were confirmed by recently published literature and the dbDEMC and MiR2Disease databases. These results show that BRWHNHA can serve as an effective method for exploring potential associations between miRNAs and diseases.

Nevertheless, BRWHNHA has a limitation that should be addressed in future work: the method has many parameters that must be set, so a more effective strategy for finding the optimal parameters is needed.

Methods

The measurement of disease semantic similarity and miRNA functional similarity

Following category C of the MeSH descriptors, disease relationships can be represented as a directed acyclic graph (DAG). A disease K can be represented as DAG(K) = (K, T(K), E(K))22, where T(K) is the set containing K and all of its ancestor nodes, and E(K) is the set of all direct edges from parent nodes to child nodes in this subgraph, as shown in Fig. 2. For two diseases di and dj, the disease semantic similarity DSS(di, dj) is computed as defined by Luo and Xiao22.
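T(K) and E(K) can be collected by walking upward from K through its parents. This is a minimal sketch, assuming the MeSH hierarchy is available as a hypothetical dictionary `parents` that maps each descriptor to its direct parents; `dag_of` is a hypothetical name, and the sketch builds only DAG(K), not DSS, whose exact definition is given by Luo and Xiao22.

```python
def dag_of(disease, parents):
    """Return (T, E) for DAG(K): T holds K and all its ancestors,
    E the parent -> child edges among them."""
    T, E, stack = {disease}, set(), [disease]
    while stack:
        child = stack.pop()
        for parent in parents.get(child, ()):
            E.add((parent, child))           # direct edge from parent to child
            if parent not in T:              # visit each ancestor only once
                T.add(parent)
                stack.append(parent)
    return T, E
```

For example, with the toy hierarchy `{"K": ["B", "C"], "B": ["A"], "C": ["A"]}`, DAG(K) contains the nodes K, B, C and A and four edges; the shared ancestor A is visited once even though it is reached through two parents.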

Figure 2

Hierarchical DAG of lung neoplasms.

Based on the assumption that miRNAs with similar functions are more likely to be associated with similar diseases, and vice versa29,30, we adopt the miRNA functional similarity MFS(mi, mj) between two miRNAs mi and mj proposed by Wang et al.31.

Gaussian interaction profile kernel similarity

The Gaussian interaction profile kernel similarity is also based on the assumption that miRNAs with similar functions are more likely to be associated with similar diseases, and vice versa. Let \(A={({a}_{ij})}_{{n}_{m}\times {n}_{d}}\) be the adjacency matrix of MD, where \({n}_{m}\) and \({n}_{d}\) denote the numbers of miRNAs and diseases, respectively. The Gaussian interaction profile kernel similarity is calculated from the known miRNA-disease associations21. Let the binary vector IP(di), the interaction profile of disease di, indicate whether di is associated with each miRNA; in other words, IP(di) is the ith column of A. The Gaussian interaction profile kernel similarity between two diseases di and dj is calculated as:

$$DGS({d}_{i},{d}_{j})=exp(\,-\,{r}_{d}\parallel IP({d}_{i})-IP({d}_{j}){\parallel }^{2})$$
(1)

where \({r}_{d}={r}_{d}^{\prime }/(\tfrac{1}{{n}_{d}}\,{\sum }_{i=1}^{{n}_{d}}\,\parallel IP({d}_{i}){\parallel }^{2})\) is the kernel bandwidth, obtained by normalizing the bandwidth parameter \({r}_{d}^{\prime }\) by the average squared norm of the disease interaction profiles (e.g., \({r}_{d}^{\prime }=1\) as in32,33).

For two miRNAs mi and mj, whose interaction profiles IP(mi) and IP(mj) are the ith and jth rows of A, the Gaussian interaction profile kernel similarity is calculated as:

$$MGS({m}_{i},{m}_{j})=exp(\,-\,{r}_{m}\parallel IP({m}_{i})-IP({m}_{j}){\parallel }^{2})$$
(2)

where \({r}_{m}={r}_{m}^{\prime }/(\tfrac{1}{{n}_{m}}\,{\sum }_{i=1}^{{n}_{m}}\,\parallel IP({m}_{i}){\parallel }^{2})\) is the kernel bandwidth, obtained by normalizing the bandwidth parameter \({r}_{m}^{\prime }\) by the average squared norm of the miRNA interaction profiles.
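Eqs. (1) and (2) share one computation that differs only in which profiles are compared. This is a minimal NumPy sketch, assuming A has at least one known association (otherwise the bandwidth normalizer is zero); `gip_kernel` is a hypothetical name, and r' = 1 is used as the default, following the example value above.

```python
import numpy as np

def gip_kernel(profiles, r_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Pass A.T to compare diseases (Eq. 1) or A to compare miRNAs (Eq. 2).
    """
    X = np.asarray(profiles, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    r = r_prime / sq_norms.mean()            # normalized bandwidth r_d or r_m
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-r * np.maximum(dist2, 0.0))   # clip tiny negative round-off
```

For A = [[1, 0], [1, 1], [0, 1]], both disease profiles have squared norm 2, so r_d = 0.5, their squared distance is 2 and DGS(d1, d2) = exp(-1).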

Integrated similarity for miRNAs and diseases

An integrated disease similarity matrix can be obtained by combining disease semantic similarity with disease Gaussian interaction profile kernel similarity21. For two diseases di and dj, the integrated disease similarity is defined as follows:

$${D}_{S}({d}_{i},{d}_{j})=\{\begin{array}{ll}DSS({d}_{i},{d}_{j}), & DSS({d}_{i},{d}_{j})\ne 0\\ DGS({d}_{i},{d}_{j}), & {\rm{otherwise}}\end{array}$$
(3)

The integrated similarity between miRNAs mi and mj can be defined as follows:

$${M}_{S}({m}_{i},{m}_{j})=\{\begin{array}{ll}MFS({m}_{i},{m}_{j}), & MFS({m}_{i},{m}_{j})\ne 0\\ MGS({m}_{i},{m}_{j}), & {\rm{otherwise}}\end{array}$$
(4)
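Eqs. (3) and (4) are an element-wise fallback, which maps directly onto `numpy.where`. This is a minimal sketch; the name `integrate` is hypothetical, and both inputs are assumed to be same-shaped similarity matrices.

```python
import numpy as np

def integrate(semantic, gip):
    """Keep the semantic/functional similarity where it is non-zero;
    otherwise fall back to the Gaussian interaction profile kernel similarity."""
    semantic = np.asarray(semantic, dtype=float)
    return np.where(semantic != 0, semantic, np.asarray(gip, dtype=float))
```

Calling `integrate(DSS, DGS)` gives DS and `integrate(MFS, MGS)` gives MS.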

Hybrid recommendation algorithm

A bipartite network MD(M, D, E) is constructed from the experimentally confirmed miRNA-disease associations, where M is the set of all miRNA nodes, D the set of all disease nodes and E the set of all edges in MD. Its adjacency matrix A is defined as follows:

$${a}_{ij}=\{\begin{array}{ll}1, & {m}_{i}\,{\rm{is}}\,{\rm{associated}}\,{\rm{with}}\,{d}_{j}\\ 0, & {\rm{otherwise}}\end{array}$$
(5)

Zhou et al.34 proposed the hybrid recommendation algorithm, which combines the heat spreading (HeatS) and probabilistic spreading (ProbS) algorithms through a hybridization parameter λ that balances the diversity of HeatS and the accuracy of ProbS. For a given disease, HeatS and ProbS both assign each miRNA an initial resource, represented by the vector f (where fi is the resource held by miRNA mi), which is then redistributed through the transformation \(f^{\prime} =W\,\ast \,f\). miRNAs that hold more resource after redistribution are more likely to be associated with the given disease. Fig. 3 visualizes the HeatS and ProbS processes.

Figure 3

The HeatS (a–c) and ProbS (d–f) algorithms at work on the bipartite miRNA-disease network. Diseases are shown as green squares, with the given disease in dark green; miRNAs are shown as red circles.

HeatS is defined as follows:

$${W}_{ij}^{H}=\frac{1}{k({m}_{i})}\,{\sum }_{l=1}^{{n}_{d}}\,\frac{{a}_{il}{a}_{jl}}{k({d}_{l})}$$
(6)
$$f^{\prime} ={W}^{H}\ast f$$
(7)

ProbS is defined as follows:

$${W}_{ij}^{P}=\frac{1}{k({m}_{j})}\,{\sum }_{l=1}^{{n}_{d}}\,\frac{{a}_{il}{a}_{jl}}{k({d}_{l})}$$
(8)
$$f^{\prime} ={W}^{P}\ast f$$
(9)

Hybrid recommendation algorithm is defined as follows:

$${W}_{ij}^{H+P}=\frac{1}{k{({m}_{i})}^{1-\lambda }k{({m}_{j})}^{\lambda }}\,{\sum }_{l=1}^{{n}_{d}}\,\frac{{a}_{il}{a}_{jl}}{k({d}_{l})}$$
(10)
$$f^{\prime} ={W}^{H+P}\ast f$$
(11)

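where k(x) denotes the degree of node x in the bipartite graph MD(M, D, E). Setting λ = 0 in Eq. (10) recovers HeatS (Eq. 6), and setting λ = 1 recovers ProbS (Eq. 8).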
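Eq. (10) can be evaluated for all miRNA pairs at once with matrix operations. This is a minimal sketch, assuming A is a 0/1 NumPy array; the name `hybrid_weights` is hypothetical, and setting weights to 0 for miRNAs or diseases with no known associations (where Eqs. (6)-(10) would divide by zero) is our assumption.

```python
import numpy as np

def hybrid_weights(A, lam):
    """W^{H+P} of Eq. (10) for an n_m x n_d binary association matrix A."""
    A = np.asarray(A, dtype=float)
    k_m = A.sum(axis=1)                      # k(m_i): degree of each miRNA
    k_d = A.sum(axis=0)                      # k(d_l): degree of each disease
    # sum_l a_il a_jl / k(d_l); diseases without associations contribute nothing
    inv_kd = np.divide(1.0, k_d, out=np.zeros_like(k_d), where=k_d > 0)
    S = (A * inv_kd) @ A.T
    denom = np.outer(k_m ** (1.0 - lam), k_m ** lam)
    # pairs involving a miRNA without associations have S = 0, so leave them at 0
    return np.divide(S, denom, out=np.zeros_like(S), where=denom > 0)
```

A quick sanity check on any implementation: with λ = 0 each row (for a miRNA with known associations) sums to 1, because HeatS averages over neighbours, and with λ = 1 each such column sums to 1, because ProbS conserves resource; the two matrices are transposes of each other.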

Our method BRWHNHA

In this paper, we present BRWHNHA, a method based on a hybrid recommendation algorithm and an unbalanced bi-random walk, to predict potential disease-associated miRNAs. Luo et al.22 found that most nodes in DDS and MMS are isolated and that the sparsity of disease semantic similarity and miRNA functional similarity affects prediction performance. To overcome this data sparsity, the similarity of each disease pair is estimated by integrating disease semantic similarity and disease Gaussian interaction profile kernel similarity, and the similarity of each miRNA pair by integrating miRNA functional similarity and miRNA Gaussian interaction profile kernel similarity. Next, the bipartite miRNA-disease network (MD) is constructed, whose edges are the known miRNA-disease associations released by HMDD in June 2014. The transition probability matrix of MD is obtained by applying the hybrid recommendation algorithm to this bipartite network. An unbalanced bi-random walk is then carried out on the heterogeneous network consisting of DDS, MMS and MD. Finally, for a given disease, all candidate miRNAs are ranked by the resulting association probabilities; the higher a miRNA is ranked, the more likely it is to be associated with the given disease. The flowchart of miRNA-disease association prediction with BRWHNHA is shown in Fig. 4. The two most important steps are as follows:

Figure 4

Flowchart of potential miRNA-disease association prediction based on the computational model of BRWHNHA.

Step 1 (Calculate the transition probability matrix of DDS, MMS and MD): The transition probability matrix \(M={(M(i,j))}_{{n}_{m}\times {n}_{m}}\) of MMS is constructed as:

$$M(i,j)=\{\begin{array}{ll}\tfrac{{M}_{S}(i,j)}{{\sum }_{k=1}^{{n}_{m}}\,{M}_{S}(k,j)}, & \sum _{k=1}^{{n}_{m}}\,{M}_{S}(k,j)\ne 0\\ 0, & {\rm{otherwise}}\end{array}$$
(12)

Similarly, \(D={(D(i,j))}_{{n}_{d}\times {n}_{d}}\) is the transition probability matrix of DDS:

$$D(i,j)=\{\begin{array}{ll}\tfrac{{D}_{S}(i,j)}{{\sum }_{k=1}^{{n}_{d}}\,{D}_{S}(k,j)}, & \sum _{k=1}^{{n}_{d}}\,{D}_{S}(k,j)\ne 0\\ 0, & {\rm{otherwise}}\end{array}$$
(13)
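Eqs. (12) and (13) apply the same column normalization to MS and DS. This is a minimal sketch; the helper name `column_normalize` is hypothetical.

```python
import numpy as np

def column_normalize(S):
    """Divide each column by its sum; columns that sum to 0 stay all-zero."""
    S = np.asarray(S, dtype=float)
    col_sums = S.sum(axis=0, keepdims=True)
    return np.divide(S, col_sums, out=np.zeros_like(S), where=col_sums != 0)
```

Then `column_normalize(MS)` gives the matrix M of Eq. (12) and `column_normalize(DS)` gives D of Eq. (13); every non-zero column of the result sums to 1.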

Based on the hybrid recommendation algorithm, each miRNA node mi is assigned an initial resource f(mi) = 1 if mi is associated with the given disease and f(mi) = 0 otherwise. The resource of all miRNA nodes is redistributed via the transition matrix of the hybrid recommendation algorithm, and the transition probability matrix of MD is calculated as:

$${P}_{A}={W}^{H+P}\ast A.$$
(14)
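Because column j of A is exactly the initial resource vector f for disease dj, Eq. (14) performs the redistribution of Eq. (11) for every disease in a single matrix product. The minimal sketch below shows this equivalence; `prior_matrix` and `prior_for_disease` are hypothetical names, and W is assumed to be a precomputed n_m x n_m weight matrix.

```python
import numpy as np

def prior_matrix(W, A):
    """Eq. (14): P_A = W^{H+P} A for all diseases at once."""
    return np.asarray(W, dtype=float) @ np.asarray(A, dtype=float)

def prior_for_disease(W, A, j):
    """One disease at a time: f(m_i) = 1 if m_i is linked to d_j, else 0,
    followed by one redistribution step f' = W f (Eq. 11)."""
    f = (np.asarray(A)[:, j] == 1).astype(float)
    return np.asarray(W, dtype=float) @ f
```

Column j of `prior_matrix(W, A)` equals `prior_for_disease(W, A, j)`, so the matrix form is simply the per-disease process stacked column by column.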

Step 2 (Implement the unbalanced bi-random walk on the heterogeneous network): Because MMS and DDS have different topological characteristics, we introduce two parameters, r and l, as the maximal numbers of random-walk steps on MMS and DDS, respectively.

$$MMS:{P}_{{t}_{ \mbox{-} M}}=(1-\alpha )\ast M\ast {P}_{t-1}+\alpha \ast {P}_{A}$$
(15)
$$DDS:{P}_{{t}_{ \mbox{-} D}}=(1-\alpha )\ast {P}_{t-1}\ast D+\alpha \ast {P}_{A}$$
(16)
$${P}_{t}=\{\begin{array}{ll}\tfrac{{P}_{{t}_{ \mbox{-} M}}+{P}_{{t}_{ \mbox{-} D}}}{2} & t\le r,t\le l\\ {P}_{{t}_{ \mbox{-} M}}, & t\le r,t > l\\ {P}_{{t}_{ \mbox{-} D}}, & t > r,t\le l\end{array}$$
(17)

α is a decay factor between 0 and 1. The matrix PA, the transition probability matrix of the bipartite network MD obtained by the hybrid recommendation algorithm, controls the prior probability in the iterative process, and the walk is initialized with P0 = PA/sum(PA). After the iterations, Pt gives the final probability matrix between miRNAs and diseases, and for a given disease we rank all candidate miRNAs by this probability. In this way, BRWHNHA effectively uses the topological information of the heterogeneous network, including MMS, DDS and MD.
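Eqs. (15)-(17) together with the initialization P0 = PA/sum(PA) can be sketched as below. This is a minimal sketch under assumptions: sum(PA) is taken to be the sum of all entries of PA, the walk runs for max(r, l) steps (the largest t covered by Eq. (17)), M, D and PA are precomputed NumPy arrays, and the name `bi_random_walk` is hypothetical.

```python
import numpy as np

def bi_random_walk(M, D, P_A, alpha, r, l):
    """Unbalanced bi-random walk of Eqs. (15)-(17).

    M: n_m x n_m transition matrix of MMS, D: n_d x n_d transition matrix of DDS,
    P_A: n_m x n_d prior matrix from the hybrid recommendation step.
    """
    P_A = np.asarray(P_A, dtype=float)
    P = P_A / P_A.sum()                                # P_0, normalized over all entries
    for t in range(1, max(r, l) + 1):
        P_m = (1 - alpha) * M @ P + alpha * P_A        # step on MMS, Eq. (15)
        P_d = (1 - alpha) * P @ D + alpha * P_A        # step on DDS, Eq. (16)
        if t <= r and t <= l:
            P = (P_m + P_d) / 2                        # both walks still active
        elif t <= r:
            P = P_m                                    # only the MMS walk continues
        else:
            P = P_d                                    # only the DDS walk continues
    return P
```

For a given disease dj, `np.argsort(-P[:, j])` then lists miRNA indices from the highest to the lowest predicted association probability.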