Large-scale prediction of microRNA-disease associations by combinatorial prioritization algorithm

Identifying associations between microRNA molecules and human diseases from large-scale heterogeneous biological data is an important step toward understanding the pathogenesis of diseases at the microRNA level. However, experimental verification of microRNA-disease associations is expensive and time-consuming. To overcome the drawbacks of conventional experimental methods, we present a combinatorial prioritization algorithm to predict microRNA-disease associations. Importantly, our method can predict microRNAs (diseases) associated with diseases (microRNAs) that have no known associated microRNAs (diseases). The predictive performance of the proposed approach was evaluated by internal cross-validation and external independent validation on standard association datasets. The results demonstrate that our method predicts microRNA-disease associations with an area under the receiver operating characteristic curve (AUC) of 86.93%, outperforming previous prediction methods. In particular, we observed that an ensemble-based method that integrates the predictions of multiple algorithms gives more reliable and robust predictions than any single algorithm, improving the AUC to 92.26%. We applied our combinatorial prioritization algorithm to lung neoplasms and breast neoplasms and identified their top 30 microRNA candidates, which are consistent with the published literature and databases.
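As a rough illustration of the evaluation described above, the following Python sketch computes the AUC of a candidate ranking and combines several predictors by averaging ranks. It is not the method of this work: the abstract does not specify how the predictions of multiple algorithms are integrated, so rank averaging is an assumption made here, as are the function names `auc` and `rank_average` and the score-dictionary input format.

```python
from typing import Dict, List, Set


def auc(scores: Dict[str, float], positives: Set[str]) -> float:
    """AUC as the probability that a known association outscores an
    unlabeled candidate (ties count as one half)."""
    pos = [s for name, s in scores.items() if name in positives]
    neg = [s for name, s in scores.items() if name not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative candidate")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_average(predictions: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical ensemble: mean rank of each candidate across predictors.

    Rank 0 is the lowest score, so a larger mean rank means a stronger
    candidate. Only candidates scored by every predictor are kept, and
    tied scores are ordered by candidate name.
    """
    if not predictions:
        raise ValueError("need at least one predictor")
    shared = sorted(set.intersection(*(set(p) for p in predictions)))
    totals = {name: 0 for name in shared}
    for scores in predictions:
        # Stable sort over the name-sorted list, ascending by score.
        ordered = sorted(shared, key=lambda name: scores[name])
        for rank, name in enumerate(ordered):
            totals[name] += rank
    return {name: total / len(predictions) for name, total in totals.items()}
```

Given two score dictionaries `scores_a` and `scores_b` and a set `known` of verified microRNAs for a disease, `auc(rank_average([scores_a, scores_b]), known)` evaluates the combined ranking in the same way as a single predictor's. Averaging ranks rather than raw scores avoids having to put the predictors on a common scale, which is one reason it is used in this sketch.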


Fig.S1
The distributions of similarity scores between microRNAs (diseases).

Table S1
The top 30 microRNA candidates for breast neoplasms in the ranked list.

Table S2
The top 30 microRNA candidates for lung neoplasms in the ranked list.

Evidence labels for the breast neoplasms table:
(1) 'literature' means that published literature supports a relation between the microRNA and human breast neoplasms.
(2) 'dbDEMC' means that analysis of microarray data sets shows the microRNA to have different expression levels in breast cancer compared with normal tissues.
(3) 'HMDD' means that the microRNA is a newly reported breast neoplasms-related microRNA collected in the September 2012 version of the human microRNA-disease database HMDD.
(4) 'miR2Disease' means that the microRNA is included in miR2Disease, a manually curated microRNA-disease association database.
(5) 'G2SBC' means that some of the top predicted target mRNAs of the microRNA are breast cancer-related genes; G2SBC is a genes-to-systems breast cancer database commonly used to support breast cancer research.

Rank MicroRNA Name Evidences PMIDs Descriptors
With the significance analysis of the microarrays, hsa-let-7c is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-126 is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues. Hsa-mir-126 is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-mir-126 is really associated with lung neoplasms.
With the significance analysis of the microarrays, hsa-mir-100 is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-let-7e is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-135a is identified as a potential microRNA up-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-130a is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-let-7i is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues. Hsa-let-7i is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-let-7i is really associated with lung neoplasms.
10 hsa-mir-106a dbDEMC 17922911 With the significance analysis of the microarrays, hsa-mir-106a is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues.
11 hsa-mir-150 dbDEMC 17922911 With the significance analysis of the microarrays, hsa-mir-150 is identified as a potential microRNA up-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-181a is identified as a potential microRNA up-regulated in breast cancer when compared to normal tissues. Hsa-mir-181a is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-mir-181a is really associated with lung neoplasms.
With the significance analysis of the microarrays, hsa-let-7g is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-371 is identified as a potential microRNA up-regulated in breast cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-152 is identified as a potential microRNA down-regulated in breast cancer when compared to normal tissues. Hsa-mir-152 is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-mir-152 is really associated with lung neoplasms.
With the significance analysis of the microarrays, hsa-mir-148a is identified as a potential microRNA up-regulated in breast cancer when compared to normal tissues. Hsa-mir-148a is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-mir-148a is really associated with lung neoplasms.
30 hsa-mir-208 dbDEMC 16784538 With the significance analysis of the microarrays, hsa-mir-208 is identified as a potential microRNA up-regulated in breast cancer when compared to normal tissues.

Evidence labels for the lung neoplasms table:
(1) 'literature' means that published literature supports a relation between the microRNA and human lung neoplasms.
(2) 'dbDEMC' means that analysis of microarray data sets shows the microRNA to have different expression levels in lung cancer compared with normal tissues.
(3) 'HMDD' means that the microRNA is a newly reported lung neoplasms-related microRNA collected in the September 2012 version of the human microRNA-disease database HMDD.
(4) 'miR2Disease' means that the microRNA is included in miR2Disease, a manually curated microRNA-disease association database.

Rank MicroRNA Name Evidences PMIDs Descriptors
1 hsa-mir-106b dbDEMC 19584273 With the significance analysis of the microarrays, hsa-mir-106b is identified as a potential microRNA up-regulated in lung cancer when compared to normal tissues.
2 hsa-mir-15a dbDEMC 15944708 With the significance analysis of the microarrays, hsa-mir-15a is identified as a potential microRNA up-regulated in lung cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-133a is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.
4 hsa-mir-10a dbDEMC 15944708 With the significance analysis of the microarrays, hsa-mir-10a is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-100 is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-10b is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.
12 hsa-mir-152 dbDEMC 15944708 With the significance analysis of the microarrays, hsa-mir-152 is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.
With the significance analysis of the microarrays, hsa-mir-137 is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.
14 hsa-mir-16 With the significance analysis of the microarrays, hsa-mir-16 is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues. Hsa-mir-16 is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-mir-16 is really associated with lung neoplasms.
With the significance analysis of the microarrays, hsa-mir-130a is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues. Hsa-mir-130a is included in miR2Disease, a manually curated microRNA-disease association database. It means hsa-mir-130a is really associated with lung neoplasms.
16 hsa-mir-148b dbDEMC 15944708 With the significance analysis of the microarrays, hsa-mir-148b is identified as a potential microRNA down-regulated in lung cancer when compared to normal tissues.