A deep ensemble model to predict miRNA-disease association

Cumulative evidence from biological experiments has confirmed that microRNAs (miRNAs) are related to many types of human diseases through different biological processes. It is anticipated that precise miRNA-disease association prediction could not only help infer potential disease-related miRNA but also boost human diagnosis and disease prevention. Considering the limitations of previous computational models, a more effective computational model needs to be implemented to predict miRNA-disease associations. In this work, we first constructed a human miRNA-miRNA similarity network utilizing miRNA-miRNA functional similarity data and heterogeneous miRNA Gaussian interaction profile kernel similarities based on the assumption that similar miRNAs with similar functions tend to be associated with similar diseases, and vice versa. Then, we constructed disease-disease similarity using disease semantic information and heterogeneous disease-related interaction data. We proposed a deep ensemble model called DeepMDA that extracts high-level features from similarity information using stacked autoencoders and then predicts miRNA-disease associations by adopting a 3-layer neural network. In addition to five-fold cross-validation, we also proposed another cross-validation method to evaluate the performance of the model. The results show that the proposed model is superior to previous methods with high robustness.


Five-fold cross validation.
Cross-validation is a frequently used method in machine learning and can greatly reduce the bias caused by sample selection. Evaluating the performance of different models in this way is crucial and practical when some of the positive miRNA-disease associations are missing or a new miRNA-disease association is added. To evaluate the prediction performance of DeepMDA, we adopted 5-fold cross-validation and compared it with five other state-of-the-art computational models (i.e., RLSMDA, HGIMDA, NCPMDA, PBMDA and RKNNMDA). RWRMDA was a representative approach in the domain and was often considered a standard method for validating performance. HGIMDA was an improved version of RWRMDA that incorporated changes in the data pre-processing procedure. RLSMDA was a semi-supervised method often used as a baseline in miRNA-disease studies. NCPMDA was a network consistency projection method that showed superior performance compared to HDMP 50 and NetCBI 51 . PBMDA is a path-based method that adopted a depth-first search algorithm to infer potential miRNA-disease associations 41 . RKNNMDA is a KNN-based model that was combined with the SVM rank method to predict miRNA-disease associations 40 . Other recently developed methods, such as MCMDA 52 and ILRMR 42 , used datasets that differ from ours; therefore, we did not choose them for comparison. In the 5-fold cross-validation, all the known interactions were randomly split into 5 subsets of equal size. In each fold, one subset was left out as testing samples, and the remaining four subsets were treated as the training set. The procedure was repeated until every subset had been used once for testing. The average performance was adopted for evaluation.
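The fold construction described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `k_fold_splits` and the random seed are hypothetical, and only the indices of the known associations are shuffled and partitioned.

```python
import numpy as np

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    rng = np.random.default_rng(seed)          # seed is an arbitrary choice
    order = rng.permutation(n_samples)         # random shuffle of sample indices
    folds = np.array_split(order, k)           # k subsets of (near-)equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# 5430 known associations, matching the HMDD data described in the Methods
splits = list(k_fold_splits(5430, k=5))
```

Each of the five iterations trains on four subsets and tests on the remaining one, and the scores of the five test subsets are then averaged.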
The receiver operating characteristic (ROC) curve was drawn by plotting the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1 − specificity) at different thresholds. Sensitivity is the proportion of positive samples scored above the given threshold, and specificity is the proportion of negative samples scored below it. The area under the ROC curve (AUC) was also calculated to evaluate the ability of the prediction model. An AUC value of 1 denotes that the performance is perfect, and an AUC value of 0.5 indicates random prediction performance. Furthermore, we also adopted another type of quality measure used in these types of studies called AUPR (Area Under the Precision vs. Recall Curve). Because the dataset is unbalanced, with fewer positive samples than negative samples, AUPR was adopted to reduce the impact caused by a high proportion of false positive data. Similar to the AUC score, AUPR values closer to 1 indicate that the performance is better.
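As a small illustration of the two measures, the sketch below computes AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, and approximates AUPR by average precision (one common estimator; the paper does not state which estimator was used). The function names are hypothetical, and the pairwise AUC computation is only practical for small examples.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()   # ties count as half a win
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]                           # labels sorted by score
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
```

In this toy example one negative sample (score 0.8) is ranked above one positive sample (score 0.7), so both measures fall below 1.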
The 152 miRNAs in SM_T and 255 miRNAs in SM_F are separately utilized with 383 disease similarities to evaluate the performance in small datasets of miRNAs. The experiments implemented 5-fold cross-validation, and the results are shown in Tables 1 and 2. The results show that DeepMDA achieved the highest AUC and AUPR scores in both datasets compared to other algorithms. Furthermore, the three deep models that will be mentioned later had advantages compared to other network-based algorithms.
Next, we integrated multiple datasets and used all the 495 miRNAs in SM with the 383 disease similarities to evaluate the performances. The results of the five different approaches together with DeepMDA are shown in Table 3. DeepMDA, RLSMDA, HGIMDA, NCPMDA, PBMDA, RKNNMDA obtained average AUCs of 0.9486 ± 0.002, 0.8475 ± 0.005, 0.7689 ± 0.011, 0.8731 ± 0.007, 0.9086 ± 0.004, 0.7076 ± 0.005, respectively, in 5-fold cross-validation. DeepMDA also showed the highest AUPR score compared to the five previous methods in all three datasets.
To validate the rationale of the proposed model, another two models were implemented based on the deep learning framework. In the first model, we combined a Stacked AutoEncoder (SAE) with AdaBoost, called SAE + ADA, as one alternative classifier. AdaBoost is an ensemble method often used in machine learning, and it obtained more satisfactory results than other methods during the experiments. In the second model, we kept only the latter part of DeepMDA: the two stacked autoencoders were removed, and the raw similarity data were fed directly as the feature vector to a three-layer fully connected network to construct the classifier (RAW + DNN). We measured these two deep models (SAE + ADA, RAW + DNN) and calculated their AUC values for comparison with DeepMDA. As shown in Table 3, SAE + ADA achieved an average AUC score of 0.9211 ± 0.002 and RAW + DNN obtained an average AUC score of 0.9386 ± 0.001. Therefore, the two deep network models show promising results. We also compared the DRMDA model (i.e., single auto-encoder with SVM) and a single auto-encoder with DNN model to validate their performances. As shown in Table 4, DeepMDA still achieved the highest AUC and AUPR scores. Comparing DeepMDA with RAW + DNN shows that adopting the SAE as a high-level feature extractor is an essential aspect of improving performance. Conversely, comparing DeepMDA with SAE + ADA shows that the DNN remains important as the final classifier. Overall, DeepMDA showed a better result than the two other deep models, and it achieved the best performance of all the compared methods.
Table 2. Results on the SM_F miRNA datasets. The AUC and AUPR scores are listed above. The * indicates the highest AUC/AUPR score. Generally, the three deep learning models performed better than the other five models.
The standard error of the AUC was small across the five folds; therefore, we randomly chose one of the ROC results from the 5-fold cross-validation, as depicted in Fig. 1.
Leave-one-disease-out cross-validation. The traditional leave-one-out cross-validation (LOOCV) leaves one known miRNA-disease association out in each turn, uses the other known associations for model training, and then ranks that test sample against all the other associations in every iteration. However, in the proposed model, training samples were separated from test samples in every iteration, and using only one miRNA-disease association as the test sample could induce bias. Thus, instead of leaving each known miRNA-disease association out and predicting it among all the unknown miRNA-disease associations of the investigated disease in each turn, we left out the whole column of samples for one disease each time. This method was called Leave-One-Disease-Out Cross-Validation (LODOCV). In every iteration, we tried to predict all the miRNAs associated with the chosen disease using the information of other disease-related miRNAs. To our knowledge, LODOCV is considerably more difficult than traditional LOOCV because we tried to uncover every miRNA-disease association of each disease without any known miRNA information for that disease.
Table 3. Results on the full miRNA-disease datasets. The AUC and AUPR scores are listed above. The * indicates the highest AUC/AUPR score. Generally, the three deep learning models performed better than the other five models.
Table 4. Comparison between DRMDA and DeepMDA on the full miRNA-disease datasets in five-fold cross validation. The AUC and AUPR scores are listed above. Generally, DeepMDA performed better than the other two models.

To be specific, in each iteration we left out all the associations of one disease as test samples. Using the miRNA information of other diseases to predict all the unknown miRNA associations of a disease is a challenging and meaningful problem for researchers and for medical diagnosis. Network-based models and three deep learning models were selected to evaluate the overall performance. The results are shown in Table 5. DeepMDA achieved an average AUC score of 0.8729, SAE + ADA obtained an average AUC score of 0.8552, and RAW + DNN reached an AUC score of 0.8633. Five network-based models (RLSMDA, NCPMDA, HGIMDA, PBMDA, RKNNMDA) achieved average AUC scores of 0.8530, 0.6374, 0.7616, 0.6902 and 0.5680, respectively. Regarding the AUPR scores, DeepMDA still achieved the highest AUPR score compared with the other methods. Likewise, we randomly picked one ROC result and drew the ROC curve as shown in Fig. 2. Overall, DeepMDA obtained the best performance in LODOCV compared with the other 7 methods.
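The LODOCV splitting scheme can be sketched as below, under the assumption that the known associations are stored in a binary miRNA × disease matrix. The function name `lodocv_splits` is hypothetical; the sketch only shows how the column of one disease is hidden from training and kept aside as the test target.

```python
import numpy as np

def lodocv_splits(assoc):
    """For each disease j, hide column j of the association matrix.

    Yields (j, train_matrix, held_out_column); in the training copy every
    known association of disease j is set to 0.
    """
    assoc = np.asarray(assoc)
    for j in range(assoc.shape[1]):
        train = assoc.copy()
        held_out = train[:, j].copy()   # true associations of disease j
        train[:, j] = 0                 # the model sees no label for disease j
        yield j, train, held_out

# Toy matrix: 3 miRNAs (rows) x 3 diseases (columns)
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])
folds = list(lodocv_splits(A))
```

In every iteration the scores predicted for the hidden column are compared with `held_out`, so the number of iterations equals the number of diseases.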

Robustness in DeepMDA.
The deep learning models showed powerful abilities in high-level feature extraction, especially in complex relationship analysis. To measure their abilities to capture the data structure and interaction relationship, we further implemented five-fold cross-validation using noisy data. We added white noise to the data obtained from the trained autoencoders, and then implemented a deep neural network classifier, an AdaBoost classifier and a random forest classifier separately to compare their performances. The latter two classifiers were chosen because they are both ensemble classifiers and they achieved more satisfactory prediction results during the experiments. The results in Table 6 show that the AUC score dropped from 0.9486 to 0.9334 using DeepMDA, but the AUC score dropped from 0.9211 to 0.8235 using SAE + ADA and from 0.9249 to 0.8122 using SAE + RF. The AUC score of DRMDA dropped from 0.8812 to 0.7757. This result illustrated that DeepMDA could capture the complex relationships and remain robust when noisy data were introduced.
Table 5. Results on the full miRNA-disease datasets in LODOCV. The AUC and AUPR scores are listed above. Generally, DeepMDA performed better than the other seven models in LODOCV.
Case studies. We further investigated some complex human diseases to determine the disease-related miRNAs using the proposed model for measuring model prediction ability. The results showed that human digestive and urinary systems are occasionally deregulated through miRNA functional expression. The oesophagus and colon belong to the digestive system, while the kidneys belong to the urinary system. Therefore, we investigated the potential association between miRNAs and three different diseases, i.e., oesophageal neoplasms, kidney neoplasms and colon neoplasms.
The prediction results were validated by checking the experimental results presented in two databases, miR2Disease 19 and dbDEMC 53 , which record many experimentally verified miRNA-disease associations. We implemented LODOCV to predict candidate disease-related miRNAs for these three diseases, and many miRNAs could be precisely predicted using DeepMDA. In total, 47, 42 and 44 out of the top 50 predicted miRNAs were validated w.r.t. colon neoplasms, oesophageal neoplasms, and kidney neoplasms, respectively (see details in Supplementary Table S1, Supplementary Table S2 and Supplementary Table S3).
Colon neoplasms are one of the most severe diseases worldwide 54 . It was reported that almost half of the patients with colon neoplasms die of metastatic disease within 5 years from diagnosis 55,56 . Increasing evidence has indicated that miRNAs have potential associations with colon neoplasms. For instance, miR-145 may inhibit cell growth in colon neoplasms by targeting the insulin receptor substrate-1 57 . Furthermore, tumour specimens showed highly significant and large-fold change differential expression of the levels of several miRNAs, including miR-135b, miR-133a, miR-1, miR-31, and others 58 . MiR-20a and miR-155 were confirmed to be up-regulated in colon neoplasms 59 . By using DeepMDA, the potential colon neoplasm-related miRNAs were identified, and the results are listed in Supplementary Table S1, which shows that 10 out of the top 10 and 46 out of the top 50 predicted miRNAs were confirmed based on miR2Disease and dbDEMC. For example, an inverse correlation of miR-21 was found in 10 colorectal cell lines, suggesting that it might serve as a useful diagnostic biomarker for the prognosis of colon neoplasms 60,61 . To further validate the relationship between predicted miRNAs and cancers, various cancer hallmarks were verified, such as genes that are associated with miRNAs. For instance, some genes such as BRAF, APC, and TP53 can be regarded as colon cancer hallmarks 62 , and the associations between these genes and miRNAs could be validated by miRTarBase 63 , showing that these miRNAs could possibly regulate these genes. We also found that many disease-related miRNAs are likely to be enriched together; this pattern is similar to disease-associated genes that play roles in some cancer hallmarks 30 , suggesting that these miRNAs may co-regulate some diseases such as cancers.
Oesophageal neoplasms are one of the most common malignant tumours worldwide and are ranked as the sixth leading cause of cancer-related deaths 64 . It has been reported that the overall 5-year survival rate is approximately 20% despite advanced treatments 56,65 . Improving the understanding of the biological mechanism underlying oesophageal cancer is crucial for diagnosis and disease prevention 66 . Experimental evidence has revealed that several human miRNAs are located at genomic regions related to the expression of tumour genes in cancers such as oesophageal neoplasms 67 . For instance, miR-155 and miR-103 are highly expressed in tumour tissues and could be correlated with different clinicopathological classifications 68 . miR-98 may suppress migration and invasion in human oesophageal squamous cell carcinomas 69 . Using DeepMDA to predict potential oesophageal neoplasm-related miRNAs could help validate the prediction ability of our model. As a result, 8 out of the top 10 candidates and 42 out of the top 50 predicted miRNAs were confirmed to have relationships with oesophageal neoplasms, according to miR2Disease and dbDEMC (see Supplementary Table S2).
Kidney neoplasms are a type of cancer with an incidence increase of 43% since 1973 in the US 70 . The risk of the disorder increases with age and differs between men and women. The number of kidney neoplasms diagnosed every year has exceeded 250,000 cases 71 , among which over 80% are found to be renal cell carcinoma (RCC). Recent studies have found that miR-34a can be over-expressed in patients with RCC who suffer from kidney neoplasms 72 . It was also shown that a combination of miR-141 with miR-155 resulted in a 97% correct classification rate, which provided reliable evidence of potential associations between miR-141/miR-155 and kidney neoplasms. To discover the potential associations between miRNAs and kidney neoplasms, we applied DeepMDA to make the prediction. As shown in Supplementary Table S3, 8 out of the top 10 and 44 out of the top 50 candidates were identified as kidney neoplasm-related miRNAs. For example, miR-155, miR-126 and miR-20a were found to be over-expressed in malignant samples such as clear-cell type human renal cell carcinoma 73 . miR-145 was reported to down-regulate its target mRNA and the corresponding protein in kidney tissues 74 . Overall, the results from LODOCV and the separate case studies on three typical diseases showed satisfactory performances using DeepMDA. Unlike the traditional method, which uses prior knowledge to perform the prediction, the proposed method was capable of capturing the potential miRNAs related to one specific disease without relying on any known associations for that disease. Therefore, DeepMDA can be applied to a wide range of applications. To make the model more useful for the community and biologists, we also developed a web server to search for each potential disease-related miRNA that our model predicted (https://laiyifu.shinyapps.io/DeepMDA/).

Discussions
Increasing evidence shows that miRNA genes are located at genomic regions involved in cancer, indicating that miRNAs play significant roles in the development of various diseases. Due to the limitations of previous computational models, a more effective and less costly way to predict miRNA-disease associations is required. In this study, a deep ensemble miRNA-disease association prediction (DeepMDA) framework was proposed by synthesizing heterogeneous biological networks. First, miRNA functional similarity and heterogeneous Gaussian interaction profile kernel similarities were integrated to form the miRNA similarity data, while the disease semantic information and heterogeneous disease-related data were utilized to construct the disease similarity data. Second, the two similarity matrices were split row by row and fed into two stacked autoencoders to learn complex high-level features. Then, the two output feature vectors from the two SAEs were concatenated to form a single feature vector, whose corresponding label was taken from the known miRNA-disease association matrix. The latter part of DeepMDA used a three-layer fully connected neural network to make the final predictions of the potential miRNA-disease associations from the feature vectors produced by the two autoencoders. Both LODOCV and 5-fold cross-validation were implemented to validate DeepMDA performance. Compared with five state-of-the-art computational models and two other deep models, DeepMDA showed the best performance, as well as good robustness relative to the other deep models. Furthermore, case studies were also implemented using several complex human diseases (colon neoplasms, oesophageal neoplasms, and kidney neoplasms), in which 47, 42 and 44 out of the top 50 predicted miRNAs, respectively, had experimentally supported evidence based on previous literature.
DeepMDA can also be used to predict the miRNAs associated with isolated diseases, which could benefit human disease diagnoses and prevention.
There are several reasons that account for the reliable performance of DeepMDA. First, multiple dataset sources (more knowledge) were adopted to enlarge the miRNA and disease similarity matrices, and more data could provide more evidence when trying to predict the associations of disease-related miRNAs. Second, a deep ensemble framework was proposed to extract high-level features from traditional feature vectors and predict the potential associations using these non-linear high-level features, which improved the model's performance compared to other state-of-the-art models.
Furthermore, the proposed model can be regarded as a more general model that may play a potential role in predicting other kinds of associations, such as lncRNA-disease, drug-targets, and so on.

Methods
Datasets. Biological experiments have collected many miRNA-disease associations, and multiple databases were constructed for researchers to verify the research. The human miRNA-disease dataset used in this study was downloaded from the HMDD database (June 2013) 18 . It consists of 5430 distinct, experimentally validated human miRNA-disease associations between approximately 495 miRNAs and 383 diseases. We used the adjacency matrix A_md to represent miRNA-disease associations. For instance, if miRNA m(i) is reported to be associated with disease d(j) in the HMDD database, the value of a_md(i, j) is 1; otherwise, it is 0. The numbers of miRNAs and diseases in the database are denoted as nm and nd, respectively.
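A minimal sketch of how such an adjacency matrix could be built from a list of associations is given below. The function name `build_adjacency` and the integer index pairs are illustrative assumptions; the actual HMDD file format and the mapping from names to indices are not specified here.

```python
import numpy as np

def build_adjacency(pairs, n_rows, n_cols):
    """Binary matrix with a 1 at (i, j) for every reported (miRNA i, disease j)."""
    A = np.zeros((n_rows, n_cols), dtype=np.int8)
    rows, cols = zip(*pairs)
    A[list(rows), list(cols)] = 1      # duplicated pairs simply stay 1
    return A

# Toy example: 3 miRNAs, 2 diseases; the pair (0, 1) is listed twice
pairs = [(0, 1), (2, 0), (2, 1), (0, 1)]
A_md = build_adjacency(pairs, n_rows=3, n_cols=2)
```

For the dataset described above the matrix would have shape (nm, nd) = (495, 383), with 5430 entries equal to 1.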
We also adopted disease-related long noncoding RNA (lncRNA) data from LncRNADisease 75 . The LncRNADisease database has integrated more than 1000 lncRNA-disease entries including 321 lncRNAs and 221 diseases from ~500 publications. Furthermore, disease-related gene data were retrieved from the DisGeNET (Version 4.0) database 76 . We chose curated gene-disease associations containing 14412 genes and 10757 unique diseases from DisGeNET. Using these two data sets, we constructed two adjacency matrices A_ld and A_gd to denote the lncRNA-disease associations and the gene-disease associations, respectively. Furthermore, due to the close relationship between miRNAs and their corresponding targets, we utilized the experimentally validated miRNA-target interaction data from miRTarBase 63 . MiRTarBase has collected more than 41000 human miRNA-target interactions, including 2649 miRNAs and 14894 targets that are validated through various studies. The adjacency matrix A_mt was constructed to represent miRNA-target associations. The overall design of the dataset integration is shown in Fig. 3.
Gaussian interaction profile kernel similarity for miRNAs. Based on the assumption that miRNAs with similar functions tend to be associated with similar diseases, and vice versa, the interaction profile of miRNA m(i) is denoted by a binary vector IP(m(i)) representing whether miRNA m(i) interacts with each disease or not. The Gaussian kernel similarity between two miRNAs m(i) and m(j) is then defined from their interaction profiles as follows:

$$KM_1(m(i), m(j)) = \exp\left(-\gamma_m \lVert IP(m(i)) - IP(m(j)) \rVert^2\right) \qquad (1)$$

where γ_m controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_m by the average number of associated diseases for all the miRNAs:

$$\gamma_m = \gamma'_m \Big/ \left(\frac{1}{\mathit{nm}} \sum_{i=1}^{\mathit{nm}} \lVert IP(m(i)) \rVert^2\right)$$

Here, γ'_m is set to 1 according to previous research 38 . Likewise, miRNAs with similar functions tend to be related to similar target genes, and vice versa. Thus, by using the miRNA-target interaction matrix A_mt, we could also obtain a Gaussian interaction profile kernel similarity matrix KM_2 for the 2649 miRNAs. The calculation is the same as before, with the interaction profile of each miRNA taken over targets instead of diseases. In this case, γ_m is calculated by dividing γ'_m by the average number of associated targets for all miRNAs, and γ'_m is set to 1 again.
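The kernel computation described above can be sketched in a few lines of NumPy. The function name `gip_kernel` is hypothetical, and the sketch assumes a binary association matrix with one row per miRNA and at least one non-zero entry (otherwise the bandwidth normalization would divide by zero).

```python
import numpy as np

def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`."""
    ip = np.asarray(assoc, dtype=float)        # one interaction profile per row
    sq_norms = (ip ** 2).sum(axis=1)           # for binary rows: number of associations
    gamma = gamma_prime / sq_norms.mean()      # bandwidth normalization
    # Squared Euclidean distance between every pair of profiles
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * ip @ ip.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Toy example: rows 0 and 1 have identical profiles
A = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])
K = gip_kernel(A)
```

Passing the miRNA-disease matrix gives a disease-based kernel, passing the miRNA-target matrix gives the target-based one, and passing a transposed association matrix gives the corresponding kernel between columns (for example, between diseases).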
Integrate similarity for miRNAs. Because similarity scores were lacking for some of the 495 miRNAs, we integrated the miRNA functional similarity matrix FS and the two Gaussian interaction profile kernel similarity matrices KM_1 and KM_2 into a new mixed similarity for each pair of miRNAs. Specifically, for a miRNA pair m(i) and m(j) that exists only in the functional similarity matrix FS, the miRNA functional similarity is chosen as the integrated similarity score. If m(i) and m(j) do not exist in FS but both exist in KM_2, their Gaussian profile kernel similarity score from KM_2 is chosen. If m(i) and m(j) exist in both FS and KM_2, the average of the two scores is used. If the pair exists in neither FS nor KM_2, their Gaussian profile kernel similarity score from KM_1 is adopted. The overall integrated similarity score between miRNA m(i) and m(j) is as follows:

$$SM(m(i), m(j)) = \begin{cases} FS(m(i), m(j)) & \text{if the pair exists only in } FS \\ KM_2(m(i), m(j)) & \text{if the pair exists only in } KM_2 \\ \dfrac{FS(m(i), m(j)) + KM_2(m(i), m(j))}{2} & \text{if the pair exists in both } FS \text{ and } KM_2 \\ KM_1(m(i), m(j)) & \text{otherwise} \end{cases}$$

Figure 3. The flowchart of the proposed DeepMDA. The miRNA similarity was integrated using miRNA functional similarity and miRNA-disease associations. As for disease similarity, we adopted DAG information and Gaussian interaction profile similarity information. The two inputs were fed into two stacked autoencoders to learn high-level features, which were then merged and passed to a 3-layer network to infer the association between miRNAs and diseases.
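The four integration cases for miRNA similarity can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: the function name and the boolean coverage vectors `in_FS` and `in_KM2` are assumptions, and a pair is treated as existing in a matrix when both of its miRNAs are covered by that matrix.

```python
import numpy as np

def integrate_mirna_similarity(FS, KM1, KM2, in_FS, in_KM2):
    """Combine FS, KM2 and KM1 pair by pair.

    FS, KM1, KM2  : (n, n) arrays sharing the same miRNA order
    in_FS, in_KM2 : boolean vectors, True if the miRNA is covered by FS / KM2
    """
    FS, KM1, KM2 = (np.asarray(M, dtype=float) for M in (FS, KM1, KM2))
    pair_FS = np.outer(in_FS, in_FS).astype(bool)     # both miRNAs covered by FS
    pair_KM2 = np.outer(in_KM2, in_KM2).astype(bool)  # both miRNAs covered by KM2
    both = pair_FS & pair_KM2
    return np.where(both, (FS + KM2) / 2.0,           # in both: average
           np.where(pair_FS, FS,                      # only in FS
           np.where(pair_KM2, KM2,                    # only in KM2
                    KM1)))                            # in neither: fall back to KM1

# Toy example with constant matrices so each case is easy to recognize
n = 3
FS, KM1, KM2 = np.full((n, n), 0.2), np.full((n, n), 0.4), np.full((n, n), 0.8)
SM = integrate_mirna_similarity(FS, KM1, KM2,
                                in_FS=[True, True, False],
                                in_KM2=[False, True, True])
```

Entries of `FS` or `KM2` that are not covered can hold any placeholder value, because `np.where` never selects them.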
Scientific Reports | 7: 14482 | DOI:10.1038/s41598-017-15235-6
Other similarity for small dataset of miRNAs. In addition, we also obtained two miRNA similarity matrices, SM_F and SM_T, from the miRNA functional similarity matrix FS and from the miRNA Gaussian profile similarity matrix KM_2 gained from the miRNA-target associations, respectively. Specifically, the 255 miRNAs in SM_F appear in both HMDD and the miRNA functional similarity matrix FS. Similarly, the 152 miRNAs in SM_T exist in both HMDD and the miRNA-target associations. These two similarity matrices SM_F and SM_T were also used to train and test our model, and their performance was evaluated using the same procedure as was used for SM.
Disease semantic similarity. Many diseases' MeSH descriptors are collected in the MeSH database, which can be downloaded from the National Library of Medicine (http://www.nlm.nih.gov) 77 . Each disease can be described as an entry item in a Directed Acyclic Graph (DAG), such as DAG(D) = (D, T(D), E(D)), where T(D) stands for the node set that includes node D itself and its ancestor nodes, and E(D) represents the corresponding edge set that directly links the parent nodes to the child nodes. Here, we chose the MeSH descriptors starting with the capital letter "C" to acquire the diseases and construct the disease DAGs. Each tree number corresponds to a specific position in the DAG collected from each MeSH descriptor. In the traditional disease semantic similarity calculation 78 , disease terms in the same layer contribute equally to the semantic value of a disease (disease A, for example). However, if two disease terms (disease A and B) occur in the same layer of the disease DAGs but their frequency varies across all the DAGs, this causes an inaccurate measurement of the contributions of the two disease terms. Consequently, we adopted an alternative way to calculate the semantic value based on the assumption that a more frequent disease term should have a greater contribution to the semantic value of disease A, which is shown as follows:

$$\frac{\text{the number of DAGs including } t}{\text{the number of diseases}}$$
The semantic value of disease A was calculated by summing the contribution from all the disease terms in DAG(A).
Finally, the semantic similarity between diseases A and B was obtained by summing the contributions of the disease terms shared by their two DAGs.
Gaussian interaction profile kernel similarity for diseases. Similar to the miRNA Gaussian similarity matrix construction, the disease Gaussian interaction profile kernel similarity matrices were also computed using three association matrices: the miRNA-disease association matrix, the lncRNA-disease association matrix, and the gene-disease association matrix. Three Gaussian interaction profile kernel matrices, KD_1, KD_2 and KD_3, were obtained and integrated into an overall Gaussian interaction profile kernel similarity matrix for the 383 diseases. The integrated disease Gaussian interaction profile kernel similarity matrix can be used to adjust our model and improve its performance, which will be discussed in the next section.
DeepMDA. In this study, we proposed a deep ensemble framework for miRNA-disease association prediction (DeepMDA). DeepMDA is a neural network structure composed of two parts, the stacked autoencoders and a fully connected classifier, and it proceeds in three steps. First, for every miRNA-disease pair a_md(i, j), we assigned the label 1 to all the known miRNA-disease association pairs (positive samples) and 0 otherwise. Second, the ith row of the miRNA similarity matrix (i.e., the similarity data between miRNA i and all the other miRNAs) was fed into a stacked autoencoder to learn another representation. The jth row of the disease similarity matrix and the jth row of the integrated disease Gaussian interaction profile similarity matrix (i.e., the similarity data between disease j and all the other diseases) were concatenated into a feature vector and regarded as an independent training sample, which was fed to another stacked autoencoder. The autoencoder is a stacked deep neural network that can be trained to learn high-level biological patterns and has already been applied in bioinformatics fields such as yeast microarray analysis 79 and DNA methylation state prediction 48 . Third, the two separate features were merged and fed into a three-layer fully connected neural network to predict the label of each pair sample, which indicated whether the pair has a connection or not. The flowchart of DeepMDA is shown in Fig. 3.

Stacked autoencoder.
The nm miRNA samples correspond to nm rows of miRNA similarity data and the nd disease samples correspond to nd rows of the disease similarity data. These were fully connected to form a large dataset of nm × nd samples. These pair-wise samples were separately fed into two autoencoders consisting of multiple layers. Autoencoders have been widely used in capturing complex biological patterns 80 .
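The pairing of rows into nm × nd samples can be illustrated with the sketch below. The function name `build_pair_samples` is hypothetical; `disease_sim` stands for whatever per-disease feature rows are used (for example, the concatenation of the semantic and Gaussian similarity rows described earlier).

```python
import numpy as np

def build_pair_samples(mirna_sim, disease_sim, assoc):
    """Return one (miRNA row, disease row, label) triple per miRNA-disease pair."""
    nm, nd = assoc.shape
    mi, dj = np.meshgrid(np.arange(nm), np.arange(nd), indexing="ij")
    mi, dj = mi.ravel(), dj.ravel()   # pair k is (mi[k], dj[k]), miRNA-major order
    X_mirna = mirna_sim[mi]           # input of the miRNA autoencoder
    X_disease = disease_sim[dj]       # input of the disease autoencoder
    y = assoc[mi, dj]                 # 1 for known associations, 0 otherwise
    return X_mirna, X_disease, y

# Toy example with nm = 3 miRNAs and nd = 2 diseases
rng = np.random.default_rng(0)
mirna_sim = rng.random((3, 3))
disease_sim = rng.random((2, 2))
assoc = np.array([[1, 0], [0, 0], [1, 1]])
X_m, X_d, y = build_pair_samples(mirna_sim, disease_sim, assoc)
```

With the full dataset this yields 495 × 383 samples, of which only the 5430 known associations carry the label 1.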
Assume an input x has d dimensions. Its mapping is constructed as follows:

$$y = f(Wx + b)$$

where W is the weight matrix, b is the bias, and f is a non-linear function that maps the linear result of x to a non-linear space. The output y is then projected back to form the reconstruction output z, which has the same shape as the original input x. The equation is as follows:

$$z = g(W'y + b') \qquad (11)$$

where W' is the reconstruction-weighting matrix, b' is the reconstruction bias, and g is another non-linear function of the same form as f. The reconstruction procedure requires an error measure; therefore, we chose the mean squared error between x and z, which can be optimized using stochastic gradient descent (SGD) 81 . All the parameters of the network were learned with greedy layer-wise training, which learns the parameters of one layer while freezing the parameters of the other layers.
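A single encoding and reconstruction step, together with the mean squared error, can be sketched as below. This is an untrained forward pass for illustration only: the sigmoid choice for f and g, the layer sizes and the initial values are assumptions, and samples are stored as rows, so the product is written `x @ W` rather than Wx.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def autoencoder_forward(x, W, b, W_rec, b_rec, f=sigmoid, g=sigmoid):
    """Encode x into y, then reconstruct z with the same shape as x."""
    y = f(x @ W + b)              # hidden representation
    z = g(y @ W_rec + b_rec)      # reconstruction of the input
    return y, z

def mse(x, z):
    """Mean squared reconstruction error between input and reconstruction."""
    return np.mean((x - z) ** 2)

rng = np.random.default_rng(0)
d, h = 8, 3                                   # input and hidden sizes (illustrative)
x = rng.random((5, d))                        # 5 samples with d features each
W, b = rng.normal(0.0, 0.05, (d, h)), np.zeros(h)
W_rec, b_rec = rng.normal(0.0, 0.05, (h, d)), np.zeros(d)
y, z = autoencoder_forward(x, W, b, W_rec, b_rec)
loss = mse(x, z)
```

In greedy layer-wise training, one such layer is optimized on this loss at a time, and its output y then serves as the input of the next layer.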
In this study we did not utilize a commonly used deep network module, such as a convolutional neural network (CNN) 47 or a recurrent neural network (RNN) 82 , because our input was purely a similarity item corresponding to a miRNA or a disease without sequence or positional information. Therefore, CNN and RNN do not show great improvements and introduce a large computational cost. As an alternative, we used fully connected layers with an activation function and dropout layers 83 to construct an autoencoder. Dropout layers mainly help to avoid over-fitting by randomly dropping some neuron units. The dropout rates were all set to 0.5 during model training. Finally, for a pairwise sample, we obtained two sets of extracted high-level features using the autoencoders; these were input to the classifier to make the final predictions.
Deep neural network. After the two autoencoders extracted two parts of high-level features, they were concatenated to form an integrated sample feature vector. Altogether, there were nm × nd samples, and the label of each sample was 1 if the miRNA-disease pair has a connection according to the known relationships in the miRNA-disease association matrix, and 0 otherwise. The combined feature vector was then fed into a feed-forward neural network consisting of three fully connected layers. We set the output dimension of each autoencoder to 64; therefore, a 128-dimensional feature vector was fed to the network. In the fully connected part, a three-layer neural network was implemented to obtain the final prediction of the association between each miRNA and disease. The number of layers was chosen experimentally, and the best results were obtained when the three-layer network was utilized. A miRNA-disease pair whose predicted association probability exceeded the threshold was considered a potential association, and vice versa.
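The forward pass of such a classifier can be sketched as follows. Only the 64 + 64 = 128 input dimension comes from the text; the hidden widths (64 and 32), the 0.5 decision threshold and the random feature values are hypothetical, and dropout is omitted because it is only active during training.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_layer(rng, n_in, n_out):
    """Gaussian weights (std 0.05) and uniform biases on (-1, 0)."""
    W = rng.normal(0.0, 0.05, size=(n_in, n_out))
    b = rng.uniform(-1.0, 0.0, size=n_out)
    return W, b

def predict(features, layers):
    """ReLU on the hidden layers, sigmoid on the single output unit."""
    h = features
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    W, b = layers[-1]
    return sigmoid(h @ W + b).ravel()

rng = np.random.default_rng(0)
sizes = [128, 64, 32, 1]                       # hidden widths are hypothetical
layers = [init_layer(rng, a, c) for a, c in zip(sizes[:-1], sizes[1:])]
mirna_feat = rng.random((4, 64))               # stand-in for miRNA autoencoder output
disease_feat = rng.random((4, 64))             # stand-in for disease autoencoder output
scores = predict(np.concatenate([mirna_feat, disease_feat], axis=1), layers)
candidates = scores > 0.5                      # threshold value is an assumption
```

The network here is untrained, so the scores are meaningless; the sketch only shows the shapes and the order of operations.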
All the neuron units in layer i were connected to the previous layer (i − 1) and generated outputs using a non-linear transformation function f. The ReLU activation function is a non-linear function that can capture hidden patterns within the data 46 and can reduce gradient vanishing at the same time. Dropout was also used after every fully connected layer to avoid overfitting. The final output utilized the sigmoid function to make the prediction for each sample, which is shown as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

To train the model, we need to minimize the objective function, i.e., the loss. We chose a frequently used one, the cross-entropy cost function C 84 .

$$C = -\sum_{x} \sum_{t} y_{x,t} \ln a_{x,t}$$

where C is the output of the cross-entropy cost function, x is the index of the training examples and t is the index of the labels (0 or 1); y_{x,t} is 1 if t is the true label of sample x and 0 otherwise, and a_{x,t} is the predicted output of the model for label t given input sample x. The closer the predicted outputs are to the true values, the smaller C becomes. As the cross-entropy function is non-negative, our goal is to minimize the function to get the best prediction. Neural network models were trained using the Keras 1.0.1 library (https://github.com/fchollet/keras) with Tensorflow as the backend. The ADADELTA algorithm 85 with a mini-batch size of 200 was used to minimize the loss on the training set. The batch size was set to 200 because the model achieved the best performance with this setting. All the weights were initialized using a Gaussian distribution with a standard deviation of 0.05, and the corresponding biases were initialized from a uniform distribution on (−1.0, 0.0), as is typical. A computer with an NVIDIA Tesla K80 GPU was used to train the model. The Python code and the datasets are all available at https://github.com/sperfu/DeepMDA.
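For a single sigmoid output, where the predicted probability of label 0 is one minus that of label 1, summing over the two labels gives the familiar binary cross-entropy. A minimal sketch follows; the function name, the clipping constant `eps` and the averaging over samples (instead of summing) are assumptions made for numerical convenience.

```python
import numpy as np

def cross_entropy(y_true, a_pred, eps=1e-12):
    """Binary cross-entropy averaged over samples."""
    y = np.asarray(y_true, dtype=float)
    a = np.clip(np.asarray(a_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))

good = cross_entropy([1, 0], [0.9, 0.1])   # confident and correct predictions
bad = cross_entropy([1, 0], [0.1, 0.9])    # confident and wrong predictions
```

Predictions close to the true labels give a small cost, while confident mistakes are penalized heavily, which is what drives the optimizer towards the true labels.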