DisSim: an online system for exploring significant similar diseases and exhibiting potential therapeutic drugs

The similarity of pair-wise diseases reveals the molecular relationships between them. For example, similar diseases have the potential to be treated by common therapeutic chemicals (TCs). In this paper, we introduce DisSim, an online system for exploring similar diseases and comparing their corresponding TCs. Currently, DisSim implements five state-of-the-art methods to measure the similarity between Disease Ontology (DO) terms and provides the significance of each similarity score. Furthermore, DisSim integrates TCs of diseases from the Comparative Toxicogenomics Database (CTD), which can help to identify potential relationships between TCs and similar diseases. The system can be accessed from http://123.59.132.21:8080/DisSim.

Scientific Reports | 6:30024 | DOI: 10.1038/srep30024

similarity score. In this paper, we present DisSim, an online tool that computes the similarity between DO terms using both semantic-based and functional-based methods (Resnik, Lin, Wang, PSB, and SemFunSim). In addition, the system calculates the P-value of each similarity score to indicate its significance. Furthermore, DisSim compares the TCs of similar diseases. The system is freely available at http://123.59.132.21:8080/DisSim.

Results
Two tools, SimDisExplore and SimPDExplore, are provided in DisSim. Their usage is described below.
A case for using SimDisExplore. SimDisExplore can be accessed at http://123.59.132.21:8080/DisSim/single.jsp. Figure 1 shows a case of searching for diseases similar to 'acute myocardial infarction' and finding the correlated TCs of 'acute myocardial infarction' and its similar diseases.

Searching for similar diseases of 'acute myocardial infarction'.
Step 1: Input a disease term. The Disease Ontology (DO) terms in DisSim (see Materials and Methods) can be used as a reference when entering a disease term, and they can be downloaded from the page. For convenience, the input box also autocompletes the disease term. In this case, we explored diseases similar to 'acute myocardial infarction'.
Step 2: Select a threshold for the similarity score. For an input disease term, the system returns its similar diseases with a P-value less than 0.10 or 0.05, according to the user's selection. In this case, we chose a P-value less than 0.05 as the threshold.
Step 3: Select disease similarity algorithms. For an input disease term, five algorithms can be used to explore its similar diseases in DisSim. Users can select multiple algorithms as needed, and can also select the frequency, that is, the minimum number of selected algorithms that must identify a pair as similar diseases. In this case, all five algorithms are selected and the frequency is set to '1 or more', which means that similar diseases identified by any one of the five algorithms are shown.
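The combination of the P-value threshold (Step 2) and the frequency setting (Step 3) can be sketched as follows. This is only an illustration, not DisSim's code: the function names, the dictionary layout, and the toy P-values are assumptions made for the example, and a missing value (None) stands for an algorithm that reported no significant score.

```python
from typing import Dict, List, Optional

# The five algorithm names come from the paper; everything else is hypothetical.
ALGORITHMS = ["Resnik", "Lin", "Wang", "PSB", "SemFunSim"]


def supporting_algorithms(
    p_values: Dict[str, Optional[float]], selected: List[str], threshold: float
) -> List[str]:
    """Return the selected algorithms whose P-value is below the threshold."""
    return [
        name
        for name in selected
        if p_values.get(name) is not None and p_values[name] < threshold
    ]


def filter_similar_diseases(
    candidates: Dict[str, Dict[str, Optional[float]]],
    selected: List[str],
    threshold: float = 0.05,
    min_frequency: int = 1,
) -> Dict[str, List[str]]:
    """Keep candidates supported by at least `min_frequency` selected algorithms."""
    kept = {}
    for disease, p_values in candidates.items():
        support = supporting_algorithms(p_values, selected, threshold)
        if len(support) >= min_frequency:
            kept[disease] = support
    return kept


# Toy P-values, invented purely for illustration.
toy_candidates = {
    "disease_x": {"Resnik": 0.20, "Lin": 0.30, "Wang": 0.40, "PSB": 0.001, "SemFunSim": 0.08},
    "disease_y": {"Resnik": 0.01, "Lin": 0.02, "Wang": 0.03, "PSB": 0.04, "SemFunSim": 0.01},
    "disease_z": {"Resnik": 0.50, "Lin": 0.60, "Wang": 0.70, "PSB": 0.80, "SemFunSim": None},
}

any_one = filter_similar_diseases(toy_candidates, ALGORITHMS, 0.05, min_frequency=1)
all_five = filter_similar_diseases(toy_candidates, ALGORITHMS, 0.05, min_frequency=5)
```

With the frequency set to 1, a disease supported by a single algorithm is kept; raising it to 5 keeps only diseases on which all five algorithms agree.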
After the input page is submitted, the diseases similar to the input disease term under the selected algorithms are listed. The first column represents the number of similar diseases. The second and third columns represent the input disease and its similar diseases, respectively. The last column is the link to the network visualization of the relationships among the TCs and the pair of diseases in that row. Each of the other columns lists the similarity score based on one algorithm and the P-value of that score. All the results can be downloaded from the page.
In this case, the seventh column of the first line is '0.223358238699015 (6.21155725364365e-07)', which means that the similarity score between 'acute myocardial infarction' and 'disease of metabolism' based on the PSB method is 0.223358238699015 and the P-value of this score is 6.21155725364365e-07. The eighth column is null, which means that the P-value of the similarity score between 'acute myocardial infarction' and 'disease of metabolism' based on the SemFunSim method is more than 0.05.
Finding potential TCs of 'acute myocardial infarction' and its similar diseases. Clicking the link in the last column of the results page opens the network visualization of the relationships among TCs and diseases. In this page, each red node represents a disease and each white node represents a chemical. An edge links each disease to each of its TCs. The network showing the connections between TCs and diseases is visualized by the Cytoscape Web plugin 18. Furthermore, the potential TCs of diseases are sorted and can be downloaded from the page.
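The node and edge structure described above can be sketched as a small data-building step. This sketch does not use the Cytoscape Web API; the dictionary keys, function names, and placeholder disease and chemical names are assumptions chosen for illustration only.

```python
from typing import Dict, List, Set


def build_network(tcs_by_disease: Dict[str, Set[str]]) -> Dict[str, List[dict]]:
    """Build red disease nodes, white chemical nodes, and disease-TC edges."""
    nodes: List[dict] = []
    edges: List[dict] = []
    for disease in tcs_by_disease:
        nodes.append({"id": disease, "kind": "disease", "color": "red"})
    # Each chemical gets one node, even when several diseases share it.
    chemicals = sorted({c for tcs in tcs_by_disease.values() for c in tcs})
    for chemical in chemicals:
        nodes.append({"id": chemical, "kind": "chemical", "color": "white"})
    for disease, tcs in tcs_by_disease.items():
        for chemical in sorted(tcs):
            edges.append({"source": disease, "target": chemical})
    return {"nodes": nodes, "edges": edges}


def shared_chemicals(tcs_by_disease: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Chemicals linked to both diseases, i.e. the common TCs of the pair."""
    return tcs_by_disease[a] & tcs_by_disease[b]


# Placeholder data, not taken from CTD.
toy_tcs = {
    "disease_1": {"chemical_A", "chemical_B"},
    "disease_2": {"chemical_B", "chemical_C"},
}
network = build_network(toy_tcs)
common = shared_chemicals(toy_tcs, "disease_1", "disease_2")
```

A chemical shared by both diseases appears once as a node with two incident edges, which is what makes common TCs easy to spot in the visualization.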
In this case, we obtained the network among 'acute myocardial infarction', 'disease of metabolism', and their TCs. The network shows that there are no common TCs between 'acute myocardial infarction' and 'disease of metabolism'. This is because 'acute myocardial infarction' is not documented in CTD, so the TCs of this disease cannot be obtained from CTD directly. With our system, the potential TCs of 'acute myocardial infarction' based on its similar diseases can be sorted and downloaded. 'milrinone' was one of the potential TCs predicted by our system, and it has been validated for treating 'acute myocardial infarction' 19.
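The idea of deriving potential TCs for a disease from its similar diseases can be sketched as below. The text does not state how DisSim sorts potential TCs, so the ranking used here (the number of similar diseases that list a chemical, with ties broken alphabetically) is an assumption, as are the function name and the placeholder data.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def rank_potential_tcs(
    similar_diseases: Iterable[str],
    tcs_by_disease: Dict[str, Set[str]],
    known_tcs: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, int]]:
    """Rank chemicals by how many similar diseases list them as a TC.

    Assumed criterion: the source only says that potential TCs are sorted.
    """
    counts: Counter = Counter()
    for disease in similar_diseases:
        for chemical in tcs_by_disease.get(disease, ()):
            if chemical not in known_tcs:  # skip TCs already known for the query
                counts[chemical] += 1
    # Highest support first; alphabetical order keeps the result deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder data, not taken from CTD.
toy_tcs = {
    "similar_1": {"chemical_A", "chemical_B"},
    "similar_2": {"chemical_B", "chemical_C"},
    "similar_3": {"chemical_B"},
}
ranking = rank_potential_tcs(["similar_1", "similar_2", "similar_3"], toy_tcs)
```

For a query disease with no TCs on record, `known_tcs` is empty and every chemical of the similar diseases becomes a candidate.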
A case for using SimPDExplore. SimPDExplore can be accessed at http://123.59.132.21:8080/DisSim/pairs.jsp. Figure 2 shows a case of searching for the similarity score and finding the correlated TCs between 'liver disease' and 'gallbladder disease'. Unlike SimDisExplore, which searches for all diseases similar to one input disease, SimPDExplore calculates the similarity of a given pair of diseases.

Discussion
A recent study showed that the similarity of diseases could serve to predict potential TCs of diseases 1. Although multiple systems have been implemented for calculating disease similarity, few of them provide potential TCs of diseases and none of them gives the significance of the similarity score. In this study, we provide a web interface, DisSim (http://123.59.132.21:8080/DisSim), for calculating disease similarity based on five state-of-the-art methods and for reporting the significance of each score. By integrating TCs of diseases from the CTD, DisSim can help to identify potential relationships between TCs and similar diseases.
All five state-of-the-art methods exploit the semantic associations of terms in the ontology to calculate disease similarity (see Supplementary Methods). Wang's method does not depend on any other associations. In comparison, Resnik's and Lin's methods use information content to measure disease similarity, which incorporates the number of disease-related genes. The PSB and SemFunSim methods further introduce functional associations of genes in the aspects of the biological process and multiple views, respectively.
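To make the role of information content concrete, the following sketch computes Resnik and Lin similarities on a toy ontology. It uses the standard definitions (Resnik: the information content of the most informative common ancestor; Lin: twice that value divided by the sum of the two terms' information contents). The toy terms, gene sets, and the choice to estimate term probability from propagated gene counts are assumptions for illustration; the exact formulation used in DisSim is given in its Supplementary Methods.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy ontology (child -> parents) and directly annotated genes.
PARENTS: Dict[str, List[str]] = {
    "root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"],
}
GENES: Dict[str, Set[str]] = {
    "root": set(), "A": {"g1"}, "B": {"g2", "g3"}, "A1": {"g4"}, "A2": {"g5", "g6"},
}


def ancestors(term: str) -> Set[str]:
    """The term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagated_genes(term: str) -> Set[str]:
    """Genes annotated to the term or to any of its descendants."""
    return set().union(*(GENES[t] for t in PARENTS if term in ancestors(t)))


def ic(term: str) -> float:
    """Information content: -log of the fraction of genes under the term."""
    count = len(propagated_genes(term))
    if count == 0:
        raise ValueError(f"no genes annotated under {term!r}")
    return -math.log(count / len(propagated_genes("root")))


def resnik(a: str, b: str) -> float:
    """IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def lin(a: str, b: str) -> float:
    """Resnik score normalized by the IC of both terms."""
    denom = ic(a) + ic(b)
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0
```

In this toy ontology, `A1` and `A2` share the ancestor `A`, so their Resnik score equals the information content of `A`, while `A1` and `B` share only the root and score zero.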
The advantage of each method mainly depends on the associations it uses and on how reasonable its design is. All five methods are frequently used and have been sufficiently verified, which indicates that they are reasonably designed. Semantic associations of terms are sourced from the ontology, which focuses on relationships between terms at the phenotype level and is manually established by domain experts. Therefore, a method based on semantic associations could be affected by the domain knowledge of the experts and by the structure of the ontology. In comparison, associations between diseases and genes and functional associations between genes mainly focus on the molecular level. These types of associations are more direct but harder to identify. In theory, a method based on more types of associations, such as SemFunSim, could be more complete and more comprehensive. In contrast, because each type of association is incomplete, a method based on fewer types of associations, such as Wang's method, could introduce less bias. Resnik's, Lin's, and the PSB methods could be used in other cases.

Materials and Methods
Data Collection. The data sets of DisSim come from open-source databases and are listed in Table 1. Among them, disease-related genes are from CTD 17, Gene Reference into Function (GeneRIF) 20, Online Mendelian Inheritance in Man (OMIM) 6, and the Genetic Association Database (GAD) 21. Disease terms of these databases were mapped to DO according to SIDD 22. Functional associations between genes are from Gene Ontology (GO), GO Annotation (GOA) 23,24, and HumanNet 25. The TCs of diseases are from CTD 17.
The significance of the similarity score. Existing methods mainly concentrate on calculating the similarity score between DO terms, and few of them provide the significance of that score. In DisSim, five state-of-the-art methods (Resnik, Lin, Wang, PSB, and SemFunSim) are implemented for calculating the similarity score between DO terms. For each of these methods, the P-value was calculated as follows: 1) first, we calculated the similarity scores for all pairs of diseases; 2) next, we converted these similarity scores to z-scores; 3) then, a one-sided P-value was obtained for each similarity score; 4) finally, the P-values were adjusted by the BH method 26. A similarity score is deemed significant when its adjusted P-value is less than the selected threshold of 0.10 or 0.05.
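The four steps above can be sketched with the Python standard library. Several details are assumptions because the text does not specify them: the z-scores use the population mean and standard deviation of all scores, the one-sided P-value is the upper tail of a standard normal distribution, and BH is taken to be the Benjamini-Hochberg procedure. The function names and toy scores are hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from typing import List


def z_scores(scores: List[float]) -> List[float]:
    """Standardize the scores (assumes population standard deviation)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(s - mu) / sigma for s in scores]


def upper_tail_p(zs: List[float]) -> List[float]:
    """One-sided P-value: probability of a z-score at least this large."""
    normal = NormalDist()
    return [1.0 - normal.cdf(z) for z in zs]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def similarity_p_values(scores: List[float]) -> List[float]:
    """Steps 2 to 4: z-scores, one-sided P-values, BH adjustment."""
    return benjamini_hochberg(upper_tail_p(z_scores(scores)))


# Toy similarity scores for five disease pairs (step 1 is assumed done).
toy_scores = [0.10, 0.12, 0.11, 0.09, 0.90]
toy_adjusted = similarity_p_values(toy_scores)
significant = [p < 0.05 for p in toy_adjusted]
```

The pair with the clearly highest score receives the smallest adjusted P-value; whether it falls below 0.05 depends on how many pairs are tested, since the BH adjustment scales with the number of comparisons.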
The exhibition of potential TCs of diseases. The TCs of diseases from CTD are integrated into DisSim, and the TCs of pair-wise diseases can be compared by the system in order to find the correlated TCs of the disease pair.
Based on comparing TCs of diseases, DisSim can provide potential TCs of diseases. According to the hypothesis that similar diseases can be treated by common TCs 2, the TCs of one disease can be used as potential TCs of its similar diseases 1. For a disease without TCs in CTD, the TCs of its similar diseases can be easily accessed from DisSim.
Implementation. DisSim is implemented on a JavaEE framework and runs on a UCloud web server with 2-core (2.26 GHz) processors 27. The four-layer architecture, comprising the DATABASE, ALGORITHM, TOOLS, and VIEW layers, is shown in Fig. 3 and described as follows.
DATABASE layer. This layer stores DO 8, disease-related genes, functional associations between genes, and therapeutic chemicals (TCs) of diseases. Among them, DO, disease-related genes, and functional associations between genes are used by the ALGORITHM layer for calculating the similarity between diseases.
ALGORITHM layer. Five algorithms for measuring the similarity between DO terms are implemented: Resnik, Lin, Wang, PSB, and SemFunSim.
TOOLS layer. Two tools, SimDisExplore and SimPDExplore, are provided for exploring the similarity score between diseases. SimDisExplore is used to explore the diseases similar to an input disease term. In comparison, SimPDExplore calculates the similarity for a given pair of diseases. Both SimDisExplore and SimPDExplore provide the function for comparing TCs of diseases.
VIEW layer. Web pages are provided for viewing the results. They show the similarity of pair-wise diseases and the P-value of the similarity score. They also provide network visualization of the relationships among TCs and a pair of similar diseases.