Drug-Drug Interaction Predicting by Neural Network Using Integrated Similarity

Drug-drug interaction (DDI) prediction is one of the most critical issues in drug development and health. Proposing appropriate computational methods for predicting unknown DDIs with high precision is challenging. We propose "NDD: Neural network-based method for drug-drug interaction prediction" for predicting unknown DDIs using various information about drugs. Multiple drug similarities are calculated based on drug substructure, target, side effect, off-label side effect, pathway, transporter, and indication data. NDD first applies a heuristic similarity selection process and then integrates the selected similarities with a nonlinear similarity fusion method to obtain high-level features. It then uses a neural network for interaction prediction. The similarity selection and similarity integration parts of NDD were proposed in previous studies of other problems. Our novelty is to combine these parts with a new neural network architecture and to apply these approaches in the context of DDI prediction. We compared NDD with six machine learning classifiers and six state-of-the-art graph-based methods on three benchmark datasets. NDD achieved superior performance in cross-validation, with AUPR ranging from 0.830 to 0.947, AUC from 0.954 to 0.994, and F-measure from 0.772 to 0.902. Moreover, cumulative evidence from case studies on numerous drug pairs further confirms the ability of NDD to predict unknown DDIs. The evaluations corroborate that NDD is an efficient method for predicting unknown DDIs. The data and implementation of NDD are available at https://github.com/nrohani/NDD.


Supplementary File #1 - The results of the neighbor recommender and label propagation on each type of similarity
Supplementary Tables S1 and S2 summarize the performance criteria calculated for the neighbor recommender and label propagation methods trained on different similarity matrices in DS1.


The DrugBank evidence list for false positive DDIs is presented in Supplementary Table S7. "No" means that no evidence for the DDI was found in DrugBank.

Supplementary File #3 - Two case studies
Case Study 1: Interaction between "DB00798; Gentamicin" and "DB00493; Cefotaxime"
Gentamicin has high activity against Gram-positive and Gram-negative bacteria. It is a novel antibiotic compound from Micromonospora 1,2. It is used in cases of severe infection 3 and in the treatment of febrile patients who have cancer and granulocytopenia 4. Cefotaxime is the first-choice antibiotic for primary treatment of spontaneous bacterial peritonitis in cirrhosis with serious infections 5,6. Cefotaxime has also been verified to influence growth stimulation, regeneration, and the induction of embryogenesis 7.
A growing body of evidence in the literature confirms the interaction of these two drugs. Some of this evidence is presented here.
• An extensive study of this interaction was conducted by Bryan et al. 8 and published in the American Journal of Diseases of Children.
• Murry et al. 9 carried out a broad investigation of the interaction between these two drugs and demonstrated resistance to Gentamicin in patients who were taking Cefotaxime.
• An in vitro study conducted by Elliott et al. 10 suggests that Cefotaxime and Gentamicin may provide some activity against enterococci.

Case Study 2: Interaction between "DB00945; Acetylsalicylic acid (Aspirin)" and "DB00424; Hyoscyamine"
Acetylsalicylic acid is prescribed in diverse situations to alleviate pain, fever, and inflammation. It can help patients who have had a transient ischaemic attack and stroke to decrease their stroke risk 11. It also has some effect in reducing the risk of colorectal cancer and some other types of cancer 12. Hyoscyamine is commonly found in the stems and leaves of young Datura stramonium plants 13. It is helpful in the treatment of disorders of the gastrointestinal (GI) tract. It is used in therapies for patients with irritable bowel syndrome, peptic ulcer disease, bladder spasms, colic, pancreatitis, and diverticulitis. Its influence in the treatment of Parkinson's disease has been demonstrated 14,15. Hyoscyamine is effective in short-term tremor abatement 15.
There is an abundant body of evidence for an interaction between Aspirin and Hyoscyamine. First, we consulted the "Drugs" database, which is the most popular, wide-ranging, and up-to-date source of drug information online 16. This database reports that Aspirin interacts with Hyoscyamine; the entries are available at 17 and 18. There is also evidence in "RxList", an online medical resource dedicated to offering detailed and current pharmaceutical information on brand and generic drugs 19. The information at 20 verifies the interaction of these drugs.

Supplementary File #4 - Performance of dimension reduction methods on DS1
We tried to reduce the feature dimensions by applying and evaluating multiple methods. The evaluation criteria in Supplementary Table S8 show that common dimension reduction methods do not improve the model.

Supplementary File #5 - About similarity types
The following information is compiled from the literature and the internet.
Similarity measures:
• Chemical-based: "Canonical simplified molecular input line entry specification (SMILES) of the drug molecules were downloaded from DrugBank. Hashed fingerprints were computed using the Chemical Development Kit (CDK) with default parameters. The similarity score between two drugs is computed based on their fingerprints according to the two-dimensional Tanimoto score, which is equivalent to the Jaccard score of their fingerprints, that is, the size of the intersection over the union when viewing each fingerprint as specifying a set of elements." 21
• Ligand-based: The Similarity Ensemble Approach (SEA) relates protein receptors based on the chemical 2D similarity of the ligand sets modulating their function. Given a drug's canonical SMILES, the SEA search tool compares it against a compendium of ligand sets and computes E-values for those ligand sets. To compute a drug-drug similarity, drugs are queried using their canonical SMILES on the SEA tool. To obtain robust results, each drug is queried against the two ligand databases provided in the tool (MDL Drug Data Report and WOMBAT) using two different methods to compute the drug fingerprint (Scitegic ECFP4 and Daylight), resulting in four lists of similar ligand sets. Unifying the four lists and filtering drug-ligand set pairs with E-values > $10^{-5}$ yields a list of relevant protein receptor families for each drug. Finally, the similarity between a pair of drugs is computed as the Jaccard score between the corresponding sets of receptor families. 21
• Side-effect based: "Drug side effects were obtained from SIDER, an online database containing drug side effects associations extracted from package inserts using text mining methods. This list is augmented by side effect predictions for drugs that are not included in SIDER based on their chemical properties. Following this latter work, the similarity between drugs is defined according to the Jaccard score between either their known side effects or top 13 predicted side effects in case they are unknown." 21
• Annotation-based: The World Health Organization (WHO) ATC classification system is used. "This hierarchical classification system categorizes drugs according to the organ or system on which they act, their therapeutic effect, and their chemical characteristics. ATC codes were obtained from DrugBank. To define a similarity between ATC terms the semantic similarity algorithm of (Resnik, 1999) is used. This algorithm associates probabilities p(x) with all the nodes (i.e., ATC levels) x in the ATC hierarchy by computing the number of levels below x; it then calculates the similarity of two drugs as the maximum over all their common ancestors ATC level c of $-\log(p(c))$." 21
• Sequence-based: "Based on a Smith-Waterman sequence alignment score between the corresponding drug targets (proteins). Following the normalization suggested in Bleakley and Yamanishi (2009), the Smith-Waterman score is divided by the geometric mean of the scores obtained from aligning each sequence against itself." 21
• Closeness in a PPI network: "The distances between each pair of drug targets were calculated using an all-pairs shortest paths algorithm on the human PPI network. Distances were transformed to similarity values using the formula described in Perlman et al (2011): $S(p, p') = A e^{-D(p, p')}$, where S(p, p′) is the computed similarity value between two proteins, D(p, p′) is the shortest path between these proteins in the PPI network and A was chosen according to Perlman et al (2011) to be 0.9·e. Self-similarity was assigned a value of 1." 21
• GO based: "Semantic similarity scores between drug targets were calculated according to Resnik (1999), using the csbl.go R package selecting the option to use all three ontologies." 21
• Off-label side effect: "OFFSIDES is a side effect database built by mining FAERS system while controlling confounding factors such as concomitant medications, patient demographics, and patient medical histories. There are 1,332 drugs and 10,093 side effects in the dataset. We called side effects extracted from OFFSIDES as Off-Label Side Effect" 22.
The set of off-label side effects of each drug can be encoded as a binary vector. The similarity of two drugs is defined as the Jaccard score between their known off-label side effects.
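
As an illustration of the Jaccard score on binary vectors described above, the following minimal Python sketch builds a pairwise similarity matrix from binary feature profiles. It is not taken from the NDD implementation: the function names, the toy drug identifiers, and the conventions for all-zero profiles and for self-similarity are assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard (Tanimoto) score of two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # size of the intersection
    either = sum(1 for x, y in zip(a, b) if x or y)   # size of the union
    # Assumed convention: two all-zero profiles get similarity 0.
    return both / either if either else 0.0


def similarity_matrix(profiles):
    """Map {drug_id: binary vector} to a symmetric dict-of-dicts matrix."""
    ids = sorted(profiles)
    sim = {i: {j: 0.0 for j in ids} for i in ids}
    for i in ids:
        sim[i][i] = 1.0  # assumed convention: self-similarity is 1
    for i, j in combinations(ids, 2):
        sim[i][j] = sim[j][i] = jaccard(profiles[i], profiles[j])
    return sim


# Hypothetical toy profiles: each position marks the presence (1) or
# absence (0) of one side effect for the drug.
profiles = {
    "drug_A": [1, 1, 0, 0, 1],
    "drug_B": [1, 0, 0, 1, 1],
    "drug_C": [0, 0, 1, 0, 0],
}
sim = similarity_matrix(profiles)
```

In this toy example, drug_A and drug_B share two of the four side effects that either of them has, so their similarity is 0.5, while drug_C shares none with the others. The same function applies to any feature that can be encoded as a set, such as fingerprint bits, targets, or pathways.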

• Target similarity: "Targets are biological components that drugs interact with and alter their function to induce therapeutic effect(s)" 23. "If a carrier, transporter, enzyme or target of a given drug is occupied or changed by another drug, then the pharmacological activity of the given drug is changed" 24. "In order to calculate the similarity between two drugs, we extract the information from drug-target interactions. The underlying assumption made here is that two drugs that share more common target proteins are more similar" 25.
• Transporter similarity: "Drug transporters play an important role in modulating drug absorption, distribution, and elimination. Acting alone or in concert with drug metabolizing enzymes they can affect the pharmacokinetics and pharmacodynamics of a drug. This commentary will focus on the potential role that drug transporters may play in drug-drug interactions and what information may be needed during drug development and new drug application (NDA) submissions to address potential drug interactions mediated by transporters" 26. The similarity of a drug pair is defined as the Jaccard score between their known transporters.
• Indication similarity: "In medical terminology, an 'indication' for a drug refers to the use of that drug for treating a particular disease. For example, diabetes is an indication for insulin. Another way of stating this relationship is that insulin is indicated for the treatment of diabetes. Medications often have more than one indication, which means that there is more than one disease for which it is used. The Food and Drug Administration (FDA) classifies indications for drugs in the United States" 27. The similarity of a drug pair is defined as the Jaccard score between their known indications.
• Pathways similarity: "The pathways of drugs are important information for not only understanding the mechanisms of drug action and metabolism but also for drug repositioning, which finds new therapeutic indications for approved drugs and experimental drugs that fail approval in their initial indication. Therefore, databases collecting drug pathways are increasingly important. Currently, several databases have included drug pathways, such as DrugBank. These databases are useful to drug-related studies. However, most of the drug pathways in the above databases are pathways for drug action and drug metabolism. Actually, besides its targets, one drug can induce expression changes of a number of genes, and thus can deregulate a number of pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced a lot of drug-induced gene expression profiles. As a result, molecular activity generated from global gene expression profiling now emerges as a promising resource for drug research and then identify the drug-induced pathways" 28. The similarity of a drug pair is defined as the Jaccard score between their known pathways.
• Enzymes similarity: "Enzymes are the proteins in the drug design that act as drug targets for the diseases in the process of drug discovery and development" 29. "The majority of drugs which act on enzymes act as inhibitors and most of these are competitive, in that they compete for binding with the enzyme's substrate-for example the majority of the original (first generation) kinase inhibitors bind to the ATP pocket of the enzyme. Some inhibitors are non-competitive, binding away from the substrate binding domain, competing for co-factor/co-enzyme binding, or causing an allosteric conformational change in the 3-dimensional protein structure that prevents substrate interaction" 30. The similarity of a drug pair is defined as the Jaccard score between the known enzymes that the drugs affect.
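
Of the measures listed above, the closeness in a PPI network is the only one that converts a graph distance into a similarity. The sketch below illustrates that conversion on a toy network. It assumes that the transformation has the exponential form S(p, p′) = A·e^(−D(p, p′)) with A = 0.9·e, so that directly interacting proteins (D = 1) receive a similarity of 0.9. The toy graph, the function names, and the choice of 0 for proteins with no connecting path are assumptions of this example rather than details from the cited works.

```python
import math
from collections import deque

# Assumed scaling constant: with A = 0.9 * e, a distance of 1 maps to 0.9.
A = 0.9 * math.e


def shortest_path_lengths(graph, source):
    """Breadth-first search distances (in edges) from source, unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_similarity(graph, p, q):
    """Similarity of proteins p and q from their shortest-path distance."""
    if p == q:
        return 1.0  # self-similarity is assigned a value of 1
    d = shortest_path_lengths(graph, p).get(q)
    if d is None:
        return 0.0  # assumed convention: no connecting path gives similarity 0
    return A * math.exp(-d)


# Hypothetical toy PPI network given as an undirected adjacency list.
ppi = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2"],
    "P4": [],
}
direct = closeness_similarity(ppi, "P1", "P2")    # distance 1
two_hops = closeness_similarity(ppi, "P1", "P3")  # distance 2
isolated = closeness_similarity(ppi, "P1", "P4")  # no path
```

Under these assumptions the similarity decays by a factor of e with every additional step in the network. For a full network, an all-pairs shortest paths computation would replace the per-query breadth-first search used here.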