Clustering drug-drug interaction networks with energy model layouts: community analysis and drug repurposing

Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications.


Results
We take the drug-drug interaction data from DrugBank 4.1 24 (each individual drug has a corresponding list of drug interactions) to build a raw DDI in Gephi 25 , as presented on the top of Fig. 1. Each node corresponds to a distinct drug; a link between two drugs corresponds to an interaction between them according to the considered database. No information regarding the structural or functional properties of these drugs is used throughout the process. We opt not to discriminate between the types of drug interactions (i.e. synergistic or antagonistic), because any kind of interaction contributes to defining the functional profile of a drug category or cluster. Also, we use the older DrugBank 4.1 to build our DDI, so that a later version (DrugBank 4.3) can be used to validate the newly found pharmacological properties in a fair manner.
Then, we apply the procedure suggested in Fig. 1 to automatically assign distinct colors to distinct node modularity classes 26 and to generate topological clusters (or communities) with energy-model layout algorithm Force Atlas 2. As a result of applying the procedure from Fig. 1, we generate our DDI: the Community Based Drug-Drug Interaction Network (CBDDIN) from Fig. 2. The tools that generate CBDDIN are normally employed in social network analysis 26,27 , and the fact that we use them for DDIs is motivated by the topological similarity between our DDI and typical social networks. Indeed, the network statistics for CBDDIN are similar to typical social network statistics: small average path length L = 2.978 for a network diameter Dmt of 7, significant clustering coefficient C = 0.2, average degree 〈 k〉 = 20.031, modularity Mod = 0.452 and density Dst = 0.017. Also, Fig. 3 provides the main network centrality distributions, which indicate a scale-free network due to the power-law degree distribution from   The power law parameters, slope α and cutoff point X min , are provided for degree, betwenness and eigenvector distributions; for the closeness distribution, the best fit is Gaussian function = . Network centrality analysis. The structure of CBDDIN is dictated by drug interaction relationships.
Therefore, drugs that are the most prone to drug-drug interactions correspond to the CBDDIN nodes with the biggest network centrality values. Table 1 presents the top 10 drugs, in terms of drug-drug interaction centrality, based on degree, betweenness, closeness, page rank, and eigenvector. Analyzing degree and betweenness centrality distributions contributes to better assessing the interaction potential of each individual drug. Our Supplementary Information (section 1, Fig. 1) presents a visual representation of CBDDIN's node degree and betweenness.
Interpretation of drug communities. Topological communities highlighted in Fig. 2 are generated by Force Atlas 2 27 , whereas modularity classes (identified with distinct colors) are automatedly generated according to Girvan and Newman's algorithm 29 . Both modularity classes and topological communities are labeled according to a common pharmacological property which characterizes the largest majority of drugs within a distinct community or class.
We assign a pharmacological property to the segregated topological community or modularity class according to DrugBank's terminology, even if we do not use these properties for clusterization. In some cases the property label refers to an organ/system, in other cases to a medical indication, and in other cases to a chemical structure. However, we consider that we should not put functional restrictions to our annotation process; instead, we look only for the general property that identifies the largest number of drugs within the modularity class and topological community. Indeed, there are several drug categories within each modularity class and topological community (as respectively presented in Tables 1 and 2 from the Supplementary Information file), but we generalized the included drug categories according to DrugBank terminology, referring to either pharmacodynamic or pharmacokinetic properties (see column Generalized label in Tables 1 and 2 from the Supplementary Information). If we consider smaller or larger distinct drug categories, then we will not have a unique label for each community or class.
The modularity-based labeling, as presented in Table 2, is very consistent with the allocated tag and its corresponding interpretation. This result is a consequence of modularity being a very good predictor of properties and functionality 30 . Also, modularity is directly linked to the distribution and density of links, which in our case represent drug interactions.
The accuracy of pharmacological labeling of the nine topological communities is reported, on average, in Table 3. Each community has nodes with one or several modularity classes which confirm the topological community tag. The accuracy percentage of color confirmation for the topological communities is provided in the fourth column (Conf. by mod. [%]). Although the initial confirmation percentages according to properties given in DrugBank 4.1 are not particularly high (i.e. the fourth column in Table 3), we are able to explain the allocated topological community labels by cross-validating with extensive literature survey (see supporting information provided as SupplementaryCBDDIN.xls file, tab Cross-checking references) and searching other databases, such as Drugs.com 31 , RxList 32 , as well as a later DrugBank version (i.e. DrugBank 4.3 24 , last accessed April 2016.). The summarizing result of this cross-validating process is presented in the fifth column of Table 3, which contains the percentages of drug labels which are further confirmed. According to Noack 26 , force-directed energy layouts such as Force Atlas 2 are producing the same clustering as modularity but, at the same time, they contain additional information regarding the nodes' positions (i.e. if nodes are central or eccentric within the topological cluster, if a node is at the border between topological clusters, etc.) Taken together, the results show that, although the network is generated based on drug-drug interaction information only, an average of 63% drug labels are found to be in agreement with the properties listed in DrugBank 4.1, whereas 22% are not listed but are subsequently confirmed by an extensive cross-checking process. Thus, a low percentage of drugs (15%) seem to be out of place in their respective topological communities. As such, we can deliver repositioning hypotheses by considering these cases of drugs that seem not to comply with either topological or modularity labels; it is the case of drugs from column "Not expl.
[%]" (not explained) in Table 3. Indeed, we suggest that the "not explained" drugs can be repositioned to a property that corresponds to either their modularity class or their topological community. Another repurposing strategy is to consider drugs that topologically lie at the border between two communities; this situation indicate that such drugs may have both pharmacological properties of the neighboring communities. Community I: Immune system related drugs. This community contains antineoplastic agents, immunostimulants and immunosuppressants. In this community, the nodes' LB color indicates antineoplastic and immunomodulatory properties. The presence of only one VM node in this community (busulfan) is based on its antineoplastic activity, besides being a substrate for cytochrome P450 3A4 (CYP3A4) 24 . Community II: CYP P450 acting drugs. The community consists of substrates, inhibitors and inducers of specific cytochrome P450 enzymes and it includes VM, GB, G, DB, LB and M nodes. VM is directly related to CYP; however, the presence of different color nodes is justified by the fact that, although being characterized by other properties, they also have a CYP-related activity. For instance, oxaliplatin lays within community II because it is identified as a strong CYP2E1 inducer. Oxaliplatin is an antineoplatic agent-this property is indicated by the LB modularity class 24 . Community III: Nervous system acting drugs. This cluster includes drugs that interfere with the metabolism of all neurotransmitters, inducing central as well as peripheral nervous effects. The node inspection reveals the presence of DB, M, LB, and GB colors. DB and M are directly related to the nervous system, while GB and LB correspond to drugs that have other primary properties but generate additional nervous system effects. One such example is guanabenz, a centrally acting antihypertensive that also pertains to modularity class LB because it inhibits 5-lipoxygenase, thus having anti-inflammatory effect 33 . Community IV: Sympathetic nervous system acting drugs. Here we have the classes of drugs that directly and indirectly act on alpha-and beta-adrenoreceptors, including drugs mimicking or inhibiting the sympathetic nervous system (SNS) effects. The included node colors are M (the large majority), GB, G, and DB, because M is directly linked to SNS, and GB, G, DB are additionally acting on SNS (e.g. tolbutamide is placed in community IV because it augments the vasoconstrictor effect of catecholamines 34,35 and has GB modularity due to its anti-platelet aggregation activity 36,37 ).  Community V: Kalemia and platelet activity related drugs. The majority of this community consists of renin-angiotensin system acting drugs and diuretics-with confirmed influence on kalemia and platelet activity-, as well as drugs that are used as platelet aggregation inhibitors. Because all nodes are GB, the community is described as platelet activity and plasma potassium level related drugs, thus confirming the strong connection between the serum potassium concentration and the platelet reactivity 38 . Community VI: Hemostasis related drugs. There are two characteristic node colors, namely GB and G, both of them characterizing drugs that interfere in different phases of hemostasis. Other nodes are LB, M, and P; the presence of LB (pentoxifylline) and P (levothyroxine) nodes is justified by their properties related to hemostasis (both are platelet aggregation inhibitors). Apparently, the presence of the M node (exenatide) within this community has no explanation. Community VII: Neuromuscular transmission acting drugs. All nodes have the same LB color, therefore the community mainly consists of drugs with neuromuscular-blocking activity, as a pharmacodynamic effect, as well as a pharmacotoxicologic effect. However, the community also contains immunosuppressive drugs (e.g. azathioprine) that are used in the treatment of myasthenia gravis (an autoimmune disease affecting the neuromuscular junction 39,40  Illustrating examples. In order to demonstrate that our methodology is able to recover multiple pharmacological properties as well as known repositionings, using only information about drug-drug interactions, we present the following cases. • Zafirlukast is an example for recovering multiple already known pharmacological properties. As such, zafirlukast is an oral leukotriene receptor antagonist used in asthma therapy. Its placement in topological community II (see Fig. 4A) is based on its CYP3A4 activity (substrate and inhibitor) according to DrugBank 4.1 24 and RxList 32 . Unlike most other nodes within community II, zafirlukast is golden brown (GB), because it interferes with platelet activity by inhibiting the platelet-activating factor 46 . • Thalidomide is a well-known example of successful drug repositioning; this drug was first used for preventing morning sickness in pregnant women, but then withdrawn because of its teratogenic effects. Nowadays, thalidomide is used in immunological and inflammatory diseases 47 . Indeed, as Fig. 4B presents, thalidomide is confirmed by our method as having anti-cancerous activity because it is a light blue (LB) node placed in Community I (Immune system related drugs).
Our Supplementary (Tables 3 and 4 in Supplementary Information), which can be developed as new drug repurposings or as new drug interaction discoveries.

Discussion
Our CBDDIN is characterizing drug's interaction potential for therapies with a small number of associated pathologies. On the other hand, betweenness and closeness have a non-local character, making them appropriate in characterizing drug interactions in the case of therapies for polypathologies. Furthemore, due to the fact that degree and betweenness are power-law distributed, we are able to identify the drugs with the highest potential for drug-drug interactions (see Table 1).
By using our dual clustering methodology for drug-drug interaction network community analysis, we identify 9 pharmacological characteristics (pointed by the community labels in Fig. 2); this may suggest that the 9 characteristics are paramount in determining and interpreting drug-drug interactions. Unveiling that certain pharmacological properties are more important than others in terms of drug-drug interactions is a clear contribution to the state-of-the-art.
The effectiveness of clustering CBDDIN drug communities according to specific pharmacological properties is confirmed for 85% of drugs from DrugBank 4.1., by cross-checking with other drug databases and extensive literature survey. Consequently, this high prediction accuracy indicates that, for the remaining 15% drugs, there is a high probability that the pharmacological properties predicted by our dual clustering methodology will be confirmed. We prefer the general term "property" because, according to the topological community or modularity class label, the property can be of pharmacokinetic, pharmacodynamic or pharmacotoxicological nature; as such, confirming a predicted property can lead to discovering a new interaction, a new indication (repositioning) or a new possible side-effect, respectively. As possible repositioning examples, our method predicts that chlorzoxazone has immunological properties because it is placed in topological community I. Cefalosporins such as Scientific RepoRts | 6:32745 | DOI: 10.1038/srep32745 cefalotin, cefamandole, cefixime, etc. also present certain effects on the immune system or neuromuscular junction, as they are placed in community VII. As possible new interaction examples are bevacizumab or lorazepam which possible interfere with CYP enzymes, due to their positioning within community II. The details for all 15% drugs that have unexplained modularity classes and positions within the respective communities, leading to potential new properties, are provided in section 3.3 of our Supplementary Information file (Tables 3 and 4). We suggest that these drugs be further investigated with in vivo and in vitro techniques, in order to confirm possible repositionings or new interactions.
We observe that topological communities IX (Epilepsy related drugs) and I (Immune system related drugs) present a significant overlapping zone (see Figures 12 and 13 in Supplementary Information's section 3.2); this overlapping suggests that there is a connection between the two drug categories, namely epilepsy-related and immune system acting. As a confirmation, recent research results indicate disulfiram as an anticancer drug 48 . Indeed, in our CBDDIN the epileptogenic disulfiram is placed in Community IX-Community I overlapping zone ( Supplementary Information, Figure 12). Moreover, medroxyprogesterone and megestrol are also placed in Community IX-Community I overlapping zone ( Supplementary Information, Figure 11), because they have anti-epileptic activity 49,50 and, at the same time, are hormonal antineoplastic agents 24 . The connection between antineoplastic endocrine drugs and epilepsy is further strengthened by recent research 51 which reports that anticancer drugs letrozole and fadrozole substantially mitigate seizures.
Drug-drug interactions reflect the interference of drug behaviors. In order to infer possible repositionings, our strategy relies on the behavioral relationships between drugs rather than the structural similarities represented by chemical structure relationships or drug-target relationships. In order to reduce the complexity entailed by drug repurposing, we suggest that our behavioral perspective can be integrated with the complementary structural perspective 52 , which can rely on various sets of data (genome, phenome, drug-target combinations, etc.) Therefore, we indicate the Big Mechanism project 53 as a possible integrator platform that can gather and process all repurposing information from different perspectives and tools.
Our CBDDIN is not the only bio-medical complex network that is similar to social networks; as such, our methodology is very useful for clustering patients in medical studies based on their compatibility, which is defined in terms of anthropometric measures, questionnaire results and simple clinical data (such as hypertension value, body temperature, oxygen desaturation, etc.) Thus, applications for cardiovascular diseases 54 , apnea 55 , and defining endophenotypes 56 are using a similar dual clustering methodology in order to show that the multiple disease factors do not associate at random; instead, they converge towards defining patient phenotypes which are useful for designing personalized therapies.

Methods
Databases. We build our CBDDIN by using drug interaction information from the public drug database DrugBank Version 4.1 24 , which has 7739 drug entries (among them there are FDA drugs and biotech drugs, nutraceuticals, and experimental drugs.) We filter all the drugs which have no drug-drug interaction information, obtaining 1141 drugs. The verification of interpretations and repositioning predictions obtained by our network-based methodology is performed by cross-checking functional properties in other databases: Drugs. com, RxList, and DrugBank 4.3. In addition, the cross-checking verification of functional drug properties is also made by searching the literature in electronic format. The list of 309 retrieved papers which confirm the predicted drug properties is given as supporting information (tab Cross-checking references in SupplementaryCBDDIN.xls).
Complex network clustering. The processing of our CBDDIN, including computation of statistics, modularity clustering and graphical layouts, is performed in Gephi 25 , a leading tool in visualization and analysis of large networks. The clustering algorithms that we use in this paper are based on modularity classes 26   The average path length L of a network (V, E) is the mean distance between two nodes, averaged over all pairs of nodes; the clustering coefficient C is defined as the average fraction of node's neighbor pairs that are also neighbors to each other 57 . The degree k of a node is defined as the total number of its incident edges. Thus, the network's degree distribution is characterized by function P(k), whereas the diameter Dmt is the maximal distance among all distances between any pair of nodes in the (V, E) network 57 . The network density Dst is defined as the ratio of edges in the network to the total number of possible edges 28 . Modularity Mod is a measure that characterizes the strength of dividing the network into communities 29 .
Besides the degree, the main centrality metrics (expressing the importance of a node in the society) are: betweenness, closeness, page rank and eigenvector. The node betweenness is defined as the number of minimal paths from all vertices in the network to all other vertices that pass through the node, normalized over the total number of shortest paths. The inverse of the sum of shortest path lengths from one node to all other vertices is called closeness. The eigenvector centrality computes relative scores for all nodes in the network, by considering that the connections to high-influence vertices are more important than the connections to low-influence vertices; the page rank is merely a variant of eigenvector used to rank websites 30,58 .
In order to measure the similarity between two complex networks, the network fidelity metric ϕ was introduced 28 : . Energy model layouts are layout algorithms that can be represented as force systems. Many energy model layouts are developed as attraction-repulsion or a-r force systems 26 . In a-r layouts, adjacent vertices attract whereas all other pairs of vertices repulse, thus forming groups of vertices with dense connections (i.e. communities or clusters). The a-r forces are proportional to the power (a or r) of the Euclidean distances between the nodes: the attraction between adjacent vertices v and w is − ⋅  →  x x x x v w a v w and the repulsion between any two vertices v w as the unit vector from v to w). Normally, ∈ a r R , , are chosen so that a ≥ 0 and r ≤ 0, so that attraction is not decreasing and repulsion is not increasing with the Euclidean distance. The most popular force-based layout systems are the model of Fruchterman and Reingold (a = 2, r = − 1) 59 , and the LinLog model (a = 0, r = − 1) 60 .
For all a-r energy models, the resulted layout corresponds to the situation where local energy minimum is attained 26 . As such, total energy for the a-r layout (a > r) is: Definition 3. Network clustering consists of classifying all the vertices ∈ v V in one of the disjoint vertex subsets (i.e. clusters) C, pertaining to the set of disjoint subsets V C

61
. Several network parameters are used for network clustering in complex networks, but one of the most useful is the modularity which was advocated by Newman and Girvan 29 . In an unweighted network, such as our DDI, the modularity of a clustering V C is defined as: In Equation 3, |E C | is the number of edges in cluster C, |E| is the total number of edges in the network, k C is the total degree of nodes in cluster C, while k is the total degree of nodes in the entire network. At the same time, E E C represents the fraction of intra-cluster edge density relative to the density of the entire network (which is assumed to be uniform), while is the expected such fraction. Therefore, modularity grows as clustering produces clusters with edge densities that are larger than expected. Noack has demonstrated that, when a > − 1 and r > − 1, force-directed layout algorithms produce topological clusters which are equivalent with those rendered by modularity-based network clustering 26 . However, force-directed algorithms provide additional topological information about clusters, which leads to recommending the usage of both modularity clustering and a-r force directed layouts for more accurate network analysis 27 .