Abstract
Background
Discovering potential drug-drug interactions (DDIs) is a longstanding challenge in clinical treatment and drug development. Recently, deep learning techniques have been developed for DDI prediction. However, they generally require a large number of samples, while known DDIs are rare.
Methods
In this work, we present KnowDDI, a graph neural network-based method that addresses the above challenge. KnowDDI enhances drug representations by adaptively leveraging rich neighborhood information from large biomedical knowledge graphs. Then, it learns a knowledge subgraph for each drug-pair to interpret the predicted DDI, where each edge is associated with a connection strength indicating either the importance of a known DDI or the degree of resemblance between a drug-pair whose connection is unknown. Thus, the lack of DDIs is implicitly compensated by the enriched drug representations and propagated drug similarities.
Results
Here we show the evaluation results of KnowDDI on two benchmark DDI datasets. Results show that KnowDDI obtains state-of-the-art prediction performance with better interpretability. We also find that KnowDDI suffers less than existing works when given a sparser knowledge graph. This indicates that the propagated drug similarities play a more important role in compensating for the lack of DDIs when the drug representations are less enriched.
Conclusions
KnowDDI combines the efficiency of deep learning techniques with the rich prior knowledge in biomedical knowledge graphs. As an original open-source tool, KnowDDI can help detect possible interactions in a broad range of relevant interaction prediction tasks, such as protein-protein interactions, drug-target interactions and disease-gene interactions, eventually promoting the development of biomedicine and healthcare.
Plain Language Summary
Understanding how drugs interact is crucial for safe healthcare and the development of new medicines. We developed a computational tool that can analyze the data about medicines within large medical databases and predict how taking multiple drugs at the same time affects the person taking them. Our tool, named KnowDDI, can predict which drugs interact with each other and also provide an explanation for why the interaction is likely to take place. We demonstrated that our tool can identify known drug interactions. It could potentially be used in the future to identify previously unknown or unanticipated interactions that could have negative consequences for people being treated with unusual combinations of medicines.
Introduction
Accurately predicting drug-drug interactions (DDIs) can play an important role in biomedicine and healthcare. On the one hand, combination therapies, where multiple drugs are used together, can be used to treat complex diseases and comorbidities, such as human immunodeficiency virus (HIV)^{1,2}. Recent studies also show that combination therapies, such as a combination of lopinavir and ritonavir, may treat coronavirus disease (COVID-19)^{3,4,5}, the infectious disease which has caused a global pandemic in the past three years. On the other hand, DDIs are an important cause of adverse drug reactions, which account for 1% of hospitalizations in the general population and 2–5% of hospital admissions in the elderly^{6,7,8}. A concrete example is that if warfarin and aspirin enter the body together, they compete for binding to plasma proteins. The warfarin that cannot bind to plasma proteins then remains in the blood, which results in acute bleeding in patients^{9}.
Identifying DDIs by clinical evidence such as laboratory studies is extremely costly and time-consuming^{6,8}. In recent years, computational techniques, especially deep learning approaches, have been developed to speed up the discovery of potential DDIs. Naturally, DDI fact triplets can be represented as a graph where each node corresponds to a drug, and each edge represents an interaction between two drugs. Provided with DDI fact triplets, a number of graph learning methods have been developed to identify unknown interactions between drug-pairs. Graph neural networks (GNNs)^{10,11}, which can obtain expressive node embeddings by end-to-end learning from the topological structure and associated node features, have also been applied to the DDI prediction problem. However, known DDI fact triplets are rare due to the high experimental cost and continually emerging new drugs^{12}. For example, the latest DrugBank database with 14,931 drug entries only contains 365,984 known DDI fact triplets^{13}, which is less than 1% of the total potential DDIs. This prevents overparameterized deep learning models from fully exploiting their expressive ability, and they may perform even worse than traditional two-stage embedding methods^{14,15}.
In biomedicine and healthcare, many international agencies such as the National Center for Biotechnology Information and the European Bioinformatics Institute regularly maintain rich publicly available biomedical data resources^{16}. Researchers then integrate these disparate and heterogeneous data resources into knowledge graphs (KGs) to facilitate an organized use of information. Examples are Hetionet^{17,18}, PharmKG^{19} and PrimeKG^{20}. These KGs contain rich prior knowledge discovered in biomedicine and healthcare. Proper use of them may compensate for the lack of samples for DDI prediction. The pioneering work KGNN^{21} was the first to leverage external KGs to provide topological information for each drug in the target drug-pair. In particular, it uniformly samples a fixed-size set of neighbors around each drug, then aggregates drug features and messages from the sampled neighbors into the drug representation without considering which drug it interacts with. Later works merge the DDI network with external KGs as a combined network, extract enclosing subgraphs for different drug-pairs to encode the drug-pair-specific information, and then predict the DDI for the target drug-pair using the concatenation of the node embeddings of the drugs and the subgraph embedding of the enclosing subgraph^{22,23,24}. However, as these KGs integrate diverse data resources by automated processes or experts, they contain noisy or inconsistent information that existing methods fail to filter out. As a result, properly leveraging external KGs is still a challenging problem.
In this paper, we propose KnowDDI, an accurate and interpretable method for DDI prediction. First, we merge the provided DDI graph and an external KG into a combined network, upon which generic representations for all nodes are learned to encode the generic knowledge. Next, we extract a drug-flow subgraph for each drug-pair from the combined network. We then learn a knowledge subgraph from the generic representations and the drug-flow subgraph. After optimization, the representations of drugs are transformed to be more predictive of the DDI types between the target drug-pair. In addition, the returned knowledge subgraph contains explaining paths to interpret the prediction result for the drug-pair, where the explaining paths consist only of edges of important known DDIs or newly-added edges connecting highly similar drugs. In other words, the learned knowledge subgraph helps filter out irrelevant information and adds in resembling relationships between drugs whose interactions are unknown. This allows the lack of DDIs to be implicitly compensated by the enriched drug representations and propagated drug similarities. We perform extensive experiments on benchmark datasets, and observe that KnowDDI consistently outperforms existing works. We also conduct a series of case studies which further show that KnowDDI can discover convincing explaining paths which help interpret the DDI prediction results. KnowDDI has the potential to be used in a broad range of relevant interaction prediction tasks, such as protein-protein interactions, drug-target interactions and disease-gene interactions, to help detect potential interactions, eventually advancing the development of biomedicine and healthcare.
Methods
Overview of KnowDDI
Our KnowDDI (Fig. 1) learns to predict DDIs between a drug-pair, i.e., head drug h and tail drug t in the DDI graph, by learning a knowledge subgraph, denoted as \(\mathcal{S}_{h,t}\). To start, the provided DDI graph and an external KG are merged into a combined network. Every node of the combined network is associated with a unique generic embedding which is learned to encode the generic knowledge. Given a target drug-pair (h, t), a drug-flow subgraph \(\bar{\mathcal{S}}_{h,t}\) which captures local context relevant to (h, t) is extracted from the combined network. As directly leveraging the external KG (and hence \(\bar{\mathcal{S}}_{h,t}\)) may bring in irrelevant information, the graph structure and node embeddings of \(\bar{\mathcal{S}}_{h,t}\) are further iteratively optimized. During this process, the generic embeddings are transformed to be more predictive of the DDI types between the target drug-pair. In addition, KnowDDI estimates a connection strength for every node pair in the subgraph, representing either the importance of a given edge between connected nodes in the drug-flow subgraph or the similarity between two nodes whose connection is unknown. Accordingly, a new edge of type “resemble” is added between two nodes if their node embeddings are highly similar, and existing edges can be dropped if their importance is estimated as low. Thus, only useful information flows between nodes are kept. The final optimized subgraph becomes our knowledge subgraph \(\mathcal{S}_{h,t}\), which consists of explaining paths. The average connection strength over all consecutive node pairs along each explaining path indicates its ability to explain the current prediction result from the perspective of KnowDDI. Supplementary Table 1 shows a summary of characteristics comparing KnowDDI with existing works.
Problem setup
A drug-drug interaction (DDI) graph is denoted as \(\mathcal{G}_{\mathrm{DDI}}=\{\mathcal{V}_{\mathrm{DDI}},\mathcal{E}_{\mathrm{DDI}},\mathcal{R}_{\mathrm{DDI}}\}\), where \(\mathcal{V}_{\mathrm{DDI}}\) is a set of drug nodes, \(\mathcal{E}_{\mathrm{DDI}}\) is a set of edges, and \(\mathcal{R}_{\mathrm{DDI}}\) is a set of DDI relation types associated with the edges. In particular, each edge \((u,r,v)\in \mathcal{E}_{\mathrm{DDI}}\) corresponds to an observed fact triplet, which records the DDI relation type \(r\in \mathcal{R}_{\mathrm{DDI}}\) associated with (u, v).
The external knowledge graph (KG) is denoted as \(\mathcal{G}_{\mathrm{KG}}=\{\mathcal{V}_{\mathrm{KG}},\mathcal{E}_{\mathrm{KG}},\mathcal{R}_{\mathrm{KG}}\}\), which contains rich biomedical knowledge of various kinds of biomedical entities. Particularly, \(\mathcal{V}_{\mathrm{KG}}\) consists of \(|\mathcal{V}_{\mathrm{KG}}|\) entities ranging from drugs and genes to proteins, \(\mathcal{R}_{\mathrm{KG}}\) consists of \(|\mathcal{R}_{\mathrm{KG}}|\) types of interactions occurring in \(\mathcal{G}_{\mathrm{KG}}\), while \(\mathcal{E}_{\mathrm{KG}}=\{(u,r,v)\mid u,v\in \mathcal{V}_{\mathrm{KG}},r\in \mathcal{R}_{\mathrm{KG}}\}\) consists of observed fact triplets in \(\mathcal{G}_{\mathrm{KG}}\). Usually, \(\mathcal{G}_{\mathrm{KG}}\) is much larger than \(\mathcal{G}_{\mathrm{DDI}}\), such that \(\mathcal{V}_{\mathrm{DDI}}\subseteq \mathcal{V}_{\mathrm{KG}}\) and \(\mathcal{R}_{\mathrm{DDI}}\subseteq \mathcal{R}_{\mathrm{KG}}\) hold.
The combination of \(\mathcal{G}_{\mathrm{DDI}}\) and \(\mathcal{G}_{\mathrm{KG}}\) then forms a large combined network \(\mathcal{G}=\{\mathcal{V},\mathcal{E},\mathcal{R}\}=\{\mathcal{V}_{\mathrm{DDI}}\cup \mathcal{V}_{\mathrm{KG}},\mathcal{E}_{\mathrm{DDI}}\cup \mathcal{E}_{\mathrm{KG}},\mathcal{R}_{\mathrm{DDI}}\cup \mathcal{R}_{\mathrm{KG}}\}\).
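As an illustration of the set unions above, the following minimal sketch merges two triplet sets into a combined network. The function name `merge_graphs`, the triplet layout `(u, r, v)`, and the toy drug and relation names are assumptions made for this example; node sets are derived from the edges, so isolated nodes are not represented.

```python
def merge_graphs(ddi_triplets, kg_triplets):
    """Return (nodes, edges, relations) of the union of two triplet sets.

    Each triplet is (u, r, v). Nodes and relations are read off the edges,
    which is an assumption of this sketch (isolated nodes are ignored).
    """
    edges = set(ddi_triplets) | set(kg_triplets)
    nodes = {u for u, _, _ in edges} | {v for _, _, v in edges}
    relations = {r for _, r, _ in edges}
    return nodes, edges, relations


# Toy data with hypothetical relation names.
ddi = {("warfarin", "increases_bleeding_risk", "aspirin")}
kg = {("warfarin", "binds", "albumin"), ("aspirin", "binds", "albumin")}
nodes, edges, relations = merge_graphs(ddi, kg)
```

Representing each graph as a set of triplets makes the merge a plain set union, mirroring the definition of the combined network.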
The goal of this paper is to learn a mapping function from the combined network \(\mathcal{G}\), which can predict the relation type between new drug-pairs. For multi-class DDI prediction, each drug-pair only has one specific relation type \(r\in \mathcal{R}_{\mathrm{DDI}}\). As for multi-label DDI prediction, multiple relation types r_{1}, r_{2}, … ∈ \(\mathcal{R}_{\mathrm{DDI}}\) can co-occur between a drug-pair.
Architecture of KnowDDI
The overall architecture of KnowDDI is shown in Fig. 1. Here, we provide the details of how drug representations are enriched by an external KG and how similarities are propagated in knowledge subgraphs. The complete algorithms of training and testing KnowDDI are summarized in Supplementary Note 1.
Generic embedding generation
To encode the generic knowledge of various other types of entities in \(\mathcal{G}\), which can help enrich the representations of drug nodes, we run a GNN on the combined network \(\mathcal{G}\) to obtain the generic embedding of each node.
Let \(\mathbf{e}_{v}^{(0)}\) denote the feature of node \(v\in \mathcal{V}\). In KnowDDI, we follow GraphSAGE^{25} and update the embedding \(\mathbf{e}_{v}^{(l)}\) for node v at the l-th layer as:
where MEAN( ⋅ ) is element-wise mean pooling, \(\mathbf{W}_{a}^{(l)}\) and \(\mathbf{W}_{c}^{(l)}\) are learnable parameters, and \(\left[\cdot \parallel \cdot \right]\) concatenates vectors along the last dimension. After L layers of message passing, the final node embedding \(\mathbf{e}_{v}^{(L)}\) is taken as the generic embedding of \(v\in \mathcal{V}\).
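A minimal sketch of one such message-passing layer is given below. It assumes the mean-pooling variant of GraphSAGE, in which neighbor messages ReLU(W_a e_u) are averaged element-wise and the result is concatenated with the node's own embedding before applying W_c and a ReLU. The exact placement of the nonlinearities, the zero vector used for nodes without neighbors, and the name `sage_layer` are assumptions of this example rather than the authors' implementation.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(xi, 0.0) for xi in x]


def sage_layer(emb, neighbors, W_a, W_c):
    """One assumed GraphSAGE-style layer with element-wise mean pooling.

    emb: dict node -> embedding (list of floats)
    neighbors: dict node -> list of neighbor nodes
    """
    new_emb = {}
    for v, e_v in emb.items():
        msgs = [relu(matvec(W_a, emb[u])) for u in neighbors.get(v, [])]
        if msgs:
            agg = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            agg = [0.0] * len(W_a)  # assumption: no neighbors -> zero message
        new_emb[v] = relu(matvec(W_c, e_v + agg))  # [e_v || agg]
    return new_emb


emb = {"a": [1.0, 0.0], "b": [0.0, 2.0]}
neighbors = {"a": ["b"], "b": ["a"]}
W_a = [[1.0, 0.0], [0.0, 1.0]]                      # identity
W_c = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # adds self and message
out = sage_layer(emb, neighbors, W_a, W_c)
```

Stacking L calls of `sage_layer` with separate weights per layer would give the final embedding that the text calls the generic embedding.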
Drugflow subgraph construction
As each drug-pair can depend on different local contexts, i.e., entities and relations, we construct a drug-flow subgraph \(\bar{\mathcal{S}}_{h,t}\) specific to (h, t) from the combined network \(\mathcal{G}\), which transforms the drug representations obtained in the generic embedding generation step to be drug-pair-aware.
For each (h, t), we define its drug-flow subgraph \(\bar{\mathcal{S}}_{h,t}=\{\bar{\mathcal{V}}_{h,t},\bar{\mathcal{E}}_{h,t},\bar{\mathcal{R}}_{h,t}\}\) as a directed subgraph of \(\mathcal{G}\) consisting of relational paths \(\{\bar{\mathbf{p}}_{h,t}\}\) with length at most P pointing from h to t in \(\mathcal{G}\), where a relational path
is a sequence of nodes connected by relations. Here, \(\bar{\mathcal{V}}_{h,t}\) is the set of nodes appearing in \(\bar{\mathcal{S}}_{h,t}\), \(\bar{\mathcal{R}}_{h,t}\) is the set of relation types occurring between nodes in \(\bar{\mathcal{V}}_{h,t}\), and \(\bar{\mathcal{E}}_{h,t}=\{(u,r,v)\mid u,v\in \bar{\mathcal{V}}_{h,t}\ \mathrm{and}\ (u,r,v)\in \mathcal{E}\}\) is the set of edges connecting nodes in \(\bar{\mathcal{V}}_{h,t}\).
The extraction algorithm in Supplementary Note 1 summarizes the procedure of extracting \(\bar{\mathcal{S}}_{h,t}\). Given a drug-pair (h, t), we first extract the intersection of the local neighborhoods of h and t (i.e., the K-hop enclosing subgraph^{22}, where K is a hyperparameter) from \(\mathcal{G}\). For computational simplicity, we make all the relational paths \(\{\bar{\mathbf{p}}_{h,t}\}\) between h and t have length P. This is done by augmenting relational paths with length less than P by identity relations^{26,27}, i.e., (t, r_{identity}, t). If there is no relational path connecting h and t, we return \(\bar{\mathcal{S}}_{h,t}=\{\{h,t\},\emptyset,\emptyset\}\). As these K-hop enclosing subgraphs neglect directional information, we conduct directional pruning to remove all nodes and corresponding edges which are not on any relational path pointing from h to t. Thus, after directional pruning, the resultant drug-flow subgraph \(\bar{\mathcal{S}}_{h,t}\) only contains nodes which support learning the information flow from h to t.
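The directional pruning step can be sketched as follows. This is one possible implementation, not the procedure from Supplementary Note 1: it keeps an edge (u, r, v) when the shortest forward distance from h to u, plus one, plus the shortest distance from v to t is at most P, so that the edge lies on some directed walk of length at most P from h to t. The K-hop enclosing step and the identity-relation padding are omitted, and the names `bfs_dist` and `drug_flow_subgraph` are hypothetical.

```python
from collections import deque


def bfs_dist(start, adj):
    """Shortest hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def drug_flow_subgraph(edges, h, t, P):
    """Keep edges lying on a directed walk of length <= P from h to t."""
    fwd, bwd = {}, {}
    for u, _, v in edges:
        fwd.setdefault(u, set()).add(v)
        bwd.setdefault(v, set()).add(u)
    d_h = bfs_dist(h, fwd)  # distance from h, following edge direction
    d_t = bfs_dist(t, bwd)  # distance to t, following reversed edges
    kept = {
        (u, r, v)
        for u, r, v in edges
        if u in d_h and v in d_t and d_h[u] + 1 + d_t[v] <= P
    }
    if not kept:  # no relational path: return ({h, t}, empty, empty)
        return {h, t}, set(), set()
    nodes = {h, t} | {u for u, _, _ in kept} | {v for _, _, v in kept}
    return nodes, kept, {r for _, r, _ in kept}


edges = {("h", "r1", "a"), ("a", "r2", "t"), ("b", "r3", "h"), ("t", "r1", "c")}
sub_nodes, sub_edges, sub_rels = drug_flow_subgraph(edges, "h", "t", P=2)
```

In the toy graph, node b only points into h and node c is only reachable after t, so both are pruned.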
Knowledge subgraph generation
Further, we learn a knowledge subgraph \(\mathcal{S}_{h,t}\) from the generic embeddings and the drug-flow subgraph \(\bar{\mathcal{S}}_{h,t}\). During this process, irrelevant edges are removed and new edges of type “resemble” are added between nodes with highly similar node embeddings.
Let \(\bar{\mathbf{A}}_{h,t}\) be a binary third-order tensor with size \(|\bar{\mathcal{V}}_{h,t}|\times |\bar{\mathcal{V}}_{h,t}|\times |\bar{\mathcal{R}}_{h,t}|\). Its (u, v, r)-th entry is computed as
\[\bar{\mathbf{A}}_{h,t}(u,v,r)=\begin{cases}1 & \text{if } (u,r,v)\in \bar{\mathcal{E}}_{h,t},\\ 0 & \text{otherwise},\end{cases}\]
which records whether the pair (u, v) is connected by relation type \(r\in \bar{\mathcal{R}}_{h,t}\) in \(\bar{\mathcal{S}}_{h,t}\).
In addition, we estimate another third-order tensor A_{h,t} from \(\bar{\mathbf{A}}_{h,t}\) with elements A_{h,t}(u, v, r) ∈ [0, 1] and size \(|\bar{\mathcal{V}}_{h,t}|\times |\bar{\mathcal{V}}_{h,t}|\times (|\bar{\mathcal{R}}_{h,t}|+1)\) to record the connection strength between nodes \(u,v\in \bar{\mathcal{V}}_{h,t}\) w.r.t. relation r. Specifically, if \(\bar{\mathbf{A}}_{h,t}(u,v,r)=1\) but A_{h,t}(u, v, r) = 0, this means the existing edge (u, r, v) is not useful and should be removed. Besides, if \(\bar{\mathbf{A}}_{h,t}(u,v,r)=0\) for all \(r\in \bar{\mathcal{R}}_{h,t}\), we add an edge of relation type “resemble” \(r_{\mathrm{sim}}\in \mathcal{R}\) to connect u and v. \(\mathbf{A}_{h,t}(u,v,r_{\mathrm{sim}}) > 0\) then represents the similarity between u and v. Corresponding to A_{h,t} and \(\bar{\mathcal{S}}_{h,t}\), the knowledge subgraph \(\mathcal{S}_{h,t}\) is generated as
where \(\mathcal{R}_{h,t}=\{r_{\mathrm{sim}}\}\cup \bar{\mathcal{R}}_{h,t}\), and \(\mathcal{E}_{h,t}=\{(u,r,v)\}\) with each (u, r, v) constructed as \((u,r,v)\in \bar{\mathcal{E}}_{h,t}\) if \(\bar{\mathbf{A}}_{h,t}(u,v,r)=1\wedge \mathbf{A}_{h,t}(u,v,r) > 0\), or \((u,r_{\mathrm{sim}},v)\) if \(\bar{\mathbf{A}}_{h,t}(u,v,r)=0\) for all \(r\in \bar{\mathcal{R}}_{h,t}\) and \(\mathbf{A}_{h,t}(u,v,r_{\mathrm{sim}}) > 0\).
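The edge-selection rule above can be sketched with sparse dictionaries in place of dense tensors. In this hedged example, `bar_edges` plays the role of the nonzero entries of the binary tensor, `strength` maps (u, v, r) to the estimated connection strength, and the function name `knowledge_edges` and the string label "resemble" for r_sim are illustrative assumptions.

```python
def knowledge_edges(bar_edges, strength, r_sim="resemble"):
    """Select the edges of the knowledge subgraph.

    bar_edges: set of (u, r, v) present in the drug-flow subgraph
    strength:  dict (u, v, r) -> connection strength in [0, 1]
    """
    connected = {(u, v) for u, _, v in bar_edges}
    kept = set()
    for (u, v, r), a in strength.items():
        if a <= 0:
            continue  # zero strength: edge is dropped or never added
        if r == r_sim:
            # "resemble" edges are only added between unconnected pairs
            if (u, v) not in connected:
                kept.add((u, r_sim, v))
        elif (u, r, v) in bar_edges:
            kept.add((u, r, v))
    return kept


bar_edges = {("h", "r1", "a"), ("a", "r2", "t")}
strength = {
    ("h", "a", "r1"): 0.6,        # existing edge, kept
    ("a", "t", "r2"): 0.0,        # existing edge, removed
    ("h", "t", "resemble"): 0.3,  # new similarity edge
    ("h", "a", "resemble"): 0.9,  # ignored: h and a are already connected
}
kg_edges = knowledge_edges(bar_edges, strength)
```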
To learn such an A_{h,t}, we conduct graph structure learning by alternating the following two steps T times:

estimate connection strengths between every pair of nodes in \({\bar{{{{{{{{\mathcal{V}}}}}}}}}}_{h,t}\), and

refine node embeddings on the updated subgraph.
First, we initialize the node embedding of each \(v\in \bar{\mathcal{V}}_{h,t}\) as \(\mathbf{h}_{v}^{(0)}=\mathbf{e}_{v}^{(L)}\) to encode the global topology of \(\mathcal{G}\). Let \(\mathbf{A}_{h,t}^{(\tau )}\) be the estimation of A_{h,t} at the τ-th iteration. We initialize \(\mathbf{A}_{h,t}^{(0)}=\bar{\mathbf{A}}_{h,t}\). Next, we estimate the relevance score \(\mathbf{C}_{h,t}^{(\tau )}(u,v,r)\) for each relation \(r\in \mathcal{R}_{h,t}\) between every node pair (u, v) in \(\mathcal{S}_{h,t}\) as
where \(\mathbf{h}_{uv}^{(\tau -1)}=\exp (-|\mathbf{h}_{u}^{(\tau -1)}-\mathbf{h}_{v}^{(\tau -1)}|)\), ∣ ⋅ ∣ returns the element-wise absolute value, MLP is a multilayer perceptron, and h_{r} is the learnable relation embedding of relation r. We set \(\mathbf{C}_{h,t}^{(\tau )}(v,v,r)=1\) for all \(v\in \bar{\mathcal{V}}_{h,t}\) and \(r\in \mathcal{R}_{h,t}\).
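The pairwise feature fed into the relevance score can be illustrated in a few lines. This sketch assumes the feature is the element-wise exponential of the negative absolute difference between the two node embeddings; the MLP and relation embedding that turn it into a score are not shown, and `pair_feature` is a hypothetical name.

```python
import math


def pair_feature(h_u, h_v):
    """Element-wise exp(-|h_u - h_v|); each entry lies in (0, 1]."""
    return [math.exp(-abs(a - b)) for a, b in zip(h_u, h_v)]


same = pair_feature([1.0, 2.0], [1.0, 2.0])  # identical embeddings
far = pair_feature([0.0, 0.0], [3.0, -3.0])  # distant embeddings
```

Under this assumption the feature equals 1 in every dimension where the embeddings agree and decays towards 0 as they diverge, which is why it can act as a similarity signal for adding “resemble” edges.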
This learned \(\mathbf{C}_{h,t}^{(\tau )}\) reveals how the model understands the connection between different node pairs at the τ-th iteration. It helps filter out irrelevant information and captures resembling relationships between drugs whose interactions are unknown. In the early stage of optimization, \(\mathbf{C}_{h,t}^{(\tau )}\) can be less trustworthy. Hence, we merge this learned subgraph with \(\bar{\mathbf{A}}_{h,t}\) to obtain \(\mathbf{A}_{h,t}^{(\tau )}\), i.e.,
where the hyperparameter α balances their contributions in the final prediction, the threshold γ ≥ 0 screens out less informative edges, and \(\mathrm{ReLU}(x)=\max (x,0)\). Considering that nodes are connected to different numbers of neighbors, we use a function δ( ⋅ ) to ensure that the relevance scores of the incoming edges of v sum to 1, i.e.,
Here, we instantiate δ( ⋅ ) as the edge softmax function, which computes a softmax over the attention weights of incoming edges regardless of their relation types for every node, i.e., \(\mathrm{softmax}(x_{i})=\exp (x_{i})/\sum_{j\in \mathcal{N}(i)}\exp (x_{j})\) where \(\mathcal{N}(i)=\{j:(j,r,i)\in \mathcal{E}_{h,t}\}\). Let \(\mathbf{H}_{h,t}^{(\tau )}\) be the embeddings of all nodes in \(\bar{\mathcal{V}}_{h,t}\), where the v-th row corresponds to the node embedding \(\mathbf{h}_{v}^{(\tau )}\) of \(v\in \bar{\mathcal{V}}_{h,t}\), and
where \(\mathbf{A}_{h,t}^{(\tau )}(:,:,r)\) is the r-th slice of \(\mathbf{A}_{h,t}^{(\tau )}\) and W_{r} is a learnable parameter. Then, \(\mathbf{H}_{h,t}^{(\tau )}\) is updated as
After T iterations, the representations of drugs are transformed to be more predictive of the DDI types between the target drug-pair, and \(\mathcal{S}_{h,t}\) only keeps edges of important known DDIs or newly-added edges connecting highly similar drugs. We set \(\mathbf{h}_{v}=\mathbf{h}_{v}^{(T)}\) as the final node embedding, and return \(\mathbf{A}_{h,t}=\mathbf{A}_{h,t}^{(T)}\), which records the updated graph structure of \(\mathcal{S}_{h,t}\). Learning a subgraph embedding of \(\mathcal{S}_{h,t}\) is commonly adopted to encode the subgraph topology^{22,23,25}. Hence, we follow this routine and obtain the subgraph embedding \(\mathbf{h}_{\mathcal{S}_{h,t}}\) of \(\mathcal{S}_{h,t}\) as
Finally, we predict the relation for (h, t) as
where W_{c} is the classifier parameter.
The results of applying different knowledge subgraph generation strategies are shown in Supplementary Fig. 1 and analyzed in Supplementary Note 2.
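The edge softmax used to instantiate δ( ⋅ ) in this section can be sketched as follows. The example normalizes scores over all incoming edges of each target node regardless of relation type, as described above; the dictionary layout, the max-subtraction for numerical stability, and the name `edge_softmax` are choices made for this illustration and are not taken from the released code.

```python
import math
from collections import defaultdict


def edge_softmax(scores):
    """Softmax over the incoming edges of each target node.

    scores: dict (u, r, v) -> raw relevance score of edge u -> v
    Returns a dict with the same keys; values for a fixed v sum to 1.
    """
    incoming = defaultdict(list)
    for (_, _, v), s in scores.items():
        incoming[v].append(s)
    normalised = {}
    for (u, r, v), s in scores.items():
        m = max(incoming[v])  # subtract the max for numerical stability
        denom = sum(math.exp(x - m) for x in incoming[v])
        normalised[(u, r, v)] = math.exp(s - m) / denom
    return normalised


scores = {("a", "r1", "t"): 1.0, ("b", "r2", "t"): 1.0, ("a", "r1", "b"): 0.3}
weights = edge_softmax(scores)
```

Two equally scored edges into t share the weight equally, while the single edge into b receives weight 1 whatever its raw score.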
Learning and inference
Let θ_{g} and θ_{k} denote the collections of parameters associated with generic embedding generation and knowledge subgraph generation, respectively. Further, let y_{h,t} = [y_{h,t}(i)] be a vector where the i-th element y_{h,t}(i) = 1 if relation \(i\in \mathcal{R}_{\mathrm{DDI}}\) occurs in (h, t) and 0 otherwise.
For multi-class DDI prediction, we optimize KnowDDI w.r.t. the cross-entropy loss:
As for multi-label DDI prediction, drug-pairs are associated with varying numbers of relations. We therefore use a loss function with negative sampling. Following related works^{22,23}, we construct negative triplets to prevent KnowDDI from selecting those unknown relations. For each \((h,r,t)\in \mathcal{E}_{\mathrm{DDI}}\), we replace t by a randomly sampled drug \(w\in \mathcal{V}_{\mathrm{DDI}}\) to form (h, r, w), whose label vector y_{h,w} = [0, …, 0] contains zeros only. Let \(\mathcal{E}_{\mathrm{neg}}=\{(h,r,w)\mid (h,r,t)\in \mathcal{E}_{\mathrm{DDI}}\ \mathrm{and}\ (h,r,w)\notin \mathcal{E}_{\mathrm{DDI}}\}\) collect the negative triplets. We optimize KnowDDI w.r.t. the following loss for multi-label DDI prediction:
where 1 is a vector of all 1s. Note that Eq. (14) only penalizes wrong predictions of known relations between drug-pairs. In other words, triplets that are not observed in \(\mathcal{E}_{\mathrm{DDI}}\) are regarded as unknown.
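The construction of negative triplets described above can be sketched as follows. The example draws one corrupted tail per observed triplet and rejects candidates that would reproduce an observed triplet; excluding w = h, fixing the random seed, and the name `sample_negatives` are assumptions of this sketch.

```python
import random


def sample_negatives(ddi_edges, drugs, seed=0):
    """Corrupt the tail of each observed (h, r, t) with a random drug w."""
    rng = random.Random(seed)
    drugs = sorted(drugs)
    negatives = set()
    for h, r, _ in sorted(ddi_edges):
        # assumption: w must differ from h and (h, r, w) must be unobserved
        candidates = [
            w for w in drugs if w != h and (h, r, w) not in ddi_edges
        ]
        if candidates:
            negatives.add((h, r, rng.choice(candidates)))
    return negatives


ddi_edges = {("d1", "s1", "d2"), ("d1", "s1", "d3"), ("d2", "s2", "d3")}
negatives = sample_negatives(ddi_edges, {"d1", "d2", "d3", "d4"})
```

For head d1 and relation s1 the only admissible corrupted tail is d4, so both observed triplets with that head map to the same negative.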
During inference, given a new drug-pair \((h^{\prime},t^{\prime})\) where \(h^{\prime},t^{\prime}\in \mathcal{V}\), we directly use KnowDDI with the optimized θ_{g}, θ_{k} to obtain the class prediction vector \(\hat{\mathbf{y}}_{h^{\prime},t^{\prime}}\). For multi-class prediction, the class is predicted as the relation which obtains the highest probability in \(\hat{\mathbf{y}}_{h^{\prime},t^{\prime}}\). As for multi-label prediction, the complete \(\hat{\mathbf{y}}_{h^{\prime},t^{\prime}}\) is returned. Please refer to the testing algorithm in Supplementary Note 1 for details.
Identifying explaining paths
To explain the predicted DDI for (h, t), we extract the explaining paths from \(\mathcal{S}_{h,t}\). In particular, an explaining path
is a sequence of nodes, where node v_{i} and node v_{i+1} are connected by relation \(r_{i+1}\in \mathcal{R}\) with a connection strength indicated by A_{h,t}(v_{i}, v_{i+1}, r_{i+1}). We then obtain the average connection strength of p_{h,t} by averaging A_{h,t}(v_{i}, v_{i+1}, r_{i+1}) over the consecutive pairs of nodes in p_{h,t}. This average connection strength reflects the ability of the explaining path to interpret the prediction result from the perspective of KnowDDI.
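Scoring an explaining path then reduces to a mean over its edges. In the sketch below a path is given as a list of (u, r, v) edges and the connection strengths are looked up in a dictionary keyed by (u, v, r); both layouts and the name `average_strength` are assumptions made for illustration.

```python
def average_strength(path_edges, strength):
    """Mean connection strength over the consecutive node pairs of a path."""
    values = [strength[(u, v, r)] for u, r, v in path_edges]
    return sum(values) / len(values)


strength = {("h", "a", "r1"): 0.5, ("a", "t", "resemble"): 0.25}
path = [("h", "r1", "a"), ("a", "resemble", "t")]
score = average_strength(path, strength)
```

Ranking all paths from h to t in the knowledge subgraph by this score would surface the paths that the model relies on most.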
Training details
In KnowDDI, we use a two-layer GraphSAGE^{25} to obtain the generic node embedding \(\mathbf{e}_{v}^{(l)}\), whose dimension is set as 32. For drug-flow subgraph extraction, we extract the 2-hop neighborhood and then extract relational paths with length at most 4 pointing from h to t. The dimension of the edge embedding h_{r} in Eq. (6) is set as 32. We alternate between estimating connection strengths and refining node embeddings 3 times (T in the algorithms of Supplementary Note 1). We select γ in Eq. (7) from [0.05, 0.2] and α in Eq. (10) from [0.3, 0.7]. We train the model for a maximum of 50 epochs using Adam^{28} with learning rate 5 × 10^{−3} and weight decay rate 10^{−5}. We stop training early if the validation loss does not decrease for 10 consecutive epochs. We set the dropout rate as 0.2 and the batch size as 256. All results are averaged over five runs and are obtained on a 32 GB NVIDIA Tesla V100 GPU. A summary of the hyperparameters used by KnowDDI is provided in Supplementary Table 2. Their sensitivity analysis results are shown in Supplementary Fig. 2 and discussed in Supplementary Note 3.
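The early-stopping rule stated above (at most 50 epochs, patience of 10 epochs on the validation loss) can be sketched independently of any training framework. The function below replays a list of per-epoch validation losses and reports when training would stop; the name `early_stopping_epoch` and the use of a strict decrease as the improvement criterion are assumptions of this example.

```python
def early_stopping_epoch(val_losses, max_epochs=50, patience=10):
    """Return (last epoch run, best validation loss) under early stopping."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return last_epoch, best


# Loss improves for two epochs, then plateaus: stop after 2 + 10 epochs.
plateau = early_stopping_epoch([1.0, 0.9] + [0.95] * 20)
# Loss keeps improving: the epoch cap of 50 applies.
capped = early_stopping_epoch([1.0 / (i + 1) for i in range(60)])
```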
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Results
Data
In this study, we perform experiments on two publicly available benchmark DDI datasets: (i) DrugBank^{13} is a multi-class DDI prediction dataset consisting of 86 types of pharmacological relations occurring between drugs; and (ii) TWOSIDES^{29} is a multi-label DDI prediction dataset recording multiple DDI side effects between drugs. We adopt Hetionet^{17,18}, which is a benchmark biomedical KG for various tasks within drug discovery, as the external KG in this paper. Other recently developed biomedical KGs such as ogbl-biokg^{30}, OpenBioLink^{31}, and PharmKG^{19} can also be used.
Data preprocessing
We preprocess the two benchmark DDI datasets DrugBank^{13,32} and TWOSIDES^{29,33} following the same procedure adopted by SumGNN^{23}. In DrugBank, relations are skewed. Each drug-pair is filtered to have one relation only^{23}. In TWOSIDES, 200 commonly occurring relations are selected. In particular, relations are ranked by decreasing number of associated fact triplets, and the 200 relations ranked between 600 and 800 are kept such that each relation is associated with at least 900 fact triplets^{10}. Thus, relations in TWOSIDES are associated with comparable numbers of fact triplets. We formulate the benchmark DDI datasets as separate DDI graphs, whose statistics are summarized in Table 1. The fact triplets in the DDI datasets are split into training, validation, and testing sets with a ratio of 7:1:2 following SumGNN^{23} for fair comparison. We remove from the external KG the drug-drug edges contained in the DDI graph to avoid information leakage, then merge the resultant external KG and the DDI graph into a large combined network. Eventually, the DDI graph of DrugBank is merged with a graph of 33765 nodes and 1690693 edges extracted from Hetionet, and the DDI graph of TWOSIDES is merged with a graph of 28132 nodes and 1666632 edges extracted from Hetionet. During training, the drug-drug edges in the validation and testing sets are unseen. After tuning hyperparameters on the fact triplets in the validation set, the model performance is evaluated on the fact triplets in the testing set.
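The 7:1:2 split and the leakage filter can be sketched as below. This is an illustration under stated assumptions, not the SumGNN preprocessing script: the split is a seeded random shuffle, and a KG edge is treated as leaking if its two endpoints form a drug-pair of the DDI graph in either direction. The names `split_triplets` and `remove_ddi_edges` are hypothetical.

```python
import random


def split_triplets(triplets, ratios=(7, 1, 2), seed=0):
    """Shuffle triplets and split them into train/validation/test lists."""
    items = sorted(triplets)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


def remove_ddi_edges(kg_edges, ddi_edges):
    """Drop KG edges whose endpoints form a drug-pair of the DDI graph."""
    pairs = {(u, v) for u, _, v in ddi_edges}
    return {
        (u, r, v)
        for u, r, v in kg_edges
        if (u, v) not in pairs and (v, u) not in pairs
    }


triplets = {(f"d{i}", "r", f"d{i + 1}") for i in range(100)}
train_set, val_set, test_set = split_triplets(triplets)

kg = {("d1", "resembles", "d2"), ("d1", "binds", "g1")}
clean_kg = remove_ddi_edges(kg, {("d2", "s", "d1")})
```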
Evaluation metric
We evaluate the multi-class DDI prediction performance by three metrics: (i) Macro-averaged F1, which is averaged over class-wise F1 scores, (ii) Accuracy (ACC), which is the micro-averaged F1 score calculated using all testing fact triplets, and (iii) Cohen’s κ, which measures the inter-annotator agreement. As for multi-label DDI prediction, we report the results averaged over all relation types. The performance is evaluated by (i) AUROC, which is the average area under the receiver operating characteristic (ROC) curve, (ii) AUPRC, which is the average area under the precision-recall (PR) curve, and (iii) AP@50, which is the average precision at 50.
Comparison with the state-of-the-art
We consider the multi-typed DDI prediction problem, where interactions between drugs can have multiple relation types. For example, a drug-pair (drug A, drug B) can have two relation types “the metabolism of drug A can be decreased when combined with drug B” and “the therapeutic efficacy of drug A can be increased when combined with drug B”. In particular, we compare the proposed KnowDDI with the following models:
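The three multi-class metrics can be computed from label lists with the standard library alone. The sketch below uses the textbook definitions of macro-averaged F1, accuracy, and Cohen's κ; the function name `multiclass_metrics` is hypothetical, and classes without any true or predicted instance contribute an F1 of 0 by convention in this example.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Return (macro F1, accuracy, Cohen's kappa) for two label lists."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / n

    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)

    # Chance agreement from the marginal label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return macro_f1, acc, kappa


macro_f1, acc, kappa = multiclass_metrics([0, 0, 1, 1], [0, 1, 1, 1])
```

In the toy example the class-wise F1 scores are 2/3 and 4/5, the accuracy is 3/4, and the chance agreement is 1/2, giving κ = 0.5.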

Traditional two-stage methods (w/o external KG). (i) KG embedding-based methods use shallow linear models to encode drug entities and their associated relations into low-dimensional embeddings, then feed the drug embeddings into a separately learned classifier for DDI prediction. Exemplar methods are TransE^{34}, KG-DDI^{35,36}, and MSTE^{15,37}. (ii) Network embedding-based methods use neural networks to encode structural information into node embeddings of drugs, and predict the relation types by a linear layer. Exemplar methods are DeepWalk^{14,38}, node2vec^{39,40}, and LINE^{41,42}.

GNN-based methods (w/o external KG) formulate the existing DDI fact triplets as a graph where each node corresponds to a drug and each edge between two drugs represents one relation type, then solve the resultant link prediction problem on the DDI graph using GNNs^{43}, including GAT^{44,45}, Decagon^{10,46}, and SkipGNN^{11,47}.

GNN-based methods (w/ external KG) leverage an external KG to provide rich organized biomedical knowledge, and aggregate the messages from neighboring nodes of drugs by GNNs. Existing methods include GraIL^{22,48}, KGNN^{21,49}, DDKG^{50,51}, SumGNN^{23,52}, and LaGAT^{24,53}.
We implement the baselines using the public code released by the respective authors, except for TransE^{34}, which we implement ourselves.
Overall performance
Table 1 shows the results obtained on two benchmark DDI datasets. Overall, GNN-based methods (w/ external KG) generally perform the best and traditional two-stage methods generally perform the worst. Compared with traditional two-stage methods, GNN-based methods (w/o external KG) can better propagate information among connected nodes (i.e., drugs) by modeling the fact triplets integrally as a graph and jointly learning all model parameters w.r.t. the objective in an end-to-end manner. However, due to the lack of DDI fact triplets and the over-parameterization of GNNs, they may not be consistently better than traditional two-stage methods. This is supported by the observation that DeepWalk and KG-DDI perform better than the deep GAT.
Next, GNN-based methods (w/ external KG) leverage rich biomedical knowledge to alleviate the data scarcity problem. Among these methods, KGNN performs the worst. In contrast to a pure GNN, KGNN uniformly samples N nodes as neighbors of each node during message passing to reduce computational overhead. DDKG improves on KGNN by assigning attention weights to the N nodes during message passing, where the attention weights are obtained by calculating the similarity between initial node embeddings constructed from SMILES. In both KGNN and DDKG, each drug obtains its representation without considering which drug it interacts with. In contrast, GraIL, SumGNN, LaGAT and our KnowDDI merge the DDI graph with an external KG into a large combined network, then learn to encode more local semantic information from the combined network by extracting subgraphs w.r.t. drug-pairs. A drug can be represented differently in different subgraphs. Thus, these methods can obtain drug-pair-aware representations that are beneficial for predicting DDI types. In particular, GraIL directly propagates messages on the extracted subgraphs, LaGAT aggregates messages with attention weights calculated using node embeddings, and SumGNN only prunes edges based on node features that are randomly initialized or pretrained on other tasks and then fixes the subgraphs. None of these three methods can adaptively adjust the structure of subgraphs during learning.
Finally, KnowDDI learns to remove irrelevant edges and add new edges of type “resemble” based on learned node embeddings. On the purified subgraph (i.e., knowledge subgraph) of the target drug-pair, KnowDDI transforms generic node embeddings to be more predictive of DDI types. The performance gain of KnowDDI over existing methods validates its effectiveness.
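The edge adjustment described above can be illustrated with a small sketch. This is not the learned mechanism of KnowDDI: here the connection strength is approximated by cosine similarity between fixed node embeddings, and the two thresholds `drop_below` and `add_above` are hypothetical values chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine_subgraph(edges, emb, drop_below=0.2, add_above=0.9):
    """Drop weak existing edges and add "resemble" edges between similar nodes.

    edges: set of (head, relation, tail) triplets in the subgraph.
    emb:   dict mapping each node to its embedding (list of floats).
    """
    # Keep an existing edge only if its (proxy) connection strength is large enough.
    kept = {(h, r, t) for (h, r, t) in edges if cosine(emb[h], emb[t]) >= drop_below}
    # Connect originally disconnected node pairs whose embeddings are highly similar.
    connected = {(h, t) for (h, _, t) in edges}
    added = set()
    for h in emb:
        for t in emb:
            if h != t and (h, t) not in connected and cosine(emb[h], emb[t]) >= add_above:
                added.add((h, "resemble", t))
    return kept | added


# Toy embeddings for four hypothetical nodes.
emb = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [1.0, 0.1], "d": [0.0, 1.0]}
edges = {("a", "r1", "b"), ("a", "r2", "d")}
print(sorted(refine_subgraph(edges, emb)))
```

In this toy case the edge from `a` to `d` is removed because the two embeddings are orthogonal, while `a` and `c` become linked in both directions by a "resemble" edge.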
Relationwise performance
Next, we take a closer look at the performance gain w.r.t. different relations grouped by frequency. Figure 2a shows the relation-wise F1 score (%) grouped into bins according to the number of fact triplets associated with the relation. We compare the proposed KnowDDI with SkipGNN, which performs the best among GNN-based methods (w/o external KG), and SumGNN, which obtains the second-best performance among GNN-based methods (w/ external KG). In addition, we compare with KnowDDI (w/o resemble), a variant of KnowDDI which does not add new edges of type “resemble” between nodes with highly similar node embeddings.
Comparing the performance of SkipGNN with that of the other methods shows that the external KG plays an important role. In general, KnowDDI and KnowDDI (w/o resemble) consistently obtain better performance than SumGNN, and the performance gap is larger on relations with fewer known fact triplets. This shows that enriched drug representations and adjusted subgraphs help compensate for the lack of known DDI fact triplets. KnowDDI performs the best, which further shows the contribution of learning to propagate resembling relationships between highly similar nodes. Additionally, Supplementary Fig. 3 shows statistics of relation frequency and Supplementary Fig. 4 shows the relation-wise performance improvement of KnowDDI over SumGNN on DrugBank, TWOSIDES and a larger version of TWOSIDES with more relations. We also examine the performance of KnowDDI and the second-best method SumGNN on some important and commonly studied adverse drug reactions (ADRs) in Supplementary Table 3. Results consistently show that KnowDDI obtains better performance. An extended discussion is provided in Supplementary Note 4.
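Grouping relation-wise scores by relation frequency can be sketched as follows. The bin boundaries, relation names and scores are hypothetical and do not reproduce the numbers in Fig. 2a.

```python
from collections import defaultdict


def bin_relation_scores(rel_counts, rel_f1, bin_edges):
    """Average the per-relation F1 within frequency bins.

    rel_counts: dict relation -> number of fact triplets with that relation.
    rel_f1:     dict relation -> F1 score of that relation.
    bin_edges:  ascending inclusive upper bounds; relations more frequent
                than the last bound fall into one extra final bin.
    """
    buckets = defaultdict(list)
    for rel, count in rel_counts.items():
        idx = next((i for i, edge in enumerate(bin_edges) if count <= edge),
                   len(bin_edges))
        buckets[idx].append(rel_f1[rel])
    return {idx: sum(vals) / len(vals) for idx, vals in sorted(buckets.items())}


# Four hypothetical relations with made-up frequencies and scores.
counts = {"r1": 5, "r2": 50, "r3": 500, "r4": 8}
scores = {"r1": 0.4, "r2": 0.7, "r3": 0.9, "r4": 0.6}
print(bin_relation_scores(counts, scores, bin_edges=[10, 100]))
```

Reporting one averaged score per bin makes it easier to see whether a method helps mostly on rare or on frequent relations.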
Compensating unknown DDIs
Recall that the lack of DDI fact triplets is compensated by both enriched drug representations and propagated drug similarities in KnowDDI. Here, we take a closer look at the effectiveness of these two designs. To this end, we examine the performance of KnowDDI and KnowDDI (w/o resemble) with different amounts of fact triplets introduced from the external KG into the combined network. We compare them with SumGNN, which obtains the second-best performance among GNN-based methods (w/ external KG), and take SkipGNN from GNN-based methods (w/o external KG) as a reference. Figure 2b plots the performance changes w.r.t. the varying portion (%) of fact triplets sampled from the external KG. First, SumGNN, KnowDDI and KnowDDI (w/o resemble) all perform worse given fewer fact triplets from the external KG. This is because a sparser external KG means less information introduced into DDI datasets, which reduces the information gap between GNN-based methods w/ or w/o external KG. Then, KnowDDI (w/o resemble) consistently outperforms the other two methods, as it keeps information that is more relevant to predicting DDI for the drug-pair at hand both during subgraph construction and learning. Besides, KnowDDI is the best, as it further learns drug similarities by propagating resembling relationships between drugs with highly similar representations. As a result, KnowDDI suffers the least from a sparser KG. Moreover, the performance gap between KnowDDI (w/o resemble) and KnowDDI gets larger with fewer triplets. This means that removing irrelevant edges and propagating drug similarities have a larger influence on compensating for the lack of DDIs when drug representations are less enriched. Finally, let us pay special attention to the case of 0% triplets, which means the external KG is not used. Even then, both KnowDDI (w/o resemble) and SumGNN outperform SkipGNN. This can be attributed to the different subgraph extraction strategies adopted in these three methods, which will be carefully examined in Section 6.
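The sparsification experiment can be pictured with the sketch below, which keeps a given portion of external KG triplets before merging them with the DDI triplets. The helper names, the seed handling and the toy triplets are assumptions for illustration; the sampling procedure actually used in the experiments may differ.

```python
import random


def sample_kg(kg_triplets, portion, seed=0):
    """Uniformly sample a portion (between 0.0 and 1.0) of the external KG triplets."""
    rng = random.Random(seed)
    k = round(len(kg_triplets) * portion)
    return rng.sample(kg_triplets, k)


def build_combined_network(ddi_triplets, kg_triplets, portion, seed=0):
    """Merge all DDI triplets with the sampled portion of the external KG."""
    return list(ddi_triplets) + sample_kg(list(kg_triplets), portion, seed)


# Toy triplets of the form (head, relation, tail).
ddi = [("d1", "ddi_7", "d2"), ("d2", "ddi_3", "d3"), ("d1", "ddi_3", "d3")]
kg = [(f"d{i}", "binds", f"g{i}") for i in range(10)]

for portion in (0.0, 0.5, 1.0):
    print(portion, len(build_combined_network(ddi, kg, portion)))
```

With a portion of 0.0 the combined network reduces to the DDI graph alone, which corresponds to the 0% setting discussed above.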
Effectiveness of knowledge subgraph
Here, we take a closer look at the knowledge subgraph \(\mathcal{S}_{h,t}\) designed in KnowDDI, and compare it with other choices of subgraphs in terms of performance and interpretability.
Subgraph extraction strategy
To empirically validate the effectiveness of the proposed knowledge subgraph, we consider the following subgraphs: (i) Random subgraph consists of a fixed-size set of nodes uniformly sampled from the neighborhoods of h and t in \(\mathcal{G}\), which is adopted in KGNN^{21} to reduce the computation overhead; (ii) Enclosing subgraph is the intersection of K-hop neighborhoods of h and t in \(\mathcal{G}\), which is adopted in GraIL^{22} and SumGNN^{23}; (iii) Drug-flow subgraph \(\bar{\mathcal{S}}_{h,t}\) consists of relational paths pointing from h to t in \(\mathcal{G}\) with length at most P; and (iv) Knowledge subgraph \(\mathcal{S}_{h,t}\) consists of explaining paths from h to t, which is obtained by iteratively refining the graph structure and node embeddings of \(\bar{\mathcal{S}}_{h,t}\). Apart from them, we further compare with the knowledge subgraphs obtained by KnowDDI (w/o resemble), and denote the results as knowledge subgraph (w/o resemble).
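Two of the extraction strategies listed above can be sketched with breadth-first search. This is a simplified illustration, not the authors' implementation: the enclosing subgraph is taken as the intersection of undirected K-hop neighborhoods as in GraIL, the drug-flow subgraph keeps every edge lying on some directed walk from h to t whose length is capped by a hypothetical parameter `max_len`, and all function names are assumptions.

```python
from collections import defaultdict, deque


def bfs_dist(adj, source, max_hops):
    """Hop distance from source to every node reachable within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def enclosing_nodes(triplets, h, t, k):
    """Nodes in the intersection of the undirected k-hop neighborhoods of h and t."""
    und = defaultdict(set)
    for u, _, v in triplets:
        und[u].add(v)
        und[v].add(u)
    return set(bfs_dist(und, h, k)) & set(bfs_dist(und, t, k))


def drug_flow_edges(triplets, h, t, max_len):
    """Edges lying on some directed walk from h to t with at most max_len edges."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, _, v in triplets:
        fwd[u].add(v)
        bwd[v].add(u)
    from_h = bfs_dist(fwd, h, max_len)  # forward distances from h
    to_t = bfs_dist(bwd, t, max_len)    # backward distances to t
    return {
        (u, r, v)
        for u, r, v in triplets
        if u in from_h and v in to_t and from_h[u] + 1 + to_t[v] <= max_len
    }


# Toy combined network with hypothetical node names.
triplets = [("h", "r", "a"), ("a", "r", "t"), ("h", "r", "b"),
            ("c", "r", "t"), ("a", "r", "d")]
print(enclosing_nodes(triplets, "h", "t", k=1))
print(sorted(drug_flow_edges(triplets, "h", "t", max_len=2)))
```

In the toy graph, nodes `b`, `c` and `d` are discarded by the drug-flow extraction because none of them lies on a directed route from `h` to `t`, which mirrors the removal of nodes that do not appear in any path.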
Figure 3a shows the results obtained by KnowDDI on DrugBank with different subgraphs and different percentages of fact triplets from the external KG. Enlarged enclosing subgraphs are provided in Supplementary Fig. 5. As shown, leveraging subgraphs consistently leads to performance gain, regardless of the subgraph type. This shows the necessity of modeling local contexts of target drug-pairs. Among these subgraphs, learning with the knowledge subgraph obtains the best performance. As the random subgraph consists of uniformly sampled nodes without considering node importance, the selected nodes may not contribute to recognizing the relationships between head and tail drugs. The enclosing subgraph keeps the local neighborhood of head and tail drugs intact, so it does not lose information. However, directly learning on these subgraphs may lead to bad performance if irrelevant edges exist. In contrast, the drug-flow subgraph focuses on relational paths pointing from the head drug to the tail drug, and the knowledge subgraph further keeps only explaining paths. Both remove irrelevant nodes which do not appear in any path. Besides, by comparing the performance of the drug-flow subgraph and the knowledge subgraph under different percentages of fact triplets, we can see that both enriched drug representations and propagated drug similarities contribute to the performance improvements. Moreover, the performance gain is larger when fewer fact triplets are used. This means that removing irrelevant edges and propagating drug similarities have a stronger influence on compensating for the lack of DDIs when drug representations are less enriched. In summary, learning the knowledge subgraph is effective.
Interpretability
As discussed, being able to understand the DDI between drug-pairs helps drug discovery. Here, we show that KnowDDI can explain why two drugs associate with each other by leveraging explaining paths in knowledge subgraphs \(\mathcal{S}_{h,t}\). Figure 3 shows the subgraphs of four drug-pairs. Random subgraphs are not plotted, as they naturally lose semantic information. As can be observed, drug-flow subgraphs contain fewer nodes than the enclosing subgraphs. In particular, as we take Hetionet as the external KG, where only drugs have edges pointing to drugs and the relation type is “Compound-resembles-Compound”, drug-flow subgraphs only contain drugs. Knowledge subgraphs further adjust the graph structure. Specifically, KnowDDI assigns a connection strength to each node-pair in both directions. It represents the importance of a given edge between two connected nodes in the drug-flow subgraph, or the similarity between two nodes whose interactions are unknown. Even if two nodes are connected in \(\bar{\mathcal{S}}_{h,t}\), KnowDDI can delete the existing edge if the estimated connection strength is too small, such as the edge pointing from 575 to 284 in Fig. 3d. Likewise, two originally disconnected nodes can be connected after learning if the estimated connection strength is large, such as the edge pointing from 121 to 622 in Fig. 3c. Such a new edge indicates that KnowDDI deems the connected drugs highly similar and useful for explaining the DDI type between the two target drugs. Supplementary Fig. 6 shows the node-pair whose connection strength is the largest in each knowledge subgraph, including their molecular graphs and drug efficacy.
Further, Table 2 shows the explaining paths with the largest average connection strengths assigned by KnowDDI (see Section 6) for the four drug-pairs in Fig. 3. We use the Hetionet KG and the DrugBank database to help interpret these explaining paths. As can be seen, these explaining paths indeed discover reasonable explanations. Moreover, note that in the second drug-pair, without the newly added edge of relation type “resemble” pointing from 121 to 622, the discovered explaining path no longer exists. This validates the necessity of learning with knowledge subgraphs.
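Selecting the explaining path with the largest average connection strength can be sketched as a bounded path enumeration. The function name, the length cap `max_len` and the toy strengths are assumptions for illustration; the strengths in KnowDDI are learned rather than given.

```python
def best_explaining_path(strength, h, t, max_len):
    """Return the simple path from h to t with the largest mean edge strength.

    strength: dict mapping a directed edge (u, v) to its connection strength.
    max_len:  maximum number of edges allowed on a path (assumes h != t).
    """
    out = {}
    for (u, v), w in strength.items():
        out.setdefault(u, []).append((v, w))

    best_path, best_score = None, float("-inf")
    stack = [(h, [h], 0.0)]  # (current node, path so far, summed strength)
    while stack:
        node, path, total = stack.pop()
        if node == t and len(path) > 1:
            score = total / (len(path) - 1)
            if score > best_score:
                best_path, best_score = path, score
            continue
        if len(path) - 1 == max_len:
            continue
        for nxt, w in out.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt], total + w))
    return best_path, best_score


# Toy knowledge subgraph with hypothetical connection strengths.
strength = {("h", "a"): 0.9, ("a", "t"): 0.7, ("h", "t"): 0.5,
            ("h", "b"): 1.0, ("b", "t"): 0.2}
print(best_explaining_path(strength, "h", "t", max_len=3))
```

Averaging over edges rather than summing avoids favoring long paths merely because they contain more edges.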
Embedding visualization
Finally, we wish to show that KnowDDI helps better shape the embeddings of drug-pairs and relations to be more predictive of DDI types between target drug-pairs. From the DDI dataset, we randomly sample ten DDI relations, then randomly sample fifty fact triplets per relation. A drug-pair (h, t) is represented as the concatenation of drug embeddings, which correspond to the node embeddings of h and t. For methods operating on subgraphs, the drug-pair embedding is obtained by concatenating drug embeddings with the subgraph embedding, which is obtained by mean pooling over node embeddings of all nodes in the subgraph^{22,23}. In this way, the local context of the target drug-pair is leveraged to obtain better prediction results. We compare KnowDDI with SumGNN and KnowDDI (w/o resemble). In addition, we also show the drug-pair embeddings obtained by simply learning generic node embeddings without refining them on subgraphs. Figure 4 shows the t-SNE visualization^{54} obtained on DrugBank. First, moving from generic embeddings in Fig. 4b, to KnowDDI (w/o resemble) in Fig. 4c, to KnowDDI in Fig. 4d, drug-pairs with the same relation type get closer while drug-pairs with different relation types move farther apart. Also, the clusters are more obvious in KnowDDI (Fig. 4c, d) than in SumGNN (Fig. 4a). This means that learning knowledge subgraphs is beneficial for obtaining more distinctive drug-pair embeddings.
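The construction of a drug-pair embedding described above can be written compactly as below. This is a minimal sketch with plain Python lists; the function names and the toy two-dimensional embeddings are assumptions.

```python
def mean_pool(vectors):
    """Element-wise mean over a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def drug_pair_embedding(node_emb, h, t, subgraph_nodes=None):
    """Concatenate the embeddings of h and t, plus a mean-pooled subgraph
    embedding when the method operates on subgraphs."""
    pair = list(node_emb[h]) + list(node_emb[t])
    if subgraph_nodes:
        pair += mean_pool([node_emb[n] for n in subgraph_nodes])
    return pair


# Toy node embeddings for the two target drugs and one intermediate node.
node_emb = {"h": [1.0, 0.0], "t": [0.0, 1.0], "x": [2.0, 2.0]}
print(drug_pair_embedding(node_emb, "h", "t"))                   # no subgraph
print(drug_pair_embedding(node_emb, "h", "t", ["h", "t", "x"]))  # with subgraph
```

The resulting vectors are what a dimensionality reduction method such as t-SNE would receive as input for the visualization.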
Discussion
In this study, we are motivated to develop an effective solution for accurately constructing a DDI predictor from the rare DDI fact triplets. The proposed KnowDDI achieves this goal by taking advantage of rich knowledge in biomedicine and healthcare and the plasticity of deep learning approaches. In KnowDDI, the enriched drug representations and propagated drug similarities together implicitly compensate for the lack of known DDIs. We first merge the provided DDI graph and an external KG into a combined network, and encode the rich knowledge recorded in the KG into generic node representations. Then, we extract a drug-flow subgraph for each drug-pair from the combined network, and learn a knowledge subgraph from the generic representations and the drug-flow subgraph. During learning, the knowledge subgraph is optimized: irrelevant edges are removed and new edges are added if two disconnected nodes have highly similar node representations. Finally, the representations of drugs are transformed to be more predictive of the DDI types between the target drug-pair, while the knowledge subgraph contains explaining paths to interpret the prediction result. The performance gap between KnowDDI and other approaches gets larger for relation types with a smaller number of known DDI fact triplets, which validates the effectiveness of KnowDDI.
Due to the popularity of GNNs for learning from graphs, existing works have applied them to solve the link prediction problem. Early works, like GAT^{44} and R-GCN^{55}, usually obtain the representation for each node by running message passing on the whole graph, then feed the representations of two target nodes into a predictor to estimate the existence of a link between them. Decagon^{10}, SkipGNN^{11}, KGNN^{21} and DDKG^{50} compared in Table 1 also follow this routine. In particular, KGNN uniformly samples a fixed number of nodes as neighbors of each node during message passing to reduce the computation overhead. DDKG improves the message passing part of KGNN by assigning attention weights to the uniformly sampled neighboring nodes, where the attention weights are obtained by calculating the similarities between initial node embeddings constructed from SMILES. These works treat all nodes equally and ignore pairwise information when propagating messages. Each drug gets a representation without considering which drug it interacts with. Thus, the performance of these methods can be worse than classical hand-designed heuristics, which count common neighbors or connected paths between a node-pair^{56}. As a result, the recent work GraIL^{22} proposes a pipeline to learn with subgraphs: first extract a subgraph containing the two target nodes, then obtain the node representations from the subgraph, and finally estimate the link between the two target nodes using the node-pair representation, which consists of the node embeddings of the two target nodes and the subgraph embedding. KnowDDI, SumGNN^{23} and LaGAT^{24} compared in Table 1 follow this pipeline but adopt different strategies to learn with subgraphs. In particular, LaGAT extracts a subgraph consisting of a fixed number of nodes around the head and tail drugs, updates node embeddings by aggregating neighboring nodes based on attention weights calculated using node embeddings, and leaves the subgraph unchanged. SumGNN extracts the enclosing subgraph of each drug-pair, then prunes edges based on the node features. By encoding local context within subgraphs, these methods obtain node-pair-aware representations, i.e., a drug can be represented differently depending on which drug it interacts with. Our KnowDDI also learns with subgraphs, but two major design differences make it obtain better and more interpretable results. The first difference is that KnowDDI learns generic node embeddings on the combined network to enrich the drug representations, then transforms them on knowledge subgraphs to incorporate the local context of drug-pairs. The second difference is the adjustment of subgraphs, where existing edges can be dropped if their estimated importance is low, and new edges of type “resemble” can be added between disconnected nodes if their node embeddings are highly similar. This allows KnowDDI to capture explaining paths pointing from the head drug to the tail drug, whereas SumGNN directly learns drug-pair-aware representations from the extracted subgraphs. With these two differences, KnowDDI balances generic information and drug-pair-aware local context during learning.
The architecture of KnowDDI can be further improved. For instance, pretraining the GNN on other large datasets may provide better initialized parameters and therefore reduce the training time. Besides, we do not use any molecular features of drugs in order to test the ability of KnowDDI to learn solely from the combination of the external KG and DDI fact triplets. Incorporating such predefined node features may improve the predictive performance of KnowDDI in the future. Although we implement KnowDDI to handle DDI prediction in this paper, KnowDDI is a general approach which can be applied to other relevant applications, to help detect possible protein-protein interactions, drug-target interactions, and disease-gene interactions. Relevant practitioners can easily leverage the rich biomedical knowledge existing in large KGs to obtain good and explainable prediction results. We believe our open-source KnowDDI can act as an original algorithm and unique deep learning tool to promote the development of biomedicine and healthcare. For example, it can help detect possible interactions of new drugs, accelerating drug design. Given drug profiles of patients, KnowDDI can be used to identify possible adverse reactions. These results have the potential to serve as a valuable resource for alerting clinicians and healthcare providers when devising management plans for polypharmacy, as well as for guiding the inclusion criteria of participants in clinical trials. Beyond biomedicine and healthcare, similar approaches can be developed to adaptively leverage domain-specific large KGs to help solve downstream applications in low-data regimes.
Data availability
All data used in this study are available in the supplementary data and public repositories. Source data underlying Figs. 2–4 can be found in Supplementary Data 1. For the benchmark DDI datasets, the DrugBank dataset^{13} can be downloaded from https://bitbucket.org/kaistsystemsbiology/deepddi/src/master/data/^{32}, and the TWOSIDES dataset^{29} can be downloaded from https://tatonettilab.org/resources/nsides/^{33}. The external KG Hetionet^{17} is obtained from https://het.io^{18}. The processed data analyzed in this paper are available in the GitHub repository at https://github.com/LARSresearch/KnowDDI/tree/main/data^{57}.
Code availability
The code implementing KnowDDI is deposited in a publicly available GitHub repository at https://github.com/LARSresearch/KnowDDI^{58}. The version for this publication is provided in Zenodo with the identifier: https://doi.org/10.5281/zenodo.10285646^{59}.
References
Juurlink, D. N., Mamdani, M., Kopp, A., Laupacis, A. & Redelmeier, D. A. Drug-drug interactions among elderly patients hospitalized for drug toxicity. JAMA 289, 1652–1658 (2003).
Bangalore, S., Kamalakkannan, G., Parkar, S. & Messerli, F. H. Fixed-dose combinations improve medication compliance: A meta-analysis. Am. J. Med. 120, 713–719 (2007).
Scavone, C. et al. Current pharmacological treatments for COVID-19: What’s next? Brit. J. Pharmacol. 177, 4813–4824 (2020).
Chakraborty, C., Sharma, A. R., Bhattacharya, M., Agoramoorthy, G. & Lee, S.-S. The drug repurposing for COVID-19 clinical trials provide very effective therapeutic combinations: Lessons learned from major clinical studies. Front. Pharmacol. 12, 704205 (2021).
Akinbolade, S. et al. Combination therapies for COVID-19: An overview of the clinical trials landscape. Brit. J. Clin. Pharmacol. 88, 1590–1597 (2022).
Percha, B. & Altman, R. B. Informatics confronts drug–drug interactions. Trends Pharmacol. Sci. 34, 178–184 (2013).
Letinier, L. et al. Risk of drug-drug interactions in out-hospital drug dispensings in France: Results from the drug-drug interaction prevalence study. Front. Pharmacol. 10, 265 (2019).
Jiang, H. et al. Adverse drug reactions and correlations with drug–drug interactions: A retrospective study of reports from 2011 to 2020. Front. Pharmacol. 13, 923939 (2022).
Marijon, E. et al. Causes of death and influencing factors in patients with atrial fibrillation: A competing-risk analysis from the randomized evaluation of long-term anticoagulant therapy study. Circulation 128, 2192–2201 (2013).
Zitnik, M., Agrawal, M. & Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34, i457–i466 (2018).
Huang, K., Xiao, C., Glass, L. M., Zitnik, M. & Sun, J. SkipGNN: Predicting molecular interactions with skip-graph networks. Sci. Rep. 10, 1–16 (2020).
Derry, S., Kong Loke, Y. & Aronson, J. K. Incomplete evidence: The inadequacy of databases in tracing published adverse drug reactions in clinical trials. BMC Med. Res. Methodol. 1, 1–6 (2001).
Wishart, D. S. et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
Perozzi, B. et al. DeepWalk: Online learning of social representations. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (eds Macskassy, S. A., Perlich, C., Leskovec, J., Wang, W., & Ghani, R.) 701–710 (2014).
Yao, J., Sun, W., Jian, Z., Wu, Q. & Wang, X. Effective knowledge graph embeddings based on multidirectional semantics relations for polypharmacy side effects prediction. Bioinformatics 38, 2315–2322 (2022).
Bonner, S. et al. A review of biomedical datasets relating to drug discovery: A knowledge graph perspective. Briefings in Bioinformatics 23, bbac404 (2022).
Himmelstein, D. S. & Baranzini, S. E. Heterogeneous network edge prediction: A data integration approach to prioritize disease-associated genes. PLoS Comput. Biol. 11, e1004259 (2015).
Himmelstein, D. S. & Baranzini, S. E. Heterogeneous network edge prediction: A data integration approach to prioritize disease-associated genes. Hetionet Knowledge Graph. https://het.io/ (2015).
Zheng, S. et al. PharmKG: A dedicated knowledge graph benchmark for bomedical data mining. Briefings Bioinformatics 22, bbaa344 (2021).
Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci. Data 10, 67 (2023).
Lin, X. et al. KGNN: Knowledge graph neural network for drug-drug interaction prediction. In International Joint Conference on Artificial Intelligence (ed Bessiere, C.) 380, 2739–2745 (ijcai.org, 2020).
Teru, K., Denis, E. & Hamilton, W. Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning 9448–9457 (PMLR, 2020).
Yu, Y. et al. SumGNN: Multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics 37, 2988–2995 (2021).
Hong, Y., Luo, P., Jin, S. & Liu, X. LaGAT: link-aware graph attention network for drug–drug interaction prediction. Bioinformatics 38, 5406–5412 (2022).
Hamilton, W. et al. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) 1024–1034 (Neural Information Processing Systems Foundation, Inc., 2017).
Vashishth, S., Sanyal, S., Nitin, V. & Talukdar, P. Composition-based multi-relational graph convolutional networks. Paper presented at the 8th International Conference on Learning Representations (OpenReview.net, 2020).
Sadeghian, A. et al. DRUM: End-to-end differentiable rule mining on knowledge graphs. In Advances in Neural Information Processing Systems (eds Wallach, H. M. et al.) 15347–15357 (Neural Information Processing Systems Foundation, Inc., 2019).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Paper presented at the 3rd International Conference on Learning Representations (OpenReview.net, 2015).
Tatonetti, N. P., Patrick, P. Y., Daneshjou, R. & Altman, R. B. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 4, 125ra31 (2012).
Hu, W. et al. Open graph benchmark: Datasets for machine learning on graphs. In Advances in Neural Information Processing Systems (eds Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H.) 33, 22118–22133 (Neural Information Processing Systems Foundation, Inc., 2020).
Breit, A., Ott, S., Agibetov, A. & Samwald, M. OpenBioLink: A benchmarking framework for large-scale biomedical link prediction. Bioinformatics 36, 4097–4098 (2020).
Wishart, D. S. et al. DrugBank 5.0: A major update to the DrugBank database for 2018. DrugBank Database, https://www.drugbank.ca/ (2018).
Tatonetti, N. P., Patrick, P. Y., Daneshjou, R. & Altman, R. B. Data-driven prediction of drug effects and interactions. TWOSIDES Database, https://tatonettilab.org/resources/nsides/ (2012).
Bordes, A. et al. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, (eds Burges, C. J. C., Bottou, L., Ghahramani, Z. & Weinberger, K. Q.) 2787–2795 (Neural Information Processing Systems Foundation, Inc., 2013).
Karim, M. R. et al. Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network. In ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, (eds Shi, X. M., Buck, M., Ma, J. & Veltri, P.) 113–123 (ACM, 2019).
Karim, M. R. et al. Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network. Codes of KG-DDI https://github.com/rezacsedu/DrugDrugInteractionPrediction (2019).
Yao, J., Sun, W., Jian, Z., Wu, Q. & Wang, X. Effective knowledge graph embeddings based on multidirectional semantics relations for polypharmacy side effects prediction. Codes of MSTE, https://github.com/galaxysunwen/MSTEmaster (2022).
Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: Online learning of social representations. Codes of DeepWalk, https://github.com/phanein/deepwalk (2014).
Grover, A. et al. node2vec: Scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (eds Krishnapuram, B. et al.) 855–864 (ACM, 2016).
Grover, A. & Leskovec, J. node2vec: Scalable feature learning for networks. Codes of node2vec, https://github.com/shenweichen/GraphEmbedding (2016).
Tang, J. et al. LINE: Large-scale information network embedding. In International Conference on World Wide Web, (eds Gangemi, A., Leonardi, S. & Panconesi, A.) 1067–1077 (ACM, 2015).
Tang, J. et al. LINE: Large-scale information network embedding. Codes of LINE, https://github.com/tangjianpku/LINE (2015).
Kipf, T. & Welling, M. Semi-supervised classification with graph convolutional networks. In Paper presented at the 5th International Conference on Learning Representations (OpenReview.net, 2017).
Veličković, P. et al. Graph attention networks. In Paper presented at the 6th International Conference on Learning Representations (OpenReview.net, 2018).
Veličković, P. et al. Graph attention networks. Codes of GAT, https://github.com/PetarV/GAT (2018).
Zitnik, M., Agrawal, M. & Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Codes of Decagon, https://github.com/mimsharvard/decagon (2018).
Huang, K., Xiao, C., Glass, L. M., Zitnik, M. & Sun, J. SkipGNN: Predicting molecular interactions with skip-graph networks. Codes of SkipGNN, https://github.com/kexinhuang12345/SkipGNN (2020).
Teru, K., Denis, E. & Hamilton, W. Inductive relation prediction by subgraph reasoning. Codes of GraIL, https://github.com/kkteru/grail (2020).
Lin, X., Quan, Z., Wang, Z.-J., Ma, T. & Zeng, X. KGNN: Knowledge graph neural network for drug-drug interaction prediction. Codes of KGNN, https://github.com/xzenglab/KGNN (2020).
Su, X., Hu, L., You, Z., Hu, P. & Zhao, B. Attention-based knowledge graph representation learning for predicting drug-drug interactions. Briefings Bioinformatics 23, bbac140 (2022).
Su, X., Hu, L., You, Z., Hu, P. & Zhao, B. Attention-based knowledge graph representation learning for predicting drug-drug interactions. Codes of DDKG, https://github.com/Blair1213/DDKG (2022).
Yu, Y. et al. SumGNN: Multi-typed drug interaction prediction via efficient knowledge graph summarization. Codes of SumGNN, https://github.com/yueyu1030/SumGNN (2021).
Hong, Y., Luo, P., Jin, S. & Liu, X. LaGAT: link-aware graph attention network for drug–drug interaction prediction. Codes of LaGAT, https://github.com/Azra3lzz/LaGAT (2022).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Schlichtkrull, M. et al. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, (eds Gangemi, A. et al.) 593–607 (Springer, 2018).
Zhang, M. et al. Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, (eds Bengio, S. et al.) 5171–5181 (Neural Information Processing Systems Foundation, Inc., 2018).
Wang, Y., Yang, Z. & Yao, Q. Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning. Processed data used in KnowDDI, https://github.com/LARSresearch/KnowDDI/tree/main/data (2024).
Wang, Y., Yang, Z. & Yao, Q. Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning. Codes of KnowDDI hosted by GitHub, https://github.com/LARSresearch/KnowDDI (2024).
Wang, Y., Yang, Z. & Yao, Q. Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning. Zenodo https://doi.org/10.5281/zenodo.10285646 (2023).
Acknowledgements
Q.Y. is supported by the research fund of the National Natural Science Foundation of China (No. 92270106) and the Independent Research Plan of the Department of Electronic Engineering at Tsinghua University.
Author information
Contributions
For this manuscript, Y.W. contributed to the idea development, experiment design, paper presentation and writing; Z.Y. contributed to the code implementation and to obtaining and analyzing the results; Q.Y. contributed to the idea development, result analysis, paper presentation and writing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Medicine thanks Jian-Yu Shi, Beilun Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, Y., Yang, Z. & Yao, Q. Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning. Commun Med 4, 59 (2024). https://doi.org/10.1038/s4385602400486y
DOI: https://doi.org/10.1038/s4385602400486y