SkipGNN: predicting molecular interactions with skip-graph networks

Molecular interaction networks are powerful resources for molecular discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced the prediction prowess, current graph neural network (GNN) methods are mainly optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved incredibly useful in the last decade across a variety of interaction networks. Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by not only aggregating information from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph. Experiments on four interaction networks, including drug–drug, drug–target, protein–protein, and gene–disease interactions, show that SkipGNN achieves superior and robust performance. Furthermore, we show that unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks.


Direct similarity
In contrast, in molecular interaction networks, directly interacting entities are not necessarily similar, as has been observed in numerous networks, including genetic interaction networks12,13 and protein-protein interaction networks14,15.
fully exploit this important quality of biomedical interaction networks. Notably, there are recent advances in GNNs, such as MixHop11 and JK-Net34, which are designed to capture higher-order graph structure through skip connections and higher-order adjacency matrices. However, they are motivated by general network models and do not address the specific challenge of two-hop skip similarity in biomedical networks. In molecular interaction networks, the goal is to predict whether a given pair of biomedical entities, such as proteins, drugs, or diseases, will interact. We can divide methods for interaction prediction into three main groups. (1) Structural representation learning generates an embedding for each entity from the entity's structural representation, such as a compound's molecular graph or a protein's amino acid sequence. The embeddings of two entities are then combined and fed into a decoder for prediction. For example, refs.35,36,37 use graph-convolutional networks (GCNs) and convolutional neural networks (CNNs) on molecular graphs and gene sequence data to predict binding of drugs to target proteins. Similarly, refs.38,39,40 learn embeddings for drugs and concatenate the embeddings of drug pairs to predict drug-drug interactions. (2) Similarity-based learning is based on the assumption that entities with similar interaction patterns are likely to interact. These methods devise a similarity measure (e.g., a graphlet-based signature of proteins in the PPI network41) and then use the measure to predict interactions based on how similar a candidate interaction is to known interactions. A variety of techniques are used to aggregate similarity values and score interactions, including matrix factorization42, clustering43, and label propagation44. (3) Finally, network relational learning views the task as a network completion problem. It uses network structure together with side information about nodes to complete the network and predict interactions4,33,45.
SkipGNN belongs to the structural representation learning category.
Preliminaries on graph neural networks (GNNs). Next, we describe graph neural networks, as they are among the state-of-the-art models for link prediction and are the focus of our study. The input to a GNN is the network, represented by its adjacency matrix A. Most often, the goal (output) of the GNN is to learn an embedding for each node in the network by capturing the network structure as well as node attributes. A GNN can be represented as a series of neighborhood aggregation layers (e.g.,9):

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) ),

where H^(l) is the matrix of node embeddings at the lth layer, H^(0) are the input node attributes, W^(l) is a trainable parameter matrix, σ is a non-linear activation function, and D̃ and Ã are the renormalized degree and adjacency matrices, defined as Ã = A + I and D̃_ii = Σ_j Ã_ij (I is the identity matrix). The GNN propagates information across network neighborhoods and transforms the information in a way that is most useful for a downstream prediction task, such as link prediction.
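As an illustrative sketch (not the authors' implementation), the propagation rule above can be written in a few lines of NumPy; the weight matrix W here is a fixed placeholder rather than a trained parameter:

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN propagation step: H' = sigma(D~^-1/2 (A + I) D~^-1/2 H W).

    A: (N, N) adjacency matrix, H: (N, d_in) node embeddings,
    W: (d_in, d_out) parameter matrix (fixed here for illustration).
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops: A~ = A + I
    deg = A_hat.sum(axis=1)                  # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # renormalized adjacency
    return activation(A_norm @ H @ W)

# Toy 3-node path graph 0-1-2 with identity input features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.eye(3)
W = np.ones((3, 2)) * 0.1
H1 = gcn_layer(A, H0, W)
print(H1.shape)  # (3, 2)
```

Because the path graph is symmetric around node 1, the two end nodes receive identical embeddings after one propagation step.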

Methods
SkipGNN is a graph neural network uniquely suited for molecular interactions. SkipGNN takes as input a molecular interaction network and uses it to construct a skip graph, which is a second-order network representation capturing the skip similarity. SkipGNN then specifies a novel graph neural network architecture that fuses the original and the skip graph to accurately and robustly predict new molecular interactions. Notations are described in Table 1.

Problem formulation.
Consider an interaction network G on N nodes representing biomedical entities V (e.g., drugs, proteins, or diseases) and M edges E representing interactions between the entities. For example, G can be a drug-target interaction network recording information on how drugs bind to their protein targets3. For every pair of entities i and j, we denote their interaction with a binary indicator e_ij ∈ {0, 1}, indicating experimental evidence that i and j interact (e_ij = 1) or the absence of evidence for interaction (e_ij = 0). We denote the adjacency matrix of G as A, where A_ij is 1 if nodes i and j are connected in the graph (e_ij = 1) and 0 otherwise (e_ij = 0). Further, D is the degree matrix, a diagonal matrix where D_ii is the degree of node i.
Problem (Molecular Interaction Prediction) Given a molecular interaction network G = (V, E), we aim to learn a mapping f : V × V → [0, 1] from node pairs to probabilities, such that f(i, j) estimates the probability that nodes i and j interact.
Construction of the skip graph. Next, we describe skip graphs, the key novel representation of interaction networks that allows for effective use of GNNs for predicting interactions. We realize skip similarity by encouraging the GNN model to embed skipped nodes close together in the embedding space. To do so, we construct a skip graph G_s, in which two-hop neighbors are connected by edges. This construction creates paths in G_s along which neural messages can be exchanged between the skipped nodes.
Formally, we obtain the skip graph's adjacency matrix A^s through the operator

A^s = sign(AA^T),   (1)

where sign(x) is the sign function, with sign(x) = 1 if x > 0 and 0 otherwise, applied element-wise to AA^T. The corresponding degree matrix is D^s_ii = Σ_j A^s_ij. Matrix multiplication gives an efficient way to implement the skip graph: the entry of AA^T for nodes i and j counts the number of two-hop paths from i to j. Hence, if the entry is larger than 0, there exists a skipped node between i and j, and we convert the positive entry into 1 to construct the skip graph's adjacency matrix. Given this skip graph, we proceed to describe the full SkipGNN model.

(Figure 2 caption: input node embeddings are specified using side information, e.g., gene expression vectors if nodes represent genes, or generated using node2vec18. In SkipGNN, node embeddings are propagated along edges of G_s and G and transformed through a series of computations (layers), which output powerful embeddings used for downstream prediction of interactions. Illustrated is a two-layer iterative fusion scheme: in the first layer, two GNNs with parameter weight matrices produce embeddings H^(1) and S^(1); in the second layer, those embeddings are transformed further.)

The SkipGNN model. In this section, we describe how we leverage the skip graph for link prediction. After generating the skip graph as described in the "Construction of the skip graph" section, we propose an iterative fusion scheme that allows the skip graph and the original graph to learn from each other for better integration. Lastly, a decoder outputs a probability measuring whether a given pair of molecular entities interact.
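The skip-graph construction is a one-liner in NumPy; note that for an undirected graph A is symmetric, so AA^T equals A². A minimal sketch:

```python
import numpy as np

def skip_graph(A):
    """Skip-graph adjacency, Eq. (1): A_s = sign(A A^T), element-wise.

    (A A^T)_ij counts the two-hop paths between nodes i and j; any positive
    count becomes an edge. Note the diagonal is 1 for every node with at
    least one neighbor (a node is trivially a two-hop endpoint of itself).
    """
    return ((A @ A.T) > 0).astype(int)

# Path graph 0-1-2: nodes 0 and 2 share the skipped node 1.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A_s = skip_graph(A)
print(A_s[0, 2])  # 1: nodes 0 and 2 become connected in the skip graph
```

In the skip graph, nodes 0 and 2 are connected (they share the skipped node 1), while the original edge 0-1 is absent because 0 and 1 have no two-hop path between them.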
Iterative fusion. We want the model to automatically learn how to balance direct similarity and skip similarity in the final embedding. We design an iterative fusion scheme with aggregation gates to combine the two sources of similarity. The motivation is that, to represent a biomedical entity to its fullest extent, its node embedding must capture the entity's complicated bioactive functions through both skip and direct similarity. Hence, instead of simply concatenating the output node embeddings of the GNN on the original graph G, which captures direct similarity, with those of the GNN on the skip graph G_s, which captures skip similarity, we allow the two GNNs on G and G_s to interact with each other iteratively via the following propagation rules (see Fig. 2):

H^(l+1) = σ( F AGG( H^(l) W^(l), S^(l) W_s^(l) ) ),
S^(l+1) = σ( F_s AGG( S^(l) W_s^(l), H^(l+1) W^(l) ) ).   (2)

Here, H^(l) and S^(l) are the node embeddings at the lth layer from the direct-similarity graph G and the skip-similarity graph G_s, respectively; F and F_s are the re-normalized adjacency matrices for direct and skip similarity, respectively; and W^(l) and W_s^(l) are the trainable weights for layer l. H^(0) and S^(0) are set to X, the input node attributes generated by node2vec. The aggregation gate AGG in Eq. (2) can be a summation, a Hadamard product, max-pooling, or another aggregation operator46. Empirically, we find that the summation gate performs best. σ(·) is the activation function, and we use ReLU(·) = max(·, 0) to add non-linearity to the propagation.
In each layer, the node embedding of the original graph, H^(l+1), is first updated from its previous-layer embedding H^(l), combined with the skip-graph embedding S^(l). After obtaining the updated original-graph embedding H^(l+1), we update the skip-graph embedding S^(l+1) in a similar fashion. This update rule is very different from simple concatenation: it is an iterative process in which each update of a graph's node embedding is affected by the most recent node embedding from both graphs. This way, the two embeddings learn the best dependency structure between each other and fuse into one final embedding, instead of being simply concatenated. In the last layer, the final node embedding E is obtained through

E = AGG( H^(L), S^(L) ),   (3)

where (L) is the index of the last layer and AGG is the summation gate. As stated in the motivation, we are interested only in up-to-second-order neighbors, so we use a two-layer GNN; see Fig. 2. We do not use an activation function here, as no extra non-linear transformation is required before the embedding is fed into the decoder network. Empirically, we show that this fusion scheme boosts predictive performance in the "Ablation studies" section.
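A minimal NumPy sketch of the two-layer iterative fusion with a summation AGG gate; the exact placement of the weight matrices inside the gate follows our reading of Eq. (2), so treat the details as illustrative rather than as the authors' implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def normalize(A):
    """Renormalization trick: D~^-1/2 (A + I) D~^-1/2."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(axis=1)
    D = np.diag(1.0 / np.sqrt(d))
    return D @ A_hat @ D

def fusion_layer(F, F_s, H, S, W, W_s):
    """One iterative-fusion step with a summation AGG gate:
    the original-graph embedding H is updated first, then the
    skip-graph embedding S is updated using the *fresh* H."""
    H_next = relu(F @ (H @ W + S @ W_s))
    S_next = relu(F_s @ (S @ W_s + H_next @ W))
    return H_next, S_next

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_s = ((A @ A.T) > 0).astype(float)        # skip graph, Eq. (1)
F, F_s = normalize(A), normalize(A_s)
X = rng.standard_normal((3, 4))            # node2vec-style input attributes
W, W_s = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
H, S = X, X
for _ in range(2):                         # two layers, as in SkipGNN
    H, S = fusion_layer(F, F_s, H, S, W, W_s)
E = H + S                                  # final summation gate, no activation
print(E.shape)  # (3, 4)
```

For clarity the sketch reuses the same weight matrices in both layers; the real model has separate trainable weights per layer.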
SkipGNN decoder. Given target nodes (i, j) and their corresponding node embeddings E_i, E_j, we implement a neural network as a decoder: we first combine E_i and E_j into an input embedding through a COMB function (e.g., concatenation, sum, Hadamard product). The combined embedding is then fed into a neural network, parametrized by weights W_d and bias b, acting as a binary classifier:

p_ij = σ( W_d COMB(E_i, E_j) + b ),   (4)

where p_ij is the probability that nodes i and j interact (i.e., f(i, j)) and σ is the sigmoid function. We use concatenation as the COMB function as it consistently yields the best performance across different types of networks.

The SkipGNN algorithm. The overall algorithm is shown in Algorithm 1. We leverage only accessible network information (the adjacency matrix A of the network G) to predict links. In all experiments, we initialize embeddings using node2vec18: X = node2vec(A). Second, we construct the skip graph with adjacency matrix A^s via Eq. (1) to capture the skip-similarity principle. Next, at every step, a mini-batch of interaction pairs M with labels y is sampled. Two graph convolutional networks are then used for the original graph and the skip graph, respectively. In the propagation step, we use iterative fusion (Eq. (2)) to naturally combine embeddings convolved on the original graph and on the skip graph, corresponding to direct and skip similarity, respectively. In the last layer, embeddings are stored in E. We then retrieve the embeddings of the nodes in the mini-batch M and concatenate them to feed into the decoder (Eq. (4)).
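A sketch of the decoder step, using concatenation as COMB and a single linear layer as a stand-in for the decoder network (the actual decoder may be deeper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(E_i, E_j, W_d, b):
    """Score a candidate pair: concatenate the two node embeddings,
    apply a linear layer, and squash to a probability in (0, 1)."""
    z = np.concatenate([E_i, E_j])   # COMB = concatenation
    return sigmoid(z @ W_d + b)

rng = np.random.default_rng(1)
E = rng.standard_normal((3, 4))      # final embeddings from the fusion layers
W_d = rng.standard_normal(8)         # 2 * 4 after concatenation
p = decode(E[0], E[2], W_d, 0.0)
print(0.0 < p < 1.0)  # True
```

Because concatenation is order-sensitive, in practice the pair (i, j) and (j, i) can be scored symmetrically, e.g., by averaging the two scores.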
During training, we optimize SkipGNN's parameters W end-to-end through a binary cross-entropy loss:

L = − Σ_{(i,j)∈M} [ y_ij log p_ij + (1 − y_ij) log(1 − p_ij) ],   (5)

where y_ij is the true label for the pair of nodes i and j sampled during training via mini-batching, (i, j) ∈ M, and M is a mini-batch of interaction pairs. After the model is trained, it can be used to make predictions: given two entities i and j, the model predicts the probability f(i, j) that i and j interact.
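The mini-batch loss can be sketched in plain Python (averaging over the batch is an illustrative choice; an unnormalized sum over the mini-batch is equally common):

```python
import math

def bce_loss(probs, labels):
    """Binary cross-entropy over a mini-batch of predicted probabilities
    p_ij and true labels y_ij, averaged over the batch."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

probs = [0.9, 0.2]    # model outputs p_ij for two sampled pairs
labels = [1, 0]       # true labels y_ij
loss = bce_loss(probs, labels)
print(round(loss, 4))  # 0.1643
```

A confident correct prediction (p = 0.9 for a true pair) contributes little loss, while a confident wrong one would dominate the batch.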

Results
We conduct a variety of experiments to investigate the predictive power of SkipGNN ("Predicting molecular interactions" section). We then study the method's robustness to noise and missing data ("Robust learning on incomplete interaction networks" section) and demonstrate the skip similarity principle ("SkipGNN learns meaningful embedding spaces" section). Next, we conduct ablation studies to examine the contribution of each of SkipGNN's components to the final performance ("Ablation studies" section). Finally, we investigate novel predictions made by SkipGNN ("Investigation of SkipGNN's novel predictions" section).
Data and experimental setup. Next, we provide details on molecular interaction datasets, baseline methods, and experimental setup. (3) The third dataset48 is the human reference protein-protein interaction network (HuRI), generated by multiple orthogonal high-throughput yeast two-hybrid screens. We use the HI-III network, which has 5,604 proteins and 23,322 interactions. (4) Finally, we consider DisGeNET-GDI49, which collects curated gene-disease interactions (GDIs) from GWAS studies, animal models, and the scientific literature. The dataset has 81,746 interactions between 9,413 genes and 10,370 diseases. Dataset statistics are described in Table 2.
SkipGNN implementation and hyperparameters. We implemented SkipGNN using the PyTorch deep learning framework (the source code of SkipGNN is available at https://github.com/kexinhuang12345/SkipGNN). We use a server with 2 Intel Xeon E5-2670v2 2.5 GHz CPUs, 128 GB RAM, and 1 NVIDIA Tesla P40 GPU. We set optimization parameters as follows: the learning rate is 5e−4 with the Adam optimizer50, the mini-batch size is |M| = 256, the number of epochs is 15, and the dropout rate is 0.1. We set hyper-parameters using random search with 10 runs, based on the best average prediction performance on the validation set of the DTI task; we find this setup is robust across the other datasets. The hyper-parameter ranges are as follows: learning rate: [1e−3, 5e−4, 1e−4, 5e−5]; mini-batch size: [32, 64, 128, 256, 512]; dropout rate: [0, 0.05, 0.1, 0.2]; hidden size: [16, 32, 64, 128]. Specifically, we set the hidden size of the first layer to d^(1) = 64 and the hidden size of the second layer to d^(2) = 16.
Baseline methods. We compare SkipGNN to ten powerful predictors of molecular interactions from the network science and graph machine-learning fields. From machine learning, we use three direct network embedding methods: DeepWalk17, node2vec18, and struc2vec19. The last method is conceptually distinct in that it leverages local network structural information, while the former methods use random walks to learn embeddings for nodes in the network. Further, we examine five graph neural networks: VGAE32, GCN9, GIN21, JK-Net34, and MixHop11. They all use the same input encoding as SkipGNN. From network science, we consider Spectral Clustering20. We also use the L3 heuristic15, which was recently shown to outperform over 20 network science methods for the problem of PPI prediction. Further details on the baseline methods, their implementation, and parameter selection are in the supplementary material.

Experimental setup. In all our experiments, we follow an established evaluation strategy for link prediction (e.g.,4,51). We divide each dataset into train, validation, and test sets in a 7:1:2 ratio, which yields the positive examples (molecular interactions). We generate negative counterparts by sampling from the complement of the set of positive examples. The number of negative samples is set equal to the number of positive data points. For every experiment, we conduct five independent runs with different random splits of the dataset. We select the best-performing model based on the loss value on the validation set. The performance of the selected model is calculated on the test set.
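The negative-sampling step described above can be sketched as follows (function and variable names are our own, not from the released code):

```python
import random

def sample_negatives(edges, n_nodes, k, seed=0):
    """Sample k node pairs from the complement of the positive edge set,
    the standard negative-sampling scheme for link prediction.

    edges: iterable of undirected positive pairs (i, j).
    """
    rng = random.Random(seed)
    positives = {frozenset(e) for e in edges}
    negatives = set()
    while len(negatives) < k:
        i, j = rng.randrange(n_nodes), rng.randrange(n_nodes)
        # keep only non-self pairs that are not known interactions
        if i != j and frozenset((i, j)) not in positives:
            negatives.add((min(i, j), max(i, j)))
    return list(negatives)

edges = [(0, 1), (1, 2)]
negs = sample_negatives(edges, n_nodes=4, k=2)
print(len(negs))  # 2
```

One caveat of this scheme: sampled "negatives" may simply be interactions that have not been observed yet, which is exactly why the novel-prediction analysis later inspects high-scoring non-edges.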
To calculate prediction performance, we use: (1) the area under the precision-recall curve (PR-AUC), PR-AUC = Σ_{k=1}^{n} Prec(k) ΔRec(k), where (Prec(k), Rec(k)) is the kth precision/recall operating point; and (2) the area under the receiver operating characteristic curve (ROC-AUC), ROC-AUC = Σ_{k=1}^{n} TP(k) ΔFP(k), where (TP(k), FP(k)) is the kth true-positive/false-positive operating point. Higher values of PR-AUC and ROC-AUC indicate better predictive performance. In addition to PR-AUC and ROC-AUC, we rank each method on each dataset by its PR-AUC and report the average rank of each method across the four datasets; the rank summarizes the overall performance of the method relative to the others. To further establish the performance gain of SkipGNN, we use a statistical test: for each method, we take the ROC-AUC and PR-AUC of each run on each dataset as the data samples and compute the p value of a Wilcoxon signed-rank test between SkipGNN and the compared method.
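ROC-AUC can equivalently be computed via the rank (Mann-Whitney) formulation — the probability that a randomly chosen positive is scored above a randomly chosen negative — which is a handy cross-check against library implementations such as scikit-learn's roc_auc_score:

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank formulation: fraction of (positive, negative)
    pairs where the positive is scored higher, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]   # predicted interaction probabilities
print(roc_auc(labels, scores))  # 0.75
```

Here three of the four positive/negative score pairs are correctly ordered, giving an AUC of 0.75.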
Predicting molecular interactions. We start by evaluating SkipGNN on four distinct types of molecular interactions, including drug-target interactions, drug-drug interactions, protein-protein interactions, and gene-disease interactions, and we then compare SkipGNN 's performance to baseline methods.
In each interaction network, we randomly mask 30% of the interactions as the holdout validation (10%) and test (20%) sets. The remaining 70% of the interactions are used to train SkipGNN and each of the baselines. After training, each method is asked to predict whether pairs of entities in the test set are likely to interact.
We report results in Table 3; the method ranks, along with the p values of the statistical tests, are provided in Table 4. We see that SkipGNN is the top-performing method out of 11 methods across all molecular interaction networks. SkipGNN has the best predictive performance on the DTI and PPI datasets and the second-best performance on the DDI and GDI datasets, with an average rank of 1.5. In contrast, the best-performing baseline, MixHop, has an average rank of 2.5, as it is sometimes worse than JK-Net and GIN. We also see that SkipGNN's improvement over all baselines is statistically significant (p < 0.05). To show the usefulness of the skip graph, we compare with the GCN-backend baselines GCN and VGAE. We see up to a 2.7% improvement of SkipGNN over GCN and up to an 8.8% improvement over VGAE on PR-AUC. Since GCN and VGAE can only use direct similarity, this finding provides evidence that considering skip similarity and direct similarity together, as made possible by SkipGNN, is important for accurately predicting a variety of molecular interactions. Compared to direct embedding methods, SkipGNN shows up to a 28.8% increase over DeepWalk, a 20.4% increase over node2vec, and a 15.6% increase over spectral clustering on PR-AUC. These results support previous observations4 that graph neural networks can learn more powerful network representations than direct embedding methods. Finally, all baselines vary in performance across datasets and tasks, while SkipGNN consistently yields the most powerful predictions.
Robust learning on incomplete interaction networks. Next, we test SkipGNN 's performance on incomplete interaction networks. Due to knowledge gaps in biology, many of today's interaction networks are incomplete and thus it is crucial that methods are robust and able to perform well even when many interactions are missing.
In this experiment, we let each method be trained on 10%, 30%, 50%, and 70% of edges in the DTI, DDI, and PPI datasets and predict on the rest of the data (we use 10% of test edges as validation set for early stopping).
Results in Fig. 3 show that SkipGNN gives the most robust results among all the methods. In all tasks, SkipGNN achieves strong performance even with access to only 10% of the interactions. Further, at almost every percentage level, SkipGNN is better than the baselines. In addition, we see that VGAE is not robust, as its performance drops to around 0.5 PR-AUC in highly incomplete settings on the DTI and DDI tasks. The performance of node2vec and GCN steadily improves as the percentage of seen edges increases. Further, while spectral clustering is robust to incomplete data, its performance varies substantially across tasks. We conclude that SkipGNN is robust and is especially appropriate for data-scarce networks.

SkipGNN learns meaningful embedding spaces. Next, we visualize embeddings learned by GCN and
SkipGNN in an effort to investigate whether SkipGNN can better capture the structure of interaction networks than GCN. For that, we use the DTI and GDI networks, in which drugs/diseases are linked to associated proteins/genes. We use t-SNE52 and visualize the learned embeddings in Fig. 4 (DTI network) and Fig. 5 (GDI network). Note that GCN and SkipGNN use the same input embedding, which means the only difference is whether or not skip similarity is used.
First, we observe that GCN cannot distinguish between different types of biomedical entities (i.e., drugs vs. proteins and diseases vs. genes). In contrast, SkipGNN successfully separates the entities, as evidenced by the distinguishable groups of same-colored points in the t-SNE visualizations. This observation confirms that SkipGNN has a unique ability to capture skip similarity, whereas GCN does not. The reason is that GCN forces the embeddings of connected drug-protein/gene-disease pairs to be similar and thus embeds those pairs close together in the embedding space. By doing so, however, GCN conflates drugs with proteins and genes with diseases. In contrast, SkipGNN generates a biologically meaningful embedding space in which drugs are distinguished from proteins (or genes from diseases), while drugs are still positioned close to the proteins to which they bind (or, in the case of the GDI network, diseases are positioned close to relevant disease-associated genes). We also calculate the silhouette score of the t-SNE plot, which measures inter-cluster and intra-cluster distance and quantifies the goodness of a clustering; a higher value indicates that samples are better matched to their own cluster and poorly matched to neighboring clusters. SkipGNN has a silhouette score of 0.114 on DTI, whereas GCN scores 0.014; on GDI, SkipGNN scores 0.079 and GCN scores 0.018. The up to 8-fold increase in silhouette score suggests that SkipGNN can better distinguish the entities than GCN. Further, we find that GCN and its graph convolutional variants cannot capture skip similarity because they aggregate neural messages only from direct (i.e., immediate) neighbors in the interaction network.
SkipGNN solves this problem by passing and aggregating neural message from direct as well as in-direct neighbors, thereby explicitly capturing skip similarity.
Ablation studies. To show that each component of SkipGNN has an important role in its final performance, we conduct a series of ablation studies. SkipGNN has four key components, and we study how the method's performance changes when we remove each of them:
• -fusion replaces SkipGNN's fusion scheme with a simple concatenation of node embeddings generated by GCN.
• -skipGraph removes the skip graph and degenerates to GCN.
Table 5 shows the results of deactivating each of these components, one at a time. We find that -fusion outperforms -skipGraph (i.e., GCN) by a large margin. This finding identifies the skip graph as a key driver of the performance improvement. Further, we find that our iterative fusion scheme is important, indicating that successful methods need to integrate both direct and skip similarity in interaction networks. Next, we see that the weighted L1 gate has comparable or worse performance than the summation gate and that the Hadamard operator performs worst, suggesting that SkipGNN's summation gate is the best-performing aggregation function. Altogether, we conclude that all of SkipGNN's components are necessary for its strong performance.

Investigation of SkipGNN 's novel predictions.
The main goal of link prediction on graphs is to find novel hits that do not exist in the dataset. We conduct a literature search and find that SkipGNN is able to discover novel hits. We select pairs that do not interact in the original dataset but are flagged as interactions by our model. We then pick the top 10 most confident interactions and search literature databases for evidence supporting our findings. We find promising results for the DDI task (Table 6): out of the 10 top-ranked interaction pairs, we are able to find 6 pairs with supporting literature evidence.
For example, for the interaction between Warfarin and Clozapine, ref.53 reports that "Clozapine increase the concentrations of commonly used drugs in elderly like digoxin, heparin, phenytoin and Warfarin by displacing them from plasma protein. This can lead to increase in respective adverse effects with these medications." The manufacturer59 also reports that "Clozapine may displace Warfarin from plasma protein-binding sites. Increased levels of unbound Warfarin could result and could increase the risk of hemorrhage." As another example, for the interaction between Warfarin and Ivacaftor, ref.54 conducts a DDI study and reports that "caution and appropriate monitoring are recommended when concomitant substrates of CYP2C9, CYP3A and/or P-gp are used during treatment with Ivacaftor, particularly drugs with a narrow therapeutic index, such as Warfarin." Finally, we provide the top 10 outputs for the DTI, PPI, and GDI tasks in Appendix 3.

Discussion
We introduced SkipGNN, a novel graph neural network for predicting molecular interactions. The architecture of SkipGNN is motivated by a principle of connectivity that we call skip similarity. Remarkably, we found that skip similarity allows SkipGNN to capture the structural and evolutionary forces that govern molecular interaction networks much better than what is possible with current graph neural networks. SkipGNN achieves superior and robust performance on a variety of key prediction tasks in interaction networks and performs well even when networks are highly incomplete. There are several future directions. We focused here on networks in which all edges are of the same type. As SkipGNN is a general graph neural network, it would be interesting to adapt it to heterogeneous networks, such as drug-gene-disease networks. Another fruitful direction would be to implement skip similarity in other types of biological networks.

Appendix 1: Experiments on the importance of each layer of GNN for biomedical link prediction
To further support our claim about the importance of integrating skip similarity into GNN-based methods for biomedical interaction network link prediction, we vary the architecture of a vanilla GNN and compare predictive performance on the DDI, PPI, and DTI tasks. The variants are:
• TwoLayers-OriGraph is a two-layer GCN on the original graph. It uses indirect two-hop neighborhood aggregation, because two-hop node information is conveyed to the center node through the one-hop nodes.
• OneLayer-OriGraph is a one-layer vanilla GCN. It utilizes only the immediate one-hop neighbor information; hence, it is a direct measure of direct similarity.
• TwoLayers-SkipGraph is a vanilla two-layer GCN that operates on the skip graph. It connects the center node directly to its two-hop neighborhood, as opposed to the indirect connection in a vanilla GCN. As it has two layers, it also considers indirect four-hop neighbor nodes.
• OneLayer-SkipGraph is the one-layer version of TwoLayers-SkipGraph. As it uses only two-hop neighbor information, it directly measures skip similarity.
• OneLayer-3Hops is a one-layer GCN on the 3-hop graph. We test it to show the significance of higher-order neighbors.
Table 7 compares the results. The large improvement of TwoLayers-OriGraph over OneLayer-OriGraph is initial evidence that the two-hop neighborhood, which encodes the skip-similarity relation, is essential. Next, comparing OneLayer-OriGraph and OneLayer-SkipGraph, the large-margin improvement of OneLayer-SkipGraph implies that the two-hop neighborhood alone carries more predictive information than the one-hop neighborhood alone, supporting our analysis of the importance of skip similarity for biomedical interaction networks.
Note also that the improvement from OneLayer-OriGraph to TwoLayers-OriGraph is much larger than the improvement from OneLayer-SkipGraph to TwoLayers-SkipGraph, meaning the second hop is essential while higher-order neighborhoods are of limited importance for interaction link prediction. Lastly, TwoLayers-OriGraph performs better than TwoLayers-SkipGraph; note that the SkipGraph variants use only the second-hop neighborhood, without first-hop neighborhood information. This shows that biomedical interaction link prediction requires a balance between immediate neighbors and two-hop neighbors, suggesting the importance of the first hop and the necessity of integrating both first- and second-hop neighbors, as in SkipGNN's iterative fusion scheme. We also find that the 3-hop neighborhood is less important than the 2-hop neighborhood when comparing OneLayer-SkipGraph and OneLayer-3Hops, further confirming the importance of the 2-hop structure in biomedical interaction networks.

Appendix 2: Details about baseline methods
• The L3 heuristic15 counts the length-3 paths between all pairs of network nodes. The number of length-3 paths is then normalized by the degrees of the node pair.
• DeepWalk17 performs uniformly distributed random walks and applies a skip-gram model to learn node embeddings. We use a walk length of 20, concatenate the embeddings of the target nodes, and feed them into a logistic regression classifier.
• node2vec18 builds on DeepWalk and uses biased random walks based on depth/breadth-first search to consider both local and global network structure. We use a walk length of 20, as the paper suggests that longer walks improve embedding quality. The paper also reports that the Hadamard product performs better than the average and weighted L1/L2 operators for link prediction; in our experiments, however, simple concatenation is better than the Hadamard product. After concatenation, we feed the embeddings into a logistic regression classifier, as described in the paper.
• struc2vec19 leverages local network structure in addition to node2vec. We use a walk length of 80 and 20 walks per node, following the authors' recommendation. We then concatenate the latent embeddings and feed them into a logistic regression classifier.
• Spectral Clustering20 projects nodes onto the top-16 eigenvectors of the normalized Laplacian matrix and uses the transposed eigenvectors as node embeddings. The embeddings are then multiplied and passed through a sigmoid function to obtain link probabilities.
• VGAE32 applies a variational graph auto-encoder and learns node embeddings that best reconstruct the adjacency matrix. We use a two-layer GCN with hidden size 64 for layer one and 16 for layer two. The learning rate is set to 5e−4 with the Adam optimizer for 300 epochs, and the dropout rate is 0.1.
• GCN9 uses two GCN layers on the original adjacency matrix to obtain node embeddings; other settings are the same as for SkipGNN. We use a two-layer GCN with hidden size 64 for layer one and 16 for layer two. The learning rate is set to 5e−4 with the Adam optimizer for 10 epochs with batch size 256.
• GIN21 uses a multi-layer perceptron (MLP) as the aggregation function. We use a five-layer GIN with hidden size 32. The learning rate is set to 5e−4 with the Adam optimizer for 10 epochs with batch size 256.