Introduction

Identifying the interactions between drugs and targets (proteins) plays a crucial role in drug discovery1,2,3. However, traditional experimental methods that resolve the crystal structures of drug-protein complexes to identify drug-target interactions are costly and time-consuming3,4,5. To reduce costs, in silico approaches are gaining more attention: instead of taking massive numbers of candidates into an in vitro search, it is more efficient and less costly to use computational approaches to screen out most candidates beforehand. Generally, there are two major groups of in silico approaches: docking simulations and data-driven learning-based methods. Docking simulations utilize the 3D structures of drug molecules and target proteins to identify their potential binding sites, but they are also time-consuming1,6,7. In contrast, driven by the rapid development of machine learning, methods that use features derived from proteins and drugs to identify their interactions achieve both high accuracy and low cost.

Data-driven learning methods generally formulate drug-target interaction (DTI) prediction as binary classification or regression tasks8,9,10,11, where interacting pairs of proteins and drugs are extracted from existing databases like BindingDB12, CHEMBL13, PDBbind14,15, and DrugBank16. Since potency values are logarithmic in nature, a decrease in kinetic constants from micromolar to nanomolar levels corresponds to an exponential change. Thus, the output values of regression tasks are normally defined as the negative log of kinetic constants, i.e., \({K}_{i}\), \({K}_{d}\), \({{IC}}_{50}\), and \({{EC}}_{50}\). For classification tasks, a threshold on \({K}_{i}\), \({K}_{d}\), \({{IC}}_{50}\), and \({{EC}}_{50}\) is set to define binding or nonbinding2,17,18. Machine learning methods based on molecule and protein features mainly focus on learning good representations of molecules and proteins, which are then fed into a classification/regression model to perform the prediction task8,9,10,11,19.

Recently, deep learning has achieved exciting results in DTI prediction by learning from known drug-target interactions. However, these methods cannot generalize well to unseen proteins and drugs8,20. Similarity/distance-based20,21,22,23 and network-based24,25,26 methods, which utilize protein-protein similarity, drug-drug similarity, and known DTIs, achieve overestimated performance on the transductive test9, where both proteins and ligands in the test set are present in the training set. In addition, a newly discovered target protein often has few known binding drugs, making it difficult to work on the inductive test, where both proteins and ligands in the test set are absent from the training set. Currently, most existing methods focus on how to effectively learn representations of molecules and proteins, and then feed the representations into classification or regression models8,9,10,11,19. Representation learning methods like DeepPurpose8 take molecular fingerprints and SMILES27 strings as molecular features and amino acid sequences as protein features, and then apply a convolutional neural network (CNN)28, recurrent neural network (RNN)29, or Transformer30 model to embed the input features. To take spatial information into consideration, some methods apply CNNs to 3D images derived from the structures of proteins and molecules31. GNNs32 have gained much attention in recent years due to their capability of handling graphs as non-Euclidean structured data33, and DTI prediction benefits from their rapid development since both drugs and target proteins are naturally graph-structured data; several methods utilize GNNs to embed graphs of proteins and molecules for DTI prediction10. GEFA34 combines pre-trained protein embeddings and a graph-in-graph neural network with an attention mechanism to capture the interactions between drugs and protein residues. Thus, it is reasonable to use GNNs for the representation learning of proteins and drugs.

Classical pair-input approaches often take drug-protein pairs as training samples, concatenating their representations and feeding them into a dense layer to identify the interactions. However, these approaches may fail at inductive tests for proteins and drugs not seen during model training, due to potential shortcuts35 that memorize the degree ratio of binding annotations in the training set rather than learning the molecular features of interactions9. To generalize to unseen proteins and drugs, AI-Bind leverages network-derived negatives and pre-training to resolve the shortcut learning issue9. Furthermore, proteins may have diverse binding patterns with drugs, which can be captured by training a protein-specific model for each protein. Recently, a ligand-based method, MetaDTA36, was developed as a few-shot predictor for proteins with a few known binding drugs under the meta-learning framework. However, MetaDTA does not support zero-shot prediction for unseen proteins without known binding drugs in the training set.

To address these challenges, we propose ZeroBind, a protein-specific zero-shot predictor for drug-target interaction prediction under the meta-learning framework. ZeroBind adopts the meta-learning framework MAML++37 as its training strategy, where each base model makes binding drug predictions for a specific protein and mainly consists of four modules: (1) a Graph Convolutional Network (GCN) encoder learning the embeddings of the molecule graph and protein graph, (2) a Subgraph Information Bottleneck (SIB) module generating the essential IB-subgraph of the protein graph as a potential binding pocket, (3) a Multilayer Perceptron (MLP) module concatenating the protein IB-subgraph embedding and molecular embedding to perform DTI prediction, and (4) a task adaptive self-attention module to measure the importance of different tasks, since different DTI tasks contribute differently to the meta-learner. The generated task weight is used for the weighted average loss and further incorporated into the meta-learning procedure.

In summary, our contributions are as follows: (1) We formulate drug-target interaction prediction for unseen drugs and proteins as a zero-shot learning problem. The general knowledge of DTI prediction learned from existing proteins, drugs, and their interactions with a meta-learning strategy generalizes better to unseen proteins and drugs than existing methods. (2) We train one DTI task model per protein, where task adaptive self-attention is designed to calculate the contributions of multiple DTI tasks to the protein-specific meta-learner. Thus, each protein-specific model captures individual binding patterns to drugs. (3) We propose model-agnostic IB-subgraph learning to automatically discover compressed subgraphs as potential binding pockets in proteins instead of using redundant graph information derived from the whole protein. (4) We conduct extensive experiments on three independent zero-shot test sets and one few-shot test set. Results show that ZeroBind consistently outperforms existing methods. Further validation on real-world SARS-CoV-2 drug-target binding prediction demonstrates the reliability of ZeroBind predictions, and the subgraphs detected by IB-subgraph learning align well with the known binding pockets in proteins.

Results

Overview of ZeroBind

In this study, ZeroBind formulates DTI prediction as a meta-learning problem and proposes a meta-learning framework to solve the generalization problem of unseen proteins and drugs in DTI prediction. Specifically, a meta-learning task is defined as the binding drug predictions for a specific protein, where IB-subgraph learning is leveraged to automatically discover compressed subgraphs as potential binding pockets in proteins and a self-attention mechanism is designed to learn a weight for each protein task. The flowchart of ZeroBind is illustrated in Fig. 1.

Fig. 1: The framework of ZeroBind.
figure 1

a Network-based negative sampling strategy. The bipartite network consists of drugs and protein targets: square nodes represent proteins, circle nodes represent molecules, and edges exist only between nodes of different types, representing drug-target interactions. Solid lines represent existing drug-target interactions (DTIs) and dotted lines represent the generated negative interactions with a shortest path distance ≥ 7. b The positive ratio of the training set before the network-based negative sampling strategy. c The positive ratio of the training set after the network-based negative sampling strategy. d Given the support set and query set, \({\mathcal{L}}_{{support}}\) is first calculated and utilized to update the base model with parameters \(\theta\) to a task-specific model with parameters \(\theta^{\prime}\) using the support set of each task, and then the task-specific model calculates \({\mathcal{L}}_{{query}}\) using the query set of the task. After repeating N inner steps, all losses are weighted averaged by \(\{\eta_{T_i}\}_{i=1}^{N}\), and gradient descent is further performed to optimize the meta model. e The architecture of the base model in ZeroBind. For each task, the protein graph and the molecule graph are fed into a backbone graph convolutional network (GCN) with parameters \(\theta_P\) and \(\theta_M\), respectively, to obtain their embeddings. Subsequently, a SIB module is proposed to generate the IB-subgraph of a protein as potential binding pockets in a weakly supervised way. The protein subgraph embedding is concatenated with the molecular embedding, and they are fed into a Multilayer Perceptron (MLP) module to identify the interactions. f Task adaptive attention module. It takes the concatenation of the protein embedding \(G_{p,k}\) and the average of all molecule embeddings \(\{G_{m,k}^{i}\}_{i=1}^{m}\) in the query set as the task embedding. After using the self-attention layer to compute the weight of each task, denoted as \(\{\eta_{T_i}\}_{i=1}^{N}\), the overall loss is averaged and incorporated into the meta-training process for updating the model parameters. Source data are provided as a Source Data file.

In detail, ZeroBind uses network-based negative sampling9 as data augmentation to alleviate the annotation imbalance (Fig. 1a, Methods section). Figure 1b, c illustrates the ratio of positive samples before and after network-based negative sampling on the training set, indicating that network-based negative sampling alleviates the annotation imbalance to a certain extent. Then, ZeroBind samples the DTIs into the support and query sets (Fig. 1d), where the support set is used to adapt the task-specific models and the query set is used to optimize the meta-learner. After repeating N inner steps, all losses are weighted to optimize the meta-learner with gradient descent. For each protein, ZeroBind trains a DTI prediction task. Figure 1e gives the architecture of the base model in ZeroBind, where the protein graph and molecule graph are fed into a backbone GCN to learn embeddings for drugs and proteins. Furthermore, a weakly supervised Subgraph Information Bottleneck (SIB) module is designed to model and discover potential binding pockets in proteins. The SIB module not only reduces redundant information to boost performance, but also brings interpretable insights into ZeroBind by identifying the critical residues in the protein. Figure 1f introduces an adaptive self-attention module to measure the contribution of each protein task, since different DTI tasks contribute differently to the meta-learner. ZeroBind supports predicting DTIs in zero-shot and few-shot scenarios: the former uses the meta-learner to make predictions directly, without fine-tuning on samples of the proteins in meta-testing, while the latter uses the protein-specific model to make predictions after fine-tuning on samples of the protein in meta-testing.

ZeroBind outperforms existing methods in both zero-shot and few-shot settings for DTI prediction

To demonstrate the advantages of ZeroBind, we compare it with multiple baseline methods and calculate the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) on three independent test sets and one few-shot test set. To ensure the effectiveness of the performance comparison, we conduct five independent experiments with five random seeds for dataset partitioning and model training, and report the average results along with standard deviations.

The overall performances of all methods on the three independent test sets are reported in Fig. 2a (Supplementary Table 1). ZeroBind achieves AUROCs of 0.9521(±0.0034), 0.8681(±0.0052), and 0.8139(±0.0035) on the Transductive, Semi-inductive, and Inductive test sets, respectively, outperforming all baseline methods on all three test sets. Compared to the best baseline method, ZeroBind achieves relative improvements in AUROC of 2.86% on the Transductive test set, 10.29% on the Semi-inductive test set, and 3.38% on the Inductive test set; the corresponding t-test p-values are 1.02 × 10−6, 8.02 × 10−11, and 3.06 × 10−7, respectively. Moreover, the relative improvements in AUPRC over the best baseline method are 1.00% on the Transductive test set, 0.96% on the Semi-inductive test set, and 1.21% on the Inductive test set, with corresponding t-test p-values of 8.93 × 10−8, 5.15 × 10−3, and 1.56 × 10−5, respectively. In addition, the performance of all methods decreases from the Transductive to the Semi-inductive and Inductive test sets, indicating that a certain degree of shortcut learning exists on the Transductive test set.

Fig. 2: Performance comparison of ZeroBind with baseline methods in zero-shot and few-shot scenarios.
figure 2

a Zero-shot performance evaluation of ZeroBind and baseline methods on three independent test sets. b Area Under the Receiver Operating Characteristic Curve (AUROC) comparison of the protein-specific ZeroBind and the baseline methods for 775 proteins in the combined Inductive and Semi-inductive test set. The color of the points indicates the number of training proteins. c The number of proteins for which each method performs the best among the compared methods on the combined Inductive and Semi-inductive test sets. d Few-shot performance comparison of ZeroBind with baseline methods on the few-shot test sets. Supplementary Tables 1 and 2 provide the detailed statistics for zero-shot and few-shot prediction, with two-sided t-tests without adjustment. Source data are provided as a Source Data file. AUPRC refers to Area Under the Precision-recall Curve.

Figure 2a shows that AI-Bind and our proposed ZeroBind outperform the other baseline methods on the Inductive test set due to the incorporation of network-based negative sampling and unsupervised pre-trained embeddings that avoid potential shortcut learning. ZeroBind achieves stable and better performance on the Transductive and Semi-inductive test sets than the baseline methods, indicating that ZeroBind can effectively learn useful embeddings of proteins and drugs. Furthermore, ZeroBind achieves better performance on the Inductive test set than AI-Bind. The potential reason is that ZeroBind uses a meta-learning framework to obtain general knowledge of DTI prediction across multiple proteins, which generalizes well to DTI prediction for unseen proteins and drugs.

We also evaluate the protein-specific DTI prediction performance of ZeroBind against the other baseline methods on the Inductive and Semi-inductive test sets (Fig. 2b). After excluding proteins without binding or nonbinding labels, we obtain 775 proteins with binding drugs, and one task model is trained per protein; ZeroBind is evaluated on the combined Inductive and Semi-inductive test sets. ZeroBind achieves an average AUROC of 0.78, which is higher than the AUROC of 0.66 for DeepConv-DTI, 0.68 for GraphDTA, 0.69 for DeepPurpose, 0.75 for AI-Bind, and 0.73 for DrugBAN across the 775 proteins. Of the 775 proteins, ZeroBind outperforms DeepConv-DTI, GraphDTA, DeepPurpose, AI-Bind, and DrugBAN for 525, 491, 305, 180, and 346 proteins, respectively. For each method, we present the number of proteins on which it outperforms the other methods (Fig. 2c), stratified by the number of binding molecules. We observe that ZeroBind outperforms the other baseline methods for the most proteins in all three ranges, especially for proteins with only 1–10 known binding molecules, followed by AI-Bind.

We further evaluate ZeroBind against the baseline methods on the few-shot test set. ZeroBind and the other baseline methods are first trained on the training set as pre-trained models, then fine-tuned on each few-shot fine-tuning set and evaluated on the corresponding few-shot test set. The performance of ZeroBind and the other baselines on the few-shot test set is shown in Fig. 2d (Supplementary Table 2). In the few-shot test, ZeroBind outperforms the best baseline method with relative improvements of 1.55% in AUROC and 1.62% in AUPRC. The potential reason is that the meta-learning framework has strong generalization ability and can quickly adapt to additional protein tasks with a few training samples. We can see that fine-tuning improves the zero-shot prediction performance by a large margin, which is very useful for the real-world application scenario in which a protein has only a few known binding drugs.

ZeroBind detects the subgraphs that align well with known binding pockets of proteins in a weakly supervised way

ZeroBind does not use binding pocket information as ground truth labels for model training; instead, it uses the global DTI labels as weakly supervised labels. To demonstrate that the IB-subgraph module in ZeroBind is able to detect the binding pocket in a protein, we use the Jaccard similarity coefficient38 to compare the predicted binding pocket against the true binding pocket on the PDBbind dataset14,15. The true binding pocket information, containing pocket residues and 3D coordinate locations, is downloaded from PDBbind, which contains 14,000 binding residues for 535 proteins. The Jaccard similarity coefficient measures the overlap between two sets for individual DTIs and is defined as \({Jaccard\; similarity\; coefficient}(\hat{P},P)=\frac{|\hat{P}\cap P|}{|\hat{P}\cup P|}\), where \(\hat{P}\) represents the set of binding pocket residues predicted by ZeroBind for a DTI, and \(P\) represents the set of true binding residues in the pocket for this DTI.

In addition, we calculate the Jaccard similarity coefficient between the predicted binding pocket nodes and the first-order neighbors of the true binding pocket nodes as \({Jaccard\; similarity\; coefficient}(\hat{P},{P}_{{neighbor}})=\frac{|\hat{P}\cap {P}_{{neighbor}}|}{|\hat{P}\cup {P}_{{neighbor}}|}\), where \({P}_{{neighbor}}\) represents the set of first-order neighbors of the real binding pocket.
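For illustration, both coefficients can be computed directly from residue sets. The sketch below uses hypothetical residue indices and, for brevity, approximates the first-order neighborhood by sequence-adjacent residues, whereas the actual neighborhood is taken over the protein graph.

```python
def jaccard(predicted, true):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| between residue sets."""
    a, b = set(predicted), set(true)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical example: residues identified by sequence index.
pred_pocket = {10, 11, 12, 45, 46}   # residues selected by the IB-subgraph
true_pocket = {11, 12, 13, 46}       # true pocket residues from PDBbind

# First-order neighborhood, approximated here by sequence-adjacent residues.
neighborhood = true_pocket | {r - 1 for r in true_pocket} | {r + 1 for r in true_pocket}

print(jaccard(pred_pocket, true_pocket))    # coefficient against the true pocket
print(jaccard(pred_pocket, neighborhood))   # coefficient against the expanded pocket
```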

Figure 3a, b shows the distributions of Jaccard similarity coefficients of the predicted binding pockets with the true binding pockets and with the first-order neighbors of the true binding pockets, respectively. ZeroBind yields average Jaccard similarity coefficients of 0.358 and 0.605, respectively. For the neighboring pockets, the Jaccard similarity coefficients are above 0.5 for most pockets. The results show that despite some discrepancy between the predicted and true binding pockets, the predicted binding pockets mostly lie around the true binding pockets, indicating the effectiveness of the generated IB-subgraph in ZeroBind as a potential binding pocket with some biological interpretability. We further conduct an experiment with randomly sampled residues as potential protein pockets, denoted here as ZeroBindrandom. The results are shown in the ablation studies, and here we also calculate the Jaccard similarity coefficients of the randomly sampled binding residues with the true binding pockets and with the first-order neighbors of the true binding pockets. ZeroBindrandom yields average Jaccard similarity coefficients of 0.013 and 0.072 (all values are below 0.2), respectively, which are much smaller than those of ZeroBind. As shown in Fig. 3a, b, the randomly selected binding residues have little overlap with the true binding pockets or their neighbors. The results indicate that the SIB module in ZeroBind learns the potential binding pockets rather than other unrelated factors, since the DTI binding information is, to a certain extent, able to guide the IB-subgraph module to locate potential binding pockets.

Fig. 3: ZeroBind is able to detect binding pockets of proteins in a weakly supervised way.
figure 3

a The distribution of Jaccard similarity coefficients of the predicted binding pockets with the true binding pockets, and of randomly selected binding residues with the true binding pockets, for individual DTIs. b The distribution of Jaccard similarity coefficients of the predicted binding pockets with the first-order neighbors of the true binding pockets, and of randomly selected binding residues with the first-order neighbors of the true binding pockets, for individual DTIs. c The Serine/threonine-protein kinase N1 protein with its experimentally validated DTI binding pocket queried from BioLiP. The blue part represents the experimentally validated binding pocket, labeled with residue names and numbers. d The potential binding pocket predicted by the IB-subgraph module in ZeroBind. The red parts represent helix structures, the yellow parts represent loop structures, and the green parts represent sheet structures of the protein. Source data are provided as a Source Data file.

To further demonstrate the effectiveness of the IB-subgraph module, we train a variant of ZeroBind for the proteins with known binding pockets, using the true binding pocket as the node assignment matrix \(Z\), on the Inductive test set. This variant achieves an average AUROC of 0.8278, which is higher by a narrow margin than the AUROC of 0.8032 achieved by ZeroBind with the learned node assignment matrix \(Z\). The experimental results further validate the effectiveness of the IB-subgraph module in ZeroBind.

We further visualize the generated IB-subgraph as the potential binding pocket for the Serine/threonine-protein kinase N1 protein in Fig. 3c, d. Although the IB-subgraph introduces some false positive binding residues, since no known binding pocket information is used for IB-subgraph module training, the potential binding residues included in the IB-subgraph contain most of the true binding residues deposited in BioLiP39.

ZeroBind is able to predict potential drugs against SARS-CoV-2 proteins

To further demonstrate the effectiveness of ZeroBind, we use docking simulations to validate its predicted potential drugs targeting SARS-CoV-2 proteins. Docking simulation is a time-consuming but reliable tool for simulating the drug-target binding process. To respond quickly to public health emergencies, we need to validate ZeroBind on proteins that do not have many known binding molecules. Thus, we apply ZeroBind to predict the interactions between the structures of 10 SARS-CoV-2 viral proteins not in the training set and 10,000 drugs in PubChem40, where the PDB IDs for the SARS-CoV-2 protein structures are given in Supplementary Table 3. Then, we choose the top-10 pairs by binding confidence based on the scores predicted by ZeroBind and perform docking simulations with AutoDock Vina41 to validate the effectiveness of the ZeroBind predictions.

The average binding affinity of the top-10 predicted pairs by ZeroBind is −7.42 kcal/mol (Fig. 4a), indicating promising drug-target pairs and validating the reliability of ZeroBind predictions42. As demonstrated in ref. 9, the average binding affinities are close to −7.5 kcal/mol for three binding pairs of SARS-CoV-2 proteins and drugs. Furthermore, we simulate the drug-target binding complex between the ORF8 accessory protein and the drug VZBSCWDKCMOJCR-UHFFFAOYSA-N in Fig. 4b, which yields a relatively good binding affinity score of −8.4 kcal/mol. We also simulate the drug-target binding complex between the ORF3a protein and the drug OLTVRSUIOUTBRQ-UHFFFAOYSA-N in Fig. 4c, which yields a binding affinity score of −5.8 kcal/mol. As shown in Fig. 4b, c, the predicted drugs bind well to the pockets of the SARS-CoV-2 proteins. In future work, we expect ZeroBind to help identify potential drugs against target proteins, especially those with no verified drugs.

Fig. 4: ZeroBind predicts binding drugs for SARS-CoV-2 proteins.
figure 4

a Top 10 drug-target binding pairs of SARS-CoV-2 proteins. b The drug-target binding complex between the drug InChI Key VZBSCWDKCMOJCR-UHFFFAOYSA-N and the SARS-CoV-2 ORF8 protein. The green part represents the main body of the protein, the blue part represents the binding drug, and the red parts represent potential binding sites labeled with residue names and numbers. c The drug-target binding complex between the drug InChI Key OLTVRSUIOUTBRQ-UHFFFAOYSA-N and the SARS-CoV-2 ORF3a protein. The purple and blue parts represent the two main chains of the ORF3a protein, the green part represents the binding drug, and the red parts represent potential binding sites labeled with residue names and numbers. Source data are provided as a Source Data file.

Ablation studies on ZeroBind

To demonstrate the added value of the individual modules in ZeroBind, we conducted an ablation study to evaluate their effectiveness. The variants of ZeroBind with different modules are as follows:

(1) ZeroBindMAML-: the base model of ZeroBind trained directly without the meta-learning strategy.

(2) ZeroBindSIB-: ZeroBind using all node embeddings of the protein to identify the interaction, instead of applying the SIB module to find the IB-subgraph on the protein graph.

(3) ZeroBindattention-: ZeroBind without the task adaptive attention module that balances the importance of different tasks.

(4) ZeroBindsampling-: ZeroBind trained without the network-based negative sampling strategy for data augmentation.

(5) ZeroBindGIN: ZeroBind using GIN instead of GCN as the backbone GNN.

(6) ZeroBindrandom: ZeroBind with the node assignment matrix \(Z\) in Eq. (4) set randomly.

The results are shown in Table 1. For ZeroBindMAML-, we observe a significant decrease on the Semi-inductive and Inductive test sets, demonstrating that the meta-learning training strategy provides powerful generalization ability to unseen proteins and drugs. ZeroBindSIB- yields lower performance without the SIB module to find the IB-subgraph, indicating that the embedding of the whole protein is less effective than the subgraph embedding. The potential reason is that redundant graph information exists in the whole protein graph, further indicating that a molecule binds to a binding pocket rather than to the whole protein. ZeroBindattention- performs slightly worse than ZeroBind, indicating the necessity of the task adaptive self-attention module to balance different DTI tasks for different proteins. We also see a performance decline for ZeroBindsampling-, due to the lack of the generated nonbinding annotations from network-based negative sampling that alleviate potential shortcut learning. For ZeroBindGIN, there is no significant difference in performance compared with ZeroBind, which indicates that the choice of backbone GNN has a smaller impact on performance than the other modules in ZeroBind. With randomly selected binding pockets, ZeroBindrandom performs worse than ZeroBind, further validating the effectiveness of the SIB module for detecting potential binding pockets. Moreover, ZeroBindrandom with randomly selected pockets performs worse than ZeroBindSIB- without the IB-subgraph module, indicating that initially inaccurate binding pockets misguide the model away from the true binding pockets, resulting in wrong DTI predictions.

Table 1 Performance evaluation of ablation studies on ZeroBind

Discussion

The interaction of proteins with drug molecules is an important research topic, especially for proteins and drug molecules not seen in the training set. Considering the information of proteins and molecules simultaneously is an under-explored idea for solving this problem. In this study, we formulate drug-target interaction (DTI) prediction as a meta-learning task and propose a meta-learning framework called ZeroBind to solve the generalization problem of unseen proteins and drugs in DTI prediction. Specifically, a meta-learning task is defined as the binding drug predictions for a specific protein. The results demonstrate that ZeroBind outperforms existing methods in zero-shot and few-shot scenarios. In addition, the subgraphs learned by the SIB module align well with binding pockets in proteins, whereas randomly selected residues as binding pockets barely overlap with true binding pockets. Furthermore, we validate ZeroBind's performance in a real-world scenario, where it predicts drug-target bindings for SARS-CoV-2.

Due to the rapid development of research on GNNs, proteins and molecules can be encoded in a more natural form than the sequences used in previous studies. In addition, the meta-learning strategy provides a more precise way of delineating the protein-specific DTI task space, which is also consistent with the experimental workflow for proteins in real drug experiments. However, ZeroBind also has some limitations, such as the difficulty of meta-learning training, whose procedure is complex and prone to instability.

The IB-subgraph method offers interpretability for the model's representation learning. In ZeroBind, the potential binding pockets are automatically detected with the weakly supervised IB-subgraph method, which does not use binding pocket annotations as labels. To the best of our knowledge, no published method uses IB-subgraph or other subgraph-based methods to identify potential binding pockets in proteins; existing subgraph-based methods focus on drug molecules. In the current study, we do not incorporate real binding pocket information into model training, since binding pocket data is far scarcer than DTI data, and the protein structures predicted by AlphaFold2 still have no known binding pocket annotations. Of the proteins in the benchmark dataset, only 535 have partially known binding pockets. Thus, it is difficult to train the SIB module in ZeroBind for all proteins to detect binding pockets using local binding pocket labels. Instead, we train the SIB module of ZeroBind using the global DTI binding labels, which potentially guides the SIB module to locate the potential binding pockets in proteins. As shown in our experiments, ZeroBind with true binding pockets yields better performance. A future update of ZeroBind is expected to take true binding pocket information into the training process to fit the DTI problem more accurately, if we can collect more true binding pocket data.

Furthermore, the base model in ZeroBind is a GCN, while more advanced neural architectures for protein-molecule binding have emerged, such as the SE(3)-equivariant GNN used in EquiBind43. In future work, we expect to investigate more advanced GNNs in ZeroBind.

Methods

Dataset generation and augmentation

BindingDB12 is a public database of DTIs that deposits binding affinity data between drugs (drug-like molecules) and target proteins. It currently contains over 2,600,000 experimentally determined binding affinities of protein-drug complexes, covering over 8000 protein targets and over 1,100,000 small molecules.

To create the training and test datasets for ZeroBind, we apply several filtering and preprocessing steps to build a high-quality benchmark dataset. First, data points are filtered with "single protein" for the "target type" attribute, and the kinetic constants \({K}_{i}\), \({K}_{d}\), \({{IC}}_{50}\), and \({{EC}}_{50}\) for the "standard type" attribute. In addition, all target proteins should be human or human-like proteins, so they are filtered with "Homo sapiens" for the "Target Source Organism" attribute. After excluding proteins that do not have a SwissProt name and molecules that cannot be handled by RDKit44, 1,500,000 protein-drug pairs were collected. We use the thresholds in AI-Bind9, which treat kinetic constants \({K}_{i}\), \({K}_{d}\), \({{IC}}_{50}\), and \({{EC}}_{50}\) below 1000 nM as positive samples and above \({10}^{6}\) nM as negative samples.
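The filtering and labeling rule can be summarized in a short sketch; the column names below are hypothetical placeholders rather than BindingDB's exact export schema.

```python
import pandas as pd

# Hypothetical column names; BindingDB's actual export schema differs.
df = pd.read_csv("bindingdb_subset.tsv", sep="\t")

df = df[(df["target_type"] == "single protein")
        & (df["organism"] == "Homo sapiens")
        & (df["standard_type"].isin(["Ki", "Kd", "IC50", "EC50"]))]

def label(affinity_nm: float):
    """AI-Bind-style thresholds: <1000 nM binding, >1e6 nM nonbinding."""
    if affinity_nm < 1_000:
        return 1      # positive sample
    if affinity_nm > 1_000_000:
        return 0      # negative sample
    return None       # ambiguous affinities are discarded

df["label"] = df["affinity_nm"].map(label)
df = df.dropna(subset=["label"])
```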

To demonstrate the effectiveness of ZeroBind, we construct three independent test sets for model evaluation through protein sequence similarity clustering by cd-hit and molecule scaffold splitting: (1) the Transductive test set, where molecules with the same scaffolds and proteins in the same clusters are in the training set, but their interactions are not; (2) the Semi-inductive test set, where proteins in the same clusters are in the training set, but molecules with the same scaffolds are not; (3) the Inductive test set, where neither molecules with the same scaffolds nor proteins in the same clusters are in the training set.

We first use cd-hit, a widely used program for clustering protein sequences, to cluster the 1603 proteins into 1101 clusters with a similarity threshold of 0.4. By comparing molecule scaffolds, we split the molecules into a training molecule set and a test molecule set with a training ratio of 0.95, ensuring that the two sets share no scaffolds. As the meta-learning-based framework requires sufficient data, we first assign the clusters that contain any protein with fewer than 20 associated molecules to the test clusters, and the remaining clusters to the training clusters. Then, we construct the training set using 95% of the proteins in the training clusters and the SMILES in the training molecule set. The remaining 5% of the proteins in the training clusters, paired with SMILES in the training molecule set, are used as the Transductive test set. Finally, the proteins in the test clusters paired with the molecules in the test molecule set are used as the Inductive test set, and the remaining data are used as the Semi-inductive test set. During model training, cross-validation is applied for model optimization.

In addition, we construct a few-shot test set by combining the Semi-inductive test set and the Inductive test set to evaluate the few-shot learning power of ZeroBind. We randomly select 5 positive and 5 negative DTIs of each protein as the few-shot fine-tuning set, and the remaining DTIs of each protein as the few-shot test set. Proteins that do not have enough positive or negative DTIs are excluded from the few-shot fine-tuning set and the few-shot test set. The details of the training set and the four test sets are given in Table 2.

Table 2 The details of the training set and four test sets

We observe that data imbalance exists in the training set, with most proteins having a high positive ratio. AI-Bind9 demonstrates that the annotation imbalance causes the network to learn topological shortcuts instead of the binding patterns of drug-target interactions. Similar to AI-Bind9, we use network-based negative sampling on the training dataset as data augmentation to alleviate the annotation imbalance. Specifically, we construct a bipartite drug-target network, illustrated in Fig. 1a. Dijkstra's algorithm is used to find the shortest path distances between all pairs of nodes in the network, and we consider node pairs with a shortest path distance ≥ 7 as nonbinding pairs. After this processing, the annotation imbalance is slightly relieved, as shown in Fig. 1b, c.
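A minimal sketch of this sampling step on a toy bipartite network is shown below; since the graph is unweighted, breadth-first shortest-path lengths are equivalent to Dijkstra's algorithm, and the treatment of unreachable pairs is an assumption left out here.

```python
import networkx as nx

# Toy bipartite drug-target chain (node names are hypothetical).
G = nx.Graph([("P1", "D1"), ("D1", "P2"), ("P2", "D2"), ("D2", "P3"),
              ("P3", "D3"), ("D3", "P4"), ("P4", "D4")])
proteins = [n for n in G if n.startswith("P")]
drugs = [n for n in G if n.startswith("D")]

negatives = []
for p in proteins:
    # BFS shortest-path lengths from protein p to every reachable node.
    dist = nx.single_source_shortest_path_length(G, p)
    for d in drugs:
        # Distant pairs (shortest path >= 7) become nonbinding samples.
        if d in dist and dist[d] >= 7:
            negatives.append((p, d))

print(negatives)  # [('P1', 'D4')]
```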

The 3D structures of 996 proteins in this study are downloaded in PDB format from the RCSB Protein Data Bank (https://www.rcsb.org)45, and the 3D structures of the remaining 635 proteins are predicted by AlphaFold246. In addition, we download the true binding pocket information, containing pocket residues and their 3D coordinates, from the PDBbind database, which contains binding pockets for 14,336 DTIs.

Graph construction for proteins and drugs

In this section, we first introduce the construction of drug graphs and protein graphs, and then give formal definitions of the DTI task and its derivatives, the zero-shot and few-shot DTI prediction tasks.

After using RDKit44 to construct a molecule from a SMILES27 string, we apply the encoding format used in the Open Graph Benchmark (OGB) datasets47 to obtain the molecular representation. Specifically, the atom chemical features consist of the atomic number, chirality, formal charge, number of attached hydrogen atoms, number of radical electrons, hybridization type, and aromaticity. The geometric features consist of the atom degree and a binary value indicating whether the atom is in a ring. The edge features consist of the bond type, bond stereochemistry, and a binary value indicating whether the bond is conjugated.
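This encoding matches OGB's standard molecule featurizer, so a graph with exactly these atom and bond features can be obtained with its smiles2graph helper:

```python
from ogb.utils import smiles2graph  # OGB's standard molecule featurizer (wraps RDKit)

# Aspirin as an example input.
graph = smiles2graph("CC(=O)Oc1ccccc1C(=O)O")

print(graph["node_feat"].shape)   # (num_atoms, 9): atomic number, chirality, degree, ...
print(graph["edge_feat"].shape)   # (num_edges, 3): bond type, stereo, conjugation
print(graph["edge_index"].shape)  # (2, num_edges): bonds as directed node-index pairs
```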

Definition 1: We define a drug as a graph denoted as \({G}_{m}=\left\{{V}_{m},{E}_{m}\right\}\), where \({V}_{m}\) is the node set representing the atoms of a molecule and \({E}_{m}\) is the edge set indicating which pair of nodes are connected.

Definition 2: We define a protein graph as \({G}_{p}=\left\{{V}_{p},{E}_{p}\right\}\) from the protein 3D structure, where \({V}_{p}\) is the node set of the residues, \({E}_{p}\) is the edge set indicating which pair of residue nodes are connected. In this study, if the Euclidean distance between two residues in 3D structure space is less than 8 angstroms (Å)48, the two residue nodes are connected with one edge in the protein graph.

For proteins without known 3D structures, we use AlphaFold246 to predict their structures as a complement. The node features of the protein graph are initialized with pre-trained ESM-249 embeddings of the residues, from a general-purpose protein language model, to incorporate prior knowledge. The edge features of the protein graph are the Euclidean distances between pairs of nodes. The whole process of constructing a protein graph is shown in Fig. 5.
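A minimal sketch of the graph construction is shown below; which atom represents each residue is not specified above, so the C-alpha atom is assumed here.

```python
import numpy as np
import torch

def build_protein_graph(coords: np.ndarray, esm_embeddings: torch.Tensor,
                        cutoff: float = 8.0):
    """Sketch of the residue-contact graph construction.

    coords: (N, 3) residue coordinates (assumed here to be C-alpha atoms);
    esm_embeddings: (N, D) per-residue ESM-2 embeddings used as node features.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)  # drop self-loops
    src, dst = np.nonzero(adj)
    edge_index = torch.as_tensor(np.stack([src, dst]), dtype=torch.long)
    # Edge features are the Euclidean distances between connected residues.
    edge_attr = torch.as_tensor(dist[src, dst], dtype=torch.float32).unsqueeze(-1)
    return esm_embeddings, edge_index, edge_attr
```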

Fig. 5: The data processing of ZeroBind.
figure 5

a The process of constructing a protein graph from the protein 3D structure. Instead of using peptide bonds as edges, we connect two residues whose distance is below the 8 angstrom (Å) cutoff; node features are extracted using the pre-trained ESM-2 model and edge features are the inter-residue distances. b An example of protein graph construction from a protein 3D structure. c The few-shot DTI prediction process and the zero-shot DTI prediction process.

In this work, we train a DTI task per protein.

Definition 3: We define a DTI task \({T}_{p}\) from a task set \({\mathcal{T}}\) for meta-learning as \({T}_{p}=\left\{p,{M}_{p}\right\}\), where \(p\) is a protein in the protein set \(P\) and \({M}_{p}\) is the set of its corresponding binding and non-binding molecules.

Definition 4: We define the zero-shot DTI prediction task, inherited from the DTI task, in which the meta-model is used directly for prediction without fine-tuning on samples of the protein in the meta-test set.

Definition 5: We define the few-shot DTI prediction task, inherited from the DTI task, in which the protein-specific model is used for prediction after fine-tuning on samples of the protein in meta-testing.

The few-shot and zero-shot DTI prediction schedules are shown in Fig. 5c.

Meta-learning setup in ZeroBind

The meta-learning framework was originally proposed to learn general knowledge across multiple related tasks within a distribution and to use this general experience to quickly adapt to additional tasks and improve prediction performance50. There are two main types of meta-learning frameworks: (1) gradient-based methods: the classic gradient-based method MAML51 uses a meta-learner to learn a good initialization by summing up multiple task losses and updating the parameters across tasks, so MAML can achieve both high accuracy and fast generalization; (2) metric-based methods: the prototypical network52 is a classic metric-based meta-learning algorithm that directly trains a vector representation (i.e., a prototype) for each category; once a good feature extractor is trained, the category of a new sample is determined by its closest prototype in the vector space. In our study, we apply the gradient-based method MAML as the training strategy. In addition, we follow MAML++37 to make some improvements that stabilize the training process of MAML.

In ZeroBind, several techniques are applied to improve performance, including multi-step loss optimization (MSL), learning per-layer per-step learning rates and gradient directions (LSLR), the subgraph information bottleneck (SIB), and task adaptive self-attention. We build the zero-shot learning framework on MAML. First, we randomly initialize the network parameters \(\theta\). Given training tasks (proteins) sampled from the task distribution \({\mathcal{T}}\), MAML aims to learn good initial parameters that can quickly adapt to additional tasks. For each protein task, we sample a 2-way, k-shot, m-query training task. Of these three hyperparameters, 2-way is used because the classification task has only two labels, binding and nonbinding. In the original MAML framework, the model first trains on k data samples of each label for several inner steps to learn a task-specific model and is then tested on m data samples. After obtaining the loss for each task, MAML applies gradient descent with the sum or average of the losses across all tasks. Here, we refer to the training and test samples of a task as the support set and query set, respectively. However, MAML suffers from instability during training and is sensitive to the neural network architecture and a large number of hyperparameters. Therefore, we follow the MAML++37 training procedure, which leverages multi-step loss optimization (MSL) and learning per-layer per-step learning rates and gradient directions (LSLR).

Specifically, multi-step loss optimization calculates the loss on the query set after each inner step on the support set, and optimizes the base network after all inner-step optimizations are completed. More formally, we optimize the model parameters as follows:

$$\theta=\theta -\alpha {\nabla }_{\theta }\mathop{\sum }\limits_{b=1}^{B}\mathop{\sum }\limits_{i=0}^{N}{\upsilon }_{i}{\mathcal{L}}_{{T}_{b}}\left({f}_{{\theta }_{i}^{b}}\right)$$
(1)

where \(\alpha\) is the learning rate, \({\mathcal{L}}_{{T}_{b}}\left({f}_{{\theta }_{i}^{b}}\right)\) denotes the loss on the query set after the \(i\)-th inner-step optimization on the support set, and \({\upsilon }_{i}\) denotes the importance of step \(i\), determined by the number of inner steps.

The LSLR module makes the learning rate a learnable parameter for each layer at each inner step. Without adding much computation, different learning rates are automatically learned for each layer at each step, which may help alleviate overfitting.

To summarize the meta-learning procedure in ZeroBind (Boxes 1 and 2), the model is first updated to task-specific models using the support set of each task, and the loss on the query set of each task is then calculated. After repeating N inner steps, the losses of all query sets are weighted averaged by \(\{\eta_{T_i}\}_{i=1}^{N}\) and used to optimize the meta-model by gradient descent. After training with a sufficient number of samples, the learned model has a good capability of predicting DTIs for unseen molecules and proteins, or of quickly adapting to additional protein tasks with a few training samples. The meta-learning procedure of ZeroBind is shown in Fig. 1d.
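The inner/outer structure of this procedure can be sketched in PyTorch as follows. This is a schematic sketch only: the task data layout is hypothetical, a single shared inner learning rate stands in for the per-layer per-step rates learned by LSLR, and binary cross-entropy is used as the task loss.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # stateless forward with explicit params

def meta_train_step(meta_model, tasks, task_weights, step_weights, inner_lr, meta_opt):
    """One outer update of the MAML++-style loop (schematic sketch).

    Each task carries hypothetical support/query tensors; task_weights are the
    attention weights eta; step_weights are the per-step MSL weights upsilon.
    """
    outer_loss = 0.0
    for task, eta in zip(tasks, task_weights):
        params = dict(meta_model.named_parameters())  # start from meta-params theta
        task_loss = 0.0
        for upsilon in step_weights:
            # Inner step: adapt the parameters on the support set.
            s_pred = functional_call(meta_model, params, (task["support_x"],))
            s_loss = F.binary_cross_entropy_with_logits(s_pred, task["support_y"])
            grads = torch.autograd.grad(s_loss, list(params.values()), create_graph=True)
            params = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
            # Multi-step loss: accumulate the query loss after every inner step.
            q_pred = functional_call(meta_model, params, (task["query_x"],))
            task_loss = task_loss + upsilon * F.binary_cross_entropy_with_logits(
                q_pred, task["query_y"])
        outer_loss = outer_loss + eta * task_loss     # task adaptive weighting
    meta_opt.zero_grad()
    outer_loss.backward()  # second-order meta-gradient through the inner loops
    meta_opt.step()
```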

Architecture of base model in ZeroBind

Figure 1e shows the architecture of the base model in ZeroBind, which mainly consists of three modules: a GNN module to obtain the embeddings of molecules and proteins, a SIB module to find the most predictive subgraph as the binding pocket in the protein, and a dense module that concatenates the protein subgraph representation and molecular representation to score the interactions.

Graph neural network-based representation learning for proteins and drugs. After constructing drug and protein graphs for a given protein-drug pair, we first embed the molecule atom features into a vector space with randomly initialized parameters. We denote the node embedding of the molecule as \(X_m \in {\mathbb{R}}^{N_m \times D_m}\), which also serves as the initial node embedding of the GNN, denoted \(G_m^{(0)}\), where \(N_m\) is the number of nodes and \(D_m\) is the dimension of the node embeddings. We denote the graph convolutional network (GCN)-based molecule backbone as \({GCN}_m\), which is used to learn the graph embedding from the molecule graph. The \(l\)-th layer output of \({GCN}_m\) can be formulated as:

$${G}_{m}^{\left(l\right)}={RELU}\left({\hat{A}}_{m}{G}_{m}^{\left(l-1\right)}{W}_{m}^{\left(l-1\right)}+{b}^{\left(l-1\right)}\right)$$
(2)

where \({\hat{A}}_{m}\) represents the adjacency matrix of the graph, \({G}_{m}^{\left(l-1\right)}\) denotes the node embedding of the \(\left(l-1\right)\)-th layer, and \({W}_{m}^{\left(l-1\right)}\) is the learnable weight matrix.

For protein graphs, we initialize the node embedding \({G}_{p}^{\left(0\right)}\) of a protein graph with ESM-2, a pre-trained general-purpose protein embedding. Similarly to the molecule graph, a GCN-based backbone \({GCN}_p\) is used to learn the graph embedding from the protein graphs. The \(l\)-th layer output of \({GCN}_p\) can be formulated as:

$${G}_{p}^{\left(l\right)}={RELU}\left({\hat{A}}_{p}{G}_{p}^{\left(l-1\right)}{W}_{p}^{\left(l-1\right)}+{b}^{\left(l-1\right)}\right)$$
(3)
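Both backbones follow the same layer update, which can be written as a minimal dense-matrix sketch; the normalization of \(\hat{A}\) and the sparse message passing used in practice are implementation details omitted here.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal dense-matrix sketch of the layer update in Eqs. (2) and (3)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # holds W^(l-1) and b^(l-1)

    def forward(self, a_hat: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # a_hat: (N, N) adjacency matrix; g: (N, in_dim) node embeddings
        return torch.relu(a_hat @ self.linear(g))

# Usage: stacking such layers reproduces GCN_m / GCN_p with the stated dimensions.
first_protein_layer = GCNLayer(1280, 512)
```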

Subgraph learning in ZeroBind for potential binding pockets in proteins. Considering that a molecule binds to a binding pocket in the protein rather than to the whole protein, after the protein backbone GCN we apply a model-agnostic SIB module53 to identify the most interpretable subgraph carrying the most crucial information for the DTI task, where the learned subgraph corresponds to the binding pocket on the protein graph. The SIB module recognizes a compressed subgraph, named the IB-subgraph, under the information bottleneck (IB) principle, and can also eliminate noisy and redundant graph information. In our DTI task, the DTI score is largely determined by the protein pocket features, which form the subgraph information bottleneck of the protein graph, i.e., the potential drug binding sites in the protein.

The SIB module contains a subgraph generator for node assignment and a dense layer that guarantees the generated subgraph matches the IB-subgraph. The subgraph generator is a Multilayer Perceptron (MLP) that takes the protein node embeddings and the molecule embedding as input and outputs the node assignment matrix \(Z\), formulated as follows:

$$Z={Softmax}\left({MLP}\left({concatenate}\left({G}_{p}^{\left(l\right)},{G}_{m}^{\left(l\right)}\right);{\varphi }_{1}\right)\right)$$
(4)
$${G}_{{sub}}={Z}^{T}{G}_{p}^{\left(l\right)}\left[0\right]$$
(5)

where the output of the MLP is an \(n\times 2\) matrix for \(n\) nodes. After applying \({Softmax}\) to the output of the MLP layer, each row of \(Z\) gives the probabilities of the corresponding node belonging to \({G}_{{sub}}\) and \({\bar{G}}_{{sub}}\), denoted as \(\left[p\left({h}_{i}\in {G}_{{sub}}|{h}_{i}\right),p\left({h}_{i}\in {\bar{G}}_{{sub}}|{h}_{i}\right)\right]\), where \({G}_{{sub}}\) represents the generated subgraph of the protein graph \({G}_{P}\) and \({\bar{G}}_{{sub}}\) represents the leftover subgraph. After training with a sufficient number of training samples, the learned node assignment matrix \(Z\) is expected to approach 0 or 1. Thus, the generated IB-subgraph embedding is obtained by taking the first row of the product between \({Z}^{T}\) and the node embedding.

To stabilize the training process of subgraph generation and separate \(p\left({h}_{i}\in {G}_{{sub}}|{h}_{i}\right)\) from \(p\left({h}_{i}\in {\bar{G}}_{{sub}}|{h}_{i}\right)\), we apply a connectivity loss as follows:

$${\mathcal{L}}_{{MSE}}={MSE}\left({diag}\left({Norm}\left({Z}^{T}{AZ}\right)\right)-{diag}\left({I}_{2}\right)\right)$$
(6)

where \({Norm}\left(\cdot \right)\) is row normalization, \(A\) is the adjacency matrix of the protein graph \({G}_{P}\), and \({I}_{2}\) is the \(2\times 2\) identity matrix. Minimizing \({\mathcal{L}}_{{MSE}}\) encourages \(\left[p\left({h}_{i}\in {G}_{{sub}}|{h}_{i}\right),p\left({h}_{i}\in {\bar{G}}_{{sub}}|{h}_{i}\right)\right]\to \left[{{{{\mathrm{0,1}}}}}\right]/\left[{{{{\mathrm{1,0}}}}}\right]\), which results in a distinctive \(Z\) that stabilizes the training process and a compact topology in the subgraph.
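Eqs. (4)-(6) can be summarized in a short sketch. How the molecule embedding is broadcast to the protein nodes before concatenation is an assumption here, and the hidden width is likewise illustrative.

```python
import torch
import torch.nn as nn

class SubgraphGenerator(nn.Module):
    """Sketch of Eqs. (4)-(6): soft node assignment to the IB-subgraph."""

    def __init__(self, node_dim: int, mol_dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(node_dim + mol_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, g_p: torch.Tensor, g_m: torch.Tensor, adj: torch.Tensor):
        # g_p: (n, node_dim) protein node embeddings; g_m: (mol_dim,) molecule
        # embedding broadcast to every node; adj: (n, n) adjacency matrix.
        z = torch.softmax(
            self.mlp(torch.cat([g_p, g_m.expand(g_p.size(0), -1)], dim=-1)),
            dim=-1)                                # Eq. (4): node assignment Z
        g_sub = (z.t() @ g_p)[0]                   # Eq. (5): IB-subgraph embedding
        # Eq. (6): connectivity loss pushing Z toward a hard 0/1 assignment.
        zaz = z.t() @ adj @ z
        zaz = zaz / zaz.sum(dim=1, keepdim=True)   # row normalization
        conn_loss = ((torch.diagonal(zaz) - 1.0) ** 2).mean()
        return g_sub, conn_loss
```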

Formally, the subgraph information bottleneck seeks the IB-subgraph by optimizing the following objective function:

$$\mathop{\max }\limits_{{G}_{{sub}}}I\left(Y,{G}_{{sub}}\right)-\beta I\left(G,{G}_{{sub}}\right)$$
(7)

where \(\beta\) is a weight, \(Y\) represents the label associated with \(G\), and \(G\) represents the original graph. \(I\left(Y,{G}_{{sub}}\right)\) and \(I\left(G,{G}_{{sub}}\right)\) represent the mutual information between \(\left(Y,{G}_{{sub}}\right)\) and between \(\left(G,{G}_{{sub}}\right)\), respectively, and \({G}_{{sub}}\) represents the IB-subgraph.

To optimize the objective function, the lower bound of the first term is formulated as the negative of the classification loss between the ground truth label and the label predicted from the subgraph embedding. By minimizing the classification loss, the lower bound of \(I\left(Y,{G}_{{sub}}\right)\) is maximized. For the second term, the Donsker-Varadhan representation54 of the KL-divergence is applied to approximate the upper bound of \(I\left(G,{G}_{{sub}}\right)\). The approximation of \(I\left(G,{G}_{{sub}}\right)\) can be formulated as:

$$\mathop{\max }\limits_{{\varphi }_{2}}{\mathcal{L}}_{{MI}-{pro}}\left({\varphi }_{2},{G}_{{sub}}\right)=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{Dense}\left({G}_{i},{G}_{{{sub}}_{i}};{\varphi }_{2}\right)-\log \frac{1}{N}\mathop{\sum }\limits_{i=1,j\ne i}^{N}{e}^{{Dense}\left({G}_{i},{G}_{{{sub}}_{j}};{\varphi }_{2}\right)}$$
(8)

where \({Dense}\left(\cdot \right)\) is a dense network of several MLP layers that takes the concatenation of the embeddings of \(G\) and \({G}_{{sub}}\) as input.

To minimize \(I\left(G,{G}_{{sub}}\right)\), several inner steps of maximizing \({\mathcal{L}}_{{MI}-{pro}}\) over \({\varphi }_{2}\) are applied to obtain the upper bound of \(I\left(G,{G}_{{sub}}\right)\), which is then minimized.
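A minimal sketch of the Eq. (8) estimator is given below; the random permutation used to form the mismatched pairs \(\left({G}_{i},{G}_{{{sub}}_{j}}\right)\) is a simplification of the \(j\ne i\) pairing.

```python
import math
import torch

def mi_estimate(dense: torch.nn.Module, g: torch.Tensor,
                g_sub: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan-style estimate of I(G, G_sub) following Eq. (8).

    dense: a small MLP scoring concatenated (graph, subgraph) embeddings;
    g, g_sub: (N, D) batches of paired graph and subgraph embeddings.
    """
    joint = dense(torch.cat([g, g_sub], dim=-1)).squeeze(-1).mean()
    # Mismatched pairs (G_i, G_sub_j): a random permutation approximates j != i.
    perm = torch.randperm(g.size(0))
    scores = dense(torch.cat([g, g_sub[perm]], dim=-1)).squeeze(-1)
    marginal = torch.logsumexp(scores, dim=0) - math.log(g.size(0))
    # Maximized w.r.t. dense's parameters during the inner steps.
    return joint - marginal
```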

After obtaining the embedding of the generated protein IB-subgraph, we concatenate the molecular embedding and the protein IB-subgraph embedding as input to the MLP classification layer. The DTI classification loss is formulated as:

$$\hat{y}={MLP}\left({concatenate}\left({G}_{m},{G}_{{sub}}\right);{\theta }_{{cls}}\right)$$
(9)
$${\mathcal{L}}_{{cls}}=-y\log \hat{y}-\left(1-y\right)\log \left(1-\hat{y}\right)$$
(10)

The overall loss can be formulated as:

$$\mathop{\min }\limits_{{\theta }_{M},{\theta }_{P},{\varphi }_{1},{\varphi }_{2},{\theta }_{{cls}}}{\mathcal{L}}_{{base}}={\mathcal{L}}_{{cls}}+{\lambda }_{1}{\mathcal{L}}_{{MSE}}+{\lambda }_{2}{\mathcal{L}}_{{MI}-{pro}}$$
(11)
$$s.t.\;{\varphi }_{2}^{*}=\arg \mathop{\max }\limits_{{\varphi }_{2}}{\mathcal{L}}_{{MI}-{pro}}$$

where \({\lambda }_{1}\) and \({\lambda }_{2}\) are weights balancing the importance of the different losses, \({\theta }_{M}\) and \({\theta }_{P}\) are the parameters of \({GCN}_{m}\) and \({GCN}_{p}\), \({\varphi }_{1}\) is the parameters of the IB-subgraph generator, \({\varphi }_{2}\) is the parameters of the dense network, and \({\theta }_{{cls}}\) is the parameters of the MLP layer.

To summarize the training process: \({\mathcal{L}}_{{cls}}\) measures the discrepancy between the estimated distribution and the original data distribution via the cross-entropy between the predicted values and the ground truth (the DTI labels rather than binding pocket labels). \({\mathcal{L}}_{{MSE}}\), as given by Eq. (6), encourages the node assignment matrix \(Z\) to approach either 1 or 0. \({\mathcal{L}}_{{MI}-{pro}}\), as given by Eq. (8), approximates the \(I\left(G,{G}_{{sub}}\right)\) term introduced in Eq. (7). To maximize Eq. (7) and achieve the subgraph information bottleneck, the lower bound of \(I\left(Y,{G}_{{sub}}\right)\) needs to increase as much as possible while the upper bound of \(I\left(G,{G}_{{sub}}\right)\) decreases. Therefore, we begin by performing gradient descent on the negative of \({\mathcal{L}}_{{MI}-{pro}}\) within the dense network, which maximizes \({\mathcal{L}}_{{MI}-{pro}}\), yields the upper bound of \(I\left(G,{G}_{{sub}}\right)\), and satisfies the constraint of Eq. (11). After that, we use gradient descent to optimize Eq. (11): optimizing \({\mathcal{L}}_{{cls}}\) increases the lower bound of \(I\left(Y,{G}_{{sub}}\right)\), and optimizing \({\mathcal{L}}_{{MI}-{pro}}\) decreases the upper bound of \(I\left(G,{G}_{{sub}}\right)\). This approach maximizes Eq. (7) and seeks the subgraph information bottleneck for detecting potential binding pockets in proteins.

In our task, we aim to find the protein binding pocket through a weakly supervised training process, since the amount of protein binding pocket data is much smaller than the amount of binding data. We therefore design the IB-subgraph learning to find protein binding pockets optimized on the DTI binding labels instead of binding pocket labels. Considering that protein pockets are determined by both proteins and molecules, we take the concatenation of the molecule embedding and the protein embedding as input when generating the node assignment matrix \(Z\) in Eq. (4).

Task adaptive self-attention

In a traditional meta-learning schedule, \({batch\_size}\) tasks are sampled and weighted equally when applying mini-batch gradient descent. However, equal weights across tasks cannot reflect how differently the tasks contribute to the meta-model optimization. We therefore design a task adaptive self-attention module (Fig. 1f) to learn the importance of different tasks automatically. Since each task is the DTI prediction for a specific protein, we take the concatenation of the protein subgraph embedding and the average of the embeddings of all molecules in the query set as the task embedding. The self-attention module is:

$${h}_{{T}_{b}}={concatenate}\left({G}_{p},{Mean}\left({\left\{{G}_{m}^{i}\right\}}_{i=1}^{m}\right)\right)$$
(12)
$$\begin{array}{c}Q={W}_{Q}{\left\{{h}_{{T}_{b}}\right\}}_{b=1}^{B}\\ K={W}_{K}{\left\{{h}_{{T}_{b}}\right\}}_{b=1}^{B}\\ V={W}_{V}{\left\{{h}_{{T}_{b}}\right\}}_{b=1}^{B}\end{array}$$
(13)
$${\left\{{\eta }_{{T}_{b}}\right\}}_{b=1}^{B}={Attention}\left(Q,K,V\right)={softmax}\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V$$
(14)
$${\mathcal{L}}_{{all}}=\mathop{\sum }\limits_{b=1}^{B}{\eta }_{{T}_{b}}{\mathcal{L}}_{{query},b}$$
(16)

where \({T}_{b}\) is one of the DTI tasks, \({h}_{{T}_{b}}\) represents the task embedding, \({W}_{Q}\), \({W}_{K}\), and \({W}_{V}\) are learnable parameter matrices, \(Q,K,V\) are the generated attention matrices, \({d}_{k}\) is the dimension of the task embedding, used to normalize the variance of \(Q{K}^{T}\) and stabilize gradient values during training, and \({\eta }_{{T}_{b}}\) is the generated weight that balances the importance of different tasks.
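Eqs. (12)-(16) can be sketched as follows. Note that Eq. (14) produces a vector per task, while \({\eta }_{{T}_{b}}\) is used as a scalar weight in Eq. (16); how this reduction is performed is not fully specified in the text, so a mean over dimensions is assumed here.

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Sketch of Eqs. (12)-(16): self-attention over a batch of task embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** 0.5

    def forward(self, task_emb: torch.Tensor, query_losses: torch.Tensor):
        # task_emb: (B, dim), one embedding h_{T_b} per sampled task;
        # query_losses: (B,) per-task query-set losses.
        q, k, v = self.w_q(task_emb), self.w_k(task_emb), self.w_v(task_emb)
        attn = torch.softmax(q @ k.t() / self.scale, dim=-1) @ v   # Eq. (14)
        eta = attn.mean(dim=-1)      # scalar weight per task (assumed reduction)
        return (eta * query_losses).sum()                          # Eq. (16)
```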

MetaDTA36 uses a multi-head cross-attention network to capture the relationships between the support and query ligands rather than between different proteins, and it does not use any protein information. In contrast, ZeroBind utilizes a task adaptive self-attention module to learn the importance of different protein tasks, focusing on the binding patterns shared among different proteins.

Experimental settings

We take graph convolutional networks as the base graph neural network. In our experiments, the number of GCN layers in \({GCN}_{P}\) is set to 4 with embedding dimensions of 1280, 512, 256, 256, and 256, and the number of GCN layers in \({GCN}_{M}\) is set to 3 with embedding dimensions of 256, 256, and 256. We set the number of update steps in the inner loop to 5, and apply 20 inner steps of the maximum optimization of \({\mathcal{L}}_{{MI}-{pro}}\). We set both \({\lambda }_{1}\) and \({\lambda }_{2}\) to 0.05. The learning rate for optimizing \({\mathcal{L}}_{{MI}-{pro}}\) is set to 0.01, and the meta learning rate is scheduled with a simulated annealing algorithm. We use PyTorch to implement the model and run it on a GPU.

Baseline methods

We compare ZeroBind with multiple baselines to evaluate its advantages, and calculate the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) on three independent test sets and one few-shot test set.

(1) DeepConv-DTI11: DeepConv-DTI trains a CNN to learn the embedding of neighboring residues and an MLP to learn the molecular fingerprint, which are concatenated to make the DTI prediction.

(2) GraphDTA10: GraphDTA leverages multiple types of GNN models to learn molecular embeddings and a CNN model to embed protein residue features.

(3) DeepPurpose8: DeepPurpose is a state-of-the-art deep learning library for DTI prediction with multiple backbone networks. Here we use a CNN as the backbone network for learning the embeddings of both drugs and proteins.

(4) AI-Bind9: AI-Bind uses pre-trained mol2vec and protvec models to initialize the molecule and protein embeddings, and makes DTI predictions by concatenating the two embeddings.

(5) DrugBAN55: DrugBAN uses a deep bilinear attention network (BAN) framework with domain adaptation to explicitly learn local pairwise interactions between drugs and targets.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.