Abstract
Structure-based lead optimization is an open challenge in drug discovery, which is still largely driven by hypotheses and depends on the experience of medicinal chemists. Here we propose a pairwise binding comparison network (PBCNet) based on a physics-informed graph attention mechanism, specifically tailored for ranking the relative binding affinity among congeneric ligands. Benchmarking on two held-out sets (provided by Schrödinger and Merck) containing over 460 ligands and 16 targets, PBCNet demonstrated substantial advantages in terms of both prediction accuracy and computational efficiency. Equipped with a fine-tuning operation, the performance of PBCNet reaches that of Schrödinger’s FEP+, which is much more computationally intensive and requires substantial expert intervention. A further simulation-based experiment showed that active learning-optimized PBCNet may accelerate lead optimization campaigns by 473%. Finally, for the convenience of users, a web service for PBCNet is established to facilitate relative binding affinity prediction for protein–ligand complexes through an easy-to-operate graphical interface.
Main
AlphaFold2, which appeared in the 14th round of the Critical Assessment of protein Structure Prediction (CASP), is believed to have solved the half-century-old problem of predicting a protein structure from its primary sequence. This breakthrough has ushered in a new era in structure-based drug design^{1}. Recently, the Critical Assessment of Computational Hit-finding Experiments (CACHE), a public benchmarking project, has garnered attention from the computational chemistry community and pharmaceutical industry for enhancing small-molecule hit-finding algorithms^{2}. However, the hit-to-lead optimization process is still largely driven by hypotheses and depends on the experience of medicinal chemists. Lead optimization aims to design ligands with higher binding affinity while maintaining other properties^{3,4,5}. During optimization, a congeneric series of ligands is generated that generally share the same core structure and differ only in some substituent groups. The extensive optimization space for a lead, spanning hundreds to thousands of compounds, necessitates substantial resources for experimental evaluation^{6,7}. Consequently, developing in silico predictive tools is important to expedite drug discovery. By minimizing the number of design–make–test–analyze cycles, these tools facilitate the attainment of compounds possessing desired affinity and property profiles.
In recent decades, many relative binding free energy (RBFE) simulation methods have been proposed for lead optimization, benefiting from improved force fields and sampling algorithms. For example, free energy perturbation (FEP) is a widely used alchemical method^{8} that achieves remarkable accuracy, nearing 1 kcal mol^{−1}, on specific systems (ref. ^{9}). However, FEP also suffers from several limitations: its accuracy depends on the system preparation process^{10}, it carries a considerable computational cost^{9} and it tolerates only a limited number of changes between ligands. Another category of RBFE simulation method involves end-point sampling^{11}, such as the molecular mechanics generalized Born surface area (MM-GB/SA) approach^{12,13}. End-point sampling methods reduce the computational requirements, but their performance is also compromised. In summary, despite the high accuracy of RBFE simulation methods, their complicated preparation process, limited molecule throughput and low allowance for changes between molecules hinder their practical usage in quickly navigating the optimization space of lead molecules.
In recent years, some artificial intelligence (AI) models designed to guide lead optimization have emerged^{14,15,16}. Inspired by RBFE simulation methods, Jiménez-Luna et al. proposed a convolutional Siamese neural network (SNN), called DeltaDelta^{15}, to directly determine the RBFE between two bound ligands. One advantage of an SNN is that it directly determines the RBFE, which eliminates the systematic error derived from the absolute binding free energies (ABFEs). Another advantage is its ability to factor in information from both input ligands, incorporating their structural differences and commonalities. However, DeltaDelta has yet to take full advantage of the SNN architecture. Specifically, DeltaDelta first predicts the ABFE of the two input compounds, and then directly uses the difference of the predicted ABFEs as the final RBFE prediction for loss calculation. This approach does not consider the association between the two inputs (pairwise separability^{17}). DeltaDelta showed relatively poor outcomes in retrospective lead optimization campaigns without fine-tuning. McNutt et al. recently proposed a multi-task convolutional SNN model^{16}. Their approach uses the explicit differences between the representations of the two input ligands as the molecular-pair representation. The underlying assumption is that features common to two ligands are irrelevant to predicting their difference, which is clearly unreasonable in RBFE predictions. Moreover, they used ABFE prediction as one of the auxiliary tasks, potentially reintroducing the noise originally eliminated by RBFE prediction. Consequently, compared with DeltaDelta, their models did not show substantial performance gains.
In summary, developing an efficient and accurate method to guide lead optimization is an urgent need. To this end, we propose a pairwise binding comparison network (PBCNet) based on a physics-informed graph attention mechanism that is specifically tailored for ranking the relative binding affinity among a congeneric series of ligands. Several physics-oriented modeling strategies are introduced, considering that the formation of intermolecular interactions always follows strict geometric rules^{18}. Based on our interpretation studies, we found that a relatively high attention score assigned to a protein–ligand atom pair may indicate a more significant interaction. Additionally, PBCNet focuses on molecular substructures that can form intermolecular interactions.
PBCNet has been evaluated in terms of the error and correlation between the predicted and experimental binding affinities. Benchmarking results show that our model substantially outperformed all baselines except FEP+. Furthermore, with a small amount of fine-tuning^{19} data, PBCNet is comparable to Schrödinger’s FEP+, but with substantially less computational cost. An ideal model should also have the ability to enrich key high-activity compounds from a batch of structural analogs. We built a benchmark to test whether our model can identify ‘leading’ compounds, and the results indicate that, on average, PBCNet can accelerate lead optimization projects by 473%. Finally, PBCNet has been deployed in the cloud, and the corresponding web service is accessible at https://pbcnet.alphama.com.cn/index.
Results
Model structure
The framework of PBCNet is shown in Fig. 1. It consists of three parts: (1) the message-passing phase, (2) the readout phase and (3) the prediction phase. The input of PBCNet is a pair of pocket–ligand complexes in which the ligands are structural analogs and the pockets are entirely identical. The amino-acid residues of the protein whose minimum distance to the ligand is less than or equal to 8.0 Å are kept as the protein pocket. The message-passing phase is designed to obtain node-level representations. First, a graph convolutional network (GCN)^{20} is applied to update the atom representations of the protein pocket alone. Then, the updated protein pocket is combined with the two ligands by building edges between pairs of atoms less than 5.0 Å apart. A well-designed message-passing network (detailed in the Methods) is then used to transmit information across the molecular graphs. Finally, we remove the pocket from the molecular graphs and retain only the ligands. The goal of the readout phase is to obtain the molecular (graph-level) representations. In this phase, molecular representations of the ligands (x^{(i)} and x^{(j)} in Fig. 1) are computed by an Attentive FP^{21} readout operation. Then, the molecular-pair representations (\({\widetilde{{\bf{x}}}}^{{{(}}i,\,j{{)}}}\) in Fig. 1) are obtained by equation (7) in the Methods. In the prediction phase, molecular-pair representations are learned by optimizing the losses of two tasks: (1) the prediction of the affinity difference and (2) the probability that the affinity of ligand i is greater than that of ligand j, by two independent branches of three-layer feed-forward neural networks (see section Model training and fine-tuning process).
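The two distance rules above (8.0 Å for selecting pocket residues, 5.0 Å for building pocket–ligand edges) can be sketched with plain NumPy. This is an illustrative sketch only; the function name and array layout are assumptions, not the paper's implementation.

```python
import numpy as np

def pocket_and_contact_edges(protein_xyz, residue_ids, ligand_xyz,
                             pocket_cutoff=8.0, edge_cutoff=5.0):
    """Apply the two distance cutoffs described in the text.

    protein_xyz: (P, 3) heavy-atom coordinates of the protein
    residue_ids: length-P residue index of each protein atom
    ligand_xyz:  (L, 3) heavy-atom coordinates of the ligand
    Returns a per-atom pocket mask and the list of (pocket atom,
    ligand atom) pairs joined by virtual distance edges.
    """
    # pairwise protein-ligand distances, shape (P, L)
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)

    # a residue belongs to the pocket if any of its atoms is within 8.0 A
    per_atom_min = d.min(axis=1)
    pocket_residues = {r for r, dm in zip(residue_ids, per_atom_min)
                       if dm <= pocket_cutoff}
    pocket_mask = np.array([r in pocket_residues for r in residue_ids])

    # virtual distance edges between pocket and ligand atoms within 5.0 A
    p_idx, l_idx = np.where((d <= edge_cutoff) & pocket_mask[:, None])
    return pocket_mask, list(zip(p_idx.tolist(), l_idx.tolist()))
```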
In the inference process, we only need to provide docking poses of a pair of structurally similar small molecules bound to the same protein to obtain the predicted relative binding affinity. A more detailed description of the model framework, and of the difference between the Siamese network and traditional networks, is given in the Methods.
Performance of PBCNet
Zero-shot learning
First, we analyzed the zero-shot performance of PBCNet on the two held-out test sets (FEP1 and FEP2 sets, see section Benchmark dataset for performance assessment), and selected Schrödinger’s FEP+ (ref. ^{9}), Schrödinger’s Glide SP^{22}, MM-GB/SA^{11}, as well as four AI-based models (DeltaDelta^{15}, Default2018 (ref. ^{16}), Dense^{16} and PIGNet^{23}) as baselines. The general idea of zero-shot learning is to transfer the knowledge contained in the training instances to the task of predicting testing instances^{24}. This evaluation is designed to simulate the early stage of a lead-optimization campaign, where compounds with known activity are always scarce. For each test series we randomly selected one ligand as the reference ligand to infer the absolute binding affinities of the remaining ligands (see section Mathematical formulation), and this process was repeated ten times to avoid randomness. The performances of all methods on the FEP1 and FEP2 sets are summarized in Supplementary Data 1 and 2, respectively. Pearson’s correlation coefficient (R), Spearman’s rank correlation coefficient (ρ) and the pairwise root-mean-square error (r.m.s.e._{pw}) are used here (see section Determination of model performance). For PIGNet, the results were calculated using its officially reported code and weights. For the other baselines, we used the performance metrics reported in their respective original publications.
The results show that the performance of PBCNet is substantially better than that of all baselines except FEP+, meaning that PBCNet is the best of all high-throughput methods mentioned here. Moreover, PBCNet achieved an accuracy of 1.11 kcal mol^{−1} on the FEP1 set, which is very close to 1 kcal mol^{−1}, and it also achieves the lowest average r.m.s.e._{pw} (1.49 kcal mol^{−1}) on the FEP2 set. Supplementary Fig. 1 visualizes the model predictions, demonstrating a strong alignment between the predicted ∆pIC_{50} values and the corresponding experimental values across the majority of the test series. (∆pIC_{50} is the difference between the pIC_{50} values of two ligands; pIC_{50} is the negative logarithm of the IC_{50} in molar concentration; and the IC_{50}, the 50% inhibitory concentration, is a type of binding affinity measure. See section Training dataset and data balance.)
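As a quick numerical illustration of the pIC_{50} arithmetic defined above (a minimal sketch, not taken from the paper's code):

```python
import math

def pic50(ic50_molar):
    """pIC50 = -log10(IC50), with the IC50 expressed in molar units."""
    return -math.log10(ic50_molar)

# Example: a 10 nM inhibitor versus a 1 uM inhibitor.
delta_pic50 = pic50(10e-9) - pic50(1e-6)   # 8.0 - 6.0 = 2.0 pIC50 units
```

A ∆pIC_{50} of 2.0 thus corresponds to a 100-fold difference in IC_{50}.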
We also find that PBCNet is robust, with more stable performance across all testing series compared with the other high-throughput baseline methods. This is evident from the Spearman’s rank correlation coefficient: PBCNet shows correlations of over 0.30 in all test series, whereas the other high-throughput baseline methods show a more fluctuating ρ, such as Glide SP (CDK2, ρ = −0.36; Tyk2, ρ = 0.79). This phenomenon reflects the good generalization ability of PBCNet.
We can also observe that the performance of PBCNet on the FEP1 set is better than that on the FEP2 set, possibly due to several out-of-domain samples in the FEP2 set. As a model for lead optimization, PBCNet is designed to infer the activity differences of structural analogs, which generally show high molecular similarity. To be closely consistent with this application scenario, the training set is composed of molecule pairs whose Tanimoto similarity scores are higher than 0.6 (ref. ^{25}). Figure 2a shows the relationship between model accuracy and molecular similarity, and an obvious negative correlation can be observed. The similarity-dependent performance of PBCNet is not a surprise, because identifying molecules with different structures is more relevant to virtual screening than to lead optimization. Correspondingly, methods and models designed for virtual screening are always poor at lead optimization, such as Glide and PIGNet, which have been evaluated here. We further counted the proportions of ligand pairs with different similarity scores in the FEP1 and FEP2 sets (Fig. 2b). Figure 2b shows that the proportion of molecule pairs with a Tanimoto similarity score of less than 0.6 in the FEP2 set is substantially higher than that in the FEP1 set (70.4% versus 54.4%), which may explain the performance difference of our model between the FEP1 and FEP2 sets. However, PBCNet’s ranking performance on the FEP2 set still surpassed all the baselines except FEP+. Given this, we may conclude that PBCNet should be of practical value for guiding lead-optimization projects.
Finally, we also find our model is highly robust to small changes in ligand poses (specific information is provided in Supplementary Section 1).
Few-shot learning
We assumed the ranking ability of PBCNet to be inferior to that of FEP+ because of the ability of FEP+ to sample various binding conformations. Other methods, except MM-GB/SA, use only a single snapshot, which leads to less comprehensive information about the molecular binding process. However, PBCNet has two advantages over FEP+ in real-world applications. First, PBCNet is not limited by molecule throughput, allowing comprehensive exploration during lead optimization. According to public information^{9}, running FEP+ for four perturbations per day requires eight commodity Nvidia GTX 780 graphics processing units (GPUs). In contrast, PBCNet takes only 0.9 s to calculate one perturbation on a commodity Nvidia V100 GPU. Through a rough performance conversion, PBCNet is ~100,000 times faster than FEP+. The second advantage is PBCNet’s flexibility. During a lead-optimization campaign, newly generated binding affinity data can be used to fine-tune PBCNet. Few-shot learning^{19} is used to achieve this. For each test congeneric series, we randomly selected several ligands (~2–10) as fine-tuning ligands with known binding affinity, which also serve as reference ligands in the inference phase. The remaining ligands are still the ligands to be tested (referred to as the new testing series). We repeated the above process ten times to avoid randomness.
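The ~100,000× figure can be checked with a rough GPU-time conversion, treating GPU-seconds per perturbation as the common unit (the GPU generations differ, so this is only an order-of-magnitude estimate):

```python
# FEP+: 4 perturbations per day on 8 GPUs -> GPU-seconds per perturbation
fep_gpu_s = 8 * 86_400 / 4        # 172,800 GPU-seconds
# PBCNet: 0.9 s per perturbation on a single GPU
pbcnet_gpu_s = 0.9
speedup = fep_gpu_s / pbcnet_gpu_s   # ~192,000, i.e. on the order of 10^5
```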
The performances of the fine-tuned models on the new testing series are summarized in Supplementary Data 3 and Fig. 3. Figure 3 shows that the few-shot learning strategy substantially improves the performance of PBCNet, and the performance increases with the number of fine-tuning ligands. Supplementary Table 1 shows that the performances of the fine-tuned PBCNet on the new and original testing series are similar. This suggests that the performance improvement is not due to the bias resulting from the reduced length of the test series. This consistency is also essential for comparing the fine-tuned PBCNet and FEP+ under existing conditions. We find that, after fine-tuning, PBCNet’s ranking ability is comparable to that of FEP+. For example, PBCNet fine-tuned with four ligands even outperformed FEP+ in terms of Spearman’s rank correlation coefficient on the FEP1 set (0.724 versus 0.720).
Using PBCNet to accelerate lead optimization
In this section we test whether our model can efficiently identify high-activity compounds in a close-to-real-world lead-optimization scenario by comparing the order of model selection to the experimental order of synthesis, similar to the study of Jiménez-Luna and others^{15}. We use active learning (AL)^{26}, an uncertainty-guided algorithm, to intelligently prioritize sample acquisition. Data acquisition was simulated as iterative selection from each chemical series, with PBCNet as the active learner. In each series, the compound displaying the highest activity was used as the target ligand that needs to be identified. In cases where multiple compounds shared the same highest activity, we prioritized the earliest synthesized among them as the target ligand. In the first iteration, the earliest synthesized compound in each chemical series was chosen as the reference ligand, and activity values were predicted for the remaining compounds. Subsequently, the three ligands with the highest predicted values were selected. If the target ligand was not among these three, they became new reference ligands for the next iteration. In the second iteration, the four existing reference ligands were paired to form a fine-tuning set for refining PBCNet. Both the predicted activity values and the uncertainties (equations (10) and (11) in the Methods) of the remaining ligands were evaluated by the fine-tuned PBCNet. This evaluation guided the prioritization of three ligands, according to the predefined sampling method. This iteration was repeated until the target ligand was successfully identified.
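One iteration of this selection loop can be sketched as follows. The acquisition function shown (predicted mean plus one standard deviation) is a stand-in for the paper's predefined sampling methods, and the `model.predict` interface returning a (mean, variance) pair per candidate is an assumption:

```python
def select_next_batch(model, candidates, references, k=3):
    """Rank untested ligands by predicted activity plus an uncertainty
    bonus and return the top k as the next reference ligands."""
    scored = []
    for ligand in candidates:
        # mean and variance over reference-based estimates (eqs. (10)-(11))
        mean, variance = model.predict(ligand, references)
        scored.append((mean + variance ** 0.5, ligand))  # assumed acquisition form
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ligand for _, ligand in scored[:k]]
```

In the simulated campaign, the returned batch would be "synthesized" (their labels revealed), appended to the reference set and used to fine-tune the model before the next round.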
We adopted three sampling methods with different settings (see section The sample method for simulation-based experiment). Results for this simulation-based benchmark are presented in Supplementary Data 4. We find that the strategies taking uncertainty into consideration are superior to the purely exploitation-oriented one, and that the model-oriented and user-oriented strategies do not exhibit an obvious performance difference. The model-oriented AL strategy is selected as the representative for further comparison, and three metrics are used and computed as follows:
The ‘advantage ratio’ represents the theoretical percentage of resources saved when utilizing PBCNet for guiding lead optimization, compared to not using it. The ‘efficiency improvement ratio’ represents the increase in efficiency when completing a compound optimization project before and after using PBCNet, assuming that a project ends after obtaining the most active compound.
In six out of nine datasets, AL-equipped PBCNet is able to attain the compound with the highest affinity faster than its experimental order. On average, it accelerated the lead-optimization projects by ~473%, while also achieving an ~30% reduction in resource investment. Surprisingly, for the BCL6, sEH and AAK1 targets, the compounds with the highest affinity were found by PBCNet in the first iteration, without the fine-tuning operation. We compared our results to the baseline MM-GB/SA, which was implemented using Schrödinger Prime MM-GBSA with default settings. The results, presented in Supplementary Table 2, demonstrate that PBCNet consistently outperforms MM-GB/SA across all evaluated metrics. Overall, the results are very promising and suggest that PBCNet could be successfully applied in a prospective scenario to accelerate lead optimization.
Model interpretability analysis
Atom level
Given PBCNet’s impressive performance, it is valuable to investigate how the model makes predictions. Because PBCNet is attention-based, the attention score between a pair of atoms can be seen as a measure of importance. A strong model should assign high scores to atom pairs forming key intermolecular interactions. To illustrate this, we performed a case study on two different ligands in the FEP1 set, focusing on identifying hydrogen bonds^{27}, which are crucial and common intermolecular interactions.
We first computed the intermolecular interactions between the ligands and proteins with Schrödinger 2020-4. Because the positions of the hydrogen atoms depend heavily on the program used to add hydrogens, we did not take them into account. For hydrogen-bond donors, we selected the heavy atoms covalently linked to hydrogen atoms for further analysis. We then extracted the attention weights, generated in the last layer of the distance-aware edge-to-node block (Methods), of the atoms involved in the formation of hydrogen bonds. The results of these operations are illustrated in Fig. 4, and the intermolecular interactions computed by Schrödinger are summarized in Supplementary Table 3.
Compound 6a from the thrombin series forms three hydrogen bonds with the target, at the 3, 8 and 10 positions (Fig. 4a). We found that the hydrogen bonds formed at the 3 and 10 positions are highlighted. The covalent bonds are also emphasized. This is consistent with the chemical prior that the chemical environment of a ligand atom is largely determined by its covalently linked atoms and the protein atoms involved in key intermolecular interactions. It reveals that PBCNet is able to capture key intermolecular interactions. The computed hydrogen bond at the 8 position is not emphasized, unlike its counterparts at the 3 and 10 positions, possibly due to the relatively weaker hydrogen-bond-donor nature of the amide-donor hydrogen atom^{28}. Compound 186601 from the JNK1 series forms two hydrogen bonds with the target, at the 12 and 18 positions (Fig. 4b). As expected, both of them are highlighted. Moreover, the carbon atom of 186601 at the 5 position, which does not form any key intermolecular interaction (as computed by Schrödinger), was selected as a negative sample. We can clearly see that only covalent bonds are assigned relatively high attention scores, while the attention scores of the virtual distance bonds are small and uniform in value. These results all reflect the soundness of the prediction basis of our model.
Substructure level
Medicinal chemists prefer to investigate molecular properties in terms of chemically meaningful fragments rather than individual atoms^{29}. Therefore, we extended our analysis to include substructure-level interpretability.
In this analysis, we employed the substructure mask explanation (SME) methodology, as recently proposed by Wu and others^{29}. We denote the model’s prediction value for a compound as \(\hat{y}\). The compound is split into substructures using the BRICS method. Subsequently, the hidden representations of the atoms of each substructure are masked during the readout phase, yielding the corresponding prediction value \({\hat{y}}_{{\rm{sub}}_{i}}\), where the subscript sub_{i} denotes the ith substructure. When the predicted value represents the compound’s activity, we consider that a greater decrease in \({\hat{y}}_{{\rm{sub}}_{i}}\) compared to \(\hat{y}\) indicates that the corresponding substructure plays a more crucial role in the model’s prediction. Thus, the attribution score used to quantify the importance of each substructure is defined by the following equation:

$${\rm{Attribution}}_{{\rm{sub}}_{i}}=\hat{y}-{\hat{y}}_{{\rm{sub}}_{i}}$$
and we normalize the attribution scores to normalized attribution scores (Attribution_N) within a range of 0 and 1, according to
where N is the number of substructures.
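The masking-and-scoring procedure above can be sketched in a few lines. Min–max scaling is shown for the normalization step as one common choice that maps scores into [0, 1]; the paper's exact normalization formula is not reproduced in this excerpt, so treat it as an assumption:

```python
def sme_attributions(y_hat, y_hat_masked):
    """Raw SME attribution per substructure: the drop in the prediction
    when that substructure's atoms are masked in the readout phase."""
    return [y_hat - y_sub for y_sub in y_hat_masked]

def normalize_attributions(scores):
    """Map raw attributions into [0, 1] (min-max scaling, assumed here)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
```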
Here, we take compound 6a from the thrombin system as a case study, using compound 1a as the reference ligand to illustrate PBCNet’s activity prediction for compound 6a (Fig. 5a). Compound 6a was segmented into seven substructures using the BRICS method, with the amide group being divided into two distinct substructures. To provide a more intuitive representation for medicinal chemists, we manually merged these into a single amide substructure (Supplementary Table 4). The visualization is presented in Fig. 5b.
As shown, we found that Sub_{4} and Sub_{1} (Supplementary Table 4) have the greatest impact on the predictive results. PBCNet is designed to predict relative binding affinities, which are predominantly determined by the differing substructures of a pair of ligands. Sub_{4}, being the part of compound 6a that structurally deviates from compound 1a, is emphasized, suggesting that PBCNet indeed captures the structural differences between input ligands. Moreover, as depicted in Fig. 4a, Sub_{1} forms two hydrogen bonds with the protein, so the emphasis on Sub_{1} also implies that PBCNet focuses on key molecular motifs that form intermolecular interactions.
Ablation experiments
To enhance the performance of PBCNet, we implemented various strategies, which can be broadly divided into two categories: framework-related and knowledge-related. The former includes the SNN architecture and the classification assistance task, while the latter incorporates physical and prior knowledge. To verify whether these strategies really contribute to the model performance improvement, we performed the following ablation experiments on PBCNet.
PBCNet stands out due to its SNN framework with paired inputs. We constructed a single-input model, termed ‘Singular PBCNet’, to remove the SNN framework. Meanwhile, to verify the effect of pairwise separability on the SNN framework, we built a pairwise separated model referred to as ‘Separated PBCNet’. Their frameworks are shown in Supplementary Fig. 2. We also removed the classification auxiliary task and obtained ‘MSE PBCNet’. Note that Singular PBCNet and Separated PBCNet lack the assistance task as they do not use molecular-pair information, so their performance should subsequently be compared with that of MSE PBCNet. The performance of the ablated models is shown in Supplementary Table 5.
Compared with PBCNet, MSE PBCNet showed a small decrease in performance on both the FEP1 and FEP2 sets (FEP1, 0.636 versus 0.629; FEP2, 0.513 versus 0.488). This aligns with expectations, as the auxiliary task addresses samples with small errors but wrong rankings, which constitute a small fraction of the dataset. Compared with MSE PBCNet, the performance of Singular PBCNet showed a substantial decrease on both the FEP1 and FEP2 sets (FEP1, 0.629 versus 0.559; FEP2, 0.488 versus 0.372 (statistically significant)). This result illustrates the advantage of the SNN framework in relative binding affinity prediction. Compared with MSE PBCNet, the performance of Separated PBCNet decreases significantly on the FEP2 set (0.488 versus 0.425). From these results, we believe that the ability to consider the structural information of both input molecules and their connections simultaneously is crucial for model performance.
We next removed the distance information, angle information and aromatic information separately. The performance of the ablated PBCNet is shown in Supplementary Table 5. After removing any of the knowledge-related strategies, the performance of PBCNet decreases on both the FEP1 and FEP2 sets, with the removal of the distance information causing the largest drop. This indicates that all three knowledge-related strategies contribute to the performance of PBCNet.
Discussion
AI has gained prominence in solving scientific problems by incorporating domain-specific knowledge into its modeling, and PBCNet is an example of this integration of physical knowledge. However, there are still avenues for improvement. First, although PBCNet shows substantial predictive advancements over prior attempts, its zero-shot performance is lower than that of Schrödinger’s FEP+. Therefore, capturing the protein conformational changes prompted by ligand binding, as FEP+ does, remains an ongoing pursuit to improve model accuracy. Second, the underlying assumption of this study is that similar ligands exhibit similar binding modes. Extreme cases, where highly similar ligands bind to the protein with entirely different binding modes, may therefore pose challenges for PBCNet. Furthermore, PBCNet still relies on medicinal chemists for molecule design and on molecular docking for binding pose generation. A direct pipeline that integrates molecular generation, docking and optimization could circumvent cumulative errors in the process of lead optimization.
In the future, we will continue to refine our modeling strategies to enhance PBCNet’s predictive performance by considering the alterations of protein conformation and ligand pose. Simultaneously, we will also try to combine PBCNet with deep molecular generative models to streamline the automated design of highpotency molecules.
Methods
Mathematical formulation
In traditional modeling protocols (single-input modeling methods), suppose we are given a training set of N samples (protein–ligand complexes from the same congeneric series) \({\mathcal{D}}={\left\{{{\bf{x}}}^{(i)},\,{y}^{(i)}\right\}}_{i=1}^{N}\). Here, \({{\bf{x}}}^{(i)}\in {{\mathbb{R}}}^{m}\) represents the feature vector of an input, m is its dimension and \({y}^{(i)}\in {\mathbb{R}}\) is a real-valued property (pIC_{50} here). \({\mathcal{M}}\) is a deep learning-based regression model parameterized by weights θ and trained on \({\mathcal{D}}\), and \({\hat{y}}^{(i)}={\mathcal{M}}({{\bf{x}}}^{(i)};\,{\mathbf{\uptheta }})\) represents the prediction of \({\mathcal{M}}\) for x^{(i)}.
For Siamese models, however, these concepts change slightly. First, the N training samples are paired with each other to form \(\binom{N}{2}\) paired training samples, and a tuple p is used to index them:
where i and j correspond to the indexes of the first and second complex of a paired sample. Then, the feature vector \({\widetilde{{\bf{x}}}}^{(i,\,j)}\) of a paired sample depends on x^{(i)} and x^{(j)}. Here, \({\widetilde{{\bf{x}}}}^{(i,\,j)}\in {{\mathbb{R}}}^{3m}\) is constructed by the following equation:
where ⊕ is the concatenation operation. The label of a paired sample \({\widetilde{y}}^{(i,\,j)}\) (∆pIC_{50} here) is calculated according to
Finally, the pairwise training dataset \({{\mathcal{D}}}_{p}={\left\{{\widetilde{{\bf{x}}}}^{(i,\,j)},\,{\widetilde{y}}^{(i,\,j)}\right\}}_{1\le i < j\le N}\) is obtained. \({{\mathcal{M}}}_{p}\) is a Siamese regression model parameterized by weights θ_{p} and trained on \({{\mathcal{D}}}_{p}\). \({\hat{y}}^{(i,\,j)}={{\mathcal{M}}}_{p}({\widetilde{{\bf{x}}}}^{(i,\,j)}{{;}}\,{{\mathbf{\uptheta }}}_{p})\) represents the prediction result of \({{\mathcal{M}}}_{p}\) for \({\widetilde{{\bf{x}}}}^{(i,\,j)}\).
For an unseen complex u whose feature vector is represented by x^{(u)}, we pair it with every complex in \({\mathcal{D}}\), which can be seen as a set of reference samples with known binding affinities in the inference phase, to obtain the pairwise test dataset \({\left\{{\widetilde{{\bf{x}}}}^{(i,\,u)},\,{\widetilde{y}}^{(i,\,u)}\right\}}_{i=1}^{N}\). \({{\mathcal{M}}}_{p}\) outputs the corresponding N predictions \({\left\{{\hat{y}}^{(i,\,u)}\right\}}_{i=1}^{N}\), and the predicted absolute affinities of u based on the different reference samples, \({\left\{{\hat{y}}_{i}^{(u)}\right\}}_{i=1}^{N}\), can be obtained by the equations
The mean value and variance of \({\left\{{\hat{y}}_{{\rm{i}}}^{\left(u\right)}\right\}}_{i=1}^{N}\) can be deemed the final prediction \({\hat{y}}^{\left(u\right)}\) and uncertainty estimation \({{\sigma }^{2}}^{(u)}\) of u, respectively (equations (10) and (11)):
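The inference-time aggregation described above can be sketched as follows. The sign convention for the predicted difference (taken here as y^{(u)} − y^{(i)}, so that adding it to the reference label recovers the absolute affinity) is an assumption, since the defining equations are not reproduced in this excerpt:

```python
from statistics import mean, pvariance

def predict_from_references(ref_labels, predicted_deltas):
    """Combine the N per-reference estimates into the final prediction
    (mean) and uncertainty (variance), in the spirit of equations (10)
    and (11).

    ref_labels:       known affinities y^(i) of the reference complexes
    predicted_deltas: model outputs for each (reference, unseen) pair,
                      assumed to equal y^(u) - y^(i)
    """
    estimates = [y_i + d for y_i, d in zip(ref_labels, predicted_deltas)]
    return mean(estimates), pvariance(estimates)
```

When the per-reference estimates agree, the variance (and hence the uncertainty used by the active-learning sampler) shrinks toward zero.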
The structure of the alternately updated message-passing neural network
A well-designed message-passing neural network (alternately updated message-passing neural network, AUMPNN) is applied in the message-passing phase (Fig. 1a). Before the detailed introduction of AUMPNN, some definitions need to be clarified. First, the complex of a ligand and the corresponding protein binding pocket is treated as a directed molecular graph G, in which all heavy atoms are treated as nodes (Nd) and all covalent bonds are treated as edges (E). Moreover, virtual distance edges are built between atom pairs of the ligand and the binding pocket whose distances are less than or equal to 5.0 Å. Additionally, a virtual aromatic node is set up at the centroid of each aromatic ring, and virtual aromatic edges are established between the virtual aromatic nodes and the nodes in the corresponding aromatic rings. During message passing, all nodes (heavy-atom nodes and virtual aromatic nodes) and all edges (covalent-bond edges, virtual distance edges and virtual aromatic edges) are equivalent. Finally, the whole graph G = 〈Nd, E〉 is constructed. Here, all edges are directed, and an edge \({e}_{\overrightarrow{{uv}}}\) indicates that its direction goes from node a_{u} to node a_{v}. If there is an edge \({e}_{\overrightarrow{{uv}}}\) in G, a_{u} is a neighbor node of a_{v}. In the following, a_{v} is assumed to be the target node whose representation needs to be updated. The set \(V_{\rm{nei}}=\{a_{u_1},a_{u_2},a_{u_3},\cdots\}\) represents all neighbor nodes of a_{v}, and a_{u} refers to any neighbor node of a_{v} (Supplementary Fig. 3a). Correspondingly, the set \({UV}={\left\{{e}_{\overrightarrow{{u}_{1}v}},\,{e}_{\overrightarrow{{u}_{2}v}},\,{e}_{\overrightarrow{{u}_{3}v}},\,\cdots \right\}}\) comprises all incoming edges of a_{v} (edges that point to a_{v}). Moreover, \({e}_{\overrightarrow{{uv}}}\) is assumed to be the target edge that needs to be updated.
The set \({U}_{\rm{nei}}={\left\{{a}_{{k}_{1}},\,{a}_{{k}_{2}},\,{a}_{{k}_{3}},\,\cdots \right\}}\) represents all neighbor nodes of a_{u} except a_{v}. The set \({KU}={\left\{{e}_{\overrightarrow{{k}_{1}u}},\,{e}_{\overrightarrow{{k}_{2}u}},\,{e}_{\overrightarrow{{k}_{3}u}},\,\cdots \right\}}\) stands for all neighbor edges of \({e}_{\overrightarrow{{uv}}}\), and \({e}_{\overrightarrow{{ku}}}\) refers to any neighbor edge of \({e}_{\overrightarrow{{uv}}}\) (Supplementary Fig. 3a).
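To make the graph construction concrete, here is a minimal pure-Python sketch restricted to covalent-bond edges and virtual distance edges (virtual aromatic nodes and edges are omitted); all names are illustrative, not the paper's implementation:

```python
import math

CUTOFF = 5.0  # Å, the ligand-pocket distance threshold used in the text

def build_edges(lig_coords, pocket_coords, lig_bonds, pocket_bonds):
    """Directed edge list for the complex graph G = <Nd, E>.
    Ligand atoms are indexed 0..len(lig_coords)-1, pocket atoms follow."""
    off = len(lig_coords)
    edges = []
    # every covalent bond contributes two directed edges
    for i, j in lig_bonds:
        edges += [(i, j), (j, i)]
    for i, j in pocket_bonds:
        edges += [(i + off, j + off), (j + off, i + off)]
    # virtual distance edges between ligand and pocket atoms within the cutoff
    for i, a in enumerate(lig_coords):
        for j, b in enumerate(pocket_coords):
            if math.dist(a, b) <= CUTOFF:
                edges += [(i, j + off), (j + off, i)]
    return edges
```

Each returned pair (u, v) corresponds to a directed edge \({e}_{\overrightarrow{{uv}}}\) from a_{u} to a_{v}.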
The specific architecture of AUMPNN is shown in Supplementary Fig. 3c. In general, AUMPNN consists of two phases: (1) distance- and angle-aware edge-to-edge blocks and (2) distance-aware edge-to-node blocks. In the following sections, we introduce these two phases and the corresponding preparations in detail.
Initial featurization
Node and edge features need to be defined before message passing. Here we use a total of 15 types of atomic feature (Supplementary Table 6) and five types of bond feature (Supplementary Table 7) to characterize atoms and bonds and their local chemical environment. Except for atomic mass, explicit valence, implicit valence and van der Waals (vdW) radius, these features are encoded in a one-hot fashion. Note that the feature vectors of virtual nodes and edges are set as zero vectors.
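A reduced illustration of this featurization; the element list and feature subset below are hypothetical stand-ins for the full scheme of Supplementary Tables 6 and 7:

```python
def one_hot(value, choices):
    """One-hot encode `value` over `choices`; all zeros if unseen."""
    return [1.0 if value == c else 0.0 for c in choices]

# Illustrative, reduced element vocabulary -- not the paper's exact list.
ELEMENTS = ["C", "N", "O", "S", "F", "Cl", "Br", "I", "P"]

def atom_features(symbol, mass, explicit_valence, is_virtual=False):
    """One-hot element type plus two raw scalar features; virtual
    aromatic nodes receive a zero vector of the same length."""
    length = len(ELEMENTS) + 2
    if is_virtual:
        return [0.0] * length
    return one_hot(symbol, ELEMENTS) + [mass, float(explicit_valence)]
```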
Initial hidden representations
Initial node and edge features should be further encoded as their initial hidden representations before the first step of message passing. Taking a_{v} and \({e}_{\overrightarrow{{uv}}}\) as examples, we initialize their hidden representations with
where \({{\bf{x}}}_{v}\in {{\mathbb{R}}}^{{l}_{\rm{node}}}\) and \({{\bf{x}}}_{\overrightarrow{{uv}}}\in {{\mathbb{R}}}^{{l}_{\rm{edge}}}\) are initial features of a_{v} and \({e}_{\overrightarrow{{uv}}}\); \({{\bf{h}}}_{v}^{0}\in {{\mathbb{R}}}^{m}\), \({{\bf{h}}}_{u}^{0}\in {{\mathbb{R}}}^{m}\) and \({{\bf{h}}}_{\overrightarrow{{uv}}}^{0}\in {{\mathbb{R}}}^{m}\) are initial hidden representations of a_{v}, a_{u} and \({e}_{\overrightarrow{{uv}}}\), respectively; \({{\bf{x}}}_{\overrightarrow{{uv}}}^{{\prime} }\in {{\mathbb{R}}}^{\frac{m}{2}}\) is an intermediate vector to obtain \({{\bf{h}}}_{\overrightarrow{{uv}}}^{0}\); cat(∙) is the concatenate operation; \({W}_{{\rm{i}}{\rm{node}}}\), \({W}_{{\rm{i}}{\rm{edge}}}\) and W_{i} are learned matrices; and i means ‘initial’. This process is visualized in Supplementary Fig. 3b.
Distance- and angle-aware edge-to-edge blocks (DAEE blocks)
The aim of this block is to use the information of the neighbor edges in KU to update the hidden representation of \({e}_{\overrightarrow{{uv}}}\). For \({e}_{\overrightarrow{{uv}}}\), the neighbor edges are not equally important. For example, a neighbor edge that represents a key intermolecular interaction between the ligand and the protein should be highlighted. Hence, the attention mechanism of GAT^{30} is applied here. Moreover, considering that intermolecular interactions are determined by atomic types and distances, atom pairwise statistical potentials^{31} are introduced as an additional attention bias term. Here, the Bayesian field theory-based potentials^{32} proposed by Zheng et al. are adopted. Additionally, the angle between two edges also constrains the formation of intermolecular interactions (for example, hydrogen bonds and halogen bonds). Thus, angle information is taken into consideration when computing the attention scores.
The computing process of this block is summarized in Supplementary Fig. 3c (left). First, on each step l, the query of \({e}_{\overrightarrow{{uv}}}\) (\({{\bf{q}}}_{\overrightarrow{{uv}}}^{l}\)) and the key of each of its neighbor edges \({e}_{\overrightarrow{ku}}\) (\({{\bf{k}}}_{\overrightarrow{{ku}}}^{l}\)) are obtained according to
where \({W}_{q{\rm{edge}}}^{l}\) and \({W}_{k{\rm{edge}}}^{l}\) are two learned matrices. From the spatial coordinates of nodes a_{k}, a_{u} and a_{v}, the angle θ_{kuv} between \({e}_{\overrightarrow{ku}}\) and \({e}_{\overrightarrow{{uv}}}\) can be computed. Then, we divide the angles into six angle domains with a bin width of \({\frac{\uppi }{6}}\) (Supplementary Fig. 3d) and encode them as the corresponding angle embedding. Here, the angle information is fused by extending the original attention mechanism of the GAT with angle-aware attention:
where Divider maps θ_{kuv} to the one-hot vector of its angle domain, \({W}_{\rm{angle}}^{l}\) is a learned matrix, \({{\bf{w}}}_{\rm{edge}}^{l}\) is a learned vector and \({\varepsilon }_{\overrightarrow{{uv}},\,\overrightarrow{{ku}}}^{l}\) is the correlation coefficient of \({e}_{\overrightarrow{ku}}\) and \({e}_{\overrightarrow{{uv}}}\). After that, the atom pairwise statistical potentials are converted into an additional bias term (p_{k, u}) to incorporate distance information:
where type_{k} and type_{u} are atomic types of a_{k} and a_{u}; \({\rm{dist}}_{\overrightarrow{{ku}}}\) represents the distance between a_{k} and a_{u} (meaning the length of \({e}_{\overrightarrow{{ku}}}\)); \({P\left(\cdot \right)}\) is the mapping function of atom pairwise statistical potentials; \({{\varepsilon }^{{\prime} }}_{\overrightarrow{{uv}},\,\overrightarrow{{ku}}}^{l}\) is the updated correlation coefficient of \({e}_{\overrightarrow{ku}}\) and \({e}_{\overrightarrow{{uv}}}\); and the final calculated attention score \({\alpha }_{\overrightarrow{{uv}},\,\overrightarrow{{ku}}}^{l}\) reflects how important \({e}_{\overrightarrow{ku}}\) is for \({e}_{\overrightarrow{{uv}}}\). Then, the message embedding (\({{\bf{m}}}_{\overrightarrow{{uv}}}^{l}\)) used to update the hidden representation of \({e}_{\overrightarrow{{uv}}}\) is computed according to:
Finally, the updated hidden representation of \({e}_{\overrightarrow{{uv}}}\) (\({{\bf{h}}}_{\overrightarrow{{uv}}}^{l}\)) is obtained through residual connections:
where \({W}_{\,{\rm{edge}}1}^{l}\) and \({W}_{{\rm{edge}}2}^{l}\) are trained parameter matrices, and \({\rm{Res}}{(\cdot )}\) is the residual connection module (Supplementary Fig. 3e).
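The angle-domain encoding performed by Divider can be sketched as follows; the treatment of the θ = π boundary is our assumption:

```python
import math

def divider(theta):
    """Map an edge-edge angle theta (radians, in [0, pi]) to a one-hot
    vector over the six pi/6 angle domains (Supplementary Fig. 3d).
    Clamping theta == pi into the last bin is our assumption."""
    idx = min(int(theta / (math.pi / 6)), 5)
    return [1.0 if i == idx else 0.0 for i in range(6)]
```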
Distance-aware edge-to-node blocks (DEN blocks)
The goal of this block is to use the information of the neighbor nodes in \({V}_{\rm{nei}}\) and the incoming edges in UV to update the hidden representation of a_{v}. The computing process of this block is summarized in Supplementary Fig. 3c (right). As in the DAEE blocks, we introduce the attention mechanism and an additional distance-based bias term. The message-passing phase of the DEN block operates according to
followed by
followed by
Note that all the variables here correspond to those in the DAEE blocks.
Data collection and processing
Training dataset and data balance
In this study, the BindingDB protein–ligand validation sets (2020 version)^{33} were selected as the original training data source. A total of 1,265 congeneric series were included in the dataset, and, for each series, SMILES (Simplified Molecular Input Line Entry System) of the ligands, PDB IDs of the available cocrystal structures and corresponding binding affinity values were provided by the dataset.
The goal of data processing is to generate docking poses of all the ligands and their corresponding proteins with Glide as the input of our model. SMILES that failed during preparation with RDKit^{34} were removed. Binding affinity measurements without values, as well as uncertain ones (for example, qualified data with a '<' or '>' sign), were discarded. The initial three-dimensional structures of the ligands were constructed using RDKit. Then, the ligands were further preprocessed for docking using the Schrödinger LigPrep module with default parameters. On the protein side, the PDB files were prepared using the Protein Preparation Wizard of the Schrödinger suite, following the default protocol. Resolved water molecules that made more than three hydrogen bonds to ligand or receptor atoms were kept, and the receptor grid generated for each protein structure was centered on the cocrystallized ligand. According to the statistics, 843 (out of 1,265) series possessed multiple available PDB files. For each of these congeneric series, a cross-docking experiment (taking the observed binding site from one protein–ligand complex and docking a different ligand into the site) was carried out to obtain the protein structure with the best pose prediction accuracy for further investigation^{35}. After the pretreatment, docking was performed using the Glide module in Schrödinger with default parameters, and at most 100 poses per ligand could be written out. Medicinal chemists have long recognized that ligands from the same chemical series tend to bind a given protein in similar poses^{36}; therefore, a key pose selection step was performed here. For each series, the maximum common substructure (MCS) of each ligand and the cocrystallized ligand was extracted first. Then, the r.m.s.d. between each pose of a ligand and the experimentally determined pose of the cocrystallized ligand over the MCS moiety was calculated; if the r.m.s.d. was within 2.0 Å, the corresponding pose (referred to as an acceptable pose) was considered to share the same binding mode with the cocrystallized ligand. When there were multiple acceptable poses of a ligand, the pose with the highest Glide score was selected as the final pose. When no acceptable pose of a ligand could be obtained through docking, however, the ligand was discarded to ensure data quality. The above operations associated with Schrödinger were implemented with the 2020-4 version and the Schrödinger Python API. The NumPy^{37}, Pandas^{38} and scikit-learn^{39} packages were used for data processing, and Matplotlib^{40} was used for visualization.
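The pose-selection step can be sketched as follows (pure Python over pre-extracted MCS coordinates; we read "highest Glide score" as the best-ranked, that is, most negative, GlideScore, which is an assumption on our part):

```python
import math

RMSD_CUTOFF = 2.0  # Å

def rmsd(coords_a, coords_b):
    """r.m.s.d. between two matched coordinate sets (the MCS atoms)."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def select_pose(poses, ref_mcs_coords):
    """poses: list of (mcs_coords, glide_score). Keep 'acceptable' poses
    (MCS r.m.s.d. <= 2.0 Å versus the cocrystallized ligand) and return
    the best-scored one; None means the ligand is discarded."""
    acceptable = [(c, s) for c, s in poses
                  if rmsd(c, ref_mcs_coords) <= RMSD_CUTOFF]
    if not acceptable:
        return None
    # GlideScores are negative; lower (more negative) is better
    return min(acceptable, key=lambda cs: cs[1])
```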
A total of 1,007 (out of 1,265) series with IC_{50} affinity values were extracted (IC_{50} was the measurement type with the most available data), containing a diverse set of targets. The IC_{50} values were then log-converted to avoid target scaling issues (pIC_{50} = −log_{10}IC_{50}). Accordingly, the pIC_{50} difference (ΔpIC_{50}) between a pair of ligands from the same congeneric series was chosen as the model prediction target. Twenty-six congeneric series including only one ligand (which could not form ligand pairs) and ten congeneric series containing the same protein and ligand as the held-out test congeneric series (detailed in the next section) were also removed. As a result, there is no overlap between the test congeneric series and the training dataset. Finally, we obtained 971 congeneric series with an average of ~34 ligands per series.
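The label construction amounts to the following:

```python
import math

def pic50(ic50_molar):
    """pIC50 = -log10(IC50), with IC50 in molar units."""
    return -math.log10(ic50_molar)

def delta_pic50(ic50_a, ic50_b):
    """Pairwise training label: the pIC50 difference of two
    congeneric ligands (positive means ligand a is more potent)."""
    return pic50(ic50_a) - pic50(ic50_b)
```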
Additionally, we found that the labels of the training data were normally distributed and mostly concentrated in the interval [−1, 1] (Supplementary Fig. 4a), which could easily lead to overfitting (a model can achieve a low training error simply by predicting the mean value of the training labels). Thus, we balanced the training data by under-sampling the high-density regions and over-sampling the low-density regions to alleviate this problem. The label distribution of the balanced training dataset is shown in Supplementary Fig. 4b. The final training dataset consists of 0.6 million pairwise samples.
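A minimal sketch of such label balancing; the bin layout and per-bin target below are illustrative assumptions, not the paper's exact recipe:

```python
import random

def balance_by_label(samples, labels, bins, target_per_bin, seed=0):
    """Under-sample crowded label bins and over-sample (with replacement)
    sparse ones so that every bin contributes ~target_per_bin pairs.
    `bins` are (lo, hi) half-open intervals over the label axis."""
    rng = random.Random(seed)
    balanced = []
    for lo, hi in bins:
        members = [s for s, y in zip(samples, labels) if lo <= y < hi]
        if not members:
            continue
        if len(members) >= target_per_bin:          # under-sample
            balanced += rng.sample(members, target_per_bin)
        else:                                       # over-sample
            balanced += [rng.choice(members) for _ in range(target_per_bin)]
    return balanced
```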
Benchmark dataset for performance assessment
Datasets provided by Wang et al.^{9} and Schindler et al.^{6} were chosen as the held-out test sets and used to benchmark the performance of different methods for lead optimization in this study. Wang et al. provided eight congeneric series (referred to as the FEP1 set) on different targets with experimentally validated binding free energy (∆G) values and corresponding evaluation statistics of FEP calculations. We converted the ∆G values to the pIC_{50} range assuming noncompetitive binding, using the following equation for conversion:
where R = 1.987 × 10^{−3} kcal K^{−1} mol^{−1} is the gas constant, T = 297 K is the thermodynamic temperature and e = 2.718 is Euler's number. Schindler et al. also provided eight congeneric series (referred to as the FEP2 set) with pharmaceutically relevant targets, all with experimentally measured binding affinities (IC_{50} values). Compared with the FEP1 set, the congeneric series in the FEP2 set contain changes in net charge and charge distribution as well as ring openings and core hopping. For each series, we log-converted the labels and paired the ligands as we did for the training data.
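Since the conversion equation itself is not reproduced here, the following is our reading of it, pIC50 = −ΔG/(RT ln 10), using the constants stated above:

```python
import math

R = 1.987e-3   # kcal K^-1 mol^-1, gas constant
T = 297.0      # K, thermodynamic temperature

def dg_to_pic50(dg_kcal):
    """Convert a binding free energy (kcal/mol) to the pIC50 scale via
    dG = RT ln(10^-pIC50); the exact (unshown) equation is assumed."""
    return -dg_kcal / (R * T * math.log(10))

def pic50_to_dg(p):
    """Inverse conversion, pIC50 back to kcal/mol."""
    return -p * R * T * math.log(10)
```

At T = 297 K, RT ln 10 ≈ 1.36 kcal mol^{−1}, so one pIC50 unit corresponds to roughly 1.36 kcal mol^{−1} of binding free energy.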
Benchmark dataset for simulation-based experiment
Apart from assessing model accuracy and ranking ability on whole congeneric series, we also intend to test whether our model can efficiently identify key high-activity compounds in a close-to-real-world lead-optimization scenario, by retrospectively comparing the order of model selection with the experimental order of synthesis, similar to Jiménez-Luna and others^{15}. On this basis, we constructed a benchmark consisting of nine recently published datasets^{41,42,43,44,45,46,47,48,49} with available cocrystal structures and pharmaceutically relevant targets. All series were processed as we did for the training data. The information (for example, protein name and PDB ID) about the benchmark is summarized in Supplementary Table 8.
Determination of model performance
We include three different metrics used to determine the performance of the predictive models. Pearson’s correlation coefficient (R) and Spearman’s rank correlation coefficient (ρ) are used to evaluate the ranking ability, and r.m.s.e._{pw} is used to assess the accuracy of the predictive models.
Note that PBCNet requires at least one reference complex to infer the predicted affinities of the other test samples and calculate the corresponding R and ρ. As a result, the test process was repeated ten times independently, and the reference complex of each test run was randomly selected to simulate the uncertainty in real applications.
R.m.s.e. is defined as
where u corresponds to a test sample (a protein–ligand complex here); y^{(u)} and \({\hat{y}}^{\left(u\right)}\) are the true label and prediction results of the test sample, respectively; and N is the total number of test samples. R.m.s.e._{pw} is defined as
where (i, u) corresponds to a paired test sample composed of a test complex and any reference complex (from the same congeneric series), and \({\widetilde{y}}^{\left(i,\,u\right)}\) and \({\hat{y}}^{\left(i,\,u\right)}\) are the true label and prediction result of the paired test sample, respectively. Note that we use r.m.s.e._{pw} to evaluate the accuracy of the models because we use the experimental affinities of the reference complexes to achieve the conversion between \({\hat{y}}^{\left(u\right)}\) and \({\hat{y}}^{\left(i,\,u\right)}\) (equation (8)), as Wang et al. and Schindler et al. did in their studies. Additionally, the r.m.s.e._{pw} values of our model in both kcal mol^{−1} and pIC_{50} units are reported, to enable comparison with baseline models from different studies.
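Both metrics share the same root-mean-square form; only the sample definition differs (single complexes versus reference-test pairs):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error over single test samples."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rmse_pw(pair_labels, pair_preds):
    """Pairwise r.m.s.e.: the same formula, but each sample is a
    (test complex, reference complex) pair labelled by the
    affinity difference of the two ligands."""
    return rmse(pair_labels, pair_preds)
```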
Model training and fine-tuning process
As discussed in the Model structure section, a hybrid loss function is deployed in the training process with equation (33):
where α is a factor controlling the balance between the two types of loss and can be seen as a hyperparameter; here, α is set to 1. Loss_{MSE} is the mean-square-error loss, Loss_{entropy} is the entropy loss and Loss_{total} is the final loss. The entropy loss is introduced to penalize predictions with low errors but completely wrong rankings. For example, it is difficult for the regression loss function to penalize a sample with a label of 0.1 and a predicted value of −0.1 because of its low MSE value, but this can be effectively realized by the classification loss function. Additionally, the ranking information contained in the hidden representation of a paired sample may be further reinforced by the auxiliary task, improving the ranking ability of PBCNet.
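A sketch of the hybrid loss; the exact form of the entropy term (here, a binary cross-entropy on the sign of the affinity difference, with a sigmoid on the prediction) is our assumption:

```python
import math

ALPHA = 1.0  # balance factor between the two loss terms; 1 in the paper

def hybrid_loss(labels, preds):
    """Loss_total = Loss_MSE + alpha * Loss_entropy (our reading of
    equation (33)); the auxiliary classification target is whether
    the first ligand of the pair is the more potent one."""
    n = len(labels)
    mse = sum((y - p) ** 2 for y, p in zip(labels, preds)) / n
    ent = 0.0
    for y, p in zip(labels, preds):
        q = 1.0 / (1.0 + math.exp(-p))   # predicted P(first ligand more potent)
        t = 1.0 if y > 0 else 0.0        # true ranking label
        ent -= t * math.log(q) + (1 - t) * math.log(1 - q)
    return mse + ALPHA * ent / n
```

The example from the text behaves as intended: a label of 0.1 with a prediction of −0.1 has a tiny MSE but a wrong ranking, so the entropy term raises its loss above that of a correctly ranked prediction.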
Hyperparameter optimization was performed by grid search on the training data with inter-congeneric-series fivefold cross-validation. Considering the considerable number of training samples, 0.25 epochs was set as the unit of early stopping. In the final training process, the model is trained with a batch size of 96 samples for 5.75 epochs with a learning rate of 5 × 10^{−7}.
In the fine-tuning phase, we did not perform the auxiliary task of PBCNet. PBCNet was fine-tuned with a batch size of 30 samples for 10 epochs with a learning rate of 1 × 10^{−5}.
Sampling method for the simulation-based experiment
The sampling method we define here is as follows:
where \({\hat{y}}\) and σ^{2} are the predicted activity value and uncertainty, a is the acquisition score, N_{ite} is the number of iterations and β is a userdefined parameter adjusting the exploration–exploitation tradeoff. Different values of β correspond to three different situations:

- β = 0: a purely exploitation-oriented AL scenario in which users do not take uncertainty into consideration.
- β > 0: a hybrid AL scenario. This sampling strategy is model-oriented, or in favor of 'exploration'. Samples with greater uncertainty are more likely to be selected (meaning that more of the structure–activity relationship will be explored), so the fine-tuned model's applicability domain may be expanded and the model is expected to give more reliable predictions in the following iterations.
- β < 0: this sampling strategy is user-oriented, or in favor of 'exploitation'. In a real-world scenario, the compounds with the highest predicted activity values are selected for further experimental verification. However, compounds with greater uncertainty are more likely to be overestimated. Given this, users may treat uncertainty as a penalty term to ensure the data quality of the current iteration.
The strategies mentioned above are all simulated in our work (β = 0, 2, −2, respectively), and six independent runs with different random seeds are conducted.
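Because equation (34) is not reproduced here, the following is one plausible reading of the acquisition function; only the role of β follows the text, and the functional form (an uncertainty bonus that shrinks with the iteration count) is an assumption:

```python
def acquisition(y_hat, sigma2, beta, n_ite):
    """Hypothetical acquisition score a = y_hat + (beta / n_ite) * sigma2:
    beta > 0 rewards uncertain samples (exploration), beta < 0 penalizes
    them (exploitation) and beta = 0 ignores uncertainty entirely."""
    return y_hat + (beta / n_ite) * sigma2

def pick_batch(cands, beta, n_ite, k):
    """cands: list of (id, y_hat, sigma2); select top-k by acquisition score."""
    ranked = sorted(cands,
                    key=lambda c: acquisition(c[1], c[2], beta, n_ite),
                    reverse=True)
    return [c[0] for c in ranked[:k]]
```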
Statistics and reproducibility
The P values used to test for differences in the ablation experiments were calculated using a two-sided Wilcoxon signed-rank test. The sample size for each analysis was determined by the maximum number of eligible samples available in the respective datasets. The study design did not require blinding. The model's performance testing involves randomness in the selection of test and reference samples. To mitigate its impact, we conducted repeated experiments using controlled random seed settings (n = 10). To reproduce the primary results of this research, refer to the analytical pipeline available at https://doi.org/10.5281/zenodo.8275244 (ref. ^{50}).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this Article.
Data availability
The unprocessed training data are from BindingDB source and can be found at https://www.bindingdb.org/validation_sets/index.jsp. The test datasets used in this study are available at https://doi.org/10.5281/zenodo.8275244 (ref. ^{50}). Source data are provided with this paper.
Code availability
The source code for PBCNet is available in the Code Ocean software capsule: https://doi.org/10.24433/CO.1095515.v2 (ref. ^{51}).
References
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Ackloo, S. et al. CACHE (Critical Assessment of Computational Hit-finding Experiments): a public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding. Nat. Rev. Chem. 6, 287–295 (2022).
Nicolaou, C. A. & Brown, N. Multi-objective optimization methods in drug design. Drug Discov. Today Technol. 10, e427–e435 (2013).
Kola, I. & Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 3, 711–716 (2004).
Ekins, S., Honeycutt, J. D. & Metz, J. T. Evolving molecules using multiobjective optimization: applying to ADME/Tox. Drug Discov. Today 15, 451–460 (2010).
Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 60, 5457–5474 (2020).
Williams-Noonan, B. J., Yuriev, E. & Chalmers, D. K. Free energy methods in drug design: prospects of 'alchemical perturbation' in medicinal chemistry: miniperspective. J. Med. Chem. 61, 638–649 (2018).
Steinbrecher, T. & Labahn, A. Towards accurate free energy calculations in ligand–protein-binding studies. Curr. Med. Chem. 17, 767–785 (2010).
Wang, L. et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703 (2015).
Cournia, Z., Allen, B. & Sherman, W. Relative binding free energy calculations in drug discovery: recent advances and practical considerations. J. Chem. Inf. Model. 57, 2911–2937 (2017).
Genheden, S. & Ryde, U. The MM/PBSA and MM/GBSA methods to estimate ligandbinding affinities. Expert Opin. Drug Discov. 10, 449–461 (2015).
Srinivasan, J., Cheatham, T. E., Cieplak, P., Kollman, P. A. & Case, D. A. Continuum solvent studies of the stability of DNA, RNA and phosphoramidate–DNA helices. J. Am. Chem. Soc. 120, 9401–9409 (1998).
Kollman, P. A. et al. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc. Chem. Res. 33, 889–897 (2000).
Green, H., Koes, D. R. & Durrant, J. D. DeepFrag: a deep convolutional neural network for fragment-based lead optimization. Chem. Sci. 12, 8036–8047 (2021).
Jiménez-Luna, J. et al. DeltaDelta neural networks for lead optimization of small molecule potency. Chem. Sci. 10, 10911–10918 (2019).
McNutt, A. T. & Koes, D. R. Improving ΔΔG predictions with a multitask convolutional Siamese network. J. Chem. Inf. Model. 62, 1819–1829 (2022).
Tynes, M. et al. Pairwise difference regression: a machine learning meta-algorithm for improved prediction and uncertainty quantification in chemical search. J. Chem. Inf. Model. 61, 3846–3857 (2021).
Bissantz, C., Kuhn, B. & Stahl, M. A medicinal chemist’s guide to molecular interactions. J. Med. Chem. 53, 5061–5084 (2010).
Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proc. International Conference on Learning Representations (ICLR) (OpenReview.net, 2017); https://arxiv.org/pdf/1609.02907.pdf
Xiong, Z. et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J. Med. Chem. 63, 8749–8760 (2020).
Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).
Moon, S., Zhung, W., Yang, S., Lim, J. & Kim, W. Y. PIGNet: a physics-informed deep learning model toward generalized drug–target interaction predictions. Chem. Sci. 13, 3661–3673 (2022).
Romera-Paredes, B. & Torr, P. An embarrassingly simple approach to zero-shot learning. In Visual Attributes. Advances in Computer Vision and Pattern Recognition (Eds. Feris, R. et al.) 2152–2161 (Springer, Cham, 2015).
Zilian, D. & Sotriffer, C. A. SFCscore(RF): a random forest-based scoring function for improved affinity prediction of protein–ligand complexes. J. Chem. Inf. Model. 53, 1923–1933 (2013).
Ding, X. et al. Active learning for drug design: a case study on the plasma exposure of orally administered drugs. J. Med. Chem. 64, 16838–16853 (2021).
Kenny, P. W. Hydrogen-bond donors in drug design. J. Med. Chem. 65, 14261–14275 (2022).
Kenny, P. W. Hydrogen bonding, electrostatic potential and molecular design. J. Chem. Inf. Model. 49, 1234–1244 (2009).
Wu, Z. et al. Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking. Nat. Commun. 14, 2585 (2023).
Veličković, P. et al. Graph attention networks. In Proc. International Conference on Learning Representations (ICLR) (OpenReview.net, 2018); https://openreview.net/forum?id=rJXMpikCZ
Muegge, I. & Martin, Y. C. A general and fast scoring function for protein–ligand interactions: a simplified potential approach. J. Med. Chem. 42, 791–804 (1999).
Zheng, Z. et al. Generation of pairwise potentials using multidimensional data mining. J. Chem. Theory Comput. 14, 5045–5067 (2018).
Gilson, M. K. et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 44, D1045–D1053 (2016).
Landrum, G. RDKit: open-source cheminformatics from machine learning to chemical registration. RDKit https://rdkit.org/docs/source/rdkit.Chem.Scaffolds.rdScaffoldNetwork.html (2019).
Fischer, A., Smiesko, M., Sellner, M. & Lill, M. A. Decision making in structure-based drug discovery: visual inspection of docking results. J. Med. Chem. 64, 2489–2500 (2021).
Paggi, J. M. et al. Leveraging nonstructural data to predict structures and affinities of protein–ligand complexes. Proc. Natl Acad. Sci. USA 118, e2112621118 (2021).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (Eds. van der Walt, S. & Millman, J.) 56–61 (SciPy, 2010); https://doi.org/10.25080/Majora-92bf1922-00a
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Wilson, C. et al. Optimization of TAM16, a benzofuran that inhibits the thioesterase activity of Pks13; evaluation toward a preclinical candidate for a novel antituberculosis clinical target. J. Med. Chem. 65, 409–423 (2022).
Keylor, M. H. et al. Structure-guided discovery of aminoquinazolines as brain-penetrant and selective LRRK2 inhibitors. J. Med. Chem. 65, 838–856 (2022).
Davis, O. A. et al. Optimizing shape complementarity enables the discovery of potent tricyclic BCL6 inhibitors. J. Med. Chem. 65, 8169–8190 (2022).
Hartz, R. A. et al. Bicyclic heterocyclic replacement of an aryl amide leading to potent and kinase-selective adaptor protein 2-associated kinase 1 inhibitors. J. Med. Chem. 65, 4121–4155 (2022).
Teuscher, K. B. et al. Discovery of potent orally bioavailable WD repeat domain 5 (WDR5) inhibitors using a pharmacophore-based optimization. J. Med. Chem. 65, 6287–6312 (2022).
Lillich, F. F. et al. Structure-based design of dual partial peroxisome proliferator-activated receptor γ agonists/soluble epoxide hydrolase inhibitors. J. Med. Chem. 64, 17259–17276 (2021).
Barlaam, B. et al. Discovery of a series of 7-azaindoles as potent and highly selective CDK9 inhibitors for transient target engagement. J. Med. Chem. 64, 15189–15213 (2021).
Fallica, A. N. et al. Discovery of novel acetamide-based heme oxygenase-1 inhibitors with potent in vitro antiproliferative activity. J. Med. Chem. 64, 13373–13393 (2021).
Turner, L. D. et al. From fragment to lead: de novo design and development toward a selective FGFR2 inhibitor. J. Med. Chem. 65, 1481–1504 (2022).
Yu, J. et al. Computing the relative binding affinity of ligands based on a pairwise binding comparison network. Zenodo https://doi.org/10.5281/zenodo.8275244 (2023).
Yu, J. et al. Computing the relative binding affinity of ligands based on a pairwise binding comparison network. Code Ocean https://doi.org/10.24433/CO.1095515.v2 (2023).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (T2225002, 82273855 to M.Z.; 82130108 to X. Luo; 82204278 to X. Li), Lingang Laboratory (LG2021020102 to M.Z.), the National Key Research and Development Program of China (2022YFC3400504 to M.Z.), the China Postdoctoral Science Foundation (2022M720153 to X. Li), the SIMM-SHUTCM Traditional Chinese Medicine Innovation Joint Research Program (E2G805H to M.Z.), the Shanghai Municipal Science and Technology Major Project, and the Open Fund of the State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, China (KF202301 to M.Z.).
Author information
Contributions
J.Y., M.Z., X. Luo, X. Li, H.J. and D.W. designed the research study. J.Y. developed the method and wrote the code. G.C., X.K., J.H., D.C., G.W., R.H. and Y.L. performed the analysis. J.Y., M.Z. and X. Luo wrote the paper. Z.L., J.Y. and X. Liu developed the web service. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Sandro Cosconati and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Section 1, Figs. 1–4 and Tables 1–10.
Supplementary Data 1
The performance of PBCNet with zero-shot learning on the FEP1 set. The first column of the table denotes the different methods and the second column denotes the different metrics, where R denotes Pearson's correlation coefficient, ρ denotes Spearman's rank correlation coefficient and RMSEpw denotes the pairwise root-mean-square error. For each average metric, the best one is in bold and the suboptimal one is underlined. For PBCNet, the mean and the variance (in brackets) of the ranking metrics are reported (n = 10).
Supplementary Data 2
The performance of PBCNet with zero-shot learning on the FEP2 set. The first column of the table denotes the different methods and the second column denotes the different metrics, where R denotes Pearson's correlation coefficient, ρ denotes Spearman's rank correlation coefficient and RMSEpw denotes the pairwise root-mean-square error. For each average metric, the best one is in bold and the suboptimal one is underlined. For PBCNet, the mean and the variance (in brackets) of the ranking metrics are reported (n = 10).
Supplementary Data 3
The performance of PBCNet with few-shot learning on the FEP1 and FEP2 sets. The first column of the table denotes the different methods and the second column denotes the different metrics, where R denotes Pearson's correlation coefficient, ρ denotes Spearman's rank correlation coefficient and RMSEpw denotes the pairwise root-mean-square error. For each average metric, the best one is in bold. For PBCNet, the mean and the variance (in brackets) of the ranking metrics are reported (n = 10). Note that there are only 11 ligands in Thrombin (a test series in the FEP1 set), so the FEP1-set performance reported in the 11th column (fine-tuned with 10 ligands) is based on the remaining seven series.
Supplementary Data 4
Selection experiment results of the active-learning-equipped PBCNet for nine different datasets. The first column of the table indicates the name of the system, the second column is the number of compounds per system and the third column indicates the order of experimental synthesis of the target ligands (the compound with the highest affinity in each chemical series). Columns 4–6 indicate the order of selection of the target compounds for PBCNet with different values of β, a user-defined parameter adjusting the exploration–exploitation tradeoff (see equation (34) in the main text); the average and corresponding variance values based on six independent runs with different random seeds are reported (n = 6). For the definition of the last three indicators, refer to equations (1)–(3) in the main text.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yu, J., Li, Z., Chen, G. et al. Computing the relative binding affinity of ligands based on a pairwise binding comparison network. Nat. Comput. Sci. 3, 860–872 (2023). https://doi.org/10.1038/s43588-023-00529-9
DOI: https://doi.org/10.1038/s43588-023-00529-9