3D molecular generative framework for interaction-guided drug design

Deep generative modeling has strong potential to accelerate drug design. However, existing generative models often struggle to generalize due to limited data, leading to less innovative designs that frequently form unfavorable interactions with unseen target proteins. To address these issues, we propose an interaction-aware 3D molecular generative framework that enables interaction-guided drug design inside target binding pockets. By leveraging universal patterns of protein-ligand interactions as prior knowledge, our model achieves high generalizability with limited experimental data. Its performance is comprehensively assessed by analyzing the generated ligands for unseen targets in terms of binding pose stability, affinity, geometric patterns, diversity, and novelty. Moreover, the effective design of potential mutant-selective inhibitors demonstrates the applicability of our approach to structure-based drug design.


Model architectures of DeepICL
(a) Node and edge features
The input node features of ligand and protein atoms, $X^l_i \in \mathbb{R}^{F_l}$ and $X^p_j \in \mathbb{R}^{F_p}$, are described in the Supplementary Table 1, which lists the node features of ligand and protein atoms together with the item lists used to construct each one-hot vector. Each feature is represented as a one-hot vector over the items of the corresponding category. Only the atom type is used as a ligand atom feature, while a protein atom feature is more informative, formed by concatenating the one-hot vectors of every category. The resulting feature dimensions of a ligand and a protein atom are $F_l = 9$ and $F_p = 51$, respectively. The initial node features of the ligand and the protein are then embedded into a hidden dimension $F_h$ with a single linear layer.
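As a minimal sketch of this featurization (the category list and the layer weights below are illustrative placeholders, not the actual items of Supplementary Table 1):

```python
import numpy as np

def one_hot(item, items):
    """One-hot encode `item` over the category list `items`."""
    v = np.zeros(len(items))
    v[items.index(item)] = 1.0
    return v

# Hypothetical category list; the actual items are given in Supplementary Table 1.
LIGAND_ATOM_TYPES = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]  # F_l = 9

def embed(x, W, b):
    """Single linear layer embedding a node feature into F_h dimensions."""
    return W @ x + b

rng = np.random.default_rng(0)
F_l, F_h = len(LIGAND_ATOM_TYPES), 16
W, b = rng.normal(size=(F_h, F_l)), np.zeros(F_h)
h0 = embed(one_hot("N", LIGAND_ATOM_TYPES), W, b)  # initial ligand node embedding
```

A protein atom feature would be built the same way, concatenating one one-hot vector per category before the linear embedding.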
We used a Gaussian expansion of the distance between the $i$-th and $j$-th nodes as an edge feature. Each Gaussian is centered at the center value of a distance bin, where the total distance range is divided into $N$ bins with a spacing of $\Delta\mu$, and the smoothness of the expansion is controlled by $\gamma$. Formally,

$$e_{ij,n} = \exp\!\left(-\gamma \left(d_{ij} - \mu_n\right)^2\right), \quad n = 1, \dots, N,$$

where $d_{ij}$ is the distance between the two nodes and $\mu_n$ is the center of the $n$-th bin. Specific values of each hyper-parameter are summarized in the Supplementary Table 2.
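A NumPy sketch of this expansion, assuming bin centers $\mu_n = n \cdot \Delta\mu$ (the actual $N$, $\Delta\mu$, and $\gamma$ are given in Supplementary Table 2):

```python
import numpy as np

def gaussian_expansion(d, n_bins=32, delta_mu=0.25, gamma=10.0):
    """Expand a scalar distance d into n_bins Gaussian basis values,
    one per bin center mu_n; gamma controls the smoothness."""
    mu = np.arange(n_bins) * delta_mu          # bin centers (assumed spacing)
    return np.exp(-gamma * (d - mu) ** 2)      # shape (n_bins,)

e = gaussian_expansion(3.7)  # edge feature for a 3.7 Å distance
```

This soft binning gives a smooth, differentiable alternative to a hard one-hot distance bin.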

(b) E(3)-invariant interaction network architecture
While updating the node features of a protein and a ligand, we devise an E(3)-invariant interaction network that propagates both inter- and intra-molecular messages between the protein and the ligand. A single layer of the network consists of three steps: inter- and intra-molecular message calculation and aggregation (eqs. (2) to (5)), gate coefficient calculation (eq. (6)), and node feature update (eq. (7)). Formally,

$$m_i = \sum_{k} \phi_{\mathrm{intra}}\!\left(h^l_i, h^l_k, e_{ik}\right), \qquad m_j = \sum_{l} \phi_{\mathrm{intra}}\!\left(h^p_j, h^p_l, e_{jl}\right), \tag{2, 3}$$
$$\mu_i = \sum_{j} \phi_{\mathrm{inter}}\!\left(h^l_i, h^p_j, e_{ij}\right), \qquad \mu_j = \sum_{i} \phi_{\mathrm{inter}}\!\left(h^p_j, h^l_i, e_{ij}\right), \tag{4, 5}$$
$$z_i = \psi_{\mathrm{gate}}\!\left(h^l_i\right), \qquad z_j = \psi_{\mathrm{gate}}\!\left(h^p_j\right), \tag{6}$$
$$h^{l\prime}_i = \chi_l\!\left(h^l_i,\; z_i \odot m_i + (1 - z_i) \odot \mu_i\right), \qquad h^{p\prime}_j = \chi_p\!\left(h^p_j,\; z_j \odot m_j + (1 - z_j) \odot \mu_j\right). \tag{7}$$

For convenience, subscripts $i, k$ denote ligand atom indices, and $j, l$ denote protein atom indices. $\phi_{\mathrm{intra}}$, $\phi_{\mathrm{inter}}$, and $\psi_{\mathrm{gate}}$ are learnable models shared between the ligand and the protein. $\phi_{\mathrm{intra}}$ and $\phi_{\mathrm{inter}}$ are multi-layer perceptrons (MLPs) activated by sigmoid linear units (SiLUs) [1], while $\psi_{\mathrm{gate}}$ is a single linear layer followed by a sigmoid function. The intramolecular message $m$ and the intermolecular message $\mu$ are linearly interpolated by the gate coefficients $z_i$ and $z_j$ and then used to update the current node hidden states. The hidden vectors of the ligand and the protein, $h^l$ and $h^p$, are updated by the gated recurrent units (GRUs) [2] $\chi_l$ and $\chi_p$, yielding the updated hidden vectors $h^{l\prime}$ and $h^{p\prime}$, respectively.
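A single-node sketch of the gating and update steps (the MLPs are omitted, the gate input and all weights are simplified placeholders, and the GRU is a minimal hand-written cell, not the trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(h, x, Wr, Wu, Wc):
    """Minimal GRU update chi(h, x): h is the current hidden state,
    x the interpolated message."""
    hx = np.concatenate([h, x])
    r = sigmoid(Wr @ hx)                       # reset gate
    u = sigmoid(Wu @ hx)                       # update gate
    c = np.tanh(Wc @ np.concatenate([r * h, x]))
    return u * h + (1.0 - u) * c

def update_node(h, m_intra, mu_inter, w_gate, Wr, Wu, Wc):
    """Interpolate intra-/inter-molecular messages with a gate coefficient z,
    then update the hidden state with a GRU."""
    z = sigmoid(w_gate @ h)                    # scalar gate coefficient
    x = z * m_intra + (1.0 - z) * mu_inter     # linear interpolation
    return gru_cell(h, x, Wr, Wu, Wc)

rng = np.random.default_rng(1)
F = 8
h = rng.normal(size=F)
m, mu = rng.normal(size=F), rng.normal(size=F)
w_gate = rng.normal(size=F)
Wr, Wu, Wc = (rng.normal(size=(F, 2 * F)) for _ in range(3))
h_new = update_node(h, m, mu, w_gate, Wr, Wu, Wc)
```

The gate lets each node learn how much to weight pocket-side context versus its own molecule when updating its state.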
(c) Atom type and position prediction model
After the node features are updated through multiple layers of E(3)-invariant interaction networks, they are joined with a latent vector, z. Then, only the protein node features are additionally joined with an interaction condition, I. The resulting features pass through fully connected layers whose outputs have the dimensions of the type and distance distributions, respectively. We used SiLU as the activation function, and the final prediction outputs go through a softmax function for normalization.
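A minimal sketch of such a prediction head (dimensions and weights are illustrative; the protein-side head would additionally concatenate the interaction condition I):

```python
import numpy as np

def silu(x):
    """Sigmoid linear unit activation."""
    return x / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def type_head(h_node, z_latent, W1, W2):
    """Join a node feature with the latent vector, pass it through
    SiLU-activated fully connected layers, and normalize with a softmax."""
    x = np.concatenate([h_node, z_latent])
    return softmax(W2 @ silu(W1 @ x))

rng = np.random.default_rng(2)
F_h, F_z, H, T = 8, 4, 16, 9                  # hidden, latent, MLP, and type dims
W1 = rng.normal(size=(H, F_h + F_z))
W2 = rng.normal(size=(T, H))
p_type = type_head(rng.normal(size=F_h), rng.normal(size=F_z), W1, W2)
```

The distance head has the same shape, with the output dimension set to the number of distance bins.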

Training details
(a) Loss scheduling
A decoder that generates an output in an autoregressive fashion can be susceptible to the KL-vanishing problem, which might cause undesired model behaviors such as mode collapse [3,4]. To mitigate this problem, one can employ an annealing schedule for the KL divergence term, and various annealing strategies have been proposed [3-6]. In our study, we adopted the simplest monotonic KL annealing, gradually increasing the weight of the KL divergence term up to a predefined value during training. Formally, the weight β(t) at the t-th epoch is scheduled as

$$\beta(t) = \min\left(\beta_i + \eta\, t,\; \beta_f\right),$$

where $\beta_i$ is the initial weight, $\beta_f$ is the final weight, and $\eta$ is the weight annealing factor. Their specific values used in this work can be found in the Supplementary Table 2. Thus, our final loss function can be written as

$$\mathcal{L}(t) = \mathcal{L}_{\mathrm{recon}} + \beta(t)\, \mathcal{L}_{\mathrm{KL}}.$$

In addition, we used the ReduceLROnPlateau scheduler implemented in PyTorch [7], which reduces the learning rate if there is no improvement in the validation loss for a fixed number of training epochs, until the learning rate reaches a minimum threshold value. Specific hyper-parameters are also summarized in Supplementary Table 2.
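The monotonic schedule can be sketched as follows, assuming a linear ramp (the actual β_i, β_f, and η values are in Supplementary Table 2):

```python
def beta_schedule(t, beta_i=0.0, beta_f=1.0, eta=0.01):
    """Monotonic KL annealing: increase the KL weight linearly by eta per
    epoch from beta_i until it saturates at the final value beta_f."""
    return min(beta_i + eta * t, beta_f)
```

In a PyTorch training loop, the epoch loss would then be assembled as `recon_loss + beta_schedule(epoch) * kl_loss`, and `torch.optim.lr_scheduler.ReduceLROnPlateau` (with its `patience` and `min_lr` arguments) handles the learning-rate reduction.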

Sampling details
(a) Controlling the randomness
In conditional deep generative models, ensuring diversity and novelty in the sampled outputs is an important concern. One simple way to increase both is to introduce randomness into the sampling process. In our study, we control the randomness of the proposed ligands by applying a temperature factor and a roto-translational noise to the sampling process of DeepICL. Although the randomness of the sampling process affects validity, our model produces approximately 99% chemically valid ligands under the experimental settings provided in Supplementary Table 2.
i. Temperature factor
As in the work of G-SchNet [8], we use an additional temperature factor that allows for controlling the randomness of the generation. We reformulate eqs. (11) and (12) in the main text (Section 4.2.5), which define the likelihood of the next atom type and its position during sampling, to introduce temperature factors $\tau_{\mathrm{type}}$ and $\tau_{\mathrm{pos}}$:

$$\tilde{p}_{\mathrm{type}}(x) = \frac{1}{a}\, p_{\mathrm{type}}(x)^{1/\tau_{\mathrm{type}}}, \qquad \tilde{p}_{\mathrm{pos}}(x) = \frac{1}{b}\, p_{\mathrm{pos}}(x)^{1/\tau_{\mathrm{pos}}},$$

where $a$ and $b$ are normalization constants. Increasing $\tau_{\mathrm{type}}$ and $\tau_{\mathrm{pos}}$ smooths the predicted probability distributions, adding more randomness to the next-atom prediction, whereas small values produce sharper distributions, leading to less randomness.
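For the categorical (atom type) case, the temperature rescaling amounts to:

```python
import numpy as np

def apply_temperature(p, tau):
    """Rescale a categorical distribution as p^(1/tau) and renormalize;
    tau > 1 smooths the distribution, tau < 1 sharpens it."""
    q = p ** (1.0 / tau)
    return q / q.sum()

p = np.array([0.7, 0.2, 0.1])
smooth = apply_temperature(p, 4.0)    # closer to uniform: more randomness
sharp = apply_temperature(p, 0.25)    # concentrated on the mode: less randomness
```

Sampling the next atom type from `smooth` instead of `p` increases the chance of leaving the most likely mode, which is the mechanism behind the diversity gain.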
ii. Adding roto-translational noise
We adopt an additional method that introduces random noise into the initial state of the sampling process. The noise depends on the ligand design task: for de novo ligand design, a translational noise is added to the given center-of-mass, whereas a roto-translational noise is added to the initial ligand core structure for the ligand elaboration task. The translational noise is simply a vector sampled from a 3D Gaussian distribution centered at the origin with variance σ_t, a hyper-parameter that controls the randomness of the translational noise; the center-of-mass is then shifted by this vector.
For the extra rotational noise, the rotation axis is sampled from a uniform distribution, and the rotation angle is sampled from a Gaussian distribution centered at zero with variance σ_r, a hyper-parameter that controls the randomness of the rotational noise. Increasing σ_t and σ_r scatters the initial state with greater variance, adding more randomness to the ligand sampling process. The specific value of each hyper-parameter used in the experiments is summarized in the Supplementary Table 2.
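The two noise sources can be sketched as follows (a minimal NumPy implementation; building the rotation matrix from the sampled axis and angle via Rodrigues' formula is our reading of the described procedure):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_translation(sigma_t):
    """Translational noise: a 3D Gaussian vector centered at the origin."""
    return rng.normal(0.0, sigma_t, size=3)

def random_rotation(sigma_r):
    """Rotational noise: a uniformly random axis and a Gaussian angle,
    turned into a rotation matrix via Rodrigues' formula."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)               # uniform direction on the sphere
    theta = rng.normal(0.0, sigma_r)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])   # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def perturb_core(coords, sigma_t, sigma_r):
    """Apply roto-translational noise to a ligand core about its center-of-mass."""
    com = coords.mean(axis=0)
    R = random_rotation(sigma_r)
    return (coords - com) @ R.T + com + random_translation(sigma_t)
```

Because the transform is rigid, the internal geometry of the core is preserved; only its placement in the pocket is randomized.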

(b) Determining the pocket coefficient
We introduce a pocket coefficient λ in eqs. (11) and (12) in the main text (Section 4.2.5), which controls the weight of the pocket-side prediction during atom sampling. When the space where the next atom will be added is located far from the pocket, the dependence of ligand atom types and positions on the pocket diminishes. To reflect this, we let the distance between the ligand and the pocket determine the pocket coefficient. We average the distances between the ligand atom-of-interest and its k-nearest pocket atoms as described in eq. (12). The pocket coefficient λ is then determined by the function defined in eq. (13), which decays as the average distance d increases in the region where d is larger than 2.5 Å. We set its decay coefficient so that λ ≃ 1 at d = 5.0 Å.

Reference-free interaction conditions
We used specific criteria to designate the interaction types of protein atoms when no reference ligand information is available. Atoms involved in salt-bridge anions and cations, and hydrogen-bond acceptors and donors, are selected using the SMARTS patterns summarized in the Supplementary Table 3. Since only particular motifs appearing on amino acid chains are known to carry cations or anions, we set the SMARTS patterns to cover them fully, including their resonance structures. Halogen atoms, as well as carbon atoms surrounded only by carbon or hydrogen atoms, are classified as hydrophobic atoms, and atoms within aromatic rings are classified as aromatic atoms.
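The hydrophobic and aromatic rules can be expressed directly in code (a sketch; the SMARTS-based donor/acceptor and salt-bridge selections of Supplementary Table 3 are omitted, and we assume the neighbor condition applies only to carbons):

```python
HALOGENS = {"F", "Cl", "Br", "I"}

def classify_atom(element, neighbor_elements, in_aromatic_ring):
    """Classify an atom as 'aromatic', 'hydrophobic', or None following
    the element-based rules described above."""
    if in_aromatic_ring:
        return "aromatic"
    if element in HALOGENS:
        return "hydrophobic"
    if element == "C" and all(n in ("C", "H") for n in neighbor_elements):
        return "hydrophobic"
    return None
```

For example, a methyl carbon (neighbors C, H, H, H) is hydrophobic, while a carbonyl carbon (an oxygen neighbor) is not.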

Molecular dynamics simulation details
During the MD simulations performed in this work, topology and parameter files for the ligands were generated with the GAFF-2.11 force field [9] via the OpenMM Toolkit [10]. Protein-ligand structures were solvated in a cubic box of TIP3P water [11], extending 10 Å from the protein to provide padding. The systems were neutralized by adding Na+ and Cl- ions. Periodic boundary conditions were applied in the NPT ensemble using the Langevin thermostat [12]. To simulate the interactions, we employed the Amber FF14SB force field [13]. Equilibration and production runs were performed using the OpenMM Toolkit. The systems first underwent energy minimization followed by 1 ns of equilibration. Production runs were conducted at 303.5 K and 1 bar with a 2 fs integration time step. Each protein-ligand complex underwent 10 ns production runs, initiated from the same initial structure but with differently randomized initial velocities. To assess the stability of ligand binding to the target protein, we aligned the protein backbone of each frame on its heavy-atom coordinates using the MDTraj software [14], so as to capture only the ligand movement, and then calculated the ligand RMSDs. For each protein-ligand complex, the ligand RMSD was averaged over the entire 10 ns simulation, yielding the averaged ligand RMSD value and its variance.
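The alignment-then-RMSD analysis can be sketched in NumPy (a simplified stand-in for the MDTraj workflow, using the Kabsch algorithm for the backbone superposition):

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rotation aligning centered point set P onto centered Q."""
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])                 # guard against reflections
    return Vt.T @ D @ U.T

def ligand_rmsd(ref_bb, bb, ref_lig, lig):
    """Align a frame on the protein backbone, then compute the ligand RMSD."""
    bb_c, ref_bb_c = bb.mean(axis=0), ref_bb.mean(axis=0)
    R = kabsch(bb - bb_c, ref_bb - ref_bb_c)
    lig_aligned = (lig - bb_c) @ R.T + ref_bb_c
    return np.sqrt(((lig_aligned - ref_lig) ** 2).sum(axis=1).mean())

def trajectory_stats(ref_bb, ref_lig, frames):
    """Average ligand RMSD and its variance over all trajectory frames."""
    rmsds = np.array([ligand_rmsd(ref_bb, bb, ref_lig, lig) for bb, lig in frames])
    return rmsds.mean(), rmsds.var()
```

Aligning on the backbone first removes rigid-body drift of the whole complex, so the reported RMSD reflects only the ligand's motion within the pocket.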

Additional discussions and examples of an interaction-aware conditioned ligand elaboration
In Section 2.2 of the main text, we discussed only the first case, in which the target was BMP1; here we continue the comprehensive study with the other two examples. The second example is the elaboration of 2-(oxan-3-yloxy)oxane, a skeletal structure of a disaccharide, to fit the fibroblast growth factor-1 (FGF1, the middle column of Main Figure 2). While the original ligand forms multiple hydrogen bonds with neighboring backbone atoms and the polar side chain of ASN9 (Main Figure 2(c-2)), DeepICL successfully designed a ligand with an interaction pattern identical to that of the original complex by generating phosphate and carboxylate groups instead of a sulfate (Main Figure 2(d-2)). Interestingly, the generated ligand formed a hydrogen bond with LYS119 via its equatorial hydroxyl group, while the original ligand possesses an axial methoxy group that is directed away from LYS119.
For the third case, we generated ligands from a benzene placed inside the pocket of dihydrofolate reductase (DHFR, the right column of Main Figure 2). The original complex shows a complicated interaction pattern, including multiple salt bridges and π-π stackings (Main Figure 2(c-3)). The designed ligand contains a guanidine moiety similar to that of the original ligand, forming a salt bridge and hydrogen bonds with surrounding amino acids simultaneously (Main Figure 2(d-3)). Additionally, the pyruvate-like group of the sampled ligand could form both a hydrogen bond and a salt bridge with ARG31. However, the sampled ligand could not interact with HIS28, which forms a salt bridge with the original ligand. The interaction similarity distributions are clearly multi-modal, which arises from the construction of particular motifs that can form the multiple interactions present in the original complex. In this case, whether the model constructs a hetero-aromatic ring near PHE30 and ASP26 of DHFR determines the bimodal distribution in Main Figure 2(e-3).
In addition, we provide five more examples of interaction-aware conditioned ligand elaboration in Supplementary Figure 1, covering human sialidase (PDB ID: 2f0z), beta-lactamase Mox-1 (PDB ID: 4wbg), ubiquitin ligase (PDB ID: 6do4), abscisic acid receptor (PDB ID: 6nwc), and MAP kinase-activated protein kinase 2 (PDB ID: 3fpm). With DeepICL, we sampled 100 molecules starting from the core structures of the original ligands, using the interaction conditions extracted from the original complexes.
Notably, the similarity distributions diverge dramatically in the case of human sialidase (PDB ID: 2f0z). However, one might be concerned about the chemical diversity of the generated ligands, since the structure of the top-1 ligand, shown in the first row of Supplementary Figure 1, is very similar to that of the original ligand. We therefore provide additional structures of ligands generated to target human sialidase in Supplementary Figure 2. While sharing the common core structure, an oxane, the generated ligands are highly diverse.