A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets

Structure-based generative chemistry is essential in computer-aided drug discovery by exploring a vast chemical space to design ligands with high binding affinity for targets. However, traditional in silico methods are limited by computational inefficiency, while machine learning approaches face bottlenecks due to auto-regressive sampling. To address these concerns, we have developed a conditional deep generative model, PMDM, for 3D molecule generation fitting specified targets. PMDM consists of a conditional equivariant diffusion model with both local and global molecular dynamics, enabling PMDM to consider the conditioned protein information to generate molecules efficiently. The comprehensive experiments indicate that PMDM outperforms baseline models across multiple evaluation metrics. To evaluate the applications of PMDM under real drug design scenarios, we conduct lead compound optimization for SARS-CoV-2 main protease (Mpro) and Cyclin-dependent Kinase 2 (CDK2), respectively. The selected lead optimization molecules are synthesized and evaluated for their in-vitro activities against CDK2, displaying improved CDK2 activity.


A Notations used in this paper
We provide notations used in this paper for easier reading.

Notations Descriptions t Time step x
The atom representation r The position of the atoms A adjacent matrix G0 The ground truth molecule geometry G1...T The latent information via diffusion βt A fixed variance schedule The Euclidean distance between atom i and atom j eij Edge between atom i and atom j σ 2 t

User defined variance σv
The standard deviation outputted by VAE encoder µv The mean outputted by VAE encoder ϵ θ Parameterized noise ϵ ϵ = [ϵ a , ϵ r ] where ϵ a ∼ N (0, I),ϵ r ∼ N (0, I) s θ Parameterized stein score µ θ Parameterized mean z Sample from N (0, I) ξ Sample from N (0, I) ϕ Neural networks q() Distribution of diffusion process p() Distribution of reverse process Supplementary Table 1.Notations used in this paper.

B Proof of the diffusion model
We provide proofs for the derivation of several properties in the diffusion model.For a detailed explanation and discussion, we refer readers to [1].

B.1 Marginal distribution of the diffusion process
In the diffusion process, we have the marginal distribution of the data at any arbitrary time step t in a closed form: Recall the posterior q (G t | G t−1 ) in Eq.1 (main document), we can obtain G t using the reparameterization trick.A property of the Gaussian distribution is that if we add N (0, σ 2 1 I) and N (0, σ 2 2 I), the new distribution is N (0, (σ 2 1 + σ 2 2 )I) where α t = 1 − β t , ϵ and ε are sampled from independent standard Gaussian distributions.

B.2 The parameterized mean µ θ
A learned Gaussian transitions p θ (G t−1 | G t ) is devised to approximate the q (G t−1 | G t ) of every time step: ) , σ 2 t I .µ θ is parameterized as follows: The distribution q (G t−1 | G t ) can be expanded by Bayes' rule: where C (G t , G 0 ) is a constant.We can find that q (G t−1 | G t ) is also a Gaussian distribution.We assume that: where βt = 1/ αt βt + 1− ᾱt G 0 .From Eq. 2, we have G t == √ ᾱt G 0 + √ 1 − ᾱt ε.We take this into μ: µ θ is designed to model μ.Therefore, µ θ has the same formulation as μ but parameterizes ϵ:

B.3 Decompose atomic coordinates to pairwise distances
In order to achieve the equivariance of the atomic coordinates in 3D space, we attempt to decompose them to pairwise distances.
Where g : R n×3 → R |E|×1 denotes a function that maps the n atomic coordinates to |E| interatomic distances and R |E| → R is a neural network that estimates the log density of a molecule based on the interatomic distances d.

B.4 The ELBO objective
It is hard to directly calculate conditional log likelihood of the data.Instead, we can derive its ELBO objective for optimizing.For simplicity, we denote the 3D ligand geometry G L as G Then we further derive the conditional ELBO objective: C Experiment details

C.1 Implementation
In this section, we introduce the implementation details of PMDM.Both the local and global ligand EGNNs of PMDM consist of 3 layers and the hidden feature dimension is 128.The protein SchNet of PMDM consists of 3 layers and the hidden feature dimension is 128.We set the local radius τ l as 3 Å which could include almost all the chemical bonds and the global radius τ g as 6 Å.We trained PMDM by Adam optimizer on 64 V100 cards for 500 epochs with a learning rate of 0.001.The batch size is 16.We took around 7 hours to train PMDM.

C.2 Ablation study
We remove the cross-attention layer, local equivariant kernel and global equivariant kernel to observe the performance degradation of PMDM.We observe that the Vina score will drop if we remove any module.

D Algorithms of PMDM
We describe the training and sampling algorithms in this section.To ensure the invariance of ϵ, we introduce zero center of mass (COM) from the previous work [2] to achieve invariance for p(G T ).By extending the approximation of p(G T ) from a standard Gaussian to an isotropic Gaussian, the ϵ is invariant to rotations and translations around the zero COM.Algorithm 1 and algorithm 2 display the complete training procedure and sampling procedure.For the training process, we diffuse the real ligand data and use the ELBO Supplementary Figure 1.Results comparison of PMDM and PMDM w.o.CA for generating 40000 molecules for SARS-CoV-2 main protease (M pro ) (n=10000 in each group; center line, median; box limits, upper and lower quartiles; upper line, maxima; whiskers, lower line, minima; 1.5 × interquartile range;).Source data are provided with this paper.
objective of the loss function to train the model.For the sampling process, the next less chaotic state The final molecule G 0 is generated by progressively sample G t−1 for T times.Algorithm 3 demonstrates the sampling process given seed fragments which we employ this method for lead optimization.The seed fragment data will be diffused to be concatenated with the unknown (sampling) part, and we only update this part while keeping the seed fragment data fixed.

E.3 Clash analysis
Since the pocket spatial information is treated as the condition, we keep the protein position fixed during the update of each layer of the equivariant kernel.The fixed protein positions can ensure the relative distance between the atoms of ligand and atoms of proteins, which could avoid the ligand clashes with the protein.
We select three examples in Figure 7.
The generated molecules shown in the left part of

E.4 Samples generated by PMDM
We provide more visualizations of generated samples which are trained on CrossDocked dataset in Figure 8 and Figure 9.

Table 2 .
Interestingly, QED and SA increase if we remove the cross-attantion (CA) layer, indicating that PMDM can concentrate on the molecule itself if it fuses less protein information.Compared with PMDM without local kernel, PMDM surpasses it on all matrics except for the sampling speed.All the deformed PMDM exceed the test set.We do not report the results of PMDM without global kernel since we find that PMDM cannot generate valid molecules if we remove this indispensable module.Albation studies.
Since PMDM without CA performs better than PMDM on QED and SA, we further investigate whether PMDM without CA can generate molecules binding to the SARS-CoV-2 main protease (M pro ) with high SA scores and rational medicinal property values.We also generate 40000 molecules and calculate their Vina, SA, QED, logP and Lipinski rules to compare with PMDM.As demonstrated in Figure1, we found that PMDM with CA outperformed PMDM without CA on Vina and SA, indicating that it can generate more potent and feasible inhibitors for M pro .On the other hand, PMDM without CA achieved higher QED scores than PMDM with CA, suggesting that it can generate more drug-like molecules.However, both methods generated most molecules with reasonable logP values, and PMDM with CA generated more molecules that satisfied the Lipinski rules than PMDM without CA.Therefore, we conclude that PMDM with CA is a more effective method for generating molecules that can target M pro .
zero COM, move G P to zero COM of G L The seed fragment G s , ligand encoder SchNet l , protein encoder SchNetp, co-attention later ϕcoa, global equivariant neural networks ϕg, local neural networks ϕ l Output: the molecular coordinates R and atom types A Input: Supplementary Figure5.Shape distribution of generated (left) and test set (right) molecules, which is visualized using the Normalized Principal Moment of Inertia ratios(NPR) descriptors.

Table 5 .
We retrained PMDM on 99k samples (8 V100 GPU cards for 3 days) and evaluated the performance on 1k samples which have only 20% sequence similarity.We generate 100 molecules for each protein in the test set, which takes around 4 days.The results are presented in the following table.PMDM still generates molecules with high affinities with specific proteins.Supplementary Table3.The results of molecules generated by PMDM on 1K test pairs.We listed 12 CDK inhibitors in the following table that have been published during the discovery of CDk2 inhibitors since the 1990s.The CDK2 and CDK1 biochemical activities of the selected CDK2 inhibitors are ether tested internally using the CDK2 and CDK1 biochemical assay protocols described in this paper when they are commercially available or cited from the original publications when they are not commercially available.Representative chemotypes and biochemical activities in the discovery of selective CDK2 inhibitors PMDM CVAE AR-SBDD DiffSBDD Test setSupplementary Figure6.The Plane of Best Fit (PBF) descriptor values of all the methods and test set (n=10000 for all the methods, n=100 for test set; center line, median; box limits, upper and lower quartiles; upper line, maxima; whiskers, lower line, minima; 1.5 × interquartile range;).Source data are provided with this paper.F.1 Performance of PMDM on 1K test pairs F.2 Chemical property results of out-of-distribution moleculesWe report the chemical property values of out-of-distribution molecules in the section Analysis of PMDM on chemical space distribution.SupplementaryTable 4. The chemical property values of OOD molecules.QED ↑ SA ↑ Lipinski ↑ LogP 0.448 ± 0.16 0.628 ± 0.19 4.713 ± 0.73 1.065 ± 2.22 F.3 Representative chemotypes and biochemical activities in the discovery of selective CDK2 inhibitors
G.1.1 Experimental procedures and spectroscopic data of the selected compounds General Procedures.All reactions were carried out under a nitrogen atmosphere with dry solvents under anhydrous conditions, unless otherwise noted.No unexpected or unusually high safety hazards were encountered.Low-resolution mass spectra (LC-MS) was used to monitor progression of reactions and was recorded on Waters ACQUITY UPLC with SQ Detectors using a Waters CORTECS C18 column (2.7 µm, 4.6 × 30 mm) using a gradient elution method: solvent A: 0.1% formic acid in water; solvent B: 0.1% formic acid in CH3CN; 5% solvent B to 95% solvent B in 1.0 min, hold 1.0 min, equilibration to 5% solvent B in 0.5 min; flow rate: 1.8 mL/min; column temperature 40 °C.Purification of final products by Prep-HPLC were carried out on Waters Prep-HPLC with QDA detector, using Xbridge C18 column (5 µm, 150 × 19 mm) using a gradient elution method.1H NMR spectra were recorded on a Bruker Ascend 400 spectrometer.Chemical shifts are expressed in parts per million (ppm, δ units).Coupling constants are in units of hertz (Hz).Splitting patterns describe apparent multiplicities and are designated as s (singlet), d (doublet), t (triplet), q (quartet), quint (quintet), m (multiplet), br (broad