Abstract
Generative deep learning methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a denoising diffusion framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively. Importantly, we demonstrate that GCDM’s generative denoising process enables the model to generate a significant proportion of valid and energetically stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Code and data are freely available on GitHub.
Introduction
Generative modeling has recently been experiencing a renaissance driven largely by denoising diffusion probabilistic models (DDPMs). At a high level, DDPMs are trained by learning how to denoise a noisy version of an input example. In the context of computer vision, for example, Gaussian noise is successively added to an input image, and a generative model of images is trained to distinguish the original image’s feature signal from the noise subsequently added to it. Once a model achieves this, we can use it to generate new images by first sampling multivariate Gaussian noise and then iteratively removing, from the current state of the image, the noise predicted by the model. This classic formulation of DDPMs has achieved significant results in the space of image generation^{1}, audio synthesis^{2}, and even meta-learning by learning how to conditionally generate neural network checkpoints^{3}. Furthermore, such an approach to generative modeling has expanded its reach to encompass scientific disciplines such as computational biology^{4,5,6,7,8}, computational chemistry^{9,10,11}, and computational physics^{12}.
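The sampling loop sketched above can be illustrated with a toy one-dimensional example. This is a hedged sketch of the standard DDPM reverse process, not the paper's model: the noise predictor below is a hand-written stand-in for a trained network (it assumes the clean signal is zero), and the schedule values are illustrative.

```python
import math
import random

# Toy DDPM reverse-sampling loop: start from Gaussian noise and repeatedly
# subtract the model-predicted noise, following the standard DDPM update.
T = 50
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)

def predicted_noise(x, t):
    # Stand-in for a trained denoiser eps_theta(x, t): it assumes the clean
    # signal is 0, so all of x (rescaled) is treated as noise.
    return x / math.sqrt(1.0 - alpha_bars[t])

def sample():
    x = random.gauss(0.0, 1.0)  # z_T: pure Gaussian noise
    for t in reversed(range(T)):
        eps = predicted_noise(x, t)
        # Mean of p(z_{t-1} | z_t) under the standard DDPM parametrization.
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x += math.sqrt(betas[t]) * random.gauss(0.0, 1.0)
    return x
```

With this stand-in predictor, the loop deterministically drives samples toward the assumed clean signal (zero), mirroring how a trained denoiser pulls samples toward the data distribution.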
Concurrently, the field of geometric deep learning^{13} has seen a sizeable increase in research interest lately, driven largely by theoretical advances within the discipline^{14} as well as by applications of such methodology^{15,16,17,18}. Notably, such applications even include what is considered by many researchers to be a solution to the problem of predicting 3D protein structures from their corresponding amino acid sequences^{19}. Such an outcome arose, in part, from recent advances in sequence-based language modeling efforts^{20,21} as well as from innovations in equivariant neural network modeling^{22}.
However, it is currently unclear how the expressiveness of geometric neural networks impacts the ability of generative methods that incorporate them to faithfully model a geometric data distribution. In addition, it is currently unknown whether diffusion models for 3D molecules can be repurposed for important, realworld tasks without retraining or finetuning and whether geometric diffusion models are better equipped for such tasks. Toward this end, in this work, we provide the following findings:

Neural networks that perform message-passing with geometric quantities enable diffusion generative models of 3D molecules to generate valid and energetically stable large molecules, whereas non-geometric message-passing networks fail to do so; we introduce key computational metrics to support these findings.

Physical inductive biases such as invariant graph attention and molecular chirality both play important roles in generating valid 3D molecules via diffusion.

Our newly-proposed Geometry-Complete Diffusion Model (GCDM—see Fig. 1), which is the first diffusion model to incorporate the above insights and achieve the ideal type of equivariance for 3D molecule generation (i.e., SE(3) equivariance), establishes state-of-the-art (SOTA) results for conditional 3D molecule generation on the QM9 dataset as well as for unconditional molecule generation on the GEOM-Drugs dataset of large 3D molecules, for the latter more than doubling PoseBusters validity rates; generates more unique and novel small molecules for unconditional generation on the QM9 dataset; and achieves better Vina energy scores and more than twofold higher PoseBusters validity rates^{23} for protein-conditioned 3D molecule generation.

We further demonstrate that geometric diffusion models such as GCDM can consistently perform 3D molecule optimization for molecular stability as well as for specific molecular properties without requiring any retraining, whereas non-geometric diffusion models cannot.
Results and discussion
Unconditional 3D molecule generation—QM9
The first dataset used in our experiments, the QM9 dataset^{24}, contains molecular properties and 3D atom coordinates for 130k small molecules. Each molecule in QM9 can contain up to 29 atoms after hydrogen atoms are imputed for each molecule following dataset postprocessing as in ref. ^{25}. For the task of 3D molecule generation, we train GCDM to unconditionally generate molecules by producing atom types (H, C, N, O, and F), integer atom charges, and 3D coordinates for each of the molecules’ atoms. Following ref. ^{26}, we split QM9 into training, validation, and test partitions consisting of 100k, 18k, and 13k molecule examples, respectively.
Metrics
We measure each method’s average negative log-likelihood (NLL) over the corresponding test dataset, for methods that report this quantity. Intuitively, a method achieving a lower test NLL compared to other methods indicates that the method can more accurately predict denoised pairings of atom types and coordinates for unseen data, implying that it has fit the underlying data distribution more precisely than other methods. In terms of molecule-specific metrics, we adopt the scoring conventions of ref. ^{27} by using the distance between atom pairs and their respective atom types to predict bond types (single, double, triple, or none) for all but one baseline method (i.e., ENF). Subsequently, we measure the proportion of generated atoms that have the right valency (atom stability—AS) and the proportion of generated molecules for which all atoms are stable (molecule stability—MS). To offer additional insights into each method’s behavior for 3D molecule generation, we also report the validity (Val) of the generated molecules as determined by RDKit^{28}, the uniqueness of the generated molecules overall (Uniq), and whether the generated molecules pass each of the de novo chemical and structural validity tests (i.e., sanitizable, all atoms connected, valid bond lengths and angles, no internal steric clashes, flat aromatic rings and double bonds, low internal energy, correct valence, and kekulizable) proposed in the PoseBusters software suite^{23} and adopted by recent works on molecule generation tasks^{29,30}. Each method’s results in the top half (bottom half) of Table 1 are reported as the mean and standard deviation (mean and Student’s t-distribution 95% confidence error intervals) (±) of each metric across three (five) test runs on QM9, respectively.
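The atom-stability and molecule-stability metrics described above can be sketched as follows. This is a hedged illustration, assuming a simplified valence table for QM9's element set (H, C, N, O, F); the benchmark's actual table (which handles charges and multiple allowed valences per element) is more involved.

```python
# An atom is counted "stable" when the sum of its inferred bond orders is an
# allowed valence for its element; a molecule is stable when all atoms are.
# This valence table is a simplifying assumption, not the benchmark's exact one.
ALLOWED_VALENCES = {"H": {1}, "C": {4}, "N": {3}, "O": {2}, "F": {1}}

def stability(atom_types, summed_bond_orders):
    """Return (fraction of stable atoms, whether the whole molecule is stable)."""
    stable = [
        order in ALLOWED_VALENCES[element]
        for element, order in zip(atom_types, summed_bond_orders)
    ]
    return sum(stable) / len(stable), all(stable)

# Methane: a carbon with four single bonds to hydrogens is fully stable.
as_frac, mol_stable = stability(["C", "H", "H", "H", "H"], [4, 1, 1, 1, 1])
```

Note how a single unstable atom (e.g., an oxygen with only one bond) makes the whole molecule count as unstable, which is why MS drops much faster than AS for large molecules.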
Baselines
Besides including a reference point for molecule quality metrics using QM9 itself (i.e., Data), we compare GCDM (a geometry-complete DDPM, i.e., GC-DDPM) to 10 baseline models for 3D molecule generation, each trained and tested using the same corresponding QM9 splits for fair comparisons: G-SchNet^{31}; Equivariant Normalizing Flows (ENF)^{27}; Graph Diffusion Models (GDM)^{25} and their variations (i.e., GDM-aug); Equivariant Diffusion Models (EDM)^{25}; Bridge and Bridge + Force^{32}; latent diffusion models (LDMs) such as GraphLDM and its variation GraphLDM-aug^{33}; as well as the state-of-the-art GeoLDM method^{33}. Note that we specifically include these baselines as representative implicit bond prediction methods, for which bonds are inferred using their generated molecules’ atom types and interatomic distances, in contrast to explicit bond prediction approaches such as those of refs. ^{34,35}, for fair comparisons with our method. For each of these baseline methods, we report results as curated by refs. ^{32,33}. We further include two GCDM ablation models to more closely analyze the impact of certain key model components within GCDM: GCDM without chiral and geometry-complete local frames \({{{{{{{{\mathcal{F}}}}}}}}}_{ij}\) (i.e., GCDM w/o Frames) and GCDM without scalar message attention (SMA) applied to each edge message (i.e., GCDM w/o SMA). In the “Methods” section as well as Supplementary Methods A.2 and Supplementary Note B, we further discuss GCDM’s design, hyperparameters, and optimization with these model configurations.
Results
In the top half of Table 1, we see that GCDM achieves the highest percentage of probable (NLL), valid, and unique molecules compared to all baseline methods, with AS and MS results marginally lower than those of GeoLDM yet with lower standard deviations. In the bottom half of Table 1, where we re-evaluate GCDM and GeoLDM using 5 sampling runs and report 95% confidence intervals for each metric, GCDM generates 1.6% more RDKit-valid and unique molecules and 5.2% more novel molecules compared to GeoLDM, all while offering the best reported NLL for the QM9 test dataset. This result indicates that although GeoLDM offers novelty rates close to parity (i.e., 50%), GCDM nearly matches the stability and PB-validity rates of GeoLDM while yielding novel molecules nearly 60% of the time on average, suggesting that GCDM may prove more useful for accurately exploring the space of novel yet valid small molecules. Our ablation of SMA within GCDM demonstrates that, to generate stable 3D molecules, GCDM heavily relies on being able to perform a lightweight version of fully-connected graph self-attention^{20}, which suggests avenues of future research that will be required to scale up such generative models to large biomolecules such as proteins. Additionally, removing geometric local frame embeddings from GCDM reveals that the inductive biases of molecular chirality and geometry-completeness are important contributing factors in GCDM achieving these SOTA results. Figure 2 illustrates PoseBusters-valid examples of QM9-sized molecules generated by GCDM.
Property-conditional 3D molecule generation—QM9
Baselines
Towards the practical use case of conditional generation of 3D molecules, we compare GCDM to existing E(3)-equivariant models, EDM^{25} and GeoLDM^{33}, as well as to two naive baselines: “Naive (Upper-bound)”, where a molecular property classifier ϕ_{c} predicts molecular properties given a method’s generated 3D molecules and shuffled (i.e., random) property labels; and “# Atoms”, where one uses the numbers of atoms in a method’s generated 3D molecules to predict their molecular properties. For each baseline method, we report its mean absolute error (MAE) in terms of molecular property prediction by an ensemble of three EGNN classifiers ϕ_{c}^{36}, as reported in ref. ^{25}. For GCDM, we train each conditional model by conditioning it on one of six distinct molecular property feature inputs—α, gap, homo, lumo, μ, and C_{v}—for approximately 1500 epochs, using the QM9 validation split of ref. ^{25} as the model’s training dataset and the QM9 training split of ref. ^{25} as the corresponding EGNN classifier ensemble’s training dataset. Consequently, one can expect the gap between a method’s performance and that of “QM9 (Lower-bound)” to decrease as the method more accurately generates property-specific molecules.
Results
We see in Table 2 that GCDM achieves the best overall results compared to all baseline methods when conditioning on a given molecular property, with conditionally generated samples shown in Fig. 3 (Note: PSI4-computed property values^{37} for (a) and (f) are 69.1 Bohr^{3} (energy: −402 a.u.) and 89.7 Bohr^{3} (energy: −419 a.u.), respectively, at the DFT B3LYP/6-31G(2df,p) level of theory^{24,38}). In particular, as shown in the bottom half of this table, GCDM surpasses the MAE results of the SOTA GeoLDM method (by 19% on average) for all six molecular properties—α, gap, homo, lumo, μ, and C_{v}—by 28%, 9%, 3%, 15%, 21%, and 35%, respectively, while nearly matching the PB-Valid rates of GeoLDM (similar to the results in Table 1). These results qualitatively and quantitatively demonstrate that, using geometry-complete diffusion, GCDM enables notably precise generation of 3D molecules with specific molecular properties (e.g., α—polarizability).
Unconditional 3D molecule generation—GEOM-Drugs
The second dataset used in our experiments, the GEOM-Drugs dataset, is a well-known source of large 3D molecular conformers for downstream machine learning tasks. It contains 430k molecules, each with 44 atoms on average and with as many as 181 atoms after hydrogen atoms are imputed for each molecule following dataset postprocessing as in ref. ^{25}. For this experiment, we collect the 30 lowest-energy conformers corresponding to each molecule and task each baseline method with generating new molecules with 3D positions and types for each constituent atom. Here, we also adopt the negative log-likelihood, atom stability, and molecule stability metrics as defined in the “Unconditional 3D molecule generation—QM9” section and train GCDM using the same hyperparameters as listed in Supplementary Note B.2, with the exception of training for approximately 75 epochs on GEOM-Drugs.
Baselines
In this experiment, we compare GCDM to several state-of-the-art baseline methods for 3D molecule generation on GEOM-Drugs. Similar to our experiments on QM9, in addition to including a reference point for molecule quality metrics using GEOM-Drugs itself (i.e., Data), here we also compare against ENF, GDM, GDM-aug, EDM, Bridge along with its variant Bridge + Force, as well as GraphLDM, GraphLDM-aug, and GeoLDM. As in the “Unconditional 3D molecule generation—QM9” section, each method’s results in the top half (bottom half) of the table are reported as the mean and standard deviation (mean and Student’s t-distribution 95% confidence interval) (±) of each metric across three (five) test runs on GEOM-Drugs.
Results
To start, Table 3 displays an interesting phenomenon that is important to note: due to the size and atomic complexity of GEOM-Drugs’ molecules and the errors accumulated when estimating bond types from inter-atomic distances, the reference results for the molecule stability metric measured here (i.e., Data) are much lower than those collected for the QM9 dataset. Thus, reporting additional chemical and structural validity metrics (e.g., PB-Valid) for comparison is crucial to accurately assess a method’s performance in this context, which we do in the bottom half of Table 3. Nonetheless, for GEOM-Drugs, GCDM surpasses EDM’s SOTA negative log-likelihood results by 57% and advances GeoLDM’s SOTA atom and molecule stability results by 4% and more than sixfold, respectively. More importantly, however, GCDM can generate a significant proportion of PB-valid large molecules, surpassing even the reference molecule stability rate of the GEOM-Drugs dataset (i.e., 2.8) by 54%, demonstrating that geometric diffusion models such as GCDM can not only effectively generate valid large molecules but can also generalize beyond the native distribution of stable molecules within GEOM-Drugs.
Figure 4 illustrates PoseBusters-valid examples of large molecules generated by GCDM at the scale of GEOM-Drugs. As an example of the notion that GCDM produces low-energy structures for a generated molecular graph, the free energies for Fig. 4a, f were computed to be −3 kcal/mol and −2 kcal/mol, respectively, using CREST 2.12^{39} at the GFN2-xTB level of theory (which matches the corresponding free energy distribution mean for the GEOM-Drugs dataset (−2.5 kcal/mol) as illustrated in Fig. 2 of ref. ^{40}). Lastly, to detect whether a method, in aggregate, generates molecules with unlikely 3D conformations, a generated molecule’s energy ratio is defined as in ref. ^{23} to be the ratio of the molecule’s UFF-computed energy^{41} to the mean UFF energy of 50 RDKit ETKDGv3-generated conformers^{42} of the same molecular graph. Note that, as discussed by ref. ^{43}, generated molecules with an energy ratio greater than 7 are considered to have highly unlikely 3D conformations. Subsequently, Fig. 5 reveals that the average energy ratio of GCDM’s large 3D molecules is notably lower and more tightly bounded compared to that of GeoLDM, the SOTA baseline method for this task, indicating that GCDM also generates more energetically stable 3D molecule conformations than prior methods.
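The energy-ratio screen above can be sketched in a few lines. This is a hedged, stdlib-only illustration: the energy values are placeholders rather than real UFF force-field outputs, and the 7.0 threshold is the one quoted above.

```python
# Energy-ratio screen: the ratio between a generated conformation's
# (e.g., UFF-computed) energy and the mean energy of re-generated reference
# conformers of the same molecular graph. Ratios above 7 flag highly
# unlikely 3D conformations. Energies below are placeholders, not real
# force-field outputs.
def energy_ratio(generated_energy, reference_energies):
    mean_reference = sum(reference_energies) / len(reference_energies)
    return generated_energy / mean_reference

def is_unlikely_conformation(generated_energy, reference_energies, threshold=7.0):
    return energy_ratio(generated_energy, reference_energies) > threshold
```

In practice the generated conformer's energy and the 50 reference conformer energies would come from a force-field implementation such as RDKit's UFF routines applied to ETKDGv3-embedded conformers.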
Property-guided 3D molecule optimization—QM9
To evaluate whether molecular diffusion models can not only generate new 3D molecules but can also optimize existing small molecules using molecular property guidance, we adopt the QM9 dataset for the following experiment. First, we use an unconditional GCDM model to generate 1000 3D molecules using 10 time steps of time-scaled reverse diffusion (to leave such molecules in an unoptimized state), and then we provide these molecules to a separate property-conditional diffusion model for optimization of the molecules towards the conditional model’s respective property. This conditional model accepts these 3D molecules as intermediate states for 100 and 250 time steps of property-guided optimization of the molecules’ atom types and 3D coordinates. Lastly, we repurpose our experimental setup from the “Property-conditional 3D molecule generation—QM9” section to score these optimized molecules using an ensemble of external property classifier models to evaluate (1) how much the optimized molecules’ predicted property values have been improved for the respective property (first metric) and (2) whether and how much the optimized molecules’ stability (as defined in the “Unconditional 3D molecule generation—QM9” section) has been changed during optimization (second metric).
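The optimization procedure above amounts to treating an existing molecule as an intermediate diffusion state and running a fixed number of conditional denoising steps. The sketch below is a hedged abstraction of that loop: `conditional_denoise_step` is a hypothetical stand-in for one reverse-diffusion update of a property-conditional model, not GCDM's actual API, and the scalar state stands in for a molecule's atom types and coordinates.

```python
# Treat an existing (unoptimized) molecule state as an intermediate diffusion
# state and denoise it for `opt_steps` time steps under property guidance.
def optimize_molecule(state, opt_steps, conditional_denoise_step, target_property):
    for t in reversed(range(1, opt_steps + 1)):
        state = conditional_denoise_step(state, t, target_property)
    return state

# Toy usage: a "denoiser" that nudges a scalar state toward the target value,
# loosely mimicking how guided denoising pulls a molecule toward a property.
result = optimize_molecule(0.0, 100, lambda z, t, p: z + 0.1 * (p - z), 1.0)
```

With this toy denoiser the state converges geometrically toward the target, which mirrors the observation above that most of the improvement is realized within the first ~100 steps.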
Baselines
Baseline methods for this experiment include EDM^{25} and GCDM, where both methods use similar experimental setups for evaluation. Our baselines also include property-specificity and molecule stability measures of the initial (unconditional) 3D molecules to demonstrate how much molecular diffusion models can modify or improve these existing 3D molecules in terms of how property-specific and stable they are. As in the “Property-conditional 3D molecule generation—QM9” section, property specificity is measured in terms of the corresponding property classifier’s MAE for a given molecule with a targeted property value, reporting the mean and Student’s t-distribution 95% confidence interval for each property MAE across an ensemble of three corresponding classifiers. Molecular stability (i.e., Mol Stable (%)), here abbreviated as MS, is defined as in the “Unconditional 3D molecule generation—QM9” section.
Results
In this section, we quantitatively explore (in Fig. 6) whether and how much generative models can reduce the property-specific MAE and improve the molecular stability of a batch of existing 3D molecules. In particular, Fig. 6 showcases a practical finding: geometric diffusion models such as GCDM can effectively be repurposed as 3D molecule optimization methods with minimal modifications, improving both a molecule’s stability and its property specificity. This finding empirically supports the idea that molecular denoising diffusion models may be applied in the optimization stage of the typical drug discovery pipeline^{44} to experiment with a wider range of potential drug candidates (post-optimization) more quickly than previously possible. Simultaneously, the baseline EDM method fails to consistently optimize the stability and property specificity of existing 3D molecules, which suggests that geometric methods such as GCDM are theoretically and empirically better suited for such tasks. Notably, on average, with 100 time steps GCDM improves the stability of the initial molecules by over 25% and their specificity for each molecular property by over 27%, whereas for the properties it can optimize with 100 time steps, EDM improves the stability of the molecules by 13% and their property specificity by 15%. Lastly, it is worth noting that increasing the number of optimization time steps from 100 to 250 inconsistently leads to further improvements in molecules’ stability and property specificity, indicating that the optimization trajectory likely reaches a local minimum around 100 time steps, which rationalizes reducing the required compute time for optimizing 1000 molecules, e.g., from 15 min (for 250 steps) to 5 min (for 100 steps).
Protein-conditional 3D molecule generation
To investigate whether geometry-complete methods can enhance the ability of molecular diffusion models to generate 3D molecules within a given protein pocket (i.e., to perform structure-based drug design (SBDD)), in this experiment, we adopt the standard Binding MOAD^{45} and CrossDocked^{46} datasets for training and evaluation of GCDM-SBDD, our geometry-complete diffusion generative model based on GCPNET++ that extends the diffusion framework of ref. ^{47} for protein pocket-aware molecule generation. The Binding MOAD dataset consists of 100,000 high-quality protein-ligand complexes for training and 130 proteins for testing, with a 30% sequence identity threshold being used to define this cross-validation split. Similarly, the CrossDocked dataset contains 40,484 high-quality protein-ligand complexes split between training (40,354) and test (100) partitions using proteins’ enzyme commission numbers as described by ref. ^{47}.
Baselines
Baseline methods for this experiment include DiffSBDD-cond^{47} and DiffSBDD-joint^{47}. We compare these methods to our proposed geometry-complete, protein-aware diffusion model, GCDM-SBDD, using metrics that assess the properties, and thereby the quality, of each method’s generated molecules. These molecule-averaged metrics include a method’s average Vina score (computed using QuickVina 2.1)^{48} as a physics-based estimate of a ligand’s binding affinity with a target protein, measured in units of kcal/mol (lower is better); average drug-likeness QED^{49} (computed using RDKit 2022.03.2); average synthesizability^{50} (computed using the procedure introduced by ref. ^{51}) as an increasing measure of the ease of synthesizing a given molecule (higher is better); on average, how many rules of Lipinski’s rule of five are satisfied by a ligand^{52} (computed compositionally using RDKit 2022.03.2); and average diversity in terms of mean pairwise Tanimoto distances^{53,54} (derived manually using fingerprints and Tanimoto similarities computed by RDKit 2022.03.2). Following established conventions for 3D molecule generation^{25}, the size of each ligand to generate was determined using the ligand size distribution of the respective training dataset. Note that, in this context, the “joint” and “cond” configurations represent generating a molecule for a protein target, respectively, with and without also modifying the coordinates of the binding pocket within the protein target. Also note that, similar to our experiments in the “Unconditional 3D molecule generation—QM9”, “Property-conditional 3D molecule generation—QM9”, “Unconditional 3D molecule generation—GEOM-Drugs”, and “Property-guided 3D molecule optimization—QM9” sections, the GCDM-SBDD model uses 9 GCP message-passing layers along with 256 (64) and 32 (16) invariant (equivariant) node and edge features, respectively.
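The diversity metric above, mean pairwise Tanimoto distance, can be sketched as follows. This is a hedged illustration in which fingerprints are plain Python sets of "on" bit indices rather than RDKit fingerprint objects.

```python
from itertools import combinations

# Mean pairwise Tanimoto distance (1 - Tanimoto similarity) over a batch of
# molecular fingerprints, each represented here as a set of "on" bit indices.
def tanimoto_similarity(a, b):
    return len(a & b) / len(a | b)

def mean_pairwise_diversity(fingerprints):
    pairs = list(combinations(fingerprints, 2))
    return sum(1.0 - tanimoto_similarity(a, b) for a, b in pairs) / len(pairs)

# Three toy fingerprints: two overlapping, one disjoint from the others.
diversity = mean_pairwise_diversity([{1, 2, 3}, {2, 3, 4}, {5, 6}])
```

In the actual evaluation the sets would be replaced by RDKit-computed fingerprints, but the averaging over all unordered pairs is the same.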
Results
Table 4 shows that, across both of the standard SBDD datasets (i.e., Binding MOAD and CrossDocked), GCDM-SBDD generates more clash-free (PB-Valid) and lower-energy (Vina) molecules compared to prior methods. Moreover, GCDM-SBDD achieves comparable or better results in terms of drug-likeness measures (e.g., QED) and comparable results for all other molecule metrics, without performing any hyperparameter tuning due to compute constraints. These results suggest that GCDM, with GCPNET++ as its denoising neural network, works well not only for de novo 3D molecule generation but also for protein target-specific 3D molecule generation, notably expanding the number of real-world application areas of GCDM. Concretely, GCDM-SBDD improves upon DiffSBDD’s average Vina energy scores by 8% on average across both datasets while generating more than twice as many PB-valid “candidate” molecules for the more challenging Binding MOAD dataset.
As suggested by ref. ^{23}, the gap between the PB-Valid ratios in Table 4 without and with protein-ligand steric clashes considered, for both GCDM-SBDD and DiffSBDD, suggests that deep learning-based drug design methods for targeted protein pockets could benefit significantly from interaction-aware molecular dynamics relaxation following protein-conditional molecule generation, which may allow many generated “candidate” molecules to have their PB validity “recovered” by such relaxation. Nonetheless, Fig. 7 demonstrates that GCDM can consistently generate clash-free, realistic, and diverse 3D molecules with low Vina energies for unseen protein targets.
Conclusions
While previous methods for 3D molecule generation have possessed insufficient geometric and molecular priors for scaling well to a variety of molecular datasets, in this work, we introduced a geometry-complete diffusion model (GCDM) that establishes a clear performance advantage over previous methods, generating more realistic, stable, valid, unique, and property-specific 3D molecules, while enabling the generation of many large 3D molecules that are energetically stable as well as chemically and structurally valid. Moreover, GCDM does so without complex modeling techniques such as latent diffusion, which suggests that GCDM’s results could likely be further improved by incorporating such techniques^{33}. Although GCDM’s results here are promising, since it (like previous methods) requires fully-connected graph attention as well as 1000 time steps to generate a high-quality batch of 3D molecules, using it to generate several thousand large molecules can take a notable amount of time (e.g., 15 minutes to generate 250 new large molecules). As such, future research with GCDM could involve adding new time-efficient graph construction or sampling algorithms^{55} or exploring the impact of higher-order (e.g., type-2 tensor) yet efficient geometric expressiveness^{56} on 3D generative models to accelerate sample generation and increase sample quality. Furthermore, integrating additional external tools for assessing the quality and rationality of generated molecules^{57} is a promising direction for future work.
Methods
Problem setting
In this work, our goal is to generate new 3D molecules either unconditionally or conditioned on user-specified properties. We represent a molecular point cloud (i.e., a 3D molecule) as a fully-connected 3D graph \({\mathcal{G}}=({\mathcal{V}},{\mathcal{E}})\), with \({\mathcal{V}}\) and \({\mathcal{E}}\) representing the graph’s sets of nodes and edges, respectively, and \(N=|{\mathcal{V}}|\) and \(E=|{\mathcal{E}}|\) representing the numbers of nodes and edges in the graph, accordingly. In addition, \({\bf{X}}=({{\bf{x}}}_{1},{{\bf{x}}}_{2},\ldots ,{{\bf{x}}}_{N})\in {{\mathbb{R}}}^{N\times 3}\) represents the respective Cartesian coordinates for each node (i.e., atom). Each node in \({\mathcal{G}}\) is described by scalar features \({\bf{H}}\in {{\mathbb{R}}}^{N\times h}\) and m vector-valued features \({\boldsymbol{\chi }}\in {{\mathbb{R}}}^{N\times (m\times 3)}\). Likewise, each edge in \({\mathcal{G}}\) is described by scalar features \({\bf{E}}\in {{\mathbb{R}}}^{E\times e}\) and x vector-valued features \({\boldsymbol{\xi }}\in {{\mathbb{R}}}^{E\times (x\times 3)}\). Then, let \({\mathcal{M}}=[{\bf{X}},{\bf{H}}]\) represent the molecules (i.e., atom coordinates and atom types) our method is tasked with generating, where [⋅,⋅] denotes the concatenation of two variables. Important to note is that the input features H and E are invariant to 3D roto-translations, whereas the input vector features X, χ, and ξ are equivariant to 3D roto-translations. Lastly, we design a denoising neural network Φ to be equivariant to 3D roto-translations (i.e., SE(3)-equivariant) by defining it such that its internal operations and outputs transform consistently with 3D roto-translations acting upon its inputs.
Overview of GCDM
We will now introduce GCDM, a new Geometry-Complete SE(3)-Equivariant Diffusion Model. GCDM defines a joint noising process on equivariant atom coordinates x and invariant atom types h to produce a noisy representation z = [z^{(x)}, z^{(h)}] and then learns a generative denoising process using the newly-proposed GCPNET++ model (see Supplementary Methods A.1), which desirably contains two distinct feature channels for scalar and vector features, respectively, and supports geometry-complete and chirality-aware message-passing^{58}.
As an extension of the DDPM framework^{59} outlined in Supplementary Methods A.2.1, GCDM is designed to generate molecules in 3D while maintaining SE(3) equivariance, in contrast to previous methods that generate molecules solely in 1D^{60} or 2D^{61} modalities, or in 3D without considering chirality^{9,25}. GCDM generates molecules by directly placing atoms in continuous 3D space and assigning them discrete types, which is accomplished by modeling the forward and reverse diffusion processes, respectively:

\[q({{\bf{z}}}_{1:T}\mid {{\bf{z}}}_{0})=\mathop{\prod }\limits_{t=1}^{T}q({{\bf{z}}}_{t}\mid {{\bf{z}}}_{t-1}),\qquad {p}_{{\boldsymbol{\Phi }}}({{\bf{z}}}_{0:T})=p({{\bf{z}}}_{T})\mathop{\prod }\limits_{t=1}^{T}{p}_{{\boldsymbol{\Phi }}}({{\bf{z}}}_{t-1}\mid {{\bf{z}}}_{t}).\]
Overall, these processes describe a latent variable model p_{Φ}(z_{0}) = ∫p_{Φ}(z_{0:T})dz_{1:T} given a sequence of latent variables z_{0}, z_{1}, …, z_{T} matching the dimensionality of the data \({{{{{{{\mathcal{M}}}}}}}} \sim p({{{{{{{{\bf{z}}}}}}}}}_{0})\). As illustrated in Fig. 1, the forward process (directed from right to left) iteratively adds noise to an input, and the learned reverse process (directed from left to right) iteratively denoises a noisy input to generate new examples from the original data distribution. We will now proceed to formulate GCDM’s joint diffusion process and its remaining practical details.
Joint molecular diffusion
Recall that our model’s molecular graph inputs, \({\mathcal{G}}\), associate with each node a 3D position \({{\bf{x}}}_{i}\in {{\mathbb{R}}}^{3}\) and a feature vector \({{\bf{h}}}_{i}\in {{\mathbb{R}}}^{h}\). By adding random noise to these model inputs at each time step t via a fixed, Markov chain variance schedule \({\sigma }_{1}^{2},{\sigma }_{2}^{2},\ldots ,{\sigma }_{T}^{2}\), we can define a joint molecular diffusion process for equivariant atom coordinates x and invariant atom types h as the product of two distributions^{25}:

\[q({{\bf{z}}}_{t}\mid {\bf{x}},{\bf{h}})={{\mathcal{N}}}_{xh}\left({{\bf{z}}}_{t}\mid {\alpha }_{t}[{\bf{x}},{\bf{h}}],{\sigma }_{t}^{2}{\bf{I}}\right),\]
where \({{\mathcal{N}}}_{xh}\) serves as concise notation to denote the product of two normal distributions; the first distribution, \({{\mathcal{N}}}_{x}\), represents the noised node coordinates; the second distribution, \({{\mathcal{N}}}_{h}\), represents the noised node features; and \({\alpha }_{t}=\sqrt{1-{\sigma }_{t}^{2}}\), following the variance-preserving process of ref. ^{59}. With \({\alpha }_{t| s}={\alpha }_{t}/{\alpha }_{s}\) and \({\sigma }_{t| s}^{2}={\sigma }_{t}^{2}-{\alpha }_{t| s}^{2}{\sigma }_{s}^{2}\) for any t > s, we can directly obtain the transition distribution:

\[q({{\bf{z}}}_{t}\mid {{\bf{z}}}_{s})={{\mathcal{N}}}_{xh}\left({{\bf{z}}}_{t}\mid {\alpha }_{t| s}{{\bf{z}}}_{s},{\sigma }_{t| s}^{2}{\bf{I}}\right),\]

which, with s = 0, yields the noisy data distribution \(q({{\bf{z}}}_{t}\mid {{\bf{z}}}_{0})\) at any time step t.
Bayes’ theorem then tells us that if we define \({{\boldsymbol{\mu }}}_{t\to s}({{\bf{z}}}_{t},{{\bf{z}}}_{0})\) and \({\sigma }_{t\to s}\) as:

$${{\boldsymbol{\mu }}}_{t\to s}({{\bf{z}}}_{t},{{\bf{z}}}_{0})=\frac{{\alpha }_{t| s}{\sigma }_{s}^{2}}{{\sigma }_{t}^{2}}{{\bf{z}}}_{t}+\frac{{\alpha }_{s}{\sigma }_{t| s}^{2}}{{\sigma }_{t}^{2}}{{\bf{z}}}_{0},\qquad {\sigma }_{t\to s}=\frac{{\sigma }_{t| s}{\sigma }_{s}}{{\sigma }_{t}},$$
we have that the inverse of the noising process, the true denoising process, is given by the posterior of the transitions conditioned on \({\mathcal{M}} \sim {{\bf{z}}}_{0}\), a process that is also Gaussian^{25}:

$$q({{\bf{z}}}_{s}\,|\,{{\bf{z}}}_{t},{{\bf{z}}}_{0})={\mathcal{N}}({{\bf{z}}}_{s}\,|\,{{\boldsymbol{\mu }}}_{t\to s}({{\bf{z}}}_{t},{{\bf{z}}}_{0}),{\sigma }_{t\to s}^{2}{\bf{I}}).$$
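The posterior parameters described above translate directly into code. This is the standard variance-preserving diffusion algebra, not GCDM’s implementation:

```python
import numpy as np

def posterior_params(z_t, z_0, alpha_t, alpha_s, sigma_t, sigma_s):
    """Mean and standard deviation of the Gaussian posterior q(z_s | z_t, z_0),
    following standard variance-preserving diffusion algebra."""
    alpha_ts = alpha_t / alpha_s
    sigma2_ts = sigma_t**2 - alpha_ts**2 * sigma_s**2
    mu = (alpha_ts * sigma_s**2 / sigma_t**2) * z_t \
       + (alpha_s * sigma2_ts / sigma_t**2) * z_0
    std = np.sqrt(sigma2_ts) * sigma_s / sigma_t
    return mu, std
```

As a sanity check, setting s = 0 (so α_s = 1, σ_s = 0) collapses the posterior onto z_0 with zero variance, as it must.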
Parametrization of the reverse process
Noise parametrization
We now need to define the learned generative reverse process that denoises pure noise into realistic examples from the original data distribution. Towards this end, we can directly use the noise posteriors \(q({{\bf{z}}}_{s}\,|\,{{\bf{z}}}_{t},{{\bf{z}}}_{0})\) of Eq. A12 within Supplementary Methods A.2.1 after sampling \({{\bf{z}}}_{0} \sim ({\mathcal{M}}=[{\bf{x}},{\bf{h}}])\). However, to do so, we must replace the input variables x and h with the approximations \(\hat{{\bf{x}}}\) and \(\hat{{\bf{h}}}\) predicted by the denoising neural network Φ:

$${p}_{\Phi }({{\bf{z}}}_{s}\,|\,{{\bf{z}}}_{t})=q({{\bf{z}}}_{s}\,|\,{{\bf{z}}}_{t},{\tilde{{\bf{z}}}}_{0}=[\hat{{\bf{x}}},\hat{{\bf{h}}}]),$$
where the values for \({\tilde{{\bf{z}}}}_{0}=[\hat{{\bf{x}}},\hat{{\bf{h}}}]\) depend on z_{t}, t, and the denoising neural network Φ. GCDM then parametrizes \({{{\boldsymbol{\mu }}}_{{\boldsymbol{\Phi }}}}_{t\to s}({{\bf{z}}}_{t},{\tilde{{\bf{z}}}}_{0})\) to predict the noise \(\hat{{\boldsymbol{\epsilon }}}=[{\hat{{\boldsymbol{\epsilon }}}}^{(x)},{\hat{{\boldsymbol{\epsilon }}}}^{(h)}]\), which represents the noise individually added to x and h. We can then use the predicted \(\hat{{\boldsymbol{\epsilon }}}\) to derive:

$${\tilde{{\bf{z}}}}_{0}=\frac{1}{{\alpha }_{t}}{{\bf{z}}}_{t}-\frac{{\sigma }_{t}}{{\alpha }_{t}}\hat{{\boldsymbol{\epsilon }}}.$$
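Putting the noise parametrization together, one ancestral denoising step might look like the following sketch, where `eps_hat` stands in for the output of the denoising network Φ (in GCDM, a graph neural network; here just an array):

```python
import numpy as np

def reverse_step(z_t, eps_hat, alpha_t, alpha_s, sigma_t, sigma_s, rng):
    """One ancestral denoising step z_t -> z_s: reconstruct z0_tilde from the
    predicted noise, then sample the Gaussian posterior around it (sketch only)."""
    z0_tilde = (z_t - sigma_t * eps_hat) / alpha_t  # z0 = z_t/alpha_t - (sigma_t/alpha_t) eps_hat
    alpha_ts = alpha_t / alpha_s
    sigma2_ts = sigma_t**2 - alpha_ts**2 * sigma_s**2
    mu = (alpha_ts * sigma_s**2 / sigma_t**2) * z_t \
       + (alpha_s * sigma2_ts / sigma_t**2) * z0_tilde
    std = np.sqrt(sigma2_ts) * sigma_s / sigma_t
    return mu + std * rng.standard_normal(z_t.shape)
```

With a perfect noise prediction and s = 0, this step recovers z_0 exactly, which is a useful unit test for any implementation of the sampler.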
Invariant likelihood
Ideally, a 3D molecular diffusion model should assign the same likelihood to a generated molecule even after arbitrarily rotating or translating it in 3D space. To ensure the model achieves this desirable property for \({p}_{\Phi }({{\bf{z}}}_{0})\), we can leverage the insight that an invariant base distribution combined with an equivariant transition function yields an invariant marginal distribution^{9,25,27}. Moreover, to address the translation invariance issue raised by ref. ^{27} in the context of handling a distribution over 3D coordinates, we adopt the zero center-of-gravity trick proposed by ref. ^{9} to define \({{\mathcal{N}}}_{x}\) as a normal distribution on the subspace defined by ∑_{i}x_{i} = 0. In contrast, to handle node features h_{i} that are invariant to roto-translations, we can instead use a conventional normal distribution \({\mathcal{N}}\). As such, if we parametrize the transition function p_{Φ} using an SE(3)-equivariant neural network after applying the zero center-of-gravity trick of ref. ^{9}, the model achieves the desired likelihood invariance property.
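A minimal illustration of why the zero center-of-gravity trick yields an invariant likelihood: an isotropic Gaussian log-density evaluated on mean-free coordinates depends on the molecule only through a squared norm, so it is unchanged by rotations and translations. The helper names below are hypothetical:

```python
import numpy as np

def zero_center_of_gravity(x):
    """Project coordinates onto the translation-invariant subspace sum_i x_i = 0."""
    return x - x.mean(axis=0, keepdims=True)

def mean_free_gaussian_logpdf(x, sigma=1.0):
    """Log-density of an isotropic Gaussian restricted to mean-free coordinates;
    invariant to rigid motions because it only sees the centered squared norm."""
    xc = zero_center_of_gravity(x)
    n, d = xc.shape
    dof = (n - 1) * d  # one center-of-gravity degree of freedom removed per axis
    return -0.5 * (xc**2).sum() / sigma**2 - 0.5 * dof * np.log(2 * np.pi * sigma**2)
```

Checking this numerically, applying a random orthogonal matrix and translation to the coordinates leaves the log-density unchanged to machine precision.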
Geometry-complete denoising network
Crucially, to satisfy the likelihood invariance property described in the “Parametrization of the reverse process” section while optimizing for model expressivity and runtime, GCDM parametrizes the denoising neural network Φ using GCPNET++, an enhanced version of the SE(3)-equivariant GCPNET algorithm^{58} that we propose in Supplementary Methods A.1.2. Notably, GCPNET++ learns both scalar (invariant) and vector (equivariant) node and edge features through a chirality-sensitive graph message-passing procedure, which enables GCDM to denoise its noisy molecular graph inputs using not only noisy scalar features but also noisy vector features derived directly from the noisy node coordinates z^{(x)} (i.e., ψ(z^{(x)})). We empirically find that incorporating such noisy vectors considerably increases GCDM’s representation capacity for 3D graph denoising.
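The vector features ψ(z^{(x)}) mentioned above can be illustrated with a simple, assumed construction: per-edge displacement vectors, which transform equivariantly under rotation, alongside their invariant lengths. GCPNET++’s actual feature construction is detailed in the supplement; this sketch only conveys the equivariant/invariant split:

```python
import numpy as np

def edge_vector_features(z_x, src, dst):
    """Illustrative psi(z^(x)): per-edge displacement vectors, unit directions,
    and lengths derived from (noisy) node coordinates z_x."""
    disp = z_x[dst] - z_x[src]                           # equivariant edge vectors
    dist = np.linalg.norm(disp, axis=-1, keepdims=True)  # invariant edge lengths
    unit = disp / np.clip(dist, 1e-8, None)              # equivariant unit directions
    return disp, unit, dist
```

Rotating the input coordinates rotates `disp` and `unit` by the same matrix while leaving `dist` unchanged, which is exactly the behavior an SE(3)-equivariant denoiser needs from its inputs.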
Optimization objective
Following previous works on diffusion models^{25,32,59}, the noise parametrization chosen for GCDM yields the following model training objective:

$${{\mathcal{L}}}_{t}={{\mathbb{E}}}_{{\boldsymbol{\epsilon }} \sim {\mathcal{N}}(0,{\bf{I}})}\left[\frac{1}{2}w(t)\parallel {\boldsymbol{\epsilon }}-{\hat{{\boldsymbol{\epsilon }}}}_{t}{\parallel }^{2}\right],$$
where \({\hat{{\boldsymbol{\epsilon }}}}_{t}\) is the denoising network’s noise prediction for atom types and coordinates as described above, and where we empirically set w(t) = 1, which we find yields the best generation results. Additionally, GCDM permits a negative log-likelihood computation using the same optimization terms as ref. ^{25}, for which we refer interested readers to Supplementary Methods A.2.2–A.2.4.
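With w(t) = 1, the objective reduces to a simple noise-matching mean squared error. The sketch below is illustrative of that reduction, not GCDM’s training loop:

```python
import numpy as np

def diffusion_loss(eps, eps_hat, w_t=1.0):
    """Simplified denoising objective: weighted mean squared error between the
    true noise eps and the network's prediction eps_hat, with w(t) = 1 by default."""
    return 0.5 * w_t * np.mean(np.sum((eps - eps_hat) ** 2, axis=-1))
```

In practice, the same loss is applied jointly to the coordinate noise ε^{(x)} and the feature noise ε^{(h)} predicted by the denoiser.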
Data availability
The data required to train new GCDM models or reproduce our results are available under a Creative Commons Attribution 4.0 International Public License at https://zenodo.org/record/7881981^{62}. Additionally, all pretrained model checkpoints are available under a Creative Commons Attribution 4.0 International Public License at https://zenodo.org/record/10995319^{63}.
Code availability
The source code for GCDM is available at https://github.com/BioinfoMachineLearning/BioDiffusion, and the source code for structure-based drug design experiments with GCDM is separately available at https://github.com/BioinfoMachineLearning/GCDMSBDD.
References
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. Highresolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684–10695 (2022).
Kong, Z., Ping, W., Huang, J., Zhao, K. & Catanzaro, B. Diffwave: a versatile diffusion model for audio synthesis. International Conference on Learning Representations (2021).
Peebles, W., Radosavovic, I., Brooks, T., Efros, A. A. & Malik, J. Learning to learn with generative models of neural network checkpoints. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.12892 (2022).
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15019 (2022).
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. Diffdock: diffusion steps, twists, and turns for molecular docking. International Conference on Learning Representations (2023).
Guo, Z. et al. Diffusion models in bioinformatics and computational biology. Nat. Rev. Bioeng. 2, 136–154 (2024).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Morehead, A., Ruffolo, J. A., Bhatnagar, A. & Madani, A. Towards joint sequence-structure generation of nucleic acid and protein complexes with SE(3)-discrete diffusion. In NeurIPS 2023 Workshop on Machine Learning in Structural Biology, 14 (2023).
Xu, M. et al. Geodiff: a geometric diffusion model for molecular conformation generation. International Conference on Learning Representations (2022).
Gebauer, N. W., Gastegger, M., Hessmann, S. S., Müller, K.R. & Schütt, K. T. Inverse design of 3d molecular structures with conditional generative neural networks. Nat. Commun. 13, 973 (2022).
Anstine, D. M. & Isayev, O. Generative models as an emerging paradigm in the chemical sciences. J. Am. Chem. Soc. 145, 8736–8750 (2023).
Mudur, N. & Finkbeiner, D. P. Can denoising diffusion probabilistic models generate realistic astrophysical fields? NeurIPS MLPS Workshop (2022).
Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at arXiv https://doi.org/10.48550/arXiv.2104.13478 (2021).
Joshi, C. K., Bodnar, C., Mathis, S. V., Cohen, T. & Liò, P. On the expressive power of geometric graph neural networks. International Conference on Machine Learning (2023).
Stärk, H., Ganea, O., Pattanaik, L., Barzilay, R. & Jaakkola, T. Equibind: Geometric deep learning for drug binding structure prediction. In International Conference on Machine Learning, 20503–20521 (PMLR, 2022).
Morehead, A., Chen, C. & Cheng, J. Geometric transformers for protein interface contact prediction. In 10th International Conference on Learning Representations (ICLR 2022) (2022).
Jamasb, A. R. et al. Evaluating representation learning on the protein structure universe. In 12th International Conference on Learning Representations (ICLR 2024), 14 (2024).
Morehead, A., Liu, J. & Cheng, J. Protein structure accuracy estimation using geometrycomplete perceptron networks. Protein Sci. 33, e4932 (2024).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (2017).
Lin, Z. et al. Evolutionaryscale prediction of atomiclevel protein structure with a language model. Science 379, 1123–1130 (2023).
Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.08219 (2018).
Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 (2024).
Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 1–7 (2014).
Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In International Conference on Machine Learning, 8867–8887 (PMLR, 2022).
Anderson, B., Hy, T. S. & Kondor, R. Cormorant: covariant molecular neural networks. In Advances in Neural Information Processing Systems 32 (2019).
Satorras, V. G., Hoogeboom, E., Fuchs, F. B., Posner, I. & Welling, M. E(n) equivariant normalizing flows. Advances in Neural Information Processing Systems (2021).
Landrum, G. et al. Rdkit: a software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8 (2013).
Krishna, R. et al. Generalized biomolecular modeling and design with rosettafold allatom. Science 384, 291 (2024).
DeepMind-Isomorphic. Performance and structural coverage of the latest, in-development AlphaFold model. DeepMind (2023).
Gebauer, N., Gastegger, M. & Schütt, K. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In Advances in Neural Information Processing Systems 32 (2019).
Wu, L., Gong, C., Liu, X., Ye, M. & Liu, Q. Diffusion-based molecule generation with informative prior bridges. Advances in Neural Information Processing Systems (2022).
Xu, M., Powers, A., Dror, R., Ermon, S. & Leskovec, J. Geometric latent diffusion models for 3d molecule generation. International Conference on Machine Learning (2023).
Vignac, C., Osman, N., Toni, L. & Frossard, P. MiDi: mixed graph and 3D denoising diffusion for molecule generation. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2023).
Le, T., Cremer, J., Noé, F., Clevert, D.A. & Schütt, K. Navigating the design space of equivariant diffusion-based generative models for de novo 3D molecule generation. International Conference on Learning Representations (2024).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning, 9323–9332 (PMLR, 2021).
Smith, D. G. et al. Psi4 1.4: opensource software for highthroughput quantum chemistry. J. Chem. Phys. 152, 184108 (2020).
Lehtola, S., Steigemann, C., Oliveira, M. J. & Marques, M. A. Recent developments in libxc, a comprehensive library of functionals for density functional theory. SoftwareX 7, 1–5 (2018).
Pracht, P., Bohle, F. & Grimme, S. Automated exploration of the lowenergy chemical space with fast quantum chemical methods. Phys. Chem. Chem. Phys. 22, 7169–7192 (2020).
Axelrod, S. & Gomez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 9, 185 (2022).
Rappé, A. K., Casewit, C. J., Colwell, K., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).
Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 55, 2562–2574 (2015).
Wills, S. et al. Fragment merging using a graph database samples different catalogue space than similarity search. J. Chem. Inf. Model. (2023).
Deore, A. B., Dhumane, J. R., Wagh, R. & Sonawane, R. The stages of drug discovery and development process. Asian J. Pharm. Res. Dev. 7, 62–67 (2019).
Hu, L., Benson, M. L., Smith, R. D., Lerner, M. G. & Carlson, H. A. Binding moad (mother of all databases). Proteins Struct. Funct. Bioinforma. 60, 333–340 (2005).
Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models (2022).
Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 1–11 (2009).
Peng, X. et al. Pocket2mol: efficient molecular sampling based on 3d protein pockets. In International Conference on Machine Learning, 17644–17655 (PMLR, 2022).
Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).
Tanimoto, T. T. Elementary Mathematical Theory of Classification and Prediction (International Business Machines Corp., 1958).
Bajusz, D., Rácz, A. & Héberger, K. Why is tanimoto index an appropriate choice for fingerprintbased similarity calculations? J. Cheminform. 7, 1–13 (2015).
Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. International Conference on Learning Representations (2021).
Liao, Y.L., Wood, B. M., Das, A. & Smidt, T. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=mCOBKZmrzD (2024).
Harris, C. et al. Benchmarking generated poses: how rational is structure-based drug design with generative models? Preprint at arXiv https://doi.org/10.48550/arXiv.2308.07413 (2023).
Morehead, A. & Cheng, J. Geometry-complete perceptron networks for 3D molecular graphs. Bioinformatics (2024).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Segler, M. H., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80 of Proceedings of Machine Learning Research (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).
Morehead, A. & Cheng, J. Replication Data for: EDM (Zenodo, 2023). https://doi.org/10.5281/zenodo.7881981 (2023).
Morehead, A. & Cheng, J. Replication Data for: GeometryComplete Diffusion for 3D Molecule Generation and Optimization Zenodo. https://doi.org/10.5281/zenodo.10995319 (2024).
Acknowledgements
The authors would like to thank Chaitanya Joshi and Roland Oruche for helpful discussions and feedback on early versions of this manuscript. In addition, the authors acknowledge that this work is partially supported by three NSF grants (DBI-2308699, DBI-1759934, and IIS-1763246), two NIH grants (R01GM093123 and R01GM146340), three DOE grants (DE-AR0001213, DE-SC0020400, and DE-SC0021303), and the computing allocation on the Summit compute cluster provided by the Oak Ridge Leadership Computing Facility under Contract DE-AC05-00OR22725.
Author information
Contributions
A.M. and J.C. conceived the project. A.M. designed the experiments. A.M. performed the experiments and collected the data. A.M. analyzed the data. J.C. secured the funding for this project. A.M. and J.C. wrote the manuscript. A.M. and J.C. edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Morehead, A., Cheng, J. Geometry-complete diffusion for 3D molecule generation and optimization. Commun Chem 7, 150 (2024). https://doi.org/10.1038/s42004-024-01233-z