Abstract
The holy grail of materials science is de novo molecular design, meaning engineering molecules with desired characteristics. The introduction of generative deep learning has greatly advanced efforts in this direction, yet molecular discovery remains challenging and often inefficient. Herein we introduce GaUDI, a guided diffusion model for inverse molecular design that combines an equivariant graph neural net for property prediction and a generative diffusion model. We demonstrate GaUDI’s effectiveness in designing molecules for organic electronic applications by using single- and multiple-objective tasks applied to a generated dataset of 475,000 polycyclic aromatic systems. GaUDI shows improved conditional design, generating molecules with optimal properties and even going beyond the original distribution to suggest better molecules than those in the dataset. In addition to point-wise targets, GaUDI can also be guided toward open-ended targets (for example, a minimum or maximum) and in all cases achieves close to 100% validity of generated molecules.
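As a rough illustration of the guidance idea described above, the following minimal sketch steers a toy sampler with the gradient of a user-defined target function. It is not the GaUDI implementation: the "generative model" is plain Langevin dynamics on a one-dimensional standard normal distribution, the "property predictor" is assumed to be the identity function, and all names (guided_langevin, pointwise_grad, maximize_grad, scale) are hypothetical. It only shows how a point-wise target and an open-ended target can both be expressed as a loss whose gradient shifts the samples.

```python
import numpy as np

def guided_langevin(target_grad, scale, n_samples=5000, n_steps=2000,
                    step=0.01, seed=0):
    """Toy guided sampler (an assumption-laden stand-in for guided diffusion).

    Runs unadjusted Langevin dynamics on a 1D standard normal "prior",
    whose score is -x. Guidance subtracts scale * d(loss)/dx, where the
    loss is a target function of a hypothetical predicted property.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        score = -x - scale * target_grad(x)
        noise = rng.standard_normal(n_samples)
        x = x + step * score + np.sqrt(2.0 * step) * noise
    return x

# Hypothetical predictor: the "property" of a sample x is x itself.

def pointwise_grad(x, target=3.0):
    """Gradient of the point-wise loss (x - target)**2."""
    return 2.0 * (x - target)

def maximize_grad(x):
    """Gradient of the open-ended loss -x (i.e. maximise the property)."""
    return -np.ones_like(x)

unguided = guided_langevin(lambda x: np.zeros_like(x), scale=0.0)
pointwise = guided_langevin(pointwise_grad, scale=2.0)
open_ended = guided_langevin(maximize_grad, scale=2.0)

print("unguided mean:  ", round(float(unguided.mean()), 2))
print("point-wise mean:", round(float(pointwise.mean()), 2))
print("open-ended mean:", round(float(open_ended.mean()), 2))
```

In this toy setting the guidance scale trades off fidelity to the original distribution against the target: the point-wise samples concentrate between the prior mean (0) and the target (3), while the open-ended samples are shifted upward without a fixed end point.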
Data availability
All data for cc-PBHs used in this project were obtained from the COMPAS project (ref. 33), a freely available data repository at https://gitlab.com/porannegroup/compas. All PAS data are available free of charge at https://doi.org/10.5281/zenodo.7798697 (ref. 61). Source Data are provided with this paper.
Code availability
All code used to train the models and generate molecules is provided free of charge at https://gitlab.com/porannegroup/gaudi (minted version https://doi.org/10.5281/zenodo.8311764; ref. 62). The repository also contains an original tutorial for generating GOR representations of PASs and for generating new PASs with user-defined target functions.
References
Hwang, J. et al. Perovskites in catalysis and electrocatalysis. Science 358, 751–756 (2017).
Bilodeau, C., Jin, W., Jaakkola, T., Barzilay, R. & Jensen, K. F. Generative models for molecular discovery: recent advances and challenges. Wiley Interdiscip. Rev. Comput. Mol. Sci. 12, e1608 (2022).
Fuhr, A. S. & Sumpter, B. G. Deep generative models for materials discovery and machine learning-accelerated innovation. Front. Mater. https://doi.org/10.3389/fmats.2022.865270 (2022).
Walters, W. P. & Barzilay, R. Applications of deep learning in molecule generation and molecular property prediction. Acc. Chem. Res. 54, 263–270 (2020).
Anstine, D. M. & Isayev, O. Generative models as an emerging paradigm in the chemical sciences. J. Am. Chem. Soc. 145, 8736–8750 (2023).
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
Shree Sowndarya, S. V. et al. Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. Nat. Mach. Intell. 4, 720–730 (2022).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Sci. 4, 268–276 (2018).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inform. Model. 58, 1194–1204 (2018).
Prykhodko, O. et al. A de novo molecular generation method using latent vector based generative adversarial network. J. Cheminform. 11, 1–13 (2019).
Jennings, P. C., Lysgaard, S., Hummelshøj, J. S., Vegge, T. & Bligaard, T. Genetic algorithms for computational materials discovery accelerated by machine learning. npj Comput. Mater. 5, 46 (2019).
Henault, E. S., Rasmussen, M. H. & Jensen, J. H. Chemical space exploration: how genetic algorithms find the needle in the haystack. PeerJ Phys. Chem. 2, e11 (2020).
Jensen, J. H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 10, 3567–3572 (2019).
Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. Preprint at https://arxiv.org/abs/2001.09382 (2020).
Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning 8867–8887 (ML Research Press, Cambridge, 2022).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Proc. 34th International Conference on Neural Information Processing Systems 6840–6851 (Curran Associates Inc., Red Hook, 2020).
Ho, J. et al. Video diffusion models. Preprint at https://arxiv.org/abs/2204.03458 (2022).
Austin, J., Johnson, D. D., Ho, J., Tarlow, D. & van den Berg, R. Structured denoising diffusion models in discrete state-spaces. In Proc. 35th Conference on Neural Information Processing Systems 17981–17993 (Curran Associates Inc., Red Hook, 2021).
Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. Preprint at https://arxiv.org/abs/2203.02923 (2022).
Lyngby, P. & Thygesen, K. S. Data-driven discovery of 2D materials by deep generative models. npj Comput. Mater. 8, 232 (2022).
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at https://arxiv.org/abs/2210.01776 (2022).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. In Proc. 35th Conference on Neural Information Processing Systems 8780–8794 (Curran Associates Inc., Red Hook, 2021).
Ho, J. & Salimans, T. Classifier-free diffusion guidance. Preprint at https://arxiv.org/abs/2207.12598 (2022).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. Preprint at https://arxiv.org/abs/2011.13456 (2021).
Balaban, A. T., Oniciu, D. C. & Katritzky, A. R. Aromaticity as a cornerstone of heterocyclic chemistry. Chem. Rev. 104, 2777–2812 (2004).
Li, Q. et al. Polycyclic aromatic hydrocarbon-based organic semiconductors: ring-closing synthesis and optoelectronic properties. J. Mater. Chem. C 10, 2411–2430 (2022).
Aumaitre, C. & Morin, J.-F. Polycyclic aromatic hydrocarbons as potential building blocks for organic solar cells. Chem. Rec. 19, 1142–1154 (2019).
Kilaru, S. et al. Organic materials based on hetero polycyclic aromatic hydrocarbons for organic thin-film transistor applications. Mater. Sci. Semicond. Process. 147, 106730 (2022).
Omar, Ö. H., Del Cueto, M., Nematiaram, T. & Troisi, A. High-throughput virtual screening for organic electronics: a comparative study of alternative strategies. J. Mater. Chem. C 9, 13557–13583 (2021).
Das, S., Bhauriyal, P. & Pathak, B. Polycyclic aromatic hydrocarbons as prospective cathodes for aluminum organic batteries. J. Phys. Chem. C 125, 49–57 (2020).
Weiss, T., Wahab, A., Bronstein, A. M. & Gershoni-Poranne, R. Interpretable deep-learning unveils structure–property relationships in polybenzenoid hydrocarbons. J. Organic Chem. https://doi.org/10.1021/acs.joc.2c02381 (2023).
Wahab, A., Pfuderer, L., Paenurk, E. & Gershoni-Poranne, R. The COMPAS project: a computational database of polycyclic aromatic systems. Phase 1: cata-condensed polybenzenoid hydrocarbons. J. Chem. Inf. Model. 62, 3704–3713 (2022).
Landrum, G. et al. RDKit: A Software Suite for Cheminformatics, Computational Chemistry, and Predictive Modeling (RDKit, 2013).
Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 1–14 (2017).
Gao, W., Fu, T., Sun, J. & Coley, C. Sample efficiency matters: a benchmark for practical molecular optimization. In Proc. 36th Conference on Neural Information Processing Systems 21342–21357 (Curran Associates Inc., Red Hook, 2022).
Gebauer, N., Gastegger, M. & Schütt, K. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In Proc. 33rd Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, 2019).
Schilter, O., Vaucher, A., Schwaller, P. & Laino, T. Designing catalysts with deep generative models and computational data. A case study for Suzuki cross coupling reactions. Digit. Discov. 2, 728–735 (2023).
Westermayr, J., Gilkes, J., Barrett, R. & Maurer, R. J. High-throughput property-driven generative design of functional organic molecules. Nat. Comput. Sci. 3, 139–148 (2023).
Bao, F. et al. Equivariant energy-guided SDE for inverse molecular design. Preprint at https://arxiv.org/abs/2209.15408 (2022).
Fite, S., Wahab, A., Paenurk, E., Gross, Z. & Gershoni-Poranne, R. Text-based representations with interpretable machine learning reveal structure–property relationships of polybenzenoid hydrocarbons. J. Phys. Org. Chem. 36, e4458 (2022).
Gidron, O., Dadvand, A., Sheynin, Y., Bendikov, M. & Perepichka, D. F. Towards ‘green’ electronic materials. α-Oligofurans as semiconductors. Chem. Commun. 47, 1976–1978 (2011).
Gidron, O. & Bendikov, M. α-Oligofurans: an emerging class of conjugated oligomers for organic electronics. Angew. Chem. Int. Ed. 53, 2546–2555 (2014).
Li, X.-H. et al. Narrow-bandgap materials for optoelectronics applications. Front. Phys. 17, 1–33 (2022).
Agnoli, S. & Favaro, M. Doping graphene with boron: a review of synthesis methods, physicochemical characterization, and emerging applications. J. Mater. Chem. A 4, 5002–5025 (2016).
Kahan, R. J., Hirunpinyopas, W., Cid, J., Ingleson, M. J. & Dryfe, R. A. Well-defined boron/nitrogen-doped polycyclic aromatic hydrocarbons are active electrocatalysts for the oxygen reduction reaction. Chem. Mater. 31, 1891–1898 (2019).
Stoycheva, J. et al. Boron-doped polycyclic aromatic hydrocarbons: a molecular set revealing the interplay between topology and singlet fission propensity. J. Phys. Chem. Lett. 11, 1390–1396 (2020).
Kothavale, S. S. & Lee, J. Y. Three- and four-coordinate, boron-based, thermally activated delayed fluorescent emitters. Adv. Optical Mater. 8, 2000922 (2020).
Brinkmann, G., Grothaus, C. & Gutman, I. Fusenes and benzenoids with perfect matchings. J. Math. Chem. 42, 909–924 (2007).
Grimme, S., Bannwarth, C. & Shushkov, P. A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). J. Chem. Theory Comput. 13, 1989–2009 (2017).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB—an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
SMARTS—A Language for Describing Molecular Patterns (Daylight Chemical Information Systems, 2007).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inform. Comput. Sci. 28, 31–36 (1988).
Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D. & Pletnev, I. InChI—the worldwide chemical structure identifier standard. J. Cheminform. 5, 1–9 (2013).
Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inform. Model. 55, 2562–2574 (2015).
Rappé, A. K., Casewit, C. J., Colwell, K., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).
Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proc. 38th International Conference on Machine Learning 9323–9332 (ML Research Press, Cambridge, 2021).
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proc. 32nd International Conference on Machine Learning 2256–2265 (ML Research Press, Cambridge, 2015).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Weiss, T., Mayo-Yanes, E., Chakraborty, S. & Gershoni-Poranne, R. PASs molecular dataset. Zenodo https://doi.org/10.5281/zenodo.7798697 (2023).
Weiss, T. GaUDI—2/9/2023. Zenodo https://doi.org/10.5281/zenodo.8311764 (2023).
Acknowledgements
We thank A. Wahab (ETH Zurich) for assistance with implementing the RDKit validity code and for proofreading the paper. We also thank A. Tsybizova (ETH Zurich) for proofreading and for providing helpful comments on the clarity of the text. We gratefully acknowledge P. Chen (ETH Zurich) for his scientific support and mentorship. E.M.Y., S.C. and R.G.P. are grateful for the financial support of the Branco Weiss Fellowship (awarded to R.G.P.). R.G.P. is a Branco Weiss Fellow and a Horev Fellow. A.M.B. and T.W. were partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 863839) and by the Council For Higher Education - Planning & Budgeting Committee. L.C. is supported by the IRIDE grant from DAIS, Ca’ Foscari University of Venice.
Author information
Contributions
R.G.P. and A.M.B. conceived the original idea and designed and supervised the research project. T.W., L.C. and A.M.B. designed the generative and predictive models. T.W. wrote the code and trained the models. E.M.Y. and S.C. performed the quantum chemistry calculations. E.M.Y., S.C. and R.G.P. performed the dataset curation. T.W. and R.G.P. wrote the paper with the help of the other authors. The paper reflects the contributions of all authors.
Ethics declarations
Competing interests
All other authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Ganna Gryn’ova, Rocío Mercado, Rostislav Fedorov and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–8 and Discussion.
Source data
Source Data Fig. 2
Numerical source data for data distribution.
Source Data Fig. 3
Numerical source data for data distribution.
Source Data Fig. 4
Numerical source data for data distribution.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Weiss, T., Mayo Yanes, E., Chakraborty, S. et al. Guided diffusion for inverse molecular design. Nat Comput Sci 3, 873–882 (2023). https://doi.org/10.1038/s43588-023-00532-0
DOI: https://doi.org/10.1038/s43588-023-00532-0
This article is cited by
- COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems. Scientific Data (2024)
- Crafting molecular architectures with guided diffusion. Nature Computational Science (2023)