Guided diffusion for inverse molecular design

A preprint version of the article is available at ChemRxiv.

Abstract

The holy grail of materials science is de novo molecular design, meaning engineering molecules with desired characteristics. The introduction of generative deep learning has greatly advanced efforts in this direction, yet molecular discovery remains challenging and often inefficient. Herein we introduce GaUDI, a guided diffusion model for inverse molecular design that combines an equivariant graph neural net for property prediction and a generative diffusion model. We demonstrate GaUDI’s effectiveness in designing molecules for organic electronic applications by using single- and multiple-objective tasks applied to a generated dataset of 475,000 polycyclic aromatic systems. GaUDI shows improved conditional design, generating molecules with optimal properties and even going beyond the original distribution to suggest better molecules than those in the dataset. In addition to point-wise targets, GaUDI can also be guided toward open-ended targets (for example, a minimum or maximum) and in all cases achieves close to 100% validity of generated molecules.
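As a rough illustration of how a property predictor can steer a diffusion sampler toward point-wise or open-ended targets, the following minimal sketch shifts a single denoising step along the gradient of a target loss, in the spirit of classifier guidance (ref. 23). It is not GaUDI's implementation: `toy_property`, `guided_mean`, the finite-difference gradient and all numerical values are assumptions made for illustration, and a real setup would use a trained predictor with automatic differentiation.

```python
import numpy as np


def toy_property(x):
    """Hypothetical stand-in for a trained property predictor."""
    return float(np.sum(x ** 2))


def pointwise_loss(y, target):
    """Point-wise target: penalize distance from a desired property value."""
    return (y - target) ** 2


def minimize_loss(y):
    """Open-ended target: lower property values are always better."""
    return y


def numerical_grad(fn, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function fn at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return grad


def guided_mean(mu, x, sigma2, scale, loss_of_property):
    """Shift the unguided denoising mean mu down the target-loss gradient at x."""
    grad = numerical_grad(lambda z: loss_of_property(toy_property(z)), x)
    return mu - scale * sigma2 * grad


# Toy state; the unguided mean is taken to equal x so only guidance acts.
x = np.array([1.0, 2.0])
x_point = guided_mean(x, x, sigma2=0.1, scale=0.1,
                      loss_of_property=lambda y: pointwise_loss(y, 3.0))
x_min = guided_mean(x, x, sigma2=0.1, scale=0.1,
                    loss_of_property=minimize_loss)
print(toy_property(x), toy_property(x_point), toy_property(x_min))
```

The only difference between the two guidance modes in this sketch is the loss applied to the predicted property: a squared distance for a point-wise target, or the property itself (or its negative) for a minimum (or maximum).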

Fig. 1: Generation workflow.
Fig. 2: Guided generation of cc-PBH molecules to global minimum.
Fig. 3: Guided design of PASs with high HLG values.
Fig. 4: Guided design of narrow-band-gap molecules.

Data availability

All data for cc-PBHs used in this project were obtained from the COMPAS project (ref. 33), a freely available data repository at https://gitlab.com/porannegroup/compas. All PAS data are available free of charge at https://doi.org/10.5281/zenodo.7798697 (ref. 61). Source Data are provided with this paper.

Code availability

All code used to train the models and generate molecules is provided free of charge at https://gitlab.com/porannegroup/gaudi (minted version https://doi.org/10.5281/zenodo.8311764; ref. 62). The repository also contains an original tutorial for generating GOR representations of PASs and for generating new PASs with user-defined target functions.

References

  1. Hwang, J. et al. Perovskites in catalysis and electrocatalysis. Science 358, 751–756 (2017).

  2. Bilodeau, C., Jin, W., Jaakkola, T., Barzilay, R. & Jensen, K. F. Generative models for molecular discovery: recent advances and challenges. Wiley Interdiscip. Rev. Comput. Mol. Sci. 12, e1608 (2022).

  3. Fuhr, A. S. & Sumpter, B. G. Deep generative models for materials discovery and machine learning-accelerated innovation. Front. Mater. https://doi.org/10.3389/fmats.2022.865270 (2022).

  4. Walters, W. P. & Barzilay, R. Applications of deep learning in molecule generation and molecular property prediction. Acc. Chem. Res. 54, 263–270 (2020).

  5. Anstine, D. M. & Isayev, O. Generative models as an emerging paradigm in the chemical sciences. J. Am. Chem. Soc. 145, 8736–8750 (2023).

  6. Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

  7. Shree Sowndarya, S. V. et al. Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries. Nat. Mach. Intell. 4, 720–730 (2022).

  8. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Sci. 4, 268–276 (2018).

  9. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

  10. Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inform. Model. 58, 1194–1204 (2018).

  11. Prykhodko, O. et al. A de novo molecular generation method using latent vector based generative adversarial network. J. Cheminform. 11, 1–13 (2019).

  12. Jennings, P. C., Lysgaard, S., Hummelshøj, J. S., Vegge, T. & Bligaard, T. Genetic algorithms for computational materials discovery accelerated by machine learning. npj Comput. Mater. 5, 46 (2019).

  13. Henault, E. S., Rasmussen, M. H. & Jensen, J. H. Chemical space exploration: how genetic algorithms find the needle in the haystack. PeerJ Phys. Chem. 2, e11 (2020).

  14. Jensen, J. H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 10, 3567–3572 (2019).

  15. Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. Preprint at https://arxiv.org/abs/2001.09382 (2020).

  16. Hoogeboom, E., Satorras, V. G., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. In Proc. 39th International Conference on Machine Learning 8867–8887 (ML Research Press, Cambridge, 2022).

  17. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Proc. 34th International Conference on Neural Information Processing Systems 6840–6851 (Curran Associates Inc., Red Hook, 2020).

  18. Ho, J. et al. Video diffusion models. Preprint at https://arxiv.org/abs/2204.03458 (2022).

  19. Austin, J., Johnson, D. D., Ho, J., Tarlow, D. & van den Berg, R. Structured denoising diffusion models in discrete state-spaces. In Proc. 35th Conference on Neural Information Processing Systems 17981–17993 (Curran Associates Inc., Red Hook, 2021).

  20. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. Preprint at https://arxiv.org/abs/2203.02923 (2022).

  21. Lyngby, P. & Thygesen, K. S. Data-driven discovery of 2D materials by deep generative models. npj Comput. Mater. 8, 232 (2022).

  22. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at https://arxiv.org/abs/2210.01776 (2022).

  23. Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. In Proc. 35th Conference on Neural Information Processing Systems 8780–8794 (Curran Associates Inc., Red Hook, 2021).

  24. Ho, J. & Salimans, T. Classifier-free diffusion guidance. Preprint at https://arxiv.org/abs/2207.12598 (2022).

  25. Song, Y. et al. Score-based generative modeling through stochastic differential equations. Preprint at https://arxiv.org/abs/2011.13456 (2021).

  26. Balaban, A. T., Oniciu, D. C. & Katritzky, A. R. Aromaticity as a cornerstone of heterocyclic chemistry. Chem. Rev. 104, 2777–2812 (2004).

  27. Li, Q. et al. Polycyclic aromatic hydrocarbon-based organic semiconductors: ring-closing synthesis and optoelectronic properties. J. Mater. Chem. C 10, 2411–2430 (2022).

  28. Aumaitre, C. & Morin, J.-F. Polycyclic aromatic hydrocarbons as potential building blocks for organic solar cells. Chem. Rec. 19, 1142–1154 (2019).

  29. Kilaru, S. et al. Organic materials based on hetero polycyclic aromatic hydrocarbons for organic thin-film transistor applications. Mater. Sci. Semicond. Process. 147, 106730 (2022).

  30. Omar, Ö. H., Del Cueto, M., Nematiaram, T. & Troisi, A. High-throughput virtual screening for organic electronics: a comparative study of alternative strategies. J. Mater. Chem. C 9, 13557–13583 (2021).

  31. Das, S., Bhauriyal, P. & Pathak, B. Polycyclic aromatic hydrocarbons as prospective cathodes for aluminum organic batteries. J. Phys. Chem. C 125, 49–57 (2020).

  32. Weiss, T., Wahab, A., Bronstein, A. M. & Gershoni-Poranne, R. Interpretable deep-learning unveils structure–property relationships in polybenzenoid hydrocarbons. J. Organic Chem. https://doi.org/10.1021/acs.joc.2c02381 (2023).

  33. Wahab, A., Pfuderer, L., Paenurk, E. & Gershoni-Poranne, R. The COMPAS project: a computational database of polycyclic aromatic systems. Phase 1: cata-condensed polybenzenoid hydrocarbons. J. Chem. Inf. Model. 62, 3704–3713 (2022).

  34. Landrum, G. et al. RDKit: A Software Suite for Cheminformatics, Computational Chemistry, and Predictive Modeling (RDKit, 2013).

  35. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 1–14 (2017).

  36. Gao, W., Fu, T., Sun, J. & Coley, C. Sample efficiency matters: a benchmark for practical molecular optimization. In Proc. 36th Conference on Neural Information Processing Systems 21342–21357 (Curran Associates Inc., Red Hook, 2022).

  37. Gebauer, N., Gastegger, M. & Schütt, K. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In Proc. 33rd Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, 2019).

  38. Schilter, O., Vaucher, A., Schwaller, P. & Laino, T. Designing catalysts with deep generative models and computational data. A case study for Suzuki cross coupling reactions. Digit. Discov. 2, 728–735 (2023).

  39. Westermayr, J., Gilkes, J., Barrett, R. & Maurer, R. J. High-throughput property-driven generative design of functional organic molecules. Nat. Comput. Sci. 3, 139–148 (2023).

  40. Bao, F. et al. Equivariant energy-guided SDE for inverse molecular design. Preprint at https://arxiv.org/abs/2209.15408 (2022).

  41. Fite, S., Wahab, A., Paenurk, E., Gross, Z. & Gershoni-Poranne, R. Text-based representations with interpretable machine learning reveal structure–property relationships of polybenzenoid hydrocarbons. J. Phys. Org. Chem. 36, e4458 (2022).

  42. Gidron, O., Dadvand, A., Sheynin, Y., Bendikov, M. & Perepichka, D. F. Towards ‘green’ electronic materials. α-Oligofurans as semiconductors. Chem. Commun. 47, 1976–1978 (2011).

  43. Gidron, O. & Bendikov, M. α-Oligofurans: an emerging class of conjugated oligomers for organic electronics. Angew. Chem. Int. Ed. 53, 2546–2555 (2014).

  44. Li, X.-H. et al. Narrow-bandgap materials for optoelectronics applications. Front. Phys. 17, 1–33 (2022).

  45. Agnoli, S. & Favaro, M. Doping graphene with boron: a review of synthesis methods, physicochemical characterization, and emerging applications. J. Mater. Chem. A 4, 5002–5025 (2016).

  46. Kahan, R. J., Hirunpinyopas, W., Cid, J., Ingleson, M. J. & Dryfe, R. A. Well-defined boron/nitrogen-doped polycyclic aromatic hydrocarbons are active electrocatalysts for the oxygen reduction reaction. Chem. Mater. 31, 1891–1898 (2019).

  47. Stoycheva, J. et al. Boron-doped polycyclic aromatic hydrocarbons: a molecular set revealing the interplay between topology and singlet fission propensity. J. Phys. Chem. Lett. 11, 1390–1396 (2020).

  48. Kothavale, S. S. & Lee, J. Y. Three-and four-coordinate, boron-based, thermally activated delayed fluorescent emitters. Adv. Optical Mater. 8, 2000922 (2020).

  49. Brinkmann, G., Grothaus, C. & Gutman, I. Fusenes and benzenoids with perfect matchings. J. Math. Chem. 42, 909–924 (2007).

  50. Grimme, S., Bannwarth, C. & Shushkov, P. A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). J. Chem. Theory Comput. 13, 1989–2009 (2017).

  51. Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB—an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).

  52. SMARTS - A Language for Describing Molecular Patterns (Daylight Chemical Information Systems, 2007).

  53. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inform. Comput. Sci. 28, 31–36 (1988).

  54. Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D. & Pletnev, I. InChI—the worldwide chemical structure identifier standard. J. Cheminform. 5, 1–9 (2013).

  55. Riniker, S. & Landrum, G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inform. Model. 55, 2562–2574 (2015).

  56. Rappé, A. K., Casewit, C. J., Colwell, K., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).

  57. Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).

  58. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proc. 38th International Conference on Machine Learning 9323–9332 (ML Research Press, Cambridge, 2021).

  59. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proc. 32nd International Conference on Machine Learning 2256–2265 (ML Research Press, Cambridge, 2015).

  60. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  61. Weiss, T., Mayo-Yanes, E., Chakraborty, S. & Gershoni-Poranne, R. PASs molecular dataset. Zenodo https://doi.org/10.5281/zenodo.7798697 (2023).

  62. Weiss, T. GaUDI—2/9/2023. Zenodo https://doi.org/10.5281/zenodo.8311764 (2023).

Acknowledgements

We thank A. Wahab (ETH Zurich) for assistance with implementing the RDKit validity code and for proofreading the paper. We also thank A. Tsybizova (ETH Zurich) for proofreading and for providing helpful comments on the clarity of the text. We gratefully acknowledge P. Chen (ETH Zurich) for his scientific support and mentorship. E.M.Y., S.C. and R.G.P. are grateful for the financial support of the Branco Weiss Fellowship (awarded to R.G.P.). R.G.P. is a Branco Weiss Fellow and a Horev Fellow. A.M.B. and T.W. were partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 863839) and by the Council for Higher Education - Planning & Budgeting Committee. L.C. is supported by the IRIDE grant from DAIS, Ca’ Foscari University of Venice.

Author information

Contributions

R.G.P. and A.M.B. conceived the original idea and designed and supervised the research project. T.W., L.C. and A.M.B. designed the generative and predictive models. T.W. wrote the code and trained the models. E.M.Y. and S.C. performed the quantum chemistry calculations. E.M.Y., S.C. and R.G.P. performed the dataset curation. T.W. and R.G.P. wrote the paper with the help of the other authors. The paper reflects the contributions of all authors.

Corresponding authors

Correspondence to Alex M. Bronstein or Renana Gershoni-Poranne.

Ethics declarations

Competing interests

All other authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Ganna Gryn’ova, Rocío Mercado, Rostislav Fedorov and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–8 and Discussion.

Peer Review File

Source data

Source Data Fig. 2

Numerical source data for data distribution.

Source Data Fig. 3

Numerical source data for data distribution.

Source Data Fig. 4

Numerical source data for data distribution.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Weiss, T., Mayo Yanes, E., Chakraborty, S. et al. Guided diffusion for inverse molecular design. Nat Comput Sci 3, 873–882 (2023). https://doi.org/10.1038/s43588-023-00532-0

  • DOI: https://doi.org/10.1038/s43588-023-00532-0
