High-throughput property-driven generative design of functional organic molecules

A preprint version of the article is available at arXiv.

Abstract

The design of molecules and materials with tailored properties is challenging, as candidate molecules must satisfy multiple competing requirements that are often difficult to measure or compute. While molecular structures produced through generative deep learning will satisfy these patterns, they often only possess specific target properties by chance and not by design, which makes molecular discovery via this route inefficient. In this work, we predict molecules with (Pareto-)optimal properties by combining a generative deep learning model that predicts three-dimensional conformations of molecules with a supervised deep learning model that takes these as inputs and predicts their electronic structure. Optimization of (multiple) molecular properties is achieved by screening newly generated molecules for desirable electronic properties and reusing hit molecules to retrain the generative model with a bias. The approach is demonstrated to find optimal molecules for organic electronics applications. Our method is generally applicable and eliminates the need for quantum chemical calculations during predictions, making it suitable for high-throughput screening in materials and catalyst design.
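The screening step for multiple properties can be illustrated with a small Pareto filter. The following is a minimal sketch rather than the authors' implementation: the molecule labels, the two objectives (both assumed to be minimized) and all numbers are hypothetical, and the helper names `dominates` and `pareto_front` are introduced here for illustration only.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Tuple[str, Tuple[float, ...]]]) -> List[str]:
    """Return the labels of candidates that no other candidate dominates."""
    front = []
    for label, props in candidates:
        if not any(dominates(other, props) for _, other in candidates):
            front.append(label)
    return front


# Hypothetical predicted properties (made-up values, not data from the paper).
predicted = [
    ("mol_a", (1.0, 4.0)),
    ("mol_b", (2.0, 2.0)),
    ("mol_c", (3.0, 3.0)),  # dominated by mol_b in both objectives
    ("mol_d", (4.0, 1.0)),
]

# The non-dominated molecules would be the "hits" reused for retraining.
hits = pareto_front(predicted)
print(hits)
```

In a biasing loop of the kind described above, the selected hits would be passed back to the generative model as new training data. That retraining step is not shown here.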

Fig. 1: Workflow of the proposed method and distribution of molecules in the dataset.
Fig. 2: Distribution of electronic properties and structures of generated molecules.
Fig. 3: Cluster analysis for molecules with small ΔE.
Fig. 4: Multiproperty biasing.

Data availability

The OE62 dataset is available from ref. 24, and the OE62 + 340k G-SchNet molecule dataset is available on figshare at https://figshare.com/articles/dataset/G-SchNet_for_OE62/20146943 (ref. 64). The quantum chemistry calculations carried out in this study are available on NOMAD under DOI 10.17172/NOMAD/2022.07.02-1 (ref. 65). A supplementary data file listing the number of molecules predicted and used for training in each experiment and each loop is included as Supplementary Data 1.

Code availability

The modified G-SchNet version is available on GitHub (https://github.com/rhyan10/G-SchNetOE62) and tagged as version v0.1 (minted version under DOI 10.5281/zenodo.7430248; ref. 66). The GitHub repository includes scripts to analyze the data and carry out PCA. SchNet + H is published in ref. 23 and available at http://www.github.com/schnarc (minted version under DOI 10.5281/zenodo.7424017; ref. 67). We include a tutorial for using the SchNet + H and G-SchNet models for OE62 on figshare (https://figshare.com/articles/dataset/G-SchNet_for_OE62/20146943), including installation instructions (ref. 64). Original tutorials for training and using G-SchNet and SchNet + H are available on GitHub with the original code of G-SchNet (https://github.com/atomistic-machine-learning/G-SchNet; ref. 3) and SchNarc (https://github.com/schnarc/SchNarc/tree/develop; ref. 68), respectively.

References

  1. Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).

  2. Bilodeau, C., Jin, W., Jaakkola, T., Barzilay, R. & Jensen, K. F. Generative models for molecular discovery: recent advances and challenges. WIRES Comput. Mol. Sci. 12, e1608 (2022).

  3. Gebauer, N. W. A., Gastegger, M. & Schütt, K. T. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. Adv. Neural Inf. Process. Syst. 32 (2019).

  4. Tkatchenko, A. Machine learning for chemical discovery. Nat. Commun. 11, 4125 (2020).

  5. Coley, C. W. Defining and exploring chemical spaces. Trends Chem. 3, 133–145 (2021).

  6. Wu, T. C. et al. A materials acceleration platform for organic laser discovery. Adv. Mater. https://doi.org/10.1002/adma.202207070 (2022).

  7. Gryn’ova, G., Lin, K.-H. & Corminboeuf, C. Read between the molecules: computational insights into organic semiconductors. J. Am. Chem. Soc. 140, 16370–16386 (2018).

  8. Li, X.-H. et al. Narrow-bandgap materials for optoelectronics applications. Front. Phys. 17, 13304 (2022).

  9. Xue, D. et al. Advances and challenges in deep generative models for de novo molecule generation. WIRES Comput. Mol. Sci. 9, e1395 (2019).

  10. Meyers, J., Fabian, B. & Brown, N. De novo molecular design and generative models. Drug Discov. Today 26, 2707–2715 (2021).

  11. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

  12. Gebauer, N. W. A., Gastegger, M., Hessmann, S. S. P., Müller, K.-R. & Schütt, K. T. Inverse design of 3D molecular structures with conditional generative neural networks. Nat. Commun. 13, 973 (2022).

  13. Li, Y., Pei, J. & Lai, L. Structure-based de novo drug design using 3D deep generative models. Chem. Sci. 12, 13664–13675 (2021).

  14. Elton, D. C., Boukouvalas, Z., Fuge, M. D. & Chung, P. W. Deep learning for molecular design—a review of the state of the art. Mol. Syst. Des. Eng. 4, 828–849 (2019).

  15. Tan, X. et al. Automated design and optimization of multitarget schizophrenia drug candidates by deep learning. Eur. J. Med. Chem. 204, 112572 (2020).

  16. Sumita, M., Yang, X., Ishihara, S., Tamura, R. & Tsuda, K. Hunting for organic molecules with artificial intelligence: molecules optimized for desired excitation energies. ACS Cent. Sci. 4, 1126–1133 (2018).

  17. Bilodeau, C. et al. Generating molecules with optimized aqueous solubility using iterative graph translation. React. Chem. Eng. 7, 297–309 (2022).

  18. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).

  19. Simm, G. N. & Hernández-Lobato, J. M. A generative model for molecular distance geometry. In Proc. 37th International Conference on Machine Learning 8949–8958 (JMLR.org, 2020).

  20. Xu, M., Luo, S., Bengio, Y., Peng, J. & Tang, J. Learning neural generative dynamics for molecular conformation generation. Preprint at https://arxiv.org/abs/2102.10240 (2021).

  21. Axelrod, S. & Gómez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 9, 185 (2022).

  22. Ganea, O. et al. GeoMol: torsional geometric generation of molecular 3D conformer ensembles. Adv. Neural Inf. Process. Syst. 34 (2021).

  23. Westermayr, J. & Maurer, R. J. Physically inspired deep learning of molecular excitations and photoemission spectra. Chem. Sci. 12, 10755–10764 (2021).

  24. Stuke, A. et al. Atomic structures and orbital energies of 61,489 crystal-forming organic molecules. Sci. Data 7, 58 (2020).

  25. Golze, D., Dvorak, M. & Rinke, P. The GW compendium: a practical guide to theoretical photoemission spectroscopy. Front. Chem. 7, 377 (2019).

  26. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. SCScore: synthetic complexity learned from a reaction corpus. J. Chem. Inf. Model. 58, 252–261 (2018).

  27. Brown, N., Fiscato, M., Segler, M. H. S. & Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).

  28. Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Big data meets quantum chemistry approximations: the Δ-machine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).

  29. Lawson, A. J., Swienty-Busch, J., Géoui, T. & Evans, D. in The Future of the History of Chemical Information ACS Symposium Series Vol. 1164, 127–148 (American Chemical Society, 2014).

  30. Joshi, R. P. et al. 3D-Scaffold: a deep learning framework to generate 3D coordinates of drug-like molecules with desired scaffolds. J. Phys. Chem. B 125, 12166–12176 (2021).

  31. Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Phys. Rev. B 87, 184115 (2013).

  32. Zhang, T., Ramakrishnan, R. & Livny, M. BIRCH: a new data clustering algorithm and its applications. Data Min. Knowl. Discov. 1, 141–182 (1997).

  33. Schubert, E., Sander, J., Ester, M., Kriegel, H. P. & Xu, X. DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 42, 19 (2017).

  34. Liotta, D. & Monahan, R. Selenium in organic synthesis. Science 231, 356–361 (1986).

  35. Wilbraham, L., Smajli, D., Heath-Apostolopoulos, I. & Zwijnenburg, M. A. Mapping the optoelectronic property space of small aromatic molecules. Commun. Chem. 3, 14 (2020).

  36. Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).

  37. Bendikov, M., Wudl, F. & Perepichka, D. F. Tetrathiafulvalenes, oligoacenenes, and their buckminsterfullerene derivatives: the brick and mortar of organic electronics. Chem. Rev. 104, 4891–4946 (2004).

  38. Hu, Y., Chaitanya, K., Yin, J. & Ju, X.-H. Theoretical investigation on the crystal structures and electron transfer properties of cyanated TTPO and their selenium analogs. J. Mater. Sci. 51, 6235–6248 (2016).

  39. Ferri, N. et al. Hemilabile ligands as mechanosensitive electrode contacts for molecular electronics. Ang. Chem. Int. Ed. 58, 16583–16589 (2019).

  40. Manzoor, F. et al. Theoretical calculations of the optical and electronic properties of dithienosilole- and dithiophene-based donor materials for organic solar cells. Chem. Sel. 3, 1593–1601 (2018).

  41. Li, Y., Liu, J., Liu, D., Li, X. & Xu, Y. D–A–π–A based organic dyes for efficient DSSCs: a theoretical study on the role of π-spacer. Comput. Mater. Sci. 161, 163–176 (2019).

  42. Kim, T. H. & Kim, K. S. Acridine derivative and organic electroluminescence device comprising the same. South Korea patent KR101120892B1 (2009).

  43. Seifermann, S. & Choné, R. Organic molecules, in particular for use in optoelectronic devices. Europe patent EP3916072 (2018).

  44. Sharma, V. K., Sohn, M. & McDonald, T. J. in Advances in Water Purification Techniques (ed. Ahuja, S.) 203–218 (Elsevier, 2019).

  45. Fordyce, F. M. in Essentials of Medical Geology: Revised Edition (ed. Selinus, O.) 375–416 (Springer, 2013).

  46. Landrum, G. RDKit: Open-Source Cheminformatics (2006); https://www.rdkit.org/

  47. Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. Comput. Phys. Commun. 180, 2175–2196 (2009).

  48. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).

  49. Tkatchenko, A. & Scheffler, M. Accurate molecular van der Waals interactions from ground-state electron density and free-atom reference data. Phys. Rev. Lett. 102, 073005 (2009).

  50. Adamo, C. & Barone, V. Toward reliable density functional methods without adjustable parameters: the PBE0 model. J. Chem. Phys. 110, 6158–6170 (1999).

  51. Perdew, J. P., Ernzerhof, M. & Burke, K. Rationale for mixing exact exchange with density functional approximations. J. Chem. Phys. 105, 9982–9985 (1996).

  52. Ren, X. et al. Resolution-of-identity approach to Hartree–Fock, hybrid density functionals, RPA, MP2 and GW with numeric atom-centered orbital basis functions. New J. Phys. 14, 053020 (2012).

  53. Weigend, F. & Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: design and assessment of accuracy. Phys. Chem. Chem. Phys. 7, 3297–3305 (2005).

  54. van Setten, M. J. et al. GW100: benchmarking G0W0 for molecular systems. J. Chem. Theory Comput. 11, 5665–5687 (2015).

  55. Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).

  56. Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).

  57. Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet—a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).

  58. Schütt, K. T. et al. SchNetPack: a deep learning toolbox for atomistic systems. J. Chem. Theory Comput. 15, 448–455 (2019).

  59. Reining, L. The GW approximation: content, successes and limitations. WIRES Comput. Mol. Sci. 8, e1344 (2018).

  60. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inform. Comput. Sci. 28, 31–36 (1988).

  61. O’Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminform. 3, 33 (2011).

  62. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  63. Baldi, P. & Nasr, R. When is chemical similarity significant? The statistical distribution of chemical similarity scores and its extreme values. J. Chem. Inf. Model. 50, 1205–1222 (2010).

  64. Westermayr, J., Barrett, R., Gilkes, J. & Maurer, R. J. G-SchNet for OE62. Figshare https://doi.org/10.6084/m9.figshare.20146943.v2 (2022).

  65. Westermayr, J. & Maurer, R. J. Organic molecules from generative autoregressive models. NOMAD https://doi.org/10.17172/NOMAD/2022.07.02-1 (2022).

  66. Westermayr, J. & Barrett, R. G-Schnet for OE62 dataset (v0.1). Zenodo https://doi.org/10.5281/zenodo.7430248 (2022).

  67. Westermayr, J. SchNarc for SchNet + H. Zenodo https://doi.org/10.5281/zenodo.7424017 (2021).

  68. Westermayr, J., Gastegger, M. & Marquetand, P. Combining SchNet and SHARC: the SchNarc machine learning approach for excited-state dynamics. J. Phys. Chem. Lett. 11, 3828–3834 (2020).

Acknowledgements

This work was funded by the Austrian Science Fund (FWF; J 4522-N) (J.W.), the EPSRC Centre for Doctoral Training in Modelling of Heterogeneous Systems (EP/S022848/1) (R.J.M.), the EPSRC-funded Network+ on Artificial and Augmented Intelligence for Automated Scientific Discovery (EP/S000356/10) (R.J.M.) and the UKRI Future Leaders Fellowship program (MR/S016023/1) (R.J.M.). Computational resources have been provided by the Scientific Computing Research Technology Platform of the University of Warwick, the EPSRC-funded Northern Ireland High Performance Computing service (EP/T022175/1) via access to Kelvin2, the EPSRC-funded HPC Midlands+ computing service (EP/P020232/1) via access to Athena and Sulis and the EPSRC-funded High End Computing Materials Chemistry Consortium (EP/R029431/1) for access to the ARCHER2 UK National Supercomputing Service (https://www.archer2.ac.uk). We thank N. Gebauer (TU Berlin) for fruitful discussions on the G-SchNet model. For the purpose of open access, we have applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising from this submission.

Author information

Contributions

R.J.M. conceived the original idea and supervised the research project. R.J.M. and J.W. designed the research project. R.B. and J.W. trained the deep learning models and created the property-guided design workflow. J.G. and J.W. performed the dataset curation, predictions, model validation and data analysis. J.W. performed the quantum chemistry calculations. J.W. and R.J.M. wrote the manuscript with the help of the other authors. The manuscript reflects the contributions of all authors.

Corresponding authors

Correspondence to Julia Westermayr or Reinhard J. Maurer.

Ethics declarations

Competing interests

R.J.M. is an editorial board member of the journal Communications Materials. All other authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Camille Bilodeau and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–9, Figs. 1–11 and Table 1.

Supplementary Data

Number of molecules predicted and used for training. Number of molecules used initially, obtained either from OE62 (initial loop), from OE62 + G-SchNet (initial loop for multiproperty biasing) or from G-SchNet alone (remaining loops). Molecules that were generated with G-SchNet are already sorted, hence the number of valid molecules is shown. The number of generated molecules was set to 200,000 for EA, ΔE and multiproperty biasing and to 100,000 for IP and ΔE (knockout) biasing. The third column shows the number of molecules that were selected for biasing G-SchNet. The fourth column shows the percentage of selected molecules with respect to the number of predicted molecules at this iteration.

Source data

Source Data for all Figures

Data depicted in Figs. 1–4.

Source Data Fig. 3

ChemDraw file of molecules depicted in Fig. 3.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Westermayr, J., Gilkes, J., Barrett, R. et al. High-throughput property-driven generative design of functional organic molecules. Nat Comput Sci 3, 139–148 (2023). https://doi.org/10.1038/s43588-022-00391-1

  • DOI: https://doi.org/10.1038/s43588-022-00391-1
