A generalized-template-based graph neural network for accurate organic reactivity prediction

Chen, Shuan; Jung, Yousung

doi:10.1038/s42256-022-00526-z

Article
Published: 15 September 2022

Nature Machine Intelligence volume 4, pages 772–780 (2022)

An Author Correction to this article was published on 15 November 2022

This article has been updated

Abstract

The reliable prediction of chemical reactivity remains in the realm of knowledgeable synthetic chemists. Automating this process by using artificial intelligence could accelerate synthesis design in future digital laboratories. While several machine learning approaches have demonstrated promising results, most current models deviate from how human chemists analyse and predict reactions based on electronic changes. Here, we propose a chemistry-motivated graph neural network called LocalTransform, which learns organic reactivity based on generalized reaction templates to describe the net changes in electron configuration between the reactants and products. The proposed concept dramatically reduces the number of reaction rules and exhibits state-of-the-art product prediction accuracy. In addition to the built-in interpretability of the generalized reaction templates, the high score–accuracy correlation of the model allows users to assess the uncertainty of the machine predictions.

**Fig. 1: The extraction process and examples of GRT.**

**Fig. 2: The overall prediction pipeline of LocalTransform.**

**Fig. 3: The top-1 exact match accuracy and the percentage of reactions as a function of prediction score.**

**Fig. 4: Six examples with high prediction score but ‘incorrect’ predictions by LocalTransform compared with the ground-truth product.**

**Fig. 5: Performance of LocalTransform on the human benchmark dataset.**
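
The quantity summarized in the Fig. 3 caption, accuracy as a function of prediction score, can be illustrated with a minimal sketch. The code below is not taken from the LocalTransform repository: the function name `accuracy_by_score`, the choice of five equal-width bins and the toy data are all assumptions made for illustration. It groups predictions by score and reports, for each bin, the top-1 exact-match accuracy and the share of reactions that fall in that bin.

```python
from collections import defaultdict


def accuracy_by_score(scores, correct, n_bins=5):
    """Bin predictions by score in [0, 1].

    Returns {bin_index: (top1_accuracy, fraction_of_reactions)}.
    Hypothetical helper for illustration; not part of LocalTransform.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    hits = defaultdict(int)
    counts = defaultdict(int)
    for s, c in zip(scores, correct):
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += int(bool(c))
    total = len(scores)
    return {b: (hits[b] / counts[b], counts[b] / total) for b in sorted(counts)}


# Toy data, invented for this sketch; these are not results from the paper.
toy_scores = [0.95, 0.90, 0.85, 0.55, 0.45, 0.10]
toy_correct = [True, True, True, True, False, False]
stats = accuracy_by_score(toy_scores, toy_correct)
```

If accuracy rises with the score bin, as it does in this toy data, a user could treat a low score as a signal that the predicted product deserves closer inspection.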

Data availability

The USPTO-480k dataset used in this manuscript is publicly available at https://github.com/wengong-jin/nips17-rexgen¹³. The source data used for each figure can be found at https://github.com/kaist-amsg/LocalTransform/releases/tag/raw_data⁴⁰.

Code availability

The code for LocalTransform described in this manuscript is publicly available at https://github.com/kaist-amsg/LocalTransform⁴⁰.

Change history

15 November 2022
A Correction to this paper has been published: https://doi.org/10.1038/s42256-022-00586-1

References

1. Engkvist, O. et al. Computational prediction of chemical reactions: current status and outlook. Drug Discov. Today 23, 1203–1218 (2018).
2. de Almeida, A. F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat. Rev. Chem. 3, 589–604 (2019).
3. Struble, T. J. et al. Current and future roles of artificial intelligence in medicinal chemistry synthesis. J. Med. Chem. 63, 8667–8682 (2020).
4. Jorner, K., Tomberg, A., Bauer, C., Sköld, C. & Norrby, P.-O. Organic reactivity from mechanism to machine learning. Nat. Rev. Chem. 5, 240–255 (2021).
5. Wei, J. N., Duvenaud, D. & Aspuru-Guzik, A. Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci. 2, 725–732 (2016).
6. Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).
7. Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).
8. Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. ‘Found in translation’: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).
9. Schwaller, P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).
10. Tetko, I. V., Karpov, P., Van Deursen, R. & Godin, G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 11, 5575 (2020).
11. Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn.: Sci. Technol. 3, 015022 (2022).
12. Kayala, M. & Baldi, P. A. in Advances in Neural Information Processing Systems vol. 24 (NeurIPS, 2011).
13. Jin, W., Coley, C., Barzilay, R. & Jaakkola, T. in Advances in Neural Information Processing Systems vol. 30 (NeurIPS, 2017).
14. Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).
15. Bradshaw, J., Kusner, M. J., Paige, B., Segler, M. H. S. & Hernández-Lobato, J. M. A generative model for electron paths. In Int. Conf. for Learning Representations. (ICLR, 2019).
16. Do, K., Tran, T. & Venkatesh, S. Graph Transformation Policy Network for Chemical Reaction Prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 750–760 (ACM, 2019).
17. Sacha, M. et al. Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. J. Chem. Inf. Model. 61, 3273–3284 (2021).
18. Qian, W. W. et al. Integrating deep neural networks and symbolic inference for organic reactivity prediction. Preprint at https://doi.org/10.26434/chemrxiv.11659563.v1 (2020).
19. Bi, H. et al. Non-autoregressive electron redistribution modeling for reaction prediction. In Proceedings of the 38th International Conference on Machine Learning. (PMLR, 2021).
20. Lowe, D. M. Extraction of chemical structures and reactions from the literature. Thesis, University of Cambridge (2012).
21. Tu, Z. & Coley, C. W. Permutation Invariant Graph-to-Sequence Model for Template-Free Retrosynthesis and Reaction Prediction. J. Chem. Inf. Model. 62, 3503–3513 (2022).
22. Li, M. et al. DGL-LifeSci: an open-source toolkit for deep learning on graphs in life science. ACS Omega 6, 27233–27238 (2021).
23. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (PMLR, 2017).
24. Indigo toolkit. https://lifescience.opensource.epam.com/indigo/ EPAM [accessed 23 Aug 2022].
25. Jaworski, W. et al. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 10, 1434 (2019).
26. Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 7, eabe4166 (2021).
27. Toniato, A., Schwaller, P., Cardinale, A., Geluykens, J. & Laino, T. Unassisted noise reduction of chemical reaction datasets. Nat. Mach. Intell. 3, 485–494 (2021).
28. Kearnes, S. M. et al. The Open Reaction Database. J. Am. Chem. Soc. 143, 18820–18826 (2021).
29. Coley, C. W., Green, W. H. & Jensen, K. F. RDChiral: an RDKit wrapper for handling stereochemistry in retrosynthetic template extraction and application. J. Chem. Inf. Model. 59, 2529–2537 (2019).
30. Pesciullesi, G., Schwaller, P., Laino, T. & Reymond, J.-L. Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates. Nat. Commun. 11, 4874 (2020).
31. Pattanaik, L. et al. Message passing networks for molecules with tetrahedral chirality. Preprint at https://arxiv.org/abs/2012.00094 (2020).
32. Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).
33. Li, Y., Tarlow, D., Brockschmidt, M. & Zemel, R. Gated graph sequence neural networks. In Int. Conf. for Learning Representations (ICLR, 2016).
34. Vaswani, A. et al. in Advances in Neural Information Processing Systems, pp 6000–6010 (NeurIPS, 2017).
35. Shaw, P., Uszkoreit, J. & Vaswani, A. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT, 2018).
36. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
37. RDKit: Open-source cheminformatics; http://www.rdkit.org [accessed 23 Aug 2022].
38. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inform. Process. Syst. 32, 8026–8037 (2019).
39. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In 3rd Int. Conf. for Learning Representations (ICLR, 2017).
40. Chen, S. kaist-amsg/LocalTransform: v1.0.0. Zenodo https://doi.org/10.5281/zenodo.6536406 (2022).
41. Chen, S. & Jung, Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1, 1612–1620 (2021).
42. Yasuma, T. & Negoro, N. Condensed ring compound. US patent 7820837B2 (2010).
43. Jensen, A. et al. Compounds. US patent 20080039450 (2008).
44. Yamada, A. et al. N-coating heterocyclic compounds. US patent 20030176454 (2003).

Acknowledgements

This work was supported by the Technology Innovation Program (20015850) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea) and the National Research Foundation of Korea (2019M3D3A1A01069099, 2019M3D1A1079303, 2021R1A5A1030054).

Author information

Authors and Affiliations

Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
Shuan Chen & Yousung Jung
Graduate School of AI, KAIST, Daejeon, South Korea
Yousung Jung

Contributions

S.C. and Y.J. conceived the project. S.C. designed the methods and performed the computational experiments and analyses. S.C. and Y.J. discussed the results and wrote the manuscript. Y.J. supervised the project.

Corresponding author

Correspondence to Yousung Jung.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Sections 1–8, containing Supplementary Figs. 1–9, Tables 1–4 and discussion.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, S., Jung, Y. A generalized-template-based graph neural network for accurate organic reactivity prediction. Nat Mach Intell 4, 772–780 (2022). https://doi.org/10.1038/s42256-022-00526-z

Received: 09 February 2022
Accepted: 26 July 2022
Published: 15 September 2022
Issue Date: September 2022
DOI: https://doi.org/10.1038/s42256-022-00526-z
