Article

Planning chemical syntheses with deep neural networks and symbolic AI

  • Nature volume 555, pages 604–610 (29 March 2018)
  • doi:10.1038/nature25978

Abstract

To plan the syntheses of small organic molecules, chemists use retrosynthesis, a problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors. Computer-aided retrosynthesis would be a valuable tool but at present it is slow and provides results of unsatisfactory quality. Here we use Monte Carlo tree search and symbolic artificial intelligence (AI) to discover retrosynthetic routes. We combined Monte Carlo tree search with an expansion policy network that guides the search, and a filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on essentially all reactions ever published in organic chemistry. Our system solves for almost twice as many molecules, thirty times faster than the traditional computer-aided search method, which is based on extracted rules and hand-designed heuristics. In a double-blind AB test, chemists on average considered our computer-generated routes to be equivalent to reported literature routes.
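
The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained expansion policy and filter networks are replaced by a hand-written toy rule table, molecules are plain strings, and every name (BUILDING_BLOCKS, TOY_RULES, expansion_policy, in_scope, Node, search, best_route) is hypothetical. The selection formula is a generic PUCT-style rule chosen for illustration, and the sketch scores a position only by whether it is already solved.

```python
import math

# Hypothetical toy data standing in for a building-block catalogue and the two networks.
BUILDING_BLOCKS = {"A", "B", "C"}
# molecule -> list of (precursors, policy prior, in-scope filter score)
TOY_RULES = {
    "T": [(("X", "C"), 0.6, 0.9), (("Y",), 0.3, 0.2), (("Z",), 0.1, 0.8)],
    "X": [(("A", "B"), 0.9, 0.95)],
    "Z": [(("Q",), 1.0, 0.9)],  # "Q" has no rule, so this branch is a dead end
}

def expansion_policy(molecule, top_k=3):
    """Stand-in for the expansion policy network: top-k steps ranked by prior."""
    return sorted(TOY_RULES.get(molecule, []), key=lambda rule: -rule[1])[:top_k]

def in_scope(score, threshold=0.5):
    """Stand-in for the filter network: keep steps scored as feasible."""
    return score >= threshold

class Node:
    def __init__(self, state, parent=None, prior=1.0, step=None):
        self.state = state      # frozenset of molecules that still have to be made
        self.parent = parent
        self.prior = prior
        self.step = step        # (molecule, precursors) that led to this node
        self.children = None    # None means "not expanded yet"
        self.visits = 0
        self.value = 0.0

    def solved(self):
        return not self.state   # every remaining precursor is a building block

def puct(child, c):
    """Mean reward plus a prior-weighted exploration bonus (assumed formula)."""
    q = child.value / child.visits if child.visits else 0.0
    return q + c * child.prior * math.sqrt(child.parent.visits) / (1 + child.visits)

def expand(node):
    molecule = sorted(node.state)[0]  # deterministic choice of one open molecule
    node.children = []
    for precursors, prior, score in expansion_policy(molecule):
        if not in_scope(score):
            continue                  # the filter rejects this step
        open_precursors = frozenset(precursors) - BUILDING_BLOCKS
        state = (node.state - {molecule}) | open_precursors
        node.children.append(Node(state, node, prior, (molecule, precursors)))

def search(target, iterations=50, c=1.5):
    root = Node(frozenset([target]) - BUILDING_BLOCKS)
    for _ in range(iterations):
        node = root
        while node.children:                        # selection
            node = max(node.children, key=lambda ch: puct(ch, c))
        if node.children is None and not node.solved():
            expand(node)                            # expansion
            if node.children:
                node = max(node.children, key=lambda ch: ch.prior)
        reward = 1.0 if node.solved() else 0.0      # evaluation
        while node is not None:                     # backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    return root

def best_route(root):
    """Follow the most-visited children and collect the retrosynthetic steps."""
    route, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        route.append(node.step)
    return route, node.solved()

route, solved = best_route(search("T"))
print(solved, route)
```

In this toy setting the step from T to Y is discarded by the filter before it enters the tree, and the search concentrates its visits on the branch through X, whose precursors are all building blocks. A real system would replace the rule table with learned models and a reaction-aware molecule representation.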

Acknowledgements

M.H.S.S. and M.P.W. thank the Deutsche Forschungsgemeinschaft (SFB858) for funding. M.H.S.S. and M.P.W. also thank D. Evans (RELX Intellectual Properties) and J. Swienty-Busch (Elsevier Information Systems) for the reaction dataset. We thank all AB-test participants in Shanghai and Münster, and J. Guo for assistance in AB testing. M.H.S.S. thanks M. Wiesenfeldt, the Studer group, D. Barton, S. McAnanama-Brereton, R. Vidyadharan and T. Kogej for discussions. M.P. thanks M. Winands and J. Togelius for insights.

Author information

Affiliations

  1. Institute of Organic Chemistry and Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität, Münster, Germany

    • Marwin H. S. Segler
  2. BenevolentAI, London, UK

    • Marwin H. S. Segler
  3. European Research Center for Information Systems, Westfälische Wilhelms-Universität Münster, Germany

    • Mike Preuss
  4. Department of Physics and International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, China

    • Mark P. Waller

Contributions

M.H.S.S. conceived the project; M.P.W. and M.P. contributed ideas. M.H.S.S., M.P. and M.P.W. designed the experiments. M.H.S.S. implemented the program. M.H.S.S. and M.P.W. conducted the experiments. M.P.W. supervised the project. All authors co-wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Marwin H. S. Segler or Mark P. Waller.

Reviewer Information

Nature thanks D. Duvenaud, W. H. Green and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains the DOE, route diversity analysis, and failed molecules (including Supplementary Figures 1-7, Supplementary Table 1 and Supplementary References). Two further files are available on figshare (DOI 10.6084/m9.figshare.5832054): mcts_examples.pdf, which contains routes found by the 3N-MCTS algorithm, and heuristicBFS_examples.pdf, which contains routes found by heuristic best-first search without the policy network and in-scope filter.

  2. Supplementary Information

    This file contains the AB test.

Excel files

  1. Supplementary Data

    This file contains the experiment for correlating the in-scope filter to physicochemical properties.
