Organic chemistry has largely been conducted in an ad hoc manner by academic laboratories that are funded by grants directed towards the investigation of specific goals or hypotheses. Although modern synthetic methods can provide access to molecules of considerable complexity, predicting the outcome of a single chemical reaction remains a major challenge. Improvements in the prediction of ‘above-the-arrow’ reaction conditions are needed to enable intelligent decision making to select an optimal synthetic sequence that is guided by metrics including efficiency, quality and yield. Methods for the communication and the sharing of data will need to evolve from traditional tools to machine-readable formats and open collaborative frameworks. This will accelerate innovation and require the creation of a chemistry commons with standardized data handling, curation and metrics.
Nature thanks Ian Churcher, Jacob Janey and the other anonymous reviewer(s) for their contribution to the peer review of this work.