The discovery of chemical reactions is an inherently unpredictable and time-consuming process [1]. An attractive alternative is to predict reactivity, although relevant approaches, such as computer-aided reaction design, are still in their infancy [2]. Reaction prediction based on high-level quantum chemical methods is complex [3], even for simple molecules. Although machine learning is powerful for data analysis [4,5], its applications in chemistry are still being developed [6]. Inspired by strategies based on chemists’ intuition [7], we propose that a reaction system controlled by a machine learning algorithm may be able to explore the space of chemical reactions quickly, especially if trained by an expert [8]. Here we present an organic synthesis robot that can perform chemical reactions and analysis faster than they can be performed manually, as well as predict the reactivity of possible reagent combinations after conducting a small number of experiments, thus effectively navigating chemical reaction space. By using machine learning for decision making, enabled by binary encoding of the chemical inputs, the reactions can be assessed in real time using nuclear magnetic resonance and infrared spectroscopy. The machine learning system was able to predict the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset. This approach was also used to calculate the reactivity of published datasets. Further, by using real-time data from our robot, these predictions were followed up manually by a chemist, leading to the discovery of four reactions.
1. Collins, K. D., Gensch, T. & Glorius, F. Contemporary screening approaches to reaction discovery and development. Nat. Chem. 6, 859–871 (2014).
2. Warr, W. A. A short review of chemical reaction database systems, computer-aided synthesis design, reaction prediction and synthetic feasibility. Mol. Inform. 33, 469–476 (2014).
3. Plata, R. E. & Singleton, D. A. A case study of the mechanism of alcohol-mediated Morita Baylis-Hillman reactions. The importance of experimental observations. J. Am. Chem. Soc. 137, 3811–3826 (2015).
4. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
5. Jordan, M. I. & Mitchell, T. M. Machine learning: trends, perspectives, and prospects. Science 349, 255–260 (2015).
6. Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).
7. Graulich, N., Hopf, H. & Schreiner, P. R. Heuristic thinking makes a chemist smart. Chem. Soc. Rev. 39, 1503–1512 (2010).
8. Gil, Y., Greaves, M., Hendler, J. & Hirsh, H. Amplify scientific discovery with artificial intelligence. Science 346, 171–172 (2014).
9. Trobe, M. & Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57, 4192–4214 (2018).
10. Ley, S. V., Fitzpatrick, D. E., Ingham, R. J. & Myers, R. M. Organic synthesis: march of the machines. Angew. Chem. Int. Ed. 54, 3449–3464 (2015).
11. Sans, V. & Cronin, L. Towards dial-a-molecule by integrating continuous flow, analytics and self-optimisation. Chem. Soc. Rev. 45, 2032–2043 (2016).
12. Houben, C. & Lapkin, A. A. Automatic discovery and optimization of chemical processes. Curr. Opin. Chem. Eng. 9, 1–7 (2015).
13. Sans, V., Porwol, L., Dragone, V. & Cronin, L. A self optimizing synthetic organic reactor system using real-time in-line NMR spectroscopy. Chem. Sci. 6, 1258–1264 (2015).
14. Dragone, V., Sans, V., Henson, A. B., Granda, J. M. & Cronin, L. An autonomous organic reaction search engine for chemical reactivity. Nat. Commun. 8, 15733 (2017).
15. Cortes, C. & Vapnik, V. Support vector networks. Mach. Learn. 20, 273–297 (1995).
16. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
17. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. 35, 1798–1828 (2013).
18. Coomans, D., Jonckheer, M., Massart, D. L., Broeckaert, I. & Blockx, P. Application of linear discriminant analysis in the diagnosis of thyroid diseases. Anal. Chim. Acta 103, 409–415 (1978).
19. Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429–434 (2018).
20. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C-N cross-coupling using machine learning. Science 360, 186–190 (2018).
21. Nielsen, M. K., Ahneman, D. T., Riera, O. & Doyle, A. G. Deoxyfluorination with sulfonyl fluorides: navigating reaction space with machine learning. J. Am. Chem. Soc. 140, 5004–5008 (2018).
22. Bajusz, D., Racz, A. & Heberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).
23. Palazzolo, A. M. E., Simons, C. L. W. & Burke, M. D. The natural productome. Proc. Natl Acad. Sci. 114, 5564–5566 (2017).
24. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
We acknowledge financial support from the EPSRC (grant numbers EP/H024107/1, EP/I033459/1, EP/J00135X/1, EP/J015156/1, EP/K021966/1, EP/K023004/1, EP/K038885/1, EP/L015668/1 and EP/L023652/1) and the ERC (project 670467 SMART-POM). J.M.G. acknowledges financial support from the Polish Ministry of Science and Higher Education (grant number 1295/MOB/IV/2015/0). We thank A. Henson for help with the Tanimoto analysis.
L.C. is the founder and director of DeepMatterGroup PLC and is listed as an inventor on a patent application filed by The University of Glasgow (GB 1810944.7).
Extended data figures and tables
The chemical inputs (1–18) used in the platform to search for new transformations and to evaluate the performance of the algorithm.
Extended Data Fig. 2 Suggested mechanisms for observed transformations and small library of compounds synthesized.
a, Suggested mechanism for the synthesis of compound 19. b, Small library of compounds synthesized. c, Suggested mechanism for the synthesis of compound 22. d, Suggested mechanism for the synthesis of compound 21.
This file contains Supplementary Tables 1–6, Supplementary Figures 1–73, hardware specification, machine learning details, characterization of new compounds, structural assignments and copies of NMR spectra.
This zipped file contains the X-ray structure of compound 20.
This zipped file contains the X-ray structure of compound 21.
This table shows an exemplary run of the LDA algorithm exploring the chemical space. The first ninety experiments were chosen randomly and the subsequent experiments were chosen using the LDA classifier. The name column contains the identity of the reaction, composed from the names of the starting materials. The reactivity column contains the reactivity assigned by the SVM classifier to a given reaction mixture.
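The exploration scheme described for this table (a random start followed by classifier-guided selection) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation. It assumes the candidate mixtures are all pairs and triples of 18 inputs, uses scikit-learn's `LinearDiscriminantAnalysis`, and picks the highest-scoring untested mixture at each step. The function `run_experiment`, its made-up reactivity rule, and the count `N_GUIDED` are hypothetical stand-ins for the robot and the SVM-based reactivity assignment.

```python
import random
from itertools import combinations

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

N_INPUTS = 18
N_RANDOM = 90   # random start, as stated in the table description
N_GUIDED = 10   # hypothetical number of classifier-guided experiments

# Assumption: candidates are all two- and three-component mixtures.
mixtures = [c for k in (2, 3) for c in combinations(range(N_INPUTS), k)]

# Binary encoding: one column per input, 1 if the input is in the mixture.
X = np.zeros((len(mixtures), N_INPUTS))
for row, mix in enumerate(mixtures):
    X[row, list(mix)] = 1.0


def run_experiment(mix):
    """Synthetic stand-in for running a reaction and assigning reactivity.

    The rule is invented for illustration only: a mixture counts as
    reactive (1) if it has an input from 0-5 and an input from 6-11.
    """
    return int(any(i < 6 for i in mix) and any(6 <= i < 12 for i in mix))


rng = random.Random(0)
tested = rng.sample(range(len(mixtures)), N_RANDOM)
labels = [run_experiment(mixtures[i]) for i in tested]

for _ in range(N_GUIDED):
    lda = LinearDiscriminantAnalysis()
    lda.fit(X[tested], labels)
    scores = lda.decision_function(X)  # higher means more likely reactive
    scores[tested] = -np.inf           # never repeat an experiment
    nxt = int(np.argmax(scores))
    tested.append(nxt)
    labels.append(run_experiment(mixtures[nxt]))

print(len(tested), "experiments run out of", len(mixtures))
```

Refitting after every experiment keeps the sketch short; a real platform might refit in batches, and other selection rules (for example, sampling near the decision boundary) are equally plausible readings of "chosen using the LDA classifier".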
This table contains the LDA scores for all 969 reactions formed from the chemical space shown in Extended Data Fig. 1. The LDA_reactivity column contains the scores from LDA, and the reactivity column contains the reactivity assigned by the SVM classifier to a given reaction mixture.
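As a quick consistency check on the figure of 969, the sketch below enumerates mixtures of the 18 inputs and encodes each as a binary vector. The text here does not state how the 969 reactions were formed; taking all two- and three-component combinations is an assumption that happens to reproduce the number (153 pairs plus 816 triples). The helper names `enumerate_mixtures` and `encode` are illustrative only.

```python
from itertools import combinations

N_INPUTS = 18  # chemical inputs 1-18 (Extended Data Fig. 1)


def enumerate_mixtures(n_inputs=N_INPUTS, sizes=(2, 3)):
    """Return every combination of input numbers for the given mixture sizes.

    Assumption: mixtures contain two or three distinct inputs.
    """
    mixtures = []
    for k in sizes:
        mixtures.extend(combinations(range(1, n_inputs + 1), k))
    return mixtures


def encode(mixture, n_inputs=N_INPUTS):
    """Binary vector whose position i-1 is 1 if input i is in the mixture."""
    present = set(mixture)
    return [1 if i in present else 0 for i in range(1, n_inputs + 1)]


mixtures = enumerate_mixtures()
print(len(mixtures))        # 969
print(encode((1, 5, 18)))   # ones at positions for inputs 1, 5 and 18
```

Each encoded vector has as many ones as the mixture has components, which gives a classifier a fixed-length input regardless of mixture size.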
About this article
Cite this article
Granda, J.M., Donina, L., Dragone, V. et al. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018). https://doi.org/10.1038/s41586-018-0307-8