Letter | Published:

Controlling an organic synthesis robot with machine learning to search for new reactivity

Nature volume 559, pages 377–381 (2018)

Abstract

The discovery of chemical reactions is an inherently unpredictable and time-consuming process¹. An attractive alternative is to predict reactivity, although relevant approaches, such as computer-aided reaction design, are still in their infancy². Reaction prediction based on high-level quantum chemical methods is complex³, even for simple molecules. Although machine learning is powerful for data analysis⁴,⁵, its applications in chemistry are still being developed⁶. Inspired by strategies based on chemists’ intuition⁷, we propose that a reaction system controlled by a machine learning algorithm may be able to explore the space of chemical reactions quickly, especially if trained by an expert⁸. Here we present an organic synthesis robot that can perform chemical reactions and analysis faster than they can be performed manually, as well as predict the reactivity of possible reagent combinations after conducting a small number of experiments, thus effectively navigating chemical reaction space. By using machine learning for decision making, enabled by binary encoding of the chemical inputs, the reactions can be assessed in real time using nuclear magnetic resonance and infrared spectroscopy. The machine learning system was able to predict the reactivity of about 1,000 reaction combinations with accuracy greater than 80 per cent after considering the outcomes of slightly over 10 per cent of the dataset. This approach was also used to calculate the reactivity of published datasets. Further, by using real-time data from our robot, these predictions were followed up manually by a chemist, leading to the discovery of four reactions.
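The abstract credits binary encoding of the chemical inputs with making machine-learning decisions possible. A minimal sketch of one such encoding, assuming each reaction is described only by which inputs it contains (one 0/1 position per available input); the input names and the helper `encode_reaction` are hypothetical placeholders, not the paper's inputs or code:

```python
from typing import Sequence

# Hypothetical placeholder names; the real inputs are shown in Extended Data Fig. 1.
INPUTS = ["input_A", "input_B", "input_C", "input_D", "input_E"]


def encode_reaction(reaction: Sequence[str], inputs: Sequence[str] = INPUTS) -> list[int]:
    """Return one 0/1 flag per available input: 1 if it is used in the reaction."""
    present = set(reaction)
    unknown = present.difference(inputs)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    return [1 if name in present else 0 for name in inputs]


print(encode_reaction(["input_B", "input_D"]))  # [0, 1, 0, 1, 0]
print(encode_reaction(["input_D", "input_B"]))  # same vector: order is ignored
```

Because this encoding records only presence, addition order and stoichiometry are invisible to it; if those mattered, extra features would be needed.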

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

  • 24 July 2018

    The chemical structure formatting in Fig. 5 has been corrected online.

References

  1. Collins, K. D., Gensch, T. & Glorius, F. Contemporary screening approaches to reaction discovery and development. Nat. Chem. 6, 859–871 (2014).

  2. Warr, W. A. A short review of chemical reaction database systems, computer-aided synthesis design, reaction prediction and synthetic feasibility. Mol. Inform. 33, 469–476 (2014).

  3. Plata, R. E. & Singleton, D. A. A case study of the mechanism of alcohol-mediated Morita–Baylis–Hillman reactions. The importance of experimental observations. J. Am. Chem. Soc. 137, 3811–3826 (2015).

  4. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  5. Jordan, M. I. & Mitchell, T. M. Machine learning: trends, perspectives, and prospects. Science 349, 255–260 (2015).

  6. Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).

  7. Graulich, N., Hopf, H. & Schreiner, P. R. Heuristic thinking makes a chemist smart. Chem. Soc. Rev. 39, 1503–1512 (2010).

  8. Gil, Y., Greaves, M., Hendler, J. & Hirsh, H. Amplify scientific discovery with artificial intelligence. Science 346, 171–172 (2014).

  9. Trobe, M. & Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57, 4192–4214 (2018).

  10. Ley, S. V., Fitzpatrick, D. E., Ingham, R. J. & Myers, R. M. Organic synthesis: march of the machines. Angew. Chem. Int. Ed. 54, 3449–3464 (2015).

  11. Sans, V. & Cronin, L. Towards dial-a-molecule by integrating continuous flow, analytics and self-optimisation. Chem. Soc. Rev. 45, 2032–2043 (2016).

  12. Houben, C. & Lapkin, A. A. Automatic discovery and optimization of chemical processes. Curr. Opin. Chem. Eng. 9, 1–7 (2015).

  13. Sans, V., Porwol, L., Dragone, V. & Cronin, L. A self optimizing synthetic organic reactor system using real-time in-line NMR spectroscopy. Chem. Sci. 6, 1258–1264 (2015).

  14. Dragone, V., Sans, V., Henson, A. B., Granda, J. M. & Cronin, L. An autonomous organic reaction search engine for chemical reactivity. Nat. Commun. 8, 15733 (2017).

  15. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).

  16. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  17. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2013).

  18. Coomans, D., Jonckheer, M., Massart, D. L., Broeckaert, I. & Blockx, P. Application of linear discriminant analysis in the diagnosis of thyroid diseases. Anal. Chim. Acta 103, 409–415 (1978).

  19. Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429–434 (2018).

  20. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

  21. Nielsen, M. K., Ahneman, D. T., Riera, O. & Doyle, A. G. Deoxyfluorination with sulfonyl fluorides: navigating reaction space with machine learning. J. Am. Chem. Soc. 140, 5004–5008 (2018).

  22. Bajusz, D., Rácz, A. & Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).

  23. Palazzolo, A. M. E., Simons, C. L. W. & Burke, M. D. The natural productome. Proc. Natl Acad. Sci. USA 114, 5564–5566 (2017).

  24. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Acknowledgements

We acknowledge financial support from the EPSRC (grant numbers EP/H024107/1, EP/I033459/1, EP/J00135X/1, EP/J015156/1, EP/K021966/1, EP/K023004/1, EP/K038885/1, EP/L015668/1 and EP/L023652/1) and the ERC (project 670467 SMART-POM). J.M.G. acknowledges financial support from the Polish Ministry of Science and Higher Education (grant number 1295/MOB/IV/2015/0). We thank A. Henson for help with the Tanimoto analysis.

Author information

Affiliations

  1. School of Chemistry, University of Glasgow, Glasgow, UK

    Jarosław M. Granda, Liva Donina, Vincenza Dragone, De-Liang Long & Leroy Cronin

Contributions

L.C. conceived the idea, developed the initial algorithms, designed the project and coordinated the efforts of the research team. J.M.G. developed the machine learning algorithms, devised the linear discriminant analysis (LDA) approach, and built and programmed the chemical robot. J.M.G. conducted the experiments and isolated and characterized the new compounds, with input from L.D. and V.D. J.M.G. and L.C. co-wrote the paper with input from all authors.

Competing interests

L.C. is the founder and director of DeepMatterGroup PLC and is listed as an inventor on a patent application filed by The University of Glasgow (GB 1810944.7).

Corresponding author

Correspondence to Leroy Cronin.

Extended data figures and tables

  1. Extended Data Fig. 1 Reaction space explored.

    The chemical inputs (1–18) used in the platform to search for new transformations and to evaluate the performance of the algorithm.

  2. Extended Data Fig. 2 Suggested mechanisms for observed transformations and small library of compounds synthesized.

    a, Suggested mechanism for the synthesis of compound 19. b, Small library of compounds synthesized. c, Suggested mechanism for the synthesis of compound 22. d, Suggested mechanism for the synthesis of compound 21.
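Extended Data Fig. 1 shows the chemical inputs, and the supplementary table of LDA scores covers 969 reactions formed from them. Assuming, for this sketch, 18 inputs combined two or three at a time, the count works out as C(18, 2) + C(18, 3) = 153 + 816 = 969. The function name `reaction_space` and the sizes (2, 3) are illustrative assumptions, not taken from the paper's code:

```python
from itertools import combinations
from math import comb

N_INPUTS = 18  # assumption: inputs numbered 1 to 18


def reaction_space(n_inputs: int, sizes: tuple[int, ...] = (2, 3)) -> list[tuple[int, ...]]:
    """All combinations of the given sizes drawn from inputs 1..n_inputs."""
    inputs = range(1, n_inputs + 1)
    return [combo for k in sizes for combo in combinations(inputs, k)]


space = reaction_space(N_INPUTS)
print(comb(N_INPUTS, 2), comb(N_INPUTS, 3), len(space))  # 153 816 969
```

On this count, 10 per cent of the space is about 97 reactions.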

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Tables 1–6, Supplementary Figures 1–73, hardware specification, machine learning details, characterization of new compounds, structural assignments and copies of NMR spectra.

  2. Supplementary Data

    This zipped file contains the X-ray structure of compound 20.

  3. Supplementary Data

    This zipped file contains the X-ray structure of compound 21.

  4. Supplementary Table

    This table shows an example run of the LDA algorithm exploring the chemical space. The first 90 experiments were chosen at random, and subsequent experiments were chosen using the LDA classifier. The name column identifies each reaction by the names of its starting materials. The reactivity column contains the reactivity assignment made by the SVM classifier for each reaction mixture.

  5. Supplementary Table

    This table contains the LDA scores for all 969 reactions formed from the chemical space shown in Extended Data Fig. 1. The LDA_reactivity column contains the LDA scores, and the reactivity column contains the reactivity assignment made by the SVM classifier for each reaction mixture.
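The example run described in the supplementary table (90 random experiments, then LDA-guided choices) can be sketched as a simple active-learning loop. This is a dependency-free illustration under stated assumptions, not the authors' implementation: reactions are 0/1 vectors, the classifier is a two-class Fisher LDA with a small ridge term (binary features otherwise make the within-class scatter singular), and the next experiment is assumed to be the untested reaction with the highest LDA score. `run_experiment`, the toy oracle and all function names are hypothetical; on the robot, the reactivity labels in the table came from an SVM classifier. Since the paper cites scikit-learn, `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` would be a natural library replacement for `fit_lda`.

```python
import random
from itertools import combinations
from typing import Callable

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve(a: list[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(xs: list[Vector], ys: list[int], ridge: float = 1e-2) -> tuple[Vector, float]:
    """Two-class Fisher LDA; returns weights w and threshold c (score = w.x - c)."""
    d = len(xs[0])
    groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in (0, 1)}
    if not groups[0] or not groups[1]:
        raise ValueError("need at least one reactive and one non-reactive example")
    means = {k: [sum(col) / len(g) for col in zip(*g)] for k, g in groups.items()}
    # Pooled within-class scatter plus ridge * identity.
    s = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for k, g in groups.items():
        for x in g:
            diff = [xi - mi for xi, mi in zip(x, means[k])]
            for i in range(d):
                for j in range(d):
                    s[i][j] += diff[i] * diff[j]
    w = solve(s, [a - b for a, b in zip(means[1], means[0])])
    midpoint = [(a + b) / 2 for a, b in zip(means[0], means[1])]
    return w, dot(w, midpoint)


def explore(
    space: list[Vector],
    run_experiment: Callable[[Vector], int],
    n_random: int = 90,
    n_guided: int = 10,
    seed: int = 0,
) -> dict[int, int]:
    """Test n_random reactions chosen at random, then n_guided chosen by LDA score."""
    rng = random.Random(seed)
    first = rng.sample(range(len(space)), n_random)
    labels = {i: run_experiment(space[i]) for i in first}  # index -> 0/1 reactivity
    for _ in range(n_guided):
        w, c = fit_lda([space[i] for i in labels], list(labels.values()))
        untested = [i for i in range(len(space)) if i not in labels]
        if not untested:
            break
        # Assumed selection rule: run the reaction predicted most likely to react.
        best = max(untested, key=lambda i: dot(w, space[i]) - c)
        labels[best] = run_experiment(space[best])
    return labels


# Hypothetical demo: 18 inputs used two or three at a time, and a toy oracle that
# calls a reaction "reactive" whenever input 0 is present.
N_INPUTS = 18
space = [
    [1.0 if k in combo else 0.0 for k in range(N_INPUTS)]
    for size in (2, 3)
    for combo in combinations(range(N_INPUTS), size)
]
labels = explore(space, lambda x: int(x[0] == 1.0))
guided = list(labels.values())[90:]
print(len(labels), sum(guided))  # 100 10
```

With this toy oracle a single feature separates the classes, so the LDA weight on input 0 dominates and every guided pick is reactive; real reactivity data would not be this clean.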
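The table of LDA scores pairs each reaction's LDA_reactivity score with the SVM reactivity assignment, which suggests a simple agreement check between the two columns. A minimal sketch, assuming a CSV export with those column headers, reactivity coded as 0/1, and an LDA score above 0 meaning "predicted reactive" (the threshold and the `agreement` helper are assumptions). The rows below are invented for illustration and are not taken from the table:

```python
import csv
import io


def agreement(csv_text: str, threshold: float = 0.0) -> float:
    """Fraction of rows where the thresholded LDA score matches the SVM label."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no rows")
    hits = sum(
        (float(row["LDA_reactivity"]) > threshold) == (int(row["reactivity"]) == 1)
        for row in rows
    )
    return hits / len(rows)


# Invented example rows (not from the supplementary table).
example = """name,LDA_reactivity,reactivity
A+B,1.7,1
A+C,-0.4,0
B+C,0.3,0
A+B+C,-1.2,0
"""
print(agreement(example))  # 0.75
```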

About this article

DOI

https://doi.org/10.1038/s41586-018-0307-8
