Perspective

The digitization of organic synthesis


Organic chemistry has largely been conducted in an ad hoc manner by academic laboratories that are funded by grants directed towards the investigation of specific goals or hypotheses. Although modern synthetic methods can provide access to molecules of considerable complexity, predicting the outcome of a single chemical reaction remains a major challenge. Improvements in the prediction of ‘above-the-arrow’ reaction conditions are needed to enable intelligent decision making to select an optimal synthetic sequence that is guided by metrics including efficiency, quality and yield. Methods for the communication and the sharing of data will need to evolve from traditional tools to machine-readable formats and open collaborative frameworks. This will accelerate innovation and require the creation of a chemistry commons with standardized data handling, curation and metrics.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Reviewer information

Nature thanks Ian Churcher, Jacob Janey and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Competing interests

The author declares no competing interests.

Correspondence to Ian W. Davies.

Fig. 1: Above-the-arrow conditions and the digitization of organic synthesis.
Fig. 2: Optimizing one step in the total synthesis of maoecrystal V.
Fig. 3: Reaction prediction of a deoxyfluorination, a high-value transformation in medicinal chemistry, using machine learning.
Fig. 4: Accelerated reaction development in flow and reaction prediction.