Synthetic organic chemistry driven by artificial intelligence


Synthetic organic chemistry underpins several areas of chemistry, including drug discovery, chemical biology, materials science and engineering. However, the execution of complex chemical syntheses in itself requires expert knowledge, usually acquired over many years of study and hands-on laboratory practice. The development of technologies with potential to streamline and automate chemical synthesis is a half-century-old endeavour yet to be fulfilled. Renewed interest in artificial intelligence (AI), driven by improved computing power, data availability and algorithms, is overturning the limited success previously obtained. In this Review, we discuss the recent impact of AI on different tasks of synthetic chemistry and dissect selected examples from the literature. By examining the underlying concepts, we aim to demystify AI for bench chemists in order that they may embrace it as a tool rather than fear it as a competitor, spur future research by pinpointing the gaps in knowledge and delineate how chemical AI will run in the era of digital chemistry.


Fig. 1: Variability in chemical reaction data available from patents (1976–2016).
Fig. 2: Similarity search for in silico retrosynthesis analysis.
Fig. 3: Artificial intelligence tools for retrosynthetic analysis.
Fig. 4: Comparison of two methods for the prediction of reaction products.
Fig. 5: Active learning for the optimization of reaction conditions.
Fig. 6: Automated discovery of new chemistry.
Fig. 7: Networking robots.




Acknowledgements

A.F.A. acknowledges Fundação para a Ciência e Tecnologia (FCT) Portugal for financial support through a PhD grant (PD/BD/143125/2019). T.R. is an investigador auxiliar supported by FCT Portugal (CEECIND/00887/2017). T.R. acknowledges FCT/FEDER (02/SAICT/2017, grant 28333) for funding. The authors thank the reviewers for their comments.

Author information




The authors contributed equally to all aspects of the article.

Corresponding author

Correspondence to Tiago Rodrigues.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Chemistry thanks R. Lewis and B. Maryasin for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Natural-language processing

Area of computer science that deals with the recognition, processing and analysis of human (natural) language.


SMARTS

(SMILES arbitrary target specification). A notation for the accurate identification of substructural features and for atom typing.


SMILES

(Simplified molecular-input line-entry system). A notation to describe chemical structure using ASCII strings.

Morgan fingerprints

A method to map substructural information into a bit string. The bit length (size) and detail of encoded features are defined by the user.

Tanimoto index

A method to quantify similarity (ranging from 0 to 1) between molecules. Complete dissimilarity equates to 0 and full identity equals 1.

Softmax layer

A method that normalizes a vector of length J into a probability distribution containing J probabilities, each in the interval [0,1]. The sum of all probabilities equals 1.0.

Gaussian processes

A machine-learning method giving a probability distribution over a number of possible functions. A prior belief regarding an event is refined through Bayesian inference as data builds up.

Linear discriminant analysis

(LDA). A machine-learning method that finds linear combinations of features that separate classes, prior to dimensionality reduction and classification.

Support-vector machine

(SVM). A machine-learning method that separates data points in hyperspace through mathematical functions called kernels.

Transfer learning

A method for fine-tuning a model trained on a larger set of related data. The method is employed when limited data are available to answer a research question.
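
As an illustration of two of the glossary entries above, the Tanimoto index and the softmax layer, the following is a minimal sketch in plain Python. It assumes that a fingerprint is represented as the set of positions of its 'on' bits; the fingerprints, the scores and the function names are hypothetical and are not taken from any of the tools discussed in this Review. Treating two empty fingerprints as identical is a convention assumed here.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto index between two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def softmax(scores):
    """Normalize a list of J real-valued scores into J probabilities summing to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fingerprints, written as the positions of the set bits.
fp_query = {1, 4, 7, 9}
fp_same = {1, 4, 7, 9}
fp_partial = {1, 4, 8}
fp_disjoint = {2, 3}

similarity_same = tanimoto(fp_query, fp_same)          # full identity
similarity_partial = tanimoto(fp_query, fp_partial)    # 2 shared bits out of 5 distinct bits
similarity_disjoint = tanimoto(fp_query, fp_disjoint)  # no shared bits

# Hypothetical raw scores from the last layer of a classifier.
probabilities = softmax([2.0, 1.0, 0.0])
```

In this sketch, identical fingerprints give an index of 1 and fingerprints with no shared bits give 0, matching the range stated in the glossary. The softmax output preserves the ranking of the input scores while rescaling them so that they sum to 1.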

About this article

Cite this article

de Almeida, A.F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat Rev Chem 3, 589–604 (2019).
