Automation and computer-assisted planning for chemical synthesis

Abstract

The molecules of today (the medicines that cure diseases, the agrochemicals that protect our crops, the materials that make life convenient) are becoming increasingly sophisticated thanks to advancements in chemical synthesis. As tools for synthesis improve, molecular architects can be bold and creative in the way they design and produce molecules. Several emerging tools at the interface of chemical synthesis and data science have come to the forefront in recent years, including algorithms for retrosynthesis and reaction prediction, and robotics for autonomous or high-throughput synthesis. This Primer covers recent additions to the toolbox of the data-savvy organic chemist. There is a new movement in retrosynthetic logic, predictive models of reactivity and chemistry automata, with considerable recent engagement from contributors in diverse fields. The promise of chemical synthesis in the information age is to improve the quality of the molecules of tomorrow through data-harnessing and automation. This Primer is written for organic chemists and data scientists looking to understand the software, hardware, data sets and tactics that are commonly used as well as the capabilities and limitations of the field. The Primer is split into three main components covering retrosynthetic logic, reaction prediction and automated synthesis. The first of these topics is about distilling the strategy of multistep synthesis to a logic that can be taught to a computer. The section on reaction prediction details modern tools and models for developing reaction conditions, catalysts and even new transformations based on information-rich data sets and statistical tools such as machine learning. Finally, we cover recent advances in the use of liquid handling robotics and autonomous systems that can physically perform experiments in the chemistry laboratory.

Fig. 1: General tools underlying chemical synthesis with information theory.
Fig. 2: Retrosynthetic planners, reaction prediction and automated synthesis workflows.
Fig. 3: Results from retrosynthetic planning programs.
Fig. 4: Applications of retrosynthetic planning validated by laboratory efforts.
Fig. 5: Retrosynthetic planning, reaction prediction and automated synthesis platform directed by ASKCOS.
Fig. 6: Prediction of reaction enantioselectivities for chiral phosphoric acid-catalysed nucleophilic additions to imines (ref. 94).
Fig. 7: Prediction of reaction yields for palladium-catalysed Buchwald–Hartwig aminations (ref. 93).
Fig. 8: Probing reaction selectivity using high-throughput experimentation (ref. 177).
Fig. 9: Reaction miniaturization and validation (ref. 114).
Fig. 10: Organic synthesis using a modular automated robotic system (ref. 144).

References

  1. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).

  2. Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178 (1969).

  3. Hammett, L. P. Physical Organic Chemistry; Reaction Rates, Equilibria, and Mechanisms 1st edn (McGraw-Hill, 1940).

  4. Brønsted, J. N. & Pedersen, K. J. Die katalytische Zersetzung des Nitramids und ihre physikalisch-chemische Bedeutung [German]. Zeitschrift für Phys. Chemie Stöchiometrie und Verwandtschaftslehre 108, 185–235 (1924).

  5. Merrifield, R. B., Stewart, J. M. & Jernberg, N. Instrument for automated synthesis of peptides. Anal. Chem. 38, 1905–1914 (1966).

  6. Merrifield, R. B. in Hypotensive Peptides 1–13 (Springer, 1966).

  7. Evans, D. A. History of the Harvard ChemDraw project. Angew. Chem. Int. Ed. 53, 11140–11145 (2014).

  8. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).

  9. Todd, M. H. Computer-aided organic synthesis. Chem. Soc. Rev. 34, 247–266 (2005).

  10. Ihlenfeldt, W.-D. & Gasteiger, J. Computer-assisted planning of organic syntheses: the second generation of programs. Angew. Chem. Int. Ed. Engl. 34, 2613–2633 (1996).

  11. Cook, A. et al. Computer-aided synthesis design: 40 years on. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2, 79–107 (2012).

  12. Ravitz, O. Data-driven computer aided synthesis design. Drug Discov. Today Technol. 10, e443–e449 (2013).

  13. Engkvist, O. et al. Computational prediction of chemical reactions: current status and outlook. Drug Discov. Today 23, 1203–1218 (2018).

  14. Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).

  15. Johansson, S. et al. AI-assisted synthesis prediction. Drug Discov. Today Technol. 32–33, 65–72 (2019).

  16. Zahrt, A. F., Athavale, S. V. & Denmark, S. E. Quantitative structure–selectivity relationships in enantioselective catalysis: past, present, and future. Chem. Rev. 120, 1620–1689 (2020).

  17. Strieth-Kalthoff, F., Sandfort, F., Segler, M. H. S. & Glorius, F. Machine learning the ropes: principles, applications and directions in synthetic chemistry. Chem. Soc. Rev. 49, 6154–6168 (2020).

  18. Reid, J. P. & Sigman, M. S. Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts. Nat. Rev. Chem. 2, 290–305 (2018).

  19. de Almeida, A. F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat. Rev. Chem. 3, 589–604 (2019).

  20. Shevlin, M. Practical high-throughput experimentation for chemists. ACS Med. Chem. Lett. 8, 601–607 (2017).

  21. Mennen, S. M. et al. The evolution of high-throughput experimentation in pharmaceutical development and perspectives on the future. Org. Process Res. Dev. 23, 1213–1242 (2019).

  22. Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97 (2018).

  23. Krska, S. W., DiRocco, D. A., Dreher, S. D. & Shevlin, M. The evolution of chemical high-throughput experimentation to address challenging problems in pharmaceutical synthesis. Acc. Chem. Res. 50, 2976–2985 (2017).

  24. Welch, C. J. High throughput analysis enables high throughput experimentation in pharmaceutical process research. React. Chem. Eng. 4, 1895–1911 (2019).

  25. Allen, C. L., Leitch, D. C., Anson, M. S. & Zajac, M. A. The power and accessibility of high-throughput methods for catalysis research. Nat. Catal. 2, 2–4 (2019).

  26. Vléduts, G. É. Concerning one system of classification and codification of organic reactions. Inform. Stor. Retr. 1, 117–146 (1963).

  27. Ugi, I. et al. Models, concepts, theories, and formal languages in chemistry and their use as a basis for computer assistance in chemistry. J. Chem. Inf. Comput. Sci. 34, 3–16 (1994).

  28. Ugi, I. et al. Computer-assisted solution of chemical problems — the historical development and the present state of the art of a new discipline of chemistry. Angew. Chem. Int. Ed. Engl. 32, 201–227 (1993).

  29. Corey, E. J. The Logic of Chemical Synthesis (Nobel Foundation, [Nobelstiftelsen], 1991).

  30. Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).

  31. Pensak, D. A. & Corey, E. J. in Computer-Assisted Organic Synthesis Vol. 61 Ch. 1 1–32 (American Chemical Society, 1977).

  32. Campbell, M., Hoane, A. J. & Hsu, F.-H. Deep Blue. Artif. Intell. 134, 57–83 (2002).

  33. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  34. Hanessian, S., Franco, J. & Larouche, B. The psychobiological basis of heuristic synthesis planning, man, machine, and the chiron approach. Pure Appl. Chem. 62, 1887–1910 (1990).

  35. Wipke, W. T. & Rogers, D. Artificial intelligence in organic synthesis. SST: starting material selection strategies. An application of superstructure search. J. Chem. Inf. Comput. Sci. 24, 71–81 (1984).

  36. Mehta, G., Barone, R. & Chanon, M. Computer-aided organic synthesis — SESAM: a simple program to unravel “hidden” restructured starting materials skeleta in complex targets. Eur. J. Org. Chem. 1998, 1409–1412 (1998).

  37. Corey, E. J., Long, A. K. & Rubenstein, S. D. Computer-assisted analysis in organic synthesis. Science 228, 408 (1985).

  38. Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).

  39. Law, J. et al. Route designer: a retrosynthetic analysis tool utilizing automated retrosynthetic rule generation. J. Chem. Inf. Model. 49, 593–602 (2009).

  40. Christ, C. D., Zentgraf, M. & Kriegl, J. M. Mining electronic laboratory notebooks: analysis, retrosynthesis, and reaction based enumeration. J. Chem. Inf. Model. 52, 1745–1756 (2012).

  41. Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).

  42. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).

  43. Segler, M. H. S. & Waller, M. P. Modelling chemical reasoning to predict and invent reactions. Chem. Eur. J. 23, 6118–6128 (2017).

  44. Baylon, J. L., Cilfone, N. A., Gulcher, J. R. & Chittenden, T. W. Enhancing retrosynthetic reaction prediction with deep learning using multiscale reaction classification. J. Chem. Inf. Model. 59, 673–688 (2019).

  45. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  46. Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

  47. Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).

  48. Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach. Learn. Sci. Technol. 1, 045024 (2020).

  49. Lin, K., Xu, Y., Pei, J. & Lai, L. Automatic retrosynthetic route planning using template-free models. Chem. Sci. 11, 3355–3364 (2020).

  50. Karpov, P., Godin, G. & Tetko, I. V. in Artificial Neural Networks and Machine Learning — ICANN 2019: Workshop and Special Sessions (eds Kůrková, V., Karpov, P. & Theis, F.) 817–830 (Springer International, 2019).

  51. Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).

  52. Somnath, V. R., Bunne, C., Coley, C. W., Krause, A. & Barzilay, R. Learning graph models for template-free retrosynthesis. Preprint at https://arxiv.org/abs/2006.07038 (2020).

  53. Sacha, M., Błaż, M., Byrski, P., Włodarczyk-Pruszyński, P. & Jastrzębski, S. Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits. Preprint at https://arxiv.org/abs/2006.15426 (2020).

  54. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604 (2018).

  55. Segler, M., Preuß, M. & Waller, M. P. Towards “AlphaChem”: chemical synthesis planning with tree search and deep neural network policies. Preprint at https://arxiv.org/abs/1702.00020 (2017).

  56. Bertz, S. H. The first general index of molecular complexity. J. Am. Chem. Soc. 103, 3599–3601 (1981).

  57. Huang, Q., Li, L.-L. & Yang, S.-Y. RASA: a rapid retrosynthesis-based scoring method for the assessment of synthetic accessibility of drug-like molecules. J. Chem. Inf. Model. 51, 2768–2777 (2011).

  58. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. SCScore: synthetic complexity learned from a reaction corpus. J. Chem. Inf. Model. 58, 252–261 (2018).

  59. Gasteiger, J. et al. Computer-assisted synthesis and reaction planning in combinatorial chemistry. Perspect. Drug Discov. Des. 20, 245–264 (2000).

  60. Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).

  61. Wei, J. N., Duvenaud, D. & Aspuru-Guzik, A. Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci. 2, 725–732 (2016).

  62. Rosales, A. R. et al. Rapid virtual screening of enantioselective catalysts using CatVS. Nat. Catal. 2, 41–45 (2019).

  63. Burai Patrascu, M. et al. From desktop to benchtop with automated computational workflows for computer-aided design in asymmetric catalysis. Nat. Catal. 3, 574–584 (2020).

  64. Marcou, G. et al. Expert system for predicting reaction conditions: the Michael reaction case. J. Chem. Inf. Model. 55, 239–250 (2015).

  65. Walker, E. et al. Learning to predict reaction conditions: relationships between solvent, molecular structure, and catalyst. J. Chem. Inf. Model. 59, 3645–3654 (2019).

  66. Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).

  67. Schneider, N., Lowe, D. M., Sayle, R. A., Tarselli, M. A. & Landrum, G. A. Big data from pharmaceutical patents: a computational analysis of medicinal chemists’ bread and butter. J. Med. Chem. 59, 4385–4402 (2016).

  68. Jia, X. et al. Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature 573, 251–255 (2019).

  69. Mehr, S. H. M., Craven, M. S., Leonov, A. I., Keenan, G. & Cronin, L. A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101 (2020).

  70. Martin, T. M. et al. Does rational selection of training and test sets improve the outcome of QSAR modeling? J. Chem. Inf. Model. 52, 2570–2578 (2012).

  71. Murray, P. M. & Forfar, L. C. The application of advanced design of experiments for the efficient development of chemical processes. Chem. Inform. https://doi.org/10.21767/2470-6973.100023 (2017).

  72. Luque Ruiz, I., Cerruela García, G. & Gómez-Nieto, M. Á. in Statistical Modelling of Molecular Descriptors in QSAR/QSPR Ch. 7 (eds Varmuza, K., Dehmer, M. & Bonchev, D.) 201–228 (Wiley, 2012).

  73. Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science https://doi.org/10.1126/science.aau5631 (2019).

  74. Henle, J. J. et al. Development of a computer-guided workflow for catalyst optimization: descriptor validation, subset selection, and training set analysis. J. Am. Chem. Soc. 142, 11578–11592 (2020).

  75. Zhao, S. et al. Enantiodivergent Pd-catalyzed C–C bond formation enabled through ligand parameterization. Science 362, 670 (2018).

  76. Woods, B. P., Orlandi, M., Huang, C. Y., Sigman, M. S. & Doyle, A. G. Nickel-catalyzed enantioselective reductive cross-coupling of styrenyl aziridines. J. Am. Chem. Soc. 139, 5688–5691 (2017).

  77. Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).

  78. Lin, A. I. et al. Automatized assessment of protective group reactivity: a step toward big reaction data analysis. J. Chem. Inf. Model. 56, 2140–2148 (2016).

  79. Casari, A. & Zheng, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists 1st edn (O’Reilly Media, 2018).

  80. Granda, J. M., Donina, L., Dragone, V., Long, D. L. & Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018).

  81. David, L., Thakkar, A., Mercado, R. & Engkvist, O. Molecular representations in AI-driven drug discovery: a review and practical guide. J. Cheminform. 12, 56 (2020).

  82. Cherkasov, A. et al. QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010 (2014).

  83. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  84. Moriwaki, H., Tian, Y. S., Kawashita, N. & Takagi, T. Mordred: a molecular descriptor calculator. J. Cheminform. 10, 4 (2018).

  85. Merkwirth, C. & Lengauer, T. Automatic generation of complementary descriptors with molecular graph networks. J. Chem. Inf. Model. 45, 1159–1168 (2005).

  86. Duvenaud, D. K. et al. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 28, 2224–2232 (2015).

  87. Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Mol. Des. 30, 595–608 (2016).

  88. Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).

  89. Brethomé, A. V., Fletcher, S. P. & Paton, R. S. Conformational effects on physical-organic descriptors: the case of Sterimol steric parameters. ACS Catal. 9, 2313–2323 (2019).

  90. Harper, K. C., Bess, E. N. & Sigman, M. S. Multidimensional steric parameters in the analysis of asymmetric catalytic reactions. Nat. Chem. 4, 366–374 (2012).

  91. Clavier, H. & Nolan, S. P. Percent buried volume for phosphine and N-heterocyclic carbene ligands: steric properties in organometallic chemistry. Chem. Commun. 46, 841–861 (2010).

  92. Hillier, A. C. et al. A combined experimental and theoretical study examining the binding of N-heterocyclic carbenes (NHC) to the Cp*RuCl (Cp* = η5-C5Me5) moiety: insight into stereoelectronic differences between unsaturated and saturated NHC ligands. Organometallics 22, 4322–4326 (2003).

  93. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

  94. Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).

  95. Santiago, C. B., Guo, J.-Y. & Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 9, 2398–2412 (2018).

  96. Li, X., Zhang, S. Q., Xu, L. C. & Hong, X. Predicting regioselectivity in radical C−H functionalization of heterocycles through machine learning. Angew. Chem. Int. Ed. 59, 13253–13259 (2020).

  97. Chan, W. & White, P. Fmoc Solid Phase Peptide Synthesis: a Practical Approach Vol. 222 (OUP Oxford, 1999).

  98. Seeberger, P. H. Automated oligosaccharide synthesis. Chem. Soc. Rev. 37, 19–28 (2008).

  99. Kaplan, B. E. The automated synthesis of oligodeoxyribonucleotides. Trends Biotechnol. 3, 253–256 (1985).

  100. Cernak, T. et al. Microscale high-throughput experimentation as an enabling technology in drug discovery: application in the discovery of (piperidinyl)pyridinyl-1H-benzimidazole diacylglycerol acyltransferase 1 inhibitors. J. Med. Chem. 60, 3594–3605 (2017).

  101. Hook, A. L. et al. High throughput methods applied in biomaterial development and discovery. Biomaterials 31, 187–198 (2010).

  102. Yan, Y., Robinson, S. G., Sigman, M. S. & Sanford, M. S. Mechanism-based design of a high-potential catholyte enables a 3.2 V all-organic nonaqueous redox flow battery. J. Am. Chem. Soc. 141, 15301–15306 (2019).

  103. Francis, M. B. & Jacobsen, E. N. Discovery of novel catalysts for alkene epoxidation from metal-binding combinatorial libraries. Angew. Chem. Int. Ed. 38, 937–941 (1999).

  104. Taylor, S. J. & Morken, J. P. Thermographic selection of effective catalysts from an encoded polymer-bound library. Science 280, 267–270 (1998).

  105. Kölmel, D. K., Loach, R. P., Knauber, T. & Flanagan, M. E. Employing photoredox catalysis for DNA-encoded chemistry: decarboxylative alkylation of α-amino acids. ChemMedChem 13, 2159–2165 (2018).

  106. Geri, J. B. et al. Microenvironment mapping via Dexter energy transfer on immune cells. Science 367, 1091–1097 (2020).

  107. Bellomo, A. et al. Rapid catalyst identification for the synthesis of the pyrimidinone core of HIV integrase inhibitors. Angew. Chem. Int. Ed. 51, 6912–6915 (2012).

  108. Dreher, S. D., Dormer, P. G., Sandrock, D. L. & Molander, G. A. Efficient cross-coupling of secondary alkyltrifluoroborates with aryl chlorides — reaction discovery using parallel microscale experimentation. J. Am. Chem. Soc. 130, 9257–9259 (2008).

  109. Buitrago Santanilla, A. et al. Nanomole-scale high-throughput chemistry for the synthesis of complex molecules. Science 347, 49 (2015).

  110. Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429 (2018).

  111. Shaabani, S. et al. Automated and accelerated synthesis of indole derivatives on a nano-scale. Green Chem. 21, 225–232 (2019).

  112. Trobe, M. & Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57, 4192–4214 (2018).

  113. Wong, H. & Cernak, T. Reaction miniaturization in eco-friendly solvents. Curr. Opin. Green Sustain. Chem. 11, 91–98 (2018).

  114. Wang, Y. et al. Acoustic droplet ejection enabled automated reaction scouting. ACS Cent. Sci. 5, 451–457 (2019).

  115. Boga, S. B. et al. Selective functionalization of complex heterocycles via an automated strong base screening platform. React. Chem. Eng. 2, 446–450 (2017).

  116. MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. Sci. Adv. 6, eaaz8867 (2020).

  117. Lee, G. M., Clément, R. & Baker, R. T. High-throughput evaluation of in situ-generated cobalt (III) catalysts for acyl fluoride synthesis. Catal. Sci. Technol. 7, 4996–5003 (2017).

  118. Qiu, J., Albrecht, J. & Janey, J. Solubility behaviors and correlations of common organic solvents. Org. Process Res. Dev. 24, 2702–2708 (2020).

  119. Christensen, M. et al. Data-science driven autonomous process optimization. Preprint at https://doi.org/10.26434/chemrxiv.13146404.v2 (2020).

  120. Lin, S. et al. Mapping the dark space of chemical reactions with extended nanomole synthesis and MALDI-TOF MS. Science 361, eaar6236 (2018).

  121. Uehling, M. R., King, R. P., Krska, S. W., Cernak, T. & Buchwald, S. L. Pharmaceutical diversification via palladium oxidative addition complexes. Science 363, 405 (2019).

  122. Gesmundo, N. J. et al. Nanoscale synthesis and affinity ranking. Nature 557, 228–232 (2018).

  123. Bahr, M. N. et al. Collaborative evaluation of commercially available automated powder dispensing platforms for high-throughput experimentation in pharmaceutical applications. Org. Process Res. Dev. 22, 1500–1508 (2018).

  124. Martin, M. C. et al. Versatile methods to dispense submilligram quantities of solids using chemical-coated beads for high-throughput experimentation. Org. Process Res. Dev. 23, 1900–1907 (2019).

  125. Tu, N. P. et al. High-throughput reaction screening with nanomoles of solid reagents coated on glass beads. Angew. Chem. Int. Ed. 58, 7987–7991 (2019).

  126. Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science https://doi.org/10.1126/science.aax1566 (2019).

  127. Noel, T. et al. Palladium-catalyzed amination reactions in flow: overcoming the challenges of clogging via acoustic irradiation. Chem. Sci. 2, 287–290 (2011).

  128. Boele, M. D. K. et al. Selective Pd-catalyzed oxidative coupling of anilides with olefins through C–H bond activation at room temperature. J. Am. Chem. Soc. 124, 1586–1587 (2002).

  129. McMullen, J. P., Stone, M. T., Buchwald, S. L. & Jensen, K. F. An integrated microreactor system for self-optimization of a heck reaction: from micro- to mesoscale flow systems. Angew. Chem. Int. Ed. 49, 7076–7080 (2010).

  130. Zhang, J., Bellomo, A., Creamer, A. D., Dreher, S. D. & Walsh, P. J. Palladium-catalyzed C(sp3)–H arylation of diarylmethanes at room temperature: synthesis of triarylmethanes via deprotonative-cross-coupling processes. J. Am. Chem. Soc. 134, 13765–13772 (2012).

  131. Reizman, B. J., Wang, Y.-M., Buchwald, S. L. & Jensen, K. F. Suzuki–Miyaura cross-coupling optimization enabled by automated feedback. React. Chem. Eng. 1, 658–666 (2016).

  132. Kashani, S. K., Jessiman, J. E. & Newman, S. G. Exploring homogeneous conditions for mild Buchwald–Hartwig amination in batch and flow. Org. Process Res. Dev. 24, 1948–1954 (2020).

  133. Boström, J., Brown, D. G., Young, R. J. & Keserü, G. M. Expanding the medicinal chemistry synthetic toolbox. Nat. Rev. Drug Discov. 17, 709–727 (2018).

  134. Twilton, J. et al. Selective hydrogen atom abstraction through induced bond polarization: direct α-arylation of alcohols through photoredox, HAT, and nickel catalysis. Angew. Chem. Int. Ed. Engl. 57, 5369–5373 (2018).

  135. Dirocco, D. A. et al. Late-stage functionalization of biologically active heterocycles through photoredox catalysis. Angew. Chem. Int. Ed. Engl. 53, 4802–4806 (2014).

  136. Mo, Y., Rughoobur, G., Nambiar, A. M. K., Zhang, K. & Jensen, K. F. A multifunctional microfluidic platform for high-throughput experimentation of electroorganic chemistry. Angew. Chem. Int. Ed. 59, 20890–20894 (2020).

  137. Deadman, B. J., Collins, S. G. & Maguire, A. R. Taming hazardous chemistry in flow: the continuous processing of diazo and diazonium compounds. Chemistry 21, 2298–2308 (2015).

  138. Movsisyan, M. et al. Taming hazardous chemistry by continuous flow technology. Chem. Soc. Rev. 45, 4892–4928 (2016).

  139. Selekman, J. A. et al. High-throughput automation in chemical process development. Annu. Rev. Chem. Biomol. Eng. 8, 525–547 (2017).

  140. Hwang, Y. J. et al. A segmented flow platform for on-demand medicinal chemistry and compound synthesis in oscillating droplets. Chem. Commun. 53, 6649–6652 (2017).

  141. Reker, D., Hoyt, E. A., Bernardes, G. J. L. & Rodrigues, T. Adaptive optimization of chemical reactions with minimal experimental information. Cell Rep. Phys. Sci. 1, 100247 (2020).

  142. Jiang, T. et al. An integrated console for capsule-based, fully automated organic synthesis. Preprint at https://doi.org/10.26434/chemrxiv.7882799.v1 (2019).

  143. Wang, C. & Glorius, F. Controlled iterative cross-coupling: on the way to the automation of organic synthesis. Angew. Chem. Int. Ed. 48, 5240–5244 (2009).

  144. Steiner, S. et al. Organic synthesis in a modular robotic system driven by a chemical programming language. Science 363, eaav2211 (2019).

  145. Collins, N. et al. Fully automated chemical synthesis: toward the universal synthesizer. Org. Process Res. Dev. 24, 2064–2077 (2020).

  146. Wanner, B. M., Nichols, P. L. & Jiang, T. Cartridge-based automated synthesis — a new tool for the synthetic chemist. Chimia 74, 808–813 (2020).

  147. Gillis, E. P. & Burke, M. D. Multistep synthesis of complex boronic acids from simple MIDA boronates. J. Am. Chem. Soc. 130, 14084–14085 (2008).

  148. Li, J., Grillo, A. S. & Burke, M. D. From synthesis to function via iterative assembly of N-methyliminodiacetic acid boronate building blocks. Acc. Chem. Res. 48, 2297–2307 (2015).

  149. Sun, S. & Kennedy, R. T. Droplet electrospray ionization mass spectrometry for high throughput screening for enzyme inhibitors. Anal. Chem. 86, 9309–9314 (2014).

  150. Doi, T. et al. A formal total synthesis of taxol aided by an automated synthesizer. Chem. Asian J. 1, 370–383 (2006).

  151. Roughley, S. D. & Jordan, A. M. The medicinal chemist’s toolbox: an analysis of reactions used in the pursuit of drug candidates. J. Med. Chem. 54, 3451–3479 (2011).

  152. Cernak, T., Dykstra, K. D., Tyagarajan, S., Vachal, P. & Krska, S. W. The medicinal chemist’s toolbox for late stage functionalization of drug-like molecules. Chem. Soc. Rev. 45, 546–576 (2016).

  153. Hsieh, H.-W., Coley, C. W., Baumgartner, L. M., Jensen, K. F. & Robinson, R. I. Photoredox iridium–nickel dual-catalyzed decarboxylative arylation cross-coupling: from batch to continuous flow via self-optimizing segmented flow reactor. Org. Process Res. Dev. 22, 542–550 (2018).

  154. Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).

  155. Mahjour, B., Shen, Y., Liu, W. & Cernak, T. A map of the amine–carboxylic acid coupling system. Nature 580, 71–75 (2020).

  156. Roch, L. M. et al. ChemOS: an orchestration software to democratize autonomous discovery. PLoS ONE 15, e0229862 (2020).

    Google Scholar 

  157. Pendleton, I. M. et al. Experiment specification, capture and laboratory automation technology (ESCALATE): a software pipeline for automated chemical experimentation and data management. MRS Commun. 9, 846–859 (2019).

    Google Scholar 

  158. Marth, C. J. et al. Network-analysis-guided synthesis of weisaconitine D and liljestrandinine. Nature 528, 493 (2015).

    ADS  Google Scholar 

  159. Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. “Found in translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).

    Google Scholar 

  160. Coley, Connor W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

    Google Scholar 

  161. Mikulak-Klucznik, B. et al. Computational planning of the synthesis of complex natural products. Nature 588, 83–88 (2020).

    ADS  Google Scholar 

  162. Alexander, D. L. J., Tropsha, A. & Winkler, D. A. Beware of R2: simple, unambiguous assessment of the prediction accuracy of QSAR and QSPR models. J. Chem. Inf. Model. 55, 1316–1322 (2015).

    Google Scholar 

  163. Chung, R. & Hein, J. E. Automated solubility and crystallization analysis of non-UV active compounds: integration of evaporative light scattering detection (ELSD) and robotic sampling. React. Chem. Eng. 4, 1674–1681 (2019).

    Google Scholar 

  164. Baranczak, A. et al. Integrated platform for expedited synthesis–purification–testing of small molecule libraries. ACS Med. Chem. Lett. 8, 461–465 (2017).

    Google Scholar 

  165. Hoogenboom, R., Wiesbrock, F., Leenen, M. A. M., Meier, M. A. R. & Schubert, U. S. Accelerating the living polymerization of 2-nonyl-2-oxazoline by implementing a microwave synthesizer into a high-throughput experimentation workflow. J. Comb. Chem. 7, 10–13 (2005).

    Google Scholar 

  166. Troshin, K. & Hartwig, J. F. Snap deconvolution: an informatics approach to high-throughput discovery of catalytic reactions. Science 357, 175 (2017).

    ADS  Google Scholar 

  167. McNally, A., Prier, C. K. & MacMillan, D. W. C. Discovery of an α-amino C–H arylation reaction using the strategy of accelerated serendipity. Science 334, 1114 (2011).

    ADS  Google Scholar 

  168. Johnson, A. P., Marshall, C. & Judson, P. N. Some recent progress in the development of the LHASA computer system for organic synthesis design: starting-material-oriented retrosynthetic analysis. Recl. Trav. Chim. Pays Bas 111, 310–316 (1992).

    Google Scholar 

  169. Snider, B. B. & Kulkarni, Y. S. Preparation of unsaturated. α.-chloro acids and intramolecular [2 + 2] cycloadditions of the chloroketenes derived from them. J. Org. Chem. 52, 307–310 (1987).

    Google Scholar 

  170. Schwaller, P. et al. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 11, 3316–3325 (2020).

    Google Scholar 

  171. Genheden, S. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminformatics 12, 70 (2020).

    Google Scholar 

  172. Nicolaou, C. A., Watson, I. A., LeMasters, M., Masquelin, T. & Wang, J. Context aware data-driven retrosynthetic analysis. J. Chem. Inf. Model. 60, 2728–2738 (2020).

    Google Scholar 

  173. Bøgevig, A. et al. Route design in the 21st century: the ICSYNTH software tool as an idea generator for synthesis prediction. Org. Process Res. Dev. 19, 357–368 (2015).

    Google Scholar 

  174. Sandfort, F., Strieth-Kalthoff, F., Kühnemund, M., Beecks, C. & Glorius, F. A structure-based platform for predicting chemical reactivity. Chem 6, 1379–1390 (2020).

    Google Scholar 

  175. Miró, J. et al. Enantioselective allenoate-Claisen rearrangement using chiral phosphate catalysts. J. Am. Chem. Soc. 142, 6390–6399 (2020).

    Google Scholar 

  176. Collins, K. D. & Glorius, F. Intermolecular reaction screening as a tool for reaction evaluation. Acc. Chem. Res. 48, 619–627 (2015).

    Google Scholar 

  177. Yayla, H. G. et al. Discovery and mechanistic study of a photocatalytic indoline dehydrogenation for the synthesis of elbasvir. Chem. Sci. 7, 2066–2073 (2016).

    Google Scholar 

  178. Vaucher, A. C. et al. Automated extraction of chemical synthesis actions from experimental procedures. Nat. Commun. 11, 3601 (2020).

    ADS  Google Scholar 

Download references

Acknowledgements

A.G.D., R.S., M.A.H. and J.E.B. were supported by the National Science Foundation (NSF) under the Center for Computer Aided Synthesis (C-CAS) (CHE-1925607). M.A.H. is grateful for funding from the NSF graduate research fellowship program (DGE-1752814). Y.S. and T.C. were supported by the University of Michigan College of Pharmacy.

Author information

Contributions

Introduction (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Experimentation (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Results (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Applications (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Reproducibility and data deposition (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Limitations and optimizations (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Outlook (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Overview of the Primer (T.C.).

Corresponding authors

Correspondence to Richmond Sarpong, Abigail G. Doyle or Tim Cernak.

Ethics declarations

Competing interests

T.C. has received mosquito robotics from SPT Labtech and Merck & Co., Inc. T.C. and R.S. receive research support from MilliporeSigma, the company that owns the retrosynthetic software SYNTHIA. All other authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Methods Primers thanks O. Ravitz, M. Segler, S. Trice and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

AiZynthFinder: https://github.com/MolecularAI/aizynthfinder

ASKCOS: https://github.com/connorcoley/ASKCOS

Chemical.AI: https://Chemical.AI

IBM RXN for Chemistry: https://rxn.res.ibm.com/

ICSYNTH: https://www.deepmatter.io/products/icsynth/

Iktos spaya.ai: https://beta.spaya.ai/

RDKit: https://www.rdkit.org/

Reaxys: https://www.elsevier.com/solutions/reaxys/features-and-capabilities/synthesis-planner

SciFindern: https://www.cas.org/products/scifinder

SciKit-learn: https://scikit-learn.org/stable/

SYNTHIA: https://www.sigmaaldrich.com/chemistry/chemical-synthesis/synthesis-software.html

Glossary

Linear free energy relationships

Linear relationships between the free energy of activation or free energy change of a reaction induced by a substituent of a molecule and a parameter that describes the electronic or steric properties of that substituent. Linear free energy relationships are a subset of structure–function (or structure–activity) relationships.

Simplified molecular input line entry system

(SMILES). A string notation to represent chemical structures that can be generated from a two-dimensional or three-dimensional graph notation. Notably, the same molecule can sometimes be represented by multiple different SMILES codes depending on the drawing that was input. These notations are human understandable and variable in length.

International Chemical Identifier

(InChI). A line notation that encodes layers of information about a given molecular structure, including connectivity, charge, stereochemistry and atomic isotopes. A fixed-length, 27-character hashed form, the InChIKey, is derived from the full-length InChI and is designed to allow easy searches of chemical compounds. The hashed notation is not human understandable.

SMILES arbitrary target specification

(SMARTS). An extension of the simplified molecular input line entry system (SMILES) notation that allows generic atoms and bonds to be specified, so that substructure patterns can be used to search databases.

Reaction rules

Descriptions of chemical transforms that can be applied in a retrosynthetic module. These encode the substructures of the products and starting materials for a given synthetic step, and also include additional layers to express the scope and limitations of when the transform can be applied.

Reaction templates

Descriptions of chemical transforms that include the substructures of the reactants and products and highlight structural changes. These contain somewhat less context than a reaction rule and often require additional strategy to select which of the numerous templates to apply in a retrosynthetic module to minimize computational cost.

Sequence-to-sequence

A family of machine learning algorithms developed for natural language processing (language translation, image captioning and so on) that relies on recurrent neural networks to transform one sequence into another sequence.

Transformer

An algorithm developed for natural language processing (language translation, image captioning and so on). This algorithm does not rely on recurrent neural networks and can process data in any order, thus allowing for reduced training times.

Monte Carlo tree search

An algorithm for navigating search trees in which search steps are selected randomly, without branching, until a solution has been found or a maximum depth is reached. Algorithms of this type have emerged as strategic in applications of sequential decision problems without clear heuristics.
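
The random, non-branching descent described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the search used by any of the retrosynthesis programs in this Primer: the function `random_rollout`, the integer "state" and the toy move set are all hypothetical, and a complete Monte Carlo tree search would wrap this step with selection, expansion and back-propagation of the rollout result.

```python
import random


def random_rollout(state, get_moves, is_solved, max_depth, rng):
    """Follow one randomly chosen move at a time (no branching).

    Stops when the state is solved, when no moves remain, or when
    max_depth moves have been taken. Returns (path, solved).
    """
    path = [state]
    for _ in range(max_depth):
        if is_solved(state):
            return path, True
        moves = get_moves(state)
        if not moves:
            return path, False  # dead end: nothing left to try
        state = rng.choice(moves)
        path.append(state)
    return path, is_solved(state)


# Hypothetical toy problem: a state is an integer "complexity" and each
# move removes 1, 2 or 3 units; the state 0 counts as solved.
def toy_moves(n):
    return [n - step for step in (1, 2, 3) if n - step >= 0]


def toy_solved(n):
    return n == 0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
path, solved = random_rollout(10, toy_moves, toy_solved, max_depth=20, rng=rng)
short_path, short_solved = random_rollout(10, toy_moves, toy_solved, max_depth=2, rng=rng)
```

In the toy problem, a depth limit of 20 always leaves room to reach the solved state from 10, whereas a limit of 2 cannot, which mirrors the two stopping conditions in the definition.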

Quantitative structure–activity relationship

A statistical modelling method used to relate molecular structure to biological and physico-chemical properties and predict these properties in new molecules.

Density functional theory

(DFT). A computational method for modelling the electronic structure of atoms and molecules using quantum mechanics. In synthetic chemistry, density functional theory is used to compute and study molecular structures and their corresponding energies that cannot be obtained through experimental methods.

Molecular mechanics

A computational method for modelling molecular structure using classical mechanics. Bonds are treated as springs from which a potential energy can be determined. Molecular mechanics is a less computationally expensive method relative to density functional theory.

HOMO–LUMO energies

(Highest-occupied molecular orbital–lowest-unoccupied molecular orbital energies). These values correspond to the energetics of the molecular orbitals that are most involved in bond-making and bond-breaking processes, commonly referred to as the frontier molecular orbitals.

Sterimol parameters

Three steric parameters — B1, B5 and L — for molecular substituents determined from three-dimensional structures. B1 and B5 represent the minimum and maximum widths, respectively, of the molecule perpendicular to the primary bond axis. L is the total length of the substituent measured along the primary bond axis.

Buried volume

A steric parameter for ligands in transition metal complexes. The volume of a ligand, bonded to a metal at a fixed distance, enclosed by a sphere of a defined radius r. Provided as a percentage, representing the percentage of the sphere that is filled by a single bound ligand.

Conformers

(Also known as conformational isomers). Structures of a molecule that differ by the rotation of groups about one or more single bonds in the molecule. Conformers can interconvert without making or breaking bonds and will have different relative energies based on the presence of attractive or repulsive interactions.

High-throughput experimentation

(HTE). A technique used for screening chemical experiments, typically in a miniaturized format. Common formats for HTE include 24-well, 96-well and 384-well arrays, whereas ultraHTE refers to arrays of 1,536 experiments or more.

k-fold cross-validation

A method for evaluating model performance on limited data. The data are split into k groups; one group is held out as the test set, whereas the remaining k − 1 groups are used as the training set. This is repeated k times so that each group serves once as the test set.
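
As a rough illustration of the splitting step, the sketch below generates the train and test index sets for each fold using only the Python standard library. The function name `k_fold_splits` is hypothetical, and the sketch assumes the data have already been shuffled; in practice, a library routine such as `KFold` in scikit-learn would normally be used instead.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) once for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so that fold sizes
    # differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 10 data points split into 5 folds of 2 test points each.
splits = list(k_fold_splits(10, 5))
```

A model would be trained and scored once per `(train, test)` pair, and the k scores are then typically averaged.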

R² value

(Also known as the coefficient of determination). A measure of how well a model fits the data when comparing the measured values against predicted values for the training set. An R² value of 0.8 means that the model can account for 80% of the observed variance in the data.
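
One common way to compute this quantity is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name `r_squared` is hypothetical, and other definitions of the coefficient of determination are in use and can give different values for the same data.

```python
def r_squared(measured, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(measured) / len(measured)
    # Residual sum of squares: variance the model leaves unexplained.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: variance of the measured values themselves.
    ss_tot = sum((m - mean) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values have zero variance")
    return 1 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(measured, [1.0, 2.0, 3.0, 4.0])   # predictions match exactly
baseline = r_squared(measured, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
```

Under this definition, a model that reproduces every measured value scores 1, and a model that always predicts the mean of the measured values scores 0.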

Acoustic droplet ejection

A technology that uses precise ultrasound waves to move or transfer nanolitre volumes of solutions. Acoustic droplet ejection transfers the droplets from the source plate into an inverted receiving plate above the source plate.


Cite this article

Shen, Y., Borowski, J.E., Hardy, M.A. et al. Automation and computer-assisted planning for chemical synthesis. Nat Rev Methods Primers 1, 23 (2021). https://doi.org/10.1038/s43586-021-00022-5
