Abstract
The molecules of today, from the medicines that cure diseases and the agrochemicals that protect our crops to the materials that make life convenient, are becoming increasingly sophisticated thanks to advancements in chemical synthesis. As tools for synthesis improve, molecular architects can be bold and creative in the way they design and produce molecules. Several emerging tools at the interface of chemical synthesis and data science have come to the forefront in recent years, including algorithms for retrosynthesis and reaction prediction, and robotics for autonomous or high-throughput synthesis. This Primer covers recent additions to the toolbox of the data-savvy organic chemist. There is a new movement in retrosynthetic logic, predictive models of reactivity and chemistry automata, with considerable recent engagement from contributors in diverse fields. The promise of chemical synthesis in the information age is to improve the quality of the molecules of tomorrow through data-harnessing and automation. This Primer is written for organic chemists and data scientists looking to understand the software, hardware, data sets and tactics that are commonly used, as well as the capabilities and limitations of the field. The Primer is split into three main components covering retrosynthetic logic, reaction prediction and automated synthesis. The first of these topics is about distilling the strategy of multistep synthesis to a logic that can be taught to a computer. The section on reaction prediction details modern tools and models for developing reaction conditions, catalysts and even new transformations based on information-rich data sets and statistical tools such as machine learning. Finally, we cover recent advances in the use of liquid-handling robotics and autonomous systems that can physically perform experiments in the chemistry laboratory.
Acknowledgements
A.G.D., R.S., M.A.H. and J.E.B. were supported by the National Science Foundation (NSF) under the Center for Computer Aided Synthesis (C-CAS) (CHE-1925607). M.A.H. is grateful for funding from the NSF graduate research fellowship program (DGE-1752814). Y.S. and T.C. were supported by the University of Michigan College of Pharmacy.
Author information
Contributions
Introduction (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Experimentation (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Results (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Applications (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Reproducibility and data deposition (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Limitations and optimizations (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Outlook (Y.S., J.E.B., M.A.H., R.S., A.G.D. and T.C.); Overview of the Primer (T.C.).
Ethics declarations
Competing interests
T.C. has received mosquito robotics from SPT Labtech and Merck & Co., Inc. T.C. and R.S. receive research support from MilliporeSigma, the company that owns the retrosynthetic software SYNTHIA. All other authors declare no competing interests.
Additional information
Peer review information
Nature Reviews Methods Primers thanks O. Ravitz, M. Segler, S. Trice and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
AiZynthFinder: https://github.com/MolecularAI/aizynthfinder
ASKCOS: https://github.com/connorcoley/ASKCOS
Chemical.AI: https://Chemical.AI
IBM RXN for Chemistry: https://rxn.res.ibm.com/
ICSYNTH: https://www.deepmatter.io/products/icsynth/
Iktos spaya.ai: https://beta.spaya.ai/
RDKit: https://www.rdkit.org/
Reaxys: https://www.elsevier.com/solutions/reaxys/features-and-capabilities/synthesis-planner
SciFindern: https://www.cas.org/products/scifinder
SciKit-learn: https://scikit-learn.org/stable/
SYNTHIA: https://www.sigmaaldrich.com/chemistry/chemical-synthesis/synthesis-software.html
Glossary
- Linear free energy relationships
-
Linear relationships between the free energy of activation or free energy change of a reaction induced by a substituent of a molecule and a parameter that describes the electronic or steric properties of that substituent. Linear free energy relationships are a subset of structure–function (or structure–activity) relationships.
- Simplified molecular input line entry system
-
(SMILES). A string notation to represent chemical structures that can be generated from a two-dimensional or three-dimensional graph notation. Notably, the same molecule can often be represented by several different SMILES strings, depending on the drawing that was input, unless the string is canonicalized. These notations are human understandable and variable in length.
- International Chemical Identifier
-
(InChI). A layered line notation that encodes information about a given molecular structure, including connectivity, charge, stereochemistry and atomic isotopes. The full InChI is variable in length; its hashed form, the InChIKey, is a fixed-length, 27-character string that is designed to allow easy searches of chemical compounds. InChIKeys are not human understandable.
- SMILES arbitrary target specification
-
(SMARTS). An extension of the simplified molecular input line entry system (SMILES) notation that allows generic atoms and bonds to be specified, so that substructure patterns can be defined for searching databases.
- Reaction rules
-
Descriptions of chemical transforms that can be applied in a retrosynthetic module. These encode the substructures of the products and starting materials for a given synthetic step, and also include additional layers to express the scope and limitations of when the transform can be applied.
- Reaction templates
-
Descriptions of chemical transforms that include the substructures of the reactants and products and highlight structural changes. These contain somewhat less context than a reaction rule and often require additional strategy to select which of the numerous templates to apply in a retrosynthetic module to minimize computational cost.
- Sequence-to-sequence
-
A family of machine learning algorithms developed for natural language processing (language translation, image captioning and so on) that relies on recurrent neural networks to transform one sequence into another sequence.
- Transformer
-
An algorithm developed for natural language processing (language translation, image captioning and so on). This algorithm does not rely on recurrent neural networks and can process data in any order, thus allowing for reduced training times.
- Monte Carlo tree search
-
An algorithm for navigating search trees in which search steps are selected randomly, without branching, until a solution has been found or a maximum depth is reached. Algorithms of this type have proved useful for sequential decision problems that lack clear heuristics.
- Quantitative structure–activity relationship
-
A statistical modelling method used to relate molecular structure to biological and physico-chemical properties and predict these properties in new molecules.
- Density functional theory
-
(DFT). A computational method for modelling the electronic structure of atoms and molecules using quantum mechanics. In synthetic chemistry, density functional theory is used to compute and study molecular structures and their corresponding energies that cannot be obtained through experimental methods.
- Molecular mechanics
-
A computational method for modelling molecular structure using classical mechanics. Bonds are treated as springs, from which a potential energy can be determined. Molecular mechanics is less computationally expensive than density functional theory.
- HOMO–LUMO energies
-
(Highest-occupied molecular orbital–lowest-unoccupied molecular orbital energies). These values correspond to the energetics of the molecular orbitals that are most involved in bond-making and bond-breaking processes, commonly referred to as the frontier molecular orbitals.
- Sterimol parameters
-
Three steric parameters — B1, B5 and L — for molecular substituents determined from three-dimensional structures. B1 and B5 represent the minimum and maximum widths, respectively, of the substituent perpendicular to the primary bond axis. L is the total length of the substituent measured along the primary bond axis.
- Buried volume
-
A steric parameter for ligands in transition metal complexes: the volume of a ligand, bonded to a metal at a fixed distance, that is enclosed by a sphere of a defined radius r centred on the metal. It is provided as a percentage, representing the fraction of the sphere that is filled by a single bound ligand.
- Conformers
-
(Also known as conformational isomers). Structures of a molecule that differ by the rotation of groups about one or more single bonds in the molecule. Conformers can interconvert without making or breaking bonds and will have different relative energies based on the presence of attractive or repulsive interactions.
- High-throughput experimentation
-
(HTE). A technique used for screening chemical experiments, typically in a miniaturized format. Common formats for HTE include 24-well, 96-well and 384-well arrays, whereas ultraHTE refers to arrays of 1,536 experiments or more.
- k-fold cross-validation
-
A method for evaluating model performance on limited data. The data are split into k groups; one group is held out as the test set, whereas the remaining k − 1 groups are used as the training set. This is repeated k times so that each group serves once as the test set.
- R² value
-
(Also known as the coefficient of determination). A measure of how well a model fits the data when comparing the measured values against predicted values for the training set. An R² value of 0.8 means that the model can account for 80% of the observed variance in the data.
- Acoustic droplet ejection
-
A technology that uses precise ultrasound waves to move or transfer nanolitre volumes of solutions. Acoustic droplet ejection transfers the droplets from the source plate into an inverted receiving plate above the source plate.
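The glossary entries for k-fold cross-validation and the coefficient of determination can be made concrete with a short sketch. The Python below is a minimal illustration using only the standard library, not a method taken from the Primer. All function names are hypothetical, and the data are synthetic values invented for the example: a substituent parameter and a log relative rate that follow a straight line with a small fixed perturbation. A one-variable least-squares line stands in for the model, and the coefficient of determination is computed twice, once for the fit to all data and once for predictions made on held-out folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k=5, seed=0):
    """Return one held-out prediction per sample from k-fold cross-validation."""
    predictions = [None] * len(xs)
    for test_fold in k_fold_indices(len(xs), k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training folds, then predict the held-out fold.
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test_fold:
            predictions[i] = slope * xs[i] + intercept
    return predictions


# Synthetic, illustrative data only (not experimental values).
sigma = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
noise = [0.05, -0.04, 0.03, -0.02, 0.04, -0.05, 0.02, -0.03, 0.01, -0.01]
log_rel_rate = [2.0 * s + e for s, e in zip(sigma, noise)]

slope, intercept = fit_line(sigma, log_rel_rate)
r2_fit = r_squared(log_rel_rate, [slope * s + intercept for s in sigma])
r2_cv = r_squared(log_rel_rate, cross_validate(sigma, log_rel_rate, k=5))
print(f"fit R2 = {r2_fit:.3f}, 5-fold cross-validated R2 = {r2_cv:.3f}")
```

Reporting both numbers side by side is a deliberate choice in this sketch: the fit value describes how well the line matches data it has already seen, whereas the cross-validated value reflects predictions for data withheld during fitting.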
About this article
Cite this article
Shen, Y., Borowski, J.E., Hardy, M.A. et al. Automation and computer-assisted planning for chemical synthesis. Nat Rev Methods Primers 1, 23 (2021). https://doi.org/10.1038/s43586-021-00022-5
DOI: https://doi.org/10.1038/s43586-021-00022-5