Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years1,2,3,4,5,6,7. However, the field has progressed greatly since the development of early programs such as LHASA1,7, for which reaction choices at each step were made by human operators. Multiple software platforms6,8,9,10,11,12,13,14 are now capable of completely autonomous planning. But these programs ‘think’ only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary15,16 and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program’s knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships17,18, allowing it to ‘strategize’ over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.
All data that support the findings of this study are available within the paper and its Supplementary Information, or from the corresponding authors on reasonable request.
In Supplementary Data, we provide the pseudocode for the multistep retrosynthetic design, pathway generation and retrieval (PSEUDOCODE_Aug2.pdf), an example of one of the reaction rules as coded in Chematica (RULE.pdf), and additional details of the availability and execution of the software (README_Aug2.pdf).
Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).
Gelernter, H. L. et al. Empirical explorations of SYNCHEM. Science 197, 1041–1049 (1977).
Hanessian, S., Franco, J. & Larouche, B. The psychobiological basis of heuristic synthesis planning - man, machine and the Chiron approach. Pure Appl. Chem. 62, 1887–1910 (1990).
Hendrickson, J. B. Systematic synthesis design. 6. Yield analysis and convergency. J. Am. Chem. Soc. 99, 5439–5450 (1977).
Ugi, I. et al. Computer-assisted solution of chemical problems - the historical development and the present state of the art of a new discipline of chemistry. Angew. Chem. Int. Edn Engl. 32, 201–227 (1993).
Todd, M. H. Computer-aided organic synthesis. Chem. Soc. Rev. 34, 247–266 (2005).
Ravitz, O. Data-driven computer aided synthesis design. Drug Discov. Today. Technol. 10, e443–e449 (2013).
Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).
Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).
SciFindern, https://scifinder-n.cas.org (accessed 20 July 2020).
Lee, A. A. et al. Molecular transformer unifies reaction prediction and retrosynthesis across pharma chemical space. Chem. Commun. 55, 12152–12155 (2019).
Schwaller, P. et al. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 11, 3316–3325 (2020).
Nicolaou, K. C. Classics in Total Synthesis II: More Targets, Strategies, Methods (Wiley-VCH, 2003).
Huang, P. Efficiency in Natural Product Total Synthesis (Wiley, 2018).
Yi, K. et al. CLEVERER: collision events for video representation and reasoning. Preprint at https://arxiv.org/abs/1910.01442 (2020).
Bergstein, B. What AI still can’t do. MIT Technology Review https://www.technologyreview.com/s/615189/what-ai-still-cant-do/ (2020).
Kowalik, M. et al. Parallel optimization of synthetic pathways within the network of organic chemistry. Angew. Chem. Int. Ed. 51, 7928–7932 (2012).
Lin, Y. et al. Reinforcing the supply chain of COVID-19 therapeutics with expert-coded retrosynthetic software. Preprint at https://doi.org/10.26434/chemrxiv.12765410.v1 (2020).
Beker, W., Gajewska, E. P., Badowski, T. & Grzybowski, B. A. Prediction of major regio-, site-, and diastereoisomers in Diels-Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 58, 4515–4519 (2019).
Badowski, T., Gajewska, E. P., Molga, K. & Grzybowski, B. A. Synergy between expert and machine-learning approaches allows for improved retrosynthetic planning. Angew. Chem. Int. Ed. 59, 725–730 (2020).
Badowski, T., Molga, K. & Grzybowski, B. A. Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans. Chem. Sci. 10, 4640–4651 (2019).
Molga, K., Dittwald, P. & Grzybowski, B. A. Computational design of syntheses leading to compound libraries or isotopically labelled targets. Chem. Sci. 10, 9219–9232 (2019).
Molga, K., Dittwald, P. & Grzybowski, B. A. Navigating around patented routes by preserving specific motifs along computer-planned retrosynthetic pathways. Chem 5, 460–473 (2019).
Gajewska, E. P. et al. Algorithmic discovery of tactical combinations for advanced organic syntheses. Chem 6, 280–293 (2020).
Molga, K., Gajewska, E. P., Szymkuć, S. & Grzybowski, B. A. The logic of translating chemical knowledge into machine-processable forms: a modern playground for physical-organic chemistry. React. Chem. Eng. 4, 1506–1521 (2019).
Emami, F. E. et al. A priori estimation of organic reaction yields. Angew. Chem. Int. Ed. 54, 10797–10801 (2015).
Skoraczyński, G. et al. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci. Rep. 7, 3582 (2017).
Corey, E. J. & Cheng, X.-M. The Logic of Chemical Synthesis (Wiley, 1995).
Serratosa, F. Organic Chemistry in Action: The Design of Organic Synthesis (Elsevier, 1996).
Copeland, B. J. (ed.) The Essential Turing: The Ideas That Gave Birth to the Computer Age (Oxford Univ. Press, 2004).
Shah, H., Warwick, K., Vallverdú, J. & Wu, D. Can machines talk? Comparison of Eliza with modern dialogue systems. Comput. Human Behav. 58, 278–295 (2016).
Yang, Z. et al. Dauricine induces apoptosis, inhibits proliferation and invasion through inhibiting NF-κB signaling pathway in colon cancer cells. J. Cell. Physiol. 225, 266–275 (2010).
Kametani, T. & Fukumoto, K. Total synthesis of (±)-dauricine. Tetrahedron Lett. 5, 2771–2775 (1964).
Lim, K.-H. et al. Ibogan, tacaman, and cytotoxic bisindole alkaloids from Tabernaemontana. Cononusine, an iboga alkaloid with unusual incorporation of a pyrrolidone moiety. J. Nat. Prod. 78, 1129–1138 (2015).
Torii, M. et al. Lamellodysidines A and B, sesquiterpenes isolated from the marine sponge Lamellodysidea herbacea. J. Nat. Prod. 80, 2536–2541 (2017).
Fialkowski, M., Bishop, K. J. M., Chubukov, V. A., Campbell, C. J. & Grzybowski, B. A. Architecture and evolution of organic chemistry. Angew. Chem. Int. Ed. 44, 7263–7269 (2005).
Grzybowski, B. A., Bishop, K. J. M., Kowalczyk, B. & Wilmer, C. E. The ‘wired’ universe of organic chemistry. Nat. Chem. 1, 31–36 (2009).
Sammut, C. in Encyclopedia of Machine Learning and Data Mining (eds Sammut, C. & Webb, G. I.) 120 (Springer, 2017).
Gremmen, C., Willemse, B., Wanner, M. J. & Koomen, G.-J. Enantiopure tetrahydro-β-carbolines via Pictet−Spengler reactions with N-sulfinyl tryptamines. Org. Lett. 2, 1955–1958 (2000).
Gansäuer, A., Worgull, D., Knebel, K., Huth, I. & Schnakenburg, G. 4-exo cyclizations by template catalysis. Angew. Chem. Int. Ed. 48, 8882–8885 (2009).
Hadjaz, F., Yous, S., Lebegue, N., Berthelot, P. & Carato, P. A mild and efficient route to 2-benzyl tryptamine derivatives via ring-opening of β-carbolines. Tetrahedron 64, 10004–10008 (2008).
Taylor, M. S. & Jacobsen, E. N. Highly enantioselective catalytic acyl-Pictet−Spengler reactions. J. Am. Chem. Soc. 126, 10558–10559 (2004).
Goetz, A. E., Silberstein, A. L., Corsello, M. A. & Garg, N. K. Concise enantiospecific total synthesis of tubingensin A. J. Am. Chem. Soc. 136, 3036–3039 (2014).
White, J. D., Grether, U. M. & Lee, Ch.-S. (R)-(+)-3,4-dimethylcyclohex-2-en-1-one. Org. Synth. 82, 108 (2005).
Nicolaou, K. C., Zhong, Y.-L. & Baran, P. S. A new method for the one-step synthesis of α,β-unsaturated carbonyl systems from saturated alcohols and carbonyl compounds. J. Am. Chem. Soc. 122, 7596–7597 (2000).
Xu, L., Wang, C., Gao, Z. & Zhao, Y.-M. Total synthesis of (±)-cephanolides B and C via a palladium-catalyzed cascade cyclization and late-stage sp3 C–H bond oxidation. J. Am. Chem. Soc. 140, 5653–5658 (2018).
Xu, B., Xun, W., Su, S. & Zhai, H. Total syntheses of (−)-conidiogenone B, (−)-conidiogenone, and (−)-conidiogenol. Angew. Chem. Int. Ed. 59, 16475 (2020).
Hafeman, N. J. et al. The total synthesis of (−)-scabrolide A. J. Am. Chem. Soc. 142, 8585–8590 (2020).
Wilde, N. C., Isomura, M., Mendoza, A. & Baran, P. S. Two-phase synthesis of (−)-taxuyunnanine D. J. Am. Chem. Soc. 136, 4909–4912 (2014).
Zhang, Y. & Danishefsky, S. J. Total synthesis of (±)-aplykurodinone-1: traceless stereochemical guidance. J. Am. Chem. Soc. 132, 9567–9569 (2010).
Guo, L., Frey, W. & Plietker, B. Catalytic enantioselective total synthesis of the picrotoxane alkaloids (−)-dendrobine, (−)-mubironine B, and (−)-dendroxine. Org. Lett. 20, 4328–4331 (2018).
Nicolaou, K. C. et al. Total synthesis and structural revision of antibiotic CJ-16,264. Angew. Chem. Int. Ed. 54, 9203–9208 (2015).
Chuang, K. V., Xu, C. & Reisman, S. E. A 15-step synthesis of (+)-ryanodol. Science 353, 912–915 (2016).
Kanda, Y. et al. Two-phase synthesis of taxol. J. Am. Chem. Soc. 142, 10526–10533 (2020).
Lambert, T. H. & Danishefsky, S. J. Total synthesis of UCS1025A. J. Am. Chem. Soc. 128, 426–427 (2006).
Roszak, R., Beker, W., Molga, K. & Grzybowski, B. A. Rapid and accurate prediction of pKa values of C–H acids using graph convolutional neural networks. J. Am. Chem. Soc. 141, 17142–17149 (2019).
Crosby, S. R., Harding, J. R., King, C. D., Parker, G. D. & Willis, C. L. Oxonia-Cope rearrangement and side-chain exchange in the Prins cyclization. Org. Lett. 4, 577–580 (2002).
Kormann, C., Heinemann, F. W. & Gmeiner, P. A consecutive Diels–Alder approach toward a Tet repressor directed combinatorial library. Tetrahedron 62, 6899–6908 (2006).
Owens, K. R. et al. Total synthesis of the diterpenoid alkaloid Arcutinidine using a strategy inspired by chemical network analysis. J. Am. Chem. Soc. 141, 13713–13717 (2019).
Jung, M. E. & Davidov, P. Efficient synthesis of a tricyclic BCD analogue of ouabain: Lewis acid catalyzed Diels–Alder reactions of sterically hindered systems. Angew. Chem. Int. Ed. 41, 4125–4128 (2002).
Sheu, J.-H., Ahmed, A. F., Shiue, R.-T., Dai, C.-F. & Kuo, Y.-H. Scabrolides A−D, four new norditerpenoids isolated from the soft coral Sinularia scabra. J. Nat. Prod. 65, 1904–1908 (2002).
Cui, W.-X. et al. Polycyclic furanobutenolide-derived norditerpenoids from the South China Sea soft corals Sinularia scabra and Sinularia polydactyla with immunosuppressive activity. Bioorg. Chem. 94, 103350 (2020).
Mendoza, A., Ishihara, Y. & Baran, P. S. Scalable enantioselective total synthesis of taxanes. Nat. Chem. 4, 21–25 (2012).
Liao, W. & Yu, Z.-X. DFT study of the mechanism and stereochemistry of the Rh(I)-catalyzed Diels–Alder reactions between electronically neutral dienes and dienophiles. J. Org. Chem. 79, 11949–11960 (2014).
Xu, B., Xun, W., Wang, T. & Qiu, F. G. Total synthesis of (+)-aplykurodinone-1. Org. Lett. 19, 4861–4863 (2017).
Wang, Y.-M., Bruno, N. C., Placeres, Á. L., Zhu, S. & Buchwald, S. L. Enantioselective synthesis of carbo- and heterocycles through a CuH-catalyzed hydroalkylation approach. J. Am. Chem. Soc. 137, 10524–10527 (2015).
Development of Chematica was partly supported by US DARPA under the Make-It Award, 69461-CH-DRP #W911NF1610384 (K.M., S.S., E.P.G., P.D., T.B., B.A.G.); the same award also supported the synthesis of dauricine (A.A.B., M.M.). Synthesis of tacamonidine was supported in part (B.M.-K., T.K., B.A.G.) by the National Science Center, NCN, Poland under the Symfonia Award (#2014/12/W/ST5/00592). Synthesis of lamellodysidine A was supported in part (P.G., B.A.G.) by the National Science Center, NCN, Poland under the Maestro Award (#2018/30/A/ST5/00529). J.M. and O.P. thank the Foundation for Polish Science for financial support under award TEAM/2017-4/38. B.A.G. acknowledges support from the Institute for Basic Science Korea, project code IBS-R020-D1. We thank B. Sieredzińska for help in the synthesis of tacamonidine and S. Trice (Merck, KGaA) for help in organizing the Turing test. We thank the following experts for their participation in the Turing test (in alphabetical order): P. Baran (Scripps), J. Bode (ETH Zurich), M. Burke (University of Illinois), M. Christmann (Freie Universität Berlin), H. Davies (Emory University), M. Giedyk (ICHO PAN), D. Huryn (University of Pittsburgh), M. Krische (University of Texas), S. Matsubara (Kyoto University), N. Maulide (Universität Wien), G. Molander (University of Pennsylvania), R. Sarpong (Berkeley), P. Schreiner (Justus Liebig University Giessen) and J. Siitonen (Rice University), as well as four others, who prefer to remain anonymous.
Although Chematica was originally developed and owned by B.A.G.’s Grzybowski Scientific Inventions, LLC, neither he nor the co-authors currently hold any stock in this company, which is now the property of Merck KGaA, Darmstadt, Germany. S.S., E.P.G., P.D., T.B., K.M. and B.A.G. continue to collaborate with Merck KGaA, Darmstadt. The algorithms described in this paper are currently being transitioned into Chematica’s commercial version, called Synthia™. All queries about access options to Chematica/Synthia™, including academic collaborations, should be directed to S. Trice (email@example.com).
Peer review information Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 New components of Chematica essential to its ability to plan the syntheses of complex, natural-product targets.
Only key algorithmic improvements (since the publication of ref. 9) are highlighted. a, Increase in the knowledge base of reaction rules to more than 100,000, including a large fraction of advanced stereoselective transforms. b, Implementation of various machine-learning, molecular-mechanics and quantum-mechanics routines to further evaluate the correctness of the reaction prediction. Illustrated here is the machine-learning method (random forest classifier) that evaluates the applicability of Diels–Alder cyclizations21. c, Information about specific motifs in the synthons that are not only too strained (top)8 but also prone to side reactions. An electron-rich allylic alcohol substrate in the Prins cyclization may undergo a competitive oxonia-Cope rearrangement59 (bottom). d, Scoring functions, either improved heuristics-based or best-in-class neural networks22. e, Search algorithms that combine two strategies: searching broadly to explore a wide spectrum of options and deeply to reach stop-point substrates as soon as possible. Each search strategy maintains its own priority queue (PQ), with different queues sharing results. f, Large numbers of previously unrecognized two-step reaction sequences that allow the program to overcome local maxima of structural complexity. Image reproduced with permission from ref. 26 (https://doi.org/10.1016/j.chempr.2019.11.016; Elsevier), which is published under a Creative Commons license (CC BY-NC-ND 4.0; http://creativecommons.org/licenses/by-nc-nd/4.0/). g, Hard-coded sequences of some 100 FGIs to rapidly reach less reactive synthons. h, Bypasses, that is, routines that navigate around intermittent reactivity conflicts (red reaction arrow) by first converting the conflicting group into a non-conflicting one (here, a primary alcohol into an alkene or a silyl ether) and only then performing a high-gain, structure-simplifying step (here, stereoselective alkylation of cyclohexenone).
Without the bypass algorithm, the search would explore other, less-structure-simplifying options such as the allylic oxidation indicated by blue arrow. i, The ability to perform two different reactions on the retron simultaneously, if multiple reaction loci are reactive under the reaction conditions. Here, treatment with hydrogen and Pd catalyst should remove both phosphonate esters and benzyl ethers (left). Under these conditions, only esters or only ethers cannot be selectively removed. Attempting such selective removal, Chematica would see the unremoved groups (marked in red) as incompatible; in effect, it would not be able to perform the desired global deprotection. Similarly, global debenzylation of an aminoalcohol should be performed in a single step (right).
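The two-queue search of panel e can be illustrated with a minimal sketch. It is not Chematica's implementation: the toy rule table `RULES`, the `BUYABLE` set, the two priority functions (step count for the broad queue, number of unresolved molecules for the deep queue) and the strict alternation between queues are all assumptions made for illustration. The only features taken from the caption are that each strategy keeps its own priority queue and that the queues share results, modelled here by pushing every new state to both queues and keeping one shared set of visited states.

```python
import heapq
from itertools import count

# Hypothetical toy data: each molecule maps to alternative sets of precursors.
RULES = {
    "target": [("A", "B"), ("C",)],
    "A": [("s1",)],
    "B": [("s2", "s3")],
    "C": [("D",)],
    "D": [("E",)],
    "E": [("s1",)],
}
BUYABLE = {"s1", "s2", "s3"}  # assumed stop-point substrates


def open_molecules(state):
    """Molecules in a state that still need to be disconnected."""
    return sorted(m for m in state if m not in BUYABLE)


def expand(state):
    """Yield child states obtained by one retro-step on one open molecule."""
    for mol in open_molecules(state):
        for precursors in RULES.get(mol, []):
            yield (state - {mol}) | frozenset(precursors)


def dual_queue_search(target, max_iterations=1000):
    start = frozenset([target])
    tie = count()  # unique tie-breaker so states are never compared
    broad = [(0, next(tie), 0, start, [start])]  # priority: steps taken
    deep = [(1, next(tie), 0, start, [start])]   # priority: open molecules
    seen = set()                                 # shared between both queues
    for i in range(max_iterations):
        queue = broad if i % 2 == 0 else deep
        if not queue:
            queue = deep if queue is broad else broad
        if not queue:
            return None
        _, _, steps, state, path = heapq.heappop(queue)
        if not open_molecules(state):
            return path  # every molecule is a stop-point substrate
        if state in seen:
            continue
        seen.add(state)
        for child in expand(state):
            if child in seen:
                continue
            new_path = path + [child]
            heapq.heappush(broad, (steps + 1, next(tie), steps + 1, child, new_path))
            heapq.heappush(
                deep,
                (len(open_molecules(child)), next(tie), steps + 1, child, new_path),
            )
    return None


route = dual_queue_search("target")
```

Each entry of `route` is the set of molecules still required at that depth; the last entry contains only stop-point substrates.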
Extended Data Fig. 2 Enantioselective synthesis of a pentacyclic diterpenoid, cephanolide B, designed by Chematica.
This target was recently prepared in racemic form in 12 steps48, with Pd-catalysed carbonylative cyclization as the key step. In its design, Chematica used 13 steps to reach commercially available crotonyl chloride and a known iodoalkyne 2 (available in two steps from the commercially available oxirane and TMS-acetylene). The synthesis commences with the formation of enantioenriched diene 5 via stereoselective alkylation of the enolate (with stereochemistry controlled by a chiral auxiliary) and subsequent metathesis of the enyne 4. Subsequently, addition of a Grignard reagent derived from bromide 6, cyanation, reduction of ketone, lactonization, methylenation and oxidation of the less hindered allylic position derives triene 12. This is then used in an elegant, intramolecular Diels–Alder cycloaddition (the feasibility of which was confirmed separately by molecular-mechanics calculations) to form the tetracyclic skeleton of cephanolide B. The synthesis of the target is then accomplished via the (non-intuitive) construction of the aromatic part via Robinson annulation of 13 with butanone 14 and oxidation of the thus-obtained enone.
Extended Data Fig. 3 Enantioselective synthesis of a cyclopiane diterpene, conidiogenone B, and its derivative designed by Chematica.
Synthesis of conidiogenone B, which includes a challenging 6–5–5–5 ring system and six contiguous stereocentres (of which three are quaternary), was recently accomplished in 14 steps49 (starting from trimethylcyclopentenone, itself one step from a buyable substrate) and relied on a substrate-controlled Nicholas/Pauson–Khand reaction and Danheiser annulation. Chematica’s plan (top panel) also uses 14 steps and relies on intramolecular alkylations to construct five-membered rings and Diels–Alder cycloaddition to build the six-membered ring of conidiogenone B. The synthesis commences with the chiral-auxiliary-controlled alkylation of cyclopentenone 4 with protected bromoethanol 5 to install the first stereocentre. Subsequent Stork–Danheiser transposition is followed by a substrate-controlled addition of a tertiary organocuprate and intramolecular alkylation to yield the bicyclic ketone 10, which is further methylenated to enone 11. Formation of the six-membered ring of conidiogenone B is accomplished via the Diels–Alder reaction of 11 with diene 12 to give the tricyclic ketone 13, which is further elaborated into iodoketone 17. Formation of the last ring of conidiogenone B is accomplished via the intramolecular alkylation of the ketone. In the bottom panel, Chematica was asked to design a plan for a more complex derivative of conidiogenone B, which differs by an extra methyl group (at a new quaternary stereocentre). Within 18 steps from the target, Chematica reached a known enantioenriched ketoester 4 (marked with a yellow asterisk) which was then sourced, in a few minutes of additional searching, to the commercially available and inexpensive 1. The synthesis commences with the reduction of the ketone (with stereochemistry controlled by Noyori’s catalyst). Subsequent substrate-controlled alkylation and oxidation are followed by elaboration of ester 4 into iodoenone 10. 
Stereoselective alkylation with protected bromoethanol 11 and subsequent cyclization yields the bicyclic ketone 13, which is further elaborated to tricyclic enone 17. We make two notes here. First, owing to the presence of a matched stereocentre, conversion of 10 to 12 could probably be performed as one step, without Enders’ auxiliary to control the stereochemical outcome. Chematica did not recognize this possibility, probably because it has not yet been taught detailed rules that govern substrate-directed alkylations controlled by quaternary stereocentres. Second, the desmethyl analogue of enone 17 was also used in the published synthesis of conidiogenone B, but, to form the six-membered ring, it was subjected to Danheiser annulation followed by ozonolysis-aldol condensation rather than to Diels–Alder cyclization. The formation of the last ring of conidiogenone B is accomplished via intermolecular Diels–Alder reaction with electron-rich diene 18 (available in a single step from pent-3-enal) approaching from the less hindered face of the enone (see refs. 60,61,62 for similar Diels–Alder cyclizations promoted by Lewis-acid catalysts). From this point, the target molecule is obtained in three straightforward steps.
Scabrolide A is a polycyclic furanobutenolide-derived norcembranoid diterpenoid that belongs to a family of marine natural products isolated from Sinularia soft corals63,64. The molecule poses a synthetic challenge owing to its compact, densely functionalized core: a fused 5–6–7 carbocyclic scaffold decorated with five adjacent stereocentres and one additional remote stereocentre on the seven-membered ring. A recent literature pathway50 (to the enantiomer from ref. 64) comprises 21 synthetic steps and relies on the intramolecular Diels–Alder cycloaddition and late-stage [2+2] photocycloaddition/fragmentation sequence. During computer planning of the enantiomer from ref. 63, several constraints were imposed; for example, Chematica was asked to design an enantioselective strategy (using the REMOVE_DIAST variable to exclude reactions that lead to a single racemic diastereoisomer), and was not allowed to use SAMP or RAMP hydrazones (to minimize the use of chiral auxiliaries), or highly strained bridgehead intermediates. The route proposed by the software is longer (about 30 steps) and more conservative in the sense that it relies on only broadly applicable chemistries. When planning its route, Chematica did not know the highly scaffold-specific (though elegant) fragmentation–recombination–elimination sequence of steps used towards the end of the literature pathway. The synthesis proposed by the machine relies on an intramolecular aldol addition of 17 followed by FGI, which sets the scene for the closure of a six-membered ring via alkylation reaction to yield intermediate 20. Subsequent substrate-controlled, stereoselective addition installs the tertiary alcohol. Reduction (with double-bond migration) of intermediate 21 followed by reductive ozonolysis sets the scene for the construction of the second five-membered ring of scabrolide’s scaffold. The fourth and final, seven-membered ring is closed via Pd-mediated coupling.
The starting material initially identified by the software (aldehyde 11) is not commercially available, but can be sourced in four steps from (±)-cis-bicyclo[3.2.0]hept-2-en-6-one. Looking for alternative endings of the pathways that terminate in commercially available, achiral and inexpensive starting materials, we restarted the search from a node marked in the graph view (top) by a yellow asterisk (bicyclic intermediate 18). The alternative ending (blue reaction arrows in the bottom scheme) was found within about half an hour and commenced from readily available, protected hydroxyaldehyde and cyclopentanone. The initial ending, starting from the aldehyde 11, is marked by green arrows.
Extended Data Fig. 5 Chematica-designed, enantioselective synthesis of taxuyunnanine D, a less oxidized taxane.
The previous synthesis51,65 of this target was accomplished in 12 steps via a two-phase cyclase-oxidase strategy, and required extensive exploration of conditions to achieve satisfactory selectivity during C–H oxidations. Here, within 14 steps from the target molecule, Chematica reached simple and known starting materials: iodocyclohexenone 6 and protected iodoethanol 7. The synthesis commences with the Pd-mediated coupling of 6 and 7. Subsequent catalyst-controlled methylation and oxidation introduce the all-carbon quaternary and C5 hydroxylated stereocentres of taxuyunnanine D. Subsequently, protection of alcohol, stereoselective alkylation of cyclohexanone (with proposed Enders’ auxiliary controlling the stereochemical outcome, but probably also feasible when performed directly; see notes in the caption of Extended Data Fig. 3), Hofmann elimination, removal of protecting groups and Appel reaction yield iodide 15, which is coupled with iododiene 5 (available in four steps from enone 1) to give triene 16, setting the scene for the key formation of the taxane skeleton via electron-neutral intramolecular Diels–Alder cycloaddition (such an electronically neutral system that lacks electron-withdrawing groups may require activation with high temperature or a transition-metal catalyst66). Formation of taxuyunnanine D from the [4+2] cycloadduct 18 is then accomplished in two steps and requires olefination of ketone and allylic oxidation. The latter step appears less risky compared to the known solution51, because 19 lacks any competitive allylic CH2 groups, which are prone to oxidation and could cause selectivity problems.
Extended Data Fig. 6 Chematica-designed, enantioselective synthesis of a marine steroid, aplykurodinone-1.
Prior syntheses67 of this target, featuring six contiguous stereocentres, either relied on the late-stage introduction of the side chain via Michael addition to cyclopentenone (which suffers from low selectivity), or started67 from chiral building blocks (in the latter case, in 11 steps but from much more advanced, chiral substrates). Chematica used 17 steps to reach achiral and commercially available substrates: crotonyl chloride, allyl bromide and bromochloropropane 2. This synthesis commences with the installation of two contiguous stereocentres via stereoselective vic-difunctionalization of unsaturated amide and subsequent hydroboration and bisoxidation, followed by McMurry coupling to give cyclopentene 7. From there on, oxidation of the less hindered allylic position, methylation of cyclopentenone, reoxidation and formation of imine (elegantly ensuring that a single regioisomer would form in the Diels–Alder reaction) with aminodiene 10 (available in four steps from ethyl sorbate) derives triene 11, which is then used in an intramolecular Diels–Alder cycloaddition that forms the desired 6–5 ring system of aplykurodinone-1. Hydrolysis of the imine linker and conversion of the primary amine to the carboxylic acid via oxidation and hydrolysis yields 15, which is then subjected to iodolactonization followed by dehalogenation to form the entire 5–6–5 ring system. The synthesis is completed by elaborating the remaining alkyl chloride to the desired alkene.
Extended Data Fig. 7 Chematica-designed enantioselective synthesis of a tetracyclic alkaloid, dendrobine.
Synthesis of this target, which features a challenging 5–6–5–5 ring system and seven contiguous stereocentres, was performed recently53 in 11 steps, taking advantage of enantioselective Diels–Alder reaction, substrate-controlled hydroboration and reduction of imine as the key steps. In Chematica’s synthetic plan, within 14 steps from the target, the software reached the commercially available crotonyl chloride and known 3-iodopropanol (that is, simpler starting materials than Danishefsky’s diene and the unsaturated imide used in the literature synthesis). The synthesis commences with the chiral-auxiliary-controlled alkylation of the amide enolate. Ensuing steps allow for the preparation of enoate 10. Further homologation with allylic phosphonate 11 (available in two steps from an appropriate alcohol) and hydrolysis yield the triene 13, setting the scene for an intramolecular Diels–Alder reaction that forms the desired 6–5 ring system. Subsequent hydroxylactonization gives tricyclic alcohol 15, which is then efficiently transformed into the target molecule via stereoretentive chlorination of the alcohol, Cbz removal and substitution of chloride.
Extended Data Fig. 8 Enantioselective synthesis of mevastatin designed by Chematica with all its reaction knowledge and on exclusion of user-specified reaction types.
Top, synthetic plan obtained when the program was allowed to use all of its reaction knowledge base. Under these circumstances, the planned route relies on an intramolecular Diels–Alder reaction to construct mevastatin’s 6–6 ring system. The synthesis commences with stereoselective reduction of a ketone to give iodoalcohol 2, which is transformed in five steps into triene 8. Subsequent cycloaddition (note that such an electronically neutral system that lacks electron-withdrawing groups may require activation with high temperature or a transition-metal catalyst66) and elaboration of the side chain give the target molecule in a total of 14 steps. Bottom, synthetic plan designed by Chematica when it was forbidden from using the key Diels–Alder reaction and was thus forced to come up with a completely different approach; the synthesis is now much longer. The formation of each ring is accomplished via ring-closing metathesis.
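The effect of excluding a reaction type can be illustrated with a small sketch. Everything in it is hypothetical: the rule table, the molecule labels and the breadth-first search over linear (single-precursor) routes are illustrative assumptions and do not reproduce Chematica's rules or its actual routes to mevastatin. The sketch only shows how filtering the rule set before the search forces a different, longer plan.

```python
from collections import deque

# Hypothetical toy rules: (product, reaction_type, precursor).
RULES = [
    ("mevastatin_core", "Diels-Alder", "triene"),
    ("triene", "olefination", "buyable_aldehyde"),
    ("mevastatin_core", "ring-closing metathesis", "diene_a"),
    ("diene_a", "ring-closing metathesis", "diene_b"),
    ("diene_b", "allylation", "buyable_ketone"),
]
BUYABLE = {"buyable_aldehyde", "buyable_ketone"}  # assumed starting materials


def shortest_route(target, forbidden=frozenset()):
    """Breadth-first retro-search that skips user-forbidden reaction types."""
    allowed = [rule for rule in RULES if rule[1] not in forbidden]
    queue = deque([(target, [])])
    seen = {target}
    while queue:
        molecule, route = queue.popleft()
        if molecule in BUYABLE:
            return route
        for product, reaction_type, precursor in allowed:
            if product == molecule and precursor not in seen:
                seen.add(precursor)
                queue.append((precursor, route + [reaction_type]))
    return None


unrestricted = shortest_route("mevastatin_core")
restricted = shortest_route("mevastatin_core", forbidden={"Diels-Alder"})
```

In this toy network the unrestricted plan goes through the Diels–Alder step, whereas the restricted plan falls back on two ring-closing metatheses and is one step longer.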
Extended Data Fig. 9 Pathways leading to ramelteon designed by the software with and without multistep strategizing routines.
The top synthetic pathway was designed without the new, multistep heuristics. The scaffold of the target was constructed via Cu-catalysed hydroalkylation of alkenes68. Although the pathway does not contain chemically erroneous steps, it is long, relies heavily on reductions and oxidations, and involves many FGIs. The bottom route, designed with the new strategizing routines, is more concise and elegant. The key element in this path is a strategy that relies on Robinson annulation followed by dehydrogenation of enones (in the retrosynthetic direction, when planning the route, the program strategizes and first performs a seemingly unproductive dearomatization of a phenol, which then enables Robinson annulation).
Extended Data Fig. 10 Pathways leading to tybost designed by the software with and without multistep strategizing routines.
The top synthetic pathway was designed without the new multistep algorithms. This route is longer and requires additional protection and deprotection operations on intermediate 11 (node in blue halo). The program was not able to find better routes even after hours of searching. In the bottom route, when the program was allowed to strategize, it found a more elegant route that relies on two bypasses (two sets of red reaction arrows) and one FGI (pair of violet reaction arrows). The software navigated the pathways to starting materials that already had relevant groups protected (such that no protections were required mid-way into the pathway) and were easily available from appropriate amino acids.
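The bypass idea (Extended Data Fig. 1h) can be sketched as a two-stage decision: if a high-gain step conflicts with a group present in the molecule, first interconvert that group into a compatible one, then apply the step. The tables `CONFLICTS` and `BYPASS_GROUPS` and the function `plan_step` are hypothetical names with made-up entries; the real program's incompatibility rules are far more detailed.

```python
# Hypothetical incompatibility table: reaction -> groups it cannot tolerate.
CONFLICTS = {"enolate alkylation": {"primary alcohol"}}
# Hypothetical interconversions that replace a conflicting group.
BYPASS_GROUPS = {"primary alcohol": "silyl ether"}


def plan_step(groups, reaction):
    """Return the operations needed to apply `reaction`, or None if blocked."""
    clashes = CONFLICTS.get(reaction, set()) & set(groups)
    operations = []
    for group in sorted(clashes):
        if group not in BYPASS_GROUPS:
            return None  # no known bypass; the planner must look elsewhere
        operations.append(f"convert {group} to {BYPASS_GROUPS[group]}")
    operations.append(reaction)
    return operations


steps = plan_step({"primary alcohol", "ketone"}, "enolate alkylation")
```

When no conflict exists the function returns the reaction alone, so the bypass adds steps only where a reactivity conflict would otherwise block the structure-simplifying disconnection.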
This file contains additional synthetic, spectroscopic, chromatographic, and statistical details and includes Supplementary Figures 1-89.
This zipped folder contains 3 files. The pseudocode for the multistep retrosynthetic design, pathway generation and retrieval can be found in the PSEUDOCODE_Aug2.pdf file. An example of one of the reaction rules as coded in Chematica is provided in the RULE.pdf file. Additional details of the software’s availability and execution are given in the README_Aug2.pdf file.
About this article
Cite this article
Mikulak-Klucznik, B., Gołębiowska, P., Bayly, A.A. et al. Computational planning of the synthesis of complex natural products. Nature 588, 83–88 (2020). https://doi.org/10.1038/s41586-020-2855-y