When it comes to making molecules, less is more. Peptides1, oligonucleotides2 and increasingly oligosaccharides3 can now be made at the push of a button, by anyone, and the impact on society has been transformative. These automated approaches are achieved with iterative synthesis — using just a few reactions repeatedly to access many different molecular structures and functions. Small molecules, functionally rich chemical matter composed primarily of carbon–carbon-bond-based skeletons, are much more challenging to access this way. But nature does it most of the time4,5,6, and progress in the lab is being made7,8. Now, in Nature Synthesis, Grzybowski and co-workers report an ingenious computer algorithm that can autonomously discover thousands of iterative syntheses of small molecules and motifs found in natural products9.

To enable this advance, Grzybowski and co-workers build on their Chematica platform10 for automated retrosynthesis, which generates efficient synthesis plans, even for highly complex natural products, that are indistinguishable from those designed by chemists11. Chematica combines machine learning with an extensive database of known chemical transformations and more than 100,000 expert-coded rules that capture fundamental principles of organic chemistry, including reaction mechanisms, stereochemistry, functional group incompatibilities, tactical reaction combinations and molecular symmetry. To transform Chematica into a discovery engine for iterative synthesis pathways, the researchers devise a new way to autonomously search for closed loops of reactivity within the effectively infinite universe of possible synthesis pathways (Fig. 1). This search is akin to the way machine learning algorithms autonomously scan the sky for periodic fluctuations of light generated by exoplanets as they circle their stars12.
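The idea of searching for closed loops of reactivity can be pictured as cycle-finding in a directed graph whose nodes are reactive motifs and whose edges are reaction classes. The following is a minimal Python sketch under stated assumptions, not Chematica's actual algorithm: the toy graph, the motif labels and the function name `find_loops` are all hypothetical, and the two-to-four-step window simply mirrors the range this article reports for the algorithm.

```python
def find_loops(graph, motif, min_len=2, max_len=4):
    """Enumerate simple cycles that start and end at `motif`.

    `graph` maps a motif label to the labels reachable in one reaction.
    A loop is kept only if it uses between min_len and max_len reactions.
    """
    loops = []

    def walk(node, path):
        # `path` holds the motifs visited so far; closing the loop from
        # here would use exactly len(path) reactions.
        for nxt in graph.get(node, ()):
            if nxt == motif:
                if len(path) >= min_len:
                    loops.append(path + [motif])
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(motif, [motif])
    return loops


# Hypothetical toy network of functional-group interconversions.
toy_graph = {
    "aldehyde": ["homoallylic_alcohol", "alcohol"],
    "homoallylic_alcohol": ["protected_alkene"],
    "protected_alkene": ["aldehyde"],
    "alcohol": ["aldehyde"],
}

loops = find_loops(toy_graph, "aldehyde")
for loop in loops:
    print(" -> ".join(loop))
```

In this toy network the search returns two loops through the aldehyde motif. Finding a loop is only the first filter: a closed path such as aldehyde, alcohol, aldehyde returns to its starting point without building anything, which is why further conditions on the products are needed.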

Fig. 1: A computer algorithm identifies constellations of chemical transformations that represent iterative reaction sequences.

A new web application, Allchemy’s ‘Iterator’ module, illuminates an extraordinary range of potential products that can theoretically be generated by each identified iterative synthesis process.

The computer algorithm identifies constellations of synthetic transformations that link substrates to intermediates, which, in turn, can be transformed into products that possess the same functionality as the initial substrate (Fig. 1). To identify such an iterative sequence, the algorithm requires that four conditions are met: (1) the sequence cannot produce a product identical to the substrate; (2) the proposed reactions must be compatible with the functional groups of each compound; (3) one reactive motif must be present in both the initial substrate and the product; (4) the chemical reactions must convert that motif into a new functional group in the intermediate, and then either convert it back to the original motif or react it with a newly introduced species that contains the original motif. Importantly, the algorithm extends beyond traditional cycles of deprotection and coupling steps and can find iterative loops comprising between two and four compatible reactions.
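The four conditions above can be made concrete with a small check on a toy model. This is a hedged sketch in Python and not the published implementation: compounds are reduced to a backbone atom count plus a set of functional-group labels, and the names `Compound`, `Step`, `apply` and `is_iterative`, the atom counts and the listed incompatibility are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    backbone_atoms: int   # crude stand-in for molecular identity
    groups: frozenset     # functional-group labels present


@dataclass(frozen=True)
class Step:
    name: str
    consumes: str                          # group converted by this step
    produces: frozenset                    # groups present afterwards
    atoms_added: int = 0                   # net backbone growth
    incompatible: frozenset = frozenset()  # bystander groups that would react


def apply(step, compound):
    """Return the compound after `step`, or None if the step cannot run."""
    if step.consumes not in compound.groups:
        return None
    bystanders = compound.groups - {step.consumes}
    if step.incompatible & bystanders:
        return None  # condition 2: functional-group compatibility
    return Compound(compound.backbone_atoms + step.atoms_added,
                    bystanders | step.produces)


def is_iterative(substrate, motif, steps):
    """Check the four conditions for one pass through a candidate loop."""
    if not steps or motif not in substrate.groups:
        return False
    first = steps[0]
    if first.consumes != motif or motif in first.produces:
        return False  # condition 4: the motif must first become something new
    current = substrate
    for step in steps:
        current = apply(step, current)
        if current is None:
            return False
    motif_restored = motif in current.groups      # conditions 3 and 4
    product_differs = current != substrate        # condition 1
    return motif_restored and product_differs


# Illustrative reaction classes; the numbers are assumptions, not data.
allylation = Step("allylation", "aldehyde", frozenset({"alcohol", "alkene"}),
                  atoms_added=3, incompatible=frozenset({"ketone"}))
protection = Step("silyl protection", "alcohol", frozenset({"silyl_ether"}))
ozonolysis = Step("ozonolysis", "alkene", frozenset({"aldehyde"}), atoms_added=-1)
reduction = Step("reduction", "aldehyde", frozenset({"alcohol"}))
oxidation = Step("oxidation", "alcohol", frozenset({"aldehyde"}))

start = Compound(4, frozenset({"aldehyde"}))
growing_loop = [allylation, protection, ozonolysis]
redox_loop = [reduction, oxidation]

print(is_iterative(start, "aldehyde", growing_loop))  # loop that extends the chain
print(is_iterative(start, "aldehyde", redox_loop))    # loop that regenerates the substrate
```

In this toy model the allylation, protection and ozonolysis loop passes because it regenerates the aldehyde on a longer backbone, whereas the reduction and oxidation loop fails condition (1) because it simply returns the starting material. A real implementation would operate on full molecular graphs and expert-coded reaction rules rather than on labels.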

Searches using this algorithm illuminate thousands of iterative synthesis pathways. These pathways include recursive approaches to form a wide range of motifs resembling complex natural products, pharmaceuticals and materials. Several of these identified sequences are validated experimentally. For example, computer-identified iterative sequences are employed to prepare key subsections of the natural products squamocin, nystatin and monhexocin in a stereoselective fashion. These experimental efforts showcase the usability of this program while highlighting the potential to prepare highly complex natural products with iterative chemistry.

Although this report is a major step forward, the iterative sequences that have been experimentally validated thus far primarily access chemical space that tends to overlap with known iterative pathways, such as asymmetric allylation and crotylation reactions8. Moreover, the reactions employed in these sequences are stereoselective, as opposed to stereospecific, and this can lead to a lack of stereocontrol when they are performed on complex chiral substrates. By contrast, stereospecific reactions can faithfully translate stereochemistry from prefabricated building blocks into growing products. Excitingly, the algorithm also proposes ways to make such complex natural-product-like motifs via new stereospecific iterative sequences. Other sequences are identified that theoretically give access to a range of polyheterocyclic systems found in many pharmaceuticals and materials, including motifs that have thus far proved challenging to access with automated iterative chemistry. These pathways have substantial potential to address areas of unmet methodological need in iterative small-molecule synthesis, but they remain to be put into practice.

It is exciting to ask how many iterative sequences will be needed to access most of the functional space that small molecules represent. Inspiringly, after a billion years of probing this question, nature uses only a few sets of iterative synthesis pathways to make most natural products4,5,6. Thus, just a small set of complementary iterative synthetic pathways gives rise to an extraordinary range of useful molecular functions. An important challenge moving forward will be to identify the analogous minimal collection of iterative sequences that can be conducted automatically in the lab and that collectively cover most of this targeted functional space. This study provides thousands of promising new options for addressing this challenge. To further empower such a search, the researchers developed a web application, dubbed Allchemy’s ‘Iterator’ module, that enumerates the vast regions of chemical space that can theoretically be accessed using each of the known and novel iterative sequences that were identified (Fig. 1).
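To give a feel for why a single iterative sequence can cover so much chemical space, the sketch below enumerates every ordered chain of building blocks reachable within a fixed number of cycles. It is only an illustration of the combinatorial growth, written in Python under simple assumptions (any block can be used in any cycle, and order matters); it does not describe how the Iterator module works, and the names `enumerate_products` and `library_size` are hypothetical.

```python
from itertools import product


def enumerate_products(starter, building_blocks, max_cycles):
    """Yield every ordered chain obtainable in 1..max_cycles iterations."""
    for n in range(1, max_cycles + 1):
        for combo in product(building_blocks, repeat=n):
            yield (starter, *combo)


def library_size(num_blocks, max_cycles):
    """Count the chains without enumerating them: b + b^2 + ... + b^n."""
    return sum(num_blocks ** n for n in range(1, max_cycles + 1))


blocks = ["A", "B", "C"]  # placeholder building-block labels
chains = list(enumerate_products("S", blocks, max_cycles=3))

print(len(chains))          # 3 + 9 + 27 = 39 chains
print(library_size(20, 4))  # 20 + 400 + 8000 + 160000 = 168420
```

Even with only 20 interchangeable building blocks and four cycles, this simple count exceeds 168,000 candidate products, which illustrates why a small set of complementary sequences could in principle span a large functional space.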

Fully automated iterative platforms for making small molecules will enable a transition from customized molecular synthesis, currently available only to highly trained experts, to a democratized discovery engine broadly available to anyone in the world. An inspiring recent example of the disruptive impact of achieving such a goal can be found amongst the stars. Citizen-science efforts built on data from NASA’s Transiting Exoplanet Survey Satellite (TESS) enable amateur stargazers to participate in scientific exploration, and have led to the discovery of a new exoplanet13. Imagine what is possible when brilliant minds from different backgrounds can gaze into the depths of functional chemical space to help discover tomorrow’s personalized anticancer drugs, sustainable plastics and highly efficient organic photovoltaics.