Synthetic organic chemistry is the science of building desired chemical structures from simpler parts. The knowledge and experience of researchers have always been the key to combining chemical reactions into successful synthetic schemes. But in a paper in Nature, Segler et al.1 report that an artificial-intelligence program can design routes for synthesizing compounds that — at least, on paper — seem just as good as those produced by humans.
Organic chemists often work by thinking backwards as much as they do forwards when designing a synthetic route. The concept of retrosynthesis2, introduced by E. J. Corey in the 1960s, and for which he was awarded the Nobel Prize in Chemistry in 1990, codified the way in which many chemists think (Fig. 1). When looking at a target molecule, they ask: “What could this have been made from? Which bonds could have been formed, and which atoms or chemical groups could have been added or transformed?” Then, the process starts again, as researchers try to determine the reactions that could have led to the precursor molecule. The aim is to work back to easily available starting compounds, while balancing the factors that make a good synthesis — including the number of steps involved, the probable product yields of those steps, and the ease of use of the chemistry involved. Organic chemists deal constantly with such questions, for example when making compounds for testing in drug-discovery programmes.
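The backward-working process just described is, at heart, a recursive search, and a toy version can be sketched in a few lines of code. Everything below is illustrative: the rule table, the compound names and the set of 'available' starting materials are invented for this example, and real systems operate on molecular graphs with thousands of curated transformations.

```python
# Toy retrosynthesis sketch. Molecules are plain strings; each hypothetical
# 'disconnection' rule maps a product to sets of precursors it could have
# been made from (one step backwards). All names here are invented.
DISCONNECTIONS = {
    "amide": [("amine", "acid_chloride"), ("amine", "carboxylic_acid")],
    "acid_chloride": [("carboxylic_acid",)],
    "carboxylic_acid": [("alcohol",)],
}

# Compounds assumed to be cheap, purchasable starting materials.
AVAILABLE = {"amine", "alcohol"}

def retrosynthesize(target, route=()):
    """Work backwards from `target`, returning every route that ends in
    available starting materials. A route is a tuple of (product, precursors)
    steps, read backwards from the target."""
    if target in AVAILABLE:
        return [route]                      # nothing left to make
    routes = []
    for precursors in DISCONNECTIONS.get(target, []):
        step = (target, precursors)
        partial = [route + (step,)]
        # every precursor must itself be traced back to available materials
        for p in precursors:
            expanded = []
            for r in partial:
                expanded.extend(retrosynthesize(p, r))
            partial = expanded
            if not partial:                 # this disconnection dead-ends
                break
        routes.extend(partial)
    return routes

# Prints two routes: one via acid_chloride (3 steps), one via
# carboxylic_acid (2 steps).
for r in retrosynthesize("amide"):
    print(r)
```

Each printed route is a chain of disconnections ending in purchasable materials; a real planner would then weigh the routes by step count, expected yield and practicality, exactly the balancing act described above.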
Since the birth of synthetic organic chemistry in the mid-nineteenth century, a huge number of synthetic organic reactions have been reported across a literature that gets larger every hour. Before the 1980s, many chemists kept collections of handwritten, cross-referenced index cards containing useful reactions from the literature, to guide the design of synthetic pathways. These aide-memoires moved naturally on to digital databases as computer technology became widespread.
These days, chemists review the various methods for turning chemical group X into chemical group Y by drawing the molecular structures of interest using a computer program and then performing an online search for relevant reactions. This almost invariably produces a long list, from which researchers must select the most appropriate reaction for their needs, according to their knowledge and experience. Stringing such reactions into a useful synthesis has been thought of as a problem that only humans can solve.
But does it have to be? Could a sufficiently large and well-curated database of chemical transformations be used as the basis for a program that not only finds reactions, but also arranges them into plausible synthetic plans? Such programs have been sought since Corey’s work in the 1960s, but (until recently) with little practical success.
Two fundamental problems have frustrated the dream. First, computing hardware simply could not tackle the scale of the challenge. Second, the chemical literature is hard to encode in terms that a software program can use: a given reaction works (most of the time) for the type of compound for which it was reported, but only under certain conditions. For example, group X would turn into group Y, unless group Z was present elsewhere in the reactant molecules. When group Z was present, the reaction might still work if group Q was nearby in the same molecule — but only, for instance, when the pH was lower than a certain value, when the temperature was high enough, or when no water was present.
There are various ways to overcome this second problem. One is to provide the program with an exhaustively human-curated list of the reactions that can enable a desired chemical transformation, and which takes into account all the limitations and conditions. The program can then combine such reactions into synthetic routes in ways that are broadly similar to those used to evaluate combinations of chess moves. This approach is starting to yield results (see ref. 3, for example), and several competing commercial software products are available.
Segler et al. have investigated another method: instead of getting researchers to load their expertise into a machine, is it possible to design a program that learns by itself what researchers know? This concept has already produced startling results, with programs that can learn to play games such as Go on their own4, rather than being trained using lists of human strategies.
The authors devised a computational process that starts by automatically extracting chemical transformations from a large commercial database, being careful to include only reactions that have been reported several times. Their system accepts these well-precedented reactions as ‘allowed moves’ in organic synthesis. When the system is asked to devise a synthetic route to a target molecule, it works backwards from the target as would a human, picking out the most promising precursor molecules according to the design rules that it has learnt, and then seeing how feasible it is to synthesize those. The authors combined three artificial neural networks with a Monte Carlo tree search — a search algorithm that uses random sampling to guide certain decision-making processes — to narrow down the most promising synthetic routes, without getting stuck too quickly on a particular path.
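As a rough caricature of this interplay (and emphatically not the authors' actual system), the sketch below pairs a hand-written `policy` function, standing in for the learned neural networks, with a small Monte Carlo tree search over invented disconnection rules. All names, scores and iteration counts are assumptions made purely for illustration.

```python
import math, random

# Invented rules: each compound maps to the precursor sets it could come
# from. 'dead_end' has no rules, so routes through it cannot be completed.
RULES = {
    "target": [("intermediate",), ("dead_end",)],
    "intermediate": [("starting_material",)],
}
AVAILABLE = {"starting_material"}   # assumed purchasable

def policy(compound, precursors):
    """Stand-in for a learned network: a prior score for one backward move."""
    return 0.9 if "dead_end" not in precursors else 0.1

class Node:
    def __init__(self, to_make, parent=None):
        self.to_make = to_make          # compounds still needing a route
        self.parent = parent
        self.children = {}              # move (compound, precursors) -> Node
        self.visits = 0
        self.wins = 0.0

    def expand(self):
        for c in self.to_make:
            for pre in RULES.get(c, []):
                rest = (self.to_make - {c}) | (set(pre) - AVAILABLE)
                self.children[(c, pre)] = Node(frozenset(rest), self)

def rollout(node, depth=5):
    """Random playout: 1.0 if it bottoms out in available materials."""
    to_make = set(node.to_make)
    for _ in range(depth):
        if not to_make:
            return 1.0
        c = random.choice(sorted(to_make))
        options = RULES.get(c, [])
        if not options:
            return 0.0                  # unmakeable compound: dead end
        pre = random.choice(options)
        to_make = (to_make - {c}) | (set(pre) - AVAILABLE)
    return 0.0

def mcts(root, iterations=200):
    root.expand()
    for _ in range(iterations):
        node = root
        # selection: descend by a UCB-like score, biased by the policy prior
        while node.children:
            node = max(
                node.children.items(),
                key=lambda kv: (kv[1].wins / (kv[1].visits + 1)
                                + policy(*kv[0])
                                * math.sqrt(node.visits + 1)
                                / (kv[1].visits + 1)),
            )[1]
        if node.visits and node.to_make:
            node.expand()               # expansion of a revisited leaf
            if node.children:
                node = next(iter(node.children.values()))
        value = rollout(node)           # simulation
        while node:                     # backpropagation
            node.visits += 1
            node.wins += value
            node = node.parent
    # the most-visited first move is the recommended disconnection
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

random.seed(0)
best = mcts(Node(frozenset({"target"})))
print(best)                             # prints ('target', ('intermediate',))
```

The search spends a few visits on the poorly scored dead-end branch before concentrating on the move that rollouts confirm can be completed, which is the sense in which the method narrows down promising routes without committing to one path too quickly.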
Importantly, the routes that emerged were evaluated not only by the program’s scoring system, but also by trained organic chemists in a blind test for plausibility. When the chemists were asked to assess machine-generated synthetic pathways for target molecules alongside routes reported in the literature, they expressed no preference for the routes that had been shown to work by their fellow researchers. In other words, they found the chemistry suggested by the program to be as reasonable as the syntheses proposed by researchers.
This does not necessarily mean that all machine-suggested routes will work in the laboratory — but, as organic chemists know to their sorrow, many routes designed by humans fail there, too. Further development of the program could include such ‘reductions to practice’, to determine whether the machine-proposed routes are better (or, at least, no worse) than those devised by people. A study3 this year that assessed a more conventional, hand-curated retrosynthesis program is notable for its inclusion of such a laboratory test. Achieving routes that are ‘no worse’ than those of researchers is a clear victory for Segler and colleagues’ program, which arrives at pathways in considerably less time, and with much greater coverage of the literature, than a person could manage.
If such programs fulfil their promise, and there is little reason to think that they won’t, synthetic chemists will find that a mainstay of their work starts to disappear. Technological innovations have had similar effects in the past, but usually by automating physical ‘grunt work’ that is missed by no one. Disconcertingly, developments in artificial intelligence encroach on the thinking part of the job. There will always be complex, unusual and unprecedented structures that such software cannot handle, but the task of solving more-routine synthetic questions will be taken out of the hands of researchers.
The idea that intellectual tasks can be categorized as automatable grunt work will probably be insulting to many chemists, and will certainly feel like a threat. But the use of artificial intelligence will actually free up time in which they can think more about the higher-level questions of which molecules should be made and why, rather than focusing on the details of how to make molecules. Not all researchers will welcome this shift. But it seems to be coming, regardless.
Nature 555, 592-593 (2018)