Main

'Automation of science' bears the promise of making better decisions faster1. In drug discovery, automated systems already have a long and fruitful history2 (Fig. 1). Medium-throughput to high-throughput robotic screening in specialized assays has become standard in the pharmaceutical industry (Fig. 2). The breadth of other applications of automated systems extends from decision-support systems to computational molecular design to fully fledged robotic synthesis and hit finding3. Prominent examples include traditional rule-based and model-based approaches (for example, the archetypal DENDRAL system for analysing mass spectra4, LHASA5 software for synthesis planning and various in-house tools for accessing and analysing chemical and biological data similar to Amgen's AADAPT system6), various software tools for de novo molecular design7 and prototypical robotic systems such as ADAM and EVE for automated target and hit finding1,8.

Figure 1: The molecular design cycle.

Starting off from results obtained by high-throughput compound screening, fragment screening, computational modelling or data from the literature, this feedback-driven discovery process alternates between deduction and induction, eventually leading to optimized hit and lead compounds. Smart automation of the individual parts of the cycle can help to reduce randomness and error, thereby supporting less wasteful, more productive and efficient drug discovery. Miniaturization and advanced lab-on-a-chip technology, together with machine learning methods, represent enabling technologies. The whole design cycle can also be performed completely inside a software program. These adaptive de novo design methods are equipped with both chemical knowledge for in silico compound synthesis and meaningful virtual screening models as surrogates for biochemical and biological tests, while active learning algorithms enable chemical space navigation towards compounds with promising properties. Note that the terms 'deduction' and 'induction' in the context of drug discovery are not always used in a strictly logical sense. Induction refers to explanatory reasoning in generating hypotheses. Deductive inference necessarily results in a true statement if the underlying hypothesis is true. Because a hypothesis in drug design is based on incomplete, error-prone experimental data, the term 'abduction' may be formally better suited. (Q)SAR, (quantitative) structure–activity relationship.


Figure 2: Automated drug discovery facilities.

a | Millions of compound samples are stored in compact high-capacity facilities and handled by robots. b | Robot systems perform both high-throughput and medium-throughput screening of up to ten thousand samples per day to determine the activity against the biological target of interest. Multiple arms and flexible workstations enable fully automated liquid dispensing, compound preparation and testing. These storage and screening systems have become cornerstones of contemporary drug discovery. c | A prototype of a novel miniaturized design–synthesize–test–analyse facility for rapid automated drug discovery at AstraZeneca is shown. Images a and b courtesy of Jan Kriegl, Boehringer Ingelheim Pharma; image c courtesy of Michael Kossenjans, AstraZeneca.


Nevertheless, the full integration of all aspects of compound design, synthesis, testing and automated iteration throughout the molecular design cycle (Fig. 1) has not yet been productively applied on a broader scale, although there have been a few isolated proof-of-concept studies. For example, MacConnell et al.9 recently disclosed a microfluidics-based, miniaturized discovery platform for ultra-high-throughput hit deconvolution by sequencing. The device distributes DNA-encoded compound beads into picolitre-scale droplets, cleaves off the compounds from the beads by ultraviolet (UV) irradiation and performs a fluorescence-based binding assay, hit detection and subsequent hit identification by DNA barcode sequencing. By replicate analysis, the authors were able to reduce the false-positive hit rate to below 3%. This proof-of-concept study highlights the use of integrated microfluidics systems for large-scale screening within short, hour-scale time frames and with very low material consumption. Another example is provided by researchers at AbbVie, who have developed an integrated robotic platform for the automated parallel synthesis of small, focused compound libraries, built mainly from commercially available components10. Their system is able to perform liquid handling and evaporation for in-line analytics, purification and activity testing. Turnaround times of 24–36 hours were reported, which allow the project teams involved to obtain results from hypothesis testing within a day or two11. Similar robotic systems have been installed or are under construction in several pharmaceutical companies (for an example, see Fig. 2, right panel).

Now, advances in areas such as 'organ-on-a-chip' technologies and artificial intelligence are increasingly providing the basis for more widespread application of semi-autonomous or even fully autonomous processes to support project teams in identifying and optimizing tool and hit compounds in drug discovery. The benefits of automation include: diminished measurement errors and reduced material consumption by the application of standardized procedures with robotic support; shortened synthesize-and-test cycle times, enabling fast feedback loops and compound optimization; and 'objectified' molecular design towards multiple relevant biochemical and biological end points without personal bias. Furthermore, given the increased interest in the application of sophisticated cell-based assays12 — in an effort to more effectively recapitulate disease biology and thereby improve the likelihood of identifying compounds that show efficacy in humans — more rigorous compound prioritization aided by automated approaches could be particularly important because these assays are not always suitable for high-throughput compound testing13,14.

The potential value of more fully integrated automated systems in drug discovery is substantial. However, as with past technological advances that have raised hopes of revolutionizing drug discovery (but often not lived up to expectations), it is important to look beyond the hype, for example, around automated high-throughput combinatorial synthesis, 'big data' and artificial intelligence. This article aims to identify the key approaches and technologies that could be implemented robustly by medicinal chemists in the near future and to critically analyse the technological and conceptual challenges of doing so in the context of workflows in industry. It first summarizes the state of the art in the application of automated systems in separate aspects of the 'design–synthesize–test–analyse' cycle and then discusses progress in the integration of these aspects to fully harness the potential of automation in drug discovery.

Automation in molecule design

Medicinal chemists select, design and prioritize molecular structures on the basis of factors including the desired biological activity of the compounds, other characteristics important for drugs (such as absorption, distribution, metabolism, excretion and toxicity (ADMET) properties), the availability of compounds and retrosynthetic analysis (if the compounds are being synthesized rather than being sourced from existing libraries or commercial suppliers). Consequently, medicinal chemists routinely face complex multidimensional optimization problems, with the importance of different parameters changing as the drug discovery process progresses from the identification of initial screening hits (when identifying compounds with the relevant biological activity is crucial) via hit-to-lead expansion (which often requires massive synthetic effort to improve compound activity and developability) towards the selection of clinical candidates (when there may be a need to compromise to achieve the best possible mix between desirable biological activity and desirable ADMET properties). Given the vast size (cardinality) of the relevant 'chemical space', which is estimated to be in the range of 10^30–10^60 drug-like molecules, the key challenge for medicinal chemists could be summed up as 'what to make and test next?' Automated drug discovery platforms must be able to provide the right answers to this question.

Chemical design concepts. Traditionally, compound selection and/or design was the sole domain of medicinal chemists, drawing on their expert knowledge and providing a substantial role for intuitive decision making. Over the past two decades, various broad concepts have emerged to help guide compound library design, hit-to-lead expansion and the enrichment of compound collections with new chemical entities. For example, diversity-oriented synthesis (DOS) provides a rationale for generating collections of small molecules with diverse functional groups, stereochemistry and frameworks in a controlled fashion15,16. Following this concept, Maurya and Rana17 recently reported on the diversification of macrocycles by carbohydrate-derived building blocks. As a complement to DOS, biology-oriented synthesis (BIOS) takes natural products as templates for generating synthetically accessible derivatives and mimetics18,19, often relying on natural product-derived scaffolds20. Finally, so-called function-oriented synthesis (FOS)21 strategies take the BIOS concept to the next level by aiming to recapitulate or tune the function of a biologically active lead structure to obtain simpler scaffolds, increase their ease of synthesis and achieve synthetic innovation22. A recent example of the FOS approach is the successful design of oxazolidine derivatives with antibiotic activities as simplified analogues of the structurally intricate natural product caprazamycin from Streptomyces23.

A wide range of guidelines that aim to improve the lead-likeness or drug-likeness of compounds have also been introduced, beginning with Lipinski's recommendations (often referred to as the 'rule of 5')24,25 and combined ligand efficiency (LE) and lipophilic ligand efficiency (LLE) values, which can be applied automatically or semi-automatically as computational filters for existing compound libraries or candidates for synthesis (see Refs 26,27,28 for reviews). Early applications of artificial neural networks have contributed to rationalization of the drug-likeness concept in more sophisticated abstract terms and enabled on-the-fly computational compound profiling29,30. Importantly, it has been realized that compound quality can be controlled by appropriate lead selection and optimization based on informed decisions rather than by the naive application of empirical rules31. Today, fully fledged in silico decision-support systems that greatly extend and augment such concepts and guidelines can assist medicinal chemists in multi-objective compound design, selection and prioritization32,33. Accordingly, a 'predict first' mindset has recently been advocated by researchers at Merck, drawing from positive experiences with their own integrated design–make–test activities34. The concepts and guidelines have been reviewed comprehensively in the articles cited above, and thus this article focuses on some selected illustrative examples, as well as the limitations and challenges of autonomous computational selection and design of compounds.
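
To illustrate how such guidelines can be applied as automated filters, the following minimal Python sketch counts rule-of-5 violations and computes LE and LLE from a supplied pIC50 value. The use of the open-source RDKit toolkit, the example SMILES string and the pIC50 value are illustrative assumptions rather than part of the cited studies.

```python
# Minimal sketch of an automated drug-likeness filter. The RDKit toolkit, the example
# SMILES string and the pIC50 value are assumptions used only for illustration.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def rule_of_5_violations(mol):
    # Count violations of Lipinski's rule of 5.
    return sum([
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Lipinski.NumHDonors(mol) > 5,
        Lipinski.NumHAcceptors(mol) > 10,
    ])

def ligand_efficiencies(mol, pic50):
    # LE ~ 1.37 * pIC50 / heavy atom count (kcal/mol per heavy atom); LLE = pIC50 - cLogP.
    le = 1.37 * pic50 / mol.GetNumHeavyAtoms()
    lle = pic50 - Descriptors.MolLogP(mol)
    return le, lle

candidate = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as a placeholder structure
print(rule_of_5_violations(candidate), ligand_efficiencies(candidate, pic50=5.0))
```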

Automated de novo design. Importantly, the probabilities of the underlying research hypotheses are recorded as experimental metadata and stored in databases, which enables automated semantic analysis to generate both revised design hypotheses and new examples (that is, chemical entities) for testing35,36. Numerous automated compound generators and selection operators have been conceived for this purpose, some of which use certain classes of 'deep' machine learning methods; for example, generative and recurrent neural networks37,38, inverse quantitative structure–activity relationship models39,40,41 and reaction-based compound assembly techniques42.
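
To make the notion of a neural network-based structure generator concrete, the minimal sketch below defines a character-level recurrent model over SMILES strings, in the spirit of the generative approaches cited above. The vocabulary, layer sizes and (omitted) training corpus are placeholder assumptions and do not correspond to any particular published model.

```python
# Minimal sketch of a character-level SMILES generator (a recurrent neural network).
# The vocabulary and layer sizes are illustrative assumptions; a real model is first
# trained on a large corpus of known bioactive molecules and then sampled.
import tensorflow as tf

vocab = list("#()+-=123456789BCFHINOPS[]clnos\n")  # placeholder SMILES alphabet

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=64),
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.Dense(len(vocab), activation="softmax"),  # next-character probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training (not shown) would call model.fit(x, y) with x holding integer-encoded SMILES
# prefixes and y the corresponding next characters; new candidate structures are then
# generated by sampling characters one at a time until the end token '\n'.
```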

De novo molecular design methods in particular have matured enough to be applicable in prospective settings and are now receiving increasing attention. Figure 3 presents examples of recent compounds that were obtained by fully autonomous or semi-autonomous de novo computational design. In each of these cases, a computer-generated molecular design hypothesis guided the decision of which compound to make next. The first example (Fig. 3a) demonstrates how computational target prediction can prioritize combinatorial compound assays. A focused imidazopyridine (compound 1) library was obtained by linear microfluidic synthesis on a chip, with the building block selection performed by an ant colony algorithm and multi-target activity predictions43. Several active molecules, such as compound 2, were obtained within minutes. The results of this study provide support for the close integration of microfluidics-assisted synthesis with computer-based target prediction as a viable approach to rapidly generate bioactivity-focused combinatorial compound libraries with high success rates. We revisit this design concept in more detail in the subsequent sections of this article.

Figure 3: Examples of automated computer-assisted de novo design as an enabling technology.

a | A focused library of compounds with an imidazopyridine scaffold (compound 1) was synthesized on a microfluidics chip, based on the Ugi three-component reaction43. Coupling building block prioritization to a computational method for predicting ligand–target association led to the rapid identification of several ligands for G protein-coupled receptors (GPCRs), such as the α1A and α1B adrenoceptor antagonist shown (compound 2). b | Integration of computational activity prediction at GPCRs with microfluidics-assisted synthesis based on a reductive amination reaction enabled the identification of ligands with various binding profiles (compounds 3–6)44,45,46. c | Automated scaffold hopping from the drug fasudil (known to be a moderate inhibitor of death-associated protein kinase 3 (DAPK3); compound 7) and structure determination enabled the identification of a novel DAPK3 inhibitor (compound 8). On the basis of its binding mode determined by crystallographic studies, the diuretic drug azosemide (compound 9) was identified as a DAPK3 inhibitor47. d | Compounds 10 and 11 are examples of ligand structures that were computationally optimized from weaker or less selective precursors by using design methods trained on publicly available activity data48,49. e | The natural product (−)-englerin A (compound 12) was computationally morphed52 into the synthetically accessible compound 13; both compounds inhibit the transient receptor potential cation channel subfamily M member 8 (TRPM8). IC50, half-maximal inhibitory concentration; Ki, inhibition constant; LE, ligand efficiency; VEGFR, vascular endothelial growth factor receptor.


The second example (Fig. 3b) showcases the benefits of using virtual library enumeration in concert with target-panel prediction for focused library design and building block selection. Compounds 3–6 originated from the same chemical space accessible by reductive amination reaction products but possess different target preferences, validating the computational selection strategies employed. Compounds 3 and 4 were identified as potent and target-subtype selective ligands and synthesized in flow on a microfluidics chip44. Compound 5 was obtained as a target-subtype selective serotonin receptor 5-HT2B antagonist based on computational prediction, with no activities towards a large panel of off-targets45. By contrast, compound 6 was deliberately designed as an 'ultimately promiscuous' ligand, without showing aggregation in solution or possessing undesired frequent-hitter properties46. Importantly, very few compounds had to be synthesized to reach the design objectives.

The example shown in Fig. 3c demonstrates the advantageous interplay between ligand-based and structure-based hypothesis generation for scaffold hopping. With the known drug fasudil (a vasodilator, potent Rho kinase inhibitor and moderate inhibitor of death-associated protein kinase 3 (DAPK3)) as a template, computational de novo design suggested several scaffold hops47. A target prediction method relying on self-organizing neural networks prioritized these frameworks to obtain a novel DAPK3 inhibitor, compound 8. Subsequent crystallographic studies confirmed the binding of inhibitor 8 in the ATP–substrate pocket of the kinase (Protein Data Bank identifier: 5A6N). On the basis of the known binding mode of the de novo generated ligand, the diuretic drug azosemide (compound 9) could be identified as a DAPK3 inhibitor. This particular study succeeded in lead identification through the combination of automated scaffold hopping and experimental structure determination.

Compounds 10 and 11 are examples of computationally optimized ligand structures, starting from weaker or less selective precursors48,49 (Fig. 3d). In both cases, the design–synthesize–test cycles were guided by computational design methods trained on publicly available activity data, epitomizing the aforementioned 'predict first' philosophy.

The last de novo design example shown in Fig. 3e highlights the concept of automated morphing of natural products into synthetically accessible, isofunctional compounds, and illustrates the FOS design concept introduced previously. The natural anticancer compound (−)-englerin A (compound 12)50, which is synthetically accessible in a 14-step process51, was computationally (and by subsequent manual refinement) converted into compound 13, which could be afforded in only three synthetic steps52. Both compounds potently block transient receptor potential cation channel subfamily M member 8 (TRPM8) calcium channels, as correctly predicted by the software.

These selected examples of computer-assisted molecular design illustrate some of the potential of contemporary in silico methods for hypothesis generation. There is no doubt that state-of-the-art computational de novo design delivers new synthesizable chemical entities with desired properties. Multi-objective compound selection strategies have shown their applicability to de novo design, which is not only useful for prioritizing chemically attractive lead-like and drug-like molecular structures but also relevant in light of ligand–target promiscuity (estimates range from approximately 5 to 11 pharmacologically relevant targets per drug)53,54,55,56. The logical next step is to combine these and related techniques with automated synthesis and compound testing in an integrated discovery platform.

Automation in compound synthesis

The automation and parallelization of chemical synthesis offer benefits such as increased speed and throughput, greater reproducibility, lower consumption of materials and, consequently, the possibility to explore wider areas of chemical space within a given time frame compared with manual, serial compound synthesis57. Historically, the first automated synthetic processes and robots were conceived for peptides58,59 (Merrifield's method for amide bond formation), oligonucleotides60,61 (solid-phase phosphoramidite method for internucleotide linkage) and later for oligosaccharides62 (for example, the trichloroacetimidate method for glycosidic bond formation).

A key element in each of these processes is the use of a small set of building blocks (including larger fragments) and a well-defined, robust chemical reaction to afford large sets of diverse products in high yields by iterative building block assembly, orthogonal protection group chemistry and purification. Various methodological and technical improvements, including stereoselective synthesis, parallelization of subprocesses and preparatory steps, miniaturization (small volumes and compact synthesis arrays) and automated in-line purification, have resulted in highly reliable synthesis machines for increasingly complex oligomeric structures. Their underlying general design concept mimics the biosynthesis of most natural products. Furthermore, combinatorial thinking has led to methods for the massively parallelized scaffold-centric synthesis of structurally diverse compound libraries63. Many of these approaches are readily amenable to miniaturization and inclusion in automated design cycles64. Researchers at Eli Lilly have established a superb example of such a fully automated robotic synthesis laboratory that can be remotely controlled, which is a major step towards advancing the efficiency and effectiveness of chemical synthesis for drug discovery65,66.

Some reaction schemes have been shown to be better suited than others to straightforward automation and parallelization67,68. Typically, these reactions do not require exotic reaction conditions, can be standardized, are amenable to a wide variety of (readily available or obtainable) starting materials and can be optimized for maximum yield. Prominent examples include scaffold-forming reactions (for example, the Pictet–Spengler reaction and metathesis reactions)69,70. Other desirable linkage reactions (for example, palladium-free C–C bond forming reactions) have scarcely been used in medicinal chemistry or automated synthesis set-ups71,72.

However, automated discovery processes may be crucial for exploring new chemistry73. One of the most versatile automated synthesis platforms for drug-like small molecules to date was developed by Burke and co-workers74. The synthesis of Csp3-rich macrocyclic and polycyclic natural products, pharmaceuticals and natural product-like cores was achieved by iterative building block assembly via automated C–C bond formation and cyclization reactions75 (Fig. 4). Cartridged bifunctional N-methyliminodiacetic acid (MIDA) boronate building blocks were prepared for this purpose, complementing the commercially available samples. Importantly, a small set of building blocks was sufficient for generating remarkable structural core diversity in the final products. The authors developed an in-line catch-and-release purification protocol for realizing a seamless three-step reaction cycle. Similarly to the automated synthesis of oligomers, this important advancement in automated synthesis was enabled by standardizing the synthesis and purification processes involved.

Figure 4: Automated formation of C–C bonds to yield structurally diverse products.

The example demonstrates the concept of sequential boronate building block assembly. Four building blocks (coloured circles) are combined in a standardized deprotection, coupling and purification process. Synthesizers implementing this and other combinatorial reaction schemes can serve as chemistry modules in automated drug discovery platforms. Adapted with permission from Ref. 74, Science/AAAS.


Microfluidics-based synthesis. 'From batch to continuous' is a general trend in industry and not limited to chemical production processes76,77. Evidently, miniaturized microfluidic synthetic and analytical devices will play a central role in drug discovery automation. Microfluidic reactors integrated with real-time product detection and a command-and-control system can, in theory, perform and analyse thousands of reactions on timescales that are not possible with conventional macroscale technologies.

Embracing such advantages demands replacing widespread, but inefficient, one-parameter-at-a-time methods with more sophisticated and specialized algorithms. For example, trial-and-error scanning of the experimental parameter space can identify local optima but often fails to find global optima. In the field of medicinal chemistry, reagents and products are often expensive. Furthermore, many reagents and intermediates have unknown hazards and must be treated with extreme caution owing to their unknown pharmacology. Microfluidics can offer an advantage by decreasing opportunities for human exposure and minimizing material usage78.

There are also several other technologies that can be used for this purpose. For instance, acoustic liquid handling systems for precision droplet dispensing are well-accepted tools in chemical synthesis that increase the reproducibility of experiments and reduce the amount of consumables needed, thereby cutting costs79,80. Exceptionally high precision has been reported for transferring microlitre droplets into well plates81. Nevertheless, each automation process requires skilled chemists and solid chemical engineering, as the individual usage of acoustic droplet ejection and its applicability depend on the types of liquids and mixtures handled82.

As a distinct feature of microfluidics systems, converging streams of fluids flow in parallel without turbulence (that is, the conditions of laminar flow are fulfilled), with characteristically low Reynolds numbers (the ratio of inertial forces to viscous forces, a dimensionless parameter indicating whether a flow condition will be laminar or turbulent)83. In addition to allowing miniaturized bioassays in flow, this property of microfluidics systems enables fine-tuned, diffusion-controlled synthetic reactions84. The short distances in microfluidic channels guarantee the desired rapid and controlled transport of heat and mass. Complex channel geometries, pulsed flow conditions and the high surface-to-volume ratio of miniaturized reactors can result in a dramatic increase in throughput and yield in microreactors85.
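
As a rough illustration of why laminar flow dominates at these length scales, the short calculation below evaluates the Reynolds number Re = ρvd/μ for assumed, typical microchannel values; the result lies far below the laminar-turbulent transition regime of roughly Re ≈ 2,000 for pipe flow.

```python
# Order-of-magnitude estimate of the Reynolds number in a microfluidic channel.
# The channel dimension and flow velocity are assumed, typical values.
density = 1000.0        # water, kg/m^3
viscosity = 1.0e-3      # water at room temperature, Pa*s
velocity = 0.01         # mean flow velocity, m/s (assumed)
channel_width = 100e-6  # characteristic channel dimension, m (assumed, 100 micrometres)

reynolds = density * velocity * channel_width / viscosity
print(f"Re = {reynolds:.2f}")  # Re = 1.00, far below the transition regime near 2,000
```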

Ley and colleagues pioneered the field of flow chemistry, which has numerous practical applications in drug discovery; for example, the synthesis of imatinib in flow86, the translation of four sequential steps into a continuous-flow system to generate (E/Z)-tamoxifen with 100% conversion and 84% yield87 and numerous natural product syntheses88. Their seminal work has introduced single-step and multistep microscale and mesoscale flow systems, which enable otherwise difficult reactions with low yields or reactions that require special safety measures to be performed, such as hydrogenation or ozonolysis89,90,91. Warrington and co-workers have explored numerous reactions and microreactor designs, which have paved the way for advanced applications92,93,94,95. The technical capability of multistep continuous-flow synthesis was demonstrated by the Ley group in the generation of key intermediates for the total synthesis of the polyketide spirangien A96. This high-yielding system consists of heterogeneous reactor coils and microfluidics components, requiring minimal downstream processing.

Some of these techniques are already being applied in the pharmaceutical industry. For example, researchers at the Novartis–Massachusetts Institute of Technology (MIT) Center for Continuous Manufacturing succeeded in assembling a compact system for the continuous end-to-end synthesis of diphenhydramine hydrochloride, lidocaine hydrochloride, diazepam and fluoxetine hydrochloride in qualities that meet US Pharmacopeia standards97. Continuous-flow syntheses have also been used early on to obtain drug-like combinatorial compound libraries with heterocyclic scaffolds98,99.

Nagaki and co-workers noted the specific advantage of flow microreactors in enabling 'flash' chemistry reactions that cannot be performed in batch100. The high-resolution reaction time control possible in microreactors allows access to a multitude of otherwise difficult synthetic procedures101. One such prominent example is the sequential synthesis of the subtype-selective retinoic acid receptor-α (RARα) ligand TAC-101 with a total on-chip residence time of 13 seconds and a productivity of 100–200 mg min^−1 (Ref. 102). Another example is the high-temperature, high-pressure continuous-flow synthesis of 1H-4-substituted imidazoles103. The use of microfluidics technology to simulate the cytochrome P450-catalysed oxidation of drug molecules bears the promise of replacing in vitro metabolite identification with on-chip chemotransformations of compounds in the near future (for example, aromatic hydroxylation, C–H oxidation, glutathione conjugation and sulfoxidation)104,105. For further instances of advanced continuous-flow applications in chemical synthesis, see the topical review by Britton and Raston106.

Automated optimization of reaction conditions. Single-step and multistep syntheses can be optimized by feedback control107. Jensen and co-workers108 pioneered self-optimizing microscale and mesoscale reactor systems, for example, for C–C bond forming reactions. A recent example of such reaction optimization by suitable algorithms to achieve the maximum product yield, highest throughput and lowest production cost is the palladium-catalysed Heck–Matsuda arylation reaction109. Our group used microfluidic synthesis with in-line analytics to determine the optimal flow rate, temperature range, catalyst loading and reagent concentrations for continuous imidazopyridine formation on a chip43. Comparable conversion rates were obtained in a microwave procedure, albeit with much longer reaction times (15 min in the microwave reactor versus 0.3 s in flow). In-line mass spectrometry has enabled the optimization of atropine synthesis in microdroplets obtained by preparative electrospray (ES), as recently demonstrated by researchers from Purdue University110. They devised several continuous-flow set-ups with multistep or telescoped preparative ES, yielding up to 47% conversion of the starting material to atropine in residence times of a few minutes. Microfluidics techniques have also simplified the set-up and improved the functions of ambient mass spectrometry by integrating probe sampling and ES on a single glass microchip111.
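
The sketch below illustrates the closed-loop principle behind such self-optimizing systems: an optimizer proposes conditions, the platform reports a yield, and the proposal is refined. The simulated yield function, the operating window and the choice of a Nelder–Mead simplex optimizer are illustrative assumptions and do not reproduce the algorithms used in the cited studies.

```python
# Sketch of feedback-controlled optimization of reaction conditions. The function
# run_reaction_and_measure_yield stands in for a microreactor with in-line analytics;
# here it is a made-up smooth response surface with its optimum at 120 degrees C and
# a residence time of 2.0 minutes.
import numpy as np
from scipy.optimize import minimize

def run_reaction_and_measure_yield(temperature_c, residence_time_min):
    return 90.0 * np.exp(-((temperature_c - 120.0) / 40.0) ** 2
                         - ((residence_time_min - 2.0) / 1.5) ** 2)

def negative_yield(conditions):
    # Clip proposals to an assumed, physically accessible operating window.
    temperature = float(np.clip(conditions[0], 20.0, 200.0))
    residence = float(np.clip(conditions[1], 0.1, 10.0))
    return -run_reaction_and_measure_yield(temperature, residence)

result = minimize(negative_yield, x0=[60.0, 0.5], method="Nelder-Mead")
print("suggested conditions:", result.x, "expected yield (%):", -result.fun)
```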

Nevertheless, there are limitations to continuous-flow systems including the (in)stability of the fluidic interfaces between microscopic and macroscopic fluid handling and the deposition of reactive by-products, and automated batch synthesis and fast parallel synthetic strategies have been suggested as alternatives112. For example, researchers at Merck recently presented their 'chemical high-throughput experimentation' (HTE) platform in 3,456-well microtitre plates, aiming to optimize a key synthetic step in a drug discovery programme. HTE successfully identified the preferred catalyst, reaction conditions, reagents and solvents for the given transformation. The authors conclude that hypothesis-driven HTE allows a scientist to 'go fast' and may be considered the logical extension of traditional chemical experimentation113. Chow and Nelson114 have argued that automated HTE discovery workflows may enable expansion of the synthetic chemistry toolkit and increase innovation in medicinal chemistry.

One advantage of batch approaches, namely the ability to collect data from many time points in a single experiment, is a traditional limitation of one-at-a-time flow experiments; this limitation has been addressed by recording time-series reaction and interaction data in flow for kinetic analysis115. Similarly, microfluidics systems are no longer restricted to single-step reactions. For all these applications, in-line spectroscopy and purification of intermediates are vital to ensure maximal yields. Various fluorescence-based and infrared-based detectors, as well as Raman, NMR and mass-spectrometric analytical devices, have been integrated into continuous mix and flow systems116,117,118. Steady progress in miniaturized manufacturing of analytical devices facilitates system integration. In particular, 3D printing provides opportunities for building versatile multifunctional microfluidics modules with embedded in-line reaction monitoring and analytical capability119.

Droplet reactors. Although there are several off-the-shelf instruments available (for example, for hydrogenation reactions), the majority of current microfluidics platforms require a custom set-up, and one should carefully weigh the pros and cons of microfluidic versus batch technologies before deciding on a particular technology.

Coupling the individual components is an engineering challenge. The majority of platforms currently being introduced in industry for the automated parallel synthesis of small, focused compound libraries seem to operate without making extensive use of microfluidics-assisted chemical synthesis, probably because for certain microfluidic reactors, clogging of the reactor channels and leakage due to back-pressure issues or incompatibility of the solvents and materials remain a major problem. Performing chemical flow reactions in droplet environments offers a potential solution to several of these problems. Droplets may be considered isolated mini-reactors with volumes reduced to the femtolitre scale120,121, facilitating sorting and process control122. DeMello and co-workers123,124 have demonstrated that droplet-based microfluidics systems are precise tools for studying and optimizing the synthetic parameters of chemical reactions, leading to the production of materials with superior characteristics (Fig. 5).

Figure 5: Chemical synthesis in microfluidics droplet reactors.

The image shows a microreactor channel with droplets containing multinary (Cs/FA)Pb(Br/I)3 perovskite nanocrystals123. Each droplet exhibits different, composition-dependent emission under ultraviolet excitation, revealing the compositional gradient along the reactor. The flow rates of the individual precursor streams provided control over reaction times as well as precursor concentration ratios. This example from the field of nanomaterials demonstrates the unique capabilities of droplet-based synthesis for the production of chemical matter. Image courtesy of Andrew J. deMello and Richard Maceiczyk, ETH Zürich.


A challenge for drug discovery is the slow reaction time of many chemical transformations. Furthermore, any realistic application of such high-throughput miniaturized synthetic devices in drug discovery requires rapid in-line analytics of the generated products. Belder and co-workers125 have recently presented a droplet-based microfluidics system with seamless coupling to ES–mass spectrometry. In a proof-of-concept study, they applied the device to an amino-catalysed domino reaction in nanolitre droplets (Knoevenagel condensation followed by an intramolecular hetero-Diels–Alder reaction), with only picomolar amounts of catalyst needed. The greatly increasing numbers of applications and technological advances in the field of continuous microfluidic synthesis showcase the potential of these platforms for the high-throughput generation of diverse chemical entities for subsequent testing. The concept of continuous microfluidic reactors, which were originally designed for the continuous production of single compounds, has been augmented by their suitability for producing many compounds within very short time frames.

Microfluidics technologies for screening

The use of miniaturized microfluidics devices not only supports chemistry but also enables the use of human cell lines, biopsy material and organ models for screening, thereby helping to address the well-known issues with species-specific variations and poorly predictive animal models126,127. For example, liver-on-a-chip technology based on human hepatocytes can be used to swiftly screen compounds for binding to cytochrome P450 as substrates and inhibitors, followed by high-performance liquid chromatography (HPLC)–mass spectrometry for metabolite identification128. Combined with computational predictive models, this technology is ready for prospective practical application129. Cancer-on-a-chip systems that use single cells or 3D cancer models bear the promise of replicating the pathophysiology of human tumours and tumour environments in vitro130,131. Again, as with the many other organ-on-a-chip models, this technology has the potential to produce relevant readouts within short time frames and to enable informed hit and lead prioritization and optimization.

Physiologically relevant microfluidic environments are stable over weeks and have a footprint of a few square millimetres. For example, Loskill et al.132 recently presented a white adipose tissue (WAT)-on-a-chip system, allowing drug–WAT interactions to be studied by convective transport. Cao et al.133 reported a microfluidics system for rapid epigenetic DNA scanning to monitor drug effects on stem cells, using as few as 100 cells. Microfluidics platforms have been developed for the high-throughput (thousands of samples) analysis of DNA methylation patterns in low volumes on a chip, greatly extending chemical base modification studies for epigenetics-related drug effects134. Dittrich and co-workers135 demonstrated the possibility of determining the concentration of intracellular cAMP in response to extracellular stimuli in single cells, thereby greatly extending the capabilities of continuous chip-based assay systems for measuring relevant biochemical parameters for drug discovery. In addition, 3D triple co-culture microfluidics devices have been established as functional surrogates for the blood–brain barrier136.

Advanced nanotechnology offers even farther-reaching opportunities such as micromachines (nanobots) for drug delivery137. In fact, the prospect of combining nanotechnological devices with on-chip testing of computationally designed compounds does not seem far-fetched. Advances in chemical imaging further augment the capabilities of on-chip monitoring, for example, by miniature electrode arrays for high-resolution peak analysis138. 'Plug-and-play' microfluidics modules are the next step towards fully integrated on-chip drug discovery. Miled and co-workers developed such a modular lab-on-a-chip device for automated monitoring and modulating of the concentrations of neurotransmitters such as dopamine and serotonin, thereby opening new possibilities for functional drug screening with feedback control139.

Integration for automated design cycles

Coupling synthesis and testing. The Automated Lead Optimization Equipment (ALOE) platform is a prototypical example of an adaptive molecular design process140. Its software control contains an algorithm for building predictive bioactivity models and prioritizing the selection of starting materials for subsequent rounds of on-chip compound generation. The system can adapt to the underlying structure–activity relationship (SAR) and rapidly find optima in chemical space, with low reagent consumption.

Basic schematics of integrated microfluidics synthesize-and-test platforms are shown in Fig. 6, and a selection of applications is listed in Table 1. These methods operate on small volumes of fluids in geometrically well-controlled environments composed of different functional units, for example, dispensers, mixers, reactors and detectors. Solvent exchange may be required when transferring newly synthesized compounds to biochemical or biological testing, which is typically performed in aqueous media. Some of the integrated flow systems allow for slow solvent mixing and direct in-line testing. Fast evaporation and reformatting have also proved suitable and may represent an alternative working solution, especially in combination with batch synthesis. For example, researchers at Cyclofluidics developed a flow technology platform integrating the key elements of adaptive SAR modelling and applied it to the discovery of novel ABL1 kinase inhibitors141. Similarly, Tseng and co-workers142 devised a complex microfluidics chip for 'click' chemistry and subsequent hit identification. In their proof-of-concept study, throughput was limited by the employment of an eight-channel mass spectrometer for reaction monitoring, but the authors argue that substantially higher throughput could be achieved by expanding the instrumentation.

Figure 6: Schematics of integrated microfluidics-assisted synthesize-and-test platforms.

The classic linear layout shown in part a does not contain automated feedback from the assay to the reagent selection, whereas the cyclic layout shown in part b includes an adaptive computer model for reactant prioritization based on the assay readout. LC, liquid chromatography; MS, mass spectrometry; UV, ultraviolet light.


Table 1 Selected examples of microfluidics-assisted synthesize-and-test platforms for hit identification and optimization

For biological experimentation and integration with chemical synthesis devices, droplet microfluidics systems and biological readouts from single cells seem to be reasonable choices143,144 (Fig. 7). These systems are suitable for creating concentration gradients and generating microdroplets of varying compositions for biochemical and cell-based screening applications. As with chemical microreactors, 3D droplet-based systems have been shown to be more efficient than single-layer microfluidics systems and amenable to ultra-high-throughput analysis145. Droplets are especially suitable for performing enzyme-controlled processes146,147 and may contain cells for probing drug effects in continuous flow148. In this way, single cells may be addressed, thereby eliminating potential issues of readout interpretability caused by cell heterogeneity, for example, for studying cancer cells149. Often, a fluorescence-based readout of phenotypic drug effects is obtained for further analysis150. The rapidly progressing field of microfluidics-assisted lab-on-a-chip platforms has recently been reviewed by Nakajima and co-workers151.

Figure 7: Microfluidic single-cell screening device.

A microfluidics system for the continuous screening of compound effects on single cells is shown. It consists of a double-layer device containing an array of chambers. Each chamber has a central trap for capturing cells or vesicles (individual traps are visible in the enlarged illustration of a section of the device) and a round valve that can be opened and closed for fluid exchange. For analysis, the valve is usually closed. The volume of the chambers depends on the particular chip design and is typically 150–500 picolitres. Reproduced with permission from Lucas Armbrecht and Petra S. Dittrich, Bioanalytics Group, ETH Zürich.


The full automation of compound synthesis also requires reliable planning tools for synthesis and retrosynthesis. In fact, numerous such programmes have been conceived, dating back to Corey's pioneering work from the 1960s152, employing rigorous physical models (for example, reactivity prediction), rule-based approaches (for example, synthons and reaction schemes) or empirical models (for example, precedent-based database searching). Classic approaches have been reviewed elsewhere153,154,155. Their main drawbacks are their limited scope and often inaccurate results caused by insufficient chemical background knowledge captured by the software tools, paired with low execution speed.

Current computational tools are largely data driven. For example, ReactionExplorer is based on thousands of manually curated rules (electron-transfer steps) that represent basic chemical transformations to devise a mechanistic interpretation of a plausible reaction pathway156. More recently, machine learning models have been developed for automated synthesis planning, enabled by large curated reaction databases. ReactionPredictor is such a method and automatically identifies and ranks electron-transfer steps by use of a simplified molecular orbital description157. The number of prospective applications of these and other tools is still limited, and there is not much experience, if any, with integrating such tools in automated synthesis platforms. However, the continuously growing 'Network of Organic Chemistry' (NOC) contains approximately ten million reactions and reactants for synthesis planning158. One may consider such a collection of facts 'big data' in chemistry. Szymkuc et al.159 presented an innovative approach to reaction pathway construction based on NOC, using fast graph-analysis methods borrowed from bioinformatics. These algorithms are able to efficiently navigate through the entire breadth of chemical synthesis knowledge to identify optimal synthetic pathways. Alternative synthetic routes leading from the reactants to the products are compared using a function that includes the number of steps and the cost of synthesis. Finally, algorithmically identified optimal syntheses are obtained.
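
A toy sketch of the underlying idea is given below: reactions form edges in a directed network, each edge carries a cost combining a fixed per-step penalty with a reagent cost, and a shortest-path search returns the cheapest route to the target. The node names, costs and the use of the networkx library are invented for illustration and do not reproduce the cited NOC-based methodology.

```python
# Toy illustration of synthesis route selection as a shortest-path problem over a
# reaction network. Node names, edge costs and the networkx dependency are invented
# for illustration only.
import networkx as nx

step_penalty = 1.0  # assumed fixed cost per synthetic step
reactions = [
    # (starting material, product, relative reagent cost)
    ("aryl_bromide", "boronate_intermediate", 2.0),
    ("boronate_intermediate", "biaryl_core", 1.5),
    ("aryl_bromide", "biaryl_core", 6.0),  # a direct but more expensive transformation
    ("biaryl_core", "target_compound", 1.0),
]

network = nx.DiGraph()
for start, product, reagent_cost in reactions:
    network.add_edge(start, product, weight=step_penalty + reagent_cost)

route = nx.shortest_path(network, "aryl_bromide", "target_compound", weight="weight")
cost = nx.shortest_path_length(network, "aryl_bromide", "target_compound", weight="weight")
print("cheapest route:", route, "total cost:", cost)
```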

These and related data-driven machine learning approaches, with continuously increasing accuracy and chemical reaction space coverage, are no longer science fiction and will enable fully integrated drug discovery platforms to be built. One such straightforward approach implements a combination of forward reaction templates for generating a set of chemically plausible candidate products and a machine learning classifier for virtual product scoring160. This system is based on more than one million reactions compiled from United States patent literature. Importantly, the model does not predict quantitative yields but merely spots plausible true reaction products in the pool of potential solutions. Although this overall concept may not be entirely new, the availability of suitable reaction databases and advanced machine learning models has enabled the development of robust classifiers.

Artificial intelligence in molecular design. Aside from the required robotic hardware and synthesize-and-test machinery, the learning aspect probably represents the most crucial part of the automated design cycle. If the design hypothesis is wrong, then even the most advanced synthesize-and-test approach will fail to deliver, irrespective of the technology used. Importantly, even partially predictive SAR models can, through iterative adjustment of the underlying molecular design hypothesis, gradually approximate the underlying structure–activity function. This process is referred to as 'adaptive design' or 'active learning' (Refs 161,162). The key requirement for active learning is rapid feedback, and for hit and lead discovery, rapid feedback can be achieved by fast synthesize-and-test cycles.

Considering this situation from an information-theoretical viewpoint, the full-deck screening of hundreds of thousands of compounds by contemporary technology (for example, as shown in Fig. 2) may be not only cost intensive but also inefficient. Such an approach does not include feedback but relies on a single library design step before brute-force compound testing. The necessary continuous adjustment of the molecular design hypothesis is performed only in the later stages of hit optimization and lead expansion. This design concept is prone to fail when relying on noisy data, personal bias and poor intuitive choices ('gut feeling').

The active learning concept is central to automated drug discovery. This concept is based on iteratively adapting a design hypothesis — for example, a quantitative SAR model — by adjusting its free variables on the basis of newly acquired compound activity data. The modified design hypothesis is then used to select new compound sets for synthesis and testing. Dating back to the early 1990s, there have been several attempts to use adaptive de novo drug design guided by artificial neural networks and other machine learning techniques (see Refs 163,164,165 for reviews), although these attempts have been isolated. In a recent article, Hunter166 advanced the view that adopting and exploiting the full potential of artificial intelligence methods for pharmaceutical research might be essential to creating a sustainable drug discovery process.

A specific advantage of machine-driven hypothesis generation is that new compounds may be designed according to numerous criteria in parallel, for example, activity, synthesizability, predicted off-target effects and so on. Importantly, these models are able to capture essential non-additive (nonlinear) feature contributions to the design objectives, which cannot be appropriately considered by linear substituent contribution models (for example, Free−Wilson analysis and matched-molecular-pair analysis)167,168. Non-additive models of protein−ligand binding are a basic prerequisite for rational drug design169.

While explorative selection by active learning aims to add new information to the model with each iteration through the design cycle, exploitive selection maximizes compound quality with regard to certain design criteria, such as activity and selectivity. Balanced selection strategies compromising between these two extremes seem to be particularly suitable for both finding potent compounds (exploitive selection) with novel scaffolds (explorative selection) and optimal SAR model building170,171. This principle of model adaptation by active learning offers the additional advantage of limiting both the number of iterations that are required to find compounds with the desired properties and the number of compounds to be synthesized and tested in each iteration of the design cycle172. Visualization of the fitness landscape ('activity landscape') modelled during each iteration can additionally help to navigate the chemical space173 (Fig. 8). Compound 14 is a new subtype-selective antagonist of the dopamine D4 receptor found by active learning with an ant colony algorithm (MAntA, Molecular Ant Algorithm)174 for compound selection44. Similarly, new CXC-chemokine receptor 4 (CXCR4) antagonists have been identified by active learning with a random forest model175.
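
A compact sketch of such a balanced selection strategy is shown below: a random forest serves as the SAR model, the spread of predictions across its trees provides an uncertainty estimate, and an acquisition score mixes exploitation (predicted activity) with exploration (uncertainty). The simulated assay, descriptor vectors and weighting factor are placeholder assumptions and are not the published MAntA or CXCR4 workflows.

```python
# Sketch of a balanced (explorative/exploitive) active learning cycle for compound
# selection. The descriptor vectors, the simulated assay and the 0.5 weighting factor
# are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
pool = rng.random((500, 16))                   # candidate compounds as descriptor vectors
hidden_sar = pool @ rng.random(16) + 0.5 * np.sin(4 * pool[:, 0])  # simulated assay response

tested = list(rng.choice(len(pool), size=10, replace=False))       # initial screening set
for cycle in range(5):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(pool[tested], hidden_sar[tested])                    # update the SAR model
    per_tree = np.stack([tree.predict(pool) for tree in model.estimators_])
    mean_pred, uncertainty = per_tree.mean(axis=0), per_tree.std(axis=0)
    acquisition = mean_pred + 0.5 * uncertainty   # exploitation plus exploration
    acquisition[tested] = -np.inf                 # do not re-select tested compounds
    batch = np.argsort(acquisition)[-5:]          # next batch to 'synthesize and test'
    tested.extend(batch.tolist())
    print(f"cycle {cycle + 1}: best activity found so far {hidden_sar[tested].max():.2f}")
```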

Figure 8: Active learning in drug design.

Knowledge of the underlying structure–activity relationship (SAR) captured by a machine learning model is very limited in the beginning of a discovery project but grows over time with each active learning step. The 'fitness landscapes' visualize the areas of chemical space that are associated with low (transparent) and high (strong colour intensity) predictive confidence (part a). In the example, d1 and d2 denote meaningful coordinates of chemical space, which can be obtained, for example, by projection or dimensionality-reduction techniques245. The distributions shown in part b illustrate four stages of a SAR model during active learning. The average predictive confidence increases (and the margin of error decreases) with each iteration (models 1–4). The initial model 1 was trained on literature data (in this case, CXC-chemokine receptor 4 (CXCR4) ligands)175. Models 2 and 3 were obtained after testing 30 additional compounds per learning step. Model 4 was trained with all tested compounds taken together. The small discrepancy of predictive confidence between models 3 and 4 demonstrates the efficiency of the active learning process. D4R, dopamine D4 receptor; KD, dissociation constant; Ki, inhibition constant; P, pseudo- probability density function.


'Deep learning' from 'big data'. The possibilities of computational molecular structure generation and property–activity prediction seem virtually unlimited. A particular appeal of automated structure generators lies in their trainability on complex chemical data, extreme speed and consideration of several design objectives in parallel. The young research field of constructive machine learning offers innovative methods for learning multidimensional SARs and iteratively navigating in very large chemical spaces to suggest chemical entities for testing that optimally fit the design hypothesis.

Based on the body of assay data stored in public and proprietary databases, it is now possible to train learning machines on arbitrary target−target, ligand−target and ligand−effect associations. Algorithms are able to recognize hidden patterns in molecules that escape medicinal chemical rationales and intuition because of the large set of variables and drug design objectives that should be considered in parallel. Suitable molecular structures that fit these patterns can then be computationally generated and forwarded to chemical synthesis and analytics and subsequent biophysical, biochemical and biological testing. A new design hypothesis is formed after updating the machine learning model with the newly obtained assay data (feedback loop), and swift compound optimization can take place. With such a set-up, one can expect to make informed choices of starting points for lead optimization.

Drug design can be regarded as a pattern recognition process. Medicinal chemists are skilled in visual chemical structure recognition and their association with retrosynthetic routes and pharmacological properties. In this context, various 'deep-learning' concepts are currently being evaluated as potentially enabling technology for drug discovery and automation because these systems aim to mimic the chemist's pattern recognition process and to take it to the next level by considering all available domain-specific data and associations during model development. While acknowledging their usefulness, we should not fool ourselves with the term 'deep learning' or consider these methods 'magic wands'. These systems are reincarnations of artificial neural network prototypes for automated molecular design from the 1990s176,177,178,179 that, in augmented and expanded form, can now be trained and optimized on complex pattern recognition tasks, largely owing to substantial improvements in available hardware and software180,181. One of the prominent machine learning toolkits harnessing the computational power of specifically developed tensor processing units (TPUs; application-specific integrated circuits developed by Google)182 is the TensorFlow open-source software library for numerical computation183,184. This software library provides access to contemporary machine learning methods and has found widespread use for cheminformatics and bioinformatics modelling and medicinal informatics185,186,187,188. For a review on toolkits and software libraries for deep learning, see Ref. 189.
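
As a minimal illustration of the kind of model such libraries make routine, the sketch below defines a small multilayer feed-forward network in TensorFlow/Keras that maps precomputed binary fingerprint vectors to a predicted activity class. The input size, layer widths and random placeholder data are arbitrary choices for illustration rather than a recommended architecture.

```python
# Minimal TensorFlow/Keras sketch of a multilayer feed-forward activity model
# operating on precomputed binary fingerprints (2,048 bits assumed). All layer
# widths and the placeholder data are arbitrary, illustrative choices.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2048,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of the 'active' class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random placeholder arrays standing in for fingerprints and assay labels.
x = np.random.randint(0, 2, size=(1000, 2048)).astype("float32")
y = np.random.randint(0, 2, size=(1000,))
model.fit(x, y, epochs=2, batch_size=64, validation_split=0.2, verbose=0)
```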

To date, most machine learning applications in the field have been 'shallow' — that is, using a single layer of feature transformation to achieve their goals. This class of algorithms includes various clustering and regression methods (for example, nearest neighbour approaches, support vector machines, standard neural networks and decision trees). The successes of these methods in activity prediction and lead suggestion are, in part, due to the development of useful, often domain-specific, molecular representations, which enable comparably simple machine learning architectures to make reasonable predictions. In the process of engineering and applying these descriptor systems, we encode a measure of our chemical knowledge and understanding in the way these molecules are represented. Now, 'deep' methods based on learning directly from molecular graphs and other physically oriented models of complex molecular objects have been proposed that remove some of this input-level abstraction190,191,192. This more general approach, however, requires a more sophisticated machine learning methodology for pattern recognition, as the input data are much less amenable to producing useful output with 'shallow' transformation methods.

Essentially, deep-learning models are hypothesis generators. Their secret lies in a cascaded feature extraction and transformation process from the training data representation and in nonlinear function estimation based on these features (Fig. 9). While passing information from the input to the output layer, increasingly intricate features are formed in the subsequent layers of such models. Each network layer may contain heterogeneous processing units that select and refine features in different ways. Such a learning process often results in models that elude our immediate interpretation in chemical terms193,194. Nonetheless, such models can be extremely useful195,196.

Figure 9: Schematic of a deep-learning network.

Deep neural networks transform the input data (for example, molecular structures or microscopic images) by cascaded feature extraction and compute a nonlinear function of the input, f(x). They essentially represent universal function estimators. Each network layer can vary in size and architecture, can have alternating functionality and can contain different types of processing units. When trained on compound activity data, the overall network function adapts to the underlying structure–activity relationship and, after successful training, can be used for automated compound design. Essentially, such learning systems are able to incorporate new data (for example, new compound–target activities or chemogenomics data) and continuously adjust their internal model of the input–output relationship. The depicted network architecture highlights only one of several related deep-learning concepts.


From a chemogenomics viewpoint197,198, deep-learning methods for model building may indeed represent a breakthrough199,200,201. Currently, there are approximately 70 million SAR data points stored in public databases, not accounting for the very large volumes of proprietary data from deep sequencing and other massively parallel and ultra-high-throughput assays. Deep-learning networks provide an appropriate technology for analysing such large amounts of data to find meaningful relationships between ligands, proteins, genotypes and phenotypes202,203,204,205. Several heterogeneous deep-learning systems with high prediction accuracies have been developed for predicting drug–target associations, identifying drug repurposing opportunities and supporting target identification, among other tasks202,206,207,208. Deep network models have also been shown to improve conventional virtual screening methods, such as automated ligand docking209, and to accelerate otherwise computationally costly chemical computing tasks210. Various applications of deep learning in biomedicine have been comprehensively reviewed211.
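As one hedged illustration of such a chemogenomics-style model, the sketch below combines a compound descriptor and a protein descriptor in a two-input network that predicts whether the pair interacts; the descriptor choices, dimensions and placeholder data are assumptions and do not reproduce any of the systems cited above.

```python
# Illustrative chemogenomics-style model relating compound and protein
# descriptors to an interaction label. Descriptor choices, dimensions and the
# random placeholder data are assumptions and do not reproduce cited systems.
import numpy as np
import tensorflow as tf

compound_in = tf.keras.Input(shape=(2048,), name="compound_fingerprint")
protein_in = tf.keras.Input(shape=(400,), name="protein_descriptor")

c = tf.keras.layers.Dense(256, activation="relu")(compound_in)
p = tf.keras.layers.Dense(128, activation="relu")(protein_in)
joint = tf.keras.layers.Concatenate()([c, p])
joint = tf.keras.layers.Dense(128, activation="relu")(joint)
binds = tf.keras.layers.Dense(1, activation="sigmoid", name="binds")(joint)

model = tf.keras.Model([compound_in, protein_in], binds)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random stand-ins for curated compound-target pairs and interaction labels.
Xc = np.random.rand(500, 2048).astype("float32")
Xp = np.random.rand(500, 400).astype("float32")
y = np.random.randint(0, 2, size=(500, 1)).astype("float32")
model.fit([Xc, Xp], y, epochs=3, batch_size=32, verbose=0)
```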

Curated, consistent data are a prerequisite for improved model building. A consortium of industrial and academic partners has recently published a new comprehensive database of standardized chemical and biological data for chemogenomics data analysis (ExCAPE-DB212, Exascale Compound Activity Prediction Engine)213. Although the numbers of compound structures and activity values stored in these databases may appear impressive from a chemistry-oriented viewpoint, they are vanishingly small in comparison with the data volumes in other fields, such as computer vision214. With the exception of virtual chemical space, one may indeed wonder whether big experimental data exist in chemistry at all215. In this context, Tetko et al.216 suggested defining big data as data that are “out of the scale of traditional applications, which require efforts beyond the traditional analysis”. Data sharing and open software between research organizations will further expedite successful model building for automated drug discovery217. Importantly, big data as such are neither a prerequisite for nor a guarantee of obtaining good predictive models. Similarly, it is advisable not simply to apply deep models to any given classification or regression task in drug discovery, but to carefully evaluate the required model complexity and the applicability domain beforehand210,218,219.
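A minimal sketch of what such curation can look like in practice is given below, assuming a small table of assay results with hypothetical column names and invented values: missing readouts are discarded, IC50 values are converted to pIC50 and replicate measurements are merged into one record per compound-target pair.

```python
# Minimal data-curation sketch on a table of assay results with hypothetical
# column names and invented values. Missing readouts are dropped, IC50 (nM)
# is converted to pIC50 and replicate measurements are merged.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "compound_id": ["C1", "C1", "C2", "C3"],
    "target_id":   ["T1", "T1", "T1", "T2"],
    "ic50_nM":     [120.0, 95.0, 5600.0, np.nan],
})

curated = (
    raw.dropna(subset=["ic50_nM"])                                  # discard missing readouts
       .assign(pic50=lambda df: 9.0 - np.log10(df["ic50_nM"]))      # pIC50 from IC50 in nM
       .groupby(["compound_id", "target_id"], as_index=False)
       .agg(pic50=("pic50", "median"), n_measurements=("pic50", "size"))  # merge replicates
)
print(curated)
```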

Conceptual and practical challenges

Judging from successful proof-of-concept studies and pilot applications, the integration of automated discovery processes can be expected to bring major benefits to drug design. These include low error rates (for example, a reduced risk of false positives), high speed of execution (for example, faster hit and lead identification), low consumption of materials (advancing green chemistry), straightforward synthetic schemes for ease of compound production, potentially patentable compound structures (in combination with scaffold hopping), ease of instrument handling (low maintenance) and, ultimately, improved decision making for hit and lead candidate selection.

Nevertheless, molecular design is governed by nonlinear relationships between chemical structures and their biological activities, random events (serendipity), measurement and judgement errors and the incompleteness of available drug discovery data. In addition, erroneous assay readouts hamper accurate model building, and poor data curation can easily become a limiting factor for machine learning. Reducing errors in data annotation and relying on suitable assays will therefore be mandatory for future success. Progress in automatically detecting and recovering false negatives (that is, active compounds misidentified as inactive by the test) points to new means of hit selection beyond relying on primary activity alone220. Automated retesting of suspicious compounds could be performed by autonomous robots. Researchers at Pfizer recently reported that 13–51% of true false negatives from HTS could be rescued on the basis of computational prediction221.
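A generic, hedged sketch of this idea is shown below: a classifier trained on confirmed actives and inactives is used to rank primary-screen 'inactives', and the highest-scoring compounds are nominated as candidate false negatives for retesting. The descriptors and labels are random placeholders, and the sketch does not describe the workflow reported in reference 221.

```python
# Generic sketch of computational false-negative rescue: a model trained on
# confirmed actives/inactives ranks primary-screen 'inactives' for retesting.
# Descriptors and labels are random placeholders; this is not the workflow
# reported in ref. 221.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_confirmed = rng.random((300, 128))          # descriptors of confirmed compounds
y_confirmed = rng.integers(0, 2, 300)         # confirmed active (1) / inactive (0)
X_primary_inactive = rng.random((1000, 128))  # compounds called inactive in the primary screen

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_confirmed, y_confirmed)
p_active = model.predict_proba(X_primary_inactive)[:, 1]

# Nominate the highest-scoring 'inactives' as candidate false negatives.
retest_candidates = np.argsort(p_active)[::-1][:50]
print(retest_candidates[:10], p_active[retest_candidates[:10]])
```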

Although the required flexibility and adaptability of the design hypothesis have long been built into software solutions for de novo molecular design and model building, real-life applications have only recently been demonstrated. Minimizing the time gap between synthesis and testing may be the vital factor for increasing the productivity of drug discovery projects. A high program speed increases the number of design loops that can be made and limits the risk of generating new compounds agnostically, without fully integrating the test results into the design hypothesis. There is no learning without reflection and feedback.
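The sketch below outlines such a feedback loop in schematic form: a surrogate model is retrained after each batch of results and the next candidates are selected from a virtual pool. The candidate pool, the 'synthesize_and_test' placeholder and the greedy selection rule are all stand-in assumptions rather than a description of any existing platform.

```python
# Schematic design-make-test-analyse loop. 'synthesize_and_test' is a
# placeholder for robotic synthesis and assaying, the pool of candidate
# descriptors is random, and greedy selection stands in for a design strategy.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def synthesize_and_test(x):
    """Placeholder for automated synthesis and biological testing."""
    return float(-np.sum((x - 0.5) ** 2))  # mock potency surface

pool = np.random.rand(500, 16)              # virtual candidate library (placeholder)
tested = set(np.random.choice(len(pool), 8, replace=False).tolist())
results = {i: synthesize_and_test(pool[i]) for i in tested}

model = GaussianProcessRegressor()
for cycle in range(5):                      # each cycle closes the feedback loop
    idx = sorted(results)
    model.fit(pool[idx], [results[i] for i in idx])
    scores = model.predict(pool)
    # Pick the best-scoring compounds that have not yet been made and tested.
    new = [i for i in np.argsort(scores)[::-1] if i not in tested][:8]
    for i in new:
        tested.add(i)
        results[i] = synthesize_and_test(pool[i])

print("best mock potency after 5 cycles:", max(results.values()))
```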

Lab-on-a-chip and other miniaturized and/or mobile platforms with a small footprint seem well suited to addressing this bottleneck in hit expansion. As appealing as this technology may be, however, the seamless integration of heterogeneous instrumentation faces technical challenges. New continuous-flow platforms may provide a complement or even an alternative to these mixed-method systems. As with conventional robot-assisted systems, however, the lack of direct in-line methods for compound profiling in dose–response format has so far prevented continuous-flow devices from enabling fully automated hit discovery and optimization.

Another limiting factor is the currently restricted versatility of automated synthesis platforms. Each chemical reaction requires optimization and often hardware modifications (for example, to seals, reactors and piping); the reagents must be prepared for handling, detection and purification protocols must be adjusted, and so on. Switching on the fly from one chemical transformation scheme to another and performing multiple steps sequentially and automatically may be straightforward in silico, but it remains challenging in real life. Although one-step syntheses of individual compounds or focused libraries can be robustly performed in parallel batches or in flow, we still need to identify the sweet spots of such platforms for seamless integration into drug discovery. The elegant automated synthetic strategy devised by Burke and co-workers74, which enabled the generation of structurally diverse compounds from a limited set of simple building blocks (Fig. 4), points to a direction of future research to address this issue.

With all the current excitement about sophisticated artificial intelligence systems and the maturation of rapid automation, it is crucial to identify approaches and technologies that could be implemented robustly by medicinal chemists in the near future and to discuss the challenges of doing so in the context of industrial workflows. Computational molecular design has always raised hopes that some computer wizardry might come to the rescue of stalled discovery projects. The prospect of process automation in the age of 'big data' further stimulates a drug designer's fantasies. What will the laboratory of the future look like? Are we facing the automation of drug discovery with autonomous molecular design robots replacing medicinal chemists?

There is no doubt that the automation of science has already begun. The use of robotic devices is not limited to improving the reproducibility of experiments; a particular feature of 'robot scientists' is that their scientific reasoning rests on an explicit foundation, in contrast to the more polymorphic, generalized human mind222. The key technology drivers are hardware and software improvements and data availability. However, there may be limitations to the applicability of machine learning in chemistry, as recently noted by Gambin and co-workers223. According to their study, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted, which in turn will limit the scope of autonomous drug discovery platforms. Furthermore, the hundreds of thousands (or more) of data points required for deep learning will be unavailable in many drug discovery projects. Alternative methods for equally robust feature extraction and hypothesis generation from 'small data' sets need to be identified. Pande and co-workers recently suggested 'one-shot' learning for such instances224.

More conventional modelling techniques are not expected to become outdated. The combination of 'big data' and 'deep learning' does not solve problems per se; what does is the ability of the researchers involved to devise appropriate representations of chemistry and biology for computational analysis. Their scientific skills will be needed even more in future drug discovery settings. This notion becomes especially relevant when contemplating the fragility of autonomous discovery platforms. Although there have been reports of robots that can adapt to damage and show outwardly 'intelligent' behaviour225,226, at least for the foreseeable future, designing, running and maintaining these discovery platforms will remain the task of skilled scientists, technicians and engineers.

Irrespective of the success or failure of individual technologies, this fresh view on drug discovery goes far beyond traditional approaches and will deliver innovative methodologies and potentially ground-breaking solutions that may have a substantial impact on future discovery concepts. One could envisage the future development of benchtop instruments equipped with building block cartridges for chemical synthesis and cassette-like bespoke assay panels for in-line screening, opening up great opportunities for small and medium-sized technology companies; for example, such a mobile instrument could be made available for project teams in many laboratories. Certainly, this concept does not make medicinal chemistry obsolete, as one might mistakenly deduce from some published comments on this topic227,228; in reality, the opposite expectation is probably closer to the truth. However, medicinal chemistry training needs to adapt to this new situation and to prepare chemists accordingly229,230,231.

The well-controlled conditions possible using microfluidic synthesis technology enable otherwise strongly exothermic, dangerous or difficult reactions to be performed safely, potentially making novel molecular scaffolds more accessible. However, chemists will still have to design these experiments to be performed by a machine, and the tool compounds obtained will not represent perfect lead compounds for immediate expansion and development. Furthermore, because the design machine will be able to produce chemical starting points very quickly, future hit-to-lead optimization and scaffold morphing will require strong chemical expertise and will probably generate demand for increased conventional synthesis capacity.

Bioinspired molecular machines open up even farther-reaching possibilities, for example, the performance of diverse operations in response to chemical triggers. A recent example is provided by a DNA nanomachine that uses DNA origami command tracks to control a microfluidics device232. One may also envisage automated drug discovery platforms that include modules for dynamic combinatorial chemistry with biocompatible reactions, that is, the in situ generation of drugs binding to a protein target233,234. In light of the rather limited compound library sizes used in such projects to date, automated adaptive feedback control offers opportunities for the optimal exploration of chemical space for dynamic combinatorial chemistry.

There is no doubt that drug discovery demands the right mix of human mind, automation and machine intelligence. In the future, the 'intranet/internet of things' may enable fully autonomous cross-platform drug discovery. In combination with the appropriate test systems and metrics of success, such integrated environments bear the promise not only of stable system performance but also of increasing the competitiveness and efficiency of drug discovery processes by sharing resources and data intramurally and extramurally235,236.

Conclusions and future perspectives

The drug discovery process has characteristics of chaotic systems, including nonlinear behaviour, error, incompleteness, random serendipitous events and partial predictability237. Not surprisingly, good compounds may be overlooked for various reasons. Clearly, drug discovery is a challenging endeavour that requires skilful navigation in a multidimensional, multimodal search space. For example, 'activity cliffs' may affect lead optimization238, and unexpected biochemical and pharmacological effects can derail lead compound expansion and development.

The three challenges for automated drug design are the assembly of synthetically accessible structures, scoring and property prediction, and the systematic optimization of promising molecules in adaptive learning cycles. Over the past three decades, numerous guidelines, methods, algorithms and heuristics have been proposed to address each of these problems. Although the generation of new chemical entities with attractive chemical scaffolds has become feasible, and although the algorithmic optimization problem can also be considered largely solved, compound scoring, that is, picking the best compounds from a large pool of accessible possibilities, remains a persistent difficulty. While compound elimination by appropriate scoring models discards the bulk of the designs ('negative design') with acceptable accuracy, the selection of the best or most promising candidates ('positive design') remains prone to error. More accurate activity prediction models that extend the capabilities of existing approaches could originate from advanced machine learning methods.
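To make this distinction concrete, the hedged sketch below first applies crude property filters to discard the bulk of a design set (negative design) and then ranks the survivors with a stand-in scoring function (positive design); the thresholds and the scoring placeholder are illustrative assumptions, not a recommended protocol.

```python
# Sketch of two-stage compound scoring: crude property filters discard the
# bulk of the designs ('negative design') before a stand-in scoring function
# ranks the survivors ('positive design'). Thresholds and the scoring
# placeholder are illustrative assumptions, not a recommended protocol.
from rdkit import Chem
from rdkit.Chem import Descriptors

def passes_negative_design(mol):
    """Simple property filters; real platforms would apply many more criteria."""
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Descriptors.NumHDonors(mol) <= 5
            and Descriptors.NumHAcceptors(mol) <= 10)

def predicted_activity(mol):
    """Placeholder for a trained activity model used for positive design."""
    return Descriptors.TPSA(mol)  # stand-in score only

designs = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "CCCCCCCCCCCCCCCCCCCC", "c1ccc2ccccc2c1"]
mols = [Chem.MolFromSmiles(s) for s in designs]
survivors = [m for m in mols if passes_negative_design(m)]
for m in sorted(survivors, key=predicted_activity, reverse=True):
    print(Chem.MolToSmiles(m), round(predicted_activity(m), 1))
```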

Prognoses of the sustainability of customary pharmaceutical discovery and development practices imply the need for adjusted strategies for the future239,240,241,242. In such a situation, one can and must be creative. Given the prospects of labs-on-a-chip, human organoid assay systems, automated synthesis and intelligent learning software, we are currently witnessing a new wave of excitement about the changes in pharmaceutical research and development243,244. The concept of automated drug discovery could help to considerably reduce the number of compounds to be tested in a medicinal chemistry project and, at the same time, establish a rational unbiased foundation of adaptive molecular design. Recent advances in both lab-on-a-chip and computer technology, as well as the development of self-teaching artificial intelligence systems, could allow bottlenecks in the molecular design cycle to be addressed, thereby enabling better decision making in the future. Automation will play a central role in this process.

The envisaged drug discovery engine imitates human decision making by transferring responsibility to an objective machine learning system as a core aspect of the discovery process. If successful in the long run, the approach will amalgamate a continuously learning machine intelligence with the synthesis of pharmacologically relevant chemical matter. Thus, the medicinal chemist will gain the freedom to draw inspiration from potentially surprising solutions delivered by computational models, have fast access to initial tool compounds for a given discovery project and save precious material.

Rapid feedback cycles require the customization of instrumentation and the adjustment of work processes. Establishing this concept in pharmaceutical discovery may require considerable investment in terms of money and the reorganization of laboratory structures and processes. It will be necessary to evaluate the feasibility of fully autonomous molecular design with the aid of computers and robotic devices and, at the same time, to analyse which aspects of compound generation are best left to a chemically savvy artificial intelligence or a skilled human mind. The answers to these questions may vary depending on the particular discovery context, and keeping an open mind to many different viewpoints is advisable. Medicinal chemistry has always borrowed methodological thinking from engineering and experimental design so that tailored solutions could be implemented to meet challenges in chemistry, and continuing to do so would be wise. While keeping a healthy scepticism of automation for its own sake, embracing new technologies for planning and performing compound design, synthesis and testing, without fearing a loss of control, could enable substantial improvements in the effectiveness of drug discovery.

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.