Innovation

Directing evolution: the next revolution in drug discovery?

Nature Reviews Drug Discovery volume 16, pages 681–698 (2017)


The strong biological rationale to pursue challenging drug targets such as protein–protein interactions has stimulated the development of novel screening strategies, such as DNA-encoded libraries, to allow broader areas of chemical space to be searched. There has also been renewed interest in screening natural products, which are the result of evolutionary selection for a function, such as interference with a key signalling pathway of a competing organism. However, recent advances in several areas, such as understanding of the biosynthetic pathways for natural products, synthetic biology and the development of biosensors to detect target molecules, are now providing new opportunities to directly harness evolutionary pressure to identify and optimize compounds with desired bioactivities. Here, we describe innovations in the key components of such strategies and highlight pioneering examples that indicate the potential of the directed-evolution concept. We also discuss the scientific gaps and challenges that remain to be addressed to realize this potential more broadly in drug discovery.


Drug discovery remains slow, expensive and unreliable, despite many technological advances in recent decades. The journey from a small-molecule screening hit to a candidate drug typically takes 4–5 years, costs US$14–25 million and has an attrition rate of around 50% or higher, and the attrition rate from a candidate drug to the launch of a commercial product has been reported to be as high as 97%1. The overall cost of drug development is high, with the cost per drug launch (including the cost of failures and capital) recently estimated to exceed $2.6 billion2.

The choice of biological target remains a key source of attrition. Hence, large investments are being made in translational science to more effectively validate the role of a biological target in human disease and identify the most appropriate patient subset in which to evaluate potential drugs3. In the hunt for effective therapeutics, new drug modalities have been successfully exploited, most notably antibody-based therapies4. However, some targets with a compelling biological rationale, including various transcription factors and intracellular protein–protein interactions, are still very challenging for either current small-molecule drug discovery technologies5 or large-molecule approaches6.

Standard small-molecule drug discovery approaches can conceptually be broken down into two components. The first component is an initial screen, often a high-throughput in vitro assay that can screen up to ~10^6 compounds, to identify compounds that show some level of the desired activity (hits). Setting up and analysing such screens typically takes 1 year1. The second component is the optimization of hits into leads through design–make–test–analyse cycles (DMTA cycles), ultimately leading to the selection of a candidate drug. In addition to the desired biological activity, such optimization has to take into account other properties that are crucial for candidate drugs, including pharmacokinetics and safety. Chemistry teams in large pharmaceutical companies can typically carry out DMTA cycles on 1–2 lead series in parallel, with each iteration taking 4–6 weeks.

Some advances in the initial screening component have focused on searching larger areas of chemical space. For example, DNA-encoded libraries7,8,9 can typically be used to screen up to 10^7–10^10 molecules10, and virtual screening can profile up to 10^13 compounds11,12, but even the largest of screening libraries cannot be used to fully explore 'drug-like' chemical space, which has been estimated to be as large as 10^63 molecules13. Other advances, such as fragment-based drug design, conceptually enable wider areas of chemical space to be explored more efficiently with smaller chemical libraries (see Ref. 14 for a review). Simultaneously, there have been various advances, such as developments in predictive chemistry, that could speed up the DMTA process (for a review, see Ref. 15) and that could potentially lead to fully automated DMTA cycles. Nevertheless, the improvements in the DMTA paradigm remain incremental, as the fundamental process remains unchanged.
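To put these library sizes in perspective, the following minimal sketch computes what fraction of the estimated drug-like chemical space each screening approach could sample at the upper end of the ranges quoted above. The dictionary keys and the function name are illustrative choices, and the calculation assumes (optimistically) that every library member is a distinct drug-like molecule.

```python
from math import log10

# Upper-end library sizes quoted in the text (number of compounds).
LIBRARY_SIZES = {
    "high-throughput screen": 10**6,
    "DNA-encoded library": 10**10,
    "virtual screen": 10**13,
}
DRUG_LIKE_SPACE = 10**63  # upper estimate of drug-like space quoted in the text


def coverage_exponent(library_size: int, space_size: int = DRUG_LIKE_SPACE) -> float:
    """Return log10 of the fraction of chemical space that a library samples."""
    return log10(library_size) - log10(space_size)


for name, size in LIBRARY_SIZES.items():
    print(f"{name}: samples roughly 10^{coverage_exponent(size):.0f} of the space")
```

Even the largest virtual screen in this comparison samples about one part in 10^50 of the estimated space, which is the arithmetic behind the statement that no screening library can explore it fully.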

In nature, however, the development of bioactive compounds has been driven by evolution. Microorganisms, insects, plants and animals have adapted through the process of natural selection to synthesize peptides and other small molecules as defences or attractants that provide them with an evolutionary advantage over their competitors16,17. Adaptation can happen remarkably quickly, particularly in rapidly dividing systems under a strong selection pressure, such as bacteria and viruses, in which random mutations among billions of cells can produce one that has an advantage, which can then become dominant18,19,20,21,22.

Evolution, defined as the change in heritable traits of biological populations over generations, requires: traits that vary among individuals in the population (phenotypic variation); that the different traits confer different rates of survival and reproduction (differential fitness); and that successful traits can be passed from generation to generation (heritability of fitness). The classical DMTA cycle in medicinal chemistry has many similarities to the evolutionary processes in biology mediated through traits encoded in the genomes of organisms. The medicinal chemistry design hypothesis could be viewed as analogous to genetic information23. The 'make' stage (chemical synthesis or purchase) corresponds to the translation of genetic information into proteins. The 'test' and 'analyse' stages are similar to the identification of organisms that have particular characteristics (differential fitness), and deduced structure–activity relationships lead to new designs. The good features are kept, and the bad features are discarded from the design hypothesis for the next round of synthesis, a process that is comparable to the mutation and recombination of the genetic information that occurs during reproduction and hereditability of fitness.

Evolution in nature can be mimicked in the laboratory using directed evolutionary processes to test billions (or more) variations of genetic sequences that encode particular characteristics in parallel. Under a selection pressure, each sequence has a chance to survive to propagate for the next generation, with the mutation of successful sequences introducing further variation for the next round of selection. For example, such processes are well established for the selection of therapeutic antibodies from phage display, ribosome display, mRNA display and microbial cell display libraries (see Refs 24,25 for reviews).
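The cycle of variation, selection and heredity described above can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions: the 'target' sequence, the match-counting fitness function, the population size and the mutation rate are all hypothetical and stand in for a real binding or survival assay.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "CWRLPYC"  # hypothetical 'ideal binder', for illustration only


def fitness(peptide: str) -> int:
    """Toy fitness: number of positions that match the hypothetical target."""
    return sum(a == b for a, b in zip(peptide, TARGET))


def mutate(peptide: str, rate: float, rng: random.Random) -> str:
    """Variation: replace each residue with a random one at the given rate."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else residue
        for residue in peptide
    )


def evolve(pop_size=200, survivors=20, rate=0.1, rounds=30, seed=0):
    """Run rounds of selection and mutation; return the best fitness per round."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in TARGET) for _ in range(pop_size)
    ]
    best_per_round = []
    for _ in range(rounds):
        # Selection: keep only the fittest members of the population.
        parents = sorted(population, key=fitness, reverse=True)[:survivors]
        best_per_round.append(fitness(parents[0]))
        # Heredity plus variation: parents persist and seed mutated offspring.
        offspring = [
            mutate(rng.choice(parents), rate, rng)
            for _ in range(pop_size - survivors)
        ]
        population = parents + offspring
    return best_per_round


history = evolve()
print("best fitness per round:", history)
```

Because the selected parents are carried over unchanged, the best fitness in this sketch can never decrease from one round to the next; real selections are noisier, and successful sequences can be lost.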

If chemists could harness the scale and speed of directed evolutionary processes to search for and optimize bioactive small molecules, a transformation in lead generation success rates could be achieved — perhaps even for highly challenging, but strongly biologically validated targets. Excitingly, advances in several areas — including natural product biosynthesis, synthetic biology and biosensors — are now providing the basis for developing integrated systems that could have the potential to fulfil this goal. A representation of such a directed evolution paradigm is shown in Fig. 1. Although applications so far have been limited primarily to proteins and cyclic peptides for extracellular targets, such systems could in principle also be applied to cyclic peptides and small molecules that inhibit intracellular protein–protein interactions to optimize their bioactivity, their selectivity and, in some cases, their pharmacokinetic properties. Below, the steps of directed evolution are presented first, followed by pioneering examples of its application to the discovery and optimization of cyclic peptides and small molecules. The opportunities, gaps and challenges for directed evolution to have a broader influence on drug discovery are also discussed.

Figure 1: Vision for harnessing evolutionary pressure in drug discovery.

By coupling the production of new molecules to biosensors, it is possible to drive cells under a mutational stress and a selection pressure through evolutionary cycles to optimize a ligand of interest. a | Enzymatic synthetic pathways are introduced and randomized within a plasmid. Biosensors are also introduced into the plasmid. Cells are then transformed with this plasmid. b | Selection pressure follows: the stress inducer (pressure) is expressed (step 1), triggering mutations (step 2) and leading to the synthesis of potential inhibitors (step 3). Only the 'fittest' cells — those able to generate an inhibitor — survive, while other cells die. c | For stress-surviving cells, hit deconvolution is carried out to identify the chemical structures of the inhibitors.

Harnessing directed evolution

Directed evolution systems to identify bioactive small molecules need to integrate components that allow the introduction of molecular variability across a population of molecules, the application of a selection pressure to the production of such molecules and hereditability of the capacity (fitness) to produce molecules with the desired properties.

Introduction of molecular variability. The introduction of broad variability in the starting population of potential small molecules is the first step of an evolutionary cycle. For simple peptides, variability in the population can be introduced by making random DNA or RNA libraries that encode populations of peptides using cell-free systems or cellular display. After translation, affinity-based selection can be applied to identify peptide sequences that have the desired properties, such as strength of binding to a protein target. Mutational variability in the population of peptides can be introduced in various ways. In vitro directed evolution experiments can rely on natural genetic mutations to drive variation, use error-prone PCR to introduce mutations while expanding the DNA libraries, or use DNA shuffling and recombination (see Ref. 26 for a review). More-focused variability can be introduced by site saturation mutagenesis. In cells, several strategies can be used, including chemical mutagens, UV light, hypermutator strains (engineered through deletion or modification of DNA replication and repair genes), mutagenicity plasmids (which suppress proofreading and enhance error-prone lesion bypass by DNA polymerases and/or impair mismatch repair27) and genome-wide mutagenesis. Some of these methods are exemplified by the case studies in Table 1 (with selected structures shown in Fig. 2).
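The difference between random and focused variability can be sketched in a few lines of code. The example below is an illustration only: the uniform substitution model is a crude stand-in for error-prone PCR, the NNK degenerate codon (N = any base, K = G or T) is one commonly used saturation scheme that is assumed here rather than taken from the text, and the template sequence is hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_copy(dna: str, error_rate: float, rng: random.Random) -> str:
    """Copy a sequence, substituting a different base at each position with
    probability error_rate (a crude stand-in for error-prone PCR)."""
    copied = []
    for base in dna:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def site_saturation(dna: str, codon_index: int) -> list[str]:
    """Return variants in which one codon is replaced by every NNK codon,
    leaving all other codons fixed."""
    start = 3 * codon_index
    variants = []
    for b1 in BASES:
        for b2 in BASES:
            for b3 in "GT":
                variants.append(dna[:start] + b1 + b2 + b3 + dna[start + 3:])
    return variants


rng = random.Random(42)
template = "ATGGCTTGTAAA"  # hypothetical four-codon template
mutant = error_prone_copy(template, error_rate=0.05, rng=rng)
nnk_variants = site_saturation(template, codon_index=1)
print(mutant, len(nnk_variants))
```

Error-prone copying spreads rare changes across the whole sequence, whereas saturation at one codon enumerates all 32 NNK codons at a chosen position while leaving the rest of the sequence untouched.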

Table 1: Examples of compounds and libraries made by combinatorial biosynthesis, directed evolution and continuous evolution
Figure 2: Chemical structures of selected compounds produced by evolutionary approaches.

For details on the compounds, see the descriptions in Table 1.

For more-complex small molecules, there are growing opportunities to harness the pathways used in nature — particularly by bacteria and fungi — to biosynthesize a wide range of bioactive natural products (typically based on a polyketide or peptide core) non-ribosomally. Over the past two decades, microbial genomic studies have led to an explosion of knowledge on such non-ribosomal biosynthetic pathways and their manipulation to produce novel molecules (see Refs 28,29 for reviews). For the purposes of directed evolution, if a biosynthetic pathway exists that already contains the production elements of potential hit or lead molecules, random mutations in the biosynthetic machinery for that particular pathway may be introduced to induce variability in the product molecules (for example, see Ref. 30). Variability may also be introduced by using random combinations of biosynthetic units of known pathways, by changing starting materials or by randomly recombining enzymes that can perform chemical transformations31. It is important to note that introducing variability in these ways is not straightforward, given the complexity of the biosynthetic machinery for natural products, which is discussed in depth in other reviews32,33,34 and in more detail below. Nevertheless, growing knowledge of the intricacies of biosynthetic pathways, as well as improved tools for manipulating them, such as the use of CRISPR–Cas9 to unsilence cryptic biosynthetic gene clusters (BGCs)35,36, informatics tools, such as antiSMASH for in silico mining of BGCs37, new DNA assembly techniques38 and even the prediction of plausible small-molecule structures directly from the primary sequences of BGCs39 could increase the likelihood of success in the near future.

Applying selection pressure in an extracellular context. The next step in the evolutionary cycle is the application of selection pressure across a population. The term 'selection' in a directed evolution context is analogous to a 'screen' in the DMTA cycle (although some researchers reserve the word 'selection' for 'genetic selection', in which the generating cell receives a survival advantage, as described below). Selection can be based on: the production of target molecules measured using an independent assay, such as mass spectrometry40; the identification of active compounds by an affinity-based method, such as binding to a target protein immobilized on a solid support or magnetic beads41; or an activity-based bioassay42 (see below for discussion of specific examples using technologies such as phage display of peptides).

Selection does not have to be based solely on affinity or activity of the target molecule. For example, although peptides have been developed as drugs, their optimization can be cumbersome owing to the inherent instability of peptides towards proteases, which leads to them having short plasma half-lives. To help address this problem, proteolysis has been used as a selection pressure to identify more-stable peptides43,44.

Applying selection pressure in an intracellular context. An intracellular biosensor can be used to connect and translate the concentration and affinity of a particular molecule inside a cell to an output signal. A range of biosensors have been developed, including riboswitches, allosterically controlled transcription factors, enzyme-coupled biosensors and two-hybrid systems45 (Box 1). The signals from biosensors can cover various end points, including luminescence or fluorescence, enzymatic activity and cell viability. For example, when a cell produces a metabolite of interest, it can be used to regulate the expression of an antibiotic-resistance gene, and the productive cells can be identified by their selective growth in antibiotic-containing media (that is, they have obtained a fitness advantage).

Box 1: Applying biosensors as detectors for the selection of desirable molecules

A biosensor translates the concentration and affinity of a compound to an output signal, such as the generation of fluorescence or expression of a survival gene, and opens the path for fully integrated evolutionary cycles.

RNA riboswitches modulate RNA secondary structure in the 5′-untranslated region (UTR) by making the ribosome-binding site (RBS) accessible to the ribosome only in the presence of a target molecule123,124. On binding of the specific molecule, the RNA changes conformation to reveal the RBS. Translation is then enabled, leading to expression of a gene encoding a signal, such as an antibiotic-resistance gene (see the figure, panel a). Surviving cells are the ones able to produce the target compound and are selected by their selective growth in antibiotic-containing media. These biosensors exist for a range of natural products125,126 and metabolites124,127.

Allosterically controlled transcription factors can control the expression of a target gene (for example, antibiotic-resistance genes or green fluorescent protein (GFP), allowing fluorescence-activated cell sorting (FACS); see the figure, panel b). On compound binding, the transcription factor undergoes a conformational change that alters its affinity for DNA, leading to gene expression. Naturally evolved biosensors can detect a wide range of chemical scaffolds, including alkaloids, lipids and amino acids128,129.

Protease-based biosensors (based on the concept of enzyme-coupled biosensors130) can consist of a polypeptide 'tail' on the extracellular surface of a membrane protein, which is fluorescently labelled, leading to fluorescent cells131. In the presence of a specific protease (for example, elastase), the tail is cleaved, preventing labelling (see the figure, panel c).

Finally, some biosensors can trigger a survival response and are linked to inhibition of different target classes, such as proteases132 or proteins required for antibiotic resistance133,134. Among these, the two-hybrid system consists of one protein, called 'A' in this example, which is fused with a DNA-binding domain of a transcription factor and another protein, called 'B' in this example, which is fused with the activation domain of the same transcription factor135,136. When both proteins are co-expressed, and in the absence of an inhibitor of the A–B protein–protein interaction (PPI), the two partners bind, resulting in transcription of reporter genes (see the figure, panel d). In the presence of an inhibitor, the two domains remain separated, preventing transcription. The two-hybrid system can also be reversed in such a way that cell survival is conditional on disruption of the interaction. Box figure part a is modified with permission from Ref. 45, Elsevier. Part c is reproduced from Ref. 131, CC-BY-4.0.

ORF, open reading frame.

If a biosensor only has a binary output (for example, a 'survive or die' signal for a cell), it is likely to be useful only in manually directed evolution. However, if the output of a biosensor shows a more-gradual dependency on the activity of a detected molecule, the activity could potentially be optimized in continuous evolution cycles, as discussed later.
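The distinction between a binary and a graded biosensor output can be made concrete with a small sketch. The Hill-type response curve, the half-maximal constant and the threshold used below are modelling assumptions chosen for illustration; they are not parameters of any biosensor described in this article.

```python
def graded_output(concentration: float, k_half: float = 1.0, hill: float = 1.0) -> float:
    """Hill-type response: fraction of maximal signal at a ligand concentration."""
    if concentration <= 0:
        return 0.0
    return concentration**hill / (k_half**hill + concentration**hill)


def binary_output(concentration: float, threshold: float = 1.0) -> int:
    """Survive-or-die readout: 1 at or above the threshold, 0 otherwise."""
    return 1 if concentration >= threshold else 0


# Compare the two readouts over a range of (arbitrary-unit) concentrations.
for conc in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(conc, round(graded_output(conc), 2), binary_output(conc))
```

Under these assumptions, the binary readout cannot distinguish a producer at 2 units from one at 10 units, whereas the graded readout still rewards the stronger producer, which is the property that continuous optimization would need.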

Hereditability of fitness. In a directed evolution system, selection of the 'fittest members' of the system (for example, DNA sequences that encode the highest-affinity or most-stable peptides) can be done manually. The members of the system that produce such molecules can then be expanded on with further rounds of variation and selection, either to focus on the same property for a modified set of producing members or focus on other properties. For example, for genetically encoded peptides, structure–activity analysis of the peptides could be applied, and the encoding of key pharmacophores46 within DNA sequences for the next round of molecule production and selection could be 'fixed' (Ref. 47) or favoured48, while allowing random variability among other parts of the DNA sequences. Where subsequent chemical modification is planned, key amino acids can also be fixed, while allowing remaining residues to be randomized47,49,50.

In a continuous directed evolution system, intracellular production of molecules can be coupled to the reproduction of cells using a biosensor (Box 1), removing the manual selection step and allowing nature to take full control. The advantage of continuous directed evolution over manually directed evolution is the opportunity for many more generations to be sampled, allowing more profound evolutionary changes to be explored. In the next two sections, examples of directed evolution with manual intervention and continuous directed evolution in the production of peptides and other small molecules are discussed.

Manually directed-evolution examples

Extracellular directed evolution. Technologies such as mRNA display and ribosome display are cell-free directed evolution methods, whereas in phage display, the target molecules are expressed on the surface of bacteriophages (Fig. 3; for a recent review of these and other display technologies, see Ref. 25). Such technologies enable the steps of variability, selection and hereditability of fitness in directed evolution as follows (Fig. 3). In mRNA and ribosome display, variability is achieved by creating cDNA libraries. The libraries can be fully randomized, or randomized only at certain positions, with certain amino acid positions being fixed. mRNA and ribosome display, being cell-free methods, provide flexibility for the inclusion of unnatural amino acids. After translation, selection is made based on the affinity of the peptide or protein and its attached coding mRNA against an immobilized target. The binders can then be eluted, the mRNA tag can be reverse-transcribed to cDNA and the sequence can be amplified by PCR to identify the binding peptide sequence. Hereditability of fitness is achieved by subjecting selected sequences to further rounds of variation, by amplification using error-prone PCR, site saturation mutagenesis or DNA shuffling, to further optimize binding sequences, followed by additional rounds of display and selection. Additional selection pressures, such as protease stability, can be applied to select for multiple characteristics.
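Why several rounds of display and selection are typically needed can be illustrated with a simple enrichment model. In this sketch, binders and non-binders are each recovered from the immobilized target with a fixed probability per round, and amplification is assumed to preserve their proportions. All numerical values are hypothetical.

```python
def enrich(binder_fraction: float, binder_recovery: float, background_recovery: float) -> float:
    """Return the binder fraction after one round of affinity selection."""
    kept_binders = binder_fraction * binder_recovery
    kept_background = (1.0 - binder_fraction) * background_recovery
    return kept_binders / (kept_binders + kept_background)


def rounds_to_majority(start_fraction, binder_recovery, background_recovery, max_rounds=50):
    """Count selection rounds until binders exceed half of the pool."""
    fraction = start_fraction
    for round_number in range(1, max_rounds + 1):
        fraction = enrich(fraction, binder_recovery, background_recovery)
        if fraction > 0.5:
            return round_number
    return None  # binders never became the majority within max_rounds


# Hypothetical pool: 1 binder per 10^8 sequences, 1,000-fold enrichment per round.
print(rounds_to_majority(1e-8, binder_recovery=0.5, background_recovery=0.0005))
```

With these illustrative numbers, each round multiplies the binder-to-background ratio by 1,000, so three rounds are needed before binders dominate the pool; weaker discrimination between binders and background would require more rounds.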

Figure 3: Display technologies.

a | mRNA display, a cell-free display technology. From a library of cDNAs (variation), mRNA is transcribed and enzymatically coupled to a synthetic oligonucleotide spacer modified to contain puromycin (red circle) at its 3′-end (step 1). The ribosome translates the mRNA, reading in the 5′-to-3′ direction. The tRNAs are shown in the P (peptidyl) and A (acceptor) sites; the P site most commonly contains the tRNA with the growing peptide chain, while the A site contains the aminoacyl-tRNA holding the new amino acid to be added. When the ribosome reaches the end of the coding mRNA, translation stops, puromycin enters the A site and then it becomes attached to the peptide chain in the P site, thereby linking the polypeptide to its encoding mRNA (step 2). Binders from among the translated and mRNA-tagged peptide or protein library pool are selected by the immobilized protein target (step 3). Reverse transcription to a cDNA library pool is used to generate the next generation of mRNA coding sequences (step 4), which are expanded by PCR, including the mutational strategy of choice to create further variation for the next round of directed evolution (step 5). b | Ribosome display, which is similar in principle to mRNA display. In the case of ribosome display, the mRNA library is coupled to an RNA spacer lacking a stop codon (step 1), enabling the polypeptide chain to be sufficiently separated from the ribosome to enable it to fold. On translation on the ribosome in the absence of a stop codon, translation stalls, the polypeptide chain is not released from the tRNA and the encoding mRNA is not released from the ribosome (step 2). Because of the spacer sequence, the protein binding motif may be selected (together with the attached complex) by an immobilized protein target and then eluted, at which point the complex can be dissociated. Further rounds of mutagenesis, amplification and selection (steps 4,5) can improve the affinity of the protein binding sequence for the protein target. c | Phage display, which couples the evolving peptide or protein to a bacteriophage coat protein, such as M13 phage protein III (pIII). When the phage infects Escherichia coli, its DNA is expressed, linking the evolving peptide or protein to the pIII protein. As the phage assembles for onward infection, the peptide or protein is displayed on the phage surface attached to the pIII protein, making the phenotype–genotype link, and can be selected with its target binding partner. After selection, successful sequences can be subjected to further rounds of infection, mutation and translation to optimize the affinity for the binding protein partner. Part a is adapted with permission from Ref. 137, Elsevier.

In phage display, cDNA libraries are introduced into a coat protein gene so that, after translation, the coded protein or peptide is displayed attached to the phage surface coat. After display, phages bearing peptides or proteins on their surface can be selected based on affinity of the displayed protein for an immobilized protein target. Hereditability of fitness is achieved after elution of selected sequences, by further rounds of replication and selection, which can result in improved affinity for the target protein. At each round, random errors in DNA replication introduce variability, or — as in mRNA display — the encoding DNA can be manually randomized using DNA shuffling or site saturation mutagenesis for a more rational approach. Peptide libraries that can be chemically modified have been expressed on the surface of bacteriophages using phage display (see Ref. 51 for a review).

In vitro mRNA display methods allow the incorporation of unnatural amino acids52, such as N-methylated amino acids to improve permeability53, unnatural amino acids containing reactive 'warheads' (Ref. 47) and reactive unnatural amino acids to allow cyclization. Macrocyclic-peptide libraries inhibiting sirtuin 2 were developed using the RaPID (random non-standard peptide integrated discovery) mRNA display technology to incorporate an ɛ-N-thioacetyl-lysine warhead and an α-N-(2-chloroacetyl)-tyrosine to enable cyclization onto a cysteine side chain, which was also introduced47. After translation, displayed peptides can be modified with bridging small molecules to generate monocyclic and bicyclic peptides to improve potency and metabolic stability (Fig. 3). For example, 1,3,5-tris(bromomethyl)benzene (TBMB) was used to couple embedded cysteines in peptide sequences AC(X)6C(X)6CG (where C represents cysteines for cyclization and X represents any amino acid) displayed on the surface of phages to select potent inhibitors of Notch1 (Ref. 48). TBMB has also been used to cyclize similar peptide libraries to select for inhibitors of human plasma kallikrein54 and cathepsin G49 (Table 1). Other linkers that have been explored include 1,3,5-triacryloyl-1,3,5-triazinane and N,N′,N′′-(benzene-1,3,5-triyl)-tris(2-bromoacetamide), which each impose different conformational constraints on the cyclic peptides, adding topological as well as sequence variability into the libraries before selection51,55. Further studies have identified potent and selective inhibitors for urokinase plasminogen activator (uPA)56, factor XIIa57 and HER2 (also known as ERBB2)58 using this approach (for a review of chemical modifications, see Ref. 51).
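The AC(X)6C(X)6CG library format can be made concrete with a short sketch that fills the randomized positions and counts the theoretical sequence diversity. It assumes that each X is drawn uniformly from the 20 canonical amino acids; the function names are illustrative, and real phage libraries sample only a fraction of this theoretical diversity.

```python
import random

CANONICAL_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_template(template: str, rng: random.Random) -> str:
    """Fill every 'X' in the template with a random canonical amino acid."""
    return "".join(
        rng.choice(CANONICAL_AMINO_ACIDS) if residue == "X" else residue
        for residue in template
    )


def theoretical_diversity(template: str) -> int:
    """Number of distinct sequences if each 'X' can be any of 20 residues."""
    return len(CANONICAL_AMINO_ACIDS) ** template.count("X")


TEMPLATE = "AC" + "X" * 6 + "C" + "X" * 6 + "CG"  # AC(X)6C(X)6CG from the text
member = expand_template(TEMPLATE, random.Random(1))
print(member, theoretical_diversity(TEMPLATE))
```

The three fixed cysteines sit at the same positions in every member, so a trivalent linker such as TBMB can bridge them into two loops, while the twelve randomized positions give 20^12 (about 4 × 10^15) possible sequences under the stated assumption.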

Impressive selectivity can be achieved during the rounds of directed evolution, as demonstrated for uPA and factor XIIa, for which inhibitors with >1,000-fold selectivity over related proteases were discovered. The inhibitors were even selective for the target human orthologues compared with murine orthologues. Although this demonstrates the exquisite selectivity that can be achieved by directed evolution, high selectivity for human orthologues over animal orthologues (or vice versa) may also be a limitation, given that candidate drugs are evaluated in animals for both efficacy and safety before clinical trials. Selection for crossover to a relevant species for the study of pharmacodynamics and toxicology needs to be implicit during the directed-evolution process.

In the case of factor XIIa, further medicinal chemistry optimization was attempted on the phage-display-evolved bicyclic peptide sequence RCFRLPCRQLRCR (cyclized with 1,3,5-triacryloyl-1,3,5-triazinane) to explore unnatural amino acids that could not be sampled by the phage display technology59. Most modifications resulted in decreases in affinity. However, a very conservative change (but typical of changes explored in medicinal chemistry), the replacement of the hydrogen atom in the para-position of a phenylalanine residue with a fluorine atom, led to a further tenfold improvement in affinity. Ring expansion by replacing cysteine with homocysteine and 4-mercaptovaline, which could also not be accessed by the phage display technology, also led to small improvements in potency60. These examples demonstrate that complementarity can exist between directed evolution and medicinal chemistry optimization to further explore chemical space.

Another example using phage-displayed peptide libraries illustrates the potential to select molecules based on multiple characteristics — in this case, activity against a target protein and proteolytic stability61. Bicyclic peptidic phage libraries were first subjected to affinity selection against human plasma kallikrein. The phage libraries were then exposed to pancreatin (essentially a cocktail of digestive enzymes) as a proteolytic pressure, followed by amplification. An identified bicyclic peptide was active against plasma kallikrein (half-maximal inhibitory concentration (IC50) = 18 nM), and its inhibitory activity did not change significantly when incubated in the presence of pancreatin (IC50 = 39 nM), demonstrating its enhanced stability.

A similar approach based on cell-free RNA display of peptides (Fig. 3) has been used to select stabilized cyclic peptides targeting the GDP-bound G protein isoform αi1 (αi1·GDP)43. After running a trillion-member cyclic peptide library through seven rounds of selection for affinity for αi1·GDP, a cyclic peptide with 2 nM affinity was identified. However, the peptide was found to be only twice as stable in the presence of the protease chymotrypsin compared with a linear peptide with the same sequence. Using the pool of peptides from round seven as a starting point, the library was exposed to three additional rounds of a two-step selection protocol in which the peptide library was first exposed to immobilized chymotrypsin for 15 minutes, followed by binding selection. After the three additional rounds of selection, the best cyclic peptide identified had retained affinity for the target (Kd = 3 nM) but with a 300-fold improvement in protease stability compared with a linear peptide with the same sequence. Surprisingly, the identified protease-stable analogues contained known chymotrypsin cleavage pharmacophores, indicating that directed evolution can identify useful molecules that run counter to the prevailing dogma in a field.

A further step to improve peptide stability has recently been reported in a study using an RNA-displayed cyclic peptide that could incorporate an unnatural amino acid at one or more positions44. Using the previously described αi1·GDP-selected cyclic peptide as the starting point, a focused library was designed whereby each wild-type position could undergo mutagenesis with nucleotides for the UAG stop codon to code for an unnatural amino acid, N-methylalanine. At each round, the library was subjected to affinity and proteolysis selection. After five rounds, a single dominant sequence was identified that maintained target affinity; this showed a 400-fold improvement in chymotrypsin resistance and a >3,700-fold improvement in proteinase K resistance. The final selected sequence maintained only two out of the original ten amino acids, and incorporated two N-methylalanine residues.

As a further proof of principle, a library of peptides was put through directed-evolution selection for protease stability and then target affinity for HER2 (Ref. 44). In this case, the unnatural amino acid N-methylnorvaline was chosen because of its high incorporation efficiency and hydrophobic nature, which could contribute to HER2 binding. To target the extracellular domain of HER2, pharmacophores from trastuzumab and pertuzumab (approved therapeutic antibodies that target the extracellular domain of HER2) were used as a starting point for the design of focused libraries (containing 10^6–10^7 members). Affinity selection was conducted using HER2-expressing cells in culture, and selection for protease stability was made using increasingly stringent incubations with proteinase K, chymotrypsin and trypsin. After four rounds of selection, the leading cyclic peptide was as potent and selective as the therapeutic antibodies and had sufficient stability for in vivo testing in mice. It incorporated one N-methylnorvaline residue. HER2-specific uptake into tumours in vivo was demonstrated using imaging, with little uptake into non-tumour tissue being seen.

Intracellular directed evolution. An intracellular, survival-based biosensor, known as the two-hybrid system (Box 1), has been harnessed for hit discovery. In a pioneering example, displaying a library of 20-residue peptides led to the identification of peptides binding to cyclin-dependent kinase 2 (CDK2)62. Beyond small peptides, the two-hybrid system has been applied to the discovery of single-chain variable fragments against epidermal growth factor receptor (EGFR)63 and affibodies targeting tumour necrosis factor (TNF)64. Excitingly, directed evolution of cyclic peptides produced intracellularly rather than extracellularly has been achieved by coupling this biosensor to a platform known as split-intein circular ligation of proteins and peptides (SICLOPPS)65. Randomized peptide-coding DNA sequences are coupled to DNA sequences coding for self-excising proteins known as inteins, which are split on either side of the peptide-coding sequence in a plasmid. Following expression of the whole construct from the plasmid, the split-inteins come into close proximity, favouring an intracellular native chemical ligation reaction, which results in the release of the intein and the generation of a cyclic peptide66. This approach can be used to generate plasmid libraries that encode 10⁶–10⁹ cyclic peptides.

The coupling of SICLOPPS libraries to the reverse two-hybrid system has been used to discover novel inhibitors of protein–protein interactions and transcription factors67,68. An example of the potential to address highly challenging targets with this approach focused on inhibiting the subunit heterodimerization of a transcription factor, hypoxia-inducible factor 1 (HIF1)69 (Fig. 4). First, a 3.2-million-member cyclic peptide library was evaluated, with survival of the cells producing the peptide coupled to inhibition of the HIF1α–HIF1β interaction. Evaluation of the surviving cell colonies (including steps to eliminate false positives) led to the isolation of four cyclic peptides that inhibited the HIF1α–HIF1β interaction, the most potent of which bound to HIF1α with a Kd of 124 nM. This peptide was then coupled to a transactivating transcriptional activator (TAT) cell-penetrating peptide to allow evaluation in cells, which demonstrated dose-dependent inhibition of hypoxia signalling (IC50 = 19 μM). The peptide was also shown to disrupt the HIF1 dimer without disrupting the HIF2 dimer, a notable contrast to a previously reported inhibitor70. Additional developments with SICLOPPS have allowed the incorporation of unnatural amino acids, and such an approach was successfully used to identify HIV protease inhibitors71.

Figure 4: Directed evolution using intracellular sensors.

The figure illustrates the application of a platform known as split-intein circular ligation of proteins and peptides (SICLOPPS) to the identification of inhibitors of the hypoxia-inducible factor 1α (HIF1α)–HIF1β interaction. A library of plasmids is generated by randomizing a DNA peptide-coding sequence. Each plasmid contains a split-intein on each side of the DNA peptide-coding sequence, as well as the HIF1 reverse two-hybrid system. Cells are then transformed with the plasmid, and on average, each cell is transformed by a single plasmid. On incorporation of the plasmid, transcription and translation, the intein peptide and the HIF1α–P22 and HIF1β–434 fusions are expressed. Dimerization of HIF1α and HIF1β takes place, bringing together P22 and 434 to form a repressor system, blocking expression of genes that are required for growth. Separately, the carboxyl-terminal (IC) and amino-terminal (IN) components of the intein come into close proximity, and cysteine-based excision releases the cyclic peptide. If the peptide is able to disrupt the HIF1α–HIF1β interaction, gene expression and therefore bacterial growth is restored.

Directed evolution using chemical approaches. For many years, chemists have tried to mimic nature by developing chemical evolutionary approaches, such as DNA-templated synthesis72, dynamic combinatorial libraries73 and de novo computational design coupled to evolutionary algorithms to guide the virtual evolution of potential hit compounds (for recent publications, see Refs 74,75,76). These technologies have not yet had a significant impact on the drug discovery process. However, there has been notable success with the use of DNA encoding in the chemical synthesis of large libraries. DNA-encoded libraries have become part of mainstream hit-identification procedures and have enabled the expansion of high-throughput screening beyond the typical library size of ~10⁶ small molecules. However, the DNA tag merely records the history of the synthetic route to the appended small molecule and does not direct its synthesis, whereas DNA-templated synthesis offers the opportunity to both record and direct the synthetic trajectory.

A DNA-programmed version of the split-and-mix solid phase synthesis77 of a combinatorial chemical library with a 384-letter code has recently been demonstrated78 and adapted to 384-well microtitre plates (Fig. 5a). In this system, codons within each DNA sequence code for a particular reaction or reactant, and the DNA sequence can contain multiple reaction codons. At each round of synthesis, the DNA sequence can be diversified by recombination or mutation to introduce further variability and explore combinations of successfully selected features as the hit molecule grows step by step. Using this approach in 4 generations/coupling steps, incorporating 17 different amino acids in the first 3 steps and adding an eighteenth building block, a dipeptide, in the fourth step, tetrapeptide and pentapeptide substrates for protein kinase A (PKA) were discovered, with an enrichment factor of over 15,000-fold over random 4-step assembly78. This technique has the potential to encode a library of complexity 384ⁿ, where n is the number of steps.
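
To give a sense of scale for these figures, the following minimal Python sketch computes how many distinct products a stepwise code can address (384 raised to the power of the number of steps for the full code). The function name `library_complexity` is illustrative and not taken from the cited work. The second calculation assumes that the PKA experiment offered 17 building blocks in each of the first three steps and 18 in the fourth, which is only one reading of the description above.

```python
from math import prod

def library_complexity(choices_per_step):
    """Return the number of distinct products for a stepwise synthesis.

    choices_per_step: iterable giving the number of alternative
    reactions or reactants available at each coupling step.
    """
    return prod(choices_per_step)

# Full 384-letter code used at every step: complexity is 384 ** n.
for n in range(1, 5):
    print(n, library_complexity([384] * n))

# Assumed reading of the PKA example: 17 options in steps 1-3, 18 in step 4.
pka_example = library_complexity([17, 17, 17, 18])
print(pka_example)
```

Under these assumptions, four steps with the full code address roughly 2.2 × 10¹⁰ products, whereas the assumed PKA design covers 88,434.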

Figure 5: Directed evolution using chemical approaches.

a | DNA-programmed version of split-and-mix solid-phase synthesis. A library is created of DNA fragments consisting of four coding regions (A–D), which determine the amino acid to be coupled to produce the peptide. The coding regions are hybridized to arrayed oligonucleotides to split the DNA fragments, and they are then transferred to separate wells in a multi-titre plate, where they undergo a chemical reaction to add an amino acid. As the sequence of the nucleotide determines which well of the plate the molecule ends up in, it also determines which chemical reaction it undergoes. After the reaction, the DNA fragments are pooled and the cycle repeats. b | Activity-directed chemical synthesis. During each cycle, arrays of reactions consisting of a selection of substrates, catalysts and solvents are carried out. The reaction mixtures are then assayed, and active wells form the basis of the next iteration of substrates and reaction conditions. After a given number of iterations, the active reactions are scaled up, the reaction components are isolated and the active compounds are characterized. FMOC, 9-fluorenyl-methoxycarbonyl.

An alternative chemical approach is activity-directed chemical synthesis, in which diverse reaction array components and conditions are varied to produce a mixture of products, which is then screened79,80 (Fig. 5b). The active mixtures are then used to inform further reaction array design, resulting in directed evolution cycles. Finally, the most-active reaction mixtures are scaled up, and their components are purified and assayed. The concept was demonstrated with the evolution of a 4-cyano-3-trifluoromethylphenyl acetamide fragment (a moiety common to many androgen receptor agonists) in three rounds. The most potent compound after one cycle had an effector concentration for half-maximum response (EC50) value for androgen receptor agonism of 8.8 μM, whereas the most potent compound after three cycles had an EC50 value of 730 nM. Further medicinal chemistry efforts on two potent leads from the third cycle demonstrated structure–activity relationships spanning two orders of magnitude, showing the value of activity-directed synthesis in identifying novel, tractable and optimizable chemotypes.

As noted above, virtual screening now offers the opportunity to screen libraries of up to 10¹³ molecules. The directed evolution equivalent, de novo design, in which a molecule is grown virtually within an active site of a protein, guided by synthetic feasibility, drug–receptor affinity and evolutionary algorithms, has also been an active area of research for many years. So far, it has not developed far enough to become a routine drug discovery tool81. Computing power, the difficulty in controlling the growth of overly complex virtual molecules within the de novo design algorithm and the weakness of scoring functions to predict affinity have all been limiting factors82. With the development of 'deep-learning' methodologies, the growth in computing power, the continued development of computational chemistry algorithms such as free energy perturbation83, the encoding of tractable chemistry, and automated chemistry to regularly sample experimentally the virtual designs to guide the evolutionary algorithms, virtual directed evolution may yet become a productive area84,85,86.

Directed evolution using biosynthetic approaches. An exciting area in which directed evolution can be used to produce novel small molecules is based on exploiting the way that natural products such as non-ribosomal peptides and polyketides are made in nature. This area of research has been driven by developments in synthetic biology and advances in the understanding of BGCs for many natural products87,88. For example, such clusters contain genes coding for modular non-ribosomal peptide synthetases (NRPSs), with each module including a domain that recognizes a particular component amino acid, known as an adenylation domain (A-domain). Theoretically, novel peptides could be combinatorially biosynthesized by 'cutting and pasting' adenylation domains from other NRPSs, but in practice this often results in much lower yields of natural products and even complete dysfunction, probably owing to disruption of complex protein–protein interactions caused by the cut-and-paste approach. This problem can be overcome by directed evolution, as exemplified by two studies on the antibiotic andrimid. In one study89, a valine-specific A-domain in an NRPS responsible for assembly of the core of andrimid (AdmK) was replaced by a 2-aminobutyrate-incorporating A-domain from another bacterial strain. This resulted in substantially impaired andrimid production. However, by using cycles of mutagenic PCR and activity-based selection for antibiotic activity, clones were identified that could not only produce similar levels of andrimid to the wild-type strain but also novel analogues. In another study90, rather than attempting to rescue the activity of a chimeric NRPS, saturation mutagenesis was applied to targeted residues at the active site of AdmK to modify substrate specificity, and a mass spectrometry approach was used for selection, rather than antibiotic activity. This technique not only offered greater sensitivity but also provided information on the chemical structure of the produced andrimid analogues. Four clones were identified that produced three new andrimid derivatives and one previously described analogue, all of which were potent antibiotics.

Rare high producers of the target molecules naringenin and glucaric acid were found by coupling an intracellular biosensor linked to survival91 with high-throughput screening of mutant-strain libraries generated by genome-wide mutagenesis92. By targeting up to 18 Escherichia coli genomic loci to induce mutations in regulatory or coding sequences of genes on the biosynthetic pathway, nearly a billion pathway variants were investigated. To overcome 'cheater' cells — cells that survive selection by mutating the sensor machinery without producing the target molecule — a selection scheme was followed that toggled between negative and positive selection, to allow 'evolutionary escapees' to be removed as they arose. After four rounds of evolution, production of the target molecules was increased by 36-fold for naringenin and 22-fold for glucaric acid92.
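
The logic of toggled selection can be illustrated with a toy population model. The Python sketch below is not the scheme used in the cited study: the population size, the rate at which sensor mutations create cheaters (`escape_rate`), the killing efficiency (`kill`) and the assumption that negative selection is applied while the pathway is switched off (so that true producers are spared) are all hypothetical choices made for illustration.

```python
def producer_fraction(rounds, toggle, escape_rate=1e-6, kill=0.999):
    """Fraction of true producers after repeated selection rounds.

    toggle=True adds a negative-selection step before each positive step.
    All parameter values are illustrative assumptions.
    """
    total = 1e9
    producers, cheaters = 1e3, 0.0
    nonproducers = total - producers
    for _ in range(rounds):
        # Sensor mutations convert a few non-producers into cheaters.
        escaped = escape_rate * nonproducers
        nonproducers -= escaped
        cheaters += escaped
        if toggle:
            # Negative selection, pathway off: cells whose sensor is
            # active without product (cheaters) are killed.
            cheaters *= 1 - kill
        # Positive selection, pathway on: cells with an inactive
        # sensor (non-producers) are killed.
        nonproducers *= 1 - kill
        # Regrow survivors to the original population size.
        scale = total / (producers + cheaters + nonproducers)
        producers *= scale
        cheaters *= scale
        nonproducers *= scale
    return producers / total

print(producer_fraction(4, toggle=False))
print(producer_fraction(4, toggle=True))
```

In this toy model, positive selection alone leaves cheaters and producers at roughly equal abundance, whereas alternating the two selection steps lets producers dominate within four rounds.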

Inspired by how nature can repurpose enzymes for new chemical transformations when faced with a new environmental challenge22,93, researchers have also started to repurpose enzymes using directed evolution to catalyse chemical reactions that do not exist in nature94. Examples are as diverse as carbene- and nitrene-transfer reactions catalysed by cytochrome P450s evolved by directed evolution95,96, the redirection of cyclizations using terpene synthase97 and the adaptation of biosynthetic pathways to produce unnatural amino acids98. Some common guidelines for engineering enzymes have been proposed94. To optimize enzymes to a new chemical function, it is important that some trace of activity is present initially. In the context of applying directed evolution in this way in drug discovery, providing a biological system intended to produce bioactive small molecules with a chemical starting point that has some affinity for the target of interest could give evolution an important helping hand; that is, directed evolution would be better used as a technique in the hit-to-lead process rather than for initial hit identification.

Steps towards directed evolution of small-molecule inhibitors have been taken using randomly recombined biosynthetic pathways. Researchers at the company Evolva have developed a system (Fig. 6) in which yeast cells are used to synthesize novel small molecules. The yeast are engineered by applying horizontal transfer of genes from known biochemical pathways that produce drug-relevant chemical scaffolds, as well as genetic material from organisms with unknown genetic diversity that could provide new enzymatic activities, organisms reported to have a medicinal effect and organisms that can tolerate infections99. The genetic material is combined randomly on yeast artificial chromosomes (YACs) to produce yeast cells that have the potential of synthesizing large random libraries of small molecules. As an example to illustrate the potential of the platform in drug discovery, yeast strains containing YACs with combinations of genes derived from pathways, including those for alkaloid, flavonoid and polyunsaturated fatty acid biosynthesis, were prepared100. Screening was done in single cells using an in-cell brome mosaic virus (BMV) functional assay, such that a yeast cell survives only in the presence of an inhibitor of viral replication expressed from the YACs. In total, 10,208 clones were analysed further. From 35 clones generating the most-potent inhibitors, the authors synthesized, validated and characterized 74 new compounds (mostly with molecular mass of 200–350 Da), which included several novel chemical scaffolds, and 28 of these compounds showed activity in a secondary biological assay.

Figure 6: Lead discovery using a biosynthetic approach.

The figure illustrates a small-molecule-synthesis platform developed by the company Evolva. Genes of interest are cloned and used to create yeast artificial chromosomes (YACs), which are transformed into yeast cells. The yeast cells then produce a range of diverse products according to their specific biosynthetic machinery, and compounds eliciting the required activity can activate a reporter, leading to cell survival. The active clones are selected, the compounds are isolated and characterized, and their activity is confirmed in a secondary assay. HPLC, high-performance liquid chromatography; LCMS, liquid chromatography–mass spectrometry.

Continuous directed evolution

In the previously cited examples, the directed evolution systems support the steps of mutation, translation, selection, replication and hereditability separately, with frequent manual intervention needed to make the evolutionary cycle 'spin'. The aim of a continuous directed evolution system is to seamlessly integrate all of the steps into an uninterrupted cycle. Continuous evolution enables many more generations to be explored, meaning greater evolutionary steps may be taken, which may be important for exploration across complex proteins or even multi-enzyme systems producing small molecules.

Bacteriophages, with their high mutation rates and ease of manipulation, have provided an excellent platform for such efforts, beginning 50 years ago with in vitro studies101 and progressing to continuous directed evolution using bacteriophages evolving in bacterial hosts (see Ref. 102 for a review). Natural mutation rates of phages are high, and they can be raised further through the use of chemical or biochemical mutagens. Selection pressures include culture conditions, time for growth and even the concentration of phage, as they compete with each other to survive and reproduce.

Continuous directed evolution has been successfully achieved with the development of the phage-assisted continuous evolution (PACE) system103 (Fig. 7). PACE is capable of evolving any gene that can be linked to the production of M13 phage protein III (pIII), which is expressed on the phage surface and has a key role in entry into bacterial hosts such as E. coli. It has been used to optimize and evolve RNA polymerases104,105, proteases106 and genome-editing proteins107, as well as in the discovery of a protein that binds to a receptor108, as illustrated in the following examples.

Figure 7: Phage-assisted continuous evolution.

a | M13 phage infection requires protein III (pIII), which is expressed on the surface of the phage and mediates F pilus binding and host cell entry (not shown). Phages lacking the pIII protein are approximately 10⁸ times less infectious, and the production of infectious phage scales with increasing levels of pIII over concentrations spanning two orders of magnitude, providing a stringency criterion for selection. To couple pIII production to the activity of interest, the gene encoding pIII is deleted from the phage vector and inserted into an accessory plasmid (AP) present in the Escherichia coli host cells under the control of a biosensor. In the example, for the evolution of protein binding inhibitors, a protein–protein interaction between the target and the evolving protein brings E. coli RNA polymerase (RNAP) upstream of the pIII gene, initiating transcription. The selection plasmid (SP) contains the genes required to synthesize the protein to be optimized along with the phage genes (except for the pIII gene). The mutagenesis plasmid (MP) contains mutagenic proteins, which can be induced by arabinose. Only phage vectors able to produce molecules with affinity for the biosensor in the E. coli AP are able to induce sufficient pIII production and thereby produce infectious progeny. b | In the phage-assisted continuous evolution (PACE) system, M13 phage and E. coli are mixed in a flowing 'lagoon', in which the inflow and outflow are controlled. The average residence time in the mixing compartment is set to be too short for E. coli replication but long enough for phage replication and accumulation of mutations. Host cells entering the lagoon are infected with phage and selected for their ability to produce molecules with particular properties (such as affinity for a protein target). Phages that accumulate mutations (indicated as yellow circles) leading to higher-affinity ligands can produce more pIII. Subsequently, they can produce more infectious progeny and become enriched within the continuous flow system. Conversely, phages that do not mutate sufficiently or that accumulate mutations leading to lower-affinity ligands become depleted from the system. In this way, PACE achieves continuous directed evolution of the ability to produce molecules with the desired properties. PACE is often likened to a conveyor belt that becomes progressively enriched with bacterial cells propagating phages that can express an optimized protein or function. araC, arabinose operon regulatory protein C; kanR, kanamycin resistance protein. Parts a and b are modified with permission from Ref. 108, Macmillan Publishers Limited.
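
The enrichment and depletion described for the lagoon can be pictured with a very simple washout model. The Python sketch below assumes that each phage variant grows exponentially at a fixed rate and is removed by dilution at a fixed rate; the function name and all rate values are hypothetical and are not taken from the PACE literature.

```python
import math

def lagoon_fold_change(replication_rate, dilution_rate, hours):
    """Fold change in a phage variant's abundance in a well-mixed lagoon.

    Assumes simple exponential growth at `replication_rate` (per hour)
    offset by washout at `dilution_rate` (lagoon volumes per hour).
    """
    return math.exp((replication_rate - dilution_rate) * hours)

dilution = 2.0  # hypothetical: two lagoon volumes exchanged per hour
variants = [("low-activity", 1.0), ("break-even", 2.0), ("high-activity", 3.0)]
for label, rate in variants:
    print(label, lagoon_fold_change(rate, dilution, hours=10))
```

Under these assumptions, a variant persists only if its replication rate exceeds the dilution rate, so raising the flow rate is one way to picture an increase in selection stringency.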

PACE was used to rapidly explore the potential for hepatitis C virus (HCV) protease inhibitors to drive the appearance of resistant mutants106. An engineered HCV protease-activated RNA polymerase was used to couple polypeptide cleavage to changes in gene expression that supported pIII production and hence phage propagation during PACE. By performing PACE in the presence of the HCV protease inhibitors danoprevir or asunaprevir, it was possible to rapidly evolve inhibitor-resistant HCV protease mutants. In the presence of danoprevir, the most common mutation observed in HCV protease was D168E, which increased the IC50 for danoprevir by 30-fold. In the case of asunaprevir, there was a strong bias for D168Y, which increased the IC50 value by 30-fold, whereas the D168E mutation shifted the IC50 by only 10-fold. The D168Y mutant has been observed for asunaprevir in individuals with hepatitis. Thus, a 1–3 day continuous evolution experiment with PACE revealed clinically relevant mutations that would otherwise have needed expensive and lengthy laboratory or clinical experiments to explore.

Another example illustrates the use of PACE to improve genome-editing tools107. Transcription activator-like effectors (TALEs) are DNA-binding domains that, when fused to a nuclease to form TALE nucleases (TALENs), enable DNA to be cut at specific sites109. TALEs are limited in that the 5′-nucleotide of the target must be thymine. Although promiscuous TALEs have been described, no variants exist that are specific to 5′-adenine, 5′-guanine or 5′-cytosine107. To couple TALEs to the PACE system, the DNA-binding domain was linked to a bacterial RNA polymerase III, which established sequence-specific and binding-dependent production of pIII, enabling propagation of the phage. To suppress promiscuous variants that also recognized 5′-thymine, the investigators linked 5′-thymine recognition to a dominant-negative pIII, which compromises phage propagation. Through this simultaneous positive and negative selection, 5′-nucleotide-selective TALEs were evolved.

A third recent example, in which PACE was used to evolve Bacillus thuringiensis insecticide toxins to overcome B. thuringiensis toxin resistance, is particularly exciting for the potential use of PACE systems in drug discovery108. The protein Cry1Ac is a widely used B. thuringiensis toxin, but some insects are resistant to it owing to mutations in its receptor. Liu and colleagues used PACE in a two-hybrid system in E. coli to evolve new Cry1Ac analogues that could bind to the Cry1Ac-resistant cadherin-like receptor (CAD) from the cabbage looper moth (Trichoplusia ni), for which Cry1Ac has very low affinity (>1 mM). Cry1Ac analogues were evolved through an artificial evolutionary stepping-stone receptor, toxin-binding region 3 (TBR3), which contains three mutated residues to introduce weak Cry1Ac-binding activity. Continuous evolution was carried out in steps of increasing stringency: first, using the artificial TBR3 receptor and a moderately potent mutagenesis plasmid to avoid early mutations that could abolish Cry1Ac binding to CAD; and second, using full CAD with a higher mutational rate plasmid to access rare Cry1Ac mutational combinations to enhance binding to CAD. Consensus variants that contained the most common of the 25 mutations in Cry1Ac observed during the 527-hour experiment were designed and synthesized, and the resulting Cry1Ac analogues bound to CAD with affinities of 18–34 nM. Such an approach could, in principle, be applied not only to the directed evolution of any protein-based therapeutic but also more broadly to other therapeutic modalities, such as peptides or even small molecules; for example, to circumvent antibiotic resistance.

In vivo continuous evolution has recently been demonstrated in yeast, providing a eukaryotic continuous evolution system, and one that enables the evolution of phenotypes that cannot be easily linked to phage growth110. In this method, the 'cargo' to be optimized is cloned into a native inducible retrotransposon, Ty1, the regulation of which has been engineered to increase library size and mutation rate. The error-prone nature of Ty1 replication, and the capacity for retrotransposon cycling, provides a novel mechanism for in vivo continuous evolution. As a proof of concept, in vivo continuous evolution was applied to a multi-enzyme pathway to increase xylose catabolism110, an important process for lignocellulose conversion in the biofuels industry111. The pathway contains a promoter, a xylose isomerase (XylA) and a xylulokinase (Xks1). Growth on xylose-containing agar plates provided the selection pressure. Starting from the native proteins, over the course of 1 week, a superior isolate emerged, which displayed a 21% increase in exponential growth rate over control and a shorter lag phase, driven by one mutation (Xks1-E164K). When the native XylA was replaced with a previously identified mutant XylA3*, which had already shown a 77% improvement in xylose consumption rate, in vivo continuous evolution generated further improving mutations, this time in XylA. The authors noted that the improvements obtained differed depending on the starting point, indicating the context-specific nature of continuous evolution.


Directed evolution has been a driving technology in the field of therapeutic antibodies112,113. With the renewed interest in cyclic peptides as a drug modality, the tools of directed evolution are crossing into medicinal chemistry. As the examples in this article highlight, the tools of directed evolution can be used to drive multi-objective optimization of potency, off-target selectivity and stability of cyclic peptides and small molecules. Beyond the peer-reviewed scientific literature, further potentially exciting innovations are being described in the patent literature by emerging biotechnology companies, which are already making deals with major pharmaceutical companies. Some examples of these companies are given in Table 2.

Table 2: Example companies involved in the design of cyclic peptides, directed or virtual evolution of peptides or small molecules

Although the techniques of directed evolution are mature for protein optimization and increasing in scope for peptide drug discovery, small molecules pose additional challenges, such as the greater diversity of building blocks and chemical reactions involved. The specificity of cellular biosynthetic machinery for small molecules produced in this way is also limiting, as attempting to diversify chemistry using a BGC can lead to synthetic route interruptions owing to the lack of recognition by one or more of the enzymes in the pathway. However, this limitation can be overcome by directed evolution of the enzymes, as illustrated in the examples above. Using biosynthetic pathways, the uncovering of cryptic BGCs and the understanding of their regulation, design and construction offers potential for broader use of small-molecule directed evolution32,114. New bioinformatic and computational tools could allow the retrosynthetic design of gene clusters and a greater understanding of substrate promiscuity and flexibility of the associated enzymes to introduce diversity115. New technologies, such as microfluidics, could help improve synthetic biology processes and generate large combinatorial libraries of plasmids116.

To increase the size of libraries of natural product-like small molecules and broaden the diversity through the introduction of non-natural substrates, biosynthesis could be combined with bio-orthogonal synthetic organic chemistry to provide larger and more diverse libraries in the host cell. As the evolutionary cycles progress and the successful cells propagate, the introduction of non-natural substrates for bio-orthogonal chemistry could be refined, in a similar fashion to a medicinal chemistry team 'homing in' on the clinical candidate. Examples described herein already demonstrate the value of hybrid approaches combining directed evolution with more-traditional medicinal chemistry transformations and/or structure–activity relationship (SAR)-guided optimization59,60.

Another important challenge for biosynthesis-based evolutionary approaches to produce small molecules is the decoupling of the coding genotype from the produced and selected molecule; that is, the breaking of the genotype–phenotype link. For proteins or peptides, the translated product can be engineered to remain directly or indirectly coupled to its encoding RNA or DNA, thus maintaining the genotype–phenotype link. Examples include the protein or peptide being displayed on the surface of the encoding phage in phage display, the displayed peptide being covalently bound to its encoding RNA in mRNA display, and cellular sequestration, as in the SICLOPPS technology. The selected protein or peptide primary structures can then be recovered by sequencing the encoding DNA or RNA. However, for biosynthesized small molecules, where the direct link to genetic encoding is lost through the intermediary step of their synthesis via enzymes, the structure of the selected molecule itself needs to be determined. Although this is a difficult problem, it is not insurmountable, and it has been a challenge faced for many years by those involved in natural-product screening. Classical fermentation scale-up of successful clones followed by micro-fractioning is one approach. When the lead compound is identified, either it can be scaled up by traditional synthetic organic chemistry100, or the host can be optimized to produce more of the required compound, as was successfully applied in the recent discovery of the novel peptide antibiotic lugdunin from the human nasal commensal bacterium Staphylococcus lugdunensis IVK28 (Ref. 117).

An added complexity for a permeable small molecule is that it can easily diffuse away from the cell that produces it. Permeability can thus break the genotype–phenotype link and allow the molecule to activate biosensors in neighbouring cells, enabling cheater cells to survive. This can be addressed by compartmentalization using droplet emulsions and microfluidics118. For example, a method has recently been disclosed in which individual members of a library of mutant bacterial cells that produce a natural product can be encapsulated in microdroplets suspended in an immiscible carrier liquid, with a target cell containing the biosensor, allowing selection119. The droplet compartmentalization of one producer cell with one sensor cell maintains the genotype–phenotype link and may also allow directed evolution to select for both potency and cell permeability.
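
How often a droplet holds exactly one producer cell depends on how the cells are loaded. The Python sketch below assumes random, independent encapsulation, so that the number of producer cells per droplet follows a Poisson distribution; this loading model and the mean occupancies shown are assumptions for illustration and are not taken from the disclosed method.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k cells in a droplet at a given mean occupancy."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def single_cell_share(mean):
    """Among droplets holding at least one producer, the share holding exactly one."""
    occupied = 1.0 - poisson_pmf(0, mean)
    return poisson_pmf(1, mean) / occupied

for mean in (0.1, 0.5, 1.0):
    # Columns: mean occupancy, fraction of empty droplets, single-cell share.
    print(mean, round(poisson_pmf(0, mean), 3), round(single_cell_share(mean), 3))
```

Under this assumption, dilute loading keeps most occupied droplets clonal, which preserves the genotype–phenotype link at the cost of many empty droplets.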

The time required to assemble the required biology within an evolutionary system may also be an important factor to consider when deciding to pursue such approaches in drug discovery. In the example discussed above from the company Evolva, it took 6–9 months to assemble the screening system and produce 74 molecules100, which seems comparable to the typical pharmaceutical company time of 1 year from target to hit1. However, as mentioned in the introduction, many of the highest-profile targets yield no tractable starting points, and the aspiration of having a tractable lead for every well-validated target120 may make investment in establishing a directed evolution system worthwhile, even if it takes longer than is required to establish a traditional medicinal chemistry screening cascade.

Although directed evolution may produce leads where traditional hit-identification and hit-to-lead approaches have failed, the leads will most likely still require the current expertise of medicinal chemists for the optimization of other key parameters required for candidate drugs, such as permeability, pharmacokinetics, safety aspects and formulation. Even though directed evolution is embedded in therapeutic antibody discovery, some aspects of developability are not implicitly selected for by the process. The antibody's stability, pharmacokinetics and immunogenicity still need to be considered121. As examples have shown59,60, leads from directed evolution can still benefit from rational design and multi-objective improvements by medicinal chemistry. Thus, directed evolution can be seen as an approach to augment the current toolbox and processes for medicinal chemists, rather than replace them.

Finally, drug discoverers have the greatest chance of success in finding new therapies if, like nature, they can select the best possible modality for a given target and apply the most appropriate lead-discovery technologies, rather than remain wedded to their training and background122. In this respect, cyclic-peptide drug discovery is showing the way, by crossing over technologies developed for discovering and optimizing therapeutic proteins to identify small molecules. Collectively, academic institutions, pharmaceutical companies and biotechnology companies have many tools. By using them in innovative ways, a transformation in the productivity of hit and lead identification based on directed evolution could be the next revolution in drug discovery.

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  1. 1.

    et al. How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nat. Rev. Drug Discov. 9, 203–214 (2010).

  2. 2.

    , & Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ. 47, 20–33 (2016).

  3. 3.

    et al. Putting translational science on to a global stage. Nat. Rev. Drug Discov. 15, 217–218 (2016).

  4. 4.

    , & Coming-of-age of antibodies in cancer therapeutics. Trends Pharmacol. Sci. 37, 1009–1028 (2016).

  5. 5.

    , , & Small molecules, big targets: drug discovery faces the protein–protein interaction challenge. Nat. Rev. Drug Discov. 15, 533–550 (2016).

  6. 6.

    , & Therapeutic antibodies to intracellular targets in cancer therapy. Expert Opin. Biol. Ther. 13, 1485–1488 (2013).

  7. 7.

    & Encoded combinatorial chemistry. Proc. Natl Acad. Sci. USA 89, 5381–5383 (1992).

  8. 8.

    , & Synthetic methods for the implementation of encoded combinatorial chemistry. J. Am. Chem. Soc. 115, 9812–9813 (1993).

  9. 9.

    et al. Generation and screening of an oligonucleotide-encoded synthetic peptide library. Proc. Natl Acad. Sci. USA 90, 10700–10704 (1993).

  10. 10.

    & Chemical space of DNA-encoded libraries. J. Med. Chem. 59, 6629–6644 (2016).

  11. 11.

    & Virtual screening strategies in drug discovery: a critical review. Curr. Med. Chem. 20, 2839–2860 (2013).

  12. 12.

    et al. Pfizer Global Virtual Library (PGVL): a chemistry design tool powered by experimentally validated parallel synthesis information. ACS Comb. Sci. 14, 579–589 (2012).

  13. 13.

    , & The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).

  14. 14.

    , , , & Twenty years on: the impact of fragments on drug discovery. Nat. Rev. Drug Discov. 15, 605–619 (2016).

  15. 15.

    et al. Hypothesis driven drug design: improving quality and effectiveness of the design-make-test-analyse cycle. Drug Discov. Today 17, 56–62 (2012).

  16. 16.

    & On the evolution of functional secondary metabolites (natural products). Mol. Microbiol. 6, 29–34 (1992).

  17. 17.

    Natural products and the gene cluster revolution. Trends Microbiol. 24, 968–977 (2016).

  18. 18.

    , & Can population genetics adapt to rapid evolution? Trends Genet. 32, 408–418 (2016).

  19. 19.

    et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat. Genet. 44, 101–105 (2012).

  20. 20.

    et al. Rapid evolution of fluoroquinolone-resistant Escherichia coli in Nigeria is temporally associated with fluoroquinolone use. BMC Infect. Dis. 11, 312 (2011).

  21. 21.

    et al. The evolutionary history of methicillin-resistant Staphylococcus aureus (MRSA). Proc. Natl Acad. Sci. USA 99, 7687–7692 (2002).

  22. 22.

    & Rapid evolution of bacterial catabolic enzymes: a case study with atrazine chlorohydrolase. Biochemistry 40, 12747–12753 (2001).

  23. 23.

    Evolutionary approaches for the discovery of functional synthetic small molecules. Pure Appl. Chem. 78, 1–14 (2006).

  24. 24.

    , Sidhu, & Beyond natural antibodies: the power of in vitro display technologies. Nat. Biotechnol. 29, 245–254 (2011).

  25. 25.

    et al. Library-based display technologies: where do we stand? Mol. BioSyst. 12, 2342–2358 (2016).

  26. 26.

    & Polishing the craft of genetic diversity creation in directed evolution. Biotechnol. Adv. 31, 1707–1721 (2013).

  27. 27.

    & Development of potent in vivo mutagenesis plasmids with broad mutational spectra. Nat. Commun. 6, 8425 (2015).

  28. 28.

    , & New strategies and approaches for engineering biosynthetic gene clusters of microbial natural products. Biotechnol. Adv. (2017).

  29. 29.

    & Computational approaches to natural product discovery. Nat. Chem. Biol. 11, 639–648 (2015).

  30. 30.

    , & Assessing the combinatorial potential of the RiPP cyanobactin tru pathway. ACS Synth. Biol. 4, 482–492 (2015).

  31. 31.

    , , , & How to make a glycopeptide: a synthetic biology approach to expand antibiotic chemical diversity. ACS Infect. Dis. 2, 642–650 (2016).

  32. 32.

    et al. Synthetic biology to access and expand nature's chemical diversity. Nat. Rev. Microbiol. 14, 135–149 (2016).

  33. 33.

    , , & Accessing Nature's diversity through metabolic engineering and synthetic biology. F1000Res. 5, 397 (2016).

  34. 34.

    , , , & A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis. PLoS Comput. Biol. 10, e1004016 (2014).

  35. 35.

    et al. Metabolic engineering of Escherichia coli using CRISPR–Cas9 meditated genome editing. Metab. Eng. 31, 13–21 (2015).

  36. 36.

    et al. CRISPathBrick: modular combinatorial assembly of type II-A CRISPR arrays for dCas9-mediated multiplex transcriptional repression in E. coli. ACS Synth. Biol. 4, 987–1000 (2015).

  37. 37.

    et al. antiSMASH 3.0 — a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 43, W237–W243 (2015).

  38. 38.

    , & DNA assembly techniques for next-generation combinatorial biosynthesis of natural products. J. Ind. Microbiol. Biotechnol. 41, 469–477 (2014).

  39. 39.

    et al. Antimicrobials inspired by nonribosomal peptide synthetase gene clusters. J. Am. Chem. Soc. 139, 1404–1407 (2017).

  40. 40.

    et al. Direct measurement of intracellular compound concentration by RapidFire mass spectrometry offers insight into cell permeability. J. Biomol. Screen. 21, 156–164 (2016).

  41. 41.

    , , & Affinity selection of peptide binders with magnetic beads via organic phase separation (MOPS). Biol. Pharm. Bull. 38, 1822–1826 (2015).

  42. 42.

    , , , & Glycine oxidase based high-throughput solid-phase assay for substrate profiling and directed evolution of (R)- and (S)-selective amine transaminases. Anal. Chem. 86, 11847–11853 (2014).

  43. 43.

    et al. Serum stable natural peptides designed by mRNA display. Sci. Rep. 4, 6008 (2014).

  44. 44.

    et al. Directed evolution of scanning unnatural-protease-resistant (SUPR) peptides for in vivo applications. ChemBioChem 17, 1643–1651 (2016).

  45. 45.

    , & Synthetic evolution of metabolic productivity using biosensors. Trends Biotechnol. 34, 371–381 (2016).

  46. 46.

    , , & Glossary of terms used in medicinal chemistry (IUPAC recommendations 1998). Pure Appl. Chem. 70, 1129–1143 (1998).

  47. 47.

    , & Discovery of macrocyclic peptides armed with a mechanism-based warhead: isoform-selective inhibition of human deacetylase SIRT2. Angew. Chem. Int. Ed. 51, 3423–3427 (2012).

  48. 48.

    , & Phage selection of bicyclic peptide ligands of the Notch1 receptor. ChemMedChem 10, 1754–1761 (2015).

  49. 49.

    , , & Phage-encoded combinatorial chemical libraries based on bicyclic peptides. Nat. Chem. Biol. 5, 502–507 (2009).

  50. 50.

    , , & Macrocyclic peptides inhibitors for the protein–protein interaction of Zaire Ebola virus protein 24 and Karyopherin alpha 5. Org. Biomol. Chem. 15, 5155–5160 (2017).

  51. 51.

    & Encoded libraries of chemically modified peptides. Curr. Opin. Chem. Biol. 26, 89–98 (2015).

  52. 52.

    , & In vitro selection of mRNA display libraries containing an unnatural amino acid. J. Am. Chem. Soc. 124, 9972–9973 (2002).

  53. 53.

    & in Ribozymes (ed. Hartig, J.), 465–478 (Springer, 2012).

  54. 54.

    et al. Bicyclic peptides with optimised ring size inhibit human plasma kallikrein and its orthologues while sparing paralogous proteases. ChemMedChem 7, 1173–1176 (2012).

  55. 55.

    , , , & Structurally diverse cyclisation linkers impose different backbone conformations in bicyclic peptides. ChemBioChem 13, 1032–1038 (2012).

  56. 56.

    et al. Bicyclic peptide inhibitor reveals large contact interface with a protease target. ACS Chem. Biol. 7, 817–821 (2012).

  57. 57.

    et al. Development of a selective peptide macrocycle inhibitor of coagulation factor XII toward the generation of a safe antithrombotic therapy. J. Med. Chem. 56, 3742–3746 (2013).

  58. 58.

    & Phage selection of bicyclic peptides binding Her2. Tetrahedron 70, 7733–7739 (2014).

  59. 59.

    et al. Peptide macrocycle inhibitor of coagulation factor XII with subnanomolar affinity and high target selectivity. J. Med. Chem. 60, 1151–1158 (2017).

  60. 60.

    , & Improving the binding affinity of in-vitro-evolved cyclic peptides by inserting atoms into the macrocycle backbone. ChemBioChem 17, 2299–2303 (2016).

  61. 61.

    & Phage selection of cyclic peptide antagonists with increased stability toward intestinal proteases. Protein Eng. Des. Sel. 26, 81–89 (2013).

  62. 62.

    et al. Genetic selection of peptide aptamers that recognize and inhibit cyclin-dependent kinase 2. Nature 380, 548–550 (1996).

  63. 63.

    , , , & Generation and functional characterization of intracellular antibodies interacting with the kinase domain of human EGF receptor. Oncogene 22, 1557–1567 (2003).

  64. 64.

    , , & Selection of TNF-α binding affibody molecules using a β-lactamase protein fragment complementation assay. New Biotechnol. 26, 251–259 (2009).

  65. 65.

    & Genetically selected cyclic-peptide inhibitors of AICAR transformylase homodimerization. Angew. Chem. Int. Ed. 44, 2760–2763 (2005).

  66. 66.

    & Peptides come round: using SICLOPPS libraries for early stage drug discovery. Chemistry 20, 10608–10614 (2014).

  67. 67.

    , & A systematic method for identifying small-molecule modulators of protein–protein interactions. Proc. Natl Acad. Sci. USA 101, 15591–15596 (2004).

  68. 68.

    et al. A cyclic peptide inhibitor of C-terminal binding protein dimerization links metabolism with mitotic fidelity in breast cancer cells. Chem. Sci. 4, 3046–3057 (2013).

  69. 69.

    et al. A cyclic peptide inhibitor of HIF-1 heterodimerization that inhibits hypoxia signaling in cancer cells. J. Am. Chem. Soc. 135, 10418–10425 (2013).

  70. 70.

    , & Exploitation of the HIF axis for cancer therapy. Cancer Biol. Ther. 3, 608–611 (2004).

  71. 71.

    et al. Evolution of cyclic peptide protease inhibitors. Proc. Natl Acad. Sci. USA 108, 11052–11056 (2011).

  72. 72.

    & The generality of DNA-templated synthesis as a basis for evolving non-natural small molecules. J. Am. Chem. Soc. 123, 6961–6963 (2001).

  73. 73.

    & Protein-directed dynamic combinatorial chemistry: a guide to protein ligand and inhibitor discovery. Molecules 21, E910 (2016).

  74. 74.

    , & in Nucleic Acid Nanotechnology (eds Kjems, J. et al.), 173–197 (Springer, 2014).

  75. 75.

    & Dynamic combinatorial chemistry: a tool to facilitate the identification of inhibitors for protein targets. Chem. Soc. Rev. 44, 2455–2488 (2015).

  76. 76.

    et al. Multistep reaction based de novo drug design: generating synthetically feasible design ideas. J. Chem. Inf. Model. 56, 605–620 (2016).

  77. 77.

    , , , & “Analogous” organic synthesis of small-compound libraries: validation of combinatorial chemistry in small-molecule synthesis. J. Am. Chem. Soc. 116, 2661–2662 (1994).

  78. 78.

    , , & Directed chemical evolution with an outsized genetic code. PLoS ONE 11, e0154765 (2016).

  79. 79.

    , & Efficient discovery of bioactive scaffolds by activity-directed synthesis. Nat. Chem. 6, 872–876 (2014).

  80. 80.

    , , , & Activity-directed synthesis with intermolecular reactions: development of a fragment into a range of androgen receptor agonists. Angew. Chem. Int. Ed. 54, 13538–13544 (2015).

  81. 81.

    & in Comprehensive Medicinal Chemistry III (eds Chackalamannil, S. et al.), 15–22 (Elsevier, 2017).

  82. 82.

    & De novo design at the edge of chaos. J. Med. Chem. 59, 4077–4086 (2016).

  83. 83.

    et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703 (2015).

  84. 84.

    & A bright future for evolutionary methods in drug design. ChemMedChem 10, 1296–1300 (2015).

  85. 85.

    & Active-learning strategies in computer-assisted drug discovery. Drug Discov. Today 20, 458–465 (2015).

  86. 86.

    , & Deep learning in drug discovery. Mol. Informat. 35, 3–14 (2016).

  87. 87.

    Genetic engineering of modular PKSs: from combinatorial biosynthesis to synthetic biology. Nat. Prod. Rep. 33, 203–230 (2016).

  88. 88.

    , & Reinvigorating natural product combinatorial biosynthesis with synthetic biology. Nat. Chem. Biol. 11, 649–659 (2015).

  89. 89.

    , , , & Directed evolution can rapidly improve the activity of chimeric assembly-line enzymes. Proc. Natl Acad. Sci. USA 104, 11951–11956 (2007).

  90. 90.

    , , , & Directed evolution of the nonribosomal peptide synthetase AdmK generates new andrimid derivatives in vivo. Chem. Biol. 18, 601–607 (2011).

  91. 91.

    , , & Looking for the pick of the bunch: high-throughput screening of producing microorganisms with biosensors. Curr. Opin. Biotechnol. 26, 148–154 (2014).

  92. 92.

    , , & Evolution-guided optimization of biosynthetic pathways. Proc. Natl Acad. Sci. USA 111, 17803–17808 (2014).

  93. 93.

    et al. Cloning, sequencing, and characterization of the hexahydro-1,3,5-trinitro-1,3,5-triazine degradation gene cluster from Rhodococcus rhodochrous. Appl. Environ. Microbiol. 68, 4764–4771 (2002).

  94. 94.

    , & Expanding the enzyme universe: accessing non-natural reactions by mechanism-guided directed evolution. Angew. Chem. Int. Ed. 54, 3351–3367 (2015).

  95. 95.

    , , & Olefin cyclopropanation via carbene transfer catalyzed by engineered cytochrome P450 enzymes. Science 339, 307–310 (2013).

  96. 96.

    et al. Amination catalyzed by engineered cytochrome P450 enzymes in vitro and in vivo. Angew. Chem. Int. Ed. 52, 9309–9312 (2013).

  97. 97.

    et al. High-throughput screening for terpene-synthase-cyclization activity and directed evolution of a terpene synthase. Angew. Chem. Int. Ed. 52, 5571–5574 (2013).

  98. 98.

    Semisynthetic production of unnatural l-α-amino acids by metabolic engineering of the cysteine-biosynthetic pathway. Nat. Biotechnol. 21, 422–427 (2003).

  99. 99.

    et al. Yeast artificial chromosomes employed for random assembly of biosynthetic pathways and production of diverse compounds in Saccharomyces cerevisiae. Microb. Cell Fact. 8, 45 (2009).

  100. 100.

    et al. Yeast synthetic biology platform generates novel chemical structures as scaffolds for drug discovery. ACS Synth. Biol. 3, 314–323 (2014).

  101. 101.

    , & Extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc. Natl Acad. Sci. USA 58, 217–224 (1967).

  102. 102.

    & In vivo continuous directed evolution. Curr. Opin. Chem. Biol. 24, 1–10 (2015).

  103. 103.

    , & A system for the continuous directed evolution of biomolecules. Nature 472, 499–503 (2011).

  104. 104.

    et al. A population-based experimental model for protein evolution: effects of mutation rate and selection stringency on evolutionary outcomes. Biochemistry 52, 1490–1499 (2013).

  105. 105.

    , & Evolution of a split RNA polymerase as a versatile biosensor platform. Nat. Chem. Biol. 13, 432–438 (2017).

  106. 106.

    , , & A system for the continuous directed evolution of proteases rapidly reveals drug-resistance mutations. Nat. Commun. 5, 5352 (2014).

  107. 107.

    et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nat. Methods 12, 939–942 (2015).

  108. 108.

    et al. Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance. Nature 533, 58–63 (2016).

  109. 109.

    & The development of TALE nucleases for biotechnology. Methods Mol. Biol. 1338, 27–42 (2016).

  110. 110.

    et al. In vivo continuous evolution of genes and pathways in yeast. Nat. Commun. 7, 13051 (2016).

  111. 111.

    , & Engineering xylose metabolism in triacylglycerol-producing Rhodococcus opacus for lignocellulosic fuel production. Biotechnol. Biofuels 6, 134 (2013).

  112. 112.

    , , & The role of phage display in therapeutic antibody discovery. Int. Immunol. 26, 649–657 (2014).

  113. 113.

    Selecting and screening recombinant antibody libraries. Nat. Biotechnol. 23, 1105–1116 (2005).

  114. 114.

    & Natural products version 2.0: connecting genes to molecules. J. Am. Chem. Soc. 132, 2469–2493 (2010).

  115. 115.

    et al. Bioinformatics for the synthetic biology of natural products: integrating across the design–build–test cycle. Nat. Prod. Rep. 33, 925–932 (2016).

  116. 116.

    et al. End-to-end automated microfluidic platform for synthetic biology: from design to functional analysis. J. Biol. Eng. 10, 3 (2016).

  117. 117.

    et al. Human commensals producing a novel antibiotic impair pathogen colonization. Nature 535, 511–516 (2016).

  118. 118.

    , & Enzyme engineering in biomimetic compartments. Curr. Opin. Struct. Biol. 33, 42–51 (2015).

  119. 119.

    , & UK WO2016092304 (2015).

  120. 120.

    , , , & Towards a hit for every target. Nat. Rev. Drug Discov. 15, 1–2 (2016).

  121. 121.

    et al. Developability assessment during the selection of novel therapeutic antibodies. J. Pharm. Sci. 104, 1885–1898 (2015).

  122. 122.

    et al. New modalities for challenging targets in drug discovery. Angew. Chem. Int. Ed. 56, 10294–10323 (2017).

  123. 123.

    & A flow cytometry-based screen for synthetic riboswitches. Nucleic Acids Res. 37, 184–192 (2009).

  124. 124.

    et al. Synthetic RNA devices to expedite the evolution of metabolite-producing microbes. Nat. Commun. 4, 1413 (2013).

  125. 125.

    & Tetracycline aptamer-controlled regulation of pre-mRNA splicing in yeast. Nucleic Acids Res. 35, 4179–4185 (2007).

  126. 126.

    et al. RNA aptamer-based electrochemical biosensor for selective and label-free analysis of dopamine. Anal. Chem. 85, 121–128 (2013).

  127. 127.

    & A diminutive and specific RNA binding site for l-tryptophan. Nucleic Acids Res. 33, 5482–5493 (2005).

  128. 128.

    et al. A high-throughput approach to identify genomic variants of bacterial metabolite producers at the single-cell level. Genome Biol. 13, R40 (2012).

  129. 129.

    et al. Engineering an allosteric transcription factor to respond to new ligands. Nat. Methods 13, 177–183 (2016).

  130. 130.

    et al. An enzyme-coupled biosensor enables (S)-reticuline production in yeast from glucose. Nat. Chem. Biol. 11, 465–471 (2015).

  131. 131.

    et al. A protease-based biosensor for the detection of schistosome cercaria. Sci. Rep. 6, 24725 (2016).

  132. 132.

    , & Sensitive genetic screen for protease activity based on a cyclic AMP signaling cascade in Escherichia coli. J. Bacteriol. 182, 7060–7066 (2000).

  133. 133.

    & Novel bacteriological assay for detection of potential antiviral agents. Antimicrob. Agents Chemother. 34, 2337–2341 (1990).

  134. 134.

    & Protease-dependent streptomycin sensitivity in E. coli—a system for protease inhibitor selection. Nat. Biotechnol. 507–510 (1995).

  135. 135.

    & A novel genetic system to detect protein–protein interactions. Nature 340, 245–246 (1989).

  136. 136.

    , , , & Quenching accumulation of toxic galactose-1-phosphate as a system to select disruption of protein–protein interactions in vivo. BioTechniques 37, 844–852 (2004).

  137. 137.

    , & mRNA display: ligand discovery, interaction analysis and beyond. Trends Biochem. Sci. 28, 159–165 (2003).

  138. 138.

    , & WO 2015175747 A1 (2015).

  139. 139.

    , , & WO 2015019999 A1 (2014).

  140. 140.

    & US 8680022 B2 (2009).

  141. 141.

    , & US 20140249292 A1 (2012).

  142. 142.

    et al. US 20120172235 A1 (2010).

  143. 143.

    , , & US 20150344872 A1 (2015).

  144. 144.

    et al. WO 2013172954 A1 (2013).

  145. 145.

    et al. WO 2016079682 A1 (2015).

  146. 146.

    & WO 2017013660 A1 (2016).

  147. 147.

    et al. Automated design of ligands to polypharmacological profiles. Nature 492, 215–220 (2012).

  148. 148.

    , & US 9373059 B1 20160621 (2016).



The authors are grateful to M. Wigglesworth and R. Maciewicz for their critical review of the manuscript.

Author information


  1. AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, 43150, Sweden.

    • Andrew M. Davis
    •  & Eric Valeur
  2. Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926 Frankfurt am Main, Germany.

    • Alleyn T. Plowright



Competing interests

The authors are, or were, employees of AstraZeneca, a global, research-based biopharmaceutical company.

Corresponding author

Correspondence to Andrew M. Davis.


Chemical space

A multi-dimensional conceptual region defined by a set of descriptors. For example, 'drug-like' chemical space (defined by limiting the space to molecules with a molecular mass <500 Da, fewer than 30 C, H, N, O or S atoms and fewer than 4 rings) has been estimated to be as large as 10^63 molecules.

Continuous directed evolution

A directed-evolution method that resembles natural evolution, in which heritable fitness is passed on to subsequent generations without manual intervention. As success in directed evolution depends on the number of rounds completed, removing manual steps can dramatically increase the speed of each round, the number of rounds that can be completed and hence the complexity of evolutionary changes that could be driven.

De novo computational design

The design of compounds based purely on the protein structure, through the computational docking of fragments into an active site and their computational growth using feasible in silico chemical steps to increase calculated binding affinity.

Design–make–test–analyse cycles

(DMTA cycles). The repetitive central process in lead optimization, involving a cycle of four steps: design (a hypothesis is constructed to improve the profile of the lead molecule); make (compounds exemplifying the design are synthesized); test (synthesized compounds of confirmed structure and purity are tested in one or more carefully constructed and controlled assays); and analyse (the experimental data are analysed and the results are used to amend a design hypothesis for the next cycle).

Directed evolution processes

Methods that mimic the processes of evolution but are directed towards a user-defined goal.

DNA-encoded libraries

Very large mixtures of molecules generated using a split-and-pool approach and used for ultra-high-throughput screening. Each synthesized molecule is covalently bound to a DNA fragment, which records the synthetic steps that have been taken to create the small molecule. An immobilized protein target is used to select binders from a pool of DNA-tagged molecules. The structure of the binders is deduced by sequencing the appended DNA tag.

DNA shuffling and recombination

A way to propagate beneficial mutations by recombining DNA segments from several gene sequences or gene pools from a directed evolution experiment.

DNA-templated synthesis

A process in which the DNA heteroduplex is used to bring two complementary DNA fragments bearing different reacting molecules into close proximity, increasing the reaction rate by several orders of magnitude. The DNA sequence thus not only encodes the synthesis of a chemical library but can also be used to direct the order of chemical reactions.

Dynamic combinatorial libraries

Collections of molecules formed from reversible reactions of reagents under thermodynamic control. All species are interconverting at equilibrium. In the presence of a binding protein that binds one or more molecules, the equilibrium is shifted, and the system becomes enriched with the binding moieties.

Evolutionary algorithms

A subset of machine-learning algorithms inspired by biological evolution. Candidate solutions are individuals in a population, and a fitness function defines their quality and acts as the selection criterion. Successful features from individuals are mutated and/or recombined to form the next generation of individuals, for further selection based on the fitness function.
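
As an illustration of the loop described above, the following minimal Python sketch evolves short peptide-like strings towards an arbitrary target sequence. Everything in it is an assumption made for illustration and is not taken from any cited method: the target string, the matching-positions fitness function, the mutation rate, the population size and the use of single-point crossover with elitist selection. A real application would replace the toy fitness function with a computed or measured property.

```python
import random

TARGET = "CYCLICPEPTIDE"           # hypothetical optimum, for illustration only
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def fitness(candidate):
    """Toy fitness function: number of positions that match TARGET."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate, rng, rate=0.1):
    """Replace each position by a random letter with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else letter
        for letter in candidate
    )


def recombine(parent_a, parent_b, rng):
    """Single-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def evolve(generations=200, pop_size=50, seed=0):
    """Run the select / recombine / mutate loop and return the fittest individual."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: rank by fitness and keep the fittest fifth as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 5]
        if fitness(parents[0]) == len(TARGET):
            break  # optimum reached
        # Variation: children are recombined from two parents, then mutated.
        children = [
            mutate(recombine(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        # Elitism: parents survive unchanged into the next generation.
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

Because the fittest individuals are carried over unchanged, the best fitness in the population cannot decrease from one generation to the next, although nothing guarantees that the optimum is reached within a given number of generations.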

Fragment-based drug design

An approach by which small, weakly binding chemical fragments (typically with a molecular mass of 100–200 Da) that bind to a protein target are identified and optimized to higher-affinity leads, usually guided by structural information on the fragment–target interaction from techniques such as X-ray crystallography.

mRNA display

An in vitro ribosome translation system for peptides and proteins. mRNA display uses the antibiotic puromycin, which causes premature chain termination on the ribosome. The cDNA is transcribed into mRNA libraries, and the 3′-end of each mRNA is coupled via a spacer oligonucleotide to puromycin. The oligonucleotide spacers allow effective translation and termination. The attached puromycin can react with the growing peptide chain, forming a covalent link between the peptide and its encoding mRNA, making the genotype–phenotype link. Selection is made based on the affinity of the peptide or protein with its attached coding mRNA for an immobilized target.

Non-ribosomal biosynthetic pathways

Pathways that biosynthesize the cores of many natural products based on peptides and polyketides. These involve large modular enzyme complexes known as non-ribosomal peptide synthetases and polyketide synthases.

Phage display

An in vivo translation system that uses bacteriophage to maintain the link between translated peptides or proteins and the DNA that encodes them. cDNA for the protein or peptide of interest is inserted into the phage coat protein gene, and phage progeny in Escherichia coli 'display' the target protein on their surface, attached to the coat protein. Selection is achieved by affinity for an immobilized target. After elution of binders, affinity maturation is achieved by further rounds of amplification, which introduces further variability in the selected DNA sequences. The amino acid sequence of the optimized binder can be deduced by sequencing the coding DNA of the selected phage.


Pharmacophore

The steric and electronic features in a ligand that result in the optimal molecular interactions of the ligand with a specific biological target, typically modulating a biological response.


Retrotransposon

A genetic element that can amplify itself in a genome via a 'copy–paste' mechanism involving transcription into RNA and reverse transcription back into DNA, which can then be inserted at various positions in the genome. Retrotransposons are common components of eukaryotic cells.

Ribosome display

An in vitro translation system for peptides and proteins. The initial cDNA library is fused to a spacer sequence lacking a stop codon. The cDNA is transcribed to mRNA. The mRNA is translated to protein on the ribosome, but the lack of a stop codon prevents release factors from binding and disassembling the translational complex. Therefore, the spacer sequence remains attached to the tRNA and bound to the ribosome, with the peptide chain protruding, allowing folding. The resulting complex of RNA, ribosome and protein can be selected by the affinity of the protruding protein for its ligand, and sequencing of the mRNA enables the identification of the protein sequence of the bound proteins.

Site saturation mutagenesis

A method by which one or more codons can be randomized to produce all possible amino acids at chosen positions within the DNA.

Split-and-mix solid phase synthesis

A method for the synthesis of large combinatorial compound libraries. A solid-phase-supported reagent is split equally, and each portion is reacted with a different reagent. After washing, the individual portions are recombined and mixed. Subsequent rounds of splitting, reaction and recombination generate a final library of x^n compounds, where x is the number of starting portions and n is the number of rounds.
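
The library size stated above can be checked with a small idealized simulation in Python. This is a sketch under stated assumptions: the function name `split_and_mix` and the single-letter reagent labels are hypothetical, the same x reagents are used in every round, and every compound in the pool is assumed to be represented in every portion after splitting (in practice this depends on the amount of solid support relative to the library size).

```python
def split_and_mix(reagents, rounds):
    """Return all compounds (as tuples of reagent labels) after `rounds` cycles."""
    pool = [()]  # bare solid support: a single 'empty' compound
    for _ in range(rounds):
        # Split: each portion receives a copy of the pool and reacts with one reagent.
        portions = [
            [compound + (reagent,) for compound in pool] for reagent in reagents
        ]
        # Mix: recombine all portions into a single pool for the next round.
        pool = [compound for portion in portions for compound in portion]
    return pool


library = split_and_mix(["A", "B", "C"], rounds=4)
print(len(library))  # 3**4 = 81 distinct compounds
```

With 3 reagents and 4 rounds the simulated pool contains 81 distinct compounds; with 20 reagents and 5 rounds the same formula gives 20^5 = 3,200,000, which is why a modest number of reaction steps can generate very large libraries.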

Structure–activity relationships

The links between structural changes and changes in the biological activity of a series of tested molecules. The deduction of these links is a fundamental concept in medicinal chemistry, and the derived structure–activity relationships are used to guide design–make–test–analyse cycles.
