The strong biological rationale to pursue challenging drug targets such as protein–protein interactions has stimulated the development of novel screening strategies, such as DNA-encoded libraries, to allow broader areas of chemical space to be searched. There has also been renewed interest in screening natural products, which are the result of evolutionary selection for a function, such as interference with a key signalling pathway of a competing organism. However, recent advances in several areas, such as understanding of the biosynthetic pathways for natural products, synthetic biology and the development of biosensors to detect target molecules, are now providing new opportunities to directly harness evolutionary pressure to identify and optimize compounds with desired bioactivities. Here, we describe innovations in the key components of such strategies and highlight pioneering examples that indicate the potential of the directed-evolution concept. We also discuss the scientific gaps and challenges that remain to be addressed to realize this potential more broadly in drug discovery.
Drug discovery remains slow, expensive and unreliable, despite many technological advances in recent decades. The journey from a small-molecule screening hit to a candidate drug typically takes 4–5 years, costs US$14–25 million and has an attrition rate of around 50% or higher, and the attrition rate from a candidate drug to the launch of a commercial product has been reported to be as high as 97%1. The overall cost of drug development is high, with the cost per drug launch (including the cost of failures and capital) recently estimated to exceed $2.6 billion2.
The choice of biological target remains a key source of attrition. Hence, large investments are being made in translational science to more effectively validate the role of a biological target in human disease and identify the most appropriate patient subset in which to evaluate potential drugs3. In the hunt for effective therapeutics, new drug modalities have been successfully exploited, most notably antibody-based therapies4. However, some targets with a compelling biological rationale, including various transcription factors and intracellular protein–protein interactions, are still very challenging for either current small-molecule drug discovery technologies5 or large-molecule approaches6.
Standard small-molecule drug discovery approaches can conceptually be broken down into two components. The first component is an initial screen (often a high-throughput in vitro assay that can screen up to ~10⁶ compounds) to identify compounds that show some level of the desired activity (hits). Setting up and analysing such screens typically takes 1 year1. The second component is the optimization of hits into leads through design–make–test–analyse cycles (DMTA cycles), ultimately leading to the selection of a candidate drug. In addition to the desired biological activity, such optimization has to take into account other properties that are crucial for candidate drugs, including pharmacokinetics and safety. Chemistry teams in large pharmaceutical companies can typically carry out DMTA cycles on 1–2 lead series in parallel, with each iteration taking 4–6 weeks.
Some advances in the initial screening component have focused on searching larger areas of chemical space. For example, DNA-encoded libraries7,8,9 can typically be used to screen up to 10⁷–10¹⁰ molecules10, and virtual screening can profile up to 10¹³ compounds11,12, but even the largest of screening libraries cannot be used to fully explore 'drug-like' chemical space, which has been estimated to be as large as 10⁶³ molecules13. Other advances, such as fragment-based drug design, conceptually enable wider areas of chemical space to be explored more efficiently with smaller chemical libraries (see Ref. 14 for a review). Simultaneously, there have been various advances, such as developments in predictive chemistry, that could speed up the DMTA process (for a review, see Ref. 15) and that could potentially lead to fully automated DMTA cycles. Nevertheless, the improvements in the DMTA paradigm remain incremental, as the fundamental process remains unchanged.
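To put the library sizes quoted above in proportion, the short Python sketch below computes the fraction of the estimated drug-like chemical space that each screening approach could sample. It works with orders of magnitude only; the dictionary labels and the function name `coverage_log10` are illustrative choices rather than established terminology, and the upper estimate of 10⁶³ molecules is taken directly from the text.

```python
# Orders of magnitude (log10 of library size) as quoted in the text.
DRUG_LIKE_SPACE_LOG10 = 63  # upper estimate of 'drug-like' chemical space

LIBRARY_LOG10 = {
    "high-throughput screen": 6,
    "DNA-encoded library (upper end)": 10,
    "virtual screen": 13,
}


def coverage_log10(library_log10, space_log10=DRUG_LIKE_SPACE_LOG10):
    """Return log10 of the fraction of chemical space that a library samples."""
    return library_log10 - space_log10


for name, size in LIBRARY_LOG10.items():
    # Integer exponents avoid floating-point underflow for such tiny fractions.
    print(f"{name}: 10^{size} molecules, fraction sampled ~10^{coverage_log10(size)}")
```

Even the largest virtual screen in this comparison samples only about one part in 10⁵⁰ of the estimated space, which illustrates why simply enlarging libraries cannot exhaustively search it.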
In nature, however, the development of bioactive compounds has been driven by evolution. Microorganisms, insects, plants and animals have adapted through the process of natural selection to synthesize peptides and other small molecules as defences or attractants that provide them with an evolutionary advantage over their competitors16,17. Adaptation can happen remarkably quickly, particularly in rapidly dividing systems under a strong selection pressure, such as bacteria and viruses, in which random mutations among billions of cells can produce one that has an advantage, which can then become dominant18,19,20,21,22.
Evolution, defined as the change in heritable traits of biological populations over generations, requires: traits that vary among individuals in the population (phenotypic variation); that the different traits confer different rates of survival and reproduction (differential fitness); and that successful traits can be passed from generation to generation (heritability of fitness). The classical DMTA cycle in medicinal chemistry has many similarities to the evolutionary processes in biology mediated through traits encoded in the genomes of organisms. The medicinal chemistry design hypothesis could be viewed as analogous to genetic information23. The 'make' stage (chemical synthesis or purchase) corresponds to the translation of genetic information into proteins. The 'test' and 'analyse' stages are similar to the identification of organisms that have particular characteristics (differential fitness), and deduced structure–activity relationships lead to new designs. The good features are kept, and the bad features are discarded from the design hypothesis for the next round of synthesis, a process that is comparable to the mutation and recombination of the genetic information that occurs during reproduction and hereditability of fitness.
Evolution in nature can be mimicked in the laboratory using directed evolutionary processes to test billions (or more) variations of genetic sequences that encode particular characteristics in parallel. Under a selection pressure, each sequence has a chance to survive to propagate for the next generation, with the mutation of successful sequences introducing further variation for the next round of selection. For example, such processes are well established for the selection of therapeutic antibodies from phage display, ribosome display, mRNA display and microbial cell display libraries (see Refs 24,25 for reviews).
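The cycle of variation, selection and propagation described above can be illustrated with a toy simulation in Python. This is a minimal sketch under strong assumptions, not a model of any real display experiment: the target sequence is hypothetical, 'fitness' is simply the number of positions that match that target (a stand-in for an affinity measurement), and the population size, survivor count and mutation rate are arbitrary values chosen for illustration.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids (one-letter codes)
TARGET = "MKTAYIAKQR"  # hypothetical 'ideal binder', used only to define a toy fitness


def fitness(seq):
    """Toy fitness: number of positions that match the hypothetical target."""
    return sum(a == b for a, b in zip(seq, TARGET))


def mutate(seq, rate, rng):
    """Copy a sequence, replacing each residue at random with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def evolve(pop_size=200, survivors=20, rounds=30, rate=0.05, seed=0):
    """Run rounds of selection and mutation; return the best fitness seen per round."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    best_per_round = []
    for _ in range(rounds):
        # Selection pressure: only the fittest sequences survive this round.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        best_per_round.append(fitness(parents[0]))
        # Heritability plus variation: parents are kept, offspring are mutated copies.
        offspring = [mutate(rng.choice(parents), rate, rng)
                     for _ in range(pop_size - survivors)]
        population = parents + offspring
    return best_per_round


if __name__ == "__main__":
    print(evolve())
```

Because the surviving parents are carried over unchanged, the best fitness in this sketch can never decrease from one round to the next. In a laboratory system, the selection step would be an affinity- or activity-based assay rather than a comparison with a known answer.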
If chemists could harness the scale and speed of directed evolutionary processes to search for and optimize bioactive small molecules, a transformation in lead generation success rates could be achieved — perhaps even for highly challenging, but strongly biologically validated targets. Excitingly, advances in several areas — including natural product biosynthesis, synthetic biology and biosensors — are now providing the basis for developing integrated systems that could have the potential to fulfil this goal. A representation of such a directed evolution paradigm is shown in Fig. 1. Although applications so far have been limited primarily to proteins and cyclic peptides for extracellular targets, such systems could in principle also be applied to cyclic peptides and small molecules that inhibit intracellular protein–protein interactions to optimize their bioactivity, their selectivity and, in some cases, their pharmacokinetic properties. Below, the steps of directed evolution are presented first, followed by pioneering examples of its application to the discovery and optimization of cyclic peptides and small molecules. The opportunities, gaps and challenges for directed evolution to have a broader influence on drug discovery are also discussed.
Harnessing directed evolution
Directed evolution systems to identify bioactive small molecules need to integrate components that allow the introduction of molecular variability across a population of molecules, the application of a selection pressure to the production of such molecules and hereditability of the capacity (fitness) to produce molecules with the desired properties.
Introduction of molecular variability. The introduction of broad variability in the starting population of potential small molecules is the first step of an evolutionary cycle. For simple peptides, variability in the population can be introduced by making random DNA or RNA libraries that encode populations of peptides using cell-free systems or cellular display. After translation, affinity-based selection can be applied to identify peptide sequences that have the desired properties, such as strength of binding to a protein target. Mutational variability in the population of peptides can be introduced in various ways. In vitro directed evolution experiments can rely on natural genetic mutations to drive variation, on error-prone PCR to introduce mutations while the DNA libraries are expanded, or on DNA shuffling and recombination (see Ref. 26 for a review). More-focused variability can be introduced by site saturation mutagenesis. In cells, several strategies can be used, including chemical mutagens, UV light, hypermutator strains (engineered through deletion or modification of DNA replication and repair genes), mutagenicity plasmids (which suppress proofreading and enhance error-prone lesion bypass by DNA polymerases and/or impair mismatch repair27) and genome-wide mutagenesis. Some of these methods are exemplified by the case studies in Table 1 (with selected structures shown in Fig. 2).
For more-complex small molecules, there are growing opportunities to harness the pathways used in nature — particularly by bacteria and fungi — to biosynthesize a wide range of bioactive natural products (typically based on a polyketide or peptide core) non-ribosomally. Over the past two decades, microbial genomic studies have led to an explosion of knowledge on such non-ribosomal biosynthetic pathways and their manipulation to produce novel molecules (see Refs 28,29 for reviews). For the purposes of directed evolution, if a biosynthetic pathway exists that already contains the production elements of potential hit or lead molecules, random mutations in the biosynthetic machinery for that particular pathway may be introduced to induce variability in the product molecules (for example, see Ref. 30). Variability may also be introduced by using random combinations of biosynthetic units of known pathways, by changing starting materials or by randomly recombining enzymes that can perform chemical transformations31. It is important to note that introducing variability in these ways is not straightforward, given the complexity of the biosynthetic machinery for natural products, which is discussed in depth in other reviews32,33,34 and in more detail below. Nevertheless, growing knowledge of the intricacies of biosynthetic pathways, as well as improved tools for manipulating them, such as the use of CRISPR–Cas9 to unsilence cryptic biosynthetic gene clusters (BGCs)35,36, informatics tools, such as antiSMASH for in silico mining of BGCs37, new DNA assembly techniques38 and even the prediction of plausible small-molecule structures directly from the primary sequences of BGCs39 could increase the likelihood of success in the near future.
Applying selection pressure in an extracellular context. The next step in the evolutionary cycle is the application of selection pressure across a population. The term 'selection' in a directed evolution context is analogous to a 'screen' in the DMTA cycle (although some researchers reserve the word 'selection' for 'genetic selection', in which the generating cell receives a survival advantage, as described below). Selection can be based on: the production of target molecules measured using an independent assay, such as mass spectrometry40; the identification of active compounds by an affinity-based method, such as binding to a target protein immobilized on a solid support or magnetic beads41; or an activity-based bioassay42 (see below for discussion of specific examples using technologies such as phage display of peptides).
Selection does not have to be based solely on affinity or activity of the target molecule. For example, although peptides have been developed as drugs, their optimization can be cumbersome owing to the inherent instability of peptides towards proteases, which leads to them having short plasma half-lives. To help address this problem, proteolysis has been used as a selection pressure to identify more-stable peptides43,44.
Applying selection pressure in an intracellular context. An intracellular biosensor can be used to connect and translate the concentration and affinity of a particular molecule inside a cell to an output signal. A range of biosensors have been developed, including riboswitches, allosterically controlled transcription factors, enzyme-coupled biosensors and two-hybrid systems45 (Box 1). The signals from biosensors can cover various end points, including luminescence or fluorescence, enzymatic activity and cell viability. For example, when a cell produces a metabolite of interest, it can be used to regulate the expression of an antibiotic-resistance gene, and the productive cells can be identified by their selective growth in antibiotic-containing media (that is, they have obtained a fitness advantage).
If a biosensor only has a binary output (for example, a 'survive or die' signal for a cell), it is likely to be useful only in manually directed evolution. However, if the output of a biosensor shows a more-gradual dependency on the activity of a detected molecule, the activity could potentially be optimized in continuous evolution cycles, as discussed later.
Hereditability of fitness. In a directed evolution system, selection of the 'fittest members' of the system (for example, DNA sequences that encode the highest-affinity or most-stable peptides) can be done manually. The members of the system that produce such molecules can then be expanded on with further rounds of variation and selection, either to focus on the same property for a modified set of producing members or focus on other properties. For example, for genetically encoded peptides, structure–activity analysis of the peptides could be applied, and the encoding of key pharmacophores46 within DNA sequences for the next round of molecule production and selection could be 'fixed' (Ref. 47) or favoured48, while allowing random variability among other parts of the DNA sequences. Where subsequent chemical modification is planned, key amino acids can also be fixed, while allowing remaining residues to be randomized47,49,50.
In a continuous directed evolution system, intracellular production of molecules can be coupled to the reproduction of cells using a biosensor (Box 1), removing the manual selection step and allowing nature to take full control. The advantage of continuous directed evolution over manually directed evolution is the opportunity for many more generations to be sampled, allowing more profound evolutionary changes to be explored. In the next two sections, examples of directed evolution with manual intervention and continuous directed evolution in the production of peptides and other small molecules are discussed.
Manually directed-evolution examples
Extracellular directed evolution. Technologies such as mRNA display and ribosome display are cell-free directed evolution methods, whereas in phage display, the target molecules are expressed on the surface of bacteriophages (Fig. 3; for a recent review of these and other display technologies, see Ref. 25). Such technologies enable the steps of variability, selection and hereditability of fitness in directed evolution as follows (Fig. 3). In mRNA and ribosome display, variability is achieved by creating cDNA libraries. The libraries can be fully randomized, or randomized only at certain positions, with certain amino acid positions being fixed. mRNA and ribosome display, being cell-free methods, provide flexibility for the inclusion of unnatural amino acids. After translation, selection is made based on the affinity of the peptide or protein and its attached coding mRNA against an immobilized target. The binders can then be eluted, the mRNA tag can be back-translated to cDNA and the sequence can be amplified by PCR to identify the binding peptide sequence. Hereditability of fitness is achieved by subjecting selected sequences to further rounds of variation, by amplification using error-prone PCR, site saturation mutagenesis or DNA shuffling, to further optimize binding sequences, followed by additional rounds of display and selection. Additional selection pressures, such as protease stability, can select for multiple characteristics.
In phage display, cDNA libraries are introduced into a coat protein gene so that, after translation, the coded protein or peptide is displayed attached to the phage surface coat. After display, phages bearing peptides or proteins on their surface can be selected based on affinity of the displayed protein for an immobilized protein target. Hereditability of fitness is achieved after elution of selected sequences, by further rounds of replication and selection, which can result in improved affinity for the target protein. At each round, random errors in DNA replication introduce variability, or — as in mRNA display — the encoding DNA can be manually randomized using DNA shuffling or site saturation mutagenesis for a more rational approach. Peptide libraries that can be chemically modified have been expressed on the surface of bacteriophages using phage display (see Ref. 51 for a review).
In vitro mRNA display methods allow the incorporation of unnatural amino acids52, such as N-methylated amino acids to improve permeability53, unnatural amino acids containing reactive 'warheads' (Ref. 47) and reactive unnatural amino acids to allow cyclization. Macrocyclic-peptide libraries inhibiting sirtuin 2 were developed using the RaPID (random non-standard peptide integrated discovery) mRNA display technology to incorporate an ɛ-N-thioacetyl-lysine warhead and an α-N-(2-chloroacetyl)-tyrosine to enable cyclization onto a cysteine side chain, which was also introduced47. After translation, displayed peptides can be modified with bridging small molecules to generate monocyclic and bicyclic peptides to improve potency and metabolic stability (Fig. 3). For example, 1,3,5-tris(bromomethyl)benzene (TBMB) was used to couple embedded cysteines in peptide sequences AC(X)6C(X)6CG (where C represents cysteines for cyclization and X represents any amino acid) displayed on the surface of phages to select potent inhibitors of Notch1 (Ref. 48). TBMB has also been used to cyclize similar peptide libraries to select for inhibitors of human plasma kallikrein54 and cathepsin G49 (Table 1). Other linkers that have been explored include 1,3,5-triacryloyl-1,3,5-triazinane and N,N′,N′′-(benzene-1,3,5-triyl)-tris(2-bromoacetamide), which each impose different conformation constraints on the cyclic peptides, adding topological as well as sequence variability into the libraries before selection51,55. Further studies have identified potent and selective inhibitors for urokinase plasminogen activator (uPA)56, factor XIIa57 and HER2 (also known as ERBB2)58 using this approach (for a review of chemical modifications, see Ref. 51).
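The theoretical sequence diversity of a library format such as AC(X)6C(X)6CG can be worked out with a few lines of Python. The sketch below assumes that each X position is drawn from the 20 canonical amino acids, an assumption that is not stated in the text, and the function name `theoretical_diversity` is illustrative. The number of distinct clones in a physical library is set by experimental factors and may be far smaller than this theoretical figure.

```python
CANONICAL_AMINO_ACIDS = 20  # assumption: X is any of the 20 canonical amino acids


def theoretical_diversity(pattern, alphabet_size=CANONICAL_AMINO_ACIDS):
    """Count the sequences encoded by a pattern in which 'X' marks a randomized position."""
    randomized_positions = pattern.count("X")
    return alphabet_size ** randomized_positions


# AC(X)6C(X)6CG written out in full: fixed cysteines flank two six-residue random loops.
bicyclic_format = "AC" + "X" * 6 + "C" + "X" * 6 + "CG"

print(bicyclic_format)                         # ACXXXXXXCXXXXXXCG
print(theoretical_diversity(bicyclic_format))  # 4096000000000000, i.e. about 4.1 x 10^15
```

Under these assumptions, the 12 randomized positions give roughly 4 × 10¹⁵ possible sequences, before any additional topological variety from the choice of linker is considered.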
Impressive selectivity can be achieved during the rounds of directed evolution, as demonstrated for uPA and factor XIIa, for which inhibitors with >1,000-fold selectivity over related proteases were discovered. The inhibitors were even selective for the target human orthologues compared with murine orthologues. Although this demonstrates the exquisite selectivity that can be achieved by directed evolution, high selectivity for human orthologues over animal orthologues (or vice versa) may also be a limitation, given that candidate drugs are evaluated in animals for both efficacy and safety before clinical trials. Selection for crossover to a relevant species for the study of pharmacodynamics and toxicology needs to be implicit during the directed-evolution process.
In the case of factor XIIa, further medicinal chemistry optimization was attempted on the phage-display-evolved bicyclic peptide sequence RCFRLPCRQLRCR (cyclized with 1,3,5-triacryloyl-1,3,5-triazinane) to explore unnatural amino acids that could not be sampled by the phage display technology59. Most modifications resulted in decreases in affinity. However, a very conservative change (but one typical of those explored in medicinal chemistry), the replacement of a hydrogen atom in the para-position of a phenylalanine residue with a fluorine atom, led to a further tenfold improvement in affinity. Ring expansion by replacing cysteine with homocysteine and 4-mercaptovaline, which could also not be accessed by the phage display technology, also led to small improvements in potency60. These examples demonstrate that complementarity can exist between directed evolution and medicinal chemistry optimization to further explore chemical space.
Another example using phage-displayed peptide libraries illustrates the potential to select molecules based on multiple characteristics — in this case, activity against a target protein and proteolytic stability61. Bicyclic peptidic phage libraries were first subjected to affinity selection against human plasma kallikrein. The phage libraries were then exposed to pancreatin (essentially a cocktail of digestive enzymes) as a proteolytic pressure, followed by amplification. An identified bicyclic peptide was active against plasma kallikrein (half-maximal inhibitory concentration (IC50) = 18 nM), and its inhibitory activity did not change significantly when incubated in the presence of pancreatin (IC50 = 39 nM), demonstrating its enhanced stability.
A similar approach based on cell-free RNA display of peptides (Fig. 3) has been used to select stabilized cyclic peptides targeting the GDP-bound G protein isoform αi1 (αi1·GDP)43. After running a trillion-member cyclic peptide library through seven rounds of selection for affinity for αi1·GDP, a cyclic peptide with 2 nM affinity was identified. However, the peptide was found to be only twice as stable in the presence of the protease chymotrypsin compared with a linear peptide with the same sequence. Using the pool of peptides from round seven as a starting point, the library was exposed to three additional rounds of a two-step selection protocol in which the peptide library was first exposed to immobilized chymotrypsin for 15 minutes, followed by binding selection. After the three additional rounds of selection, the best cyclic peptide identified had retained affinity for the target (Kd = 3 nM) but with a 300-fold improvement in protease stability compared with a linear peptide with the same sequence. Surprisingly, the identified protease-stable analogues contained known chymotrypsin cleavage pharmacophores, indicating that directed evolution can identify useful molecules that run counter to the prevailing dogma in a field.
A further step to improve peptide stability has recently been reported in a study using an RNA-displayed cyclic peptide that could incorporate an unnatural amino acid at one or more positions44. Using the previously described αi1·GDP-selected cyclic peptide as the starting point, a focused library was designed whereby each wild-type position could undergo mutagenesis with nucleotides for the UAG stop codon to code for an unnatural amino acid, N-methylalanine. At each round, the library was subjected to affinity and proteolysis selection. After five rounds, a single dominant sequence was identified that maintained target affinity; this showed a 400-fold improvement in chymotrypsin resistance and a >3,700-fold improvement in proteinase K resistance. The final selected sequence maintained only two out of the original ten amino acids, and incorporated two N-methylalanine residues.
As a further proof of principle, a library of peptides was put through directed-evolution selection for protease stability and then target affinity for HER2 (Ref. 44). In this case, the unnatural amino acid N-methylnorvaline was chosen because of its high incorporation efficiency and hydrophobic nature, which could contribute to HER2 binding. To target the extracellular domain of HER2, pharmacophores from trastuzumab and pertuzumab (approved therapeutic antibodies that target the extracellular domain of HER2) were used as a starting point for the design of focused libraries (containing 10⁶–10⁷ members). Affinity selection was conducted using HER2-expressing cells in culture, and selection for protease stability was made using increasingly stringent incubations with proteinase K, chymotrypsin and trypsin. After four rounds of selection, the leading cyclic peptide was as potent and selective as the therapeutic antibodies and had sufficient stability for in vivo testing in mice. It incorporated one N-methylnorvaline residue. HER2-specific uptake into tumours in vivo was demonstrated using imaging, with little uptake into non-tumour tissue being seen.
Intracellular directed evolution. An intracellular, survival-based biosensor, known as the two-hybrid system (Box 1), has been harnessed for hit discovery. In a pioneering example, displaying a library of 20-residue peptides led to the identification of peptides binding to cyclin-dependent kinase 2 (CDK2)62. Beyond small peptides, the two-hybrid system has been applied to the discovery of single-chain variable fragments against epidermal growth factor receptor (EGFR)63 and affibodies targeting tumour necrosis factor (TNF)64. Excitingly, directed evolution of cyclic peptides produced intracellularly rather than extracellularly has been achieved by coupling this biosensor to a platform known as split-intein circular ligation of proteins and peptides (SICLOPPS)65. Randomized peptide-coding DNA sequences are coupled to DNA sequences coding for self-excising proteins known as inteins, which are split on either side of the peptide-coding sequence in a plasmid. Following expression of the whole construct from the plasmid, the split-inteins come into close proximity, favouring an intracellular native chemical ligation reaction, which results in the release of the intein and the generation of a cyclic peptide66. This approach can be used to generate plasmid libraries that encode 10⁶–10⁹ cyclic peptides.
The coupling of SICLOPPS libraries to the reverse two-hybrid system has been used to discover novel inhibitors of protein–protein interactions and transcription factors67,68. An example of the potential to address highly challenging targets with this approach focused on inhibiting the subunit heterodimerization of a transcription factor, hypoxia-inducible factor 1 (HIF1)69 (Fig. 4). First, a 3.2-million-member cyclic peptide library was evaluated, with survival of the cells producing the peptide coupled to inhibition of the HIF1α–HIF1β interaction. Evaluation of the surviving cell colonies (including steps to eliminate false positives) led to the isolation of four cyclic peptides that inhibited the HIF1α–HIF1β interaction, the most potent of which bound to HIF1α with a Kd of 124 nM. This peptide was then coupled to a transactivating transcriptional activator (TAT) cell-penetrating peptide to allow evaluation in cells, which demonstrated dose-dependent inhibition of hypoxia (IC50 = 19 μM). The peptide was also shown to disrupt the HIF1 dimer without disrupting the HIF2 dimer, a notable contrast to a previously reported inhibitor70. Additional developments with SICLOPPS have allowed the incorporation of unnatural amino acids, and such an approach was successfully used to identify HIV protease inhibitors71.
Directed evolution using chemical approaches. For many years, chemists have tried to mimic nature by developing chemical evolutionary approaches, such as DNA-templated synthesis72, dynamic combinatorial libraries73 and de novo computational design coupled to evolutionary algorithms to guide the virtual evolution of potential hit compounds (for recent publications, see Refs 74,75,76). These technologies have not yet had a significant impact on the drug discovery process. However, there has been notable success with the use of DNA encoding in the chemical synthesis of large libraries. DNA-encoded libraries have become part of mainstream hit-identification procedures and have enabled the expansion of high-throughput screening beyond the typical library size of ~10⁶ small molecules. However, the DNA tag merely records the history of the synthetic route to the appended small molecule and does not direct its synthesis, whereas DNA-templated synthesis offers the opportunity to both record and direct the synthetic trajectory.
A DNA-programmed version of the split-and-mix solid phase synthesis77 of a combinatorial chemical library with a 384-letter code has recently been demonstrated78 and adapted to 384-well microtitre plates (Fig. 5a). In this system, codons within each DNA sequence code for a particular reaction or reactant, and the DNA sequence can contain multiple reaction codons. At each round of synthesis, the DNA sequence can be diversified by recombination or mutation to introduce further variability and explore combinations of successfully selected features as the hit molecule grows step by step. Using this approach in 4 generations/coupling steps, incorporating 17 different amino acids in the first 3 steps and adding an eighteenth dipeptide in the fourth step, tetrapeptide and pentapeptide substrates for protein kinase A (PKA) were discovered, with an enrichment factor of over 15,000-fold over random 4-step assembly78. This technique has the potential to encode a library of complexity 384ⁿ, where n is the number of steps.
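The growth of library complexity with the number of coupling steps can be made concrete with a short Python calculation. The sketch below takes the 384-letter code from the text; the function names `library_complexity` and `steps_to_exceed`, and the comparison threshold of 10⁶ compounds (the typical high-throughput screening library size mentioned earlier), are illustrative choices.

```python
CODE_SIZE = 384  # number of letters in the code, as stated in the text


def library_complexity(n_steps, code_size=CODE_SIZE):
    """Number of encodable products after n_steps coupling steps (code_size ** n_steps)."""
    return code_size ** n_steps


def steps_to_exceed(threshold, code_size=CODE_SIZE):
    """Smallest number of steps whose library complexity exceeds the threshold."""
    n_steps = 1
    while library_complexity(n_steps, code_size) <= threshold:
        n_steps += 1
    return n_steps


for n in range(1, 5):
    print(n, library_complexity(n))
# 1 384
# 2 147456
# 3 56623104
# 4 21743271936

print(steps_to_exceed(10 ** 6))  # 3
```

On these numbers, three coupling steps already encode more products than a typical high-throughput screening collection, and four steps encode more than 2 × 10¹⁰.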
An alternative chemical approach is activity-directed chemical synthesis, in which diverse reaction array components and conditions are varied to produce a mixture of products, which is then screened79,80 (Fig. 5b). The active mixtures are then used to inform further reaction array design, resulting in directed evolution cycles. Finally, the most-active reaction mixtures are scaled up, and their components are purified and assayed. The concept was demonstrated with the evolution of a 4-cyano-3-trifluoromethylphenyl acetamide fragment (a moiety common to many androgen receptor agonists) in three rounds. The most potent compound after one cycle had an effector concentration for half-maximum response (EC50) value for androgen receptor agonism of 8.8 μM, whereas the most potent compound after three cycles had an EC50 value of 730 nM. Further medicinal chemistry efforts on two potent leads from the third cycle demonstrated structure–activity relationships spanning two orders of magnitude, showing the value of activity-directed synthesis in identifying novel, tractable and optimizable chemotypes.
As noted above, virtual screening now offers the opportunity to screen libraries of up to 10¹³ molecules. The directed evolution equivalent, de novo design, in which a molecule is grown virtually within an active site of a protein, guided by synthetic feasibility, drug–receptor affinity and evolutionary algorithms, has also been an active area of research for many years. So far, it has not developed far enough to become a routine drug discovery tool81. Computing power, the difficulty in controlling the growth of overly complex virtual molecules within the de novo design algorithm and the weakness of scoring functions to predict affinity have all been limiting factors82. With the development of 'deep-learning' methodologies, the growth in computing power, the continued development of computational chemistry algorithms such as free energy perturbation83, the encoding of tractable chemistry, and automated chemistry to regularly sample experimentally the virtual designs to guide the evolutionary algorithms, virtual directed evolution may still be a prosperous area in the future84,85,86.
Directed evolution using biosynthetic approaches. An exciting area in which directed evolution can be used to produce novel small molecules is based on exploiting the way that natural products such as non-ribosomal peptides and polyketides are made in nature. This area of research has been driven by developments in synthetic biology and advances in the understanding of BGCs for many natural products87,88. For example, such clusters contain genes coding for modular non-ribosomal peptide synthetases (NRPSs), with each module including a domain that recognizes a particular component amino acid, known as an adenylation domain (A-domain). Theoretically, novel peptides could be combinatorially biosynthesized by 'cutting and pasting' adenylation domains from other NRPSs, but in practice this often results in much lower yields of natural products and even complete dysfunction, probably owing to disruption of complex protein–protein interactions caused by the cut-and-paste approach. This problem can be overcome by directed evolution, as exemplified by two studies on the antibiotic andrimid. In one study89, a valine-specific A-domain in an NRPS responsible for assembly of the core of andrimid (AdmK) was replaced by a 2-aminobutyrate-incorporating A-domain from another bacterial strain. This resulted in substantially impaired andrimid production. However, by using cycles of mutagenic PCR and activity-based selection for antibiotic activity, clones were identified that could not only produce similar levels of andrimid to the wild-type strain but also novel analogues. In another study90, rather than attempting to rescue the activity of a chimeric NRPS, saturation mutagenesis was applied to targeted residues at the active site of AdmK to modify substrate specificity, and a mass spectrometry approach was used for selection, rather than antibiotic activity. This technique not only offered greater sensitivity but also provided information on the chemical structure of the produced andrimid analogues. Four clones were identified that produced three new andrimid derivatives and one previously described analogue, all of which were potent antibiotics.
Rare high producers of the target molecules naringenin and glucaric acid were found by coupling an intracellular biosensor linked to survival91 with high-throughput screening of mutant-strain libraries generated by genome-wide mutagenesis92. By targeting up to 18 Escherichia coli genomic loci to induce mutations in regulatory or coding sequences of genes on the biosynthetic pathway, nearly a billion pathway variants were investigated. To overcome 'cheater' cells — cells that survive selection by mutating the sensor machinery without producing the target molecule — a selection scheme was followed that toggled between negative and positive selection, to allow 'evolutionary escapees' to be removed as they arose. After four rounds of evolution, production of the target molecules was increased by 36-fold for naringenin and 22-fold for glucaric acid92.
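The logic of toggling between negative and positive selection can be illustrated with a small Python model. This is a sketch under our own assumptions rather than the published protocol: we assume that negative selection is applied under conditions in which the target molecule is absent, so that any cell whose sensor still fires must be a cheater. The `Cell` class, the threshold and the example library are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    name: str
    production: float       # relative titre of the target molecule (arbitrary units)
    sensor_always_on: bool  # True for a 'cheater' whose sensor fires without product


def sensor_on(cell: Cell, product_present: bool, threshold: float) -> bool:
    """Sensor output: ON for cheaters, or when product is made above the threshold."""
    if cell.sensor_always_on:
        return True
    return product_present and cell.production >= threshold


def negative_selection(cells: List[Cell], threshold: float) -> List[Cell]:
    """Assumed condition: no product is made, so any cell with the sensor ON is killed."""
    return [c for c in cells if not sensor_on(c, product_present=False, threshold=threshold)]


def positive_selection(cells: List[Cell], threshold: float) -> List[Cell]:
    """Product is made: only cells with the sensor ON survive."""
    return [c for c in cells if sensor_on(c, product_present=True, threshold=threshold)]


library = [
    Cell("low producer", 1.0, False),
    Cell("high producer", 20.0, False),
    Cell("cheater", 0.5, True),
]

positive_only = positive_selection(library, threshold=10.0)
toggled = positive_selection(negative_selection(library, threshold=10.0), threshold=10.0)

print([c.name for c in positive_only])  # cheater survives alongside the high producer
print([c.name for c in toggled])        # only the high producer remains
```

Positive selection alone cannot distinguish a true high producer from a cell with a constitutively active sensor; the preceding negative step removes the latter before it can take over the population.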
Inspired by how nature can repurpose enzymes for new chemical transformations when faced with a new environmental challenge22,93, researchers have also started to repurpose enzymes using directed evolution to catalyse chemical reactions that do not exist in nature94. Examples are as diverse as carbene- and nitrene-transfer reactions catalysed by cytochrome P450s evolved by directed evolution95,96, the redirection of cyclizations using terpene synthase97 and the adaptation of biosynthetic pathways to produce unnatural amino acids98. Some common guidelines for engineering enzymes have been proposed94. To optimize enzymes to a new chemical function, it is important that some trace of activity is present initially. In the context of applying directed evolution in this way in drug discovery, providing a biological system intended to produce bioactive small molecules with a chemical starting point that has some affinity for the target of interest could give evolution an important helping hand; that is, directed evolution would be better used as a technique in the hit-to-lead process rather than for initial hit identification.
Steps towards directed evolution of small-molecule inhibitors have been taken using randomly recombined biosynthetic pathways. Researchers at the company Evolva have developed a system (Fig. 6) in which yeast cells are used to synthesize novel small molecules. The yeast are engineered by applying horizontal transfer of genes from known biochemical pathways that produce drug-relevant chemical scaffolds, as well as genetic material from organisms with unknown genetic diversity that could provide new enzymatic activities, organisms reported to have a medicinal effect and organisms that can tolerate infections99. The genetic material is combined randomly on yeast artificial chromosomes (YACs) to produce yeast cells that have the potential of synthesizing large random libraries of small molecules. As an example to illustrate the potential of the platform in drug discovery, yeast strains containing YACs with combinations of genes derived from pathways, including those for alkaloid, flavonoid and polyunsaturated fatty acid biosynthesis, were prepared100. Screening was done in single cells using an in-cell brome mosaic virus (BMV) functional assay, such that a yeast cell survives only in the presence of an inhibitor of viral replication expressed from the YACs. In total, 10,208 clones were analysed further. From 35 clones generating the most-potent inhibitors, the authors synthesized, validated and characterized 74 new compounds (mostly with molecular mass of 200–350 Da), which included several novel chemical scaffolds, and 28 of these compounds showed activity in a secondary biological assay.
Continuous directed evolution
In the previously cited examples, the directed evolution systems support the steps of mutation, translation, selection, replication and hereditability separately, with frequent manual intervention needed to make the evolutionary cycle 'spin'. The aim of a continuous directed evolution system is to seamlessly integrate all of the steps into an uninterrupted cycle. Continuous evolution enables many more generations to be explored, meaning greater evolutionary steps may be taken, which may be important for exploration across complex proteins or even multi-enzyme systems producing small molecules.
Bacteriophages, with their high mutation rates and ease of manipulation, have provided an excellent platform for such efforts, beginning 50 years ago with in vitro studies101 and progressing to continuous directed evolution using bacteriophages evolving in bacterial hosts (see Ref. 102 for a review). Natural mutation rates of phages are high, and they can be raised further through the use of chemical or biochemical mutagens. Selection pressures include culture conditions, time for growth and even the concentration of phage, as they compete with each other to survive and reproduce.
Continuous directed evolution has been successfully achieved with the development of the phage-assisted continuous evolution (PACE) system103 (Fig. 7). PACE is capable of evolving any gene that can be linked to the production of M13 phage protein III (pIII), which is expressed on the phage surface and has a key role in entry into bacterial hosts such as E. coli. It has been used to optimize and evolve RNA polymerases104,105, proteases106 and genome-editing proteins107, as well as in the discovery of a protein that binds to a receptor108, as illustrated in the following examples.
PACE was used to rapidly explore the potential for hepatitis C virus (HCV) protease inhibitors to drive the appearance of resistant mutants106. An engineered HCV protease-activated RNA polymerase was used to couple polypeptide cleavage to changes in gene expression that supported pIII production and hence phage propagation during PACE. By performing PACE in the presence of the HCV protease inhibitors danoprevir or asunaprevir, it was possible to rapidly evolve inhibitor-resistant HCV protease mutants. In the presence of danoprevir, the most common mutation observed in HCV protease was D168E, which weakened the IC50 for danoprevir by 30-fold. In the case of asunaprevir, there was a strong bias for D168Y, which weakened the IC50 value by 30-fold, whereas the D168E mutation shifted the IC50 by only 10-fold. The D168Y mutant has been observed for asunaprevir in individuals with hepatitis. So, a 1–3 day continuous evolution experiment with PACE revealed clinically relevant mutations that would otherwise have needed expensive and lengthy laboratory or clinical experiments to explore.
Another example illustrates the use of PACE to improve genome-editing tools107. Transcription activator-like effectors (TALEs) are DNA-binding domains that provide the ability to specifically cut DNA when coupled to nucleases, forming TALE nucleases (TALENs)109. TALEs are limited in that the 5′-nucleotide of the target must be thymine. While promiscuous TALEs have been described, no variants exist that are specific to 5′-adenine, 5′-guanine or 5′-cytosine107. To couple TALEs to the PACE system, the DNA-binding domain was linked to a subunit of bacterial RNA polymerase, which established sequence-specific and binding-dependent production of pIII, enabling propagation of the phage. To suppress promiscuous variants that also recognized 5′-thymine, the investigators linked 5′-thymine recognition to a dominant-negative pIII, which compromises phage propagation. Through this simultaneous positive and negative selection, 5′-nucleotide-selective TALEs were evolved.
A third recent example, in which PACE was used to evolve Bacillus thuringiensis insecticide toxins to overcome B. thuringiensis toxin resistance, is particularly exciting for the potential use of PACE systems in drug discovery108. The protein Cry1Ac is a widely used B. thuringiensis toxin, but some insects are resistant to it owing to mutations in its receptor. Liu and colleagues used PACE in a two-hybrid system in E. coli to evolve new Cry1Ac analogues that could bind to the Cry1Ac-resistant cadherin-like receptor (CAD) from the cabbage looper moth (Trichoplusia ni), for which Cry1Ac has very low affinity (>1 mM). Cry1Ac analogues were evolved through an artificial evolutionary stepping-stone receptor, toxin-binding region 3 (TBR3), which contains three mutated residues to introduce weak Cry1Ac-binding activity. Continuous evolution was carried out in steps of increasing stringency: first, using the artificial TBR3 receptor and a moderately potent mutagenesis plasmid to avoid early mutations that could abolish Cry1Ac binding to CAD; and second, using full CAD with a higher mutational rate plasmid to access rare Cry1Ac mutational combinations to enhance binding to CAD. Consensus variants that contained the most common of the 25 mutations in Cry1Ac observed during the 527-hour experiment were designed and synthesized, and the resulting Cry1Ac analogues bound to CAD with affinities of 18–34 nM. Such an approach could, in principle, be applied not only to the directed evolution of any protein-based therapeutic but also more broadly to other therapeutic modalities, such as peptides or even small molecules; for example, to circumvent antibiotic resistance.
In vivo continuous evolution has recently been demonstrated in yeast, providing a eukaryotic continuous evolution system, and one that enables the evolution of phenotypes that cannot be easily linked to phage growth110. In this method, the 'cargo' to be optimized is cloned into a native inducible retrotransposon, Ty1, the regulation of which has been engineered to increase library size and mutation rate. The error-prone nature of Ty1 replication, and the capacity for retrotransposon cycling, provides a novel mechanism for in vivo continuous evolution. As a proof of concept, in vivo continuous evolution was applied to a multi-enzyme pathway to increase xylose catabolism110, an important process for lignocellulose conversion in the biofuels industry111. The pathway contains a promoter, a xylose isomerase (XylA) and a xylulokinase (Xks1). Growth on xylose-containing agar plates provided the selection pressure. Starting from the native proteins, over the course of 1 week, a superior isolate emerged, which displayed a 21% increase in exponential growth rate over control and a shorter lag phase, driven by one mutation (Xks1-E164K). When the native XylA was replaced with a previously identified mutant XylA3*, which had already shown a 77% improvement in xylose consumption rate, in vivo continuous evolution generated further improving mutations, this time in XylA. The authors noted that the improvements obtained differed depending on the starting point, indicating the context-specific nature of continuous evolution.
Directed evolution has been a driving technology in the field of therapeutic antibodies112,113. With the renewed interest in cyclic peptides as a drug modality, the tools of directed evolution are crossing into medicinal chemistry. As the examples in this article highlight, the tools of directed evolution can be used to drive multi-objective optimization of potency, off-target selectivity and stability of cyclic peptides and small molecules. Beyond the peer-reviewed scientific literature, further potentially exciting innovations are being described in the patent literature by emerging biotechnology companies, which are already making deals with major pharmaceutical companies. Some examples of these companies are given in Table 2.
Although the techniques of directed evolution are mature for protein optimization and increasing in scope for peptide drug discovery, small molecules pose additional challenges, such as the greater diversity of building blocks and chemical reactions involved. The specificity of cellular biosynthetic machinery for small molecules produced in this way is also limiting, as attempting to diversify chemistry using a BGC can lead to synthetic route interruptions owing to the lack of recognition by one or more of the enzymes in the pathway. However, this limitation can be overcome by directed evolution of the enzymes, as illustrated in the examples above. Using biosynthetic pathways, the uncovering of cryptic BGCs and the understanding of their regulation, design and construction offers potential for broader use of small-molecule directed evolution32,114. New bioinformatic and computational tools could allow the retrosynthetic design of gene clusters and a greater understanding of substrate promiscuity and flexibility of the associated enzymes to introduce diversity115. New technologies, such as microfluidics, could help improve synthetic biology processes and generate large combinatorial libraries of plasmids116.
To increase the size of libraries of natural product-like small molecules and broaden the diversity through the introduction of non-natural substrates, biosynthesis could be combined with bio-orthogonal synthetic organic chemistry to provide larger and more diverse libraries in the host cell. As the evolutionary cycles progress and the successful cells propagate, the introduction of non-natural substrates for bio-orthogonal chemistry could be refined, in a similar fashion to a medicinal chemistry team 'homing in' on the clinical candidate. Examples described herein already demonstrate the value of hybrid approaches combining directed evolution with more-traditional medicinal chemistry transformations and/or structure–activity relationship (SAR)-guided optimization59,60.
Another important challenge for biosynthesis-based evolutionary approaches to produce small molecules is the decoupling of the coding genotype from the produced and selected molecule; that is, the breaking of the genotype–phenotype link. For proteins or peptides, the translated product can be engineered to remain directly or indirectly coupled to its encoding RNA or DNA, thus maintaining the genotype–phenotype link. Examples include the protein or peptide being expressed on the surface of the encoding cell in phage display, the displayed peptide being covalently bound to its encoding RNA in mRNA display, and cellular sequestration, as in the SICLOPPS technology. The selected protein or peptide primary structures can then be recovered by sequencing the encoding DNA or RNA. However, for biosynthesized small molecules where the direct link to genetic encoding is lost through the intermediary step of their synthesis via enzymes, the structure of the selected molecule itself needs to be determined. Although this is a difficult problem, it is not insurmountable, and it has been a challenge faced for many years by those involved in natural-product screening. Classical fermentation scale-up of successful clones followed by micro-fractioning is one approach. When the lead compound is identified, either it can be scaled up by traditional synthetic organic chemistry100, or the host can be optimized to produce more of the required compound, as was successfully applied in the recent discovery of the novel peptide antibiotic lugdunin from the human nasal commensal bacterium Staphylococcus lugdunensis IVK28 (Ref. 117).
An added complexity for a permeable small molecule is that it can easily diffuse away from the cell that produces it. Permeability can thus break the genotype–phenotype link and allow the molecule to activate biosensors in neighbouring cells, enabling cheater cells to survive. This can be addressed by compartmentalization using droplet emulsions and microfluidics118. For example, a method has recently been disclosed in which individual members of a library of mutant bacterial cells that produce a natural product can be encapsulated in microdroplets suspended in an immiscible carrier liquid, with a target cell containing the biosensor, allowing selection119. The droplet compartmentalization of one producer cell with one sensor cell maintains the genotype–phenotype link and may also allow directed evolution to select for both potency and cell permeability.
The time required to assemble the required biology within an evolutionary system may also be an important factor to consider when deciding to pursue such approaches in drug discovery. In the example discussed above from the company Evolva, it took 6–9 months to assemble the screening system and produce 74 molecules100, which seems comparable to the typical pharmaceutical company time of 1 year from target to hit1. However, as mentioned in the introduction, many of the highest-profile targets yield no tractable starting points, and the aspiration of having a tractable lead for every well-validated target120 may make investment in establishing a directed evolution system worthwhile, even if it takes longer than is required to establish a traditional medicinal chemistry screening cascade.
Although directed evolution may produce leads where traditional hit-identification and hit-to-lead approaches have failed, the leads will most likely still require the current expertise of medicinal chemists for the optimization of other key parameters required for candidate drugs, such as permeability, pharmacokinetics, safety aspects and formulation. Even though directed evolution is embedded in therapeutic antibody discovery, some aspects of developability are not implicitly selected for by the process. The antibody's stability, pharmacokinetics and immunogenicity still need to be considered121. As examples have shown59,60, leads from directed evolution can still benefit from rational design and multi-objective improvements by medicinal chemistry. Thus, directed evolution can be seen as an approach to augment the current toolbox and processes for medicinal chemists, rather than replace them.
Finally, drug discoverers have the greatest chance of success in finding new therapies if, like nature, they can select the best possible modality for a given target and apply the most-appropriate lead-discovery technologies, rather than remain wedded to their training and background122. In this respect, cyclic-peptide drug discovery is showing the way, by crossing over technologies developed for discovering and optimizing therapeutic proteins to identify small molecules. Collectively, academic institutions, pharmaceutical companies and biotechnology companies have many tools. By using them in innovative ways, a transformation in the productivity of hit and lead identification based on directed evolution could be the next revolution in drug discovery.
The authors are grateful to M. Wigglesworth and R. Maciewicz for their critical review of the manuscript.
- Chemical space
A multi-dimensional conceptual region defined by a set of descriptors. For example, 'drug-like' chemical space (defined by limiting the space to molecules with a molecular mass <500 Da, fewer than 30 C, H, N, O or S atoms and fewer than 4 rings) has been estimated to be as large as 10⁶³ molecules.
- Continuous directed evolution
A directed-evolution method that resembles natural evolution, in which fitness is inherited by subsequent generations without manual intervention. As success in directed evolution depends on the number of rounds completed, removing manual steps can dramatically increase the speed of each round, the number of rounds that can be completed and hence the complexity of evolutionary changes that could be driven.
- De novo computational design
The design of compounds based purely on the protein structure, through the computational docking of fragments into an active site and their computational growth using feasible in silico chemical steps to increase calculated binding affinity.
- Design–make–test–analyse cycles
(DMTA cycles). The repetitive central process in lead optimization, involving a cycle of four steps: design (a hypothesis is constructed to improve the profile of the lead molecule); make (compounds exemplifying the design are synthesized); test (synthesized compounds of confirmed structure and purity are tested in one or more carefully constructed and controlled assays); and analyse (the experimental data are analysed and the results are used to amend a design hypothesis for the next cycle).
- Directed evolution processes
Methods that mimic the processes of evolution but are directed towards a user-defined goal.
- DNA-encoded libraries
Very large mixtures of molecules generated using a split-and-pool approach and used for ultra-high-throughput screening. Each synthesized molecule is covalently bound to a DNA fragment, which records the synthetic steps that have been taken to create the small molecule. An immobilized protein target is used to select binders from a pool of DNA-tagged molecules. The structure of the binders is deduced by sequencing the appended DNA tag.
- DNA shuffling and recombination
A way to propagate beneficial mutations by recombining DNA segments from several gene sequences or gene pools from a directed evolution experiment.
- DNA-templated synthesis
A process in which the DNA heteroduplex is used to bring two complementary DNA fragments bearing different reacting molecules into close proximity, increasing the reaction rate by several orders of magnitude. The synthesis of a chemical library is not just encoded in a sequence-dependent manner, but can be used to direct the order of chemical reactions.
- Dynamic combinatorial libraries
Collections of molecules formed from reversible reactions of reagents under thermodynamic control. All species are interconverting at equilibrium. In the presence of a binding protein that binds one or more molecules, the equilibrium is shifted, and the system becomes enriched with the binding moieties.
- Evolutionary algorithms
A subset of machine-learning algorithms inspired by biological evolution. Candidate solutions are individuals in a population, and a fitness function defines their quality and acts as a selection. Successful features from individuals are mutated and/or recombined to form the next generation of individuals, for further selection based on the fitness function.
- Fragment-based drug design
An approach by which small, weakly binding chemical fragments (typically with a molecular mass of 100–200 Da) that bind to a protein target are identified and optimized to higher-affinity leads, usually guided by structural information on the fragment–target interaction from techniques such as X-ray crystallography.
- mRNA display
An in vitro ribosome translation system for peptides and proteins. mRNA display uses the antibiotic puromycin, which causes premature chain termination on the ribosome. The cDNA is transcribed into mRNA libraries, and the 3′-end of each mRNA is coupled via a spacer oligonucleotide to puromycin. The oligonucleotide spacers allow effective translation and termination. The attached puromycin can react with the growing peptide chain, forming a covalent link between the peptide and its encoding mRNA, making the genotype–phenotype link. Selection is made based on the affinity of the peptide or protein with its attached coding mRNA for an immobilized target.
- Non-ribosomal biosynthetic pathways
Pathways that biosynthesize the cores of many natural products based on peptides and polyketides. These involve large modular enzyme complexes known as non-ribosomal peptide synthetases and polyketide synthases.
- Phage display
An in vivo translation system that uses bacteriophage to maintain the link between translated peptides or proteins and the DNA that encodes them. cDNA for the protein or peptide of interest is inserted into the phage coat protein gene, and phage progeny in Escherichia coli 'display' the target protein on their surface, attached to the coat protein. Selection is achieved by affinity for an immobilized target. After elution of binders, affinity maturation is achieved by further rounds of amplification, which introduces further variability in the selected DNA sequences. The amino acid sequence of the optimized binder can be deduced by sequencing the coding DNA of the selected phage.
- Pharmacophore
The steric and electronic features in a ligand that result in the optimal molecular interactions of the ligand with a specific biological target, typically modulating a biological response.
- Retrotransposon
A genetic element that can amplify itself in a genome via a 'copy–paste' mechanism involving transcription into RNA and reverse transcription back into DNA, which can then be inserted at various positions in the genome. Retrotransposons are common components of eukaryotic cells.
- Ribosome display
An in vitro translation system for peptides and proteins. The initial cDNA library is fused to a spacer sequence lacking a stop codon. The cDNA is transcribed to mRNA. The mRNA is translated to protein on the ribosome, but the lack of stop codon prevents release factors binding and disassembling the translational complex. Therefore, the spacer sequence remains attached to the tRNA and bound to the ribosome, with the peptide chain protruding, allowing folding. The resulting complex of RNA, ribosome and protein can be selected by the affinity of the protruding protein for its ligand, and sequencing of the mRNA enables the identification of the protein sequence of the bound proteins.
- Site saturation mutagenesis
A method by which one or more codons can be randomized to produce all possible amino acids at chosen positions within the DNA.
- Split-and-mix solid phase synthesis
A method for the synthesis of large combinatorial compound libraries. A solid-phase-supported reagent is split equally, and each portion is reacted with a different reagent. After washing, the individual portions are recombined and mixed. Subsequent rounds of splitting, reaction and recombination generate a final library of xⁿ compounds, where x is the number of starting portions and n is the number of rounds.
- Structure–activity relationships
The links between structural changes and changes in the biological activity of a series of tested molecules. The deduction of these links is a fundamental concept in medicinal chemistry, and the derived structure–activity relationships are used to guide design–make–test–analyse cycles.
About this article
Nature Reviews Chemistry (2018)