Review Article | Published:

Methods for the directed evolution of proteins

Nature Reviews Genetics volume 16, pages 379–394 (2015)


Directed evolution has proved to be an effective strategy for improving or altering the activity of biomolecules for industrial, research and therapeutic applications. The evolution of proteins in the laboratory requires methods for generating genetic diversity and for identifying protein variants with desired properties. This Review describes some of the tools used to diversify genes, as well as informative examples of screening and selection methods that identify or isolate evolved proteins. We highlight recent cases in which directed evolution generated enzymatic activities and substrate specificities not known to exist in nature.

Key points

  • Directed evolution is a cyclic process that alternates between gene diversification and screening for or selection of functional gene variants.

  • Library size limitations can be overcome by focusing library diversity on residues implicated by molecular structures, computational models or phylogenetic data. In cases in which there is limited information, random mutagenesis can be used to interrogate the uncertain determinants of protein function.

  • Recombination methodologies access new combinations of functional variation and can shuffle disparate genetic elements to yield new chimeric proteins.

  • Low-throughput screens can directly measure individual phenotypes and thus accurately isolate desired subpopulations. Screen throughput can be increased using indirect visible reporters that are strongly coupled to the desired phenotypes.

  • Selections isolate functional variants through selective replication schemes or physical segregation. Selections operate simultaneously on entire populations and thus offer unparalleled throughput.


Over many generations, iterated mutation and natural selection during biological evolution provide solutions for challenges that organisms face in the natural world. However, the traits that result from natural selection only occasionally overlap with features of organisms and biomolecules that are sought by humans. To guide evolution to access useful phenotypes more frequently, humans for centuries have used artificial selection, beginning with the selective breeding of crops1 and domestication of animals2. More recently, directed evolution in the laboratory has proved to be a highly effective and broadly applicable framework for optimizing or altering the activities of individual genes and gene products, which are the fundamental units of biology.

Genetic diversity fuels both natural and laboratory evolution. The occurrence rate of spontaneous mutations is generally insufficient to access desired gene variants on a time scale that is practical for laboratory evolution. A number of genetic diversification techniques are therefore used to generate libraries of gene variants that accelerate the exploration of a gene's sequence space. Methods to identify and isolate library members with desired properties are a second crucial component of laboratory evolution. During organismal evolution, phenotype and genotype are intrinsically coupled within each organism. However, during laboratory evolution (Fig. 1a), it is often inconvenient or impossible to manipulate genes and gene products in a coupled manner. Therefore, single-gene evolution in the laboratory requires carefully designed strategies for screening or selecting functional variants in ways that maintain the genotype–phenotype association.

Figure 1: Key steps in the cycle of directed evolution.

a | The process of directed evolution in the laboratory mimics that of biological evolution. A diverse library of genes is translated into a corresponding library of gene products and screened or selected for functional variants in a manner that maintains the correspondence between genotype (genes) and phenotype (gene products and their functions). These functional genes are replicated and serve as starting points for subsequent rounds of diversification and screening or selection. b | Although the mutational space is multidimensional, it is conceptually helpful to visualize directed evolution as a series of steps within a three-dimensional fitness landscape. Library generation samples the proximal surface of the landscape, and screening or selection identifies the genetic means to 'climb' towards fitness peaks. Directed evolution can arrive at absolute maximum activity levels but can also become trapped at local fitness maxima in which library diversification is insufficient to cross 'fitness valleys' and access neighbouring fitness peaks.
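The cycle of diversification, screening and replication described in the caption above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a model of any experiment cited here: the fitness function (number of positions matching an arbitrary target string), the library size and the per-residue mutation rate are all hypothetical. Because this toy landscape has a single smooth peak, it does not reproduce the trapping at local maxima shown in Fig. 1b.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fitness(seq, target):
    """Toy fitness: number of positions matching a hypothetical optimum."""
    return sum(a == b for a, b in zip(seq, target))


def diversify(parent, library_size, rate, rng):
    """Build a library by random point substitution of the parent sequence."""
    library = []
    for _ in range(library_size):
        variant = [
            rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
            for aa in parent
        ]
        library.append("".join(variant))
    return library


def evolve(start, target, rounds=10, library_size=200, rate=0.1, seed=0):
    """Alternate diversification and screening; keep the best variant so far."""
    rng = random.Random(seed)
    parent = start
    history = [fitness(parent, target)]
    for _ in range(rounds):
        library = diversify(parent, library_size, rate, rng)
        best = max(library, key=lambda s: fitness(s, target))
        # Only an improved variant seeds the next round.
        if fitness(best, target) > fitness(parent, target):
            parent = best
        history.append(fitness(parent, target))
    return parent, history
```

The returned history records the fitness of the parent after each round, so it never decreases; how quickly it rises depends entirely on the assumed library size and mutation rate.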

In this Review, we summarize techniques that generate single-gene libraries, including standard methods as well as novel approaches that can generate superior diversity containing a larger proportion of functional mutants. We also review screening and selection methods that identify or isolate improved variants within these libraries. Although these strategies can be applied to multigene pathways3,4 and gene networks5,6,7, the examples in this Review will focus exclusively on the laboratory evolution of single genes. In addition, although many of these approaches apply to other types of biomolecules, we focus on the directed evolution of proteins because protein evolution has proved to be especially useful for generating novel biocatalysts8, reagents9 and therapeutics10.

Methods for gene diversification

It is impossible to cover the entire mutational space of a typical protein: complete randomization of a mere decapeptide would yield roughly 10^13 unique combinations of amino acids, which exceeds the achievable library size of almost all known protein library creation methods. Because comprehensive coverage of sequence space is impossible, gene diversification strategies are designed to perform an optimal sparse sampling of a vast multidimensional sequence space. The activity level of each library member can be conceptualized as the elevation in a fitness landscape on an x–y coordinate that represents the genotype of that library member. The goal of directed evolution studies is to take mutational steps within this landscape that 'climb' towards peak activity levels (Fig. 1b). Over many generations, these beneficial mutations accumulate, resulting in a successively improved phenotype.
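The decapeptide figure quoted above follows from simple counting: 20 possible amino acids at each of 10 positions. The short sketch below reproduces that arithmetic and estimates what fraction of the space a library could sample. The library size of 10^9 used in the usage note is an illustrative assumption, and the fraction is an upper bound because it assumes every library member is distinct.

```python
def sequence_space(length, alphabet_size=20):
    """Number of distinct sequences of a given length."""
    return alphabet_size ** length


def fraction_sampled(library_size, length, alphabet_size=20):
    """Upper bound on the fraction of sequence space a library can cover."""
    return min(1.0, library_size / sequence_space(length, alphabet_size))


print(f"{sequence_space(10):.3e}")  # 1.024e+13
```

Under these assumptions, `fraction_sampled(10**9, 10)` is just under 10^−4, so even a large library samples less than 0.01% of decapeptide space.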

Researchers can use focused mutagenesis to maximize the likelihood that a library contains improved variants, provided that amino acid positions that are likely determinants of the desired function are known. In the absence of plausible structure–function relationships, random mutagenesis can provide a greater chance of accessing functional library members than focusing library diversity on incorrectly chosen residues that, when mutated, do not confer desired activities. Researchers have developed an extensive range of methods to perform both forms of gene diversification, and the most successful strategies often integrate random and focused mutagenesis.

Random mutagenesis. Traditional genetic screens use chemical and physical agents to randomly damage DNA. These agents include alkylating compounds such as ethyl methanesulfonate (EMS)11, deaminating compounds such as nitrous acid12, base analogues such as 2-aminopurine13, and ultraviolet irradiation14. Chemical mutagenesis is sufficient to deactivate genes at random for a genome-wide screen but is less commonly used for directed evolution because of biases in mutational spectrum11,12.

Non-chemical methods to randomly mutate genes frequently enhance the rate of errors during DNA replication. In Escherichia coli, DNA replication by DNA polymerase III introduces mutations at a rate of 10^−10 mutations per replicated base15. This rate is increased in mutator strains containing deactivated proofreading and repair enzymes, mutS, mutT and mutD15,16,17. Transformation of the XL1-red strain with a plasmid bearing the evolving gene yields mutations at a rate of 10^−6 per base per generation16. Unfortunately, these strains not only mutate the library member but also induce deleterious mutations in the host genome. Host intolerance to a high degree of genomic mutation places an upper limit on in vivo mutagenesis rates. To avoid this constraint, C. C. Liu and co-workers18 developed orthogonal in vivo DNA replication machinery that only mutates target DNA. This method co-opts naturally occurring Kluyveromyces lactis linear plasmids pGKL1/2 and their specialized TP-DNA polymerases. Because this plasmid is exclusively cytoplasmic, the TP-DNA polymerase exerts no mutational load on the host genome within the nucleus of Saccharomyces cerevisiae.

The relatively low mutation rates and the lack of control offered by most previously described in vivo random mutagenesis protocols have led to a strong preference towards in vitro random mutagenesis strategies. In error-prone PCR (epPCR), first described by Goeddel and co-workers19, the low fidelity of DNA polymerases under certain conditions generates point mutations during PCR amplification of a gene of interest. Increased magnesium concentrations, supplementation with manganese or the use of mutagenic dNTP analogues20 can reduce the base-pairing fidelity and increase mutation rates to 10^−4 to 10^−3 per replicated base21. Because mutations during PCR accumulate with each cycle of amplification, it is possible to increase the average number of mutations per clone by increasing the number of cycles.
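The dependence of mutational load on amplification can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that mutations arise independently at a constant per-base rate each time a molecule's lineage is copied, so that the number of mutations per clone is approximately Poisson distributed. The gene length and number of duplications in the usage note are illustrative values, and the relationship between thermal cycles and duplications per lineage depends on amplification efficiency, which is not modelled here.

```python
import math


def expected_mutations(rate_per_base, gene_length_bp, duplications):
    """Mean mutations per clone after a given number of copying events."""
    return rate_per_base * gene_length_bp * duplications


def fraction_unmutated(mean_mutations):
    """Poisson probability that a clone carries no mutation at all."""
    return math.exp(-mean_mutations)
```

For example, a rate of 10^−4 per replicated base applied to a hypothetical 1,000-bp gene over 10 duplications gives a mean of about one mutation per clone, with roughly 37% of clones expected to remain unmutated.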

One application of epPCR is to generate neutral drift libraries. Before directed evolution experiments are carried out, a target gene is mutagenized by epPCR and fused to a GFP reporter, and the variants are then screened for proper protein expression22. After multiple rounds of mutagenesis and screening, the resulting neutral drift library exhibits sequence diversity that does not destabilize protein structure and is therefore largely devoid of the deleterious mutations that would otherwise have accumulated during the multiple rounds of mutagenesis. Such libraries provide a valuable and evolvable starting point for subsequent directed evolution of the target protein towards a phenotype of interest22.

The DNA polymerases used in epPCR exhibit mutational biases, but unbalanced dNTP concentrations and proprietary mixtures of polymerases can help to reduce imbalance in the mutational spectrum23,24. To yield a more ideal nucleotide mutational spectrum, Schwaneberg and co-workers25 developed sequence saturation mutagenesis (SeSaM) in which the universal base deoxyinosine is enzymatically inserted throughout the target gene. Although this approach is effective, epPCR is easier to implement and can provide high mutation rates with fairly broad mutational spectra.

Focused mutagenesis strategies. Many proteins are structurally characterized at sufficient resolution to implicate specific residues in substrate binding or catalysis. Although random mutagenesis can generate stochastic point mutations at codons corresponding to these residues, access to codons that require mutation of more than one nucleotide relative to the initial codon often requires a focused mutagenesis strategy. Perhaps the most straightforward focused mutagenesis approach uses synthetic DNA oligonucleotides containing one or more degenerate codons at positions corresponding to targeted residues. This mutagenic oligonucleotide is incorporated into a gene library as a mutagenic cassette26 using either traditional restriction enzyme cloning or contemporary gene assembly protocols27,28,29. The simultaneous saturation mutagenesis of multiple residues can access combinations of mutations that may exhibit epistatic interactions. For example, synergistic mutations are those that in combination confer an effect that is larger than the sum of the effects of each individual mutation. Two beneficial mutations that exhibit synergism can undergo sequential enrichment and are therefore accessible through iterative single-residue saturation libraries. However, to access combinations of mutations exhibiting sign epistasis — a case in which mutations may be deleterious in isolation but confer gain of function in combination — sequential acquisition is impossible, and simultaneous saturation is therefore necessary.
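The limited reach of point mutagenesis at the codon level can be checked directly against the standard genetic code. The sketch below enumerates the nine single-nucleotide neighbours of a codon and collects the amino acids they encode; the function and variable names are illustrative and not taken from any cited method.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def single_step_amino_acids(codon):
    """Amino acids reachable from a codon by changing exactly one nucleotide."""
    start = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbour = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[neighbour]
            # Ignore stop codons and synonymous changes.
            if aa not in ("*", start):
                reachable.add(aa)
    return reachable
```

Starting from the tryptophan codon TGG, for instance, a single nucleotide change reaches only five other amino acids (C, G, L, R and S); the remaining fourteen require two or three changes in the same codon, which is why degenerate oligonucleotides are used to saturate a position.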

As the number of unique sequences increases exponentially with the number of randomized sites, only a handful of residues can be randomized if complete coverage of the resulting combinations of mutations is desired. Furthermore, the vast majority of individual mutations are likely to be neutral or deleterious to the desired activity30. The mutational load of simultaneous saturation increases with the number of randomized sites, and the resulting library will be populated with a larger fraction of inactive library members.
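The exponential growth in library size can be quantified with a simple sampling estimate. The sketch below assumes that each site is randomized with an NNK degenerate codon (32 codons, a common choice that is not specified in the text above) and that all variants are equally represented, so that the expected fraction of variants present among N sampled clones is 1 − exp(−N/V) for V possible variants.

```python
import math


def variants(sites, codons_per_site=32):
    """Number of distinct codon combinations for simultaneous saturation."""
    return codons_per_site ** sites


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to sample so that a given fraction of variants is expected."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))
```

Under these assumptions, three randomized sites give 32,768 codon combinations and require about 98,000 clones for 95% expected coverage, roughly a threefold oversampling. Each additional site multiplies both numbers by 32, which quickly exceeds the throughput of most screens.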

For this reason, a number of focused mutagenesis strategies only introduce specific amino acid substitutions that are likely to be beneficial. Phylogenetic analyses of homologous proteins, which are pre-enriched for functional variation owing to natural selection, are one means for identifying these potentially beneficial mutations. Wyss and co-workers31 demonstrated that the introduction of consensus mutations can improve thermostability and native enzymatic activity. Rather than focusing on common ancestral mutations, reconstructed evolutionary adaptive path (REAP) analysis identifies significant mutational divergence that is more likely to confer novel gain of function32. These mutational signatures can be adopted from a distinct evolutionary pathway with known phenotypic characteristics and further curated based on structural proximity to the active site.

Molecular modelling can also predict specific amino acid substitutions that are likely to be beneficial33. Algorithms such as Rosetta calculate free energies based on steric clashes, hydrophobic packing, hydrogen bonding and electrostatic interactions34. Mutations that are predicted to stabilize protein folding35 or to improve transition state stabilization can be introduced into the library semi-stochastically by incorporating synthetic oligonucleotides via gene reassembly (ISOR)36.

Diversification by recombination. The reassortment of mutations to access beneficial combinations of mutations is a crucial component of biological evolution. This natural process can be mimicked by a variety of methods under the broad umbrella of homologous recombination. The original DNA shuffling method described by Stemmer37 fragments a gene with DNase and then allows fragments to randomly prime one another in a PCR reaction without added primers (Fig. 2a). A related method developed by Monticello and colleagues38, random chimeragenesis on transient templates (RACHITT), also uses DNase-mediated fragmentation but a different method of reassembly. Fragments anneal directly to a temporary uracil-containing scaffold; upon flap resection and fragment ligation, the scaffold is digested. DNase concentration and fragmentation reaction duration offer crude mechanisms to shift fragment sizes and crossover frequencies, but newer protocols provide greater control. Nucleotide exchange and excision technology (NExT)39 incorporates a fixed concentration of deoxyuridine triphosphate (dUTP) during PCR; subsequent treatment with uracil deglycosylases and apurinic/apyrimidinic lyases yields random fragments with size distribution determined by dUTP concentration. Unlike fragmentation-based methods, staggered extension process (StEP) described by Arnold and colleagues40 is a modified PCR protocol in which the elongation step is interrupted prematurely by heat denaturation. Subsequent annealing allows incomplete extension products to switch templates, effecting recombination of multiple DNA templates into one amplicon (Fig. 2b).
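Template switching of the kind described for StEP can be illustrated with a toy model. The sketch below is a deliberate simplification and not a description of the published protocol: it assumes pre-aligned parents of equal length, draws each extension length from an exponential distribution with a hypothetical mean, and picks the next template uniformly at random, so a 'switch' may occasionally land on the same parent.

```python
import random


def step_recombine(parents, mean_extension=10, seed=None):
    """Build one chimera by repeated partial extension and template choice."""
    rng = random.Random(seed)
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    segments, sources = [], []
    pos = 0
    while pos < length:
        template = rng.randrange(len(parents))
        # Extension length before the next premature denaturation (at least 1).
        extension = max(1, int(rng.expovariate(1.0 / mean_extension)))
        end = min(length, pos + extension)
        segments.append(parents[template][pos:end])
        sources.extend([template] * (end - pos))
        pos = end
    return "".join(segments), sources
```

The second return value records which parent contributed each position, which makes it straightforward to count crossovers. In this model, lowering `mean_extension` increases the crossover frequency, loosely mirroring the effect of shorter extension times.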

Figure 2: Genetic recombination methods for protein evolution.

a | DNA shuffling begins with DNase fragmentation of homologous gene variants. These fragments prime one another in a PCR reaction. The cycle of annealing, extending and denaturing fragments reassembles full-length gene products containing recombined segments. b | Staggered extension process (StEP) achieves homologous recombination by repeated premature denaturation during the extension stage of a PCR. Because partially polymerized genes may switch templates, the resulting full-length gene products are chimaeras with a varying number of crossovers. c | Assembly PCR or synthetic shuffling methods require overlapping libraries of oligonucleotides that encode genetic variation at a number of loci. These primers extend one another in various combinations to yield recombined gene products. d | Incremental truncation for the creation of hybrid enzymes (ITCHY) uses exonucleases to degrade the parts of genes that encode the free amino or carboxyl termini of prospective fusion protein partners. Blunt-end ligation yields single crossovers with varying fractional composition from each template gene. e | Non-homologous random recombination (NRR) uses DNase fragmentation followed by blunt-end ligation to generate diverse topological rearrangements (deletions, insertions and domain reordering). Note that DNA fragments are not drawn to scale. Typical recombined fragments range in length from dozens to hundreds of bases.

With the decreasing cost of synthetic oligonucleotides, assembly PCR41 (also known as assembly of designed oligonucleotides (ADO) or synthetic shuffling42,43) has become a preferred recombination strategy. In these reactions, overlapping primers extend one another; after multiple cycles the process yields full-length gene products in which each combination of mutation-bearing oligonucleotides has been recombined (Fig. 2c).

Recombination is only effective on a diverse population of functional genes. Typically, one of the recombination methods described above is used between rounds of evolution to recombine mutations from distinct clones44. Alternatively, homologous recombination with copies of the wild-type DNA sequence can eliminate non-beneficial passenger mutations, analogous to the traditional breeding technique of back-crossing37. In another effective use of homologous recombination during gene diversification, a family of closely related naturally occurring homologues can be shuffled into a starting library to take advantage of nature's pre-evolved repertoire of functional gene variants45.

Methods for in vitro recombination require substantial manual manipulation and are usually followed by transformation or transduction to introduce the recombined gene population back into cells. Cornish and colleagues46 harnessed the power of native systems in S. cerevisiae to perform homologous recombination between a library of donor cassettes and the evolving gene. Through yeast mating, functional gene variants undergo reassortment with different donor cassettes, allowing homologous recombination within the evolving population. Seamless alternation between sexual reproduction and selection can support continuous evolution46.

The recombination methods described above rely on sequence homology to preserve gene structure among recombinants. By contrast, sequence homology-independent protein recombination (SHIPREC) permits shuffling of disparate gene elements. Such a capability is particularly useful for recombining families of proteins with similar functions but disparate sequences47. Homology-independent recombination can also create combinatorial protein libraries that do not preserve the ordering or lengths of domains. Ostermeier et al.48 devised incremental truncation for the creation of hybrid enzymes (ITCHY), in which homology-independent recombination is used to create hybrid enzymes through the incremental truncation and fusion of two distinct genes (Fig. 2d). In addition, our laboratory49 used non-homologous random recombination (NRR) to generate functional proteins with substantial rearrangements of domain topology (Fig. 2e). Although these two techniques are different, they both involve random fragmentation (for example, using a DNase or an exonuclease) followed by sequence-independent ligation of fragments. Tuning fragmentation conditions can shift the average number of crossovers, and electrophoresis can be used to isolate ligated products of the desired length to minimize inactive library members that are too short or too long, or that have excessive numbers of crossovers.

Nonetheless, the vast majority of non-homologous recombinants will display domain disruption and folding instability. The SCHEMA algorithm computationally identifies breakpoints in proteins that minimize the number of inter-domain interactions50. Type IIb restriction enzyme sites can be inserted at these optimal breakpoints within the DNA sequences, and enzymatic digestion yields 'sticky ends' that enable sequence-independent site-directed chimeragenesis (SISDC)51. Alternatively, chimeric oligonucleotides with complementarity to two distinct domains defined either by eukaryotic exons52 or by SCHEMA can be used in overlap extension PCR53. A library of these chimeric primers can be used to shuffle domains even in the complete absence of homology.

Diversification strategy considerations. Directed evolution practitioners increasingly use sophisticated focused mutagenesis methods to construct smaller libraries of higher quality that sample a functionally rich portion of the fitness landscape. These strategies require phylogenetic information or molecular structures to focus library diversity on residues or even specific substitutions that are thought to be necessary for the desired activity. In the absence of this information, random mutagenesis is an absolute necessity. Even when the requisite data are available, deducing the determinants of protein function at the amino acid level can be challenging. Random mutagenesis may be used to probe mutations that are distant from obvious substrate contact sites or that are not present in naturally evolved orthologues. Fortunately, random and focused mutagenesis strategies can be combined into a single diversification step or applied separately during successive rounds of evolution to maximize the likelihood of success54 (Table 1).

Table 1: Comparison and summary of approaches to library diversification

Genetic screens for single-gene evolution

Genetic screens were originally developed to discover genes associated with specific phenotypes. Geneticists randomly mutagenize the genome of a model organism and then assay individual organisms for a phenotype of interest. Organisms with altered phenotypes are characterized by crossing and linkage analyses, or more recently by high-throughput DNA sequencing, to identify specific mutations underlying phenotypic changes. Directed evolution applies similar screening strategies to single-gene libraries prepared with the aforementioned diversification methods.

Screens of spatially separated variants. Spatial separation (that is, encoding by location) of individual mutants preserves the linkage between phenotype and genotype. For these screens, gene variants are expressed in a unicellular model organism such as E. coli that can be screened as colonies on solid media or transferred into multiwell liquid culture plates (Fig. 3A). Although spatial separation of clones imposes a practical throughput limit of fewer than ~10^4 library members per screening round, a key advantage of this approach is its broad compatibility with many different assay techniques. When a fluorescent readout is not available, techniques such as nuclear magnetic resonance (NMR), high-performance liquid chromatography (HPLC), gas chromatography or mass spectrometry can directly monitor substrate consumption or product formation. In principle, almost any enzymatic activity can be screened in a spatially separated library format, although the time-consuming and infrastructure-intensive nature of some spatially separated screening techniques further limits throughput.

Figure 3: Screening methods for protein evolution.

A | Clonally isolated variants are screened as colonies on solid media or as wells in liquid culture. Fluorescent or colorimetric reporters are measured by automated microtitre plate readers. Alternatively, lysates can be screened for product formation using chromatography, mass spectrometry or nuclear magnetic resonance (NMR). B | Fluorescence-activated cell sorting (FACS) enables the fluorescence measurement of individual cells and the separation of distinct subpopulations by electrostatic deflection. C | Yeast display techniques enable FACS screens of protein–protein interactions (part Ca), bond formation (part Cb) and peptide bond cleavage (part Cc). An evolving protein is displayed on the cell surface as a fusion to the cell–cell adhesion protein Aga2. Novel protein–protein interactions or improved affinities are identified on the basis of decoration with fluorescent-labelled antibodies that recognize both the evolving protein and a bound target (part Ca). Similarly, surface-displayed protein ligases that have high activity or altered substrate specificity are sorted on the basis of the covalent attachment of an epitope tag that is then detected by a fluorescent-labelled antibody (part Cb). In a screen for proteases with altered substrate specificities, the enzyme is retained in the endoplasmic reticulum. Upon substrate cleavage, an epitope tag is removed from an Aga2 fusion protein, thus allowing differential labelling with antibodies (part Cc). D,E | In vitro compartments can be formed through double emulsions (part D) or with polyelectrolyte shells (part E). These compartments entrap DNA, translated proteins and fluorogenic substrates, allowing the fluorescence-activated sorting of functional variants. m/z, mass-to-charge ratio.

When performing low-throughput screens, an understanding of structure–activity relationships within the target protein may be necessary to maximize the probability of accessing a desired variant. These considerations are best exemplified by the evolution of cytochromes P450, a class of enzymes with high evolutionary potential evidenced by the diverse oxidative reactions they catalyse in nature. Arnold and colleagues8 screened a panel of ~100 previously designed P450 variants in E. coli lysates for carbene transfer to form cyclopropanes; product formation was monitored by gas chromatography. The resulting enzymes exhibit high-activity cyclopropanation with enantioselectivity and diastereoselectivity, capabilities that are not known to exist in any natural biocatalysts. In this case, prior knowledge of mutants with altered P450 activities enabled success with only a small library and a low-throughput screen.

When molecular insight or prior knowledge is lacking, it may be necessary to screen more variants to reach the desired phenotype. High-throughput screens rely on the rapid assessment of optical features such as colour, fluorescence, luminescence or turbidity. In special cases, the protein of interest has an inherently visible phenotype, as demonstrated by the pioneering evolution of the alkaline serine protease subtilisin. You and Arnold55 screened colonies on casein plates for zones of clearing due to proteolysis of substrate milk proteins. A secondary screen on casein plates containing dimethylformamide (DMF) identified variants exhibiting solvent-tolerant proteolysis.

Fluorescent proteins provide a readily screenable phenotype, and thus multiple research groups have used cellular fluorescence as a screen to identify GFP variants with brighter fluorescence and altered absorption or emission spectra44,56. More recently, this approach was applied to Arch rhodopsin, a form of channel rhodopsin engineered by Cohen and colleagues57 to exhibit voltage-dependent fluorescence and used to directly image neuronal activity. Arnold and colleagues9 expressed a library of Arch variants in E. coli using multiwell liquid culture plates and washed cells with ionic buffer to generate the transmembrane potential required for fluorescence measurements. After multiple rounds of screening random and site-directed libraries, the most active variant displayed red-shifted emission and increased brightness. The capabilities of evolved Arch should enable parallel monitoring of multiple neurons using wide-field microscopy.

Most biomolecules are not associated with directly observable phenotypes and therefore require a fluorescent, colorimetric or other readily detectable reporter. Surrogate substrates can be added directly to liquid culture or lysates to generate a fluorescent, luminescent or colorimetric signal that is proportional to the enzymatic activity of interest. As a result, these reporters allow precise screening of diverse catalysts such as P450 monooxygenases58, cellulases59, organophosphate hydrolases60 and retroaldolases61. However, the development of surrogate substrates for some reactions can represent a substantial undertaking62. In addition, evolved variants will have only been screened for activity on a surrogate substrate, and they must be separately assayed to ensure that enzyme optimization on the surrogate also improves activity on the desired substrate.

Widely used genetic reporters such as GFP, luciferase and beta-galactosidase enable facile detection of gene expression. Expression-mediated screens have been developed for the study of protein–protein interactions63 and the activity of enzymes including cellulases and glycosynthases64,65,66. As a general strategy, small-molecule- or cell-state-inducible genetic circuitry from nature can be used to detect desired enzymatic activity. For example, Ackerley and co-workers67 used the DNA-damage inducible SOS promoter to express beta-galactosidase in proportion to nitroreductase activation of genotoxic prodrugs. Through iterative site-directed mutagenesis, this screen identified nitroreductase variants that activated chemotherapeutic prodrugs and killed tumour cells with greater efficiency than wild-type nitroreductase. Gene expression reporters are imperfect measures of enzymatic activity but, when used properly, can correlate strongly with enzymatic activity68. Automated fluorescence measurement and robotic colony picking lighten the tedious workload of these screens, but the physical and material constraints associated with spatial separation inherently limit throughput.

High-throughput screening by flow cytometry. Rather than spatially separating clones, a bulk population can be interrogated at the level of individual cells using the cell wall or membrane to maintain genotype–phenotype association. Fluorescence-activated cell sorting (FACS)69 relies on a non-diffusing fluorescent reporter to automate the identification and isolation of cells containing desired gene variants (Fig. 3B). Integrating major advances in microfluidics, optics and cell manipulation, state-of-the-art flow cytometry offers one of the highest capacities of any screening method, achieving up to 10^8 library members screened in <24 hours70,71.

Cytosolic fluorescent or luminescent proteins within cells can form the basis for FACS screens of enzymes such as recombinases, chaperones and inteins72,73,74. Cell surface-displayed epitopes are also non-diffusive and can be detected by FACS using fluorescent-labelled antibodies. This approach became more widely used with the development of a yeast display screen for protein–protein interactions71. Boder and Wittrup71 expressed a library of epitope-tagged antibody fragments fused to the yeast mating adhesion receptor Aga2. The resulting library members were displayed on the surface of cells, where they had the opportunity to bind to a target protein fused to a second epitope tag. FACS enabled the isolation of cells decorated with two fluorescent-labelled antibodies, one for each of the epitopes, indicating proper antibody display and target binding (Fig. 3Ca). Researchers can modulate the stringency of FACS screens by varying washing conditions and the fluorescence threshold that triggers cell isolation. For many years, yeast surface display has facilitated affinity maturation of antibody–antigen pairs75 and the discovery of new protein–protein interactions76.
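The two-colour sorting decision described above can be expressed as a simple gate on per-cell measurements. The sketch below is illustrative only: the event format (a pair of display and binding signals in arbitrary units) and the rectangular gate are assumptions, and real sort gates are drawn on instrument-specific, typically log-scaled data.

```python
def sort_gate(events, display_min, binding_min):
    """Return indices of cells whose two signals both pass their thresholds.

    events: list of (display_signal, binding_signal) pairs, arbitrary units.
    """
    return [
        i
        for i, (display, binding) in enumerate(events)
        if display >= display_min and binding >= binding_min
    ]


# Hypothetical measurements for four cells.
events = [(500, 50), (800, 900), (20, 1000), (900, 300)]
print(sort_gate(events, display_min=100, binding_min=200))  # [1, 3]
print(sort_gate(events, display_min=100, binding_min=500))  # [1]
```

Raising `binding_min` between rounds is one way to represent increasing stringency: in the hypothetical data above, the stricter gate retains only the cell with the strongest binding signal among those that display the library member.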

Recently, the yeast display framework has been applied to the evolution of more diverse enzymatic activities. Bond-forming enzymes can be evolved using yeast display, as our laboratory77 demonstrated by evolving sortase A (SrtA), a sequence-specific transpeptidase (that is, protein ligase) from Staphylococcus aureus. Aga2–SrtA library members were displayed on the cell surface alongside a triglycine (GGG) acceptor peptide fused to Aga1. Upon incubation with the biotinylated substrate peptide LPETG, active SrtA catalysed bond formation between the substrate and the acceptor. FACS was used to isolate cells displaying the biotinylated LPETGGG product (Fig. 3Cb). Owing to the unfavourable kinetics of wild-type SrtA, efficient bioconjugation typically requires equimolar concentration of substrate and enzyme. Iterated rounds of FACS screening with increasing stringency produced evolved variants of SrtA (eSrtA) with 140-fold higher kcat/Km values, enabling new applications78,79,80,81,82,83.
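The practical meaning of a higher kcat/Km can be illustrated with the Michaelis–Menten rate law. The sketch below uses hypothetical parameters in arbitrary units, not measured SrtA constants, and arbitrarily assigns the whole 140-fold improvement to Km; how the improvement is actually divided between kcat and Km is not stated here. It shows that the rate advantage approaches the kcat/Km ratio when substrate is scarce and vanishes at saturation if kcat is unchanged.

```python
def michaelis_menten_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Initial rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


# Hypothetical values in arbitrary units (assumptions for illustration only).
KCAT = 1.0
KM_PARENT = 1.0
KM_EVOLVED = KM_PARENT / 140.0  # gives a 140-fold higher kcat/Km at equal kcat
ENZYME = 1.0


def rate_ratio(substrate: float) -> float:
    """Rate of the 'evolved' enzyme divided by the rate of the 'parent' enzyme."""
    evolved = michaelis_menten_rate(KCAT, KM_EVOLVED, ENZYME, substrate)
    parent = michaelis_menten_rate(KCAT, KM_PARENT, ENZYME, substrate)
    return evolved / parent


for s in (1e-6, 1e-2, 1.0, 100.0):
    print(f"[S] = {s:g}: evolved/parent rate ratio = {rate_ratio(s):.2f}")
```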

The development of a negative screen (also known as counterscreen) using unlabelled competitor substrates enabled our laboratory84 to evolve reprogrammed orthogonal sortases that selectively conjugate LAETG or LPESG substrates. Because substrates are applied ex vivo, this approach is not limited to genetically encoded peptide substrates, and it should be possible to design similar screens for enzymes that catalyse many different classes of bond-forming reactions.

Yeast display can also be modified for the evolution of bond-cleaving enzymes. Iverson, Georgiou and colleagues85 developed yeast endoplasmic reticulum sequestration screening (YESS) in which Aga2 is expressed as a fusion protein to a negative screening substrate, epitope tag 1, a positive screening substrate and epitope tag 2. The Aga2 substrate is retained in the endoplasmic reticulum for processing by a member of a protease library. The presence of both epitope tags on the cell surface indicates protease inactivity, whereas proteolysis of the negative screening substrate would eliminate both tags. FACS isolated the subpopulation of proteases that exclusively cleaved the positive screening substrate and thereby left only epitope tag 1 on the cell surface (Fig. 3Cc). Using YESS, Iverson, Georgiou and colleagues85 evolved tobacco etch virus (TEV) protease variants that selectively cleave ENLYFE/S or ENLYFH/S sequences but not the wild-type substrate ENLYFQ/S. These recent advances demonstrate how cell surface display can be adapted to screen for complex enzymatic activities.
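The sorting logic of YESS described above can be written as a small truth table. The sketch below is a simplification under stated assumptions: cleavage is treated as all-or-nothing, tag detection as binary, and the function names are hypothetical. The fusion order follows the text: Aga2, negative screening substrate, epitope tag 1, positive screening substrate, epitope tag 2.

```python
from typing import Tuple


def surface_tags(cleaves_negative: bool, cleaves_positive: bool) -> Tuple[bool, bool]:
    """Return (tag1_displayed, tag2_displayed) on the cell surface.

    Cleavage at a substrate releases everything downstream of it, so cutting
    the negative substrate removes both tags and cutting the positive
    substrate removes only tag 2.
    """
    tag1 = not cleaves_negative
    tag2 = tag1 and not cleaves_positive
    return tag1, tag2


def passes_gate(tag1: bool, tag2: bool) -> bool:
    """Sort gate for the desired phenotype: tag 1 present, tag 2 absent."""
    return tag1 and not tag2


for neg in (False, True):
    for pos in (False, True):
        t1, t2 = surface_tags(neg, pos)
        print(f"cleaves negative={neg!s:5} positive={pos!s:5} "
              f"-> tag1={t1!s:5} tag2={t2!s:5} sorted={passes_gate(t1, t2)}")
```

Only a protease that cuts the positive substrate and spares the negative substrate falls inside the gate, which is what makes the screen selective for specificity rather than for activity alone.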

Screening artificial cell-like compartments. When cell-constrained fluorescent reporters are difficult or impossible to implement for a given gene and phenotype, in vitro compartmentalization (IVC) provides an alternative format to enable high-throughput screening. IVC, pioneered by Tawfik and Griffiths86, uses the aqueous droplets in water–oil emulsions to compartmentalize individual genes and gene products along with a surrogate fluorogenic substrate. IVC can enable protein evolution in two formats: either emulsion of single cells expressing the library member or emulsion of individual DNA molecules together with in vitro transcription–translation machinery. Because flow cytometers can only sort particles in an aqueous mixture, a secondary emulsion is necessary to create water–oil–water droplets87 (Fig. 3D) for FACS-based screening. The flexibility to use fluorogenic substrates expands the phenotypes and enzymes that can be screened by flow cytometry.

Recently, IVC coupled with flow cytometry was used to evolve mammalian paraoxonase 1 (PON1). Wild-type PON1 can degrade a variety of organophosphate compounds and has weak activity on some nerve agents. Tawfik and colleagues60 used fluorogenic coumarin substrate analogues to sort IVC droplets based on phosphotriesterase activities of PON1 variants. The resulting evolved enzyme rePON1 exhibits a 10⁵-fold increase in catalytic activity on cyclosarin and is the first enzyme to degrade G-type (sarin-like) nerve agents with sufficient efficiency to provide prophylactic protection.

Chip-based microfluidic systems ('FACS on a chip') offer several advantages over conventional flow cytometry apparatus. The process of microfluidic droplet formation is more likely to encapsulate single cells or DNA library members, and the consistent volume and quantity of fluorescent reporters in each droplet can support highly quantitative measurements88. Furthermore, the path length of the flow cell precisely dictates the reaction time. These advantages have been demonstrated in proof-of-concept screens for cellulase and peroxidase activities59,88.
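Droplet loading is often approximated as a random (Poisson) process, which gives a feel for the trade-off between empty droplets and droplets that hold more than one library member. The sketch below adopts that approximation as an assumption; it does not describe the loading behaviour of any particular device in the cited work, and the function names are hypothetical.

```python
import math
from typing import Dict


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k items in a droplet under Poisson loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def droplet_occupancy(mean_per_droplet: float) -> Dict[str, float]:
    """Fractions of empty, singly and multiply occupied droplets."""
    if mean_per_droplet <= 0:
        raise ValueError("mean_per_droplet must be positive")
    empty = poisson_pmf(0, mean_per_droplet)
    single = poisson_pmf(1, mean_per_droplet)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among droplets that contain anything, the share holding exactly one item.
        "single_given_occupied": single / (1.0 - empty),
    }


for mean in (0.05, 0.1, 0.3, 1.0):
    occ = droplet_occupancy(mean)
    print(f"mean={mean:<4} empty={occ['empty']:.3f} single={occ['single']:.3f} "
          f"multiple={occ['multiple']:.3f} "
          f"single|occupied={occ['single_given_occupied']:.3f}")
```

Under this model, dilute loading keeps most occupied droplets monoclonal at the cost of many empty droplets, which is one reason consistent droplet volumes matter for quantitative screens.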

Alternative cell-like compartments beyond water–oil emulsions can also entrap genes, proteins and substrates in a suitable format for screening. Shell-like compartments made of layered polycationic and polyanionic polymers (polyelectrolytes) can encapsulate E. coli cells. Because these compartments are stable to detergent, DNA and protein remain linked even after detergent-induced cell lysis. Scott and Plückthun89 used this platform to screen for properly solubilized G protein-coupled receptors (GPCRs) that retain their structure and affinity for a fluorescent probe. In a similar approach, Hollfelder and co-workers60 built polyelectrolyte gel-shell beads (GSBs) that are compatible with flow cytometry (Fig. 3E). Using the fluorogenic organophosphate analogues described above, the researchers sorted GSBs based on phosphotriesterase activity to identify parathion hydrolase variants that more rapidly degrade organophosphate pesticides90.

Selections for functional proteins

Screening, by definition, requires the inspection of individual phenotypes. The resulting data, which can be very rich depending on the choice of observables, not only identify desirable subpopulations but also inform the choice of appropriate screen stringency in subsequent rounds of evolution. By contrast, selection bypasses the need to individually inspect each library member and instead links an activity of interest to physical separation of the encoding DNA or to survival of the organism producing active library members. The development of effective schemes by which molecular activities of interest lead to segregation or replication of desired variants can be a major undertaking that requires creativity and strong molecular intuition. Well-designed selection offers unparalleled throughput albeit at the expense of potentially rich screening data. This drawback often necessitates a secondary phenotypic assay of selection hits in order to optimize diversification and selection protocols for the next cycle of evolution.

Selections for binding affinity. Because all library members in the same mixture undergo selection simultaneously, a molecular linkage between genes and the corresponding gene products, rather than spatial encoding, must be maintained. In a typical target-binding selection, protein library members with desired binding activity and their encoding DNA sequences are captured using an immobilized target, whereas non-binding library members are washed away. In cell surface display or phage display methods, a cell or bacteriophage serves as a compartment to link genes and gene products. Protein library members are expressed on the surface of the cell or the coat of the bacteriophage through fusion with endogenous cell surface proteins91 or phage coat proteins71,92. Phage display has proved to be highly effective in the development of therapeutic antibodies10,93 and in the elucidation of peptide binding motifs94.

Unlike screening methods that are typically limited by measurement throughput, a transformation bottleneck95,96 restricts library sizes that can be processed by selection methods such as cell surface display or phage display, both of which require intracellular translation. As bacterial transformation provides, at best, ~10⁹–10¹⁰ transformants per experiment, cell- or phage-based selection methods are generally limited to library sizes in this range96. Ribosome display, developed by Hanes and Plückthun97, can bypass this bottleneck through the use of in vitro translation reactions. In the absence of a stop codon and under carefully controlled conditions, ribosomes remain stably bound to both the mRNA and the growing polypeptide, thereby coupling proteins with their encoding genes. Similarly, mRNA display, developed by Wilson, Keefe and Szostak98, covalently links a translated protein to its encoding mRNA through a puromycin analogue. Binding selections are conceptually simple but limited in scope (Fig. 4a). They are well suited for evolving binding affinity but have only been used in a limited number of cases to evolve enzymes, including β-lactamases99 and RNA ligases100. Although binding affinity is an important component of enzymatic activity, catalytic efficiency and the rate of product release (two properties that are not necessarily maintained or improved during a binding selection) can strongly determine overall enzyme desirability.
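The practical consequence of the transformation bottleneck can be sketched with a short calculation. The example below assumes, purely for illustration, that each randomized position is encoded by an NNK degenerate codon (32 codons covering all 20 amino acids); this codon scheme and the function name are assumptions, not details from the passage above. It counts how many positions can be fully randomized before the theoretical diversity exceeds the number of transformants.

```python
def max_randomized_positions(max_library_size: float, variants_per_position: int = 32) -> int:
    """Largest number of fully randomized positions whose theoretical
    diversity (variants_per_position ** positions) fits within the library size.
    """
    if variants_per_position < 2:
        raise ValueError("variants_per_position must be at least 2")
    positions = 0
    diversity = 1
    while diversity * variants_per_position <= max_library_size:
        diversity *= variants_per_position
        positions += 1
    return positions


for transformants in (1e9, 1e10):
    codon_level = max_randomized_positions(transformants, 32)    # NNK codons
    protein_level = max_randomized_positions(transformants, 20)  # amino acids
    print(f"{transformants:.0e} transformants: {codon_level} positions at the codon level, "
          f"{protein_level} at the amino-acid level")
```

Matching theoretical diversity to the transformant count is an optimistic bound, because sampling every variant at least once requires several-fold oversampling. Methods that avoid transformation are not subject to this particular limit.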

Figure 4: Selection methods for protein evolution.

a | Affinity selection identifies library members that bind to an immobilized target. Methods for physically linking proteins with their corresponding genes during selection include display on phage particles via protein fusion to the coat protein pIII (left), covalent attachment to their encoding mRNA transcript via a puromycin linkage (middle) and the non-covalent attachment of both mRNA and nascent polypeptide to stalled ribosomes (right). b | Compartmentalized self-replication (CSR) selects for DNA and RNA polymerases that can amplify, by PCR, their own genes within water emulsion droplets (blue circle) isolated from one another by an oil phase (brown rectangle). c | In compartmentalized partnered replication (CPR), the evolving activity must trigger expression of Taq polymerase. For example, aminoacyl tRNA synthetase (aaRS) activity promotes amber stop codon suppression, leading to the expression of full-length Taq polymerase. Individual Escherichia coli cells are then isolated in water–oil emulsion droplets and lysed by heat. Higher Taq expression leads to better PCR amplification of the active library members. d | During phage-assisted continuous evolution (PACE), host E. coli cells continuously dilute an evolving population of ~10¹⁰ filamentous bacteriophages in a fixed-volume vessel (cell stat; blue rectangle). Phage encoding active variants trigger host cell expression of the missing phage protein (pIII) in proportion to the desired activity and consequently produce infectious progeny, whereas phage with inactive variants produce progeny that are not infectious and are diluted out of the vessel.

Organismal survival as a basis for selection. In a second important class of selections, active library members enable organisms containing their corresponding genes to survive and replicate. Antibiotic resistance is perhaps the most straightforward activity to evolve using the selective replication of E. coli. Numerous studies have evolved enzymes that neutralize or export antibiotics, yielding variant enzymes that are predictive of natural evolutionary trajectories in microorganisms with tolerance to higher doses of antibiotics or resistance to a broader scope of antibiotic substrates45,101,102. In addition to evolving the genes that confer antibiotic resistance, it is also possible to use antibiotic selections to evolve other proteins by linking the desired activity to the expression of an antibiotic resistance gene. For example, Schultz and co-workers103,104 evolved aminoacyl tRNA synthetases that aminoacylate suppressor tRNAs with non-canonical amino acids, resulting in the suppression of a stop codon within a chloramphenicol efflux pump gene. In a similar strategy linking enzymatic activity to antibiotic resistance, Barbas and colleagues105 evolved recombinases with altered DNA sequence specificities by using their activity to reassemble a beta-lactamase gene.

Auxotroph complementation can also form the basis of selections for the evolution of metabolic enzymes. Xylose metabolism is an important target for protein evolution because xylose is a limiting factor in the conversion of lignocellulose biomass into ethanol for use in biofuels. Growth in media containing xylose as the sole carbon source enriches for genes encoding enzymes that better utilize this energy source. Using this strategy, monosaccharide transporters106 and a xylose isomerase107 were evolved for more efficient xylose consumption and ethanol production in S. cerevisiae.

The design of selections for protein activities that do not fulfil metabolic functions is more challenging and requires ingenuity. For example, Hilvert and colleagues108 evolved nanocontainers to more effectively trap HIV protease, a protein that is toxic to E. coli hosts and for which sequestration confers faster growth rates. This approach yielded lumazine synthase capsids that had tenfold higher loading capacity.

Selections within in vitro compartments. In vitro selections can bypass limitations of in vivo selections such as transformation efficiency bottlenecks and host genome mutations that unexpectedly influence selection survival. A popular approach to couple genes and gene products without using cells is the translation of library members in artificial compartments such as the aqueous droplets of water–oil emulsions. Selections within in vitro compartments are particularly well suited for enzymes that directly act on DNA substrates. For example, in a selection for meganucleases with altered sequence specificity, Stoddard and colleagues109 placed a mutated substrate sequence directly upstream of the meganuclease gene; DNA cleavage generated sticky ends that were competent for ligation of a PCR adapter. As a result, PCR within the emulsion droplets selectively amplified genes encoding nucleases that were active on the new substrate sequences.

In vitro selections for DNA and RNA polymerases in emulsions are also referred to as compartmentalized self-replication (CSR) because the polymerases that most efficiently replicate their encoding gene in an emulsion PCR are enriched post-selection110 (Fig. 4b). Using CSR, Holliger and colleagues110 evolved DNA polymerases with higher thermostability and expanded substrate preferences, including Taq polymerase variants that accept Cy3 and Cy5 fluorophore-linked dNTPs111. These evolved polymerases directly incorporate bright fluorescent dyes into DNA molecules, generating nucleic acid polymers with highly altered physical and chemical properties111. In a separate study, Holliger and colleagues112 used CSR to select DNA polymerases that more efficiently amplify damaged DNA isolated from extinct organisms.
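How quickly a self-replication selection can enrich a rare active variant may be illustrated with a toy two-population model. The sketch below is an assumption-laden simplification, not a model taken from the cited studies: it treats the library as one active variant and one background population, and it assumes the active gene is amplified by a fixed fold advantage in every round. The function name and the numerical values are hypothetical.

```python
def enriched_fraction(initial_fraction: float, fold_advantage: float, rounds: int) -> float:
    """Fraction of the pool made up of the active variant after a number of rounds.

    Each round multiplies the active variant by fold_advantage relative to
    the background and then renormalizes the pool.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    if fold_advantage <= 0:
        raise ValueError("fold_advantage must be positive")
    fraction = initial_fraction
    for _ in range(rounds):
        amplified = fraction * fold_advantage
        fraction = amplified / (amplified + (1.0 - fraction))
    return fraction


# Hypothetical example: one active gene per million, amplified 100-fold better per round.
for n in range(6):
    print(f"round {n}: active fraction = {enriched_fraction(1e-6, 100.0, n):.6f}")
```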

The development of compartmentalized partnered replication (CPR) extends IVC selections beyond enzymes that act on DNA113. In CPR selection schemes, the evolving enzymatic activity controls expression of Taq polymerase. Higher concentrations of Taq lead to better PCR amplification of active genes within emulsion droplets containing single E. coli cells (Fig. 4c). The first demonstrations of CPR evolved T7 RNA polymerase variants with orthogonal promoter preferences113,114, an achievement that could in principle be accomplished using CSR. However, the power of CPR to evolve enzymes that do not act on DNA substrates was demonstrated through the evolution of tryptophanyl-tRNA synthetases that selectively charged the non-canonical amino acid 5-hydroxy-L-tryptophan onto suppressor tRNAs that suppress stop codons placed in the Taq polymerase gene113.

Emerging evolution paradigms

Continuous evolution. Traditional protein evolution methods require discrete time- and labour-intensive steps in which researchers generate gene libraries, introduce them into translation systems such as cells or in vitro compartments, perform screens or selections, and then isolate genes encoding library members with desired activities. Recently, researchers have developed methods by which all steps of the protein evolution cycle are performed continuously without manual intervention. These continuous evolution systems can markedly increase the efficiency of protein evolution and, therefore, the number of steps in the sequence space that can be explored in the search for optimal protein variants115.

The majority of continuous evolution experiments have selected for replicative fitness of microorganisms under continuous dilution. This continuous culture format has been applied to the evolution of bacterial genomes for shortened replication time116 and resistance to antibiotics117. Single-gene evolutions are also feasible in continuous culture, as demonstrated with chorismate mutase118 and β-lactamases119. However, specially designed continuous mutagenesis methods that only target the evolving gene of interest are crucial for long evolutionary trajectories to avoid host genome mutations that circumvent selections by inducing cell survival for reasons unrelated to the protein of interest. For this reason, error-prone polymerases that exclusively replicate the library member are particularly amenable to continuous evolution in both E. coli119 and S. cerevisiae18. In addition, the aforementioned system for in vivo recombination in S. cerevisiae exclusively triggers recombination in an evolving gene during alternating stages of sporulation and selection46.

Continuous evolution of viruses, including bacteriophage, is conducted in a fixed-volume vessel (a cellstat or 'lagoon') that is diluted with fresh bacterial host cells. The average residence time in the vessel is shorter than the time required for bacterial replication but longer than the time required for phage replication; thus, mutations only accumulate in the phage genome. This process has been used to study evolutionary dynamics within viral genomes120,121, but our laboratory122,123,124 (also B.P. Hubbard and D.R.L., unpublished observations) has more recently extended its application to single-gene evolution. In our phage-assisted continuous evolution (PACE) system, an evolving gene is inserted into the M13 bacteriophage genome in place of an essential phage gene such as gene III (gIII). Expression of gIII is instead supplied from an accessory plasmid and is controlled by the activity of the evolving gene. If the phage encodes a functional library member, then pIII, the protein encoded by gIII, is produced. Only phage assembled in the presence of pIII are infectious and can go on to infect and replicate in fresh host cells that dilute the vessel (Fig. 4d).
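The washout logic of a continuously diluted vessel can be sketched with a deliberately simple exponential model: a population persists only if its replication rate exceeds the dilution rate. The model, the function name and all rate values below are assumptions chosen for illustration; they are not parameters reported for PACE.

```python
import math


def net_fold_change(replication_rate_per_h: float, dilution_rate_per_h: float, hours: float) -> float:
    """Fold change of a population that grows exponentially while being
    diluted at a constant rate: exp((replication - dilution) * time).
    """
    return math.exp((replication_rate_per_h - dilution_rate_per_h) * hours)


# Hypothetical rates (per hour); none of these values come from the cited work.
DILUTION = 2.0
populations = {
    "host cells (replicate slower than dilution)": 1.4,
    "phage with inactive variant (no infectious progeny)": 0.0,
    "phage with active variant (replicate faster than dilution)": 5.0,
}

for name, rate in populations.items():
    change = net_fold_change(rate, DILUTION, hours=10.0)
    status = "persists" if rate > DILUTION else "washes out"
    print(f"{name}: {change:.3g}-fold after 10 h ({status})")
```

In this toy picture the host cells pass through the vessel too quickly to accumulate mutations, whereas only phage that produce infectious progeny faster than they are diluted remain in the population.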

The continuous nature of PACE coupled with enhanced in vivo mutagenesis enables several hundred 'rounds' of selection, mutation and replication to take place per week without manual intervention. The first demonstration of PACE not only reprogrammed the promoter preferences of the T7 RNA polymerase but also suggested schemes for protein–protein interactions and recombinases122. A subsequent study developed a dominant-negative phage protein pIII-neg that can poison progeny phage and form the basis of negative selection123. The recent use of PACE to continuously evolve proteases124 and DNA-binding proteins (B.P. Hubbard and D.R.L., unpublished observations) demonstrates how PACE can be generalized through the development of gene circuitry that links desired enzymatic activities to the expression of gIII.

Computational design and directed evolution. Continuous evolution can extensively explore a fitness landscape over many rounds of evolution but, similar to other methods described above, accesses mutants that successively emerge from a starting gene. Computational protein design can initiate sequence space exploration from starting points that are inaccessible to evolutionary processes originating from naturally existing genes; as a result, it has the potential to expedite the evolution of completely novel protein functions125,126. Although growing computational power and more sophisticated design methodologies have recently produced complex designs such as macromolecular assemblies, receptors and even catalysts127,128,129, initial designs frequently remain suboptimal and require directed evolution to achieve high activity. For example, we and our collaborators130 used phage and yeast display to increase affinity between the designed binding partners Pdar and Prb. Designed enzymes such as peroxidases131 and retroaldolases61 can also be optimized through evolution, yielding efficiencies that rival unrelated natural catalysts of the same reactions. Perhaps the most impressive testimony to the power of computational design coupled with directed evolution is the creation of novel protein catalysts. Tawfik, Baker and co-workers132,133 achieved this aim by designing and evolving proteins that catalyse the Kemp elimination, a reaction not known to be carried out by natural enzymes.

Conclusions and perspectives

Current protein evolution methods each offer unique features that make them more appropriate for solving certain classes of molecular problems (Table 2). When choosing a methodology, researchers should assess the features of the protein that is being evolved to find an optimal screening and selection technology, as well as an appropriate accompanying genetic diversification strategy (Fig. 5). Pioneering studies in the field of directed evolution sought to improve the wild-type activity of enzymes through the enhancement of solubility, thermostability, affinity for substrate or catalytic turnover. These properties remain important in contemporary directed evolution because increased activity and stability often facilitate the engineering or evolution of other desirable properties. The pursuit of ambitious goals such as reprogrammed substrate selectivity33,85 and synthetically useful biocatalysts134 benefits from innovative screens and selections that balance the need for throughput and accurate assessments of library members. New screens and selections that achieve higher throughput or carry out more continuous rounds of evolution can broaden the exploration of the fitness landscape, whereas novel mutagenesis strategies increase the search efficiency. Through computational techniques and creative molecular biology protocols, diversity is focused on residues and specific mutations that influence desired activities135. New directed evolution methods will continue to generate proteins with useful new activities and specificities, as well as expand the scope of protein evolution to include even larger sets of chemical and biological functions.

Table 2: Comparison between screening and selection strategies
Figure 5: Optimal strategies for directed evolution.

a | The choice of a screening or selection method can be depicted as a decision tree that operates primarily on the properties of the protein and phenotype to be evolved. Although many techniques can be extended to alternative phenotypes, this figure focuses on the most popular methods for each set of conditions. b | Diversification strategies must be chosen both at the outset of an evolution project and between rounds of screening or selection. Considerations can and should change over the course of a project due to the phenotypes and genotypes within the evolving population. This decision tree attempts to distil these considerations with an emphasis on focused mutagenesis methods that have the maximum potential to identify functional variants. epPCR, error-prone PCR; CPR, compartmentalized partnered replication; CSR, compartmentalized self-replication; FACS, fluorescence-activated cell sorting; gIII, gene III; NMR, nuclear magnetic resonance; PACE, phage-assisted continuous evolution; REAP, reconstructed evolutionary adaptive path; SeSaM, sequence saturation mutagenesis.


1. et al. The effects of artificial selection on the maize genome. Science 308, 1310–1314 (2005).
2. From wild animals to domestic pets, an evolutionary view of domestication. Proc. Natl Acad. Sci. USA 106 (Suppl. 1), 9971–9978 (2009).
3. Diversifying carotenoid biosynthetic pathways by directed evolution. Microbiol. Mol. Biol. Rev. 69, 51–78 (2005).
4. Directed evolution of Methanococcus jannaschii citramalate synthase for biosynthesis of 1-propanol and 1-butanol by Escherichia coli. Appl. Environ. Microbiol. 74, 7802–7808 (2008).
5. et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460, 894–898 (2009).
6. et al. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature 415, 644–646 (2002).
7. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science 314, 1565–1568 (2006).
8. Olefin cyclopropanation via carbene transfer catalyzed by engineered cytochrome P450 enzymes. Science 339, 307–310 (2013).
9. et al. Directed evolution of a far-red fluorescent rhodopsin. Proc. Natl Acad. Sci. USA 111, 13034–13039 (2014).
10. Guiding the selection of human-antibodies from phage display repertoires to a single epitope of an antigen. Biotechnology 12, 899–903 (1994).
11. A new approach to random mutagenesis in vitro. Biotechnol. Bioeng. 86, 622–627 (2004).
12. A general method for saturation mutagenesis of cloned DNA fragments. Science 229, 242–247 (1985).
13. Specific mutagenic effect of base analogues on Phage-T4. J. Mol. Biol. 1, 87–105 (1959).
14. Mutagenic repair in Escherichia coli: products of the recA gene and of the umuD and umuC genes act at different steps in UV-induced mutagenesis. Proc. Natl Acad. Sci. USA 82, 4193–4197 (1985).
15. Bacterial mutator genes and the control of spontaneous mutation. Annu. Rev. Genet. 10, 135–156 (1976).
16. An efficient random mutagenesis technique using an E. coli mutator strain. Mol. Biotechnol. 7, 189–195 (1997).
17. Identification of the ε-subunit of Escherichia coli DNA polymerase III holoenzyme as the dnaQ gene product: a fidelity subunit for DNA replication. Proc. Natl Acad. Sci. USA 80, 7085–7089 (1983).
18. An orthogonal DNA replication system in yeast. Nat. Chem. Biol. 10, 175–177 (2014).
19. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. Technique 1, 11–15 (1989).
20. An approach to random mutagenesis of DNA using mixtures of triphosphate derivatives of nucleoside analogues. J. Mol. Biol. 255, 589–603 (1996).
21. High fidelity DNA synthesis by the Thermus aquaticus DNA polymerase. Nucleic Acids Res. 18, 3739–3744 (1990).
22. Directed enzyme evolution via small and effective neutral drift libraries. Nat. Methods 5, 939–942 (2008).
23. Randomization of genes by PCR mutagenesis. PCR Methods Appl. 2, 28–33 (1992). This seminal study in optimizing the conditions for epPCR is a must-read for all scientists performing random mutagenesis.
24. Reducing mutational bias in random protein libraries. Anal. Biochem. 339, 9–14 (2005).
25. Sequence saturation mutagenesis (SeSaM): a novel method for directed evolution. Nucleic Acids Res. 32, e26 (2004).
26. Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites. Gene 34, 315–323 (1985).
27. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
28. Circular polymerase extension cloning of complex gene libraries and pathways. PLoS ONE 4, e6441 (2009).
29. USER cloning and USER fusion: the ideal cloning techniques for small and big laboratories. Methods Mol. Biol. 643, 185–200 (2010).
30. Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241, 53–57 (1988).
31. The consensus concept for thermostability engineering of proteins. Biochim. Biophys. Acta 1543, 408–415 (2000).
32. et al. Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Proc. Natl Acad. Sci. USA 107, 1948–1953 (2010).
33. et al. Engineering V-type nerve agents detoxifying enzymes using computationally focused libraries. ACS Chem. Biol. 8, 2394–2403 (2013). This paper nicely demonstrates how computational modelling can identify beneficial mutations, which can be stochastically incorporated into gene libraries.
34. Macromolecular modeling with Rosetta. Annu. Rev. Biochem. 77, 363–382 (2008).
35. et al. Computationally designed libraries for rapid enzyme stabilization. Protein Eng. Des. Sel. 27, 49–58 (2014).
36. Incorporating synthetic oligonucleotides via gene reassembly (ISOR): a versatile tool for generating targeted libraries. Protein Eng. Des. Sel. 20, 219–226 (2007).
37. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994). This study is the first to establish a method for homologous recombination of evolving protein populations.
38. et al. DNA shuffling method for generating highly recombined genes and evolved enzymes. Nat. Biotechnol. 19, 354–359 (2001).
39. et al. Nucleotide exchange and excision technology (NExT) DNA shuffling: a robust method for DNA fragmentation and directed evolution. Nucleic Acids Res. 33, e117 (2005).
40. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat. Biotechnol. 16, 258–261 (1998).
41. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164, 49–53 (1995).
42. et al. Synthetic shuffling expands functional protein diversity by allowing amino acids to recombine independently. Nat. Biotechnol. 20, 1251–1255 (2002).
43. Assembly of designed oligonucleotides as an efficient method for gene recombination: a new tool in directed evolution. Chembiochem 4, 34–39 (2003).
44. Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat. Biotechnol. 14, 315–319 (1996).
45. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391, 288–291 (1998).
46. Heritable recombination system for synthetic Darwinian evolution in yeast. ACS Synth. Biol. 1, 602–609 (2012).
47. Libraries of hybrid proteins from distantly related sequences. Nat. Biotechnol. 19, 456–460 (2001).
48. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat. Biotechnol. 17, 1205–1209 (1999).
49. Directed evolution of protein enzymes using nonhomologous random recombination. Proc. Natl Acad. Sci. USA 101, 7011–7016 (2004).
50. Protein building blocks preserved by recombination. Nat. Struct. Biol. 9, 553–558 (2002).
51. General method for sequence-independent site-directed chimeragenesis. J. Mol. Biol. 330, 287–296 (2003).
52. Directed evolution of proteins by exon shuffling. Nat. Biotechnol. 19, 423–428 (2001).
53. Engineering hybrid genes without the use of restriction enzymes: gene-splicing by overlap extension. Gene 77, 61–68 (1989).
54. Directed Evolution Library Creation (Springer, 2014). This book is an excellent resource for comparing and choosing between genetic diversification methods as well as for successfully executing library generation protocols.
55. Directed evolution of subtilisin E in Bacillus subtilis to enhance total activity in aqueous dimethylformamide. Protein Eng. 9, 77–83 (1996).
56. Wavelength mutations and posttranslational autoxidation of green fluorescent protein. Proc. Natl Acad. Sci. USA 91, 12501–12504 (1994).
57. Optical recording of action potentials in mammalian neurons using a microbial rhodopsin. Nat. Methods 9, 90–95 (2012).
58. et al. Luminogenic cytochrome P450 assays. Expert Opin. Drug Metab. Toxicol. 2, 629–645 (2006).
59. A high-throughput cellulase screening system based on droplet microfluidics. Biomicrofluidics 8, 041102 (2014).
60. et al. Directed evolution of hydrolases for prevention of G-type nerve agent intoxication. Nat. Chem. Biol. 7, 120–125 (2011).
61. et al. Evolution of a designed retro-aldolase leads to complete active site remodeling. Nat. Chem. Biol. 9, 494–498 (2013).
62. Enzyme assays for high-throughput screening. Curr. Opin. Biotechnol. 15, 314–322 (2004).
63. A novel genetic system to detect protein–protein interactions. Nature 340, 245–246 (1989).
64. et al. Chemical complementation: a reaction-independent genetic assay for enzyme catalysis. Proc. Natl Acad. Sci. USA 99, 16537–16542 (2002).
65. Directed evolution of a glycosynthase via chemical complementation. J. Am. Chem. Soc. 126, 15051–15059 (2004).
66. High-throughput selection for cellulase catalysts using chemical complementation. J. Am. Chem. Soc. 130, 17446–17452 (2008).
67. et al. Targeted mutagenesis of the Vibrio fischeri flavin reductase FRase I to improve activation of the anticancer prodrug CB1954. Biochem. Pharmacol. 84, 775–783 (2012).
68. Correlation between catalytic efficiency and the transcription read-out in chemical complementation: a general assay for enzyme catalysis. Biochemistry 43, 3570–3581 (2004).
69. Electronic separation of biological cells by volume. Science 150, 910–911 (1965).
70. Practical Flow Cytometry (Wiley-Liss, 2003).

  71. 71.

    & Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 15, 553–557 (1997). This paper describes the invention of yeast display protein libraries for screening protein–protein interactions and serves as the foundation for many other cell surface display methods.

  72. 72.

    & Directed evolution of the site specificity of Cre recombinase. Proc. Natl Acad. Sci. USA 99, 4185–4190 (2002).

  73. 73.

    , , , & Directed evolution of substrate-optimized GroEL/S chaperonins. Cell 111, 1027–1039 (2002).

  74. 74.

    , & Directed evolution of a small-molecule-triggered intein with improved splicing properties in mammalian cells. Chem. Biol. 18, 619–630 (2011).

  75. 75.

    et al. A general method for greatly improving the affinity of antibodies by using combinatorial libraries. Proc. Natl Acad. Sci. USA 102, 8466–8471 (2005).

  76. 76.

    , & Mining a yeast library for brain endothelial cell-binding antibodies. Nat. Methods 4, 143–145 (2007).

  77. 77.

    , & A general strategy for the evolution of bond-forming enzymes using yeast display. Proc. Natl Acad. Sci. USA 108, 11399–11404 (2011).

  78. 78.

    et al. Immobilization of actively thromboresistant assemblies on sterile blood-contacting surfaces. Adv. Healthc. Mater. 3, 30–35 (2014).

  79. 79.

    et al. Engineered red blood cells as carriers for systemic delivery of a wide array of functional probes. Proc. Natl Acad. Sci. USA 111, 10131–10136 (2014).

  80. 80.

    , , , & Protein thioester synthesis enabled by sortase. J. Am. Chem. Soc. 134, 10749–10752 (2012).

  81. 81.

    & Receptor-directed chimeric toxins created by sortase-mediated protein fusion. Mol. Cancer Ther. 12, 2273–2281 (2013).

  82. 82.

    et al. Flow-based enzymatic ligation by sortase A. Angew. Chem. Int. Ed Engl. 53, 9203–9208 (2014).

  83. 83.

    , , , & One-step enzymatic modification of the cell surface redirects cellular cytotoxicity and parasite tropism. ACS Chem. Biol. (2014).

  84. 84.

    , , , & Reprogramming the specificity of sortase enzymes. Proc. Natl Acad. Sci. USA 111, 13343–13348 (2014).

  85. 85.

    et al. Engineering of TEV protease variants by yeast ER sequestration screening (YESS) of combinatorial libraries. Proc. Natl Acad. Sci. USA 110, 7229–7234 (2013).

  86. 86.

    & Man-made cell-like compartments for molecular evolution. Nat. Biotechnol. 16, 652–656 (1998). The authors of this paper developed IVC as a platform for directed evolution. This study describes a selection for methyltransferases within water–oil emulsion droplets.

  87. 87.

    et al. In vitro compartmentalization by double emulsions: sorting and gene enrichment by fluorescence activated cell sorting. Anal. Biochem. 325, 151–157 (2004).

  88. 88.

    et al. Ultrahigh-throughput screening in drop-based microfluidics for directed evolution. Proc. Natl Acad. Sci. USA 107, 4004–4009 (2010).

  89. 89.

    & Direct molecular evolution of detergent-stable G protein-coupled receptors using polymer encapsulated cells. J. Mol. Biol. 425, 662–677 (2013).

  90. 90.

    et al. Evolution of enzyme catalysts caged in biomimetic gel-shell beads. Nat. Chem. 6, 791–796 (2014). In this study, polyelectrolyte shells served as in vitro compartments for screening by flow cytometry.

  91. 91.

    , & Rapid isolation of high-affinity protein binding peptides using bacterial display. Protein Eng. Des. Sel. 17, 731–739 (2004).

  92. 92.

    , , & Phage antibodies: filamentous phage displaying antibody variable domains. Nature 348, 552–554 (1990). In this pioneering study, phage display is demonstrated as a powerful technique to select high-affinity antibody fragments. This paper also nicely illustrates the guiding principles of related binding enrichments.

  93. 93.

    , , & Making antibody fragments using phage display libraries. Nature 352, 624–628 (1991).

  94. 94.

    & Searching for peptide ligands with an epitope library. Science 249, 386–390 (1990).

  95. 95.

    & High-efficiency transformation of yeast by electroporation. Methods Enzymol. 194, 182–187 (1991).

  96. 96.

    , & High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res. 16, 6127–6145 (1988).

  97. 97.

    & In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl Acad. Sci. USA 94, 4937–4942 (1997).

  98. 98.

    , & The use of mRNA display to select high-affinity protein-binding peptides. Proc. Natl Acad. Sci. USA 98, 3750–3755 (2001).

  99. 99.

    et al. In vitro selection for catalytic activity with ribosome display. J. Am. Chem. Soc. 124, 9396–9403 (2002).

  100. 100.

    & Selection and evolution of enzymes from a partially randomized non-catalytic scaffold. Nature 448, 828–831 (2007).

  101. 101.

    , , , & Predicting the emergence of antibiotic resistance by directed evolution and structural analysis. Nat. Struct. Biol. 8, 238–242 (2001).

  102. 102.

    & Understanding, predicting and manipulating the genotypic evolution of antibiotic resistance. Nat. Rev. Genet. 14, 243–248 (2013).

  103. 103.

    , , & Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo. Proc. Natl Acad. Sci. USA 94, 10092–10097 (1997). This groundbreaking study on genetic code expansion exemplifies how selectable antibiotic resistance markers can form the basis for a range of in vivo selections.

  104. 104.

    , , , & An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat. Biotechnol. 20, 1044–1048 (2002).

  105. 105.

    , , , & 3rd Structure-guided reprogramming of serine recombinase DNA sequence specificity. Proc. Natl Acad. Sci. USA 108, 498–503 (2011).

  106. 106.

    , , , & Rewiring yeast sugar transporter preference through modifying a conserved protein motif. Proc. Natl Acad. Sci. USA 111, 131–136 (2014). This study uses an auxotroph complementation strategy to select for sugar transporters that selectively uptake xylose from culture media.

  107. 107.

    , & Directed evolution of xylose isomerase for improved xylose catabolism and fermentation in the yeast Saccharomyces cerevisiae. Appl. Environ. Microbiol. 78, 5708–5716 (2012).

  108. 108.

    , & Directed evolution of a protein container. Science 331, 589–592 (2011).

  109. 109.

    , & Redesign of extensive protein–DNA interfaces of meganucleases using iterative cycles of in vitro compartmentalization. Proc. Natl Acad. Sci. USA 111, 4061–4066 (2014).

  110. 110.

    , & Directed evolution of polymerase function by compartmentalized self-replication. Proc. Natl Acad. Sci. USA 98, 4552–4557 (2001).

  111. 111.

    et al. CyDNA: synthesis and replication of highly Cy-dye substituted DNA by an evolved polymerase. J. Am. Chem. Soc. 132, 5096–5104 (2010).

  112. 112.

    et al. Molecular breeding of polymerases for amplification of ancient DNA. Nat. Biotechnol. 25, 939–943 (2007).

  113. 113.

    et al. Directed evolution of genetic parts and circuits by compartmentalized partnered replication. Nat. Biotechnol. 32, 97–101 (2014). The authors of this paper evolved enzymes within IVCs by linking the desired phenotype to the expression of Taq polymerase within E. coli. Taq can then be used in PCR to amplify the DNA encoding active library members within the emulsion droplet.

  114. 114.

    , & Directed evolution of a panel of orthogonal T7 RNA polymerase variants for in vivo or in vitro synthetic circuitry. ACS Synth. Biol. (2014).

  115. 115.

    & In vivo continuous directed evolution. Curr. Opin. Chem. Biol. 24, 1–10 (2015).

  116. 116.

    & Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc. Natl Acad. Sci. USA 91, 6808–6814 (1994).

  117. 117.

    et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat. Genet. 44, 101–105 (2012).

  118. 118.

    et al. Directed evolution of a model primordial enzyme provides insights into the development of the genetic code. PLoS Genet. 9, e1003187 (2013).

  119. 119.

    , , & Targeted gene evolution in Escherichia coli using a highly error-prone DNA polymerase I. Proc. Natl Acad. Sci. USA 100, 9727–9732 (2003).

  120. 120.

    et al. Exceptional convergent evolution in a virus. Genetics 147, 1497–1507 (1997).

  121. 121.

    , & Adaptive molecular evolution for 13,000 phage generations: a possible arms race. Genetics 170, 19–31 (2005).

  122. 122.

    , & A system for the continuous directed evolution of biomolecules. Nature 472, 499–503 (2011). This study establishes a technological platform for the continuous evolution of biomolecules by linking the phage life cycle to the desired enzymatic activity.

  123. 123.

    , , & Negative selection and stringency modulation in phage-assisted continuous evolution. Nat. Chem. Biol. 10, 216–222 (2014).

  124. 124.

    , , & A system for the continuous directed evolution of proteases rapidly reveals drug-resistance mutations. Nat. Commun. 5, 5352 (2014).

  125. 125.

    et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).

  126. 126.

    et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels–Alder reaction. Science 329, 309–313 (2010).

  127. 127.

    et al. De novo design of a transmembrane Zn2+-transporting four-helix bundle. Science 346, 1520–1524 (2014).

  128. 128.

    et al. Accurate design of co-assembling multi-component protein nanomaterials. Nature 510, 103–108 (2014).

  129. 129.

    & A designed supramolecular protein assembly with in vivo enzymatic activity. Science 346, 1525–1528 (2014).

  130. 130.

    et al. A de novo protein binding pair by computational design and directed evolution. Mol. Cell 42, 250–260 (2011).

  131. 131.

    & Directed evolution of the peroxidase activity of a de novo-designed protein. Protein Eng. Des. Sel. 25, 445–452 (2012).

  132. 132.

    et al. Optimization of the in-silico-designed kemp eliminase KE70 by computational design and directed evolution. J. Mol. Biol. 407, 391–412 (2011).

  133. 133.

    et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008). This paper describes the computational design of a Kemp elimination catalyst. Subsequent screening yielded improved catalysts for a reaction that is not known to be performed by natural enzymes.

  134. 134.

    et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science 329, 305–309 (2010).

  135. 135.

    & Novel methods for directed evolution of enzymes: quality, not quantity. Curr. Opin. Biotechnol. 15, 291–297 (2004).

  136. 136.

    et al. Single-cell high-throughput screening to identify enantioselective hydrolytic enzymes. Angew. Chem. Int. Ed Engl. 47, 5085–5088 (2008).

  137. 137.

    et al. Selection of horseradish peroxidase variants with enhanced enantioselectivity by yeast surface display. Chem. Biol. 14, 1176–1185 (2007).

  138. 138.

    et al. Directed evolution of sortase A mutants with altered substrate selectivity profiles. J. Am. Chem. Soc. 133, 17536–17539 (2011).



Acknowledgements

This work was supported by the US Defense Advanced Research Projects Agency grants DARPA HR0011-11-2-0003 and DARPA N66001-12-C-4207, the US National Institutes of Health (NIH)/National Institute of General Medical Sciences (NIGMS) (grant R01 GM095501) and the Howard Hughes Medical Institute (HHMI).

Author information


  1. Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA.

    • Michael S. Packer
    • David R. Liu



Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to David R. Liu.

Glossary


Natural selection

A process by which individuals with the highest reproductive fitness pass on their genetic material to their offspring, thus maintaining and enriching heritable traits that are adaptive to the natural environment.

Artificial selection

(Also known as selective breeding). A process by which human intervention in the reproductive cycle imposes a selection pressure for phenotypic traits desired by the breeder.


Libraries

Diverse populations of DNA fragments that are subject to downstream screening and selection.

Library size

The number of variants that are subjected to screening and selection. Library sizes are limited by molecular cloning protocols and/or by host transformation efficiency.

Focused mutagenesis

A strategy of diversification that introduces mutations at DNA regions expected to influence protein activity.

Random mutagenesis

A strategy of diversification that introduces mutations in an unbiased manner throughout the entire gene.

Mutational spectrum

The frequency of each specific type of transition and transversion. The evenness of this spectrum allows more thorough sampling of sequence space.
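
As a rough illustration of how a mutational spectrum might be tabulated, the following Python sketch classifies single-base substitutions as transitions or transversions and reports the frequency of each change. The function names and the list of observed substitutions are hypothetical and are not taken from any study cited here.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must be different bases")
    # Transitions stay within one chemical class (purine or pyrimidine).
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"

def mutational_spectrum(substitutions):
    """Return the relative frequency of each specific substitution type."""
    counts = Counter(f"{ref}>{alt}" for ref, alt in substitutions)
    total = sum(counts.values())
    return {change: n / total for change, n in sorted(counts.items())}

# Hypothetical substitutions observed after sequencing a mutagenized library.
observed = [("A", "G"), ("A", "G"), ("T", "C"), ("G", "A"),
            ("A", "T"), ("C", "G"), ("A", "G"), ("T", "C")]

for change, freq in mutational_spectrum(observed).items():
    ref, alt = change.split(">")
    print(change, classify(ref, alt), f"{freq:.3f}")
```

A spectrum dominated by one or two substitution types, as in this made-up data set, would indicate a biased mutagenesis method that samples sequence space unevenly.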


Transformation

The process by which a cell directly acquires a foreign DNA molecule. A number of protocols allow high-efficiency transformation of microorganisms through treatments with ionic buffers, heat shock or electroporation.

Neutral drift

A process in which mutations accumulate in the presence of a purifying selection pressure that eliminates deleterious mutations. This is in contrast to genetic drift, a process by which mutations fluctuate in frequency in the absence of selection pressure.

Degenerate codons

Codons constructed with a mixed population of nucleotides at a given position, thus sampling all possible amino acids within the constructed libraries. The most popular examples are NNK and NNS (where N can be any of the four nucleotides, K can be G or T, and S can be G or C).
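
To make the NNK and NNS definitions concrete, the following Python sketch expands a degenerate codon into its concrete codons and translates them with the standard genetic code. The helper names `expand` and `summarize` are illustrative assumptions, and only the N, K and S ambiguity codes defined above are handled.

```python
from itertools import product

# Ambiguity codes from the definition above: N = any base, K = G or T, S = G or C.
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG"}

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon such as 'NNK'."""
    pools = [IUPAC.get(base, base) for base in degenerate_codon]
    return ["".join(bases) for bases in product(*pools)]

def summarize(degenerate_codon):
    """Return (number of codons, set of amino acids, list of stop codons)."""
    codons = expand(degenerate_codon)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), amino_acids, stops

for scheme in ("NNK", "NNS", "NNN"):
    n_codons, amino_acids, stops = summarize(scheme)
    print(scheme, n_codons, "codons,", len(amino_acids), "amino acids, stops:", stops)
```

Under the standard genetic code, both NNK and NNS reduce the 64 codons of NNN to 32 while still encoding all 20 amino acids, and each retains only a single stop codon (TAG).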

Epistatic interactions

Non-additive effects between mutations (for example, mutational synergy or synthetic lethality). As a result, the sequential acquisition of mutations is not always equivalent to mutational co-occurrence.
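
One common way to quantify non-additivity, sketched here under the assumption that mutational effects are measured on a log scale (so that independent effects add), is to compare the double mutant with the sum of the single-mutant effects. The function name and the numerical values are hypothetical.

```python
def epistasis(effect_a, effect_b, effect_ab):
    """Deviation of the double mutant from the additive expectation.

    All effects are changes relative to the parent protein on a log scale.
    Zero means the mutations act independently; a positive value indicates
    synergy and a negative value indicates antagonism.
    """
    return effect_ab - (effect_a + effect_b)

# Hypothetical log-scale activity changes relative to the parent protein.
single_a = 0.5
single_b = 0.25
double_ab = 1.5

epsilon = epistasis(single_a, single_b, double_ab)
print(f"additive expectation = {single_a + single_b:.2f}")
print(f"observed double mutant = {double_ab:.2f}")
print(f"epistasis = {epsilon:+.2f}")
```

In this made-up case the double mutant exceeds the additive expectation, so acquiring the two mutations one at a time would underestimate their combined benefit.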

Homologous recombination

A process by which separate pieces of DNA swap genetic material, guided by the annealing of complementary DNA fragments.

Passenger mutations

(Also known as hitchhiker mutations). Unnecessary mutations that are enriched in a population owing to co-occurrence with a highly beneficial linked mutation.


Transduction

The process by which a viral vector delivers a foreign DNA molecule to a cellular host.

Evolutionary potential

The capacity of a protein to take on new functions through evolution. High thermostability allows for necessary but destabilizing mutations, and functional diversity of homologues is a demonstration of previous evolution in nature.

Surrogate substrates

Substrate analogues that are permissive of enzymatic conversion but that, upon catalysis, exhibit chemical rearrangements that lead to altered optical properties, including visible colour, relief of fluorophore quenching, shifted fluorophore excitation or emission, and downstream chemiluminescence.

Fluorescence-activated cell sorting

(FACS). A flow cytometry method in which an aqueous suspension of cells or cell-like compartments is measured for fluorescence (often at multiple wavelengths) one cell at a time and subsequently separated based on a fluorescence threshold.

Negative screen

A screening method that involves depletion of an undesired phenotype.

Positive screening

Enrichment for a desired activity such as improved kinetics, tolerance to unnatural conditions and acceptance of new substrates.

Transformation bottleneck

The efficiency at which DNA library members are transferred into the host organism, thus restricting the number of variants that can be assessed by in vivo selection and screening.
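
The effect of this bottleneck can be estimated with a back-of-the-envelope calculation. The Python sketch below assumes that every variant is equally represented and that each transformant is an independent draw (a Poisson approximation). The transformant count is a hypothetical figure, and the function names are illustrative.

```python
import math

def nnk_diversity(positions):
    """DNA-level diversity when `positions` codons are each randomized with NNK."""
    return 32 ** positions

def expected_coverage(diversity, transformants):
    """Expected fraction of distinct variants sampled at least once.

    Assumes uniform representation and independent sampling, so that the
    probability of missing a given variant is exp(-transformants / diversity).
    """
    return 1.0 - math.exp(-transformants / diversity)

# Hypothetical transformation yield; real values depend on host and protocol.
transformants = 10 ** 8

for positions in (4, 6, 8):
    diversity = nnk_diversity(positions)
    coverage = expected_coverage(diversity, transformants)
    print(f"{positions} NNK positions: {diversity:.2e} variants, "
          f"expected coverage {coverage:.4f}")
```

Because theoretical diversity grows exponentially with the number of randomized positions, coverage falls off sharply once the diversity exceeds the number of transformants. This is one reason focused libraries restrict the number of positions that are diversified.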

Auxotroph complementation

The ability of functional library members to resolve a metabolic defect in the host, leading to replication of DNA that encodes active library members.
