Naturally occurring enzymes are extraordinarily efficient catalysts1. They bind their substrates in a well-defined active site with precisely aligned catalytic residues to form highly active and selective catalysts for a wide range of chemical reactions under mild conditions. Nevertheless, many important synthetic reactions lack a naturally occurring enzymatic counterpart. Hence, the design of stable enzymes with new catalytic activities is of great practical interest, with potential applications in biotechnology, biomedicine and industrial processes. Furthermore, the computational design of new enzymes provides a stringent test of our understanding of how naturally occurring enzymes work. In the past several years, there has been exciting progress in designing new biocatalysts2, 3.
Here we describe the use of our recently developed computational enzyme design methodology4 to create new enzyme catalysts for a reaction for which no naturally occurring enzyme exists: the Kemp elimination5, 6. The reaction, shown in Fig. 1a, has been extensively studied as an activated model system for understanding the catalysis of proton abstraction from carbon—a process that is normally restricted by high activation-energy barriers7, 8.
Figure 1: Reaction scheme and catalytic motifs used in design.

a, The Kemp elimination proceeds by means of a single transition state, which can be stabilized by a base deprotonating the carbon and the dispersion of the resulting negative charge; a hydrogen bond donor can also be used to stabilize the partial negative charge on the phenolic oxygen. b, Examples of active site motifs highlighting the two choices for the catalytic base (a carboxylate (left) or a His–Asp dyad (right)) used for deprotonation, and a π-stacking aromatic residue for transition state stabilization. For each catalytic base, all combinations of hydrogen bond donor groups (Lys, Arg, Ser, Tyr, His, water or none) and π-stacking interactions (Phe, Tyr, Trp) were input as active site motifs into RosettaMatch.
Computational design method
The first step in our protocol for designing new enzymes is to choose a catalytic mechanism and then to use quantum mechanical transition state calculations to create an idealized active site with protein functional groups positioned so as to maximize transition state stabilization (Fig. 1b). The key step for the Kemp elimination is deprotonation of a carbon by a general base. We chose two different catalytic bases for this purpose: first, the carboxyl group of an aspartate or glutamate side chain, and, second, the imidazole of a histidine positioned and polarized by the carboxyl group of an aspartate or glutamate (we refer to this combination as a His–Asp dyad). The two choices have complementary strengths and weaknesses. The advantage of the carboxylate is that it is likely to be in the basic (deprotonated) form, but partial desolvation of the charged group in an apolar environment (to increase its relatively weak basicity) could destabilize the protein and further desolvation by the substrate could oppose binding. Although histidine is a better general base than a carboxylate, it is necessary to regulate both its pKa and its tautomeric state. Coupling the histidine with a base such as aspartate in a dyad serves to both position the histidine and increase its basicity. If the pKa of histidine is raised too high, however, it can become doubly protonated, rendering it ineffective as a base.
For both the carboxylate- and histidine-based mechanisms, we included additional functional groups in the idealized active sites to further facilitate catalysis using both quantum mechanical and classical methods9. A hydrogen bond donor was used to stabilize the developing negative charge on the phenolic oxygen in the otherwise hydrophobic active site. Catalytic motifs lacking the H-bond donor were also tested, because the developing negative charge is relatively small in the transition state and can be easily solvated by water9, 10. For each choice of catalytic site composition, density functional theory quantum-mechanical methods11, 12, 13 were used to optimize the placement and orientations of the catalytic groups around the transition state for maximal stabilization (see Methods). Finally, because stabilization of the transition state by charge delocalization is a key factor in catalysis of the Kemp elimination5, 6, 7, 10, 14, we chose to stack aromatic amino acid side chains on the planar transition state (Fig. 1b) using idealized π-stacking geometries15.
We next used the RosettaMatch hashing algorithm4 to search for constellations of protein backbone positions capable of supporting these idealized active sites in a large set of stable protein scaffolds with ligand-binding pockets and high-resolution crystal structures. As described in the Methods, the His–Asp dyad required generalizing RosettaMatch to handle side chains, such as the Asp, for which the range of allowed positions is referenced to another catalytic side chain rather than to the transition state; this was accomplished by identifying, for each His rotamer in a scaffold, the set of Asp rotamers that can provide the supporting hydrogen bond. The scaffold set spans a broad range of protein folds, including TIM barrels, β-propellers, jelly rolls, Rossmann folds and lipocalins, amongst others (Supplementary Table 3). In a typical search, more than 100,000 possible realizations of the input idealized active site were found in the scaffold set. For each of these 'matches', gradient-based minimization16 was used to optimize the rigid body orientation of the transition state and the torsional degrees of freedom of the catalytic side chains to best satisfy all catalytic geometrical constraints. Subsequently, residues surrounding the transition state were redesigned, using the Rosetta design methodology for proteins17 and small molecules18, to maximize the stability of the active site conformation and the affinity for the transition state while maintaining protein stability. Designs were screened for compatibility with substrate and product and were ranked on the basis of the catalytic geometry and the computed transition-state-binding energy.
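The matching idea can be illustrated with a deliberately simplified sketch. It is not the RosettaMatch implementation: it assumes that each catalytic side-chain rotamer has already been converted into the transition state position it implies, and it bins only the three translational coordinates, whereas the Methods describe a six-dimensional hash. The names `BIN_WIDTH`, `build_hash` and `find_matches`, and the toy coordinates in the usage note, are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import floor

BIN_WIDTH = 1.0  # hypothetical bin width in angstroms, chosen for illustration


def bin_key(xyz, width=BIN_WIDTH):
    """Map a 3D point to the integer index of the grid cell that contains it."""
    return tuple(floor(c / width) for c in xyz)


def build_hash(placements, width=BIN_WIDTH):
    """Hash transition state placements for one catalytic residue type.

    placements: iterable of (backbone_position, rotamer_id, ts_xyz), where
    ts_xyz is the transition state position implied by that rotamer.
    """
    table = defaultdict(list)
    for position, rotamer, ts_xyz in placements:
        table[bin_key(ts_xyz, width)].append((position, rotamer))
    return table


def find_matches(table_a, table_b):
    """Return rotamer pairs at different backbone positions that place the
    transition state in the same grid cell."""
    matches = []
    for key in table_a.keys() & table_b.keys():
        for (pos_a, rot_a), (pos_b, rot_b) in product(table_a[key], table_b[key]):
            if pos_a != pos_b:  # one residue cannot play both catalytic roles
                matches.append(((pos_a, rot_a), (pos_b, rot_b)))
    return sorted(matches)
```

With toy placements such as `build_hash([(231, 0, (1.2, 3.4, 5.6))])` for the base and `build_hash([(110, 2, (1.7, 3.1, 5.9))])` for the stacking residue, `find_matches` pairs position 231 with position 110 because both imply a transition state in the same cell. Each placement is hashed once and only shared cells are compared, so the cost grows roughly linearly with the number of rotamers rather than with the number of rotamer combinations. A plain grid like this one misses placements that fall just across a cell boundary, which a production implementation would have to handle.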
A steady enrichment of the fraction of designs in the TIM barrel scaffold was observed throughout the enzyme design process. TIM barrel scaffolds represent 25% of the proteins in the input scaffold set, 43% of the initial matches, and 71% of the low-energy designs. Inspection of the designs suggests that the binding pockets in TIM barrel scaffolds were favoured because of the large number of take-off positions (all positions around the barrel pointing towards the cavity) for both the catalytic residues and the additional transition-state-binding residues optimized in the design process; the former favoured TIM barrel matches, and the latter favoured low-energy designs in TIM scaffolds. The TIM barrel is the most widespread and catalytically diverse fold in naturally occurring enzymes; our in silico design process seems to be drawn towards the same structural features as naturally occurring enzyme evolution.
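The enrichment described above can be expressed as a ratio relative to the starting scaffold set. The following sketch uses only the three percentages quoted in the text; the function names and the choice to report both a fold enrichment and an odds value are illustrative assumptions, not part of the published analysis.

```python
# Fractions of TIM barrel scaffolds at each stage, as quoted in the text.
tim_fraction = {
    "scaffold set": 0.25,
    "initial matches": 0.43,
    "low-energy designs": 0.71,
}


def enrichment(fraction, baseline):
    """Fold enrichment of a fraction relative to a baseline fraction."""
    return fraction / baseline


def odds(fraction):
    """Odds of picking a TIM barrel rather than any other fold."""
    return fraction / (1.0 - fraction)


baseline = tim_fraction["scaffold set"]
for stage, fraction in tim_fraction.items():
    print(f"{stage}: {enrichment(fraction, baseline):.2f}-fold, odds {odds(fraction):.2f}")
```

On these numbers the TIM barrel fraction rises 1.72-fold at the matching stage and 2.84-fold among the low-energy designs, so matching and design each contribute to the overall enrichment.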
Experimental characterization
Following the active site design, a total of 59 designs in 17 different scaffolds were selected for experimental characterization. Out of the 59 designs, 39 use an Asp or Glu as the general base and 20 use a His–Asp or His–Glu dyad. Eight of the designs showed measurable activity in Kemp elimination assays in an initial activity screen (Table 1; see Supplementary Table 4 for sequence information and Methods for experimental details). For each of these eight designs, mutation of the catalytic base (to Ala or Gln/Asn) markedly decreased the activity or abolished catalysis completely, suggesting that the observed activity results from the designed active site (Table 1; for some examples, see Fig. 2a). The designs have kcat/Km values in the range of 6 to 160 M⁻¹ s⁻¹ (Table 1 and Fig. 2b); it was not possible to obtain saturation kinetics in all cases (for example, see KE10 (open squares) and KE61 (open triangles) in Fig. 2b) owing to low substrate solubility. Both catalytic motifs were used in active designs; of the two most active catalysts, which show a rate acceleration of roughly 10⁵ and a kcat/Km of about 100, one uses Glu as the base and the other uses the His–Asp dyad. All designs exhibited multiple turnovers (≥7), a prerequisite for efficient catalysis.
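The difficulty of obtaining saturation kinetics follows directly from the Michaelis–Menten equation: when the substrate concentration is far below Km, the rate depends only on the ratio kcat/Km, so the two parameters cannot be separated. The sketch below shows this under stated assumptions; the parameter values in the usage note are hypothetical and are not taken from Table 1.

```python
def mm_rate(substrate, kcat, km):
    """Michaelis-Menten rate per enzyme, v/[E], in s^-1.

    substrate and km are in M; kcat is in s^-1.
    """
    return kcat * substrate / (km + substrate)


def linear_rate(substrate, kcat, km):
    """Low-substrate limit of the same equation: v/[E] = (kcat/Km) * [S]."""
    return (kcat / km) * substrate


def relative_deviation(substrate, kcat, km):
    """Fractional amount by which the linear limit overestimates the true rate."""
    exact = mm_rate(substrate, kcat, km)
    return (linear_rate(substrate, kcat, km) - exact) / exact
```

For a hypothetical catalyst with `kcat = 0.3` s⁻¹ and `km = 0.003` M, the relative deviation equals `substrate / km`, so at 0.3 mM substrate the linear limit differs from the full equation by only 10%. Data limited to that range constrain kcat/Km well but leave kcat and Km individually poorly determined, which is why the Methods treat fitted Km values above 1 mM as approximate.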
Figure 2: Kinetic characterization of designed catalysts.

a, Catalytic activity was measured by monitoring the product formation over time for KE59 (open circles) and KE70 (filled circles) at 400 μM substrate concentration. The y axis is the product concentration divided by the catalyst concentration, which corresponds to the number of substrate turnovers. Deleting the catalytic base in both designs largely eliminates catalytic activity (open and filled triangles). Mutating Asp 44 of the catalytic dyad of KE70 to Asn (filled squares) causes a 2.5-fold reduction in activity. b, Michaelis–Menten plots for a representative selection of designed catalysts. The reaction velocity v divided by catalyst concentration is plotted on the y axis and the substrate concentration on the x axis. Some designs (for example, KE10 (open squares) and KE61 (open triangles)) show no saturation up to the maximal substrate solubility.
Models for these two most active designs are shown in Fig. 3. In the KE59 design (Fig. 3a), which is in a TIM barrel scaffold, Glu 231 is the catalytic base and Trp 110 facilitates charge delocalization by π-stacking to the transition state. Additionally, Leu 108, Ile 133, Ile 178, Val 159 and Ala 210 create a tightly packed hydrophobic pocket that envelops the non-polar substrate. The polar residues Ser 180 and Ser 211 provide hydrogen-bonding interactions with the nitro group of the transition state. Mutation of the catalytic base Glu 231 to Gln abolished catalytic activity (Table 1 and Fig. 2a, open triangles). Attempts to add a hydrogen bond donor to stabilize the negative charge developing at the phenolic oxygen through a Gly 131 to Ser mutation caused a ninefold reduction in kcat/Km (Table 1), perhaps owing to unfavourable electrostatic interactions between the oxygen atoms on the serine and substrate; this large effect suggests that the transition-state-binding site is quite well defined. The aromatic-rich pocket and carboxylate base are reminiscent of the active site of the Kemp catalytic antibody 34E4 (ref. 10).
Figure 3: Computational design models of the two most active catalysts.

a, KE59 uses indole-3-glycerolphosphate synthase from Sulfolobus solfataricus as a scaffold. The transition state model is almost completely buried, with loops covering the active site. The mostly hydrophobic residues in the active site pocket pack the transition state model tightly, providing high shape complementarity (shape complementarity = 0.84; ref. 29). The polar residue Ser 211 interacts with the nitro group of the transition state to promote binding. The key catalytic residues (Glu 231 and Trp 110) are depicted in cyan. b, The deoxyribose-phosphate aldolase from E. coli is the scaffold for KE70. The shorter loops leave the active-site pocket freely accessible for the substrate. The transition state is surrounded by hydrophobic residues that provide high shape complementarity (shape complementarity = 0.77; ref. 29). His 16 and Asp 44 (in cyan) constitute the catalytic dyad whereas Tyr 47 (in cyan) provides π-stacking interactions.
The KE70 design (Fig. 3b) uses the His–Asp dyad mechanism. Asp 44 positions and polarizes His 16 to optimally deprotonate the substrate. Tyr 47 π-stacks above the transition state, and together with Ile 201, Ile 139, Val 167, Ala 18, Ala 102 and Trp 71 creates a tight hydrophobic pocket around the transition state. The active site is again in a TIM barrel scaffold with the His–Asp dyad near the bottom of the site. Mutation of the catalytic base His 16 to Ala abolished catalytic activity (Table 1 and Fig. 2a, filled triangles), whereas mutating Asp 44 of the catalytic dyad to Asn produced an approximately 2.5-fold reduction (Table 1 and Fig. 2a, filled squares). In another design using a His–Asp dyad as general base (KE71), the analogous Asp-to-Asn mutation reduced activity sixfold (Table 1) whereas the His-to-Ala mutation abolished catalysis (Table 1).
High-resolution structural information on designed proteins is essential to validate the accuracy of the design methodology. We were able to grow crystals and obtain a high-resolution structure of one of the early Glu-based designs, KE07 (see Supplementary Information for details). As shown in Fig. 4, the crystal structure and design model are virtually superimposable, with an active site (6.0 Å around the transition state) root mean squared deviation (r.m.s.d.) of 0.95 Å mostly reflecting modest side-chain rearrangements. The similarity between the design model and the crystal structure suggests that the active sites in our new enzymes resemble those in the corresponding design models. The subtle deviations in the backbone indicate loop regions in which explicitly modelling backbone flexibility may yield improved designs.
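The active site comparison quoted above rests on two simple operations: selecting atoms within a cutoff of the transition state and computing the r.m.s.d. over matched atoms. The sketch below illustrates both in plain Python. It assumes that the two structures have already been superimposed and that atoms are supplied in matching order; it does not perform an optimal superposition, and the function names are hypothetical rather than taken from the authors' software.

```python
from math import dist, sqrt


def select_within(atoms, ts_atoms, cutoff=6.0):
    """Indices of atoms lying within `cutoff` angstroms of any transition state atom."""
    return [
        i for i, atom in enumerate(atoms)
        if any(dist(atom, ts) <= cutoff for ts in ts_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally ordered coordinate lists.

    Assumes the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return sqrt(squared / len(coords_a))
```

Restricting the calculation to backbone atoms or extending it to all side-chain atoms of the selected residues would give the two kinds of values reported for the KE07 active site.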
Figure 4: Comparison of the designed model of KE07 and the crystal structure.

The crystal structure (cyan) was solved in the unbound state and shows only modest rearrangement of active site side chains compared to the designed structure (grey) modelled in the presence of the transition state (yellow, transparent). (Backbone r.m.s.d. for the active site is 0.32 Å versus 0.95 Å for the active site including the side chains.) The observed electron density around relevant amino acids in the active site is shown in Supplementary Fig. 6. KE07 contains 13 mutations compared to the starting template scaffold (PDB code 1thf).
The crystal structure also revealed that Lys 222 makes a salt bridge to the catalytic Glu 101 in the absence of substrate, whereas in the designed model the ammonium of the lysine stabilizes the developing phenoxide in the transition state. Forming the productive transition state complex thus requires breaking of the salt bridge, and therefore elimination of the salt bridge in the unbound state would be expected to improve catalysis. We tested this prediction by substituting the lysine with an alanine, and this resulted in a 2.5-fold increase in kcat/Km (Table 1).
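To put the size of this effect in energetic terms, a fold change in kcat/Km can be converted into an apparent change in activation free energy using the standard transition state theory relation. The worked estimate below is an illustration added here, not a value reported in the paper, and it assumes a temperature of 298 K, which the text does not state.

```latex
\Delta\Delta G^{\ddagger}
  = -RT \,\ln\!\left[
      \frac{(k_{\mathrm{cat}}/K_{\mathrm{m}})_{\mathrm{Lys222Ala}}}
           {(k_{\mathrm{cat}}/K_{\mathrm{m}})_{\mathrm{KE07}}}
    \right]
  \approx -(0.593\ \mathrm{kcal\,mol^{-1}})\,\ln(2.5)
  \approx -0.54\ \mathrm{kcal\,mol^{-1}}
```

Under that assumption, a 2.5-fold increase corresponds to roughly half a kilocalorie per mole, which is small compared with the strength typically attributed to a buried salt bridge.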
Directed evolution
In vitro evolution has been shown to markedly improve the stability, expression and activity of enzymes, and is currently the most widely used and successful approach for refining biocatalysts19. However, in vitro enzyme evolution generally requires a starting point with at least a low level of the desired activity, which is then optimized by repeated rounds of mutation and selection (for a notable exception, see ref. 20). We reasoned that in vitro evolution would be an excellent complement to our computational design efforts. The design calculations ensure that key catalytic functional groups are correctly positioned around the transition state, and, as demonstrated above, can generate active catalysts without requiring any starting activity. Thus, computational design can potentially provide excellent starting points for in vitro evolution. In contrast, the design process does not explicitly model configurational entropy changes, longer range second-shell interactions, and dynamics effects that can be important for efficient turnover; these shortcomings can potentially be remedied by directed evolution. Directed evolution can be valuable both in improving the designed catalysts and in stimulating improvements in the computational design methodology by shedding light on what is missing from the designs.
To investigate the extent to which in vitro evolution methods can improve computationally designed enzymes, we initiated evolution experiments on KE07—the early design for which the crystal structure was determined. Seven rounds of random mutagenesis and shuffling (also including synthetic oligonucleotides that expanded the diversity at selected residues), followed by screens in microtitre plates, yielded variants that had 4–8 mutations relative to KE07 and an improvement of >200-fold in kcat/Km (Table 2). Notably, the key aspects of the computational design, including the identities of the catalytic side chains, were not altered by the evolutionary process (indeed, mutating the catalytic base Glu 101 abolished the catalytic activity of both the designed template and its evolved variants; Table 2). Instead, the mutations were often seen in residues adjacent to designed positions (for example, Val 12, Ile 102, Gly 202), and thus provide subtle fine-tuning of the designed enzyme. Some mutations, such as Gly202Arg, are likely to increase the flexibility of regions neighbouring the active site. The hydrophobic residues Ile 7 and Ile 199 at the bottom of the active site were frequently mutated to polar or charged residues (the most common mutation being Ile7Asp), which may hold Lys 222 in position to stabilize the developing negative charge in the transition state while preventing interaction of Lys 222 with Glu 101. Consistent with this idea, the pKa of the catalytic Glu 101 shifts from <4.5 to 5.9 in the evolved variant with the Ile7Asp mutation (for details, see Supplementary Information). Although the Lys222Ala mutation increases the activity of the original KE07, it significantly decreases the activity of the evolved variants, perhaps owing to the uncompensated additional negative charge.
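The pKa values quoted above can be related to the protonation state of Glu 101 under assay conditions through the Henderson–Hasselbalch equation. The sketch below is a simple illustration that treats the glutamate as an isolated titratable group; the function name is hypothetical, the pH of 7.25 is the assay buffer pH given in the Methods, and 4.5 is used only as the upper bound reported for the original design.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of an acid present in its deprotonated (basic) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ASSAY_PH = 7.25  # HEPES buffer pH stated in the Methods

for label, pka in (("KE07, upper bound", 4.5), ("Ile7Asp variant", 5.9)):
    print(f"{label}: pKa {pka}, fraction deprotonated {fraction_deprotonated(pka, ASSAY_PH):.3f}")
```

Both values leave the carboxylate more than 95% deprotonated at pH 7.25, so on this simple model the shift changes the intrinsic basicity of the group far more than it changes the population of the active form.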
Conclusions
The marked increase in catalytic activity and in turnover (>1,000 catalytic cycles were observed for the evolved variants), achieved by screening a number of variants that is small by molecular evolution standards (800–1,600 clones per round), bodes well for future combinations of computational design and molecular evolution. In particular, the in vitro evolution of the most active of the computational designs, for example, KE59 or KE70, has the potential to yield highly active catalysts for the Kemp elimination reaction. We anticipate the successful use of the combination of computational design and molecular evolution that we have described here for a wide range of important reactions in the years to come.
The challenge of generating new biocatalysts has led to several successful experimental strategies20, 21, 22. In particular, the Kemp elimination comprises a well-defined model for catalysis of proton transfer from carbon—a highly demanding reaction and a rate-determining step in numerous enzymes. It has therefore been the subject of several attempts to generate enzyme-mimics and models (such as catalytic antibodies23, promiscuous protein catalysts24 and enzyme-like polymers14). The catalytic parameters of the new enzymes described here are comparable to the most active catalysts of the Kemp elimination of 5-nitro-benzisoxazole described thus far, and provide further insights into the makings of an enzyme. Comparison with the catalytic antibodies23 highlights the major shortcoming of many of the designs noted above—that is, their relatively weak binding of the substrate. Although the computational design methodology has the advantage of being able to explicitly place key catalytic residues, this may come at a cost of overall substrate and transition-state binding affinity. Consistently achieving high affinity to the transition state and high turnover numbers is a challenge that we are currently approaching by introducing scaffold backbone flexibility into the design process. This should enable us to create higher affinity binding sites formed by more precisely positioned constellations of binding and catalytic residues.
The computational methodology described here can be readily generalized to design catalysts for more complex multistep reactions25. The combination of computational enzyme design to create the overall active site framework for catalysing a synthetic chemical reaction with molecular evolution to fine-tune and incorporate subtleties not yet modelled in the design methodology is a powerful route to create new enzyme catalysts for the very wide range of chemical reactions for which naturally occurring enzymes do not exist. Equally importantly, computational design provides a critical testing ground for evaluating and refining our understanding of how enzymes work.
Methods Summary
Computational design
Transition state geometries were computed at the B3LYP/6-31G(d) level for idealized active sites containing either a carboxylate or an imidazole-carboxylate dyad as the general base. Aromatic side chains were placed above and below the transition state using idealized π-stacking geometries15. A six-dimensional hashing procedure4 was applied to find transition state placements in a large set of protein scaffolds (Supplementary Table 3) that were consistent with the catalytic geometry. Residues surrounding the catalytic side chains and transition state were repacked and redesigned17, 18 to optimize steric, coulombic and hydrogen-bonding interactions with the transition state and associated catalytic residues.
Experimental characterization
The proteins were expressed in Escherichia coli BL21(DE3) using pET29b (Novagen) and purified over a Ni-NTA column (Qiagen). The proteins (1 μM to 10 μM) were assayed in 25 mM HEPES (pH 7.25) and 100 mM NaCl at 250 μM substrate concentration for the initial screening, and substrate dilutions from 1 mM to 11 μM were used for kinetic characterization. Kinetic parameters were determined in at least three independent measurements. Fitted Km values above 1 mM (and their corresponding kcat values) are necessarily approximate. Site-directed mutagenesis of catalytic residues and independent protein purifications by different protocols/laboratories were carried out to exclude possible contaminating enzymes (Supplementary Information).
In vitro evolution
Gene libraries of KE07 were created by random mutagenesis using error-prone PCR with 'wobble' base analogues dPTP and 8-oxo-dGTP26 using the Genemorph PCR mutagenesis kit (Stratagene), and by DNA shuffling of the most active variants27. In certain rounds, shuffling included the spiking of synthetic oligonucleotides that expanded the diversity at selected residues28. In each round, the cleared lysates of 800–1,600 individual colonies were assayed for hydrolysis of 5-nitrobenzisoxazole (0.125 mM) by following product formation at 380 nm. The most active clones were sub-cloned and sequenced, and the encoding plasmids were used as templates for subsequent rounds of mutagenesis and screening.
Full methods accompany this paper.
bond vector placement. The optimal catalytic geometry and the associated constraints for both reaction mechanisms are shown in
10²¹ possible combinations for creating the catalytic motif, which would be computationally intractable to enumerate. By using the linear-scaling RosettaMatch algorithm, this number was reduced to a much more manageable number (8.7
μg ml⁻¹ kanamycin). Individual colonies were inoculated into 2YT supplemented with 50
15 hours at 37 °C. Overnight cultures (20
3 per gene, and mutations were mainly of the transition type. The first round of KE07 evolution yielded active variants with lysate activity up to fivefold higher than that of the starting point KE07.
