Abstract
Darwinian evolution has given rise to all the enzymes that enable life on Earth. Mimicking natural selection, scientists have learned to tailor these biocatalysts through recursive cycles of mutation, selection and amplification, often relying on screening large protein libraries to productively modulate the complex interplay between protein structure, dynamics and function. Here we show that by removing destabilizing mutations at the library design stage and taking advantage of recent advances in gene synthesis, we can accelerate the evolution of a computationally designed enzyme. In only five rounds of evolution, we generated a Kemp eliminase (an enzymatic model system for proton transfer from carbon) that accelerates the proton abstraction step >10^8-fold over the uncatalyzed reaction. Recombining the resulting variant with a previously evolved Kemp eliminase HG3.17, which exhibits similar activity but differs by 29 substitutions, allowed us to chart the topography of the designer enzyme’s fitness landscape, highlighting that a given protein scaffold can accommodate several, equally viable solutions to a specific catalytic problem.
Main
Natural evolution routinely generates powerful biocatalysts by elegantly navigating protein sequence space1. John Maynard Smith likened this process to a journey between functional proteins in a vast landscape2. Active enzymes, although rare, are connected to other functional catalysts in a continuous network that can be traversed by unit mutational steps. This connectivity is crucial for successful evolution via point mutations; without it, single mutations would be highly likely to reduce fitness, halting evolution2. Protein engineers have learned to traverse this sequence network through ‘directed evolution’, which involves iterative cycles of mutagenesis and screening or selection to develop improved proteins in the laboratory3. This deliberate exploration of sequence space revealed that proteins can rapidly adapt to new selection criteria, implying ‘easily’ navigable paths within the fitness landscape3,4,5,6. Following these paths has led to the improvement of diverse proteins for industrial or therapeutic applications7.
Nevertheless, a definitive understanding of the structural and molecular factors that govern the efficiency with which an enzyme catalyzes a reaction has remained elusive3, preventing scientists from reliably predicting amino acid sequences that encode a specific function. Today, enzyme engineering still relies on testing large libraries of variants, consisting of hundreds to millions of distinct protein sequences, to identify solutions suitable for application8. Even considering advanced high-throughput screening and computational methods7, the laboratory evolution of enzymes remains a resource- and time-consuming process9,10. Unfortunately, identifying the shortest evolutionary path to attain a desired protein function remains challenging as many evolutionary trajectories lead downhill, away from improved function. Most single-site mutations are either deleterious (30–50%) or neutral (50–70%), with only as few as 1% of residue exchanges being beneficial3,11,12,13,14,15. To increase the efficiency of the search process, limiting sequence space exploration to favorable amino acid exchanges would consequently be beneficial. While predicting which mutations will improve function and are thus worth exploring is a difficult task16,17,18, the prediction and exclusion of deleterious mutations is a potentially much more straightforward exercise19.
Here we demonstrate that removing deleterious mutations from protein libraries can greatly accelerate the search of protein fitness landscapes. For this purpose, we selected the de novo-designed Kemp eliminase HG3 as a test system20,21. HG3 catalyzes the Kemp elimination, a model for proton transfer from carbon (Fig. 1a)22. It was generated using quantum mechanical (QM) calculations to predict an idealized active site pocket for transition state stabilization20,23. Although the artificial enzyme showed modest activity compared to other de novo designs and catalytic antibodies elicited for the same reaction24,25, it was optimized over 17 rounds of mutagenesis and screening21. The improved variant HG3.17, which contained 17 additional mutations, was more active, better expressed and more thermostable compared to its ancestor21 and approaches the efficiency of natural enzymes that promote metabolically important proton transfers21,26. A retrospective analysis based on X-ray crystallography and computation showed that eight of HG3.17’s mutations are sufficient to reach ~80% of its activity27.
Results
Enzyme library design and construction
For rapid optimization of Kemp eliminase HG3, we designed complex gene libraries that excluded destabilizing mutations. The latter were identified by calculating ΔΔG values for the free energy change upon mutation28 for all 5,757 possible single amino acid substitutions in the HG3 sequence (19 (non-wild-type amino acids) × 303 (sequence length)) using a cartesian ΔΔG protocol implemented in the Rosetta Protein Modeling Suite29. Notably, this analysis indicated that approximately half of all possible single-site mutations (49.3%; 2,839 of 5,757) in HG3.17 could have been removed from the library design space without losing a single successful mutation (Fig. 1c–e).
Using this finding to guide library design, we fully saturated residues found within a 6 Å radius of the bound 6-nitrobenzotriazole (3) as well as all residues lining the tunnel leading to the active site (Fig. 2a) to avoid potential bias caused by knowledge from prior evolution experiments21. Additional mutations were only considered if the predicted ΔΔG of the resulting variant was below −0.5 Rosetta Energy Units (REU). This energy threshold was set for experimental reasons (oligo pool size and screening capacity) and limited the screening effort to 30% of all possible single-site mutations (ca. 1,800 variants per round). In addition, a small number of single-site mutations suggested by a HotSpot Wizard30 analysis, which is based on sequence, structure and evolutionary information, were included (Fig. 2b).
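The selection rule described above can be sketched in a few lines. This is a minimal illustration only: the dictionary `ddg` (mapping a position and residue to a predicted ΔΔG in REU), the set `saturated` (positions near the ligand or tunnel) and all function names are hypothetical and are not taken from the authors' scripts. Treating a substitution without a prediction as excluded is likewise an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def all_single_substitutions(sequence):
    """Enumerate every (position, new_residue) pair; positions are 1-based."""
    return [
        (pos, aa)
        for pos, wild_type in enumerate(sequence, start=1)
        for aa in AMINO_ACIDS
        if aa != wild_type
    ]

def design_library(sequence, ddg, saturated, threshold=-0.5):
    """Keep all substitutions at saturated positions; elsewhere keep only
    substitutions whose predicted ddG (REU) is below the threshold."""
    library = []
    for pos, aa in all_single_substitutions(sequence):
        if pos in saturated:
            library.append((pos, aa))
        # Assumption: a substitution with no prediction is left out.
        elif ddg.get((pos, aa), float("inf")) < threshold:
            library.append((pos, aa))
    return library

# A 303-residue sequence has 19 x 303 = 5,757 single substitutions.
print(len(all_single_substitutions("A" * 303)))  # 5757
```

For a toy three-residue sequence with one saturated position and one stabilizing prediction elsewhere, `design_library` returns 19 + 1 = 20 variants.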
The single-site variant libraries were physically constructed using mixtures of unique DNA oligonucleotides (oligo pools) of limited length (200 bp) covering the entire HG3 gene. Full genes were assembled by overlap extension PCR31 using eight customized oligo fragments (Fig. 2d,e and Extended Data Fig. 1). The resulting gene libraries were used to transform Escherichia coli BL21 (DE3), and the corresponding Kemp eliminase variants were produced and assayed in cell lysates by following the formation of colored salicylonitrile (2) at 380 nm. Beneficial single mutations were combined in small combinatorial libraries and screened to identify the parent for the next evolution cycle (Extended Data Fig. 2).
This engineering procedure was carried out five times in total. Each cycle consisted of computational filtering (stability predictions in the frame of the parent gene), constructing and screening of the single-site variant libraries to identify favorable amino acid substitutions (on average, five to ten beneficial substitutions per round were identified showing a 1.2- to 2.4-fold improvement over the parent enzyme), followed by analyzing a combinatorial library built from identified beneficial mutations. Using this strategy, Kemp eliminase activity was boosted >450-fold (Figs. 2c and 3d and Extended Data Fig. 2). Because oligos synthesized as pools can contain errors32,33,34, we sequenced the initial libraries and confirmed a coverage of >50% of the targeted mutations. We judged this acceptable as our enzyme engineering protocol was designed to interrogate all nondestabilizing mutations in each engineering cycle. In analogy to natural evolution, this sampling strategy allowed us to observe instances of unit mutational step reversals (round 1: N166S/round 2: S166N) and the stepwise fine-tuning of key amino acid positions (round 3: T54I/round 5: I54V and round 4: A125C/round 5: C125V; Fig. 3a,c).
Biochemical characterization of optimized Kemp eliminases
Thanks to this filtering strategy, only five rounds of evolution were needed to identify the highly active Kemp eliminase HG3.R5. Notably, HG3.R5 is characterized by 16 mutations, only three of which target the same residues (K50, Q90 and A125) as HG3.17. Overall, however, the two proteins have only one mutation in common, K50Q, which stabilizes the developing negative charge in the transition state (Extended Data Fig. 3)21,27. Although a comparison of the HG3.17 and HG3.R5 sequences suggests that the choice and placement of key catalytic groups exhibit little flexibility (the same catalytic dyad consisting of D127 and K50Q appeared independently in both cases), the rest of the active site displays considerable freedom with respect to the type and positioning of amino acid substitutions (Fig. 3b,c).
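The mutation counts quoted above fix how many positions separate the two optimized variants. The sketch below shows the bookkeeping; the function name is illustrative, and the inputs are simply the numbers stated in the text (17 mutations in HG3.17, 16 in HG3.R5, three shared positions, one identical substitution).

```python
def substitutions_apart(n_a, n_b, shared_positions, identical_substitutions):
    """Positions at which two variants of the same parent differ.

    Every mutated position counts once (shared positions are not counted
    twice), and positions carrying the same substitution in both variants
    do not contribute to the difference.
    """
    mutated_positions = n_a + n_b - shared_positions
    return mutated_positions - identical_substitutions

print(substitutions_apart(17, 16, 3, 1))  # 29
```

The result matches the 29 substitutions by which the two enzymes are reported to differ.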
To assess the catalytic improvements gained during evolution, we determined the steady-state enzyme turnover number kcat and Michaelis constant Km parameters for the cleavage of 5-nitrobenzisoxazole (1) of the best variants from each evolution round by numerically fitting the total time course data (Supplementary Figs. 1 and 2). Although we corrected the data for variants HG3 and HG3.17 for conformational selection35, we assumed that the Kemp variants of the HG3.R series were fully active as isolated, potentially underestimating the true efficiency of these catalysts. This analysis showed that HG3.R5 cleaves 5-nitrobenzisoxazole (1) with a kcat of 702 ± 79 s−1 (mean ± s.d.) and a kcat/Km of 1.7 × 10^5 M−1 s−1. The >200-fold improvement in catalytic efficiency over the original computational design (Table 1) is comparable to that achieved by HG3.17 and can be largely attributed to an increase in turnover number rather than substrate affinity (Table 1).
TSA bound structure of Kemp eliminase HG3.R5
Notably, while the evolutionary trajectories of HG3.R5 and HG3.17 diverged at the nucleotide and amino acid level, we found that similar structural effects drive catalysis in both enzymes. Alignment of the 1.5 Å X-ray structure of HG3.R5 (Protein Data Bank (PDB) 8RD5; Supplementary Table 1) in complex with 6-nitrobenzotriazole (3) with HG3 and HG3.17 (Fig. 4a–c) showed the deeply buried active site that characterizes all variants of the HG3 series and shields the ligand from bulk solvent. Like HG3.17, decisive structural elements in HG3.R5 include the K50Q mutation (oxyanion stabilization) as well as excellent shape complementarity to the transition state analog (3) (TSA (3); Extended Data Fig. 3c,f), including fine-tuned interactions between the catalytic base, D127, and the acidic N–H bond of 6-nitrobenzotriazole (3) (Extended Data Fig. 3b).
The binding of an ordered water molecule in a position to stabilize the developing negative charge in the transition state is also notable. In HG3.R5, a substantial movement of P45 away from the ligand (2.4 Å) provided space for the water molecule (we), which is embedded in a dense hydrogen-bonding network and forms a polar contact with N3 (3.4 Å) of the TSA (3) (Fig. 4d and Extended Data Fig. 3f). A similarly placed water is also found in the structure of HG3.17 (ref. 36; Fig. 4e and Extended Data Fig. 3h), but it is accommodated by a different set of mutations (Fig. 4e). QM calculations on a 5-nitrobenzisoxazole (1) bound cluster model derived from the binary X-ray structure of HG3.R5 (PDB 8RD5) suggest that water we is a weak oxyanion binder that stabilizes the transition state by 0.6 kcal mol−1 (Extended Data Fig. 4a), a value which is confirmed by hybrid quantum mechanics/molecular mechanics (QM/MM) full optimization of the Michaelis–Menten complex and the transition state bound at the active site of the fully solvated enzyme (Extended Data Fig. 4b). Interestingly, this stabilizing effect is similar to that exerted by the Q50 side chain (1.4 kcal mol−1), and the combined effect of these interactions amounts to 2.3 kcal mol−1 (Extended Data Fig. 4a), suggesting some degree of cooperativity. In analogy to the autonomous evolution of oxyanion holes in natural enzymes, in which water molecules are often found to contribute critical hydrogen bonds37, this catalytic element arose independently in the evolution trajectories of both HG series, indicating water’s usefulness in the Kemp elimination reactions. Of note, initial in silico designs of Kemp eliminases included water as an alternative to amino acids as hydrogen bond donors23 due to its flexibility and ability to solvate the developing negative charge in the transition state38.
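The cooperativity noted above follows from comparing the combined stabilization with the sum of the individual contributions. The sketch below makes that arithmetic explicit and also converts a stabilization energy into an approximate rate factor via exp(ΔG/RT). The temperature of 298.15 K and the function names are assumptions made for illustration, not values or code from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def coupling_energy(combined, *individual):
    """Extra stabilization beyond simple additivity (kcal/mol)."""
    return combined - sum(individual)

def rate_factor(stabilization_kcal, temperature_k=298.15):
    """Rate enhancement implied by a transition-state stabilization.

    Assumes the stabilization lowers the activation free energy
    one-to-one; the temperature is an assumed value.
    """
    return math.exp(stabilization_kcal / (R_KCAL * temperature_k))

# Water (0.6) and Q50 (1.4) individually versus 2.3 kcal/mol combined.
extra = coupling_energy(2.3, 0.6, 1.4)
print(f"non-additive contribution: {extra:.1f} kcal/mol")
print(f"rate factor for 2.3 kcal/mol: {rate_factor(2.3):.0f}")
```

With the quoted values, the two interactions together provide about 0.3 kcal mol−1 more stabilization than their sum.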
HG3.R5 substitutions form an intricate interaction network
In HG3.R5, mutations acquired in the first round, such as M172A and K50Q, directly contact the substrate (Extended Data Fig. 5b), while subsequent substitutions fan outward from the active site in an intricate interaction network that optimized ligand placement relative to the catalytic machinery (Extended Data Figs. 5 and 6). In this context, comparing the root mean square fluctuation (RMSF) per residue of HG3.R1–R5 variants with the structure of in silico designed HG3 revealed that three distinct protein regions (21–30, 46–58 and 82–92) exhibited a gradual gain in rigidity during evolution (Extended Data Fig. 6a,b). These rigidifying regions, which line one side of the active site tunnel and include the mutation K50Q, may modulate the contact frequencies of Q50 with the substrate 5-nitrobenzisoxazole (1) (Extended Data Fig. 6c). Noteworthy in this context is a 2.5 Å shift of the indole moiety of W87 toward the ligand, which brings it close to Q50 in HG3.R5. Notably, HG3.17 features a similar movement of the W87 side chain, albeit accompanied by a tryptophan ring flip35 (Fig. 4d,e).
Fitness landscape
To characterize the fitness landscape surrounding the HG3.R5 and HG3.17 variants, we investigated whether their respective mutations could complement each other. We constructed two combined variants, called HG3.R5w17 and HG3.17wR5, by incorporating all additional mutations found in the other improved Kemp eliminase (w17 and wR5, respectively) into HG3.R5 and HG3.17 (Fig. 3c). Surprisingly, both combined enzymes retained Kemp elimination activity, although they exhibited substantially reduced kcat values (Table 1). Intrigued by these results, we shuffled the HG3.R5 and HG3.17 genes in varying ratios. Subsequent analysis of active variants found in the shuffled libraries, complemented with data for variants from the HG3.R5 and HG3.17 evolution trajectories, resulted in 208 unique sequence–function data points. The underlying fitness landscape defined by these data points was elucidated by a principal component analysis of each variant’s embedding vector, which was derived from the evolutionary scale modeling (ESM) algorithm39 (Fig. 5). This analysis revealed a deep valley between the fitness peaks representing the most improved variants, HG3.R5 and HG3.17, with no discernible ridge connecting the sequences. Overall, this empirically deduced fitness landscape indicates that the optimized variants, which are 29 mutations apart, are evolutionarily more distant from each other than from the parental sequence HG3.
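To illustrate the projection step, the sketch below reduces a set of embedding vectors to two principal components using power iteration. It is a standard-library-only illustration under stated assumptions: the input vectors are hypothetical placeholders rather than real ESM embeddings, no ESM call is shown, and the authors' actual analysis pipeline may differ. A numerical library would normally be used for vectors of realistic size.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract the per-dimension mean from every vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def leading_axis(rows, iterations=200):
    """Dominant principal axis of centered rows (power iteration on X^T X)."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        proj = [dot(r, v) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def pca_2d(embeddings):
    """Return (PC1, PC2) scores for each embedding vector."""
    x = center(embeddings)
    pc1 = leading_axis(x)
    s1 = [dot(r, pc1) for r in x]
    # Remove the PC1 component, then find the next axis in the residual.
    residual = [[r[j] - s * pc1[j] for j in range(len(pc1))]
                for r, s in zip(x, s1)]
    pc2 = leading_axis(residual)
    s2 = [dot(r, pc2) for r in residual]
    return list(zip(s1, s2))

# Hypothetical 3-dimensional "embeddings" for ten variants.
toy = [[float(i), 2.0 * i, 0.1 * (-1) ** i] for i in range(10)]
for pc1_score, pc2_score in pca_2d(toy)[:3]:
    print(round(pc1_score, 3), round(pc2_score, 3))
```

Plotting measured activity against the two scores for each variant would then give a landscape view of the kind described above.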
Discussion
The fitness landscape is a nearly century-old foundational concept in evolutionary biology40. Today, it is accepted that the shape of this genotype–phenotype fitness map defines evolvability41 and that relatively few mutational paths lead to fitter proteins42. In analogy to natural evolution, the laboratory optimization of enzymes selects high-fitness genotypes and does not usually allow an enzyme to traverse low-fitness valleys between local peaks of intermediate fitness and higher fitness peaks nearby43.
While the way sequence maps function does not change, our results indicate that protein sequence space can be more effectively searched by reducing the number of possible evolutionary pathways through the removal of destabilizing mutations. The resulting ‘condensed’ search space increases the probability of finding evolutionarily accessible mutational paths that can be effectively navigated by the stepwise introduction and combination of single-site mutations in the climb to higher ground. Using this strategy, we obtained an artificial biocatalyst that accelerates proton transfer with true enzymatic efficiency (kcat/kuncat = 6.1 × 10^8)23 within a remarkably short span of five evolution rounds. Notably, in analogy to natural evolution, each design-build-test cycle reconsidered all amino acid sites predicted to be nondestabilizing, enabling the escape from local maxima by permitting further modifications or reversal of specific mutations.
Evolutionary predictability is a fundamental question in biology and the topic of a long-running debate, reflecting relatively limited empirical data for real fitness landscapes41. While a few studies have provided evidence for convergence44,45,46 (the independent evolution of the same or a similar outcome), other experiments have highlighted how evolutionary trajectories are dependent on random events47,48. Interestingly, careful analysis of these experiments revealed that the degree of parallelism seems to vary according to the organizational level—shared adaptations at the level of genes and metabolic pathways are more common than shared substitutions of nucleotides41,44.
Our results show that although distinct in sequence, the optimized designer enzyme HG3.R5 harnesses the same catalytic principles exploited in other Kemp eliminases21,35. This bodes well for enzyme engineers as it indicates that individual protein scaffolds can house several, equally viable solutions for the same catalytic problem. The existence of more than one proverbial ‘needle in the haystack’ increases the chances of developing highly active catalysts using the experimental and in silico tools in hand today.
Methods
Materials
Commercial reagents are listed in Supplementary Data 2. Oligonucleotides (except for oligo pools) were ordered from Microsynth. Clonal genes and oligo pools were ordered from Twist Bioscience.
Synthesis of 5-nitrobenzisoxazole (1)
The synthesis of 5-nitrobenzisoxazole (1) was performed according to published protocols51. 1,2-Benzisoxazole (5 ml, 5.87 g) was added to concentrated H2SO4 (20 ml) at 0 °C until the solution turned yellow. A mixture of concentrated HNO3 (3.4 ml) and concentrated H2SO4 (1.3 ml) was added slowly at 0 °C, and the solution was stirred for 30 min. The reaction product was poured onto an ice/water mixture (1:1, 100 ml), and the crystals that formed were collected by filtration, washed with ice-cooled water and dried. The crude product was purified by normal-phase flash chromatography (RediSep column) with cyclohexane and ethyl acetate as mobile phases. The solvent was removed to yield 5-nitrobenzisoxazole (1) (3.05 g) as colorless needles (1H NMR (500 MHz, CDCl3) δ = 8.90 (d, 1 H), 8.73 (d, 1 H), 8.51 (dd, 1 H), and 7.76 (d, 1 H) and 13C NMR (500 MHz, CDCl3) δ = 164.4, 147.1, 144.8, 125.6, 121.9, 119.2, and 110.5).
Plasmid constructs
The HG3 and HG3.17 constructs were ordered from Twist Bioscience as cloned genes in a pET28b vector between the NcoI/XhoI restriction sites containing a C-terminal His6-tag with a stop codon. The pelB-HG3 and pelB-HG3.17 variants used for screening were obtained by cloning the genes into the pET22b vector using NcoI/XhoI restriction enzymes, introducing an N-terminal pelB signal peptide and a C-terminal His6-tag. Sequences of all genes are reported in the Supplementary Information.
Generation of complex single-site libraries using oligo pools
The cloning procedure is illustrated in Extended Data Fig. 1.
The oligo pool from Twist Bioscience was resuspended in MilliQ-water to yield a DNA concentration of 5 ng µl−1. The pool was amplified by PCR (25 µl) using KAPA HiFi HotStart DNA polymerase according to the manufacturer’s protocol, with the common forward and reverse primers and 2 µl of the oligo pool as template. The individual oligo subpools were amplified analogously with subpool-specific primers (designed with LibGENiE19) and 1 µl purified oligo pool amplification product (~50 ng). The flanking upstream and downstream regions of the fragments were amplified using the parent variant as a template and corresponding primers (designed with LibGENiE19) in combination with T7_fw (TAATACGACTCACTATAGGG) or T7_rv (TGCTAGTTATTGCTCAGCGG).
Full genes were reassembled by an overlap extension PCR (25 µl) with KAPA HiFi HotStart DNA polymerase according to the manufacturer’s protocol. In total, 1 µl amplified oligo subpool and 1 µl upstream and downstream PCR were used as template fragments with the T7_fw and T7_rv primers. The protocol included eight amplification cycles without flanking T7 primers, followed by 25 cycles with T7 primers.
The reassembled genes were introduced into the pET22b vector containing the pelB signal tag using the MEGAWHOP52 cloning strategy. The PCR mixture (50 µl) contained 10 µl 5× Q5 reaction buffer, 2 µl dNTPs (10 mM each), 6 µl purified overlap extension PCR product (50–150 ng µl−1), 1 µl Q5 high-fidelity DNA polymerase and 1 µl template plasmid (100 ng, pET22b-pelB containing the parent gene of the evolution round). The template DNA was digested by incubation with DpnI (37 °C for 2 h).
Generation of combinatorial libraries
The cloning procedure is illustrated in Extended Data Fig. 2.
Primers encoding beneficial mutations and the parent amino acid were ordered from Microsynth. If beneficial mutations had appeared at the same site or in close proximity, they were encoded combinatorially on primers spanning the corresponding region, using degenerate codons where feasible. Primers for such regions were mixed in equimolar quantities, ensuring an even distribution of the mutations.
Gene fragments spanning primer regions were produced with complementary overlaps. The PCR (50 µl) contained 10 µl 5× Q5 reaction buffer, 1 µl dNTPs (10 mM each), 1 µl template plasmid (50–100 ng µl−1 of pET22b-pelB containing the parent gene for the evolution round), 2.5 µl combinatorial primer mix (10 µM), 2.5 µl complementary overlap primer and 1 µl Q5 high-fidelity DNA polymerase.
The genes were reassembled by an overlap extension PCR (50 µl) with Q5 high-fidelity DNA polymerase according to the manufacturer’s protocol. The reaction contained 1 µl of each purified fragment as a template and 2.5 µl of the T7_fw and T7_rv primers, respectively. The temperature protocol included eight amplification cycles without flanking T7 primers, followed by 25 cycles including the flanking T7 primers.
The PCR product was cloned in a pET22b-pelB vector using restriction enzymes XbaI and XhoI followed by a ligation step with T4 ligase. All procedures were carried out according to the manufacturer’s protocols.
Generation of shuffled libraries
The genes of HG3.17 and HG3.R5 were amplified from the pET22b-pelB plasmid with T7_fw and T7_rv primers using the Q5 high-fidelity DNA polymerase according to the manufacturer’s protocol. The purified PCR products were mixed in equal quantities (500 ng each) and digested with 5 mU of DNAseI in 50 µl buffer (20 mM MnCl2 and 100 mM Tris (pH 7.4)) for 2, 3 and 5 min at 15 °C. DNAseI was inactivated (80 °C for 10 min), and the reactions were purified. In total, 4 ng of digested gene fragments were reassembled with a step-down PCR (50 µl) without primers using the Q5 high-fidelity DNA polymerase according to the manufacturer’s protocol. Reassembled genes were introduced into a pET22b vector containing the pelB signal tag with MEGAWHOP52 cloning as described earlier.
Screening of the single-site variant, hit combination and shuffled libraries
Chemically competent E. coli NEB10 cells were transformed with 5 µl of cloning products. Colonies were grown on LB agar plates (100 mg l−1 ampicillin). All colonies were scraped off the plate, and plasmids were isolated with the NucleoSpin plasmid kit. Chemically competent E. coli BL21 (DE3) cells were transformed with 5 µl of the isolated plasmid and grown on LB agar plates (100 mg l−1 ampicillin). Precultures were prepared by inoculating 1 ml LB medium (100 µg ml−1 ampicillin) in 96-well deep-well plates (DWP) with single E. coli BL21 colonies and grown overnight at 30 °C with 300 rpm in a Duetz-System (Adolf Kühner AG). In total, 2 µl of the preculture was used to inoculate 1 ml of ZYM-5052 autoinduction medium (100 µg ml−1 ampicillin) in a 96-well DWP. The main cultures were grown for 15 h at 30 °C and 24 h at 20 °C at 300 rpm in a Duetz-System.
The activity assay was performed in 96-well microtiter plates. Main cultures were diluted into reaction buffer (50 mM sodium phosphate (pH 7) and 100 mM NaCl) to obtain 100 µl of final volume. The ratio of culture to reaction buffer was adjusted according to the evolution round (R1: 20:80; R2: 20:80; R3: 10:90; R4: 1:99; R5: 0.5:99.5). The substrate (25 mM 5-nitrobenzisoxazole (1) in acetonitrile) was diluted into the reaction buffer (1:50), and 100 µl of this solution was immediately transferred to the diluted main cultures, initiating the reaction in a total volume of 200 µl containing 250 µM 5-nitrobenzisoxazole (1), 1% acetonitrile and round dependent volume of grown cultures (R1, 10 µl; R2, 10 µl; R3, 5 µl; R4, 0.5 µl and R5 0.25 µl). The reaction was monitored at 380 nm and 35 °C on a Tecan Spark (Tecan Group).
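The final assay concentrations stated above follow from two dilution steps, which the short sketch below reproduces. The function and parameter names are illustrative, and the acetonitrile figure assumes that the substrate stock is dissolved in neat acetonitrile.

```python
def assay_concentrations(stock_mM=25.0, predilution=50,
                         substrate_ul=100.0, culture_mix_ul=100.0):
    """Return (substrate in uM, acetonitrile in % v/v) in the final well.

    Step 1: the stock is diluted 1:50 into reaction buffer.
    Step 2: that solution is mixed with the diluted culture.
    """
    total_ul = substrate_ul + culture_mix_ul
    diluted_mM = stock_mM / predilution
    final_uM = diluted_mM * 1000.0 * substrate_ul / total_ul
    # Assumption: the stock is in neat acetonitrile (100% v/v).
    acetonitrile_pct = 100.0 / predilution * substrate_ul / total_ul
    return final_uM, acetonitrile_pct

print(assay_concentrations())  # (250.0, 1.0)
```

The defaults give 250 µM substrate and 1% acetonitrile in 200 µl, in agreement with the values quoted in the protocol.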
Production and purification of the HG3 variants
Genes were cloned into the pET28b plasmid between NcoI/XhoI restriction sites (without the pelB-leader sequence but with the C-terminal His6-tag and stop codon). A preculture of 6 ml LB medium (50 µg ml−1 kanamycin) was grown overnight at 37 °C. In total, 500 ml of LB medium (50 µg ml−1 kanamycin) was inoculated with 5 ml preculture and incubated at 37 °C. Production of the HG3 protein was induced when the optical density at 600 nm (OD600) reached a value of 0.6 by adding 250 µM isopropyl-β-d-thiogalactopyranoside. The culture was then incubated overnight at 18 °C.
Cells were collected by centrifugation for 60 min at 3,700g, 4 °C, resuspended in 15 ml buffer (50 mM Tris–HCl (pH 7.4) and 500 mM NaCl), lysed by ultrasonic treatment (amplitude, 50%; pulse, 1 s/1 s and time, 1 min; Sonoplus) and clarified by centrifugation (1 h at 21,000g) as well as filtration (0.45 µm). The lysate was purified via nickel affinity chromatography on an ÄKTA Pure FPLC system (GE Healthcare) using a 5 ml HisTrap FF crude column (Cytiva). Loading buffer consisted of 500 mM NaCl and 20 mM imidazole in 50 mM Tris–HCl (pH 7.4), while the elution buffer contained 500 mM NaCl and 300 mM imidazole in 50 mM Tris–HCl (pH 7.4). Samples were desalted with three 5 ml HiTrap desalting columns (Cytiva) into a buffer containing 5 mM sodium phosphate (pH 6.0) and 20 mM NaCl and concentrated by ultrafiltration (Amicon Ultra-4 10K; Merck Millipore).
Progress curves
Progress curves to obtain kinetic parameters were recorded according to the protocol given in ref. 35. The 10× concentrated Kemp eliminase variant stocks were prepared in reaction buffer (55.5 mM sodium phosphate (pH 6.8) and 111.1 mM NaCl), and 10× concentrated stocks of 5-nitrobenzisoxazole (1) were prepared in methanol. In total, 120 μl of reaction buffer and 15 μl of the enzyme stock solution were added to a quartz cuvette, and the absorption at 440 nm in a Cary 60 UV–Vis spectrophotometer (Agilent Technologies) was set to zero. The assay was initiated by the addition of 15 μl 10× concentrated substrate stock. The final enzyme concentration in the reaction buffer varied according to the variant (HG3, 19.0 μM; HG3.17, 61.2 nM; HG3.R1, 9.02 μM; HG3.R2, 2.23 μM; HG3.R3, 1.33 μM; HG3.R4, 225 nM; HG3.R5, 126 nM; HG3.R5w17, 3.88 μM and HG3.17wR5, 3.70 μM).
Progress curves were measured by observing product formation at 440 nm (25 °C), applying an extinction coefficient of 1,050 M−1 cm−1. The initial substrate concentration was evaluated by diluting the reaction in an alkaline solution and measuring the product absorbance at 380 nm, applying an extinction coefficient of 15,800 M−1 cm−1 (15 μl reaction mixture was diluted in 235 μl of a 0.68 M sodium hydroxide solution). The data were analyzed using KinTek Explorer (6.1)53,54 to extract the steady-state parameters (Supplementary Figs. 1a and 2a).
The rate constant for the background reaction (k1) was fixed to the value reported in the literature (k1 = 1.04 × 10^−4)35. The reaction rate constants for HG3 and HG3.17 were corrected for a conformational selection step as reported in ref. 35 (k2HG3 = 2.32 × 10^−4 s−1, k−2HG3 = 8.12 × 10^−5 s−1, k2HG3.17 = 16.7 × 10^−4 s−1 and k−2HG3.17 = 6.68 × 10^−5 s−1). These reaction rates (k2 and k−2) were not available for other variants, which were assumed to be fully active as isolated to avoid overestimation of their activity. Furthermore, the on-rates for substrate and product (that is, k3 and k−5, respectively) were fixed to a diffusion-limited value of 1,000 μM−1 s−1. This left three parameters for fitting (that is, k−3, k4 and k5). The individual microscopic rate constants cannot be determined by fitting, but the values for kcat and Km,substrate can be estimated reliably using the equations below:
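For a minimal scheme consistent with the rate constants named above (substrate binding with k3 and k−3, a chemical step k4 and product release k5), the standard steady-state expressions are kcat = k4·k5/(k4 + k5) and Km = k5·(k−3 + k4)/(k3·(k4 + k5)). The sketch below evaluates them. Treating the chemical step as irreversible and neglecting product rebinding are assumptions made here, not statements from the text, and the numerical inputs in the example are hypothetical.

```python
def steady_state_parameters(k3, k_minus3, k4, k5):
    """kcat and Km for E + S <-> ES -> EP -> E + P.

    Assumed scheme: irreversible chemistry (k4) and negligible product
    rebinding. With k3 in uM^-1 s^-1 and the other constants in s^-1,
    kcat is returned in s^-1 and Km in uM.
    """
    kcat = k4 * k5 / (k4 + k5)
    km = k5 * (k_minus3 + k4) / (k3 * (k4 + k5))
    return kcat, km

# Hypothetical constants; k3 is fixed at the diffusion-limited value
# of 1,000 uM^-1 s^-1 used in the fitting described above.
kcat, km = steady_state_parameters(k3=1000.0, k_minus3=1.0e6,
                                   k4=1000.0, k5=5000.0)
print(f"kcat = {kcat:.1f} s^-1, Km = {km:.1f} uM")
```

Under this scheme the ratio kcat/Km reduces to k3·k4/(k−3 + k4), so it does not depend on the product release rate k5.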
Initial rate analysis
The kcat/Km values shown in Table 1 were determined using initial rates analysis for product formation. Reactions were initiated by adding enzyme (HG3, 750 nM; HG3.R1, 459 nM; HG3.R2, 112 nM; HG3.R3, 65.7 nM; HG3.R4, 25.1 nM; HG3.R5, 6.6 nM; HG3.17, 6.8 nM; HG3.17wR5, 209 nM; HG3.R5w17, 208 nM) to the benzisoxazole substrate (31.5 µM to 2.0 mM final concentration) in 10% methanol, 100 mM NaCl and 50 mM sodium phosphate buffer (pH 7). Product formation was monitored at 380 nm with ε380 (pH 7) = 15,784.9 M−1 cm−1 (ε380 = εmax/(1 + 10^(pKa − pH)); εmax = 15,800 M−1 cm−1) in a Cary 60 ultraviolet/visible spectrometer (Agilent Technologies) at 27 °C using glass cuvettes (1 cm path length). Data were fitted to the linear portion of the Michaelis–Menten model (v0 = (kcat/Km)[E0][S]), and kcat/Km was deduced from the slope. The substrate stock solution concentration was determined by preparing a 2,000-fold dilution in 10% methanol containing 10 mM sodium hydroxide and measuring the sample absorption at 380 nm with ε380 (pH > 9) = 15,800 M−1 cm−1.
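The two calculations in this analysis, the pH correction of the extinction coefficient and the slope-based estimate of kcat/Km, can be sketched as follows. The function names are illustrative. Because the pKa is not stated in the text, the example back-calculates it from the two quoted extinction coefficients rather than assuming a literature value. Fitting the line through the origin is also an assumption, and the rate data in the example are synthetic.

```python
import math

def apparent_epsilon(eps_max, pka, ph):
    """Extinction coefficient of the partially ionized product."""
    return eps_max / (1.0 + 10.0 ** (pka - ph))

def implied_pka(eps_max, eps_obs, ph):
    """pKa that reproduces an observed coefficient at a given pH."""
    return ph + math.log10(eps_max / eps_obs - 1.0)

def kcat_over_km(substrate_M, rates_M_per_s, enzyme_M):
    """Least-squares slope through the origin of v0 versus [S],
    divided by the enzyme concentration (result in M^-1 s^-1)."""
    sxy = sum(s * v for s, v in zip(substrate_M, rates_M_per_s))
    sxx = sum(s * s for s in substrate_M)
    return sxy / sxx / enzyme_M

# pKa implied by the quoted coefficients (15,784.9 at pH 7; 15,800 max).
pka = implied_pka(15800.0, 15784.9, 7.0)
print(f"implied pKa: {pka:.2f}")

# Synthetic initial rates generated for kcat/Km = 1.7e5 M^-1 s^-1.
e0 = 6.6e-9
s = [31.5e-6, 63.0e-6, 125.0e-6]
v0 = [1.7e5 * e0 * conc for conc in s]
print(f"kcat/Km: {kcat_over_km(s, v0, e0):.3g} M^-1 s^-1")
```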
Melting temperature determination
The melting temperature of the purified variants was determined with a Prometheus Panta (NanoTemper Technologies). The proteins were prepared at approximately 1 mg ml−1 in 5 mM sodium phosphate buffer (pH 6) containing 20 mM NaCl. The thermal unfolding experiment was performed from 20 to 95 °C with a ramp of 1 °C min−1. The derived melting profile was analyzed using the accompanying software (PR.Panta Analysis v1.6.3).
Crystallization of HG3.R5 with the TSA 6-nitrobenzotriazole (3)
HG3.R5 was produced and purified as described above, but the desalting step was performed on a size-exclusion column (HiLoad 16/600 Superdex 75 prep grade) with a buffer containing 5 mM sodium phosphate (pH 6) and 20 mM NaCl. The fractions containing protein were concentrated by ultrafiltration (Amicon Ultra-15 10K; Merck Millipore). The protein solution was prepared at a concentration of 30 mg ml−1 containing 5 mM 6-nitrobenzotriazole (3) (from a 100 mM stock solution in DMSO), resulting in a solution containing 5% DMSO, 4.75 mM sodium phosphate at pH 6 and 19 mM NaCl. Crystallization was performed using the sitting drop vapor diffusion method at 20 °C in Intelli-Plates 96-3 LVR (Hampton Research) at the Protein Crystallization Center (University of Zurich, Switzerland). A 150 nl drop of enzyme and TSA solution (30 mg ml−1 enzyme with 5 mM 6-nitrobenzotriazole (3)) was mixed with 150 nl of precipitant solution (100 mM Tris–acetate (pH 7.4) and 1.3 M (NH4)2SO4) and equilibrated against a 50 µl reservoir solution (100 mM Tris–acetate (pH 7.4) and 1.3 M (NH4)2SO4). Crystal formation was observed after 25 days of incubation, and crystals were picked after incubation for a total of 41 days. As a cryoprotectant, 2 µl of 30% glycerol in reservoir solution was added to the drop. The crystals were snap-frozen and stored in liquid nitrogen.
Diffraction data were collected at the Swiss Light Source (Paul Scherrer Institute, Switzerland) on the beamline X06SA-PX at a temperature of 100 K and wavelength of 1.0000 Å. The data were indexed and integrated with XDS (version 10 January 2022) and scaled and merged with Aimless (v0.7.9). The structure was solved by molecular replacement with MOLREP (v11.0)55, using chain A of the HG3 crystal structure (PDB 5RGA) as the search model. Iterative refinement and manual model building steps were performed with REFMAC (v5.8.0419)56 and Coot (v0.9.8.92)57, respectively, embedded in the CCP4i2 (v8.0.013)58 suite. The final model with a resolution of 1.5 Å and a MolProbity59 score of 1.13 contained two protein chains (A and B) including the TSA (3).
Library design based on evolutionary information
An initial multiple sequence alignment (MSA) was created with the online HHblits tool60 (https://toolkit.tuebingen.mpg.de/tools/hhblits), using the UniRef30_2022_02 database and default parameters. The MSA was further processed with HHfilter61 (https://toolkit.tuebingen.mpg.de/tools/hhfilter) with the following settings: max ident: 90; min seq ident: 30; rest default. Variants were selected based on the consensus and frequency strategy outlined in the HotSpot Wizard overview30, leading to 76 additional variants included in each single-site library.
Library design based on computational mutational scanning
ΔΔG predictions were based on a protocol published on the official Rosetta forums (https://www.rosettacommons.org/node/11126, 10 February 2021), and the relevant Python script can be found in the Supplementary Information. Each variant was predicted three times, and the lowest energy obtained was compared to the wild-type energies to calculate differences in free energy (Python version 3.8; PyRosetta release PyRosetta4.Release.python38.linux r275 2021.07+release.c48be26 c48be2695c4ba637c6fa19ee5d289fd9a8aa99ef).
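The bookkeeping behind "lowest of three predictions versus wild type" can be sketched as follows. This is not the authors' PyRosetta script (which is in the Supplementary Information); the scores, the variant labels and the cutoff are hypothetical, and taking the lowest wild-type score as the reference is an assumption about how "wild-type energies" were used.

```python
def ddg(mutant_scores, wild_type_scores):
    """Difference between the lowest mutant score and the lowest wild-type score."""
    return min(mutant_scores) - min(wild_type_scores)


# Hypothetical scores in Rosetta energy units; three repeats per sequence.
wild_type = [-312.5, -312.0, -311.5]
variants = {
    "variant_a": [-313.0, -312.25, -311.0],
    "variant_b": [-309.5, -310.0, -308.75],
}

CUTOFF = 1.0  # hypothetical threshold above which a variant is treated as destabilizing

results = {name: ddg(scores, wild_type) for name, scores in variants.items()}
kept = [name for name, value in results.items() if value <= CUTOFF]

for name, value in results.items():
    print(f"{name}: ddG = {value:+.2f}")
print("kept for the library:", kept)
```

Using the minimum of repeated runs reduces the influence of poorly converged trajectories; an average over repeats would be the obvious alternative.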
Library design based on tunnel and ligand analysis
Enzyme tunnel analysis was performed with CAVER 3.0 (ref. 49) using the following parameters: probe_radius 0.9, shell_radius 4.0, shell_depth 5.0, frame_weighting_coefficient 1.0 and frame_clustering_threshold 1.0. The TSA (3) was introduced by aligning the crystal structure of HG3.17 (PDB 4BS0) to an AlphaFold50 model of each round’s parent variant. Sites for randomization were selected based on distances to the ligand and tunnels.
Fitness landscape
The protein sequences were encoded using the esm.pretrained.esm2_t48_15B_UR50D() model39. The embedding extracted for each variant was reduced in dimensionality through principal component analysis (Python package: sklearn, default settings, n_components = 2). In the absence of experimental data, the space outside the measured variants was set to 0. Within the area of measured variants, we performed radial basis function interpolation (Python package: scipy, smooth = 0.1) and applied a further filter (Python package: scipy.ndimage.uniform_filter, size = 2, mode = ‘constant’) to reduce ruggedness.
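The landscape construction can be sketched under several stated assumptions: random arrays stand in for the ESM-2 embeddings and the measured activities, scipy.interpolate.Rbf is assumed to be the interpolation routine (its smooth keyword matches the stated setting), the "area of measured variants" is approximated by the bounding box of the projected points, and the grid resolution is arbitrary. The authors' actual scripts are available from the repository named under Code availability.

```python
import numpy as np
from scipy.interpolate import Rbf
from scipy.ndimage import uniform_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 64))    # placeholder for per-variant embeddings
fitness = rng.uniform(0.0, 1.0, size=30)  # placeholder for measured activities

# Project the embeddings onto two principal components.
coords = PCA(n_components=2).fit_transform(embeddings)
x, y = coords[:, 0], coords[:, 1]

# Radial basis function interpolation of fitness over the 2D projection.
rbf = Rbf(x, y, fitness, smooth=0.1)

# Regular grid that extends beyond the measured variants (resolution is an assumption).
pad_x = 0.25 * (x.max() - x.min())
pad_y = 0.25 * (y.max() - y.min())
gx = np.linspace(x.min() - pad_x, x.max() + pad_x, 60)
gy = np.linspace(y.min() - pad_y, y.max() + pad_y, 60)
GX, GY = np.meshgrid(gx, gy)

# Outside the bounding box of measured variants the landscape stays at 0.
inside = (GX >= x.min()) & (GX <= x.max()) & (GY >= y.min()) & (GY <= y.max())
landscape = np.zeros_like(GX)
landscape[inside] = rbf(GX[inside], GY[inside])

# Light smoothing to reduce ruggedness.
landscape = uniform_filter(landscape, size=2, mode="constant")
print(landscape.shape)
```

A convex hull of the projected points would be a tighter definition of the measured area than the bounding box used here.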
MD simulation of HG3 variants
The crystal structure of HG3.R5 (PDB 8RD5) was used for the homology modeling of HG3.R1, HG3.R2, HG3.R3 and HG3.R4. Simulations were also carried out with HG3 (PDB 5RGA) and HG3.17 (PDB 5RGE).
The cocrystallized TSA (3) was manually replaced in all variants by the substrate 5-nitrobenzisoxazole (1). The PARSE force field was used at pH 7, and protonation states for the titratable amino acids of all variants were determined using PROPKA3 (ref. 62), except for the catalytic base D127, which was set to be deprotonated.
Substrate parameters were generated using the AmberTools (23.3) packages antechamber and parmchk2 (ref. 63) with bond charge correction (AM1-BCC). The resulting mol2 and frcmod files were converted into xml files using ParmEd (4.1.0). The simulations were performed using OpenMM (8.1.0)64 with the ff14SB force field65. The rectangular box with a padding of 1.0 nm was solvated with water (TIP3P water model), and the total charge was neutralized. Energy minimization was conducted until 10 kJ mol−1 tolerance energy was reached. A temperature of 308.15 K and a pH of 7 were set, and the Langevin integrator was used with a friction coefficient of 1 ps−1 and a step size of 2 fs.
The long-range Coulomb interactions were computed by the particle mesh Ewald method, with a cutoff of 1 nm for the direct space interactions. The solvent was equilibrated by 5 ns NVT equilibration (using a Langevin integrator for temperature coupling) followed by 5 ns pressure-coupled NPT equilibration using a Monte Carlo barostat at 1 atm pressure and 308.15 K reference temperature. The protein backbone and the substrate were restrained with a force constant of 100 kJ mol−1 Å−2. Finally, all restraints were removed, and a free equilibration of 5 ns was performed. The resulting system was used to produce five replicates of 100 ns each, with periodic boundary conditions. Trajectory frames were recorded every 1,000 steps.
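As a quick consistency check on the sampling scheme, the number of recorded frames can be worked out from the stated run length and recording interval. The sketch below takes the 2 fs step size from the Extended Data Fig. 6 legend; the variable names are illustrative.

```python
SIM_LENGTH_FS = 100 * 1_000_000  # 100 ns per replicate, expressed in fs
STEP_FS = 2                      # integrator step size in fs (Extended Data Fig. 6)
SAVE_EVERY = 1_000               # a frame is recorded every 1,000 steps
REPLICATES = 5

steps_per_replicate = SIM_LENGTH_FS // STEP_FS
frames_per_replicate = steps_per_replicate // SAVE_EVERY
total_frames = frames_per_replicate * REPLICATES
frame_spacing_ps = STEP_FS * SAVE_EVERY / 1000  # time between recorded frames in ps

print(frames_per_replicate, total_frames, frame_spacing_ps)
```

The total agrees with the 250,000 data points per variant quoted in the Extended Data Fig. 6 legend.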
Analysis was conducted using MDTraj66. RMSF of the backbone Cα was calculated by superposing all frames onto the first frame as a reference.
Contact frequency between protein and 5-nitrobenzisoxazole (1) atoms was calculated with the ContactTrajectory function of the MDTraj (1.9.9)66 derived Contact Map Explorer package. All substrate atoms were used as queries, while the full protein served as a haystack with a cutoff distance of 3 Å. Replicates in which the substrate left the binding site during simulation were not considered for this analysis.
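The contact-frequency idea can be illustrated with a plain-Python sketch. It does not use the Contact Map Explorer API; it simply counts the fraction of frames in which any substrate (query) atom lies within the 3 Å cutoff of any atom of a given residue (haystack). The coordinates are toy values, and treating the cutoff as inclusive is an assumption.

```python
import math

CUTOFF_A = 3.0  # contact cutoff in Å, as stated in the text


def in_contact(query_atoms, haystack_atoms, cutoff=CUTOFF_A):
    """True if any query atom is within the cutoff of any haystack atom."""
    return any(
        math.dist(q, h) <= cutoff for q in query_atoms for h in haystack_atoms
    )


def contact_frequency(frames, cutoff=CUTOFF_A):
    """Fraction of frames in which substrate and residue are in contact.

    frames: list of (substrate_atoms, residue_atoms), each a list of (x, y, z) in Å.
    """
    hits = sum(in_contact(sub, res, cutoff) for sub, res in frames)
    return hits / len(frames)


# Toy trajectory of four frames (coordinates are made up for illustration).
frames = [
    ([(0.0, 0.0, 0.0)], [(2.5, 0.0, 0.0)]),                   # 2.5 Å: contact
    ([(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]),                   # 4.0 Å: no contact
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(3.5, 0.0, 0.0)]),  # 2.5 Å from second atom
    ([(0.0, 0.0, 0.0)], [(0.0, 3.5, 0.0)]),                   # 3.5 Å: no contact
]
freq = contact_frequency(frames)
print(freq)
```

In practice the same count would be repeated per residue and averaged over the replicates retained for analysis.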
QM calculations
Restricted geometry optimizations and transition state searches were carried out with Gaussian 16 (ref. 67) on cluster models. Cluster models were generated from chain A of the crystallographic structure of HG3.R5 (PDB 8RD5) in complex with the TSA (3), which was replaced with fully optimized 5-nitrobenzisoxazole (1). The cluster models include the substrate and the following residues: W44, P45, N47 (backbone only), S48 (backbone only), L49 (backbone only), Q50, G82, G83 and D127 (side chain up to Cβ). All C atoms except substrate positions 3, 3a and 7a were kept fixed at the crystallographic coordinates. Calculations were carried out using the M06-2X68 hybrid functional and 6-31G(d) basis set with ultrafine integration grids. Solvent effects in water were considered implicitly through the IEF-PCM polarizable continuum model69. The energies of the calculated structures are summarized in Supplementary Table 2.
QM/MM calculations
The initial geometry for the HG3.R5–5-nitrobenzisoxazole (1) complex was generated from chain A of the crystallographic structure (PDB 8RD5) of HG3.R5 in complex with the TSA (3) and replacing its coordinates with the optimized coordinates of 5-nitrobenzisoxazole (1) (M06-2X/6-31G(d) level of theory). Crystallographic waters were preserved. The enzyme–substrate complex was then prepared for classical MD relaxation with the Amber 22 (ref. 63) suite using the ff19SB70 force field for the protein and gaff2 (ref. 71) for the substrate. The complex was immersed in a water box with a 10 Å buffer of OPC72 water. A two-stage geometry optimization approach was implemented. The first stage minimizes only the positions of solvent molecules, and the second stage is an unrestrained minimization of all the atoms in the simulation cell. The systems were then heated for 100 ps by incrementing the temperature from 0 to 300 K under a constant pressure of 1 atm and periodic boundary conditions with a timestep of 1 fs. Harmonic restraints of 50 kcal mol−1 Å−2 were applied to the solute, with the Andersen73 temperature coupling scheme to control and equalize the temperature. The last geometry of the equilibration trajectory was extracted, and solvent molecules were trimmed, leaving an 8 Å thick water shell around the solute.
Full geometry optimizations and transition state searches were carried out with Gaussian 16 (ref. 67; ONIOM74 QM/MM method, with electrostatic embedding) using the M06-2X hybrid functional and 6-31+G(d) basis set with ultrafine integration grids for the QM part. The MM part was described with the ff14SB65, gaff2 (ref. 71) and TIP3P75 force fields for the protein, ligand and water, respectively. The QM layer includes the whole 5-nitrobenzisoxazole (1), the side chains (up to Cβ) of D127 and Q50 and the two conserved water molecules shown in Extended Data Fig. 4. Only solvent molecules were kept frozen, except for the two water molecules in the QM layer, to allow adaptation of the enzyme along the reaction path. All stationary points were characterized by a frequency analysis performed at the same level of theory, from which thermal corrections were obtained at 298.15 K. ONIOM energies, entropies, enthalpies, Gibbs free energies and lowest frequencies of the calculated structures are summarized in Supplementary Table 3. The effect of crystallographic water we on the activation barrier was computed by manually removing we and reoptimizing the reactant and transition state at the same level of theory.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data that support the findings of this work are available in this article, Extended Data Figs. 1–6, Source Data, Supplementary Data 1 and 2 and Supplementary Information. Crystallographic coordinates of the binary complex of HG3.R5 have been deposited in the PDB as 8RD5. The HG3 and HG3 variant crystal structures used in molecular replacement, preparation of comparative figures, MD and QM/MM experiments can be accessed via PDB codes 5RGA, 5RGE, 8RD5, 4BS0, 7K4Q and 7K4Z. MD trajectories, computed geometries and energies can be accessed through the Zenodo repository at https://doi.org/10.5281/zenodo.12756879 (ref. 76). Source data are provided with this paper.
Code availability
The Python script to make ΔΔG predictions can be found in the Supplementary Information. Data and scripts used to build the protein fitness landscapes can be accessed via https://github.com/ccbiozhaw/FitLan.
References
Tracewell, C. A. & Arnold, F. H. Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Curr. Opin. Chem. Biol. 13, 3–9 (2009).
Maynard Smith, J. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970).
Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866–876 (2009).
Aharoni, A. et al. The ‘evolvability’ of promiscuous protein functions. Nat. Genet. 37, 73–76 (2005).
Chen, K. & Arnold, F. H. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl Acad. Sci. USA 90, 5618–5622 (1993).
Stemmer, W. P. C. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994).
Buller, R. et al. From nature to industry: harnessing enzymes for biocatalysis. Science 382, eadh8615 (2023).
Wang, Y. et al. Directed evolution: methodologies and applications. Chem. Rev. 121, 12384–12444 (2021).
Honda Malca, S. et al. Effective engineering of a ketoreductase for the biocatalytic synthesis of an ipatasertib precursor. Commun. Chem. 7, 46 (2024).
Huffman, M. A. et al. Design of an in vitro biocatalytic cascade for the manufacture of islatravir. Science 366, 1255–1259 (2019).
Guo, H. H., Choe, J. & Loeb, L. A. Protein tolerance to random amino acid change. Proc. Natl Acad. Sci. USA 101, 9205–9210 (2004).
Drummond, D. A., Silberg, J. J., Meyer, M. M., Wilke, C. O. & Arnold, F. H. On the conservative nature of intragenic recombination. Proc. Natl Acad. Sci. USA 102, 5380–5385 (2005).
Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl Acad. Sci. USA 103, 5869–5874 (2006).
Axe, D. D., Foster, N. W. & Fersht, A. R. A search for single substitutions that eliminate enzymatic function in a bacterial ribonuclease. Biochemistry 37, 7157–7166 (1998).
Shafikhani, S., Siegel, R. A., Ferrari, E. & Schellenberger, V. Generation of large libraries of random mutants in Bacillus subtilis by PCR-based plasmid multimerization. BioTechniques 23, 304–310 (1997).
Reetz, M. Making enzymes suitable for organic chemistry by rational protein design. ChemBioChem 23, e202200049 (2022).
Jochens, H. & Bornscheuer, U. T. Natural diversity to guide focused directed evolution. ChemBioChem 11, 1861–1866 (2010).
Jochens, H., Aerts, D. & Bornscheuer, U. T. Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Eng. Des. Sel. 23, 903–909 (2010).
Patsch, D., Eichenberger, M., Voss, M., Bornscheuer, U. T. & Buller, R. M. LibGENiE—a bioinformatic pipeline for the design of information-enriched enzyme libraries. Comput. Struct. Biotechnol. J. 21, 4488–4496 (2023).
Privett, H. K. et al. Iterative approach to computational enzyme design. Proc. Natl Acad. Sci. USA 109, 3790–3795 (2012).
Blomberg, R. et al. Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature 503, 418–421 (2013).
Casey, M. L., Kemp, D. S., Paul, K. G. & Cox, D. D. Physical organic chemistry of benzisoxazoles. I. Mechanism of the base-catalyzed decomposition of benzisoxazoles. J. Org. Chem. 38, 2294–2301 (1973).
Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008).
Thorn, S. N., Daniels, R. G., Auditor, M.-T. M. & Hilvert, D. Large rate accelerations in antibody catalysis by strategic use of haptenic charge. Nature 373, 228–230 (1995).
Khersonsky, O. et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl Acad. Sci. USA 109, 10358–10363 (2012).
Knowles, J. R. Enzyme catalysis: not different, just better. Nature 350, 121–124 (1991).
Broom, A. et al. Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat. Commun. 11, 4808 (2020).
Kellogg, E. H., Leaver‐Fay, A. & Baker, D. Role of conformational sampling in computing mutation‐induced changes in protein structure and stability. Proteins 79, 830–838 (2011).
Frenz, B. et al. Prediction of protein mutational free energy: benchmark and sampling improvements increase classification accuracy. Front. Bioeng. Biotechnol. 8, 558247 (2020).
Bendl, J. et al. HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering. Nucleic Acids Res. 44, W479–W487 (2016).
Horton, R. M., Cai, Z., Ho, S. N. & Pease, L. R. Gene splicing by overlap extension: tailor-made genes using the polymerase chain reaction. BioTechniques 54, 129–133 (2013).
Meyerhans, A., Vartanian, J.-P. & Wain-Hobson, S. DNA recombination during PCR. Nucleic Acids Res. 18, 1687–1691 (1990).
Judo, M. S. B., Wedel, A. B. & Wilson, C. Stimulation and suppression of PCR-mediated recombination. Nucleic Acids Res. 26, 1819–1825 (1998).
Kuiper, B. P., Prins, R. C. & Billerbeck, S. Oligo pools as an affordable source of synthetic DNA for cost‐effective library construction in protein‐ and metabolic pathway engineering. ChemBioChem 23, e202100507 (2022).
Otten, R. et al. How directed evolution reshapes the energy landscape in an enzyme to boost catalysis. Science 370, 1442–1446 (2020).
Kries, H., Bloch, J. S., Bunzel, H. A., Pinkas, D. M. & Hilvert, D. Contribution of oxyanion stabilization to Kemp eliminase efficiency. ACS Catal. 10, 4460–4464 (2020).
Pihko, P. M. (ed). Hydrogen Bonding in Organic Synthesis, pp. 43–71 (Wiley, 2009).
Na, J., Houk, K. N. & Hilvert, D. Transition state of the base-promoted ring-opening of isoxazoles. Theoretical prediction of catalytic functionalities and design of haptens for antibody production. J. Am. Chem. Soc. 118, 6462–6471 (1996).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Wright, S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proc. Sixth Int. Congr. Genet. Vol 1. (ed. Jones, D.) 356–366 (Brooklyn Botanic Garden, 1932).
de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014).
Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).
Kauffman, S. & Levin, S. Towards a general theory of adaptive walks on rugged landscapes. J. Theor. Biol. 128, 11–45 (1987).
Tenaillon, O. et al. The molecular diversity of adaptive convergence. Science 335, 457–461 (2012).
Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013).
Woods, R., Schneider, D., Winkworth, C. L., Riley, M. A. & Lenski, R. E. Tests of parallel molecular evolution in a long-term experiment with Escherichia coli. Proc. Natl Acad. Sci. USA 103, 9107–9112 (2006).
Salverda, M. L. M. et al. Initial mutations direct alternative pathways of protein evolution. PLoS Genet. 7, e1001321 (2011).
Blount, Z. D., Barrick, J. E., Davidson, C. J. & Lenski, R. E. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489, 513–518 (2012).
Chovancova, E. et al. CAVER 3.0: a tool for the analysis of transport pathways in dynamic protein structures. PLoS Comput. Biol. 8, e1002708 (2012).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Hollfelder, F., Kirby, A. J., Tawfik, D. S., Kikuchi, K. & Hilvert, D. Characterization of proton-transfer catalysis by serum albumins. J. Am. Chem. Soc. 122, 1022–1029 (2000).
Miyazaki, K. & Takenouchi, M. Creating random mutagenesis libraries using megaprimer PCR of whole plasmid. BioTechniques 33, 1033–1038 (2002).
Johnson, M. L. & Brand, L. Methods in Enzymology, Vol. 467, pp. 601–626 (Elsevier, 2009).
Johnson, K. A., Simpson, Z. B. & Blom, T. Global kinetic explorer: a new computer program for dynamic simulation and fitting of kinetic data. Anal. Biochem. 387, 20–29 (2009).
Vagin, A. & Teplyakov, A. MOLREP: an automated program for molecular replacement. J. Appl. Crystallogr. 30, 1022–1025 (1997).
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D 53, 240–255 (1997).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010).
Potterton, L. et al. CCP4i2: the new graphical user interface to the CCP4 program suite. Acta Crystallogr. D 74, 68–84 (2018).
Williams, C. J. et al. MolProbity: more and better reference data for improved all‐atom structure validation. Protein Sci. 27, 293–315 (2018).
Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM–HMM alignment. Nat. Methods 9, 173–175 (2012).
Zimmermann, L. et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 430, 2237–2243 (2018).
Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).
Amber22 (Univ. California, 2022) https://ambermd.org
Eastman, P. et al. OpenMM 8: molecular dynamics simulation with machine learning potentials. J. Phys. Chem. B 128, 109–116 (2024).
Maier, J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713 (2015).
McGibbon, R. T. et al. MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109, 1528–1532 (2015).
Gaussian 16 (Gaussian Inc., 2016) https://gaussian.com
Zhao, Y. & Truhlar, D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc. 120, 215–241 (2008).
Scalmani, G. & Frisch, M. J. Continuous surface charge polarizable continuum models of solvation. I. General formalism. J. Chem. Phys. 132, 114110 (2010).
Tian, C. et al. ff19SB: amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J. Chem. Theory Comput. 16, 528–552 (2020).
He, X., Man, V. H., Yang, W., Lee, T.-S. & Wang, J. A fast and high-quality charge model for the next generation general AMBER force field. J. Chem. Phys. 153, 114502 (2020).
Izadi, S., Anandakrishnan, R. & Onufriev, A. V. Building water models: a different approach. J. Phys. Chem. Lett. 5, 3863–3871 (2014).
Andersen, H. C. Molecular dynamics simulations at constant pressure and/or temperature. J. Chem. Phys. 72, 2384–2393 (1980).
Vreven, T. et al. Combining quantum mechanics methods with molecular mechanics methods in ONIOM. J. Chem. Theory Comput. 2, 815–826 (2006).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
David, P. et al. Enriching productive mutational paths accelerates enzyme evolution. Zenodo https://doi.org/10.5281/zenodo.12756878 (2024).
Morozov, A. V., Kortemme, T., Tsemekhman, K. & Baker, D. Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations. Proc. Natl Acad. Sci. USA 101, 6946–6951 (2004).
Acknowledgements
We thank K. Hecht and D. Aregger (Competence Center for Biocatalysis, Waedenswil, Switzerland) for the synthesis of 5-nitrobenzisoxazole. This work was created as part of NCCR Catalysis, a National Center of Competence in Research funded by the Swiss National Science Foundation (grant 180544 to R.M.B. and A.K.) and by MCIN/AEI/10.13039/501100011033 (grants PID2021-125946OB-I00 and PDC2022-133725-C22 to G.J.O., and EUR2023-143462 and RYC2022-036457-I to F.P.).
Author information
Contributions
D.P., M.V., T.S., M.E., M.M., A.K. and R.M.B. initiated and designed the project. D.P., T.S., M.V. and R.M.B. designed the experiments. D.P., T.S., M.V., S.G. and L.S. carried out the experiments. D.P., T.S., M.V. and R.M.B. analyzed the data. D.S., S.H. and T.S. performed the structure elucidation. P.S. carried out molecular dynamics simulations. F.P. and G.J.-O. carried out the QM and QM/MM calculations. D.P., T.S. and R.M.B. wrote the paper with feedback from A.K., U.T.B. and D.H. R.M.B. supervised the project.
Ethics declarations
Competing interests
All authors declare no competing interests.
Peer review
Peer review information
Nature Chemical Biology thanks Shine CL Kamerlin, Stefan Lutz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Creation of complex single-site variant libraries using oligo pools.
The gene is split into an appropriate number of fragments corresponding to the maximum length of the available oligos (200 bp in our case). For each desired single-site variant, an individual oligo is designed and later used in an overlap extension PCR with appropriate flanking regions, in this way introducing the desired mutations. In each oligo, a common flanking region (purple) is included, allowing the oligo’s initial amplification from the low-concentration oligo pool. Another flanking region complementary to the gene (white) permits specific amplification of the oligos covering a certain gene fragment (subpool-specific amplification). Each variant gene is reassembled by overlap extension PCR using the amplified oligo subpool in combination with gene fragments generated from appropriate upstream and downstream PCRs.
Extended Data Fig. 2 Creation of combinatorial hit libraries.
Identified beneficial mutations are encoded on customized primers. If appropriate, adjacent mutations are grouped on one primer, and degenerate codons are used to cover several amino acids at the same site reducing the overall number of required primers. Next, the gene is amplified in fragments spanning regions between mutations. Finally, an overlap extension PCR is performed to reassemble all mutagenized fragments into the final variant library.
Extended Data Fig. 3 Active site architecture of HG3 variants.
a, Schematic representation of ligand–enzyme interactions found in HG3.R5. The TSA (3) interacts via hydrogen bonds with the backbone amide of M237 as well as the side chains of Q50 (oxyanion stabilization) and D127 (catalytic base). Additionally, the nitro group of the ligand is anchored by van der Waals interactions to the side chain of W44. Of note, the TSA (3) forms a hydrogen bond to an evolutionarily acquired water in the active site of HG3.R5. b, Evolutionary optimization of the angles and distances characterizing the hydrogen-bonding interaction between D127 and the TSA (3). Values are given as the difference (∆) between the optimal angles and distances calculated for hydrogen-bonding interactions between acetamide dimers (δHA = 1.94 Å; θ = 159.4°; ψ = 112.3°)77 and the measured values from the binary crystal structures. In detail, measured distances and angles were obtained from all unique assemblies in the asymmetric unit cell of HG3.R5 (2 assemblies, 8RD5), HG3 (4 assemblies, 5RGA and 7K4Q) and HG3.17 (4 assemblies from 5RGE, 7K4Z and 4BS0). The data are obtained from the above-mentioned assemblies using Pymol (2.5.5) and presented as mean ± SD. c–h, Cut-away view of the TSA (3) bound active site of HG3.R5 (blue, c and f), HG3 (gray, d and g) and HG3.17 (purple, e and h), highlighting the improved shape complementarity of active site and ligand in the evolved variants. Structural illustrations are adapted from the PDB (HG3: 5RGA, HG3.R5: 8RD5, HG3.17: 5RGE).
Extended Data Fig. 4 Quantum mechanical calculations of the Kemp elimination catalyzed by HG3.R5.
a, Cluster model transition structures (full QM). b, Enzyme–substrate complex and transition structures (hybrid QM/MM). The TS stabilization exerted by crystallographic water we was calculated to be similar and additive to that achieved by the side chain of Q50. Note the shorter and better-oriented polar contact between the substrate and the water molecule in the TS, reflecting its contribution to oxyanion stabilization. A high degree of preorganization for catalysis was found in the X-ray structures and is reflected in the very low computed reaction barriers. The cluster QM calculated activation energies are incidentally in good agreement with the experimentally measured turnover number (kcat = 702 s−1; ΔG‡ ≈ 13.6 kcal mol−1 at 25 °C), while the QM/MM ones are substantially lower despite using the same DFT functional and a similar basis set. In this regard, a dramatic influence of the level of theory and implicit solvation on the calculated ΔE‡ and ΔG‡ was found. Hence, attention should be focused on relative rather than absolute activation barriers. PyMOL (2.4.0a) was used for structural representations.
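The correspondence between the measured turnover number and the quoted activation free energy follows from the Eyring equation. The sketch below assumes a transmission coefficient of 1 and uses CODATA constants; the function name is illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J K^-1
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J mol^-1 K^-1
J_PER_KCAL = 4184.0


def eyring_barrier_kcal(k_cat, temperature_K):
    """Activation free energy (kcal mol^-1) from a rate constant (s^-1).

    Eyring equation with transmission coefficient 1:
    dG = R T ln(k_B T / (h k)).
    """
    return (
        R * temperature_K * math.log(K_B * temperature_K / (H * k_cat)) / J_PER_KCAL
    )


barrier = eyring_barrier_kcal(702.0, 298.15)  # kcat = 702 s^-1 at 25 °C
print(f"{barrier:.1f} kcal/mol")  # prints 13.6 kcal/mol
```

Because the barrier depends logarithmically on the rate, a tenfold change in kcat shifts it by only about 1.4 kcal mol−1 at this temperature.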
Extended Data Fig. 5 Analysis of the interaction network of mutations acquired over the evolutionary trajectory of HG3.R5.
a, Snapshot of the configuration of the active site of HG3. b, In the first round of evolution, mutations M172A and K50Q were acquired, which are in direct contact with the substrate 5-nitrobenzisoxazole (1). c, In evolution round 2, mutation Q207M complements the prior mutation M172A, while mutation Q90H allows for the formation of a π-stacking interaction with W87. d, In evolution round 3, mutation E131S allows the formation of a hydrogen bond with W87, and mutation M49L opens up space close to P45, which is observed to substantially shift position in the final variant HG3.R5. P45’s conformational change presumably leads to the binding of water in the active site, which contributes to the stabilization of the transition state. e, In evolution round 4, the acquisition of mutation L69V seems to further assist in the observed conformational change of P45. f, In the final evolution round, mutations H209N and Y174S are introduced, which further anchor M237 and S131 via an elaborate water network.
Extended Data Fig. 6 MD simulations of HG3 and its variants.
Each variant was simulated 5 times with the substrate 5-nitrobenzisoxazole (1) for 100 ns, recording every 1,000 steps (step size = 2.0 fs), resulting in a total of 250,000 data points per variant. a, Root-mean-squared fluctuation difference against HG3. Mutations acquired during the evolution are denoted by green markers, while the catalytically active D127 and the K50Q mutations are highlighted by outlined spheres. Three distinct regions exhibiting heightened rigidity (21–30, 46–58 and 82–92) are emphasized with shading. In HG3.R4, a region encompassing D127 (119–127) experiences a loss of rigidity, as indicated by a gray-shaded area. Rigidity is subsequently restored in HG3.R5. b, Crystal structure of HG3.R5 highlighting active site residues and the rigidifying regions (loop 1–loop 4). Loop 2, comprising the K50Q mutation, gains rigidity compared to HG3 over the course of evolution, while loop 4, comprising the catalytically active D127, temporarily gains flexibility in HG3.R4. c, Contact frequencies of active site residues with the substrate 5-nitrobenzisoxazole (1). The data are shown from n = 4 (HG3.R1, HG3.R5 and HG3.17) or n = 5 (HG3, HG3.R2, HG3.R3 and HG3.R4) simulations and presented as mean ± SD.
Supplementary information
Supplementary Information
Supplementary Figs. 1 and 2, Supplementary Tables 1–3 and Supplementary Note (amino acid and nucleotide sequences, cartesian coordinates of cluster model QM structures and Python script for ΔΔG predictions with PyRosetta).
Supplementary Data 1
Supporting data for Supplementary Figs. 1 and 2.
Supplementary Data 2
List of commercial reagents.
Source data
Source Data Fig. 1
Data used to create Fig. 1c,d.
Source Data Fig. 2
Data used to create Fig. 2b,d,e.
Source Data Fig. 3
Data used to create Fig. 3d.
Source Data Fig. 4
Data used to create Fig. 4b,c.
Source Data Fig. 5
Data used to create the 3D plot of the fitness landscape.
Source Data Extended Data Fig. 3
Data used to create Extended Data Fig. 3b.
Source Data Extended Data Fig. 6
Data used to create Extended Data Fig. 6a,c.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Patsch, D., Schwander, T., Voss, M. et al. Enriching productive mutational paths accelerates enzyme evolution. Nat Chem Biol (2024). https://doi.org/10.1038/s41589-024-01712-3