Automated design of enzymes with wild-type-like catalytic properties has been a long-standing but elusive goal. Here, we present a general, automated method for enzyme design through combinatorial backbone assembly. Starting from a set of homologous yet structurally diverse enzyme structures, the method assembles new backbone combinations and uses Rosetta to optimize the amino acid sequence, while conserving key catalytic residues. We apply this method to two unrelated enzyme families with TIM-barrel folds, glycoside hydrolase 10 (GH10) xylanases and phosphotriesterase-like lactonases (PLLs), designing 43 and 34 proteins, respectively. Twenty-one GH10 and seven PLL designs are active, including designs derived from templates with <25% sequence identity. Moreover, four designs are as active as natural enzymes in these families. Atomic accuracy in a high-activity GH10 design is further confirmed by crystallographic analysis. Thus, combinatorial-backbone assembly and design may be used to generate stable, active, and structurally diverse enzymes with altered selectivity or activity.
Enzymes can be grouped into families, members of which catalyze nearly identical chemical reactions, but exhibit vast differences in rates and substrate selectivities1,2,3. Conservation of chemical reactivity and diversity in substrate recognition are encoded in a modular architecture, wherein the residues actively taking part in catalysis are conserved in sequence and structure, typically including minute structural details. By contrast, structural elements outside the catalytic core vary substantially, including through insertion and deletion of large protein segments, to encode different substrate selectivities.
Enzymes belonging to the TIM-barrel fold, which is represented in five of the six top-level classes defined by the Enzyme Commission (EC)3,4, are a prime example of this modularity. In each TIM-barrel family, eight parallel β-strands are arranged in a conserved and concentric barrel around the active-site pocket; the α-helices surround the strands and stabilize the pocket. In contrast to the atomic conservation of the catalytic residues in each family, the loops connecting the β-strands to the α-helices are highly variable in length, conformation, and sequence; substrate selectivity is largely encoded in these variable regions. Owing to this structural modularity, new substrate selectivities can evolve through gene recombination among homologous TIM barrels followed by insertion, deletion, and mutation; that is, as long as the scaffold’s structural stability and the geometry of the core catalytic residues are maintained, the loop regions can vary substantially5,6,7. Indeed, more than 70 distinct sequence families in the Structural Classification of Proteins (SCOP) belong to the TIM-barrel fold4,8, demonstrating how modularity has been exploited time and again by evolution. Structural modularity is a hallmark of other versatile enzyme classes, including, for instance, enzymes of the β-propeller, β-trefoil, Rossmann, α/α-barrel, and α/β-hydrolase folds9.
Modularity has also been exploited to optimize enzymes through laboratory evolution and structure-based recombination10,11,12. For instance, laboratory genetic recombination among naturally occurring enzymes through structurally conserved sites has generated enzymes with large variations in stability and specific activity13,14,15,16,17,18.
Structure-based recombination has also been used to fuse TIM-barrel fragments and even fragments from unrelated folds, to generate new structures19,20,21,22. These and other structure-based and computational design studies23,24,25 highlighted the structural adaptability of TIM barrels, but the resulting proteins were inactive, and in some cases, iterative laboratory evolution was employed, resulting in activities that were still several orders of magnitude lower than those of the wild type18,22,26,27. Furthermore, de novo enzyme design, whereby constellations of up to four catalytic residues are installed on natural scaffold proteins that do not exhibit the desired activity, targeted elementary reactions and has resulted in marginally stable proteins and catalytic efficiencies that were orders of magnitude lower than those of natural enzymes28,29,30, similarly requiring iterative laboratory evolution to improve stability and rates and to obtain the designed active-site constellation31,32,33. Thus, automated design of stable and sophisticated enzymes exhibiting catalytic efficiencies that rival those of natural ones has been a long-standing though elusive goal34,35,36.
Here, we demonstrate a path to automated design of stable and highly active enzymes. The design method is inspired by the evolution of new enzymes in nature through recombination, insertion, deletion, and mutation37,38. It starts by computationally segmenting all structures belonging to a modular enzyme family along structurally conserved sites and assembling the resulting modular fragments to generate a huge combinatorial diversity of backbones. Instead of using the natural sequences of the fragments, as in natural evolution or laboratory genetic recombination39,40, we next design the sequence of the entire protein (>300 amino acids) to maximize compatibility between the fragments and stabilize the active-site geometry. Each design step, therefore, introduces insertions and deletions as well as dozens of stabilizing mutations. Thus, although our method is inspired by natural evolutionary processes, each step is vastly more radical than individual recombination, insertion, deletion, and mutation events that occur in evolution, in which each event must be at least neutral in fitness or it is likely to be purged37,41. Despite having as many as 150 mutations from any natural enzyme, designed enzymes were stable, structurally accurate and highly active without requiring laboratory evolution. The results, therefore, provide proof-of-principle for fully automated design of stable, diverse, and highly active enzymes that catalyze complex reactions.
The PLL and GH10 families
To test the design method’s generality, we targeted two structurally diverse and well-characterized TIM-barrel enzyme families exhibiting very different activity profiles: Phosphotriesterase-Like Lactonases (PLL) and Glycoside Hydrolase 10 (GH10) xylanases42. PLLs are a group of evolutionarily divergent enzymes43 that possess a bi-metal center, which activates a water molecule for nucleophilic attack on the activated scissile bond of lactones (Fig. 1a). PLLs have potential applications in bacterial biofilm degradation and in the detoxification of organophosphates, including of highly toxic nerve agents44,45,46,47. GH10 enzymes hydrolyze the β-1,4 glycosidic bonds linking the xyloside units that comprise the backbone of the polysaccharide xylan, which is second only to cellulose in abundance in the plant cell wall48; xylanases are, therefore, essential for biomass degradation49,50. The GH10 catalytic core comprises two proximal and structurally fully conserved Glu sidechains, one acting as the nucleophile, which attacks the glycosidic bond, and the other as the protonating acid51 (Fig. 1a). Furthermore, most PLLs are obligate homodimers (Supplementary Fig. 1), while GH10s are monomers, and the PLL family is highly diverse, including members with <25% pairwise sequence identity. Thus, the two enzyme families we targeted for design are unrelated in sequence, oligomeric state, active-site structure, and catalytic activity. In both families, the active-site pocket is complex and comprises positions from most β-α units (units 2–8 in GH10s and 1, 4, 5, 6, and 8 in PLLs, Fig. 1b). By contrast to previous enzyme-design studies, both reactions target biological rather than model substrates, and glycosidic-bond hydrolysis involves a high activation barrier, presenting an additional challenge.
In each family, some of the eight β-α backbone units are structurally highly diverse (Fig. 1c). Structural diversity presents opportunities for vastly increasing the number of potential backbones for design through combinatorial backbone assembly compared to the limited number of backbones observed in experimentally determined structures (114 and 154 for PLL and GH10, respectively). To estimate the potential for diversifying the backbones through combinatorial backbone assembly, we structurally clustered the eight β-α units in the GH10 family by 1 Å root-mean-square deviation (rmsd). We then computed the number of possible combinations, assuming that all β-α units could be recombined with all others, yielding a total of 10¹⁰ different backbones, exceeding the number of GH10 structures in the Protein Data Bank (PDB) by eight orders of magnitude. We also reasoned that since all backbone fragments originate from natural enzymes, the likelihood of obtaining stable and functional enzymes from assembly is much higher than using naive scaffold libraries as in past enzyme design28,29,30.
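The size of the combinatorial backbone space follows from multiplying the number of conformational clusters found for each β-α unit. The following is a minimal sketch of that arithmetic only; the per-unit cluster counts are hypothetical placeholders (the text reports just the overall total), and the function name is ours.

```python
import math

# Hypothetical number of 1 Å rmsd clusters for each of the eight β-α units.
# These values are illustrative only; the source reports just the overall total.
clusters_per_unit = [10, 25, 30, 20, 12, 15, 28, 18]

def count_backbones(cluster_counts):
    """Number of backbones if every cluster of every unit can be combined freely."""
    return math.prod(cluster_counts)

total = count_backbones(clusters_per_unit)
# Compare with the 154 GH10 structures used as the starting database.
orders_above_database = math.log10(total / 154)
print(f"{total:.2e} backbones, {orders_above_database:.1f} orders of magnitude above 154")
```

With placeholder counts of this size, the product lands near the reported scale, which illustrates how a few tens of clusters per unit suffice to exceed the structure database by many orders of magnitude.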
Combinatorial backbone assembly and design
We segmented the structures in each family according to points of maximal structural conservation in the β-strands and extracted the backbone conformations of each segment for use in subsequent backbone assembly52 (Fig. 1c). The choice of how to segment the backbone into β-α units is crucial for design success. Since our method samples backbone fragments independently of one another, each fragment must encode the most important stabilizing contacts; some stabilizing contacts, however, occur between adjacent fragments (Fig. 1d). To test the effects of different segmentation schemes on design success, we chose three segmentations for GH10 designs (Fig. 2): (1) a completely unbiased segmentation, in which each of the eight β-α units comprising the TIM barrel was sampled independently of the others, maximally sampling backbone conformation space (design series xyl8); (2) a structure-based segmentation, comprising four backbone units, each of which forms stabilizing intrasegment contacts: β-α units 1, 2–4, 5–6, and 7–8 (series xyl4); and (3) another structure-based though discontinuous segmentation, where the least conformationally diverse segments, β-α units 1 and 5–6, formed one constant backbone segment and two other segments were formed by the variable units 2–4 and 7–8 (series xyl3). PLLs, by contrast to GH10s, are obligate homodimers53,54. We, therefore, sampled up to four fragments: one comprising the crucial homodimer interface formed by β-α units 1–3 and 8 and up to three other segments from units 4, 5, 6, and 7 (design series pll2, pll3, and pll4, which were assembled from 2, 3, or 4 PLL backbones, respectively). Thus, the computational design strategy is amenable to encoding a wide range of constraints inferred from experimental or structural analysis.
To assemble backbones and design new sequences, we generalized the Rosetta AbDesign method, originally developed to design new antibodies from backbone fragments of natural ones52,55,56 (Supplementary Fig. 2, Supplementary Movie 1). For each of the segmentation schemes, we started from a random combination of backbone fragments. In each design step, AbDesign samples a single backbone fragment from the conformation database and designs the protein’s amino acid sequence. Since the entire protein (>300 amino acids) needs to be designed to accommodate the large backbone changes introduced in each step, we used position-specific scoring matrices (PSSMs) to constrain amino acid choices at each position to identities that are commonly observed in a multiple-sequence alignment of natural family members. The PSSMs also focus design calculations on a sequence subspace that is more likely to include stable, folded, and active enzymes. Furthermore, the method does not model the enzyme-transition-state complex, which is often associated with modeling uncertainties and inaccuracies33. Instead, the residues in direct contact with the bi-metal center in PLLs, and the two catalytic Glu residues in GH10s together with 11 additional residues directly involved in substrate binding, were not allowed to change sidechain conformation during design (Fig. 1b). Following sequence design, the new structure was accepted if it was lower in energy than the previous one, and higher-energy structures were accepted probabilistically. The designs were then ranked by Rosetta energy, clustered by backbone conformation to obtain conformationally unique structures (Supplementary Fig. 3) and subjected to the PROSS stability design algorithm in all regions outside the active-site pockets57. PROSS introduced dozens of mutations to each design (20 ± 6 and 36 ± 7 mutations in GH10 and PLL designs, respectively).
Visual inspection indicated that the PROSS-designed mutations eliminated core cavities and improved surface polarity and were, therefore, likely to improve protein stability and expressibility.
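The idea of using a PSSM to restrict the amino acid choices at each position, while keeping catalytic positions untouched, can be sketched as follows. This is a toy illustration and not the Rosetta implementation: the scores, the cutoff, and both function names are invented, and fixing the identity of a position is used here as a simplified stand-in for the sidechain-conformation restraint described above.

```python
# Toy PSSM: one dict per position mapping amino acid -> log-odds score.
# All scores and the cutoff below are made-up values for illustration.
toy_pssm = [
    {"A": 2, "G": 1, "S": 0, "W": -4},
    {"E": 5, "D": 1, "Q": -1, "K": -3},
    {"L": 3, "I": 2, "V": 2, "P": -5},
]

def allowed_identities(pssm, cutoff=0):
    """Return, per position, the amino acids whose PSSM score is >= cutoff."""
    return [sorted(aa for aa, score in column.items() if score >= cutoff)
            for column in pssm]

def freeze_positions(allowed, native_sequence, fixed_positions):
    """Restrict fixed (e.g. catalytic) positions to their native identity."""
    result = [list(options) for options in allowed]
    for pos in fixed_positions:
        result[pos] = [native_sequence[pos]]
    return result

allowed = allowed_identities(toy_pssm, cutoff=0)
# Position 1 (0-based) plays the role of a catalytic residue in this toy case.
design_space = freeze_positions(allowed, native_sequence="AEL", fixed_positions=[1])
print(design_space)
```

Raising the cutoff narrows the sequence subspace toward identities that are frequent in the alignment; lowering it admits more diversity at the cost of a larger search.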
Stable and highly active designs
Synthetic genes encoding 34 PLL and 43 GH10 designs were fused C-terminally to maltose-binding protein (MBP), which served as a solubility and affinity-purification tag. The designs were then overexpressed in E. coli BL21 DE3 cells and purified using an amylose (PLL designs) or Ni-NTA (GH10 designs) column. All the designs expressed solubly, and >70% exhibited high expression yields (20–200 mg protein per liter of bacterial culture, Supplementary Fig. 4). Thus, despite as many as 150 mutations relative to any natural enzyme (Table 1, Supplementary Fig. 5), as MBP fusions, the designs did not require iterative rounds of in vitro evolution to optimize expressibility.
We initially screened each of the 43 GH10 designs for xylanase activity with a qualitative assay that measures the formation of reducing sugars released from natural beechwood xylan58, finding that 21 of the 43 designs (49%) were active. We then selected the eight most active designs and two natural GH10 enzymes for quantitative kinetic analysis with the chromogenic substrate 4-nitrophenyl β-xylobioside (O-PNPX2) (Table 1, Supplementary Tables 1 and 2 and Supplementary Fig. 6). The kinetic analysis revealed a wide range of catalytic efficiencies (kcat/KM), and encouragingly, the two most efficient designs, xyl3.1 and xyl3.2, exhibited rates within fivefold of natural GH10 family members (Fig. 3a, b), despite having >100 mutations relative to any natural GH10 enzyme (Table 1).
In many industrial applications, GH10 enzymes are subjected to high temperature and acidic pH. Some designs exhibited maximal activity at 45 °C, and some retained full activity even at 50 °C (Supplementary Fig. 7), with a pH optimum at 6–6.5 (Supplementary Fig. 8), similar to stability and activity profiles of natural GH10 enzymes. We thus concluded that the automated design method yielded several enzymes that were distant in sequence from any natural enzyme, yet showed similar catalytic and stability profiles to those observed in nature.
Natural PLLs exhibit a range of chemically related hydrolytic activities, including the hydrolysis of lactones, esters, and phosphotriesters. We initially tested the 34 designed PLLs with the artificial substrate 5-thiobutyl butyrolactone (TBBL)59, finding seven active designs (Table 1). We subsequently measured the activity of these seven enzymes with a range of substrates: the natural aliphatic γ-nonanoic lactone, the ester p-nitrophenyl acetate, and the pesticide phosphotriester paraoxon (Supplementary Table 3; Fig. 3c, and Supplementary Figs. 9 and 10). PLL activity was tested with Co2+ or Zn2+, and in most cases much higher activity was observed with Co2+, similar to previous reports60. Four PLL designs hydrolyzed the less activated aliphatic γ-nonanoic lactone, four exhibited esterase activity, and six hydrolyzed the phosphotriester paraoxon. Strikingly, the lactonase and esterase catalytic efficiencies of the four most active PLL designs were similar to those of natural PLLs (Table 1 and Fig. 3c, d). Indeed, design pll2.1 exhibited roughly twofold higher efficiency of TBBL hydrolysis and pll2.4 exhibited an order of magnitude higher efficiency in the hydrolysis of γ-nonanoic lactone and the pesticide paraoxon than the two natural enzymes. Hence, the designs exhibited features such as high catalytic efficiency and substrate promiscuity, while sampling sequence and conformation space widely (Supplementary Fig. 3). These designs can, therefore, be used as starting points for altering the selectivity profile or discovering new catalytic activities through active-site design or laboratory evolution.
Thermal stability is an essential property of enzymes in many biotechnological applications and low stability often constrains laboratory evolution of new activities41. Following overexpression of the active GH10 and PLL designs, we proteolytically cleaved the N-terminal MBP fusion, and subjected the enzymes to thermal denaturation, noting that all designs exhibited high apparent melting temperatures (Tm) in the range of 50–82 °C (Table 1 and Supplementary Figs. 11 and 12), comparable to the apparent Tm of natural enzymes in these families, including enzymes from thermophiles.
Atomic precision underlies high catalytic efficiency
The active designs spanned four orders of magnitude in catalytic efficiency. For instance, the most active GH10 design xyl3.1 and the least active one xyl8.3 exhibited O-PNPX2 hydrolysis efficiencies of 9417 and 0.61 M⁻¹ s⁻¹, respectively. To understand the structural reasons underlying this vast difference in efficiency, we determined their crystallographic structures (Fig. 4, Supplementary Fig. 13, and Supplementary Table 4). In both structures, the two catalytic Glu residues were positioned as in the design conception (<0.5 Å all-atom rmsd). The high-activity design xyl3.1 was also atomically accurate throughout 13 active-site residues that form an intricate hydrogen-bond network surrounding the two catalytic Glu residues (<1 Å all-atom rmsd), and indeed, across the entire protein, with a total backbone rmsd of 0.7 Å. By contrast, the low-activity design xyl8.3 showed conformational changes in the residues that form the active-site hydrogen-bond network (2 Å all-atom rmsd). Furthermore, missing electron density in β-α loops 7 and 8 suggested that at least parts of the active-site pocket were mobile, although design accuracy was high throughout the segments of the design model and experimental structure that could be aligned (0.9 Å backbone rmsd). We, therefore, concluded that accurate positioning of the two catalytic Glu residues was crucial to obtain any level of activity, and that high levels of activity depended on atomic precision and preorganization in a large network of polar residues that surround the catalytic residues.
Combinatorial backbone assembly uses principles inferred from the evolution of enzyme families38: positions at the active site and ones that are crucial for protein folding and stability are conserved, whereas large backbone and sequence changes, including insertions and deletions, generate vast structural diversity in regions that encode substrate selectivity. The resulting designs were stable, and some exhibited atomic accuracy and high catalytic efficiency and promiscuity even with respect to challenging non-activated substrates that are hydrolyzed by natural members of the respective enzyme families. By contrast, computational design of backbones at enzyme active sites has until now failed to show atomic accuracy, and stability and catalytic efficiencies were low33,35,61,62, limiting the application of computational design to model reactions. It is also notable that expert-guided insertions and deletions at enzyme active sites are challenging and typically require rounds of trial-and-error and optimization, including in the PLLs that were the subject of our study7. These laborious iterative strategies can now be bypassed using combinatorial backbone assembly and design. The most active designs were based on fragments from only a few (two or three) different template enzymes, whereas combining fragments from four or more templates generally led to lower efficiency (Fig. 3). It, therefore, appears that future improvements to backbone assembly are needed to fully realize the potential of the design algorithm.
Our method exploits structure and sequence diversity in enzyme families to generate a large combinatorial diversity of backbones, followed by sequence design for stability. The ability demonstrated here to design enzymes exhibiting a large network of active-site residues at atomic accuracy, and the resulting high catalytic efficiency, greatly simplify the goal of enzyme design: instead of depending on accurate transition-state modeling, which, despite recent improvements, still suffers from uncertainty33, conserving the natural active-site pocket suffices for wild-type-like stability and catalytic efficiency in designs. Our study, therefore, demonstrates a fully automated path to design of structurally diverse enzymes, which catalyze complex reactions, despite over 100 mutations from any natural enzyme. The resulting enzymes are stable, active, and highly diverse potential starting points for designing new substrate selectivities, providing an alternative to metagenomic screening and iterative in vitro evolution. An important future direction is to combine backbone fragments from non-homologous families63, potentially extending the substrate spectrum beyond that observed within a target enzyme family. Thus, modular backbone assembly and design may provide a path to design of new biocatalysts.
A database of natural GH10 and PLL family enzymes
114 PLL and 154 GH10 structures were downloaded from the Pfam database64,65. The structures were segmented along structurally conserved points in the β-strands. For each segment, the mainchain dihedral angles (ϕ, ψ, and ω) and conformation-dependent PSSMs were computed using AbDesign52.
PLLs were segmented along the following positions (for reference, position numbering is according to the PLL from S. solfataricus, PDB ID: 2VC7): β-α unit 4:135–169, 5:171–195, 6:197–217, 7:219–254. GH10 family enzymes were segmented along the following positions (for reference, position numbering is according to GH10 from Thermoanaerobacterium saccharolyticum, PDB ID: 3W24): β-α unit 1:20–44, 2:47–83, 3:86–142, 4:145–185, 5:188–218, 6:221–248, 7:251–288, 8:291–319.
Constrained catalytic residues
In PLLs, six metal chelating residues were constrained (for reference, position numbering is according to PLL from S. solfataricus, PDB ID: 2VC7): 22, 24, 137, 170, 199, and 256. In GH10s, the following active-site residues were constrained (for reference, position numbering is according to GH10 from T. saccharolyticum, PDB ID: 3W24): 52, 85, 89, 92, 145, 146, 187, 189, 221, 223, 251, 292, and 300.
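As a consistency check on the numbering above, the constrained GH10 positions can be mapped onto the β-α unit boundaries given for PDB ID 3W24. The sketch below uses only the residue numbers listed in the text; the variable and function names are ours.

```python
# GH10 β-α unit boundaries and constrained residues as listed in the text
# (numbering of PDB ID 3W24).
GH10_UNITS = {1: (20, 44), 2: (47, 83), 3: (86, 142), 4: (145, 185),
              5: (188, 218), 6: (221, 248), 7: (251, 288), 8: (291, 319)}
GH10_CONSTRAINED = [52, 85, 89, 92, 145, 146, 187, 189, 221, 223, 251, 292, 300]

def unit_of(residue, units):
    """Return the β-α unit containing `residue`, or None if it lies between units."""
    for unit, (start, end) in units.items():
        if start <= residue <= end:
            return unit
    return None

assignment = {res: unit_of(res, GH10_UNITS) for res in GH10_CONSTRAINED}
units_with_constraints = sorted({u for u in assignment.values() if u is not None})
between_units = [res for res, u in assignment.items() if u is None]
print(units_with_constraints)
print(between_units)
```

With the listed numbers, the constrained positions fall in units 2 through 8, and residues 85 and 187 lie at junctions between consecutive segments rather than inside a sampled unit.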
Combinatorial backbone assembly
For each segmentation scheme, we generated a starting set of 3000 backbone conformations by randomly recombining fragments selected from the conformation databases. The sequence of each design was then optimized using RosettaDesign. From each starting design, 30 steps were taken, in each of which a backbone segment was chosen at random and replaced with a random fragment from the relevant conformation database. Following segment replacement, the sequence of the segment and every residue within 6 Å was designed, and iterations of sidechain packing, backbone and sidechain minimization were conducted to obtain low-energy sequences. The new structure was accepted relative to the previously accepted structure if it passed the Metropolis criterion with a gradually decreasing temperature (simulated annealing Monte Carlo).
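The accept/reject logic of such a simulated-annealing trajectory can be sketched in a few lines. This is a toy model under stated assumptions, not the Rosetta protocol: the energy function is invented and stands in for sequence design and minimization, the geometric cooling schedule and its start and end temperatures are our choice, and all names are hypothetical. Only the 30 steps per trajectory and the one-segment-per-step replacement are taken from the text.

```python
import math
import random

def metropolis_accept(e_new, e_old, temperature, rng):
    """Accept downhill moves always; uphill moves with probability exp(-dE/T)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / temperature)

def anneal(n_segments, n_fragments, energy, steps=30, t_start=2.0, t_end=0.1, seed=0):
    """Toy backbone assembly: replace one random segment per step under annealing."""
    rng = random.Random(seed)
    current = [rng.randrange(n_fragments) for _ in range(n_segments)]
    e_current = energy(current)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        trial = list(current)
        trial[rng.randrange(n_segments)] = rng.randrange(n_fragments)
        e_trial = energy(trial)  # stands in for sequence design + minimization
        if metropolis_accept(e_trial, e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

def toy_energy(choice):
    """Hypothetical energy: penalize mismatch between neighbouring fragments."""
    return sum((a - b) ** 2 for a, b in zip(choice, choice[1:]))

best, e_best = anneal(n_segments=4, n_fragments=10, energy=toy_energy)
print(best, e_best)
```

At high temperature the trajectory tolerates unfavorable fragment swaps and explores widely; as the temperature drops, it increasingly accepts only improvements.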
Structural clustering of designs
All resulting designs were clustered using MaxCluster (http://www.sbg.bio.ic.ac.uk/maxcluster/).
The clustered designs were subjected to the PROSS stability design algorithm57, and for each starting design, one stabilized variant (PROSS design variant 6) was selected for experimental characterization.
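Clustering by backbone conformation can be illustrated with a simple greedy scheme based on an rmsd cutoff. The sketch below is a generic approach and makes no claim about how MaxCluster works internally or which options were used; the coordinates are invented, the structures are assumed to be already superimposed, and the function names are ours.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long, pre-superimposed lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    sq = sum((a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q))
    return math.sqrt(sq / len(coords_a))

def greedy_cluster(structures, cutoff):
    """Assign each structure to the first cluster whose representative is within cutoff."""
    representatives, clusters = [], []
    for index, coords in enumerate(structures):
        for cluster_id, rep in enumerate(representatives):
            if rmsd(coords, structures[rep]) <= cutoff:
                clusters[cluster_id].append(index)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            representatives.append(index)
            clusters.append([index])
    return clusters

# Three toy "backbones" of two atoms each (coordinates in Å, invented).
toy = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)],   # shifted by 0.1 Å from the first
    [(0.0, 3.0, 0.0), (1.5, 3.0, 0.0)],   # shifted by 3.0 Å from the first
]
print(greedy_cluster(toy, cutoff=1.0))
```

Taking one representative per cluster then yields a conformationally non-redundant set of designs for the subsequent stabilization step.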
Rosetta energy function
GH10 designs were computed using the Rosetta Talaris14 all-atom energy function66, and PLL designs were computed using the more recent Rosetta energy function REF1567. Both energy functions are dominated by all-atom van der Waals packing, hydrogen bonding, electrostatics, and an implicit solvation model.
Sequence identity analysis
We calculated the sequence identity for each of the designed enzymes relative to the closest natural homolog using BLASTP68 with the NCBI nonredundant (nr) database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/).
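Once a design has been aligned to its closest natural homolog, percent identity and the number of mutations follow from a column-by-column comparison. The sketch below assumes the alignment is already available as two gapped strings of equal length; it ignores gapped columns, which is one of several conventions and may differ from the denominator that BLASTP reports. The sequences and function names are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0

def substitution_count(aligned_a, aligned_b, gap="-"):
    """Number of ungapped columns at which the two sequences differ."""
    return sum(1 for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap and a != b)

# Invented six-column alignment: four matches, one substitution, one gap.
design = "MKT-LV"
homolog = "MRTALV"
print(percent_identity(design, homolog), substitution_count(design, homolog))
```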
Amino acid sequences of active designs are given in Supplementary Note 1.
Paraoxon, p-nitrophenyl acetate, γ-nonanoic lactone, 5,5-dithio-bis-(2-nitrobenzoic acid) (DTNB, Ellman's reagent), m-cresol, and beechwood xylan were purchased from Sigma-Aldrich. TBBL was kindly provided by the Tawfik laboratory59. 4-nitrophenyl β-xylobioside (O-PNPX2) was purchased from Megazyme.
Synthetic genes of designs and natural enzymes were codon optimized for efficient E. coli expression and custom synthesized as linear fragments by Twist Bioscience. The genes were amplified and cloned into the pETMBPH vector (containing an N-terminal 6-His-tag and MBP69) through the EcoRI and PstI restriction sites. The ligated DNA was transformed into E. coli BL21 DE3 cells, and DNA was extracted for Sanger sequencing to validate accuracy. The list of primers used for cloning is given in Supplementary Table 6.
Protein expression and purification
For small-scale expression, 2 ml of 2YT medium supplemented with 50 μg ml⁻¹ kanamycin (and 0.1 mM ZnCl2 or CoSO4 in case of PLLs) were inoculated with a single colony and grown at 37 °C for ~15 h. In all, 10 ml 2YT medium supplemented with 50 μg ml⁻¹ kanamycin (and 0.1 mM ZnCl2 or CoSO4 in case of PLLs) were inoculated with 0.2 ml overnight culture and grown at 37 °C to an OD600 of ~0.6. Overexpression was induced with 0.2 mM IPTG, and the cultures were grown for ~24 h at 20 °C. After centrifugation and storage at –20 °C, the pellets were resuspended in lysis buffer and lysed by sonication.
PLL lysis buffer: 50 mM Tris (pH 8.0), 100 mM NaCl, 10 mM NaHCO3, 0.1 mM ZnCl2 or CoSO4, benzonase and 0.1 mg ml⁻¹ lysozyme.
GH10 lysis buffer: 50 mM Tris (pH 6.8), 100 mM NaCl, benzonase and 0.1 mg ml⁻¹ lysozyme. The PLL proteins were bound to amylose resin (NEB), washed with 50 mM Tris pH 8.0 with 100 mM NaCl and 0.1 mM ZnCl2 or CoSO4, and eluted with wash buffer containing 10 mM maltose. The GH10 proteins were bound to Ni-NTA resin (Merck), washed with 50 mM Tris pH 6.8 with 100 mM NaCl and 20 mM imidazole, and eluted with wash buffer containing 250 mM imidazole. The elution fraction was used for SDS-PAGE and for initial activity measurements. For further analysis of active designs, the expression was repeated with 50 ml culture, and after purification, the proteins were dialyzed in wash buffer. For crystallization, the expression was performed with 500 ml culture, and after purification, the protein was digested with TEV protease to remove the MBP fusion tag (1:20 TEV, 1 mM DTT, 24–48 h at room temperature (RT)). The MBP fusion was removed by binding to Ni-NTA resin, and the protein was purified by gel filtration (HiLoad 26/600 Superdex75 preparative grade column, GE). Protein concentration was estimated by OD280 measurement, and protein expression levels were extrapolated to mg protein per liter culture.
Preliminary xylanase screening
Xylanase activity was determined qualitatively by measuring the reducing sugars released from xylan by the dinitrosalicylic acid (DNS) method58. A typical assay mixture consisted of 20 μl citrate buffer (500 mM, pH 6.0) added to 80 μl cell lysate. The reaction was started by adding 100 μl of 2% beechwood xylan suspended in DDW, and the reaction was continued for 20 min at 50 °C. The reaction was stopped by transferring the tubes to an ice-water bath. One hundred microliters of the supernatant was then added to 150 μl DNS reagent, and the tubes were boiled for 10 min, after which the absorbance was measured at 540 nm. The reading was compared to that of a blank sample (cell lysate expressing MBP), and active xylanase designs were taken for further examination.
The kinetic measurements were performed with purified proteins (fused to MBP) in activity buffer (PLL: 25 °C, 50 mM Tris pH 8.0 with 100 mM NaCl, supplemented with 0.1 mM ZnCl2 or CoSO4; GH10: 37 °C, 50 mM Tris pH 6.5 with 150 mM NaCl). A range of enzyme concentrations was used, depending on the activity. The activity of PLLs was tested at 20–22 °C with TBBL59 (by coupling with DTNB and monitoring the absorbance at 412 nm), γ-nonanoic lactone (pH-sensitive assay in 2.5 mM bicine pH 8.3, by monitoring the absorbance of m-cresol indicator at 577 nm70), and paraoxon and p-nitrophenyl acetate (monitoring the absorbance of the leaving group at 405 nm). The kinetic measurements were performed in 96-well plates (optical length ~0.5 cm), and background hydrolysis rates were subtracted. The activity of GH10s was tested with O-PNPX2 by monitoring the absorbance of the leaving group at 405 nm. No background hydrolysis was observed with O-PNPX2. Specific activity of GH10s was also tested at a range of temperatures (25 °C, 37 °C, 45 °C, 50 °C) and at various pHs (citrate buffer: pH 5.0, 6.0, and 6.5; Tris buffer: pH 7.0, 8.0, and 9.0).
Determination of kinetic parameters
Kinetic parameters were obtained by fitting the data to the Michaelis-Menten equation, $v_0 = k_{\mathrm{cat}}[E]_0[S]_0/([S]_0 + K_M)$, using Prism 7. In cases where solubility limited substrate concentrations, data were fitted to the linear regime of the Michaelis-Menten model, $v_0 = [S]_0[E]_0\,k_{\mathrm{cat}}/K_M$, and kcat/KM values were deduced from the slope. The reported values represent the means ± S.D. of at least two independent measurements.
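The linear-regime estimate of kcat/KM can be illustrated with a short calculation. This is a sketch of the underlying arithmetic and not the Prism procedure used in the study: the kinetic parameters and concentrations are invented, the data are noise-free, and the fit is a plain least-squares slope through the origin. Function names are ours.

```python
def mm_rate(s0, e0, kcat, km):
    """Initial velocity from the Michaelis-Menten equation."""
    return kcat * e0 * s0 / (s0 + km)

def kcat_over_km_from_slope(substrate, rates, e0):
    """Least-squares slope through the origin of v0 vs [S]0, divided by [E]0.

    Valid only in the linear regime ([S]0 << KM), where v0 ≈ (kcat/KM)[E]0[S]0.
    """
    slope = sum(s * v for s, v in zip(substrate, rates)) / sum(s * s for s in substrate)
    return slope / e0

# Invented parameters for illustration: kcat in s^-1, concentrations in M.
kcat, km, e0 = 10.0, 1e-3, 1e-8
substrate = [1e-6, 2e-6, 5e-6, 1e-5]          # all at least 100-fold below KM
rates = [mm_rate(s, e0, kcat, km) for s in substrate]
estimate = kcat_over_km_from_slope(substrate, rates, e0)
print(f"true kcat/KM = {kcat / km:.3g} M^-1 s^-1, slope estimate = {estimate:.3g}")
```

Because the approximation drops the [S]0 term in the denominator, the slope slightly underestimates kcat/KM, and the bias grows as substrate concentrations approach KM.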
Tm measurements
Tm measurements were performed after cleavage of the MBP tag from the designs. Two methods were used: ThermoFluor experiments using SYPRO Orange dye (Sigma-Aldrich) on a ViiA 7 real-time PCR machine, with a temperature ramp from 25 °C to 100 °C at 0.05 K s⁻¹ (ref. 71), and nanoDSF experiments performed on a Prometheus™ NT.Plex instrument (NanoTemper Technologies)72. In addition, residual activity of PLL designs was tested following 0.5 h incubation at various temperatures and cooling to RT.
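The text does not state how the apparent Tm was extracted from the melting curves. One common convention, assumed here purely for illustration, is to take the temperature at which the unfolding signal rises most steeply. The sketch below applies that convention to a synthetic two-state curve; the data, the midpoint, and the function name are invented. It also computes the duration of a 25 to 100 °C ramp at the stated rate of 0.05 degrees per second.

```python
import math

def apparent_tm(temperatures, signal):
    """Temperature at the midpoint of the steepest rise in the unfolding signal."""
    best_slope, best_t = float("-inf"), None
    for i in range(len(temperatures) - 1):
        dt = temperatures[i + 1] - temperatures[i]
        slope = (signal[i + 1] - signal[i]) / dt
        if slope > best_slope:
            best_slope = slope
            best_t = 0.5 * (temperatures[i] + temperatures[i + 1])
    return best_t

# Synthetic two-state curve with a midpoint at 65 °C (invented data).
temps = [25 + 0.5 * i for i in range(151)]            # 25 to 100 °C in 0.5 °C steps
curve = [1.0 / (1.0 + math.exp(-(t - 65.0) / 2.0)) for t in temps]
tm = apparent_tm(temps, curve)
ramp_minutes = (100 - 25) / 0.05 / 60                 # duration of the full ramp
print(tm, ramp_minutes)
```

Real ThermoFluor traces are noisier and often decline after the transition, so in practice the derivative is usually smoothed before the maximum is located.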
Structure determination and refinement
Crystals of xyl3.1 and xyl8.3 were obtained using the hanging-drop vapor-diffusion method with a Mosquito robot (TTP LabTech). The crystals of xyl8.3 were grown from 8% PEG 3500 and 0.05 M tri-sodium citrate dihydrate pH 5.8. The crystals formed in the space group P41212, with one complex per asymmetric unit. A complete dataset to 2.1 Å resolution was collected at 100 K on a single crystal on an in-house RIGAKU RU-H3R X-ray source. Crystals of xyl3.1 were grown from 0.5 M (NH4)2H2PO4 and 0.05 M sodium acetate pH 4.5. The crystals formed in the space group H3, with one copy per asymmetric unit. A complete dataset to 1.85 Å resolution was collected at 100 K on a single crystal on an in-house RIGAKU RU-H3R X-ray source.
Diffraction images of xyl3.1 and xyl8.3 crystals were indexed and integrated using the Mosflm program73, and the integrated reflections were scaled using the SCALA program74. Structure factor amplitudes were calculated using TRUNCATE75 from the CCP4 program suite. xyl3.1 and xyl8.3 structures were solved by molecular replacement with the program PHASER76. The models used to solve xyl8.3 and xyl3.1 structures were 3W25 and 3MMD, respectively.
All steps of atomic refinement of both structures were carried out with the CCP4/REFMAC5 program77 and by Phenix refine78. The models were built into 2mFobs – DFcalc, and mFobs – DFcalc maps by using the COOT program78,79. Details of the refinement statistics of the xyl8.3 and xyl3.1 structures are described in Supplementary Table 4.
All Rosetta design simulations used git version 2c0dc744fb56459daf220abc159f980b1809ecfe of the Rosetta biomolecular modeling software, which is freely available to academics at http://www.rosettacommons.org. The backbone conformation databases and PSSMs are distributed with the Rosetta release. RosettaScripts64 and command lines are available in Supplementary Data 1–11.
The coordinates of designs xyl8.3 and xyl3.1 are available from the RCSB Protein Data Bank (PDB IDs: 6FHE and 6FHF, respectively). Plasmids encoding the active designs are available from AddGene (IDs 107202–107217). Design protocols are available in Supplementary Data 12–25. All other data supporting the findings of this study are available from the corresponding author upon reasonable request.
Furnham, N. et al. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput. Biol. 8, e1002403 (2012).
Orengo, C. A. & Thornton, J. M. Protein families and their evolution-a structural perspective. Annu. Rev. Biochem. 74, 867–900 (2005).
Nagano, N., Orengo, C. A. & Thornton, J. M. One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J. Mol. Biol. 321, 741–765 (2002).
Sterner, R. & Höcker, B. Catalytic versatility, stability, and evolution of the (βα)8-barrel enzyme fold. Chem. Rev. 105, 4038–4055 (2005).
Lupas, A. N., Ponting, C. P. & Russell, R. B. On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J. Struct. Biol. 134, 191–203 (2001).
Riechmann, L. & Winter, G. Early protein evolution: building domains from ligand-binding polypeptide segments. J. Mol. Biol. 363, 460–468 (2006).
Afriat-Jurnou, L., Jackson, C. J. & Tawfik, D. S. Reconstructing a missing link in the evolution of a recently diverged phosphotriesterase by active-site loop remodeling. Biochemistry 51, 6047–6055 (2012).
Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995).
Dellus-Gur, E., Toth-Petroczy, A., Elias, M. & Tawfik, D. S. What makes a protein fold amenable to functional innovation? fold polarity and stability trade-offs. J. Mol. Biol. 425, 2609–2621 (2013).
Arnold, F. H. Directed evolution: bringing new chemistry to life. Angew. Chem. Int. Ed. Engl. 57, 4143–4148 (2017).
Goldsmith, M. & Tawfik, D. S. Enzyme engineering: reaching the maximal catalytic efficiency peak. Curr. Opin. Struct. Biol. 47, 140–150 (2017).
Packer, M. S. & Liu, D. R. Methods for the directed evolution of proteins. Nat. Rev. Genet. 16, 379–394 (2015).
Sieber, V., Martinez, C. A. & Arnold, F. H. Libraries of hybrid proteins from distantly related sequences. Nat. Biotechnol. 19, 456–460 (2001).
Meyer, M. M., Hochrein, L. & Arnold, F. H. Structure-guided SCHEMA recombination of distantly related beta-lactamases. Protein Eng. Des. Sel. 19, 563–570 (2006).
Heinzelman, P. et al. A family of thermostable fungal cellulases created by structure-guided recombination. Proc. Natl Acad. Sci. U.S.A. 106, 5610–5615 (2009).
Ness, J. E. et al. Synthetic shuffling expands functional protein diversity by allowing amino acids to recombine independently. Nat. Biotechnol. 20, 1251–1255 (2002).
Raillard, S. et al. Novel enzyme activities and functional plasticity revealed by recombining highly homologous enzymes. Chem. Biol. 8, 891–898 (2001).
Park, H. S. et al. Design and evolution of new catalytic activity with an existing protein scaffold. Science 311, 535–538 (2006).
Höcker, B., Claren, J. & Sterner, R. Mimicking enzyme evolution by generating new (βα)8-barrels from (βα)4-half-barrels. Proc. Natl Acad. Sci. U.S.A. 101, 16448–16453 (2004).
Akanuma, S. & Yamagishi, A. Experimental evidence for the existence of a stable half-barrel subdomain in the (beta/alpha)8-barrel fold. J. Mol. Biol. 382, 458–466 (2008).
Bharat, T. A. M., Eisenbeis, S., Zeth, K. & Höcker, B. A βα-barrel built by the combination of fragments from different folds. Proc. Natl Acad. Sci. U.S.A. 105, 9942–9947 (2008).
Eisenbeis, S. et al. Potential of fragment recombination for rational design of proteins. J. Am. Chem. Soc. 134, 4019–4022 (2012).
Offredi, F. et al. De novo backbone and sequence design of an idealized α/β-barrel protein: Evidence of stable tertiary structure. J. Mol. Biol. 325, 163–174 (2003).
Figueroa, M. et al. The unexpected structure of the designed protein Octarellin V.1 forms a challenge for protein structure prediction tools. J. Struct. Biol. 195, 19–30 (2016).
Huang, P.-S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).
Claren, J., Malisi, C., Höcker, B. & Sterner, R. Establishing wild-type levels of catalytic activity on natural and artificial (βα)8-barrel protein scaffolds. Proc. Natl Acad. Sci. U.S.A. 106, 3704–3709 (2009).
Sperl, J. M., Rohweder, B., Rajendran, C. & Sterner, R. Establishing catalytic activity on an artificial (βα)8-barrel protein designed from identical half-barrels. FEBS Lett. 587, 2798–2805 (2013).
Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).
Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science 329, 309–313 (2010).
Rothlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008).
Giger, L. et al. Evolution of a designed retro-aldolase leads to complete active site remodeling. Nat. Chem. Biol. 9, 494 (2013).
Khersonsky, O. et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl Acad. Sci. U.S.A. 109, 10358–10363 (2012).
Kiss, G., Çelebi-Ölçüm, N., Moretti, R., Baker, D. & Houk, K. N. Computational enzyme design. Angew. Chem. Int. Ed. Engl. 52, 5700–5725 (2013).
Tawfik, D. S. Biochemistry. Loop grafting and the origins of enzyme species. Science 311, 475–476 (2006).
Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 19, 1817–1819 (2010).
Blomberg, R. et al. Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature 503, 418–421 (2013).
Romero, P. A. & Arnold, F. H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866–876 (2009).
Khersonsky, O. & Fleishman, S. J. Why reinvent the wheel? Building new proteins based on ready-made parts. Protein Sci. 25, 1179–1187 (2016).
Stemmer, W. P. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc. Natl Acad. Sci. U.S.A. 91, 10747–10751 (1994).
Voigt, C. A., Martinez, C., Wang, Z.-G., Mayo, S. L. & Arnold, F. H. Protein building blocks preserved by recombination. Nat. Struct. Biol. 9, 553–558 (2002).
Bershtein, S., Segal, M., Bekerman, R., Tokuriki, N. & Tawfik, D. S. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444, 929–932 (2006).
Lombard, V., Ramulu, H. G., Drula, E., Coutinho, P. M. & Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucl. Acids Res. 42, D490–D495 (2013).
Afriat, L., Roodveldt, C., Manco, G. & Tawfik, D. S. The latent promiscuity of newly identified microbial lactonases is linked to a recently diverged phosphotriesterase. Biochemistry 45, 13677–13686 (2006).
Goldsmith, M. et al. Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng. Des. Sel. 30, 333–345 (2017).
Dumas, D. P., Durst, H. D., Landis, W. G., Raushel, F. M. & Wild, J. R. Inactivation of organophosphorus nerve agents by the phosphotriesterase from Pseudomonas diminuta. Arch. Biochem. Biophys. 277, 155–159 (1990).
Rémy, B. et al. Harnessing hyperthermostable lactonase from Sulfolobus solfataricus for biotechnological applications. Sci. Rep. 6, 37780 (2016).
Hraiech, S. et al. Inhaled lactonase reduces Pseudomonas aeruginosa quorum sensing and mortality in rat pneumonia. PLoS ONE 9, e107125 (2014).
McCleary, B. V. & McGeough, P. A comparison of polysaccharide substrates and reducing sugar methods for the measurement of endo-1,4-β-xylanase. Appl. Biochem. Biotechnol. 177, 1152–1163 (2015).
Bajpai, P. Application of enzymes in the pulp and paper industry. Biotechnol. Prog. 15, 147–157 (1999).
Dodd, D. & Cann, I. K. O. Enzymatic deconstruction of xylan for biofuel production. Glob. Change Biol. Bioenergy 1, 2–17 (2009).
Withers, S. G. et al. Direct 1H n.m.r. determination of the stereochemical course of hydrolyses catalysed by glucanase components of the cellulase complex. Biochem. Biophys. Res. Commun. 139, 487–494 (1986).
Lapidoth, G. D. et al. AbDesign: An algorithm for combinatorial backbone design guided by natural conformations and sequences. Proteins 83, 1385–1406 (2015).
Hiblot, J., Bzdrenga, J., Champion, C., Chabriere, E. & Elias, M. Crystal structure of VmoLac, a tentative quorum quenching lactonase from the extremophilic crenarchaeon Vulcanisaeta moutnovskia. Sci. Rep. 5, 8372 (2015).
Chow, J. Y. et al. Directed evolution of a thermostable quorum-quenching lactonase from the amidohydrolase superfamily. J. Biol. Chem. 285, 40911–40920 (2010).
Baran, D. et al. Principles for computational design of binding antibodies. Proc. Natl Acad. Sci. U.S.A. 114, 10900–10905 (2017).
Khersonsky, O. & Fleishman, S. J. Incorporating an allosteric regulatory site in an antibody through backbone design. Protein Sci. 26, 807–813 (2017).
Goldenzweig, A. et al. Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346 (2016).
Miller, G. L. Use of dinitrosalicylic acid reagent for determination of reducing sugar. Anal. Chem. 31, 426–428 (1959).
Khersonsky, O. & Tawfik, D. S. Chromogenic and fluorogenic assays for the lactonase activity of serum paraoxonases. Chembiochem 7, 49–53 (2006).
Elias, M. et al. Structural basis for natural lactonase and promiscuous phosphotriesterase activities. J. Mol. Biol. 379, 1017–1028 (2008).
Murphy, P. M., Bolduc, J. M., Gallaher, J. L., Stoddard, B. L. & Baker, D. Alteration of enzyme specificity by computational loop remodeling and design. Proc. Natl Acad. Sci. U.S.A. 106, 9215–9220 (2009).
Bjelic, S. et al. Exploration of alternate catalytic mechanisms and optimization strategies for retroaldolase design. J. Mol. Biol. 426, 256–271 (2014).
Höcker, B., Beismann-Driemeyer, S., Hettwer, S., Lustig, A. & Sterner, R. Dissection of a (βα)8-barrel enzyme into two folded halves. Nat. Struct. Mol. Biol. 8, 32–36 (2001).
Fleishman, S. J. et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS ONE 6, e20161 (2011).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucl. Acids Res. 44, D279–D285 (2016).
Leaver-Fay, A. et al. Scientific benchmarks for guiding macromolecular energy function improvement. Methods Enzymol. 523, 109–143 (2013).
Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
Peleg, Y. & Unger, T. Application of high-throughput methodologies to the expression of recombinant proteins in E. coli. Methods Mol. Biol. 426, 197–208 (2008).
Khersonsky, O. & Tawfik, D. S. Structure-reactivity studies of serum paraoxonase PON1 suggest that its native activity is lactonase. Biochemistry 44, 6371–6382 (2005).
Reinhard, L., Mayerhofer, H., Geerlof, A., Mueller-Dieckmann, J. & Weiss, M. S. Optimization of protein buffer cocktails using Thermofluor. Acta Crystallogr. Sect. F. Struct. Biol. Cryst. Commun. 69, 209–214 (2013).
Lee, E., Badr, M., Lazic, A. & Duhr, S. Exploring protein stability and aggregation by nanoDSF. Protein Sci. 25, 104 (2016).
Papiz, M. Evolving methods for macromolecular crystallography, 11. Mathematics, Physics and Chemistry–Volume 245, edited by Randy J. Read and Joel Sussman. Crystallography Rev. 15, 123–126 (2009).
Evans, P. Scaling and assessment of data quality. Acta Crystallogr. D. Biol. Crystallogr. 62, 72–82 (2005).
French, S. & Wilson, K. On the treatment of negative intensity observations. Acta Crystallogr. A. 34, 517–525 (1978).
McCoy, A. J. Solving structures of protein complexes by molecular replacement with Phaser. Acta Crystallogr. D. Biol. Crystallogr. 63, 32–41 (2007).
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D. Biol. Crystallogr. 53, 240–255 (1997).
Afonine, P. V. et al. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. D. Biol. Crystallogr. 68, 352–367 (2012).
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D. Biol. Crystallogr. 60, 2126–2132 (2004).
Hiblot, J., Gotthard, G., Chabriere, E. & Elias, M. Characterisation of the organophosphate hydrolase catalytic activity of SsoPox. Sci. Rep. 2, 779 (2012).
We thank Melina Shamshoum, Lior Artzi, and Ed Bayer for help in establishing xylanase activity screens in our laboratory, and Nir London, Dan Tawfik, and members of the Fleishman lab for critical reading. The research was supported by a Starting Grant from the European Research Council (335439), the Israel Science Foundation through its Center of Excellence in Structural Cell Biology (1775/12) and its joint India-Israel Research Program (2281/15), and by a charitable donation from Sam Switzer and family.
The authors declare no competing interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Lapidoth, G., Khersonsky, O., Lipsh, R. et al. Highly active enzymes by automated combinatorial backbone assembly and sequence design. Nat Commun 9, 2780 (2018). https://doi.org/10.1038/s41467-018-05205-5