Nature Biotechnology 24, 328 - 330 (2006)
doi:10.1038/nbt0306-328
Fancy footwork in the sequence space shuffle
Frances H Arnold
Frances H. Arnold is in the Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, California 91125, USA. frances@cheme.caltech.edu
Recent reports on directed evolution broaden the scope of evolutionary enzyme engineering.
If we were able to explore all possible proteins, we would likely find fantastical molecules that might solve any number of human problems. The riches of sequence space include cures for cancer and solutions to the energy crisis, along with countless other valuable molecules. Unfortunately, such a comprehensive exploration is not even remotely possible, as sequence space is Vastly (for "Very-much-more-than-astronomically") large1. Just a single copy of each 300-amino acid sequence, for example, would fill dozens of universes. That nature has explored only the tiniest fraction of this space over the entire history of life on Earth should leave protein engineers excited about their own opportunities for discovery.
Yet excitement over the untold riches of sequence space must be tempered by the recognition that the great majority of those sequences don't code for anything interesting; most don't even fold. Estimates for the density of functional proteins in sequence space range anywhere from 1 in 10^12 to 1 in 10^77. No matter how you slice it, proteins are rare. Useful ones are even rarer. This might lead one to believe that discovering new proteins by mutation and selection is highly unlikely and to discount evolution as an algorithm for discovery. So how do laboratory evolutionists discover new proteins on the timescale of a PhD thesis or, worse, a commercial deadline? They do it by taking the right kinds of steps and starting from the right places.
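To put rough numbers on that scale, the short Python sketch below counts the 300-residue sequences over the 20 standard amino acids and applies the two density estimates quoted above (read here as 1 in 10^12 and 1 in 10^77). Applying those densities uniformly to 300-residue chains is an assumption made only for illustration, and the names `total_sequences` and `log10_functional` are hypothetical.

```python
import math

AMINO_ACIDS = 20   # standard amino acid alphabet
LENGTH = 300       # chain length used as the example in the text

# Exact count of distinct 300-residue sequences (Python integers are unbounded).
total_sequences = AMINO_ACIDS ** LENGTH
log10_total = LENGTH * math.log10(AMINO_ACIDS)


def log10_functional(log10_rarity: float) -> float:
    """log10 of the expected number of functional sequences, assuming that
    1 in 10**log10_rarity sequences is functional (illustrative assumption)."""
    return log10_total - log10_rarity


print(f"20^300 has {len(str(total_sequences))} decimal digits")
for rarity in (12, 77):
    print(f"density 1 in 10^{rarity}: about 10^{log10_functional(rarity):.0f} functional sequences")
```

Even at the most pessimistic density, the number of functional sequences under these assumptions is itself enormous; the difficulty is that they are hidden among vastly more non-functional ones.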
Three recent reports describe new twists on this theme. Writing in the Journal of the American Chemical Society, Qian and Lutz2 show how circular permutation might complement other, more tried-and-true steps in the search for better enzymes. In a report in Nature Genetics, Peisajovich et al.3 provide a laboratory demonstration of how this permutation step might happen in nature. Finally, writing in Science, Park et al.4 demonstrate how some 'rational design,' with inspiration from studying evolutionarily related proteins, can help find a good place to do the sequence space shuffle.
Evolution works because functional proteins are not evenly distributed in sequence space. Functional proteins are surrounded by other functional proteins that share the same overall structure. Even though most random amino acid substitutions are deleterious, many are not. Sometimes, a single substitution can improve a protein; accumulating such beneficial mutations over iterative rounds of mutagenesis and selection is an effective evolutionary strategy. Random mutation is only one search mechanism that explores sequence space efficiently. Recombination also accesses functional proteins with high probability and can make much larger jumps in sequence space than random mutation5. Laboratory evolutionists have also used less-natural search operations: saturation mutagenesis and random mutagenesis targeted to key portions of a protein (for example, the active site) are widely believed to provide advantages over more random approaches, especially when detailed structural information is available.
Qian and Lutz demonstrate that circularization and random opening should be included on a list of preferred search steps by making permutations on a lipase from Candida antarctica that is widely used in chemical synthesis. Postulating that the enzyme might hydrolyze bulkier substrates more efficiently if it had greater flexibility in its active site, these authors set out to determine whether opening up new C- and N-termini might provide that flexibility, especially if the new ends appeared near the active site. Working at the gene level, they connected the termini with a flexible linker, circularized the construct and proceeded to make random cuts. Screening for the genes that produced lipase activity in Pichia pastoris yielded functional permutants. Not only did they identify 63 new ways to start and stop the C. antarctica lipase, they also found some variants with significantly higher (10–60 fold) kcat values on lipase substrates (p-nitrophenol butyrate and 6,8-difluoro-4-methylumbelliferyl octanoate).
The net result of permutation is (presumably) a protein of the same overall structure, and with most of the amino acids in the same places in the structure. However, the sequences and topologies might be completely different if the polypeptide chain starts and ends at very different positions in the structure (Fig. 1). What role this strategy plays in the evolution of new functional proteins still remains to be determined, but it could wreak havoc for patent attorneys!
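As a toy illustration of the operation, the Python sketch below treats a protein as a string, joins its original termini with a linker and reopens the chain at a chosen position. The sequence, the `GGS` linker and the function names are placeholders rather than details of the lipase experiment, which was carried out at the gene level.

```python
from collections import Counter


def circular_permute(seq: str, cut: int, linker: str = "GGS") -> str:
    """Join the old C- and N-termini with a linker and open the ring
    so that residue `cut` (0-based) becomes the new N-terminus."""
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall strictly inside the sequence")
    return seq[cut:] + linker + seq[:cut]


def all_permutants(seq: str, linker: str = "GGS") -> dict[int, str]:
    """Every possible reopening of the circularized chain, keyed by cut site."""
    return {cut: circular_permute(seq, cut, linker) for cut in range(1, len(seq))}


parent = "ABCDEFGH"  # toy sequence; each letter stands for one residue
permutant = circular_permute(parent, cut=3)
print(permutant)

# Same residues (plus the linker), different linear order.
assert Counter(permutant) == Counter(parent + "GGS")
print(len(all_permutants(parent)), "possible cut sites for this toy chain")
```

The composition is unchanged, but residues that were far apart in the parent sequence can become neighbors in the permutant, which is why the linear sequences can look unrelated even when the fold is preserved.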
Various natural protein families bear the marks of having undergone permutation, leading to rearrangement of functional modules and diversification of their topologies. The circularization strategy used by Qian and Lutz to obtain their permutants is not likely to happen in vivo. However, Peisajovich et al. replicated what is considered the most likely natural equivalent of circular permutation: gene duplication and in-frame fusion followed by degradation from the 5' and 3' ends to generate new N- and C-termini. They tested this set of steps on a gene for a DNA methyltransferase (M.HaeIII) and demonstrated that not only could this mechanism produce active permuted methyltransferases, it could do so through a series of functional intermediates in which only the N- or the C-terminus was degraded. These intermediates, which contained some wholly or partially duplicated modules, folded and functioned, albeit at a reduced level compared with the unmodified protein. Because all of the steps used for this laboratory demonstration have natural counterparts, it is likely that similar events can and do occur in nature. In fact, the existence of a new class of methyltransferases predicted on the basis of the laboratory results was validated through searching sequence databases.
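The duplication-and-degradation route can be sketched in the same toy terms. The Python example below works on a string of residues rather than on a gene, and the helper names are hypothetical; it only shows how trimming a tandem duplicate from one end gives an intermediate with a partially duplicated module, while trimming from both ends gives a full circular permutant.

```python
def tandem_duplicate(seq: str) -> str:
    """In-frame fusion of two copies of the same sequence."""
    return seq + seq


def truncate(chain: str, n_term: int, c_term: int) -> str:
    """Remove n_term residues from the start and c_term residues from the end."""
    return chain[n_term: len(chain) - c_term]


parent = "ABCDEFGH"               # toy sequence; each letter is one residue
tandem = tandem_duplicate(parent)
start = 3                         # residue that will become the new N-terminus

# Intermediates: only one end degraded, so part of the protein is duplicated.
n_only = truncate(tandem, start, 0)
c_only = truncate(tandem, 0, len(parent) - start)

# Both ends degraded: a chain of the original length in permuted order.
permutant = truncate(tandem, start, len(parent) - start)

print(n_only, c_only, permutant)
```

In this picture each intermediate still contains a complete copy of the parent sequence, which is consistent with the observation that the intermediates folded and functioned.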
Whereas Qian and Lutz found 63 unique functional solutions to permuting the lipase, Peisajovich et al. found far fewer functional permutants of M.HaeIII. This is despite the fact that methyltransferases have clearly undergone such gene rearrangements in the past, and there is no evidence that lipases have. Clearly, guidelines for which proteins are likely to accept such an operation, and especially which ones are likely to benefit from it by developing new or improved function6,7, must still be determined.
Understanding how functional proteins are distributed in sequence space is fundamental to the success of directed evolution. Another key factor, often ignored, however, is the starting point. Yes, the protein scraped from the bottom of your shoe or collected from your refrigerator is one of those rare sequences that encodes a functional protein. But it is not necessarily a good starting point for obtaining the protein of your dreams. Natural evolution can take twists and turns that the graduate student or industrial biotechnologist does not have the luxury of taking. Thus it makes sense to use all the shortcuts you can to breed new molecules. A champion racehorse is more likely to be born of fleet parents, or at least ones with the requisite physiognomy. A new functional protein is likewise more likely to appear when the laboratory evolutionist makes a discriminating choice of parent(s), thereby starting his search in a promising ballpark.
Recent indications are that good starting points might be accessible by rational design, using powerful computational approaches8 or inspiration derived from a related protein, particularly if that relative already exhibits the targeted function4. Mixing rational design with a little randomization, Park et al.4 converted glyoxalase II, which hydrolyzes the thioester bond of S-D-lactoylglutathione, into a metallo-β-lactamase, which catalyzes a similar hydrolysis reaction, but on cefotaxime, a very different substrate. The two naturally occurring enzymes that hydrolyze these substrates share the same overall fold and ancestry, but exhibit low sequence identity. Rationally introduced changes to glyoxalase II included altering the metal binding site to accommodate Zn and duplicate the coordination pattern observed in the β-lactamase. In addition, the C-terminal glutathione-binding domain was removed and new loop regions based on metallo-β-lactamase family templates were grafted on, each containing variable amino acids.
Bacteria transformed with a diverse gene soup of all these changes and seasoned with a sprinkling of random mutations were selected using cefotaxime. Positives were further evolved with multiple rounds of DNA shuffling and selection until the evolved enzyme conferred resistance to 4 µg/ml of the antibiotic. The resulting protein displays only 59% identity to glyoxalase II and contains mutations throughout. Although competent enough to confer antibiotic resistance at a low level, the evolved enzyme is significantly less active on cefotaxime than its role model. And, unlike natural β-lactamases, it does not hydrolyze the other β-lactam antibiotics tested.
One way to view the work of Park et al. is as a beautiful demonstration of how evolution can rescue a less-than-perfect (but good-as-you-can-get) rational design. The other way to look at it is that rational design narrowed the vast sequence space down to the infinitesimally small (compared to sequence space) ballpark actually searchable in a real experiment. What remains to be seen, however, is whether the ballparks targeted by human design4,8 also contain enzymes as good as the ones that nature makes, in addition to the relatively mediocre versions discovered so far. In other words, are all mediocre enzymes surrounded by good ones? Answering this will require more exploration.
REFERENCES
1. Dennett, D.C. Darwin's Dangerous Idea: Evolution and the Meanings of Life (Simon & Schuster Inc., New York, NY; 1995) p. 109.
2. Qian, Z. & Lutz, S. J. Am. Chem. Soc. 127, 13466–13467 (2005).
3. Peisajovich, S.G., Rockah, L. & Tawfik, D.S. Nat. Genet. 38, 168–174 (2006).
4. Park, H.-S. et al. Science 311, 535–538 (2006).
5. Drummond, D.A. et al. Proc. Natl. Acad. Sci. USA 102, 5380–5385 (2005).
6. Baird, G.S., Zacharias, D.A. & Tsien, R.Y. Proc. Natl. Acad. Sci. USA 96, 11241–11246 (1999).
7. Guntas, G., Mansell, T.J., Kim, J.R. & Ostermeier, M. Proc. Natl. Acad. Sci. USA 102, 11224–11229 (2005).
8. Dwyer, M.A., Looger, L.L. & Hellinga, H.W. Science 304, 1967–1971 (2004).