Article

Protein building blocks preserved by recombination

Nature Structural Biology volume 9, pages 553–558 (2002)



Borrowing concepts from the schema theory of genetic algorithms, we have developed a computational algorithm to identify the fragments of proteins, or schemas, that can be recombined without disturbing the integrity of the three-dimensional structure. When recombination leaves these schemas undisturbed, the hybrid proteins are more likely to be folded and functional. Crossovers found by screening libraries of several randomly shuffled proteins for functional hybrids strongly correlate with those predicted by this approach. Experimental results from the construction of hybrids of two β-lactamases that share 40% amino acid identity demonstrate a threshold in the amount of schema disruption that the hybrid protein can tolerate. To the extent that introns function to promote recombination within proteins, natural selection would serve to bias their locations to schema boundaries.
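The abstract does not give the scoring function, so the following is a hedged illustration only. It assumes one common reading of "schema disruption": take aligned parent sequences and a residue contact map from a parent structure. A contact in a hybrid counts as disrupted when the pair of residues it joins does not occur at those two positions in any parent. Several pieces are assumptions and are not the authors' code: the function names `contacts_from_coords`, `build_hybrid` and `schema_disruption`, the simplified one-point-per-residue contact rule, the 4.5 Å default cutoff, and the toy sequences.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple

Contact = Tuple[int, int]
Point = Tuple[float, float, float]


def contacts_from_coords(coords: Sequence[Point], cutoff: float = 4.5) -> Set[Contact]:
    """Hypothetical contact map: residue pairs (i < j) whose single
    representative points lie within `cutoff` angstroms (a simplification;
    the cutoff value is an assumption)."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def build_hybrid(parents: Sequence[str], crossovers: Sequence[int], choice: Sequence[int]) -> str:
    """Assemble a hybrid from aligned, equal-length parent sequences.
    `crossovers` are the start indices of blocks 2..n; `choice[b]` is the
    index of the parent that donates block b."""
    bounds = [0, *crossovers, len(parents[0])]
    if len(choice) != len(bounds) - 1:
        raise ValueError("need one parent choice per block")
    return "".join(parents[k][bounds[b]:bounds[b + 1]] for b, k in enumerate(choice))


def schema_disruption(hybrid: str, parents: Sequence[str], contacts: Set[Contact]) -> int:
    """Count contacts whose residue pair in the hybrid is not found at the
    same two positions in any parent (a 'broken' interaction)."""
    broken = 0
    for i, j in contacts:
        pair = (hybrid[i], hybrid[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


if __name__ == "__main__":
    parents = ["MKVLAT", "MRVIAS"]        # toy aligned parents (hypothetical)
    contacts = {(0, 5), (1, 3), (2, 4)}   # toy contact map (hypothetical)
    hybrid = build_hybrid(parents, [3], [0, 1])
    print(hybrid, schema_disruption(hybrid, parents, contacts))  # MKVIAS 1
```

In the toy case, the crossover after position 3 breaks only the (1, 3) contact. The hybrid places K and I together there, and no parent has that pair at those positions. Under this reading, a parent sequence always scores zero. Candidate crossover sets could then be ranked by how many contacts they disrupt, which mirrors the abstract's idea of a tolerable threshold without claiming its value.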


Change history

  • 10 June 2002

    Updated PDF, image updated


    Note: A mistake was introduced during the production of this paper. In the AOP version, footnote 6 of Table 1 was mistakenly placed after the MIC value of hybrid 2A; its correct position is after the MIC value of hybrid 1A (2,560). This mistake has been corrected in the HTML version and will appear correctly in print. The PDF version available online has been amended.




C.A.V. is supported by a National Science Foundation graduate research fellowship and the California Institute of Technology Initiative in Computational Molecular Biology, a Burroughs Wellcome-funded program for science at the interface. Z.G.W. acknowledges support from the W.M. Keck Foundation. S.L.M. is supported by the Howard Hughes Medical Institute, the Ralph M. Parsons Foundation and an IBM Shared University Research Grant. The PSE-4 gene and the PMON vector were provided by R.C. Levesque (Université Laval, Québec, Canada).

Author information


  1. Biochemistry and Molecular Biophysics, California Institute of Technology, mail code 210-41, Pasadena, California 91125, USA

    • Christopher A. Voigt
  2. Division of Chemistry and Chemical Engineering, California Institute of Technology, mail code 210-41, Pasadena, California 91125, USA

    • Carlos Martinez
    • Zhen-Gang Wang
    • Frances H. Arnold
  3. Howard Hughes Medical Institute and Division of Biology, California Institute of Technology, mail code 147-75, Pasadena, California 91125, USA

    • Stephen L. Mayo



Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Zhen-Gang Wang, Stephen L. Mayo or Frances H. Arnold.
