Principles for designing ideal protein structures


Unlike random heteropolymers, natural proteins fold into unique ordered structures. Understanding how these are encoded in amino-acid sequences is complicated by energetically unfavourable non-ideal features—for example kinked α-helices, bulged β-strands, strained loops and buried polar groups—that arise in proteins from evolutionary selection for biological function or from neutral drift. Here we describe an approach to designing ideal protein structures stabilized by completely consistent local and non-local interactions. The approach is based on a set of rules relating secondary structure patterns to protein tertiary motifs, which make possible the design of funnel-shaped protein folding energy landscapes leading into the target folded state. Guided by these rules, we designed sequences predicted to fold into ideal protein structures consisting of α-helices, β-strands and minimal loops. Designs for five different topologies were found to be monomeric and very stable and to adopt structures in solution nearly identical to the computational models. These results illuminate how the folding funnels of natural proteins arise and provide the foundation for engineering a new generation of functional proteins free from natural evolution.


Figure 1: Fundamental rules.
Figure 2: Derivation of secondary structure lengths from the rules for five protein topologies.
Figure 3: Characterization of design for each of the five folds.
Figure 4: Comparison of computational models with experimentally determined structures.

Accession codes

Data deposits

The NMR structures of the five designs have been deposited in the RCSB Protein Data Bank under the accession numbers 2KL8 (Di-I_5), 2LV8 (Di-II_10), 2LN3 (Di-III_14), 2LVB (Di-IV_5) and 2LTA (Di-V_7). NMR data have been deposited in the Biological Magnetic Resonance Data Bank under the accession numbers 16387 (Di-I_5), 18558 (Di-II_10), 18145 (Di-III_14), 18561 (Di-IV_5) and 18465 (Di-V_7).




We thank N. Grishin for suggesting target folds for design, P. Rajagopal for one-dimensional NMR measurements of Folds-I and -II, and J. Siegel for measurements by mass spectrometer. We also thank P.-S. Huang and Y.-E. A. Ban for computational tools; J. L. Gallaher for experimental assistance; J. Castellanos for the help with designing Fold-IV; H.-W. Lee, K. Pederson and J. Prestegard for measurements of residual dipolar couplings; and S. Khare, F. DiMaio, I. Andre, S. Fleishman, J. Mills, S. Takada, S. Fuchigami and G. Chikenji for comments on the manuscript. This work was supported by HHMI, DOE, DARPA, DTRA and the National Institutes of General Medical Science Protein Structure Initiative (PSI:Biology) programme, grant U54 GM094597. N.K. was also supported by Japan Society for the Promotion of Science (JSPS) Postdoctoral Fellowships for Research Abroad.

Author information

N.K., R.T.-K., G.L., G.T.M. and D.B. designed the research. N.K. performed folding simulations and analysed natural proteins. N.K. wrote program code. N.K. and R.T.-K. performed computational design work: Di-I_5 and Di-IV_5 were designed by N.K., and Di-II_10, Di-III_14 and Di-V_7 were designed by R.T.-K. R.T.-K. expressed, purified and characterized the designed proteins by biochemical assay. R.X. and T.B.A. prepared isotope-enriched protein samples for NMR structure determination. G.L. collected NMR data and determined the solution NMR structures. N.K., R.T.-K., G.L., G.T.M. and D.B. wrote the manuscript.

Correspondence to Gaetano T. Montelione or David Baker.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Figures 1-14, Supplementary Tables 1-8, Supplementary Discussions 1-2, Supplementary Methods 1-5 and Supplementary references (see contents for further details). (PDF 4484 kb)

Supplementary Data

This file contains Supplementary Data for Rosetta command lines to perform the design protocol. (ZIP 6 kb)

Cite this article

Koga, N., Tatsumi-Koga, R., Liu, G. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
