Structural biology

A toolbox for protein design

Some of the principles underlying how amino-acid sequences determine the three-dimensional structures of proteins have been defined. This has enabled a successful approach to designing protein folds from scratch. See Article p.222

Proteins are the molecular machines of the cell. In order to function, they have to fold into a defined three-dimensional (tertiary) conformation known as the native structure, which is encoded by the amino-acid sequence of the protein chain. It would be immensely useful to be able to predict native structures from amino-acid sequences, but our understanding of how such sequences determine the three-dimensional arrangements of proteins is limited. On page 222 of this issue, Koga et al.1 describe a set of rules that relate secondary protein-structure patterns — α-helices and β-sheets — to tertiary features. They also show how these principles can be used to design amino-acid sequences that fold into predefined topologies.

The folded state of a protein is stabilized by many different non-covalent interactions. These interactions form as the protein folds into its native conformation, which, under physiological conditions, corresponds to the thermodynamically most stable, lowest-energy conformation. When a protein's energy landscape — a map of every possible protein conformation and the associated energy levels, plotted as a two- or three-dimensional representation — is considered, the folding path can be envisaged as a funnel in which the formation of local stabilizing interactions contributes to a decrease in the protein's overall energy and so leads to the native protein conformation2.

However, if non-covalent interactions form that are energetically favourable but do not exist in the native conformation, non-native states will accumulate. Proteins therefore have to select the biologically relevant native structure over the large number of possible non-native conformations. If non-native states can be 'discouraged' from forming, the folding path of the protein chain improves, because the chain is then less likely to become trapped in energy minima associated with non-native states.
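The trapping described above can be illustrated with a deliberately simple one-dimensional sketch. The two energy functions, the grid step and the starting point are all hypothetical choices made for illustration; they are not taken from Koga et al. and bear no relation to a real force field. A downhill-only walker reaches the single minimum of the smooth funnel, but stops in a nearby local minimum when a ripple is added to the same funnel.

```python
import math


def smooth_funnel(x):
    """Toy energy with a single minimum at x = 0 (stand-in for the native state)."""
    return x * x


def rugged_funnel(x):
    """Same funnel plus a ripple that adds non-native local minima (hypothetical)."""
    return x * x - 3.0 * math.cos(4.0 * x)


def greedy_descent(energy, start_index, step=0.01):
    """Walk downhill on a grid of spacing `step` until no neighbour is lower.

    Returns the x position at which the walker stops.
    """
    i = start_index
    while True:
        # Pick the lower-energy of the two neighbouring grid points.
        best = min((i - 1, i + 1), key=lambda j: energy(j * step))
        if energy(best * step) >= energy(i * step):
            return i * step  # no downhill move left: a (local) minimum
        i = best


if __name__ == "__main__":
    start = 250  # grid index, i.e. x = 2.5
    print("smooth funnel stops at x =", greedy_descent(smooth_funnel, start))
    print("rugged funnel stops at x =", greedy_descent(rugged_funnel, start))
```

Real folding is stochastic and high-dimensional, so a walker of this kind is only a cartoon of the idea that removing non-native minima makes the native state easier to reach.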

Naturally evolved proteins have 'smooth' folding funnels that lack non-native-state energy minima, and so Koga et al. set out to find the principles that help these proteins to differentiate between native and non-native conformations. They focused on local structural patterns that strongly favour the formation of single tertiary motifs — compact tertiary structures that consist of a few adjacent secondary structures.

By combining computer simulations with an analysis of test sets of naturally occurring protein sequences, the authors established three fundamental rules that describe the relationships between local interactions and tertiary-motif formation in the ββ-'hairpin' structure and the αβ- and βα-motifs. They found that the orientation of the secondary structural elements with respect to each other primarily depends on the number of amino acids in the peptide loop that connects them, as well as on the direction in which the amino-acid side chains of the β-strands are pointing. Notably, the positioning of the secondary elements is independent of the specific amino-acid residues in the loop's sequence.

On the basis of this set of fundamental rules, Koga et al. identified a second, emergent set of rules describing the lengths of secondary structural elements, and of the peptide loops connecting them, needed for three larger tertiary motifs: ββα, αββ and βαβ. Together with the widely used Ramachandran plots3 (which allow the possible conformations of amino-acid residues in proteins to be determined), these rules make up a toolbox of fundamental principles for designing tertiary structures.
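A Ramachandran plot is built from two backbone torsion angles per residue: φ, defined by the atoms C(i−1), N, Cα and C, and ψ, defined by N, Cα, C and N(i+1). As a minimal sketch of how such an angle is obtained from four atomic positions, the function below computes a signed dihedral angle in plain Python. The function name and the test coordinates are illustrative assumptions, not part of the authors' protocol.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four hypothetical points with the outer bonds at right angles.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the (φ, ψ) pairs of every residue against each other gives the familiar map of allowed and disallowed backbone conformations.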

Koga and colleagues went on to implement these principles, with remarkable results. They designed five different folds with similar topologies, each consisting of several α-helices, β-strands and minimal connecting loops. For each topology, the authors calculated the thermodynamic stability of possible sequences and conformations to find the ones that had the lowest energies. To refine these sequences, they incorporated large, hydrophobic amino-acid residues in core regions of the folds to generate a strong driving force for folding. They also used negative design (the incorporation of sequences that destabilize the formation of unwanted structures) at the edges of β-strands and on protein surfaces to disfavour oligomerization.
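The selection step can be pictured with a toy scoring scheme. Everything in the sketch below is an assumption made for illustration: the simplified grouping of hydrophobic residues, the core/surface labels, the one-point scoring and the candidate sequences are hypothetical, and the function is in no sense the energy function the authors used. It rewards hydrophobic residues at core positions, penalizes them at surface positions (a crude stand-in for negative design), and returns the lowest-scoring candidate.

```python
# Simplified, hypothetical grouping of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVILMFWY")


def toy_energy(sequence, burial):
    """Score a sequence against a burial pattern; lower is better.

    `burial` has one character per residue: 'C' for core, 'S' for surface.
    """
    if len(sequence) != len(burial):
        raise ValueError("sequence and burial pattern must have equal length")
    energy = 0
    for residue, site in zip(sequence, burial):
        hydrophobic = residue in HYDROPHOBIC
        if site == "C":
            # Reward hydrophobic residues buried in the core.
            energy += -1 if hydrophobic else 1
        else:
            # Penalize exposed hydrophobic residues, which could promote
            # unwanted association between chains.
            energy += 1 if hydrophobic else -1
    return energy


def pick_lowest_energy(candidates, burial):
    """Return the candidate sequence with the lowest toy energy."""
    return min(candidates, key=lambda seq: toy_energy(seq, burial))


if __name__ == "__main__":
    burial = "CSCSCS"  # hypothetical alternating core/surface pattern
    candidates = ["VKIEFD", "KVEIDF", "VVIIFF"]  # hypothetical sequences
    for seq in candidates:
        print(seq, toy_energy(seq, burial))
    print("selected:", pick_lowest_energy(candidates, burial))
```

A realistic calculation would evaluate full three-dimensional models with a physically motivated energy function; the sketch only shows the shape of the search, which is to score many candidates and keep the minimum.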

The authors went on to synthesize the DNA that encodes several of their designed peptide sequences, to see whether the peptides could be expressed in cells and, if so, whether they folded into the predicted tertiary structures. They found that many of the peptide sequences could be expressed, and that the peptides displayed spectroscopic features typical of proteins formed from mixed α- and β-subunits. Of these sequences, many were highly stable even when exposed to heat. Some were monomeric and were amenable to analysis by nuclear magnetic resonance spectroscopy, allowing Koga et al. to determine the structure in solution of a designed sequence for each of the five fold types. The experimentally determined structures agreed well with the computational design models. The success rate of the authors' design strategy, defined as the percentage of experimentally tested proteins for each fold that exhibited a full set of desirable characteristics (see Table 1 of the paper1), was extraordinarily high, at 8–40%.

Researchers from the same laboratory as Koga et al. had previously reported4 the impressive preparation of Top7 — a 93-residue α/β-protein designed to adopt a tertiary structure not found in nature, and which incidentally satisfies the newly established design principles1. But the success of Koga and colleagues' protocol, and its incorporation of negative design, is a big step forward compared with that earlier work. With lengths of 80–100 amino acids, the authors' protein structures are comparable in size to small protein domains that act as building blocks of larger, more complex proteins. The design of custom protein scaffolds that perform new functions is now conceivable, as is the assembly of bigger designed domains and quaternary complexes.

Like machines, proteins can be assembled from smaller parts — such as secondary structural elements and single tertiary motifs — if those parts are connected by the right joints to form the 'chassis' for a specific function (Fig. 1). New enzymatic functions, such as the ability to catalyse a simple proton-transfer reaction5,6, have already been introduced into existing protein scaffolds in this way. Nevertheless, it is still tricky to design proteins that catalyse more complex reactions7, or which bind to specific ligand molecules8. The use of customized protein scaffolds might offer advantages for such efforts, but 'ideal' structures containing minimal peptide loops, such as those found in Koga and colleagues' sequences, will not, in many cases, provide enough space for catalytic (or ligand-binding) sites. Such sites in naturally occurring proteins are often surrounded by longer loops.

Figure 1: Protein assembly demystified.

Proteins are constructed from secondary structures known as α-helices and β-strands, connected by protein loops. Koga et al.1 have defined fundamental rules that describe how local interactions in secondary structures relate to the assembly of simple tertiary motifs (compact, three-dimensional structures that consist of a few adjacent secondary structures, such as the βα- and αβ-motifs shown). In this example, different connecting loops direct the α-helix to pack against different sides of the β-strand. These rules, in turn, form the basis of emergent principles governing the design of 'ideal', more complex motifs, such as the βαβ-structure shown, which is constructed from the βα- and αβ-motifs shown in grey boxes.

So the next challenge for de novo enzyme design is to find ways of mitigating the potentially destabilizing effect of non-ideal loops, to allow the introduction of catalytic (and often charged) residues. A look at natural proteins illustrates how compromises can be struck. For example, the common TIM-barrel fold consists of eight α-helices and eight β-strands alternating along the peptide backbone, and always harbours its catalytic residues in long loops found on one side of the barrel; the opposite side has short loops, and helps to stabilize the fold9. So perhaps a partial implementation of Koga and colleagues' fundamental rules — one that allows longer loops and/or naturally observed motifs to be copied in some places — will lead to further improvements in enzyme design. Other applications of the rules, such as predicting protein structures, might be just as interesting.


  1. Koga, N. et al. Nature 491, 222–227 (2012).
  2. Leopold, P. E., Montal, M. & Onuchic, J. N. Proc. Natl Acad. Sci. USA 89, 8721–8725 (1992).
  3. Ramachandran, G. N., Ramakrishnan, C. & Sasisekharan, V. J. Mol. Biol. 7, 95–99 (1963).
  4. Kuhlman, B. et al. Science 302, 1364–1368 (2003).
  5. Röthlisberger, D. et al. Nature 453, 190–195 (2008).
  6. Korendovych, I. V. et al. Proc. Natl Acad. Sci. USA 108, 6823–6827 (2011).
  7. Baker, D. Protein Sci. 19, 1817–1819 (2010).
  8. Schreier, B., Stumpp, C., Wiesner, S. & Höcker, B. Proc. Natl Acad. Sci. USA 106, 18491–18496 (2009).
  9. Sterner, R. & Höcker, B. Chem. Rev. 105, 4038–4055 (2005).

Author information

Correspondence to Birte Höcker.

About this article

Cite this article

Höcker, B. A toolbox for protein design. Nature 491, 204–205 (2012) doi:10.1038/491204a
