Introduction
Most proteins are required to adopt a specific three-dimensional structure to be biologically active. How they achieve this has been the subject of immense scientific interest spanning the decades since the first structures of proteins were elucidated1. The conformational space accessible to the polypeptide chain is astronomically large, yet proteins fold on a biologically relevant timescale, with some obtaining their native structure in vitro in just microseconds2. How rapid folding is achieved has been rationalized by a number of concepts: the presence of nonrandom interactions in the initial denatured state that limit the conformational space available at the start of the folding reaction3; folding via intermediates that mark the way to the native structure4; and the realization that proteins fold on funneled energy landscapes5 that describe folding as the inevitable consequence of the requirement to lower the free energy (increase stability) as more native contacts form6 (Fig. 1). In this landscape view of folding, the denatured state of the protein populates a large ensemble of structures. The polypeptide chain may then fold by numerous pathways, potentially adopting multiple partially folded ensembles en route to the native state6.
Figure 1: Schematic representation of folding funnels.
Example of a smooth energy landscape, through which the polypeptide chain is effectively funneled to the native structure (a), and a more rugged landscape, through which the polypeptide chain has to navigate, possibly via one or more populated intermediates, to the native state (b). In both examples, the denatured state occupies a broad ensemble of structures containing elements of both native and non-native interactions.
For a protein that folds via a two-state transition (with a mechanism in which only the denatured and native states are populated) the energy landscape is relatively smooth. Such a landscape lacks deep valleys and high barriers and effectively funnels the polypeptide chain to its native state (Fig. 1a). Such an ideal folding scenario is rare4, and many proteins fold on rough, rugged landscapes. Using new methods that can detect sparsely populated and/or transient non-native species (Table 1), even small, simple proteins have now been shown to fold through one or more partially folded states4, 5. In general, folding energy landscapes are rugged entities that are suboptimal for folding (Fig. 1b) through which the polypeptide chain has to navigate to the native state5. Landscape ruggedness arises as the consequence of the simple fact that native protein structures are stabilized by thousands of mutually supportive, weak interactions that cannot all be satisfied simultaneously during folding. As a result, energetic minimization of individual interactions can be conflicting, leading to 'frustration' in the energy landscape5, 7. This ruggedness may be attributed to opposing evolutionary pressures on protein sequences to enable them to fold reliably but also to avoid aggregation and to carry out specific biological functions5, 7.
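As an illustration of what a two-state transition implies quantitatively, the following minimal sketch computes the fraction of native molecules as a function of denaturant concentration. It assumes the widely used linear extrapolation model, in which the unfolding free energy falls linearly with denaturant concentration; that model, the function names and the default parameter values (dG_water, m_value) are illustrative assumptions and are not taken from the studies discussed here.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def fraction_native(denaturant_M, dG_water=20.0, m_value=5.0, temp_K=298.15):
    """Fraction of molecules that are native in a two-state equilibrium.

    Assumption (not from the article): linear extrapolation model,
    dG_unfold = dG_water - m_value * [denaturant], in kJ/mol.
    The default parameter values are purely illustrative.
    """
    dG_unfold = dG_water - m_value * denaturant_M
    # Equilibrium constant for unfolding, K = [denatured] / [native]
    K_unfold = math.exp(-dG_unfold / (R * temp_K))
    return 1.0 / (1.0 + K_unfold)


def midpoint(dG_water=20.0, m_value=5.0):
    """Denaturant concentration at which native and denatured states are equally populated."""
    return dG_water / m_value


if __name__ == "__main__":
    for conc in (0.0, 2.0, 4.0, 6.0, 8.0):
        print(f"{conc:.1f} M denaturant: fraction native = {fraction_native(conc):.4f}")
```

Because only two states are populated in this model, a single sigmoidal transition describes every probe of structure; deviation from such behavior is one indication that additional species are populated.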
Landscape theory predicts an additional folding scenario in which the native structure is attained without encountering any substantial energy barriers, so-called 'downhill folding'6. In principle, at least, proteins that fold in a downhill manner open the door to characterization of the folding landscape in immense detail via the myriad of non-native conformations that are accessible experimentally for such a folding mechanism. Barrierless folding is difficult to demonstrate unequivocally by experiment, as proteins that fold in this manner are expected to obtain their native structure with rates close to the folding 'speed limit'2. In addition, the experimental hallmarks of this type of folding are difficult to define8, 9, 10. Nonetheless, downhill folding has been suggested (with much debate11, 12) for a number of model proteins13, 14, 15. Single-molecule experiments have been proposed as a means to differentiate downhill and two-state folding10, 16, although this is not straightforward even with such a powerful approach17. A more detailed discussion of fast protein folding and downhill folding scenarios can be found in ref. 18.
Characterization of all the non-native species (unfolded states, transition states and partially folded intermediates) encountered by proteins that fold in a barrier-limited manner is essential if we are to realize our quest to understand how proteins fold in all-atom detail. Substantial advances toward this goal have been realized for a handful of small proteins19, 20, 21, 22, 23, 24, 25. This has been enabled by the development of experimental approaches with faster timescales of measurement26 and enhanced sensitivity (Table 1 and references therein), together with improvements in computing power and new theoretical tools6, 27. Today, the arsenal of biophysical methods available to the experimentalist allows transitions from picosecond to second (or longer) timescales to be monitored and species populated to as little as 0.5% to be identified and structurally assessed28. In principle, single-molecule techniques offer the potential to map folding events one molecule at a time. Using this approach, rare species can be detected and characterized that may be hidden by the averaging inherent within ensemble experiments17. This approach also enables the measurement of intramolecular diffusion coefficients in denatured and partially folded states, providing detailed insights into the nature of the polypeptide chain at different stages of folding29, 30. Perhaps most importantly, these experiments can link models based on chemical kinetics commonly used in protein folding with the more physical description of folding in terms of quantitative free-energy surfaces17.
A detailed review of the insights that have revolutionized our understanding of protein-folding mechanisms and their impact on biology is beyond the scope of this short article. Here we focus on three areas that have seen major advances in recent years: (i) the structural diversity and properties of non-native states, (ii) current knowledge about folding pathways and the extent to which protein sequences are optimized for folding efficiency, and (iii) new approaches that are beginning to allow us to take the knowledge gained from in vitro studies toward a molecular description of folding in the cell. In each area we highlight a selection of recent studies showing how different experimental approaches have been used to elucidate new details of protein-folding mechanisms.
The structural properties and diversity of non-native species
A major challenge in the structural, kinetic and thermodynamic characterization of folding landscapes is the transient and heterogeneous nature of non-native species. Defining the structural properties of non-native states, and determining how they interconvert so as to align them in the context of a folding pathway, has remained a central issue since this field began4. Today, the use of experimental methods with enhanced time resolution and sensitivity, in combination with molecular dynamics simulations, is beginning to reveal all-atom models of non-native ensembles20, 21, 22, 23, 24, 31. Although there remains further room for optimization of this approach32, 33, it has been particularly useful in allowing visualization and interrogation of ensembles of structures that represent the experimental data (rather than a unique solution to the experimental observables). These models can then inspire new experiments to test and refine the structural ensembles produced22, 34.
The denatured protein ensemble is of particular interest in the context of the landscape view of folding, because this is the state from which folding initiates. As the denatured protein ensemble is rarely populated at equilibrium, obtaining structural information about this species is a challenging task. This has been achieved using naturally unstable proteins in which the native and denatured states are in equilibrium under ambient conditions35 or by creating proteins by mutation19 or chemical modification36 that are denatured under conditions that typically favor folding in their wild-type counterparts. Alternative strategies involve denaturing the protein under acidic conditions or adding chaotropes, with the caveat that the structural properties of these ensembles may differ from those under less harsh conditions37, 38.
Studies of the denatured ensemble of the helical protein Im7, formed in 6 M urea, using chemical shift analysis and NOE measurements, revealed that this species lacks regular elements of secondary structure. Despite this, the polypeptide chain contains clusters of interacting hydrophobic side chains in those regions that ultimately form helices in the native state, potentially priming the protein for subsequent folding events37. The existence of structure in the denatured state of Im7 in the presence of chaotrope builds on an increasing body of data that indicates conformational restriction in denatured chains35, 36, 38. Folding of the helical λ repressor protein also commences from a highly nonrandom state36. When denatured under ambient conditions by oxidation of methionine residues, this protein has been shown to possess nascent α-helical structure in the N-terminal region, whereas the C-terminal region remains nonhelical yet conformationally restrained36. All-atom images of the denatured ensembles of the all-helical acyl coenzyme A binding protein (ACBP) and the all-β-sheet drkN SH3 domain have been obtained using NMR paramagnetic relaxation enhancement to provide restraints for simulations35, 38. These experiments revealed denatured ensembles containing species, ranging from expanded to highly compact, that are stabilized by both native and non-native interactions. Together, these studies indicate that the denatured states of proteins are highly heterogeneous, containing polypeptide chains that vary widely in their individual conformational properties.
By placing donor and acceptor chromophores at different positions, Schuler and co-workers have exploited the power of single-molecule Förster resonance energy transfer (FRET) and fluorescence correlation spectroscopy (FCS) to determine distance distributions in the unfolded ensemble of the cold-shock protein CspTm and the rate of intramolecular diffusion of the polypeptide chain in the denatured state at different denaturant concentrations29. These results suggest that the polypeptide behaves as a Gaussian chain even at low denaturant concentrations where the protein is collapsed and contains ~20% of its native β-sheet structure30. This study reconciles the seemingly contradictory results from small-angle X-ray scattering experiments that demonstrate that the overall dimensions of unfolded proteins are consistent with random coil models39 and increasing evidence of residual structure in unfolded ensembles from NMR studies35, 36, 37, 38. Even within such a structured denatured state, the global reconfiguration time is rapid (~50 ns)29. A study of loop formation in unfolded polypeptides using triplet-triplet energy transfer also observed large-scale motions involving chain diffusion occurring on a timescale of 10–100 ns, whereas faster kinetics were observed on the 50–500 ps timescale corresponding to local fluctuations40.
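To make the link between a measured transfer efficiency and a Gaussian-chain distance distribution concrete, the sketch below averages the standard Förster efficiency over the end-to-end distance distribution of an ideal chain. It is a simplified illustration rather than the analysis used in the cited work: it assumes that chain reconfiguration is slow compared with the donor fluorescence lifetime but fast compared with the observation time, and the Förster radius of 5.4 nm and all function names are hypothetical choices made for the example.

```python
import math


def fret_efficiency(r_nm, r0_nm=5.4):
    """Foerster transfer efficiency at a single donor-acceptor distance.

    r0_nm is the Foerster radius; 5.4 nm is an illustrative value only.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def gaussian_chain_pdf(r_nm, rms_nm):
    """Distance distribution of an ideal (Gaussian) chain.

    P(r) = 4*pi*r^2 * (3 / (2*pi*<r^2>))^(3/2) * exp(-3*r^2 / (2*<r^2>)),
    where rms_nm is the root-mean-square distance sqrt(<r^2>).
    """
    a = 3.0 / (2.0 * rms_nm ** 2)
    return 4.0 * math.pi * r_nm ** 2 * (a / math.pi) ** 1.5 * math.exp(-a * r_nm ** 2)


def mean_efficiency(rms_nm, r0_nm=5.4, r_max_factor=6.0, steps=20000):
    """Efficiency averaged over the chain distribution (trapezoid rule)."""
    r_max = r_max_factor * rms_nm
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * fret_efficiency(r, r0_nm) * gaussian_chain_pdf(r, rms_nm)
    return total * h


if __name__ == "__main__":
    for rms in (3.0, 5.0, 7.0, 9.0):
        print(f"rms distance {rms:.1f} nm: mean efficiency = {mean_efficiency(rms):.3f}")
```

In practice the relation is used in the opposite direction: the root-mean-square distance is adjusted until the computed mean efficiency matches the measured value, and agreement across several labeling positions is what supports a Gaussian-chain description.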
For the characterization of more highly structured non-native species, such as partially folded intermediates and transition states, the use of protein engineering (Φ-value analysis) is well established41, 42. In recent years Φ-values have been used as restraints for molecular dynamics simulations to generate atomic-level structural models of these ensembles20, 21, 22, 23, 31. Recent analysis of the early and late transition state ensembles of two homologous PDZ domains using this approach demonstrated that the early transition states of the two domains are less similar in structure than the subsequent rate-limiting transition state ensembles. This is consistent with the landscape view that conformational space is less restricted earlier in folding23. The late transition state of both proteins adopts a narrow ensemble of structures with native-like topology. This demonstrates that conformational sampling is highly restricted by this stage of folding, as has been found previously in several other proteins21, 31.
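For readers unfamiliar with the quantity itself, a Φ-value is conventionally defined as the change in stability of the transition state relative to the denatured state on mutation, divided by the corresponding change for the native state. The sketch below computes it from folding rate constants using transition state theory, in which the unknown pre-exponential factor cancels in the ratio of rates. The function name, argument names and the min_ddG guard (with its 2.5 kJ/mol default) are illustrative assumptions; the appropriate cut-off is itself a matter of debate.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def phi_value(kf_wt, kf_mut, ddG_eq, temp_K=298.15, min_ddG=2.5):
    """Phi-value from wild-type and mutant folding rate constants.

    kf_wt, kf_mut : folding rate constants (s^-1) under identical conditions.
    ddG_eq        : destabilization of the native state relative to the
                    denatured state on mutation (kJ/mol, positive when the
                    mutant is less stable).
    min_ddG       : hypothetical guard; Phi becomes unreliable when the
                    denominator is small, so such mutants are rejected here.
    """
    if abs(ddG_eq) < min_ddG:
        raise ValueError("ddG_eq too small for a reliable Phi-value")
    # Change in the folding barrier on mutation; the pre-exponential
    # factor cancels in the ratio of the two rate constants.
    ddG_ts = R * temp_K * math.log(kf_wt / kf_mut)
    return ddG_ts / ddG_eq


if __name__ == "__main__":
    # Illustrative numbers only: a mutation that destabilizes the native
    # state by 6 kJ/mol and slows folding threefold.
    print(f"Phi = {phi_value(kf_wt=300.0, kf_mut=100.0, ddG_eq=6.0):.2f}")
```

A value near 1 indicates that the mutated side chain makes native-like interactions in the transition state, a value near 0 that the region is as unstructured as in the denatured state; intermediate values are ambiguous, which is one reason why using them as ensemble restraints in simulations has proved informative.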
Although the interpretation of Φ-values (energetic parameters) in structural terms requires caution43, 44, 45, for populated intermediates independent analysis of Φ-values, chemical shifts and hydrogen exchange protection factors allows assessment of the quality of the ensembles that result from molecular dynamics simulations using different observable parameters as restraints20. The characterization of folding intermediates that are stably populated is particularly important because, when on-pathway, they represent stepping stones en route to the native state4. Such species have also been implicated in misfolding diseases46, and their structural characterization offers prospects for therapeutic intervention. By the careful manipulation of experimental conditions to modulate the population of intermediate species and their rates of interconversion, it is possible to characterize these species using the range of approaches listed in Table 1. Even 'hidden' intermediates that are kinetically invisible (because they form after the rate-limiting transition state) can be detected and structurally characterized using native-state hydrogen exchange47. Rare intermediates can also be detected and structurally analyzed using relaxation dispersion NMR28 (Table 1). Structural ensembles representing the folding intermediate of the Fyn SH3 domain have been calculated using chemical shifts determined from relaxation dispersion NMR experiments as restraints24 (Fig. 2). A similar strategy was used to determine an ensemble of intermediate structures for Im7 using a combination of Φ-values, hydrogen exchange protection factors and chemical shifts as restraints20, 21 (Fig. 2). The finding that both of these small, single-domain proteins fold via intermediates underlines the generic importance of partially folded species in protein-folding reactions. The conformational properties of these species show that, for these proteins, the native topology is well defined by this point in folding. Perhaps more surprisingly, for both proteins substantial numbers of non-native (as well as native) contacts are formed in the intermediate ensembles20, 21, 25, indicative of frustration in folding landscapes of even these small, simple proteins.
Figure 2: Models of the conformational properties of the ensembles representing the folding intermediates of the bacterial immunity protein Im7 (a; ref. 21) and the rare folding intermediate of the G48V variant of the Fyn SH3 domain (b).
The native structure of each protein is shown below (PDB 1AYI for Im7 (ref. 94); PDB 1SHF for Fyn SH3 domain95). Comparison with the native structure demonstrates that the native topology is well defined in the folding intermediates of both these small proteins. Images of the intermediate ensembles reproduced from Nat. Struct. Mol. Biol. (ref. 21) and Nature (ref. 24).
Direct observation of (un)folding trajectories in real time using single-molecule fluorescence techniques offers further opportunities to monitor folding reactions and to reveal rare events or species hidden by the averaging of ensemble experiments17. Using immobilization techniques on surfaces or encapsulation within liposomes to increase observation times, the first trajectories of folding reactions of individual proteins in real time are emerging48, 49. Although single-molecule fluorescence studies such as these should be able to expose multiple species on the reaction coordinate, significant challenges lie ahead in developing experiments to allow the properties of rapidly interconverting species to be discerned. Mechanical manipulation using optical tweezers or the atomic force microscope has already revealed the presence of intermediates when individual proteins are unfolded under force50, 51.
Evolution of folding pathways
The landscape view presents a powerful picture of protein folding, in that it allows a clear portrayal of the heterogeneity of species on the folding surface. It also highlights the importance of native contacts in funneling the folding chain toward the native state6. As a consequence, the native topology determines the sequence of folding events, rationalizing why the structural mechanism of folding is conserved in protein families52 (even if the kinetic mechanism (for example, two- or three-state) varies53). It also explains the observed correlation between folding rate and the complexity of the native fold (contact order)54.
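The measure of fold complexity referred to here, relative contact order, is the average sequence separation between residues that are in contact in the native structure, normalized by chain length. The following minimal sketch computes it from a list of contacting residue pairs. How contacts are identified (distance cut-off, choice of atoms) is deliberately left to the caller, and the function name and toy contact lists are hypothetical examples rather than data from any protein discussed in this article.

```python
def relative_contact_order(contacts, chain_length):
    """Relative contact order of a native structure.

    contacts     : iterable of (i, j) residue-index pairs that are in
                   contact in the native state (contact definition is an
                   assumption left to the caller).
    chain_length : number of residues in the protein.

    Returns the mean sequence separation |i - j| over all contacts,
    divided by the chain length.
    """
    separations = [abs(j - i) for i, j in contacts]
    if not separations:
        raise ValueError("at least one contact is required")
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    return sum(separations) / (len(separations) * chain_length)


if __name__ == "__main__":
    length = 20
    # Toy example 1: helix-like pattern, every contact is local (i, i+4).
    local = [(i, i + 4) for i in range(length - 4)]
    # Toy example 2: contacts between residues distant in sequence,
    # as in a strand pairing that brings the chain termini together.
    distant = [(i, length - 1 - i) for i in range(5)]
    print("local contacts:  ", relative_contact_order(local, length))
    print("distant contacts:", relative_contact_order(distant, length))
```

Topologies dominated by local contacts give low values and those dominated by sequence-distant contacts give high values, which is the sense in which the complexity of the native fold is compared with folding rate.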
Important questions result from viewing folding as a multidimensional search process. These include how many routes to the native state are taken by a folding polypeptide chain and the sensitivity of the pathways taken to the experimental conditions and protein sequence. Some proteins seem to fold via a single route through the energy landscape, as famously portrayed by chymotrypsin inhibitor 2 (ref. 55). For other proteins, the route map is more diverse56, 57. For multidomain proteins, the possibility of folding via parallel routes is an obvious, and real56, possibility. Other long-established causes of parallel routes and alternative conformations involve cis-trans proline isomerization or disulfide oxidation58, 59, 60. In more recent work, Oliveberg and co-workers suggested that the number of pathways accessible to a polypeptide chain may be linked to the number of nucleation motifs contained within its sequence61. These authors identified the minimal nucleation motif for folding of the ribosomal protein S6 and showed it to comprise an α-helix docked against two β-strands, a motif similar in size to the smallest cooperatively stabilized proteins62 (Fig. 3). By creating circular permutants of S6, the authors showed that the nucleation motif is conserved in all sequences and always includes the central β-strand 1 but is completed by different structural elements that are dependent on the local loop entropies of the individual permutants62, 63 (Fig. 3). Recent studies using ankyrin repeat proteins have also revealed the presence of multiple nucleation sites64, 65, 66. This suggests that the opportunity to fold via different routes may be a general feature of evolved folding landscapes.
Figure 3: Overlapping nucleation motifs in the ribosomal protein S6.
(a) Above, structure of wild-type S6 (S6wt; PDB 1RIS (ref. 96)) colored to show the possibility for the protein to fold via different folding nuclei: nucleus 1 (red) and nucleus 2 (blue). Both nuclei share the central β1 strand (purple). Below, schematic of the secondary structure of S6, demonstrating overlap of the two folding nuclei. (b) Schematics demonstrating how local loop entropy influences which of the two nuclei dominates folding and, hence, the structural folding mechanism of the protein. P13-14 and P81-82 are circular permutants in which the N and C termini of the wild-type protein are linked and new termini created between positions 13 and 14, and 81 and 82, respectively. Figure redrawn from ref. 63.
Using the range of methods now available, the folding mechanisms of more than 20 small, water-soluble proteins have been interrogated in detail. For most of these proteins, folding is remarkably efficient in vitro. Thus, gross misfolding and aggregation are rare, folding is rapid (usually occurring in less than 1 second), and intermediate states, if formed, are transient4. By contrast with the behavior of these proteins, the 93-residue de novo designed protein Top7 folds with a mechanism too complex to be solved kinetically, involving numerous highly populated non-native states67. This suggests that the process of evolution has yielded sequences that are relatively well designed for folding. Imperfections in the landscape presumably reflect additional constraints that have limited the evolution of the sequence, such as the requirement to avoid aggregation, to remain sufficiently soluble or to be functional7, 68.
The conflict between folding and function is an emerging theme in current studies of protein folding. Several reports have documented the effects of these opposing evolutionary pressures on the folding landscape7, 21, 69, 70, 71. Examples include a statistical survey of natural proteins, which demonstrated that highly frustrated interactions colocalize with ligand binding sites in protein structures, underlining the opposing requirements of folding and function7. Recent temperature-jump studies of the human PIN1 WW domain also revealed how the evolutionary requirement to endow function is achieved at the expense of rapid folding and native-state stability69. For Im7, the transient formation of non-native interactions early in folding, which involves solvent-exposed residues that are vital for function, provides a further example of frustration in the folding landscape21. Finally, a recent molecular dynamics study of interleukin-1β (IL-1β) revealed the phenomenon of 'backtracking', in which subsets of native contacts form, break and then reform later during folding70. Such real-time editing of prematurely formed native interactions contributes to the slow folding of this protein and is caused by residues within a functionally important β-bulge70, 71. It seems that today's sequences are not perfected for folding but represent a compromise of the different forces encountered during their evolutionary history. As well as providing exciting opportunities for the experimentalist to improve upon nature's designs, this also furnishes the threat that minor changes to the sequence and/or environmental conditions may increase landscape ruggedness to such an extent that it has deleterious effects on the maintenance of a healthy living cell72.
Toward a molecular description of folding in the cell
Understanding how the insights gained from biophysical studies of folding in vitro relate to the physiological process of folding in the cell is a further major challenge. In the cellular setting, folding proceeds in a crowded environment73 that is packed with molecular chaperones, which assist the folding process in this hostile environment74. Over the course of evolution, the cell has devised cunning schemes to enable proteins (which are generally larger and more complex than those that have been studied in detail biophysically to date) to fold correctly. Other challenges include the requirements for post-translational modification, cofactor binding, complex formation and compartmentalization, all of which must be intricately controlled to ensure cellular homeostasis. Now that many of the proteins involved in chaperoning, targeting, modifying and degrading proteins have been identified, the challenge is to determine the shape of the folding landscape in the cellular context and to understand how it might be altered by changes in the cellular environment. In addition to describing the initial path to the native state that commences with protein synthesis, delineation of the shape of the folding landscape in vivo will allow the probability and molecular nature of excursions from the native state to be identified. This would allow the global and subglobal unfolding events that are crucial for function, molecular recognition and degradation to be understood in atomistic detail.
Although detailed description of the folding landscape in vivo will require further advances in methodology and increased computational power, significant steps have been made toward unraveling the mechanisms of folding in the cell. Theoretical and experimental studies have shown that molecular crowding increases the stability of compact states (native and non-native) over their expanded counterparts75, 76, 77. Crowding can also enhance folding rates78, 79 or result in conformational changes postulated to have important consequences for function80. Confinement (either within chaperones or within the ribosome exit tunnel) may also have important consequences for folding in the cellular environment77. Other Reviews in this issue81, 82 deal with these topics in detail. Exciting advances in the power of biophysical studies, particularly in NMR83 and fluorescence techniques17, 84, 85, are beginning to reveal insights into folding events in ribosome-bound nascent chains83, 86, within chaperones87, 88 and even within living cells89, 90, 91, 92. These studies have shown the propensity for folding subsequent to the emergence of the polypeptide chain from the ribosomal exit tunnel83 or, in some circumstances, even within the ribosomal exit tunnel86. They have also demonstrated the consequences of conformational restriction on the folding of polypeptide chains when they are confined within the GroEL folding cage87, 88, 93.
Following folding in real time in intact cells is an immensely challenging goal, and there is some way to go before we will be able to depict realistic models of the folding landscape therein. A number of recent, innovative studies have taken the first steps in this direction. Exploitation of a tetracysteine motif that specifically binds a biarsenical fluorescein dye (FlAsH) has enabled measurement of the unfolding free energy of a small protein, cellular retinoic acid binding protein (CRABP), in the Escherichia coli cytoplasm91. This approach has also been used to monitor protein aggregation in vivo92. The continuing development of NMR techniques offers the potential to study protein structure and dynamics in whole cells89, 90. Using 15N-labeled protein (the B1 domain of protein G) injected into Xenopus laevis oocytes, Selenko et al. were able to record high-resolution HSQC spectra in the eukaryotic cytosol. Analysis of line widths and chemical shifts allowed quantitative comparison of the structure and dynamics of this small protein in buffer, in crude X. laevis oocyte extracts, in solutions containing macromolecular crowding agents and in the intact cell90. The ultimate goal of monitoring folding kinetics in real time at the level of a single protein molecule in vivo awaits enhancements in dye technology, labeling strategies and instrument development. Impressive achievements in this area have allowed the kinetics of binding of individual repressor molecules to DNA in E. coli to be monitored in real time at the single-molecule level85. This technical tour de force bodes well for innovations in this area and for the application of single-molecule approaches to the study of protein folding and dynamics in the living cell.
Summary and outlook
The wide range of techniques developed over recent years has led to a near-atomistic view of the folding landscape of small, model proteins in vitro. Approaches combining both experiment and simulation have been crucial in achieving this goal, and the synergy of these approaches will become even more important as the field develops in the future. At a fundamental level, much remains to be learned about the biophysics of protein folding: the prediction of folds and folding mechanisms is far from routine27 and comparison of results from simulations and experimental approaches remains challenging33. Further enhancements of experimental techniques and theoretical approaches are needed so that each can be used to better model, refine and test the outputs of the other. In addition, the folding mechanisms of only the simplest of proteins have been studied biophysically in detail so far. Larger proteins, chemically modified proteins, protein complexes and membrane proteins, which together comprise most of nature's proteins, remain largely uncharacterized. Other future challenges include understanding folding in the context of interconverting ensembles and landscape theory in the living cell, where folding may be assisted by chaperones, challenged by stochastic events such as aggregation and modulated by the cellular status at any moment in time. Armed with the arsenal of methods and concepts about folding landscapes derived from studies in vitro over recent years and fueled by the increasing awareness of the importance of protein folding in the context of homeostasis and disease, major advances toward this goal are sure to be realized in the forthcoming years.
Note added in proof: Two recent articles106, 107 have described innovative NMR approaches to determine protein structure and dynamics in living cells.

