Introduction

The dynamics of relatively open polypeptide structures holds clues to the optimal search that Nature has encoded in primary sequences as a polypeptide seeks its native structure. One would expect directed, albeit statistical, dynamics in the vast conformational space far from the folding funnel, both to resolve Levinthal's paradox and to explain the significant reduction of the folding time observed in practice. Which mechanisms drive such an optimal search, and how primary sequences aid these mechanisms, are thus key questions in protein folding.

Although how proteins reach their native states is of paramount importance, there is only limited consensus on the underlying mechanisms, even for secondary structures. Early studies interpreted measurements of folding kinetics as sequential transitions from the denatured to the native state, possibly through intermediate state(s)1,2. However, experimental and computational advances have revealed heterogeneous intermediates with different folding rates, leading to the hypothesis that folding can proceed through multiple pathways3,4,5. Similarly, the concept of folding funnels on free energy landscapes was introduced to describe parallel folding processes6,7,8. The funnel theory is, however, still debated, and the existence of individual, unrelated pathways has been challenged9,10. Furthermore, although these energy-landscape-oriented models are useful abstract descriptions of the statistical mechanics behind the process, per se they provide no physical insight into the underlying structural transitions toward the native state.

Studying single-domain proteins or polypeptides (SDPs) can shed important light on this issue, not only because of the much-reduced structural complexity of SDPs but also because the secondary structures formed by SDPs play crucial roles in promoting the formation of higher-order structures in large proteins. A prime example is the β-hairpin, which is believed to act as a nucleation site during the early stages of folding of full proteins11,12,13. Studies of β-hairpins have so far focused on the final stages of folding, e.g., hairpin formation and side-chain rearrangements14,15. In contrast, analyses of open conformational states continue to pose a major experimental and computational challenge16, since such states occupy a large and sparsely populated portion of the conformational space and are structurally diverse. As a result, the dynamics further away from the ‘folding funnel’ remains largely unexplored and poorly understood. Nevertheless, because the early dynamics and collapse are associated with an entropic loss that is likely to contribute significantly to the total folding free energy, their critical role in SDPs is expected to influence the folding dynamics and timescales of larger proteins as well17.

To delineate the entire β-hairpin folding process, and thereby identify mechanisms that can reduce the complexity of the early-stage dynamics of full proteins, we have carried out extensive conventional molecular dynamics (MD) simulations totaling >10 μs on a model polypeptide, the β-hairpin chignolin (GYDPETGTWG; Figure 1a)18. We chose chignolin because more than 13,000 β-hairpins with key primary-structure features similar to those of chignolin are found in full proteins (see the Discussion and Figure 1b). Using a systematic statistical examination of the conformational progression from extended conformations toward the native structure, we demonstrate that, surprisingly, a single, simple mechanism (which we call the roll-up mechanism) describes the entire pre-folding process. We present both structural and temporal details of the individual steps in this roll-up mechanism and, in addition, demonstrate that the folding process is guided by a few key residues. The results thus link the formation of a fundamental higher-order structural element, the β-hairpin, to specific residues and their positions in the primary sequence and, moreover, reveal a mechanism behind the directed search for the native state. Since, as noted above, a large number of β-hairpins share the key features of chignolin at play here, and in view of the potential role of β-hairpins as nucleation sites for the folding of full proteins, the kind of topological guidance illustrated by our results may not only be common to many polypeptides but may also represent at least one mechanism by which Nature initiates ordered and efficient structure formation in full proteins.

Figure 1

Dynamics of formation of residue-level turn structures and native turn in chignolin.

a) Folded chignolin, showing the native turn (NT) at the central turn region (emerald green) and the amino- and carboxyl-terminal strands (N-strand and C-strand, respectively; lime green). b) Four examples (coloured, solid structures) from the more than 13,000 β-hairpins found in full proteins that show features similar to chignolin (gray, transparent structures). c) Probability of residue-level turn formation when NT has not formed. NT is considered formed when the Cα-Cα distance between the corner residues Pro4 and Gly7, dPG,Cα, is less than 0.7 nm35. The corresponding probabilities when NT has formed are shown in the inset. Probabilities higher than 1% are indicated by open symbols. In both cases, the N-strand residues (Gly1-Asp3) never form turns. d) Sample trajectory showing the root-mean-squared-distance, RMSD, as described previously19 (top panels), the radius of gyration, Rg (middle panels), and the occurrence of turns (bottom panels), identified using DSSP34, in NT and the C-strand residues. While RMSD and Rg reflect fluctuations in the overall structural features, the turn occurrences show a more coordinated evolution. e) Turn propagation from the C-strand to the NT region in the “roll-up mechanism”. The probability of turn conformation is plotted as a function of time elapsed from a specific, pre-determined starting conformation, here conformations with turns at Thr8 and Trp9. The dark line marks the boundary between the C-strand and the NT region (Gly7-Pro4). Typical backbone structures during the roll-up process are also shown.

Results

A central question in previous β-hairpin studies has been the mechanism that drives folding and, in particular, whether folding follows a zipper-like mechanism or is driven by a hydrophobic collapse with subsequent side-chain rearrangements. For chignolin, we have recently shown that formation of the native turn (NT) region (Pro4-Gly7) drives folding and marks the beginning of the downhill zipping stage of the folding cascade19. Other recent studies of different β-hairpins have also reported turn-directed mechanisms, supporting a general turn-centric folding scenario13,15. In light of these advances, and as a logical next step towards understanding the entire conformational search and folding of β-hairpins, as well as the key early structure-forming events in full proteins, we scrutinize the largely unexplored dynamics of open or relatively open structures that precede native-turn formation. Since standard parameters, such as the radius of gyration, Rg, and the root-mean-squared-distance, RMSD, provide only coarse descriptions of the overall structure, we instead use the occurrence of turn structure at the individual-residue level to extract finer details of the conformational space, the dynamics and the progression of structural evolution. As a first result, this residue-specific analysis reveals that (residue-level) turn formation is not limited to the NT residues and, interestingly, that outside the NT region turns form only in the C-strand (Figure 1c).
Moreover, turn formation at Thr8 and Trp9 appears to play a central role in the temporal progression of turn structures when the peptide is relatively extended, and a clear trend emerges in which turns subsequently form closer to the NT region, although this progression is generally masked by intermittency (Figure 1d). In comparison, such turn progression is completely obscured if the analysis is restricted to RMSD and Rg alone (Figure 1d).
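The residue-level bookkeeping behind Figure 1c can be sketched as follows. This is a minimal illustration, assuming per-frame DSSP strings (one character per residue, with 'T' marking a turn) and a per-frame Pro4-Gly7 Cα distance in nm are already available; the function names and input layout are hypothetical, and only the 0.7 nm native-turn criterion comes from the text.

```python
import numpy as np

def turn_matrix(dssp_strings):
    """Boolean (n_frames, n_residues) array: True where DSSP assigns a turn ('T')."""
    return np.array([[code == "T" for code in s] for s in dssp_strings], dtype=bool)

def turn_probabilities(turns, d_pg_nm, nt_cutoff_nm=0.7):
    """Per-residue turn probability without and with the native turn (NT).

    NT is taken as formed when the Pro4-Gly7 C-alpha distance is below
    nt_cutoff_nm. Returns (p_without_nt, p_with_nt); a class that contains
    no frames yields an array of NaN.
    """
    nt_formed = np.asarray(d_pg_nm) < nt_cutoff_nm
    n_res = turns.shape[1]

    def mean_over(mask):
        if not mask.any():
            return np.full(n_res, np.nan)
        return turns[mask].mean(axis=0)

    return mean_over(~nt_formed), mean_over(nt_formed)

# Toy input: three frames of a 10-residue peptide ('-' = no turn)
dssp = ["-------TT-", "------TT--", "---TTTT---"]
d_pg = [1.2, 1.0, 0.55]
p_open, p_nt = turn_probabilities(turn_matrix(dssp), d_pg)
# Before NT forms, turns in this toy input occur only at residues 7-9 (1-based)
```

Splitting frames by the NT criterion before averaging mirrors the main panel versus inset of Figure 1c.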

To obtain a clearer picture of the dynamics, we employ a segmented trajectory analysis on the large number of events captured in our extensive simulations. Conformations are selected from the full trajectories on the basis of structural features (or reaction coordinates) of interest and are used as starting points, which effectively segments the trajectories into shorter simulations that all start from a conformation of interest. The dynamics following this conformation is then analyzed by averaging the parameters of interest over all segments as functions of elapsed time.
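The segmented analysis described above amounts to conditional averaging over trajectory windows. The sketch below is one possible implementation under stated assumptions: trajectories are given as per-frame arrays of an observable (here boolean turn indicators per residue), and the start criterion `turns_at_thr8_trp9_only` is a hypothetical example matching the starting conformations of Figure 1e.

```python
import numpy as np

def segment_average(trajectories, is_start, max_lag):
    """Average a per-frame observable over trajectory segments.

    trajectories: list of arrays of shape (n_frames, n_residues); boolean turn
                  indicators here, but any numeric per-frame observable works.
    is_start:     function(frame) -> bool selecting starting conformations.
    max_lag:      number of frames followed after each starting frame.

    Returns (mean, n_segments); mean[k] is the average observable k frames
    after a starting conformation.
    """
    n_res = trajectories[0].shape[1]
    total = np.zeros((max_lag + 1, n_res))
    n_segments = 0
    for traj in trajectories:
        # Only starting frames that leave room for a full segment are used.
        for t0 in range(traj.shape[0] - max_lag):
            if is_start(traj[t0]):
                total += traj[t0:t0 + max_lag + 1]
                n_segments += 1
    if n_segments == 0:
        raise ValueError("no starting conformations found")
    return total / n_segments, n_segments

def turns_at_thr8_trp9_only(frame):
    """Hypothetical start criterion: turns at residues 8 and 9 (1-based) only."""
    expected = np.zeros(frame.shape, dtype=bool)
    expected[[7, 8]] = True
    return bool(np.array_equal(frame.astype(bool), expected))
```

With frames saved every 10 ps, as in the Methods, lag k corresponds to an elapsed time of 10k ps. Segments are allowed to overlap in this sketch, so they are not statistically independent samples.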

The usefulness of this method for unveiling β-hairpin dynamics is illustrated in Figure 1e, with turns at Thr8 and Trp9 as the starting point. The turn-formation probability at each residue after an elapsed time Δt clearly demonstrates that turns at Thr8 and Trp9 lead to gradual turn propagation from the C-terminal end to the central turn region. This progression, the roll-up mechanism, which is initially aided by Pro4-Trp9 interactions (see also Figures 2g–h), does not necessarily lead to NT formation; in fact, many roll-up attempts stop midway and succumb to turn dissolution or to reverse propagation (i.e., a roll-back) toward the C-terminus (Figure 1d). Indeed, prolonged Pro4-Trp9 interactions interfere with the formation of native Tyr2-Trp9 interactions and thus slow down NT formation and subsequent folding events (i.e., the eventual establishment of native hydrogen bonds and packing of the hydrophobic core). However, on average, turn formation in the C-strand does eventually propagate through Gly7 to Pro4, and the transition from an extended state to a collapsed state with the native turn proceeds as a smooth, directed (statistical) march.
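Roll-up and roll-back can also be made concrete frame by frame. The sketch below is an illustration, not the analysis used here (which relies on the segment-averaged probabilities of Figure 1e): it defines a hypothetical "turn front" as the lowest-numbered residue between Pro4 and Trp9 that currently carries a turn, and labels each change of that front as a roll-up step (toward Pro4) or a roll-back step (toward the C-terminus).

```python
# Residues Pro4..Trp9 (1-based): the NT region plus the C-strand residues that form turns.
WINDOW = range(4, 10)

def turn_front(frame):
    """Lowest residue number in WINDOW with a turn, or None if there is none.

    frame: sequence of booleans, one per residue (index 0 = residue 1).
    """
    for res in WINDOW:
        if frame[res - 1]:
            return res
    return None

def classify_steps(frames):
    """Return the per-frame fronts and a label for every change of the front."""
    fronts = [turn_front(f) for f in frames]
    labels = []
    for prev, cur in zip(fronts, fronts[1:]):
        if prev is None or cur is None or prev == cur:
            continue  # no turn in the window, or the front did not move
        labels.append("roll-up" if cur < prev else "roll-back")
    return fronts, labels

def frame_with_turns(*residues, n_res=10):
    """Build a toy frame with turns at the given 1-based residues."""
    return [i + 1 in residues for i in range(n_res)]

frames = [frame_with_turns(8, 9), frame_with_turns(7, 8),
          frame_with_turns(6, 7), frame_with_turns(7, 8)]
fronts, labels = classify_steps(frames)
# fronts == [8, 7, 6, 7]; labels == ["roll-up", "roll-up", "roll-back"]
```

Frames without any turn in the window are skipped rather than treated as a full dissolution; a stricter analysis might count dissolutions separately.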

Figure 2

Effect of single-site mutations on conformational features.

a–b) Ramachandran plots of the backbone dihedral angles of (a) Tyr2 (in non-mutated chignolin) and (b) Ala2 (in chignolin-Y2A). Interactions between the Tyr2 side chain and the Pro4 ring force Tyr2 to assume an extended conformation, with the magnitudes of both backbone dihedral angles Φ and Ψ close to 180°. In the absence of these side-chain interactions, a helical conformation emerges at (Φ,Ψ) ~ (−80°,−20°). c–d) Probability density distribution of the distance between the Gly1 and Pro4 Cα atoms in (c) chignolin and (d) chignolin-Y2A. The distance remains relatively constant at ~0.96 nm in chignolin, corresponding to an extended N-strand conformation. In contrast, without the side-chain interactions the N-strand can also assume a bent conformation, as indicated by the second peak at ~0.76 nm. e–f) Frequency of turn structure at each residue in (e) chignolin and (f) chignolin-P4A. The rigidity of Pro4 limits turn formation to the NT region and the C-strand of chignolin, whereas replacing Pro4 with Ala allows turns to form also in the N-strand (Gly1-Asp3). g) Typical structure with turns at Thr8 and Trp9. The formation of turn structures at Thr8 and Trp9 is facilitated by interactions between Pro4 and Trp9. h) Lack of consistent turn propagation in the absence of Pro4-Trp9 interactions. As in Figure 1e, the probability of turn conformation for the ten residues of chignolin-W9A is plotted as a function of elapsed time, here using conformations with turns at Thr8 and Ala9 as starting points.
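The extended versus helical populations in panels a–b can be quantified by counting frames in two regions of the Ramachandran plot. In the sketch below, the region boundaries are assumptions chosen for illustration: "extended" means |Φ| and |Ψ| both above 120°, and "helical" means within 40° of (Φ,Ψ) = (−80°,−20°), measured with periodic wrap-around. Angles are assumed to be in degrees in [−180°, 180°).

```python
import numpy as np

def wrapped_diff(a, b):
    """Signed angular difference a - b wrapped into [-180, 180) degrees."""
    return (np.asarray(a, dtype=float) - b + 180.0) % 360.0 - 180.0

def backbone_fractions(phi, psi, helix_center=(-80.0, -20.0),
                       helix_radius=40.0, extended_min=120.0):
    """Fractions of frames in extended, helical and other backbone regions."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    # Extended (beta) region: both dihedrals near +/-180 degrees
    extended = (np.abs(phi) > extended_min) & (np.abs(psi) > extended_min)
    # Distance to the helical centre on the torus of dihedral angles
    dist = np.hypot(wrapped_diff(phi, helix_center[0]),
                    wrapped_diff(psi, helix_center[1]))
    helical = (dist < helix_radius) & ~extended
    f_ext, f_hel = float(extended.mean()), float(helical.mean())
    return {"extended": f_ext, "helical": f_hel, "other": 1.0 - f_ext - f_hel}

# Four toy frames: extended, helical, other (polyproline-like), extended
print(backbone_fractions([-175, -80, -60, 170], [178, -20, 140, -170]))
```

The wrap-around matters for the extended region, where (170°, −170°) and (−170°, 170°) are close neighbours despite their large numerical difference.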

Figure 1e also reveals important timescales for key events in the roll-up. For example, starting from conformations with turns at Thr8 and Trp9 only, propagation to turns at Thr6 and Gly7 typically occurs within a few ns, whereas further propagation to Glu5 and Pro4 is about two orders of magnitude slower. As folding is driven by NT formation19, the timescale of turn propagation to Pro4 thus provides a lower bound for the overall folding time, which has been estimated to be ~1 μs17,20.
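One simple way to read a characteristic propagation time off a segment-averaged curve such as those in Figure 1e is the elapsed time at which the turn probability at a given residue first covers half of its rise from the initial value to its maximum. The sketch below assumes frames saved every 10 ps (as stated in the Methods) and is an illustrative estimator, not necessarily the one behind the timescales quoted above. For orientation, a few ns multiplied by two orders of magnitude gives a few hundred ns, consistent with a lower bound on a ~1 μs folding time.

```python
import numpy as np

def half_rise_time(prob, dt_ns=0.01):
    """Elapsed time (ns) at which prob first reaches half of its rise.

    prob:  segment-averaged turn probability at one residue, one value per frame
    dt_ns: time between saved frames (10 ps = 0.01 ns)
    """
    prob = np.asarray(prob, dtype=float)
    target = prob[0] + 0.5 * (prob.max() - prob[0])
    first = int(np.argmax(prob >= target))  # index of the first frame reaching target
    return first * dt_ns

# Synthetic check: a single-exponential rise with a 2 ns time constant
t = np.arange(0.0, 20.0, 0.01)
curve = 1.0 - np.exp(-t / 2.0)
print(half_rise_time(curve))  # close to 2 ns * ln 2, i.e. about 1.39 ns
```

For a single-exponential rise the half-rise time equals τ ln 2, so the estimate can be converted to a time constant if such kinetics is assumed.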

It is noteworthy that turn propagation occurs only from the C-strand; in fact, the N-strand residues never form turns (Figure 1c). Our simulations show that Tyr2 makes frequent hydrophobic contacts with Pro4; this distraction of Tyr2 by Pro4, and the attendant increase in the rigidity of the N-strand, reduces the conformational options of the polypeptide, offers less hindrance to Pro4-Trp9 interactions and aids the roll-up along the C-strand, as we discuss below. The role of the Tyr2-Pro4 interaction in this rigidity is confirmed by simulations of the mutant chignolin-Y2A, which show that replacing Tyr with Ala increases the flexibility of the N-strand. As seen in Figure 2a–b, Tyr2 in chignolin exists primarily in an extended (β-sheet) conformation, whereas Ala2 in Y2A also adopts a less extended, α-helical conformation. As a result of this backbone flexibility in Y2A, the end-to-end length of the N-strand fluctuates between extended and bent conformations, whereas the same distance in chignolin shows a unimodal distribution corresponding only to extended N-strand conformations (Figure 2c–d). Clearly, Tyr2-Pro4 interactions in chignolin help confine the available conformational space by restricting the mobility of Tyr2 and, consequently, of the N-strand.
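The unimodal versus bimodal character of the Gly1-Pro4 Cα distance (Figure 2c–d) can be checked by histogramming the per-frame distance and locating local maxima. The sketch assumes Cα coordinates in nm with the molecule already made whole across periodic boundaries; the bin range, bin count and the 10% floor used to ignore sparsely populated bins are arbitrary, hypothetical choices, and the synthetic sample only loosely mimics peaks near 0.76 and 0.96 nm.

```python
import numpy as np

def ca_distance(ca_xyz, i, j):
    """Per-frame distance (nm) between C-alpha atoms i and j (0-based indices).

    ca_xyz: (n_frames, n_residues, 3) array; the molecule is assumed whole.
    """
    return np.linalg.norm(ca_xyz[:, i] - ca_xyz[:, j], axis=1)

def histogram_peaks(values, lo=0.5, hi=1.2, n_bins=70, min_frac=0.1):
    """Bin centers of local histogram maxima holding at least min_frac of the tallest bin."""
    counts, edges = np.histogram(values, bins=n_bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    floor = min_frac * counts.max()
    peaks = []
    for k in range(1, n_bins - 1):
        # Strict rise on the left, non-strict fall on the right: one peak per plateau
        if (counts[k] >= floor and counts[k] > counts[k - 1]
                and counts[k] >= counts[k + 1]):
            peaks.append(float(centers[k]))
    return peaks

# Synthetic illustration: a bimodal sample loosely mimicking Figure 2d
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.96, 0.02, 5000), rng.normal(0.76, 0.02, 5000)])
print(histogram_peaks(sample))  # expect two peaks, near 0.76 and 0.96 nm
```

For real trajectories, `ca_distance(xyz, 0, 3)` would give the Gly1-Pro4 distance that is then passed to `histogram_peaks`.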

A similar conclusion on the role of Tyr2-Pro4 interactions can be drawn from simulations of chignolin-P4A (Pro4 replaced by Ala). In fact, replacing Pro4 has a more severe effect than replacing Tyr2 because of proline's intrinsic conformational restrictions. The P4A substitution allows turns to form anywhere along the polypeptide (Figure 2e–f), whereas turns are restricted to the NT region and the C-strand in non-mutated chignolin. These results suggest that the roll-up mechanism, with its turn propagation from the C-strand (and not from the N-strand) to the NT region, is heavily influenced by both the rigidity of proline's backbone and the Tyr2-Pro4 interactions.

Proline also increases the likelihood of stable turn propagation. We hypothesized that, in the initial steps of the roll-up, interactions between the side chains of Pro4 and Trp9 (Figure 2g), brought together by the formation of turns at Thr8 and Trp9, reduce the risk of roll-back. To test this hypothesis, we carried out simulations of chignolin-W9A (Trp9 replaced by Ala). Turn propagation is clearly less stable in W9A (Figure 2h) than in chignolin (Figure 1e), as is evident from the diversity of turn conformations as time progresses. Hence, Pro4-Trp9 interactions stabilize the roll-up mechanism by impeding roll-back and providing a stable framework for turn propagation toward the NT region.
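Whether the Pro4 and Trp9 side chains are in contact in a given frame can be tested with a minimum-distance criterion. In the sketch below, the 0.45 nm cutoff, the choice of atom groups passed in, and the neglect of periodic boundary conditions are all assumptions for illustration rather than the criterion used in this work.

```python
import numpy as np

def min_distance(group_a, group_b):
    """Per-frame minimum distance between two atom groups.

    group_a: (n_frames, n_a, 3) coordinates in nm, e.g. Pro4 side-chain atoms
    group_b: (n_frames, n_b, 3) coordinates in nm, e.g. Trp9 side-chain atoms
    Periodic boundary conditions are ignored (the molecule is assumed whole).
    """
    # Broadcast to all atom pairs: shape (n_frames, n_a, n_b, 3)
    diff = group_a[:, :, None, :] - group_b[:, None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=(1, 2))

def contact_fraction(group_a, group_b, cutoff_nm=0.45):
    """Fraction of frames in which the two groups are closer than cutoff_nm."""
    return float((min_distance(group_a, group_b) < cutoff_nm).mean())
```

Applied within the trajectory segments of the segmented analysis, such a contact indicator could be averaged alongside the turn probabilities to relate Pro4-Trp9 contacts to the persistence of the roll-up.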

Discussion

Our results show that the pre-folding dynamics of chignolin is highly ordered and proceeds according to a single, simple roll-up mechanism with clearly identifiable stages. The roll-up begins with the formation of a non-native turn microdomain in the C-strand and continues with turn propagation toward the central region, where NT forms. As noted earlier, during this process interactions between specific residues, namely Tyr2-Pro4 and Pro4-Trp9, respectively help reduce the available phase space and consolidate turn propagation. The statistical analysis introduced here, which examines trajectory segments that share the same initial structure, identifies statistically significant (dynamic) structural evolution as the polypeptide searches for the native turn. Moreover, the analysis also provides the relevant characteristic timescales.

It might be tempting to characterize the observed non-native turn microdomains as semi-stable states (somewhat analogous to the misfolded states with mismatched cross-strand hydrophobic contacts previously reported for chignolin at later stages of folding19,21). It is important to note, however, that the roll-up mechanism reported here is simply an efficient search mechanism employed by the polypeptide at early stages, prior to the onset of folding (represented by the previously described folding steps19). The roll-up also describes the statistical temporal progression of the structural changes; that is, the roll-up mechanism and the corresponding underlying structures do not represent a single metastable state. Conclusions about the existence of semi-stable states based on any particular residue-level turn conformation (or collection of residue-level turns) should therefore be drawn with caution. Moreover, if any non-native semi-stable states exist during the early stages, they could affect the timescale of subsequent folding events (e.g., native-turn formation, establishment of native hydrogen bonds and side-chain rearrangements; see ref. 17 for a discussion of the various timescales involved), but they would have no implications for the roll-up process itself unless they interfere with formation of the native-turn structure. This is not the case for the class of polypeptides represented by chignolin (with respect to key residues). It is also worth noting that studies of GB1, another β-hairpin, have suggested that a reptation-like movement of one strand with respect to the other can take place in the late folding stages22,23 (although the underlying mechanism and timescale have not been explored).
Again, the early-stage roll-up mechanism we report differs from this reptation-like motion, which has been reported for the transition between a folding intermediate and the folded state at the final stages of folding.

Finally, we ask whether the roll-up mechanism is likely to be relevant to polypeptides other than chignolin. Hints available in the literature, which are illuminating in retrospect, suggest that the answer is ‘yes’. For example, specific non-native turns have recently been observed to form before the native β-hairpin turn in Peptide 1 (ref. 14), although it remains to be seen whether there is continual turn propagation similar to that observed here. In addition, the similar folding times (~1 μs) of chignolin and Peptide 1, together with the similar positions of Tyr, Pro and Trp, which we found to have a critical impact on the roll-up, could imply that the folding of Peptide 1 is guided by the same overall, simple mechanism. The roles of these residues are also supported by another study24, which found that the N-strand of a modified version of the β-hairpin from the inhibitor tendamistat, containing Tyr1 and Pro4, was constrained to a β-strand conformation, much in line with our observation for chignolin. Indeed, the side-chain interactions involved in the roll-up may be more general in nature and may consequently be initiated by other amino acid residues with similar characteristics. As noted earlier, a search for β-hairpin segments in large proteins containing two aromatic residues separated by a proline returns more than 13,000 such β-hairpins (based on a search for 7-to-15-residue-long amino acid sequences of the type A1-nX-P-mX-A2, where A1 and A2 are the aromatic residues Tyr, Trp or Phe, X denotes any residue, and n and m denote the numbers of spacer residues between the aromatic residues and the proline, P; see examples in Figure 1b). This large number (with similar positions of the key hydrophobic residues) suggests that roll-up may be a general, topologically guided mechanism by which interactions between specific residues in the primary sequence facilitate the initial stages of the search for the folding funnel.
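The sequence pattern A1-nX-P-mX-A2 can be scanned with a few lines of code. The sketch below is a hypothetical reimplementation of the sequence part of the search only: it assumes that n and m may be zero and that the 7-to-15-residue length counts all residues from A1 to A2 inclusive. It reports one hit per proline and does not check whether a hit actually forms a β-hairpin in the protein structure, which the search described above required.

```python
AROMATIC = set("YWF")  # Tyr, Trp, Phe

def find_motifs(seq, min_len=7, max_len=15):
    """Find segments A1-nX-P-mX-A2 in a one-letter amino-acid sequence.

    A1 and A2 are aromatic; P is a proline strictly between them; the segment
    from A1 to A2 inclusive is min_len..max_len residues long.
    Returns 1-based (A1 position, A2 position, Pro position) tuples.
    """
    hits = []
    for i, first in enumerate(seq):
        if first not in AROMATIC:
            continue
        # Candidate A2 positions giving a segment length within the limits
        for j in range(i + min_len - 1, min(i + max_len, len(seq))):
            if seq[j] not in AROMATIC:
                continue
            for k in range(i + 1, j):
                if seq[k] == "P":
                    hits.append((i + 1, j + 1, k + 1))
    return hits

print(find_motifs("GYDPETGTWG"))  # [(2, 9, 4)]: Tyr2 ... Pro4 ... Trp9 in chignolin
```

Counting distinct (A1, A2) pairs rather than raw hits would avoid counting a segment twice when it contains more than one proline.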

Formation of small folding microdomains that can grow, collide and assemble into larger and more stable native-like structures has been observed experimentally and proposed in various structure-oriented folding models, such as the diffusion-collision model25,26, the foldon model27 and the zipping-assembly model28. While these models are useful for interpreting experimental and computational results, they provide insight only into large-scale structural transitions and ignore the residue-level interactions that govern the initial formation and dynamics of microdomains. In contrast, our results provide direct evidence for a significant involvement of non-native Pro4-Trp9 interactions and topological guidance in intra-chain turn propagation and folding, as well as in the corresponding kinetics. In effect, our work suggests that, although diverse microscopic pathways down the folding funnel can be discerned, macroscopically the complete folding course can be well described by a single, simple mechanism. The frequent roll-up/roll-back dynamics suggests a relatively flat energy landscape far from the folding basin. On the other hand, even where the polypeptide assumes extended conformations, the local landscape must still have a non-zero gradient, which ultimately drives the roll-up. The existence of a well-defined, topologically guided pathway may suggest that such a pathway, bridging the denatured and native states, represents one of Nature's ways of reducing complexity and enhancing folding efficiency. Importantly, recent advances in experimental techniques such as 2D IR, UVRR, NMR and single-molecule FRET offer structural information at unprecedented spatiotemporal resolution29,30. Since the early stages of protein folding are still poorly understood, such technical improvements may provide complementary avenues for elucidating the extent to which topological guidance shapes the intricate folding dynamics.

Methods

Extensive MD simulations totaling more than 10 μs of simulated time were carried out for chignolin and for each of the three single-site mutants, Y2A, P4A and W9A, in which Tyr2, Pro4 and Trp9, respectively, were replaced with alanine. All simulations started from fully extended, all-trans conformations in a 4×4×4 nm³ box containing about 2,200 water molecules and two Na+ ions, which counterbalance the charges of Asp3 and Glu5 and make the system electrically neutral. Short energy minimizations, which relaxed high-energy interactions, preceded the MD simulations at 298 K and 1 bar, and atom positions were saved every 10 ps. Both energy minimizations and MD simulations were performed with GROMACS 4.0.3 (ref. 31). Water and peptide bond lengths were constrained using SETTLE and P-LINCS, respectively, allowing a time step of 2 fs31. Electrostatic interactions were treated with the particle mesh Ewald (PME) method, using a real-space cut-off of 0.9 nm, a maximum PME grid spacing of 0.12 nm in the x, y and z directions, and fourth-order polynomial interpolation of the charges onto the grid points. The OPLS-AA force field32 was used together with the TIP4P water model because of their excellent ability to stabilize the native state of chignolin, as noted in a recent comparison of force fields33. Residue-level turn structures were identified with DSSP34.
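Several numbers in the Methods can be cross-checked with simple arithmetic. The sketch below uses hypothetical helper names and a deliberately simplified charge model (an assumption: Asp/Glu count as −1, Lys/Arg as +1, His as neutral, and the charged N- and C-termini cancel). It takes 10 μs as an illustrative simulated time and confirms that chignolin and its three alanine mutants each carry a net charge of −2, hence two Na+ ions, and that 10 μs at a 2 fs time step with 10 ps sampling corresponds to 5×10⁹ integration steps and 10⁶ saved frames.

```python
# Simplified charge model (assumption): termini cancel, His neutral.
RESIDUE_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

def net_charge(seq):
    """Net charge of a one-letter peptide sequence under the simplified model."""
    return sum(RESIDUE_CHARGE.get(res, 0) for res in seq)

def counterions(seq):
    """Neutralizing ion species and count (Cl- for positive peptides is an assumption)."""
    q = net_charge(seq)
    if q < 0:
        return "Na+", -q
    if q > 0:
        return "Cl-", q
    return None, 0

def run_lengths(total_ns, dt_fs=2, sample_ps=10):
    """Integration steps and saved frames for an integer simulated time in ns."""
    total_fs = total_ns * 1_000_000
    total_ps = total_ns * 1_000
    return total_fs // dt_fs, total_ps // sample_ps

for name, seq in [("wild type", "GYDPETGTWG"), ("Y2A", "GADPETGTWG"),
                  ("P4A", "GYDAETGTWG"), ("W9A", "GYDPETGTAG")]:
    print(name, net_charge(seq), counterions(seq))  # each: -2, ('Na+', 2)
print(run_lengths(10_000))  # (5000000000, 1000000)
```

Integer arithmetic is used for the step and frame counts so that the results are exact rather than subject to floating-point rounding.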