Protein folding features a diffusive search over a multidimensional energy landscape in conformational space for the minimum-energy structure1. Experiments, however, are usually interpreted in terms of a one-dimensional (1D) projection of the full landscape onto a practical reaction coordinate. Although simulations have shown that folding kinetics can be described well by diffusion over a 1D projection2,3, 1D approximations have not yet been fully validated experimentally. We used folding trajectories of single molecules held under tension in optical tweezers to compare the conditional probability of being on a transition path4, calculated from the trajectory5, with the prediction for ideal 1D diffusion over the measured 1D landscape6, calculated from committor statistics7,8. We found good agreement for the protein PrP (refs 9,10) and for one of the structural transitions in a leucine-zipper coiled-coil11, but not for a second transition in the coiled-coil, owing to poor reaction-coordinate quality12. These results show that 1D descriptions of folding can indeed be good, even for complex tertiary structures. More fundamentally, they also provide a fully experimental validation of the basic physical picture of folding as diffusion over a landscape.
Protein folding is justly renowned for its combinatorial complexity: not only is it driven by a wide range of different and often competing interactions, but there are hundreds or even thousands of degrees of freedom related to the bond angles in the polypeptide chain and the motions of the solvent1. The full energy landscape underlying folding thus has a very high dimensionality. Measuring the dynamics in each degree of freedom represents a supreme technical challenge that remains beyond current capabilities. Instead, through necessity, experiments typically monitor the folding dynamics in a much-reduced projection of the full dynamical space, most commonly using a single dimension associated with a convenient observable (for example, radius of gyration, end-to-end extension, and so on), which becomes the collective ‘reaction coordinate’ used to describe the progress of the folding13. The conformational dynamics are then described in terms of diffusion along this reaction coordinate.
Computational simulations suggest that low-dimensional reductions can generally provide a valid description of the folding14; indeed, simulations of a variety of small proteins show that kinetic properties such as rates and transition-path times can be accounted for quantitatively by even a 1D projection, with the observed kinetics matching the predictions for diffusion over the 1D energy profile2,3. Experimentally, simple 1D approximations have found reasonable empirical success, especially for smaller proteins10,15, although counterexamples exist that are likely to require multiple dimensions to account for the observed behaviour (for example, proteins with knots16 or multiple pathways17). Nevertheless, there are many potential concerns with simple 1D descriptions. Even if a low-dimensional approximation is valid, a 1D approximation may not be18,19. The projection onto the reaction coordinate may also be sub-optimal, incompletely capturing the full dynamics during folding; such ‘bad’ reaction coordinates may lead to non-Markovian dynamics, poor predictions and incorrect interpretations20,21,22. Reaction-coordinate quality is rarely tested in protein-folding experiments, however, with only a handful of examples published12. Moreover, even if a reaction coordinate is known to be good, it has not yet been directly shown that the dynamics along such a coordinate agree quantitatively with 1D diffusion over the measured energy profile.
One way to address this question is through analysis of the transition paths during folding. Transition paths represent the purely reactive portions of the folding trajectory, the fleeting moments when the protein changes from one conformation to another, in contrast to the non-productive fluctuations that comprise most of the trajectory. For two-state folding, where the projected energy profile consists of two wells separated by a barrier, the transition paths are those parts of the trajectory crossing the barrier from one well to the other (Fig. 1). The conditional probability that the molecule is on a transition path at a given reaction-coordinate value, p(TP|x), provides both a test of reaction-coordinate quality and—if the energy profile is known—whether the dynamics truly reflect 1D diffusion over this profile4.
We recently showed how to apply such transition-path analysis to single-molecule force spectroscopy (SMFS) measurements5. Single-molecule approaches are particularly well suited to characterizing transition paths23,24, because transition paths are inherently a property of individual molecules. In SMFS, tension is applied to the ends of a single molecule, and its extension, which serves as the reaction coordinate, is measured as the conformation fluctuates25. Transition paths can be identified clearly because the extension can be measured with high precision. SMFS also provides effective ways of measuring the folding landscape6. When transition-path analysis was applied to two-state DNA hairpins, their folding dynamics were found to match expectations for 1D diffusion5. Proteins pose a greater conceptual challenge for 1D descriptions, however, because of their complex tertiary structure.
We first analysed folding trajectories of the prion protein PrP, which was previously shown to have a two-state native folding pathway, although it can also form transient misfolded states9. Natively folded PrP molecules attached covalently at each terminus to DNA handles were bound to beads held in high-resolution optical tweezers (Fig. 2a). Trajectories of the molecular extension measured in equilibrium at a constant force of 9–10 pN, near the value at which folded and unfolded states were equally likely, showed multiple transitions between the native and unfolded states (Fig. 2b).
We calculated p(TP|x) using the Bayesian relation4

$$p(\mathrm{TP}\mid x) = \frac{P(x\mid \mathrm{TP})\,p(\mathrm{TP})}{P(x)}, \tag{1}$$

where P(x) is the equilibrium distribution of extension values in the complete trajectory, P(x|TP) is the distribution of extension values along only the transition paths, and p(TP) is the fraction of time spent on transition paths. Transition paths were identified as the parts of the trajectory (Fig. 2c, red and blue) transiting between two boundaries, x1 and x2 (Fig. 2c, dotted lines), chosen to bracket the barrier region between the folded and unfolded states. As described previously, this analysis must be corrected for instrumental effects on the measurement (here, the mechanical compliance)5. An additional complication in the case of PrP is the presence of misfolded states9, which contribute to P(x) even though they are excluded from P(x|TP) because they do not transit the full distance between folded and unfolded states. We corrected simultaneously for both compliance effects and misfolded states by replacing P(x) with the probability distribution P0(x) obtained from the 1D energy profile for PrP folding, calculated from non-equilibrium pulling curves10. Here, misfolded states were deselected kinetically by the pulling regime and compliance effects were removed through deconvolution.
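As a rough illustration of equation (1), the following Python sketch flags transition paths in a sampled extension trajectory (excursions into the region between x1 and x2 that leave through the opposite boundary) and combines histograms of P(x), P(x|TP) and p(TP). It assumes uniform time steps, the folded state at lower extension and uniform bins; the function names are hypothetical, and neither the compliance correction nor the replacement of P(x) by P0(x) described above is included.

```python
# Minimal sketch (hypothetical helper names): p(TP|x) from an extension
# trajectory via Bayes' rule, eq. (1). Assumes uniform time steps, folded
# state at low extension (x <= x1), unfolded state at high extension
# (x >= x2), and no compliance or misfolding correction.

def transition_path_mask(traj, x1, x2):
    """Flag samples on transition paths: excursions into (x1, x2) that
    leave through the boundary opposite to the one they entered from."""
    mask = [False] * len(traj)
    last_side = None  # -1 if last seen at/below x1, +1 if at/above x2
    start = None      # index where the current excursion into (x1, x2) began
    for i, x in enumerate(traj):
        if x <= x1 or x >= x2:
            side = -1 if x <= x1 else 1
            if start is not None and last_side is not None and side != last_side:
                for j in range(start, i):  # full barrier crossing: reactive
                    mask[j] = True
            start = None
            last_side = side
        elif start is None:
            start = i
    return mask


def p_tp_given_x(traj, x1, x2, n_bins):
    """Return bin centres and p(TP|x) = P(x|TP) p(TP) / P(x) on (x1, x2)."""
    mask = transition_path_mask(traj, x1, x2)
    width = (x2 - x1) / n_bins
    all_counts = [0] * n_bins
    tp_counts = [0] * n_bins
    for x, on_tp in zip(traj, mask):
        if x1 < x < x2:
            k = min(int((x - x1) / width), n_bins - 1)
            all_counts[k] += 1
            if on_tp:
                tp_counts[k] += 1
    n_total = len(traj)
    n_tp = sum(mask)
    p_tp = n_tp / n_total  # fraction of time spent on transition paths
    centres, probs = [], []
    for k in range(n_bins):
        centres.append(x1 + (k + 0.5) * width)
        if all_counts[k] == 0 or n_tp == 0:
            probs.append(None)  # no data in this bin
            continue
        p_x = all_counts[k] / (n_total * width)  # P(x)
        p_x_tp = tp_counts[k] / (n_tp * width)   # P(x|TP)
        probs.append(p_x_tp * p_tp / p_x)
    return centres, probs
```

With plain histograms the three factors collapse to the fraction of samples in each bin that lie on transition paths; keeping them separate makes it explicit that P(x) is the factor one would swap for a corrected distribution such as P0(x).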
The result for p(TP|x) is highly peaked, reaching a maximum value of about 0.45 (Fig. 3a, black). These features are indicative of a good reaction coordinate3; the two states are well resolved along the coordinate, and the protein is very likely to be found on a transition path in the region between the states, leading to a highly peaked p(TP|x); ideally, p(TP|x) should reach 0.5 at the barrier between the states. Indeed, the location of the energy barrier, x‡ (Fig. 3a, dashed line), found from the reconstructed landscape (Fig. 3a, blue), was very close to the peak in p(TP|x), well within the resolution of the reconstruction.
Having established the quality of extension as a reaction coordinate, we next tested whether the statistics of the transition paths were well described by 1D diffusion over the measured landscape. In the case of ideal diffusion4, one should have p(TP|x) = 2pfold(x)[1 − pfold(x)], where pfold(x) is the committor, the probability that a molecule starting at x will reach the folded state before the unfolded state7. For a two-state system, pfold(x) is approximately 0 near the unfolded state, 1 near the folded state, and 1/2 at the top of the barrier. In the case of diffusive dynamics along a 1D energy profile G(x), and assuming for simplicity a constant diffusion coefficient4,8, pfold(x) is given by8

$$p_{\mathrm{fold}}(x) = \frac{\int_{x}^{x_u} \exp\left[G(x')/k_BT\right]\,dx'}{\int_{x_f}^{x_u} \exp\left[G(x')/k_BT\right]\,dx'}, \tag{2}$$

where kBT is the thermal energy and xf and xu are absorbing boundaries near the folded (shorter-extension) and unfolded states, respectively. Using the result (Fig. 3b, orange) to calculate ϕ(x) = 2pfold(x)[1 − pfold(x)] (Fig. 3a, orange), we found that ϕ(x) agreed surprisingly well with p(TP|x): the location, height and width of the two peaks were all very similar, well within the limits of experimental uncertainty. The folding dynamics are thus well described by 1D diffusion over the measured landscape, the central result of this work. We confirmed this result using an alternative approach, calculating pfold(x) directly from the extension trajectory rather than from the 1D landscape (see Methods): for 1D diffusion, both methods should yield the same result8. Indeed, the landscape pfold (Fig. 3b, orange) agreed very well with the trajectory pfold (Fig. 3b, black), confirming that 1D diffusion over the reconstructed landscape describes the dynamics well.
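Equation (2) can be evaluated numerically for any tabulated profile. The following is a minimal Python sketch under stated assumptions: G is sampled on an increasing grid whose first point is xf and whose last point is xu, energies are in units of kBT by default, and the integrals use the trapezoid rule. The helper names are hypothetical. The sketch also returns ϕ(x), which reaches its maximum of 1/2 where pfold = 1/2.

```python
import math


def committor_from_landscape(xs, G, kT=1.0):
    """p_fold on a grid xs running from x_f (folded) to x_u (unfolded), eq. (2),
    for 1D diffusion with a constant diffusion coefficient.
    Integrals are approximated with the trapezoid rule."""
    g_max = max(G)
    # Shifting G by a constant cancels in the ratio and avoids overflow.
    w = [math.exp((g - g_max) / kT) for g in G]
    tail = [0.0] * len(xs)  # tail[i] approximates the integral from xs[i] to x_u
    for i in range(len(xs) - 2, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * (w[i] + w[i + 1]) * (xs[i + 1] - xs[i])
    return [t / tail[0] for t in tail]


def phi(p_fold):
    """Ideal-diffusion prediction phi(x) = 2 p_fold (1 - p_fold)."""
    return [2 * p * (1 - p) for p in p_fold]
```

For a symmetric barrier, pfold passes through 1/2 at the barrier top, so ϕ peaks there at 1/2, the ideal value for p(TP|x) noted above.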
The notion that 1D approximations are plausible was supported by previous work showing that PrP folding kinetics were consistent with Kramers’ theory over several orders of magnitude10, and that the kinetics of other proteins were similarly consistent with 1D models15. Our new results provide a deeper and more direct test of protein folding as a diffusive search over an energy landscape, showing that not only the kinetics but more importantly the statistics of the transition paths—the most important parts of the folding trajectories—match predictions for 1D diffusion over the measured landscape. The quantitative nature of the agreement is remarkable, given the size and complexity of the structure being formed: 104 amino acids forming 3 helices, 2 strands and multiple loops.
To test whether a similar result holds for other proteins, we analysed equilibrium folding trajectories of a coiled-coil leucine zipper, which in contrast to PrP exhibited three-state behaviour with an obligate intermediate11 (Fig. 4a). Treating the folding as sequential two-state transitions, previous work found that end-to-end extension was not a good reaction coordinate for the I ↔ U transition12, making this protein an interesting test case for transition-path analysis. We defined boundaries x1 and x2 for each transition as above, calculating p(TP|x) from equation (1) for each transition (Fig. 4b, black: F ↔ I, grey: I ↔ U), using the compliance-deconvolved11 distribution P0(x), as well as ϕ(x) from equation (2) (Fig. 4b, orange: F ↔ I, brown: I ↔ U), using the deconvolved landscape (Fig. 4b, blue). Reasonable agreement was found for F ↔ I, confirming that its folding dynamics are well described by diffusion over the measured energy profile. However, the test failed for I ↔ U: there were more non-reactive fluctuations into the barrier region than expected (even after accounting for compliance effects), depressing p(TP|x). Extension was thus not a good reaction coordinate for this transition, as found previously12.
The I ↔ U transition provides a counterexample where diffusion over the measured landscape does not describe the observed dynamics well, highlighting the importance of reaction-coordinate quality. Whereas the reaction coordinate can be engineered in computations to ensure optimal low-dimensional descriptions of the dynamics13,14, in experiments it is imposed by the choice of assay, without any particular privilege; here, for example, the applied force does not ensure that the reaction coordinate is always good. Changing the pulling axis may permit reaction-coordinate optimization22, but such an optimization has never been done experimentally.
The ability to capture the folding dynamics in a single dimension is usually understood intuitively as indicating a clear separation of timescales between a single slow coordinate that dominates the behaviour and faster dynamics along all other coordinates18,26 (although this explanation is not formally dispositive21). An important implication is that the transition paths probably funnel through a single, well-defined region of phase space acting as the transition-state ensemble (multiple pathways with different diffusivities would be likely to prevent quantitative agreement with 1D diffusion), suggesting that the transition-state ensemble can be identified in a physically meaningful way22.
It will be instructive to apply transition-path analysis more widely, to understand better the limits of 1D descriptions. It will be particularly interesting to analyse proteins exhibiting evidence of multiple competing pathways17, distributions of barriers27 and dynamic disorder19, forms of ‘anomalous’ diffusion such as subdiffusion of the backbone28, or particularly complex structures such as knots16, to obtain a quantitative look at how 1D descriptions break down (and the quality of the reaction coordinate in these cases). In addition to surveying different proteins, transition-path analysis will also be valuable for going beyond the current study, which was limited to equilibrium measurements under tension, to test whether the result depends on the mode of denaturation (for example, force, temperature, chemical denaturant), the probe used, or other measurement conditions (for example, equilibrium versus non-equilibrium), all of which may alter key folding properties such as the dominant pathways and barriers and hence the effective dimensionality29,30. Such studies should help establish how widely 1D landscapes can be applied and under what conditions 1D descriptions fail.
Sample preparation and measurement.
Samples of truncated hamster prion protein, PrP(90–231), were expressed, purified, refolded and attached covalently at each terminus to double-stranded DNA handles roughly 1 kilobase in length as described previously9. Protein–DNA chimaeras were bound specifically to 600-nm- and 810-nm-diameter polystyrene beads labelled with avidin and anti-digoxigenin, respectively. Samples were placed in 50 mM Mops, pH 7.0, 200 mM KCl and an oxygen scavenging system9 for measurement using a custom dual-trap optical tweezers apparatus described previously31. Extension trajectories were measured at equilibrium under a constant force of 9–10 pN, maintained by a passive force clamp to avoid artefacts in the transition region32, sampled at 50 kHz or 20 kHz and filtered online at the Nyquist frequency.
Folding trajectories of the leucine zipper were generously provided by C. Gebhardt and M. Rief. The sample preparation and measurement conditions have been described previously11. Briefly, the construct (consisting of three tandem repeats of the GCN4 leucine zipper) was attached to labelled DNA handles as done for PrP. Protein–DNA chimaeras were bound specifically to polystyrene beads held in dual-trap optical tweezers. Extension trajectories were measured in phosphate-buffered saline at equilibrium with a constant trap position, using a pretension such that all three states were occupied, sampled at 100 kHz and filtered at 20 kHz. Note that the lowest-extension state in these trajectories is not the fully folded native state of the leucine zipper, which unfolds at a much lower force, but rather a partially folded intermediate. For simplicity of labelling, however, we treat it here as the folded state under tension.
Transition paths were identified as the parts of the trajectory traversing between two boundaries, x1 and x2, respectively near the folded and unfolded states, chosen so as to allow the transition paths to be identified clearly while excluding most of the trajectory spent on non-productive attempts to cross the barrier. They were therefore located on the shoulders of the peaks in P(x) corresponding respectively to the folded and unfolded states, on the side of the peaks towards the barrier region, at the inflection points of Gaussian fits to the peaks in P(x). In the case of the leucine zipper, the two sequential transitions were analysed separately, as independent two-state transitions.
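The boundary choice can be sketched as follows. A Gaussian has its inflection points one standard deviation from its mean, so x1 and x2 sit at μ + σ of the folded peak and μ − σ of the unfolded peak. For brevity, this hypothetical helper estimates μ and σ from the sample mean and standard deviation on either side of a provisional split point instead of fitting Gaussians to P(x); barrier-region samples will bias it somewhat, and a least-squares fit would be closer to the procedure described.

```python
import statistics


def boundaries_from_peaks(traj, split):
    """Place x1 and x2 at the barrier-side inflection points of Gaussians
    approximating the folded (x < split) and unfolded (x >= split) peaks.
    Simplification (assumption): sample mean and s.d. of each side stand in
    for a Gaussian fit to P(x)."""
    folded = [x for x in traj if x < split]
    unfolded = [x for x in traj if x >= split]
    mu_f, sd_f = statistics.mean(folded), statistics.pstdev(folded)
    mu_u, sd_u = statistics.mean(unfolded), statistics.pstdev(unfolded)
    # A Gaussian exp[-(x - mu)^2 / (2 sd^2)] has inflection points at mu +/- sd;
    # keep the one on the barrier side of each peak.
    return mu_f + sd_f, mu_u - sd_u
```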
For PrP, the compliance-corrected distribution P0(x) was found as the equilibrium distribution expected from Boltzmann’s formula using the energy landscape for native folding calculated from non-equilibrium force-extension curves via the Hummer–Szabo formalism33,34, after deconvolution of compliance effects10. The resolution of the landscape reconstruction for PrP was 1.4 nm (ref. 10). For the leucine zipper, P0(x) was found by empirical deconvolution of P(x) using the measured point-spread function, taking into account the position-dependence of the point-spread function arising from the constant-trap-position measurement modality11. To maintain the normalization of p(TP|x), p(TP) was multiplied by a correction factor accounting for the fraction of the statistical weight in the transition region that was induced by the instrumental compliance, as described previously5. For PrP, p(TP|x) was calculated for 3,759 transitions; for the leucine zipper, 32,689 F ↔ I transitions and 283 I ↔ U transitions were analysed.
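For the Boltzmann step alone, a minimal sketch: given a landscape G(x) that has already been reconstructed and deconvolved (the Hummer–Szabo reconstruction itself is not shown), P0(x) ∝ exp[−G(x)/kBT], normalized over the grid. The grid, the trapezoid-rule normalization and the function name are assumptions for illustration.

```python
import math


def boltzmann_distribution(xs, G, kT=1.0):
    """P0(x) proportional to exp[-G(x)/kT], normalized over the grid xs
    with the trapezoid rule. G is assumed to be already deconvolved."""
    g_min = min(G)
    # Shifting by g_min avoids overflow and cancels on normalization.
    w = [math.exp(-(g - g_min) / kT) for g in G]
    z = sum(0.5 * (w[i] + w[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))
    return [wi / z for wi in w]
```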
For calculating ϕ(x), the splitting probability was determined from the energy profile measured for each protein (Figs 3a and 4b, blue) via equation (2). The landscape after deconvolution was used in each case, to avoid artefacts from compliance effects35. The boundaries xf and xu were chosen to be near the folded and unfolded peaks in P(x); the result was insensitive to the precise choice of boundary location35.
To calculate pfold(x) empirically from the extension trajectory, for comparison with the landscape-derived committor (Fig. 3b), we used8

$$p_{\mathrm{fold}}(x) = \frac{\int \delta[x - x(t)]\,c(t)\,dt}{\int \delta[x - x(t)]\,dt}, \tag{3}$$

where x(t) is the measured extension at time t, δ is the Dirac delta function and the function c(t) is 1 if, in the interval after time t, the trajectory hits the folded state (represented by an absorbing boundary xf) before it hits the unfolded state (at xu); otherwise it is 0. As the misfolded states in the trajectory for PrP can alter pfold calculated from the trajectory but are very short-lived9, we minimized their influence by median-filtering the trajectory in a 1-ms window before calculating pfold. The result was relatively insensitive to the filter window size, in the range from ∼0.5 to 2 ms (Supplementary Fig. 1).
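A minimal Python sketch of equation (3), assuming a uniformly sampled trajectory with the folded state at low extension: a backward sweep assigns c(t) to every sample, and binning replaces the delta functions. A running median filter is included because the PrP trajectory was filtered first; at 50 kHz, a 1-ms window is 50 samples. Function names and bin edges are hypothetical.

```python
import bisect
import statistics


def median_filter(traj, window):
    """Running median over samples within window // 2 of each point
    (2 * (window // 2) + 1 samples in the interior; truncated at the edges)."""
    half = window // 2
    return [statistics.median(traj[max(0, i - half): i + half + 1])
            for i in range(len(traj))]


def committor_from_trajectory(traj, x_f, x_u, edges):
    """Histogram estimate of eq. (3). c(t) = 1 if the trajectory reaches the
    folded boundary x_f (low extension) before x_u, else 0; samples whose
    fate is unresolved at the end of the record are skipped."""
    fate = [None] * len(traj)
    upcoming = None  # outcome of the first boundary hit at or after sample i
    for i in range(len(traj) - 1, -1, -1):
        if traj[i] <= x_f:
            upcoming = 1
        elif traj[i] >= x_u:
            upcoming = 0
        fate[i] = upcoming
    hits = [0] * (len(edges) - 1)
    visits = [0] * (len(edges) - 1)
    for x, c in zip(traj, fate):
        k = bisect.bisect_right(edges, x) - 1
        if c is None or not 0 <= k < len(hits):
            continue
        hits[k] += c    # numerator: sum of delta[x - x(t)] c(t)
        visits[k] += 1  # denominator: sum of delta[x - x(t)]
    return [h / v if v else None for h, v in zip(hits, visits)]
```

For example, `committor_from_trajectory(median_filter(traj, 50), x_f, x_u, edges)` would give a trajectory-based committor to set against the landscape-based one from equation (2).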
We thank C. Gebhardt and M. Rief for kindly providing data from the leucine zipper. This work was supported by the Alberta Prion Research Institute, Alberta Innovates Technology Solutions, the Natural Sciences and Engineering Research Council, and the National Research Council.