The energy cost of polypeptide knot formation and its folding consequences

Knots are natural topologies of chains. Yet, little is known about spontaneous knot formation in a polypeptide chain—an event that can potentially impair its folding—and about the effect of a knot on the stability and folding kinetics of a protein. Here we used optical tweezers to show that the free energy cost to form a trefoil knot in the denatured state of a polypeptide chain of 120 residues is 5.8 ± 1 kcal mol−1. Monte Carlo dynamics of random chains predict this value, indicating that the free energy cost of knot formation is of entropic origin. This cost is predicted to remain above 3 kcal mol−1 for denatured proteins as large as 900 residues. Therefore, we conclude that naturally knotted proteins cannot attain their knot randomly in the unfolded state but must pay the cost of knotting through contacts along their folding landscape.

obtained using the "hedgehog" method 1. First, a set of $N$ vectors $\mathbf{u}_1, \ldots, \mathbf{u}_N$, the axes of the cylinders, is sampled uniformly at random from the unit sphere (a "hedgehog", as the vectors have a common origin). Next, for each vector $\mathbf{u}_i$ in the set, a non-parallel vector $\mathbf{u}_j$ is randomly chosen, and the pair is then rotated by a random angle about the axis determined by $\mathbf{u}_i$ and $\mathbf{u}_j$. This process is repeated about 500 times. The vectors of this random hedgehog are then randomly selected to build a chain. Last, if $d > 0$, the chain is checked for compliance with the excluded-volume (EV) constraint: if the distance between the axes of every pair of non-adjacent cylinders is greater than $d$, the chain is accepted; otherwise it is discarded.
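The sampling procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the rotation axis is assumed to lie along $\mathbf{u}_i + \mathbf{u}_j$ (the choice that keeps each pair's sum fixed), the vectors are assumed to be used in a random order, and the segment-to-segment distance uses the standard closest-point routine for two line segments. All function names (`hedgehog`, `segment_distance`, `satisfies_ev`, `sample_sapc`) are hypothetical.

```python
import numpy as np


def random_unit_vectors(n, rng):
    """Draw n vectors uniformly from the unit sphere (normalized Gaussians)."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def rotate(v, axis, angle):
    """Rotate v about the unit vector `axis` by `angle` (Rodrigues' formula)."""
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))


def hedgehog(n, rng, n_sweeps=500):
    """Random hedgehog: rotate random pairs about the axis along their sum (assumed)."""
    u = random_unit_vectors(n, rng)
    for _ in range(n_sweeps):
        for i in range(n):
            j = int(rng.integers(n - 1))
            if j >= i:                      # pick a partner j != i
                j += 1
            if np.linalg.norm(np.cross(u[i], u[j])) < 1e-9:
                continue                    # skip (anti)parallel pairs
            axis = u[i] + u[j]
            axis /= np.linalg.norm(axis)
            angle = rng.uniform(0.0, 2.0 * np.pi)
            u[i] = rotate(u[i], axis, angle)
            u[j] = rotate(u[j], axis, angle)
    return u


def segment_distance(p1, q1, p2, q2):
    """Minimum distance between segments [p1, q1] and [p2, q2] (non-degenerate)."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, b = d1 @ d1, d2 @ d2, d1 @ d2
    c, f = d1 @ r, d2 @ r
    denom = a * e - b * b
    s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > 1e-12 else 0.0
    t = (b * s + f) / e
    if t < 0.0:
        t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
    elif t > 1.0:
        t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return np.linalg.norm((p1 + s * d1) - (p2 + t * d2))


def satisfies_ev(vertices, d):
    """Accept only if all non-adjacent segment axes are farther apart than d."""
    n = len(vertices) - 1
    for i in range(n):
        for j in range(i + 2, n):
            if segment_distance(vertices[i], vertices[i + 1],
                                vertices[j], vertices[j + 1]) <= d:
                return False
    return True


def sample_sapc(n, d, rng, n_sweeps=500):
    """Rejection-sample one open chain of n unit segments and thickness d."""
    while True:
        u = hedgehog(n, rng, n_sweeps)
        steps = u[rng.permutation(n)]       # vectors taken in random order
        vertices = np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])
        if d == 0 or satisfies_ev(vertices, d):
            return vertices
```

Rejection sampling keeps the accepted chains uniformly distributed over the allowed (non-overlapping) conformations, at the price of a falling acceptance rate as $N$ or $d$ grows.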

Identification of knotted chains present in the ensemble of open chains.
The knot type of a particular chain is determined as follows. Using Taylor's KNOT program 2, the chain is iteratively contracted around its fixed termini as if it were a rubber band, while ensuring that no two parts of it pass through each other. If within 500 iterations the chain is reduced to just its termini, it is classified as unknotted. Otherwise, the resulting configuration is fed into Harris and Harvey's Knot program 3, which computes the Alexander polynomial of the configuration to determine its knot type (this program implements the algorithm developed by Vologodskii and co-workers 4). In what follows we denote by $\mathcal{K}$ the set of all knots with 10 or fewer crossings 5. The knot with 3 crossings is called the trefoil or $3_1$ knot, and that with 0 crossings is called the trivial knot, unknot or $0_1$ knot.
If the sampling experiment is repeated $M$ times, yielding the fractions $P_k^{(j)}(N,d)$, $j = 1, \ldots, M$, of $k$-knotted chains, then
$$\bar{P}_k(N,d) = \frac{1}{M}\sum_{j=1}^{M} P_k^{(j)}(N,d) \tag{2}$$
$$s_k^2(N,d) = \frac{1}{M-1}\sum_{j=1}^{M}\left[P_k^{(j)}(N,d) - \bar{P}_k(N,d)\right]^2 \tag{3}$$
are the sample mean and variance, respectively. Here, the probabilities $P_k^{(1)}, \ldots, P_k^{(M)}$ are assumed to be independent and identically distributed 6 with unknown mean $\mu_k$ and variance $\sigma_k^2$, so that $\mu_k$ is estimated with the unbiased estimator $\bar{P}_k(N,d)$, and $\sigma_k^2$ with the unbiased estimator $s_k^2(N,d)$. In particular, the variance (mean squared error) $\sigma_k^2/M$ of the sample mean in Eq. (2) is estimated by $s_k^2(N,d)/M$.
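The estimators of Eqs. (2) and (3) and the standard error $\sqrt{s_k^2/M}$ can be computed as in the short sketch below. The function names and the trefoil counts are hypothetical, chosen only to illustrate the arithmetic; each repeat's probability is taken as the fraction $n_k/n$ of $k$-knotted chains.

```python
import numpy as np


def knot_probabilities(knot_counts, n_chains):
    """Per-experiment probabilities P_k^(j) = n_k^(j) / n for one knot type k."""
    return np.asarray(knot_counts, dtype=float) / n_chains


def estimate(samples):
    """Return the sample mean (Eq. 2), unbiased sample variance (Eq. 3),
    and the estimated standard error of the mean, sqrt(s^2 / M)."""
    x = np.asarray(samples, dtype=float)
    m = x.size
    mean = x.mean()
    var = x.var(ddof=1)           # divides by M - 1: unbiased estimator
    return mean, var, np.sqrt(var / m)


# Hypothetical trefoil counts from M = 4 repeats of 10,000 chains each.
p = knot_probabilities([105, 98, 112, 101], 10_000)
mean, var, sem = estimate(p)
print(f"P_31 = {mean:.5f} +/- {sem:.5f}")
```

Using `ddof=1` is what makes the variance estimator unbiased; the default `ddof=0` would underestimate $\sigma_k^2$ for small $M$.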
Previous studies of closed DNA, modeled as a freely jointed chain of cylindrical segments, found that the probability of knotting decays exponentially with the segment's diameter-to-length ratio 1. Here, Arc-L1-Arc is modeled as an open freely jointed chain with 30 Kuhn segments of diameter $D$ and length $2P$, where $P$ is the persistence length of the polypeptide. In the notation of our model and in Kuhn-length units, the polypeptide is represented by an SAPC of 30 cylindrical segments of length one and diameter $d = D/2P$ (the segment's diameter-to-length ratio). In Fig. 4B of the main text we show the dependence on thickness for a trefoil knot in chains of 30 segments, for thickness in the range 0.00 to 0.40. Taking the experimentally determined values of the diameter (0.58 nm) and persistence length (0.7 nm) of a polypeptide chain, we obtain a thickness of $0.58/(2 \times 0.7) \approx 0.4$. For this value, the observed difference in free energy between the unknotted and $3_1$-knotted chain states is about 6.4 kcal mol−1, in agreement with the value deduced from the experimental data, 5.8 ± 1 kcal mol−1.
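The thickness arithmetic can be checked with the sketch below. To show what a free energy of 6.4 kcal mol−1 means in terms of sampled chains, it also assumes the usual relation $\Delta G = -RT \ln(P_{3_1}/P_{0_1})$ and a temperature of 298.15 K; neither the relation nor the temperature is stated in this section, and the function names are hypothetical.

```python
import math

R_KCAL = 8.314462618 / 4184.0    # gas constant in kcal mol^-1 K^-1


def thickness(diameter_nm, persistence_nm):
    """Dimensionless thickness d = D / (2P): diameter in Kuhn-length units."""
    return diameter_nm / (2.0 * persistence_nm)


def knotting_free_energy(p_knot, p_unknot, temperature_k=298.15):
    """Assumed relation dG = -RT ln(P_knot / P_unknot), in kcal/mol."""
    return -R_KCAL * temperature_k * math.log(p_knot / p_unknot)


def probability_ratio(dg_kcal, temperature_k=298.15):
    """Inverse relation: P_knot / P_unknot implied by dG (kcal/mol)."""
    return math.exp(-dg_kcal / (R_KCAL * temperature_k))


d = thickness(0.58, 0.7)        # 0.58 / 1.4, i.e. about 0.41
ratio = probability_ratio(6.4)  # about 2e-5 at 298.15 K
print(f"d = {d:.3f}, P_31/P_01 = {ratio:.2e}")
```

Under these assumptions, a cost of about 6.4 kcal mol−1 corresponds to roughly two trefoils per hundred thousand unknotted chains, which is why large ensembles are needed to estimate it.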
In order to obtain the variation of the free energy cost of forming a $3_1$ knot, and of forming any knot with 10 or fewer crossings ($\mathcal{K}$), as a function of chain length (Figure 4C), the computed values were fitted, yielding $y_0 = 4.4 \pm 0.07$ kcal mol−1, $\alpha = 4 \pm 0.14$ kcal mol−1 and $\beta = 0.033 \pm 0.026$. Within the accuracy of the calculations, the fitted curve for all knots in $\mathcal{K}$ was indistinguishable from that for the $3_1$ knot.
Knotting probability for a ring polymer.
A closed chain is modeled by a self-avoiding polygon (SAP), which is an SAPC whose ends coincide. This introduces a closure constraint requiring that the sequence of cylinder axes form a polygon. The conformational ensemble of SAPs is generated as for the SAPCs, except for the following change made to enforce the closure constraint 1: we choose an even number, $N$, of vectors (the cylinders' axes) such that the odd-indexed vectors are chosen randomly and independently, whereas the even-indexed ones are set to $\mathbf{u}_{2i} = -\mathbf{u}_{2i-1}$. Thus $\sum_{i=1}^{N} \mathbf{u}_i = \mathbf{0}$, which naturally enforces the closure constraint. All subsequent operations carried out to remove pair correlations between the vectors leave this sum invariant, so the resulting chain is automatically closed.
determined for open chains, and is comparable to that reported by Rybenko and coworkers 9 (a value of 22), obtained from the analysis of closed wormlike chains of 16 to 60 segments. However, when the fit of the open chains' data to an exponential decay function is extended to a thickness of 0.4, the proportionality constant decreases to 18.3. This occurs because the variation of the observed probabilities with chain thickness departs from an exponential function, as is apparent from the $\chi^2$/DF obtained for an exponential fit ($\chi^2$/DF = 5.0). Such a deviation for chains of 30 segments was also observed, though not interpreted, in the work of Klenin and coworkers 1.
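The closure construction for rings can be sketched as below: odd-indexed vectors are random, each even-indexed vector is the negative of its predecessor, and subsequent pair rotations are taken about the axis along $\mathbf{u}_i + \mathbf{u}_j$, an assumed choice that leaves each pair's sum, and therefore the total sum, unchanged. The EV check is omitted for brevity, and the function names are hypothetical.

```python
import numpy as np


def rotate(v, axis, angle):
    """Rotate v about the unit vector `axis` by `angle` (Rodrigues' formula)."""
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))


def closed_hedgehog(n, rng, n_sweeps=500):
    """N unit vectors with zero sum: u_2i = -u_(2i-1), then sum-preserving moves."""
    if n % 2:
        raise ValueError("n must be even")
    odd = rng.normal(size=(n // 2, 3))
    odd /= np.linalg.norm(odd, axis=1, keepdims=True)
    u = np.empty((n, 3))
    u[0::2] = odd       # u_1, u_3, ... (1-based odd indices): random
    u[1::2] = -odd      # u_2 = -u_1, u_4 = -u_3, ...
    for _ in range(n_sweeps):
        for i in range(n):
            j = int(rng.integers(n - 1))
            if j >= i:
                j += 1
            axis = u[i] + u[j]
            norm = np.linalg.norm(axis)
            if norm < 1e-9:  # antiparallel pair: rotation axis undefined
                continue
            axis /= norm
            angle = rng.uniform(0.0, 2.0 * np.pi)
            # Rotating both vectors about the axis along u_i + u_j leaves
            # their sum, and hence the total sum, unchanged.
            u[i] = rotate(u[i], axis, angle)
            u[j] = rotate(u[j], axis, angle)
    return u


def polygon_vertices(u, rng):
    """Join the vectors in random order; the last vertex returns to the origin."""
    steps = u[rng.permutation(len(u))]
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])
```

Because both the pair rotations and the final random ordering preserve $\sum_i \mathbf{u}_i = \mathbf{0}$, no separate closure test is needed.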
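Calculation of the radius of gyration.
The squared radius of gyration of an $N$-segment SAPC of diameter $d$, with vertices $\mathbf{x}_0, \ldots, \mathbf{x}_N$, is given by
$$R_g^2 = \frac{1}{(N+1)^2}\sum_{0 \le i < j \le N} r_{ij}^2,$$
where $r_{ij} = |\mathbf{x}_i - \mathbf{x}_j|$ is the distance between the SAPC's vertices $i$ and $j$. The mean-square radius of gyration of such chains is then
$$\langle R_g^2 \rangle(N,d) = \frac{1}{(N+1)^2}\sum_{0 \le i < j \le N} \langle r_{ij}^2 \rangle,$$
where the brackets $\langle \cdots \rangle$ denote the average over all possible configurations of $\mathbf{u}_1, \ldots, \mathbf{u}_N$. The root-mean-square radius of gyration is denoted by $R_g(N,d) = \langle R_g^2 \rangle(N,d)^{1/2}$.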
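A direct implementation of the pairwise formula is sketched below; the function names are hypothetical. The pairwise sum divided by $(N+1)^2$ is identical to the mean squared distance of the vertices from their centroid, which is a convenient cross-check.

```python
import numpy as np


def rg_squared(vertices):
    """R_g^2 = (1 / (N + 1)^2) * sum_{i<j} r_ij^2 over the N + 1 vertices."""
    x = np.asarray(vertices, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    r2 = np.sum(diff ** 2, axis=-1)          # matrix of squared distances r_ij^2
    return np.triu(r2, k=1).sum() / len(x) ** 2


def rms_rg(chains):
    """R_g(N, d) = <R_g^2>^(1/2), averaging R_g^2 over a sample of chains."""
    return np.sqrt(np.mean([rg_squared(c) for c in chains]))


rod = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]     # straight chain of N = 2 segments
print(rg_squared(rod))                       # pairs: 1 + 4 + 1 = 6, divided by 3^2
```

The all-pairs matrix costs $O(N^2)$ memory, which is negligible for chains of a few hundred segments.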
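In an experiment in which, out of a sample of $n$ chains, $n_k > 0$ are $k$-knotted, the mean-square radii of gyration of the $k$-knotted chains and of all chains are computed, respectively, as
$$\langle R_g^2 \rangle_k = \frac{1}{n_k}\sum_{c\,:\,k\text{-knotted}} R_g^2(c), \qquad \langle R_g^2 \rangle = \frac{1}{n}\sum_{c=1}^{n} R_g^2(c),$$
where $R_g^2(c)$ is the squared radius of gyration of chain $c$. As for the probabilities, if the experiment is repeated $M$ times, yielding root-mean-square radii of gyration $R_{g,k}^{(j)}(N,d)$, $1 \le j \le M$, the best estimate for the root-mean-square radius of gyration $R_{g,k}(N,d)$ is
$$R_{g,k}(N,d) \simeq \bar{R}_{g,k}(N,d) \pm \frac{s_{R,k}(N,d)}{\sqrt{M}}, \tag{12}$$
where $\bar{R}_{g,k}(N,d)$ and $s_{R,k}^2(N,d)$ are the sample mean and variance, respectively [see Eqs. (2) and (3)]. In the absence of excluded-volume interactions ($d = 0.0$), the following well-known analytical result holds (assuming unit-length segments):
$$\langle R_g^2 \rangle(N, 0.0) = \frac{N(N+2)}{6(N+1)},$$
which can be used to check the accuracy of $R_g(N, 0.0)$ obtained from Eq. (12).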
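The $d = 0$ check can be run as in the sketch below, which samples ideal freely jointed chains of unit segments and compares the Monte Carlo mean of $R_g^2$ with the exact value $N(N+2)/[6(N+1)]$. The function names, seed and sample size are illustrative choices, not the authors' settings; $R_g^2$ per chain is computed from the centroid, which is equivalent to the pairwise-distance formula.

```python
import numpy as np


def rg2_ideal(n):
    """Exact <R_g^2> of a freely jointed chain of n unit segments (d = 0)."""
    return n * (n + 2) / (6.0 * (n + 1))


def rg2_monte_carlo(n, n_chains, rng):
    """Sample ideal chains (d = 0); return the mean R_g^2 and its standard error."""
    u = rng.normal(size=(n_chains, n, 3))
    u /= np.linalg.norm(u, axis=2, keepdims=True)            # unit bond vectors
    start = np.zeros((n_chains, 1, 3))
    verts = np.concatenate([start, np.cumsum(u, axis=1)], axis=1)
    centered = verts - verts.mean(axis=1, keepdims=True)
    rg2 = np.sum(centered ** 2, axis=2).mean(axis=1)          # R_g^2 per chain
    return rg2.mean(), rg2.std(ddof=1) / np.sqrt(n_chains)


rng = np.random.default_rng(42)
mc, se = rg2_monte_carlo(30, 20_000, rng)
exact = rg2_ideal(30)                 # 30 * 32 / (6 * 31) = 160 / 31
print(f"MC {mc:.3f} +/- {se:.3f}  vs exact {exact:.3f}")
```

Agreement within a few standard errors is the expected outcome; a systematic offset would point to a bug in the chain generator or in the $R_g$ calculation.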
In Supplementary Fig. 8
Each unfolding force is plotted against the force of its corresponding refolding event determined for Arc-L1-Arc (turquoise and magenta dots). The average unfolding and refolding forces obtained for each stretched molecule (turquoise and magenta, with error bars) are also plotted; error bars represent the average SD. Turquoise and magenta symbols represent the Arc-L1-Arc molecules whose unfolding/refolding forces were classified as "high" and "low", respectively. b. Cluster analysis of unfolding and refolding forces of pARC. Superimposed on the data are the average unfolding and refolding forces obtained for each stretched molecule (blue triangles with error bars).
half of the time folded or unfolded (F1/2): 10 ± 1 pN for knotted Arc-L1-Arc, 6.1 ± 0.5 pN for the unknotted protein, and 6.8 ± 0.5 pN for pARC. The molecular extensions calculated at F1/2 for the knotted and unknotted proteins and for pARC are 16.2 ± 1, 13.4 ± 0.9 and 15.4 ± 0.7 nm, respectively. The free energy ΔG(F1/2) is equal to F1/2 times the change in extension Δx(F1/2) between the folded and unfolded states, corrected by the stretching work of the unfolded state. The continuous lines represent fits of the Dudko equation to the unfolding force distributions (left panels) and to the semilogarithmic force dependence of the unfolding lifetimes (right panels), using ν = 1/2 (blue), ν = 2/3 (red) and ν = 1 (Bell's formula, green). The dashed lines represent simulations describing the distribution of refolding forces (left panel) and the semilogarithmic force dependence of their lifetimes (right panels) for ν = 1/2.
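For reference, the force-dependent lifetime in the Dudko-Hummer-Szabo model is $\tau(F) = \tau_0 (1 - \nu F x^\ddagger/\Delta G^\ddagger)^{1-1/\nu} \exp\{-\Delta G^\ddagger[1 - (1 - \nu F x^\ddagger/\Delta G^\ddagger)^{1/\nu}]/k_BT\}$, which reduces to Bell's formula $\tau_0 \exp(-F x^\ddagger/k_BT)$ for $\nu = 1$. The sketch below evaluates it for the three values of ν used in the fits. It covers only the lifetimes (right panels), not the loading-rate-dependent force distributions; the temperature (298.15 K), the barrier parameters and the function name are assumptions for illustration only.

```python
import math

KB_PN_NM = 1.380649e-2             # Boltzmann constant in pN nm / K
KT = KB_PN_NM * 298.15             # thermal energy, about 4.1 pN nm (assumed T)


def dudko_lifetime(force, tau0, x_ts, dg_ts, nu, kt=KT):
    """Force-dependent lifetime tau(F) in the Dudko-Hummer-Szabo model.

    force in pN, x_ts (distance to the transition state) in nm,
    dg_ts (barrier height at zero force) in pN nm; tau0 is the lifetime at F = 0.
    nu = 1/2 (cusp), 2/3 (linear-cubic), or 1 (Bell's formula).
    """
    if nu == 1.0:
        return tau0 * math.exp(-force * x_ts / kt)        # Bell limit
    g = 1.0 - nu * force * x_ts / dg_ts
    if g <= 0.0:
        raise ValueError("force beyond the range where the model applies")
    return tau0 * g ** (1.0 - 1.0 / nu) * math.exp(
        -(dg_ts / kt) * (1.0 - g ** (1.0 / nu)))


# Hypothetical barrier parameters, for illustration only.
for nu in (0.5, 2.0 / 3.0, 1.0):
    taus = [dudko_lifetime(f, 1.0, 1.0, 20 * KT, nu) for f in (0.0, 5.0, 10.0)]
    print(nu, [f"{t:.3g}" for t in taus])
```

Bell's formula is handled separately because it has no upper force limit, whereas for ν < 1 the barrier vanishes at $F = \Delta G^\ddagger/(\nu x^\ddagger)$.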