The energy cost of polypeptide knot formation and its folding consequences

Bustamante, Andrés; Sotelo-Campos, Juan; Guerra, Daniel G.; Floor, Martin; Wilson, Christian A. M.; Bustamante, Carlos; Báez, Mauricio

doi:10.1038/s41467-017-01691-1

Published: 17 November 2017

Nature Communications volume 8, Article number: 1581 (2017) Cite this article

An Author Correction to this article was published on 14 December 2017

This article has been updated

Abstract

Knots are natural topologies of chains. Yet, little is known about spontaneous knot formation in a polypeptide chain—an event that can potentially impair its folding—and about the effect of a knot on the stability and folding kinetics of a protein. Here we used optical tweezers to show that the free energy cost to form a trefoil knot in the denatured state of a polypeptide chain of 120 residues is 5.8 ± 1 kcal mol⁻¹. Monte Carlo dynamics of random chains predict this value, indicating that the free energy cost of knot formation is of entropic origin. This cost is predicted to remain above 3 kcal mol⁻¹ for denatured proteins as large as 900 residues. Therefore, we conclude that naturally knotted proteins cannot attain their knot randomly in the unfolded state but must pay the cost of knotting through contacts along their folding landscape.

Introduction

In natural proteins, the information encoded in their amino acid sequences directs their folding and attainment of the native state following precise kinetic and thermodynamic principles. Interestingly, analysis of the protein data bank reveals that only a small fraction (about 1%^{1, 2}) of all folded proteins form knots in their interior. This observation is somewhat surprising, given that knots are abundant in ensembles of generic equilibrated polymers^{1, 3}, where entanglements inevitably arise with increasing chain length and compactness^4,5,6,7,8. One common explanation for this bias is that knotting is not easily reconciled with the kinetics of folding because chains would require several well-coordinated steps to form a knot in a precise conformation^9,10,11. According to this view, most proteins could avoid knots because the kinetics of knotting is complex and slow^12,13,14 relative to that of folding. In this view, kinetic mechanisms encoded on a selected group of sequences play a central role in minimizing the entanglement of unknotted structures and guide the efficient formation of knots for that subset of chains.

Yet, a paucity of information exists today regarding the thermodynamic and kinetic principles that govern knot formation in globular proteins. Specifically: How are the folding path and the folding rate of these proteins affected by the knots? What prevents other proteins from undesired knotting? And what is the functional role, if any, of natural knots? Here, to address some of these questions, we determine the energetic cost of knotting of an unfolded polypeptide chain and its effect on the folding process.

Numerical simulations of extended random self-avoiding homopolymer models suggest that knots should be rare in extended configurations of proteins^{4, 15, 16}. However, these predictions have not been tested experimentally, because it is difficult to detect the presence of knots among ensembles of non-native states. Moreover, many denatured protein states show various degrees of residual structure and flexibility unaccounted for in these calculations^17,18,19, and the denatured states of some naturally knotted proteins appear to have great difficulty untying spontaneously in experiments performed in bulk^{12, 20, 21}. In such cases, direct approaches such as nuclear magnetic resonance²² and small-angle X-ray scattering²³ have been inadequate to establish the unknotted and knotted populations in the denatured state. Recently, Ziegler et al.¹³ studied the presence or absence of a knot in the mechanically unfolded state of ubiquitin C-terminal hydrolase isoenzyme L1 (UCH-L1) by pulling its structure from different points with optical tweezers. However, the structural complexity and the presence of multiple intermediates prevented them from determining the effect of a knot on the thermodynamic stability of a protein and the cost of its spontaneous formation in the mechanically unfolded state¹³. We have developed an alternative experimental approach based on the mechanical denaturation of a small artificial protein that overcomes these limitations.

Phage 22 ARC repressor is a homodimer with two RHH motifs²⁴ whose single-chain version—dubbed Arc-L1-Arc^{25, 26}—has been proposed as the first artificial protein containing a 3₁ knot²⁷. Using optical tweezers, we found that Arc-L1-Arc not only presents a 3₁ knot but also populates an unknotted configuration in its native state. We show that the native unknotted and knotted states of Arc-L1-Arc are almost isoenergetic, whereas these structures present different free energies of stabilization with respect to their respective mechanically unfolded states. Therefore, most of the difference in protein stability is due to the presence of the 3₁ knot trapped in the denatured state of the knotted form. The free energy cost to form a trefoil knot in the denatured state of Arc-L1-Arc is high (5.8 ± 1 kcal mol⁻¹) and also independent of the protein sequence since this value is well predicted by Monte Carlo simulations of random chains^{5, 6}. Application of random chain models to calculate the knotting cost for longer polypeptides predicts that knots are rare events in denatured states of proteins in general. Further analysis of unfolding and refolding force distributions indicates a moderate effect of a knot on the position of the transition state and unfolding barrier, together with a significant decrease of the distance from the denatured to the transition state. Altogether, we conclude that knots are avoided by their high cost of formation in unfolded chains and we surmise that naturally knotted proteins must have evolved specific folding pathways to pay the cost of knotting through contacts along their folding landscape.

Results

Mechanical unfolding of Arc-L1-Arc reveals its dual topology

Models of Arc-L1-Arc predict that it should form a 3₁ knot when the artificial L1 loop passes over the C-terminal helix of the last RHH motif (Fig. 1a, loop in turquoise) and an unknotted conformation otherwise (Fig. 1a, L1 loop in magenta). The scoring energy of 6000 structures generated by the Rosetta software²⁸ indicates that the lowest energies of the unknotted and knotted conformations of Arc-L1-Arc are almost isoenergetic (Fig. 1b). Therefore, there is a reasonable chance to pick up either one of the two conformations by mechanically pulling on the folded protein near the N- and C-termini in an equilibrated ensemble. Two DNA handles²⁹ were attached to cysteine residues placed near the C- and N-terminal ends of Arc-L1-Arc trapping each molecule in only one knotted or unknotted conformation (Fig. 1c, left and right panels, respectively). We took advantage of this feature to compare the energy associated with the unfolding of a knotted and unknotted protein and to determine the energetic cost to form a knot in the unfolded state.

Figure 2a shows the force–extension curves of two Arc-L1-Arc molecules (upper and bottom panels) stretched and relaxed several times at 100 nm s⁻¹. Both molecules display single folding/unfolding transitions with unimodal force distributions, indicating that Arc-L1-Arc obeys a two-state unfolding mechanism. However, the molecule in the upper panel unfolds and refolds at lower forces (9.7 ± 1.9 and 4.6 ± 0.4 pN, respectively) than that in the lower panel (16.2 ± 3.5 and 7.0 ± 0.4 pN, respectively). Each tested molecule followed either one of these unfolding/refolding behaviors (Supplementary Fig. 1), resulting in a bimodal distribution for the entire data set of 32 Arc-L1-Arc molecules (Fig. 2b). Cluster analysis of unfolding and refolding force pairs (Supplementary Fig. 2a and Supplementary Note 1) confirms the conclusion arrived at by visual inspection: the heterogeneous behavior of unfolding and refolding forces arises from two types of Arc-L1-Arc molecules that do not interconvert in the presence of the DNA handles. Table 1 shows the average forces for each type of molecule obtained from two histograms generated by pooling the data of both types separately. Force–extension curves of Arc-L1-Arc molecules that undergo folding/unfolding transitions at high and low forces (Fig. 3a) yield contour lengths (L_c) of 36.9 ± 2.8 nm (Fig. 3b) and 40.2 ± 3.8 nm (Fig. 3c), respectively. The latter value is similar to that calculated for the unknotted conformation of Arc-L1-Arc (42 nm), while a tight 3₁ knot is predicted to decrease a chain’s L_c by 4.7 ± 0.4 nm^{30, 31}.

Table 1 Thermodynamic and mechanical properties of Arc-L1-Arc and pARC

The heterogeneous behavior of Arc-L1-Arc was not detected when pulling a non-circular permutant of it, pARC. This molecule retains the architecture of Arc-L1-Arc without the linker L1, making it impossible to form a knot³² (Supplementary Fig. 3). The scatter plot of its unfolding and refolding forces coincides with that of Arc-L1-Arc molecules having low unfolding and refolding forces (Fig. 2b and Supplementary Fig. 2b). Also, pARC’s L_c (37.1 ± 3.8 nm) agreed well with its theoretical value for the fully unfolded protein (36.05 nm, Supplementary Fig. 4a, b). Together, these results indicate that the linker L1 in Arc-L1-Arc allows the protein to form either a knotted or an unknotted structure, and that attaching the DNA handles traps each molecule in one of these structures and renders them non-interconvertible.

Calculation of the energy cost of polypeptide knot

Does the formation of a knot confer thermodynamic as well as mechanical stability to the molecule? To address this issue, we determined the free energy differences between the folded and unfolded states for the knotted and unknotted Arc-L1-Arc molecules (ΔG_CFT). Since the folding and unfolding transitions occur irreversibly (away from equilibrium), we used the Crooks fluctuation theorem (CFT) to obtain these free energies from irreversible work distributions^33,34,35,36 of unknotted (Supplementary Fig. 5a) and knotted molecules (Supplementary Fig. 5b). The values of ΔG_CFT for folding of unknotted (Fig. 4a, upper equilibrium) and knotted molecules of Arc-L1-Arc (Fig. 4a, lower equilibrium) were ΔG_CFT(unknotted) = 6.2 ± 0.2 kcal mol⁻¹ and ΔG_CFT(knotted) = 12.2 ± 0.2 kcal mol⁻¹, respectively, after correcting for the reversible work of stretching the unfolded chain in the pulling process (see Methods section). Folded and unfolded lifetimes plotted as a function of force intersect at the force where the molecule spends 50% of the time folded or unfolded (F_1/2; Supplementary Fig. 6, right panel); together with the change of extension between the folded and unfolded state (Supplementary Fig. 6), F_1/2 can be used to obtain an independent value for the folding free energy (ΔG_F1/2)³⁷. Figure 4a shows the values of ΔG_F1/2 obtained for unknotted (upper equilibrium; ΔG_F1/2(unknotted) = 5.4 kcal mol⁻¹) and knotted (lower equilibrium; ΔG_F1/2(knotted) = 12.3 kcal mol⁻¹) states of Arc-L1-Arc. These values are consistent with those derived from the CFT analysis (see Table 1).

The extra thermodynamic stability conferred to the molecule by the knot cannot have its origin in differences between the native states of the knotted and unknotted Arc-L1-Arc, since the Rosetta scoring potential predicts them to be nearly isoenergetic (Fig. 1b, ΔG_N = G_N,unknotted − G_N,knotted = 0.5 kcal mol⁻¹, see Methods section for calculation). A similar value is obtained (ΔG_N = 0.9 kcal mol⁻¹) assuming that the 6 unknotted and 26 knotted folded molecules captured in the optical tweezers represent the equilibrium population between these native states. Therefore, the knotted protein gains an extra thermodynamic stability of about 6 kcal mol⁻¹ (ΔG_CFT(knotted)) because its denatured (knotted) reference state is destabilized relative to the denatured unknotted state by this amount. This quantity should be the energy cost to form a 3₁ knot in the unfolded state of Arc-L1-Arc. Formally, the energy cost to form a 3₁ knot in the unfolded state (Fig. 4a; ΔG_U) can be obtained from the thermodynamic cycle connecting the native and denatured conformations of knotted and unknotted Arc-L1-Arc (Fig. 4a). Using the value of ΔG_N (left vertical equilibrium in Fig. 4a) and stability values determined experimentally for the unknotted and knotted conformations of Arc-L1-Arc, we calculate a free energy difference for the spontaneous formation of a 3₁ knot in the unfolded state (ΔG_U) of 5.8 ± 1 kcal mol⁻¹ (right vertical equilibrium in Fig. 4a). Thus, formation of a 3₁ knot incurs a large energy penalty for the unfolded polypeptide.
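The arithmetic behind the thermodynamic cycle can be sketched as follows. This is an illustrative reading of the cycle, not the authors' script: it assumes a temperature of 298 K (not stated in the text) and the sign convention ΔG_U = G_U,knotted − G_U,unknotted, and the function names are hypothetical. The stability values and molecule counts are those quoted above.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.0    # assumed temperature; the text does not state it

def dg_native_from_counts(n_unknotted, n_knotted, temperature=T_KELVIN):
    """dG_N = G_N,unknotted - G_N,knotted from observed native populations."""
    return -R_KCAL * temperature * math.log(n_unknotted / n_knotted)

def knot_cost_unfolded(dg_knotted, dg_unknotted, dg_native):
    """Close the cycle: dG_U = G_U,knotted - G_U,unknotted (kcal/mol)."""
    return dg_knotted - dg_unknotted - dg_native

# Native-state difference from the 6 unknotted / 26 knotted molecules.
dg_n_counts = dg_native_from_counts(6, 26)   # about 0.87 kcal/mol

# Stabilities (knotted, unknotted) from the CFT and F_1/2 analyses.
stabilities = [(12.2, 6.2), (12.3, 5.4)]
# Native-state differences from Rosetta (0.5) and from molecule counts.
native_differences = [0.5, dg_n_counts]

estimates = [knot_cost_unfolded(k, u, n)
             for (k, u) in stabilities
             for n in native_differences]
for value in estimates:
    print(f"dG_U estimate: {value:.2f} kcal/mol")
```

Under these assumptions, every combination of the reported stabilities and native-state differences falls within the quoted 5.8 ± 1 kcal mol⁻¹.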

Random model predicts the energy cost of polypeptide knot

The value of ΔG_U does not have an ensemble counterpart to be compared to. We thus adapted a Monte Carlo method previously applied to the knotting of dsDNA molecules in dilute solutions (modeled as chains of rigid impenetrable cylinders of equal length and diameter)^{5, 6} to determine the equilibrium fraction of 3₁ knots in the unfolded state of Arc-L1-Arc. The fraction of knotted configurations depends on the persistence length of the chain and its effective diameter (Supplementary Methods and Supplementary Fig. 7) and was used to calculate the entropic free energy of formation of 3₁ knots in unfolded chains (ΔG_S; Eq. 7 in Supplementary Methods). This value was compared with the free energy cost of creating a 3₁ knot in the denatured state of Arc-L1-Arc deduced from the thermodynamic cycle (ΔG_U = 5.8 ± 1 kcal mol⁻¹; Fig. 4a). A match is obtained for an effective chain diameter (D) of 0.49 nm for the unfolded state of Arc-L1-Arc (see graphical interpolation in Fig. 4b). This value is close to the effective diameter of 0.58 nm estimated by intrinsic viscosity measurements for denatured protein chains³⁸; moreover, the average radius of gyration of the simulated configurations (3.5 ± 0.06 nm for an effective diameter of 0.49 nm, Supplementary Fig. 8) is very close to the ~3.3–3.5 nm obtained experimentally for denatured states of similar size^{17, 39}. Comparable ΔG_U and radius of gyration values are predicted by simulations of flexible open chains made up of beads (diameter = 0.4–0.5 nm) with the length of the Arc-L1-Arc repressor (100 beads)⁴⁰. The correspondence between experiments and random model computations indicates that ΔG_U is largely of entropic origin and, therefore, independent of protein sequence.

Experiments have shown that some average properties, such as the radius of gyration, scale with the protein length and are independent of the sequence, as predicted by random chain models⁴¹. Our results suggest that the knotting probability of denatured proteins is another such property. Thus it should be possible to predict the knotting probability of denatured proteins as a function of the chain length. In Fig. 4c, we expand the range of our calculations to chains of 370 residues (~100 Kuhn segments of length 1.4 nm or ~3.7 amino acids each and of effective diameter 0.49 nm). These calculations predict a value of ΔG_U of 4.4 kcal mol⁻¹ for chains of 100 Kuhn segments. The slow decay with chain length observed for ΔG_U indicates that knots are rare events for natural proteins (up to 250 Kuhn segments or 900 residues). This conclusion is also supported by simulations performed for closed chains. These simulations predict that the knotting probability becomes important (ΔG_U approaches zero) only for chain lengths of about 10,000 Kuhn segments⁴². The propensity to form a knot should be even lower since these simulations were performed using a chain diameter of 0.28 nm, smaller than that derived here (0.49 nm, Fig. 4b).
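To give a sense of what these free energies mean in terms of populations, the sketch below converts a knotting free energy into an equilibrium fraction of knotted chains. It assumes a simple two-state relation, ΔG = −RT ln(P_knotted/P_unknotted), and a temperature of 298 K; Eq. 7 of the Supplementary Methods is not reproduced in this text, so this should not be read as the authors' exact expression. Function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def knot_ratio(dg_kcal, temperature_k=298.0):
    """P_knotted / P_unknotted for a knotting free energy dg_kcal (two-state assumption)."""
    return math.exp(-dg_kcal / (R_KCAL * temperature_k))

def knot_fraction(dg_kcal, temperature_k=298.0):
    """Equilibrium fraction of knotted chains under the same two-state assumption."""
    ratio = knot_ratio(dg_kcal, temperature_k)
    return ratio / (1.0 + ratio)

# Free energies quoted in the text: 120 residues, ~370 residues, ~900 residues.
for label, dg in [("120 residues", 5.8), ("370 residues", 4.4), ("900 residues", 3.0)]:
    print(f"{label}: dG_U = {dg} kcal/mol -> knotted fraction {knot_fraction(dg):.1e}")
```

With these assumptions, even a cost of 3 kcal mol⁻¹ corresponds to well under 1% of chains being knotted at equilibrium, which is the sense in which knots are rare events in the denatured state.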

A knot alters the folding kinetics of Arc-L1-Arc

Using the Bell⁴³ and Dudko–Hummer–Szabo⁴⁴ models, we extracted kinetic information for the unfolding/refolding processes (Supplementary Fig. 6, left panel). This analysis indicated that the knot in the native state of Arc-L1-Arc has a moderate effect on the unfolding barrier. The distances between the folded and transition states for the knotted and the unknotted proteins do not differ significantly (Δx_u^‡ = 1.8−1.2 nm, for the knotted; and Δx_u^‡ = 3.2−1.6 nm, for the unknotted) and their corresponding unfolding rates at zero force are also similar (Supplementary Table 1). These results are to be expected since the L1 loop creates a superficial knot that is not involved in the formation of the hydrophobic core of Arc-L1-Arc (Fig. 1a). In contrast, the widths of the refolding force distributions of the knotted and unknotted configurations display significant differences (Supplementary Fig. 6, upper and middle panels). The Dudko–Hummer–Szabo function describing the refolding force distributions⁴⁵ does not yield a reliable fitting in either case. Therefore, we sought to determine what modifications in the kinetic parameters (Δx^≠, k_f, ΔG^≠) reproduced the differences observed in the refolding force distributions between knotted and unknotted chains (Supplementary Fig. 6, upper and middle panels). Simulations revealed that a decrease of ~4 nm in the distance from the unfolded to the transition state (Δx^≠) of the knotted protein, compared to its unknotted counterpart, is necessary to obtain the distributions of forces determined experimentally (Supplementary Fig. 6, upper and middle panels, Supplementary Table 2). Since the positions of the unfolding barriers are virtually unchanged by the knot (Supplementary Table 1), this decrease points to a displacement of the position of the unfolded state minimum rather than to a movement of the transition state due to the presence of the knot.
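For readers unfamiliar with the Bell model, a minimal sketch of its force dependence is given below: the unfolding rate grows exponentially with force, with a slope set by the distance to the transition state. The zero-force rate used in the example is a hypothetical placeholder (the fitted values are in Supplementary Table 1, which is not reproduced here), and k_BT is taken as 4.11 pN nm, which assumes a temperature near 298 K.

```python
import math

KT_PN_NM = 4.11  # thermal energy in pN nm, assuming ~298 K

def bell_unfolding_rate(force_pn, k0_per_s, dx_nm, kt=KT_PN_NM):
    """Bell model: k(F) = k0 * exp(F * dx / kT), with dx the distance to the transition state."""
    return k0_per_s * math.exp(force_pn * dx_nm / kt)

# Hypothetical zero-force rate; dx values are taken from the ranges quoted in the text.
k0_example = 0.01  # s^-1, placeholder only
for dx in (1.8, 3.2):
    rate = bell_unfolding_rate(10.0, k0_example, dx)
    print(f"dx = {dx} nm -> k(10 pN) = {rate:.3g} s^-1")
```

The same expression with a negative exponent describes the slowing of refolding under force, which is why a change in the unfolded-to-transition-state distance reshapes the refolding force distribution.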

Discussion

The success of random chain models in correctly predicting the knotting probability of a protein has implications for how proteins avoid knots. Experiments^{20, 21} and simulations with polymer lattices⁴⁶ have reported that knots are persistent topologies after denaturing naturally knotted proteins. Consequently, the large energetic penalty to form knots in the denatured state of proteins seems a natural way for unknotted proteins to avoid such topological traps. On the other hand, several studies indicate that knots in the native states of globular proteins are much less frequent than predicted by random chain models^{47, 48}. Therefore, additional factors such as kinetic mechanisms encoded in sequences⁴⁹, the emergence of secondary structures in compact states⁴⁷, etc., must play a role in reducing the frequency of knots observed among proteins.

The knotting cost calculated here has important consequences for the understanding of the folding of naturally knotted proteins. First, once the native knotted state has been attained, it finds itself “stabilized” relative to its knotted denatured state by an additional barrier of 6 kcal mol⁻¹. This added stability may explain the conservation of knots in some protein families. Knots observed in the denatured states of naturally knotted proteins, like α/β methyltransferases, are resilient structures difficult to untie^{20, 21}. Kinetic trapping of these knots in the denatured state should confer an effective thermodynamic stabilization to the proteins if their unknotted denatured states are not accessible in their biological time scale, as has been proposed¹². Second, for proteins in which the knot is not superficial as in Arc-L1-Arc—but internal and integral to the folded structure—the high cost of knot formation in the denatured state implies that they must find a different kinetic path to attain the folded knotted state. This path, depicted schematically along the diagonal on Fig. 4a, corresponds to routes in which the price of knotting (5.8 ± 1 kcal mol⁻¹) is paid along the way through the simultaneous formation of side chain contacts as the protein collapses toward the folded knotted state^{9, 12}. For example, it has been suggested that preordered intermediates would help to create the correct disposition of a loop for further chain threading^{9, 10}. Such paths must have evolved among these proteins to ensure the efficient attainment of the knot during their folding. Finally, in some cases, such as the naturally knotted protein UCH-L1, the slower folding from unknotted unfolded states¹³ suggests that some of this cost may not be paid entirely through folding contacts.

Methods

Modeling of dual conformations of Arc-L1-Arc

The 3₁ knot of Arc-L1-Arc has been inferred by comparison with the X-ray structure of the naturally knotted homologue VirC2 of Agrobacterium tumefaciens^{27, 50}. VirC2 displays two RHH motifs connected by a loop that covers the C-terminal helix of the last RHH, creating a 3₁ knot in the polypeptide chain. Reconstruction of the L1 loop of Arc-L1-Arc creates a 3₁ knot as seen in VirC2. The presence of the 3₁ knot in Arc-L1-Arc is dependent on the loop position. In order to explore the conformation of the L1 linker in the Arc-L1-Arc construct, a loop modeling prediction was applied using the Rosetta software²⁸. First, a local all-atom search algorithm (FastRelax) was applied to the whole Arc dimer (PDB ID: 1ARR) in order to minimize its structure according to Rosetta’s energy function⁵¹. This search gave us information about the structural flexibility of the anchorage residues for the L1 linker, which made it possible to define the region of the protein to be conformationally explored. Six residues before (i.e., EGRIGA) and six residues after (i.e., MKGMSK) the inserted L1 linker (GGGSGGGTGGGSGGG) were included in the loop definition, totaling 27 residues. A total of 6000 ab initio reconstructions of the loop region were produced by the KIC loop modeling algorithm⁵², which were then topologically classified into knotted and unknotted structures. This classification was carried out using the knot detection algorithm included in Rosetta’s framework⁵³. The root-mean-square deviation of every structure was calculated relative to the lowest energy model produced and plotted against its Rosetta score value in order to analyze the relative energy profile of the Arc-L1-Arc topological conformations.

The probability of obtaining a knotted or unknotted conformation for the structures modeled by Rosetta was calculated according to a Boltzmann distribution. First, the probability of each unknotted (P_unknotted) or knotted (P_knotted) conformation was calculated as:

$$P_i = \frac{{e^{ - \left( {G_i/k_{\rm B}T} \right)}}}{{\mathop {\sum }\nolimits_j^N e^{ - \left( {G_j/k_{\rm B}T} \right)}}}$$

where N is the total number of structures classified as knotted or unknotted, i is the ith conformation, G_i is the Rosetta energy obtained for conformation i, k_B is the Boltzmann constant, and T is the absolute temperature. The temperature term k_B T was set to 1 Rosetta Energy Unit (REU). 1 REU corresponds to about 0.57 kcal mol⁻¹ based on comparative studies between experimentally determined ΔΔGs and the REU predicted by Rosetta upon point mutations of proteins⁵⁴. The free energy difference between knotted and unknotted structures (ΔG_N = G_unknotted − G_knotted) was estimated by ΔG_N = −RT ln(P_unknotted/P_knotted), where R is the ideal gas constant.
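A minimal sketch of this Boltzmann calculation is shown below. It assumes that P_unknotted and P_knotted are the summed Boltzmann weights of all structures in each class, which is one reading of the text, and it works in Rosetta Energy Units with k_B T = 1 REU. The scores in the example are hypothetical placeholders, not values from the 6000 models, and the function names are illustrative.

```python
import math

REU_TO_KCAL = 0.57  # approximate conversion quoted in the text (kcal/mol per REU)

def log_partition(scores_reu):
    """log(sum_i exp(-G_i)) with k_B T = 1 REU, shifted by the minimum for numerical stability."""
    g_min = min(scores_reu)
    return -g_min + math.log(sum(math.exp(-(g - g_min)) for g in scores_reu))

def delta_g_native(unknotted_scores, knotted_scores):
    """dG_N = G_unknotted - G_knotted in REU, i.e. -ln(P_unknotted / P_knotted)."""
    return -(log_partition(unknotted_scores) - log_partition(knotted_scores))

# Hypothetical Rosetta scores (REU) for a handful of models in each class.
unknotted = [-299.2, -298.7, -297.9]
knotted = [-300.1, -299.0, -298.4]

dg_reu = delta_g_native(unknotted, knotted)
print(f"dG_N = {dg_reu:.2f} REU, roughly {dg_reu * REU_TO_KCAL:.2f} kcal/mol")
```

Because the normalizing sum in the denominator is common to both classes, it cancels in the ratio P_unknotted/P_knotted, so only the per-class sums are needed.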

Purification and chemical modification of proteins

Genes encoding Arc-L1-Arc G3C/G121C and pARC G3C/G105C mutants were synthesized and purchased from GenScript (Piscataway, NJ, USA, Supplementary Table 3). The proteins were expressed and purified as described previously by Robinson et al., 1996²⁵ except that a cationic exchange column was replaced by a step of size exclusion chromatography. Escherichia coli BL21 (DE3) cells (Thermo Fisher) containing the overexpressed proteins were harvested and lysed in 5 M GdnHCl, 10 mM Tris-HCl pH 8.0, 50 mM phosphate, and 40 mM imidazole. The lysate was loaded onto a HisTrap FF Crude (GE Healthcare) Ni²⁺ affinity column and eluted with 5 M GdnHCl plus 0.2 M acetic acid. The eluate was reduced with 50 mM dithiothreitol (DTT) overnight. The reduced protein was refolded on a gel filtration column (Superdex 75/300 GL) equilibrated in 25 mM Tris, pH 7.6, 0.2 M KCl, and 0.2 mM EDTA. Samples eluted from the size exclusion column were immediately incubated with a 50-fold molar excess of 2,2’-dithiodipyridine (DTDP) for 2 h at room temperature. The excess of DTDP was removed using two Micro Bio-Spin columns (Bio-Rad) equilibrated with 25 mM Tris pH 7.6, 0.2 M KCl, 0.1 mM EDTA, and 0.001% Tween-20. The DTDP–protein reaction was monitored spectrophotometrically at 343 nm by observing the release of pyridine-2-thione (ε_343nm = 7060 M⁻¹ cm⁻¹). The stoichiometry of DTDP modification was determined as the ratio between the pyridine-2-thione and the protein concentrations as described by Cecconi et al.²⁹.

Attachment of DNA handles to proteins

The 558 bp DNA handles were synthesized by PCR using biotinylated, digoxigenin-modified, and SH-modified oligos (Supplementary Table 4) as described by Cecconi et al. 2008²⁹. A 30–40 µM SH-DNA handle solution in 10 mM Tris pH 8.5 was reduced with 10 mM DTT at room temperature for 2 h and then buffer exchanged into 25 mM Tris pH 7.6, 0.2 M KCl, 0.1 mM EDTA, and 0.001% Tween-20 using two Micro Bio-Spin columns (Bio-Rad). The reduced DNA solution was immediately mixed with the DTDP-activated proteins in a molar ratio of 8:1 (DNA:protein) for 6 h at room temperature followed by 12 h at 4 °C. The DNA–protein complexes were purified in batch using Ni-NTA resin (QIAGEN). The unbound DNA was removed by washing eight times with 25 mM Tris pH 7.6 and 0.2 M KCl. The complexes were eluted using the same buffer supplemented with 1 M imidazole. Once purified, the DNA–protein complex solution was supplemented with 0.1 mM EDTA and 50% glycerol for storage at −80 °C.

Optical tweezer experiments and data analysis

Experiments were performed using a MiniTweezers device⁵⁵. The protein was pulled at constant speed of 100 nm s⁻¹ between 2 and 30 pN^{35, 56}. The molecules analyzed were those that displayed DNA overstretching transitions. Force–extension trajectories were analyzed using a Matlab program developed in Bustamante’s laboratory.

Contour length calculation

The molecular extension change (Δx_exp) and the folding (F_R) and unfolding (F_U) forces of the stretched proteins were determined for each force–extension curve. In order to calculate the experimental contour length (L_c exp), the Δx_exp data were compared with the molecular extension change predicted by the worm-like chain (WLC) model (Δx_theoretical) for each observed force. To calculate Δx_theoretical values, the theoretical contour length (L_c theoretical) expected for the full Arc-L1-Arc (42 nm, 120 residues × 0.35 nm per aa) or pARC (36.05 nm, 103 residues × 0.35 nm per aa) and a persistence length of 0.65 nm were used as parameters for the WLC model. In this way, a Δx_theoretical value was determined for each force. The L_c exp distribution was calculated as follows:

$$L_{c\,{\rm exp}} = \left( {\frac{{\Delta x_{{\rm exp}}}}{{\Delta x_{{\rm theoretical}}}}} \right) \times L_{c\,{\rm theoretical}}$$
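The scaling above can be sketched numerically as follows. The text specifies only "the WLC model"; this sketch assumes the Marko–Siggia interpolation formula, inverts it by bisection to get the extension at a given force, and takes k_BT = 4.11 pN nm (about 298 K). It also ignores any contribution of the folded state's end-to-end distance to Δx. Function names and the example Δx value are hypothetical.

```python
KT_PN_NM = 4.11  # thermal energy in pN nm, assuming ~298 K

def wlc_force(z, persistence_nm, kt=KT_PN_NM):
    """Marko-Siggia interpolation: force (pN) at relative extension z = x / L_c."""
    return (kt / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)

def wlc_extension(force_pn, contour_nm, persistence_nm=0.65, kt=KT_PN_NM):
    """Extension (nm) of a WLC at a given force, found by bisection on z in [0, 1)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, persistence_nm, kt) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * contour_nm

def experimental_contour_length(dx_exp_nm, force_pn, contour_theoretical_nm,
                                persistence_nm=0.65):
    """L_c_exp = (dx_exp / dx_theoretical) * L_c_theoretical, evaluated at the observed force."""
    dx_theoretical = wlc_extension(force_pn, contour_theoretical_nm, persistence_nm)
    return dx_exp_nm / dx_theoretical * contour_theoretical_nm

# Hypothetical observation: an extension change of 20 nm at an unfolding force of 10 pN.
print(experimental_contour_length(20.0, 10.0, 42.0))
```

Since the WLC extension at fixed force is proportional to the contour length, the ratio Δx_exp/Δx_theoretical rescales the theoretical L_c directly, which is why a single multiplication suffices.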

Calculation of reversible work of unfolding

The work observed for each unfolding/refolding transition (W_obs) was calculated by integrating the area of a rectangle delimited by the unfolding force on both sides of the transition. In this method, the stretching work of the DNA handles is automatically canceled³⁵. In order to obtain the unfolding/refolding work in the absence of force (W^{F=0}), the work done to stretch the unfolded state (W_s) was subtracted from the work done to mechanically unfold the molecule (W_obs)³⁴:

$$W^{F = 0} = W_{{\rm obs}} - W_{\rm s}$$

The value of W_s was determined by integration of the WLC between zero and each unfolding and refolding force of the protein. In this way, each value of W_obs was corrected by the corresponding stretching work of the unfolded state. In the case of Arc-L1-Arc, the integration of the WLC was obtained using a persistence length of 0.65 nm and a contour length that depended on the type of molecule analyzed: 42 nm for those displaying low average unfolding/refolding forces (unknotted) and 37 nm for those displaying high average unfolding/refolding forces (knotted). For pARC, W_s calculations were done using an L_c of 36.05 nm and a persistence length of 0.65 nm. W_s calculations were done using a Matlab script, kindly provided to us by Jessie Dill and modified in order to calculate various W_s at once. The distributions of W^{F=0} were fitted to a Gaussian function and the ΔG° was calculated as the point of intersection between the folding and unfolding probability distributions³³. The ΔG_F1/2 was calculated as F_1/2 times Δx_(F1/2), corrected by the stretching work of the unfolded state (W_s). F_1/2 was determined as the point of intersection of the force-dependent lifetimes of the folding/unfolding process, and Δx_(F1/2) was determined by using the L_c of the unknotted and knotted Arc-L1-Arc.
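The intersection step can be illustrated with a short sketch. Under the Crooks fluctuation theorem, the unfolding work distribution and the refolding work distribution cross at W = ΔG; if both are approximated by Gaussians, the crossing point follows from a quadratic equation. The means and standard deviations below are hypothetical placeholders rather than fitted values from this study, and the function names are illustrative.

```python
import math

def gaussian_pdf(w, mean, sd):
    """Normalized Gaussian probability density."""
    return math.exp(-0.5 * ((w - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def gaussian_crossing(mean_a, sd_a, mean_b, sd_b):
    """Work value where two normalized Gaussians intersect (root nearest the midpoint of the means)."""
    # Equating the log-densities gives a*w^2 + b*w + c = 0.
    a = 1.0 / (2.0 * sd_a ** 2) - 1.0 / (2.0 * sd_b ** 2)
    b = mean_b / sd_b ** 2 - mean_a / sd_a ** 2
    c = (mean_a ** 2 / (2.0 * sd_a ** 2) - mean_b ** 2 / (2.0 * sd_b ** 2)
         + math.log(sd_a / sd_b))
    if abs(a) < 1e-12:
        if abs(b) < 1e-12:
            raise ValueError("distributions are identical; no unique crossing")
        return -c / b  # equal widths: the crossing is the midpoint of the means
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("distributions do not intersect")
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    midpoint = 0.5 * (mean_a + mean_b)
    return min(roots, key=lambda r: abs(r - midpoint))

# Hypothetical refolding and unfolding work distributions (kcal/mol).
dg_estimate = gaussian_crossing(10.0, 1.0, 14.0, 2.0)
print(f"Estimated dG at the crossing: {dg_estimate:.2f} kcal/mol")
```

When the two distributions have different widths there are two mathematical crossing points; the sketch keeps the one lying between the two means, which is the physically relevant one when the distributions overlap.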

Data analysis

The analysis was done using SigmaPlot 10. Histograms were generated using the automatic binning option. The algorithm calculates the number of bins for representation based upon the number of data points according to:

$${\rm No.\ of\ bins} = 3 + \frac{{\log \left( N \right)}}{{\log \left( 2 \right)}}$$

where N is the number of data points.
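As a quick check of this rule, the sketch below evaluates it for a given number of data points. How SigmaPlot rounds a non-integer result is not stated in the text, so the function returns the unrounded value; the function name is hypothetical.

```python
import math

def automatic_bin_count(n_points):
    """Number of bins from the rule 3 + log(N) / log(2), left unrounded."""
    if n_points < 1:
        raise ValueError("need at least one data point")
    return 3.0 + math.log2(n_points)

# Example: 32 data points give 3 + 5 = 8 bins.
print(automatic_bin_count(32))
```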

Fitting procedures to force distributions were performed using the dynamic fit option of SigmaPlot 10.

Cluster analysis

Each folding force (F_R) was paired with its corresponding unfolding force (F_U). Each (F_R, F_U) pair (available at https://doi.org/10.6084/m9.figshare.5195710.v1)⁵⁷ was represented as a point on a two-dimensional plane. Clustering was done using the SPSS software by applying a biphasic agglomerative hierarchical algorithm⁵⁸. Similarity between two data points A and B, with coordinates (F_R^A, F_U^A) and (F_R^B, F_U^B), respectively, was determined by calculating the distance between them in the plane.

Accordingly, the shorter the distance between two (F_R, F_U) points, the higher the likelihood that these points will be clustered in the same class. The first step of the biphasic algorithm is a preclustering to form groups of points that are near each other. Once preclusters are formed, they are classified using a hierarchical clustering algorithm based on a centroid method. For the set of preclusters formed in the first step, a similarity matrix is computed by using the distance between the preclusters' centroids. The matrix is scanned to identify the lowest value, which represents the most similar pair of preclusters. Those preclusters are joined and the similarity matrix is updated with the centroid of the merged precluster, replacing the joined elements. The process continues until only one element is left.
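The hierarchical step can be sketched in a few lines. This is not the SPSS implementation: it skips the preclustering phase, assumes Euclidean distance between centroids (the text describes a distance in the plane without giving the formula in this copy), and stops at two clusters instead of merging down to one. The (F_R, F_U) points are hypothetical values placed near the mean forces reported above.

```python
import math

def centroid(points):
    """Mean position of a list of (F_R, F_U) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def centroid_clustering(points, n_clusters=2):
    """Agglomerative clustering that repeatedly merges the two clusters with the closest centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the most similar pair
        del clusters[j]
    return clusters

# Hypothetical (F_R, F_U) pairs in pN for low-force and high-force molecules.
low_force = [(4.3, 9.0), (4.6, 9.7), (4.9, 10.4)]
high_force = [(6.7, 15.5), (7.0, 16.2), (7.3, 16.9)]

clusters = centroid_clustering(low_force + high_force, n_clusters=2)
for members in clusters:
    print(centroid(members), len(members))
```

Stopping at two clusters mirrors the two molecule types discussed in the Results; running the loop to a single cluster, as described in the text, would produce the full merge hierarchy instead.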

Code availability

The source code for the version of the computer program used in this study is available from the corresponding authors upon reasonable request.

Data availability

Data set of force, distance, and work: https://doi.org/10.6084/m9.figshare.5195710.v1. All other data supporting these findings are available from the corresponding authors upon reasonable request.

Change history

14 December 2017
The original version of this article contained an error in the spelling of the author Christian A.M. Wilson, which was incorrectly given as Christian M.A. Wilson. This has now been corrected in both the PDF and HTML versions of the article.

References

Lim, N. C. H. & Jackson, S. E. Molecular knots in biology and chemistry. J. Phys. Condens. Matter 27, 1–35 (2015).
Jamroz, M. et al. KnotProt: a database of proteins with knots and slipknots. Nucleic Acids Res. 43, D306–D314 (2015).
Meluzzi, D., Smith, D. E. & Arya, G. Biophysics of knotting. Annu. Rev. Biophys. 39, 349–366 (2010).
Virnau, P., Kantor, Y. & Kardar, M. Knots in globule and coil phases of a model polyethylene. J. Am. Chem. Soc. 127, 15102–15106 (2005).
Rybenkov, V. V., Cozzarelli, N. R. & Vologodskii, A. V. Probability of DNA knotting and the effective diameter of the DNA double helix. Proc. Natl. Acad. Sci. USA 90, 5307–5311 (1993).
Shaw, S. Y. & Wang, J. C. Knotting of a DNA chain during ring-closure. Science 260, 533–536 (1993).
Poier, P., Likos, C. N. & Matthews, R. Influence of rigidity and knot complexity on the knotting of confined polymers. Macromolecules 47, 3394–3400 (2014).
Dai, L., Renner, C. B. & Doyle, P. S. Metastable tight knots in semiflexible chains. Macromolecules 47, 6135–6140 (2014).
Noel, J. K., Sulkowska, J. I. & Onuchic, J. N. Slipknotting upon native-like loop formation in a trefoil knot protein. Proc. Natl. Acad. Sci. USA 107, 15403–15408 (2010).
Chwastyk, M. & Cieplak, M. Multiple folding pathways of proteins with shallow knots and co-translational folding. J. Chem. Phys. 143, 1–6 (2015).
Faisca, P. F. Knotted proteins: a tangled tale of structural biology. Comput. Struct. Biotechnol. J. 13, 459–468 (2015).
Capraro, D. T. & Jennings, P. A. Untangling the influence of a protein knot on folding. Biophys. J. 110, 1044–1051 (2016).
Ziegler, F. et al. Knotting and unknotting of a protein in single molecule experiments. Proc. Natl. Acad. Sci. USA 113, 7533–7538 (2016).
Mallam, A. L. & Jackson, S. E. Knot formation in newly translated proteins is spontaneous and accelerated by chaperonins. Nat. Chem. Biol. 8, 147–153 (2012).
Mansfield, M. L. Are there knots in proteins? Nat. Struct. Biol. 1, 213–214 (1994).
Mansfield, M. L. Fit to be tied. Nat. Struct. Biol. 4, 166–167 (1997).
Berry, G. C. The hydrodynamic and conformational properties of denatured proteins in dilute solutions. Protein Sci. 19, 94–98 (2010).
Religa, T. L., Markson, J. S., Mayor, U., Freund, S. M. V. & Fersht, A. R. Solution structure of a protein denatured state and folding intermediate. Nature 437, 1053–1056 (2005).
Uversky, V. N. Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 11, 739–756 (2002).
Mallam, A. L., Rogers, J. M. & Jackson, S. E. Experimental detection of knotted conformations in denatured proteins. Proc. Natl. Acad. Sci. USA 107, 8189–8194 (2010).
Wang, P., Yang, L. J., Liu, P. C., Gao, Y. Q. & Zhao, X. S. Single-molecule detection reveals knot sliding in TrmD denaturation. Chem. Eur. J. 19, 5909–5916 (2013).
Hsieh, S. J., Mallam, A. L., Jackson, S. E. & Hsu, S. T. Backbone 1H, 13C and 15N assignments of YibK and a variant containing a unique cysteine residue at C-terminus in 8 M urea-denatured states [corrected]. Biomol. NMR Assign. 8, 439–442 (2014).
Shih, P. M. et al. Random-coil behavior of chemically denatured topologically knotted proteins revealed by small-angle X-ray scattering. J. Phys. Chem. B 119, 5437–5443 (2015).
Schreiter, E. R. & Drennan, C. L. Ribbon-helix-helix transcription factors: variations on a theme. Nat. Rev. Microbiol. 5, 710–720 (2007).
Robinson, C. R. & Sauer, R. T. Equilibrium stability and sub-millisecond refolding of a designed single-chain Arc repressor. Biochemistry 35, 13878–13884 (1996).
Robinson, C. R. & Sauer, R. T. Covalent attachment of Arc repressor subunits by a peptide linker enhances affinity for operator DNA. Biochemistry 35, 109–116 (1996).
King, N. P., Jacobitz, A. W., Sawaya, M. R., Goldschmidt, L. & Yeates, T. O. Structure and folding of a designed knotted protein. Proc. Natl. Acad. Sci. USA 107, 20732–20737 (2010).
Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011).
Cecconi, C., Shank, E. A., Dahlquist, F. W., Marqusee, S. & Bustamante, C. Protein-DNA chimeras for single molecule mechanical folding studies with the optical tweezers. Eur. Biophys. J. 37, 729–738 (2008).
Dzubiella, J. Sequence-specific size, structure, and stability of tight protein knots. Biophys. J. 96, 831–839 (2009).
He, C., Lamour, G., Xiao, A., Gsponer, J. & Li, H. Mechanically tightening a protein slipknot into a trefoil knot. J. Am. Chem. Soc. 136, 11946–11955 (2014).
Tabtiang, R. K., Cezairliyan, B. O., Grant, R. A., Cochrane, J. C. & Sauer, R. T. Consolidating critical binding determinants by noncyclic rearrangement of protein secondary structure. Proc. Natl. Acad. Sci. USA 102, 2305–2309 (2005).
Crooks, G. E. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E Stat. Nonlin. Soft. Matter Phys. 60, 2721–2726 (1999).
Collin, D. et al. Verification of the Crooks fluctuation theorem and recovery of RNA folding free energies. Nature 437, 231–234 (2005).
Shank, E. A., Cecconi, C., Dill, J. W., Marqusee, S. & Bustamante, C. The folding cooperativity of a protein is controlled by its chain topology. Nature 465, 637–640 (2010).
Mossa, A., Manosas, M., Forns, N., Huguet, J. M. & Ritort, F. Dynamic force spectroscopy of DNA hairpins: I. Force kinetics and free energy landscapes. J. Stat. Mech. Theor. Exp. 2009, 1–28 (2009).
Liphardt, J., Onoa, B., Smith, S. B., Tinoco, I. & Bustamante, C. Reversible unfolding of single RNA molecules by mechanical force. Science 292, 733–737 (2001).
Zhou, H. X. Dimensions of denatured protein chains from hydrodynamic data. J. Phys. Chem. B 106, 5769–5775 (2002).
Wilkins, D. K. et al. Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry 38, 16424–16431 (1999).
D’Adamo, G. & Micheletti, C. Molecular crowding increases knots abundance in linear polymers. Macromolecules 48, 6337–6346 (2015).
Kohn, J. E. et al. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc. Natl. Acad. Sci. USA 101, 12491–12496 (2004).
Uehara, E. & Deguchi, T. Characteristic length of the knotting probability revisited. J. Phys. Condens. Matter 27, 1–9 (2015).
Bell, G. I., Dembo, M. & Bongrand, P. Cell-adhesion - competition between nonspecific repulsion and specific bonding. Biophys. J. 45, 1051–1064 (1984).
Dudko, O. K., Hummer, G. & Szabo, A. Theory, analysis, and interpretation of single-molecule force spectroscopy experiments. Proc. Natl. Acad. Sci. USA 105, 15755–15760 (2008).
Pierse, C. A. & Dudko, O. K. Kinetics and energetics of biomolecular folding and binding. Biophys. J. 105, L19–L22 (2013).
Soler, M. A. & Faisca, P. F. N. Effects of knots on protein folding properties. PLoS ONE 8, e74755 (2013).
Lua, R. C. & Grosberg, A. Y. Statistics of knots, geometry of conformations, and evolution of proteins. PLoS Comput. Biol. 2, e45 (2006).
Jackson, S. E., Suma, A. & Micheletti, C. How to fold intricately: using theory and experiments to unravel the properties of knotted proteins. Curr. Opin. Struct. Biol. 42, 6–14 (2017).
Wust, T., Reith, D. & Virnau, P. Sequence determines degree of knottedness in a coarse-grained protein model. Phys. Rev. Lett. 114, 028102 (2015).
Lu, J., den Dulk-Ras, A., Hooykaas, P. J. & Glover, J. N. Agrobacterium tumefaciens VirC2 enhances T-DNA transfer and virulence through its C-terminal ribbon-helix-helix DNA-binding fold. Proc. Natl. Acad. Sci. USA 106, 9643–9648 (2009).
Tyka, M. D. et al. Alternate states of proteins revealed by detailed energy landscape mapping. J. Mol. Biol. 405, 607–618 (2011).
Mandell, D. J., Coutsias, E. A. & Kortemme, T. Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nat. Methods 6, 551–552 (2009).
Khatib, F., Weirauch, M. T. & Rohl, C. A. Rapid knot detection and application to protein structure prediction. Bioinformatics 22, e252–259 (2006).
Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).
Smith, S. B., Cui, Y. J. & Bustamante, C. Optical-trap force transducer that operates by direct measurement of light momentum. Methods Enzymol. 361, 134–162 (2003).
Cecconi, C., Shank, E. A., Bustamante, C. & Marqusee, S. Direct observation of the three-state folding of a single protein molecule. Science 309, 2057–2060 (2005).
Bustamante, A. et al. ARC_DATA.xlsx. figshare. https://doi.org/10.6084/m9.figshare.5195710.v1 (2017).
Bacher, J., Wenzig, K. & Vogler, M. SPSS TwoStep Cluster - A First Evaluation. Vol. 1, 1–20 (University Erlangen-Nürnberg, 2001).

Acknowledgements

This work was supported by Fondecyt 11110534 (to M.B.), 1151274 (to M.B.), 11130263 (to C.A.M.W.), and Anillo 1107 (to M.B.), Chile, and by Fondecyt 196–2013 (to D.G.G.), CONCYTEC, Perú. This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02) at Universidad de Chile. A.B. was supported by Conicyt master fellowship 22121199. We gratefully acknowledge Professor Alexander Vologodskii from New York University for helpful comments and advice. Travel for C.B. to and from Chile and Peru was also partially supported by the Howard Hughes Medical Institute, NIH grant R01GM032543, and the U.S. Department of Energy Office of Basic Energy Sciences Nanomachine Program under contract no. DE-AC02-05CH11231.

Author information

Bustamante A and Sotelo-Campos J contributed equally to this work.

Authors and Affiliations

Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santos Dumont 964, Independencia, Santiago, 8380494, Chile
Andrés Bustamante, Martin Floor, Christian A. M. Wilson & Mauricio Báez
Departamento Académico de Ciencias Exactas, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Av. Honorio Delgado 430, San Martin de Porras, Lima-31, 15102, Peru
Juan Sotelo-Campos
Laboratorio de Moléculas Individuales, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Av. Honorio Delgado 430, San Martin de Porras, Lima-31, 15102, Peru
Daniel G. Guerra & Carlos Bustamante
Department of Molecular and Cell Biology, Department of Physics and Department of Chemistry, Kavli Energy Nanoscience Institute, and Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA
Carlos Bustamante

Contributions

Conceived and designed the experiments: M.B., C.B., D.G.G., A.B. Performed the experiments: M.B., A.B., J.S.-C., M.F., C.A.M.W. Analyzed the data: A.B., M.F., C.A.M.W., M.B., J.S.-C. Wrote the paper: M.B., C.B., D.G.G., J.S.-C.

Corresponding authors

Correspondence to Carlos Bustamante or Mauricio Báez.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Change history: A correction to this article has been published and is linked from the HTML version of this paper.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A correction to this article is available online at https://doi.org/10.1038/s41467-017-02384-5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bustamante, A., Sotelo-Campos, J., Guerra, D.G. et al. The energy cost of polypeptide knot formation and its folding consequences. Nat Commun 8, 1581 (2017). https://doi.org/10.1038/s41467-017-01691-1

Received: 03 February 2017
Accepted: 09 October 2017
Published: 17 November 2017
DOI: https://doi.org/10.1038/s41467-017-01691-1
