Protein knotting through concatenation significantly reduces folding stability

Concatenation by covalent linkage of two protomers of an intertwined all-helical HP0242 homodimer from Helicobacter pylori results in the first example of an engineered knotted protein. While concatenation does not affect the native structure according to X-ray crystallography, the folding kinetics are substantially slower than those of the parent homodimer. Using NMR hydrogen-deuterium exchange analysis, we show here that concatenation significantly destabilises the knotted structure in solution, with some regions close to the covalent linkage being destabilised by as much as 5 kcal mol⁻¹. Structural mapping of chemical shift perturbations induced by concatenation revealed a pattern similar to that induced by a concentrated chaotropic agent. Our results suggest that the design strategy of protein knotting by concatenation may be thermodynamically unfavourable due to covalent constraints imposed on the flexible fraying ends of the template structure, leading to a rugged free energy landscape with increased propensity to form off-pathway folding intermediates.


Results
We have previously reported the backbone NMR assignments of wt HP0242, and the secondary structures in solution are consistent with the crystal structure [28]. {¹H}-¹⁵N heteronuclear nuclear Overhauser effect (hetNOE) measurements of HP0242 confirmed that all four helices are highly ordered on the ps-ns timescale (Figure S1). Given that the unfolding kinetics of HP0242 variants under native conditions are on the timescale of hours to days, we opted to examine the slow folding dynamics of HP0242 by native NMR hydrogen-deuterium exchange (HDX) [16,17,29] for slow-exchanging amide groups and phase-modulated clean chemical exchange (CLEANEX-PM) [30] for fast-exchanging amide groups, mostly in the N-terminal region. The time constants of residue-specific amide HDX at pH 6.8 spanned from 10² to 10⁶ minutes, with those corresponding to the second helix (H2) being the slowest (Fig. 1). We repeated the same NMR HDX analysis at pH 7.8 to examine the pH dependency of backbone amide HDX. The HDX rates of most residues were proportional to the catalyst concentration, i.e., [OH⁻], indicating that the HDX process is under thermodynamic equilibrium, also known as the EX2 regime [31]. A few residues showed pH-independent HDX rates, indicating that their HDX is under kinetic control of the opening rates of the corresponding hydrogen bonds, also known as the EX1 regime. For the EX2 residues, we calculated the protection factor (PF) and derived the corresponding free energy of unfolding, ΔG_HDX, for individual backbone amide-mediated hydrogen bonds (see Methods). The results showed that H2, which is stabilised by leucine-zipper hydrophobic interactions, has the highest free energy of unfolding, up to 9 kcal mol⁻¹, which is close to the bulk value corresponding to the transition between the intermediate and denatured (I-D) states of HP0242 derived from intrinsic fluorescence [26].
The average free energies of unfolding of H1 and H4 are closer to the values corresponding to the native-to-intermediate (N-I) transition of HP0242 derived from intrinsic fluorescence and far-UV circular dichroism spectroscopy. Collectively, these data reaffirm the sequential unfolding pathway established by bulk spectroscopic measurements, in which the peripheral helices, namely H1, H3 and H4, unfold first, followed by unfolding of the long H2 that is stabilised by leucine zipper-based homodimerisation [26].
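The pH-dependence test described above can be sketched in a few lines of Python. In the EX2 limit the observed rate scales with [OH⁻], so raising the pH from 6.8 to 7.8 should increase the HDX rate roughly tenfold, whereas an EX1 residue keeps approximately the same rate. The function name, the tolerance and the example rates in the usage note are illustrative assumptions, not criteria or values taken from this study.

```python
import math


def classify_exchange_regime(k_low_ph, k_high_ph, ph_low=6.8, ph_high=7.8, tol=0.3):
    """Classify a residue as 'EX2', 'EX1' or 'ambiguous' from HDX rates at two pH values.

    k_low_ph, k_high_ph: observed exchange rates (same units) at ph_low and ph_high.
    tol: hypothetical tolerance in log10 units; it is an assumption of this sketch.
    """
    if k_low_ph <= 0 or k_high_ph <= 0:
        raise ValueError("exchange rates must be positive")
    observed_shift = math.log10(k_high_ph / k_low_ph)
    # Base-catalysed exchange: log10 of the [OH-] ratio equals the pH difference.
    expected_ex2_shift = ph_high - ph_low
    if abs(observed_shift - expected_ex2_shift) <= tol:
        return "EX2"  # rate tracks catalyst concentration
    if abs(observed_shift) <= tol:
        return "EX1"  # rate is pH independent
    return "ambiguous"
```

For example, `classify_exchange_regime(1e-4, 1e-3)` returns `"EX2"` because the rate rises tenfold over one pH unit, while `classify_exchange_regime(1e-4, 1.1e-4)` returns `"EX1"`. Only residues classified as EX2 would be carried forward to the protection factor analysis.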
While our earlier work indicated that concatenated HP0242 exhibits more complex folding pathways with slower folding rates, possibly due to the increased likelihood of off-pathway folding intermediate formation, the molecular details of how concatenation affects the structure and folding dynamics of HP0242 remain elusive. We compared the backbone amide ¹⁵N-¹H correlations of wt HP0242 and its concatenated form (Fig. 2). While the overall appearance of the two spectra was comparable, concatenation induced significant chemical shift perturbations in the loop connecting H1 and H2 and in most of H3, in addition to the flanking ends, i.e., the N- and C-termini. Furthermore, minor cross-peaks were observed in the ¹⁵N-¹H correlation spectrum of the concatenated HP0242 due to the loss of internal symmetry. Remarkably, despite the spectral similarity between wt and concatenated HP0242, the latter showed a dramatic loss of NMR signals after a short period (< 10 min) of HDX, in contrast to the much larger number of correlations remaining for the wt HP0242 dimer. This indicates that concatenation significantly destabilises the native structure of HP0242 in solution, despite the essentially identical structures resolved in the crystalline state under cryogenic conditions.
To further examine how concatenation impacts the folding of HP0242, we repeated the NMR HDX analysis for the concatenated form and found markedly reduced folding stability in H4, with its C-terminal half being so destabilised that no reliable HDX rates could be determined (Fig. 3). Likewise, most residues of H1 underwent rapid HDX that resulted in near-complete loss of protection against HDX. H2 and H3 were destabilised by as much as 2 kcal mol⁻¹ across their whole sequences. Structural mapping of the residue-specific reduction in free energy of unfolding showed that the impacts are localised near the concatenation site. Residues that are destabilised by more than 5 kcal mol⁻¹ include S30, D80, Q84, S85, A87, N88 and I89. Other significantly destabilised residues (ΔΔG > 4 kcal mol⁻¹) include W18, the only tryptophan residue of the protomer, and the neighbouring I22 and F23. In addition to the backbone hydrogen bonds of the helical structure, the side-chain hydroxyl group of S85 is hydrogen bonded to the indole nitrogen of W18, whose intrinsic fluorescence served as the structural probe in our earlier study [26]. It is likely that the covalent linkage between the C-terminus of one protomer and the N-terminus of the other through a short flexible linker imposes so much strain on the local structure around the C-terminal half of H4 that the corresponding hydrogen bond network is distorted from ideal geometry in solution.
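The per-residue comparison above amounts to subtracting two sets of ΔG_HDX values and applying a threshold. A minimal Python sketch follows, assuming the free energies are stored in dictionaries keyed by residue label. The labels and numbers are invented placeholders for illustration only and are not measurements from this study.

```python
def delta_delta_g(dg_wt, dg_concat):
    """Return ddG = dG(wt) - dG(concatenated), in kcal/mol, per residue.

    Residues missing from either input are skipped. This matters for regions
    where no reliable HDX rate could be determined in the concatenated form:
    they have no ddG value here, rather than a value of zero.
    """
    return {res: dg_wt[res] - dg_concat[res] for res in dg_wt if res in dg_concat}


def residues_above(ddg, threshold):
    """List residues whose destabilisation exceeds the threshold (kcal/mol)."""
    return sorted(res for res, value in ddg.items() if value > threshold)


# Placeholder inputs (hypothetical labels and values, kcal/mol).
dg_wt = {"res_a": 8.0, "res_b": 6.5, "res_c": 5.0, "res_d": 4.0}
dg_concat = {"res_a": 6.0, "res_b": 1.0, "res_c": 4.5}  # res_d not measurable

ddg = delta_delta_g(dg_wt, dg_concat)
strongly_destabilised = residues_above(ddg, 4.0)
```

With these placeholder inputs, only `res_b` exceeds the 4 kcal mol⁻¹ threshold, and `res_d` is absent from the result because it has no measurable rate in one of the two forms.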
The large destabilising effect of concatenation prompted us to compare it with the destabilisation caused by chaotropic agent-induced chemical denaturation. A series of ¹⁵N-¹H correlation spectra of wt HP0242 were recorded in the presence of guanidine hydrochloride (GdnHCl) at concentrations ranging from 0 to 7 M (Fig. 4). With increasing GdnHCl concentration, the intensities of the well-dispersed ¹⁵N-¹H correlations diminished progressively until they were too weak to be detected at 3 M GdnHCl. The chemical shift perturbations of individual ¹⁵N-¹H correlations could be followed up to 2.5 M GdnHCl. Structural mapping of the observed chemical shift perturbations showed a pattern remarkably similar to that observed in response to concatenation, in that many of the highly perturbed residues are clustered around the junction between H1 and H4 where W18 is located. Several residues located at the fraying ends of H3 and H2 also exhibited significant chemical shift perturbations. The loss of native ¹⁵N-¹H correlation signals was accompanied by the emergence of a group of poorly dispersed ¹⁵N-¹H correlations corresponding to the unfolded population. Coexistence of native and chemically denatured ¹⁵N-¹H correlations could be observed clearly at 2 M GdnHCl. Note, however, that at near-saturating denaturant concentration, i.e., 7 M GdnHCl, the number of unfolded ¹⁵N-¹H correlations was far smaller than expected for a fully disordered polypeptide chain of 92 residues, suggesting that a significant proportion of the chemically denatured HP0242 was in a molten globular state with abundant conformational exchange processes on the μs to ms timescale, resulting in severe line broadening.
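A combined ¹⁵N-¹H chemical shift perturbation of the kind mapped above can be computed per residue as sketched below. The exact expression used in this work is not recoverable from the present text, so the sketch assumes a commonly used weighted Euclidean form, Δδ = sqrt(Δδ_H² + (w·Δδ_N)²). The weighting factor `n_weight` and the function name are assumptions; values of roughly 0.1 to 0.2 are often used for ¹⁵N.

```python
import math


def chemical_shift_perturbation(h_ref, n_ref, h_obs, n_obs, n_weight=0.15):
    """Combined 1H/15N chemical shift perturbation in ppm.

    h_ref, n_ref: reference 1H and 15N shifts (e.g. wt, or 0 M denaturant).
    h_obs, n_obs: shifts of the same residue in the perturbed state.
    n_weight: assumed scaling of the 15N difference; not taken from the source.
    """
    delta_h = h_obs - h_ref
    delta_n = n_obs - n_ref
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)
```

For a residue that moves by 0.03 ppm in ¹H and 0.4 ppm in ¹⁵N, a weight of 0.1 gives a combined perturbation of 0.05 ppm. Because the result depends on the chosen weight, perturbations are only comparable between datasets processed with the same value.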

Discussion
In this work, we have compared the folding dynamics of wt and concatenated HP0242 using native NMR HDX analysis. Despite the essentially identical crystal structures of the two variants [24], our results indicate that concatenation through a flexible linker between the two protomers results in substantial destabilisation across the entire structure, with many residues showing a reduction in free energy of unfolding of more than 2 kcal mol⁻¹, while the C-terminal half of H4 and most of H1, both of which are directly involved in concatenation, are destabilised by more than 4 kcal mol⁻¹. The global destabilising effects led us to compare them with chaotropic agent-induced global destabilisation by NMR. The results revealed a high degree of similarity between the two types of destabilisation in terms of their spatial distributions on the structure of HP0242. Furthermore, the significantly fewer than expected ¹⁵N-¹H correlations in the presence of 7 M GdnHCl suggest the existence of residual structures, most likely within H2 owing to its high stability derived from NMR HDX, that undergo helix-coil transitions on the μs to ms timescale, resulting in line broadening beyond detection. Combining all the experimental evidence, we propose a linear folding pathway for concatenated HP0242 along which a partially unfolded folding intermediate becomes highly populated, with the secondary structural elements around W18, namely H1, H4 and part of H2, being largely disordered. In the presence of highly concentrated GdnHCl (> 6 M), subsequent unfolding of H3 and H2 takes place to form a molten globular denatured state (Fig. 5).
Protein repeats occur naturally through evolution [32,33]. Tremendous efforts have been made to engineer protein tandem repeats based on naturally occurring modules [34-40] or de novo computational modules [41,42]. In most cases, the engineered protein tandem repeats display exceptionally high thermal and/or chemical stabilities compared to their ancestral building modules due to favourable enthalpic gains from inter-modular interactions and entropic stabilisation through covalent loop linkages [34,37,39,41,43]. A recent study by Tawfik and co-workers has nonetheless demonstrated through evolution traits that the stabilisation of the native states of β-propeller protein repeats is accompanied by parallel stabilisation of folding intermediates that are prone to misfold and aggregate [40]. Indeed, evolution tends to avoid high sequence similarity between neighbouring domains due to the higher propensity to misfold [44,45]. Theoretical and experimental analyses of the folding pathways of a series of circular permutants of β-trefoil interleukin-1β with different loop insertions suggested that these circular permutants tend to back-track on their folding landscapes [46], and that the destabilising effects can be attributed to geometric frustration of functional loops linking the modular repeats within the β-trefoil topology [47]. Collectively, these findings are in line with the fact that HP0242 exhibits complex folding pathways with a high tendency to form off-pathway misfolded intermediates; the concatenated HP0242 exhibits a significantly more populated folding intermediate than the intertwined symmetric dimer [26] [...] costs, involved in attaining the knotted topology of the concatenated form of HP0242. While knotting may be thermodynamically unfavourable, emerging evidence has revealed the functional importance of protein knots [48], thus justifying their preservation throughout evolution.

Methods
Recombinant protein preparation. Uniformly ¹⁵N-labelled wt and concatenated HP0242 were over-expressed and purified according to the previously described protocol [26,28]. Unless otherwise specified, the NMR samples were buffered in 10 mM phosphate (pH 6.8) containing 10% D₂O (v/v) and 0.02% NaN₃.
NMR spectroscopy. All NMR data were collected at 298 K using an AVANCE 800 (18.7 T), an AVANCE III 600 (14.0 T) or an AVANCE 500 (11.7 T) NMR spectrometer (Bruker Biospin, Germany). The latter two are equipped with a cryogen-cooled probe head. Unless otherwise specified, 5 mm quartz NMR tubes were used for data collection. The resulting datasets were processed with NMRPipe [49] and analysed with Sparky [50]. {¹H}-¹⁵N hetNOE was recorded at 14.0 T using parameters as described previously [51].
Chemical denaturation monitored by NMR spectroscopy. Aliquots of 0.1 mM ¹⁵N-labelled wt HP0242 were incubated overnight in the presence of 0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6 and 7 M GdnHCl before NMR measurements. To minimize the interference from high salt contents during NMR measurements, 3 mm MATCH NMR tubes were used.
Figure legend (fragment): The radius of the sausage representation corresponds to the magnitude of the chemical shift perturbation, which is also colour-ramped from green to magenta as indicated on the right-hand side.
NMR hydrogen exchange (HX). The rates at which the backbone amide protons exchange with bulk solvent were determined by HDX [29,31] and CLEANEX-PM [30]. NMR HDX of wt and concatenated HP0242 was carried out using the previously described protocol. Briefly, aliquots of pH-adjusted protein solution were lyophilized overnight, and equal amounts of 99% D₂O were added to resuspend the sample powder immediately before the NMR HDX measurements. The NMR HDX data were collected at 14.0 T by recording a series of ¹⁵N-¹H SOFAST-HMQC spectra over a period of 20 days [29]. For fast HX processes of wt HP0242, CLEANEX-PM was used to determine the HX rates in 90% H₂O and 10% D₂O (v/v) at pH 6.8. The pH values of the samples were confirmed after the NMR HDX measurements. For both HP0242 variants, two pH values (7.8 and 6.8) were used to ascertain whether individual residues are in the EX1 or EX2 regime. The HDX rate constants (k_ex) of individual residues were used to derive the protection factors (PFs) using the Excel spreadsheet with built-in parameters available from the Englander group (hx2.med.upenn.edu/download.html). PF is the ratio of the intrinsic HDX rate (k_int) to the observed HDX rate (k_ex), PF = k_int/k_ex. For the EX2 residues, the corresponding PFs were subsequently converted into the free energy of unfolding, ΔG_HDX = RT ln(PF), where R is the gas constant and T is the absolute temperature, so that a PF greater than 1 corresponds to a positive free energy of unfolding.
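The conversion from exchange rates to free energies can be illustrated with a short Python sketch. It assumes a single-exponential decay of peak intensity to a zero baseline for estimating k_ex, which is an assumption of this sketch and is not stated in the text. It takes k_int as an input, since in this work that value comes from the Englander group spreadsheet. It uses the convention ΔG_HDX = RT ln(PF), which yields positive free energies of unfolding for PF > 1, consistent with the positive values reported in Results. All function names and example numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fit_exchange_rate(times, intensities):
    """Estimate k_ex from a log-linear least-squares fit of ln(I) against t.

    Assumes I(t) = I0 * exp(-k_ex * t) with a zero baseline and positive
    intensities; the returned rate has units of 1 / (unit of times).
    """
    logs = [math.log(i) for i in intensities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return -sxy / sxx  # slope of ln(I) vs t is -k_ex


def protection_factor(k_int, k_ex):
    """PF = k_int / k_ex, with both rates in the same units."""
    return k_int / k_ex


def delta_g_hdx(pf, temperature_k=298.0):
    """Free energy of unfolding in kcal/mol for an EX2 residue: RT ln(PF)."""
    return R_KCAL * temperature_k * math.log(pf)


# Synthetic example: a noise-free decay with k_ex = 0.01 per minute.
times = [0.0, 50.0, 100.0, 200.0, 400.0]
intensities = [math.exp(-0.01 * t) for t in times]
k_ex = fit_exchange_rate(times, intensities)
pf = protection_factor(k_int=1.0e4, k_ex=k_ex)  # k_int is a made-up value
dg = delta_g_hdx(pf)
```

In this synthetic case the fit recovers k_ex = 0.01 min⁻¹, the made-up k_int gives PF = 10⁶, and the resulting ΔG_HDX is about 8.2 kcal mol⁻¹ at 298 K. A log-linear fit is used only to keep the sketch free of external dependencies; a nonlinear fit with a baseline term may be more appropriate for noisy data.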