To address whether proteins fold along multiple pathways, i,i+4 bi-histidine metal binding sites are introduced into dimeric and crosslinked versions of the leucine zipper region of the growth control transcription factor, GCN4. Divalent metal ion binding enhances both the equilibrium and folding activation free energies for GCN4. The enhancement of folding rates quantifies the fraction of molecules that have the binding site in a helical geometry in the transition state. Hence, this new method, termed ψ-analysis, identifies the degree of pathway heterogeneity for a protein that folds in a two-state manner, a capability that is generally unavailable even with single molecule methods. Adjusting metal ion concentration continuously varies the stability of the bi-histidine region without additional structural perturbation to the protein. For dimeric and crosslinked versions, the accompanying changes in kinetic barrier heights at each metal ion concentration map the folding landscape and establish the importance of connectivity in pathway selection. Furthermore, this method can be generalized to other biophysical studies, where the ability to continuously tune the stability of a particular region with no extraneous structural perturbation is advantageous.
Traditionally, protein folding is considered a determinate process, whereby specific intermediates populate along a single pathway. Most1,2,3, but not all4,5, folding experiments are interpreted in the context of a homogeneous transition state (TS) ensemble. Theoretical work, however, has led to a funnel picture in which folding occurs via structurally distinct, heterogeneous routes6,7,8,9. This controversy, termed the 'classical versus new view' debate, has become one of the most discussed in protein folding6,10. The debate has continued because it is experimentally difficult to resolve this issue for two-state folding proteins. For example, mutational Φ-analysis, the most frequently applied method to characterize TSs, is confounded by multiple interpretations of intermediate Φf-values. The energetic effect of an amino acid substitution on the folding activation energy relative to equilibrium stability, quantified as Φf, defines the level of the interaction of the mutated residue in the TS. A Φf-value of zero indicates that the influence of the side chain is absent, whereas a value of one indicates the influence is fully realized. Fundamental ambiguities, however, arise for the fractional values normally observed. Do these values represent a single TS with partially formed structure or a population of heterogeneous TSs?
We resolved this apparent ambiguity for the Trp-containing α-helical coiled coil GCN4-p1' using single and double Ala to Gly substitutions11. We deduced that fractional values in the dimeric protein were due to multiple pathways with helix nucleation sites located along the length of the coil. After inserting a destabilizing mutation at one site, which possessed an intermediate Φf-value and was potentially important in the TS, we conducted Φ-analysis at a second site that had a small Φf-value. The Φf-value at the second site subsequently increased significantly, indicating that helix nucleation had shifted toward this region.
Given the importance of identifying pathway heterogeneity and the ubiquity of fractional Φf-values, we developed a simple method that characterizes the topography of the folding landscape. Divalent metal ion binding sites are engineered at solvent-exposed positions to stabilize specific regions (Fig. 1a,b). The relative enhancement of folding rates with respect to increasing binding energy identifies the percentage of TSs with the metal binding site formed, similar to ligand binding studies at naturally occurring sites12,13. The binding energy required to switch a low flux pathway to a high flux pathway identifies the height of the barrier for the pathway with the metal site formed relative to all other pathways. Hence, within the context of a single protein, this method can distinguish single pathway models with partial structure formation in the TS from multiple pathway folding models and map the free energy reaction surface.
Figure 1. Designed metal ion binding variants of the coiled coil.
Depictions of crosslinked coiled coil with the 3,7-biHis metal binding site and either a, an N-terminal CGG disulfide tether containing the A24G mutation or b, a C-terminal GGC disulfide tether. Dimeric versions of either crosslinked species are identical peptides except each lacks the terminal Cys residue required to make the crosslink. Graphics were rendered by Swiss-Prot Protein Viewer (Glaxo Wellcome Experimental Research) and the Persistence of Vision ray trace program (POV-Ray). c, Changes in equilibrium stability upon addition of the divalent metal ion, Co2+, obtained from standard GdmCl denaturation melts (triangle) and folding kinetics (circle) for the N-terminal crosslinked A24G form of GCN4. Inset: Standard GdmCl denaturation melts in varying concentrations of Co2+. The 4% baseline variance in CD level at low denaturant is nonsystematic and presumably from a slight variation in protein concentration, rather than a structural perturbation. This interpretation is supported by a measurement on the dimeric A24 version, where any potential effect of the N-terminal tether is removed. The addition of metal stabilizes the dimeric version by 2.8 kcal mol-1, which increases the folded population by 4% and accounts for nearly all of the 5−6% change in CD signal.
Equilibrium binding
We introduced a pair of bi-histidine (biHis) metal binding sites in an i,i+4 configuration14 into GCN4-p1' (Fig. 1a,b). This design is borrowed from zinc finger motifs possessing two His residues on a helix, which coordinate the Zn2+ ion along with two Cys residues on a β-turn. In the presence of Co2+ or Zn2+ at pH 7.5, where His residues are deprotonated, metal binding stabilizes the coiled coil by up to 3.6 kcal mol-1 (Fig. 1c; Table 1). The degree of stabilization for the two independent sites, ΔΔGeq, depends upon the difference between the metal dissociation constants of the biHis sites in the native state, KNeq, and in the denatured state, KUeq.
Table 1. Equilibrium and kinetic parameters for divalent metal ion binding
For Co2+, binding constants in the native and denatured states are in the 50 μM and 1 mM ranges, respectively. Although binding is 5- to 10-fold stronger for Zn2+, the ratio of binding constant in the native state to the denatured state is less, resulting in less added stability.
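As a rough illustration of how the native and denatured state dissociation constants translate into added stability, the sketch below uses the standard thermodynamic linkage relation for n independent sites, ΔΔGeq = n RT ln[(1 + [M]/KN)/(1 + [M]/KU)]. This relation, the function name, and the round values KN = 50 μM and KU = 1 mM (taken from the ranges quoted above for Co2+) are assumptions made for illustration, not the fitted parameters of Table 1.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T_KELVIN = 283.15   # 10 degrees C, the temperature used in this study


def ddg_eq(metal_molar, k_native, k_denatured, n_sites=2, temp=T_KELVIN):
    """Stabilization (kcal/mol) from metal binding to n independent sites.

    Assumed linkage model: each site binds with dissociation constant
    k_native in the folded state and k_denatured in the unfolded state.
    """
    ratio = (1.0 + metal_molar / k_native) / (1.0 + metal_molar / k_denatured)
    return n_sites * R_KCAL * temp * math.log(ratio)


if __name__ == "__main__":
    # Illustrative metal concentrations in mol/L
    for conc in (0.0, 1e-4, 1e-3, 1e-2):
        value = ddg_eq(conc, 50e-6, 1e-3)
        print(f"[M2+] = {conc:8.1e} M   ddG_eq = {value:5.2f} kcal/mol")
```

In this model the stabilization saturates at n RT ln(KU/KN), roughly 3.4 kcal mol-1 for these round values, which is in the same range as the maximal stabilization reported above.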
For the N-terminally crosslinked A24G form, the circular dichroism (CD) signal at 222 nm (helical structure) for the native state is insensitive to metal concentration (Fig. 1c, inset). Although the baselines in the denaturation profiles have a standard deviation of 4%, there is no systematic relation to metal concentration, indicating that metal binding to the biHis site does not introduce any appreciable structural perturbation. The CD signal for the denatured state is unchanged at different cation concentrations, indicating that binding to the unfolded state does not occur in a helical geometry. Addition of 1 mM Co2+ to wild type GCN4-p1', which contains a single solvent accessible His on each helix, fails to alter stability, indicating that metal-induced stabilization is specific to i,i+4 biHis substitution. This result suggests that non-neighboring surface His residues could be retained in future studies. However, we introduced an H18N substitution to completely preclude any unwanted complications.
Kinetic studies on crosslinked GCN4
BiHis metal binding studies are analogous to mutational Φ-analysis; however, instead of an amino acid substitution, divalent metal ion concentration is varied to perturb the stability of a specific region. The degree to which the region is structured in the TS at a given metal concentration is the relative effect on folding activation free energy versus the change in equilibrium stability, ΦfM2+ = ΔΔG‡f / ΔΔGeq. We first illustrate this method with the N-terminal crosslinked version (Fig. 1a), because its folding behavior is straightforward, occurring along a single robust pathway, unlike that for the dimeric version, which exhibits a more complex, multiple route behavior5. According to our mutational studies on the N-terminal crosslinked version, the region nearest the crosslink is structured in the TS, whereas the region farthest from the tether is unstructured.
For biHis metal binding sites located near the N-tethered end (residues 3 and 7), the addition of metal accelerates folding rates only, accounting for the entire change in equilibrium stability. In the denaturant dependence of activation energy plots, the folding arm increases with increasing metal concentration while the unfolding arm remains unchanged (Fig. 2a). The ΦfM2+-value is unity at all metal concentrations (Fig. 3a). Hence, the folding pathway is robust and homogeneous, where the entire TS ensemble forms a helix at the 3,7-biHis region (Fig. 3a).
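The arithmetic behind each metal-dependent value can be sketched as follows: the change in folding activation free energy is obtained from the ratio of folding rates with and without metal, and is then divided by the change in equilibrium stability. The function names and the example rates are hypothetical and chosen only to illustrate the calculation.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T_KELVIN = 283.15   # 10 degrees C


def ddg_activation(k_fold_metal, k_fold_ref, temp=T_KELVIN):
    """Reduction in the folding barrier (kcal/mol); positive if metal speeds folding."""
    return R_KCAL * temp * math.log(k_fold_metal / k_fold_ref)


def phi_metal(k_fold_metal, k_fold_ref, ddg_eq, temp=T_KELVIN):
    """Ratio of barrier reduction to added equilibrium stability."""
    if ddg_eq == 0:
        raise ValueError("ddg_eq must be nonzero")
    return ddg_activation(k_fold_metal, k_fold_ref, temp) / ddg_eq


# Hypothetical numbers: folding speeds up 35-fold while stability rises by 2.0 kcal/mol
example = phi_metal(k_fold_metal=3500.0, k_fold_ref=100.0, ddg_eq=2.0)
print(f"phi = {example:.2f}")
```

With these made-up inputs the ratio is close to one, the situation in which the whole stability gain appears in the folding rate; an unchanged folding rate would give zero.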
Figure 2. Denaturant dependence of folding at different metal concentrations.
Kinetic activation energy plots for a, N-terminal crosslinked A24G; b, C-terminal crosslinked A24; c, dimeric A24G; and d, dimeric A24. All experiments are conducted in 20 mM HEPES, 150 mM NaCl, pH 7.5, at 10 °C.
Change in folding rates versus changes in stability due to metal binding for a, the N-terminal crosslink; b, C-terminal crosslink; c, dimeric A24G; and d, dimeric A24. Linear fits, shown with solid lines (a,b) and dotted lines (c,d), are for a homogeneous pathway. Hyperbolic relationships (c,d) represent a heterogeneous model for dimeric GCN4 (Eq. 4). Insets: Relative flux, ψf, through the metal-bound pathway is plotted as a function of the change in stability (Eq. 5). Because the crosslinked versions (a,b) have robust homogeneous pathways, their respective ψf-values are constant. In contrast, the dimeric proteins (c,d) have ψf-values that vary from 0 to 1 according to ρ0 and the added stability. Folding energy landscapes are modeled using ρ0 and Φf-values with two pathways with either N- or C-terminal helix nucleation. The unfolded ensemble occupies the outer ring. When GCN4-p1' is crosslinked at the N-terminus (a), the surface possesses a smaller barrier in the absence of M2+ for the N-terminal route with respect to the C-terminal route. Upon the addition of M2+, the barrier of the N-terminal route is lowered, and all the pathways continue to nucleate at the N-terminus. When M2+ is added to the C-terminal crosslinked version (b), all the flux nucleates at the C-terminus, because the N-terminal barrier is too large even following 3 kcal mol-1 stabilization. For the dimeric A24G version (c), the N- and C-terminal barriers are nearly equivalent in the absence of metal, and 50% of the ensemble nucleates at either end. But at high concentrations of M2+, the N-terminal, metal-stabilized route dominates. For the A24 dimeric version (d), nucleation principally occurs at the C-terminus in the absence of M2+. High concentrations of M2+ stabilize the N-terminal barrier to a level comparable to the C-terminal barrier, resulting in nucleation occurring equally at both termini. Graphics were created in Mathematica 4.0 (Wolfram Research, Inc.).
To investigate the influence of connectivity, the crosslink is moved to the C-terminus, leaving the 3,7-biHis site intact (Fig. 1b). For this version, the ΦfM2+-value is zero in saturating concentrations of Zn2+ or Co2+ (Fig. 2b). This result indicates that folding remains homogeneous, and helix nucleates only at the tethered end, far from the biHis site (Fig. 3b). This shift is expected, because the metal site is now at the opposite end and distant from the crosslink. Hence, for the crosslinked coiled coil, folding is homogeneous with pathway selection dominated by chain connectivity (Fig. 3b).
Quantifying heterogeneity: ψ-analysis
Although the interpretations are straightforward for the ΦfM2+-values of zero and unity of the crosslinked versions, the folding of dimeric GCN4 is complex (Fig. 2c,d). First, a complete derivation, similar to Brønsted analysis1, is presented for heterogeneous scenarios with fractional Φf-values, as observed in the dimeric coiled coil. Folding rates are calculated assuming two pathways, one where the biHis site is formed (k1) and a second that represents the sum of all other pathways (k2), where the observed folding rate is

$$ k_f = k_1 + k_2 $$
In the absence of added metal, the relative flux is defined as ρ0 = k2 / k1. Stabilization of the TS from the first pathway by metal binding, ΔΔGfM2+, enhances k1 and increases the overall rate according to:

$$ k_f^{M^{2+}} = k_1 \, e^{\Delta\Delta G_f^{M^{2+}}/RT} + k_2 $$
A term, Φf, is introduced to account for the possibility of fractional stabilization by the metal ion binding site in the TS, ΔΔGfM2+ = Φf ΔΔGeq. Having ρ0 and Φf parameters resolves the inherent ambiguity in Φ-analysis: fractional Φ-values can result from multiple pathways (ρ0), from partial structure formation (Φf) in the TS, or from both. Empirically, for the solvent exposed biHis site in the N-terminal crosslinked version (Fig. 1a), the slope is unity; therefore, Φf is unity (Fig. 3a; Table 2).
Table 2. Heterogeneous and homogeneous folding model fit parameters1
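Upon addition of metal, the net reduction of the folding activation energy is:

$$ \Delta\Delta G_f^{\ddagger} = RT \ln\frac{k_f^{M^{2+}}}{k_f} = RT \ln\left(\frac{e^{\Phi_f \Delta\Delta G_{eq}/RT} + \rho_0}{1 + \rho_0}\right) \qquad (4) $$

where ρ0 = k2 / k1 is the relative flux in the absence of metal and Φf is the fractional stabilization of the binding site in the TS.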
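A minimal sketch of the two-pathway relation described in this section, assuming the metal-free flux ratio k2/k1 (called `rho0` in the code) and a fractional stabilization `phi_f` of the biHis pathway. The function and parameter names are illustrative assumptions rather than the fitting code used for Table 2.

```python
import math

R_KCAL = 1.987e-3           # gas constant, kcal mol^-1 K^-1
T_KELVIN = 283.15           # 10 degrees C
RT = R_KCAL * T_KELVIN      # about 0.56 kcal/mol


def ddg_fold_barrier(ddg_eq, rho0, phi_f=1.0, rt=RT):
    """Net barrier reduction for k_obs = k1*exp(phi_f*ddg_eq/RT) + k2, rho0 = k2/k1."""
    boost = math.exp(phi_f * ddg_eq / rt)
    return rt * math.log((boost + rho0) / (1.0 + rho0))


if __name__ == "__main__":
    # rho0 = 0: only the biHis route exists; larger rho0: the other routes dominate
    for rho0 in (0.0, 1.0, 80.0):
        row = [round(ddg_fold_barrier(x, rho0), 2) for x in (0.0, 1.0, 2.0, 3.0)]
        print(f"rho0 = {rho0:5.1f}: {row}")
```

With `rho0 = 0` the relation reduces to a straight line of slope `phi_f`, the homogeneous case, whereas a large `rho0` gives a curve that stays nearly flat until the added stability becomes comparable to RT ln(rho0).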
For standard mutational Φ-analysis, the change in stability is a discrete value for a single mutation because it is derived from a single comparison to wild type protein. With biHis sites, analogous ΦfM2+-values can be calculated at each metal concentration, suitably referenced to the metal-free condition.
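However, a much more informative quantity, ψf, may be calculated for each incremental change in metal ion concentration. This differential quantity is the infinitesimal change in activation energy relative to the infinitesimal change in equilibrium stability:

$$ \psi_f = \frac{\partial \Delta\Delta G_f^{\ddagger}}{\partial \Delta\Delta G_{eq}} = \frac{\Phi_f}{1 + \rho_0 \, e^{-\Phi_f \Delta\Delta G_{eq}/RT}} \qquad (5) $$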
ψf is the slope of the tangent to the curve of ΔΔG‡f versus ΔΔGeq (Fig. 3). In the simplified case of a homogeneous and robust folding pathway (that is, ρ0 is either large, ≥100, or small, ≤0.1), ψf is either a constant of zero or Φf, depending on whether none of the pathways or all of the pathways, respectively, have metal binding sites formed in the TS. For example, ψf for the N-terminal crosslinked version is unity as the N-terminal biHis region is always structured (Fig. 3a). Conversely, ψf is zero for the C-terminal crosslinked version as the N-terminal biHis region is completely unstructured at the rate-limiting step (Fig. 3b).
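To make the tangent interpretation concrete, the sketch below evaluates the instantaneous slope both analytically and by a finite difference on the two-pathway barrier expression. It assumes the two-pathway model of this section; `rho0`, `phi_f`, and the function names are labels chosen here for illustration.

```python
import math

RT = 1.987e-3 * 283.15  # kcal/mol at 10 degrees C


def barrier_change(ddg_eq, rho0, phi_f=1.0):
    """Barrier reduction for two parallel pathways (rho0 = k2/k1 without metal)."""
    return RT * math.log((math.exp(phi_f * ddg_eq / RT) + rho0) / (1.0 + rho0))


def psi(ddg_eq, rho0, phi_f=1.0):
    """Analytical slope d(barrier_change)/d(ddg_eq)."""
    return phi_f / (1.0 + rho0 * math.exp(-phi_f * ddg_eq / RT))


def psi_numeric(ddg_eq, rho0, phi_f=1.0, step=1e-5):
    """Central finite-difference estimate of the same slope."""
    upper = barrier_change(ddg_eq + step, rho0, phi_f)
    lower = barrier_change(ddg_eq - step, rho0, phi_f)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    for rho0 in (0.01, 1.0, 80.0):
        print(rho0, [round(psi(x, rho0), 3) for x in (0.0, 1.5, 3.0)])
```

For `rho0 = 1` the slope starts at 0.5, and for `rho0 = 80` it starts at 1/81, about 0.012; in both cases it rises toward `phi_f` as the site is stabilized, while a very small `rho0` keeps the slope near `phi_f` throughout.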
Pathway heterogeneity in dimeric GCN4
The pathway homogeneity observed for the crosslinked coiled coil is in contrast with pathway heterogeneity observed for dimeric GCN4 (ref. 5), thereby providing a test of the ability of the method to identify and quantify pathway heterogeneity. Initially, low concentrations of metal have a negligible impact on folding rates of 3,7-biHis dimeric GCN4 (Fig. 2d), and ψf remains near zero (Fig. 3d). Generally, ψf represents the fraction of molecules that nucleate at the biHis region relative to all molecules that fold. Hence, the biHis route is a very minor pathway in the absence of metal. As metal concentration is increased, the N-terminal region is stabilized. More molecules nucleate at the N-terminal end, and ψf increases to 0.5 at high metal concentrations (Fig. 3d). Although ψf should approach unity with further stabilization of the binding site, it does not because of the limited experimental range of ΔΔGeqM2+, which saturates at high metal concentrations. A detailed analysis of the metal dependence indicates that the biHis-containing pathway is 80-fold less populated than all other routes in the absence of metal (Table 2; Fig. 3d).
The difference between the relative flux observed in the present work (biHis, 80:1) and that observed in the previous study of the wild type protein (6:1) is due to the introduction of the biHis site. The two His residues reduce the intrinsic helicity of the region by >10-fold. Thus, in the biHis version, some of the flux normally at the N-terminus has shifted back towards the C-terminus. This shift explains the reduction in the observed pathway heterogeneity in the absence of metal for the biHis protein.
The second dimeric protein examined is the A24G version because most nucleation events now occur near the N-terminus for this system. At low concentrations of metal, the ψf-value is 0.5, indicating that about one-half of the nucleation events occur in the biHis region (Fig. 3c). This uniform distribution occurs despite the fact that the C-terminus has most of the intrinsic helicity5. As the metal concentration is increased, the N-terminal region is further stabilized and nearly all molecules fold with helix nucleated at the N-terminus, as evidenced by the ψf-value that increases to unity (Fig. 3c).
A comparison of dimeric A24 and A24G versions indicates that the change in the degree of pathway heterogeneity recapitulates the difference in their equilibrium stability. The A24G mutation is responsible for a shift in the amount of flux going through regions other than the N-terminus, because the ρ0-value shifts from 80:1 to 1:1 (Table 2). The ratio of the heterogeneity in these two versions in the absence of metal reflects the loss in stability for this mutation.
Thus, the A24G mutation is responsible for the 80-fold change in ρ0-value (2.5 kcal mol-1). This shift is consistent with the decrease in stability for the Gly substitution in either the biHis or wild type5 backgrounds (1.7 ± 0.1 or 2.4 ± 0.1 kcal mol-1, respectively), implying that the region containing the 24th residue is the major alternative nucleation site (Fig. 3).
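The conversion between a fold change in relative flux and a free energy is a one-line calculation. The sketch below assumes the experimental temperature of 10 °C and uses a helper name invented here.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def flux_ratio_to_energy(rho_a, rho_b, temp_kelvin=283.15):
    """Free energy (kcal/mol) equivalent to changing the flux ratio from rho_b to rho_a."""
    return R_KCAL * temp_kelvin * math.log(rho_a / rho_b)


# 80:1 (A24) versus 1:1 (A24G)
print(round(flux_ratio_to_energy(80.0, 1.0), 1))  # prints 2.5
```

RT ln 80 at 283 K is about 2.47 kcal mol-1, which rounds to the 2.5 kcal mol-1 quoted in the text.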
Properties of the transition state
Numerous characterizations of the dimeric and crosslinked forms of the coiled coil indicate that about one-third to one-half of the helix is nucleated in the TS5,15. Our initial estimate of one-third helical content resulted from the low Φf-value of 0.16 for the A14G comparison near the center of the coiled coil11. An H/D backbone amide isotope effect study pioneered by our group16 quantified the extent of hydrogen bonding in the TS, with a ΦfD-H-value of 0.6. Potentially, non-native H-bond formation, possibly 3₁₀-helix, or backbone desolvation in the TS, as suggested by our trifluoroethanol studies17 and other recent results18, causes the apparent overestimate of the helical content.
In summary, these data indicate that partial desolvation of the polypeptide is a critical element of the rate-limiting step. Desolvation of the backbone is energetically costly, but helix formation offsets this energetic penalty. In order to condense hydrophobic side chains of two amphipathic helices, i,i+4 amides and carbonyls form H-bonds because the clustering of the hydrophobic moieties expels competing water molecules from the amide/carbonyl interface. This desolvation would otherwise leave H-bonding groups unsatisfied in the nascent hydrophobic core. As the initial buried backbone H-bonds form, the solvent exposed H-bonds on the opposite face of the helix concomitantly form. In this view, pairs of amphipathic helices provide an efficient geometry to bury hydrophobic surface while satisfying H-bonding requirements16.
Metal binding and Φf-values
We conclude that metal binding necessitates authentic helix formation. We recovered native strength metal binding in the kinetics experiments (Fig. 1c) and found that the fraction of interaction for the metal binding site, the Φf-value, was unity. This interpretation is consistent with our D/H isotope effect results16 and other mutational studies5,15. Directly connecting Φf-values and helix formation implies that fractional values equate to the degree of pathway heterogeneity, that is, the fraction of TSs with helix formed at the biHis site. Hence, this method overcomes a number of deficiencies of standard mutational studies: (i) partial Φf-values may represent either pathway heterogeneity or partial structure formation; (ii) additional structural perturbations are introduced, especially in core mutations; (iii) side chain interactions in the TS often are unknown; and (iv) helix formation is inferential and often complicated by tertiary interactions or, in the case of Ala/Gly comparisons, by the fact that backbone configurational entropy is probed.
Studies of heterogeneity
For two-state proteins other than GCN4, limited evidence supports the possibility of pathway heterogeneity. Baker et al.19 observed that the topologically identical, small IgG binding domains of protein G and protein L nucleate at either of two different, but symmetrically related, turns. Mutational studies indicate that nucleation can occur at either turn and reflects their relative stability. Although mutational Φ-analysis can identify multiple pathways, both the Serrano2,20 and Baker3 groups found no evidence for a shift in pathway upon the destabilization of elements formed in the TS of SH3 domains. Similarly, loop insertion studies did not detect pathway heterogeneity3,21. However, homologs of SH3 fold through different TSs19,22.
When Fersht et al. encountered the ambiguity of assessing pathway heterogeneity in chymotrypsin inhibitor 2 (CI2) folding studies1, they concluded that a homogeneous pathway could explain the data. Baker et al.23, however, reanalyzed the solvent exposed portion of the CI2 data and concluded that the ambiguity persists regarding whether partial Φ-values are due to pathway heterogeneity. Their analysis conceded that mutagenesis methods are insufficient to generate the 4 kcal mol-1 free energy change required to establish the appropriate folding model for protein L.
Recent work has investigated the role of topology either through crosslinking or circular permutation. Topological changes sometimes24,25,26, but not always3,27, result in different TSs. The existence of distinct TSs reveals that there is structural diversity at the rate-limiting step. However, such methods neither imply that heterogeneity exists for the same version of the protein nor quantify the diversity of TSs.
Proteins that fold in a two-state manner are well suited for developing single molecule methods. However, single molecule methods will not detect pathway heterogeneity. Unfolded ensembles typically equilibrate faster than the folding rate. Only a single relaxation process is observable, where the rate is the sum of the rates for all pathways. Thus, individual routes are not identifiable, unlike measurements on systems that have kinetically distinct starting configurations and exhibit multi-exponential relaxation processes28.
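The kinetic argument can be illustrated with a toy calculation: when a single, rapidly equilibrated unfolded population drains through two parallel irreversible routes, the unfolded fraction decays as one exponential whose rate is the sum of the pathway rates, and only the final product partition reflects the individual routes. The rate constants below are arbitrary illustrative numbers, and unfolding is ignored for simplicity.

```python
import math


def simulate(k1, k2, t_end, dt=1e-5):
    """Euler-integrate U -> F1 (rate k1) and U -> F2 (rate k2), starting from U = 1."""
    u, f1, f2 = 1.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d1 = k1 * u * dt   # amount folding via pathway 1 in this time step
        d2 = k2 * u * dt   # amount folding via pathway 2 in this time step
        u -= d1 + d2
        f1 += d1
        f2 += d2
    return u, f1, f2


if __name__ == "__main__":
    k1, k2 = 20.0, 80.0  # hypothetical pathway rates, s^-1
    for t in (0.005, 0.01, 0.02):
        u, f1, f2 = simulate(k1, k2, t)
        single_exp = math.exp(-(k1 + k2) * t)
        print(f"t = {t:.3f} s  U = {u:.4f}  exp(-(k1+k2)t) = {single_exp:.4f}  F2/F1 = {f2 / f1:.2f}")
```

The observable decay carries only the summed rate; the ratio F2/F1 stays fixed at k2/k1 at every time point, so the time course alone does not separate the two routes.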
Conclusion
Engineered biHis sites can address questions regarding pathway heterogeneity and helix formation in TSs. By spanning a larger energy range than possible with standard mutations in a manner that does not perturb the structure, we identified minor pathways representing 1% of the flux. Furthermore, the method explores the free energy reaction surface, because a continuous change in metal ion concentration continuously varies the stability of the specific region. We observed that the translational symmetry of the dimeric version of the coiled coil results in multiple helix nucleation sites. Intrinsic helicity of the sequence, however, is not the sole determinant of the site of nucleation. In both the N- and C-tethered versions, translational symmetry is broken, and folding occurs along a single, dominant pathway that is exclusively determined by tether location. Although the introduction of a metal binding site in the core of a protein may prove difficult, we have successfully engineered a biHis binding site across a β-strand in mammalian ubiquitin (data not shown). Hence, we anticipate that surface metal sites generally can be introduced throughout many proteins, thereby permitting investigations of a wide variety of folding issues including the importance of local versus nonlocal interactions.
Methods
Peptides. 3,7-biHis versions of GCN4-p1' (Ac-RMHQLEHKVEELLSKNWNLENEVARLKKLVGER-NH2) were synthesized with Y17W for fluorescence. Crosslinked versions5 contain either an additional CGG or GGC linker at the N- or C-terminus, respectively. Dimeric versions lack the terminal Cys residue, preventing unwanted oxidation at pH 7.5. Peptide concentrations are determined using ε280 = 5,700 M-1 cm-1. Mass spectrometry confirmed all protein samples.
Equilibrium. Equilibrium stability was determined from GdmCl denaturation using CD222 at 2 nm resolution. Peptide concentrations ranged from 2 to 42 μM in Buffer A (20 mM HEPES and 150 mM NaCl at pH 7.5) at 10 °C.
Stopped-flow. Experiments used a Biologic stopped-flow apparatus29 at 10 °C in Buffer A. Protein concentrations ranged from 0.1 to 32 μM. Metal ion stock solutions (1.0 M CoCl2 or 0.25 M ZnCl2) were added in equal concentrations to all buffers.
Acknowledgments
We thank S. Meredith, X. Fang, T. Pan, N. Kallenbach, S.W. Englander, D. Baker, A. Fernández, A. Kossiakoff and our group members for numerous enlightening discussions. We also thank G. Reddy for peptide synthesis supported by a grant from the National Cancer Institute. This work was supported by grants from the National Institutes of Health and The Packard Foundation Interdisciplinary Science Program (T.R.S., P. Thiyagarajan, S. Berry, D. Lynn and S. Meredith).