Truncation and mutation of a poorly folded 39-residue peptide has
produced 20-residue constructs that are >95% folded in water at
physiological pH. These constructs optimize a novel fold, designated as the
'Trp-cage' motif, and are significantly more stable than any other miniprotein
reported to date. Folding is cooperative and hydrophobically driven by the
encapsulation of a Trp side chain in a sheath of Pro rings. As the smallest
protein-like construct, Trp-cage miniproteins should provide a testing ground
for both experimental studies and computational simulations of protein folding
and unfolding pathways. Pro−Trp interactions may be a particularly
effective strategy for the a priori design of self-folding peptides.
The a priori design of proteins and the determination of the
minimum requirements for the formation of protein-like structures are currently
the subject of active research, with several notable achievements reported in
the past few years1,2,3,4,5. Excluding systems with disulfide
or metal chelation crosslinks, the smallest natural domains that fold
autonomously have 32−40 residues6,7,8,9. The design of
stable protein-like structures with a smaller number of residues requires both
an exquisite optimization of hydrophobic packing and a strategy to decrease the
backbone configurational entropy of the unfolded state.
There have been successes in the minimization of known folds and the
design of stable miniproteins. Dahiyat and Mayo1 have reported a
28-residue zinc finger mimic that displays a stable fold and some degree of
cooperativity in its melting behavior as monitored by CD. The smallest natural
proteins are WW domains, which have an antiparallel three-stranded β-sheet
and two conserved Trp residues. The Pin WW domain can be truncated to 28
residues and retain two-state folding behavior9. The Serrano
group designed Betanova10, a 20-residue β-sheet that is
12% folded in water at 10 °C (ref. 11).
Well-folded three-stranded β-sheets of 20−29 residues in length have
been designed4,12, but these require D-amino acids to
improve turn propensities. The goal of our fold optimization program is the
production of a 20-residue, self-folding peptide using only the standard set of
L-amino acids. Further, we want the hydrophobic effect13,14, rather than secondary structure stability, to drive structure
formation to impart protein-like folding behavior.
Although there is no universally accepted definition for a 'self-folding
domain'7, the following features are useful diagnostics: (i)
multiple secondary structure elements in close contact; (ii) notable chemical
shift dispersion that predominantly reflects tertiary interactions; (iii) side
chain−side chain packing interactions that produce well-defined
χ1 and χ2 values; (iv) greater backbone
amide exchange protection than that provided by secondary structure; and (v)
some degree of folding cooperativity and resistance to thermal unfolding. Here
we report a 20-residue construct (Fig. 1b) that
displays these characteristics and the fold-minimization strategy that provided
this miniprotein.
Figure 1. Truncation and optimization of the C-terminal fold of
EX4.
a, The structure of EX4 in aqueous 30% TFE. A
helix ribbon is shown for residues 8−28; the side chain atoms of Leu
21, Phe 22 and Trp 25 and all atoms of residues 29−39 are displayed in the
standard CPK color scheme: carbon, black; hydrogen, white; nitrogen, blue; and
oxygen, red. The green spheres represent hydrogens that display large upfield
shifts due to ring current effects. Hydrogen bonds producing exchange
protection are indicated as follows: the thick and thin black lines represent
NHs with hydrogen bond fractional populations 0.9994 and 0.985−0.998,
respectively. Green lines in the substantially frayed N-terminus represent
hydrogen bonds with the fractional population shown. b, CPK model of
the designed miniprotein TC5b. The indole ring is highlighted in magenta; the
strongly upfield-shifted Gly30-Hα2 is shown in green.
Mutational optimization of the 'Trp cage'
The NMR structure15 of the predominantly helical,
39-residue peptide exendin-4 (EX4) from Gila monster saliva served as our
starting point. EX4 appears to be poorly folded in water as it aggregates at
NMR concentrations but is stable and folded (Fig.
1a) in aqueous fluoroalcohol media. The feature that was most
striking about the XFXXWXXXXGPXXXXPPPX (where X is any amino
acid) sequence is the close association of a Gly and three Pro residues (shown
in bold) with the two aromatic side chains. We have designated this fold as the
'Trp cage'. Sequence-remote hydrogens located above and below the indole ring
of Trp 25 display large chemical shift deviations (CSDs) due to ring current
shielding effects. The following CSDs were observed for EX4 in 30%
trifluoroethanol (TFE): -3.00 and -1.44 (Gly 30-α2 and
-α3, respectively), -0.67 (Pro 31-3), -1.82 (Pro
37-), -1.52 (Pro 37-3) and -0.57 p.p.m. (Pro
38-2/3). These shift deviations provide the most direct NMR assay
for the extent of folding of modified sequences.
The unique structure of the Trp-cage motif in EX4 and its apparent lack
of interactions with the N-terminal half of the sequence suggest that the
domain could be extracted and still form a stable structure. Tests of a series
of truncated/mutated sequences (Table 1) in aqueous
fluoroalcohol media indicate that all of the species, with the exception of
Trp-cage construct 1c (TC1c), are 97% folded. Folding in water was
monitored in several ways: the absence of multiple Xaa-Pro
cis/trans isomers, the presence of characteristic long-range NOEs
and diagnostic chemical shift deviations. Of these, the CSDs provide the more
quantitative measures of folding (Table 1). Helicity
could be tracked by the Hα CSDs for the helix (residues 21−27),
with the shifts of 30-α3 and 31-3 monitoring C-capping. The sum of
the upfield shifts for C-terminal Pro resonances (37-3, and
38-2 and -3) monitors Trp-cage formation. The largest values
observed for the 'C-cap' and 'cage formation' measures were equated with '100%
folded'. The upfield shift of Gly 30-α2 reflects both C-capping and
tertiary structure formation.
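The 'cage formation' measure lends itself to a small numerical illustration. The Python sketch below assumes that the fraction folded is simply the summed upfield CSDs of the probe resonances divided by the largest sum observed for any construct (the '100% folded' reference). The function name and every shift value are hypothetical; none of the numbers comes from Table 1.

```python
def fraction_folded(csds_ppm, reference_sum_ppm):
    """Summed probe CSDs divided by the sum taken to represent 100% folded."""
    if reference_sum_ppm == 0:
        raise ValueError("reference sum must be non-zero")
    return sum(csds_ppm) / reference_sum_ppm

# Hypothetical CSDs (p.p.m.) for three C-terminal Pro probe resonances.
reference_csds = [-1.50, -0.60, -0.55]  # made-up construct taken as 100% folded
partial_csds = [-0.36, -0.14, -0.11]    # made-up partially folded construct

cage_fraction = fraction_folded(partial_csds, sum(reference_csds))
print(f"cage formation: {cage_fraction:.0%}")
```

Because both sums carry the same sign, the ratio is positive, and it scales linearly with the population of the folded state only if the probe shifts of the unfolded state sit at their random coil values.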
Table 1. NMR folding measures for truncated sequences
All of the helix that is not buried by the Pro 37-Pro 38 unit can be
eliminated without disrupting the Trp cage, so long as the remaining helix is
stabilized by an N-capping box (TC1 and TC2) or a single N-capping residue
(Asp, Asn or Gly). TC1c was the first species prepared that is readily soluble
in water, but it is only 23% folded based on the CSDs for Pro 37 and Pro 38. In
the partially folded 20-residue constructs, the extent of tertiary structuring
in water correlates with the N-cap propensities (Asp Asn > Gly)16, and all measures of tertiary structuring and helicity are highly
correlated. However, the C-terminal Trp-cage fold of these 20mers is, at most,
40% populated according to the diagnostic chemical shift criteria.
An examination of the NMR structure (data not shown) derived for the TC4a
mutant revealed that the Arg 35 side chain was located close to the Asn 28 side
chain function. An N28D mutation was incorporated (TC5a and TC5b) in hopes of
creating a hydrogen-bonded salt bridge. This mutation was coupled with an E24Q
mutation to avoid a potentially unfavorable coulombic interaction (EXXXD) in
the helix (introducing in its place a pH-dependent helix-favoring QXXXD
interaction)17. The F22Y mutation was included in TC5b because
Tyr side chains are more commonly observed to have hydrophobic stacking
interactions with both Pro and Trp residues in proteins. TC5b is >95% folded
in water, displaying the same CD spectrum with or without the addition of TFE
(Fig. 2a), and melts cooperatively (Fig. 2b). The ring current shifts of TC5b are
remarkable: Pro 37-3 (δ = 0.34 p.p.m.) and Gly 30-α2
(δ = 0.72 p.p.m.) are the two furthest upfield resonances in the
1H-NMR spectrum (D2O buffer at pH*
6 (uncorrected for isotope effect)). The shift of Gly 30-α2 seems to be
the most upfield observation for a non-heme protein. NMR lineshapes and the
concentration independence of all spectroscopic parameters imply a monomeric
structured state. With TC5b, we appeared to have achieved our stated goal, and
further studies focused on this construct.
Figure 2. Folding measures for Trp-cage construct 5b.
a, CD spectra of 66 μM TC5b in pH 7 aqueous
buffer (2 °C) and buffer with 30% (v/v) TFE: residue molar ellipticity
versus wavelength. The curve shape suggests an unrealistically high
level of helicity from a reinforcing Trp side chain chromophore contribution
near the amide n−π* transition. b,
CD-monitored melts for TC5b in pH 7 aqueous buffer with and without the
addition of 30% TFE or guanidine-HCl (6.3 M): [θ]222 versus T (°C). The denatured state data were fit to a line; the
curves through the other data points are polynomial fits with no theoretical
significance. c, Graphs illustrating the correlation of chemical shift
temperature gradients (Δδ/ΔT) with the chemical shift
deviations observed in D2O at 7 °C. All CH resonances
(|CSD| > 0.1 p.p.m.) are shown (solid squares) and are the
basis for the least-squares fit; other CH signals (open circles) in the
C-terminal portion of the structure fall on the same line. The Pro31-3
resonance (red circle) is an outlier (see text). d, Unfolded mole
fraction versus T (°C) from the CD data and NMR, the latter based on
11 fractional chemical shift deviations (23-, 26-, 30-2;
31- and -3; 37-, -3 and -3; and 38-,
-2 and -3). The NMR measures are shown with error bars
(s.e.); greater scatter would be expected for noncooperative unfolding.
The CD values were converted to fraction unfolded assuming the guanidine-HCl
line in (b) represents 100% unfolded and a common temperature gradient
(Δ[θ]222/ΔT = +52° per °C) for the
100% folded baselines. The intercept values of [θ]222 were
-16,100° (buffer) and -17,200° (30% TFE).
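The conversion of CD values to fraction unfolded described for panel d can be sketched as follows, assuming a standard two-baseline treatment. The folded baseline uses the slope (+52° per °C) and the buffer intercept (-16,100°) quoted in the caption. The guanidine-HCl line is not given numerically, so the unfolded intercept and slope below, and the observed ellipticity, are made-up placeholders.

```python
FOLDED_SLOPE = 52.0                  # deg per degC, quoted in the caption
FOLDED_INTERCEPT_BUFFER = -16100.0   # deg, buffer intercept quoted in the caption

def linear_baseline(temp_c, intercept, slope):
    """Ellipticity of a pure state at temp_c, modelled as a straight line."""
    return intercept + slope * temp_c

def fraction_unfolded(theta_obs, temp_c, unfolded_intercept, unfolded_slope,
                      folded_intercept=FOLDED_INTERCEPT_BUFFER,
                      folded_slope=FOLDED_SLOPE):
    """Two-state estimate: position of theta_obs between the two baselines."""
    theta_f = linear_baseline(temp_c, folded_intercept, folded_slope)
    theta_u = linear_baseline(temp_c, unfolded_intercept, unfolded_slope)
    return (theta_obs - theta_f) / (theta_u - theta_f)

# Placeholder unfolded line and observation (NOT values from the paper).
U_INTERCEPT, U_SLOPE = -3000.0, 10.0
f_u = fraction_unfolded(-8248.0, 42.0, U_INTERCEPT, U_SLOPE)
print(f"fraction unfolded: {f_u:.2f}")
```

The result depends strongly on the assumed baselines, which is why the caption states them explicitly.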
TC5b fold characteristics and stability
A structure ensemble for TC5b was generated from 169 NOE distances, of
which 28 were i/i+n (n > 4) constraints. The key long-range NOEs involving
the central Trp residue are illustrated in Fig. 3. Even
though 10 NHs are exchange protected (vide infra), no hydrogen bond
constraints are employed because there is no independent basis for assigning
the electron pair donors involved. This highly conservative data treatment
produced a well-converged structure (Fig. 3a;
Table 2), with an α-helix from Leu 21 to Lys 27
and a short 3₁₀-helix (residues 30−33). The unusual features
are a Gly30-NH hydrogen bond to the i-5 backbone carbonyl, an indole-NHε1
hydrogen bond to the i+10 backbone carbonyl and the placement of Pro rings on
both faces of the indole ring, with the Tyr ring completing the Trp-cage
hydrophobic cluster. The degree of side chain rotamer restriction observed in
the NMR ensemble argues against a fluxional structure. The TC5b structure is
compact and globular (Fig. 1b).
Figure 3. NMR spectra and the structure derived for TC5b.
a, Stereo view of the NMR ensemble (38 of 50
structures for TC5b in pH 7 aqueous buffer (Table 1)).
All atoms are displayed for Tyr 22 (orange), Trp 25 (magenta), Leu 26 (cyan),
Pro 31 (dark red), Pro 36 (black), Pro 37 (green) and Pro 38 (blue). For the
remaining residues, only the backbone is displayed. The heavy-atom pairwise
r.m.s. deviation over the key residues in the Trp cage (Tyr 22, Trp 25, Gly 30,
Pro 31, Pro 37 and Pro 38) is 0.46 ± 0.15 Å. The annotated NOESY
segments b, with added TFE and c, without TFE illustrate the
diagnostic long-range NOEs. This is the same color scheme used in (a),
with Leu 21, Gln 24, Gly 30 and Arg 35 also shown in black. In (b), the
unlabeled line at 7.36 p.p.m. is Ile 23-HN. The key long-range NOEs of TC5b
(for example, 22- to 38-, 22- to 38-2, 22-
to 37-2, 25-1 to 35-/36-/37-, 25-1 to
35-/38-2, 25-2 to 31-3 and 25-2 to
31-/37-2-3) were observed in both media (the
22-/38- NOE does not appear in (c)). Trp
25-H1/H3 and H2/H2 are nearly and completely shift
coincident, respectively, in aqueous buffer. Long-range NOEs to these indole
ring resonances are attributed to individual hydrogen sites based on their
occurrence in other Trp-cage constructs under conditions where the resonances
are not shift coincident. d, TC5b is in a temperature dependent
equilibrium with an 'unfolded state' that does not display random coil shifts
because of residual local hydrophobic cluster formation. The increasingly
negative CSDs of 31-3 and 30-α3 are rationalized because the
'residual' high temperature hydrophobic cluster between Trp 25 and Pro 31
places these two protons further into the shielding region than their location
in the unmelted Trp cage.
For measures of fold stability, we used NH exchange protection.
Helix/coil transition algorithms (for example, AGADIR)18 predict
protection factors 1.5 for the nine-residue (including the capping
residues) helix of TC5b. In agreement with this prediction, truncated peptides
lacking the C-terminal (Pro)3-unit do not display measurable NH
protection (data not shown). In contrast, NHε1 of Trp 25 and nine
backbone NHs (residues 23−28, 30, 33 and 35) of TC5b are significantly
protected (protection factor (PF) = krc / kobs
= 10^(1.01 ± 0.23) at pH* 3.63).
Upon titration to pH* 6.03, the Trp 25-Hε1 protection
factor increases from 10^1.10 to 10^1.83. This is
attributed to the ionization of Asp 28 and fold stabilization through formation
of a salt bridge with Arg 35. The formation of tertiary structures clearly
results in enhanced exchange protection. To our knowledge, TC5b is the first
monomeric peptide of this length to display measurable exchange protection in
water. Upon repeating the pH* 6.03 experiment with the
addition of 30% TFE, protection factors increase (for example, Trp 25
Hε1: t1/2 = 8 h and PF = 10^2.84; Leu 26 HN: t1/2 = 30 h and PF
= 10^3.15), in agreement with the increased
Tm observed by CD and NMR (Fig.
2d).
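The relation PF = krc / kobs links the quoted half-lives and protection factors. As a minimal sketch, assuming first-order exchange (kobs = ln 2 / t1/2), the snippet below back-calculates the random coil rate implied by each quoted pair. The function names are illustrative, and the implied krc values are derived here for orientation only; they are not reported in the text.

```python
import math

def k_obs_per_hour(half_life_h):
    """First-order exchange rate constant from a half-life: k = ln 2 / t1/2."""
    return math.log(2) / half_life_h

def implied_k_rc(half_life_h, log10_pf):
    """Random coil rate implied by PF = krc / kobs."""
    return (10 ** log10_pf) * k_obs_per_hour(half_life_h)

# Half-lives (h) and log10(PF) quoted for TC5b at pH* 6.03 with 30% TFE.
quoted = {"Trp 25 H-epsilon-1": (8.0, 2.84), "Leu 26 HN": (30.0, 3.15)}
for site, (t_half, log_pf) in quoted.items():
    print(f"{site}: kobs = {k_obs_per_hour(t_half):.4f} per h, "
          f"implied krc = {implied_k_rc(t_half, log_pf):.1f} per h")
```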
Trp cage folding cooperativity
We have reported19 a novel graphical test for folding
cooperativity based on CSD values and chemical shift temperature gradients. For
two-state structure/disorder equilibria, a linear correlation is expected
between the CSDs observed at low temperatures and the Δδ/ΔT
values observed during the melting transition; the correlation coefficient is a
measure of cooperativity. Excellent correlations are observed for NH shifts
(data not shown) and for all Hα resonances, with the ring-current shifted
resonances appearing on the same line (Fig.
2c).
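The graphical test amounts to a least-squares line through (CSD, temperature gradient) pairs plus a correlation coefficient. The sketch below implements that with the Python standard library only. The data points are hypothetical, chosen so that upfield (negative) CSDs move downfield on warming, as expected when a single folded state melts; they are not the values plotted in Fig. 2c.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and Pearson r for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical CSDs (p.p.m.) and shift temperature gradients (p.p.b. per degC).
csd = [-3.2, -1.6, -1.4, -0.5, -0.3, 0.2]
gradient = [25.1, 12.9, 11.0, 4.3, 2.2, -1.5]

slope, intercept, r = linear_fit(csd, gradient)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.4f}")
```

A resonance that reports on a different equilibrium, such as one shifted by a residual cluster in the unfolded state, would fall off the line and lower |r|.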
CD studies of melting (Fig. 2b) also
indicate cooperativity. The melting transition midpoint is 42 °C in pH 7
aqueous buffer; CD measures of helix melting are in complete agreement with NMR
CSD melting data from nonhelical portions of the Trp cage (Fig.
2d). TC5b also displays the classic hallmarks of cooperative
unfolding in a guanidine-HCl denaturation titration
([guanidinium]1/2 = 2.2 M at 3 °C and
ΔGU = +8.6 (±0.9) kJ mol-1;
data not shown). This corresponds to 97.5% folded, in agreement with the value
obtained from the Trp25-NHε1 exchange protection factor (98.5%). The NH
protection data, guanidine-HCl titration, thermal CD and NMR melting data are
in remarkably good agreement, which would be expected only for a two-state
folding process.
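The two fraction-folded estimates compared above can be reproduced approximately with a short calculation. This sketch assumes a two-state equilibrium, takes 3 °C as the temperature that applies to the unfolding free energy, and assumes that exchange occurs only from the unfolded state, so that the unfolded fraction equals 1/PF. These assumptions are ours; the text does not spell out the conversion used.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def fraction_folded_from_dg(dg_unfold_kj, temp_c):
    """Two-state folded fraction from an unfolding free energy."""
    k_unfold = math.exp(-dg_unfold_kj / (R_KJ * (temp_c + 273.15)))
    return 1.0 / (1.0 + k_unfold)

def fraction_folded_from_pf(log10_pf):
    """Folded fraction if exchange occurs only from the unfolded state."""
    return 1.0 - 10.0 ** (-log10_pf)

from_titration = fraction_folded_from_dg(8.6, 3.0)  # guanidine-HCl titration
from_exchange = fraction_folded_from_pf(1.83)       # Trp 25 indole NH, pH* 6.03
print(f"titration: {from_titration:.1%}, exchange: {from_exchange:.1%}")
```

Under these assumptions both estimates land between 97% and 99% folded, and the ±0.9 kJ mol-1 uncertainty on the free energy alone spans roughly 96.6% to 98.4%.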
Structure in the absence of the Trp cage
A detailed analysis of thermal chemical shift changes in some of our
constructs has revealed two instances of residual structure in states that lack
the complete Trp cage. In the earlier constructs with a longer helix (TC1a/c
and TC2), the retention of substantial helicity-induced shift deviations for
all of the Hα resonances in the α-helix upon heating is observed in
30% TFE and also (to a lesser extent) in aqueous buffer. The phenomenon is
particularly pronounced for EX4. A direct comparison of the exchange rates for
EX4 and TC5b in pH* 6.03 buffer with 30% TFE quantified this
effect. The Trp 25-NHε1 exchange half-lives are 3.0 and 8.0 h for EX4 and
TC5b, respectively. For TC5b, only the backbone NH of Leu 26 exchanges more slowly
than 25-ε1. In the case of EX4, eight backbone NHs (all in the C-terminal
section of the α-helix) remained after 25-ε1 was >85% exchanged.
The backbone exchange rates for the Ile 23−Lys 27 segment of EX4 indicate
a PF of 10^(5.1 ± 0.7), which is 500-fold greater than the
protection factor of 25-ε1. Thus, in the case of EX4, Trp cage
formation corresponds to the docking of the C-terminal (Pro)3 unit
onto an exposed Trp side chain of a preformed helix.
In truncated, optimized Trp cage constructs, there is no evidence for
retained helicity after thermal loss of tertiary structure; however, there are
still some 'unexpected' CSD changes at Gly 30 and Pro 31 resonances upon
melting and/or mutation. In the optimized constructs (TC5a/b), the Gly
30-CH2 displays dramatic stereo-selective shielding (-3.43 for
Hα2 and -0.96 p.p.m. for Hα3), and Pro 31-3 is modestly
shifted (CSD = -0.26 to -0.47 p.p.m.). In the unoptimized
short constructs (and in constructs with a longer α-helical segment),
30-α3 CSDs as large as -1.56 and 31-3 CSD values as
large as -0.87 p.p.m. were observed. Furthermore, although all of the
other Pro- resonances move uniformly toward their random coil values on
melting, warming aqueous solutions of TC5a/b shifts the 31-3 resonance
farther upfield (the outlier in Fig. 2c). A
structural model of the unfolding of TC5b rationalizes these observations
(Fig. 3d). A similar hydrophobic cluster between
Trp 25 and Pro 31 may serve to C-cap the helix of EX4 in the DPC
micelle-associated state that lacks tertiary structure15.
Implications
If the NMR structure ensemble correctly reflects the structure and the
motions within the folded-state energy well, it should predict the chemical shifts.
Ring current shift calculations were performed using SHIFTS 3.1
(http://www.scripps.edu/case/) and MOLMOL20. The
structure predicts all of the >0.2 p.p.m. CSDs. For large CSDs due primarily
to the indole ring, the observed values are within 14 ± 8% (n =
10) of the SHIFTS 3.1 ring current predictions. Notably the highly
stereo-specific CSDs of the Gly 30-CH2 (-3.43 and
-0.96 p.p.m. observed versus -3.17 ± 0.39 and
-0.91 ± 0.17 calculated from the ensemble) and the large upfield
shifts for Pro 37-Pro 38 sites are reproduced. The agreement obtained using the
NMR ensemble is consistent with a structure that has relatively little
additional motion and remains tightly packed about the indole ring. We have
significantly lowered the size limit for a fully protein-like fold.
The potential for a coulombic interaction between Asp 28 and Arg 35 in
the folded state is the basis for the N28D mutation. At pH 7, TC5a is 7.6 kJ
mol-1 more stable than TC3b. NMR-monitored pH
titrations of TC5a/b suggest an Asp deprotonation G of
5.8−7.5 kJ mol-1. Similar fold stability increments
have been observed upon the electrostatic optimization of protein surfaces21. We view the sequence remote coulombic interaction as the best
rationale; however, additional studies are required to ascertain what portion,
if any, of the stabilization is due to the pH-dependent, helix-favoring QXXXD
interaction17.
The spectroscopic probe (primarily chemical shifts) and co-solvents used
to monitor stability in this fold minimization and optimization study deserve
some comment. CSD analysis seems to be optimal for studying fast folding
systems at the peptide/protein borderline. There has been considerable
controversy concerning the extent to which the peptide structuring effects of
TFE and hexafluoroisopropanol (HFIP)22,23 are pertinent to the
'native' states of peptides and proteins in water. Fluoroalcohol stabilization
of helices has been known for decades, and more recent studies have extended
this to β-hairpins14,24,25 and β-sheets10. With this report, fluoroalcohol-induced increases in 'native
aqueous structure' populations have been observed for a system that owes its
stability to a specific hydrophobic core (the Trp cage).
As previously noted15, the Trp-cage fold is an
intramolecular example of a motif for binding Pro-rich segments to domains that
are involved in signal transduction. Pro−Trp interactions may also be an
effective strategy for fold stabilization. There is some analogy between the
environment of our buried indole ring and that of Trp 11 in the Pin WW domain;
Tyr 22-Trp 25-Pro 36-Pro 37-Pro 38 in the Trp cage equates with Tyr 23-Trp
11-Pro 8-Leu 7-Pro 37 in the WW domain9. Pro residues are
advantageous because their inclusion serves to reduce the entropic advantage of
the unfolded state. It is no coincidence that the smallest natural proteins
also have Pro residues suspended over Trp side chains. Finally, Trp-cage
constructs may prove to be a useful paradigm for protein folding studies in
which both experiments and computational simulations are simplified by the
small size of the structures. As the smallest protein-like systems known, these
constructs should be an excellent testing ground for molecular dynamics
simulations of protein unfolding and folding pathways.
Methods
Peptide synthesis. The sample of EX4 has been described15. All other
peptides were prepared using fast Fmoc chemistry on an ABI 433A peptide
synthesizer and purified by reversed-phase HPLC (C18 column) with a
water + 0.1% TFA:acetonitrile + 0.085% TFA gradient. Purity and sequence were
established from the NMR spectra.
CD spectroscopy and melting studies. CD samples were prepared by dissolving weighed amounts (0.5−2
mg) of lyophilized peptides directly in 15 mM aqueous phosphate (pH
5.9−7.05) or phosphate-acetate (pH 3−4.5) buffer to produce
600 μM stock solutions (determined by UV at 278 nm; ε
= 5,580 cm2 mmol-1 for Trp or 6,760
cm2 mmol-1 for Trp + Tyr). Quantitative
serial dilutions to the required levels of fluoroalcohol and aqueous buffer
were used. CD spectra were recorded in 1 and 10 mm path length cells
(25−70 and 2−10 μM peptide concentrations, respectively) using a
JASCO model J720 spectropolarimeter as reported26. CD spectral
values for peptides are expressed in units of residue molar ellipticity (deg
cm2 dmol-1).
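The UV concentration determination follows the Beer-Lambert law, c = A / (ε·l). A minimal sketch is given below, using the extinction coefficients quoted above (a value in cm2 mmol-1 is numerically equal to one in M-1 cm-1). The absorbance reading, path length and dilution factor are hypothetical, chosen only to illustrate a stock near 600 μM.

```python
EPSILON_TRP = 5580.0      # M^-1 cm^-1 at 278 nm (numerically equal to cm2 mmol-1)
EPSILON_TRP_TYR = 6760.0  # M^-1 cm^-1 at 278 nm for Trp + Tyr peptides

def concentration_um(absorbance, epsilon, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert concentration of the undiluted stock, in micromolar."""
    molar = absorbance / (epsilon * path_cm)
    return molar * dilution_factor * 1e6

# Hypothetical reading: A278 = 0.338 in a 1 cm cell after a 12-fold dilution.
stock_um = concentration_um(0.338, EPSILON_TRP_TYR, path_cm=1.0,
                            dilution_factor=12.0)
print(f"stock concentration: {stock_um:.0f} uM")
```

Measuring a diluted aliquot keeps the absorbance in a range where the detector response is linear; an undiluted 600 μM stock would read above 4 in a 1 cm cell.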
NMR spectroscopy and structure ensemble generation. Solution-state NMR samples were made by dissolving 2−3.5 mg
lyophilized peptide in 450−500 μl aqueous buffer (as listed in CD
methods) with 10% D2O for locking. Structuring shifts are reported
as CSDs (experimental shift minus random coil value)19, with
referencing to internal DSS. NOESY and TOCSY spectra were collected for all
media, with solvent suppression accomplished using the WATERGATE pulse
sequence27. All the spectra were collected on a Bruker DRX
operating at 500 MHz and were readily and unambiguously assigned by standard
methods28. The complete chemical shift assignments for TC5b in
aqueous buffer with and without added TFE have been deposited (BMRB entry
5292). NOE intensities were converted to distance ranges, short
(2.0−3.0 Å), medium (2.5−3.5 Å), long (2.9−4.0
Å) or very long (3.3−5.0 Å), and used in the CNS29 protocol as described15. The weighting was 75 kcal
Å-2 during the final minimization, which included a
Lennard-Jones rather than a purely repulsive van der Waals term. The acceptance
criteria, violations (none) and convergence statistics appear in
Table 2. All structural figures in this report were
prepared using MOLMOL20.
NH exchange studies and protection factor (PF) analysis. Exchange rates were obtained from 1D spectra as the slopes of plots
of ln (NH signal intensity) versus time. The experiments were performed
at 7−9 °C by adding pre-cooled D2O buffer to the
lyophilized peptide sample in a pre-cooled NMR tube. After brief vortex mixing,
the sample was placed in the previously cooled and shimmed NMR probe for
immediate data collection. Additional points were recorded first at 5−15
min intervals and then daily for long exchange times. Assignments for NHs that
were still present 2−4 h after dissolution in D2O buffer were
confirmed by 2D correlations. PF values were obtained using the protocols and
coil reference values used for EX4 (ref. 15).
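Extracting an exchange rate as the slope of ln(NH signal intensity) versus time can be sketched with a plain least-squares fit. The time points and intensities below are synthetic, generated from an assumed 8 h half-life purely to show that the fit recovers the rate; they are not experimental data, and the function name is illustrative.

```python
import math

def exchange_rate(times_h, intensities):
    """kobs (per hour) as minus the least-squares slope of ln(I) versus t."""
    log_i = [math.log(value) for value in intensities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_i) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_i))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return -sxy / sxx

# Synthetic decay: closely spaced early points, then sparse late points.
times = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 24.0]
assumed_k = math.log(2) / 8.0
intensities = [math.exp(-assumed_k * t) for t in times]

k_fit = exchange_rate(times, intensities)
half_life = math.log(2) / k_fit
print(f"kobs = {k_fit:.4f} per h, t1/2 = {half_life:.1f} h")
```

With real spectra the late, low-intensity points are the noisiest after the log transform, so a weighted fit may be preferable.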
Coordinates. Coordinates and NMR distance constraints have been deposited in the
Protein Data Bank (accession code 1L2Y).
Acknowledgments
Initial support came from a feasibility grant from the University
of Washington Royalty Research Fund with continuing support from an NIH grant.
We thank L. Serrano (EMBL-Heidelberg) for reminding us of the pH dependence of
the helix-favoring QXXXD interaction.
Competing interests statement:
The authors declare that they have no competing financial interests.