Abstract
The binding of intrinsically disordered proteins to globular ones can require the folding of motifs into α-helices. These interactions offer opportunities for therapeutic intervention but their modulation with small molecules is challenging because they bury large surfaces. Linear peptides that display the residues that are key for binding can be targeted to globular proteins when they form stable helices, which in most cases requires their chemical modification. Here we present rules to design peptides that fold into single α-helices by instead concatenating glutamine side chain to main chain hydrogen bonds recently discovered in polyglutamine helices. The resulting peptides are uncharged, contain only natural amino acids, and their sequences can be optimized to interact with specific targets. Our results provide design rules to obtain single α-helices for a wide range of applications in protein engineering and drug design.
Introduction
Proteins are central to biology as they carry out a wide range of essential functions, from gene regulation to enzymatic catalysis, where their ability to specifically interact with other biomolecules is crucial. In pharmacology, inhibiting their interactions using drug-like small molecules is a common approach to modulate biological functions relevant to disease. In cases where the binding partner is another protein, the binding interfaces are usually flat and extended1, making it challenging to inhibit the interactions with small molecules2, and it is generally preferable to target them with antibodies. Yet, despite recent progress in intracellular antibody delivery3, their clinical applications have been limited to targeting extracellular proteins, highlighting the need to develop new molecular tools to inhibit intracellular protein-protein interactions.
Peptides have some of the advantages of small molecules, such as their ease of synthesis, and some of the advantages of antibodies, such as their relatively large size. Peptides, therefore, have at least in principle great potential as modulators of protein-protein interactions for pharmacological applications4,5. Protein-protein interactions where one partner is in a helical conformation are particularly common and especially amenable to inhibition by peptides: an excised linear peptide comprising a suitable sequence can, in principle, inhibit the interaction if it can bind to its partner with high affinity6. Linear peptides have a low propensity to fold into stable α-helices, however, and the entropic cost of folding decreases both their affinity for their targets and their stability against proteolytic degradation, highlighting the need to develop new tools to stabilize their helical conformation.
Introducing non-natural amino acids that act as potent N-caps7 or substituting the i + 4 → i hydrogen bonds stabilizing this secondary structure by covalent, and therefore permanent, surrogates8 can be used to achieve this goal. Alternatively, specific amino acids can be introduced at positions i and i + 3, i and i + 4 or i and i + 7, that are close in space in α-helices, and linked by different means9,10 such as by peptide stapling11,12. Peptide stapling is based on the use of pairs of synthetic α-methyl, α-alkenyl amino acids at relative positions i,i + 3, i,i + 4 or i,i + 7 where their side chains can react by ring-closing metathesis when the peptide folds into an α-helix, thus greatly stabilizing this conformation11,12; in some cases, two such connections have been concatenated to obtain especially long and stable helices13. Although these approaches have shown applicability to inhibit protein-protein interactions, they have drawbacks that limit their range of applicability, such as their costly synthesis, limited solubility, and high rigidity.
We recently reported that Gln side chain to main chain hydrogen bonds stabilize the polyglutamine (polyQ) helix, a helical secondary structure formed by the polyQ tract of the androgen receptor (AR)14,15. In this structure both the main and side chain amide groups of Gln residues at position i donate hydrogens to the main chain CO group of the residues at position i-4; the strength of this bifurcated interaction depends on the residue at position i-4, with Leu performing particularly well. Equivalent interactions have also been observed in the polyQ tract of the protein huntingtin, indicating that they are not specific to AR and can therefore be used in design16. As the relative donor and acceptor positions of such interactions are equivalent to those of the residues linked covalently in stapled peptides, we reasoned that side chain to main chain hydrogen bonds could be used to stabilize the α-helical conformation of linear peptides to enhance their interaction with globular proteins.
Here, to address this hypothesis we design a series of linear, uncharged, and highly soluble peptides and confirm their cooperative folding into single α-helices under physiological conditions. To evaluate the versatility and range of applicability of the design rules we analyze their tolerance towards changes of the acceptor residue, with bulky hydrophobic residues performing best, and introduce a pH-dependent conformational switch; in addition we explore how such bifurcated hydrogen bonds can be combined with electrostatic interactions between side chains to further stabilize α-helices. Most importantly, we show that the sequences of such peptides can be tailored to interact with specific globular proteins. In summary, the simple design rules that we propose can be used to engineer a class of linear, cooperatively folded helical peptides for use as templates in applications in pharmacology, materials science, synthetic biology and, more generally, in bioengineering.
Results
Design of Gln-based single α-helices
The helicity of peptides and intrinsically disordered (ID) proteins can be predicted by secondary structure prediction algorithms such as Agadir17. This algorithm, similar to the Zimm-Bragg18 and Lifson-Roig19 helix-coil transition models, is based on statistical mechanics. The statistical weight of the helical state of any peptide segment depends on the free energy of it folding into an α-helix, that Agadir computes as the sum of different terms including one accounting for interactions between residues at positions i and i + 4, close in space in α-helices, that requires experimental parametrization. The current version of Agadir does not account for the Gln to Leu (Glni+4 → Leui) side chain to main chain interactions stabilizing polyQ helices and therefore underestimates the helicity of the polyQ tract in AR: peptide L4Q16, excised from AR and harboring four such interactions, has 38% helical propensity according to NMR experiments while Agadir predicts only 3%14. To address this we introduced an additional term to the free energy of folding into an α-helix accounting for this interaction (ΔGLQi,i+4) and by minimizing the prediction error with respect to the NMR-derived helicity (RMSDHel, see Supplementary Methods) obtained ΔGLQi,i+4 = −0.6 kcal mol−1 for L4Q16, in the range expected for one hydrogen bond in water20 (Supplementary Fig. 1a). To our surprise we found that the value of ΔGLQi,i+4 that minimizes RMSDHel depends on polyQ tract length: it increases from −0.4 kcal mol−1 for L4Q8 to −0.7 kcal mol−1 for L4Q20 (Supplementary Fig. 1b). The effective strength of the Glni+4 → Leui interactions in AR peptides depends thus on the number of equivalent interactions following them in the sequence, suggesting cooperativity.
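To put the fitted ΔGLQi,i+4 values in perspective, the following minimal sketch converts a free energy into the Boltzmann statistical weight, exp(−ΔG/RT), that such a term would contribute in a helix-coil model. It is not the Agadir implementation: the function name `boltzmann_weight` is hypothetical, and the temperature of 278 K is an assumption made for illustration only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weight(delta_g_kcal: float, temperature_k: float) -> float:
    """Statistical weight exp(-dG/RT) contributed by one interaction."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


# Fitted free energies quoted in the text; 278 K is an assumed temperature.
for label, dg in [("L4Q8", -0.4), ("L4Q16", -0.6), ("L4Q20", -0.7)]:
    print(label, round(boltzmann_weight(dg, 278.0), 2))
```

Under these assumptions, each additional −0.1 kcal mol−1 multiplies the weight of the helical state by about 1.2, which is why a length-dependent ΔGLQi,i+4 has a noticeable effect on predicted helicity.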
To analyze the origin of this behavior and exploit it for peptide design we studied four peptides of identical amino acid composition but with two potential Glni+4 → Leui interactions (pink arrows in Fig. 1a) at different relative positions. The first such interaction is common to all peptides and the second one is shifted 1 (peptide P1-5), 2 (P2-6), 3 (P3-7), or 5 (P5-9) positions towards the C-terminus. After confirming that they were monomeric under our experimental conditions by size exclusion chromatography coupled to multi-angle light scattering (SEC-MALS) (Supplementary Fig. 1d), we used solution-state nuclear magnetic resonance (NMR) spectroscopy to probe their structural properties by exploiting the quantitative dependence of 13Cα and 1Hα NMR chemical shifts on residue-specific helical propensity, where larger 13Cα and lower 1Hα shifts indicate higher helicity21. An analysis of the NMR spectra indicated that, in the sequence context of this family of peptides, the strength of two Glni+4 → Leui interactions is maximal when the donor of the first and the acceptor of the second share a peptide bond such that the two interactions are concatenated, as in peptide P3-7; these results were confirmed by circular dichroism (CD) spectroscopy (Supplementary Fig. 1c).
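The concatenation rule can be expressed as a simple check on a sequence. The sketch below lists potential Glni+4 → Leui pairs and flags those in which the acceptor of one interaction immediately precedes the donor of another, so that the accepting CO and the donating NH belong to the same peptide bond. The function names and the two toy sequences are hypothetical; they are not the peptides studied here, whose sequences are given in Fig. 1a.

```python
ACCEPTOR = "L"  # Leu acceptor, as in the peptides described in the text
DONOR = "Q"     # Gln donor


def gln_to_leu_pairs(seq: str) -> list[tuple[int, int]]:
    """Return 0-based (acceptor, donor) pairs with Leu at i and Gln at i + 4."""
    return [(i, i + 4) for i in range(len(seq) - 4)
            if seq[i] == ACCEPTOR and seq[i + 4] == DONOR]


def concatenated(pairs: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return (first, second) interactions where the acceptor of the second
    directly precedes the donor of the first in the sequence."""
    return [(first, second) for first in pairs for second in pairs
            if second[0] + 1 == first[1]]


# Hypothetical toy sequences: second interaction shifted by 3 or by 5 positions.
for toy in ("LAALQAAQ", "LAAAQLAAAQ"):
    pairs = gln_to_leu_pairs(toy)
    print(toy, pairs, concatenated(pairs))
```

In the first toy sequence the second interaction is shifted by three positions and the pair is flagged as concatenated; in the second it is shifted by five positions and is not, even though the two residues are also adjacent, because the peptide bond between them does not contain the relevant CO and NH groups.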
We then used CD spectroscopy to study the secondary structure of peptides containing two and three pairs of concatenated Glni+4 → Leui interactions, (P3-7)2 and (P3-7)3, and obtained that they are highly helical and monomeric (Θ222 nm/Θ208 nm < 1) (Fig. 1c, d). An analysis of their secondary structure based on the main chain NMR chemical shifts indicated that the residue-specific helical propensity (pHel) is larger than 0.9 across 9 contiguous residues for (P3-7)2 and larger than 0.95 across 16 residues for (P3-7)3. We also characterized peptide (P3-7)3 Ctrl, in which all Leu hydrogen bond acceptors were substituted by Ala, resulting, despite the higher helical propensity of Ala, in lower helicity (Fig. 1c, d) due to the relatively low stability of side chain to main chain hydrogen bonds accepted by Ala (see below)14. As it is not common for monomeric peptides to cooperatively fold into α-helices in the absence of tertiary interactions, we verified their monomeric state by SEC-MALS and native mass spectrometry (MS) (Supplementary Fig. 1d, e). In addition, we removed the N-terminal PGAS motif, which can facilitate helix nucleation14, from these peptides (Fig. 1c) and observed that the uncapped counterparts, u(P3-7)2 and u(P3-7)3, also have high helical propensity (Supplementary Fig. 1f).
Next, we investigated the thermal stability of the helices by CD spectroscopy at temperatures up to 368 K (Fig. 1e and Supplementary Fig. 1g). The spectra at 278 K were equivalent to those obtained upon cooling after thermal unfolding, indicating that the unfolded state of the peptide is soluble under our experimental conditions. We also characterized the structural properties of (P3-7)3 by NMR at physiological temperature, 310 K. A comparison of the 13C-detected 2D CACO spectra of (P3-7)3 at 278 and 310 K revealed only a small decrease in helical propensity (Fig. 1d, e) indicating that the peptide remains essentially fully folded at 310 K (pHel ≈ 0.90 over 12 contiguous residues). Figure 1f shows the CD-monitored thermal denaturation of peptides (P3-7)3 and (P3-7)3 Ctrl, which reports on the higher stability of the former. Finally, prompted by the observation that these peptides remain highly helical at physiological temperature, we also investigated their stability in human serum as well as their internalization in HeLa cells. Both (P3-7)3 and (P3-7)3 Ctrl show high stability in human serum, with half-lives over 24 hours, well beyond that of Angiopep-222 that we included as a positive control (Supplementary Fig. 2a); in addition, both peptides were internalized by HeLa cells at 310 K (Supplementary Fig. 2b). In conclusion, concatenating Glni+4 → Leui interactions allows obtaining stable α-helices that contain only natural amino acids, remain folded under physiological conditions, are soluble upon thermal unfolding, are resistant to proteolytic degradation, and are readily internalized by living human cells.
Structure of a Gln-based single α-helix
The high quality of the NMR spectra obtained for peptide (P3-7)2 allowed using this technique to study its structure at atomic resolution and further characterize the interactions stabilizing its helical conformation (Supplementary Fig. 3a). First, we measured 15N relaxation at 278 K (R1, R2, heteronuclear 15N{1H} NOE) at two magnetic field strengths (14.1 T and 18.8 T) for the main chain amide (NH) groups of residues Gly4 to Lys24 and for the side chain amide (Nε2Hε21) groups of all four Gln residues (Gln11, Gln14, Gln17, Gln20) (Fig. 2a and Supplementary Fig. 3b). We found that the main chain 15N R2/R1 ratios increase from the termini towards the center of the peptide until reaching plateau values (5.05 ± 0.25 at 18.8 T, 3.20 ± 0.15 at 14.1 T) between residues Leu10 and Gln17 (Fig. 2a). A similar trend is traced out by the 15N{1H} NOE, which reaches upper plateau values between 0.65 and 0.83 over a larger central segment, from Leu7 to Ala18. Although this result could be influenced by the well-characterized phenomenon of helix fraying23, the main chain amide 15N relaxation data localize the region of highest structural rigidity on a 0.1 to 10 ns timescale as the one defined by the pairs of Leu and Gln residues involved in concatenated Glni+4 → Leui interactions, i.e., from Leu10 to Gln17; by contrast the Gln20 → Leu16 interaction, involving the last Gln residue, appears to be weaker.
The different properties of the first three Gln residues (11, 14, and 17) relative to the last Gln residue (20) are also evident in the side chain amide 15Nε2 relaxation data: whereas the positive (at 18.8 T) or close to zero (at 14.1 T) 15Nε2{1Hε21} NOE values suggest significantly less side chain mobility for the first three Gln residues, Gln20 shows clearly negative values, suggesting higher side chain mobility (Fig. 2a), also reflected in the relaxation-derived spectral density map (Fig. 2b). These results agree with the notion that the first three Gln residues form stronger Glni+4 → Leui interactions than Gln20. The high NMR signal dispersion in both the 13C and 15N dimensions of the spectra of peptide (P3-7)2 also allowed studying the Gln side chains: the first three Gln residues show fully resolved Hβ and Hγ signals in the 15N-edited HSQC-TOCSY spectrum, whereas the Hβ signals overlap for Gln 20 (Fig. 2c). This indicates that the conformations of the side chains of the former are better defined than those of the latter, even more than in polyQ helices14,16, in agreement with the relaxation data.
To improve the description of the Gln side chains, we used CoMAND24 to infer rotamer populations from a diagonal-free 3D CNH-NOESY spectrum reporting on distances between protons bound to 15N and 13C. Briefly, CoMAND selects subsets of conformers from a pool, here obtained by molecular dynamics, that reproduce the NOESY spectra (Fig. 2d). The distributions were enriched in the mt Gln rotamer (χ1 = −60° and χ2 = 180°) that is required for the Glni+4 → Leui interaction: the Gln residues involved in strong interactions (11, 14, and 17) have a high mt population (0.80), whereas that obtained for Gln 20 was lower (0.58). Both values are higher than that obtained for Gln residues in α-helices of structures deposited in the PDB25, 0.44 (Fig. 2e); importantly, these results were robust to changes in the force field used to generate the pool26,27.
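As an illustration of how a rotamer population of this kind can be computed from a set of conformers, the sketch below counts the fraction of (χ1, χ2) pairs that fall near the mt rotamer (χ1 = −60°, χ2 = 180°). It is not the CoMAND procedure: the ±30° tolerance, the function names, and the toy dihedral values are assumptions chosen for the example.

```python
def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def is_mt(chi1: float, chi2: float, tol: float = 30.0) -> bool:
    """True if (chi1, chi2) lies within tol degrees of the mt rotamer.

    The tolerance is an assumed window, not a value taken from the text.
    """
    return angular_diff(chi1, -60.0) <= tol and angular_diff(chi2, 180.0) <= tol


def mt_population(conformers: list[tuple[float, float]], tol: float = 30.0) -> float:
    """Fraction of conformers classified as mt."""
    hits = sum(is_mt(chi1, chi2, tol) for chi1, chi2 in conformers)
    return hits / len(conformers)


# Hypothetical (chi1, chi2) values in degrees for one Gln side chain.
toy_ensemble = [(-62.0, 178.0), (-55.0, -170.0), (-70.0, 65.0),
                (180.0, 180.0), (-58.0, 172.0)]
print(mt_population(toy_ensemble))
```

The periodic difference in `angular_diff` matters for χ2, because values such as −170° and 172° are both close to 180°.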
To generate a conformational ensemble, we used the residue-specific CoMAND ensembles to train a Gaussian mixture model (GMM) by inferring χ1 and χ2 probability densities for each residue (Fig. 2f and Supplementary Fig. 3c), modified the side chain conformations accordingly and, through R-factor minimization, obtained the set of representative conformers shown in Fig. 2g (Supplementary Table 1). To validate the ensemble we measured three sets of residual dipolar couplings (RDCs) under steric alignment. The main chain 1DH,N values show a dipolar wave pattern, typical of α-helices, that matches well the period of the zero frequency spectral density, J(0) (Fig. 2h). Although the RDCs were not used as restraints they correlate well with the ensemble-averaged values (Q = 0.37, Fig. 2i), confirming that the ensemble is an accurate representation of peptide (P3-7)2 and that the design rules that we have put forward lead to single α-helices.
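For readers unfamiliar with the RDC quality factor, the sketch below implements one commonly used definition, Q = rms(Dobs − Dcalc) / rms(Dobs), where lower values indicate better agreement. The text does not state which normalization was used for the reported Q, so this choice is an assumption, and the coupling values are made up for the example.

```python
import math


def rdc_q_factor(observed: list[float], calculated: list[float]) -> float:
    """Q = sqrt(sum((obs - calc)^2) / sum(obs^2)), one common RDC quality factor."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated RDCs must have equal length")
    num = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    den = sum(o ** 2 for o in observed)
    return math.sqrt(num / den)


# Hypothetical couplings in Hz; calc would be an ensemble-averaged prediction.
obs = [10.2, -4.8, 3.1, 8.0, -6.5]
calc = [9.0, -3.9, 4.0, 6.8, -5.1]
print(round(rdc_q_factor(obs, calc), 2))
```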
Ranking Glni+4 → Xi interactions by strength
The stability of our Gln-based α-helices stems from concatenated Gln side chain to main chain hydrogen bonds accepted by Leu residues. This design decision was based on the fact that the polyQ tract found in AR, which is the most helical studied so far, is flanked by four Leu residues14. We sought to determine how other residues perform as acceptors to increase the versatility of the design rules and better understand the factors determining the strength of the interaction. For this, we used a host-guest approach in which we determined the secondary structure of L3XQ16 peptides (Supplementary Table 2) by NMR. These peptides were obtained by substituting the fourth Leu residue of peptide L4Q16, excised from AR, by 13 different representative amino acids (Fig. 3a and Supplementary Fig. 4a).
We measured the residue-specific helical propensities of the peptides by NMR by combining standard 1HN-detected triple resonance with 13C-detected CACO and CON 2D NMR experiments: the high resolution in the CO dimension of the latter allowed the unambiguous assignment of all Gln residues, even in the variants with lowest signal dispersion (Supplementary Fig. 4b). Except for L3TQ16 and L3SQ16, all variants show a helicity profile approximately proportional to that of L4Q16 (Fig. 3a): L3TQ16 and L3SQ16 instead show a different profile, likely because in these cases the substitution shifts the site of helix nucleation by introducing an S/T N-capping motif28. In these, Ser/Thr accept two concomitant hydrogen bonds donated by main chain amides of residues C-terminal to them: one by the main chain O and another one by the side chain O of the hydroxyl group.
To explain the range of helicities obtained we hypothesized that it is due to two main factors: the intrinsic helical propensity of residue X29 and, based on our previous work14, the ability of its side chain to shield the hydrogen bond. While the former has been extensively measured17,29,30, we quantified the latter by using molecular modeling, considering that the conformation of the residue accepting the hydrogen bond (Xi) can have an effect on the interaction of the Glni+4 Hε21 donor with competing water molecules (Fig. 3b and Supplementary Fig. 7a). For this we computed 1 μs trajectories in different force fields27,31 for all 20 possible variants in which we constrained the secondary structure of the Leu2-Gln5 segment (Supplementary Figs. 5a, 6a) and increased the population of the Glni+4 → Leui interaction with a soft restraint (Supplementary Figs. 5b, 6b) to facilitate sampling the relevant region of conformational space: we obtained, as expected, that the higher the frequency of the hydrogen bond, the lower the solvent accessible surface area (SASA) of the Hε21 atom of Gln 4, with high correlation (Fig. 3c).
Next, we quantified to what extent intrinsic helical propensity (x1) and Glni+4 Hε21 SASA (x2) explain the experimental helical propensities (Fig. 3d) by multiple linear regression. Indeed we obtained that these two independent variables explain 73% of the variability (Fig. 3d). Remarkably, the Gln4 Hε21 SASA value (x2) is the most important factor in the correlation, as its weight in the fitted equation is 40% higher than that of intrinsic helicity (x1); the model allows the prediction of the average helicities of the L3XQ16 variants not included in our experimental dataset (Fig. 3e). The correlation improves (from r2 = 0.73 to 0.85) when only the subset of residue types with apolar side chains is considered, suggesting that additional factors might play a role when charged or polar side chains are present (Fig. 3f), and the results are robust to changes in MD force field27,31 or intrinsic helical propensity scales (Fig. 3f). These data confirm that Leu is one of the best helicity-promoting acceptors, but that other residues such as Phe, Tyr, Ile or Met are similarly good, and that Trp is a particularly good acceptor in spite of its low intrinsic helical propensity. Thus, residues other than Leu can be introduced as acceptors of Gln side chain to main chain interactions, increasing the versatility of our design rules.
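The regression step can be sketched as an ordinary least-squares fit of y = b0 + b1·x1 + b2·x2 together with the coefficient of determination. The code below is a dependency-free illustration, not the authors' analysis: the function name and the toy numbers are hypothetical, and the data are constructed to lie exactly on a plane.

```python
def fit_two_predictors(x1: list[float], x2: list[float], y: list[float]):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, r2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    pred = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_res = sum((c - p) ** 2 for c, p in zip(y, pred))
    ss_tot = sum((c - my) ** 2 for c in y)
    return b0, b1, b2, 1.0 - ss_res / ss_tot


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise).
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
print(fit_two_predictors(x1, x2, y))
```

Comparing the magnitudes of b1 and b2, as done for the weights of x1 and x2, is only meaningful when the two predictors are expressed on comparable scales, for example after standardization.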
These results prompted us to investigate the presence of (P3-7)n-like motifs in nature. To do this we searched UniprotKB32, including the Swiss-Prot and TrEMBL databases, by using the motif search tool in ScanProsite33, which we queried with the motif Ω-X-X-(Ω-Q-X)n-1-X-Q-X-X, with Ω denoting good acceptors of Gln side chain to main chain hydrogen bonds (namely W, L, F, Y, I, M). We found that 3451 proteins contain sequences matching the motif, belonging to organisms across the kingdoms of life with representatives of a wide variety of taxonomic lineages including archaea, bacteria, viruses, and a full range of eukaryotes, from unicellular organisms to metazoa including humans (Fig. 3g). There is experimental evidence for the existence of 94 of these proteins (UniprotKB annotation score >3), mostly belonging to extensively characterized metazoa (Supplementary Fig. 7b). Figure 3h shows an alignment of some example sequences along with their UniprotKB accession code and the organism that they belong to.
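The motif Ω-X-X-(Ω-Q-X)n-1-X-Q-X-X can also be written as a regular expression, which may help in reading it. The sketch below is a translation made for illustration, not the ScanProsite query used in the search; the function name `motif_regex` and the toy sequences are hypothetical.

```python
import re

OMEGA = "WLFYIM"  # good acceptors listed in the text


def motif_regex(n: int) -> re.Pattern:
    """Compile the (P3-7)n-like motif for n concatenated acceptor/donor pairs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (Omega-Q-X) repeated n - 1 times; '.' stands for any residue X.
    core = f"(?:[{OMEGA}]Q.){{{n - 1}}}"
    return re.compile(f"[{OMEGA}]..{core}.Q..")


# Hypothetical toy sequences: the first contains an n = 2 motif, the second does not.
for toy in ("AALAALQAAQAAKK", "AALAALEAAQAAKK"):
    match = motif_regex(2).search(toy)
    print(toy, match.group() if match else None)
```

For n = 2 the pattern places acceptors at relative positions 0 and 3 and Gln donors at positions 4 and 7, which reproduces the arrangement of peptide P3-7.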
Design of a pH-sensitive conformational switch
Gln to Glu substitutions in polyQ helices decrease helical character due to the inability of the Glu side chain to donate hydrogen bonds at physiological pH, where the carboxylate group is deprotonated14 (Fig. 4a). Re-protonation by decreasing the pH can restore the interaction, providing us with an opportunity to introduce a pH-sensitive conformational switch in our design rules. To explore this possibility we first sought to establish whether the loss of helicity upon substitution and its restoration upon pH decrease is strictly local or whether its effects instead can propagate to other parts of the sequence due to cooperativity.
For this, we compared the residue-specific helicity of polyQ L4Q16 variants with Gln to Glu substitutions at positions 1 (Q1E), 4 (Q4E), and 5 (Q5E) in the polyQ tract (Fig. 4b and Supplementary Fig. 8a). In Q1E the positions experiencing the strongest decrease in helicity are Leu 2 to Leu 4, which are tethered by the first bifurcated Glni+4 → Leui hydrogen bond in L4Q16, as expected, but there is also a small decrease in helicity for Gln residues in positions 5 to 8 (Fig. 4d, left, black). Similar effects were observed in Q4E and the loss of helical character in Q5E was much smaller, likely because the interaction broken upon substitution, accepted by Gln1, is weak even at physiological pH. This indicates that breaking the first Glni+4 → Leui interaction also weakens the interaction to which it is concatenated in the polyQ helix, and vice-versa, again in line with the notion that these two interactions form cooperatively (Fig. 4c, d, black).
Experiments at pH 2.8, where the carboxylate group of the Glu side chain is protonated, showed that helicity was even higher than that of peptide L4Q16 at physiological pH (Fig. 4c, d, red); in addition the dispersion of Gln side chain Nε and Hε21 resonances in the 2D 1H,15N HSQC spectrum, that is characteristic of polyQ helices, was restored (Fig. 4e–g). To investigate the physical basis of this, we simulated this interaction by using a hybrid QM/MM approach (Supplementary Fig. 8b, c). We found that the establishment of the side chain to main chain hydrogen bond weakened the main chain to main chain hydrogen bond: the associated average electron density decreased from 0.015 a.u. to 0.011 a.u. (Fig. 4h) and the main chain Glui+4 (H) - Leui (O) interatomic distance increased by 0.15 Å (Supplementary Fig. 8d). Consistent with the stronger hydrogen bond donor character of protonated Glu, relative to that of Gln, the average electron density of the Glui+4 → Leui side chain to main chain hydrogen bond (0.038 a.u.) was higher than that involving Gln side chains (0.017 a.u.) and the electronic polarization of the Hε2 donor was stronger for Glu relative to Gln (Fig. 4i).
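The protonation argument behind the switch can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of a carboxylic acid that is protonated at a given pH. In the sketch below the side chain pKa of 4.25 is a typical textbook value for Glu and is an assumption (the effective pKa in a helical peptide may differ), as is the value of 7.4 used for physiological pH.

```python
GLU_SIDE_CHAIN_PKA = 4.25  # assumed textbook value, not measured in this work


def fraction_protonated(ph: float, pka: float = GLU_SIDE_CHAIN_PKA) -> float:
    """Henderson-Hasselbalch fraction of the acid form: 1 / (1 + 10^(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pH 2.8 is the acidic condition used in the text; 7.4 is an assumed physiological pH.
for ph in (2.8, 7.4):
    print(ph, round(fraction_protonated(ph), 4))
```

Under these assumptions the Glu side chain is almost fully protonated, and therefore able to donate a hydrogen bond, at pH 2.8, and almost fully deprotonated at pH 7.4.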
Finally, to test whether switchable side chain to main chain hydrogen bonds can be integrated in our design rules, we studied variants of the (P3-7)2 and (P3-7)3 peptides where all Gln residues were substituted by Glu, namely E(P3-7)2 and E(P3-7)3 (Fig. 4j, k, top). As expected, these were less helical than (P3-7)2 and (P3-7)3 at physiological pH, but more at pH 2.8 (Fig. 4j, k, center); importantly, this increase of helicity and thermostability was not observed for peptides (P3-7)2 and (P3-7)3, that are stabilized by Gln side chain to main chain hydrogen bonds that are not affected by pH changes. Both the switchable nature and the increased strength of the side chain to main chain hydrogen bonds involving Glu are apparent in the enhanced thermal stability of the E(P3-7)n variants (Fig. 4j, k, bottom): while at physiological pH peptide E(P3-7)3 loses its helical character at a lower temperature than (P3-7)3, at pH 2.8 it remains helical even at the highest temperature studied, 340 K. Indeed 13C-detected 2D CACO NMR spectra confirmed that, at pH 2.8, E(P3-7)3 was highly helical at 310 K, the physiological temperature (Fig. 4l, m).
Combining Glni+4 → Xi and electrostatic interactions
The natural single α-helices studied until now are stabilized by numerous electrostatic interactions between side chains of opposite charge at relative positions i,i + 3 or 434. We sought to investigate whether these electrostatic interactions can be combined with Glni+4 → Xi side chain to main chain hydrogen bonds to stabilize α-helices. For this we studied the polyQ tract of the TATA-box binding protein (TBP), which has a primary structure that suggests the presence of an electrostatic interaction between either of two Glu residues immediately flanking the tract at the N-terminus (Glu9 and Glu10) and an Arg interrupting it (Arg13) (Fig. 5a). This interaction can occur concomitantly with two strong bifurcated hydrogen bonds accepted by Ile7 and Leu8, at position i-4 relative to the first two Gln residues of the tract. As observed for the polyQ tracts in AR14 and, to a lesser extent, huntingtin35 the CD spectrum of a monomeric (Supplementary Fig. 9) peptide spanning a tract of size 16 and its N-terminal flanking region, TBP-Q16, showed it is strongly helical and that its expansion to 25 Gln residues increases its helicity (Fig. 5a, b).
We then used NMR to characterize TBP-Q16 with residue resolution (Fig. 5d, e). At physiological pH, in agreement with the CD data, the peptide forms a fully folded helix between residues Glu9 and Gln14 and its helicity decreases progressively towards the C-terminus; at acidic pH, instead (Fig. 5c), at which Glu side chains are protonated, the helical propensity starts decreasing at position 12. In addition, both the spectral signature of concatenated Glni+4 → Xi interactions (Fig. 5f) and the rotamer selection associated with these interactions (Fig. 5g) are diminished for Gln14 (Hγ) and, to some extent, Gln12 (Hβ). These results are in agreement with the formation of a helix-stabilizing electrostatic interaction between Glu9 (or Glu10) and Arg13 that is lost upon protonation at low pH, confirming that electrostatic interactions can be combined with Gln side chain to main chain interactions. To further confirm that these interactions can co-exist we simulated a WTE-enhanced MD trajectory of TBP-Q1627,36: Fig. 5h shows two frames of the trajectory in which both the Gln11 → Ile7 and the Gln12 → Leu8 bifurcated hydrogen bonds occur simultaneously with a salt bridge involving Arg13 and either Glu10 (left) or Glu9 (right).
Targeting the helices to globular domains
Gln-based α-helical peptides can be modified to bind specific globular targets: the Ala residues in the (P3-7)n scaffolds (Fig. 1) can indeed be modified at will because they are not involved in the interactions that stabilize the helical structure. To prove this concept we modified the sequence of peptide (P3-7)3 to interact with the C-terminal domain of RAP74 (RAP74-CTD), a small globular domain that binds to intrinsically disordered motifs that fold upon binding37,38,39. We blended the sequence of (P3-7)3 with that of two different motifs (centFCP1 and cterFCP1) derived from FCP1 that interact with this globular protein independently39. This led to peptides δ and δctrl: δ was designed to bind to RAP74-CTD whereas δctrl is a control sequence equivalent to δ in which we replaced the Leu residues by Ala residues, which, despite having high intrinsic helical propensity, are poor acceptors of the side chain to main chain interactions, thus decreasing helicity (Fig. 6a and Supplementary Fig. 10). To facilitate the comparison with established helix-stabilization methods, we also designed δStpl, which features a chemical staple covalently linking the side chains of residues at positions 13 and 17 and where Leu hydrogen bond acceptors were substituted by Ala to weaken Gln side chain to main chain interactions.
An analysis of the structural properties of peptides δ and δctrl by CD showed, as expected, that the former is more helical than the latter, especially at room temperature (298 K), further confirming the important role of Leu residues for the stability of the helical fold (Fig. 6a). At 278 K, δ and δStpl displayed equivalent helicity, although the latter was slightly more helical at 298 K. We then analyzed the chemical shift perturbations (CSPs) in the 1H,15N BEST-TROSY spectrum of RAP74-CTD induced by peptide binding. We found that δ induced perturbations in residues of the globular protein that define the binding site of FCP1 in this globular target39, confirming a similar binding mode (KD = 131.01 ± 19.96 μM, Supplementary Fig. 11a); this interaction was also studied by isothermal titration calorimetry (ITC, Supplementary Fig. 11b). This was in contrast to the results obtained with δctrl and (P3-7)3, both of which failed to interact (Figs. 6b, c). δStpl, instead, induced intermediate CSPs reporting on binding in the millimolar range (Supplementary Fig. 11a). The result obtained with δctrl indicates that the helical character of δ is key for its ability to interact with RAP74-CTD, whereas those obtained with (P3-7)3 indicate that high helical character does not suffice, and that the identity of the residues placed in the vacant positions is indeed key for binding, in agreement with our hypothesis.
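To give a sense of what a KD of about 131 μM implies in a titration, the sketch below computes the fraction of protein bound for a 1:1 binding equilibrium, accounting for ligand depletion. It is a generic model, not the fitting procedure used for the CSP data; the function name and the protein and peptide concentrations are hypothetical.

```python
import math


def fraction_bound(protein_total: float, ligand_total: float, kd: float) -> float:
    """Fraction of protein in complex for 1:1 binding (all values in the same units)."""
    s = protein_total + ligand_total + kd
    # Physically meaningful root of the quadratic for the complex concentration.
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


KD_UM = 131.01  # value reported in the text, in micromolar
PROTEIN_UM = 100.0  # hypothetical RAP74-CTD concentration
for peptide_um in (0.0, 100.0, 500.0, 2000.0):  # hypothetical titration points
    print(peptide_um, round(fraction_bound(PROTEIN_UM, peptide_um, KD_UM), 3))
```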
To provide a second proof of concept we blended the sequence of (P3-7)2 with that of a motif found in the activation domain of AR that also interacts with RAP74-CTD, to yield peptide γ (Fig. 6d and Supplementary Fig. 10). As we previously showed, inhibiting this interaction with small molecules or peptides is a potential avenue to treat castration-resistant prostate cancer37. Linear peptides spanning the AR motif bind weakly to RAP74-CTD due, at least in part, to their low helical propensity, providing us with an additional opportunity to test the potential of our designs. In peptide γ two Glni+4 → Xi hydrogen bond acceptor positions were modified to accommodate the binding motif following the rules learned previously (Fig. 3). We also designed peptides γStpl and γCtrl, analogous to their δ counterparts. CD spectra show higher helicity of γ when compared to either the WHTLF or the γCtrl peptides. In contrast, γStpl is the most helical peptide of the series. In agreement with our hypothesis, we found that peptide γ binds RAP74-CTD with a KD in the mid-micromolar range, at least an order of magnitude more strongly than either the (P3-7)2, the WHTLF or the γCtrl peptides (Fig. 6e, f and Supplementary Fig. 11c). γStpl also binds RAP74-CTD in the mid-micromolar range (Supplementary Fig. 11c), and likely compensates for its rigidity with its higher helicity when compared to γ. In summary, we have shown that the sequences of Gln-based single α-helices can be modified to interact with a specific globular protein.
Discussion
Our results show that Gln side chain to main chain hydrogen bonds can be used to design linear peptides that fold into α-helices (Fig. 2) with properties that make them attractive for various applications: they are highly soluble, even upon thermal denaturation; are not stabilized by electrostatic interactions, unlike the Glu and Lys/Arg-rich single α-helices reported until now40,41,42; and display some degree of folding cooperativity due to the concatenation of side chain to main chain hydrogen bonds explicit in our design rules (Fig. 1). Our data shed light on the potential basis of this cooperative effect, which occurs when the donor of a Glni+4 → Leui interaction and the acceptor of the next one share a peptide bond.
An important feature of our design rules is their versatility: the residue accepting the hydrogen bond donated by the Gln side chain can be any residue able to shield the interaction from the competition with water, such as Trp, Leu, Phe, Tyr, Met, and Ile (Fig. 3). Remarkably, we find natural sequences fulfilling our design in different kingdoms of life. We also found structural models for 2303 of these sequences in the AlphaFold Database (AFDB)43: in 61.4% of the cases, a DSSP analysis44 of the AlphaFold model shows that the (P3-7)n-like motif is helical, increasing to 79.8% when only motifs devoid of helix breaking residue types (P, G) in the central part of the sequence are considered (Supplementary Fig. 7c). This conclusion holds in a subset of 42 structural models calculated without multiple sequence alignments (MSAs), showing that the helicity of this motif is encoded in AlphaFold learned structural preferences (Supplementary Fig. 7d). Thus, these motifs may represent a new class of uncharged single α-helices (SAHs) that had so far remained undetected40,41,42. In addition, in suitable cases the design can be complemented by electrostatic interactions between side chains of opposite charge and Gln residues can be mutated to Glu to introduce a pH-responsive conformational switch that uses only natural amino acids45 and does not involve changes in oligomerization state46 (Figs. 4 and 5).
The key feature of our scaffold design is that it defines the identity of just a fraction of the peptide residues: the rest can be chosen or optimized for specific applications. To prove this concept, we designed two peptides to bind the globular target RAP74-CTD by using an approach analogous to previous motif-grafting attempts on folded scaffolds47 (Fig. 6). In this specific case, naively blending these sequence features with the sequence of the designed (P3-7)n scaffold proved sufficient for successful targeting. Of note, our strategy displayed similar or even superior binding compared to chemical stapling, known to impose backbone rigidity, which highlights the importance of geometric adaptability to the target surface conferred by the transient nature of the side chain to main chain hydrogen bonds, and gives additional room for sequence optimization. In fact, even if the affinities that we have obtained in this initial exercise (Supplementary Fig. 11) can be sufficient for certain applications48, we anticipate that it will be possible to greatly improve them by systematically searching the available sequence space using techniques for affinity maturation based on high-throughput mutational scans49, especially when taking advantage of the versatility of our design rules. We also showed that our scaffold peptides are highly resistant to proteolytic degradation in human plasma and readily internalized by living human cells (Supplementary Fig. 2), thus overcoming some of the most important hurdles for peptide therapeutics. Although this could be related to their amino acid composition and sequence and needs to be studied on a case-by-case basis for potential hits, our design constitutes a valuable platform from which to evolve peptides with favorable pharmacokinetic properties.
The link between polyQ tract expansion and disease onset in polyQ disorders has not been established50. These tracts are found in intrinsically disordered regions, and much work has been devoted to investigating whether expansion changes their structural properties. The results obtained have been inconclusive: single-molecule Förster resonance energy transfer and NMR measurements showed little influence of tract length on the conformation of the tract found in huntingtin16,51 whereas recent studies from some of us have instead shown that the helical propensity of the tract found in the AR increases upon expansion14,15, as does the tract found in TBP (Fig. 5b).
Our results help rationalize these observations by considering that polyQ tracts are in a polyQ helix-coil equilibrium. Its position can be influenced by the residues flanking the tract at its C-terminus16,52 but, for a given set of solution conditions and tract length, it is mainly determined by the four residues flanking the tract at its N-terminus (Fig. 3). When these are good acceptors of Gln side chain to main chain hydrogen bonds, as in AR (LLLL), the polyQ helix is favored. Instead, when only two are good acceptors as in huntingtin (LKSF), the coil is favored16,51,53,54. The results obtained for TBP also fit this rationale: its flanking region (ILEE) contains two good acceptors (Ile, Leu) and the two other residues (Glu) can establish, at physiological pH, an electrostatic interaction with the Arg residue three or four positions towards the C-terminus, favoring the polyQ helix.
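The flank-based rationale above can be illustrated with a small counting sketch. It assumes that the "good acceptors" are exactly the residue types listed earlier in the Discussion (Trp, Leu, Phe, Tyr, Met, Ile); the function name and dictionary are hypothetical, and the count ignores the electrostatic contribution discussed for TBP.

```python
# Residue types named in the text as able to shield the Gln side chain to
# main chain hydrogen bond from water (assumed here to be the full list).
GOOD_ACCEPTORS = set("WLFYMI")


def count_good_acceptors(flank: str) -> int:
    """Count good hydrogen bond acceptors in an N-terminal flanking segment."""
    return sum(residue in GOOD_ACCEPTORS for residue in flank.upper())


# Four-residue N-terminal flanks quoted in the text.
FLANKS = {"AR": "LLLL", "huntingtin": "LKSF", "TBP": "ILEE"}

for protein, flank in FLANKS.items():
    print(f"{protein}: {flank} -> {count_good_acceptors(flank)} good acceptors")
```

The counts alone do not predict helicity; in the TBP case the text attributes the remaining stabilization to Glu-Arg electrostatic interactions, which this sketch does not model.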
The polyQ helix-coil equilibrium is sensitive to solution conditions: the entropic cost of folding results in higher stability of polyQ helices at relatively low temperatures whereas high temperatures favor the coil. This helps to explain the discrepancy between the results obtained with huntingtin, where the experiments were carried out at room temperature51, and those obtained for AR, where they were instead carried out at 278 K14: indeed, CD studies of the structural properties of huntingtin showed an increase in helical propensity upon tract expansion at low temperature (263 K)35. Finally, our observation that pairs of concatenated side chain to main chain interactions form cooperatively helps to explain how expansion shifts the equilibrium to the polyQ helix state both in AR14 and in huntingtin51.
Our results thus suggest that polyQ tracts are in a helix-coil equilibrium that is governed by the N-terminal flanking region, by solution conditions and, due to cooperativity, by tract length. Given that interactions between low-populated helical conformations of huntingtin play a role in the early stages of its aggregation into amyloid fibrils55,56, we propose that polyQ expansion leads to the onset of Huntington’s disease at least in part by stabilizing pre-nucleation oligomeric species, where the polyQ tract is partially helical, that are on-pathway to aggregation. Our proposal is in agreement with the very recently reported effects of amino acid substitutions in the N-terminal flanking region of the polyQ tract of exon 1 of huntingtin, where increases in helical propensity led to increases in aggregation propensity both in vitro and in cells57.
In summary, we have shown how an appropriate concatenation of Gln side chain to main chain hydrogen bonds makes it possible, on the one hand, to design highly helical peptides that can be tailored to specific applications and, on the other hand, to rationalize the until now perplexing observations regarding the structural properties of polyQ tracts. We anticipate that the knowledge gained about this interaction will influence future developments in peptide design, particularly in the use of peptides as therapeutics, as well as contribute to better understanding the molecular basis of polyQ diseases.
Methods
Peptide sample preparation
Recombinant peptides with 15N or 13C,15N isotope enrichment were prepared as detailed elsewhere58. Synthetic genes coding for the peptide of interest and codon-optimized for expression in Escherichia coli, with an N-terminal His6-Sumo tag fusion and cloned into the pDEST-17 vector, were directly obtained from GeneArt (Thermo Fisher Scientific, Waltham, MA, USA). Genes coding for the L4Q16 variants in the L3X and QXE series (see Supplementary Table 2) were obtained using the Q5 Site-Directed Mutagenesis Kit from New England Biolabs (Ipswich, MA, USA). Rosetta (DE3)pLysS competent cells (Novagen, Merck KGaA, Darmstadt, Germany) were used for expression in M9 medium containing 15NH4Cl and, where required,13C-glucose (both from Cambridge Isotope Laboratories Inc., Tewksbury, MA, USA) as the sole nitrogen and carbon sources, respectively. All purification steps were performed at 277 K. Cell lysates in lysis buffer (20 mM Tris-HCl, 100 mM NaCl, 20 mM imidazole, pH 8.0) were purified by immobilized metal affinity chromatography (IMAC) using a HisTrap HP 5 mL column mounted on an Äkta Purifier System (GE Healthcare, Chicago, IL, USA). Fractions containing the His6-Sumo-peptide fusion in elution buffer (20 mM Tris-HCl, 100 mM NaCl, 500 mM imidazole, pH 8.0) were pooled and dialyzed overnight in lysis buffer while treated with ubiquitin-like specific protease Ulp1 (50 μg/mL). The His6-Sumo tag was removed with an additional IMAC step and the peptide-containing flow-through was dialyzed in ultrapure MilliQ water before lyophilization. Unlabeled synthetic peptides from solid-phase peptide synthesis were directly obtained from Genscript (Piscataway, NJ, USA) as lyophilized powder with >95% purity. Stapled peptides γStpl and δStpl were also synthesized by Genscript through the introduction at two positions, denoted as X, of an unnatural amino acid with a di-substitution at the Cα position (pentenylalanine). 
The i,i + 4 stapled peptides were produced by crosslinking the unnatural amino acids via ring-closing metathesis. Both recombinant and synthetic lyophilized peptides were dissolved in 6 M guanidine thiocyanate (Merck KGaA, Darmstadt, Germany) and incubated overnight at 1250 rpm, 298 K, in a thermoblock. The sample was then injected in an Äkta Purifier System equipped with a Superdex Peptide 10/300 GL column (GE Healthcare, Chicago, IL, USA) equilibrated with ultrapure water, 0.1% trifluoroacetic acid. The fractions containing monomeric peptide were pooled and centrifuged at 386,000 × g for 3 h in an Optima TLX ultracentrifuge equipped with a TLA 120.1 rotor (Beckman Coulter, Atlanta, GA, USA). Orthophosphoric acid or sodium phosphate was added to a final concentration of 20 mM to adjust the pH to 2.8 or 7.4, respectively. The peptide concentration was determined by measuring the absorbance at 280 nm (extinction coefficients were calculated using the Protparam tool on the ExPASy website, https://web.expasy.org/protparam) before diluting the sample to the final experimental concentration.
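The concentration determination described above follows the Beer-Lambert law. The sketch below is illustrative only: it assumes the residue coefficients used by ProtParam at 280 nm (5500 M−1 cm−1 per Trp, 1490 per Tyr, 125 per cystine), a 1 cm path length by default, and a made-up example sequence that is not one of the peptides in this work.

```python
def extinction_coefficient_280(sequence: str, n_cystines: int = 0) -> float:
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1.

    Uses the Trp/Tyr/cystine coefficients assumed in the lead-in.
    """
    seq = sequence.upper()
    return 5500.0 * seq.count("W") + 1490.0 * seq.count("Y") + 125.0 * n_cystines


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("sequence has no absorbing residues at 280 nm")
    return a280 / (epsilon * path_cm)


# Hypothetical sequence, for illustration only.
example_sequence = "KKWAAALQALQALQAKK"
epsilon = extinction_coefficient_280(example_sequence)
print(f"epsilon = {epsilon:.0f} M^-1 cm^-1")
print(f"c = {concentration_molar(0.55, epsilon) * 1e6:.1f} uM for A280 = 0.55")
```

Peptides without Trp or Tyr have no usable absorbance at 280 nm, which is why the function raises an error rather than returning a concentration.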
Protein sample preparation
Recombinant samples with 15N isotope enrichment of the C-terminal domain of subunit 1 of the general transcription regulator TFIIF (RAP74-CTD), spanning residues 450-517, were obtained as described previously37. A codon-optimized synthetic gene cloned in a pDONR221 vector was obtained from GeneArt (Thermo Fisher Scientific, Waltham, MA, USA) and subcloned into a pDEST-His6MBP vector obtained from Addgene. Rosetta (DE3)pLysS competent cells (Novagen, Merck KGaA, Darmstadt, Germany) were grown in MOPS medium with 15NH4Cl as the only nitrogen source (310 K, induction at OD600 = 0.7, 1 mM IPTG, harvesting 3 h after induction). Soluble fractions of cell lysates from sonication in lysis buffer (50 mM Tris-HCl, 1 M NaCl, 10 mM imidazole, pH 8.0) were purified by IMAC, and fractions containing the His6MBP-TEV-RAP74-CTD fusion were dialyzed at 277 K overnight against cleavage buffer (50 mM Tris-HCl, 200 mM NaCl, 0.5 mM EDTA, pH 8.0) in the presence of TEV protease (50 μg/mL). A second IMAC step was performed and the flow-through containing RAP74-CTD was loaded onto a HiTrap SP HP cation exchange column followed by a size exclusion chromatography step on a Superdex 75 GL 10/300 column equilibrated with NMR buffer (20 mM sodium phosphate, 0.1% trifluoroacetic acid, pH 7.4), both mounted in an Äkta Purifier System (GE Healthcare, Chicago, IL, USA). The sample was concentrated to 100 μM using an Amicon Ultra 15 mL centrifugal filter (Merck KGaA, Darmstadt, Germany).
CD spectroscopy
Peptide samples for CD spectroscopy were diluted to a final concentration of 30 μM in a volume of 400 μL in either 20 mM phosphoric acid (pH 2.8) or sodium phosphate (pH 7.4) buffer. Spectra were obtained at 278 K (unless stated otherwise) in a Jasco 815 UV spectro-photopolarimeter with a 1 mm optical path cuvette using a data interval of 0.2 nm in the 190−260 nm range with a scanning speed of 50 nm min−1 and 20 accumulations. A blank spectrum acquired on the pertaining buffer under the same experimental conditions was subtracted from the sample spectrum. Thermal denaturation experiments were performed by acquiring a single accumulation spectrum with the same parameters at 10 K intervals, with a temperature ramp speed of 10 K min−1 and an equilibration time of 1 min.
NMR spectroscopy
Peptide samples for NMR spectroscopy were diluted to a final concentration of 100 μM in a volume of 400 μL in either 20 mM phosphoric acid (pH 2.8) or sodium phosphate (pH 7.4) buffer with added 10% v/v D2O and 10 μM DSS for internal chemical shift referencing, then filled into Shigemi tubes (Shigemi Co. Ltd, Tokyo, Japan). All NMR experiments were recorded at 278 K (unless stated otherwise) on either a Bruker Avance III 600 MHz or a Bruker Avance NEO 800 MHz spectrometer, both equipped with TCI cryoprobes, using TopSpin 4.0.8 for data acquisition (Bruker, Billerica, MA). Unlabeled synthetic peptides P1-5, P2-6, P3-7, P5-9 were characterized by two-dimensional homonuclear (TOCSY and NOESY) and heteronuclear (1H-13C HSQC) experiments. The TOCSY and NOESY mixing times were set to 70 and 200 ms, respectively. Water suppression was achieved by excitation sculpting59 using a 2 ms long Squa100.1000 selective pulse. For peptide backbone resonance assignment, using uniformly 15N,13C labeled peptides, the following series of 3D triple resonance BEST-TROSY60 experiments were acquired with 25% non-uniform sampling (NUS): HNCO, HN(CA)CO, HN(CO)CA, HNCA, and HN(CO)CACB. For some peptides, we resolved assignment ambiguities by also acquiring a 3D (H)N(CA)NH spectrum. Furthermore, 2D 13C-detected CACO and CON experiments61 were measured. Data processing was carried out with qMDD62 for non-uniform sampled data and with NMRPipe63 for all uniformly collected experiments. Data analysis was performed with CcpNmr Analysis version 2.464. Determination of the residue-specific helical propensity (pHel) from backbone chemical shifts HN, NH, C’, and Cα was performed using CheSPI65, which uses sequence and condition-corrected (temperature, pH) estimates for the reference random coil chemical shifts derived from POTENCI66. 
CheSPI was chosen over other algorithms as it intrinsically considers chemical shift changes derived from Glu side chain protonation, such as the ~0.6 ppm C’ chemical shift change reported before67 and observed in the TBP CACO spectra (Fig. 5d).
Side chain aliphatic 1H chemical shifts were obtained from 3D 15N-edited TOCSY-HSQC (75 ms mixing time) and NOESY-HSQC (200 ms mixing time) spectra. Glutamine side chain resonances were assigned using complementary 3D H(CC)(CO)NH and (H)CC(CO)NH spectra recorded with 25% NUS and 14 ms C,C-TOCSY mixing. To further confirm the side chain Nε assignments for Gln5 in peptide Q1E and Gln8 in peptide Q4E we recorded 2D 1H-15N HSQC spectra of synthetic unlabeled peptides with specific 15N labeling only in the positions of interest.
For the detailed analysis of side chain rotamer distributions by the CoMAND approach, a 3D CNH-NOESY (i.e., 3D [H]C,NH HSQC-NOESY-HSQC) spectrum68 of [U-13C,15N] labeled (P3-7)2 was recorded at 800 MHz, 278 K, with 400 ms NOE mixing time and 64(15N) × 86(13C) × 2048(1H) complex data points corresponding to 15.2 × 104.8 × 6.3 Hz FID resolution. The final 15N-HSQC module employed sensitivity-enhanced coherence selection by gradients and band-selective flip-back of HC polarization to enable its faster re-equilibration during a shorter total interscan delay (experimentally optimized as 0.6 s). For the prior assignment of all aliphatic side chain 1H and 13C resonances and easy distinction between intra- and inter-residual NOE signals in the [H]C,NH HSQC-NOESY-HSQC spectrum, we furthermore recorded a set of 3D [H]CC[CA]NH TOCSY (11.3 and 22.6 ms FLOPSY8 mixing) and [H]CC[CO]NH TOCSY spectra (9 and 18 ms FLOPSY8 mixing).
To study glutamine side chain 15N relaxation, a 15N labeled (P3-7)2 sample was prepared in NMR buffer (pH 7.4). To avoid bias due to dipole-dipole cross-correlated relaxation within the Nε2H2 moieties and thus allow a direct comparison with the main chain NH data, we sampled only their 50% semi-protonated Nε2HD isotopomers in buffered 50% D2O and applied continuous deuterium decoupling during the 15N coherence evolution. Of note, the differential deuterium isotope shift of 15Nε2 in the Nε2Hε21Dε22 vs Nε2Dε21Hε22 species69 also allowed an unambiguous stereospecific signal assignment of the attached side chain carboxamide Hε (Supplementary Fig. 3b). To measure 15N R1 and R2 rates the conventional pulse sequences with sensitivity-enhanced coherence selection by gradients, water flip-back, and fully interleaved acquisition of relaxation delays were complemented with continuous deuterium decoupling during t1(15N) in order to suppress 15N(t1) line broadening from scalar relaxation (via 1J15N,D coupling) for the glutamine side chain NHD isotopomers of interest. An exponential decay function was fitted to the data to obtain T1 and T2 values:
$$I_t = I_0 \exp\left(-t/T_{1,2}\right)$$
with I0 and It corresponding to peak intensity at times 0 and t, respectively. R2/R1 ratios were calculated as T1/T2, and errors were derived by propagating the SD of the fits. In contrast, the pulse sequence for measuring the 15N{1H} heteronuclear NOE (likewise with sensitivity-enhanced coherence selection by gradients and continuous deuterium decoupling during t1(15N)) required further critical adaptations to suppress detrimental antiphase signal components (in F1(15N)) for the 50% glutamine NH2 isotopomers that impede a clean quantification of nearby NHD signals of interest. Thus, for the reference (non-saturated) spectrum, the first 90° 1H pulse in the sensitivity-enhanced reINEPT following t1(15N) had to be cycled (inverted) along with the receiver phase. For the HN saturated spectrum, however, further antiphase contamination derives from some 4NzH±H’± multiquantum coherence forming during the HN saturation sequence70 that can be removed by its phase cycling and/or by appending a concatenated 1H spoil sequence (z-gradient 1 − 90°(1H) - z-gradient 2). NOE SD values were calculated as previously described71.
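The relaxation analysis described above (mono-exponential fit, then R2/R1 = T1/T2 with propagated errors) can be sketched with SciPy as follows. This is a minimal illustration on synthetic data, not the authors' script: the function names are hypothetical, the fit SD is taken from the covariance matrix returned by `curve_fit`, and first-order error propagation for the ratio is an assumption about how "propagating the SD of the fits" was done.

```python
import numpy as np
from scipy.optimize import curve_fit


def mono_exponential(t, i0, tau):
    """I_t = I_0 * exp(-t / tau), with tau standing for T1 or T2."""
    return i0 * np.exp(-t / tau)


def fit_time_constant(delays, intensities):
    """Fit the decay and return (tau, SD of tau) from the fit covariance."""
    delays = np.asarray(delays, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    start = (intensities[0], delays[len(delays) // 2])  # rough initial guess
    popt, pcov = curve_fit(mono_exponential, delays, intensities, p0=start)
    return popt[1], np.sqrt(np.diag(pcov))[1]


def r2_over_r1(t1, t1_sd, t2, t2_sd):
    """R2/R1 = T1/T2 with first-order (assumed) propagation of the fit SDs."""
    ratio = t1 / t2
    sd = ratio * np.sqrt((t1_sd / t1) ** 2 + (t2_sd / t2) ** 2)
    return ratio, sd


# Synthetic decays (seconds) with a little noise, for illustration only.
rng = np.random.default_rng(0)
delays = np.linspace(0.02, 1.5, 10)
t1_data = 100.0 * np.exp(-delays / 0.50) + rng.normal(0.0, 0.5, delays.size)
t2_data = 100.0 * np.exp(-delays / 0.10) + rng.normal(0.0, 0.5, delays.size)

t1, t1_sd = fit_time_constant(delays, t1_data)
t2, t2_sd = fit_time_constant(delays, t2_data)
ratio, ratio_sd = r2_over_r1(t1, t1_sd, t2, t2_sd)
print(f"T1 = {t1:.3f} s, T2 = {t2:.3f} s, R2/R1 = {ratio:.2f} +/- {ratio_sd:.2f}")
```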
RDCs were determined from the difference between couplings observed for aligned versus unaligned (P3-7)2 samples (with U-15N,13C labeling) where alignment was achieved72 using a gel kit from New Era Enterprises, Inc. (Vineland, NJ, USA). For this, 7% acrylamide gels were dialyzed in ultrapure water for 3 h and NMR buffer overnight. The prepared gels were then soaked in the peptide sample (ca. 0.2 mM) overnight at 277 K and squeezed into open-ended 5 mm NMR tubes using a funnel and piston. The filled tube was closed with a bottom plug and a Shigemi top plunger. The sample alignment uniformity was assessed via the deuterium signal splitting. One bond 1H-15N RDCs were obtained from comparing aligned versus unaligned 2D BEST-TROSY spectra60 measured at 278 K and selecting either the HN TROSY or semi-TROSY signals in the direct dimension. One bond 13C’-15N and two bond 13C’-1HN RDCs were derived by comparing the pertaining 1JC’,N splitting in the indirect (15N) and 2JC’,H splitting in the direct (HN) dimension, respectively, observed in the (not 13C decoupled) 2D 1H,15N HSQC spectra of aligned vs unaligned samples measured at 278 K. PALES73 was used to calculate the expected RDCs for each structure in the CoMAND-derived ensemble, allowing for the calculation of an independent alignment tensor for each frame based only on its coordinates. Predicted RDCs were obtained as the average of the ca. 200 frames generated in the 20 iterations of converged, R-factor-minimizing CoMAND global ensemble calculations, and scaled to minimize the RMSD against experimentally determined RDCs.
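The final averaging and scaling step can be written in closed form. Assuming a single global scale factor s applied to the frame-averaged predictions p (one possible reading of the text), minimizing Σ(s·p_i − d_i)² over s gives s = Σ p_i d_i / Σ p_i². The sketch below uses hypothetical names and toy numbers; in practice the per-frame predictions would come from PALES.

```python
import numpy as np


def scale_predicted_rdcs(predicted_frames, experimental):
    """Average per-frame predicted RDCs and scale them onto experiment.

    predicted_frames: array of shape (n_frames, n_couplings)
    experimental:     array of shape (n_couplings,)
    Returns (scale, scaled ensemble-averaged RDCs, RMSD to experiment).
    """
    averaged = np.asarray(predicted_frames, dtype=float).mean(axis=0)
    measured = np.asarray(experimental, dtype=float)
    # Least-squares optimum of a single multiplicative factor.
    scale = np.dot(averaged, measured) / np.dot(averaged, averaged)
    scaled = scale * averaged
    rmsd = np.sqrt(np.mean((scaled - measured) ** 2))
    return scale, scaled, rmsd


# Toy example: two frames, three couplings (Hz); values are made up.
frames = [[1.0, -2.0, 3.0], [3.0, -6.0, 9.0]]
measured = [4.0, -8.0, 12.0]
scale, scaled, rmsd = scale_predicted_rdcs(frames, measured)
print(scale, scaled, rmsd)
```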
To study the interaction between the peptides and RAP74-CTD we measured the 2D 1H-15N BEST-TROSY spectrum of 15N-labeled RAP74-CTD (50 μM throughout, 298 K) in the presence of increasing concentrations of peptide. Before the experiment, both the protein and the peptide were dialyzed (277 K, two dialysis steps) in the same preparation of NMR buffer (20 mM sodium phosphate, 0.1% trifluoroacetic acid, pH 7.4), using a Pur-A-Lyzer (Sigma-Aldrich, Burlington, MA, USA) and a Micro Float-A-Lyzer (Spectrum Laboratories, San Francisco, CA, USA) respectively. The chemical shift assignment of RAP74-CTD was reported previously (BMRB code 27288). Averaged 1H and 15N chemical shift perturbations (CSPs) were calculated as:
Dissociation constants, KD, and averaged CSP amplitudes, Δδmax, for the binding of peptides δ, δStpl, γ and γStpl to RAP74-CTD were obtained by a global fitting (nonlinear regression) of the 10% of peaks with the largest averaged CSPs to the following single-site binding model (1:1 stoichiometry):
A Monte-Carlo simulation varying both protein and peptide concentrations within 20% experimental errors was used to derive error margins for the final KD values.
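The fitting and error estimation described above can be sketched as follows. The sketch assumes the standard 1:1 binding isotherm, CSP = Δδmax·[(P + L + KD) − √((P + L + KD)² − 4PL)]/(2P), with P and L the total protein and peptide concentrations; the notation may differ from the equation used by the authors. The global fit shares one KD across peaks with one Δδmax per peak, and the Monte Carlo step draws one uniform scaling factor within ±20% for each of the two concentrations per trial, which is one possible reading of "within 20% experimental errors". All names and data are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def csp_one_site(p_tot, l_tot, kd, d_max):
    """CSP expected for 1:1 binding (assumed standard form, see lead-in)."""
    b = p_tot + l_tot + kd
    return d_max * (b - np.sqrt(b**2 - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)


def global_fit(p_tot, l_tot, csp):
    """Fit one shared KD and one d_max per peak; csp is (n_peaks, n_points)."""
    csp = np.atleast_2d(np.asarray(csp, dtype=float))
    l_tot = np.asarray(l_tot, dtype=float)

    def residuals(params):
        kd, d_max = params[0], params[1:]
        return (csp_one_site(p_tot, l_tot, kd, d_max[:, None]) - csp).ravel()

    start = np.concatenate(([np.median(l_tot)], 1.5 * csp.max(axis=1)))
    fit = least_squares(residuals, start, bounds=(1e-9, np.inf), x_scale="jac")
    return fit.x[0], fit.x[1:]


def monte_carlo_kd(p_tot, l_tot, csp, n_trials=200, rel_err=0.20, seed=0):
    """Refit after rescaling both concentrations within +/- rel_err."""
    rng = np.random.default_rng(seed)
    kds = []
    for _ in range(n_trials):
        f_prot, f_pep = rng.uniform(1.0 - rel_err, 1.0 + rel_err, size=2)
        kd, _ = global_fit(p_tot * f_prot, np.asarray(l_tot) * f_pep, csp)
        kds.append(kd)
    return np.array(kds)


# Synthetic titration in uM (made-up numbers): 50 uM protein, KD = 130 uM.
protein = 50.0
peptide = np.array([25.0, 50.0, 100.0, 200.0, 400.0, 800.0])
true_dmax = np.array([0.10, 0.15])
data = csp_one_site(protein, peptide, 130.0, true_dmax[:, None])

kd_fit, dmax_fit = global_fit(protein, peptide, data)
kd_trials = monte_carlo_kd(protein, peptide, data, n_trials=50)
print(f"KD = {kd_fit:.1f} uM, Monte Carlo SD = {kd_trials.std(ddof=1):.1f} uM")
```

Because KD and Δδmax are correlated when the titration does not approach saturation, the highest peptide concentrations largely determine how well KD is defined.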
Structural characterization using CoMAND
To investigate the conformational tendencies of the (P3-7)2 peptide we applied the CoMAND method (Conformational Mapping by Analytical NOESY Decomposition24). This method analyzes a 3D CNH-NOESY spectrum (i.e. 3D [H]C,NH HSQC-NOESY-HSQC), which displays only NOE contacts between 15N-bound and 13C-bound protons and is therefore intrinsically diagonal-free68. As a first step, one-dimensional 13C sub-spectra (strips) were extracted from this spectrum (τm = 400 ms). Each strip is taken perpendicular to a specific 15N-HSQC position and represents contacts to a single 15N-bound proton, edited by the 13C shift of the attached carbon. For (P3-7)2 we obtained strips for 18 main chain amide protons (residues L7 to K24) and all 4 glutamine side chain Hε21 protons (Q11, Q14, Q17, Q20). These strips are analyzed in terms of a quantitative R-factor expressing the agreement between experimental and back-calculated spectra. Global back-calculation parameters for CoMAND were optimized by grid searching, resulting in an overall correlation time of 2.0 ns and effective 13C signal halfwidth of 14 Hz.
For a reconstruction of the experimental 13C strips, CoMAND compiles a linear combination of strips back-calculated from a set of trial conformers that should reflect the conformational space of each residue. Here, we used the a99sb-disp and DES-amber MD trajectories and back-calculated 13002 frames from each trajectory for each of the 22 experimental strips. For each residue, the conformational ensemble producing the lowest R-factor was then compiled using the CoMAND stochastic optimization method24. A starting conformer is randomly selected from the 20 conformers with lowest R-factors. All conformers are then tested in random order, with a new member added to the ensemble if it decreases the R-factor by more than a given threshold (0.0005). Convergence is achieved if no further conformer is found or if the ensemble reaches a maximum size, here set at 20 structures. Due to its stochastic nature, this selection procedure can be repeated to produce ensembles with similar R-factors, but sampling a wider range of conformers.
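The selection loop described above can be sketched in a few lines. This is not the CoMAND implementation: the R-factor is assumed here to be a normalized RMS residual, the linear combination is assumed to be an equal-weight average (a conformer may be picked more than once, which acts as a weight), and passes over the pool are repeated until no conformer is added. Function names and the random "strips" are hypothetical.

```python
import numpy as np


def r_factor(experimental, back_calculated):
    """Assumed R-factor: normalized RMS residual between two strips."""
    residual = experimental - back_calculated
    return np.sqrt(np.sum(residual**2) / np.sum(experimental**2))


def ensemble_r(experimental, strips, members):
    """R-factor of the equal-weight average over the selected conformers."""
    return r_factor(experimental, strips[members].mean(axis=0))


def stochastic_selection(experimental, strips, threshold=0.0005,
                         max_size=20, n_seed=20, seed=None):
    """Grow an ensemble by testing conformers in random order."""
    rng = np.random.default_rng(seed)
    single = np.array([r_factor(experimental, s) for s in strips])
    # Start from a random conformer among the n_seed best single conformers.
    members = [int(rng.choice(np.argsort(single)[:n_seed]))]
    best = ensemble_r(experimental, strips, members)
    added = True
    while added and len(members) < max_size:
        added = False
        for idx in rng.permutation(len(strips)):
            trial = members + [int(idx)]
            r = ensemble_r(experimental, strips, trial)
            if best - r > threshold:  # keep only meaningful improvements
                members, best, added = trial, r, True
                if len(members) >= max_size:
                    break
    return members, best


# Toy pool of 50 back-calculated strips; the "experiment" mixes three of them.
pool = np.random.default_rng(1).random((50, 30))
target = pool[[3, 7, 11]].mean(axis=0)
members, final_r = stochastic_selection(target, pool, seed=0)
print(len(members), round(float(final_r), 4))
```

Running the function with different seeds gives ensembles of similar R-factor but different membership, which mirrors the repeated-trial use described in the text.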
For (P3-7)2 we applied a two-step protocol for each residue. In the first step we established the minimum R-factor by compiling 100 per-residue ensembles, optimizing over single experimental strips. These per-residue ensembles were also used to define a set of witness strips; i.e. those whose R-factors may be affected by conformational changes in the residue in question. In the second step, we obtained the conformational distribution by co-optimizing over these witness strips. For each set, 100 optimization trials were run for each MD trajectory frame pool, resulting in 100 ensembles per frame pool, each typically containing 5–15 members. After removing ensembles with R-factors significantly above average (90% confidence interval), the set of conformers used in co-optimization (typically over 1000) was pooled to represent the conformational diversity for each residue.
To quantify the conformational distributions, we clustered the data using Gaussian mixture models (GMM). A vector of n features (here, dihedral angles) was defined for each conformer, which was then used to train a model describing the probability p(k|x) that a data point x is a member of cluster k. For each cluster, this probability is defined by an n-dimensional multivariate Gaussian distribution representing its center and shape, and by a prior probability, p(k), corresponding to its relative population. These model parameters were fitted to the training data using the Expectation Maximization (EM) algorithm, modified to accommodate the periodic nature of dihedral angles. For (P3-7)2, we applied the GMM method for the χ1/χ2 pairs of all leucine and glutamine residues. As the number of clusters that best describe the training data was not known a priori, we searched values from 1 to 9 systematically and assessed the fit via the Bayesian information criterion, a measure that includes a penalty for model complexity. EM initialization requires an arbitrary seed value for each cluster center. For χ1/χ2 pairs, it is convenient to select seeds at the center of a rotameric form, with seeds progressively added with increasing cluster number, according to their database frequency. The best scoring model was stored for each residue.
The GMM method provides a compact but detailed description of conformational landscapes for use in downstream calculations. Here we have applied it to Monte-Carlo conformational sampling as part of an extended “greedy” R-factor optimization protocol. The pooled a99sb-disp and DES-amber MD trajectories were systematically sampled and the conformer affording the greatest reduction in global R-factor was added in each iteration. Thus, 2–4 conformers were typically added to the ensemble, which was then further optimized by adjusting the side chain conformations for leucine and glutamine residues using χ1/χ2 combinations from the corresponding GMM model with a 0.05 probability cutoff. For each residue in the ensemble, up to 30 χ1/χ2 combinations were tested to find sterically acceptable conformations lowering the global R-factor. For glutamine, the χ3 angle was additionally sampled around population centers pertaining to each χ1/χ2 combination (five trials; standard deviation 8°). Note that enthalpic contributions from hydrogen bonding were not considered in testing conformers, such that their selection was primarily driven by the reduction in R-factors. This iterative process of ensemble selection and modification was repeated until no further conformers were added by the greedy step. This protocol was repeated 20 times to probe the consistency of results and an example was chosen as the final ensemble (Fig. 2g).
Molecular dynamics simulations
The trajectories for the (P3-7)2 peptide used in the CoMAND analysis and for the TBP peptide shown in Fig. 5h were generated with the Well-Tempered Ensemble (WTE)36,74 enhanced sampling algorithm starting from a fully helical conformation. We used 26 energy-biased replicas within the temperature range from 275 K to 500 K, and two unbiased replicas at 278 K and 298 K. The unbiased replica at 278 K (at which temperature the 3D [H]C,NH HSQC-NOESY-HSQC was measured) was then used to generate the conformations for our CoMAND analysis and Fig. 5h. For the biased replicas, the energy bias was increased during the first 500 ps and then kept constant. During the bias-deposition, a Gaussian with a height of 1.2 kJ mol−1 and a width of 140 kJ mol−1 was added every 0.5 ps. The bias factor was set to 16. All replicas were subsequently used in a production simulation for 200 ns, where conformations used by CoMAND were extracted from the last 150 ns. The exchange between replicas was monitored to ensure good replica diffusion in temperature space. The production simulation was run in the NPT ensemble. In order to generate frame pools for our CoMAND analysis that best reproduce the experimental results, we used two recent force fields for these simulations, DES-amber26 and a99sb-disp27, each with its pertaining TIP4P-D water model. These are two of the best force fields describing the helix-coil equilibrium, and we used both of them to test our model for robustness. Bonds with hydrogen atoms were constrained and a time-step of 2 fs was used. For our simulations, we used Gromacs 2019.4 (refs. 75–77) patched with the PLUMED library78, version 2.5.3 (ref. 79), to enable the WTE sampling method.
Two sets of trajectories for segments of L3X peptides (see Supplementary Table 2) with the sequence L1L2L3XQ1Q2Q3Q4Q5Q6Q7Q8 (where X is any of the 20 natural proteinogenic amino acids) were calculated using either the Charmm36m force field31 with a TIP3P80 water model or the a99sb-disp force field27 with its TIP4P-D water model. In this case, our objective was to obtain accurate estimates of the donor Qi+4 Hε21 SASA depending on the nature and rotamer populations of the acceptor residue Xi. For this reason, we prioritized the choice of two force fields of different origins over an accurate description of the helix-coil equilibrium. Instead, to bias the simulations towards relevant conformations, calculations were started from a fully helical conformation, and the backbone φ and ψ dihedral angles of residues L2 to Q5 were restrained to −60° and −40°, respectively, with a spring constant k value optimized at 5 kJ mol−1 degree−2 (Supplementary Figs. 5a, 6a) that was maintained across the trajectory. Similarly, a bias to optimize the occurrence of Q4 → X side chain to main chain hydrogen bonds was introduced by restraining the distance between the main chain O of residue X and the Nε2 of residue Q4 to 4 Å, with a spring constant k value optimized at 50 kJ mol−1 nm−2 for the Charmm36m simulations (Supplementary Fig. 5b) and 250 kJ mol−1 nm−2 for the a99sb-disp simulations (Supplementary Fig. 6b). Structure minimization and thermalization at 278 K were performed in the NVT ensemble for 1 ns. For the 1 μs NPT production runs we used Gromacs 2019.4 (refs. 75–77) and an equilibration period of 100 ns that was excluded from trajectory analysis.
A trajectory was calculated for peptide Q1E using the CHARMM22*81 force field and TIP3P water model80 starting from a fully helical conformation to obtain different structures with Glui+4 → Leui side chain to main chain hydrogen bonds involving the glutamic acid carboxyl group that served as seeds for the QM/MM simulations (see below). The dihedral angle χ4 orienting the glutamic acid carboxyl group was restrained to 0° since its most stable conformation in CHARMM22*, corresponding to χ4 = 180°, is incompatible with the experimentally indicated side chain to main chain interaction. We previously reported an equivalent QM/MM study on the Glni+4 → Leui interaction where we used the CHARMM22* force field to generate the seeds and treat the classical sub-system14, so an equivalent configuration allowed us the direct comparison of the hydrogen bond electron densities and NPAs shown in Fig. 4h, i. A fully helical starting structure was thermalized (300 K) and equilibrated in the NVT ensemble for 1 ns. The 1 μs production run was obtained using ACEMD82 with a 100 ns equilibration period.
QM/MM simulations
Four starting structures were selected from the classical MD trajectory of peptide Q1E, retaining their box of water and ions. For the QM/MM simulations, we used AMBER 20 (ref. 83) coupled to the QM TeraChem 1.9 interface84,85,86. The QM subsystem was described at the BLYP/6-31G* level of theory including dispersion corrections87 and comprised 66 atoms including linker atoms. The classical subsystem was treated with the CHARMM22* force field81 and the TIP3P water model80. The linker atom procedure was employed to saturate the valency of the frontier atoms, and electrostatic embedding was used as implemented in AMBER. An electrostatic cutoff of 12 Å and periodic boundary conditions were employed throughout all QM/MM-MD simulations, using a time step of 1 fs. Structures were minimized, thermalized, and equilibrated for 10 ps at the QM/MM level prior to the 150 ps-long production runs. Finally, for each of the 150 ps QM/MM-MD runs, a Natural Bond Critical Point analysis88,89 was performed using NBO 7.0 (ref. 90).
Database motif searches
UniProtKB32 (including both the Swiss-Prot and the TrEMBL databases) was queried for protein sequences containing motifs that fulfill our design rules using the ScanProsite33 motif search tool hosted at the Expasy website (https://prosite.expasy.org/scanprosite/). The query motif was entered in Prosite format as: [LFYWIM]-X-X-[LFYWIM]-Q-X-[LFYWIM]-Q-X-[LFYWIM]-Q-X-Q-X-X-Q-X-X to search for sequences with n = 4 Qi+4 - [LFYWIM]i pairs, with the number of central -[LFYWIM]-Q-X- triplets increased stepwise to raise the target number of Qi+4 - [LFYWIM]i pairs accordingly. UniProtKB annotation scores and taxonomic lineage information were retrieved programmatically from UniProtKB using the accession codes returned by the ScanProsite searches.
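For readers who wish to scan local sequence files rather than use the ScanProsite web server, the query motif can be reproduced offline. The following is a minimal Python sketch, not the tool used in this work: it assembles the Prosite-format pattern from a variable number of central -[LFYWIM]-Q-X- triplets and converts it into a regular expression. The function names are hypothetical, the converter handles only the two element types that appear in the motif above (residue sets in brackets and the wildcard X), and the example sequence is made up for illustration.

```python
import re

# Building blocks of the query motif quoted above (Prosite elements).
PREFIX = ["[LFYWIM]", "X", "X"]
TRIPLET = ["[LFYWIM]", "Q", "X"]
SUFFIX = ["Q", "X", "X", "Q", "X", "X"]


def build_prosite(n_triplets):
    """Assemble the Prosite pattern with n_triplets central triplets."""
    return "-".join(PREFIX + TRIPLET * n_triplets + SUFFIX)


def prosite_to_regex(pattern):
    """Convert a simple Prosite pattern to a regular expression.

    Only bracketed residue sets, single residues, and the wildcard X
    are supported; repeats and exclusions are not handled.
    """
    parts = []
    for element in pattern.split("-"):
        parts.append("." if element.upper() == "X" else element)
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (start, match) tuples, including overlapping matches."""
    # A lookahead with a capturing group reports overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Three central triplets reproduce the pattern quoted in the text.
QUOTED_MOTIF = build_prosite(3)

# Hypothetical example sequence containing one match at index 2.
EXAMPLE_HITS = find_motifs("GGLAALQALQALQAQAAQAA", QUOTED_MOTIF)
```

Calling build_prosite with larger values of n_triplets yields the stepwise-extended patterns; start positions are reported as zero-based indices.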
Data analysis and plotting
Data were analyzed using Python 3.7.8 along with the packages Pandas 1.3.5, NumPy 1.21.2, SciPy 1.7.3, Biopython 1.76, and MDTraj 1.9.3, whereas Matplotlib 3.5.2 and Seaborn 0.11.2 were used for data plotting and visualization. PyMOL 2.3.5 was used to generate the figures displaying macromolecular structures.
Hydrogen bond criteria
We considered two atoms to be hydrogen bonded if the distance between donor H and the acceptor O was shorter than 2.4 Å and their angle was larger than 120°.
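The criterion above can be expressed as a short geometric test. The sketch below is illustrative only and is not the analysis code used in this work. It assumes that the angle in question is the donor–H···acceptor angle measured at the hydrogen, which the text does not specify, and that coordinates are given in Å; the function name is hypothetical.

```python
import math


def is_hydrogen_bonded(donor, hydrogen, acceptor,
                       max_dist=2.4, min_angle=120.0):
    """Apply the distance/angle hydrogen bond criterion.

    donor, hydrogen, acceptor: (x, y, z) coordinates in Angstrom of the
    donor heavy atom, the donor hydrogen, and the acceptor oxygen.
    The angle is taken at the hydrogen (D-H...A); this is an assumption.
    """
    h_to_d = [d - h for d, h in zip(donor, hydrogen)]
    h_to_a = [a - h for a, h in zip(acceptor, hydrogen)]
    dist = math.sqrt(sum(c * c for c in h_to_a))
    d_len = math.sqrt(sum(c * c for c in h_to_d))
    cos_angle = sum(x * y for x, y in zip(h_to_d, h_to_a)) / (d_len * dist)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return dist < max_dist and angle > min_angle
```

For a trajectory, the same test would be applied frame by frame to each donor–acceptor pair of interest.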
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The chemical shift assignments for peptides P1-5, P2-6, P3-7, P5-9, (P3-7)2, (P3-7)3, u(P3-7)2, u(P3-7)3, L4Q16 (pH 2.8), L3WQ16, L3YQ16, L3FQ16, L3MQ16, L3IQ16, L3EQ16, L3RQ16, L3AQ16, L3KQ16, L3HQ16, L3VQ16, L3TQ16, L3SQ16, Q1E, Q4E, Q5E, E(P3-7)3 and TBP in all measurement conditions specified in this work have been deposited in the BMRB (www.bmrb.io) with accession codes 51592, 51593, 51594, 51595, 51591, 51597, 51608, 51609, 51616, 51578, 51579, 51569, 51573, 51571, 51568, 51574, 51567, 51572, 51570, 51577, 51576, 51575, 51580, 51581, 51582, 51596 and 51583, respectively (Supplementary Table 2). The sequences of all oligonucleotides used in this study are available in Supplementary Table 3. Chemical shift assignments for L4Q16 at pH 7.4 are available at BMRB entry 27716. For peptide (P3-7)2, chemical shift assignments along with relaxation data (15N T1,15N T2,1H-15N heteronuclear NOEs), residual dipolar couplings (DNH, DNC, DCH, Supplementary Table 4), and spectral density peaks (J(0), J(ωN)) have been deposited in the BMRB with accession code 51591. The structural ensemble of peptide (P3-7)2 shown in Fig. 2g is available as Supplementary Data 1 and as PDB entry 8B1X. All 20 calculated ensembles of peptide (P3-7)2 are contained in Supplementary Data 2. The solution structure of RAP74-CTD is available as PDB entry 1NHA. The molecular dynamics simulations generated in this study have been deposited in Zenodo (https://doi.org/10.5281/zenodo.7270329). The UniProt accession codes, annotation scores and taxonomic information related to proteins containing P3-7n-like motifs are available in .xlsx format as Supplementary Data 3. The same file contains the accession codes related to those proteins for which we found structural models in the AlphaFold Database along with the results of the DSSP analysis. 
The same file also contains information on the subset of proteins for which we calculated structural models using ColabFold with and without MSAs, along with the results of the DSSP analysis. All structural models calculated with ColabFold are available as .pdb files in Supplementary Data 4. Source data are provided with this paper.
References
Jones, S. & Thornton, J. M. Principles of protein-protein interactions. Proc. Natl Acad. Sci. USA 93, 13–20 (1996).
Fuller, J. C., Burgoyne, N. J. & Jackson, R. M. Predicting druggable binding sites at the protein–protein interface. Drug Discov. Today 14, 155–161 (2009).
Tietz, O., Cortezon-Tamarit, F., Chalk, R., Able, S. & Vallis, K. A. Tricyclic cell-penetrating peptides for efficient delivery of functional antibodies into cancer cells. Nat. Chem. 14, 284–293 (2022).
Philippe, G. J. B., Craik, D. J. & Henriques, S. T. Converting peptides into drugs targeting intracellular protein-protein interactions. Drug Discov. Today 26, 1521–1531 (2021).
Ran, X. & Gestwicki, J. E. Inhibitors of protein–protein interactions (PPIs): an analysis of scaffold choices and buried surface area. Curr. Opin. Chem. Biol. 44, 75–86 (2018).
Rezaei Araghi, R. & Keating, A. E. Designing helical peptide inhibitors of protein-protein interactions. Curr. Opin. Struct. Biol. 39, 27–38 (2016).
Hack, V. et al. Efficient α-helix induction in a linear peptide chain by N-capping with a bridged-tricyclic diproline analogue. Angew. Chem. Int. Ed. Engl. 52, 9539–9543 (2013).
Patgiri, A., Yadav, K. K., Arora, P. S. & Bar-Sagi, D. An orthosteric inhibitor of the Ras-Sos interaction. Nat. Chem. Biol. 7, 585–587 (2011).
de Araujo, A. D. et al. Comparative α-helicity of cyclic pentapeptides in water. Angew. Chem. Int. Ed. Engl. 53, 6965–6969 (2014).
Smith, S. J., Du, K., Radford, R. J. & Tezcan, F. A. Functional, metal-based crosslinkers for α-helix induction in short peptides. Chem. Sci. 4, 3740–3747 (2013).
Cromm, P. M., Spiegel, J. & Grossmann, T. N. Hydrocarbon stapled peptides as modulators of biological function. ACS Chem. Biol. 10, 1362–1375 (2015).
Pelay-Gimeno, M., Glas, A., Koch, O. & Grossmann, T. N. Structure-based design of inhibitors of protein-protein interactions: mimicking peptide binding epitopes. Angew. Chem. Int. Ed. Engl. 54, 8896–8927 (2015).
Hilinski, G. J. et al. Stitched α-helical peptides via bis ring-closing metathesis. J. Am. Chem. Soc. 136, 12314–12322 (2014).
Escobedo, A. et al. Side chain to main chain hydrogen bonds stabilize a polyglutamine helix in a transcription factor. Nat. Commun. 10, 2034 (2019).
Eftekharzadeh, B. et al. Sequence context influences the structure and aggregation behavior of a PolyQ tract. Biophys. J. 110, 2361–2366 (2016).
Urbanek, A. et al. Flanking regions determine the structure of the poly-glutamine in huntingtin through mechanisms common among glutamine-rich human proteins. Structure 28, 733–746 (2020).
Muñoz, V. & Serrano, L. Elucidating the folding problem of helical peptides using empirical parameters. Nat. Struct. Biol. 1, 399–409 (1994).
Zimm, B. H. & Bragg, J. K. Theory of the phase transition between helix and random coil in polypeptide chains. J. Chem. Phys. 31, 526–535 (1959).
Lifson, S. & Roig, A. On the theory of helix—coil transition in polypeptides. J. Chem. Phys. 34, 1963–1974 (1961).
Sheu, S.-Y., Yang, D.-Y., Selzle, H. L. & Schlag, E. W. Energetics of hydrogen bonds in peptides. Proc. Natl Acad. Sci. USA 100, 12683–12687 (2003).
Neal, S., Nip, A. M., Zhang, H. & Wishart, D. S. Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J. Biomol. NMR 26, 215–240 (2003).
Wei, X. et al. Retro-inverso isomer of Angiopep-2: a stable d-peptide ligand inspires brain-targeted drug delivery. Mol. Pharm. 11, 3261–3268 (2014).
Presta, L. G. & Rose, G. D. Helix signals in proteins. Science 240, 1632–1641 (1988).
ElGamacy, M., Riss, M., Zhu, H., Truffault, V. & Coles, M. Mapping local conformational landscapes of proteins in solution. Structure 27, 853–865.e5 (2019).
Shapovalov, M. V. & Dunbrack, R. L. Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19, 844–858 (2011).
Piana, S., Robustelli, P., Tan, D., Chen, S. & Shaw, D. E. Development of a force field for the simulation of single-chain proteins and protein–protein complexes. J. Chem. Theory Comput. 16, 2494–2507 (2020).
Robustelli, P., Piana, S. & Shaw, D. E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl Acad. Sci. USA 115, E4758–E4766 (2018).
Wan, W. Y. & Milner-White, E. J. A recurring two-hydrogen-bond motif incorporating a serine or threonine residue is found both at alpha-helical N termini and in other situations. J. Mol. Biol. 286, 1651–1662 (1999).
Pace, C. N. & Scholtz, J. M. A helix propensity scale based on experimental studies of peptides and proteins. Biophys. J. 75, 422–427 (1998).
Luque, I., Mayorga, O. L. & Freire, E. Structure-based thermodynamic scale of alpha-helix propensities in amino acids. Biochemistry 35, 13681–13688 (1996).
Huang, J. et al. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods 14, 71–73 (2017).
UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
de Castro, E. et al. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 34, W362–W365 (2006).
Baker, E. G. et al. Local and macroscopic electrostatic interactions in single α-helices. Nat. Chem. Biol. 11, 221–228 (2015).
Bravo-Arredondo, J. M. et al. The folding equilibrium of huntingtin exon 1 monomer depends on its polyglutamine tract. J. Biol. Chem. 293, 19613–19623 (2018).
Bonomi, M. & Parrinello, M. Enhanced sampling in the well-tempered ensemble. Phys. Rev. Lett. 104, 190601 (2010).
De Mol, E. et al. Regulation of androgen receptor activity by transient interactions of its transactivation domain with general transcription regulators. Structure 26, 145–152.e3 (2018).
Kamada, K., Roeder, R. G. & Burley, S. K. Molecular mechanism of recruitment of TFIIF-associating RNA polymerase C-terminal domain phosphatase (FCP1) by transcription factor IIF. Proc. Natl Acad. Sci. USA 100, 2296–2299 (2003).
Yang, A. et al. NMR structure of a complex formed by the carboxyl-terminal domain of human RAP74 and a phosphorylated peptide from the central domain of the FCP1 phosphatase. Biochemistry 48, 1964–1974 (2009).
Simm, D. & Kollmar, M. Waggawagga-CLI: a command-line tool for predicting stable single α-helices (SAH-domains), and the SAH-domain distribution across eukaryotes. PLoS One 13, e0191924 (2018).
Swanson, C. J. & Sivaramakrishnan, S. Harnessing the unique structural properties of isolated α-helices. J. Biol. Chem. 289, 25460–25467 (2014).
Süveges, D., Gáspári, Z., Tóth, G. & Nyitray, L. Charged single alpha-helix: a versatile protein structural motif. Proteins 74, 905–916 (2009).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Song, Z. et al. Modulation of polypeptide conformation through donor-acceptor transformation of side-chain hydrogen bonding ligands. Nat. Commun. 8, 92 (2017).
Harrington, L., Fletcher, J. M., Heermann, T., Woolfson, D. N. & Schwille, P. De novo design of a reversible phosphorylation-dependent switch for membrane targeting. Nat. Commun. 12, 1472 (2021).
Sia, S. K. & Kim, P. S. Protein grafting of an HIV-1-inhibiting epitope. Proc. Natl Acad. Sci. USA 100, 9756–9761 (2003).
Qu, J. et al. Specific knockdown of α-synuclein by peptide-directed proteasome degradation rescued its associated neurotoxicity. Cell Chem. Biol. 27, 751–762.e4 (2020).
Wilson, D. S., Keefe, A. D. & Szostak, J. W. The use of mRNA display to select high-affinity protein-binding peptides. Proc. Natl Acad. Sci. USA 98, 3750–3755 (2001).
Orr, H. T. & Zoghbi, H. Y. Trinucleotide repeat disorders. Annu. Rev. Neurosci. 30, 575–621 (2007).
Warner, J. B. 4th et al. Monomeric huntingtin exon 1 has similar overall structural features for wild-type and pathological polyglutamine lengths. J. Am. Chem. Soc. 139, 14456–14469 (2017).
Urbanek, A. et al. Evidence of the reduced abundance of proline cis conformation in protein poly proline tracts. J. Am. Chem. Soc. 142, 7976–7986 (2020).
Urbanek, A. et al. A general strategy to access structural information at atomic resolution in polyglutamine homorepeats. Angew. Chem. Int. Ed. Engl. 130, 3660–3663 (2018).
Baias, M. et al. Structure and dynamics of the huntingtin exon-1 N-terminus: a solution NMR perspective. J. Am. Chem. Soc. 139, 1168–1176 (2017).
Kotler, S. A. et al. Probing initial transient oligomerization events facilitating Huntingtin fibril nucleation at atomic resolution by relaxation-based NMR. Proc. Natl Acad. Sci. USA 116, 3562–3571 (2019).
Jayaraman, M. et al. Slow amyloid nucleation via α-helix-rich oligomeric intermediates in short polyglutamine-containing huntingtin fragments. J. Mol. Biol. 415, 881–899 (2012).
Elena-Real, C. A. et al. The structure of pathogenic huntingtin exon-1 defines the bases of its aggregation propensity. bioRxiv https://doi.org/10.1101/2022.10.25.513661 (2022).
Escobedo, A., Chiesa, G. & Salvatella, X. Recombinant production of monomeric isotope-enriched aggregation-prone peptides: polyglutamine tracts and beyond. Methods Mol. Biol. 2141, 211–231 (2020).
Hwang, T. L. & Shaka, A. J. Water suppression that works. Excitation sculpting using arbitrary wave-forms and pulsed-field gradients. J. Magn. Reson. A 112, 275–279 (1995).
Solyom, Z. et al. BEST-TROSY experiments for time-efficient sequential resonance assignment of large disordered proteins. J. Biomol. NMR 55, 311–321 (2013).
Bermel, W. et al. Protonless NMR experiments for sequence-specific assignment of backbone nuclei in unfolded proteins. J. Am. Chem. Soc. 128, 3918–3919 (2006).
Orekhov, V. Y. & Jaravine, V. A. Analysis of non-uniformly sampled spectra with multi-dimensional decomposition. Prog. Nucl. Magn. Reson. Spectrosc. 59, 271–292 (2011).
Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).
Vranken, W. F. et al. The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins 59, 687–696 (2005).
Nielsen, J. T. & Mulder, F. A. A. CheSPI: chemical shift secondary structure population inference. J. Biomol. NMR 75, 273–291 (2021).
Nielsen, J. T. & Mulder, F. A. A. POTENCI: prediction of temperature, neighbor and pH-corrected chemical shifts for intrinsically disordered proteins. J. Biomol. NMR 70, 141–165 (2018).
Platzer, G., Okon, M. & McIntosh, L. P. pH-dependent random coil 1H, 13C, and 15N chemical shifts of the ionizable amino acids: a guide for protein pKa measurements. J. Biomol. NMR 60, 109–129 (2014).
Diercks, T., Coles, M. & Kessler, H. An efficient strategy for assignment of cross-peaks in 3D heteronuclear NOESY experiments. J. Biomol. NMR 15, 177–180 (1999).
Liu, A. et al. Hydrogen-bond detection, configuration assignment and rotamer correction of side-chain amides in large proteins by NMR spectroscopy through protium/deuterium isotope effects. Chembiochem 9, 2860–2871 (2008).
Ferrage, F., Reichel, A., Battacharya, S., Cowburn, D. & Ghose, R. On the measurement of 15N–{1H} nuclear overhauser effects. 2. Effects of the saturation scheme and water signal suppression. J. Magn. Reson. 207, 294–303 (2010).
Farrow, N. A. et al. Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15N NMR relaxation. Biochemistry 33, 5984–6003 (1994).
Chou, J. J., Gaemers, S., Howder, B., Louis, J. M. & Bax, A. A simple apparatus for generating stretched polyacrylamide gels, yielding uniform alignment of proteins and detergent micelles. J. Biomol. NMR 21, 377–382 (2001).
Zweckstetter, M. NMR: prediction of molecular alignment from structure using the PALES software. Nat. Protoc. 3, 679–690 (2008).
Deighan, M., Bonomi, M. & Pfaendtner, J. Efficient simulation of explicitly solvated proteins in the well-tempered ensemble. J. Chem. Theory Comput. 8, 2189–2192 (2012).
Berendsen, H. J. C., van der Spoel, D. & van Drunen, R. GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91, 43–56 (1995).
Van Der Spoel, D. et al. GROMACS: fast, flexible, and free. J. Comput. Chem. 26, 1701–1718 (2005).
Abraham, M. J. et al. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19–25 (2015).
PLUMED consortium. Promoting transparency and reproducibility in enhanced molecular simulations. Nat. Methods 16, 670–673 (2019).
Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C. & Bussi, G. PLUMED 2: new feathers for an old bird. Comput. Phys. Commun. 185, 604–613 (2014).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Piana, S., Lindorff-Larsen, K. & Shaw, D. E. How robust are protein folding simulations with respect to force field parameterization? Biophys. J. 100, L47–L49 (2011).
Harvey, M. J., Giupponi, G. & Fabritiis, G. D. ACEMD: accelerating biomolecular dynamics in the microsecond time scale. J. Chem. Theory Comput. 5, 1632–1639 (2009).
Case, D. A. et al. AMBER 2020, University of California, San Francisco. (2020).
Ufimtsev, I. S. & Martínez, T. J. Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation. J. Chem. Theory Comput. 4, 222–231 (2008).
Ufimtsev, I. S. & Martinez, T. J. Quantum chemistry on graphical processing units. 2. Direct self-consistent-field (SCF) implementation. J. Chem. Theory Comput. 5, 3138–3138 (2009).
Ufimtsev, I. S. & Martinez, T. J. Quantum chemistry on graphical processing units. 3. Analytical energy gradients, geometry optimization, and first principles molecular dynamics. J. Chem. Theory Comput. 5, 2619–2628 (2009).
Grimme, S., Ehrlich, S. & Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 32, 1456–1465 (2011).
Weinhold, F. Natural Bond Critical Point analysis: quantitative relationships between natural bond orbital-based and QTAIM-based topological descriptors of chemical bonding. J. Comput. Chem. 33, 2440–2449 (2012).
Bader, R. F. W. Atoms in molecules. Acc. Chem. Res. 18, 9–15 (1985).
Glendening, E. D. et al. Natural bond orbital. NBO 7.0. (2018).
Acknowledgements
We thank Luis Serrano for help with the Agadir predictions and helpful discussions, Ben Lehner and Ernest Giralt for helpful discussions and the ICTS NMR facility, managed by the scientific and technological centers of the University of Barcelona (CCiT UB), for their help in NMR. B.M. acknowledges funding from the Asociación Española contra el Cáncer (FCAECC project #POSTD211371MATE). C.G. acknowledges a graduate fellowship from MINECO (PRE2018-084684). M.S.-N. acknowledges funding from MINECO (PID2020-119810RB-I00). M.S.-N. holds a Ramón y Cajal contract (RYC2018-024759-I) from the Spanish Ministry of Science, Innovation, and Universities. X.S. acknowledges funding from AGAUR (2017 SGR 324), MINECO (BIO2015-70092-R and PID2019-110198RB-I00), and the European Research Council (CONCERT, contract number 648201). B.B.K. acknowledges funding from the Novo Nordisk Foundation (#NNF18OC0033926). M.O. acknowledges funding from the Instituto Nacional de Bioinformática, The EU BioExcel Centre of Excellence for HPC and the Spanish Ministry of Science (PID2021-122478NB-I00) and the Instituto de Salud Carlos III–Instituto Nacional de Bioinformatica (ISCIII PT 17/0009/0007 co-funded by the Fondo Europeo de Desarrollo Regional). M.O. is an ICREA Academy scholar and J.A. is a Juan de la Cierva fellow. M.C. was supported by institutional funds of the Max Planck Society. This project has been carried out using the resources of CSUC. IRB Barcelona is the recipient of a Severo Ochoa Award of Excellence from MINECO (Government of Spain).
Author information
Contributions
Conceptualization: A.E., J.G., R.C. and X.S. Experimental data acquisition, processing, and analysis: A.E., J.P., T.D., B.M., C.G., M.S.-N., M.B., L.S., J.G., and O.M. Simulations and data analysis: A.E., J.A., B.T. and R.C. Ensemble generation and analysis: A.E., M.C. Supervision: A.E., B.B.K., O.M., M.O., R.C., and X.S. Writing original draft: A.E., J.G., R.C. and X.S. Writing final version: all authors. Funding acquisition: B.B.K., O.M., M.O., R.C., and X.S.
Ethics declarations
Competing interests
M.B. and X.S. are founders of Nuage Therapeutics. M.B. is an employee of Nuage Therapeutics. X.S. is a scientific advisor of Nuage Therapeutics. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Escobedo, A., Piccirillo, J., Aranda, J. et al. A glutamine-based single α-helix scaffold to target globular proteins. Nat Commun 13, 7073 (2022). https://doi.org/10.1038/s41467-022-34793-6