Side chain to main chain hydrogen bonds stabilize polyglutamine helices in transcription factors

Polyglutamine (polyQ) tracts are regions of low sequence complexity of variable length found in more than one hundred human proteins. In transcription factors, where they are frequent, tract length can correlate with transcriptional activity. In addition, in nine proteins, their elongation beyond specific thresholds is the cause of polyQ disorders. To investigate the structural basis of the association between tract length, biological function and disease we studied how the conformation of the polyQ tract of the androgen receptor, a transcription factor associated with a polyQ disease, depends on its length. We found that the tract folds into a helical structure stabilized by unconventional hydrogen bonds between glutamine side chains and main chain carbonyl groups that are bifurcate with the conventional main chain to main chain hydrogen bonds stabilizing α-helices. In addition, since tract elongation provides additional interactions, the helicity of the polyglutamine tract directly correlates with its length. These findings provide a structural basis for the association between polyglutamine tract length, transcriptional activity, and the onset of polyglutamine disorders.


Introduction
Polyglutamine (polyQ) tracts are low complexity regions containing almost exclusively Gln residues. They are frequent in the human proteome, particularly in the intrinsically disordered domains of proteins involved in the regulation of transcription such as the activation domains of transcription factors 1 . The functions of polyQ tracts are not well-understood but it has been suggested that they regulate the activity of the proteins that harbor them by modulating the stability of the complexes that they form 2 . The lengths of polyQ tracts are variable because their coding DNA sequences tend to adopt secondary structures that hamper replication and repair 3 . Contractions and expansions in polyQ tracts can have functional consequences and the lengths of polyQ tracts may have been subject to natural selection 4 . As an example it has been proposed that the length of the polyQ tract present in huntingtin correlates with the intellectual coefficient 5 , presumably because this protein plays important although still not well-defined roles in neural plasticity 6 .
For nine specific proteins, including huntingtin and the androgen receptor (AR), the variability in the lengths of polyQ tracts has pathogenic implications. Expansions beyond specific thresholds are associated with nine hereditary rare neurodegenerative diseases known as polyQ diseases 7 . The mechanistic basis of this phenomenon is a matter of debate: some have suggested that the actual expanded transcripts are the neurotoxic species 8 due to their propensity to phase separate 9 , while others have suggested that expanded polyQ proteins are inherently neurotoxic 10 . It is generally thought, however, that polyQ expansions decrease protein solubility, leading to the formation of cytosolic or nuclear aggregates that interfere with proteasomal protein degradation 11 and sequester the transcriptional machinery 12 . This is supported by experiments carried out in vitro and in cells, which showed that polyQ expansion decreases protein solubility 13 and causes cell death 14 , as well as in vivo, which showed that promoting the clearance of polyQ aggregates led to improvements in polyQ expansion phenotypes 15 .
Since the disease-specific thresholds of polyQ diseases are similar 16 it has been hypothesized that polyQ tracts have a generic propensity to undergo a tract-length-dependent conformational change producing a highly insoluble structure. A substantial number of theoretical, computational and experimental studies have investigated how the conformational properties of polyQ tracts change with their length. Some of these studies have suggested that expansions of the polyQ tract of huntingtin confer the ability to adopt extended conformations with β secondary structure 17 . By contrast, most experimental studies report that polyQ tracts are collapsed disordered coils that barely change conformation upon expansion 18 . This led to alternative hypotheses proposing that expansion leads to toxicity by increasing the affinity that polyQ tracts have for their interactors, regardless of conformation 19 .
AR is the nuclear receptor that regulates the development of the male phenotype. It harbors a polyQ tract associated with the neuromuscular disease spinobulbar muscular atrophy (SBMA) 20 , that affects men with AR genetic variants coding for tracts with more than 37 residues, that form fibrillar cytotoxic aggregates 21 . The length of this tract also anti-correlates with the risk of suffering prostate cancer 22 due to its influence on AR transcriptional activity 23 . It seems, therefore, that the length of the polyQ tract of AR must be in a specific range to prevent the over-activation of the receptor and simultaneously minimize its propensity to form cytotoxic aggregates. This trade-off is reflected in the distribution of AR polyQ tract lengths in the population, despite some variations between ethnic groups 24 .
Despite their relevance for understanding the causes of two diseases, the structural basis of these sequence-activity relationships has not been established, in part due to the difficulty of obtaining atomic-resolution structures on these poorly soluble repetitive sequences. By establishing robust assays we characterized the conformation of the polyQ tract of AR 25 as a function of tract length by using circular dichroism (CD) and solution nuclear magnetic resonance (NMR) spectroscopy, as well as molecular dynamics (MD) and QM/MM (quantum mechanics/molecular mechanics) calculations. We found that its stability directly depends on tract length due to the accumulation of unconventional interactions where Gln side chains donate a hydrogen bond to the main chain COs of residues at relative position i-4. By coupling the conformation of the polyQ tract to that of its flanking region these interactions provide a plausible explanation of how changes in tract length cause changes in gene expression and solubility, thus providing a rationale for the range of tract lengths observed in men.

The polyQ tract of AR folds into a helix that gains stability upon elongation
We used CD to analyze the secondary structure of synthetic peptides uQ 25 , uL 4 Q 25 and L 4 Q 25 (Fig. 1a). Peptide uQ 25 , where the letter u stands for uncapped, represents a polyQ tract of length 25 flanked by Lys residues, used to enhance solubility at physiological pH 26 . uL 4 Q 25 possesses four Leu residues found N-terminally to the polyQ region in AR and peptide L 4 Q 25 contains four additional AR residues (Pro-Gly-Ala-Ser) predicted to act as N-capping motif 27 (Fig. S1). As shown in Figure S2, the CD spectra of both uL 4 Q 25 and L 4 Q 25 , measured at pH 7.4 and 277K, have well-defined minima at ca. 205-208 and 222 nm, especially for L 4 Q 25 , indicating that they are 40 and 55% helical, respectively, in contrast to peptide uQ 25 , which is 20% helical. These results indicate that the helicity of this polyQ tract stems from interactions involving eight residues flanking it at the N-terminus, including a predicted N-capping motif and four Leu residues 25 . To quantify how helicity depends on tract length we studied polyQ peptides equivalent to L 4 Q 25 but with tract lengths 4, 8, 12, 16 and 20 (L 4 Q n , Fig. 1a) by CD and observed that helicity and tract length are strongly correlated (Fig. 1b). Helicity increased abruptly from L 4 Q 4 to L 4 Q 8 and L 4 Q 12 , from ca 5% to ca 40%, and then increased slightly upon further elongation. Since the CD signal depends both on the amount and length of helical structures, and to determine the residue-specific distribution of helicity, we measured the backbone chemical shifts of the peptides by solution NMR (Figs. S3 and S4) and analyzed them with the algorithm δ2D 28 . We found an increase in helical propensity upon polyQ tract elongation, in agreement with the results obtained by CD (Fig. 1c), concomitant with a change in the identity of the residue with the highest helicity: whereas for peptide L 4 Q 4 this is L3, with ca 20% helicity, it shifts to L4, with 50% helicity, for peptides L 4 Q 8 and L 4 Q 12 and to Q1, with ca 80% helicity, for peptides L 4 Q 16 and L 4 Q 20 (Fig. 1c). We conclude that the stability of the conformation of this polyQ tract depends on its length and that for physiological tract lengths 24 the residue of highest helicity can be part of the tract.

The side chains of the first residues of the polyQ tract have a distinct rotameric state
To rationalize the stability of the polyQ helix we extended our NMR analysis to the side chains and initially focused on the carboxamide groups of the Gln residues. We found that the 15 N side chain resonances of the homopolymeric polyQ sequences are surprisingly well-dispersed and that the associated chemical shifts correlate with their position in the sequence i.e. that the resonances of the first residue of the tract appear upfield (111.75 ppm for Q1 in L 4 Q 20 ) (Fig. 2a) and shift to lower fields towards the C-terminus of the tract (113.15 ppm for Q20 in L 4 Q 20 ). Remarkably the first four residues (Q1 to Q4) have chemical shifts that are markedly lower; e.g. in L 4 Q 20 the difference in side chain 15 N chemical shift between Q4 and Q5 is 0.22 ppm whereas the resonances of Q5 and Q6 overlap. This indicates that the chemical environment of the Gln side chains varies along the polyQ tract, especially for the first residues. We then analyzed the 1 H resonances of the Gln side chains. Especially in the first residues of the tract the resonances of the γ protons, adjacent to the carboxamide group (Fig. 2b), overlap in the peptide with the shortest tract but gradually split as the length of the tract increases to 20. The behavior of the β protons, which are instead adjacent to the peptide backbone (Fig. 2c and S6), is more complex: in L 4 Q 4 they are split, upon tract elongation to L 4 Q 12 they collapse into one peak but they split again in L 4 Q 16 and, especially, in L 4 Q 20 . These effects, caused by redistributions of side chain rotameric states, correlate with the increases in helicity that occur upon tract elongation reported in Figure 1c, indicating that the conformations of the main chain and side chain of these residues are coupled. Although these effects are particularly marked for the first three or four residues of the tract they can also be observed in the residues following them in the sequence, particularly in L 4 Q 16 and L 4 Q 20 (Fig. S6); this indicates that, in a given peptide, the population of the side chain conformation causing the effects gradually decreases along the sequence. In summary, we find that the side chains of residues with high helicity have a conformation that is different from that of less ordered residues.

Hydrogen bonds between Gln side chain NH 2 groups and main chain COs in helical conformers
To rationalize these observations we carried out molecular dynamics (MD) simulations. For this, since these peptides have fractional helicity, we generated fully helical conformations for peptides L 4 Q 4 to L 4 Q 20 and produced MD trajectories at 300K. We observed that the helical starting structures had a lifetime that depended on the length of the polyQ tract and that partially helical conformations were re-populated after unfolding (Fig. S7). Although this is not evidence for convergence it indicates that both the helical and unfolded states of the peptide have been sampled. An analysis of the helicity in the trajectory as a function of residue number showed that the residues with highest helicity were the four Leu residues flanking the polyQ tract and that the helicity of the Gln residues decreased along the tract (Fig. 3a). However, in contrast to the experiments (Fig. 1b), the overall helicity did not increase upon tract elongation (Fig. 3a).
To obtain representations of the structural properties of peptides L 4 Q 4 to L 4 Q 20 that are in quantitative agreement with experiment we used the Cα and CO backbone chemical shifts to reweight the trajectories with a Bayesian/Maximum Entropy (BME) algorithm 29 . In this procedure the degree of re-weighting and, therefore, the extent to which the back-calculated chemical shifts agree with those measured experimentally, is controlled by the parameter θ, which determines the balance between the prior information, encoded in the MD trajectory, and the experimental data (Figs. S8 and S9 and, for θ = 4, Fig. 3b). We analyzed the secondary structure of the reweighted trajectories and obtained that their overall helicity increased with the length of the polyQ tract (Fig. 3a), as observed by CD (Fig. 1b) and that the effect of elongation on the helicity of the various residues of the peptide was equivalent to that observed by NMR, indicating that the reweighted trajectories are useful models of the conformational properties of polyQ peptides (Fig. 3c).
The 15 N chemical shifts of backbone amides depend on the hydrogen bonding status of both the HN group and the adjacent CO 30 . We thus hypothesized that the high dispersion of 15 N Gln side chain resonances (Fig. 2) is due to hydrogen bonding interactions of the carboxamide group of the Gln side chains. The primary amide (NH 2 ) groups of Gln and Asn side chains are good donors 31 and, in surveys of hydrogen bonds involving them in protein structures, Gln residues can donate hydrogens to the backbone COs preceding them in the sequence 32 . To investigate this possibility we analyzed the hydrogen bonds formed by Gln side chains in the reweighted trajectories and found that the most common hydrogen bond is one where the side chain of a Gln residue donates a hydrogen to the main chain CO group of the residue at relative position i-4 in the sequence (Fig. 4a). This specific interaction, that we term i→i-4 side chain to main chain (sc i →mc i-4 ) hydrogen bond has been observed in protein structures deposited in the protein data bank (PDB); interestingly, it occurs almost exclusively in α-helices both in the PDB 32 and in the trajectories (Fig. S10), suggesting that it plays a role in stabilizing this structure. In addition, for the reweighted MD ensembles of all peptides we observed that the population of this specific hydrogen bond progressively decreased along the polyQ tract (Fig. 4b).
Figure 3 - Structures adopted by polyQ peptides as a function of tract length: a) Residue-specific helicity obtained for peptides L 4 Q 4 to L 4 Q 20 before and after reweighting. b) Comparison of the difference between the experimental and back-calculated Cα and C' chemical shifts with those back-calculated from the reweighted MD trajectories obtained for peptides L 4 Q 4 to L 4 Q 20 . c) Representative structures for peptides L 4 Q 4 to L 4 Q 20 , defined as the frame of each trajectory with residue-specific helicity most similar to the ensemble-averaged counterpart. Residues are colored as a function of their average helicity (in the reweighted ensemble) and the Cα atoms of Gln residues are shown as spheres.
We analyzed the rotamers populated by Gln residues involved in these hydrogen bonds in the reweighted trajectories and observed that they constrain the range of values of χ1 and χ3 that they can adopt (Fig. 4c). Note that while the distribution of χ1 in Gln residues in α-helices is generally bimodal 33 only χ1 values around -60° are compatible with the suggested H-bonding motif, which also results in an enrichment of χ3 values around 90°. This is in agreement with the NMR results, which point towards the adoption of a specific conformational state by these side chains (Fig. 2c). As an example, we show a frame of the trajectory obtained for peptide L 4 Q 16 in which two such hydrogen bonds occur simultaneously (involving residues Q1 and Q4 but not Q2 and Q3; Fig. 4d). The NMR-derived structural ensembles thus suggest that sc i →mc i-4 hydrogen bonds can be part of a hydrogen bonding motif where the CO group accepts two hydrogen bonds donated by the Gln side (purple) and main (yellow) chains.
To test the importance of sc i →mc i-4 hydrogen bonds we used CD to analyse the secondary structure of peptides based on the L 4 Q 16 sequence but with Gln residues substituted with Glu (Fig. 5a,b). Gln and Glu have similar structures and helical propensities 34 but the side chain of Glu is deprotonated at pH 7.4 and cannot act as hydrogen bond donor. Decreases in helicity after mutation of Gln residues are thus compatible with their involvement in helix stabilization via sc i →mc i-4 hydrogen bonds. Since in the NMR-derived ensembles the population of such hydrogen bonds is highest at the N-terminus of the tract (Fig. 4b) we analyzed the effect of mutating, one at a time, each of the first five Gln residues (peptides Q1E to Q5E) and found that the helicity of Q1E to Q4E was lower than that of L 4 Q 16 : we observed a shift of the minimum at ca 205-208 nm to lower wavelengths and a relative decrease in the ellipticity at 222 nm that, together, accounted for a decrease in helicity from 40 to 30%. By contrast we found that the helicity of Q5E was very similar to that of L 4 Q 16 (Fig. 5c and S11), suggesting that the propensity of the first four Gln residues to donate a hydrogen is higher than that of the fifth one. This is in agreement with the 15 N Gln side chain chemical shifts, where we observed especially low values for the first four residues, which could be caused by particularly strong hydrogen bonding interactions (Fig. 2a). We also analyzed a mutant where the first four hydrogen bonded Gln residues were simultaneously mutated to Glu (Q1-4E) and found that in this case the loss of helicity was larger, from 40% to 20%, similar to the value found in uQ 25 (Fig. 5b,c and S11).
Since the pK a of Glu side chains is ca 4, decreasing the pH of solutions of peptide Q1-4E to 2 should lead to their protonation and re-establish their ability to form sc i →mc i-4 hydrogen bonds. To investigate this hypothesis, we analyzed the secondary structure of peptides L 4 Q 16 and Q1-4E at pH 2 by CD. For peptide L 4 Q 16 we observed, as expected, no change in secondary structure, whereas peptide Q1-4E was strongly helical at low pH, more so than L 4 Q 16 (Fig. S14). This suggested that, when protonated, Glu side chains, due to their acidic character, have an even higher propensity than Gln residues to donate a hydrogen bond to the main chain CO of the residue at position i-4. These results validate our approach to investigate side chain to main chain hydrogen bonds by Gln to Glu mutations and in addition contribute to explaining the high helical propensity observed in host-guest experiments for protonated Glu residues, where it is more helical than any other amino acid except Ala 34 .
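The protonation argument above can be made quantitative with the Henderson-Hasselbalch relation. The following is a minimal sketch that assumes an idealized, isolated Glu side chain with pK a = 4 (the approximate value quoted in the text); the function name is illustrative, and the effective pK a of Glu within a helical peptide may differ.

```python
def protonated_fraction(ph: float, pka: float = 4.0) -> float:
    """Fraction of a monoprotic acid in its protonated (neutral) form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 4 is the approximate Glu side chain value from the
    text; it is an assumption, not a measured value for these peptides.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    for ph in (2.0, 7.4):
        print(f"pH {ph}: {100 * protonated_fraction(ph):.2f} % protonated")
```

Under these assumptions the Glu side chains would be roughly 99% protonated at pH 2 and well below 0.1% protonated at pH 7.4, consistent with the expectation that they can act as hydrogen bond donors only at low pH.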
It is remarkable that the first side chains of the polyQ tract have a particularly high propensity to form sc i →mc i-4 hydrogen bonds. The other Gln residues do so but with lower propensity, as suggested for example by their side chain chemical shifts. One difference between these two sets of Gln residues is that the former are at position i+4 relative to Leu residues whereas the latter are instead at position i+4 relative to Gln residues (Fig. 5b). Since the strength of hydrogen bonds depends on their degree of shielding from water 35 we hypothesized that the sc i →mc i-4 hydrogen bonds between Gln and Leu residues are stronger, at least in part, due to shielding from water by Leu side chains. Indeed, as α-helices have 3.6 residues per turn the sc i →mc i-4 hydrogen bond between residues L1 and Q1 can be shielded by the side chain of residue i (L1) (Fig. 5b). To investigate this we measured the helicity of a peptide based on the sequence of L 4 Q 16 but with all Leu residues mutated to Ala (L1-4A), an amino acid that has a smaller side chain and, presumably, a lower ability to shield this hydrogen bond. We found that, despite the higher intrinsic helical propensity of Ala compared to Leu 34 and the higher predicted helicity of L1-4A compared to L 4 Q 16 (Fig. S13), the helicity of L1-4A was only ca. 20%, as low as that of Q1-4E (Fig. 5c). This supports the notion that the shielding properties of the Leu side chains are key for the strength of this interaction and for its ability to stabilize polyQ helices, and in addition indicates that accounting for the sc i →mc i-4 hydrogen bond revealed in this work will be important to reliably predict the helicity of polyQ peptides from their sequences (Fig. S13).
To confirm that the shielding provided by Leu is relevant for the ability of Gln to donate a hydrogen bond to the residue at relative position i-4, we characterized the synthetic peptide L1-4A by NMR. We compared the side chain 1 H, 15 N resonances of peptide L1-4A with those of L 4 Q 16 by carrying out 1 H, 15 N-HSQC experiments at natural 15 N abundance and observed that there was a complete loss of dispersion in the 15 N chemical shift dimension for L1-4A: except for the last three Gln residues, all other residues in the tract have the same 15 N chemical shift (Fig. 5d). We then analyzed the side chain 1 H resonances of the Gln side chains and observed that, in contrast to L 4 Q 16 , the signals of Q1 to Q4 in L1-4A display collapsed γ and split β resonances, indicating that these side chains do not have the same conformation as in L 4 Q 16 .

The hydrogen bonds between Gln side chain NH 2 groups and main chain COs are bifurcate
Our results suggest that the side and main chain of Gln can simultaneously donate a hydrogen to the CO of the residue at relative position i-4 (Fig. 4d). This can generate a type of bifurcate hydrogen bonding, shown to occur experimentally 36 and in QM calculations 37 , that takes advantage of the directionality of the lone pairs of the acceptor group. This type of interaction is not accurately represented in the atom-centric representation of electrostatic interactions used in molecular simulation force fields, which may explain the problems we had reproducing the experimental helicity in the classical MD simulations (Fig. 3a). To more accurately model the sc i →mc i-4 hydrogen bond we performed MD simulations by making use of the hybrid QM/MM methodology, which can account for a series of effects ignored in classical force fields such as lone pair directionality and electronic polarization. Specifically, given our results (Figs. 5b,c), the side chain carboxamide of the Gln residue at position i and the main chain CO group of Leu at position i-4 in peptide L 4 Q 16 were included in the QM subsystem that was described at the DFT level of theory (see Fig. 6a). We performed a simulation of 150 ps at 300 K for the L 4 Q 16 peptide started from a specific frame of the classical MD trajectory where the bifurcate bond is formed (Fig. 6) and focused our analysis on the interaction between Q1 and L1 (Fig. 5b).
Our analysis showed that the main chain to main chain hydrogen bond between Q1 and L1 (mc Q1 →mc L1 ) is stable, that the sc Q1 →mc L1 bond can form reversibly and that its breakage is caused by deviations of χ3 from the value required for the donor and acceptor to interact (+ 90 ± 30°, Fig. 4c,d, 6b,c). To analyze how the sc Q1 →mc L1 bond affects the mc Q1 →mc L1 interaction we compared the effect of the former on the distribution of donor to acceptor distances in the latter. We found that it caused the distribution to shift to longer distances, by 0.17 Å, thus weakening the hydrogen bond, indicating that the main and side chains of Q1 compete for the main chain CO group of L1 (Fig. S15). We then evaluated the strength of these interactions in terms of electron density at the interaction's natural bond critical point, ρ(r) 38,39 . We found that in the absence of the sc Q1 →mc L1 bond the mc Q1 →mc L1 bond has an average density of 0.014 au and, in its presence, of 0.008 au. By contrast, even in the presence of the mc Q1 →mc L1 bond, the value for the sc Q1 →mc L1 interaction is instead, on average, 0.017 au, in agreement with the notion that the Gln side chain can be a better donor than the main chain 31 . Importantly, the total density of the bifurcate hydrogen bond is on average 0.025 au (Fig. 6c) indicating that the interaction between Q1 and L1 is strong. These results show that the unconventional sc i →mc i-4 hydrogen bonding interactions revealed in this work are bifurcate with the conventional mc Q1 →mc L1 interactions and strong, thus enhancing the stability of polyQ helices.
... of the distances between donor and acceptor for the mc Q1 →mc L1 and sc Q1 →mc L1 interactions, with an indication, with a grey background, of the frames for which 60° < χ3 < 120°. c) Distributions of the χ1, χ2 and χ3 dihedral angles of the side chains of Q1 with an indication, as a grey shade, of the range of values of χ3 that are compatible with the sc Q1 →mc L1 hydrogen bond. d) Distribution, plotted as a normalized histogram, of the electron density ρ(r) corresponding to the mc Q1 →mc L1 interaction (yellow) in the absence (white background) and in the presence (grey background) of the sc Q1 →mc L1 , to the sc Q1 →mc L1 interaction (purple) and to the bifurcate hydrogen bond (grey).

Discussion
By combining experiments and simulations we have found that unconventional sc i →mc i-4 hydrogen bonds donated by Gln side chains can stabilize the α-helices formed by polyQ tracts. We also found that their strength depends on the residue type of the acceptor: Leu residues are good acceptors while Ala residues are not. These results help rationalize the structural properties of polyQ tracts reported in the recent literature 25,40,41 . In the AR we found that the four Leu residues flanking the polyQ tract at its N-terminus are key for helicity 25 , which we attribute to their high propensity to accept sc i →mc i-4 hydrogen bonds. The tract of huntingtin, associated with Huntington's disease, also displays some helicity at low pH 40,41 , although lower than that observed in the AR. Even though the ability of each particular natural residue type to act as a sc i →mc i-4 hydrogen bond acceptor remains to be determined, the fact that only the first position in the four residue stretch preceding the polyQ tract in huntingtin is a Leu could explain its lower secondary structure content.
Both in the AR and in huntingtin the helical character of the polyQ tract is not homogeneously distributed and is instead found to gradually decrease from the N to the C-terminus of the tract 25,40,41 . Our results indicate that this can be explained by a low propensity of Gln residues, relative to that of residues flanking the tracts at their N-terminus, to accept sc i →mc i-4 hydrogen bonds: unless interrupted by residues, such as Leu, with a high propensity to accept such bonds, helicity will decay towards the C-terminus of the tract. In addition our results provide a mechanistic interpretation of the results obtained by Kandel, Hendrikson and co-workers in their investigation of the effect of increasing the coiled coil character of polyQ tracts by interrupting them with Leu residues 2 . These authors found that the peptides were fully helical and remained so after dissociation of the coiled coil upon heating to temperatures as high as 348 K due, we propose, to the presence of sc i →mc i-4 hydrogen bonds with Leu acting as acceptor.
We attribute the high propensity of Leu residues to accept sc i →mc i-4 hydrogen bonds to the close proximity between the hydrogen bond and the Leu side chain. This can prevent water molecules from hydrogen bonding the interacting moieties and strengthen the sc i →mc i-4 interaction due to the energetic costs associated with unpaired hydrogen bonding partners 35 . Dry environments where this can occur include the core of globular proteins 42 , the interior of cell membranes 43 as well as amyloid fibrils, where equivalent interactions, parallel to the fibril axis, contribute to the stability of the quaternary structure 44 . In addition it has been shown that both exon 1 of huntingtin 45 and the transactivation domain of AR 46 can form condensates that define environments of low dielectric constant, where electrostatic interactions may be strongly favored 47 . It will be interesting to investigate whether interactions such as those described here play a role in the phase separation process of these and similar proteins.
PolyQ tracts are frequently found in transcriptional regulators, particularly in transcription factors 1 . In several cases their transcriptional activity has been found to depend on the length of the polyQ tracts that they harbor but the physical basis of this phenomenon has not yet been firmly established 1,48 . Our results provide a possible rationale as they suggest that variations in the length of polyQ tracts would result in changes in the secondary structure of the transactivation domain of transcription factors. Indeed, these can affect the strength of the protein-protein interactions that regulate transcription 49 , that include interactions with transcriptional co-regulators and with general transcription factors. Whether a certain change in tract length causes a decrease or an increase in activity might depend on whether the polyQ tract and its flanking regions are involved in interactions with transcriptional co-activators or corepressors and should therefore be context-dependent, as found experimentally 48 .
A number of highly detailed in vitro experiments have established that the formation of fibrillar aggregates by proteins bearing polyQ tracts can proceed via oligomers 50 , potentially liquid-like 51 , stabilized by intermolecular interactions between flanking regions of polyQ tracts and equivalent to those stabilizing coiled coils 2,52 . Since extending the length of the tract increases the helicity of both the tract and its N-terminal flanking region it is conceivable that this will change the secondary structure and, therefore, the strength of the interactions that stabilize these oligomers as well as, potentially, the rate at which they convert into fibrils. Our data, therefore, suggest that tract elongation can alter the structure and the stability of the oligomers populated on the fibrillization pathway and, as a consequence, modify the rate at which toxic fibrillar species build up 14 .
In summary we have shown that side chain to main chain hydrogen bonds donated by Gln side chains can cause polyQ tracts to form helices and that the stability of these helices directly correlates with the tract length. This unconventional interaction, due to the high propensity of the carboxamide group of the Gln side chain to donate hydrogens, is so energetically favoured that it can offset the entropic cost of constraining the range of conformations available to the side chain. In addition we have shown that the strength of these interactions depends on the degree to which the Gln side chains are exposed to water, implying that the secondary structure of polyQ tracts may vary depending on solution conditions, oligomerization state and interactions with other molecules. Our findings provide a mechanistic basis for the link that exists between polyQ tract length and transcriptional activity in transcription factors such as the AR and, more generally, between tract length and aggregation via helical oligomeric intermediates in polyQ diseases.
the experiments and led their analysis and interpretation. X.S. conceived and led the project and wrote the first draft of the manuscript. All authors contributed to the final version.
qMDD 2 for non-uniformly sampled data and NMRPipe 3 for all uniformly collected experiments. Synthetic peptide L1-4A was prepared as detailed above to a final concentration of 250 µM and characterized by two-dimensional homonuclear (TOCSY and NOESY) and heteronuclear ( 1 H-15 N HSQC, at natural 15 N abundance) experiments. The TOCSY and NOESY mixing times were 70 and 200 ms, respectively.

Molecular dynamics, analysis and trajectory reweighting by maximum entropy
Input coordinates were generated using MacPyMOL in fully helical conformations. All simulations were performed in the MD simulation software ACEMD 4 by using the CHARMM22* force field 5 , which was designed to have an accurate helix-coil balance. Each system was explicitly solvated with the TIP3P water model inside cubic boxes extending from 25 Å to 40 Å around the peptides, depending on their length, and neutralized with Cl - and Na + ions. Initial conformations were minimized and equilibrated under NPT conditions at 1 atm and 300K for 1 ns. Production simulations were performed at 300K in the NVT ensemble using a 4 fs time-step for 5 µs. The analysis of the secondary structure of individual frames was carried out with DSSP 6 and the chemical shifts were back-calculated with the predictor PPM 7 . The reweighting of the trajectories to match the experimental chemical shifts was carried out by using a Bayesian/Maximum Entropy method 8 (code available at: github.com/sbottaro/BME). The BME approach contains a single, free parameter (θ) that determines the balance between fitting the experimental data and not deviating too much from the prior information encoded in the force field. We chose θ=4 for the analysis shown in the main text based on an analysis showing this value to provide a good balance between the two terms (Fig. S9), and show results for other values of θ in Fig. S8.
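To illustrate how θ balances the experimental data against the prior, the following is a minimal sketch of maximum-entropy reweighting in its Lagrange-multiplier (dual) form. It is not the code from github.com/sbottaro/BME: the function names, the plain gradient-descent optimizer, the step size and the toy data are all assumptions made for illustration. The sketch assumes weights of the form w_j ∝ w0_j exp(-Σ_i λ_i F_i(x_j)) and minimizes Γ(λ) = log Z(λ) + Σ_i λ_i F_i^exp + (θ/2) Σ_i λ_i² σ_i².

```python
import math


def _weights(calc, lam, w0):
    """Normalized weights w_j proportional to w0_j * exp(-sum_i lam_i * calc[j][i])."""
    logw = [math.log(w0[j]) - sum(l * f for l, f in zip(lam, calc[j]))
            for j in range(len(calc))]
    shift = max(logw)  # subtract the maximum for numerical stability
    unnorm = [math.exp(x - shift) for x in logw]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bme_weights(calc, exp, sigma, theta, w0=None, step=0.5, n_iter=2000):
    """Hypothetical helper: reweight frames by minimizing the dual function.

    calc[j][i] is observable i back-calculated for frame j, exp[i] the
    experimental value and sigma[i] its uncertainty. The fixed step size
    is an assumption that suits observables of order one and moderate
    theta * sigma**2; it would need to be reduced otherwise.
    """
    n_frames, n_obs = len(calc), len(exp)
    w0 = w0 or [1.0 / n_frames] * n_frames
    lam = [0.0] * n_obs
    for _ in range(n_iter):
        w = _weights(calc, lam, w0)
        avg = [sum(w[j] * calc[j][i] for j in range(n_frames))
               for i in range(n_obs)]
        # Gradient of the dual: exp_i - <F_i>_w + theta * sigma_i^2 * lam_i
        grad = [exp[i] - avg[i] + theta * sigma[i] ** 2 * lam[i]
                for i in range(n_obs)]
        lam = [lam[i] - step * grad[i] for i in range(n_obs)]
    return _weights(calc, lam, w0)


if __name__ == "__main__":
    # Toy ensemble: frame 0 is coil (observable 0.0), frame 1 is helix (1.0).
    w = bme_weights([[0.0], [1.0]], exp=[0.8], sigma=[0.1], theta=4.0)
    print("reweighted helical population:", round(w[1], 3))
```

In this toy case the prior average of 0.5 moves towards, but does not reach, the experimental value of 0.8, which is the qualitative behaviour expected of a regularized fit: larger values of θ keep the weights closer to the prior.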

Hydrogen Bond Criteria
To classify whether two atoms are hydrogen bonded we used angle and distance criteria. Specifically, we define hydrogen bonds as those where the distance between the donor and the acceptor is shorter than 3.4 Å (2.4 Å between H and heavy atom) and the donor-hydrogen-acceptor angle is greater than 120°.
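The criteria above can be expressed as a short geometric test. The sketch below is illustrative only: the function name and argument layout are hypothetical, coordinates are assumed to be in Å, and treating the 2.4 Å hydrogen to acceptor cutoff as an additional requirement (rather than an alternative to the 3.4 Å donor to acceptor cutoff) is an assumption about how the parenthetical should be read.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_da=3.4, max_ha=2.4, min_angle=120.0):
    """Return True if the donor-H...acceptor geometry meets all cutoffs.

    donor, hydrogen, acceptor: (x, y, z) coordinates in angstrom.
    The angle is measured at the hydrogen, between the vectors pointing
    to the donor and to the acceptor. Atoms are assumed not to coincide.
    """
    h_to_d = _sub(donor, hydrogen)
    h_to_a = _sub(acceptor, hydrogen)
    d_da = _norm(_sub(donor, acceptor))
    d_ha = _norm(h_to_a)
    cos_angle = sum(p * q for p, q in zip(h_to_d, h_to_a)) / (_norm(h_to_d) * d_ha)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    angle = math.degrees(math.acos(cos_angle))
    return d_da < max_da and d_ha < max_ha and angle > min_angle


if __name__ == "__main__":
    # Linear N-H...O arrangement with a 2.9 angstrom donor-acceptor distance.
    print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

Keeping the cutoffs as keyword arguments makes it easy to check how sensitive hydrogen bond populations are to the exact thresholds.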

Model Structures
After reweighting, we calculated the residue-specific helicity for all of the peptides by using the algorithm DSSP 6 . For model structure selection, residues that are in the helical conformation for more than 50% of the simulation are defined as helical and the rest as random coil. Structures from the simulation that fit this definition were selected and colored by their average helicity in Figure 3c. The color scale goes from dark blue (0% helicity) to dark red (78% helicity).
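The selection procedure can be sketched as follows, assuming each frame is summarized by a DSSP string with one character per residue and that frame weights come from the reweighting step. The function names are hypothetical, and counting only the DSSP code "H" (α-helix) as helical is an assumption; the text does not state which DSSP classes were used.

```python
def residue_helicity(dssp_frames, weights):
    """Weighted fraction of frames in which each residue is assigned 'H'."""
    n_res = len(dssp_frames[0])
    total = sum(weights)
    return [
        sum(w for s, w in zip(dssp_frames, weights) if s[i] == "H") / total
        for i in range(n_res)
    ]


def representative_frame(dssp_frames, weights, threshold=0.5):
    """Index of the frame whose helical pattern best matches the ensemble.

    A residue is labeled helical if its weighted helicity exceeds the
    threshold (50% in the text); the frame with the fewest residues that
    disagree with this labeling is returned (first one in case of ties).
    """
    helicity = residue_helicity(dssp_frames, weights)
    target = [h > threshold for h in helicity]

    def mismatches(frame):
        return sum((c == "H") != t for c, t in zip(frame, target))

    return min(range(len(dssp_frames)), key=lambda j: mismatches(dssp_frames[j]))


if __name__ == "__main__":
    frames = ["HHHH--", "HH----", "------"]  # toy DSSP strings
    weights = [0.3, 0.5, 0.2]                # toy reweighted populations
    print(residue_helicity(frames, weights))
    print(representative_frame(frames, weights))
```

In this toy example only the first two residues exceed the 50% threshold, so the second frame is selected as the representative structure.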

QM/MM calculations
The starting structure was selected from the classical MD simulations of L 4 Q 16 , preserving the previously defined box of water and ions. The AMBER 16 program 9 interfaced to the Terachem 1.9 program (www.petachem.com, accessed June 1, 2017) was used for the QM/MM simulation. QM atoms were described at the BLYP/6-31G* level including a dispersion correction 10 . The classical subsystem was described with the CHARMM22* 5 force field by making use of the Chamber keyword of Parmed program included in AMBERTOOLS 16 9 . The link atoms procedure as implemented in AMBER program was used to saturate the valence of