Huntingtin (HTT) is a large (348 kDa) protein that is essential for embryonic development and is involved in diverse cellular activities such as vesicular transport, endocytosis, autophagy and the regulation of transcription1,2. Although an integrative understanding of the biological functions of HTT is lacking, the large number of identified HTT interactors suggests that it serves as a protein–protein interaction hub1,3,4. Furthermore, Huntington’s disease is caused by a mutation in the HTT gene, resulting in a pathogenic expansion of a polyglutamine repeat at the amino terminus of HTT5,6. However, only limited structural information regarding HTT is currently available. Here we use cryo-electron microscopy to determine the structure of full-length human HTT in a complex with HTT-associated protein 40 (HAP40; encoded by three F8A genes in humans)7 to an overall resolution of 4 Å. HTT is largely α-helical and consists of three major domains. The amino- and carboxy-terminal domains contain multiple HEAT (huntingtin, elongation factor 3, protein phosphatase 2A and lipid kinase TOR) repeats arranged in a solenoid fashion. These domains are connected by a smaller bridge domain containing different types of tandem repeats. HAP40 is also largely α-helical and has a tetratricopeptide repeat-like organization. HAP40 binds in a cleft and contacts the three HTT domains by hydrophobic and electrostatic interactions, thereby stabilizing the conformation of HTT. These data rationalize previous biochemical results and pave the way for improved understanding of the diverse cellular functions of HTT.
Computational and biochemical studies of HTT have predicted a variable number of HEAT repeats interspersed with unstructured regions8,9,10,11,12. However, attempts to determine the structure of HTT at high resolution have been hindered by its flexibility13,14,15. Most structural studies have focused on an N-terminal fragment corresponding to the first exon of the HTT gene, leaving the majority of the protein (more than 97% of its amino acid length) largely uncharted14. To overcome this hurdle, we searched for interaction partners that could stabilize the structure of HTT. A first screen using polyglutamine-expanded (46 glutamines) full-length human HTT (46QHTT) expressed at low levels in HEK293 cells identified HAP40 as an abundant binding partner (Fig. 1a); HAP40 has previously been reported to interact with HTT7 and to recruit HTT to early endosomes16. Although a complex of HTT and HAP40 could not be reconstituted from the individual proteins in vitro, the complex was purified at high yield from human cells co-expressing both wild-type full-length human HTT (17QHTT) and HAP40 (Fig. 1b). Whereas HTT alone formed oligomers and tended to aggregate17, the HTT–HAP40 complex eluted as a narrow, symmetric peak during size-exclusion chromatography (Fig. 1c). Consistent with this, sucrose gradient ultracentrifugation indicated that the HTT–HAP40 complex was more conformationally homogeneous than HTT alone (Extended Data Fig. 1). The HTT–HAP40 complex, but not its isolated components, showed a sharp, strong unfolding transition in differential scanning fluorimetry assays (Fig. 1d), confirming that the complex was stable and amenable to structural studies18.
In the absence of an interactor, the conformational heterogeneity of HTT prevented high-resolution cryo-electron microscopy (cryo-EM) analysis. By contrast, the HTT–HAP40 complex was well defined and yielded a globular structure measuring approximately 120 × 80 × 100 Å (Fig. 2, Extended Data Fig. 2a, b), which partly resembles a published negative-stain structure of HTT9. The global resolution of the map was 4 Å (Extended Data Fig. 2c, d), sufficient to build a de novo atomic model using energy minimization, with well-resolved large side chains as landmarks (Fig. 2, Extended Data Figs 2e, 3, Extended Data Table 1). For both HTT and HAP40, all secondary-structure elements resolved in the model were α-helices (Extended Data Fig. 4), in agreement with computational predictions using PSIPRED19 (Extended Data Fig. 5). For HTT, 72% of the helices were arranged in HEAT or other tandem repeats, whereas most of the regions not resolved in the map were predicted to be unstructured. Notably, no density was observed for the HTT exon 1 fragment (residues 1–90; 17QHTT is used for amino acid numbering throughout the text) even at very low thresholds, indicating that this region of the protein is extremely flexible. Because the polyglutamine tract lies within this flexible exon 1 region rather than in the folded core, polyglutamine length may have limited influence on the overall architecture of the HTT–HAP40 complex.
The domain organization of HTT has been controversial1,8,9,10,11,12. Our data show that HTT consists of three domains: N- and C-terminal domains containing multiple HEAT repeats (hereafter N-HEAT and C-HEAT) linked by a smaller bridge domain (Fig. 2). N-HEAT (residues 91–1,684) forms a typical α-solenoid comprising 21 HEAT repeats arranged as a one-and-a-half-turn right-handed superhelix, the concave face of which defines an arch approximately 80 Å in diameter (Fig. 3a). Two putative membrane-binding regions have been identified in HTT, both in its N-terminal region: an exon 1 fragment, especially residues 1–17, which may form an amphipathic helix20, and a larger region at residues 168–366, which contains a functionally important palmitoylation site at C208 (refs 21,22). Although the N terminus corresponding to exon 1 is not visible in our structure, N-HEAT repeats 2–4 (residues 160–275) form a positively charged patch on the convex surface of N-HEAT that coincides with the second putative membrane-binding region (Fig. 4a). However, a previously reported putative amphipathic helix (residues 223–240)22 faces the inner concave side of N-HEAT, where it would have limited access to membranes.
Consistent with our computational predictions (Extended Data Fig. 5) and previous studies1,14, N-HEAT accommodates a large disordered insertion (residues 400–674) between N-HEAT repeats 6 and 7 (Fig. 3a). The insertion projects outwards without interrupting the interactions between HEAT repeats, leaving this region accessible to proteases. Multiple proteolytic cleavage sites have been mapped to this region1,14,23, but the continuous packing of N-HEAT repeats 6 and 7 makes it unlikely that such cleavage events would readily release N-terminal fragments, consistent with previous reports15,24. This insertion also harbours multiple phosphorylation sites that may modulate protein–protein interactions and proteolytic accessibility1,25,26,27, perhaps by regulating the interaction of the insertion with the positively charged region of N-HEAT. Most other reported post-translational modifications of HTT are located in presumably unstructured regions that are not resolved in our map (Extended Data Fig. 4), including protease cleavage sites that release N-terminal HTT fragments1,23,28.
C-HEAT (residues 2,092–3,098) comprises 12 HEAT repeats forming an elliptical ring of approximately 80 × 30 Å (Fig. 3b) with repeats 1 and 12 interacting to close the ring. Unlike canonical HEAT repeats, in which the first helix is exposed to the convex surface of the domain, the first helix of C-HEAT repeat 12 faces the concave surface of C-HEAT. The repeats are interrupted by two insertions. Insertion 1 (residues 2,121–2,456) consists of 12 helical segments that separate C-HEAT repeats 1 and 2. On the other hand, insertion 2 (residues 2,510–2,663) is mostly unstructured and does not interfere with the interaction between C-HEAT repeats 2 and 3. Both insertions are harboured in the concave surface of C-HEAT, potentially shielding this region from protein–protein interactions. By contrast, both the convex and concave surfaces of N-HEAT are accessible in the structure (Fig. 3a) and could thus act as cargo-binding sites. This may explain why most of the known binding sites of HTT interactors have been mapped to its N terminus1,3,29.
N-HEAT and C-HEAT are stacked approximately vertically and connected by the bridge domain (residues 1,685–2,091; Fig. 3c). This domain contains six tandem α-helical repeats, of which repeats 3, 4 and 6 are armadillo-like. The repeat region is flanked by five non-repeat helices and a flexible, unresolved C terminus (residues 2,062–2,092). Apart from this flexible linkage, N-HEAT and C-HEAT are connected only weakly through loop interactions (Fig. 3d), which probably explains the highly dynamic structure of HTT in the absence of interaction partners such as HAP40.
HAP40 binds within the cleft defined by the two HEAT domains and the bridge domain, thereby stabilizing the observed HTT conformation. HAP40 consists of 14 α-helices arranged in tetratricopeptide repeat-like tandem repeats (Fig. 2). Within the complex, HTT and HAP40 share large interfaces with mainly hydrophobic interactions (Fig. 4a). This finding is consistent with our differential scanning fluorimetry data, which suggest that the exposure of hydrophobic areas is considerably reduced in the HTT–HAP40 complex (Fig. 1d). The C terminus of HAP40 contains four negatively charged residues that interact with a positively charged patch on the bridge domain of HTT (Fig. 4b). By contrast, the N terminus of HAP40 is mostly solvent exposed, and consequently helix 1 is not well resolved in our map. Similarly, a central region of HAP40 (residues 217–258) was not visible, consistent with biochemical experiments showing that this region was not required for HTT binding (Extended Data Fig. 6).
Although HTT is highly conserved from sea urchins to humans12 (Extended Data Fig. 7), the HTT orthologue in Drosophila melanogaster bears little resemblance to human HTT. No HAP40 homologue appears to be present in D. melanogaster, suggesting that the two proteins may have co-evolved. Whereas many HTT interactors bind the N terminus of HTT, the binding of HAP40 requires the coordination of all three HTT domains (Figs 2, 4). This may explain why HAP40 was identified as an HTT interactor in previous studies that used full-length HTT as bait4,7, but not in others that used only fragments of HTT3. It is also possible that other proteins bind to HTT at a similar location. Taken together, our data resolve long-standing questions regarding the architecture of HTT, strongly support the concept that HTT serves as a multivalent interaction hub1 and invite future structure-guided studies of the mechanisms by which HTT coordinates its diverse activities.
The following antibodies were used: anti-Flag M2 (Sigma), anti-HAP40 (Santa Cruz SC-69489), anti-Strep (IBA 2-1507-001) and anti-HTT (Millipore MAB2166).
Identification of HTT-interacting proteins
HEK293-based C2.6 cells17 (2 × 10⁸) expressing Flag-tagged full-length polyglutamine-expanded 46QHTT at low levels were collected, lysed with 25 mM Tris, 150 mM NaCl, 0.5% Tween 20, 1× protease inhibitor (Roche), pH 7.4, and then centrifuged (20,000g, 1 h). The supernatant was incubated with Flag beads at 4 °C for 2 h and then washed three times with 25 mM Tris, 150 mM NaCl, 0.02% Tween 20, pH 7.4. Proteins bound to the Flag beads were eluted with 100 mM glycine, 150 mM NaCl, 0.02% Tween 20, pH 3.5, and immediately neutralized with 1 M Tris (pH 8.0). The eluted proteins were concentrated and analysed by SDS–PAGE and Coomassie staining. To identify potential interactors of HTT, the gel lanes were excised, and the proteins were digested in-gel with trypsin and analysed by nano-liquid chromatography (C18, 500 × 0.075 mm, 2 μm column, Thermo Fisher Scientific) coupled to tandem mass spectrometry (QExactive, Thermo Fisher Scientific) in data-dependent acquisition mode (Top12). Proteins were identified using Proteome Discoverer 1.4 (Thermo Fisher Scientific) with a peptide false discovery rate ≤ 0.01, and enrichment analysis was performed with Perseus 188.8.131.52 using MS1 peak area for quantification.
The identity of Flag affinity-purified proteins was confirmed by western blot analysis with anti-HTT and anti-HAP40 antibodies after SDS–PAGE.
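Proteome Discoverer applies the peptide false discovery rate cut-off internally, and the validator settings used are not stated here. As a generic illustration only, the sketch below shows the common target-decoy way of enforcing a peptide-level FDR threshold such as ≤ 0.01: matches are ranked by score, the FDR at each cut-off is estimated as decoys/targets, and target hits are kept if their q-value is at or below the threshold. The function name and the (id, score, is_decoy) input layout are hypothetical, and this is not presented as the software's exact algorithm.

```python
from typing import List, Tuple


def target_decoy_filter(psms: List[Tuple[str, float, bool]],
                        max_fdr: float = 0.01) -> List[str]:
    """Return target IDs accepted at `max_fdr` (hypothetical helper).

    psms: (peptide_id, score, is_decoy); a higher score means a better match.
    """
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)

    # q-value: the lowest FDR at which this match would still be accepted,
    # i.e. the running minimum of the FDR from the bottom of the list upwards
    qvals = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min

    return [pid for (pid, _, is_decoy), q in zip(ranked, qvals)
            if not is_decoy and q <= max_fdr]
```

Decoy hits are never returned; they only serve to estimate how many false targets pass a given score cut-off.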
Generation of a stable human cell line co-expressing 17QHTT and HAP40
B1.21 cells17 are based on HEK293 cells and express full-length wild-type 17QHTT upon induction with doxycycline. The expression plasmid pBSK/2-CMV-HAP40-TS was constructed to express the human HAP40 protein (NCBI RefSeq NP_036283.2) with a C-terminal Twin-Strep-tag under the control of the hCMV promoter. B1.21 cells were co-transfected with this plasmid together with a puromycin resistance gene. The resulting stable cell line (B1.21-HAP40TS), which expresses HAP40 at high constitutive levels and HTT upon induction with doxycycline, was used for the purification of the HTT–HAP40 complex. Generated cell lines tested negative for mycoplasma by PCR. Cell lines were authenticated by inducibility of HTT expression with doxycycline and western blot analysis.
Purification of HTT, HAP40 and the HTT–HAP40 complex
The purification of HTT alone has been described17. For purification of the HTT–HAP40 complex, 2 × 10⁸ B1.21-HAP40TS cells were collected 72 h after induction with doxycycline by centrifugation at 400g for 10 min. Cells were lysed with 25 mM HEPES, 300 mM NaCl, 0.5% Tween 20, protease inhibitor, pH 8.0 by rotation at 4 °C for 30 min followed by centrifugation of the cell lysate at 30,000g and clearance by filtration through a 0.2-μm filter. The filtrate was incubated with Strep-Tactin beads (Qiagen) for 2–3 h at 4 °C. After washing three times with 25 mM HEPES, 300 mM NaCl, 0.02% Tween 20, pH 8.0, bound proteins were eluted with 25 mM HEPES, 300 mM NaCl, 0.02% Tween 20, 2.5 mM desthiobiotin, pH 8.0. The eluate was concentrated using Amicon filters.
The HTT–HAP40 complex was further purified by size-exclusion chromatography on a Superose 6 Increase 10/300 column (GE Healthcare) in running buffer (25 mM HEPES, 300 mM NaCl, 0.1% CHAPS and 1 mM DTT, pH 8.0). HTT–HAP40 eluted as a single narrow peak and was concentrated with Amicon Ultra 100-kDa filters (Millipore).
HAP40 was purified from the HTT–HAP40 complex as follows. HTT–HAP40 bound to Strep beads was eluted with 25 mM HEPES, 300 mM NaCl, 0.05% n-dodecyl β-d-maltoside (DDM) and 2.5 mM desthiobiotin, pH 8.0. The eluate was concentrated using Amicon filters. To disrupt the HTT–HAP40 complex, DDM was added to a final concentration of 0.25%. After overnight incubation at 4 °C, the Strep eluate was further purified by size-exclusion chromatography on a Superose 6 Increase 10/300 column in running buffer (25 mM HEPES, 300 mM NaCl, 0.1% CHAPS and 1 mM DTT, pH 8.0) to separate HTT and HAP40. HAP40 eluted as a single narrow peak and was concentrated with Amicon Ultra 30-kDa filters.
Sucrose gradients (5–20%) in 25 mM HEPES, 300 mM NaCl, 0.1% CHAPS, pH 8.0 were generated with an automatic gradient maker (Gradient Master, Biocomp Instruments). A 120-μl volume of Flag-tag-purified HTT or Strep-tag-purified HTT–HAP40 complex was layered on top of the gradient and centrifuged at 39,000 r.p.m. for 16 h in an SW41 rotor in a Beckman ultracentrifuge. Fractions of 0.5 ml were collected from the bottom of the tubes and analysed by SDS–PAGE, Coomassie blue staining and western blotting.
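The run speed is given in r.p.m., which depends on rotor geometry when compared with other protocols. The sketch below converts it to relative centrifugal force with the standard relation RCF = 1.118 × 10⁻⁵ × r (cm) × r.p.m.². The SW 41 Ti radii used here are an assumption taken from the manufacturer's rotor specifications, not from this paper.

```python
def rcf_from_rpm(rpm: float, radius_mm: float) -> float:
    """Relative centrifugal force (multiples of g): RCF = 1.118e-5 * r[cm] * rpm**2."""
    return 1.118e-5 * (radius_mm / 10.0) * rpm ** 2


# Assumed SW 41 Ti radii in mm (manufacturer's specifications, not from this paper);
# verify against the rotor manual before relying on them.
SW41_RADII_MM = {"r_min": 67.4, "r_av": 110.2, "r_max": 153.1}

for label, radius in SW41_RADII_MM.items():
    print(f"{label}: {rcf_from_rpm(39_000, radius):,.0f} x g")
```

Under these assumed radii, 39,000 r.p.m. corresponds to roughly 1.9 × 10⁵ g at the average radius and 2.6 × 10⁵ g at the tube bottom.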
Differential scanning fluorimetry
Protein thermostability was assessed by differential scanning fluorimetry30. Protein unfolding was monitored by the increase in the fluorescence of SYPRO Orange (Invitrogen). Before use, a 100 mM stock of the dye (stored at −20 °C) was diluted 1:20 in DMSO and added directly to the sample to a final concentration of 125 μM. The tested proteins were diluted in sample buffer (25 mM HEPES, 300 mM NaCl, 0.1% CHAPS, 1 mM DTT and 10% glycerol) to concentrations of 1.6 μM (HTT, HTT–HAP40 complex) and 2 μM (HAP40). The samples were heated at a ramp rate of 1 °C min−1 from 15 to 95 °C in an Mx3005P qPCR system (Stratagene). Measurements were performed in duplicate.
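The analysis used to summarize the melt curves in Fig. 1d is not described here. One common way to reduce a DSF curve to a single number is to take the temperature of the steepest fluorescence increase (the maximum of dF/dT) as the apparent melting temperature Tm. The sketch below illustrates that approach on a synthetic Boltzmann-shaped curve; the function name and the synthetic Tm of 52 °C are hypothetical and are not the paper's data.

```python
import math
from typing import List


def tm_from_melt_curve(temps: List[float], fluor: List[float]) -> float:
    """Apparent Tm: temperature of the largest central-difference slope dF/dT."""
    best_t, best_slope = temps[0], -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic transition (hypothetical Tm = 52 °C, width 2 °C), sampled every 1 °C
# over 15-95 °C, matching one reading per minute at a 1 °C/min ramp.
temps = [float(t) for t in range(15, 96)]
fluor = [100 + 900 / (1 + math.exp((52 - t) / 2)) for t in temps]
print(tm_from_melt_curve(temps, fluor))
```

Taking the derivative maximum rather than fitting the whole curve keeps the estimate insensitive to the fluorescence decrease that often follows unfolding, when aggregated protein quenches the dye.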
Transient expression of HAP40 and HAP40 fragments and interaction studies with HTT
Plasmids were generated expressing (under the control of the hCMV promoter) full-length HAP40, an N-terminal HAP40 fragment (HAP40-N, encoding residues 1–222), a C-terminal HAP40 fragment (HAP40-C, encoding residues 249–371) or a HAP40 fragment in which the central proline-rich region had been replaced by a flexible linker (HAP40∆, encoding residues 1–222 linked by a (GGGGS)3 linker to residues 249–371). All HAP40 variants carried a C-terminal Twin-Strep tag.
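The construct boundaries above can be made concrete with a short sequence-splicing sketch. It assumes full-length HAP40 is 371 residues (consistent with residue 371 being the last numbered HAP40 residue in this paper); the helper names are hypothetical, the real HAP40 sequence is not reproduced here, and the C-terminal Twin-Strep tag is omitted.

```python
from typing import Dict

LINKER = "GGGGS" * 3  # the (GGGGS)3 flexible linker


def segment(seq: str, start: int, end: int) -> str:
    """Residues start..end of `seq`, using 1-based inclusive numbering."""
    return seq[start - 1:end]


def hap40_constructs(hap40: str) -> Dict[str, str]:
    """Protein sequences of the HAP40 variants (Twin-Strep tag not included)."""
    if len(hap40) != 371:
        raise ValueError(f"expected 371 residues, got {len(hap40)}")
    n_part = segment(hap40, 1, 222)    # HAP40-N
    c_part = segment(hap40, 249, 371)  # HAP40-C
    return {
        "HAP40": hap40,
        "HAP40-N": n_part,
        "HAP40-C": c_part,
        # central proline-rich region (residues 223-248) replaced by the linker
        "HAP40-delta": n_part + LINKER + c_part,
    }
```

With these boundaries, HAP40∆ is 222 + 15 + 123 = 360 residues before the tag is added.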
B1.21 cells induced with doxycycline to express 17QHTT were transiently transfected with the plasmids using PEI transfection. At 48 h after transfection, the cells were collected by centrifugation and lysed in 25 mM HEPES, 300 mM NaCl, 0.5% Tween 20, 1× protease inhibitor, pH 8.0, followed by centrifugation (20,000g, 1 h). The supernatant was incubated with Magstrep beads (IBA) at 4 °C for 2 h and then washed three times with 25 mM HEPES, 300 mM NaCl, 0.02% Tween 20, pH 8.0. Thereafter, bound proteins were eluted using desthiobiotin in SDS loading buffer, followed by SDS–PAGE and western blot analysis using anti-Flag and anti-Strep antibodies for detection.
Cryo-EM sample preparation and data acquisition
Purified HTT–HAP40 complex was diluted to 0.5 mg ml−1 with 25 mM HEPES, 300 mM NaCl, 0.025% CHAPS, 1 mM DTT. A 4-μl volume of sample was applied to a Quantifoil gold grid coated with monolayer graphene (Graphenea) and vitrified by plunge freezing in a liquid ethane–propane mixture using a Vitrobot Mark IV (FEI) with a blotting time of 5 s. Data collection was performed on a Titan Krios microscope (FEI) operated at 300 kV and equipped with a field emission gun, a Gatan GIF Quantum energy filter and a Gatan K2 Summit direct electron camera. The calibrated magnification was 105,000× in EFTEM mode, corresponding to a pixel size of 1.35 Å. Images were collected at a dose rate of 4 electrons Å−2 s−1. Each exposure (8-s exposure time) comprised 16 sub-frames, amounting to a total dose of 32 electrons Å−2. Data were recorded using SerialEM31 software and custom macros with defocus values ranging from −1.4 to −3 μm.
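The acquisition parameters above are linked by simple arithmetic, which is worth checking when reusing them: total dose is dose rate × exposure time, each sub-frame receives an equal share of it, and the pixel size sets the Nyquist limit (twice the pixel size). The sketch below only restates values from the text; the variable names are illustrative.

```python
# Acquisition parameters quoted in the text
pixel_size_A = 1.35   # Å per pixel at 105,000x (EFTEM mode)
dose_rate = 4.0       # electrons per Å^2 per second
exposure_s = 8.0      # exposure time per movie (s)
n_frames = 16         # sub-frames per movie

total_dose = dose_rate * exposure_s      # electrons per Å^2 for the whole movie
dose_per_frame = total_dose / n_frames   # electrons per Å^2 in each sub-frame
frame_time_s = exposure_s / n_frames     # seconds per sub-frame
nyquist_A = 2 * pixel_size_A             # finest resolution the sampling can record

print(f"total dose: {total_dose:.0f} e/Å^2, per frame: {dose_per_frame:.1f} e/Å^2")
print(f"frame time: {frame_time_s:.2f} s, Nyquist limit: {nyquist_A:.2f} Å")
```

The 4 Å map is therefore well within the 2.7 Å sampling limit of this setup, and each 0.5-s sub-frame carries about 2 electrons Å−2.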
Micrograph movie frame stacks were subjected to beam-induced motion correction using MotionCor2 (ref. 32). Subsequent processing was performed mainly in RELION33. The contrast transfer function parameters of each micrograph were determined with CTFFIND4 (ref. 34), and all micrographs with an estimated resolution limit worse than 4 Å were discarded. Particles were initially picked with Gautomatch (http://www.mrc-lmb.cam.ac.uk/kzhang/Gautomatch/Gautomatch_Brief_Manual.pdf) using a sphere as template and extracted with a 160 × 160-pixel box. Reference-free 2D class averaging was performed iteratively, keeping only particles with well-resolved 2D averages for initial model generation. To validate the ab initio model, 3D classification was performed using initial models generated by RELION, VIPER35 or a simple sphere as reference. Identical 3D maps with detailed features were obtained regardless of the reference used (Extended Data Fig. 8). 2D projections of this model were subsequently used as references to re-pick the particles. The resulting particles were again subjected to iterative reference-free 2D class averaging, and strict selection of classes showing distinct structural features yielded a particle subset for further 3D classification. Classes with identical detailed features were merged for auto-refinement, applying a soft mask with a six-pixel fall-off around the entire molecule, to produce the final density map at an overall resolution of 4 Å (Extended Data Fig. 2c). Resolution was estimated using the gold-standard Fourier shell correlation (FSC) 0.143 criterion36. The handedness of the final map was validated by building side chains within α-helices. All density maps were sharpened by applying a temperature factor (B-factor) estimated during post-processing in RELION. For visualization, the density maps were filtered according to the local resolution, which was determined using the two half-reconstructions as input maps.
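RELION computes the gold-standard FSC between the two independently refined half-maps; the sketch below only illustrates the final step of turning such a curve into a single resolution number at the 0.143 criterion, by linear interpolation between the two Fourier shells that bracket the crossing. The helper names are hypothetical, and the demo curve is synthetic, not the paper's data. Shell frequencies follow from the 160-pixel box and 1.35 Å pixel size given above.

```python
import math
from typing import List, Optional


def shell_frequencies(box_px: int, pixel_A: float) -> List[float]:
    """Spatial frequency (1/Å) of each Fourier shell k = 0 .. box_px // 2."""
    return [k / (box_px * pixel_A) for k in range(box_px // 2 + 1)]


def resolution_at_threshold(freqs: List[float], fsc: List[float],
                            threshold: float = 0.143) -> Optional[float]:
    """Resolution (Å) at which the FSC curve first drops below `threshold`.

    Returns None if the curve never drops below the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            # linear interpolation of the crossing frequency between the two shells
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Demo on a synthetic, smoothly decaying FSC curve sampled on the shells of a
# 160-pixel box at 1.35 Å per pixel (illustrative values only).
freqs = shell_frequencies(160, 1.35)
fsc = [1.0 / (1.0 + math.exp((f - 0.25) / 0.01)) for f in freqs]
print(f"resolution at FSC = 0.143: {resolution_at_threshold(freqs, fsc):.2f} Å")
```

The last shell of this box sits at the 2.7 Å Nyquist frequency; a curve that never falls below 0.143 would mean the resolution is limited by sampling rather than by the data.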
Chimera37 and PyMOL38 were used for graphical visualizations.
Ab initio modelling of the entire HTT–HAP40 complex was performed in Coot39, using secondary structure predictions calculated by PSIPRED19 and the densities of bulky side chains to assign the residue register. Some regions of HTT (1–90, 323–342, 403–660, 960–977, 1,049–1,057, 1,103–1,120, 1,158–1,222, 1,319–1,347, 1,372–1,418, 1,504–1,510, 1,549–1,556, 1,714–1,728, 1,855–1,881, 2,063–2,091, 2,325–2,347, 2,472–2,490, 2,580–2,582, 2,627–2,660, 2,681–2,687, 2,926–2,944 and 3,099–3,138) and HAP40 (1–41, 217–257, 300–313 and 365–371) were not built in the final model because no well-resolved density was present in the map. The model was refined with Phenix.real_space_refine40 against the overall map at a resolution of 4 Å, using secondary structure and Ramachandran restraints. The final model was validated using MolProbity41 (Extended Data Table 1).
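The list of unbuilt regions can be turned into a model-completeness figure with a few lines of arithmetic. The sketch below copies the ranges exactly as listed above; the chain lengths (3,138 residues for 17QHTT and 371 for HAP40) are assumptions taken as the last residue numbered for each protein, and the helper name is hypothetical.

```python
from typing import List, Tuple

# Unbuilt residue ranges (1-based, inclusive), copied from the text.
HTT_UNBUILT = [(1, 90), (323, 342), (403, 660), (960, 977), (1049, 1057),
               (1103, 1120), (1158, 1222), (1319, 1347), (1372, 1418),
               (1504, 1510), (1549, 1556), (1714, 1728), (1855, 1881),
               (2063, 2091), (2325, 2347), (2472, 2490), (2580, 2582),
               (2627, 2660), (2681, 2687), (2926, 2944), (3099, 3138)]
HAP40_UNBUILT = [(1, 41), (217, 257), (300, 313), (365, 371)]


def model_coverage(length: int, unbuilt: List[Tuple[int, int]]) -> Tuple[int, int, float]:
    """Return (built, unbuilt, fraction built) for a chain of `length` residues."""
    missing = sum(end - start + 1 for start, end in unbuilt)
    built = length - missing
    return built, missing, built / length


# Chain lengths are assumptions: the last residue numbered for each protein.
for name, length, gaps in [("17QHTT", 3138, HTT_UNBUILT), ("HAP40", 371, HAP40_UNBUILT)]:
    built, missing, frac = model_coverage(length, gaps)
    print(f"{name}: {built} built, {missing} unbuilt ({frac:.1%} modelled)")
```

Under these assumptions, the model covers 2,353 of 3,138 HTT residues (about 75%) and 268 of 371 HAP40 residues (about 72%).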
All data supporting the findings of this study are available within this paper. Source Data for Fig. 1 and Extended Data Fig. 2 and gel source images (Supplementary Fig. 1) are available with the online version of this paper. The cryo-EM map of the 17QHTT–HAP40 complex has been deposited in the Electron Microscopy Data Bank under accession code EMD-3984. The modelled structure of the 17QHTT–HAP40 complex has been deposited at the Protein Data Bank under accession code 6EZ8.
We thank J. Plitzko for electron microscopy support, F. Beck for help with image processing, E. Conti, H. Kiefer, B. Landwehrmeyer, K. Lindenberg, P. Mittl and L. Toledo-Sherman for discussions and A. Bracher, M. Hipp and E. Sakata for the critical reading of the manuscript. This work has been funded by the CHDI Foundation, the German Federal Ministry of Education and Research (FTLDc 01GI1007A), the German Research Foundation (SFB1279) and the European Commission (grant FP7 GA ERC-2012-SyG_318987–ToPAG). Q.G. is the recipient of postdoctoral fellowships from EMBO (EMBO ALTF 73-2015) and the Alexander von Humboldt Foundation.