Structure of the human SAGA coactivator complex

The SAGA complex is a regulatory hub involved in gene regulation, chromatin modification, DNA damage repair and signaling. While structures of yeast SAGA (ySAGA) have been reported, there are noteworthy functional and compositional differences for this complex in metazoans. Here we present the cryogenic-electron microscopy (cryo-EM) structure of human SAGA (hSAGA) and show how the arrangement of distinct structural elements results in a globally divergent organization from that of yeast, with a different interface tethering the core module to the TRRAP subunit, resulting in a dramatically altered geometry of functional elements and with the integration of a metazoan-specific splicing module. Our hSAGA structure reveals the presence of an inositol hexakisphosphate (InsP6) binding site in TRRAP and an unusual property of its pseudo-(Ψ)PIKK. Finally, we map human disease mutations, thus providing the needed framework for structure-guided drug design of this important therapeutic target for human developmental diseases and cancer.

While all hSAGA HAT and DUB subunits were confirmed in our sample, they were not resolved in our structural analysis, either due to flexible tethering or a more dynamic and labile attachment, consistent with the low resolution and flexibility described for these modules in the yeast complex 4,5. By comparing the positions of the HAT-tethering subunits TAF6L/SUPT7L and the integration of the DUB subunit ATXN7 in the core (Fig. 2a) with their respective counterparts in ySAGA, we anticipate similar general positions for these modules in hSAGA (Extended Data Fig. 4a-f). Moreover, very weak density, visible only in some class averages after gradient crosslinking (Extended Data Fig. 4g and Methods), likely corresponds to the HAT and DUB domains, indicating a flexible or dynamic connection to the complex at the expected positions.
Core module structure and tethering of the SPL module. The structure of hSAGA is organized around the nine-subunit core module (Figs. 1a,c-e and 2a), in which seven subunits containing histone folds (SUPT3H contains two) assemble into a distorted pseudo-octamer (Fig. 2b,c), as also observed in ySAGA 4,5 (Extended Data Fig. 4h) and in human and yeast TFIID 9,11,12. The distortion from the symmetric nucleosomal octamer creates a gap that is filled by the TAF5L WD40 propeller, which centrally binds to helix α2 of the TAF6L histone fold (Fig. 2a,c). The periphery of the core is organized by the C-terminal TAF6L HEAT repeat domain, which connects the SPL module on its concave side (Figs. 1c and 2d) and probably the HAT module on its convex side (Extended Data Fig. 4a-c). Such connections are consistent with yeast two-hybrid assays of Drosophila homologs, which suggested interactions between SF3B3 and SF3B5 (SPL), SGF29 (HAT) and SUPT7L (Core) 13. The SF3B3 subunit contains three WD40 propellers and tethers the SPL module via propellers one and two to the TAF6L HEAT repeats (Fig. 2d and Extended Data Fig. 2e,i). Of note, in ySAGA, the corresponding interface on the Taf6 HEAT repeat is blocked by the Taf5 N-terminal domain (NTD) (Fig. 2e). This domain is rotated −59° relative to the human TAF5L NTD, which in hSAGA is latched in place by the SUPT20H NTD (Fig. 2d).
SUPT20H as a latch and binding of InsP6. SUPT20H forms the largest interface with the rest of the complex (approximately 12,000 Å²) and acts as a clamp-like scaffold within hSAGA (Fig. 3a), supporting its central role in complex assembly and module association 14,15. Our structure shows how SUPT20H tethers the DUB anchor ATXN7 to the core (Fig. 2a and Extended Data Fig. 4d).
In addition to its crucial role in latching away the TAF5L NTD, thus allowing incorporation of the SPL module (described above), SUPT20H also makes extensive contacts between the core and TRRAP module that contribute to create an architecture very different from that of ySAGA. The SUPT20H NTD connects to a long linker, 'the latch' (Fig. 3a), missing in yeast, that wraps along the surface of the core, around the TRRAP FAT domain and terminates in the cleft below the FAT and central TRRAP HEAT repeats with a previously unpredicted C-terminal domain (CTD) (Fig. 3b).
The CTD folds into a five-stranded antiparallel beta-sheet with an alpha-helix parallel to the sheet that connects the two C-terminal outer strands (Figs. 1c-e and 3b). The closest structural homolog is the Spt6 SH2 domain of Candida glabrata 16 (2.20 Å Cα r.m.s.d. over 49 residues). Neither the SUPT20H latch nor its CTD is conserved in ySAGA (Fig. 3c,d and Extended Data Fig. 5a,b). On the other hand, the N terminus of ySAGA Taf12, lacking in the human homolog, emerges from a location similar to that of the SUPT20H CTD and wraps around the opposite side of the Tra1 FAT domain (Fig. 3c,d). Metazoan TAF12s have a much shorter N terminus and contact TRRAP at a different location (Extended Data Fig. 5c,d).
The CTD location of SUPT20H resembles a lid at the entrance of a positively charged tunnel below the FAT domain that is conserved in metazoans (Fig. 3e and Extended Data Fig. 6a-d).
In a side pocket of this tunnel and bound to highly conserved residues of the FAT and ΨPIKK domains, our structure shows clear density for the metabolite inositol hexakisphosphate (InsP6), which copurified with hSAGA (Fig. 3b,e,f and Extended Data Fig. 6e-j).
TRRAP structure and interactions with the core module. The TRRAP subunit, like the yeast Tra1, has a characteristic tripartite HEAT repeat organization, consisting of a central N-terminal repeat and a circular cradle, followed by a FAT domain and a ΨPIKK (Fig. 1c-e) (the Tra1 and TRRAP subunits are shared with the yeast NuA4 complex and its human counterpart, TIP60, respectively) 4-6,17. Compared to ySAGA, hSAGA exhibits a dramatically different TRRAP-core interface that leads to a relative rotation of 75° of TRRAP/Tra1 with respect to the core and SUPT3H/Spt3 (Fig. 4a). While the approximate region of the interface is similar on TRRAP and Tra1, the region on the core contributing to the interface is dramatically different for yeast and human complexes. In hSAGA, all core subunits except for TAF6L and ATXN7 are involved in the TRRAP-core interface (Fig. 4b and Extended Data Fig. 7a-c), as compared to a limited number in yeast (Extended Data Fig. 7c) and presumably leads to the increased flexibility observed between the core and Tra1 in yeast 18,19.

Fig. 2 (caption, partial): Histone-fold dimers are grouped in boxes, colored by hSAGA subunits (above/below). Corresponding histones are indicated on the side, as well as the contact with the TAF5L WD40 propeller and the potential interaction with TBP below/above. c, Schematic of the relative locations of the distorted histone-fold octamer helices in the hSAGA core and in the nucleosome. d, The (concave) surface of the TAF6L HEAT domain tethers the SPL module in hSAGA (translucent EM contoured at 6.1σ). The SUPT20H NTD latches the TAF5L NTD in place (a close-up on the right). e, In ySAGA, the Taf5 NTD is rotated −59° and occupies the corresponding Taf6 HEAT domain surface (PDB 6T9K). The location of the HAT modules in d and e is indicated by translucent blue ovals and location of the depicted regions in the context of the complete complexes are outlined at the top right.
While the    Table 3) or SUPT20H in human that form interfaces with different regions on Tra1 and TRRAP, respectively (Fig. 3c and Extended Data Figs. 5a and 7c). In hSAGA, the SUPT20H extension doubles the total interface (to 7,073 Å²), which is ultimately 64% larger than that of ySAGA (Supplementary Table 3).

Discussion
Local variations in the core enable a divergent architecture. Our study revealed that local variations, such as the repositioning of the TAF5L NTD and different interactions of SUPT20H and TAF12 on the TRRAP surface, result in very different interfaces between the structurally conserved cores of ySAGA and hSAGA with the Tra1 and TRRAP subunits, respectively. Consequently, this nonconserved geometry positions functional elements in the core and the activator-binding subunit in totally different relative orientations. While the hSAGA TRRAP-core interface is not entirely rigid (Extended Data Fig. 8a), a potential transition between the observed yeast and human conformations is extremely unlikely. The yeast Taf12 N terminus and Spt20 C terminus form local interactions on the surface of Tra1 beyond the cleft and are likely to move with it as one body. Rearrangement from the yeast to the human conformation would require unfolding of these elements on Tra1 or of parts of the Taf12 histone fold. Similarly, a transition from the human to the yeast conformation would require unfolding of SUPT20H NTD elements that are involved in TAF5L NTD binding. Such a transition far exceeds the conformational space that these modules appear to be capable of exploring. Within the NuA4 complex, Tra1 has been shown to connect to the rest of the complex using a similar, albeit larger, interface region as in ySAGA 20, suggesting that the newly defined TRRAP interface in hSAGA might also be relevant for TRRAP incorporation into the related metazoan TIP60 complex.
Human SAGA and TBP. Analysis of our cryo-EM data (Methods) revealed heterogeneity that suggests alternative main chain conformations in the cleft between TRRAP and SUPT3H/SUPT7L/TADA1, which includes the region where TBP is bound by Spt3 (SUPT3H homolog) and the yeast-specific Spt8 (ref. 5 ) ( Fig. 1e and Extended Data Figs. 3i,j and 8). We could not observe density for TBP, even when it was added in excess to the purified hSAGA (Methods), in contrast with the observations for the yeast complex 5 , highlighting another distinct difference between these complexes. The lack of a stable TBP-hSAGA complex may either indicate that hSAGA does not bind TBP at all, or, together with the observed electron microscopy (EM) heterogeneity, might indicate a highly dynamic or regulated mode of TBP binding, unlike that for the TFIID or ySAGA complex, that may require stabilization by additional factors. Metazoans lack a homolog of the yeast subunit Spt8, which is sufficient for TBP binding on its own, whereas Spt3 is not 21 . On the other hand, the transcription factor c-Myc has been shown to interact with TBP 22 as well as TRRAP 23,24 via nonoverlapping regions, suggesting the intriguing possibility that activators could play a role in TBP recruitment to metazoan SAGA. DNA binding by ySAGA-bound TBP was shown to be sterically hindered by Tra1 (ref. 5 ). However, due to the distinct tethering of TRRAP in hSAGA, any interaction of TBP with hSAGA could have different consequences on TBP-DNA binding.

Metazoan incorporation of a SPL module. Comparison of our structure with that of ySAGA reveals a crucial rearrangement of the TAF5L NTD within the core. The lack of a stabilizing TAF6L HEAT-TAF5L NTD interaction probably contributes to increased flexibility of the TAF6L HEAT repeat domain, a critical platform for HAT and SPL module integration. Furthermore, the local repositioning of the TAF5L NTD exposes the TAF6L interface to allow for SPL module incorporation in hSAGA. The position of the TAF5 NTD is also dramatically different from that observed in Lobe A and Lobe B of TFIID, making this domain a crucial marker for the divergent architectures of TAF-containing complexes 9,11 (Extended Data Fig. 4i). While our EM structure revealed the site of incorporation of the SPL module, very little is yet known about its function or how its components partition between SAGA and the U2 small nuclear ribonucleoprotein. It has been proposed that SF3B3 incorporation into hSAGA may play a role in ultraviolet (UV)-damaged DNA binding and repair 25, but contradictory results argued that the structurally related subunit DDB1, which we did not observe in our sample, is the one that recognizes UV-damaged DNA in the context of hSAGA (Supplementary Table 2) 26. The SPL module subunits, SF3B3 and SF3B5, are shared with the metazoan spliceosomal SF3b core complex within the U2 small nuclear ribonucleoprotein. Our structure shows that they are tethered to the rest of the hSAGA complex in a similar way as they are in the spliceosomal SF3b complex 27.

Fig. 5 (caption, partial): c,d, Residue mutations associated with cancer, autism, or intellectual disability 34,37. c, Most disease mutations lie in a region of high sequence conservation. Surface coloring as in b. d, Surface representation of disease mutations as shown in a. Red, surface exposed, probably interfere with activator binding; orange, buried, likely to structurally destabilize TRRAP (Extended Data Fig. 10a,b) or the interaction with SUPT20H (e,f). e, A disease-causing mutation of F859 is located at the interface with the SUPT20H CTD. f, R3746 forms a salt bridge with D291 of the SUPT20H latch. The disease mutation R3746Q disrupts the salt bridge. The reported residue numbers relate to the modeled isoform (Uniprot F2Z2U4).
In hSAGA, SF3B3 binds to the HEAT repeat domain of TAF6L, while SF3B3 binds to the HEAT solenoid of SF3B1 in the SF3b complex 10 , and they do so using an overlapping interface (Extended Data Fig. 2i,j). Therefore, SF3B3/SF3B5 cannot be simultaneously incorporated into hSAGA and the SF3b SPL complex.
Pseudo-kinase active site in TRRAP. TRRAP lacks kinase activity, although homologs of TRRAP are present in active kinases, such as mTOR, DNA-PKcs and ATM 28 (Extended Data Fig. 9a-e). While the SAGA ΨPIKK lacks the canonical active site residues for catalysis 23,28 (Extended Data Fig. 9f), we found that the first residue of the TRRAP activation loop (Y3698), corresponding to the aspartate in the active PIKK's DFG motif 23 , adopts an unusual and well defined cis-peptide bond (Extended Data Fig. 9g). Such geometry outliers often serve a function in active sites 29 , and its position in our structure, together with the high evolutionary conservation of the ΨPIKK and of this specific residue in metazoans (Extended Data Fig. 9f), raises the question of whether the inactive kinase might have a different and so far undiscovered function, as observed for other pseudokinases 30 .
Binding of InsP6 and its possible role in TRRAP stability. The resolution of our structure allowed us to visualize InsP6 in the positively charged pocket below the TRRAP FAT domain. The position of InsP6 in hSAGA is equivalent to that observed for mTORC2 (ref. 31) (Fig. 3b,e,f and Extended Data Fig. 6g,h) or the SMG1 kinase 32, and thus it could serve a similar stabilizing role as proposed for those kinases 31,32. In the ySAGA structures 4,5, the region surrounding this pocket, including residues corresponding to R3051 and K3055 in hSAGA (Fig. 3f and Extended Data Fig. 6e), is poorly resolved and lacks InsP6 density (Extended Data Fig. 6i). On the other hand, an earlier structure of the yeast Tra1 subunit 17 is better defined in this region and contains an unattributed density where InsP6 is seen bound in hSAGA (Extended Data Fig. 6j), potentially linking the stability of the TRRAP FAT domain to the presence of InsP6.
Human disease mutations. The best characterized function of SAGA's TRRAP module is serving as an interaction hub for transcriptional activators, which leads to its critical role in many diseases and its consideration as a prognostic marker and therapeutic target in many cancers 23,24,28,33-39. Structurally, TRRAP displays high flexibility around the N-terminal cradle region where the c-Myc and p53 binding sites are located 24,36 (Fig. 5a), and thus it is possible that c-Myc/p53 binding could stabilize or mediate conformational changes in this region. A cluster of disease-causing mutations lies along a highly conserved FAT-proximal HEAT repeat region where the N-terminal HEAT repeat arm and circular cradle meet (Fig. 5b,c), a site that has been shown to be crucial for liver X receptor alpha (LXRα) interaction 28,33,34,37. A number of mutations, including the prevalent melanoma mutation S722F (TRRAP isoform here, S721F), are part of a highly conserved surface patch and are probably involved in effector binding (Fig. 5c,d).
Other mutations appear buried and are likely to affect folding of the HEAT repeats and interfere with the structural integrity of TRRAP (Extended Data Fig. 10a,b). Two independent mutations identified in patients with intellectual disability and neurodevelopmental disorders 37 are at sites of interaction with the metazoan-specific extension seen in SUPT20H. The first (F859L) localizes directly at the interface with the SUPT20H CTD (Fig. 5e) and the second (R3746Q) eliminates a salt bridge with the highly conserved D291 of the SUPT20H latch ( Fig. 5f and Extended Data Fig. 5b). Because TRRAP is a scaffold for other important cellular complexes, disease-causing mutations may also disrupt assembly or lead to perturbations within TIP60 (ref. 28 ).

Concluding remarks
Our hSAGA structure reveals conserved structural elements as well as notable divergences from the yeast complex, including a distinct architecture and TRRAP-core interface, a lack of stable interaction with TBP and the visualization of the incorporation of the metazoan SPL module. The human SAGA complex combines transcription factor-interacting and enzymatic modules that need to regulate an intricate and unique transcriptional and chromatin landscape within human cells, in which enhancers and promoters are separated by kilo- to megabase distances. Furthermore, human promoter architectures, as well as intron and splice site properties, are very distinct from those in yeast 40,41. These newly revealed structural features of hSAGA probably reflect unique mechanisms for this complex in human cells that go beyond transcription and chromatin regulation and can provide a launching point for further studies of SAGA's roles in human disease.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41594-021-00682-7.
Here, 400 mesh Cu grids were cleaned three times (in ethanol, water, ethanol) by sonication for 5 min and dried on filter paper. A petri dish was filled with water forming a meniscus and wiped off with lens paper. One drop of 1% (w/v) nitrocellulose in amyl acetate was added to the surface, forming a thin film. Cleaned grids were deposited on the film with the polished side facing down. The grids were transferred with parafilm onto filter paper with the nitrocellulose facing up and dried overnight. Grids were coated with carbon by evaporation using an Edwards Auto306 (10⁻⁶ mbar, 6 A, 6 s). Before sample adsorption, grids were glow discharged (30 s, 15 W, Tergo EM PIE scientific). Human SAGA was diluted (2×) in dilution buffer (25 mM HEPES pH 7.5, 0.2 mM EDTA, 6 mM MgCl2, 0.2 M NaCl, 3% (w/v) D(+) Trehalose), and 3 µl was applied to a grid and adsorbed for 1 min. The grid was washed and stained, respectively, by swirling it five times on a 50 µl drop of 2% (w/v) uranyl formate for 10 s (each), blotted and dried in an air stream. Data were collected on a Tecnai F20 (Thermo Fisher Scientific), using Leginon 44 (Fig. 2 and Table 1). Micrographs were contrast transfer function (CTF) corrected using CTFfind v.4.1.13 (ref. 45). Particles were picked using the Gaussian LoG picker in Relion-3.1 (ref. 46), extracted with a box size of 300 × 300 pixels and subjected to reference-free two-dimensional (2D) classification. Particles from the best classes (32%) were used for initial model generation using the stochastic gradient descent method 47 in Relion-3.1 (ref. 46). Particles were classified by a series of three-dimensional (3D) and 2D classifications with and without alignment (Extended Data Fig. 2a). Classification revealed one class without the (ordered) TAF6L HEAT domain and SPL module. Particles with and without this region were separated by multi-reference 3D classification. The best reconstruction was refined and classified again by alignment-free 3D classification. Combined classes that yielded the highest resolution were refined and then postprocessed in Relion v.3.1 (ref. 46) (Extended Data Fig. 2b).
All grid preparation steps were done on ice. Here, 3 µl of undiluted hSAGA was transferred into a 0.5 ml non-stick tube and crosslinked by mixing with 0.6 µl of crosslinking buffer (25 mM HEPES pH 7.8, 0.2 M NaCl, 0.2 mM EDTA, 0.5 mM TCEP, 0.01% (v/v) NP40, 10% (v/v) glycerol, 6 mM bis(sulfosuccinimidyl) suberate) and incubated for 5 min. A graphene-oxide grid was picked up with Vitrobot tweezers, the sample was transferred to the grid and incubated for 2 min in a saturated humidity chamber. Afterward, the grid was washed five times by submerging and swirling for 5 s (each) in 230 µl of wash buffer (25 mM HEPES pH 7.8, 0.2 M NaCl, 0.2 mM EDTA, 0.5 mM TCEP, 0.01% (v/v) NP40, 2.5% (v/v) glycerol) in a five-well Teflon block. Without letting the grid dry, excess solution was blotted off at a 90° angle and 4 µl of wash buffer were added immediately. The grid and tweezers were mounted into a Vitrobot Mark IV (Thermo Fisher Scientific), blotted with fresh filter paper (blot force 0, 3 s) and plunge frozen into liquid ethane.
Data were collected with SerialEM 49 and 3 × 3 multishot acquisition on a Titan Krios G2 (Thermo Fisher Scientific) (Table 1). Videos were whole-frame motion corrected and binned (2×) using Relion-3.1 (ref. 46), CTF corrected using CTFfind v.4.1.13 (ref. 45) and sorted manually. Particles were picked using the Gaussian LoG picker in Relion-3.1 (ref. 46) and extracted with 8× binning (Extended Data Fig. 3a) and a box size of 45 × 45 pixels. Graphene-oxide edges were removed by 2D classification before hSAGA particles could be classified. After removing most graphene-oxide edges, particles were reextracted with recentering (4× binned, 90 × 90 box size) and reclassified in 2D. The negative stain reconstruction was low-pass filtered to 50 Å and used as reference model for initial 3D classification. Each class was subclassified by alignment-free 2D classification to remove particles close to or on graphene-oxide edges. The remaining particles were subjected to 3D classification, recentered in the box by applying a coordinate transformation to the particle alignment parameters using a custom python script, reextracted with recentering, without binning and placed in a box size of 360 × 360 pixels, then subjected to a consensus 3D refinement. Afterward, the particles were subjected to two rounds of Bayesian polishing, 3D refinement, CTF refinement and alignment-free 3D classification (tau = 20) (Extended Data Fig. 3a). A final round of 3D classification, refinement and postprocessing yielded a reconstruction at 2.93 Å (Extended Data Fig. 3b-d). High variability and low local resolution were observed at the TRRAP N terminus and the HEAT repeat cradle in close proximity as well as around the surface of the core module. Low-pass filtering and B factor blurring slightly improved the interpretability of the map in these regions. Further improvement was made by multibody refinement (Extended Data Figs. 3e and 8) of the core and TRRAP modules, although the resolution did not improve. Various density modification and map enhancement methods were tested, and the greatest improvements in variable and surface exposed regions were obtained by applying the spiral phase transform in LocSpiral 50 to all reconstructions. This process revealed additional peptide connections and density fragments of the disordered TAF6L HEAT domain (Extended Data Fig. 3e,i,j). A principal component analysis of the multibody refinement showed a high degree of flexibility between the core and TRRAP modules (Extended Data Fig. 8a), which can also be observed by 3D variability analysis in Cryosparc v.2.15.0 (refs. 47,51). Signal subtraction after recentering and reextraction was attempted to detect density for the HAT and DUB modules but was not successful, presumably due to a high degree of conformational as well as potential compositional heterogeneity (metazoan SAGA has also been observed to occur without these modules 52,53). Nevertheless, early samples that had been stabilized by GraFix 54 revealed some 2D negative stain classes with highly variable density that is consistent with the suggested locations based on comparisons with ySAGA and the position of the N-terminal end of ATXN7 (Extended Data Fig. 4g). Masking and map transformations were carried out using UCSF Chimera 55 and Relion-3.1 (ref. 46). All resolution estimates are based on the 0.143 threshold criterion of the gold-standard Fourier shell correlation (FSC) 56 of two independently refined half sets in Relion-3.1 (ref. 46), after accounting for correlations introduced by masking 57. Local resolution was estimated using Relion-3.1 (ref. 46).
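As an illustration of how a resolution value is read off an FSC curve at the 0.143 threshold, the following minimal Python sketch finds the first shell pair that brackets the threshold and interpolates linearly between them. It is not the Relion implementation and does not include the mask correction described above; the function name `resolution_at_threshold` and the toy curve are assumptions made for this example only.

```python
from typing import Optional, Sequence


def resolution_at_threshold(
    freqs: Sequence[float],
    fsc: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Return the resolution (in Å) at which the FSC first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, strictly increasing.
    fsc:   FSC value of each shell, same length as freqs.
    Returns None if the curve never crosses the threshold.
    """
    if len(freqs) != len(fsc):
        raise ValueError("freqs and fsc must have the same length")
    for i in range(1, len(fsc)):
        above, below = fsc[i - 1], fsc[i]
        if above >= threshold > below:
            # Linear interpolation between the two shells bracketing the crossing.
            t = (above - threshold) / (above - below)
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Hypothetical toy curve for illustration; these are not data from this study.
toy_freqs = [0.1, 0.2, 0.3, 0.4]
toy_fsc = [1.0, 0.8, 0.243, 0.043]
print(resolution_at_threshold(toy_freqs, toy_fsc))
```

In this toy curve the crossing falls halfway between the 0.3 and 0.4 Å⁻¹ shells, so the function reports the reciprocal of 0.35 Å⁻¹.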
Cryo-EM sample preparation of hSAGA with TBP, data collection and processing. Human SAGA was mixed with a sixfold molar excess of human full length TBP and incubated for 5.2 h on ice. Grids were frozen and data were collected and processed in the same way as described above, but no additional density corresponding to TBP could be observed.
Modeling and refinement. For model building in Coot 58, maps were converted to structure factors using phenix.map_to_structure_factors 59, allowing low-pass filtering and variable B-sharpening or blurring in Coot 58. Models were built into the postprocessed, multibody-refined and LocSpiral 50 filtered maps (Extended Data Fig. 3e-g). A fragmented initial model of secondary structure elements in TRRAP was generated using phenix.map_to_model 59, manually corrected and completed in Coot 58. A homology model of the TAF5L WD40 propeller was generated using SwissModel 60 (based on Protein Data Bank (PDB) accession code 6F3T 61) and rigid-body fitted in Coot 58. The remaining model was built de novo in Coot 58, guided by homology models based on human TFIID 9 and ySAGA 4,5. Regions with low confidence in register assignment were modeled as poly-alanines (assigned as unknown, UNKs). Before real-space refinement in Phenix, atomic B factors were reset to 90 Å², the model was protonated using phenix.ready_set 59 and sanity checked as well as geometry minimized using gelly 62 (GlobalPhasing). Afterward, the model was refined using Rosetta 63, validated using phenix.molprobity 59 and optimized in Coot 58. Secondary structure restraints were generated using phenix.secondary_structure_restraints and corrected after manual inspection. A final refinement was carried out with phenix.real_space_refine 59 (1.18-3861) against the complete LocSpiral 50 filtered map using default parameters plus secondary structure restraints, rotamers.fit=outliers_and_poormap and rotamers.tuneup=outliers_and_poormap (Extended Data Fig. 3g). Model statistics were calculated using phenix.molprobity 59 (Table 1). Refinement against the regular postprocessed map resulted in almost identical statistics with an all-atom r.m.s.d. of 0.400 Å. All maps used for model building and refinement were deposited in the Electron Microscopy Data Bank (EMDB).
Map versus model FSC was calculated using phenix.mtriage 59 (Extended Data Fig. 3f). The InsP6 ligand was identified by density fit and homology to mTORC2 (ref. 31) and one out of two possible conformations was modeled (Fig. 3f and Extended Data Fig. 6e). In analysis of our cryo-EM structure, separating the core and TRRAP modules improved map quality and revealed additional features on surface exposed regions after LocSpiral filtering 50 (Extended Data Fig. 3i,j). In particular, analysis of the region where SUPT3H, SUPT7L, TADA1 and TRRAP meet suggested alternative main chain conformations that could not be sorted out by classification. The highly variable region between SUPT3H, TADA1 and TRRAP corresponds to the approximate position where TBP binds to ySAGA (Extended Data Fig. 8b,c). The presence of multiple subunit isoforms, identified by mass spectrometry (Supplementary Table 2), did not affect modeling. Differences of isoforms are primarily located in disordered regions and were addressed according to PDB standards with remarks.
A model for the negative stain reconstruction was generated by rigid-body fitting without coordinate refinement in phenix.real_space_refine 59 using the protein part of the cryo-EM model, a homology model of the TAF6L HEAT domain (generated with SwissModel 60 and based on human TAF6, ref. 9 , PDB 6MZL), and SF3B3/SF3B5 from the SF3b core complex 10 (PDB 5IFE) ( Table 1 and Extended Data Fig. 2d-i). Before fitting, expression tags in SF3B3 were deleted and the TAF6L HEAT domain was mutated to poly-alanines (annotated as UNKs) to reflect the absence of an authentic high-or medium-resolution structure for this region.
Structural analysis and visualization. Coordinate transformations and manipulations were carried out using CCP4 tools 64. Structures were compared using PDBeFold 65 and interfaces were analyzed using QtPISA v.2.1.0 (ref. 64). Relative angles between variable regions/domains (for example, TAF5(L) NTDs) of related structures with a common reference domain (for example, TAF5L WD40) were calculated by prealigning all structures to the reference domain of hSAGA using secondary structure matching. The centers of mass of the hSAGA reference domain (for example, TAF5L WD40), the hSAGA variable domain (for example, TAF5L NTD or TRRAP ΨPIKK) and the hSAGA variable domain after superposition on the corresponding domain in related structures using secondary structure matching (for example, ySAGA Taf5 NTD or Tra1 ΨPIKK) were calculated. Centers of mass were calculated in PyMOL (The PyMOL Molecular Graphics System, v.2.4.0 Schrödinger, LLC.) and angles between corresponding vectors were calculated using python. Related structures were identified using PDBeFold (70% query/70% target) 65. Structure figures were generated using PyMOL, ChimeraX (UCSF, 2020-01-10) and Adobe Illustrator. Electrostatic surfaces were generated using the APBS 66 plugin in PyMOL. Videos were generated using ChimeraX 67 (UCSF, 2020-01-10), Adobe Premiere and ffmpeg (https://ffmpeg.org). Plots were generated using python. Reported contour levels for maps are defined as σ = density threshold/r.m.s.
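The angle calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used in this study: the helper names `center_of_mass` and `angle_deg` are hypothetical, coordinates are assumed to be already superposed on the common reference domain, and the function returns an unsigned angle, so assigning a sign to a rotation (such as the −59° reported for the Taf5 NTD) would require an additional convention that is not shown here.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Vec], masses: Sequence[float]) -> Vec:
    """Mass-weighted mean position of a set of atoms."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and equal in length")
    total = sum(masses)
    return tuple(
        sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3)
    )


def angle_deg(reference: Vec, variable_a: Vec, variable_b: Vec) -> float:
    """Unsigned angle (degrees) between reference->variable_a and reference->variable_b."""
    u = [variable_a[k] - reference[k] for k in range(3)]
    v = [variable_b[k] - reference[k] for k in range(3)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a variable-domain center coincides with the reference center")
    cos_angle = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Hypothetical centers of mass (Å), for illustration only.
ref_center = (0.0, 0.0, 0.0)    # e.g. reference domain
human_center = (1.0, 0.0, 0.0)  # e.g. variable domain in one structure
yeast_center = (0.0, 1.0, 0.0)  # e.g. same domain after superposition on a homolog
print(angle_deg(ref_center, human_center, yeast_center))
```

In practice the three centers would come from the structure-analysis software (PyMOL in this study), and only the vector arithmetic would be done in a script like this one.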

Sequence analysis. In total, 23 metazoan homologs of hSAGA with a complete set of all 20 subunits were retrieved from databases. Sequence alignments were generated using the Clustal Omega 68 executable in Geneious Prime v.2021.0.3. Sequence conservation figures were generated by aligning all subunit sequences of all 23 metazoan SAGAs with the sequences of the molecular model of hSAGA. Alignments were combined and conservation scores were calculated using AL2CO 69 and used for coloring in PyMOL.
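To make the idea of a per-position conservation score concrete, the sketch below computes one minus the normalized Shannon entropy for each alignment column. This is a simple stand-in chosen for illustration; it does not reproduce AL2CO's scoring, weighting or normalization options, and the function name `column_conservation` and the toy alignment are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str]) -> List[float]:
    """Per-column conservation in [0, 1], computed as 1 - normalized Shannon entropy.

    Gap characters ('-') are ignored; a column made only of gaps scores 0.0.
    Entropy is normalized by log2(20), assuming the 20 standard amino acids.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)
    scores: List[float] = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        n = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical three-sequence toy alignment, not taken from the SAGA alignments.
toy_alignment = ["AC", "AD", "A-"]
print(column_conservation(toy_alignment))
```

Scores of this kind can then be written per residue and used for surface coloring, which is the role the AL2CO scores play in this study.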
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Cryo-EM maps and refined coordinates were deposited in the EMDB with accession codes EMD-23027 and EMD-23028 and in the PDB with accession codes 7KTR and 7KTS. The cell line can be provided on request. Supplementary Information is linked to the online version of the paper at www.nature.com/nature. Correspondence should be addressed to enogales@lbl.gov. Source data are provided with this paper.

Code availability
Custom computer code is available on Github (coord_transform_to_star, https:// github.com/dominikaherbst/cryo-em_scripts). Fig. 1 | Conservation and Purification of hSAGA. a, Average sequence conservation of SAGA subunits in metazoan and yeast (relates to Supplementary Table 1). Indices indicate classification into mammals (m), vertebrate (v, green), invertebrate (i, yellow), and yeast (y, red). b. 4-20% gradient SDS-PAGE gel stained with Flamingo fluorescent protein stain (Biorad) of the hSAGA FLAG elution (E) with subunits labeled based on their predicted molecular weight. c. Western blot probing for DUB, HAT, and core subunits to verify the presence of these modules in the sample used for grid preparation. TBP did not co-purify with hSAGA. The lysate lane (L) corresponds to 0.0004% of the total input and the right lane (hSAGA Elution, E) corresponds to 4.65% of the final elution. Blots were cropped. Experiments in b and c were repeated twice with similar results. Fig. 2 | Negative stain processing scheme and model fit. A representative section of one of 745 micrographs is shown. After initial 2D classification, particles (ptcls) from the best classes were used for initial model generation. The data was cleaned up by 3D classification followed by alignment-free 2D classification. Particles from all good classes were subjected to a consensus 3D refinement followed by alignment-free 3D classification. All except for one class revealed fuzzy density for the TAF6L HEAT and SPL region. Subsequently, all classes were cleaned up individually by alignmentfree 2D classification and combined in a multi-reference classification using the two best models with and without TAF6L HEAT and SPL region. The best class including this region was subjected to 3D refinement, alignment-free 3D classification, and to a final refinement using particles of the class combination that yielded highest resolution. b. Final map and FSC plot. c. Angular distribution. d. 
Final map (contoured at 4.9 σ) and rigid body fit of SF3B3/SF3B5 (from PDB: 5IFE), a homology model of the TAF6L HEAT domain, and the cryo-EM structure from this study. e-h. Close-up view on the SPL module region of the hybrid map shown in Fig. 1c. The rigid-body fit is shown with translucent map surfaces. All domains fit precisely in the negative stain density, which shows clear central holes for the three WD40 propellers (f-h) of SF3B3. i, j. Comparison of the SF3B3/SF3B5 integration in hSAGA and the SF3b complex. i. The SPL module (SF3B3 and SF3B5 subunits) binds to the concave surface of the TAF6L HEAT domain. The negative stain map of hSAGA is shown in translucent white (contoured at 4.8 σ). j. Crystal structure of the SF3b complex 10 (PDB: 5IFE). The TAF6L HEAT domain of hSAGA is replaced by the SF3B1 HEAT repeat domain in the SF3b complex. Both domains share an overlapping binding region on the SF3B5 surface.

Fig. 3 | Cryo-EM processing and model building. A representative section of one of 10,224 micrographs is shown. Graphene oxide (GO) edges were removed in cycles of initial 2D and one 3D classification. The negative stain reconstruction (see Extended Data Fig. 2) was used as initial model. 3D classes were centered in the box by applying a coordinate transformation to the alignment parameters, and unbinned particles were re-extracted with recentering. Particles were filtered for high-resolution features in cycles of 3D refinement, classification, Bayesian polishing, and CTF refinement as indicated. b. Postprocessed map (B-sharpened with −51.1 Å², contoured at 4.9 σ) with local resolution. c. Angular distribution. d. Fourier Shell Correlation (FSC). e. Multibody refinement improved map quality, but not the overall resolution. Considerable improvement of map quality was achieved by filtering with LocSpiral 50 . The model for the core and TRRAP was built into the LocSpiral filtered maps of the multibody refinement.
The interface between these regions was built using the full map and used for model refinement. Refinement against the postprocessed map (b) resulted in the same model, with virtually identical statistics and an all-atom r.m.s.d. of 0.400 Å. Maps are contoured at (regular/LocSpiral): Core 11.2 σ/9.2 σ, TRRAP 7.9 σ/9.0 σ, full 6.9 σ. f. Map vs. model FSC using the postprocessed map shown in b. g. The refined map shows well defined secondary structure elements and side chains (contoured at 9.0 σ). h. Model-sequence coverage. Sequences of all subunits are indicated as horizontal lines (black) and modeled regions as overlaying boxes (orange: visualized by cryo-EM; blue: visualized only by negative stain; translucent: regions with unclear register assignment (unknown, UNKs)). i, j. The LocSpiral filtered multibody map of the core reveals additional density corresponding to the poorly ordered TAF6L HEAT domain, and to SUPT3H in the cleft between the core and TRRAP module (both contoured at 5.9 σ).

Fig. 4 | The core module and tethering of its peripheral modules in human and yeast. a-c. Tethering of the HAT module: a. The N-terminus of SUPT7L runs parallel to the TAF6L linker, which connects to the HEAT domain, along the surface of the core and ends with its N-terminus in close proximity to the HEAT domain. b. In Saccharomyces cerevisiae 4 (PDB: 6T9I), the Spt7 linker further extends towards the convex surface of the TAF6L HEAT domain and interacts with an unassigned region. c. The same region in Komagataella phaffii 5 was assigned as the Ada3 subunit of the HAT module (PDB: 6TBM). The similar location of the SUPT7L N-terminus suggests a similar interaction and connectivity for the HAT module in hSAGA. d-f. Tethering the DUB module. d. The ATXN7 subunit of the core and the DUB module (Sgf73 in yeast) is similarly integrated into the core module as in ySAGA (e, f), suggesting a similar relative attachment of the human DUB. g.
GraFix 54 crosslinked negative stain class averages revealed extra density at the anticipated locations for the HAT (cyan arrow) and DUB (purple arrow) modules. h. The core of hSAGA and ySAGA (PDB: 6T9K) builds on common HF elements. Only HF containing subunits are shown. Architectural differences are created by local variations outside of the HFs. HF dimerization is indicated by arrows below the subunit labels. i. Comparison of TAF5 and TAF6 architecture within lobe A of human TFIID (canonical state 9 , PDB: 6MZL). TFIID contains two copies of TAF5 and TAF6, with one TAF5 located in lobe A (TAF5 A ) and the other one in lobe B (TAF5 B ), and with the two TAF6 HEAT domains (TAF6 A , TAF6 B ) in lobe C (shown on the right). Compared to hSAGA, the TFIID TAF6 HEAT domains are differently arranged relative to TAF5, and they act to bridge lobes B and C. The TFIID TAF5 NTD is rotated by 90°, leading to a divergent architecture.

Fig. 5 | Human versus yeast interactions between TRRAP/Tra1, SUPT20H/Spt20 and TAF12/Taf12. a. Location of the SUPT20H/Spt20 C-terminal region after superposition of human TRRAP and yeast Tra1. The C-terminal helix of yeast Spt20 aligns with helix one of the SUPT20H linker. b. The sequence alignment of the SUPT20H/Spt20 C-terminal regions for 24 metazoan (SUPT20H) and two yeast (Spt20) species shows that the SUPT20H CTD is highly conserved in vertebrates, while it does not appear to exist in yeast. Secondary structure elements are indicated above the alignment. *: D291 forms a salt bridge with TRRAP R3746 (Fig. 5f). Vertebrate and invertebrate sequences were pre-aligned to human SUPT20H, regions corresponding to the structured part in a were extracted and realigned with the yeast sequences corresponding to the region from helix 1 in a to the C-terminus. c. Relative location of the TAF12 N-terminal region, based on the superposition shown in a.
In yeast, an N-terminal linker of Taf12 wraps around the inside of the Tra1 FAT domain, while human TAF12 contacts TRRAP in a different location. The structured N-terminus of yeast Taf12 is located in the same relative position as the human SUPT20H CTD. d. Zoomed-out sequence alignment of 24 metazoan and two yeast TAF12/Taf12 subunits. Structured regions are colored as in c and the region corresponding to the linker in yeast is indicated. Aligned regions in b and d are colored by similarity in gray scale (annotated in d). In yeast, Taf12 contains a considerably longer N-terminus that appears to be unique to yeast. Sequences are labeled as: Scientific organism name (UniProt or NCBI accession code). The organism selection corresponds to Extended Data Fig. 1a and Supplementary Table 1.
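
Illustrative note on recentering. The cryo-EM processing legend (Fig. 3) states that 3D classes were centered in the box by applying a coordinate transformation to the alignment parameters before re-extraction. The following is a minimal sketch of one way such a transformation could work. It is not the deposited coord_transform_to_star script, whose contents are not described here. The function names, the ZYZ Euler angle convention, and the sign of the shift are all assumptions and should be checked against the software actually used.

```python
import math


def euler_zyz_matrix(rot_deg, tilt_deg, psi_deg):
    """Rotation matrix for ZYZ Euler angles (rot, tilt, psi), in degrees.

    Assumption: a ZYZ convention of the kind commonly used in cryo-EM
    packages. Other packages may use a different order or sign.
    """
    a, b, g = (math.radians(v) for v in (rot_deg, tilt_deg, psi_deg))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [cg * cb * ca - sg * sa, cg * cb * sa + sg * ca, -cg * sb],
        [-sg * cb * ca - cg * sa, -sg * cb * sa + cg * ca, sg * sb],
        [sb * ca, sb * sa, cb],
    ]


def project_offset(offset_xyz, rot_deg, tilt_deg, psi_deg):
    """Rotate a 3D offset into the particle frame and keep its x, y part."""
    m = euler_zyz_matrix(rot_deg, tilt_deg, psi_deg)
    px = sum(m[0][i] * offset_xyz[i] for i in range(3))
    py = sum(m[1][i] * offset_xyz[i] for i in range(3))
    return px, py


def recenter_origin(origin_xy, offset_xyz, rot_deg, tilt_deg, psi_deg):
    """Return the 2D origin shift after moving the 3D center by offset_xyz.

    Assumption: the projected offset is subtracted from the stored origin.
    The opposite sign applies if shifts are defined the other way round.
    """
    px, py = project_offset(offset_xyz, rot_deg, tilt_deg, psi_deg)
    return origin_xy[0] - px, origin_xy[1] - py


if __name__ == "__main__":
    # Hypothetical values: density sits 2, -3, 7 pixels away from the box center.
    offset = (2.0, -3.0, 7.0)
    print(recenter_origin((1.5, -0.5), offset, 30.0, 60.0, 45.0))
```

Each particle receives its own 2D correction because the same 3D offset projects differently for every orientation. This is why the shift is applied to the per-particle alignment parameters instead of as a single global translation of the picked coordinates.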