Structural basis of human transcription–DNA repair coupling

Transcription-coupled DNA repair removes bulky DNA lesions from the genome 1,2 and protects cells against ultraviolet (UV) irradiation 3 . Transcription-coupled DNA repair begins when RNA polymerase II (Pol II) stalls at a DNA lesion and recruits the Cockayne syndrome protein CSB, the E3 ubiquitin ligase CRL4CSA and UV-stimulated scaffold protein A (UVSSA) 3 . Here we provide five high-resolution structures of Pol II transcription complexes containing human transcription-coupled DNA repair factors and the elongation factors PAF1 complex (PAF) and SPT6. Together with biochemical and published 3,4 data, the structures provide a model for transcription–repair coupling. Stalling of Pol II at a DNA lesion triggers replacement of the elongation factor DSIF by CSB, which binds to PAF and moves upstream DNA to SPT6. The resulting elongation complex, ECTCR, uses the CSA-stimulated translocase activity of CSB to pull on upstream DNA and push Pol II forward. If the lesion cannot be bypassed, CRL4CSA spans over the Pol II clamp and ubiquitylates the RPB1 residue K1268, enabling recruitment of TFIIH to UVSSA and DNA repair. Conformational changes in CRL4CSA lead to ubiquitylation of CSB and to release of transcription-coupled DNA repair factors before transcription may continue over the repaired DNA.

Pol II clamp and protrusion that form opposite sides of the active centre cleft (Fig. 1a). CSA and UVSSA do not bind to Pol II, consistent with their recruitment by CSB in vivo 4 . CSB binds to the CSA β-propeller domain with its ATPase lobe 1 and with its helical CSA-interacting motif (Fig. 1a, Extended Data Fig. 2d), which is essential for CSA binding and TCR in vivo 4 . The binding of the CSA-interacting motif to CSA might restrict the mobile ATPase lobe 2 of CSB to an active conformation, explaining the stimulation of CSB ATPase activity by CSA (Extended Data Fig. 6e). UVSSA contacts the opposite face of the CSA β-propeller via its VHS domain (Fig. 1a, Extended Data Fig. 10i, j), which explains why recruitment of UVSSA to stalled Pol II depends on CSA in vivo 4,23,24 .
The structure rationalizes known mutations associated with Cockayne syndrome 25 (Extended Data Fig. 4). Many mutations in CSB and CSA cluster at the CSB-CSA interface, including three CSB mutations (R670W, W686C and S687L) that lead to severe type II Cockayne syndrome. Other mutations in CSB map to the ATP-binding site and to the interface with upstream DNA, and probably impair CSB function. By contrast, the CSA mutation W361C, which causes UV-sensitive syndrome 25 , maps to the CSA-UVSSA interface and is predicted to impair recruitment of UVSSA. This supports the view that recruitment of UVSSA is critical for transcription-repair coupling, whereas loss of CSB or CSA might additionally impair transcription elongation or the processing of stalled Pol II. This could explain why the clinical manifestations of Cockayne syndrome are more severe than those of UV-sensitive syndrome 3,25 .

CSB translocase and elongation stimulation
Our structure suggests how the translocase activity of CSB pushes Pol II forward. The two ATPase lobes hold upstream DNA, and a helix-loop-helix element (the 'pulling hook') protrudes from lobe 2 and inserts into the upstream fork of the DNA bubble (Fig. 1b, Extended Data Fig. 6a, c, d). The pulling hook contains a conserved phenylalanine residue (F796) whose aromatic side chain stacks against the first base of the non-template strand at position −14 (Fig. 1b, Extended Data Fig. 6a). Substitution of F796 with alanine impairs CSB activity, showing that the pulling hook is required for CSB function (Extended Data Fig. 1f, g). The yeast CSB homologue Rad26 contains an element corresponding to the pulling hook 19 (Extended Data Fig. 6d).
To better understand the translocase mechanism of CSB, we also solved the Pol II-TCR structure in the presence of ADP•BeF3 at 2.7 Å resolution (structure 2; Extended Data Figs. 5, 10). The CSB translocase adopts the pre-translocated state in structure 1 and the post-translocated state in structure 2, revealing how the translocase operates (Fig. 1c, Extended Data Fig. 6c, Supplementary Video 2). Upon ATP binding, ATPase lobe 2 of CSB closes and pulls on the template strand of the upstream DNA, while the pulling hook pulls on the non-template strand in the same direction. Because lobe 1 of CSB is anchored to the Pol II clamp, pulling on the upstream template DNA pushes Pol II forward.
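The two-state cycle described above can be summarized as a toy state machine. This is a minimal sketch under stated assumptions, not a quantitative model from the structures: the class and method names are hypothetical, and the step size of exactly one base pair per ATP is an assumption that the structures do not establish. The only structural inputs are that structure 1 (no nucleotide added) is pre-translocated, that structure 2 (ADP•BeF3) is post-translocated, and that lobe 1 stays anchored to the Pol II clamp.

```python
from enum import Enum, auto


class CSBState(Enum):
    PRE_TRANSLOCATED = auto()   # no nucleotide added, as in structure 1
    POST_TRANSLOCATED = auto()  # ATP-bound, mimicked by ADP-BeF3 in structure 2


class TranslocaseModel:
    """Toy model of the CSB cycle on upstream DNA (hypothetical names).

    Assumption: each ATP-binding step pulls upstream DNA by STEP_BP base pairs,
    and because lobe 1 is anchored to the Pol II clamp, Pol II advances by the
    same amount relative to the DNA.
    """

    STEP_BP = 1  # assumed step size per ATP, not measured in this work

    def __init__(self) -> None:
        self.state = CSBState.PRE_TRANSLOCATED
        self.pol2_offset_bp = 0  # Pol II position relative to its starting point

    def bind_atp(self) -> None:
        # Lobe 2 closes and, together with the pulling hook, pulls upstream DNA.
        if self.state is not CSBState.PRE_TRANSLOCATED:
            raise RuntimeError("ATP can only bind in the pre-translocated state")
        self.state = CSBState.POST_TRANSLOCATED
        self.pol2_offset_bp += self.STEP_BP

    def hydrolyse_and_release(self) -> None:
        # Lobe 2 reopens and the cycle resets; Pol II keeps its new position (assumed).
        if self.state is not CSBState.POST_TRANSLOCATED:
            raise RuntimeError("no bound ATP to hydrolyse")
        self.state = CSBState.PRE_TRANSLOCATED


def run_cycles(n: int) -> TranslocaseModel:
    """Run n complete ATP binding and hydrolysis cycles."""
    model = TranslocaseModel()
    for _ in range(n):
        model.bind_atp()
        model.hydrolyse_and_release()
    return model
```

Enforcing strict alternation between the two states mirrors the idea that forward movement requires the full bind, close, hydrolyse and reopen cycle rather than repeated ATP binding.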

Switching from elongation to TCR
Comparison of the Pol II-TCR structure with the structure of the Pol II EC* (ref. 21 ) reveals a clash between CSB and DSIF, which comprises the subunits SPT4 and SPT5 (ref. 20 ). We therefore tested whether the binding of CSB to Pol II displaces DSIF. We labelled DNA, CSB and DSIF with different fluorescent dyes and monitored the composition of Pol II complexes by electrophoretic mobility shift assay (Methods). The addition of increasing amounts of CSB indeed displaced DSIF from Pol II (Fig. 2a). To investigate whether CSB could replace DSIF during transcription, we conducted RNA elongation assays (Fig. 2b). Whereas the Pol II-DSIF complex could not transcribe over an arrest sequence, the addition of CSB stimulated the passage of Pol II, indicating that CSB replaced DSIF on transcribing Pol II. Competition between DSIF and CSB for Pol II binding explains the observation that SPT5 can repress TCR 26 . In summary, these results indicate that the switch from active Pol II elongation to TCR involves replacement of DSIF by CSB on the surface of Pol II.
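The qualitative logic of the displacement experiment can be illustrated with a single-site competition model in which DSIF and CSB bind Pol II mutually exclusively. This is a sketch for intuition only: the dissociation constants below are hypothetical placeholders rather than measured values, and approximating free by total ligand concentrations only holds when both factors are in large excess over Pol II, which is not strictly true for the EMSA conditions (100 nM Pol II, 120 nM DSIF). The CSB concentrations are the titration points listed in the Methods.

```python
def pol2_occupancy(dsif_nM: float, csb_nM: float,
                   kd_dsif_nM: float, kd_csb_nM: float) -> dict:
    """Fractions of Pol II that are free, DSIF-bound or CSB-bound.

    Assumes one site that DSIF and CSB occupy mutually exclusively and
    approximates free ligand concentrations by total concentrations.
    """
    d = dsif_nM / kd_dsif_nM
    c = csb_nM / kd_csb_nM
    z = 1.0 + d + c  # binding polynomial for the three Pol II species
    return {"free": 1.0 / z, "DSIF": d / z, "CSB": c / z}


# Hypothetical dissociation constants, chosen only for illustration.
KD_DSIF_NM = 50.0
KD_CSB_NM = 50.0

for csb in (25, 50, 100, 150, 200, 400):  # CSB titration points (nM)
    occ = pol2_occupancy(120.0, csb, KD_DSIF_NM, KD_CSB_NM)
    print(f"CSB {csb:>3} nM: DSIF-bound {occ['DSIF']:.2f}, CSB-bound {occ['CSB']:.2f}")
```

Whatever the true affinities, this model predicts that the DSIF-bound fraction falls monotonically as CSB rises, which is the signature expected if the two factors compete for the same Pol II surface.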
To further investigate the switch from elongation to TCR, we extended our Pol II-TCR complex preparation by adding SPT6, PAF and RTF1, and analysed the resulting complex by cryo-EM (Extended Data Fig. 7). The overall resolution of the structure was 2.9 Å and it revealed all factors except RTF1, which probably dissociated after loss of DSIF (structure 3; Fig. 2c, Extended Data Fig. 8a). This 22-subunit Pol II-CSB-CSA-DDB1-UVSSA-SPT6-PAF complex represents an alternative elongation complex that we call EC TCR .
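The subunit count of EC TCR can be checked by listing its components. This short sketch assumes the standard composition of the 12-subunit Pol II and of the five-subunit PAF complex (PAF1, LEO1, CTR9, CDC73 and WDR61), with RTF1 counted separately as in the text; the variable names are illustrative.

```python
# Subunit composition of EC_TCR (structure 3).
POL_II = [f"RPB{i}" for i in range(1, 13)]        # 12-subunit Pol II
TCR_FACTORS = ["CSB", "CSA", "DDB1", "UVSSA"]
SPT6 = ["SPT6"]
PAF = ["PAF1", "LEO1", "CTR9", "CDC73", "WDR61"]   # PAF complex without RTF1

EC_TCR = POL_II + TCR_FACTORS + SPT6 + PAF
print(len(EC_TCR))
```

The total is 12 + 4 + 1 + 5 = 22, consistent with the 22-subunit complex described above.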
The conversion of EC* to EC TCR involves three structural changes (Supplementary Video 3). First, TCR factors replace DSIF and displace RTF1, which interacts with DSIF 21,27 . Second, upstream DNA moves and contacts the helix-hairpin-helix domain of SPT6 (Fig. 2c). Helix-hairpin-helix domains are found in several DNA-binding proteins (Extended Data Fig. 8b). Third, the C-terminal linker of the PAF subunit LEO1, which contacts upstream DNA in EC* (ref. 30 ), moves by up to approximately 30 Å and forms a helix that binds to lobe 2 of the CSB ATPase (Fig. 2c). This contact accounts for the known UV-induced PAF-CSB interaction in vivo 31,32 and for our observation that PAF stimulates CSB ATPase activity (Extended Data Fig. 1d, h).

Ubiquitylation by CRL4 CSA
The cellular response to UV irradiation involves not only recruitment of TCR factors but also ubiquitylation events 7,12,16,32 . Ubiquitylation of the largest Pol II subunit, RPB1, on K1268 regulates transcription shutdown and recovery 7,16 . In addition, the E3 ligase CRL4 CSA polyubiquitylates CSB, leading to degradation of CSB 12 . To investigate these events in vitro, we performed ubiquitylation assays with the complete Pol II-TCR complex containing CRL4 CSA (Fig. 3a). We observed ubiquitylation of CSA and CUL4A and polyubiquitylation of CSB, as previously described 14 . We also detected ubiquitylation of UVSSA and identified 11 ubiquitylation sites on RPB1, with residue K1268 as the highest-scoring site (Fig. 3b). Ubiquitylation of Pol II was dependent on CSB and occurred in the absence of UVSSA (Extended Data Fig. 9a), as shown in vivo 33 . These results indicate that CRL4 CSA is the E3 ligase that ubiquitylates K1268. We then solved the structure of the complete Pol II-TCR complex including CRL4 CSA at an overall resolution of 3.0 Å (Fig. 3c, Extended Data Figs. 9, 10). Focused classification revealed two distinct states that differed in the conformation of CRL4 CSA (Extended Data Figs. 9, 10). In the first state (structure 4), UVSSA positions the C-terminal domain of CUL4A such that RBX1 faces a loop in the RPB1 jaw domain that contains the ubiquitylated residue K1268. RBX1 binds to an E2 enzyme-ubiquitin complex 34 , which we modelled onto our structure (Fig. 3c). In this model, the activated C terminus of ubiquitin is positioned at the K1268-containing loop, which explains how CRL4 CSA directs site-specific Pol II ubiquitylation.
In the second state of the complete Pol II-TCR complex structure (structure 5), CUL4A and RBX1 have moved over a large distance to reach the C-terminal region of CSB (Fig. 3c). This CSB region is targeted by ubiquitylation 35 and is essential for TCR 36 .

TCR model
Together with published data 3 , our work supports a molecular model that explains how the complete TCR complex, consisting of CSB, CRL4 CSA and UVSSA, mechanistically couples transcription to DNA repair (Fig. 4). When Pol II encounters an obstacle that cannot be overcome with the help of the elongation factor TFIIS, Pol II stalls and the TCR complex binds. This requires displacement of DSIF and converts EC* to EC TCR , which then uses the ATPase activity of CSB to push Pol II forward. If the obstacle can be bypassed, Pol II resumes elongation and EC* is re-established. If the obstacle cannot be overcome, CRL4 CSA ubiquitylates Pol II at K1268, leading to recruitment of TFIIH near UVSSA 4 and downstream DNA. TFIIH may then use its ATPase activity 17 to push Pol II backwards 37 and enable DNA repair. Finally, rearrangement of CRL4 CSA leads to polyubiquitylation of CSB and its degradation by the proteasome, which releases the TCR factors because they are all anchored via CSB. Release of TCR factors frees the Pol II binding sites for DSIF and RTF1, and EC* is re-established. Pol II may then resume transcription over the repaired DNA. Interconversion between EC* and EC TCR may facilitate both bypass of small DNA lesions and repair of large lesions. Alternatively, Pol II may become persistently stalled and be degraded 38 .
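The model can be written as a small state machine in which each allowed event moves the complex to the next state. This is a schematic sketch, not a kinetic model: the event names are hypothetical, and attaching the persistent-stall branch after K1268 ubiquitylation is an assumption, because the text does not specify at which step Pol II commits to degradation.

```python
from enum import Enum


class TCRState(Enum):
    EC_STAR = "EC*: elongating Pol II with DSIF"
    STALLED = "Pol II stalled at an obstacle that TFIIS cannot resolve"
    EC_TCR = "EC_TCR: CSB, CRL4CSA and UVSSA bound, DSIF displaced"
    K1268_UB = "RPB1 K1268 ubiquitylated by CRL4CSA"
    TFIIH_BOUND = "TFIIH recruited near UVSSA and downstream DNA"
    FACTORS_RELEASED = "CSB polyubiquitylated and degraded, TCR factors released"
    POL2_DEGRADED = "persistently stalled Pol II degraded"


# Hypothetical event names mapping each state to its possible successors.
TRANSITIONS = {
    TCRState.EC_STAR: {"obstacle": TCRState.STALLED},
    TCRState.STALLED: {"tcr_binds": TCRState.EC_TCR},
    TCRState.EC_TCR: {"bypass": TCRState.EC_STAR,
                      "no_bypass": TCRState.K1268_UB},
    # Assumption: the persistent-stall branch leaves from the ubiquitylated state.
    TCRState.K1268_UB: {"recruit_tfiih": TCRState.TFIIH_BOUND,
                        "persistent_stall": TCRState.POL2_DEGRADED},
    TCRState.TFIIH_BOUND: {"repair": TCRState.FACTORS_RELEASED},
    TCRState.FACTORS_RELEASED: {"dsif_rebinds": TCRState.EC_STAR},
    TCRState.POL2_DEGRADED: {},  # terminal state
}


def run(events):
    """Replay a sequence of events starting from EC* and return visited states."""
    state = TCRState.EC_STAR
    path = [state]
    for event in events:
        successors = TRANSITIONS[state]
        if event not in successors:
            raise ValueError(f"{event!r} is not allowed from {state.name}")
        state = successors[event]
        path.append(state)
    return path


# Repair route: stall, TCR binding, ubiquitylation, TFIIH, repair, EC* restored.
for s in run(["obstacle", "tcr_binds", "no_bypass", "recruit_tfiih",
              "repair", "dsif_rebinds"]):
    print(s.value)
```

Both the bypass route and the repair route return to EC*, which reflects the proposed interconversion between EC* and EC TCR.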

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information, details of author contributions and competing interests, and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03906-4.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Cloning and protein expression
Vectors encoding full-length human CSA, DDB1, CSB, UVSSA and CUL4A and mouse RBX1 were obtained from the Harvard Medical School PlasmID Repository. Genes were amplified by PCR and cloned into the respective vectors by ligation-independent cloning 39 . CSA and RBX1 were cloned into the 438A vector (Addgene no. 55218), yielding untagged proteins, and CSB, DDB1, UVSSA and CUL4A were cloned into the 438B vector (Addgene no. 55219), yielding proteins with a 6×His tag. CSA and DDB1, as well as CSA, DDB1, CUL4A and RBX1, were combined into single vectors by ligation-independent cloning 39 . The CSB ATPase-deficient mutant (CSB K538R) 40 and the pulling hook mutant (CSB F796A) were generated by around-the-horn mutagenesis and expressed and purified like their wild-type counterparts. For fluorescent labelling of CSB, the gene was cloned into the 438-SNAP-V1 vector (Addgene no. 55222), which adds an N-terminal SNAPf tag. For fluorescent labelling of DSIF, a ybbR tag 41 preceded by a GGGG linker was introduced at the C terminus of SPT4 by around-the-horn mutagenesis.
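Around-the-horn mutagenesis amplifies the entire plasmid with two back-to-back primers, one of which carries the desired mutation at its 5′ end; the linear product is then phosphorylated (or made with phosphorylated primers) and circularized by blunt-end ligation. The sketch below derives such a primer pair for a single codon substitution. It is illustrative only: the sequence is a made-up toy example rather than the CSB coding sequence, the function names are hypothetical, and it uses a fixed annealing length instead of a melting-temperature calculation.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def around_the_horn_primers(template: str, codon_start: int, new_codon: str,
                            anneal_len: int = 20) -> tuple:
    """Back-to-back primers that replace one codon by whole-plasmid PCR.

    template    -- coding-strand sequence (5' to 3') around the target codon
    codon_start -- 0-based index of the first base of the codon to replace
    new_codon   -- replacement codon, for example "GCC" for alanine
    The forward primer starts with the mutant codon and anneals downstream;
    the reverse primer anneals to the opposite strand directly upstream, so
    the two primers abut and the mutation sits at the ligation junction.
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases")
    if codon_start < anneal_len or codon_start + 3 + anneal_len > len(template):
        raise ValueError("not enough flanking sequence for the annealing regions")
    forward = new_codon + template[codon_start + 3: codon_start + 3 + anneal_len]
    reverse = reverse_complement(template[codon_start - anneal_len: codon_start])
    return forward, reverse


# Toy example: replace a TTC (Phe) codon with GCC (Ala), as in an F-to-A substitution.
left, right = "ACGT" * 5, "GGCA" * 5
fwd, rev = around_the_horn_primers(left + "TTC" + right, codon_start=20, new_codon="GCC")
print(fwd)
print(rev)
```

Placing the whole mutation on one primer keeps the other primer fully complementary, which is the main design choice that distinguishes this approach from overlapping-primer site-directed mutagenesis.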
For all TCR proteins, the appropriate protein fractions were pooled, mixed with 2 mg of TEV protease and dialysed overnight against dialysis buffer (400 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v) and 1 mM DTT). After dialysis, the protein solution was passed through a 5-ml HisTrap column equilibrated in dialysis buffer. The flow-through containing the protein was collected, concentrated and loaded onto a Superdex 200 Increase 10/300 column (GE Healthcare) equilibrated in storage buffer (400 mM NaCl, 20 mM NaOH:HEPES pH 7.5, 10% glycerol (v/v) and 1 mM DTT). Peak fractions were pooled, concentrated, flash frozen and stored at −80 °C.
CSB containing an N-terminal SNAPf tag and a TwinStrepII tag was purified as follows. The clarified lysate was incubated with 1 ml of Strep-TactinXT 4Flow high-capacity resin (IBA) pre-equilibrated in lysis buffer, and the resin was washed extensively with lysis buffer. Protein was eluted with BXT buffer (IBA), concentrated and loaded onto a Superdex 200 Increase 10/300 column (GE Healthcare) equilibrated in storage buffer (400 mM NaCl, 20 mM NaOH:HEPES pH 7.5, 10% glycerol (v/v) and 1 mM DTT). Peak fractions were pooled, concentrated, flash frozen and stored at −80 °C.

RNA extension assays
… TTT TTT TCT CGG TGT GGT GTG TGG TGT ATG TAG (non-template strand); and /5Cy5/rUrUrA rUrArU rUrUrU rArUrU rCrUrU rArUrC rGrA rGrArG rGrA (RNA). Template-strand DNA and RNA were mixed in an equimolar ratio and annealed in water by heating the solution to 95 °C, followed by slow cooling (1 °C per min) to 4 °C. Pol II was mixed with the DNA-RNA scaffold in an equimolar ratio and incubated at 30 °C for 10 min. Next, a 1.5-fold molar excess of non-template DNA was added, and the solution was incubated for a further 10 min at 30 °C. A typical RNA extension reaction contained Pol II (200 nM) in final buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5, 5% (v/v) glycerol, 5 mM MgCl2 and 1 mM DTT. When proteins were titrated, the highest protein concentration was 2 μM (for protein mixtures, each factor was at 2 μM), followed by a half-log dilution series. For the DSIF-CSB competition assay, Pol II was pre-incubated with a 1.5-fold excess of DSIF before the addition of TCR factors. Reactions were pre-incubated at 37 °C for 5 min and started by adding NTPs (0.5 mM GTP, UTP and CTP, 1 mM ATP and 0.5 mM dATP). Reactions were quenched with 2× quenching buffer (7 M urea in TBE buffer, 20 mM EDTA and 10 μg ml−1 proteinase K (Thermo Scientific)), and proteins were digested for 30 min at 37 °C. RNA products were separated on a sequencing gel and visualized with a Typhoon FLA 9500 (GE Healthcare Life Sciences).
Gel quantification was performed with ImageJ software, and data were plotted with Prism 9 software.

Three-colour electrophoretic mobility shift assay
DNA and RNA oligonucleotides were ordered from Integrated DNA Technologies. Sequences 19 used in the assay were: /56-FAM/CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG (template strand); CTA GTT GAT CTC ATA GCC ATC GAG AGG ATG GGA GAA GGA GCA GAG CG (non-template strand); and rArCrA rUrCrA rUrArA rCrArU rUrUrG rArArC rArArG rArArU rArUrA rUrArU rArCrA rArArA rUrCrG rArGrA rGrGrA (RNA). For this assay, CSB and DSIF were fluorescently labelled. SNAPf-CSB (50 μM) was incubated with a 10-fold molar excess of SNAP-Surface 546 substrate (New England BioLabs) overnight at 4 °C in CSB storage buffer. Labelled CSB was separated from excess dye on a Superdex 200 Increase 10/300 column (GE Healthcare) equilibrated in storage buffer (400 mM NaCl, 20 mM NaOH:HEPES pH 7.5, 10% glycerol (v/v) and 1 mM DTT). The labelling efficiency was around 100%. The DSIF subunit SPT4 carried a C-terminal ybbR tag, and the protein was labelled using Sfp phosphopantetheinyl transferase, as previously described in detail 41 . The substrate for the labelling reaction was LD666-CoA (Lumidyne Technologies), and the labelling efficiency was around 85%. The Pol II elongation complex was assembled by incubating Pol II with a 1.3-fold excess of template strand:RNA for 10 min at 30 °C, followed by the addition of a 1.5-fold excess of non-template strand and further incubation for 10 min at 30 °C. Next, the Pol II elongation complex was supplemented with a 1.2-fold excess of DSIF and incubated for 10 min at 30 °C. Finally, CSB was titrated into the reaction, which was incubated for a further 10 min at 30 °C. The final reaction contained Pol II (100 nM), DSIF (120 nM) and CSB (400 nM, 200 nM, 150 nM, 100 nM, 50 nM or 25 nM) in final buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5, 10% glycerol, 2 mM MgCl2 and 1 mM DTT. Reactions were loaded onto NativePAGE 3-12% Bis-Tris gels (Thermo Scientific) and run at 150 V for 1.5 h.
The gels were scanned on a Typhoon FLA 9500 (GE Healthcare Life Sciences) in three channels to visualize template-strand DNA, CSB and DSIF.
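The half-log dilution series used for protein titrations in the RNA extension assays steps down by a factor of the square root of 10 (about 3.16-fold) per point, so every second point is a tenfold dilution. A minimal sketch follows; the top concentration of 2 μM is from the text, whereas the number of points is an assumption because the text does not state it, and the function name is hypothetical.

```python
import math


def half_log_series(top_uM: float, n_points: int) -> list:
    """Concentrations of a half-log (10**0.5-fold) dilution series."""
    factor = math.sqrt(10.0)
    return [top_uM / factor**i for i in range(n_points)]


# Five points is an assumed series length for illustration.
for c in half_log_series(2.0, 5):
    print(f"{c:.3f} uM")
```

A half-log spacing covers two orders of magnitude in five points, which makes it convenient for spotting a transition on a logarithmic concentration axis.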

Analytical size-exclusion chromatography
Analytical size-exclusion chromatography was used to monitor association of TCR factors with Pol II (Extended Data Fig. 1e) and to monitor RTF1 association with EC* and EC TCR (Extended Data Fig. 8a). For TCR factors, the proteins were mixed in equimolar ratios in the final size-exclusion buffer (100 mM NaCl, 20 mM HEPES pH 7.5, 5% glycerol, 1 mM MgCl2 and 1 mM DTT) and run over a Superose 6 Increase 3.2/300 column. The Pol II elongation complex was formed as for structure 1. For RTF1 binding, all factors were added in 1.5-fold excess to the pre-formed Pol II elongation complex in the final size-exclusion buffer and incubated for 1 h at 30 °C in the presence of 1 mM ATP and P-TEFb. The complexes were injected onto a Superose 6 Increase 3.2/300 column, and the fractions were analysed by SDS-PAGE. The template strand and RNA used for EC* and EC TCR formation were the same (template strand: CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG; RNA: rArCrA rUrCrA rUrArA rCrArU rUrUrG rArArC rArArG rArArU rArUrA rUrArU rArCrA rArArA rUrCrG rArGrA rGrGrA), but the non-template strand differed: it was fully complementary to the template strand for EC* (non-template strand: CTA GTT GAT CTC ATA GCC ATC GAG AGG ATG GGA GAA GGA GCA GAG CG) and formed a large bubble with the template strand for EC TCR (non-template strand: CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG).

In vitro ubiquitylation assay
Ubiquitin, UBE1 and UbcH5b were purchased from Boston Biochem. The Pol II elongation complex was formed as for structural analysis of structure 1. The ubiquitylation reaction contained Pol II ECs (0.8 μM), CSB (0.8 μM), UVSSA (0.8 μM), CSA-DDB1-CUL4A-RBX1 (0.8 μM), UBE1 (150 nM), UbcH5b (0.5 μM) and ubiquitin (300 μM) in 100 mM NaCl, 50 mM Tris pH 7.9, 10 mM MgCl 2 , 0.2 mM CaCl 2 , 5% glycerol and 1 mM DTT. Reactions were started by the addition of ATP (3 mM) and stopped with EDTA (15 mM). Proteins were separated on NuPAGE 4-12% Bis-Tris protein gels (Invitrogen) and stained with Coomassie. In the case of the ubiquitylation assay in the absence of CSB or UVSSA, the assay was performed as described above, but with lower concentrations of Pol II, CRL4 CSA and CSB or UVSSA (0.4 μM). The proteins were separated on 3-8% Tris-acetate gel (Invitrogen) and transferred onto a PVDF membrane with a Trans-Blot Turbo Transfer System (Bio-Rad) for immunoblotting. The membrane was blocked with 5% (w/v) milk powder in PBS containing 0.1% Tween-20 (PBST) for 1 h at room temperature. The membrane was then incubated with F-12 anti-RPB1 antibody (1:100 dilution; Santa Cruz Biotechnology) in PBST supplemented with 2.5% (w/v) milk powder. After washing the membrane with PBST, the membrane was incubated with an anti-mouse HRP conjugate (1:3,000 dilution; ab5870, Abcam) in PBST supplemented with 1% (w/v) milk powder for 1 h at room temperature. The membrane was developed with SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher) and scanned with a ChemoCam Advanced Fluorescence imaging system (Intas Science Imaging).

ATPase assay
The enzyme-coupled ATPase assay uses two fast enzymatic reactions to couple ATP regeneration to NADH oxidation. A typical reaction contained 100 nM protein in buffer containing 50 mM potassium acetate, 20 mM KOH-HEPES pH 7, 5 mM magnesium acetate, 5% glycerol (v/v), 0.2 mg ml−1 BSA, 3 mM phosphoenolpyruvate (PEP), 0.3 mM NADH and an excess of pyruvate kinase and lactate dehydrogenase enzyme mix (Sigma). The reaction mixture was incubated for 10 min at 30 °C, and the reaction was started by adding ATP (1.5 mM final). The rate of ATP hydrolysis was monitored as the decrease in absorbance at 340 nm using an Infinite M1000Pro reader (Tecan). The resulting curves were fitted to a linear model using GraphPad Prism version 9.
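In the coupled assay, pyruvate kinase regenerates ATP from ADP and PEP, and lactate dehydrogenase then reduces the resulting pyruvate while oxidizing one NADH, so one NADH is consumed per ATP hydrolysed. The linear slope of A340 over time can therefore be converted into a turnover number. The sketch below assumes the standard NADH molar absorptivity at 340 nm (6,220 M−1 cm−1) and a 1-cm path length; plate-reader measurements need the actual or path-corrected length, which is not stated in the text. The function names and the synthetic data are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, molar absorptivity of NADH at 340 nm


def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def atp_turnover_per_min(times_min, a340, enzyme_nM, path_cm=1.0):
    """ATP molecules hydrolysed per enzyme molecule per minute.

    Assumes 1:1 coupling (one NADH oxidized per ATP hydrolysed) and that the
    data cover only the linear phase of the A340 decrease.
    """
    slope = linear_slope(times_min, a340)              # absorbance units per min
    rate_M_per_min = -slope / (NADH_EPSILON_340 * path_cm)
    return rate_M_per_min * 1e9 / enzyme_nM            # convert M to nM, divide by enzyme


# Synthetic trace for 100 nM enzyme, for illustration only.
times = [0, 1, 2, 3, 4, 5]
a340 = [1.0 - 0.00622 * t for t in times]
print(atp_turnover_per_min(times, a340, enzyme_nM=100.0))
```

For this synthetic trace, a slope of −0.00622 per min corresponds to 1 μM ATP hydrolysed per minute, or about 10 ATP per enzyme per minute at 100 nM enzyme.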

Crosslinking mass spectrometry
The Pol II elongation complex was formed as described for the RNA extension assays. The DNA and RNA sequences used for elongation complex formation were 19 : CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG (template strand); CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG (non-template strand); and rArUrC rGrArG rArGrG rA (RNA). Equimolar amounts of elongation complex, CSB, CSA-DDB1 and UVSSA were mixed in the final complex formation buffer of 100 mM NaCl, 20 mM HEPES pH 7.5, 1 mM DTT, 1 mM MgCl2 and 5% glycerol. The complex was incubated at 30 °C for 10 min and subsequently purified over a Superose 6 Increase 3.2/300 column equilibrated in complex formation buffer. For BS3 crosslinking, the protein solution was supplemented with 1 mM BS3 and incubated at 30 °C for 30 min, and the crosslinking was quenched with 50 mM ammonium bicarbonate. For EDC crosslinking, the complex formation buffer contained HEPES pH 6.7 instead of pH 7.5. The protein solution was supplemented with 2 mM EDC and 5 mM sulfo-NHS and incubated at 30 °C for 30 min, and the crosslinking reaction was quenched with 50 mM 2-mercaptoethanol and 20 mM Tris pH 7.9.
Analysis of crosslinked peptides was performed as previously described 17 . Mass spectrometry analysis was performed on a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific) coupled to a Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific). Online chromatographic separation was achieved with an in-house packed C18 column (ReproSil-Pur 120 C18-AQ, 1.9-μm particle size, 75-μm inner diameter and 30 cm in length; Dr. Maisch). Samples were analysed as three 5-μl injections, separated on a 75-min gradient at a flow rate of 300 nl min−1; mobile phase A was 0.1% (v/v) FA, and mobile phase B was 80% (v/v) ACN and 0.08% (v/v) FA. The gradient increased from 8%/12%/18% to 38%/42%/48% mobile phase B (depending on the fraction). MS1 spectra were acquired with the following settings: resolution of 120,000; mass range of 380-1,580 m/z; injection time of 50 ms; and automatic gain control target of 1 × 10^6. MS2 fragment spectra were collected with dynamic exclusion of 10 s and different normalized collision energies for the injection replicates (28%/30%/28-32%), with the following settings: isolation window of 1.4 m/z; resolution of 30,000; injection time of 128 ms; and automatic gain control target of 2 × 10^5.
The resulting raw files were converted to the mgf format with Proteome Discoverer 2.1.0.81 (Thermo Fisher Scientific) using a signal-to-noise ratio of 1.5 and a precursor mass range of 350-7,000 Da. Crosslinked peptides were identified with pLink v2.3.9 (pFind group 44 ) using the following parameters: up to three missed cleavages; carbamidomethylation of cysteines as a fixed modification; oxidation of methionines as a variable modification; peptide mass tolerance of 10 p.p.m.; fragment mass tolerance of 20 p.p.m.; peptide length of 5-60 amino acids; and a spectral false discovery rate of 1%. The sequence database contained all proteins of the complex. Crosslink sites were visualized with xiNET 45 and the Xlink Analyzer 46 plugin in Chimera.
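Crosslinks are usually compared with a structural model by checking whether the Cα-Cα distance between the linked residues lies within the reach of the crosslinker. The sketch below reports the fraction of satisfied crosslinks. It assumes a 30 Å Cα-Cα cut-off, a commonly used bound for BS3 that is not stated in the text (EDC, a zero-length crosslinker, would warrant a shorter cut-off). The residue identifiers and coordinates are hypothetical and do not come from the deposited models.

```python
import math

Coord = tuple  # (x, y, z) in angstroms


def crosslink_satisfied(ca_a: Coord, ca_b: Coord, max_ca_ca_A: float) -> bool:
    """True if the Calpha-Calpha distance is within the crosslinker's reach."""
    return math.dist(ca_a, ca_b) <= max_ca_ca_A


def satisfaction_rate(links, ca_coords, max_ca_ca_A=30.0):
    """Fraction of crosslinks whose Calpha pairs lie within max_ca_ca_A.

    links     -- iterable of ((protein, residue), (protein, residue)) pairs
    ca_coords -- dict mapping (protein, residue) to Calpha coordinates
    """
    links = list(links)
    if not links:
        raise ValueError("no crosslinks given")
    hits = sum(crosslink_satisfied(ca_coords[a], ca_coords[b], max_ca_ca_A)
               for a, b in links)
    return hits / len(links)


# Hypothetical coordinates and residue numbers, for illustration only.
coords = {("CSB", 1): (0.0, 0.0, 0.0),
          ("CSA", 2): (10.0, 0.0, 0.0),
          ("UVSSA", 3): (0.0, 40.0, 0.0)}
links = [(("CSB", 1), ("CSA", 2)), (("CSB", 1), ("UVSSA", 3))]
print(satisfaction_rate(links, coords))
```

A high satisfaction rate supports a model, whereas clusters of violated crosslinks can point to flexible regions or to alternative conformations such as the mobile CRL4 CSA arm.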
The samples for the ubiquitylation analysis were produced by the in vitro ubiquitylation assay described above. A control sample was prepared in the same way but without ubiquitin, to confirm that the endogenously purified Pol II was not already ubiquitylated. In addition to site-specific Pol II ubiquitylation, we observed promiscuous ubiquitylation of free CSB and UVSSA, which probably resulted from a population of TCR factors that were not bound to Pol II.
For mass spectrometry, the samples were reduced with 5 mM DTT for 30 min at 37 °C and alkylated with 20 mM chloroacetamide for 30 min at room temperature. Unreacted chloroacetamide was quenched by adding a further 5 mM DTT. Proteolytic digestion was performed overnight under denaturing conditions (1 M urea) with trypsin (Promega) at a 1:20 (w/w) trypsin-to-protein ratio. The digestion mixtures were acidified with FA to 1% (v/v) final concentration, and ACN was added to 5% (v/v) final concentration. Peptides were purified by reversed-phase chromatography on Harvard Apparatus Micro SpinColumns C18, washing away salts and contaminants with 5% (v/v) ACN and 0.1% (v/v) FA. Purified peptides were eluted with 50% (v/v) ACN and 0.1% (v/v) FA, dried under vacuum and resuspended in 2% (v/v) ACN and 0.05% (v/v) TFA (5 μl per 1 μg of protein estimated before digestion).
Liquid chromatography with tandem mass spectrometry analysis was performed by injecting 4 μl of each sample into a Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific) coupled to an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific). Peptides were separated on an in-house packed C18 column (ReproSil-Pur 120 C18-AQ, 1.9-μm particle size, 75-μm inner diameter and 31 cm in length; Dr. Maisch). Chromatographic separation was achieved with 0.1% (v/v) FA (mobile phase A) and 80% (v/v) ACN and 0.08% (v/v) FA (mobile phase B). The gradient increased mobile phase B from 5% to 42% in 43 min. Eluting peptides were analysed by data-dependent acquisition with the following MS1 parameters: resolution of 60,000; scan range of 350-1,500 m/z; injection time of 50 ms; and automatic gain control target of 4 × 10^5. Analytes with charge states 2-7 were selected for higher-energy collisional dissociation with 30% normalized collision energy. Dynamic exclusion was set to 10 s. Fragment MS2 spectra were acquired with the following settings: isolation window of 1.6 m/z; Orbitrap detection; resolution of 15,000; injection time of 120 ms; and automatic gain control target of 5 × 10^4.
The resulting acquisition files were analysed with MaxQuant 47 (v1.6.17.0). Fragment spectra were searched against a database containing all proteins of the complex and common protein contaminants. Oxidation of methionines, acetylation of protein N termini and the diglycine (GlyGly) remnant that ubiquitylated lysines retain after tryptic digestion were set as variable modifications. Carbamidomethylation of cysteines was set as a fixed modification. Default settings were used with the following exceptions: main search peptide tolerance was set to 6 p.p.m.; trypsin was selected as the digestion enzyme; and the maximum number of missed cleavages was increased to 3.
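Trypsin cleaves ubiquitin after Arg74, so a ubiquitylated lysine retains the two C-terminal glycines of ubiquitin. Database searches detect this as a variable mass shift on lysine, and the shift follows from the elemental composition of two glycine residues (C4H6N2O2). The sketch below computes it from standard monoisotopic atomic masses; the helper name is hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def mono_mass(composition: dict) -> float:
    """Monoisotopic mass of an elemental composition such as {"C": 4, ...}."""
    return sum(MONO[element] * count for element, count in composition.items())


# Diglycine remnant left on a ubiquitylated lysine after tryptic digestion.
GLYGLY = {"C": 4, "H": 6, "N": 2, "O": 2}
print(round(mono_mass(GLYGLY), 4))
```

The result, about 114.0429 Da, is the mass added to each modified lysine when scoring candidate ubiquitylation sites such as RPB1 K1268.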

Cryo-EM sample preparation and image processing
The same DNA scaffolds were used for all structures 19 : CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG (template strand) and CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG (non-template strand). In the case of Pol II complex formation with TCR factors only, the shorter RNA was used: rArUrC rGrArG rArGrG rA. If SPT6, PAF and RTF1 were also present, longer RNA was used: rArCrA rUrCrA rUrArA rCrArU rUrUrG rArArC rArArG rArArU rArUrA rUrArU rArCrA rArArA rUrCrG rArGrA rGrGrA. The elongation complex was formed as in the RNA extension assays. For the Pol II-CSB-CSA-DDB1-UVSSA structure, the pre-formed elongation complex was mixed with twofold excess of TCR factors in complex formation buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5, 1 mM MgCl 2 , 4% glycerol and 1 mM DTT. The protein solution was incubated at room temperature for 10 min and purified by the Superose 6 Increase 3.2/300 column equilibrated in complex formation buffer. Peak fractions were crosslinked with 0.1% glutaraldehyde on ice for 10 min and quenched with a mixture of lysine (50 mM final) and aspartate (20 mM final). The quenched protein solution was dialysed in Slide-A-Lyzer MINI Dialysis Device of 20K MWCO (Thermo Fisher Scientific) for 6 h against the complex formation buffer without glycerol. For the Pol II-CSB-CSA-DDB1-UVSSA-ADP•BeF 3 structure, the complex was supplemented with 0.5 mM ADP•BeF 3 before complex purification by size-exclusion chromatography. In the case of complex formation between Pol II, TCR factors, PAF, SPT6 and RTF1, the pre-formed elongation complex was mixed with twofold excess of all proteins in complex formation buffer. In addition, the reaction was supplemented with P-TEFb and ATP (1 mM final), as previously described 30 . Because ATP was present, we used a CSB ATPase-deficient mutant for complex formation. 
The complex was incubated at 30 °C for 1 h and purified on a Superose 6 Increase 3.2/300 column equilibrated in complex formation buffer. Downstream steps, including crosslinking and dialysis, were the same as for the previous samples. Dialysed samples were immediately used to prepare cryo-EM grids. A 4-μl aliquot of sample was applied to glow-discharged R2/1 carbon grids (Quantifoil), which were blotted for 5 s and plunge-frozen in liquid ethane with a Vitrobot Mark IV (FEI) operated at 4 °C and 100% humidity.
Micrographs were acquired on an FEI Titan Krios transmission electron microscope with a K3 Summit direct electron detector (Gatan) and a GIF Quantum energy filter (Gatan) operated with a slit width of 20 eV. Data collection was automated using SerialEM 48 , and micrographs were recorded at a magnification of ×81,000 (1.05 Å per pixel) with a dose of 1-1.05 e−/Å2 per frame over 40 frames. For Pol II-CSA-DDB1-CSB-UVSSA, a total of 10,300 micrographs were acquired; for Pol II-CSA-DDB1-CSB-UVSSA-ADP•BeF3, 10,940; for Pol II-CSA-DDB1-CSB-UVSSA-SPT6-PAF, 8,365; and for Pol II-CRL4 CSA -CSB-UVSSA-SPT6-PAF, 19,472. Contrast-transfer function estimation, motion correction and particle picking were performed on-the-fly using Warp 49 . Initial 2D and 3D classification steps were done in CryoSPARC 50 , followed by further processing in RELION 3.0 (refs 51-53 ). Owing to the flexibility of proteins on the Pol II surface, many rounds of signal subtraction and focused classification were performed, as detailed for each dataset in Extended Data Figs. 3, 5, 7, 9. Masks were created with UCSF Chimera 54 . For each structure, the final composite map was assembled from the focused refined maps and denoised in Warp 49 .
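The acquisition parameters above imply a few derived quantities: the total electron exposure per movie, the Nyquist limit set by the pixel size, and the combined size of the four datasets. A short sketch of that arithmetic follows; all inputs are taken from the text, and the variable names are illustrative.

```python
PIXEL_SIZE_A = 1.05           # angstrom per pixel at x81,000
N_FRAMES = 40
DOSE_PER_FRAME = (1.0, 1.05)  # e-/A^2 per frame, range given in the text
MICROGRAPHS = {
    "Pol II-CSA-DDB1-CSB-UVSSA": 10_300,
    "Pol II-CSA-DDB1-CSB-UVSSA-ADP-BeF3": 10_940,
    "Pol II-CSA-DDB1-CSB-UVSSA-SPT6-PAF": 8_365,
    "Pol II-CRL4CSA-CSB-UVSSA-SPT6-PAF": 19_472,
}

total_dose = tuple(d * N_FRAMES for d in DOSE_PER_FRAME)  # e-/A^2 per movie
nyquist_A = 2 * PIXEL_SIZE_A                              # finest sampled spacing
total_micrographs = sum(MICROGRAPHS.values())

print(f"total dose {total_dose[0]:.0f}-{total_dose[1]:.0f} e-/A^2")
print(f"Nyquist limit {nyquist_A:.2f} A")
print(f"{total_micrographs:,} micrographs in total")
```

This gives a total exposure of 40-42 e−/Å2 per movie and a Nyquist limit of 2.1 Å, so the reported resolutions of 2.7-3.0 Å lie below the sampling limit of the detector setup.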

Model building and refinement
The focused refined maps and the final composite maps were used for model building. For the Pol II-CSB-CSA-DDB1-UVSSA structure, we first docked existing structures into the density. An initial CSB model was produced with SWISS-MODEL 55,56 using the Rad26 structure (Protein Data Bank (PDB) code: 5VVR 19 ) as the template. The model was fitted into the CSB focused refined map in Chimera 54 and rebuilt in Coot 57 , followed by real-space refinement in PHENIX 58 . The CSA-DDB1 crystal structure (PDB code: 4A11 (ref. 14 )) was fitted into the CSA-DDB1 focused refined map and real-space refined in PHENIX 58 . During 3D classification, β-propeller B of DDB1 was found to adopt many different conformations, apparently rotating around its junction with the rest of the protein, and the final model reflects the most commonly observed conformation. The N-terminal VHS domain of UVSSA was predicted with SWISS-MODEL 55,56 using the GGA3 VHS domain as a template (PDB code: 1JPL 59 ). Guided by the crosslinking mass spectrometry data and EM density, the model was fitted into the CSA-UVSSA focused refined map, followed by several rounds of flexible fitting in Namdinator 60 and real-space refinement in PHENIX 58 . The Pol II model (PDB code: 7B0Y 61 ) was fitted into the final map, and nucleic acids were modified and built in Coot. All protein models were combined in Coot and real-space refined in PHENIX into the final composite map using secondary structure, base-pairing and base-stacking restraints. For the Pol II-CSB-CSA-DDB1-UVSSA-ADP•BeF3 model, ADP•BeF3 was fitted into the density together with the Pol II-CSB-CSA-DDB1-UVSSA model and real-space refined in PHENIX into the final composite map using secondary structure, base-pairing and base-stacking restraints.
For the Pol II-CSB-CSA-DDB1-UVSSA-SPT6-PAF structure, the SPT6 and PAF models (PDB code: 6TED 21) were fitted into the corresponding focused refined maps, adjusted in Coot and real-space refined in PHENIX. Owing to the improved resolution of the SPT6 core, we built an atomic model for it (the SPT6 core was previously modelled at the backbone level). The C-terminal part of LEO1 was displaced in our structure. These elements were therefore built manually in Coot and deposited as polyalanine, because their register could not be determined with certainty. RNA outside Pol II was poorly resolved, presumably owing to the absence of DSIF, so we modelled it on the basis of the previous structure (PDB code: 6TED 21). All models were combined in Coot and real-space refined in PHENIX against the final composite map. For the Pol II-CSB-CRL4CSA-UVSSA-SPT6-PAF complex, 3D classification of the stably bound CSA-DDB1-CSB complex revealed two distinct conformations of CUL4A-RBX1. In the first conformation (state 1), CUL4A interacts with UVSSA; in the second (state 2), CUL4A interacts with CSB. Owing to the increased flexibility of CUL4A-RBX1, only a smaller subset of particles was used for the final focused refinement of this region. Both focused refinements yielded reconstructions with well-resolved CSA-DDB1. This CSA-DDB1 density was then used to resample these maps onto the map of CSA-DDB1-CSB reconstructed from all particles with stably bound TCR proteins. The crystal structure of the CUL4A-RBX1 complex (PDB code: 4A0K 14) was fitted into the corresponding focused refined maps, followed by several rounds of flexible fitting in Namdinator 60 and real-space refinement in PHENIX 58. The β-propeller B of DDB1 was manually adjusted in Chimera and Coot for both CRL4CSA conformations.
The model of Pol II-CSB-CSA-DDB1-UVSSA-SPT6-PAF was combined with CUL4A-RBX1 in Coot, and the complete models were real-space refined against the corresponding composite maps in PHENIX using secondary-structure, base-pairing and base-stacking restraints. For Fig. 3, full-length RBX1 was modelled on the basis of a CUL4A-RBX1 structure (PDB code: 2HYE) 62 owing to the lower map quality in this region. The E2 enzyme-donor ubiquitin complex was not present in the sample and was modelled on the basis of an RNF4 RING-UbcH5a-ubiquitin structure (PDB code: 4AP4) 63. In structures containing a CSB ATPase-deficient mutant, ATPase lobe 2 of CSB is very flexible. Because the complex was incubated with ATP, the structure likely contains a mixture of empty and ATP-bound CSB molecules, corresponding to both pre-translocated and post-translocated states of CSB. Final models were validated in MolProbity 64, and figures were generated with Chimera 54 and ChimeraX 65.
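Several of the steps above refine models against composite maps assembled from focused refined maps. The exact combination rule is not stated here. One simple scheme, offered as an assumption rather than the authors' procedure, is a voxel-wise weighted average of maps that have already been resampled onto one common grid and scaled comparably, with each map weighted by its (possibly soft-edged) focus mask. A voxel-wise maximum is a common alternative. The function name is hypothetical.

```python
import numpy as np


def composite_map(maps, masks):
    """Voxel-wise, mask-weighted average of maps sampled on one common grid.

    maps, masks: sequences of equally shaped arrays; masks hold weights in
    [0, 1] (soft edges allowed). Voxels covered by no mask are set to 0.
    """
    maps = np.asarray(maps, dtype=float)
    masks = np.asarray(masks, dtype=float)
    if maps.shape != masks.shape:
        raise ValueError("each map needs a mask of the same shape")
    weight = masks.sum(axis=0)                 # total mask weight per voxel
    weighted = (maps * masks).sum(axis=0)      # mask-weighted density sum
    return np.divide(weighted, weight, out=np.zeros_like(weighted), where=weight > 0)
```

Where two masks overlap, the average smooths the seam between regions. This only works if the input maps were brought onto the same grid and a similar intensity scale beforehand, for example by resampling on a shared, well-resolved region as described for CSA-DDB1 above.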

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The electron density reconstructions and structure coordinates were deposited in the Electron Microscopy Data Bank (EMDB) and the PDB under the following accession codes: EMDB-13004 and PDB 7OO3 for structure 1, EMDB-13009 and PDB 7OOB for structure 2, EMDB-13010 and PDB 7OOP for structure 3, EMDB-13015 and PDB 7OPC for structure 4, and EMDB-13016 and PDB 7OPD for structure 5. The crosslinking mass spectrometry data and the ubiquitin mapping data have been deposited to the ProteomeXchange Consortium via PRIDE with the dataset identifier PXD025328.

Extended Data Fig. 1 | Biochemical characterisation of TCR factors. a, In vitro transcription over an arrest sequence in the presence of TCR factors. The ratio of bypass to arrest product band intensities was plotted for triplicate measurements. Data are presented as mean values ± SD. b, In vitro transcription over an arrest sequence in the presence of CSB or the CSB-CSA-DDB1 complex. The experiment was repeated twice independently with similar results. The bar graph shows the mean of duplicate measurements. c, ATPase assay monitoring CSB activity in the presence of increasing amounts of CSA-DDB1. The experiment was repeated twice independently with similar results. d, ATPase assay monitoring stimulation of CSB activity in the presence of a Pol II elongation complex, TCR factors and PAF. Analysis is shown in h. e, Analytical size-exclusion chromatography of CSA-DDB1, CSB, UVSSA, CSA-DDB1-CSB-UVSSA and Pol II-CSA-DDB1-CSB-UVSSA complexes. The latter two samples were analysed by SDS-PAGE, which confirmed complex composition and purity. The experiment was performed once for individual factors and at least three times for the complexes. f, In vitro transcription over an arrest sequence in the presence of CSB or the CSB mutant F796A. The experiment was repeated twice independently with similar results. The bar graph shows the mean of duplicate measurements.
g, ATPase assay monitoring the activity of the CSB mutant F796A alone, in the presence of bubble DNA and in the presence of a Pol II elongation complex (EC). Analysis is shown in h. h, Summary of ATPase assay results. The rate of ATP hydrolysis was plotted for triplicate measurements. Colour code as in panels d and g. Data are presented as mean values ± SD. For original gel scans and graph data associated with the

Extended Data Fig. 2 | Crosslinking mass spectrometry interaction networks. a, Crosslinking mass spectrometry interaction network within the Pol II-CSA-DDB1-CSB-UVSSA complex after crosslinking with BS3. (right) Crosslinks with a score above 3 that were detected at least twice are shown. (left) Crosslinks were mapped onto the Pol II-CSA-DDB1-CSB structure. Coloured rods connecting crosslinked residues represent permitted (blue) or non-permitted (red) crosslinking distances. 89% of mapped crosslink sites fall within the permitted crosslinking distance of 30 Å. The 11% of crosslinks that violate this distance likely result from complex flexibility or technical errors. The histogram shows the number of crosslinks detected at each crosslinking distance. b, Crosslinking mass spectrometry interaction network within the Pol II-CSA-DDB1-CSB-UVSSA complex after crosslinking with EDC. (left) Crosslinks with a score above 3 that were detected at least twice are shown. (right) Crosslinks were mapped onto the Pol II-CSA-DDB1-CSB structure. 83% of mapped crosslink sites fall within the permitted crosslinking distance of 20 Å. The 17% of crosslinks that violate this distance likely result from complex flexibility or technical errors. The histogram shows the number of crosslinks detected at each crosslinking distance. c, BS3 crosslinks with a score above 3 that were detected at least twice, mapped onto Pol II. The Pol II surface area within a 30 Å radius of each crosslink site is coloured as a protein footprint.
d, BS3 and EDC crosslinks used to identify the CSA-interacting motif (CIM) in CSB.

Extended Data Fig. 7 | Cryo-EM analysis of the Pol II-CSB-CSA-DDB1-UVSSA-SPT6-PAF complex (Structure 3). a, Processing tree. The number of particles in each class is reported above the density. Densities used for further processing are coloured as in Fig. 2c. b, Final composite map created from the focused refined maps. c, Local resolution estimate for the composite map. d, Fourier shell correlation plots for all focused refined maps and the composite map. e, Angular distribution plot for the high-resolution Pol II class used as a starting point for focused classifications.

Extended Data Fig. 8 | Additional analysis of ECTCR. a, Analytical size-exclusion chromatography of EC* (left) and ECTCR (right) in the presence of RTF1. Peak fractions were analysed by SDS-PAGE. RTF1 co-elutes with EC* in stoichiometric amounts but with ECTCR only sub-stoichiometrically, indicating that RTF1 associates more weakly with ECTCR than with EC*. The experiment with EC*-RTF1 was performed once and that with ECTCR-RTF1 twice. For gel source data, see Supplementary Fig. 1. b, Modelling shows clashes between UVSSA and extended downstream DNA, and between SPT6 and extended upstream DNA. This suggests that the DNA, and/or SPT6 and UVSSA, are repositioned to some extent when the DNA is longer.
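The percentages quoted in Extended Data Fig. 2 are fractions of mapped crosslinks whose residue-to-residue distance in the model falls under a linker-specific cutoff: 89% within 30 Å for BS3 and 83% within 20 Å for EDC. The analysis here was done with XlinkAnalyzer. The sketch below is not its implementation but a minimal re-statement of the calculation. It assumes one representative coordinate per residue (for example, the Cα atom) and skips crosslinks with a residue absent from the model, so that only mapped crosslinks are counted. The residue keys and the function name are hypothetical.

```python
import numpy as np


def crosslink_distance_check(coords, crosslinks, max_distance):
    """Distances of mapped crosslinks and the fraction within max_distance.

    coords: dict mapping a residue key such as ("A", 12) to an (x, y, z)
        position in Å (assumed here to be one atom per residue, e.g. Cα).
    crosslinks: iterable of (key1, key2) residue pairs.
    Pairs with a residue missing from the model are skipped.
    """
    mapped = [(a, b) for a, b in crosslinks if a in coords and b in coords]
    if not mapped:
        return np.array([]), float("nan")
    dists = np.array([
        np.linalg.norm(np.asarray(coords[a], float) - np.asarray(coords[b], float))
        for a, b in mapped
    ])
    # Fraction of mapped crosslinks that satisfy the distance cutoff.
    return dists, float(np.mean(dists <= max_distance))
```

The returned distances can be passed to `np.histogram` to reproduce a distance histogram like the ones in panels a and b. Calling the function with `max_distance=30.0` or `20.0` gives the BS3-style or EDC-style satisfaction fraction.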

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Data collection: SerialEM 3.8 beta 8.
Data analysis: RELION 3.0 beta-2, UCSF Chimera 1.13, UCSF ChimeraX v0.8, Coot 0.9, Warp v1.0.7, PHENIX 1.18, cryoSPARC 2.14.2, Prism 9, ImageJ version 1.47v, MolProbity 4.5.1, XlinkAnalyzer version 1.1.

Data
The electron density reconstructions and structure coordinates were deposited with the Electron Microscopy Data Bank (EMDB) and with the Protein Data Bank (PDB) under the following accession codes: PDB code 7OO3 and EMDB-13004 for Structure 1, PDB code 7OOB and EMDB-13009 for Structure 2, PDB code 7OOP and EMDB-13010 for Structure 3, PDB code 7OPC and EMDB-13015 for Structure 4, and PDB code 7OPD and EMDB-13016 for Structure 5. The crosslinking mass spectrometry data and the ubiquitin-mapping data have been deposited to the ProteomeXchange Consortium via PRIDE with the dataset identifier PXD025328.