Transcription by DNA-dependent RNA polymerases (RNAPs) is the first step in the expression of the genome in all forms of life. Eukarya use three structurally related nuclear multi-subunit RNAPs (Pol I, Pol II and Pol III) to transcribe distinct subsets of genes1. They cooperate with distinct sets of transcription factors (TFs) that enable recruitment to specific classes of promoters. As an exception, only the TATA-box binding protein (TBP) is common to all three polymerases and has hence been termed the ‘universal transcription factor’2.

The mechanism of Pol II transcription, which accounts for the production of cellular messenger RNA (mRNA), has been studied in great detail. Transcription starts with the formation of the pre-initiation complex (PIC), which melts the promoter, enforces exact positioning of the transcription machinery, and defines the template strand. In the subsequent initial transcription phase, the transcript length of 5–7 nucleotides marks a decision point, beyond which Pol II either transitions into processive RNA synthesis3,4 or aborts transcription. Passing the decision point is referred to as promoter escape4,5 and is aided by an energy-loaded transition state in which melted downstream DNA is ‘scrunched’5,6 into the polymerase. Scrunching is an essential mechanism for start site selection and promoter escape that appears to be conserved from phages7,8 through bacteria9,10 to eukaryotes5,6,11.

Most DNA viruses make use of the Pol II transcription machinery of the host to express their genome. The poxviruses, which cause smallpox in humans and various zoonoses, are a remarkable exception12,13,14. They multiply exclusively in the cytoplasm of infected cells and thus depend on their own set of factors ensuring gene expression and replication. Studies on the prototypical vaccinia virus identified a virus-encoded multi-subunit RNA polymerase (vRNAP) associated with factors that ensure the production of polyadenylated and m7G-capped mRNA15,16,17,18,19. More recently, cryo-EM structures gave insight into the atomic details of vRNAP complexes and their mechanisms of transcription elongation and transcription-coupled capping20,21. The core vRNAP enzyme, consisting of subunits Rpo147, Rpo132, Rpo35, Rpo30, Rpo22, Rpo19, Rpo18 and Rpo7, is evolutionarily related to eukaryotic RNAPs. Strong idiosyncrasies, however, are apparent for viral factors that are associated with vRNAP during mRNA production22,23,24,25. How these factors relate to TFs of nuclear RNAPs is largely unknown. In vivo, the core vRNAP associates with five additional virus-encoded proteins and one host factor: transcription factor Rap94 with partial homology to TFIIB3,26,27 and essential for targeting vRNAP to the virion28, the capping enzyme D1/D1229, the helicase NPH-I30, the core protein E11, the early transcription factor VETF consisting of subunits VETFl and VETFs and cellular tRNAGln. This unit, termed complete vRNAP, is necessary and sufficient to bind early promoters, which consist of a single, A/T-rich consensus sequence, termed the critical region (CR) located upstream of the transcription start site (TSS)31. Following early promoter binding, the complete vRNAP enables the entire transcription process, including initiation, elongation and termination.

Here, we have used complete vRNAP to reconstitute a range of complexes that represent vaccinia initial transcription states from the pre-initiation phase to promoter escape. The structural analysis of these complexes uncovers principles of promoter recognition and melting, template strand capture and an unprecedented helicase-assisted upstream DNA scrunching mechanism prior to promoter escape. Our study hence not only unravels the mechanisms of poxviral gene expression but also helps to understand thus far unexplained universal principles of transcription.


Reconstitution of transcription initiation complexes

We used complete vRNAP purified from HeLa cells infected with an engineered vaccinia strain to reconstitute FLAG-tagged21 vRNAP pre-initiation and initially transcribing complexes. The transcriptionally active complete vRNAP was assembled on a vaccinia early promoter scaffold consisting of the CR, a non-complementary bubble including the TSS (+1) and a template cassette lacking G nucleotides (Fig. 1a). Complex formation was observed in the presence of nucleoside triphosphates (NTPs) upon incubation of complete vRNAP with the scaffold (Fig. 1b). A large-scale reconstitution of DNA-bound vRNAP complexes was separated by gradient centrifugation (Extended Data Fig. 1a) and transferred onto holey carbon grids, and three cryo-EM datasets were collected (Extended Data Fig. 1b,c). After extensive three-dimensional (3D) classification, several distinctive particle classes could be separated that represented vRNAP complexes in transcription stages ranging from pre-initiation to co-transcriptional capping.

Fig. 1: Structure of the vaccinia PIC.

a, Synthetic DNA scaffold used for complex formation. b, Band shift assay (native gel electrophoresis) of complete vRNAP in the presence of the radiolabeled scaffold, as specified in a. c, Domain structure of VETFs, VETFl and Rap94. d, Overall structure of the PIC in two orthogonal views. The core polymerase is depicted in gray as a solvent-accessible surface. usDNA, upstream DNA; dsDNA, downstream DNA. e, Transparent isosurface of the DNA cryo-EM density, filtered by Gaussian blur with 1.5σ standard deviation, and the DNA model, shown in cartoon style. Approximated helix axes of the different duplex DNA sections are indicated, and the translation of the helix axes of the two duplex DNA regions adjacent to the IMR is denoted. This view is rotated by 20° relative to d. f, Detailed view of VETF and promoter DNA. Two views of VETF with the bound promoter within the PIC are displayed. For easier visualization, the core polymerase is hidden. g, Complete vRNAP residual density (EMD 4868, gray transparent isosurface) docked with the VETFl structure and shown along with the complete vRNAP model (PDB 6RFL) in cartoon representation (color code as in Figs. 13 and ref. 21 for the complete vRNAP-specific factors). The core polymerase is depicted in gray as a solvent-accessible surface. The predominantly disordered interface of VETFl and the tRNA aminoacyl stem is marked with an orange dotted line.


Cryo-electron microscopy structure of the PIC

Biochemical studies had previously shown that vRNAP, VETF and Rap94 are required for early transcription initiation26,32,33. We identified one particle class in our reconstitution that contained these factors along with the DNA scaffold, and thus represented the bona fide PIC32,33. Its single-particle reconstruction displayed an overall resolution of 3.0 Å and diffuse density for DNA and VETF. Signal subtraction and focused refinement resolved the VETF–DNA subcomplex (Extended Data Fig. 1d–i and Table 1). The density was docked with the core vRNAP model and the VETFl and VETFs chains and parts of Rap94 were traced de novo, allowing complete modeling of the PIC (Fig. 1c,d). Within the PIC, the promoter is positioned at the distal edge of the polymerase cleft. The upstream DNA contacts the protrusion domain of the polymerase subunit Rpo132, directly adjacent to the C-terminal domain (CTD) of Rap94. The downstream promoter region interacts with the vRNAP core through positions on the clamp head (Extended Data Fig. 2a). The melted promoter region is predominantly disordered but could be visualized with mild Gaussian filtering (Fig. 1e). It localizes centrally above the opening of the cleft, forming a second contact zone with the clamp head (Extended Data Fig. 2a). Both DNA strands appear only minimally separated within the bubble region. The latter joins the adjacent double-helical upstream and downstream sections at a 100° angle accompanied by a 25-Å translational shift of the helix axes (Fig. 1e). We thus conclude that the DNA is in the initially melted state.
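
As a geometric aside, the two quantities reported for the bubble region (the angle between the flanking helix axes and their translational offset) can in principle be derived from two fitted axis lines. The following Python sketch is illustrative only: the function names and the example coordinates are assumptions chosen so that the result reproduces the reported 100° and 25 Å; they are not the procedure or the data used in this study.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def angle_between_axes(u, v):
    """Angle in degrees between two helix-axis direction vectors."""
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

def axis_offset(p, u, q, v):
    """Shortest distance between the lines p + s*u and q + t*v."""
    w = tuple(b - a for a, b in zip(p, q))
    n = _cross(u, v)
    if _norm(n) < 1e-9:
        # Parallel axes: distance from point q to the line through p.
        return _norm(_cross(w, u)) / _norm(u)
    return abs(_dot(w, n)) / _norm(n)

# Hypothetical axes (coordinates in Å), NOT taken from the deposited model:
# the downstream axis is rotated by 100 degrees and lifted by 25 Å.
upstream_point, upstream_dir = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
downstream_point = (0.0, 0.0, 25.0)
downstream_dir = (math.cos(math.radians(100)), math.sin(math.radians(100)), 0.0)

bend = angle_between_axes(upstream_dir, downstream_dir)
shift = axis_offset(upstream_point, upstream_dir, downstream_point, downstream_dir)
print(f"angle = {bend:.1f} deg, offset = {shift:.1f} A")
```

In practice the axis lines would first have to be fitted to the duplex segments of a model, a step that this sketch leaves out.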

Table 1 Cryo-EM data collection, refinement and validation statistics

Of note, neither the B-cyclin domain nor the B-homology region of the early transcription factor Rap94 establishes direct DNA contacts in the PIC (Fig. 1d and Extended Data Fig. 2a). However, on the opposite side of the core vRNAP, VETFs and VETFl contact the DNA in the distal upstream and downstream promoter regions, respectively (Fig. 1f and Extended Data Fig. 2b,c). Therefore, and due to the absence of contacts in the initially melted region (IMR), the VETF heterodimer appears to be anchored like a bridge on the upstream and downstream region of the promoter (Fig. 1d and Extended Data Fig. 2b).

Unique mode of DNA-binding by the VETF heterodimer

The structure of VETF allowed us to decipher the mechanisms of vRNAP recruitment to the early promoter. VETFl folds into five distinct domains, termed NTD (N-terminal domain), TBPLD, CRBD, domain 4 and CTD (C-terminal domain) (Fig. 1c,f). Despite the absence of a priori detectable sequence homology, the second domain displays a bilobal TBP fold, and hence is a TBP-like domain (TBPLD). It is located centrally above the polymerase cleft and, unlike TBP in other structures of PICs, contacts the promoter in a sequence-independent manner. Sequence-specific DNA binding in the vaccinia PIC is instead facilitated by the neighboring domain, which recognizes the CR (Fig. 1a,d,f). Based on its fold and binding mode, this module constitutes a novel type of double-stranded DNA binding domain, hence termed the critical region binding domain (CRBD). Although holding only a limited content of secondary structure elements, it gains structural rigidity through three disulfide bridges that position a 3₁₀-helix ideally for its insertion into the major groove of the DNA (Fig. 2a). The side chain-to-base contacts of this helix are the major sites for sequence-specific readout of the promoter sequence (Fig. 2b,c). Only weak bending of the DNA helix axis is introduced in this region (Fig. 2a).

Fig. 2: Structure of the VETF heterodimer.

a, VETFl CRBD binding to the upstream critical promoter region. Disulfide bridges are depicted as a stick model. b, Details of the VETFl CRBD promoter interaction. The model is depicted in stick representation and base pairs are numbered relative to the TSS. Only bases for the non-template strand are labeled, and the template strand is sequence complementary. Contacts between Tyr367 and thymidine bases at positions −18 and −17 are displayed as a transparent van der Waals (vdW) surface. The protein-DNA hydrogen-bond network is depicted as dotted yellow lines. c, Schematic of the sequence-specific interactions of the CRBD reader. The CR consensus sequence is depicted according to Yang et al.37. The vdW interactions are indicated in gray and hydrogen bonds in yellow. d, Details of VETFs binding to the upstream promoter. The intercalating ‘wedge’ residue Phe271 is shown in stick representation.

The joint structural context of TBPLD and CRBD in VETFl establishes specific contacts to the upstream promoter (Extended Data Fig. 2d). On the core vRNAP, this part of the promoter is anchored via the interaction of domain 2 of Rap94 with the NTD of VETFl (Fig. 1d). All other domains of VETFl (NTD, domain 4 and CTD) contribute to the structural backbone of VETF. Domain 4 and the CTD of VETFl make up the interface to VETFs (Fig. 1f).

The downstream promoter interacts almost exclusively with VETFs (Figs. 1d and 2d and Extended Data Fig. 2b). Only one additional pointed contact to the core vRNAP is established by the clamp head close to the TSS (Extended Data Fig. 2a). We observe a striking similarity of the first two domains of VETFs with the canonical helicase fold of chromatin remodeling SNF2-type ATPases22,34, of which INO8035 is the closest homolog. With the latter, VETFs shares, along with the vRNAP-associated transcription factor NPH-I, an extended brace helix that stably bridges the N- and C-lobe of the helicase. The intense DNA interaction of the VETFs’ helicase module is accompanied by a strong bend of the helix (Extended Data Fig. 2e). At the point of inflection, Phe271 intercalates via the minor groove, effectively disturbing the planar base-stacking over the range of roughly three base pairs on either side of the insertion site (Fig. 2d). Although melting of the two DNA strands is not observed at this position, this mechanism bears some similarity to the ‘scalpel’ method of strand-separating helicases36 (see Extended Data Fig. 2f–h for a comparison to the Pol II system).

Promoter positioning and enforcement of directionality

We next asked how the DNA contacts established by the CRBD of VETFl control the initiation process. The 3₁₀-helix of CRBD inserts into the major groove, making it the reader head of VETF (hence termed the CRBD reader, Fig. 2a,b). The CR is essentially a consensus sequence of 15 A nucleotides, interrupted by a TG dinucleotide31,37 (Figs. 1a and 2c). Arg370 and Gln375 engage in base-specific hydrogen bonding that involves the bases of the TG motif on the non-template strand and the complementary AC dinucleotide on the opposing template strand (Fig. 2b,c and Supplementary Video 1). By this means, VETFl anchors the promoter in a defined position relative to the polymerase cleft. The CR displays a high propensity for A nucleotides downstream of the TG motif (Figs. 1a and 2c). Consistent with this, the C5 methyl groups of the corresponding complementary T nucleotides at positions −18 and −17 of the template strand interact cooperatively with the reader head by stacking with Tyr367. Inverse promoter binding would imply an unfavorable contact of Tyr367 with adenine bases (Fig. 2b,c) and thus a single promoter direction is coerced. By this means, the CRBD-DNA interaction ensures (1) identification of the CR, (2) alignment of the CR relative to the polymerase cleft and (3) enforcement of transcription directionality. The CRBD is thus the main regulator of the transcription initiation process.
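
The strand asymmetry of such an A-rich, TG-containing element can be illustrated with a small sequence scan. The Python sketch below is a toy model only: the window length, the scoring rule (exactly one TG, all other positions A unless `min_a` is relaxed), the function names and the example sequence are assumptions made for illustration and are not the consensus definition or a sequence used in this study.

```python
# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def cr_like_windows(seq, width=15, min_a=13):
    """Return (start, window) pairs for windows that contain exactly one
    TG dinucleotide and at least `min_a` adenines.

    With the defaults (width 15, min_a 13) every position outside the
    TG dinucleotide must be an A; lowering `min_a` tolerates mismatches.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window.count("TG") == 1 and window.count("A") >= min_a:
            hits.append((start, window))
    return hits

# Hypothetical non-template strand: an idealized A-rich element with one TG.
example = "CCC" + "AAAAAATGAAAAAAA" + "CCC"

forward_hits = cr_like_windows(example)
reverse_hits = cr_like_windows(reverse_complement(example))
print(forward_hits)   # one hit, starting at index 3
print(reverse_hits)   # no hit: the opposite strand reads T-rich with CA
```

Because the element is not palindromic, it is detected on one strand only, which mirrors at the sequence level the single binding orientation described above.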

Asymmetric DNA binding by the TBP-like domain of VETFl

Our structure identified VETFl as a TBP-like protein (TBPLP) whose TBPLD is engaged in an intricate contact network comprising the neighboring domains of VETFl, VETFs and Rap94 (Fig. 3a). Members of the TBPLD family had previously been identified solely by means of sequence homology. However, VETFl stands apart from previously known TBPLPs because of its extremely divergent sequence, which, until now, has prevented its classification as such. Nevertheless, the structural conservation of the TBPLD is comparably high, resulting in a Z-score of 4.2 determined by PDBeFold38 when matching it to PDB entry 1TBP. To compare their structures and binding modes, we aligned the TBPLD, the upstream DNA module of VETFl (Fig. 3b), with the yeast TBP (yTBP)–TATA-box crystal structure (Fig. 3c). The TBPLD of VETFl features the characteristic saddle structure that was previously described for TBP39,40,41,42. However, the symmetry that is evolutionarily conserved in TBP43,44 appears broken. As a consequence, and unlike TBP, which contacts the TATA-box symmetrically, VETFl binds the promoter asymmetrically and sequence-independently solely through its C-terminal TBP lobe. Most strikingly, the TBPLD inserts into the DNA major groove, contrary to the canonical binding mode of TBP, which is based on minor groove insertion. In accordance with this observation, the two strictly conserved pairs of DNA-intercalating phenylalanine residues on each lobe of TBP are absent in the TBPLD39,40,41,42. Still, the TBPLD induces a pronounced DNA bend via intercalation of aliphatic, rather than aromatic, side chains (Fig. 3b). In agreement with the fundamentally different binding mode of the TBPLD, a consensus TATA-box is absent from vaccinia early promoters31.

Fig. 3: Comparison of the TBP-like domain from vaccinia VETFl with yTBP.

a, Schematic of the VETF–promoter interactions. b, The TBPLD of VETFl in two orthogonal views. Residues intercalating between the nucleobases are depicted in stick model form. c, Structure of the yTBP protein bound to a synthetic TATA-box hairpin DNA oligomer64 (PDB 1YTB) in two orthogonal views corresponding to the protein orientation of the VETFl TBPLD as seen in b.

Rearrangement of the complete vRNAP into the PIC

Complete vRNAP is the predominant vRNAP complex found in infected cells and is necessary and sufficient to execute viral early transcription. Accordingly, we have previously speculated that this unit becomes incorporated into virions as a pre-assembled unit to promote the restart of transcription in the next infection cycle21. To investigate the transformation of complete vRNAP into the PIC, we compared both structures and their cryo-EM reconstructions. The VETF heterodimer is already present in the complete vRNAP; however, defined density could only be observed for the CRBD of VETFl, while the remaining parts were mobile. Under the assumption that the adjacent TBPLD is flexibly joined to the CRBD, we were able to dock the diffuse residual density in the vRNAP reconstruction with the VETFl coordinates extracted from the PIC model, resulting in reasonable overlap. In the resulting structure (Fig. 1g) VETFl displays a flexible interface to tRNAGln. A comparison with the PIC structure reveals major reconfigurations, including the release of all associated factors from complete vRNAP except for the VETF heterodimer and Rap94 (Supplementary Video 1). This underlines the importance of complete vRNAP as a pre-formed early transcription unit and the high plasticity of vaccinia transcriptional complexes (see also Supplementary Video 1 for a summary of core aspects of the PIC).

Structure of the late PIC

The structural transition described above explains how complete vRNAP becomes recruited to the viral early promoter to form the PIC. We next solved the structure of vRNAP particle classes that represent bona fide transcription stages following the pre-initiation phase. Based on biochemical evidence, such particles are predicted to be devoid of VETF but contain Rap94. Particles of class 1, subclass 2 (Extended Data Fig. 3a), which yielded a reconstruction at a resolution of 3.0 Å (Extended Data Fig. 1b–d and Table 1) fulfilled this criterion. The density could be docked with the complete vRNAP model21. Disordered density corresponding to DNA is visible upstream next to the Rap94 CTD and within the downstream DNA channel. These sites roughly coincide with the DNA anchor points on the core vRNAP observed in the PIC (compare Fig. 1d). However, no density for the DNA transcription bubble or nascent RNA was detected in the active cleft (Fig. 4a). Instead, we found well-defined density for the highly phosphorylated stretch within the C terminus of Rpo30 (termed the phospho-peptide domain, PPD; Fig. 4b). It is in a similar conformation as in the complete vRNAP21 and follows the path of the template- and non-template strand in the elongation complex (EC). This allows its pairing with the B-reader of Rap94 (Fig. 4a,b) and enables single-strand capture at later stages (see ‘PPD assisted single-strand capture and formation of the initially transcribing complex’ below). We therefore conclude that this particle represents a late state of the PIC (lPIC) in which VETF has been expelled, the melted promoter has been handed over to the core vRNAP, but transcription has not yet been initiated.

Fig. 4: Structure of the late PIC.

a, Model of the lPIC with density for the bound DNA oligomer shown as a blue surface and density for the PPD shown in transparent gold. b, Domain structure of the bound transcription factors. Disordered regions are marked by hatched boxes.

PPD assisted single-strand capture and formation of the initially transcribing complex

Next, we investigated the structural basis of lPIC conversion into an initially transcribing complex (ITC). Three vRNAP particle classes yielded reconstructions that were identified as different conformations of the ITC based on their composition and promoter positioning (ITC1–3, Extended Data Fig. 3a–d and Table 1). The exact location of the polymerase on the promoter could be determined, because its downstream blunt end was readily visible in the density (Extended Data Fig. 4a). In contrast to the lPIC, we observed ordered density for DNA in the downstream DNA channel and for a DNA/RNA hybrid above the active site (Fig. 5a). The PPD of Rpo30, which occupied the position of the DNA/RNA hybrid in the lPIC, has been displaced by the template strand. Consequently, the B-homology region has become mobile and is not visible in the density (Fig. 5b). No density for upstream DNA was identified. The three ITC complexes superimposed well but differed in the positioning of the DNA within the downstream DNA channel (Fig. 5a) and the state of the clamp (Fig. 7b). For ITC3, downstream DNA density was located in a shallower position and was comparably less ordered. In the ITC1 particle, the clamp is in a closed conformation with the DNA bound firmly and deep in the downstream DNA channel. ITC2 and ITC3 display an open clamp conformation and the downstream DNA appears mobile and bound in a shallower position. No substantial differences between the three ITC complexes were discernible with regard to the DNA/RNA hybrid region. Thus, the three ITC structures inform on the conformational flexibility of the ITC and, in concert with the lPIC structure, on the template-strand capture mechanism.

Fig. 5: Three structures of initially transcribing complexes.

a, Model of ITC state 1 shown with an overlay of the downstream DNA from ITC conformation 2 and ITC conformation 3. The core polymerase is depicted in gray as a solvent-accessible surface. Downstream DNA density for ITC conformation 3 was weak, so the DNA model at this position was omitted in the corresponding PDB entry. b, Domain structure of the bound transcription factors. Disordered regions are marked by hatched boxes.

ATP-dependent upstream promoter scrunching

During 3D classification, one particular class stood out because it comprised particles considerably larger than the ITC (Extended Data Fig. 5a). After a further round of focused classification of these particles on the observed extra density, followed by multibody refinement, a reconstruction was obtained that allowed the construction of a complete model (Fig. 6a, Extended Data Fig. 5b–d and Table 1). This complex was classified as a late ITC (lITC), based on the positions of the blunt ends of the upstream and downstream promoter-DNA segments that are visible in the density (Extended Data Fig. 4b) as well as on the presence of an RNA/DNA hybrid. Except for Rap94, the core vRNAP was in a conformation similar to that observed in the ITC complexes. The path of the downstream DNA best fitted that observed in the ITC3 particle, indicating loose binding. The downstream blunt end of the DNA scaffold had advanced roughly five base pairs in the downstream direction compared to the ITC (Extended Data Fig. 4a). Massive extra density above the cleft was unambiguously attributed to upstream DNA-bound NPH-I, and the NTD and B-cyclin domain of Rap94 (Fig. 6a). Strikingly, the Rap94 B-homology region, the NTD and adjacent linkers appeared entirely reconfigured in comparison to other vRNAP complexes (Extended Data Fig. 4d,e) and the whole path of the Rap94 chain was visible (Fig. 6b). We also note that the path of the upstream DNA in the lITC is fundamentally different from that observed in the vaccinia PIC and in the ITC of Pol II45.

Fig. 6: Structure of the lITC.

a, Model of the lITC in two orthogonal views. The core polymerase is depicted in gray as a solvent-accessible surface. b, Domain structure of the bound transcription factors. Disordered regions are marked by hatched boxes. c, Detailed view of NPH-I bound to the upstream promoter DNA. The ‘wedge’ residue Phe273 at the center of the strand-separating mechanism is indicated.

The blunt ends of the DNA promoter scaffold are visible in the EM density of the lITC (Extended Data Fig. 4b), thus allowing us to determine the position of vRNAP relative to, and the size of, the transcription bubble (Extended Data Fig. 4c). Strikingly, the upstream end of the scaffold can only be accommodated within the lITC under the assumption of massive promoter scrunching. This includes 13 base pairs upstream of the artificial non-complementary region of the promoter scaffold that have been additionally melted when compared to the ITC (Extended Data Fig. 4c). It is likely that this condition enables promoter escape and hence contributes to the transition of the initiation phase into productive elongation (Supplementary Video 2).


In this study, we describe six vRNAP structures that represent snapshots of the poxviral transcription initiation phase. When viewed together, a comprehensive mechanistic picture of the early events during vaccinia transcription emerges.

The structure of the vaccinia PIC in the initially melted state provides insight into poxvirus early promoter identification and binding. The arc-shaped VETF heterodimer spans the polymerase cleft and upstream and downstream promoter elements and thus allows precise insertion of the TBPLD at the site of initial melting. The upstream contact to the promoter is established by VETFl, and its CRBD is the decisive element for its sequence-specific recognition. The CRBD recognizes the critical region of the promoter through a thus far unknown DNA-binding domain, which is stabilized by three disulfide bridges. This is in agreement with two early studies observing DNase protection of the −15 to −30 promoter region46 as well as crosslinking of VETFl to the upstream and of VETFs to the downstream promoter region32. Cystine formation in the CRBD may be introduced by vaccinia-encoded enzymes47, as potential host factors for this task are confined to the endoplasmic reticulum. The TBPLD cooperates with the CRBD in upstream promoter binding and introduces a sharp DNA bend, which probably generates the nucleation site for promoter melting. Strikingly, the TBPLD of VETFl displays an asymmetric DNA binding mode. This sharply contrasts with the canonical, symmetric DNA binding mode observed in all TBP–DNA complexes solved so far, including PIC complexes of the nuclear polymerases. Our findings could help the understanding of the dual nature of TBP48, which, in its canonical binding mode, recognizes the TATA-box. Evidently, TBP is capable of an alternate, sequence-independent mode of action when directing the transcription machinery to TATA-less promoters49,50. Because vaccinia early promoters do not contain a TATA-box, an attractive explanation for the deviant binding mode of the TBPLD is that its orientation in the vaccinia PIC mirrors the alternate function of TBP at TATA-less promoters.
Asymmetric DNA binding by TBP has been proposed in the context of Pol I51 and may also occur in other TBPLDs43,52,53. To the best of our knowledge, no other structures of TBP-like domains exist in the database, and our structure might therefore contribute to the general understanding of this domain family. In this regard, it is interesting to note that a recent cryo-EM study reports a binding mode of TBP on a TATA-less promoter similar to that on TATA-containing promoters54. The behavior of the VETFl TBPLD might therefore be fundamentally different from that of genuine TBP.

Although the order of events cannot definitely be determined given the current state of knowledge, we propose the following mechanism for vaccinia early promoter melting based on our data and prior findings for the Pol II system55 (Fig. 3a). (1) The CRBD of VETFl binds the promoter at the CR, thereby enforcing directionality (Fig. 2a,b). (2) VETFs pulls the DNA in an ATP-dependent reaction towards the vRNAP clamp and lobe, analogous to the XPB helicase in the Pol II system56 (compare Extended Data Fig. 2g to Extended Data Fig. 2h). (3) The clamp closes tightly around the DNA (Fig. 7b), thereby shaping its path57. (4) The promoter DNA becomes underwound and bent by 80° towards the C-lobe of VETFs, exposing bases for an interaction with the latter (Fig. 2d). (5) The tip of the C-terminal lobe of the VETFl TBPLD intercalates upstream of the IMR, inducing a second sharp bend in the promoter (Fig. 3b). (6) This bend triggers the initial melting event at the TSS, and the IMR absorbs the negative twist of the adjacent DNA segments.

Fig. 7: Transition of complete vRNAP to the PIC, and a model for early promoter recognition and opening.

a, Schematic of vaccinia early promoter recognition and the opening mechanism (color code as in Fig. 1). b, Plot of clamp closure versus transcription state. CCC, co-transcriptional capping complex.

A structure-based comparison of vaccinia and eukaryotic transcription systems reveals common principles but also obvious differences in the bound transcription factors. Similar positioning of the promoter relative to the core polymerase is observed in all PICs (likewise, the positions of the B-homology region of Rap94 in the vaccinia PIC and the corresponding domain of TFIIB in the Pol II PIC overlap45,58; compare Extended Data Fig. 2g to Extended Data Fig. 2h). However, whereas TFIIB directly contacts the promoter, the B-homology region in Rap94 does not bind DNA (Fig. 1d). Furthermore, some features in the distal section of the DNA path appear to be conserved. A common principle might be the binding of a helicase transcription factor to the downstream promoter. It appears plausible that the helicase domains of VETFs (Extended Data Figs. 2e and 6) and of the TFIIH subunit XPB (Ssl2 in yeast, Extended Data Figs. 2f and 6) are functional counterparts59.

In contrast to a recent study describing a PIC intermediate of Pol II immediately prior to the initially melted state55, we do not observe underwinding of the DNA duplex in the vaccinia PIC. A possible explanation for this is that the IMR has absorbed a previous negative twist during the melting process. At the promoter upstream side, we noticed a topological relationship of the VETFl–promoter complex and the positioning of the Rap94 CTD with the TBP/TFIIF module on the DNA in the Pol II PIC (Extended Data Fig. 2h). This notion is corroborated by the fact that, despite their fundamentally different binding modes, both TBP and the VETFl TBPLD induce a strong bend of the DNA. Thus, the architecture of the vaccinia PIC differs fundamentally from its nuclear counterparts. Although the catalytic cores of all multi-subunit polymerases are largely homologous, only basic architectural features are conserved with respect to the positioning of the early transcription factor. The arrangement of the VETFl TBPLD is so far unprecedented and unexpected. Our studies further reveal that VETF and Rap94 perform functionalities of TBP and TFIIB. The three conformationally different ITC structures mirror the flexibility of the transcription machinery in the initially transcribing phase and may coincide with non-processive RNA synthesis and TSS search, as observed in the Pol II system60.

During transition to the lITC, a dramatic reorganization of the transcription machinery takes place. This includes the recruitment of NPH-I, a re-routing of the upstream DNA path and widening of the transcription bubble, extending from promoter position +12 to −22 (Extended Data Fig. 4c). To accommodate the re-routed upstream DNA, the cyclin domain of Rap94 undergoes a conformational change (Extended Data Fig. 4d) accompanied by other changes in the Rap94/core enzyme interaction (Extended Data Fig. 4e). The only plausible explanation for the widening of the transcription bubble is that the NPH-I helicase motor has actively melted and scrunched upstream DNA duplex into the core vRNAP61,62. By this means, NPH-I probably assists promoter escape by adding the free energy of ATP hydrolysis to the generation of an energy-rich transcription intermediate11. Although, for Pol II, only downstream promoter scrunching has so far been observed5,6,11, vRNAP employs a novel mechanism in which downstream and upstream promoter scrunching are combined. Strikingly, a mechanism in which the helicase transcription factor TFIIH injects free energy from ATP hydrolysis into the ITC during TSS scanning has been postulated for Pol II10. In addition to its function as a helicase motor, NPH-I plays an obvious role in the statics of the transcription bubble. Both the 80° bend of the DNA (Extended Data Fig. 2e) and the insertion of the ‘wedge’ residue Phe273 (Fig. 6c) stabilize the upstream fork point of the transcription bubble in the lITC. Processive vRNAP elongation complexes can be assembled in the absence of Rap94 in vitro20. In vivo, such complexes are found associated with the latter28,63. Thus, Rap94 may ensure the efficient recruitment of NPH-I to ECs stalled at pause sites to enable readthrough62, and the resulting vRNAP complex might be structurally similar to the lITC (Fig. 6a).
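
For orientation, the extent of a bubble given in promoter coordinates can be converted into a number of positions with a short calculation. The Python sketch below assumes the usual numbering convention in which the TSS is +1 and there is no position 0, and it assumes that both end positions are counted; the function name is hypothetical, and whether the stated bubble boundaries are meant inclusively is not specified in the text.

```python
def promoter_span(upstream, downstream):
    """Count promoter positions from `upstream` to `downstream`, inclusive.

    Assumes TSS-relative numbering without a position 0
    (..., -2, -1, +1, +2, ...).
    """
    if upstream == 0 or downstream == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if upstream > downstream:
        raise ValueError("upstream position must not exceed downstream position")
    span = downstream - upstream + 1
    if upstream < 0 < downstream:
        span -= 1  # correct for the missing position 0
    return span

print(promoter_span(-22, 12))  # 34 positions under these assumptions
```

Under these assumptions, a bubble reaching from −22 to +12 covers 34 positions.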

After assignment of our structures to the transcription timeline, we propose a comprehensive model of initial transcription (Fig. 7 and Supplementary Video 2). First, complete vRNAP reconfigures to the PIC (step 1). In the PIC, vRNAP-bound VETF has selected, aligned, positioned and melted the promoter DNA, and the clamp is in a tight conformation (Fig. 7b). Upon handover of the melted promoter to the core polymerase, VETF leaves the complex, giving rise to the lPIC (step 2). Here, the promoter is supported upstream by the CTD of Rap94 and is anchored in the downstream DNA channel. The single-stranded DNA region is dynamic in this phase and therefore not visible (Fig. 4a). Through the interaction with the PPD of Rpo30, the B-homology domain of Rap94 is kept in an initiation-ready conformation. Template-strand capture proceeds with the displacement of the PPD, which might be driven by the pronounced electronegative charge of DNA interacting with the positively charged active site region of vRNAP. After single-strand capture (step 3), the B-reader scans the template strand for the TSS in a manner analogous to that which has been observed for Pol II4. Once the TSS is located, the B-homology domain becomes mobile and RNA synthesis commences (step 4). This phase is highly dynamic, as documented by three ITC structures deviating in the state of the clamp (Fig. 7b) and the positioning of the downstream DNA in the downstream DNA channel (Fig. 5a). The vRNAP promoter escape is accompanied by recruitment of NPH-I, a large-scale remodeling of Rap94, and major changes to the path of the upstream DNA (step 5). In the lITC complex (Fig. 6a), NPH-I acts as a strand-separating helicase, widens the transcription bubble, defines its upstream fork point, and shapes the path of the single-stranded template and non-template DNA (Fig. 6a,c). Transition to a processive EC (step 6) triggers contraction of the transcription bubble, mobilization of the upstream DNA duplex and loss of NPH-I. 
Alternatively, abortive initiation might lead to re-initiation via re-recruitment of the Rpo30 PPD (step 6b). We note that all vRNAP complexes of the transcription initiation phase contain the core polymerase in a virtually constant conformation. Still, each transition of the transcription complexes is accompanied by changes of the clamp position (Fig. 7b).

Our study provides detailed mechanistic insights into the initial phase of poxvirus transcription. Some features observed in the presented structures are poxvirus-specific, such as the unique promoter recognition by the CRBD of VETFl. Others, such as the hitherto unknown behavior of a TBP-like protein, the observation of the initial melting event and the discovery of an ATP-dependent scrunching mechanism, might be of relevance for the general understanding of multi-subunit RNAPs.


Generation of recombinant vaccinia virus GLV-1h439 and vRNAP purification

GLV-1h439 was derived from GLV-1h68 as described previously21. For vRNAP purification, HeLa S3 cells were cultured in DMEM containing 10% FBS at 37 °C in the presence of 5% CO2. Cells were grown to 90% confluence and infected with purified GLV-1h439 at a multiplicity of infection of 1.2. After 24 h, the infected cells were pelleted and resuspended in lysis buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 1.5 mM MgCl2, 0.5% (vol/vol) NP-40, 1 mM DTT), supplemented with complete EDTA-free protease inhibitor cocktail (Sigma-Aldrich). The soluble supernatant of the cellular extract was incubated for 3 h at 4 °C with anti-FLAG agarose beads (Sigma-Aldrich). The beads were washed four times with buffer containing 50 mM HEPES pH 7.5, 150 mM NaCl, 1.5 mM MgCl2, 0.1% (vol/vol) NP-40, 1 mM DTT, equilibrated with elution buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 1.5 mM MgCl2 and 1 mM DTT) and eluted with a 200 µg ml−1 solution of 3× FLAG peptide (Sigma-Aldrich). The eluate was analyzed by SDS polyacrylamide gel electrophoresis (SDS–PAGE). Approximately 50 µg of purified vRNAP was obtained from a single 15-cm Petri dish of infected HeLa S3 cells.

Reconstitution of promoter-bound vRNAP complexes

A synthetic double-stranded DNA oligonucleotide scaffold mimicking the vaccinia virus early promoter region was generated by annealing of two partially complementary DNA oligonucleotides (Fig. 1a). Annealing was performed in buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5 and 3 mM MgCl2 by heating to 95 °C for 5 min followed by slow cooling to room temperature. The resulting double-stranded DNA oligonucleotide was precipitated with isopropanol and the pellet was resuspended in resuspension buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA).

For reconstitution of promoter-bound vRNAP complexes, 1 pmol of [32P]-labeled DNA promoter scaffold was incubated for 30 min at 30 °C with the indicated amount of vRNAP in the presence of 1 mM indicated NTPs. Complexes were analyzed by native gel electrophoresis (4% acrylamide and 0.13% bis-acrylamide, 25 mM Tris-HCl pH 7.4, 25 mM boric acid, 0.5 mM EDTA) at 4 °C. For large-scale reconstitution of promoter/vRNAP complexes, purified vRNAP was concentrated in a Vivaspin 10-kDa cutoff concentrator (Sartorius). A total of 400 µg of vRNAP was incubated with a 60-fold molar excess of the DNA scaffold in reconstitution buffer (50 mM NaCl, 10 mM Tris-HCl pH 7.5, 5 mM MgCl2, 1 mM DTT) in the presence of ATP and UTP (1 mM each) for 30 min at 30 °C. The mixture was separated by a 10–30% sucrose gradient centrifugation (16 h, 35,000 r.p.m., Beckman 60Ti rotor, 4 °C). Gradient fractions were collected manually and analyzed by SDS–PAGE followed by silver- and ethidium-bromide staining to visualize the proteins and DNA. The indicated fractions (Extended Data Fig. 1a) were used for cryo-EM analysis after buffer exchange against modified reconstitution buffer (100 mM NaCl, 10 mM Tris-HCl pH 7.5, 5 mM MgCl2, 1 mM DTT) in a concentrator.
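The molar-excess arithmetic behind the large-scale reconstitution can be sketched as follows. This is only an illustration: the function name is hypothetical, and the molecular mass used in the example call (1,000 kDa) is a round placeholder chosen for easy arithmetic, not a value taken from this work. Only the 400 µg of vRNAP and the 60-fold excess come from the text.

```python
def scaffold_pmol(protein_ug, protein_mass_kda, fold_excess):
    """Return the pmol of DNA scaffold needed for a given molar excess.

    protein_ug       -- mass of protein in micrograms
    protein_mass_kda -- molecular mass of the protein complex in kDa
    fold_excess      -- desired molar ratio of scaffold to protein
    """
    # 1 ug of a 1-kDa species is 1 nmol, i.e. 1,000 pmol.
    protein_pmol = protein_ug * 1000.0 / protein_mass_kda
    return fold_excess * protein_pmol


# 400 ug of vRNAP and a 60-fold excess are from the text;
# 1000 kDa is a PLACEHOLDER mass, not a measured value.
example_pmol = scaffold_pmol(400.0, 1000.0, 60.0)
print(f"scaffold required: {example_pmol:.0f} pmol")
```

With the actual mass of the complete vRNAP substituted for the placeholder, the same function gives the amount of scaffold to add.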

Cryo-electron microscopy and model building of the PIC

Following sucrose gradient purification, the indicated fractions (Extended Data Fig. 1) were pooled, diluted 1:50 with a buffer containing 10 mM Tris-HCl pH 7.5, 100 mM NaCl, 5 mM MgCl2 and 1 mM DTT, and centrifuged in a Vivaspin concentrator to remove the sucrose. R1.2/1.3 holey carbon grids (Quantifoil) were glow-discharged for 90 s (Plasma Cleaner model PDC-002; Harrick Plasma) at medium power, and 3.5 μl of C2 sample was applied inside a Vitrobot Mark IV instrument (FEI) at 4 °C and 100% relative humidity. Grids were blotted for 3 s with blot force 5 and plunged into liquid ethane. Cryo-EM datasets comprising 10,816 (dataset 1), 9,878 (dataset 2) and 3,640 (dataset 3) micrographs, respectively, were collected from three different grids with a Thermo Fisher Titan Krios G3 set-up equipped with a Falcon III camera (Thermo Fisher). Data were acquired with EPU (Thermo Fisher) at 300 keV and a nominal magnification of ×75,000 (calibrated pixel size of 1.0635 Å) in video mode with 47 fractions per video and counting of the electron signal. The total exposure was 77.5 e− Å−2 for 75 s, with two exposures per hole.
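A few derived acquisition quantities follow directly from the parameters above and can be checked with a short calculation. The sketch below assumes that the total exposure of 77.5 is given in electrons per Å2 and that the dose is spread evenly over the 47 fractions; the variable names are illustrative only.

```python
# Acquisition parameters as stated in the text.
TOTAL_DOSE = 77.5        # assumed unit: electrons per square angstrom
N_FRACTIONS = 47         # fractions per video
EXPOSURE_S = 75.0        # exposure time in seconds
PIXEL_SIZE_A = 1.0635    # calibrated pixel size in angstrom
MICROGRAPHS = {"dataset 1": 10816, "dataset 2": 9878, "dataset 3": 3640}

# Dose per fraction, assuming an even distribution across fractions.
dose_per_fraction = TOTAL_DOSE / N_FRACTIONS

# Dose rate per unit area and per detector pixel.
dose_rate_area = TOTAL_DOSE / EXPOSURE_S
dose_rate_pixel = dose_rate_area * PIXEL_SIZE_A ** 2

# Nyquist limit: the finest spacing representable at this sampling.
nyquist_a = 2.0 * PIXEL_SIZE_A

total_micrographs = sum(MICROGRAPHS.values())

print(f"dose per fraction: {dose_per_fraction:.3f} e-/A^2")
print(f"dose rate: {dose_rate_pixel:.3f} e-/pixel/s")
print(f"Nyquist limit: {nyquist_a:.3f} A")
print(f"micrographs in total: {total_micrographs}")
```

The Nyquist limit of roughly 2.1 Å lies below the resolutions reported for the reconstructions, so the pixel size does not limit them.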

Dose-weighted, motion-corrected sums of the micrograph videos were calculated with Motioncor265. The contrast-transfer function (CTF) of each micrograph was fitted with RELION 3.166 using the built-in CTFFIND algorithm. An initial set of 25,000 particles was picked with the Gaussian picker and subjected to three rounds of 2D classification in RELION66 to clean up the dataset. Eight class averages were selected as templates for subsequent automated particle picking within RELION and a total of 300,000 particles were picked using the RELION autopicker. After a second round of 2D classification, 3D classification was performed using the vRNAP core structure as template. Particles belonging to the PIC were selected and 2D classes for autopicking were calculated. The resulting three particle stacks, one for each dataset, were cleaned up individually by four rounds of 2D classification each, and contained 1,064,795 (dataset 1), 1,205,746 (dataset 2) and 323,776 (dataset 3) good particles, respectively. Each particle stack was then subjected to 3D classification, and particles that fell in the defined PIC class were selected. The PIC particle stacks of the three datasets were then united into a single stack, and CTF refinement, followed by a consensus 3D refinement, was performed. This united particle stack was then subjected to a focused 3D classification with a mask that selected for VETF and DNA. Two of the resulting three classes yielded high-resolution reconstructions of VETF and DNA in minimally divergent conformations (Extended Data Fig. 2c). The particles from the two good classes were then forwarded to a multibody (MB) refinement in RELION, either pooled or separately. MB refinement was performed with two bodies, representing either VETF or DNA and core vRNAP. We noticed that minor variations of the mask pairs resulted in improvement of particular regions of the reconstruction. We therefore repeated the MB refinement with 11 more mask pairs. 
The resulting 12 map pairs were then combined with Phenix.combine_focused_maps to create a single, optimal map and the procedure was repeated for several selected subsets. The combined map of all 12 map pairs was compared to the different combined maps based on subsets. The combined map based on all 12 map pairs showed comparably better richness of detail and connectivity for VETF and was therefore used for automated model refinement. To build the PIC model, the vRNAP core excluding the Rpo30 PPD was extracted from the complete vRNAP structure (PDB 6RFL) and docked into the cryo-EM density map. Within the residual density, the path of the DNA was identified and manually docked with section-wise stretches of ideal B-DNA. VETF was then traced de novo in Coot 0.967. To this end, the SNF2 helicase core of VETFs was located and built, followed by well-defined regions of VETFl. The resulting partial model was initially refined with Phenix.real_space_refine and forwarded to Phenix.combine_focused_maps to create a stitched map, and the VETF model was completed manually. The full polypeptide chains of both VETFs and VETFl were continuously modeled. Finally, residual density was identified as the relocated Rap94 NTD, and the DNA sequence was assigned. The resulting model was manually optimized with the real-space refinement routine of Coot 0.9 and subjected again to refinement with Phenix.real_space_refine68, including ADP refinement steps. During refinement, secondary structure and Ramachandran restraints were imposed. After four further cycles of manual inspection and automated refinement, the refinement converged, and a model with excellent stereochemistry and good correlation with the cryo-EM map was obtained (Table 1).

Three-dimensional reconstruction and model building of lPIC and ITC complexes

The lPIC particle stack obtained as described above was subjected to two rounds of focused 3D classification with three classes in each of the two rounds. The classification was focused with a mask on the cleft, active site and downstream DNA channel as well as the region of the Rap94 cyclin domain. From the resulting set of nine class averages (Extended Data Fig. 3a), four reasonable reconstructions were obtained after a final round of 3D refinement and post-processing, and the associated complexes were identified as the lPIC and ITC1–3 (Extended Data Fig. 3a). The resolution was determined by Fourier-shell correlation (FSC) to 3.0 Å for the lPIC and 2.9 Å, 3.2 Å and 3.0 Å for ITC1, ITC2 and ITC3, respectively (Extended Data Fig. 3b–d). To build the lPIC model, the vRNAP core including the Rpo30 PPD was extracted from the complete vRNAP structure (PDB 6RFL) and docked into the cryo-EM density. The positioning of the Rap94 cyclin domain and the adjacent linker regions was adjusted manually with Coot67, and the model was refined with Phenix.real_space_refine68 including an ADP refinement step. During refinement, secondary structure and Ramachandran restraints were imposed. After two further cycles of manual inspection and automated refinement, the refinement converged and a model with excellent stereochemistry and good correlation with the cryo-EM map was obtained (Table 1).
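As an illustration of how a resolution value is read from an FSC curve, the following sketch finds the spatial frequency at which the curve first drops below a threshold and converts it to ångström. The 0.143 cutoff is an assumption here (it is the common gold-standard convention, but the text does not state which criterion was used), the function name is hypothetical, and the example curve is synthetic rather than data from this study.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Return the resolution (in angstrom) where the FSC curve first
    drops below `threshold`, using linear interpolation between shells.

    freqs -- spatial frequencies in 1/angstrom, in ascending order
    fsc   -- FSC value for each shell (same length as freqs)
    Assumes the curve starts at or above the threshold.
    Returns None if the curve never drops below the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            # Fractional position of the crossing between shells i-1 and i.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Synthetic curve for demonstration only (not data from this study).
demo_freqs = [0.10, 0.20, 0.30, 0.40]
demo_fsc = [1.00, 0.90, 0.50, 0.00]
demo_resolution = resolution_at_threshold(demo_freqs, demo_fsc)
print(f"resolution at FSC = 0.143: {demo_resolution:.2f} A")
```

In practice this value is reported by the refinement software itself; the sketch only makes the interpolation step explicit.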

Three-dimensional reconstruction and model building of the lITC

The lITC particle stack, obtained as described above, was subjected to a round of focused 3D classification with a mask on the NPH-I and upstream DNA region. From the three resulting classes, a single one displayed good occupancy and resolution for NPH-I. Particles belonging to this class were subjected to two-body MB refinement in RELION using a mask for NPH-I and upstream DNA and a mask for the core vRNAP. The postprocessed reconstructions for both bodies were then combined with Phenix.combine_focused_maps. To build the lITC model, the ITC1 structure was docked into the density. Within the residual density, a characteristic SNF2 helicase fold was recognized that was docked with either VETFs or NPH-I from the complete vRNAP structure (PDB 6RFL). NPH-I unequivocally fitted the density, while VETFs did not. Further residual density could then be identified as the relocated Rap94 B-cyclin domain, the relocated Rap94 NTD and the NPH-I CTD. After manual adjustments with Coot, including rebuilding of remodeled Rap94 linker regions, the model was refined with Phenix.real_space_refine including an ADP refinement step. During refinement, secondary structure and Ramachandran restraints were imposed. After two further cycles of manual inspection and automated refinement, the refinement converged and a model with excellent stereochemistry and good correlation with the cryo-EM map was obtained.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this Article.