Structure of the transcription open complex of distinct σI factors

Bacterial σI factors of the σ70-family are widespread in Bacilli and Clostridia and are involved in the heat shock response, iron metabolism, virulence, and carbohydrate sensing. A multiplicity of σI paralogues in some cellulolytic bacteria have been shown to be responsible for the regulation of the cellulosome, a multienzyme complex that mediates efficient cellulose degradation. Here, we report two structures at 3.0 Å and 3.3 Å of two transcription open complexes formed by two σI factors, SigI1 and SigI6, respectively, from the thermophilic, cellulolytic bacterium, Clostridium thermocellum. These structures reveal a unique, hitherto-unknown recognition mode of bacterial transcriptional promoters, both with respect to domain organization and binding to promoter DNA. The key characteristics that determine the specificities of the σI paralogues were further revealed by comparison of the two structures. Consequently, the σI factors represent a distinct set of the σ70-family σ factors, thus highlighting the diversity of bacterial transcription.

group I σ factors in the regions from σ1.2 to σ4, and, therefore, present essentially the same structures in the σ2, σ3, and σ4 domains 11,12. The group III σ factors lack both σ1.1 and σ1.2 and show weaker interactions between σ4 and the −35 element than those of the group I and II σ factors 13,14. The group IV σ factors contain only σ2- and σ4-domains, which bind to RNAP and promoter DNA in a similar strategy to those of the other groups, but the detailed interactions between the group IV σ factor and the promoter DNA are quite different from the interactions of the other groups 15-17. These interactions are of great importance for the recognition of a consensus sequence of the −35/−10 elements by the group IV σ factors.
The σI (SigI) factor is a unique σ70 factor that is widespread in Bacilli and Clostridia 20-24. It contains a σ2-domain for recognition of the −10 element but lacks the σ4-domain that recognizes the −35 element 25. σI was initially classified into σ70-family group III 26 but was later considered an ECF-like σ factor, since its C-terminal domain (SigIC) was suspected of playing a recognition role for the −35 element despite its lack of sequence homology with σ4 27,28. σI factors are involved in the heat shock response, iron metabolism, virulence, and carbohydrate sensing 21,24. Multiple paralogues of σI and cognate anti-σI factors (RsgIs) have been found, and these σI-anti-σI operons were shown to regulate the expression of cellulosome components; cellulosomes are the multienzyme complexes that mediate efficient cellulose degradation 20,24,29. These RsgIs contain an exocellular carbohydrate-binding module, positioned to sense the extracellular polysaccharide substrate 30, a periplasmic domain that accommodates an autoproteolytic event for signal transduction 31-33, a transmembrane helix, and a cytoplasmic inhibitory domain that binds to SigI 23. Promoter sequences recognized by the σI factors contain an A-tract motif and a CGWA motif in the −35 and −10 elements, respectively 27,28. σI paralogues exhibit distinct promoter specificity, considered to be related to an upstream region of the A-tract motif 27,28. Although the N- and C-terminal σI domains presumably recognize promoter −10 and −35 elements, respectively, it is unknown how they specifically recognize promoter DNA 23,25,27. The structure of σI in an active state (in complex with RNAP) is thus needed to elucidate the mechanism of specific promoter recognition by multiple σI factors.
Here, we determined high-resolution cryo-EM structures of RNAP-σ-promoter complexes (transcription-ready open complexes, RPo complexes) for two C. thermocellum σI factors. Structural analysis and functional validation revealed the unique promoter recognition mode and molecular mechanism of specificity for σI paralogues, which differ from all other known groups of σ70 factors.
The N-terminal SigI6 domain (SigI6N, residues 13-110) is located in the cleft between the RNAP-β lobe and RNAP-β' coiled-coil (β'CC) with extensive hydrophobic and hydrogen-bond interactions, while the C-terminal SigI6 domain (SigI6C, residues 134-245) forms hydrophobic interactions with the flap-tip helix (βFTH) of the RNAP β subunit (Fig. 1D and Fig. S5). The position of SigIN relative to RNAP-β'CC is similar to that of other σ70-family σ2-domains relative to β'CC (Fig. S5A), but the detailed interactions are different, resulting in different helix orientations relative to β'CC (Fig. S5B). These differences are caused by non-conserved interacting residues in the different types of σ factors, although those of RNAP-β'CC are highly conserved (Fig. S5D, E). The interacting hydrophobic residues of SigIC with βFTH are completely different from those of the σ4-domains of other σ70 factors (Fig. S5C, F), because SigIC has no sequence homology with σ4 and adopts different structural elements in binding βFTH.
Promoter DNA binds to both σI and the RNAP core enzyme (Fig. 1D). The upstream region of the promoter forms a duplex and the −35 element interacts with SigIC helices α8-α12. The downstream region forms the transcription bubble through extensive interaction with SigIN. SigIN binds to the −10 element, forming the opening of the bubble and stabilizing the non-template (NT) strand DNA. Finally, the NT and template strands form a duplex and exit RNAP through the channel of the clamp formed by the RNAP β and β' subunits.
Although the overall structures of RPo-SigI1 and RPo-SigI6 are similar, some differences are observed when the structures are aligned by their RNAP core enzymes (Fig. 1E). The SigIC domains show a rotation and shift, and the SigI1C-bound −35 element bends more towards RNAP than the SigI6C-bound −35 element. The first α-helix of the N-terminal SigI1 and SigI6 domains also shows different orientations.
The overall architecture of the C. thermocellum RPo complexes is similar to other known RPo complexes from various bacteria 16,34-36. However, structural analysis of the σI-promoter interactions (Fig. S6) indicated that the mechanism of recognition is different from other known σ70 family members, as shown below.

Interactions between σI and promoter −35 element
SigIC binds to the −35 element through both its HTH structure formed by helices α11 and α12 in the DNA major groove and the N-terminal part of helix α9 in the minor groove (Fig. 2A, B and Fig. S7). Although the local resolutions of the SigIC-binding region (about 4.5 Å) are lower than the resolution in the RNAP core regions, and the densities of the SigIC side chains are not always clearly observed, the SigIC model structures predicted by AlphaFold 37 fit well into the densities, and some of the large side-chain residues, such as Phe and Tyr, can be observed with clear side-chain densities (Fig. S4), resulting in the construction of reliable models for the SigIC-promoter binding regions. Minor-groove binding in the −35 element has not been observed in other σ70-family members 11,13,16,34. Several residues of helix α9 are involved in the interaction with the minor groove. The side chain of H171/H173 in SigI6/SigI1 is inserted into the minor groove, forming hydrogen bonding and stacking interactions with the ribose rings. Adjacent conserved residues, including R172/R174, S174/S176, and K170/K172, interact with backbone phosphates of the double-stranded DNA (dsDNA). These minor-groove-binding residues are conserved in σI (Fig. S7C), and the SigIC-binding minor groove is formed by the characteristic, essential A-tract region in σI-dependent promoters 27,28. To confirm the importance of the SigIC minor-groove-binding residues, we analyzed the activity of SigI6 and its mutants using both an in vivo heterologous Bacillus system 27,28,38 and in vitro transcriptional activity assays 39 (Fig. 2D, E). The in vivo heterologous Bacillus system revealed that mutation of H171 to Tyr, Phe, Asn, Ser, or Ala resulted in complete loss of activity, while mutation to Lys or Arg resulted in significantly decreased but detectable activity, consistent with Lys and Arg being minor-groove-binding residues observed in A-tract-binding proteins 40-42. Mutation of K170 and R172 also significantly decreased activity, confirming their functional importance. The in vitro transcriptional activity assays gave similar results (Fig. 2E), indicating the functional importance of the minor-groove binding by SigI.
The DNA-binding mode of SigIC differs from that of the σ70-family σ4-domain, which binds to the major groove only 11,13,16,17,34,43. The additional minor-groove binding results in a significantly larger interface area (952 Å²) between SigIC and promoter than that between the σ4-domain and promoter (e.g., 769 Å² for σH from M. tuberculosis and 530 Å² for σA from B. subtilis). Furthermore, the binding modes of SigIC and σ4 with the major groove are completely different. Previous studies indicated that SigIC would show steric hindrance if it adopted a dsDNA-binding conformation similar to that of the σ4-domain of ECF σ factors 23,44. The RPo-σI structures indeed revealed that although SigIC interacts with the major groove via its HTH structure (α11 and α12), its position exhibits a ~180° rotation compared with that of the σ4-domain (Fig. 2A). This rotation not only resolves the potential steric clash but also allows the N-terminal part of helix α9 to fit into the minor groove, forming an additional DNA-binding interface and representing a unique binding mode among the known σ70 factors. A Dali search 45 revealed that the DNA-binding mode of SigIC is similar to that of the winged HTH domains of transcriptional factors, among which an ROK-family repressor, Lmo0178 46, shows high similarity (Fig. 2C). The HTH motifs of SigIC and Lmo0178 similarly bind to the major groove, and in both proteins a positively charged residue (His171/Lys9 in SigI6/Lmo0178, respectively) on an N-terminal helix penetrates the downstream minor groove (Fig. 2C). However, as opposed to SigIC, Lmo0178 is dimeric and binds to a palindromic sequence, and a β-loop wing binds to the upstream minor groove.
In summary, SigIC has a unique −35 element recognition mode formed by two features: the conserved minor-groove A-tract binding, which is lacking in σ4-promoter recognition, and non-conserved major-groove binding to the region of specificity (ROS) by the HTH motif, which presents a ~180° rotation compared to the σ4-HTH motif. Therefore, σI-promoter recognition of the −35 element differs completely from that of the σ4-domain of other σ70 factors.

Interactions between σI and the −10 element
The SigIN domain adopts an oval structure formed by three helices α2-α4, similar to the σ2-domain of other σ70 factors, and helix α1 is attached to one head of the oval, somewhat similar to the second helix of the σ1.2 region (σR1.2) of groups I and II (Fig. 3A). Similar to other σ2-domains, SigIN opens the duplex of the −10 element to form the transcriptional bubble, mainly through helix α4. SigIN also binds the NT-strand through α1, α2, and Loop3 (connecting α3 and α4), thus stabilizing the unwound transcription bubble. The bubble size (number of unpaired nucleotides) is 14 bp, similar to that (13-15 bp) opened by groups I-III σ70 factors 9,11,47-49 but different from that (12 bp) of ECF σ factors 15-17. Although the overall structure is similar to the σ2-domain of other σ70 factors, the detailed comparison showed unique interactions between SigIN and promoter DNA for specific promoter recognition (Fig. 3A), as described below.
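The bubble size can be cross-checked against the span of unpaired NT-strand positions reported for these structures (−12 to +2). The following is a minimal sketch, assuming the usual promoter numbering convention in which position +1 is the transcription start site and there is no position 0; the function name `count_positions` is hypothetical and used only for illustration.

```python
def count_positions(start: int, end: int) -> int:
    """Count promoter positions from start to end (inclusive), skipping 0.

    Assumption: coordinates are relative to the transcription start site
    (+1) and, by convention, no position is numbered 0.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    return sum(1 for pos in range(start, end + 1) if pos != 0)


# Unpaired NT-strand positions reported for the RPo-σI structures: −12 to +2.
bubble_size = count_positions(-12, 2)
print(bubble_size)  # 12 upstream positions plus +1 and +2
```

Under this convention, the span from −12 to +2 corresponds to 14 nucleotides, in agreement with the bubble size stated in the text.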
All unpaired bases of the NT-strand DNA in the bubble (from −12 to +2) in the two RPo-σI structures turn outward with abundant π-stackings between successive bases, and bases from −12 to −3 form extensive interactions with SigIN (Fig. 3A, B and Fig. S8A). This is a unique structural feature among known RNAP-σ complexes, since only part of the NT-strand bases in the bubble flip out in the σ70-RNAP complexes of other groups (Fig. 3A) 11,16,43. According to sequence alignment (Fig. S8B), residues binding to −10 element downstream bases are largely non-conserved in σI. The −10 to −7 downstream promoter region together with A-11 showed extensive interactions with Loop3, the "specificity loop" in ECF σ factors 16,50. The latter loop specifically recognizes the −11 base in the X-14G-13T-12Y-11 (X = C,G; Y = A,T,C) motif 3,25, which spatially corresponds to T-10 of PsigI6. Since this position is not conserved in σI-dependent promoters, we investigated whether Loop3 plays a specificity role in the different σI factors. Mutation of T-10 of PsigI6 into different nucleotides resulted in different effects: T-10c showed much higher activity than wild-type PsigI6, while T-10g and T-10a showed complete and partial loss of activity, respectively. Similarly, mutations of H84 and S85 of SigI6, according to the mutation pattern in SigI1 (H84G/S85Y), SigI2 (H84N/S85M) and SigI3 (H84N/S85G), resulted in diverse effects (Fig. 3C). These inconsistent results indicated that the downstream region of the CGWA motif is likely a modulator of promoter activity but does not serve as a specificity determinant for the different σI factors.
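The degenerate motifs mentioned above (CGWA in σI-dependent −10 elements, and the ECF-type X-14G-13T-12Y-11 motif with X = C,G and Y = A,T,C) can be written with IUPAC codes and searched as regular expressions. The sketch below is illustrative only: the sequence `toy` is a made-up string, not a promoter from this study, and the helper `find_motif` is a hypothetical name. The ECF-type motif is encoded here as `SGTH` (S = C/G, H = A/C/T).

```python
import re

# IUPAC nucleotide codes needed here, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]",   # weak: A or T
    "S": "[CG]",   # strong: C or G
    "H": "[ACT]",  # any base except G
}


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start indices of all matches, overlapping ones included."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


toy = "TTCGAATTCGTAGG"  # hypothetical sequence for illustration only
print(find_motif(toy, "CGWA"))  # [2, 8]
print(find_motif(toy, "SGTH"))  # [8]
```

A real scan would additionally restrict matches to the expected distance from the transcription start site, which this sketch does not attempt.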
Structural comparison of active σI in the RPo complex and RsgI-bound inhibited σI
Our previous study showed that RsgI specifically binds to the C-terminal domain of cognate σI to inhibit σI activity and that the interface contains both conserved and non-conserved residues 23. Nevertheless, how this interaction inhibits σI activity is unclear. The structure of the RPo complex revealed only slight conformational changes between the active and inhibited states of SigIC (Fig. 4A), and the same surface binds to RNAP and RsgI (Fig. 4B, C), thus indicating that RsgI inhibits σI activity by competitive binding. SigIC binds βFTH through conserved hydrophobic surface residues (Fig. 4B and Fig. S7C), which partly overlap with the RsgI-binding residues (Fig. 4C and Fig. S7C). However, the interface area of the RsgI1-SigI1 interaction (1056 Å²) is much larger than that of SigI1C-βFTH (800 Å²), which might explain why the conserved σI-RNAP interaction is inhibited by the non-conserved interaction with RsgI.

Discussion
Despite more than 20 years of study of the σI factors since their discovery 22, their classification remains confusing. σI factors were initially classified as σ70-family group III 26 and later reclassified as "ECF-like" 27,28 rather than ECF σ factors 3,25,51. Our structures of the RPo-σI complexes indicated that σI is indeed a unique type of σ factor that cannot be classified into the canonical groups of the σ70 family. Several features distinguish σI from the other groups. In this context, σI has a σ2-domain that contains part of σR1.2, which only exists in members of groups I and II. In addition, the RPo-σI complex contains a bubble of a size similar to those of groups I-III. However, σI lacks a σ3 domain, which exists in the latter groups (Fig. 5A). Moreover, the −10 element binds to SigIN with more flipped-out bases than those of other σ70-promoter complexes (Fig. 5B). Finally, although SigIC is responsible for recognition of the −35 element and is functionally similar to the σ4-domain of the other σ70 factors, it is completely different, both in terms of structure and DNA-binding mode (Fig. 5B). Therefore, σI factors represent a distinct set of the σ70-family σ factors, thus highlighting the diversity of bacterial transcription.
Intriguingly, the C-terminal σI domain binds to −35 DNA with a large binding surface that penetrates both major and minor grooves of the promoter DNA. The minor- and major-groove regions correspond to the previously identified A-tract and region of specificity 27, respectively, and the present study provides a structural basis for the function of the two regions. Minor-groove binding by a single positively charged residue has been widely observed in DNA-binding proteins for specific A-tract or AT-rich DNA recognition 52,53. Major-groove binding by SigIC is similar to that of winged helix-turn-helix (HTH) domains of transcription factors 46,54. However, no evolutionary relationship was found between σI and these transcription factors upon comparing their homologous sequences in various bacteria. The similarity is likely caused by the convergent evolution of two different proteins for DNA binding.
The two structures reported here provide insight into the specificity of different σI paralogues in one bacterium. Non-conserved residues in the HTH motif of the SigIC domain specifically bind to the ROS in the promoter −35 element. In addition, the −12 nucleotide in the promoter −10 element plays a role in the specificity. Its downstream nucleotides show extensive interactions with σI and probably modulate the activities for each specific gene. The numbers of interacting residues in σI and interacting nucleotides in the promoter are much higher than those of the ECF σ factors, which may explain why one bacterium can maintain so many (up to sixteen) σI factors for regulation with specificity 27. Since the σI factors in C. thermocellum are responsible for regulating the expression of cellulosome components (thus comprising a potential "treasure-trove for biotechnology" 55,56), the promoter recognition mechanism revealed in this study provides the basis for future engineering of cellulosome production in cellulosome-producing bacteria 57. Furthermore, the unique binding mode and specificity mechanism of the σI factors provide new possibilities for the design of orthogonal genetic switches and regulators in synthetic biology 8,58.

Methods
Purification of RNAP core enzyme from C. thermocellum
The strains used in this study are listed in Table S2. The plasmids and primers used in this study are listed in Supplementary Data 1 and Supplementary Data 2, respectively. The C. thermocellum strain for the purification of RNAP was constructed using the previously developed homologous recombination method 59. Specifically, a strong constitutive promoter P2638 60 and an N-terminal His×10-tag were inserted before the RNAP β' gene (clo1313_0314) [https://www.ncbi.nlm.nih.gov/gene/12420012] in C. thermocellum strain ΔpyrF 59. The homology arms Bp-UP and Bp-DN and the promoter P2638 were amplified by PCR from C. thermocellum DSM1313 genomic DNA. The DNA fragments were ligated by either overlapping PCR or restriction enzyme digestion and T4 ligation, and finally the homologous recombination plasmid pHKm2-homo-5'Betap was obtained (Fig. S1A). The plasmid was transformed into C. thermocellum strain ΔpyrF by electroporation 59, generating the mutant DSM1313::P2638-His10-β' after two rounds of screening. Transformants containing the plasmid pHKm2-homo-5'Betap were first screened on semi-solid GS-2 medium containing 3 μg/mL thiamphenicol (Tm). Then, the obtained transformant was screened with 10 μg/mL 5-fluoro-2-deoxyuridine (FUDR)-supplemented uracil auxotrophic MJ medium to generate the target mutant after homologous recombination. The mutant was verified by colony PCR and sequencing. C. thermocellum strains were routinely cultured anaerobically at 55 °C in GS-2 medium, supplemented with 5.0 g/L cellobiose as carbon source.
The RNAP core enzyme was directly purified from DSM1313::P2638-His10-β'. The cells were grown anaerobically at 55 °C in 50 L GS-2 medium supplemented with 5 g/L glucose as a carbon source. When the optical density at 600 nm (OD600nm) reached 1.2-1.8, cells were collected by centrifugation at 10,200 g for 30 min. The cell pellet was suspended in 1.5 L buffer A (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 30 mM imidazole, 5% (v/v) glycerol, and protease inhibitor cocktail (Roche)) and lysed by ultrasonication. The lysate was centrifuged at 15,000 g for 50 min at 4 °C, the supernatant was then loaded onto a 40-mL His-Trap FF affinity column (GE Healthcare Life Sciences) pre-equilibrated with buffer A, and RNAP was eluted with buffer B (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 500 mM imidazole, 5% (v/v) glycerol). The complex was further purified using a 5-mL Hi-Trap Heparin column (GE Healthcare Life Sciences) pre-equilibrated with buffer C (20 mM Tris-HCl pH 8.0, 100 mM NaCl, 2 mM DTT, 0.2 mM EDTA, 5% (v/v) glycerol, and protease inhibitor cocktail (Roche)), and the RNAP was eluted with buffer D (20 mM Tris-HCl pH 8.0, 1 M NaCl, 2 mM DTT, 0.2 mM EDTA, 5% (v/v) glycerol) by a linear gradient. The fractions containing RNAP were collected and loaded on a Source Q column (GE Healthcare Life Sciences). After elution by a linear gradient of NaCl to the final concentration of 1 M, the fractions containing RNAP core enzyme were collected, concentrated to 3 mg/mL, and stored at −80 °C. The subunits of the RNAP core enzyme in the purified proteins were identified by SDS-PAGE.

Purification of recombinant SigI1 and SigI6
The expression and purification of SigI1, SigI6, and mutants of SigI6 (C167S, R215A, R214A, H171R, H171A, R172A, K170A, R104A, and K16A) in Escherichia coli followed the procedures for SigI1 reported in a previous study 23. Briefly, the gene fragments encoding full-length SigI1 and SigI6 were cloned into the pET28a-SMT3 plasmid, generating the plasmids pET28a-SMT3-SigI1 and pET28a-SMT3-SigI6. Each mutant of SigI6 was constructed by site-directed mutagenesis using the QuikChange method. All the plasmids were transformed into E. coli BL21 (DE3) for protein expression. The wild-type SigI6 showed poor stability during the purification, and the mutant SigI6-C167S showed much better stability. Therefore, SigI6-C167S was purified and used in the structural study. The recombinant proteins were first purified by a nickel-affinity column and then purified further by size-exclusion chromatography with a HiLoad 16/600 Superdex 75 column with buffer E (20 mM Tris-HCl pH 8.0, 150 mM NaCl, 2 mM DTT, 0.2 mM EDTA, 5% (v/v) glycerol, 10 mM MgCl2). The purity of recombinant proteins was assessed by SDS-PAGE (Fig. S10).

Nucleic acid scaffolds
Double-stranded nucleic acid scaffolds for the cryo-EM study of RPo-SigI1 and RPo-SigI6 were prepared from synthetic oligos (Table S3) by annealing the DNA (heating at 95 °C for 5 min and then allowing the DNA to cool slowly to room temperature). The annealing buffer contained 20 mM Tris-HCl pH 8.0, 200 mM NaCl, and 10 mM MgCl2.
Double-stranded nucleic acid scaffolds for the fluorescence-detected in vitro transcription assay were prepared by PCR using pUC19-PsigI6-Mango-tR2 as a template. DNA sequences and primers used for the in vitro transcription assay are listed in Table S4 and Supplementary Data 2, respectively.

Reconstitution of the RPo-σI complex
To reconstitute the RPo-SigI1 and RPo-SigI6 complexes for cryo-EM, the purified C. thermocellum RNAP core enzyme, purified recombinant SMT3-SigI1 or SMT3-SigI6-C167S, and annealed nucleic-acid scaffold were mixed at a 1:3:1.3 molar ratio and incubated at 4 °C overnight. The reconstituted RPo-σI complex was further treated with ULP1 protease to remove the SMT3 tag of SMT3-SigI1/6. The RPo-σI complexes were concentrated to 500 μL and then purified using a Superdex 200 Increase 10/300 GL column in buffer E. The fractions of the RPo-σI complex were collected and concentrated for cryo-EM sample preparation. The subunits and DNA scaffolds of RPo complexes were identified by SDS-PAGE and Native-PAGE (Fig. S1E, F).
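As an illustration of the 1:3:1.3 molar ratio used for reconstitution, the sketch below converts a chosen amount of RNAP core enzyme into the corresponding amounts of σI and scaffold, and then into pipetting volumes. The amount of RNAP (100 pmol) and all stock concentrations are hypothetical values chosen for the example, not values from this study; the function names are likewise illustrative.

```python
def partner_amounts(rnap_pmol: float,
                    ratio: tuple[float, float, float] = (1.0, 3.0, 1.3)) -> dict[str, float]:
    """Scale σ factor and DNA scaffold amounts (pmol) to a given RNAP amount.

    ratio is RNAP : σ : scaffold (1:3:1.3 as stated in the text).
    """
    r, s, d = ratio
    return {
        "rnap": rnap_pmol,
        "sigma": rnap_pmol * s / r,
        "dna": rnap_pmol * d / r,
    }


def volume_ul(amount_pmol: float, stock_uM: float) -> float:
    """Volume (μL) of stock needed; 1 μM equals 1 pmol/μL."""
    return amount_pmol / stock_uM


# Hypothetical example: 100 pmol RNAP core enzyme.
amounts = partner_amounts(100.0)
# Hypothetical stock concentrations in μM (assumptions for illustration).
stocks = {"rnap": 10.0, "sigma": 50.0, "dna": 20.0}
for name, pmol in amounts.items():
    print(f"{name}: {pmol:.1f} pmol -> {volume_ul(pmol, stocks[name]):.2f} μL")
```

In practice the molar amount of each component would be derived from its measured mass concentration and molecular weight, which are not given here.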

Cryo-EM grid preparation
The purified samples (12-24 mg/mL protein) were mixed with 8 mM CHAPSO (final concentration) and 0.1 mM DTT. Quantifoil R1.2/1.3 holey carbon grids were glow-discharged for 90 s before the application of 3 μL of the sample. After blotting for 6-8 s with a blot force of 2 N, the grids were plunge-frozen in liquid ethane using an FEI Vitrobot Mark IV (FEI, Hillsboro) with 95% chamber humidity at 10 °C.

Cryo-EM data acquisition and processing
The grids were imaged using a 300-keV Titan Krios equipped with a K2 Summit direct electron detector (Gatan) and a GIF Quantum energy filter (slit width 20 eV). Data were collected at a nominal magnification of ×22,500 (1.04 Å/pixel) with a dose rate of 8 electrons/pixel/s on the sample (~7.8 electrons/pixel/s on the detector). All images were recorded using SerialEM 61 in super-resolution counting mode for 7.6 s exposures in 32 subframes to give a total dose of 60 electrons/Å² with a defocus range of −1.5 to −2.5 μm.
Motion correction and CTF estimation of cryo-EM movies were performed using Warp 62, and particles were picked using an instance of Warp's neural network retrained on 100 selected micrographs from the RPo-SigI1 and RPo-SigI6 data sets. Particles were extracted in Warp and subsequently classified in cryoSPARC 63.
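The relationship between dose rate, exposure time, pixel size, and accumulated dose can be sketched as simple arithmetic. The example below uses the acquisition values quoted above and assumes that the stated dose rate refers to physical pixels of 1.04 Å; that assumption, and the function name `exposure_summary`, are ours rather than the authors'. Under this assumption the total is about 60.8 electrons per pixel, or about 56 electrons/Å², which is of the same order as the reported total of 60 electrons/Å²; the difference may reflect rounding or a different reference pixel size.

```python
def exposure_summary(dose_rate_e_px_s: float, exposure_s: float,
                     n_frames: int, pixel_size_A: float) -> dict[str, float]:
    """Summarize accumulated electron dose for one movie.

    Assumption: dose_rate_e_px_s is given per physical pixel of edge
    length pixel_size_A (in Å).
    """
    total_per_px = dose_rate_e_px_s * exposure_s   # electrons per pixel
    pixel_area_A2 = pixel_size_A ** 2              # Å^2 per pixel
    total_per_A2 = total_per_px / pixel_area_A2    # electrons per Å^2
    return {
        "frame_time_s": exposure_s / n_frames,
        "total_e_per_px": total_per_px,
        "total_e_per_A2": total_per_A2,
        "e_per_A2_per_frame": total_per_A2 / n_frames,
    }


summary = exposure_summary(dose_rate_e_px_s=8.0, exposure_s=7.6,
                           n_frames=32, pixel_size_A=1.04)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```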
For the RPo-SigI6 dataset, the initial model was generated using the Mycobacterium tuberculosis wild-type RNAP holoenzyme/RbpA/CarD/Sor/AP3-RP2 class (EMD-22575) as a template to 3D-classify the particles using cryoSPARC heterogeneous refinement. The best class was selected as the reference to classify the particles by 3D classification with alignment in RELION 64. A collection of 120,591 particles was selected to perform autorefinement. Focused classification (without alignment) of the SigI6C terminus was performed to improve the local density of SigI6 and the bound DNA. To further clean the dataset, CryoDRGN 65 was used to classify particles, and three similar classes were selected to perform the non-uniform refinement. The map was estimated to be at a resolution of 3.58 Å in RELION, and further processing by density modification with the ResolveCryoEM program 66 improved the map quality and resolution to 3.36 Å.
For the RPo-SigI1 dataset, the extracted particles were first 3D- and 2D-classified in cryoSPARC to discard poor particles. The particles were then subjected to 3D classification in RELION and refinement in cryoSPARC to obtain a reconstructed map. To improve the local density of SigI1 and the bound DNA, focused classification (without alignment) of the SigI1C terminus was performed in RELION. All particles in the best class during the focused classification were then subjected to non-uniform refinement in cryoSPARC, resulting in a map with an overall resolution of 3.03 Å. Post-processing of the density map generated during refinement was performed using DeepEMhancer 67. Local resolution estimations were calculated within RELION. The procedures for cryo-EM structure determination of RPo-SigI1 and RPo-SigI6 are shown in Fig. S2.

Model building and refinement
The final cryo-EM maps for the RPo-SigI1 and RPo-SigI6 complexes were used for initial model building. The crystal structure of the Mycobacterium tuberculosis RPtic-σH complex (PDB ID 5ZX2) [https://www.rcsb.org/structure/5ZX2] was placed in the cryo-EM maps of the RPo-SigI1 and RPo-SigI6 complexes by rigid-body fitting with UCSF Chimera 68. The RNAP subunits in the RPo-σI complexes were manually rebuilt into the cryo-EM map with reference to the fitted RPtic-σH structure. The individual models of SigI1 and SigI6 were built with reference to the structures predicted by AlphaFold 37. The model was completed and manually adjusted residue-by-residue with real-space refinement in Coot 69, followed by real-space refinement in PHENIX 70. The models were visualized with UCSF Chimera, UCSF ChimeraX 71, and PyMOL (http://www.pymol.org/).

Bacillus subtilis strain construction
A heterologous B. subtilis host system was constructed to study the σI-dependent promoter activities, referring to the published system which has been successfully used to study the activities of σI factors from C. thermocellum and Pseudobacteroides cellulosolvens 27,28. Plasmids and primers in the present work are listed in Supplementary Data 1 and Supplementary Data 2, respectively. Two plasmids, pULacZ and pAX05, were constructed to integrate the lacZ reporter gene and the C. thermocellum sigI6 gene into the amyE and sigI-rsgI loci, respectively, of B. subtilis. The plasmid pAX05 (Fig. S9A) was constructed from plasmid pAX01 carrying an erythromycin (Erm) resistance cassette and the xylose-inducible promoter PxylA 27,71. The upstream (1011 bp) and downstream (1011 bp) regions of the B. subtilis sigI-rsgI operon were used as the homologous recombination arms and amplified using primer pairs sigI-F1/sigI-R1 and rsgI-F1/rsgI-R1, respectively, from the genomic DNA of B. subtilis strain 168. The C. thermocellum sigI6 gene was amplified using the primer pair Bs-sigI6-F1/Bs-sigI6-R1. Then the DNA fragments of the homologous recombination arms, the promoter PxylA, the sigI6 gene, and the linearized pAX01 vector generated by PCR were ligated simultaneously with the One Step Cloning Kit (Vazyme), thereby obtaining the pAX05 plasmid. The plasmid pULacZ was constructed from the pUC19 vector (Fig. S9B). A spectinomycin (Spc)-resistance gene as a selectable marker 72 was amplified from plasmid pLH-16 (provided by Mr. Hui Li, Qingdao Institute of Bioenergy and Bioprocess Technology). The upstream (1074 bp) and downstream (825 bp) regions of B. subtilis amyE were used as the homologous recombination arms and amplified using primer pairs amyE-F1/amyE-R1 and amyE-F2/amyE-R2. The reporter lacZ gene was amplified from the E. coli genome using the primer pair lacZ-F1/lacZ-R1. The promoter PsigI6 was amplified from C. thermocellum genomic DNA. Then the DNA fragments and the linearized pUC19 vector generated by PCR were ligated, generating the pULacZ plasmid. The plasmids containing the mutation of SigI or PsigI6 were obtained by site-directed mutagenesis using the primer pairs listed in Supplementary Data 2. All the strains used are listed in Table S2.
B. subtilis strains were grown on LB, SM1, or SM2 media 73 at 37 °C. The competent cells of B. subtilis 168 were prepared following the reported protocol 73. B. subtilis 168 was transformed with pAX05 and pULacZ plasmids successively. The transformants were selected with 3 µg/mL Erm and 100 µg/mL Spc. Chromosomal integration of plasmids by a double-crossover event was confirmed by colony PCR using the primers listed in Supplementary Data 2.

Promoter activity analysis by the B. subtilis reporter system
To measure the β-galactosidase activity of LacZ in the B. subtilis reporter system, strain samples were inoculated into MCSE media with Erm and Spc, and the culture was shaken at 250 rpm until OD600nm = 0.4-0.5. Then xylose was added to the final concentration of 1% to induce the expression of SigI for 2 h 28,74. The β-galactosidase activity was analyzed using ortho-nitrophenyl-β-galactoside (ONPG) as the substrate according to the previously described procedures 28. Briefly, 4 mL of the cell cultures was centrifuged at 5000 g for 10 min, and the cell pellet was washed twice with Z-buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4, pH 7.0) and resuspended in 700 μL working buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4, and 2.7 mM β-mercaptoethanol, pH 7.0). The cells were lysed by ultrasonication and the lysate was centrifuged at 13,000 g for 10 min. The 100 μL enzymatic reaction system contained different volumes of cell lysate, 10 μL ONPG stock solution (13.1 mg/mL in double distilled water), and the working buffer to make up a volume of 100 μL. The reaction system was incubated at 37 °C for a certain period and then 40 μL of the reaction solution was added to 200 μL 1 M Na2CO3 to terminate the reaction. The released 2-nitrophenol (ONP) was measured by determining the absorbance at 420 nm (A420nm). One unit of enzyme activity was defined as the amount of β-galactosidase that releases 1 nmol of ONP per minute. The enzymatic activity was normalized with cell density (OD600nm).
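The unit definition above (1 nmol ONP released per minute, normalized to OD600nm) can be turned into a short calculation. The sketch below is one possible way to do this and is not taken from the cited procedure: it assumes a molar extinction coefficient for ONP of 4500 M⁻¹ cm⁻¹ at 420 nm under alkaline conditions and a 1 cm path length, both of which should be replaced by values calibrated for the instrument used. The 6-fold dilution follows from adding 40 μL of reaction to 200 μL of Na2CO3; the input readings in the example are hypothetical.

```python
def beta_gal_units_per_od(a420: float, minutes: float, od600: float,
                          epsilon_M_cm: float = 4500.0,  # assumed ONP extinction coefficient
                          path_cm: float = 1.0,          # assumed optical path length
                          reaction_ul: float = 100.0,
                          sampled_ul: float = 40.0,
                          stop_ul: float = 200.0) -> float:
    """β-Galactosidase activity in nmol ONP per minute per OD600 unit."""
    # Dilution introduced when the reaction sample is mixed with stop solution.
    dilution = (sampled_ul + stop_ul) / sampled_ul
    # Beer-Lambert law: ONP concentration (mol/L) in the quenched mixture.
    onp_molar_quenched = a420 / (epsilon_M_cm * path_cm)
    # ONP concentration in the original reaction.
    onp_molar_reaction = onp_molar_quenched * dilution
    # Total ONP in the reaction volume, converted from mol to nmol.
    onp_nmol = onp_molar_reaction * (reaction_ul * 1e-6) * 1e9
    units = onp_nmol / minutes  # nmol ONP per minute
    return units / od600


# Hypothetical readings: A420 = 0.45 after 30 min, culture OD600 = 0.5.
print(round(beta_gal_units_per_od(a420=0.45, minutes=30.0, od600=0.5), 3))
```

Because different volumes of lysate were used per reaction, a complete calculation would also scale by the lysate volume; that factor is omitted here because the text specifies only normalization by cell density.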

Fluorescence-detected in vitro transcription assay
Transcription activity was measured by utilizing the significantly enhanced fluorescence of TO1-3PEG-Biotin when the Mango riboswitch is engaged 75, an approach that has been successfully used to study transcriptional activities of various RNAPs 39,76. Briefly, to measure the transcriptional activity of SigI6 mutants or PsigI6 mutants, reaction mixtures (20 μL) containing the C. thermocellum RNAP core enzyme (final concentration 50 nM), promoter DNA or its mutants (final concentration 50 nM), and SigI6 or its mutants (100 nM) in reaction buffer (50 mM Tris-HCl pH 7.9, 100 mM KCl, 10 mM MgCl2, 1 mM DTT, 5% glycerol, and 0.01% Tween-20) were incubated at room temperature for 10 min. The reactions were initiated by the addition of 2 μL NTP mixture (UTP, ATP, GTP, and CTP; final concentration 0.1 mM of each) and 2 μL TO1-3PEG-Biotin (final concentration 0.5 μM), and the reaction mixture was incubated at 55 °C for 30 min. The fluorescence signals were measured using a plate reader (SpectraMax M2, Molecular Devices) at an excitation wavelength of 510 nm and an emission wavelength of 550 nm.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The models and cryo-EM maps have been deposited in the Protein Data Bank and the EMDB under accession numbers 8I23 and EMD-35130 for RPo-SigI1 and 8I24 and EMD-35131 for RPo-SigI6, respectively. Other structure data used in this study for analysis (7MKP, 6CA0, 7CKQ, 6MPJ, 5ZX2, 6IVU) are available in the Protein Data Bank. Protein sequences used in this study are available from UniProt under accession codes A3DBH0 (SigI1) [https://www.uniprot.org/uniprotkb/A3DBH0/entry] and A3DH98 (SigI6) [https://www.uniprot.org/uniprotkb/A3DH98/entry]. Source data are provided with this paper.

Fig. 1 | Cryo-EM structures of RPo-SigI1 and RPo-SigI6 from C. thermocellum. A The nucleic-acid scaffolds used for structure determination. P1 and P6 are SigI1- and SigI6-dependent promoters, respectively. The transcription bubbles observed in the structures are indicated by dashed rectangles. Nucleotides that could not be modeled in the structures because of poor density are shown in gray font. The filled triangles indicate the transcription start sites (TSS) reported in the literature 27, which differ by one nucleotide in the alignment. For convenient comparison between the two promoters, the nucleotides in P6 are numbered according to the alignment with P1 instead of the TSS of P6. The −35 element, A-tract motif, −10 element, and discriminator are shaded in blue, green, yellow, and orange, respectively. B The cryo-EM density map of RPo-SigI6. Each subunit of RPo-SigI6 and each DNA strand is colored differently: β, green; β', cyan; α1, khaki; α2, dark khaki; ω, yellow; SigI6, red; NT-strand DNA, deep blue; T-strand DNA, orange. C RPo-SigI6 presents a closed conformation of the β-β' clamp. For comparison, E. coli RNAP structures are shown in the open (RNAP core enzyme) and closed (RPo-σ A ) conformations. The clamp distances between residues β G373 and β' I290 for E. coli RNAP and residues β G242 and β' I302 for C. thermocellum RNAP are labeled. D The organization of SigI6 (red) and P6 (light blue and orange) on the RNAP core (gray). E Comparison of the σ I and promoter conformations in RPo-SigI1 (pink) and RPo-SigI6 (red). The structures were superimposed on the whole complexes, and the subunits of the RNAP cores are not shown.

Fig. 2 | Interactions between the promoter DNA −35 element and SigI6 in the RPo-SigI6 structure. A Comparison of the DNA-binding modes of the SigI6C domain (red) and the σ 4 domain in σ A from Bacillus subtilis (PDB 7CKQ, gray). The helices in the helix-turn-helix (HTH) motifs are shown as cylinders. The two panels on the right demonstrate that when the structures are superimposed on the HTH motif, the respective NT strands of the DNA run in opposite directions (indicated by 5'→3'); when the structures are superimposed on the promoter DNAs, the HTH motifs exhibit a ~180° rotation. B The detailed interactions in the major and minor grooves of the −35 element DNA. Residues involved in the interactions are shown as sticks. SigI6C, red; NT-strand DNA, light blue; T-strand DNA, orange. C Comparison of the DNA-binding modes of the SigI6C-promoter complex (red) and the transcription repressor Lmo0178-operator DNA complex (PDB 5F7Q, gray) from Listeria monocytogenes. Lmo0178 is a dimer in the structure, and one Lmo0178 molecule (Lmo0178-1) is superimposed on SigI6C. Residues H171 in SigI6 and K9 in Lmo0178, which similarly penetrate the minor groove, are shown as sticks with labels. Lmo0178 contains a wing loop, which additionally binds to the upstream minor groove. D Activities of SigI6 mutants measured by the Bacillus subtilis heterologous reporter system. E In vitro transcriptional activities of SigI6 mutants and promoter mutants. The bars are filled with the following colors: green, wild-type and C167S mutant; yellow and orange, mutants for residues potentially interacting with the minor and major grooves of the −35 element, respectively; gray, mutants of the −10 element or residues potentially interacting with the −10 element. Data are presented as mean values ± SD, with n = 3 biological replicates in D and E. Source data of D and E are provided as a Source data file.

Fig. 3 | Interactions between the promoter DNA −10 element and SigIN in the RPo-σ I structures. A Comparison of the DNA binding by the σ 2 domain of SigI, group I, group III, and group IV σ factors. SigI1N and SigI6N are shown in pink and red, respectively, and other group σ factors are shown in gray. The schematic diagrams of the promoter −10 element recognition by different types of σ factors are shown at the bottom, indicating more interactions between SigIN and the −10 element (blue) than those between other group σ factors and the −10 elements (dark gray). B Interactions between C. thermocellum SigI6 (pink) and transcription-bubble DNA (blue). SigI6 is rendered as a surface, and the residues involved in protein-DNA interactions are shown as red sticks. C Activities of various SigI6 mutants in the SigI6N region or PsigI6 mutants in the −10 region, measured by the B. subtilis heterologous reporter system. Green, wild-type and C167S mutant; gray, mutants of the −10 element or residues potentially interacting with the −10 element. Data are presented as mean values ± SD, with n = 3 biological replicates in C. Source data of C are provided as a Source data file.

Fig. 4 | Comparison of SigI1C in the active (i.e., in RPo complex) and the inactive (i.e., RsgI-bound) states. A Comparison of SigI1C structures in the active (pink) and inactive (PDB 6IVU, gray) states. B Interaction between SigI1C (pink) and RNAP βFTH (light purple). C Interaction between inactive SigI1C (gray) and RsgI1N (orange). In B and C, residues involved in hydrophobic, hydrogen-bonding, and electrostatic interactions are shown as yellow, green, and blue sticks, respectively.

Fig. 5 | Schematic diagrams of the promoter recognition by σ I and by the four groups of σ 70 family σ factors. A Domain organization of the different groups of the σ 70 family σ factors. The promoter regions recognized by the different domains are indicated by arrows. B Cartoon models of the promoter binding mode by σ I (top) and other σ 70 factors (bottom). For the cartoon of groups I-IV, the domains existing in only part of the groups are shown as dashed lines. The −35 element, A-tract, −10 element, and discriminator DNAs are colored in blue, green, yellow, and orange, respectively. The SigIC, σ 4 , σ 3 , σ 2 , σ 1 , and NCR domains of σ factors are colored in red, cyan, khaki, pink, light green, and purple, respectively.