Structure and function of virion RNA polymerase of a crAss-like phage


CrAss-like phages are a recently described expansive group of viruses that includes the most abundant virus in the human gut (refs. 1–3). The genomes of all crAss-like phages encode a large virion-packaged protein (refs. 2, 4) that contains a DFDxD sequence motif, which forms the catalytic site in cellular multisubunit RNA polymerases (RNAPs) (ref. 5). Here, using Cellulophaga baltica crAss-like phage phi14:2 as a model system, we show that this protein is a DNA-dependent RNAP that is translocated into the host cell along with the phage DNA and transcribes early phage genes. We determined the crystal structure of this 2,180-residue enzyme in a self-inhibited state, which probably occurs before virion packaging. This conformation is attained with the help of a cleft-blocking domain that interacts with the active site and occupies the cavity in which the RNA–DNA hybrid binds. Structurally, phi14:2 RNAP is most similar to eukaryotic RNAPs that are involved in RNA interference (refs. 6, 7), although most of the phi14:2 RNAP structure (nearly 1,600 residues) maps to a new region of the protein fold space. Considering this structural similarity, we propose that eukaryal RNA interference polymerases have their origins in phage, which parallels the emergence of the mitochondrial transcription apparatus (ref. 8).

Fig. 1: In vitro transcription activity of the phi14:2 RNAP gp66.
Fig. 2: Global analysis of phi14:2 transcription during infection.
Fig. 3: The phi14:2 RNAP gp66 is homologous to single- and multi-subunit RNAPs.

Data availability

The following publicly available datasets were used in the study: GenBank reference genome sequences of phages phi14:2 (NC_021806.1), phicrAss001 (MH675552.1) and IAS (KJ003983); phage genomes from the Data S1 dataset found in the supplementary information of ref. 13; and PDB atomic models of proteins with the accession numbers 2J7N, 2O5J, 4C2M and 1Y1W. The genome of C. baltica strain 14 has been deposited to the NCBI BioProject and is accessible through the BioProject ID PRJNA552277. The RNA-seq datasets have been deposited to the NCBI Gene Expression Omnibus and are accessible through the GEO Series accession number GSE133609. The refined atomic model of phi14:2 gp66 and the X-ray structure factors have been deposited to the PDB under the accession number 6VR4. The uncropped gels used for Figs. 1, 2 and Extended Data Figs. 1, 6 are shown in the Supplementary Information. Source data are provided with this paper.

Code availability

The custom code, information about the software used in this study, and annotations of genomes of crAss-like phages are available from GitHub at


References

  1. Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).

  2. Yutin, N. et al. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat. Microbiol. 3, 38–46 (2018).

  3. Koonin, E. V. & Yutin, N. The crAss-like phage group: how metagenomics reshaped the human virome. Trends Microbiol. 28, 349–359 (2020).

  4. Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013).

  5. Werner, F. & Grohmann, D. Evolution of multisubunit RNA polymerases in the three domains of life. Nat. Rev. Microbiol. 9, 85–98 (2011).

  6. Cogoni, C. & Macino, G. Gene silencing in Neurospora crassa requires a protein homologous to RNA-dependent RNA polymerase. Nature 399, 166–169 (1999).

  7. Salgado, P. S. et al. The structure of an RNAi polymerase links RNA silencing and transcription. PLoS Biol. 4, e434 (2006).

  8. Shutt, T. E. & Gray, M. W. Bacteriophage origins of mitochondrial replication and transcription proteins. Trends Genet. 22, 90–95 (2006).

  9. Zhang, G. et al. Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 Å resolution. Cell 98, 811–824 (1999).

  10. Sidorenkov, I., Komissarova, N. & Kashlev, M. Crucial role of the RNA:DNA hybrid in the processivity of transcription. Mol. Cell 2, 55–64 (1998).

  11. Campbell, E. A. et al. Structural mechanism for rifampicin inhibition of bacterial RNA polymerase. Cell 104, 901–912 (2001).

  12. Shkoporov, A. N. et al. ΦCrAss001 represents the most abundant bacteriophage family in the human gut and infects Bacteroides intestinalis. Nat. Commun. 9, 4781 (2018).

  13. Guerin, E. et al. Biology and taxonomy of crAss-like bacteriophages, the most abundant virus in the human gut. Cell Host Microbe 24, 653–664.e6 (2018).

  14. Paget, M. S. Bacterial sigma factors and anti-sigma factors: structure, function and distribution. Biomolecules 5, 1245–1265 (2015).

  15. Vassylyev, D. G. et al. Structural basis for substrate loading in bacterial RNA polymerase. Nature 448, 163–168 (2007).

  16. Wang, D., Bushnell, D. A., Westover, K. D., Kaplan, C. D. & Kornberg, R. D. Structural basis of transcription: role of the trigger loop in substrate specificity and catalysis. Cell 127, 941–954 (2006).

  17. Cramer, P., Bushnell, D. A. & Kornberg, R. D. Structural basis of transcription: RNA polymerase II at 2.8 ångstrom resolution. Science 292, 1863–1876 (2001).

  18. Lane, W. J. & Darst, S. A. Molecular evolution of multisubunit RNA polymerases: structural analysis. J. Mol. Biol. 395, 686–704 (2010).

  19. Krissinel, E. & Henrick, K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D 60, 2256–2268 (2004).

  20. Holm, L. Benchmarking fold detection by DaliLite v.5. Bioinformatics 35, 5326–5327 (2019).

  21. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

  22. Engel, C., Sainsbury, S., Cheung, A. C., Kostrewa, D. & Cramer, P. RNA polymerase I structure and transcription regulation. Nature 502, 650–655 (2013).

  23. Fernández-Tornero, C. et al. Crystal structure of the 14-subunit RNA polymerase I. Nature 502, 644–649 (2013).

  24. Murakami, K. S., Davydova, E. K. & Rothman-Denes, L. B. X-ray crystal structure of the polymerase domain of the bacteriophage N4 virion RNA polymerase. Proc. Natl Acad. Sci. USA 105, 5046–5051 (2008).

  25. Gleghorn, M. L., Davydova, E. K., Rothman-Denes, L. B. & Murakami, K. S. Structural basis for DNA-hairpin promoter recognition by the bacteriophage N4 virion RNA polymerase. Mol. Cell 32, 707–717 (2008).

  26. Iyer, L. M., Koonin, E. V. & Aravind, L. Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Struct. Biol. 3, 1 (2003).

  27. Shabalina, S. A. & Koonin, E. V. Origins and evolution of eukaryotic RNA interference. Trends Ecol. Evol. 23, 578–587 (2008).

  28. Aalto, A. P., Poranen, M. M., Grimes, J. M., Stuart, D. I. & Bamford, D. H. In vitro activities of the multifunctional RNA silencing polymerase QDE-1 of Neurospora crassa. J. Biol. Chem. 285, 29367–29374 (2010).

  29. Lee, H. C. et al. The DNA/RNA-dependent RNA polymerase QDE-1 generates aberrant RNA and dsRNA for RNAi in a process requiring replication protein A and a DNA helicase. PLoS Biol. 8, e1000496 (2010).

  30. Holmfeldt, K., Middelboe, M., Nybroe, O. & Riemann, L. Large variabilities in host strain susceptibility and phage host range govern interactions between lytic marine phages and their Flavobacterium hosts. Appl. Environ. Microbiol. 73, 6730–6739 (2007).

  31. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  32. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

  33. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  34. Liao, Y., Smyth, G. K. & Shi, W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 47, e47 (2019).

  35. Sokolova, M. et al. A non-canonical multisubunit RNA polymerase encoded by the AR9 phage recognizes the template strand of its uracil-containing promoters. Nucleic Acids Res. 45, 5958–5967 (2017).

  36. Kabsch, W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D 66, 133–144 (2010).

  37. Pape, T. & Schneider, T. R. HKL2MAP: a graphical user interface for macromolecular phasing with SHELX programs. J. Appl. Crystallogr. 37, 843–844 (2004).

  38. Sheldrick, G. M. Experimental phasing with SHELXC/D/E: combining chain tracing with density modification. Acta Crystallogr. D 66, 479–485 (2010).

  39. Cowtan, K. Recent developments in classical density modification. Acta Crystallogr. D 66, 470–478 (2010).

  40. Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D 67, 235–242 (2011).

  41. Read, R. J. & McCoy, A. J. Using SAD data in Phaser. Acta Crystallogr. D 67, 338–344 (2011).

  42. Cowtan, K. The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr. D 62, 1002–1011 (2006).

  43. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010).

  44. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010).

  45. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D 66, 12–21 (2010).

  46. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

  47. Kettenberger, H., Armache, K. J. & Cramer, P. Complete RNA polymerase II elongation complex structure and its interactions with NTP and TFIIS. Mol. Cell 16, 955–965 (2004).

Acknowledgements

We thank S. Medvedeva for help with the promoter search. The study was carried out using resources of the Skoltech Genomics Core Facility. The work was supported by the Russian Science Foundation (grant 19-74-00011 to M.L.S.). This research used resources of the Advanced Photon Source, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under contract no. DE-AC02-06CH11357. The use of the LS-CAT Sector 21 was supported by the Michigan Economic Development Corporation and the Michigan Technology Tri-Corridor (grant 085P1000817).

Author information

Contributions

K.V.S., M.L.S. and E.V.K. conceived the study. K.H. and E.N. provided C. baltica cells, phi14:2 phage, and phi14:2 DNA. A.V.D. cultivated C. baltica and phi14:2, prepared RNA for RNA-seq and primer extension experiments and performed RT–qPCR. S.A.P. purified phi14:2 RNAP and its mutants, performed in vitro transcription assays, and some of the primer extension experiments. M.V.K. processed and analysed RNA-seq data. M.V.K., N.Y. and K.S.M. annotated crAss-like phage genomes. E.I.K. performed mutagenesis of phi14:2 RNAP. L.M. performed primer extension experiments. M.V.Y. purified C. baltica RNAP. M.L.S. performed a search for promoters, prepared crystals and supervised the project. P.G.L. solved the crystal structure, and built and refined the atomic model. M.L.S., P.G.L. and S.B. analysed the structure. S.B. examined the activation of the enzyme by single-stranded DNA oligonucleotides. M.L.S., P.G.L. and K.V.S. wrote the manuscript, which was read, edited and approved by all authors.

Corresponding authors

Correspondence to Konstantin V. Severinov, Petr G. Leiman or Maria L. Sokolova.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Ryland Young and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Purification of the phi14:2 RNAP gp66 and analysis of its transcriptional activity on an RNA–DNA scaffold.

a, Wild-type gp66 and three aspartate-to-alanine point mutants in the catalytic DFDID motif are visualized on SDS-PAGE. The panel depicts one of two repeat experiments with similar outcomes. b, Unlike E. coli (Eco) and T7 phage RNAPs, gp66 does not extend an RNA primer of an RNA–DNA scaffold in the presence of rNTPs. The sequences of RNA–DNA scaffolds used are shown below the gels. The RNA was radioactively labelled at the 5′ end. The reaction products were resolved by electrophoresis in 16% (w/v) polyacrylamide gel containing 8 M urea and revealed by autoradiography. The assay was performed twice for each of two biological replicates. The uncropped SDS-PAGE gel and autoradiograms are shown in Supplementary Fig. 1.
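
As an illustrative sketch only (not the authors' procedure), a DFDxD-type motif and the corresponding aspartate-to-alanine substitutions can be located in a protein sequence with a short pattern search. The toy sequence, the function names and the residue numbering below are hypothetical; only the DFDxD pattern itself is taken from the text.

```python
import re

# 'x' in DFDxD stands for any residue; the lookahead also reports overlapping hits.
MOTIF = re.compile(r"(?=(DFD.D))")

def find_motifs(seq):
    """Return (1-based start position, matched text) for every DFDxD occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq)]

def asp_to_ala_mutants(seq, motif_start):
    """Return (label, mutant sequence) for D->A at motif positions 1, 3 and 5.

    motif_start is the 1-based position of the first aspartate of the motif.
    """
    mutants = []
    for offset in (0, 2, 4):
        pos = motif_start + offset          # 1-based residue number
        if seq[pos - 1] != "D":
            raise ValueError(f"residue {pos} is not an aspartate")
        mutants.append((f"D{pos}A", seq[:pos - 1] + "A" + seq[pos:]))
    return mutants

# Hypothetical toy sequence, NOT the gp66 sequence.
toy = "MKTDFDIDGLS"
hits = find_motifs(toy)
for start, text in hits:
    print(start, text, [label for label, _ in asp_to_ala_mutants(toy, start)])
```

With the real gp66 sequence (deposited under the accessions listed in the Data availability section), the same search would be expected to report the catalytic aspartates at their native positions, but that has not been run here.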

Extended Data Fig. 2 General parameters of phi14:2 infection and temporal patterns of transcript accumulation of selected early, middle and late phi14:2 genes visualized by RT–qPCR and RNA-seq.

a, Growth curves of C. baltica infected with phi14:2 at different MOIs (mean ± s.d. of three biological replicates). For each condition, the OD600 was normalized to its value measured immediately after the phage was added to the culture (time point 0). b, Single-step multiplication of phi14:2 in C. baltica at an MOI of 0.001. The number of PFUs (mean ± s.d. of three biological replicates) is given for a standard infection protocol (no rifampicin, black line) and for infection in the presence of rifampicin (red line). The value of PFU was normalized to that measured at time point 0. c, Black and red lines and symbols correspond to the infection of C. baltica under a standard protocol and that in the presence of rifampicin, respectively. RT–qPCR reactions are quantified by a cycle threshold (Ct) parameter at which the RT–qPCR signal exceeds a preset value. The signal is converted to the original transcript abundance by plotting it as \(2^{-\mathrm{Ct}}\). The transcript abundance is normalized to that of the C. baltica 16S ribosomal RNA (rRNA) as follows: \(2^{-\mathrm{Ct}_{\mathrm{gene}}}/2^{-\mathrm{Ct}_{\mathrm{16S\,rRNA}}}=2^{-\Delta\mathrm{Ct}}\), where \(\Delta\mathrm{Ct}=\mathrm{Ct}_{\mathrm{gene}}-\mathrm{Ct}_{\mathrm{16S\,rRNA}}\). Each RT–qPCR reaction contained three technical replicates, resulting in up to nine values of \(\Delta\mathrm{Ct}_{ij}=\mathrm{Ct}_{\mathrm{gene},i}-\mathrm{Ct}_{\mathrm{16S\,rRNA},j}\), with \(i,j=1,2,3\). The line connects the mean values of the function \(2^{-\Delta\mathrm{Ct}_{ij}}\) for each time point that are labelled with a larger symbol. The corresponding RNA-seq data are shown next to the RT–qPCR plots. The RT–qPCR and RNA-seq analyses were performed on different biological replicates.
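
The normalization described in panel c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name and the Ct values are hypothetical, and the only parts taken from the legend are the formula 2^(−ΔCt), the pairing of every gene replicate i with every 16S rRNA replicate j, and the averaging of the resulting values.

```python
from itertools import product
from statistics import mean

def relative_abundance(ct_gene, ct_ref):
    """Return (mean, values) of 2**-(Ct_gene_i - Ct_ref_j) over all pairs (i, j).

    ct_gene and ct_ref are lists of technical-replicate Ct values for the gene
    of interest and for the reference transcript (16S rRNA in the legend).
    """
    values = [2.0 ** -(g - r) for g, r in product(ct_gene, ct_ref)]
    return mean(values), values

# Hypothetical Ct values for one time point (three technical replicates each);
# these numbers are made up for illustration and are not data from the paper.
ct_gene = [24.0, 25.0, 26.0]
ct_16s = [10.0, 10.0, 10.0]

avg, values = relative_abundance(ct_gene, ct_16s)
print(f"{len(values)} pairwise values, mean 2^-dCt = {avg:.3e}")
```

Averaging 2^(−ΔCt) rather than ΔCt itself matches the legend's statement that the plotted line connects the mean values of the function 2^(−ΔCt_ij); the two averages differ because the transformation is nonlinear.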

Extended Data Fig. 3 SAD-derived electron density of the catalytic loop and cleft-blocking domain of phi14:2 RNAP gp66.

All panels show the initial experimental electron density map, which was calculated using SeMet SAD phases that were improved by solvent flattening and twofold non-crystallographic averaging, and the final refined model. a, b, Two orthogonal views of the catalytic site region. The Cα atoms of the catalytic loop are coloured yellow. The map is contoured at 2 s.d. above the mean. c, The structure of the cleft-blocking domain. The Cα atoms of the cleft-blocking domain are coloured cyan. The map is contoured at 1 s.d. above the mean. The orientation of the molecule and domain colour code are both similar to those shown in Extended Data Fig. 4.

Extended Data Fig. 4 Domain organization and functional elements of phi14:2 RNAP gp66.

phi14:2 gp66 (middle) can be divided into 12 domains, each containing its own separate hydrophobic core (except the cleft-blocking domain, owing to its small size). To improve clarity, the overall structure is shown with a substantial degree of depth cueing, resulting in some parts of the structure (e.g. domain VII) being almost invisible. All panels that show the structure are to scale. The colour code, domain boundaries and secondary structure are given in the lower panels. Known and putative functional elements are labelled. The functions of the channels that span the molecule were assigned as follows: gp66 was first superimposed onto the T. thermophilus RNAP elongation complex crystal structure (PDB ID: 2O5J; ref. 15), and the RNA and DNA molecules were then extracted from the latter. Then, while keeping the RNA–DNA duplex part of the complex stationary, both the RNA and DNA tails were adjusted as rigid bodies to minimize clashes with gp66, and their geometries were regularized.

Extended Data Fig. 5 The cleft-blocking domain of the phi14:2 RNAP gp66 and the expander element of Pol I clash with nucleic acids in the RNAP elongation complex.

a, Superposition of the Pol I catalytic site (PDB ID: 4C2M; ref. 22) onto that of the Pol II elongation complex structure (PDB ID: 1Y1W; ref. 47). b, Superposition of the gp66 catalytic site onto that of the Pol II elongation complex structure (PDB ID: 1Y1W; ref. 47). c, Two orthogonal views of the superposition of gp66 and Pol I catalytic sites (PDB ID: 4C2M; ref. 22).

Extended Data Fig. 6 A model of the gp66 catalytic site in the active conformation.

a, DNA-dependent RNA synthesis activity of wild-type gp66 and its two catalytic loop mutants as measured on denatured phi14:2 DNA. The uncropped autoradiogram is shown in Supplementary Fig. 5. The assay was performed three times for each of two biological replicates. b, The crystal structure of the catalytic site of the T. thermophilus RNAP (the elongation state conformation, PDB ID: 2O5J; ref. 15). Residues of the β subunit are labelled with a superscript index. c, Possible configuration of the catalytic site of the wild-type gp66 (tan coloured) and its I1364G and I1364W mutants (coloured cyan and magenta, respectively) in the active conformation. The models were obtained by rebuilding and regularizing the geometry of the polypeptide chain to make the side chains of the three catalytic aspartates (D1361, D1363 and D1365) point in the same direction. For each of the aspartates, a rotamer that brings the side chains closer in space was chosen. The magnesium ion was placed to match its dictionary distance value. In b, c, the bottom panels show the Ramachandran angles of the amino acids comprising the catalytic loop (the Ramachandran plots of gp66 and its two mutants are overlaid on top of each other); residues that interact with the catalytic loop are semi-transparent and represent the crystal structure conformation.
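
Ramachandran angles are backbone dihedrals: φ is defined by the atoms C(i−1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral from four atomic positions using the atan2 formulation. It is an illustration only: the function name is hypothetical and the coordinates are made-up test points, not atoms from the gp66 model (PDB 6VR4).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up test geometries: eclipsed (cis), perpendicular and anti (trans).
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perp = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, perp, trans)
```

For a real residue, φ would be obtained by passing the coordinates of C(i−1), N(i), Cα(i) and C(i) in that order, and ψ by passing N(i), Cα(i), C(i) and N(i+1).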

Extended Data Table 1 Data collection and refinement statistics
Extended Data Table 2 Absolutely conserved amino acids of RNAPs of crAss-like phages and their analogues in other RNAPs based on structural alignments

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary Methods, Supplementary Figures 1-7, Supplementary Tables 1-6, and Supplementary References.

Reporting Summary

Video 1

Structure of gp66 RNAP with putative locations of DNA and RNA molecules. The domain colour code is the same as in Extended Data Fig. 4. The location of the RNA and DNA molecules and that of the RNA–DNA duplex are derived from the superposition of gp66 onto the crystal structure of the T. thermophilus RNAP elongation complex (PDB code 2O5J). To avoid clashes with gp66, the directions of the RNA and DNA tails were adjusted, and then their geometries were minimized. Neither the structure nor the location of the RNA–DNA duplex was changed in this procedure.

Peer Review File

About this article

Cite this article

Drobysheva, A.V., Panafidina, S.A., Kolesnik, M.V. et al. Structure and function of virion RNA polymerase of a crAss-like phage. Nature 589, 306–309 (2021).
