Abstract
Bacteria respond to environmental stimuli through precise regulation of transcription initiation and elongation. Bulk RNA sequencing primarily characterizes mature transcripts, so identifying actively transcribed loci requires capturing RNA polymerase (RNAP) complexed with nascent RNA. However, such capture methods have previously been applied only to culturable, genetically tractable organisms such as E. coli and B. subtilis. Here we apply precision run-on sequencing (PRO-seq) to profile nascent transcription in cultured E. coli and diverse uncultured bacteria. We demonstrate that PRO-seq can characterize the transcription of small, structured, or post-transcriptionally modified RNAs, which are often absent from bulk RNA-seq libraries. Applying PRO-seq to the human microbiome highlights taxon-specific RNAP pause motifs and pause-site distributions across non-coding RNA loci that reflect structure-coincident pausing. We also uncover concurrent transcription and cleavage of CRISPR guide RNAs and transfer RNAs. Together, these results demonstrate the utility of PRO-seq for exploring transcriptional dynamics in diverse microbial communities.
Data availability
Sequencing data produced in this project were uploaded to NCBI’s Sequence Read Archive and are associated with BioProjects PRJNA800038 and PRJNA800070.
Code availability
Scripts and notebooks used to process and visualize sequencing data are available at https://github.com/britolab/PRO-seq.
References
Wissink, E. M., Vihervaara, A., Tippens, N. D. & Lis, J. T. Nascent RNA analyses: tracking transcription and its regulation. Nat. Rev. Genet. 20, 705–723 (2019).
Larson, M. H. et al. A pause sequence enriched at translation start sites drives transcription dynamics in vivo. Science 344, 1042–1047 (2014).
Imashimizu, M. et al. Visualizing translocation dynamics and nascent transcript errors in paused RNA polymerases in vivo. Genome Biol. 16, 98 (2015).
Sharma, C. M. et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464, 250–255 (2010).
Thomason, M. K. et al. Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in Escherichia coli. J. Bacteriol. 197, 18–28 (2015).
Ettwiller, L., Buswell, J., Yigit, E. & Schildkraut, I. A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome. BMC Genomics 17, 199 (2016).
Mahat, D. B. et al. Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat. Protoc. 11, 1455–1476 (2016).
Blumberg, A. et al. Characterizing RNA stability genome-wide through combined analysis of PRO-seq and RNA-seq data. BMC Biol. https://doi.org/10.1186/s12915-021-00949-x (2021).
Mentesana, P. E., Chin-Bow, S. T., Sousa, R. & McAllister, W. T. Characterization of halted T7 RNA polymerase elongation complexes reveals multiple factors that contribute to stability. J. Mol. Biol. 302, 1049–1062 (2000).
Blumberg, A., Rice, E. J., Kundaje, A., Danko, C. G. & Mishmar, D. Initiation of mtDNA transcription is followed by pausing, and diverges across human cell types and during evolution. Genome Res. 27, 362–373 (2017).
Alberti, A. et al. Comparison of library preparation methods reveals their impact on interpretation of metatranscriptomic data. BMC Genomics 15, 912 (2014).
Dartigalongue, C., Missiakas, D. & Raina, S. Characterization of the Escherichia coli σE regulon. J. Biol. Chem. 276, 20866–20875 (2001).
Wesolowska-Andersen, A. et al. Choice of bacterial DNA extraction method from fecal material influences community structure as evaluated by metagenomic analysis. Microbiome 2, 19 (2014).
Liu, X. & Martin, C. T. Transcription elongation complex stability: the topological lock. J. Biol. Chem. 284, 36262–36270 (2009).
Liu, F. et al. Systematic evaluation of the viable microbiome in the human oral and gut samples with spike-in Gram+/− bacteria. mSystems 8, e0073822 (2023).
Croucher, N. J. & Thomson, N. R. Studying bacterial transcriptomes using RNA-seq. Curr. Opin. Microbiol. 13, 619–624 (2010).
Ye, Y. & Zhang, Q. Characterization of CRISPR RNA transcription by exploiting stranded metatranscriptomic data. RNA 22, 945–956 (2016).
Charpentier, E., Richter, H., van der Oost, J. & White, M. F. Biogenesis pathways of RNA guides in archaeal and bacterial CRISPR-Cas adaptive immunity. FEMS Microbiol. Rev. 39, 428–441 (2015).
Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007).
Richter, H. et al. Characterization of CRISPR RNA processing in Clostridium thermocellum and Methanococcus maripaludis. Nucleic Acids Res. 40, 9887–9896 (2012).
Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602–607 (2011).
Xue, C. & Sashital, D. G. Mechanisms of type I-E and I-F CRISPR-Cas systems in Enterobacteriaceae. EcoSal Plus https://doi.org/10.1128/ecosalplus.ESP-0008-2018 (2019).
Xu, H., Yao, J., Wu, D. C. & Lambowitz, A. M. Improved TGIRT-seq methods for comprehensive transcriptome profiling with decreased adapter dimer formation and bias correction. Sci. Rep. 9, 7953 (2019).
Boivin, V. et al. Reducing the structure bias of RNA-seq reveals a large number of non-annotated non-coding RNA. Nucleic Acids Res. 48, 2271–2286 (2020).
Marbaniang, C. N. & Vogel, J. Emerging roles of RNA modifications in bacteria. Curr. Opin. Microbiol. 30, 50–57 (2016).
de Crécy-Lagard, V. & Jaroch, M. Functions of bacterial tRNA modifications: from ubiquity to diversity. Trends Microbiol. 29, 41–53 (2021).
Li, Z. & Stanton, B. A. Transfer RNA-derived fragments, the underappreciated regulatory small RNAs in microbial pathogenesis. Front. Microbiol. 12, 687632 (2021).
Haiser, H. J., Karginov, F. V., Hannon, G. J. & Elliot, M. A. Developmentally regulated cleavage of tRNAs in the bacterium Streptomyces coelicolor. Nucleic Acids Res. 36, 732–741 (2008).
Schwartz, M. H. et al. Microbiome characterization by high-throughput transfer RNA sequencing and modification analysis. Nat. Commun. 9, 5353 (2018).
Shigematsu, M. et al. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs. Nucleic Acids Res. https://doi.org/10.1093/nar/gkx005 (2017).
Jiang, X. et al. Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut. Science 363, 181–187 (2019).
Lan, F. et al. Single-cell analysis of multiple invertible promoters reveals differential inversion rates as a strong determinant of bacterial population heterogeneity. Sci. Adv. 9, eadg5476 (2023).
Chatzidaki-Livanis, M., Coyne, M. J. & Comstock, L. E. A family of transcriptional antitermination factors necessary for synthesis of the capsular polysaccharides of Bacteroides fragilis. J. Bacteriol. 191, 7288–7295 (2009).
Henrot, C. & Petit, M.-A. Signals triggering prophage induction in the gut microbiota. Mol. Microbiol. 118, 494–502 (2022).
Belogurov, G. A. & Artsimovitch, I. Regulation of transcript elongation. Annu. Rev. Microbiol. 69, 49–69 (2015).
Henderson, K. L. et al. Mechanism of transcription initiation and promoter escape by E. coli RNA polymerase. Proc. Natl Acad. Sci. USA 114, E3032–E3040 (2017).
Vvedenskaya, I. O. et al. Interactions between RNA polymerase and the ‘core recognition element’ counteract pausing. Science 344, 1285–1289 (2014).
Sun, Z., Yakhnin, A. V., FitzGerald, P. C., McIntosh, C. E. & Kashlev, M. Nascent RNA sequencing identifies a widespread sigma70-dependent pausing regulated by Gre factors in bacteria. Nat. Commun. 12, 906 (2021).
Chuang, S. E. & Blattner, F. R. Characterization of twenty-six new heat shock genes of Escherichia coli. J. Bacteriol. 175, 5242–5252 (1993).
Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Rotmistrovsky, K. & Agarwala, R. BMTagger: Best Match Tagger for Removing Human Reads from Metagenomics Datasets.
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. MetaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997v2 (2013).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3, 836–843 (2018).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36, 1925–1927 (2020).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Laslett, D. & Canback, B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32, 11–16 (2004).
Seemann, T. barrnap 0.9: Rapid Ribosomal RNA Prediction. https://github.com/tseemann/barrnap
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 44, D67–D72 (2016).
Freddolino, P. L., Amini, S. & Tavazoie, S. Newly identified genetic variations in common Escherichia coli MG1655 stock cultures. J. Bacteriol. 194, 303–306 (2012).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Chávez, J. et al. Programmatic access to bacterial regulatory networks with regutools. Bioinformatics 36, 4532–4534 (2020).
Santos-Zavaleta, A. et al. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Res. 47, D212–D220 (2019).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
Skennerton, C. T. MinCED: Mining CRISPRs in Environmental Datasets. https://github.com/ctSkennerton/minced
Bland, C. et al. CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007).
Hofacker, I. L. Vienna RNA secondary structure server. Nucleic Acids Res. 31, 3429–3431 (2003).
Kerpedjiev, P., Hammer, S. & Hofacker, I. L. Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams. Bioinformatics 31, 3377–3379 (2015).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
Kassambara, A. ggpubr: ‘ggplot2’ Based Publication Ready Plots (2020). https://github.com/kassambara/ggpubr
Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020).
Wagih, O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647 (2017).
Pagès, H., Aboyoun, P., Gentleman, R. & DebRoy, S. Biostrings: Efficient Manipulation of Biological Strings. https://github.com/Bioconductor/Biostrings
Amman, F. et al. TSSAR: TSS annotation regime for dRNA-seq data. BMC Bioinformatics 15, 89 (2014).
Acknowledgements
We thank P. Diebold for helpful discussions regarding cell permeabilization and data visualization. This work was funded by the NIGMS (R01 GM147731-01, awarded to I.L.B.) and the NHGRI (R01 HG009309 and R01 HG010346, awarded to C.G.D.). I.L.B. is a Packard Foundation Fellow and a Pew Biomedical Scholar. A.C.V. is a Cornell Center for Vertebrate Genomics Distinguished Scholar.
Author information
Contributions
A.C.V., I.D.V., C.G.D. and I.L.B. conceptualized the study. A.C.V. and E.J.R. carried out experiments. A.C.V. and I.L.B. analysed the data and wrote the manuscript. All authors provided feedback and comments on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Anna Kuchina and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Metagenome characteristics and library type comparisons.
(a) Genus-level relative abundance data for US2 and US3 metagenomic assemblies, calculated with CheckM. Metagenomic bins were assigned taxonomic labels using GTDB-Tk. (b) Percent completeness and percent contamination of each of the high-quality metagenomic bins (<5% contamination; >90% completeness) included in the study for US2 and US3, as determined by CheckM. (c) Phylum-level relative abundance for all library types (including terminator exonuclease-negative and -positive dRNA-seq libraries) calculated from mapped reads using Kraken2 and Bracken. (d) Family-level relative abundance for PRO-seq and RNA-seq libraries calculated from mapped reads using Kraken2 and Bracken. Method #1 samples correspond to PRO-seq libraries that were processed without additional enzymes during permeabilization, and Method #2 samples correspond to PRO-seq libraries processed with these enzymes (see Methods). Note that, because of sample limitations, ‘Method #1’ and ‘Method #2’ metagenomes are different samples collected from the same individuals. (e) An UpSet plot showing the overlap of PRO-seq and dRNA-seq peaks coincident with promoter-proximal loci, defined as 50 bp upstream and downstream of the start codon of each open reading frame. The horizontal bar chart shows the total number of loci and the total numbers of dRNA-seq and PRO-seq peaks coincident with those loci.
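The promoter-proximal overlap in panel (e) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes start codons and peak calls are supplied as hypothetical (position, strand) tuples, and that a peak counts as coincident when it lies on the same strand within 50 bp of the start codon.

```python
from collections import Counter

WINDOW_BP = 50  # up- and downstream of each start codon, as in the caption


def has_peak(pos, strand, peaks, window=WINDOW_BP):
    """True if any same-strand peak lies within +/- window bp of pos."""
    return any(p_strand == strand and abs(p_pos - pos) <= window
               for p_pos, p_strand in peaks)


def promoter_proximal_overlap(start_codons, pro_peaks, drna_peaks, window=WINDOW_BP):
    """Count ORFs by which library (if any) has a promoter-proximal peak.

    start_codons: dict of hypothetical ORF id -> (start-codon position, strand)
    pro_peaks, drna_peaks: lists of (position, strand) peak calls
    """
    counts = Counter()
    for pos, strand in start_codons.values():
        in_pro = has_peak(pos, strand, pro_peaks, window)
        in_drna = has_peak(pos, strand, drna_peaks, window)
        if in_pro and in_drna:
            counts["both"] += 1
        elif in_pro:
            counts["PRO-seq only"] += 1
        elif in_drna:
            counts["dRNA-seq only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Hypothetical ORFs and peak calls.
orfs = {"orfA": (1000, "+"), "orfB": (5000, "-"), "orfC": (9000, "+")}
pro = [(980, "+"), (5040, "-")]
drna = [(1010, "+"), (9100, "+")]
print(promoter_proximal_overlap(orfs, pro, drna))
```

The linear scan over peaks keeps the sketch short; for whole metagenomes an interval tool such as BEDTools would be the natural substitute.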
Extended Data Fig. 2 Periodicity observed in CRISPR loci within the PRO-seq data.
Strand-specific RNA-seq and PRO-seq read depths, in addition to PRO-seq reads’ 3’- and 5’-ends, are plotted for several well-covered CRISPR loci. Shaded boxes represent repeats. Sequence logos below each plot show repeat conservation. As in Fig. 3a,b, panels (a) and (b) show PRO-seq read 5’-end pile-ups at the same position across repeats; panels (c) and (d) show PRO-seq read 5’-end pile-ups within spacers.
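The repeat-anchored 5’-end pile-ups can be illustrated with a short sketch that splits reads into RNA 5’ and 3’ ends, then tallies 5’ ends by their offset within each repeat. A single dominant offset corresponds to pile-ups at the same position across repeats. The sketch assumes, hypothetically, that reads are already oriented to the RNA strand and given as 0-based, half-open (start, end, strand) intervals; depending on the library chemistry, real PRO-seq reads may first need to be reverse-complemented.

```python
from collections import Counter


def end_pileups(reads):
    """Split aligned reads into RNA 5'-end and 3'-end pile-ups keyed by (position, strand).

    reads: iterable of (start, end, strand), 0-based half-open genome coordinates,
    assumed to be oriented to the RNA strand already.
    """
    five_prime, three_prime = Counter(), Counter()
    for start, end, strand in reads:
        if strand == "+":
            five_prime[(start, strand)] += 1
            three_prime[(end - 1, strand)] += 1
        else:  # minus-strand RNA runs right to left on the genome
            five_prime[(end - 1, strand)] += 1
            three_prime[(start, strand)] += 1
    return five_prime, three_prime


def offsets_within_repeats(pileup, repeats):
    """Tally pile-up depth by offset from each repeat's 5' boundary (transcript orientation)."""
    offsets = Counter()
    for r_start, r_end, r_strand in repeats:
        for (pos, strand), depth in pileup.items():
            if strand != r_strand or not r_start <= pos < r_end:
                continue
            offset = pos - r_start if strand == "+" else (r_end - 1) - pos
            offsets[offset] += depth
    return offsets


# Hypothetical 29-nt repeats separated by 36-nt spacers, plus one read starting in a spacer.
repeats = [(100, 129, "+"), (165, 194, "+"), (230, 259, "+")]
reads = [(121, 180, "+"), (186, 240, "+"), (186, 230, "+"), (251, 300, "+"), (140, 200, "+")]
five_prime, _ = end_pileups(reads)
print(offsets_within_repeats(five_prime, repeats))
```

In this toy example every repeat-internal 5’ end falls at offset 21, while the read beginning in a spacer is ignored; spacer-internal pile-ups (panels c and d) could be tallied the same way by passing spacer intervals instead of repeats.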
Extended Data Fig. 3 CRISPR loci in E. coli MG1655 show co-transcriptional cleavage.
(a) One of the two CRISPR loci in E. coli MG1655 is depicted under control (left) and heat-shock (right) conditions. Strand-specific RNA-seq and PRO-seq read depths, in addition to PRO-seq reads’ 3’- and 5’-ends, are plotted. Shaded boxes represent repeats. (b) Zoomed-in depiction of the PRO-seq 5’ RNA ends showing pile-ups at consistent positions within repeats. (c) Predicted crRNA repeat secondary structure. The black arrow points to the phosphodiester bond possibly cleaved by CasE during pre-crRNA processing; it marks the same position in the repeat as the arrows in (b). (d) PRO-seq captures nascent transcription of the entire CRISPR locus, situated downstream of the crRNA array, including CasE.
Extended Data Fig. 4 PRO-seq traces showing 5’ read end pile-ups within microbiota tRNAs.
(a) tRNA genes were identified in three highly complete US2 bins: Prevotella sp900313215, Prevotella sp002265625 and Prevotella copri. Different colors in the stacked bar plots represent different tRNA isoforms. (b) Representative tRNA genes, listed by sample, species annotation and anticodon, are depicted from the two human microbiome samples. PRO-seq coverage, pile-ups of PRO-seq 3’ and 5’ read ends, and RNA-seq coverage are shown for each tRNA gene (left). A zoomed-in PRO-seq read 5’-end pile-up is shown for each tRNA gene (right). Dotted lines show the boundaries of the tRNA gene.
Extended Data Fig. 5 PRO-seq traces across E. coli tRNAs show PRO-seq 5’ read end pile-ups.
Representative E. coli tRNA genes, listed by isoform, are shown for control (left) and heat-shock (right) conditions. PRO-seq coverage, pile-ups of PRO-seq 3’ and 5’ read ends, and RNA-seq coverage are shown for each tRNA gene. Arrows indicate the direction of transcription.
Extended Data Fig. 6 Sites of RNAP stalling in tRNA sequences.
All tRNA loci identified in two species, Coprococcus eutactus (US2, top) and Ruminococcus bicirculans (US3, bottom), aligned at the anticodon sequence (vertical black lines). Sequence logos show sequence conservation, and bar plots give counts of PRO-seq 3’-end peaks (Z-score > 5, see Methods) at each aligned position. Secondary structures for representative tRNA sequences (yellow stars) are given at the right, with density plots reiterating the 3’ peak count data overlaid on the tRNA structures.
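The Z-score threshold can be illustrated as below. The caption defers the exact Z-score definition to the Methods, which are not reproduced here, so this sketch assumes a simple, hypothetical background model: each position's 3’-end count is compared with the mean and population standard deviation of counts across the same locus.

```python
from statistics import mean, pstdev


def call_peaks(counts, z_cutoff=5.0):
    """Return (position, Z) for positions whose 3'-end count has Z > z_cutoff.

    counts: per-position PRO-seq 3'-end counts across one locus.
    ASSUMPTION (hypothetical background model): Z is computed against the mean and
    population standard deviation of all positions in the same locus.
    """
    mu, sd = mean(counts), pstdev(counts)
    if sd == 0:  # flat coverage: no position stands out
        return []
    peaks = []
    for i, c in enumerate(counts):
        z = (c - mu) / sd
        if z > z_cutoff:
            peaks.append((i, z))
    return peaks


# Hypothetical 100-nt locus with a single stall site at position 42.
locus_counts = [0] * 100
locus_counts[42] = 30
print(call_peaks(locus_counts))
```

Under this background model no position in an n-position window can score above √(n − 1) (Samuelson's inequality), so windows of 26 positions or fewer could never pass Z > 5; the choice of background window therefore matters as much as the cutoff.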
Extended Data Fig. 7 PRO-seq reveals aborted transcription at invertons and prophages.
(a) Stranded coverage data across four invertons from US2 and US3 are shown, with inverted repeats marked with blue triangles. Coordinates and directionality of coincident genes are given below the coverage plots. Decomposition of PRO-seq reads into 3’ and 5’ ends shows that transcription is initiated within the inverton and terminated just downstream. (b) Four examples of transcription across prophages, highlighting the complementary nature of PRO-seq and RNA-seq data for observing the transcription of mobile genetic elements. The bounds of CI-like transcriptional regulators are demarcated by yellow arrows. Teal arrows give the bounds of genes encoding proteins of unknown function.
Extended Data Fig. 8 Phylum-specific pause site motifs.
Logos for clustered sequences surrounding PRO-seq read 3’ end peaks annotated for one Bacteroidota, one Proteobacteria, and two Bacillota species. The number of constituent peaks in a cluster out of the total number of peaks identified per bin is provided, as well as the median Z-score for each cluster and a plot showing the log2(Z-score) distribution for all positions in the −11 to +5 window. Position −1 represents the RNAP pause site and position +1 represents the next nucleotide added.
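The −11 to +5 window convention can be made concrete with a small sketch that extracts the sequence around an RNA 3’ end in transcript orientation. It assumes, for illustration only, an uppercase DNA contig string and a 0-based coordinate for the RNA 3’-end nucleotide (position −1); the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pause_window(genome, pos, strand, upstream=11, downstream=5):
    """Sequence from -upstream to +downstream around an RNA 3' end, in transcript orientation.

    pos is the 0-based genome coordinate of the RNA 3'-end nucleotide (position -1).
    Position +1 is the next nucleotide to be added; there is no position 0,
    so the default window is 11 + 5 = 16 nt. Returns None near contig edges.
    """
    if strand == "+":
        lo, hi = pos - upstream + 1, pos + downstream + 1
    else:  # on the minus strand, upstream lies to the right on the genome
        lo, hi = pos - downstream, pos + upstream
    if lo < 0 or hi > len(genome):
        return None
    window = genome[lo:hi]
    return window if strand == "+" else revcomp(window)


genome = "NNNNN" + "AAAAAAAAAAG" + "CCCCC" + "NNNNN"
print(pause_window(genome, 15, "+"))           # AAAAAAAAAAGCCCCC
print(pause_window(revcomp(genome), 10, "-"))  # AAAAAAAAAAGCCCCC (same site, opposite strand)
```

Index 10 of each returned window is position −1 and index 11 is position +1, so windows from both strands can be stacked directly into a sequence logo.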
Extended Data Fig. 9 PRO-seq traces capture transcription of E. coli small regulatory RNAs.
(a) Normalized transcriptome profiles at selected E. coli small non-coding RNA (sRNA) loci. The left panel shows genomic context 2 kb up- and downstream from each sRNA locus (small black arrow). On the right, RNA-seq coverage, composite PRO-seq read coverage, 5’-end and 3’-end coverage are shown for the sRNA locus, the bounds and strand of which are given by the large black arrows. (b) Log-log RPKM plots comparing merged PRO-seq and RNA-seq libraries for control and heat-shock conditions. Genes are colored by RNA type. Spearman’s rank correlation coefficients (ρ) and Pearson’s correlation coefficients (r) are inset. (c) Box plots show the RPKM distribution for small non-coding RNAs and tRNAs across control and heat-shock conditions; a pseudocount of 1 was added to all counts before normalization to allow plotting on a log scale. Black lines represent medians. P-values from two-sided Wilcoxon signed-rank tests are reported for each combination of RNA type and treatment.
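The quantities in panels (b) and (c) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the authors' scripts: RPKM is taken as reads per kilobase of gene per million mapped reads, the pseudocount of 1 is added to raw counts before normalization (as described for panel c), tied values receive average ranks for Spearman's ρ, and Pearson's r is computed here on log10 RPKM to match the log-log axes.

```python
import math


def rpkm(counts, lengths_bp, total_mapped, pseudocount=1):
    """Reads per kilobase per million mapped reads, with a pseudocount added to each count."""
    return {g: (c + pseudocount) * 1e9 / (lengths_bp[g] * total_mapped)
            for g, c in counts.items()}


def pearson(x, y):
    """Pearson's correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_log_correlations(pro, rna):
    """(rho, r) between two RPKM dicts over their shared genes; r uses log10 values."""
    genes = sorted(pro.keys() & rna.keys())
    x = [math.log10(pro[g]) for g in genes]
    y = [math.log10(rna[g]) for g in genes]
    return spearman(x, y), pearson(x, y)


# Hypothetical counts for three 1-kb genes in libraries of one million mapped reads each.
lengths = {"a": 1000, "b": 1000, "c": 1000}
pro = rpkm({"a": 10, "b": 100, "c": 1000}, lengths, 1_000_000)
rna = rpkm({"a": 5, "b": 50, "c": 500}, lengths, 1_000_000)
print(log_log_correlations(pro, rna))
```

Because ρ depends only on ranks, it is unchanged by the log transform, whereas r is not; the pseudocount also keeps genes with zero reads finite on the log scale.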
Supplementary information
Supplementary Table 1
Sequencing library information. Method #1 samples correspond to PRO-seq libraries that were processed without additional enzymes during permeabilization, and Method #2 samples correspond to PRO-seq libraries processed with these enzymes (see Methods). Note that, because of sample limitations, ‘Method #1’ and ‘Method #2’ metagenomes are different samples collected from the same individuals.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vill, A.C., Rice, E.J., De Vlaminck, I. et al. Precision run-on sequencing (PRO-seq) for microbiome transcriptomics. Nat Microbiol 9, 241–250 (2024). https://doi.org/10.1038/s41564-023-01558-w
DOI: https://doi.org/10.1038/s41564-023-01558-w