Screening thousands of transcribed coding and non-coding regions reveals sequence determinants of RNA polymerase II elongation potential

Vlaming, Hanneke; Mimoso, Claudia A.; Field, Andrew R.; Martin, Benjamin J. E.; Adelman, Karen

doi:10.1038/s41594-022-00785-9

Technical Report
Published: 09 June 2022

Nature Structural & Molecular Biology volume 29, pages 613–620 (2022)

Abstract

Precise regulation of transcription by RNA polymerase II (RNAPII) is critical for organismal growth and development. However, what determines whether an engaged RNAPII will synthesize a full-length transcript or terminate prematurely is poorly understood. Notably, RNAPII is far more susceptible to termination when transcribing non-coding RNAs than when synthesizing protein-coding mRNAs, but the mechanisms underlying this are unclear. To investigate the impact of transcribed sequence on elongation potential, we developed a method to screen the effects of thousands of INtegrated Sequences on Expression of RNA and Translation using high-throughput sequencing (INSERT-seq). We found that higher AT content in non-coding RNAs, rather than specific sequence motifs, drives RNAPII termination. Further, we demonstrate that 5′ splice sites autonomously stimulate processive transcription, even in the absence of polyadenylation signals. Our results reveal a potent role for the transcribed sequence in dictating gene output and demonstrate the power of INSERT-seq toward illuminating these contributions.

**Fig. 1: INSERT-seq demonstrates the role of transcribed sequences in gene regulation.**

**Fig. 2: Transcribed sequence directly affects transcription levels.**

**Fig. 3: GC content inherently affects transcriptional output.**

**Fig. 4: Co-transcriptionally spliced introns boost transcription.**

**Fig. 5: Splicing-dependent and splicing-independent role of the 5′SS.**

Data availability

Raw and processed data files of all INSERT-seq experiments, PRO-seq, H3K4me3 ChIP–seq, and TT-seq are available at the Gene Expression Omnibus, accession no. GSE178230. H3K27ac ChIP–seq data are available through the 4DN data portal (https://data.4dnucleome.org/), ExperimentSet accession no. 4DNESQ33L4G7. H3K4me1 mESC ChIP–seq data were downloaded from the Gene Expression Omnibus, accession no. GSE56138. Reference genome mm10 (GRCm38) can be downloaded using RefSeq assembly accession number GCF_000001635.20. Supplementary Tables 3–7 provide all normalized and averaged data from INSERT-seq experiments, as well as which inserts are included in which plot. Uncropped image files and processed data shown in each plot are provided as source data. Source data are provided with this paper.

Code availability

All scripts used for analysis of INSERT-seq data can be found on GitHub: https://github.com/AdelmanLab/Vlaming2021_INSERT-seq_paper. URLs for all custom scripts used for PRO-seq, TT-seq and ChIP–seq analysis are provided in the Methods; these can be found at https://github.com/AdelmanLab/NIH_scripts/ and https://github.com/benjaminmartin02/binBedGraph.

References

Lykke-Andersen, S. et al. Integrator is a genome-wide attenuator of non-productive transcription. Mol. Cell 81, 514–529.e6 (2021).
Scruggs, B. S. et al. Bidirectional transcription arises from two distinct hubs of transcription factor binding and active chromatin. Mol. Cell 58, 1101–1112 (2015).
Tian, B., Hu, J., Zhang, H. & Lutz, C. S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212 (2005).
Shi, Y. & Manley, J. L. The end of the message: multiple protein-RNA interactions define the mRNA polyadenylation site. Genes Dev. 29, 889–897 (2015).
Ntini, E. et al. Polyadenylation site–induced decay of upstream transcripts enforces promoter directionality. Nat. Struct. Mol. Biol. 20, 923–928 (2013).
Almada, A. E., Wu, X., Kriz, A. J., Burge, C. B. & Sharp, P. A. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature 499, 360–363 (2013).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Chiu, A. C. et al. Transcriptional pause sites delineate stable nucleosome-associated premature polyadenylation suppressed by U1 snRNP. Mol. Cell 69, 648–663 (2018).
Le Hir, H., Nott, A. & Moore, M. J. How introns influence and enhance eukaryotic gene expression. Trends Biochem. Sci. 28, 215–220 (2003).
Damgaard, C. K. et al. A 5′ splice site enhances the recruitment of basal transcription initiation factors in vivo. Mol. Cell 29, 271–278 (2008).
Bieberstein, N. I., Carrillo Oesterreich, F., Straube, K. & Neugebauer, K. M. First exon length controls active chromatin signatures and transcription. Cell Rep. 2, 62–68 (2012).
Fiszbein, A., Krick, K. S., Begg, B. E. & Burge, C. B. Exon-mediated activation of transcription starts. Cell 179, 1551–1565 (2019).
Sousa-Luís, R. et al. POINT technology illuminates the processing of polymerase-associated intact nascent transcripts. Mol. Cell 81, 1935–1950 (2021).
Caizzi, L. et al. Efficient RNA polymerase II pause release requires U2 snRNP function. Mol. Cell 81, 1920–1934.e9 (2021).
Kaida, D. et al. U1 snRNP protects pre-mRNAs from premature cleavage and polyadenylation. Nature 468, 664–668 (2010).
Berg, M. G. et al. U1 snRNP determines mRNA length and regulates isoform expression. Cell 150, 53–64 (2012).
Andersen, P. K., Lykke-Andersen, S. & Jensen, T. H. Promoter-proximal polyadenylation sites reduce transcription activity. Genes Dev. 26, 2169–2179 (2012).
Zhang, S. et al. Structure of a transcribing RNA polymerase II–U1 snRNP complex. Science 371, 305–309 (2021).
Kinney, J. B., Murugan, A., Callan, C. G. & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).
Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).
Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).
Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).
Flynn, R. A. et al. 7SK–BAF axis controls pervasive transcription at enhancers. Nat. Struct. Mol. Biol. 23, 231–238 (2016).
Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).
Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).
Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
Krinner, S. et al. CpG domains downstream of TSSs promote high levels of gene expression. Nucleic Acids Res. 42, 3551–3564 (2014).
Noe Gonzalez, M., Blears, D. & Svejstrup, J. Q. Causes and consequences of RNA polymerase II stalling during transcript elongation. Nat. Rev. Mol. Cell Biol. 22, 3–21 (2021).
Zamft, B., Bintu, L., Ishibashi, T. & Bustamante, C. Nascent RNA structure modulates the transcriptional dynamics of RNA polymerases. Proc. Natl Acad. Sci. USA 109, 8948–8953 (2012).
Turowski, T. W. et al. Nascent transcript folding plays a major role in determining RNA polymerase elongation rates. Mol. Cell 79, 488–503 (2020).
Roberts, J. W. Mechanisms of bacterial transcription termination. J. Mol. Biol. 431, 4030–4039 (2019).
Mishra, S. & Maraia, R. J. RNA polymerase III subunits C37/53 modulate rU:dA hybrid 3′ end dynamics during transcription termination. Nucleic Acids Res. 47, 310–327 (2019).
Fouqueau, T. et al. The cutting edge of archaeal transcription. Emerg. Top. Life Sci. 2, 517–533 (2018).
Davidson, L., Francis, L., Eaton, J. D. & West, S. Integrator-dependent and allosteric/intrinsic mechanisms ensure efficient termination of snRNA transcription. Cell Rep. 33, 108319 (2020).
White, E., Kamieniarz-Gdula, K., Dye, M. J. & Proudfoot, N. J. AT-rich sequence elements promote nascent transcript cleavage leading to RNA polymerase II termination. Nucleic Acids Res. 41, 1797–1806 (2013).
Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Levitt, N., Briggs, D., Gil, A. & Proudfoot, N. J. Definition of an efficient synthetic poly(A) site. Genes Dev. 3, 1019–1025 (1989).
Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).
Mordstein, C. et al. Codon usage and splicing jointly influence mRNA localization. Cell Syst. 10, 351–362.e8 (2020).
Elrod, N. D. et al. The integrator complex attenuates promoter-proximal transcription at protein-coding genes. Mol. Cell 76, 738–752 (2019).
Austenaa, L. M. I. et al. A first exon termination checkpoint preferentially suppresses extragenic transcription. Nat. Struct. Mol. Biol. 28, 337–346 (2021).
Estell, C., Davidson, L., Steketee, P. C., Monier, A. & West, S. ZC3H4 restricts non-coding transcription in human cells. eLife 10, e67305 (2021).
Rivera-Mulia, J. C. et al. Allele-specific control of replication timing and genome organization during development. Genome Res. 28, 800–811 (2018).
Williams, L. H. et al. Pausing of RNA polymerase II regulates mammalian developmental potential through control of signaling networks. Mol. Cell 58, 311–322 (2015).
Brinkman, E. K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168 (2014).
Reimer, K. A., Mimoso, C. A., Adelman, K. & Neugebauer, K. M. Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis. Mol. Cell 81, 998–1012.e7 (2021).
Henriques, T. et al. Stable pausing by RNA polymerase II provides an opportunity to target and integrate regulatory signals. Mol. Cell 52, 517–528 (2013).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Korhonen, J. H., Palin, K., Taipale, J. & Ukkonen, E. Fast motif matching revisited: high-order PWMs, SNPs and indels. Bioinformatics 33, 514–521 (2016).
Georgiou, G. & van Heeringen, S. J. fluff: exploratory analysis and visualization of high-throughput sequencing data. PeerJ 4, e2209 (2016).
Buecker, C. et al. Reorganization of enhancer patterns in transition from naive to primed pluripotency. Cell Stem Cell 14, 838–853 (2014).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Schwalb, B. et al. TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).

Acknowledgements

We thank K. Sasaki for her help in optimizing the run-on protocol for screening purposes and E. Kaye for discussions on library design. We thank S. Buratowski for useful discussions on the project, and D. Shlyueva and T. H. Jensen for feedback on the manuscript. We are also grateful to the Flow Cytometry Facility at the HMS Department of Immunology for cell sorting help and advice, the HMS Nascent Transcriptomics Core for PRO-seq library construction, and the HMS Biopolymers Facility and The Bauer Core Facility at Harvard University for next-generation sequencing. This research was supported by the European Molecular Biology Organization (ALTF 531-2017 to H. V.), the Human Frontier Science Program (LT000651/2018-L to H. V.), the National Institutes of Health (NIH R01 GM139960 to K. A.), startup funding from Harvard Medical School to K. A., the National Science Foundation Graduate Research Fellowship (DGE1745303 to C. A. M.) and the Canadian Institutes of Health Research (Banting fellowship to B. J. E. M.).

Author information

Authors and Affiliations

Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
Hanneke Vlaming, Claudia A. Mimoso, Andrew R. Field, Benjamin J. E. Martin & Karen Adelman

Authors

Hanneke Vlaming
Claudia A. Mimoso
Andrew R. Field
Benjamin J. E. Martin
Karen Adelman

Contributions

H. V. and K. A. conceived the study and designed experiments. H. V. performed experiments and analyzed data. C. A. M. performed PRO-seq data analysis, helped generate intron-containing clonal cell lines, and optimized the run-on assay and knockdown conditions. B. J. E. M. and A. R. F. performed ChIP–seq and TT-seq experiments. K. A. supervised the study. H. V. and K. A. wrote the manuscript with input from all co-authors.

Corresponding authors

Correspondence to Hanneke Vlaming or Karen Adelman.

Ethics declarations

Competing interests

K. A. is a consultant for Syros Pharmaceuticals, is on the scientific advisory board of CAMP4 Therapeutics, and receives research funding from Novartis unrelated to this work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Yongsheng Shi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Carolina Perdigoto, in collaboration with the Nature Structural & Molecular Biology team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Correlations between INSERT-seq experiments.

a, Spearman correlation coefficients between steady-state RNA and Sort-seq experiments, using all inserts for which data were obtained in each of the six experiments (n = 12,090). b, Sort-seq scores of inserts containing TSS-proximal and TSS-distal genomic regions of indicated RNA classes. Same groups as in Fig. 1e. Comparisons between proximal and distal regions by Kruskal-Wallis test, **** indicates P < 0.0001. c, Sort-seq scores of inserts containing TSS-proximal regions from typical enhancers (TE, n = 1,506) and super enhancers (SE²², n = 600), compared by Mann–Whitney test. d, Correlation between steady-state RNA levels at the Oct4 uaRNA locus (average of 4 replicates) and 4930461G14Rik lincRNA locus (average of 3 replicates). Plotted are all inserts used for Fig. 1, as well as synthetic control sequences (Fig. 3), for which data were obtained at the lincRNA locus (n = 11,600).
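As an illustration of the statistic named in panel a, the following is a minimal sketch of a Spearman rank correlation between two sets of per-insert measurements. The function names and the toy values are hypothetical and are not taken from the authors' scripts; the sketch assumes both lists are ordered by the same inserts and that ties receive the mean of their ranks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four inserts in two experiments.
steady_state = [0.2, 1.5, 0.9, 2.4]
sort_seq = [0.1, 0.8, 0.7, 1.9]
rho = spearman(steady_state, sort_seq)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic rescaling of either measurement, which suits comparisons between assays with different dynamic ranges.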

Extended Data Fig. 2 EXOSC3 knockdown validation and correlation between nascent RNA and steady-state RNA results.

a, Immunoblot showing EXOSC3 protein level in control and siEXOSC3 conditions, harvested from the same experiment as the screen in Fig. 2a, b. b, RT-qPCR on steady-state RNA samples with which the screen was performed, showing levels of the EXOSC3 mRNA and the reporter transcript, just downstream of the library integration site, both internally normalized to TBP. Bars show mean, whiskers indicate standard deviation, n = 3 biologically independent experiments. c, Correlation between nascent RNA (average of 2 replicates) and steady-state RNA (average of 4 replicates) levels, showing all inserts used for Fig. 1, as well as synthetic control sequences (Fig. 3), n = 11,132. d, Chromatin-associated RNA (Chr-RNA) results with library at uaRNA locus. mRNAs n = 3,832, lincRNAs n = 339, uaRNAs n = 1,730, eRNAs n = 2,074, mRNA terminators n = 414. Neighbors were compared by Kruskal-Wallis test, **** indicates P < 0.0001, higher P values are indicated in the panel. e, Correlation between Chr-RNA (average of 2 replicates) and steady-state RNA (average of 4 replicates) levels, all inserts from panel c for which Chr-RNA data were obtained (n = 11,029).

Extended Data Fig. 3 GC content in genomic regions and its effect on expression.

a, Distribution of GC contents in inserts of the indicated classes included in the library. Open violins show TSS-proximal regions, patterned violins show TSS-distal regions. b,c, Nascent RNA abundance (b) and Sort-seq scores (c) of control sequences grouped by GC content percentage. n = 39/281/330/292/117 for <41/41–50/51–60/61–70/>70%, respectively. Neighbors were compared by Kruskal-Wallis test, **** indicates P < 0.0001, higher P values are indicated in the panel. d, Relation between the number of CpG dinucleotides in synthetic control sequences and their steady-state RNA levels (n = 1,059). The red line is the best linear fit through the data. Pearson r = 0.47, P < 0.0001. e, Metagene representations of PRO-seq signal around TSSs of uaRNAs (left) or eRNAs (right), grouped by GC content of the transcribed sequence from +6 to +179 downstream of the TSS (the region included in our screening library). Data shown are from endogenous genomic locations of sequences included in the INSERT-seq screen. Read counts were summed into 25-nt bins.
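The sequence features used in this figure can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the handling of GC percentages that fall between the integer bin labels (for example 50.5%) is an assumption, and the binning helper simply sums per-nucleotide counts in consecutive windows.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def cpg_count(seq):
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")


def gc_bin(pct):
    """Assign a GC percentage to one of the five groups in panels b and c.

    Assumption: each group includes its upper bound, so 50.0 falls in
    "41-50" and 50.5 falls in "51-60".
    """
    if pct < 41:
        return "<41"
    if pct <= 50:
        return "41-50"
    if pct <= 60:
        return "51-60"
    if pct <= 70:
        return "61-70"
    return ">70"


def sum_into_bins(counts, width=25):
    """Sum per-nucleotide read counts into consecutive bins of `width` nt."""
    return [sum(counts[i:i + width]) for i in range(0, len(counts), width)]


# Hypothetical 8-nt sequence and a flat 50-nt coverage track.
example_gc = gc_percent("GGCCAATT")
example_bins = sum_into_bins([1] * 50, width=25)
```

Counting CpG dinucleotides separately from overall GC content matters for panel d, since two sequences with the same GC percentage can differ widely in CpG number.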

Extended Data Fig. 4 Co-transcriptionally spliced introns boost transcription and protein expression.

Nascent RNA levels (left) and Sort-seq scores (right) of inserts containing wild-type introns (unbarcoded) grouped by splicing efficiency measured using the nascent RNA screen data. <3% spliced n = 76, 3–30% spliced n = 107, >30% spliced n = 198, significance tested by Kruskal-Wallis test. **** indicates P < 0.0001, higher P values are indicated in the figure.
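The grouping by splicing efficiency can be sketched as below. The caption does not state how splicing efficiency was computed, so the definition used here (spliced reads as a percentage of spliced plus unspliced reads) is an assumption, as are the function names and the treatment of values exactly at 3% and 30%.

```python
def percent_spliced(spliced_reads, unspliced_reads):
    """Spliced reads as a percentage of all reads covering the intron.

    Assumption: efficiency = spliced / (spliced + unspliced); the exact
    definition used in the study is not given in this caption.
    """
    total = spliced_reads + unspliced_reads
    if total == 0:
        raise ValueError("no reads for this insert")
    return 100.0 * spliced_reads / total


def splicing_group(pct):
    """Assign a percent-spliced value to one of the three caption groups.

    Assumption: the middle group includes both boundaries (3 and 30).
    """
    if pct < 3:
        return "<3% spliced"
    if pct <= 30:
        return "3-30% spliced"
    return ">30% spliced"


# Hypothetical read counts for one intron-containing insert.
pct = percent_spliced(spliced_reads=45, unspliced_reads=55)
group = splicing_group(pct)
```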

Extended Data Fig. 5 Effects of splice site mutants and 5′SS insertion in INSERT-seq and clonal lines.

a, Nascent RNA levels (left) and Sort-seq scores (right) of intron-containing inserts with wild-type (wt) or mutant (m) splice sites. As in Fig. 5a, only introns whose wild-type version was >30% spliced in nascent RNA and whose mutants were <3% spliced are shown. 5′SS mutants n = 51, 3′SS mutants n = 23, WT n = 52, comparisons by Kruskal-Wallis test. The difference between 5′SS and 3′SS mutants was not significant in these analyses, but the pattern of the 3′SS mutants being more abundant on average was consistent with the steady-state RNA result (Fig. 5a). b, Steady-state RNA levels of intron-containing inserts with wild-type (+) and mutant (−) splice sites as in Fig. 5a, but showing only inserts that do not contain a PAS hexamer (any of the top-10 PASs in mouse³). 5′SS mutants n = 19, 3′SS mutants n = 10, WT n = 20, comparisons by Kruskal-Wallis test. c, Characterization of all clonal cell lines shown in Fig. 5b, where versions of the 14th intron of the Smc1 gene with wild-type (+) or mutant (−) splice sites were integrated at the Oct4 uaRNA reporter locus. Top shows RT-PCR, bottom shows PCR on genomic DNA. All clonal lines show genomic integration of the same size in the genomic DNA, but only lines where the intron is flanked by two wild-type splice sites show evidence of splicing. Note that lanes should not be quantitatively compared to each other, as amounts of template material were not controlled. d, Density plot of GC-corrected steady-state RNA levels of unspliced TSS-proximal/distal uaRNA/eRNA regions grouped by the presence and strength (MaxEnt score⁴¹) of a 5′SS motif (see Methods). None n = 2,632, medium (MaxEnt 5–10) n = 1,554, strong (MaxEnt 10+) n = 106. All groups are significantly different from each other (P < 0.0001) by Kruskal-Wallis test. e, Density plot of GC-corrected steady-state RNA levels of unspliced TSS-proximal mRNA regions (left) and TSS-proximal/distal uaRNA/eRNA regions (right) grouped by the number of 5′SS motifs (MaxEnt score >5). mRNAs: none n = 1,392, 1 n = 1,604, >1 n = 691. uaRNA/eRNAs: none n = 2,632, 1 n = 1,232, >1 n = 428, comparisons by Kruskal-Wallis test. f, Relative nascent RNA levels (left) and Sort-seq scores (right) of 10-nt annotated 5′SSs with a MaxEnt score of >5, embedded into several background sequences. Only unspliced inserts (<3% spliced in nascent RNA) were considered. Same groups as in Fig. 5d: scrambled (Scr, n = 24) and antisense (AS, n = 24) versions of 5′SSs were compared to sense (S) 5′SSs (n = 50) by Kruskal-Wallis test. In all panels, **** indicates P < 0.0001, higher P values are indicated in each plot.
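Two of the filters described in panels b and d can be sketched as follows. This is a hedged illustration: the hexamer set below contains only the two canonical polyadenylation signals and is a stand-in for the top-10 mouse list from ref. 3, which is not reproduced here; MaxEnt scores are assumed to be precomputed by an external tool; and the function names and the treatment of a score of exactly 10 are assumptions.

```python
# Illustrative subset only: the two canonical PAS hexamers, not the
# top-10 mouse list referred to in the caption.
EXAMPLE_PAS_HEXAMERS = {"AATAAA", "ATTAAA"}


def contains_hexamer(seq, hexamers):
    """True if any of the given hexamers occurs in the sequence.

    The sequence is upper-cased and RNA 'U' is converted to DNA 'T'.
    """
    s = seq.upper().replace("U", "T")
    return any(h in s for h in hexamers)


def five_ss_strength(maxent_score):
    """Group an insert by its best 5'SS MaxEnt score (None = no motif found).

    Thresholds follow the caption: motifs need a score >5, medium is 5-10,
    strong is above 10. Assumption: a score of exactly 10 counts as medium.
    """
    if maxent_score is None or maxent_score <= 5:
        return "none"
    if maxent_score <= 10:
        return "medium"
    return "strong"


# Hypothetical inserts: (sequence, precomputed best MaxEnt score).
inserts = [("ggcAATAAAgg", 7.2), ("GGCGCCGG", 11.0), ("GGCGCCTT", None)]
pas_free = [seq for seq, _ in inserts
            if not contains_hexamer(seq, EXAMPLE_PAS_HEXAMERS)]
strengths = [five_ss_strength(score) for _, score in inserts]
```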

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables 1–9

Contains library composition, all INSERT-seq data, and plasmids and primers used in this study

Source data

Source Data Fig. 1: Statistical Source Data
Source Data Fig. 2: Statistical Source Data
Source Data Fig. 3: Statistical Source Data
Source Data Fig. 4: Statistical Source Data
Source Data Fig. 5: Statistical Source Data
Source Data Extended Data Fig. 1: Statistical Source Data
Source Data Extended Data Fig. 2: Statistical Source Data
Source Data Extended Data Fig. 2: Unprocessed Western Blots
Source Data Extended Data Fig. 3: Statistical Source Data
Source Data Extended Data Fig. 4: Statistical Source Data
Source Data Extended Data Fig. 5: Statistical Source Data
Source Data Extended Data Fig. 5: Unprocessed gel image

About this article

Cite this article

Vlaming, H., Mimoso, C.A., Field, A.R. et al. Screening thousands of transcribed coding and non-coding regions reveals sequence determinants of RNA polymerase II elongation potential. Nat Struct Mol Biol 29, 613–620 (2022). https://doi.org/10.1038/s41594-022-00785-9

Received: 04 July 2021
Accepted: 28 April 2022
Published: 09 June 2022
Issue Date: June 2022
DOI: https://doi.org/10.1038/s41594-022-00785-9
