  • Technical Report

Screening thousands of transcribed coding and non-coding regions reveals sequence determinants of RNA polymerase II elongation potential

Abstract

Precise regulation of transcription by RNA polymerase II (RNAPII) is critical for organismal growth and development. However, what determines whether an engaged RNAPII will synthesize a full-length transcript or terminate prematurely is poorly understood. Notably, RNAPII is far more susceptible to termination when transcribing non-coding RNAs than when synthesizing protein-coding mRNAs, but the mechanisms underlying this are unclear. To investigate the impact of transcribed sequence on elongation potential, we developed a method to screen the effects of thousands of INtegrated Sequences on Expression of RNA and Translation using high-throughput sequencing (INSERT-seq). We found that higher AT content in non-coding RNAs, rather than specific sequence motifs, drives RNAPII termination. Further, we demonstrate that 5′ splice sites autonomously stimulate processive transcription, even in the absence of polyadenylation signals. Our results reveal a potent role for the transcribed sequence in dictating gene output and demonstrate the power of INSERT-seq toward illuminating these contributions.

Fig. 1: INSERT-seq demonstrates the role of transcribed sequences in gene regulation.
Fig. 2: Transcribed sequence directly affects transcription levels.
Fig. 3: GC content inherently affects transcriptional output.
Fig. 4: Co-transcriptionally spliced introns boost transcription.
Fig. 5: Splicing-dependent and splicing-independent role of the 5′SS.

Data availability

Raw and processed data files of all INSERT-seq experiments, PRO-seq, H3K4me3 ChIP–seq, and TT-seq are available at the Gene Expression Omnibus, accession no. GSE178230. H3K27ac ChIP–seq data are available through the 4DN data portal (https://data.4dnucleome.org/), ExperimentSet accession no. 4DNESQ33L4G7. H3K4me1 mESC ChIP–seq data were downloaded from the Gene Expression Omnibus, accession no. GSE56138. Reference genome mm10 (GRCm38) can be downloaded using RefSeq assembly accession number GCF_000001635.20. Supplementary Tables 3–7 provide all normalized and averaged data from INSERT-seq experiments, as well as which inserts are included in which plot. Uncropped image files and processed data shown in each plot are provided as source data. Source data are provided with this paper.

Code availability

All scripts used for analysis of INSERT-seq data can be found on GitHub: https://github.com/AdelmanLab/Vlaming2021_INSERT-seq_paper. URLs for all custom scripts used for PRO-seq, TT-seq and ChIP–seq analysis are provided in the Methods; these can be found at https://github.com/AdelmanLab/NIH_scripts/ and https://github.com/benjaminmartin02/binBedGraph.

References

  1. Lykke-Andersen, S. et al. Integrator is a genome-wide attenuator of non-productive transcription. Mol. Cell 81, 514–529.e6 (2021).

  2. Scruggs, B. S. et al. Bidirectional transcription arises from two distinct hubs of transcription factor binding and active chromatin. Mol. Cell 58, 1101–1112 (2015).

  3. Tian, B., Hu, J., Zhang, H. & Lutz, C. S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212 (2005).

  4. Shi, Y. & Manley, J. L. The end of the message: multiple protein-RNA interactions define the mRNA polyadenylation site. Genes Dev. 29, 889–897 (2015).

  5. Ntini, E. et al. Polyadenylation site–induced decay of upstream transcripts enforces promoter directionality. Nat. Struct. Mol. Biol. 20, 923–928 (2013).

  6. Almada, A. E., Wu, X., Kriz, A. J., Burge, C. B. & Sharp, P. A. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature 499, 360–363 (2013).

  7. Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).

  8. Chiu, A. C. et al. Transcriptional pause sites delineate stable nucleosome-associated premature polyadenylation suppressed by U1 snRNP. Mol. Cell 69, 648–663 (2018).

  9. Le Hir, H., Nott, A. & Moore, M. J. How introns influence and enhance eukaryotic gene expression. Trends Biochem. Sci. 28, 215–220 (2003).

  10. Damgaard, C. K. et al. A 5′ splice site enhances the recruitment of basal transcription initiation factors in vivo. Mol. Cell 29, 271–278 (2008).

  11. Bieberstein, N. I., Carrillo Oesterreich, F., Straube, K. & Neugebauer, K. M. First exon length controls active chromatin signatures and transcription. Cell Rep. 2, 62–68 (2012).

  12. Fiszbein, A., Krick, K. S., Begg, B. E. & Burge, C. B. Exon-mediated activation of transcription starts. Cell 179, 1551–1565 (2019).

  13. Sousa-Luís, R. et al. POINT technology illuminates the processing of polymerase-associated intact nascent transcripts. Mol. Cell 81, 1935–1950 (2021).

  14. Caizzi, L. et al. Efficient RNA polymerase II pause release requires U2 snRNP function. Mol. Cell 81, 1920–1934.e9 (2021).

  15. Kaida, D. et al. U1 snRNP protects pre-mRNAs from premature cleavage and polyadenylation. Nature 468, 664–668 (2010).

  16. Berg, M. G. et al. U1 snRNP determines mRNA length and regulates isoform expression. Cell 150, 53–64 (2012).

  17. Andersen, P. K., Lykke-Andersen, S. & Jensen, T. H. Promoter-proximal polyadenylation sites reduce transcription activity. Genes Dev. 26, 2169–2179 (2012).

  18. Zhang, S. et al. Structure of a transcribing RNA polymerase II–U1 snRNP complex. Science 371, 305–309 (2021).

  19. Kinney, J. B., Murugan, A., Callan, C. G. & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).

  20. Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).

  21. Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).

  22. Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).

  23. Flynn, R. A. et al. 7SK–BAF axis controls pervasive transcription at enhancers. Nat. Struct. Mol. Biol. 23, 231–238 (2016).

  24. Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).

  25. Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).

  26. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  27. Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).

  28. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).

  29. Krinner, S. et al. CpG domains downstream of TSSs promote high levels of gene expression. Nucleic Acids Res. 42, 3551–3564 (2014).

  30. Noe Gonzalez, M., Blears, D. & Svejstrup, J. Q. Causes and consequences of RNA polymerase II stalling during transcript elongation. Nat. Rev. Mol. Cell Biol. 22, 3–21 (2021).

  31. Zamft, B., Bintu, L., Ishibashi, T. & Bustamante, C. Nascent RNA structure modulates the transcriptional dynamics of RNA polymerases. Proc. Natl Acad. Sci. USA 109, 8948–8953 (2012).

  32. Turowski, T. W. et al. Nascent transcript folding plays a major role in determining RNA polymerase elongation rates. Mol. Cell 79, 488–503 (2020).

  33. Roberts, J. W. Mechanisms of bacterial transcription termination. J. Mol. Biol. 431, 4030–4039 (2019).

  34. Mishra, S. & Maraia, R. J. RNA polymerase III subunits C37/53 modulate rU:dA hybrid 3′ end dynamics during transcription termination. Nucleic Acids Res. 47, 310–327 (2019).

  35. Fouqueau, T. et al. The cutting edge of archaeal transcription. Emerg. Top. Life Sci. 2, 517–533 (2018).

  36. Davidson, L., Francis, L., Eaton, J. D. & West, S. Integrator-dependent and allosteric/intrinsic mechanisms ensure efficient termination of snRNA transcription. Cell Rep. 33, 108319 (2020).

  37. White, E., Kamieniarz-Gdula, K., Dye, M. J. & Proudfoot, N. J. AT-rich sequence elements promote nascent transcript cleavage leading to RNA polymerase II termination. Nucleic Acids Res. 41, 1797–1806 (2013).

  38. Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).

  39. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  40. Levitt, N., Briggs, D., Gil, A. & Proudfoot, N. J. Definition of an efficient synthetic poly(A) site. Genes Dev. 3, 1019–1025 (1989).

  41. Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).

  42. Mordstein, C. et al. Codon usage and splicing jointly influence mRNA localization. Cell Syst. 10, 351–362.e8 (2020).

  43. Elrod, N. D. et al. The integrator complex attenuates promoter-proximal transcription at protein-coding genes. Mol. Cell 76, 738–752 (2019).

  44. Austenaa, L. M. I. et al. A first exon termination checkpoint preferentially suppresses extragenic transcription. Nat. Struct. Mol. Biol. 28, 337–346 (2021).

  45. Estell, C., Davidson, L., Steketee, P. C., Monier, A. & West, S. ZC3H4 restricts non-coding transcription in human cells. eLife 10, e67305 (2021).

  46. Rivera-Mulia, J. C. et al. Allele-specific control of replication timing and genome organization during development. Genome Res. 28, 800–811 (2018).

  47. Williams, L. H. et al. Pausing of RNA polymerase II regulates mammalian developmental potential through control of signaling networks. Mol. Cell 58, 311–322 (2015).

  48. Brinkman, E. K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168 (2014).

  49. Reimer, K. A., Mimoso, C. A., Adelman, K. & Neugebauer, K. M. Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis. Mol. Cell 81, 998–1012.e7 (2021).

  50. Henriques, T. et al. Stable pausing by RNA polymerase II provides an opportunity to target and integrate regulatory signals. Mol. Cell 52, 517–528 (2013).

  51. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).

  52. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  53. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  54. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

  55. Korhonen, J. H., Palin, K., Taipale, J. & Ukkonen, E. Fast motif matching revisited: high-order PWMs, SNPs and indels. Bioinformatics 33, 514–521 (2016).

  56. Georgiou, G. & van Heeringen, S. J. fluff: exploratory analysis and visualization of high-throughput sequencing data. PeerJ 4, e2209 (2016).

  57. Buecker, C. et al. Reorganization of enhancer patterns in transition from naive to primed pluripotency. Cell Stem Cell 14, 838–853 (2014).

  58. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  59. Schwalb, B. et al. TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).

Acknowledgements

We thank K. Sasaki for her help in optimizing the run-on protocol for screening purposes and E. Kaye for discussions on library design. We thank S. Buratowski for useful discussions on the project, and D. Shlyueva and T. H. Jensen for feedback on the manuscript. We are also grateful to the Flow Cytometry Facility at the HMS Department of Immunology for cell sorting help and advice, the HMS Nascent Transcriptomics Core for PRO-seq library construction, and the HMS Biopolymers Facility and The Bauer Core Facility at Harvard University for next-generation sequencing. This research was supported by the European Molecular Biology Organization (ALTF 531-2017 to H. V.), the Human Frontier Science Program (LT000651/2018-L to H. V.), the National Institutes of Health (NIH R01 GM139960 to K. A.), startup funding from Harvard Medical School to K. A., the National Science Foundation Graduate Research Fellowship (DGE1745303 to C. A. M.) and the Canadian Institutes of Health Research (Banting fellowship to B. J. E. M.).

Author information

Contributions

H. V. and K. A. conceived the study and designed experiments. H. V. performed experiments and analyzed data. C. A. M. performed PRO-seq data analysis, helped generate intron-containing clonal cell lines, and optimized the run-on assay and knockdown conditions. B. J. E. M. and A. R. F. performed ChIP–seq and TT-seq experiments. K. A. supervised the study. H. V. and K. A. wrote the manuscript with input from all co-authors.

Corresponding authors

Correspondence to Hanneke Vlaming or Karen Adelman.

Ethics declarations

Competing interests

K. A. is a consultant for Syros Pharmaceuticals, is on the scientific advisory board of CAMP4 Therapeutics, and receives research funding from Novartis unrelated to this work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Yongsheng Shi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Carolina Perdigoto, in collaboration with the Nature Structural & Molecular Biology team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Correlations between INSERT-seq experiments.

a, Spearman correlation coefficients between steady-state RNA and Sort-seq experiments, using all inserts for which data were obtained in each of the six experiments (n = 12,090). b, Sort-seq scores of inserts containing TSS-proximal and TSS-distal genomic regions of the indicated RNA classes. Same groups as in Fig. 1e. Comparisons between proximal and distal regions by Kruskal-Wallis test; **** indicates P < 0.0001. c, Sort-seq scores of inserts containing TSS-proximal regions from typical enhancers (TE, n = 1,506) and super enhancers (SE; ref. 22; n = 600), compared by Mann–Whitney test. d, Correlation between steady-state RNA levels at the Oct4 uaRNA locus (average of 4 replicates) and the 4930461G14Rik lincRNA locus (average of 3 replicates). Plotted are all inserts used for Fig. 1, as well as synthetic control sequences (Fig. 3), for which data were obtained at the lincRNA locus (n = 11,600).
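
As a rough illustration of the statistic in panel a, a Spearman coefficient can be computed as the Pearson correlation of rank-transformed scores. The sketch below uses only the Python standard library. The function names and the five-insert example values are hypothetical and are not taken from the study's data or scripts.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five inserts in two experiments (not real data).
steady_state = [0.2, 1.5, 0.9, 2.4, 1.1]
sort_seq = [0.1, 1.2, 1.0, 3.0, 0.8]
rho = spearman(steady_state, sort_seq)
print(round(rho, 3))
```

Because the coefficient depends only on ranks, it is insensitive to the different scales of RNA abundance and Sort-seq scores.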

Source data

Extended Data Fig. 2 EXOSC3 knockdown validation and correlation between nascent RNA and steady-state RNA results.

a, Immunoblot showing EXOSC3 protein level in control and siEXOSC3 conditions, harvested from the same experiment as the screen in Fig. 2a, b. b, RT-qPCR on the steady-state RNA samples with which the screen was performed, showing levels of the EXOSC3 mRNA and the reporter transcript, just downstream of the library integration site, both internally normalized to TBP. Bars show mean, whiskers indicate standard deviation, n = 3 biologically independent experiments. c, Correlation between nascent RNA (average of 2 replicates) and steady-state RNA (average of 4 replicates) levels, showing all inserts used for Fig. 1, as well as synthetic control sequences (Fig. 3), n = 11,132. d, Chromatin-associated RNA (Chr-RNA) results with the library at the uaRNA locus. mRNAs n = 3,832, lincRNAs n = 339, uaRNAs n = 1,730, eRNAs n = 2,074, mRNA terminators n = 414. Neighbors were compared by Kruskal-Wallis test; **** indicates P < 0.0001, higher P values are indicated in the panel. e, Correlation between Chr-RNA (average of 2 replicates) and steady-state RNA (average of 4 replicates) levels, for all inserts from panel c for which Chr-RNA data were obtained (n = 11,029).
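
The group comparisons in panel d rely on the Kruskal-Wallis test. As a minimal sketch of what that test measures, the H statistic can be computed from pooled ranks as below. The function name and example groups are hypothetical, no P value is derived, and this is not the authors' analysis code.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    # Pool all observations, remembering which group each one came from.
    pooled = [(value, g) for g, group in enumerate(groups) for value in group]
    pooled.sort(key=lambda pair: pair[0])
    n_total = len(pooled)

    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    start = 0
    while start < n_total:
        end = start
        while end + 1 < n_total and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        tied = end - start + 1
        shared_rank = (start + end) / 2 + 1  # average rank of the tied run
        for k in range(start, end + 1):
            rank_sums[pooled[k][1]] += shared_rank
        tie_term += tied ** 3 - tied
        start = end + 1

    h = 12 / (n_total * (n_total + 1)) * sum(
        rs ** 2 / len(group) for rs, group in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical expression values for three insert classes (not real data).
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_stat, 3))
```

H is large when the groups occupy different parts of the pooled ranking; significance is usually assessed against a chi-squared distribution with (number of groups − 1) degrees of freedom.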

Source data

Extended Data Fig. 3 GC content in genomic regions and its effect on expression.

a, Distribution of GC content in inserts of the indicated classes included in the library. Open violins show TSS-proximal regions, patterned violins show TSS-distal regions. b,c, Nascent RNA abundance (b) and Sort-seq scores (c) of control sequences grouped by GC content percentage; n = 39/281/330/292/117 for <41/41–50/51–60/61–70/>70%, respectively. Neighbors were compared by Kruskal-Wallis test; **** indicates P < 0.0001, higher P values are indicated in the panel. d, Relation between the number of CpG dinucleotides in synthetic control sequences and their steady-state RNA levels (n = 1,059). The red line is the best linear fit through the data. Pearson r = 0.47, P < 0.0001. e, Metagene representations of PRO-seq signal around TSSs of uaRNAs (left) or eRNAs (right), grouped by GC content of the transcribed sequence from +6 to +179 downstream of the TSS (the region included in our screening library). Data shown are from the endogenous genomic locations of sequences included in the INSERT-seq screen. Read counts were summed into 25-nt bins.
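
The GC-content groups in panels b and c and the CpG counts in panel d are simple sequence statistics. The following sketch shows one way to compute them. The function names are hypothetical, and the handling of non-integer percentages at the bin edges (for example 50.5%) is an assumption, because the legend gives only the bin labels.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def gc_group(seq):
    """Assign a sequence to one of the five GC bins named in the legend.

    Assumption: fractional percentages between two labels fall into the
    lower bin, except that anything above 70% is placed in '>70'.
    """
    pct = gc_percent(seq)
    if pct < 41:
        return "<41"
    if pct < 51:
        return "41-50"
    if pct < 61:
        return "51-60"
    if pct <= 70:
        return "61-70"
    return ">70"


def cpg_count(seq):
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


# Hypothetical 10-nt example sequence (not from the library).
example = "ATGCGCGTAA"
print(gc_percent(example), gc_group(example), cpg_count(example))
```

Counting with `str.count` is safe here because the dinucleotide "CG" cannot overlap with itself.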

Source data

Extended Data Fig. 4 Co-transcriptionally spliced introns boost transcription and protein expression.

Nascent RNA levels (left) and Sort-seq scores (right) of inserts containing wild-type introns (unbarcoded) grouped by splicing efficiency measured using the nascent RNA screen data. <3% spliced n = 76, 3-30% spliced n = 107, >30% spliced n = 198, significance tested by Kruskal-Wallis test. **** indicates P < 0.0001, higher P values are indicated in the figure.
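
The three splicing-efficiency groups above can be expressed as a small classifier. In the sketch below, defining efficiency as spliced reads divided by the sum of spliced and unspliced reads is an assumption made for illustration; the legend does not state how the percentage was derived. The function names and read counts are likewise hypothetical.

```python
def spliced_fraction(spliced_reads, unspliced_reads):
    """Fraction of intron-covering reads that are spliced (assumed definition)."""
    total = spliced_reads + unspliced_reads
    if total == 0:
        raise ValueError("no reads cover this intron")
    return spliced_reads / total


def splicing_group(fraction):
    """Map a spliced fraction onto the legend's three groups."""
    if fraction < 0.03:
        return "<3% spliced"
    if fraction <= 0.30:
        return "3-30% spliced"
    return ">30% spliced"


# Hypothetical (spliced, unspliced) read counts for three inserts.
counts = {"intron_a": (1, 99), "intron_b": (10, 90), "intron_c": (50, 50)}
groups = {name: splicing_group(spliced_fraction(s, u)) for name, (s, u) in counts.items()}
print(groups)
```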

Source data

Extended Data Fig. 5 Effects of splice site mutants and 5′SS insertion in INSERT-seq and clonal lines.

a, Nascent RNA levels (left) and Sort-seq scores (right) of intron-containing inserts with wild-type (wt) or mutant (m) splice sites. As in Fig. 5a, only introns for which the wild-type version was >30% spliced in nascent RNA and the mutants were <3% spliced are shown. 5′SS mutants n = 51, 3′SS mutants n = 23, WT n = 52, comparisons by Kruskal-Wallis test. The difference between 5′SS and 3′SS mutants was not significant in these analyses, but the pattern of the 3′SS mutants being more abundant on average was consistent with the steady-state RNA result (Fig. 5a). b, Steady-state RNA levels of intron-containing inserts with wild-type (+) and mutant (−) splice sites as in Fig. 5a, but showing only inserts that do not contain a PAS hexamer (any of the top-10 PASs in mouse; ref. 3). 5′SS mutants n = 19, 3′SS mutants n = 10, WT n = 20, comparisons by Kruskal-Wallis test. c, Characterization of all clonal cell lines shown in Fig. 5b, where versions of the 14th intron of the Smc1 gene with wild-type (+) or mutant (−) splice sites were integrated at the Oct4 uaRNA reporter locus. Top shows RT-PCR, bottom shows PCR on genomic DNA. All clonal lines show genomic integration of the same size in the genomic DNA, but only lines where the intron is flanked by two wild-type splice sites show evidence of splicing. Note that lanes should not be quantitatively compared to each other, as amounts of template material were not controlled. d, Density plot of GC-corrected steady-state RNA levels of unspliced TSS-proximal/distal uaRNA/eRNA regions grouped by the presence and strength (MaxEnt score; ref. 41) of a 5′SS motif (see Methods). None n = 2,632, medium (MaxEnt 5–10) n = 1,554, strong (MaxEnt 10+) n = 106. All groups are significantly different from each other (P < 0.0001) by Kruskal-Wallis test. e, Density plot of GC-corrected steady-state RNA levels of unspliced TSS-proximal mRNA regions (left) and TSS-proximal/distal uaRNA/eRNA regions (right) grouped by the number of 5′SS motifs (MaxEnt score >5). mRNAs: none n = 1,392, 1 n = 1,604, >1 n = 691. uaRNA/eRNAs: none n = 2,632, 1 n = 1,232, >1 n = 428, comparisons by Kruskal-Wallis test. f, Relative nascent RNA levels (left) and Sort-seq scores (right) of 10-nt annotated 5′SSs with a MaxEnt score of >5, embedded into several background sequences. Only unspliced inserts (<3% spliced in nascent RNA) were considered. Same groups as in Fig. 5d: scrambled (Scr, n = 24) and antisense (AS, n = 24) versions of 5′SSs were compared to sense (S) 5′SSs (n = 50) by Kruskal-Wallis test. In all panels, **** indicates P < 0.0001, higher P values are indicated in each plot.
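
Panel b restricts the analysis to inserts lacking a PAS hexamer. A minimal sketch of such a filter is shown below. It checks only the canonical AAUAAA hexamer and its common AUUAAA variant (written as DNA), whereas the legend refers to a top-10 list that is not reproduced here. The function name, the hexamer tuple and the example inserts are hypothetical.

```python
# Assumption: only two well-known PAS hexamers are listed; the study's
# filter used a longer top-10 list that is not reproduced in this sketch.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def contains_pas(seq, hexamers=PAS_HEXAMERS):
    """Return True if the sequence contains any of the given hexamers."""
    # Normalise case and accept RNA input by converting U to T.
    seq = seq.upper().replace("U", "T")
    return any(hexamer in seq for hexamer in hexamers)


# Hypothetical insert sequences (not from the library).
inserts = {
    "insert_a": "GGCCAATAAAGGCC",  # contains AATAAA
    "insert_b": "GGCCGGCCGGCC",    # no PAS hexamer
}
pas_free = [name for name, seq in inserts.items() if not contains_pas(seq)]
print(pas_free)
```

Passing a different tuple as `hexamers` would let the same function apply a longer list of motifs.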

Source data

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables 1–9

Contains library composition, all INSERT-seq data, and plasmids and primers used in this study.

Source data

Source Data Fig. 1

Statistical Source Data

Source Data Fig. 2

Statistical Source Data

Source Data Fig. 3

Statistical Source Data

Source Data Fig. 4

Statistical Source Data

Source Data Fig. 5

Statistical Source Data

Source Data Extended Data Fig. 1

Statistical Source Data

Source Data Extended Data Fig. 2

Statistical Source Data

Source Data Extended Data Fig. 2

Unprocessed Western Blots

Source Data Extended Data Fig. 3

Statistical Source Data

Source Data Extended Data Fig. 4

Statistical Source Data

Source Data Extended Data Fig. 5

Statistical Source Data

Source Data Extended Data Fig. 5

Unprocessed gel image

About this article

Cite this article

Vlaming, H., Mimoso, C.A., Field, A.R. et al. Screening thousands of transcribed coding and non-coding regions reveals sequence determinants of RNA polymerase II elongation potential. Nat Struct Mol Biol 29, 613–620 (2022). https://doi.org/10.1038/s41594-022-00785-9

  • DOI: https://doi.org/10.1038/s41594-022-00785-9
