Quantitative analysis of transcription start site selection reveals control by DNA sequence, RNA polymerase II activity and NTP levels

Abstract

Transcription start site (TSS) selection is a key step in gene expression and occurs at many promoter positions over a wide range of efficiencies. Here we develop a massively parallel reporter assay to quantitatively dissect the contributions of promoter sequence, nucleoside triphosphate (NTP) substrate levels and RNA polymerase II (Pol II) activity to TSS selection by ‘promoter scanning’ in Saccharomyces cerevisiae (Pol II MAssively Systematic Transcript End Readout, ‘Pol II MASTER’). Using Pol II MASTER, we measure the efficiency of Pol II initiation at 1,000,000 individual TSS sequences in a defined promoter context. Pol II MASTER quantitatively confirms, in a controlled promoter context, the proposed critical importance of positions −8, −1 and +1 of S. cerevisiae TSSs. Pol II MASTER also extends quantitative analysis to the surrounding sequences and shows that they tune initiation over a wide range of efficiencies. These results enabled the development of a predictive model of initiation efficiency based on sequence. We show that genetic perturbation of Pol II catalytic activity alters initiation efficiency mostly independently of TSS sequence but selectively modulates preference for the initiating nucleotide. Intriguingly, we find that Pol II initiation efficiency is directly sensitive to guanosine-5′-triphosphate (GTP) levels at the first five transcript positions and to cytosine-5′-triphosphate (CTP) and uridine-5′-triphosphate (UTP) levels at the second position, genome wide. These results suggest that individual NTP levels can have transcript-specific effects on initiation, representing a cryptic layer of potential regulation at the level of Pol II biochemical properties. The results establish Pol II MASTER as a method for quantitative dissection of transcription initiation in eukaryotes.
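Because Pol II scans from upstream toward downstream positions, the efficiency of a TSS is naturally expressed relative to the polymerase that reaches it rather than relative to total reads. The sketch below assumes the common scanning-model definition (reads at a position divided by reads at that position plus all downstream positions); this page does not spell out the exact formula used, and the function name and read counts are hypothetical.

```python
from typing import List


def tss_efficiencies(counts: List[int]) -> List[float]:
    """Per-position initiation efficiency under a promoter-scanning model.

    counts[i] is the number of TSS-seq reads whose 5' end maps to position i,
    ordered from upstream (index 0) to downstream. The efficiency at i is the
    fraction of reads at or downstream of i that start exactly at i, i.e. the
    share of Pol II reaching i that initiated there. Positions with no reads
    at or downstream of them get 0.0.
    """
    efficiencies = [0.0] * len(counts)
    remaining = 0
    # Walk from the most downstream position back upstream, accumulating
    # the number of reads at or downstream of each position.
    for i in range(len(counts) - 1, -1, -1):
        remaining += counts[i]
        efficiencies[i] = counts[i] / remaining if remaining else 0.0
    return efficiencies


if __name__ == "__main__":
    # Hypothetical read counts at five consecutive promoter positions.
    reads = [10, 0, 30, 20, 40]
    print([round(e, 3) for e in tss_efficiencies(reads)])
    # prints [0.1, 0.0, 0.333, 0.333, 1.0]
```

Under this definition the most downstream position with reads always scores 1.0, so in practice such a calculation would be restricted to positions well upstream of the end of the measured window.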

Fig. 1: A high-throughput system for studying TSS selection.
Fig. 2: Wide range of initiation efficiency measured using MASTER.
Fig. 3: Sequence contributions to Pol II initiation efficiency from positions surrounding the TSS.
Fig. 4: Pol II mutants alter TSS efficiency for all possible TSS motifs while showing selective effects for base at +1.
Fig. 5: Pol II initiation is sensitive to NTP pools.
Fig. 6: Learned initiation preferences are predictive of TSS efficiencies at genomic promoters.
Fig. 7: Logistic regression model of DNA sequence contribution to TSS efficiency.
Fig. 8: Model for TSS sequence preference regulated by multiple mechanisms.
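Fig. 7 refers to a logistic regression model of DNA sequence contribution to TSS efficiency, and the abstract describes a predictive model of initiation efficiency based on sequence. This page does not give the model's features or fitting procedure, so the sketch below only illustrates the general idea under stated assumptions: each position is one-hot encoded, the measured efficiency (a fraction between 0 and 1) is used as a soft target for the cross-entropy loss, and weights are fit by plain full-batch gradient descent. All function names, hyperparameters and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Flatten a DNA sequence into position-by-base indicator features.

    Non-ACGT characters (e.g. N) encode as all zeros at that position.
    """
    features: List[float] = []
    for base in seq.upper():
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    seqs: Sequence[str],
    efficiencies: Sequence[float],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit w, b so that sigmoid(w . x + b) approximates each efficiency."""
    xs = [one_hot(s) for s in seqs]
    n, n_features = len(xs), len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, efficiencies):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # d(cross-entropy)/d(logit) with a fractional target y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Average gradient step over the full batch.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, seq: str) -> float:
    """Predicted efficiency for a sequence of the training length."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, one_hot(seq))) + b)


if __name__ == "__main__":
    # Toy data (hypothetical): an A at the last position raises efficiency.
    train = ["CCA", "GCA", "TCA", "CCG", "GCG", "TCT"]
    effs = [0.6, 0.55, 0.65, 0.1, 0.15, 0.05]
    w, b = fit_logistic(train, effs)
    print(round(predict(w, b, "ACA"), 2), round(predict(w, b, "ACT"), 2))
```

Each fitted weight can be read as one base's additive contribution to the log-odds of initiation at that position, which is what makes a logistic model convenient for position-by-position interpretation; dependencies such as the −9/−8 and −8/−7 interactions described in Extended Data Fig. 2 would need additional pairwise features.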

Data availability

Raw sequencing data generated in this study are available in the National Center for Biotechnology Information BioProject, under the accession number PRJNA766624. Processed data are available in GEO, under the accession number GSE185290. Source data are provided with this paper.

Code availability

Code for analyses in this study is provided at https://github.com/Kaplan-Lab-Pitt/PolII_MASTER-TSS_sequence.

References

  1. Zhang, Z. & Dietrich, F. S. Mapping of transcription start sites in Saccharomyces cerevisiae using 5′ SAGE. Nucleic Acids Res. 33, 2838–2851 (2005).

  2. Park, D., Morris, A. R., Battenhouse, A. & Iyer, V. R. Simultaneous mapping of transcript ends at single-nucleotide resolution and identification of widespread promoter-associated non-coding RNA governed by TATA elements. Nucleic Acids Res. 42, 3736–3749 (2014).

  3. Pelechano, V., Wei, W. & Steinmetz, L. M. Extensive transcriptional heterogeneity revealed by isoform profiling. Nature 497, 127–131 (2013).

  4. Chia, M. et al. High-resolution analysis of cell-state transitions in yeast suggests widespread transcriptional tuning by alternative starts. Genome Biol. 22, 34 (2021).

  5. Nepal, C. et al. Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis. Genome Res. 23, 1938–1950 (2013).

  6. FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

  7. Yamashita, R. et al. Genome-wide characterization of transcriptional start sites in humans by integrative transcriptome analysis. Genome Res. 21, 775–789 (2011).

  8. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).

  9. Hoskins, R. A. et al. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 21, 182–192 (2011).

  10. Zheng, H. et al. Global identification of transcription start sites in the genome of Apis mellifera using 5′ LongSAGE. J. Exp. Zool. B Mol. Dev. Evol. 316, 500–514 (2011).

  11. Chen, R. A. et al. The landscape of RNA polymerase II transcription initiation in C. elegans reveals promoter and enhancer architectures. Genome Res. 23, 1339–1347 (2013).

  12. Cheng, Z. et al. Pervasive, coordinated protein-level changes driven by transcript isoform switching during meiosis. Cell 172, 910–923 (2018).

  13. Rojas-Duran, M. F. & Gilbert, W. V. Alternative transcription start site selection leads to large differences in translation activity in yeast. RNA 18, 2299–2305 (2012).

  14. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).

  15. Zhang, P. et al. Relatively frequent switching of transcription start sites during cerebellar development. BMC Genomics 18, 461 (2017).

  16. Lu, Z. & Lin, Z. Pervasive and dynamic transcription initiation in Saccharomyces cerevisiae. Genome Res. 29, 1198–1210 (2019).

  17. Demircioglu, D. et al. A pan-cancer transcriptome analysis reveals pervasive regulation through alternative promoters. Cell 178, 1465–1477 (2019).

  18. Thorsen, K. et al. Tumor-specific usage of alternative transcription start sites in colorectal cancer identified by genome-wide exon array analysis. BMC Genomics 12, 505 (2011).

  19. Boyd, M. et al. Characterization of the enhancer and promoter landscape of inflammatory bowel disease from human colon biopsies. Nat. Commun. 9, 1661 (2018).

  20. Giardina, C. & Lis, J. T. DNA melting on yeast RNA polymerase II promoter. Science 261, 759–762 (1993).

  21. Qiu, C. et al. Universal promoter scanning by Pol II during transcription initiation in Saccharomyces cerevisiae. Genome Biol. 21, 132 (2020).

  22. Kuehner, J. N. & Brow, D. A. Quantitative analysis of in vivo initiator selection by yeast RNA polymerase II supports a scanning model. J. Biol. Chem. 281, 14119–14128 (2006).

  23. Kaplan, C. D., Jin, H., Zhang, I. L. & Belyanin, A. Dissection of Pol II trigger loop function and Pol II activity-dependent control of start site selection in vivo. PLoS Genet. 8, e1002627 (2012).

  24. Miller, G. & Hahn, S. A DNA-tethered cleavage probe reveals the path for promoter DNA in the yeast preinitiation complex. Nat. Struct. Mol. Biol. 13, 603–610 (2006).

  25. Fazal, F. M., Meng, C. A., Murakami, K., Kornberg, R. D. & Block, S. M. Real-time observation of the initiation of RNA polymerase II transcription. Nature 525, 274–277 (2015).

  26. Hampsey, M. Molecular genetics of the RNA polymerase II general transcriptional machinery. Microbiol. Mol. Biol. Rev. 62, 465–503 (1998).

  27. Zhao, T. et al. Ssl2/TFIIH function in transcription start site scanning by RNA polymerase II in Saccharomyces cerevisiae. eLife 10, e71013 (2021).

  28. Hahn, S., Hoar, E. T. & Guarente, L. Each of three ‘TATA elements’ specifies a subset of the transcription initiation sites at the CYC-1 promoter of Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 82, 8562–8566 (1985).

  29. Cortes, T. et al. Genome-wide mapping of transcriptional start sites defines an extensive leaderless transcriptome in Mycobacterium tuberculosis. Cell Rep. 5, 1121–1131 (2013).

  30. Bucher, P. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol. 212, 563–578 (1990).

  31. Smale, S. T. & Baltimore, D. The ‘initiator’ as a transcription control element. Cell 57, 103–113 (1989).

  32. Corden, J. et al. Promoter sequences of eukaryotic protein-coding genes. Science 209, 1406–1414 (1980).

  33. McNeil, J. B. & Smith, M. Saccharomyces cerevisiae CYC1 mRNA 5′-end positioning: analysis by in vitro mutagenesis, using synthetic duplexes with random mismatch base pairs. Mol. Cell. Biol. 5, 3545–3551 (1985).

  34. Malabat, C., Feuerbach, F., Ma, L., Saveanu, C. & Jacquier, A. Quality control of transcription start site selection by nonsense-mediated mRNA decay. eLife 4, e06722 (2015).

  35. Policastro, R. A., Raborn, R. T., Brendel, V. P. & Zentner, G. E. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res. 30, 910–923 (2020).

  36. Healy, A. M., Helser, T. L. & Zitomer, R. S. Sequences required for transcriptional initiation of the Saccharomyces cerevisiae CYC7 genes. Mol. Cell. Biol. 7, 3785–3791 (1987).

  37. Furter-Graves, E. M. & Hall, B. D. DNA sequence elements required for transcription initiation of the Schizosaccharomyces pombe ADH gene in Saccharomyces cerevisiae. Mol. Gen. Genet. 223, 407–416 (1990).

  38. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635 (2006).

  39. Hashimoto, S. et al. 5′-end SAGE for the analysis of transcriptional start sites. Nat. Biotechnol. 22, 1146–1149 (2004).

  40. Suzuki, Y. et al. Diverse transcriptional initiation revealed by fine, large-scale mapping of mRNA start sites. EMBO Rep. 2, 388–393 (2001).

  41. Kim, D. et al. Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling. PLoS Genet. 8, e1002867 (2012).

  42. Vvedenskaya, I. O. et al. Massively systematic transcript end readout, ‘MASTER’: transcription start site selection, transcriptional slippage, and transcript yields. Mol. Cell 60, 953–965 (2015).

  43. Gleghorn, M. L., Davydova, E. K., Basu, R., Rothman-Denes, L. B. & Murakami, K. S. X-ray crystal structures elucidate the nucleotidyl transfer reaction of transcript initiation using two nucleotides. Proc. Natl Acad. Sci. USA 108, 3566–3571 (2011).

  44. Basu, R. S. et al. Structural basis of transcription initiation by bacterial RNA polymerase holoenzyme. J. Biol. Chem. 289, 24549–24559 (2014).

  45. Lu, Z. & Lin, Z. The origin and evolution of a distinct mechanism of transcription initiation in yeasts. Genome Res. 31, 51–63 (2020).

  46. Maicas, E. & Friesen, J. D. A sequence pattern that occurs at the transcription initiation region of yeast RNA polymerase II promoters. Nucleic Acids Res. 18, 3387–3393 (1990).

  47. Lubliner, S., Keren, L. & Segal, E. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Res. 41, 5569–5581 (2013).

  48. Dujon, B. The yeast genome project: what did we learn? Trends Genet. 12, 263–270 (1996).

  49. Lubliner, S. et al. Core promoter sequence in yeast is a major determinant of expression level. Genome Res. 25, 1008–1017 (2015).

  50. Blazeck, J., Garg, R., Reed, B. & Alper, H. S. Controlling promoter strength and regulation in Saccharomyces cerevisiae using synthetic hybrid promoters. Biotechnol. Bioeng. 109, 2884–2895 (2012).

  51. Dhillon, N. et al. Permutational analysis of Saccharomyces cerevisiae regulatory elements. Synth. Biol. 5, ysaa007 (2020).

  52. Wang, H., Schilbach, S., Ninov, M., Urlaub, H. & Cramer, P. Structures of transcription preinitiation complex engaged with the +1 nucleosome. Nat. Struct. Mol. Biol. 30, 226–232 (2022).

  53. Vvedenskaya, I. O., Goldman, S. R. & Nickels, B. E. Analysis of bacterial transcription by ‘Massively Systematic Transcript End Readout,’ MASTER. Methods Enzymol. 612, 269–302 (2018).

  54. Vvedenskaya, I. O. et al. Interactions between RNA polymerase and the core recognition element are a determinant of transcription start site selection. Proc. Natl Acad. Sci. USA 113, E2899–E2905 (2016).

  55. Winkelman, J. T. et al. Multiplexed protein–DNA cross-linking: scrunching in transcription start site selection. Science 351, 1090–1093 (2016).

  56. Hochschild, A. Mastering transcription: multiplexed analysis of transcription start site sequences. Mol. Cell 60, 829–831 (2015).

  57. Faitar, S. L., Brodie, S. A. & Ponticelli, A. S. Promoter-specific shifts in transcription initiation conferred by yeast TFIIB mutations are determined by the sequence in the immediate vicinity of the start sites. Mol. Cell. Biol. 21, 4427–4440 (2001).

  58. Deshpande, A. P. & Patel, S. S. Mechanism of transcription initiation by the yeast mitochondrial RNA polymerase. Biochim. Biophys. Acta 1819, 930–938 (2012).

  59. Javahery, R., Khachi, A., Lo, K., Zenzie-Gregory, B. & Smale, S. T. DNA sequence requirements for transcriptional initiator activity in mammalian cells. Mol. Cell. Biol. 14, 116–127 (1994).

  60. Arkhipova, I. R. Promoter elements in Drosophila melanogaster revealed by sequence analysis. Genetics 139, 1359–1369 (1995).

  61. Yarden, G., Elfakess, R., Gazit, K. & Dikstein, R. Characterization of sINR, a strict version of the Initiator core promoter element. Nucleic Acids Res. 37, 4234–4246 (2009).

  62. Wong, M. S., Kinney, J. B. & Krainer, A. R. Quantitative activity profile and context dependence of all human 5′ splice sites. Mol. Cell 71, 1012–1026 (2018).

  63. Roca, X. et al. Features of 5′-splice-site efficiency derived from disease-causing mutations and comparative genomics. Genome Res. 18, 77–87 (2008).

  64. Carmel, I., Tal, S., Vig, I. & Ast, G. Comparative analysis detects dependencies among the 5′ splice-site positions. RNA 10, 828–840 (2004).

  65. McPhillips, C. C., Hyle, J. W. & Reines, D. Detection of the mycophenolate-inhibited form of IMP dehydrogenase in vivo. Proc. Natl Acad. Sci. USA 101, 12171–12176 (2004).

  66. Hyle, J. W., Shaw, R. J. & Reines, D. Functional distinctions between IMP dehydrogenase genes in providing mycophenolate resistance and guanine prototrophy to yeast. J. Biol. Chem. 278, 28470–28478 (2003).

  67. Kuehner, J. N. & Brow, D. A. Regulation of a eukaryotic gene by GTP-dependent start site selection and transcription attenuation. Mol. Cell 31, 201–211 (2008).

  68. Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic preinitiation complexes. Nature 483, 295–301 (2012).

  69. Vo ngoc, L., Huang, C. Y., Cassidy, C. J., Medrano, C. & Kadonaga, J. T. Identification of the human DPR core promoter element using machine learning. Nature 585, 459–463 (2020).

  70. Luse, D. S., Parida, M., Spector, B. M., Nilson, K. A. & Price, D. H. A unified view of the sequence and functional organization of the human RNA polymerase II promoter. Nucleic Acids Res. 48, 7767–7785 (2020).

  71. Zhang, Y. et al. Structural basis of transcription initiation. Science 338, 1076–1080 (2012).

  72. Walmacq, C. et al. Mechanism of translesion transcription by RNA polymerase II and its role in cellular resistance to DNA damage. Mol. Cell 46, 18–29 (2012).

  73. Braberg, H. et al. From structure to systems: high-resolution, quantitative genetic analysis of RNA polymerase II. Cell 154, 775–788 (2013).

  74. Malik, I., Qiu, C., Snavely, T. & Kaplan, C. D. Wide-ranging and unexpected consequences of altered Pol II catalytic activity in vivo. Nucleic Acids Res. 45, 4431–4451 (2017).

  75. Kwapisz, M. et al. Mutations of RNA polymerase II activate key genes of the nucleoside triphosphate biosynthetic pathways. EMBO J. 27, 2411–2421 (2008).

  76. Thiebaut, M. et al. Futile cycle of transcription initiation and termination modulates the response to nucleotide shortage in S. cerevisiae. Mol. Cell 31, 671–682 (2008).

  77. Steinmetz, E. J. et al. Genome-wide distribution of yeast RNA polymerase II and its control by Sen1 helicase. Mol. Cell 24, 735–746 (2006).

  78. Hein, P. P., Palangat, M. & Landick, R. RNA transcript 3′-proximal sequence affects translocation bias of RNA polymerase. Biochemistry 50, 7002–7014 (2011).

  79. Cabart, P., Jin, H., Li, L. & Kaplan, C. D. Activation and reactivation of the RNA polymerase II trigger loop for intrinsic RNA cleavage and catalysis. Transcription 5, e28869 (2014).

  80. Sainsbury, S., Niesser, J. & Cramer, P. Structure and function of the initially transcribing RNA polymerase II-TFIIB complex. Nature 493, 437–440 (2013).

  81. Segal, E. & Widom, J. Poly(dA:dT) tracts: major determinants of nucleosome organization. Curr. Opin. Struct. Biol. 19, 65–71 (2009).

  82. Tillo, D. & Hughes, T. R. G+C content dominates intrinsic nucleosome occupancy. BMC Bioinf. 10, 442 (2009).

  83. Lee, W. et al. A high-resolution atlas of nucleosome occupancy in yeast. Nat. Genet. 39, 1235–1244 (2007).

  84. Peckham, H. E. et al. Nucleosome positioning signals in genomic DNA. Genome Res. 17, 1170–1177 (2007).

  85. Segal, E. et al. A genomic code for nucleosome positioning. Nature 442, 772–778 (2006).

  86. Jin, H. & Kaplan, C. D. Relationships of RNA polymerase II genetic interactors to transcription start site usage defects and growth in Saccharomyces cerevisiae. G3 5, 21–33 (2014).

  87. Amberg, D. C., Burke, D. J. & Strathern, J. N. Methods in Yeast Genetics: A Cold Spring Harbor Laboratory Course Manual (Cold Spring Harbor Laboratory Press, 2005).

  88. Chee, M. K. & Haase, S. B. New and redesigned pRS plasmid shuttle vectors for genetic manipulation of Saccharomyces cerevisiae. G3 2, 515–526 (2012).

  89. Gietz, R. D. & Schiestl, R. H. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 31–34 (2007).

  90. Benatuil, L., Perez, J. M., Belk, J. & Hsieh, C. M. An improved yeast transformation method for the generation of very large human antibody libraries. Protein Eng. Des. Sel. 23, 155–159 (2010).

  91. Schmitt, M. E., Brown, T. A. & Trumpower, B. L. A rapid and simple method for preparation of RNA from Saccharomyces cerevisiae. Nucleic Acids Res. 18, 3091–3092 (1990).

  92. Vvedenskaya, I. O., Goldman, S. R. & Nickels, B. E. Preparation of cDNA libraries for high-throughput RNA sequencing analysis of RNA 5′ ends. Methods Mol. Biol. 1276, 211–228 (2015).

  93. Ranish, J. A. & Hahn, S. The yeast general transcription factor TFIIA is composed of 2 polypeptide subunits. J. Biol. Chem. 266, 19320–19327 (1991).

  94. Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina paired-end read merger. Bioinformatics 30, 614–620 (2014).

  95. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).

  96. Tareen, A. & Kinney, J. B. Logomaker: beautiful sequence logos in Python. Bioinformatics 36, 2272–2274 (2020).

  97. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

  98. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).

Acknowledgements

The authors thank Kaplan lab members for helpful comments on the manuscript. We are deeply grateful to C. Qiu for discussions and comments on this project. We acknowledge J. Kinney (Cold Spring Harbor Laboratory) and S. Li (Statistical Consulting Center at University of Pittsburgh) for discussions on modeling. We thank C.D. Johnson, R. Metz (Texas A&M AgriLife Genomics and Bioinformatics Service), A. Hillhouse (Texas A&M Institute for Genome Sciences & Society), W.A. MacDonald and R. Elbakri (the University of Pittsburgh Health Sciences Sequencing Core at UPMC Children’s Hospital of Pittsburgh), Y. Pan (the UPMC Genome Center), D. Kumar (the Waksman Genomics Core Facility at Rutgers University) and L. Freeman (Illumina) for discussions and advice regarding deep sequencing strategies. We thank S.J. Mullett and S.G. Wendell (Metabolomics and Lipidomics Core, NIH S10OD023402) for performing NTP measurements. We acknowledge support from National Institutes of Health (NIH) grant R01GM097260 to C.D.K. for the early part of this work and NIH grants R01GM120450 and R35GM144116 to C.D.K. and R35GM118059 to B.E.N. This research was supported in part by the University of Pittsburgh Center for Research Computing, RRID:SCR_022735, through the resources provided. Specifically, this work used the HTC cluster, which is supported by NIH award number S10OD028483.

Author information

Contributions

Y.Z. designed the project, performed experiments, analyzed data, made figures, and drafted and revised the manuscript. I.O.V. generated libraries for TSS-seq. S-H.Z. analyzed data and discussed the analysis. B.E.N. provided funding and methodology of TSS-seq, and revised the manuscript. C.D.K. conceived and designed the project, guided analyses and interpretation of data, provided funding and revised the manuscript.

Corresponding author

Correspondence to Craig D. Kaplan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Sara Osman was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 High level of reproducibility and coverage depth of library variants.

(A) Schematic of the experimental approach. Promoter libraries containing almost all possible sequences within a 9 nt randomized region were constructed on plasmids. Libraries were designated ‘AYR’, ‘BYR’ and ‘ARY’ based on randomized region composition. Plasmids were amplified in E. coli and transformed into yeast with wild-type or mutated Pol II. DNA and RNA were extracted and prepared for DNA-seq and TSS-seq. (B) Base frequencies at positions within the randomized region of promoter variants demonstrate unbiased synthesis of randomized regions. Bars are mean ± standard deviation of the mean for promoter variants in WT and four Pol II mutants. (C) Heat map illustrating hierarchical clustering of Pearson correlation coefficients of reads per promoter variant for E. coli libraries and for three biological replicates of libraries transformed into yeast. (D) Example correlation plots of DNA read counts of promoter variants for E. coli and yeast WT biological replicates. Pearson r and number (N) of compared variants are shown. (E) Bulk primer extension for RNA produced from promoter variant libraries transformed into WT yeast. The ‘No GFP’ control used RNA from an untransformed strain. The ‘No RNA’ control used a sample of nuclease-free water. Dots represent three biological replicates. Bars are mean ± standard deviation of the mean. (F) TSS usage based on TSS-seq read lengths from transformed libraries. Dots represent three biological replicates. Bars are mean ± standard deviation of the mean. Distributions are similar to those in E. Note that primer extension will blur usage into the adjacent upstream position because of some non-templated addition of C to RNA 5′ ends. (G) Heat scatter plots of coefficient of variation (CoV, y axis) versus total RNA reads per promoter variant in each Pol II MASTER library. A cutoff of CoV = 0.5 was used to filter out higher-variance variants.
(H) Heat scatter plots of relative expression versus TSS efficiency of major TSSs per promoter variant, with contour lines indicating deciles of data. The number (N) of promoter variants with relative expression values (log2) within [−1, +1] and the corresponding percentage of total promoter variants are shown.
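Panel G describes filtering promoter variants at a CoV of 0.5. The caption does not state exactly which per-replicate quantity the CoV is computed on, so the sketch below assumes it is computed from each variant's normalized RNA read counts across biological replicates using the sample standard deviation, and that variants at or below the cutoff are kept. The function names and example values are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean.

    Needs at least two replicate values. Returns math.inf when the mean is
    zero so that such variants are always filtered out.
    """
    mean = statistics.mean(values)
    if mean == 0:
        return math.inf
    return statistics.stdev(values) / mean


def filter_by_cov(
    reads_by_variant: Dict[str, List[float]], cutoff: float = 0.5
) -> Dict[str, float]:
    """Keep variants whose CoV across replicates is at or below the cutoff.

    reads_by_variant maps a promoter-variant identifier to its normalized
    RNA read counts in each biological replicate (assumed layout).
    Returns the CoV of each retained variant.
    """
    kept: Dict[str, float] = {}
    for variant, reads in reads_by_variant.items():
        cov = coefficient_of_variation(reads)
        if cov <= cutoff:
            kept[variant] = cov
    return kept


if __name__ == "__main__":
    # Hypothetical normalized read counts in three biological replicates.
    example = {
        "variant_1": [100.0, 110.0, 90.0],  # reproducible
        "variant_2": [5.0, 50.0, 10.0],  # e.g. one outlier replicate
    }
    print(filter_by_cov(example))
```

A relative measure such as the CoV lets one threshold apply across variants expressed at very different levels, which is presumably why it is plotted against total reads in panel G.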

Source data

Extended Data Fig. 2 Surrounding sequence of TSSs modulates initiation efficiency.

(A) Heat map of +1 TSS efficiency for all −7 to −2 sequences within each N₋₈N₋₁N₊₁ motif in WT, with sequences rank-ordered by the efficiency of their A₋₈C₋₁A₊₁ version. The x axis is ordered by the median efficiency of each N₋₈N₋₁N₊₁ motif group, as shown in Fig. 2B. Spearman’s rank correlation tests between the A₋₈C₋₁A₊₁ group and all groups are shown beneath the heat map. (B) Efficiencies of designed +1 TSSs grouped by base identity at positions −8 to +1. Statistical analyses by Kruskal-Wallis with Dunn’s multiple comparisons test for base preference at individual positions relative to the +1 TSS are shown beneath the plots. Lines represent median values of subgroups. ****, P ≤ 0.0001; ***, P ≤ 0.001; **, P ≤ 0.01; *, P ≤ 0.05. (C) Histogram showing the distribution of measured efficiencies for all designed −8 to +4 TSSs of all promoter variants from ‘AYR’, ‘BYR’ and ‘ARY’ libraries in WT. The dashed line marks the 5% efficiency cutoff. (D) A₊₂G₊₃G₊₄ motif enrichment is apparent for the top 10% most efficient designed −8 TSSs. A(/G)₊₂G(/C)₊₃G(/C)₊₄ motif enrichment was observed for the top 10% most efficient −8 TSSs but not for the next 10% most efficient TSSs. The A(/G)₊₁ enrichment observed for the top 20% most efficient TSSs is consistent with the +1 R preference of TSSs. Numbers (N) of variants assessed are shown. Sequence logos were generated using WebLogo 3. Bars represent an approximate Bayesian 95% confidence interval. (E) An A at position −9 results in different sequence preferences at position −8. The dataset of designed +4 TSSs from the ‘AYR’, ‘BYR’ and ‘ARY’ libraries was used to detect the −9/−8 interaction. All variants were divided into 16 subgroups defined by the bases at positions −9 and −8 relative to the designed +4 TSS, and their TSS efficiencies were then plotted. Lines represent median values of subgroups. (F) An A at position −8 results in different sequence preferences at position −7.
The dataset of designed +1 TSSs from the ‘AYR’ and ‘BYR’ libraries was used to detect the −8/−7 interaction. Calculations are the same as for the −9/−8 interaction described in E.
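Panels E and F rest on dividing variants into 16 subgroups by the bases at two adjacent positions and comparing subgroup medians. A minimal sketch of that grouping step follows; it assumes each record is a sequence window aligned to the designed TSS plus its measured efficiency, uses 0-based window indices, and does not reproduce the statistical tests. The names and data are hypothetical.

```python
import statistics
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def subgroup_medians(
    records: Sequence[Tuple[str, float]], pos_a: int, pos_b: int
) -> Dict[str, float]:
    """Median TSS efficiency for each two-base combination at two positions.

    records holds (sequence, efficiency) pairs, where each sequence is a
    fixed-length window aligned to the designed TSS. pos_a and pos_b are
    0-based indices into that window for the two positions being compared
    (for example, the bases at -9 and -8 relative to a designed TSS).
    Combinations with no observations are omitted from the result.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for seq, efficiency in records:
        groups[seq[pos_a] + seq[pos_b]].append(efficiency)
    # Up to 16 subgroups (4 bases x 4 bases), in a fixed A/C/G/T order.
    return {
        a + b: statistics.median(groups[a + b])
        for a, b in product(BASES, repeat=2)
        if (a + b) in groups
    }


if __name__ == "__main__":
    # Hypothetical 3-nt windows: compare position-1 preferences when
    # position 0 is A versus C.
    example = [("AAC", 0.4), ("AGC", 0.1), ("CAC", 0.1), ("CGC", 0.3), ("AAT", 0.5)]
    medians = subgroup_medians(example, 0, 1)
    for key in sorted(medians):
        print(key, medians[key])
```

An interaction shows up when the ranking of bases at the second position changes depending on the base at the first; in this toy example, A beats G at position 1 only when position 0 is A.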

Source data

Extended Data Fig. 3 High level of reproducibility of library variants in Pol II mutants.

(A) Histograms showing the distribution of measured efficiencies for all designed −8 to +4 TSSs for MASTER libraries in Pol II mutants. Dashed lines mark the 5% efficiency cutoff, with the number (N) of TSS variants shown. (B) TSS usage distributions at designed −10 to +25 TSSs for MASTER libraries in Pol II mutants. Dots represent three biological replicates. Bars are mean ± standard deviation. (C) Hierarchical clustering of Pearson correlation coefficients of TSS efficiencies for major TSSs (designed +1 TSS for the ‘AYR’ and ‘BYR’ libraries, +2 TSS for the ‘ARY’ library) for WT or Pol II mutants, illustrated as a heat map for three biological replicates. (D) Example correlation plots of TSS efficiency of major TSSs between representative biological replicates. Pearson r and the number (N) of compared variants are shown. (E) Plots of coefficient of variation (CoV) versus total RNA reads (three biological replicates) for Pol II mutants. The red dashed lines mark the CoV = 0.5 cutoff, an arbitrary threshold for promoters with reasonable reproducibility across replicates. G1097D replicates contain outliers because this mutant is susceptible to genetic suppressors. A suppressor present in one biological replicate produces a high CoV, allowing such outliers to be filtered out.
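A minimal sketch of the CoV filter in panel E, assuming CoV is the sample standard deviation divided by the mean of a per-promoter value across replicates, and that promoters above the 0.5 cutoff are discarded. The legend does not state which per-promoter value enters the CoV, so the function accepts any list of replicate values; the names `coefficient_of_variation` and `reproducible` are hypothetical.

```python
from statistics import mean, stdev

COV_CUTOFF = 0.5  # the "arbitrary cutoff" from panel E


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    m = mean(values)
    # A zero mean makes CoV undefined; treat it as irreproducible.
    return stdev(values) / m if m > 0 else float("inf")


def reproducible(replicate_values, cutoff=COV_CUTOFF):
    """IDs whose CoV across replicates is at or below `cutoff` (assumed <=).

    replicate_values: dict of promoter ID -> list of per-replicate values.
    """
    return {pid for pid, vals in replicate_values.items()
            if coefficient_of_variation(vals) <= cutoff}
```

A promoter with one outlying replicate, as expected when a suppressor arises in a single G1097D culture, gets a large CoV and drops out of the returned set.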


Extended Data Fig. 4 Pol II mutants alter TSS efficiency in general.

(A) TSS efficiency distributions of designed +1 TSSs of Pol II mutants for base subgroups at individual positions relative to +1. The analysis is identical to that in Extended Data Fig. 2B for WT Pol II. (B) The Pol II gain-of-function (GOF) allele G1097D showed a greater increase in efficiency than the GOF allele E1103G at upstream TSSs (designed −32 and −8 TSSs), whereas E1103G showed stronger effects than G1097D at the designed +1 TSS. (C) Pol II initiation sequence preference in Pol II mutants. The analysis is identical to that in Fig. 3B for WT Pol II. (D) Motif enrichment for the top 10% most efficient −8 TSSs in Pol II mutants. The motif enrichment analysis is identical to that in the top panel of Extended Data Fig. 2D for WT Pol II. The numbers (N) of variants assessed are indicated. Bars represent an approximate Bayesian 95% confidence interval.
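The motif-enrichment step in panel D (and Extended Data Fig. 2D) starts from per-position base frequencies among the most efficient variants. The sketch below is an illustration, not the authors' pipeline: it ranks variants by efficiency and tabulates base frequencies for a chosen percentile slice, the top 10% by default or the next 10% with `start_percent=10`. It does not draw logos or compute the Bayesian confidence intervals that WebLogo 3 reports; the function name and input format are hypothetical.

```python
from collections import Counter


def slice_frequencies(variants, top_percent=10, start_percent=0):
    """Per-position base frequencies for a slice of efficiency-ranked variants.

    variants: list of (sequence, efficiency) with equal-length sequences.
    The slice covers ranks [start_percent, start_percent + top_percent)
    percent of variants sorted from most to least efficient.
    Returns (number of variants in the slice, list of {base: frequency}).
    """
    ranked = sorted(variants, key=lambda v: v[1], reverse=True)
    size = max(1, len(ranked) * top_percent // 100)
    first = len(ranked) * start_percent // 100
    chosen = [seq for seq, _ in ranked[first:first + size]]
    if not chosen:
        return 0, []
    freqs = []
    for column in zip(*chosen):  # one column per sequence position
        counts = Counter(column)
        freqs.append({b: counts[b] / len(chosen) for b in "ACGT"})
    return len(chosen), freqs
```

Comparing the top slice with the next slice mirrors the legend's observation that enrichment at +2 to +4 appears in the top 10% but not in the next 10%.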


Extended Data Fig. 5 High reproducibility of TSS usage and efficiency upon MPA treatment.

(A) TSS usage distributions at designed −10 to +25 TSSs in the WT ‘NYR’ library (mixed ‘AYR’ and ‘BYR’ libraries) treated with 100% ethanol (EtOH) or with 20 μg/ml mycophenolic acid (MPA). MPA treatment shifted TSS usage downstream relative to EtOH treatment. Dots represent three biological replicates. Bars are mean ± standard deviation of the mean. (B) Hierarchical clustering of Pearson correlation coefficients of TSS efficiencies for the designed +1 TSS for three biological replicates of MPA or EtOH treatment, illustrated as a heat map. (C) Hierarchical clustering of Pearson correlation coefficients of TSS efficiencies for all genome positions within defined promoter windows with ≥3 reads in each replicate, illustrated as a heat map. (D) Correlation plots of combined biological replicates for TSS efficiency upon MPA treatment (y axes) versus EtOH treatment (x axes) for all TSSs with ≥2% efficiency within the 25%–75% range of the distribution for a curated set of 5,979 yeast promoters (see Methods). TSSs are separated into groups depending on base identity at position −3 (control) or at positions +1 to +6.
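Panels C and D work with TSS efficiencies at every position of a genomic promoter window. The precise definition is not stated in this legend, so the sketch below assumes one for illustration: the efficiency at a position is its read count divided by the reads at that position plus all downstream positions in the window, that is, the fraction of scanning Pol II reaching that position that initiates there. It then applies the ≥2% threshold from panel D. The function names are hypothetical.

```python
MIN_EFFICIENCY = 0.02  # >=2% efficiency threshold from panel D


def tss_efficiencies(counts):
    """Per-position efficiencies for read counts ordered 5'->3' in a window.

    ASSUMPTION: efficiency = reads at a position /
    (reads at that position + reads at all downstream positions).
    Returns None where no reads lie at or downstream of a position.
    """
    effs = []
    remaining = sum(counts)  # reads at this position and downstream
    for c in counts:
        effs.append(c / remaining if remaining > 0 else None)
        remaining -= c
    return effs


def efficient_positions(counts, min_eff=MIN_EFFICIENCY):
    """Window indices whose efficiency meets the threshold."""
    return [i for i, e in enumerate(tss_efficiencies(counts))
            if e is not None and e >= min_eff]
```

Under this assumed definition, the last position with reads in a window always has efficiency 1.0, and positions with no reads at or downstream of them are undefined (`None`).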


Extended Data Fig. 6 Modeling identifies sequence features for TSS selection in WT and Pol II mutants.

(A) Overview of TSS efficiency modeling. (1) TSS efficiencies, including designed −8 to +2 and +4 TSSs derived from the ‘AYR’, ‘BYR’ and ‘ARY’ libraries, were pooled for modeling. (2) Sequences from −11 to +9 relative to variant TSSs were extracted. (3) To identify robust features, a forward stepwise selection strategy coupled with five-fold cross-validation was used for logistic regression, with random splitting into training (80%) and test (20%) sets. Stepwise regression started with a constant term only and added variables one at a time until a stopping criterion was met. Additive terms (sequences at positions −11 to +9) and interactions were tested in stages. Model performance was evaluated with R². Variable addition stopped when the increase in R² was < 0.01. (4) A logistic regression model containing the selected robust features was trained on the training set and then evaluated on the test set. (B) Comparison of measured and predicted efficiencies. Model performance (R² on the entire test set) and the number (N) of data points shown in the plot are indicated. (C) Principal component analysis (PCA) of the parameters of models trained on individual replicates of WT and Pol II mutants. Close clustering of individual replicates indicates that the models are not overfit. The top 15 contributing variables are shown. GOF and LOF mutants were separated from WT by the 1st principal component. GOF G1097D and E1103G were further distinguished from each other by the 2nd principal component through additional position +2 information, which is consistent with the results in Extended Data Fig. 4D, where G1097D and E1103G differentially altered +2 sequence enrichment. (D) Scatter plot comparing measured and predicted TSS efficiencies of all positions within 5,979 known genomic promoter windows²¹ with available measured efficiency. Pearson r and the number (N) of compared variants are shown. Most promoter positions (82%, 1,678,406 out of 2,047,205) showed no observed efficiency, which is expected because TSSs need to be specified by a core promoter and scanning occurs over some distance downstream.
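The feature-selection loop in panel A (steps 3 and 4) can be sketched as forward stepwise selection scored by five-fold cross-validated R², stopping when the gain is below 0.01. This is a hedged, pure-Python illustration, not the authors' implementation. The legend does not specify the fitting algorithm, so here a logistic model is fit to fractional efficiencies by plain gradient descent on binomial cross-entropy, R² is computed on pooled out-of-fold predictions, and candidate features are precomputed 0/1 indicator columns (for example, a hypothetical "A at +1" column). Staged testing of interaction terms is left out, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, row):
    b, w = model
    return sigmoid(b + sum(wj * x for wj, x in zip(w, row)))


def fit_logistic(rows, y, lr=1.0, epochs=500):
    """Intercept + weights fit by gradient descent on binomial cross-entropy.

    rows: list of feature lists (empty lists give a constant-only model);
    y: fractional targets in [0, 1] (TSS efficiencies).
    """
    n, p = len(rows), len(rows[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * p
        for row, target in zip(rows, y):
            err = predict((b, w), row) - target  # d(loss)/d(linear predictor)
            gb += err
            for j, x in enumerate(row):
                gw[j] += err * x
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return b, w


def r_squared(y, pred):
    mean_y = sum(y) / len(y)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    return 1.0 - ss_res / ss_tot


def _rows(columns, names, ids):
    return [[columns[name][i] for name in names] for i in ids]


def cv_r_squared(columns, names, y, k=5, seed=0):
    """R^2 of pooled out-of-fold predictions from k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # random split into k folds
    pred = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]  # each fold holds ~1/k (20% for k=5) of the data
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic(_rows(columns, names, train), [y[i] for i in train])
        for i, row in zip(test, _rows(columns, names, test)):
            pred[i] = predict(model, row)
    return r_squared(y, pred)


def forward_stepwise(columns, y, min_gain=0.01, k=5, seed=0):
    """Add the best feature each round until R^2 improves by < min_gain."""
    selected = []
    best = cv_r_squared(columns, selected, y, k, seed)  # constant term only
    remaining = list(columns)
    while remaining:
        scores = {c: cv_r_squared(columns, selected + [c], y, k, seed)
                  for c in remaining}
        choice = max(scores, key=scores.get)
        if scores[choice] - best < min_gain:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = scores[choice]
    return selected, best
```

Interaction terms could enter a later stage as extra indicator columns (products of two base indicators). A library alternative would be a generic sequential selector with cross-validation, but the plain loop keeps the stopping rule explicit.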


Supplementary information

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 1/Table 1

Statistical source data.

Source Data Extended Data Fig. 2/Table 2

Statistical source data.

Source Data Extended Data Fig. 3/Table 3

Statistical source data.

Source Data Extended Data Fig. 4/Table 4

Statistical source data.

Source Data Extended Data Fig. 5/Table 5

Statistical source data.

Source Data Extended Data Fig. 6/Table 6

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, Y., Vvedenskaya, I.O., Sze, SH. et al. Quantitative analysis of transcription start site selection reveals control by DNA sequence, RNA polymerase II activity and NTP levels. Nat Struct Mol Biol 31, 190–202 (2024). https://doi.org/10.1038/s41594-023-01171-9



  • DOI: https://doi.org/10.1038/s41594-023-01171-9
