Genomic organization of human transcription initiation complexes

A Retraction to this article was published on 23 July 2014

This article has been updated

Abstract

The human genome is pervasively transcribed, yet only a small fraction is coding. Here we address whether this non-coding transcription arises at promoters, and detail the interactions of initiation factors TATA box binding protein (TBP), transcription factor IIB (TFIIB) and RNA polymerase (Pol) II. Using ChIP-exo (chromatin immunoprecipitation with lambda exonuclease digestion followed by high-throughput sequencing), we identify approximately 160,000 transcription initiation complexes across the human K562 genome, and more in other cancer genomes. Only about 5% associate with messenger RNA genes. The remainder associates with non-polyadenylated non-coding transcription. Regardless, Pol II moves into a transcriptionally paused state, and TBP and TFIIB remain at the promoter. Remarkably, the vast majority of locations contain, in constrained positions, the four core promoter elements: upstream TFIIB recognition element (BREu), TATA, downstream TFIIB recognition element (BREd), and initiator element (INR). All but the INR also reside at Pol III promoters, where TBP makes similar contacts. This comprehensive and high-resolution genome-wide detection of the initiation machinery produces a consolidated view of transcription initiation events from yeast to humans at Pol II/III TATA-containing/TATA-less coding and non-coding genes.

Figure 1: Transcription machinery organization at human mRNA promoters.
Figure 2: TATA elements at most mRNA genes.
Figure 3: BRE and INR at most mRNA genes.
Figure 4: Non-coding TFIIB locations have chromatin marks and non-polyadenylated RNA.
Figure 5: Restricted spacing of CPEs.
Figure 6: TATA and BRE elements at most tRNA genes.

Accession codes

Data deposits

Sequencing data have been deposited at the NCBI Sequence Read Archive under accession number SRA067908.

Change history

  • 02 October 2013

    Minor changes were made to the core promoter consensus sequences.

References

  1. Buratowski, S., Hahn, S., Guarente, L. & Sharp, P. A. Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 56, 549–561 (1989)

  2. Lagrange, T., Kapanidis, A. N., Tang, H., Reinberg, D. & Ebright, R. H. New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev. 12, 34–44 (1998)

  3. Deng, W. & Roberts, S. G. A core promoter element downstream of the TATA box that is recognized by TFIIB. Genes Dev. 19, 2418–2423 (2005)

  4. Juven-Gershon, T. & Kadonaga, J. T. Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev. Biol. 339, 225–229 (2010)

  5. Kostrewa, D. et al. RNA polymerase II–TFIIB structure and mechanism of transcription initiation. Nature 462, 323–330 (2009)

  6. He, Y., Fang, J., Taatjes, D. J. & Nogales, E. Structural visualization of key steps in human transcription initiation. Nature 495, 481–486 (2013)

  7. Ptashne, M. & Gann, A. Transcriptional activation by recruitment. Nature 386, 569–577 (1997)

  8. Vannini, A. & Cramer, P. Conservation between the RNA polymerase I, II, and III transcription initiation machineries. Mol. Cell 45, 439–446 (2012)

  9. Kim, T. H. et al. A high-resolution map of active promoters in the human genome. Nature 436, 876–880 (2005)

  10. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006)

  11. Gilmour, D. S. & Lis, J. T. RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol. Cell. Biol. 6, 3984–3989 (1986)

  12. Guenther, M. G., Levine, S. S., Boyer, L. A., Jaenisch, R. & Young, R. A. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77–88 (2007)

  13. Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013)

  14. Kapranov, P., Willingham, A. T. & Gingeras, T. R. Genome-wide transcription and the implications for genomic organization. Nature Rev. Genet. 8, 413–423 (2007)

  15. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011)

  16. Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature 483, 295–301 (2012)

  17. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011)

  18. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007)

  19. He, Y., Vogelstein, B., Velculescu, V. E., Papadopoulos, N. & Kinzler, K. W. The antisense transcriptomes of human cells. Science 322, 1855–1857 (2008)

  20. Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008)

  21. Core, L. J. & Lis, J. T. Transcription regulation through promoter-proximal pausing of RNA polymerase II. Science 319, 1791–1792 (2008)

  22. Fenouil, R. et al. CpG islands and GC content dictate nucleosome depletion in a transcription-independent manner at mammalian promoters. Genome Res. 22, 2399–2408 (2012)

  23. Rozenberg, J. M. et al. All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues. BMC Genomics 9, 67 (2008)

  24. Sainsbury, S., Niesser, J. & Cramer, P. Structure and function of the initially transcribing RNA polymerase II–TFIIB complex. Nature 493, 437–440 (2013)

  25. Basehoar, A. D., Zanton, S. J. & Pugh, B. F. Identification and distinct regulation of yeast TATA box-containing genes. Cell 116, 699–709 (2004)

  26. Singer, V. L., Wobbe, C. R. & Struhl, K. A wide variety of DNA sequences can functionally replace a yeast TATA element for transcriptional activation. Genes Dev. 4, 636–645 (1990)

  27. Smale, S. T. & Baltimore, D. The “initiator” as a transcription control element. Cell 57, 103–113 (1989)

  28. Kim, T. K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010)

  29. Hamada, M., Huang, Y., Lowe, T. M. & Maraia, R. J. Widespread use of TATA elements in the core promoters for RNA polymerases III, II, and I in fission yeast. Mol. Cell. Biol. 21, 6870–6881 (2001)

  30. Geiduschek, E. P. & Tocchini-Valentini, G. P. Transcription by RNA polymerase III. Annu. Rev. Biochem. 57, 873–914 (1988)

  31. White, R. J. & Jackson, S. P. Mechanism of TATA-binding protein recruitment to a TATA-less class III promoter. Cell 71, 1041–1053 (1992)

  32. Carrière, L. et al. Genomic binding of Pol III transcription machinery and relationship with TFIIS transcription factor distribution in mouse embryonic stem cells. Nucleic Acids Res. 40, 270–283 (2012)

  33. Verrijzer, C. P., Chen, J. L., Yokomori, K. & Tjian, R. Binding of TAFs to core elements directs promoter selectivity by RNA polymerase II. Cell 81, 1115–1125 (1995)

  34. Kapranov, P. & St Laurent, G. Dark matter RNA: existence, function, and controversy. Front. Genet. 3, 60 (2012)

  35. Eichler, E. E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nature Rev. Genet. 11, 446–450 (2010)

  36. Rhee, H. S. & Pugh, B. F. ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol. Chapter 21, Unit 21.24. (2012)

  37. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009)

  38. Berger, M. F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010)

  39. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)

  40. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012)

  41. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)

  42. Albert, I., Wachi, S., Jiang, C. & Pugh, B. F. GeneTrack–a genomic data processing and visualization framework. Bioinformatics 24, 1305–1306 (2008)

Acknowledgements

We thank R. Reja, S. Mahony, P. Albert and Y. Li for bioinformatic assistance, and M. Cousar and K.-Y. Chan-Salis for experimental support. This work was supported by National Institutes of Health grant GM059055.

Author information

Contributions

B.J.V. performed the experiments and conducted data analyses. B.J.V. and B.F.P. conceived the experiments and analyses, and co-wrote the manuscript.

Corresponding author

Correspondence to B. Franklin Pugh.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Validation of ChIP-exo data and association with ENCODE annotated regions.

a, Pie chart of all 159,117 TFIIB-bound locations in K562 cells parsed into ENCODE-annotated regions. b, Venn overlap among mRNA genes having TBP or TFIIB locations (<500 bp from the gene's TSS) and genes with measured polyadenylated mRNA levels detected by RNA-seq (ref. 38). Data thresholding may contribute to non-overlapping sets. c, Moving average (100-gene) of mRNA levels versus TFIIB/TBP/Pol II occupancy levels on a median-centred log2 scale.
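The smoothing described for panel c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the choice to order genes by occupancy before averaging, and the handling of window edges (only full windows are reported) are all hypothetical.

```python
import math
from statistics import median


def median_centred_log2(values):
    """Log2-transform positive values, then subtract the median of the logs."""
    logs = [math.log2(v) for v in values]
    centre = median(logs)
    return [x - centre for x in logs]


def moving_average(values, window):
    """Mean of every full run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def smoothed_expression_by_occupancy(occupancy, mrna, window=100):
    """Order genes by occupancy (an assumption), then smooth both tracks."""
    order = sorted(range(len(occupancy)), key=lambda i: occupancy[i])
    occ = median_centred_log2([occupancy[i] for i in order])
    rna = median_centred_log2([mrna[i] for i in order])
    return moving_average(occ, window), moving_average(rna, window)
```

With `window=100` this corresponds to the 100-gene window named in the caption; inputs must be strictly positive because of the log transform.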

Extended Data Figure 2 Distribution of TFIIB/TBP/Pol II in CpG islands that overlap mRNA TSSs.

a, Peak-pair distribution for TFIIB, TBP and Pol II at the 5,095 CpG islands that overlap with the mRNA TSSs from Fig. 1b (78% overlap), with the direction of transcription to the right. Rows are linked and sorted by CpG island length. CpG island borders are indicated by blue and red bars. b, Averaged data from a. c, All 159,117 TFIIB locations were sorted by location, and inter-TFIIB distances were calculated (red trace). Data were then sorted by distance, and the standard deviation of adjacent TFIIB occupancy ratios was calculated on a sliding window of 30 values. Peak-calling parameters preclude detection of two separate TFIIB locations less than approximately 40 bp apart. Locations that were 40–70 bp apart were correlated, whereas those >70 bp apart were less correlated or uncorrelated.
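The calculation described for panel c (distances between neighbouring TFIIB locations, then the spread of their occupancy ratios in a sliding window) can be sketched roughly as below. Several details are assumptions made for illustration only: the `(chrom, position, occupancy)` tuple layout, the use of a log2 ratio, the sample standard deviation, and reporting each window at its middle distance.

```python
import math
from statistics import stdev


def adjacent_pairs(sites):
    """Return (distance, log2 occupancy ratio) for neighbouring sites.

    `sites` is a list of (chrom, position, occupancy) tuples (hypothetical
    layout). Neighbours on different chromosomes are skipped.
    """
    ordered = sorted(sites, key=lambda s: (s[0], s[1]))
    pairs = []
    for (c1, p1, o1), (c2, p2, o2) in zip(ordered, ordered[1:]):
        if c1 == c2:
            pairs.append((p2 - p1, math.log2(o2 / o1)))
    return pairs


def sliding_sd_by_distance(pairs, window=30):
    """Sort pairs by distance, then take the SD of ratios in each window."""
    by_distance = sorted(pairs, key=lambda p: p[0])
    out = []
    for i in range(len(by_distance) - window + 1):
        chunk = by_distance[i:i + window]
        mid_distance = chunk[window // 2][0]
        out.append((mid_distance, stdev([ratio for _, ratio in chunk])))
    return out
```

A small standard deviation at a given distance means neighbouring locations have similar occupancy, which is the sense of "correlated" in the caption.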

Extended Data Figure 3 Properties of CPEs associated with RefSeq genes.

a, Average TFIIB and TBP occupancy parsed by the number of mismatches to the TATA consensus. b, Distribution of each candidate CPE relative to each other.

Extended Data Figure 4 CPEs at non-coding loci bound by TFIIB.

a, Bar graph showing the percentage of all 150,754 putative ‘non-coding’ TFIIB binding locations (>500 bp from an annotated RefSeq TSS) that have the indicated number of CPEs. b, Distribution of ChIP-exo peaks on each strand relative to the indicated CPE, for 150,754 putative non-coding TFIIB locations. Opposite strand traces (red) are inverted. c, Distribution of TBP (purple) and Pol II (black) peak-pair midpoints relative to the TATA motif midpoint derived from the 150,754 TFIIB putative non-coding locations. d, TFIIB occupancy versus percentage of locations that code for proteins. All 159,117 TFIIB locations were sorted by occupancy level, and the percentage of locations linked to an annotated RefSeq feature was plotted as a moving average.
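The split between TSS-linked and putative non-coding locations rests on a 500 bp distance rule, which can be sketched as follows. This is an illustration only: the data layout and function names are hypothetical, strand is ignored, and a location exactly 500 bp from a TSS is counted as TSS-linked, a boundary case the captions do not specify.

```python
from bisect import bisect_left


def nearest_tss_distance(position, sorted_tss):
    """Distance from `position` to the closest entry of a sorted TSS list."""
    i = bisect_left(sorted_tss, position)
    # Only the entries just before and just after the insertion point can be nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - t) for t in candidates)


def split_by_tss_proximity(locations, tss_by_chrom, max_distance=500):
    """Partition (chrom, position) locations by distance to an annotated TSS."""
    sorted_tss = {chrom: sorted(tss) for chrom, tss in tss_by_chrom.items()}
    linked, non_coding = [], []
    for chrom, pos in locations:
        tss = sorted_tss.get(chrom, [])
        if tss and nearest_tss_distance(pos, tss) <= max_distance:
            linked.append((chrom, pos))
        else:
            non_coding.append((chrom, pos))
    return linked, non_coding
```

Using the counts quoted in the captions, 159,117 − 150,754 = 8,363 locations lie within 500 bp of an annotated RefSeq TSS, which is about 5.3% of the total and in line with the "about 5%" stated in the abstract.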

Extended Data Figure 5 Enrichment of different RNA fractions at 159,117 TFIIB locations throughout the human genome.

Frequency distribution of RNA 5′ ends for poly(A)+ RNA (ref. 38) (top) and ENCODE project RNA fractions (ref. 40), as indicated to the far left. Traces in the left panels are separated by sense (blue) and antisense (red, inverted) orientations relative to the corresponding mRNA TSS, which is directed to the right. Because the TSS orientation is not known for the poly(A) ncRNA loci, positive and negative strand tags were plotted relative to the TFIIB midpoint. The percentage of putative TFIIB locations that exist within 2 kb of an RNA tag is indicated in the top right corner of each plot.

Extended Data Figure 6 TFIIB core promoter distances.

Candidate CPEs at varying distances from all 159,117 TFIIB locations, for the indicated spacing variants (not all possible combinations were tested). Digits within the spacing variant schematic reflect the base-pair spacing (N) between elements. CPEs with high P values (less correlated to the PSPM) have thin lines, whereas those with low (strong) P values (<3 × 10⁻⁴) have thick lines.

Extended Data Figure 7 Promoter complexes across cancer cell lines.

a, b, Occupancy levels for TFIIB linked to coding genes (a) and non-coding regions (b) in the indicated cell type were normalized by column. The colour scales represent the range of average-centred, log2-transformed values within each respective column. Detection in all four cell types defines group 1. Groups 2–4 were parsed by k-means clustering. Rows were sorted within groups based on TFIIB occupancy averaged across the four cell types (yellow, black, cyan and grey denote high, medium, low and zero occupancy, respectively). For clarity in b, TFIIB locations that were detected in only one cell line were excluded from clustering. Columns were hierarchically clustered. The MCF7 data set had 20–30% of the coverage of other cell lines (reported in Supplementary Data 3), which probably accounts for an excessive number of zero-occupancy loci (grey).
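The per-column normalization and the definition of group 1 can be sketched as below. This is a minimal illustration under assumptions: the matrix layout (rows are TFIIB locations, columns are cell types), the function names, and the choice to leave zero-occupancy cells as `None` are hypothetical. The k-means and hierarchical clustering steps named in the caption are not reproduced here.

```python
import math


def average_centred_log2_columns(matrix):
    """Log2-transform each column and subtract that column's mean log value.

    Zero-occupancy cells are left as None (the grey cells in the caption)
    and do not contribute to the column mean.
    """
    n_cols = len(matrix[0])
    out = [[None] * n_cols for _ in matrix]
    for j in range(n_cols):
        logs = {i: math.log2(row[j]) for i, row in enumerate(matrix) if row[j] > 0}
        if not logs:
            continue
        mean = sum(logs.values()) / len(logs)
        for i, value in logs.items():
            out[i][j] = value - mean
    return out


def group1_rows(matrix):
    """Indices of rows detected (non-zero) in every column."""
    return [i for i, row in enumerate(matrix) if all(v > 0 for v in row)]
```

Centring each column separately keeps a data set with lower sequencing coverage, such as the MCF7 data set mentioned in the caption, from shifting the colour scale of the other columns.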

Extended Data Table 1 Statistics of Illumina sequencing

Supplementary information

Supplementary Data 1

This file contains Supplementary Data 1a. (XLSX 23538 kb)

Supplementary Data 2

This file contains Supplementary Data 1b. (XLSX 22234 kb)

Supplementary Data 3

This file contains Supplementary Data 2. (XLS 26 kb)

Supplementary Data 4

This file contains Supplementary Data 3. (XLS 29157 kb)

Supplementary Data 5

This file contains Supplementary Data 4. (XLSX 532 kb)

About this article

Cite this article

Venters, B., Pugh, B. Genomic organization of human transcription initiation complexes. Nature 502, 53–58 (2013). https://doi.org/10.1038/nature12535

  • DOI: https://doi.org/10.1038/nature12535
