Key Points
- Genome-wide methods have identified fivefold to tenfold more transcription start sites (TSSs) than were previously known to exist. Many of these occur at unexpected locations, such as assumed gene deserts, exons and 3′ UTRs of known genes.
- Most promoters are not represented by the accepted model of a single TSS with an upstream TATA box; a cluster of TSSs in a narrow region of genomic DNA is the most common pattern. Core promoters can be classified according to the distribution and relative usage of their TSSs.
- The TSS distribution of core promoters is tightly coupled to the occurrence of both known cis-regulatory elements and gene function, and is generally conserved between humans and mice.
- Few promoters use an extended initiator sequence to define the TSS. The most consistent pattern is a pyrimidine–purine dinucleotide that overlaps the TSS.
- Most genes have at least two distinct promoters, which may be differentially regulated and generate mRNAs that encode different protein isoforms.
- The wealth of TSS data enables new types of analysis, including the study of promoter evolution and functional analysis of promoters on a genome-wide scale.
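
The pyrimidine–purine pattern mentioned in the key points can be illustrated with a small check. This is only a sketch: it assumes the dinucleotide occupies positions −1 and +1, with +1 being the TSS itself, and that the sequence is given on the transcribed strand in the 5′ to 3′ direction. The function name `has_pypu_at_tss` and its 0-based `tss_index` parameter are hypothetical and are not taken from the article.

```python
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def has_pypu_at_tss(sequence: str, tss_index: int) -> bool:
    """Return True if a pyrimidine-purine dinucleotide overlaps the TSS.

    Assumption: `tss_index` is the 0-based index of the +1 nucleotide in
    `sequence`, so position -1 is `sequence[tss_index - 1]`.
    """
    if tss_index < 1 or tss_index >= len(sequence):
        raise ValueError("tss_index must leave room for the -1 position")
    seq = sequence.upper()
    # Position -1 must be C or T; position +1 (the TSS) must be A or G.
    return seq[tss_index - 1] in PYRIMIDINES and seq[tss_index] in PURINES


if __name__ == "__main__":
    print(has_pypu_at_tss("GGCATT", 3))  # C at -1, A at +1
    print(has_pypu_at_tss("GGAATT", 3))  # A at -1, A at +1
```

Applied to every mapped TSS in a data set, a check of this kind would give the fraction of start sites that carry the dinucleotide, which is one way the consistency of the pattern could be quantified.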
Abstract
The identification and characterization of mammalian core promoters and transcription start sites is a prerequisite to understanding how RNA polymerase II transcription is controlled. New experimental technologies have enabled genome-wide discovery and characterization of core promoters, revealing that most mammalian genes do not conform to the simple model in which a TATA box directs transcription from a single defined nucleotide position. In fact, most genes have multiple promoters, within which there are multiple start sites, and alternative promoter usage generates diversity and complexity in the mammalian transcriptome and proteome. Promoters can be described by their start site usage distribution, which is coupled to the occurrence of cis-regulatory elements, gene function and evolutionary constraints. A comprehensive survey of mammalian promoters is a major step towards describing and understanding transcriptional control networks.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
FURTHER INFORMATION
Database of Transcriptional Start Sites (DBTSS)
Glossary
- Transcription start site: A nucleotide in the genome that is the first to be transcribed into a particular RNA.
- Core promoter: The genomic region that surrounds a TSS or cluster of TSSs. There is no absolute definition for the length of a core promoter; it is generally defined empirically as the segment of DNA that is required to recruit the transcription initiation complex and initiate transcription, given the appropriate external signals (such as enhancers).
- Orthologues: Genes that originate from the same ancestral gene and have diverged through a speciation event.
- Mediator complex: A multi-subunit complex that can respond to many different activators (such as DNA-bound transcription factors) and links such signals to the core promoter and the transcription machinery.
- Tag library: A tag library is similar to a conventional cDNA library, except that, after isolation and cloning of the cDNA, small fragments are generated by restriction-enzyme cleavage, concatemerized and recloned. This approach enables efficient DNA sequencing of thousands of tags from a single library.
- Tag cluster: This Review defines tag clusters as genomic regions in which two or more tags (each 20 nucleotides in length) overlap each other, with both tags mapped to the same strand.
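
The tag cluster definition above can be sketched as a simple interval merge. This is a minimal illustration, not the pipeline used in the Review. It assumes that each tag is described by its chromosome, strand and leftmost 0-based coordinate, that overlap means sharing at least one base, and that overlaps are chained transitively (A overlaps B and B overlaps C puts all three in one cluster). The names `Tag`, `Cluster` and `tag_clusters` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple

TAG_LENGTH = 20  # tag length given in the glossary definition


class Tag(NamedTuple):
    chrom: str
    strand: str  # "+" or "-"
    start: int   # assumed: 0-based leftmost genomic coordinate


class Cluster(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int     # exclusive end coordinate
    n_tags: int


def tag_clusters(tags: Iterable[Tag], tag_length: int = TAG_LENGTH) -> List[Cluster]:
    """Merge overlapping same-strand tags; keep regions with two or more tags."""
    starts_by_strand = defaultdict(list)
    for tag in tags:
        starts_by_strand[(tag.chrom, tag.strand)].append(tag.start)

    clusters: List[Cluster] = []
    for (chrom, strand), starts in sorted(starts_by_strand.items()):
        starts.sort()
        region_start, region_end, count = starts[0], starts[0] + tag_length, 1
        for s in starts[1:]:
            if s < region_end:
                # The tag shares at least one base with the running region.
                region_end = max(region_end, s + tag_length)
                count += 1
            else:
                if count >= 2:
                    clusters.append(Cluster(chrom, strand, region_start, region_end, count))
                region_start, region_end, count = s, s + tag_length, 1
        if count >= 2:
            clusters.append(Cluster(chrom, strand, region_start, region_end, count))
    return clusters


if __name__ == "__main__":
    example = [
        Tag("chr1", "+", 100), Tag("chr1", "+", 110), Tag("chr1", "+", 125),
        Tag("chr1", "+", 200),                      # isolated tag, no cluster
        Tag("chr1", "-", 105), Tag("chr1", "-", 300),  # opposite strand, no overlap
    ]
    for cluster in tag_clusters(example):
        print(cluster)
```

Grouping by strand before merging reflects the requirement that overlapping tags map to the same strand; tags that overlap in coordinates but lie on opposite strands are never merged.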
Cite this article
Sandelin, A., Carninci, P., Lenhard, B. et al. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 8, 424–436 (2007). https://doi.org/10.1038/nrg2026
DOI: https://doi.org/10.1038/nrg2026