Genome-wide analysis of mammalian promoter architecture and evolution

A Corrigendum to this article was published on 01 September 2007

This article has been updated


Mammalian promoters can be separated into two classes: conserved TATA box–enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3′ UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.

Figure 1: Definition and characteristics of CAGE tag clusters.
Figure 2: TATA-box and TSS spacing definition and consensus.
Figure 3: Bidirectional overlapping promoters of Gabpa and Atp5j.
Figure 4: Pyrimidine-purine dinucleotides drive expression.
Figure 5: Promoter-based clustering reveals global features of the transcriptome.
Figure 6: Alternative promoters in protein-coding genes.

Change history

  • 06 May 2006

    In the version of this article initially published online, the x-axis of Figure 4b was mislabeled. Specifically, the five groups on the x-axis should be labeled: No mutation; PyPu to PuPu; PyPu to PuPy; PyPu to PyPu; PyPu to PyPy. The error has been corrected for all versions of the article.

  • 29 August 2007

    In the version of this article initially published, two of the smaller bar plots in Figure 1e were mistakenly duplicated. Specifically, the Zfp385 plot is an erroneous copy of the 137774 plot, and the Txndc7 plot is an erroneous copy of the Pik3r5 plot. See below for the corrected version of the figure. This error does not change the conclusions of the study in any way, as the bar plots are just a few visual examples of more than 5,000 tag clusters, and the correct plots follow the same distribution patterns as the erroneous ones. This error has been corrected in the HTML and PDF versions of the article.



We thank the following individuals for discussion, encouragement and technical assistance: H. Atsui, A. Hasegawa, K. Hayashida, H. Himei, F. Hori, C. Kawazu, M. Kojima, K. Waki, M. Aoki, K. Murakami, M. Murata, M. Nishikawa, H. Nishiyori, K. Nomura, M. Ohno, H. Sato, Y. Shigemoto, N. Suzuki, Y. Takeda and K. Yoshida. We especially thank A. Wada, T. Ogawa, M. Muramatsu, A. Kira and all the members of RIKEN Yokohama Research Promotion Division for supporting and encouraging the project. We also thank the Laboratory of Genome Exploration Research Group for secretarial and technical assistance, and Yokohama City University, who provided human samples and computational resources of the RIKEN Super Combined Cluster (RSCC). This work was mainly supported by a Research Grant for the Genome Network Project from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT), the RIKEN Genome Exploration Research Project from the Japanese Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government (to Y.H.), Advanced and Innovational Research Program in Life Science (to Y.H.), National Project on Protein Structural and Functional Analysis from MEXT (to Y.H.), Presidential Research Grant for Intersystem Collaboration of RIKEN (to P.C. and Y.H.) and a grant from the Sixth Framework Program from the European Commission (to P.C.).

Author information

Corresponding authors

Correspondence to David A Hume or Yoshihide Hayashizaki.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Mapping CAGE starting sites to the genome. (PDF 590 kb)

Supplementary Fig. 2

Assessment of exonic promoter activity. (PDF 247 kb)

Supplementary Fig. 3

Conservation of promoters and TSS shapes over evolution. (PDF 881 kb)

Supplementary Fig. 4

Initiation site properties and evolutionary changes. (PDF 1028 kb)

Supplementary Fig. 5

Sequence pattern distributions for different classes of promoters. (PDF 347 kb)

Supplementary Fig. 6

Alternative promoters and transcription start sites in 3′ UTRs. (PDF 1166 kb)

Supplementary Fig. 7

CAGE validation examples. (PDF 983 kb)

Supplementary Fig. 8

Definition of TCs and mRNA assignments of TCs. (PDF 88 kb)

Supplementary Table 1

Detailed description of the data sets. (PDF 120 kb)

Supplementary Table 2

Substitution rate estimates for mouse and human promoters. (PDF 307 kb)

Supplementary Table 3

Functional and tissue specificity overrepresentation for different shape classes. (PDF 193 kb)

Supplementary Table 4

Internet links to publicly available resources and data sets. (PDF 126 kb)

Supplementary Table 5

CAGE reproducibility statistics. (PDF 220 kb)

Supplementary Table 6

Overrepresentation index of TFBS in macrophage promoters. (PDF 31 kb)

Supplementary Table 7

Overrepresentation and underrepresentation index of TFBS in macrophage promoters, detailed view. (PDF 135 kb)

Supplementary Note (PDF 184 kb)

About this article

Cite this article

Carninci, P., Sandelin, A., Lenhard, B. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 38, 626–635 (2006).
