Genome-wide analysis of mammalian promoter architecture and evolution

  • A Corrigendum to this article was published on 01 September 2007

Abstract

Mammalian promoters can be separated into two classes: conserved, TATA box–enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are common in protein-coding genes and frequently generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3′ UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
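The abstract contrasts promoters that initiate at a well-defined site with broad promoters whose start sites are spread over a region. As a rough illustration only (this is not the classification procedure used in the article), a tag cluster could be scored by the fraction of its tags that fall within a few base pairs of its most frequently used TSS. The function names, the window size and the cutoff below are hypothetical values chosen for the example.

```python
from collections import Counter


def dominant_peak_fraction(tss_positions, window=2):
    """Return (peak, fraction): the most-used TSS coordinate and the share
    of all tags lying within +/- `window` bp of it."""
    counts = Counter(tss_positions)
    if not counts:
        raise ValueError("tag cluster is empty")
    # Most frequently observed start coordinate; ties resolve to the first seen.
    peak, _ = counts.most_common(1)[0]
    near_peak = sum(n for pos, n in counts.items() if abs(pos - peak) <= window)
    return peak, near_peak / sum(counts.values())


def classify_shape(tss_positions, window=2, sharp_cutoff=0.5):
    """Label a cluster 'sharp' if enough tags sit at the dominant TSS,
    otherwise 'broad'. Both parameters are illustrative assumptions."""
    _, fraction = dominant_peak_fraction(tss_positions, window)
    return "sharp" if fraction >= sharp_cutoff else "broad"


if __name__ == "__main__":
    sharp_cluster = [100] * 8 + [101, 150]  # tags piled on one position
    broad_cluster = list(range(100, 160, 3))  # tags spread over ~60 bp
    print(classify_shape(sharp_cluster))  # sharp
    print(classify_shape(broad_cluster))  # broad
```

A single dominant-peak fraction keeps the sketch short; a real analysis would likely need more than one shape statistic and cutoffs calibrated on data.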


Figure 1: Definition and characteristics of CAGE tag clusters.
Figure 2: TATA-box and TSS spacing definition and consensus.
Figure 3: Bidirectional overlapping promoters of Gabpa and Atp5j.
Figure 4: Pyrimidine-purine dinucleotides drive expression.
Figure 5: Promoter-based clustering reveals global features of the transcriptome.
Figure 6: Alternative promoters in protein-coding genes.

Change history

  • 06 May 2006

    In the version of this article initially published online, the x-axis of Figure 4b was mislabeled. Specifically, the five groups on the x-axis should be labeled "No mutation", "PyPu to PuPu", "PyPu to PuPy", "PyPu to PyPu" and "PyPu to PyPy". The error has been corrected in all versions of the article.

  • 29 August 2007

    In the version of this article initially published, two of the smaller bar plots in Figure 1e were mistakenly duplicated. Specifically, the Zfp385 plot is an erroneous copy of the 137774 plot, and the Txndc7 plot is an erroneous copy of the Pik3r5 plot. See below for the corrected version of the figure. This error does not change the conclusions of the study in any way, as the bar plots are just a few visual examples of more than 5,000 tag clusters, and the correct plots follow the same distribution patterns as the erroneous ones. This error has been corrected in the HTML and PDF versions of the article.
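The five x-axis groups listed in the 06 May 2006 correction compare an ancestral pyrimidine-purine (PyPu) dinucleotide with its counterpart in another sequence. A minimal sketch of how such a category could be assigned follows; it assumes both dinucleotides are read on the same (transcribed) strand and that the ancestral state is PyPu, and the function names are hypothetical rather than taken from the article.

```python
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def base_class(base):
    """Map a single nucleotide to 'Py' (C/T) or 'Pu' (A/G)."""
    b = base.upper()
    if b in PYRIMIDINES:
        return "Py"
    if b in PURINES:
        return "Pu"
    raise ValueError(f"unexpected nucleotide: {base!r}")


def dinucleotide_class(dinuc):
    """Classify a two-letter sequence such as 'CA' as 'PyPu', 'PuPy', etc."""
    if len(dinuc) != 2:
        raise ValueError(f"expected a dinucleotide, got {dinuc!r}")
    return base_class(dinuc[0]) + base_class(dinuc[1])


def mutation_category(ancestral, derived):
    """Assign one of the five Figure 4b groups to an ancestral PyPu site."""
    if dinucleotide_class(ancestral) != "PyPu":
        raise ValueError("this sketch only handles ancestral PyPu dinucleotides")
    if ancestral.upper() == derived.upper():
        return "No mutation"
    return f"PyPu to {dinucleotide_class(derived)}"


if __name__ == "__main__":
    for derived in ["CA", "GA", "GT", "CG", "CT"]:
        print(derived, mutation_category("CA", derived))
```

Note that "PyPu to PyPu" covers substitutions that change the bases but keep the pyrimidine-purine pattern, for example CA to CG.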


Acknowledgements

We thank the following individuals for discussion, encouragement and technical assistance: H. Atsui, A. Hasegawa, K. Hayashida, H. Himei, F. Hori, C. Kawazu, M. Kojima, K. Waki, M. Aoki, K. Murakami, M. Murata, M. Nishikawa, H. Nishiyori, K. Nomura, M. Ohno, H. Sato, Y. Shigemoto, N. Suzuki, Y. Takeda and K. Yoshida. We especially thank A. Wada, T. Ogawa, M. Muramatsu, A. Kira and all the members of RIKEN Yokohama Research Promotion Division for supporting and encouraging the project. We also thank the Laboratory of Genome Exploration Research Group for secretarial and technical assistance, and Yokohama City University, which provided human samples and computational resources of the RIKEN Super Combined Cluster (RSCC). This work was mainly supported by Research Grant for the Genome Network Project from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT), the RIKEN Genome Exploration Research Project from MEXT of the Japanese Government (to Y.H.), Advanced and Innovational Research Program in Life Science (to Y.H.), National Project on Protein Structural and Functional Analysis from MEXT (to Y.H.), Presidential Research Grant for Intersystem Collaboration of RIKEN (to P.C. and Y.H.) and a grant from the Sixth Framework Programme of the European Commission (to P.C.).

Author information

Correspondence to David A Hume or Yoshihide Hayashizaki.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Mapping CAGE starting sites to the genome. (PDF 590 kb)

Supplementary Fig. 2

Assessment of exonic promoter activity. (PDF 247 kb)

Supplementary Fig. 3

Conservation of promoters and TSS shapes over evolution. (PDF 881 kb)

Supplementary Fig. 4

Initiation site properties and evolutionary changes. (PDF 1028 kb)

Supplementary Fig. 5

Sequence pattern distributions for different classes of promoters. (PDF 347 kb)

Supplementary Fig. 6

Alternative promoters and transcription start sites in 3′ UTRs. (PDF 1166 kb)

Supplementary Fig. 7

CAGE validation examples. (PDF 983 kb)

Supplementary Fig. 8

Definition of TCs and mRNA assignments of TCs. (PDF 88 kb)

Supplementary Table 1

Detailed description of the data sets. (PDF 120 kb)

Supplementary Table 2

Substitution rate estimates for mouse and human promoters. (PDF 307 kb)

Supplementary Table 3

Functional and tissue specificity overrepresentation for different shape classes. (PDF 193 kb)

Supplementary Table 4

Internet links to publicly available resources and data sets. (PDF 126 kb)

Supplementary Table 5

CAGE reproducibility statistics. (PDF 220 kb)

Supplementary Table 6

Overrepresentation index of TFBS in macrophage promoters. (PDF 31 kb)

Supplementary Table 7

Overrepresentation and underrepresentation index of TFBS in macrophage promoters, detailed view. (PDF 135 kb)

Supplementary Note (PDF 184 kb)
