Protocol | Published:

5′ end–centered expression profiling using cap-analysis gene expression and next-generation sequencing

Nature Protocols volume 7, pages 542–561 (2012)


Cap-analysis gene expression (CAGE) provides accurate high-throughput measurement of RNA expression. CAGE allows mapping of all the initiation sites of both capped coding and noncoding RNAs. In addition, transcriptional start sites within promoters are characterized at single-nucleotide resolution. The latter allows the regulatory inputs driving gene expression to be studied, which in turn enables the construction of transcriptional networks. Here we provide an optimized protocol for the construction of CAGE libraries on the basis of the preparation of 27-nt-long tags corresponding to initial bases at the 5′ ends of capped RNAs. We have optimized the method using simple filtration-based steps; the whole procedure takes 4 d to complete. The CAGE tags can be readily sequenced with Illumina sequencers, and upon modification they are also amenable to sequencing using other platforms.
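To illustrate what single-nucleotide resolution means in practice, the following is a minimal sketch of how mapped CAGE tags might be collapsed into per-position counts of 5′ ends. It is not the authors' pipeline or their make_ctss script; the function names, the BED-style 0-based half-open coordinates and the example tags are assumptions made purely for illustration.

```python
from collections import Counter


def five_prime_position(start, end, strand):
    """Return the 0-based coordinate of a mapped tag's 5' end.

    Assumes BED-style intervals: 0-based start, exclusive end.
    """
    if strand == "+":
        return start
    if strand == "-":
        return end - 1
    raise ValueError(f"unknown strand: {strand!r}")


def count_ctss(tags):
    """Count tags per (chromosome, 5' position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in tags:
        pos = five_prime_position(start, end, strand)
        counts[(chrom, pos, strand)] += 1
    return counts


# Hypothetical 27-nt tags (coordinates invented for this example).
tags = [
    ("chr7", 100, 127, "+"),
    ("chr7", 100, 127, "+"),
    ("chr7", 101, 128, "+"),
    ("chr7", 200, 227, "-"),
]
ctss = count_ctss(tags)
```

Keying the counts on strand as well as position keeps initiation events on opposite strands separate, which matters for promoters that are transcribed in both directions.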





This work was funded by a Research Grant for the RIKEN Omics Science Center from the Japanese Ministry of Education, Culture, Sports, Science and Technology (to Y.H.). This project was also supported by the US National Human Genome Research Institute grant no. U54 HG004557. We thank S. Kato for experimental support, J. Severin for the genome browser, the RIKEN Genome Network Analysis Service for sequencing and basic bioinformatics analysis, and all our colleagues at the Omics Science Center for valuable feedback during the development of the methodology.

Author information


  1. RIKEN Omics Science Center, RIKEN Yokohama Institute, Yokohama, Japan.

    • Hazuki Takahashi
    • Timo Lassmann
    • Mitsuyoshi Murata
    • Piero Carninci




H.T. performed most experiments. M.M. performed the background reduction experiment. T.L. performed the computational analysis. H.T. and P.C. wrote the manuscript. P.C. designed the project.

Competing interests

P.C. is an inventor on various patents owned by RIKEN and Dnaform on the Cap-trapper technology, full-length cDNA cloning technologies and the CAGE technology.

Corresponding author

Correspondence to Piero Carninci.

Supplementary information

Image files

  1. Supplementary Fig. 1

    Oligo-dT priming enhances the capture of CAGE tags on exons and 3′ UTRs. CAGE libraries were made from THP-1 cells. Data were displayed with the ZENBU genome browser (J. Severin, unpublished data). (a) The actin beta gene is transcribed from right to left (violet arrow) on chromosome 7. (b) The GAPDH gene is transcribed from left to right (green arrow) on chromosome 12. The RT reaction for the CAGE libraries was primed with (1) random and oligo-dT primers (ratio 4:1), (2) oligo-dT primers only and (3) random primers only. Both panels indicate that oligo-dT primers could enhance the capture of transcripts on 3′ exons and on internal exons, compared with random primers alone.

Text files

  1. Supplementary Data 1

    The make_ctss script, which is used to cluster the CTSS (Step 65).
