
  • Article

Efficient targeted transcript discovery via array-based normalization of RACE libraries


Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy when the dynamic range of transcript abundances is large. To improve the sampling efficiency of human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We cloned the products of each RT-PCR independently and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy does indeed lead to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.
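Why segregation helps can be illustrated with a standard occupancy calculation. If clones are drawn at random and with replacement (an idealization), a transcript of relative abundance p is seen at least once among n clones with probability 1 − (1 − p)^n. The sketch below is a toy illustration under that assumption, not the authors' sampling model; the function name `expected_new_found`, the abundances, the number of sub-pools and the equal split of the clone budget are all hypothetical choices made only to show the effect.

```python
from typing import Sequence


def expected_new_found(abundances: Sequence[float],
                       is_new: Sequence[bool],
                       n_clones: int) -> float:
    """Expected number of new transcripts seen in at least one of n_clones
    clones drawn at random, with replacement, from a pool whose relative
    abundances are given (they need not sum to 1)."""
    total = float(sum(abundances))
    return sum(
        1.0 - (1.0 - a / total) ** n_clones
        for a, new in zip(abundances, is_new)
        if new
    )


# Hypothetical toy locus: one dominant known isoform plus 10 rare new isoforms.
known, rare = 990.0, 1.0
pool = [known] + [rare] * 10
is_new = [False] + [True] * 10

# Direct cloning: 20 clones picked from the whole RACE mixture.
direct = expected_new_found(pool, is_new, n_clones=20)

# Idealized segregation: 10 RT-PCR sub-pools, each holding one new isoform
# plus an equal amount of carried-over known product, with 2 clones per
# sub-pool so that the total budget (20 clones) is unchanged.
sub_pools = [([rare, rare], [True, False]) for _ in range(10)]
segregated = sum(expected_new_found(ab, nw, n_clones=2) for ab, nw in sub_pools)

print(f"direct:     {direct:.2f} new isoforms expected")      # direct:     0.20 ...
print(f"segregated: {segregated:.2f} new isoforms expected")  # segregated: 7.50 ...
```

With these made-up numbers, the same 20-clone budget is expected to recover about 0.2 new isoforms by direct cloning and 7.5 after idealized segregation. Real RT-PCR sub-pools are unlikely to separate isoforms this cleanly, so the example shows the direction of the effect rather than its size.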


Figure 1: Strategy for comprehensive characterization of new isoforms of annotated genes.
Figure 2: Examples of new RACEfrags verified by RT-PCR, cloning and sequencing.
Figure 3: Genomic coverage of RACEfrags originating from different tissues or combinations of tissues.
Figure 4: Absolute number and cumulative proportion of projected RACEfrags originating from index exons.
Figure 5: Distribution of distances of RACEfrags to assigned index exons.


Accession codes


Gene Expression Omnibus




Acknowledgements

The project at the Institut Municipal d'Investigació Mèdica (IMIM), the Center for Genomic Regulation (CRG), the Universities of Lausanne and Geneva, and Affymetrix was supported by grants U01HG003150 and U01HG003147 from the US National Human Genome Research Institute, National Institutes of Health. Work at IMIM and CRG was also funded by grant BIO2006-03380 from the Spanish Ministry of Education and Science and by the European BioSapiens Consortium; work at the Universities of Lausanne and Geneva was also funded by the Swiss National Science Foundation, the EU AnEUploidy project and the National Center of Competence in Research Frontiers in Genetics; and work at Affymetrix was also funded by the National Cancer Institute, National Institutes of Health (N01-CO-12400) and by Affymetrix, Inc. The portion of this work carried out at the Center for Cancer Systems Biology was funded by a grant from the Ellison Foundation (to M.V.) and as Institute Sponsored Research from the Dana-Farber Cancer Institute Strategic Initiative. We acknowledge J.M. Oller for reviewing the probabilistic results and R. Castelo, C. Howald and D. Martin for useful suggestions.

Author information


T.R.G., S.E.A., A.R., P.K. and R.G. participated in the overall design of the experiments and the subsequent analysis. A.R., C.U., C.W., P.M. and S.E.A. performed the RACE reactions. J.D., E.D. and P.K. hybridized the RACE reactions onto tiling arrays. R.R.M., C.L., D.S., K.S.-A. and M.V. carried out the RT-PCRs and the cloning and sequencing of candidates. S.D., S.F., J.L., F.D. and R.G. developed software and carried out the bioinformatics analysis. M.C. developed the theoretical model for sampling and carried out the computational simulations. A.F. and J.H. provided the reference gene annotation and helped map the RT-PCR sequences to the genome.

Corresponding author

Correspondence to Roderic Guigó.

Ethics declarations

Competing interests

P.K., J.D., E.D. and T.R.G. are Affymetrix employees.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8, Supplementary Tables 1–2, Supplementary Methods, Supplementary Results (PDF 1195 kb)


About this article

Cite this article

Djebali, S., Kapranov, P., Foissac, S. et al. Efficient targeted transcript discovery via array-based normalization of RACE libraries. Nat Methods 5, 629–635 (2008).

