Abstract
We have developed a DNA tag sequencing and mapping strategy called gene identification signature (GIS) analysis, in which 5′ and 3′ signatures of full-length cDNAs are accurately extracted into paired-end ditags (PETs) that are concatenated for efficient sequencing and mapped to genome sequences to demarcate the transcription boundaries of every gene. GIS analysis is potentially 30-fold more efficient than standard cDNA sequencing approaches for transcriptome characterization. We demonstrated this approach with 116,252 PET sequences derived from mouse embryonic stem cells. Initial analysis of this dataset identified hundreds of previously uncharacterized transcripts, including alternative transcripts of known genes. We also uncovered several intergenically spliced and unusual fusion transcripts, one of which was confirmed as a trans-splicing event and was differentially expressed. The concept of paired-end ditagging described here for transcriptome analysis can also be applied to whole-genome analysis of cis-regulatory and other DNA elements and represents an important technological advance for genome annotation.
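The mapping step described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each PET is the 5′ signature followed directly by the 3′ signature, that both tags have the same fixed length (`TAG_LEN` is a hypothetical value, not taken from the abstract), that tags match the forward strand of the genome exactly, and that a PET is accepted only when a single tag pairing falls within a hypothetical `max_span`. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

TAG_LEN = 18  # assumed tag length; the abstract does not state one


def split_pet(pet: str, tag_len: int = TAG_LEN) -> Tuple[str, str]:
    """Split a paired-end ditag into its 5' and 3' signature tags."""
    if len(pet) < 2 * tag_len:
        raise ValueError("PET is shorter than two tags")
    return pet[:tag_len], pet[-tag_len:]


def find_all(genome: str, tag: str) -> List[int]:
    """Return every 0-based start position where tag matches genome exactly."""
    hits: List[int] = []
    start = genome.find(tag)
    while start != -1:
        hits.append(start)
        start = genome.find(tag, start + 1)
    return hits


def map_pet(
    genome: str,
    pet: str,
    tag_len: int = TAG_LEN,
    max_span: int = 1_000_000,  # assumed upper bound on transcript locus size
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the region bounded by the two tags.

    The result is a half-open interval on the forward strand. None is
    returned when there is no pairing, or more than one, in which the
    5' tag lies upstream of the 3' tag within max_span.
    """
    five, three = split_pet(pet, tag_len)
    candidates = [
        (s, e + tag_len)
        for s in find_all(genome, five)
        for e in find_all(genome, three)
        if s + tag_len <= e and e + tag_len - s <= max_span
    ]
    return candidates[0] if len(candidates) == 1 else None
```

Requiring both tags to pair on the same strand, in the right order and within a bounded distance is what lets two short tags mark the start and end of a transcript. A realistic mapper would also need to handle the reverse strand, mismatches, and tags that occur at many loci.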
Acknowledgements
The authors acknowledge Q. Yu of the National University of Singapore and P. Kondu, W.Y. Au Yong, H.C. Yong, M. Hirwan and M. Rebhan of the Bioinformatics Institute of Singapore for information technology and bioinformatics support, as well as P. Li for providing the E14 cells. This work was funded by the Agency for Science, Technology and Research (A*STAR), Singapore.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
An example of GIS PET structure and the strategy of mapping PET to genome (PDF 869 kb)
Supplementary Fig. 2
Mapping specificity and quality of PETs to known transcripts (PDF 290 kb)
Supplementary Fig. 3
Quantitative comparison of GIS PETs and EST enumeration (PDF 513 kb)
Supplementary Fig. 4
Flow diagram summarizing the PET mapping and validation process (PDF 474 kb)
Supplementary Fig. 5
Transcripts from different categories identified by PETs and verified by PCR and DNA sequencing (PDF 4512 kb)
Supplementary Fig. 6
The DNA sequence, ORF, and the translated peptide sequence of the Ppp2r4-Set trans-spliced fusion transcript (PDF 1534 kb)
Supplementary Fig. 7
Exon arrangement of two putative fusion transcripts identified by GIS analysis (PDF 391 kb)
Supplementary Table 1
PET mapping statistics (PDF 45 kb)
Supplementary Table 2
Mapping characteristics of the top 10 most abundant PET clusters and their corresponding genes (PDF 58 kb)
Supplementary Table 3
Putative intergenically spliced transcripts identified by GIS ditag analysis (PDF 59 kb)
Supplementary Table 4
Oligonucleotide and adapter sequences used in the GIS analysis method (PDF 78 kb)
Supplementary Data
Further GIS analysis (PDF 134 kb)
Supplementary Protocol
GIS Analysis (PDF 449 kb)
Cite this article
Ng, P., Wei, CL., Sung, WK. et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods 2, 105–111 (2005). https://doi.org/10.1038/nmeth733
DOI: https://doi.org/10.1038/nmeth733