Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation

Abstract

We have developed a DNA tag sequencing and mapping strategy called gene identification signature (GIS) analysis, in which 5′ and 3′ signatures of full-length cDNAs are accurately extracted into paired-end ditags (PETs) that are concatenated for efficient sequencing and mapped to genome sequences to demarcate the transcription boundaries of every gene. GIS analysis is potentially 30-fold more efficient than standard cDNA sequencing approaches for transcriptome characterization. We demonstrated this approach with 116,252 PET sequences derived from mouse embryonic stem cells. Initial analysis of this dataset identified hundreds of previously uncharacterized transcripts, including alternative transcripts of known genes. We also uncovered several intergenically spliced and unusual fusion transcripts, one of which was confirmed as a trans-splicing event and was differentially expressed. The concept of paired-end ditagging described here for transcriptome analysis can also be applied to whole-genome analysis of cis-regulatory and other DNA elements and represents an important technological advance for genome annotation.
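The mapping idea in the abstract can be illustrated with a minimal sketch: split each PET into its 5′ and 3′ signatures, locate both in the genome, and take the span between them as the putative transcript boundaries. Everything specific here is an assumption made for illustration and is not taken from the article: the function names `split_pet` and `map_pet`, the 18-base tag length, the `max_span` limit, and the use of naive forward-strand substring search in place of whatever alignment procedure the authors used.

```python
from typing import Optional, Tuple


def split_pet(pet: str, tag_len: int) -> Tuple[str, str]:
    """Split a paired-end ditag into its 5' and 3' signatures.

    tag_len is a hypothetical parameter; the abstract does not state tag sizes.
    """
    if len(pet) < 2 * tag_len:
        raise ValueError("PET is shorter than two tags")
    return pet[:tag_len], pet[-tag_len:]


def map_pet(
    genome: str,
    pet: str,
    tag_len: int = 18,          # assumed tag length, for illustration only
    max_span: int = 1_000_000,  # assumed upper bound on a plausible transcript locus
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the genomic span bounded by the two tags.

    Simplifications: forward strand only, first exact match only.
    Returns None if either tag is missing, the tags are out of order,
    or the span exceeds max_span.
    """
    five, three = split_pet(pet, tag_len)

    start = genome.find(five)
    if start == -1:
        return None

    # The 3' tag must lie downstream of the 5' tag on the same strand.
    three_pos = genome.find(three, start + tag_len)
    if three_pos == -1:
        return None

    end = three_pos + tag_len  # end is exclusive
    if end - start > max_span:
        return None
    return start, end


if __name__ == "__main__":
    five_tag = "ACGT" * 4 + "AC"
    three_tag = "TTGA" * 4 + "CC"
    toy_genome = "N" * 10 + five_tag + "N" * 50 + three_tag + "N" * 10
    print(map_pet(toy_genome, five_tag + three_tag))
```

A real pipeline would also need to handle the reverse strand, tags with multiple genomic matches, and sequencing errors, none of which this sketch attempts.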

Figure 1: Schematic view of the GIS analysis method.
Figure 2: Examples of previously uncharacterized transcripts identified by GIS analysis and verified by PCR and sequencing.
Figure 3: Schematic view of the Ppp2r4-Set fusion transcript identified by GIS analysis in E14 cells and RT-PCR verification of its existence in various tissues.

Acknowledgements

The authors acknowledge Q. Yu of the National University of Singapore and P. Kondu, W.Y. Au Yong, H.C. Yong, M. Hirwan and M. Rebhan of the Bioinformatics Institute of Singapore for information technology and bioinformatics support, as well as P. Li for providing the E14 cells. This work was funded by the Agency for Science, Technology and Research (A*STAR), Singapore.

Author information

Corresponding authors

Correspondence to Edison T Liu or Yijun Ruan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

An example of GIS PET structure and the strategy of mapping PET to genome (PDF 869 kb)

Supplementary Fig. 2

Mapping specificity and quality of PETs to known transcripts (PDF 290 kb)

Supplementary Fig. 3

Quantitative comparison of GIS PETs and EST enumeration (PDF 513 kb)

Supplementary Fig. 4

Flow diagram summarizing the PET mapping and validation process (PDF 474 kb)

Supplementary Fig. 5

Transcripts from different categories identified by PETs and verified by PCR and DNA sequencing (PDF 4512 kb)

Supplementary Fig. 6

The DNA sequence, ORF, and the translated peptide sequence of the Ppp2r4-Set trans-spliced fusion transcript (PDF 1534 kb)

Supplementary Fig. 7

Exon arrangement of two putative fusion transcripts identified by GIS analysis (PDF 391 kb)

Supplementary Table 1

PET mapping statistics (PDF 45 kb)

Supplementary Table 2

Mapping characteristics of the top 10 most abundant PET clusters and their corresponding genes (PDF 58 kb)

Supplementary Table 3

Putative intergenically spliced transcripts identified by GIS ditag analysis (PDF 59 kb)

Supplementary Table 4

Oligonucleotide and adapter sequences used in the GIS analysis method (PDF 78 kb)

Supplementary Data

Further GIS analysis (PDF 134 kb)

Supplementary Protocol

GIS Analysis (PDF 449 kb)

Supplementary Methods (PDF 138 kb)

About this article

Cite this article

Ng, P., Wei, CL., Sung, WK. et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods 2, 105–111 (2005). https://doi.org/10.1038/nmeth733

  • DOI: https://doi.org/10.1038/nmeth733
