Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation

Ng, Patrick; Wei, Chia-Lin; Sung, Wing-Kin; Chiu, Kuo Ping; Lipovich, Leonard; Ang, Chin Chin; Gupta, Sanjay; Shahab, Atif; Ridwan, Azmi; Wong, Chee Hong; Liu, Edison T; Ruan, Yijun

doi:10.1038/nmeth733

Article
Published: 09 January 2005

Nature Methods volume 2, pages 105–111 (2005)

Abstract

We have developed a DNA tag sequencing and mapping strategy called gene identification signature (GIS) analysis, in which 5′ and 3′ signatures of full-length cDNAs are accurately extracted into paired-end ditags (PETs) that are concatenated for efficient sequencing and mapped to genome sequences to demarcate the transcription boundaries of every gene. GIS analysis is potentially 30-fold more efficient than standard cDNA sequencing approaches for transcriptome characterization. We demonstrated this approach with 116,252 PET sequences derived from mouse embryonic stem cells. Initial analysis of this dataset identified hundreds of previously uncharacterized transcripts, including alternative transcripts of known genes. We also uncovered several intergenically spliced and unusual fusion transcripts, one of which was confirmed as a trans-splicing event and was differentially expressed. The concept of paired-end ditagging described here for transcriptome analysis can also be applied to whole-genome analysis of cis-regulatory and other DNA elements and represents an important technological advance for genome annotation.
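The mapping idea in the abstract, where the 5′ and 3′ signatures of one PET are placed on the genome to mark where a transcript starts and ends, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `find_all` and `map_pet`, the 6-base tag length, exact string matching, forward-strand-only search, the `max_span` cutoff, and the toy genome string.

```python
from typing import List, Optional, Tuple


def find_all(genome: str, tag: str) -> List[int]:
    """Return every 0-based start position where tag occurs exactly in genome."""
    hits = []
    start = genome.find(tag)
    while start != -1:
        hits.append(start)
        start = genome.find(tag, start + 1)
    return hits


def map_pet(genome: str, pet: str, tag5_len: int, max_span: int) -> Optional[Tuple[int, int]]:
    """Map one paired-end ditag to a toy genome (hypothetical helper).

    The PET is split into a 5' tag and a 3' tag. A placement is accepted only
    if the 5' tag lies upstream of the 3' tag and the two are within max_span
    bases. Returns (start, end) as a half-open interval, or None when the PET
    maps nowhere or maps ambiguously.
    """
    tag5, tag3 = pet[:tag5_len], pet[tag5_len:]
    candidates = [
        (s5, s3 + len(tag3))
        for s5 in find_all(genome, tag5)
        for s3 in find_all(genome, tag3)
        if s5 + len(tag5) <= s3 and (s3 + len(tag3) - s5) <= max_span
    ]
    # Keep only unambiguous placements; real pipelines would also search the
    # reverse strand and tolerate mismatches (not modeled in this sketch).
    return candidates[0] if len(candidates) == 1 else None


# Toy data (assumed): flank + 5' tag + transcript body + 3' tag + flank.
GENOME = "TTTT" + "ACGTAC" + "GGGGGGGGGG" + "CATGCA" + "TTTT"
PET = "ACGTAC" + "CATGCA"

print(map_pet(GENOME, PET, tag5_len=6, max_span=100))
```

Requiring both tags to land in the right order and within a bounded distance is what lets a pair of short signatures place a transcript more specifically than either tag could alone; the cutoff value itself is an arbitrary choice in this sketch.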

**Figure 1: Schematic view of the GIS analysis method.**

**Figure 2: Examples of previously uncharacterized transcripts identified by GIS analysis and verified by PCR and sequencing.**

**Figure 3: Schematic view of the *Ppp2r4-Set* fusion transcript identified by GIS analysis in E14 cells and RT-PCR verification of its existence in various tissues.**

Acknowledgements

The authors acknowledge Q. Yu of the National University of Singapore and P. Kondu, W.Y. Au Yong, H.C. Yong, M. Hirwan and M. Rebhan of the Bioinformatics Institute of Singapore for information technology and bioinformatics support, as well as P. Li for providing the E14 cells. This work was funded by the Agency for Science, Technology and Research (A*STAR), Singapore.

Author information

Patrick Ng and Chia-Lin Wei: These authors contributed equally to this work.

Authors and Affiliations

Genome Institute of Singapore, 60 Biopolis Street, Genome #02-01, Singapore, 138672
Patrick Ng, Chia-Lin Wei, Wing-Kin Sung, Kuo Ping Chiu, Leonard Lipovich, Chin Chin Ang, Sanjay Gupta, Edison T Liu & Yijun Ruan
Bioinformatics Institute, 30 Biopolis Street, Matrix #08-01, Singapore, 138671
Atif Shahab, Azmi Ridwan & Chee Hong Wong

Corresponding authors

Correspondence to Edison T Liu or Yijun Ruan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

An example of GIS PET structure and the strategy of mapping PET to genome (PDF 869 kb)

Supplementary Fig. 2

Mapping specificity and quality of PETs to known transcripts (PDF 290 kb)

Supplementary Fig. 3

Quantitative comparison of GIS PETs and EST enumeration (PDF 513 kb)

Supplementary Fig. 4

Flow diagram summarizing the PET mapping and validation process (PDF 474 kb)

Supplementary Fig. 5

Transcripts from different categories identified by PETs and verified by PCR and DNA sequencing (PDF 4512 kb)

Supplementary Fig. 6

The DNA sequence, ORF, and the translated peptide sequence of the Ppp2r4-Set trans-spliced fusion transcript (PDF 1534 kb)

Supplementary Fig. 7

Exon arrangement of two putative fusion transcripts identified by GIS analysis (PDF 391 kb)

Supplementary Table 1

PET mapping statistics. (PDF 45 kb)

Supplementary Table 2

Mapping characteristics of the top 10 most abundant PET clusters and their corresponding genes. (PDF 58 kb)

Supplementary Table 3

Putative intergenically-spliced transcripts identified by GIS ditag analysis. (PDF 59 kb)

Supplementary Table 4

Oligonucleotide and adapter sequences used in the GIS analysis method. (PDF 78 kb)

Supplementary Data

Further GIS analysis (PDF 134 kb)

Supplementary Protocol

GIS Analysis (PDF 449 kb)

Supplementary Methods (PDF 138 kb)

About this article

Cite this article

Ng, P., Wei, CL., Sung, WK. et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods 2, 105–111 (2005). https://doi.org/10.1038/nmeth733

Received: 05 October 2004
Accepted: 15 December 2004
Published: 09 January 2005
Issue Date: February 2005
DOI: https://doi.org/10.1038/nmeth733
