Using the transcriptome to annotate the genome


A remaining challenge for the human genome project is the identification and annotation of expressed genes. The public and private sequencing efforts have identified 15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another 10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions [1,2]. These computational methods have two limitations: they fail to identify a significant fraction of genes and exons, and they cannot provide definitive evidence that a hypothetical gene is actually expressed [3,4]. Because the in silico approaches identified fewer genes than anticipated [5–9], we asked whether high-throughput experimental analyses could provide evidence for the expression of hypothetical genes and reveal previously undiscovered genes. Here we describe the development of such a method that can rapidly identify novel genes and exons: long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach [10].

Figure 1: Schematic of LongSAGE method.
Figure 2: Expression analysis of candidate LS genes.


References

  1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
  3. Wheelan, S.J. & Boguski, M.S. Late-night thoughts on the sequence annotation problem. Genome Res. 8, 168–169 (1998).
  4. Guigo, R., Agarwal, P., Abril, J.F., Burset, M. & Fickett, J.W. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).
  5. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
  6. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
  7. Velculescu, V.E. et al. Analysis of human transcriptomes. Nat. Genet. 23, 387–388 (1999).
  8. Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
  9. de Souza, S.J. et al. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl. Acad. Sci. USA 97, 12690–12693 (2000).
  10. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).
  11. Lal, A. et al. A public database for gene expression in human cancers. Cancer Res. 59, 5403–5407 (1999).
  12. Caron, H. et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292 (2001).
  13. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
  14. Polyak, K., Xia, Y., Zweier, J.L., Kinzler, K.W. & Vogelstein, B. A model for p53-induced apoptosis. Nature 389, 300–304 (1997).
  15. Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3 ff. (1995).
  16. Okubo, K., Yoshii, J., Yokouchi, H., Kameyama, M. & Matsubara, K. An expression profile of active genes in human colonic mucosa. DNA Res. 1, 37–45 (1994).
  17. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
  18. Boyd, A.C., Charles, I.G., Keyte, J.W. & Brammar, W.J. Isolation and computer-aided characterization of MmeI, a type II restriction endonuclease from Methylophilus methylotrophus. Nucleic Acids Res. 14, 5255–5274 (1986).
  19. Tucholski, J., Skowron, P.M. & Podhajska, A.J. MmeI, a class-IIS restriction endonuclease: purification and characterization. Gene 157, 87–92 (1995).


We thank Kathy Romans for assistance with database searches, Jennifer Davis for statistical analyses, and Steve Madden, Kathy Klinger, Xiaohong Cao, and members of our laboratories for helpful discussions. This work was supported by NIH grant CA57345.

Author information

Corresponding authors

Correspondence to Kenneth W. Kinzler or Victor E. Velculescu.

Ethics declarations

Competing interests

K.W.K. received research funding from Genzyme Molecular Oncology (Genzyme). Under a licensing agreement between the Johns Hopkins University and Genzyme, the SAGE technology was licensed to Genzyme for commercial purposes, and B.V., K.W.K., and V.E.V. are entitled to shares of royalties received by the university from the sales of the licensed technology. The SAGE technology is freely available to academia for research purposes. K.W.K. and V.E.V. are consultants to Genzyme, and B.V. has consulted for Genzyme in the past. The university and researchers (B.V., K.W.K., and V.E.V.) own Genzyme stock, which is subject to certain restrictions under university policy. The terms of these arrangements are being managed by the university in accordance with its conflict of interest policies.

About this article

Cite this article

Saha, S., Sparks, A., Rago, C. et al. Using the transcriptome to annotate the genome. Nat Biotechnol 20, 508–512 (2002).
