Abstract
A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified ∼15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another ∼10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions [1,2]. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed [3,4]. As the in silico approaches identified a smaller number of genes than anticipated [5–9], we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach [10], that can be used to rapidly identify novel genes and exons.
References
1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
3. Wheelan, S.J. & Boguski, M.S. Late-night thoughts on the sequence annotation problem. Genome Res. 8, 168–169 (1998).
4. Guigo, R., Agarwal, P., Abril, J.F., Burset, M. & Fickett, J.W. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).
5. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
6. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
7. Velculescu, V.E. et al. Analysis of human transcriptomes. Nat. Genet. 23, 387–388 (1999).
8. Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
9. de Souza, S.J. et al. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl. Acad. Sci. USA 97, 12690–12693 (2000).
10. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).
11. Lal, A. et al. A public database for gene expression in human cancers. Cancer Res. 59, 5403–5407 (1999).
12. Caron, H. et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292 (2001).
13. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
14. Polyak, K., Xia, Y., Zweier, J.L., Kinzler, K.W. & Vogelstein, B. A model for p53-induced apoptosis. Nature 389, 300–304 (1997).
15. Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3 ff. (1995).
16. Okubo, K., Yoshii, J., Yokouchi, H., Kameyama, M. & Matsubara, K. An expression profile of active genes in human colonic mucosa. DNA Res. 1, 37–45 (1994).
17. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
18. Boyd, A.C., Charles, I.G., Keyte, J.W. & Brammar, W.J. Isolation and computer-aided characterization of MmeI, a type II restriction endonuclease from Methylophilus methylotrophus. Nucleic Acids Res. 14, 5255–5274 (1986).
19. Tucholski, J., Skowron, P.M. & Podhajska, A.J. MmeI, a class-IIS restriction endonuclease: purification and characterization. Gene 157, 87–92 (1995).
Acknowledgements
We thank Kathy Romans for assistance with database searches, Jennifer Davis for statistical analyses, and Steve Madden, Kathy Klinger, Xiaohong Cao, and members of our laboratories for helpful discussions. This work was supported by NIH grant CA57345.
Ethics declarations
Competing interests
K.W.K. received research funding from Genzyme Molecular Oncology (Genzyme). Under a licensing agreement between the Johns Hopkins University and Genzyme, the SAGE technology was licensed to Genzyme for commercial purposes, and B.V., K.W.K., and V.E.V. are entitled to shares of royalties received by the university from the sales of the licensed technology. The SAGE technology is freely available to academia for research purposes. K.W.K. and V.E.V. are consultants to Genzyme, and B.V. has consulted for Genzyme in the past. The university and researchers (B.V., K.W.K., and V.E.V.) own Genzyme stock, which is subject to certain restrictions under university policy. The terms of these arrangements are being managed by the university in accordance with its conflict of interest policies.
Cite this article
Saha, S., Sparks, A., Rago, C. et al. Using the transcriptome to annotate the genome. Nat Biotechnol 20, 508–512 (2002). https://doi.org/10.1038/nbt0502-508
DOI: https://doi.org/10.1038/nbt0502-508