Saurabh Saha1,2,5, Andrew B. Sparks1,3,5, Carlo Rago1, Viatcheslav Akmaev4, Clarence J. Wang4, Bert Vogelstein1, Kenneth W. Kinzler1 & Victor E. Velculescu1
1 Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore, MD 21231.
2 Program in Cellular and Molecular Medicine, Johns Hopkins Medical Institutions, Baltimore, MD 21231.
3 Current address: GMP Genetics, 200 Prospect Street, Waltham, MA 02451.
4 Genzyme Molecular Oncology, P.O. Box 9322, Framingham, MA 01701.
A remaining challenge for the human genome project involves the
identification and annotation of expressed genes. The public and private
sequencing efforts have identified 15,000 sequences that meet stringent
criteria for genes, such as correspondence with known genes from humans or
other species, and have made another 10,000–20,000 gene predictions
of lower confidence, supported by various types of in silico evidence,
including homology studies, domain searches, and ab initio gene
predictions1,2. These computational methods have limitations,
both because they are unable to identify a significant fraction of genes and
exons and because they are unable to provide definitive evidence about whether
a hypothetical gene is actually expressed3,4. As the in silico approaches
identified a smaller number of genes than anticipated5,6,7,8,9, we wondered
whether high-throughput
experimental analyses could be used to provide evidence for the expression of
hypothetical genes and to reveal previously undiscovered genes. We describe
here the development of such a method, called long serial analysis of gene
expression (LongSAGE), an adaption of the original SAGE approach10, that can be
used to rapidly identify novel genes and
exons.