Nature Genetics
27, 332 - 336 (2001)
doi:10.1038/85913
Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans
Jérôme Reboul1,9, Philippe Vaglio1,9, Nia Tzellas1, Nicolas Thierry-Mieg1,2, Troy Moore3, Cindy Jackson3, Tadasu Shin-i4, Yuji Kohara4, Danielle Thierry-Mieg5, Jean Thierry-Mieg5, Hongmei Lee6, Joseph Hitti6, Lynn Doucette-Stamm6, James L. Hartley7, Gary F. Temple7, Michael A. Brasch7, Jean Vandenhaute8, Philippe E. Lamesch1,8, David E. Hill1 & Marc Vidal1
1 Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.
2 Laboratoire LSR-IMAG, St-Martin D'Heres, France.
3 Research Genetics, Huntsville, Alabama, USA.
4 Genome Biology Laboratory, National Institute of Genetics, Mishima, Japan.
5 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
6 Genome Therapeutics Corp., Waltham, Massachusetts, USA.
7 Life Technologies Inc., Rockville, Maryland, USA.
8 Département de Biologie, Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium.
9 These authors contributed equally to this work.
Correspondence should be addressed to Marc Vidal (marc_vidal@dfci.harvard.edu).
The genome sequences of Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been predicted to contain 19,000, 13,600 and 25,500 genes, respectively1,2,3. Before this information can be fully used for evolutionary and functional studies, several issues need to be addressed. First, the gene number estimates, which were obtained in silico and are not yet supported by any experimental data, need to be verified. For example, it seems biologically paradoxical that C. elegans would have 50% more genes than Drosophila. Second, intron/exon predictions need to be tested experimentally. Third, complete sets of open reading frames (ORFs), or "ORFeomes,"4 need to be cloned into various expression vectors. To address these issues simultaneously, we have designed the following strategy and applied it to C. elegans. Predicted ORFs are amplified by PCR from a highly representative cDNA library4 using ORF-specific primers, cloned by Gateway recombination cloning4,5,6 and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing. In a sample (n=1,222) of the nearly 10,000 genes predicted ab initio (that is, for which no expressed sequence tag (EST) is available so far), at least 70% were verified by OSTs. We also observed that 27% of these experimentally confirmed genes have a structure different from that predicted by GeneFinder. We now have experimental evidence that supports the existence of at least 17,300 genes in C. elegans. Hence we suggest that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.
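As a rough illustration of how a verification rate measured on a sample can be extrapolated to the full set of ab initio predictions, the sketch below uses only the figures quoted above (n=1,222, "at least 70%", "nearly 10,000"). The verified count is back-calculated from the 70% figure and the total of 10,000 is a rounding, so both are assumptions. The normal-approximation interval is a generic statistical device, not a method described by the authors. The exact counts behind the 17,300 figure are not given here, so the sketch does not attempt to reproduce that number.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Return (p, low, high): sample proportion with a normal-approximation
    (Wald) interval. z=1.96 corresponds to roughly 95% coverage."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


sample_size = 1222           # sample of ab initio predictions (from the abstract)
verified_fraction = 0.70     # "at least 70%": treated here as a point value (assumption)
ab_initio_total = 10_000     # "nearly 10,000": rounded (assumption)

# Approximate number of sampled genes verified by OSTs (not reported in the abstract).
verified_in_sample = round(sample_size * verified_fraction)

p, low, high = proportion_interval(verified_in_sample, sample_size)

# Extrapolate the sampled rate to all ab initio predictions.
est_low = low * ab_initio_total
est_high = high * ab_initio_total

print(f"verified in sample: {verified_in_sample} of {sample_size}")
print(f"rate: {p:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
print(f"extrapolated verified ab initio genes: {est_low:.0f} to {est_high:.0f}")
```

Because the reported rate is a lower bound ("at least 70%"), the extrapolated range should likewise be read as a conservative estimate under these assumptions.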