Abstract
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes [1], but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based [2] searches with a domain identification protocol [3,4], that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
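The abstract describes the method only at a high level: a similarity search followed by a domain check that filters candidate homologues. The following is a minimal sketch of such a two-stage filter, not the authors' implementation. The record fields, the cutoff values, and the identity-based novelty test are all illustrative assumptions that do not appear in the paper.

```python
from dataclasses import dataclass


@dataclass
class EstHit:
    """One EST matched by a similarity search (all fields are hypothetical)."""
    est_id: str
    family: str                 # domain family used as the query
    search_evalue: float        # E-value from the BLAST-style search
    domain_evalue: float        # E-value from the domain-model check
    best_known_identity: float  # % identity to the closest known protein


def filter_candidates(hits, search_cutoff=1e-3, domain_cutoff=1e-2,
                      novelty_identity=95.0):
    """Keep hits that pass both searches and do not match a known protein.

    The three thresholds are placeholder values chosen for illustration.
    """
    kept = []
    for hit in hits:
        if hit.search_evalue > search_cutoff:
            continue  # too weak in the initial similarity search
        if hit.domain_evalue > domain_cutoff:
            continue  # domain model does not confirm the family
        if hit.best_known_identity >= novelty_identity:
            continue  # near-identical to a known protein, so not novel
        kept.append(hit)
    return kept


# Toy input: only EST_A passes all three tests.
hits = [
    EstHit("EST_A", "small_GTPase", 1e-8, 1e-5, 62.0),
    EstHit("EST_B", "small_GTPase", 1e-8, 1e-5, 99.0),  # already known
    EstHit("EST_C", "small_GTPase", 0.5, 1e-5, 40.0),   # weak search hit
    EstHit("EST_D", "small_GTPase", 1e-8, 0.9, 40.0),   # domain not confirmed
]
novel = filter_candidates(hits)
```

Requiring agreement between two independent detection steps is one way to tolerate the sequencing errors and low similarity mentioned above, since a spurious hit in one search is unlikely to be confirmed by the other.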
References
1. Pandey, A. & Lewitter, F. Nucleotide sequence databases: a gold mine for biologists. Trends Biochem. Sci. 24, 276–280 (1999).
2. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
3. Schultz, J., Milpetz, F., Bork, P. & Ponting, C.P. SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA 95, 5857–5864 (1998).
4. Schultz, J., Copley, R.R., Doerks, T., Ponting, C.P. & Bork, P. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234 (2000).
5. Schuler, G.D. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).
6. Retief, J.D., Lynch, K.R. & Pearson, W.R. Panning for genes—a visual strategy for identifying novel gene orthologs and paralogs. Genome Res. 9, 373–382 (1999).
7. Bork, P. & Gibson, T.J. Applying motif and profile searches. Methods Enzymol. 266, 162–184 (1996).
8. Wadman, M. Human Genome Project aims to finish ‘working draft’ next year. Nature 398, 177 (1999).
9. Sunyaev, S. et al. Individual variation in protein coding sequences of the human genome. Adv. Protein Chem. (in press).
10. Prigent, C., Gill, R., Trower, M. & Sanseau, P. In silico cloning of a new protein kinase, Aik2, related to Drosophila Aurora using the new tool: EST Blast. In Silico Biol. 1, 11 (1998).
11. Birney, E. & Durbin, R. Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison. Ismb 5, 56–64 (1997).
12. Wolff, A.M., Petersen, J.G.L., Nilsson-Tillgren, T. & Din, N. The open reading frame YAL048c affects the secretion of proteinase A in S. cerevisiae. Yeast 15, 427–434 (1999).
Acknowledgements
J.S., T.D. and P.B. are supported by the DFG and by the EC (grant 01KW9602/6) as well as by the BMBF grants MEDSEQ and TARGID.
Cite this article
Schultz, J., Doerks, T., Ponting, C. et al. More than 1,000 putative new human signalling proteins revealed by EST data mining. Nat Genet 25, 201–204 (2000). https://doi.org/10.1038/76069