  • Letter

More than 1,000 putative new human signalling proteins revealed by EST data mining

Abstract

Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes [1], but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based [2] searches with a domain identification protocol [3,4], that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
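The two-stage idea in the abstract (a permissive similarity search followed by a domain identification step that filters the candidates) can be sketched roughly as below. This is only an illustration and not the authors' pipeline: the `BlastHit` record, the `filter_candidates` function, the `max_evalue` cutoff of 1.0, and the sets `domain_confirmed` and `known_gene_ests` are all hypothetical names and values chosen for the example. In practice the hits would come from a BLAST search of an EST database and the confirmation set from a domain annotation tool such as SMART.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record for one similarity-search hit against an EST."""
    est_id: str
    evalue: float  # expectation value reported by the similarity search


def filter_candidates(hits, domain_confirmed, known_gene_ests, max_evalue=1.0):
    """Return EST ids that pass both stages, in first-seen order.

    Stage 1: keep hits at or below a permissive E-value cutoff
             (max_evalue is an assumed value, not taken from the paper).
    Stage 2: keep only ESTs that a separate domain identification
             step has confirmed (domain_confirmed is assumed input).
    ESTs already assigned to known genes are dropped so that only
    putative new genes remain.
    """
    candidates = []
    seen = set()
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # too weak even for the permissive first pass
        if hit.est_id in seen:
            continue  # the same EST can be hit by several queries
        if hit.est_id in known_gene_ests:
            continue  # not novel
        if hit.est_id not in domain_confirmed:
            continue  # likely false positive of the permissive search
        seen.add(hit.est_id)
        candidates.append(hit.est_id)
    return candidates


# Toy data, invented for illustration only.
hits = [
    BlastHit("est1", 1e-5),
    BlastHit("est2", 5.0),
    BlastHit("est3", 0.5),
    BlastHit("est4", 0.01),
    BlastHit("est1", 1e-3),
]
result = filter_candidates(
    hits,
    domain_confirmed={"est1", "est2", "est4"},
    known_gene_ests={"est4"},
)
print(result)
```

The point of the design is that the first pass can afford to be loose, which helps with sequencing errors and distant homologues, because the second pass removes the false positives that a loose cutoff admits.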


Figure 1: Flow chart summarizing search protocol and results.
Figure 2: In silico cloning of a novel small GTPase.

References

  1. Pandey, A. & Lewitter, F. Nucleotide sequence databases: a gold mine for biologists. Trends Biochem. Sci. 24, 276–280 (1999).

  2. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).


  3. Schultz, J., Milpetz, F., Bork, P. & Ponting, C.P. SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA 95, 5857–5864 (1998).

  4. Schultz, J., Copley, R.R., Doerks, T., Ponting, C.P. & Bork, P. SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234 (2000).

  5. Schuler, G.D. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).

  6. Retief, J.D., Lynch, K.R. & Pearson, W.R. Panning for genes—a visual strategy for identifying novel gene orthologs and paralogs. Genome Res. 9, 373–382 (1999).


  7. Bork, P. & Gibson, T.J. Applying motif and profile searches. Methods Enzymol. 266, 162–184 (1996).

  8. Wadman, M. Human Genome Project aims to finish ‘working draft’ next year. Nature 398, 177 (1999).

  9. Sunyaev, S. et al. Individual variation in protein coding sequences of the human genome. Adv. Protein Chem. (in press).

  10. Prigent, C., Gill, R., Trower, M. & Sanseau, P. In silico cloning of a new protein kinase, Aik2, related to Drosophila Aurora using the new tool: EST Blast. In Silico Biol. 1, 11 (1998).


  11. Birney, E. & Durbin, R. Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison. Ismb 5, 56–64 (1997).

  12. Wolff, A.M., Petersen, J.G.L., Nilsson-Tillgren, T. & Din, N. The open reading frame YAL048c affects the secretion of proteinase A in S. cerevisiae. Yeast 15, 427–434 (1999).

Acknowledgements

J.S., T.D. and P.B. are supported by the DFG and by the EC (grant 01KW9602/6) as well as by the BMBF grants MEDSEQ and TARGID.

Author information

Corresponding author

Correspondence to Peer Bork.

About this article

Cite this article

Schultz, J., Doerks, T., Ponting, C. et al. More than 1,000 putative new human signalling proteins revealed by EST data mining. Nat Genet 25, 201–204 (2000). https://doi.org/10.1038/76069

  • DOI: https://doi.org/10.1038/76069
