  • Review Article

Bioinformatics in the post-sequence era

Abstract

In the past decade, bioinformatics has become an integral part of research and development in the biomedical sciences. Bioinformatics now has an essential role both in deciphering genomic, transcriptomic and proteomic data generated by high-throughput experimental technologies and in organizing information gathered from traditional biology. Sequence-based methods of analyzing individual genes or proteins have been elaborated and expanded, and methods have been developed for analyzing large numbers of genes or proteins simultaneously, such as in the identification of clusters of related genes and networks of interacting proteins. With the complete genome sequences for an increasing number of organisms at hand, bioinformatics is beginning to provide both conceptual bases and practical methods for detecting systemic functional behaviors of the cell and the organism.

Figure 1
Figure 2: Bioinformatics developments of the past decade.
Figure 3: Bioinformatics now and in the future.

Author information

Corresponding author

Correspondence to Minoru Kanehisa.

About this article

Cite this article

Kanehisa, M., Bork, P. Bioinformatics in the post-sequence era. Nat Genet 33 (Suppl 3), 305–310 (2003). https://doi.org/10.1038/ng1109

  • DOI: https://doi.org/10.1038/ng1109
