Bioinformatics in the post-sequence era

Kanehisa, Minoru; Bork, Peer

doi:10.1038/ng1109

Review Article
Published: March 2003

Minoru Kanehisa¹ & Peer Bork²

Nature Genetics volume 33, pages 305–310 (2003)

Abstract

In the past decade, bioinformatics has become an integral part of research and development in the biomedical sciences. Bioinformatics now has an essential role both in deciphering genomic, transcriptomic and proteomic data generated by high-throughput experimental technologies and in organizing information gathered from traditional biology. Sequence-based methods of analyzing individual genes or proteins have been elaborated and expanded, and methods have been developed for analyzing large numbers of genes or proteins simultaneously, such as in the identification of clusters of related genes and networks of interacting proteins. With the complete genome sequences for an increasing number of organisms at hand, bioinformatics is beginning to provide both conceptual bases and practical methods for detecting systemic functional behaviors of the cell and the organism.

**Figure 2: Bioinformatics developments of the past decade.**

**Figure 3: Bioinformatics now and in the future.**

Author information

Authors and Affiliations

Bioinformatics Center, Kyoto University, Uji, Kyoto, 611-0011, Japan
Minoru Kanehisa
European Molecular Biology Laboratory, Heidelberg, 69012, Germany
Peer Bork

Corresponding author

Correspondence to Minoru Kanehisa.

About this article

Cite this article

Kanehisa, M., Bork, P. Bioinformatics in the post-sequence era. Nat Genet 33 (Suppl 3), 305–310 (2003). https://doi.org/10.1038/ng1109

Issue Date: March 2003
DOI: https://doi.org/10.1038/ng1109
