Abstract
In the past decade, bioinformatics has become an integral part of research and development in the biomedical sciences. Bioinformatics now has an essential role both in deciphering genomic, transcriptomic and proteomic data generated by high-throughput experimental technologies and in organizing information gathered from traditional biology. Sequence-based methods of analyzing individual genes or proteins have been elaborated and expanded, and methods have been developed for analyzing large numbers of genes or proteins simultaneously, such as in the identification of clusters of related genes and networks of interacting proteins. With the complete genome sequences for an increasing number of organisms at hand, bioinformatics is beginning to provide both conceptual bases and practical methods for detecting systemic functional behaviors of the cell and the organism.
Cite this article
Kanehisa, M., Bork, P. Bioinformatics in the post-sequence era. Nat Genet 33 (Suppl 3), 305–310 (2003). https://doi.org/10.1038/ng1109
DOI: https://doi.org/10.1038/ng1109