Abstract
The computational reconstruction of biological systems, 'systems biology', is necessarily dependent on the existence of well-annotated data sets defining and describing the components of these systems, especially genes and the proteins they encode. Information about these components can be accessed either through structured bioinformatics databases, which store basic chemical and functional information abstracted from (or supplementing) the scientific literature, or through the literature itself, which is richer in content but essentially unstructured.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary table S1 (PDF 69 kb)
About this article
Cite this article
Kersey, P., Apweiler, R. Linking publication, gene and protein data. Nat Cell Biol 8, 1183–1189 (2006). https://doi.org/10.1038/ncb1495
DOI: https://doi.org/10.1038/ncb1495
This article is cited by
- Openness and trust in data-intensive science: the case of biocuration. Medicine, Health Care and Philosophy (2020)
- Mining locus tags in PubMed Central to improve microbial gene annotation. BMC Bioinformatics (2014)
- Bioinformatics and molecular modeling in glycobiology. Cellular and Molecular Life Sciences (2010)
- Towards bioinformatics assisted infectious disease control. BMC Bioinformatics (2009)
- Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data. BMC Bioinformatics (2008)