Key Points
- Bio-ontologies provide a means of formalizing biological knowledge (for example, about genes, anatomy and phenotypes) in complex hierarchies that are composed of terms and rules.
- Most bio-ontologies are stored at http://obo.sourceforge.net and are accepted by the community as authoritative.
- All bio-ontologies assign a unique identifier (ID) to each term, and these IDs allow data to be archived, stored and accessed in databases.
- Ontology IDs provide a means of querying between databases (a function known as 'interoperability').
- Complicated knowledge (such as that describing mutant phenotypes) can most easily be handled by composite annotations to multiple ontologies (anatomy, cell biology, pathology, traits, and so on).
- The review concludes by discussing some of the problems that the field is now facing.
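The points above about shared IDs and composite annotations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the term IDs (`ANAT:0001` and so on), the gene and allele names, and the two toy "databases" are invented and do not come from any real ontology or resource. It shows only the general idea that two independently curated datasets can be queried together when both annotate their records with the same ontology IDs, and that a phenotype can be recorded as a combination of terms from several ontologies.

```python
from dataclasses import dataclass

# Hypothetical term IDs and names, invented for illustration only.
TERMS = {
    "ANAT:0001": "wing",
    "ANAT:0002": "eye",
    "TRAIT:0001": "size",
    "QUAL:0001": "reduced",
}

# Toy database 1: gene expression annotated with anatomy term IDs.
EXPRESSION_DB = {
    "geneA": {"ANAT:0001"},
    "geneB": {"ANAT:0002"},
    "geneC": {"ANAT:0001", "ANAT:0002"},
}


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """A composite annotation: one term ID from each of several ontologies."""
    allele: str
    anatomy: str
    trait: str
    quality: str


# Toy database 2: mutant phenotypes stored as composite annotations.
PHENOTYPE_DB = [
    PhenotypeAnnotation("geneA-m1", "ANAT:0001", "TRAIT:0001", "QUAL:0001"),
    PhenotypeAnnotation("geneB-m1", "ANAT:0002", "TRAIT:0001", "QUAL:0001"),
]


def cross_query(term_id):
    """Collect records from both toy databases that share one anatomy term ID."""
    genes = sorted(g for g, ids in EXPRESSION_DB.items() if term_id in ids)
    alleles = sorted(a.allele for a in PHENOTYPE_DB if a.anatomy == term_id)
    return {
        "term": TERMS.get(term_id, "unknown"),
        "expressed_genes": genes,
        "phenotype_alleles": alleles,
    }


result = cross_query("ANAT:0001")
print(result)
```

Because both datasets key on the ID rather than on a free-text name, the query does not depend on how each database chooses to label the structure.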
Abstract
Biological knowledge is inherently complex and so cannot readily be integrated into existing databases of molecular (for example, sequence) data. An ontology is a formal way of representing knowledge in which concepts are described both by their meaning and their relationship to each other. Unique identifiers that are associated with each concept in biological ontologies (bio-ontologies) can be used for linking to and querying molecular databases. This article reviews the principal bio-ontologies and the current issues in their design and development: these include the ability to query across databases and the problems of constructing ontologies that describe complex knowledge, such as phenotypes.
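As a rough illustration of what "concepts described both by their meaning and their relationship to each other" can look like in practice, the following Python sketch stores each term with an ID, a name, a definition and typed links to parent terms, and then walks those links to find every ancestor of a term. The IDs, names, definitions and the `ancestors` helper are all hypothetical, chosen for this example only; real bio-ontologies use their own formats and relationship types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    term_id: str      # unique identifier, usable as a database key
    name: str         # human-readable label
    definition: str   # the "meaning" of the concept
    # Relationship type (e.g. "is_a", "part_of") mapped to parent term IDs.
    parents: Dict[str, List[str]] = field(default_factory=dict)


# Hypothetical toy ontology; IDs and definitions are invented.
ONTOLOGY = {
    t.term_id: t
    for t in [
        Term("EX:0001", "anatomical structure", "Any part of an organism."),
        Term("EX:0002", "limb", "A paired appendage.", {"is_a": ["EX:0001"]}),
        Term("EX:0003", "forelimb", "An anterior limb.", {"is_a": ["EX:0002"]}),
        Term(
            "EX:0004",
            "forelimb digit",
            "A digit of the forelimb.",
            {"is_a": ["EX:0001"], "part_of": ["EX:0003"]},
        ),
    ]
}


def ancestors(term_id: str, ontology: Dict[str, Term]) -> Set[str]:
    """Return the IDs of all terms reachable by following any parent link."""
    seen: Set[str] = set()
    stack = [term_id]
    while stack:
        current = ontology[stack.pop()]
        for parent_ids in current.parents.values():
            for parent_id in parent_ids:
                if parent_id not in seen:
                    seen.add(parent_id)
                    stack.append(parent_id)
    return seen


print(sorted(ancestors("EX:0004", ONTOLOGY)))
```

A query for data annotated to "limb" could use such a traversal in reverse to include records annotated to more specific terms, which is one reason the relationships matter as much as the definitions.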
Acknowledgements
We thank the curators of the various animal, plant and prokaryote databases who participated in the mutant phenotype ontology meetings (see list of URLs in online links box for groups that participated). We are grateful to S. Aitkin for commenting on the material in box 1 and to M. Buzgo for providing the photographs in figure 4 and for helpful comments on the manuscript. S.Y.R. is supported in part by the National Science Foundation (NSF), and J.B.L.B. thanks the Biotechnology and Biological Sciences Research Council (BBSRC) for funding. This is Carnegie publication 1680.
We dedicate this paper to the late Robin Winter who articulated much of our knowledge about human congenital dysmorphologies and who is sorely missed.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
FURTHER INFORMATION
Discussion paper by Michael Ashburner on phenotype and trait ontology
Minutes from phenotype meetings
Database groups that participated in phenotype meetings
The Arabidopsis Information Resource
Berkeley Drosophila Genome Project
International Crop Information System
The Institute for Genome Resources — microbial systems
Glossary
- PHARMACOGENETICS: The study of drug responses related to inherited genetic differences.
- QUANTITATIVE TRAIT LOCUS: Genetic locus or chromosomal region that contributes to the phenotypic variation in continuously varying traits, such as weight.
- SEMANTICS: The meaning of a string in some language; this is distinct from syntax, which describes how symbols can be combined independently of their meaning.
- NATURAL LANGUAGE PROCESSING: Computer understanding, analysis, manipulation and/or generation of natural (human) language.
- VOXEL: The three-dimensional, or volume, equivalent of a pixel (two-dimensional picture unit).
- GRAPH THEORETICAL APPROACH: An approach to extracting meaning from ontologies that depends on using the intrinsic properties of graphs.
About this article
Cite this article
Bard, J., Rhee, S. Ontologies in biology: design, applications and future challenges. Nat Rev Genet 5, 213–222 (2004). https://doi.org/10.1038/nrg1295