  • Review Article

Ontologies in biology: design, applications and future challenges

Key Points

  • Bio-ontologies provide a means of formalizing biological knowledge — for example, about genes, anatomy and phenotypes — in complex hierarchies that are composed of terms and rules.

  • Most bio-ontologies are stored at http://obo.sourceforge.net and are accepted by the community as authoritative.

  • All bio-ontologies assign a unique identifier (ID) to each term; these IDs allow data to be archived, stored and accessed in databases.

  • Ontology IDs provide a means of querying between databases (a function known as 'interoperability').

  • Complicated knowledge (such as that describing mutant phenotypes) can most easily be handled by composite annotations to multiple ontologies (anatomy, cell biology, pathology, traits, and so on).

  • The review concludes by discussing some of the problems that the field is now facing.
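
The key points above (unique term IDs, hierarchies of terms, and querying between databases) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `EX:` identifiers, the three-term anatomy hierarchy, and the two small "databases" are hypothetical and are not taken from any real ontology or resource. The sketch only shows why shared IDs let one query run unchanged against independently curated data sets.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term: a unique ID, a name, and links to parent terms."""
    term_id: str
    name: str
    # Maps parent term ID -> relationship type (for example "is_a" or "part_of").
    parents: dict = field(default_factory=dict)


# Hypothetical toy ontology; the IDs and terms are invented for this sketch.
ONTOLOGY = {
    "EX:0001": Term("EX:0001", "organ"),
    "EX:0002": Term("EX:0002", "heart", {"EX:0001": "is_a"}),
    "EX:0003": Term("EX:0003", "ventricle", {"EX:0002": "part_of"}),
}


def descendants(root_id, ontology):
    """Return root_id plus every term that reaches it through parent links."""
    found = {root_id}
    changed = True
    while changed:
        changed = False
        for term in ontology.values():
            if term.term_id not in found and found & set(term.parents):
                found.add(term.term_id)
                changed = True
    return found


# Two hypothetical databases that annotate their records with the same IDs.
EXPRESSION_DB = {"geneA": {"EX:0003"}, "geneB": {"EX:0001"}}
PHENOTYPE_DB = {"mutant1": {"EX:0002"}, "mutant2": {"EX:0003"}}


def query(db, root_id, ontology):
    """Find records annotated to root_id or to any term beneath it."""
    wanted = descendants(root_id, ontology)
    return sorted(key for key, ids in db.items() if ids & wanted)


# The same ID drives the query in both databases.
print(query(EXPRESSION_DB, "EX:0002", ONTOLOGY))  # ['geneA']
print(query(PHENOTYPE_DB, "EX:0002", ONTOLOGY))   # ['mutant1', 'mutant2']
```

Because the query is expressed in terms of IDs rather than free-text names, a record annotated to "ventricle" is found by a search for "heart" in either database, which is the practical meaning of interoperability in this setting.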

Abstract

Biological knowledge is inherently complex and so cannot readily be integrated into existing databases of molecular (for example, sequence) data. An ontology is a formal way of representing knowledge in which concepts are described both by their meaning and their relationship to each other. Unique identifiers that are associated with each concept in biological ontologies (bio-ontologies) can be used for linking to and querying molecular databases. This article reviews the principal bio-ontologies and the current issues in their design and development: these include the ability to query across databases and the problems of constructing ontologies that describe complex knowledge, such as phenotypes.

Figure 1: The GO process ontology visualized with the DAG-edit program (version 4).
Figure 2: Coding phenotype.
Figure 3: Interoperability between mouse anatomy from digital sections and a gene-expression database.
Figure 4: Diversity of floral morphology in angiosperms.

Acknowledgements

We thank the curators of the various animal, plant and prokaryote databases who participated in the mutant phenotype ontology meetings (see list of URLs in online links box for groups that participated). We are grateful to S. Aitkin for commenting on the material in box 1 and to M. Buzgo for providing the photographs in figure 4 and for helpful comments on the manuscript. S.Y.R. is supported in part by the National Science Foundation (NSF), and J.B.L.B. thanks the Biotechnology and Biological Sciences Research Council (BBSRC) for funding. This is Carnegie publication 1680.

We dedicate this paper to the late Robin Winter who articulated much of our knowledge about human congenital dysmorphologies and who is sorely missed.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Discussion paper by Michael Ashburner on phenotype and trait ontology

Minutes from phenotype meetings

Database groups that participated in phenotype meetings

The Arabidopsis Information Resource

Berkeley Drosophila Genome Project

DictyBase

FlyBase

Gramene

International Crop Information System

The Institute for Genome Resources — microbial systems

The London Dysmorphology Database

MaizeGDB

Mouse Anatomy

Mouse Genome Informatics

Mouse mutagenesis centres

Nugene

OMIM

Rat Genome Database

Saccharomyces Genome Database

Glossary

PHARMACOGENETICS

The study of drug responses related to inherited genetic differences.

QUANTITATIVE TRAIT LOCUS

Genetic locus or chromosomal region that contributes to the phenotypic variation in continuously varying traits, such as weight.

SEMANTICS

The meaning of a string in some language; this is distinct from syntax, which describes how symbols can be combined independently of their meaning.

NATURAL LANGUAGE PROCESSING

Computer understanding, analysis, manipulation and/or generation of natural (human) language.

VOXEL

The three-dimensional, or volume, equivalent of a pixel (two-dimensional picture unit).

GRAPH THEORETICAL APPROACH

An approach to extracting meaning from ontologies that depends on using the intrinsic properties of graphs.
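
As an illustration of the glossary entry above, the following minimal sketch treats an ontology as a directed acyclic graph and derives information purely from its structure: the ancestors of a term, its depth, and the most specific ancestor shared by two terms. The term IDs and the small hierarchy are hypothetical, invented for this example, and the particular measures chosen are only one possible reading of "intrinsic properties of graphs".

```python
# Hypothetical toy DAG: each term ID maps to the set of its parent term IDs.
# A term may have more than one parent, which is what makes this a DAG
# rather than a simple tree.
PARENTS = {
    "EX:root": set(),
    "EX:process": {"EX:root"},
    "EX:signalling": {"EX:process"},
    "EX:development": {"EX:process"},
    "EX:signalling_in_development": {"EX:signalling", "EX:development"},
}


def ancestors(term_id, parents):
    """Return every term reachable from term_id by following parent links."""
    seen = set()
    stack = [term_id]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(term_id, parents):
    """Length of the longest path from term_id up to a root term."""
    direct = parents[term_id]
    if not direct:
        return 0
    return 1 + max(depth(parent, parents) for parent in direct)


def deepest_common_ancestors(a, b, parents):
    """Most specific term(s) shared by a and b, counting each term itself."""
    common = (ancestors(a, parents) | {a}) & (ancestors(b, parents) | {b})
    if not common:
        return set()
    best = max(depth(term, parents) for term in common)
    return {term for term in common if depth(term, parents) == best}


print(depth("EX:signalling_in_development", PARENTS))  # 3
print(deepest_common_ancestors("EX:signalling", "EX:development", PARENTS))
# {'EX:process'}
```

The depth of the deepest shared ancestor is one simple, structure-only indication of how closely two terms are related: the deeper that ancestor sits in the hierarchy, the more specific the knowledge the two terms have in common.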

About this article

Cite this article

Bard, J., Rhee, S. Ontologies in biology: design, applications and future challenges. Nat Rev Genet 5, 213–222 (2004). https://doi.org/10.1038/nrg1295

  • DOI: https://doi.org/10.1038/nrg1295