Review

Use and misuse of the gene ontology annotations

Published online:

Abstract

The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe biological processes, molecular functions and cellular components of gene products in both a computer- and human-readable manner. Here we describe key aspects of GO, which, when overlooked, can cause erroneous results, and address how these pitfalls can be avoided.

Acknowledgements

We are grateful to the GO Consortium for their efforts in developing, maintaining and making accessible the GO ontology and annotations. We thank S. Carbon and C. Mungall for their help with SQL queries to the GO database and the following individuals for feedback on this manuscript: M. Ashburner, E. Camon, P. D'Eustachio, E. Dimmer, P. Gaudet, R. Huntley, R. Lovering, C. Mungall, S. Twigger, and K. Van Auken.

Author information

Affiliations

  1. Carnegie Institution for Science, Department of Plant Biology, 260 Panama Street, Stanford, California 94305, USA.

    • Seung Yon Rhee
  2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    • Valerie Wood
  3. Lewis–Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, New Jersey 08544, USA.

    • Kara Dolinski
  4. Wayne State University, Department of Computer Science, 5143 Cass Ave, Room 431 State Hall, Detroit, Michigan 48202, USA.

    • Sorin Draghici

Corresponding authors

Correspondence to Seung Yon Rhee or Sorin Draghici.
