Use and misuse of the gene ontology annotations

Key Points

  • The Gene Ontology (GO) has a structure that allows powerful comparisons and inferences about gene functions, but its structure is often misunderstood or ignored in practice.

  • Evidence codes, annotations for unknown functions and annotation qualifiers are vital aspects of GO annotations, but they are often overlooked.

  • Functional profiling using GO annotations is often performed incorrectly or inappropriately. Common problems include testing only for enrichment rather than also for depletion, using an incorrect reference set, omitting or misapplying correction for multiple comparisons, propagating annotations indiscriminately through the hierarchy, and ignoring the correlations between GO terms.

  • Any analysis using GO annotations should cite its data sources, including the version of the ontology, the date of the annotation files, the numbers and types of annotations used, and the versions and parameters of the software, so that results are fully reproducible.

  • Pie charts are not appropriate for displaying GO functional categorization because the GO structure and annotation practices allow one gene to fall into several categories, so the categories do not partition the gene set. Functional characterization studies should report the number of genes that are not mapped to any slim term, are mapped directly to the root node, or are unannotated.
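The evidence-code and qualifier point can be illustrated with a short filter. This is a minimal sketch, assuming annotations are read from tab-separated GAF (GO Annotation File) rows, in which column 4 holds qualifiers such as NOT, column 5 the GO ID and column 7 the evidence code; the function names and the three-way split are illustrative choices, not a prescribed GO procedure. A NOT annotation states that the gene product is not associated with the term, and an ND (no biological data available) annotation, usually made to a root term, records that curators looked and found no information, which is different from a gene having no annotation at all.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    gene_id: str
    qualifier: str  # may be empty or pipe-separated, e.g. "NOT|involved_in"
    go_id: str
    evidence: str   # evidence code such as IDA, IMP, IEA or ND


def parse_gaf_line(line: str) -> Optional[Annotation]:
    """Parse one tab-separated GAF row; return None for '!' header lines."""
    if line.startswith("!") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    # 1-based GAF columns: 2 = DB object ID, 4 = qualifier, 5 = GO ID, 7 = evidence
    return Annotation(cols[1], cols[3], cols[4], cols[6])


def classify(
    annotations: Iterable[Annotation],
    exclude_evidence: frozenset = frozenset(),
) -> Tuple[List[Annotation], List[Annotation], List[Annotation]]:
    """Split annotations into positive, negated (NOT) and unknown (ND) groups."""
    positive, negated, unknown = [], [], []
    for ann in annotations:
        if "NOT" in ann.qualifier.split("|"):
            negated.append(ann)      # explicitly NOT associated with the term
        elif ann.evidence == "ND":
            unknown.append(ann)      # function unknown, usually a root-term annotation
        elif ann.evidence not in exclude_evidence:
            positive.append(ann)     # e.g. pass {"IEA"} to drop electronic annotations
    return positive, negated, unknown
```

Whether to exclude computationally inferred annotations (IEA) is an analysis decision that should be reported together with the results.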
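For functional profiling, the reference set should be the genes that could have been selected in the experiment (for example, the genes represented on the array) rather than the whole genome by default, and depletion can be as informative as enrichment. A minimal sketch of a one-sided hypergeometric test for a single GO term in both directions, using only the Python standard library; the function and argument names are hypothetical, and the hypergeometric model is only one of several that profiling tools use.

```python
from math import comb
from typing import Iterable, Tuple


def term_pvalues(
    study: Iterable[str], reference: Iterable[str], annotated: Iterable[str]
) -> Tuple[float, float]:
    """One-sided hypergeometric p-values (over, under) for a single GO term.

    reference: every gene that could have been selected (e.g. genes on the array)
    study:     the selected genes, which must be a subset of the reference
    annotated: genes annotated to the term (after any propagation)
    """
    study, reference = set(study), set(reference)
    if not study <= reference:
        raise ValueError("the study set must be a subset of the reference set")
    annotated_ref = set(annotated) & reference  # ignore genes that were never measured
    N, K, n = len(reference), len(annotated_ref), len(study)
    k = len(study & annotated_ref)

    def pmf(x: int) -> float:
        # probability of drawing exactly x annotated genes in a sample of n
        return comb(K, x) * comb(N - K, n - x) / comb(N, n)

    support = range(max(0, n - (N - K)), min(n, K) + 1)
    p_over = sum(pmf(x) for x in support if x >= k)   # enrichment
    p_under = sum(pmf(x) for x in support if x <= k)  # depletion
    return min(p_over, 1.0), min(p_under, 1.0)
```

Genes annotated to the term but absent from the reference set are discarded before counting, so the same study set tested against a whole-genome reference and against an array-specific reference can give different p-values.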
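Because one test is run per GO term, hundreds or thousands of tests can be performed at once, and raw p-values need correcting. A minimal sketch of the Benjamini-Hochberg false discovery rate adjustment follows; it is one common choice among several (Holm's method, for instance, controls the family-wise error rate instead), and its FDR guarantee assumes independent or positively dependent tests, which nested and correlated GO terms may satisfy only approximately.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The input should contain the p-value of every term actually tested, not only those that looked interesting; otherwise m is too small and the correction too lenient.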
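The reproducibility point amounts to recording a handful of facts alongside every result. A minimal sketch of such a record, serialized as JSON; the class name, field names and any values filled in are hypothetical and not part of a GO standard.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterable


@dataclass
class GOAnalysisRecord:
    """Hypothetical provenance record; field names are illustrative only."""
    ontology_version: str         # version or release date of the ontology file
    annotation_source: str        # which annotation file was used
    annotation_file_date: str     # date of that annotation file
    evidence_counts: Dict[str, int] = field(default_factory=dict)
    software_versions: Dict[str, str] = field(default_factory=dict)
    parameters: Dict[str, object] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def count_evidence(codes: Iterable[str]) -> Dict[str, int]:
    """Number of annotations used, broken down by evidence code."""
    return dict(Counter(codes))
```

Computing the evidence counts from the annotation rows that actually entered the analysis, rather than from the downloaded file, keeps the record faithful to what was used.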

Abstract

The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe biological processes, molecular functions and cellular components of gene products in both a computer- and human-readable manner. Here we describe key aspects of GO that, when overlooked, can lead to erroneous results, and explain how these pitfalls can be avoided.
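The structure of these vocabularies is a directed acyclic graph (DAG). Unlike a simple tree (Figure 1), a DAG lets a term have more than one parent, so an annotation to a specific term implies annotation to every ancestor along all paths to the root, often called the true path rule. A minimal sketch of that propagation over a hypothetical toy graph; it treats every parent link alike, whereas real analyses should check which relationship types (for example is_a or part_of) they follow.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical toy DAG, child -> parents. "C" has two parents, which a
# simple tree could not represent.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"C"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable by following parent links upward (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(
    direct: Dict[str, Iterable[str]], parents: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Extend each gene's direct annotations with every ancestor term."""
    result: Dict[str, Set[str]] = {}
    for gene, terms in direct.items():
        terms = set(terms)
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        result[gene] = closure
    return result
```

Propagated sets like these are what make term counts overlap and terms correlate; whether to test every propagated term is one of the choices the Key Points flag.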

Figure 1: Simple trees versus directed acyclic graphs.
Figure 2: Using gene ontology (GO) to bin the yeast genome into broad biological process categories.
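Figure 2 bins the yeast genome into broad biological process categories. A minimal sketch of such slim binning that also counts the genes the Key Points say should be reported: unannotated genes, genes annotated only to a root node, and annotated genes that map to no slim term. The inputs are a hypothetical toy DAG and slim set, the slim set is assumed to exclude the root terms, and dedicated tools such as Map2slim or the GO Term Mapper may treat edge cases differently.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable by following parent links upward."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def slim_summary(
    genes: Iterable[str],
    direct: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
    slim_terms: Set[str],
    root_terms: Set[str],
) -> Tuple[Dict[str, Set[str]], Set[str], Set[str], Set[str]]:
    """Bin genes into slim terms and report the genes a pie chart would hide."""
    bins: Dict[str, Set[str]] = {s: set() for s in slim_terms}
    unannotated: Set[str] = set()
    root_only: Set[str] = set()
    unmapped: Set[str] = set()
    for gene in genes:
        terms = set(direct.get(gene, ()))
        if not terms:
            unannotated.add(gene)        # no annotation at all
            continue
        if terms <= root_terms:
            root_only.add(gene)          # annotated only to a root node (e.g. ND)
            continue
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        hits = closure & slim_terms
        if not hits:
            unmapped.add(gene)           # annotated, but under no slim term
        for slim in hits:
            bins[slim].add(gene)         # one gene may land in several bins
    return bins, unannotated, root_only, unmapped
```

In a toy graph where one gene's term sits under two slim terms, two mapped genes can fill three category slots, so the categories add up to more than the whole; this is why a pie chart misrepresents slim counts.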

Acknowledgements

We are grateful to the GO Consortium for their efforts in developing, maintaining and making accessible the GO ontology and annotations. We thank S. Carbon and C. Mungall for their help with SQL queries to the GO database and the following individuals for feedback on this manuscript: M. Ashburner, E. Camon, P. D'Eustachio, E. Dimmer, P. Gaudet, R. Huntley, R. Lovering, C. Mungall, S. Twigger, and K. Van Auken.

Author information

Corresponding authors

Correspondence to Seung Yon Rhee or Sorin Draghici.

Related links

FURTHER INFORMATION

Seung Yon Rhee's homepage

Sorin Draghici's homepage

An Introduction to the Gene Ontology

Gene Ontology (GO) project

GO annotation conventions

GO annotation project at the European Bioinformatics Institute (GOA)

GO downloads

GO Slim and Subset Guide

InterPro database

ISI Web of Knowledge

Map2slim

Princeton University's GO Term Mapper

Reference genome annotation project at GO

About this article

Cite this article

Yon Rhee, S., Wood, V., Dolinski, K. et al. Use and misuse of the gene ontology annotations. Nat Rev Genet 9, 509–515 (2008). https://doi.org/10.1038/nrg2363
