  • Protocol

Gene function analysis in complex data sets using ErmineJ


ErmineJ is software for the analysis of functionally interesting patterns in large gene lists drawn from gene expression profiling data or other high-throughput genomics studies. Biologists with no bioinformatics background can use it to conduct sophisticated gene set analyses with multiple methods, for example to assess whether microarray data or other gene lists are enriched for a particular pathway or gene class. This protocol describes how to format data files, choose the analysis type, create custom gene sets and perform specific analyses, including overrepresentation analysis, gene score resampling and correlation resampling. ErmineJ differs from other methods in providing a rapid, simple and customizable analysis, with high-level visualization through its graphical user interface, scripting through its command-line interface, support for custom gene sets and a variety of statistical methods. The protocol should take approximately 1 h, including (one-time) installation and setup.
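As a rough illustration of what overrepresentation analysis tests, the sketch below computes an upper-tail hypergeometric probability: the chance of seeing at least the observed number of gene set members among the selected genes if selection were random. The function name `ora_p_value` and its parameters are hypothetical, and the hypergeometric test is used here as a common textbook formulation, not as a statement of how ErmineJ computes its results.

```python
from math import comb


def ora_p_value(n_total, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_total    : number of genes in the background (e.g. on the array)
    n_in_set   : background genes that belong to the gene set
    n_selected : genes in the "hit" list (e.g. passing a score threshold)
    n_overlap  : hit-list genes that also belong to the gene set
    All names are illustrative assumptions, not ErmineJ identifiers.
    """
    max_overlap = min(n_in_set, n_selected)
    # Count the ways to draw k set members and (n_selected - k) non-members,
    # summed over every k at least as large as the observed overlap.
    favourable = sum(
        comb(n_in_set, k) * comb(n_total - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return favourable / comb(n_total, n_selected)


if __name__ == "__main__":
    # Toy numbers: 10 genes, 4 in the set, 3 selected, 2 of those in the set.
    # (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3
    print(ora_p_value(10, 4, 3, 2))
```

In practice a p-value like this would be computed for every gene set and then corrected for multiple testing.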

Figure 1: Enter GO XML and Probe annotation file window.
Figure 2: Select analyses window.
Figure 3: Request for gene score file and raw data window in ErmineJ.
Figure 4: Custom gene sets window in ErmineJ.
Figure 5: Choose GO aspects window in ErmineJ.
Figure 6: Maximum and minimum gene set sizes window in ErmineJ.
Figure 7: Method options window for specific analyses.
Figure 8: Table view for results.
Figure 9: Tree view for results.
Figure 10: Exploring gene details in ErmineJ.




This study was supported by NIH Grant GM076990, a Michael Smith Foundation for Health Research career award and a CIHR New Investigator award. J.G. was supported by a MIND Foundation of BC postdoctoral award.

Author information

P.P., J.G. and M.M. prepared the protocol and the article.

Corresponding author

Correspondence to Paul Pavlidis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary File 1: Gene Ontology term file

This is the Gene Ontology term file (titled “go_daily-termdb.rdf-xml.gz”), which provides the gene sets used in the gene set enrichment analysis. ErmineJ can read it directly; it does not need to be unzipped. (ZIP 2527 kb)

Supplementary File 2: Probe annotation file

This is the probe annotation file for the HGU95 Affymetrix array design (titled “HG_U95.annot.txt”) which was used to generate the results shown in this protocol. (TXT 7610 kb)

Supplementary File 3: Gene score file

This is the gene score file used in the protocol. Scores are the P values from an ANOVA of the raw data file. (TXT 252 kb)

Supplementary File 4: Raw data file

This file contains expression data which can be used for visualization purposes in ErmineJ and was used for the figures in this protocol. (TXT 5948 kb)
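The gene score file above holds P values, which is the kind of input that gene score resampling works from. The sketch below shows the general idea under stated assumptions: scores are transformed with -log10, a gene set's aggregate score is the mean of its members' transformed scores, and significance is estimated by comparing that mean with means of randomly drawn gene sets of the same size. The function name `gsr_p_value`, the choice of the mean, the iteration count and the seed are all hypothetical; this is not ErmineJ's implementation.

```python
import math
import random


def gsr_p_value(scores, gene_set, n_iter=10000, seed=0):
    """Empirical p-value for a gene set by resampling gene scores.

    scores   : dict mapping gene id -> P value in (0, 1]
    gene_set : iterable of gene ids; ids missing from `scores` are ignored
    All names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Transform so that larger values mean stronger evidence.
    transformed = {gene: -math.log10(p) for gene, p in scores.items()}
    members = [gene for gene in gene_set if gene in transformed]
    if not members:
        raise ValueError("gene set has no scored genes")

    observed = sum(transformed[g] for g in members) / len(members)
    pool = list(transformed.values())

    # Count random sets of the same size that score at least as well.
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(pool, len(members))
        if sum(sample) / len(sample) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    toy_scores = {f"g{i}": 0.5 for i in range(20)}
    toy_scores.update({"g0": 1e-6, "g1": 1e-6, "g2": 1e-6})
    print(gsr_p_value(toy_scores, ["g0", "g1", "g2"], n_iter=2000))
```

Unlike overrepresentation analysis, this approach needs no score threshold: every gene's score contributes to the comparison.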

About this article

Cite this article

Gillis, J., Mistry, M. & Pavlidis, P. Gene function analysis in complex data sets using ErmineJ. Nat Protoc 5, 1148–1159 (2010).
