Abstract
ErmineJ is software for the analysis of functionally interesting patterns in large gene lists drawn from gene expression profiling data or other high-throughput genomics studies. It can be used by biologists with no bioinformatics background to conduct sophisticated analyses of gene sets with multiple methods. It allows users to assess whether microarray data or other gene lists are enriched for a particular pathway or gene class. This protocol describes how to format data files, choose an analysis type, create custom gene sets and perform specific analyses, including overrepresentation analysis, gene score resampling and correlation resampling. ErmineJ differs from other methods in providing a rapid, simple and customizable analysis. This includes high-level visualization through its graphical user interface, scripting through its command-line interface, support for custom gene sets and a choice of statistical methods. The protocol should take approximately 1 h, including (one-time) installation and setup.
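Overrepresentation analysis asks whether a list of "hit" genes contains more members of a gene set than expected by chance. A common way to score this is the hypergeometric upper tail. The sketch below is a generic illustration of that test, not ErmineJ's own code. The function names are hypothetical, and ErmineJ's handling of details such as multiple probes per gene or gene set size limits may differ.

```python
from math import comb


def hypergeometric_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them in the
    gene set, n genes selected as hits. math.comb returns 0 when the lower
    argument exceeds the upper one, so impossible terms vanish."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def ora_pvalue(hit_list, gene_set, universe):
    """Overrepresentation p-value of gene_set among hit_list.

    Only genes present in the universe (all genes that could have been
    selected) are counted, which keeps the test's assumptions consistent."""
    universe = set(universe)
    hits = set(hit_list) & universe
    members = set(gene_set) & universe
    k = len(hits & members)  # hits that fall in the gene set
    return hypergeometric_upper_tail(k, len(hits), len(members), len(universe))
```

For example, if all three hits out of a 10-gene universe belong to a 3-gene set, `ora_pvalue` gives 1/C(10, 3) = 1/120. When many gene sets are tested at once, the resulting p-values would normally be corrected for multiple testing.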
Acknowledgements
This study was supported by NIH Grant GM076990, a Michael Smith Foundation for Health Research career award and by a CIHR New Investigator award. J.G. was supported by a MIND Foundation of BC postdoctoral award.
Author information
Contributions
P.P., J.G. and M.M. prepared the protocol and the article.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary File 1: Gene Ontology term file
This is the Gene Ontology term file (named "go_daily-termdb.rdf-xml.gz"). It provides the gene sets used in the gene set enrichment analysis. ErmineJ can read it directly, so it does not need to be unzipped. (ZIP 2527 kb)
Supplementary File 2: Probe annotation file
This is the probe annotation file for the Affymetrix HGU95 array design (named "HG_U95.annot.txt"), which was used to generate the results shown in this protocol. (TXT 7610 kb)
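A probe annotation file links each probe to a gene and its Gene Ontology terms, and that mapping is what turns GO terms into gene sets. The sketch below assumes a hypothetical tab-delimited layout: probe ID, gene symbol, gene name, then GO IDs separated by "|", with one header row. Check the actual file before relying on this. Probes are collapsed to gene symbols, so several probes for the same gene count once per set.

```python
import csv
from collections import defaultdict


def gene_sets_from_annotation(lines, skip_header=True):
    """Build {GO ID: set of gene symbols} from annotation rows.

    ASSUMED column layout (hypothetical): probe ID, gene symbol, gene name,
    GO IDs separated by '|'. Rows with fewer columns or no symbol are skipped."""
    sets = defaultdict(set)
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # discard the header row
    for row in reader:
        if len(row) < 4:
            continue
        symbol = row[1].strip()
        if not symbol:
            continue
        for go_id in row[3].split("|"):
            go_id = go_id.strip()
            if go_id:
                sets[go_id].add(symbol)  # sets collapse duplicate probes per gene
    return dict(sets)
```

Passing an open file handle (or any iterable of lines) returns a dictionary that can be fed to an enrichment test, such as the overrepresentation p-value computed per GO term.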
Supplementary File 3: Gene score file
This is the gene score file used in the protocol. The scores are p-values from an ANOVA of the data in the raw data file. (TXT 252 kb)
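Gene score resampling uses every gene's score rather than a thresholded hit list. It compares a gene set's aggregate score with the scores of random gene sets of the same size. The sketch below shows the general idea for p-value scores like those in the gene score file. It assumes the set score is the mean of -log10(p) and estimates an empirical p-value by random sampling. It is an illustration only; ErmineJ offers its own options, for example how scores are summarized, and may compute significance differently. The function name and parameters are hypothetical.

```python
import math
import random


def gene_score_resampling(pvalues, gene_set, n_iter=10000, seed=0):
    """Return (observed mean -log10 p, empirical p-value) for gene_set.

    pvalues: dict mapping gene -> p-value (must be > 0).
    ASSUMPTION: set score = mean of -log10(p); null = random gene sets of
    the same size drawn without replacement from all scored genes."""
    scores = {g: -math.log10(p) for g, p in pvalues.items()}
    members = [g for g in gene_set if g in scores]
    size = len(members)
    if size == 0:
        raise ValueError("gene set contains no scored genes")
    observed = sum(scores[g] for g in members) / size
    all_scores = list(scores.values())
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    at_least = 0
    for _ in range(n_iter):
        sample = rng.sample(all_scores, size)
        if sum(sample) / size >= observed:
            at_least += 1
    # add-one correction keeps the empirical p-value above zero
    return observed, (at_least + 1) / (n_iter + 1)
```

Taking -log10 of the p-values makes more significant genes score higher, so a set with many strongly significant genes gets a large mean that random sets rarely reach.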
Supplementary File 4: Raw data file
This file contains expression data that ErmineJ can use for visualization; it was used to generate the figures in this protocol. (TXT 5948 kb)
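Correlation resampling works directly on expression profiles like those in the raw data file. It asks whether the genes in a set are more coexpressed than random gene sets of the same size. The sketch below assumes the set score is the mean absolute pairwise Pearson correlation, with an empirical p-value from random sets. This is one plausible formulation chosen for illustration, not ErmineJ's documented implementation, and all names are hypothetical. Profiles with zero variance would make the correlation undefined and must be filtered first.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_correlation(profiles, genes):
    """Mean |r| over all unordered pairs of genes."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(abs(pearson(profiles[a], profiles[b])) for a, b in pairs) / len(pairs)


def correlation_resampling(profiles, gene_set, n_iter=1000, seed=0):
    """Return (observed mean |r|, empirical p-value) for gene_set.

    profiles: dict mapping gene -> list of expression values (same samples)."""
    members = [g for g in gene_set if g in profiles]
    if len(members) < 2:
        raise ValueError("need at least two profiled genes in the set")
    observed = mean_abs_correlation(profiles, members)
    all_genes = list(profiles)
    rng = random.Random(seed)
    at_least = sum(
        1
        for _ in range(n_iter)
        if mean_abs_correlation(profiles, rng.sample(all_genes, len(members))) >= observed
    )
    return observed, (at_least + 1) / (n_iter + 1)
```

Using absolute correlations counts anticorrelated genes as coexpressed too. A signed score would be a different, equally valid design choice.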
About this article
Cite this article
Gillis, J., Mistry, M. & Pavlidis, P. Gene function analysis in complex data sets using ErmineJ. Nat Protoc 5, 1148–1159 (2010). https://doi.org/10.1038/nprot.2010.78
DOI: https://doi.org/10.1038/nprot.2010.78