Nature Genetics  36, 663 (2004)

Reply to "Statistical concerns about the GSEA procedure"

Vamsi K Mootha1, Mark J Daly1, Nick Patterson1, Joel N Hirschhorn1, Leif C Groop2 & David Altshuler1

1 Broad Institute, Harvard University and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

2 Department of Endocrinology, University Hospital MAS, Lund University, Malmo, Sweden.

Correspondence should be addressed to Vamsi K Mootha or David Altshuler
In reply

Our manuscript (ref. 1) described Gene Set Enrichment Analysis (GSEA) as "designed to detect subtle but coordinated differences in expression of a priori defined sets of functionally related genes." The method requires two inputs: (i) a list of genes that have been ranked according to expression difference between two states and (ii) a priori defined gene sets (e.g., pathways), each consisting of members drawn from this list. Each gene set then receives an enrichment score (ES) that measures the statistical evidence against the null hypothesis that its members are randomly distributed in the ordered list. By definition, the ES is a function of the size of the gene set, the total number of genes in the entire list and the relative ranks of the members of the gene set.
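To make the three dependencies named above concrete (set size, list length and member ranks), here is a minimal Python sketch of a Kolmogorov-Smirnov-style running-sum score in the spirit of the statistic described in ref. 1. The function name `enrichment_score`, the gene labels and the toy list are illustrative assumptions, not the authors' implementation, and the permutation step used to assess significance is omitted.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Illustrative KS-style running-sum score (a sketch, not the published code).

    ranked_genes: genes ordered from most to least differentially expressed.
    gene_set: a priori defined set whose members are drawn from ranked_genes.
    """
    n = len(ranked_genes)                      # total number of genes in the list
    members = set(gene_set) & set(ranked_genes)
    g = len(members)                           # size of the gene set
    if g == 0 or g == n:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    hit = math.sqrt((n - g) / g)               # step up when a member is encountered
    miss = math.sqrt(g / (n - g))              # step down otherwise
    running, best = 0.0, 0.0
    for gene in ranked_genes:                  # walk down the ranked list
        running += hit if gene in members else -miss
        best = max(best, running)              # ES is the maximum of the running sum
    return best

# Hypothetical toy data: ten genes, already ranked.
ranked = [f"gene{i}" for i in range(1, 11)]
es_top = enrichment_score(ranked, ["gene1", "gene2"])      # members at the top
es_bottom = enrichment_score(ranked, ["gene9", "gene10"])  # members at the bottom
print(es_top, es_bottom)  # prints 4.0 0.0
```

With these step sizes the running sum returns to zero at the end of the list, so a set whose members cluster near the top scores high and a set whose members sit at the bottom scores close to zero.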

Damian and Gorfine's first comment is that the ES can be influenced by the size of a gene set. We completely agree, because in general, statistical significance is a function of two parameters: the estimated magnitude of an effect and the variance in this estimate. Because estimates based on larger numbers of measurements have lower variance than those based on fewer measurements, the ES (a measure of statistical significance) may be greater for a set of 100 genes than for a second set of only 5 genes. This can be true even if some or all of the 100 genes individually rank lower than the genes in the smaller set of 5. We note that scoring by statistical significance is common; for example, it is standard in genetic linkage analysis to rank regions based on the lod score, which is a measure of statistical significance (not effect size).
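A toy calculation illustrates the effect of set size. The sketch below assumes a KS-style running-sum score of the kind described in ref. 1 and a hypothetical list of 1,000 ranked genes. The gene labels, the list length and the chosen ranks are invented for illustration and are not data from either correspondence.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Illustrative KS-style running-sum score (a sketch, not the published code)."""
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    g = len(members)
    hit = math.sqrt((n - g) / g)      # step up for a member of the set
    miss = math.sqrt(g / (n - g))     # step down for a non-member
    running = best = 0.0
    for gene in ranked_genes:
        running += hit if gene in members else -miss
        best = max(best, running)
    return best

# Hypothetical ranked list of 1,000 genes.
ranked = [f"gene{i}" for i in range(1, 1001)]

small_set = ranked[:5]        # 5 genes occupying ranks 1-5
large_set = ranked[100:200]   # 100 genes occupying ranks 101-200

es_small = enrichment_score(ranked, small_set)
es_large = enrichment_score(ranked, large_set)

# Every gene in the large set ranks below every gene in the small set,
# yet the large set accumulates the higher score.
print(es_large > es_small)
```

In this sketch the larger set scores higher because 100 coordinated members carry more evidence against a random distribution than 5 do, even though each of them individually ranks lower.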

In their second example, Damian and Gorfine show that by removing almost half of the lowest-ranking genes in their hypothetical experiment, the ES for gene set S2 falls. The ES falls not simply because of the definition of membership in gene sets (as they claim), but rather because of the selective removal of all genes ranked lower than those in S2. As the members of S2 are now relegated to the bottom of the list, rather than being near the top, this gene set must receive a lower ES. Contrary to Damian and Gorfine's correspondence, the mere presence or absence of gene sets (without changing the underlying list of genes) will not affect the ES of a defined gene set.
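Both points in this paragraph can be checked on a toy list. The sketch below again assumes a KS-style running-sum score of the kind described in ref. 1. The 20-gene list and the sets labelled S1 and S2 are hypothetical stand-ins and do not reproduce the numbers in Damian and Gorfine's example.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Illustrative KS-style running-sum score (a sketch, not the published code)."""
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    g = len(members)
    hit = math.sqrt((n - g) / g)
    miss = math.sqrt(g / (n - g))
    running = best = 0.0
    for gene in ranked_genes:
        running += hit if gene in members else -miss
        best = max(best, running)
    return best

# Hypothetical ranked list of 20 genes and two gene sets.
ranked = [f"gene{i}" for i in range(1, 21)]
s1 = ["gene1", "gene2"]
s2 = ["gene9", "gene10"]

# (a) Remove every gene ranked lower than the members of S2.
es_full = enrichment_score(ranked, s2)
truncated = ranked[:10]                       # S2 now sits at the bottom of the list
es_truncated = enrichment_score(truncated, s2)
print(es_truncated < es_full)

# (b) Score S2 alone, then alongside S1, without changing the gene list.
alone = {"S2": enrichment_score(ranked, s2)}
together = {name: enrichment_score(ranked, gs)
            for name, gs in {"S1": s1, "S2": s2}.items()}
print(alone["S2"] == together["S2"])
```

In (a) the score for S2 falls because the list itself changed, which moves S2 from the upper half to the very bottom. In (b) the score for S2 is unchanged because it depends only on the ranked list and on the membership of S2.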

Damian and Gorfine conclude by stating that GSEA is sensitive to "a priori definition of the hypotheses of interest." We completely agree, as this is the desired behavior of "an analytic technique designed to test a priori defined gene sets" (ref. 1). Given that the explicit goal of GSEA is to combine information about functional relationships with measurements of gene expression, it would be quite surprising if the definition of the gene sets had no influence on the results.

  1. Mootha, V.K. et al. Nat. Genet. 34, 267–273 (2003).
Nature Genetics
ISSN: 1061-4036
EISSN: 1546-1718
© 2004 Nature Publishing Group