Statistical benchmarking and class discovery in gene expression data

Ben-Dor, Amir; Friedman, Nir; Yakhini, Zohar

doi:10.1038/87368

Poster Abstracts
Published: April 2001

Nature Genetics volume 27, page 96 (2001)

Recent studies have elucidated putative disease subtypes from gene expression data^1,2,3. In the data analysis phase of this process we seek a partition of the set of sample tissues into, say, two statistically meaningful classes. All current algorithmic approaches to this problem are clustering-driven, using similarity measures that account for all measured genes. Such methods fail to discover classes that are supported only by small subsets of the measured genes. Consider a candidate subtype. Label each sample in the data + if it is in the class or − otherwise. Some genes show dramatic differences in expression level between the + and − samples. Under a null model, in which a label vector with the same number of + and − labels is drawn uniformly at random, we can assign P values to all such differences. For actual biological classes we typically observe an overabundance of differentially expressed genes compared with the null model. Efficient methods for calculating exact score distributions under this null model allow for a new approach to class discovery. For each candidate partition of the sample set we compute the abundance of differentially expressed genes, and we assign statistical significance to the observed abundance using the aforementioned methods. Simulated annealing search heuristics, operating in the space of all possible classes, find the highest-scoring partitions. Thus, grouping is based on subsets of the genes rather than on the entire set. The calculations are accurate and efficient, in contrast to sampling-based methods. We will discuss statistical and algorithmic approaches and use actual gene expression data to demonstrate the discovery process.
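The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not say which per-gene score the authors use, so the example below assumes a rank-sum score, whose exact distribution under uniformly drawn labels can be computed by dynamic programming rather than by sampling. The function names (rank_sum_counts, exact_p_value, abundance, anneal), the threshold alpha, the single-sample flip move and the cooling schedule are all hypothetical choices made for illustration. The sketch also uses the raw count of significant genes as the partition score, which is a simplification of assigning significance to the observed abundance, and it assumes expression values without ties.

```python
import math
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def rank_sum_counts(n, k):
    """counts[s] = number of k-subsets of the ranks 1..n that sum to s."""
    max_sum = sum(range(n - k + 1, n + 1))
    counts = [[0] * (max_sum + 1) for _ in range(k + 1)]
    counts[0][0] = 1
    for rank in range(1, n + 1):
        # Descending subset size, so each rank is used at most once.
        for size in range(min(rank, k), 0, -1):
            for s in range(rank, max_sum + 1):
                counts[size][s] += counts[size - 1][s - rank]
    return tuple(counts[k])


def exact_p_value(values, labels):
    """Exact two-sided P value of the rank sum of the + samples (no ties assumed)."""
    n, k = len(labels), sum(labels)
    order = sorted(range(n), key=lambda i: values[i])
    w = sum(pos + 1 for pos, i in enumerate(order) if labels[i])
    centre = k * (n + 1)  # twice the null mean, kept as an integer
    extreme = sum(c for s, c in enumerate(rank_sum_counts(n, k))
                  if abs(2 * s - centre) >= abs(2 * w - centre))
    return extreme / math.comb(n, k)


def abundance(expr, labels, alpha=0.05):
    """Number of genes (rows of expr) with exact P value <= alpha."""
    return sum(exact_p_value(gene, labels) <= alpha for gene in expr)


def anneal(expr, steps=2000, t0=2.0, cooling=0.995, alpha=0.05, seed=0):
    """Simulated annealing over +/- labelings; returns (best labels, best score)."""
    rng = random.Random(seed)
    n = len(expr[0])
    labels = [i % 2 == 0 for i in range(n)]
    rng.shuffle(labels)
    score = abundance(expr, labels, alpha)
    best, best_score, temp = list(labels), score, t0
    for _ in range(steps):
        i = rng.randrange(n)
        labels[i] = not labels[i]  # propose flipping one sample's label
        valid = 2 <= sum(labels) <= n - 2  # keep both classes non-trivial
        new_score = abundance(expr, labels, alpha) if valid else -1
        if valid and (new_score >= score
                      or rng.random() < math.exp((new_score - score) / temp)):
            score = new_score
            if score > best_score:
                best, best_score = list(labels), score
        else:
            labels[i] = not labels[i]  # reject the move: undo the flip
        temp *= cooling
    return best, best_score
```

Here expr is a list of genes, each a list of expression values with one entry per sample, and labels is a list of booleans marking the + samples. Calling anneal(expr) returns the labeling with the largest number of significant genes found during the search. A fuller treatment would replace the raw count with the significance of that count under the null model.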

References

1. Alizadeh, A. et al. Nature 403, 503–511 (2000).
2. Bittner, M. et al. Nature 406, 536–540 (2000).
3. Golub, T. et al. Science 286, 531–537 (1999).

Author information

Authors and Affiliations

Agilent Laboratories, Haifa, Israel
Amir Ben-Dor & Zohar Yakhini
Hebrew University, Jerusalem, Israel
Nir Friedman

About this article

Cite this article

Ben-Dor, A., Friedman, N. & Yakhini, Z. Statistical benchmarking and class discovery in gene expression data. Nat Genet 27 (Suppl 4), 96 (2001). https://doi.org/10.1038/87368

Issue Date: April 2001
DOI: https://doi.org/10.1038/87368
