Use of controlled terminology hierarchies to detect common characteristics of genes within expression clusters

Masys, Daniel; Welsh, John B.; Fink, J. Lynn; Gribskov, Michael; Klacansky, Igor; Corbeil, Jacques

doi:10.1038/87204


Poster Abstracts
Published: April 2001


Nature Genetics volume 27, page 72 (2001)


A growing variety of statistical analysis approaches are available to identify groups of genes that share common expression patterns; however, interpreting the biological characteristics of the genes in such clusters remains primarily a manual task. We have developed a data-mining method that uses indexing terms from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of the Medical Subject Headings (MeSH) used to index citations in the MEDLINE database and of the registry numbers applied to enzymes. The results are generated as dynamic HTML with links to the citations whose keywords appear in the term hierarchies. We have applied this method to the gene clusters in the publication by Golub et al.¹, which describes statistical methods for classifying acute myeloblastic leukemia (AML) and acute lymphoblastic leukemia (ALL) without a priori biological knowledge. In both sets of genes, the most common enzymatic descriptor class is that of complement-activating enzymes. In the ALL-predictive set of genes, these enzyme descriptors include endonucleases, endopeptidases, amidohydrolases and acid anhydride hydrolases. In the AML-predictive set, several plasminogen activators occur as keywords, a finding that may correlate with defibrination syndromes and other hemostatic abnormalities that are associated with AML but not with ALL. Overall, complement activation is a common and potentially clinically significant phenomenon in acute leukemias, and the high frequency of this descriptor in the set of highly expressed genes is consistent with our observations that informative genes were not merely markers of hematopoietic lineage, but encoded proteins important in cancer pathogenesis. These conceptual similarities, revealed by the automated summing and organization of literature keywords associated with these 50 genes, are a new finding that complements the interpretations of the authors of the original paper.
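
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of summing literature keywords along a term hierarchy. It assumes each gene has already been linked to hierarchical term codes written as dot-separated paths, in the style of MeSH tree numbers. The gene names, the codes, and the function names `ancestors`, `rollup` and `ranked` are all hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict


def ancestors(tree_number):
    """Yield a term code and each of its ancestors, most specific first.

    'E1.2.1' yields 'E1.2.1', 'E1.2', 'E1'.
    """
    parts = tree_number.split(".")
    for depth in range(len(parts), 0, -1):
        yield ".".join(parts[:depth])


def rollup(gene_terms):
    """Map every hierarchy node to the set of genes indexed at or below it."""
    genes_by_node = defaultdict(set)
    for gene, tree_numbers in gene_terms.items():
        for tree_number in tree_numbers:
            for node in ancestors(tree_number):
                genes_by_node[node].add(gene)
    return genes_by_node


def ranked(gene_terms):
    """Return (node, gene_count) pairs, most widely shared nodes first."""
    counts = {node: len(genes) for node, genes in rollup(gene_terms).items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical cluster: invented gene names and invented term codes.
cluster = {
    "geneA": ["E1.2.1", "E1.3"],
    "geneB": ["E1.2.2"],
    "geneC": ["E2.1"],
}

for node, count in ranked(cluster):
    print(node, count)
```

Counting distinct genes per node, rather than raw keyword occurrences, keeps a single heavily cited gene from dominating a branch. Whether the original method counts genes, citations, or keyword occurrences is not stated in the abstract.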

References

1. Golub, T. et al. Science 286, 531–537 (1999).

Author information

Authors and Affiliations

University of California San Diego, San Diego, California, USA
Daniel Masys, Igor Klacansky & Jacques Corbeil
Genomics Institute, Novartis Research Foundation, San Diego, California, USA
John B. Welsh
San Diego Supercomputer Center, San Diego, California, USA
J. Lynn Fink & Michael Gribskov



About this article

Cite this article

Masys, D., Welsh, J., Fink, J. et al. Use of controlled terminology hierarchies to detect common characteristics of genes within expression clusters. Nat Genet 27 (Suppl 4), 72 (2001). https://doi.org/10.1038/87204


Issue Date: April 2001
DOI: https://doi.org/10.1038/87204

This article is cited by

An introduction to information retrieval: applications in genomics
- P M Nadkarni
The Pharmacogenomics Journal (2002)
