An introduction to information retrieval: applications in genomics

Nadkarni, P M

doi:10.1038/sj.tpj.6500084

Review
Published: 30 April 2002

The Pharmacogenomics Journal volume 2, pages 96–102 (2002)

Abstract

Information retrieval (IR) is the field of computer science that deals with the processing of documents containing free text, so that they can be rapidly retrieved based on keywords specified in a user's query. IR technology is the basis of Web-based search engines and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed both by the words they contain and by the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics.
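As a rough illustration of the two kinds of indexing mentioned in the abstract, the following minimal Python sketch builds a word index and a concept index over a few toy documents. It is not taken from the article: the documents, the two-entry thesaurus, the concept ID "C001" and all function names are hypothetical, and the naive phrase matching is only meant to show the idea, not a real thesaurus-mapping method.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: maps surface phrases to a made-up concept ID.
TOY_THESAURUS = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(docs):
    """Return (word_index, concept_index); each maps a key to a set of doc IDs."""
    word_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for token in tokens:
            word_index[token].add(doc_id)
        # Naive concept matching: look for each thesaurus phrase as a
        # whole-word substring of the normalized token sequence.
        normalized = " ".join(tokens)
        for phrase, concept_id in TOY_THESAURUS.items():
            if f" {phrase} " in f" {normalized} ":
                concept_index[concept_id].add(doc_id)
    return word_index, concept_index


def search_words(word_index, query):
    """Return IDs of documents containing every query keyword (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(word_index.get(terms[0], set()))
    for term in terms[1:]:
        result &= word_index.get(term, set())
    return result


# Made-up example documents, keyed by an arbitrary integer ID.
docs = {
    1: "Aspirin after a heart attack reduces mortality.",
    2: "Gene expression changes following myocardial infarction.",
    3: "Gene expression profiling in yeast.",
}
word_index, concept_index = build_indexes(docs)
```

In this sketch, a keyword search for "heart attack" finds only document 1, whereas looking up the hypothetical concept "C001" in concept_index finds documents 1 and 2, because both phrasings map to the same thesaurus entry. The example deliberately ignores stemming, ranking, negation and ambiguity, which are among the reasons real concept matching is harder than this.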

Acknowledgements

The author thanks Cynthia Brandt, MD, and John Fisk, MD, of the Yale Center for Medical Informatics, and the anonymous reviewers for feedback on the article. The author is supported by grants U01 ES10867–02 from the National Institute of Environmental Health Sciences, R01 LM06843–02 from the National Library of Medicine and U01 CA78266–04 from the National Cancer Institute.

Author information

Authors and Affiliations

Center for Medical Informatics, Yale University School of Medicine, New Haven, Connecticut, USA
P M Nadkarni

Corresponding author

Correspondence to P M Nadkarni.

About this article

Cite this article

Nadkarni, P. An introduction to information retrieval: applications in genomics. Pharmacogenomics J 2, 96–102 (2002). https://doi.org/10.1038/sj.tpj.6500084

Received: 24 October 2001
Revised: 24 November 2001
Accepted: 26 November 2001
Published: 30 April 2002
Issue Date: February 2002
DOI: https://doi.org/10.1038/sj.tpj.6500084

This article is cited by

Diagnosis of Rare Diseases: a scoping review of clinical decision support systems
- Jannik Schaaf
- Martin Sedlmayr
- Holger Storf
Orphanet Journal of Rare Diseases (2020)
