  • Review

An introduction to information retrieval: applications in genomics

Abstract

Information retrieval (IR) is the field of computer science that deals with processing documents containing free text so that they can be rapidly retrieved based on keywords specified in a user's query. IR technology is the basis of Web-based search engines and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed both by the words they contain and by the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics.
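The word-based indexing described in the abstract can be illustrated with a minimal inverted index, sketched here in Python. This is an illustrative sketch only, not the method of any system discussed in the article: the function names (`tokenize`, `build_index`, `search`) and the three-document corpus are hypothetical, and the query is treated as a simple Boolean AND of its keywords, with no stemming or concept matching.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query keyword (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical toy corpus, invented for illustration only.
docs = {
    1: "Gene expression profiling with microarrays",
    2: "Literature search for gene function",
    3: "Microarray data linked to the literature",
}
index = build_index(docs)
print(sorted(search(index, "gene literature")))  # prints [2]
```

Because the index is built once, a query only needs to look up and intersect a few sets rather than scan every document. In this sketch "microarray" and "microarrays" are indexed as different words, so a query for one does not retrieve documents containing only the other.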

Acknowledgements

The author thanks Cynthia Brandt, MD, and John Fisk, MD, of the Yale Center for Medical Informatics, and the anonymous reviewers for feedback on the article. The author is supported by grants U01 ES10867–02 from the National Institute of Environmental Health Sciences, R01 LM06843–02 from the National Library of Medicine and U01 CA78266–04 from the National Cancer Institute.

Author information

Corresponding author

Correspondence to P M Nadkarni.

About this article

Cite this article

Nadkarni, P. An introduction to information retrieval: applications in genomics. Pharmacogenomics J 2, 96–102 (2002). https://doi.org/10.1038/sj.tpj.6500084

  • DOI: https://doi.org/10.1038/sj.tpj.6500084
