How user intelligence is improving PubMed

Fiorini, Nicolas; Leaman, Robert; Lipman, David J; Lu, Zhiyong

doi:10.1038/nbt.4267

Perspective
Published: 01 October 2018

Nicolas Fiorini ORCID: orcid.org/0000-0002-9260-1326¹,
Robert Leaman¹,
David J Lipman¹ &
…
Zhiyong Lu¹

Nature Biotechnology volume 36, pages 937–945 (2018)

Abstract

PubMed is a widely used search engine for biomedical literature. It is developed and maintained by the US National Library of Medicine/National Center for Biotechnology Information and is visited daily by millions of users around the world. For decades, PubMed has used artificial intelligence technologies, such as machine learning and natural language processing, to extract patterns of collective user activity and to inform the algorithmic changes that ultimately improve a user's search experience. Although these efforts have led to objective improvements in search quality, their technical underpinnings remain invisible to most users. Here we describe how these 'under-the-hood' techniques work within PubMed and report how their effectiveness and usage are assessed in real-world scenarios. In doing so, we hope to increase the transparency of the PubMed system and enable users to make more effective use of the search engine. We also identify open challenges and new opportunities for computational researchers to explore future improvements.

**Figure 1: Information search and data analytics in PubMed.**

**Figure 2: Navigational and informational searches in PubMed.**

**Figure 3: The overall workflow for PubMed's Best Match search using machine learning.**

**Figure 4: Implementations of Best Match and of Related Articles in PubMed.**

**Figure 5: Factors that affect average CTRs.**

Acknowledgements

The authors would like to thank K. Canese, R. Ismagilov, G. Starchenko, E. Kireev, J. Wilbur, D. Comeau, S. Kim, W. Kim, L. Yeganova, V. Miller, M. Osipov, R. Bryzgunov, I. Radetska, A. Gindulyte, M. Latterner, the NLM/NCBI leadership, and the many NCBI and NLM Library Operations staff working on and contributing to PubMed. This research was supported by the NIH Intramural Research Program, National Library of Medicine.

Author information

Authors and Affiliations

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, USA
Nicolas Fiorini, Robert Leaman, David J Lipman & Zhiyong Lu

Corresponding author

Correspondence to Zhiyong Lu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1 and Supplementary Note 1 (PDF 225 kb)

About this article

Cite this article

Fiorini, N., Leaman, R., Lipman, D. et al. How user intelligence is improving PubMed. Nat Biotechnol 36, 937–945 (2018). https://doi.org/10.1038/nbt.4267

Received: 30 October 2017
Accepted: 06 September 2018
Published: 01 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/nbt.4267
