How user intelligence is improving PubMed

Abstract

PubMed is a widely used search engine for biomedical literature. It is developed and maintained by the US National Library of Medicine/National Center for Biotechnology Information and is visited daily by millions of users around the world. For decades, PubMed has used advanced artificial intelligence technologies, such as machine learning and natural language processing, that extract patterns of collective user activity to inform the algorithmic changes that ultimately improve a user's search experience. Although these efforts have led to objective improvements in search quality, the technical underpinnings remain largely invisible to most users. Here we describe how these 'under-the-hood' techniques work within PubMed and report how their effectiveness and usage are assessed in real-world scenarios. In doing so, we hope to increase the transparency of the PubMed system and enable users to make more effective use of the search engine. We also identify open challenges and new opportunities for computational researchers to explore potential future improvements.

Figure 1: Information search and data analytics in PubMed.
Figure 2: Navigational and informational searches in PubMed.
Figure 3: The overall workflow for PubMed's Best Match search using machine learning.
Figure 4: Implementations of Best Match and of Related Articles in PubMed.
Figure 5: Factors that affect average CTRs.

Acknowledgements

The authors would like to thank K. Canese, R. Ismagilov, G. Starchenko, E. Kireev, J. Wilbur, D. Comeau, S. Kim, W. Kim, L. Yeganova, V. Miller, M. Osipov, R. Bryzgunov, I. Radetska, A. Gindulyte, M. Latterner, the NLM/NCBI leadership, and the many NCBI and NLM Library Operations staff working on and contributing to PubMed. This research was supported by the NIH Intramural Research Program, National Library of Medicine.

Author information

Corresponding author

Correspondence to Zhiyong Lu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1 and Supplementary Note 1 (PDF 225 kb)

Cite this article

Fiorini, N., Leaman, R., Lipman, D. et al. How user intelligence is improving PubMed. Nat Biotechnol 36, 937–945 (2018). https://doi.org/10.1038/nbt.4267
