Although large language models (LLMs) show promise in controlled settings, a study now exposes their limitations in real-world clinical applications and points the way towards robust evaluation and benchmarking before clinical use.
Ethics declarations
Competing interests
N.H.S. is a cofounder of Prealize Health (a predictive analytics company) and Atropos Health (an on-demand evidence generation company); reports funding from the Gordon and Betty Moore Foundation for developing virtual model deployments; and served on the board of the Coalition for Healthcare AI (CHAI), a consensus-building organization providing guidelines for the responsible use of artificial intelligence in health care. The other authors have no competing financial interests.
About this article
Cite this article
Bedi, S., Jain, S.S. & Shah, N.H. Evaluating the clinical benefits of LLMs. Nat Med (2024). https://doi.org/10.1038/s41591-024-03181-6
DOI: https://doi.org/10.1038/s41591-024-03181-6