  • News & Views

Machine learning

Evaluating the clinical benefits of LLMs

Although large language models (LLMs) show promise in controlled settings, a study now exposes their limitations in real-world clinical applications and points the way towards robust evaluation and benchmarking before clinical use.

Fig. 1: Verifying the benefits of AI models in healthcare.

References

  1. Kung, T. H. et al. PLoS Digit. Health 2, e0000198 (2023).

  2. Gilson, A. et al. JMIR Med. Educ. 9, e45312 (2023).

  3. Mehandru, N. et al. npj Digit. Med. 7, 84 (2024).

  4. Hager, P. et al. Nat. Med. https://doi.org/10.1038/s41591-024-03097-1 (2024).

  5. Bedi, S. et al. Preprint at medRxiv https://doi.org/10.1101/2024.04.15.24305869 (2024).

  6. Shah, N. H. et al. JAMA 330, 866–869 (2023).

  7. Jindal, R. et al. J. Am. Med. Inform. Assoc. 31, 1441–1444 (2024).

  8. Fleming, S. L. et al. Proc. AAAI Conference on Artificial Intelligence 38, 22021–22030 (2024).

Author information

Corresponding author

Correspondence to Nigam H. Shah.

Ethics declarations

Competing interests

N.H.S. is a cofounder of Prealize Health (a predictive analytics company) and Atropos Health (an on-demand evidence generation company); reports funding from the Gordon and Betty Moore Foundation for developing virtual model deployments; and served on the board of the Coalition for Healthcare AI (CHAI), a consensus-building organization providing guidelines for the responsible use of artificial intelligence in health care. The other authors have no competing financial interests.

About this article

Cite this article

Bedi, S., Jain, S.S. & Shah, N.H. Evaluating the clinical benefits of LLMs. Nat Med (2024). https://doi.org/10.1038/s41591-024-03181-6

  • DOI: https://doi.org/10.1038/s41591-024-03181-6
