Large language models (LLMs) — a type of artificial intelligence that uses deep learning and large data sets for natural language processing tasks — are being increasingly deployed in a variety of applications. However, applying LLMs in medicine and health care — for example, for triaging patient concerns, as a clinical decision assistant for doctors, or as a biomedical research assistant for scientists — remains challenging owing to the societal and medical implications of potential hallucinations by these models. One way to assess the reliability of LLMs and the knowledge they encode is to test their answers to biomedical questions; however, current medical question-answering benchmarks are limited in scope and have typically only considered small language models (a few hundred million to a few billion parameters). Now, writing in Nature, Karan Singhal, Shekoofeh Azizi, Alan Karthikesalingam, Vivek Natarajan and team report a multidimensional question-answering benchmark, which they use to evaluate the clinical knowledge of fine-tuned variants of the pathways language model (PaLM), a 540-billion-parameter, densely activated LLM.
Based on this framework, the team designed a version of PaLM trained to follow instructions (instruction-tuned), named Flan-PaLM, which substantially outperformed existing state-of-the-art baseline LLMs, with 67.6% accuracy on MedQA, 57.6% on MedMCQA and 79.0% on PubMedQA. Nonetheless, only 61.9% of Flan-PaLM long-form answers were deemed to be aligned with scientific consensus, and 29.7% were rated potentially harmful. Applying instruction prompt tuning, a parameter-efficient alignment technique based on medical domain data and expert clinician demonstrations, produced a further model, called Med-PaLM, which improved these readouts to 92.6% (alignment with scientific consensus) and 5.9% (potentially harmful). “Importantly, Med-PaLM not only matched the performance of Flan-PaLM on benchmarks, such as USMLE, but also greatly improved on axes such as factuality of answers, harm, helpfulness and bias, thereby closing the gap with physicians”, says Natarajan.