Azadi A, Gorjinejad F, Mohammad-Rahimi H, Tabrizi R, Alam M, Golkar M. Evaluation of AI-generated Responses by Different Artificial Intelligence Chatbots to the Clinical Decision-Making Case-Based Questions in Oral and Maxillofacial Surgery. Oral Surg Oral Med Oral Pathol Oral Radiol 2024; DOI: 10.1016/j.oooo.2024.02.018.

Chatbot responses are better to open-ended questions than to multiple-choice questions.

Artificial intelligence (AI) has been used by dental professionals to reduce workload, to support patient education and consent, and to enhance diagnosis and decision making. ChatGPT has been studied most frequently for its use and reliability, but no previous studies have compared different systems.

In this study, GPT-3.5, GPT-4, Claude-Instant, Google Bard and Microsoft Bing were set a 50-question oral and maxillofacial surgery (OMFS) case-based questionnaire comprising open-ended questions (OEQs) and multiple-choice questions (MCQs). Responses were compared with those of three OMFS consultants. For the MCQs, Bing performed weakest, with only 26% of answers correct; the other systems scored around 36%. For the OEQs, all chatbots achieved median quality scores of 4 or 5, as rated by the question setters (where 1 = poor quality, of little use, and 5 = excellent quality, very useful for clinicians).

MCQs force the systems to make a ‘right’ or ‘wrong’ decision, with no opportunity for reasoning. Queries to AI should therefore be worded so as not to limit its options for response. The authors conclude that the ‘technology cannot yet be trusted in the clinical scenario’.