
Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment

Abstract

Purpose

With the popularization of ChatGPT (OpenAI, San Francisco, California, United States) in recent months, understanding the potential of artificial intelligence (AI) chatbots in a medical context is important. Our study aims to evaluate the ophthalmology knowledge of Google Gemini and Bard (Google, Mountain View, California, United States).

Methods

In this study, we evaluated Google Gemini and Bard’s performance on EyeQuiz, a platform containing ophthalmology board certification examination practice questions, when used from the United States (US). Accuracy, response length, response time, and provision of explanations were evaluated. Subspecialty-specific performance was noted. A secondary analysis was conducted using Bard from Vietnam, and Gemini from Vietnam, Brazil, and the Netherlands.
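To make the evaluation protocol concrete, below is a minimal Python sketch of a scoring loop that records the same metrics (accuracy, response time, and response length). The `ask_chatbot` callable and the question format are hypothetical stand-ins: in the study itself, questions were entered into the chatbot interfaces manually and responses were graded by the authors.

```python
import time

def evaluate_chatbot(questions, ask_chatbot):
    """Score a chatbot on multiple-choice questions.

    questions: list of dicts with 'stem', 'choices' (letter -> text),
    and 'answer' (the correct letter).
    ask_chatbot: hypothetical callable taking a prompt string and
    returning the chatbot's free-text response.
    """
    results = []
    for q in questions:
        prompt = q["stem"] + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in q["choices"].items()
        )
        start = time.perf_counter()
        response = ask_chatbot(prompt)
        elapsed = time.perf_counter() - start
        # Simplification: take the first answer letter mentioned in the
        # response as the chatbot's selection; the study graded manually.
        chosen = next((c for c in q["choices"] if f"{c}." in response), None)
        results.append({
            "correct": chosen == q["answer"],
            "response_time_s": round(elapsed, 2),
            "response_words": len(response.split()),
        })
    accuracy = sum(r["correct"] for r in results) / len(results)
    return accuracy, results
```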

Results

Overall, Google Gemini and Bard both achieved an accuracy of 71% across 150 text-based multiple-choice questions. The secondary analysis revealed an accuracy of 67% using Bard from Vietnam, with 32 questions (21%) answered differently than when using Bard from the US. Moreover, the Vietnam version of Gemini achieved an accuracy of 74%, with 23 questions (15%) answered differently than the US version of Gemini. While the Brazil (68%) and Netherlands (65%) versions of Gemini performed slightly worse than the US version, differences in performance across the various country-specific versions of Bard and Gemini were not statistically significant.
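The authors report using MedCalc's comparison-of-proportions calculator for these significance checks; a standard pooled two-proportion z-test, sketched below, is a close stand-in rather than necessarily the exact test MedCalc applies. The correct-answer counts (106 and 98 of 150) are assumptions back-calculated from the reported US and Netherlands Gemini percentages, which the abstract gives only as 71% and 65%.

```python
from math import erf, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Assumed counts: ~71% and ~65% of 150 questions; the exact numbers of
# correct answers are not given in the abstract.
z, p = two_proportion_z_test(106, 150, 98, 150)
print(f"z = {z:.2f}, p = {p:.3f}")  # z ≈ 0.99, p ≈ 0.32 (> 0.05),
# consistent with the non-significant differences reported above.
```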

Conclusion

Google Gemini and Bard showed acceptable performance in responding to ophthalmology board examination practice questions. Subtle variability was noted in the performance of the chatbots across different countries. The chatbots also tended to provide confident explanations even when their answers were incorrect.

Fig. 1: Summary of Gemini’s performance across various question categories.
Fig. 2: Summary of Bard’s performance across various question categories.

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files. Further enquiries can be directed to the corresponding author.

Acknowledgements

RHM's research is supported by the Silber TARGET Fund.

Author information

Contributions

AM was responsible for analysing data, interpreting results, and writing the manuscript. JG was responsible for prompting the chatbot and extracting data. NSP was responsible for the conception of the study's design and revision of the manuscript. RSH was responsible for figure creation, analysing data, and revision of the manuscript. MMP was responsible for the conception of the study's design and revision of the manuscript. AM was responsible for supervision of the study. PJK was responsible for supervision of the study. RHM was responsible for revision of the manuscript and supervision of the study. All authors fulfill the ICMJE criteria for authorship.

Corresponding author

Correspondence to Rajeev H. Muni.

Ethics declarations

Competing interests

AM: None; JG: None; NSP: None; RSH: None; MMP: Financial support (to institution)—PSI Foundation, Fighting Blindness Canada; AM: None; PJK: Honoraria—Novartis, Bayer, Roche, Boehringer Ingelheim, RegenxBio, Apellis; Advisory board—Novartis, Bayer, Roche, Apellis, Novelty Nobility, Viatris, Biogen; Financial support (to institution)—Roche, Novartis, Bayer, RegenxBio; RHM: Consultant—Alcon, Apellis, AbbVie, Bayer, Bausch Health, Roche; Financial support (to institution)—Alcon, AbbVie, Bayer, Novartis, Roche.

Ethics

An ethics statement was not required for this study type, as no human or animal subjects or materials were used.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mihalache, A., Grad, J., Patil, N.S. et al. Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment. Eye (2024). https://doi.org/10.1038/s41433-024-03067-4
