
Title:
Comparison of ChatGPT-4o, Google Gemini 1.5 Pro, Microsoft Copilot Pro, and Ophthalmologists in the management of uveitis and ocular inflammation: A comparative study of large language models.
Authors:
Demir S; Department of Ophthalmology, Adana 5 Ocak State Hospital, Adana, Turkey. Electronic address: dr.suleymandemir@outlook.com.
Source:
Journal francais d'ophtalmologie [J Fr Ophtalmol] 2025 Apr; Vol. 48 (4), pp. 104468. Date of Electronic Publication: 2025 Mar 13.
Publication Type:
Journal Article; Comparative Study
Language:
English
Journal Info:
Publisher: Masson; Country of Publication: France; NLM ID: 7804128; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1773-0597 (Electronic); Linking ISSN: 0181-5512; NLM ISO Abbreviation: J Fr Ophtalmol; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Paris, New York, Masson.
Contributed Indexing:
Keywords: ChatGPT-4o; Google Gemini 1.5 Pro; Inflammation oculaire; LLMs; Microsoft Copilot Pro; Ocular inflammation; Uveitis; Uvéite
Entry Date(s):
Date Created: 20250314; Date Completed: 20250413; Latest Revision: 20250522
Update Code:
20260130
DOI:
10.1016/j.jfo.2025.104468
PMID:
40086266
Database:
MEDLINE

*Further Information*

*Purpose: The aim of this study was to compare three recent large language models (LLMs) from different companies, ChatGPT-4o, Google Gemini 1.5 Pro, and Microsoft Copilot Pro, with each other and with a group of ophthalmologists, in order to reveal the strengths and weaknesses of the LLMs relative to each other and to ophthalmologists in the field of uveitis and ocular inflammation.
Methods: Using a personal OphthoQuestions (www.ophthoquestions.com) account and the website's randomization feature, 100 of the 201 questions on uveitis and ocular inflammation (out of 4551 questions in the OphthoQuestions bank), including questions involving multimodal imaging, were selected for the study. In November 2024, the same 100 questions, comprising 80 multiple-choice and 20 open-ended items, were posed to ChatGPT-4o, Microsoft Copilot Pro, and Google Gemini 1.5 Pro. Each response was scored as correct or incorrect, and the accuracy rates were compared statistically.
Results: Of the 100 questions, ChatGPT-4o, Google Gemini 1.5 Pro, Microsoft Copilot Pro, and the human group (ophthalmologists) correctly answered 80 (80.00%), 81 (81.00%), 80 (80.00%), and 72 (72.00%), respectively. For the multiple-choice questions, no significant difference in accuracy was found among the three LLMs and the human group (P=0.207, Cochran's Q test). For the open-ended questions, there was likewise no significant difference in accuracy among the three LLMs and the human group (P=0.392, Cochran's Q test).
Conclusion: Although ChatGPT-4o, Google Gemini 1.5 Pro, and Microsoft Copilot Pro answered a higher percentage of questions correctly than the human group, the LLMs were not statistically superior to each other or to the human group in the management of uveitis and ocular inflammation.
(Copyright © 2025 Elsevier Masson SAS. All rights reserved.)*
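The comparisons above rely on Cochran's Q test, which is appropriate here because the same 100 questions were scored correct/incorrect for each of the four responders (matched binary outcomes). As an illustrative sketch only, not the authors' actual analysis, the Q statistic can be computed from a questions × responders 0/1 matrix as follows; the data below are hypothetical:

```python
def cochrans_q(x):
    """Cochran's Q statistic for matched binary data.

    x is a list of rows (questions), each a list of 0/1 outcomes,
    one entry per responder. Q is referred to a chi-square
    distribution with k - 1 degrees of freedom.
    """
    k = len(x[0])                                        # number of responders
    col = [sum(row[j] for row in x) for j in range(k)]   # successes per responder
    row_tot = [sum(row) for row in x]                    # successes per question
    n = sum(row_tot)                                     # total successes
    num = (k - 1) * (k * sum(c * c for c in col) - n * n)
    den = k * n - sum(r * r for r in row_tot)
    if den == 0:        # every question answered identically by all responders
        return 0.0      # (Q is formally undefined; no disagreement to test)
    return num / den

# Hypothetical scoring matrix: 4 questions x 3 responders (1 = correct)
data = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]
q = cochrans_q(data)
print(q)  # 1.0 for this toy matrix
```

The resulting Q is compared against a chi-square distribution with k−1 degrees of freedom (df = 3 for the four groups in the study) to obtain a P-value, for example via `scipy.stats.chi2.sf(q, k - 1)`.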

*Disclosure of interest: The author declares that he has no competing interests.*