Expert-Level Accuracy of GPT-4V in Medicine Conceals Hidden Flaws


A recent study has revealed that OpenAI’s Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians on medical challenge tasks, specifically in multiple-choice question accuracy. However, this impressive achievement comes with important caveats.

While GPT-4V achieved 81.6% accuracy on NEJM Image Challenges, compared with 77.8% for human physicians, a closer examination revealed that the AI’s reasoning often fell short. Unlike previous studies that focused solely on answer accuracy, this one scrutinized GPT-4V’s ability to comprehend images, recall medical knowledge, and perform step-by-step reasoning.

Key findings include:

  1. Accuracy on Multiple-Choice Questions: GPT-4V achieved an accuracy rate of 81.6%, slightly higher than human physicians’ 77.8%. It also performed well on cases where physicians failed, answering over 78% of such questions correctly.
  2. Flawed Rationales: Despite its high accuracy, GPT-4V frequently provided flawed rationales, especially in image comprehension. Among correctly answered questions, 35.5% had deficient rationales, with errors in image comprehension alone accounting for 27.2%.
  3. Reliability in Medical Knowledge Recall: The model was most reliable in recalling medical knowledge, with lower error rates ranging from 11.6% to 13.0%.

The study underscores the need for thorough evaluations of AI rationales before integrating such models into clinical workflows. While GPT-4V shows great promise, particularly in decision support roles, its current limitations in rationalizing decisions based on visual data highlight the necessity for cautious and incremental adoption in clinical settings. Further research and development are crucial to ensure these tools can reliably augment human expertise without compromising patient care.
