Computer Science > Computer Vision and Pattern Recognition

arXiv:2410.01620 (cs)

[Submitted on 2 Oct 2024 (v1), last revised 19 Oct 2024 (this version, v4)]

Title: LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models

Authors: Zhenyue Qin, Yu Yin, Dylan Campbell, Xuansheng Wu, Ke Zou, Yih-Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen


Abstract: The prevalence of vision-threatening eye diseases is a significant global burden, with many cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans, thereby reducing the burden on clinicians and improving access to eye care. However, limited benchmarks are available to assess LVLMs' performance in ophthalmology-specific applications. In this study, we introduce LMOD, a large-scale multimodal ophthalmology benchmark consisting of 21,993 instances across (1) five ophthalmic imaging modalities: optical coherence tomography, color fundus photographs, scanning laser ophthalmoscopy, lens photographs, and surgical scenes; (2) free-text, demographic, and disease biomarker information; and (3) primary ophthalmology-specific applications such as anatomical information understanding, disease diagnosis, and subgroup analysis. In addition, we benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains. The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains. Systematic error analysis further identified six major failure modes: misclassification, failure to abstain, inconsistent reasoning, hallucination, assertions without justification, and lack of domain-specific knowledge. In contrast, supervised neural networks specifically trained on these tasks as baselines demonstrated high accuracy. These findings underscore the pressing need for benchmarks in the development and validation of ophthalmology-specific LVLMs.

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.01620 [cs.CV]
	(or arXiv:2410.01620v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.01620

Submission history

From: Zhenyue Qin
[v1] Wed, 2 Oct 2024 14:57:58 UTC (1,803 KB)
[v2] Thu, 3 Oct 2024 02:29:12 UTC (3,912 KB)
[v3] Tue, 8 Oct 2024 15:08:13 UTC (5,422 KB)
[v4] Sat, 19 Oct 2024 04:02:35 UTC (10,331 KB)
