Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.01345 (cs)

[Submitted on 2 Feb 2024 (v1), last revised 8 May 2024 (this version, v6)]

Title: Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models

Authors: Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou

Abstract: Recent advances in large vision-language models (LVLMs) have demonstrated impressive capability in understanding visual information through human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying causes of multimodal hallucination remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks ('\n\n'), where the content before and after '\n\n' in the training data frequently exhibits significant semantic changes. This pattern leads the model to infer that the content following '\n\n' should be clearly different from the preceding content, which contains fewer hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions after the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. In addition, we find that deliberately inserting '\n\n' into the generated description can induce more hallucinations. We propose a simple method that effectively mitigates hallucination in LVLMs by skipping the output of '\n'.
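The abstract states only that the method skips the output of '\n'; it does not say how this is implemented. One plausible reading, sketched below as an assumption rather than the authors' implementation, is to mask the logits of every newline-bearing token at each decoding step so that it can never be selected. The toy vocabulary, the logit values, and the helper names `newline_token_ids`, `suppress_tokens`, and `greedy_pick` are all hypothetical and chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def newline_token_ids(vocab: Dict[str, int]) -> List[int]:
    """Return the ids of all tokens whose text contains a newline."""
    return sorted(idx for token, idx in vocab.items() if "\n" in token)


def suppress_tokens(logits: Sequence[float], banned_ids: Sequence[int]) -> List[float]:
    """Copy the logits and set every banned position to -inf."""
    masked = list(logits)
    for idx in banned_ids:
        masked[idx] = -math.inf
    return masked


def greedy_pick(logits: Sequence[float]) -> int:
    """Return the index of the largest logit (greedy decoding step)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical toy vocabulary and one decoding step's logits (assumed values).
vocab = {"The": 0, "cat": 1, ".": 2, "\n": 3, "\n\n": 4, "<eos>": 5}
logits = [0.1, 0.3, 0.2, 1.5, 2.0, 0.9]

banned = newline_token_ids(vocab)          # ids of "\n" and "\n\n"
unmasked_choice = greedy_pick(logits)      # would pick the paragraph break
masked_choice = greedy_pick(suppress_tokens(logits, banned))

print(banned, unmasked_choice, masked_choice)
```

In this toy step the unmasked decoder would emit the paragraph break, while the masked decoder falls through to the best remaining token. A real tokenizer may encode newlines inside several multi-character tokens, so the set of banned ids would need to be derived from the actual vocabulary.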

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2402.01345 [cs.CV]
	(or arXiv:2402.01345v6 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.01345

Submission history

From: Zongbo Han
[v1] Fri, 2 Feb 2024 12:02:46 UTC (388 KB)
[v2] Tue, 6 Feb 2024 05:10:33 UTC (388 KB)
[v3] Wed, 7 Feb 2024 08:07:02 UTC (388 KB)
[v4] Mon, 12 Feb 2024 13:53:20 UTC (393 KB)
[v5] Tue, 7 May 2024 01:46:15 UTC (391 KB)
[v6] Wed, 8 May 2024 02:15:45 UTC (388 KB)
