Computer Science > Computation and Language

arXiv:2402.11100 (cs)

[Submitted on 16 Feb 2024 (v1), last revised 9 Jun 2024 (this version, v2)]

Title: When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models

Authors: Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng, Xuming Hu, Philip S. Yu

Abstract: Recently, Large Language Models (LLMs) have made remarkable progress in language understanding and generation. In response, various benchmarks measuring all kinds of LLM capabilities have sprung up. In this paper, we challenge the reasoning and understanding abilities of LLMs by proposing a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp. Specifically, the cunning texts that FLUB focuses on mainly consist of tricky, humorous, and misleading texts collected from the real internet environment. We design three tasks of increasing difficulty in FLUB to evaluate the fallacy understanding ability of LLMs. Based on FLUB, we investigate the performance of multiple representative and advanced LLMs; the results show that FLUB is challenging and worthy of further study. Our extensive experiments and detailed analyses yield interesting discoveries and valuable insights. We hope that our benchmark can encourage the community to improve LLMs' ability to understand fallacies. Our data and code are available at this https URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.11100 [cs.CL]
	(or arXiv:2402.11100v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.11100

Submission history

From: Yinghui Li
[v1] Fri, 16 Feb 2024 22:12:53 UTC (342 KB)
[v2] Sun, 9 Jun 2024 17:55:05 UTC (644 KB)
