arXiv:2407.09152 (cs)

[Submitted on 12 Jul 2024]

Title: The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs

Authors: Anh Thu Maria Bui, Saskia Felizitas Brech, Natalie Hußfeldt, Tobias Jennert, Melanie Ullrich, Timo Breuer, Narjes Nikzad Khasmakhi, Philipp Schaer

Abstract: Hallucination detection in Large Language Models (LLMs) is crucial for ensuring their reliability. This work presents our participation in the CLEF ELOQUENT HalluciGen shared task, where the goal is to develop evaluators for both generating and detecting hallucinated content. We explored the capabilities of four LLMs: Llama 3, Gemma, GPT-3.5 Turbo, and GPT-4, for this purpose. We also employed ensemble majority voting to incorporate all four models for the detection task. The results provide valuable insights into the strengths and weaknesses of these LLMs in handling hallucination generation and detection tasks.
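
The ensemble majority voting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the label set, how ties among four voters are resolved, or how model outputs are parsed, so everything below is an assumption made for illustration: the label strings, the `majority_vote` helper, the `tie_break` parameter, and the example votes are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels, tie_break="hallucination"):
    """Return the label chosen by most voters.

    `tie_break` is a hypothetical fallback for an even split, which can
    happen with four voters; the paper's actual tie rule is not stated
    in the abstract.
    """
    if not labels:
        raise ValueError("at least one vote is required")
    ranked = Counter(labels).most_common()
    # A tie exists when the two most frequent labels have equal counts.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


# Invented votes for one example; these are not results from the paper.
votes = {
    "Llama 3": "hallucination",
    "Gemma": "not_hallucination",
    "GPT-3.5 Turbo": "hallucination",
    "GPT-4": "hallucination",
}
decision = majority_vote(list(votes.values()))
print(decision)
```

With an even number of voters a tie rule is unavoidable; the fallback label here is just one possible choice, and a real system might instead defer to a designated model or abstain.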

Comments:	Paper accepted at ELOQUENT@CLEF'24
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2407.09152 [cs.AI]
	(or arXiv:2407.09152v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2407.09152

Submission history

From: Narjes Nikzad Khasmakhi
[v1] Fri, 12 Jul 2024 10:34:46 UTC (662 KB)
