Computer Science > Computation and Language

arXiv:2402.15089 (cs)

[Submitted on 23 Feb 2024]

Title: AttributionBench: How Hard is Automatic Attribution Evaluation?

Authors: Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun

Abstract: Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence. However, evaluating the answer's attribution, i.e., whether every claim within the generated responses is fully supported by its cited evidence, remains an open problem. Because this verification has traditionally depended on costly human evaluation, automatic attribution evaluation methods are urgently needed. To address the lack of standardized benchmarks for these methods, we present AttributionBench, a comprehensive benchmark compiled from various existing attribution datasets. Our extensive experiments on AttributionBench reveal the challenges of automatic attribution evaluation, even for state-of-the-art LLMs. Specifically, our findings show that even a fine-tuned GPT-3.5 only achieves around 80% macro-F1 under a binary classification formulation. A detailed analysis of more than 300 error cases indicates that a majority of failures stem from the model's inability to process nuanced information and from the discrepancy between the information available to the model and the information available to human annotators.
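
For readers unfamiliar with the metric named in the abstract, macro-F1 under a binary formulation is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of how often each occurs. The following is a minimal sketch in Python, not the authors' evaluation code; the label strings "attributable" and "not attributable" and the function names are assumptions chosen for illustration.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], cls: str) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == cls and p == cls)
    fp = sum(1 for g, p in pairs if g != cls and p == cls)
    fn = sum(1 for g, p in pairs if g == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0 when the class never appears in gold or pred.
    return 2 * tp / denom if denom else 0.0


def macro_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    # Hypothetical label names; the benchmark's actual label strings may differ.
    classes: Sequence[str] = ("attributable", "not attributable"),
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)
```

With four examples where one "attributable" claim is misjudged as "not attributable", the per-class F1 scores are 2/3 and 4/5, giving a macro-F1 of 11/15 (about 0.73).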

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2402.15089 [cs.CL]
	(or arXiv:2402.15089v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.15089

Submission history

From: Yifei Li
[v1] Fri, 23 Feb 2024 04:23:33 UTC (9,232 KB)
