Challenges in Reproducing Human Evaluation Results for Role-Oriented Dialogue Summarization

Takumi Ito, Qixiang Fang, Pablo Mosteiro, Albert Gatt, Kees van Deemter

Abstract

There is growing concern about the reproducibility of human evaluation studies in NLP. As part of the ReproHum campaign, we conducted a study to assess the reproducibility of a recent human evaluation study in NLP. Specifically, we attempted to reproduce the human evaluation of a novel approach for enhancing Role-Oriented Dialogue Summarization by considering the influence of role interactions. Despite our best efforts to adhere to the reported setup, we were unable to reproduce the statistical results presented in the original paper. While we found no contradictory evidence, our study raises questions about the validity of the reported statistical significance results and/or the completeness with which the original study was reported. In this paper, we provide a comprehensive account of our reproduction study, detailing the methodology, data collection, and analysis procedures. We discuss the implications of our findings for the broader issue of reproducibility in NLP research. Our findings serve as a cautionary reminder of the challenges of conducting reproducible human evaluations, and we hope they prompt further discussion within the NLP community.

Anthology ID: 2023.humeval-1.9
Volume: Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
Venues: HumEval | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 97–123
URL: https://aclanthology.org/2023.humeval-1.9
Cite (ACL): Takumi Ito, Qixiang Fang, Pablo Mosteiro, Albert Gatt, and Kees van Deemter. 2023. Challenges in Reproducing Human Evaluation Results for Role-Oriented Dialogue Summarization. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 97–123, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Challenges in Reproducing Human Evaluation Results for Role-Oriented Dialogue Summarization (Ito et al., HumEval-WS 2023)
PDF: https://aclanthology.org/2023.humeval-1.9.pdf
