EntSUM: A Data Set for Entity-Centric Extractive Summarization

Mounica Maddela, Mayank Kulkarni, Daniel Preotiuc-Pietro

Abstract

Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better serve the user's information needs, as opposed to the standard summarization setup, which builds a single generic summary of a document. We introduce EntSUM, a human-annotated data set for controllable summarization that focuses on named entities as the aspects to control. We conduct an extensive quantitative analysis to motivate the task of entity-centric summarization and show that existing methods for controllable summarization fail to generate entity-centric summaries. We propose extensions to state-of-the-art summarization approaches that achieve substantially better results on our data set. Our analysis and results show the challenging nature of this task and of the proposed data set.
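To make the task format concrete: an entity-centric extractive summarizer takes a document and a target named entity, and returns a subset of the document's sentences that are about that entity, rather than one generic summary. The abstract does not describe the proposed models, so the sketch below is only a hypothetical naive baseline, not the EntSUM method or the authors' extensions. The function name `entity_lead_summary`, the alias list, and the default `k=3` are illustrative assumptions; it uses case-insensitive whole-word string matching with no coreference resolution, so it misses pronoun mentions such as "the company" or "it".

```python
import re
from typing import List


def entity_lead_summary(sentences: List[str], entity_aliases: List[str], k: int = 3) -> List[str]:
    """Hypothetical naive baseline: return up to k sentences, in document order,
    that mention any alias of the target entity.

    Assumption: exact, case-insensitive whole-word matching of user-supplied
    aliases; no coreference resolution or entity linking.
    """
    if k <= 0:
        return []
    # Lookarounds instead of \b so aliases ending in punctuation (e.g. "Inc.") still match,
    # while substrings inside longer words (e.g. "Pineapples") do not.
    patterns = [
        re.compile(r"(?<!\w)" + re.escape(alias) + r"(?!\w)", re.IGNORECASE)
        for alias in entity_aliases
    ]
    selected: List[str] = []
    for sentence in sentences:
        if any(p.search(sentence) for p in patterns):
            selected.append(sentence)
            if len(selected) == k:
                break
    return selected


if __name__ == "__main__":
    doc = [
        "Apple reported record revenue on Tuesday.",
        "Shares of Microsoft fell 2%.",
        "Analysts said Apple Inc. faces supply constraints.",
        "The market closed higher.",
    ]
    # Two different entities yield two different summaries of the same document.
    print(entity_lead_summary(doc, ["Apple", "Apple Inc."], k=2))
    print(entity_lead_summary(doc, ["Microsoft"]))
```

Selecting by entity mention shows why a single generic summary is insufficient: the same document yields a different summary for each entity of interest. A string-matching baseline like this one is exactly the kind of shallow approach a human-annotated data set can expose, since annotators judge which sentences are salient to the entity, not merely which ones name it.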

Anthology ID:: 2022.acl-long.237
Volume:: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: May
Year:: 2022
Address:: Dublin, Ireland
Editors:: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 3355–3366
URL:: https://aclanthology.org/2022.acl-long.237/
DOI:: 10.18653/v1/2022.acl-long.237
Cite (ACL):: Mounica Maddela, Mayank Kulkarni, and Daniel Preotiuc-Pietro. 2022. EntSUM: A Data Set for Entity-Centric Extractive Summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3355–3366, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):: EntSUM: A Data Set for Entity-Centric Extractive Summarization (Maddela et al., ACL 2022)
PDF:: https://aclanthology.org/2022.acl-long.237.pdf
Video:: https://aclanthology.org/2022.acl-long.237.mp4
Code: bloomberg/entsum
Data: New York Times Annotated Corpus