
arXiv:2307.04285 (cs)

[Submitted on 10 Jul 2023]

Title: HistRED: A Historical Document-Level Relation Extraction Dataset

Authors: Soyoung Yang, Minseok Choi, Youngwoo Cho, Jaegul Choo


Abstract: Despite the extensive applications of relation extraction (RE) tasks in various domains, little has been explored in the historical context, which contains promising data spanning hundreds to thousands of years. To promote historical RE research, we present HistRED, a dataset constructed from Yeonhaengnok. Yeonhaengnok is a collection of records originally written in Hanja, classical Chinese writing, and later translated into Korean. HistRED provides bilingual annotations so that RE can be performed on both Korean and Hanja texts. In addition, HistRED provides self-contained subtexts of varying lengths, from sentence level to document level, giving researchers diverse context settings in which to evaluate the robustness of their RE models. To demonstrate the usefulness of our dataset, we propose a bilingual RE model that leverages both Korean and Hanja contexts to predict relations between entities. Our model outperforms monolingual baselines on HistRED, showing that employing multiple language contexts supplements RE predictions. The dataset is publicly available at: this https URL under the CC BY-NC-ND 4.0 license.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2307.04285 [cs.CL]
	(or arXiv:2307.04285v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2307.04285
Journal reference:	ACL 2023

Submission history

From: Soyoung Yang
[v1] Mon, 10 Jul 2023 00:24:27 UTC (7,264 KB)
