Open AccessArticle

Document-Level Event Argument Extraction with Sparse Representation Attention

Mengxi Zhang

and

Honghui Chen

National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha 410073, China

Author to whom correspondence should be addressed.

Mathematics 2024, 12(17), 2636; https://doi.org/10.3390/math12172636

Submission received: 19 July 2024 / Revised: 20 August 2024 / Accepted: 23 August 2024 / Published: 25 August 2024

Download

Browse Figures

Versions Notes

Abstract

Document-level Event Argument Extraction (DEAE) aims to extract structural event knowledge composed of arguments and roles beyond the sentence level. Existing methods mainly focus on designing prompts and using Abstract Meaning Representation (AMR) graph structure as additional features to enrich event argument representation. However, two challenges still remain: (1) the long-range dependency between event trigger and event arguments and (2) the distracting context in the document towards an event that can mislead the argument classification. To address these issues, we propose a novel document-level event argument extraction model named AMR Parser and Sparse Representation (APSR). Specifically, APSR sets inter- and intra-sentential encoders to capture the contextual information in different scopes. Especially, in the intra-sentential encoder, APSR designs three types of sparse event argument attention mechanisms to extract the long-range dependency. Then, APSR constructs AMR semantic graphs, which capture the interactions among concepts well. Finally, APSR fuses the inter- and intra-sentential representations and predicts what role a candidate span plays. Experimental results on the RAMS and WikiEvents datasets demonstrate that APSR achieves a superior performance compared with competitive baselines in terms of F1 by

1.27 %

and

3.12 %

, respectively.

Keywords:

event argument extraction; document level; attention mechanism; information extraction

MSC:

68T50

1. Introduction

Event Extraction (EE) aims to identify and extract the event structures from unstructured text, which is a vital sub-task of information extraction [1]. EE can be generally classified into two subtasks [2]: (1) Event Detection (ED), with the goal of identifying the trigger word with its event type and (2) Event Argument Extraction (EAE), with the goal of identifying the arguments and classifying the specific roles they play. For example, in Figure 1a, for event trigger “murder” with its event type “Life: die”, the EAE model can extract arguments “nanny”, “child”, and “Moscow” and argument roles “killer”, “victim”, and “place”. In practice, EAE can benefit a wide range of downstream applications, such as event knowledge graph [3,4], recommender system [5,6,7], and question answering [8,9].

Moreover, EAE can be typically classified into sentence-level (SEAE) and document-level (DEAE), where most previous methods only extract events in a single sentence [10,11,12,13,14]. In addition, DEAE can be generally classified into two categories. (1) The first category is generation-based DEAE, which directly applies specific decoding strategies [15,16] or utilizes the pre-trained language models with prompts providing semantic guidance to capture the event arguments [17,18,19,20]. Other studies convert this problem into question answering (QA) [13,21] or machine reading comprehension (MRC) [22]. However, due to the independent modeling of entity mentions, these methods still exhibit limitations when addressing long-distance dependency. (2) The other one is span-based DEAE, which predicts event argument roles by identifying the start and end positions of candidate text spans [23,24,25,26,27]. These methods are capable of capturing cross-sentence arguments and multi-hop structures while suffering from the maximum length limitation. Recently, some studies have constructed an Abstract Meaning Representation (AMR) graph as additional features or training signals [28,29,30,31]. However, existing approaches typically set up an additional graph structure to better utilize the global information of context. This results in some distracting noise in the AMR graph, which may mislead the extraction. To solve this issue, Xu et al. [30] proposed a two-stream encoding model to make better use of feasible context. However, these methods mainly focus on leveraging AMR as supplementary features to enhance span representation, ignoring the written pattern of the document. How to figure out useful contextual information in the document still remains under-explored.

To sum up, two critical challenges still remain for DEAE: (1) Long-range dependency: in practical situations, the event structures are usually scattered in different sentences [19]. As shown in Figure 1b, for event trigger word “jail” with event type “Justice: arrestjaildetain” in sentence S3, the arguments “jailer”, “detainee” and “crime” are scattered in S3 and S5, which highly aggravate the complexity of extraction. (2) Distracting context issue: although a document typically provides more semantic information than a single sentence, distracting noise may also be brought in, which may mislead the extraction of event arguments.

To address the aforementioned challenges, we propose a novel DEAE model with AMR Parser and Sparse Representation (APSR). Specifically, to alleviate the long-range dependency issue, we construct the AMR parser graph structure, which is capable of providing abundant semantic information from the input sentences [32]. Inspired by [33,34], we consider that differences exist in event argument density between the sentence level and the document level, which means the density of event arguments in a whole document is generally much lower than that of a single sentence with an event trigger word. Based on that, we propose utilizing the sparse argument representation mechanism to capture high-quality argument representation. According to different human writing patterns, we design three types of sparse argument representation masks (i.e., sequential, flashback, and banded, respectively) to skip irrelevant information. In detail, for sequential mask, we consider the events are presented in the order they occur chronologically or logically, so the tokens in the former sentence can observe tokens in the latter. Moreover, the flashback mask refers to the narrative technique of inserting scenes or events that have occurred in the past into the current timeline of a story, so tokens in the latter sentence can observe tokens in the former sentence. In addition, for a banded mask, we argue that the arguments of an event are mostly scattered in the neighbor sentences, so we set that the tokens within a sentence can only observe tokens in the neighbor sentences. By adjusting the reception field of tokens in sparse argument attention mechanism, the model is capable of capturing high-quality argument representation.

In brief, the main contributions of this study are as follows:

We propose a span-based model for DEAE with a sparse argument representation encoder, which consists of an inter- and intra-sentential encoder with well-designed sparse argument attention mechanism to encode the document from different perspectives.
We propose three types of sparse argument attention masks (i.e., sequential, flashback, and banded, respectively), which are capable of introducing useful language bias.
Experimental results on two widely used benchmark datasets, i.e., RAMS and WikiEvents, validate APSR’s superiority over the state-of-the-art baselines.

2. Related Work

The DEAE task can be traced back to the MUC-4 benchmark [35], which requires retrieving participating entities and attribute values specific to each scenario. Recent methods for DEAE can be classified into generation-based DEAE method and span-based DEAE method [36].

2.1. Generation-Based DEAE Method

Generation-based DEAE methods can generate event arguments sequentially by designing a prompt containing a fixed format [16,17,18,20]. For instance, Lu et al. [18] designed a sequence-to-structure generation structure to capture event information directly from parallel text-record annotation. However, these methods focussed on modeling each event independently, so the generation precision decreased when there were too many words in the extracted arguments. Then, Liu et al. [22], introduced MRC to generate both topic-relevant and context-dependent questions, aiming to strengthen the reasoning process. Furthermore, Zeng et al. [37] attempted to use alignment-enhanced training and iterative inference to extract arguments, and thus no predefined event–event relation and event schema were required.

In addition, some works focussed on alleviating the cost of data annotation. For instance, Zhang et al. [38] defined the overlap knowledge across datasets for DEAE. Moreover, Du and Ji [21] proposed a retrieval-augmented QA-based model to capture cross-argument dependency, and further improved the performance of the EAE task under domain transfer and few-shot learning settings. Then, Lin et al. [39] focussed on a zero-shot scenario using global constraints with prompting, which reduced the human annotation effort. Furthermore, Cao et al. [40] devised a prefix-tuning network for cross-lingual setting.

However, these generation-based methods mainly generated all of the arguments simultaneously, so that they still exhibited limitations when dealing with long-distance arguments.

2.2. Span-Based DEAE Method

Span-based DEAE methods involve identifying candidate text spans that mark the starting and ending positions of arguments [23,24,25,26]. For instance, Zhang et al. [23] set the DEAE task as a two-step process composed of argument head-word detection and head-to-span expansion. In addition, Liu et al. [41] focussed on the non-argument contextual clues and the relevance among argument roles. And some studies formulated EAE as a QA task, casting DEAE into a series of reading comprehension problems [42,43]. However, such approaches usually have a maximum length limitation.

Furthermore, some researchers have also introduced Abstract Meaning Representation (AMR) [44] semantic graphs as additional features to enrich span representation. For example, Lin et al. [28] constructed an AMR graph as the training signals, which could help reflect the long-distance dependency. Moreover, Xu et al. [30] constructed a two-stream AMR-based interaction module to facilitate the understanding of complex semantic structures. In addition, other studies [29,31] explored different ways to guide the AMR graph for better suiting the information extraction task. However, these approaches tend to enlarge the graph size, while the graph introduces some unnecessary noise and thus decrease the computational efficiency. To address this issue, Yang et al. [45] formulated DEAE as a link prediction task, which significantly reduced the model inference time.

To sum up, these methods mainly focussed on taking the graph structure as additional features to enrich span representation, ignoring the written pattern of the document. Inspired by [33,34], we propose encoding the document from different perspectives to capture high-quality semantic information. Moreover, Table 1 illustrates some properties of two kinds of methods.

3. Approach

3.1. Task Formulation

We first formulate the DEAE task by following [27]. Specifically, we denote a document

D

that consists of N sentences as

D = \{w_{1}, w_{2}, \dots, w_{| N |}\}

, where each sentence contains a sequence of words, i.e.,

w_{i} = [x_{1}, x_{2}, \dots, x_{l}]

. Within the dataset, a document

D

also contains a set of pre-defined event types with a set

E

, where each event type

e \in E

is designated by the trigger

t \in D

. Then, the corresponding role set to each event type

e \in E

is pre-defined in the dataset as

R_{e}

R_{e} = \{r_{1}, r_{2}, \dots, r_{3}\}

. Our DEAE task aims to detect all

(r, s)

pairs for each event, where

r \in R_{e}

denotes the event argument role related to event type

e \in E

, and

s \subseteq D

is a contiguous text span in

D

The overall framework of the APSR model is illustrated in Figure 2, which focuses on high-quality event argument representation. Specifically, as shown in Figure 2, the framework of APSR consists of three main components: Giving a document as the input, we first encode the document with inter- and intra-sentential encoders using different self-attention mechanisms. Specifically, we design three types of sparse argument representation mask matrices (i.e., sequential, flashback, and banded, respectively) to capture different linguistic bias. The detail of a sparse argument representation mask is illustrated in the dashed box below. Then, we parse each sentence into the AMR graph, and transfer it into DGL graphs. Finally, we fuse the inter- and intra-sentential argument representations and predict what argument role the candidate span plays.

3.2. Sparse Argument Representation Encoder

In order to simultaneously identify the essential contextual information and filter the distracting information, we construct a sparse argument representation encoder, consisting of an inter-sentential encoder that is cognizant of the whole document, and an intra-sentential encoder with a well-designed sparse argument attention mechanism.

We first utilize BERT-

b a s e

[46] to encode the document, including an inter-sentential encoder

{B E R T}^{i n t e r}

and an intra-sentential encoder

{B E R T}^{i n t r a}

. Then, we obtain the corresponding representations

H^{i n t e r}

and

H^{i n t r a}

, as follows:

\begin{matrix} H^{i n t e r} = [h_{1}^{i n t e r}, h_{2}^{i n t e r}, \dots, h_{| D |}^{i n t e r}] = {BERT}^{i n t e r} ([w_{1}, w_{2}, \dots, w_{| D |}]) \\ H^{i n t r a} = [h_{1}^{i n t r a}, h_{2}^{i n t r a}, \dots, h_{| D |}^{i n t r a}] = {BERT}^{i n t r a} ([w_{1}, w_{2}, \dots, w_{| D |}]) \end{matrix}

(1)

where

h_{i}, i \in {1, 2, \dots, D}

denotes the contextual representation for each word

w_{i}

Specifically, on the one hand, the attention technique of the inter-sentential encoder can be formulated as follows [47]:

A t t e n t i o n^{i n t e r} (Q, K, V) = s o f t m a x (\frac{Q K^{T}}{\sqrt{d_{m}}}) V

(2)

where

Q, K, V

denote the query, key, and value matrix in the inter-sentential encoder, and

d_{m}

is the model dimension.

Moreover, on the other hand, in the intra-sentential encoder, due to the different information density between the sentence and document, we introduce a mask matrix as

M \in R^{N \times N}

to skip the irrelevant information as follows:

A t t e n t i o n^{i n t r a} (Q^{'}, K^{'}, V^{'}) = s o f t m a x (\frac{Q^{'} {K^{'}}^{T} + M}{\sqrt{d_{m}}}) V^{'}

(3)

where

Q^{'}, K^{'}, V^{'}

denote the query, key, and value matrix in the intra-sentential encoder, and

d_{m}

is the model dimension.

Specifically, we design three types of sparse event argument self-attention mechanisms, including sequential, flashback, and banded. Each

M \in {0, - \infty}

represents if the information of the jth sentence can be attended to the token in ith sentence:

Sequential. Sequential means presenting events or information in the order they occur chronologically or logically. We consider the events are described in the sequential order, that is, tokens in the former sentence can see tokens in the latter one:

$M_{s e q u e n t i a l} = \{\begin{matrix} 0, & i f j - i > 0 \\ - \infty, & O t h e r w i s e \end{matrix}$

(4)
Flashback. Flashback refers to the narrative technique of inserting scenes or events that have occurred in the past into the current timeline of a story (e.g., in historical documentary and literature), so tokens in latter sentence can observe tokens in the former sentence:

$M_{f l a s h b a c k} = \{\begin{matrix} 0, & i f i - j > 0 \\ - \infty, & O t h e r w i s e \end{matrix}$

(5)
Banded. Considering that the arguments of an event are mostly scattered in neighbor sentences, we set that tokens can only observe tokens in neighbor sentences within the neighbor hop of 3:

$M_{b a n d e d} = \{\begin{matrix} 0, & i f | i - j | < 3 \\ - \infty, & O t h e r w i s e \end{matrix}$

(6)

3.3. AMR Parser Module

The AMR graph can provide rich logical semantic information to benefit the understanding of intra-sentential and inter-sentential features. Specifically, we first adopt the transition-based AMR parser [48] to generate AMR graph

G = (V, E)

. In detail, each node

v_{i} = (m_{i}, n_{i}) \in V

describes a semantic concept, where

m_{i}

and

n_{i}

denote the span that marks the starting and ending positions in the original sentence, respectively, and each edge

e_{i, j} \in E

represents the specific relation between node

v_{i}

and

v_{j}

. Following [31], we then cluster the relation types into 12 clusters.

Due to the ability of the multi-layer Graph Convolutional Network (GCN) [49] to aggregate information from deeper neighboring nodes and enhance feature representation [50], we first employed the L-layer stacked GCN to capture interactions among distinct concept nodes connected by edges representing diverse relation types. At the l-th layer, for node n, we formulated the graph convolutional operation:

h_{n}^{(l + 1)} = ReLU (\sum_{k \in K} \sum_{v \in N_{k} (n) \cup {n}} \frac{1}{c_{n, k}} W_{k}^{(l)} h_{v}^{(l)})

(7)

where

K

is the clusters of 12 relation types that we define above,

N_{k} (n)

represents the set of neighbors of node n linked by k-th relation types, where

c_{n, k}

denotes a normalization constant, and

W_{k}^{(l)}

denotes a trainable parameter.

Then, we concatenate the hidden states across all layers to obtain the ultimate representation of node n, formulated as follows:

\begin{matrix} m_{n} = W_{1} [h_{n}^{0}, h_{n}^{1}, \dots, h_{n}^{L}] \end{matrix}

(8)

where

h_{n}^{0}

denotes the initial representation of node n.

After that,

m_{n}

is decomposed as the intra-sentential representations of the corresponding words, which are then aggregated token by token as follows:

{\tilde{h}}_{i}^{i n t r a} = z_{i}^{i n t r a} + \frac{\sum_{u} I (a_{n} < = i \land b_{n} > = i) m_{n}}{\sum_{n} I (a_{n} < = i \land b_{n} > = i)}

(9)

3.4. Fusion and Classification Module

In the fusion module, we first integrate the intra-sentential representations

{\tilde{H}}^{i n t e r} = [{\tilde{h}}_{1}^{i n t e r}, {\tilde{h}}_{2}^{i n t e r}, \dots, {\tilde{h}}_{| D |}^{i n t e r}]

and the inter-sentential representations

{\tilde{H}}^{i n t r a} = [{\tilde{h}}_{1}^{i n t r a}, {\tilde{h}}_{2}^{i n t r a}, \dots, {\tilde{h}}_{| D |}^{i n t r a}]

to formulate the ultimate representations of the candidate spans.

Then, we adopt a gating mechanism to fuse the two vector representations in the sentence and obtain the gate vector

g_{i}

g_{i} = s i g m o i d (W_{2} {\tilde{h}}_{i}^{i n t e r} + W_{3} {\tilde{h}}_{i}^{i n t r a} + b)

(10)

where

W_{2}

and

W_{3}

are trainable parameters.

Subsequently, we calculate the fused representations

{\tilde{h}}_{i}

{\tilde{h}}_{i} = g_{i} ⊙ {\tilde{h}}_{i}^{i n t e r} + (1 - g_{i}) ⊙ {\tilde{h}}_{i}^{i n t r a}

(11)

Moreover, for a candidate span extending from

w_{i}

w_{j}

, we calculate integrated representation

s_{i : j}

s_{i : j} = W_{s p a n} [{\tilde{h}}_{i}^{s t a r t}; {\tilde{h}}_{j}^{e n d}; \frac{1}{j - i + 1} \sum_{k = i}^{j} {\tilde{h}}_{k}]

(12)

where

W_{s p a n} \in R^{d \times 3 d}

represents the learnable parameter,

{\tilde{h}}_{i}^{s t a r t} = W_{s} {\tilde{h}}_{i}

and

{\tilde{h}}_{i}^{e n d} = W_{e} {\tilde{h}}_{i}

Finally, following [30], we introduce token-wise classifiers to predict the likelihood of the word

w_{i}

serves as the first or last word within the golden span:

P_{i}^{s} = s i g m o i d (W_{4} {\tilde{h}}_{i}^{s t a r t})

P_{i}^{e} = s i g m o i d (W_{5} {\tilde{h}}_{i}^{e n d})

, where

W_{4}

and

W_{5}

are trainable parameters. In order to enhance the boundary information for spans, we adopt boundary loss:

\begin{matrix} L_{s p a n} = - \sum_{i = 1}^{| D |} [y_{i}^{s} log P_{i}^{s} + (1 - y_{i}^{s}) log (1 - P_{i}^{s}) \\ + y_{i}^{e} log P_{i}^{e} + (1 - y_{i}^{e}) log (1 - P_{i}^{e})] \end{matrix}

(13)

where

y_{i}^{s}

and

y_{i}^{e}

are the golden labels.

In the classification model, we need to predict what specific role that candidate argument plays. To start with, we need to calculate the final classification vector

I_{i : j}

(Xu et al. [30]):

I_{i : j} = [s_{i : j}; {\tilde{h}}_{t}; {\tilde{h}}_{t} ⊙ s_{i : j}; | {\tilde{h}}_{t} - s_{i : j} |; H^{e}; E_{l e n}]

(14)

where

{\tilde{h}}_{t}

denotes trigger representation, and

H^{e}

and

E_{l e n}

represent the embedding of event type and span length, respectively.

Since DEAE is a multi-class classification task, we then employ cross entropy

L_{c}

L_{c l s} = - \sum_{i = 1}^{| D |} \sum_{j = i}^{| D |} y_{i : j} log P (r_{i : j} = y_{i : j})

(15)

where

y_{i : j}

denotes the golden argument role.

Finally, we train the model with final loss function

L = L_{c l s} + λ L_{s p a n}

with hyperparameter

λ

4. Experiments

4.1. Research Questions

To assess the effectiveness of APSR, we focus on the following four research questions:

RQ1: Can our proposed APSR model enhance the performance of DEAE compared with state-of-the-art baselines?
RQ2: Which part of APSR contributes the most to the extraction accuracy?
RQ3: How effective is the sparse argument attention mechanism for DEAE?
RQ4: Does APSR solve the issues caused by the long-range dependency and distracting context?

4.2. Datasets and Evaluation Metrics

We evaluate APSR on two widely used DEAE datasets, i.e., RAMS [27] and WikiEvents [19]. Both datasets are derived from real-world news articles and encompass a vast array of generic event types. The detailed statistics are illustrated in Table 2. Specifically, the RAMS dataset covers 3993 documents, annotated with 139 event types and 65 role types, while the WikiEvents dataset comprises 246 documents, covering 50 event types and 59 role types.

Following [23], we evaluate our results on the RAMS dataset in terms of F1 score, including Head F1 and Span F1. Moreover, we followed [19] to evaluate Head F1 and Coref F1 for the WikiEvents dataset.

Head F1: focuses exclusively on the accuracy of the head word within the event argument span. This metric evaluates the model’s performance in identifying the core part of the argument.
Span F1: evaluates the correctness that the predicted argument spans completely align with the golden ones. This metric assesses both the recognition of the argument and the precision of its boundaries.
Coref F1: measures the agreement between the extracted argument and the gold-standard argument [51] in terms of coreference. This metric emphasizes the model’s performance when maintaining contextual consistency.

4.3. Compared Baselines

Following [30,41,45], we validate the performance of APSR against the following competitive baselines for DEAE. Specifically, we first select several baselines for the RAMS dataset:

BERT-CRF [52], the first model using BERT-based BIO tagging scheme for semantic role labeling;
${BERT-CRF}_{T C D}$ [27], a model that adopts greedy decoding and type-constrained decoding mechanism to BERT-CRF;
Two-Step [23], the first approach which identifies the head-words of event arguments;
${Two-Step}_{T C D}$ [27], a span-based method that utilizes type-constrained decoding mechanism to Two-Step model;
TSAR [30], a two-steam span-based model with AMR-guided interaction mechanism.

For the WikiEvents dataset, we compare our APSR with BERT-CRF [52], TSAR [30] for comparison, and the following two approaches that are specifically evaluated on WikiEvents:

BERT-QA [13], the first model to consider EE as QA-based task.
BERT-QA-Doc [13], a model that runs BERT-QA on the document-level.

4.4. Experimental Settings

We utilize the widely uesd PLMs BERT-

b a s e

[46] as the encoder. Moreover, We set the batch size to 8, learning rate to 3 ×

10^{- 5}

, and employ AdamW [53] as the optimizer for training the APSR model. In addition, for the RAMS dataset, we conduct training for 50 epochs, with boundary loss

λ

and gcn-layer L set to 0.05 and 4, respectively. Differently, We train APSR for 100 epochs for WikiEvents, with boundary loss

λ

to 0.1 and gcn-layer L to 3, respectively.

5. Results and Discussion

5.1. Overall Performance

To answer

R Q 1

, we analyze the DEAE performance of APSR and baseline models on two public datasets: RAMS [27] and WikiEvents [19]. The overall performances on RAMS and WikiEvents are shown in Table 3 and Table 4, respectively.

Table 3 presents the results on the RAMS dataset. Obviously, we can see that

{APSR}_{s e q u e n t i a l}

and

{APSR}_{b a n d e d}

consistently outperform others in terms of Span F1 and Head F1. While

{APSR}_{f l a s h b a c k}

only achieves

0.6 %

improvement in terms of Head F1 over TSAR. Through the analysis of data source of RAMS, we argue this may be because the news articles are usually organized in chronological or logical order, and seldom adopt a narrative approach. In addition, compared with the span-based model TSAR,

{APSR}_{s e q u e n t i a l}

improves up to

0.35 %

in terms of Span F1 on the dev set and

1.27 %

in terms of Head F1 on the test set. These results show the effectiveness of introducing sparse argument attention mechanism to bring useful language bias for the DEAE task.

A similar conclusion can be derived from Table 4, where we evaluate two subtasks of DEAE, i.e., argument identification and argument classification. Specifically, when comparing QA-based approaches such as BERT-QA and BERT-QA-Doc,

{APSR}_{b a n d e d}

improves up to

15.55 %

in terms of Head F1 for argument identification and

13.41 %

in terms of Head F1 for argument classification over BERT-QA, respectively. We analyze this may be due to that QA-based methods are constrained by their ability to capture context features within a restricted span and they suffer from poor argument identification. Due to QA-based methods generating multiple answers for the same role, these answer spans are frequently either close together or overlapping. While the sparse argument representation mechanism is able to alleviate impacts of long-range dependency. Compared with the span-based method TSAR, our performance is a bit better. We attribute performance improvement to APSR’s ability to encode the context from different perspectives and obtain high-quality semantic information.

5.2. Ablation Study

To answer

R Q 2

, we execute an ablation study to assess the performance of different components in

{APSR}_{b a n d e d}

on the WikiEvents test set. The results are shown in Table 5.

First, we remove the intra-sentential encoder, we can observe that APSR decreases by

0.47 %

0.02 %

in terms of Head F1 for argument identification/classification, respectively, while the corresponding decrease rates in terms of Coref F1 are

1.43 %

0.98 %

. As Coref F1 measures the model’s performance in handling coreference of event arguments, which involves linking different mentions of the same argument, the experimental results indicate that the enhancement of the intra-sentential encoder improves APSR’s ability to recognize coreference relations. This demonstrates that our sparse argument representation mechanism can enhance the ability to capture contextual semantic information. Second, the removal of the inter-senential encoder causes the performance decreases for argument identification/classification by

2.66 %

2.90 %

in terms of Head F1 and

2.97 %

3.05 %

in terms of Coref F1, respectively. This shows the importance of gathering adequate context information for DEAE. In general, removing any component of sparse argument representation encoder leads to the performance decrease, which suggests their respective contributions to improving the performance.

5.3. Error Analysis

To answer

R Q 3

, we further explore four specific error types made by TSAR and APSR, where we randomly sample 100 examples from the RAMS test set and the number of errors are shown in Table 6. Specifically, the missing head denotes the missing arguments that the model does not recognize them; wrong span means the model cannot predict right start and end positions of head word of a golden span; wrong role refers to assigning a wrong role of head word of a golden span; and over-extract denotes the model predicts the role that does not exist in the context.

First, as shown in Table 6, concerning the missing head error, APSR can capture more correct arguments in the document compared to TSAR. Moreover, regarding the results of wrong span and wrong role errors, it is evident that APSR is able to capture more correct argument spans, and it also ensures accuracy in predicting the argument roles. This indicates that the sparse argument attention mechanism can help capture high-quality argument representation. In addition, APSR decreases the number of times that the model predicts a non-existent role. The results verify that the sparse argument attention mechanism can help alleviate the long-range dependency issue.

5.4. Case Study

To address

R Q 4

and demonstrate our model’s capability in mitigating the challenges associated with long-range dependency and distracting context, we investigate a specific example from the RAMS test set in the results of TSAR and APSR. As shown in Figure 3, we can observe that given the event trigger “smuggling” and the event type “Movement: transportartifact.smuggleextract” in S3, TSAR can only identify arguments “networks” and “Europe” in S2 and S3, respectively. We conjecture that TSAR focuses more on tokens within the sentence itself and those situated in the sentence containing event trigger. So it fails to recognize argument “Turkey” that locates cross two sentences with trigger “smuggling”. While our APSR is able to effectively capture all arguments scattered in S2, S3 and S5. It could be attributed to that our sparse argument representation mechanism is capable of extracting the event arguments distributed in several sentences, so the argument “migrants” located in S2 ahead of trigger “smuggling” in S3 can be identified according to the attention mechanism.

6. Conclusions

In this paper, we propose a novel span-based DEAE model with a sparse argument attention mechanism to address the long-range dependency and distracting context issues simultaneously. Based on different human writing patterns, we design three types of event argument masks, i.e., sequential, flashback, and banded, which encode the document from different perspectives. Experimental results conducted on RAMS and WikiEvents validate that our APSR outperforms competitive baseline approaches by

1.27 %

F1 and

3.12 %

F1, respectively. Regarding future work, we will study how to combine these sparse attention masks to capture useful event argument representation, and we would like to further explore the sparse argument attention mechanism in zero-shot cross-lingual setting.

Author Contributions

Conceptualization, M.Z. and H.C.; methodology, M.Z.; validation, M.Z. and H.C.; formal analysis, H.C.; investigation, M.Z.; resources, H.C.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are available from the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Sankepally, R. Event information retrieval from text. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; p. 1447. [Google Scholar]
Fincke, S.; Agarwal, S.; Miller, S.; Boschee, E. Language model priming for cross-lingual event extraction. Proc. AAAI Conf. Artif. Intell. 2022, 36, 10627–10635. [Google Scholar] [CrossRef]
Antoine, B.; Yejin, C. Dynamic knowledge graph construction for zero-shot commonsense question answering. arXiv 2019, arXiv:1911.03876. [Google Scholar]
Guan, S.; Cheng, X.; Bai, L.; Zhang, F.; Li, Z.; Zeng, Y.; Jin, X.; Guo, J. What is event knowledge graph: A survey. IEEE Trans. Knowl. Data Eng. 2022, 35, 7569–7589. [Google Scholar] [CrossRef]
Liu, C.Y.; Zhou, C.; Wu, J.; Xie, H.; Hu, Y.; Guo, L. CPMF: A collective pairwise matrix factorization model for upcoming event recommendation. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1532–1539. [Google Scholar]
Horowitz, D.; Contreras, D.; Salamó, M. EventAware: A mobile recommender system for events. Pattern Recognit. Lett. 2018, 105, 121–134. [Google Scholar] [CrossRef]
Li, M.; Zareian, A.; Lin, Y.; Pan, X.; Whitehead, S.; Chen, B.; Wu, B.; Ji, H.; Chang, S.F.; Voss, C.; et al. Gaia: A fine-grained multimedia knowledge extraction system. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020; pp. 77–86. [Google Scholar]
Souza Costa, T.; Gottschalk, S.; Demidova, E. Event-QA: A dataset for event-centric question answering over knowledge graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, 19–23 October 2020; pp. 3157–3164. [Google Scholar]
Wang, J.; Jatowt, A.; Färber, M.; Yoshikawa, M. Improving question answering for event-focused questions in temporal collections of news articles. Inf. Retr. J. 2021, 24, 29–54. [Google Scholar] [CrossRef]
Nguyen, T.H.; Cho, K.; Grishman, R. Joint event extraction via recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 300–309. [Google Scholar]
Liu, X.; Luo, Z.; Huang, H. Jointly multiple events extraction via attention-based graph information aggregation. arXiv 2018, arXiv:1809.09078. [Google Scholar]
Yang, S.; Feng, D.; Qiao, L.; Kan, Z.; Li, D. Exploring pre-trained language models for event extraction and generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5284–5294. [Google Scholar]
Du, X.; Cardie, C. Event extraction by answering (almost) natural questions. arXiv 2020, arXiv:2004.13625. [Google Scholar]
Wei, K.; Sun, X.; Zhang, Z.; Zhang, J.; Zhi, G.; Jin, L. Trigger is not sufficient: Exploiting frame-aware knowledge for implicit event argument extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 4672–4682. [Google Scholar]
Paolini, G.; Athiwaratkun, B.; Krone, J.; Ma, J.; Achille, A.; Anubhai, R.; Santos, C.N.d.; Xiang, B.; Soatto, S. Structured prediction as translation between augmented natural languages. arXiv 2021, arXiv:2101.05779. [Google Scholar]
Hsu, I.H.; Huang, K.H.; Boschee, E.; Miller, S.; Natarajan, P.; Chang, K.W.; Peng, N. DEGREE: A Data-Efficient Generation-Based Event Extraction Model. arXiv 2021, arXiv:2108.12724. [Google Scholar]
Lu, Y.; Liu, Q.; Dai, D.; Xiao, X.; Lin, H.; Han, X.; Sun, L.; Wu, H. Unified structure generation for universal information extraction. arXiv 2022, arXiv:2203.12277. [Google Scholar]
Lu, Y.; Lin, H.; Xu, J.; Han, X.; Tang, J.; Li, A.; Sun, L.; Liao, M.; Chen, S. Text2Event: Controllable sequence-to-structure generation for end-to-end event extraction. arXiv 2021, arXiv:2106.09232. [Google Scholar]
Li, S.; Ji, H.; Han, J. Document-level event argument extraction by conditional generation. arXiv 2021, arXiv:2104.05919. [Google Scholar]
Liu, X.; Huang, H.; Shi, G.; Wang, B. Dynamic prefix-tuning for generative template-based event extraction. arXiv 2022, arXiv:2205.06166. [Google Scholar]
Du, X.; Ji, H. Retrieval-augmented generative question answering for event argument extraction. arXiv 2022, arXiv:2211.07067. [Google Scholar]
Liu, J.; Chen, Y.; Xu, J. Machine reading comprehension as data augmentation: A case study on implicit event argument extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual Event, 7–11 November 2021; pp. 2716–2725. [Google Scholar]
Zhang, Z.; Kong, X.; Liu, Z.; Ma, X.; Hovy, E. A two-step approach for implicit event argument detection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7479–7485. [Google Scholar]
Dai, L.; Wang, B.; Xiang, W.; Mo, Y. Bi-directional iterative prompt-tuning for event argument extraction. arXiv 2022, arXiv:2210.15843. [Google Scholar]
Yang, X.; Lu, Y.; Petzold, L. Few-shot document-level event argument extraction. arXiv 2022, arXiv:2209.02203. [Google Scholar]
He, Y.; Hu, J.; Tang, B. Revisiting Event Argument Extraction: Can EAE Models Learn Better When Being Aware of Event Co-occurrences? arXiv 2023, arXiv:2306.00502. [Google Scholar]
Ebner, S.; Xia, P.; Culkin, R.; Rawlins, K.; Van Durme, B. Multi-sentence argument linking. arXiv 2019, arXiv:1911.03766. [Google Scholar]
Lin, J.; Chen, Q.; Zhou, J.; Jin, J.; He, L. Cup: Curriculum learning based prompt tuning for implicit event argument extraction. arXiv 2022, arXiv:2205.00498. [Google Scholar]
Fan, S.; Wang, Y.; Li, J.; Zhang, Z.; Shang, S.; Han, P. Interactive Information Extraction by Semantic Information Graph. In Proceedings of the IJCAI, Vienna, Austria, 23–29 July 2022; pp. 4100–4106. [Google Scholar]
Xu, R.; Wang, P.; Liu, T.; Zeng, S.; Chang, B.; Sui, Z. A two-stream AMR-enhanced model for document-level event argument extraction. arXiv 2022, arXiv:2205.00241. [Google Scholar]
Zhang, Z.; Ji, H. Abstract meaning representation guided graph encoding and decoding for joint information extraction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics-Human Language Technologies (NAACL-HLT2021), Online, 6–11 June 2021. [Google Scholar]
Hsu, I.; Xie, Z.; Huang, K.H.; Natarajan, P.; Peng, N. AMPERE: AMR-aware prefix for generation-based event argument extraction model. arXiv 2023, arXiv:2305.16734. [Google Scholar]
He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009. [Google Scholar]
Yuan, C.; Huang, H.; Cao, Y.; Wen, Y. Discriminative reasoning with sparse event representation for document-level event-event relation extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
Grishman, R.; Sundheim, B.M. Message understanding conference-6: A brief history. In Proceedings of the 16th Conference on Computational Linguistics—Volume 1 (COLING 1996), Copenhagen, Denmark, 5–9 August 1996; pp. 466–471. [Google Scholar]
Zhou, J.; Shuang, K.; Wang, Q.; Yao, X. EACE: A document-level event argument extraction model with argument constraint enhancement. Inf. Process. Manag. 2024, 61, 103559. [Google Scholar] [CrossRef]
Zeng, Q.; Zhan, Q.; Ji, H. EA²E: Improving Consistency with Event Awareness for Document-Level Argument Extraction. arXiv 2022, arXiv:2205.14847. [Google Scholar]
Zhang, K.; Shuang, K.; Yang, X.; Yao, X.; Guo, J. What is overlap knowledge in event argument extraction? APE: A cross-datasets transfer learning model for EAE. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 393–409. [Google Scholar]
Lin, Z.; Zhang, H.; Song, Y. Global constraints with prompting for zero-shot event argument classification. arXiv 2023, arXiv:2302.04459. [Google Scholar]
Cao, P.; Jin, Z.; Chen, Y.; Liu, K.; Zhao, J. Zero-shot cross-lingual event argument extraction with language-oriented prefix-tuning. Proc. AAAI Conf. Artif. Intell. 2023, 37, 12589–12597. [Google Scholar] [CrossRef]
Liu, W.; Cheng, S.; Zeng, D.; Qu, H. Enhancing document-level event argument extraction with contextual clues and role relevance. arXiv 2023, arXiv:2310.05991. [Google Scholar]
Li, F.; Peng, W.; Chen, Y.; Wang, Q.; Pan, L.; Lyu, Y.; Zhu, Y. Event extraction as multi-turn question answering. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16–20 November 2020; pp. 829–838. [Google Scholar]
Zhou, Y.; Chen, Y.; Zhao, J.; Wu, Y.; Xu, J.; Li, J. What the role is vs. what plays the role: Semi-supervised event argument extraction via dual question answering. Proc. AAAI Conf. Artif. Intell. 2021, 35, 14638–14646. [Google Scholar] [CrossRef]
Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; Schneider, N. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, 8–9 August 2013; pp. 178–186. [Google Scholar]
Yang, Y.; Guo, Q.; Hu, X.; Zhang, Y.; Qiu, X.; Zhang, Z. An AMR-based link prediction approach for document-level event argument extraction. arXiv 2023, arXiv:2305.19162. [Google Scholar]
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
Astudillo, R.F.; Ballesteros, M.; Naseem, T.; Blodgett, A.; Florian, R. Transition-based parsing with stack-transformers. arXiv 2020, arXiv:2010.10669. [Google Scholar]
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
Zeng, S.; Xu, R.; Chang, B.; Li, L. Double graph based reasoning for document-level relation extraction. arXiv 2020, arXiv:2009.13752. [Google Scholar]
Ji, H.; Grishman, R. Refining event extraction through cross-document inference. In Proceedings of the ACL-08: Hlt, 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, OH, USA, 15–20 June 2008; pp. 254–262. [Google Scholar]
Shi, P.; Lin, J. Simple bert models for relation extraction and semantic role labeling. arXiv 2019, arXiv:1904.05255. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]

Figure 1. Example of DEAE. The event trigger word is marked in bold text, and the arguments are marked in red text with underlines.

Figure 2. Overview of our APSR. Giving a document as the input, we first encode the document with inter-sentential encoder and intra-sentential encoder with a mask matrix M. The detail of the sparse argument representation mask matrix M can be seen in the dashed box below. Then, the AMR parser module constructs semantic graphs to facilitate semantic interaction. Finally, we fuse the argument representations from two encoders and predict what argument role the candidate span plays.

Figure 3. Case study, where we pick an instance from the RAMS test set. The event trigger word is marked in red bold text, and the arguments are marked in bold text with underlines.

Table 1. Comparison of generation-based and span-based DEAE methods.

Angle	Generation-Based	Span-Based
Strength	effectively solve the same-role argument assignment	effectively capture cross-sentence and multi-hop structures
Weakness	exhibit limitations when dealing with long-distance arguments	mainly focus on taking the graph structure as additional features to enrich span representation, ignoring the written pattern of the document

Table 2. Statistics of RAMS and WikiEvents datasets.

Dataset	Split	#Docs	#Events	#Arguments
RMAS *	Train	3194	7329	17,026
	Dev	399	924	2188
	Test	400	871	2023
WikiEvents **	Train	206	3241	4542
	Dev	20	345	428
	Test	20	365	566

* Data and code are available at http://nlp.jhu.edu/rams/ (accessed on 15 August 2024). ** Data and code are available at https://github.com/raspberryice/gen-arg (accessed on 15 August 2024).

Table 3. Overall performance of APSR and the baselines on both dev and test sets of the RAMS dataset. The result produced by the best baseline and the best result in each column are underlined in bold, respectively.

Method	Dev		Test
Method	Span F1	Head F1	Span F1	Head F1
BERT-CRF	38.1	45.7	39.3	47.1
${BERT-CRF}_{T C D}$	39.2	46.7	40.5	48.0
Two-Step	38.9	46.4	40.1	47.7
${Two-Step}_{T C D}$	40.3	48.0	41.8	49.7
TSAR	45.50	51.66	47.13	53.75
${APSR}_{s e q u e n t i a l}$	45.85	51.98	47.28	55.02
${APSR}_{b a n d e d}$	45.56	51.70	47.16	54.18
${APSR}_{f l a s h b a c k}$	44.88	52.26	46.86	53.63

Table 4. Overall performance of APSR and the baselines on both argument identification and argument classification on the WikiEvents dataset. The result produced by the best baseline and the best result in each column are underlined in bold, respectively.

Method	Arg Identification		Arg Classification
Method	Head F1	Coref F1	Head F1	Coref F1
BERT-CRF	69.83	72.24	54.48	56.72
BERT-QA	61.05	64.59	56.16	59.36
BERT-QA-Doc	39.15	51.25	34.77	45.96
TSAR	74.44	72.37	67.10	65.79
${APSR}_{s e q u e n t i a l}$	75.20	73.05	67.14	65.53
${APSR}_{b a n d e d}$	76.60	75.49	69.57	68.83
${APSR}_{f l a s h b a c k}$	73.08	71.43	65.93	64.65

Table 5. Ablation study on the WikiEvents test set.

Method	Arg Identification		Arg Classification
Method	Head F1	Coref F1	Head F1	Coref F1
${APSR}_{b a n d e d}$	76.60	75.49	69.57	68.83
- Intra-sentential Encoder	76.13	74.06	69.55	67.86
- Inter-sentential Encoder	73.94	72.52	66.67	65.78

Table 6. Error analysis on RAMS test set. We display the number of four types of errors for TSAR and APSR, including 160 and 138 errors in total, respectively.

Model	Missing Head	Wrong Span	Wrong Role	Over-Extract
TSAR	54	61	15	30
APSR	50	54	11	23

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, M.; Chen, H. Document-Level Event Argument Extraction with Sparse Representation Attention. Mathematics 2024, 12, 2636. https://doi.org/10.3390/math12172636

AMA Style

Zhang M, Chen H. Document-Level Event Argument Extraction with Sparse Representation Attention. Mathematics. 2024; 12(17):2636. https://doi.org/10.3390/math12172636

Chicago/Turabian Style

Zhang, Mengxi, and Honghui Chen. 2024. "Document-Level Event Argument Extraction with Sparse Representation Attention" Mathematics 12, no. 17: 2636. https://doi.org/10.3390/math12172636

APA Style

Zhang, M., & Chen, H. (2024). Document-Level Event Argument Extraction with Sparse Representation Attention. Mathematics, 12(17), 2636. https://doi.org/10.3390/math12172636

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Document-Level Event Argument Extraction with Sparse Representation Attention

Abstract

1. Introduction

2. Related Work

2.1. Generation-Based DEAE Method

2.2. Span-Based DEAE Method

3. Approach

3.1. Task Formulation

3.2. Sparse Argument Representation Encoder

3.3. AMR Parser Module

3.4. Fusion and Classification Module

4. Experiments

4.1. Research Questions

4.2. Datasets and Evaluation Metrics

4.3. Compared Baselines

4.4. Experimental Settings

5. Results and Discussion

5.1. Overall Performance

5.2. Ablation Study

5.3. Error Analysis

5.4. Case Study

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI