2024
Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm
Qiang Gao | Zixiang Meng | Bobo Li | Jun Zhou | Fei Li | Chong Teng | Donghong Ji
Findings of the Association for Computational Linguistics: ACL 2024
Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information, and the roles of different event arguments may be biased by the information source. This paper addresses these limitations by proposing the task of cross-document event extraction (CDEE), which integrates event information from multiple documents to provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, CLES, which contains 20,059 documents and 37,688 mention-level events, over 70% of which are cross-document. To address the task, we propose a CDEE pipeline with five steps: event extraction, coreference resolution, entity normalization, role normalization, and entity-role resolution. Our pipeline achieves about 72% F1 in end-to-end cross-document event extraction, highlighting the difficulty of the task and establishing a benchmark for future research. Our work opens a new line of information extraction research that we hope will attract further attention.
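The five-step pipeline described above lends itself to a simple sequential composition. Below is a minimal sketch of such a pipeline; all function names and the Event structure are hypothetical placeholders, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    trigger: str
    doc_id: str
    arguments: dict = field(default_factory=dict)  # role -> entity mention

# Hypothetical placeholders for the five CDEE stages named in the abstract.
def extract_events(doc): ...            # 1. per-document event extraction
def resolve_coreference(mentions): ...  # 2. cluster mentions of the same event
def normalize_entities(clusters): ...   # 3. map mentions to canonical entities
def normalize_roles(clusters): ...      # 4. align source-specific roles to one schema
def resolve_entity_roles(clusters): ... # 5. merge arguments into structured events

def cdee_pipeline(documents):
    """Aggregate event information across documents end to end."""
    mentions = [e for doc in documents for e in extract_events(doc)]
    clusters = resolve_coreference(mentions)
    clusters = normalize_entities(clusters)
    clusters = normalize_roles(clusters)
    return resolve_entity_roles(clusters)
```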
Refining and Synthesis: A Simple yet Effective Data Augmentation Framework for Cross-Domain Aspect-based Sentiment Analysis
Haining Wang | Kang He | Bobo Li | Lei Chen | Fei Li | Xu Han | Chong Teng | Donghong Ji
Findings of the Association for Computational Linguistics: ACL 2024
Aspect-based Sentiment Analysis (ABSA) is extensively researched in the NLP community, yet related models face challenges due to data sparsity when shifting to a new domain. Hence, data augmentation for cross-domain ABSA has attracted increasing attention in recent years. However, two key points have been neglected in prior studies. First, unlabeled target-domain data are pseudo-labeled by a model trained on the source domain with little quality control, leading to inaccuracy and error propagation. Second, the label and text patterns of the generated labeled data are monotonous, limiting the robustness and generalization ability of trained ABSA models. In this paper, we design a simple yet effective framework, Refining and Synthesis Data Augmentation (RSDA), to address these shortcomings. Our framework comprises two steps: first, it refines the generated labeled data with a natural language inference (NLI) filter to control data quality; second, it synthesizes diverse labeled data via novel label composition and paraphrase approaches. We conduct experiments on four kinds of ABSA subtasks, and our framework outperforms seven strong baselines, demonstrating its effectiveness.
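The refining step can be illustrated with an off-the-shelf NLI model that checks whether a sentence entails a verbalization of its pseudo label. This is a minimal sketch, assuming a simple hypothesis template; the paper's actual filter design may differ.

```python
# Minimal sketch of NLI-based filtering of pseudo-labeled ABSA data.
# The hypothesis template below is an assumption, not the paper's design.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def keep_example(sentence: str, aspect: str, sentiment: str,
                 threshold: float = 0.9) -> bool:
    hypothesis = f"The {aspect} is {sentiment}."  # hypothetical verbalization
    result = nli([{"text": sentence, "text_pair": hypothesis}])[0]
    return result["label"] == "ENTAILMENT" and result["score"] >= threshold

pseudo_labeled = [
    ("The battery lasts two full days.", "battery", "positive"),
    ("The screen scratches easily.", "screen", "positive"),  # likely rejected
]
refined = [ex for ex in pseudo_labeled if keep_example(*ex)]
```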
Revisiting Structured Sentiment Analysis as Latent Dependency Graph Parsing
Chengjie Zhou | Bobo Li | Hao Fei | Fei Li | Chong Teng | Donghong Ji
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Structured Sentiment Analysis (SSA) was cast as a bi-lexical dependency graph parsing problem by prior studies. Multiple formulations have been proposed to construct the graph, which share several intrinsic drawbacks: (1) the internal structures of spans are neglected, so only the boundary tokens of spans are used for relation prediction and span recognition, hindering the model’s expressiveness; (2) long spans occupy a significant proportion of the SSA datasets, which further exacerbates the problem of internal-structure neglect. In this paper, we treat SSA as dependency parsing over partially-observed dependency trees, regarding flat spans without determined tree annotations as latent subtrees so as to model the internal structures of spans. We propose a two-stage parsing method and leverage TreeCRFs with a novel constrained inside algorithm to model latent structures explicitly, which also takes advantage of jointly scoring graph arcs and headed spans for global optimization and inference. Results of extensive experiments on five benchmark datasets reveal that our method performs significantly better than all previous bi-lexical methods, achieving new state-of-the-art results.
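Marginalizing over the latent internal structure of a flat span can be made concrete with a small example. The sketch below uses the matrix-tree theorem (Koo et al., 2007) as a stand-in for the paper's constrained inside algorithm; the paper's TreeCRF handles projective trees, whereas this computes the non-projective log-partition, so it is purely illustrative.

```python
import torch

def latent_span_log_partition(arc_scores: torch.Tensor,
                              root_scores: torch.Tensor) -> torch.Tensor:
    """Log-sum of scores over all internal dependency structures of a span.

    arc_scores[i, j]: score of token i heading token j within the span.
    root_scores[j]:   score of token j being the span's internal root.
    """
    n = arc_scores.size(0)
    weights = arc_scores.exp() * (1.0 - torch.eye(n))  # zero out self-loops
    laplacian = torch.diag(weights.sum(dim=0)) - weights
    laplacian[0] = root_scores.exp()  # root-row replacement (Koo et al., 2007)
    return torch.logdet(laplacian)

arc = torch.randn(4, 4)   # a 4-token span with unannotated internal structure
root = torch.randn(4)
log_z = latent_span_log_partition(arc, root)  # differentiable, so it can enter
# a training objective that sums over all latent subtrees of the span
```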
Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information
Qiang Gao | Bobo Li | Zixiang Meng | Yunlong Li | Jun Zhou | Fei Li | Chong Teng | Donghong Ji
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Existing cross-document event coreference resolution models either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), but they lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies and underperform on events whose argument information depends on such dependencies. In light of these limitations, we construct document-level Rhetorical Structure Theory (RST) trees and cross-document lexical chains to model the structural and semantic information of documents. We then build cross-document heterogeneous graphs and apply a GAT to learn event representations. Finally, a pair scorer calculates the similarity between each pair of events, and coreferent events are recognized with a standard clustering algorithm. Additionally, as existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, comprising 53,066 event mentions and 4,476 clusters. On both the English and Chinese datasets, our model outperforms all baselines by large margins.
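The final stage, pair scoring followed by standard clustering, can be sketched as follows. The cosine pair scorer here is a stand-in for the learned scorer over GAT event representations, and agglomerative clustering plays the role of the standard clustering algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def pair_score(e_i: np.ndarray, e_j: np.ndarray) -> float:
    """Stand-in for the learned pair scorer: cosine similarity of event vectors."""
    return float(e_i @ e_j / (np.linalg.norm(e_i) * np.linalg.norm(e_j) + 1e-9))

def cluster_events(events, threshold: float = 0.5):
    n = len(events)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - pair_score(events[i], events[j])  # similarity -> distance
            dist[i, j] = dist[j, i] = d
    clustering = AgglomerativeClustering(
        n_clusters=None, metric="precomputed",   # affinity= on scikit-learn < 1.2
        linkage="average", distance_threshold=threshold)
    return clustering.fit_predict(dist)          # cluster id per event mention

reps = np.random.randn(6, 128)  # toy event representations (e.g., GAT outputs)
labels = cluster_events(list(reps))
```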
What Factors Influence LLMs’ Judgments? A Case Study on Question Answering
Lei Chen | Bobo Li | Li Zheng | Haining Wang | Zixiang Meng | Runfeng Shi | Hao Fei | Jun Zhou | Fei Li | Chong Teng | Donghong Ji
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Large Language Models (LLMs) are increasingly used as efficient judges to evaluate the quality of answers generated by candidate models. However, their judgments may be influenced by complex scenarios and inherent biases, raising concerns about their reliability. This study examines four previously unexplored factors that affect the performance of LLMs as judges: answer quantity, inducing statements, judging strategy, and judging style. Additionally, we introduce question difficulty as a new dimension to provide a more comprehensive understanding of LLMs’ judgments across questions of varying intricacy. We employ ChatGPT, GPT-4, Gemini, and Claude-2 as judges and conduct experiments on the Vicuna Benchmark and MT-bench. Our study reveals that LLMs’ judging abilities are susceptible to all four factors, and that analyzing along the newly proposed dimension of question difficulty is highly necessary. We also provide valuable insights into optimizing LLMs’ performance as judges, enhancing their reliability and adaptability across diverse evaluation scenarios.
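One of the studied factors, judging strategy, can be varied with nothing more than a prompt change. The sketch below shows a pairwise judging setup; call_llm is a hypothetical stub for whichever judge model (ChatGPT, GPT-4, Gemini, Claude-2) is plugged in, and the prompt wording is illustrative, not the paper's.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stub; replace with a real model API call."""
    raise NotImplementedError

# Two judging strategies, expressed purely as instruction suffixes.
DIRECT = "Reply with only 'A' or 'B' to indicate the better answer."
ANALYZE_FIRST = ("First compare the answers on correctness, relevance, and "
                 "clarity, then end with a line 'Verdict: A' or 'Verdict: B'.")

def judge(question: str, answer_a: str, answer_b: str,
          strategy: str = DIRECT) -> str:
    prompt = (f"Question: {question}\n"
              f"Answer A: {answer_a}\n"
              f"Answer B: {answer_b}\n"
              f"{strategy}")
    return call_llm(prompt)
```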
2014
Word Sense Induction Using Lexical Chain based Hypergraph Model
Tao Qian | Donghong Ji | Mingyao Zhang | Chong Teng | Congling Xia
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2012
Context-Enhanced Personalized Social Summarization
Po Hu | Donghong Ji | Chong Teng | Yujing Guo
Proceedings of COLING 2012
2011
Social Summarization via Automatically Discovered Social Context
Po Hu | Cheng Sun | Longfei Wu | Donghong Ji | Chong Teng
Proceedings of 5th International Joint Conference on Natural Language Processing
2009
Query-Focused Multi-Document Summarization Using Co-Training Based Semi-Supervised Learning
Po Hu | Donghong Ji | Hai Wang | Chong Teng
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1
Finding Answers to Definition Questions Using Web Knowledge Bases
Han Ren | Donghong Ji | Jing Wan | Chong Teng
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2