
Seung-Hoon Na

Also published as: Seung-hoon Na


2024

pdf bib
COMEM: In-Context Retrieval-Augmented Mass-Editing Memory in Large Language Models
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: NAACL 2024

pdf bib
DistillMIKE: Editing Distillation of Massive In-Context Knowledge Editing in Large Language Models
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: ACL 2024

Among recently emerged knowledge editing methods, in-context knowledge editing (IKE) has shown respectable knowledge-editing abilities in terms of generalization and specificity. Noting the promising advantages but unexplored issues of IKE, we propose **DistillMIKE** as a novel extension of IKE, i.e., editing **distill**ation of “**M**assive” **I**n-context **K**nowledge **E**diting in large language models (LLMs), consisting mainly of two expansions: 1) *Massive in-context knowledge editing (MIKE)*, which extends IKE to a massive editing task, aiming to inject not a single edit but a set of massive edits into LLMs. To preserve specificity, our key novel extension is a “selective” retrieval augmentation, where retrieval-augmented IKE is applied only to “in-scope” examples, whereas the unedited model without IKE is employed for “out-of-scope” ones. 2) *Editing distillation* of MIKE using low-rank adaptation (LoRA), which distills the editing abilities of MIKE into the parameters of LLMs, eliminating the need for lengthy in-context demonstrations and thus removing the computational overhead incurred at inference time. Experimental results on the zsRE and CounterFact datasets demonstrate that MIKE achieves state-of-the-art performance and that DistillMIKE performs comparably to MIKE. Our code is available at https://github.com/JoveReCode/DistillMIKE.git.
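To make the “selective” retrieval augmentation concrete, here is a minimal Python sketch of the idea rather than the authors’ implementation; the embed and generate helpers, the similarity threshold, and the demonstration format are all illustrative assumptions.

```python
# Sketch of "selective" retrieval-augmented in-context editing: IKE is
# applied only to queries judged in-scope for some stored edit; out-of-scope
# queries go to the unedited model. embed() and generate() are hypothetical.
from typing import Callable, List, Tuple

def selective_ike(
    query: str,
    edit_memory: List[Tuple[str, str]],      # (edit prompt, new answer) pairs
    embed: Callable[[str], List[float]],     # hypothetical sentence encoder
    generate: Callable[[str], str],          # hypothetical LLM call
    threshold: float = 0.7,
) -> str:
    def cos(a: List[float], b: List[float]) -> float:
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    q_vec = embed(query)
    # Retrieve the stored edit closest to the query.
    scored = [(cos(q_vec, embed(p)), p, a) for p, a in edit_memory]
    best_sim, best_prompt, best_answer = max(scored, default=(0.0, "", ""))

    if best_sim >= threshold:
        # In-scope: prepend the retrieved edit as an in-context demonstration.
        demo = f"New fact: {best_prompt} -> {best_answer}\n"
        return generate(demo + query)
    # Out-of-scope: fall back to the unedited model to preserve specificity.
    return generate(query)
```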

pdf bib
SARCAT: Generative Span-Act Guided Response Generation using Copy-enhanced Target Augmentation
Jeong-Doo Lee | Hyeongjun Choi | Beomseok Hong | Youngsub Han | Byoung-Ki Jeon | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EMNLP 2024

In this paper, we present a novel extension to improve document-grounded response generation, proposing Generative Span-Act Guided Response Generation using Copy-enhanced Target Augmentation (SARCAT), which consists of two major components: 1) copy-enhanced target-side input augmentation, an extended data augmentation that addresses the exposure bias problem by additionally incorporating the copy mechanism on top of target-side augmentation (Xie et al., 2021); and 2) span-act guided response generation, which first predicts grounding spans and dialogue acts before generating a response. Experimental results on the validation set of MultiDoc2Dial show that the proposed SARCAT improves over strong baselines in both seen and unseen settings and achieves state-of-the-art performance, even with a base reader using the pretrained T5-base model.

pdf bib
RADCoT: Retrieval-Augmented Distillation to Specialization Models for Generating Chain-of-Thoughts in Query Expansion
Sung-Min Lee | Eunhwan Park | DongHyeon Jeon | Inho Kang | Seung-Hoon Na
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have demonstrated superior performance to that of small language models (SLMs) in information retrieval across various subtasks, including dense retrieval, reranking, query expansion, and pseudo-document generation. However, the parameter sizes of LLMs are extremely large, making it expensive to operate LLMs stably for LLM-based retrieval services. Recently, retrieval-augmented language models have been widely employed to significantly reduce the parameter size by retrieving relevant knowledge from large-scale corpora and exploiting the resulting “in-context” knowledge as additional model input, thereby substantially reducing the burden of internalizing and retaining world knowledge in model parameters. Armed with retrieval-augmented language models, we present a retrieval-augmented model specialization that distills the capability of LLMs to generate chains of thought (CoT) for query expansion – that is, it injects the LLM’s capability to generate CoT into a retrieval-augmented SLM – referred to as RADCoT. Experimental results on the MS-MARCO and TREC DL 19 and 20 datasets show that RADCoT yields consistent improvements over distillation without retrieval, achieving performance comparable to that of the query expansion method using LLM-based CoTs. Our code is publicly available at https://github.com/ZIZUN/RADCoT.
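As an illustration of the retrieval-augmented distillation setup, the following sketch shows how a training pair for the student SLM might be assembled; the retrieve and teacher_cot helpers and the [SEP]-joined input format are hypothetical, not the RADCoT code.

```python
# Sketch: the teacher LLM writes a chain-of-thought for a query, and the
# student SLM is trained to reproduce it given the query plus retrieved
# passages; at inference, the student's CoT is used as query expansion text.
from typing import Callable, List

def build_distillation_example(
    query: str,
    retrieve: Callable[[str, int], List[str]],   # hypothetical retriever
    teacher_cot: Callable[[str], str],           # hypothetical LLM CoT generator
    k: int = 3,
) -> dict:
    passages = retrieve(query, k)
    source = query + " [SEP] " + " [SEP] ".join(passages)   # student input
    target = teacher_cot(query)                             # student target (CoT)
    return {"source": source, "target": target}

def expand_query(query: str, student_cot: Callable[[str], str]) -> str:
    # The student's generated chain-of-thought serves as pseudo expansion text.
    return query + " " + student_cot(query)
```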

pdf bib
GeminiPro at SemEval-2024 Task 9: BrainTeaser on Gemini
Kyu Hyun Choi | Seung-hoon Na
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

It is known that human thought can be divided into lateral and vertical thinking. The development of language models has thus far focused on evaluating and advancing vertical thinking, while lateral thinking has been somewhat neglected. To foster progress in this area, SemEval created and distributed a brainteaser dataset based on lateral thinking, consisting of sentence-puzzle and word-puzzle QA. In this paper, we test and discuss the performance of Gemini, one of the currently strongest known models, on this dataset.

2023

pdf bib
ExplainMeetSum: A Dataset for Explainable Meeting Summarization Aligned with Human Intent
Hyun Kim | Minsoo Cho | Seung-Hoon Na
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To enhance the explainability of meeting summarization, we construct a new dataset called “ExplainMeetSum,” an augmented version of QMSum, by newly annotating evidence sentences that faithfully “explain” a summary. Using ExplainMeetSum, we propose a novel multiple extractor guided summarization, namely Multi-DYLE, which extensively generalizes DYLE to enable using a supervised extractor based on human-aligned extractive oracles. We further present an explainability-aware task, named “Explainable Evidence Extraction” (E3), which aims to automatically detect all evidence sentences that support a given summary. Experimental results on the QMSum dataset show that the proposed Multi-DYLE outperforms DYLE with gains of up to 3.13 in the ROUGE-1 score. We further present the initial results on the E3 task, under the settings using separate and joint evaluation metrics.

pdf bib
MAFiD: Moving Average Equipped Fusion-in-Decoder for Question Answering over Tabular and Textual Data
Sung-Min Lee | Eunhwan Park | Daeryong Seo | Donghyeon Jeon | Inho Kang | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EACL 2023

Transformer-based models for question answering (QA) over tables and texts confront a “long” hybrid sequence over tabular and textual elements, causing long-range reasoning problems. To handle long-range reasoning, we extensively employ a fusion-in-decoder (FiD) and an exponential moving average (EMA), proposing a Moving Average Equipped Fusion-in-Decoder (MAFiD). With FiD as the backbone architecture, MAFiD combines various levels of reasoning: independent encoding of homogeneous data and single-row and multi-row heterogeneous reasoning, using a gated cross-attention layer to effectively aggregate the three types of representations resulting from the different reasoning levels. Experimental results on HybridQA indicate that MAFiD achieves state-of-the-art performance, increasing exact match (EM) and F1 by 1.1 and 1.7, respectively, on the blind test set.
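A gated cross-attention layer of the kind described could look like the following PyTorch sketch; the dimensions, the sigmoid gating form, and the two-pass fusion in the usage example are assumptions, not the MAFiD implementation.

```python
# Sketch of a gated cross-attention fusion layer: a query representation
# attends over a context representation, and a learned gate interpolates
# between the attended output and the original query.
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query:   (batch, q_len, d_model), e.g. homogeneous text encoding
        # context: (batch, c_len, d_model), e.g. row-level table encodings
        attended, _ = self.attn(query, context, context)
        gate = torch.sigmoid(self.gate(torch.cat([query, attended], dim=-1)))
        return gate * attended + (1.0 - gate) * query

# Usage example: fuse a text encoding with single-row and multi-row
# encodings by applying the layer twice (an assumed composition).
fusion = GatedCrossAttentionFusion()
text_enc = torch.randn(2, 16, 768)
row_enc = torch.randn(2, 32, 768)
multi_row_enc = torch.randn(2, 64, 768)
fused = fusion(fusion(text_enc, row_enc), multi_row_enc)
```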

pdf bib
DiffusionRet: Diffusion-Enhanced Generative Retriever using Constrained Decoding
Shanbao Qiao | Xuebing Liu | Seung-Hoon Na
Findings of the Association for Computational Linguistics: EMNLP 2023

Generative retrieval, which maps a query to its relevant document identifiers (docids), has recently emerged as a new information retrieval (IR) paradigm; however, it has suffered from 1) the lack of an intermediate reasoning step, caused by merely using a query to perform the hierarchical classification, and 2) the pretrain-finetune discrepancy, which comes from the use of the artificial symbols of docids. To address these limitations, we propose the novel approach of using document generation from a query as an intermediate step before retrieval, thus presenting diffusion-enhanced generative retrieval (DiffusionRet), which consists of two processing steps: 1) diffusion-based document generation, which employs a sequence-to-sequence diffusion model to produce a pseudo-document sample from a query that is expected to be semantically close to a relevant document; and 2) n-gram-based generative retrieval, which uses another sequence-to-sequence model to generate n-grams that appear in the collection index, linking a generated sample to an original document. Experimental results on the MS MARCO and Natural Questions datasets show that the proposed DiffusionRet significantly outperforms all existing generative retrieval methods and achieves state-of-the-art performance, even with a much smaller number of parameters.
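The final linking step, mapping generated n-grams back to documents in the collection index, can be illustrated with a simple inverted-index sketch; the voting scheme and tokenization below are assumptions, not the paper’s constrained decoder.

```python
# Sketch: build an inverted n-gram index over the collection, then rank
# documents by how many generated n-grams they contain.
from collections import Counter, defaultdict

def build_ngram_index(docs: dict, n: int = 3) -> dict:
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            index[" ".join(tokens[i:i + n])].add(doc_id)
    return index

def link_to_documents(generated_ngrams: list, index: dict) -> list:
    # Each generated n-gram votes for every document it occurs in.
    votes = Counter(d for ng in generated_ngrams for d in index.get(ng, ()))
    return [doc_id for doc_id, _ in votes.most_common()]

docs = {"D1": "the capital of france is paris", "D2": "the capital of japan is tokyo"}
index = build_ngram_index(docs)
print(link_to_documents(["capital of france", "france is paris"], index))  # ['D1']
```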

2022

pdf bib
Frustratingly Easy System Combination for Grammatical Error Correction
Muhammad Reza Qorib | Seung-Hoon Na | Hwee Tou Ng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we formulate system combination for grammatical error correction (GEC) as a simple machine learning task: binary classification. We demonstrate that with the right problem formulation, a simple logistic regression algorithm can be highly effective for combining GEC models. Our method successfully increases the F0.5 score from the highest base GEC system by 4.2 points on the CoNLL-2014 test set and 7.2 points on the BEA-2019 test set. Furthermore, our method outperforms the state of the art by 4.0 points on the BEA-2019 test set, 1.2 points on the CoNLL-2014 test set with original annotation, and 3.4 points on the CoNLL-2014 test set with alternative annotation. We also show that our system combination generates better corrections with higher F0.5 scores than the conventional ensemble.
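The binary-classification formulation can be illustrated with a toy sketch; the indicator features (which base systems proposed each edit) and the scikit-learn setup are illustrative assumptions, not the authors’ feature set.

```python
# Sketch: each candidate edit is represented by which base GEC systems
# proposed it, and a logistic regression classifier decides whether to keep it.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training features: one indicator per base system.
X_train = np.array([
    [1, 1, 0],   # proposed by systems 1 and 2
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])
y_train = np.array([1, 0, 1, 0])   # 1 = edit matches the reference correction

clf = LogisticRegression().fit(X_train, y_train)

# At combination time, keep only the edits the classifier accepts.
candidate_edits = {("had", "has"): [1, 1, 1], ("a", "an"): [0, 0, 1]}
kept = [e for e, feats in candidate_edits.items() if clf.predict([feats])[0] == 1]
print(kept)
```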

pdf bib
LM-BFF-MS: Improving Few-Shot Fine-tuning of Language Models based on Multiple Soft Demonstration Memory
Eunhwan Park | Donghyeon Jeon | Seonhoon Kim | Inho Kang | Seung-Hoon Na
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LM-BFF achieves significant few-shot performance by using auto-generated prompts and adding demonstrations similar to an input example. To improve the approach of LM-BFF, this paper proposes LM-BFF-MS, better few-shot fine-tuning of language models with multiple soft demonstrations, by making further extensions, which include 1) prompts with multiple demonstrations based on automatic generation of multiple label words; and 2) a soft demonstration memory, which consists of multiple sequences of globally shared word embeddings for a similar context. Experiments conducted on eight NLP tasks show that LM-BFF-MS leads to improvements over LM-BFF on five tasks, particularly achieving 94.0 and 90.4 on SST-2 and MRPC, respectively.

pdf bib
JBNU-CCLab at SemEval-2022 Task 7: DeBERTa for Identifying Plausible Clarifications in Instructional Texts
Daewook Kang | Sung-Min Lee | Eunhwan Park | Seung-Hoon Na
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this study, we examine the ability of contextualized representations from pretrained language models to distinguish whether sequences from instructional articles are plausible or implausible. To this end, we compare the BERT, RoBERTa, and DeBERTa models using simple classifiers based on the sentence representations of the [CLS] tokens and perform a detailed analysis by visualizing the [CLS] representations of the models. In the experimental results of Subtask A: Multi-Class Classification, DeBERTa exhibits the best performance and produces representations that are more distinguishable across different labels. Submitting an ensemble of 10 DeBERTa-based models, our final system achieves an accuracy of 61.4% and is ranked fifth among the models submitted by eight teams. Further in-depth results suggest that the abilities of pretrained language models on the plausibility detection task are more strongly affected by their model structures or attention designs than by their model sizes.
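A simple [CLS]-based classifier of the kind compared in the paper might be set up as follows; the model checkpoint, the number of classes, and the untrained linear head are assumptions for illustration only.

```python
# Sketch: classify plausibility from the [CLS] sentence representation of a
# pretrained encoder. The linear head is randomly initialized here and would
# be trained on the task data in practice.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/deberta-base"   # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
classifier = nn.Linear(encoder.config.hidden_size, 3)   # assumed 3 classes

def classify(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden)
    cls_vec = hidden[:, 0]                             # [CLS] token representation
    return classifier(cls_vec).softmax(dim=-1)         # class probabilities

print(classify("Insert the filler into the gap before painting."))
```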

pdf bib
JBNU-CCLab at SemEval-2022 Task 12: Machine Reading Comprehension and Span Pair Classification for Linking Mathematical Symbols to Their Descriptions
Sung-Min Lee | Seung-Hoon Na
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system for SemEval-2022 Task 12: ‘linking mathematical symbols to their descriptions’, which achieved first place on the leaderboard for all the subtasks, comprising named entity recognition (NER) and relation extraction (RE). Our system is a two-stage pipeline model based on SciBERT that detects symbols, descriptions, and their relationships in scientific documents. The system consists of 1) a machine reading comprehension (MRC)-based NER model, where each entity type is represented as a question and its entity mention span is extracted as an answer using an MRC model, and 2) span-pair classification for RE, where two entity mentions and their type markers are encoded into span representations that are then fed to a softmax classifier. In addition, we deploy a rule-based symbol tokenizer to improve the detection of the exact boundaries of symbol entities. Regularization and ensemble methods are further explored to improve the RE model.
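The span-pair classification step can be sketched as a small PyTorch module; the hidden size, the concatenation-then-softmax form, and the random inputs in the example are assumptions, not the submitted system.

```python
# Sketch: two entity-mention span representations (e.g. marker-token
# encodings from SciBERT) are concatenated and scored over relation labels.
import torch
import torch.nn as nn

class SpanPairClassifier(nn.Module):
    def __init__(self, hidden: int = 768, n_relations: int = 2):
        super().__init__()
        self.scorer = nn.Linear(2 * hidden, n_relations)

    def forward(self, symbol_span: torch.Tensor, description_span: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([symbol_span, description_span], dim=-1)
        return self.scorer(pair).softmax(dim=-1)   # P(relation | span pair)

# Usage example with random span representations for a batch of 4 pairs.
probs = SpanPairClassifier()(torch.randn(4, 768), torch.randn(4, 768))
```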

pdf bib
Proceedings of the 29th International Conference on Computational Linguistics
Nicoletta Calzolari | Chu-Ren Huang | Hansaem Kim | James Pustejovsky | Leo Wanner | Key-Sun Choi | Pum-Mo Ryu | Hsin-Hsi Chen | Lucia Donatelli | Heng Ji | Sadao Kurohashi | Patrizia Paggio | Nianwen Xue | Seokhwan Kim | Younggyun Hahm | Zhong He | Tony Kyungil Lee | Enrico Santus | Francis Bond | Seung-Hoon Na
Proceedings of the 29th International Conference on Computational Linguistics

pdf bib
SISER: Semantic-Infused Selective Graph Reasoning for Fact Verification
Eunhwan Park | Jong-Hyeon Lee | DongHyeon Jeon | Seonhoon Kim | Inho Kang | Seung-Hoon Na
Proceedings of the 29th International Conference on Computational Linguistics

This study proposes Semantic-Infused SElective Graph Reasoning (SISER) for fact verification, which newly presents semantic-level graph reasoning and injects its reasoning-enhanced representation into other types of graph-based and sequence-based reasoning methods. SISER combines three reasoning types: 1) semantic-level graph reasoning, which uses a semantic graph built from evidence sentences, whose nodes are elements of a <Subject, Verb, Object> triple; 2) “semantic-infused” sentence-level “selective” graph reasoning, which combines semantic-level and sentence-level representations and performs graph reasoning in a selective manner using a node selection mechanism; and 3) sequence reasoning, which concatenates all evidence sentences and performs attention-based reasoning. Experimental results on a large-scale dataset for Fact Extraction and VERification (FEVER) show that SISER outperforms previous graph-based approaches and achieves state-of-the-art performance.

2020

pdf bib
JBNU at SemEval-2020 Task 4: BERT and UniLM for Commonsense Validation and Explanation
Seung-Hoon Na | Jong-Hyeon Lee
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our contributions to SemEval-2020 Task 4, Commonsense Validation and Explanation (ComVE), and includes experimental results for Subtasks B and C of the task. Our systems rely on pretrained language models, i.e., BERT (including its variants) and UniLM, and rank 10th and 7th among 27 and 17 systems on Subtasks B and C, respectively. We analyze the commonsense ability of existing pretrained language models by testing them on the SemEval-2020 Task 4 ComVE dataset, specifically on Subtasks B and C, the explanation subtasks with multiple-choice and sentence-generation formats, respectively.

pdf bib
JBNU at MRP 2020: AMR Parsing Using a Joint State Model for Graph-Sequence Iterative Inference
Seung-Hoon Na | Jinwoo Min
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

This paper describes the Jeonbuk National University (JBNU) system for the 2020 shared task on Cross-Framework Meaning Representation Parsing at the Conference on Computational Natural Language Learning. Among the five frameworks, we address only the abstract meaning representation framework and propose a joint state model for the graph-sequence iterative inference of (Cai and Lam, 2020) for a simplified graph-sequence inference. In our joint state model, we update only a single joint state vector during the graph-sequence inference process instead of keeping the dual state vectors, and all other components are exactly the same as in (Cai and Lam, 2020).

2019

pdf bib
JBNU at MRP 2019: Multi-level Biaffine Attention for Semantic Dependency Parsing
Seung-Hoon Na | Jinwoon Min | Kwanghyeon Park | Jong-Hun Shin | Young-Kil Kim
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes Jeonbuk National University (JBNU)’s system for the 2019 shared task on Cross-Framework Meaning Representation Parsing (MRP 2019) at the Conference on Computational Natural Language Learning. Of the five frameworks, we address only the DELPH-IN MRS Bi-Lexical Dependencies (DP), Prague Semantic Dependencies (PSD), and Universal Conceptual Cognitive Annotation (UCCA) frameworks. We propose a unified parsing model using biaffine attention (Dozat and Manning, 2017), consisting of 1) a BERT-BiLSTM encoder and 2) a biaffine attention decoder. First, the BERT-BiLSTM sentence encoder uses BERT to compose a sentence’s wordpieces into word-level embeddings and subsequently applies a BiLSTM to the word-level representations. Second, the biaffine attention decoder determines the scores for an edge’s existence and its labels based on biaffine attention functions between role-dependent representations. We also present multi-level biaffine attention models that combine all the role-dependent representations appearing at multiple intermediate layers.
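The biaffine edge-scoring function (Dozat and Manning, 2017) at the core of the decoder can be sketched as follows; the dimensions and the bias-augmentation form shown are assumptions about one common formulation, not the exact JBNU model.

```python
# Sketch of biaffine scoring for edge existence between head and dependent
# role-dependent representations.
import torch
import torch.nn as nn

class BiaffineEdgeScorer(nn.Module):
    def __init__(self, d: int = 400):
        super().__init__()
        self.W = nn.Parameter(torch.zeros(d + 1, d))   # +1 row folds in a bias term
        nn.init.xavier_uniform_(self.W)

    def forward(self, heads: torch.Tensor, deps: torch.Tensor) -> torch.Tensor:
        # heads, deps: (batch, seq_len, d) role-dependent word representations
        ones = torch.ones(*heads.shape[:-1], 1)
        heads = torch.cat([heads, ones], dim=-1)       # append bias feature
        # score[b, i, j] = h_i^T W d_j, the score that edge (i -> j) exists
        return torch.einsum("bid,de,bje->bij", heads, self.W, deps)

# Usage example with random encoder outputs for a batch of 2 sentences.
scores = BiaffineEdgeScorer()(torch.randn(2, 10, 400), torch.randn(2, 10, 400))
```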

pdf bib
QE BERT: Bilingual BERT Using Multi-task Learning for Neural Quality Estimation
Hyun Kim | Joon-Ho Lim | Hyun-Ki Kim | Seung-Hoon Na
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

For translation quality estimation at the word and sentence levels, this paper presents a novel approach based on BERT, which has recently achieved impressive results on various natural language processing tasks. Our proposed model re-purposes BERT for translation quality estimation and uses multi-task learning for the sentence-level task and the word-level subtasks (i.e., source word, target word, and target gap). Experimental results on the WMT19 Quality Estimation shared task show that our systems achieve competitive results and provide significant improvements over the baseline.

2017

pdf bib
Concept Equalization to Guide Correct Training of Neural Machine Translation
Kangil Kim | Jong-Hun Shin | Seung-Hoon Na | SangKeun Jung
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Neural machine translation decoders are usually conditional language models that sequentially generate words for target sentences. This approach is limited in finding the best word composition and requires the help of explicit methods such as beam search. To help NMT models learn correct compositional mechanisms, we propose concept equalization using a direct mapping between distributed representations of source and target sentences. In a translation experiment from English to French, concept equalization significantly improved translation quality by 3.00 BLEU points compared to a state-of-the-art NMT model.

pdf bib
Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation
Hyun Kim | Jong-Hyeok Lee | Seung-Hoon Na
Proceedings of the Second Conference on Machine Translation

2013

pdf bib
Patent translation as technical document translation: customizing a Chinese-Korean MT system to patent domain
Yun Jin | Oh-Woog Kwon | Seung-Hoon Na | Young-Gil Kim
Proceedings of the 5th Workshop on Patent Translation

2008

pdf bib
Search Result Clustering Using Label Language Model
Yeha Lee | Seung-Hoon Na | Jong-Hyeok Lee
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer
Chengguo Jin | Seung-Hoon Na | Dong-Il Kim | Jong-Hyeok Lee
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

2003

pdf bib
Conceptual Schema Approach to Natural Language Database Access
In-Su Kang | Seung-Hoon Na | Jong-Hyeok Lee
Proceedings of the Australasian Language Technology Workshop 2003