DOI: 10.1145/3661638.3661653
Research article

Detecting and Mitigating the Ungrounded Hallucinations in Text Generation by LLMs

Published: 01 June 2024

Abstract

Large language models (LLMs) have achieved impressive success in generating fluent and coherent natural-language text. However, inaccurate or low-quality training data can unintentionally lead to the retention of incorrect knowledge, producing hallucinations that hinder progress in content generation. In this paper, we propose a comprehensive framework for detecting and mitigating these hallucinations. In the detection phase, our approach uses Named Entity Recognition (NER) and Entity Relationship (ER) models to identify hallucinated entities and sentences. In the mitigation phase, we combine prompt engineering with an LLM to correct the flagged sentences. Tests on real articles confirm that our approach rectifies LLM-associated hallucinations without introducing new ones, thereby enhancing the reliability and credibility of generated text.
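The detection phase described above, grounding the entities of a generated text against a source document and flagging sentences that contain unsupported entities, can be illustrated with a minimal sketch. This is not the authors' implementation: the trained NER and ER models are replaced by a hypothetical capitalized-token extractor (`toy_entities`), and the prompt-based mitigation step is omitted.

```python
# Minimal sketch of entity-grounded hallucination detection, assuming a toy
# stand-in for the NER model. A real system would run trained NER/ER models
# over both texts and then send flagged sentences to an LLM for correction.
import re

def toy_entities(text):
    """Stand-in for an NER model: non-sentence-initial capitalized tokens."""
    ents = set()
    for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = sent.split()
        for i, tok in enumerate(tokens):
            word = tok.strip(".,!?;:")
            if i > 0 and word[:1].isupper():
                ents.add(word)
    return ents

def flag_ungrounded_sentences(source, generated):
    """Return sentences of `generated` whose entities are absent from `source`."""
    source_ents = toy_entities(source)
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", generated.strip()):
        if toy_entities(sent) - source_ents:
            flagged.append(sent)
    return flagged
```

A sentence mentioning an entity the source never introduces (e.g. a person name absent from the article) is returned for the mitigation phase; sentences whose entities all appear in the source pass through unflagged.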



    Published In

    AISNS '23: Proceedings of the 2023 International Conference on Artificial Intelligence, Systems and Network Security
    December 2023
    467 pages
    ISBN:9798400716966
    DOI:10.1145/3661638
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    AISNS 2023

