War of Words: Harnessing the Potential of Large Language Models and Retrieval Augmented Generation to Classify, Counter and Diffuse Hate Speech
DOI: https://doi.org/10.32473/flairs.37.1.135484
Keywords: Mistral, counterspeech
Abstract
This paper explores the emergence of divergent narratives in the wake of the Russia-Ukraine war, which began on February 24, 2022, and the innovative application of AI language models, specifically Retrieval-Augmented Generation (RAG) and instruction-based large language models (LLMs), to countering hateful speech on social media. We design a pipeline that automatically discovers and then responds to hateful content trending on social media platforms. Monitoring via traditional topic/narrative modeling often focuses on low-level content, which is difficult to interpret. In addition, workflows for prioritization and response generation are often highly manual. We use several LLMs throughout our pipeline to detect and summarize topics, to determine whether tweets contain hate speech, and to generate counter-narratives. We test our approach on the Ukraine Bio-Lab Tweet Corpus of 500k tweets and evaluate counter-narrative generation performance across several dimensions: relevance, grammaticality, factuality, and diversity. Our approach outperforms existing state-of-the-art algorithms for hate speech detection, and promising counter-narrative generation scores across our metrics reflect the effectiveness of our pipeline in addressing hateful social media posts.
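As a rough illustration of the control flow the abstract describes (classify a tweet, retrieve supporting passages, generate a counter-narrative), the following minimal Python sketch may help. It is not the authors' implementation: the names `Response`, `retrieve`, `run_pipeline`, `toy_classify`, and `toy_generate` are hypothetical, the word-overlap retriever stands in for a real retrieval component, and the keyword classifier and template generator are placeholders for LLM calls. The topic detection and summarization stage is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    tweet: str
    is_hateful: bool
    evidence: List[str]
    counter_narrative: Optional[str]


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def run_pipeline(
    tweets: List[str],
    corpus: List[str],
    classify: Callable[[str], bool],
    generate: Callable[[str, List[str]], str],
    k: int = 2,
) -> List[Response]:
    """Classify each tweet; for flagged tweets, retrieve evidence and generate a reply."""
    results = []
    for tweet in tweets:
        if not classify(tweet):
            # Tweets that are not flagged get no counter-narrative.
            results.append(Response(tweet, False, [], None))
            continue
        evidence = retrieve(tweet, corpus, k)
        results.append(Response(tweet, True, evidence, generate(tweet, evidence)))
    return results


# Placeholder components (assumptions): a real system would call LLMs here.
def toy_classify(tweet: str) -> bool:
    return "hate" in tweet.lower()


def toy_generate(tweet: str, evidence: List[str]) -> str:
    return "Consider this: " + " ".join(evidence)


if __name__ == "__main__":
    corpus = ["labs are public health facilities", "weather is sunny"]
    tweets = ["I hate group X labs", "Nice weather today"]
    for result in run_pipeline(tweets, corpus, toy_classify, toy_generate, k=1):
        print(result)
```

Passing the classifier and generator in as callables keeps the control flow independent of any particular model, so the placeholders could be swapped for LLM-backed functions without changing `run_pipeline`.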
License
Copyright (c) 2024 Rohan Leekha, Olga Simek, Charlie Dagli
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.