research-article

A Hybrid Deep BiLSTM-CNN for Hate Speech Detection in Multi-social media

Authors:

Kalpdrum Passi,

Aniket Mahanti

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8

Article No.: 127, Pages 1 - 22

https://doi.org/10.1145/3657635

Published: 08 August 2024

Abstract

Advances in information technology and the rise of online social media platforms have changed how people communicate. Many people express their feelings, ideas, and emotions on sites such as Instagram, Twitter, Gab, Reddit, Facebook, and YouTube. However, social media has also been misused to send hateful messages that target specific individuals or groups and create chaos. For governance authorities, manually identifying hate speech across multiple platforms is difficult. This study proposes a hybrid deep-learning model that combines bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN) to classify hate speech in textual data. The model incorporates GloVe-based word embeddings, dropout, L2 regularization, and global max pooling. The proposed BiLSTM-CNN model is evaluated on several datasets, where it achieves state-of-the-art performance and outperforms traditional and existing machine learning methods in terms of accuracy, precision, recall, and F1-score.
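
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes a binary setting in which label 1 means "hate" and 0 means "not hate", and the function name `binary_metrics` and the sample labels are hypothetical. The abstract does not say how the metrics are averaged on multi-class datasets, so that case is not covered here.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive ("hate") class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for eight posts.
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(gold, pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters for hate speech data, where the hateful class is often a small minority and accuracy alone can look high for a classifier that rarely predicts it.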

Index Terms

1. Computing methodologies
  1. Machine learning
2. Social and professional topics
  1. Computing / technology policy
    1. Censorship
      1. Hate speech

Information & Contributors

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 23, Issue 8

August 2024

343 pages

EISSN: 2375-4702

DOI: 10.1145/3613611

Editor: Imed Zitouni (Google, USA)

Guest Editors: Deepak Kumar Jain, Thierry Boumans, Stefano Berretti

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 August 2024

Online AM: 06 May 2024

Accepted: 03 April 2024

Revised: 26 March 2024

Received: 05 July 2023

Published in TALLIP Volume 23, Issue 8

Qualifiers

Research-article
