research-article

A Hybrid Deep BiLSTM-CNN for Hate Speech Detection in Multi-social media

Published: 08 August 2024

Abstract

Advances in information technology and the rise of multiple online social media platforms have changed how people communicate. Many people express their feelings, ideas, and emotions on sites such as Instagram, Twitter, Gab, Reddit, Facebook, and YouTube. However, some users misuse these platforms to send hateful messages to specific individuals or groups in order to create chaos. For governance authorities, manually identifying hate speech across these platforms to prevent such chaos is a difficult task. This study proposes a hybrid deep-learning model that combines bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN) to classify hate speech in textual data. The model incorporates GloVe-based word embeddings, dropout, L2 regularization, and global max pooling. The proposed BiLSTM-CNN model is evaluated on several datasets and achieves state-of-the-art performance, outperforming traditional and existing machine learning methods in terms of accuracy, precision, recall, and F1-score.
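Two ingredients named in the abstract are simple enough to illustrate directly: global max pooling and the four evaluation metrics. The sketch below is not the authors' implementation. It is a minimal pure-Python illustration, and the function names (`global_max_pool`, `binary_metrics`), the toy feature values, and the toy labels are all assumptions made up for this example. It also assumes a binary hate / not-hate labelling with 1 as the positive class, which the abstract does not specify.

```python
from typing import Dict, List, Sequence


def global_max_pool(feature_maps: Sequence[Sequence[float]]) -> List[float]:
    """Reduce a (time_steps x channels) matrix to one value per channel.

    Each channel keeps only its largest activation over all time steps,
    so the output length no longer depends on the input text length.
    """
    if not feature_maps:
        raise ValueError("need at least one time step")
    # zip(*rows) transposes the matrix: one tuple per channel.
    return [max(channel) for channel in zip(*feature_maps)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical activations: 3 time steps, 2 channels.
pooled = global_max_pool([[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]])

# Hypothetical gold labels and predictions for 8 posts.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
```

With the toy inputs above, `pooled` is `[0.7, 0.9]`, and the confusion counts are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four scores come out to 0.75.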



      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8
      August 2024
      343 pages
      EISSN:2375-4702
      DOI:10.1145/3613611

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 August 2024
      Online AM: 06 May 2024
      Accepted: 03 April 2024
      Revised: 26 March 2024
      Received: 05 July 2023
      Published in TALLIP Volume 23, Issue 8


      Author Tags

      1. Hate speech
      2. CNN
      3. Bi-LSTM
      4. machine learning

