Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale

Gobiga Rajalingam, Janarthan Jeyachandran, M. S. M. Siriwardane, Tharshvini Pathmaseelan, R. K. N. D. Jayawardhane & N. S. Weerakoon

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 72)

Included in the following conference series:

International Conference of Reliable Information and Communication Technology


Abstract

Fake news, hate speech, crude language, and ethnic and racial slurs spread widely every day, yet Sri Lanka has no established solution for shielding society from such content. The proposed method detects racist, sexist, and cursing content in Sinhala, Tamil, and English. To selectively filter potentially objectionable audio, the input audio is first preprocessed and converted to text, and objectionable content is then detected with a machine-learning filtering mechanism. A preliminary filtering model takes the converted sentences as input and performs binary classification to decide whether each sentence is offensive. When a sentence is classified as offensive, a secondary filtering step applies a separate multi-class text classification model that assigns each word in the sentence to one of four categories: sexist, racist, cursing, or non-offensive. The preliminary filtering models pair a Term Frequency–Inverse Document Frequency (TF-IDF) vectorizer with a Support Vector Machine (SVM) classifier, using varying hyperparameters. For the multi-class model, Logistic Regression (LR) with a CountVectorizer was used for Sinhala, Multinomial Naive Bayes with a TF-IDF vectorizer was found suitable for Tamil, and LR with a CountVectorizer was chosen for English. The system achieves detection accuracies of 89% for Sinhala and 77% for Tamil. Finally, the detected objectionable content is replaced in the audio with a predetermined audio input.
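The two-stage text filter described above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy training sentences and words are hypothetical placeholders, scikit-learn's `LinearSVC` stands in for the unspecified SVM configuration, and each word is classified on its own, without sentence context. The word-level stage runs only when the sentence-level stage flags the sentence as offensive.

```python
# Minimal two-stage filter sketch (assumes scikit-learn is installed).
# Stage 1: sentence-level binary classifier, TF-IDF + linear SVM.
# Stage 2: word-level multi-class classifier, CountVectorizer + logistic regression.
# All training data below is a hypothetical placeholder, not the authors' dataset.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Assumption: whitespace-delimited tokens, so words written with combining
# vowel signs are not split by the default word-character token pattern.
TOKENS = r"\S+"

SENTENCES = [
    "you are a badword1 person",
    "those slurword people should leave",
    "what a badword2 day",
    "have a nice day",
    "the weather is pleasant today",
    "thank you for your help",
]
SENTENCE_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = offensive, 0 = non-offensive

WORDS = ["badword1", "badword2", "slurword", "sexistword",
         "nice", "day", "weather", "help"]
WORD_LABELS = ["cursing", "cursing", "racist", "sexist",
               "non-offensive", "non-offensive", "non-offensive", "non-offensive"]


def build_filters():
    """Fit the preliminary (binary) and secondary (multi-class) models."""
    preliminary = make_pipeline(TfidfVectorizer(token_pattern=TOKENS), LinearSVC(C=1.0))
    preliminary.fit(SENTENCES, SENTENCE_LABELS)
    secondary = make_pipeline(CountVectorizer(token_pattern=TOKENS),
                              LogisticRegression(max_iter=1000))
    secondary.fit(WORDS, WORD_LABELS)
    return preliminary, secondary


def flag_words(sentence, preliminary, secondary):
    """Return (word, category) pairs for the words judged offensive."""
    if preliminary.predict([sentence])[0] == 0:
        return []  # stage 1 judged the sentence clean, so stage 2 is skipped
    words = sentence.split()
    categories = secondary.predict(words)
    # A word never seen in training maps to an all-zero count vector, so its
    # label falls back to whatever the fitted intercepts favour.
    return [(w, c) for w, c in zip(words, categories) if c != "non-offensive"]


if __name__ == "__main__":
    pre, sec = build_filters()
    print(flag_words("you are a badword1 person", pre, sec))
```

For Tamil, the abstract reports Multinomial Naive Bayes with a TF-IDF vectorizer for the word-level stage; in this sketch that would mean replacing the second pipeline with `make_pipeline(TfidfVectorizer(token_pattern=TOKENS), MultinomialNB())`, importing `MultinomialNB` from `sklearn.naive_bayes`. The whitespace `token_pattern` is a design choice: scikit-learn's default pattern matches only word characters, and Sinhala and Tamil vowel signs are combining marks that it may not count as such, which can break one word into fragments.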
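The final replacement step can be sketched in plain Python. The sketch assumes, hypothetically, that the speech-to-text step reports a start and end time in seconds for every flagged word, that the audio is a mono list of float samples, and that the predetermined replacement audio is a 1 kHz beep; the chapter abstract does not specify any of these details, and `make_beep` and `replace_spans` are illustrative names.

```python
import math


def make_beep(duration_s, sample_rate, freq_hz=1000.0, amplitude=0.3):
    """Return a sine tone used here as the (assumed) predetermined replacement clip."""
    n = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def replace_spans(samples, spans, sample_rate, replacement):
    """Return a copy of `samples` with every (start_s, end_s) span overwritten.

    The replacement clip is repeated or truncated to fill each span exactly,
    so the output keeps the original length and the rest of the audio keeps
    its timing.
    """
    if not replacement:
        raise ValueError("replacement clip must not be empty")
    out = list(samples)
    for start_s, end_s in spans:
        # Convert seconds to sample indices and clamp to the signal bounds.
        start = max(0, int(round(start_s * sample_rate)))
        end = min(len(out), int(round(end_s * sample_rate)))
        for i in range(start, end):
            out[i] = replacement[(i - start) % len(replacement)]
    return out


if __name__ == "__main__":
    rate = 8000
    audio = [0.0] * rate                  # one second of silence
    beep = make_beep(0.01, rate)          # 80-sample tone
    cleaned = replace_spans(audio, [(0.25, 0.5)], rate, beep)
    print(len(cleaned))                   # 8000
```

Keeping the output the same length as the input preserves the timing of the surrounding speech. Muting a span instead of beeping over it amounts to passing a replacement clip of zeros.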


Author information

Authors and Affiliations

Rajarata University of Sri Lanka, Mihinthale, Sri Lanka
Gobiga Rajalingam, Janarthan Jeyachandran, M. S. M. Siriwardane, Tharshvini Pathmaseelan, R. K. N. D. Jayawardhane & N. S. Weerakoon


Editor information

Editors and Affiliations

College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Faisal Saeed
School of Computing, Information Systems Department, Universiti Utara Malaysia, Sintok, Malaysia
Fathey Mohammed
Sanaa’a Community College, Sana'a, Yemen
Abdulaziz Al-Nahari


About this paper

Cite this paper

Rajalingam, G., Jeyachandran, J., Siriwardane, M.S.M., Pathmaseelan, T., Jayawardhane, R.K.N.D., Weerakoon, N.S. (2021). Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_98


DOI: https://doi.org/10.1007/978-3-030-70713-2_98
Published: 06 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70712-5
Online ISBN: 978-3-030-70713-2
eBook Packages: Intelligent Technologies and Robotics (R0)
