Abstract
Fake news, hate speech, crude language, ethnic and racial slurs, and similar content have been spreading widely every day, yet Sri Lanka has no definite solution for shielding society from such profanities. The method we propose detects racist, sexist, and profane content in Sinhala, Tamil, and English. To selectively filter out potentially objectionable audio content, the input audio is first preprocessed and converted into text, and the objectionable content is then detected with a machine learning filtering mechanism. A preliminary filtering model takes the converted sentences as input and performs binary classification to determine whether each sentence is offensive. When a sentence is classified as offensive, secondary filtering is carried out with a separate multi-class text classification model, which classifies each word in the sentence as sexist, racist, cursing, or non-offensive. The preliminary filtering models use a Term Frequency–Inverse Document Frequency (TF-IDF) vectorizer and a Support Vector Machine (SVM) with varying hyperparameters. For multi-class classification, Logistic Regression (LR) with CountVectorizer was used for Sinhala, while Multinomial Naive Bayes with a TF-IDF vectorizer was found suitable for Tamil. For English, LR with CountVectorizer was chosen. The system achieves detection accuracies of 89% for Sinhala and 77% for Tamil. Finally, the detected objectionable content is replaced in the audio with a predetermined audio input.
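The control flow described in the abstract (a binary sentence-level check, then per-word categorisation, then audio replacement) can be illustrated with a minimal sketch. This is not the authors' implementation. The two classifiers are passed in as plain callables standing in for the trained TF-IDF/SVM and vectorizer/LR models, and the toy lexicon, the word-level timestamps, the sample rate, and the use of a sine tone as the "predetermined audio" are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (token, start_sec, end_sec). Word-level timestamps are an assumption here;
# the abstract does not say how detected words are located in the audio.
Word = Tuple[str, float, float]


def flag_words(
    words: Sequence[Word],
    is_offensive: Callable[[str], bool],
    classify_word: Callable[[str], str],
) -> List[Tuple[str, float, float, str]]:
    """Two-stage filter: binary sentence check, then per-word categories."""
    sentence = " ".join(token for token, _, _ in words)
    if not is_offensive(sentence):  # preliminary (binary) filtering
        return []
    flagged = []
    for token, start, end in words:  # secondary (multi-class) filtering
        label = classify_word(token)
        if label != "non-offensive":
            flagged.append((token, start, end, label))
    return flagged


def replace_spans(
    samples: Sequence[float],
    rate: int,
    spans: Sequence[Tuple[float, float]],
    tone_hz: float = 1000.0,
    amplitude: float = 0.3,
) -> List[float]:
    """Return a copy of samples with each (start, end) span overwritten by a sine tone."""
    out = list(samples)
    for start, end in spans:
        first = max(0, round(start * rate))
        last = min(len(out), round(end * rate))
        for i in range(first, last):
            out[i] = amplitude * math.sin(2 * math.pi * tone_hz * i / rate)
    return out


# Hypothetical stand-ins for the trained models: a one-entry lexicon lookup.
TOY_LEXICON = {"badword": "cursing"}


def toy_is_offensive(sentence: str) -> bool:
    return any(token in TOY_LEXICON for token in sentence.lower().split())


def toy_classify_word(token: str) -> str:
    return TOY_LEXICON.get(token.lower(), "non-offensive")


rate = 8000                 # assumed sample rate in Hz
audio = [0.1] * rate        # one second of constant-valued dummy audio
words = [("hello", 0.0, 0.4), ("badword", 0.4, 0.7), ("friend", 0.7, 1.0)]

flagged = flag_words(words, toy_is_offensive, toy_classify_word)
cleaned = replace_spans(audio, rate, [(start, end) for _, start, end, _ in flagged])
```

Running the secondary classifier only on sentences that the binary stage flags mirrors the ordering in the abstract. In a real system the two callables would wrap the fitted language-specific models, and the replacement audio could be any predetermined clip rather than a generated tone.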
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rajalingam, G., Jeyachandran, J., Siriwardane, M.S.M., Pathmaseelan, T., Jayawardhane, R.K.N.D., Weerakoon, N.S. (2021). Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_98
DOI: https://doi.org/10.1007/978-3-030-70713-2_98
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70712-5
Online ISBN: 978-3-030-70713-2
eBook Packages: Intelligent Technologies and Robotics (R0)