Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale

  • Conference paper
Innovative Systems for Intelligent Health Informatics (IRICT 2020)

Abstract

Fake news, hate speech, crude language, ethnic and racial slurs, and more spread widely every day, yet Sri Lanka has no definitive solution for protecting society from such content. The method we propose detects racist, sexist, and cursing content in the Sinhala, Tamil, and English languages. To selectively filter potentially objectionable audio, the input audio is first preprocessed and converted to text, and objectionable content is then detected with a machine learning filtering mechanism. A preliminary filtering model takes the converted sentences as input and uses binary classification to decide whether each sentence is offensive. When a sentence is classified as offensive, secondary filtering is carried out with a separate multi-class text classification model, which assigns each word in the sentence to one of four categories: sexist, racist, cursing, or non-offensive. The preliminary filtering models use a Term Frequency–Inverse Document Frequency (TF-IDF) vectorizer and the Support Vector Machine algorithm with varying hyperparameters. For the multi-class classification model, the combination of Logistic Regression (LR) and CountVectorizer was used for Sinhala, while Multinomial Naive Bayes with a TF-IDF vectorizer was found suitable for Tamil. For English, LR with CountVectorizer was chosen. The system achieves detection accuracies of 89% for Sinhala and 77% for Tamil. Finally, the detected objectionable content is replaced in the audio with a predetermined audio input.
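
The control flow described in the abstract (a sentence-level binary gate, then word-level labelling, then audio replacement) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Word`, `filter_audio`, `toy_gate`, and `toy_word_label` are hypothetical, the two classifier functions are dictionary-lookup stand-ins for the trained models, and the per-word sample offsets are assumed to be available from the speech-to-text step, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Word:
    text: str
    start: int  # index of the word's first audio sample (assumed known)
    end: int    # index one past the word's last audio sample


def filter_audio(
    samples: Sequence[int],
    words: Sequence[Word],
    is_offensive_sentence: Callable[[str], bool],
    classify_word: Callable[[str], str],
    replacement: Sequence[int],
) -> List[int]:
    """Return a copy of `samples` with flagged words overwritten."""
    if not replacement:
        raise ValueError("replacement clip must not be empty")
    out = list(samples)
    sentence = " ".join(w.text for w in words)
    # Stage 1: binary gate on the whole sentence.
    if not is_offensive_sentence(sentence):
        return out
    # Stage 2: label each word; anything other than "non-offensive" is masked.
    for w in words:
        if classify_word(w.text) != "non-offensive":
            span = w.end - w.start
            # Repeat the replacement clip so it covers the word's span exactly.
            out[w.start:w.end] = [
                replacement[i % len(replacement)] for i in range(span)
            ]
    return out


# Toy stand-ins for the trained models (assumption: a tiny lookup table).
FLAGGED = {"darn": "cursing"}


def toy_gate(sentence: str) -> bool:
    return any(token in FLAGGED for token in sentence.lower().split())


def toy_word_label(word: str) -> str:
    return FLAGGED.get(word.lower(), "non-offensive")


samples = [1] * 12
words = [Word("well", 0, 4), Word("darn", 4, 9), Word("it", 9, 12)]
beep = [7, 8]

filtered = filter_audio(samples, words, toy_gate, toy_word_label, beep)
print(filtered)  # [1, 1, 1, 1, 7, 8, 7, 8, 7, 1, 1, 1]
```

Passing the two classifiers in as callables keeps the sketch independent of any particular model; in a real system the stand-ins would be replaced by the fitted vectorizer and classifier pairs named in the abstract.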

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rajalingam, G., Jeyachandran, J., Siriwardane, M.S.M., Pathmaseelan, T., Jayawardhane, R.K.N.D., Weerakoon, N.S. (2021). Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_98
