Abstract
With the wide availability of images and videos on the Internet, the detection and classification of inappropriate content has become a matter of serious concern. Such content has a harmful impact on minors as well as adults, so it must be detected and controlled in images and videos. Recent research has focused on deep-learning-based automated pornography detection, which aims to replace humans in the time-consuming task of moderating online content. This paper is based on the idea that incorporating detailed information into a model helps it identify pornographic content. A novel transformer-based deep-learning framework, the Obscenity Detection Transformer (ODT), is proposed to detect and classify inappropriate or pornographic content in videos. The proposed model takes video frames as input and combines a vision transformer with an LSTM layer; the LSTM embedding enables the network to extract more informative features. A GELU-activated MLP then classifies content as pornographic or non-pornographic. The advantage of a transformer-based architecture is improved efficiency and accuracy compared with CNN-based models. To validate the efficiency and efficacy of the proposed model, extensive experiments are carried out on the Pornography-2k and Pornography-800 datasets. The proposed model outperforms current state-of-the-art CNN-based models in terms of computational efficiency and accuracy, achieving accuracies of 99.6% and 98.8% on the two datasets, respectively.
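To make the classification stage concrete, the following is a minimal sketch of a GELU-activated MLP head of the kind the abstract mentions. It is not the authors' implementation: the layer sizes, the random weights, and the names `mlp_head`, `linear`, and `random_matrix` are assumptions made purely for illustration, and the input vector merely stands in for whatever features the vision transformer and LSTM stages would produce.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_head(features, w1, b1, w2, b2):
    """Hypothetical two-layer head: linear -> GELU -> linear -> softmax."""
    hidden = [gelu(h) for h in linear(features, w1, b1)]
    return softmax(linear(hidden, w2, b2))


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Assumed sizes; the abstract does not state the real dimensions.
FEATURE_DIM, HIDDEN_DIM, NUM_CLASSES = 8, 16, 2

rng = random.Random(0)
w1 = random_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
b1 = [0.0] * HIDDEN_DIM
w2 = random_matrix(NUM_CLASSES, HIDDEN_DIM, rng)
b2 = [0.0] * NUM_CLASSES

# Stand-in for the feature vector produced by the transformer and LSTM stages.
features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
probs = mlp_head(features, w1, b1, w2, b2)
print(probs)  # two class probabilities that sum to 1
```

With trained weights, the two outputs would be read as the probabilities of the non-pornographic and pornographic classes; here they only demonstrate the data flow through the head.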
Data availability
Data sharing not applicable to this article as no datasets were generated during the current study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Rautela, K., Sharma, D., Kumar, V. et al. Obscenity detection transformer for detecting inappropriate contents from videos. Multimed Tools Appl 83, 10799–10814 (2024). https://doi.org/10.1007/s11042-023-16078-2
DOI: https://doi.org/10.1007/s11042-023-16078-2