research-article

KidsGUARD: fine grained approach for child unsafe video representation and detection

Authors:

Rishabh Kaushal,

Arun Balaji Buduru,

Ponnurangam Kumaraguru

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

Pages 2104 - 2111

https://doi.org/10.1145/3297280.3297487

Published: 08 April 2019

Abstract

A growing number of videos are being uploaded to video sharing platforms, and a significant number of viewers on these platforms are children. At times, these videos contain violent or sexually explicit scenes (referred to as child unsafe) to catch children's attention. To evade moderation, malicious video uploaders typically limit the child unsafe content to only a few frames in the video. Hence, a fine-grained approach, referred to as KidsGUARD, is required to detect sparsely present child unsafe content. Prior approaches to content moderation either flag the entire video as inappropriate or use hand-crafted features derived from video frames. In this work, we leverage a Long Short Term Memory (LSTM) based autoencoder to learn effective video representations from video descriptors obtained using the VGG16 Convolutional Neural Network (CNN). The encoded video representations are fed into an LSTM classifier to detect sparse child unsafe video content. To evaluate this approach, we create a dataset of 109,835 video clips curated specifically for child unsafe content. We find that the deep learning approach (1) detects fine-grained child unsafe video content at a granularity of 1 second, (2) identifies even sparsely located child unsafe video content, achieving a high recall of 81% at a high precision of 80%, and (3) outperforms baseline video encoding approaches such as Fisher Vector (FV) and Vector of Locally Aggregated Descriptors (VLAD).
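To make the reported evaluation concrete, the following is a minimal sketch of how precision and recall can be computed when every second of a video carries its own label, which is the granularity the abstract describes. It is not the authors' evaluation code. The function names `precision_recall` and `unsafe_segments`, the 0/1 label encoding, and the 20-second example video are all assumptions made for illustration.

```python
from typing import List, Tuple


def precision_recall(truth: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Precision and recall over per-second labels (1 = child unsafe, 0 = safe)."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unsafe seconds caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe seconds flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unsafe seconds missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def unsafe_segments(predicted: List[int]) -> List[Tuple[int, int]]:
    """Group flagged seconds into half-open (start, end) intervals."""
    segments = []
    start = None
    for second, flag in enumerate(predicted):
        if flag == 1 and start is None:
            start = second                      # a flagged run begins
        elif flag == 0 and start is not None:
            segments.append((start, second))    # the run ended before this second
            start = None
    if start is not None:
        segments.append((start, len(predicted)))
    return segments


# Hypothetical 20-second video: unsafe content only at seconds 7-9 and 15.
truth = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
# Hypothetical classifier output: misses second 9, wrongly flags second 16.
predicted = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("flagged intervals:", unsafe_segments(predicted))
```

In this toy video only 4 of 20 seconds are unsafe, so a label for the whole video would hide where the content sits. Per-second labels let a moderator jump straight to the flagged intervals, and recall measures how many of the sparse unsafe seconds were actually found.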

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

April 2019

2682 pages

ISBN:9781450359337

DOI:10.1145/3297280

Conference Chairs:
Chih-Cheng Hung (Kennesaw State University, Marietta, Georgia),
George A. Papadopoulos (University of Cyprus, Nicosia, Cyprus)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SAC '19

Sponsor:

SIGAPP

SAC '19: The 34th ACM/SIGAPP Symposium on Applied Computing

April 8 - 12, 2019

Limassol, Cyprus

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

Total Citations: 22
Total Downloads: 320
Downloads (Last 12 months): 49
Downloads (Last 6 weeks): 16

Reflects downloads up to 16 Feb 2025

Cited By

Shafizadegan F, Naghsh-Nilchi A, Shabaninia E (2025). Hybrid embedding for multimodal few-frame action recognition. Multimedia Systems 31:2. Online publication date: 17-Feb-2025
https://doi.org/10.1007/s00530-025-01676-x
Zhu D, Shan X, Wu C, Yung K, Ip A (2024). Multi Frame Obscene Video Detection With ViT. International Journal on Semantic Web and Information Systems 20:1 (1-18). Online publication date: 16-Nov-2024
https://doi.org/10.4018/IJSWIS.359768
Bendale B, Swati Dattatraya Shirke S (2024). An empirical study of various detection based techniques with divergent learning’s. Web Intelligence 22:3 (315-331). Online publication date: 13-Sep-2024
https://doi.org/10.3233/WEB-230103
Alam I, Basit A, Ziar R (2024). Utilizing Age‐Adaptive Deep Learning Approaches for Detecting Inappropriate Video Content. Human Behavior and Emerging Technologies 2024:1. Online publication date: 19-Jun-2024
https://doi.org/10.1155/2024/7004031
Khan E, Tanveer N, Shahid A, Iqbal M, Mirza H, Javed A, Qazi I, Qazi Z, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Analyzing Ad Exposure and Content in Child-Oriented Videos on YouTube. Proceedings of the ACM Web Conference 2024 (1215-1226). Online publication date: 13-May-2024
https://dl.acm.org/doi/10.1145/3589334.3645585
Balat M, Gabr M, Bakr H, Zaky A (2024). TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids. 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES) (337-340). Online publication date: 19-Oct-2024
https://doi.org/10.1109/NILES63360.2024.10753192
Khan N, Amin S, Jan Z, Yan C (2024). Detection of Violent Scenes in Cartoon Movies Using a Deep Learning Approach. IEEE Access 12 (154080-154091). Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3480205
Povedano Álvarez D, Sandoval Orozco A, García-Miguel J, García Villalba L (2023). Learning Strategies for Sensitive Content Detection. Electronics 12:11 (2496). Online publication date: 1-Jun-2023
https://doi.org/10.3390/electronics12112496
Alqahtani S, Yafooz W, Alsaeedi A, Syed L, Alluhaibi R (2023). Children’s Safety on YouTube: A Systematic Review. Applied Sciences 13:6 (4044). Online publication date: 22-Mar-2023
https://doi.org/10.3390/app13064044
Ali S, Razi A, Kim S, Alsoubai A, Ling C, De Choudhury M, Wisniewski P, Stringhini G (2023). Getting Meta: A Multimodal Approach for Detecting Unsafe Conversations within Instagram Direct Messages of Youth. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1-30). Online publication date: 16-Apr-2023
https://dl.acm.org/doi/10.1145/3579608
