DOI: 10.1145/3297280.3297487
research-article

KidsGUARD: fine grained approach for child unsafe video representation and detection

Published: 08 April 2019

Abstract

A growing number of videos are uploaded to video sharing platforms, and a significant share of the viewers on these platforms are children. At times, these videos contain violent or sexually explicit scenes (referred to as child unsafe) intended to catch children's attention. To evade moderation, malicious uploaders typically limit the child unsafe content to only a few frames of the video. Hence, a fine-grained approach, referred to as KidsGUARD, is required to detect sparsely present child unsafe content. Prior approaches to content moderation either flag the entire video as inappropriate or use hand-crafted features derived from video frames. In this work, we leverage a Long Short-Term Memory (LSTM) based autoencoder to learn effective video representations from video descriptors obtained using the VGG16 Convolutional Neural Network (CNN). The encoded video representations are fed into an LSTM classifier to detect sparse child unsafe video content. To evaluate this approach, we create a dataset of 109,835 video clips curated specifically for child unsafe content. We find that the deep learning approach (1) detects fine-grained child unsafe video content at a granularity of 1 second, (2) identifies even sparsely located child unsafe video content, achieving a high recall of 81% at a high precision of 80%, and (3) outperforms baseline video encoding approaches such as Fisher Vector (FV) and Vector of Locally Aggregated Descriptors (VLAD).
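The abstract reports precision and recall at a granularity of 1 second. As a minimal sketch of what such an evaluation could look like, the Python below scores per-second labels for the unsafe class and groups flagged seconds into segments. The function names `precision_recall` and `unsafe_segments` and the 10-second example are hypothetical illustrations, not the authors' code or data, and the classifier that would produce the predictions is not shown.

```python
from typing import List, Tuple


def precision_recall(truth: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Per-second precision and recall for the unsafe class (label 1)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must cover the same seconds")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unsafe, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unsafe, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def unsafe_segments(predicted: List[int]) -> List[Tuple[int, int]]:
    """Group consecutive flagged seconds into (start, end) ranges, end exclusive."""
    segments = []
    start = None
    for second, label in enumerate(predicted):
        if label == 1 and start is None:
            start = second
        elif label != 1 and start is not None:
            segments.append((start, second))
            start = None
    if start is not None:
        segments.append((start, len(predicted)))
    return segments


# Made-up example (assumption): one label per second of a 10-second video,
# with unsafe content limited to a few seconds.
truth = [0, 0, 0, 1, 1, 0, 0, 0, 1, 0]
predicted = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]

precision, recall = precision_recall(truth, predicted)
segments = unsafe_segments(predicted)
print(f"precision={precision:.2f} recall={recall:.2f} segments={segments}")
```

Scoring each second separately, rather than each video, is what lets a metric reflect content that occupies only a small part of a clip.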

Information & Contributors

Information

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
April 2019
2682 pages
ISBN: 9781450359337
DOI: 10.1145/3297280
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. child safety
  2. social media analysis
  3. video analysis

Qualifiers

  • Research-article

Conference

SAC '19
Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Upcoming Conference

SAC '25
The 40th ACM/SIGAPP Symposium on Applied Computing
March 31 - April 4, 2025
Catania, Italy

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 49
  • Downloads (Last 6 weeks): 16
Reflects downloads up to 16 Feb 2025

Citations

Cited By

  • (2025) Hybrid embedding for multimodal few-frame action recognition. Multimedia Systems 31:2. DOI: 10.1007/s00530-025-01676-x. Online publication date: 17-Feb-2025
  • (2024) Multi Frame Obscene Video Detection With ViT. International Journal on Semantic Web and Information Systems 20:1 (1-18). DOI: 10.4018/IJSWIS.359768. Online publication date: 16-Nov-2024
  • (2024) An empirical study of various detection based techniques with divergent learning’s. Web Intelligence 22:3 (315-331). DOI: 10.3233/WEB-230103. Online publication date: 13-Sep-2024
  • (2024) Utilizing Age‐Adaptive Deep Learning Approaches for Detecting Inappropriate Video Content. Human Behavior and Emerging Technologies 2024:1. DOI: 10.1155/2024/7004031. Online publication date: 19-Jun-2024
  • (2024) Analyzing Ad Exposure and Content in Child-Oriented Videos on YouTube. Proceedings of the ACM Web Conference 2024 (1215-1226). DOI: 10.1145/3589334.3645585. Online publication date: 13-May-2024
  • (2024) TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids. 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES) (337-340). DOI: 10.1109/NILES63360.2024.10753192. Online publication date: 19-Oct-2024
  • (2024) Detection of Violent Scenes in Cartoon Movies Using a Deep Learning Approach. IEEE Access 12 (154080-154091). DOI: 10.1109/ACCESS.2024.3480205. Online publication date: 2024
  • (2023) Learning Strategies for Sensitive Content Detection. Electronics 12:11 (2496). DOI: 10.3390/electronics12112496. Online publication date: 1-Jun-2023
  • (2023) Children’s Safety on YouTube: A Systematic Review. Applied Sciences 13:6 (4044). DOI: 10.3390/app13064044. Online publication date: 22-Mar-2023
  • (2023) Getting Meta: A Multimodal Approach for Detecting Unsafe Conversations within Instagram Direct Messages of Youth. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1-30). DOI: 10.1145/3579608. Online publication date: 16-Apr-2023
