
Online Estimation of Evolving Human Visual Interest

Published: 04 September 2014

Abstract

Regions in video streams attracting human interest contribute significantly to human understanding of the video. Predicting salient and informative Regions of Interest (ROIs) from a sequence of eye movements is a challenging problem. Applications such as content-aware retargeting of videos to different aspect ratios while preserving informative regions, and smart insertion of dialog (closed-caption text) into the video stream, can be significantly improved using the predicted ROIs. We propose an interactive human-in-the-loop framework to model eye movements and predict visual saliency in yet-unseen frames. Eye tracking and video content are used to model visual attention in a manner that accounts for important eye-gaze characteristics such as temporal discontinuities due to sudden eye movements, noise, and behavioral artifacts. A novel statistics- and algorithm-based method, gaze buffering, is proposed for eye-gaze analysis and its fusion with content-based features. Our robust saliency prediction is instantiated for two challenging applications. The first alters video aspect ratios on the fly using content-aware video retargeting, making videos suitable for a variety of display sizes. The second dynamically localizes active speakers and places dialog captions on the fly in the video stream. Our method ensures that captions are faithful to active speaker locations and do not interfere with salient content in the video stream. Our framework naturally accommodates personalization of the application to suit the biases and preferences of individual users.
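To make the gaze-buffering step above concrete, the following minimal Python sketch shows one plausible reading: a sliding buffer of recent gaze samples that is flushed on saccade-like jumps (temporal discontinuities) and averaged into an ROI estimate. The class name, buffer length, and velocity threshold are illustrative assumptions, not the paper's actual formulation.

    import math
    from collections import deque

    # Minimal gaze-buffering sketch. Names, window size, and the
    # velocity-based saccade test are assumptions for illustration,
    # not the authors' exact method.
    class GazeBuffer:
        def __init__(self, window=30, saccade_speed=500.0):
            # window: number of recent gaze samples retained (assumed)
            # saccade_speed: speed in px/s above which a jump is treated
            # as a saccadic discontinuity and the buffer is flushed
            self.samples = deque(maxlen=window)
            self.saccade_speed = saccade_speed

        def add(self, x, y, t):
            # Append one gaze sample: pixel coordinates, timestamp (seconds).
            if self.samples:
                px, py, pt = self.samples[-1]
                dt = max(t - pt, 1e-6)
                speed = math.hypot(x - px, y - py) / dt
                if speed > self.saccade_speed:
                    # Sudden eye movement: drop stale fixation samples so
                    # they do not bias the next ROI estimate.
                    self.samples.clear()
            self.samples.append((x, y, t))

        def roi_estimate(self):
            # Centroid of buffered samples as the current ROI; None if empty.
            if not self.samples:
                return None
            xs, ys, _ = zip(*self.samples)
            return sum(xs) / len(xs), sum(ys) / len(ys)

    # Usage (hypothetical values):
    #   buf = GazeBuffer()
    #   buf.add(320.0, 240.0, t=0.00)
    #   buf.add(322.0, 238.0, t=0.02)
    #   roi = buf.roi_estimate()   # approximate fixation centroid

In a full system, this per-viewer ROI estimate would then be fused with content-based saliency features (as the abstract describes) before driving retargeting or caption placement.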

Supplementary Material

a8-katti-apndx.pdf (katti.zip)
Supplemental movie, appendix, image, and software files for "Online Estimation of Evolving Human Visual Interest".



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 1
August 2014
151 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2665935
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 September 2014
Accepted: 01 April 2014
Revised: 01 December 2013
Received: 01 August 2013
Published in TOMM Volume 11, Issue 1


Author Tags

  1. video retargeting
  2. gaze
  3. video captioning
  4. visual attention

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Attention-based automatic editing of virtual lectures for reduced production labor and effective learning experience. International Journal of Human-Computer Studies 181:C. DOI: 10.1016/j.ijhcs.2023.103161. Online publication date: 1-Jan-2024.
  • (2022) Recognition of Advertisement Emotions With Application to Computational Advertising. IEEE Transactions on Affective Computing 13:2, 781-792. DOI: 10.1109/TAFFC.2020.2964549. Online publication date: 1-Apr-2022.
  • (2022) Active Speaker Recognition using Cross Attention Audio-Video Fusion. 2022 10th European Workshop on Visual Information Processing (EUVIP), 1-6. DOI: 10.1109/EUVIP53989.2022.9922810. Online publication date: 11-Sep-2022.
  • (2021) Automatic Subtitle Placement Through Active Speaker Identification in Multimedia Documents. 2021 International Conference on e-Health and Bioengineering (EHB), 1-4. DOI: 10.1109/EHB52898.2021.9657604. Online publication date: 18-Nov-2021.
  • (2020) A View on the Viewer: Gaze-Adaptive Captions for Videos. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1-12. DOI: 10.1145/3313831.3376266. Online publication date: 21-Apr-2020.
  • (2019) DEEP-HEAR: A Multimodal Subtitle Positioning System Dedicated to Deaf and Hearing-Impaired People. IEEE Access. DOI: 10.1109/ACCESS.2019.2925806. Online publication date: 2019.
  • (2018) Fast Volume Seam Carving With Multipass Dynamic Programming. IEEE Transactions on Circuits and Systems for Video Technology 28:5, 1087-1101. DOI: 10.1109/TCSVT.2016.2620563. Online publication date: May-2018.
  • (2017) Evaluating content-centric vs. user-centric ad affect recognition. Proceedings of the 19th ACM International Conference on Multimodal Interaction, 402-410. DOI: 10.1145/3136755.3136796. Online publication date: 3-Nov-2017.
  • (2016) Region-of-Interest-Based Subtitle Placement Using Eye-Tracking Data of Multiple Viewers. Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video, 123-128. DOI: 10.1145/2932206.2933558. Online publication date: 17-Jun-2016.
  • (2016) An Adjustable Gaze Tracking System and Its Application for Automatic Discrimination of Interest Objects. IEEE/ASME Transactions on Mechatronics 21:2, 973-979. DOI: 10.1109/TMECH.2015.2470522. Online publication date: Apr-2016.
