DOI: 10.1145/3581754.3584160

VIVA: Visual Exploration and Analysis of Videos with Interactive Annotation

Published: 27 March 2023

Abstract

This paper presents VIVA, a novel interactive tool for visually exploring long videos and searching for specific moments. Previous work on video data exploration and analytics often assumes that manually created, rich annotations are available. However, such metadata may not be easily obtained. We design an interactive machine learning workflow for users to rapidly create annotations along a timeline. Combined with VIVA’s focus+context visualization, which effectively displays frame snapshots in the context of a video stream, VIVA enables users to explore and analyze long video clips by incrementally making sense of them. We present usage scenarios that demonstrate how users would use VIVA for video-related tasks.
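
The abstract's interactive machine learning workflow is concrete enough to sketch: a user labels a handful of frames, a lightweight classifier propagates those labels along the timeline, and the most uncertain frames are surfaced for the next round of labeling. The following is a minimal sketch of that loop under stated assumptions; the feature extractor, the model choice, and every identifier (frame_features, user_labels, propagate) are hypothetical illustrations, not VIVA's actual implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in per-frame features, e.g. pooled embeddings from a pretrained
# image encoder; one row per frame along the video timeline.
n_frames, dim = 5000, 128
frame_features = rng.normal(size=(n_frames, dim))

# Sparse user annotations: frame index -> label (1 = moment of interest).
user_labels = {10: 1, 11: 1, 400: 0, 2500: 0, 4800: 1}

def propagate(features, labels):
    """Fit a quick classifier on the labeled frames, then score every frame."""
    idx = np.fromiter(labels.keys(), dtype=int)
    y = np.fromiter(labels.values(), dtype=int)
    clf = LogisticRegression(max_iter=1000).fit(features[idx], y)
    return clf.predict_proba(features)[:, 1]  # per-frame relevance score

scores = propagate(frame_features, user_labels)

# Surface the most uncertain frames (scores near 0.5) for the user to
# annotate next -- a simple uncertainty-sampling heuristic.
to_review = np.argsort(np.abs(scores - 0.5))[:5]
print("Frames to review next:", sorted(to_review.tolist()))

In a tool like the one described, per-frame scores of this kind would presumably feed the timeline visualization, letting the user jump to high-scoring regions, correct mistakes, and re-run the loop.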

Supplementary Material

MP4 File (viva-iui23demo-video.mp4)
Demo video


Cited By

  • (2024) Multimodal Dictionaries for Traditional Craft Education. Multimodal Technologies and Interaction 8, 7 (2024), Article 63. DOI: 10.3390/mti8070063. Online publication date: 18 July 2024.


Published In

IUI '23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces
March 2023
266 pages
ISBN: 9798400701078
DOI: 10.1145/3581754
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Poster
  • Research
  • Refereed limited


Conference

IUI '23

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%



