DOI: 10.1145/3581754.3584160

VIVA: Visual Exploration and Analysis of Videos with Interactive Annotation

Published: 27 March 2023

Abstract

This paper presents VIVA, a novel interactive tool for visually exploring long videos and searching for specific moments. Previous work on video data exploration and analytics often assumes that manually created, rich annotations are available. However, such metadata may not be easily obtained. We design an interactive machine learning workflow for users to rapidly create annotations along a timeline. Combined with VIVA’s focus+context visualization, which effectively displays frame snapshots in the context of a video stream, VIVA enables users to explore and analyze long video clips by incrementally making sense of them. We present usage scenarios that demonstrate how users would use VIVA for video-related tasks.
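
The abstract's interactive machine learning workflow is concrete enough to sketch: a user labels a handful of frames, a lightweight classifier propagates those labels along the timeline, and the most uncertain frames are surfaced for the next round of labeling. The following is a minimal sketch of that loop under stated assumptions; the feature extractor, the model choice, and every identifier (frame_features, user_labels, propagate) are hypothetical illustrations, not VIVA's actual implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in per-frame features, e.g. pooled embeddings from a pretrained
# image encoder; one row per frame along the video timeline.
n_frames, dim = 5000, 128
frame_features = rng.normal(size=(n_frames, dim))

# Sparse user annotations: frame index -> label (1 = moment of interest).
user_labels = {10: 1, 11: 1, 400: 0, 2500: 0, 4800: 1}

def propagate(features, labels):
    """Fit a quick classifier on the labeled frames, then score every frame."""
    idx = np.fromiter(labels.keys(), dtype=int)
    y = np.fromiter(labels.values(), dtype=int)
    clf = LogisticRegression(max_iter=1000).fit(features[idx], y)
    return clf.predict_proba(features)[:, 1]  # per-frame relevance score

scores = propagate(frame_features, user_labels)

# Surface the most uncertain frames (scores near 0.5) for the user to
# annotate next -- a simple uncertainty-sampling heuristic.
to_review = np.argsort(np.abs(scores - 0.5))[:5]
print("Frames to review next:", sorted(to_review.tolist()))

In a tool like the one described, per-frame scores of this kind would presumably feed the timeline visualization, letting the user jump to high-scoring regions, correct mistakes, and re-run the loop.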

Supplementary Material

MP4 File (viva-iui23demo-video.mp4)
Demo video


Cited By

  • (2024) Multimodal Dictionaries for Traditional Craft Education. Multimodal Technologies and Interaction 8, 7 (2024), Article 63. DOI: 10.3390/mti8070063. Online publication date: 18 July 2024.


Published In

IUI '23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces
March 2023
266 pages
ISBN: 9798400701078
DOI: 10.1145/3581754
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Poster
  • Research
  • Refereed limited


Conference

IUI '23

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%



