DOI: 10.1145/3422839.3423064
research-article

Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework

Published: 12 October 2020

Abstract

Automatic video summarization remains an unsolved problem because of several challenges. We take steps towards making it more realistic by addressing the following challenges. First, currently available datasets either contain very short videos or contain a few long videos of only one type. We introduce a new benchmarking dataset called VISIOCITY, which comprises longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization and other vision problems. Second, for long videos, the human reference summaries needed by supervised video summarization techniques are difficult to obtain. We present a novel recipe based on Pareto optimality to automatically generate multiple reference summaries from the indirect ground truth present in VISIOCITY, and we show that these summaries are on par with human summaries. Third, we demonstrate that in the presence of multiple ground truth summaries (a consequence of the highly subjective nature of the task), learning from a single combined ground truth summary using a single loss function is not a good idea. We propose a simple recipe, VISIOCITY-SUM, that enhances an existing model using a combination of losses, and we demonstrate that it beats the current state-of-the-art techniques. We also present a study of the different desired characteristics of a good summary and demonstrate that evaluating a summary with a single measure (say F1), as is the current typical practice, falls short in some ways. We propose an evaluation framework for better quantitative assessment of summary quality that is closer to human judgment than a single measure. Finally, we report the performance of a few representative video summarization techniques on VISIOCITY assessed using various measures, bring out the limitations of the techniques and/or the assessment mechanism in modeling human judgment, and demonstrate the effectiveness of our evaluation framework in doing so.
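To make the notion of Pareto optimality concrete, the following is a minimal sketch of how candidate summaries scored on several measures could be filtered down to the non-dominated ones. It is not the recipe used in the paper: the measure names (`importance`, `diversity`, `continuity`), the candidate names, and the scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if `a` is at least as good as `b` on every measure
    and strictly better on at least one (higher is better)."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidate summaries with made-up scores (assumption, not data
# from the paper).
candidates = {
    "summary_a": {"importance": 0.9, "diversity": 0.4, "continuity": 0.5},
    "summary_b": {"importance": 0.6, "diversity": 0.8, "continuity": 0.5},
    "summary_c": {"importance": 0.5, "diversity": 0.3, "continuity": 0.4},
    "summary_d": {"importance": 0.6, "diversity": 0.8, "continuity": 0.7},
}

front = pareto_front(candidates)
print(front)
```

In this toy example `summary_b` is dominated by `summary_d` and `summary_c` is dominated by `summary_a`, so only `summary_a` and `summary_d` remain. Neither of the two is better than the other on every measure, which is why a Pareto-based approach naturally yields multiple reference summaries rather than a single one.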


Published In

AI4TV '20: Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery
October 2020
50 pages
ISBN: 9781450381468
DOI: 10.1145/3422839

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic evaluation
  2. benchmark
  3. challenges
  4. dataset
  5. video summarization

Conference

MM '20