Escaping local minima in deep reinforcement learning for video summarization

Authors: Panagiota Alexoudi, Ioannis Mademlis, Ioannis Pitas

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval

Pages 530 - 534

https://doi.org/10.1145/3591106.3592288

Published: 12 June 2023

Abstract

State-of-the-art deep neural unsupervised video summarization methods mostly fall under the adversarial reconstruction framework. This framework employs a Generative Adversarial Network (GAN) structure and Long Short-Term Memory (LSTM) autoencoders during its training stage. The typical result is a selector LSTM that sequentially receives video frame representations and outputs corresponding scalar importance factors, which are then used to select key-frames. This basic approach has been augmented with an additional Deep Reinforcement Learning (DRL) agent, trained using the Discriminator’s output as a reward, which learns to optimize the selector’s outputs. However, local minima are a well-known problem in DRL. This paper therefore presents a novel regularizer for escaping local loss minima, in order to improve unsupervised key-frame extraction. It is an additive loss term, employed during a second training phase, which rewards the difference of the neural agent’s parameters from those of a previously found good solution. It thus encourages the training process to explore the parameter space more aggressively in order to discover a better local loss minimum. Evaluation on two public datasets shows considerable improvements over the baseline and over the state of the art.
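The regularizer idea can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the Euclidean distance, the `weight` coefficient, and every function and variable name below are assumptions chosen for illustration; this is not the authors' implementation. It only shows how an additive term can reward moving the parameters away from a stored reference solution.

```python
import math
from typing import Sequence


def parameter_distance(params: Sequence[float], reference: Sequence[float]) -> float:
    """Euclidean distance between the current parameters and a stored reference."""
    if len(params) != len(reference):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(params, reference)))


def regularized_loss(
    base_loss: float,
    params: Sequence[float],
    reference: Sequence[float],
    weight: float = 0.1,  # assumed trade-off coefficient, not taken from the paper
) -> float:
    """Base loss plus an additive term that rewards distance from the reference.

    The term enters with a negative sign, so parameters that lie farther
    from the reference solution yield a lower total loss.
    """
    return base_loss - weight * parameter_distance(params, reference)


# Hypothetical parameters found during a first training phase.
reference = [0.5, -1.0, 2.0]

# Two candidate parameter vectors with the same base loss.
near = [0.5, -1.0, 2.0]
far = [3.5, 3.0, 2.0]

loss_near = regularized_loss(1.0, near, reference)
loss_far = regularized_loss(1.0, far, reference)
```

With equal base losses, the candidate farther from the reference receives the lower total loss, which is the exploratory pressure the abstract describes. In a real second training phase the weight would need tuning so that the distance term does not dominate the task loss.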


Cited By

E. Apostolidis, G. Balaouras, I. Patras, and V. Mezaris. 2024. Explainable Video Summarization for Advancing Media Content Production. Encyclopedia of Information Science and Technology, Sixth Edition, 1–24. Online publication date: 1 July 2024.
https://doi.org/10.4018/978-1-6684-7366-5.ch065

Index Terms

Escaping local minima in deep reinforcement learning for video summarization
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
  2. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
    2. Machine learning approaches
      1. Neural networks


Published In

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval

June 2023

694 pages

ISBN:9798400701788

DOI:10.1145/3591106

Editors:
Ioannis (Yiannis) Kompatsiaris, Centre for Research and Technology Hellas, Greece
Jiebo Luo, University of Rochester, USA
Nicu Sebe, University of Trento, Italy
Angela Yao, National University of Singapore, Singapore
Vasileios Mezaris, Centre for Research and Technology Hellas, Greece
Symeon Papadopoulos, Centre for Research and Technology Hellas, Greece
Adrian Popescu, CEA LIST, France
Zi (Helen) Huang, University of Queensland, Australia

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

Horizon 2020 Framework Programme

Conference

ICMR '23

Sponsor:

SIGMM

ICMR '23: International Conference on Multimedia Retrieval

June 12 - 15, 2023

Thessaloniki, Greece

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

