DOI: 10.1145/3591106.3592288
short-paper
Open access

Escaping local minima in deep reinforcement learning for video summarization

Published: 12 June 2023

Abstract

State-of-the-art unsupervised video summarization methods based on deep neural networks mostly fall under the adversarial reconstruction framework, which employs a Generative Adversarial Network (GAN) structure and Long Short-Term Memory (LSTM) autoencoders during training. The typical result is a selector LSTM that sequentially receives video frame representations and outputs a scalar importance factor for each frame; these factors are then used to select key-frames. This basic approach has been augmented with an additional Deep Reinforcement Learning (DRL) agent, trained using the Discriminator's output as a reward, which learns to optimize the selector's outputs. However, local minima are a well-known problem in DRL. This paper therefore presents a novel regularizer for escaping local loss minima, in order to improve unsupervised key-frame extraction. The regularizer is an additive loss term, employed during a second training phase, that rewards the distance of the neural agent's parameters from those of a previously found good solution. It thus encourages the training process to explore the parameter space more aggressively in order to discover a better local loss minimum. Evaluation on two public datasets shows considerable improvements over the baseline and over the state of the art.
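The abstract describes the regularizer only in words, so the sketch below is one possible reading of it, not the authors' implementation. It assumes that the "difference" is the Euclidean (L2) distance between flattened parameter vectors, that this distance is subtracted from the agent's original loss with a hypothetical scalar `weight`, and that `theta_anchor` holds the parameters of the previously found good solution. The function names `parameter_distance`, `regularized_loss` and `regularizer_grad` are illustrative only.

```python
import math
from typing import List, Sequence


def parameter_distance(theta: Sequence[float], theta_anchor: Sequence[float]) -> float:
    """Euclidean (L2) distance between two flattened parameter vectors (assumed metric)."""
    if len(theta) != len(theta_anchor):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((t - a) ** 2 for t, a in zip(theta, theta_anchor)))


def regularized_loss(base_loss: float, theta: Sequence[float],
                     theta_anchor: Sequence[float], weight: float = 0.1) -> float:
    """Second-phase objective (assumed form): the distance term is subtracted,
    so minimizing the total loss rewards moving away from theta_anchor."""
    return base_loss - weight * parameter_distance(theta, theta_anchor)


def regularizer_grad(theta: Sequence[float], theta_anchor: Sequence[float],
                     weight: float = 0.1) -> List[float]:
    """Gradient of -weight * ||theta - theta_anchor|| with respect to theta."""
    dist = parameter_distance(theta, theta_anchor)
    if dist == 0.0:
        # The L2 norm is not differentiable at the anchor itself; returning a
        # zero gradient there is one possible convention (also an assumption).
        return [0.0] * len(theta)
    return [-weight * (t - a) / dist for t, a in zip(theta, theta_anchor)]


# Usage: one plain gradient-descent step on the regularizer term alone.
theta_anchor = [0.0, 0.0]  # hypothetical parameters of the earlier good solution
theta = [3.0, 4.0]         # hypothetical current agent parameters
lr = 1.0                   # hypothetical learning rate
grad = regularizer_grad(theta, theta_anchor)
theta_next = [t - lr * g for t, g in zip(theta, grad)]
print(parameter_distance(theta, theta_anchor), parameter_distance(theta_next, theta_anchor))
```

Because the gradient of the distance term always points away from `theta_anchor`, a descent step on it increases the distance (here from 5.0 to about 5.1). In an actual second training phase this push would be combined with the gradient of the agent's original loss, which keeps the parameters in low-loss regions; the weight sets that trade-off and would need tuning. Note that if the second phase starts exactly at the anchor, this particular convention gives no push until the original loss has moved the parameters off it.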


Cited By

  • (2024) Explainable Video Summarization for Advancing Media Content Production. Encyclopedia of Information Science and Technology, Sixth Edition, pp. 1–24. DOI: 10.4018/978-1-6684-7366-5.ch065. Online publication date: 1 Jul 2024.


Information

Published In

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval
June 2023
694 pages
ISBN:9798400701788
DOI:10.1145/3591106
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep reinforcement learning
  2. key-frame extraction
  3. unsupervised learning
  4. video summarization

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

ICMR '23

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 187
  • Downloads (last 6 weeks): 11
Reflects downloads up to 20 Nov 2024.
