Abstract
With the explosive growth of mobile phones and other camera-equipped devices, more and more video data is being captured and stored. This creates an urgent need for fast browsing and understanding of video content. Automatic video summarization is an effective technique for addressing this need: it extracts succinct summaries that represent the original long videos. The task involves two problems: video segmentation and summary generation. Most previous work has focused on the second problem and segments videos with a simple strategy such as boundary detection. However, this type of approach leads to suboptimal results, because it not only lacks a learning mechanism in the video segmentation stage but also splits the whole task into two independent stages. In this paper, we propose a novel structure-transfer-driven temporal subspace clustering segmentation (STSC) method for video summarization. We first learn structure information from source videos and then transfer it to target videos. Using a Determinantal Point Process (DPP), we then select an informative subset of shots to create the final video summary. Experimental results on the SumMe and TVSum datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
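To make the shot-selection step more concrete, the following is a minimal sketch of DPP-style subset selection. The abstract does not specify the paper's exact DPP formulation, so this is not the authors' implementation: it assumes the common quality-diversity kernel L_ij = q_i S_ij q_j with cosine similarity S, and uses a simple greedy approximation to MAP inference. The function name `greedy_dpp_select` and its parameters `features`, `quality`, and `k` are hypothetical.

```python
import numpy as np


def greedy_dpp_select(features, quality, k):
    """Greedily pick up to k shots that approximately maximize det(L_S).

    Assumption: L_ij = q_i * S_ij * q_j, where S is the cosine similarity
    between shot features. This is a generic sketch, not the paper's method.
    """
    # Normalize rows so that the inner product is the cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = unit @ unit.T
    # Quality-diversity decomposition of the DPP kernel.
    L = quality[:, None] * similarity * quality[None, :]

    selected = []
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            # log-determinant of the kernel restricted to the candidate set
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            break  # no remaining shot yields a positive determinant
        selected.append(best_item)
    # Return indices in ascending (temporal) order.
    return sorted(selected)


# Toy example: shots 0 and 1 are near-duplicates, shot 2 is different.
shots = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
scores = np.array([1.0, 0.9, 0.8])
summary = greedy_dpp_select(shots, scores, k=2)
```

In the toy example, the determinant penalizes the near-duplicate pair, so the selection favors one high-quality shot plus the dissimilar shot rather than the two most similar ones.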
Cite this article
Zhang, J., Shi, Y., Jing, P. et al. A structure-transfer-driven temporal subspace clustering for video summarization. Multimed Tools Appl 78, 24123–24145 (2019). https://doi.org/10.1007/s11042-018-6841-4
DOI: https://doi.org/10.1007/s11042-018-6841-4