Abstract
In this paper we propose several novel algorithms for multi-video summarization. The first and fundamental algorithm, Video Maximal Marginal Relevance (Video-MMR), mimics the principle of a classical text summarization algorithm, Maximal Marginal Relevance (MMR). Video-MMR rewards relevant keyframes and penalizes redundant ones, relying only on visual features. We extend Video-MMR to Audio Video Maximal Marginal Relevance (AV-MMR) by exploiting audio features. We also propose Balanced AV-MMR, which exploits additional semantic features, the balance between audio and visual information, and the balance of temporal information across the different videos of a set. The proposed algorithms are generic and suitable for summarizing videos of various genres in a multi-video set using multimodal information. Large-scale subjective and objective evaluations show that our series of MMR algorithms is effective for multi-video summarization.
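The reward-and-penalty principle that Video-MMR borrows from MMR can be illustrated with a minimal sketch. It is not the authors' implementation; it only shows a greedy selection that scores each candidate keyframe by its relevance minus its redundancy with the keyframes already chosen. Several details are assumptions made for illustration: keyframes are plain feature vectors, similarity is cosine similarity, relevance is the average similarity to the frames not yet selected, and the names `mmr_select`, `cosine`, and the trade-off weight `lam` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(frames, k, lam=0.7):
    """Greedily pick up to k keyframe indices with an MMR-style score.

    frames: list of feature vectors, one per candidate keyframe.
    lam:    assumed weight between relevance (reward) and redundancy (penalty).
    """
    n = len(frames)
    selected = []
    candidates = list(range(n))
    while candidates and len(selected) < k:
        best, best_score = None, -math.inf
        for i in candidates:
            # Relevance: average similarity to the frames not yet in the summary.
            others = [j for j in range(n) if j != i and j not in selected]
            relevance = (
                sum(cosine(frames[i], frames[j]) for j in others) / len(others)
                if others
                else 0.0
            )
            # Redundancy: highest similarity to any frame already selected.
            redundancy = max(
                (cosine(frames[i], frames[j]) for j in selected), default=0.0
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected


# Two identical frames and one distinct frame: the duplicate is penalized,
# so the second pick is the distinct frame.
print(mmr_select([[1, 0], [1, 0], [0, 1]], k=2))  # [0, 2]
```

In this toy input the second frame duplicates the first, so once the first is selected the redundancy penalty pushes the selection toward the third frame. Audio or semantic features, as used by AV-MMR and Balanced AV-MMR, would require additional similarity terms that the sketch does not attempt to model.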
Cite this article
Li, Y., Merialdo, B. Multimedia maximal marginal relevance for multi-video summarization. Multimed Tools Appl 75, 199–220 (2016). https://doi.org/10.1007/s11042-014-2287-5
DOI: https://doi.org/10.1007/s11042-014-2287-5