Abstract
This paper presents a video analysis approach based on concept detection and keyframe extraction employing a visual thesaurus representation. Color and texture descriptors are extracted from coarse regions of each frame and a visual thesaurus is constructed after clustering regions. The clusters, called region types, are used as basis for representing local material information through the construction of a model vector for each frame, which reflects the composition of the image in terms of region types. Model vector representation is used for keyframe selection either in each video shot or across an entire sequence. The selection process ensures that all region types are represented. A number of high-level concept detectors is then trained using global annotation and Latent Semantic Analysis is applied. To enhance detection performance per shot, detection is employed on the selected keyframes of each shot, and a framework is proposed for working on very large data sets.
Similar content being viewed by others
Notes
More details can be found in http://www-nlpir.nist.gov/projects/tv2007/tv2007.html#3.
References
Avrithis Y, Doulamis A, Doulamis N, Kollias S (1999) A stochastic framework for optimal key frame extraction from mpeg video databases. Comput Vis Image Underst 5(1):3–24
Ayache S, Quenot G (2007) TRECVID 2007 collaborative annotation using active learning. In: TRECVID 2007 workshop, Gaithersburg, 5–6 November 2007
Boujemaa N, Fleuret F, Gouet V, Sahbi H (2004) Visual content extraction for automatic semantic annotation of video news. In: IS&T/SPIE conference on storage and retrieval methods and applications for multimedia, part of electronic imaging symposium, San Jose, January 2004
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/∼cjlin/libsvm
Chang SF, Sikora T, Puri A (2001) Overview of the MPEG-7 standard. IEEE Trans Circuits Systems Video Technol 11(6):688–695
Chapelle O, Haffner P, Vapnik V (1999) Support vector machines for histogram-based image classification. IEEE Trans Neural Netw 10(5):1055–1064
Chiu S (1997) Extracting fuzzy rules from data for function approximation and pattern classification. In: Dubois D, Prade H, Yager R (eds) Fuzzy information engineering: a guided tour of applications. Wiley, New York
Cooper M, Foote J (2005) Discriminative techniques for keyframe selection. In: Proceedings of the IEEE international conference on multimedia & expo (ICME), Amsterdam, 6–9 July 2005
Dance C, Willamowski J, Fan L, Bray C, Csurka G (2004) Visual categorization with bags of keypoints. In: ECCV—international workshop on statistical learning in computer vision
Deerwester S, Dumais S, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Soc Inf Sci 41(6):391–407
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Gokalp D, Aksoy S (2007) Scene classification using bag-of-regions representations. In: IEEE conference on computer vision and pattern recognition (CVPR), Minneapolis, 18–23 June 2007
Haykin S (1998) Neural networks: a comprehensive foundation. Prentice Hall, Englewood Cliffs
IBM (2005) MARVEL multimedia analysis and retrieval system. IBM Research White Paper
Kishida K (2005) Property of average precision and its generalization: an examination of evaluation indicator for information retrieval. NII Technical Reports, NII-2005-014E
Klir GJ, Yuan B (1995) Fuzzy sets and fuzzy logic—theory and applications. Prentice Hall, Englewood Cliffs
Laaksonen J, Koskela M, Oja E (2002) Picsom, self-organizing image retrieval with MPEG-7 content descriptors. IEEE Trans Neural Netw 13:841–853
Lazebnik S, Schmid C, Ponce J (2006) A discriminative framework for texture and object recognition using local image features. In: Towards category-level object recognition. Springer, New York, pp 423–442
Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Ma YF, Lu L, Zhang HJ, Li M (2002) A user attention model for video summarization. In: MULTIMEDIA ’02: Proceedings of the tenth ACM international conference on multimedia. ACM, New York, pp 533–542
Manjunath B, Ohm J, Vasudevan V, Yamada A (2001) Color and texture descriptors. IEEE Trans Circuits Syst Video Technol 11(6):703–715
Mérialdo B, Huet B, Yahiaoui I, Souvannavong F (2002) Automatic video summarization. In: International thyrrenian workshop on digital communications, advanced methods for multimedia signal processing, Palazzo dei Congressi, Capri, 8–11 September 2002
Mitchell M (1998) An introduction to genetic algorithms. MIT, Cambridge
Molina J, Spyrou E, Sofou N, Martinez JM (2007) On the selection of MPEG-7 visual descriptors and their level of detail for nature disaster video sequences classification. In: 2nd international conference on semantics and digital media technologies (SAMT), Genova, 5–7 December 2007
Morris OJ, Lee MJ, Constantinides AG (1986) Graph theory for image analysis: an approach based on the shortest spanning tree. IEE Proc 133:146–152
Naphade MR, Kennedy L, Kender JR, Chang SF, Smith JR, Over P, Hauptmann A (2005) A light scale concept ontology for multimedia understanding for TRECVID 2005. IBM Research Technical Report
Natsev A, Naphade M, Smith J (2003) Lexicon design for semantic indexing in media databases. In: International conference on communication technologies and programming, Varna, 23–26 June 2003
Opelt A, Pinz A, Zisserman A (2006) Incremental learning of object detectors using a visual shape alphabet. In: IEEE computer society conference on computer vision and pattern recognition, New York, 17–22 June 2006
Russell BC, Torralba A, Murphy KP, Freeman WT (2008) Labelme: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173
Saux BL, Amato G (2004) Image classifiers for scene analysis. In: International conference on computer vision and graphics, Warsaw, 22–24 September 2004
Smeaton AF, Over P, Kraaij W (2006) Evaluation campaigns and TRECVid. In: MIR ’06: proceedings of the 8th ACM international workshop on multimedia information retrieval. ACM, New York
Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Snoek CGM, Worring M (2003) Time interval based modelling and classification of events in soccer video. In: Proceedings of the 9th annual conference of the advanced school for computing and imaging (ASCI), Heijen, June 2003
Souvannavong F, Mérialdo B, Huet B (2005) Region-based video content indexing and retrieval. In: CBMI 2005, fourth international workshop on content-based multimedia indexing, Riga, 21–23 June 2005
Spyrou E, Avrithis Y (2007) A region thesaurus approach for high-level concept detection in the natural disaster domain. In: 2nd international conference on semantics and digital media technologies (SAMT), Genova, December 2007
Spyrou E, Avrithis Y (2007) Keyframe extraction using local visual semantics in the form of a region thesaurus. In: 2nd international workshop on semantic media adaptation and personalization (SMAP), London, 17–18 December 2007
Spyrou E, LeBorgne H, Mailis T, Cooke E, Avrithis Y, O’Connor N (2005) Fusing MPEG-7 visual descriptors for image classification. In: International conference on artificial neural networks (ICANN), Warsaw, 11–15 September 2005
Spyrou E, Tolias G, Mylonas P, Avrithis Y (2008) A semantic multimedia analysis approach utilizing a region thesaurus and LSA. In: International workshop on image analysis for multimedia interactive services (WIAMIS), Klagenfurt, 7–9 May 2008
Sundaram H, Chang SF (2003) Video analysis and summarization at structural and semantic levels, multimedia information retrieval and management: technological fundamentals and applications. In: Feng D, Siu WC, Zhang H (Eds) Springer, New York
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Voisine N, Dasiopoulou S, Mezaris V, Spyrou E, Athanasiadis T, Kompatsiaris I, Avrithis Y, Strintzis MG (2005) Knowledge-assisted video analysis using a genetic algorithm. In: 6th international workshop on image analysis for multimedia interactive services (WIAMIS 2005), Montreux, 13–15 April 2005
Yamada A, Pickering M, Jeannin S, Cieplinski L, Ohm J, Kim M (2001) MPEG-7 Visual part of eXperimentation model version 9.0
Yanagawa A, Chang SF, Kennedy L, Hsu W (2007) Columbia universitys baseline detectors for 374 LSCOM semantic visual concepts. Columbia University ADVENT Technical Report
Yuan J, Guo Z et al (2007) THU and ICRC at TRECVID 2007. In: 5th TRECVID workshop, Gaithersburg, November 2007
Zhang H, Wu J, Zhong D, Smoliar S (1997) An integrated system for content-based retrieval and browsing. Pattern Recogn 30:643–658
Zhuang Y, Rui Y, Huang T, Mehrotra S (1998) Adaptive keyframe extraction using unsupervised clustering. In: Proc of international conference oh image processing (ICIP), Chicago, 4–7 October 1998
Acknowledgements
This work was partially supported by the European Commission under contracts FP7-215453 WeKnowIt, FP6-027026 K-Space and FP6-027685 MESH.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Spyrou, E., Tolias, G., Mylonas, P. et al. Concept detection and keyframe extraction using a visual thesaurus. Multimed Tools Appl 41, 337–373 (2009). https://doi.org/10.1007/s11042-008-0237-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-008-0237-9