Abstract
This paper presents a novel method for annotating videos from the TRECVID 2005 data set using only static visual features and metadata of still image frames. The method is designed to give the user annotation and tagging tools that incorporate multimedia data, such as video and still images, together with text into search and other combined applications running on the web or on other networks. It relies mainly on MPEG-7-based visual features and metadata of prototype images, and lets the user select either a single prototype or a training set. It also adaptively adjusts the weights of the visual features the user finds most relevant, helping to bridge the semantic gap. Using a segmentation tool of our own design, the user can further detect relevant regions in video frames and carry out region-based annotation on the same set of frames. The method yields satisfactory results even though the annotations of the TRECVID 2005 video data vary greatly in the semantic level of their concepts. It is simple and fast, requiring only a very small training set and little or no user intervention, and it has the further advantage of being applicable to any combination of visual and textual features.
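The adaptive feature weighting described above can be made concrete. The following is a minimal Python sketch of one plausible reading, not the paper's exact formulation: per-descriptor distances between a video frame and a set of prototype images are combined with weights, and the weights are re-estimated from user-marked relevant prototypes using an inverse-spread heuristic common in relevance feedback. The descriptor names, the vector representation of the MPEG-7 descriptors, and the update rule are all illustrative assumptions.

```python
import numpy as np

def feature_distances(query, prototypes):
    # Per-descriptor L2 distance between the query frame's MPEG-7 vectors
    # and those of each prototype image.
    return {name: np.array([np.linalg.norm(query[name] - p[name])
                            for p in prototypes])
            for name in query}

def combined_score(dists, weights):
    # Weighted sum of per-descriptor distances; lower means a closer match,
    # so the annotations of the best-scoring prototype can be propagated.
    return sum(weights[name] * d for name, d in dists.items())

def update_weights(relevant, weights, eps=1e-6):
    # Re-weight descriptors by their inverse spread over the prototypes the
    # user marked as relevant: descriptors that agree across the relevant
    # examples are trusted more. This is a common relevance-feedback
    # heuristic, assumed here purely for illustration.
    new = {name: 1.0 / (np.std([p[name] for p in relevant], axis=0).mean() + eps)
           for name in weights}
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    names = ["color_layout", "edge_histogram"]      # assumed descriptor names
    protos = [{n: rng.random(8) for n in names} for _ in range(5)]
    frame = {n: rng.random(8) for n in names}
    weights = {n: 1.0 / len(names) for n in names}  # start from uniform weights
    scores = combined_score(feature_distances(frame, protos), weights)
    best = int(np.argmin(scores))                   # prototype whose tags to copy
    weights = update_weights(protos[:2], weights)   # pretend two were marked relevant
    print(best, weights)
```

In this reading, selecting "either a prototype or a training set" corresponds to choosing how many relevant examples feed update_weights before annotations are propagated.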
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kutics, A., Nakagawa, A., Shindoh, K. (2007). Use of Adaptive Still Image Descriptors for Annotation of Video Frames. In: Kamel, M., Campilho, A. (eds) Image Analysis and Recognition. ICIAR 2007. Lecture Notes in Computer Science, vol 4633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74260-9_61
DOI: https://doi.org/10.1007/978-3-540-74260-9_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74258-6
Online ISBN: 978-3-540-74260-9