Abstract
In this paper we describe new methods for detecting semantic concepts in digital video based on its audio and visual content. The Temporal Gradient Correlogram captures temporal correlations of gradient edge directions across frames sampled from a shot. Power-related physical features are extracted from short audio samples in video shots. Shots containing people, cityscape, landscape, speech, or instrumental sound are detected using trained self-organizing maps and the kNN classification results of the audio samples. Test runs and evaluations in the TREC 2002 Video Track show consistent performance for the Temporal Gradient Correlogram and state-of-the-art precision in audio-based instrumental sound detection.
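To make the idea of correlating gradient edge directions over sampled shot frames more concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible reading of the abstract: edge directions are quantized into a few bins, and for each bin and pixel distance we estimate how often a neighbouring pixel shares the same direction, pooling counts over all sampled frames. The function names, the central-difference gradient, the four-neighbour sampling, and all parameter values are illustrative assumptions.

```python
import math


def gradient_direction_bins(frame, n_bins=4, min_magnitude=1.0):
    """Quantize the gradient direction of each interior pixel into n_bins.

    frame is a list of rows of grayscale values. Pixels with a weak
    gradient (or on the border) are marked None. Central differences are
    an assumption made for brevity, not the operator used in the paper.
    """
    h, w = len(frame), len(frame[0])
    bins = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = frame[y][x + 1] - frame[y][x - 1]
            gy = frame[y + 1][x] - frame[y - 1][x]
            if math.hypot(gx, gy) < min_magnitude:
                continue
            # Direction modulo 180 degrees, so opposite gradients share a bin.
            angle = math.atan2(gy, gx) % math.pi
            bins[y][x] = min(int(angle / math.pi * n_bins), n_bins - 1)
    return bins


def temporal_gradient_autocorrelogram(frames, distances=(1, 3), n_bins=4):
    """Estimate, per direction bin and distance, the probability that a
    pixel at that distance has the same direction bin, pooled over frames.

    Only the four axis-aligned neighbours at each distance are sampled,
    which is a simplification chosen to keep the sketch short.
    """
    keys = [(b, d) for b in range(n_bins) for d in distances]
    hits = {k: 0 for k in keys}
    totals = {k: 0 for k in keys}
    for frame in frames:
        bins = gradient_direction_bins(frame, n_bins)
        h, w = len(bins), len(bins[0])
        for y in range(h):
            for x in range(w):
                b = bins[y][x]
                if b is None:
                    continue
                for d in distances:
                    for dy, dx in ((0, d), (d, 0), (0, -d), (-d, 0)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            totals[(b, d)] += 1
                            if bins[ny][nx] == b:
                                hits[(b, d)] += 1
    return {k: (hits[k] / totals[k] if totals[k] else 0.0) for k in keys}


# Hypothetical shot: three sampled frames, each showing one vertical edge.
shot = [[[0, 0, 0, 10, 10, 10] for _ in range(6)] for _ in range(3)]
features = temporal_gradient_autocorrelogram(shot)
```

The resulting dictionary can be flattened into a fixed-length feature vector, one value per (direction bin, distance) pair, which is the kind of input a classifier such as a self-organizing map would take.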
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rautiainen, M., Seppänen, T., Penttilä, J., Peltola, J. (2003). Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification. In: Bakker, E.M., Lew, M.S., Huang, T.S., Sebe, N., Zhou, X.S. (eds) Image and Video Retrieval. CIVR 2003. Lecture Notes in Computer Science, vol 2728. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45113-7_26
DOI: https://doi.org/10.1007/3-540-45113-7_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40634-1
Online ISBN: 978-3-540-45113-6
eBook Packages: Springer Book Archive