research-article

Multimodal late fusion bag of features applied to scene detection

Authors:

Bruno Lorenço Lopes,

Rudinei Goularte

WebMedia '13: Proceedings of the 19th Brazilian symposium on Multimedia and the web

Pages 15 - 22

https://doi.org/10.1145/2526188.2526202

Published: 05 November 2013

Abstract

Recent advances in technology have increased the availability of video data, creating a strong demand for efficient systems to manage this material. To use video information efficiently, the data must first be automatically segmented into smaller, manageable, and understandable units, such as scenes. This paper presents a new multimodal video scene segmentation technique. The proposed approach combines visual and aural Bag of Features based techniques so that the latent semantics each one captures can be exploited in a complementary way, improving scene segmentation. The results obtained are promising.
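The abstract does not give implementation details, so the following is only a minimal sketch of what late fusion over two Bag of Features representations could look like. Everything in it is an assumption made for illustration and not the authors' method: the function names (`bof_histogram`, `adjacent_similarity`, `late_fusion_boundaries`), the cosine similarity between adjacent shots, the fixed threshold, and the logical AND used as the fusion rule. Each modality produces its own boundary decision first, and only those decisions are combined, which is what makes the fusion "late".

```python
import numpy as np


def bof_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook (hypothetical helper).

    Returns an L1-normalized histogram of codeword counts for one shot.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its nearest codeword.
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def adjacent_similarity(histograms):
    """Cosine similarity between each shot histogram and the next one."""
    h = np.asarray(histograms, dtype=float)
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    return (h[:-1] * h[1:]).sum(axis=1)


def late_fusion_boundaries(visual_hists, aural_hists, threshold=0.5):
    """Fuse per-modality boundary decisions (assumed rule: logical AND).

    Returns the indices of shots that start a new scene.
    """
    # Each modality decides on its own where adjacent shots differ.
    visual_cut = adjacent_similarity(visual_hists) < threshold
    aural_cut = adjacent_similarity(aural_hists) < threshold
    # Late fusion: combine the decisions, not the feature vectors.
    fused = visual_cut & aural_cut
    return [int(i) + 1 for i in np.flatnonzero(fused)]


# Toy example: four shots, two codewords per modality.
visual = [[1, 0], [1, 0], [0, 1], [0, 1]]
aural = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(late_fusion_boundaries(visual, aural))  # prints [2]
```

In this toy input both modalities change between the second and third shot, so the fused decision places a single scene boundary at shot index 2. Other fusion rules, such as a logical OR or a weighted vote, would fit the same structure.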

Index Terms

Multimodal late fusion bag of features applied to scene detection
1. Information systems
  1. Information retrieval
    1. Document representation

Information & Contributors

Information

Published In

WebMedia '13: Proceedings of the 19th Brazilian symposium on Multimedia and the web

November 2013

360 pages

ISBN:9781450325592

DOI:10.1145/2526188

General Chairs:
Cássio V.S. Prazeres (Federal University of Bahia),
Paulo N.M. Sampaio (Salvador University)

Program Chairs:
André Santanchè (University of Campinas),
Celso A.S. Santos (Federal University of Espírito Santo),
Rudinei Goularte (University of São Paulo)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SBC: Brazilian Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WebMedia '13

WebMedia '13: 19th Brazilian Symposium on Multimedia and the Web

November 5 - 8, 2013

Salvador, Brazil

Acceptance Rates

WebMedia '13 Paper Acceptance Rate 29 of 87 submissions, 33%;

Overall Acceptance Rate 270 of 873 submissions, 31%

Bibliometrics & Citations

Article Metrics

Total Citations: 0
Total Downloads: 156
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 21 Nov 2024
