
Multimodal late fusion bag of features applied to scene detection

Published: 05 November 2013

Abstract

Recent advances in technology have increased the availability of video data, creating a strong need for efficient systems to manage this material. To make efficient use of video information, the data must first be automatically segmented into smaller, manageable, and understandable units, such as scenes. This paper presents a new multimodal video scene segmentation technique. The proposed approach combines Bag of Features based techniques (visual and aural) to explore the latent semantics they capture in a complementary way, improving scene segmentation. The results obtained are promising.
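To make the idea of late fusion concrete, the sketch below combines per-modality boundary scores computed separately from visual and aural Bag of Features histograms. It is only an illustration under stated assumptions, not the method of the paper: the abstract does not specify the fusion rule, so the cosine dissimilarity between adjacent shots, the weighted average, and the names `boundary_scores`, `late_fusion_boundaries`, `visual_weight`, and `threshold` are all hypothetical choices.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two histograms; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def boundary_scores(histograms: Sequence[Sequence[float]]) -> List[float]:
    """Score i is the dissimilarity between shot i and shot i + 1."""
    return [
        1.0 - cosine_similarity(histograms[i], histograms[i + 1])
        for i in range(len(histograms) - 1)
    ]


def late_fusion_boundaries(
    visual_hists: Sequence[Sequence[float]],
    audio_hists: Sequence[Sequence[float]],
    visual_weight: float = 0.5,   # assumed weighting, not from the paper
    threshold: float = 0.5,       # assumed decision threshold
) -> List[int]:
    """Return indices of shots that start a new scene.

    Each modality is scored on its own; only the scores are merged,
    which is what makes this a late (score-level) fusion.
    """
    if len(visual_hists) != len(audio_hists):
        raise ValueError("both modalities must describe the same shots")
    visual = boundary_scores(visual_hists)
    audio = boundary_scores(audio_hists)
    fused = [
        visual_weight * v + (1.0 - visual_weight) * a
        for v, a in zip(visual, audio)
    ]
    return [i + 1 for i, score in enumerate(fused) if score > threshold]


# Toy data: four shots, each with a visual-word and an audio-word histogram.
visual_hists = [[4, 0, 0], [3, 1, 0], [0, 0, 5], [0, 1, 4]]
audio_hists = [[2, 0], [2, 0], [0, 3], [0, 3]]
boundaries = late_fusion_boundaries(visual_hists, audio_hists)
print(boundaries)
```

In this toy input both modalities change sharply between the second and third shots, so the fused score crosses the threshold only there. An early fusion variant would instead concatenate the two histograms per shot before computing any similarity.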


    Published In

    WebMedia '13: Proceedings of the 19th Brazilian symposium on Multimedia and the web
    November 2013
    360 pages
    ISBN:9781450325592
    DOI:10.1145/2526188

    Sponsors

    • SBC: Brazilian Computer Society

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. audio descriptors
    2. bag of features
    3. multimedia
    4. scene detection
    5. visual descriptors

    Qualifiers

    • Research-article

    Acceptance Rates

    WebMedia '13 Paper Acceptance Rate 29 of 87 submissions, 33%;
    Overall Acceptance Rate 270 of 873 submissions, 31%
