A novel feature fusion based framework for efficient shot indexing to massive web videos

Authors:

Wei Liu

Telecommunications Systems, Volume 59, Issue 3

Pages 401 - 413

https://doi.org/10.1007/s11235-014-9945-9

Published: 01 July 2015

Abstract

This study presents an automatic approach to analyzing the structure of large-scale web videos using visual and acoustic information. Video streams are macro-segmented by mining duplicate sequences, and both acoustic and visual information are used in the mining so that true positives are not missed. Unlike TV data, where duplicate clips are nearly identical, web videos contain severe visual and acoustic distortions. To handle these distortions, we present novel visual-acoustic feature schemes. A shot-based indexing algorithm and several temporal constraints are then used to mine the duplicate sequences: weak geometric verification is combined with direct hashing to make image-based duplicate sequence detection both efficient and accurate, and dynamic programming is introduced to recover missed true positives in the audio-based stage. Experiments on a dataset of 500 h of videos with unknown content show that the F-measure of duplicate sequence mining for web videos reaches 95 %, and that the proposed algorithm outperforms state-of-the-art approaches in both efficiency and detection performance.
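The abstract does not give the indexing procedure in detail, so the following is only a minimal sketch of the general idea behind shot-level direct hashing with a temporal consistency check, not the paper's algorithm. It assumes each shot has already been reduced to an exact-match signature (in practice a distortion-robust visual or acoustic feature would be needed), and the names `build_index` and `find_duplicate_runs` are hypothetical.

```python
from collections import defaultdict


def build_index(videos):
    """Map each shot signature to the (video_id, shot_position) pairs where it occurs.

    `videos` is a dict: video_id -> list of hashable shot signatures (assumed input).
    """
    index = defaultdict(list)
    for video_id, shots in videos.items():
        for pos, signature in enumerate(shots):
            index[signature].append((video_id, pos))
    return index


def find_duplicate_runs(query_shots, index, min_len=2):
    """Return (video_id, query_start, ref_start, length) for consistent runs.

    Temporal constraint used here (an illustrative choice): matched shots must
    share the same offset between reference and query position, and must be
    consecutive in the query for at least `min_len` shots.
    """
    # Group hits by (video, offset); a true duplicate keeps a constant offset.
    hits = defaultdict(list)
    for q_pos, signature in enumerate(query_shots):
        for video_id, ref_pos in index.get(signature, ()):
            hits[(video_id, ref_pos - q_pos)].append(q_pos)

    runs = []
    for (video_id, offset), q_positions in hits.items():
        q_positions = sorted(set(q_positions))
        start = prev = q_positions[0]
        # The trailing None flushes the last run.
        for q in q_positions[1:] + [None]:
            if q is not None and q == prev + 1:
                prev = q
                continue
            length = prev - start + 1
            if length >= min_len:
                runs.append((video_id, start, start + offset, length))
            if q is not None:
                start = prev = q
    return sorted(runs)


reference = {"a": ["s1", "s2", "s3", "s4"]}
shot_index = build_index(reference)
print(find_duplicate_runs(["x", "s2", "s3", "y"], shot_index))
```

Each lookup is a single hash probe per query shot, which is where the efficiency of direct hashing comes from; the offset grouping stands in for the temporal constraints and rejects isolated chance matches.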


Index Terms: Computing methodologies > Machine learning > Machine learning algorithms



ISSN:1018-4864


Copyright © 2015 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States
