

A novel feature fusion based framework for efficient shot indexing to massive web videos

Published: 01 July 2015

Abstract

This study presents an automatic approach to analyzing the structure of large-scale web videos using visual and acoustic information. In our approach, video streams are macro-segmented by mining duplicate sequences. Both acoustic and visual information are used for mining in order to avoid missing true positives. Unlike TV data, in which duplicate clips are nearly identical, web videos contain severe visual and acoustic distortions. To handle these distortions, we present novel visual-acoustic feature schemes. We also present a shot-based indexing algorithm and several temporal constraints for mining duplicate sequences: for image-based duplicate sequence detection, weak geometric verification is combined with direct hashing to achieve high efficiency and superior detection performance, and for the audio-based part, dynamic programming is introduced to recall missed true positives. Experiments on a dataset of 500 hours of videos with unknown content show that duplicate sequence mining on web videos achieves an F-measure of 95%, and that the proposed algorithm outperforms state-of-the-art approaches in both efficiency and detection performance.
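The abstract pairs direct hashing with weak geometric verification for image-based duplicate detection but does not spell out either step. The Python sketch below shows one plausible reading under stated assumptions: each keypoint has already been quantized to a visual word id (for example by a vocabulary tree), "direct hashing" is modeled as an inverted index keyed directly by that id, and the verification step is swapped in from the weak geometric consistency (WGC) voting of Jégou et al. (2008), which histograms the orientation and log-scale differences of matched keypoints per candidate shot. This is not the authors' published scheme, and the names `build_index`, `wgc_query`, `angle_bin`, `scale_bin`, `ANGLE_BINS`, and `SCALE_BINS` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical keypoint record: (visual_word_id, orientation_in_radians, scale > 0).
Feature = tuple[int, float, float]

ANGLE_BINS = 8  # illustrative histogram resolutions, not values from the paper
SCALE_BINS = 8


def build_index(shots: dict[str, list[Feature]]) -> dict[int, list[tuple[str, float, float]]]:
    """Direct hashing (as assumed here): visual word id -> postings of (shot, angle, scale)."""
    index: dict[int, list[tuple[str, float, float]]] = defaultdict(list)
    for shot_id, feats in shots.items():
        for word, angle, scale in feats:
            index[word].append((shot_id, angle, scale))
    return index


def angle_bin(diff: float) -> int:
    """Quantize an orientation difference into ANGLE_BINS bins over [0, 2*pi)."""
    diff %= 2 * math.pi
    return int(diff / (2 * math.pi) * ANGLE_BINS) % ANGLE_BINS


def scale_bin(ratio: float) -> int:
    """Quantize log2 of a scale ratio, clipped to about [-4, 4), into SCALE_BINS bins."""
    lg = min(max(math.log2(ratio), -4.0), 3.999)
    return int((lg + 4.0) / 8.0 * SCALE_BINS)


def wgc_query(index, feats: list[Feature]) -> dict[str, int]:
    """Score candidate shots by the peaks of their angle- and scale-difference histograms."""
    angle_hist = defaultdict(lambda: [0] * ANGLE_BINS)
    scale_hist = defaultdict(lambda: [0] * SCALE_BINS)
    for word, q_angle, q_scale in feats:
        # .get() avoids inserting empty postings for unseen words.
        for shot_id, angle, scale in index.get(word, ()):
            angle_hist[shot_id][angle_bin(angle - q_angle)] += 1
            scale_hist[shot_id][scale_bin(scale / q_scale)] += 1
    # A consistent rotation/zoom concentrates votes in one bin of each histogram;
    # the score is the smaller of the two peaks.
    return {s: min(max(angle_hist[s]), max(scale_hist[s])) for s in angle_hist}
```

A shot whose matched keypoints agree on a single rotation and scale change piles its votes into one bin of each histogram, so it keeps a high score even after the copy is rotated or rescaled, while accidental visual-word collisions scatter across bins. In practice a threshold on the score would decide whether a shot is reported as a duplicate; the source does not give such a value.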

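The abstract states that dynamic programming is used to recall missed true positives in the audio-based stage but gives no recurrence. One common way to do this, assumed here and not confirmed as the authors' formulation, is Smith-Waterman-style local alignment over per-frame audio fingerprints (for instance 32-bit band-energy-difference words in the style of Haitsma and Kalker). Two frames are treated as matching when their fingerprints differ in at most `max_bits` bits, and the gap and mismatch penalties let a duplicate run survive a few corrupted or missing frames. The function names and all parameter values are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprint words."""
    return bin(a ^ b).count("1")


def frame_score(a: int, b: int, max_bits: int, match: int, mismatch: int) -> int:
    """Reward frames whose fingerprints are within max_bits of each other."""
    return match if hamming(a, b) <= max_bits else mismatch


def align_fingerprints(x: list[int], y: list[int], max_bits: int = 8,
                       match: int = 2, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman-style local alignment of two fingerprint sequences.

    Returns (score, (x_start, x_end), (y_start, y_end)) with half-open spans
    covering the best-scoring aligned run; spans are empty when score is 0.
    """
    rows, cols = len(x) + 1, len(y) + 1
    h = [[0] * cols for _ in range(rows)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + frame_score(x[i - 1], y[j - 1], max_bits, match, mismatch)
            # Local alignment: scores are floored at 0 so a run can start anywhere.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, bi, bj = h[i][j], i, j

    # Trace back from the best cell until the running score returns to 0.
    i, j = bi, bj
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + frame_score(x[i - 1], y[j - 1], max_bits, match, mismatch)
        if h[i][j] == diag:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, bi), (j, bj)
```

With these penalties a single corrupted frame costs 1 point against 2 per matched frame, so a duplicate run containing an isolated bad frame is still reported as one sequence instead of two fragments, which is the kind of missed true positive the audio stage is meant to recover.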
    Information

    Published In

    Telecommunications Systems, Volume 59, Issue 3
    July 2015
    120 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. Conditional entropy based feature selection
    2. Geometric direct hashing
    3. Multi-scale band energy difference
    4. RGB-DSIFT
    5. Unsupervised duplicate sequence detection
    6. Vocabulary tree