research-article

A Hamming Embedding Kernel with Informative Bag-of-Visual Words for Video Semantic Indexing

Authors:

Bernard Merialdo

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 10, Issue 3

Article No.: 26, Pages 1 - 20

https://doi.org/10.1145/2535938

Published: 17 April 2014

Abstract

In this article, we propose a novel Hamming embedding kernel with informative bag-of-visual-words (BoW) to address two main problems in traditional BoW approaches to video semantic indexing. First, Hamming embedding is employed to alleviate the information loss caused by SIFT quantization: the Hamming distances between keypoints in the same cell are calculated and integrated into the SVM kernel to better discriminate between image samples. Second, to highlight concept-specific visual information, we propose to weight the visual words according to their informativeness for detecting specific concepts. We show that our proposed kernels can significantly improve the performance of concept detection.
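
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the article's formulation: the Gaussian weighting of Hamming distances, the `sigma` parameter, the integer-encoded signatures, and the names `he_similarity` and `word_weights` are all assumptions made for illustration. Each image is a list of `(visual_word_id, binary_signature)` pairs. Only keypoints quantized to the same cell are compared, and an optional per-word weight stands in for concept-specific informativeness.

```python
from collections import defaultdict
from math import exp


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary signatures."""
    return bin(a ^ b).count("1")


def group_by_word(keypoints):
    """Group binary signatures by the visual word (cell) they were quantized to."""
    cells = defaultdict(list)
    for word, signature in keypoints:
        cells[word].append(signature)
    return cells


def he_similarity(img_x, img_y, word_weights=None, sigma=2.0):
    """Hypothetical image-to-image similarity.

    Sums, over every visual word shared by both images, a Gaussian function
    of the Hamming distance between each pair of keypoints in that cell.
    word_weights (assumed, optional) maps a word id to its informativeness;
    words missing from the mapping get weight 1.0.
    """
    cells_x, cells_y = group_by_word(img_x), group_by_word(img_y)
    total = 0.0
    for word in cells_x.keys() & cells_y.keys():
        weight = 1.0 if word_weights is None else word_weights.get(word, 1.0)
        for sig_x in cells_x[word]:
            for sig_y in cells_y[word]:
                d = hamming(sig_x, sig_y)
                total += weight * exp(-(d * d) / (sigma * sigma))
    return total


# Toy data: signatures are 4-bit integers chosen by hand.
img_a = [(0, 0b1111), (1, 0b0000)]
img_b = [(0, 0b1111), (2, 0b0000)]   # shares word 0 with an identical signature
img_c = [(0, 0b0000), (2, 0b0000)]   # shares word 0 with an opposite signature

print(he_similarity(img_a, img_b))            # identical signatures in the same cell
print(he_similarity(img_a, img_c))            # same cell, distant signatures
print(he_similarity(img_a, img_b, {0: 2.0}))  # word 0 treated as more informative
```

A plain BoW histogram would score `img_b` and `img_c` identically against `img_a`, because all three contain one keypoint in cell 0. The Hamming term is what separates them. To use a matrix of such similarities with an SVM, it would have to be supplied as a precomputed kernel, and whether this particular function is positive semidefinite is not established here.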

Index Terms

A Hamming Embedding Kernel with Informative Bag-of-Visual Words for Video Semantic Indexing
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 10, Issue 3

April 2014

140 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/2602979

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2014

Accepted: 01 October 2013

Revised: 01 August 2013

Received: 01 April 2013

Published in TOMM Volume 10, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Fundamental Research Funds for the Central Universities
City University of Hong Kong
Shanghai Pujiang Program (no. 12PJ1402700)
National Natural Science Foundation of China

Bibliometrics

Total Citations: 4
Total Downloads: 233
Downloads (last 12 months): 5
Downloads (last 6 weeks): 1

Reflects downloads up to 22 Nov 2024

Citations

Cited By

Chen W, Liu Y, Wang W, Bakker E, Georgiou T, Fieguth P, Liu L, Lew M. (2023). Deep Learning for Instance Retrieval: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:6 (7270-7292). DOI: 10.1109/TPAMI.2022.3218591. Online publication date: 1-Jun-2023.
https://dl.acm.org/doi/10.1109/TPAMI.2022.3218591

Yao S, Amin M, Su L, Hu S, Li S, Wang S, Zhao Y, Abdelzaher T, Kaplan L, Aggarwal C, Yener A, Xing G. (2016). Recursive ground truth estimator for social data streams. Proceedings of the 15th International Conference on Information Processing in Sensor Networks (1-12). DOI: 10.5555/2959355.2959369. Online publication date: 11-Apr-2016.
https://dl.acm.org/doi/10.5555/2959355.2959369

Liu L, Zhang J, Yang A. (2016). Palmprint Recognition via Sparse Coding Spatial Pyramid Matching Representation of SIFT Feature. Biometric Recognition (235-243). DOI: 10.1007/978-3-319-46654-5_26. Online publication date: 21-Sep-2016.
https://doi.org/10.1007/978-3-319-46654-5_26

Leveau V, Joly A, Buisson O, Valduriez P, Hauptmann A, Ngo C, Xue X, Jiang Y, Snoek C, Vasconcelos N. (2015). Kernelizing Spatially Consistent Visual Matches for Fine-Grained Classification. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (155-162). DOI: 10.1145/2671188.2749328. Online publication date: 22-Jun-2015.
https://dl.acm.org/doi/10.1145/2671188.2749328
