research-article

Submodular video hashing: a unified framework towards video pooling and indexing

Authors:

Liangliang Cao,

Shih-Fu Chang

MM '12: Proceedings of the 20th ACM international conference on Multimedia

Pages 299 - 308

https://doi.org/10.1145/2393347.2393393

Published: 29 October 2012

Abstract

This paper develops a novel framework for efficient large-scale video retrieval. We aim to find videos according to higher-level similarities, which goes beyond the scope of traditional near-duplicate search. Following the popular hashing technique, we employ compact binary codes to facilitate nearest neighbor search. Unlike previous methods, which capitalize on only one type of hash code for retrieval, this paper combines heterogeneous hash codes to effectively describe the diverse and multi-scale visual contents in videos. Our method integrates feature pooling and hashing in a single framework. In the pooling stage, we cast video frames into a set of pre-specified components, which capture a variety of semantics of video contents. In the hashing stage, we represent each video component as a compact hash code, and combine multiple hash codes into hash tables for effective search. To speed up retrieval while retaining the most informative codes, we propose a graph-based influence maximization method to bridge the pooling and hashing stages. We show that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution. Our method works very efficiently, retrieving thousands of video clips from the TRECVID dataset in about 0.001 second. For a larger-scale synthetic dataset with 1M samples, it takes less than 1 second to respond to 100 queries. Our method is extensively evaluated in both unsupervised and supervised scenarios, and the results on the TRECVID Multimedia Event Detection and Columbia Consumer Video datasets demonstrate the success of our proposed technique.
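The abstract's claim that submodularity "allows a greedy optimization method to achieve a nearly optimal solution" can be illustrated with a minimal sketch. The abstract does not give the paper's influence function, so the objective below is a stand-in: a facility-location style coverage score, which is a standard example of a monotone submodular function. The names `coverage_influence`, `greedy_select`, and the matrix `W` are hypothetical and chosen only for illustration. For monotone submodular objectives under a cardinality budget, the classical result of Nemhauser, Wolsey, and Fisher (1978) guarantees that greedy selection reaches at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Callable, FrozenSet, List, Sequence


def coverage_influence(
    weights: Sequence[Sequence[float]],
) -> Callable[[FrozenSet[int]], float]:
    """Build f(S) = sum_j max_{i in S} weights[i][j].

    With non-negative weights this objective is monotone and submodular.
    It is an illustrative stand-in, NOT the influence function of the paper.
    """
    n_targets = len(weights[0])

    def f(selected: FrozenSet[int]) -> float:
        if not selected:
            return 0.0
        return sum(max(weights[i][j] for i in selected) for j in range(n_targets))

    return f


def greedy_select(
    f: Callable[[FrozenSet[int]], float], n_items: int, budget: int
) -> List[int]:
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    selected: List[int] = []
    current = f(frozenset())
    for _ in range(min(budget, n_items)):
        best_item, best_gain = None, 0.0
        for i in range(n_items):
            if i in selected:
                continue
            gain = f(frozenset(selected + [i])) - current
            if gain > best_gain:
                best_item, best_gain = i, gain
        if best_item is None:  # no remaining item improves the objective
            break
        selected.append(best_item)
        current += best_gain
    return selected


# Hypothetical weights: W[i][j] says how well candidate code i "covers" item j.
W = [
    [1.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.75],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]
f = coverage_influence(W)
chosen = greedy_select(f, n_items=len(W), budget=2)
print(chosen)  # [1, 3]
```

In this toy instance the first pick covers the first cluster of items and the second pick jumps to the uncovered cluster, because the marginal gain of a near-duplicate candidate shrinks once a similar one is already selected. That diminishing-returns behaviour is what the submodularity property formalizes.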

Index Terms

Submodular video hashing: a unified framework towards video pooling and indexing
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Information & Contributors

Information

Published In

MM '12: Proceedings of the 20th ACM international conference on Multimedia

October 2012

1584 pages

ISBN: 9781450310895

DOI: 10.1145/2393347

General Chairs:
Noboru Babaguchi (Osaka University, Japan)
Kiyoharu Aizawa (The University of Tokyo, Japan)
John Smith (IBM, USA)

Program Chairs:
Shin'ichi Satoh (National Institute of Informatics, Japan)
Thomas Plagemann (University of Oslo, Norway)
Xian-Sheng Hua (Microsoft, USA)
Rong Yan (Facebook, USA)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '12

Sponsor:

SIGMM

MM '12: ACM Multimedia Conference

October 29 - November 2, 2012

Nara, Japan

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 48
Total Downloads: 353
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 2

Reflects downloads up to 22 Dec 2024

Citations

Cited By

Duan J, Hao Y, Zhu B, Cheng L, Zhou P, Wang X (2024). Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling. IEEE Transactions on Multimedia, 26, 7438-7450. Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3368924
Zachariah A, Rao P (2023). Video Retrieval for Everyday Scenes With Common Objects. Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 565-570. Online publication date: 12-Jun-2023.
https://dl.acm.org/doi/10.1145/3591106.3592239
Shen X, Zhou Y, Yuan Y, Yang X, Lan L, Zheng Y (2023). Contrastive Transformer Hashing for Compact Video Representation. IEEE Transactions on Image Processing, 32, 5992-6003. Online publication date: 2023.
https://doi.org/10.1109/TIP.2023.3326994
Wu K, Xu L (2023). Deep Hybrid Neural Network With Attention Mechanism for Video Hash Retrieval Method. IEEE Access, 11, 47956-47966. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3276321
Li S, Li X, Lu J, Zhou J (2022). Structure-Adaptive Neighborhood Preserving Hashing for Scalable Video Search. IEEE Transactions on Circuits and Systems for Video Technology, 32(4), 2441-2454. Online publication date: Apr-2022.
https://doi.org/10.1109/TCSVT.2021.3093258
Wu Y, Liu X, Qin H, Xia K, Hu S, Ma Y, Wang M (2021). Boosting Temporal Binary Coding for Large-Scale Video Search. IEEE Transactions on Multimedia, 23, 353-364. Online publication date: 2021.
https://doi.org/10.1109/TMM.2020.2978593
Qi M, Qin J, Yang Y, Wang Y, Luo J (2021). Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval. IEEE Transactions on Image Processing, 30, 2989-3004. Online publication date: 2021.
https://doi.org/10.1109/TIP.2020.3048680
Singh P (2021). Robust Homomorphic Video Hashing. 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), 91-96. Online publication date: Sep-2021.
https://doi.org/10.1109/MIPR51284.2021.00021
Bui T, Cooper D, Collomosse J, Bell M, Green A, Sheridan J, Higgins J, Das A, Keller J, Thereaux O (2020). Tamper-Proofing Video With Hierarchical Attention Autoencoder Hashing on Blockchain. IEEE Transactions on Multimedia, 22(11), 2858-2872. Online publication date: Nov-2020.
https://doi.org/10.1109/TMM.2020.2967640
Wang X, Wang Q, Wang H (2020). Active Video Hashing via Structure Information Learning for Activity Analysis. IEEE Access, 8, 96428-96437. Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2994783
