DOI: 10.1145/2393347.2393393
research-article

Submodular video hashing: a unified framework towards video pooling and indexing

Published: 29 October 2012

Abstract

This paper develops a novel framework for efficient large-scale video retrieval. We aim to find videos according to higher-level similarities, which lie beyond the scope of traditional near-duplicate search. Following the popular hashing approach, we employ compact binary codes to facilitate nearest neighbor search. Unlike previous methods, which rely on only one type of hash code for retrieval, this paper combines heterogeneous hash codes to effectively describe the diverse and multi-scale visual contents of videos. Our method integrates feature pooling and hashing in a single framework. In the pooling stage, we cast video frames into a set of pre-specified components, which capture a variety of semantics of video content. In the hashing stage, we represent each video component as a compact hash code and combine multiple hash codes into hash tables for effective search. To speed up retrieval while retaining the most informative codes, we propose a graph-based influence maximization method to bridge the pooling and hashing stages. We show that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution. Our method is very efficient, retrieving thousands of video clips from the TRECVID dataset in about 0.001 seconds. On a larger synthetic dataset with 1M samples, it answers 100 queries in less than 1 second. We evaluate our method extensively in both unsupervised and supervised scenarios, and the results on the TRECVID Multimedia Event Detection and Columbia Consumer Video datasets demonstrate the success of the proposed technique.
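The abstract states that the influence maximization problem is submodular, so a greedy method reaches a nearly optimal solution, but it does not spell out the objective itself. As a rough illustration only, the sketch below swaps in a standard monotone submodular function, the facility-location score f(S) = Σ_i max_{j∈S} W[i][j] on a nonnegative similarity graph W, and selects k nodes greedily. Reading the nodes as candidate video components or hash tables is an assumption, and `influence`, `greedy_select`, and the toy weights `W` are hypothetical names, not the authors' formulation. For any monotone submodular f with f(∅) = 0 under a cardinality constraint, this greedy rule is guaranteed to reach at least a (1 − 1/e) fraction of the optimum (Nemhauser, Wolsey, and Fisher, 1978), which is the usual sense in which greedy solutions are called "nearly optimal".

```python
from typing import List, Sequence


# Hypothetical stand-in for the paper's influence objective: a
# facility-location score over a nonnegative similarity graph.
# Each node i is "covered" by its most similar selected node j.
def influence(selected: Sequence[int], weights: List[List[float]]) -> float:
    if not selected:
        return 0.0  # f(empty set) = 0, as the greedy guarantee assumes
    return sum(max(row[j] for j in selected) for row in weights)


def greedy_select(weights: List[List[float]], k: int) -> List[int]:
    """Add, k times, the node with the largest marginal gain in influence."""
    n = len(weights)
    selected: List[int] = []
    current = 0.0
    for _ in range(min(k, n)):
        best_node, best_gain = -1, -1.0
        for cand in range(n):
            if cand in selected:
                continue
            gain = influence(selected + [cand], weights) - current
            if gain > best_gain:  # on ties, the lower index is kept
                best_node, best_gain = cand, gain
        selected.append(best_node)
        current += best_gain
    return selected


# Toy 4-node similarity graph with two loose clusters, {0, 1} and {2, 3}.
W = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.6, 0.0],
    [0.0, 0.6, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]

print(greedy_select(W, 2))  # [1, 3]
```

On this toy graph the greedy pick for k = 2 is [1, 3], which also happens to be the best pair. The marginal gain of node 3 drops from 1.5 on the empty set to 1.0 once node 1 is selected; that diminishing-returns behavior is exactly what submodularity means, and it is what makes the cheap greedy loop safe to use.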

    Published In

    MM '12: Proceedings of the 20th ACM international conference on Multimedia
    October 2012
    1584 pages
    ISBN:9781450310895
    DOI:10.1145/2393347
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. feature pooling
    2. indexing
    3. multiple feature hashing
    4. submodular
    5. video hashing

    Conference

    MM '12
    Sponsor:
    MM '12: ACM Multimedia Conference
    October 29 - November 2, 2012
    Nara, Japan

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 22 Dec 2024

    Cited By

    • (2024) Efficient Unsupervised Video Hashing With Contextual Modeling and Structural Controlling. IEEE Transactions on Multimedia, 26 (7438-7450). DOI: 10.1109/TMM.2024.3368924. Online publication date: 2024.
    • (2023) Video Retrieval for Everyday Scenes With Common Objects. Proceedings of the 2023 ACM International Conference on Multimedia Retrieval (565-570). DOI: 10.1145/3591106.3592239. Online publication date: 12-Jun-2023.
    • (2023) Contrastive Transformer Hashing for Compact Video Representation. IEEE Transactions on Image Processing, 32 (5992-6003). DOI: 10.1109/TIP.2023.3326994. Online publication date: 2023.
    • (2023) Deep Hybrid Neural Network With Attention Mechanism for Video Hash Retrieval Method. IEEE Access, 11 (47956-47966). DOI: 10.1109/ACCESS.2023.3276321. Online publication date: 2023.
    • (2022) Structure-Adaptive Neighborhood Preserving Hashing for Scalable Video Search. IEEE Transactions on Circuits and Systems for Video Technology, 32:4 (2441-2454). DOI: 10.1109/TCSVT.2021.3093258. Online publication date: Apr-2022.
    • (2021) Boosting Temporal Binary Coding for Large-Scale Video Search. IEEE Transactions on Multimedia, 23 (353-364). DOI: 10.1109/TMM.2020.2978593. Online publication date: 2021.
    • (2021) Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval. IEEE Transactions on Image Processing, 30 (2989-3004). DOI: 10.1109/TIP.2020.3048680. Online publication date: 2021.
    • (2021) Robust Homomorphic Video Hashing. 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) (91-96). DOI: 10.1109/MIPR51284.2021.00021. Online publication date: Sep-2021.
    • (2020) Tamper-Proofing Video With Hierarchical Attention Autoencoder Hashing on Blockchain. IEEE Transactions on Multimedia, 22:11 (2858-2872). DOI: 10.1109/TMM.2020.2967640. Online publication date: Nov-2020.
    • (2020) Active Video Hashing via Structure Information Learning for Activity Analysis. IEEE Access, 8 (96428-96437). DOI: 10.1109/ACCESS.2020.2994783. Online publication date: 2020.