Article

Supporting audiovisual query using dynamic programming

Authors:

Milind R. Naphade,

Thomas S. Huang

MULTIMEDIA '01: Proceedings of the ninth ACM international conference on Multimedia

Pages 411 - 420

https://doi.org/10.1145/500141.500202

Published: 01 October 2001

Abstract

Query by example is a necessary capability for content-based retrieval, yet most video retrieval systems support queries using image sequences only. We present an algorithm for matching multimodal (audio-visual) patterns for content-based video retrieval. Two properties differentiate our approach from the state of the art in content-based retrieval: it uses the information content in multiple media, and it places a strong emphasis on temporal similarity. At the core of the pattern-matching scheme is a dynamic programming algorithm, which leads to a significant improvement in performance. Because it couples audio with video, the algorithm can also be applied to grouping shots by audio-visual similarity. We also support relevance feedback: the user provides feedback by choosing the clips that are closer to the desired target, and the system then automatically adjusts the relative weights (relevance) of the media and fetches different sets of target clips accordingly. We observe that a few iterations of such feedback are generally sufficient to retrieve the desired video clips.
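The abstract does not spell out the matching algorithm, so the following is only a minimal sketch of the general idea, assuming a standard dynamic-time-warping recurrence and a per-frame distance that is a weighted sum of an audio distance and a visual distance. The names `frame_distance`, `dtw_cost`, `rank_clips`, and the `w_audio` parameter are hypothetical and not taken from the paper; the feature vectors are toy values.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(query, target, w_audio=0.5):
    """Align two clips with dynamic time warping and return the total cost.

    Each clip is a list of (audio_features, visual_features) pairs, one per
    frame. w_audio is an assumed media weight in [0, 1]; the visual weight
    is 1 - w_audio.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i query frames
    # with the first j target frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        q_audio, q_visual = query[i - 1]
        for j in range(1, m + 1):
            t_audio, t_visual = target[j - 1]
            d = (w_audio * frame_distance(q_audio, t_audio)
                 + (1.0 - w_audio) * frame_distance(q_visual, t_visual))
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_clips(query, clips, w_audio=0.5):
    """Return clip names ordered from best to worst match for the query."""
    return sorted(clips, key=lambda name: dtw_cost(query, clips[name], w_audio))


# Toy data: clip "a" matches the query in audio only, clip "b" in video only.
query = [([0.0], [0.0]), ([1.0], [1.0])]
clips = {
    "a": [([0.0], [5.0]), ([1.0], [6.0])],
    "b": [([5.0], [0.0]), ([6.0], [1.0])],
}
audio_heavy = rank_clips(query, clips, w_audio=0.9)
video_heavy = rank_clips(query, clips, w_audio=0.1)
```

In this sketch, changing `w_audio` changes which clip ranks first, which is the kind of effect a relevance-feedback loop could exploit by adjusting the media weights after the user selects preferred clips. How the system described in the abstract actually updates its weights is not stated there.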

Index Terms

Supporting audiovisual query using dynamic programming
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
2. Theory of computation
  1. Design and analysis of algorithms
    1. Algorithm design techniques
      1. Dynamic programming

Information & Contributors

Information

Published In

MULTIMEDIA '01: Proceedings of the ninth ACM international conference on Multimedia

October 2001

664 pages

ISBN:1581133944

DOI:10.1145/500141

Conference Chairs: Nicolas D. Georganas (University of Ottawa), Radu Popescu-Zeletin (GMD Fokus)

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM01

MM01: ACM Multimedia 2001

September 30 - October 5, 2001

Ottawa, Canada

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Cited By

Li T, Nian F, Wu X, Gao Q, Lu Y (2016). Efficient video copy detection using multi-modality and dynamic path search. Multimedia Systems 22:1 (29-39). DOI: 10.1007/s00530-014-0387-8. Online publication date: 1-Feb-2016
https://dl.acm.org/doi/10.1007/s00530-014-0387-8
Anguera X, Obrador P, Oliver N, Boll S, Hoi S, Luo J, Jin R, King I, Xu D (2009). Multimodal video copy detection applied to social media. Proceedings of the first SIGMM workshop on Social media (57-64). DOI: 10.1145/1631144.1631157. Online publication date: 23-Oct-2009
https://dl.acm.org/doi/10.1145/1631144.1631157
Haubold A, Naphade M, Sebe N, Worring M (2007). Classification of video events using 4-dimensional time-compressed motion features. Proceedings of the 6th ACM international conference on Image and video retrieval (178-185). DOI: 10.1145/1282280.1282311. Online publication date: 9-Jul-2007
https://dl.acm.org/doi/10.1145/1282280.1282311
Naphade M (2004). On supervision and statistical learning for semantic multimedia analysis. Journal of Visual Communication and Image Representation 15:3 (348-369). DOI: 10.1016/j.jvcir.2004.04.010. Online publication date: 1-Sep-2004
https://dl.acm.org/doi/10.1016/j.jvcir.2004.04.010
