Composite Concept Discovery for Zero-Shot Video Event Detection

Published: 01 April 2014

Abstract

We consider automated detection of events in video without the use of any visual training examples. A common approach is to represent videos as classification scores obtained from a vocabulary of pre-trained concept classifiers. Where others construct the vocabulary by training individual concept classifiers, we propose to train classifiers for combinations of concepts composed by Boolean logic operators. We call these concept combinations composite concepts and contribute an algorithm that automatically discovers them from existing video-level concept annotations. We discover composite concepts by jointly optimizing the accuracy of concept classifiers and their effectiveness for detecting events. We demonstrate that by combining concepts into composite concepts, we can train more accurate classifiers for the concept vocabulary, which leads to improved zero-shot event detection. Moreover, we demonstrate that by using different logic operators, namely "AND" and "OR", we discover different types of composite concepts, which are complementary for zero-shot event detection. We perform a search for 20 events in 41K web videos from two test sets of the challenging TRECVID Multimedia Event Detection 2013 corpus. The experiments demonstrate the superior performance of the discovered composite concepts, compared to present-day alternatives, for zero-shot event detection.
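The composite concepts described above combine per-concept classifier scores with Boolean operators. A minimal sketch of how "AND" and "OR" compositions of concept scores might be realized with fuzzy logic follows; the function names, the example concepts, and the choice of product / probabilistic-sum as the fuzzy operators are illustrative assumptions, not the authors' implementation:

```python
# Sketch: fuzzy Boolean combination of pre-trained concept classifier
# scores, in the spirit of composite concepts. All names are hypothetical.

def composite_and(scores):
    """Fuzzy AND: high only if every member concept fires.
    Implemented as the product of per-concept probabilities."""
    result = 1.0
    for s in scores:
        result *= s
    return result

def composite_or(scores):
    """Fuzzy OR: high if any member concept fires.
    Probabilistic sum: 1 - prod(1 - s)."""
    result = 1.0
    for s in scores:
        result *= (1.0 - s)
    return 1.0 - result

# Per-concept scores for one video from a pre-trained vocabulary
# (illustrative values for a "grooming an animal" event).
video_scores = {"dog": 0.9, "grooming": 0.8, "bathtub": 0.1}

# Composite concept "dog AND grooming":
and_score = composite_and([video_scores["dog"], video_scores["grooming"]])

# Composite concept "grooming OR bathtub":
or_score = composite_or([video_scores["grooming"], video_scores["bathtub"]])
```

An "AND" composite rewards co-occurrence of its member concepts in a video, while an "OR" composite pools evidence from alternatives, which is one way the two operator types can yield complementary detectors.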


Cited By

  • (2023) "From Pixel to Patch: Synthesize Context-Aware Features for Zero-Shot Semantic Segmentation." IEEE Transactions on Neural Networks and Learning Systems 34(10):7689-7703. DOI: 10.1109/TNNLS.2022.3145962
  • (2023) "Using Multimodal Contrastive Knowledge Distillation for Video-Text Retrieval." IEEE Transactions on Circuits and Systems for Video Technology 33(10):5486-5497. DOI: 10.1109/TCSVT.2023.3257193
  • (2023) "Temporal Multimodal Graph Transformer With Global-Local Alignment for Video-Text Retrieval." IEEE Transactions on Circuits and Systems for Video Technology 33(3):1438-1453. DOI: 10.1109/TCSVT.2022.3207910
  • Show more cited by


    Published In

    ICMR '14: Proceedings of International Conference on Multimedia Retrieval
    April 2014
    564 pages
    ISBN:9781450327824
    DOI:10.1145/2578726

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Concept representation
    2. Event recognition

    Qualifiers

    • Tutorial
    • Research
    • Refereed limited

    Conference

    ICMR '14
    ICMR '14: International Conference on Multimedia Retrieval
    April 1 - 4, 2014
    Glasgow, United Kingdom

    Acceptance Rates

    ICMR '14 paper acceptance rate: 21 of 111 submissions (19%)
    Overall acceptance rate: 254 of 830 submissions (31%)
