Composite Concept Discovery for Zero-Shot Video Event Detection

Published: 01 April 2014

Abstract

We consider automated detection of events in video without the use of any visual training examples. A common approach is to represent videos as classification scores obtained from a vocabulary of pre-trained concept classifiers. Where others construct the vocabulary by training individual concept classifiers, we propose to train classifiers for combinations of concepts composed by Boolean logic operators. We call these concept combinations composite concepts and contribute an algorithm that automatically discovers them from existing video-level concept annotations. We discover composite concepts by jointly optimizing the accuracy of concept classifiers and their effectiveness for detecting events. We demonstrate that by combining concepts into composite concepts, we can train more accurate classifiers for the concept vocabulary, which leads to improved zero-shot event detection. Moreover, we demonstrate that by using different logic operators, namely "AND" and "OR", we discover different types of composite concepts, which are complementary for zero-shot event detection. We perform a search for 20 events in 41K web videos from two test sets of the challenging TRECVID Multimedia Event Detection 2013 corpus. The experiments demonstrate the superior performance of the discovered composite concepts, compared to present-day alternatives, for zero-shot event detection.
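The composite concepts described above combine per-concept classifier scores with Boolean operators. A minimal sketch of how "AND" and "OR" compositions of concept scores might be realized with fuzzy logic follows; the function names, the example concepts, and the choice of product / probabilistic-sum as the fuzzy operators are illustrative assumptions, not the authors' implementation:

```python
# Sketch: fuzzy Boolean combination of pre-trained concept classifier
# scores, in the spirit of composite concepts. All names are hypothetical.

def composite_and(scores):
    """Fuzzy AND: high only if every member concept fires.
    Implemented as the product of per-concept probabilities."""
    result = 1.0
    for s in scores:
        result *= s
    return result

def composite_or(scores):
    """Fuzzy OR: high if any member concept fires.
    Probabilistic sum: 1 - prod(1 - s)."""
    result = 1.0
    for s in scores:
        result *= (1.0 - s)
    return 1.0 - result

# Per-concept scores for one video from a pre-trained vocabulary
# (illustrative values for a "grooming an animal" event).
video_scores = {"dog": 0.9, "grooming": 0.8, "bathtub": 0.1}

# Composite concept "dog AND grooming":
and_score = composite_and([video_scores["dog"], video_scores["grooming"]])

# Composite concept "grooming OR bathtub":
or_score = composite_or([video_scores["grooming"], video_scores["bathtub"]])
```

An "AND" composite rewards co-occurrence of its member concepts in a video, while an "OR" composite pools evidence from alternatives, which is one way the two operator types can yield complementary detectors.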


Cited By

  • (2023) "From Pixel to Patch: Synthesize Context-Aware Features for Zero-Shot Semantic Segmentation." IEEE Transactions on Neural Networks and Learning Systems 34(10):7689-7703. DOI: 10.1109/TNNLS.2022.3145962
  • (2023) "Using Multimodal Contrastive Knowledge Distillation for Video-Text Retrieval." IEEE Transactions on Circuits and Systems for Video Technology 33(10):5486-5497. DOI: 10.1109/TCSVT.2023.3257193
  • (2023) "Temporal Multimodal Graph Transformer With Global-Local Alignment for Video-Text Retrieval." IEEE Transactions on Circuits and Systems for Video Technology 33(3):1438-1453. DOI: 10.1109/TCSVT.2022.3207910
  • Show more cited by


    Published In

    ICMR '14: Proceedings of International Conference on Multimedia Retrieval
    April 2014
    564 pages
    ISBN:9781450327824
    DOI:10.1145/2578726

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Concept representation
    2. Event recognition

    Qualifiers

    • Tutorial
    • Research
    • Refereed limited

    Conference

    ICMR '14
    ICMR '14: International Conference on Multimedia Retrieval
    April 1 - 4, 2014
    Glasgow, United Kingdom

    Acceptance Rates

    ICMR '14 paper acceptance rate: 21 of 111 submissions (19%)
    Overall acceptance rate: 254 of 830 submissions (31%)
