
DOI: 10.1145/3343031.3356082

Relation Understanding in Videos: A Grand Challenge Overview

Published: 15 October 2019

Abstract

The ACM Multimedia 2019 Video Relation Understanding Challenge is the first grand challenge aimed at pushing video content analysis to the relational and structural level. This year, the challenge asks participants to explore and develop innovative algorithms that detect object entities and the relations between them in a large-scale user-generated video dataset. These tasks will advance the foundations of future visual systems capable of performing complex inferences. This paper presents an overview of the grand challenge, including its background, detailed descriptions of the three proposed tasks, the corresponding datasets for training, validation, and testing, and the evaluation process.
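The detection tasks revolve around relation triplets that are grounded both spatially and temporally in a video. As a rough illustration only (the names and fields below are hypothetical, not the challenge's official annotation schema), such an instance can be modeled as a triplet plus a frame range and per-frame bounding boxes for the two entities, with temporal overlap as one simple ingredient of matching a prediction against ground truth:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Bounding box as (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]

@dataclass
class RelationInstance:
    """One relation triplet grounded in a video clip.

    Illustrative structure: a <subject, predicate, object> triplet,
    the frame range over which it holds, and per-frame boxes
    (trajectories) for both entities.
    """
    subject_label: str
    predicate: str
    object_label: str
    begin_frame: int                                  # inclusive
    end_frame: int                                    # exclusive
    subject_traj: Dict[int, Box] = field(default_factory=dict)
    object_traj: Dict[int, Box] = field(default_factory=dict)

    def triplet(self) -> Tuple[str, str, str]:
        return (self.subject_label, self.predicate, self.object_label)

def temporal_iou(a: RelationInstance, b: RelationInstance) -> float:
    """Intersection-over-union of the two instances' frame ranges."""
    inter = max(0, min(a.end_frame, b.end_frame) - max(a.begin_frame, b.begin_frame))
    union = (a.end_frame - a.begin_frame) + (b.end_frame - b.begin_frame) - inter
    return inter / union if union > 0 else 0.0

# Example: a ground-truth "dog chases child" over frames 30-90,
# and a prediction of the same triplet over frames 60-120.
gt = RelationInstance("dog", "chase", "child", 30, 90)
pred = RelationInstance("dog", "chase", "child", 60, 120)
print(pred.triplet() == gt.triplet())   # the triplets match
print(temporal_iou(gt, pred))           # overlap 30 frames / union 90
```

Actual evaluation in this line of work additionally scores spatial overlap between the predicted and ground-truth trajectories (a volume IoU over the overlapping frames); the sketch above shows only the temporal component.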





Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN:9781450368896
DOI:10.1145/3343031


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. action recognition
  2. object detection
  3. spatio-temporal
  4. video content analysis
  5. visual relation

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation Singapore

Conference

MM '19

Acceptance Rates

MM '19 paper acceptance rate: 252 of 936 submissions (27%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)


