It is our great pleasure to welcome you to the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild (CoVieW'18), held in Seoul, Korea on October 22, 2018, in conjunction with ACM Multimedia 2018. The workshop aims to advance comprehensive understanding of untrimmed videos, with a particular emphasis on joint action and scene recognition. The workshop encourages researchers to participate in our challenge and to report their results.
The workshop consists of two tracks. The first track invites papers that address video action and scene recognition or related topics. The second track is the challenge track, which focuses on evaluating multi-task action and scene recognition on a new untrimmed video dataset called the Multi-task Action and Scene Recognition dataset. Several papers were submitted to our workshop, and each paper was reviewed by at least two technical program committee members. Three papers were accepted for the first track, and three papers were accepted for the second (challenge) track. The accepted papers will be presented at the workshop.
Proceeding Downloads
Deep Video Understanding: Representation Learning, Action Recognition, and Language Generation
Analyzing videos has been one of the fundamental problems of computer vision and multimedia analysis for decades. The task is very challenging, as video is an information-intensive medium with large variations and complexities. Thanks to the recent development ...
Actor and Observer: Joint Modeling of First and Third-Person Videos
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-...
Explore Multi-Step Reasoning in Video Question Answering
This invited talk gives a more detailed presentation of a paper accepted at ACM-MM 2018: Video question answering (VideoQA) always involves visual reasoning. When answering questions composed of multiple logical correlations, models need ...
Joint Object Tracking and Segmentation with Independent Convolutional Neural Networks
Object tracking and segmentation are important research topics in computer vision. They provide the trajectory and boundary of an object based on its appearance and shape features. Most studies on tracking and segmentation focus on encoding methods ...
Stereo Vision aided Image Dehazing using Deep Neural Network
Image deterioration due to haze is one of the factors that degrade the performance of computer vision algorithms. The haze component absorbs and scatters the light reflected from the object, distorting the original irradiance. The more the distance ...
Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos
While recognizing human actions and surrounding scenes addresses different aspects of video understanding, they have strong correlations that can be used to complement the singular information of each other. In this paper, we propose an approach for ...
Multi-task Joint Learning for Videos in the Wild
Most of the conventional state-of-the-art methods for video analysis achieve outstanding performance by combining two or more different inputs, e.g. an RGB image, a motion image, or an audio signal, in a two-stream manner. Although these approaches ...
New Feature-level Video Classification via Temporal Attention Model
CoVieW 2018 is a new challenge that aims at simultaneous scene and action recognition for untrimmed video [1]. In the challenge, frame-level video features extracted by a pre-trained deep convolutional neural network (CNN) are provided for video-level ...
Video Understanding via Convolutional Temporal Pooling Network and Multimodal Feature Fusion
In this paper, we present a new end-to-end convolutional neural network architecture for video classification, and apply the model to action and scene recognition in untrimmed videos for the Challenge on Comprehensive Video Understanding in the Wild. ...