DOI: 10.1145/3265987
CoVieW'18: Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild
ACM 2018 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
MM '18: ACM Multimedia Conference, Seoul, Republic of Korea, 22 October 2018
ISBN:
978-1-4503-5976-4
Published:
15 October 2018
Abstract

It is our great pleasure to welcome you to the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild (CoVieW'18), held in Seoul, Korea on October 22, 2018, in conjunction with ACM Multimedia 2018. The workshop addresses comprehensive understanding of untrimmed videos, with a particular emphasis on joint action and scene recognition. It encourages researchers to participate in our challenge and to report their results.

The workshop consists of two tracks. The first track invites papers that address video action and scene recognition or related topics. The second track is the challenge track, which focuses on evaluating multi-task action and scene recognition on a new untrimmed video dataset, the Multi-task Action and Scene Recognition dataset. Several papers were submitted to our workshop, and each was reviewed by at least two technical program committee members. Three papers were accepted for the first track and three for the challenge track. The accepted papers will be presented at the workshop.

SESSION: Keynote & Invited Talks
keynote
Deep Video Understanding: Representation Learning, Action Recognition, and Language Generation

Analyzing videos has been one of the fundamental problems of computer vision and multimedia analysis for decades. The task is very challenging as video is an information-intensive medium with large variations and complexities. Thanks to the recent development ...

invited-talk
Actor and Observer: Joint Modeling of First and Third-Person Videos

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-...

invited-talk
Explore Multi-Step Reasoning in Video Question Answering

This invited talk is a more detailed presentation of the paper accepted at ACM-MM 2018: Video question answering (VideoQA) always involves visual reasoning. When answering questions composed of multiple logic correlations, models need ...

SESSION: Session 1: Regular Track
research-article
Joint Object Tracking and Segmentation with Independent Convolutional Neural Networks

Object tracking and segmentation are important research topics in computer vision. They provide the trajectory and boundary of an object based on its appearance and shape features. Most studies on tracking and segmentation focus on encoding methods ...

research-article
Stereo Vision aided Image Dehazing using Deep Neural Network

Image deterioration due to haze is one of the factors that degrade the performance of computer vision algorithms. The haze component absorbs and reflects the light reflected from the object, distorting the original irradiance. The more the distance ...

research-article
Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos

While recognizing human actions and surrounding scenes addresses different aspects of video understanding, they have strong correlations that can be used to complement the singular information of each other. In this paper, we propose an approach for ...

SESSION: Session 2: Challenge Track
research-article
Multi-task Joint Learning for Videos in the Wild

Most of the conventional state-of-the-art methods for video analysis achieve outstanding performance by combining two or more different inputs, e.g. an RGB image, a motion image, or an audio signal, in a two-stream manner. Although these approaches ...

research-article
New Feature-level Video Classification via Temporal Attention Model

CoVieW 2018 is a new challenge that aims at simultaneous scene and action recognition for untrimmed video [1]. In the challenge, frame-level video features extracted by a pre-trained deep convolutional neural network (CNN) are provided for video-level ...

research-article
Video Understanding via Convolutional Temporal Pooling Network and Multimodal Feature Fusion

In this paper, we present a new end-to-end convolutional neural network architecture for video classification, and apply the model to action and scene recognition in untrimmed videos for the Challenge on Comprehensive Video Understanding in the Wild. ...

Contributors
  • Yonsei University
  • UC Merced
  • Yonsei University
  • Hanyang University
  • National Taiwan University of Science and Technology
