Compositional Structure Learning for Action Understanding
Abstract
The focus of the action understanding literature has predominantly been classification; however, many applications, such as mobile robotics and video search, demand richer action understanding spanning classification, localization, and detection. In this paper, we propose a compositional model that leverages a new mid-level representation called compositional trajectories and a locally articulated spatiotemporal deformable parts model (LASTDPM) for full action understanding. Our method is advantageous in capturing the variable structure of dynamic human activity over a long range. First, the compositional trajectories capture long-range, frequently co-occurring groups of trajectories in space-time and represent them in discriminative hierarchies, where human motion is largely separated from camera motion; second, LASTDPM learns a structured model with multi-layer deformable parts to capture multiple levels of articulated motion. We implement our method and demonstrate state-of-the-art performance on all three problems: action detection, localization, and recognition.
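The abstract's idea of grouping "frequently co-occurring groups of trajectories in space-time" into hierarchies can be illustrated with a minimal sketch. The toy data, the mean-pointwise-distance affinity, and the agglomerative clustering below are all assumptions for illustration; the paper's actual compositional-trajectory construction and its discriminative learning are more involved.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical illustration of the "co-occurring trajectory groups" idea:
# trajectories whose points stay close in space-time are grouped, and a
# hierarchical clustering gives coarse-to-fine (hierarchical) groupings.

rng = np.random.default_rng(0)

# Toy data: N trajectories, each a T-step sequence of (x, y) points,
# scattered around random spatial anchors.
N, T = 40, 15
anchors = rng.uniform(0, 100, size=(N, 2))
trajectories = anchors[:, None, :] + rng.normal(0, 2.0, size=(N, T, 2))

# Affinity: mean pointwise distance between trajectory pairs over time.
mean_dist = np.linalg.norm(
    trajectories[:, None, :, :] - trajectories[None, :, :, :], axis=-1
).mean(axis=-1)                                   # shape (N, N)

# Agglomerative clustering over the condensed distance matrix; cutting
# the dendrogram at different heights yields a hierarchy of groups.
condensed = mean_dist[np.triu_indices(N, k=1)]
Z = linkage(condensed, method="average")
groups = fcluster(Z, t=10.0, criterion="distance")

print("trajectory group ids:", groups)
```

Cutting the linkage at several thresholds (rather than the single `t=10.0` used here) would produce the multi-level grouping that a hierarchical representation implies.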
- Publication: arXiv e-prints
- Pub Date: October 2014
- DOI: 10.48550/arXiv.1410.5861
- arXiv: arXiv:1410.5861
- Bibcode: 2014arXiv1410.5861X
- Keywords: Computer Science - Computer Vision and Pattern Recognition
- E-Print: 13 pages