Physical Representation Learning and Parameter Identification from Video Using Differentiable Physics
Representation learning for video is gaining increasing attention in computer vision. For instance, video prediction models enable activity and scene forecasting or vision-based planning and control. In this article, we investigate ...
Inferring Bias and Uncertainty in Camera Calibration
Accurate camera calibration is a precondition for many computer vision applications. Calibration errors, such as wrong model assumptions or imprecise parameter estimation, can deteriorate a system’s overall performance, making the reliable ...
Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100
- Dima Damen
- Hazel Doughty
- Giovanni Maria Farinella
- Antonino Furnari
- Evangelos Kazakos
- Jian Ma
- Davide Moltisanti
- Jonathan Munro
- Toby Perrett
- Will Price
- Michael Wray
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term ...
Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search
Even though most existing monocular 3D human pose estimation methods achieve very competitive performance, they are limited in estimating heterogeneous human body parts with the same decoder architecture. In this work, we present an approach to ...
Dual-Attention-Guided Network for Ghost-Free High Dynamic Range Imaging
Ghosting artifacts caused by moving objects and misalignments are a key challenge in constructing high dynamic range (HDR) images. Current methods first register the input low dynamic range (LDR) images using optical flow before merging them. This ...
Distribution-Aware Margin Calibration for Semantic Segmentation in Images
The Jaccard index, also known as Intersection-over-Union (IoU), is one of the most critical evaluation metrics in image semantic segmentation. However, direct optimization of the IoU score is very difficult because the learning objective is neither ...
View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose
- Ting Liu
- Jennifer J. Sun
- Long Zhao
- Jiaping Zhao
- Liangzhe Yuan
- Yuxiao Wang
- Liang-Chieh Chen
- Florian Schroff
- Hartwig Adam
Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people. However, cameras generally capture human poses in 2D as images and videos, which can have significant appearance variations across ...
Joint Bilateral-Resolution Identity Modeling for Cross-Resolution Person Re-Identification
- Wei-Shi Zheng
- Jincheng Hong
- Jiening Jiao
- Ancong Wu
- Xiatian Zhu
- Shaogang Gong
- Jiayin Qin
- Jianhuang Lai
Person images captured by public surveillance cameras often have low resolutions (LRs), along with uncontrolled pose variations, background clutter and occlusion. These issues cause the resolution mismatch problem when matched with high-resolution ...
Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration
Recently, much attention has been paid to neural architecture search (NAS), which aims to outperform manually designed neural architectures on high-level vision recognition tasks. Inspired by this success, we attempt to leverage NAS ...
Semantic Edge Detection with Diverse Deep Supervision
Semantic edge detection (SED), which aims at jointly extracting edges as well as their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition. SED naturally ...