Correspondence Distillation from NeRF-Based GAN
The neural radiance field (NeRF) has shown promising results in preserving the fine details of objects and scenes. However, unlike explicit shape representations e.g., mesh, it remains an open problem to build dense correspondences across ...
Dual Graph Networks for Pose Estimation in Crowded Scenes
Pose estimation in crowded scenes is key to understanding human behavior in real-life applications. Most existing CNN-based pose estimation methods often depend on the appearance of visible parts as cues to localize human joints. However, ...
Source-Free Domain Adaptation via Target Prediction Distribution Searching
Existing Source-Free Domain Adaptation (SFDA) methods typically adopt the feature distribution alignment paradigm via mining auxiliary information (eg., pseudo-labelling, source domain data generation). However, they are largely limited due to ...
Are Vision Transformers Robust to Spurious Correlations?
Deep neural networks may be susceptible to learning spurious correlations that hold on average but not in atypical test samples. As with the recent emergence of vision transformer (ViT) models, it remains unexplored how spurious correlations are ...
Semantic Image Matting: General and Specific Semantics
Although conventional matting formulation can separate foreground from background in fractional occupancy which can be caused by highly transparent objects, complex foreground (e.g., net or tree), and objects containing very fine details (e.g., ...
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve pixel-level localization. While ...
DeepFTSG: Multi-stream Asymmetric USE-Net Trellis Encoders with Shared Decoder Feature Fusion Architecture for Video Motion Segmentation
Discriminating salient moving objects against complex, cluttered backgrounds, with occlusions and challenging environmental conditions like weather and illumination, is essential for stateful scene perception in autonomous systems. We propose a ...
SignParser: An End-to-End Framework for Traffic Sign Understanding
In intelligent transportation systems, parsing traffic signs and transmitting traffic information to humans is an urgent need. However, despite the success achieved in the detection and recognition of low-level circular or triangular traffic signs,...
MixStyle Neural Networks for Domain Generalization and Adaptation
Neural networks do not generalize well to unseen data with domain shifts—a longstanding problem in machine learning and AI. To overcome the problem, we propose MixStyle, a simple plug-and-play, parameter-free module that can improve domain ...
Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization
Domain shift widely exists in the visual world, while modern deep neural networks commonly suffer from severe performance degradation under domain shift due to poor generalization ability, which limits real-world applications. The domain shift ...
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond
Predicting human’s gaze from egocentric videos serves as a critical role for human intention understanding in daily activities. In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze ...
SOTVerse: A User-Defined Task Space of Single Object Tracking
Single object tracking (SOT) research falls into a cycle—trackers perform well on most benchmarks but quickly fail in challenging scenarios, causing researchers to doubt the insufficient data content and take more effort to construct larger ...
3D Adversarial Augmentations for Robust Out-of-Domain Predictions
- Alexander Lehner,
- Stefano Gasperini,
- Alvaro Marcos-Ramiro,
- Michael Schmidt,
- Nassir Navab,
- Benjamin Busam,
- Federico Tombari
Since real-world training datasets cannot properly sample the long tail of the underlying data distribution, corner cases and rare out-of-domain samples can severely hinder the performance of state-of-the-art models. This problem becomes even more ...
Inferring Attention Shifts for Salient Instance Ranking
The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and ...
Indoor Obstacle Discovery on Reflective Ground via Monocular Camera
Visual obstacle discovery is a key step towards autonomous navigation of indoor mobile robots. Successful solutions have many applications in multiple scenes. One of the exceptions is the reflective ground. In this case, the reflections on the ...