IJCV: Vol 132, No 3

Volume 132, Issue 3, Mar 2024

Publisher:

Kluwer Academic Publishers
101 Philip Drive, Assinippi Park, Norwell, MA
United States

ISSN: 0920-5691

research-article

Correspondence Distillation from NeRF-Based GAN

Pages 611–631 https://doi.org/10.1007/s11263-023-01903-w

Abstract

The neural radiance field (NeRF) has shown promising results in preserving the fine details of objects and scenes. However, unlike explicit shape representations, e.g., mesh, it remains an open problem to build dense correspondences across ...

news

Editor’s Note: Special Issue on Physics-Based Vision Meets Deep Learning

Page 632 https://doi.org/10.1007/s11263-023-01897-5

research-article

Dual Graph Networks for Pose Estimation in Crowded Scenes

Pages 633–653 https://doi.org/10.1007/s11263-023-01901-y

Abstract

Pose estimation in crowded scenes is key to understanding human behavior in real-life applications. Most existing CNN-based pose estimation methods often depend on the appearance of visible parts as cues to localize human joints. However, ...

research-article

Source-Free Domain Adaptation via Target Prediction Distribution Searching

Pages 654–672 https://doi.org/10.1007/s11263-023-01892-w

Abstract

Existing Source-Free Domain Adaptation (SFDA) methods typically adopt the feature distribution alignment paradigm via mining auxiliary information (e.g., pseudo-labelling, source domain data generation). However, they are largely limited due to ...

research-article

Multi-Modal Meta-Transfer Fusion Network for Few-Shot 3D Model Classification

Pages 673–688 https://doi.org/10.1007/s11263-023-01905-8

Abstract

Nowadays, driven by the increasing concern on 3D techniques, resulting in the large-scale 3D data, 3D model classification has attracted enormous attention from both research and industry communities. Most of the current methods highly depend on ...

research-article

Are Vision Transformers Robust to Spurious Correlations?

Pages 689–709 https://doi.org/10.1007/s11263-023-01916-5

Abstract

Deep neural networks may be susceptible to learning spurious correlations that hold on average but not in atypical test samples. As with the recent emergence of vision transformer (ViT) models, it remains unexplored how spurious correlations are ...

research-article

Semantic Image Matting: General and Specific Semantics

Pages 710–730 https://doi.org/10.1007/s11263-023-01907-6

Abstract

Although conventional matting formulation can separate foreground from background in fractional occupancy which can be caused by highly transparent objects, complex foreground (e.g., net or tree), and objects containing very fine details (e.g., ...

research-article

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Pages 731–749 https://doi.org/10.1007/s11263-023-01918-3

Abstract

Pre-trained vision transformers have strong representations benefit to various downstream tasks. Recently many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only 1% extra ...

research-article

Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

Pages 750–775 https://doi.org/10.1007/s11263-023-01919-2

Abstract

Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve pixel-level localization. While ...

research-article

DeepFTSG: Multi-stream Asymmetric USE-Net Trellis Encoders with Shared Decoder Feature Fusion Architecture for Video Motion Segmentation

Pages 776–804 https://doi.org/10.1007/s11263-023-01910-x

Abstract

Discriminating salient moving objects against complex, cluttered backgrounds, with occlusions and challenging environmental conditions like weather and illumination, is essential for stateful scene perception in autonomous systems. We propose a ...

research-article

SignParser: An End-to-End Framework for Traffic Sign Understanding

Pages 805–821 https://doi.org/10.1007/s11263-023-01912-9

Abstract

In intelligent transportation systems, parsing traffic signs and transmitting traffic information to humans is an urgent need. However, despite the success achieved in the detection and recognition of low-level circular or triangular traffic signs, ...

research-article

MixStyle Neural Networks for Domain Generalization and Adaptation

Pages 822–836 https://doi.org/10.1007/s11263-023-01913-8

Abstract

Neural networks do not generalize well to unseen data with domain shifts—a longstanding problem in machine learning and AI. To overcome the problem, we propose MixStyle, a simple plug-and-play, parameter-free module that can improve domain ...

research-article

Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization

Pages 837–853 https://doi.org/10.1007/s11263-023-01911-w

Abstract

Domain shift widely exists in the visual world, while modern deep neural networks commonly suffer from severe performance degradation under domain shift due to poor generalization ability, which limits real-world applications. The domain shift ...

research-article

In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond

Pages 854–871 https://doi.org/10.1007/s11263-023-01879-7

Abstract

Predicting human’s gaze from egocentric videos serves as a critical role for human intention understanding in daily activities. In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze ...

research-article

SOTVerse: A User-Defined Task Space of Single Object Tracking

Pages 872–930 https://doi.org/10.1007/s11263-023-01908-5

Abstract

Single object tracking (SOT) research falls into a cycle—trackers perform well on most benchmarks but quickly fail in challenging scenarios, causing researchers to doubt the insufficient data content and take more effort to construct larger ...

research-article

3D Adversarial Augmentations for Robust Out-of-Domain Predictions

Pages 931–963 https://doi.org/10.1007/s11263-023-01914-7

Abstract

Since real-world training datasets cannot properly sample the long tail of the underlying data distribution, corner cases and rare out-of-domain samples can severely hinder the performance of state-of-the-art models. This problem becomes even more ...

research-article

Inferring Attention Shifts for Salient Instance Ranking

Pages 964–986 https://doi.org/10.1007/s11263-023-01906-7

Abstract

The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and ...

research-article

Indoor Obstacle Discovery on Reflective Ground via Monocular Camera

Pages 987–1007 https://doi.org/10.1007/s11263-023-01925-4

Abstract

Visual obstacle discovery is a key step towards autonomous navigation of indoor mobile robots. Successful solutions have many applications in multiple scenes. One of the exceptions is the reflective ground. In this case, the reflections on the ...

International Journal of Computer Vision
