- research-article, October 2024
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
- Zhengze Xu,
- Mengting Chen,
- Zhao Wang,
- Linyu Xing,
- Zhonghua Zhai,
- Nong Sang,
- Jinsong Lan,
- Shuai Xiao,
- Changxin Gao
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3199–3208, https://doi.org/10.1145/3664647.3680836
Video try-on is challenging and has not been well tackled in previous works. The main obstacle lies in preserving the clothing details and modeling the coherent motions simultaneously. Faced with those difficulties, we address video try-on by proposing a ...
- Article, November 2024
EfficientMatting: Bilateral Matting Network for Real-Time Human Matting
Abstract: Recent human matting methods typically suffer from two drawbacks: 1) high computation overhead caused by multiple stages, and 2) limited practical application due to the need for auxiliary guidance (e.g., trimap, mask, or background). To address ...
- research-article, July 2024
Query-centric distance modulator for few-shot classification
Abstract: Few-shot classification (FSC) is a highly challenging task, as only a small number of labeled samples are available when identifying new categories. Distance metric learning-based methods have emerged as a prominent approach to FSC, which ...
Highlights:
- A plug-and-play query-centric distance modulator for distance metric learning-based FSC methods.
- Weight generation through data inconsistency within each channel from different classes.
- Decoupled from the process of feature ...
- research-article, March 2024
HyRSM++: Hybrid relation guided temporal set matching for few-shot action recognition
Abstract: Few-shot action recognition is a challenging but practical problem aiming to learn a model that can be easily adapted to identify new action categories with only a few labeled samples. However, existing attempts still suffer from two drawbacks: (...
Highlights:
- A new temporal coherence regularization on videos is proposed.
- Capturing the intra- and inter-relations inside the episodic task.
- Reformulating the query-support metric as a set matching problem.
- research-article, January 2024
DIMGNet: A Transformer-Based Network for Pedestrian Reidentification With Multi-Granularity Information Mutual Gain
- Runmin Wang,
- Zhenlin Zhu,
- Yanbin Zhu,
- Hua Chen,
- Yongzhong Liao,
- Ziyu Zhu,
- Yajun Ding,
- Changxin Gao,
- Nong Sang
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 6513–6528, https://doi.org/10.1109/TMM.2024.3352896
Pedestrian reidentification (ReID) is a challenging task that involves identifying and retrieving specific pedestrians across different cameras and scenes. This problem has significant implications for security surveillance, and has thus received ...
- research-article, May 2024
Lookup table meets local laplacian filter: pyramid reconstruction network for tone mapping
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 2510, Pages 57558–57569
Tone mapping aims to convert high dynamic range (HDR) images to low dynamic range (LDR) representations, a critical task in the camera imaging pipeline. In recent years, 3-Dimensional Look-Up Table (3D LUT) based methods have gained attention due to ...
- research-article, December 2023
Difficulty-Aware Dynamic Network for Lightweight Exposure Correction
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 6, Pages 5033–5048, https://doi.org/10.1109/TCSVT.2023.3340506
Recently, deep learning-based methods have been successfully applied to the field of exposure correction. However, most of the existing methods treat different locations of an image in the same way, ignoring the inhomogeneous recovery difficulty and ...
- Article, November 2023
Frequency Information Matters for Image Matting
Abstract: Image matting aims to estimate the opacity of foreground objects in order to accurately extract them from the background. Existing methods are only concerned with RGB features to obtain alpha mattes, limiting the perception of local tiny details. ...
- Article, November 2023
Portrait Matting via Semantic and Detail Guidance
Abstract: Portrait matting is a challenging computer vision task that aims to estimate the per-pixel opacity of the foreground human regions. To produce high-quality alpha mattes, the majority of available methods employ a user-supplied trimap as an ...
- research-article, February 2024
Focusing on features with higher importance degree: A significance-wise mechanism for image restoration
Abstract: With the aim of retrieving high-quality images from corrupted versions, image restoration meets the extensive demand of application scenarios. State-of-the-art methods solve this problem by means of designing convolution blocks in multistage ...
- research-article, October 2023
CLIP-guided Prototype Modulating for Few-shot Action Recognition
International Journal of Computer Vision (IJCV), Volume 132, Issue 6, Pages 1899–1912, https://doi.org/10.1007/s11263-023-01917-4
Abstract: Learning from large-scale contrastive language-image pre-training like CLIP has shown remarkable success in a wide range of downstream tasks recently, but it is still under-explored on the challenging few-shot action recognition (FSAR) task. In ...
- Article, December 2023
Self-supervised Low-Light Image Enhancement via Histogram Equalization Prior
Abstract: Deep learning-based methods for low-light image enhancement have achieved remarkable success. However, the requirement of enormous paired real data limits the generality of these models. Although there have been a few attempts in training low-...
- research-article, October 2023
Self-Supervised Learning from Untrimmed Videos via Hierarchical Consistency
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 10, Pages 12408–12426, https://doi.org/10.1109/TPAMI.2023.3273415
Natural untrimmed videos provide rich visual content for self-supervised learning. Yet most previous efforts to learn spatio-temporal representations rely on manually trimmed videos, such as Kinetics dataset (Carreira and Zisserman 2017), resulting in ...
- research-article, August 2023
Cross-domain few-shot action recognition with unlabeled videos
Computer Vision and Image Understanding (CVIU), Volume 233, Issue C, https://doi.org/10.1016/j.cviu.2023.103737
Abstract: Current few-shot action recognition approaches have achieved impressive performance using only a few labeled examples. However, they usually assume the base (train) and target (test) videos typically come from the same domain, which may limit ...
Highlights:
- Few-shot action recognition methods perform poorly in cross-domain situations.
- Self-supervised learning can alleviate domain shift.
- Temporal modeling is important in the cross-domain few-shot action setting.
- This is the first ...
- research-article, July 2023
An Adaptive Post-Processing Network With the Global-Local Aggregation for Semantic Segmentation
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 2, Pages 1159–1173, https://doi.org/10.1109/TCSVT.2023.3292156
Current semantic segmentation methods mainly focus on modeling the context of the global image to obtain high-quality segmentation results. However, they ignore the role of local image patches, which contain complementary and effective context ...
- research-article, July 2023
Improving the Generalization of MAML in Few-Shot Classification via Bi-Level Constraint
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 33, Issue 7, Pages 3284–3295, https://doi.org/10.1109/TCSVT.2022.3232717
Few-shot classification (FSC), which aims to identify novel classes in the presence of a few labeled samples, has drawn vast attention in recent years. One of the representative few-shot classification methods is model-agnostic meta-learning (MAML), which ...
- research-article, June 2023
Semantic segmentation via pixel‐to‐center similarity calculation
CAAI Transactions on Intelligence Technology (CIT2), Volume 9, Issue 1, Pages 87–100, https://doi.org/10.1049/cit2.12245
Abstract: Since the fully convolutional network has achieved great success in semantic segmentation, lots of works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two ...
- research-article, June 2023
DMRNet++: Learning Discriminative Features With Decoupled Networks and Enriched Pairs for One-Step Person Search
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 6, Pages 7319–7337, https://doi.org/10.1109/TPAMI.2022.3221079
Person search aims at localizing and recognizing query persons from raw video frames, which is a combination of two sub-tasks, i.e., pedestrian detection and person re-identification. The dominant fashion is termed as the one-step person search that ...
- research-article, May 2023
Camera distance helps 3D hand pose estimated from a single RGB image
Abstract: Most existing methods for RGB hand pose estimation use root-relative 3D coordinates for supervision. However, such supervision neglects the distance between the camera and the object (i.e., the hand). The camera distance is especially ...
Highlights:
- The same hand pose projected different 2D shapes due to the camera distance.
- ...
- research-article, March 2023
MAR: Masked Autoencoders for Efficient Action Recognition
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 218–233, https://doi.org/10.1109/TMM.2023.3263288
Standard approaches for video action recognition usually operate on full input videos, which is inefficient due to the widespread spatio-temporal redundancy in videos. The recent progress in masked video modelling, specifically VideoMAE, has shown the ...