Multi-level pyramid fusion for efficient stereo matching
Stereo matching is a key technology for many autonomous driving and robotics applications. Recently, methods based on convolutional neural networks have made substantial progress. However, it is still difficult to find accurate matching points in ...
Propagating prior information with transformer for robust visual object tracking
In recent years, the domain of visual object tracking has witnessed considerable advancements with the advent of deep learning methodologies. Siamese-based trackers have been pivotal, establishing a new architecture with a weight-shared backbone. ...
Underwater image enhancement based on weighted guided filter image fusion
An underwater image enhancement technique based on weighted guided filter image fusion is proposed to address challenges, including optical absorption and scattering, color distortion, and uneven illumination. The method consists of three stages: ...
Exploring multi-dimensional interests for session-based recommendation
Session-based recommendation (SBR) aims to recommend the next clicked item to users by mining the user’s interaction sequences in the current session. It has received widespread attention recently due to its excellent privacy protection ...
Discrete codebook collaborating with transformer for thangka image inpainting
Thangka, as a precious heritage of painting art, holds irreplaceable research value due to its richness in Tibetan history, religious beliefs, and folk culture. However, it is susceptible to partial damage and form distortion due to natural ...
HNQA: histogram-based descriptors for fast night-time image quality assessment
Taking high-quality images at night is challenging for many applications. Therefore, assessing the quality of night-time images (NTIs) is a significant area of research. Since there is no reference image for such images, night-time image ...
3D human pose estimation method based on multi-constrained dilated convolutions
In recent years, research on 2D-to-3D human pose estimation methods has gained increasing attention. However, problems with these methods, such as depth ambiguity and self-occlusion, still need to be addressed. To address these problems, we propose a 3D human ...
Remote sensing image cloud removal based on multi-scale spatial information perception
Remote sensing imagery is indispensable in diverse domains, including geographic information systems, climate monitoring, agricultural planning, and disaster management. Nonetheless, cloud cover can drastically degrade the utility and quality of ...
Anomaly detection in surveillance videos using Transformer with margin learning
Weakly supervised video anomaly detection (WSVAD) is an active and challenging research problem within the domains of image and video processing. In prior studies, WSVAD has typically been formulated as a multiple-instance ...
Exploiting multi-level consistency learning for source-free domain adaptation
Due to data privacy concerns, a more practical task known as Source-free Unsupervised Domain Adaptation (SFUDA) has gained significant attention recently. SFUDA adapts a pre-trained source model to the target domain without access to the source ...
Weakly-supervised temporal action localization using multi-branch attention weighting
Weakly-supervised temporal action localization aims to train an accurate and robust localization model using only video-level labels. Due to the lack of frame-level temporal annotations, existing weakly-supervised temporal action localization ...
MT-ASM: a multi-task attention strengthening model for fine-grained object recognition
Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and ...
PS-YOLO: a small object detector based on efficient convolution and multi-scale feature fusion
Compared with general object detection, research on small object detection has progressed slowly, mainly due to the need to learn appropriate features from limited information about small objects. This is coupled with difficulties such as information ...
Multimodal recommender system based on multi-channel counterfactual learning networks
Most multimodal recommender systems utilize multimodal content of user-interacted items as supplemental information to capture user preferences based on historical interactions without considering user-uninteracted items. In contrast, multimodal ...
Integrate encryption of multiple images based on a new hyperchaotic system and Baker map
Image encryption serves as a crucial means to safeguard information against unauthorized access during both transmission and storage phases. This paper introduces an integrated encryption algorithm tailored for multiple images, leveraging a novel ...
SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression
Visual object tracking can be divided into object classification and bounding-box regression tasks, but sharing a single correlation map between them leads to inaccuracy. Siamese trackers compute the correlation map by a cross-correlation operation with high ...
Exploring multi-level transformers with feature frame padding network for 3D human pose estimation
Recently, transformer-based architectures have achieved remarkable performance in 2D-to-3D lifting pose estimation. Despite these advancements, they still struggle to handle depth ambiguity, limited temporal information, ...
Adaptive B-spline curve fitting with minimal control points using an improved sparrow search algorithm for geometric modeling of aero-engine blades
In Industry 4.0 and advanced manufacturing, producing high-precision, complex products such as aero-engine blades involves sophisticated processes. Digital twin technology enables the creation of high-precision, real-time 3D models, optimizing ...
BSP-Net: automatic skin lesion segmentation improved by boundary enhancement and progressive decoding methods
Automatic skin lesion segmentation from dermoscopy images is of great significance in the early treatment of skin cancers, yet it remains challenging even for dermatologists due to inherent issues, i.e., considerable size, shape and color ...
Wacml: based on graph neural network for imbalanced node classification algorithm
The presence of a large number of robot accounts on social media has led to negative social impacts. In most cases, the distribution of robot accounts and real human accounts is imbalanced, resulting in insufficient representativeness and poor ...
3D model watermarking using surface integrals of generated random vector fields
We propose a new semi-blind semi-fragile watermarking algorithm for authenticating triangulated 3D models using the surface integrals of generated random vector fields. Watermark data is embedded into the flux of a vector field across the model’s ...
Contour-assistance-based video matting localization
Video matting is a technique used to replace foreground objects in video frames by predicting their alpha matte. Originally developed for film special effects, advertisements, and live streaming, video matting can also be exploited for malicious ...
Cvstgan: A Controllable Generative Adversarial Network for Video Style Transfer of Chinese Painting
Style transfer aims to apply the stylistic characteristics of a reference image onto a target image or video. Existing studies on style transfer suffer from either fixed style without adjustability or unclear stylistic patterns in output results. ...
Triple fusion and feature pyramid decoder for RGB-D semantic segmentation
Current RGB-D semantic segmentation networks incorporate depth information as an extra modality and merge RGB and depth features using methods such as equal-weighted concatenation or simple fusion strategies. However, these methods hinder the ...
Large scale multimodal fashion care recognition
Smart fashion is reshaping people’s lives and affecting their choices and outfits. Existing computer-vision-enabled fashion technology has covered many aspects, such as fashion detection, fashion recognition, fashion segmentation, virtual ...
Physical-prior-guided single image dehazing network via unpaired contrastive learning
Image dehazing aims to restore high-fidelity clear images from hazy ones. It has wide applications in many intelligent image analysis systems in computer vision. Many prior-based and learning-based methods have already made significant ...
Multi-scale motion contrastive learning for self-supervised skeleton-based action recognition
People process things and express feelings through actions. Action recognition has been widely studied, yet it remains under-explored. Traditional self-supervised skeleton-based action recognition methods focus on joint point features, ignoring the ...
LLR-MVSNet: a lightweight network for low-texture scene reconstruction
In recent years, learning-based MVS methods have achieved excellent performance compared with traditional methods. However, these methods still have notable shortcomings, such as the low efficiency of traditional convolutional networks and simple ...
Automatic lymph node segmentation using deep parallel squeeze & excitation and attention Unet
Automatic lymph node (LN) segmentation and detection are critical for cancer staging. In clinical practice, computed tomography (CT) and positron emission tomography (PET) imaging are used to detect abnormal LNs. Yet this is still a difficult task due to the ...