Stereo-RSSF: stereo robust sparse scene-flow estimation
Scene-flow (SF) estimation is considered to be one of the most fundamental problems in scene understanding and autonomous control. The majority of the existing methods adopted for SF estimation suffer from a lack of robustness in some environments and ...
Boundary-aware small object detection with attention and interaction
Object detection is a critical technology for the intelligent analytical processing of images captured by drones. The objects usually come in various scales and can be extremely small. Existing detection methods are inherently based on pyramid ...
TSNet: Task-specific network for joint diabetic retinopathy grading and lesion segmentation of ultra-wide optical coherence tomography angiography images
Diabetic retinopathy (DR) is a common complication of diabetes which may lead to blindness. Early diagnosis can effectively prevent the deterioration of the disease and enable timely treatment. Ophthalmologists diagnose DR by observing ultra-wide ...
Cluster-based two-branch framework for point cloud attribute compression
Owing to the irregular distribution of point clouds in 3D space, effectively compressing the point cloud is still challenging. Recently, numerous compression methods have been developed with outstanding performance for the compression of geometry ...
End-to-end learning for joint depth and image reconstruction from diffracted rotation
Monocular depth estimation is an open challenge due to the ill-posed nature of the problem at hand. Deep learning techniques proved capable of producing acceptable depth estimation accuracy but the lack of robust depth cues within RGB images ...
Multi-scale color constancy based on salient varying local spatial statistics
The human visual system unconsciously determines the color of the objects by “discounting” the effects of the illumination, whereas machine vision systems have difficulty performing this task. Color constancy algorithms assist computer vision ...
A self-attention model for viewport prediction based on distance constraint
Panoramic video multimedia technology has made significant advancements in recent years, providing users with an immersive experience by displaying the entire 360° spherical scene centered around their virtual location. However, due to its larger ...
Wall segmentation in house plans: fusion of deep learning and traditional methods
Recognition and extraction of elements from house plans present significant challenges in the construction, decoration and interior design industries. To address this issue, this paper proposes a wall segmentation system for house plans that ...
Masked-attention diffusion guidance for spatially controlling text-to-image generation
Text-to-image synthesis has achieved high-quality results with recent advances in diffusion models. However, text input alone has high spatial ambiguity and limited user controllability. Most existing methods allow spatial control through ...
MFOGCN: multi-feature-based orthogonal graph convolutional network for 3D human motion prediction
Human motion prediction in various motion capture applications, e.g., optical and inertial, is challenging because of the complexity of human motion sequences. Current studies on this issue have insufficient analysis on the latent motion ...
Modeling and realization of image-based garment texture transfer
We present an automated framework founded on texture transfer, facilitating the substitution of textures in garment images with specified ones for applications in garment design and online presentation. In contrast to previous methodologies, our ...
Interpolating meshes of arbitrary topology by Catmull–Clark surfaces with energy constraint
We propose an efficient method with energy constraints for constructing a Catmull–Clark surface that interpolates a given mesh. We approximate the surface energy of Catmull–Clark surfaces near extraordinary points by summing their finite ...
A novel deformable B-spline curve model based on elasticity
Physically based deformable curve models are widely used to simulate thin one-dimensional objects in computer graphics, interactive simulation, and surgery simulation. These models consider objects to be rods described by an adapted frame ...
Answer sheet layout analysis based on YOLOv5s-DC and MSER
Layout analysis is the first step in automatic grading and other OCR tasks. Although various layout analysis technologies have been developed for different application scenarios, existing approaches still have difficulty in achieving high accuracy ...
Fast continuous patch-based artistic style transfer for videos
Convolutional neural network-based image style transfer models often suffer from temporal inconsistency when applied to video. Although several video style transfer models have been proposed to improve temporal consistency, they often trade off ...
3D point cloud denoising method based on global feature guidance
Raw point cloud (PC) data acquired by 3D sensors or reconstruction algorithms inevitably contain noise and outliers, which can seriously impact downstream tasks, such as surface reconstruction and target detection. To address this problem, this ...
Ni-DehazeNet: representation learning via bilevel optimized architecture search for nighttime dehazing
Nighttime dehazing is a challenging ill-posed problem due to the severe haze pollution and color attenuation. Since available daytime dehazing approaches cannot be consistently adapted to the nighttime case, this paper specifically designs ...
Survey on vision-based dynamic hand gesture recognition
Hand gestures are very important for communicating with one another. The task of using hand gestures in technology is influenced by a very common way humans communicate with the natural environment. The recognizing and finding pose estimation of ...
Segmentation-driven feature-preserving mesh denoising
Feature-preserving mesh denoising has received noticeable attention in visual media, with the aim of recovering high-fidelity, clean mesh shapes from the ones that are contaminated by noise. Existing denoising methods often design smaller weights ...
Scene representation using a new two-branch neural network model
Scene classification and recognition have always been one of the most challenging tasks of scene understanding due to the inherent ambiguity in visual scenes. The core of scene classification and recognition tasks is scene representation. Deep ...
Automated barcodeless product classifier for food retail self-checkout images
The growing popularity of self-service in retail stores and the increasing associated shrinkage present an urgent need for computer-vision-based product recognition in the area of self-checkouts. The article focuses on individual product recognition ...
Bidirectional feature enhancement transformer for unsupervised domain adaptation
Unsupervised domain adaptation (UDA) aims to generalize knowledge learned from one labeled source domain to another unlabeled target domain. To extract domain-invariant feature representations, most existing UDA approaches leverage convolution ...
STAM: a spatio-temporal adaptive module for improving static convolutions in action recognition
Temporal adaptive convolution has demonstrated superior performance over static convolution techniques in video understanding. However, it needs to be improved in long-time series modeling and multi-scale feature-map adaptation. To address these ...
Mixture autoregressive and spectral attention network for multispectral image compression based on variational autoencoder
Multispectral images, with their unique three-dimensional characteristics, require specialized spatial-spectral feature extraction modules to achieve superior compression results. Current end-to-end compression frameworks underperform compared to ...
Real-scene-constrained virtual scene layout synthesis for mixed reality
Given a real source scene and a virtual target scene, the real-scene-constrained virtual scene layout synthesis problem is defined as how to re-synthesize the layout of the virtual furniture in the virtual scene to form a new virtual scene such ...
A self-attention-based fusion framework for facial expression recognition in wavelet domain
Facial expression recognition (FER) plays a vital role for applications based on human–computer interaction. In the past few years, many deep learning models have been proposed for FER, but their performance is limited due to challenges such as ...
Residual network-based ocean wave modelling from satellite images using ensemble Kalman filter
Nonlinear ocean waves have a significant impact on the functioning of several offshore activities. Predicting the internal ocean waves plays a crucial role in submarine and ship operations. Data assimilation is a mechanism in which data observed ...
Obtaining the user-defined polygons inside a closed contour with holes
In image processing, computer vision algorithms are applied to regions bounded by closed contours. These contours are often irregular, poorly defined, and contain holes or unavailable areas inside. A common problem in computational geometry ...
TMGAN: two-stage multi-domain generative adversarial network for landscape image translation
Chinese landscape paintings, realistic landscape photographs, and oil paintings each possess unique artistic characteristics and painting features. Image-to-image translation between these three domains is an extremely challenging task. Existing ...