research-article
Vicsgaze: a gaze estimation method using self-supervised contrastive learning
Abstract

Existing deep learning-based gaze estimation methods achieved high accuracy, and the prerequisite for ensuring their performance is large-scale datasets with gaze labels. However, collecting large-scale gaze datasets is time-consuming and ...

research-article
3D human pose estimation with multi-hypotheses gated transformer
Abstract

Human pose estimation aims to locate human joints from inputs such as images and videos. Recent works have made significant progress in 3D human pose estimation, but they still face the ill-posed problem caused by the deep ambiguity of estimating ...

research-article
Mutual-weighted feature disentanglement for unsupervised domain adaptation
Abstract

Unsupervised domain adaptation (UDA) aims to reduce the distribution discrepancy across domains, enabling the transfer of knowledge from the labeled source domain to the unlabeled target domain. The main focus of most current UDA methods lies on ...

research-article
Motion synthesis via distilled absorbing discrete diffusion model
Abstract

In this work, we explore the potential of discrete diffusion model in text-driven motion synthesis. Previous methods aimed at improving the quality of generated motions often led to an increase in model parameters, while neglecting the diversity ...

research-article
TEST-Net: transformer-enhanced Spatio-temporal network for infectious disease prediction
Abstract

Outbreaks of infectious diseases have caused tremendous human suffering and incalculable economic losses, and infectious diseases are a global public health problem that threatens human society. Therefore, it is necessary to model the spatial and ...

research-article
Model-based portrait video compression with spatial constraint and adaptive pose processing
Abstract

Motion model based video coding approach, which employs sparse sets of keypoints instead of dense optical flows, can efficiently compress videos at ultra-low bitrates. Such schemes obtain notable performance gains over traditional video codecs in ...

research-article
Dynamical semantic enhancement network for continuous sign language recognition
Abstract

In the field of sign language recognition, effective interpretation of semantic information, which is primarily conveyed through facial and hand gestures, poses significant challenges. Previous methods often struggle to simultaneously capture ...

research-article
DS-SRD: a unified framework for structured representation distillation
Abstract

To improve the representation performance of smaller models, representation distillation has been investigated to transfer structured knowledge from a larger model (teacher) to a smaller model (student). Current work aims to maximize a lower bound ...

research-article
Local and global context cooperation for temporal action detection
Abstract

Temporal action detection (TAD) is a fundamental task for video understanding. The task aims to locate the start and end boundaries of action instances and identify their corresponding categories within untrimmed videos. Distinguishing between ...

research-article
Multi-scale feature correspondence and restriction mechanism for visible X-ray baggage re-Identification
Abstract

Recently, social security surveillance has posed a new AI challenge, i.e., Visible-X-ray baggage Re-Identification (VX-ReID), which aims to re-identify and retrieve baggage between visible and X-ray imaging modalities. Compared with cross-modality ...

research-article
Exploring the impact of volumetric graphics on the engagement of broadcast media professionals
Abstract

The purpose of this study is to explore content creator preferences in broadcast media, with a specific focus on the impact of integrating 3D graphics to enhance viewer engagement. In this study, we investigated the integration of 3D technology in ...

research-article
Gmd: Gaussian mixture descriptor for pair matching of 3D fragments
Abstract

In the automatic reassembly of fragments acquired using laser scanners to reconstruct objects, a crucial step is the matching of fractured surfaces. In this paper, we propose a novel local descriptor that uses the Gaussian Mixture Model (GMM) to ...

research-article
SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy
Abstract

Deep neural networks are vulnerable to adversarial examples which are generated by adding carefully crafted perturbations on benign examples. Some research works explore the transferability of adversarial examples between hetero-modal models from ...

research-article
PillarVTP: vehicle trajectory prediction method based on local point cloud aggregation and receptive field expansion
Abstract

Vehicle trajectory prediction plays a crucial role in the control and safety warning of autonomous vehicles. Existing methods often depend on costly high definition (HD) maps for generating trajectories to fit their scenarios, or involve ...

research-article
Robust Grassmann manifold convex hull collaborative representation learning and its kernel extension for image set analysis
Abstract

Effectively leveraging multi-view information is crucial for in-depth analysis of complex problems. Currently, the approach of analyzing sets of images has garnered significant attention, mainly because it allows for the comprehensive ...

research-article
DSTANet: learning a dual-stream model for anomaly driving action detection using spatio-temporal and appearance features
Abstract

Driving action anomaly detection based on in-cab surveillance video has become the mainstream of current driving action research. However, there is a substantial redundancy of spatio-temporal information in the spatio-temporal action features ...

research-article
SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking
Abstract

The siamese-based tracking framework locates and classifies the target object by evaluating the similarity on the feature maps from the template and search branches. While promising tracking performance has been achieved by ...

research-article
Hierarchical bi-directional conceptual interaction for text-video retrieval
Abstract

The large pre-trained vision-language models (VLMs) utilized in text-video retrieval have demonstrated strong cross image-text understanding ability. Existing works leverage VLMs to extract features and design fine-grained uni-directional ...

research-article
Multi-view anomaly detection via hybrid instance-neighborhood aligning and cross-view reasoning
Abstract

Multi-view anomaly detection aims to identify anomalous instances whose patterns are disparate across different views, and existing works usually project the multi-view data into a common subspace for abnormal instance identification. Nevertheless, ...

research-article
A lightweight distillation recurrent convolution network on FPGA for real-time video super-resolution
Abstract

In the application of image super-resolution (SR) based on field-programmable gate array (FPGA), depthwise separable convolution is widely utilized. However, existing network designs overly simplify the structures used for deep feature extraction ...

research-article
Design and experimental evaluation of an intelligent sugarcane stem node recognition system based on enhanced YOLOv5s
Abstract

The rapid and accurate identification of sugarcane internodes is of great significance for tasks such as field operations and precision management in the sugarcane industry, and it is also a fundamental task for the intelligence of the sugarcane ...

research-article
Dynamic spatial-temporal topology graph network for skeleton-based action recognition
Abstract

Over the past few years, skeleton-based action recognition has gained significant attention for its simple yet robust representation of the human body structure. Many researchers have employed Graph Convolutional Network (GCN) to explore ...

research-article
Expressive feature representation pyramid network for pulmonary nodule detection
Abstract

Lung cancer has the highest fatality rate among all types of cancers. The detection of pulmonary nodules serves as the primary means for early diagnosis, and utilizing deep learning models for pulmonary nodule detection can improve the accuracy and ...

research-article
Universal NIR-II fluorescence image enhancement via covariance weighted attention network
Abstract

The second near-infrared (NIR-II) fluorescence imaging has become a new imaging mode due to its characteristics of real-time intraoperative imaging. The NIR-IIb window (1500–1700 nm) has stronger light penetration and has a clearer imaging effect ...

research-article
CFFANet: category feature fusion and attention mechanism network for retinal vessel segmentation
Abstract

Retinal vessel segmentation is a computer-aided diagnostic method for ophthalmic disease analysis. Owing to the complex structure of the retinal vasculature, it is difficult for the segmentation network to capture effective features, and the ...

research-article
Collaborative multi-knowledge distillation under the influence of softmax regression representation
Abstract

Knowledge distillation can transfer knowledge from a powerful yet cumbersome teacher model to a less-parameterized student model, thus effectively achieving model compression. Various knowledge distillation methods have mainly focused on the task ...

research-article
Dual-branch network object detection algorithm based on dual-modality fusion of visible and infrared images
Abstract

Aiming at the limitations of visible images in object detection, this paper proposes a dual-branch network object detection algorithm based on dual-modality fusion of visible and infrared images. Based on YOLOv7-s, the algorithm firstly introduces ...

research-article
Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning
Abstract

In this paper, we propose a semantic segmentation framework for panoramic images. First, in order to solve the problem of large panoramic image size, we use HarDNet in the backbone. By applying HarDNet, while improving segmentation accuracy, it ...
