MUME: Vol 30, No 6

Volume 30, Issue 6, Dec 2024 (Current Issue)

Publisher: Springer-Verlag, Berlin, Heidelberg

ISSN: 0942-4962

research-article

Vicsgaze: a gaze estimation method using self-supervised contrastive learning

https://doi.org/10.1007/s00530-024-01458-x

Abstract

Existing deep learning-based gaze estimation methods achieved high accuracy, and the prerequisite for ensuring their performance is large-scale datasets with gaze labels. However, collecting large-scale gaze datasets is time-consuming and ...

research-article

3D human pose estimation with multi-hypotheses gated transformer

https://doi.org/10.1007/s00530-024-01460-3

Abstract

Human pose estimation aims to locate human joints from inputs such as images and videos. Recent works have made significant progress in 3D human pose estimation, but they still face the ill-posed problem caused by the deep ambiguity of estimating ...

research-article

Mutual-weighted feature disentanglement for unsupervised domain adaptation

https://doi.org/10.1007/s00530-024-01477-8

Abstract

Unsupervised domain adaptation (UDA) aims to reduce the distribution discrepancy across domains, enabling the transfer of knowledge from the labeled source domain to the unlabeled target domain. The main focus of most current UDA methods lies on ...

research-article

Motion synthesis via distilled absorbing discrete diffusion model

https://doi.org/10.1007/s00530-024-01492-9

Abstract

In this work, we explore the potential of discrete diffusion model in text-driven motion synthesis. Previous methods aimed at improving the quality of generated motions often led to an increase in model parameters, while neglecting the diversity ...

research-article

TEST-Net: transformer-enhanced Spatio-temporal network for infectious disease prediction

https://doi.org/10.1007/s00530-024-01494-7

Abstract

Outbreaks of infectious diseases have caused tremendous human suffering and incalculable economic losses, and infectious diseases are a global public health problem that threatens human society. Therefore, it is necessary to model the spatial and ...

research-article

Model-based portrait video compression with spatial constraint and adaptive pose processing

https://doi.org/10.1007/s00530-024-01499-2

Abstract

Motion model based video coding approach, which employs sparse sets of keypoints instead of dense optical flows, can efficiently compress videos at ultra-low bitrates. Such schemes obtain notable performance gains over traditional video codecs in ...

research-article

Dynamical semantic enhancement network for continuous sign language recognition

https://doi.org/10.1007/s00530-024-01505-7

Abstract

In the field of sign language recognition, effective interpretation of semantic information, which is primarily conveyed through facial and hand gestures, poses significant challenges. Previous methods often struggle to simultaneously capture ...

research-article

DS-SRD: a unified framework for structured representation distillation

https://doi.org/10.1007/s00530-024-01507-5

Abstract

To improve the representation performance of smaller models, representation distillation has been investigated to transfer structured knowledge from a larger model (teacher) to a smaller model (student). Current work aims to maximize a lower bound ...

research-article

Local and global context cooperation for temporal action detection

https://doi.org/10.1007/s00530-024-01511-9

Abstract

Temporal action detection (TAD) is a fundamental task for video understanding. The task aims to locate the start and end boundaries of action instances and identify their corresponding categories within untrimmed videos. Distinguishing between ...

research-article

Multi-scale feature correspondence and restriction mechanism for visible X-ray baggage re-Identification

https://doi.org/10.1007/s00530-024-01513-7

Abstract

Recently, social security surveillance has posed a new AI challenge, i.e., Visible-X-ray baggage Re-Identification (VX-ReID), which aims to re-identify and retrieve baggage between visible and X-ray imaging modalities. Compared with cross-modality ...

research-article

Exploring the impact of volumetric graphics on the engagement of broadcast media professionals

https://doi.org/10.1007/s00530-024-01517-3

Abstract

The purpose of this study is to explore content creator preferences in broadcast media, with a specific focus on the impact of integrating 3D graphics to enhance viewer engagement. In this study, we investigated the integration of 3D technology in ...

research-article

Gmd: Gaussian mixture descriptor for pair matching of 3D fragments

https://doi.org/10.1007/s00530-024-01519-1

Abstract

In the automatic reassembly of fragments acquired using laser scanners to reconstruct objects, a crucial step is the matching of fractured surfaces. In this paper, we propose a novel local descriptor that uses the Gaussian Mixture Model (GMM) to ...

research-article

SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy

https://doi.org/10.1007/s00530-024-01520-8

Abstract

Deep neural networks are vulnerable to adversarial examples which are generated by adding carefully crafted perturbations on benign examples. Some research works explore the transferability of adversarial examples between hetero-modal models from ...

research-article

PillarVTP: vehicle trajectory prediction method based on local point cloud aggregation and receptive field expansion

https://doi.org/10.1007/s00530-024-01521-7

Abstract

Vehicle trajectory prediction plays a crucial role in the control and safety warning of autonomous vehicles. Existing methods often depend on costly high definition (HD) maps for generating trajectories to fit their scenarios, or involve ...

research-article

Robust Grassmann manifold convex hull collaborative representation learning and its kernel extension for image set analysis

https://doi.org/10.1007/s00530-024-01522-6

Abstract

Effectively leveraging multi-view information is crucial for in-depth analysis of complex problems. Currently, the approach of analyzing sets of images has garnered significant attention, mainly because it allows for the comprehensive ...

research-article

DSTANet: learning a dual-stream model for anomaly driving action detection using spatio-temporal and appearance features

https://doi.org/10.1007/s00530-024-01523-5

Abstract

Driving action anomaly detection based on in-cab surveillance video has become the mainstream of current driving action research. However, there is a substantial redundancy of spatio-temporal information in the spatio-temporal action features ...

research-article

SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking

https://doi.org/10.1007/s00530-024-01524-4

Abstract

Locating and classifying the target object is performed by the siamese-based tracking framework by evaluating the similarity on the feature maps from the template and search branches. While the promising tracking performances have been achieved by ...

research-article

Hierarchical bi-directional conceptual interaction for text-video retrieval

https://doi.org/10.1007/s00530-024-01525-3

Abstract

The large pre-trained vision-language models (VLMs) utilized in text-video retrieval have demonstrated strong cross image-text understanding ability. Existing works leverage VLMs to extract features and design fine-grained uni-directional ...

research-article

Multi-view anomaly detection via hybrid instance-neighborhood aligning and cross-view reasoning

https://doi.org/10.1007/s00530-024-01526-2

Abstract

Multi-view anomaly detection aims to identify anomalous instances whose patterns are disparate across different views, and existing works usually project the multi-view data into a common subspace for abnormal instance identification. Nevertheless, ...

research-article

A lightweight distillation recurrent convolution network on FPGA for real-time video super-resolution

https://doi.org/10.1007/s00530-024-01528-0

Abstract

In the application of image super-resolution (SR) based on field-programmable gate array (FPGA), depthwise separable convolution is widely utilized. However, existing network designs overly simplify the structures used for deep feature extraction ...

research-article

Design and experimental evaluation of an intelligent sugarcane stem node recognition system based on enhanced YOLOv5s

https://doi.org/10.1007/s00530-024-01529-z

Abstract

The rapid and accurate identification of sugarcane internodes is of great significance for tasks such as field operations and precision management in the sugarcane industry, and it is also a fundamental task for the intelligence of the sugarcane ...

research-article

Dynamic spatial-temporal topology graph network for skeleton-based action recognition

https://doi.org/10.1007/s00530-024-01531-5

Abstract

Over the past few years, skeleton-based action recognition has gained significant attention for its simple yet robust representation of the human body structure. Many researchers have employed Graph Convolutional Network (GCN) to explore ...

research-article

Expressive feature representation pyramid network for pulmonary nodule detection

https://doi.org/10.1007/s00530-024-01532-4

Abstract

Lung cancer has the highest fatality rate among all types of cancers. The detection of pulmonary nodules serves as the primary means for early diagnosis; utilizing deep learning models for pulmonary nodule detection can improve the accuracy and ...

research-article

Universal NIR-II fluorescence image enhancement via covariance weighted attention network

https://doi.org/10.1007/s00530-024-01533-3

Abstract

The second near-infrared (NIR-II) fluorescence imaging has become a new imaging mode due to its characteristics of real-time intraoperative imaging. The NIR-IIb window (1500–1700 nm) has stronger light penetration and has a clearer imaging effect ...

research-article

CFFANet: category feature fusion and attention mechanism network for retinal vessel segmentation

https://doi.org/10.1007/s00530-024-01535-1

Abstract

Retinal vessel segmentation is a computer-aided diagnostic method for ophthalmic disease analysis. Owing to the complex structure of the retinal vasculature, it is difficult for the segmentation network to capture effective features, and the ...

research-article

Collaborative multi-knowledge distillation under the influence of softmax regression representation

https://doi.org/10.1007/s00530-024-01537-z

Abstract

Knowledge distillation can transfer knowledge from a powerful yet cumbersome teacher model to a less-parameterized student model, thus effectively achieving model compression. Various knowledge distillation methods have mainly focused on the task ...

research-article

Dual-branch network object detection algorithm based on dual-modality fusion of visible and infrared images

https://doi.org/10.1007/s00530-024-01540-4

Abstract

Aiming at the limitations of visible images in object detection, this paper proposes a dual-branch network object detection algorithm based on dual-modality fusion of visible and infrared images. Based on YOLOv7-s, the algorithm firstly introduces ...

research-article

Panoramic image semantic segmentation using channel attention-based HarDNet and distorted boundary learning

https://doi.org/10.1007/s00530-024-01541-3

Abstract

In this paper, we propose a semantic segmentation framework for panoramic images. First, in order to solve the problem of large panoramic image size, we use HarDNet in the backbone. By applying HarDNet, while improving segmentation accuracy, it ...

Multimedia Systems
