TOMM: Vol 18, No 3

Volume 18, Issue 3, August 2022

Editor:

Alberto Del Bimbo
University of Firenze, Italy

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1551-6857

EISSN: 1551-6865

research-article

Causal Inference with Knowledge Distilling and Curriculum Learning for Unbiased VQA

Article No.: 67, Pages 1–23, https://doi.org/10.1145/3487042

Recently, many Visual Question Answering (VQA) models rely on the correlations between questions and answers yet neglect those between the visual information and the textual information. They would perform badly if the handled data distribute differently ...

research-article

Interactive Re-ranking via Object Entropy-Guided Question Answering for Cross-Modal Image Retrieval

Article No.: 68, Pages 1–17, https://doi.org/10.1145/3485042

Cross-modal image-retrieval methods retrieve desired images from a query text by learning relationships between texts and images. Such a retrieval approach is one of the most effective ways of achieving the easiness of query preparation. Recent cross-...

research-article

Shuffle-invariant Network for Action Recognition in Videos

Article No.: 69, Pages 1–18, https://doi.org/10.1145/3485665

The local key features in video are important for improving the accuracy of human action recognition. However, most end-to-end methods focus on global feature learning from videos, while few works consider the enhancement of the local information in a ...

research-article

Learning Adaptive Spatial-Temporal Context-Aware Correlation Filters for UAV Tracking

Article No.: 70, Pages 1–18, https://doi.org/10.1145/3486678

Tracking in the unmanned aerial vehicle (UAV) scenarios is one of the main components of target-tracking tasks. Different from the target-tracking task in the general scenarios, the target-tracking task in the UAV scenarios is very challenging because of ...

research-article

Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept

Article No.: 71, Pages 1–20, https://doi.org/10.1145/3491224

Reconstructing three-dimensional (3D) objects from images has attracted increasing attention due to its wide applications in computer vision and robotic tasks. Despite the promising progress of recent deep learning–based approaches, which directly ...

research-article

Domain-invariant Graph for Adaptive Semi-supervised Domain Adaptation

Article No.: 72, Pages 1–18, https://doi.org/10.1145/3487194

Domain adaptation aims to generalize a model from a source domain to tackle tasks in a related but different target domain. Traditional domain adaptation algorithms assume that enough labeled data, which are treated as the prior knowledge are available in ...

research-article

Objective Object Segmentation Visual Quality Evaluation: Quality Measure and Pooling Method

Article No.: 73, Pages 1–19, https://doi.org/10.1145/3491229

Objective object segmentation visual quality evaluation is an emergent member of the visual quality assessment family. It aims to develop an objective measure instead of a subjective survey to evaluate the object segmentation quality in agreement with ...

research-article

CRAR: Accelerating Stereo Matching with Cascaded Residual Regression and Adaptive Refinement

Article No.: 74, Pages 1–19, https://doi.org/10.1145/3488719

Dense stereo matching estimates the depth for each pixel of the referenced images. Recently, deep learning algorithms have dramatically promoted the development of stereo matching. The state-of-the-art result is achieved by models adopting deep ...

research-article

Recognizing Gaits Across Walking and Running Speeds

Article No.: 75, Pages 1–22, https://doi.org/10.1145/3488715

For decades, very few methods were proposed for cross-mode (i.e., walking vs. running) gait recognition. Thus, it remains largely unexplored regarding how to recognize persons by the way they walk and run. Existing cross-mode methods handle the walking-...

research-article

Inner Knowledge-based Img2Doc Scheme for Visual Question Answering

Article No.: 76, Pages 1–21, https://doi.org/10.1145/3489142

Visual Question Answering (VQA) is a research topic of significant interest at the intersection of computer vision and natural language understanding. Recent research indicates that attributes and knowledge can effectively improve performance for both ...

research-article

Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach

Article No.: 77, Pages 1–23, https://doi.org/10.1145/3490033

In this article, we present an approach for retrieving similar faces between the artistic and the real domain. The application we refer to is an interactive exhibition inside a museum, in which a visitor can take a photo of himself and search for a ...

research-article

A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Article No.: 78, Pages 1–23, https://doi.org/10.1145/3490686

Considerable attention has been paid to physiological signal-based emotion recognition in the field of affective computing. For reliability and user-friendly acquisition, electrodermal activity (EDA) has a great advantage in practical applications. ...

research-article

GraSP: Local Grassmannian Spatio-Temporal Patterns for Unsupervised Pose Sequence Recognition

Article No.: 79, Pages 1–23, https://doi.org/10.1145/3491227

Many applications of action recognition, especially broad domains like surveillance or anomaly-detection, favor unsupervised methods considering that exhaustive labeling of actions is not possible. However, very limited work has happened in this domain. ...

research-article

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Article No.: 80, Pages 1–24, https://doi.org/10.1145/3491228

Action recognition has been a heated topic in computer vision for its wide application in vision systems. Previous approaches achieve improvement by fusing the modalities of the skeleton sequence and RGB video. However, such methods pose a dilemma between ...

research-article

Distributed Gateway Selection for Video Streaming in VANET Using IP Multicast

Article No.: 81, Pages 1–24, https://doi.org/10.1145/3491388

The volume of video traffic as infotainment service over vehicular ad hoc network (VANET) has rapidly increased for past few years. Providing video streaming as VANET infotainment service is very challenging because of high mobility and heterogeneity of ...

research-article

Multilayer Video Encoding for QoS Managing of Video Streaming in VANET Environment

Article No.: 82, Pages 1–19, https://doi.org/10.1145/3491433

Efficient delivery and maintenance of the quality of service (QoS) of audio/video streams transmitted over VANETs for mobile and heterogeneous nodes are one of the major challenges in the convergence of this network type and these services. In this ...

research-article

When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization

Article No.: 83, Pages 1–20, https://doi.org/10.1145/3492325

Image captioning for low-resource languages has attracted much attention recently. Researchers propose to augment the low-resource caption dataset into (image, rich-resource language, and low-resource language) triplets and develop the dual attention ...

research-article

Improving Crowd Density Estimation by Fusing Aerial Images and Radio Signals

Article No.: 84, Pages 1–23, https://doi.org/10.1145/3492346

A recent line of research focuses on crowd density estimation from RGB images for a variety of applications, for example, surveillance and traffic flow control. The performance drops dramatically for low-quality images, such as occlusion, or poor light ...

research-article

A Format-compatible Searchable Encryption Scheme for JPEG Images Using Bag-of-words

Article No.: 85, Pages 1–18, https://doi.org/10.1145/3492705

The development of cloud computing attracts enterprises and individuals to outsource their data, such as images, to the cloud server. However, direct outsourcing causes the extensive concern of privacy leakage, as images often contain rich sensitive ...

research-article

Blockchain-Based Audio Watermarking Technique for Multimedia Copyright Protection in Distribution Networks

Article No.: 86, Pages 1–23, https://doi.org/10.1145/3492803

Copyright protection in multimedia protection distribution is a challenging problem. To protect multimedia data, many watermarking methods have been proposed in the literature. However, most of them cannot be used effectively in a multimedia distribution ...

research-article

Deep Illumination-Enhanced Face Super-Resolution Network for Low-Light Images

Article No.: 87, Pages 1–19, https://doi.org/10.1145/3495258

Face images are typically a key component in the fields of security and criminal investigation. However, due to lighting and shooting angles, faces taken under low-light conditions are often difficult to recognize. Face super-resolution (FSR) technology ...

research-article

Scribble-Supervised Meibomian Glands Segmentation in Infrared Images

Article No.: 88, Pages 1–23, https://doi.org/10.1145/3497747

Infrared imaging is currently the most effective clinical method to evaluate the morphology of the meibomian glands (MGs) in patients. As an important indicator for monitoring the development of MG dysfunction, it is necessary to accurately measure gland-...

survey

Towards Integrating Image Encryption with Compression: A Survey

Article No.: 89, Pages 1–21, https://doi.org/10.1145/3498342

As digital images are consistently generated and transmitted online, the unauthorized utilization of these images is an increasing concern that has a significant impact on both security and privacy issues; additionally, the representation of digital ...

ACM Transactions on Multimedia Computing, Communications, and Applications
