- research-article, October 2024, JUST ACCEPTED
Correlation-aware Cross-modal Attention Network for Fashion Compatibility Modeling in UGC Systems
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3698772
Empowered by the continuous integration of social multimedia and artificial intelligence, the application scenarios of information retrieval (IR) progressively tend to be diversified and personalized. Currently, User-Generated Content (UGC) systems have ...
- research-article, October 2024, JUST ACCEPTED
Mutually-Guided Hierarchical Multi-Modal Feature Learning for Referring Image Segmentation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3698771
Referring image segmentation aims to locate and segment the target region based on a given textual expression query. The primary challenge is to understand semantics from visual and textual modalities and achieve alignment and matching. Prior works have ...
- research-article, October 2024, JUST ACCEPTED
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues
- Shilin Qu,
- Weiqing Wang,
- Xin Zhou,
- Haolan Zhan,
- Zhuang Li,
- Lizhen Qu,
- Linhao Luo,
- Yuan-Fang Li,
- Gholamreza Haffari
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3697838
Sociocultural norms serve as guiding principles for personal conduct in social interactions, emphasizing respect, cooperation, and appropriate behavior, which is able to benefit tasks including conversational information retrieval, contextual information ...
- research-article, September 2024
Privacy-Enhanced Prototype-Based Federated Cross-Modal Hashing for Cross-Modal Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 295, Pages 1–19. https://doi.org/10.1145/3674507
Cross-modal hashing is widely used for efficient similarity searches, improving data processing efficiency, and reducing storage costs. Existing cross-modal hashing methods primarily focus on centralized training scenarios, where fixed-scale and fixed-...
- research-article, September 2024
Multimodal PEAR Chain-of-Thought Reasoning for Multimodal Sentiment Analysis
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 286, Pages 1–23. https://doi.org/10.1145/3672398
Multimodal sentiment analysis aims to predict sentiments from multimodal signals such as audio, video, and text. Existing methods often rely on Pre-trained Language Models (PLMs) to extract semantic information from textual data, lacking an in-depth ...
- research-article, September 2024
Style Variable and Irrelevant Learning for Generalizable Person Re-identification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 281, Pages 1–22. https://doi.org/10.1145/3671003
Domain generalization person re-identification (DG-ReID) has gained much attention recently due to the poor performance of supervised re-identification on unseen domains. The goal of domain generalization is to develop a model that is insensitive to ...
- research-article, September 2024
ANAGL: A Noise-Resistant and Anti-Sparse Graph Learning for Micro-Video Recommendation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 278, Pages 1–15. https://doi.org/10.1145/3670407
In recent years, graph convolutional networks (GCNs) have seen widespread utilization within micro-video recommendation systems, facilitating the understanding of user preferences through interactions with micro-videos. Despite the commendable performance ...
- research-article, September 2024, JUST ACCEPTED
Personalized Federated Mutual Learning for Unsupervised Camera-aware Person Re-identification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3696453
Person re-identification (ReID) is essential for enhancing security and tracking in multi-camera surveillance systems. To achieve effective person re-identification (ReID) performance across diverse datasets, the Federated Unsupervised Person Re-...
- research-article, September 2024, JUST ACCEPTED
CrossFormer: Cross-modal Representation Learning via Heterogeneous Graph Transformer
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3688801
Transformers have been recognized as powerful tools for various cross-modal tasks due to their superior ability to perform representation learning through self-attention. Existing transformer-based cross-modal models can be categorized into single-stream ...
- research-article, September 2024
Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 10, Article No.: 316, Pages 1–21. https://doi.org/10.1145/3663571
Text-to-Video Retrieval is a typical cross-modal retrieval task that has been studied extensively under a conventional supervised setting. Recently, some works have sought to extend the problem to a weakly supervised formulation, which can be more ...
- research-article, September 2024
Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval
- Vinicius Sato Kawai,
- Lucas Pascotti Valem,
- Alexandro Baldassin,
- Edson Borin,
- Daniel Carlos Guimarães Pedronette,
- Longin Jan Latecki
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 10, Article No.: 329, Pages 1–19. https://doi.org/10.1145/3659580
The large and growing amount of digital data creates a pressing need for approaches capable of indexing and retrieving multimedia content. A traditional and fundamental challenge consists of effectively and efficiently performing nearest-neighbor ...
- research-article, September 2024, JUST ACCEPTED
Fast Unsupervised Cross-modal Hashing With Robust Factorization and Dual Projection
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3694684
Unsupervised hashing has attracted extensive attention in effectively and efficiently tackling large-scale cross-modal retrieval task. Existing methods typically try to mine the latent common subspace across multimodal data without any category ...
- research-article, August 2024, JUST ACCEPTED
SeSMR: Secure and Efficient Session-based Multimedia Recommendation in Edge Computing
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3687473
Session-based multimedia recommendation in edge computing remains an important issue for boosting the utilization of services since service composition has increasingly attracted attention. Existing session-based recommendations (SBRs) model the session ...
- research-article, August 2024, JUST ACCEPTED
QR-CLIP: Introducing Explicit Knowledge for Location and Time Reasoning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3689638
This paper focuses on reasoning about the location and time behind images. Given that pre-trained vision-language models (VLMs) exhibit excellent image and text understanding capabilities, most existing methods leverage them to match visual cues with ...
- research-article, August 2024
Realizing Efficient On-Device Language-based Image Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 270, Pages 1–18. https://doi.org/10.1145/3649896
Advances in deep learning have enabled accurate language-based search and retrieval (e.g., over user photos) in the cloud. Many users prefer to store their photos in the home due to privacy concerns. As such, a need arises for models that can perform ...
- research-article, August 2024
Encrypted Video Search with Single/Multiple Writers
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 264, Pages 1–23. https://doi.org/10.1145/3643887
Video-based services have become popular. Clients often outsource their videos to the cloud to relieve local maintenance. However, privacy has become a major concern, since many videos contain sensitive information. Although retrieving (unencrypted) ...
- research-article, July 2024, JUST ACCEPTED
Harnessing Representative Spatial-Temporal Information for Video Question Answering
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3675399
Video question answering, aiming to answer a natural language question related to the given video, has become prevalent in the past few years. Although remarkable improvements have been obtained, it is still exposed to the challenge of insufficient ...
- research-article, June 2024
Online Cross-modal Hashing With Dynamic Prototype
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 252, Pages 1–18. https://doi.org/10.1145/3665249
Online cross-modal hashing has received increasing attention due to its efficiency and effectiveness in handling cross-modal streaming data retrieval. Despite the promising performance, these methods mainly focus on the supervised learning paradigm, ...
- research-article, June 2024
Exploration of Speech and Music Information for Movie Genre Classification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 241, Pages 1–19. https://doi.org/10.1145/3664197
Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However, the characteristics of movie trailer audio indicate that this modality alone might be highly effective in genre prediction. Movie trailer audio predominantly ...
- research-article, June 2024
UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet
- Jiabo Ye,
- Junfeng Tian,
- Ming Yan,
- Haiyang Xu,
- Qinghao Ye,
- Yaya Shi,
- Xiaoshan Yang,
- Xuwu Wang,
- Ji Zhang,
- Liang He,
- Xin Lin
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 246, Pages 1–28. https://doi.org/10.1145/3660638
Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing ...