Information retrieval

Searched The ACM Guide to Computing Literature (3,766,106 records)

Showing 1 - 20 of 252 Results

research-article
Free
October 2024
JUST ACCEPTED
Correlation-aware Cross-modal Attention Network for Fashion Compatibility Modeling in UGC Systems
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3698772
Empowered by the continuous integration of social multimedia and artificial intelligence, the application scenarios of information retrieval (IR) progressively tend to be diversified and personalized. Currently, User-Generated Content (UGC) systems have ...
Total Citations: 0
research-article
Free
October 2024
JUST ACCEPTED
Mutually-Guided Hierarchical Multi-Modal Feature Learning for Referring Image Segmentation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3698771
Referring image segmentation aims to locate and segment the target region based on a given textual expression query. The primary challenge is to understand semantics from visual and textual modalities and achieve alignment and matching. Prior works have ...
Total Citations: 0
research-article
Free
October 2024
JUST ACCEPTED
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3697838
Sociocultural norms serve as guiding principles for personal conduct in social interactions, emphasizing respect, cooperation, and appropriate behavior, which is able to benefit tasks including conversational information retrieval, contextual information ...
Total Citations: 0
research-article
September 2024
Privacy-Enhanced Prototype-Based Federated Cross-Modal Hashing for Cross-Modal Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 295, Pages 1–19, https://doi.org/10.1145/3674507
Cross-modal hashing is widely used for efficient similarity searches, improving data processing efficiency, and reducing storage costs. Existing cross-modal hashing methods primarily focus on centralized training scenarios, where fixed-scale and fixed-...
Total Citations: 0
Total Downloads: 196
Last 12 Months: 196
Last 6 weeks: 76
research-article
September 2024
Multimodal PEAR Chain-of-Thought Reasoning for Multimodal Sentiment Analysis
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 286, Pages 1–23, https://doi.org/10.1145/3672398
Multimodal sentiment analysis aims to predict sentiments from multimodal signals such as audio, video, and text. Existing methods often rely on Pre-trained Language Models (PLMs) to extract semantic information from textual data, lacking an in-depth ...
Total Citations: 2
Total Downloads: 358
Last 12 Months: 358
Last 6 weeks: 100
research-article
September 2024
Style Variable and Irrelevant Learning for Generalizable Person Re-identification
- Kai Lv,
- Haobo Chen,
- Chuyang Zhao,
- Kai Tu,
- Junru Chen,
- Yadong Li,
- Boxun Li,
- Youfang Lin
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 281, Pages 1–22, https://doi.org/10.1145/3671003
Domain generalization person re-identification (DG-ReID) has gained much attention recently due to the poor performance of supervised re-identification on unseen domains. The goal of domain generalization is to develop a model that is insensitive to ...
Total Citations: 0
Total Downloads: 90
Last 12 Months: 90
Last 6 weeks: 29
research-article
September 2024
ANAGL: A Noise-Resistant and Anti-Sparse Graph Learning for Micro-Video Recommendation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 278, Pages 1–15, https://doi.org/10.1145/3670407
In recent years, graph convolutional networks (GCNs) have seen widespread utilization within micro-video recommendation systems, facilitating the understanding of user preferences through interactions with micro-videos. Despite the commendable performance ...
Total Citations: 0
Total Downloads: 107
Last 12 Months: 107
Last 6 weeks: 25
research-article
Free
September 2024
JUST ACCEPTED
Personalized Federated Mutual Learning for Unsupervised Camera-aware Person Re-identification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3696453
Person re-identification (ReID) is essential for enhancing security and tracking in multi-camera surveillance systems. To achieve effective person re-identification (ReID) performance across diverse datasets, the Federated Unsupervised Person Re-...
Total Citations: 0
Total Downloads: 49
Last 12 Months: 49
Last 6 weeks: 49
research-article
Free
September 2024
JUST ACCEPTED
CrossFormer: Cross-modal Representation Learning via Heterogeneous Graph Transformer
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3688801
Transformers have been recognized as powerful tools for various cross-modal tasks due to their superior ability to perform representation learning through self-attention. Existing transformer-based cross-modal models can be categorized into single-stream ...
Total Citations: 0
Total Downloads: 108
Last 12 Months: 108
Last 6 weeks: 108
research-article
September 2024
Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 10, Article No.: 316, Pages 1–21, https://doi.org/10.1145/3663571
Text-to-Video Retrieval is a typical cross-modal retrieval task that has been studied extensively under a conventional supervised setting. Recently, some works have sought to extend the problem to a weakly supervised formulation, which can be more ...
Total Citations: 0
Total Downloads: 254
Last 12 Months: 254
Last 6 weeks: 46
research-article
September 2024
Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 10, Article No.: 329, Pages 1–19, https://doi.org/10.1145/3659580
The large and growing amount of digital data creates a pressing need for approaches capable of indexing and retrieving multimedia content. A traditional and fundamental challenge consists of effectively and efficiently performing nearest-neighbor ...
Total Citations: 0
Total Downloads: 268
Last 12 Months: 268
Last 6 weeks: 45
research-article
Free
September 2024
JUST ACCEPTED
Fast Unsupervised Cross-modal Hashing With Robust Factorization and Dual Projection
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3694684
Unsupervised hashing has attracted extensive attention for effectively and efficiently tackling large-scale cross-modal retrieval tasks. Existing methods typically try to mine the latent common subspace across multimodal data without any category ...
Total Citations: 0
Total Downloads: 89
Last 12 Months: 89
Last 6 weeks: 89
research-article
Free
August 2024
JUST ACCEPTED
SeSMR: Secure and Efficient Session-based Multimedia Recommendation in Edge Computing
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3687473
Session-based multimedia recommendation in edge computing remains an important issue for boosting the utilization of services since service composition has increasingly attracted attention. Existing session-based recommendations (SBRs) model the session ...
Total Citations: 0
Total Downloads: 63
Last 12 Months: 63
Last 6 weeks: 63
research-article
Free
August 2024
JUST ACCEPTED
QR-CLIP: Introducing Explicit Knowledge for Location and Time Reasoning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3689638
This paper focuses on reasoning about the location and time behind images. Given that pre-trained vision-language models (VLMs) exhibit excellent image and text understanding capabilities, most existing methods leverage them to match visual cues with ...
Total Citations: 0
Total Downloads: 63
Last 12 Months: 63
Last 6 weeks: 63
research-article
Open Access
August 2024
Realizing Efficient On-Device Language-based Image Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 270, Pages 1–18, https://doi.org/10.1145/3649896
Advances in deep learning have enabled accurate language-based search and retrieval (e.g., over user photos) in the cloud. Many users prefer to store their photos in the home due to privacy concerns. As such, a need arises for models that can perform ...
Total Citations: 0
Total Downloads: 301
Last 12 Months: 301
Last 6 weeks: 119
research-article
August 2024
Encrypted Video Search with Single/Multiple Writers
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 9, Article No.: 264, Pages 1–23, https://doi.org/10.1145/3643887
Video-based services have become popular. Clients often outsource their videos to the cloud to relieve local maintenance. However, privacy has become a major concern, since many videos contain sensitive information. Although retrieving (unencrypted) ...
Total Citations: 0
Total Downloads: 289
Last 12 Months: 289
Last 6 weeks: 54
research-article
Free
July 2024
JUST ACCEPTED
Harnessing Representative Spatial-Temporal Information for Video Question Answering
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted https://doi.org/10.1145/3675399
Video question answering, aiming to answer a natural language question related to the given video, has become prevalent in the past few years. Although remarkable improvements have been obtained, it is still exposed to the challenge of insufficient ...
Total Citations: 0
Total Downloads: 121
Last 12 Months: 121
Last 6 weeks: 41
research-article
June 2024
Online Cross-modal Hashing With Dynamic Prototype
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 252, Pages 1–18, https://doi.org/10.1145/3665249
Online cross-modal hashing has received increasing attention due to its efficiency and effectiveness in handling cross-modal streaming data retrieval. Despite the promising performance, these methods mainly focus on the supervised learning paradigm, ...
Total Citations: 0
Total Downloads: 172
Last 12 Months: 172
Last 6 weeks: 24
research-article
June 2024
Exploration of Speech and Music Information for Movie Genre Classification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 241, Pages 1–19, https://doi.org/10.1145/3664197
Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However, the characteristics of movie trailer audio indicate that this modality alone might be highly effective in genre prediction. Movie trailer audio predominantly ...
Total Citations: 0
Total Downloads: 129
Last 12 Months: 129
Last 6 weeks: 22
research-article
June 2024
UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet
- Jiabo Ye,
- Junfeng Tian,
- Ming Yan,
- Haiyang Xu,
- Qinghao Ye,
- Yaya Shi,
- Xiaoshan Yang,
- Xuwu Wang,
- Ji Zhang,
- Liang He,
- Xin Lin
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 246, Pages 1–28, https://doi.org/10.1145/3660638
Referring expression comprehension aims to align natural language queries with visual scenes, which requires establishing fine-grained correspondence between vision and language. This has important applications in multi-modal reasoning systems. Existing ...
Total Citations: 0
Total Downloads: 329
Last 12 Months: 329
Last 6 weeks: 58
