ICMR 2023: Thessaloniki, Greece
- Ioannis Kompatsiaris, Jiebo Luo, Nicu Sebe, Angela Yao, Vasileios Mezaris, Symeon Papadopoulos, Adrian Popescu, Zi Helen Huang:
Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, ICMR 2023, Thessaloniki, Greece, June 12-15, 2023. ACM 2023
Regular Long Papers
- Nitish Nag, Hyungik Oh, Mengfan Tang, Mingshu Shi, Ramesh C. Jain:
Integrative Multi-Modal Computing for Personal Health Navigation. 1-9
- Hugo Schindler, Adrian Popescu, Van-Khoa Nguyen, Jerome Deshayes-Chossart:
Raising User Awareness about the Consequences of Online Photo Sharing. 10-19
- Sven Schultze, Ani Withöft, Larbi Abdenebaoui, Susanne Boll:
Explaining Image Aesthetics Assessment: An Interactive Approach. 20-28
- Omar Adjali, Paul Grimal, Olivier Ferret, Sahar Ghannay, Hervé Le Borgne:
Explicit Knowledge Integration for Knowledge-Aware Visual Question Answering about Named Entities. 29-38
- Shuo Chen, Ying-Jun Du, Pascal Mettes, Cees G. M. Snoek:
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation. 39-47
- Ying He, Gongqing Wu, Desheng Cai, Xuegang Hu:
Cross-View Sample-Enriched Graph Contrastive Learning Network for Personalized Micro-video Recommendation. 48-56
- Konstantin Schall, Kai Uwe Barthel, Nico Hezel, Klaus Jung:
Improving Image Encoders for General-Purpose Nearest Neighbor Search and Classification. 57-66
- Giacomo Nebbia, Adriana Kovashka:
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining. 67-75
- Yizhao Gao, Zhiwu Lu:
CMMT: Cross-Modal Meta-Transformer for Video-Text Retrieval. 76-84
- Jiazhi Guan, Hang Zhou, Zhizhi Guo, Tianshu Hu, Lirui Deng, Chengbin Quan, Meng Fang, Youjian Zhao:
Dual-Modality Co-Learning for Unveiling Deepfake in Spatio-Temporal Space. 85-94
- Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng, Fan Yang, Tingting Gao, Ruiji Fu, Zhongyuan Wang:
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset. 95-104
- Chiyu Zhang, Zaiyan Dai, Peng Cao, Jun Yang:
Edge Enhanced Image Style Transfer via Transformers. 105-114
- Juheon Hwang, Jiwoo Kang, Kyoungoh Lee, Sanghoon Lee:
Unlocking Potential of 3D-aware GAN for More Expressive Face Generation. 115-124
- Yuze Wang, Junyi Wang, Yansong Qu, Yue Qi:
RIP-NeRF: Learning Rotation-Invariant Point-based Neural Radiance Field for Fine-grained Editing and Compositing. 125-134
- Tiancong Cheng, Ying Zhang, Yifang Yin, Roger Zimmermann, Zhiwen Yu, Bin Guo:
A Multi-Teacher Assisted Knowledge Distillation Approach for Enhanced Face Image Authentication. 135-143
- Ying Zhang, Lilei Zheng, Vrizlynn L. L. Thing, Roger Zimmermann, Bin Guo, Zhiwen Yu:
FaceLivePlus: A Unified System for Face Liveness Detection and Face Verification. 144-152
- Bing Han, Jianshu Li, Wenqi Ren, Man Luo, Jian Liu, Xiaochun Cao:
SIGMA-DF: Single-Side Guided Meta-Learning for Deepfake Detection. 153-161
- Yizhe Zhu, Jialin Gao, Xi Zhou:
AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision. 162-171
- Marco Arazzi, Marco Cotogni, Antonino Nocera, Luca Virgili:
Predicting Tweet Engagement with Graph Neural Networks. 172-180
- Peiwang Tang, Qinghua Zhang, Xianchao Zhang:
A Recurrent Neural Network based Generative Adversarial Network for Long Multivariate Time Series Forecasting. 181-189
- Victoria Sherratt, Kevin Pimbblet, Nina Dethlefs:
Multi-channel Convolutional Neural Network for Precise Meme Classification. 190-198
- Yankun Wu, Yuta Nakashima, Noa Garcia:
Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis. 199-208
- Wen-Jiin Tsai, Yi-Cheng Tien:
Attention-based Video Virtual Try-On. 209-216
- Soyun Choi, Youjia Zhang, Sungeun Hong:
Intra-inter Modal Attention Blocks for RGB-D Semantic Segmentation. 217-225
- Cheng-Yu Fang, Xian-Feng Han:
Joint Geometric-Semantic Driven Character Line Drawing Generation. 226-233
- Zeqing Xia, Zhouhui Lian:
CurveSDF: Binary Image Vectorization Using Signed Distance Fields. 234-242
- Yusong Wang, Dongyuan Li, Kotaro Funakoshi, Manabu Okumura:
EMP: Emotion-guided Multi-modal Fusion and Contrastive Learning for Personality Traits Recognition. 243-252
- Zefan Zhang, Yi Ji, Chunping Liu:
Knowledge-Aware Causal Inference Network for Visual Dialog. 253-261
- Chun Zhang, Keyan Ren, Qingyun Bian, Yu Shi:
Less is More: Decoupled High-Semantic Encoding for Action Recognition. 262-271
- Ziwei Xiong, Han Wang:
Dual-Stream Multimodal Learning for Topic-Adaptive Video Highlight Detection. 272-279
- Ruilin Zhang, Haiyang Zheng, Hongpeng Wang:
TDEC: Deep Embedded Image Clustering with Transformer and Distribution Information. 280-288
- Beibei Zhang, Yaqun Fang, Fan Yu, Jia Bei, Tongwei Ren:
MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style. 289-297
- Guoxing Yang, Haoyu Lu, Zelong Sun, Zhiwu Lu:
Shot Retrieval and Assembly with Text Script for Video Montage Generation. 298-306
- Shenshen Li, Xing Xu, Fumin Shen, Yang Yang:
Multi-granularity Separation Network for Text-Based Person Retrieval with Bidirectional Refinement Regularization. 307-315
- Tiening Sun, Zhong Qian, Peifeng Li, Qiaoming Zhu:
Graph Interactive Network with Adaptive Gradient for Multi-Modal Rumor Detection. 316-324
- Harsh Sinha, Adriana Kovashka:
Towards Shape-regularized Learning for Mitigating Texture Bias in CNNs. 325-334
- Mingqi Chen, Feng Shuang, Shaodong Li, Xi Liu:
ASCS-Reinforcement Learning: A Cascaded Framework for Accurate 3D Hand Pose Estimation. 335-342
- Yangming Zhou, Yuzhou Yang, Qichao Ying, Zhenxing Qian, Xinpeng Zhang:
Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion. 343-352
- Mingjun Li, Shuo Xu, Feng Su:
Learning and Fusing Multi-Scale Representations for Accurate Arbitrary-Shaped Scene Text Recognition. 353-361
- Chunhong Cao, Huawei Fu, Gai Li, Mengyang Wang, Xieping Gao:
Modeling Functional Brain Networks with Multi-Head Attention-based Region-Enhancement for ADHD Classification. 362-369
- Chunhong Cao, Gai Li, Huawei Fu, Xingxing Li, Xieping Gao:
SPAE: Spatial Preservation-based Autoencoder for ADHD functional brain networks modelling. 370-377
- Bingchao Wu, Yangyuxuan Kang, Bei Guan, Yongji Wang:
We Are Not So Similar: Alleviating User Representation Collapse in Social Recommendation. 378-387
- Pengzhi Li, Yikang Ding, Linge Li, Jingwei Guan, Zhiheng Li:
Towards Practical Consistent Video Depth Estimation. 388-397
- Jiancheng Pan, Qing Ma, Cong Bai:
Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval. 398-406
- Jialin Tian, Xing Xu, Zuo Cao, Gong Zhang, Fumin Shen, Yang Yang:
Zero-shot Sketch-based Image Retrieval with Adaptive Balanced Discriminability and Generalizability. 407-415
- Liang Li, Weiwei Sun:
Label-wise Deep Semantic-Alignment Hashing for Cross-Modal Retrieval. 416-424
- Ying Li, Chunming Guan, Jiaquan Gao:
TsP-Tran: Two-Stage Pure Transformer for Multi-Label Image Retrieval. 425-433
- Maria Pegia, Björn Þór Jónsson, Anastasia Moumtzidou, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris:
MuseHash: Supervised Bayesian Hashing for Multimodal Image Representation. 434-442
- Siteng Huang, Qiyao Wei, Donglin Wang:
Reference-Limited Compositional Zero-Shot Learning. 443-451
- Haram Choi, Cheolwoong Na, Jinseop Kim, Jihoon Yang:
Exploration of Lightweight Single Image Denoising with Transformers and Truly Fair Training. 452-461
- Feng Zhao, Min Zhang, Tiancheng Huang, Donglin Wang:
TAGM: Task-Aware Graph Model for Few-shot Node Classification. 462-471
- Yutian Luo, Yizhao Gao, Zhiwu Lu:
Learning with Adaptive Knowledge for Continual Image-Text Modeling. 472-480
- Wenxiu Geng, Xiangxian Li, Yulong Bian:
A Dual-branch Enhanced Multi-task Learning Network for Multimodal Sentiment Analysis. 481-489
- Yu Zang, Zhe Xue, Shilong Ou, Yunfei Long, Hai Zhou, Junping Du:
FedPcf: An Integrated Federated Learning Framework with Multi-Level Prospective Correction Factor. 490-498
- Lina Sun, Yewen Li, Yumin Dong:
Learning From Expert: Vision-Language Knowledge Distillation for Unsupervised Cross-Modal Hashing Retrieval. 499-507
- Yaoqing Li, Sheng-Hua Zhong, Shuai Li, Yan Liu:
A Robust Deep Learning Enhanced Monocular SLAM System for Dynamic Environments. 508-515
- Yingnan Fu, Wenyuan Cai, Ming Gao, Aoying Zhou:
Symbol Location-Aware Network for Improving Handwritten Mathematical Expression Recognition. 516-524
Regular Short Papers
- Daichi Suzuki, Go Irie, Kiyoharu Aizawa:
Text-to-Image Fashion Retrieval with Fabric Textures. 525-529
- Panagiota Alexoudi, Ioannis Mademlis, Ioannis Pitas:
Escaping local minima in deep reinforcement learning for video summarization. 530-534
- Florian Spiess, Ralph Gasser, Silvan Heller, Heiko Schuldt, Luca Rossetto:
A Comparison of Video Browsing Performance between Desktop and Virtual Reality Interfaces. 535-539
- Zhexu Shen, Liang Yang, Zhihan Yang, Hongfei Lin:
More Than Simply Masking: Exploring Pre-training Strategies for Symbolic Music Understanding. 540-544
- Pu Ching, Hung-Kuo Chu, Min-Chun Hu:
SOFA: Style-based One-shot 3D Facial Animation Driven by 2D landmarks. 545-549
- Kun He, Changyu Li, Jie Shao:
Strong-Weak Cross-View Interaction Network for Stereo Image Super-Resolution. 550-554
- Jiabao Sheng, Saikit Lam, Zhe Li, Jiang Zhang, Xinzhi Teng, Yuanpeng Zhang, Jing Cai:
Multi-view Contrastive Learning with Additive Margin for Adaptive Nasopharyngeal Carcinoma Radiotherapy Prediction. 555-559
- Shuiying Liao, Yujuan Ding, P. Y. Mok:
Recommendation of Mix-and-Match Clothing by Modeling Indirect Personal Compatibility. 560-564
- Arun Zachariah, Praveen Rao:
Video Retrieval for Everyday Scenes With Common Objects. 565-570
- subst Nico, Tse-Yu Pan, Herman Prawiro, Jian-Wei Peng, Wen-Cheng Chen, Hung-Kuo Chu, Min-Chun Hu:
Offensive Tactics Recognition in Broadcast Basketball Videos Based on 2D Camera View Player Heatmaps. 571-575
- Meishan Liu, Meng Jian, Ge Shi, Ye Xiang, Lifang Wu:
Graph Contrastive Learning on Complementary Embedding for Recommendation. 576-580
- Sahar Tahmasebi, Sherzod Hakimov, Ralph Ewerth, Eric Müller-Budack:
Improving Generalization for Multimodal Fake News Detection. 581-585
- Christos Koutlis, Manos Schinas, Symeon Papadopoulos:
MemeFier: Dual-stage Modality Fusion for Image Meme Classification. 586-591
- Aristotelis Ballas, Christos Diou:
CNNs with Multi-Level Attention for Domain Generalization. 592-596
- Werner Bailer, Rahel Arnold, Vera Benz, Davide Coccomini, Anastasios Gkagkas, Gylfi Þór Guðmundsson, Silvan Heller, Björn Þór Jónsson, Jakub Lokoc, Nicola Messina, Nick Pantelidis, Jiaxin Wu:
Improving Query and Assessment Quality in Text-Based Interactive Video Retrieval Evaluation. 597-601
- Iacopo Ghinassi, Lin Wang, Chris Newell, Matthew Purver:
Multimodal Topic Segmentation of Podcast Shows with Pre-trained Neural Encoders. 602-606
- Georgios Orfanidis, Konstantinos Ioannidis, Anastasios Tefas, Stefanos Vrochidis, Ioannis Kompatsiaris:
Tweaking EfficientDet for frugal training. 607-611
- Mingyuan Ge, Yewen Li, Longfei Ma, Mingyong Li:
Deep Enhanced-Similarity Attention Cross-modal Hashing Learning. 612-616
- Kai Feng, Tao Liu, Heng Zhang, Zihao Meng, Zemin Miao:
TNOD: Transformer Network with Object Detection for Tag Recommendation. 617-621
- Tianqi Zhao, Ming Kong, Tian Liang, Qiang Zhu, Kun Kuang, Fei Wu:
CLAP: Contrastive Language-Audio Pre-training Model for Multi-modal Sentiment Analysis. 622-626
Brave New Ideas Paper
- David Alonso del Barrio, Daniel Gatica-Perez:
Framing the News: From Human Perception to Large Language Model Inferences. 627-635
Doctoral Symposium Paper
- Shenshen Li:
Dual-Path Semantic Construction Network for Composed Query-Based Image Retrieval. 636-639
Reproducibility Track Paper
- Mitchell Lee, Chris Lee, Sanjay Penmetsa, Min Chen, Mizuki Miyashita, Naatosi Fish, Bo Wu, Omar Shahbaz Khan:
Reproducibility Companion Paper: MeTILDA - Platform for Melodic Transcription in Language Documentation and Application. 640-643
Technical Demonstrations
- Kento Terauchi, Keiji Yanai:
CalorieCam360: Simultaneous Eating Action Recognition of Multiple People Using an Omnidirectional Camera. 644-648
- Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, Claudio Vairo:
VISIONE: A Large-Scale Video Retrieval System with Advanced Search Functionalities. 649-653
- Kai Uwe Barthel, Nico Hezel, Konstantin Schall, Klaus Jung:
navigu.net: NAvigation in Visual Image Graphs gets User-friendly. 654-658
- Manos Schinas, Panagiotis Galopoulos, Symeon Papadopoulos:
MAAM: Media Asset Annotation and Management. 659-663
- Stefanos Stoikos, David Kauchak, Douglas Turnbull, Alexandra Papoutsaki:
Cross-Language Music Recommendation Exploration. 664-668
Keynote Talk Abstracts
- Nozha Boujemaa, Abdelrahman Hassan, Giorgi Kokaia, Pratyush Kumar Sinha:
How Responsible LLMs are beneficial to search and exploration in Retail industry. 669
- Jürgen Gall:
Efficient CNNs and Transformers for Video Understanding and Image Synthesis. 670
- Elisa Ricci:
Recognizing Actions in Videos under Domain Shift. 671
Tutorial Abstract
- Kai Uwe Barthel:
Algorithms for Generating and Evaluating Visually Sorted Grid Layouts. 672-673
Workshop Abstracts
- Guillaume Habault, Minh-Son Dao, Michael Alexander Riegler, Duc-Tien Dang-Nguyen, Yuta Nakashima, Cathal Gurrin:
ICDAR'23: Intelligent Cross-Data Analysis and Retrieval. 674-675
- Luca Cuccovillo, Bogdan Ionescu, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Adrian Popescu:
MAD '23 Workshop: Multimedia AI against Disinformation. 676-677
- Cathal Gurrin, Björn Þór Jónsson, Duc-Tien Dang-Nguyen, Graham Healy, Jakub Lokoc, Liting Zhou, Luca Rossetto, Minh-Triet Tran, Wolfgang Hürst, Werner Bailer, Klaus Schoeffmann:
Introduction to the Sixth Annual Lifelog Search Challenge, LSC'23. 678-679