7th MIPR 2024: San Jose, CA, USA
- 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024, San Jose, CA, USA, August 7-9, 2024. IEEE 2024, ISBN 979-8-3503-5142-2
- Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, C.-C. Jay Kuo: GHOI: A Green Human-Object-Interaction Detector. 1-7
- Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Y. P. Chien, Yaobin Chen, Shu Hu: Real-Time Lane-Wise Traffic Monitoring in Optimal ROIs. 8-14
- Mehmet Akif Özkanoglu, Ali C. Begen, Sedat Ozer: SkyDataNet: An Object Detection Algorithm with 2D Gaussian Loss for UAV-Based Aerial Images. 21-27
- Yao-Hui Su, Ming-Der Shieh, Chia-Chi Tsai: Target-Aware Siamese Networks Based on Masked Attention Mechanism for Visual Object Tracking. 28-34
- Raju Shrestha, Hanne Korneliussen: A Framework for Generating Images and Hashtags for Social Media Posts for Artificial Influencers. 42-48
- Ning Xu, Serhad Doken: Automatic Visual Citation Generation for Text-to-Image Generation. 49-54
- Ryan Metcalfe, Garth Long, Charlie L. Wang, Iole Moccagatta: Enhancing Local LLM Performance Through Heterogeneous Multi-Device Computing. 55-60
- Zhenfei Zhang, Tsung-Wei Huang, Guan-Ming Su, Ming-Ching Chang, Xin Li: Text-Driven Synchronized Diffusion Video and Audio Talking Head Generation. 61-67
- Haohong Wang, Daniel Smith, Malgorzata Kudelska: 10x Future of Filmmaking Empowered by AIGC. 68-74
- Daniel Kienzle, Marco Kantonis, Robin Schön, Rainer Lienhart: Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation. 75-81
- Haiyi Li, Xuejing Lei, Xinyu Wang, C.-C. Jay Kuo: Green Image Label Transfer. 82-87
- Fei Zhao, Jiawen Chen, Bin Huang, Chengcui Zhang, Gary Warner, Rushi Chen, Shaorou Tang, Yuanfei Ma, Zixi Nan: GenCheck: A LoRA-Adapted Multimodal Large Language Model for Check Analysis. 88-94
- Yuwei Chen, Ming-Ching Chang, Xin Li: Leveraging Semantic Segmentation for Image Manipulation Detection and Localization. 95-101
- Avinash Anand, Raj Jaiswal, Abhishek Dharmadhikari, Atharva Marathe, Harsh Popat, Harshil Mital, Ashwin R. Nair, Kritarth Prasad, Sidharth Kumar, Astha Verma, Rajiv Ratn Shah, Roger Zimmermann: GeoVQA: A Comprehensive Multimodal Geometry Dataset for Secondary Education. 102-108
- Debaleen Das Spandan, Razib Iqbal: ProxeGraph: Scene Graph Generation Utilizing Proxemics for Smart Homes. 109-115
- Junwen Chen, Yingcheng Wang, Keiji Yanai: HOI as Embeddings: Advancements of Model Representation Capability in Human-Object Interaction Detection. 116-122
- Sheng-Jhou Lu, Hung-Wei Lee, Yu-Ming Han, Ji-Min Zhou, Ying Liu, Huang-Chia Shih: Lightweight Schemes Fusion for Heatmap-based Human Pose Estimation. 123-126
- Michael R. Smith, Renee Gooding, Jonathan Bisila, Christina L. Ting: Anomaly Detection in Video Using Compression. 127-133
- Kratika Bhagtani, Amit Kumar Singh Yadav, Paolo Bestagini, Edward J. Delp: SSLCT: A Convolutional Transformer for Synthetic Speech Localization. 134-140
- Chun-Han Cheng, Ting-Yu Wei, Homer H. Chen: Playlist Continuation of Cold-Start Songs. 141-147
- Hung-Jui Guo, Balakrishnan Prabhakaran: Improved Standard-Based Motion Parallax Measurement in Mixed Reality. 148-154
- Kunal Sawarkar, Abhilasha Mangal, Shivam Raj Solanki: Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers. 155-161
- Jacob Edward Galajda, Kien A. Hua: Automated Thematic Composer Classification Using Segment Retrieval. 162-168
- Wen-Shiang Li, Yao-Cheng Lu, Wen-Kai Hsiao, Yu-Yao Tseng, Ming-Hung Wang: DRM-SN: Detecting Reused Multimedia Content on Social Networks. 169-175
- Xiang Fang, Arvind Easwaran, Blaise Genest: Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection. 176-182
- Chenhan Fu, Guoming Wang, Rongxing Lu, Siliang Tang: FastLearn: A Rapid Learning Agent for Chat Models to Acquire Latest Knowledge. 183-189
- Ryan Tan, Thanh Hong-Phuoc, Lei Gao, Randy Tan, Sagarjit Aujla, Adel Mohamed, Ling Guan, Karthikeyan Umapathy, Naimul Mefraz Khan: Enhancement of Neonatal Lung Pathology Classification Using Multi-view Feature Representation. 190-195
- Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo: Holistic Visual-Textual Sentiment Analysis with Prior Models. 196-202
- Jashia Mitayeegiri, Shaohua Dong, Chenxi Qiu, Qing Yang, Xinrong Li, Heng Fan, Yan Huang: Radio Map Estimation (RME) with Deep Progressive Network. 203-206
- Soheil Hor, Mostafa El-Khamy, Yanlin Zhou, Amin Arbabian, SukHwan Lim: CM-ASAP: Cross-Modality Adaptive Sensing and Perception for Efficient Hand Gesture Recognition. 207-213
- Yu-Szu Wei, Yuan-Chun Sun, Shin-Yi Zheng, Hsun-Fu Hsu, Chun-Ying Huang, Cheng-Hsin Hsu: Mitigating Privacy Threats Without Degrading Visual Quality of VR Applications: Using Re-Identification Attack as a Case Study. 214-220
- Omeed Ashtiani, Meghana Spurthi Maadugundu, Minhas Kamal, Balakrishnan Prabhakaran: Device-Agnostic Remote Range-of-motion Assessment using Data Abstraction. 221-226
- Franz Louis Cesista, Rui Aguiar, Jason Kim, Paolo Acilo: Retrieval Augmented Structured Generation: Business Document Information Extraction as Tool Use. 227-230
- Charlie Hsu, Yuan-Chun Sun, Kuan-Yu Lee, Chun-Ying Huang: Will Neural 3D Object Representations be the Silver Bullet for Improving VR Experience in HMDs? 231-234
- Vijay John, Yasutomo Kawanishi: Frame-Level Latent Embedding Using Weak Labels for Multi-View Action Recognition. 235-238
- Muhammad Arslan, Muhammad Mubeen, Arslan Akram, Saadullah Farooq Abbasi, Muhammad Salman Ali, Muhammad Usman Tariq: A Deep Features Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification. 239-242
- Yang Xing, Peixi Liao, Reem AwdhE Alasleh, Vissuta Khampatee, Farshid Alizadeh-Shabdiz: Dental X-ray Segmentation and Auto Implant Design Based on Convolutional Neural Network. 243-246
- Jie Cai, Yuan Lin, Jiang Li, Jiaming Ding, Ling Ouyang, Chiu Man Ho, Zibo Meng: Joint HDR Denoising and Fusion on Mobile Devices. 247-252
- Rex Liu, Xin Liu: MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning. 253-259
- Siddhant Garg, Lijun Zhang, Hui Guan: Structured Pruning for Multi-Task Deep Neural Networks. 260-266
- Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang: UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation. 267-273
- Mohammad Abu-Shaira, Weishi Shi: Unveiling Statistical Significance of Online Regression Over Multiple Datasets. 274-279
- Minghao Li, Junjie Qiu, Weishi Shi: Macro-AUC-Driven Active Learning Strategy for Multi-Label Classification Enhancement. 280-286
- Dae Yeol Lee, Geonsun Lee, Guan-Ming Su: Viewing Comfort Enhancement on Head-Mounted Displays Using Stereo Disparity Control. 287-293
- Avinash Anand, Avni Mittal, Laavanaya Dhawan, Juhi Krishnamurthy, Mahisha Ramesh, Naman Lal, Astha Verma, Pijush Bhuyan, Himani, Rajiv Ratn Shah, Roger Zimmermann, Shin'ichi Satoh: ExCEDA: Unlocking Attention Paradigms in Extended Duration E-Classrooms by Leveraging Attention-Mechanism Models. 301-307
- Avinash Anand, Sarthak Jain, Shashank Sharma, Akhil P. Dominic, Aman Gupta, Ashta Verma, Raj Jaiswal, Naman Lal, Rajiv Ratn Shah, Roger Zimmermann: Pulse of the Crowd: Quantifying Crowd Energy through Audio and Video Analysis. 308-314
- Yiwei Han, Kaiyi Qi, Jiebo Luo: Plastic Surgery Image Classification and Generation. 315-320
- Rui Deng, Tianpei Gu: CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration. 328-334
- Chih-Chung Hsu, Wei-Hao Huang, Wen-Hai Tseng, Ming-Hsuan Wu, Ren-Jung Xu, Chia-Ming Lee: OmniDet: Omnidirectional Object Detection via Fisheye Camera Adaptation. 335-341
- Most Husne Jahan, Abdelhak Bentaleb: GESA: Exploring Loss-based Adversarial Attacks in Volumetric Media Streaming. 342-348
- Omkar N. Kulkarni, Aryan Mishra, Shashank Arora, Vivek K. Singh, Pradeep K. Atrey: LivePics-24: A Multi-person, Multi-camera, Multi-settings Live Photos Dataset. 349-354
- Abhineet Kumar Pandey, Ming-Ching Chang, Xin Li: TextSleuth: A New Dataset and Baseline for Scene Text Manipulation Detection. 362-368
- Md Atik Ahamed, Qiang Shawn Cheng: MambaTab: A Plug-and-Play Model for Learning Tabular Data. 369-375
- Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong: Deep Learning-based Text-in-Image Watermarking. 376-382
- Haoran Tong, Xu Cui, Laiyun Qing: Single-frame Supervised Action Temporal Localization Based on Multi-view Contrastive Learning. 383-389
- Hadi Hadizadeh, S. Faegheh Yeganli, Bahador Rashidi, Ivan V. Bajic: Mutual Information Analysis in Multimodal Learning Systems. 390-395
- Ling Guan, Lei Gao, Kai Liu, Zheng Guo: Mathematics-Inspired Learning: A Green Learning Model with Interpretable Properties. 396-402
- Tejas Duseja, K. M. Annervaz, Jeevithiesh Duggani, Shyam Zacharia, Michael Free, Ambedkar Dukkipati: Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers. 403-409
- Wala Elsharif, Marco Agus, Mahmood Alzubaidi, James She: Cultural Relevance Index: Measuring Cultural Relevance in AI-Generated Images. 410-416
- Fei Zhao, Chengcui Zhang: Parameter-Efficient Adaptation of Foundation Models for Damaged Building Assessment. 417-422
- Shengtai Ju, Amy R. Reibman: Exploring the Impact of Hand Pose and Shadow on Hand-Washing Action Recognition. 423-429
- Prasun Datta, Chau-Wai Wong, Min Wu: Enabling Paper-Based Surface Authentication via Digital Twin and Experimental Verification. 430-438
- Yan Ju, Chengzhe Sun, Shan Jia, Shuwei Hou, Zhaofeng Si, Soumyya Kanti Datta, Lipeng Ke, Riky Zhou, Anita Nikolich, Siwei Lyu: DeepFake-o-meter v2.0: An Open Platform for DeepFake Detection. 439-445
- Vikram Patil, Sharmilee Rajkumar Rajan, Pradeep K. Atrey: GeoSecure-B: A Method for Secure Bearing Calculation. 446-451
- Narendra Kumar, Gaurav Bhatnagar: Clearing Text Images: A Non-blind Deblurring with Convex Total Variation Regularization Model. 452-457
- Craig Rainey, Min Chen: Algorithmic Stock Trading Strategies. 458-464
- Xiaoqiong Liu, Yunhe Feng, Shu Hu, Xiaohui Yuan, Heng Fan: Benchmarking the Robustness of UAV Tracking Against Common Corruptions. 465-470
- Gowtham Medisetti, Zacchaeus Compson, Heng Fan, Huaxiao Yang, Yunhe Feng: LitAI: Enhancing Multimodal Literature Understanding and Mining with Generative AI. 471-476
- Beitong Tian, Mingyuan Wu, Ruixiao Zhang, Haozhen Zheng, Bo Chen, Yaohui Wang, Shiv Trivedi, Shanbo Zhang, Robert Bruce Kaufman, Leah Espenhahn, Gianni Pezzarossi, Mauro Sardela, John Dallesasse, Klara Nahrstedt: GaugeTracker: AI - Powered Cost-Effective Analog Gauge Monitoring System. 477-483
- Md. Abdullah Al Forhad, Weishi Shi: Balancing Explanations and Adaptation in Offline Continual Learning Systems Using Active Augmented Reply. 484-490
- Shijun Liang, Dongdong Fu: Controllable Universal Edge-Preserving Image Filtering. 491-494
- Nguyen Gia Bach, Chanh Minh Tran, Eiji Kamioka, Phan Xuan Tan: Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-Based Visual Odometry in Underwater Terrain. 495-498
- Shanker Ram, Sambhu Ganesan, Yajat Nagaraj Kiran: Harmful Brain Activity Classification of Spectrograms with Transfer Deep Learning. 499-502
- Vadim Abronin, Aleksei Naumov, Denis Mazur, Dmitriy Bystrov, Katerina Tsarova, Artem Melnikov, Sergey Dolgov, Reuben Brasher, Michael Perelshtein: TQCompressor: Improving Tensor Decomposition Methods in Neural Networks Via Permutations. 503-506
- Fei Zhao, Chengcui Zhang, Maya Shah, Nitesh Saxena: BubbleSig: Same-Hand Ballot Stuffing Detection. 507-510
- Sushmita Chandel, Preeti Dwivedi, Gaurav Bhatnagar, Marcin Kowalski: Towards a Novel Blob Detection Approach for Concealed Object Detection in Passive Terahertz Imaging. 511-514
- Andrea Caruso, Giovanni Schembra: A VR 360°-Video Encoding Framework with Differentiated Tile Compression Based on Digital-Twin Technology. 515-521
- Katsuaki Nakano, Michael Zuzak, Cory E. Merkel, Alexander C. Loui: Trustworthy and Robust Machine Learning for Multimedia: Challenges and Perspectives. 522-528
- Mohit Prabhushankar, Ghassan AlRegib: Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks. 529-535
- Zachary McBride Lazri, Dae Yeol Lee, Guan-Ming Su: A Framework for Single-View Multi-Plane Image Inpainting. 536-541
- Qingyang Zhou, Jiawei Yu, Shan Liu, C.-C. Jay Kuo: GPSR: A Green Point Cloud Surface Reconstruction Method. 542-548
- Edward Y. Chang: Behavioral Emotion Analysis Model for Large Language Models. 549-556
- Chih-Chung Hsu, Chia-Ming Lee: MISS: Memory-efficient Instance Segmentation for Sport-Scenes with Visual Inductive Priors. 557-561
- Ming-Wen Kuan, Wei-Yang Lin, Chia-Ling Tsai, Shih-Jen Chen, Paisan Ruamviboonsuk, Dong-Jie Jiang: Simultaneous Classification and Segmentation of Subretinal Lesions on ICGA Images. 562-565
- Chen-Wei Wang, Hwai-Jung Hsu: Automatic Clipping and Text Logging for Baseball Game Videos Using Deep Learning. 566-571
- Alnur Alimanov, Md Baharul Islam: Advancing Retinal Image Segmentation: A Denoising Diffusion Probabilistic Model Perspective. 572-578
- Kaixuan Li, Wei-bang Chen, Yongjin Lu, Xiaoliang Wang, He Gao: Automated Recognition of Optic Disc and Blood Vessels in Diabetic Fundoscopy Images Using Real-Time Image Analysis. 579-585
- Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu: Robust COVID-19 Detection in CT Images with CLIP. 586-592
- Aparna Tiwari, Hitika Tiwari, K. S. Venkatesh, Anuj Kumar Sharma: Enhancing Video Stability with Object-Centric Stabilization. 593-599
- Mohamed Benkedadra, Dany Rimez, Tiffanie Godelaine, Natarajan Chidambaram, Hamed Razavi Khosroshahi, Horacio Tellez, Matei Mancas, Benoît Macq, Sidi Ahmed Mahmoudi: CIA: Controllable Image Augmentation Framework Based on Stable Diffusion. 600-606
- Li Lin, Sarah Papabathini, Xin Wang, Shu Hu: Robust Light-Weight Facial Affective Behavior Recognition with CLIP. 607-611
- Dingzong Zhang, Khushi Jain, Priyanka Singh: Guarding Against ChatGPT Threats: Identifying and Addressing Vulnerabilities. 612-615
- Fayadh Alenezi: Advection-Diffusion for Feature-based Cancer Diagnosis. 616-621
- Quoc Hoan Vu, Priyanka Singh: Exploiting Correlation Between Facial Action Units for Detecting Deepfake Videos. 622-625
- Luoxu Jin, Hiroshi Watanabe: Perceptual Image Compression via Stable Diffusion at Low Bitrate. 626-629
- Benny Stein, Niklas Beck, Daniel Becker, Dennis Wegener: Building a Generative AI Showroom for Foundation Models with Different Modalities. 630-633
- Omkar N. Kulkarni, Thomas Lloyd-Jones, My Tran, Gregory Vincent, Vivek K. Singh, Pradeep K. Atrey: Where You Look Matters in Group Photos: A Demo of GARGI iOS App. 634-637
- Dominic Baker, Wei-bang Chen, He Gao: Early Alzheimer's Detection: The Promise of AI-Powered MRI Analysis. 638-641
- He Gao, Wei-Bang Chen: ProSchedule: A Comprehensive Mobile Solution for Seamless Academic Scheduling. 642-645
- Hieu Hanh Le, Yuki Yasumitsu, Ryosuke Matsuo, Tomoyoshi Yamazaki, Haruo Yokota: A Clustering-based Sequence Variants Analysis Method for Electronic Medical Records of Multimedical Institutions. 653-659
- Khushi Jain, Priyanka Singh, Xue Li: Privacy-Preserving Disease Prediction with Secure Data Deduplication on Untrusted Cloud Servers. 660-666
- Chih-Yuan Li, Jun-Ting Wu, Chan Hsu, Ming-Yen Lin, Yihuang Kang: Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models. 667-673
- Sukhan Lee, Soojin Lee, Yaejin Lee: Self-Monitoring the Mental-Health State of a Focused Population with Multiple Self-Questionnaires and Sentiment Descriptions. 674-680
- Nisha Daga, George Kodimattam Joseph: Big Data and Bigger Dilemmas: Ethical Concerns of Data in Healthcare. 681-684
- Vishakha Pareek, Shreyansh Sharma, Vibhor Singh, Shashwat Singh: Patient 3D Data Visualisation with AR-based Interactive Technology for Brain MRI. 685-690