- research-article, November 2024
Toward real text manipulation detection: New dataset and new solution
Abstract: With the surge in realistic text tampering, detecting fraudulent text in images has gained prominence for maintaining information security. However, the high costs associated with professional text manipulation and annotation limit the ...
Highlights:
- A new dataset for text manipulation detection with diverse handcrafted manipulations
- An asymmetric dual-stream baseline framework to exploit different transformed domains
- An aggregation hub and a fusion module for efficient multi-...
- research-article, November 2024
A multi-modal and multi-stage fusion enhancement network for segmentation based on OCT and OCTA images
Abstract: The accurate segmentation of retinal vessels (RV) and the Foveal Avascular Zone (FAZ) is crucial for retinal health assessment. However, fusing OCT and OCTA images while projecting these 3D data for segmentation is a challenge. In this paper, we ...
Highlights:
- Propose a novel multi-modal fusion enhancement network for joint RV and FAZ segmentation based on OCT and OCTA.
- Design an end-to-end kernel adaptive projection module (KAPM) for 3D-to-2D projection and a volume fusion module (VFM) for data reuse.
- Achieve ...
- research-article, November 2024
Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network
Expert Systems with Applications: An International Journal (EXWA), Volume 256, Issue C. https://doi.org/10.1016/j.eswa.2024.124945
Abstract: LiDAR-only 3D detection methods struggle with the sparsity of point clouds. To overcome this issue, multi-modal methods have been proposed, but their fusion is a challenge due to the heterogeneous representation of images and point clouds. This ...
Highlights:
- LiDAR-based 3D detection faces point cloud sparsity challenges.
- A novel Homogeneous Sparse Fusion multi-modal approach is introduced.
- Homogeneous Sparse Fusion adaptively extracts foreground features.
- Cross-modality consistency ...
- research-article, November 2024
Federated Learning Using Multi-Modal Sensors with Heterogeneous Privacy Sensitivity Levels
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 11, Article No.: 350, Pages 1–27. https://doi.org/10.1145/3686801
Data from multi-modal sensors, such as Red-Green-Blue (RGB) cameras, thermal cameras, microphones, and mmWave radars, have gradually been adopted in various classification problems for better accuracy. Some sensors, like RGB cameras and microphones, ...
- research-article, November 2024
SCI-MKGC: A MKGC Method Based on Spatial Context and Interaction Attention
SocialMeta '24: Proceedings of the Third International Workshop on Social and Metaverse Computing, Sensing and Networking, Pages 57–62. https://doi.org/10.1145/3698387.3700002
Multi-modal knowledge graphs (MKGs) contain diverse modalities of features, such as text, audio, and image. Due to the emergence of novel entities or an insufficiency in multi-modal corpora, MKGs frequently confront data incompleteness. Thus, MKGs ...
- research-article, November 2024
Understanding Non-Verbal Irony Markers: Machine Learning Insights Versus Human Judgment
ICMI '24: Proceedings of the 26th International Conference on Multimodal Interaction, Pages 164–172. https://doi.org/10.1145/3678957.3685723
Irony detection is a complex task that often stumps both humans, who frequently misinterpret ironic statements, and artificial intelligence (AI) systems. While the majority of AI research on irony detection has concentrated on linguistic cues, the role ...
- research-article, November 2024
M3 LUC: Multi-modal Model for Urban Land-Use Classification
SIGSPATIAL '24: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Pages 270–281. https://doi.org/10.1145/3678717.3691278
Identifying urban land-use types is crucial for effective resource management, urban planning, and sustainable development. However, classifying land use is difficult due to the complexity of cities and the poor data available in undeveloped areas. In ...
- research-article, October 2024
i-Care: A Multi-Modal Data Integration Approach for Real-time Surveillance and Voice Assistance to Improve Infant Care
BuildSys '24: Proceedings of the 11th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Pages 99–109. https://doi.org/10.1145/3671127.3698176
The rapid evolution and advancement of technology in adult healthcare have paved the way for improving infant care through enhanced monitoring and assistance systems. Traditional infant monitoring systems face challenges, including the need for large ...
- Article, November 2024
Detect Text Forgery with Non-forged Image Features: A Framework for Detection and Grounding of Image-Text Manipulation
Abstract: With the rapid development of generative models, multimodal fake media has proliferated across the Internet. Detecting and grounding forged images and text is crucial for advancing cybersecurity. Most existing approaches utilize image-text ...
- Article, October 2024
Efficient Multi-modal Human-Centric Contrastive Pre-training with a Pseudo Body-Structured Prior
Abstract: Human-centric perception tasks are essential to many human-agent interaction applications. Unsupervised multi-modal human-centric pre-training models have been recently proposed to lay the foundation for various downstream tasks. However, existing ...
- research-article, October 2024
Correlation filter based single object tracking: A review
Highlights:
- A comprehensive review of correlation filter-based tracking algorithms.
- Review of feature-based classification in correlation filter-based tracking.
- Review of recent deep learning-based correlation filter tracking from different ...
In recent years, correlation filter-based (CF) tracking algorithms have gained momentum in the field of visual tracking. CF tracking algorithms have achieved compelling performance by addressing their limitations, such as the boundary effect and filter ...
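For readers new to the technique this review surveys, the core of a correlation filter tracker can be sketched in a few lines. This is a minimal single-frame MOSSE-style sketch (not the review's own code; the blob sizes and regularizer are illustrative assumptions): the filter is solved in the Fourier domain, and the peak of the correlation response locates the target in a new frame.

```python
import numpy as np

def train_mosse_filter(patch, response, lam=1e-2):
    """Solve for a MOSSE-style correlation filter in the Fourier domain.

    H* = (G * conj(F)) / (F * conj(F) + lam), where F and G are the FFTs
    of the template patch and the desired Gaussian response map.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def track(patch, H_conj):
    """Correlate a new patch with the filter; the response peak is the target."""
    resp = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    return np.unravel_index(np.argmax(resp), resp.shape)

# Toy example: a bright blob at (16, 16) and a Gaussian response centred there.
size = 32
y, x = np.mgrid[:size, :size]
patch = np.exp(-((y - 16) ** 2 + (x - 16) ** 2) / 8.0)
response = np.exp(-((y - 16) ** 2 + (x - 16) ** 2) / 4.0)

H = train_mosse_filter(patch, response)
# Circularly shift the blob by (3, 5); the response peak follows it.
shifted = np.roll(np.roll(patch, 3, axis=0), 5, axis=1)
peak = track(shifted, H)
print(peak)
```

The circular correlation is also where the "boundary effect" mentioned above comes from: FFT-based correlation implicitly wraps the patch, which later CF trackers mitigate with cosine windows or spatial regularization.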
- Article, October 2024
MPMNet: Modal Prior Mutual-Support Network for Age-Related Macular Degeneration Classification
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Pages 733–742. https://doi.org/10.1007/978-3-031-72378-0_68
Abstract: Early screening and classification of Age-related Macular Degeneration (AMD) are crucial for precise clinical treatment. Currently, most automated methods focus solely on dry and wet AMD classification. However, the classification of wet AMD into ...
- research-article, November 2024
MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training
Computational Biology and Chemistry (COBC), Volume 112, Issue C. https://doi.org/10.1016/j.compbiolchem.2024.108137
Abstract: Motivation: Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery and drug repositioning. Early researchers relied on time-consuming and labor-intensive wet laboratory experiments. However, the advent of deep ...
Highlights:
- Introducing MMCL-CPI, a novel method for CPI prediction integrating sequence and image modalities.
- The multi-modal CPI model outperforms others across various datasets.
- Pioneering multi-modal pre-training in bioinformatics, our ...
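The entry above pairs a sequence modality with an image modality via contrastive pre-training. The paper's exact objective is not shown in this snippet, but cross-modal contrastive pre-training is typically driven by a symmetric InfoNCE loss, which a minimal NumPy sketch can illustrate (function and variable names are hypothetical): matched rows across the two embedding batches are positives, all other in-batch pairs are negatives.

```python
import numpy as np

def info_nce(seq_emb, img_emb, temperature=0.1):
    """Symmetric InfoNCE loss between two L2-normalised embedding batches.

    Row i of seq_emb and row i of img_emb are a positive pair (same item
    in two modalities); every other in-batch pair serves as a negative.
    """
    a = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    b = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # pairwise similarities
    idx = np.arange(len(a))                           # positives on the diagonal
    lp_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    lp_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    # Average the two retrieval directions: sequence->image and image->sequence.
    return (-lp_ab[idx, idx].mean() - lp_ba[idx, idx].mean()) / 2

rng = np.random.default_rng(0)
aligned = rng.normal(size=(8, 32))
# Perfectly aligned modalities give a near-zero loss; unrelated pairs do not.
print(info_nce(aligned, aligned), info_nce(aligned, rng.normal(size=(8, 32))))
```

Minimizing this loss pulls matched sequence/image embeddings together and pushes mismatched pairs apart, which is what lets the pre-trained encoders transfer to the downstream CPI prediction task.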
- research-article, October 2024
Large scale multimodal fashion care recognition
Abstract: Smart Fashion is reshaping people’s lives, and affects people’s choices and outfits. Existing computer-vision-enabled fashion technology has covered many aspects, such as fashion detection, fashion recognition, fashion segmentation, virtual ...
- Article, September 2024
ProGEO: Generating Prompts Through Image-Text Contrastive Learning for Visual Geo-Localization
Artificial Neural Networks and Machine Learning – ICANN 2024, Pages 448–462. https://doi.org/10.1007/978-3-031-72338-4_30
Abstract: Visual Geo-localization (VG) refers to the process of identifying the location described in query images, and is widely applied in the robotics field and in computer vision tasks such as autonomous driving, the metaverse, augmented reality, and SLAM. In fine-...
- research-article, November 2024
A Multi-modal Discourse Analysis of the promotional video "CPC" by Using ELAN Software
ISAIE '24: Proceedings of the 2024 International Symposium on Artificial Intelligence for Education, Pages 433–437. https://doi.org/10.1145/3700297.3700371
The promotional video "CPC" serves as a vital material for showcasing the image of the Communist Party of China around the world. By integrating multi-modalities such as visual, auditory, and textual elements, it effectively conveys information, ...
- research-article, September 2024
Multi-modal straight flow matching for accelerated MR imaging
Computers in Biology and Medicine (CBIM), Volume 178, Issue C. https://doi.org/10.1016/j.compbiomed.2024.108668
Abstract: Diffusion models have garnered great interest lately in Magnetic Resonance (MR) image reconstruction. A key component of generating high-quality samples from noise is iterative denoising for thousands of steps. However, the complexity of ...
Highlights:
- We propose an ODE-based flow method to solve the accelerated MR imaging problem and apply domain transfer to the MRI reconstruction task for the first time. It is a multi-modal method that transfers the domain from undersampled k-space measurements ...
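The mechanism behind straight flow matching, which the entry above builds on, is compact enough to sketch: training regresses a velocity network onto the constant velocity of straight source-to-target paths, and sampling integrates the learned ODE from t=0 to t=1. The toy below (not the paper's method; a single-target oracle velocity stands in for a trained network) shows both pieces.

```python
import numpy as np

def straight_fm_target(x0, x1, t):
    """Flow-matching training pair for a straight path.

    The path is x_t = (1 - t) * x0 + t * x1, and the regression target
    for the velocity network at (x_t, t) is the constant v = x1 - x0.
    """
    return (1 - t) * x0 + t * x1, x1 - x0

def euler_sample(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)
    return x

# Toy oracle: with a single target point x1, the exact straight-flow
# velocity field is (x1 - x) / (1 - t); a trained network approximates
# this field for a whole data distribution rather than one point.
x1 = np.array([2.0, -1.0])
oracle = lambda x, t: (x1 - x) / (1.0 - t)
print(euler_sample(oracle, np.array([0.5, 0.5])))  # lands on x1
```

Because the paths are straight, the ODE can be integrated accurately in very few steps, which is the appeal over thousand-step diffusion denoising mentioned in the abstract.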
- poster, July 2024
A Multi-modal Framework for 3D Facial Animation Control
SIGGRAPH '24: ACM SIGGRAPH 2024 Posters, Article No.: 1, Pages 1–2. https://doi.org/10.1145/3641234.3671035
3D facial reconstruction and animation have advanced significantly in the past decades. However, existing methods based on single-modal input struggle with specific facial part control and require post-processing for natural rendering. Recent approaches ...
- research-article, July 2024
APPFNet: Adaptive point-pixel fusion network for 3D semantic segmentation with neighbor feature aggregation
Expert Systems with Applications: An International Journal (EXWA), Volume 251, Issue C. https://doi.org/10.1016/j.eswa.2024.123990
Abstract: 3D semantic segmentation is significant for scene understanding in various domains, such as autonomous driving, mapping, and robotics. Existing research often enhances prediction accuracy by integrating data from camera and LiDAR (light detection ...
Highlights:
- A multi-modal network architecture APPFNet is proposed based on Transformer.
- A new module NFAM is proposed to enhance point cloud information.
- APPFNet performs semantic segmentation using images and complete point clouds.
- The ...