- research-article, November 2024
Deadline and Period Assignment for Guaranteeing Timely Response of the Cyber-Physical System
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 30, Issue 1, Article No.: 1, Pages 1–26
https://doi.org/10.1145/3689048
Cyber-physical systems (CPSs) need to respond to each change of each monitored object in time. The entire response process can be divided into two stages: the update stage and the control stage. Tasks in CPSs can thus be divided into two kinds: update ...
- research-article, October 2024
Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4302–4311
https://doi.org/10.1145/3664647.3681387
In the field of machine learning, continual learning is a crucial concept that allows models to adapt to non-stationary data distributions. However, most of the existing works focus on uni-modal settings and ignore the multi-modal data. In this paper, to ...
- research-article, October 2024
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3838–3847
https://doi.org/10.1145/3664647.3681347
In the burgeoning field of Audio-Visual Speech Recognition (AVSR), extant research has predominantly concentrated on the training paradigms tailored for high-quality resources. However, owing to the challenges inherent in real-world data collection, ...
- research-article, October 2024
Low-rank Prompt Interaction for Continual Vision-Language Retrieval
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8257–8266
https://doi.org/10.1145/3664647.3681264
Research on continual learning in multi-modal tasks has been receiving increasing attention. However, most existing work overlooks the explicit cross-modal and cross-task interactions. In this paper, we innovatively propose the Low-rank Prompt I...
- research-article, September 2024
PIRN: Phase Invariant Reconstruction Network for infrared image super-resolution
Abstract: Single image super-resolution (SR) reconstruction plays a crucial role in various fields, including surveillance and remote sensing. However, the majority of available SR reconstruction methods are designed primarily for visible images, making it ...
- research-article, August 2024
A method for image–text matching based on semantic filtering and adaptive adjustment
Journal on Image and Video Processing (JIVP), Volume 2024, Issue 1
https://doi.org/10.1186/s13640-024-00639-y
Abstract: As image–text matching (a critical task in the field of computer vision) links cross-modal data, it has captured extensive attention. Most of the existing methods intended for matching images and texts explore the local similarity levels between ...
- research-article, August 2024
EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration
Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong
KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3245–3254
https://doi.org/10.1145/3637528.3671775
Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either ...
- research-article, August 2024
Multi-Granularity Relational Attention Network for Audio-Visual Question Answering
Linjun Li, Tao Jin, Wang Lin, Hao Jiang, Wenwen Pan, Jian Wang, Shuwen Xiao, Yan Xia, Weihao Jiang, Zhou Zhao
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 8, Pages 7080–7094
https://doi.org/10.1109/TCSVT.2023.3264524
Recent methods for video question answering (VideoQA), aiming to generate answers based on given questions and video content, have made significant progress in cross-modal interaction. From the perspective of video understanding, these existing frameworks ...
- Article, January 2025
Development of a Bistable Multi-joint Modular Gripper with Enhanced Adaptability and Speed
Abstract: Handling dynamic objects is a significant challenge in robotics, necessitating the development of grippers capable of safe and reliable manipulation without compromising speed. Traditional rigid grippers, when used in dynamic environments, often ...
- research-article, July 2024
Borda regret minimization for generalized linear dueling bandits
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 2195, Pages 53571–53596
Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item ...
- research-article, July 2024
FreeBind: free lunch in unified multimodal space via knowledge fusion
Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 2139, Pages 52233–52246
Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces. In ...
- research-article, July 2024
Non-confusing generation of customized concepts in diffusion models
Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 1206, Pages 29935–29948
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs). It becomes even more pronounced in the generation of customized concepts, due to the scarcity of user-...
- research-article, June 2024
A three-field based finite element analysis for a class of magnetoelastic materials
Finite Elements in Analysis and Design (FEAD), Volume 233, Issue C
https://doi.org/10.1016/j.finel.2024.104126
Abstract: A simple yet effective material model was proposed by Zhao et al. (2019) and demonstrated to be capable of modeling the shape transformations of various planar and three-dimensional material samples programmed with the so-called “hard-magnetic ...
- research-article, April 2024
Reputation incentives with public supervision promote cooperation in evolutionary games
Applied Mathematics and Computation (APMC), Volume 466, Issue C
https://doi.org/10.1016/j.amc.2023.128445
Abstract: Public supervision, as a source of social behavioral norms and moral guidelines, exerts important guidance and influence on individuals. To maintain public order, in this study, we propose a reputation incentives mechanism with public supervision,...
Highlights:
  - We propose a reputation incentives mechanism with public supervision.
  - The dynamic process takes into account individual differences and incorporates diverse evaluation standards.
  - We study the impact of the evaluation intensity ...
- research-article, March 2024
Research on remote control system of tracking trolley based on ESP8266
CCEAI '24: Proceedings of the 2024 8th International Conference on Control Engineering and Artificial Intelligence, Pages 171–177
https://doi.org/10.1145/3640824.3640851
Traditional line-tracking cars usually use wired transmission to perform real-time AD (Analog to Digital) signal sampling and control, which has many disadvantages, such as cumbersome steps, multiple constraints, poor real-time performance, and low time ...
- research-article, May 2024
Assembly Action Recognition based on Dual Stream Fusion of Skeleton and Video Data
ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing, Pages 159–165
https://doi.org/10.1145/3647649.3647676
In order to solve the problem of low accuracy of workers' assembly action recognition relying only on human skeletal data in complex background change environments, an assembly action recognition network based on dual stream fusion of skeleton and video ...
- research-article, November 2023
Trust-aware conditional adversarial domain adaptation with feature norm alignment
Neural Networks (NENE), Volume 168, Issue C, Pages 518–530
https://doi.org/10.1016/j.neunet.2023.10.002
Abstract: Adversarial learning has proven to be an effective method for capturing transferable features for unsupervised domain adaptation. However, some existing conditional adversarial domain adaptation methods assign equal importance to different ...
Highlights:
  - The feature norms of each domain usually follow a complex distribution.
  - Data transferability is precisely quantified by Gaussian-uniform mixture model.
  - Mixed information can better guide features away from the decision boundary.
- research-article, October 2023
Rethinking Missing Modality Learning from a Decoding Perspective
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 4431–4439
https://doi.org/10.1145/3581783.3612291
Conventional pipeline of multimodal learning consists of three stages, including encoding, fusion, and decoding. Most existing methods under missing modality condition focus on the first stage and aim to learn the modality invariant representation or ...
- demonstration, October 2023
RadarHD: Demonstrating Lidar-like Point Clouds from mmWave Radar
Akarsh Prabhakara, Tao Jin, Arnav Das, Gantavya Bhatt, Lilly Kumari, Elahe Soltanaghai, Jeff Bilmes, Swarun Kumar, Anthony Rowe
ACM MobiCom '23: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, Article No.: 106, Pages 1–3
https://doi.org/10.1145/3570361.3614077
Millimeter wave radars can perceive through occlusions like dust, fog, smoke and clothes. But compared to cameras and lidars, their perception quality is orders of magnitude poorer. RadarHD [3] tackles this problem of poor quality by creating a machine ...
- Article, September 2023
NegT5: A Cross-Task Text-to-Text Framework for Negation in Question Answering
Intelligent Information and Database Systems, Pages 272–285
https://doi.org/10.1007/978-981-99-5837-5_23
Abstract: Negation is a fundamental grammatical construct that plays a crucial role in understanding QA tasks. It has been revealed that models trained with SQuAD1 still produce original responses when presented with negated sentences. To mitigate this ...