Apr 4, 2024 · In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time.
This paper introduces the first unified framework to localize all three kinds of instances in untrimmed videos, including visual actions, sound events and ...
We introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events covering 100 event ...
Aug 12, 2024 · This paper presents UniAV, a unified audio-visual perception model for multi-task video localization. · UniAV is designed to leverage both visual ...
... Videos: A Large-Scale Benchmark and Baseline (CVPR 2023). · UniAV (public repository): Unified Audio-Visual Perception for Multi-Task Video Localization.
Apr 5, 2024 · Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event ...
Aug 13, 2024 · "UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization," Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, ...
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization. T Geng, T Wang, Y Zhang, J Duan, W Guan, F Zheng. arXiv preprint arXiv:2404.03179 ...