UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization.

Unified Audio-Visual Perception for Multi-Task Video Event Localization

Apr 4, 2024 · In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time.

Unified Audio-Visual Perception for Multi-Task Video Localization

github.com › ttgeng233 › UniAV

This paper introduces the first unified framework to localize all three kinds of instances in untrimmed videos, including visual actions, sound events and ...

Unified Audio-Visual Perception for Multi-Task Video Localization - arXiv

arxiv.org › html

In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time.

UnAV-100 Dataset - Papers With Code

paperswithcode.com › dataset › unav-100

We introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events covering 100 event ...

UniAV: Unified Audio-Visual Perception for Multi-Task Video ...

www.aimodels.fyi › papers › arxiv › uni...

Aug 12, 2024 · This paper presents UniAV, a unified audio-visual perception model for multi-task video localization. · UniAV is designed to leverage both visual ...

Tiantian Geng ttgeng233 - GitHub

github.com › ttgeng233

... Videos: A Large-Scale Benchmark and Baseline (CVPR 2023). UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization.

arXiv Sound on X: "``UniAV: Unified Audio-Visual Perception for Multi ...

twitter.com › ArxivSound › status

Apr 5, 2024 · Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event ...

arXiv Sound on X: "``UniAV: Unified Audio-Visual Perception for Multi ...

twitter.com › ArxivSound › status

Aug 13, 2024 · "UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization," Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, ...

Yanfu Zhang - CatalyzeX

www.catalyzex.com › author

In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time.

Teng Wang - Google Scholar

scholar.google.com › citations

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization. T Geng, T Wang, Y Zhang, J Duan, W Guan, F Zheng. arXiv preprint arXiv:2404.03179 ...