Computer Science > Computer Vision and Pattern Recognition

arXiv:2408.05364 (cs)

[Submitted on 9 Aug 2024]

Title: Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Authors: Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, Calvin Murdock


Abstract: Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which implicitly transforms multisensory streams with respect to measurements of head orientation. Compared to conventional head-locked egocentric representations with a 2D planar field-of-view, SWL effectively offsets challenges posed by self-motion, allowing for improved spatial synchronization between input modalities. Using a set of multisensory embeddings on a world-locked sphere, we design a unified encoder-decoder transformer architecture that preserves the spherical structure of the scene representation, without requiring expensive projections between image and world coordinate systems. We evaluate the effectiveness of the proposed framework on multiple benchmark tasks for egocentric video understanding, including audio-visual active speaker localization, auditory spherical source localization, and behavior anticipation in everyday activities.
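The geometric idea behind world-locking can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes an implicit transformation inside a transformer, whereas the code below applies an explicit rotation to a single direction on the unit sphere. The axis convention (x forward, y left, z up), the yaw-only head rotation, and all function names are assumptions made for illustration.

```python
import math

def angles_to_unit(azimuth, elevation):
    """Azimuth/elevation in radians -> unit vector (assumed axes: x forward, y left, z up)."""
    ce = math.cos(elevation)
    return (ce * math.cos(azimuth), ce * math.sin(azimuth), math.sin(elevation))

def unit_to_angles(v):
    """Unit vector -> (azimuth, elevation) in radians."""
    x, y, z = v
    return (math.atan2(y, x), math.asin(max(-1.0, min(1.0, z))))

def yaw_rotation(yaw):
    """Head-to-world rotation matrix for a head turned by `yaw` radians about the z axis.
    A real head pose would also include pitch and roll; yaw-only is a simplification."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def rotate(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def world_lock(direction_head, R_head_to_world):
    """Re-express a head-locked direction in the world-locked frame."""
    return rotate(R_head_to_world, direction_head)

# A source observed 10 degrees to the left in the head frame while the head is
# turned 20 degrees to the left maps to one fixed world-locked azimuth.
head_dir = angles_to_unit(math.radians(10.0), 0.0)
world_dir = world_lock(head_dir, yaw_rotation(math.radians(20.0)))
world_azimuth, world_elevation = unit_to_angles(world_dir)
```

Under these assumptions, a stationary source keeps the same world-locked direction as the head turns, which is the property that offsets self-motion when aligning audio and visual observations.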

Comments:	ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.05364 [cs.CV]
	(or arXiv:2408.05364v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.05364

Submission history

From: Heeseung Yun
[v1] Fri, 9 Aug 2024 22:29:04 UTC (2,715 KB)
