Computer Science > Computer Vision and Pattern Recognition

arXiv:2108.10696 (cs)

[Submitted on 24 Aug 2021 (v1), last revised 17 Jan 2022 (this version, v2)]

Title: Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Authors: Ziqiang Wang, Zhi Liu, Gongyang Li, Yang Wang, Tianhong Zhang, Lihua Xu, Jijun Wang


Abstract: 3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction, which is explored in this paper. However, 3D convolution encodes visual representations only within a fixed local spatio-temporal neighborhood determined by its kernel size, while human attention is always attracted by relational visual features at different times. To overcome this limitation, we propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction, in which multiple Spatio-Temporal Self-Attention (STSA) modules are employed at different levels of a 3D convolutional backbone to directly capture long-range relations between spatio-temporal features of different time steps. In addition, we propose an Attentional Multi-Scale Fusion (AMSF) module to integrate multi-level features with the perception of context in semantic and spatio-temporal subspaces. Extensive experiments demonstrate the contributions of the key components of our method, and the results on the DHF1K, Hollywood-2, UCF, and DIEM benchmark datasets clearly prove the superiority of the proposed model compared with all state-of-the-art models.
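
The idea of relating features across time steps, rather than only within a convolution kernel's local window, can be illustrated with a generic scaled dot-product self-attention over all spatio-temporal positions of a feature volume. The sketch below is not the authors' STSA module, whose exact design is not given in the abstract. The function name `spatio_temporal_self_attention`, the `(T, H, W, C)` layout, and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_self_attention(feats, w_q, w_k, w_v):
    """Generic self-attention over a (T, H, W, C) feature volume.

    Hypothetical helper, not the paper's STSA module. Every position in every
    time step attends to every other position, so relations are not limited
    to a fixed local kernel window.
    """
    t, h, w, c = feats.shape
    tokens = feats.reshape(t * h * w, c)      # one token per spatio-temporal position
    q = tokens @ w_q                          # queries
    k = tokens @ w_k                          # keys
    v = tokens @ w_v                          # values
    scores = q @ k.T / np.sqrt(q.shape[-1])   # pairwise similarity, scaled
    attn = softmax(scores, axis=-1)           # each row sums to 1
    out = attn @ v                            # attention-weighted mix of values
    return out.reshape(t, h, w, -1), attn


# Toy example: 4 time steps of 3x3 feature maps with 8 channels (assumed sizes).
rng = np.random.default_rng(0)
T, H, W, C = 4, 3, 3, 8
feats = rng.standard_normal((T, H, W, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out, attn = spatio_temporal_self_attention(feats, w_q, w_k, w_v)
```

Here `attn` has one row per position, and each row weights all `T * H * W` positions, including those in other time steps. The attention matrix grows quadratically with the number of positions, which is one reason such modules are usually applied to downsampled feature maps.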

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2108.10696 [cs.CV]
	(or arXiv:2108.10696v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.10696
Related DOI:	https://doi.org/10.1109/TMM.2021.3139743

Submission history

From: Ziqiang Wang
[v1] Tue, 24 Aug 2021 12:52:47 UTC (2,002 KB)
[v2] Mon, 17 Jan 2022 14:45:41 UTC (2,319 KB)
