Computer Science > Computer Vision and Pattern Recognition

arXiv:2405.17729 (cs)

[Submitted on 28 May 2024]

Title: Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

Authors: Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

Abstract: Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition. Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions. We further construct a new fine-grained dataset based on medical assessments for rehabilitation of stroke patients, serving as a challenging benchmark for hierarchical recognition. Through extensive experiments, we demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional methods, especially for fine-grained subcategories. The proposed framework paves the way for hierarchical modeling in video understanding tasks, moving beyond flat categorization.
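
The top-down constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-level hierarchy, the category names, the scores, and the function `top_down_predict` are all hypothetical, and the scores stand in for whatever per-level outputs a model would produce (for example, video-text similarities). The sketch only shows the general idea of restricting fine-grained predictions to the children of the predicted coarse category.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level hierarchy (illustrative only, not taken from the paper).
PARENT_TO_CHILDREN: Dict[str, List[str]] = {
    "upper_limb": ["raise_arm", "grasp_object"],
    "lower_limb": ["lift_leg", "step_forward"],
}


def top_down_predict(
    parent_scores: Dict[str, float],
    child_scores: Dict[str, float],
    hierarchy: Dict[str, List[str]],
) -> Tuple[str, str]:
    """Pick the best coarse category, then the best fine category among its children."""
    # Step 1: choose the highest-scoring coarse (parent) category.
    best_parent = max(parent_scores, key=parent_scores.get)
    # Step 2: keep only the fine categories that belong to that parent.
    allowed_children = hierarchy[best_parent]
    # Step 3: choose the highest-scoring fine category among the allowed ones.
    best_child = max(allowed_children, key=lambda name: child_scores[name])
    return best_parent, best_child


# Assumed scores for one video (made-up numbers).
parent_scores = {"upper_limb": 0.7, "lower_limb": 0.3}
child_scores = {
    "raise_arm": 0.2,
    "grasp_object": 0.3,
    "lift_leg": 0.4,
    "step_forward": 0.1,
}

prediction = top_down_predict(parent_scores, child_scores, PARENT_TO_CHILDREN)
print(prediction)
```

In this made-up example a flat argmax over the fine categories would select `lift_leg`, which is inconsistent with the predicted parent `upper_limb`; the top-down filter returns `grasp_object` instead.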

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2405.17729 [cs.CV]
	(or arXiv:2405.17729v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.17729

Submission history

From: Rui Zhang
[v1] Tue, 28 May 2024 01:17:22 UTC (9,419 KB)
