Computer Science > Computer Vision and Pattern Recognition

arXiv:2410.14540 (cs)

[Submitted on 18 Oct 2024]

Title: Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior

Authors:Calvin-Khang Ta, Arindam Dutta, Rohit Kundu, Rohit Lal, Hannah Dela Cruz, Dripta S. Raychaudhuri, Amit Roy-Chowdhury


Abstract: The Skinned Multi-Person Linear (SMPL) model plays a crucial role in 3D human pose estimation, providing a streamlined yet effective representation of the human body. However, ensuring the validity of SMPL configurations during tasks such as human mesh regression remains a significant challenge, highlighting the need for a robust human pose prior capable of discerning realistic human poses. To address this, we introduce MOPED (Multi-mOdal PosE Diffuser). MOPED is the first method to leverage a novel multi-modal conditional diffusion model as a prior for SMPL pose parameters. Our method offers powerful unconditional pose generation and can also condition on multi-modal inputs such as images and text. This capability broadens the applicability of our approach by incorporating additional context often overlooked in traditional pose priors. Extensive experiments across three distinct tasks (pose estimation, pose denoising, and pose completion) demonstrate that our multi-modal diffusion-based prior significantly outperforms existing methods. These results indicate that our model captures a broader spectrum of plausible human poses.
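For readers less familiar with diffusion priors, the following sketch shows the standard DDPM forward (noising) process applied to a flattened SMPL pose vector, plus the inversion a denoiser relies on. It is not MOPED's implementation: the 72-dimensional axis-angle layout, the 1000-step linear beta schedule, and the helper names `q_sample` and `predict_x0` are assumptions chosen for illustration. A trained model would estimate the noise `eps` from `x_t` (and, in a conditional model, from image or text features); here the true noise is passed in so the inversion can be checked exactly.

```python
import numpy as np

# Assumption: an SMPL pose is flattened to 24 joints x 3 axis-angle values
# (global orientation + 23 body joints) = 72 numbers.
NUM_JOINTS = 24
POSE_DIM = NUM_JOINTS * 3


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise


def predict_x0(xt: np.ndarray, eps: np.ndarray, t: int,
               alpha_bars: np.ndarray) -> np.ndarray:
    """Invert the forward step given a noise estimate eps (hypothetical helper)."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])


betas = linear_beta_schedule(1000)
alpha_bars = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = 0.1 * rng.standard_normal(POSE_DIM)  # stand-in pose vector, not real data
xt, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
x0_hat = predict_x0(xt, eps, t=500, alpha_bars=alpha_bars)
```

In a trained prior, `eps` would come from a learned network rather than from `q_sample`, and repeatedly applying a reverse step starting from pure noise would produce pose samples; how MOPED injects image and text conditioning is described in the paper itself, not in this sketch.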

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.14540 [cs.CV]
	(or arXiv:2410.14540v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.14540

Submission history

From: Calvin-Khang Ta
[v1] Fri, 18 Oct 2024 15:29:19 UTC (36,629 KB)
