Computer Science > Robotics

arXiv:2308.12537 (cs)

[Submitted on 24 Aug 2023]

Title: HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks

Authors: Zichao Dong, Weikun Zhang, Xufeng Huang, Hang Ji, Xin Zhan, Junbo Chen


Abstract: Human-robot interaction is an exciting task that aims to guide robots to follow instructions from humans. Because a huge gap lies between human natural language and machine code, building end-to-end human-robot interaction models is fairly challenging. Further, the visual information received from a robot's sensors is also a hard language for the robot to perceive. In this work, HuBo-VLM is proposed to tackle perception tasks associated with human-robot interaction, including object detection and visual grounding, with a unified transformer-based vision-language model. Extensive experiments on the Talk2Car benchmark demonstrate the effectiveness of our approach. Code will be made publicly available at this https URL.

Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.12537 [cs.RO]
	(or arXiv:2308.12537v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2308.12537

Submission history

From: Weikun Zhang
[v1] Thu, 24 Aug 2023 03:47:27 UTC (3,978 KB)
