Computer Science > Robotics

arXiv:2405.01527 (cs)

[Submitted on 2 May 2024 (v1), last revised 8 Aug 2024 (this version, v2)]

Title: Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Authors: Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani

Abstract: We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation: interacting with unseen objects in novel scenes without test-time adaptation. While typical approaches rely on a large amount of demonstration data for such generalization, we propose an approach that leverages web videos to predict plausible interaction plans and learns a task-agnostic transformation to obtain robot actions in the real world. Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal, and can be trained with diverse videos on the web, including those of humans and robots manipulating everyday objects. We use these 2D track predictions to infer a sequence of rigid transforms of the object to be manipulated, and obtain robot end-effector poses that can be executed in an open-loop manner. We then refine this open-loop plan by predicting residual actions through a closed-loop policy trained with a few embodiment-specific demonstrations. We show that this approach of combining scalably learned track prediction with a residual policy requiring minimal in-domain robot-specific data enables diverse generalizable robot manipulation, and present a wide array of real-world robot manipulation results across unseen tasks, objects, and scenes. this https URL
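
The step of inferring a rigid transform from point tracks can be illustrated with a minimal sketch. It assumes the tracked points have already been lifted to 3D (for example with a depth image), which the abstract does not specify, and it uses the standard Kabsch least-squares fit as a stand-in rather than the authors' actual procedure. The function name `fit_rigid_transform` is hypothetical.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t with dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points, e.g. the same object
    points at the first frame and at a later predicted time-step.
    ASSUMPTION: points are already in 3D; this is an illustrative stand-in,
    not the method used in the paper.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Centre both point sets so that only the rotation remains to be solved.
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    cross_cov = (src - src_centroid).T @ (dst - dst_centroid)

    # The SVD of the cross-covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(cross_cov)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    correction = np.diag([1.0, 1.0, sign])

    rotation = vt.T @ correction @ u.T
    translation = dst_centroid - rotation @ src_centroid
    return rotation, translation
```

Applying this fit between the first frame and each later time-step of a track would yield one rigid transform per step, which is the kind of sequence the abstract describes converting into end-effector poses.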

Comments:	ECCV 2024. Last 3 authors contributed equally
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.01527 [cs.RO]
	(or arXiv:2405.01527v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2405.01527

Submission history

From: Homanga Bharadhwaj
[v1] Thu, 2 May 2024 17:56:55 UTC (22,792 KB)
[v2] Thu, 8 Aug 2024 23:18:08 UTC (45,886 KB)
