
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
Zifan Wang*1,3, Junyu Chen*1,3, Ziqing Chen1, Pengwei Xie1, Rui Chen1, Li Yi†1,2,3
1Tsinghua University, 2Shanghai Artificial Intelligence Laboratory, 3Shanghai Qi Zhi Institute

Abstract

This paper presents GenH2R, a framework for learning generalizable vision-based human-to-robot (H2R) handover skills. The goal is to equip robots with the ability to reliably receive objects with unseen geometry handed over by humans in various complex trajectories.

We acquire such generalizability by learning H2R handover at scale with a comprehensive solution including procedural simulation asset creation, automated demonstration generation, and effective imitation learning. We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation to create an H2R handover simulation environment named GenH2R-Sim, surpassing the number of scenes in existing simulators by three orders of magnitude. We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning. Finally, we present a 4D imitation learning method augmented by a future forecasting objective to distill demonstrations into a visuo-motor handover policy.

Experimental evaluations in both simulators and the real world demonstrate significant improvements (at least +10% success rate) over baselines in all cases.

Video

Method

GenH2R-Sim

The previous simulator only captures real-world humans grasping objects in a limited manner (only 1,000 scenes with 20 distinct objects). We introduce a new environment, GenH2R-Sim, to overcome these deficiencies and facilitate generalizable handovers.

Grasping poses: We use DexGraspNet to generate a substantial dataset of human hand grasp poses, producing approximately 1,000,000 grasp poses for 3,266 different objects sourced from ShapeNet.

Hand-object moving trajectories: We use multiple Bézier curves to model different stages of the motion, and link the ends of these curves to create a seamless track.
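To make the curve linking concrete, here is a minimal sketch (not the GenH2R-Sim implementation) of chaining cubic Bézier segments so that each motion stage starts exactly where the previous one ends; the waypoints below are made-up placeholders.

import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier segment at parameters t in [0, 1]."""
    t = t[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def link_bezier_segments(segments, samples_per_segment=50):
    """Chain cubic Bezier segments into one seamless 3D track.

    Each segment is a (p0, p1, p2, p3) tuple of control points; p0 is forced
    to coincide with the previous segment's p3 so the track has no gaps.
    """
    points, prev_end = [], None
    for p0, p1, p2, p3 in segments:
        if prev_end is not None:
            p0 = prev_end  # stitch this motion stage onto the previous one
        t = np.linspace(0.0, 1.0, samples_per_segment)
        points.append(cubic_bezier(p0, p1, p2, p3, t))
        prev_end = p3
    return np.concatenate(points, axis=0)

# Made-up waypoints for two motion stages (e.g., lift, then move toward the robot).
segments = [
    (np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 0.2]),
     np.array([0.2, 0.1, 0.3]), np.array([0.3, 0.2, 0.3])),
    (np.array([0.3, 0.2, 0.3]), np.array([0.4, 0.3, 0.3]),
     np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.3, 0.1])),
]
trajectory = link_bezier_segments(segments)  # (100, 3) array of object positions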


Generating Demonstrations for Distillation

To scale up robot demonstrations, we propose to automatically generate demonstrations with grasp and motion planning using privileged human motion and object state information.

We address a key question in learning visuomotor policies: how to efficiently generate robot demonstrations that incorporate paired vision-action data from successful task experiences.

We identify the vision-action correlation between visual observations and planned actions as the crucial factor influencing distillability. We present a distillation-friendly demonstration generation method that sparsely samples handover animations for landmark states and periodically replans grasp and motion based on privileged future landmarks.
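The sketch below illustrates this idea of sparse landmark sampling with periodic replanning; the planner and rendering interfaces (plan_grasp, plan_motion, get_observation) are hypothetical placeholders rather than the actual GenH2R code.

from typing import Callable, List, Sequence, Tuple

import numpy as np

def generate_demonstration(
    landmark_states: Sequence[dict],
    get_observation: Callable[[], np.ndarray],
    plan_grasp: Callable[[dict], np.ndarray],
    plan_motion: Callable[[np.ndarray], List[np.ndarray]],
    steps_per_landmark: int = 10,
) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Record one demonstration as paired (observation, action) data.

    landmark_states are sparsely sampled future hand/object states taken from
    the privileged handover animation.  For each landmark, a grasp is planned
    against that future state and only a short prefix of the resulting motion
    plan is executed before replanning, so the stored actions stay correlated
    with what the robot currently observes.
    """
    demo: List[Tuple[np.ndarray, np.ndarray]] = []
    for landmark in landmark_states:
        grasp_pose = plan_grasp(landmark)           # target a privileged future state
        planned_actions = plan_motion(grasp_pose)   # open-loop plan toward that grasp
        for action in planned_actions[:steps_per_landmark]:
            observation = get_observation()         # e.g., a segmented point cloud
            demo.append((observation, action))      # paired vision-action supervision
    return demo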

Forecast-Aided 4D Imitation Learning

To distill the above demonstrations into a visuomotor policy, we utilize point cloud input for its richer geometric information and smaller sim-vs-real gap compared to images.
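For background, a point cloud observation can be obtained from a depth camera by standard pinhole back-projection; the snippet below is a generic illustration, not the exact preprocessing used in GenH2R.

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W), in meters, into an (N, 3) point
    cloud in the camera frame using pinhole intrinsics (fx, fy, cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points[z > 0]  # drop invalid (zero-depth) pixels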

We propose a 4D imitation learning method that factors the sequential point cloud observations into geometry and motion parts, facilitating policy learning by better revealing the current scene state.

Furthermore, the imitation objective is augmented by a forecasting objective that predicts the future motion of the handover object, further strengthening the vision-action correlation.

Figure: PointNet++-based policy network.
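As an illustration of the combined objective, the minimal PyTorch sketch below pairs a behavior-cloning loss on expert actions with an auxiliary loss that forecasts the future motion of the handover object; the simple per-point MLP encoder, head sizes, and loss weight are illustrative assumptions standing in for the actual PointNet++-based network.

import torch
import torch.nn as nn

class HandoverPolicy(nn.Module):
    """Toy stand-in for the visuomotor policy: a shared point feature encoder
    with an action head (imitation) and a head that forecasts the future
    motion of the handover object."""

    def __init__(self, feat_dim=128, action_dim=6, forecast_dim=6):
        super().__init__()
        # A per-point MLP with max-pooling stands in for the PointNet++-style encoder.
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.action_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
        self.forecast_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, forecast_dim))

    def forward(self, points):
        # points: (B, N, 3) observed point cloud
        feat = self.point_mlp(points).max(dim=1).values  # (B, feat_dim) global feature
        return self.action_head(feat), self.forecast_head(feat)

def training_loss(policy, points, expert_action, future_object_motion, w_forecast=0.1):
    """Behavior cloning on expert actions plus an auxiliary forecasting loss
    on the object's future motion (the 0.1 weight is an illustrative choice)."""
    pred_action, pred_future = policy(points)
    bc_loss = nn.functional.mse_loss(pred_action, expert_action)
    forecast_loss = nn.functional.mse_loss(pred_future, future_object_motion)
    return bc_loss + w_forecast * forecast_loss

At deployment only the action head is used; the forecasting head serves purely as an auxiliary training signal.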

Experiments

Simulation Experiments

In the s0 (sequential) benchmark:

Our model can find better grasps, especially for challenging objects.

Baseline (HandoverSim2real): 75.23% success rate
Ours: 86.57% success rate

In the s0 (simultaneous) benchmark:

Our model adjusts both its distance to the object and its pose early on, avoiding frequent pose adjustments when the gripper is close to the object.

Baseline (HandoverSim2real): 68.75% success rate
Ours: 85.65% success rate

In the t0 benchmark:

Our model predicts the future pose of the object to enable more reasonable approaching trajectories.

Baseline (HandoverSim2real): 29.17% success rate
Ours: 41.43% success rate

In the t1 benchmark:

Our model can generalize to unseen real-world objects with diverse geometries.

Baseline (HandoverSim2real): 52.4% success rate
Ours: 68.33% success rate

Real-world Experiments

For more complex trajectories including rotations, our model demonstrates greater robustness than the baseline method.

Baseline (HandoverSim2real) vs. Ours

For novel objects with complex trajectories, our model exhibits greater generalizability.

Baseline (HandoverSim2real) vs. Ours

Contact

If you have any questions, please contact:

Zifan Wang (wzf22@mails.tsinghua.edu.cn)

Junyu Chen (junyu-ch21@mails.tsinghua.edu.cn)

BibTeX

@article{wang2024genh2r,
  title={GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation},
  author={Wang, Zifan and Chen, Junyu and Chen, Ziqing and Xie, Pengwei and Chen, Rui and Yi, Li},
  journal={arXiv preprint arXiv:2401.00929},
  year={2024}
}