UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

CVPR 2023


Yinzhen Xu*1, 2, 3    Weikang Wan*1, 2     Jialiang Zhang*1, 2    Haoran Liu*1, 2    Zikang Shan1     Hao Shen1     Ruicheng Wang1    Haoran Geng1, 2    Yijia Weng4    Jiayi Chen1    Tengyu Liu3    Li Yi5    He Wang†1, 2

1 Center on Frontiers of Computing Studies, Peking University     2 School of EECS, Peking University    3 Beijing Institute for General AI    4 Stanford University    5 Tsinghua University   

* Equal contributions    † Corresponding author



UniDexGrasp via grasp proposal generation and goal-conditioned execution. Left (grasp proposals): for each object, we generate diverse, high-quality grasp poses that vary greatly in rotation, translation, and articulation. Right (grasp execution): given the two different grasp goal poses illustrated in the bottom corners, our highly generalizable goal-conditioned grasping policy adaptively executes each goal, as shown in the green and blue trajectories, respectively.


Video




Abstract


In this work, we tackle the problem of learning universal robotic dexterous grasping from a point cloud observation in a table-top setting. The goal is to grasp and lift objects in high-quality and diverse ways, generalizing across hundreds of categories and even to unseen objects.

Inspired by successful pipelines used in parallel gripper grasping, we split the task into two stages:

  1. grasp proposal (pose) generation;
  2. goal-conditioned grasp execution.
For the first stage, we propose a novel probabilistic model of grasp poses conditioned on the point cloud observation that decouples rotation modeling from translation and articulation modeling. Trained on our synthesized large-scale dexterous grasp dataset, this model lets us sample diverse, high-quality dexterous grasp poses for the object in the point cloud. For the second stage, given the complexity of dexterous grasping execution, we propose to replace the motion planning used in parallel-gripper grasping with a goal-conditioned grasping policy. Note that learning such a highly generalizable policy from realistic inputs alone, without oracle states, is very challenging.
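The factorized sampling scheme can be sketched as follows. This is a minimal numpy illustration, not the actual models: `sample_rotation` and `sample_trans_joints` are hypothetical stand-ins for the learned GraspIPDF and GraspGlow networks, and the joint count of 22 is only an example.

```python
import numpy as np

def sample_rotation(point_cloud, rng):
    """Stand-in for GraspIPDF: sample a rotation matrix from p(R | X).
    Here we just draw a uniformly random rotation for illustration."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))       # make the QR decomposition unique
    if np.linalg.det(q) < 0:       # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

def sample_trans_joints(canonical_cloud, rng, n_joints=22):
    """Stand-in for GraspGlow: sample translation t and joint angles q
    from p(t, q | R^{-1} X) in the rotation-canonicalized frame."""
    t = canonical_cloud.mean(axis=0) + 0.05 * rng.standard_normal(3)
    joints = rng.uniform(0.0, 1.0, n_joints)
    return t, joints

def sample_grasp(point_cloud, rng):
    """Factorized sampling: p(R, t, q | X) = p(R | X) * p(t, q | R^{-1} X)."""
    R = sample_rotation(point_cloud, rng)
    canonical = point_cloud @ R    # row x^T R = (R^{-1} x)^T for orthogonal R
    t_tilde, joints = sample_trans_joints(canonical, rng)
    t = R @ t_tilde                # map translation back to the observation frame
    return R, t, joints

rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3))
R, t, joints = sample_grasp(cloud, rng)
print(R.shape, t.shape, joints.shape)  # (3, 3) (3,) (22,)
```

The point of the factorization is that once a rotation is fixed, the remaining translation-and-articulation distribution is conditioned on a rotation-free (canonicalized) view of the object, which is an easier distribution to learn.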

We thus propose several key innovations: state canonicalization, an object curriculum, and teacher-student distillation. Integrating the two stages, our final pipeline, for the first time, demonstrates universal dexterous grasping on thousands of object instances with a success rate above 60%, significantly outperforming all baselines. Our experiments show a minimal generalization gap between seen and unseen instances, further demonstrating the universality of our method.


Methods


Full pipeline


Our main pipeline. The left part is the first stage, which generates a dexterous grasp proposal. The input is the object point cloud at time step 0, $X_0$, fused from depth images with ground-truth segmentation of the table and the object. A rotation $R$ is sampled from the distribution implied by GraspIPDF, and the point cloud is canonicalized by $R^{-1}$ to $\tilde{X}_0$. GraspGlow then samples the translation $\tilde{\bm{t}}$ and joint angles $\bm{q}$. Next, ContactNet takes $\tilde{X}_0$ and a point cloud $\tilde{X}_H$ sampled from the hand and predicts the ideal contact map $\bm{c}$ on the object. The predicted hand pose is then optimized based on this contact information, and the final goal pose is transformed by $R$ to align with the original visual observation. The right part is the second stage, the goal-conditioned dexterous grasping policy, which takes the goal $\bm{g}$, the point cloud $X_t$, and the robot proprioception $\bm{s}^r_t$ and outputs actions accordingly.
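The contact-based refinement step can be illustrated with a toy numpy sketch: a soft contact map is induced by hand-object distances, and the hand pose is adjusted by gradient descent so that its induced map matches the target map. This is an assumption-laden simplification (translation only, finite-difference gradients, an `exp(-d/tau)` contact model); the paper optimizes the full hand pose against ContactNet's prediction.

```python
import numpy as np

def contact_map(obj_pts, hand_pts, tau=0.2):
    """Soft contact value per object point: exp(-min hand distance / tau)."""
    d = np.linalg.norm(obj_pts[:, None, :] - hand_pts[None, :, :], axis=-1)
    return np.exp(-d.min(axis=1) / tau)

def refine_translation(obj_pts, hand_pts, target_c, steps=50, lr=0.05, eps=1e-4):
    """Descend on a hand translation so the induced contact map matches the
    target map (finite-difference gradients, for illustration only)."""
    t = np.zeros(3)
    for _ in range(steps):
        base = np.mean((contact_map(obj_pts, hand_pts + t) - target_c) ** 2)
        grad = np.zeros(3)
        for k in range(3):
            dt = np.zeros(3)
            dt[k] = eps
            shifted = np.mean((contact_map(obj_pts, hand_pts + t + dt) - target_c) ** 2)
            grad[k] = (shifted - base) / eps
        t -= lr * grad
    return t

rng = np.random.default_rng(1)
obj = rng.standard_normal((64, 3))
hand = 0.5 * rng.standard_normal((32, 3))
# Target map computed at a hidden "ideal" offset the optimizer must recover.
target = contact_map(obj, hand + np.array([0.10, -0.05, 0.02]))
t_hat = refine_translation(obj, hand, target)
before = np.mean((contact_map(obj, hand) - target) ** 2)
after = np.mean((contact_map(obj, hand + t_hat) - target) ** 2)
```

After refinement, the induced contact map fits the target map more closely than the initial pose did, which is the role this step plays in the pipeline.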


Grasping Policy


The goal-conditioned dexterous grasping policy pipeline. $\widetilde{{\mathcal{S}}^{\mathcal{E}}}=(\widetilde{\bm{s}_r},\widetilde{\bm{s}_o},X_O,\widetilde{g})$ and $\widetilde{{\mathcal{S}}^{\mathcal{S}}}=(\widetilde{\bm{s}_r},\widetilde{X_S},\widetilde{g})$ denote the input state of the teacher policy and student policy after state canonicalization, respectively; $\oplus$ denotes concatenation.
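State canonicalization expresses the observations in a frame attached to the hand root, so the policy sees the same input regardless of where the grasp happens in the world. A minimal numpy sketch, assuming a root pose $(p, R)$ and canonicalizing only an object point cloud and a goal translation (the full method also canonicalizes the remaining state components):

```python
import numpy as np

def canonicalize(root_pos, root_rot, obj_pts, goal_pos):
    """Express object points and goal translation in the hand-root frame:
    x_local = R^T (x - p) for root pose (p, R)."""
    obj_local = (obj_pts - root_pos) @ root_rot      # rows: (x - p)^T R = (R^T (x - p))^T
    goal_local = root_rot.T @ (goal_pos - root_pos)
    return obj_local, goal_local

rng = np.random.default_rng(2)
pts = rng.standard_normal((16, 3))
goal = np.array([0.2, 0.1, 0.3])
p, R = np.array([0.0, 0.0, 0.1]), np.eye(3)

# An arbitrary rigid transform (G, s) applied to the whole scene.
G, r = np.linalg.qr(rng.standard_normal((3, 3)))
G *= np.sign(np.diag(r))
if np.linalg.det(G) < 0:
    G[:, 0] *= -1
s = np.array([1.0, -2.0, 0.5])

a_pts, a_goal = canonicalize(p, R, pts, goal)
b_pts, b_goal = canonicalize(G @ p + s, G @ R, pts @ G.T + s, G @ goal + s)
```

Moving the entire scene by any rigid transform leaves the canonicalized inputs unchanged, which is exactly the invariance that makes the policy generalize across grasp goal poses.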




Language-guided Dexterous Grasping



Qualitative results of language-guided grasp proposal selection. CLIP can select proposals that comply with the language instruction, allowing the goal-conditioned policy to execute potentially functional grasps.
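The selection step amounts to ranking proposals by cosine similarity between the instruction's text embedding and each proposal's image embedding. The sketch below uses hand-made stub vectors in place of real CLIP outputs (which would require rendering each proposal and running the CLIP encoders); the instruction text and the three proposal labels are purely hypothetical.

```python
import numpy as np

def select_proposal(text_emb, proposal_embs):
    """Rank grasp proposals by cosine similarity between the text embedding
    of the instruction and the image embedding of each rendered proposal;
    return the index of the best proposal and all scores."""
    t = text_emb / np.linalg.norm(text_emb)
    p = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    scores = p @ t
    return int(np.argmax(scores)), scores

# Stub embeddings standing in for CLIP outputs (illustrative values only).
text = np.array([1.0, 0.0, 0.0])            # e.g. "grasp the mug by the handle"
proposals = np.array([[0.9, 0.1, 0.0],      # handle grasp (closest to the text)
                      [0.0, 1.0, 0.0],      # rim grasp
                      [-1.0, 0.0, 0.0]])    # bottom grasp
best, scores = select_proposal(text, proposals)
print(best)  # 0
```

The chosen proposal is then handed to the goal-conditioned policy as the grasp goal, so language only influences which goal is executed, not how it is executed.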




Qualitative results




Citation



@article{xu2023unidexgrasp,
  title={UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy},
  author={Xu, Yinzhen and Wan, Weikang and Zhang, Jialiang and Liu, Haoran and Shan, Zikang and Shen, Hao and Wang, Ruicheng and Geng, Haoran and Weng, Yijia and Chen, Jiayi and others},
  journal={arXiv preprint arXiv:2303.00938},
  year={2023}
}


Contact


If you have any questions, please feel free to contact us:

  • Yinzhen Xu: xuyinzhen.hi@gmail.com
  • Weikang Wan: wwk@pku.edu.cn
  • Jialiang Zhang: jackzhang0906@126.com
  • Haoran Liu: lhrrhl0419@163.com
  • He Wang: hewang@pku.edu.cn