## UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

#### CVPR 2023

###### Yinzhen Xu*1, 2    Weikang Wan*1    Jialiang Zhang*1    Haoran Liu*1    Zikang Shan1    Hao Shen1    Ruicheng Wang1    Haoran Geng1, 2    Yijia Weng3    Jiayi Chen1, 2    Tengyu Liu2    Li Yi4    He Wang†1

###### 1 Peking University    2 Beijing Institute for General AI    3 Stanford University    4 Tsinghua University    * equal contribution    † corresponding author

UniDexGrasp performs dexterous grasping via grasp proposal generation and goal-conditioned execution. Left (grasp proposals): for each object, we generate diverse, high-quality grasp poses that vary greatly in rotation, translation, and articulation. Right (grasp execution): given the two grasp goal poses shown in the bottom corners, our highly generalizable goal-conditioned grasping policy adaptively executes each goal, as shown by the green and blue trajectories, respectively.

## Abstract

In this work, we tackle the problem of learning universal robotic dexterous grasping from a point cloud observation in a table-top setting. The goal is to grasp and lift objects in diverse, high-quality ways and to generalize across hundreds of object categories, including unseen ones.

Inspired by successful pipelines used in parallel gripper grasping, we split the task into two stages:

1. grasp proposal (pose) generation;
2. goal-conditioned grasp execution.
For the first stage, we propose a novel probabilistic model of grasp pose, conditioned on the point cloud observation, that factorizes the modeling of rotation from the modeling of translation and articulation. Trained on our synthesized large-scale dexterous grasp dataset, this model lets us sample diverse and high-quality dexterous grasp poses for the object in the point cloud. For the second stage, owing to the complexity of dexterous grasp execution, we replace the motion planning used in parallel-gripper grasping with a goal-conditioned grasping policy. Learning such a highly generalizable grasping policy from realistic inputs alone, without oracle states, is very challenging.

We thus propose several key innovations, including state canonicalization, an object curriculum, and teacher-student distillation. Integrating the two stages, our final pipeline shows, for the first time, universal dexterous grasping on thousands of object instances with a success rate above 60%, significantly outperforming all baselines. Our experiments show a minimal generalization gap between seen and unseen instances, further demonstrating the universality of our method.

## Methods

### Full pipeline

Our main pipeline. The input is an object point cloud $X$ fused from depth images, with ground-truth segmentation of the table and the object. A rotation $R$ is sampled from the distribution predicted by GraspIPDF, and the point cloud is canonicalized by $R^{-1}$ to $\tilde{X}$. GraspGlow then samples the translation $\bm{t}$ and joint angles $\bm{q}$. Next, ContactNet takes $\tilde{X}$ and a point cloud $X_H$ sampled from the hand and predicts the ideal contact map $\bm{c}$ on the object; the predicted hand pose is then optimized using this contact information. The final goal pose is transformed by $R$ back into the frame of the original visual observation. Finally, the RL policy takes the goal $\bm{g}$, the point cloud $X$, and the robot proprioception $\bm{s}_r$ and acts accordingly.
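The canonicalization and de-canonicalization steps above can be sketched as follows. This is a minimal numpy sketch under our own assumptions (function names, shapes, and a 22-dim joint vector are illustrative, not the released code): the point cloud is rotated by $R^{-1}$ before sampling, and the sampled pose is mapped back by $R$.

```python
import numpy as np


def canonicalize(points, R):
    """Rotate the observed point cloud X into the canonical frame.

    points: (N, 3) object point cloud (row vectors); R: (3, 3) rotation
    sampled from the rotation distribution (GraspIPDF in the paper).
    Row-vector form of x_tilde = R^{-1} x is x_tilde = x @ R.
    """
    return points @ R


def decanonicalize_pose(R, t_tilde, q):
    """Map a grasp pose sampled in the canonical frame back to the
    observation frame: the rotation becomes R itself, and the sampled
    translation is rotated by R. Joint angles q are frame-independent."""
    return R, R @ t_tilde, q


if __name__ == "__main__":
    # Example: a pure yaw rotation and a small random point cloud.
    theta = np.pi / 3
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
    X = np.random.default_rng(0).normal(size=(5, 3))
    X_tilde = canonicalize(X, R)
    # Rotating the canonical cloud back by R recovers the observation.
    assert np.allclose(X_tilde @ R.T, X)
```

The round-trip check at the bottom is the key invariant: canonicalizing and then applying $R$ again must reproduce the original observation, so the goal pose produced in the canonical frame aligns with the real scene after de-canonicalization.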

### Grasping Policy

The goal-conditioned dexterous grasping policy pipeline. $\widetilde{{\mathcal{S}}^{\mathcal{E}}}=(\widetilde{\bm{s}_r},\widetilde{\bm{s}_o},X_O,\widetilde{g})$ and $\widetilde{{\mathcal{S}}^{\mathcal{S}}}=(\widetilde{\bm{s}_r},\widetilde{X_S},\widetilde{g})$ denote the input states of the teacher policy and student policy after state canonicalization, respectively; $\oplus$ denotes concatenation.
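One plausible form of the state canonicalization above is a rotation about the gravity axis that brings the goal hand to a canonical yaw, so that the policy never has to generalize over yaw while gravity stays consistent. The sketch below is our own illustration of this idea (names and the exact canonical frame are assumptions, not necessarily the paper's implementation):

```python
import numpy as np


def yaw_canonicalize(state_pos, goal_rot):
    """Rotate all positions and the goal rotation about the gravity (z)
    axis so the goal hand faces a canonical yaw of zero.

    state_pos: (N, 3) positions in the state (row vectors);
    goal_rot: (3, 3) goal hand rotation. Returns the canonicalized pair.
    Since the rotation is about z, gravity is unchanged.
    """
    yaw = np.arctan2(goal_rot[1, 0], goal_rot[0, 0])
    c, s = np.cos(-yaw), np.sin(-yaw)
    Rz = np.array([[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]])
    return state_pos @ Rz.T, Rz @ goal_rot
```

After this transform, every training episode sees its goal at yaw zero, which shrinks the state distribution the policy must cover without altering the physics of the scene.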

## Language-guided Dexterous Grasping

Qualitative results of language-guided grasp proposal selection. CLIP selects the proposals that comply with the language instruction, allowing the goal-conditioned policy to execute potentially functional grasps.
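The selection step can be sketched as a cosine-similarity ranking in CLIP's joint embedding space: each grasp proposal is rendered, its image embedding is compared with the instruction's text embedding, and the best-scoring proposal is passed to the policy. The sketch below assumes the embeddings have already been produced by CLIP's image and text encoders; the function name and shapes are our own.

```python
import numpy as np


def select_proposal(text_emb, proposal_embs):
    """Rank grasp proposals by cosine similarity between the instruction's
    text embedding and each proposal's rendered-image embedding.

    text_emb: (D,) instruction embedding; proposal_embs: (K, D) one
    embedding per rendered grasp proposal. Returns (best index, scores).
    """
    t = text_emb / np.linalg.norm(text_emb)
    p = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    scores = p @ t  # cosine similarity per proposal
    return int(np.argmax(scores)), scores
```

In practice one would replace the precomputed arrays with `encode_text` / `encode_image` calls from a CLIP model; the ranking logic itself is unchanged.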

## Citation


```bibtex
@article{xu2023unidexgrasp,
  title={UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy},
  author={Xu, Yinzhen and Wan, Weikang and Zhang, Jialiang and Liu, Haoran and Shan, Zikang and Shen, Hao and Wang, Ruicheng and Geng, Haoran and Weng, Yijia and Chen, Jiayi and others},
  journal={arXiv preprint arXiv:2303.00938},
  year={2023}
}
```