1 Center on Frontiers of Computing Studies, Peking University
2 School of EECS, Peking University
3 Beijing Institute for General AI
4 Stanford University
5 Tsinghua University
* equal contributions
† corresponding author
UniDexGrasp via grasp proposal generation and goal-conditioned execution. Left (grasp proposals): for each object, we generate diverse and high-quality grasp poses that vary greatly in rotation, translation, and articulation states. Right (grasp execution): given the two grasp goal poses illustrated in the bottom corners, our highly generalizable goal-conditioned grasping policy adaptively executes each goal, shown respectively in the green and blue trajectories.
In this work, we tackle the problem of learning universal robotic dexterous grasping from a point cloud observation under a table-top setting. The goal is to grasp and lift objects in diverse, high-quality ways and to generalize across hundreds of object categories and even unseen ones. Inspired by successful pipelines used in parallel-gripper grasping, we split the task into two stages: 1) dexterous grasp proposal generation and 2) goal-conditioned grasp execution.
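As a rough illustration of how the two stages compose at inference time, the following Python sketch wires a stage-1 proposal generator into a stage-2 goal-conditioned rollout. The class names, the 22-dimensional joint-angle vector, and the observation interface are illustrative assumptions, not the released code.

# Minimal two-stage composition sketch. GraspProposalGenerator and
# GoalConditionedPolicy are hypothetical stand-ins for the stage-1 and
# stage-2 models; a 22-DoF hand and an axis-angle rotation are assumed.
import numpy as np

class GraspProposalGenerator:
    """Stage 1: map an object point cloud to a dexterous grasp goal pose."""
    def propose(self, object_points):
        # A real model returns (rotation, translation, joint angles);
        # here we return a zero placeholder with that layout (3 + 3 + 22).
        return np.zeros(3 + 3 + 22, dtype=np.float32)

class GoalConditionedPolicy:
    """Stage 2: execute a given grasp goal from point clouds and proprioception."""
    def act(self, goal, points_t, proprio_t):
        return np.zeros(22, dtype=np.float32)  # placeholder joint targets

def grasp_episode(generator, policy, get_observation, num_steps=200):
    """One table-top episode: propose a grasp goal, then execute it step by step."""
    points_0, proprio_0 = get_observation()
    goal = generator.propose(points_0)              # stage 1: grasp proposal
    for _ in range(num_steps):                      # stage 2: goal-conditioned rollout
        points_t, proprio_t = get_observation()
        action = policy.act(goal, points_t, proprio_t)
        # apply `action` in the simulator or on the robot here
    return goal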
Our main pipeline. The left part is the first stage, which generates a dexterous grasp proposal. The input is the object point cloud at time step 0, $X_0$, fused from depth images with ground-truth segmentation of the table and the object. A rotation $R$ is sampled from the distribution predicted by GraspIPDF, and the point cloud is canonicalized by $R^{-1}$ to $\tilde{X}_0$. GraspGlow then samples the translation $\tilde{\bm{t}}$ and joint angles $\bm{q}$. Next, ContactNet takes $\tilde{X}_0$ and a point cloud $\tilde{X}_H$ sampled from the hand to predict the ideal contact map $\bm{c}$ on the object, and the predicted hand pose is optimized based on this contact information. The final goal pose is transformed by $R$ to align with the original visual observation. The right part is the second stage: the goal-conditioned dexterous grasping policy, which takes the goal $\bm{g}$, the point cloud $X_t$, and the robot proprioception $\bm{s}^r_t$ and outputs actions accordingly.
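To make the stage-1 data flow above concrete, here is a hedged Python/PyTorch sketch of the grasp proposal inference loop. The module interfaces (grasp_ipdf.sample, grasp_glow.sample, contact_net, hand_model.contact_alignment_loss) and the Adam-based refinement are assumptions about one plausible implementation, not the exact released API.

# Sketch of the stage-1 grasp proposal flow; names, shapes, and signatures
# are illustrative rather than the released code.
import torch

def propose_grasp(x0, grasp_ipdf, grasp_glow, contact_net, hand_model, n_iters=100):
    """x0: (N, 3) object point cloud fused from depth images."""
    # 1) Sample a hand-root rotation R from the SO(3) distribution of GraspIPDF.
    R = grasp_ipdf.sample(x0)                      # (3, 3)

    # 2) Canonicalize the point cloud by R^{-1} (= R^T); row-vector form is x @ R.
    x0_tilde = x0 @ R

    # 3) Sample translation and joint angles in the canonicalized frame.
    t_tilde, q = grasp_glow.sample(x0_tilde)       # (3,), (n_dof,)

    # 4) Predict the ideal contact map on the object from object and hand points.
    xh_tilde = hand_model.sample_surface_points(t_tilde, q)
    c = contact_net(x0_tilde, xh_tilde)            # (N,) per-point contact scores

    # 5) Refine (t, q) so the hand's contact pattern matches c
    #    (contact_alignment_loss is a hypothetical energy term).
    t_tilde = t_tilde.detach().clone().requires_grad_(True)
    q = q.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([t_tilde, q], lr=1e-3)
    for _ in range(n_iters):
        opt.zero_grad()
        loss = hand_model.contact_alignment_loss(x0_tilde, t_tilde, q, c)
        loss.backward()
        opt.step()

    # 6) Transform the refined goal back by R to align with the original observation.
    t = R @ t_tilde.detach()
    return R, t, q.detach()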
The goal-conditioned dexterous grasping policy pipeline. $\widetilde{{\mathcal{S}}^{\mathcal{E}}}=(\widetilde{\bm{s}_r},\widetilde{\bm{s}_o},X_O,\widetilde{g})$ and $\widetilde{{\mathcal{S}}^{\mathcal{S}}}=(\widetilde{\bm{s}_r},\widetilde{X_S},\widetilde{g})$ denote the input states of the teacher policy and the student policy after state canonicalization, respectively; $\oplus$ denotes concatenation.
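A minimal sketch of the student policy's forward pass under the state layout above: canonicalized proprioception, the canonicalized scene point cloud, and the canonicalized goal are encoded and concatenated ($\oplus$) before the action head. The encoder, layer sizes, and input dimensions here are illustrative assumptions, not the paper's exact architecture; the teacher policy would analogously replace the point-cloud features with the privileged object state.

# Student-policy forward pass sketch; dimensions are illustrative.
import torch
import torch.nn as nn

class StudentGraspPolicy(nn.Module):
    def __init__(self, proprio_dim=64, goal_dim=28, pc_feat_dim=128, action_dim=22):
        super().__init__()
        # PointNet-style encoder: shared MLP over points, then max pooling.
        self.point_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, pc_feat_dim),
        )
        self.action_head = nn.Sequential(
            nn.Linear(proprio_dim + goal_dim + pc_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, proprio, goal, scene_points):
        # scene_points: (B, N, 3) canonicalized scene point cloud
        pc_feat = self.point_encoder(scene_points).max(dim=1).values  # (B, pc_feat_dim)
        state = torch.cat([proprio, goal, pc_feat], dim=-1)           # the concatenation
        return self.action_head(state)

# Example forward pass with random inputs (batch of 4, 1024 scene points).
policy = StudentGraspPolicy()
action = policy(torch.randn(4, 64), torch.randn(4, 28), torch.randn(4, 1024, 3))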
Qualitative results of language-guided grasp proposal selection. CLIP can select proposals that comply with the language instruction, allowing the goal-conditioned policy to execute potentially functional grasps.
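A hedged sketch of how off-the-shelf CLIP (https://github.com/openai/CLIP) could score grasp proposals against a language instruction. The render_proposal helper, which renders the hand at a proposed goal pose around the object as an RGB array, is a hypothetical stand-in; the selection procedure shown is one plausible setup rather than the paper's exact implementation.

# Rank grasp proposals by CLIP similarity to a language instruction.
import torch
import clip
from PIL import Image

def select_proposal(proposals, render_proposal, instruction, device="cuda"):
    model, preprocess = clip.load("ViT-B/32", device=device)
    text = clip.tokenize([instruction]).to(device)

    # render_proposal(p) is assumed to return an HxWx3 uint8 image of proposal p.
    images = torch.stack([
        preprocess(Image.fromarray(render_proposal(p))) for p in proposals
    ]).to(device)

    with torch.no_grad():
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        scores = (img_feat @ txt_feat.T).squeeze(-1)   # cosine similarity per proposal

    best = scores.argmax().item()
    return proposals[best], scores[best].item()

# Example: pick the proposal best matching "grasp the mug by its handle",
# then pass it to the goal-conditioned policy as the grasp goal.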
@article{xu2023unidexgrasp,
title={UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy},
author={Xu, Yinzhen and Wan, Weikang and Zhang, Jialiang and Liu, Haoran and Shan, Zikang and Shen, Hao and Wang, Ruicheng and Geng, Haoran and Weng, Yijia and Chen, Jiayi and others},
journal={arXiv preprint arXiv:2303.00938},
year={2023}
}
If you have any questions, please feel free to contact us: