UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

ICCV 2023 Best Paper Finalist


Weikang Wan*    Haoran Geng*    Yun Liu    Zikang Shan    Yaodong Yang    Li Yi    He Wang   

* equal contributions   corresponding author  



In this work, we present UniDexGrasp++, a novel dexterous grasping policy learning pipeline. Like UniDexGrasp, UniDexGrasp++ is trained on 3000+ different object instances with random object poses in a table-top setting. It significantly outperforms the previous SOTA, achieving 85.4% and 78.2% success rates on the train and test sets, respectively.


Abstract


We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++. To address the challenge of learning the vision-based policy across thousands of object instances, we propose Geometry-aware Curriculum Learning (GeoCurriculum) and Geometry-aware iterative Generalist-Specialist Learning (GiGSL) which leverage the geometry feature of the task and significantly improve the generalizability. With our proposed techniques, our final policy shows universal dexterous grasping on thousands of object instances with 85.4% and 78.2% success rate on the train set and test set which outperforms the state-of-the-art baseline UniDexGrasp by 11.7% and 11.3%, respectively.


Methods


Overview


Overview. Our method, UniDexGrasp++, follows the convention of first learning a state-based policy and then distilling it into a vision-based policy; our proposed techniques significantly boost both the state-based and vision-based learning stages.

GeoCurriculum


Geometry-aware Curriculum Learning. The idea is to gradually enlarge the grasping task space: from a single object with a fixed pose, to similar objects with similar poses, and finally to thousands of objects with arbitrary poses. To do so, we pretrain a point cloud autoencoder and use its bottleneck feature as the metric on the task space. We can then gradually enlarge the task space from a single point to the whole space, as in the sketch below.
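To make the stage construction concrete, here is a minimal sketch assuming the autoencoder bottleneck codes are already computed; build_curriculum, the seed task, and the radius schedule are illustrative placeholders, not the paper's exact implementation.

# Sketch of GeoCurriculum: rank tasks by distance to a seed task in the
# autoencoder's bottleneck space, then widen the training set stage by stage.
# All names (codes, seed_idx, radii) are illustrative, not the paper's API.
import numpy as np

def build_curriculum(codes, seed_idx, radii):
    """codes: (N, D) bottleneck features of the N scene point clouds;
    seed_idx: index of the single fixed-pose task the curriculum starts from;
    radii: increasing distance thresholds, one per curriculum stage."""
    dists = np.linalg.norm(codes - codes[seed_idx], axis=1)
    # Stage k trains on every task whose code lies within radii[k] of the
    # seed, so the task space grows from one point toward the full set.
    return [np.where(dists <= r)[0] for r in radii]

# Example: 4 stages over 3000 tasks with 128-D codes (random stand-ins here).
codes = np.random.randn(3000, 128).astype(np.float32)
stages = build_curriculum(codes, seed_idx=0, radii=[0.5, 1.0, 2.0, np.inf])
for k, idx in enumerate(stages):
    print(f"stage {k}: {len(idx)} tasks")  # monotonically growing task set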

GeoClustering


Geometry-aware Clustering. After this curriculum training, we obtain our first generalist policy, SG1, which we then need to further improve. Inspired by divide-and-conquer, we use the task metric to partition the whole task space into many subspaces. We then duplicate SG1 and finetune each copy on a smaller task subspace, obtaining many specialists SS_i. Each SS_i outperforms SG1 on its own task subspace; a sketch of the partitioning follows.
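A minimal sketch of the partitioning step, assuming k-means over the same autoencoder codes; the number of clusters and the feature source are assumptions for illustration, not the paper's exact settings.

# Sketch of GeoClustering: partition the task space by k-means on the
# bottleneck codes, then assign each cluster to one specialist.
import numpy as np
from sklearn.cluster import KMeans

codes = np.random.randn(3000, 128).astype(np.float32)  # stand-in codes
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(codes)

# cluster_of[i] tells which specialist SS_i is finetuned on task i;
# at test time the same predict() routes a new scene to its specialist.
cluster_of = kmeans.labels_
subspaces = [np.where(cluster_of == c)[0] for c in range(20)]
print([len(s) for s in subspaces])  # tasks per specialist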

GiGSL


Geometry-aware iterative Generalist-Specialist Learning. All the specialists SS_i can now be distilled back into a new generalist, which attains higher overall grasping performance. We alternate generalist and specialist learning for several iterations until the policy performance is good enough; one such round is sketched below.
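Here is a minimal sketch of one GiGSL round under our reading of the pipeline; finetune_rl, rollout, and bc_update are hypothetical callables standing in for the RL finetuning, data collection, and behavior-cloning steps.

# Sketch of one GiGSL round: finetune each specialist with RL inside its
# cluster, then distill all specialists into a fresh generalist by
# behavior cloning on their rollouts.
import copy

def gigsl_round(generalist, subspaces, finetune_rl, rollout, bc_update):
    # Specialist learning: clone the generalist once per task subspace.
    specialists = []
    for tasks in subspaces:
        spec = copy.deepcopy(generalist)
        finetune_rl(spec, tasks)          # RL only on this cluster's tasks
        specialists.append(spec)

    # Generalist learning: supervise the new generalist with action
    # labels produced by whichever specialist owns each task.
    new_generalist = copy.deepcopy(generalist)
    for spec, tasks in zip(specialists, subspaces):
        demos = rollout(spec, tasks)      # (observation, teacher action) pairs
        bc_update(new_generalist, demos)
    return new_generalist, specialists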

Full pipeline


Method Overview. We propose to first adopt a state-based policy learning stage, followed by a vision-based policy learning stage. The state-based policy takes as input the robot state R_t, the object state S_t, and the geometric feature z of the scene point cloud from the first frame. We leverage a geometry-aware task curriculum (GeoCurriculum) to learn the first state-based generalist policy. This generalist policy is then further improved by iteratively performing specialist finetuning and distilling back into the generalist in our proposed geometry-aware iterative generalist-specialist learning (GiGSL), where the assignment of tasks to specialists is decided by our geometry-aware clustering (GeoClustering). For vision-based policy learning, we first distill the final state-based specialists into an initial vision-based generalist and then run GiGSL on the vision-based generalist, until we obtain the final vision-based generalist with the highest performance.
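The state-to-vision distillation step can be sketched as a simple regression of the student onto teacher actions. Below is a minimal PyTorch sketch under assumed input/output dimensions; VisionStudent and its stand-in point cloud encoder are illustrative, not the paper's architecture.

# Sketch of state-to-vision distillation: the state-based teacher sees
# privileged object state, while the vision student sees only the point
# cloud and proprioception; we regress the student onto teacher actions.
import torch
import torch.nn as nn

class VisionStudent(nn.Module):
    def __init__(self, pc_feat_dim=128, prop_dim=24, act_dim=22):
        super().__init__()
        self.pc_encoder = nn.Sequential(   # stand-in for a PointNet encoder
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, pc_feat_dim))
        self.head = nn.Sequential(
            nn.Linear(pc_feat_dim + prop_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim))

    def forward(self, points, proprio):
        feat = self.pc_encoder(points).max(dim=1).values  # (B, pc_feat_dim)
        return self.head(torch.cat([feat, proprio], dim=-1))

student = VisionStudent()
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

points = torch.randn(8, 1024, 3)    # scene point clouds (stand-in batch)
proprio = torch.randn(8, 24)        # robot proprioceptive state
teacher_act = torch.randn(8, 22)    # actions from the state-based teacher

loss = nn.functional.mse_loss(student(points, proprio), teacher_act)
opt.zero_grad(); loss.backward(); opt.step()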




Results



Training Environment. We train our policies in Isaac Gym on more than 3000 training object instances with arbitrary initial poses.


Performance during Training. At the beginning of training, the geometry-aware task curriculum yields a state-based policy SG1 that achieves an 82.7% success rate. The geometry-aware iterative generalist-specialist learning then keeps increasing the success rate. Note that each distillation inevitably drops some performance, especially the state-to-vision distillation, due to the significant difference in inputs. Finally, our vision-based learning iterations bring the success rate back up to 85.4%.


Visualization of Clusters. Here we visualize the task clusters in the vision stage. Each cluster groups similar objects with similar grasping poses. Note that RL finetuning after distillation usually only hurts performance; within our task clusters, however, the high task similarity allows RL finetuning to further improve performance. This is the key to our specialist learning.


Visualization of Grasping. Here we show grasping processes for different objects in different poses.


Citation


            
@article{wan2023unidexgrasp++,
  title={UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning},
  author={Wan, Weikang and Geng, Haoran and Liu, Yun and Shan, Zikang and Yang, Yaodong and Yi, Li and Wang, He},
  journal={arXiv preprint arXiv:2304.00464},
  year={2023} 
}
            

Contact


If you have any questions, please feel free to contact us:

  • Weikang Wan: wwk@pku.edu.cn
  • Haoran Geng: ghr@stu.pku.edu.cn
  • He Wang: hewang@pku.edu.cn