For the task of hanging clothes, learning how to insert a hanger into a garment is a crucial step, yet it has rarely been explored in robotics. In this work, we address the problem of inserting a hanger into various unseen garments that are initially laid flat on a table. This task is challenging due to its long-horizon nature, the high degrees of freedom of the garments, and the lack of training data. To simplify learning, we first break the task into several subtasks. We then formulate each subtask as a policy learning problem and propose a low-dimensional action parameterization. To overcome the challenge of limited data, we build our own simulator and create 144 synthetic clothing assets, which lets us collect high-quality training data effectively. Our approach takes single-view depth images and object masks as input, which mitigates the Sim2Real appearance gap and generalizes well to new garments. Extensive experiments in both simulation and the real world validate the proposed method. By training on a variety of garments in simulation, our method achieves a 75% success rate on 8 different unseen garments in the real world.
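The control flow implied by this subtask decomposition can be summarized in a short sketch. This is a hypothetical illustration, not the authors' code: all names (Observation, fake_observe, the policy interface) are made up, and the segmentation front end and robot execution are stubbed out so the loop runs on its own.

import numpy as np
from dataclasses import dataclass

@dataclass
class Observation:
    depth: np.ndarray   # (H, W) single-view depth image
    masks: np.ndarray   # (3, H, W) hanger / garment / neckline masks

def fake_observe(h=256, w=256) -> Observation:
    """Stand-in for the camera and segmentation front end."""
    rng = np.random.default_rng(0)
    return Observation(depth=rng.uniform(0.5, 1.0, (h, w)).astype(np.float32),
                       masks=np.zeros((3, h, w), dtype=np.uint8))

def insert_hanger(primitives):
    """One learned policy per subtask, executed in a fixed order."""
    for name, policy in primitives:
        obs = fake_observe()                  # depth + masks only, no RGB textures
        action = policy(obs)                  # low-dimensional action, e.g. image-space points
        print(f"execute {name} at {action}")  # real robot execution would go here

if __name__ == "__main__":
    # A trivial policy that picks the nearest pixel, just to make the loop runnable.
    nearest = lambda obs: np.unravel_index(obs.depth.argmin(), obs.depth.shape)
    insert_hanger([("press-and-lift", nearest), ("drag-and-rotate", nearest)])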
RoboHanger: We use a dual-arm robot in which each arm has 7 degrees of freedom (DoF) and is equipped with a parallel-jaw gripper. A camera is mounted on the robot's head. Our method operates purely from visual input, enabling the robot to insert a hanger into the necklines of various garments.
System Overview. (a) Before each action primitive, our system takes RGB-D observations as input and segments the hanger, the garment, and its neckline. Three keypoints on the hanger are detected once, at the start of policy execution. (b) The action primitive press-and-lift inserts the left endpoint of the hanger into the target garment. (c) The action primitive drag-and-rotate inserts the right endpoint of the hanger into the target garment. In (b) and (c), four U-Nets \((Q_{\text{press}}, Q_{\text{lift}}, Q_{\text{drag}}, Q_{\text{rotate}})\) all take the depth map and masks as input and output 2D value maps that predict the subtask success rate of applying the action at each pixel. We take the argmax over each value map to obtain a single action point.
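To make the value-map formulation concrete, here is a minimal PyTorch sketch of one such head (e.g. \(Q_{\text{press}}\)). The 4-channel input layout (depth plus three masks) and the tiny encoder-decoder are assumptions standing in for whichever U-Net variant the paper actually uses; only the input/output contract, dense value map in, argmax pixel out, follows the description above.

import torch
import torch.nn as nn

class ValueMapNet(nn.Module):
    """Tiny U-Net-style encoder-decoder: (B, 4, H, W) -> (B, 1, H, W) value map."""
    def __init__(self, in_ch: int = 4):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(x))

def select_action_pixel(net: ValueMapNet, depth, masks):
    """Stack depth + masks, predict the value map, and take the argmax pixel."""
    x = torch.cat([depth, masks], dim=1)          # (B, 4, H, W)
    with torch.no_grad():
        q = net(x)                                # (B, 1, H, W) per-pixel success values
    flat = q.flatten(1).argmax(dim=1)             # argmax over all pixels
    h, w = q.shape[-2:]
    return torch.stack((flat // w, flat % w), 1)  # (B, 2) pixel coordinates (row, col)

if __name__ == "__main__":
    net = ValueMapNet()
    depth = torch.rand(1, 1, 64, 64)
    masks = torch.rand(1, 3, 64, 64)              # hanger, garment, neckline masks
    print(select_action_pixel(net, depth, masks)) # e.g. tensor([[r, c]])

Predicting a dense value map and taking its argmax keeps the action parameterization low-dimensional (a single pixel per primitive) while still letting the network reason over the whole image.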
@misc{chen2025robohangerlearninggeneralizablerobotic,
  title={RoboHanger: Learning Generalizable Robotic Hanger Insertion for Diverse Garments},
  author={Yuxing Chen and Songlin Wei and Bowen Xiao and Jiangran Lyu and Jiayi Chen and Feng Zhu and He Wang},
  year={2025},
  eprint={2412.01083},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2412.01083},
}