FoldNet++: a Cross-embodiment T-shirt Folding and Unfolding Model Pre-trained on Synthetic Data

Yuxing Chen¹,², Zhiyuan Wei¹,², Bowen Xiao¹,², Zhizheng Zhang², He Wang¹,²

¹CFCS, Peking University, China
²Galbot, China

Abstract

Because garments are deformable, training a generalizable foundation model for T-shirt folding and unfolding that transfers across different robotic embodiments is highly challenging. In this work, we propose FoldNet++, a cross-embodiment foundation model for T-shirt folding and unfolding trained entirely on large-scale synthetic data. To synthesize training data, we first adopt the approach of FoldNet to produce a large-scale set of physically simulatable T-shirts with diverse appearances and annotated semantic keypoints. Leveraging these semantic keypoints, we generate demonstrations across different robotic embodiments using a unified rule-based framework. The demonstrations cover T-shirts in arbitrary initial configurations. We then use these demonstrations to train a vision-based model. Trained on this large-scale synthesized cross-embodiment data, our model demonstrates strong zero-shot and few-shot capabilities across different robots. It transfers to real-world environments without any real-world fine-tuning and achieves a 93% success rate over the entire pipeline.
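To illustrate how semantic keypoints can drive rule-based demonstration generation, the following is a minimal sketch and not the FoldNet++ implementation. It assumes a flattened T-shirt with 2D table-plane keypoints; the keypoint names (`left_sleeve`, `right_hem`, and so on), the function `plan_fold`, and the specific fold rule are all hypothetical. The output is a list of embodiment-agnostic pick-and-place targets that a per-robot controller could then execute.

```python
import numpy as np


def plan_fold(keypoints):
    """Turn semantic keypoints into (pick_xy, place_xy) pairs.

    `keypoints` maps hypothetical keypoint names to (x, y) positions on the
    table plane. The rule used here is an illustrative assumption: fold each
    sleeve onto the garment centerline, then fold the hem up to the shoulders.
    """
    kp = {name: np.asarray(p, dtype=float) for name, p in keypoints.items()}

    # Centerline x-coordinate, estimated from the two shoulder keypoints.
    center_x = 0.5 * (kp["left_shoulder"][0] + kp["right_shoulder"][0])

    steps = []
    # Step 1: bring each sleeve tip to the centerline at the same height.
    for side in ("left", "right"):
        pick = kp[f"{side}_sleeve"]
        place = np.array([center_x, pick[1]])
        steps.append((pick, place))

    # Step 2: bring each hem corner up to the matching shoulder.
    for side in ("left", "right"):
        steps.append((kp[f"{side}_hem"], kp[f"{side}_shoulder"]))

    return steps


# Example keypoints in meters (made-up values for illustration only).
example_keypoints = {
    "left_shoulder": (-0.2, 0.3),
    "right_shoulder": (0.2, 0.3),
    "left_sleeve": (-0.4, 0.25),
    "right_sleeve": (0.4, 0.25),
    "left_hem": (-0.2, -0.3),
    "right_hem": (0.2, -0.3),
}
fold_steps = plan_fold(example_keypoints)
```

Because the plan is expressed only in terms of garment keypoints, the same rule can in principle be reused across robots, with each embodiment supplying its own inverse kinematics and gripper logic.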

Method

Overview of FoldNet++. Our pipeline consists of three stages: demonstration generation, model pre-training, and deployment. (a) In the demonstration generation stage, our dataset includes 6 different embodiments, 1K T-shirts with varying sizes and colors, and 1K table textures along with diverse HDRI assets. Using this setup, we generate 120K complete folding and unfolding demonstration episodes in simulation. (b) In the model pre-training stage, we adopt a mainstream flow-matching-based VLA architecture (\(\pi_0\)) and perform cross-embodiment (CE) pre-training. (c) In the deployment stage, we directly evaluate the model, trained purely on synthetic data, in the real world. The model demonstrates strong zero-shot capability on an in-distribution embodiment and strong few-shot capability on an out-of-distribution (OOD) embodiment.
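For readers unfamiliar with flow matching, the sketch below shows a generic linear-path formulation for action chunks. It is an assumption-laden illustration: the time convention, the function names `flow_matching_pair` and `euler_sample`, and the chunk shape (8 steps of 14-dimensional actions) are hypothetical and may differ from what \(\pi_0\) or FoldNet++ actually use. During training, a network would regress the target velocity from the interpolated sample; at inference, actions are obtained by integrating the predicted velocity from noise.

```python
import numpy as np


def flow_matching_pair(action, noise, t):
    """Build one training pair on the straight path from noise to action.

    Returns the interpolated sample x_t and the constant target velocity
    (action - noise) that a velocity network would be trained to predict.
    """
    x_t = (1.0 - t) * noise + t * action
    target_velocity = action - noise
    return x_t, target_velocity


def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


rng = np.random.default_rng(0)
action = rng.normal(size=(8, 14))  # hypothetical action chunk
noise = rng.normal(size=(8, 14))   # Gaussian noise of the same shape

x_half, v_target = flow_matching_pair(action, noise, t=0.5)

# With the oracle velocity standing in for a trained network, Euler
# integration from the noise sample recovers the action chunk.
recovered = euler_sample(lambda x, t: action - noise, noise)
```

In a real model the velocity function would also be conditioned on camera images, and it is this conditioning that lets a single policy be trained on demonstrations from several embodiments.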

Results

Deployment on Galbot-G1

Deployment on ARX-R5
