Experiments
Simulation and Real-World Setups
We argue that a generalizable non-prehensile manipulation policy in a realistic robotic setting should not only accommodate diverse object
geometries but also adapt to varying physical properties, all while relying solely on a single-camera setup without the need for additional tracking modules.
Left: Simulation Setups. Right: Real-World Setups and 10 unseen objects for evaluation.
Simulation Rollouts
Generalizing across Diverse Geometries
Generalizing across Different Table Friction Coefficients
Handling Objects with Non-uniform Mass Distribution
Combining with a VLM and Grasping
Quantitative Results in the Real World
We evaluate our model's generalization ability by comparing it with CORN, which relies on an external tracking module for object pose estimation in real-world experiments. Our method achieves accurate manipulation across diverse objects without external pose tracking, significantly outperforming CORN (68% vs. 36% average success rate).
DyWA Method
Our World Action Model processes the embeddings of the current observation (partial point cloud, end-effector pose, and joint state) and the goal point cloud (transformed from the initial partial observation) to predict the robot action and next state. Additionally, an adaptation module encodes historical observations and actions, decoding them into the dynamics embedding that conditions the model via FiLM. A pre-trained RL teacher policy (right) supervises both the action and adaptation embedding using privileged full point cloud and physics parameter embeddings.
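To make the conditioning pipeline above concrete, here is a minimal PyTorch sketch of a FiLM-conditioned world-action model with a history-based adaptation module. All module names, dimensions, and the network layout (GRU history encoder, MLP trunk) are illustrative assumptions for exposition, not the released DyWA implementation.

```python
# Illustrative sketch only: shapes, layer sizes, and module names are assumptions.
import torch
import torch.nn as nn


class FiLM(nn.Module):
    """Feature-wise linear modulation: per-channel scale and shift
    predicted from a conditioning (dynamics) embedding."""

    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return feat * (1.0 + scale) + shift


class AdaptationModule(nn.Module):
    """Encodes a history of observations and actions into a dynamics embedding."""

    def __init__(self, obs_dim: int, act_dim: int, dyn_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, 128, batch_first=True)
        self.head = nn.Linear(128, dyn_dim)

    def forward(self, obs_hist: torch.Tensor, act_hist: torch.Tensor) -> torch.Tensor:
        # obs_hist: (B, T, obs_dim), act_hist: (B, T, act_dim)
        _, h = self.gru(torch.cat([obs_hist, act_hist], dim=-1))
        return self.head(h[-1])  # (B, dyn_dim)


class WorldActionModel(nn.Module):
    """Predicts the robot action and a next-state embedding from the current
    observation and goal embeddings, conditioned on dynamics via FiLM."""

    def __init__(self, obs_emb_dim: int, goal_emb_dim: int,
                 act_dim: int, dyn_dim: int = 64, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_emb_dim + goal_emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.film = FiLM(dyn_dim, hidden)
        self.action_head = nn.Linear(hidden, act_dim)
        self.next_state_head = nn.Linear(hidden, obs_emb_dim)

    def forward(self, obs_emb, goal_emb, dyn_emb):
        x = self.trunk(torch.cat([obs_emb, goal_emb], dim=-1))
        x = self.film(x, dyn_emb)
        return self.action_head(x), self.next_state_head(x)
```

During training, the teacher's privileged embeddings would supervise the dynamics embedding (e.g., with a regression loss) and the teacher's actions would supervise the action head; that distillation loop is omitted from the sketch.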
Acknowledgments
We thank Yixin Zheng for organizing the code release, and Junhao Yang for assistance with rendering. We are especially grateful to Jiayi Chen for his insightful suggestions and in-depth discussions that significantly shaped the direction of this work. We also appreciate the valuable feedback and discussions from Jiazhao Zhang, Mi Yan, and Shenyuan Gao.
BibTeX
@article{lyu2025dywa,
  title={DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation},
  author={Lyu, Jiangran and Li, Ziming and Shi, Xuesong and Xu, Chaoyi and Wang, Yizhou and Wang, He},
  journal={arXiv preprint arXiv:2503.16806},
  year={2025}
}