Learning Versatile Humanoid Manipulation with Touch Dreaming

2026-04-14 · Robotics

AI summary

The authors study how to make humanoid robots better at tasks that require both moving around and manipulating objects carefully. They built a learned whole-body controller that keeps the robot stable while it moves and handles objects. Using virtual-reality teleoperation, they collected real-world demonstrations by mapping human motions onto the robot. They also created a new model, the Humanoid Transformer with Touch Dreaming (HTD), that helps the robot understand touch alongside vision and its own body sensing (proprioception). Their approach greatly improved the robot's success rate on five challenging tasks that combine contact and movement.

humanoid robots · loco-manipulation · reinforcement learning · teleoperation · behavioral cloning · transformer model · tactile sensing · proprioception · multimodal learning · contact-aware manipulation
Authors
Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao
Abstract
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. Built on this controller, we develop a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder–decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks (Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving), HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world. Project webpage: humanoid-touch-dream.github.io.
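
To make the single-stage objective concrete, the sketch below combines a behavioral-cloning loss on action chunks with two auxiliary "touch dreaming" heads that predict future hand-joint forces and future tactile latents from a shared Transformer trunk, as the abstract describes. All module names, dimensions, and loss weights here (`HTDSketch`, `touch_dreaming_loss`, `w_force`, `w_tactile`, and so on) are illustrative assumptions, not the authors' released implementation.

```python
# A minimal PyTorch sketch of a BC objective augmented with predictive touch
# targets. Shapes, dimensions, and weights are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HTDSketch(nn.Module):
    """Shared encoder-decoder Transformer trunk with three prediction heads."""

    def __init__(self, d_model=512, chunk=16, act_dim=32,
                 force_dim=12, tactile_latent_dim=128):
        super().__init__()
        # The trunk consumes fused vision / proprioception / tactile tokens.
        self.trunk = nn.Transformer(d_model=d_model, batch_first=True)
        self.action_head = nn.Linear(d_model, chunk * act_dim)      # action chunks
        self.force_head = nn.Linear(d_model, force_dim)             # future hand-joint forces
        self.tactile_head = nn.Linear(d_model, tactile_latent_dim)  # future tactile latents

    def forward(self, obs_tokens, query_tokens):
        h = self.trunk(obs_tokens, query_tokens)  # (B, T, d_model)
        pooled = h.mean(dim=1)
        return (self.action_head(pooled),
                self.force_head(pooled),
                self.tactile_head(pooled))

def touch_dreaming_loss(model, obs_tokens, query_tokens,
                        target_actions, target_forces, target_tactile_latents,
                        w_force=0.1, w_tactile=0.1):
    """Behavioral cloning augmented with the two predictive touch targets."""
    pred_a, pred_f, pred_z = model(obs_tokens, query_tokens)
    loss_bc = F.mse_loss(pred_a, target_actions.flatten(1))
    loss_force = F.mse_loss(pred_f, target_forces)
    # Per the abstract's ablation, predicting tactile *latents* (e.g. from a
    # separate tactile encoder) outperforms predicting raw tactile signals.
    loss_tactile = F.mse_loss(pred_z, target_tactile_latents)
    return loss_bc + w_force * loss_force + w_tactile * loss_tactile
```

In this reading, the auxiliary heads share the trunk with the policy, so gradients from force and tactile prediction shape the same representation used for action chunking, which is the mechanism the abstract credits for learning contact-aware representations.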