HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations
2026-03-03 • Robotics
AI summary
The authors created HoMMI, a system that learns whole-body robot control for moving and manipulating objects by watching humans perform tasks without the robot present. They use an egocentric (first-person) camera view to capture the surrounding scene, but humans and robots see and move differently, which makes direct transfer hard. To bridge this gap, they design embodiment-agnostic visual representations and coordinated whole-body control that translate what a human sees and does into actions the robot can perform. Their work helps robots learn complicated long-horizon tasks involving navigation, bimanual manipulation, and active sensing of the environment.
Whole-body mobile manipulation, Egocentric sensing, Cross-embodiment policy, Hand-eye coordination, Robot learning from demonstration, Action space, Observation space, Policy transfer, Bimanual coordination, Active perception
Authors
Xiaomeng Xu, Jisang Park, Han Zhang, Eric Cousineau, Aditya Bhat, Jose Barreiros, Dian Wang, Shuran Song
Abstract
We present the Whole-Body Mobile Manipulation Interface (HoMMI), a data collection and policy learning framework for learning whole-body mobile manipulation directly from robot-free human demonstrations. We augment the UMI interface with egocentric sensing to capture the global context required for mobile manipulation, enabling portable, robot-free, and scalable data collection. However, naively incorporating egocentric sensing widens the human-to-robot embodiment gap in both the observation and action spaces, making policy transfer difficult. We explicitly bridge this gap with a cross-embodiment hand-eye policy design comprising an embodiment-agnostic visual representation, a relaxed head action representation, and a whole-body controller that realizes hand-eye trajectories through coordinated whole-body motion under robot-specific physical constraints. Together, these components enable long-horizon mobile manipulation tasks requiring bimanual and whole-body coordination, navigation, and active perception. Results are best viewed at https://hommi-robot.github.io
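
To make the controller idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a whole-body controller that tracks hand and head ("hand-eye") targets through one coordinated solve: both tasks are stacked into a single Jacobian so that base, torso, arm, and head degrees of freedom all contribute, with joint limits standing in for robot-specific physical constraints. The LinearTask class, the dimensions, and the damping and step-size values are illustrative assumptions.

# Hypothetical sketch, not the HoMMI implementation: a toy whole-body
# controller that tracks a hand target and a head (gaze) target jointly.
import numpy as np

class LinearTask:
    """Toy task: drive A @ q toward a target. A stands in for the task
    Jacobian of a real kinematic chain; error() is the task-space residual."""
    def __init__(self, A, target):
        self.A, self.target = A, target
    def jacobian(self, q):
        return self.A
    def error(self, q):
        return self.target - self.A @ q

def damped_least_squares(J, err, damping=1e-2):
    """dq = J^T (J J^T + lambda^2 I)^(-1) err: a damped pseudo-inverse,
    well-behaved near singular configurations."""
    JJt = J @ J.T
    reg = (damping ** 2) * np.eye(JJt.shape[0])
    return J.T @ np.linalg.solve(JJt + reg, err)

def whole_body_step(q, hand_task, head_task, joint_limits, dt=0.05):
    """One control step: stack the hand and head tasks into a single system
    so every joint (base, torso, arm, head) can serve both targets."""
    J = np.vstack([hand_task.jacobian(q), head_task.jacobian(q)])
    err = np.concatenate([hand_task.error(q), head_task.error(q)])
    dq = damped_least_squares(J, err)
    q_next = q + dt * dq
    # Robot-specific physical constraint: clamp to joint limits.
    return np.clip(q_next, joint_limits[:, 0], joint_limits[:, 1])

if __name__ == "__main__":
    n = 7  # e.g. 3 base DoF + 2 arm + 2 head joints (purely illustrative)
    rng = np.random.default_rng(0)
    hand = LinearTask(rng.standard_normal((3, n)), np.array([0.4, 0.1, 0.9]))
    head = LinearTask(rng.standard_normal((2, n)), np.array([0.0, 0.2]))
    limits = np.tile([[-1.5, 1.5]], (n, 1))
    q = np.zeros(n)
    for _ in range(300):  # iterate toward the combined hand-eye target
        q = whole_body_step(q, hand, head, limits)
    print("hand error:", np.linalg.norm(hand.error(q)))
    print("head error:", np.linalg.norm(head.error(q)))

A real system would replace LinearTask with forward kinematics and Jacobians from the robot model and would typically enforce velocity and collision constraints through a quadratic program rather than simple clipping.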