Unleashing Infinite Motion: Scaling Expressive Quadrupedal Motion via Generative Video Priors
2026-06-26 • Robotics
AI summary
The authors created Uni-Mo, a new method to teach four-legged robots many different moves without needing real animals. Instead of relying on animals to guide the robot's motions, they use AI to generate videos of robot-like movements from text descriptions, then turn these videos into data for the robot to learn. They also introduced a special technique to keep the robot's appearance consistent in these videos, making the data easier to use. They tested this approach on a real robot with high success and shared a large open dataset of robot motions for others to use.
quadruped robots, large language models (LLM), video diffusion models, robot locomotion, 3D motion tracking, identity consistency loss, Unitree Go2, robot motion datasets, behavioral cloning, simulation-to-real transfer
Authors
Youzhi Liu, Li Gao, Yifei Qian, Liu Liu, Yang Cai, Ziqiao Li
Abstract
Quadruped robots have achieved remarkable locomotion, yet their behavioral repertoire remains confined to a few gaits, far from the expressive, companion-like presence long envisioned for them. Attempts to import the humanoid recipe of large-scale motion data have inherited one tacit assumption: that robot motion must first pass through an animal body, making data collection dependent on cooperative animals, reconstruction fragile across species, and retargeting ill-posed across incompatible morphologies. We propose Uni-Mo, a fully automated pipeline that removes the animal from the loop by reframing data scarcity as a generation problem: an LLM proposes motion prompts, a video diffusion model synthesizes the corresponding robot behaviors, and the generated videos are lifted into 3D reference trajectories used to train tracking policies deployed on a real Unitree Go2. To make naively drifting generations reliably extractable, we introduce an Identity Consistency Loss that enforces appearance coherence across frames. We release Quad-Imaginarium at https://github.com/GaoLii/Quad-Imaginarium.git, the resulting open-source dataset of 7,488 language-annotated quadruped motions (18.5 hours) spanning acrobatic and performative behaviors. We validate 392 randomly sampled motions on a real Unitree Go2 with a 96.7% deployment success rate, complemented by a 97.6% success rate across the full dataset in simulation.
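The abstract reports rates and totals but not the underlying counts. As a rough reading aid, the short Python sketch below derives the counts those figures imply. The derived values are inferences from rounded percentages, not numbers stated by the authors, and the helper names are hypothetical.

```python
# Figures quoted in the abstract.
DATASET_MOTIONS = 7488      # language-annotated motions in Quad-Imaginarium
DATASET_HOURS = 18.5        # total duration of the dataset
REAL_SAMPLED = 392          # motions tested on the real Unitree Go2
REAL_SUCCESS_RATE = 0.967   # reported real-world deployment success rate
SIM_SUCCESS_RATE = 0.976    # reported simulation success rate (full dataset)


def implied_successes(total: int, rate: float) -> int:
    """Nearest whole number of successes consistent with a rounded rate.

    Assumption: the rate is successes / total, rounded to one decimal place.
    """
    return round(total * rate)


def mean_clip_seconds(hours: float, clips: int) -> float:
    """Average clip duration in seconds, assuming hours covers all clips."""
    return hours * 3600.0 / clips


real_ok = implied_successes(REAL_SAMPLED, REAL_SUCCESS_RATE)
sim_ok = implied_successes(DATASET_MOTIONS, SIM_SUCCESS_RATE)
avg_len = mean_clip_seconds(DATASET_HOURS, DATASET_MOTIONS)

print(f"real robot: about {real_ok} of {REAL_SAMPLED} motions succeeded")
print(f"simulation: about {sim_ok} of {DATASET_MOTIONS} motions succeeded")
print(f"mean clip length: about {avg_len:.1f} s")
```

Under these assumptions, the real-robot trial covers roughly 5% of the dataset (392 of 7,488 motions), so the simulation rate is the only figure here that speaks to the full set.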