EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
2026-02-18 • Robotics
AI summary
The authors created EgoScale, a system that teaches robots fine-grained hand movements by training on more than 20,000 hours of first-person (egocentric) video of humans using their hands. They found that the more human data they used, the better the robot learned to perform tasks, and that this improvement followed a predictable trend. Their method first pretrains on large human datasets, then adapts the model to the robot's movements with a lightweight mid-training stage on aligned human and robot data. This approach improved the robot's average success rate on dexterous tasks by 54% over a baseline without pretraining, and it also transferred to robot hands with fewer degrees of freedom than the 22-DoF hand used in the main experiments.
dexterous manipulation, egocentric video, Vision-Language-Action (VLA) model, human-to-robot transfer, robotic hand, degrees of freedom (DoF), pretraining, robot policy, long-horizon tasks, embodiment-agnostic
Authors
Ruijie Zheng, Dantong Niu, Yuqi Xie, Jing Wang, Mengda Xu, Yunfan Jiang, Fernando Castañeda, Fengyuan Hu, You Liang Tan, Letian Fu, Trevor Darrell, Furong Huang, Yuke Zhu, Danfei Xu, Linxi Fan
Abstract
Human behavior is among the most scalable sources of data for learning physical intelligence, yet how to leverage it effectively for dexterous manipulation remains unclear. While prior work demonstrates human-to-robot transfer in constrained settings, it is an open question whether large-scale human data can support fine-grained, high-degree-of-freedom dexterous manipulation. We present EgoScale, a human-to-dexterous-manipulation transfer framework built on large-scale egocentric human data. We train a Vision-Language-Action (VLA) model on over 20,854 hours of action-labeled egocentric human video, more than 20 times larger than prior efforts, and uncover a log-linear scaling law between human data scale and validation loss. This validation loss strongly correlates with downstream real-robot performance, establishing large-scale human data as a predictable source of supervision. Beyond scale, we introduce a simple two-stage transfer recipe: large-scale human pretraining followed by lightweight, aligned human-robot mid-training. This recipe enables strong long-horizon dexterous manipulation and one-shot task adaptation with minimal robot supervision. Our final policy improves average success rate by 54% over a no-pretraining baseline on a 22-DoF dexterous robotic hand and transfers effectively to robots with lower-DoF hands, indicating that large-scale human motion provides a reusable, embodiment-agnostic motor prior.
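To make the "log-linear scaling law" concrete, here is a minimal sketch that assumes the law takes the form loss ≈ a − b·ln(hours), where a and b are fitted constants. The function names (`fit_log_linear`, `predict_loss`, `pearson`) and the synthetic data points are hypothetical illustrations, not the authors' code or results. The sketch fits the two constants by ordinary least squares on ln(hours), extrapolates to a new data scale, and includes a Pearson correlation helper of the kind one might use to check how validation loss tracks robot success rate.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(hours: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of loss ~= a - b * ln(hours); returns (a, b).

    Assumed functional form for illustration only.
    """
    if len(hours) != len(losses) or len(hours) < 2:
        raise ValueError("need at least two (hours, loss) pairs of equal length")
    xs = [math.log(h) for h in hours]  # regress on the log of data scale
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    slope = sxy / sxx  # d(loss)/d(ln hours); negative when more data helps
    intercept = mean_y - slope * mean_x
    return intercept, -slope  # report b as a positive "improvement rate"


def predict_loss(a: float, b: float, hours: float) -> float:
    """Extrapolate the fitted law to a new amount of human data."""
    return a - b * math.log(hours)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between validation loss and success rate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical, synthetic points generated from loss = 2.0 - 0.1 * ln(hours);
    # these are NOT numbers reported in the paper.
    hours = [100.0, 1_000.0, 5_000.0, 20_854.0]
    losses = [2.0 - 0.1 * math.log(h) for h in hours]
    a, b = fit_log_linear(hours, losses)
    print(f"a={a:.3f}, b={b:.3f}")  # recovers a=2.000, b=0.100
    print(f"predicted loss at 40k hours: {predict_loss(a, b, 40_000.0):.3f}")
```

Fitting in log space is what makes the trend "predictable" in the abstract's sense: once a and b are estimated from smaller runs, the loss at a larger data scale can be forecast before training. A strong (negative) correlation between that loss and real-robot success is what would justify using the loss as a proxy for downstream performance.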