Human Universal Grasping
2026-06-15 • Robotics
AI summary
The authors created a large dataset of human hand grasps using smart glasses to record how people pick up objects in everyday environments. They developed a model called HUG that uses images and depth data to predict natural human grasps for different objects. This model can then be adapted to control robot hands, allowing robots to grasp objects more effectively without extra training. They tested their approach on many unseen objects and showed it works better than previous methods. They also provide a new benchmark and release their data and code for others to use.
robotic grasping, egocentric dataset, RGB-D imaging, flow-matching model, MANO hand pose, zero-shot learning, stereo camera, benchmark evaluation, robot hand retargeting, 3D object meshes
Authors
Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto
Abstract
Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hrs) and 6,707 object instances across 41 buildings. Next, to model the distribution of natural human grasps, our novel flow-matching model fuses RGB and depth observations to output a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to various robot hands, enabling zero-shot grasping in everyday scenes. To standardize evaluation, we build a new simulated benchmark, HUG-Bench, of 90 unseen objects from five geometric categories and various sizes, with metric-scale 3D meshes. We evaluate HUG in the real world on the 30-object test set of HUG-Bench across multiple stereo cameras, robot embodiments, and household environments. HUG outperforms the state-of-the-art grasping baselines by +23% and +34% on our challenging object set. Code, data, benchmark, checkpoints, and an interactive demo are released on our website: https://grasping.io/
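To make the grasp parameterization and the flow-matching sampling step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The 51-dimensional layout (3 translation values, 3 axis-angle rotation values, 45 MANO joint-rotation values), the names `Grasp`, `sample_grasp`, and `toy_velocity`, and the Euler integrator are all assumptions made for illustration. In the real model the velocity field would be a learned network conditioned on the RGB-D observation; the toy field below simply flows toward a fixed target grasp.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumed layout: wrist translation (3) + axis-angle wrist rotation (3)
# + MANO articulated pose as 15 joints x 3 axis-angle values (45).
GRASP_DIM = 3 + 3 + 45


@dataclass
class Grasp:
    wrist_translation: List[float]
    wrist_rotation: List[float]
    hand_pose: List[float]


def unpack(x: Sequence[float]) -> Grasp:
    """Split a flat grasp vector into its three assumed components."""
    assert len(x) == GRASP_DIM
    return Grasp(list(x[:3]), list(x[3:6]), list(x[6:]))


# Hypothetical interface: velocity(x, t) returns dx/dt at time t in [0, 1).
VelocityFn = Callable[[Sequence[float], float], List[float]]


def sample_grasp(velocity: VelocityFn, steps: int = 10, seed: int = 0) -> Grasp:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps,
    starting from Gaussian noise. Different seeds give different start
    points, which is how a learned flow would yield diverse grasps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(GRASP_DIM)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return unpack(x)


def toy_velocity(target: Sequence[float]) -> VelocityFn:
    """Stand-in for a learned network: the straight-line flow toward a
    single fixed target, v = (target - x) / (1 - t)."""
    def v(x: Sequence[float], t: float) -> List[float]:
        return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
    return v
```

With the toy field, the final Euler step lands on the target vector up to floating-point rounding, which makes the sketch easy to sanity-check. A learned field would instead map each noise sample to a different plausible grasp for the selected object, and the resulting hand pose would then be retargeted to a robot hand.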