WHOLE: World-Grounded Hand-Object Lifted from Egocentric Videos
2026-02-25 • Computer Vision and Pattern Recognition
AI summary
The authors address the challenge of understanding hand and object movements in videos taken from a first-person view, where things often get blocked or move out of sight. They propose WHOLE, a method that looks at both hand and object motions together instead of separately, using a learned model of how hands and objects usually move together. This approach helps the system better guess the positions and interactions even when something isn't clearly visible. Their method performs better than previous ones in estimating hand movements, object poses, and how they interact.
egocentric video, hand pose estimation, object pose estimation, 6D object pose, motion reconstruction, generative prior, hand-object interaction, world space, trajectory estimation
Authors
Yufei Ye, Jiaman Li, Ryan Rong, C. Karen Liu
Abstract
Egocentric manipulation videos are highly challenging due to severe occlusions during interactions and frequent object entries and exits from the camera view as the person moves. Current methods typically focus on recovering either hand or object pose in isolation, but both struggle during interactions and fail to handle out-of-sight cases. Moreover, their independent predictions often lead to inconsistent hand-object relations. We introduce WHOLE, a method that holistically reconstructs hand and object motion in world space from egocentric videos given object templates. Our key insight is to learn a generative prior over hand-object motion to jointly reason about their interactions. At test time, the pretrained prior is guided to generate trajectories that conform to the video observations. This joint generative reconstruction substantially outperforms approaches that process hands and objects separately followed by post-processing. WHOLE achieves state-of-the-art performance on hand motion estimation, 6D object pose estimation, and their relative interaction reconstruction. Project website: https://judyye.github.io/whole-www
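The idea of guiding a prior toward video observations at test time can be illustrated with a toy example. The sketch below is not the WHOLE model: the abstract does not specify the prior or the guidance terms, so a hand-written smoothness energy stands in for the learned hand-object prior, a 1D position stands in for a full hand and object trajectory, and an annealed Langevin-style sampler stands in for the actual generative sampler. All function names and parameters (`prior_grad`, `observation_grad`, `guided_sample`, `sigma`, `step_size`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def prior_grad(x, weight=1.0):
    """Gradient of a toy smoothness energy 0.5 * weight * sum((x[t+1] - x[t])**2).

    Assumption: this stands in for the score of a learned motion prior.
    """
    g = np.zeros_like(x)
    d = np.diff(x)                 # d[t] = x[t+1] - x[t]
    g[:-1] -= weight * d
    g[1:] += weight * d
    return g

def observation_grad(x, obs, visible, sigma=0.5):
    """Gradient of 0.5 * sum(visible * (x - obs)**2) / sigma**2.

    Frames with visible == 0 (occluded or out of view) contribute nothing.
    """
    return visible * (x - obs) / sigma**2

def guided_sample(obs, visible, steps=2000, step_size=0.1, seed=0):
    """Annealed Langevin-style sampling: prior gradient plus observation guidance."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=obs.shape)          # start from pure noise
    for i in range(steps):
        # Noise is annealed to zero over the first 80% of the steps,
        # so the final 20% are plain gradient steps.
        noise_scale = max(0.0, 1.0 - i / (0.8 * steps))
        grad = prior_grad(x) + observation_grad(x, obs, visible)
        noise = rng.normal(size=x.shape)
        x = x - step_size * grad + noise_scale * np.sqrt(2.0 * step_size) * noise
    return x

# Toy trajectory: an object moving at constant speed, hidden in frames 8..12.
truth = 0.1 * np.arange(20)
visible = np.ones(20)
visible[8:13] = 0.0
obs = truth * visible                       # hidden frames carry no measurement
est = guided_sample(obs, visible)
```

In this toy setting the observation term anchors the visible frames, while the prior alone fills in the hidden frames, which mirrors the abstract's motivation for reasoning with a prior when the object is occluded or out of sight.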