SHOW3D: Capturing Scenes of 3D Hands and Objects in the Wild
2026-03-30 • Computer Vision and Pattern Recognition • Robotics
AI summary
The authors created a new system to capture detailed 3D images of hands and objects interacting in real-life situations, not just in controlled studios. Their setup uses multiple cameras on a backpack and a VR headset to track movements accurately without markers. They made a big dataset called SHOW3D that shows hands and objects in many natural environments, including outdoors. This helps improve how well computer models understand hand-object interactions in the real world.
3D hand tracking, egocentric vision, multi-camera system, VR headset, hand-object interaction, dataset, marker-less tracking, ground-truth annotation, in-the-wild data, computer vision
Authors
Patrick Rim, Kevin Harris, Braden Copple, Shangchen Han, Xu Xie, Ivan Shugurov, Sizhe An, He Wen, Alex Wong, Tomas Hodan, Kun He
Abstract
Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of models trained on such data to generalize to real-world scenarios. To address this challenge, we introduce a novel marker-less multi-camera system that enables nearly unconstrained mobility in genuinely in-the-wild conditions while still generating precise 3D annotations of hands and objects. The capture system consists of a lightweight, back-mounted, multi-camera rig that is synchronized and calibrated with a user-worn VR headset. For 3D ground-truth annotation of hands and objects, we develop an ego-exo tracking pipeline and rigorously evaluate its quality. Finally, we present SHOW3D, the first large-scale dataset with 3D annotations of hands interacting with objects in diverse real-world environments, including outdoor settings. Our approach significantly reduces the fundamental trade-off between environmental realism and accuracy of 3D annotations, which we validate with experiments on several downstream tasks.
show3d-dataset.github.io