AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation
2026-03-26 • Computer Vision and Pattern Recognition
AI summary
The authors created AnyHand, a very large set of computer-generated hand images with detailed 3D information to help computers recognize hand positions better from normal photos or photos with depth data. They showed that adding AnyHand data to training improves accuracy on several tests without needing to change existing methods. Their model also works well on new, unseen data without extra training. Additionally, they developed a simple way to use depth information that boosts performance further when combined with their dataset.
3D hand pose estimation, RGB images, RGB-D images, synthetic dataset, hand-object interaction, depth fusion, generalization, benchmark datasets, computer vision
Authors
Chen Si, Yulin Liu, Bo Ai, Jianwen Xie, Rolandos Alexandros Potamias, Chuanxia Zheng, Hao Su
Abstract
We present AnyHand, a large-scale synthetic dataset designed to advance the state of the art in 3D hand pose estimation from both RGB-only and RGB-D inputs. While recent foundation-style approaches have shown that increasing the quantity and diversity of training data can markedly improve the performance and robustness of hand pose estimation, existing real-world datasets for this task are limited in coverage, and prior synthetic datasets rarely provide occlusions, arm details, and aligned depth together at scale. To address this bottleneck, AnyHand contains 2.5M single-hand and 4.1M hand-object interaction RGB-D images with rich geometric annotations. In the RGB-only setting, we show that extending the original training sets of existing baselines with AnyHand yields significant gains on multiple benchmarks (FreiHAND and HO-3D), even when the architecture and training scheme are kept fixed. Moreover, the model trained with AnyHand generalizes better to the out-of-domain HO-Cap dataset, without any fine-tuning. We also contribute a lightweight depth-fusion module that can be easily integrated into existing RGB-based models. Trained with AnyHand, the resulting RGB-D model achieves superior performance on the HO-3D benchmark, demonstrating the benefits of depth integration and the effectiveness of our synthetic data.
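The abstract does not specify how the depth-fusion module works internally. Purely as an illustration of the general idea of fusing a depth map into an existing RGB feature stream, here is a minimal sketch assuming a residual late-fusion scheme: the single-channel depth map is normalized, projected to the backbone's channel dimension by a learned per-channel weight (a 1x1-conv-style map), and added to the RGB features. All names, shapes, and the fusion rule itself are hypothetical, not the authors' design.

```python
import numpy as np

def fuse_rgbd(rgb_feat, depth, w_depth, alpha=0.5):
    """Hypothetical residual late fusion of RGB features and a depth map.

    rgb_feat: (C, H, W) features from a frozen/existing RGB backbone.
    depth:    (H, W) aligned depth map.
    w_depth:  (C,) learned per-channel projection (a 1x1-conv analogue).
    alpha:    scalar gate on the depth contribution.
    """
    # Normalize depth so the projection is scale-invariant across sensors.
    d = (depth - depth.mean()) / (depth.std() + 1e-6)
    # Lift the single depth channel to C channels via broadcasting.
    depth_feat = w_depth[:, None, None] * d[None, :, :]      # (C, H, W)
    # Residual fusion: the RGB pathway is untouched when alpha == 0.
    return rgb_feat + alpha * depth_feat

# Toy usage with random features standing in for backbone outputs.
C, H, W = 8, 4, 4
rgb_feat = np.random.randn(C, H, W)
depth = np.random.rand(H, W)
fused = fuse_rgbd(rgb_feat, depth, w_depth=np.ones(C))
```

A residual formulation like this is one common way to make such a module "lightweight" and drop-in: with the depth branch zeroed out, the model reduces exactly to the original RGB-only network, so pretrained weights remain usable.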