Syn4D: A Multiview Synthetic 4D Dataset

2026-05-06

Computer Vision and Pattern Recognition
AI summary

The authors introduce Syn4D, a new synthetic dataset designed to help computers better understand moving 3D scenes from regular videos. It provides detailed, perfectly labeled information such as camera movement, depth, and human poses. A key feature lets users unproject any pixel into 3D space at any time and from any camera view. The authors tested Syn4D on tasks such as 3D reconstruction and tracking, showing it can help improve how computers perceive dynamic scenes. The dataset aims to support future research in understanding and modeling scenes that change over time.

Keywords
3D reconstruction, monocular video, depth maps, camera motion, dense tracking, human pose estimation, synthetic dataset, dynamic scenes, spatiotemporal modeling
Authors
Zeren Jiang, Yushi Lan, Yihang Luo, Yufan Deng, Zihang Lai, Edgar Sucar, Christian Rupprecht, Iro Laina, Diane Larlus, Chuanxia Zheng, Andrea Vedaldi
Abstract
Dense 3D reconstruction and tracking of dynamic scenes from monocular video remain an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4D is the ability to unproject any pixel into 3D at any time and into any camera. We conduct extensive evaluations across multiple downstream tasks to demonstrate the utility and effectiveness of the proposed dataset, including 4D scene reconstruction, 3D point tracking, geometry-aware camera retargeting, and human pose estimation. The experimental results highlight Syn4D's potential to facilitate research in dynamic scene understanding and spatiotemporal modeling.
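To illustrate the kind of operation the abstract describes, here is a minimal sketch (not the Syn4D API) of unprojecting a single pixel into world coordinates using a pinhole camera model, ground-truth depth, and a camera pose, i.e. the annotations a dataset like Syn4D provides. The function name and all numeric values are hypothetical.

```python
import numpy as np

def unproject_pixel(u, v, depth, K, R, t):
    """Lift pixel (u, v) with metric depth into world coordinates.

    K is the 3x3 camera intrinsic matrix; (R, t) is the
    world-from-camera rotation and translation. All values here
    are illustrative, not taken from Syn4D itself.
    """
    # Back-project the pixel to a camera-space ray and scale by depth.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    point_cam = ray_cam * depth
    # Transform from camera coordinates to world coordinates.
    return R @ point_cam + t

# Example with an identity pose and a simple intrinsic matrix
# (focal length 500 px, principal point at (320, 240)).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

# The principal-point pixel unprojects straight along the optical axis.
p = unproject_pixel(320.0, 240.0, 2.0, K, R, t)
print(p)  # → [0. 0. 2.]
```

With per-pixel depth and per-frame camera poses for every view, the same computation maps any pixel into a shared 3D space across time and cameras, which is what enables dense cross-view and cross-time correspondence.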