OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
2026-04-08 • Computation and Language
Spatial reasoning, 3D bounding boxes, Data generation, Spatial measurement, Camera perception, Multi-view consistency, Scene-aware reasoning, Spatial intelligence, Dataset, Open-source
Authors
Jianhui Liu, Haoze Sun, Wenbo Li, Yanbing Zhang, Rui Yang, Zhiliang Zhu, Yijun Yang, Shenghe Zheng, Nan Jiang, Jiaxiu Jiang, Haoyang Huang, Tien-Tsin Wong, Nan Duan, Xiaojuan Qi
Abstract
Spatial understanding is a fundamental cornerstone of human-level intelligence. However, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To bridge this gap, we elucidate the design principles of a robust data generation system and introduce OpenSpatial -- an open-source data engine engineered for high quality, extensive scalability, broad task diversity, and optimized efficiency. OpenSpatial adopts 3D bounding boxes as the fundamental primitive to construct a comprehensive data hierarchy across five foundational tasks: Spatial Measurement (SM), Spatial Relationship (SR), Camera Perception (CP), Multi-view Consistency (MC), and Scene-Aware Reasoning (SAR). Leveraging this scalable infrastructure, we curate OpenSpatial-3M, a large-scale dataset comprising 3 million high-fidelity samples. Extensive evaluations demonstrate that versatile models trained on our dataset achieve state-of-the-art performance across a wide spectrum of spatial reasoning benchmarks. Notably, the best-performing model achieves a substantial average relative improvement of 19 percent. Furthermore, we provide a systematic analysis of how data attributes influence spatial perception. By open-sourcing both the engine and the 3M-scale dataset, we provide a robust foundation to accelerate future research in spatial intelligence.
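To make the abstract's core idea concrete, the sketch below illustrates how 3D bounding boxes can serve as a primitive for generating spatial QA data, e.g. a metric distance question (Spatial Measurement) and a left/right question (Spatial Relationship). This is a minimal hypothetical sketch, not the actual OpenSpatial engine: the box representation (center plus size in camera coordinates, with +x pointing right) and all function names are illustrative assumptions.

```python
import math

# Hypothetical box representation: center (x, y, z) and size (w, h, d)
# in camera coordinates, meters; +x points right, +z points away.

def center_distance(box_a, box_b):
    """Euclidean distance between box centers (a Spatial Measurement fact)."""
    return math.dist(box_a["center"], box_b["center"])

def horizontal_relation(box_a, box_b):
    """Left/right relation along the camera x-axis (a Spatial Relationship fact)."""
    dx = box_a["center"][0] - box_b["center"][0]
    return "right of" if dx > 0 else "left of"

def make_qa(name_a, box_a, name_b, box_b):
    """Turn two annotated boxes into simple question-answer pairs."""
    return [
        (f"How far is the {name_a} from the {name_b}?",
         f"{center_distance(box_a, box_b):.2f} m"),
        (f"Where is the {name_a} relative to the {name_b}?",
         f"The {name_a} is {horizontal_relation(box_a, box_b)} the {name_b}."),
    ]

chair = {"center": (1.0, 0.0, 3.0), "size": (0.5, 1.0, 0.5)}
table = {"center": (-0.5, 0.0, 3.2), "size": (1.2, 0.8, 0.8)}
qa_pairs = make_qa("chair", chair, "table", table)
```

In a real engine the same box annotations could also feed the other three tasks, e.g. camera-relative poses (CP) or cross-view box correspondence (MC), which is what makes the single primitive scalable.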