OccAny: Generalized Unconstrained Urban 3D Occupancy

2026-03-24 • Computer Vision and Pattern Recognition

AI summary

The authors address the problem of predicting 3D occupancy (determining which parts of an urban scene are filled or empty) without relying on in-domain annotations or precise sensor-rig calibration. They introduce OccAny, a model that accepts sequential, monocular, or surround-view images and can predict and complete metric 3D occupancy in urban scenes, even when those scenes are out-of-domain and uncalibrated. Their method includes Segmentation Forcing, which improves occupancy quality and enables mask-level prediction, and a Novel View Rendering pipeline that infers novel-view geometry to help fill in missing geometry at test time. Experiments on two urban occupancy datasets show that OccAny outperforms visual geometry baselines and remains competitive with in-domain self-supervised methods.

3D occupancy prediction, urban scenes, visual geometry models, metric prediction, segmentation forcing, novel view rendering, monocular images, surround-view images, geometry completion, out-of-domain generalization

Authors

Anh-Quan Cao, Tuan-Hung Vu

Abstract

Relying on in-domain annotations and precise sensor-rig priors, existing 3D occupancy prediction methods are limited in both scalability and out-of-domain generalization. While recent visual geometry foundation models exhibit strong generalization capabilities, they were mainly designed for general purposes and lack one or more key ingredients required for urban occupancy prediction, namely metric prediction, geometry completion in cluttered scenes, and adaptation to urban scenarios. We address this gap and present OccAny, the first unconstrained urban 3D occupancy model capable of operating on out-of-domain uncalibrated scenes to predict and complete metric occupancy coupled with segmentation features. OccAny is versatile and can predict occupancy from sequential, monocular, or surround-view images. Our contributions are three-fold: (i) we propose the first generalized 3D occupancy framework with (ii) Segmentation Forcing that improves occupancy quality while enabling mask-level prediction, and (iii) a Novel View Rendering pipeline that infers novel-view geometry to enable test-time view augmentation for geometry completion. Extensive experiments demonstrate that OccAny outperforms all visual geometry baselines on the 3D occupancy prediction task, while remaining competitive with in-domain self-supervised methods across three input settings on two established urban occupancy prediction datasets. Our code is available at https://github.com/valeoai/OccAny.
