PatchScene: Patch-based Voxel Diffusion for Large-Scale Scene Completion

2026-06-02 · Computer Vision and Pattern Recognition

AI summary

The authors introduce PatchScene, a diffusion-based method for filling in missing parts of 3D LiDAR scans. It does not operate on a single global latent representation or a dense voxel grid of the whole scene. Instead, it generates geometry patch by patch within small, localized 3D regions. A confidence-guided fusion step merges information from overlapping patches and from adjacent frames, which makes the completed scenes more coherent in both space and time. The method also exploits the fact that LiDAR points are dense near the sensor and sparse far away, propagating detail progressively from near-range to far-range regions. Experiments on SemanticKITTI show higher accuracy and better temporal consistency than prior methods. A model trained on 20 m ranges also generalizes to 50 m scenes without retraining.
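The near-to-far refinement described above (the paper's Annular-Flow strategy) is not specified in detail here. The following is a minimal sketch, assuming patches are grouped into concentric rings of fixed width around the sensor and completed ring by ring. Under this assumption, each ring is conditioned only on the rings closer to the sensor. The names `annular_schedule`, `run_annular_flow`, `ring_width`, and the `complete_patch` callback (a stand-in for one diffusion sampling call) are hypothetical and are not the authors' code.

```python
import math
from collections import defaultdict


def annular_schedule(centers, ring_width):
    """Group patch centers into concentric rings around the sensor.

    centers: list of (x, y) patch centers in meters, sensor at the origin.
    ring_width: assumed ring thickness in meters (hypothetical parameter).
    Returns a list of rings, each a list of patch indices, ordered near to far.
    """
    if ring_width <= 0:
        raise ValueError("ring_width must be positive")
    rings = defaultdict(list)
    for idx, (x, y) in enumerate(centers):
        radius = math.hypot(x, y)  # radial distance from the sensor
        rings[int(radius // ring_width)].append(idx)
    return [rings[k] for k in sorted(rings)]


def run_annular_flow(centers, ring_width, complete_patch):
    """Complete patches ring by ring, from dense near range to sparse far range.

    complete_patch(idx, context) is a hypothetical stand-in for one diffusion
    sampling call; context maps already-completed patch indices to results.
    """
    completed = {}
    for ring in annular_schedule(centers, ring_width):
        # Every patch in this ring sees only rings closer to the sensor,
        # so the patches within one ring could be sampled as a batch.
        context = dict(completed)
        for idx in ring:
            completed[idx] = complete_patch(idx, context)
    return completed


# Usage: five patch centers; the dummy completer just reports context size.
centers = [(0, 0), (3, 4), (6, 8), (1, 1), (-9, 12)]
rings = annular_schedule(centers, ring_width=4.0)
results = run_annular_flow(centers, 4.0, lambda idx, ctx: len(ctx))
print(rings)    # [[0, 3], [1], [2], [4]]
print(results)  # {0: 0, 3: 0, 1: 2, 2: 3, 4: 4}
```

Taking a snapshot of the completed patches before each ring keeps the schedule order-independent within a ring. Outer patches always have more completed context than inner ones.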

LiDAR, 3D Scene Completion, Diffusion Model, Voxel, Patch-Based Processing, Spatio-Temporal Fusion, SemanticKITTI, Autonomous Driving, Radial Density, Generalization
Authors
Qingdong Xu, Jiajun Zhu, Shilin Zhu, Xinjing He, Chao Lu, Huanran Wang, Jiyao Zhang
Abstract
We propose PatchScene, a novel diffusion-based framework for large-scale LiDAR scene completion. Unlike existing methods that rely on global latent representations or dense voxel grids, PatchScene adopts a patch-based voxel diffusion paradigm that explicitly generates fine-grained geometry within localized 3D regions. To ensure coherent reconstruction at both spatial and temporal scales, we introduce a confidence-guided spatio-temporal fusion mechanism that integrates overlapping patches and adjacent frames in a unified generative process. Furthermore, we design an Annular-Flow diffusion strategy that leverages the radial density pattern of LiDAR scans to progressively propagate high-fidelity information from near-range to far-range regions, enabling spatially unbounded scene completion. Extensive experiments on the SemanticKITTI benchmark demonstrate that PatchScene achieves state-of-the-art performance across all standard metrics, surpassing previous approaches in both geometric accuracy and temporal consistency. Remarkably, the model trained on 20 m LiDAR ranges generalizes effectively to 50 m scenes without retraining, highlighting its strong scalability and generalization capability for real-world autonomous driving applications.
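The abstract names a confidence-guided spatio-temporal fusion mechanism but does not give its formula. As an illustration only, the sketch below assumes a simple confidence-weighted average in which each overlapping patch votes on a voxel's occupancy in proportion to its confidence. Optionally, the previous frame's result also votes. That result is assumed to be already aligned to the current frame and is down-weighted by a hypothetical `temporal_weight`. PatchScene integrates fusion into the generative process itself, so this post-hoc averaging is a simplification. The names `fuse_patches`, `temporal_weight`, and `threshold` are hypothetical.

```python
from collections import defaultdict


def fuse_patches(patches, prev_frame=None, temporal_weight=0.5, threshold=0.5):
    """Confidence-weighted fusion of overlapping voxel patches (a sketch).

    patches: list of dicts mapping voxel (i, j, k) -> (occupancy_prob, confidence).
    prev_frame: optional dict in the same format, assumed to be already warped
        into the current frame (e.g. with the known ego-motion).
    temporal_weight: hypothetical down-weighting of the previous frame.
    Returns (fused, occupied): fused maps voxel -> fused occupancy probability;
    occupied is the set of voxels whose fused probability reaches threshold.
    """
    weighted_sources = [(patch, 1.0) for patch in patches]
    if prev_frame is not None:
        weighted_sources.append((prev_frame, temporal_weight))

    num = defaultdict(float)  # sum of weight * probability per voxel
    den = defaultdict(float)  # sum of weight per voxel
    for source, scale in weighted_sources:
        for voxel, (prob, conf) in source.items():
            weight = conf * scale
            num[voxel] += weight * prob
            den[voxel] += weight

    fused = {v: num[v] / den[v] for v in den if den[v] > 0}
    occupied = {v for v, p in fused.items() if p >= threshold}
    return fused, occupied


# Usage: two patches overlap at voxel (1, 0, 0).
patch_a = {(0, 0, 0): (0.9, 0.8), (1, 0, 0): (0.2, 0.25)}
patch_b = {(1, 0, 0): (0.8, 0.75), (2, 0, 0): (0.3, 0.9)}
fused, occupied = fuse_patches([patch_a, patch_b])
# The more confident patch_b dominates: (0.25*0.2 + 0.75*0.8) / 1.0 = 0.65
print(sorted(occupied))    # [(0, 0, 0), (1, 0, 0)]

prev = {(2, 0, 0): (0.9, 1.0)}  # a confident earlier observation of (2, 0, 0)
fused_t, occupied_t = fuse_patches([patch_a, patch_b], prev_frame=prev)
print(sorted(occupied_t))  # [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
```

A weighted average keeps each voxel's estimate within the range of its inputs. In the example, a confident observation from the previous frame pulls the uncertain voxel (2, 0, 0) over the threshold.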