Self-Improving 4D Perception via Self-Distillation

2026-04-09 · Computer Vision and Pattern Recognition

AI summary

The authors present SelfEvo, a framework that improves pretrained 3D and 4D reconstruction models without labeled training data, which is expensive and especially scarce for dynamic scenes. The method is a form of self-distillation: it exploits an asymmetry in the spatiotemporal context available from unlabeled videos so that the model can supervise itself. The authors study which design choices make this self-improvement effective, including loss signals, forms of asymmetry, and training strategies. On eight benchmarks spanning diverse datasets and domains, SelfEvo consistently improves on the pretrained baselines, with relative gains of up to 36.5% in video depth estimation and 20.1% in camera estimation.

multi-view reconstruction, self-distillation, spatiotemporal context, 4D perception, dynamic scenes, video depth estimation, camera estimation, self-supervised learning, unlabeled videos
Authors
Nan Huang, Pengcheng Yu, Weijia Zeng, James M. Rehg, Angjoo Kanazawa, Haiwen Feng, Qianqian Wang
Abstract
Large-scale multi-view reconstruction models have made remarkable progress, but most existing approaches still rely on fully supervised training with ground-truth 3D/4D annotations. Such annotations are expensive and particularly scarce for dynamic scenes, limiting scalability. We propose SelfEvo, a self-improving framework that continually improves pretrained multi-view reconstruction models using unlabeled videos. SelfEvo introduces a self-distillation scheme using spatiotemporal context asymmetry, enabling self-improvement for learning-based 4D perception without external annotations. We systematically study design choices that make self-improvement effective, including loss signals, forms of asymmetry, and other training strategies. Across eight benchmarks spanning diverse datasets and domains, SelfEvo consistently improves pretrained baselines and generalizes across base models (e.g., VGGT and $\pi^3$), with significant gains on dynamic scenes. Overall, SelfEvo achieves up to 36.5% relative improvement in video depth estimation and 20.1% in camera estimation, without using any labeled data. Project Page: https://self-evo.github.io/.
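
The abstract does not spell out the training procedure, so the following is only a toy sketch of what self-distillation under context asymmetry could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the scalar `toy_model`, the choice to give a frozen teacher the full clip and the student a random subset of frames, the squared-error loss, and the finite-difference update.

```python
import random

def toy_model(frames, weight):
    """Hypothetical stand-in for a multi-view model (not SelfEvo's network).
    Each per-frame prediction mixes the frame's own value with the mean of
    all frames in the input, so the output depends on the available context."""
    context = sum(frames) / len(frames)
    return [weight * f + (1.0 - weight) * context for f in frames]

def self_distill_step(video, weight, keep, rng, lr=0.1, eps=1e-4):
    """One illustrative update: a frozen teacher sees every frame, the
    student sees only `keep` frames, and the student is pulled toward the
    teacher's predictions on the frames they share."""
    # Teacher pass with full context; its outputs are treated as fixed
    # pseudo-labels (the analogue of a stop-gradient).
    teacher_pred = toy_model(video, weight)

    # Context asymmetry (assumed form): the student gets a frame subset.
    idx = sorted(rng.sample(range(len(video)), keep))
    targets = [teacher_pred[i] for i in idx]
    student_frames = [video[i] for i in idx]

    def loss(w):
        student_pred = toy_model(student_frames, w)
        return sum((s - t) ** 2 for s, t in zip(student_pred, targets)) / keep

    # Central finite difference stands in for backpropagation.
    grad = (loss(weight + eps) - loss(weight - eps)) / (2.0 * eps)
    new_weight = weight - lr * grad
    return new_weight, loss(weight), loss(new_weight)

if __name__ == "__main__":
    clip = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2]  # made-up per-frame values
    w, before, after = self_distill_step(clip, 0.5, keep=3, rng=random.Random(0))
    print(f"weight={w:.4f} loss_before={before:.6f} loss_after={after:.6f}")
```

The point of the sketch is that no labels appear anywhere: the only training signal is the gap between what the model predicts with rich context and what it predicts with restricted context. Whether the teacher stays frozen or is refreshed over time, and which loss signals and forms of asymmetry work best, are the design choices the abstract says the paper studies.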