SEGAR: Selective Enhancement for Generative Augmented Reality
2026-03-25 • Computer Vision and Pattern Recognition • Artificial Intelligence
AI summary
The authors propose SEGAR, a method that uses a generative AI model to predict future scenes with specific visual edits for augmented reality (AR). Their system first generates future images that include the intended changes while keeping other parts of the scene the same, then corrects safety-critical areas to match the real world. They tested the approach in driving scenarios, where it is easy to know which parts of the scene are important. This work is an early step toward AR systems that can prepare and update images ahead of time to run more smoothly.
generative world models • augmented reality • diffusion models • image sequence prediction • selective correction • semantic regions • driving scenarios • temporal coherence • real-world alignment • visual augmentation
Authors
Fanjun Bu, Chenyang Yuan, Hiroshi Yasuda
Abstract
Generative world models offer a compelling foundation for augmented-reality (AR) applications: by predicting future image sequences that incorporate deliberate visual edits, they enable temporally coherent, augmented future frames that can be computed ahead of time and cached, avoiding per-frame rendering from scratch in real time. In this work, we present SEGAR, a preliminary framework that combines a diffusion-based world model with a selective correction stage to support this vision. The world model generates augmented future frames with region-specific edits while leaving other regions unchanged, and the correction stage subsequently aligns safety-critical regions with real-world observations while preserving the intended augmentations elsewhere. We demonstrate this pipeline in driving scenarios as a representative setting where semantic region structure is well defined and real-world feedback is readily available. We view this as an early step toward generative world models as practical AR infrastructure, where future frames can be generated, cached, and selectively corrected on demand.
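The paper does not include implementation details here, but the selective correction stage the abstract describes can be pictured as a mask-based composite: within safety-critical semantic regions, pixels from the real-world observation overwrite the generated frame, while augmented pixels elsewhere are kept. The function and array names below are illustrative assumptions, not the authors' actual API; this is a minimal sketch of the idea, assuming frames as HxWxC arrays and a binary per-pixel safety mask.

```python
import numpy as np

def selective_correction(generated, observed, safety_mask):
    """Blend a generated (augmented) frame with a real observation.

    generated   : HxWxC float array, the world model's augmented prediction.
    observed    : HxWxC float array, the real-world frame.
    safety_mask : HxW binary array, 1 where the region is safety-critical
                  and must match reality, 0 where augmentations are kept.
    (Hypothetical sketch; names and shapes are assumptions.)
    """
    mask = safety_mask[..., None].astype(generated.dtype)  # broadcast over channels
    return mask * observed + (1.0 - mask) * generated

# Toy example: a 2x2 frame where only the top-left pixel is safety-critical.
generated = np.zeros((2, 2, 3))          # fully augmented prediction
observed = np.ones((2, 2, 3))            # real-world observation
safety_mask = np.array([[1, 0],
                        [0, 0]])
corrected = selective_correction(generated, observed, safety_mask)
# corrected[0, 0] now matches the observation; the other pixels keep the edit.
```

In practice the mask would come from semantic segmentation of the driving scene (e.g. road users and signage as safety-critical classes), and a soft or feathered mask could replace the hard binary blend to avoid seams.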