WorldCache: Content-Aware Caching for Accelerated Video World Models

2026-03-23 · Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition · Artificial Intelligence · Computation and Language · Machine Learning
AI summary

The authors study how to speed up Diffusion Transformers, a class of video world models that is slow at generating video frames. They improve feature caching (reusing parts of the model's intermediate computations across denoising steps) by deciding more carefully when and how to reuse those computations, based on motion and on the most important regions of the video. Their method, called WorldCache, avoids common failure modes such as blur and ghosting without retraining the model, making video generation more than twice as fast while preserving almost all of the original quality.

Diffusion Transformers, video world models, feature caching, sequential denoising, spatio-temporal attention, motion-adaptive thresholds, saliency-weighted drift estimation, inference speedup, training-free caching
Authors
Umair Nawaz, Ahmed Heakl, Ufaq Khan, Abdelrahman Shaker, Salman Khan, Fahad Shahbaz Khan
Abstract
Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption, i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose WorldCache, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves a 2.3× inference speedup while preserving 99.4% of baseline quality, substantially outperforming prior training-free caching approaches. Our code can be accessed at https://umair1221.github.io/World-Cache/.
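To make the caching idea concrete, here is a minimal sketch of drift-thresholded feature reuse of the kind the abstract describes. It is not the authors' implementation: the function names (`phase_threshold`, `saliency_weighted_drift`, `maybe_reuse`), the linear threshold schedule, the fixed 50/50 blend, and the motion scaling are all illustrative assumptions; the paper's actual thresholds, drift estimator, and blending/warping rules are not specified here.

```python
import numpy as np

def phase_threshold(step: int, total_steps: int, base_tau: float = 0.05) -> float:
    # Phase-aware scheduling (illustrative): permit more reuse in early,
    # coarse denoising steps and less as details are refined.
    progress = step / total_steps
    return base_tau * (1.5 - progress)

def saliency_weighted_drift(feat: np.ndarray, cached: np.ndarray,
                            saliency: np.ndarray) -> float:
    # Per-token feature change, weighted by a saliency map so that drift
    # in perceptually important regions counts more.
    diff = np.linalg.norm(feat - cached, axis=-1)      # (tokens,)
    return float((saliency * diff).sum() / (saliency.sum() + 1e-8))

def maybe_reuse(feat: np.ndarray, cached, saliency: np.ndarray,
                step: int, total_steps: int, motion_scale: float = 1.0):
    # Motion-adaptive threshold: high-motion content gets a stricter
    # (smaller) reuse threshold.
    tau = phase_threshold(step, total_steps) / max(motion_scale, 1e-3)
    if cached is not None and saliency_weighted_drift(feat, cached, saliency) < tau:
        # Blend cached and fresh features instead of a pure Zero-Order
        # Hold, which is one way to reduce ghosting from stale snapshots.
        return 0.5 * cached + 0.5 * feat, True
    return feat, False  # drift too large (or no cache): recompute
```

A call like `maybe_reuse(feat, cached_feat, saliency, step=3, total_steps=50, motion_scale=2.0)` would return either a blended feature tensor with `True` (reuse) or the fresh features with `False` (recompute), depending on the weighted drift.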