EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing

2026-02-16

Computer Vision and Pattern Recognition
AI summary

The authors address the problem that generative video editing is slow and inefficient because existing methods process the whole video even for small, localized edits. They propose EditCtrl, a video inpainting control framework that focuses computation only on the parts of the video that need editing, making the process much faster. EditCtrl uses a local video context module that operates only on the edited (masked) regions and a lightweight global module that keeps the whole video consistent. Their approach is 10 times more compute-efficient than state-of-the-art generative editing methods and also improves editing quality. The method also supports editing multiple regions with text prompts and can propagate edited content autoregressively through the video.
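To make "computation only where it is needed" concrete, here is a back-of-the-envelope token count. It is a minimal sketch that assumes, hypothetically, that the video is represented as a T x H x W grid of latent tokens and that the local module touches only the tokens flagged by the edit mask. The helpers `make_box_mask` and `count_tokens` are illustrative names, not part of EditCtrl. This is not the paper's accounting, and it does not reproduce the reported 10x figure, which is a measured result.

```python
def make_box_mask(t, h, w, y0, y1, x0, x1):
    """Hypothetical helper: a T x H x W grid of 0/1 token flags with the same box of 1s in every frame."""
    return [[[1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(w)]
             for y in range(h)]
            for _ in range(t)]


def count_tokens(mask):
    """Return (masked, total) token counts for a nested [T][H][W] 0/1 mask."""
    masked = total = 0
    for frame in mask:
        for row in frame:
            total += len(row)   # every token in the video
            masked += sum(row)  # only tokens inside the edit region
    return masked, total


# Example: 16 frames of 32 x 32 latent tokens, with an 8 x 8 edit box per frame.
mask = make_box_mask(16, 32, 32, 12, 20, 12, 20)
masked, total = count_tokens(mask)
print(masked, total, masked / total)  # 1024 16384 0.0625
```

Doubling the box area doubles `masked` while `total` stays fixed. In that sense, a module that processes only masked tokens scales with the edit size rather than with the video size.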

video inpainting, generative video editing, pre-trained models, computational efficiency, local video context, temporal global context, masked tokens, autoregressive propagation, multi-region editing
Authors
Yehonathan Litman, Shikun Liu, Dario Seyb, Nicholas Milef, Yang Zhou, Carl Marshall, Shubham Tulsiani, Caleb Leak
Abstract
High-fidelity generative video editing has seen significant quality improvements by leveraging pre-trained video foundation models. However, their computational cost is a major bottleneck, as they are often designed to process the full video context regardless of the inpainting mask's size, which is inefficient for sparse, localized edits. In this paper, we introduce EditCtrl, an efficient video inpainting control framework that focuses computation only where it is needed. Our approach features a novel local video context module that operates solely on masked tokens, yielding a computational cost proportional to the edit size. This local-first generation is then guided by a lightweight temporal global context embedder that ensures video-wide context consistency with minimal overhead. Not only is EditCtrl 10 times more compute-efficient than state-of-the-art generative editing methods, it even improves editing quality compared to methods designed with full attention. Finally, we showcase how EditCtrl unlocks new capabilities, including multi-region editing with text prompts and autoregressive content propagation.
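The abstract separates the work into two parts. A local module attends only over masked tokens, and a lightweight global embedder supplies video-wide context. The NumPy sketch below illustrates that split under assumptions that are entirely hypothetical and are not EditCtrl's actual architecture: the (T, H, W, d) token layout, per-frame mean pooling as a stand-in for the temporal global context embedder, and single-head attention without learned projections. The function names `global_context` and `local_edit_step` are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def global_context(tokens):
    # Hypothetical stand-in for a temporal global context embedder:
    # mean-pool each frame's tokens into one context token per frame -> (T, d).
    return tokens.mean(axis=(1, 2))


def local_edit_step(tokens, mask):
    # tokens: (T, H, W, d) latent tokens; mask: (T, H, W) boolean edit mask.
    idx = np.nonzero(mask)                          # indices of masked tokens only
    local = tokens[idx]                             # (n_masked, d)
    ctx = global_context(tokens)                    # (T, d), small compared with T*H*W
    local = local + attention(local, local, local)  # self-attention among masked tokens
    local = local + attention(local, ctx, ctx)      # cross-attention to compact global context
    out = tokens.copy()
    out[idx] = local                                # scatter back; unmasked tokens unchanged
    return out


rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8, 8, 16))
mask = np.zeros((4, 8, 8), dtype=bool)
mask[:, 2:4, 2:4] = True                            # 16 masked tokens out of 256
out = local_edit_step(tokens, mask)
print(np.array_equal(out[~mask], tokens[~mask]))    # True
```

In this sketch, the self-attention cost depends only on the number of masked tokens. The global path adds one cheap pooling pass plus cross-attention to just T context tokens. Whether EditCtrl's embedder resembles this in any detail is not stated in the abstract.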