NEGATE: Constrained Semantic Guidance for Linguistic Negation in Text-to-Video Diffusion
2026-03-06 • Computer Vision and Pattern Recognition
AI summary
The authors address the challenge of handling negation (such as "not" or "no") in diffusion models that generate images or videos. Instead of changing or retraining the model, they treat negation as a constraint on the generation process: the guidance update is projected onto a constraint set derived from the linguistic structure of the prompt. Their approach covers several types of negation, requires no additional training, and works with existing pretrained models for images and videos. They also introduce a new benchmark to test how well systems handle negation. Experiments show that their method improves negation compliance while preserving visual quality and structural coherence.
negation, diffusion models, classifier-free guidance, semantic guidance, feasibility constraint, linguistic structure, image generation, video generation, benchmark, multi-negation
Authors
Taewon Kang, Ming C. Lin
Abstract
Negation is a fundamental linguistic operator, yet it remains inadequately modeled in diffusion-based generative systems. In this work, we present a formal treatment of linguistic negation in diffusion-based generative models by modeling it as a structured feasibility constraint on semantic guidance within diffusion dynamics. Rather than introducing heuristics or retraining model parameters, we reinterpret classifier-free guidance as defining a semantic update direction and enforce negation by projecting the update onto a convex constraint set derived from linguistic structure. This novel formulation provides a unified framework for handling diverse negation phenomena, including object absence, graded non-inversion semantics, multi-negation composition, and scope-sensitive disambiguation. Our approach is training-free, compatible with pretrained diffusion backbones, and naturally extends from image generation to temporally evolving video trajectories. In addition, we introduce a structured negation-centric benchmark suite that isolates distinct linguistic failure modes in generative systems, to further research in this area. Experiments demonstrate that our method achieves robust negation compliance while preserving visual fidelity and structural coherence, establishing the first unified formulation of linguistic negation in diffusion-based generative models beyond representation-level evaluation.
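The abstract describes the mechanism only at a high level: classifier-free guidance supplies a semantic update direction, and negation is enforced by projecting that update onto a convex constraint set. As a minimal sketch of that idea, and not the authors' actual formulation, the example below assumes the simplest convex set, a single half-space whose normal is the guidance direction of the negated concept. The names `cfg_direction`, `project_halfspace`, `guided_noise`, the threshold `tau`, and the choice of `eps_neg - eps_uncond` as the constraint normal are all hypothetical and introduced for illustration only.

```python
import numpy as np

def cfg_direction(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance update: scale * (eps_cond - eps_uncond)."""
    return scale * (eps_cond - eps_uncond)

def project_halfspace(g, n, tau=0.0):
    """Euclidean projection of g onto the half-space {x : <x, n> <= tau}.

    ASSUMPTION: a single half-space stands in for the paper's convex
    constraint set, whose exact form the abstract does not specify.
    """
    nn = float(np.sum(n * n))
    if nn == 0.0:
        return g.copy()          # degenerate normal: no constraint to enforce
    violation = float(np.sum(g * n)) - tau
    if violation <= 0.0:
        return g.copy()          # already feasible: leave the update untouched
    return g - (violation / nn) * n  # remove only the violating component

def guided_noise(eps_uncond, eps_pos, eps_neg, scale=7.5, tau=0.0):
    """Combine the unconditional prediction with a projected guidance update.

    eps_pos: noise prediction for the affirmative part of the prompt.
    eps_neg: noise prediction for the negated concept (hypothetical input).
    """
    g = cfg_direction(eps_pos, eps_uncond, scale)
    n = eps_neg - eps_uncond     # ASSUMED direction of the negated concept
    return eps_uncond + project_halfspace(g, n, tau)

# Toy usage with random arrays standing in for denoiser outputs.
rng = np.random.default_rng(0)
shape = (4, 8, 8)
eps_u, eps_p, eps_n = (rng.normal(size=shape) for _ in range(3))
eps_hat = guided_noise(eps_u, eps_p, eps_n)
```

A half-space is chosen here only because its projection has a closed form and leaves an already-feasible update unchanged, which matches the abstract's claim of preserving fidelity when no constraint is violated. Multi-negation or scope-sensitive cases would presumably need an intersection of several such sets, which this sketch does not attempt.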