CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance
2026-03-03 • Computer Vision and Pattern Recognition • Machine Learning
AI summary
The authors study how to better guide AI models that create images from text by improving a method called Classifier-Free Guidance (CFG). They explain CFG as a type of control system that adjusts the image generation process but note that current methods can be unstable or less accurate with strong guidance. To fix this, the authors introduce a new control method called Sliding Mode Control CFG (SMC-CFG), which quickly corrects errors to make the generated images better match the text. Their tests show that SMC-CFG produces more accurate images and works well across different models and settings.
Classifier-Free Guidance (CFG) • Generative Flow Models • Control Theory • Proportional Control (P-control) • Sliding Mode Control (SMC) • Semantic Alignment • Lyapunov Stability • Text-to-Image Generation • Stable Diffusion
Authors
Hanyang Wang, Yiyang Liu, Jiawei Chi, Fangfu Liu, Ran Xue, Yueqi Duan
Abstract
Classifier-Free Guidance (CFG) has emerged as a central approach for enhancing semantic alignment in flow-based diffusion models. In this paper, we explore a unified framework called CFG-Ctrl, which reinterprets CFG as a controller applied to the first-order continuous-time generative flow, using the conditional-unconditional discrepancy as an error signal to adjust the velocity field. From this perspective, vanilla CFG amounts to a proportional controller (P-control) with fixed gain, and typical follow-up variants to extended control-law designs derived from it. However, existing methods rely mainly on linear control, which inherently leads to instability, overshoot, and degraded semantic fidelity, especially at large guidance scales. To address this, we introduce Sliding Mode Control CFG (SMC-CFG), which forces the generative flow onto a rapidly convergent sliding manifold. Specifically, we define an exponential sliding-mode surface over the semantic prediction error and introduce a switching control term that establishes nonlinear feedback-guided correction. Moreover, we provide a Lyapunov stability analysis that theoretically supports finite-time convergence. Experiments across text-to-image generation models, including Stable Diffusion 3.5, Flux, and Qwen-Image, demonstrate that SMC-CFG outperforms standard CFG in semantic alignment and remains robust across a wide range of guidance scales. Project Page: https://hanyang-21.github.io/CFG-Ctrl
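As a rough illustration of the control-theoretic view (a minimal sketch, not the authors' exact control law), vanilla CFG can be written as a proportional controller on the conditional-unconditional error, and an SMC-style variant adds a bounded switching term that pushes the error toward a sliding surface. The function names, the choice of surface `s = e`, and the `tanh` smoothing of the switching term are all illustrative assumptions:

```python
import numpy as np

def cfg_p_control(v_cond, v_uncond, w=7.5):
    # Vanilla CFG: proportional control with fixed gain w acting on
    # the conditional-unconditional discrepancy (the error signal).
    e = v_cond - v_uncond
    return v_uncond + w * e

def smc_cfg_sketch(v_cond, v_uncond, w=7.5, k=0.5, eps=1e-3):
    # Hypothetical SMC-style correction: on top of the proportional
    # term, add a switching term that drives the state toward a
    # sliding surface (here simply s = e, for illustration).
    # tanh(s/eps) smooths sign(s) to limit chattering near s = 0.
    e = v_cond - v_uncond
    s = e  # sliding surface over the prediction error
    return v_uncond + w * e + k * np.tanh(s / eps)
```

At each sampling step the guided velocity would replace the unconditional one in the flow integrator; the nonlinear `tanh` term keeps the extra correction bounded by `k`, so it adds authority near the surface without blowing up at large guidance scales.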