The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

2026-02-20 · Machine Learning

Machine Learning · Computer Vision and Pattern Recognition
AI summary

The authors study autonomous generative models that are trained without explicit noise-level conditioning. They show that these models effectively perform a form of gradient flow on a "Marginal Energy" landscape, which aggregates the effect of all noise levels. Although this landscape has sharp singularities near the real data, the learned time-invariant field implicitly smooths out these problematic regions, making sampling stable. The authors also explain why certain parameterizations fail by amplifying noise-level estimation errors, and show that velocity-based parameterizations keep predictions stable.
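The "combined effect of all noise levels" can be made concrete in a toy 1-D sketch (our own illustrative construction, not the paper's): the marginal density integrates Gaussian corruptions over a prior on the unknown noise level, and its score is a posterior-weighted mixture of per-level scores, which is how a noise-agnostic field can implicitly estimate the noise level.

```python
import numpy as np

# Toy 1-D sketch: data concentrated at x0 = 0, corrupted as p(u|t) = N(x0, t^2),
# with the unknown noise level t drawn from a uniform prior over a grid.
x0 = 0.0
t_grid = np.linspace(0.05, 1.0, 200)           # prior support for t
p_t = np.full_like(t_grid, 1.0 / t_grid.size)  # uniform prior p(t)

def marginal_energy_and_score(u):
    """E_marg(u) = -log p(u) with p(u) = sum_t p(u|t) p(t), and the marginal score."""
    p_u_given_t = np.exp(-0.5 * ((u - x0) / t_grid) ** 2) / (t_grid * np.sqrt(2 * np.pi))
    p_u = np.sum(p_u_given_t * p_t)              # marginal density p(u)
    posterior = p_u_given_t * p_t / p_u          # p(t|u): the implicit noise-level estimate
    per_level_score = -(u - x0) / t_grid**2      # score of each Gaussian N(x0, t^2)
    score = np.sum(posterior * per_level_score)  # marginal score: posterior-weighted mixture
    return -np.log(p_u), score

E_near, s_near = marginal_energy_and_score(0.02)  # close to the data manifold
E_far, s_far = marginal_energy_and_score(2.0)     # far from the data
# Energy is lower near the data, and the score points back toward x0 from both points.
```

Far from the data, the posterior concentrates on large noise levels and the drift is gentle; near the data, it concentrates on the smallest levels in the prior's support, which is where the singularity discussed in the abstract originates.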

autonomous generative models, Marginal Energy, Riemannian gradient flow, noise-agnostic modeling, data manifold, blind diffusion, parameterization stability, velocity-based models, noise-level estimation, gradient singularity
Authors
Mojtaba Sahraee-Ardakan, Mauricio Delbracio, Peyman Milanfar
Abstract
Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a "Jensen Gap" in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
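The high-gain amplification can be seen in a small numerical sketch (our own construction, not the paper's proof): along a variance-exploding path $\mathbf{u} = \mathbf{x}_0 + t\boldsymbol{\epsilon}$, recovering the score from a noise prediction requires dividing by the estimated level, so a fixed relative misestimate of $t$ blows up near the data manifold, while a velocity target for this path never divides by $t$.

```python
import numpy as np

# Propagate a 5% error in an implicitly estimated noise level t through the
# noise-prediction (eps) parameterization along u = x0 + t * eps.
rng = np.random.default_rng(0)
d = 1000
eps = rng.standard_normal(d)

score_err = {}
for t in (0.5, 0.1, 0.02):
    t_hat = 1.05 * t              # 5% misestimate of the noise level
    s_true = -eps / t             # exact score of N(x0, t^2 I) at u
    s_hat = -eps / t_hat          # score recovered from the eps prediction
    score_err[t] = np.linalg.norm(s_hat - s_true)

# The eps-parameterization divides by t, so the same relative misestimate is
# amplified like 1/t as u approaches the data manifold. A velocity target for
# this path is du/dt = eps, which is independent of t: the identical
# misestimate leaves the predicted drift unchanged (a bounded-gain behavior).
```

Here `score_err` grows in proportion to $1/t$ (a 25x increase from $t=0.5$ to $t=0.02$), a toy analogue of the error amplification attributed above to the Jensen Gap in deterministic blind models.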