Efficient and Training-Free Single-Image Diffusion Models
2026-06-03 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Machine Learning
AI summary
The authors study how to generate new images whose small patches, taken at several scales, follow the same distribution as those of a single reference image. Instead of training a neural network, they work directly with a dataset of the image's patches at different scales and denoise noisy patches with a closed-form formula. This avoids hours of training while still producing high-quality, varied images. The method also supports tasks such as text-guided stylization, symmetrization, and retargeting, and it can generate megapixel images in one second and gigapixel images in minutes. The authors connect their approach to classical patch-based image restoration techniques and report better generation quality and diversity than trained single-image diffusion models.
diffusion model, image patches, single-image generation, denoiser, score function, image stylization, latent space diffusion, image symmetrization, patch-based image restoration, image retargeting
Authors
Haojun Qiu, Kiriakos N. Kutulakos, David B. Lindell
Abstract
We consider the problem of generating images whose internal structure -- defined by the distribution of patches across multiple scales -- matches that of a single reference image. Recent approaches address this problem by training a diffusion model on a single image. But even in this setting, training is computationally expensive and requires hours of optimization. Instead, we model the image using a dataset of its patches at different scales. As this dataset is finite and the dimensionality of its patches is small, the score function for a noisy patch can be computed tractably using an optimal, closed-form denoiser, eliminating the need for neural network training. We integrate this patch-based denoiser into an efficient, training-free image diffusion model, and we describe how our method connects to classical patch-based image restoration techniques. Our approach achieves state-of-the-art generation quality and diversity compared to trained single-image diffusion models, and we demonstrate applications, including unconditional image generation, text-guided stylization, image symmetrization, and retargeting. Further, we show that our approach is compatible with latent space diffusion, and we show multiple additional acceleration techniques to achieve megapixel single-image generation in one second, and gigapixel generation in minutes.
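To make the closed-form denoiser mentioned in the abstract concrete, here is a minimal sketch of the standard result for a finite dataset. It is not the authors' implementation. It assumes a grayscale image, a single patch scale, a uniform prior over the extracted patches, and additive Gaussian noise with standard deviation `sigma`. Under those assumptions the posterior-mean denoiser is a softmax-weighted average of the dataset patches, and the score follows as `(denoised - noisy) / sigma**2`. The function names `extract_patches`, `optimal_denoise`, and `score` are hypothetical.

```python
import numpy as np


def extract_patches(image, size):
    """Collect every size x size patch of a 2-D image as a flattened row."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (size, size))
    return windows.reshape(-1, size * size)


def optimal_denoise(noisy, patches, sigma):
    """Posterior mean of the clean patch, assuming a uniform prior over `patches`.

    noisy:   (M, D) array of noisy patches
    patches: (N, D) array of clean dataset patches
    sigma:   noise standard deviation (assumed Gaussian, isotropic)
    """
    # Squared Euclidean distance between every noisy patch and every dataset patch.
    d2 = ((noisy[:, None, :] - patches[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / (2.0 * sigma**2)
    # Subtract the row maximum before exponentiating for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each denoised patch is a convex combination of dataset patches.
    return weights @ patches


def score(noisy, patches, sigma):
    """Score of the noisy-patch density, obtained from the denoiser."""
    return (optimal_denoise(noisy, patches, sigma) - noisy) / sigma**2
```

At small `sigma` the weights concentrate on the nearest dataset patch, and at large `sigma` the output approaches the mean of all patches. The dense distance matrix costs O(M N D) memory and time, so this sketch is only practical for small images; it does not include any of the acceleration techniques the abstract refers to.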