Image Generation with a Sphere Encoder
2026-02-16 • Computer Vision and Pattern Recognition
AI summary
The authors introduce the Sphere Encoder, a new image generation method that creates images quickly by passing data through a network just a few times. It works by learning to convert images into points on a sphere and then turning random points on that sphere back into images. The model is trained only by trying to reconstruct images, and it can also improve image quality by repeating the process. Their method competes well with slower, multi-step diffusion models but requires much less computation.
generative models · image generation · latent space · encoder-decoder · spherical latent space · diffusion models · image reconstruction · conditional generation · inference cost
Authors
Kaiyu Yue, Menglin Jia, Ji Hou, Tom Goldstein
Abstract
We introduce the Sphere Encoder, an efficient generative framework that produces images in a single forward pass and competes with many-step diffusion models while using fewer than five steps. Our approach learns an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely with image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the Sphere Encoder achieves performance competitive with state-of-the-art diffusion models at a small fraction of the inference cost. The project page is available at https://sphere-encoder.github.io.