Under One Sun: Multi-Object Generative Perception of Materials and Illumination
2026-03-19 • Computer Vision and Pattern Recognition
AI summary
The authors present MultiGP, a method that recovers how objects in a photo get their appearance by separating each object's texture and reflectance from the lighting that illuminates them, all from a single image. Their key insight is that while objects may look different, they are lit by the same illumination, and this shared constraint helps resolve the otherwise ambiguous separation of these visual factors. The approach combines a cascaded network architecture with attention and control networks to keep texture details sharp and the lighting estimate consistent across objects. Experiments show that MultiGP accurately disentangles each object's appearance and the illumination they share.
generative inverse rendering, reflectance, texture, illumination, radiometric disentanglement, cascaded architecture, axial attention, diffusion models, ControlNet
Authors
Nobuo Yoshii, Xinran Nicole Han, Ryo Kawahara, Todd Zickler, Ko Nishino
Abstract
We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method for stochastic sampling of all radiometric constituents -- reflectance, texture, and illumination -- underlying object appearance from a single image. Our key idea for solving this inherently ambiguous radiometric disentanglement is to leverage the fact that, while their texture and reflectance may differ, objects in the same scene are all lit by the same illumination. MultiGP exploits this consensus to produce samples of reflectance, texture, and illumination from a single image of known shapes based on four key technical contributions: a cascaded end-to-end architecture that combines image-space and angular-space disentanglement; Coordinated Guidance for diffusion convergence to a single consistent illumination estimate; Axial Attention applied to facilitate "cross-talk" between objects of different reflectance; and a Texture Extraction ControlNet that preserves high-frequency texture details while ensuring decoupling from the estimated lighting. Experimental results demonstrate that MultiGP effectively leverages the complementary spatial and frequency characteristics of multiple object appearances to recover individual texture and reflectance as well as the common illumination.
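The abstract does not give implementation details for the Axial Attention that enables "cross-talk" between objects, so the following is only a minimal NumPy sketch of the general technique: single-head self-attention applied along one axis of a stacked feature tensor, so that features at the same spatial location can mix across objects. The identity Q/K/V projections and the `(objects, positions, channels)` layout are simplifying assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis):
    """Single-head self-attention along one axis of a feature tensor.

    Sketch with identity Q/K/V projections; a real model would use
    learned linear projections per head.
    """
    # Move the attended axis next to the channel axis so tokens line up.
    xm = np.moveaxis(x, axis, -2)           # (..., L, C)
    scores = xm @ np.swapaxes(xm, -1, -2)   # (..., L, L)
    weights = softmax(scores / np.sqrt(x.shape[-1]), axis=-1)
    out = weights @ xm                      # (..., L, C)
    return np.moveaxis(out, -2, axis)

# Stacking per-object feature maps and attending along the object axis
# lets objects exchange information at each spatial position.
feats = np.random.randn(4, 16, 8)           # (objects, positions, channels)
mixed = axial_attention(feats, axis=0)
assert mixed.shape == feats.shape
```

Attending along one axis at a time keeps the cost linear in the other axes, which is why axial attention is a common choice for mixing information across a small set of entities (here, objects) without full global attention.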