Repurposing 3D Generative Model for Autoregressive Layout Generation

2026-04-17

Computer Vision and Pattern Recognition
AI summary

The authors present LaviGen, a new way to create 3D object layouts by working directly in 3D space instead of reasoning over text descriptions. Their method generates scenes step by step, explicitly modeling how objects relate geometrically and obey physical constraints to produce realistic arrangements. They further improve the process with an adapted 3D diffusion model that fuses scene, object, and instruction information, plus a technique called dual-guidance self-rollout distillation that makes generation faster and spatially more accurate. Experiments show their approach produces layouts that are more physically plausible and quicker to generate than previous methods.

3D generative models · 3D layout generation · autoregressive process · geometric relations · physical constraints · 3D diffusion model · self-rollout distillation · LayoutVLM benchmark · physical plausibility · scene synthesis
Authors
Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng
Abstract
We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric relations and physical constraints among objects, producing coherent and physically plausible 3D scenes. To further enhance this process, we propose an adapted 3D diffusion model that integrates scene, object, and instruction information and employs a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy. Extensive experiments on the LayoutVLM benchmark show LaviGen achieves superior 3D layout generation performance, with 19% higher physical plausibility than the state of the art and 65% faster computation. Our code is publicly available at https://github.com/fenghora/LaviGen.
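The abstract describes layout generation as an autoregressive process in which each object is placed conditioned on the objects already in the scene, subject to physical constraints. The sketch below is a deliberately simplified illustration of that idea, not the paper's actual method: instead of a learned 3D diffusion model, it uses rejection sampling with an axis-aligned non-overlap check. All names (`place_objects`, `boxes_overlap`) and the room/box representation are hypothetical.

```python
import random

def boxes_overlap(a, b):
    """Axis-aligned overlap test on the ground-plane footprints (x, y, width, depth)."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return abs(ax - bx) < (aw + bw) / 2 and abs(ay - by) < (ad + bd) / 2

def place_objects(sizes, room=(10.0, 10.0), max_tries=200, seed=0):
    """Autoregressively place objects one at a time: each candidate position is
    sampled and rejected if it collides with any previously placed object.
    A learned model would replace the uniform proposal distribution."""
    rng = random.Random(seed)
    placed = []  # each entry: (x, y, width, depth)
    for w, d in sizes:
        for _ in range(max_tries):
            x = rng.uniform(w / 2, room[0] - w / 2)  # keep box inside the room
            y = rng.uniform(d / 2, room[1] - d / 2)
            cand = (x, y, w, d)
            if all(not boxes_overlap(cand, p) for p in placed):
                placed.append(cand)  # condition future placements on this one
                break
        else:
            raise RuntimeError("could not place object without collision")
    return placed

layout = place_objects([(2, 1), (1, 1), (3, 2)])
```

The key structural point this toy shares with an autoregressive formulation is that the feasible region for each object depends on everything placed before it, so ordering and per-step constraint checks shape the final scene.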