DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
2026-03-19 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition · Artificial Intelligence · Machine Learning
AI summary
The authors present DreamPartGen, a system that generates 3D objects from text by decomposing them into semantically meaningful parts, each with its own geometry and appearance. Unlike prior methods that focus on geometry alone, DreamPartGen models how parts relate to one another and to the text that describes them. Dedicated latent representations keep the parts mutually consistent and aligned with the description during generation. Experiments show that DreamPartGen improves both text-shape alignment and geometric fidelity.
text-to-3D generation · semantic grounding · part decomposition · geometry modeling · appearance modeling · inter-part relations · co-denoising · 3D synthesis · text-shape alignment
Authors
Tianjiao Yu, Xinzhuo Li, Muntasir Wahed, Jerry Xiong, Yifan Shen, Ying Shen, Ismini Lourentzou
Abstract
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, part-aware text-to-3D generation. DreamPartGen introduces Duplex Part Latents (DPLs) that jointly model each part's geometry and appearance, and Relational Semantic Latents (RSLs) that capture inter-part dependencies derived from language. A synchronized co-denoising process enforces mutual geometric and semantic consistency, enabling coherent, interpretable, and text-aligned 3D synthesis. Across multiple benchmarks, DreamPartGen delivers state-of-the-art performance in geometric fidelity and text-shape alignment.
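The abstract describes a synchronized co-denoising process in which per-part Duplex Part Latents (geometry and appearance) are denoised jointly under shared Relational Semantic Latents. The paper's actual architecture is not detailed here, so the following is only a minimal toy sketch of that idea: every latent name, update rule, and dimension below is an illustrative assumption, not the authors' implementation.

```python
# Toy sketch of synchronized co-denoising over per-part latents.
# All structures (duplex geometry/appearance latents, a shared relational
# latent, the update rule) are hypothetical stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(0)
NUM_PARTS, DIM, STEPS = 3, 8, 10

# Duplex Part Latents: each part carries a geometry and an appearance vector.
geo = rng.normal(size=(NUM_PARTS, DIM))
app = rng.normal(size=(NUM_PARTS, DIM))
# Relational Semantic Latent shared across all parts (toy stand-in for
# language-derived inter-part conditioning).
rsl = rng.normal(size=DIM)

def denoise_step(x, cond, t):
    """Toy 'denoiser': nudge latents toward the conditioning vector,
    with the pull growing stronger as t -> 0 (illustrative only)."""
    alpha = 1.0 - t
    return x + 0.1 * alpha * (cond - x)

for step in range(STEPS):
    t = 1.0 - step / STEPS
    # Synchronized update: every part is conditioned on the same
    # relational latent, which couples the parts' trajectories.
    geo = denoise_step(geo, rsl, t)
    app = denoise_step(app, rsl, t)
    # The relational latent is in turn refined from the parts' state,
    # mimicking mutual geometric/semantic consistency enforcement.
    rsl = 0.9 * rsl + 0.1 * geo.mean(axis=0)

print(geo.shape, app.shape)
```

The key design point this sketch mirrors is bidirectional coupling: parts are denoised under a shared relational signal, and that signal is itself updated from the parts, rather than each part being generated independently.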