PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
2026-05-06 • Computer Vision and Pattern Recognition
AI summary
The authors address the problem of creating 3D objects that not only look good but also behave correctly in virtual worlds. They introduce PhysForge, a two-stage system in which a vision-language model first plans an object's material, functional, and kinematic properties, and a physics-grounded diffusion model then generates the geometry together with the kinematic parameters of its moving parts. The approach is supported by PhysDB, a dataset of 150,000 assets with four-tier physical annotations. Experiments indicate that the method produces functionally plausible, simulation-ready 3D assets for interactive environments.
3D asset synthesis, physics-based modeling, functional properties, hierarchical physical blueprint, vision-language model (VLM), diffusion model, kinematic parameters, simulation-ready assets, embodied AI, KineVoxel Injection (KVI)
Authors
Yunhan Yang, Chunshi Wang, Junliang Ye, Yang Li, Zanxin Chen, Zehuan Huang, Yao Mu, Zhuo Chen, Chunchao Guo, Xihui Liu
Abstract
Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a "physical architect" to plan a "Hierarchical Physical Blueprint" defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.
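To make the idea of a hierarchical blueprint concrete, the following is a minimal Python sketch of how a part hierarchy with material, functional, and kinematic fields might be represented. The abstract does not specify the blueprint schema, so every class, field name, and value here (`Joint`, `PartNode`, `movable_parts`, the cabinet example) is a hypothetical illustration, not PhysForge's actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical kinematic constraint linking a part to its parent."""
    kind: str                     # e.g. "revolute", "prismatic", "fixed"
    axis: Vec3                    # direction of rotation or translation
    origin: Vec3                  # pivot or anchor point
    limits: Tuple[float, float]   # range of motion (radians or metres)


@dataclass
class PartNode:
    """Hypothetical blueprint node: one part with its physical attributes."""
    name: str
    material: str
    function: str
    joint: Optional[Joint] = None  # None for the root part
    children: List["PartNode"] = field(default_factory=list)


def movable_parts(node: PartNode) -> List[str]:
    """Collect, depth-first, the names of parts attached by a non-fixed joint."""
    names = [node.name] if node.joint and node.joint.kind != "fixed" else []
    for child in node.children:
        names.extend(movable_parts(child))
    return names


# Illustrative asset: a cabinet with a hinged door and a sliding drawer.
cabinet = PartNode(
    name="cabinet_body", material="wood", function="storage",
    children=[
        PartNode("door", "wood", "open/close access",
                 Joint("revolute", (0.0, 0.0, 1.0), (0.4, 0.0, 0.0), (0.0, 1.57))),
        PartNode("drawer", "wood", "slide-out storage",
                 Joint("prismatic", (0.0, 1.0, 0.0), (0.0, 0.0, 0.2), (0.0, 0.35))),
    ],
)

print(movable_parts(cabinet))  # ['door', 'drawer']
```

In a two-stage design like the one the abstract describes, a structure of this kind would be the planner's output and the generator's conditioning input; how PhysForge actually encodes and injects it (for example, via KVI) is not detailed in the abstract.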