NeuROK: Generative 4D Neural Object Kinematics

2026-05-28 · Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition · Graphics
AI summary

The authors address the challenge of simulating how 3D objects deform over time under physical forces. Existing methods typically assume a predefined physical model and estimate its parameters, which restricts them to specific object categories and small-scale datasets. Instead, the authors learn a compressed, low-dimensional representation (a latent space) covering the shapes an object can take as it moves or deforms, together with a decoder that turns any point in that space into a plausibly deformed shape. They call this parameterization Neural Object Kinematics (NeuROK) and implement it as a transformer-based encoder-decoder trained on a curated large-scale 4D dataset. Because dynamics then only need to be simulated within the low-dimensional latent space, using Lagrangian mechanics, the approach avoids hand-specifying a detailed physical model for each object. Experiments across diverse dynamic object types show clear advantages over prior work.

3D vision, transformers, 4D dynamics, latent space, object kinematics, Neural Object Kinematics, encoder-decoder model, Lagrangian mechanics, data-driven simulation, physical deformation
Authors
Chen Geng, Guangzhao He, Yue Gao, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu
Abstract
Data-driven approaches have revolutionized 3D vision, enabling transformers to effectively reconstruct and generate static 3D objects. However, generating simulative 4D dynamics (realistic temporal deformations of static objects under various physical conditions) remains challenging and often ad hoc, despite its importance in building comprehensive 3D world models. Most existing methods assume a predefined physical model and use system identification to estimate its parameters, restricting these methods to specific categories and small-scale datasets. We propose that these restrictions can be overcome by learning a data-driven kinematic state parameterization for object-centric physical systems. Specifically, we learn both a latent space representing all possible states of the object and a decoder that maps any sampled latent to a plausibly deformed shape of the object. We refer to this parameterization as Neural Object Kinematics (NeuROK), and learn a transformer-based encoder-decoder model on a curated large-scale 4D dataset. This formulation and the learned model significantly simplify the generation of simulative dynamics, since we only need to consider the dynamics within a low-dimensional latent space from the perspective of Lagrangian mechanics in classical physics. We demonstrate the effectiveness and generality of this neural simulation framework across diverse dynamic object types, showing clear advantages over prior work. Project page: https://chen-geng.com/neurok
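To make the Lagrangian-mechanics argument concrete, here is a minimal sketch of dynamics restricted to a decoder's latent space. It assumes a small hand-written toy decoder in place of the learned NeuROK model. Treat the latent vector z as generalized coordinates and the decoded points x = D(z) as the shape. The Lagrangian is then L = ½ m‖J ż‖² - V(D(z)) with J = ∂D/∂z, and its Euler-Lagrange equations reduce to m JᵀJ z̈ = Jᵀf - m Jᵀ(J̇ ż). The toy decoder, the spring-plus-gravity force model, the finite-difference derivatives, and the semi-implicit Euler integrator are all illustrative assumptions; the abstract does not say how NeuROK integrates its latent dynamics.

```python
import numpy as np

# --- Hypothetical toy decoder (an assumption, NOT the NeuROK model) ---------
# Maps a latent vector z of shape (LATENT_DIM,) to N_POINTS deformed 3-D points.
rng = np.random.default_rng(0)
N_POINTS, LATENT_DIM = 50, 2
REST = rng.normal(size=(N_POINTS, 3))                     # undeformed shape
BASIS = 0.1 * rng.normal(size=(LATENT_DIM, N_POINTS, 3))  # deformation modes


def decode(z):
    """Nonlinear latent -> shape map: rest pose plus tanh-squashed modes."""
    return REST + np.tensordot(np.tanh(z), BASIS, axes=1)


def jacobian(z, h=1e-6):
    """J = dD/dz as a (3*N_POINTS, LATENT_DIM) matrix, via central differences."""
    cols = []
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        cols.append(((decode(z + e) - decode(z - e)) / (2 * h)).ravel())
    return np.stack(cols, axis=1)


def curvature_term(z, v, h=1e-4):
    """(dJ/dt) v, i.e. the second directional derivative of D along v."""
    return ((decode(z + h * v) - 2 * decode(z) + decode(z - h * v)) / h**2).ravel()


def forces(x, k, mass, g):
    """Assumed per-point force: linear spring to the rest shape plus gravity along -y."""
    f = -k * (x - REST)
    f[:, 1] -= mass * g
    return f.ravel()


def step(z, v, dt, k=50.0, mass=1.0, g=9.81):
    """One semi-implicit Euler step of the reduced Euler-Lagrange equations
        m J^T J a = J^T f - m J^T (dJ/dt) v
    """
    J = jacobian(z)
    M = mass * J.T @ J  # reduced (latent) mass matrix, LATENT_DIM x LATENT_DIM
    rhs = J.T @ (forces(decode(z), k, mass, g) - mass * curvature_term(z, v))
    a = np.linalg.solve(M, rhs)
    v = v + dt * a  # update velocity first ...
    z = z + dt * v  # ... then position with the new velocity (symplectic Euler)
    return z, v


def simulate(z0, v0, n_steps=2000, dt=1e-3, **params):
    """Integrate from (z0, v0); returns the latent trajectory, shape (n_steps + 1, LATENT_DIM)."""
    z, v = np.asarray(z0, float), np.asarray(v0, float)
    traj = [z.copy()]
    for _ in range(n_steps):
        z, v = step(z, v, dt, **params)
        traj.append(z.copy())
    return np.array(traj)
```

For example, `simulate(np.zeros(2), np.zeros(2))` lets the rest shape sag under gravity and returns a (2001, 2) latent trajectory; passing any row to `decode` recovers the corresponding deformed point cloud. However many points the decoder outputs, the linear system solved at each step is only LATENT_DIM × LATENT_DIM, which illustrates why working in a low-dimensional latent space simplifies simulation.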