MonoPhysics: Estimating Geometry, Appearance, and Physical Parameters from Monocular Videos

2026-05-28 · Computer Vision and Pattern Recognition

AI summary

The authors introduce MonoPhysics, a method that recovers the shape, appearance, and physical properties of deformable objects from a single camera. Multiple camera views normally help because they constrain an object's scale and 3D structure; MonoPhysics makes up for the missing views with three techniques that link visuals and physics: global scale alignment, physics-aware geometry refinement, and a differentiable position map. By combining differentiable MPM simulation with 3D Gaussian Splatting, the system jointly optimizes geometry, appearance, and physical parameters from one video. On Vid2Sim and a new dataset of elastic and plastic objects, it outperforms existing single-camera baselines and performs comparably to multi-view baselines.

inverse physics, monocular vision, deformable objects, differentiable MPM simulation, 3D Gaussian Splatting, geometry refinement, physical parameters, scale ambiguity, multi-view geometry
Authors
Daniel Rho, Jun Myeong Choi, Matthew Thornton, Biswadip Dey, Roni Sengupta
Abstract
Existing inverse physics methods recover physical parameters from multi-view videos, where geometric constraints across views resolve scale and 3D structure. In monocular settings, however, such constraints are absent, leading to severe scale ambiguity, inaccurate geometry, and weak coupling between appearance optimization and physical simulation. We propose MonoPhysics, a framework for monocular inverse physics estimation of deformable objects using differentiable MPM simulation and 3D Gaussian Splatting, which jointly optimizes geometry, appearance, and physical parameters from a single camera view. We address these challenges through three visual-physical bridges: global scale alignment, physics-aware geometry refinement, and a differentiable position map, which together enable accurate optimization from monocular observations alone. We evaluate on Vid2Sim and our new dataset of elastic and plastic objects, showing that MonoPhysics outperforms existing baselines in monocular settings and achieves performance comparable to multi-view baselines using only a single camera. Our project page is available at https://daniel03c1.github.io/MonoPhysics/
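
The scale ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It assumes an ideal pinhole camera and is not taken from the MonoPhysics code; the names `project`, `scale_scene`, and the focal length of 500 pixels are hypothetical choices for illustration. The sketch shows that an object and a copy that is ten times larger and ten times farther away project to the same pixels, so a single view alone cannot tell them apart.

```python
# Illustration only: pinhole projection is invariant to a global scale factor.
# All names and values here are hypothetical, not part of MonoPhysics.

def project(point, focal=500.0):
    """Project a 3D camera-frame point (x, y, z) to pixel offsets (u, v)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def scale_scene(points, s):
    """Scale every 3D point about the camera centre by the factor s."""
    return [(s * x, s * y, s * z) for (x, y, z) in points]

# A small object roughly 2 units in front of the camera.
points = [(0.1, 0.2, 2.0), (-0.3, 0.1, 2.5), (0.0, -0.2, 1.8)]
# The same object, 10x larger and 10x farther away.
scaled = scale_scene(points, 10.0)

pixels = [project(p) for p in points]
pixels_scaled = [project(p) for p in scaled]

# Largest pixel difference between the two scenes; it is zero up to
# floating-point rounding, so the two images are indistinguishable.
max_diff = max(
    abs(a - b)
    for p, q in zip(pixels, pixels_scaled)
    for a, b in zip(p, q)
)
print(max_diff < 1e-9)
```

The two scenes produce the same image but describe objects of very different metric size, which a physics simulator would treat differently. This is presumably why a monocular pipeline needs some form of scale alignment before physical parameters can be estimated reliably.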