Detailed Geometry and Appearance from Opportunistic Motion

2026-03-27

Computer Vision and Pattern Recognition
AI summary

The authors show that by watching how an object moves when a person manipulates it in front of a few fixed cameras, it’s possible to see the object from more angles than the cameras alone provide. This helps build better 3D models of the object’s shape and appearance. They tackle the challenge of figuring out both the object's position and shape at the same time, and handle how the object looks under lighting as it moves. Their method, tested on simulated and real data, creates more accurate 3D reconstructions than previous methods that use only sparse camera views.

3D reconstruction, sparse viewpoints, object pose estimation, geometry estimation, appearance modeling, 6DoF trajectories, Gaussian splatting, spherical harmonics, specular reflection, diffuse reflection
Authors
Ryosuke Hirai, Kohei Yamashita, Antoine Guédon, Ryo Kawahara, Vincent Lepetit, Ko Nishino
Abstract
Reconstructing 3D geometry and appearance from a sparse set of fixed cameras is a foundational task with broad applications, yet it remains fundamentally constrained by the limited viewpoints. We show that this bound can be broken by exploiting opportunistic object motion: as a person manipulates an object (e.g., moving a chair or lifting a mug), the static cameras effectively "orbit" the object in its local coordinate frame, providing additional virtual viewpoints. Harnessing this object motion, however, poses two challenges: the tight coupling of object pose and geometry estimation and the complex appearance variations of a moving object under static illumination. We address these by formulating a joint pose and shape optimization using 2D Gaussian splatting with alternating minimization of 6DoF trajectories and primitive parameters, and by introducing a novel appearance model that factorizes diffuse and specular components with reflected directional probing within the spherical harmonics space. Extensive experiments on synthetic and real-world datasets with extremely sparse viewpoints demonstrate that our method recovers significantly more accurate geometry and appearance than state-of-the-art baselines.
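The alternating-minimization structure described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's method (which optimizes 2D Gaussian splat parameters and 6DoF trajectories against rendered images): it is a minimal, hypothetical analogue in which a rigid 2D point set plays the role of the object's geometry and a per-frame rotation-plus-translation plays the role of the pose. The two alternating steps mirror the abstract's scheme: with the shape fixed, solve each frame's pose (here via closed-form Procrustes alignment); with the poses fixed, update the canonical shape (here by averaging observations mapped back into the object frame). All function names are illustrative, not from the paper.

```python
import numpy as np

def fit_pose_2d(src, dst):
    """Closed-form rigid alignment (Procrustes/Kabsch):
    find R, t minimizing ||dst - (R @ src + t)|| for 2xN point sets."""
    cs, cd = src.mean(1, keepdims=True), dst.mean(1, keepdims=True)
    H = (src - cs) @ (dst - cd).T          # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cd - R @ cs

def alternate_pose_shape(obs, n_iters=10):
    """Alternating minimization over per-frame poses and a shared shape.

    obs: list of 2xN observed point sets, one per frame, assumed to be
    rigid transforms of an unknown canonical shape."""
    shape = obs[0].copy()  # initialize the canonical shape from frame 0
    for _ in range(n_iters):
        # Pose step: with shape fixed, solve each frame's rigid pose.
        poses = [fit_pose_2d(shape, o) for o in obs]
        # Shape step: with poses fixed, average observations mapped
        # back into the canonical (object-local) frame.
        shape = np.mean(
            [R.T @ (o - t) for (R, t), o in zip(poses, obs)], axis=0
        )
    return shape, poses
```

In the paper both steps are gradient-based and driven by a photometric rendering loss rather than closed-form point alignment, but the coupling the abstract highlights is the same: the pose estimate is only as good as the current shape, and vice versa, which is why the two are interleaved rather than solved independently.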