Global Structure-from-Motion Meets Feedforward Reconstruction

2026-05-25 · Computer Vision and Pattern Recognition

AI summary

The authors study Structure-from-Motion (SfM), the task of recovering a 3D scene together with the camera poses from a collection of photos. They note that recent feedforward methods handle difficult cases such as low texture, limited image overlap, and symmetric structures better than classical methods. However, feedforward methods are limited in scalability, accuracy, or robustness, and they typically fall short of classical methods in standard settings. To address this, the authors build a new pipeline that combines the strengths of classical and feedforward approaches. Experiments on multiple datasets show state-of-the-art results across a wide range of scenarios, and the code is released as open source.

Structure-from-Motion, 3D reconstruction, camera pose estimation, feedforward methods, classical SfM, low texture, image overlap, symmetry, computer vision
Authors
Linfei Pan, Johannes Schönberger, Marc Pollefeys
Abstract
Structure-from-Motion (SfM), the process of simultaneously estimating camera poses and 3D scene structure from a collection of images, remains a central challenge in computer vision, with many open problems yet to be solved. Recent advances in feedforward 3D reconstruction have made significant strides in overcoming persistent failure cases of classical SfM methods, particularly in scenarios characterized by low texture, limited overlap, and symmetries. However, while feedforward approaches excel in these challenging conditions, they often face limitations regarding scalability, accuracy, or robustness, and typically fall short of classical methods in standard reconstruction settings. In this work, we systematically analyze these limitations and propose a new Structure-from-Motion pipeline that combines the respective strengths of classical and feedforward methods. Extensive experiments across multiple datasets show the benefits of our approach, which achieves state-of-the-art results across a wide range of scenarios. We share our system as an open-source implementation at https://github.com/colmap/gluemap.
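To make "simultaneously estimating camera poses and 3D scene structure" concrete, the sketch below computes the reprojection error that classical SfM pipelines typically minimize during bundle adjustment: each 3D point is projected into each camera that observes it and compared with the detected 2D keypoint. This is a minimal, generic illustration under stated assumptions: an ideal pinhole camera with pose (R, t) mapping world to camera coordinates, a single focal length, and no lens distortion. The function names are hypothetical; this is not code from the authors' repository and not the COLMAP API.

```python
import math


def project(R, t, f, cx, cy, X):
    """Project world point X into a pinhole camera (hypothetical helper).

    R is a 3x3 rotation given as a list of rows, t is a 3-vector, so the
    camera-frame point is X_c = R X + t. f is the focal length and
    (cx, cy) the principal point, all in pixels. No distortion is modeled.
    """
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy)


def reprojection_error(R, t, f, cx, cy, X, observed):
    """Pixel distance between the projection of X and its observed keypoint."""
    u, v = project(R, t, f, cx, cy, X)
    return math.hypot(u - observed[0], v - observed[1])


def total_squared_error(cameras, points, observations):
    """Sum of squared reprojection errors over all observations.

    cameras maps camera_id -> (R, t, f, cx, cy), points maps point_id -> X,
    and observations is a list of (camera_id, point_id, (u, v)). Bundle
    adjustment minimizes this quantity jointly over poses and points.
    """
    total = 0.0
    for cam_id, pt_id, uv in observations:
        R, t, f, cx, cy = cameras[cam_id]
        total += reprojection_error(R, t, f, cx, cy, points[pt_id], uv) ** 2
    return total


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    cameras = {0: (identity, [0.0, 0.0, 0.0], 500.0, 320.0, 240.0)}
    points = {0: [0.0, 0.0, 5.0]}
    observations = [(0, 0, (323.0, 244.0))]
    print(total_squared_error(cameras, points, observations))  # 25.0
```

In this toy example the point projects exactly onto the principal point (320, 240), and the keypoint is observed 3 px right and 4 px down, giving a 5 px error and a squared error of 25. A real pipeline would add robust loss functions, distortion models, and a nonlinear least-squares solver on top of this residual.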