Anchor3R: Streaming 3D Reconstruction with Transient Anchors for Long-Horizon Visual Mapping

2026-06-03 · Computer Vision and Pattern Recognition

AI summary

The authors focus on helping robots map their surroundings over long periods using only a camera. Existing streaming methods often struggle because they tie the robot's position estimates to a fixed starting point, so errors build up once the robot travels much farther than in the training sequences. Their new method, Anchor3R, instead predicts positions relative to the current view and later aligns them into a single trajectory. They tested this approach on indoor, outdoor, driving, and RGB-D benchmarks and found it improves position accuracy and map quality while keeping memory use bounded.

visual mapping, 3D reconstruction, pose estimation, streaming algorithms, local coordinate system, loop closure, motion averaging, robot perception, RGB-D data, bounded memory inference
Authors
Peilin Tao, Chong Cheng, Yuansen Du, Caiwei Song, Zhengqing Chen, Xiaoyang Guo, Wei Yin, Weiqiang Ren, Qian Zhang, Hainan Cui, Shuhan Shen
Abstract
Long-horizon online visual mapping is a core capability for robot perception, requiring continuous camera-motion and scene-geometry estimation from visual streams under bounded memory and computation. Recent feed-forward 3D reconstruction models provide strong geometric priors, but their streaming variants often predict poses in a fixed coordinate system tied to the first frame or a persistent scene memory. This fixed-gauge design leads to train-test mismatch, attention bias toward early anchors, and accumulated drift on sequences much longer than those seen during training. We propose Anchor3R, a streaming 3D reconstruction framework that treats feed-forward reconstruction as current-centric local measurement prediction rather than persistent global-gauge regression. At each time step, Anchor3R predicts window-relative poses and a local pointmap in the current-frame coordinate system, turning streaming reconstruction into relative-pose measurement generation. These measurements support online pose updates, while loop-closure reinsertion and motion averaging align the trajectory and transform local pointmaps into a coherent global reconstruction. Experiments on indoor, outdoor, driving, and RGB-D benchmarks show that Anchor3R improves long-horizon pose accuracy and dense reconstruction quality over existing streaming baselines, while supporting bounded-memory online inference.
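
To make the idea of relative-pose measurements concrete, the sketch below shows how frame-to-frame poses can be composed into a global trajectory and how a pointmap expressed in the current camera frame can be lifted into world coordinates. It is not the Anchor3R implementation: it uses naive sequential chaining in place of the motion averaging and loop-closure steps described in the abstract, and the function names (`make_pose`, `chain_relative_poses`, `lift_pointmap`) and the pose convention (each relative pose maps points from the current frame into the previous frame) are assumptions made for illustration.

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform from a yaw angle (about z) and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def chain_relative_poses(relative_poses):
    """Compose relative poses into camera-to-world poses.

    Assumed convention (hypothetical, for this sketch only): each entry maps
    points from frame k into frame k-1, and frame 0 defines the world frame.
    """
    poses = [np.eye(4)]
    for T_prev_cur in relative_poses:
        poses.append(poses[-1] @ T_prev_cur)
    return poses


def lift_pointmap(T_world_cam, local_points):
    """Map an (N, 3) array of points from the camera frame to the world frame."""
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    return local_points @ R.T + t


# Toy trajectory: four steps of 1 unit along x, each followed by a 90-degree turn.
step = make_pose(np.pi / 2, [1.0, 0.0, 0.0])
poses = chain_relative_poses([step] * 4)

# A single point seen 1 unit in front of camera 1, expressed in that camera's frame.
local_pointmap = np.array([[1.0, 0.0, 0.0]])
world_points = lift_pointmap(poses[1], local_pointmap)
```

In this noise-free toy example the four steps trace a closed square, so the last pose coincides with the first. With real, noisy measurements plain chaining accumulates drift over long sequences, which is the kind of error that loop-closure constraints and motion averaging are meant to correct.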