UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images
2026-02-27 • Computer Vision and Pattern Recognition
AI summary
The authors present UFO-4D, a method that reconstructs detailed 3D models of moving scenes from just two unposed images. Instead of slow test-time optimization, their approach estimates shapes, motion, and camera poses all at once in a single feedforward pass by representing the scene as dynamic 3D Gaussian splats. Because appearance, depth, and motion are rendered from the same primitives, supervising one modality regularizes the others, so the method learns well even with limited data and outperforms previous methods at estimating geometry, motion, and camera pose.
4D reconstruction, 3D Gaussian splats, unposed images, camera pose estimation, self-supervised learning, dynamic geometry, image synthesis, depth estimation, motion estimation, feedforward framework
Authors
Junhwa Hur, Charles Herrmann, Songyou Peng, Philipp Henzler, Zeyu Ma, Todd Zickler, Deqing Sun
Abstract
Dense 4D reconstruction from unposed images remains a critical challenge, with current methods relying on slow test-time optimization or fragmented, task-specific feedforward models. We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. UFO-4D directly estimates dynamic 3D Gaussian Splats, enabling the joint and consistent estimation of 3D geometry, 3D motion, and camera pose in a feedforward manner. Our core insight is that differentiably rendering multiple signals from a single Dynamic 3D Gaussian representation offers major training advantages. This approach enables a self-supervised image synthesis loss while tightly coupling appearance, depth, and motion. Since all modalities share the same geometric primitives, supervising one inherently regularizes and improves the others. This synergy overcomes data scarcity, allowing UFO-4D to outperform prior work by up to 3 times in joint geometry, motion, and camera pose estimation. Our representation also enables high-fidelity 4D interpolation across novel views and time. Please visit our project page for visual results: https://ufo-4d.github.io/
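The abstract's core insight is that one shared set of dynamic Gaussian primitives feeds every rendered signal, so a gradient from any single supervision target moves parameters that all modalities depend on. The toy sketch below (plain Python, not the authors' code; all names and the pinhole-free "depth along z" camera are illustrative assumptions) shows this coupling: both the depth and motion renderings read the same `xyz` field, so advecting the primitives through time changes what the depth rendering sees.

```python
import random

random.seed(0)
N = 4  # number of toy Gaussian primitives

# Each primitive carries appearance, position, and motion parameters
# in one shared record, mirroring the "single representation" idea.
gaussians = [
    {
        "xyz": [random.gauss(0, 1) for _ in range(3)],       # 3D center
        "rgb": [random.random() for _ in range(3)],          # appearance
        "velocity": [random.gauss(0, 1) for _ in range(3)],  # 3D motion
    }
    for _ in range(N)
]

def render_depth(prims, cam_z=5.0):
    # Depth of each primitive along a hypothetical camera z-axis.
    return [cam_z - p["xyz"][2] for p in prims]

def advect(prims, dt=1.0):
    # Move each primitive forward in time using its own velocity;
    # the motion model shares the same "xyz" the depth rendering uses.
    return [
        {**p, "xyz": [x + dt * v for x, v in zip(p["xyz"], p["velocity"])]}
        for p in prims
    ]

depth_t0 = render_depth(gaussians)
depth_t1 = render_depth(advect(gaussians))
print(len(depth_t0), len(depth_t1))  # one depth value per primitive
```

In a real differentiable renderer, a photometric loss on `rgb`, a depth loss on `depth_t0`, and a flow loss on the advected positions would all backpropagate into the same `xyz` and `velocity` arrays, which is the regularization synergy the abstract describes.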