World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video

2026-07-01 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition • Artificial Intelligence • Graphics

AI summary

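The authors propose World from Motion, a method that turns an ordinary single-camera (monocular) video into a dynamic 3D Gaussian representation that can be rendered from new viewpoints. A video model is conditioned on dense, pixel-aligned renderings of appearance, geometry, and 3D scene motion, which lets it correct rendering artifacts and fill in missing regions of an initial reconstruction. The model is trained on aligned multiview video pairs with simulated artifacts typical of monocular reconstruction. At test time, its generations, including newly observed regions and motions, are distilled back into a single consistent dynamic 3D Gaussian representation, improving both novel-view synthesis and the underlying 3D motion. The authors report that the approach generalizes to in-the-wild videos with large viewpoint changes and dynamic motion.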

3D Gaussian representations, monocular videos, novel-view synthesis, dynamic 3D reconstruction, rendering artifacts, pixel-aligned renderings, multiview video, 3D scene motion, 4D reconstruction, camera trajectories

Authors

Liyuan Zhu, Shengyu Huang, Amrita Mazumdar, Tianye Li, Zan Gojcic, Gordon Wetzstein, Iro Armeni, Shalini De Mello, Alex Trevithick

Abstract

We present World from Motion, a method for generating freely renderable dynamic 3D Gaussian representations from monocular videos. Our approach conditions a video model on dense, pixel-aligned renderings that encode appearance, geometry, and 3D scene motion along both input and target camera trajectories to correct rendering artifacts and fill in missing regions from an initial reconstruction. To train this model, we construct a dataset of aligned multiview video pairs and dynamic 3DGS representations, with simulated artifacts characteristic of monocular reconstruction. At test time, we distill the model's generations, including newly observed regions and motions, back into a single consistent, high-quality dynamic 3DGS, improving both novel-view synthesis and the underlying 3D motion. Our method sets a new state of the art in 4D reconstruction and seamlessly generalizes to in-the-wild videos with large viewpoint changes and dynamic motions.
