GenOpticalFlow: A Generative Approach to Unsupervised Optical Flow Learning

2026-03-23

Computer Vision and Pattern Recognition
AI summary

The authors address the challenge of teaching computers to estimate how objects move between video frames without using expensive hand-labeled data. They create a new method called GenOpticalFlow that generates synthetic video frames and exact motion data by using depth information from a pre-trained model to guide a frame prediction process. This synthetic data provides reliable training examples without human input. To improve accuracy, they also filter out parts of the generated frames that might be incorrect. Their experiments show this approach works well compared to other methods that avoid using manual labels.

optical flow, computer vision, depth estimation, frame synthesis, unsupervised learning, semi-supervised learning, brightness constancy, motion estimation, pixel alignment, data augmentation
Authors
Yixuan Luo, Feng Qiao, Zhexiao Xiong, Yanjing Li, Nathan Jacobs
Abstract
Optical flow estimation is a fundamental problem in computer vision, yet the reliance on expensive ground-truth annotations limits the scalability of supervised approaches. Although unsupervised and semi-supervised methods alleviate this issue, they often suffer from unreliable supervision signals based on brightness constancy and smoothness assumptions, leading to inaccurate motion estimation in complex real-world scenarios. To overcome these limitations, we introduce GenOpticalFlow, a novel framework that synthesizes large-scale, perfectly aligned frame-flow data pairs for supervised optical flow training without human annotations. Specifically, our method leverages a pre-trained depth estimation network to generate pseudo optical flows, which serve as conditioning inputs for a next-frame generation model trained to produce high-fidelity, pixel-aligned subsequent frames. This process enables the creation of abundant, high-quality synthetic data with precise motion correspondence. Furthermore, we propose an inconsistent pixel filtering strategy that identifies and removes unreliable pixels in generated frames, effectively enhancing fine-tuning performance on real-world datasets. Extensive experiments on KITTI2012, KITTI2015, and Sintel demonstrate that GenOpticalFlow achieves competitive or superior results compared to existing unsupervised and semi-supervised approaches, highlighting its potential as a scalable and annotation-free solution for optical flow learning. We will release our code upon acceptance.
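
The abstract does not say how the pseudo optical flows are derived from the estimated depth. One common construction, shown here only as an illustrative sketch and not as the authors' implementation, is to assume a static scene, pick a virtual camera motion, and reproject each pixel through a pinhole camera model. The function name `rigid_flow_from_depth`, the intrinsics `fx, fy, cx, cy`, and the motion `(R, t)` are all hypothetical names introduced for this example.

```python
def rigid_flow_from_depth(depth, fx, fy, cx, cy, R, t):
    """Return flow[v][u] = (du, dv) induced by a camera motion on a static scene.

    Illustrative assumption, not the paper's method.
    depth: H x W nested list of positive depths (e.g. from a monocular depth model).
    fx, fy, cx, cy: pinhole intrinsics.
    R (3x3 nested list), t (3-vector): rigid motion mapping points from the
    first camera frame into the second. Points are assumed to stay in front
    of the second camera (positive z after the motion).
    """
    flow = []
    for v, row in enumerate(depth):
        flow_row = []
        for u, z in enumerate(row):
            # Back-project pixel (u, v) to a 3D point in the first camera frame.
            p = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Apply the rigid motion to express the point in the second frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            # Project into the second image and take the pixel displacement.
            u2 = fx * q[0] / q[2] + cx
            v2 = fy * q[1] / q[2] + cy
            flow_row.append((u2 - u, v2 - v))
        flow.append(flow_row)
    return flow


# Usage: a fronto-parallel plane at depth 10 and a pure sideways translation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
plane = [[10.0] * 5 for _ in range(4)]
flow = rigid_flow_from_depth(plane, 100.0, 100.0, 2.0, 1.5, identity, [1.0, 0.0, 0.0])
```

With constant depth and a translation parallel to the image plane, every pixel moves by the same amount (`fx * t_x / depth` horizontally), which makes this a convenient sanity check. Under this sketch's assumptions, such a flow field could then serve as the conditioning input that the abstract describes for the next-frame generation model.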