FlowIt: Global Matching for Optical Flow with Confidence-Guided Refinement
2026-03-30 • Computer Vision and Pattern Recognition
AI summary
The authors introduce FlowIt, a model for estimating optical flow, the per-pixel motion between video frames. FlowIt uses a hierarchical transformer to capture global context, which helps it handle large displacements across the whole image. It first computes an initial flow field by posing matching as an optimal transport problem, which also yields occlusion and confidence maps. A guided refinement stage then propagates motion estimates from high-confidence regions into uncertain ones. The authors report state-of-the-art results on the Sintel and KITTI benchmarks, and state-of-the-art zero-shot generalization to Sintel, Spring, and LayeredFlow.
optical flow, hierarchical transformer, large pixel displacement, optimal transport, occlusion map, confidence map, guided refinement, zero-shot generalization, Sintel dataset, KITTI dataset
Authors
Sadra Safadoust, Fabio Tosi, Matteo Poggi, Fatma Güney
Abstract
We present FlowIt, a novel architecture for optical flow estimation designed to robustly handle large pixel displacements. At its core, FlowIt leverages a hierarchical transformer architecture that captures extensive global context, enabling it to model long-range correspondences effectively. To overcome the limitations of localized matching, we formulate the flow initialization as an optimal transport problem. This formulation yields a highly robust initial flow field, alongside explicitly derived occlusion and confidence maps. These cues are then integrated into a guided refinement stage, where the network propagates reliable motion estimates from high-confidence regions into ambiguous, low-confidence areas. Extensive experiments across the Sintel, KITTI, Spring, and LayeredFlow datasets validate the efficacy of our approach. FlowIt achieves state-of-the-art results on the competitive Sintel and KITTI benchmarks, while simultaneously establishing new state-of-the-art cross-dataset zero-shot generalization performance on Sintel, Spring, and LayeredFlow.
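The abstract does not give the details of the optimal transport formulation, so the following is only an illustrative sketch of how global matching can produce a flow field together with occlusion and confidence maps. It assumes a log-domain Sinkhorn solver with an extra "dustbin" row and column that absorbs unmatched pixels, a common construction in learned matching and not necessarily the one FlowIt uses. The function names (sinkhorn_with_dustbin, init_flow_ot), the temperature, the dustbin score, and the iteration count are all hypothetical choices made for this example.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_with_dustbin(scores, dustbin_score=0.0, n_iters=50):
    """Entropy-regularized optimal transport with a dustbin (assumed setup).

    scores: (N, M) similarity matrix between frame-1 and frame-2 pixels.
    Returns an (N + 1, M + 1) transport plan; the last row and column
    collect the mass of pixels that find no counterpart.
    """
    n, m = scores.shape
    padded = np.full((n + 1, m + 1), dustbin_score, dtype=np.float64)
    padded[:n, :m] = scores

    # Every real pixel carries unit mass; each dustbin can absorb all
    # pixels of the other frame.
    log_mu = np.log(np.concatenate([np.ones(n), [float(m)]]))
    log_nu = np.log(np.concatenate([np.ones(m), [float(n)]]))

    u = np.zeros(n + 1)
    v = np.zeros(m + 1)
    for _ in range(n_iters):
        # Alternate row and column normalization in the log domain.
        u = log_mu - _logsumexp(padded + v[None, :], axis=1)
        v = log_nu - _logsumexp(padded + u[:, None], axis=0)
    return np.exp(padded + u[:, None] + v[None, :])


def init_flow_ot(feat1, feat2, temperature=0.05):
    """Initial flow, occlusion map, and confidence map from global matching.

    feat1, feat2: (H, W, C) feature maps of the two frames.
    Returns flow (H, W, 2) as (dx, dy), occlusion (H, W), confidence (H, W).
    """
    h, w, c = feat1.shape
    f1 = feat1.reshape(-1, c)
    f2 = feat2.reshape(-1, c)
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    f2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)

    # All-pairs cosine similarity, sharpened by a temperature.
    scores = (f1 @ f2.T) / temperature
    plan = sinkhorn_with_dustbin(scores)

    n = h * w
    match = plan[:n, :n]      # mass sent to real frame-2 pixels
    occlusion = plan[:n, n]   # mass sent to the dustbin

    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)

    # Expected target position under the row-normalized matching distribution.
    probs = match / np.maximum(match.sum(axis=1, keepdims=True), 1e-12)
    flow = (probs @ coords - coords).reshape(h, w, 2)

    # Confidence here is simply the largest matching mass per pixel.
    confidence = match.max(axis=1).reshape(h, w)
    return flow, occlusion.reshape(h, w), confidence
```

In this sketch, a pixel whose best match receives most of its mass gets a confidence near 1, while a pixel that sends most of its mass to the dustbin is flagged as likely occluded. A refinement stage could then weight its updates by these maps, but how FlowIt does so is not specified in the abstract. The all-pairs score matrix grows quadratically with the number of pixels, so an approach like this is only practical on low-resolution feature maps.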