Posterior Augmented Flow Matching

2026-05-01

Computer Vision and Pattern Recognition
AI summary

The authors explain that Flow Matching (FM) trains a guide to turn simple random data into complex images, but it struggles because it learns from very limited examples. They propose Posterior-Augmented Flow Matching (PAFM), which improves this by considering many possible ways to reach a target image from an intermediate step, rather than just one. This approach reduces noise in training and helps the model generalize better to new images. Their tests show PAFM performs better than FM on several image tasks without needing much extra computing power.

Flow Matching, Vector Field, Gradient Variance, Posterior Distribution, Importance Sampling, FID Score, Image Generation, Conditional Modeling, Neural Networks, Overfitting
Authors
George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman
Abstract
Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, mapping diverse inputs to overly similar outputs, failing to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate. Finally, we show that PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2), different architectures (SiT and MMDiT), and in both class and text conditioned benchmarks (ImageNet and CC12M), with a negligible increase in the compute overhead. Code: https://github.com/gstoica27/PAFM.git.
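The abstract's importance-sampling construction can be illustrated with a small sketch. The snippet below is a hypothetical toy version, not the paper's implementation: it assumes a linear probability path x_t = (1 - t) x0 + t x1 with a standard-Gaussian prior on x0, so the likelihood of an intermediate under a hypothesized endpoint x1 is N(t x1, (1 - t)^2 I), and it treats the prior over candidate endpoints as uniform within the batch. The function name `pafm_target` and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pafm_target(x_t, t, candidates):
    """Toy posterior-augmented FM target (sketch, not the paper's code).

    Replaces the single-target velocity with a mixture over candidate
    endpoints, weighted by the likelihood of x_t under each hypothesized
    endpoint; the prior p(x1 | condition) is taken as uniform here.
    """
    # Log-likelihood of x_t under each candidate endpoint:
    # p(x_t | x1) = N(t * x1, (1 - t)^2 I) for the assumed linear path.
    log_w = -np.sum((x_t - t * candidates) ** 2, axis=1) / (2 * (1 - t) ** 2)
    log_w -= log_w.max()          # subtract max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                  # normalized importance weights over candidates
    # Per-candidate conditional velocity toward endpoint x1 from x_t.
    u = (candidates - x_t) / (1 - t)
    return w @ u                  # posterior-weighted mixture target, shape (d,)

# Toy usage: 8 candidate endpoints in 4 dimensions, one intermediate state.
x1s = rng.normal(size=(8, 4))
x0 = rng.normal(size=4)
t = 0.5
x_t = (1 - t) * x0 + t * x1s[0]
target = pafm_target(x_t, t, x1s)
print(target.shape)
```

When only one candidate is supplied, the mixture reduces to the standard single-target FM velocity, which is consistent with the paper's claim that PAFM generalizes FM and keeps the estimator unbiased while averaging over plausible continuations lowers gradient variance.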