RewardFlow: Generate Images by Optimizing What You Reward
2026-04-09 • Computer Vision and Pattern Recognition • Artificial Intelligence
AI summary
The authors present RewardFlow, a new method to improve image editing and generation using existing diffusion models without changing their training. Their approach uses multiple reward signals that help the model understand and match what the user wants, like keeping objects consistent and following instructions strictly. They also add a special reward based on visual question answering to give more detailed guidance. To manage these different goals, they created a system that adjusts how the model focuses on each reward during the editing process. Their tests show that RewardFlow improves how well images match the desired edits and compositions.
diffusion models • flow-matching models • Langevin dynamics • semantic alignment • perceptual fidelity • visual question answering (VQA) • image editing • reward functions • multi-objective optimization • compositional generation
Authors
Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
Abstract
We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
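The abstract describes steering a pretrained sampler at inference time with multi-reward Langevin dynamics under prompt-adaptive reward weights. A minimal sketch of the core idea is below; all names (`multi_reward_langevin_step`, the reward-gradient inputs, the weight vector) are illustrative assumptions, not the paper's actual implementation, and the pretrained model's own drift term is omitted for brevity.

```python
import numpy as np

def multi_reward_langevin_step(x, reward_grads, weights, step_size, noise_scale, rng):
    """One Langevin update steered by a weighted sum of reward gradients.

    Hypothetical sketch: `x` is the current sample (e.g. a latent),
    `reward_grads` is a list of gradient arrays, one per differentiable
    reward (semantic alignment, perceptual fidelity, VQA, ...), and
    `weights` are the prompt-adaptive reward weights the policy would
    modulate during sampling. The full method also adds the pretrained
    diffusion/flow model's drift; only the reward-guidance term is shown.
    """
    guidance = sum(w * g for w, g in zip(weights, reward_grads))
    noise = rng.standard_normal(x.shape)
    # Langevin-style update: gradient ascent on the weighted reward plus noise.
    return x + step_size * guidance + np.sqrt(2.0 * step_size) * noise_scale * noise

# Example usage with two toy reward gradients and fixed weights:
rng = np.random.default_rng(0)
x = np.zeros(4)
grads = [np.ones(4), 2.0 * np.ones(4)]
x_next = multi_reward_langevin_step(x, grads, weights=[0.5, 0.25], step_size=0.1,
                                    noise_scale=0.1, rng=rng)
```

The prompt-aware policy described in the abstract would replace the fixed `weights` and `step_size` here with values inferred from the instruction and updated at each sampling step.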