Towards Training-Free Scene Text Editing

2026-03-25 • Computer Vision and Pattern Recognition

AI summary

The authors developed TextFlow, a method for editing text in photos that needs no extra training or paired examples. It combines two techniques: Flow Manifold Steering (FMS), which keeps style and structure consistent, and Attention Boost (AttnBoost), which uses attention-based guidance to improve how the new text is rendered. The authors report that the approach works across different scenes and languages, producing realistic and accurate edited images. This makes text editing in images more adaptable than methods that require task-specific training.

scene text editing, attention mechanism, flow manifold steering, visual realism, semantic consistency, training-free methods, text manipulation, spatial refinement, image editing, end-to-end framework

Authors

Yubo Li, Xugong Qin, Peng Zhang, Hailun Lin, Gangyan Zeng, Kexin Zhang

Abstract

Scene text editing seeks to modify textual content in natural images while maintaining visual realism and semantic consistency. Existing methods often require task-specific training or paired data, which limits their scalability and adaptability. In this paper, we propose TextFlow, a training-free scene text editing framework that integrates the strengths of Attention Boost (AttnBoost) and Flow Manifold Steering (FMS) to enable flexible, high-fidelity text manipulation without additional training. Specifically, FMS preserves structural and stylistic consistency by modeling the visual flow of characters and background regions, while AttnBoost enhances the rendering of textual content through attention-based guidance. By jointly leveraging these complementary modules, our approach performs end-to-end text editing through semantic alignment and spatial refinement in a plug-and-play manner. Extensive experiments demonstrate that our framework achieves visual quality and text accuracy comparable or superior to those of training-based counterparts and generalizes well across diverse scenes and languages. This study advances scene text editing toward a more efficient, generalizable, and training-free paradigm. Code is available at https://github.com/lyb18758/TextFlow
