Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity

2026-03-11 • Computer Vision and Pattern Recognition

AI summary

The authors observe that current text-to-image generators often produce overly saturated, unrealistic images, partly because human raters and preference-trained metrics tend to favor bright, vivid pictures. To address this, they built the Color Fidelity Dataset (CFD), which contains over 1.3M real and generated images with ordered levels of color realism. They also developed the Color Fidelity Metric (CFM), which measures how true the colors are to real life. Finally, they designed Color Fidelity Refinement (CFR), a method that improves color accuracy during image generation without extra training. Together, these tools help check and improve the color realism of images created from text.

text-to-image generation, color fidelity, image realism, evaluation metrics, multimodal encoder, dataset, synthetic images, spatial-temporal guidance, perceptual quality, color authenticity

Authors

Zhengyao Fang, Zexi Jia, Yijia Zhong, Pengcheng Luo, Jinchao Zhang, Guangming Lu, Jun Yu, Wenjie Pei

Abstract

Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that look as authentic as real-world photography remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor visually vivid images with exaggerated saturation and contrast, which pushes generations to be too vivid to be real even when the prompt asks for a realistic style. To address this issue, we present the Color Fidelity Dataset (CFD) and the Color Fidelity Metric (CFM) for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose a training-free Color Fidelity Refinement (CFR) method that adaptively modulates the spatial-temporal guidance scale during generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, and CFM's learned attention in turn guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. The dataset and code are available at https://github.com/ZhengyaoFang/CFM.
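
To make the idea of a spatially and temporally varying guidance scale concrete, here is a minimal sketch in plain Python. It is not the authors' CFR implementation. It assumes standard classifier-free guidance, where the guided noise prediction is the unconditional prediction plus a scale times the difference between the conditional and unconditional predictions, and simply lets that scale vary per location and per denoising step. The function names, the `attention` grid, the linear schedule, and the default values `base_scale=7.5` and `min_scale=3.0` are all hypothetical choices made for illustration.

```python
from typing import List

Grid = List[List[float]]  # a 2D map, one value per spatial location


def guidance_scale_map(attention: Grid, step: int, num_steps: int,
                       base_scale: float = 7.5,
                       min_scale: float = 3.0) -> Grid:
    """Hypothetical schedule (an assumption, not the paper's rule).

    Lowers the guidance scale where `attention` is high (values in [0, 1]),
    and lowers it more strongly at later denoising steps.
    """
    # progress is 0.0 at the first step and 1.0 at the last step
    progress = step / max(num_steps - 1, 1)
    return [
        [base_scale - (base_scale - min_scale) * a * progress for a in row]
        for row in attention
    ]


def guided_noise(eps_uncond: Grid, eps_cond: Grid, scale: Grid) -> Grid:
    """Classifier-free guidance with a per-location scale.

    Computes eps_uncond + w * (eps_cond - eps_uncond) element-wise,
    where w is read from the `scale` map instead of being one global number.
    """
    return [
        [u + w * (c - u) for u, c, w in zip(row_u, row_c, row_w)]
        for row_u, row_c, row_w in zip(eps_uncond, eps_cond, scale)
    ]


if __name__ == "__main__":
    # Toy 1x2 "image": the second location has high attention.
    attention = [[0.0, 1.0]]
    scale = guidance_scale_map(attention, step=9, num_steps=10)
    print(guided_noise([[0.0, 0.0]], [[1.0, 1.0]], scale))
```

In this toy schedule, locations with zero attention keep the base scale throughout, while high-attention locations are guided progressively less as sampling proceeds. Whether CFR uses a schedule of this shape is not stated in the abstract; the released code is the place to check.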
