Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

2026-06-01, Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors examine how current multimodal large language models (MLLMs), when used as judges, sometimes prefer plausible, story-like answers over answers that match what is actually in the image. They call this problem Perceptual Judgment Bias and show that these models often overlook visual details and rely too heavily on the text. To address it, they build a dataset of nearly identical answer pairs that differ only in a visual error, which gives the models clear, checkable training signals. Using this data, they design a training method that helps models judge images and text together more accurately and consistently, in closer agreement with human evaluators. The work makes these models more reliable when judging answers that combine visual information and language.

Multimodal Large Language Models, Perceptual Judgment Bias, Visual Reasoning, Counterfactual Responses, Perceptually Perturbed Judgment Dataset, GRPO-based Reward, Batch-ranking Objective, MLLM-as-a-Judge, Visual-Text Alignment, Human Evaluation
Authors
Seojeong Park, Jiho Choi, Junyong Kang, Seonho Lee, Jaeyo Shin, Hyunjung Shim
Abstract
Recent multimodal large language models have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical weakness: when visual evidence conflicts with textual cues, MLLM judges tend to reward plausible narratives over perceptually correct answers. We identify and systematically analyze this phenomenon, which we term Perceptual Judgment Bias. Using controlled visual perturbations, we show that existing multimodal judges frequently anchor on the response text rather than on their own visual perception, leading to inconsistent and non-verifiable evaluations. To address this issue, we introduce the Perceptually Perturbed Judgment Dataset, which constructs minimally edited counterfactual responses that isolate perceptual errors and enable verifiable supervision. Building on this dataset, we develop a unified training framework that combines a structured GRPO-based reward with a batch-ranking objective, achieving coherent global ordering without explicit pairwise labels. Experiments across diverse MLLM-as-a-Judge benchmarks show that our approach substantially improves perceptual fidelity, ranking coherence, and alignment with human evaluation. Our results establish a scalable and generalizable pathway for training multimodal judges that are perceptually grounded, interpretable, and robust to visual-reasoning conflicts.
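The abstract does not spell out the components of the structured reward. As a minimal sketch, assume the judge sees an (image, original response, minimally perturbed counterfactual) triple and must end its output with a line such as `Verdict: A`. Under that assumption, the snippet below shows how a verifiable reward on such pairs could feed GRPO's group-relative advantage, where each sampled judgment's reward is normalized by the mean and standard deviation of its sampling group. The verdict format, the reward weights in `structured_reward`, and all function names are hypothetical and are not the authors' implementation. The batch-ranking objective is not shown.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format: the judge must finish with "Verdict: A" or "Verdict: B".
VERDICT_RE = re.compile(r"Verdict:\s*([AB])\s*$")


def structured_reward(judge_output: str, correct: str) -> float:
    """Hypothetical structured reward for one sampled judgment.

    0.0 -> malformed output (no parsable verdict)
    0.1 -> well-formed but picks the perceptually perturbed response
    1.0 -> well-formed and picks the unperturbed (correct) response
    """
    match = VERDICT_RE.search(judge_output.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == correct else 0.1


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: center each reward on the group mean and
    scale by the group standard deviation (eps avoids division by zero)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled judgments for one item; "A" is the unperturbed response.
    outputs = [
        "The mug in the image is blue, as A states.\nVerdict: A",
        "B reads more fluently.\nVerdict: B",
        "Both look fine to me.",
        "Verdict: A",
    ]
    rewards = [structured_reward(o, correct="A") for o in outputs]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the counterfactual differs from the original only in a perceptual detail, the correct verdict is known by construction, so the reward can be checked automatically rather than estimated from human labels. If every sample in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.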