Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models
2026-06-26 • Computation and Language
AI summary
The authors studied how vision-language models decide between what they see and what they "know" when the two conflict. Using activation patching and ablation studies, they located specific model components, especially a small set of attention heads, that draw on stored knowledge such as facts about objects (for example, that strawberries are red). Removing these heads made the models rely more on visual input, showing that these components are key to overriding visual evidence. The pattern held across model families and sizes, pointing to a small set of components that control how the models resolve conflicts between what they see and what they have memorized.
vision-language models, attention heads, activation patching, residual stream, mechanistic analysis, model ablation, visual grounding, prior knowledge grounding, causal structure, multimodal systems
Authors
Niclas Lietzow, Danielle Bitterman, Carsten Eickhoff, William Rudman, Michal Golovanevsky
Abstract
Vision-language models must reconcile visual evidence with memorized world knowledge when the two conflict. How they resolve this conflict shapes the reliability of multimodal systems, yet prior work characterizes it behaviorally without a component-level causal account. We combine activation patching across three granularities (residual stream, attention heads, and MLP sublayers) with model-component ablation studies and mechanistic analysis. Across three VLM families, we find that visual grounding emerges by default, whereas prior-knowledge grounding depends on a small set of causally necessary attention heads (2.5-4.8% of attention heads) concentrated in the second half of the network. These heads enable answers from stored world knowledge (e.g., "red" for a strawberry) despite conflicting visual input. Ablating them flips predictions from knowledge-grounded to visually grounded answers in 68-96% of cases under prior-knowledge prompts, but changes only 0.8-7.5% of visually grounded predictions, establishing an asymmetric causal structure. The identified heads decompose into routing heads, which modulate information flow, and writing heads, which directly project answer tokens into the residual stream. This structure is consistent across model families and scales, revealing a sparse causal circuit underlying perception-knowledge conflict in VLMs.
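The asymmetric effect described above comes down to two conditional rates. The first is how often ablation turns a knowledge-grounded prediction into the visually grounded answer. The second is how often ablation changes a prediction that was already visually grounded. The sketch below shows one way to compute both rates. It is not the authors' code. The `Item` record layout, the `flip_rate` function, and the toy answers are all hypothetical. It assumes each conflict example stores the knowledge-implied answer, the image-supported answer, and the model's prediction before and after the candidate heads are ablated.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Item:
    """One perception-knowledge conflict example (hypothetical layout)."""
    prior_answer: str   # answer implied by world knowledge, e.g. "red" for a strawberry
    visual_answer: str  # answer supported by the image
    pred_clean: str     # prediction with all heads intact
    pred_ablated: str   # prediction after ablating the candidate heads


def flip_rate(items: Iterable[Item], source: str, target: Optional[str] = None) -> float:
    """Share of `source`-grounded clean predictions that ablation alters.

    source: "prior" or "visual"; the clean prediction must equal that answer.
    target: if "prior" or "visual", count only items whose ablated prediction
            equals that answer; if None, count any change in the prediction.
    """
    valid = ("prior", "visual")
    if source not in valid or (target is not None and target not in valid):
        raise ValueError("source/target must be 'prior' or 'visual'")
    # Keep only items whose clean prediction is grounded in `source`.
    eligible = [it for it in items if it.pred_clean == getattr(it, f"{source}_answer")]
    if not eligible:
        return 0.0
    if target is None:
        flipped = [it for it in eligible if it.pred_ablated != it.pred_clean]
    else:
        flipped = [it for it in eligible if it.pred_ablated == getattr(it, f"{target}_answer")]
    return len(flipped) / len(eligible)


# Toy values for illustration only; these are not data from the paper.
items = (
    [Item("red", "blue", "red", "blue")] * 3           # prior-grounded, flips to visual
    + [Item("red", "blue", "red", "red")]              # prior-grounded, unchanged
    + [Item("yellow", "purple", "purple", "purple")] * 4  # visually grounded, unchanged
)
print(flip_rate(items, "prior", "visual"))  # 0.75
print(flip_rate(items, "visual"))           # 0.0
```

The visual-grounded rate counts any change, not only flips to the prior answer. This matches the abstract's wording that ablation "changes" visually grounded predictions.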