ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution
2026-02-17 • Computation and Language
AI summary
The authors studied how well multimodal large language models (mLLMs) can answer questions using tables and also show exactly where each answer comes from in the data (which rows and columns). They found that although these models often answer questions reasonably well, they are much worse at pointing to the exact evidence, especially with JSON data. The models were better at citing rows than columns, and they had more trouble with text-based tables than with table images. Results also varied across model families, but overall the authors conclude that current mLLMs are not reliable at clear, fine-grained citation of structured data.
multimodal large language models, structured data, table formats, JSON, evidence attribution, question answering, prompting strategies, transparency, traceability
Authors
Yahia Alqurnawi, Preetom Biswas, Anmol Rao, Tejas Anvekar, Chitta Baral, Vivek Gupta
Abstract
Multimodal Large Language Models (mLLMs) are often used to answer questions over structured data, such as tables given as Markdown, JSON, or images. While these models can often give correct answers, users also need to know where those answers come from. In this work, we study structured data attribution (citation): the ability of a model to point to the specific rows and columns that support an answer. We evaluate several mLLMs across different table formats and prompting strategies. Our results show a clear gap between question answering and evidence attribution. Although question answering accuracy remains moderate, attribution accuracy is much lower across all models, and near random for JSON inputs. We also find that models are more reliable at citing rows than columns, and struggle more with textual formats than with images. Finally, we observe notable differences across model families. Overall, our findings show that current mLLMs are unreliable at providing fine-grained, trustworthy attribution for structured data, which limits their use in applications requiring transparency and traceability.
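The row-versus-column comparison above depends on scoring row citations and column citations separately. The sketch below shows one way such scoring could work, and it is only an assumption: the abstract does not define the metric. It uses exact set matching per question. The `Attribution` record and the `row_acc`, `col_acc`, and `cell_acc` names are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Evidence cited for one answer (hypothetical schema, not from the paper)."""
    rows: frozenset[int]   # indices of the cited table rows
    cols: frozenset[str]   # names of the cited table columns


def exact_match(pred: frozenset, gold: frozenset) -> bool:
    # Assumed criterion: correct only if every gold item is cited and nothing extra is.
    return pred == gold


def attribution_accuracy(preds: list[Attribution],
                         golds: list[Attribution]) -> dict[str, float]:
    """Return row-level, column-level, and joint exact-match accuracy."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    n = len(golds)
    pairs = list(zip(preds, golds))
    row_hits = sum(exact_match(p.rows, g.rows) for p, g in pairs)
    col_hits = sum(exact_match(p.cols, g.cols) for p, g in pairs)
    # Joint score: both the cited rows and the cited columns must be exactly right.
    both_hits = sum(exact_match(p.rows, g.rows) and exact_match(p.cols, g.cols)
                    for p, g in pairs)
    return {"row_acc": row_hits / n, "col_acc": col_hits / n, "cell_acc": both_hits / n}


if __name__ == "__main__":
    # Toy example: the model cites the right rows both times but the wrong column once.
    gold = [Attribution(frozenset({2}), frozenset({"Population"})),
            Attribution(frozenset({0, 3}), frozenset({"Year"}))]
    pred = [Attribution(frozenset({2}), frozenset({"City"})),
            Attribution(frozenset({0, 3}), frozenset({"Year"}))]
    print(attribution_accuracy(pred, gold))
    # → {'row_acc': 1.0, 'col_acc': 0.5, 'cell_acc': 0.5}
```

Scoring rows and columns independently lets a model that finds the right record but cites the wrong field receive partial credit on `row_acc` and none on `col_acc`. A row-over-column gap like the one the abstract reports would appear in exactly that pattern. A softer alternative would be set-overlap (precision/recall) scoring, which this sketch does not implement.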