BAMI: Training-Free Bias Mitigation in GUI Grounding
2026-05-07 • Computer Vision and Pattern Recognition • Artificial Intelligence
AI summary
The authors studied a problem where computer programs need to accurately understand and interact with graphical user interfaces (GUIs), like clicking buttons or dragging items. They found that errors happen mostly because the images are very detailed and the interface elements are complicated. To fix this, they created a method called BAMI that helps the model focus better and choose the right options more carefully. Their tests showed that BAMI improved the accuracy of existing models without needing additional training. This means it helps programs understand GUIs more reliably in tricky situations.
GUI grounding, ScreenSpot-Pro benchmark, Masked Prediction Distribution (MPD), Bias-Aware Manipulation Inference (BAMI), coarse-to-fine focus, candidate selection, precision bias, ambiguity bias, TianXi-Action-7B, training-free setting
Authors
Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu
Abstract
GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model boosts its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Furthermore, ablation studies confirm the robustness of the BAMI approach across diverse parameter configurations, highlighting its stability and effectiveness. Code is available at https://github.com/Neur-IO/BAMI.
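The abstract names coarse-to-fine focus as one of BAMI's two manipulations for countering precision bias at high resolutions. A minimal sketch of the general two-pass idea follows: run the grounding model once on the full screenshot, crop a window around the coarse prediction, and run it again on the crop. The `predict_point` callable, the crop geometry, and `crop_frac` are illustrative assumptions, not the paper's actual interface or parameters.

```python
def coarse_to_fine(predict_point, img_w, img_h, instruction, crop_frac=0.25):
    """Two-pass grounding: coarse pass on the full screen, fine pass on a crop.

    `predict_point(box, instruction)` is a hypothetical model wrapper that
    returns an (x, y) click point relative to the top-left of
    `box = (x0, y0, w, h)`.
    """
    # Pass 1: coarse prediction on the full screenshot.
    cx, cy = predict_point((0, 0, img_w, img_h), instruction)

    # Center a crop window on the coarse point, clamped to the image bounds.
    cw = max(1, int(img_w * crop_frac))
    ch = max(1, int(img_h * crop_frac))
    x0 = min(max(cx - cw // 2, 0), img_w - cw)
    y0 = min(max(cy - ch // 2, 0), img_h - ch)

    # Pass 2: fine prediction inside the crop, mapped back to
    # full-image coordinates.
    fx, fy = predict_point((x0, y0, cw, ch), instruction)
    return x0 + fx, y0 + fy
```

Because the fine pass sees the target region at a much larger effective scale, small localization errors from the coarse pass can be corrected without any retraining, which matches the training-free setting the abstract describes.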