Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

2026-02-19 · Machine Learning

Machine Learning · Artificial Intelligence · Computation and Language · Computer Vision and Pattern Recognition
AI summary

The authors improved a method for tricking large vision-language models without knowing their internal details. They found that previous approaches suffered from unstable optimization because of how these models process parts of images. To fix this, they introduced techniques that average out noisy gradient signals and use better-matched target examples, making the attack more reliable. Their new method, M-Attack-V2, greatly increases success rates against several frontier models compared to the previous approach. This work helps better understand and test the weaknesses of large vision-language systems.

Black-box adversarial attacks · Large Vision-Language Models (LVLMs) · Gradient variance · Vision Transformers (ViT) · Local crop-level matching · Multi-Crop Alignment (MCA) · Auxiliary Target Alignment (ATA) · Patch Momentum · Transfer-based attacks · Optimization stability
Authors
Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen
Abstract
Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions. Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.
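To make the core idea of Multi-Crop Alignment concrete, the sketch below shows one gradient-denoised attack step that averages the feature-matching gradient over several independently sampled local views. It is a minimal illustration, not the authors' implementation: the surrogate `encoder`, the cosine-similarity loss, the crop sizes, and the step/budget parameters `alpha` and `eps` are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

def mca_step(encoder, x_adv, x_src, x_target, eps=16/255, alpha=1/255, num_crops=8):
    """One optimization step in the spirit of Multi-Crop Alignment (MCA):
    average the gradient over several independently sampled local views
    to reduce crop-induced gradient variance. The encoder, loss, and
    hyperparameters are illustrative assumptions, not the paper's exact setup."""
    crop = transforms.RandomResizedCrop(224, scale=(0.5, 1.0), antialias=True)

    x_adv = x_adv.clone().detach().requires_grad_(True)
    total_loss = 0.0
    for _ in range(num_crops):
        # Independently sampled local view of the adversarial image,
        # matched in feature space against a crop of the target image.
        feat_adv = encoder(crop(x_adv))
        with torch.no_grad():
            feat_tgt = encoder(crop(x_target))
        total_loss = total_loss + F.cosine_similarity(feat_adv, feat_tgt, dim=-1).mean()

    # Averaging per-crop losses averages their gradients as well,
    # smoothing the spike-like, nearly orthogonal single-crop gradients.
    (total_loss / num_crops).backward()
    grad = x_adv.grad

    # Ascend on similarity to the target, then project back into the
    # epsilon-ball around the clean source image.
    x_new = x_adv.detach() + alpha * grad.sign()
    x_new = torch.clamp(x_src + torch.clamp(x_new - x_src, -eps, eps), 0.0, 1.0)
    return x_new
```

Running this step in a loop over iterations would correspond to the variance-reduced source-side alignment; the paper's other modules (ATA, Patch Momentum, PE+) would layer on top of this same update.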