Ω-QVLA: Robust Quantization for Vision-Language-Action Models via Composite Rotation and Per-step Scaling
2026-05-27 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Machine Learning
AI summary
The authors present Omega-QVLA, a training-free post-training quantization method that shrinks large Vision-Language-Action (VLA) models so they can run efficiently on-device. Unlike earlier methods that compressed only the language backbone or relied on mixed precision, Omega-QVLA quantizes both the language backbone and the diffusion action head to uniform W4A4 precision. To preserve accuracy, it combines a composite SVD-Hadamard rotation with per-step activation scaling across denoising steps. On LIBERO, the quantized Pi 0.5 and GR00T N1.5 models match or exceed their FP16 references while reducing the static memory footprint by 71.3%, and real-world manipulation experiments show smooth, accurate behavior where prior methods fail.
Vision-Language-Action models, model quantization, post-training quantization, diffusion models, SVD-Hadamard rotation, activation scaling, W4A4 precision, memory footprint, on-device deployment
Authors
Xinyu Wang, Mingze Li, Sicheng Lyu, Dongxiu Liu, Kaicheng Yang, Ziyu Zhao, Yufei Cui, Xiao-Wen Chang, Peng Lu
Abstract
Vision-Language-Action (VLA) models unify perception, reasoning, and control within a single policy, yet their multi-billion-parameter backbones and diffusion-based action heads make on-device deployment prohibitively expensive. Prior quantization efforts offer only partial solutions: they compress the LLM backbone while leaving the DiT action head at full precision, or resort to mixed-precision schemes, driven by the belief that uniformly quantizing the action head is inherently unstable. We challenge this assumption with Omega-QVLA, the first training-free post-training quantization framework that compresses both the language backbone and the entire diffusion action head of a VLA model to a uniform W4A4 precision, eliminating the need for mixed-precision allocation. Omega-QVLA combines two components: a composite SVD-Hadamard rotation, which equalizes per-channel weight energy while diffusing residual activation outliers, and per-step DiT activation scaling quantization, which absorbs dynamic-range drift across denoising steps. On LIBERO, Omega-QVLA compresses Pi 0.5 and GR00T N1.5 to W4A4 with 98.0% and 87.8% task success rates, matching or exceeding their FP16 references of 97.1% and 87.0%, while reducing the static memory footprint by 71.3%. Real-world manipulation experiments further confirm smooth, accurate manipulation where prior methods fail. Code is available at https://github.com/UCMP13753/Omega-QVLA.
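As a rough illustration of the two ingredients named in the abstract, the pure-Python sketch below shows (1) how an orthonormal Hadamard rotation spreads a single activation outlier across all channels, lowering the peak magnitude a 4-bit quantizer must cover, and (2) how choosing a separate activation scale per denoising step tracks dynamic-range drift better than one static scale. It is not the authors' implementation: the SVD component of the composite rotation is omitted, the absmax scaling rule is an assumption, and the names `hadamard_rotate`, `fake_quant_int4`, `absmax_scale`, and all toy data are hypothetical.

```python
import math


def hadamard_rotate(x):
    """Orthonormal Walsh-Hadamard transform of a list (length must be a power of two).

    With this normalization the transform is its own inverse.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]


def absmax_scale(x):
    """Assumed scaling rule: map the largest magnitude onto int4 level 7."""
    return max(abs(v) for v in x) / 7 or 1.0


def fake_quant_int4(x, scale):
    """Symmetric 4-bit fake quantization: round, clip to [-8, 7], rescale."""
    return [max(-8, min(7, round(v / scale))) * scale for v in x]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# (1) Rotation: one outlier channel dominates the dynamic range.
x = [0.1, -0.2, 0.15, 8.0, -0.1, 0.05, 0.2, -0.15]
rotated = hadamard_rotate(x)
peak_before = max(abs(v) for v in x)
peak_after = max(abs(v) for v in rotated)

# (2) Per-step scaling: toy activations whose range shrinks over denoising steps.
step_acts = [
    [4.0, -3.0, 2.0, 1.0],
    [0.4, -0.3, 0.2, 0.1],
    [0.04, -0.03, 0.02, 0.01],
]
# One static scale calibrated over every step.
static_scale = absmax_scale([v for step in step_acts for v in step])
static_err = sum(mse(s, fake_quant_int4(s, static_scale)) for s in step_acts)
# One scale per denoising step.
per_step_err = sum(
    mse(s, fake_quant_int4(s, absmax_scale(s))) for s in step_acts
)

print("peak magnitude before/after rotation:", peak_before, peak_after)
print("quantization error, static vs per-step:", static_err, per_step_err)
```

Because the rotation is orthonormal, it preserves the vector's energy while reducing its peak, which is the sense in which outliers are "diffused". The released code at the URL above remains the reference for the actual method.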