SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference
2026-02-25 • Machine Learning
Machine Learning, Hardware Architecture
AI summary
The authors discuss how deep neural networks (DNNs) often struggle on devices like phones because these devices have limited memory, energy, and processing power. They explain that using the same small data size (bitwidth) for all parts of the network can hurt accuracy or waste resources. To fix this, the authors propose SigmaQuant, a system that assigns different bit sizes to different layers in the network based on needs, adapting to hardware limits without requiring long trial-and-error searches. This approach aims to keep the model both accurate and efficient on various devices.
Deep Neural Networks, Quantization, Bitwidth, Heterogeneous Quantization, Edge Devices, Model Compression, Resource Constraints, Adaptive Algorithms, Latency, Energy Efficiency
Authors
Qunyou Liu, Pengbo Yu, Marina Zapater, David Atienza
Abstract
Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward way to compress models and reduce hardware requirements, it fails to fully leverage the varying robustness across layers and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require an extensive brute-force design-space search or lack the adaptability to meet different hardware conditions, such as memory size, energy budget, and latency requirements. To fill these gaps, this work introduces SigmaQuant, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage for varied edge environments without exhaustive search.
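To make the contrast between uniform and heterogeneous quantization concrete, the following minimal sketch compares the weight-storage footprint of a toy network under a uniform 8-bit assignment and under a per-layer (mixed) assignment. It is not the SigmaQuant algorithm: the layer names, parameter counts, and bitwidth choices are all hypothetical values chosen for illustration, and the sketch covers memory only, not accuracy, energy, or latency.

```python
# Hypothetical per-layer parameter counts (illustrative, not from the paper).
LAYER_PARAMS = {"conv1": 1_728, "conv2": 36_864, "conv3": 73_728, "fc": 5_120}


def model_size_bytes(bitwidths):
    """Return weight storage in bytes when each layer uses its own bitwidth."""
    total_bits = sum(LAYER_PARAMS[name] * bits for name, bits in bitwidths.items())
    return total_bits / 8


# Uniform quantization: every layer gets the same bitwidth.
uniform_8 = {name: 8 for name in LAYER_PARAMS}

# Heterogeneous quantization: an arbitrary illustrative per-layer assignment
# (assumed here; a real method would choose these from accuracy and hardware
# constraints).
mixed = {"conv1": 8, "conv2": 4, "conv3": 2, "fc": 8}

size_uniform = model_size_bytes(uniform_8)
size_mixed = model_size_bytes(mixed)

print(f"uniform 8-bit: {size_uniform:.0f} bytes")
print(f"mixed bitwidths: {size_mixed:.0f} bytes")
```

In this toy setup the mixed assignment stores the weights in 43,712 bytes instead of 117,440, because the two largest layers carry most of the parameters and receive the lowest bitwidths. Whether such an assignment preserves accuracy is exactly the question a layer-wise method has to answer.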