Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms
2026-06-03 • Machine Learning
Machine Learning • Computer Vision and Pattern Recognition
AI summary
The authors propose a new way to measure how sensitive deep learning models are to small changes in their inputs, without relying on specific attack methods. Their measure uses the spectral norm of a mathematical object called the Fisher Information Matrix (FIM) to capture the worst-case sensitivity of the model's output distribution. They provide theoretical results and practical algorithms to calculate this measure efficiently for popular neural network types. Their experiments show that this measure correlates strongly with vulnerability to actual attacks, making it a useful tool to understand and improve model robustness.
deep neural networks, robustness, Fisher Information Matrix, spectral norm, input perturbation, Jacobian, adversarial vulnerability, power iteration, attack-agnostic metric, model interpretability
Authors
Chong Zhang, Xiang Li, Jia Wang, Qiufeng Wang, Xiaobo Jin
Abstract
The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments across multiple datasets, including CIFAR, ImageNet, and medical images, and across multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insights into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.
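As a rough illustration of the power-iteration idea mentioned in the abstract, the sketch below estimates the largest eigenvalue of an input-space FIM using only matrix-vector products. It is not the authors' implementation (see their repository for that). It assumes a toy linear-softmax classifier with logits `W @ x`, for which the FIM with respect to the input is `W.T @ (diag(p) - p p^T) @ W`; the function names `fim_vector_product` and `fim_spectral_norm` are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fim_vector_product(W, x, v):
    """Return F v without forming F, where F = W^T (diag(p) - p p^T) W.

    Assumption: the model is a linear-softmax classifier, p = softmax(W x),
    so the Jacobian of the logits with respect to the input is W itself.
    """
    p = softmax(W @ x)
    u = W @ v                                # push v through the Jacobian
    return W.T @ (p * u - p * (p @ u))       # apply (diag(p) - p p^T), pull back

def fim_spectral_norm(W, x, iters=1000, seed=0):
    """Estimate the largest eigenvalue of F by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        Fv = fim_vector_product(W, x, v)
        lam = float(v @ Fv)                  # Rayleigh quotient for unit-norm v
        norm = np.linalg.norm(Fv)
        if norm == 0.0:                      # F v = 0: nothing left to iterate on
            break
        v = Fv / norm
    return lam

# Toy check against a dense eigendecomposition (8 input dims, 5 classes).
rng = np.random.default_rng(1)
W = rng.standard_normal((5, 8))
x = rng.standard_normal(8)
estimate = fim_spectral_norm(W, x)

p = softmax(W @ x)
F = W.T @ (np.diag(p) - np.outer(p, p)) @ W
exact = float(np.linalg.eigvalsh(F)[-1])
print(estimate, exact)
```

Because the FIM is symmetric positive semidefinite, its spectral norm equals its largest eigenvalue, which is what the Rayleigh quotient converges to. For a deep network the same loop would apply with `W @ v` and `W.T @ ...` replaced by Jacobian-vector and vector-Jacobian products from an autodiff library, so the full matrix never needs to be materialized.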