Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation

2026-04-13

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors developed a faster way to explain 3D medical image segmentation models using KernelSHAP, focusing only on important regions to save time. They improved efficiency by reusing previous calculations for parts of the image that don't change. They also tested different ways to group image features for explanations, comparing whole organs, simple supervoxels, and organ-aware supervoxels. Their experiments showed that while simple supervoxels perform well on technical metrics, organ-aware groupings provide more meaningful and clinically useful explanations, especially for spotting false positives.

KernelSHAP, 3D medical image segmentation, CT scans, nnU-Net, supervoxels, perturbation-based explainability, receptive field, false positive, patch logit caching, Dice score
Authors
Ricardo Coimbra Brioso, Giulio Sichili, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
Abstract
Perturbation-based explainability methods such as KernelSHAP provide model-agnostic attributions, but they are typically impractical for patch-based 3D medical image segmentation because of the large number of coalition evaluations and the high cost of sliding-window inference. We present an efficient KernelSHAP framework for volumetric CT segmentation that restricts computation to a user-defined region of interest and its receptive-field support. The framework accelerates inference via patch logit caching, which reuses baseline predictions for unaffected patches while preserving nnU-Net's fusion scheme. To enable clinically meaningful attributions, we compare three automatically generated feature abstractions within the receptive-field crop: whole-organ units, regular FCC supervoxels, and hybrid organ-aware supervoxels. We also study multiple aggregation/value functions that target either stabilizing evidence (TP/Dice/Soft Dice) or false-positive behavior. Experiments on whole-body CT segmentations show that caching reduces redundant computation, with computational savings ranging from 15% to 30%, and that faithfulness and interpretability exhibit clear trade-offs: regular supervoxels often maximize perturbation-based metrics but lack anatomical alignment, whereas organ-aware units yield more clinically interpretable explanations and are particularly effective for highlighting false-positive drivers under normalized metrics.
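The idea of patch logit caching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_model`, `patch_slices`, and `CachedSlidingWindow` are hypothetical names, the "network" is a simple stand-in function, and overlapping patches are fused by uniform averaging as a simplification rather than by nnU-Net's own fusion scheme. The sketch assumes each patch is no larger than the volume along every axis.

```python
import numpy as np


def patch_slices(shape, patch, stride):
    """Yield 3D slice tuples for a sliding window covering the whole volume."""
    starts_per_axis = []
    for n, p, s in zip(shape, patch, stride):
        starts = list(range(0, n - p + 1, s))
        if starts[-1] + p < n:  # add a final patch so the border is covered
            starts.append(n - p)
        starts_per_axis.append(starts)
    for z in starts_per_axis[0]:
        for y in starts_per_axis[1]:
            for x in starts_per_axis[2]:
                yield (slice(z, z + patch[0]),
                       slice(y, y + patch[1]),
                       slice(x, x + patch[2]))


def toy_model(patch):
    """Hypothetical stand-in for a segmentation network (one logit per voxel).

    The output depends on the whole patch, so a patch must be recomputed
    whenever any voxel inside it changes.
    """
    return patch - patch.mean()


class CachedSlidingWindow:
    """Sliding-window inference that reuses baseline logits for unchanged patches."""

    def __init__(self, volume, model, patch, stride):
        self.volume = volume
        self.model = model
        self.slices = list(patch_slices(volume.shape, patch, stride))
        # Baseline logits, computed once on the unperturbed volume.
        self.cache = [model(volume[s]) for s in self.slices]
        self.model_calls = 0  # counts forward passes made after caching

    def predict(self, perturbed):
        changed = perturbed != self.volume  # voxels touched by the coalition mask
        acc = np.zeros(self.volume.shape, dtype=float)
        cnt = np.zeros(self.volume.shape, dtype=float)
        for s, cached in zip(self.slices, self.cache):
            if changed[s].any():
                logits = self.model(perturbed[s])  # patch affected: recompute
                self.model_calls += 1
            else:
                logits = cached  # patch unaffected: reuse baseline logits
            acc[s] += logits
            cnt[s] += 1.0
        return acc / cnt  # uniform averaging of overlapping patches (simplification)
```

Each KernelSHAP coalition perturbs only some feature units, so only the patches that overlap a perturbed voxel need a new forward pass. How much is saved depends on how localized the perturbation is relative to the patch grid.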