Interpretable Aneurysm Classification via 3D Concept Bottleneck Models: Integrating Morphological and Hemodynamic Clinical Features

2026-03-08Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition
AI summary

The authors address the problem of using deep learning to identify brain aneurysms while keeping the model understandable by doctors. Instead of typical black-box AI, they use Concept Bottleneck Models that base predictions on clear clinical features related to aneurysm shape and blood flow. They built a 3D model analyzing brain scans that achieves about 93% accuracy and remains reliable with small variations in input. Their approach shows it's possible to have AI that is both accurate and transparent, which is important for medical trust and approval.

Intracranial aneurysmDeep learningConcept Bottleneck Models3D ResNet-343D DenseNet-121Computed Tomography Angiography (CTA)Explainable AI (XAI)Morphological featuresHemodynamic featuresTest-Time Augmentation (TTA)
Authors
Toqa Khaled, Ahmad Al-Kabbany
Abstract
We are concerned with the challenge of reliably classifying and assessing intracranial aneurysms using deep learning without compromising clinical transparency. While traditional black-box models achieve high predictive accuracy, their lack of inherent interpretability remains a significant barrier to clinical adoption and regulatory approval. Explainability is paramount in medical modeling to ensure that AI-driven diagnoses align with established neurosurgical principles. Unlike traditional eXplainable AI (XAI) methods -- such as saliency maps, which often provide post-hoc, non-causal visual correlations -- Concept Bottleneck Models (CBMs) offer a robust alternative by constraining the model's internal logic to human-understandable clinical indices. In this article, we propose an end-to-end 3D Concept Bottleneck framework that maps high-dimensional neuroimaging features to a discrete set of morphological and hemodynamic concepts for aneurysm identification. We implemented this pipeline using a pre-trained 3D ResNet-34 backbone and a 3D DenseNet-121 to extract features from CTA volumes, which were subsequently processed through a soft bottleneck layer representing human-interpretable clinical concepts. The model was optimized using a joint-loss function to balance diagnostic focal loss and concept mean squared error (MSE), validated via stratified five-fold cross-validation. Our results demonstrate a peak task classification accuracy of 93.33% +/- 4.5% for the ResNet-34 architecture and 91.43% +/- 5.8% for the DenseNet-121 model. Furthermore, the implementation of 8-pass Test-Time Augmentation (TTA) yielded a robust mean accuracy of 88.31%, ensuring diagnostic stability during inference. By maintaining an accuracy-generalization gap of less than 0.04, this framework proves that high predictive performance can be achieved without sacrificing interpretability.