Bridging the gap between Performance and Interpretability: An Explainable Disentangled Multimodal Framework for Cancer Survival Prediction
2026-03-02 • Computer Vision and Pattern Recognition
AI summary
The authors developed DIMAFx, a tool that helps predict cancer patient survival by combining information from tissue images and gene data. Unlike previous methods, DIMAFx shows which parts of each data type are important and how they work together, making the process easier to understand. In breast cancer, they found that features shared between data types relate to tumor shape and gene activity, while unique features highlight tissue surroundings. The approach balances good prediction accuracy with clear explanations, aiding future medical use.
multimodal learning, cancer survival prediction, histopathology, transcriptomics, representation disentanglement, SHapley Additive exPlanations (SHAP), breast cancer, tumor microenvironment, estrogen response pathway, precision medicine
Authors
Aniek Eijpe, Soufyan Lakbir, Melis Erdal Cesur, Sara P. Oliveira, Angelos Chatzimparmpas, Sanne Abeln, Wilson Silva
Abstract
While multimodal survival prediction models are increasingly accurate, their complexity often reduces interpretability, limiting insight into how different data sources influence predictions. To address this, we introduce DIMAFx, an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomics data. Across multiple cancer cohorts, DIMAFx achieves state-of-the-art performance and improved representation disentanglement. Leveraging its interpretable design and SHapley Additive exPlanations, DIMAFx systematically reveals key multimodal interactions and the biological information encoded in the disentangled representations. In breast cancer survival prediction, the most predictive features contain modality-shared information, including one capturing solid tumor morphology contextualized primarily by the late estrogen response pathway, where higher-grade morphology aligned with pathway upregulation and increased risk, consistent with known breast cancer biology. Key modality-specific features capture microenvironmental signals from interacting adipose and stromal morphologies. These results show that multimodal models can overcome the traditional trade-off between performance and explainability, supporting their application in precision medicine.
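To make the attribution idea in the abstract concrete, the sketch below computes exact Shapley values for a toy linear risk head over three disentangled features. The feature names, weights, and the linear model are illustrative assumptions, not DIMAFx's actual architecture; they only demonstrate how SHAP-style attributions can separate the contribution of modality-shared versus modality-specific representations to a predicted risk.

```python
from itertools import combinations
from math import factorial

# Hypothetical stand-in for a survival risk head: a linear model over three
# disentangled features. Names and weights are illustrative assumptions.
WEIGHTS = {"shared_morphology": 0.8, "histology_specific": 0.3, "rna_specific": 0.5}

def risk(x):
    """Predicted risk score for a feature dict x."""
    return sum(WEIGHTS[k] * v for k, v in x.items())

def shapley_values(x, baseline):
    """Exact Shapley attribution of risk(x) relative to risk(baseline).

    Enumerates all coalitions of the other features; absent features are
    replaced by their baseline values (the standard SHAP convention).
    """
    feats = list(x)
    n = len(feats)
    phi = {}
    for i in feats:
        others = [f for f in feats if f != i]
        total = 0.0
        for r in range(n):
            for s in combinations(others, r):
                # Coalition: features in s take their observed value,
                # everything else (including i) takes the baseline.
                coalition = {f: (x[f] if f in s else baseline[f]) for f in feats}
                with_i = dict(coalition, **{i: x[i]})
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (risk(with_i) - risk(coalition))
        phi[i] = total
    return phi

x = {"shared_morphology": 1.2, "histology_specific": -0.4, "rna_specific": 0.7}
baseline = {k: 0.0 for k in x}
phi = shapley_values(x, baseline)
# For a linear model, each attribution reduces to weight * (x - baseline),
# and the attributions sum to risk(x) - risk(baseline).
```

In practice one would use the `shap` library's model-agnostic explainers rather than this exponential enumeration, but the brute-force version makes the coalition-weighting explicit for a handful of features.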