Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation

2026-05-06

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors address the challenge of analyzing whole-slide images (WSIs) of tissue samples, which are very large and must be split into thousands of smaller patches. They point out that most previous methods embed these patches in flat (Euclidean) space only, which misses the hierarchical organization and regional heterogeneity of tissue. To improve on this, they build a framework called BatMIL that represents patch features in both curved (hyperbolic) and flat (Euclidean) spaces, so that hierarchical tissue structure and local morphological detail are modeled together. They also use a structured state space sequence model (S4) to process thousands of patches with linear computational complexity, and a chunk-level mixture-of-experts module that groups patches into regions and routes each region to a specialized subnetwork. Their experiments on seven WSI datasets spanning six cancer types show that BatMIL outperforms existing MIL methods in slide-level classification.

Whole-slide images (WSI), Multiple Instance Learning (MIL), Hyperbolic geometry, Euclidean space, Histopathology, Structured state space sequence model (S4), Mixture-of-experts (MoE), Patch embedding, Tissue heterogeneity, Slide-level classification
Authors
Enhui Chai, Sicheng Chen, Tianyi Zhang, Chad Wong, Kecheng Huang, Zeyu Liu, Fei Xia
Abstract
Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch representations in homogeneous Euclidean spaces, overlooking the hierarchical organization and regional heterogeneity of pathological tissues. This limits current models' ability to capture global tissue architecture and fine-grained cellular morphology. To address this limitation, we introduce a hybrid hyperbolic-Euclidean representation that embeds WSI features in dual geometric spaces, enabling complementary modeling of hierarchical tissue structures and local morphological details. Building on this formulation, we develop BatMIL, a WSI classification framework that leverages both geometric spaces. To model long-range dependencies among thousands of patches, we employ a structured state space sequence model (S4) backbone that encodes patch sequences with linear computational complexity. Furthermore, to account for regional heterogeneity, we introduce a chunk-level mixture-of-experts (MoE) module that groups patches into regions and dynamically routes them to specialized subnetworks, improving representational capacity while reducing redundant computation. Extensive experiments on seven WSI datasets spanning six cancer types demonstrate that BatMIL consistently outperforms state-of-the-art MIL approaches in slide-level classification tasks. These results indicate that geometry-aware representation learning offers a promising direction for next-generation computational pathology.
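
To make two of the ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a Poincaré ball model with curvature -c for the hyperbolic branch and top-1 routing over mean-pooled contiguous chunks for the MoE gate; the abstract specifies neither choice. The names `expmap0`, `poincare_distance`, `route_chunks`, and the linear gate `gate_weights` are hypothetical and introduced only for illustration.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean feature vector into the Poincare ball (curvature -c).

    This is the exponential map at the origin; the result always has
    norm below 1/sqrt(c). Assumed here as the hyperbolic embedding step.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the ball."""
    def sq_norm(u):
        return sum(a * a for a in u)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


def route_chunks(patches, gate_weights, chunk_size):
    """Pick one expert per contiguous chunk of patches (top-1 routing).

    patches:      list of feature vectors, one per patch
    gate_weights: one weight row per expert (hypothetical linear gate)
    Returns the chosen expert index for each chunk.
    """
    assignments = []
    for start in range(0, len(patches), chunk_size):
        chunk = patches[start:start + chunk_size]
        dim = len(chunk[0])
        # Summarize the chunk by the mean of its patch features.
        mean = [sum(p[d] for p in chunk) / len(chunk) for d in range(dim)]
        # Score every expert with a dot product against the chunk summary.
        scores = [sum(w * m for w, m in zip(row, mean)) for row in gate_weights]
        # Highest score wins; ties resolve to the lowest expert index.
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments
```

Under these assumptions, a feature of norm 0.5 mapped by `expmap0` with c = 1 lies at hyperbolic distance of about 1.0 from the origin, because this model gives d(0, exp_0(v)) = 2‖v‖. Routing whole chunks rather than single patches means the gate is evaluated once per region, which is one plausible reading of how a chunk-level MoE reduces redundant computation.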