Cross-Modal Knowledge Distillation from Spatial Transcriptomics to Histology

2026-04-10 · Computer Vision and Pattern Recognition

AI summary

The authors study tissue niches, which are regions where particular types of cells group together and which matter for both biology and medicine. Spatial transcriptomics, which measures gene activity at each location in a tissue, can reveal these niches, but it is expensive and not widely available. To work around this, the authors train a computer model to recognize niches from routine tissue images (H&E slides) by learning from samples where both transcriptomics and images are available. Their model matches the transcriptomics-derived niches more closely than image-only methods that group tissue regions without this guidance. Once trained, the model needs only tissue images, which makes niche identification more accessible.

spatial transcriptomics, tissue niches, H&E histology, cross-modal distillation, cell-type composition, unsupervised learning, morphology, neighborhood composition, biological tissue, model distillation
Authors
Arbel Hizmi, Artemii Bakulin, Shai Bagon, Nir Yosef
Abstract
Spatial transcriptomics provides a molecularly rich description of tissue organization, enabling unsupervised discovery of tissue niches: spatially coherent regions of distinct cell-type composition and function that are relevant to both biological research and clinical interpretation. However, spatial transcriptomics remains costly and scarce, whereas H&E histology is abundant but carries a less granular signal. We propose to leverage paired spatial transcriptomics and H&E data to transfer transcriptomics-derived niche structure to a histology-only model via cross-modal distillation. Across multiple tissue types and disease contexts, the distilled model achieves substantially higher agreement with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features, and it recovers biologically meaningful neighborhood composition, as confirmed by cell-type analysis. The resulting framework requires paired spatial transcriptomic and H&E data only during training and can then be applied to held-out tissue regions using histology alone, with no transcriptomic input at inference time.
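The abstract does not spell out the distillation mechanics, so the following is only a hypothetical, minimal NumPy sketch of the general teacher/student pattern it describes, not the authors' method. Every concrete choice here is an assumption: each spot is assumed to carry a per-spot cell-type composition and a precomputed image-feature vector; the teacher defines niches by k-means on neighborhood-averaged composition and emits soft assignments; the student is softmax regression on image features trained with cross-entropy against those soft targets; the morphology baseline is k-means on the same image features; and agreement is scored with the adjusted Rand index (ARI). The data are synthetic, and helper names such as `neighborhood_composition`, `teacher_soft_targets`, and `train_student` are illustrative only.

```python
# Hypothetical sketch of cross-modal distillation; not the authors' implementation.
import numpy as np


def neighborhood_composition(coords, spot_comp, radius):
    """Mean cell-type composition over all spots within `radius` (the spot itself included)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    adj = (d2 <= radius ** 2).astype(float)
    return adj @ spot_comp / adj.sum(1, keepdims=True)


def kmeans(x, k, n_iter=50, n_init=5, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia restart."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = [x[rng.integers(len(x))]]
        while len(centers) < k:  # k-means++: sample points far from existing centers
            d2 = ((x[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
            centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
        centers = np.array(centers, dtype=float)
        for _step in range(n_iter):
            labels = ((x[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if np.any(labels == j):  # leave an empty cluster's center in place
                    centers[j] = x[labels == j].mean(0)
        inertia = ((x - centers[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


def softmax(z):
    e = np.exp(z - z.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)


def teacher_soft_targets(x, centers, tau=0.05):
    """Soft niche assignments from squared distances to the teacher's cluster centers."""
    return softmax(-((x[:, None] - centers[None]) ** 2).sum(-1) / tau)


def train_student(feats, targets, lr=0.5, n_steps=500):
    """Softmax regression from image features to the teacher's soft niche targets."""
    add_bias = lambda f: np.hstack([f, np.ones((len(f), 1))])
    x = add_bias(feats)
    w = np.zeros((x.shape[1], targets.shape[1]))
    for _ in range(n_steps):
        w -= lr * x.T @ (softmax(x @ w) - targets) / len(x)  # mean cross-entropy gradient
    return lambda f: softmax(add_bias(f) @ w)


def adjusted_rand_index(a, b):
    """ARI between two hard labelings (undefined when both are a single cluster)."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    pairs = lambda m: m * (m - 1) / 2
    s_ij, s_a, s_b = pairs(table).sum(), pairs(table.sum(1)).sum(), pairs(table.sum(0)).sum()
    expected = s_a * s_b / pairs(len(a))
    return (s_ij - expected) / ((s_a + s_b) / 2 - expected)


# Synthetic tissue (hypothetical): four quadrant niches, six cell types, 4-D image features.
rng = np.random.default_rng(0)
n, k, n_types = 400, 4, 6
coords = rng.uniform(0, 1, size=(n, 2))
true_niche = 2 * (coords[:, 0] > 0.5) + (coords[:, 1] > 0.5)
profiles = rng.dirichlet(np.ones(n_types), size=k)  # per-niche cell-type mix
spot_comp = np.array([rng.dirichlet(50 * profiles[t] + 0.1) for t in true_niche])
signal = rng.normal(0, 1.5, size=(k, 3))[true_niche] + rng.normal(0, 0.5, size=(n, 3))
stain = rng.choice([-4.0, 4.0], size=(n, 1))  # nuisance unrelated to niches (e.g. stain batch)
feats = np.hstack([signal, stain])
feats = (feats - feats.mean(0)) / feats.std(0)

# Teacher: niches from transcriptomics-derived neighborhood composition.
comp = neighborhood_composition(coords, spot_comp, radius=0.1)
teacher_labels, centers = kmeans(comp, k)
targets = teacher_soft_targets(comp, centers)

# Student: fit on one region, then applied to a held-out region from image features only.
train = coords[:, 0] < 0.75
student = train_student(feats[train], targets[train])
student_labels = student(feats[~train]).argmax(1)

# Baseline: unsupervised k-means on the same image features.
baseline_labels, _ = kmeans(feats, k)

# Agreement with teacher niches on the held-out region (paired data used for scoring only).
ari_student = adjusted_rand_index(teacher_labels[~train], student_labels)
ari_baseline = adjusted_rand_index(teacher_labels[~train], baseline_labels[~train])
print(f"held-out ARI vs. teacher niches: student={ari_student:.2f}, baseline={ari_baseline:.2f}")
```

In this sketch the transcriptomic composition is used only to build the student's training targets and, for evaluation, to score the held-out region; the student's predictions there read image features alone. Soft teacher assignments are one common distillation choice; hard teacher labels would fit the same cross-entropy loss. Actual ARI values depend on the synthetic data and are not meant to reproduce any result from the paper.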