Benchmarking Pathology Foundation Models for Breast Cancer Survival Prediction

2026-04-27 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning

AI summary

The authors benchmarked pathology foundation models (PFMs), pretrained encoders that extract features from tissue images, for predicting breast cancer survival from whole-slide histopathology images. Using three independent clinical cohorts comprising more than 5,400 patients, they trained survival models on one cohort and evaluated them on the other two to measure how well each PFM generalizes to unseen data. H-optimus-1 performed best overall, and second-generation PFMs outperformed their first-generation counterparts, but differences between many recent models were modest. Notably, the compact distilled model H0-mini slightly outperformed its larger teacher, H-optimus-0, while using fewer than 8% of the parameters. The results offer practical guidance for choosing efficient and reliable models for predicting patient outcomes in clinical workflows.

Pathology foundation models, Breast cancer, Survival prediction, Whole-slide histopathology images, Transfer learning, External validation, Feature extraction, Model generalization, Distilled models, Clinical cohorts

Authors

Fredrik K. Gustafsson, Constance Boissin, Johan Vallon-Christersson, David A. Clifton, Mattias Rantalainen

Abstract

Pathology foundation models (PFMs) have recently emerged as powerful pretrained encoders for computational pathology, enabling transfer learning across a wide range of downstream tasks. However, systematic comparisons of these models for clinically meaningful prediction problems remain limited, especially in the context of survival prediction under external validation. In this study, we benchmark widely used and recently proposed PFMs for breast cancer survival prediction from whole-slide histopathology images. Using a standardized pipeline based on patch-level feature extraction and a unified survival modeling framework, we evaluate model representations across three independent clinical cohorts comprising more than 5,400 patients with long-term follow-up. Models are trained on one cohort and evaluated on two independent external cohorts, enabling a rigorous assessment of cross-dataset generalization. Overall, H-optimus-1 achieves the strongest survival prediction performance. More broadly, we observe consistent generational improvements across model families, with second-generation PFMs outperforming their first-generation counterparts. However, absolute performance differences between many recent PFMs remain modest, suggesting diminishing returns from further scaling of pretraining data or model size alone. Notably, the compact distilled model H0-mini slightly outperforms its larger teacher model H-optimus-0, despite using fewer than 8% of the parameters and enabling significantly faster feature extraction. Together, these results provide the first large-scale, externally validated benchmark of PFMs for breast cancer survival prediction, and offer practical guidance for efficient deployment of PFMs in clinical workflows.
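
To make the notion of "survival prediction performance" concrete, here is a minimal sketch of Harrell's concordance index (C-index), a common way to score survival models. The abstract does not state which metric the benchmark uses, so the choice of metric is an assumption, and the function name and toy data are hypothetical. The C-index is the fraction of comparable patient pairs in which the patient with the earlier observed event was assigned the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- follow-up time per patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output, higher means higher predicted risk

    Illustrative O(n^2) implementation; not the benchmark's own code.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is only comparable if the shorter time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had higher risk: correct
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, one of them censored.
toy_times = [5, 8, 12, 3]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.4, 0.1, 0.7]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
print(toy_c_index)  # 4 of 5 comparable pairs are ranked correctly -> 0.8
```

In a train-on-one, evaluate-on-two protocol like the one described above, a function of this kind would be applied separately to the risk scores predicted for each external cohort, so that each cohort yields its own generalization estimate.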
