SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
2026-03-05 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors address how to estimate the differing effects that treatments have on survival when some patient outcomes are only partially observed because follow-up ended before the event occurred (censoring). They note that existing methods for estimating these effects lack a consistent way to be tested and compared. To address this, they introduce SurvHTE-Bench, a collection of datasets and tools spanning synthetic, semi-synthetic, and real-world data for evaluating survival treatment effect methods. The benchmark is intended to let future methods be assessed more reliably and compared on equal footing.
Keywords: heterogeneous treatment effects, right-censored survival data, causal inference, survival analysis, counterfactual outcomes, synthetic datasets, semi-synthetic datasets, benchmarking, precision medicine, treatment effect estimation
Authors
Shahriar Noroozizadeh, Xiaobin Shen, Jeremy C. Weiss, George H. Chen
Abstract
Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine and individualized policy-making. Yet, the survival analysis setting poses unique challenges for HTE estimation due to censoring, unobserved counterfactuals, and complex identification assumptions. Despite recent advances, from Causal Survival Forests to survival meta-learners and outcome imputation approaches, evaluation practices remain fragmented and inconsistent. We introduce SurvHTE-Bench, the first comprehensive benchmark for HTE estimation with censored outcomes. The benchmark spans (i) a modular suite of synthetic datasets with known ground truth, systematically varying causal assumptions and survival dynamics, (ii) semi-synthetic datasets that pair real-world covariates with simulated treatments and outcomes, and (iii) real-world datasets from a twin study (with known ground truth) and from an HIV clinical trial. Across synthetic, semi-synthetic, and real-world settings, we provide the first rigorous comparison of survival HTE methods under diverse conditions and realistic assumption violations. SurvHTE-Bench establishes a foundation for fair, reproducible, and extensible evaluation of causal survival methods. The data and code of our benchmark are available at: https://github.com/Shahriarnz14/SurvHTE-Bench .
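To make the idea of a synthetic dataset with known ground truth concrete, the following is a minimal sketch and not the benchmark's actual data-generating process. It assumes a single covariate, a randomized binary treatment, exponential event times, independent exponential censoring, and the difference in mean survival time as the target effect. The function name `simulate_censored_data` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_censored_data(n=1000, censor_rate=0.5, seed=0):
    """Simulate right-censored survival data with a known per-unit treatment effect.

    Illustrative assumptions (not taken from SurvHTE-Bench):
      - x ~ Uniform(-1, 1) is the only covariate
      - treatment w is assigned by a fair coin flip
      - event time is exponential with rate exp(0.5*x - w*(0.5 + 0.25*x))
      - censoring time is exponential with rate `censor_rate`, independent of x and w
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        w = 1 if rng.random() < 0.5 else 0

        # Hazard rates under control and under treatment for this unit.
        rate0 = math.exp(0.5 * x)
        rate1 = math.exp(0.5 * x - (0.5 + 0.25 * x))

        # Ground-truth effect: difference in mean survival time (mean = 1 / rate).
        true_cate = 1.0 / rate1 - 1.0 / rate0

        # Only the factual outcome is generated; the counterfactual stays unobserved.
        event_time = rng.expovariate(rate1 if w == 1 else rate0)
        censor_time = rng.expovariate(censor_rate)

        rows.append({
            "x": x,
            "w": w,
            "time": min(event_time, censor_time),        # observed follow-up time
            "event": 1 if event_time <= censor_time else 0,  # 0 means censored
            "true_cate": true_cate,
        })
    return rows


if __name__ == "__main__":
    data = simulate_censored_data()
    censored = sum(1 - r["event"] for r in data) / len(data)
    print(f"fraction censored: {censored:.2f}")
```

Because `true_cate` is stored alongside each unit, an estimator fit only on `x`, `w`, `time`, and `event` could be scored against it, which is the kind of evaluation that is not possible on purely observational data.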