Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference
2026-02-27 • Machine Learning
Machine Learning
AI summaryⓘ
The authors study how to design experiments that choose combinations of options to test, balancing two goals: quickly picking good options (low regret) and accurately figuring out which options are best (high statistical power). They show these goals conflict and use a concept called Pareto optimality to find the best trade-offs. They create two algorithms that work under different feedback types and prove these methods efficiently balance the trade-off. Their results also show having more detailed feedback helps make better decisions. Overall, the authors provide a solid approach for experimenting when you want to both learn and perform well.
Combinatorial Multi-Armed BanditsRegret MinimizationStatistical PowerPareto OptimalityFull-Bandit FeedbackSemi-Bandit FeedbackAdaptive Experimental DesignMixCombKL AlgorithmMixCombUCB Algorithm
Authors
Hongrui Xie, Junyu Cao, Kan Xu
Abstract
In this paper, we provide the first investigation into adaptive combinatorial experimental design, focusing on the trade-off between regret minimization and statistical power in combinatorial multi-armed bandits (CMAB). While minimizing regret requires repeated exploitation of high-reward arms, accurate inference on reward gaps requires sufficient exploration of suboptimal actions. We formalize this trade-off through the concept of Pareto optimality and establish equivalent conditions for Pareto-efficient learning in CMAB. We consider two relevant cases under different information structures, i.e., full-bandit feedback and semi-bandit feedback, and propose two algorithms MixCombKL and MixCombUCB respectively for these two cases. We provide theoretical guarantees showing that both algorithms are Pareto optimal, achieving finite-time guarantees on both regret and estimation error of arm gaps. Our results further reveal that richer feedback significantly tightens the attainable Pareto frontier, with the primary gains arising from improved estimation accuracy under our proposed methods. Taken together, these findings establish a principled framework for adaptive combinatorial experimentation in multi-objective decision-making.