Online Scalarization in Vector-Valued Games

2026-05-07

Computer Science and Game Theory
AI summary

The authors study games in which each player receives a vector of payoffs and combines it into a single score through a method called scalarization. Unlike past work, where this combination rule is fixed in advance, here it changes over time as part of learning. They propose a two-level learning system: an outer part slowly chooses how to combine the payoff values, while an inner part quickly picks actions based on that choice. This adaptive approach steers the game toward a preferred outcome more reliably than any fixed combination, and the authors provide algorithms with theoretical guarantees that performance improves over time.
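
To make the two-timescale structure concrete, here is a minimal Python sketch. The candidate weight vectors, the exponential-weights rule for the slow outer choice, the random stand-in payoffs, and the credit-assignment step (importance-weighted average induced score) are all illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite candidate class of linear scalarizations (weight vectors).
candidates = [np.array([0.8, 0.2]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
log_wts = np.zeros(len(candidates))       # outer exponential-weights state
eta_outer, T_outer, T_inner = 0.1, 50, 200

for k in range(T_outer):                  # slow timescale: deploy a scalarization
    probs = np.exp(log_wts - log_wts.max())
    probs /= probs.sum()
    j = rng.choice(len(candidates), p=probs)
    w = candidates[j]
    avg_score = 0.0
    for t in range(T_inner):              # fast timescale: inner learner plays
        payoff_vec = rng.random(2)        # stand-in for the game's vector payoff
        avg_score += (w @ payoff_vec) / T_inner   # scalar feedback per round
    # Importance-weighted credit to the deployed scalarization (a simplification).
    log_wts[j] += eta_outer * avg_score / probs[j]
```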

vector-valued games, scalarization, online learning, bandit algorithms, no-regret learning, online mirror descent, regret bounds, equilibrium convergence, multi-player games, algorithmic game theory
Authors
Ehsan Asadollahi, Calvin Hawkins, Matthew Hale
Abstract
We study repeated multi-player vector-valued games in which a player observes a payoff vector each round and evaluates outcomes through linear scalarizations of those vectors. Unlike most prior work, the choice of scalarization is treated as an online decision variable rather than a fixed modeling decision. We propose a bi-level learning framework in which an outer learner chooses a scalarization from a finite candidate class on a slow timescale, while a faster inner bandit no-regret learner selects actions using the scalar feedback induced by the chosen scalarization. Performance is defined with respect to an underlying true weight vector, and the deployed scalarizations act as control signals that shape the induced payoff trajectory. We provide implementable algorithms based on bandit online mirror descent with stabilized importance weighting, and we derive finite-time performance guarantees in the form of sublinear regret bounds. Experiments on a vector-valued extension of a canonical game show that convergence to the preferred equilibrium rises from roughly $50\%$ under non-adaptive scalarization to about $80\%$ under our proposed method.
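
The inner learner can be sketched as bandit online mirror descent on the probability simplex with the negative-entropy mirror map, which yields multiplicative-weights updates. For the "stabilized importance weighting," the sketch below uses an implicit-exploration-style estimator that divides the observed loss by $p_a + \gamma$ rather than $p_a$, one common stabilization that caps the estimator's variance; the paper's exact estimator may differ, and all parameter values here are illustrative:

```python
import numpy as np

def bandit_omd(scalar_loss, n_actions, T, eta=0.05, gamma=0.05, seed=0):
    """Bandit OMD on the simplex with the entropy mirror map (multiplicative
    weights) and a stabilized importance-weighted loss estimate: dividing by
    p[a] + gamma instead of p[a] keeps the estimate's variance bounded."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_actions)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        a = rng.choice(n_actions, p=p)     # sample one action, observe one loss
        loss = scalar_loss(t, a)           # scalarized payoff, mapped to a loss
        loss_hat = loss / (p[a] + gamma)   # stabilized importance weighting
        log_w[a] -= eta * loss_hat         # mirror-descent (exp-weights) step
    return p

# Toy run: Bernoulli losses with per-action means; the returned distribution
# should concentrate on the lowest-loss action (index 2 here).
means = np.array([0.9, 0.5, 0.2, 0.7])
rng = np.random.default_rng(1)
final_p = bandit_omd(lambda t, a: float(rng.random() < means[a]),
                     n_actions=4, T=5000)
print(final_p)
```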