Regret Minimization with Adaptive Opponents in Repeated Games

2026-06-04 • Machine Learning

Machine LearningArtificial IntelligenceComputer Science and Game Theory

AI summaryⓘ

The authors study how to measure and reduce regret in repeated games where opponents adapt their strategies based on past behavior. They introduce a new metric called Repeated Policy Regret (RP-Regret) that better accounts for opponents who react to past plays, unlike traditional regret measures. They explore conditions needed to keep this regret low over time and propose three algorithms to minimize it despite its non-convex nature. Their work shows that when all players minimize RP-Regret, the game can reach stronger equilibrium outcomes, and experiments demonstrate more cooperative results in classic games like Stag-Hunt.

regret minimizationrepeated gamesadaptive opponentsexternal regretpolicy regretnon-convex optimizationsubgame perfect equilibriumonline learningStag-Hunt game

Authors

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

Abstract

In this paper, we study regret minimization in repeated games with \emph{adaptive} opponents who can respond based on histories of play. The standard metric of \emph{external regret} in online learning is known to fail to capture such adaptivity. To account for players' counterfactual reasoning, we introduce {\tt Repeated Policy Regret (RP-Regret)}, a game-theoretic metric that measures the difference between the \emph{realized} and the \emph{best-in-hindsight} accumulated utility when all players can \emph{respond} to the history of play. Compared to existing regret notions in this setting, ours is native to repeated game playing, enabling stronger comparators and opponents with fewer constraints, while maintaining the possibility of finding better equilibria when all players minimize it. We first identify necessary conditions for obtaining {\tt RP-Regret} sublinear in time, on the variation of the player's comparator strategies in the regret definition and on the memories of both the comparator and opponents' strategies. We then study additional conditions and provable algorithms to minimize {\tt RP-Regret}, which is by definition \emph{non-convex} in the strategy space. To address this challenge, we propose three algorithms: (i) one based on an optimization oracle, as assumed in some prior work in online non-convex learning; (ii) one that minimizes a convex and \emph{linearized} surrogate of {\tt RP-Regret} at each iteration; (iii) one that directly minimizes {\tt RP-Regret} when opponents change strategies slowly. Furthermore, when all players can run algorithms to minimize the {\tt RP-Regret} (or its linearized variant), certain subgame perfect equilibria of the repeated game can be learned. We also provide experiments showing that minimizing our regret notions can lead to more cooperative solutions with higher utility in games such as Stag-Hunt.

View PDFOpen arXiv