Ergodic Deviation-Robust Equilibrium under Mirror Descent Learning in Finite Games

2026-06-16Computer Science and Game Theory

Computer Science and Game Theory
AI summary

The authors introduce a concept called Ergodic Deviation-Robust Equilibrium (EDRE) for games where players learn using a method called entropic mirror descent (EMD). EDRE is a special type of outcome that is stable over time, making sure no group of players gains significantly by deviating while also being an equilibrium selected by the learning dynamics. They prove the existence of EDRE in certain games, show it avoids unstable outcomes, and analyze how hard it is to compute. The authors also show that EDRE connects static equilibrium ideas with dynamic learning behavior, providing a way to certify equilibria using the learning process itself.

Ergodic Deviation-Robust EquilibriumEntropic Mirror DescentNash EquilibriumPotential GamesPolymatrix GamesDeviation RegretFixed PointVariational InequalitiesPPAD Complexity
Authors
Joshua Steier
Abstract
We introduce Ergodic Deviation-Robust Equilibrium (EDRE), a dynamics-relative equilibrium concept for repeated finite games in which agents learn via entropic mirror descent (EMD). EDRE requires three properties to hold simultaneously for the same profile and learning run: (E1) the limit profile is an $\varepsilon$-Nash equilibrium at a product distribution; (E2) along the entire learning trajectory, every fixed coalition's cumulative aggregate (summed-unilateral) deviation gain is $\tilde{\mathcal{O}}(\sqrt{T})$ with high probability; and (E3) the limit profile is a fixed point of the EMD map, so that it is selected by the dynamics rather than merely certified as an equilibrium. We prove that the $\sqrt{T}$ deviation-regret rate is order-tight, establish existence in exact-potential games (via Nash's theorem, with a constructive proximal route under concavity) together with Lyapunov monotonicity of EMD (and pointwise convergence when the fixed-point set is a singleton), and extend the selection property to monotone polymatrix games through variational inequalities. Although a static EDRE coincides with an $\varepsilon$-Nash equilibrium, its content is dynamic: robust (positive-measure) selection under EMD excludes linearly unstable equilibria, so EDRE acts as a Nash equilibrium equipped with a dynamic certificate rather than a static refinement. On the complexity side, we show that computing EDRE is PPAD-hard in general polymatrix games and belongs to promise-PPAD for potential games. A worked $2\times 2$ coordination-game example illustrates all components of the framework. Additional results, including a bandit-feedback extension, a period-doubling route to Li-Yorke chaos for the two-strategy EMD map at large step size, a linear-program formulation for minimum-cost steering, and supporting simulations, appear in the appendices.