Planning in entropy-regularized Markov decision processes and games

2026-04-21 · Machine Learning

AI summary

The authors introduce SmoothCruiser, a new planning algorithm for decision-making problems and games that uses entropy regularization. The regularization makes the Bellman operator smooth, and the algorithm exploits this smoothness to estimate value functions more sample-efficiently. The method comes with a guaranteed polynomial bound on the number of samples needed to reach a given accuracy, a guarantee with no known counterpart in the unregularized setting.

Markov decision process, entropy regularization, value function, Bellman operator, sample complexity, generative model, two-player games, planning algorithm
Authors
Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko
Abstract
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve a problem-independent sample complexity of order Õ(1/ε⁴) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
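To illustrate the key ingredient, the sketch below implements the entropy-regularized (soft) Bellman operator on a tiny hand-made MDP. This is not the SmoothCruiser algorithm itself, only the smooth operator it builds on: the hard max over actions is replaced by a temperature-scaled log-sum-exp, which is differentiable and Lipschitz-smooth. The MDP, transition tables, and temperature value are illustrative assumptions, not from the paper.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative only):
# P[s][a] is a list of (probability, next_state); R[s][a] is the reward.
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 1)], 1: [(1.0, 0)]}}
R = {0: {0: 0.0, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}
gamma, lam = 0.9, 0.1  # discount factor and entropy temperature

def soft_bellman(V):
    """One application of the entropy-regularized Bellman operator:
        V(s) <- lam * log(sum_a exp(Q(s, a) / lam)),
    where Q(s, a) = R(s, a) + gamma * E[V(s') | s, a].
    The log-sum-exp is a smooth surrogate for max_a Q(s, a)."""
    newV = {}
    for s in P:
        qs = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in P[s]]
        m = max(qs)  # shift for numerical stability of log-sum-exp
        newV[s] = m + lam * math.log(sum(math.exp((q - m) / lam) for q in qs))
    return newV

# The operator is a gamma-contraction, so iterating it converges
# to the regularized value function.
V = {0: 0.0, 1: 0.0}
for _ in range(300):
    V = soft_bellman(V)
```

As the temperature `lam` goes to 0, the log-sum-exp collapses to the hard max and the regularized value function approaches the standard one; larger temperatures trade optimality for smoothness, which is what enables the sample-complexity guarantee.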