Planning in entropy-regularized Markov decision processes and games

2026-04-21 · Machine Learning

AI summary

The authors introduce SmoothCruiser, a new planning algorithm for decision-making problems and games that uses entropy regularization. The regularization makes the Bellman operator smooth, and the algorithm exploits this smoothness to estimate value functions more sample-efficiently. The method comes with a guaranteed polynomial bound on the number of samples needed to reach a given accuracy, a guarantee with no known counterpart in the unregularized setting.

Markov decision process, entropy regularization, value function, Bellman operator, sample complexity, generative model, two-player games, planning algorithm
Authors
Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko
Abstract
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve a problem-independent sample complexity of order Õ(1/ε⁴) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
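To illustrate the key ingredient, the sketch below implements the entropy-regularized (soft) Bellman operator on a tiny hand-made MDP. This is not the SmoothCruiser algorithm itself, only the smooth operator it builds on: the hard max over actions is replaced by a temperature-scaled log-sum-exp, which is differentiable and Lipschitz-smooth. The MDP, transition tables, and temperature value are illustrative assumptions, not from the paper.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative only):
# P[s][a] is a list of (probability, next_state); R[s][a] is the reward.
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 1)], 1: [(1.0, 0)]}}
R = {0: {0: 0.0, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}
gamma, lam = 0.9, 0.1  # discount factor and entropy temperature

def soft_bellman(V):
    """One application of the entropy-regularized Bellman operator:
        V(s) <- lam * log(sum_a exp(Q(s, a) / lam)),
    where Q(s, a) = R(s, a) + gamma * E[V(s') | s, a].
    The log-sum-exp is a smooth surrogate for max_a Q(s, a)."""
    newV = {}
    for s in P:
        qs = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in P[s]]
        m = max(qs)  # shift for numerical stability of log-sum-exp
        newV[s] = m + lam * math.log(sum(math.exp((q - m) / lam) for q in qs))
    return newV

# The operator is a gamma-contraction, so iterating it converges
# to the regularized value function.
V = {0: 0.0, 1: 0.0}
for _ in range(300):
    V = soft_bellman(V)
```

As the temperature `lam` goes to 0, the log-sum-exp collapses to the hard max and the regularized value function approaches the standard one; larger temperatures trade optimality for smoothness, which is what enables the sample-complexity guarantee.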