Boltzmann-based Exploration for Robust Decentralized Multi-Agent Planning
2026-03-02 • Multiagent Systems • Artificial Intelligence
AI summary
The authors study how groups of agents plan together with Decentralized Monte Carlo Tree Search (Dec-MCTS), a method that struggles when rewards are sparse or deceptive. They propose a new variant, Coordinated Boltzmann MCTS (CB-MCTS), which selects actions stochastically via a Boltzmann (softmax) policy and gradually reduces that randomness with a decaying entropy bonus, so agents keep exploring without losing focus. In experiments, CB-MCTS often beats Dec-MCTS in deceptive scenarios and stays competitive on standard benchmarks.
Monte Carlo Tree Search (MCTS) • Decentralized MCTS • Multi-agent planning • Boltzmann policy • Entropy bonus • Exploration vs. exploitation • Sparse rewards • Simple-regret setting • Cooperative agents
Authors
Nhat Nguyen, Duong Nguyen, Gianluca Rizzo, Hung Nguyen
Abstract
Decentralized Monte Carlo Tree Search (Dec-MCTS) is widely used for cooperative multi-agent planning but struggles in sparse or skewed reward environments. We introduce Coordinated Boltzmann MCTS (CB-MCTS), which replaces deterministic UCT with a stochastic Boltzmann policy and a decaying entropy bonus for sustained yet focused exploration. While Boltzmann exploration has been studied in single-agent MCTS, applying it in multi-agent systems poses unique challenges. CB-MCTS is the first to address this. We analyze CB-MCTS in the simple-regret setting and show in simulations that it outperforms Dec-MCTS in deceptive scenarios and remains competitive on standard benchmarks, providing a robust solution for multi-agent planning.
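The core idea of the abstract — replacing deterministic UCT child selection with a stochastic Boltzmann policy plus a decaying exploration bonus — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the bonus form `beta0 / sqrt(1 + n)` and the hyperparameters `beta0` and `tau` are assumptions chosen to show the mechanism.

```python
import math
import random

def boltzmann_select(q_values, visits, beta0=1.0, tau=1.0):
    """Sample a child index with a Boltzmann (softmax) policy over Q-values.

    Each child's score is its mean value plus an exploration bonus that
    decays with its visit count (assumed form: beta0 / sqrt(1 + n)),
    mimicking a decaying entropy/exploration term. tau is the softmax
    temperature: low tau -> near-greedy, high tau -> near-uniform.
    """
    scores = [q + beta0 / math.sqrt(1 + n) for q, n in zip(q_values, visits)]
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    # Sample an index from the categorical distribution.
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Unlike UCT's argmax, every child retains nonzero selection probability, which is what sustains exploration under sparse or deceptive rewards; annealing `tau` (or the bonus) toward zero recovers near-greedy behavior as estimates sharpen.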