GARL: Game-Theoretic Reinforcement Learning for Multi-Agent Strategic Prioritisation

2026-06-03 · Computation and Language

AI summary

The authors propose GARL, a method for improving how multiple AI agents interact when making strategic decisions. GARL models the interaction as a game: competing agents first decide how to spread limited resources over a shared set of candidate items, and a judge (the arbiter) then decides the final order. The outcomes of this game are turned into role-specific rewards for reinforcement learning, so each agent learns a better strategy for its role. The authors tested GARL on ranking the core issues in dispute in legal cases and found that it lets small open-source AI models become competitive with a strong closed-source model in the same ranking setup. This offers a structured way to improve how AI agents interact and prioritize.

multi-agent systems, large language models (LLMs), reinforcement learning, game theory, strategic prioritisation, policy optimisation, issue ranking, legal AI, multi-agent reinforcement learning, interaction policies
Authors
Yuxiao Ye, Yiwen Zhang, Huiyuan Xie, Yuqin Huang, Zhiyuan Liu
Abstract
LLM-based multi-agent systems are increasingly used for strategic decision-making tasks. In such settings, performance depends not only on individual model capabilities, but also on the policies by which agents interact and adapt. Multi-agent reinforcement learning can optimise these interaction policies, but its reward design often remains task-specific and weakly grounded in interaction structure. To address this gap, we propose GARL, a GAme-theoretic Reinforcement Learning framework for multi-agent strategic prioritisation. GARL formalises strategic prioritisation as a two-stage game: competing agents first allocate strategic resources over a shared candidate set, and a higher-level arbiter then produces the final ranking. The resulting game-theoretic utilities are converted into role-specific reinforcement signals, allowing policy optimisation to be guided by structured interaction. We instantiate GARL on issues-in-dispute ranking, where the goal is to prioritise core issues in legal proceedings. Experiments show that GARL improves ranking performance, enables small open-source LLMs to become competitive with a strong closed-source LLM under the same candidate-ranking setting, and yields gains in legal-domain competence and broader strategic decision-making. Overall, GARL demonstrates how game-theoretic interaction structure can be turned into reinforcement-learning objectives, providing a principled approach to policy optimisation in multi-agent strategic prioritisation.
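The abstract describes the two-stage game only at a high level. The toy Python sketch below illustrates its general shape under explicitly hypothetical assumptions: two agents with private values over candidate issues split a fixed budget greedily (a stand-in for LLM policies), the arbiter ranks candidates by its own prior plus the resources committed to each, each agent's reward is its position-discounted utility divided by its best achievable utility, and the arbiter's reward is NDCG against gold labels. None of these rules, nor the names `Agent`, `arbiter_rank`, `play_round`, the DCG-style discount or the NDCG reward, come from the paper; they are illustrative choices, not GARL's actual utilities or training objective.

```python
import math
from dataclasses import dataclass

# Hypothetical shared candidate set (e.g. issues raised in one case).
CANDIDATES = ["issue_0", "issue_1", "issue_2", "issue_3"]


def position_weight(rank: int) -> float:
    """DCG-style discount for a 0-based rank (an assumed choice, not GARL's)."""
    return 1.0 / math.log2(rank + 2)


@dataclass
class Agent:
    name: str
    values: dict[str, float]  # hypothetical private value of each candidate
    budget: int               # hypothetical units of strategic resource

    def allocate(self) -> dict[str, int]:
        """Stage 1: place one unit at a time on the candidate with the highest
        marginal value, value / (units already placed + 1). A greedy stand-in
        for a learned policy; ties go to the earlier candidate."""
        alloc = {c: 0 for c in CANDIDATES}
        for _ in range(self.budget):
            best = max(CANDIDATES, key=lambda c: self.values[c] / (alloc[c] + 1))
            alloc[best] += 1
        return alloc

    def utility(self, ranking: list[str]) -> float:
        """Position-discounted value of a final ranking to this agent."""
        return sum(self.values[c] * position_weight(r) for r, c in enumerate(ranking))


def arbiter_rank(allocs: list[dict[str, int]], prior: dict[str, float]) -> list[str]:
    """Stage 2 (hypothetical rule): score = arbiter prior + total resources
    committed by all agents; sorted() is stable, so ties keep input order."""
    score = {c: prior[c] + sum(a[c] for a in allocs) for c in CANDIDATES}
    return sorted(CANDIDATES, key=lambda c: -score[c])


def ndcg(ranking: list[str], gold: dict[str, float]) -> float:
    """Arbiter reward: NDCG of the ranking against gold relevance labels."""
    dcg = sum(gold[c] * position_weight(r) for r, c in enumerate(ranking))
    ideal = sorted(CANDIDATES, key=lambda c: -gold[c])
    idcg = sum(gold[c] * position_weight(r) for r, c in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def play_round(agents: list[Agent], prior: dict[str, float], gold: dict[str, float]):
    """Run both stages, then convert utilities into role-specific rewards in [0, 1]."""
    ranking = arbiter_rank([a.allocate() for a in agents], prior)
    rewards: dict[str, float] = {}
    for a in agents:
        best = sorted(CANDIDATES, key=lambda c: -a.values[c])  # agent's ideal order
        best_u = a.utility(best)
        rewards[a.name] = a.utility(ranking) / best_u if best_u > 0 else 0.0
    rewards["arbiter"] = ndcg(ranking, gold)
    return ranking, rewards


# Toy usage with made-up numbers.
agents = [
    Agent("agent_a", {"issue_0": 3.0, "issue_1": 1.0, "issue_2": 0.0, "issue_3": 2.0}, budget=3),
    Agent("agent_b", {"issue_0": 0.0, "issue_1": 2.0, "issue_2": 3.0, "issue_3": 1.0}, budget=3),
]
prior = {"issue_0": 0.0, "issue_1": 0.5, "issue_2": 0.2, "issue_3": 0.0}
gold = {"issue_0": 2.0, "issue_1": 0.0, "issue_2": 1.0, "issue_3": 0.0}
ranking, rewards = play_round(agents, prior, gold)
print(ranking)  # ['issue_2', 'issue_0', 'issue_1', 'issue_3']
print(rewards)
```

Dividing each agent's utility by its own best case keeps the rewards for different roles on a comparable [0, 1] scale, which is one plausible way to make role-specific signals usable by a policy-gradient learner; the paper may convert utilities into rewards quite differently.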