Learning to cooperate with emergent reputation via multi-agent reinforcement learning
2026-06-03 • Computer Science and Game Theory
AI summary
The authors study how groups of simple agents can learn to cooperate by forming and using reputations, which are shared opinions about how trustworthy others are. Unlike past work that used fixed rules for reputations, they create a method called COOPER that lets agents learn both how to judge reputations and how to use them to cooperate just from rewards in their environment. They design COOPER to handle the tricky connection between reputation and decision-making, reducing noise and delays in learning. Their experiments show that COOPER works across different game setups and social networks, and that agents can develop cooperation and reputation rules naturally when learning together.
Multi-agent systems, Reputation systems, Cooperation, Reinforcement learning, Distributed learning, Social dilemmas, Emergent behavior, Self-play, Social networks, Policy learning
Authors
Xinwei Song, Yizhe Huang, Dengji Zhao, Xue Feng
Abstract
Reputation, the aggregation of peer assessments diffused through social networks, is a pivotal mechanism for promoting cooperation in the social dilemmas that are ubiquitous in distributed multi-agent systems whose agents have limited perception and cognitive capabilities. Designing efficient reputation systems, which comprise reputation assessment rules and reputation-based policies, is a long-standing challenge. Previous work either assumes predefined reputation assessment rules or models reputation as an intrinsic reward for learning policies, which limits these methods' ability to generalize and adapt. To address this, we propose a distributed multi-agent reinforcement learning method, $\textbf{COOPER}$ ($\textbf{COOP}$eration with $\textbf{E}$mergent $\textbf{R}$eputation), which jointly learns reputation assessment rules and reputation-based policies entirely from environment rewards. Drawing on the underlying mechanisms of reputation, we design the constituent modules of $\textbf{COOPER}$ and the data flows among them to overcome the latency and noise in the feedback signal caused by the deep entanglement between reputation and policy. Experiments on the donation game and the coin game in grid-world environments demonstrate that $\textbf{COOPER}$ adapts effectively to various existing reputation systems and co-players. Furthermore, we observe the co-emergence of reputation norms and cooperation in self-play settings. These results hold across diverse social network topologies, underscoring the generalizability and efficacy of our approach.
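To make the abstract's terminology concrete, the following is a minimal sketch of a donation game with a predefined assessment rule and a fixed reputation-based policy, the kind of hand-specified reputation system the abstract contrasts with learned ones. It is not COOPER and is not taken from the paper. The payoff values, the binary reputation, the choice of image scoring as the rule, and all function names are assumptions made for illustration.

```python
import random

# Assumed payoffs for illustration: the recipient's benefit exceeds the donor's cost.
BENEFIT, COST = 3.0, 1.0


def image_scoring(action):
    """Predefined assessment rule (assumed): cooperating donors are good (1), others bad (0)."""
    return 1 if action == "C" else 0


def discriminator(recipient_reputation):
    """Fixed reputation-based policy (assumed): donate only to recipients in good standing."""
    return "C" if recipient_reputation == 1 else "D"


def play_round(donor, recipient, reputations, payoffs):
    """Play one donation-game interaction and update the donor's reputation in place."""
    action = discriminator(reputations[recipient])
    if action == "C":
        payoffs[donor] -= COST
        payoffs[recipient] += BENEFIT
    reputations[donor] = image_scoring(action)
    return action


def simulate(n_agents=4, n_rounds=100, seed=0):
    """Repeatedly pair a random donor with a random recipient; everyone starts in good standing."""
    rng = random.Random(seed)
    reputations = [1] * n_agents
    payoffs = [0.0] * n_agents
    for _ in range(n_rounds):
        donor, recipient = rng.sample(range(n_agents), 2)
        play_round(donor, recipient, reputations, payoffs)
    return reputations, payoffs


if __name__ == "__main__":
    reputations, payoffs = simulate()
    print(reputations, sum(payoffs))
```

In this sketch both the assessment rule and the policy are written by hand. The abstract's stated goal is instead to learn both jointly from environment rewards, so neither `image_scoring` nor `discriminator` would be fixed in advance.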