MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
2026-05-07 • Artificial Intelligence
Subjects: Artificial Intelligence; Computation and Language
AI summary
The authors studied systems where multiple AI agents work together by following special instructions called prompts. They found it is hard to make these prompts good for the whole group, because each agent usually only focuses on its own task. To fix this, they created MASPO, a method that improves all agents' prompts together by checking how well they help the next agent, not just each individual one. Their tests on six different tasks showed that MASPO works better than other prompt improvement methods without needing extra labeled data.
Keywords: large language models, multi-agent systems, prompt engineering, collaborative AI, optimization, evolutionary algorithms, beam search, local vs. global objectives, automatic refinement
Authors
Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang, Zhenxi Song, Min Zhang
Abstract
Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across six diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.
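To make the abstract's two ingredients concrete, the sketch below shows what an evolutionary beam search over prompt candidates could look like when candidates are scored by a downstream proxy (how well a prompt sets up the successor agent) rather than by local quality alone. This is a toy illustration, not the authors' implementation: the functions `mutate` and `downstream_score` are hypothetical stand-ins for LLM-based prompt rewriting and MASPO's joint evaluation.

```python
import random

def mutate(prompt: str, rng: random.Random) -> str:
    """Toy 'mutation': append one instruction fragment.
    In practice this would be an LLM-generated prompt rewrite."""
    fragments = [
        " Be concise.",
        " Explain your reasoning step by step.",
        " Output only the final answer.",
    ]
    return prompt + rng.choice(fragments)

def downstream_score(prompt: str) -> float:
    """Stand-in for joint evaluation: score a prompt by a proxy for how
    usable its agent's output is for the *successor* agent. Here, prompts
    containing more output-shaping instructions score higher (a toy proxy)."""
    cues = (" concise", " step by step", " final answer")
    return float(sum(cue in prompt for cue in cues))

def evolutionary_beam_search(seed_prompt: str, beam_width: int = 2,
                             generations: int = 3, children: int = 4,
                             seed: int = 0) -> list[str]:
    """Keep a beam of the best prompts; each generation, expand every beam
    member into mutated children and retain the top-k by downstream score."""
    rng = random.Random(seed)
    beam = [seed_prompt]
    for _ in range(generations):
        candidates = beam + [mutate(p, rng) for p in beam for _ in range(children)]
        candidates.sort(key=downstream_score, reverse=True)
        beam = candidates[:beam_width]
    return beam

best = evolutionary_beam_search("Summarize the document.")
# The selected prompt should score at least as well as the unoptimized seed.
print(downstream_score(best[0]) >= downstream_score("Summarize the document."))
```

The key design point mirrored from the abstract: selection pressure comes from `downstream_score`, so a prompt survives only if it helps the next agent, aligning local edits with the system-level objective.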