Collaborative Task and Path Planning for Heterogeneous Robotic Teams using Multi-Agent PPO

2026-04-01 Robotics

Robotics, Multiagent Systems
AI summary

The authors focus on coordinating a team of robots with different capabilities to explore planets more efficiently. Classical planning methods become too slow as the number of tasks and robots grows, so the authors use a learning-based approach called Multi-Agent Proximal Policy Optimization (MAPPO). Their method helps the robot team quickly decide who should do what and where to go, even when plans need to change on the spot. They benchmarked this strategy against optimal solutions obtained through exhaustive search and found it works well for complex exploration tasks.

robotic exploration, multi-agent systems, proximal policy optimization, target allocation, scheduling, heterogeneous robots, online replanning, combinatorial optimization, planetary exploration
Authors
Matthias Rubio, Julia Richter, Hendrik Kolvenbach, Marco Hutter
Abstract
Efficient robotic extraterrestrial exploration requires robots with diverse capabilities, ranging from scientific measurement tools to advanced locomotion. A robotic team enables the distribution of tasks over multiple specialized subsystems, each providing specific expertise to complete the mission. The central challenge lies in efficiently coordinating the team to maximize utilization and the extraction of scientific value. Classical planning algorithms scale poorly with problem size, leading to long planning cycles and high inference costs due to the combinatorial growth of possible robot-target allocations and possible trajectories. Learning-based methods are a viable alternative that moves the scaling concern from runtime to training time, a critical step towards achieving real-time planning. In this work, we present a collaborative planning strategy based on Multi-Agent Proximal Policy Optimization (MAPPO) to coordinate a team of heterogeneous robots to solve a complex target allocation and scheduling problem. We benchmark our approach against single-objective optimal solutions obtained through exhaustive search and evaluate its ability to perform online replanning in the context of a planetary exploration scenario.
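To illustrate the combinatorial growth of robot-target allocations mentioned in the abstract, the following is a minimal sketch of an exhaustive-search baseline on a toy instance. It is not the authors' implementation: the function names, the cost matrix, the capability mask, and the choice of makespan (the largest per-robot workload) as the single objective are all assumptions made for illustration, and target ordering and trajectories are ignored entirely.

```python
from itertools import product


def count_allocations(num_robots: int, num_targets: int) -> int:
    """Number of ways to give each target to exactly one robot (visit order ignored)."""
    return num_robots ** num_targets


def exhaustive_allocation(cost, capable):
    """Brute-force the allocation with the smallest makespan.

    cost[r][t]    -- hypothetical time for robot r to serve target t
    capable[r][t] -- True if robot r carries the capability target t needs
    Returns (best_makespan, assignment), where assignment[t] is the robot index
    chosen for target t, or (inf, None) if no feasible allocation exists.
    """
    num_robots = len(cost)
    num_targets = len(cost[0])
    best_makespan, best_assignment = float("inf"), None

    # Enumerate all num_robots ** num_targets candidate allocations.
    for assignment in product(range(num_robots), repeat=num_targets):
        # Skip allocations that send a robot to a target it cannot handle.
        if not all(capable[r][t] for t, r in enumerate(assignment)):
            continue
        # Accumulate each robot's workload; the makespan is the largest one.
        load = [0.0] * num_robots
        for t, r in enumerate(assignment):
            load[r] += cost[r][t]
        makespan = max(load)
        if makespan < best_makespan:
            best_makespan, best_assignment = makespan, assignment

    return best_makespan, best_assignment


# Toy instance (assumed values): 2 robots, 3 targets; robot 0 cannot serve target 2.
cost = [[1.0, 4.0, 2.0],
        [3.0, 1.0, 5.0]]
capable = [[True, True, False],
           [True, True, True]]

best_makespan, best_assignment = exhaustive_allocation(cost, capable)
print(best_makespan, best_assignment)
print(count_allocations(2, 3), count_allocations(4, 10))
```

Even in this simplified setting the search space grows as the number of robots raised to the number of targets: 8 candidates for 2 robots and 3 targets, but 1,048,576 for 4 robots and 10 targets. Adding visit order and trajectories enlarges it further, which is the scaling concern that motivates moving the cost to training time.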