Bridging Performance and Generalization in Reinforcement Learning for Agile Flight

2026-06-25 · Robotics

AI summary

The authors study how to train drones to race autonomously on tracks they have never seen. This is hard because racing demands near time-optimal flight under persistent actuation saturation, and safety and performance are tightly coupled at high speed. Current reinforcement learning methods either crash on new tracks or give up a great deal of speed to avoid crashing. The proposed framework combines task-aware switching driven by learning progress with a physically informed procedural track generator, producing a single generalist policy that needs no test-time adaptation. In simulation and real-world tests, the method shows a 7.4x improvement in generalization over state-of-the-art approaches while maintaining competitive racing speeds, including in a vision-based, end-to-end setting that operates without explicit state estimation.

Autonomous drone racing, Reinforcement learning, Zero-shot generalization, Actuation saturation, Agile flight, Physically informed procedural generation, End-to-end control, Vision-based control, Learning progress, Sim-to-real transfer
Authors
Jonathan Green, Jiaxu Xing, Nico Messikommer, Angel Romero, Davide Scaramuzza
Abstract
Autonomous drone racing is a fundamentally challenging regime for autonomous aerial robots, requiring time-optimal control while operating under persistent actuation saturation. While reinforcement learning (RL) has achieved human-level performance in this domain, current methods fail to generalize; policies trained on specific environments often crash immediately in unseen configurations. This failure reflects the intrinsic difficulty of zero-shot generalization in agile flight, arising from high-dimensional task variation and the tight coupling between safety and performance at high speeds. Existing approaches that improve generalization impose a substantial cost on flight speed: control policies must significantly degrade performance to achieve even modest levels of generalization. In this work, we propose a framework for zero-shot generalization in agile flight for RL-based drone racing. By combining task-aware switching based on learning progress with a physically informed procedural track generator, the framework produces a fast and robust generalist policy without test-time adaptation. Our method achieves strong zero-shot performance across a wide range of unseen racetracks in the real world, demonstrating a 7.4x improvement in generalization over state-of-the-art approaches, while maintaining competitive racing speeds. We validate our method's results in both simulation and real-world settings, including a challenging vision-based, end-to-end control setting that operates without explicit state estimation, where all prior approaches fail to generalize.
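
The abstract names two ingredients, a procedural track generator and task selection based on learning progress, but does not describe how either works. The Python sketch below is only a generic illustration of those two ideas, not the authors' implementation. Everything in it is an assumption made for illustration: the function `generate_track`, the class `LearningProgressSampler`, the 2D gate layout, the spacing and turn limits standing in for "physically informed" constraints, and the fast/slow moving-average definition of learning progress.

```python
import math
import random


def generate_track(num_gates, rng, min_spacing=4.0, max_spacing=12.0,
                   max_turn_deg=90.0):
    """Sample 2D gate centers with bounded spacing and heading change.

    The bounds are hypothetical stand-ins for physical feasibility limits.
    """
    x, y, heading = 0.0, 0.0, 0.0
    gates = [(x, y)]
    for _ in range(num_gates - 1):
        # Limit how sharply the track may turn between consecutive gates.
        heading += math.radians(rng.uniform(-max_turn_deg, max_turn_deg))
        step = rng.uniform(min_spacing, max_spacing)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        gates.append((x, y))
    return gates


class LearningProgressSampler:
    """Favor tasks whose success rate is currently changing the most.

    Learning progress is approximated here as the absolute gap between a
    fast and a slow exponential moving average of per-task success.
    """

    def __init__(self, num_tasks, fast=0.3, slow=0.05, eps=0.1):
        self.fast = [0.0] * num_tasks
        self.slow = [0.0] * num_tasks
        self.a_fast, self.a_slow, self.eps = fast, slow, eps

    def update(self, task, success):
        # success is 1.0 for a completed lap and 0.0 for a crash.
        self.fast[task] += self.a_fast * (success - self.fast[task])
        self.slow[task] += self.a_slow * (success - self.slow[task])

    def progress(self, task):
        return abs(self.fast[task] - self.slow[task])

    def sample(self, rng):
        n = len(self.fast)
        # Occasionally pick uniformly so stalled tasks are still revisited.
        if rng.random() < self.eps:
            return rng.randrange(n)
        weights = [self.progress(t) + 1e-6 for t in range(n)]
        return rng.choices(range(n), weights=weights, k=1)[0]
```

In a training loop under these assumptions, each task index could map to a family of generated tracks: the sampler picks a task, a track is generated for it, the policy flies an episode, and `update` records whether it finished. Tasks that are already mastered or still far out of reach show little change in success rate and are sampled less often.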