MAD: Mapping-Aware World Models for Agile Quadrotor Flight

2026-06-03 · Robotics

AI summary

The authors developed a system called MAD that helps quadrotor drones fly quickly and safely in cluttered spaces by remembering and mapping their surroundings instead of just reacting to what they see in the moment. MAD creates internal maps of occupied and visible areas using depth images and drone movements, which helps the drone avoid collisions more effectively. The team tested their approach in simulations and real-world flights, showing their method outperforms other vision-only techniques and works well indoors and outdoors. Their work bridges the gap between traditional navigation methods and fully end-to-end learning.

Quadrotor, World model, Occupancy grid map, Proprioception, Depth imaging, Recurrent neural networks, Policy learning, Collision avoidance, Sim-to-real transfer, Vision-based navigation
Authors
Xinhong Zhang, Runqing Wang, Yunfan Ren, Ding Yu, Boyu Zhou, Jian Sun, Fang Deng, Jie Chen, Gang Wang
Abstract
Agile quadrotor flight in cluttered scenes requires more than a reactive mapping from a depth image to a control command: the vehicle must remember which regions have been observed, infer nearby occupied space, and act under partial visibility and tight latency. In this paper, we present Mapping-Aware Dreamer (MAD), a geometry-aware world model for vision-based quadrotor flight. Instead of using raw-image reconstruction as the main self-supervised objective, MAD learns recurrent latent dynamics that reconstruct robocentric occupancy and visibility grid maps together with proprioceptive states. This design forces the latent state to encode local geometry, visibility history, and ego-motion in a form that is directly relevant to collision avoidance. MAD is trained in DiffAero using a GPU-parallel map-construction module that provides high-throughput supervision for occupancy and visibility. The learned representation is used in three policy-learning modes: imagination-based MAD-Dreamer and feature-extractor variants based on PPO and SHAC. Across visual navigation and racing tasks, MAD-based agents achieve higher success rates, faster flight, and better cross-task transfer than corresponding vision-only baselines. The model also produces interpretable map predictions and accurate ego-motion estimates from depth observations. We further deploy the learned policy on a physical quadrotor with an Intel RealSense D435i and demonstrate safe indoor and outdoor flight under limited sensing, reaching 9.66 m/s in simulation and 5.05 m/s in real-world forest experiments. These results show that mapping-aware world models provide a practical middle ground between modular aerial navigation and end-to-end learning.
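To make the occupancy and visibility targets concrete, the following is a minimal sketch of how such grids could be derived from depth. It is not the GPU-parallel map-construction module in DiffAero, whose interface the abstract does not describe. It is a 2-D, CPU-only toy that assumes a single row of depth readings, treats each reading as range along its ray, and places the robot at the grid center facing +x. The function name `build_maps` and all of its parameters are hypothetical.

```python
import math


def build_maps(depths, fov_rad, grid_size, cell_m, max_range_m):
    """Ray-march a 1-D depth scan into 2-D robocentric grids.

    Returns (occupancy, visibility) as lists of lists of 0/1.
    Rows follow the y axis, columns follow the x axis, and the
    robot sits at the center cell looking along +x.
    Illustrative only: all names and conventions are assumptions.
    """
    occ = [[0] * grid_size for _ in range(grid_size)]
    vis = [[0] * grid_size for _ in range(grid_size)]
    center = grid_size // 2
    n = len(depths)

    def to_cell(x, y):
        # Round metric coordinates to the nearest cell index.
        col = center + int(math.floor(x / cell_m + 0.5))
        row = center + int(math.floor(y / cell_m + 0.5))
        if 0 <= row < grid_size and 0 <= col < grid_size:
            return row, col
        return None  # outside the local map

    step = cell_m / 2  # sample twice per cell so no cell is skipped
    for i, d in enumerate(depths):
        # Spread rays evenly across the horizontal field of view.
        angle = 0.0 if n == 1 else -fov_rad / 2 + fov_rad * i / (n - 1)
        hit = d < max_range_m          # readings at max range mean "no return"
        reach = min(d, max_range_m)

        # Every cell crossed before the hit has been observed.
        r = 0.0
        while r < reach:
            cell = to_cell(r * math.cos(angle), r * math.sin(angle))
            if cell is None:
                break
            vis[cell[0]][cell[1]] = 1
            r += step

        # The cell at the measured range is both observed and occupied.
        if hit:
            cell = to_cell(d * math.cos(angle), d * math.sin(angle))
            if cell is not None:
                occ[cell[0]][cell[1]] = 1
                vis[cell[0]][cell[1]] = 1
    return occ, vis


# Usage: one forward ray hitting an obstacle 2 m ahead on an 11x11 grid
# with 0.5 m cells. Cells behind the obstacle stay unobserved.
occ, vis = build_maps([2.0], 0.0, 11, 0.5, 5.0)
```

Cells behind the obstacle stay at zero in the visibility grid, so the two grids together separate "observed free", "observed occupied", and "not yet seen". The abstract's requirement that the vehicle remember which regions have been observed depends on that distinction. A full pipeline would additionally need 3-D rays from camera intrinsics and fusion across frames using ego-motion, both of which this sketch omits.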