Continual Robot Policy Learning via Variational Neural Dynamics

2026-06-25

Robotics
AI summary

The authors present a method that lets robots keep adapting to changing conditions by learning continually from their own deployment experience. They combine a known physics model with a neural network residual to capture hidden changes, such as wind or battery drain, that affect robot motion. A recurrent encoder recognizes the current condition from recent interaction and conditions the control policy on it, so the robot responds faster and more accurately during tasks. On a real quadrotor under changing wind, the policy recovers from recurring disturbances in roughly 1 s, about 5x faster than online residual re-fitting.

continual learning, robot dynamics, neural residual model, recurrent encoder, differentiable simulation, policy adaptation, quadrotor control, hidden conditions, online learning, trajectory tracking
Authors
Jiaxu Xing, Zhiyuan Zhu, Yunfan Ren, Ismail Geles, Yifan Zhai, Rudolf Reiter, Davide Scaramuzza
Abstract
Robots deployed in the real world rarely operate under a single fixed dynamics model: wind changes, payloads vary, batteries drain, contacts shift, and hardware wears. Yet most learning-based controllers are trained once and deployed as if learning were complete, which prevents the robot from using deployment experience to further improve task performance. In this work, we propose a continual learning framework that uses real-world experience to improve robot policies under hidden and recurring dynamics. Our method learns a condition-aware dynamics model from real state-action trajectories by combining an analytical physics prior with a neural residual for unmodeled effects. A recurrent encoder infers the current hidden condition from recent interaction, and this estimate conditions both the residual model and the policy. Policy learning is performed via differentiable simulation using diverse learned dynamics sampled from the latent model. At deployment, these sampled conditions are replaced by conditions inferred online from recent real interaction, allowing the policy to recover recurring dynamics by recognition rather than by residual re-fitting. Through extensive simulation studies and real-world experiments, we demonstrate that the framework improves policy performance under diverse unobserved disturbances. On real quadrotor trajectory tracking under changing wind, the policy recovers from recurring disturbances in roughly 1 s, about 5x faster than online residual re-fitting. It also reduces large-disturbance hover and tracking errors by 65.7% and 53.3%, respectively, relative to state-of-the-art online adaptation approaches.
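
As a rough illustration of the structure the abstract describes (analytical prior plus a condition-dependent residual, with a recurrent encoder whose output feeds both the residual and the policy), the following is a minimal sketch and not the authors' implementation. It assumes a 1-D vertical point mass instead of a full quadrotor, a scalar latent condition, and hand-picked rather than learned weights. All names (`analytical_prior`, `residual`, `encode`, `policy`, `RES_W`, `ENC_W`) and all constants are hypothetical. The training stage via differentiable simulation is omitted.

```python
import math

DT = 0.02        # integration step [s] (assumed value)
MASS = 1.0       # nominal mass [kg] (assumed value)
GRAVITY = 9.81   # gravitational acceleration [m/s^2]

# Hand-picked weights for illustration only; in the paper these would be learned.
RES_W = (0.1, 0.0, 1.0, 0.0, 2.0)   # residual: one tanh unit plus output scale
ENC_W = (0.5, 0.3)                  # encoder: recurrent weight, input weight


def analytical_prior(state, thrust):
    """Known physics: 1-D vertical point mass under thrust and gravity."""
    z, vz = state
    acc = thrust / MASS - GRAVITY
    return (z + DT * vz, vz + DT * acc)


def residual(state, thrust, cond, w):
    """Tiny stand-in for the neural residual: an acceleration correction
    that depends on the state, the action, and the inferred condition."""
    _, vz = state
    hidden = math.tanh(w[0] * vz + w[1] * thrust + w[2] * cond + w[3])
    return w[4] * hidden


def predict(state, thrust, cond, w=RES_W):
    """Condition-aware dynamics: analytical prior plus residual correction."""
    z_next, vz_next = analytical_prior(state, thrust)
    return (z_next, vz_next + DT * residual(state, thrust, cond, w))


def encode(history, v=ENC_W):
    """Recurrent encoder: folds recent (state, thrust, next_state) transitions
    into a scalar condition estimate, using the prior's prediction error."""
    h = 0.0
    for state, thrust, next_state in history:
        _, vz_pred = analytical_prior(state, thrust)
        err = (next_state[1] - vz_pred) / DT   # unexplained acceleration
        h = math.tanh(v[0] * h + v[1] * err)
    return h


def policy(state, target_z, cond, gain=2.0):
    """Condition-aware PD hover policy: the inferred condition shifts thrust."""
    z, vz = state
    return MASS * (GRAVITY + 4.0 * (target_z - z) - 2.0 * vz) - gain * cond


def rollout(wind_acc, steps=10, thrust=MASS * GRAVITY):
    """Transitions from a stand-in 'real' system: prior plus hidden wind."""
    state, history = (0.0, 0.0), []
    for _ in range(steps):
        z_next, vz_next = analytical_prior(state, thrust)
        next_state = (z_next, vz_next + DT * wind_acc)
        history.append((state, thrust, next_state))
        state = next_state
    return history


# Infer the hidden condition from recent interaction, with and without wind.
cond_calm = encode(rollout(wind_acc=0.0))
cond_wind = encode(rollout(wind_acc=-2.0))   # downward gust (assumed value)

# The same estimate conditions both the dynamics model and the policy.
hover = MASS * GRAVITY
pred_calm = predict((0.0, 0.0), hover, cond_calm)
pred_wind = predict((0.0, 0.0), hover, cond_wind)
thrust_calm = policy((0.0, 0.0), 0.0, cond_calm)
thrust_wind = policy((0.0, 0.0), 0.0, cond_wind)
```

In this toy setup the encoder output is zero when the prior explains the data and turns negative under the downward gust, so the conditioned model predicts a lower vertical velocity and the conditioned policy commands more thrust. Because the condition is recognized from a short window of transitions, nothing has to be re-fitted when a previously seen disturbance returns, which is the intuition behind recovery "by recognition" in the abstract.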