HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies
2026-03-12 • Robotics
AI summary
The authors developed HandelBot, a system for controlling multi-fingered robotic hands on high-precision tasks such as two-handed piano playing. They start from a policy trained in simulation and adapt it to the real robot in two stages: first by correcting the lateral finger joint positions based on real-world rollouts, and then by using residual reinforcement learning to learn fine-grained corrective actions. Across five songs, HandelBot outperformed direct deployment of the simulation-trained policy by a factor of 1.8x while needing only 30 minutes of real-world interaction data. The approach addresses a common failure of simulation-to-real transfer on tasks that demand millimeter-scale precision.
dexterous manipulation, multi-fingered robotic hands, reinforcement learning, simulation-to-real transfer, bimanual piano playing, policy refinement, residual reinforcement learning, hardware experiments, robot adaptation, physical rollouts
Authors
Amber Xie, Haozhi Qi, Dorsa Sadigh
Abstract
Mastering dexterous manipulation with multi-fingered hands has been a grand challenge in robotics for decades. Despite its potential, the difficulty of collecting high-quality data remains a primary bottleneck for high-precision tasks. While reinforcement learning and simulation-to-real-world transfer offer a promising alternative, the transferred policies often fail for tasks demanding millimeter-scale precision, such as bimanual piano playing. In this work, we introduce HandelBot, a framework that combines a simulation policy and rapid adaptation through a two-stage pipeline. Starting from a simulation-trained policy, we first apply a structured refinement stage to correct spatial alignments by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions. Through extensive hardware experiments across five recognized songs, we demonstrate that HandelBot can successfully perform precise bimanual piano playing. Our system outperforms direct simulation deployment by a factor of 1.8x and requires only 30 minutes of physical interaction data.
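The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sign-based update rule, the step size, the residual scale, and the normalized action range are all assumptions chosen for illustration. It shows only the general idea of (1) nudging lateral joint offsets based on which keys a rollout actually pressed and (2) adding a small, bounded residual on top of the base policy's action.

```python
import numpy as np


def refine_lateral_offsets(offsets, pressed_keys, target_keys, step=0.01):
    """Stage 1 (hypothetical update rule): nudge each finger's lateral joint
    offset toward the key it should have pressed in a physical rollout.

    Keys are integer key indices; a pressed key of -1 means "no press observed",
    in which case that finger's offset is left unchanged.
    """
    offsets = np.asarray(offsets, dtype=float).copy()
    pressed = np.asarray(pressed_keys)
    target = np.asarray(target_keys)
    valid = pressed >= 0
    # Signed key error per finger; zero where no press was observed.
    error = np.where(valid, target - pressed, 0)
    return offsets + step * np.sign(error)


def compose_action(base_action, residual, offsets, lateral_idx,
                   scale=0.1, low=-1.0, high=1.0):
    """Stage 2 (generic residual composition, assumed form): apply the
    stage-1 offsets to the lateral joints, add a bounded residual to every
    joint, and clip the result to the assumed normalized action range.
    """
    action = np.asarray(base_action, dtype=float).copy()
    action[lateral_idx] += offsets
    # Bound the residual so it can only make small corrections.
    action += scale * np.clip(residual, -1.0, 1.0)
    return np.clip(action, low, high)


# Example: two lateral joints (indices 0 and 1) in a four-joint action vector.
offsets = refine_lateral_offsets([0.0, 0.0], pressed_keys=[3, 5], target_keys=[4, 4])
action = compose_action([0.5, 0.0, 0.95, 0.2], [1.0, -2.0, 1.0, 0.0],
                        offsets, lateral_idx=[0, 1])
```

Keeping the residual small and clipped means the simulation-trained policy still dominates the behavior, which is the usual motivation for residual formulations when real-world interaction time is limited.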