On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning

2026-04-09 Robotics

Robotics, Artificial Intelligence
AI summary

The authors explore how to take a big, smart language model used for planning car movements and teach a smaller model to do the same job so it can run on limited hardware in cars. They compare two ways to train the smaller model: one where it learns from its own predictions with detailed help from the bigger model, and another using reinforcement learning based on feedback from the bigger model. Their tests show that the first method works much better and nearly matches the big model’s performance despite being five times smaller. This suggests their approach is a practical way to bring advanced language-based planning to real-world self-driving cars.

large language models, motion planning, knowledge distillation, reinforcement learning, GPT-Driver, chain-of-thought reasoning, autonomous vehicles, nuScenes benchmark, on-policy training, trajectory prediction
Authors
Amirhossein Afsharrad, Amirhesam Abedsoltan, Ahmadreza Moradipari, Sanjay Lall
Abstract
Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs in resource-constrained onboard systems remains a fundamental challenge. In this paper, we study how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. We build on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigate two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's log-probabilities as per-token reward signals in a policy gradient framework. Experiments on the nuScenes benchmark show that GKD substantially outperforms the RL baseline and closely approaches teacher-level performance despite a 5$\times$ reduction in model size. These results highlight the practical value of on-policy distillation as a principled and effective approach to deploying LLM-based planners in autonomous driving systems.
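The two student training signals described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify which divergence GKD uses or how the policy gradient is estimated, so the per-token reverse KL, the plain REINFORCE-style surrogate, the random logits standing in for model outputs, and every function name below are assumptions made only for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def reverse_kl_per_token(student_logits, teacher_logits):
    """Dense distillation signal: KL(student || teacher) at each position.

    Assumption: reverse KL is used here as one possible token-level
    divergence; the abstract does not say which divergence is used.
    """
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return (np.exp(s) * (s - t)).sum(axis=-1)

def teacher_logprob_rewards(teacher_logits, sampled_tokens):
    """Per-token reward: teacher log-probability of each sampled token."""
    t = log_softmax(teacher_logits)
    return t[np.arange(len(sampled_tokens)), sampled_tokens]

def policy_gradient_surrogate(student_logits, teacher_logits, sampled_tokens):
    """REINFORCE-style surrogate: -mean(reward_t * log pi_student(y_t)).

    In a real training loop the rewards would be treated as constants
    (no gradient flows through the teacher).
    """
    s = log_softmax(student_logits)
    idx = np.arange(len(sampled_tokens))
    rewards = teacher_logprob_rewards(teacher_logits, sampled_tokens)
    return -(rewards * s[idx, sampled_tokens]).mean()

# Hypothetical stand-ins for model outputs: T positions, vocabulary size V.
rng = np.random.default_rng(0)
T, V = 6, 10
student_logits = rng.normal(size=(T, V))
teacher_logits = rng.normal(size=(T, V))

# On-policy step: the sequence is sampled from the student, not the teacher.
student_probs = np.exp(log_softmax(student_logits))
sampled_tokens = np.array([rng.choice(V, p=p) for p in student_probs])

per_token_kl = reverse_kl_per_token(student_logits, teacher_logits)
gkd_loss = per_token_kl.mean()
rl_loss = policy_gradient_surrogate(student_logits, teacher_logits, sampled_tokens)
print("distillation loss:", gkd_loss)
print("policy-gradient surrogate:", rl_loss)
```

In this sketch the distillation loss compares the full student and teacher distributions at every position of the student's own sample, whereas the RL surrogate sees only a single scalar (the teacher's log-probability of the sampled token) per position.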