When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

2026-06-15 • Artificial Intelligence

Artificial Intelligence, Machine Learning

AI summary

The authors created a system called PACT that helps AI make better decisions in unfamiliar situations by combining two parts: a fast, reactive learner and a slower, deliberative planner built on a small language model. The planner proposes action plans and checks them in simulation for safety, feasibility, and completeness before the AI follows them directly. They tested PACT on three increasingly difficult versions of a game called FrozenLake and found that it outperforms the baseline methods without retraining the fast learner. This suggests that combining quick reactions with careful planning can improve AI performance in these settings.

Reinforcement Learning, Reactive Policy, Deliberative Planning, Small Language Model, FrozenLake, Simulation, Asynchronous Processing, Action Planning, Hybrid Architecture

Authors

Nathan Gavenski, Juarez Monteiro, Francisco Galuppo, Adriano Veloso, Odinaldo Rodrigues

Abstract

Reinforcement Learning (RL) policies often degrade in unfamiliar environments because they lack explicit deliberation. We propose Plan, Align, Commit, Think (PACT), a hybrid architecture that combines a fast, reactive RL policy with a slow, deliberative Small Language Model (SLM) planner. PACT invokes the SLM asynchronously to generate and validate candidate action plans. Once a plan is verified through simulation as safe, feasible, and complete, it is executed directly, bypassing the RL policy without retraining or modifying it. Evaluated on three FrozenLake configurations of increasing difficulty, PACT outperforms all baselines while relying on a 2B-parameter SLM backbone, suggesting that deliberative planning and reactive execution are more powerful in concert than either is alone in these settings.
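The verify-then-commit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid, the action encoding, the names `verify_plan` and `run_episode`, and the concrete meanings given to "safe" (never enters a hole), "feasible" (uses only known actions), and "complete" (reaches the goal) are all assumptions made for illustration. The sketch is also synchronous and deterministic, whereas the abstract describes the SLM as being invoked asynchronously; here the candidate plan is simply passed in as a list of actions.

```python
from typing import Callable, List, Optional, Tuple

Pos = Tuple[int, int]

# Hypothetical 4x4 map in the FrozenLake style: S=start, F=frozen, H=hole, G=goal.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]

# Assumed action encoding: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def cell(pos: Pos) -> str:
    return GRID[pos[0]][pos[1]]


def step(pos: Pos, action: int) -> Pos:
    """Deterministic (non-slippery) transition; moves off the grid are clamped."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), len(GRID) - 1)
    c = min(max(pos[1] + dc, 0), len(GRID[0]) - 1)
    return (r, c)


def verify_plan(start: Pos, plan: List[int]) -> bool:
    """Simulate a candidate plan; accept it only if it is feasible, safe, and complete."""
    pos = start
    for action in plan:
        if action not in MOVES:   # not feasible: unknown action
            return False
        pos = step(pos, action)
        if cell(pos) == "H":      # not safe: the plan enters a hole
            return False
        if cell(pos) == "G":      # complete: the plan reaches the goal
            return True
    return False                  # not complete: the plan ends before the goal


def run_episode(
    policy: Callable[[Pos], int],
    candidate_plan: Optional[List[int]],
    start: Pos = (0, 0),
    max_steps: int = 20,
) -> bool:
    """Follow a verified plan if one exists; otherwise act with the unchanged reactive policy."""
    committed: List[int] = []
    if candidate_plan and verify_plan(start, candidate_plan):
        committed = list(candidate_plan)  # commit: bypass the policy for these steps
    pos = start
    for _ in range(max_steps):
        action = committed.pop(0) if committed else policy(pos)
        pos = step(pos, action)
        if cell(pos) in "HG":
            return cell(pos) == "G"
    return False


def always_right(pos: Pos) -> int:
    """Stand-in for a reactive policy that fails on this map."""
    return 2


good_plan = [1, 1, 2, 2, 1, 2]  # down, down, right, right, down, right
bad_plan = [2, 1]               # right, then down into a hole
```

In this toy setting, `run_episode(always_right, good_plan)` reaches the goal because the plan passes verification and is executed directly, while `run_episode(always_right, bad_plan)` rejects the plan and falls back to the reactive policy, which itself is never modified.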
