ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking

2026-02-24

Robotics
AI summary

The authors address the challenge that traditional robots struggle to adapt to new tasks because their planners are customized for specific environments. They propose ActionReasoning, a system that uses large language models (LLMs) to reason about robot actions by applying physical rules and prior knowledge. By structuring this reasoning across multiple agents, their method can plan correct brick stacks without detailed programming for every step. Their experiments show that this approach lets robots execute stable placements from high-level instructions rather than low-level code, suggesting a way to better connect understanding and action in robots.

robotic manipulation, large language models, physical reasoning, multi-agent systems, action planning, Vision-Language-Action, simulation, brick stacking, robotic control, embodied AI
Authors
Guangming Wang, Qizhen Ying, Yixiong Jing, Olaf Wysocki, Brian Sheil
Abstract
Classical robotic systems typically rely on custom planners designed for constrained environments. While effective in restricted settings, these systems lack generalization capabilities, limiting the scalability of embodied AI and general-purpose robots. Recent data-driven Vision-Language-Action (VLA) approaches aim to learn policies from large-scale simulation and real-world data. However, the continuous action space of the physical world significantly exceeds the representational capacity of linguistic tokens, making it unclear whether scaling data alone can yield general robotic intelligence. To address this gap, we propose ActionReasoning, an LLM-driven framework that performs explicit action reasoning to produce physics-consistent, prior-guided decisions for robotic manipulation. ActionReasoning leverages the physical priors and real-world knowledge already encoded in Large Language Models (LLMs) and structures them within a multi-agent architecture. We instantiate this framework on a tractable case study of brick stacking, where environment states are assumed to be accurately measured. These states are then serialized and passed to a multi-agent LLM framework that generates physics-aware action plans. Experiments demonstrate that the proposed multi-agent LLM framework enables stable brick placement while shifting effort from low-level domain-specific coding to high-level tool invocation and prompting, highlighting its potential for broader generalization. This work introduces a promising approach to bridging perception and execution in robotic manipulation by integrating physical reasoning with LLMs.
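The abstract describes a pipeline in which measured environment states are serialized and handed to an LLM alongside physical priors. As a rough illustration of what that interface might look like, the sketch below serializes brick poses to JSON for a prompt and encodes one simple stability prior (the upper brick's center must lie over the supporting brick's footprint and rest on its top face). All names here (`Brick`, `serialize_scene`, `is_supported`) and the specific prior are assumptions for illustration, not the paper's published implementation.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Brick:
    """Hypothetical measured state of one brick (poses in meters, axis-aligned)."""
    name: str
    x: float      # center position
    y: float
    z: float
    length: float = 0.20
    width: float = 0.10
    height: float = 0.06

def serialize_scene(bricks):
    """Serialize measured brick states into JSON suitable for an LLM prompt."""
    return json.dumps([asdict(b) for b in bricks], indent=2)

def is_supported(top: Brick, base: Brick) -> bool:
    """One illustrative physical prior: the top brick's center of mass must lie
    over the base footprint, and its bottom face must rest on the base's top face."""
    over_footprint = (abs(top.x - base.x) <= base.length / 2
                      and abs(top.y - base.y) <= base.width / 2)
    resting = abs((top.z - top.height / 2) - (base.z + base.height / 2)) < 1e-3
    return over_footprint and resting

base = Brick("base", 0.0, 0.0, 0.03)
top = Brick("top", 0.05, 0.0, 0.09)

# The serialized scene would be embedded in the planner agent's prompt.
prompt = ("You are a stacking planner. Given the scene state below, "
          "propose the next stable placement:\n" + serialize_scene([base, top]))

print(is_supported(top, base))  # → True: offset 0.05 m is within the footprint
```

In the paper's framework a check like this would presumably be invoked as a tool by one of the agents, rather than hard-coded into the planner; the point of the sketch is only the serialization boundary between measured state and language-level reasoning.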