Executable World Models for ARC-AGI-3 in the Era of Coding Agents

2026-05-06 • Artificial Intelligence


AI summary

The authors tested a coding agent designed to solve ARC-AGI-3 games by building an executable Python model of the game world, checking it against earlier observations, and planning actions through that model without any game-specific programming. They ran a fresh agent instance for each playthrough and reported results on the 25 public games. The agent fully solved 7 games, exceeded 75% Relative Human Action Efficiency (RHAE) on 6 games, and reached a mean per-game RHAE of 32.58%, which suggests that this general approach can work across different games. The system's performance on the private validation set is still untested. The authors suggest that verifiers and executable world models are a promising way to build game-general agents.

executable world model, ARC-AGI-3, Python world model, verifier programs, plan executor, game-general agent, minimum description length (MDL) bias, Relative Human Action Efficiency (RHAE), run-to-run variability, predefined interfaces

Authors

Sergey Rodionov

Abstract

We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency (RHAE) greater than 75% on 6 games, and obtained a mean per-game RHAE of 32.58%. Because the system uses no game-specific code, it can serve as a game-general baseline for ARC-AGI-3. Performance on the private validation set remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents.
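
The verify-then-plan loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `verify`, `plan`, and `toy_step`, the `(state, action, next_state)` transition format, the breadth-first planner, and the one-dimensional toy game are all assumptions made for illustration. The sketch only shows the general idea of replaying recorded observations through an executable model and then searching that model for an action sequence.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# Hypothetical types for this sketch: any hashable state, string actions.
State = Hashable
Action = str
Transition = Tuple[State, Action, State]  # (state, action, observed next state)


def verify(step: Callable[[State, Action], State],
           history: Iterable[Transition]) -> List[Transition]:
    """Replay recorded transitions through the model and return the mismatches."""
    return [(s, a, s_next) for s, a, s_next in history if step(s, a) != s_next]


def plan(step: Callable[[State, Action], State],
         start: State,
         is_goal: Callable[[State], bool],
         actions: List[Action],
         max_depth: int = 20) -> Optional[List[Action]]:
    """Breadth-first search through the model for a shortest action sequence."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue  # do not expand beyond the depth limit
        for action in actions:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # no plan found within max_depth


# Toy world model (an assumption): a token on a 5-cell track, clamped at both ends.
def toy_step(pos: int, action: Action) -> int:
    delta = {"left": -1, "right": 1}[action]
    return min(4, max(0, pos + delta))


# Previously observed transitions that the model must reproduce.
history = [(0, "right", 1), (1, "right", 2), (0, "left", 0)]

mismatches = verify(toy_step, history)  # an empty list means the model is consistent
route = plan(toy_step, 0, lambda s: s == 3, ["left", "right"])
```

In this toy setting, a non-empty `mismatches` list would be the signal to revise the model before planning; only a model that reproduces every recorded transition is used to search for `route`.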
