Operads for compositional reasoning in LLMs

2026-06-11 · Computation and Language

AI summary

The authors propose operads, mathematical structures that model many-in, one-out operations and their compositions, as a framework for describing how a complex question can be broken into simpler sub-questions whose answers are combined into a final answer. They define a questions operad in which operations are question templates and composition is substitution of sub-answers, and they interpret QA models as algebras over this operad. They introduce 'operadic consistency', which measures whether a QA model's answers agree across the partial collapses of a question decomposition tree. Their companion paper reports that operadic consistency is strongly correlated with accuracy across twelve LLMs and four multi-hop QA datasets, and that it outperforms standard temperature-based self-consistency baselines. Overall, the authors argue that operads are the natural mathematical foundation for question decomposition in multi-step question answering with large language models.

operads, question decomposition, large language models (LLMs), multi-hop question answering, algebras over operads, operadic consistency, mathematical foundation, reasoning, self-consistency, question templates
Authors
Nathaniel Bottman, Kyle Richardson
Abstract
Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical foundation. In this paper, we propose operads, mathematical structures that model many-in, one-out operations and compositions thereof, as a natural framework for describing question decomposition. We define the questions operad $Q$, in which operations correspond to question templates and composition corresponds to substitution of sub-answers, and show how QA models can be interpreted as algebras over $Q$. Beyond reframing existing practice, this operadic perspective points toward new methods, in particular a notion of operadic consistency, which measures whether a QA model's answers agree across the partial collapses of a question decomposition tree. Empirical evaluation of operadic consistency is reported in our companion paper (Bottman, Liu, and Richardson, 2026), which finds it strongly correlated with accuracy across twelve LLMs and four multi-hop QA datasets and outperforming standard temperature-based self-consistency baselines. We argue that operads are the natural mathematical home for question decomposition, and that invariants such as operadic consistency open new directions for analyzing and improving the reliability of multi-step reasoning.
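
To make the ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' construction: the `Node` class, the `phrase` field used to inline a sub-question as a noun phrase, the choice to collapse edges independently, the majority-agreement score, and the fictional lookup-table model are all assumptions made for illustration. Each tree node carries a question template with one slot per child. An uncollapsed edge is answered by the model and the sub-answer is substituted into the parent; a collapsed edge inlines the child's wording so that parent and child are asked as a single composite question. The score is the fraction of partial collapses whose final answer matches the most common one.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, List, Set, Tuple

Path = Tuple[int, ...]  # position of a child in the tree, e.g. (0, 0)


@dataclass
class Node:
    """A question template with one slot per child (hypothetical representation)."""
    template: str   # full question, e.g. "What is the capital of {0}?"
    phrase: str     # noun-phrase form used when this node is inlined (an assumption)
    children: List["Node"] = field(default_factory=list)


def edges(node: Node, path: Path = ()) -> List[Path]:
    """List every parent-child edge, identified by the child's path."""
    out: List[Path] = []
    for i, child in enumerate(node.children):
        out.append(path + (i,))
        out.extend(edges(child, path + (i,)))
    return out


def slot_fillers(node: Node, model: Callable[[str], str],
                 collapsed: Set[Path], path: Path = ()) -> List[str]:
    """Fill each slot with the child's inlined phrase (collapsed edge)
    or with the model's answer to the child's question (uncollapsed edge)."""
    fillers = []
    for i, child in enumerate(node.children):
        child_path = path + (i,)
        inner = slot_fillers(child, model, collapsed, child_path)
        if child_path in collapsed:
            fillers.append(child.phrase.format(*inner))
        else:
            fillers.append(model(child.template.format(*inner)))
    return fillers


def answer(root: Node, model: Callable[[str], str], collapsed: Set[Path]) -> str:
    """Final answer for one partial collapse of the decomposition tree."""
    return model(root.template.format(*slot_fillers(root, model, collapsed)))


def operadic_consistency(root: Node, model: Callable[[str], str]) -> float:
    """Share of partial collapses agreeing with the most common final answer
    (an illustrative score, not necessarily the paper's definition)."""
    es = edges(root)
    answers = []
    for mask in product([False, True], repeat=len(es)):
        collapsed = {e for e, chosen in zip(es, mask) if chosen}
        answers.append(answer(root, model, collapsed))
    top = max(set(answers), key=answers.count)
    return answers.count(top) / len(answers)


# Fictional three-hop question and a lookup table standing in for an LLM.
tree = Node(
    "What is the capital of {0}?", "the capital of {0}",
    [Node("In which country was {0} born?", "the country where {0} was born",
          [Node("Who wrote the novel Blue Harbor?",
                "the author of the novel Blue Harbor")])],
)

TABLE = {
    "Who wrote the novel Blue Harbor?": "Ada Vale",
    "In which country was Ada Vale born?": "Norland",
    "What is the capital of Norland?": "Port Eln",
    "In which country was the author of the novel Blue Harbor born?": "Norland",
    "What is the capital of the country where Ada Vale was born?": "Port Eln",
    # The fully collapsed question gets a different (inconsistent) answer.
    "What is the capital of the country where the author of the novel "
    "Blue Harbor was born?": "Sud Bay",
}


def toy_model(question: str) -> str:
    return TABLE[question]


print(operadic_consistency(tree, toy_model))  # 0.75 for this toy table
```

In this toy example the tree has two edges, so there are four partial collapses. Three of them agree on the final answer and the fully collapsed question does not, which gives a score of 0.75. With a real LLM, `toy_model` would be replaced by a call to the model, and answers would likely need normalization before they are compared.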