Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs
2026-04-09 • Artificial Intelligence
Artificial Intelligence, Machine Learning
AI summary
The authors explore abductive reasoning, which is figuring out the best explanation for something you observe, within large language models (LLMs). They highlight that this area has been studied in bits and pieces without a clear shared understanding. To fix this, they propose a clear two-step framework: first, generating possible explanations, then choosing the most likely one. They also review the research landscape, run tests comparing LLMs on these tasks, and identify important gaps like limited task variety and understanding of how models reason abductively.
Abductive reasoning, Large Language Models, Hypothesis Generation, Hypothesis Selection, Deductive reasoning, Inductive reasoning, Benchmarking, Reasoning tasks, AI evaluation, Taxonomy
Authors
Moein Salimi, Shaygan Adim, Danial Parnian, Nima Alighardashi, Mahdi Jafari Siavoshani, Mohammad Hossein Rohban
Abstract
Despite its foundational role in human discovery and sense-making, abductive reasoning, the inference of the most plausible explanation for an observation, has been relatively underexplored in Large Language Models (LLMs). Even as LLMs have advanced rapidly, work on abductive reasoning and its diverse facets has remained disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work by abductive task, dataset, underlying methodology, and evaluation strategy. To ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to performance on deductive and inductive tasks, providing insights into the models' broader reasoning capabilities. Our analysis reveals critical gaps in current approaches, from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...