Hidden Biases in Conditioning Autoregressive Models
2026-04-09 • Artificial Intelligence
AI summary
The authors explain that when large language or music models are used to generate outputs under specific rules (such as rhyme or fixed meter), the usual methods do not sample from the model's true distribution conditioned on those rules, which introduces a hidden bias. They show that exact sentence-level MAP decoding is NP-hard and that computing the normalization needed for exact constrained sampling is #P-hard. This hardness holds even for simple constraints, and, unlike finite-state Markov models, general autoregressive models admit no bounded-state dynamic program for these tasks. Their work formalizes the claim that step-by-step generation is easy, whereas exact decoding and exact conditioning under global constraints are intractable in general.
autoregressive models, constrained generation, NP-hard, #P-hard, maximum a posteriori decoding, sampling normalization, finite-state Markov models, dynamic programming, global form constraints, neural decoding
Authors
Francois Pachet, Pierre Roy
Abstract
Large language and music models are increasingly used for constrained generation: rhyming lines, fixed meter, inpainting or infilling, positional endings, and other global form requirements. These systems often perform strikingly well, but the induced procedures are usually not exact conditioning of the underlying autoregressive model. This creates a hidden inferential bias, distinct from the better-known notion of bias inherited from the training set: samples are distorted relative to the true constrained distribution, with no generic guarantee of complete coverage of the admissible solution space or of correct conditional probabilities over valid completions. We formalize several exact inference tasks for autoregressive models and prove corresponding hardness results. For succinctly represented autoregressive models whose next-token probabilities are computable in polynomial time, exact sentence-level maximum a posteriori (MAP) decoding is NP-hard. This hardness persists under unary and metrical constraints. On the sampling side, exact conditioned normalization is #P-hard even for regular constraints such as fixed-length terminal events. Unlike finite-state Markov models, general autoregressive models do not admit a bounded-state dynamic program for these tasks. These results formalize a standard claim in the neural decoding literature: local autoregressive sampling is easy, whereas exact decoding and exact conditioning under global form constraints are computationally intractable in general.
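The gap between exact conditioning and a common constrained-generation heuristic can be made concrete on a toy example. The sketch below is illustrative only and is not taken from the paper: the two-token vocabulary, the probabilities in `next_token_probs`, the sequence length, and the "final token must be b" constraint are all assumptions chosen for readability. It compares (1) the exact conditional distribution, obtained by brute-force enumeration and division by the normalizer Z, with (2) the distribution induced by masking invalid tokens at each step and renormalizing locally.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setup (not from the paper): two tokens, sequences of length 2.
VOCAB = ("a", "b")
LENGTH = 2


def next_token_probs(prefix):
    """Assumed toy autoregressive model: P(next token | prefix)."""
    if not prefix:
        return {"a": Fraction(1, 2), "b": Fraction(1, 2)}
    if prefix[-1] == "a":
        return {"a": Fraction(9, 10), "b": Fraction(1, 10)}
    return {"a": Fraction(1, 10), "b": Fraction(9, 10)}


def allowed(position, token):
    """Assumed constraint: the final token must be 'b' (a positional ending)."""
    return token == "b" if position == LENGTH - 1 else True


def sequence_prob(seq):
    """Unconstrained model probability of a full sequence (chain rule)."""
    p = Fraction(1)
    for i, tok in enumerate(seq):
        p *= next_token_probs(seq[:i])[tok]
    return p


def exact_conditional():
    """Exact conditioning by enumeration: exponential in LENGTH in general."""
    valid = [
        s for s in product(VOCAB, repeat=LENGTH)
        if all(allowed(i, t) for i, t in enumerate(s))
    ]
    weights = {s: sequence_prob(s) for s in valid}
    z = sum(weights.values())  # the normalizer over all valid sequences
    return {"".join(s): w / z for s, w in weights.items()}, z


def masked_decoding_distribution():
    """Distribution induced by per-step masking plus local renormalization."""
    dist = {}
    for seq in product(VOCAB, repeat=LENGTH):
        p = Fraction(1)
        for i, tok in enumerate(seq):
            probs = next_token_probs(seq[:i])
            kept = {t: q for t, q in probs.items() if allowed(i, t)}
            if tok not in kept:
                p = Fraction(0)
                break
            p *= kept[tok] / sum(kept.values())
        if p:
            dist["".join(seq)] = p
    return dist


if __name__ == "__main__":
    exact, z = exact_conditional()
    print("Z =", z)
    print("exact conditional:", exact)
    print("masked decoding:  ", masked_decoding_distribution())
```

In this toy model both procedures produce only valid sequences, yet they disagree on their probabilities: exact conditioning assigns 9/10 to "bb" and 1/10 to "ab", while local masking assigns 1/2 to each, because the first token is drawn without accounting for how likely it is to lead to a valid ending. The enumeration in `exact_conditional` grows exponentially with the sequence length, which is where the hardness of exact normalization becomes relevant for general autoregressive models.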