Implicit Representations of Grammaticality in Language Models

2026-05-06
Computation and Language

AI summary

The authors studied whether language models (LMs) represent grammaticality as something distinct from how likely a sentence is. They trained a simple probe to detect grammatical sentences from the inner workings of LMs and found it worked better than using a sentence's probability score alone. The probe also transferred well to grammar tests in other languages, but was less effective at judging whether sentences made sense meaning-wise. Overall, the authors show that LMs may have a hidden sense of grammar that goes beyond predicting word patterns.

Keywords
language models, grammaticality, string probability, linear probe, synthetic ungrammatical sentences, minimal pairs, semantic plausibility, cross-lingual generalization, internal representations, pretrained models
Authors
Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim
Abstract
Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. Do LMs nonetheless implicitly acquire a grammaticality distinction separate from string probability? We explore this question by studying the internal representations of LMs, training a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. On semantic plausibility benchmarks, in which both members of a minimal pair are grammatical and differ only in plausibility, the probe however performs worse than string probability. The English-trained probe also exhibits nontrivial cross-lingual generalization, outperforming string probabilities on grammaticality benchmarks in numerous other languages. Additionally, probe scores correlate only weakly with string probabilities. These results collectively suggest that LMs acquire, to some extent, an implicit grammaticality distinction within their hidden layers.
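The core technique the abstract describes, a linear probe trained on frozen hidden states to classify grammatical versus ungrammatical sentences, can be sketched as follows. This is a minimal, self-contained illustration, not the authors' implementation: the hidden states are synthetic stand-ins (in the paper they would come from a pretrained LM), and the dimensionality, learning rate, and class-separation direction are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16   # toy hidden-state dimensionality (real LMs use hundreds or thousands)
n = 200  # sentences per class

# Synthetic stand-ins for sentence-level hidden states: the two classes
# are offset along one random direction, mimicking a linearly decodable
# grammaticality signal in the representation space.
direction = rng.normal(size=d)
grammatical = rng.normal(size=(n, d)) + 0.8 * direction
ungrammatical = rng.normal(size=(n, d)) - 0.8 * direction
X = np.vstack([grammatical, ungrammatical])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = grammatical

# The probe itself: a single linear layer plus sigmoid (logistic
# regression), fit by gradient descent while the "LM" stays frozen.
w, b = np.zeros(d), 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(grammatical)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

acc = np.mean(((X @ w + b) > 0) == y.astype(bool))
print(f"probe training accuracy: {acc:.2f}")
```

The point of keeping the probe linear is that high accuracy implies the distinction is already encoded (roughly linearly) in the hidden states, rather than being computed by the probe itself.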