Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit

2026-06-12 • Machine Learning

Machine LearningArtificial IntelligenceComputation and LanguageData Structures and Algorithms

AI summaryⓘ

The authors study how AI systems generate useful formal math proofs checked by computers. They model the problem as trying to find valuable math statements hidden within all verifiable statements, with some known from existing literature. They find that even a perfect checker cannot replace human judgment, because some correct but trivial statements are always needed to cover all valuable math. Allowing a small number of trivial outputs greatly improves coverage, but too many can lead to an overwhelming amount of worthless but correct statements. This shows that generating valuable formal math involves balancing soundness, completeness, and the unavoidable presence of trivial facts.

formal mathematicsproof assistantverifierlanguage generationmembership oracleAngluin's conditioncoveragetriviahallucinationcompression model

Authors

Xiaoyu Li, Andi Han, Dai Shi, Zheng Gao, Jiaojiao Jiang, Junbin Gao

Abstract

AI systems coupled to proof assistants now generate formal mathematics at scale, and the gap between what a checker can verify and what a mathematician would value has become the binding constraint. We model the generation of valuable mathematics as nested language generation in the limit: a verifiable formal language $F$, accessed through a membership oracle (the proof checker), contains an unknown valuable language $H \in \mathcal{H}$ revealed only through an adversarial enumeration of a core $C \subseteq H$ of exact density $α$ (the literature). Every output is valuable ($\in H$), trivial ($\in F \setminus H$), or a hallucination ($\notin F$). We settle four questions. First, the verifier is not taste: the collections admitting generation with breadth are exactly those of the oracle-free model, characterized fiber-wise by Angluin's condition. Second, the verifier does buy sound coverage, covering all unseen valuable statements while asserting only valid ones: possible with it, impossible without it; it relocates unavoidable errors from false to trivial. Third, and centrally, a sharp dichotomy on the tight family: generators emitting finitely many trivia achieve optimal coverage $α/2$, while any infinite trivia allowance, even at vanishing rate, jumps the optimum to $1-α/2$ (both tight, for cores presented as the candidate intersection), and one generator attains both ends. The transition is in trivia count, not rate; the gap $1-α$ is the unrecorded mass. Fourth, both regimes instantiate in a compression model of mathematics. A perfect verifier cannot substitute for taste: the unbounded stream of correct-but-worthless statements is not an engineering accident but a provable necessity, since covering unrecorded valuable mathematics requires an infinite, but asymptotically negligible, stream of certified trivia.

View PDFOpen arXiv