LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

2026-06-02Artificial Intelligence

Artificial Intelligence
AI summary

The authors created LEAP, a system that helps large language models do complex math proofs in a formal language called Lean. LEAP breaks down big math problems into smaller steps, using informal reasoning combined with formal checking to create correct proofs. They also made a new test set, Lean-IMO-Bench, with tricky math problems inspired by the International Math Olympiad. LEAP performed very well, solving all problems in a recent math competition and greatly improving proof success rates compared to previous methods. Additionally, the authors used LEAP to formalize difficult open math problems, demonstrating its usefulness for real research.

Large Language ModelsFormal Theorem ProvingLean Theorem ProverInternational Math OlympiadPutnam CompetitionAutomated ReasoningMathematical ProofsCombinatoricsHamiltonian DecompositionCayley Graphs
Authors
Po-Nien Kung, Linfeng Song, Dawsen Hwang, Jinsung Yoon, Chun-Liang Li, Simone Severini, Mirek Olšák, Edward Lockhart, Quoc V Le, Burak Gokturk, Thang Luong, Tomas Pfister, Nanyun Peng
Abstract
Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance on automated formal theorem proving. LEAP leverages foundation model capabilities, such as informal reasoning, instruction following, and iterative self-refinement. By decomposing complex problems into smaller units, the system bridges formal proof construction with informal blueprints through continuous interaction with the Lean compiler. To provide a rigorous evaluation beyond increasingly saturated benchmarks, we introduce Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean, with short statements yet highly non-routine and multi-step proofs across a wide range of difficulty levels. Empirically, on the latest 2025 Putnam Competition, an annual mathematics competition for undergraduate students in North America, LEAP solves all 12 problems, matching recent breakthroughs by frontier formal mathematical models. On Lean-IMO-Bench, LEAP boosts the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%, notably surpassing the 48% benchmark set by a specialized, gold-medal-caliber IMO system. Furthermore, we demonstrate LEAP's research-level utility by autonomously formalizing complex proofs for open combinatorial challenges, including a verified proof for a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.