Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

2026-06-04 • Artificial Intelligence

Artificial Intelligence

AI summaryⓘ

The authors present Goedel-Architect, a system that helps prove math theorems in Lean 4 by first creating a 'blueprint' of smaller lemmas and definitions needed for the main proof. This blueprint guides the system to tackle lemmas in parallel, improving efficiency compared to older methods that might get stuck on dead ends. They use a large language model called DeepSeek-V4-Flash to help generate and refine these blueprints, achieving very high success rates on several benchmark math problem sets. Adding natural language hints lets the system solve the hardest problems too. Their method is notable for strong performance while being much cheaper to run than similar open-source tools.

Formal theorem provingLean 4Blueprint generationLemmaDependency graphParallel proof searchDeepSeek-V4-FlashMiniF2F-testPutnamBenchNatural language proof

Authors

Jui-Hui Chung, Ziyang Cai, Zihao Li, Qishuo Yin, Rohit Agarwal, Simon Park, Rodrigo Porto, Narutatsu Ri, Ziran Yang, Shange Tang, Xingyu Dang, Hongzhou Lin, Mengdi Wang, Danqi Chen, Chi Jin, Liam H Fowl, Sanjeev Arora

Abstract

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with declared dependencies. This blueprint is optionally guided by a natural language proof. Then, a tool-equipped Lean prover component closes each open lemma node in parallel using relevant dependencies. Failed lemmas in turn drive refinement of the global blueprint. This strategy contrasts with other mainstream approaches which use recursive lemma decomposition, and can inefficiently loop on dead-end strategies. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, we additionally close the remaining two MiniF2F-test problems (reaching 100%), lift PutnamBench to 88.8% (597/672), and solve 4/6 on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. This represents state-of-the-art performance for an open-source pipeline at a price point up to 500x less than comparable open-source pipelines.

View PDFOpen arXiv