Think like a Scientist: Physics-guided LLM Agent for Equation Discovery
2026-02-12 • Artificial Intelligence
Artificial Intelligence, Machine Learning
AI summary
The authors present KeplerAgent, a system that mimics how scientists discover formulas: it first infers physical properties such as symmetries and only then proposes equations. Unlike methods that guess equations directly from data, KeplerAgent follows a step-by-step reasoning process and uses physics-based tools to guide its search. This helps it recover more accurate and reliable formulas, especially when the data is noisy. On a suite of physical equation benchmarks, it outperformed both LLM-based and traditional baselines.
symbolic equation discovery, large language models, symbolic regression, physical properties, symmetry, PySINDy, PySR, noisy data, scientific reasoning, agentic framework
Authors
Jianke Yang, Ohm Venkatachalam, Mohammad Kianezhad, Sharvaree Vadgama, Rose Yu
Abstract
Explaining observed phenomena through symbolic, interpretable formulas is a fundamental goal of science. Recently, large language models (LLMs) have emerged as promising tools for symbolic equation discovery, owing to their broad domain knowledge and strong reasoning capabilities. However, most existing LLM-based systems try to guess equations directly from data, without modeling the multi-step reasoning process that scientists often follow: first inferring physical properties such as symmetries, then using these as priors to restrict the space of candidate equations. We introduce KeplerAgent, an agentic framework that explicitly follows this scientific reasoning process. The agent coordinates physics-based tools to extract intermediate structure and uses these results to configure symbolic regression engines such as PySINDy and PySR, including their function libraries and structural constraints. Across a suite of physical equation benchmarks, KeplerAgent achieves substantially higher symbolic accuracy and greater robustness to noisy data than both LLM and traditional baselines.
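As a toy illustration of the "properties first, equations second" workflow described in the abstract, the sketch below first tests noisy 1D data for odd or even symmetry, then uses that result to restrict a polynomial candidate library before running a thresholded least-squares fit. This is not KeplerAgent's implementation: the function names (`detect_parity`, `fit_sparse`), the toy target, the tolerances, and the hand-written sparse regression are all assumptions made for illustration, and the simple thresholding loop merely stands in for a real engine such as PySINDy or PySR.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def least_squares(columns, y):
    """Fit y ~ sum_k c_k * columns[k] via the normal equations."""
    k = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * t for a, t in zip(columns[i], y)) for i in range(k)]
    return solve_linear(gram, rhs)

def detect_parity(ys, tol=0.1):
    """Hypothetical symmetry check: ys must be sampled on a grid symmetric
    about 0, so ys[i] and ys[-1 - i] correspond to x and -x."""
    n = len(ys)
    scale = sum(abs(v) for v in ys) / n
    odd_resid = sum(abs(ys[i] + ys[n - 1 - i]) for i in range(n)) / n
    even_resid = sum(abs(ys[i] - ys[n - 1 - i]) for i in range(n)) / n
    if odd_resid < tol * scale:
        return "odd"
    if even_resid < tol * scale:
        return "even"
    return None

def fit_sparse(xs, ys, degrees, threshold=0.1, iters=5):
    """Thresholded least squares over monomials x**d for d in degrees."""
    active = list(degrees)
    coefs = {}
    for _ in range(iters):
        cols = [[x ** d for x in xs] for d in active]
        coefs = dict(zip(active, least_squares(cols, ys)))
        kept = [d for d in active if abs(coefs[d]) >= threshold]
        if kept == active:
            break
        if not kept:
            return {}
        active = kept
    return coefs

# Toy data (assumption): y = 2x^3 - x plus Gaussian noise, symmetric grid.
rng = random.Random(0)
xs = [i / 10 for i in range(-20, 21)]
ys = [2 * x ** 3 - x + rng.gauss(0.0, 0.05) for x in xs]

# Step 1: infer a physical property (here, parity) from the data.
parity = detect_parity(ys)

# Step 2: use it as a prior to restrict the candidate function library.
all_degrees = range(6)
if parity == "odd":
    degrees = [d for d in all_degrees if d % 2 == 1]
elif parity == "even":
    degrees = [d for d in all_degrees if d % 2 == 0]
else:
    degrees = list(all_degrees)

# Step 3: run sparse regression on the restricted library.
model = fit_sparse(xs, ys, degrees)
print(parity, {d: round(c, 3) for d, c in model.items()})
```

Restricting the library by parity halves the number of candidate terms before any fitting happens, which is the kind of search-space reduction the abstract attributes to physical priors; a real system would pass an analogous restricted library or constraint set to the regression engine instead of this hand-rolled fit.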