FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
2026-05-04 • Computation and Language
AI summary
The authors present FlexSQL, a system that converts natural language questions into database queries more flexibly. Unlike previous methods that retrieve the database schema only once, FlexSQL can check the schema, look at actual data, and test its queries at any point in the process. It can try different query plans and fix its mistakes by going back to earlier steps. On the Spider2-Snow benchmark it scores 65.4% with gpt-oss-120b, outperforming strong open-source baselines. The authors highlight flexibility in exploring the database and executing queries as important for improving results.
Text-to-SQL, Database schema, Query execution plan, Natural language processing, Data grounding, Error backtracking, Analytical databases, SQL, Open-source models, Spider benchmark
Authors
Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen
Abstract
Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery from early mistakes. We present FlexSQL, a text-to-SQL agent whose core design principle is flexible database interaction: the agent can explore schema structure, inspect data values, and run verification queries at any point during reasoning. FlexSQL generates diverse execution plans to cover multiple query interpretations, implements each plan in either SQL or Python depending on the task, and uses a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions. On Spider2-Snow, using gpt-oss-120b, FlexSQL achieves a 65.4% score, outperforming strong open-source baselines that use stronger, larger models such as gpt-o3 and DeepSeek-R1. When integrated into a general-purpose coding agent (as skills in Claude Code), our approach yields over 10% relative improvement on Spider2-Snow. Further analysis shows that flexible exploration and flexible execution jointly contribute to the effectiveness of our approach, highlighting flexibility as a key design principle. Our code is available at: https://github.com/StringNLPLAB/FlexSQL
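The two-tiered repair idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function `run_with_two_tier_repair`, the plan names, and the `implement` and `repair_code` callables are hypothetical stand-ins for model calls, and an in-memory SQLite database stands in for the analytical database. The sketch only shows the control flow, where code-level fixes are tried first and the loop backtracks to a different plan when they are exhausted.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple


def run_with_two_tier_repair(
    conn: sqlite3.Connection,
    plans: List[str],
    implement: Callable[[str], str],
    repair_code: Callable[[str, str], str],
    max_code_repairs: int = 2,
) -> Tuple[Optional[list], List[str]]:
    """Try each plan in order; within a plan, attempt code-level repairs first."""
    trace: List[str] = []
    for plan in plans:  # plan-level tier: each plan is one query interpretation
        sql = implement(plan)
        for attempt in range(max_code_repairs + 1):  # code-level tier
            try:
                rows = conn.execute(sql).fetchall()
                trace.append(f"ok:{plan}")
                return rows, trace
            except sqlite3.Error as exc:
                trace.append(f"error:{plan}:{attempt}")
                if attempt < max_code_repairs:
                    # Hypothetical hook: a model would rewrite the SQL here.
                    sql = repair_code(sql, str(exc))
        # Code-level repairs exhausted: backtrack and revise at the plan level.
        trace.append(f"backtrack:{plan}")
    return None, trace


# Toy database standing in for a real analytical warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20)])

# Hypothetical plan implementations: the first targets a missing table
# (not fixable at the code level), the second has a repairable column typo.
plan_sql = {
    "by_customer": "SELECT SUM(amount) FROM customer_orders",
    "total": "SELECT SUM(amout) FROM orders",
}


def toy_repair(sql: str, error: str) -> str:
    # Stand-in for a model-driven fix: only corrects the known column typo.
    return sql.replace("amout", "amount")


rows, trace = run_with_two_tier_repair(
    conn, ["by_customer", "total"], plan_sql.__getitem__, toy_repair
)
print(rows)
print(trace)
```

In this toy run the first plan fails every code-level attempt and triggers a plan-level backtrack, while the second plan succeeds after a single code-level repair.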