SHERLOC: Structured Diagnostic Localization for Code Repair Agents

2026-06-23 • Computation and Language

AI summary

The authors study how large language model (LLM) agents fix bugs in large code repositories. They observe that these agents spend about half their budget locating the fault before attempting a fix. To address this, they introduce SHERLOC, a training-free method that combines a reasoning LLM with compact repository tools and self-recovery to pinpoint faulty locations. SHERLOC reaches state-of-the-art localization on SWE-Bench Lite and SWE-Bench Verified, and feeding its locations and diagnostic findings to repair agents raises their resolve rate while reducing token usage.

Large Language Models, Fault Localization, Code Repair, Tool Use, Multi-turn Reasoning, SWE-Bench, Self-recovery, Parameter Scale, Agentic Methods, Diagnostic Context

Authors

Hovhannes Tamoyan, Sean Narenthiran, Erik Arakelyan, Mira Mezini, Boris Ginsburg

Abstract

LLM agents solve repository-level coding tasks through multi-turn tool use, but spend half their budget locating faults before editing. Dedicated localization frameworks have emerged, yet they are still evaluated as file retrieval rather than actionable diagnosis, producing locations without the diagnostic context a repair agent needs. We introduce SHERLOC (Structured Hypothesis-driven Exploration and Reasoning for Localization), a training-free framework that pairs a reasoning LLM with compact repository tools and self-recovery, without fine-tuning or multi-agent orchestration. SHERLOC reaches state-of-the-art localization across model scales: 84.33% accuracy@1 on SWE-Bench Lite and 81.27% recall@1 on SWE-Bench Verified; at ~30B parameters, it matches or outperforms other agentic methods. Injecting our locations and diagnostic findings into repair agents yields, on average, +5.95 pp resolve rate on SWE-Bench Verified while cutting localization and total tokens by 36.7% and 23.1%, respectively.
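
The abstract reports accuracy@1 and recall@1 without defining them here. As a rough illustration only, the sketch below computes file-level accuracy@k and recall@k under one common convention: accuracy@k is 1 when every gold file appears in the top-k predictions, and recall@k is the fraction of gold files found in the top-k. These definitions, the function names, and the file paths are assumptions made for illustration and may differ from the paper's exact evaluation protocol.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold locations that appear in the top-k predictions (assumed definition)."""
    if not gold:
        raise ValueError("gold set must not be empty")
    top_k = set(ranked[:k])
    return len(top_k & gold) / len(gold)


def accuracy_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """1.0 if all gold locations are within the top-k predictions, else 0.0 (assumed definition)."""
    if not gold:
        raise ValueError("gold set must not be empty")
    return 1.0 if gold <= set(ranked[:k]) else 0.0


# Hypothetical example: one issue whose fix touches two files.
ranked_files = ["pkg/parser.py", "pkg/utils.py", "pkg/lexer.py"]
gold_files = {"pkg/parser.py", "pkg/lexer.py"}

r1 = recall_at_k(ranked_files, gold_files, k=1)    # one of two gold files is ranked first
a1 = accuracy_at_k(ranked_files, gold_files, k=1)  # not all gold files fit in the top-1
r3 = recall_at_k(ranked_files, gold_files, k=3)
a3 = accuracy_at_k(ranked_files, gold_files, k=3)
```

Under these assumed definitions, a benchmark-level score would be the mean of the per-issue values. For issues with a single gold file the two metrics coincide at every k.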
