Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery
2026-06-03 • Software Engineering
Software Engineering, Artificial Intelligence
AI summary
The authors studied how AI agents handle errors when calling APIs. They show that when an API returns not just an error message but clear, machine-readable suggestions for fixing the error, AI agents complete significantly more tasks. The improvement was significant for Anthropic models but not for gpt-4o-mini, and a replication on a second (billing) API showed the same pattern. They also identified two hidden forms of answer leakage in benchmark tests and released a tool to audit for them.
AI agent, API, validation error, machine-readable feedback, task completion, large language models, Anthropic models, benchmark auditing, token efficiency, self-reflective APIs
Authors
Arquimedes Canedo, Grama Chethan
Abstract
When an AI agent calls an API and hits a validation error, it needs more than a description of what went wrong: it needs to know what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning. On a leak-audited pilot ($N{=}30$ per cell, 3 LLMs, 10 adversarial tasks), structured suggestions lift task-completion rate by $+36.7$--$40.0$pp over plain-English diagnoses on Anthropic models (Fisher's exact $p \le 0.0022$), with $1.8$--$2.2\times$ better per-success token efficiency. The lift is not significant on gpt-4o-mini ($p{=}0.435$); a second-domain replication on a billing API confirms the pattern. The comparison holds only after auditing two previously undocumented classes of answer leakage in LLM benchmarks. We ship audit_prompt_leakage.py as reusable CI infrastructure. Code and data: https://github.com/arquicanedo/self-reflective-apis.
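The abstract fixes only where the recovery payload lives (recovery_feedback.suggestions[], returned on validation failure). The Python sketch below illustrates the idea under stated assumptions: the billing-style fields, the helpers validate, apply_suggestions and call_with_retry, and the suggestion schema ({"op": "rename" | "set", ...}) are all hypothetical and are not the authors' API. What it aims to show is that a client can repair a request and retry by applying each suggestion mechanically, with no free-text reasoning.

```python
from typing import Any

# Hypothetical billing endpoint schema (illustrative only, not from the paper).
ALLOWED_FIELDS = {"customer_id", "amount_cents", "currency"}
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}
# Hypothetical near-miss field names the API knows how to correct.
FIELD_ALIASES = {"customer": "customer_id"}


def validate(request: dict[str, Any]) -> dict[str, Any]:
    """Validate a request; on failure, attach machine-readable recovery feedback."""
    errors: list[str] = []
    suggestions: list[dict[str, Any]] = []
    for field in request:
        if field not in ALLOWED_FIELDS:
            errors.append(f"unknown field {field!r}")
            if field in FIELD_ALIASES:
                # Assumed suggestion shape: rename one key to another.
                suggestions.append({"op": "rename", "from": field, "to": FIELD_ALIASES[field]})
    currency = request.get("currency")
    if isinstance(currency, str) and currency not in ALLOWED_CURRENCIES:
        errors.append(f"unsupported currency {currency!r}")
        if currency.upper() in ALLOWED_CURRENCIES:
            # Assumed suggestion shape: set a field to a concrete value.
            suggestions.append({"op": "set", "field": "currency", "value": currency.upper()})
    if errors:
        return {"ok": False, "errors": errors,
                "recovery_feedback": {"suggestions": suggestions}}
    return {"ok": True}


def apply_suggestions(request: dict[str, Any],
                      suggestions: list[dict[str, Any]]) -> dict[str, Any]:
    """Repair a copy of the request by applying each suggestion literally."""
    repaired = dict(request)
    for s in suggestions:
        if s["op"] == "rename":
            repaired[s["to"]] = repaired.pop(s["from"])
        elif s["op"] == "set":
            repaired[s["field"]] = s["value"]
    return repaired


def call_with_retry(request: dict[str, Any],
                    max_attempts: int = 3) -> tuple[dict[str, Any], dict[str, Any]]:
    """Call, and on failure retry with the API's own suggestions applied."""
    response = validate(request)
    attempts = 1
    while not response["ok"] and attempts < max_attempts:
        suggestions = response["recovery_feedback"]["suggestions"]
        if not suggestions:
            break  # nothing actionable: the agent would have to reason on its own
        request = apply_suggestions(request, suggestions)
        response = validate(request)
        attempts += 1
    return request, response
```

With this sketch, {"customer": "c_42", "amount_cents": 1250, "currency": "usd"} fails once, receives one rename and one set suggestion, and passes on the second attempt as {"customer_id": "c_42", "amount_cents": 1250, "currency": "USD"}. When the API cannot name a concrete fix (for example an unsupported currency such as "JPY"), suggestions[] is empty and the loop stops, leaving the agent to fall back on its own reasoning.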
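The significance claims above rest on Fisher's exact test over per-cell success/failure counts ($N{=}30$ per cell). As a sketch of what that test computes, assuming the standard two-sided variant (the abstract does not state which variant was used), the function below sums the hypergeometric probabilities of every 2x2 table with the same margins that is no more probable than the observed one. The function name is illustrative, and no counts from the paper are used here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. structured suggestions vs plain-English diagnosis);
    columns are outcomes (success, failure).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in row 1, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small relative
    # tolerance treats floating-point near-ties as ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For Fisher's classic tea-tasting table, fisher_exact_two_sided(3, 1, 1, 3) gives 34/70 ≈ 0.486; scipy.stats.fisher_exact with its default two-sided alternative computes the same statistic.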