VeriTrace: Evolving Mental Models for Deep Research Agents

2026-05-25 • Artificial Intelligence


AI summary

The authors address a problem that arises when AI research agents handle large amounts of uncertain, interdependent information: errors in the agent's intermediate understanding can spread and build up. Instead of leaving the AI to manage that understanding implicitly, they propose giving it explicit, ongoing feedback that keeps it aligned with the evidence. Their system, VeriTrace, uses three such feedback loops to keep the agent's knowledge up to date. In tests where VeriTrace and the baselines used the same underlying language model, VeriTrace outperformed the strongest baseline on two research benchmarks, DeepResearch Bench and DeepConsult.

deep research agents, intermediate representations, large language models, feedback loops, cognitive graph, VeriTrace, model regulation, error propagation, DeepResearch Bench, DeepConsult

Authors

Haolang Zhao, Yunbo Long, Lukas Beckenbauer, Alexandra Brintrup

Abstract

Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations should look like, but leave their evolution to the LLM's implicit reasoning. Without explicit regulation, the intermediate layer is easily contaminated by mixed-quality information and propagates errors along its dependencies, so model scale often ends up substituting for absent regulation. We argue that an agent's mental model should instead evolve through explicit feedback that continuously aligns task understanding with reality, and identify three regulatory loops: interpretive update, deviation feedback, and schema revision. We realise this in VeriTrace, a cognitive-graph framework that explicitly implements the three loops. Using matched Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on DeepResearch Bench (DRB) Insight (1.49 pp Overall) and by 5.9 pp Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DRB.
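The abstract names three loops but does not describe how they are implemented. The sketch below is therefore only an illustration of the general idea: an error in one node of a dependency graph should be regulated explicitly rather than silently inherited by everything downstream. Every name in it (`CognitiveGraph`, `Node`, `interpretive_update`, `deviation_feedback`, the `stale` flag, the weighted-average update rule) is hypothetical. Mapping add/remove onto schema revision is also an assumption. None of this is VeriTrace's actual design.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One claim in a hypothetical cognitive graph (illustrative only)."""
    claim: str
    confidence: float = 0.5          # belief in the claim, in [0, 1]
    stale: bool = False              # True once an upstream claim was contradicted
    depends_on: set = field(default_factory=set)


class CognitiveGraph:
    """Toy sketch of explicit regulation; not VeriTrace's data structure."""

    def __init__(self):
        self.nodes = {}

    def add(self, key, claim, depends_on=()):
        # Schema revision (assumed form): extend the task structure with a new claim.
        missing = set(depends_on) - set(self.nodes)
        if missing:
            raise KeyError(f"unknown dependencies: {sorted(missing)}")
        self.nodes[key] = Node(claim, depends_on=set(depends_on))

    def remove(self, key):
        # Schema revision (assumed form): drop a claim and every edge pointing at it.
        del self.nodes[key]
        for node in self.nodes.values():
            node.depends_on.discard(key)

    def interpretive_update(self, key, evidence, weight=0.5):
        # Interpretive update (assumed rule): move confidence toward the
        # evidence score and clear any re-verification flag.
        node = self.nodes[key]
        node.confidence = (1 - weight) * node.confidence + weight * evidence
        node.stale = False

    def dependents(self, key):
        # Breadth-first search over reverse edges: every claim that builds on `key`.
        seen, queue = set(), deque([key])
        while queue:
            current = queue.popleft()
            for k, node in self.nodes.items():
                if current in node.depends_on and k not in seen:
                    seen.add(k)
                    queue.append(k)
        return seen

    def deviation_feedback(self, key):
        # Deviation feedback (assumed rule): a contradicted claim drops to zero
        # confidence, and everything downstream is flagged for re-verification
        # instead of silently inheriting the error.
        self.nodes[key].confidence = 0.0
        flagged = self.dependents(key)
        for k in flagged:
            self.nodes[k].stale = True
        return flagged


if __name__ == "__main__":
    g = CognitiveGraph()
    g.add("a", "premise")
    g.add("b", "inference from a", depends_on=["a"])
    g.add("c", "conclusion from b", depends_on=["b"])
    g.add("d", "independent claim")
    print(sorted(g.deviation_feedback("a")))  # ['b', 'c']
```

In this toy, contradicting the premise `a` flags both `b` and the transitive dependent `c`, while the unrelated claim `d` is untouched. This is the kind of targeted, explicit containment the abstract contrasts with leaving evolution to the LLM's implicit reasoning.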
