AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
2026-03-04 • Computation and Language
AI summary
The authors observe that Deep Research agents write out their reasoning before each search call, whereas human users typically issue and refine queries without recording their thought process. They introduce Reasoning-Aware Retrieval, which embeds this reasoning trace jointly with the search query, and DR-Synth, a method that synthesizes retriever training data from standard question-answer datasets. Each component helps on its own, and the combined model, AgentIR-4B, reaches 68% accuracy on the BrowseComp-Plus benchmark with the Tongyi-DeepResearch agent, compared with 50% for conventional embedding models twice its size and 37% for BM25.
Deep Research agents, Reasoning-Aware Retrieval, embedding models, data synthesis, retrieval systems, training data, BM25, BrowseComp-Plus benchmark, natural language reasoning, query embedding
Authors
Zijian Chen, Xueguang Ma, Shengyao Zhuang, Jimmy Lin, Akari Asai, Victor Zhong
Abstract
Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without documenting their intermediate thought processes, Deep Research agents generate explicit natural language reasoning before each search call, revealing rich intent and contextual information that existing retrievers entirely ignore. To exploit this overlooked signal, we introduce: (1) Reasoning-Aware Retrieval, a retrieval paradigm that jointly embeds the agent's reasoning trace alongside its query; and (2) DR-Synth, a data synthesis method that generates Deep Research retriever training data from standard QA datasets. We demonstrate that both components are independently effective, and their combination yields a trained embedding model, AgentIR-4B, with substantial gains. On the challenging BrowseComp-Plus benchmark, AgentIR-4B achieves 68% accuracy with the open-weight agent Tongyi-DeepResearch, compared to 50% with conventional embedding models twice its size, and 37% with BM25. Code and data are available at: https://texttron.github.io/AgentIR/.
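The idea of embedding the reasoning trace together with the query can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the AgentIR code: the `build_retrieval_input` template, the toy bag-of-words `toy_embed` function standing in for a trained embedding model, and the two example documents. The sketch only shows how an ambiguous query can be disambiguated once the agent's reasoning is part of the retrieval input.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_retrieval_input(reasoning: str, query: str) -> str:
    """Hypothetical template: join the reasoning trace and the query into one string."""
    return f"Reasoning: {reasoning}\nQuery: {query}"


def rank(text: str, docs: list[str]) -> list[int]:
    """Return document indices sorted by descending similarity to `text`."""
    vec = toy_embed(text)
    scores = [cosine(vec, toy_embed(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Illustrative corpus: the bare query "jaguar" is ambiguous between the two.
docs = [
    "Jaguar is a British manufacturer of luxury cars.",
    "The jaguar is a large cat native to the rainforests of the Americas.",
]
reasoning = (
    "The question is about animals, so I need the habitat "
    "of the large cat in the rainforests."
)
query = "jaguar"

# Conventional retrieval: only the query string is embedded.
query_only_ranking = rank(query, docs)

# Reasoning-aware retrieval: the reasoning trace is embedded with the query.
reasoning_aware_ranking = rank(build_retrieval_input(reasoning, query), docs)

print("query only:     ", query_only_ranking)
print("reasoning-aware:", reasoning_aware_ranking)
```

In this toy setup the query alone prefers the shorter car document, while the reasoning-augmented input moves the animal document to the top. A real system would replace `toy_embed` with a dense embedding model trained on inputs of this form, which is the role the abstract attributes to DR-Synth data and AgentIR-4B.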