SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval

2026-03-16 · Machine Learning
AI summary

The authors show that complicated AI systems for remembering past chat conversations don't need to be so complex. Their system, SmartSearch, finds important parts of a conversation using simple rule-based methods and only one AI step for ranking results, which runs fast on a basic CPU. They identify that the main problem is fitting the right information within a limited token budget. With smart trimming of results, their method performs better than other systems on two test sets, while using far fewer tokens to process the conversation.
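The "simple rule-based methods" mentioned above can be pictured as keyword matching in which terms flagged as named entities count for more. A minimal sketch, with illustrative names and weights not taken from the paper (the capitalization check stands in for a real NER model):

```python
# Illustrative sketch of entity-weighted substring matching: score each
# conversation turn by which query terms it contains, weighting terms
# that look like named entities more heavily. The entity test here
# (capitalization) is a stand-in for an actual NER tagger.

def score_turn(turn, query_terms, entity_weight=3.0):
    text = turn.lower()
    score = 0.0
    for term in query_terms:
        if term.lower() in text:
            # Stand-in entity check: capitalized terms count as entities.
            score += entity_weight if term[0].isupper() else 1.0
    return score

def retrieve(turns, query_terms, top_k=2):
    """Return the top_k highest-scoring turns for the query."""
    ranked = sorted(turns, key=lambda t: score_turn(t, query_terms),
                    reverse=True)
    return ranked[:top_k]
```

Because the scoring is deterministic string matching, the whole recall stage needs no model inference at all; only the later ranking step does.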

conversational memory · named entity recognition · substring matching · multi-hop retrieval · CrossEncoder · ColBERT · token budget · ranking fusion · deterministic pipeline · retrieval recall
Authors
Jesper Derehag, Carlos Calva, Timmy Ghiurau
Abstract
Recent conversational memory systems invest heavily in LLM-based structuring at ingestion time and learned retrieval policies at query time. We show that neither is necessary. SmartSearch retrieves from raw, unstructured conversation history using a fully deterministic pipeline: NER-weighted substring matching for recall, rule-based entity discovery for multi-hop expansion, and a CrossEncoder+ColBERT rank fusion stage -- the only learned component -- running on CPU in ~650ms. Oracle analysis on two benchmarks identifies a compilation bottleneck: retrieval recall reaches 98.6%, but without intelligent ranking only 22.5% of gold evidence survives truncation to the token budget. With score-adaptive truncation and no per-dataset tuning, SmartSearch achieves 93.5% on LoCoMo and 88.4% on LongMemEval-S, exceeding all known memory systems under the same evaluation protocol on both benchmarks while using 8.5x fewer tokens than full-context baselines.
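The abstract's key move -- fusing CrossEncoder and ColBERT scores, then truncating to a token budget by score rather than by position -- can be sketched as follows. This is not the authors' implementation; the normalization, fusion weight, and whitespace token count are illustrative assumptions:

```python
# Illustrative sketch: linear rank fusion of two scorers, followed by
# score-adaptive truncation that keeps the best-scoring passages under
# a token budget instead of naively cutting from the end.

def fuse_scores(cross_scores, colbert_scores, alpha=0.5):
    """Min-max normalize each score list, then blend with weight alpha."""
    def normalize(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    ce, cb = normalize(cross_scores), normalize(colbert_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(ce, cb)]

def truncate_to_budget(passages, scores, token_budget):
    """Greedily keep the best-scoring passages that fit the budget."""
    order = sorted(range(len(passages)), key=lambda i: scores[i],
                   reverse=True)
    kept, used = [], 0
    for i in order:
        cost = len(passages[i].split())  # crude token count for illustration
        if used + cost <= token_budget:
            kept.append(i)
            used += cost
    return [passages[i] for i in sorted(kept)]  # restore original order
```

Truncating by fused score rather than by position is what closes the gap the oracle analysis exposes: high recall is useless if the gold evidence is what gets cut at the budget boundary.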