Catapults to the Rescue: Accelerating Vector Search by Exploiting Query Locality
2026-03-02 • Databases
Databases
AI summaryⓘ
The authors present CatapultDB, a tool that improves how searches start in large vector databases used for finding approximate nearest neighbors. Instead of always beginning searches from fixed points, CatapultDB adds shortcut connections called catapults to jump closer to frequently searched areas, speeding up the process. Their method works without changing the original search algorithm and supports dynamic updates and disk storage. Experiments showed CatapultDB can make searches much faster while keeping or improving accuracy, and it adapts well when the types of search queries change over time.
approximate nearest neighbor searchvector databasesproximity graphquery localitygraph-based indexingshortcut edgesdynamic insertionDiskANNLSH (Locality-Sensitive Hashing)query workload adaptation
Authors
Sami Abuzakuk, Anne-Marie Kermarrec, Rafael Pires, Mathis Randl, Martijn de Vos
Abstract
Graph-based indexing is the dominant approach for approximate nearest neighbor search in vector databases, offering high recall with low latency across billions of vectors. However, in such indices, the edge set of the proximity graph is only modified to reflect changes in the indexed data, never to adapt to the query workload. This is wasteful: real-world query streams exhibit strong spatial and temporal locality, yet every query must re-traverse the same intermediate hops from fixed or random entry points. We present CatapultDB, a lightweight mechanism that, for the first time, dynamically determines where to begin the search in an ANN index on the fly, therefore exploiting query locality. CatapultDB injects shortcut edges called catapults that connect query regions to frequently visited destination nodes. Catapults are maintained as an additional layer on top of the graph, so the standard vector search algorithm remains unchanged: queries are simply routed to a better starting point when an appropriate catapult exists. This transparent design preserves the full feature set of the underlying system, including filtered search, dynamic insertions, and disk-resident indices. We implement CatapultDB and evaluate it using four workloads with varying amounts of bias. Our experiments show that CatapultDB increases throughput by up to 2.51x compared to DiskANN at equivalent or better recall, matches the efficiency of LSH-based approaches without sacrificing filtering or requiring index reconstruction, and adapts gracefully to workload shifts, unlike cache-based alternatives.