Virtual-Memory Assisted Buffer Management In Tiered Memory
2026-03-03 • Databases
Databases, Operating Systems
AI summary
The authors explore memory systems with more than two tiers, such as fast local memory, slower remote memory, and even slower disk storage. They built vmcacheⁿ, a buffer pool that manages data across these tiers by relying on the operating system's virtual memory subsystem. Because migrating pages between tiers can become a bottleneck, they also introduce a new system call, move_pages2, that gives vmcacheⁿ finer control over the migration process. In their experiments, vmcacheⁿ achieved up to 4× higher query throughput than the two-tier vmcache on TPC-C workloads.
Tiered memory architecture, Virtual memory, Buffer management, Page migration, DRAM, Remote memory (RMem), NUMA, RDMA, CXL, TPC-C benchmark
Authors
Yeasir Rayhan, Walid G. Aref
Abstract
Tiered memory architectures have gained significant traction in the database community in recent years. In these architectures, the on-chip DRAM of the host processor is typically referred to as local memory, and forms the primary tier. Additional byte-addressable, cache-coherent memory resources, collectively referred to as remote memory (RMem, for short), form one or more secondary tiers. RMem is slower than local DRAM but faster than disk. Examples include NUMA memory located on a remote socket, chiplet-attached memory, and memory attached via high-performance interconnect protocols such as RDMA and CXL. In this paper, we discuss how traditional two-tier (DRAM-Disk) virtual-memory-assisted buffer management techniques generalize to an $n$-tier setting (DRAM-RMem-Disk). We present vmcache$^n$, an $n$-tier virtual-memory-assisted buffer pool that leverages the virtual memory subsystem and operating system calls to migrate pages across memory tiers. In this setup, page migration can become a bottleneck. To address this limitation, we introduce the move_pages2 system call, which provides vmcache$^n$ with fine-grained control over the page migration process. Experiments show that vmcache$^n$ can achieve up to 4$\times$ higher query throughput than vmcache for TPC-C workloads.
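
To make the DRAM-RMem-Disk setting concrete, the following is a minimal toy model of an $n$-tier page cache. It is only an illustrative sketch and not the authors' design. The class name `TieredPool`, the promote-on-access rule, and the LRU demotion policy are all assumptions made here, and page migration is simulated with a counter rather than performed through real system calls such as move_pages or move_pages2.

```python
from collections import OrderedDict


class TieredPool:
    """Toy n-tier page cache (hypothetical): tier 0 is fastest, disk is implicit."""

    def __init__(self, capacities):
        # One LRU-ordered dict per memory tier, e.g. [DRAM pages, RMem pages].
        self.tiers = [OrderedDict() for _ in capacities]
        self.capacities = list(capacities)
        self.migrations = 0  # simulated page moves between memory tiers
        self.disk_reads = 0  # pages that had to be fetched from disk

    def _insert(self, level, pid):
        """Place pid in tier `level`, demoting LRU victims to slower tiers."""
        if level == len(self.tiers):
            return  # past the last memory tier: the page is evicted to disk
        tier = self.tiers[level]
        tier[pid] = True
        if len(tier) > self.capacities[level]:
            victim, _ = tier.popitem(last=False)  # least recently used page
            if level + 1 < len(self.tiers):
                self.migrations += 1  # demotion into the next memory tier
            self._insert(level + 1, victim)

    def access(self, pid):
        """Touch a page; return the tier it was found in (None means disk)."""
        for level, tier in enumerate(self.tiers):
            if pid in tier:
                if level == 0:
                    tier.move_to_end(pid)  # refresh its LRU position
                else:
                    del tier[pid]
                    self.migrations += 1  # promotion into tier 0
                    self._insert(0, pid)
                return level
        self.disk_reads += 1
        self._insert(0, pid)
        return None
```

With `TieredPool([2, 2])`, a hit in the second tier triggers one promotion and one demotion. That is the kind of migration traffic the abstract identifies as a potential bottleneck once more than two tiers are involved.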