Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

2026-05-06

Information Theory, Machine Learning
AI summary

The authors study how many key-value pairs a square matrix memory can hold, showing that the answer depends on both the matrix size and how retrieval is done. If the memory must always pick the single best match (top-1 retrieval), its size must grow with the number of pairs times a logarithmic factor. A classical construction called correlation matrix memory achieves this scale, and the authors prove that the same scaling is necessary for any linear memory. For a more relaxed criterion, where the correct item only needs to rank among the top candidates (listwise retrieval), they introduce a new criterion called the Tail-Average Margin and show that capacity grows roughly with the square of the matrix dimension. They also develop an exact asymptotic theory predicting the memory's limits and behavior under these conditions.

linear memory, key-value associations, retrieval criterion, top-1 retrieval, correlation matrix memory, phase transition, listwise retrieval, Tail-Average Margin (TAM), capacity scaling, asymptotic theory
Authors
Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu
Abstract
How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2 \to \alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.
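
To make the setup concrete, here is a minimal NumPy sketch (not the authors' code) of the correlation matrix memory under the isotropic Gaussian model, checking the top-1 criterion and a tail-average margin. The paper gives the exact TAM definition; the `tail_average_margin` function and its parameter `q` below are illustrative guesses in the spirit of a convex upper-tail criterion, not the authors' formula.

```python
# Sketch of a correlation matrix memory with Gaussian key-value pairs,
# top-1 (winner-take-all) retrieval, and an assumed tail-average margin.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 200                                 # memory dimension, number of pairs

# Isotropic Gaussian keys and targets, normalized to unit scale.
X = rng.standard_normal((n, d)) / np.sqrt(d)   # keys x_1, ..., x_n
Y = rng.standard_normal((n, d)) / np.sqrt(d)   # targets y_1, ..., y_n

# Correlation matrix memory: superpose key-target outer products,
# M = sum_i y_i x_i^T.
M = Y.T @ X                                    # shape (d, d)

# Retrieval: score every stored target against the recalled vector M x_i.
recalled = X @ M.T                             # row i is (M x_i)^T
scores = recalled @ Y.T                        # scores[i, j] = <M x_i, y_j>

true = np.diag(scores)
distract = scores.copy()
np.fill_diagonal(distract, -np.inf)

# Top-1 success: each true score must beat its largest distractor.
top1_margin = true - distract.max(axis=1)
print("top-1 success rate:", np.mean(top1_margin > 0))

# Hypothetical tail-average margin: compare the true score with the mean
# of the top q-fraction of distractor scores (an assumed stand-in for TAM).
def tail_average_margin(true_score, others, q=0.05):
    k = max(1, int(np.ceil(q * others.size)))
    top_tail = np.sort(others)[-k:]
    return true_score - top_tail.mean()

tam = np.array([tail_average_margin(true[i], scores[i, np.arange(n) != i])
                for i in range(n)])
print("tail-average-margin success rate:", np.mean(tam > 0))
```

Increasing $n$ relative to $d^2$ (for the listwise criterion) or relative to $d^2/\log n$ (for top-1) should make the respective success rates degrade, mirroring the two phase transitions described in the abstract.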