Consistent Low-Rank Approximation
2026-03-02 • Data Structures and Algorithms
AI summary
The authors study how to keep updating a simple, low-rank summary of a matrix when its rows arrive one by one, aiming to stay close to the best possible summary at every step while changing the solution as little as possible. They quantify how much these updates (recourse) are needed to guarantee either an additive or relative error bound compared to the best rank-k approximation. They provide improved bounds for special cases such as integer-valued data and well-conditioned data streams, and prove some fundamental lower limits on how much recourse is necessary. Finally, they validate their theoretical results with experiments showing their methods work well in practice.
Keywords: low-rank approximation, subspace approximation, recourse, rank-$k$ approximation, additive error, multiplicative approximation, Frobenius norm, online algorithms, condition number, data streams
Authors
David P. Woodruff, Samson Zhou
Abstract
We introduce and study the problem of consistent low-rank approximation, in which rows of an input matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ arrive sequentially and the goal is to provide a sequence of subspaces that well-approximate the optimal rank-$k$ approximation to the submatrix $\mathbf{A}^{(t)}$ that has arrived at each time $t$, while minimizing the recourse, i.e., the overall change in the sequence of solutions. We first show that when the goal is to achieve a low-rank cost within an additive $\varepsilon\cdot\|\mathbf{A}^{(t)}\|_F^2$ error of the optimal cost, roughly $\mathcal{O}\left(\frac{k}{\varepsilon}\log(nd)\right)$ recourse is feasible. For the more challenging goal of achieving a relative $(1+\varepsilon)$-multiplicative approximation of the optimal rank-$k$ cost, we show that a simple upper bound in this setting is $\frac{k^2}{\varepsilon^2}\cdot\text{poly}\log(nd)$ recourse, which we further improve to $\frac{k^{3/2}}{\varepsilon^2}\cdot\text{poly}\log(nd)$ for integer-bounded matrices and $\frac{k}{\varepsilon^2}\cdot\text{poly}\log(nd)$ for data streams with polynomial online condition number. We also show that $\Omega\left(\frac{k}{\varepsilon}\log\frac{n}{k}\right)$ recourse is necessary for any algorithm that maintains a multiplicative $(1+\varepsilon)$-approximation to the optimal low-rank cost, even if the full input is known in advance. Finally, we perform a number of empirical evaluations to complement our theoretical guarantees, demonstrating the efficacy of our algorithms in practice.
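To make the problem setup concrete, the following is a minimal sketch (not the paper's algorithm) of a lazy strategy for the multiplicative regime: keep the current rank-$k$ subspace as long as it gives a $(1+\varepsilon)$-approximation to the optimal rank-$k$ cost on the rows seen so far, and recompute it from scratch when it fails. The function names, the full-SVD recomputation, and counting recourse as the number of subspace recomputations (rather than the total change in the solution sequence) are all simplifying assumptions for illustration.

```python
import numpy as np

def rank_k_cost(A, k):
    """Optimal rank-k approximation cost ||A - A_k||_F^2, via singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s[k:] ** 2))

def cost_on_subspace(A, V):
    """Cost of projecting the rows of A onto the span of V's orthonormal rows."""
    P = A @ V.T @ V
    return float(np.linalg.norm(A - P) ** 2)

def consistent_lowrank_stream(rows, k, eps):
    """Lazy sketch: retain the current subspace until its cost on the prefix
    A^{(t)} exceeds (1+eps) times the optimal rank-k cost, then recompute.
    Recourse is counted here simply as the number of subspace changes."""
    A, V, recourse = None, None, 0
    for r in rows:
        A = r[None, :] if A is None else np.vstack([A, r])
        opt = rank_k_cost(A, k)
        if V is None or cost_on_subspace(A, V) > (1 + eps) * opt + 1e-12:
            # Recompute the top-k right singular subspace of the current prefix.
            _, _, Vt = np.linalg.svd(A, full_matrices=False)
            V = Vt[:k]
            recourse += 1
    return V, recourse
```

This sketch maintains the $(1+\varepsilon)$ invariant at every time step by construction, but recomputing a full SVD per row is far more expensive than the techniques the paper analyzes; it is only meant to illustrate the cost and recourse objectives.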