Order-Optimal Sequential 1-Bit Mean Estimation in General Tail Regimes

2026-04-09Information Theory

Information TheoryMachine Learning
AI summary

The authors study how to estimate the average value (mean) of data when only allowed to send 1-bit messages that say if a sample is above or below certain thresholds. They design an adaptive estimator that works efficiently for distributions with bounded means and moments, achieving the best possible sample use for many cases. They show a fundamental extra cost for finite variance distributions due to 1-bit limits, and prove that non-adaptive methods need many more samples. The authors also provide practical algorithm versions for unknown budgets and scale parameters.

mean estimation1-bit communicationadaptive estimatorPAC learningsample complexitycentral momentminimax lower boundquantizationnon-adaptive estimatorthreshold queries
Authors
Ivan Lau, Jonathan Scarlett
Abstract
In this paper, we study the problem of mean estimation under strict 1-bit communication constraints. We propose a novel adaptive mean estimator based solely on randomized threshold queries, where each 1-bit outcome indicates whether a given sample exceeds a sequentially chosen threshold. Our estimator is $(ε, δ)$-PAC for any distribution with a bounded mean $μ\in [-λ, λ]$ and a bounded $k$-th central moment $\mathbb{E}[|X-μ|^k] \le σ^k$ for any fixed $k > 1$. Crucially, our sample complexity is order-optimal in all such tail regimes, i.e., for every such $k$ value. For $k \neq 2$, our estimator's sample complexity matches the unquantized minimax lower bounds plus an unavoidable $O(\log(λ/σ))$ localization cost. For the finite-variance case ($k=2$), our estimator's sample complexity has an extra multiplicative $O(\log(σ/ε))$ penalty, and we establish a novel information-theoretic lower bound showing that this penalty is a fundamental limit of 1-bit quantization. We also establish a significant adaptivity gap: for both threshold queries and more general interval queries, the sample complexity of any non-adaptive estimator must scale linearly with the search space parameter $λ/σ$, rendering it vastly less sample efficient than our adaptive approach. Finally, we present algorithmic variants that (i) handle an unknown sampling budget, (ii) adapt to an unknown scale parameter~$σ$ given (possibly loose) bounds, and (iii) require only two stages of adaptivity at the expense of more complicated general 1-bit queries.