Online Convex Optimization with Sublinear Noisy Probes

2026-06-12 • Machine Learning

Machine LearningData Structures and Algorithms

AI summaryⓘ

The authors study a problem where a learning algorithm picks points from a set and sees convex losses, aiming to perform as well as the best fixed choice. They introduce a new model that lets the learner make a limited number of pairwise comparisons between points to get noisy feedback about which is better. Their main result shows that even a few noisy comparisons can improve the learner's overall performance compared to only observing losses directly. They provide tight mathematical bounds describing how performance improves depending on the number of probes and noise level. The analysis extends to special cases like prediction with experts and uses variance-based techniques for clearer understanding.

Online Convex OptimizationConvex LossRegret MinimizationPairwise ProbingNoisy FeedbackExponential WeightsVariance ReductionPrediction with ExpertsConvex SetSecond-order Analysis

Authors

Simone Di Gregorio, Anupam Gupta, Stefano Leonardi, Matteo Russo

Abstract

We study Online Convex Optimization (OCO) over a convex set $K\subseteq \mathbb R^d$, where in each round $t$ the learner selects $x_t\in K$ and then observes a convex loss $f_t:K\to[0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear best-expert queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k\le T$ pairwise probes; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a sublinear and noisy probe budget can provably improve worst-case regret in the full feedback OCO regime. With $k$ $δ$-noisy pairwise probes, we obtain: $ \text{Reg}_T \le O\left(\min\left\{\sqrt{dT\ln T},\; \frac{dT\ln T}{k|1-2δ|}\right\}\right) $, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $δ$. Specifically regarding the noise parameter $δ\in [0,1]$, the regret guarantee smoothly degrades as the oracle response approaches a coin flip, i.e., $δ$ is close to $\frac{1}{2}$. When applying the same techniques to a finite $K$ for the prediction with $d$ experts setting, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.

View PDFOpen arXiv