Online Learning with Gradient-Variation Interval Regret

2026-06-02Machine Learning

Machine Learning
AI summary

The authors study a type of online learning where the goal is to do well not just overall, but during every time period. They create a new algorithm that adjusts to how much the problem changes over time, measured by something called gradient variation. Their method is efficient and adapts automatically, even without knowing certain problem details like smoothness or scaling. They also show their algorithm works well in different more complex settings and back up their claims with experiments.

online learninginterval regretgradient variationregret boundadaptive algorithmLipschitz continuitysmoothnessmeta algorithmstochastic optimizationdynamic regret
Authors
Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou
Abstract
This paper investigates non-stationary online learning using the metric of interval regret, which requires an online algorithm to perform well over every time interval. We propose the first online learning algorithm that achieves an interval regret bound scaling with gradient variation, a fundamental measure of the cumulative change in online function gradients, which relates to various problem-dependent quantities and is closely connected to stochastic optimization and other problems. Our method employs a simple and efficient two-layer online ensemble structure that achieves strong theoretical guarantees. Specifically, it enjoys a regret bound that simultaneously adapts to various problem-dependent quantities while also preserving the minimax-optimal rate in the worst case. Moreover, recognizing the challenge of hyperparameter tuning, we introduce a Lipschitz- and smoothness-agnostic variant that automatically adapts to these potentially unknown constants. This is primarily enabled by a novel Lipschitz-adaptive meta algorithm, which may be of independent interest. Beyond interval regret, our method also yields broader implications: it provides versatile bounds for interval dynamic regret, a stronger measure that competes with changing comparators over any interval, and yields the first piecewise characterization for stochastic extended adversarial optimization. Theoretical findings are validated by experiments.