Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

2026-06-12Machine Learning

Machine Learning
AI summary

The authors study online inventory optimization, where decisions about ordering depend on past inventory levels. They analyze a simple strategy that keeps a hidden target and projects it onto the feasible ordering set, proving it is optimal for complex inventory constraints. Using online gradient descent, they improve previous performance guarantees and provide new ones for different loss types and dynamic environments. Their main innovation is a mathematical approach that treats the inventory problem like controlling a simple queue, allowing better understanding and results for general settings. Experiments confirm their theoretical findings.

online inventory optimizationonline convex optimizationonline gradient descentconvex capacity setsregret analysisprojectionstrongly convex lossesdynamic regretnorm alignmentqueue control
Authors
Anthony Pineci, Yunzong Xu
Abstract
Online inventory optimization (OIO) is online convex optimization with physical memory: inventory carryover makes the feasible action set depend on the past. A natural principle, used in stochastic inventory learning and recently in OIO under a single linear capacity constraint, is to maintain a hidden target chosen by an online learner and implement its projection onto the currently feasible order-up-to set. We prove that this simple principle is optimal for OIO on arbitrary bounded convex capacity sets. With online gradient descent as the base learner, the method improves the best known regret guarantee for OIO on general convex sets from inverse to inverse-square-root dependence on the common-demand probability, and we prove a matching lower bound. The same principle gives the first polylogarithmic regret guarantee for strongly convex losses and the first dynamic regret guarantee adapting to Euclidean path variation on general convex capacity sets. The analysis introduces a norm alignment principle: the right state variable is the distance from the hidden target to the feasible set, measured in the same norm as the projection. Under norm alignment, this distance evolves pathwise as a scalar queue, with target movement as arrival and common demand as service. This reduction to one-dimensional queue control resolves the state dependence and extends the guarantees to general convex capacity sets, beyond the reach of prior productwise approaches. Experiments on synthetic and real-world inventory data corroborate the theory.