Near-Optimal Decentralized Stochastic Convex Optimization over Networks

2026-06-03Machine Learning

Machine Learning
AI summary

The authors study a problem where many workers try to minimize a shared goal using limited information and only talk to their neighbors in a network. They develop a new, faster method that works well when using up to about M ≈ √ρ · N^(3/4) workers, improving on previous limits. Their approach cleverly combines delayed acceleration and gossip communication to keep everyone roughly in agreement. They also prove that no similar method can do significantly better, making their result nearly optimal.

decentralized optimizationstochastic gradientsconvex optimizationgossip algorithmsspectral gapacceleration methodsminibatchingheterogeneityfirst-order methods
Authors
Nitai Kluger, Amit Attia, Tomer Koren
Abstract
We study decentralized stochastic smooth convex optimization, where $M$ workers minimize an average objective using local stochastic gradients and neighbor-only communication over a fixed gossip network. A central question in this setting is to determine the largest number of workers that can be used under a total budget of $N$ gradient samples while still preserving the centralized $O(1/\sqrt N)$ statistical rate. We introduce an accelerated decentralized method that preserves this rate for up to $\smash{M\lesssim \sqrtρ\,N^{3/4}}$ workers, where $ρ$ is the spectral gap of the gossip network, improving the best prior maximal scaling of $\smash{M\lesssim ρ\sqrt N}$. The method is based on a one-step-delayed stochastic acceleration scheme that enables workers to interleave minibatching with accelerated gossip while controlling residual disagreement, and its guarantee depends only logarithmically on the optimum-local heterogeneity. We also establish a matching lower bound for linear-span decentralized first-order methods, showing that the method is optimal up to logarithmic factors.