Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective

2026-03-03, Machine Learning

Machine Learning, Cryptography and Security
AI summary

The authors study how the noise added for privacy affects different optimization methods used to train machine learning models with differential privacy guarantees. They analyze two algorithms, DP-SGD and DP-SignSGD, using stochastic differential equations to characterize their convergence under both fixed and optimal hyperparameters. With fixed hyperparameters, DP-SignSGD performs better in high-privacy settings; with optimally tuned learning rates the two methods perform comparably, but the optimal learning rate of DP-SignSGD barely depends on the privacy level, so it needs little or no re-tuning. Their experiments confirm these theoretical findings and show that similar results extend to DP-Adam.

Differential Privacy, DP-SGD, DP-SignSGD, Stochastic Differential Equations, Optimization, Privacy-Utility Trade-Off, Learning Rate, Adaptive Methods, Machine Learning, Hyperparameters
Authors
Enea Monzio Compagnoni, Alessandro Stanghellini, Rustem Islamov, Aurelien Lucchi, Anastasiia Koloskova
Abstract
Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with adaptivity in optimization through the lens of stochastic differential equations, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges at a Privacy-Utility Trade-Off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$ with an $\mathcal{O}(1/\varepsilon)$ trade-off, dominating in high-privacy or large batch noise regimes. By contrast, under optimal learning rates, both methods achieve comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and empirically extend from DP-SignSGD to DP-Adam.
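To make the two update rules concrete, the following is a minimal sketch of one DP-SGD step and one DP-SignSGD step with per-example clipping and Gaussian noise. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `clip_norm` and `noise_multiplier` parameters, and the choice to add noise to the summed clipped gradients before averaging are all hypothetical, and the paper's exact formulation may differ. In practice the noise multiplier grows as $\varepsilon$ shrinks; that calibration is not modeled here.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in grad]


def private_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm,
    a common convention; stronger privacy corresponds to a larger multiplier.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip(grad, clip_norm)
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * clip_norm
    return [(total[i] + rng.gauss(0.0, sigma)) / batch_size for i in range(dim)]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """DP-SGD: move along the noisy clipped gradient, scaled by lr."""
    g = private_gradient(per_example_grads, clip_norm, noise_multiplier, rng)
    return [p - lr * gi for p, gi in zip(params, g)]


def dp_signsgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """DP-SignSGD: use only the coordinate-wise sign of the noisy clipped gradient."""
    g = private_gradient(per_example_grads, clip_norm, noise_multiplier, rng)
    return [p - lr * sign(gi) for p, gi in zip(params, g)]
```

In this sketch the DP-SGD update grows with the injected noise, whereas each DP-SignSGD coordinate moves by exactly `lr` regardless of the noise magnitude. This is one informal way to see why a sign-based step size might need less re-tuning as the privacy level changes, although the paper's SDE analysis is what actually supports that claim.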