Group Entropies and Mirror Duality: A Class of Flexible Mirror Descent Updates for Machine Learning

2026-03-09 · Machine Learning

AI summary

The authors present a new way to improve optimization algorithms used in machine learning by connecting ideas from group theory and generalized entropies. Their method yields a large family of flexible Mirror Descent algorithms that can be adapted to different types of data and distributions by adjusting certain parameters. They introduce a concept called mirror duality, which allows a link function to be interchanged with its inverse to fine-tune learning behavior. Their approach aims to provide better convergence and flexibility, and they test it on large-scale, simplex-constrained quadratic programming problems to show its effectiveness.
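For orientation, the standard Mirror Descent update with mirror map $\phi$ and step size $\eta_t$ (textbook background, not specific to this paper) reads
$$x_{t+1} = \nabla\phi^{*}\big(\nabla\phi(x_t) - \eta_t\,\nabla f(x_t)\big),$$
where $\nabla\phi$ plays the role of the link function and $\nabla\phi^{*}$ (the gradient of the convex conjugate) is its inverse. In the framework summarized above, these two roles are taken by a generalized group logarithm and its group exponential.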

Mirror Descent · Group Theory · Group Entropy · Generalized Entropy · Optimization Algorithms · Convex Optimization · Regularization · Statistical Distributions · Machine Learning · Natural Gradient
Authors
Andrzej Cichocki, Piergiulio Tempesta
Abstract
We introduce a comprehensive theoretical and algorithmic framework that bridges formal group theory and group entropies with modern machine learning, paving the way for an infinite, flexible family of Mirror Descent (MD) optimization algorithms. Our approach exploits the rich structure of group entropies, which are generalized entropic functionals governed by group composition laws, encompassing and significantly extending all trace-form entropies such as the Shannon, Tsallis, and Kaniadakis families. By leveraging group-theoretical mirror maps (or link functions) in MD, expressed via multi-parametric generalized logarithms and their inverses (group exponentials), we achieve highly flexible and adaptable MD updates that can be tailored to diverse data geometries and statistical distributions. To this end, we introduce the notion of mirror duality, which allows us to seamlessly interchange group-theoretical link functions with their inverses, subject to specific learning-rate constraints. Tuning or learning the hyperparameters of the group logarithms enables us to adapt the model to the statistical properties of the training distribution, while simultaneously ensuring desirable convergence characteristics via fine-tuning. This generality not only provides greater flexibility and improved convergence properties, but also opens new perspectives for applications in machine learning and deep learning by expanding the design of regularizers and natural gradient algorithms. We extensively evaluate the validity, robustness, and performance of the proposed updates on large-scale, simplex-constrained quadratic programming problems.
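As a concrete illustration of this family of updates, the sketch below implements one entropic Mirror Descent step on the probability simplex with a Tsallis q-logarithm as the link function and its q-exponential as the inverse. This is a minimal example of the kind of generalized-logarithm update the abstract describes, not the authors' exact algorithm; the function names (q_log, q_exp, md_step_simplex) and the renormalization step, which stands in for an exact Bregman projection, are illustrative assumptions.

```python
import numpy as np

def q_log(x, q):
    """Tsallis q-logarithm; reduces to np.log(x) as q -> 1."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_exp(y, q):
    """Inverse of q_log (Tsallis q-exponential), clipped to its domain."""
    if np.isclose(q, 1.0):
        return np.exp(y)
    base = np.maximum(1.0 + (1.0 - q) * y, 0.0)
    return base ** (1.0 / (1.0 - q))

def md_step_simplex(x, grad, lr, q):
    """One Mirror Descent step on the probability simplex using the
    q-logarithm as the link (mirror) function."""
    y = q_log(x, q) - lr * grad            # gradient step in the dual space
    x_new = np.maximum(q_exp(y, q), 1e-12) # map back via the inverse link
    return x_new / x_new.sum()             # heuristic simplex projection

# Example: minimize f(x) = 0.5 x^T A x over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A @ A.T                                # positive semidefinite quadratic
x = np.full(5, 0.2)                        # uniform starting point
for _ in range(200):
    x = md_step_simplex(x, A @ x, lr=0.05, q=0.8)
print(x, 0.5 * x @ A @ x)
```

For q = 1 the link reduces to the ordinary logarithm and the step recovers classical entropic mirror descent (exponentiated gradient); tuning q changes the implicit geometry of the update, which is the kind of flexibility the abstract attributes to the broader group-entropy family.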