Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

2026-05-12
Machine Learning
AI summary

The authors present Pion, a new method for training large language models that keeps certain mathematical properties of weight matrices unchanged during updates. Instead of adjusting weights by adding small changes, Pion updates them through rotations that preserve how the matrix stretches space. This keeps training stable, and Pion performs competitively with popular optimizers such as Adam. The authors also examine Pion's design choices and analyze how well it works both theoretically and in practice.

Keywords
large language models, optimizer, orthogonal transformation, singular values, spectral norm, weight matrices, Adam, pretraining, finetuning
Authors
Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu
Abstract
We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout training. This yields an optimization mechanism that modulates the geometry of weight matrices while keeping their spectral norm fixed. We derive the Pion update rule, systematically examine its design choices, and analyze its convergence behavior along with several key properties. Empirical results show that Pion offers a stable and competitive alternative to standard optimizers for both LLM pretraining and finetuning.
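The spectrum-preserving property stated in the abstract is easy to verify numerically: multiplying a weight matrix on the left and right by orthogonal matrices leaves its singular values, and hence its spectral norm, unchanged. The sketch below illustrates this with random orthogonal factors drawn via QR decomposition; it is a minimal demonstration of orthogonal equivalence transformations, not the authors' Pion update rule.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): an orthogonal
# equivalence transformation W -> U @ W @ V.T with orthogonal U and V
# preserves the singular values of W.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))

# Sample random orthogonal factors via QR decomposition.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

W_new = U @ W @ V.T

# Singular values (and therefore the spectral norm) are unchanged.
assert np.allclose(np.linalg.svd(W, compute_uv=False),
                   np.linalg.svd(W_new, compute_uv=False))
print("spectral norm before/after:",
      np.linalg.norm(W, 2), np.linalg.norm(W_new, 2))
```

By contrast, an additive update W + dW generally changes the singular values, which is the distinction the abstract draws between Pion and optimizers such as Adam and Muon.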