The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups
2026-06-18 • Machine Learning
Machine Learning • Computer Vision and Pattern Recognition • Graphics • Robotics
AI summary
The author proposes an attention method called Lie-Algebra Attention, in which each token is itself a transformation: an element of a matrix Lie group. Instead of learning attention scores from data, the method computes them in closed form from the group's own geometry, using the relative pose between two tokens. The approach applies to non-compact, non-abelian affine groups with scale and shear, which earlier methods cannot handle, and it uses far fewer score parameters while preserving invariance. In experiments on three transformation groups, SE(2), SO(3), and Aff(2), the closed-form score matches a learned kernel, and outperforms it on SE(2), with 50 to 80 times fewer score parameters.
matrix Lie group • attention mechanism • Lie algebra • group invariance • affine group • relative pose • equivariance • logarithm map • Frobenius inner product • sequence completion
Authors
Przemyslaw Musialski
Abstract
We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$, a bare transformation with no feature payload and no external action $\rho(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie group elements: their score is the closed-form algebra norm of the relative pose rather than a learned kernel, and it reaches the affine full-frame groups that every irrep- or surjective-exp-based method must exclude. We call it Lie-Algebra Attention. Once tokens are group elements, the rest follows with none of the usual representation-theoretic machinery. The relative geometry of a pair is canonical, $g_i^{-1} g_j$, so the pairwise invariant $w_{ij} = \log(g_i^{-1} g_j)$ is intrinsic rather than designed; equivariance under the diagonal $G$-action is tautological, and the cocycle condition holds automatically. The attention score is the negative squared algebra norm, $s_{ij} = -\|\log(g_i^{-1} g_j)\|_\lambda^2/\tau$: the canonical proximity kernel under a block-weighted Frobenius inner product, with no irreducible representations, spherical harmonics, Clebsch-Gordan products, or learned kernel. The construction applies to any matrix Lie group on a chosen logarithm chart containing the relative poses, including the non-compact, non-abelian affine groups with scale and shear that no vector-token attention method reaches, whether from the irrep tradition or from surjective-exp methods. Three sequence-completion experiments, on SE(2), SO(3), and Aff(2), bear this out: the closed-form score matches a learned MLP kernel on the same invariant and outperforms it on SE(2), using 50 to 80 times fewer score parameters, while a vector-token baseline breaks invariance by five to twelve orders of magnitude.
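As a rough illustration of the score $s_{ij}$ defined in the abstract, here is a minimal Python sketch restricted to SE(2), assuming NumPy is available. The helper names (`se2`, `se2_log`, `algebra_norm_sq`, `attention_scores`, `softmax_rows`), the two block weights `lam_rot` and `lam_trans`, and the use of the principal-branch closed-form SE(2) logarithm are illustrative assumptions, not the author's implementation; the abstract does not specify the exact block weighting.

```python
import numpy as np


def se2(theta, x, y):
    """Homogeneous 3x3 matrix of a planar rigid motion (an SE(2) element)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def se2_log(g):
    """Principal logarithm of an SE(2) matrix, as a 3x3 Lie-algebra matrix."""
    theta = np.arctan2(g[1, 0], g[0, 0])  # principal angle in (-pi, pi]
    if abs(theta) < 1e-8:
        # Taylor expansion near the identity rotation avoids dividing by ~0.
        a, b = 1.0 - theta**2 / 6.0, theta / 2.0
    else:
        a, b = np.sin(theta) / theta, (1.0 - np.cos(theta)) / theta
    # exp maps the algebra translation u to t = V u, so u = V^{-1} t.
    V = np.array([[a, -b], [b, a]])
    u = np.linalg.solve(V, g[:2, 2])
    w = np.zeros((3, 3))
    w[0, 1], w[1, 0] = -theta, theta
    w[:2, 2] = u
    return w


def algebra_norm_sq(w, lam_rot=1.0, lam_trans=1.0):
    """Block-weighted squared Frobenius norm (the weights are assumptions)."""
    return lam_rot * np.sum(w[:2, :2] ** 2) + lam_trans * np.sum(w[:2, 2] ** 2)


def attention_scores(tokens, tau=1.0, lam_rot=1.0, lam_trans=1.0):
    """s_ij = -||log(g_i^{-1} g_j)||_lambda^2 / tau for a list of SE(2) matrices."""
    n = len(tokens)
    s = np.empty((n, n))
    for i, gi in enumerate(tokens):
        gi_inv = np.linalg.inv(gi)
        for j, gj in enumerate(tokens):
            w_ij = se2_log(gi_inv @ gj)  # pairwise invariant in the algebra
            s[i, j] = -algebra_norm_sq(w_ij, lam_rot, lam_trans) / tau
    return s


def softmax_rows(s):
    """Row-wise softmax turning scores into attention weights."""
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Usage: three tokens, then the same tokens moved by a common group element h.
tokens = [se2(0.1, 1.0, 0.0), se2(-0.4, 0.0, 2.0), se2(1.2, -1.0, 0.5)]
h = se2(0.7, 3.0, -2.0)
scores = attention_scores(tokens)
scores_moved = attention_scores([h @ g for g in tokens])
weights = softmax_rows(scores)
```

In this sketch `scores` and `scores_moved` agree up to floating-point error, because $(h g_i)^{-1}(h g_j) = g_i^{-1} g_j$ for any common left factor $h$; this is the invariance the abstract describes as tautological. Extending the sketch to SO(3) or Aff(2) would require a different logarithm and a choice of chart, which are not covered here.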