Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures
2026-02-20 • Machine Learning • Computation and Language
AI summary
The authors propose a new way to build sequence models, such as those used in language processing, around special mathematical groups called closed subgroups of U(d). They define a shared template that can be instantiated for different groups, including O(d), to produce both recurrent neural networks (RNNs) and transformer models. Their experiments on text datasets compare these orthogonal-state models under parameter-matched settings. They also introduce an extension that mixes information linearly in the tangent space, which improves the performance of their O(d) models under fixed parameter budgets.
sequence models · hidden states · closed subgroups · unitary group U(d) · orthogonal group O(d) · recurrent neural networks · transformer models · tangent space · parameter-matching · linear mixing
Authors
Joshua Nunley
Abstract
This paper presents a framework for sequence models whose hidden states live on closed subgroups of U(d). Starting from a minimal axiomatic setup, we derive recurrent and transformer templates from a shared skeleton in which the choice of subgroup acts as a drop-in replacement, determining the state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in our O(d) experiments.
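To make the shared skeleton concrete, here is a minimal sketch (not the paper's code) of one recurrent step specialized to O(d): the hidden state stays on the orthogonal group, inputs are projected into its tangent space (the skew-symmetric matrices), and the update moves along the group via the matrix exponential. It also illustrates the linear-mixing extension by combining several tangent-space generators. All function and variable names (`skew`, `step`, `B`, `W_mix`) are hypothetical.

```python
# Sketch of an O(d)-state recurrent step with linear mixing in tangent space.
# This is an illustrative toy, not the authors' implementation.

import numpy as np
from scipy.linalg import expm  # matrix exponential

d = 8                          # state dimension
K = 4                          # number of tangent-space generators to mix
rng = np.random.default_rng(0)

def skew(M):
    """Project a d x d matrix onto the tangent space of O(d) at the
    identity, i.e. the skew-symmetric matrices (A = -A^T)."""
    return 0.5 * (M - M.T)

# Hypothetical learned tangent-space generators (trained in practice).
B = [skew(rng.normal(size=(d, d))) for _ in range(K)]
W_mix = rng.normal(size=K) * 0.1   # linear-mixing coefficients

def step(H, x):
    """One update: linearly mix tangent generators, scale by the input,
    then multiply into the state. Since A is skew-symmetric, expm(A) is
    orthogonal, so H remains on O(d)."""
    A = x * sum(w * Bk for w, Bk in zip(W_mix, B))  # linear mixing in tangent space
    return H @ expm(A)

H = np.eye(d)                      # initial state: identity element of O(d)
for x in rng.normal(size=10):      # toy scalar input sequence
    H = step(H, x)

# The state never leaves the group: H^T H = I up to numerical error.
assert np.allclose(H.T @ H, np.eye(d), atol=1e-8)
```

Under this reading of the template, swapping in a different closed subgroup of U(d) would amount to replacing the tangent projection `skew` and the generators `B`, while the update rule `H @ expm(A)` keeps the same form.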