Patnaik-Pearson intrinsic dimension for internal representations of neural networks

2026-06-17 · Computational Geometry

AI summary

The authors introduce the Patnaik-Pearson dimension, a new measure of the intrinsic dimension of data lying on curved spaces known as manifolds. They apply it to the internal representations of neural networks, including weight matrices whose eigenvalue spectrum follows a power law (Pareto) distribution. They prove mathematical properties of the measure and relate it to existing spectral analyses of weight matrices. Finally, they use it to study how token embeddings change as they pass through the layers of transformer models such as BERT and DeepSeek. Code and notebooks are provided so that others can explore the numerical results.

intrinsic dimension, data manifold, transformer, neural network weights, Empirical Spectral Density, Pareto distribution, power law, token embeddings, BERT, spectral analysis
Authors
Tom Hadfield
Abstract
We define a new measure of the intrinsic dimension of a data manifold, which we call the Patnaik-Pearson dimension, and apply it to internal representations of neural networks, in particular transformers. The inspiration for this comes from the HTSR and SETOL work of Martin, Mahoney and Hinrichs, combined with the TwoNN intrinsic dimension estimator of Facco et al. We prove various properties of this intrinsic dimension estimator. Treating weight matrices of neural networks as data manifolds, for weight matrices whose Empirical Spectral Density follows a Pareto (power law) distribution, we relate the Patnaik-Pearson dimension to the HTSR and SETOL analysis, and show that the critical values of the tail exponent coincide for the two approaches. Using a combination of theoretical and numerical techniques, we study the behaviour of the Patnaik-Pearson dimension of a data manifold under the transformations typical of neural networks. We apply this machinery to the BERT-base and DeepSeek-R1-Distill-Qwen-1 models, to investigate first the Patnaik-Pearson dimension of the initial data manifold of token embeddings, and second the evolution of the Patnaik-Pearson dimension as token embeddings pass through the layers of the model. Code and notebooks used for the numerical results presented here are available at https://github.com/tdhadfield/PatnaikPearson
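
For readers unfamiliar with the TwoNN estimator mentioned above, the following is a minimal sketch of the idea, not the Patnaik-Pearson dimension itself (whose definition is not given in this abstract) and not code from the linked repository. TwoNN looks at the ratio of each point's second to first nearest-neighbour distance; under a locally uniform density this ratio follows a Pareto law whose exponent is the intrinsic dimension. The sketch assumes the closed-form maximum-likelihood estimate of that exponent rather than a fitted regression, and the function name `twonn_dimension` is hypothetical.

```python
import numpy as np


def twonn_dimension(points: np.ndarray) -> float:
    """Maximum-likelihood TwoNN-style estimate of intrinsic dimension.

    Illustrative only: brute-force distances, suitable for a few
    thousand points at most.
    """
    x = np.asarray(points, dtype=float)

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(x * x, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)    # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)   # a point is not its own neighbour

    # Two smallest distances in each row: columns 0 and 1 after partitioning.
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # log(mu) is exponential with rate d, so the MLE is N / sum(log mu).
    return float(keep.sum() / np.sum(np.log(mu)))


# Example: a flat 2-dimensional sheet rotated into 10 ambient dimensions.
rng = np.random.default_rng(0)
flat = np.zeros((2000, 10))
flat[:, :2] = rng.uniform(size=(2000, 2))
rotation, _ = np.linalg.qr(rng.normal(size=(10, 10)))
estimate = twonn_dimension(flat @ rotation)
print(f"TwoNN estimate for a 2-D sheet in 10-D: {estimate:.2f}")
```

The estimate depends only on local distance ratios, so it should stay close to 2 here regardless of the ambient dimension. For larger point sets, such as a full token-embedding table, a tree-based or approximate nearest-neighbour search would replace the dense distance matrix.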