Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting

2026-05-06

Machine Learning · Artificial Intelligence
AI summary

The authors studied why simple linear models remain competitive with complex transformers for time series forecasting. They used sparse autoencoders to analyze the internal workings of a transformer model named PatchTST. Their findings show that the transformer's internal features are sparse and stable, and do not rely on the dense overlapping encodings (superposition) observed in language models. This suggests that time series forecasting may not require the complex representations transformers build in NLP, which helps explain why simple models remain competitive.

Transformer · Time Series Forecasting · Sparse Autoencoder · PatchTST · Feedforward Network (FFN) · Superposition · Linear Models · Mechanistic Interpretability · GELU Activation · Dictionary Expansion
Authors
Alper Yıldırım
Abstract
Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, low-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream performance change (0.214% on average), and large portions of the overcomplete dictionaries remain inactive. Targeted causal interventions on dominant latent features produce minimal forecast perturbation. Across all evaluated settings, we observe no empirical evidence that the analyzed FFN representations rely on strong superposition. Instead, the representations remain sparse, stable under aggressive dictionary expansion, and largely insensitive to latent interventions. These results demonstrate that superposition is not necessary for competitive performance on standard forecasting benchmarks, suggesting that these tasks may not demand the rich compositional representations that drive transformer success in language modeling, and helping to explain the persistent competitiveness of simple linear models.
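
For readers unfamiliar with the method, the sketch below illustrates the kind of sparse autoencoder described in the abstract: a linear encoder/decoder pair trained to reconstruct post-GELU FFN activations under an L1 sparsity penalty, with the dictionary width set by an expansion factor (here 0.5x–4.0x the FFN width). All names, shapes, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: dictionary size = expansion * d_ffn (assumed setup)."""

    def __init__(self, d_ffn: int, expansion: float = 2.0, l1_coeff: float = 1e-3):
        super().__init__()
        d_dict = int(expansion * d_ffn)          # e.g. 0.5x to 4.0x the native width
        self.encoder = nn.Linear(d_ffn, d_dict)
        self.decoder = nn.Linear(d_dict, d_ffn)
        self.l1_coeff = l1_coeff

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))          # non-negative, sparse latent codes
        x_hat = self.decoder(z)                  # reconstruction of the FFN activation
        recon = (x_hat - x).pow(2).mean()        # reconstruction error
        sparsity = z.abs().sum(dim=-1).mean()    # L1 penalty encourages sparse codes
        return x_hat, z, recon + self.l1_coeff * sparsity

# Illustrative usage on a batch of cached post-GELU activations
# (hypothetical shapes: 64 activation vectors, FFN width 128).
if __name__ == "__main__":
    d_ffn = 128
    sae = SparseAutoencoder(d_ffn, expansion=4.0)
    acts = torch.randn(64, d_ffn)                # stand-in for cached activations
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        _, z, loss = sae(acts)
        loss.backward()
        opt.step()
    # Fraction of dictionary features that never fire on this batch,
    # analogous to the "inactive" portion of overcomplete dictionaries.
    dead = (z == 0).all(dim=0).float().mean().item()
    print(f"loss={loss.item():.4f}, inactive features={dead:.1%}")
```

Under this reading, the abstract's dictionary-expansion experiment amounts to sweeping the expansion factor and checking whether reconstruction-based forecasts degrade and how many dictionary features stay inactive.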