OASIS: Online Activation Subspace Learning for Memory-Efficient Training

2026-04-10

Machine Learning
AI summary

The authors propose OASIS, a method for reducing memory use when training large language models. Instead of storing intermediate activations exactly, OASIS learns a smaller, continuously updated subspace during training and stores each activation's projection onto it. This reduces memory without changing the forward-pass computation. Because the resulting gradients also lie in this subspace, gradients and optimizer states can be kept in compact form. A projection-aware optimizer carries the optimizer states across subspace updates so that training stays stable. In experiments, OASIS achieves up to 2× lower peak memory than full fine-tuning while matching its performance, and it outperforms prior low-rank methods.
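The sketch below illustrates the core idea for a single linear layer. It is a minimal NumPy example, not the authors' implementation. It assumes three things: the layer computes y = x @ w, an orthonormal basis p (d_in × r) for the activation subspace is already available, and the function names are hypothetical. The weight gradient of such a layer is x.T @ grad_y. If the layer stores z = x @ p instead of x, backward can recover that gradient as p @ (z.T @ grad_y), and this result has rank at most r.

```python
import numpy as np

def forward_and_save(x, w, p):
    """Compute the exact forward pass y = x @ w and keep only the projected activation."""
    y = x @ w   # forward computation is unchanged
    z = x @ p   # stored for backward: shape (batch, r) instead of (batch, d_in)
    return y, z

def weight_grad_from_projection(z, grad_y, p):
    """Approximate dL/dW = x.T @ grad_y using x ≈ z @ p.T, giving p @ (z.T @ grad_y)."""
    return p @ (z.T @ grad_y)

rng = np.random.default_rng(0)
batch, d_in, d_out, r = 32, 64, 16, 8
p, _ = np.linalg.qr(rng.standard_normal((d_in, r)))  # orthonormal basis, shape (d_in, r)
w = rng.standard_normal((d_in, d_out))

# Activations that lie exactly in span(p), so the reconstruction is lossless here.
x = rng.standard_normal((batch, r)) @ p.T
y, z = forward_and_save(x, w, p)
grad_y = rng.standard_normal(y.shape)

approx = weight_grad_from_projection(z, grad_y, p)
exact = x.T @ grad_y
print(np.allclose(approx, exact))  # True
print(x.size, z.size)              # 2048 256
```

If x has components outside span(p), the gradient error is (x - z @ p.T).T @ grad_y. The quality of the approximation therefore depends on how well the tracked subspace captures the activations.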

large language models, memory optimization, activation memory, low-dimensional subspace, gradient representation, optimizer states, online learning, fine-tuning, pretraining
Authors
Sakshi Choudhary, Utkarsh Saxena, Kaushik Roy
Abstract
Training large language models (LLMs) is constrained by memory requirements, with activations accounting for a substantial fraction of the total footprint. Existing approaches reduce memory using low-rank weight parameterizations or low-rank gradient subspaces for optimizer states, while activation memory is addressed through architectural modifications or compression schemes based on periodically updated projections. We propose OASIS, an online activation subspace learning algorithm for memory-efficient training that tracks and continuously updates a low-dimensional activation subspace during training. Intermediate activations are projected onto this evolving subspace, reducing memory without modifying forward-pass computations. The evolving activation subspace induces low-rank gradient representations, enabling both gradients and optimizer states to be maintained directly in this subspace, while a projection-aware optimizer consistently transports optimizer states across subspace updates for stable training. Across various fine-tuning and pretraining tasks, OASIS achieves up to $2\times$ lower peak memory than full fine-tuning while matching its performance and outperforming prior low-rank methods.
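The abstract does not specify how the subspace is tracked or how optimizer states are transported. This NumPy sketch fills both gaps with assumptions that may differ from OASIS. First, the basis is refreshed with an Oja-style step followed by QR re-orthonormalization. Second, Adam-style moments stored in r-dimensional coordinates are mapped to the new basis with the change-of-basis matrix rot = p_new.T @ p_old. The second-moment rule (rot ** 2) @ v is a heuristic chosen for illustration. The names update_subspace and transport_states are hypothetical.

```python
import numpy as np

def update_subspace(p, x, lr=0.1):
    """One Oja-style step toward the top-r principal subspace of the activations.

    p: (d_in, r) orthonormal basis; x: (batch, d_in) activations from the current step.
    """
    cov_p = x.T @ (x @ p) / x.shape[0]       # covariance @ p without forming a d_in x d_in matrix
    p_new, _ = np.linalg.qr(p + lr * cov_p)  # re-orthonormalize the columns
    return p_new

def transport_states(m, v, p_old, p_new):
    """Re-express Adam-style moments stored as (r, d_out) coordinates in the new basis."""
    rot = p_new.T @ p_old   # (r, r) change of basis from old to new coordinates
    m_new = rot @ m         # p_new @ m_new is the projection of p_old @ m onto span(p_new)
    v_new = (rot ** 2) @ v  # heuristic: keeps second moments non-negative
    return m_new, v_new

rng = np.random.default_rng(0)
d_in, r, batch = 32, 4, 64
true_basis, _ = np.linalg.qr(rng.standard_normal((d_in, r)))
p, _ = np.linalg.qr(rng.standard_normal((d_in, r)))  # random initial subspace

for _ in range(100):
    # Synthetic activations: a dominant rank-r component plus small isotropic noise.
    x = 3.0 * rng.standard_normal((batch, r)) @ true_basis.T
    x += 0.05 * rng.standard_normal((batch, d_in))
    p = update_subspace(p, x)

# Fraction of the tracked subspace that lies in the true subspace (1.0 = perfect).
alignment = np.linalg.norm(true_basis.T @ p) ** 2 / r
print(f"subspace alignment: {alignment:.4f}")

# Transport check: a pure rotation within the same subspace preserves p @ m exactly.
m = rng.standard_normal((r, 8))
v = rng.random((r, 8))
q, _ = np.linalg.qr(rng.standard_normal((r, r)))
m_new, v_new = transport_states(m, v, p, p @ q)
print(np.allclose((p @ q) @ m_new, p @ m))  # True
```

Mapping the first moment through rot preserves the full-space momentum whenever the new basis spans the same subspace as the old one, which is what the final check confirms. When the subspace actually changes, the component of the momentum outside the new span is discarded.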