Preserving Plasticity in Continual Learning via Dynamical Isometry
2026-06-08 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors study why deep neural networks gradually lose their ability to learn new things when they are trained continually on changing data. They link this loss of plasticity to a mathematical property called dynamical isometry: the condition that each layer's Jacobian has singular values close to one, so that gradients passing through the network are neither amplified nor shrunk. They show that preserving this property helps networks stay flexible and keep learning. To promote it, they propose an isometry-promoting regularization scheme and an Adam-style optimizer, AdamO, that applies this regularization separately from the gradient updates. On supervised and reinforcement-learning benchmarks designed to induce plasticity loss, their methods match or outperform existing approaches.
Continual learning, Plasticity, Neural Tangent Kernel, Dynamical isometry, Jacobian singular values, ReLU units, Adaptive optimizer, Adam optimizer, Regularization, Reinforcement learning
Authors
Andries Rosseau, Robert Müller, Ann Nowé
Abstract
Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to one) as a key mechanism for preserving plasticity in continual learning. We revisit a class of networks that are almost-everywhere isometric while remaining universal Lipschitz function approximators, demonstrating that near-dynamical isometry is compatible with expressive nonlinear representations. For general architectures, we propose an efficient isometry-promoting regularization scheme and identify a novel mechanism by which it can reactivate dormant ReLU units. Building on this, we introduce AdamO, an Adam-style adaptive optimizer that decouples isometry regularization from gradient updates, analogous to AdamW. We further reinterpret prior plasticity-preserving approaches through the lens of dynamical isometry, showing that they target only a partial measure of isometry. Across supervised and reinforcement-learning continual-learning benchmarks designed to induce plasticity loss, our methods consistently match or outperform existing approaches.
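The abstract describes AdamO only at a high level, as an Adam-style optimizer that decouples isometry regularization from the gradient update in the same way AdamW decouples weight decay. The pure-Python code below is a minimal sketch of that decoupling idea. It assumes a soft-orthogonality penalty ||W^T W - I||_F^2 as the isometry regularizer. That penalty, the helper names (isometry_penalty, isometry_grad, adam_iso_step) and the hyperparameters are illustrative assumptions, not the authors' regularization scheme or their AdamO. The code applies a standard Adam step to the task gradient and then a separate isometry step that bypasses Adam's moment estimates. For square or tall W, the penalty is zero exactly when every singular value of W equals one.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gram_deviation(w):
    """Return D = W^T W - I; D is zero iff every singular value of W is 1."""
    g = matmul(transpose(w), w)
    return [[g[i][j] - (1.0 if i == j else 0.0) for j in range(len(g))]
            for i in range(len(g))]


def isometry_penalty(w):
    """Assumed regularizer: soft-orthogonality penalty ||W^T W - I||_F^2."""
    return sum(x * x for row in gram_deviation(w) for x in row)


def isometry_grad(w):
    """Gradient of the penalty with respect to W: 4 W (W^T W - I)."""
    return [[4.0 * x for x in row] for row in matmul(w, gram_deviation(w))]


def init_state(w):
    """Zero-initialised Adam moments with the same shape as W."""
    return {"t": 0,
            "m": [[0.0] * len(w[0]) for _ in w],
            "v": [[0.0] * len(w[0]) for _ in w]}


def adam_iso_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, iso_coef=1e-2):
    """Hypothetical sketch: one Adam step on the task gradient plus a
    decoupled isometry step. The isometry gradient is NOT fed into the
    moment estimates m and v, mirroring how AdamW applies weight decay."""
    state["t"] += 1
    t, m, v = state["t"], state["m"], state["v"]
    iso = isometry_grad(w)
    new_w = []
    for i, row in enumerate(w):
        new_row = []
        for j, wij in enumerate(row):
            g = grad[i][j]
            m[i][j] = beta1 * m[i][j] + (1 - beta1) * g
            v[i][j] = beta2 * v[i][j] + (1 - beta2) * g * g
            m_hat = m[i][j] / (1 - beta1 ** t)
            v_hat = v[i][j] / (1 - beta2 ** t)
            adam_dir = m_hat / (math.sqrt(v_hat) + eps)
            # Decoupled term: scaled only by lr * iso_coef, not by sqrt(v_hat).
            new_row.append(wij - lr * (adam_dir + iso_coef * iso[i][j]))
        new_w.append(new_row)
    return new_w


# Usage: with a zero task gradient, only the isometry step acts on W.
w = [[1.5, 0.2], [0.0, 0.8]]
state = init_state(w)
zero_grad = [[0.0, 0.0], [0.0, 0.0]]
before = isometry_penalty(w)
for _ in range(200):
    w = adam_iso_step(w, zero_grad, state, lr=0.05, iso_coef=1.0)
after = isometry_penalty(w)
print(f"penalty before: {before:.4f}, after: {after:.2e}")
```

With a zero task gradient the Adam term vanishes, so the loop reduces to gradient descent on the penalty and pulls both singular values of W toward one. Keeping the isometry term outside the moment estimates means its strength is set by iso_coef alone rather than being rescaled per coordinate by the adaptive denominator, which is the same motivation usually given for AdamW's decoupled weight decay.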