Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics
2026-04-28 • Machine Learning
AI summary
The authors study a training method called identity teacher forcing (ITF) that helps recurrent neural networks (RNNs) learn chaotic systems more stably. They find that ITF conditions on a single forced prediction path, which makes the training landscape sharper, whereas the marginal likelihood, which averages over all plausible paths, smooths it. Experiments on a chaotic system (Lorenz-63) show that fine-tuning on the evidence can improve held-out prediction scores but may degrade important dynamical features relative to ITF-trained models, pointing to a trade-off between these training objectives when modeling chaotic dynamics.
identity teacher forcing, recurrent neural networks, chaotic dynamical systems, Lorenz-63, marginal likelihood, curvature, observed information, Louis' identity, switching models, dynamical systems reconstruction
Authors
Andre Herz, Daniel Durstewitz, Georgia Koppe
Abstract
Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effective for dynamical systems reconstruction (DSR) with recurrent neural networks (RNNs), including interpretable almost-linear RNNs (AL-RNNs). However, as an intervention-based prediction loss (and thus a generalized Bayes update), teacher forcing need not match the free-running model's marginal likelihood geometry. We compare the objective-induced curvatures of ITF and marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity. In the switching setting studied here, conditioning on a single forced regime path (as ITF does) inflates curvature, while marginal likelihood curvature is reduced by a missing-information correction when multiple switching explanations remain plausible. In Lorenz-63 experiments, windowed evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to ITF-pretrained models.
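The curvature mechanism in the abstract can be illustrated outside the AL-RNN setting with a toy two-regime Gaussian model (our own minimal sketch, not the paper's model): conditioning on a single forced regime path yields the complete-data curvature, while Louis' identity subtracts a missing-information term, the posterior variance of the complete-data score, whenever several regime explanations remain plausible. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

# Toy switching model (an assumption for illustration, not the paper's setup):
# observation y | z ~ N(theta + d_z, 1) with a latent regime z in {0, 1}.
# Louis' identity: observed information = E[complete-data information | y]
#                                         - Var[complete-data score | y].

pi = np.array([0.5, 0.5])      # regime prior
d = np.array([-1.0, 1.0])      # regime-specific offsets
theta, y = 0.3, 0.2            # parameter and an ambiguous observation

# Posterior regime responsibilities r_z = p(z | y, theta)
logw = np.log(pi) - 0.5 * (y - theta - d) ** 2
r = np.exp(logw - logw.max())
r /= r.sum()

# Per-regime complete-data score is (y - theta - d_z); its curvature is 1,
# so conditioning on any single forced regime path gives information 1.
complete_info = 1.0
missing_info = r @ d**2 - (r @ d) ** 2   # Var(d_z | y) = Var of the score
I_louis = complete_info - missing_info   # Louis' identity

# Cross-check against a finite-difference Hessian of the marginal log-likelihood.
def marginal_ll(t):
    return np.log(np.sum(pi * np.exp(-0.5 * (y - t - d) ** 2)))

h = 1e-3
I_numeric = -(marginal_ll(theta + h) - 2 * marginal_ll(theta)
              + marginal_ll(theta - h)) / h**2

print(I_louis, I_numeric)  # marginal curvature sits well below the forced-path value of 1
```

With both regimes nearly equally responsible for the observation, the missing-information term is large and the marginal-likelihood curvature collapses far below the forced-path value, mirroring the abstract's claim that a single forced regime path inflates curvature.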