On Geometry Regularization in Autoencoder Reduced-Order Models with Latent Neural ODE Dynamics

2026-03-03
Machine Learning

AI summary

The authors study ways to make the hidden (latent) spaces of encoder-decoder models behave nicely for simulating complex systems like advection-diffusion-reaction equations. They try four different methods to regularize the decoder's behavior during training. They find that three of these methods, while making the decoder locally smoother, actually make it harder for the model to learn accurate long-term predictions. The fourth method, involving a special projection of the decoder’s first layer, consistently leads to better performance in modeling the system over time. The authors suggest that matching the geometry of the latent space to the problem is more important than just smoothing the decoder.

encoder-decoder models, latent representations, regularization, neural ODE, advection-diffusion-reaction equation, decoder Jacobian, Stiefel projection, latent dynamics, autoencoder pre-training, rollout performance
Authors
Mikhail Osipov
Abstract
We investigate geometric regularization strategies for learned latent representations in encoder–decoder reduced-order models. In a fixed experimental setting for the advection–diffusion–reaction (ADR) equation, we model latent dynamics using a neural ODE and evaluate four regularization approaches applied during autoencoder pre-training: (a) near-isometry regularization of the decoder Jacobian, (b) a stochastic decoder gain penalty based on random directional gains, (c) a second-order directional curvature penalty, and (d) Stiefel projection of the first decoder layer. Across multiple seeds, we find that (a)–(c) often produce latent representations that make subsequent latent-dynamics training with a frozen autoencoder more difficult, especially for long-horizon rollouts, even when they improve local decoder smoothness or related sensitivity proxies. In contrast, (d) consistently improves conditioning-related diagnostics of the learned latent dynamics and tends to yield better rollout performance. We discuss the hypothesis that, in this setting, the downstream impact of latent-geometry mismatch outweighs the benefits of improved decoder smoothness.
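The abstract does not give implementation details for the four regularizers. As a point of reference, the two most self-contained ones can be sketched in a few lines of NumPy: method (d) as the standard polar/SVD projection of a weight matrix onto the Stiefel manifold (nearest matrix with orthonormal columns in Frobenius norm), and method (b) as a Monte-Carlo penalty on random directional gains of the decoder. The layer shapes, the finite-difference approximation of the Jacobian-vector product, and the toy linear decoder below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- (d) Stiefel projection of the first decoder layer --------------------
# Project W onto the Stiefel manifold via the polar decomposition
# W = U S V^T -> U V^T, the nearest orthonormal-column matrix to W.
def stiefel_project(W):
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

W = rng.normal(size=(64, 16))          # hypothetical first decoder layer, latent dim 16
W_st = stiefel_project(W)
orth_err = np.linalg.norm(W_st.T @ W_st - np.eye(16))  # ~0: columns are orthonormal

# --- (b) stochastic decoder gain penalty ----------------------------------
# Penalize the squared deviation of random directional gains
# ||g(z + eps*v) - g(z)|| / eps from 1, a Monte-Carlo proxy for
# near-isometry of the decoder Jacobian at z (JVP via finite differences).
def directional_gain_penalty(decoder, z, n_dirs=8, eps=1e-4):
    pens = []
    for _ in range(n_dirs):
        v = rng.normal(size=z.shape)
        v /= np.linalg.norm(v)         # random unit direction in latent space
        gain = np.linalg.norm(decoder(z + eps * v) - decoder(z)) / eps
        pens.append((gain - 1.0) ** 2)
    return float(np.mean(pens))

decoder = lambda z: W_st @ z           # toy linear decoder; isometric by construction
z = rng.normal(size=16)
penalty = directional_gain_penalty(decoder, z)  # ~0 for an isometric decoder
```

For the linear decoder with orthonormal columns, every directional gain is exactly 1, so the penalty vanishes; for a trained nonlinear decoder the penalty term would be added to the autoencoder reconstruction loss during pre-training.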