A Diagnostic Software Suite for Auditing Learned PDE Simulators
2026-06-16 • Mathematical Software
AI summary
The author created a software tool to check how well AI models simulate physical systems described by partial differential equations (PDEs). Instead of looking only at a single error value, the tool tests whether the models behave consistently over time and respect important physical rules such as energy behavior and integral balance. It was tested on five fluid and physics problems using several AI methods. The results showed that even when the usual error measure looks acceptable, deeper checks can reveal problems with the models. The tool helps researchers understand and trust AI PDE simulators by giving a detailed diagnostic report instead of just one error number.
partial differential equations, PDE simulators, numerical solvers, relative L2 error, evolution operators, Navier-Stokes equations, DeepONet, Fourier Neural Operator (FNO), U-Net, ResNet
Authors
Lennon J. Shikhman
Abstract
Learned PDE simulators are increasingly used as low-cost replacements for expensive numerical solvers, but standard relative $L^2$ error does not determine whether a learned model behaves as a coherent numerical time propagator. This paper presents a diagnostic software suite for auditing learned PDE simulators as approximate evolution operators. The suite provides architecture-independent, post hoc diagnostics for relative state error, semigroup consistency, finite-difference generator discrepancy, energy behavior, integral balance, admissibility constraints, perturbation response, and scaling-law consistency. The software is designed around a minimal contract: reference trajectories, a learned propagator or saved predictions, equation metadata, and a diagnostic configuration specifying which structures are meaningful for the problem under study. We validate the suite on five benchmark PDE tasks: two-dimensional incompressible Navier-Stokes, shallow-water dynamics, active matter, three-dimensional compressible Navier-Stokes, and three-dimensional magnetohydrodynamics, using FNO, DeepONet, U-Net, and ResNet-style surrogate models together with controlled underfit and oversmoothed variants. The validation study shows that relative $L^2$ error can remain moderate, or even improve, while structural diagnostics deteriorate substantially. The package therefore supports software-level auditing of learned PDE simulators by reporting an interpretable diagnostic panel rather than collapsing model behavior into a single state-error score.
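To make the distinction between state error and semigroup consistency concrete, here is a minimal sketch in Python with NumPy. It is not the suite's API, which the abstract does not describe. The function names (`relative_l2`, `semigroup_defect`, `heat_propagator`), the 1D periodic heat equation test problem, and the 2% amplitude bias of the stand-in "surrogate" are all assumptions chosen for illustration. The semigroup check compares two applications of a propagator for step $\Delta t$ against one application of a propagator for step $2\Delta t$; for an exact evolution operator the two agree.

```python
import numpy as np

def relative_l2(pred, ref):
    """Relative L2 error ||pred - ref|| / ||ref|| over all array entries."""
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))

def semigroup_defect(step_dt, step_2dt, u0):
    """Relative mismatch between two dt-steps and one 2*dt-step from u0."""
    return relative_l2(step_dt(step_dt(u0)), step_2dt(u0))

def heat_propagator(nu, dt, n):
    """Exact spectral propagator for u_t = nu * u_xx on a periodic unit domain."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)  # angular wavenumbers
    decay = np.exp(-nu * k**2 * dt)
    def step(u):
        return np.real(np.fft.ifft(decay * np.fft.fft(u)))
    return step

# Illustrative setup (all values are assumptions, not from the paper).
n, nu, dt = 64, 0.01, 0.1
x = np.arange(n) / n
u0 = np.sin(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 4 * x)

exact_dt = heat_propagator(nu, dt, n)
exact_2dt = heat_propagator(nu, 2 * dt, n)

# Hypothetical stand-in for a learned model: each propagator carries
# the same 2% amplitude bias, so per-call state error stays small.
def surr_dt(u):
    return 1.02 * exact_dt(u)

def surr_2dt(u):
    return 1.02 * exact_2dt(u)

exact_defect = semigroup_defect(exact_dt, exact_2dt, u0)
surr_state_error = relative_l2(surr_2dt(u0), exact_2dt(u0))
surr_defect = semigroup_defect(surr_dt, surr_2dt, u0)

print(f"exact semigroup defect:     {exact_defect:.2e}")
print(f"surrogate state error:      {surr_state_error:.2e}")
print(f"surrogate semigroup defect: {surr_defect:.2e}")
```

In this toy setting the exact propagator composes consistently up to rounding, while the biased stand-in shows a nonzero defect even though its one-call state error is only about 2%. A real audit would substitute the learned model's rollouts and reference trajectories for these closed-form propagators.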