Stein Kernelized Molecular Dynamics for Active Learning of Interatomic Potentials
2026-06-02 • Machine Learning
AI summary
The authors introduce Stein kernelized molecular dynamics (SKMD), an enhanced sampling method for training machine learning models that simulate atoms accurately and efficiently. SKMD uses interacting particle dynamics to explore atomic configurations and select informative training data, so that the data cover arrangements that are both diverse and likely. Unlike other enhanced samplers, SKMD preserves the Boltzmann distribution of these arrangements in the long run, which balances exploring new configurations against staying in high-probability regions. The authors show that SKMD reaches higher model accuracy in fewer training iterations than active learning baselines, using the same number of acquired training samples.
machine learning interatomic potentials, molecular dynamics, enhanced sampling, Stein variational gradient descent, Boltzmann distribution, active learning, kernel methods, neural network potentials, atomic descriptors, energy landscape
Authors
Joanna Zou, Fraser Birks, Dallas Foster, Youssef Marzouk
Abstract
Machine learning interatomic potentials (MLIPs) enable efficient and accurate atomistic simulations but depend critically on the quality and diversity of the training data. We introduce Stein kernelized molecular dynamics (SKMD), an enhanced sampling method that uses interacting particle dynamics to acquire informative training configurations for the active learning and fine-tuning of MLIPs. SKMD corresponds to a stochastic variant of Stein variational gradient descent that is adapted for molecular dynamics by incorporating asynchronous particle updates and a kernel of global atomic descriptors, which provides a symmetry-aware measure of configurational similarity. Unlike other enhanced samplers used in molecular dynamics, SKMD preserves the Boltzmann distribution as the asymptotic distribution of the dynamics. This property enforces a balance between the exploration of diverse configurations and attraction toward high-probability regions of the energy landscape. We further propose an approach to efficient online data acquisition using an adaptive stopping criterion that selects non-redundant training data over the course of simulation. We demonstrate SKMD for the active learning of a neural network model of the Müller-Brown potential and the fine-tuning of a MACE interatomic potential for alanine dipeptide. Compared to active learning baselines, our method achieves higher model accuracy in fewer training iterations with the same number of acquired training samples.
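As a rough illustration of the kind of stochastic Stein variational gradient descent dynamics the abstract refers to, the following Python sketch applies one common form of the update to particles on the Müller-Brown potential. It is not the authors' SKMD algorithm: the asynchronous particle updates and the kernel of global atomic descriptors are omitted, and a plain RBF kernel on raw 2D coordinates stands in for them. The function names, the bandwidth `h`, the inverse temperature `beta`, the step size, and the Müller-Brown parameter values (the commonly quoted ones) are all assumptions made for illustration.

```python
import numpy as np

# Commonly quoted Müller-Brown parameters (assumed here, not taken from the paper).
A = np.array([-200.0, -100.0, -170.0, 15.0])
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
X0 = np.array([1.0, 0.0, -0.5, -1.0])
Y0 = np.array([0.0, 0.5, 1.5, 1.0])


def potential(x):
    """Müller-Brown energy U for an array of points with shape (n, 2)."""
    dx = x[:, 0:1] - X0
    dy = x[:, 1:2] - Y0
    return (A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)).sum(axis=1)


def grad_potential(x):
    """Analytic gradient of U, shape (n, 2)."""
    dx = x[:, 0:1] - X0
    dy = x[:, 1:2] - Y0
    e = A * np.exp(a * dx**2 + b * dx * dy + c * dy**2)
    gx = (e * (2 * a * dx + b * dy)).sum(axis=1)
    gy = (e * (b * dx + 2 * c * dy)).sum(axis=1)
    return np.stack([gx, gy], axis=1)


def stochastic_svgd_step(x, beta, step, h, rng):
    """One Euler-Maruyama step of a generic stochastic SVGD update.

    Target density is the Boltzmann distribution p(x) ~ exp(-beta * U(x)).
    Hypothetical helper: synchronous updates, RBF kernel on coordinates.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # diff[i, j] = x_i - x_j
    K = np.exp(-(diff**2).sum(axis=-1) / (2 * h**2))
    score = -beta * grad_potential(x)               # grad log p at each particle
    attract = K @ score / n                         # kernel-averaged force term
    repel = (K[:, :, None] * diff).sum(axis=1) / (n * h**2)  # kernel-gradient term
    # Noise with covariance 2 * step * K / n across particles; small jitter
    # keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K / n + 1e-8 * np.eye(n))
    noise = np.sqrt(2 * step) * L @ rng.standard_normal(x.shape)
    return x + step * (attract + repel) + noise


def run(n_particles=16, n_steps=200, beta=0.1, step=1e-4, h=0.2, seed=0):
    """Evolve particles started near the deepest Müller-Brown minimum."""
    rng = np.random.default_rng(seed)
    x = np.array([-0.558, 1.442]) + 0.05 * rng.standard_normal((n_particles, 2))
    for _ in range(n_steps):
        x = stochastic_svgd_step(x, beta, step, h, rng)
    return x
```

Calling `run()` returns the final particle positions as an array of shape `(16, 2)`. The attractive term pulls particles toward low-energy regions, the kernel-gradient term pushes nearby particles apart, and the correlated noise is what makes this a stochastic rather than deterministic variant. In an atomistic setting the kernel would presumably act on descriptors of whole configurations rather than on raw coordinates, as the abstract describes.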