Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection
2026-04-10 • Machine Learning
AI summary
The authors worked on detecting problems in chemical processes using deep learning, which typically requires large amounts of high-quality, annotated data. In earlier work, they built a detailed dataset from real batch distillation experiments. They have now added simulated data generated with a new Python tool that closely reproduces those experiments. The combined real-and-simulated dataset is shared openly and is intended to support the development of better detection methods. Their approach shows that many experiments can be simulated automatically and consistently, which supports research on chemical process monitoring.
Anomaly detection, Deep learning, Batch distillation, Process simulation, Differential-algebraic equations, Time-series data, Hybrid dataset, Chemical process monitoring, Python process simulator
Authors
Jennifer Werner, Justus Arweiler, Indra Jungjohann, Jochen Schmid, Fabian Jirasek, Hans Hasse, Michael Bortz
Abstract
Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets that are rarely available from industrial operations. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data is generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the simulator predicts the dynamics of the other experiments well. This enables the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.
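The abstract does not detail the simulator or its tailored index-reduction strategy. Purely for orientation, the sketch below shows the generic idea on a far simpler model than the authors' column: a Rayleigh-type batch still (no column, no reflux) with constant relative volatility. The vapor-liquid equilibrium relation y = αx/(1 + (α − 1)x) is the algebraic equation of a semi-explicit index-1 DAE. Solving it explicitly and substituting it into the holdup and component balances yields an ODE that is integrated here with fixed-step RK4. The `AnomalyWindow` record is a hypothetical stand-in for an annotated actuator anomaly (a temporary drop in boilup), loosely mimicking how an annotated experimental record could be turned into a simulation scenario. All names and parameter values are illustrative assumptions, not the authors' simulator or its API.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AnomalyWindow:
    """Hypothetical actuator anomaly: boilup scaled by `factor` on [t_start, t_end)."""
    t_start: float
    t_end: float
    factor: float


def vapor_fraction(x: float, alpha: float) -> float:
    """Algebraic VLE equation with constant relative volatility alpha."""
    return alpha * x / (1.0 + (alpha - 1.0) * x)


def rhs(m: float, x: float, v: float, alpha: float) -> Tuple[float, float]:
    """ODE right-hand side after eliminating the algebraic variable y."""
    y = vapor_fraction(x, alpha)  # index-1 algebraic equation solved explicitly
    return -v, -v * (y - x) / m   # dM/dt = -V, dx/dt = -V (y - x) / M


def simulate(m0: float, x0: float, alpha: float, v_nominal: float,
             t_end: float, dt: float,
             anomaly: Optional[AnomalyWindow] = None) -> List[Tuple[float, float, float]]:
    """Fixed-step RK4; boilup is held constant over each step (zero-order hold)."""
    n_steps = int(round(t_end / dt))
    m, x = m0, x0
    trajectory = [(0.0, m, x)]
    for i in range(n_steps):
        t = i * dt
        v = v_nominal
        # Apply the annotated anomaly if this step starts inside its window.
        if anomaly is not None and anomaly.t_start <= t < anomaly.t_end:
            v *= anomaly.factor
        k1 = rhs(m, x, v, alpha)
        k2 = rhs(m + 0.5 * dt * k1[0], x + 0.5 * dt * k1[1], v, alpha)
        k3 = rhs(m + 0.5 * dt * k2[0], x + 0.5 * dt * k2[1], v, alpha)
        k4 = rhs(m + dt * k3[0], x + dt * k3[1], v, alpha)
        m += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        x += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        trajectory.append(((i + 1) * dt, m, x))
    return trajectory


def rayleigh_residual(m: float, x: float, m0: float, x0: float, alpha: float) -> float:
    """Deviation from the analytic Rayleigh equation; should be close to zero."""
    exact = (math.log(x / x0) + alpha * math.log((1 - x0) / (1 - x))) / (alpha - 1)
    return math.log(m / m0) - exact


# Illustrative runs: 100 mol still charge, x0 = 0.5, alpha = 2.5, boilup 1 mol/min.
nominal = simulate(100.0, 0.5, 2.5, 1.0, 50.0, 0.25)
faulty = simulate(100.0, 0.5, 2.5, 1.0, 50.0, 0.25, AnomalyWindow(10.0, 20.0, 0.5))
print("nominal end state:", nominal[-1])
print("anomalous end state:", faulty[-1])
```

Because dx/dM = (y − x)/M does not depend on the boilup, the still composition is a function of the remaining holdup alone. The analytic Rayleigh equation, ln(M/M0) = [ln(x/x0) + α ln((1 − x0)/(1 − x))]/(α − 1), therefore offers a convenient consistency check for both runs: in this toy model, a boilup anomaly shifts the trajectories in time but leaves the composition vs. holdup path unchanged.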