Automatic Generation of Executable BPMN Models from Medical Guidelines

2026-04-09 · Artificial Intelligence

Artificial Intelligence, Machine Learning, Software Engineering
AI summary

The authors built a pipeline that converts healthcare policy documents into executable BPMN process models that can be simulated. Large language models generate the models, which are then syntax-corrected, augmented so they can execute, instrumented with key performance indicators, and scored with an entropy-based detector that flags unclear policy text. The pipeline was tested on diabetic nephropathy prevention guidelines from three Japanese municipalities, with each generated model executed against 1,000 synthetic patients. On well-structured policies the generated models matched the ground truth exactly; across all conditions per-patient decision agreement stayed above 92%, and the entropy score indicates when human review is needed.

Business Process Model and Notation (BPMN), large language models (LLMs), policy digitization, diabetic nephropathy, simulation-based evaluation, key performance indicators (KPIs), entropy-based uncertainty detection, synthetic patient data, executable augmentation
Authors
Praveen Kumar Menaka Sekar, Ion Matei, Maksym Zhenirovskyy, Hon Yung Wong, Sayuri Kohmura, Shinji Hotta, Akihiro Inomata
Abstract
We present an end-to-end pipeline that converts healthcare policy documents into executable, data-aware Business Process Model and Notation (BPMN) models using large language models (LLMs) for simulation-based policy evaluation. We address the main challenges of automated policy digitization with four contributions: data-grounded BPMN generation with syntax auto-correction, executable augmentation, KPI instrumentation, and entropy-based uncertainty detection. We evaluate the pipeline on diabetic nephropathy prevention guidelines from three Japanese municipalities, generating 100 models per backend across three LLMs and executing each against 1,000 synthetic patients. On well-structured policies, the pipeline achieves a 100% ground-truth match with perfect per-patient decision agreement. Across all conditions, raw per-patient decision agreement exceeds 92%, and entropy scores increase monotonically with document complexity, confirming that the detector reliably separates unambiguous policies from those requiring targeted human clarification.
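The abstract does not say how the entropy score or the per-patient agreement is computed. As a rough illustration only, the sketch below assumes the entropy is the Shannon entropy of the decisions that the generated models assign to a single patient, averaged over patients, and that agreement is the fraction of patients whose decision matches a reference decision. The function names, the decision labels, and the data layout are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def decision_entropy(decisions):
    """Shannon entropy in bits of the decisions given to ONE patient.

    `decisions` holds one label per generated model (assumed layout).
    Zero means every model agreed; higher values mean more disagreement.
    """
    total = len(decisions)
    counts = Counter(decisions)
    # sum of p * log2(1/p) over the observed decision labels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mean_entropy(decisions_by_patient):
    """Average per-patient entropy over a cohort of patients."""
    scores = [decision_entropy(d) for d in decisions_by_patient]
    return sum(scores) / len(scores)


def agreement(predicted, reference):
    """Fraction of patients whose predicted decision equals the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical example: two patients, four generated models each.
cohort = [
    ["refer", "refer", "refer", "refer"],      # all models agree
    ["refer", "refer", "monitor", "monitor"],  # models split evenly
]
score = mean_entropy(cohort)
```

Under these assumptions, a policy whose generated models always agree scores zero, and a policy that splits the models scores higher, which is the kind of signal that could be compared against a review threshold chosen by the user.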