DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs
2026-06-18 • Artificial Intelligence
AI summary
The authors present DeepSWIP, a method that improves how DeepProbLog models reason about 'what if' scenarios by adding a way to handle interventions and evidence causally, not just associationally. They do this by transforming neural network outputs into logical choices, which lets them compute counterfactual questions efficiently within a single program. Their approach is mathematically exact under certain assumptions and offers faster inference compared to previous methods. Experiments confirm both the accuracy and speed benefits, and they explore how issues like neural calibration affect causal estimates. Their work provides a clearer way to integrate neural perception with causal logic reasoning.
DeepProbLog, counterfactual reasoning, causal inference, probabilistic logic programming, weighted model counting, Single World Intervention Programs (SWIPs), neural calibration, causal semantics, interventions, causal estimands
Authors
Saimun Habib, Vaishak Belle, Fengxiang He
Abstract
Neurosymbolic systems such as DeepProbLog combine neural perception with probabilistic logic, but their standard inference is associational. Counterfactual reasoning additionally requires a causal semantics for interventions and evidence. We introduce DeepSWIP, a single-world counterfactual semantics for DeepProbLog programs. Using neural materialization, we reduce fixed-context neural predicates to ordinary ProbLog choices, apply Single World Intervention Programs (SWIPs), and compute counterfactuals by weighted model counting (WMC) over a single transformed program. Under finite-grounding and unique-supported-model assumptions, DeepSWIP is exact relative to the learned materialized functional causal model (FCM). The standard quotient-WMC form of ProbLog conditionals identifies the active neural probabilities and explains intervention cleaning, calibration sensitivity, and rare-evidence instability. Experiments on MPI3D validate the transformation against a DeepTwin construction on 12,000 queries and show, as predicted, a 2.14$\times$ inference speedup from avoiding the Twin's duplication of endogenous variables. A SUMO HOV experiment shows that degraded neural calibration biases plug-in estimates, while a correctly scoped randomized-policy AIPW estimator removes most first-order bias for population-mean and ATE estimands. Code is at https://github.com/saibib/deep_SWIP.
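The quotient-WMC form of a ProbLog conditional mentioned in the abstract, P(q | e) = WMC(q and e) / WMC(e), can be illustrated with a minimal brute-force sketch. This is not the authors' implementation and does not use the DeepSWIP or ProbLog code. The fact names (`is_red`, `noise`), the rule `alarm`, and all probabilities are hypothetical, and `is_red` stands in for a neural output that has already been materialized as an ordinary probabilistic fact.

```python
from itertools import product


def wmc(weights, formula):
    """Sum the weights of all truth assignments to the facts that satisfy `formula`."""
    names = list(weights)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if formula(world):
            w = 1.0
            for name in names:
                # A true fact contributes p, a false fact contributes 1 - p.
                w *= weights[name] if world[name] else 1.0 - weights[name]
            total += w
    return total


def conditional(weights, query, evidence):
    """Quotient form: WMC(query and evidence) / WMC(evidence)."""
    denom = wmc(weights, evidence)
    if denom == 0.0:
        raise ZeroDivisionError("evidence has zero probability")
    numer = wmc(weights, lambda w: query(w) and evidence(w))
    return numer / denom


# Hypothetical toy program: `is_red` is a materialized neural probability,
# `noise` is an ordinary probabilistic fact, and alarm :- is_red ; noise.
weights = {"is_red": 0.9, "noise": 0.1}


def alarm(world):
    return world["is_red"] or world["noise"]


def is_red(world):
    return world["is_red"]


p = conditional(weights, is_red, alarm)
print(p)
```

Because the result is a ratio, a small evidence mass in the denominator amplifies any error in the materialized neural probabilities, which is one way to read the calibration sensitivity and rare-evidence instability that the abstract attributes to the quotient form. Enumeration is exponential in the number of facts and is used here only for clarity.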