Neurosymbolic Auditing of Natural-Language Software Requirements
2026-05-13 • Software Engineering
Software EngineeringArtificial Intelligence
AI summaryⓘ
The authors developed VERIMED, a system that uses large language models and an SMT solver to check medical device software requirements for problems like ambiguity and safety issues. Their approach translates written requirements into formal logic and looks for multiple possible meanings by comparing different formalizations. They found that differences in these formalizations signal unclear requirements, and detailed feedback from the solver helps fix them significantly. This method improved accuracy in verifying hemodialysis safety requirements and helped audit software specifications more reliably.
natural language requirementsambiguitySMT solverformal logiclarge language modelssoftware verificationmedical device softwarecounterexample-guided repairhemodialysis safety
Authors
Bethel Hall, William Eiers
Abstract
Natural-language software requirements are often ambiguous, inconsistent, and underspecified; in safety-critical domains, these defects propagate into formal models that verify the wrong specification and into implementations that ship unsafe behavior. We show that large language models, equipped with an SMT solver, can audit such requirements: translating them into formal logic, detecting ambiguity through stochastic variation in the generated formalization, and exposing inconsistency, vacuousness, and safety violations through solver queries on the resulting specification. We present VERIMED, a neurosymbolic pipeline that operationalizes this idea for medical-device software requirements, and report two findings. First, stochastic variation across independent formalizations is a signal of ambiguity: requirements that admit multiple plausible interpretations produce SMT-inequivalent formalizations, and bidirectional SMT equivalence checking turns this disagreement into a solver-checkable test. Second, the usefulness of symbolic feedback depends on its granularity: in counterexample-guided repair on a hemodialysis question-answering benchmark, concrete SMT counterexamples raise verified accuracy from 55.4% to 98.5%. Over an extensive experimental evaluation on open-source hemodialysis safety requirements, we show that the LLM-based approach in VERIMED successfully reduces ambiguity-sensitive requirements and enables rigorous auditing of software requirements through SMT-based queries.