LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

2026-04-14

Cryptography and Security · Artificial Intelligence
AI summary

The authors point out that some software bugs, called logical vulnerabilities, arise because the program’s logic is wrong, not because of memory errors. They note that current automatic bug-fixing tools mostly handle memory problems and struggle with these logic bugs. To study this, the authors built a collection of real logical bugs called LogicDS and a testing system named LogicEval to check fixes. Their experiments show that failed fixes often stem from difficulty understanding the full code context and pinpointing exactly where the fix belongs.

logical vulnerabilities, program logic, memory safety, automated program repair, large language models (LLMs), patch localization, software security, compilation, testing failures, CVE
Authors
Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain
Abstract
Logical vulnerabilities in software stem from flaws in program logic rather than memory safety and can lead to critical security failures. Existing automated program repair techniques primarily focus on memory corruption vulnerabilities and struggle with logical vulnerabilities because of their limited semantic understanding of the vulnerable code and its expected behavior. Recent successes of large language models (LLMs) in understanding and repairing code are promising; however, no framework currently exists to analyze the capabilities and limitations of such techniques for logical vulnerabilities. This paper systematically evaluates both traditional and LLM-based repair approaches on real-world logical vulnerabilities. To facilitate our assessment, we created LogicDS, the first dataset of 86 logical vulnerabilities with assigned CVEs reflecting tangible security impact. We also developed LogicEval, a systematic framework for evaluating patches for logical vulnerabilities. Our evaluations suggest that compilation and testing failures are primarily driven by prompt sensitivity, loss of code context, and difficulty in patch localization.
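To make the distinction concrete, here is a minimal hypothetical sketch (not drawn from LogicDS) of the kind of flaw the paper targets: the code is perfectly memory-safe, yet an inverted comparison in an access-control check grants access to the wrong users. The function names and the one-token patch are illustrative assumptions, not examples from the dataset.

```python
def is_authorized(role: str, resource_owner: str, user: str) -> bool:
    # Vulnerable logic: '!=' should be '=='. Every NON-owner is granted
    # access while the actual owner is denied. No memory corruption is
    # involved; the bug lives entirely in the program's intent.
    return role == "admin" or resource_owner != user


def is_authorized_patched(role: str, resource_owner: str, user: str) -> bool:
    # Patched: access requires being an admin or the resource's owner.
    # The repair is a semantic fix, invisible to memory-safety tooling.
    return role == "admin" or resource_owner == user
```

Repair tools tuned to memory errors (bounds checks, sanitizer reports) have no signal here; producing this one-token patch requires understanding what the check was *supposed* to mean, which is exactly the capability the paper's evaluation probes.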