LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering
2026-04-07 • Cryptography and Security
Cryptography and Security, Artificial Intelligence
AI summary
The authors address the difficulty of turning complicated malware code back into readable source code, a process called decompilation. They created LLM4CodeRE, a domain-adapted language model that can convert low-level assembly code to high-level source code and also translate source code back to assembly, all in one system. To make this work well, they developed two fine-tuning methods that help the model understand the unique patterns of malware code and follow task rules. Their tests show that LLM4CodeRE does better than current decompilation tools and general-purpose code models in both directions of code translation.
code decompilation, malware reverse engineering, assembly code, source code, large language models (LLMs), fine-tuning, Multi-Adapter, Seq2Seq, task-conditioned prefixes, syntactic and semantic alignment
Authors
Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani
Abstract
Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most existing approaches rely on generic code pretraining and lack adaptation to malicious software. We propose LLM4CodeRE, a domain-adaptive LLM framework for bidirectional code reverse engineering that supports both assembly-to-source decompilation and source-to-assembly translation within a unified model. To enable effective task adaptation, we introduce two complementary fine-tuning strategies: (i) a Multi-Adapter approach for task-specific syntactic and semantic alignment, and (ii) a Seq2Seq Unified approach using task-conditioned prefixes to enforce end-to-end generation constraints. Experimental results demonstrate that LLM4CodeRE outperforms existing decompilation tools and general-purpose code models, achieving robust bidirectional generalization.
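To make the idea of task-conditioned prefixes concrete, here is a minimal sketch of how one aligned (assembly, source) pair could be turned into two training examples for a single Seq2Seq model, one per translation direction. The prefix strings, the `Example` type, and the `make_pair_examples` helper are hypothetical illustrations; the abstract does not specify the actual prefixes or data format used by LLM4CodeRE.

```python
from dataclasses import dataclass

# Hypothetical prefix strings: the abstract does not state the real ones.
TASK_PREFIXES = {
    "asm2src": "translate assembly to source: ",
    "src2asm": "translate source to assembly: ",
}


@dataclass(frozen=True)
class Example:
    """One text-to-text training example for a unified Seq2Seq model."""
    model_input: str
    target: str


def make_pair_examples(assembly: str, source: str) -> list[Example]:
    """Build one example per direction from an aligned (assembly, source) pair.

    The prefix at the start of the input tells the shared model which
    direction to generate, so both tasks can be trained in one model.
    """
    return [
        Example(TASK_PREFIXES["asm2src"] + assembly, source),
        Example(TASK_PREFIXES["src2asm"] + source, assembly),
    ]


if __name__ == "__main__":
    pair = ("mov eax, 1\nret", "int f(void) { return 1; }")
    for example in make_pair_examples(*pair):
        print(repr(example.model_input), "->", repr(example.target))
```

In this sketch the same pair feeds both directions, which is one plausible way a unified model could be exposed to bidirectional supervision. The Multi-Adapter strategy would instead keep separate task-specific parameters per direction, which this example does not attempt to show.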