LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

2026-04-07 • Cryptography and Security

Cryptography and Security, Artificial Intelligence

AI summary

The authors address the difficulty of turning obfuscated malware code back into readable source code, a process called decompilation. They created LLM4CodeRE, a language model adapted to malware that translates in both directions within one system: from low-level assembly code to high-level source code, and from source code back to assembly. To make this work well, they developed two fine-tuning methods that help the model learn the patterns of malware code and follow task-specific generation rules. Their tests show that LLM4CodeRE outperforms existing decompilation tools and general-purpose code models in both directions of translation.

code decompilation, malware reverse engineering, assembly code, source code, large language models (LLMs), fine-tuning, Multi-Adapter, Seq2Seq, task-conditioned prefixes, syntactic and semantic alignment

Authors

Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

Abstract

Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most existing approaches rely on generic code pretraining and lack adaptation to malicious software. We propose LLM4CodeRE, a domain-adaptive LLM framework for bidirectional code reverse engineering that supports both assembly-to-source decompilation and source-to-assembly translation within a unified model. To enable effective task adaptation, we introduce two complementary fine-tuning strategies: (i) a Multi-Adapter approach for task-specific syntactic and semantic alignment, and (ii) a Seq2Seq Unified approach using task-conditioned prefixes to enforce end-to-end generation constraints. Experimental results demonstrate that LLM4CodeRE outperforms existing decompilation tools and general-purpose code models, achieving robust bidirectional generalization.
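As an illustration of what "task-conditioned prefixes" could look like in practice, the following is a minimal sketch of how one aligned assembly/source pair might be expanded into two training examples for a single sequence-to-sequence model, one per translation direction. The prefix strings, the `Seq2SeqExample` type, and the `build_examples` helper are hypothetical names chosen for this sketch; the abstract does not specify the actual prefix tokens or data format used by LLM4CodeRE.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical prefix strings; the real tokens used by the authors are not given
# in the abstract.
TASK_PREFIXES = {
    "asm2src": "<asm2src>",  # assembly-to-source decompilation
    "src2asm": "<src2asm>",  # source-to-assembly translation
}


@dataclass(frozen=True)
class Seq2SeqExample:
    """One (input, target) text pair for a sequence-to-sequence model."""
    source: str
    target: str


def build_examples(asm: str, src: str) -> List[Seq2SeqExample]:
    """Expand one aligned (assembly, source) pair into two prefixed examples.

    The prefix at the start of the input tells the shared model which
    direction of translation is requested.
    """
    return [
        Seq2SeqExample(f"{TASK_PREFIXES['asm2src']} {asm}", src),
        Seq2SeqExample(f"{TASK_PREFIXES['src2asm']} {src}", asm),
    ]


if __name__ == "__main__":
    pair = build_examples("mov eax, 1\nret", "int f(void) { return 1; }")
    for example in pair:
        print(repr(example.source), "->", repr(example.target))
```

Under this assumed setup, a single set of model weights serves both directions and only the input prefix changes. A Multi-Adapter setup would presumably differ by selecting separate task-specific adapter weights per direction rather than relying on a prefix alone.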
