Neural architectures for resolving references in program code

2026-04-15

Machine Learning, Neural and Evolutionary Computing
AI summary

The authors studied how to improve models that rewrite references, an operation important in tasks like decompiling code. They focused on two problems, direct and indirect indexing by permutation, and built synthetic benchmarks to measure how well machine learning models handle these tasks. They found that popular models struggle, so they created new models that perform better, especially on longer examples. When tested on a real task involving decompiling switch statements, their new model reduced the error rate by 42%. Their experiments also showed that each component of the new model is important to its success.

reference rewriting, decompilation, sequence-to-sequence models, indexing by permutation, direct indexing, indirect indexing, machine learning benchmarks, switch statements, model ablation studies, scalability
Authors
Gergő Szalay, Gergely Zsolt Kovács, Sándor Teleki, Balázs Pintér, Tibor Gregorics
Abstract
Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures struggle on them. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples ten times longer than the best baseline can. We measure the impact of our architecture on the real-world task of decompiling switch statements, which contains an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
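The abstract does not spell out the two indexing tasks, so the sketch below shows one plausible reading of them: "direct" indexing reads each output element through the permutation, while "indirect" indexing writes each input element through the permutation (equivalent to applying the inverse). The function names and exact formulation are assumptions for illustration, not the paper's definitions.

```python
# Hypothetical illustration of direct vs. indirect indexing by
# permutation; the paper's precise task definitions may differ.

def direct_index(values, perm):
    # Direct indexing: out[i] = values[perm[i]]
    # (each output position reads through the permutation).
    return [values[p] for p in perm]

def indirect_index(values, perm):
    # Indirect indexing: out[perm[i]] = values[i]
    # (each input position writes through the permutation,
    # i.e. this applies the inverse permutation).
    out = [None] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

values = ["a", "b", "c", "d"]
perm = [2, 0, 3, 1]
print(direct_index(values, perm))    # -> ['c', 'a', 'd', 'b']
print(indirect_index(values, perm))  # -> ['b', 'd', 'a', 'c']
```

Under this reading, the two operations are inverses of each other: composing `direct_index` after `indirect_index` with the same permutation returns the original sequence. A sequence-to-sequence model solving either task must track positions exactly, which hints at why generic architectures scale poorly to longer examples.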