AI summary
The authors studied a special kind of secret code where each letter can be replaced by many possible symbols, making it hard for machines to crack using the usual frequency tricks. They tested a type of neural network called an attention-augmented LSTM to see if it could learn to decode these messages from matching pairs of coded and normal text. Their experiments with synthetic coded messages, generated from English and Swedish texts from the 1500s to the 1800s, showed the model could decode messages almost perfectly, even when the messages were short or contained errors. The model worked well only when the coded messages drew on the same shared pool of codes, which suggests it could also help check whether a key was reused. This suggests the method could be a useful tool for cracking and verifying these old types of secret codes.
Homophonic substitution cipher, Long Short-Term Memory (LSTM), Attention mechanism, Character-level decryption, Synthetic ciphertext, Shared-key setting, ChronoFidelius, Transcription errors, Automated decipherment, Key-space verification
Authors
Micaella Bruton, Meriem Beloucif, Beáta Megyesi
Abstract
Homophonic substitution ciphers replace each plaintext letter with one of several possible ciphertext codes, deliberately weakening letter-frequency patterns and making automated decipherment difficult. This paper evaluates whether an attention-augmented Long Short-Term Memory (LSTM) model can learn such mappings in a historically motivated shared-key setting: all ciphertexts draw from the same known homophonic code pool, while individual keys use different consistent subsets of that pool. Using synthetic ciphertexts generated with ChronoFidelius from historical English and Swedish texts dated 1500–1899, we test performance across ciphertext lengths, centuries, variable-length codes, and simulated transcription errors. Models are trained only on aligned ciphertext–plaintext pairs, without external language models, frequency statistics, or key-search heuristics. Results show near-perfect character-level decryption accuracy across both languages and all periods, including short and noisy ciphertexts. The model also fails predictably on ciphertexts outside the shared pool, indicating that it functions as a practical tool for decipherment and key-space verification when key reuse is suspected.
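To make the shared-key setting concrete, the following is a minimal Python sketch of a homophonic cipher in which every key is a fixed subset of one shared code pool. It is illustrative only: it is not the ChronoFidelius API and not the paper's LSTM model, and all function names (`build_code_pool`, `derive_key`, `encrypt`, `decrypt`) are hypothetical. It further assumes fixed-length two-digit codes and a pool in which each code belongs to exactly one letter. Decryption here is a plain table lookup over the pool, standing in for the mapping that the model would have to learn from aligned ciphertext–plaintext pairs.

```python
import random
import string


def build_code_pool(alphabet, codes_per_letter=3, seed=0):
    """Build a shared pool: each letter gets its own disjoint set of two-digit codes.

    Assumption for this sketch: codes are fixed-length and never shared
    between letters, so the pool can be inverted unambiguously.
    """
    rng = random.Random(seed)
    codes = [f"{i:02d}" for i in range(100)]  # "00" .. "99"
    rng.shuffle(codes)
    if len(alphabet) * codes_per_letter > len(codes):
        raise ValueError("not enough two-digit codes for this alphabet")
    return {
        letter: codes[i * codes_per_letter:(i + 1) * codes_per_letter]
        for i, letter in enumerate(alphabet)
    }


def derive_key(pool, codes_per_key=2, seed=0):
    """Derive one key: a consistent subset of the pool's codes for each letter."""
    rng = random.Random(seed)
    return {
        letter: sorted(rng.sample(codes, codes_per_key))
        for letter, codes in pool.items()
    }


def encrypt(plaintext, key, seed=0):
    """Replace each letter with a randomly chosen homophone from the key."""
    rng = random.Random(seed)
    return [rng.choice(key[ch]) for ch in plaintext if ch in key]


def decrypt(ciphertext, pool):
    """Invert the shared pool; raises KeyError for codes outside the pool."""
    inverse = {code: letter for letter, codes in pool.items() for code in codes}
    return "".join(inverse[code] for code in ciphertext)


if __name__ == "__main__":
    pool = build_code_pool(string.ascii_lowercase)
    key_a = derive_key(pool, seed=1)  # two different keys,
    key_b = derive_key(pool, seed=2)  # both drawn from the same pool
    for key in (key_a, key_b):
        ciphertext = encrypt("attackatdawn", key, seed=3)
        print(" ".join(ciphertext), "->", decrypt(ciphertext, pool))
```

Because both keys draw from the same pool, a single inverse mapping decrypts ciphertexts produced under either key, while a ciphertext containing codes from outside the pool cannot be decoded at all. Under the stated assumptions, this mirrors the behaviour the abstract reports: success within the shared pool and predictable failure outside it.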