Gravity Falls: A Comparative Analysis of Domain-Generation Algorithm (DGA) Detection Methods for Mobile Device Spearphishing

2026-03-03

Cryptography and Security · Machine Learning · Networking and Internet Architecture
AI summary

The authors studied how well current tools detect bad website names (domains) made by cybercriminals targeting mobile devices through smishing (text message phishing). They made a new dataset called Gravity Falls from real smishing links used by one attacker over several years, showing changing tricks like random strings, word mixes, and copycat domains. They tested simple and machine-learning detectors, finding these methods work best on random names but struggle with more complex, evolving tricks. The authors conclude current detectors aren’t reliable for these smart, changing attacks and suggest more context-aware methods are needed.

smishing · Domain Generation Algorithm (DGA) · machine learning · malware command and control (C2) · entropy · LSTM classifier · combo-squatting · credential theft · fee/fine fraud · domain detection
Authors
Adam Dorian Wong, John D. Hastings
Abstract
Mobile devices are frequent targets of eCrime threat actors through SMS spearphishing (smishing) links that leverage Domain Generation Algorithms (DGA) to rotate hostile infrastructure. Despite this, DGA research and evaluation largely emphasize malware C2 and email phishing datasets, leaving limited evidence on how well detectors generalize to smishing-driven domain tactics outside enterprise perimeters. This work addresses that gap by evaluating traditional and machine-learning DGA detectors against Gravity Falls, a new semi-synthetic dataset derived from smishing links delivered between 2022 and 2025. Gravity Falls captures a single threat actor's evolution across four technique clusters, shifting from short randomized strings to dictionary concatenation and themed combo-squatting variants used for credential theft and fee/fine fraud. Two string-analysis approaches (Shannon entropy and Exp0se) and two ML-based detectors (an LSTM classifier and COSSAS DGAD) are assessed using Top-1M domains as benign baselines. Results are strongly tactic-dependent: performance is highest on randomized-string domains but drops on dictionary concatenation and themed combo-squatting, with low recall across multiple tool/cluster pairings. Overall, both traditional heuristics and recent ML detectors are ill-suited for consistently evolving DGA tactics observed in Gravity Falls, motivating more context-aware approaches and providing a reproducible benchmark for future evaluation.
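The abstract names Shannon entropy as one of the traditional string-analysis detectors evaluated. As a minimal sketch of that general approach (not the paper's actual implementation, whose preprocessing and thresholds are not given here), per-character entropy of a domain label can be computed like this; randomized-string DGA labels tend to spread probability mass over many characters, while repetitive labels score lower:

```python
import math
from collections import Counter

def shannon_entropy(domain: str) -> float:
    """Shannon entropy in bits per character of a domain's first label.

    Stripping the TLD and lowercasing are illustrative choices, not
    necessarily those used in the paper's evaluation.
    """
    label = domain.split(".")[0].lower()
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    # H = -sum(p * log2(p)) over observed character frequencies
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A repeated-character label has zero entropy; a label of all-distinct
# characters maximizes it for its length.
print(shannon_entropy("aaaa.example"))   # 0.0
print(shannon_entropy("abcd.example"))   # 2.0
```

A detector of this kind typically flags domains whose entropy exceeds a tuned threshold, which is consistent with the paper's finding that such heuristics do best on randomized strings and degrade on dictionary concatenation, where labels are built from real words with word-like character statistics.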