DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer
2026-06-03 • Computation and Language
AI summary
The authors address the problem that small language models handle multiple languages poorly, especially Southeast Asian ones. They propose DuDi, a distillation method that trains these models with two kinds of signal: a sequence-level signal over whole outputs and token-level signals over individual tokens. A cross-lingual verbalizer refines the teacher's feedback so that it transfers better to the student in multilingual settings. Experiments on SEA-HELM show that DuDi outperforms competitive distillation baselines, and the ablations indicate that the sequence-level, token-level, and cross-lingual signals complement one another.
small language models, multilingual models, knowledge distillation, sequence-level signals, token-level signals, cross-lingual verbalizer, teacher-student learning, Southeast Asian languages, model optimization
Authors
Patomporn Payoungkhamdee, Tinnakit Udsa, Jian Gang Ngui, Sarana Nutanong, Alham Fikri Aji, Peerat Limkonchotiwat
Abstract
Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi further uses a cross-lingual verbalizer to refine teacher feedback and improve teacher-student transferability in multilingual settings. Experiments on SEA-HELM across multiple model families, scales, and teacher-student settings show that DuDi consistently outperforms competitive distillation baselines. Ablations and analyses confirm that sequence-level optimization, token-level supervision, and cross-lingual verbalization provide complementary and transferable learning signals for multilingual SLMs.