ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws
2026-03-18 • Computation and Language
AI summary
The authors study how the mix of languages used during training affects the performance of a multilingual model. They point out that previous methods did not properly measure cross-lingual transfer, i.e., how much different languages help each other during training. To fix this, they treat languages as players on a team and use cooperative game theory to measure each language's contribution. Their method, ShapleyLaw, better predicts model performance and helps choose the best mix of languages for training.
multilingual pretraining, language mixture ratios, test loss, cross-lingual transfer, scaling laws, cooperative game theory, Shapley value, model performance prediction, language mixture optimization
Authors
Xuyang Cao, Qianying Liu, Chuan Xiao, Yusuke Oda, Pontus Stenetorp, Daisuke Kawahara, Makoto Onizuka, Sadao Kurohashi, Shuyuan Zheng
Abstract
In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the \textit{language mixture ratios}. Multilingual scaling laws can predict the test loss under different language mixture ratios and can therefore be used to estimate the optimal ratios. However, the current approaches to multilingual scaling laws do not measure the \textit{cross-lingual transfer} effect, resulting in suboptimal mixture ratios. In this paper, we consider multilingual pretraining as a cooperative game in which each language acts as a player that jointly contributes to pretraining, gaining the resulting reduction in test loss as the payoff. Consequently, from the perspective of cooperative game theory, we quantify the cross-lingual transfer from each language by its contribution in the game, and propose a game-theoretic multilingual scaling law called \textit{ShapleyLaw}. Our experiments show that ShapleyLaw outperforms baseline methods in model performance prediction and language mixture optimization.
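As a concrete illustration of the game-theoretic framing (the abstract does not spell out the estimator, so this is a sketch, not ShapleyLaw itself), the standard Shapley value assigns language $i$ the contribution

$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \bigl(v(S \cup \{i\}) - v(S)\bigr),$

where $N$ is the set of languages and $v(S)$ is the payoff of coalition $S$, taken here as the reduction in test loss relative to pretraining with none of the languages. The Python sketch below computes exact Shapley values under these assumptions; `coalition_loss`, the language codes, and the toy loss numbers are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_contributions(languages, coalition_loss):
    """Exact Shapley value of each language, with test-loss reduction as payoff."""
    n = len(languages)
    base = coalition_loss(frozenset())  # loss when no language is in the mix

    def v(subset):
        # Characteristic function: payoff of a coalition = reduction in test loss.
        return base - coalition_loss(frozenset(subset))

    phi = {}
    for lang in languages:
        others = [l for l in languages if l != lang]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                total += weight * (v(set(subset) | {lang}) - v(subset))
        phi[lang] = total
    return phi

# Toy usage with made-up losses for three languages (hypothetical numbers):
losses = {
    frozenset(): 4.0,
    frozenset({"en"}): 2.8, frozenset({"ja"}): 3.2, frozenset({"zh"}): 3.1,
    frozenset({"en", "ja"}): 2.5, frozenset({"en", "zh"}): 2.4,
    frozenset({"ja", "zh"}): 2.9, frozenset({"en", "ja", "zh"}): 2.2,
}
print(shapley_contributions(["en", "ja", "zh"], losses.__getitem__))
```

Exact computation enumerates all $2^{|N|}$ coalitions, so for more than a handful of languages one would switch to Monte Carlo estimates over sampled permutations rather than the full sum.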