Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings
2026-06-02 • Machine Learning
AI summary
The authors studied how stable knowledge graph embedding models (KGEMs) are when used to predict missing links in knowledge graphs. They found that even models with good overall scores can give very different answers for individual triples depending on random factors such as the initialization or the order of the training triples. When isolated, each random factor studied (initialization, triple ordering, negative sampling, dropout, hardware) causes a similar amount of instability. Also, a hyperparameter configuration with better average performance is not necessarily more consistent. Techniques like voting help slightly but do not fully fix the problem. This suggests that current ways of evaluating these models may overlook important reliability issues.
Knowledge Graph Embeddings, Link Prediction, Mean Reciprocal Rank (MRR), Hits@K, Model Stability, Random Seeds, Negative Sampling, Dropout, Hyperparameters, Benchmarking Protocols
Authors
Guillaume Méroué, Fabien Gandon, Pierre Monnin
Abstract
Knowledge graph embedding models (KGEMs) constitute the main link prediction approach for completing knowledge graphs. Standard evaluation protocols emphasize rank-based metrics such as MRR or Hits@$K$, but usually overlook the influence of random seeds on result stability. Moreover, these metrics conceal potential instabilities in individual predictions and in the organization of embedding spaces. In this work, we conduct a systematic stability analysis of multiple KGEMs across several datasets. We find that high-performance models can produce divergent predictions at the triple level and highly variable embedding spaces. By isolating stochastic factors (i.e., initialization, triple ordering, negative sampling, dropout, hardware), we show that each independently induces instability of comparable magnitude. Furthermore, for a given model, hyperparameter configurations with better MRR are not guaranteed to be more stable. Finally, voting, although a known remediation mechanism, improves stability only to a limited extent. These findings highlight critical limitations of current benchmarking protocols and raise concerns about the reliability of KGEMs for knowledge graph completion.
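To make the gap between aggregate metrics and triple-level stability concrete, the following minimal Python sketch computes MRR and Hits@$K$ for two hypothetical runs that differ only in their random seed, then compares their top-$K$ predictions with a Jaccard overlap. The overlap measure, the queries, and the entity names are illustrative assumptions, not necessarily the stability measure or data used by the authors.

```python
from itertools import combinations


def rank_of(ranking, target):
    """1-based position of the true entity in a ranked candidate list."""
    return ranking.index(target) + 1


def mrr(ranks):
    """Mean reciprocal rank over the 1-based ranks of the true entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose true entity is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def topk_jaccard(ranking_a, ranking_b, k):
    """Jaccard overlap between the top-k candidates of two rankings."""
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(runs, k):
    """Average top-k Jaccard overlap over all pairs of runs and all queries.

    Assumption: this is one possible triple-level stability score,
    chosen here for illustration only.
    """
    scores = []
    for run_a, run_b in combinations(runs, 2):
        for query in run_a:
            scores.append(topk_jaccard(run_a[query], run_b[query], k))
    return sum(scores) / len(scores)


# Hypothetical toy data: true answer per query, and the ranked candidates
# returned by the same model trained with two different seeds.
truth = {"q1": "a", "q2": "e", "q3": "t"}
seed0 = {
    "q1": ["a", "b", "c", "d"],
    "q2": ["d", "e", "f", "a"],
    "q3": ["g", "h", "i", "t"],
}
seed1 = {
    "q1": ["b", "a", "x", "c"],
    "q2": ["e", "y", "z", "d"],
    "q3": ["g", "h", "i", "t"],
}

ranks0 = [rank_of(seed0[q], truth[q]) for q in truth]
ranks1 = [rank_of(seed1[q], truth[q]) for q in truth]

mrr0, mrr1 = mrr(ranks0), mrr(ranks1)
hits0, hits1 = hits_at_k(ranks0, 3), hits_at_k(ranks1, 3)
overlap = mean_pairwise_overlap([seed0, seed1], k=3)

print(f"MRR:    seed0={mrr0:.4f}  seed1={mrr1:.4f}")
print(f"Hits@3: seed0={hits0:.4f}  seed1={hits1:.4f}")
print(f"Mean top-3 Jaccard overlap between seeds: {overlap:.4f}")
```

In this toy setting both runs obtain the same MRR and Hits@3, yet their top-3 candidate sets only partially overlap, which is the kind of divergence an aggregate rank-based metric cannot reveal.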