A Note on the Kullback-Leibler Divergence in Discretized Empirical Distributions
2026-06-03 • Digital Libraries
AI summary
The author examines the Kullback-Leibler (KL) difference, a way to compare two probability distributions by subtracting the KL divergence in one direction from the KL divergence in the other. The note shows that the sign (positive or negative) of this KL difference does not by itself tell us whether the support of one distribution contains the support of the other, how well one distribution covers the other, or which of the two is broader. Instead, it is better viewed as a weighted, category-by-category contrast of log-ratios that reflects how asymmetrically probability mass is placed. The note clarifies this idea through examples and illustrates it with topic distributions of COVID-19-related preprints.
Kullback-Leibler divergence; Shannon entropy; probability distributions; Hill diversity indices; asymmetric information; probability mass; log-ratio contrast; support inclusion; covariant coverage; topic distributions
Authors
Hayami Osaki
Abstract
When empirical objects are represented as discrete probability distributions, within-distribution summaries such as Shannon entropy and Hill-type diversity indices describe how probability mass is spread inside each object, while Kullback-Leibler (KL) divergence provides pairwise asymmetric information. This note focuses on the KL difference $\Delta_{\mathrm{KL}}(p,q)=D_{\mathrm{KL}}(p\,\|\,q)-D_{\mathrm{KL}}(q\,\|\,p)$. Although $\Delta_{\mathrm{KL}}$ can add information beyond within-distribution summaries and symmetric overlap, its sign does not, by itself, establish support inclusion, coverage, or breadth. It is better understood as a weighted category-wise log-ratio contrast reflecting asymmetric probability-mass placement. This becomes clear once the definition is written out. The aim of this note is therefore to present that observation in a compact, example-based form, together with a descriptive bibliometric illustration based on COVID-19-related preprint-server topic distributions.
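As a minimal sketch of what "writing out the definition" gives, assume both distributions are strictly positive on a shared set of categories, so that both directed divergences are finite. Under that assumption the KL difference collapses to the single sum $\sum_i (p_i+q_i)\log(p_i/q_i)$, which is the weighted category-wise log-ratio contrast. The Python snippet below checks this numerically. The function names `kl`, `kl_difference`, and `log_ratio_contrast` and the example vectors are illustrative assumptions, not taken from the note.

```python
import math


def kl(p, q):
    """Directed KL divergence D_KL(p || q) in nats.

    Assumes p and q are same-length probability vectors and that
    q[i] > 0 wherever p[i] > 0 (otherwise the divergence is infinite
    and this sketch raises ZeroDivisionError).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_difference(p, q):
    """KL difference: D_KL(p || q) - D_KL(q || p)."""
    return kl(p, q) - kl(q, p)


def log_ratio_contrast(p, q):
    """Written-out form: sum_i (p_i + q_i) * log(p_i / q_i).

    Assumes every p_i and q_i is strictly positive.
    """
    return sum((pi + qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical example: a concentrated and a uniform distribution,
# both with full support on the same three categories.
p = [0.9, 0.05, 0.05]
q = [1 / 3, 1 / 3, 1 / 3]

print(kl_difference(p, q))
print(log_ratio_contrast(p, q))
```

In this toy example both distributions have the same support, yet the difference is nonzero (negative here), so its sign cannot be read as a statement about support inclusion. Swapping the arguments flips the sign, since each term $(p_i+q_i)\log(p_i/q_i)$ is antisymmetric in $p$ and $q$.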