KL Divergence
The Kullback-Leibler (KL) divergence, a.k.a. relative entropy, measures how one probability distribution diverges from a second, expected probability distribution. It is defined as:

$$D_{\text{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$
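As a quick sanity check of this definition, here is a minimal sketch that evaluates the sum for two small discrete distributions; the example vectors and the SciPy cross-check are assumptions of this sketch, not something fixed by the text above.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) returns D_KL(P || Q) when q is given

# Two illustrative discrete distributions over three outcomes (made up for this sketch).
p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.3, 0.4])

# Direct evaluation of the definition: sum_x P(x) * log(P(x) / Q(x)), in nats.
kl_manual = np.sum(p * np.log(p / q))

# Cross-check against SciPy's relative entropy.
kl_scipy = entropy(p, q)

print(kl_manual, kl_scipy)  # both ≈ 0.0915 nats
```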
KL divergence is an example of f-divergence with $f(t) = t \log t$.
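To see this, plug $f(t) = t \log t$ into the usual f-divergence definition $D_f(P \,\|\, Q) = \sum_x Q(x)\, f\big(P(x)/Q(x)\big)$ (this form of the definition is assumed here):

$$D_f(P \,\|\, Q) = \sum_x Q(x)\,\frac{P(x)}{Q(x)}\log\frac{P(x)}{Q(x)} = \sum_x P(x)\log\frac{P(x)}{Q(x)} = D_{\text{KL}}(P \,\|\, Q)$$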
KL divergence measures how different two probability distributions are: the more similar the two distributions, the smaller their KL divergence. In particular, the KL divergence is 0 if and only if the two distributions are equal almost everywhere.
For discrete variables, $D_{\text{KL}}(P \,\|\, Q)$ is the extra amount of information needed to send a message containing symbols drawn from probability distribution $P$ when we use a code that was designed to minimize the length of messages drawn from probability distribution $Q$; or, conversely, it is the information gain achieved if $P$ were used instead of $Q$, which is currently used.
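A minimal numerical sketch of this coding interpretation, assuming two hand-picked discrete distributions `p` and `q` (the arrays below are illustrative, not taken from the text):

```python
import numpy as np

# Two illustrative discrete distributions over 4 symbols (made up for this sketch).
p = np.array([0.5, 0.25, 0.125, 0.125])  # true distribution P
q = np.array([0.25, 0.25, 0.25, 0.25])   # distribution Q the code was designed for

# Average message length (in bits) when symbols ~ P are encoded with a code
# that is optimal for Q: the cross-entropy H(P, Q) = -sum_x P(x) log2 Q(x).
cross_entropy_pq = -np.sum(p * np.log2(q))

# Average message length (in bits) with a code that is optimal for P itself:
# the entropy H(P) = -sum_x P(x) log2 P(x).
entropy_p = -np.sum(p * np.log2(p))

# The extra bits per symbol caused by using the Q-optimal code equal the
# KL divergence D_KL(P || Q) (in bits, because we used log base 2).
kl_pq = np.sum(p * np.log2(p / q))

print(f"H(P, Q) = {cross_entropy_pq:.4f} bits")
print(f"H(P)    = {entropy_p:.4f} bits")
print(f"extra   = {cross_entropy_pq - entropy_p:.4f} bits == D_KL(P||Q) = {kl_pq:.4f}")
```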
Properties
- Generally, $D_{\text{KL}}(P \,\|\, Q) \neq D_{\text{KL}}(Q \,\|\, P)$
- Generally, $D_{\text{KL}}(P \,\|\, Q) \geq 0$
- ❗️ KL divergence is not a metric: it is not symmetric and does not satisfy the triangle inequality.
The second property follows from Jensen's inequality and the convexity of $-\log$; the third follows from the asymmetry stated in the first.
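A small numerical illustration of the asymmetry and non-negativity; the distributions below are arbitrary choices for this sketch (zero probabilities are not handled here):

```python
import numpy as np

def kl(p, q):
    """D_KL(P || Q) for discrete distributions given as probability vectors (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

# Arbitrary example distributions, chosen only to illustrate the properties.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

print(kl(p, q), kl(q, p))   # two different values -> KL is not symmetric
print(kl(p, q) >= 0)        # True -> KL is non-negative
print(kl(p, p))             # 0.0 -> zero when the distributions coincide
```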