KL Divergence
The Kullback-Leibler (KL) divergence, a.k.a. relative entropy, measures how one probability distribution diverges from a second, expected probability distribution. It is defined as:

$$D_{\text{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$
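As a quick sanity check of this definition, here is a minimal sketch that evaluates the sum for two small discrete distributions; the example vectors and the SciPy cross-check are assumptions of this sketch, not something fixed by the text above.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) returns D_KL(P || Q) when q is given

# Two illustrative discrete distributions over three outcomes (made up for this sketch).
p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.3, 0.4])

# Direct evaluation of the definition: sum_x P(x) * log(P(x) / Q(x)), in nats.
kl_manual = np.sum(p * np.log(p / q))

# Cross-check against SciPy's relative entropy.
kl_scipy = entropy(p, q)

print(kl_manual, kl_scipy)  # both ≈ 0.0915 nats
```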
KL divergence is an example of f-divergence with $f(t) = t \log t$.
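To see this, plug $f(t) = t \log t$ into the usual f-divergence definition $D_f(P \,\|\, Q) = \sum_x Q(x)\, f\big(P(x)/Q(x)\big)$ (this form of the definition is assumed here):

$$D_f(P \,\|\, Q) = \sum_x Q(x)\,\frac{P(x)}{Q(x)}\log\frac{P(x)}{Q(x)} = \sum_x P(x)\log\frac{P(x)}{Q(x)} = D_{\text{KL}}(P \,\|\, Q)$$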
KL divergence measures how different two probability distributions are: the more similar the two distributions, the smaller their KL divergence. In particular, the KL divergence is 0 if and only if the two distributions are equal almost everywhere.
For discrete variables, $D_{\text{KL}}(P \,\|\, Q)$ is the extra amount of information needed to send a message containing symbols drawn from probability distribution $P$ when we use a code that was designed to minimize the length of messages drawn from probability distribution $Q$; or, conversely, it is the information gain achieved if $P$ were used instead of $Q$, which is currently used.
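A minimal numerical sketch of this coding interpretation, assuming two hand-picked discrete distributions `p` and `q` (the arrays below are illustrative, not taken from the text):

```python
import numpy as np

# Two illustrative discrete distributions over 4 symbols (made up for this sketch).
p = np.array([0.5, 0.25, 0.125, 0.125])  # true distribution P
q = np.array([0.25, 0.25, 0.25, 0.25])   # distribution Q the code was designed for

# Average message length (in bits) when symbols ~ P are encoded with a code
# that is optimal for Q: the cross-entropy H(P, Q) = -sum_x P(x) log2 Q(x).
cross_entropy_pq = -np.sum(p * np.log2(q))

# Average message length (in bits) with a code that is optimal for P itself:
# the entropy H(P) = -sum_x P(x) log2 P(x).
entropy_p = -np.sum(p * np.log2(p))

# The extra bits per symbol caused by using the Q-optimal code equal the
# KL divergence D_KL(P || Q) (in bits, because we used log base 2).
kl_pq = np.sum(p * np.log2(p / q))

print(f"H(P, Q) = {cross_entropy_pq:.4f} bits")
print(f"H(P)    = {entropy_p:.4f} bits")
print(f"extra   = {cross_entropy_pq - entropy_p:.4f} bits == D_KL(P||Q) = {kl_pq:.4f}")
```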
Properties
- Generally, $D_{\text{KL}}(P \,\|\, Q) \neq D_{\text{KL}}(Q \,\|\, P)$
- Generally, $D_{\text{KL}}(P \,\|\, Q) \geq 0$
- ❗️ KL divergence is not a metric: it is not symmetric and does not satisfy the triangle inequality.
The second property follows from Jensen's inequality and the convexity of $-\log$; the third follows from the asymmetry stated in the first.
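A small numerical illustration of the asymmetry and non-negativity; the distributions below are arbitrary choices for this sketch (zero probabilities are not handled here):

```python
import numpy as np

def kl(p, q):
    """D_KL(P || Q) for discrete distributions given as probability vectors (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

# Arbitrary example distributions, chosen only to illustrate the properties.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

print(kl(p, q), kl(q, p))   # two different values -> KL is not symmetric
print(kl(p, q) >= 0)        # True -> KL is non-negative
print(kl(p, p))             # 0.0 -> zero when the distributions coincide
```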