f-Divergence
For two distributions $P$ and $Q$ such that $P(x) = 0$ whenever $Q(x) = 0$ (i.e., $P \ll Q$), and a convex real function $f$ on $(0, \infty)$ with $f(1) = 0$, we can define
$$
D_f(P \,\|\, Q) = \mathbb{E}_{Q}\!\left[ f\!\left( \frac{dP}{dQ} \right) \right] = \int f\!\left( \frac{p(x)}{q(x)} \right) q(x)\, dx,
$$
where the second equality holds when $p$ and $q$ are densities of $P$ and $Q$ respectively. $D_f(P \,\|\, Q)$ is called the f-divergence between $P$ and $Q$.
Intuitively, the expected convex function $f$ amplifies the difference signal in the likelihood ratio $\frac{p(x)}{q(x)}$.
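As a sanity check on the definition, here is a minimal sketch of the discrete case in Python; the function name `f_divergence` and the NumPy-based implementation are illustrative choices, not from any particular library.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = E_Q[f(dP/dQ)] = sum_x q(x) f(p(x)/q(x)) for discrete P, Q.

    Assumes p and q are probability vectors with p[i] == 0 wherever q[i] == 0,
    and that f can be evaluated at the realized ratios (e.g., f(0) may need to
    be taken as its limiting value for choices like f(x) = x log x).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    support = q > 0                    # atoms where Q puts mass
    ratio = p[support] / q[support]    # likelihood ratio dP/dQ
    return float(np.sum(q[support] * f(ratio)))

# Example: KL divergence between two coin-flip distributions (values made up).
print(f_divergence([0.5, 0.5], [0.9, 0.1], lambda x: x * np.log(x)))  # ~0.51
```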
Examples
- Letting $f(x) = \frac{1}{2}|x - 1|$ gives the Total Variation Distance $\mathrm{TV}(P, Q) = \frac{1}{2} \int |p(x) - q(x)|\, dx$
- Letting $f(x) = x \log x$ gives the KL Divergence $D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx$
- Letting $f(x) = (\sqrt{x} - 1)^2$ gives the squared Hellinger Distance $H^2(P, Q) = \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 dx$
- Letting $f(x) = (x - 1)^2$ gives the chi-square divergence $\chi^2(P \,\|\, Q) = \int \frac{(p(x) - q(x))^2}{q(x)}\, dx$
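Plugging each choice of $f$ into the definition recovers the direct formula of the corresponding divergence. The sketch below (with made-up three-point distributions) checks this numerically under the conventions used above.

```python
import numpy as np

# Toy full-support distributions on three points (values are illustrative).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
r = p / q  # likelihood ratio dP/dQ at each point

def D_f(f):
    # D_f(P || Q) = E_Q[f(dP/dQ)] for discrete P, Q.
    return float(np.sum(q * f(r)))

tv   = D_f(lambda x: 0.5 * np.abs(x - 1))     # total variation
kl   = D_f(lambda x: x * np.log(x))           # KL divergence
hel2 = D_f(lambda x: (np.sqrt(x) - 1) ** 2)   # squared Hellinger
chi2 = D_f(lambda x: (x - 1) ** 2)            # chi-square

# Each matches its direct formula.
assert np.isclose(tv,   0.5 * np.sum(np.abs(p - q)))
assert np.isclose(kl,   np.sum(p * np.log(p / q)))
assert np.isclose(hel2, np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
assert np.isclose(chi2, np.sum((p - q) ** 2 / q))
```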
TV vs. KL ^rmk-tv-kl
- TV is a metric, meaning that it is symmetric and satisfies the triangle inequality; KL is not a metric: it is neither symmetric nor does it satisfy the triangle inequality.
- TV does not "tensorize" but KL does: for product distributions, $D_{\mathrm{KL}}(P_1 \otimes P_2 \,\|\, Q_1 \otimes Q_2) = D_{\mathrm{KL}}(P_1 \,\|\, Q_1) + D_{\mathrm{KL}}(P_2 \,\|\, Q_2)$. So in practice KL, or another divergence that tensorizes, is usually preferred, as the divergence between the joint distributions of independent samples boils down to a sum of per-sample divergences (see the numerical check after this list).
- TV is convenient for theoretical analysis, as it has a simple variational representation, an $L_1$-norm characterization, and an optimal-transport interpretation. For example, TV is used to show hardness results for hypothesis testing.
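To illustrate the tensorization point numerically, the sketch below (with made-up two-point distributions) shows that KL of a product distribution splits into the sum of the marginal KLs, while TV of the product generally differs from the sum of the marginal TVs.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def tv(p, q):
    return 0.5 * float(np.sum(np.abs(p - q)))

# Two pairs of full-support distributions on {0, 1} (values are illustrative).
p1, q1 = np.array([0.5, 0.5]), np.array([0.8, 0.2])
p2, q2 = np.array([0.3, 0.7]), np.array([0.6, 0.4])

# Joint (product) distributions over the 2x2 product space, flattened.
P = np.outer(p1, p2).ravel()
Q = np.outer(q1, q2).ravel()

# KL tensorizes: divergence of the product = sum of the marginal divergences.
assert np.isclose(kl(P, Q), kl(p1, q1) + kl(p2, q2))

# TV does not: the product TV is not the sum of the marginal TVs.
print(tv(P, Q), tv(p1, q1) + tv(p2, q2))  # roughly 0.33 vs 0.6
```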
Properties
- Non-negativity: $D_f(P \,\|\, Q) \ge 0$, due to the convexity of $f$: by Jensen's inequality, $\mathbb{E}_Q\!\left[ f\!\left( \tfrac{dP}{dQ} \right) \right] \ge f\!\left( \mathbb{E}_Q\!\left[ \tfrac{dP}{dQ} \right] \right) = f(1) = 0$.
- Zero self-divergence: $D_f(P \,\|\, P) = 0$, since the likelihood ratio is identically $1$ and $f(1) = 0$.
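A small randomized check of both properties under the discrete definition above; the use of $f(x) = x \log x$, the Dirichlet sampling, and the tolerances are all arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def D_f(p, q, f):
    # Discrete f-divergence D_f(P || Q) = E_Q[f(p/q)] for full-support p, q.
    return float(np.sum(q * f(p / q)))

f_kl = lambda x: x * np.log(x)  # any convex f with f(1) = 0 would do

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))        # random full-support distributions
    q = rng.dirichlet(np.ones(5))
    assert D_f(p, q, f_kl) >= -1e-12     # non-negativity (up to float error)
    assert abs(D_f(p, p, f_kl)) < 1e-12  # zero self-divergence, since f(1) = 0
```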