Information Theory

Topics

Notes

Linked

Jensen–Shannon Divergence

KL divergence goes to infinity when the two distributions have no overlapping support, which yields no usable gradient. JS divergence compares each distribution to their mixture, so it stays finite and bounded.

KL Divergence

How to measure how one probability distribution differs from another? Compute the expected excess surprise from using the wrong distribution.
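The "expected excess surprise" reading can be sketched directly (distributions here are made-up illustrative values):

```python
import numpy as np

def kl_divergence(p, q):
    # Expected excess surprise: E_p[log p(x) - log q(x)]
    # i.e. the extra nats paid for coding samples from p with a code built for q
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

p = np.array([0.5, 0.3, 0.2])  # "true" distribution (illustrative)
q = np.array([0.4, 0.4, 0.2])  # "wrong" model distribution
print(kl_divergence(p, q))     # > 0, and 0 only when p == q
print(kl_divergence(q, p))     # generally different: KL is asymmetric
```

The asymmetry matters: which distribution plays the role of "truth" changes the answer.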

Cross entropy

How to measure the difference between predicted and true distributions? Compute the expected log-loss under the true distribution.
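A sketch of the log-loss view, including the standard decomposition H(p, q) = H(p) + KL(p || q) (example values are illustrative):

```python
import numpy as np

def cross_entropy(p, q):
    # Expected log-loss under the true distribution p
    # when predicting with distribution q
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.sum(p * np.log(q)))

def entropy(p):
    # Cross entropy of p with itself
    p = np.asarray(p, float)
    return float(-np.sum(p * np.log(p)))

p = np.array([0.7, 0.2, 0.1])  # true distribution (illustrative)
q = np.array([0.6, 0.3, 0.1])  # predicted distribution
# Cross entropy = H(p) + KL(p || q), so it is minimized exactly when q == p
print(cross_entropy(p, q) >= entropy(p))  # True
```

This is why minimizing cross-entropy loss against fixed targets is equivalent to minimizing KL divergence to them.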