Information Theory
Jensen–Shannon Divergence
A symmetric, bounded alternative to KL divergence: KL goes to infinity when the two distributions have no support overlap, which yields no useful gradient, whereas JS stays finite (at most log 2).
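A minimal sketch of why JS behaves better than KL on disjoint supports, using the standard definition JS(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M the midpoint mixture (the function names here are illustrative, not from any particular library):

```python
import math

def kl(p, q):
    # KL(P || Q) over discrete distributions; 0 * log(0/q) = 0 by convention
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), M = midpoint mixture.
    # M is nonzero wherever P or Q is, so JS is always finite and symmetric.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Completely disjoint supports: KL(p || q) would be infinite,
# but JS hits its maximum value of log 2 instead.
p = [1.0, 0.0]
q = [0.0, 1.0]
print(js_divergence(p, q))  # log 2 ≈ 0.693
```

Because JS is finite and differentiable even when supports barely overlap, it gives training signals (e.g. in GAN-style objectives) where raw KL would not.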
KL Divergence
How to measure how one probability distribution differs from another? Compute the expected excess surprise from using the wrong distribution.
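The "expected excess surprise" reading can be sketched directly: KL(P‖Q) = Σ p · log(p/q), the average extra log-loss paid for coding samples from P as if they came from Q (function name is illustrative):

```python
import math

def kl_divergence(p, q):
    # KL(P || Q): expected excess surprise from using Q when the data follow P.
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0/q) = 0 by convention
        if qi == 0:
            return math.inf  # P puts mass where Q puts none
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # positive, and != kl_divergence(q, p): KL is asymmetric
```

Note KL is zero iff P = Q, and it is not a metric: it is asymmetric and violates the triangle inequality.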
Cross Entropy
How to measure the difference between predicted and true distributions? Compute the expected log-loss under the true distribution.
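A minimal sketch of that definition: H(P, Q) = −Σ p · log q, which decomposes as H(P) + KL(P‖Q) and so is minimized when the prediction matches the true distribution (names here are illustrative):

```python
import math

def cross_entropy(p_true, q_pred):
    # H(P, Q) = -sum p * log q: expected log-loss under the true distribution P.
    return -sum(pi * math.log(qi) for pi, qi in zip(p_true, q_pred) if pi > 0)

p_true = [1.0, 0.0, 0.0]  # one-hot true label
q_pred = [0.7, 0.2, 0.1]  # model's predicted distribution
print(cross_entropy(p_true, q_pred))  # -log(0.7) ≈ 0.357
```

With a one-hot true distribution this reduces to the negative log-probability of the correct class, which is exactly the cross-entropy loss used for classifiers.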