Information Theory
Jensen–Shannon Divergence
A symmetric, bounded alternative to KL divergence: KL goes to infinity when the two distributions have no support overlap, which yields no useful gradient, whereas JS stays finite (at most log 2).
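A minimal sketch of why JS behaves better than KL on disjoint supports, using the standard definition JS(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M the midpoint mixture (the function names here are illustrative, not from any particular library):

```python
import math

def kl(p, q):
    # KL(P || Q) over discrete distributions; 0 * log(0/q) = 0 by convention
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), M = midpoint mixture.
    # M is nonzero wherever P or Q is, so JS is always finite and symmetric.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Completely disjoint supports: KL(p || q) would be infinite,
# but JS hits its maximum value of log 2 instead.
p = [1.0, 0.0]
q = [0.0, 1.0]
print(js_divergence(p, q))  # log 2 ≈ 0.693
```

Because JS is finite and differentiable even when supports barely overlap, it gives training signals (e.g. in GAN-style objectives) where raw KL would not.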
KL Divergence
How to measure how one probability distribution differs from another? Compute the expected excess surprise from using the wrong distribution.
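The "expected excess surprise" reading can be sketched directly: KL(P‖Q) = Σ p · log(p/q), the average extra log-loss paid for coding samples from P as if they came from Q (function name is illustrative):

```python
import math

def kl_divergence(p, q):
    # KL(P || Q): expected excess surprise from using Q when the data follow P.
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0/q) = 0 by convention
        if qi == 0:
            return math.inf  # P puts mass where Q puts none
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # positive, and != kl_divergence(q, p): KL is asymmetric
```

Note KL is zero iff P = Q, and it is not a metric: it is asymmetric and violates the triangle inequality.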
Cross Entropy
How to measure the difference between predicted and true distributions? Compute the expected log-loss under the true distribution.
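A minimal sketch of that definition: H(P, Q) = −Σ p · log q, which decomposes as H(P) + KL(P‖Q) and so is minimized when the prediction matches the true distribution (names here are illustrative):

```python
import math

def cross_entropy(p_true, q_pred):
    # H(P, Q) = -sum p * log q: expected log-loss under the true distribution P.
    return -sum(pi * math.log(qi) for pi, qi in zip(p_true, q_pred) if pi > 0)

p_true = [1.0, 0.0, 0.0]  # one-hot true label
q_pred = [0.7, 0.2, 0.1]  # model's predicted distribution
print(cross_entropy(p_true, q_pred))  # -log(0.7) ≈ 0.357
```

With a one-hot true distribution this reduces to the negative log-probability of the correct class, which is exactly the cross-entropy loss used for classifiers.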