Probability Theory
Topics
Notes
Linked
Need a computationally tractable distance measure between empirical distributions without requiring density estimation.
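One such sample-based distance is maximum mean discrepancy (MMD), given here as a minimal sketch: it compares kernel means of the two sample sets directly, with no density estimation. The RBF bandwidth `sigma=1.0` is an arbitrary choice for illustration.

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets
    x and y using an RBF kernel; works directly on samples."""
    def k(a, b):
        # pairwise squared Euclidean distances between rows of a and b
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
same = mmd_rbf(rng.normal(size=(200, 1)), rng.normal(size=(200, 1)))
diff = mmd_rbf(rng.normal(size=(200, 1)), rng.normal(3.0, 1.0, size=(200, 1)))
# samples from the same distribution yield a much smaller MMD^2
print(same, diff)
```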
Many probability distributions are consistent with known constraints. Choose the one with maximum entropy as the least biased estimate.
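A tiny numerical illustration of the principle: two distributions over a six-sided die both satisfy the constraint E[X] = 3.5, but the uniform one has strictly higher entropy, so maximum entropy selects it as the least biased choice. The skewed distribution is an arbitrary example satisfying the same constraint.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log 0 treated as 0
    return -(p * np.log(p)).sum()

faces = np.arange(1, 7)
uniform = np.full(6, 1 / 6)
skewed = np.array([0.3, 0.1, 0.1, 0.1, 0.1, 0.3])  # also has mean 3.5
assert np.isclose(uniform @ faces, 3.5) and np.isclose(skewed @ faces, 3.5)
print(entropy(uniform), entropy(skewed))  # uniform wins
```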
How do you estimate expectations under a target distribution when you only have samples from a different distribution?
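The standard answer is importance sampling: draw from the proposal, then reweight each sample by the density ratio p/q. A minimal sketch with Gaussians chosen for illustration (target N(0,1), proposal N(1, 1.5²)), estimating E_p[x²] = 1:

```python
import numpy as np

def logpdf(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.5, size=200_000)              # samples from proposal q
w = np.exp(logpdf(x, 0.0, 1.0) - logpdf(x, 1.0, 1.5))  # weights p(x)/q(x)
estimate = np.mean(w * x**2)                        # estimate of E_p[x^2]
print(estimate)                                     # close to 1
```

Computing the ratio in log space avoids underflow when densities are small.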
When analytical computation of an expectation is intractable, approximate it by averaging random samples.
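A minimal Monte Carlo sketch: approximate E[cos(X)] for X ~ N(0,1) by a sample average. The exact value, exp(-1/2), is used here only to check the approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)
mc_estimate = np.cos(samples).mean()   # sample average approximates E[cos X]
print(mc_estimate, np.exp(-0.5))       # estimate vs. exact value
```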
Exact posterior inference is intractable for complex models. Approximate the posterior with a simpler distribution by minimizing KL divergence.
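A toy sketch of the idea, with everything chosen for illustration: the "posterior" p is N(2, 0.5²), the variational family is Gaussians N(mu, sigma²), and the Gaussian-to-Gaussian KL has a closed form, so a simple grid search over (mu, sigma) stands in for the usual gradient-based optimization. Since p is itself in the family, the minimizer recovers p.

```python
import numpy as np

def kl_gauss(mu_q, s_q, mu_p, s_p):
    # closed-form KL( N(mu_q, s_q^2) || N(mu_p, s_p^2) )
    return np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p) ** 2) / (2 * s_p**2) - 0.5

mus = np.linspace(0, 4, 81)
sigmas = np.linspace(0.1, 2, 77)
kl = kl_gauss(mus[:, None], sigmas[None, :], 2.0, 0.5)  # KL on the full grid
i, j = np.unravel_index(kl.argmin(), kl.shape)
print(mus[i], sigmas[j])   # best variational parameters, near (2.0, 0.5)
```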
KL divergence goes to infinity when the distributions have no support overlap, leaving no usable gradient for optimization.
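A quick discrete check of the failure mode: when q puts zero mass where p has mass, the log-ratio term diverges, so KL(p || q) is infinite, while an overlapping q gives a finite value.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # convention: terms with p = 0 contribute 0
        terms = np.where(p > 0, p * (np.log(p) - np.log(q)), 0.0)
    return terms.sum()

p = [0.5, 0.5, 0.0, 0.0]
disjoint = [0.0, 0.0, 0.5, 0.5]          # no support overlap with p
overlap = [0.25, 0.25, 0.25, 0.25]
print(kl(p, disjoint), kl(p, overlap))   # inf vs. finite
```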
Proving convexity/concavity properties and establishing bounds such as the non-negativity of KL divergence and the convexity of loss functions.
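As a numerical sanity check of one such bound: KL non-negativity follows from Jensen's inequality applied to the convex function -log, since KL(p || q) = E_p[-log(q/p)] >= -log E_p[q/p] = -log 1 = 0. The sketch below verifies the bound on random distribution pairs drawn from a Dirichlet.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    # both p and q are strictly positive here, so no zero handling needed
    return np.sum(p * np.log(p / q))

vals = [kl(rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5)))
        for _ in range(1000)]
min_kl = min(vals)
print(min_kl)   # never drops below zero
```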