Jensen's Inequality

If a function $g$ is convex (for a twice-differentiable $g$, this means its second derivative satisfies $g''(x) \geq 0$ everywhere), then for any random variable $X$ we have

$$ \mathbb{E}[g(X)] \geq g(\mathbb{E}[X]) \quad \text{for convex } g $$

For concave $g$ the inequality reverses.
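A quick numerical sanity check (a minimal sketch in Python, using $g(x) = x^2$, which is convex, and an arbitrarily chosen uniform distribution):

```python
import random

random.seed(0)
# Sample from an arbitrary distribution (uniform on [-2, 2] here).
xs = [random.uniform(-2.0, 2.0) for _ in range(100_000)]

g = lambda x: x * x  # g(x) = x^2 is convex: g''(x) = 2 > 0

mean_of_g = sum(g(x) for x in xs) / len(xs)  # E[g(X)]
g_of_mean = g(sum(xs) / len(xs))             # g(E[X])

print(mean_of_g >= g_of_mean)  # Jensen: E[g(X)] >= g(E[X]) -> True
```

Here $\mathbb{E}[X^2] \approx 4/3$ while $(\mathbb{E}[X])^2 \approx 0$, so the gap is large; for a linear $g$ the two sides would coincide.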

Intuition:

  1. Algebraic view: The function evaluated at the average of inputs is less than or equal to the average of the function evaluated at each input.
  2. Geometric view: The chord connecting any two points on a convex function's graph lies above (or on) the curve between them - this is the visual definition of convexity
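The geometric view can be checked directly: for a convex $g$, every point on the chord sits at or above the curve. A minimal sketch, again with $g(x) = x^2$ and two hypothetical endpoints:

```python
# Chord test for convexity: for each t in [0, 1], compare the chord
# height t*g(x1) + (1-t)*g(x2) against the curve g(t*x1 + (1-t)*x2).
g = lambda x: x * x  # convex example

x1, x2 = -1.5, 2.0
chord_above_curve = True
for i in range(101):
    t = i / 100
    x = t * x1 + (1 - t) * x2            # point between x1 and x2
    chord = t * g(x1) + (1 - t) * g(x2)  # chord height at x
    chord_above_curve = chord_above_curve and (chord >= g(x))

print(chord_above_curve)  # prints True
```

Note that Jensen's inequality for a two-point distribution (probability $t$ on $x_1$, $1-t$ on $x_2$) is exactly this chord condition; the general inequality extends it to arbitrary distributions.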

It's essentially the go-to tool for establishing bounds and non-negativity/concavity results in information theory, machine learning, and probability theory. Examples:

  • KL Divergence non-negativity
  • Entropy concavity - Shows that Shannon entropy is a concave function
  • Mutual information non-negativity - Proves that I(X;Y) ≥ 0
  • Variational bounds - Derives the evidence lower bound (ELBO) by applying Jensen's inequality to the log-likelihood
  • Log-sum-exp bounds - e.g. $\log \mathbb{E}[e^{X}] \geq \mathbb{E}[X]$, used in moment-generating-function arguments
  • f-divergence non-negativity - Generalizes beyond KL to other divergence measures
  • Arithmetic-geometric mean inequality - Special case when applied to the log function
  • Concentration inequalities - Used in proving bounds like Hoeffding's inequality
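The arithmetic-geometric mean item in the list above is easy to verify numerically: applying Jensen's inequality to the concave $\log$ function (so the inequality reverses to $\mathbb{E}[\log X] \leq \log \mathbb{E}[X]$) and exponentiating gives geometric mean $\leq$ arithmetic mean. A minimal sketch with arbitrary sample values:

```python
import math

# AM-GM as a special case of Jensen: log is concave, so
# E[log X] <= log(E[X]); exponentiating both sides gives
# geometric mean <= arithmetic mean.
xs = [1.0, 4.0, 9.0, 16.0]

arith = sum(xs) / len(xs)                               # arithmetic mean
geom = math.exp(sum(math.log(x) for x in xs) / len(xs)) # geometric mean

print(geom <= arith)  # prints True (~4.90 <= 7.5)
```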

References

  1. [Intuition behind Jensen's inequality](https://www.youtube.com/watch?v=HfCb1K4Nr8M)