Contrastive Divergence

To motivate contrastive divergence, we revisit Maximum Likelihood Estimation (Note: KL Divergence > Relationship with MLE and Cross Entropy). Minimizing the KL divergence between the data distribution $p_{0}$ and the model distribution $p_{\infty}$ (the equilibrium distribution of the Markov chain) is equivalent to MLE:

$$ \mathrm{KL}\left(p_{0} \| p_{\infty}\right)=\int p_{0} \log p_{0}-\int p_{0} \log p_{\infty} \propto-\int p_{0} \log p_{\infty} $$

Contrastive divergence minimizes

$$ \mathrm{CD}_{n}=\mathrm{KL}\left(p_{0} \| p_{\infty}\right)-\mathrm{KL}\left(p_{n} \| p_{\infty}\right) $$

Weights are updated using the $\mathrm{CD}_{n}$ gradient instead of the ML gradient:

$$ -\frac{\partial}{\partial \boldsymbol{\theta}} \mathrm{CD}_{n}=-\mathbb{E}_{0}\left[\frac{\partial E_{\boldsymbol{\theta}}(\boldsymbol{x})}{\partial \boldsymbol{\theta}}\right]+\mathbb{E}_{n}\left[\frac{\partial E_{\boldsymbol{\theta}}\left(\boldsymbol{x}^{\prime}\right)}{\partial \boldsymbol{\theta}}\right]+\frac{\partial}{\partial \boldsymbol{\theta}}[\ldots] $$

where $\mathbb{E}_{n}$ denotes the expectation under $p_{n}$, i.e., over samples drawn after $n$ steps of the Markov chain started at the data. The last term is small and can be ignored.
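As a concrete illustration, the two expectations can be estimated for a toy scalar energy model (our choice, not from the text): $E_{\theta}(x)=(x-\theta)^{2}/2$, whose equilibrium distribution $p_{\infty}$ is $\mathcal{N}(\theta, 1)$. The chain is run with simple Metropolis steps, started at the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy energy model: E_theta(x) = (x - theta)^2 / 2,
# so the equilibrium distribution p_inf is N(theta, 1).
def dE_dtheta(x, theta):
    return -(x - theta)  # derivative of (x - theta)^2 / 2 w.r.t. theta

def metropolis_step(x, theta, step=0.5):
    """One Markov-chain step targeting p_inf proportional to exp(-E_theta)."""
    prop = x + rng.normal(0.0, step, size=x.shape)
    dE = 0.5 * (prop - theta) ** 2 - 0.5 * (x - theta) ** 2
    accept = rng.random(x.shape) < np.exp(-dE)
    return np.where(accept, prop, x)

def cd_update(data, theta, n=1):
    """-dCD_n/dtheta ~= -E_0[dE/dtheta] + E_n[dE/dtheta] (last term dropped)."""
    x = data.copy()                 # the chain starts at the data (p_0)
    for _ in range(n):
        x = metropolis_step(x, theta)
    return -dE_dtheta(data, theta).mean() + dE_dtheta(x, theta).mean()

data = rng.normal(2.0, 1.0, size=5000)  # samples whose true mean is 2
theta = 0.0
for _ in range(200):
    theta += 0.5 * cd_update(data, theta, n=1)  # ascend -CD_n
```

Note that the negative phase $\mathbb{E}_{n}$ is cheap: it only requires $n$ chain steps per update rather than running the chain to equilibrium as ML would.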

Intuition

Ensure that after $n$ sampling steps the chain, started at the data, has not moved far from the data distribution.

  • Usually a single step $(n=1)$ is enough
  • The procedure is similar to minimizing a reconstruction error

Because the components of $\boldsymbol{x} \mid \boldsymbol{v}$ and of $\boldsymbol{v} \mid \boldsymbol{x}$ are conditionally independent, each sampling step can be computed in parallel across units:

  • Sample a data point $\boldsymbol{x}$
  • Compute the posterior $p(\boldsymbol{v} \mid \boldsymbol{x})$
  • Sample latents $\boldsymbol{v} \sim p(\boldsymbol{v} \mid \boldsymbol{x})$
  • Compute the conditional $p(\boldsymbol{x} \mid \boldsymbol{v})$
  • Sample $\boldsymbol{x}^{\prime} \sim p(\boldsymbol{x} \mid \boldsymbol{v})$
  • Minimize the difference between $\boldsymbol{x}$ and $\boldsymbol{x}^{\prime}$ (apply the $\mathrm{CD}_{n}$ weight update)
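The steps above can be sketched for a Bernoulli RBM (an assumed model; the section does not fix one), where both conditionals factorize over units, so each sampling step is a single parallel matrix operation and $\mathrm{CD}_1$ reduces to one Gibbs sweep:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(x, W, b, c, lr=0.1):
    """One CD-1 update for a Bernoulli RBM (a sketch; names are ours).

    x : (batch, n_vis) binary data; W : (n_vis, n_hid); b, c : biases.
    """
    # Posterior over latents, then a sample (all units in parallel).
    p_v = sigmoid(x @ W + c)                         # p(v=1 | x)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    # Conditional over visibles, then the reconstruction x'.
    p_x = sigmoid(v @ W.T + b)                       # p(x=1 | v)
    x_prime = (rng.random(p_x.shape) < p_x).astype(float)
    p_v_prime = sigmoid(x_prime @ W + c)
    # CD-1 update: data ("positive") phase minus reconstruction phase.
    batch = x.shape[0]
    W = W + lr * (x.T @ p_v - x_prime.T @ p_v_prime) / batch
    b = b + lr * (x - x_prime).mean(axis=0)
    c = c + lr * (p_v - p_v_prime).mean(axis=0)
    return W, b, c

# Toy data: two repeated binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)
W = 0.01 * rng.normal(size=(6, 4))
b = np.zeros(6)
c = np.zeros(4)
for _ in range(2000):
    W, b, c = cd1_step(data, W, b, c)
```

After training, reconstructing the data through the latents ($\boldsymbol{x} \to p(\boldsymbol{v}\mid\boldsymbol{x}) \to p(\boldsymbol{x}\mid\boldsymbol{v})$) should return probabilities close to the original patterns, which is the 'minimizing reconstruction error' intuition above.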