Monte-Carlo Estimation
Motivation
We are often interested in computing quantities (statistics) of random variables. For example,
- the average response to a drug
- the probability of a particular sum when throwing two dice
- the average reconstruction in a VAE given an input.
These statistics are often intractable to compute.
- Cannot derive a perfect drug response model (too complex)
- Cannot enumerate all possible dice combinations (too lazy)
- Computationally infeasible (intractable integrals)
Monte Carlo Integration
Use random sampling instead of analytical computation. A single random sample might not be enough, but many random samples can give us a reliable quantification. E.g., by throwing the dice many times we can obtain a histogram of probabilities for each possible sum. If we throw the dice once, the histogram will be very wrong (just a single bar), but if we repeat hundreds of times and average, we will get very close (law of large numbers).
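A minimal sketch of the dice example (hypothetical code; the helper name `dice_sum_histogram` is my own, using numpy to simulate throws):

```python
import numpy as np

rng = np.random.default_rng(0)

def dice_sum_histogram(n_throws):
    # Throw two fair dice n_throws times and estimate P(sum = s) for s = 2..12.
    throws = rng.integers(1, 7, size=(n_throws, 2)).sum(axis=1)
    values, counts = np.unique(throws, return_counts=True)
    return dict(zip(values.tolist(), (counts / n_throws).tolist()))

print(dice_sum_histogram(1))        # a single throw: one bar, very wrong
print(dice_sum_histogram(100_000))  # many throws: close to the true probabilities
# e.g. the true probability of a sum of 7 is 6/36 ≈ 0.167; the large-sample estimate is near that.
```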
More formally, in MC integration we treat the inputs $x$ as random variables with pdf $p(x)$. Our desired statistic $y$ is the output of a function $f$, integrated over all possible $x$:
$$y = \int f(x)\, p(x)\, dx$$
This integral is equivalent to an expectation:
$$y = \mathbb{E}_{x \sim p(x)}[f(x)]$$
We can approximate this expectation by random sampling and summation:
$$\hat{y} = \frac{1}{n} \sum_{i=1}^{n} f(x_i), \qquad x_i \sim p(x)$$
Note that $\hat{y}$ is called an estimator because it only approximates the true value of $y$.
If we want to compute a quantity $y$ that we can express as an integral of a function $f$ over a probability space with a known, easy-to-sample pdf $p(x)$, we can replace the exact but intractable computation with a tractable MC estimator.
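A minimal sketch of this recipe (a hypothetical toy example, not from the notes: $f(x) = x^2$ with $p(x)$ a standard normal, so the true value is $\mathbb{E}[x^2] = 1$; the helper name `mc_estimate` is my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, sample_p, n):
    # MC estimator: draw x_i ~ p(x), then average f(x_i).
    x = sample_p(n)
    return f(x).mean()

# f(x) = x^2, p(x) = N(0, 1); the exact answer is E[x^2] = Var[x] = 1.
y_hat = mc_estimate(lambda x: x**2, lambda n: rng.standard_normal(n), n=100_000)
print(y_hat)  # ≈ 1.0
```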
If we cannot express the quantity as such an integral, we cannot estimate it this way. For instance, we cannot use MC directly on the gradient of a VAE's expected reconstruction term,
$$\nabla_{\varphi}\, \mathbb{E}_{q_{\varphi}(z \mid x)}[\log p(x \mid z)] = \int \log p(x \mid z)\, \nabla_{\varphi} q_{\varphi}(z \mid x)\, dz,$$
because neither $\log p(x \mid z)$ nor $\nabla_{\varphi} q_{\varphi}(z \mid x)$ is a probability density.
Estimator mean and variance
Our estimator is itself a random variable, so it has its own mean $\mu_{\hat{y}}=\mathbb{E}[\hat{y}]$ and variance $\operatorname{Var}[\hat{y}]=\mathbb{E}\left[\left(\hat{y}-\mu_{\hat{y}}\right)^{2}\right]$.
The higher the variance, the more the estimation fluctuates after every new experiment.
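A small sketch of this (continuing the hypothetical $f(x) = x^2$, standard-normal example from above): recompute the estimator many times and look at its own mean and spread.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimator(n):
    # One MC estimate of E[x^2] under N(0, 1) from n samples.
    x = rng.standard_normal(n)
    return (x**2).mean()

# Repeat the whole experiment many times to see the estimator's own distribution.
estimates = np.array([estimator(n=100) for _ in range(10_000)])
print(estimates.mean())  # ≈ 1.0  (the estimator's mean matches the true value)
print(estimates.std())   # > 0    (the estimator fluctuates from experiment to experiment)
```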
Estimator bias
An estimator is unbiased if its expected value equals the true parameter. For example, the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$ because $\mathbb{E}[\bar{x}] = \mu$.
Otherwise it is biased, with bias $b = \mathbb{E}[\hat{y}] - y$.
It is better to have unbiased estimators, but in some cases a little bias is acceptable: we trade some accuracy (compared to what could be achieved) for tractability.
MC estimators are unbiased due to linearity of expectation.
For the MC estimator $\hat{y} = \frac{1}{n}\sum_{i=1}^n f(x_i)$ with $x_i \sim p(x)$:
$$\mathbb{E}[\hat{y}] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[f(x_i)] = \frac{1}{n} \cdot n\, \mathbb{E}_{x \sim p(x)}[f(x)] = y$$
The expectation operator "pushes through" the summation and averaging, giving us the true value. This holds for any sample size $n$, so the estimator is unbiased regardless of how many samples we use.
This is why MC methods are so powerful: you get this theoretical guarantee essentially for free, regardless of how complex the function $f(x)$ is or what distribution you are sampling from (though it can take a lot of samples to get good accuracy).
Note: The Law of Large Numbers ensures the estimator is also consistent (converges to the true value as $n \to \infty$), but unbiasedness is a separate property.
Standard error of MC estimator
The MC estimator is a sample mean:
$$\hat{y} = \frac{1}{n} \sum_{i=1}^{n} f(x_i)$$
The standard error of a sample mean is
$$\mathrm{SE}[\hat{y}] = \frac{\sigma}{\sqrt{n}}, \qquad \sigma^2 = \operatorname{Var}[f(x)]$$
The more samples we take, the less the estimator deviates. But the deviation shrinks only as $1/\sqrt{n}$: with $4\times$ more samples we only halve the error.
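A quick empirical check of this scaling (same hypothetical $f(x) = x^2$, standard-normal setup as above; the helper name `std_error` is my own): quadrupling $n$ should roughly halve the standard error.

```python
import numpy as np

rng = np.random.default_rng(0)

def std_error(n, repeats=5_000):
    # Empirical standard deviation of the MC estimate of E[x^2] under N(0, 1).
    estimates = [(rng.standard_normal(n) ** 2).mean() for _ in range(repeats)]
    return np.std(estimates)

for n in (100, 400, 1600):
    print(n, std_error(n))
# Each 4x increase in n roughly halves the standard error (1/sqrt(n) scaling).
```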