Loss Functions
Mean-Squared-Error Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T}$ and targets $\mathbf{t}=\left(t_{1}, \ldots, t_{N}\right)^{T}$
Assume the target distribution is Gaussian:
$$p(t \mid \mathbf{x}, \mathbf{w})=\mathcal{N}\left(t \mid y(\mathbf{x}, \mathbf{w}), \sigma^{2}\right)$$
Single target $\to$ single output unit: $y(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a^{\text {out }}\right)$
Targets are real valued: identity output activation function $h^{(L)}(a)=a$, so $y(\mathbf{x}, \mathbf{w})=a^{\text {out }}$
Maximum likelihood / minimum negative log likelihood:
$$-\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})=\frac{1}{2 \sigma^{2}} \sum_{n=1}^{N}\left(y\left(\mathbf{x}_{n}, \mathbf{w}\right)-t_{n}\right)^{2}+\frac{N}{2} \ln \sigma^{2}+\frac{N}{2} \ln (2 \pi)$$
Equivalently, minimize
$$E(\mathbf{w})=\frac{1}{2} \sum_{n=1}^{N}\left(y\left(\mathbf{x}_{n}, \mathbf{w}\right)-t_{n}\right)^{2}$$
This is commonly referred to as the Mean-Squared-Error or quadratic loss.
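A minimal NumPy sketch of this sum-of-squares error (the function name is illustrative):

```python
import numpy as np

def mse_loss(y, t):
    """Sum-of-squares error E(w) = 1/2 * sum_n (y_n - t_n)^2
    for network outputs y and real-valued targets t."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return 0.5 * np.sum((y - t) ** 2)
```

For example, outputs `[1.0, 2.0]` against targets `[0.0, 2.0]` give $\tfrac{1}{2}(1^2 + 0^2) = 0.5$.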
Binary Cross Entropy Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T}$ and targets $\mathbf{t}=\left(t_{1}, \ldots, t_{N}\right)^{T}$
Assume the target distribution is Bernoulli, and predict the probability of class 1:
$$p(t \mid \mathbf{x}, \mathbf{w})=y(\mathbf{x}, \mathbf{w})^{t}\left(1-y(\mathbf{x}, \mathbf{w})\right)^{1-t}$$
Targets are binary: sigmoid output activation function
$$y(\mathbf{x}, \mathbf{w})=\sigma\left(a^{\text {out }}\right)=\frac{1}{1+e^{-a^{\text {out }}}}$$
Maximum likelihood / minimum negative log likelihood:
$$E(\mathbf{w})=-\sum_{n=1}^{N}\left\{t_{n} \ln y_{n}+\left(1-t_{n}\right) \ln \left(1-y_{n}\right)\right\}, \quad y_{n}=y\left(\mathbf{x}_{n}, \mathbf{w}\right)$$
This is commonly referred to as the Binary Cross-Entropy loss.
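A minimal NumPy sketch of the sigmoid output unit and the binary cross-entropy sum above (function names and the `eps` clipping constant are illustrative; the clipping only guards the logarithm numerically):

```python
import numpy as np

def sigmoid(a):
    """Sigmoid output activation: y = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def bce_loss(y, t, eps=1e-12):
    """Binary cross entropy E(w) = -sum_n [t_n ln y_n + (1-t_n) ln(1-y_n)],
    with y clipped away from 0 and 1 to avoid log(0)."""
    y = np.clip(np.asarray(y, dtype=float), eps, 1.0 - eps)
    t = np.asarray(t, dtype=float)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```

For a single example with $y = 0.5$ and $t = 1$, the loss is $-\ln 0.5 = \ln 2 \approx 0.693$.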
Cross Entropy Loss
Data: inputs $\mathbf{X}=\left(\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}\right)^{T}$ and one-hot encoded targets $\mathbf{T}=\left(\mathbf{t}_{1}, \ldots, \mathbf{t}_{N}\right)^{T}$
Assume the target distribution is a generalized Bernoulli (categorical) distribution:
$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w})=\prod_{k=1}^{K} y_{k}(\mathbf{x}, \mathbf{w})^{t_{k}}$$
$K$ targets $\to$ $K$ output units: $\quad y_{k}(\mathbf{x}, \mathbf{w})=h^{(L)}\left(a_{k}^{\text {out }}\right)$
Categorical targets: softmax output activation function
$$y_{k}(\mathbf{x}, \mathbf{w})=\frac{\exp \left(a_{k}^{\text {out }}\right)}{\sum_{j=1}^{K} \exp \left(a_{j}^{\text {out }}\right)}$$
Maximum likelihood / minimum negative log likelihood:
$$E(\mathbf{w})=-\sum_{n=1}^{N} \sum_{k=1}^{K} t_{n k} \ln y_{k}\left(\mathbf{x}_{n}, \mathbf{w}\right)$$
This is commonly referred to as the Cross-Entropy loss.
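A minimal NumPy sketch of the softmax output layer and the cross-entropy sum over one-hot targets (function names are illustrative; subtracting the row maximum before exponentiating is a standard numerical-stability trick and does not change the softmax value):

```python
import numpy as np

def softmax(a):
    """Softmax over the last axis: y_k = exp(a_k) / sum_j exp(a_j)."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - np.max(a, axis=-1, keepdims=True))  # shift for stability
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy_loss(Y, T, eps=1e-12):
    """Cross entropy E(w) = -sum_n sum_k t_nk ln y_nk for
    predicted probabilities Y and one-hot targets T."""
    Y = np.clip(np.asarray(Y, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(T, dtype=float) * np.log(Y))
```

For uniform outputs over $K = 3$ classes, the per-example loss is $-\ln \tfrac{1}{3} = \ln 3 \approx 1.099$ regardless of which class is correct.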