Backpropagation
Consider an error function that decomposes into a sum of per-example terms,
$$
E(\mathbf{w})=\sum_{n=1}^{N} E_{n}(\mathbf{w}).
$$
The goal is to evaluate the per-example gradient
$$
\frac{\partial E_{n}(\mathbf{w})}{\partial \mathbf{w}},
$$
which is then used to optimize the parameters with stochastic gradient descent (SGD).
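Once this gradient is available, a single SGD step updates the weights using one example at a time:
$$
\mathbf{w}^{(\tau+1)}=\mathbf{w}^{(\tau)}-\eta \, \frac{\partial E_{n}\left(\mathbf{w}^{(\tau)}\right)}{\partial \mathbf{w}}
$$
where $\eta>0$ is the learning rate and $\tau$ indexes the update step.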
Output/hidden activations: $a_{j}^{(l)}=\sum_{i} w_{j i}^{(l)} z_{i}^{(l-1)}$, where $i$ runs over the units of layer $l-1$
Output/hidden units: $z_{j}^{(l)}=h^{(l)}\!\left(a_{j}^{(l)}\right)$, where $h^{(l)}$ is the activation function of layer $l$
Two stages:
- Forward propagation: compute all $a_{j}^{(l)}$ and $z_{j}^{(l)}$, layer by layer from input to output
- Backpropagation: compute all derivatives $\frac{\partial E_{n}}{\partial w_{j i}^{(l)}}$, propagating error signals from the output layer back toward the input
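The two stages can be sketched for a single example in a two-layer network. The sizes, the tanh hidden units, the linear output units, and the sum-of-squares error $E_n = \tfrac{1}{2}\lVert \mathbf{y}-\mathbf{t}\rVert^2$ are illustrative assumptions, not fixed by the text:

```python
import numpy as np

# Minimal sketch of forward and backward propagation for one example.
# Hypothetical setup: tanh hidden layer, linear outputs, and
# sum-of-squares error E_n = 1/2 * ||y - t||^2 (all assumptions).

rng = np.random.default_rng(0)

D, H, K = 3, 4, 2                  # input, hidden, output dimensions
W1 = rng.standard_normal((H, D))   # weights w_{ji}^{(1)}
W2 = rng.standard_normal((K, H))   # weights w_{ji}^{(2)}

x = rng.standard_normal(D)         # input vector z^{(0)}
t = rng.standard_normal(K)         # target vector

# Forward propagation: compute all a_j and z_j, layer by layer
a1 = W1 @ x                        # a_j^{(1)} = sum_i w_{ji}^{(1)} z_i^{(0)}
z1 = np.tanh(a1)                   # z_j^{(1)} = h(a_j^{(1)})
a2 = W2 @ z1                       # a_j^{(2)}
y = a2                             # linear output units: y = a^{(2)}

# Back propagation: compute dE_n/dw for every weight
delta2 = y - t                          # output errors delta_k
delta1 = (1 - z1**2) * (W2.T @ delta2)  # h'(a) = 1 - tanh(a)^2
dW2 = np.outer(delta2, z1)              # dE_n/dw_{kj}^{(2)} = delta_k * z_j
dW1 = np.outer(delta1, x)               # dE_n/dw_{ji}^{(1)} = delta_j * x_i
```

The gradients `dW1` and `dW2` can be verified against central finite differences of $E_n$, a standard sanity check before trusting a backpropagation implementation.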