Deep Q-Learning
Deep Q-Learning extends Q-learning to high-dimensional state spaces by using a neural network to approximate the action-value function Q(s,a). This makes RL feasible in complex domains, such as video games and robotics, where tabular methods are intractable.
Q-learning algorithms for function approximators, such as Deep Q-Network (DQN) (and all its variants) and Deep Deterministic Policy Gradient (DDPG), are largely based on minimizing the mean squared Bellman error (MSBE) loss function:
L = (Q(s,a) - [r + γ max_{a'} Q(s',a')])²

This minimizes the squared difference between the current Q-estimate and the Bellman target.
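As a rough illustration, the loss can be computed along these lines in PyTorch (q_net, target_net, and the batch layout are assumptions for this sketch, not something prescribed by the text; the target is held fixed with no_grad so gradients only flow through Q(s,a)):

```python
import torch
import torch.nn.functional as F

def msbe_loss(q_net, target_net, batch, gamma=0.99):
    """Mean squared Bellman error over a batch of (s, a, r, s', done) transitions."""
    states, actions, rewards, next_states, dones = batch

    # Q(s,a): value of the action actually taken in each transition
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bellman target r + γ max_{a'} Q_target(s', a'); no gradient flows through it
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q

    return F.mse_loss(q_sa, target)
```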
Essential Techniques
Experience Replay: Store past transitions in a buffer and sample mini-batches from it uniformly at random, breaking the temporal correlation between consecutive samples and stabilizing training
Target Networks: Use a separate, slowly updated copy of the Q-network to compute Bellman targets, so the targets do not shift with every gradient step (often implemented as a moving-average, or Polyak, update). Both techniques are sketched below.
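A minimal sketch of both techniques, assuming PyTorch; ReplayBuffer, soft_update, and the capacity and tau values are illustrative choices rather than anything fixed by the text:

```python
import random
from collections import deque

import numpy as np
import torch


class ReplayBuffer:
    """Fixed-size buffer of past transitions; uniform random sampling breaks
    the temporal correlation between consecutive environment steps."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        states, actions, rewards, next_states, dones = zip(
            *random.sample(self.buffer, batch_size))
        return (torch.as_tensor(np.array(states), dtype=torch.float32),
                torch.as_tensor(actions, dtype=torch.int64),
                torch.as_tensor(rewards, dtype=torch.float32),
                torch.as_tensor(np.array(next_states), dtype=torch.float32),
                torch.as_tensor(dones, dtype=torch.float32))

    def __len__(self):
        return len(self.buffer)


def soft_update(target_net, q_net, tau=0.005):
    """Moving-average (Polyak) update: target <- (1 - tau) * target + tau * online."""
    for tp, p in zip(target_net.parameters(), q_net.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * p.data)
```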
Key Algorithms
- Deep Q-Network (DQN): Discrete action spaces, ε-greedy exploration
- Double DQN: Addresses overestimation bias by letting the online network select a' while the target network evaluates it (contrasted with the standard DQN target in the sketch after this list)
- Deep Deterministic Policy Gradient (DDPG): Continuous action spaces, actor-critic architecture
- Rainbow DQN: Combines multiple improvements (prioritized replay, dueling networks, etc.)
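To make the Double DQN entry concrete, here is a hedged sketch contrasting the standard DQN target with the Double DQN target, again assuming PyTorch and the same placeholder tensor names as above:

```python
import torch

def dqn_target(target_net, rewards, next_states, dones, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates a',
    so the max operator tends to overestimate Q-values."""
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN: the online network selects a', the target network evaluates it,
    decoupling selection from evaluation and reducing overestimation bias."""
    with torch.no_grad():
        best_a = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best_a).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```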