Deep Q-Learning

Deep Q-Learning extends Q-learning to high-dimensional state spaces by using neural networks to approximate the Q-function Q(s,a). This enables RL in complex environments like video games and robotics where tabular methods are infeasible.

Q-learning algorithms with function approximation, such as Deep Q-Network (DQN) and its many variants, as well as Deep Deterministic Policy Gradient (DDPG), are largely based on minimizing a mean squared Bellman error (MSBE) loss function:

L = (Q(s,a) - [r + γ max_a' Q(s',a')])²

The loss penalizes the squared difference between the current Q-estimate and the Bellman target r + γ max_a' Q(s',a').
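
As a minimal sketch of how this loss is typically computed (assuming PyTorch, a q_net that maps a batch of states to per-action Q-values, and a separate target_net as described below), the Bellman target is treated as a fixed regression target by detaching it from the computation graph:

```python
import torch
import torch.nn.functional as F

def msbe_loss(q_net, target_net, batch, gamma=0.99):
    """Mean squared Bellman error over a batch of (s, a, r, s', done) transitions."""
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions that were actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bellman target: r + gamma * max_a' Q_target(s', a'); no bootstrapping past
    # terminal states, and no gradient flows through the target.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * max_next_q

    return F.mse_loss(q_sa, target)
```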

Essential Techniques

Experience Replay: Store past transitions in a buffer and sample them uniformly at random during training to break the correlation between consecutive experiences and improve stability.
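
A replay buffer can be sketched as a bounded deque with uniform random sampling (the names here are illustrative, not taken from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling uniformly at random breaks the temporal correlation
        # between consecutive environment steps.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```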

Target Networks: Compute the Bellman targets with a separate, slowly updated copy of the Q-network, so the regression target does not shift with every gradient step. The target network is either copied from the online network periodically (as in DQN) or updated as a moving average of its weights (Polyak averaging, as in DDPG).
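
The two common update schemes can be sketched as follows (assuming PyTorch modules): periodic hard copies in the DQN style, and the Polyak moving-average updates used by DDPG:

```python
import copy
import torch

def make_target(online_net):
    """Create the target network as a frozen copy of the online network."""
    target_net = copy.deepcopy(online_net)
    for p in target_net.parameters():
        p.requires_grad_(False)
    return target_net

def hard_update(target_net, online_net):
    """DQN-style: overwrite the target weights every N gradient steps."""
    target_net.load_state_dict(online_net.state_dict())

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.005):
    """DDPG-style Polyak averaging: target <- tau * online + (1 - tau) * target."""
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * o_param)
```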

Key Algorithms

  • Deep Q-Network (DQN): Discrete action spaces, ε-greedy exploration
  • Double DQN: Addresses the overestimation bias of the max operator in the Bellman target (see the sketch after this list)
  • Deep Deterministic Policy Gradient (DDPG): Continuous action spaces, actor-critic architecture
  • Rainbow DQN: Combines multiple improvements (prioritized replay, dueling networks, etc.)
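
To make the Double DQN idea concrete, the following sketch (using the same assumed q_net/target_net as above) selects the greedy next action with the online network but evaluates it with the target network, rather than letting one network do both:

```python
import torch

@torch.no_grad()
def double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN Bellman target.

    Vanilla DQN uses max_a' Q_target(s', a'), so the same network both selects
    and evaluates the greedy next action, which tends to overestimate values.
    """
    best_actions = q_net(next_states).argmax(dim=1, keepdim=True)        # selection
    next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # evaluation
    return rewards + gamma * (1.0 - dones) * next_q
```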

References

  1. OpenAI Spinning Up, "Deep Deterministic Policy Gradient (DDPG)." https://spinningup.openai.com/en/latest/algorithms/ddpg.html