Reinforcement Learning
Fundamentals
Introduction, MDP and Bandits
Sequential Decision Making
MC Methods and TD(0)
Advanced TD Methods
Prediction with Approximation
Control with Approximation
Model-Free Reinforcement Learning
Policy Gradient - REINFORCE and Approximations
Policy Gradient - PGT and GAE
Advanced Policy Search
Deterministic PG and Evaluation
- Deterministic Policy Gradient
- Deep Deterministic Policy Gradient (DDPG)
- Evaluating RL algorithms
- Soft Actor-Critic
Planning and Learning
- Prioritized Sweeping
- Trajectory Sampling
Model-Based Reinforcement Learning can be categorized by what the learned model predicts:
Observations-predicting:
- Recurrent World Models Facilitate Policy Evolution (World Models)
- Dyna-Q - Planning and Learning
Value-predicting:
- MuZero
- The Predictron
- TreeQN
- Value Prediction Network
- AlphaZero
- ReBeL
- When to Trust Your Model: Model-Based Policy Optimization (NeurIPS 2019)
- MOPO (NeurIPS 2020, https://arxiv.org/abs/2005.13239)
These can be combined with structured priors as well:
- Deep RL with relational inductive bias (ICLR 2019, https://openreview.net/pdf?id=HkxaFoC9KQ)
- Relational Neural Expectation Maximization: Unsupervised discovery of objects and their interactions (ICLR 2018, https://arxiv.org/pdf/1802.10353.pdf)
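As a concrete instance of the observations-predicting family above, Dyna-Q interleaves direct Q-learning on real experience with extra Q-learning updates replayed from a learned model. A minimal tabular sketch (the `step` environment interface and all hyperparameter values here are illustrative choices, not taken from the course material):

```python
import random
from collections import defaultdict

def dyna_q(step, n_actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q: Q-learning on real transitions, plus extra
    backups replayed from a learned deterministic model."""
    rng = random.Random(seed)
    Q = defaultdict(float)   # (state, action) -> estimated value
    model = {}               # (state, action) -> (reward, next_state, done)

    def greedy(s):
        vals = [Q[(s, a)] for a in range(n_actions)]
        best = max(vals)
        return rng.choice([a for a, v in enumerate(vals) if v == best])

    def backup(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 1000:
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)          # real experience
            backup(s, a, r, s2, done)         # direct RL update
            model[(s, a)] = (r, s2, done)     # model learning
            for _ in range(planning_steps):   # planning from the model
                ps, pa = rng.choice(list(model))
                backup(ps, pa, *model[(ps, pa)])
            s, t = s2, t + 1
    return Q
```

On a toy chain MDP where moving right eventually earns a reward, the planning updates propagate value back along the chain much faster than Q-learning alone would per real step.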
Other works
Partial Observability
Pure Exploration
- Best Arm Identification
Other
- Benefits and challenges of different RL methods
- Inverse Reinforcement Learning
Focus on understanding the methods and the relationships between them rather than on memorizing, e.g., update equations.
Especially important: know the advantages, disadvantages, and limitations of each method, and the situations in which a given method should be preferred.
Resources
- Exercise solutions to Sutton & Barto, Reinforcement Learning: An Introduction (2nd edition): https://github.com/LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions/