Inverse Reinforcement Learning

Given triples $(s, a, R)$, where $s$ is a state, $a$ is an action, and $R$ is the reward obtained from expert annotations, learn a function $f(s, a) \rightarrow R$ that predicts the reward for a state-action pair.
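
Under this framing, learning $f$ is a regression problem over state-action pairs. The following is a minimal sketch, not a definitive implementation. It assumes a linear reward model and vector-valued states and actions. The names `featurize`, `fit_reward`, and `predict_reward` are hypothetical, and the data is synthetic, standing in for expert-annotated triples.

```python
import numpy as np

def featurize(states, actions):
    """Hypothetical feature map: concatenate state, action, and a bias term."""
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    bias = np.ones((states.shape[0], 1))
    return np.hstack([states, actions, bias])

def fit_reward(states, actions, rewards):
    """Least-squares fit of a linear reward model f(s, a) = w . phi(s, a)."""
    phi = featurize(states, actions)
    w, *_ = np.linalg.lstsq(phi, np.asarray(rewards, dtype=float), rcond=None)
    return w

def predict_reward(w, states, actions):
    """Evaluate the fitted reward model on a batch of (s, a) pairs."""
    return featurize(states, actions) @ w

# Synthetic (s, a, R) triples for illustration only (assumption, not real data).
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 3))   # 200 states, 3 dimensions each
A = rng.normal(size=(200, 2))   # 200 actions, 2 dimensions each
true_w = np.array([1.0, -2.0, 0.5, 0.3, -0.7, 0.1])
R = featurize(S, A) @ true_w    # noise-free rewards from a known linear model

w_hat = fit_reward(S, A, R)
```

A linear model is chosen here only for brevity. Any regressor that maps features of $(s, a)$ to a scalar could take its place.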

Presupposition: The reward function provides the most succinct and transferable definition of a task.

References

  1. Inverse Reinforcement Learning from Preferences https://danieltakeshi.github.io/2021/04/01/inverse-rl-prefs/
  2. Berkeley Lecture on IRL https://people.eecs.berkeley.edu/~pabbeel/cs287-fa15/slides/lecture9-inverseRL.pdf