Model-Free RL

"Never solve a more general problem as an intermediate step." ~ Vladimir Vapnik, 1998

If we care about optimal behaviour: why not learn a policy directly?

Model-based RL:

  • 'Easy' to learn a model (supervised learning)
  • Learns 'all there is to know' from the data
  • Objective captures irrelevant information
  • May focus compute/capacity on irrelevant details
  • Computing policy (planning) is non-trivial and can be computationally expensive
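The two stages above can be sketched on a toy problem. This is a minimal, illustrative example (the 2-state, 2-action MDP and its rewards are invented here): first fit a model from transition data by counting (the 'easy' supervised step), then compute a policy by planning (value iteration) on the learned model — the non-trivial step.

```python
import random

# Minimal model-based RL sketch on an invented 2-state, 2-action MDP.
# Step 1: learn a model from data by empirical counts (supervised learning).
# Step 2: plan (value iteration) on the learned model to get a policy.

random.seed(0)
N_S, N_A, GAMMA = 2, 2, 0.9

def step(s, a):
    # True (hidden) dynamics, used only to generate data:
    # action 1 always leads to state 1, which yields reward 1.
    s2 = 1 if a == 1 else 0
    return s2, float(s2 == 1)

# Collect experience and fit transition/reward model by counting.
counts = [[[0] * N_S for _ in range(N_A)] for _ in range(N_S)]
rew_sum = [[0.0] * N_A for _ in range(N_S)]
visits = [[0] * N_A for _ in range(N_S)]
for _ in range(1000):
    s, a = random.randrange(N_S), random.randrange(N_A)
    s2, r = step(s, a)
    counts[s][a][s2] += 1
    rew_sum[s][a] += r
    visits[s][a] += 1

P = [[[counts[s][a][s2] / max(visits[s][a], 1) for s2 in range(N_S)]
      for a in range(N_A)] for s in range(N_S)]
R = [[rew_sum[s][a] / max(visits[s][a], 1) for a in range(N_A)]
     for s in range(N_S)]

# Planning: value iteration on the learned model.
V = [0.0] * N_S
for _ in range(100):
    V = [max(R[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in range(N_S))
             for a in range(N_A)) for s in range(N_S)]

policy = [max(range(N_A), key=lambda a: R[s][a] + GAMMA *
              sum(P[s][a][s2] * V[s2] for s2 in range(N_S)))
          for s in range(N_S)]
print(policy)  # greedy actions w.r.t. the learned model
```

Note how the model-fitting step spends capacity on every transition probability, including ones the final policy never uses — the point made in the bullets above.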

Value-based RL:

  • Closer to the true objective (a greedy policy can be read off the learned values)
  • Fairly well understood - the update resembles regression towards a target
  • Still not the true objective - may still spend capacity on less-important details
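The regression-like flavour can be seen in a minimal value-based sketch (same invented toy MDP as above, chosen here for illustration): tabular Q-learning regresses Q(s, a) towards the bootstrapped target r + γ·max Q(s', ·), with no model in sight.

```python
import random

# Minimal value-based RL sketch: tabular Q-learning on an invented
# 2-state, 2-action problem. No model is learned; each update nudges
# Q(s, a) towards the target r + gamma * max_a' Q(s', a'),
# much like a regression step.

random.seed(0)
N_S, N_A = 2, 2
GAMMA, ALPHA = 0.9, 0.1

def step(s, a):
    # Hidden dynamics, used only to sample transitions.
    s2 = 1 if a == 1 else 0
    return s2, float(s2 == 1)

Q = [[0.0] * N_A for _ in range(N_S)]
s = 0
for _ in range(5000):
    a = random.randrange(N_A)              # explore uniformly
    s2, r = step(s, a)
    target = r + GAMMA * max(Q[s2])        # regression-like target
    Q[s][a] += ALPHA * (target - Q[s][a])
    s = s2

# The policy falls out of the values: act greedily per state.
policy = [max(range(N_A), key=lambda a: Q[s][a]) for s in range(N_S)]
print(policy)
```

Every Q(s, a) entry is learned to the same accuracy, even for actions the greedy policy will never take — which is why this is still not quite the true objective.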

Policy-based RL:

  • Right objective!
  • Not the most efficient use of data
  • Ignores other learnable knowledge (values, dynamics), which can make early learning slow
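A minimal policy-based sketch, optimising the objective directly: REINFORCE on an invented two-armed bandit (the arm probabilities below are made up for illustration). The policy parameters are updated along the score-function gradient ∇ log π(a) · r, with no value function or model learned.

```python
import math
import random

# Minimal policy-based RL sketch: REINFORCE on an invented two-armed
# bandit. The softmax policy is optimised directly via the
# score-function gradient: grad log pi(a) * reward.

random.seed(0)
theta = [0.0, 0.0]           # softmax preferences, one per arm
ARM_REWARD = [0.2, 0.8]      # hidden expected reward of each arm
ALPHA = 0.1

def softmax(prefs):
    z = [math.exp(p - max(prefs)) for p in prefs]
    total = sum(z)
    return [v / total for v in z]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if random.random() < ARM_REWARD[a] else 0.0
    # grad of log softmax w.r.t. theta[i] is (indicator(i == a) - probs[i])
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += ALPHA * r * grad

print(softmax(theta))  # probability mass shifts towards the better arm
```

Each update uses a sampled return once and discards the transition, and nothing about the reward structure or dynamics is retained — illustrating both data-inefficiency bullets above.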