Model-Free RL

"Never solve a more general problem as an intermediate step." ~ Vladimir Vapnik, 1998

If we care about optimal behaviour: why not learn a policy directly?

Model-based RL:

  • 'Easy' to learn a model (supervised learning)
  • Learns 'all there is to know' from the data
  • Objective captures irrelevant information
  • May focus compute/capacity on irrelevant details
  • Computing policy (planning) is non-trivial and can be computationally expensive
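The two stages above can be sketched on a toy problem. This is a minimal, illustrative example (the 2-state, 2-action MDP and its rewards are invented here): first fit a model from transition data by counting (the 'easy' supervised step), then compute a policy by planning (value iteration) on the learned model — the non-trivial step.

```python
import random

# Minimal model-based RL sketch on an invented 2-state, 2-action MDP.
# Step 1: learn a model from data by empirical counts (supervised learning).
# Step 2: plan (value iteration) on the learned model to get a policy.

random.seed(0)
N_S, N_A, GAMMA = 2, 2, 0.9

def step(s, a):
    # True (hidden) dynamics, used only to generate data:
    # action 1 always leads to state 1, which yields reward 1.
    s2 = 1 if a == 1 else 0
    return s2, float(s2 == 1)

# Collect experience and fit transition/reward model by counting.
counts = [[[0] * N_S for _ in range(N_A)] for _ in range(N_S)]
rew_sum = [[0.0] * N_A for _ in range(N_S)]
visits = [[0] * N_A for _ in range(N_S)]
for _ in range(1000):
    s, a = random.randrange(N_S), random.randrange(N_A)
    s2, r = step(s, a)
    counts[s][a][s2] += 1
    rew_sum[s][a] += r
    visits[s][a] += 1

P = [[[counts[s][a][s2] / max(visits[s][a], 1) for s2 in range(N_S)]
      for a in range(N_A)] for s in range(N_S)]
R = [[rew_sum[s][a] / max(visits[s][a], 1) for a in range(N_A)]
     for s in range(N_S)]

# Planning: value iteration on the learned model.
V = [0.0] * N_S
for _ in range(100):
    V = [max(R[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in range(N_S))
             for a in range(N_A)) for s in range(N_S)]

policy = [max(range(N_A), key=lambda a: R[s][a] + GAMMA *
              sum(P[s][a][s2] * V[s2] for s2 in range(N_S)))
          for s in range(N_S)]
print(policy)  # greedy actions w.r.t. the learned model
```

Note how the model-fitting step spends capacity on every transition probability, including ones the final policy never uses — the point made in the bullets above.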

Value-based RL:

  • Closer to the true objective (a greedy policy can be read off the learned values)
  • Fairly well understood - the update resembles regression towards a target
  • Still not the true objective - may still spend capacity on less-important details
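The regression-like flavour can be seen in a minimal value-based sketch (same invented toy MDP as above, chosen here for illustration): tabular Q-learning regresses Q(s, a) towards the bootstrapped target r + γ·max Q(s', ·), with no model in sight.

```python
import random

# Minimal value-based RL sketch: tabular Q-learning on an invented
# 2-state, 2-action problem. No model is learned; each update nudges
# Q(s, a) towards the target r + gamma * max_a' Q(s', a'),
# much like a regression step.

random.seed(0)
N_S, N_A = 2, 2
GAMMA, ALPHA = 0.9, 0.1

def step(s, a):
    # Hidden dynamics, used only to sample transitions.
    s2 = 1 if a == 1 else 0
    return s2, float(s2 == 1)

Q = [[0.0] * N_A for _ in range(N_S)]
s = 0
for _ in range(5000):
    a = random.randrange(N_A)              # explore uniformly
    s2, r = step(s, a)
    target = r + GAMMA * max(Q[s2])        # regression-like target
    Q[s][a] += ALPHA * (target - Q[s][a])
    s = s2

# The policy falls out of the values: act greedily per state.
policy = [max(range(N_A), key=lambda a: Q[s][a]) for s in range(N_S)]
print(policy)
```

Every Q(s, a) entry is learned to the same accuracy, even for actions the greedy policy will never take — which is why this is still not quite the true objective.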

Policy-based RL:

  • Right objective!
  • Not the most efficient use of data
  • Ignores other learnable knowledge (values, dynamics), which can make early learning slow
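A minimal policy-based sketch, optimising the objective directly: REINFORCE on an invented two-armed bandit (the arm probabilities below are made up for illustration). The policy parameters are updated along the score-function gradient ∇ log π(a) · r, with no value function or model learned.

```python
import math
import random

# Minimal policy-based RL sketch: REINFORCE on an invented two-armed
# bandit. The softmax policy is optimised directly via the
# score-function gradient: grad log pi(a) * reward.

random.seed(0)
theta = [0.0, 0.0]           # softmax preferences, one per arm
ARM_REWARD = [0.2, 0.8]      # hidden expected reward of each arm
ALPHA = 0.1

def softmax(prefs):
    z = [math.exp(p - max(prefs)) for p in prefs]
    total = sum(z)
    return [v / total for v in z]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if random.random() < ARM_REWARD[a] else 0.0
    # grad of log softmax w.r.t. theta[i] is (indicator(i == a) - probs[i])
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += ALPHA * r * grad

print(softmax(theta))  # probability mass shifts towards the better arm
```

Each update uses a sampled return once and discards the transition, and nothing about the reward structure or dynamics is retained — illustrating both data-inefficiency bullets above.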