Semi-Markov Decision Processes

A decision process with options is a semi-MDP because it is Markovian at the level of decision epochs (the points at which an option is chosen) but not at the "flat" level of primitive actions. That is, if you only see the state-action pairs along a trajectory and do not observe which option is currently executing, the process is not Markovian: the action taken in a state depends on the active option, which in turn depends on the history.
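The loss of the Markov property at the flat level can be illustrated with a minimal sketch. Everything in it is a hypothetical toy setup, not something taken from the references: a corridor with states 0 to 4 and two options, `go_right` and `go_left`, each of which runs until it hits a wall. Only flat (state, action) pairs are recorded, and the active option is hidden.

```python
from collections import Counter, defaultdict

N_STATES = 5  # hypothetical corridor with states 0..4


def run_flat_trajectory(n_steps):
    """Bounce between the walls using two options; record only (state, action)."""
    state, option = 0, "go_right"
    trajectory = []
    for _ in range(n_steps):
        # Decision epochs: an option terminates at its wall and the other starts.
        if option == "go_right" and state == N_STATES - 1:
            option = "go_left"
        elif option == "go_left" and state == 0:
            option = "go_right"
        action = 1 if option == "go_right" else -1
        trajectory.append((state, action))  # the active option is NOT recorded
        state += action
    return trajectory


def action_counts(trajectory, history_len):
    """Count actions conditioned on the last `history_len` observed states."""
    counts = defaultdict(Counter)
    for t in range(history_len - 1, len(trajectory)):
        window = trajectory[t - history_len + 1 : t + 1]
        context = tuple(s for s, _ in window)
        counts[context][trajectory[t][1]] += 1
    return counts


trajectory = run_flat_trajectory(80)
by_state = action_counts(trajectory, 1)
by_two_states = action_counts(trajectory, 2)

# In state 2 both actions occur, so the current state alone does not fix the action.
print(dict(by_state[(2,)]))
# Adding one step of history removes the ambiguity in this toy example.
print(dict(by_two_states[(1, 2)]), dict(by_two_states[(3, 2)]))
```

In this toy case one extra state of history happens to be enough to recover the active option; in general the dependence can reach arbitrarily far back, which is why the process is treated as Markovian only at the decision epochs.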

Semi-MDPs are therefore used for problems that involve actions at different levels of abstraction, where a single decision may take a variable number of time steps to complete. Hierarchical reinforcement learning (HRL) is a generalization (or extension) of reinforcement learning in which the environment is modeled as a semi-MDP.
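To make the variable-duration aspect concrete, here is the commonly used SMDP form of the Q-learning update, given as a sketch. The notation is assumed here rather than taken from this text: an option o is started in state s, runs for k primitive steps, and terminates in state s'; α is a step size, γ a discount factor, and r the discounted reward accumulated while the option ran.

```latex
Q(s, o) \leftarrow Q(s, o)
  + \alpha \left[ r + \gamma^{k} \max_{o'} Q(s', o') - Q(s, o) \right],
\qquad
r = \sum_{i=0}^{k-1} \gamma^{i} \, r_{t+i+1}
```

When every option lasts exactly one step (k = 1), this reduces to the ordinary one-step Q-learning update.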

Option

An option is a generalization of the concept of action. It captures the idea that certain actions are composed of other sub-actions. An example from:

Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques.
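As a rough sketch of how an option could be represented in code, the block below assumes the common formulation of an option as a triple: an initiation set, an internal policy, and a termination condition. This text does not spell out that triple, so treat it as an assumption. The corridor environment `step`, the helpers `run_option` and `primitive`, and the reward scheme are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

State = int
Action = int


@dataclass
class Option:
    initiation_set: Set[State]           # states in which the option may start
    policy: Callable[[State], Action]    # primitive action chosen while running
    terminates: Callable[[State], bool]  # True if the option ends in this state


def step(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical corridor with states 0..4; reward 1 on entering state 4."""
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 and state != 4 else 0.0
    return next_state, reward


def run_option(option: Option, state: State, gamma: float = 0.9):
    """Run an option to termination.

    Returns (final_state, discounted_reward, duration), i.e. one transition
    as seen at the level of decision epochs.
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    total, discount, duration = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        duration += 1
        if option.terminates(state):
            return state, total, duration


def primitive(action: Action) -> Option:
    """Wrap a primitive action as an option that always ends after one step."""
    return Option(set(range(5)), lambda s: action, lambda s: True)


# A temporally extended option: keep moving right until the wall at state 4.
go_right = Option({0, 1, 2, 3}, lambda s: 1, lambda s: s == 4)

print(run_option(go_right, 1))       # takes three primitive steps
print(run_option(primitive(-1), 2))  # takes exactly one primitive step
```

Treating primitive actions as one-step options is what lets a single decision rule choose among actions of different durations.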

[Figure: smdp-options]

References

  1. What is semi-MDP? https://stats.stackexchange.com/questions/219796/from-markov-decision-process-mdp-to-semi-mdp-what-is-it-in-a-nutshell
  2. What are options in RL? https://ai.stackexchange.com/a/13255
  3. SMDPs, Chapter 7, A First Course in Stochastic Models, HC Tijms http://read.pudn.com/downloads74/ebook/272070/A First Course in Stochastic Models/7 Semi-Markov Decision Processes.pdf