State Update Functions in Partially Observable MDPs

In partially observable MDPs, the observations do not contain all relevant information from the history.

Thus, an internal state has to be extracted from the history. Choosing such a representation involves a trade-off between various factors:

  • Compactness
  • Markov property
  • Interpretability
  • Computational complexity of updates, learning
  • Ease of implementation

We can extract features from the history:

$$ f\left(H_{t}\right) $$

Two desired properties:

  • $f\left(H_{t}\right)$ should be a compact (low-dimensional) summary of the history
  • $f\left(H_{t}\right)$ should capture all relevant information from the history
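As a sketch of the idea, the agent can either store the full history or maintain a compact feature state updated recursively as $s_{t+1} = f(s_t, a_t, o_{t+1})$. The function names and the exponential-moving-average feature below are invented for illustration, not part of any standard API:

```python
def full_history_update(h, a, o):
    """Exact but not compact: the 'state' is the whole history, grows with t."""
    return h + [(a, o)]

def make_feature_update(f):
    """Wrap a feature map f so the agent only ever stores f's output."""
    def update(s, a, o):
        return f(s, a, o)          # s_{t+1} = f(s_t, a_t, o_{t+1})
    return update

# Toy feature: exponential moving average of observations (compact,
# but clearly discards most long-term information).
ema_update = make_feature_update(lambda s, a, o: 0.9 * s + 0.1 * o)

s = 0.0
for o in [1.0, 1.0, 1.0]:
    s = ema_update(s, None, o)     # s stays a single number regardless of t
```

The contrast illustrates the compactness trade-off: the full history is trivially Markov but unbounded, while the feature state is constant-size but may lose information.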

Exact methods

  • Full history
    • Not compact: grows with time
  • Belief state
    • Easy to interpret
    • Requires known model
      • (tricky to learn from data)
  • Predictive state
    • Model learnable from data
    • Most compact

Approximate methods

  • Recent observation(s)
    • Easy
    • Lose long-term dependencies
  • End-to-end learning
    • Quite general
    • RNN training can be tricky and requires much data
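The recent-observations approximation can be sketched as a fixed-length buffer of the last k observations (often called frame stacking). The class below is a minimal illustration, not a reference to any particular library:

```python
from collections import deque

class RecentObsState:
    """Approximate state: the k most recent observations, zero-padded."""
    def __init__(self, k, pad=0):
        self.buf = deque([pad] * k, maxlen=k)

    def update(self, obs):
        self.buf.append(obs)       # oldest observation drops out automatically
        return tuple(self.buf)     # hashable state for tabular methods

s = RecentObsState(k=3)
s.update(1)
s.update(2)
state = s.update(3)                # state is now (1, 2, 3)
```

Any dependency longer than k steps is invisible to this state, which is exactly the "lose long-term dependencies" drawback listed above.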