State Update Functions in Partially Observable MDPs

In partially observable MDPs, the observations do not contain all relevant information from the history.

Thus, an internal state has to be extracted from the history. Choosing such a representation involves a trade-off between various factors:

  • Compactness
  • Markov property
  • Interpretability
  • Computational complexity of updates, learning
  • Ease of implementation

We can extract features from the history:

$$ f\left(H_{t}\right) $$

Two desired properties:

  • $f\left(H_{t}\right)$ should be a compact (low-dimensional) summary of the history
  • $f\left(H_{t}\right)$ should capture all relevant information from the history
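As a sketch of the idea, the agent can either store the full history or maintain a compact feature state updated recursively as $s_{t+1} = f(s_t, a_t, o_{t+1})$. The function names and the exponential-moving-average feature below are invented for illustration, not part of any standard API:

```python
def full_history_update(h, a, o):
    """Exact but not compact: the 'state' is the whole history, grows with t."""
    return h + [(a, o)]

def make_feature_update(f):
    """Wrap a feature map f so the agent only ever stores f's output."""
    def update(s, a, o):
        return f(s, a, o)          # s_{t+1} = f(s_t, a_t, o_{t+1})
    return update

# Toy feature: exponential moving average of observations (compact,
# but clearly discards most long-term information).
ema_update = make_feature_update(lambda s, a, o: 0.9 * s + 0.1 * o)

s = 0.0
for o in [1.0, 1.0, 1.0]:
    s = ema_update(s, None, o)     # s stays a single number regardless of t
```

The contrast illustrates the compactness trade-off: the full history is trivially Markov but unbounded, while the feature state is constant-size but may lose information.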

Exact methods

  • Full history
    • Not compact: grows with time
  • Belief state
    • Easy to interpret
    • Requires known model
      • (tricky to learn from data)
  • Predictive state
    • Model learnable from data
    • Most compact

Approximate methods

  • Recent observation(s)
    • Easy
    • Lose long-term dependencies
  • End-to-end learning
    • Quite general
    • RNN training can be tricky and requires much data
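The recent-observations approximation can be sketched as a fixed-length buffer of the last k observations (often called frame stacking). The class below is a minimal illustration, not a reference to any particular library:

```python
from collections import deque

class RecentObsState:
    """Approximate state: the k most recent observations, zero-padded."""
    def __init__(self, k, pad=0):
        self.buf = deque([pad] * k, maxlen=k)

    def update(self, obs):
        self.buf.append(obs)       # oldest observation drops out automatically
        return tuple(self.buf)     # hashable state for tabular methods

s = RecentObsState(k=3)
s.update(1)
s.update(2)
state = s.update(3)                # state is now (1, 2, 3)
```

Any dependency longer than k steps is invisible to this state, which is exactly the "lose long-term dependencies" drawback listed above.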