Markov Reward Processes

Created December 28, 2020 · Updated July 22, 2025

A Markov reward process (MRP) is a Markov chain with an associated reward function.

In reinforcement learning, an MRP arises when you fix a policy $\pi$ for a Markov decision process (MDP). With the action choice in every state determined by $\pi$, the actions can be marginalized out, leaving an MRP with the induced transition kernel

$p\left(s^{\prime} \mid s\right)=\int p\left(s^{\prime} \mid s, a\right) \pi(a \mid s) d a$

This MRP models the reward accrued by following the decision-making strategy $\pi$ in the MDP.
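
For a finite MDP the integral over actions becomes a sum, and the induced kernel is a weighted average of the per-action transition matrices. Below is a minimal sketch of that computation in NumPy. The function name `induce_mrp`, the array layouts, and the toy numbers are illustrative assumptions, as is the reward form $r(s, a)$ (an expected reward per state-action pair), which the text above does not specify.

```python
import numpy as np

def induce_mrp(P, R, pi):
    """Marginalize a fixed policy out of a finite MDP (illustrative helper).

    Assumed layouts:
      P[s, a, t]  = p(t | s, a), shape (S, A, S)
      R[s, a]     = expected reward r(s, a), shape (S, A)
      pi[s, a]    = pi(a | s), shape (S, A)
    """
    # p(t | s) = sum_a pi(a | s) * p(t | s, a)
    P_pi = np.einsum("sa,sat->st", pi, P)
    # r(s) = sum_a pi(a | s) * r(s, a)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return P_pi, r_pi

# Toy MDP with 2 states and 2 actions (made-up numbers).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0 and 1
    [[0.5, 0.5], [0.0, 1.0]],  # transitions from state 1 under actions 0 and 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
pi = np.array([[0.5, 0.5],   # uniform over actions in state 0
               [1.0, 0.0]])  # always action 0 in state 1

P_pi, r_pi = induce_mrp(P, R, pi)

# Each row of the induced kernel is still a probability distribution.
assert np.allclose(P_pi.sum(axis=1), 1.0)
```

The pair `(P_pi, r_pi)` is then an ordinary Markov chain with a per-state reward, which is the MRP described above.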