RLHF

Apr 22, 2026 · 8 min read

Markov Decision Processes: The Mathematical Foundation of Reinforcement Learning

The Markov Decision Process (MDP) is the standard formal object for sequential decision-making under uncertainty. It separates problem definition — states, actions, how the world …