RL1 pseudocode flashcards
Policy evaluation MDP
Policy iteration MDP
Value iteration
Policy evaluation MC
Exploring starts
Epsilon-soft policies MC
Policy evaluation MC off-policy
Policy evaluation TD(0)
Sarsa
Q-learning
Expected Sarsa
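
The last three cards (Sarsa, Q-learning, Expected Sarsa) differ only in the one-step target they bootstrap from, so a side-by-side sketch may help when reviewing them. The code below is a minimal tabular illustration, not the pseudocode from the cards themselves. The function names (epsilon_greedy_probs, td_target, td_update), the list-of-lists Q table, and the default values for alpha, gamma and epsilon are all assumptions made for this example. It also assumes the next state is non-terminal and that Expected Sarsa evaluates an epsilon-greedy policy.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state.

    Ties are broken in favour of the lowest action index (an assumption).
    """
    n = len(q_row)
    greedy = max(range(n), key=lambda a: q_row[a])
    return [epsilon / n + (1.0 - epsilon if a == greedy else 0.0)
            for a in range(n)]


def td_target(method, Q, r, s_next, a_next, gamma, epsilon):
    """One-step TD target for a non-terminal next state."""
    if method == "sarsa":
        # On-policy: bootstrap from the action actually taken next.
        return r + gamma * Q[s_next][a_next]
    if method == "q_learning":
        # Off-policy: bootstrap from the greedy action in the next state.
        return r + gamma * max(Q[s_next])
    if method == "expected_sarsa":
        # Bootstrap from the expectation under the epsilon-greedy policy.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected = sum(p * q for p, q in zip(probs, Q[s_next]))
        return r + gamma * expected
    raise ValueError(f"unknown method: {method}")


def td_update(method, Q, s, a, r, s_next, a_next,
              alpha=0.5, gamma=0.9, epsilon=0.1):
    """Apply Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)) in place."""
    target = td_target(method, Q, r, s_next, a_next, gamma, epsilon)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


if __name__ == "__main__":
    for method in ("sarsa", "q_learning", "expected_sarsa"):
        Q = [[0.0, 0.0], [1.0, 3.0]]  # 2 states x 2 actions, made-up values
        td_update(method, Q, s=0, a=0, r=1.0, s_next=1, a_next=0)
        print(method, Q[0][0])
```

With the same transition, the three methods move Q[0][0] by different amounts because Sarsa uses the sampled next action, Q-learning uses the maximum, and Expected Sarsa uses the policy-weighted average.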