What is reinforcement learning
What is a reinforcement learning agent capable of
Where can reinforcement learning operate?
as long as a clear reward can be applied
What is optimal policy
What does a Markov decision process contain
What is a state in MDP
What is a model / transition model in MDP
How is the transition model defined
- in state s, taking action a leads to state s’
How does the model differ for stochastic actions?
Add in probability P(s’ | s, a) - the probability of reaching s’ given state s and action a
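A stochastic transition model can be sketched as a lookup table. This is only an illustration: the states "A" and "B", the actions "stay" and "move", the probabilities, and the helper sample_next_state are all made up for the example and do not come from these notes.

```python
import random

# Hypothetical two-state MDP. P[(s, a)] maps each successor state s'
# to the probability P(s' | s, a). All names and numbers are assumptions.
P = {
    ("A", "stay"): {"A": 1.0},            # deterministic action
    ("A", "move"): {"B": 0.8, "A": 0.2},  # stochastic: may fail and stay in A
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 0.8, "B": 0.2},
}

def sample_next_state(state, action, rng):
    """Draw a successor s' according to P(s' | s, a)."""
    successors = P[(state, action)]
    return rng.choices(list(successors), weights=list(successors.values()), k=1)[0]

# For every (s, a) pair the probabilities over successors must sum to 1.
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Note that the table is indexed only by the current state and action, not by anything that happened earlier.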
What is the key feature of the Markov property
effects of an action taken in a state depend only on that state and not prior history
What is an action in MDP
- A(s) defines the set of actions that can be taken given state s
What is a reward in MDP
What is policy in MDP
What do MDP solutions usually involve?
dynamic programming
- recursively breaking a problem into pieces while remembering optimal solutions to each piece
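As one possible illustration of the dynamic programming idea, here is a minimal sketch of value iteration, a common DP method for MDPs (the notes do not name a specific algorithm, so this choice is an assumption). The 3-state chain, the reward-per-state convention, and all names (STATES, ACTIONS, T, R, GAMMA) are hypothetical.

```python
# Hypothetical 3-state chain: s0 -> s1 -> goal. All values are illustrative.
GAMMA = 0.9
STATES = ["s0", "s1", "goal"]
ACTIONS = {"s0": ["right", "stay"], "s1": ["right", "stay"], "goal": []}  # A(s)
# T[(s, a)] is a list of (probability, next_state) pairs.
T = {
    ("s0", "right"): [(1.0, "s1")],
    ("s0", "stay"): [(1.0, "s0")],
    ("s1", "right"): [(1.0, "goal")],
    ("s1", "stay"): [(1.0, "s1")],
}
R = {"s0": 0.0, "s1": 0.0, "goal": 1.0}  # assumed: reward for being in a state

def expected_utility(s, a, U):
    """Expected utility of the successor state when taking a in s."""
    return sum(p * U[s2] for p, s2 in T[(s, a)])

def value_iteration(tol=1e-9):
    """Repeat the Bellman update, reusing the stored utility of each state."""
    U = {s: 0.0 for s in STATES}
    while True:
        new_U = {}
        for s in STATES:
            best = max((expected_utility(s, a, U) for a in ACTIONS[s]), default=0.0)
            new_U[s] = R[s] + GAMMA * best
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < tol:
            return U

def greedy_policy(U):
    """Pick, in each non-terminal state, the action with the best expected utility."""
    return {s: max(ACTIONS[s], key=lambda a: expected_utility(s, a, U))
            for s in STATES if ACTIONS[s]}

U = value_iteration()
policy = greedy_policy(U)
```

The table U is the "remembered" part: the utility of each state is computed once per sweep and reused by every state that can reach it.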
How is the quality of a policy measured
- by the expected utility (cumulative reward) obtained when following it; the optimal policy is denoted by pi*
What is the goal of MDP and what role does RL play
Goal - maximize cumulative reward in the long term
RL - transitions and rewards usually not available
- how to change policy given experience
- how to explore environment
Describe Episodic vs continuing tasks in MDP (optimality/horizon)?
Episodic
Continuing tasks
What are additive rewards
What are discounted rewards
where γ is between 0 and 1 - the discount factor describes the preference of an agent for current rewards over future rewards
where γ is close to 0 - rewards in the distant future are insignificant
where γ is close to 1 - the agent is more willing to wait for long-term rewards
when γ is exactly 1 - discounted rewards reduce to the special case of purely additive rewards
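The effect of the discount factor (gamma) can be checked with a short sketch. The function name discounted_return and the reward sequence are assumptions made for illustration; the formula is the usual sum of gamma**t times the reward at step t.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical reward sequence

print(discounted_return(rewards, 1.0))  # 4.0   (gamma = 1: purely additive)
print(discounted_return(rewards, 0.5))  # 1.875 (later rewards count for less)
print(discounted_return(rewards, 0.0))  # 1.0   (only the immediate reward counts)
```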
What is the utility of the state
What is the state value function
denoted - U pi (s)
- expected return when starting in s and following pi
What is the state-action value function
denoted - Q pi (s, a) AKA Q function
- expected return when starting in s, performing a and following pi
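Written out, the two value functions could look as follows. This assumes discounted rewards with discount factor gamma and a reward R(S_t) received in each visited state; the symbols S_t, A_0 and R are notation chosen here, not taken from the notes.

```latex
U^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(S_t) \,\middle|\, S_0 = s,\ \pi\right]

Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(S_t) \,\middle|\, S_0 = s,\ A_0 = a,\ \pi\right]
```

For a deterministic policy the two are linked by U^pi(s) = Q^pi(s, pi(s)): the value of a state is the value of taking the action the policy prescribes there.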
What are value functions useful for
useful for finding the optimal policy
How does RL differ from MDP
- must try actions and states to learn
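One way to picture "trying actions to learn" is the sketch below. It uses epsilon-greedy exploration and a tabular Q-learning update, which are common choices but are not named in these notes, so treat both as assumptions. The action set, the state labels, and the alpha and gamma values are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]  # hypothetical action set
Q = defaultdict(float)       # Q[(state, action)] estimates, default 0.0

def epsilon_greedy(state, epsilon, rng):
    """With probability epsilon explore a random action, otherwise exploit Q."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward the observed reward plus the discounted best next value."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One experienced transition: in s0, action "right" gave reward 1.0 and led to s1.
q_learning_update("s0", "right", 1.0, "s1")
```

No transition model or reward function appears anywhere in the code: the agent only uses the (s, a, r, s') tuples it experiences.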