How do Markov Decision Processes (MDPs) differ from state space searches
In standard search, actions have guaranteed (deterministic) outcomes; in an MDP, actions have probabilistic outcomes (e.g. an 80% chance of reaching state x and a 20% chance of reaching state y)
Probabilistic outcomes
You don’t know for certain which state you will reach
What is the Markov Property
It is the “memoryless” property.
The future depends only on the current state and action, not the history of how you got there
What is the mathematical rule for transition probabilities in any state-action pair
For a given state s and action a, the sum of probabilities for all possible next states must equal 1
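A minimal sketch of this rule as a check in Python (the states, actions, and probabilities below are invented for illustration):

```python
# Transition model: (state, action) -> {next_state: probability}.
# All names and numbers here are made up for illustration.
transitions = {
    ("s0", "go"):   {"s1": 0.8, "s2": 0.2},
    ("s1", "stay"): {"s1": 1.0},
}

# For every state-action pair, the probabilities over all possible
# next states must sum to 1.
for (state, action), dist in transitions.items():
    total = sum(dist.values())
    assert abs(total - 1.0) < 1e-9, f"Invalid distribution for ({state}, {action})"
```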
What is the formula for Discounted Return (Gt)
Gt = r_{t+1} + γr_{t+2} + γ²r_{t+3} + … = Σ_{k=0}^{∞} γ^k r_{t+k+1}
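As a sketch, the formula can be computed directly for a finite sequence of future rewards (the reward values and γ below are made-up examples):

```python
# Discounted return G_t for a finite sequence of future rewards.
# rewards[k] corresponds to r_{t+k+1}; gamma is the discount rate.
def discounted_return(rewards, gamma):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: r_{t+1}=1, r_{t+2}=1, r_{t+3}=1 with gamma = 0.5
# G_t = 1 + 0.5*1 + 0.25*1 = 1.75
print(discounted_return([1, 1, 1], 0.5))  # -> 1.75
```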
What does the Discounted Return formula represent
It is a way to calculate the total value of all rewards an agent receives, starting from time t.
What does r(t+1), r(t+2) … represent in the Discounted Return formula
These are the individual rewards received at each future step. The first reward is not discounted because it is received immediately
What does gamma represent in the Discounted Return formula
This is the discount rate.
It is a value between 0 and 1 that determines how much we value future rewards relative to immediate ones.
What does gamma = 1 mean
Future rewards are worth just as much as current rewards
What does gamma^k mean in the Discounted Return formula
As time goes on, k increases. Since gamma is usually less than 1, gamma^k gets smaller and smaller, meaning rewards in the distant future fade away and count for less
How does a lower discount rate change an agent’s behaviour
It motivates the decision-maker to favour immediate rewards and take actions early rather than postponing them.
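A sketch of this effect, assuming a choice between a small immediate reward and a larger delayed one (the reward sequences and γ values are invented for illustration):

```python
def discounted_return(rewards, gamma):
    # rewards[k] is the reward received k+1 steps from now
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

immediate = [5, 0, 0, 0]   # take 5 now
delayed   = [0, 0, 0, 10]  # wait three steps for 10

# With a high discount rate (gamma close to 1), waiting wins:
# 0.95^3 * 10 ≈ 8.57 > 5
assert discounted_return(delayed, 0.95) > discounted_return(immediate, 0.95)
# With a low discount rate, the immediate reward wins:
# 0.5^3 * 10 = 1.25 < 5
assert discounted_return(immediate, 0.5) > discounted_return(delayed, 0.5)
```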
What is a Policy (π) in an MDP
A strategy that specifies exactly which action to take for every possible state in the process.
What is the difference between a Deterministic and a Stochastic policy
Deterministic - Selects exactly one specific action for each state
Stochastic - Assigns probabilities to different actions for each state
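A minimal sketch of both kinds of policy (the states and actions are invented for illustration):

```python
import random

# Deterministic policy: exactly one action per state.
deterministic_policy = {"s0": "left", "s1": "right"}

# Stochastic policy: a probability distribution over actions per state.
stochastic_policy = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"left": 0.1, "right": 0.9},
}

def act_deterministic(policy, state):
    return policy[state]

def act_stochastic(policy, state):
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs)[0]

print(act_deterministic(deterministic_policy, "s0"))  # -> left
```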
What 2 factors must be balanced to achieve optimal behaviour in an MDP
Risks and Rewards
How is the discount rate used in Climate Policy?
High SDR (social discount rate) - Values the present more than the future; used to argue against drastic immediate climate action.
Low SDR - Values the future almost as much as the present; used to argue for immediate action
States
A set of all possible situations
Actions
The choices available in each state
Transition probability
The likelihood of ending up in a new state given the current state and action
P(s_{t+1} | s_t, a_t)
Reward
The immediate payoff received after a transition
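The four components above can be tied together as one small MDP sketch in Python (all state names, probabilities, and reward values are invented for illustration):

```python
# A tiny MDP: states, actions, transition probabilities, and rewards.
states = ["cool", "warm", "overheated"]
actions = ["slow", "fast"]

# P(s_{t+1} | s_t, a_t): (state, action) -> {next_state: probability}
P = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "warm": 0.5},
    ("warm", "slow"): {"cool": 0.5, "warm": 0.5},
    ("warm", "fast"): {"overheated": 1.0},
}

# Immediate payoff received after taking an action in a state.
R = {("cool", "slow"): 1, ("cool", "fast"): 2,
     ("warm", "slow"): 1, ("warm", "fast"): -10}

# Each transition distribution sums to 1, as required.
for pair, dist in P.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```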