Markov Decision Process Flashcards

(19 cards)

1
Q

How do Markov Decision Processes (MDPs) differ from standard state-space search

A

In standard search, actions have guaranteed outcomes; in an MDP, actions have probabilistic outcomes (e.g. an 80% chance of reaching state x and a 20% chance of reaching state y)

2
Q

Probabilistic outcomes

A

You don’t know for certain which state you will reach

3
Q

What is the Markov Property

A

It is the “memoryless” property.

The future depends only on the current state and action, not the history of how you got there

4
Q

What is the mathematical rule for transition probabilities in any state-action pair

A

For a given state s and action a, the sum of probabilities for all possible next states must equal 1
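As an illustrative sketch (the state names and probabilities are made up, not from any particular problem), this rule can be checked in Python:

```python
# Hypothetical transition distribution for one state-action pair:
# from state s0 taking action a, the next state is probabilistic.
next_state_probs = {"s1": 0.8, "s2": 0.2}

# The probabilities over all possible next states must sum to 1.
total = sum(next_state_probs.values())
assert abs(total - 1.0) < 1e-9
```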

5
Q

What is the formula for Discounted Return (Gt)

A

Gt = r(t+1) + γ·r(t+2) + γ²·r(t+3) + … = Σ_{k=0}^{∞} γ^k · r(t+k+1)
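A minimal Python sketch of this formula for a finite reward sequence (the function name is illustrative):

```python
def discounted_return(rewards, gamma):
    """G_t = sum over k of gamma**k * r_{t+k+1}, for rewards r(t+1), r(t+2), ..."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Three rewards of 1 with gamma = 0.9: 1 + 0.9 + 0.81 = 2.71
g = discounted_return([1.0, 1.0, 1.0], 0.9)
```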

6
Q

What does the Discounted Return formula represent

A

It is a way to calculate the total value of all rewards an agent receives, starting from time t.

7
Q

What does r(t+1), r(t+2) … represent in the Discounted Return formula

A

These are the individual rewards received at each future step. The first reward is not discounted because it is received immediately

8
Q

What does gamma represent in the Discounted Return formula

A

This is the discount rate.
It is a value between 0 and 1 that determines how much we value future rewards relative to immediate ones.

9
Q

What does gamma = 1 mean

A

Future rewards are worth just as much as current rewards

10
Q

What does gamma^k mean in the Discounted Return formula

A

As time goes on, k increases. Since gamma is usually less than 1, gamma^k gets smaller and smaller, meaning rewards in the distant future fade away and count for less
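A quick numerical sketch of the decay (the gamma value is chosen arbitrarily):

```python
gamma = 0.9

# gamma**k shrinks toward zero as k grows, so distant rewards count for less.
weights = [gamma**k for k in range(50)]
assert all(weights[k] > weights[k + 1] for k in range(len(weights) - 1))
assert weights[49] < 0.01  # 0.9**49 is already tiny
```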

11
Q

How does a lower discount rate change an agent’s behaviour

A

It motivates the decision-maker to favour immediate rewards and take actions early rather than postponing them.
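A numerical sketch of this effect (the rewards and gamma values are made up): compare a reward of 1 received now against a reward of 2 received three steps later.

```python
immediate_reward = 1.0   # received now (undiscounted)
delayed_reward = 2.0     # received three steps later, so discounted by gamma**3

# With a low discount rate, the immediate reward is worth more:
low_gamma = 0.5
assert immediate_reward > delayed_reward * low_gamma**3    # 1.0 > 0.25

# With a discount rate near 1, the larger delayed reward wins:
high_gamma = 0.99
assert immediate_reward < delayed_reward * high_gamma**3   # 1.0 < ~1.94
```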

12
Q

What is a Policy (π) in an MDP

A

A strategy that specifies exactly which action to take for every possible state in the process.

13
Q

What is the difference between a Deterministic and a Stochastic policy

A

Deterministic - Selects exactly one specific action for each state

Stochastic - Assigns probabilities to different actions for each state
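The two policy types can be sketched in Python (state and action names are illustrative):

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic = {"s0": "left", "s1": "right"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic = {"s0": {"left": 0.7, "right": 0.3}}

def sample_action(policy, state):
    """Draw an action according to the stochastic policy's distribution."""
    dist = policy[state]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```

For example, `sample_action(stochastic, "s0")` returns "left" about 70% of the time, whereas `deterministic["s0"]` is always "left".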

14
Q

What 2 factors must be balanced to achieve optimal behaviour in an MDP

A

Risks and Rewards

15
Q

How is the discount rate used in Climate Policy?

A

High SDR (social discount rate) - Values the present more than the future; used to argue against drastic immediate climate action.

Low SDR - Values the future almost as much as the present; used to argue for immediate action

16
Q

States

A

A set of all possible situations

17
Q

Actions

A

The choices available in each state

18
Q

Transition probability

A

The likelihood of ending up in a new state given the current state and action

P(s_{t+1} | s_t, a_t)

19
Q

Reward

A

The immediate payoff received after a transition
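The four components above can be collected into one toy MDP sketch (all names and numbers are illustrative, not from any particular problem):

```python
# States: the set of all possible situations.
states = ["s0", "s1"]

# Actions: the choices available in each state.
actions = ["stay", "go"]

# Transition probabilities P(s' | s, a): for each (state, action) pair,
# a distribution over next states that sums to 1.
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.5, "s1": 0.5},
}

# Rewards R(s, a): the immediate payoff after each transition.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "go"):   0.0,
    ("s1", "stay"): 1.0,
    ("s1", "go"):   1.0,
}

# Sanity check: every transition distribution sums to 1.
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```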