Basic RL Flashcards

(26 cards)

1
Q

Define reinforcement learning.

A

A type of machine learning in which an agent learns to make decisions by interacting with an environment and receiving rewards or penalties for its actions.

2
Q

What is an agent in reinforcement learning?

A

An entity that makes decisions and takes actions in an environment to maximize cumulative reward.

3
Q

True or false: Exploration is the process of trying new actions in reinforcement learning.

A

TRUE

Exploration lets the agent discover strategies that may be better than the ones it currently exploits.

4
Q

Fill in the blank: The environment provides _______ to the agent.

A

states and rewards
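
The exchange described in this card can be sketched as a loop: the agent picks an action, and the environment answers with the next state and a reward. The `LineWorld` corridor and its `reset`/`step` methods below are hypothetical, chosen only to make the loop runnable; they are not any particular library's API.

```python
import random


class LineWorld:
    """Hypothetical corridor: positions 0..size-1, the episode ends at the right end."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls keep the position in range
        self.pos = max(0, min(self.size - 1, self.pos + action))
        done = self.pos == self.size - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done  # the environment provides the next state and a reward


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe the state, act, receive the next state and a reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
print(run_episode(LineWorld(), lambda state: 1))  # always move right: 1.0
print(run_episode(LineWorld(), lambda state: rng.choice([-1, 1])))  # random agent
```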

5
Q

What does reward signify in reinforcement learning?

A

A scalar feedback signal, returned by the environment after each action, that indicates how well the agent is doing with respect to its goal.

6
Q

Define policy in reinforcement learning.

A

A mapping from states to actions (or to probabilities over actions) that determines what the agent does in each state of the environment.

7
Q

What is the purpose of a value function?

A

To estimate the expected return (the cumulative, usually discounted, future reward) obtainable from a given state or state-action pair when following a particular policy.

8
Q

True or false: The discount factor determines the importance of future rewards.

A

TRUE

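A discount factor γ with 0 ≤ γ < 1 weights a reward received k steps in the future by γ^k, so immediate rewards count more than distant ones; smaller values make the agent more short-sighted.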
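
A minimal sketch of how γ weights a reward sequence into a single return; the reward list and γ values are illustrative.

```python
def discounted_return(rewards, gamma):
    """G = r_1 + gamma * r_2 + gamma**2 * r_3 + ... for a finite list of rewards."""
    g = 0.0
    # Work backwards using G_t = r_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [0.0, 0.0, 1.0]  # a single reward arriving two steps in the future
print(discounted_return(rewards, 0.5))  # 0.25
print(discounted_return(rewards, 0.9))  # about 0.81: a larger gamma discounts less
```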

9
Q

What is Q-learning?

A

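A model-free reinforcement learning algorithm that learns the value of taking each action in each state (the action-value function Q) without needing a model of the environment's dynamics.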
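
A minimal tabular Q-learning sketch, assuming a hypothetical 5-state corridor (reward 1 on reaching the right end), ε-greedy exploration, and illustrative values for `alpha`, `gamma`, and `epsilon`. The key line is the update, which moves Q(s, a) toward r + γ·max over a' of Q(s', a').

```python
import random
from collections import defaultdict

ACTIONS = (-1, 1)  # hypothetical action set: move left / move right


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward r + gamma * max_a' Q(s', a'); no bootstrapping at a terminal state."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def train(episodes=500, size=5, epsilon=0.1, seed=0):
    """Tabular Q-learning in a hypothetical corridor: states 0..size-1, reward 1 at the right end."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            best = max(Q[(s, b)] for b in ACTIONS)
            greedy = [b for b in ACTIONS if Q[(s, b)] == best]
            # epsilon-greedy behaviour policy with random tie-breaking
            a = rng.choice(ACTIONS) if rng.random() < epsilon else rng.choice(greedy)
            s_next = max(0, min(size - 1, s + a))
            done = s_next == size - 1
            r = 1.0 if done else 0.0
            q_learning_update(Q, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return Q


Q = train()
print([max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(4)])  # learned greedy action per state
```

Because the target uses the best next action rather than the action the agent actually takes, Q-learning is off-policy: it can learn the greedy policy while behaving ε-greedily.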

10
Q

Fill in the blank: In reinforcement learning, exploit means to _______.

A

use known information to maximize reward

11
Q

Define Markov Decision Process (MDP).

A

A mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of an agent.
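
An MDP is usually specified by its states, actions, transition probabilities, and rewards (often plus a discount factor). The sketch below encodes a hypothetical two-state MDP as a nested dictionary and shows the two sources of an outcome: the agent chooses the action, and the environment samples the next state. The state names, action names, and numbers are made up for illustration.

```python
import random

# Hypothetical MDP: P[state][action] -> list of (probability, next_state, reward)
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -1.0)],
    },
}


def check_mdp(mdp):
    """Each (state, action) pair must define a probability distribution over known states."""
    for state, actions in mdp.items():
        for action, outcomes in actions.items():
            if abs(sum(p for p, _, _ in outcomes) - 1.0) > 1e-9:
                raise ValueError(f"probabilities for ({state}, {action}) do not sum to 1")
            if any(s2 not in mdp for _, s2, _ in outcomes):
                raise ValueError(f"unknown next state in ({state}, {action})")


def step(mdp, state, action, rng):
    """The agent controls `action`; the environment samples the outcome at random."""
    outcomes = mdp[state][action]
    probs = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=probs)[0]
    return next_state, reward


check_mdp(P)
rng = random.Random(0)
print(step(P, "cool", "slow", rng))  # ('cool', 1.0): this transition is deterministic
print(step(P, "cool", "fast", rng))  # 'cool' or 'warm', each with probability 0.5
```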

12
Q

What is temporal difference learning?

A

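A method that updates value estimates after each step using the TD error: the difference between the current estimate and a target built from the observed reward plus the discounted estimate of the next state's value.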
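
A minimal TD(0) prediction sketch: it evaluates a fixed always-move-right policy in a hypothetical 5-state corridor (reward 1 on reaching the right end), updating each estimate toward r + γ·V(s') after every step instead of waiting for the episode to end. The corridor, `alpha`, and `gamma` are illustrative assumptions.

```python
def td0_prediction(episodes=200, size=5, alpha=0.5, gamma=0.9):
    """TD(0) evaluation of an always-move-right policy in a hypothetical corridor.

    States are 0..size-1; reaching size-1 ends the episode with reward 1.
    """
    V = [0.0] * size  # V[size - 1] is terminal and stays 0
    for _ in range(episodes):
        s = 0
        while s != size - 1:
            s_next = s + 1  # the fixed policy always moves right
            r = 1.0 if s_next == size - 1 else 0.0
            # TD error: bootstrapped target (r + gamma * V(s')) minus the current estimate
            td_error = r + gamma * V[s_next] - V[s]
            V[s] += alpha * td_error  # update now, without waiting for the episode's end
            s = s_next
    return V


print([round(v, 3) for v in td0_prediction()])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```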

13
Q

True or false: Deep reinforcement learning combines neural networks with reinforcement learning.

A

TRUE

Neural networks serve as function approximators, letting agents handle high-dimensional state spaces (such as raw image pixels) that tabular methods cannot represent.

14
Q

What does the exploration-exploitation tradeoff refer to?

A

The balance between trying new actions and using known actions to maximize rewards.
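
A common way to manage the tradeoff is ε-greedy action selection, one technique among several that the card does not prescribe. The sketch below runs it on a hypothetical three-armed bandit with unit-variance Gaussian rewards; the arm means, `epsilon`, and step count are illustrative assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon; otherwise exploit the highest current estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any arm, uniformly
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Hypothetical multi-armed bandit with unit-variance Gaussian rewards."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)  # estimated value of each arm
    n = [0] * len(true_means)    # how often each arm was pulled
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # incremental sample average
    return q, n


q, n = run_bandit([0.0, 0.0, 5.0])
print(n)  # pull counts per arm; the best arm (index 2) should dominate
```

With `epsilon=0.0` the agent never explores and can lock onto whichever arm first looks good; a small `epsilon` keeps sampling the other arms at a modest cost in reward.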

15
Q

Fill in the blank: SARSA stands for _______.

A

State-Action-Reward-State-Action
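
The name lists the five quantities in each update: SARSA moves Q(S, A) toward R + γ·Q(S', A'), where A' is the action the agent actually takes next. The single update below uses made-up states, actions, and values (and γ = 0.5 to keep the arithmetic exact) to show the difference from Q-learning's max over next actions.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """Move Q(s, a) toward r + gamma * Q(s', a'), where a' is the action actually taken next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


Q = defaultdict(float)  # hypothetical action values; unseen pairs start at 0
Q[("s1", "left")] = 2.0
Q[("s1", "right")] = 4.0

# Transition (S, A, R, S', A') = ("s0", "right", 1.0, "s1", "left"):
# the agent explored and picked "left" in s1, so SARSA bootstraps from Q(s1, left) = 2.0.
sarsa_update(Q, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.5)
print(Q[("s0", "right")])  # 0.5 * (1.0 + 0.5 * 2.0) = 1.0
```

Q-learning would instead bootstrap from the best value in s1 (4.0) and produce 1.5, which is why SARSA is called on-policy and Q-learning off-policy.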

16
Q

What is the role of a reward function?

A

To define the goals of the agent by assigning rewards to actions taken in specific states.

17
Q

Define policy gradient methods.

A

A class of algorithms that optimize a parameterized policy directly by adjusting its parameters along the gradient of the expected return.
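
A minimal sketch of one policy gradient method, REINFORCE, on a hypothetical two-armed bandit with a softmax policy (one preference parameter per arm). Each episode is a single step, so the return is just the immediate reward; the rewards, learning rate, and step count are assumptions, and practical implementations usually subtract a baseline to reduce variance.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards=(0.0, 1.0), steps=500, alpha=0.1, seed=0):
    """REINFORCE on a hypothetical bandit: each episode is one action and its reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters: one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample from the policy
        G = rewards[a]  # one-step episode, so the return is the immediate reward
        for b in range(len(theta)):
            # Gradient of log softmax: 1[b == a] - pi(b)
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[b] += alpha * G * grad_log_pi  # gradient ascent on expected return
    return softmax(theta)


print(reinforce_bandit())  # action probabilities after training
```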

18
Q

What is experience replay?

A

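A technique in which past transitions (state, action, reward, next state) are stored in a buffer and sampled at random for repeated training, improving data efficiency and reducing correlation between consecutive updates.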
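
A minimal replay buffer sketch: a fixed-capacity queue that discards the oldest transitions when full, with uniform random minibatch sampling. The `Transition` fields and the uniform sampling scheme are assumptions of this sketch; variants such as prioritized replay sample differently.

```python
import random
from collections import deque, namedtuple

# Assumed transition format for this sketch
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-capacity store of past transitions with uniform random minibatch sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # the oldest transitions drop out when full
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling at random (without replacement) mixes transitions from different
        # points in time instead of training on consecutive, correlated steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.push(t, 0, 0.0, t + 1, False)
print(len(buf), [tr.state for tr in buf.buffer])  # 3 [2, 3, 4]
batch = buf.sample(2)  # two distinct stored transitions, chosen at random
```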

19
Q

True or false: Monte Carlo methods rely on complete episodes for learning.

A

TRUE

They estimate value functions by averaging the returns observed in complete episodes, so estimates can only be updated once an episode has ended.
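
A minimal first-visit Monte Carlo prediction sketch: for each complete episode, it computes the return that followed the first visit to each state and averages those returns across episodes (every-visit MC, which averages over all visits, is a common variant). The episode format, a list of (state, reward) pairs, and the sample episodes are assumptions for illustration.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s in each episode.

    Each episode is a list of (state, reward) pairs, where reward is received after leaving state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Compute the return G_t for every time step, working backwards.
        g = 0.0
        returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.append((state, g))
        returns.reverse()  # back to chronological order
        seen = set()
        for state, g in returns:
            if state not in seen:  # only the first visit in this episode counts
                seen.add(state)
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episodes = [
    [("A", 0.0), ("B", 1.0)],              # return from A is 1
    [("A", 0.0), ("A", 0.0), ("B", 0.0)],  # return from A is 0
]
print(first_visit_mc(episodes))  # {'A': 0.5, 'B': 0.5}
```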

20
Q

What is a state in reinforcement learning?

A

A representation of the current situation of the agent within the environment.

21
Q

Fill in the blank: Transfer learning in reinforcement learning refers to _______.

A

applying knowledge from one task to improve learning in another task.

22
Q

What is multi-agent reinforcement learning?

A

A scenario where multiple agents interact and learn in the same environment, influencing each other’s learning.

23
Q

Define reward shaping.

A

The process of modifying the reward function, typically by adding extra reward signals, to make learning faster or more efficient.
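
One well-known form is potential-based reward shaping, which adds F(s, s') = γ·Φ(s') − Φ(s) for some potential function Φ over states; this specific form leaves the optimal policy unchanged, whereas arbitrary reward changes can alter what the agent learns. The corridor potential `phi(s) = s + 1` below is a hypothetical choice, and setting the terminal state's potential to 0 is an assumption of this sketch.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.9, done=False):
    """Potential-based shaping: r + gamma * phi(s') - phi(s), with phi(terminal) taken as 0."""
    phi_next = 0.0 if done else phi(s_next)
    return r + gamma * phi_next - phi(s)


def phi(s):
    # Hypothetical potential for a corridor whose goal is at the right end: higher is closer
    return float(s + 1)


print(shaped_reward(0.0, 1, 2, phi))  # positive bonus for moving toward the goal
print(shaped_reward(0.0, 2, 1, phi))  # negative for moving away
```

Summed with discounting over any complete episode, the shaping terms telescope to −Φ(s_0) regardless of the path taken, which is the intuition for why this form does not change which policy is optimal.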

24
Q

What is the Bellman equation?

A

A recursive equation that relates the value of a state to the values of its successor states.
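
Written out, assuming the common notation π(a | s) for the policy, p(s', r | s, a) for the probability of reaching state s' with reward r after taking action a in state s, and γ for the discount factor, the first line is the Bellman equation for the value of a policy and the second is the Bellman optimality equation for the optimal value function V*:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \, \bigl[ r + \gamma \, V^{\pi}(s') \bigr]

V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \, \bigl[ r + \gamma \, V^{*}(s') \bigr]
```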

25
Q

True or false: Overfitting can occur in reinforcement learning.

A

TRUE

It happens when an agent fits its limited training experience too closely and fails to generalize to new states or environments.

26
Q

Fill in the blank: In reinforcement learning, bootstrapping involves _______.

A

updating value estimates based on other estimates rather than on complete returns.