Solution Methods Flashcards

Question

Define **curriculum learning**.

Answer 1

A training strategy where tasks are presented in increasing difficulty to improve learning efficiency.

Answer 2

Learning an agent's goals by observing its behavior and inferring the reward function.

Answer 3

TRUE

Footnote: Algorithms must handle infinite action choices, requiring specialized techniques.

Answer 4

The set of all possible states in which an agent can find itself.

Answer 5

A situation where rewards are infrequent, making learning more challenging.

Answer 6

Using existing value estimates to update other value estimates.
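Example (a minimal sketch, not tied to any particular library): the bootstrapping idea in this answer can be illustrated with a tabular TD(0) update, where the estimate for one state is nudged toward a target built from the estimate of the next state. The function name `td0_update`, the state labels, and the step size `alpha` and discount `gamma` values are assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Move values[state] toward the bootstrapped target r + gamma * V(next_state)."""
    # A terminal next state contributes no future value.
    bootstrap = 0.0 if terminal else values[next_state]
    target = reward + gamma * bootstrap
    values[state] += alpha * (target - values[state])
    return values[state]


# Hypothetical two-state example: V(B) is already estimated, V(A) is not.
values = {"A": 0.0, "B": 1.0}
td0_update(values, "A", reward=0.0, next_state="B")
```

The update uses the existing estimate for `B` rather than waiting for the episode to finish, which is what makes it a bootstrapped update.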

Answer 7

TRUE

Footnote: This can lead to unintended behaviors in agents.

Answer 8

A neural network that outputs action probabilities for given states.

Answer 9

selecting actions

Answer 10

The expected future reward of taking a specific action in a given state.

Answer 11

Balances exploration and exploitation by randomly choosing actions with a small probability.
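Example (a minimal sketch, assuming action values are held in a plain list indexed by action): with probability `epsilon` a uniformly random action is chosen, otherwise the action with the highest current estimate. The function name `epsilon_greedy` and the `rng` parameter are illustrative choices, not part of any specific library.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Return an action index: random with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
```

Setting `epsilon` to 0 gives a purely greedy choice, and setting it to 1 gives purely random exploration.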

Answer 12

TRUE

Footnote: This introduces randomness in decision-making.

Answer 13

The process of determining the value function for a given policy.

Answer 14

reinforcement learning

Answer 15

A function that defines the rewards received after taking actions in states.

Answer 16

The set of all possible actions an agent can take in a given state.

Answer 17

TRUE

Footnote: This contrasts with stochastic policies.

Answer 18

Encoding the state information in a format suitable for learning algorithms.

Answer 19

estimates of future rewards

Answer 20

A method for deciding how to explore the action space during learning.

Answer 21

The practice of reducing the value of future rewards to prioritize immediate rewards.

Answer 22

TRUE

Footnote: This is useful for learning from historical data.

Answer 23

The process of improving a policy to maximize expected rewards.

Answer 24

reinforcement learning

Answer 25

A type of machine learning where agents learn by interacting with an environment to maximize rewards.

Answer 26

To provide feedback to the agent about the quality of its actions in reinforcement learning.

Answer 27

TRUE

Footnote: Exploration helps agents discover new strategies and improve performance.

Answer 28

value-based

Answer 29

A strategy that defines the agent's actions based on its state.

Answer 30

A mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision maker.

Answer 31

A value between 0 and 1 that determines the importance of future rewards.
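Example (a minimal sketch): the effect of this value, usually written as gamma, can be seen by computing the discounted return of a short reward sequence. The function name `discounted_return` and the sample rewards are assumptions for illustration.

```python
def discounted_return(rewards, gamma):
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite reward list."""
    g = 0.0
    # Work backwards so each step is a single multiply and add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0]
myopic = discounted_return(rewards, gamma=0.0)      # only the first reward counts
balanced = discounted_return(rewards, gamma=0.5)    # later rewards are down-weighted
farsighted = discounted_return(rewards, gamma=1.0)  # all rewards count equally
```

A value near 0 makes the agent focus on immediate rewards, while a value near 1 makes distant rewards matter almost as much as immediate ones.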

Answer 32

TRUE

Footnote: It updates estimates based on other learned estimates without waiting for a final outcome.

Answer 33

To estimate how good it is for an agent to be in a given state.

Answer 34

State-Action-Reward-State-Action
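Example (a minimal sketch, assuming a tabular setting): the five items in the name are exactly the quantities used in one update, the current state and action, the reward, and the next state and action. The function name `sarsa_update`, the state and action labels, and the `alpha` and `gamma` values below are hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Update Q(s, a) toward r + gamma * Q(s_next, a_next)."""
    # The target uses the action actually chosen in the next state (on-policy).
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


q = defaultdict(float)          # unseen (state, action) pairs default to 0.0
q[("s1", "right")] = 2.0
sarsa_update(q, "s0", "right", 1.0, "s1", "right", alpha=0.5, gamma=0.5)
```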

Answer 35

To choose the best-known action based on current knowledge.

Answer 36

A type of reinforcement learning that optimizes the policy directly.

Answer 37

TRUE

Footnote: This approach allows handling high-dimensional state spaces.

Answer 38

Stores past experiences to improve learning efficiency and stability.
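Example (a minimal sketch using only the Python standard library): a fixed-capacity buffer that keeps the most recent transitions and returns uniformly sampled mini-batches. The class name `ReplayBuffer`, its method names, and the transition layout are assumptions for illustration, not the interface of any particular framework.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(step, 0, 1.0, step + 1, False)
batch = buffer.sample(2)
```

Sampling at random from stored experience, instead of learning only from the latest transition, is what gives the reuse and decorrelation benefits the answer refers to.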

Answer 39

The balance between exploring new actions and exploiting known rewarding actions.

Answer 40

A recursive equation that relates the value of a state to the values of its successor states.
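Worked form (a sketch in standard notation, with symbols that are not defined elsewhere in this deck): for a policy $\pi$, discount factor $\gamma$, and transition dynamics $p(s', r \mid s, a)$, the recursion for the state-value function can be written as

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, V^{\pi}(s') \,\bigr]
$$

The value of state $s$ on the left is expressed through the values of the successor states $s'$ on the right, which is the recursive relationship the answer describes.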

Answer 41

TRUE

Footnote: These methods average returns from complete episodes for learning.
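Example (a minimal sketch of first-visit Monte Carlo prediction, one common variant of averaging episode returns): each episode is assumed to be a list of `(state, reward)` pairs, where the reward is the one received after leaving that state. The function name `mc_first_visit` and the toy episodes are hypothetical.

```python
from collections import defaultdict


def mc_first_visit(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns over full episodes."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        first_return = {}
        # Walk backwards to accumulate the return from each time step.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            # Earlier visits overwrite later ones, so the first visit wins.
            first_return[state] = g
        for state, g_s in first_return.items():
            returns[state].append(g_s)
    return {s: sum(v) / len(v) for s, v in returns.items()}


episodes = [
    [("A", 1.0), ("B", 1.0)],
    [("A", 0.0), ("B", 3.0)],
]
estimates = mc_first_visit(episodes)
```

No value is updated until an episode has ended, which is the main practical difference from bootstrapped methods.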

Answer 42

Modifying the reward function to make learning easier and faster.
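Example (a minimal sketch of one well-known form, potential-based shaping): a bonus of `gamma * potential(next_state) - potential(state)` is added to the environment reward. The function names, the one-dimensional position state, and the distance-to-goal potential are assumptions chosen for illustration.

```python
def potential(position, goal=10):
    """Hypothetical potential: higher (less negative) when closer to the goal."""
    return -abs(goal - position)


def shaped_reward(reward, state, next_state, potential_fn, gamma=0.99):
    """Add the potential-based shaping term to the original reward."""
    return reward + gamma * potential_fn(next_state) - potential_fn(state)


toward_goal = shaped_reward(0.0, 3, 4, potential, gamma=1.0)   # step closer
away_from_goal = shaped_reward(0.0, 3, 2, potential, gamma=1.0)  # step away
```

Here a step toward the goal earns a positive bonus and a step away earns a penalty, even though the original reward is zero in both cases.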

Answer 43

simulation

Answer 44

An algorithm that iteratively improves the policy based on value function updates.

Answer 45

An algorithm that computes the optimal policy by iteratively updating value estimates.
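Example (a minimal sketch on a made-up two-state problem): value estimates are swept repeatedly with a max over actions until they stop changing, and a greedy policy is then read off. The function name `value_iteration`, the `transitions` layout (`transitions[state][action]` is a list of `(probability, next_state, reward)` tuples), and the toy problem are all assumptions for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return (values, greedy_policy) for a small tabular problem."""

    def action_value(outcomes, values):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: value stays at 0
                continue
            best = max(action_value(o, values) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: action_value(actions[a], values))
        for s, actions in transitions.items() if actions
    }
    return values, policy


# Hypothetical problem: "safe" pays 1 for sure, "risky" pays 4 half the time.
transitions = {
    "start": {
        "safe": [(1.0, "end", 1.0)],
        "risky": [(0.5, "end", 4.0), (0.5, "end", 0.0)],
    },
    "end": {},
}
values, policy = value_iteration(transitions)
```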

Answer 46

TRUE

Footnote: This approach simplifies complex problems by structuring them hierarchically.

Answer 47

Applying knowledge gained in one task to improve learning in a different but related task.

(99 cards)