Define reinforcement learning.
A type of machine learning in which an agent learns to make decisions by receiving rewards or penalties for its actions.
What is an agent in reinforcement learning?
An entity that makes decisions and takes actions in an environment to maximize cumulative reward.
True or false: Exploration is the process of trying new actions in reinforcement learning.
TRUE
Exploration helps agents discover better strategies than exploiting known actions.
Fill in the blank: The environment provides _______ to the agent.
states and rewards
What does reward signify in reinforcement learning?
A feedback signal that indicates the success of an agent’s action in achieving its goal.
Define policy in reinforcement learning.
A strategy that defines the agent’s actions based on the current state of the environment.
What is the purpose of value function?
To estimate the expected return or future rewards from a given state or state-action pair.
True or false: Discount factor determines the importance of future rewards.
TRUE
A discount factor below 1 weights immediate rewards more heavily than distant ones; values closer to 0 make the agent more short-sighted.
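As a minimal sketch of how the discount factor works (the reward sequence and gamma = 0.9 here are arbitrary, for illustration):

```python
# Discounted return: G = r0 + gamma*r1 + gamma^2*r2 + ...
rewards = [1.0, 1.0, 1.0]
gamma = 0.9  # discount factor (assumption for this example)
G = sum(gamma**t * r for t, r in enumerate(rewards))
print(round(G, 2))  # 1 + 0.9 + 0.81 = 2.71
```

With gamma = 1 the same sequence would sum to 3.0; lowering gamma shrinks the contribution of later rewards.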
What is Q-learning?
A model-free reinforcement learning algorithm that learns the value of taking each action in each state.
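A minimal tabular sketch of the Q-learning update rule; the states and actions ("s0", "right", etc.) are hypothetical:

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s][a] toward r + gamma * max_a' Q[s'][a']."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Tiny two-state example.
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q_update(Q, "s0", "right", r=0.0, s_next="s1")
print(Q["s0"]["right"])  # 0.5 * (0 + 0.9 * 1.0 - 0) = 0.45
```

Note the `max` in the target: Q-learning evaluates the greedy next action regardless of what the agent actually does next, which is what makes it off-policy.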
Fill in the blank: In reinforcement learning, exploit means to _______.
use known information to maximize reward
Define Markov Decision Process (MDP).
A mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of an agent.
What is temporal difference learning?
A method that updates value estimates based on the difference between predicted and actual rewards over time.
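A sketch of the simplest case, TD(0), on state values; the states "A" and "B" are made up for illustration:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): nudge V[s] toward the bootstrapped target r + gamma * V[s_next].
    The TD error is the gap between that target and the current estimate."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = {"A": 0.0, "B": 2.0}
err = td0_update(V, "A", r=1.0, s_next="B")
print(round(err, 2), round(V["A"], 2))  # TD error 2.8, V[A] moves to 0.28
```

Unlike Monte Carlo methods, this update happens after a single step, using the current estimate of the next state's value rather than waiting for the episode to finish.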
True or false: Deep reinforcement learning combines neural networks with reinforcement learning.
TRUE
It allows agents to handle high-dimensional state spaces effectively.
What does exploration-exploitation tradeoff refer to?
The balance between trying new actions and using known actions to maximize rewards.
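One common way to manage this tradeoff is epsilon-greedy action selection; a minimal sketch (the Q-values below are arbitrary):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is always the greedy one.
print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0))  # 1
```

Epsilon is often decayed over training so the agent explores early and exploits later.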
Fill in the blank: SARSA stands for _______.
State-Action-Reward-State-Action
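The name spells out the tuple used in each update. A minimal sketch, with hypothetical states and actions, showing how SARSA differs from Q-learning:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """SARSA is on-policy: the target uses the action actually taken next (a_next),
    not the max over next actions as in Q-learning."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

Q = {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 1.0, "R": 0.0}}
sarsa_update(Q, "s0", "R", r=0.0, s_next="s1", a_next="R")
print(Q["s0"]["R"])  # target uses Q["s1"]["R"] = 0, so the estimate stays 0.0
```

Here Q-learning would have used max(Q["s1"].values()) = 1.0 in the target and produced a nonzero update.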
What is the role of a reward function?
To define the goals of the agent by assigning rewards to actions taken in specific states.
Define policy gradient methods.
A class of algorithms that optimize the policy directly by adjusting its parameters based on gradients.
What is experience replay?
A technique where past experiences are stored and reused to improve learning efficiency.
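A minimal replay buffer sketch (the class name and capacity are illustrative, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (s, a, r, s_next) transitions, sampled uniformly.
    The deque's maxlen evicts the oldest transitions once capacity is reached."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.push((t, "a", 0.0, t + 1))
batch = buf.sample(4)
print(len(batch))  # 4
```

Sampling uniformly from stored transitions breaks the correlation between consecutive experiences, which stabilizes training of value-function approximators.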
True or false: Monte Carlo methods rely on complete episodes for learning.
TRUE
They estimate value functions based on the average returns of complete episodes.
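A first-visit Monte Carlo sketch: each episode is a list of (state, reward) pairs (hypothetical data), and a state's value is the average of the full returns observed from its first occurrence:

```python
def mc_value_estimate(episodes, gamma=1.0):
    """First-visit Monte Carlo value estimation over complete episodes."""
    returns = {}
    for episode in episodes:
        # Record the first time step each state appears.
        seen = {}
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen[s] = t
        # Accumulate returns backwards from the end of the episode.
        G_from = [0.0] * (len(episode) + 1)
        for t in range(len(episode) - 1, -1, -1):
            G_from[t] = episode[t][1] + gamma * G_from[t + 1]
        for s, t in seen.items():
            returns.setdefault(s, []).append(G_from[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Two hypothetical episodes starting in state "A".
eps = [[("A", 1.0), ("B", 2.0)], [("A", 0.0), ("B", 1.0)]]
print(mc_value_estimate(eps)["A"])  # returns 3.0 and 1.0, average 2.0
```

Because the full return is needed, nothing can be updated until the episode terminates, in contrast to temporal difference methods.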
What is a state in reinforcement learning?
A representation of the current situation of the agent within the environment.
Fill in the blank: Transfer learning in reinforcement learning refers to _______.
applying knowledge from one task to improve learning in another task
What is multi-agent reinforcement learning?
A scenario where multiple agents interact and learn in the same environment, influencing each other’s learning.
Define reward shaping.
The process of modifying the reward function to make learning more efficient or faster.
What is the Bellman equation?
A recursive equation that relates the value of a state to the values of its successor states.
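A sketch of one Bellman optimality backup on a toy tabular MDP; the two-state MDP below is invented for illustration:

```python
def bellman_backup(V, P, R, gamma=0.9):
    """One sweep of the Bellman optimality backup:
    V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') ].
    P[s][a] is a list of (prob, s_next); R[s][a] is the immediate reward."""
    return {
        s: max(R[s][a] + gamma * sum(p * V[sn] for p, sn in P[s][a])
               for a in P[s])
        for s in P
    }

# Toy MDP: from s0, "go" reaches s1 with reward 1; s1 is absorbing with reward 0.
P = {"s0": {"go": [(1.0, "s1")], "stay": [(1.0, "s0")]},
     "s1": {"stay": [(1.0, "s1")]}}
R = {"s0": {"go": 1.0, "stay": 0.0}, "s1": {"stay": 0.0}}
V = {"s0": 0.0, "s1": 0.0}
V = bellman_backup(V, P, R)
print(V["s0"])  # max(1.0 + 0.9*0, 0.0 + 0.9*0) = 1.0
```

Repeating this backup until the values stop changing is value iteration, which converges to the optimal value function for gamma < 1.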