Define reinforcement learning.
A type of machine learning in which an agent learns to make decisions by receiving rewards or penalties for its actions.
What is an agent in reinforcement learning?
An entity that makes decisions and takes actions in an environment to maximize cumulative reward.
True or false: Exploration is the process of trying new actions in reinforcement learning.
TRUE
Exploration helps agents discover better strategies than exploiting known actions.
Fill in the blank: The environment provides _______ to the agent.
states and rewards
What does reward signify in reinforcement learning?
A feedback signal that indicates the success of an agent’s action in achieving its goal.
Define policy in reinforcement learning.
A strategy that defines the agent’s actions based on the current state of the environment.
What is the purpose of value function?
To estimate the expected return or future rewards from a given state or state-action pair.
True or false: Discount factor determines the importance of future rewards.
TRUE
A discount factor below 1 weights immediate rewards more heavily than distant ones; values closer to 0 make the agent more short-sighted.
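As a minimal sketch of how the discount factor works (the reward sequence and gamma = 0.9 here are arbitrary, for illustration):

```python
# Discounted return: G = r0 + gamma*r1 + gamma^2*r2 + ...
rewards = [1.0, 1.0, 1.0]
gamma = 0.9  # discount factor (assumption for this example)
G = sum(gamma**t * r for t, r in enumerate(rewards))
print(round(G, 2))  # 1 + 0.9 + 0.81 = 2.71
```

With gamma = 1 the same sequence would sum to 3.0; lowering gamma shrinks the contribution of later rewards.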
What is Q-learning?
A model-free reinforcement learning algorithm that learns the value of taking each action in each state.
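A minimal tabular sketch of the Q-learning update rule; the states and actions ("s0", "right", etc.) are hypothetical:

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s][a] toward r + gamma * max_a' Q[s'][a']."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Tiny two-state example.
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q_update(Q, "s0", "right", r=0.0, s_next="s1")
print(Q["s0"]["right"])  # 0.5 * (0 + 0.9 * 1.0 - 0) = 0.45
```

Note the `max` in the target: Q-learning evaluates the greedy next action regardless of what the agent actually does next, which is what makes it off-policy.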
Fill in the blank: In reinforcement learning, exploit means to _______.
use known information to maximize reward
Define Markov Decision Process (MDP).
A mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of an agent.
What is temporal difference learning?
A method that updates value estimates based on the difference between predicted and actual rewards over time.
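A sketch of the simplest case, TD(0), on state values; the states "A" and "B" are made up for illustration:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): nudge V[s] toward the bootstrapped target r + gamma * V[s_next].
    The TD error is the gap between that target and the current estimate."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = {"A": 0.0, "B": 2.0}
err = td0_update(V, "A", r=1.0, s_next="B")
print(round(err, 2), round(V["A"], 2))  # TD error 2.8, V[A] moves to 0.28
```

Unlike Monte Carlo methods, this update happens after a single step, using the current estimate of the next state's value rather than waiting for the episode to finish.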
True or false: Deep reinforcement learning combines neural networks with reinforcement learning.
TRUE
It allows agents to handle high-dimensional state spaces effectively.
What does exploration-exploitation tradeoff refer to?
The balance between trying new actions and using known actions to maximize rewards.
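One common way to manage this tradeoff is epsilon-greedy action selection; a minimal sketch (the Q-values below are arbitrary):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is always the greedy one.
print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0))  # 1
```

Epsilon is often decayed over training so the agent explores early and exploits later.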
Fill in the blank: SARSA stands for _______.
State-Action-Reward-State-Action
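The name spells out the tuple used in each update. A minimal sketch, with hypothetical states and actions, showing how SARSA differs from Q-learning:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """SARSA is on-policy: the target uses the action actually taken next (a_next),
    not the max over next actions as in Q-learning."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

Q = {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 1.0, "R": 0.0}}
sarsa_update(Q, "s0", "R", r=0.0, s_next="s1", a_next="R")
print(Q["s0"]["R"])  # target uses Q["s1"]["R"] = 0, so the estimate stays 0.0
```

Here Q-learning would have used max(Q["s1"].values()) = 1.0 in the target and produced a nonzero update.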
What is the role of a reward function?
To define the goals of the agent by assigning rewards to actions taken in specific states.
Define policy gradient methods.
A class of algorithms that optimize the policy directly by adjusting its parameters based on gradients.
What is experience replay?
A technique where past experiences are stored and reused to improve learning efficiency.
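A minimal replay buffer sketch (the class name and capacity are illustrative, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (s, a, r, s_next) transitions, sampled uniformly.
    The deque's maxlen evicts the oldest transitions once capacity is reached."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.push((t, "a", 0.0, t + 1))
batch = buf.sample(4)
print(len(batch))  # 4
```

Sampling uniformly from stored transitions breaks the correlation between consecutive experiences, which stabilizes training of value-function approximators.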
True or false: Monte Carlo methods rely on complete episodes for learning.
TRUE
They estimate value functions based on the average returns of complete episodes.
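A first-visit Monte Carlo sketch: each episode is a list of (state, reward) pairs (hypothetical data), and a state's value is the average of the full returns observed from its first occurrence:

```python
def mc_value_estimate(episodes, gamma=1.0):
    """First-visit Monte Carlo value estimation over complete episodes."""
    returns = {}
    for episode in episodes:
        # Record the first time step each state appears.
        seen = {}
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen[s] = t
        # Accumulate returns backwards from the end of the episode.
        G_from = [0.0] * (len(episode) + 1)
        for t in range(len(episode) - 1, -1, -1):
            G_from[t] = episode[t][1] + gamma * G_from[t + 1]
        for s, t in seen.items():
            returns.setdefault(s, []).append(G_from[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Two hypothetical episodes starting in state "A".
eps = [[("A", 1.0), ("B", 2.0)], [("A", 0.0), ("B", 1.0)]]
print(mc_value_estimate(eps)["A"])  # returns 3.0 and 1.0, average 2.0
```

Because the full return is needed, nothing can be updated until the episode terminates, in contrast to temporal difference methods.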
What is a state in reinforcement learning?
A representation of the current situation of the agent within the environment.
Fill in the blank: Transfer learning in reinforcement learning refers to _______.
applying knowledge from one task to improve learning in another task
What is multi-agent reinforcement learning?
A scenario where multiple agents interact and learn in the same environment, influencing each other’s learning.
Define reward shaping.
The process of modifying the reward function to make learning more efficient or faster.
What is the Bellman equation?
A recursive equation that relates the value of a state to the values of its successor states.
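A sketch of one Bellman optimality backup on a toy tabular MDP; the two-state MDP below is invented for illustration:

```python
def bellman_backup(V, P, R, gamma=0.9):
    """One sweep of the Bellman optimality backup:
    V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') ].
    P[s][a] is a list of (prob, s_next); R[s][a] is the immediate reward."""
    return {
        s: max(R[s][a] + gamma * sum(p * V[sn] for p, sn in P[s][a])
               for a in P[s])
        for s in P
    }

# Toy MDP: from s0, "go" reaches s1 with reward 1; s1 is absorbing with reward 0.
P = {"s0": {"go": [(1.0, "s1")], "stay": [(1.0, "s0")]},
     "s1": {"stay": [(1.0, "s1")]}}
R = {"s0": {"go": 1.0, "stay": 0.0}, "s1": {"stay": 0.0}}
V = {"s0": 0.0, "s1": 0.0}
V = bellman_backup(V, P, R)
print(V["s0"])  # max(1.0 + 0.9*0, 0.0 + 0.9*0) = 1.0
```

Repeating this backup until the values stop changing is value iteration, which converges to the optimal value function for gamma < 1.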