What does generalization mean in reinforcement learning?
The ability of an RL agent to achieve good performance either with limited collected data or in a related but different environment.
Why is generalization particularly challenging in real-world RL problems?
The agent may not be able to interact with the true environment, only a simulation of it (the reality gap).
The agent has access to limited data due to safety constraints (robotics, medical trials), computational cost, or limited exogenous data (weather conditions, trading markets).
How does generalization in RL differ from supervised learning?
In RL, data is generated by the agent’s policy and affects future data, whereas supervised learning assumes independent identically distributed samples.
What is bias in the context of learning algorithms?
The error introduced by limitations in the model or learning algorithm, even with infinite data.
What is overfitting in learning from limited data?
Poor performance on unseen data due to excessive sensitivity to the specific training dataset.
What tradeoff governs learning from limited data in supervised learning?
The bias–variance tradeoff: error introduced by the learning algorithm's assumptions (bias) versus error due to the limited data available (variance).
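The tradeoff above is usually stated via the classical decomposition of expected squared error (a standard supervised-learning result, included here for reference):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Here $f$ is the true function, $\hat{f}$ the learned predictor, and $\sigma^2$ irreducible noise; expectation is over training sets.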
Why is there no strict bias–variance decomposition in reinforcement learning?
Because RL objectives do not generally rely on an L2 (squared-error) loss, and the sequential decision-making setting breaks the assumptions behind the decomposition.
How is policy suboptimality decomposed in batch reinforcement learning?
Into asymptotic bias (difference from optimal policy) and error due to finite dataset size (overfitting).
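This decomposition can be written as follows (notation is illustrative: $\pi_D$ is the policy learned from a finite dataset $D$, $\pi_\infty$ the policy the same algorithm would learn with infinite data, and $\pi^*$ the optimal policy):

```latex
\underbrace{V^{\pi^*} - V^{\pi_D}}_{\text{suboptimality}}
  = \underbrace{\big(V^{\pi^*} - V^{\pi_\infty}\big)}_{\text{asymptotic bias}}
  + \underbrace{\big(V^{\pi_\infty} - V^{\pi_D}\big)}_{\text{overfitting (finite data)}}
```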
What is asymptotic bias in reinforcement learning?
The difference in performance between the optimal policy and the policy learned with infinite data.
What causes overfitting in reinforcement learning?
Learning from a limited dataset that does not sufficiently cover the state–action space.
What is the key tradeoff when selecting a policy class in RL?
Between expressiveness to reduce bias and simplicity to avoid overfitting.
Name three elements that can improve generalization in reinforcement learning.
Abstract state representations (discarding non-essential features)
Modifying the objective function (reward shaping, tuning the training discount factor)
Choosing appropriate learning algorithms (model-based/model-free) and function approximators
How does abstract state representation improve generalization?
By discarding non-essential features and reducing sensitivity to spurious correlations, which reduces overfitting.
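A minimal sketch of state abstraction as feature projection (the feature names and observation format are hypothetical, purely for illustration):

```python
def abstract_state(raw_state, keep=("position", "velocity")):
    """Project a raw observation dict onto task-relevant features,
    discarding the rest (e.g. background details, timestamps).
    Coarser `keep` sets reduce overfitting but risk merging states
    with different dynamics (asymptotic bias)."""
    return tuple(raw_state[k] for k in keep)

raw = {"position": 3, "velocity": -1, "wall_color": "red", "timestamp": 17}
print(abstract_state(raw))  # (3, -1)
```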
What is the risk of using too many features in state representations?
Overfitting due to reliance on spurious correlations.
What bias is introduced by overly coarse abstractions?
They may merge states with different dynamics, reducing policy optimality.
What is reward shaping?
Modifying the reward function away from the actual objective to facilitate learning, at the cost of introducing bias.
How can tuning the discount factor affect generalization?
It biases learning toward short-term or long-term rewards, which can stabilize training.
Why can modifying the training objective help generalization?
Because a biased but smoother objective may be easier to learn from limited data.
How does the choice of learning algorithm affect generalization?
Different algorithms balance bias and overfitting differently depending on structure and assumptions.
Each algorithm relies on different components: a value function over state–action pairs, a representation of the policy, and/or a model of the environment together with a planning algorithm.
Why do function approximators play a key role in generalization?
They determine how features are combined into higher-level abstractions (e.g., via convolutions or attention mechanisms); depending on the task, either model-free or model-based approaches may generalize best.
What is the parallel between model-free vs model-based RL and human cognition?
Model-free RL resembles fast, intuitive reasoning, while model-based RL resembles slow, deliberate reasoning.
What is transfer learning in reinforcement learning?
Using knowledge learned in one environment to improve learning in a related environment.
Why is transfer learning important in deep RL for real-world tasks?
Because training from scratch is expensive and often infeasible in real-world tasks.
What is undirected exploration?
Exploration strategies such as ε-greedy that do not use uncertainty estimates.
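A minimal sketch of ε-greedy action selection, the canonical undirected strategy (Q-values and action count are made up for illustration):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Undirected exploration: with probability epsilon pick a uniformly
    random action, otherwise the greedy one. Note it uses no estimate of
    uncertainty about the Q-values themselves."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 2nd action (index 1) is greedy
```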