What is the main objective of a deep reinforcement learning agent?
to learn a sequential decision-making task from experience in an environment to achieve specific goals
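This objective can be made concrete with a minimal sketch of the agent-environment loop; the tiny `LineWorld` environment and the "always move right" policy below are illustrative assumptions, not from the source:

```python
import random

class LineWorld:
    """A tiny 1-D environment: the agent starts at 0 and must reach position 3.

    Transitions are stochastic: the chosen move succeeds with probability 0.9,
    otherwise the agent slips in the opposite direction.
    """

    def __init__(self, goal=3, seed=0):
        self.goal = goal
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # the initial observation

    def step(self, action):  # action is -1 (left) or +1 (right)
        move = action if self.rng.random() < 0.9 else -action
        self.pos = max(0, min(self.goal, self.pos + move))
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

# Sequential decision making: the agent acts, observes, and accumulates reward.
env = LineWorld()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = +1  # a trivial "always move right" policy
    obs, reward, done = env.step(action)
    total_reward += reward
```

The loop of acting, observing, and collecting reward is the experience the agent learns from.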
How are transitions between states modeled in reinforcement learning?
They are usually stochastic: taking the same action in the same state can lead to different next states.
What does an agent's experience consist of in reinforcement learning?
the sequence of observations (ω) and actions (a) gathered while interacting with the environment
What is the ‘reality gap’ in reinforcement learning, and why is it a challenge?
The mismatch between the (often simulated) environment an agent is trained in and the real environment it is deployed in; a policy that performs well during training may fail after deployment.
Why might an agent have limited access to data in reinforcement learning?
Because interacting with the environment can be costly, slow, or unsafe (e.g., a physical robot), the agent may not be able to gather experience for every possible situation.
How can the reality gap and limited data challenges be addressed in reinforcement learning?
By learning policies that generalize well from limited experience, and by transferring knowledge from a source environment (e.g., a simulator) to the target environment.
What does generalization refer to in reinforcement learning?
The capacity of the learned policy to perform well in situations (states or environments) not encountered during training.
How can an agent achieve generalization with limited data?
Through careful choice of the state representation and function approximator, of the objective function (e.g., the discount factor), and of the learning algorithm (e.g., regularization and hyperparameter tuning).
What is transfer learning in reinforcement learning, and why is it important?
Reusing knowledge acquired on a source task to speed up or improve learning on a related target task; it is important because it reduces the amount of (possibly expensive) data needed on the target task, e.g., when transferring from simulation to the real world.
What are common methods for transfer learning in reinforcement learning?
Pre-training on the source task and fine-tuning on the target task, randomizing the training environments (domain randomization), and learning representations that are shared or invariant across tasks.
What is a supervised learning algorithm?
An algorithm that learns a mapping from inputs to outputs from a dataset of labeled input–output pairs.
What are bias and variance in supervised learning?
Bias is the error between the average model prediction and the true target (too-simple models underfit); variance is the sensitivity of the learned model to the particular training set drawn (too-flexible models overfit).
What is the ideal model in terms of bias and variance?
One with both low bias and low variance, i.e., flexible enough to capture the true function yet stable across training sets.
How can variance (overfitting) be reduced in supervised learning?
Increasing the size of the dataset can help reduce variance by improving the model’s generalization to new data; regularization and reducing model capacity also reduce variance.
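The effect of dataset size on variance can be checked numerically; in the sketch below (the sine target, noise level, and polynomial degree are arbitrary illustrative choices), the same model class is refit on many resampled training sets and the spread of its predictions at a fixed test point is measured:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)

def fit_and_predict(n_samples, x_test=1.5, degree=3):
    """Fit a degree-3 polynomial to n noisy samples, predict at x_test."""
    x = rng.uniform(-3, 3, n_samples)
    y = true_f(x) + rng.normal(0.0, 0.3, n_samples)
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x_test)

def prediction_variance(n_samples, trials=300):
    """Variance of the prediction across many independently drawn training sets."""
    preds = [fit_and_predict(n_samples) for _ in range(trials)]
    return np.var(preds)

var_small = prediction_variance(15)    # few training points: high variance
var_large = prediction_variance(500)   # many training points: low variance
```

With 500 samples the prediction varies far less across training sets than with 15, which is exactly the variance reduction the answer describes.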
What is bias-variance decomposition?
Bias-variance decomposition describes how the expected (L2) prediction error of a model can be broken down into three terms: the squared bias, the variance, and an irreducible error due to noise in the data.
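These terms (squared bias, variance, and irreducible noise) can be estimated empirically by refitting a model on many fresh datasets; a minimal sketch, where the sine target, noise level, test point, and polynomial degrees are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
NOISE_STD = 0.3   # known label-noise level of this synthetic problem
X_TEST = 2.0      # fixed input at which the error is decomposed

def sample_fit_predict(degree, n=50):
    """Train one model on a fresh noisy dataset, return its prediction at X_TEST."""
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(0.0, NOISE_STD, n)
    return np.polyval(np.polyfit(x, y, degree), X_TEST)

def decompose(degree, trials=2000):
    """Monte-Carlo estimate of squared bias, variance, and irreducible noise."""
    preds = np.array([sample_fit_predict(degree) for _ in range(trials)])
    bias_sq = (preds.mean() - np.sin(X_TEST)) ** 2  # (avg prediction - truth)^2
    variance = preds.var()                          # spread across training sets
    noise = NOISE_STD ** 2                          # irreducible label noise
    return bias_sq, variance, noise
```

A degree-1 (underfit) model shows a much larger squared-bias term than a degree-5 model on this problem, while for L2 loss the expected error on noisy labels is the sum of the three terms.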
What does the bias-variance decomposition highlight in reinforcement learning?
It highlights a tradeoff between an error due to a function approximator that is too simple (high bias) and an error due to an approximator that is too flexible for the amount of data available (high variance, i.e., overfitting).
Why is direct bias-variance decomposition less straightforward for loss functions other than L2 loss in reinforcement learning?
Because the clean additive decomposition into squared bias plus variance holds exactly only for the squared (L2) loss; for other losses the expected error does not separate into such additive terms.
How can prediction error be decomposed when using non-L2 loss functions?
prediction error can be decomposed into: an asymptotic bias (the error that would remain even with unlimited data) and an overfitting term (the additional error due to learning from a finite dataset)
Replacement for the bias-variance tradeoff in reinforcement learning
a tradeoff between asymptotic bias and overfitting, known as the bias-overfitting tradeoff
What is a batch or offline reinforcement learning algorithm?
An algorithm that learns a policy from a fixed dataset of previously collected transitions, without any further interaction with the environment.
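Such an algorithm only ever replays a fixed set of transitions; a minimal tabular sketch, where the 4-state chain environment, the random behavior policy, and all hyperparameters are illustrative assumptions:

```python
import random
from collections import defaultdict

def collect_batch(episodes=200, seed=0):
    """A fixed batch of (s, a, r, s2, done) transitions, gathered beforehand
    from a 4-state chain by a uniform-random behavior policy. Reaching state 3
    gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    batch = []
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = rng.choice([-1, +1])
            s2 = max(0, min(3, s + a))
            done = s2 == 3
            r = 1.0 if done else 0.0
            batch.append((s, a, r, s2, done))
            if done:
                break
            s = s2
    return batch

def batch_q_learning(batch, gamma=0.9, sweeps=50, alpha=0.5):
    """Offline Q-learning: repeatedly replay the fixed batch, no new interaction."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2, done in batch:
            target = r if done else r + gamma * max(q[(s2, -1)], q[(s2, +1)])
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

q = batch_q_learning(collect_batch())
# Greedy policy for the non-terminal states, read off the learned Q-values.
policy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(3)}
```

Note the learner never calls the environment itself: all knowledge comes from the pre-collected batch, which is the defining property of the batch/offline setting.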
How can the suboptimality of the expected return in an MDP be decomposed?
Into the sum of an asymptotic bias (the suboptimality that would remain even with infinite data, due to the choice of policy class or state representation) and an overfitting term (the additional suboptimality caused by learning from a finite dataset).
What is the bias-overfitting tradeoff in reinforcement learning?
Enriching the policy class or state representation reduces the asymptotic bias but increases the risk of overfitting when data is limited, while restricting it does the opposite.
How can the best policy be obtained in reinforcement learning?
The best policy can be obtained by balancing bias and overfitting through careful selection of: the state representation (feature space), the function approximator, the objective function (e.g., the discount factor), and the learning algorithm and its hyperparameters.