Explain what RL is, and what we want to learn from it
Learning from interaction with an environment to achieve a long-term goal that is related to the state of the environment.
Define a simple RL setup and goal
Setup: We have an agent which is interacting with an environment which it can affect through actions. The agent may be able to sense the environment partially or fully.
Goal: the agent tries to maximise the long-term reward, which is conveyed through a reward signal.
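A minimal Python sketch of this agent-environment loop follows. The environment (LineWorld), the policies and the discount factor gamma are all hypothetical choices for illustration. Discounting is one common way to define "long-term reward", not something stated above. The agent acts, the environment changes state and returns a reward, and the agent accumulates the (discounted) reward over the episode.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1, goal at the far end."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # fully observable here: the observation is the state

    def step(self, action):
        # The agent affects the environment: action -1 moves left, +1 moves right.
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0  # reward signal: +1 only on reaching the goal
        return self.state, reward, done


def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return the discounted return sum_t gamma**t * r_t."""
    obs = env.reset()
    ret, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret


def always_right(state):
    return +1


def random_policy(state):
    return random.choice([-1, +1])


print(run_episode(LineWorld(), always_right))
```

The always-right policy reaches the goal in the fewest steps, so it earns a higher discounted return than the random policy in expectation.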
Explain the differences between Supervised and Reinforcement Learning
Explain the differences between Unsupervised and Reinforcement Learning
What is a policy
It states which action the agent takes in a particular state. Thus, it is a function that maps states to actions.
Characteristics of RL
What is the difference between Fully and Partially Observable environments
With full observability, the agent directly observes the environment state.
With partial observability, the agent observes the environment only indirectly (e.g. through incomplete or noisy observations).
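A small Python sketch of the difference, assuming a hypothetical environment state made of an (x, y) position and a hypothetical noisy sensor model. Under full observability the observation is the state itself. Under partial observability the agent receives only a noisy reading of x and never sees y, so it has to infer the true state.

```python
import random

# Hypothetical environment state: the agent's true (x, y) position in a grid.
true_state = {"x": 3, "y": 7}


def full_observation(state):
    # Fully observable: the agent sees the environment state directly.
    return dict(state)


def partial_observation(state, noise=1, rng=random):
    # Partially observable (assumed sensor model): a noisy reading of x only;
    # y is hidden, so the observation does not determine the state.
    return {"x_reading": state["x"] + rng.randint(-noise, noise)}


print(full_observation(true_state))
print(partial_observation(true_state))
```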
What does expectimax search compute
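The average (expected) score under optimal play by the agent, where the outcome at chance nodes is uncertain, so their children are averaged rather than minimised as in minimax. It suits games with chance (e.g. dice games such as backgammon) or an opponent modelled as random, not deterministic adversarial games like chess.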
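A minimal Python sketch of expectimax on a tiny game tree, assuming leaves are numeric scores, levels alternate between max nodes and chance nodes, and every chance outcome is equally likely (real problems may weight outcomes by their probabilities). A minimax version is included only for contrast.

```python
from statistics import mean


def expectimax(node, maximizing=True):
    """Leaves are numbers; internal nodes are lists of children.

    Max nodes pick the best child; chance nodes (assumed uniform) average them.
    """
    if isinstance(node, (int, float)):
        return node
    values = [expectimax(child, not maximizing) for child in node]
    return max(values) if maximizing else mean(values)


def minimax(node, maximizing=True):
    """Same tree, but the second player is an optimal adversary (takes the min)."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


tree = [[8, 2], [4, 5]]
print(expectimax(tree))  # left branch averages 5, right averages 4.5 -> 5
print(minimax(tree))     # left branch worst case 2, right worst case 4 -> 4
```

The two searches choose different branches: expectimax accepts the risky left branch because its average is higher, while minimax assumes the worst outcome and prefers the right branch.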
Discuss the differences between model-based and model-free RL techniques
Pros and cons of Model-based RL
Pros and cons of Model-Free RL
What is collaborative filtering
An approach for predicting a user's preferences by collecting preference information from many other users.
Briefly describe content-based recommenders
They analyse item descriptions and metadata to identify items likely to be of interest to the target user.
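A minimal content-based recommender in Python, under stated assumptions: items are described by hypothetical binary genre features, the user profile is the average feature vector of the items the user liked, and candidates are ranked by cosine similarity to that profile. Only item metadata is used; no other users' data is involved.

```python
import math

# Hypothetical item metadata: binary features [action, comedy, drama, sci-fi].
items = {
    "Alien": [1, 0, 0, 1],
    "Airplane!": [0, 1, 0, 0],
    "Interstellar": [0, 0, 1, 1],
    "Die Hard": [1, 0, 0, 0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(liked, items, k=2):
    # Assumed profile: the average feature vector of the liked items.
    n = len(next(iter(items.values())))
    profile = [sum(items[i][f] for i in liked) / len(liked) for f in range(n)]
    # Rank items the user has not seen by similarity of their descriptions.
    candidates = [i for i in items if i not in liked]
    candidates.sort(key=lambda i: cosine(profile, items[i]), reverse=True)
    return candidates[:k]


print(recommend(["Alien"], items))  # ['Die Hard', 'Interstellar']
```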
What are the differences between user-based and item-based collaborative filtering
What are the differences between the Jaccard Index and Cosine similarity
The Jaccard index (JI) is suited to binary data, whereas cosine similarity is used with real-valued data.
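Both measures sketched in Python on hypothetical data. Jaccard treats each user as a set of items (e.g. items bought or not), computing |A ∩ B| / |A ∪ B|. Cosine compares rating vectors by the angle between them. Returning 0.0 for empty inputs is a convention assumed here to avoid division by zero.

```python
import math


def jaccard(a, b):
    """Jaccard index of two sets: |A & B| / |A | B| (binary data)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # assumed: 0.0 if both empty


def cosine(u, v):
    """Cosine of the angle between two real-valued vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0  # assumed: 0.0 if either vector is zero


# Hypothetical data: items two users bought (binary) vs. their ratings (real-valued).
print(jaccard({"i1", "i2", "i3"}, {"i2", "i3", "i4"}))  # 2 shared / 4 total = 0.5
print(cosine([5.0, 3.0, 0.0], [4.0, 0.0, 4.0]))
```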
Give the steps of predicting ratings in User-Based CF
Give the steps of making recommendations in User-Based CF
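A minimal Python sketch of one common user-based CF formulation, assumed here rather than taken from these notes. The ratings matrix is hypothetical. Similarity is cosine over co-rated items (Pearson correlation is a frequent alternative). A prediction is the target user's mean rating plus a similarity-weighted average of the k nearest neighbours' mean-centred ratings for the item. Recommendations are the unrated items with the highest predicted ratings.

```python
import math

# Hypothetical user -> {item: rating} matrix; missing entries are unrated items.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 4, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
    "dave": {"i1": 5, "i2": 3, "i4": 4},
}


def mean_rating(user):
    r = ratings[user]
    return sum(r.values()) / len(r)


def similarity(u, v):
    """Cosine similarity over co-rated items (one possible choice)."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)


def predict(user, item, k=2):
    # 1. Candidate neighbours: other users who have rated the target item.
    neighbours = [v for v in ratings if v != user and item in ratings[v]]
    # 2. Keep the k most similar neighbours.
    top = sorted(neighbours, key=lambda v: similarity(user, v), reverse=True)[:k]
    # 3. User's mean plus weighted average of neighbours' mean-centred ratings.
    num = sum(similarity(user, v) * (ratings[v][item] - mean_rating(v)) for v in top)
    den = sum(abs(similarity(user, v)) for v in top)
    return mean_rating(user) + (num / den if den else 0.0)


def recommend(user, n=1, k=2):
    # Predict every item the user has not rated and return the n highest.
    all_items = {i for r in ratings.values() for i in r}
    unseen = all_items - ratings[user].keys()
    return sorted(unseen, key=lambda i: predict(user, i, k), reverse=True)[:n]


print(predict("alice", "i4"))
print(recommend("alice"))
```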
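Explicit vs. implicit data collection
Explicit data collection (EDC) actively asks users to give explicit ratings for items.
Implicit data collection (IDC) gathers data by observing the user's activity (e.g. clicks, purchases, viewing time) rather than asking for ratings.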
What is offline evaluation and its advantages
A previously collected dataset is used; no actual users are involved in the evaluation.
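A minimal Python sketch of offline evaluation, assuming a hypothetical set of previously collected (user, item, rating) triples, a random hold-out split and RMSE as the metric (one common choice among several). The global-mean predictor is only a placeholder baseline. Because the dataset and seed are fixed, the same split and metric can be rerun on every candidate algorithm.

```python
import math
import random


def holdout_split(triples, test_fraction=0.2, seed=0):
    """Split (user, item, rating) triples into train and test sets."""
    rng = random.Random(seed)
    shuffled = triples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(predict, test):
    """Root-mean-square error of predict(user, item) on held-out ratings."""
    errors = [(predict(u, i) - r) ** 2 for u, i, r in test]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical previously collected dataset; no live users are involved.
data = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 5), ("u2", "c", 3), ("u3", "b", 1)]
train, test = holdout_split(data, test_fraction=0.4)

# Placeholder baseline: always predict the training set's global mean rating.
global_mean = sum(r for _, _, r in train) / len(train)
print(rmse(lambda u, i: global_mean, test))
```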
What is online evaluation and its advantages
Users interact with a running system in a “live experiment”, and receive actual recommendations.
Feedback from the users is collected by observing their online behaviour and/or explicitly collecting their feedback.
Briefly discuss serendipity and diversity in Recommendation systems evaluation
It is often not helpful to recommend obvious items that are too similar to one another.
An alternative evaluation goal is to examine the extent to which a recommender can generate diverse recommendations among its top results.
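One way to quantify diversity among the top results is the average pairwise dissimilarity of the recommended items, often called intra-list diversity. The Python sketch below assumes hypothetical genre sets per item and uses Jaccard distance (1 minus the Jaccard index) as the dissimilarity. Other item-distance measures could be substituted.

```python
from itertools import combinations


def jaccard_distance(a, b):
    a, b = set(a), set(b)
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def intra_list_diversity(top_items, features):
    """Average pairwise dissimilarity of the top-N recommended items."""
    pairs = list(combinations(top_items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(features[a], features[b]) for a, b in pairs) / len(pairs)


# Hypothetical item genres.
features = {
    "m1": {"action", "sci-fi"},
    "m2": {"action", "sci-fi"},
    "m3": {"comedy"},
    "m4": {"drama", "romance"},
}
print(intra_list_diversity(["m1", "m2"], features))        # 0.0: near-duplicates
print(intra_list_diversity(["m1", "m3", "m4"], features))  # 1.0: no shared genres
```

A list of obvious, near-identical items scores 0, while a list whose items share no genres scores 1.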