What is the utility function for the formal model?
u: X x S -> R
Where X is the set of customers, S is the set of items, and R is the set of possible ratings.
Essentially, we get a rating for each customer/item pairing.
What are the key problems associated with the formal model for recommender systems?
What are the 2 ways we can collect ratings for the utility matrix?
What are the approaches to recommender systems for extrapolating utilities?
Why is extrapolating utilities a problem?
Most people have not rated most items
New items have no ratings
New users have no history
Not much info to extrapolate from
What is the main idea behind a content-based recommendation system?
To recommend items to customer x that are similar to previous items rated highly by that customer
What is an item profile?
A set of features. It is convenient to think of it as a vector with one dimension per feature
What is the prediction heuristic for content-based recommendation systems?
Given a user profile x and item profile i, estimate u(x, i) using cosine similarity between x and i
What is a user profile and how can we calculate it?
Each item the user has rated has its own item profile. The user profile is the weighted average of those rated item profiles, weighted by the user’s ratings. As a variation, each profile can instead be weighted by how far the user’s rating is from the average rating for that item
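A minimal sketch of the two cards above, assuming item profiles are plain feature lists and the user profile is the rating-weighted average of them. The names build_user_profile, cosine, and the toy items "A", "B", "C" are illustrative assumptions, not part of the notes.

```python
import math

def build_user_profile(item_profiles, ratings):
    """Rating-weighted average of the profiles of the items the user rated."""
    dims = len(next(iter(item_profiles.values())))
    total_weight = sum(ratings.values())
    profile = [0.0] * dims
    for item, rating in ratings.items():
        for d, value in enumerate(item_profiles[item]):
            profile[d] += rating * value
    return [p / total_weight for p in profile]

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical two-feature item profiles and one user's ratings.
item_profiles = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
ratings = {"A": 4, "B": 1}

user_profile = build_user_profile(item_profiles, ratings)
score = cosine(user_profile, item_profiles["C"])  # estimate of u(x, C)
```

The score for the unrated item "C" is only a similarity in [-1, 1], so it ranks candidate items rather than predicting a star rating directly.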
What are the pros of the content-based recommendation system?
What are the cons of the content-based recommendation system?
What is the goal of a collaborative filtering system?
Finding a set N of other users whose ratings are similar to user x’s ratings. We estimate x’s ratings based on ratings of users in N
What is the formula for Jaccard Similarity when we have sets of ratings for users A and B?
Sim(A, B) = |rA ∩ rB| / |rA ∪ rB|
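A small sketch of the Jaccard formula, assuming each user's ratings are stored as a dict from item to rating so that the set of rated items is the dict's keys. The function name and the sample ratings are made up for illustration.

```python
def jaccard(ratings_a, ratings_b):
    """Jaccard similarity of the sets of items two users have rated."""
    items_a = set(ratings_a)
    items_b = set(ratings_b)
    union = items_a | items_b
    if not union:
        return 0.0
    return len(items_a & items_b) / len(union)

# Hypothetical ratings: the values are ignored, only the keys matter.
user_a = {"HP1": 4, "TW": 5, "SW1": 1}
user_b = {"HP1": 5, "HP2": 5, "HP3": 4}

jaccard_ab = jaccard(user_a, user_b)  # 1 shared item out of 5 distinct items
```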
What is the formula for cosine similarity when we have sets of ratings for users A and B?
Sim(A, B) = cos(rA, rB) = (rA · rB) / (||rA|| ||rB||)
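A sketch of the cosine formula over sparse rating dicts, assuming a missing rating counts as 0. The function name and sample data are illustrative assumptions.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two rating vectors stored as item -> rating dicts.

    Items a user has not rated are treated as 0.
    """
    items = set(ratings_a) | set(ratings_b)
    dot = sum(ratings_a.get(i, 0) * ratings_b.get(i, 0) for i in items)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical ratings for two users.
user_a = {"HP1": 4, "TW": 5, "SW1": 1}
user_b = {"HP1": 5, "HP2": 5, "HP3": 4}

cosine_ab = cosine_similarity(user_a, user_b)
```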
What is centered cosine similarity (Pearson Correlation)?
Same as cosine similarity but we first normalize all ratings by subtracting the mean of the row (mean of a user’s ratings)
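A sketch of centered cosine, assuming ratings are item -> rating dicts and that, after centering, a missing rating counts as 0 (that is, as the user's own average). The helper names and sample users are illustrative.

```python
import math

def center(ratings):
    """Subtract the user's mean rating from each of their ratings."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: r - mean for item, r in ratings.items()}

def centered_cosine(ratings_a, ratings_b):
    """Cosine similarity computed on mean-centered rating vectors."""
    ca, cb = center(ratings_a), center(ratings_b)
    common = set(ca) & set(cb)  # other items contribute 0 to the dot product
    dot = sum(ca[i] * cb[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical users with opposite tastes on the two items they share.
user_a = {"HP1": 4, "TW": 5, "SW1": 1}
user_c = {"TW": 2, "SW1": 4, "SW2": 5}

centered_ac = centered_cosine(user_a, user_c)
```

With these sample ratings the result is negative, which reflects that the two users disagree on the items they both rated.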
What is the issue with the Jaccard and cosine similarity measures?
Jaccard ignores the value of each rating and only looks at which items both users have rated
Cosine treats a missing rating as 0, which acts like a negative (low) rating
How do we translate a similarity metric to a recommendation?
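rxi = (sum over y in N of sim(x, y) * ryi) / (sum over y in N of sim(x, y))
Where N is the set of the k users most similar to x who have rated item i, rx is the vector of user x’s ratings, ryi is user y’s rating of item i, and rxi is the predicted rating of user x for item i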
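A sketch of the similarity-weighted prediction, assuming the similarities to user x are already computed and that only users who rated item i are passed in. The function name, the default k, and the sample numbers are illustrative assumptions.

```python
def predict_rating(similarities, item_ratings, k=2):
    """Similarity-weighted average of the k most similar users' ratings of one item.

    similarities: user -> sim(x, user)
    item_ratings: user -> that user's rating of item i (only users who rated i)
    """
    candidates = [y for y in item_ratings if y in similarities]
    neighbors = sorted(candidates, key=lambda y: similarities[y], reverse=True)[:k]
    numerator = sum(similarities[y] * item_ratings[y] for y in neighbors)
    denominator = sum(similarities[y] for y in neighbors)
    if denominator == 0:
        return None  # no usable neighbors
    return numerator / denominator

# Hypothetical similarities to user x and ratings of item i.
similarities = {"y1": 0.9, "y2": 0.5, "y3": 0.1}
item_ratings = {"y1": 4, "y2": 2, "y3": 5}

predicted = predict_rating(similarities, item_ratings, k=2)  # uses y1 and y2 only
```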
What is item-item collaborative filtering?
Unlike user-user filtering, where we compare users’ preferences, item-item filtering finds items similar to a given item
What is the process for item-item collaborative filtering?
What is the upside of the collaborative filtering system?
It works for any kind of item, no feature selection is needed
What are the cons of the collaborative filtering system?
How do we compute a global baseline estimate and why would we need one?
Overall average rating + (movie’s average rating - overall average) + (user’s average rating - overall average)
We need this when the user has not rated any movie similar to the one we are trying to estimate a rating for
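A sketch of the baseline as a function of three averages. The function name and the sample numbers are hypothetical.

```python
def baseline_estimate(overall_mean, movie_mean, user_mean):
    """Global baseline: overall mean plus the movie's and the user's deviations from it."""
    movie_deviation = movie_mean - overall_mean
    user_deviation = user_mean - overall_mean
    return overall_mean + movie_deviation + user_deviation

# Hypothetical averages: the movie is rated 0.5 above the overall mean,
# and the user rates 0.2 below the overall mean.
estimate = baseline_estimate(overall_mean=3.7, movie_mean=4.2, user_mean=3.5)
```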