Lecture 42 — Content Based Recommendations | Stanford University



Welcome back to Mining of Massive Datasets. We are going to continue our lesson on recommender systems by looking at content-based recommendation systems. The main idea behind content-based recommendation systems is to recommend to a customer x items similar to previous items rated highly by the same customer. For example, in the case of movies, you might recommend movies with the same actor or actors, director, genre, and so on. In the case of websites, blogs, or news, we might recommend articles with similar content or on similar topics. In the case of people recommendations, we might recommend people with many common friends to each other.

So here's a plan of action. We're going to start with the user and find a set of items the user likes, using both explicit and implicit data. For example, we might look at the items that the user has rated highly and the set of items the user has purchased. For each of those items, we are going to build an item profile. An item profile is a description of the item. For example, in this case we are dealing with geometric shapes, and let's say the user likes a red circle and a red triangle. We might build item profiles that say that the user likes red items, or that the user likes circles, for instance. From these item profiles we can infer a user profile. The user profile infers the likes of the user from the profiles of the items the user likes. Because the user here likes a red circle and a red triangle, we would infer that the user likes the color red, likes circles, and likes triangles. Once we have a profile of the user, we can match it against the catalog and recommend other items to the user. Let's say the catalog has a bunch of items in it, and some of those items are red; we can recommend those to the user.

So let's look at how to build these item profiles. For each item we want to create an item profile, which we can then use to build user profiles. The profile is a
set of features about the item. In the case of movies, for instance, the item profile might include author, title, actor, director, and so on. In the case of images and videos, we might use metadata and tags. In the case of people, the item profile might be a set of friends of the user.

Even though the item profile is a set of features, it's often convenient to think of it as a vector. The vector could be either Boolean or real valued, and there's one entry per feature. For example, in the case of movies, the item profile might be a Boolean vector with a 0 or a 1 for each actor, director, and so on, depending on whether that actor or director actually participated in that movie.

Let's look at the special case of text. For example, you might be recommending news articles. What's the item profile in this case? The simplest item profile in this case is to pick the set of important words in the document, or the item. How do you pick the important words in the item? The usual heuristic that we get from text mining is a technique called TF-IDF, or term frequency-inverse document frequency. Many of you may have come across TF-IDF in the context of information retrieval, but for those of you who have not, here's a quick refresher.

Let's say we are looking at a document or item j, and we are computing the score for term or feature i. The term frequency TF_ij for feature i in document j is just the number of times feature i appears in document j, divided by the maximum number of times that same feature appears in any document. For example, let's say the feature is a certain word, the word "apple", and in the document that we're looking at the word "apple" appears five times, but there's another document where the word "apple" appears 23 times, and this is the maximum number of times the word "apple" appears in any document at all. Then the term frequency TF_ij is 5 divided by 23. Now, I'm glossing over the fact that we need to normalize TF to account for
the fact that document lengths are different; let's just ignore that for the moment.

The term frequency captures the number of times a term appears in a document. Intuitively, the more often a term appears in a document, the more important a feature it is. For example, if a document mentions the word "apple" five times, the word "apple" is more important in that document than in another document that mentions it just once. But how do you compare the weights of different terms? For example, a rare word appearing just a couple of times might be more important than a more common word like "the" appearing thousands of times. This is where the document frequency comes in. Let n_i be the number of documents that mention the term i, and let N be the total number of documents in the whole system. The inverse document frequency for the term i is obtained by dividing N by n_i, the number of documents that mention the term i, and then taking the logarithm of that fraction. Notice that the more common the term, the larger n_i, and the larger n_i, the lower the IDF. So the IDF function gives a lower weight to more common words and a higher weight to rarer words.

Putting these two pieces together, the TF-IDF score of feature i for document j is obtained by multiplying the term frequency and the IDF. So given a document, you compute the TF-IDF scores for every term in the document, and then you sort all the terms in the document by their TF-IDF scores. Then you apply some kind of threshold, or you might pick the set of words with the highest TF-IDF scores in the document, together with their scores, and that would be the doc profile. So in this case the doc profile is a real-valued vector, as opposed to a Boolean vector.

Now that you have item profiles, our next task is to construct user profiles. Let's say we have a user who has rated items with profiles i_1 through i_n. Remember, i_1 through i_n are vectors of entries: let's say this is i_1, i_2, i_3, and so on, and here is i_n. Each of these is a vector in a high-dimensional space with many, many entries. The simplest way to construct a user profile from a set of item profiles is just to average the item profiles, where n is the total number of item profiles. So if I take the profiles of all the items the user has rated and take their average, that would be the simplest way of constructing a user profile.

Now, this doesn't take into account that the user liked certain items more than others. In that case we might want to use a weighted average, where the weight is equal to the rating given by the user for each item. Then you would have a weighted average item profile. A variant of this is to normalize these weights using the average rating of the user, and we'll see an example that makes this idea clear. Of course, much more sophisticated aggregations are possible; here we are only looking at some very simple examples.

Let's look at an example that will clarify weighted average item profiles and how to normalize weights. Let's start with an example of a Boolean utility matrix. What's a Boolean utility matrix? All we have is information on whether a user purchased an item or not, for example, so each entry is either a 0 or a 1. Let's say the items are movies and the only feature is actor. The item profile in this case is a vector with a 0 or 1 for each actor: 0 if that actor did not appear in that movie, and 1 if that actor did appear in that movie. Suppose user x has watched five movies, and two of those movies feature actor A and three of those movies feature actor B. The simplest user profile is just the mean of the item profiles. Remember, there are five vectors, and two of those have a 1 for feature A, so the weight of feature A is going to be 2 divided by the total number of item profiles, which is 5, giving 0.4. The weight of feature B, correspondingly, is going to be 3/5, or 0.6.

Let's look at a more complex example with star ratings. Suppose we have
star ratings in the range 1 to 5, and the user has once again watched five movies: two movies starring actor A and three movies starring actor B. The movies that actor A starred in, the user rated 3 and 5, whereas the movies that actor B acted in, the user rated 1, 2, and 4. Since we have five-star ratings, and the user gives lower ratings to movies they didn't like and higher ratings to movies they liked, it's somewhat apparent from these ratings that the user liked at least one of the movies from actor A and one of the movies from actor B, but they really didn't like two of actor B's movies, the ones that were rated 1 and 2. The ratings 1 and 2 are in fact negative ratings, not positive ratings, and we'd like to capture this fact.

The idea of normalizing ratings helps us capture the idea that some ratings are actually negative ratings and some are positive ratings. But users are very different from each other in their baselines: some users are just more generous in their ratings than others. For one user, for instance, a 4 might be a wildly positive rating, whereas for another user a 4 might be just an average rating. To capture this idea, we're going to baseline each user's ratings by their average rating. In this case, this user's average rating is 3: if you average all five ratings that the user has provided, the average rating is 3. So what you're going to do is subtract the average rating from each of the individual movie ratings. For the movies with actor A, the normalized ratings, instead of 3 and 5, become 0 and +2, and for actor B, the normalized ratings become -2, -1, and +1. Notice that this captures the intuition that the user did not like the first two movies with actor B, whereas they really liked the second movie with actor A, while the first movie with actor A was kind of an average movie. Once you do this normalization, then you can
compute the profile weights. But in this case we divide not by the total number of movies, but by the total number of movies with a specific feature. In this case there are two movies with actor A, so the profile weight for the feature actor A is (0 + 2) divided by 2, which is 1. Similarly, the feature actor B has a profile weight of -2/3. This indicates a mild positive preference for actor A and a mild negative preference for actor B.

Now that you have user profiles and item profiles, the next task is to recommend certain items to the user. The key step in this is to take a pair of user profile and item profile and figure out what the rating for that user and item pair is likely to be. Remember that both the user profile and the item profile are vectors in a high-dimensional space. In this case I've shown them in a two-dimensional space, when in reality of course they are embedded in a much higher-dimensional space. You might recall from a prior lecture that when you have vectors in a high-dimensional space, a good distance metric between a pair of vectors is the angle theta between them. In particular, you can estimate the angle using the cosine formula: the cosine of theta, the angle between the two vectors, is given by the dot product of the two vectors divided by the product of their magnitudes. In this case we will call this the cosine similarity between the user x and the item i. Now, technically the cosine distance is actually the angle theta and not the cosine of the angle. The cosine distance, as we studied in an earlier lecture, is the angle theta, and the cosine similarity is the angle 180 minus theta. The smaller the angle, the more similar the user x and the item i are, and therefore the similarity 180 minus theta is going to be larger. But for convenience we are going to actually use the cosine of theta as our similarity measure. Notice that as the angle theta
becomes smaller, cos theta becomes larger, and as the angle theta becomes larger and larger, the cosine becomes smaller and smaller. In fact, as theta becomes greater than 90 degrees, the cosine of theta becomes negative. So this captures the intuition that as the angle becomes smaller and smaller, x and i are more and more similar to each other, and it is more likely that x will give a higher rating to item i. So the way we make predictions is as follows: given the user x, we compute the cosine similarity between that user and all the items in the catalog, and then pick the items with the highest cosine similarity and recommend those to the user. So that's the theory of content-based recommendations.

Now let's look at some of the pros and cons of the content-based recommendation approach. The biggest pro of the content-based recommendation approach is that you don't need data about other users in order to make recommendations to a specific user. This turns out to be a very good thing, because you can start making content-based recommendations from day one, for your very first user.

Another good thing about content-based recommendation is that you can recommend to users with very unique tastes. When we get to collaborative filtering, we'll see that to make recommendations to a user with collaborative filtering, we need to find other similar users. The problem with that is that if there's a user with very unique or idiosyncratic tastes, there may not be any other similar users. A content-based approach, by contrast, deals naturally with the fact that a user can have very unique tastes: as long as we can build item profiles for the items that the user likes, and a user profile for the user based on those, we can make recommendations to that user.

A third pro is that we're able to recommend new and unpopular items. When a new item comes in, we don't need any ratings from users to build the item profile. The item profile
depends entirely on the features of the item and not on how other users rated the item. So we don't have the so-called first-rater problem that we will see in the collaborative filtering approach; we can make recommendations for an item as soon as it becomes available.

And finally, whenever the content-based approach makes a recommendation, you can provide an explanation to the user for why a certain item was recommended. In particular, you can just list the content features that caused the item to be recommended. For example, if you recommend a news article to a user using a content-based approach, you may be able to say: look, in the past you spent a lot of time reading articles that mention Syria, and that's why I am recommending this article on Syria to you. So these are some of the pros of the content-based approach.

Now let's look at the cons. The most serious problem with the content-based approach is that finding the appropriate features is very hard. For example, how do you find features for images, movies, or music? In the case of movies, we suggested a set of features that includes actors, directors, and so on, but it turns out that movies often cross genres, and users are not very often loyal to specific actors or directors. Similarly, in the case of music, it's very hard to box music into specific genres, musicians, and so on. And for images, of course, the features are very hard to find. So in general, finding appropriate features to make the content-based approach work turns out to be a very hard problem, and this is the main reason why the content-based approach is not more popular.

The second problem is one of overspecialization. Remember, the user profile is built using the item profiles of the items that the user has rated or purchased. Because of this, if a user has never rated a certain kind of movie or a certain genre of movie, he will never be recommended a movie in
that genre, for example, or he'll never be recommended a piece of music that's outside his previous preferences. In general, people might have multiple interests and might have expressed only some of them in the past, so it's very easy this way to miss recommending interesting items to users because you don't have enough data on the user.

Another serious problem of the content-based approach is that it's unable to exploit the quality judgments of other users. For example, there might be a certain video or movie that's widely popular across a wide cross-section of users. However, the current user has not expressed interest in that kind of movie, and therefore the content-based approach will never recommend that movie to that user.

The final problem with the content-based approach is the cold-start problem for new users. Remember, the user profile is built by aggregating item profiles of the items the user has rated. When you have a new user, the new user has not rated any items, and so there is no user profile. So there's a challenging problem of how to build a user profile for a new user. In most practical situations, recommender systems start off new users with some kind of average profile based on a system-wide average, and then over time, as the user rates more and more items, the user profile evolves and becomes more individualized to the user.
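
The TF-IDF item profile for text described in the lecture can be sketched in Python roughly as follows. This is a minimal illustration, not the lecturer's code. It makes two assumptions that should be read carefully: term frequency is normalized by the count of the most frequent term in the same document (the commonly used definition, which one of the comments below also points to, and which differs from the spoken description of dividing by the maximum count of that term in any document), and the logarithm is base 2, since the lecture does not fix a base. The function name `tf_idf_profiles` and the `top_k` cutoff are hypothetical.

```python
import math
from collections import Counter


def tf_idf_profiles(docs, top_k=3):
    """Build one {term: score} profile per document.

    docs is a list of token lists. top_k is an assumed cutoff standing in
    for "pick the words with the highest TF-IDF scores".
    """
    n_docs = len(docs)  # N: total number of documents
    counts = [Counter(doc) for doc in docs]

    # n_i: number of documents that mention term i
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    profiles = []
    for c in counts:
        # Assumption: normalize by the most frequent term in this document.
        max_count = max(c.values()) if c else 1
        scores = {}
        for term, f in c.items():
            tf = f / max_count
            idf = math.log2(n_docs / doc_freq[term])  # assumed base-2 log
            scores[term] = tf * idf
        # Sort terms by score and keep the top_k, together with their scores.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        profiles.append(dict(top))
    return profiles


docs = [
    ["apple", "apple", "pie", "the"],
    ["the", "cat", "the"],
    ["the", "apple", "tree"],
]
profiles = tf_idf_profiles(docs, top_k=2)
print(profiles[0])
```

In this toy corpus the word "the" appears in every document, so its IDF is zero and it drops out of the first document's profile, which matches the intuition that common words get a low weight.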

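The actor example from the lecture (two movies with actor A rated 3 and 5, three movies with actor B rated 1, 2, and 4) can be worked through in a short Python sketch. It covers the simple mean profile, the rating-normalized profile, and ranking by the cosine of the angle between the user profile and each item profile. The function names and the two-item catalog are hypothetical, chosen only for illustration.

```python
import math


def mean_profile(item_profiles):
    """Simplest user profile: the plain average of the item profiles."""
    n = len(item_profiles)
    n_features = len(item_profiles[0])
    return [sum(p[f] for p in item_profiles) / n for f in range(n_features)]


def normalized_profile(item_profiles, ratings):
    """Subtract the user's mean rating, then average per feature over
    only the rated items that actually have that feature."""
    mean_rating = sum(ratings) / len(ratings)
    n_features = len(item_profiles[0])
    profile = []
    for f in range(n_features):
        devs = [r - mean_rating for p, r in zip(item_profiles, ratings) if p[f]]
        profile.append(sum(devs) / len(devs) if devs else 0.0)
    return profile


def cosine(u, v):
    """Dot product divided by the product of the magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user_profile, catalog, k=1):
    """Return the k catalog items with the highest cosine similarity."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine(user_profile, catalog[name]),
        reverse=True,
    )
    return ranked[:k]


# Feature order is [actor A, actor B].
items = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
ratings = [3, 5, 1, 2, 4]

boolean_profile = mean_profile(items)          # weights 2/5 and 3/5
profile = normalized_profile(items, ratings)   # weights 1 and -2/3

# Hypothetical catalog of unseen movies.
catalog = {"new movie with A": [1, 0], "new movie with B": [0, 1]}
print(boolean_profile, profile, recommend(profile, catalog))
```

With the normalized profile, the unseen movie featuring actor A has a positive cosine and the one featuring actor B has a negative cosine, so the actor A movie is the one recommended.
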
10 thoughts on “Lecture 42 — Content Based Recommendations | Stanford University

  1. Thank you for putting all together! Just one small question on TF, the simplest understanding is – count(term i) in Doc j/sum(count(term x where x in Doc j) in Doc j), right? Not only word 'apple' itself.

  2. Thank you sir, your videos are excellent. I am in a Data Science bootcamp and your videos are the perfect complexity level for me. Huge help! Please keep them coming!

  3. Great work!
    I also recommend to make tutorial on "Matrix Factorization" Methods as used in "Recommender Systems".

  4. The definition of TF (Term Frequency) is wrongly explained in the video as compared to what is written. It should be the frequency of term(feature) i in the document j divided by the maximum frequency of any other term k in the same document j.

  5. Based on the Cosine Similarity , we get a value between [0,1]. How do you get the user rating after this step?
