Information Retrieval: Unit 4: Web Search - Part 3
By
A. Bhuvaneshwari
810016104018
B.E. CSE – IV Year – ‘A’ Section
Collaborative Filtering (Cont.)
     HP1  HP2  HP3  TW   SW1  SW2  SW3
A    4              5    1
B    5    5    4
C                   2    4    5
D         3                        3
• In the example above:
A, B, C, D are users
HP1, HP2, HP3, TW, SW1, SW2, SW3 are movies
Ratings are on a scale of 0 to 5; a blank cell means the movie has not been rated.
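• Consider users A, B, C, D with rating vectors rA, rB, rC and rD.
• We need a similarity measure between two users, sim(A,B).
• It should capture the intuition that sim(A,B) > sim(A,C): A and B agree on HP1, while A and C disagree on both movies they have in common (TW and SW1).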
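Option 1: Jaccard Similarity
Formula (treating rA as the set of movies A has rated):
$sim(A,B) = \frac{|r_A \cap r_B|}{|r_A \cup r_B|}$
$sim(A,C) = \frac{|r_A \cap r_C|}{|r_A \cup r_C|}$
sim(A,B) = 1/5 = 0.2 (A and B share only HP1 out of five movies)
sim(A,C) = 2/4 = 0.5 (A and C share TW and SW1 out of four movies)
sim(A,B) < sim(A,C)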
Problem:
Ignores the rating values, so it wrongly ranks C as more similar to A than B is.
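A minimal Python sketch of the Jaccard measure, assuming the utility matrix is stored as a dictionary of dictionaries (user → movie → rating, with unrated movies simply absent). Only users A, B and C are included because D is not needed here; the names `RATINGS` and `jaccard` are illustrative.

```python
# Ratings from the example utility matrix: user -> {movie: rating}.
# Unrated movies are absent from the inner dictionary.
RATINGS = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}


def jaccard(x: str, y: str) -> float:
    """Jaccard similarity of the sets of movies rated by users x and y."""
    rated_x = set(RATINGS[x])  # keeps only the movie names; rating values are dropped
    rated_y = set(RATINGS[y])
    return len(rated_x & rated_y) / len(rated_x | rated_y)


print(jaccard("A", "B"))  # 0.2
print(jaccard("A", "C"))  # 0.5
```

Because `set(RATINGS[x])` keeps only the keys, A's 5 for TW and C's 2 for TW count as the same thing, which is exactly why the measure ignores rating values.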
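Option 2: Cosine Similarity
Formula:
$sim(A,B) = \cos(r_A, r_B) = \frac{r_A \cdot r_B}{\lVert r_A \rVert \, \lVert r_B \rVert}$
Missing ratings are filled with 0 (column order HP1, HP2, HP3, TW, SW1, SW2, SW3):
rA = [4 0 0 5 1 0 0]
rB = [5 5 4 0 0 0 0]
sim(A,B) = 20 / (√42 × √66) ≈ 0.38
sim(A,C) = 14 / (√42 × √45) ≈ 0.32
sim(A,B) > sim(A,C)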
Problem:
Treats missing ratings as negative: a movie the user has not rated becomes 0, the lowest value on the scale.
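The cosine computation can be sketched the same way, assuming the column order HP1, HP2, HP3, TW, SW1, SW2, SW3 and filling every missing rating with 0 (which is what makes the measure treat "not rated" as a very low rating). `MOVIES`, `RATINGS`, `vector` and `cosine` are illustrative names.

```python
import math

# Column order of the utility matrix.
MOVIES = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]

# Ratings from the example: user -> {movie: rating}; unrated movies are absent.
RATINGS = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}


def vector(user):
    """Rating vector of a user, with 0 for every movie they have not rated."""
    return [RATINGS[user].get(movie, 0) for movie in MOVIES]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(vector("A"))                                     # [4, 0, 0, 5, 1, 0, 0]
print(round(cosine(vector("A"), vector("B")), 2))      # 0.38
print(round(cosine(vector("A"), vector("C")), 2))      # 0.32
```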
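Option 3: Centered Cosine Similarity
• Normalize ratings by subtracting the row mean (the user's average rating) from each rating
• Example:
• Mean for row 1 (A) = (4 + 5 + 1)/3 = 10/3
• Mean for row 2 (B) = (5 + 5 + 4)/3 = 14/3
• Mean for row 3 (C) = (2 + 4 + 5)/3 = 11/3
• Mean for row 4 (D) = (3 + 3)/2 = 6/2 = 3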
Option 3: Centered Cosine Similarity (Cont.)
Ratings after subtracting each user's mean (blank = not rated):
     HP1   HP2   HP3   TW    SW1   SW2   SW3
A    2/3               5/3   -7/3
B    1/3   1/3   -2/3
C                      -5/3  1/3   4/3
D          0                             0
Option 3: Centered Cosine Similarity (Cont.)
• After centering, the ratings in each row sum to zero (0).
• 0 now stands for the user's average rating
• Positive ratings: the user likes the movie more than their average
• Negative ratings: the user likes the movie less than their average
• sim(A,B) = cos(rA, rB) = 0.09
• sim(A,C) = cos(rA, rC) = -0.56
• sim(A,B) > sim(A,C)
• Missing ratings are treated as "average" (0)
• Also known as the "Pearson correlation"
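Centering only changes how each vector is built: subtract the user's mean from the movies they rated and leave unrated movies at 0, so they count as "average". The sketch below assumes the same column order as the table and uses only users A, B and C; `centered_vector` and `cosine` are illustrative names.

```python
import math

MOVIES = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]
RATINGS = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}


def centered_vector(user):
    """Subtract the user's mean rating; unrated movies stay at 0 (= average)."""
    ratings = RATINGS[user]
    mean = sum(ratings.values()) / len(ratings)
    return [ratings[m] - mean if m in ratings else 0.0 for m in MOVIES]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(round(cosine(centered_vector("A"), centered_vector("B")), 2))  # 0.09
print(round(cosine(centered_vector("A"), centered_vector("C")), 2))  # -0.56
```

A and C now come out negative because, relative to their own averages, A loved TW and disliked SW1 while C did the opposite.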
Rating Predictions
Item-Item Collaborative Filtering
Example – Rating Predictions
Item-Item CF(...)
• Here we use the Pearson correlation as the similarity measure:
1) Subtract the mean rating m_i from each movie i (i = 1, 2, …, 6).
Example:
m1 = (1 + 3 + 5 + 5 + 4)/5 = 3.6
row 1 = [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0] (missing ratings become 0)
2) Compute cosine similarities between the centered rows.
• Neighbour selection:
N = 2
Identify the N movies most similar to movie 1 that have been rated by user 5.
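Steps 1 and 2 can be sketched for movie 1, whose row is the only one spelled out in the example: users 1, 3, 6, 9 and 11 rated it 1, 3, 5, 5 and 4, and the other seven of the twelve users did not. The rows of movies 2 to 6 are not given here, so the sketch only checks that a centered row has cosine 1 with itself; `center` and `cosine` are illustrative names.

```python
import math

# Row for movie 1: ratings by users 1..12, None = not rated.
MOVIE_1 = [1, None, 3, None, None, 5, None, None, 5, None, 4, None]


def center(row):
    """Step 1: subtract the row mean from every rating; missing ratings become 0."""
    rated = [r for r in row if r is not None]
    mean = sum(rated) / len(rated)
    return mean, [r - mean if r is not None else 0.0 for r in row]


def cosine(u, v):
    """Step 2: cosine similarity between two centered rows."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


mean, row_1 = center(MOVIE_1)
print(mean)                             # 3.6
print([round(x, 1) for x in row_1])     # [-2.6, 0.0, -0.6, 0.0, 0.0, 1.4, 0.0, 0.0, 1.4, 0.0, 0.4, 0.0]
print(round(cosine(row_1, row_1), 6))   # 1.0
```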
Item-Item CF(Con…)
Cosine similarity:
S(1,2)=-0.18 S(1,3)=0.41
S(1,4)=-0.10 S(1,5)=-0.31
S(1,6)=0.59
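Weighted average of user 5's ratings of the two neighbours, movie 3 (rated 2) and movie 6 (rated 3):
$r_{15} = \frac{0.41 \cdot 2 + 0.59 \cdot 3}{0.41 + 0.59} = \frac{2.59}{1.00}$
r15 ≈ 2.6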
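Neighbour selection and the weighted average can be sketched as follows, using the general form $r_{xi} = \sum_{j \in N} s_{ij} r_{xj} / \sum_{j \in N} s_{ij}$. The similarities are the S(1, j) values listed above; for user 5 the text only gives the ratings of movies 3 and 6 (2 and 3), so the sketch's dictionary holds just those two, whereas a full utility matrix would supply every movie the user rated. `predict` and the constant names are illustrative.

```python
# Similarity of movie 1 to movies 2..6, as listed in the example.
SIM_TO_MOVIE_1 = {2: -0.18, 3: 0.41, 4: -0.10, 5: -0.31, 6: 0.59}

# Ratings by user 5 that the example uses: movie -> rating.
USER_5 = {3: 2, 6: 3}


def predict(similarities, user_ratings, n=2):
    """Similarity-weighted average over the n most similar movies the user rated."""
    rated = [j for j in similarities if j in user_ratings]
    neighbours = sorted(rated, key=similarities.get, reverse=True)[:n]
    numerator = sum(similarities[j] * user_ratings[j] for j in neighbours)
    denominator = sum(similarities[j] for j in neighbours)
    return numerator / denominator


print(round(predict(SIM_TO_MOVIE_1, USER_5), 2))  # 2.59
```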
Content-based Recommendations
Plan of Action
Item Profiles
For each item, create an item profile
Profile is a set of features
Movies: actors, director, …
People: set of friends
Text features:
Profile = set of "important" words in the item (document)
How to pick important words?
Using TF-IDF
(Term Frequency × Inverse Document Frequency)
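A small TF-IDF sketch for building text item profiles. It assumes one common variant: TF is a word's count divided by the count of the most frequent word in the same document, and IDF is log(N / n_w), where N is the number of documents and n_w the number containing word w; other normalisations and log bases are also used. The three toy documents and the names `tf_idf` and `item_profile` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy "documents" (one per item).
DOCS = [
    "the wizard lost the wand",
    "the rebels fight the empire",
    "the wizard duels the empire",
]


def tf_idf(docs):
    """Return one {word: TF-IDF weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each word occurs in.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    weights = []
    for words in tokenized:
        counts = Counter(words)
        max_count = max(counts.values())
        weights.append({
            w: (c / max_count) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return weights


def item_profile(word_weights, k=3):
    """Keep the k highest-scoring words as the item's profile."""
    return sorted(word_weights, key=word_weights.get, reverse=True)[:k]


weights = tf_idf(DOCS)
print(weights[0]["the"])              # 0.0
print(item_profile(weights[0], k=2))  # ['lost', 'wand']
```

"the" occurs in every document, so its IDF (and therefore its weight) is 0 and it never enters a profile, while words that occur in only one document score highest.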
User Profiles
• The user has rated items with profiles i1, i2, …, in
• Simple:
(weighted) average of the rated item profiles
• Variant:
Normalize the weights using the user's average rating
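The "simple" and "variant" user profiles can be sketched with item profiles stored as feature dictionaries. The sketch reads "(weighted) average" as the sum of rating × item profile divided by the number of rated items, and the variant as subtracting the user's mean rating from each rating first; the movies, the boolean actor features and the ratings are all hypothetical.

```python
# Hypothetical item profiles: boolean "actor appears in movie" features.
ITEM_PROFILES = {
    "m1": {"actor_X": 1, "actor_Y": 0},
    "m2": {"actor_X": 1, "actor_Y": 1},
    "m3": {"actor_X": 0, "actor_Y": 1},
    "m4": {"actor_X": 0, "actor_Y": 1},
}


def user_profile(ratings, normalize=False):
    """Average of the rated item profiles, each weighted by the user's rating.

    With normalize=True the user's mean rating is subtracted first, so items
    rated below average pull the profile down instead of up.
    """
    mean = sum(ratings.values()) / len(ratings)
    profile = {}
    for item, rating in ratings.items():
        weight = rating - mean if normalize else rating
        for feature, value in ITEM_PROFILES[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight * value
    return {feature: total / len(ratings) for feature, total in profile.items()}


# Boolean utility matrix: every rated item gets weight 1.
print(user_profile({"m1": 1, "m2": 1, "m3": 1, "m4": 1}))
# {'actor_X': 0.5, 'actor_Y': 0.75}

# Star ratings, normalized by the user's mean rating (3).
print(user_profile({"m1": 5, "m2": 4, "m3": 1, "m4": 2}, normalize=True))
# {'actor_X': 0.75, 'actor_Y': -0.5}
```

Without normalization the same star ratings give both actors a positive weight (2.25 and 1.75), hiding the fact that this user rated actor_Y's movies below average.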
Example 1: Boolean Utility Matrix
Example 2: Star Ratings
INVISIBLE WEB