How do we learn new words?
What is the distributional hypothesis?
Words that appear in similar contexts tend to have similar meanings
In distributional semantics, we want to find a function f that
takes in the contexts a word appears in and compresses them into a vector that captures the meaning of the word
meaning(w) = f(c1, c2, c3, c4)
How do we find the function f?
use co-occurrence vectors
what is a co-occurrence vector?
collect a corpus of documents or sentences
apply basic preprocessing like lower case
count how many times word u appears in the same context as word v (e.g. the same sentence or a fixed-size window)
the meaning of u is the vector [count(u, v1), count(u, v2), …]
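
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the notes do not say what "appears with" means, so the sketch assumes two words co-occur when they are in the same sentence, and it assumes whitespace tokenization. The function name `cooccurrence_vectors` and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """Return (vocab, vectors), where vectors[u][i] == count(u, vocab[i]).

    Assumption: u and v "appear together" when they occur in the same sentence.
    """
    counts = defaultdict(Counter)
    vocab_set = set()
    for sentence in sentences:
        # Basic preprocessing: lowercase, then split on whitespace (an assumption).
        tokens = sentence.lower().split()
        vocab_set.update(tokens)
        for i, u in enumerate(tokens):
            for j, v in enumerate(tokens):
                if i != j:  # do not count a token as co-occurring with itself
                    counts[u][v] += 1
    vocab = sorted(vocab_set)
    # The meaning of u is the vector [count(u, v1), count(u, v2), ...].
    vectors = {u: [counts[u][v] for v in vocab] for u in vocab}
    return vocab, vectors


corpus = ["The cat drinks milk", "The dog drinks water"]  # toy corpus
vocab, vectors = cooccurrence_vectors(corpus)
print(vocab)
print(vectors["cat"])
print(vectors["dog"])
```

In this toy corpus, "cat" and "dog" never appear together, yet their vectors overlap on "the" and "drinks", which is the distributional hypothesis at small scale.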
what are the benefits of co-occurrence vectors? (3)
what are the disadvantages of co-occurrence vectors?
distributional semantics beyond words
can't capture all aspects of semantics