CBOW steps
From the input matrix, take the embeddings of the context words
Average the context embeddings to get v'
Multiply this v' vector by the output matrix
Get the scores and apply softmax
Learn from the result and backpropagate to tune the input and output matrices (a NumPy sketch follows)
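A minimal NumPy sketch of one CBOW training step, matching the steps above. The vocabulary size, embedding dimension, and word indices are illustrative assumptions, not from any specific dataset.

```python
import numpy as np

V, D = 10, 4                                # assumed vocab size, embedding dim
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (embedding) matrix
W_out = rng.normal(scale=0.1, size=(D, V))  # output matrix

context = [1, 3, 5, 7]                      # illustrative context word indices
target = 4                                  # illustrative center word index

v_prime = W_in[context].mean(axis=0)        # look up and average context embeddings
scores = v_prime @ W_out                    # project onto vocabulary scores
probs = np.exp(scores - scores.max())
probs /= probs.sum()                        # softmax

# Cross-entropy gradient, then update both matrices
d_scores = probs.copy()
d_scores[target] -= 1.0
lr = 0.1
d_v = W_out @ d_scores
W_out -= lr * np.outer(v_prime, d_scores)
W_in[context] -= lr * d_v / len(context)    # gradient splits over the averaged words
```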
Transposed convolution output dimension calculation
o = (i - 1) * s + k - 2p, where i is the input size, s the stride, k the kernel size, and p the padding (assuming dilation 1 and no output padding)
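A quick sanity check of the formula, assuming PyTorch is available; the sizes are arbitrary examples.

```python
import torch
import torch.nn as nn

i, k, s, p = 8, 3, 2, 1                     # example input size, kernel, stride, padding
x = torch.randn(1, 1, i, i)
deconv = nn.ConvTranspose2d(1, 1, kernel_size=k, stride=s, padding=p)

expected = (i - 1) * s + k - 2 * p          # (8-1)*2 + 3 - 2*1 = 15
print(deconv(x).shape[-1], expected)        # both print 15
```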
Limitations of one-hot encoding
Produces sparse, high-dimensional vectors and captures no semantic relationships between words
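A small sketch of the problem, using an illustrative toy vocabulary: every pair of distinct one-hot vectors has dot product zero, so related and unrelated words are equally dissimilar, and vector length grows with vocabulary size.

```python
import numpy as np

vocab = ["king", "queen", "apple", "banana", "car"]  # illustrative vocabulary
one_hot = np.eye(len(vocab))                          # one row per word

# "king" is exactly as dissimilar to "queen" as to "car": no semantics captured
print(one_hot[0] @ one_hot[1])   # king . queen -> 0.0
print(one_hot[0] @ one_hot[4])   # king . car   -> 0.0
```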
When does a TF-IDF vector outperform bag of words?
When the discriminative power of a word is crucial and frequent common words are noisy
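A sketch of the contrast, assuming scikit-learn is available; the corpus is a made-up example. TF-IDF downweights "the" (present in every document) while boosting the rarer, more discriminative "refund", whereas bag of words uses raw counts alone.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the service was fine",
        "the delivery was late",
        "the customer wants a refund"]

bow = CountVectorizer().fit(docs)       # raw counts: "the" weighs as much as anything
tfidf = TfidfVectorizer().fit(docs)

print(tfidf.idf_[tfidf.vocabulary_["the"]])     # low idf -> downweighted
print(tfidf.idf_[tfidf.vocabulary_["refund"]])  # high idf -> boosted
```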
Trade-off between CBOW and Skip-gram
CBOW is faster and better for frequent words (averaging the context smooths toward common patterns)
Skip-gram is slower but better for rare words (each context-target pair is a separate training example, giving rare words more signal)
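If gensim is installed, the two architectures are just a flag away from each other; the sentences below are placeholder token lists.

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]  # illustrative corpus

cbow = Word2Vec(sentences, vector_size=50, min_count=1, sg=0)  # sg=0: CBOW
skip = Word2Vec(sentences, vector_size=50, min_count=1, sg=1)  # sg=1: Skip-gram
```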