# GloVe Embedding

As we mentioned in the Word2Vec notebook, training your Embedding Matrix involves setting up some fake task for a Neural Network to optimize over.

Stanford’s GloVe Embedding model is very similar to the Word2Vec implementation, but with one crucial difference:

GloVe places greater importance on the *frequency* of co-occurrence between two words.

## Training Notes

First, an enormous `vocab_size x vocab_size` co-occurrence matrix is constructed by making a pass through your entire corpus to collect all unique words and count how often each pair of words appears together.
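As a rough illustration, a co-occurrence matrix for a small, pre-tokenized corpus might be built like this (the `build_cooccurrence` helper and its `window_size` parameter are just for this sketch, not part of GloVe's reference implementation):

```python
from collections import defaultdict

import numpy as np


def build_cooccurrence(corpus, window_size=10):
    """Count how often each word appears within `window_size` words of another.

    `corpus` is assumed to be a list of tokenized sentences (lists of strings).
    Returns the vocabulary (word -> index) and a dense vocab_size x vocab_size
    matrix X, where X[i, j] is the number of times word j occurs in the
    context of word i.
    """
    # First pass: collect all unique words.
    vocab = {}
    for sentence in corpus:
        for word in sentence:
            vocab.setdefault(word, len(vocab))

    # Second pass: count symmetric co-occurrences within the window.
    counts = defaultdict(float)
    for sentence in corpus:
        ids = [vocab[w] for w in sentence]
        for pos, center in enumerate(ids):
            for ctx in ids[max(0, pos - window_size):pos]:
                counts[(center, ctx)] += 1.0
                counts[(ctx, center)] += 1.0

    X = np.zeros((len(vocab), len(vocab)))
    for (i, j), c in counts.items():
        X[i, j] = c
    return vocab, X


vocab, X = build_cooccurrence([["the", "cat", "sat", "on", "the", "mat"]], window_size=2)
```

(The official GloVe tool also down-weights each co-occurrence by the distance between the two words; plain counts are used here to keep the sketch short.)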

Then, to reduce dimensionality, we look for a factorization that minimizes the following:

$\sum_i \sum_j f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j - \log X_{ij} \right)^2$

where `X_ij` is the number of times `i` appears in the context of `j` (say, within a proximity of 10 words), `f()` is a *weighting term* that zeros out if the two words never appear near each other, and `b_i` and `b_j` are word-level bias terms.
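
Under those definitions, the objective is just a weighted least-squares problem over the non-zero entries of `X`. A minimal dense sketch of the loss and its gradients is below, assuming the weighting function from the GloVe paper, `f(x) = min((x / x_max)^alpha, 1)` with `x_max = 100` and `alpha = 0.75`; the function and variable names mirror the formula but are otherwise illustrative:

```python
import numpy as np


def glove_loss_and_grads(X, theta, e, b_i, b_j, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective and its gradients (dense form).

    X        : vocab_size x vocab_size co-occurrence counts
    theta, e : vocab_size x dim word vectors (theta_i and e_j in the formula)
    b_i, b_j : length-vocab_size bias vectors
    """
    # Weighting term f(X_ij): zero when the pair never co-occurs, capped at 1.
    f = np.where(X > 0, np.minimum((X / x_max) ** alpha, 1.0), 0.0)

    # log(X_ij) is only needed where X_ij > 0; f() already zeros the rest.
    log_X = np.log(np.where(X > 0, X, 1.0))

    # Residual inside the square: theta_i^T e_j + b_i + b_j - log(X_ij)
    diff = theta @ e.T + b_i[:, None] + b_j[None, :] - log_X

    loss = np.sum(f * diff ** 2)
    weighted = 2.0 * f * diff
    grads = {
        "theta": weighted @ e,        # d loss / d theta_i
        "e": weighted.T @ theta,      # d loss / d e_j
        "b_i": weighted.sum(axis=1),  # d loss / d b_i
        "b_j": weighted.sum(axis=0),  # d loss / d b_j
    }
    return loss, grads
```

In practice the reference implementation only iterates over the non-zero entries of `X` and optimizes with AdaGrad, but the dense form above maps one-to-one onto the double summation.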

### Runtime

Because the co-occurrence matrix construction is exhaustive, GloVe carries a considerable up-front computational cost. This calculation, however, *does* lend itself to some pretty straightforward parallelization.
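
For instance, because co-occurrence counts are additive, the corpus can be split into chunks that are counted independently and then summed. A rough sketch using Python's `multiprocessing` is below; the `count_chunk` / `parallel_cooccurrence` helpers are illustrative (not how the official GloVe tool is structured) and assume a shared `vocab` like the one built in the first sketch above:

```python
from multiprocessing import Pool

import numpy as np


def count_chunk(args):
    # Count co-occurrences for one chunk of sentences against a fixed, shared
    # vocabulary so every worker's matrix uses identical indexing.
    chunk, vocab, window_size = args
    X = np.zeros((len(vocab), len(vocab)))
    for sentence in chunk:
        ids = [vocab[w] for w in sentence if w in vocab]
        for pos, center in enumerate(ids):
            for ctx in ids[max(0, pos - window_size):pos]:
                X[center, ctx] += 1.0
                X[ctx, center] += 1.0
    return X


def parallel_cooccurrence(corpus, vocab, window_size=10, workers=4):
    # Co-occurrence counts are additive, so each worker counts its own chunk
    # and the partial matrices are summed at the end.
    chunk_size = (len(corpus) + workers - 1) // workers
    chunks = [corpus[i:i + chunk_size] for i in range(0, len(corpus), chunk_size)]
    with Pool(len(chunks)) as pool:
        partials = pool.map(count_chunk, [(c, vocab, window_size) for c in chunks])
    return sum(partials)
```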