Why deep learning?

Language model

n-gram


The word vector


Neural network model

What is the link between the word vector model and the neural network model?

Hierarchical Softmax

Two schemes of neural network implementation:

CBOW

Huffman tree
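
As a rough illustration of how a Huffman tree can be built from word frequencies, here is a minimal sketch. The function name `huffman_codes` and the toy frequency table are hypothetical, chosen only for this example. The point it shows is that frequent words end up with short codes (short paths from the root) and rare words with long ones.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {word: binary code string} for a word -> frequency dict.

    Hypothetical helper for illustration only.
    """
    tie = count()  # tie-breaker so heapq never has to compare node payloads
    heap = [(f, next(tie), w) for w, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}

    # Repeatedly merge the two least frequent nodes into one internal node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are word strings.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Toy frequencies (made up for this example).
freqs = {"the": 50, "cat": 10, "sat": 8, "mat": 4, "zebra": 1}
codes = huffman_codes(freqs)
for word in sorted(codes, key=lambda w: len(codes[w])):
    print(word, codes[word])
```

The code length of a word equals the number of binary decisions hierarchical softmax has to evaluate for that word.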









CBOW model example


CBOW objective function

Gradient ascent solution

To find the maximum of the objective, use gradient ascent.
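
A minimal sketch of gradient ascent, assuming the objective is the log-likelihood of a single binary (sigmoid) decision, as at one node of the tree. The names `theta`, `x`, `label`, and the learning rate `lr` are assumptions made for this example and do not come from the notes. Each step moves the parameters in the direction of the gradient, so the log-likelihood increases.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, x, label):
    """Log-likelihood of one binary decision with label in {0, 1}."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return label * math.log(p) + (1 - label) * math.log(1.0 - p)


def ascent_step(theta, x, label, lr=0.1):
    """One gradient ascent update: theta <- theta + lr * gradient."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    g = label - p  # derivative of the log-likelihood w.r.t. the dot product
    return [t + lr * g * xi for t, xi in zip(theta, x)]


# Toy input vector and parameters (made up for this example).
x = [0.5, -1.0, 0.25]
theta = [0.0, 0.0, 0.0]

before = log_likelihood(theta, x, 1)
for _ in range(20):
    theta = ascent_step(theta, x, 1)
after = log_likelihood(theta, x, 1)

print(before, after)
```

Note the plus sign in the update: gradient descent on a loss would subtract the gradient, while gradient ascent on a likelihood adds it.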

Negative Sampling: the modeling method generally adopted (simpler)

If the vocabulary is very large, what about the words whose nodes sit in the middle or near the bottom of the Huffman tree? Their paths from the root are long, so the computational cost is still high.
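
Negative sampling sidesteps this by scoring the true word against only a handful of randomly drawn "negative" words instead of walking a tree path. The sketch below shows just the sampling step. The helper `make_sampler`, the toy frequencies, and the seed are hypothetical; raising frequencies to the power 0.75 is the choice commonly cited for word2vec and is an assumption here, not something stated in these notes.

```python
import random


def make_sampler(freqs, power=0.75, seed=0):
    """Build a function that draws k negative words for a given target.

    Hypothetical helper for illustration only.
    """
    words = list(freqs)
    # Smoothed unigram weights: frequent words are still favoured,
    # but less strongly than under raw frequency.
    weights = [freqs[w] ** power for w in words]
    rng = random.Random(seed)

    def sample(target, k):
        negatives = []
        while len(negatives) < k:
            w = rng.choices(words, weights=weights, k=1)[0]
            if w != target:  # never use the true word as a negative
                negatives.append(w)
        return negatives

    return sample


# Toy frequencies (made up for this example).
freqs = {"the": 50, "cat": 10, "sat": 8, "mat": 4, "zebra": 1}
sample = make_sampler(freqs)
negatives = sample("cat", 3)
print(negatives)
```

With k negatives, each training example costs k + 1 sigmoid evaluations regardless of vocabulary size or how rare the target word is.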