Article/Han spirit

1. What is a word vector

As the figure below illustrates: a spectrogram already carries rich features of the speech signal, and an image's natural dense matrix representation can be consumed by a computer directly. The purpose of word vectors is to turn text, which a computer cannot understand directly, into numeric vectors it can process, while preserving the syntactic and semantic information carried by the text itself.

2. How to build word vectors

Broadly speaking, the approaches fall into two categories: discrete representations and distributed representations.

2.1. Discrete representations

One-hot representation, also called one-of-N encoding, uses an N-bit state register to encode N distinct values: each value gets its own register bit, and at any time exactly one bit is set to 1 while the rest are 0. Its advantages are that it is easy to understand and easy to generate. On the other hand, the method has gradually been abandoned because of the curse of dimensionality and because all the vectors are mutually orthogonal, so no syntactic or semantic information is retained. Take the corpus "I like deep learning. I like NLP. I enjoy flying." as an example. The vocabulary is {I, like, deep, learning, NLP, enjoy, flying}, and each word is encoded as follows: I = [1,0,0,0,0,0,0]; like = [0,1,0,0,0,0,0]; and so on.
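As a rough illustration (my own sketch, not from the original article), the following Python snippet builds one-hot vectors for the toy corpus above; the vocabulary order is assumed to follow first appearance, matching the listing in the text.

```python
# Minimal one-hot encoding sketch for the toy corpus above.
corpus = "I like deep learning. I like NLP. I enjoy flying."

# Build the vocabulary in order of first appearance (punctuation stripped).
tokens = [w.strip(".") for w in corpus.split()]
vocab = []
for tok in tokens:
    if tok not in vocab:
        vocab.append(tok)

# One-hot vector: a single 1 at the word's index, 0 everywhere else.
def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(vocab)            # ['I', 'like', 'deep', 'learning', 'NLP', 'enjoy', 'flying']
print(one_hot("I"))     # [1, 0, 0, 0, 0, 0, 0]
print(one_hot("like"))  # [0, 1, 0, 0, 0, 0, 0]
```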

2.2. Distributed representations

Discrete representations throw away the semantic information in the text, which stalled progress on semantic understanding for a time. The broad family of distributed representation methods grew out of a series of attempts to fold semantic information back into the vector encoding.


2.2.1. Based on statistics

Statistics-based methods try to preserve semantic information by taking the positional relationships between words into account; the co-occurrence matrix approach is a representative example.

2.2.1.1. Based on co-occurrence matrix

Set a window size and scan the text with that window, recording every pair of words that co-occurs inside it. Taking the text from section 2.1 as an example again and assuming a window size of 2, the 0 and the 2 in the first row of the figure below mean that the pair (I, I) appears 0 times in the text and the pair (I, like) appears 2 times, and so on.
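A rough sketch of how such a matrix can be built (the exact counting scheme is my assumption, since the original only shows the figure): for every token, count the words that fall within a symmetric window of 2 around it.

```python
import numpy as np

corpus = "I like deep learning. I like NLP. I enjoy flying."
tokens = [w.strip(".") for w in corpus.split()]
vocab = sorted(set(tokens), key=tokens.index)   # keep first-appearance order
idx = {w: i for i, w in enumerate(vocab)}

window = 2
co = np.zeros((len(vocab), len(vocab)), dtype=int)

# For each position, count every word within `window` tokens on either side.
for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if i != j:
            co[idx[w], idx[tokens[j]]] += 1

print(vocab)
print(co)   # e.g. the (I, like) cell counts how often "like" falls inside I's window
```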


2.2.1.2. Based on singular value decomposition

The singular value decomposition (SVD) approach is mainly used to alleviate the high dimensionality caused by the co-occurrence matrix method: the idea is to project the vectors into a low-dimensional space via SVD for subsequent processing.
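A minimal numpy sketch of this idea (my own illustration; the count matrix below is a stand-in for the co-occurrence matrix, and its values are placeholders): keep only the top-k singular directions and use the corresponding rows as low-dimensional word vectors.

```python
import numpy as np

# Stand-in vocab x vocab co-occurrence matrix (illustrative counts only).
co = np.array([[0, 2, 1, 1, 1, 1, 1],
               [2, 0, 1, 1, 1, 0, 0],
               [1, 1, 0, 1, 0, 0, 0],
               [1, 1, 1, 0, 0, 0, 0],
               [1, 1, 0, 0, 0, 0, 0],
               [1, 0, 0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0, 1, 0]], dtype=float)

U, S, Vt = np.linalg.svd(co, full_matrices=False)

k = 2                              # target dimensionality (an arbitrary choice here)
word_vectors = U[:, :k] * S[:k]    # each row is now a k-dimensional word vector

print(word_vectors.shape)          # (7, 2)
```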

2.2.2. Based on language model

For a word sequence w1, w2, …, wn, a language model computes the probability of the whole sequence, which the chain rule factorizes as:

P(w1, w2, …, wn) = P(w1) · P(w2 | w1) · … · P(wn | w1, …, wn-1)
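As a toy illustration (my own sketch, not from the article), each chain-rule factor can be approximated with a bigram model that conditions only on the previous word:

```python
from collections import Counter

# Tiny training "corpus" with sentence-boundary markers.
tokens = "<s> I like deep learning </s> <s> I like NLP </s>".split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

# Bigram approximation: P(w_i | w_1..w_{i-1}) ~ P(w_i | w_{i-1})
def sentence_prob(words):
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

print(sentence_prob("<s> I like NLP </s>".split()))  # chain-rule product of bigram factors
```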


2.2.2.1. Word2vec

Word2vec, proposed by Google in 2013, uses a shallow neural network to learn word vectors. It comes in two designs, CBOW and Skip-gram. CBOW predicts the middle word from its context; Skip-gram, conversely, predicts the context from the middle word. Depending on the size of the training corpus and how much weight rare words should receive, the corresponding strategy can be chosen flexibly.
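As an illustrative sketch (assuming gensim 4.x; not part of the original article), both variants can be trained with the gensim library, switching between them via the `sg` flag:

```python
from gensim.models import Word2Vec

# Tiny tokenized corpus; a real corpus would be far larger.
sentences = [
    ["I", "like", "deep", "learning"],
    ["I", "like", "NLP"],
    ["I", "enjoy", "flying"],
]

# sg=0 -> CBOW (context predicts the middle word), sg=1 -> Skip-gram.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["NLP"].shape)                      # (50,)
print(skipgram.wv.most_similar("NLP", topn=3))   # nearest neighbours in vector space
```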


2.2.2.2. BERT

Following word2vec, GPT, ELMo and other models, BERT innovatively combined the Transformer architecture with bidirectional context and introduced a masked language model (masked LM) loss plus a next sentence prediction loss, achieving the best results on 11 NLP downstream tasks. The figure below shows, on the left, that the text embedding is the sum of three parts (token, segment and position information), which is then fed into the Transformer structure on the right for text encoding.

For example, in next sentence prediction, the pair "This question is too difficult" / "I won't do it" is a positive example, while "This question is too difficult" / "618, ready to chop your hands off" is labeled False, indicating that the two sentences are not contextual.
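A minimal sketch of obtaining such encodings with the Hugging Face transformers library (my own illustration; `bert-base-uncased` is just one public checkpoint, not necessarily the one used in the article):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Sentence-pair input: token, segment (token_type) and position embeddings
# are summed internally before the Transformer layers.
inputs = tokenizer("This question is too difficult.", "I won't do it.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) contextual token encodings
```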

3. Preliminary application in front-end intelligence

3.1 Problem Description

In the current intelligent code generation task, P2C, we want to generate the corresponding code directly from the requirements document; a small example shows the role NLP plays here. As shown in the figure below, we first abstract the problem into two questions. Question 1: identify the intent of a sentence. Question 2: extract all entity relations in a single sentence, i.e. SPO (subject-predicate-object) triples.

3.2 Solution Description

3.2.1. Hierarchical multi-label classification

Transformer is used as the encoding layer of the model, and BCEWithLogitsLoss, the usual loss function for multi-label classification tasks, is used as the loss. In addition, a recursive regularization term is adopted to constrain the label hierarchy information.
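A simplified PyTorch sketch of the classification head and loss (my own illustration; the label count, hidden size, toy hierarchy and regularization weight are placeholders, and the parent-child weight penalty is only a rough stand-in for the recursive regularization described above):

```python
import torch
import torch.nn as nn

NUM_LABELS = 32      # placeholder: size of the (flattened) label hierarchy
HIDDEN = 768         # placeholder: Transformer encoder output size

classifier = nn.Linear(HIDDEN, NUM_LABELS)   # multi-label head on top of the encoder
criterion = nn.BCEWithLogitsLoss()           # standard multi-label loss

# Suppose `sentence_repr` is the pooled Transformer encoding of a batch of sentences.
sentence_repr = torch.randn(4, HIDDEN)       # fake batch of 4 for illustration
targets = torch.randint(0, 2, (4, NUM_LABELS)).float()

logits = classifier(sentence_repr)
loss = criterion(logits, targets)

# Recursive regularization (rough stand-in): keep a child label's weights close
# to its parent's weights, with parent_of[c] giving the parent index.
parent_of = {c: max(0, c - 1) for c in range(1, NUM_LABELS)}   # toy hierarchy
reg = sum(((classifier.weight[c] - classifier.weight[p]) ** 2).sum()
          for c, p in parent_of.items())
loss = loss + 1e-4 * reg                     # placeholder regularization weight

loss.backward()
print(float(loss))
```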

3.2.2. Entity relationship extraction

The BERT model serves as the main framework: the text is taken as sentence A and the relation to be predicted as sentence B, and together they form the input to the BERT model. By combining the classification task of relation prediction and the sequence labeling task of entity extraction in a single model structure, we expect that feeding in the known relation as third-party knowledge will also improve the accuracy of entity extraction.
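A schematic sketch of that joint setup, assuming the Hugging Face transformers library (the two heads, tag set, pooling choice and `bert-base-uncased` checkpoint are my own simplifications, not the article's exact architecture):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

NUM_BIO_TAGS = 5   # placeholder: e.g. O, B-SUB, I-SUB, B-OBJ, I-OBJ

class JointRelationEntityModel(nn.Module):
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        hidden = self.bert.config.hidden_size
        self.rel_head = nn.Linear(hidden, 1)              # does the relation hold? (classification)
        self.tag_head = nn.Linear(hidden, NUM_BIO_TAGS)   # BIO tags for entity spans (sequence labeling)

    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        rel_logit = self.rel_head(out.pooler_output)       # sentence-pair level prediction
        tag_logits = self.tag_head(out.last_hidden_state)  # per-token predictions
        return rel_logit, tag_logits

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = JointRelationEntityModel()

# Sentence A = the text, sentence B = the candidate relation (hypothetical example).
enc = tokenizer("The search button opens the results page", "opens", return_tensors="pt")
rel_logit, tag_logits = model(enc["input_ids"], enc["attention_mask"], enc["token_type_ids"])
print(rel_logit.shape, tag_logits.shape)   # (1, 1) and (1, seq_len, NUM_BIO_TAGS)
```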