
Overview

In the last article, we covered the basics of RNNs. Now we introduce the basics of text generation, mainly so that we can use RNNs flexibly. Real text-generation projects are more complex in practice than this, but the basic principles are the same, so this serves as a primer. RNN basics review link: juejin.cn/post/697234…

Principle

Here we use an RNN to generate text; other models that can handle time-series data, such as LSTM, would also work. Suppose we have trained an RNN model to predict the next character, and that we have limited the input length to 21 characters. Here is an example:

Input: "The cat sat on the ma"Copy the code

The 21-character text is split into character-level inputs and fed into the model one character at a time. The RNN accumulates the input information; the final state vector h is transformed by a fully connected layer and a Softmax classifier, and the output is a probability distribution over the candidate characters.

In the example above, inputting "the cat sat on the ma" yields a probability distribution over the 26 letters and a number of other characters in use (punctuation, spaces, etc.):

"A" --> 0.05 "b" --> 0.03 "c" --> 0.054... "T" -- -- > 0.06... "--> 0.01"; -- > 0.04Copy the code

At this point, the predicted probability of the next character "t" is the highest, so "t" is chosen as the next character; concatenating "t" onto "the cat sat on the ma" gives "the cat sat on the mat". Then we take the last 21 characters, "he cat sat on the mat", and feed them into the model:

Input: "He cat sat on the mat"Copy the code

The model again outputs a probability distribution for the next character. This time "." is the most probable, so we take ".". Appending it to "the cat sat on the mat" gives "the cat sat on the mat.". If we need to continue, we repeat the above process; if the generation ends here, the final text is:

the cat sat on the mat.

Usually we train on data of the same kind as the target. To generate poetry, you can train the model on Tang and Song poems; to generate lyrics, you can train it on Jay Chou's lyrics, for example.

Three ways to choose the predicted next character

Once we have the probability distribution, there are three common ways to choose the next character.

The first is to select the character with the highest probability in the distribution, as in the example above. This approach, while the simplest, is not the best: the prediction is almost entirely deterministic, so the generated text lacks variety. The code is as follows:

import numpy as np

next_index = np.argmax(pred)  # greedy: index of the highest-probability character

The second method samples randomly from the multinomial distribution: a character's predicted probability is the probability that it is chosen as the next character. In practice, the values in the probability distribution are often very small and the probabilities of many candidates barely differ, so every candidate has a similar chance of being selected and the next character is highly random. For example, if one character's predicted probability is 0.1 and the other characters' probabilities are only slightly less than 0.1, then each character is almost equally likely to be chosen. This approach is too random, and the resulting text tends to be riddled with grammar and spelling errors. The code is as follows:

next_onehot = np.random.multinomial(1, pred, 1)  # draw one sample from the distribution
next_index = np.argmax(next_onehot)              # index of the sampled character

The third method sits between the other two: the next character is somewhat random, but not overly so. This is adjusted with a temperature parameter, a decimal between 0 and 1. If it is 1, this reduces to the second method; as it approaches 0, the distribution is sharpened to different degrees, so characters with high probability become even more likely to be selected and characters with low probability even less likely. This gives a clear separation between candidates, so the situation in the second method does not occur. The code is as follows:

pred = pred ** (1 / temperature)  # temperature < 1 sharpens the distribution
pred = pred / np.sum(pred)        # renormalize so the probabilities sum to 1
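
Putting the three strategies together, here is a minimal sketch of a single sampling helper; the function name sample_next_index and the default behavior are my own choices for illustration, not from any particular library:

```python
import numpy as np

def sample_next_index(pred, temperature=None):
    """Pick the index of the next character from a probability distribution.

    temperature=None -> greedy (method 1); temperature=1 -> plain multinomial
    sampling (method 2); 0 < temperature < 1 -> sharpened sampling (method 3).
    """
    if temperature is None:
        return int(np.argmax(pred))               # method 1: most probable character
    pred = np.asarray(pred, dtype=np.float64)
    pred = pred ** (1.0 / temperature)            # sharpen when temperature < 1
    pred = pred / np.sum(pred)                    # renormalize to a valid distribution
    next_onehot = np.random.multinomial(1, pred, 1)
    return int(np.argmax(next_onehot))            # index of the sampled character
```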

Training

Suppose we have one sentence as training data, as follows:

Machine learning is a subset of artificial intelligence.

We set two parameters, len = 5 and stride = 3, where len is the input length and stride is the step size. We take 5 characters as the input and the character that follows them as the label, as follows:

Input: "Machi"  Target: "n"

Then, since the stride is 3, we shift 3 characters to the right in the text, again take 5 characters as the input and the next character as the label:

Input: "hine" target: "l"Copy the code

Shifting 3 characters to the right in this way each time, the newly obtained 5 characters and the character that follows them are fed to the model as training data and label, so that the model can learn the features of the text. Each training example is a (segment, next character) pair. All the training data obtained is:

Input: "Machi"  Target: "n"
Input: "hine "  Target: "l"
Input: "e lea"  Target: "r"
...
Input: "ligen"  Target: "c"

Then we train the model extensively on this data, after which it can be used to generate new text! A minimal sketch of building these training pairs is shown below.
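
As a sketch of the sliding-window construction above (the variable names seg_len and stride are my own):

```python
# Build (segment, next_char) pairs with a 5-character window and a stride of 3.
text = "Machine learning is a subset of artificial intelligence."
seg_len, stride = 5, 3

segments, next_chars = [], []
for i in range(0, len(text) - seg_len, stride):
    segments.append(text[i:i + seg_len])   # 5-character input
    next_chars.append(text[i + seg_len])   # the character that follows it

print(segments[0], "->", next_chars[0])    # Machi -> n
```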

Conclusion

Training the model generally involves three steps:

1. Prepare the training data as (segment, next_char) pairs.
2. Use one-hot encoding for the characters: segment becomes an l*v matrix and next_char a v*1 vector, where l is the input length and v is the total number of characters.
3. Build the network: the input is the l*v matrix; an RNN or LSTM captures the text features, the final state is transformed by a fully connected layer with Softmax as the activation function, and the output is a v*1 probability distribution. The methods for selecting the next character are described above. A minimal model sketch follows this list.
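
As an illustration of steps 2 and 3, here is a minimal sketch assuming Keras; the layer size of 128 and the vocabulary size of 60 are my own assumptions, not values from the original post:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

seg_len, vocab_size = 5, 60   # l and v; the vocabulary size here is assumed

# Step 2: one-hot encode the data. x has shape (num_samples, l, v) and
# y has shape (num_samples, v); np.eye(vocab_size)[indices] is one simple way.

# Step 3: an RNN followed by a fully connected Softmax layer.
model = keras.Sequential([
    layers.SimpleRNN(128, input_shape=(seg_len, vocab_size)),  # or layers.LSTM
    layers.Dense(vocab_size, activation="softmax"),            # v-way distribution
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
# model.fit(x, y, batch_size=128, epochs=20)  # train on the (segment, next_char) pairs
```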

The process of generating text is generally as follows:

Normally, with the model trained, we enter a string as the seed input and use it as the beginning of the text to be generated, then repeat the following process (a sketch of this loop follows):

a) Represent the input with one-hot vectors and feed it into the model
b) Select a character from the probability distribution output by the network as the next predicted character
c) Concatenate the predicted character onto the previous text and take the new input text
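
A minimal sketch of that loop, reusing the hypothetical model, seg_len, and sample_next_index from the sketches above; char_to_idx and idx_to_char are assumed lookup tables between characters and vocabulary indices:

```python
def generate_text(model, seed, char_to_idx, idx_to_char, length=100, temperature=0.5):
    text = seed
    vocab_size = len(char_to_idx)
    for _ in range(length):
        segment = text[-seg_len:]                        # last l characters as input
        x = np.zeros((1, seg_len, vocab_size))
        for t, ch in enumerate(segment):
            x[0, t, char_to_idx[ch]] = 1.0               # a) one-hot encode the input
        pred = model.predict(x, verbose=0)[0]            # v-dim probability distribution
        next_idx = sample_next_index(pred, temperature)  # b) pick the next character
        text += idx_to_char[next_idx]                    # c) concatenate and continue
    return text
```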

Examples

Here are two small examples I implemented earlier, which you can use to review RNN and LSTM:

https://juejin.cn/post/6949412997903155230
https://juejin.cn/post/6949412624215834638

There are also a number of open-source text-generation projects on GitHub. They are slightly more complex to implement, but work on the same principles described here. Here are two:

https://github.com/wandouduoduo/SunRnn
https://github.com/stardut/Text-Generate-RNN