Introduction

These are personal practice notes. Using Python and Keras, I build a neural network that generates lyrics, trained on Wang Leehom's lyrics: 91 songs, with two columns, song Title and Lyrics.

Import the required packages

import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.layers import Dense, LSTM, Embedding  # older Keras also exposed keras.layers.embeddings.Embedding
from keras.utils import to_categorical
from keras.preprocessing.text import Tokenizer  # legacy Keras 2.x preprocessing API
from keras.models import Sequential, load_model
from keras.preprocessing.sequence import pad_sequences

%matplotlib inline

Data

The data is an Excel file of Wang Leehom's lyrics: 91 songs, each with two attributes, Title and Lyrics.

file_path = '../input/wanglihong/wanglihong.xlsx'
songs = pd.read_excel(file_path)
print(songs.shape)
songs.head()
(91, 2)
Title Lyrics
0 Bridge of Faith Qin Mingyue Han guan Wan Li Long March people have not yet but the dragon city will fly in not teach Hu Ma Degree Yinshan mountain smoke thousands of miles of mass graves in chaotic times lonely soul nobody…
1 DO U Love Me Look at me. Look at me. I want to ask you a question.
2 Dragon Dance It’s called singing with Dan Dan. He doesn’t know how to sing with Dan Dan.
3 FLOW Follow me Flow follow me Flow so free so free Yi Li A E Yi Li A O…
4 Happiness x 3 Loneliness x 3 happiness happiness happiness loneliness lonel…

Print the lyrics of the first five songs:

for i in range(5):
    print(i, '\n', songs['Lyrics'][i])
0 Qin Mingyue Han Guan Wanli Long march not yet but make the Dragon City fly will not teach Hu Madu Yinshan smoke thousands of miles of mass graves in the turbulent times lonely soul no visit silent sky pen and ink cold pen knife spring and Autumn in blood to talk about love and hate can not scrawl drums beat beat with trust to make a vow I will endure this fate like a bridge with fluttering flag fluttering you want to go please immediately draw a knife Love to write Talk about love, hate Don't scribble The world of mortals burn to burn In order to life and death Clear conscience who prove important A look at this fate is like a bridge story You go You and I remove shirt The Great Wall becomes a communal When warlords moon han The long march people did not return But make liuzhou fly will be not teach ma degrees yinshan hu Flesh fortification of arrows Armor blood reflected moonlight The distance Hu Jia push heartbroken Talk about love and hate can't be careless war drum beat beat make a vow with trust I'll make it through Love is like a bridge with flags fluttering fluttering you want to go please draw a knife immediately love write it all off talk about love and hate can't be careless world of people burn burn with life and death worthy to prove who is important This love is like a bridge story take a look go you and I take off our battle clothes Look at me Look at my eyes I want to ask you ask you a question What play are you and I still playing like life or life like a play The person you love is I really am I is another look if you would like I take off my mask to you, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me, all of Me Dododododododododo DoDoDoDo U Love Me look at Me look at my eyes ask you a question you, you, Me, Me Still play what play Play as life Life is 
like a play Whether the person you love me if I really another is I like If you are willing to remove my mask I give you all I all I have to give you You pick a bear The horse I have carried off the sunrise sunset Give me a response chanting my name Monkey King's charm with kids Do U Love Me Do U Love Me Do U Love Me Do U Love Me Do U Love Me Do U Love Me Do U Love Me DoDoDoDo Do U Love Me? Do U Love Me, Do U Want Me, Do U Want Me, Do U Want Me, Do U Love Me Mohammed is Du Minghan Du Dudu du ~ han This is called sing in abdomen He wouldn't sing in abdomen He wouldn't sing in abdomen Root root spikes with this root root root spikes root spikes He wouldn't sing in abdomen Root root spikes He wouldn't sing in abdomen Root root Root root root Ben don't even like root root Ben don't even like root root don't even know how to sing root root don't even know how to sing root root don't even know how to sing Root root don't even know how to sing Root root don't even know how to sing root root don't even know how to sing root root don't even know how to sing root root Why don't you come back why don't you come back root root root root why don't you come back Ah, ah, ah, ah, ah, ah The Flow... Flow I think I'm gonna rock I think I'When the rhythm starts everybody can feel the notes play with me I can't stop I can't stop from head to toe No reason to be a little weird ABC Do re mi fa sol The rhythm gently swing with a subtle smile So free (the rhythm starts hey hey hey hey) so free (we all start) (We all start) Singing so clear inspiration starts pouring out A feeling woo... There are always two styles of handover between the same... 
(Just Flow) I think I'm gonna rock I think I'M gonna roll when I hear music Start my feet go When the rhythm starts everybody can feel the notes Play with me I can't stop I can't stop Follow me Flow (my fingers start) Follow me Flow (MY fingers start) So free (the rhythm starts) so free (We all start) (we all start'm gonna rock I think I'M gonna roll When I hear music start my feet start to go (my feet start to go) When the rhythm starts to turn everybody can feel the notes play together I can't stop I can't stop I can't stop I think I'm gonna rock I think I'I can't stop, I can't stop, follow me Flow (follow me Flow) Follow me Flow (follow me Flow) so free (so free) So free (so free) Flow with me, Flow with me, so free, so free happiness happiness loneliness loneliness loneliness just can't live just can'T die you seem to see the shadows of the building come alive and surround it free to be uncomfortable Expecting colors are black and white Who's your friend who's your true love What are you trying to prove woo yeah all your fragile I know I just wanna get you out of here get out of your nest you don't have to be the one you can get out get out get out of your nest you don't have to be the best happiness happiness happiness loneliness loneliness loneliness happiness happiness happiness loneliness loneliness loneliness just can't live just can'Newspapers in the morning and trash in the night half of you are in a coma trying to wake the other half up What do you want to find what do you want to know No one can give you the answer Woo yeah All your fragile I know I just wanna take you away right now you can get out get out get out of your nest you don't have to be the one you don't have to be the best you can get out get out get out of your nest happiness happiness happiness loneliness loneliness loneliness happiness happiness happiness loneliness loneliness loneliness just can't live just can'No one can give you the answer. What do you want to find

Regular expression

Take song 3 (FLOW) as an example.

Remove English letters and punctuation, keeping only the Chinese characters:

song = re.sub(r"[a-zA-Z()''…?.,!!,-]+", ' ', songs['Lyrics'][3])
song
'Follow me follow me so free, so free Follow me Follow me As soon as I hear my feet start when the rhythm starts to turn Everybody can feel the notes Play with me I can't stop I can't stop from head to toe No reason it's a little weird the rhythm gently swing with a subtle smile Cool jump This is new Why don't you join me Follow me, my fingers start to follow me, my fingers start to go so free and the rhythm start to go so free and we all start we all start to sing so clear and the inspiration start to pour a feeling how strong the feeling is no need to explain in detail there is always a difference between the two styles as soon as I hear my feet start and the rhythm starts to turn everyone can feel it I can't stop playing with that note I can't stop playing with that note I can't stop following me my fingers start following me I start snapping my fingers so free the rhythm starts so free we all start we all start as soon as you hear my feet start my feet start when the rhythm starts turning everybody can feel that note playing with that note I can't stop I can't stop As soon as you hear my feet start when the beat starts everybody can feel the notes play with me I can't stop I can't stop follow me follow me follow me follow me so free so free so free follow me so free so free '

That leaves too many runs of spaces; collapse each run into a single space:

re.sub(r'\s{2,}', ' ', song)
'Follow me follow me so free, so free Follow me Follow me As soon as I hear my feet start when the rhythm starts to turn Everybody can feel the notes Play with me I can't stop I can't stop from head to toe No reason it's a little weird the rhythm gently swing with a subtle smile Cool jump This is new Why don't you join me Follow me, my fingers start to follow me, my fingers start to go so free and the rhythm start to go so free and we all start we all start to sing so clear and the inspiration start to pour a feeling how strong the feeling is no need to explain in detail there is always a difference between the two styles as soon as I hear my feet start and the rhythm starts to turn everyone can feel it I can't stop playing with that note I can't stop playing with that note I can't stop following me my fingers start following me I start snapping my fingers so free the rhythm starts so free we all start we all start as soon as you hear my feet start my feet start when the rhythm starts turning everybody can feel that note playing with that note I can't stop I can't stop As soon as you hear my feet start when the beat starts everybody can feel the notes play with me I can't stop I can't stop follow me follow me follow me follow me so free so free so free follow me so free so free '

Good. Write it as a function

def regex_func(text):
    text = re.sub(r"[a-zA-Z()''…?.,!!,-]+", ' ', text)  # replace English letters and punctuation with a space
    text = re.sub(r'\s{2,}', ' ', text)  # collapse runs of whitespace into one space
    return text
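A quick self-contained check of the two substitutions, run on a made-up string that mixes Chinese characters, English and punctuation (the sample text is hypothetical; the function body is `regex_func` as defined above):

```python
import re

def regex_func(text):
    # Replace runs of Latin letters and the listed punctuation with a space
    text = re.sub(r"[a-zA-Z()''…?.,!!,-]+", ' ', text)
    # Collapse runs of two or more whitespace characters into a single space
    text = re.sub(r'\s{2,}', ' ', text)
    return text

sample = '你好 hello, 世界!'  # hypothetical mixed-language input
print(repr(regex_func(sample)))  # prints '你好 世界 '
```

The first pass leaves a gap of several spaces where "hello," was; the second pass shrinks it back to one.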

Create a new DataFrame

new_songs = pd.DataFrame(columns=songs.columns)
new_songs['Title'] = songs['Title']

Apply the regex function to each song's lyrics and add a Length column holding the length of the cleaned lyrics:

length = []
for i in range(len(new_songs)):
    # .loc[i, 'Lyrics'] writes into the DataFrame; chained .loc[i]['Lyrics'] is not guaranteed to
    new_songs.loc[i, 'Lyrics'] = regex_func(songs['Lyrics'][i])
    length.append(len(new_songs['Lyrics'][i]))
new_songs['Length'] = length
new_songs.head()
Title Lyrics Length
0 Bridge of Faith Qin Mingyue Han guan Wan Li Long March people have not yet but the dragon city will fly in not teach Hu Ma Degree Yinshan mountain smoke thousands of miles of mass graves in chaotic times lonely soul nobody… 396
1 DO U Love Me Look at me. Look at me. I want to ask you a question. 490
2 Dragon Dance It’s called singing with Dan Tian. He doesn’t know how to sing with Dan Tian. He just… 271
3 FLOW Follow me follow me so free, so free follow me follow me as soon as I hear my feet start, when the rhythm starts to turn, everybody… 458
4 Happiness x 3 Loneliness x 3 You seem to see the shadows of the building come alive surrounded by freedom to uncomfortable. the colors of expectation are black and white who’s your friend and who’s your true love… 200
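The cleaning and the length column can also be built without an explicit loop, which sidesteps the row-by-row `.loc` assignment. A sketch using `Series.apply` and `Series.str.len`, shown on a tiny hypothetical two-song DataFrame standing in for the real `songs` (the titles and lyrics are invented):

```python
import re
import pandas as pd

def regex_func(text):
    text = re.sub(r"[a-zA-Z()''…?.,!!,-]+", ' ', text)  # drop English letters and punctuation
    text = re.sub(r'\s{2,}', ' ', text)                # collapse repeated whitespace
    return text

# Hypothetical stand-in for the real `songs` DataFrame
songs = pd.DataFrame({'Title': ['Song A', 'Song B'],
                      'Lyrics': ['你好 hello 世界', '我爱你!']})

new_songs = songs[['Title']].copy()
new_songs['Lyrics'] = songs['Lyrics'].apply(regex_func)  # clean every song at once
new_songs['Length'] = new_songs['Lyrics'].str.len()      # character count per song
print(new_songs)
```

On the real data, these two column assignments should give the same Lyrics and Length columns as the loop.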

Now that the data is cleaner, we can get to work.

Input and output (1)

Take a window length of 10 and a step of 1. The first song opens with the classical lines "Qin shi ming yue Han shi guan, wan li chang zheng ren wei huan" (the moon of Qin, the pass of Han; the men on the long march have not returned). Given its first 10 characters, "Qin shi ming yue Han shi guan _ wan li", the output is the next character, "chang" (long); given "shi ming yue Han shi guan _ wan li chang", the output is "zheng" (march); and so on, sliding forward one Chinese character at a time. Here _ marks the space between the two lyric lines, which also counts as a character. This is the so-called sliding window:

The input (10 characters)                      The output
Qin shi ming yue Han shi guan _ wan li         chang (long)
shi ming yue Han shi guan _ wan li chang       zheng (march)
ming yue Han shi guan _ wan li chang zheng     ren (people)
yue Han shi guan _ wan li chang zheng ren      wei (not yet)
Han shi guan _ wan li chang zheng ren wei      huan (return)
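The pairs in the table can be generated mechanically. A minimal sketch, assuming the cleaned first song opens with 秦时明月汉时关 万里长征人未还 (the verse glossed above, with a space between its two halves); the helper name `sliding_window_pairs` is hypothetical:

```python
def sliding_window_pairs(text, window=10, step=1):
    """Return (input, output) pairs: `window` characters in, the next character out."""
    pairs = []
    for i in range(0, len(text) - window, step):
        pairs.append((text[i:i + window], text[i + window]))
    return pairs

# Assumed opening of the first song after cleaning (the space separates the two lines)
line = '秦时明月汉时关 万里长征人未还'
for x, y in sliding_window_pairs(line):
    print(repr(x), '->', y)
```

The 15-character line yields exactly five pairs, one per row of the table.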

Next, convert the Chinese characters to integers.

Serialization

n = len(new_songs)  # total number of songs
print('There are {} songs'.format(n))
There are 91 songs

Combine all the lyrics into one long string

texts = ''  # start empty so the character positions line up with the per-song lengths
for i in range(n):
    texts += new_songs['Lyrics'][i]
print('Total length of all lyrics:', len(texts))
Total length of all lyrics: 33575

Using Keras

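Use Tokenizer() to convert the text into sequences of integers. Because texts is one long string, fit_on_texts iterates over it character by character, so every Chinese character becomes its own token.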

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)

word_index gives the index of each character as a dictionary {character: index}. (Not every token is a Chinese character; a few stray symbols remain, but for simplicity they are all called characters here.)

tokenizer.word_index  # {character: index}, sorted by descending frequency; glossed: {..., 'I': 2, 'you': 3, 'no': 4, 'is': 5, 'a': 6, 'love': 7, ...}

word_counts gives how often each character occurs, as a dictionary {character: count}, referred to below as the character frequency.

tokenizer.word_counts  # OrderedDict {character: count} in first-seen order; glossed: [('qin', 2), ('when', 93), ('bright', 62), ('moon', 35), ('han', 4), ('pass', 17), ...]

Check the length of the dictionary

print('Dictionary length:', len(tokenizer.word_counts))
Dictionary length: 1636

Use texts_to_sequences to map each Chinese character to an integer. Each character becomes a one-element list, and the space becomes an empty list:

sequences = tokenizer.texts_to_sequences(texts)
print('"{}" is mapped to the integers:'.format(texts[:10]))
print(sequences[:10])
"The bright moon in Qin dynasty and the Wan Li in Han Dynasty" is mapped to the integers:
[[1062], [46], [87], [171], [805], [46], [341], [], [220], [31]]

However, we don't need every character in the dictionary: some appear only once or twice, so we can leave them out.

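Tokenizer() accepts a num_words parameter. When it is set, texts_to_sequences keeps only the num_words - 1 most frequent characters (indices 1 to num_words - 1), and every other character becomes an empty list. word_index itself is not truncated, so the dictionary length stays the same as before.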

num_words = 400  # keep indices 1..399, i.e. the 399 most frequent characters
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(texts)
print('Dictionary length:', len(tokenizer.word_index))
Dictionary length: 1636
sequences = tokenizer.texts_to_sequences(texts)
print('"{}" is mapped to the integers:'.format(texts[:10]))
print(sequences[:10])
"The bright moon in Qin dynasty and the Wan Li in Han Dynasty" is mapped to the integers:
[[], [46], [87], [171], [], [46], [341], [], [220], [31]]

Comparing the two outputs, the second contains more empty lists: the characters with indices 1062 and 805 fall outside the num_words cut-off and are dropped, so num_words is taking effect.
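To make the character-level behaviour concrete, here is a simplified pure-Python imitation of what fit_on_texts and texts_to_sequences do when given a single string, including the num_words cut-off. It is a sketch, not Keras's implementation: it assumes no lowercasing or punctuation filtering (Keras's Tokenizer does both by default), and the helper names and sample text are hypothetical.

```python
from collections import Counter

def build_char_index(text):
    """Map each non-space character to an index by descending frequency, starting at 1."""
    counts = Counter(ch for ch in text if not ch.isspace())
    # most_common() sorts by count; characters with equal counts keep first-seen order
    return {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}

def to_sequences(text, char_index, num_words=None):
    """One list per character: [index] if kept, [] for spaces and characters past the cut-off."""
    seqs = []
    for ch in text:
        idx = char_index.get(ch)
        if idx is None or (num_words is not None and idx >= num_words):
            seqs.append([])
        else:
            seqs.append([idx])
    return seqs

char_index = build_char_index('爱你 爱我 想你')  # hypothetical sample text
print(char_index)                                  # {'爱': 1, '你': 2, '我': 3, '想': 4}
print(to_sequences('爱我 想', char_index))               # [[1], [3], [], [4]]
print(to_sequences('爱我 想', char_index, num_words=3))  # [[1], [], [], []]
```

As in the Keras output above, the dictionary keeps all characters, while num_words only changes which ones survive the mapping.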

pad_sequences

pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0)

  • sequences: a list of lists, where each element is a sequence.
  • maxlen: integer, the maximum length of all sequences; if None, the length of the longest sequence is used.
  • dtype: the type of the output array (default 'int32').
  • padding: 'pre' or 'post': pad at the front or at the back of each sequence.
  • truncating: 'pre' or 'post': for sequences longer than maxlen, remove values from the front or from the back.
  • value: the value used for padding (default 0.0).
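For intuition, here is a simplified NumPy re-implementation of the padding and truncation rules listed above. It is a sketch assuming integer inputs and a scalar pad value (the real function handles more dtypes and edge cases), and the name `pad` is hypothetical. It also shows why empty lists turn into [0]: with maxlen=None the longest sequence here has length 1.

```python
import numpy as np

def pad(sequences, maxlen=None, padding='pre', truncating='pre', value=0):
    """Simplified pad_sequences: returns an int32 array of shape (len(sequences), maxlen)."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)  # default: length of the longest sequence
    out = np.full((len(sequences), maxlen), value, dtype='int32')
    for i, s in enumerate(sequences):
        if len(s) == 0 or maxlen == 0:
            continue  # nothing to copy; the row stays all padding
        # Truncate: 'pre' keeps the last maxlen values, 'post' keeps the first maxlen
        s = s[-maxlen:] if truncating == 'pre' else s[:maxlen]
        # Pad: 'pre' right-aligns the values, 'post' left-aligns them
        if padding == 'pre':
            out[i, maxlen - len(s):] = s
        else:
            out[i, :len(s)] = s
    return out

print(pad([[], [46], [87]]).tolist())            # [[0], [46], [87]]
print(pad([[1, 2, 3], [4]], maxlen=2).tolist())  # [[2, 3], [0, 4]]
print(pad([[1, 2, 3], [4]], maxlen=2, padding='post', truncating='post').tolist())  # [[1, 2], [4, 0]]
```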

Then use pad_sequences to fill the empty lists with 0 (with maxlen=None, every sequence is padded to length 1):

sequences = pad_sequences(sequences)
sequences[: 10]
array([[  0],
       [ 46],
       [ 87],
       [171],
       [  0],
       [ 46],
       [341],
       [  0],
       [220],
       [ 31]], dtype=int32)
print('Minimum index: {}, maximum index: {}'.format(sequences.min(), sequences.max()))
Minimum index: 0, maximum index: 399

So the indices run from 0 to 399, matching num_words = 400: from here on we work with only these 400 values, where 0 stands for the space and for every character outside the 399 most frequent ones.

Add a new column

This column holds each song's lyrics mapped to a sequence of integers, sliced out of the flat sequences array using the running total of the lyric lengths:

start = 0
seq_list = []
for i in range(n):
    end = start + new_songs['Length'][i]
    seq_list.append(sequences[start:end].flatten().tolist())  # (Length, 1) array -> flat list
    start = end
new_songs['Sequences'] = seq_list
new_songs.head()
Title Lyrics Length Sequences
0 Bridge of Faith Qin Mingyue Han guan Wan Li Long March people have not yet but the dragon city will fly in not teach Hu Ma Degree Yinshan mountain smoke thousands of miles of mass graves in chaotic times lonely soul nobody… 396 [0, 46, 87, 171, 0, 46, 341, 0, 220, 31, 212,…
1 DO U Love Me Look at me. Look at me. I want to ask you a question. 490 [49, 229, 39, 2, 0, 39, 39, 2, 1, 49, 229, 0…
2 Dragon Dance It’s called singing with Dan Tian. He doesn’t know how to sing with Dan Tian. He just… 271 [263, 18, 389, 0, 389, 0, 14, 5, 0, 87, 0, 0…
3 FLOW Follow me follow me so free, so free follow me follow me as soon as I hear my feet start, when the rhythm starts to turn, everybody… 458 [150, 33, 2, 0, 150, 33, 2, 0, 11, 13, 26, 139…
4 Happiness x 3 Loneliness x 3 You seem to see the shadows of the building come alive surrounded by freedom to uncomfortable. the colors of expectation are black and white who’s your friend and who’s your true love… 200 [0, 3, 0, 359, 39, 57, 86, 0, 0, 360, 0, 348,…
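Slicing the flat array by running totals of the lengths is what keeps each song's integers separate. A self-contained NumPy sketch of the same idea using `np.cumsum` and `np.split`, with made-up values and lengths:

```python
import numpy as np

flat = np.array([[5], [0], [7], [1], [2], [9]])  # hypothetical (N, 1) output of pad_sequences
lengths = [2, 3, 1]                              # hypothetical per-song Length values

# Split points are the running totals, excluding the final one (the total length)
parts = np.split(flat.ravel(), np.cumsum(lengths)[:-1])
per_song = [p.tolist() for p in parts]
print(per_song)  # [[5, 0], [7, 1, 2], [9]]
```

This only works if the lengths add up to the length of the flat array, which is why texts must be the plain concatenation of the cleaned lyrics with nothing in between.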

Input and output (2)

With numeric data, each input is a sequence of 10 integers and the output is the integer that follows it:

The input The output
[0, 46, 87, 171, 0, 46, 341, 0, 220, 31] [212]
[46, 87, 171, 0, 46, 341, 0, 220, 31, 212] [0]
[87, 171, 0, 46, 341, 0, 220, 31, 212, 0] [15]
[171, 0, 46, 341, 0, 220, 31, 212, 0, 15] [127]
[0, 46, 341, 0, 220, 31, 212, 0, 15, 127] [43]

Next, the sliding window

Sliding window

The lyrics of different songs are unrelated, and the model should not learn to continue one song with the lyrics of another. So instead of sliding the window over the whole of texts, each song is processed separately.

Take the first song, for example

seq = new_songs['Sequences'][0]
seq[:10]
[0, 46, 87, 171, 0, 46, 341, 0, 220, 31]

Take a sliding-window length of 10; this is also the length of the input sequence for the neural network.

max_len = 10  # sliding-window length = length of the input sequence
len_lrc = new_songs['Length'][0]  # length of this song's lyrics
X = []
y = []
for i in range(len_lrc - max_len):
    X.append(seq[i: i + max_len])  # 10 consecutive integers as the input
    y.append(seq[i + max_len])     # the integer that follows as the output

This gives X and y for the first song.

As a function

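# Build a matrix of shape (?, 11), then split it into X of shape (?, 10) and y of shape (?,)
def build_matrix(sequence, max_len=10):
    max_len += 1  # each row holds max_len inputs plus 1 target
    matrix = []
    length = len(sequence)
    for i in range(length - max_len):  # note: this skips the final window of each song
        matrix.append(sequence[i: i + max_len])
    matrix = np.array(matrix)
    X = matrix[:, :-1]
    y = matrix[:, -1]
    return X, y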
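NumPy can build all the windows at once. A sketch using `numpy.lib.stride_tricks.sliding_window_view` (available from NumPy 1.20); the name `build_matrix_vectorized` is hypothetical. Unlike `build_matrix` above, it keeps the final window of each sequence, so on the full data it would yield one extra row per song.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def build_matrix_vectorized(sequence, max_len=10):
    """X: every run of max_len values; y: the value right after each run."""
    windows = sliding_window_view(np.asarray(sequence), max_len + 1)  # shape (?, max_len + 1)
    return windows[:, :-1], windows[:, -1]

X_demo, y_demo = build_matrix_vectorized(list(range(15)))
print(X_demo.shape, y_demo)  # (5, 10) [10 11 12 13 14]
```

The returned arrays are read-only views into the input; call `.copy()` before modifying them.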

Merge the sequences of all songs

# n = 91 songs
X, y = build_matrix(new_songs['Sequences'][0])
for i in range(1, n):
    sequence = new_songs['Sequences'][i]
    XX, yy = build_matrix(sequence)
    X = np.concatenate([X, XX])
    y = np.concatenate([y, yy])

This gives X and y for all the lyrics:

X.shape, y.shape
((32574, 10), (32574,))

That is 32,574 input samples.
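As a sanity check on that number: `build_matrix` turns a song of length L_i into L_i - 11 rows (its loop runs over `range(length - 11)`), and the 91 song lengths add up to the 33,575 characters of texts, so:

$$
\sum_{i=1}^{91} (L_i - 11) = \sum_{i=1}^{91} L_i - 91 \times 11 = 33575 - 1001 = 32574
$$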

Input and output (3)

The input integers are then turned into vectors (each token here is a single Chinese character, so strictly these are character vectors, though "word vector" is the more common term), and the outputs are one-hot encoded. For example (the values are made up):

Integer One-hot Word vector
3 [0, 0, 0, 1, 0, 0, 0] [0.32, 0.11]
6 [0, 0, 0, 0, 0, 0, 1] [0.51, 0.62]
4 [0, 0, 0, 0, 1, 0, 0] [0.26, 0.41]
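One-hot encoding amounts to picking rows of an identity matrix. A NumPy sketch with the integers from the table and 7 classes (an assumed size, so that index 6 fits); Keras's to_categorical should give the same float32 matrix for these labels:

```python
import numpy as np

labels = np.array([3, 6, 4])
num_classes = 7
# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot.astype(int))
```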

Model

In Keras, the Embedding layer turns the integer inputs into vectors:

Embedding(input_dim, output_dim, input_length=None)

  • input_dim: integer > 0, the size of the dictionary, i.e. the maximum index in the input data + 1.
  • output_dim: integer > 0, the dimension of the dense embedding vectors.
  • input_length: the length of the input sequences, when that length is constant.

Take X[:1] as an example:

print('X[:1] is a matrix of shape', X[:1].shape)
print('It contains the values:', X[:1])
X[:1] is a matrix of shape (1, 10)
It contains the values: [[  0  46  87 171   0  46 341   0 220  31]]

For our Embedding layer, input_dim is num_words, the number of characters we use; output_dim is the length of the vector each integer is mapped to; and input_length is the length of the input sequence, X.shape[1] = 10. For example:

embedding = Sequential()
embedding.add(Embedding(num_words, 12, input_length=X.shape[1]))

The shape goes from (1, 10) to (1, 10, 12):

print(embedding.predict(X[: 1]))
print(embedding.predict(X[: 1]).shape)
[[[-9.2859752e-03 -1.9947542e-02  1.4158692e-02  2.5630761e-02 -1.3254523e-02 -4.4673540e-02  2.0926904e-02 -2.1672739e-02  4.5357570e-03  7.1933120e-04  2.3986842e-02 -1.6472042e-02]
  [ 3.5469163e-02 -4.4115938e-02 -3.4637891e-02 -2.7161956e-02 -2.3825645e-02  2.0239640e-02  2.5555268e-03 -1.7135501e-02 -3.0841781e-02  2.8748605e-02 -3.9590113e-03  6.0198195e-03]
  [ 1.6848337e-02 -1.7353892e-02 -3.8705468e-02 -4.8205614e-02  3.5994854e-02 -1.7381988e-02  7.6504238e-03 -7.6918602e-03 -1.4761232e-02  7.7899583e-03 -4.9694888e-03  3.0793250e-05]
  [ 3.5687301e-02 -1.8727556e-03 -4.7537651e-02 -2.4941897e-02 -2.4263645e-02 -1.2860574e-02  3.2451559e-02  2.3206424e-02  3.7631344e-02 -4.2071544e-02 -1.8674839e-02  3.9704118e-02]
  [-9.2859752e-03 -1.9947542e-02  1.4158692e-02  2.5630761e-02 -1.3254523e-02 -4.4673540e-02  2.0926904e-02 -2.1672739e-02  4.5357570e-03  7.1933120e-04  2.3986842e-02 -1.6472042e-02]
  [ 3.5469163e-02 -4.4115938e-02 -3.4637891e-02 -2.7161956e-02 -2.3825645e-02  2.0239640e-02  2.5555268e-03 -1.7135501e-02 -3.0841781e-02  2.8748605e-02 -3.9590113e-03  6.0198195e-03]
  [ 2.6380885e-02 -1.3970364e-02  4.0359497e-03  4.9115308e-03 -1.5673112e-02  2.5444303e-02 -3.0493153e-02 -4.5944821e-02  3.2101501e-02 -2.2213591e-02 -2.4235893e-02 -2.3439897e-02]
  [-9.2859752e-03 -1.9947542e-02  1.4158692e-02  2.5630761e-02 -1.3254523e-02 -4.4673540e-02  2.0926904e-02 -2.1672739e-02  4.5357570e-03  7.1933120e-04  2.3986842e-02 -1.6472042e-02]
  [-3.0420924e-02  4.8257243e-02 -7.5347200e-03  1.0823570e-02  2.1067370e-02  3.6987696e-02  3.3310857e-02 -7.7851191e-03  4.3326866e-02 -1.8127739e-02 -3.8963556e-02  7.5731166e-03]
  [-5.6729540e-03  3.0662213e-02 -2.9343033e-02  3.2814298e-02 -4.9867988e-02  4.5566510e-02  3.3448350e-02 -3.5617065e-02  4.3894444e-02 -3.1913005e-02 -3.4133270e-02 -3.0503750e-02]]]
(1, 10, 12)

That is, each integer is mapped to a 12-dimensional vector: 0 -> [-9.2859752e-03 -1.9947542e-02 ... -1.6472042e-02], 46 -> [3.5469163e-02 -4.4115938e-02 ... 6.0198195e-03], and so on. The same integer always gets the same vector (0 appears at positions 1, 5 and 8). Since nothing has been trained yet, these are just the layer's randomly initialized weights.
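Conceptually, an Embedding layer is a trainable matrix of shape (input_dim, output_dim), and embedding an integer just selects that row. A NumPy sketch of the lookup, using random weights as a stand-in for the layer's own (drawn from [-0.05, 0.05), assumed here to mirror Keras's default uniform initializer):

```python
import numpy as np

num_words, output_dim = 400, 12
rng = np.random.default_rng(0)
# Stand-in weights: one row per integer index, drawn uniformly from [-0.05, 0.05)
weights = rng.uniform(-0.05, 0.05, size=(num_words, output_dim))

x = np.array([[0, 46, 87, 171, 0, 46, 341, 0, 220, 31]])  # the X[:1] values shown above
vectors = weights[x]  # integer-array indexing: each integer picks its row
print(vectors.shape)  # (1, 10, 12)
print(np.array_equal(vectors[0, 0], vectors[0, 4]))  # True: both positions hold index 0
```

Training then adjusts the rows of this matrix, so characters used in similar contexts can end up with similar vectors.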

Now we can build the model.

We want to predict the next Chinese character, so the output layer has num_words = 400 nodes with a softmax activation. We then take the index of the largest output and look up the corresponding character in the dictionary.

The loss function is categorical_crossentropy and the optimizer is Adam.

When using the categorical_crossentropy loss, the targets must be in categorical format: with 10 classes, for example, each sample's target is a 10-dimensional vector that is all zeros except for a 1 at the index of its class.
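Because the targets are one-hot, only the predicted probability of the true class enters the loss. A NumPy sketch of the formula, averaged over the batch (a simplification for illustration; Keras's own implementation differs in details such as how it clips probabilities):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)), with y_true one-hot."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)   # one-hot targets
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])  # softmax outputs
print(categorical_crossentropy(y_true, y_pred))  # about 0.458, i.e. -(log 0.8 + log 0.5) / 2
```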

model = Sequential()
model.add(Embedding(num_words, 128, input_length=X.shape[1]))
model.add(LSTM(64))
model.add(Dense(64, activation='relu'))
model.add(Dense(num_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_2 (Embedding)      (None, 10, 128)           51200
_________________________________________________________________
lstm_1 (LSTM)                (None, 64)                49408
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_2 (Dense)              (None, 400)               26000
=================================================================
Total params: 130768
Trainable params: 130768
Non-trainable params: 0
_________________________________________________________________
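The Param # column can be checked by hand. A sketch of the standard parameter formulas, assuming the default LSTM with biases (four gates, each with input weights, recurrent weights and a bias):

```python
def embedding_params(input_dim, output_dim):
    return input_dim * output_dim  # one vector per index, no bias

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units  # weights + bias

layers = [embedding_params(400, 128), lstm_params(128, 64),
          dense_params(64, 64), dense_params(64, 400)]
print(layers, sum(layers))  # [51200, 49408, 4160, 26000] 130768
```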

Use to_categorical() to one-hot encode y:

y = to_categorical(y, num_classes=num_words)

Start training

Training takes a long time.

model.fit(X, y, batch_size=256, epochs=500, verbose=0)
# model.save('lrc_model_0.h5')

Visualization

loss = model.history.history['loss']  # one loss value per epoch

plt.style.use('bmh')
plt.figure(figsize=(12, 8))
plt.plot(range(len(loss)), loss)
plt.title('LSTM')
plt.xlabel('Epochs')
plt.ylabel('Loss')
Text(0, 0.5, 'Loss')

Prediction

test_lrc = 'When the beat begins to turn'  # the opening of the lyrics
test_sequence = tokenizer.texts_to_sequences(test_lrc)  # serialize
test_sequence = pad_sequences(test_sequence).reshape(1, -1)  # fill empty lists with 0, then make one row
test_sequence = pad_sequences(test_sequence, maxlen=X.shape[1])  # shorter than 10, so pad it to length 10
test_sequence
array([[  0,   0,   0,   0, 149, 374,   0,  38, 148, 131]], dtype=int32)
print('The maximum index of the output is:', model.predict(test_sequence).argmax())
The maximum index of the output is: 0

Tokenizer() indices start at 1, so index 0 has no entry in the dictionary; we treat it as a space.

try:
    print(tokenizer.index_word[0])
except KeyError:
    print('Not in the dictionary')
Not in the dictionary

To generate more than one character, we loop: append each output to the input to form a new input, then predict again.

test_lrc += ' '  # the output index is 0, so append a space
test_lrc
'When the beat begins to turn '

With the new character appended:

test_sequence = tokenizer.texts_to_sequences(test_lrc)
test_sequence = pad_sequences(test_sequence).reshape(1, -1)
test_sequence = pad_sequences(test_sequence, X.shape[1])
test_sequence
array([[  0,   0,   0, 149, 374,   0,  38, 148, 131,   0]], dtype=int32)
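The feed-back loop itself does not depend on Keras. A framework-free sketch, where `toy_predict` is a hypothetical function standing in for `model.predict` followed by `argmax` (in the post, the window is re-padded with pad_sequences rather than simply sliced):

```python
def generate(seed, predict_next, steps, window=10):
    """Greedy feed-back loop: predict from the last `window` items, append, repeat."""
    out = list(seed)
    for _ in range(steps):
        out.append(predict_next(out[-window:]))  # the new item becomes part of the next input
    return out

def toy_predict(context):
    """Hypothetical stand-in for model.predict(...).argmax(): next = (last + 1) mod 5."""
    return (context[-1] + 1) % 5

print(generate([0, 1, 2], toy_predict, steps=4))  # [0, 1, 2, 3, 4, 0, 1]
```

Because each step always takes the single most likely output, the loop is deterministic: the same seed always produces the same continuation.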

As a function

# Convert an input text to a sequence
def input_sequence(text, max_len=10):
    sequence = tokenizer.texts_to_sequences(text)  # serialize
    sequence = pad_sequences(sequence).reshape(1, -1)  # fill empty lists with 0, one row
    sequence = pad_sequences(sequence, maxlen=max_len)  # pad or truncate (keeps the last max_len)
    return sequence

# Look up the predicted character in the dictionary
def next_word(y_pred):
    idx = np.argmax(y_pred)  # index of the maximum value
    if idx == 0:  # index 0 is not in the dictionary
        return ' '
    else:
        return tokenizer.index_word[idx]

Try generating 200 characters of lyrics:

lrc = 'When the beat begins to turn'
for i in range(200):
    X_sequence = input_sequence(lrc)
    y_pred = model.predict(X_sequence)
    word = next_word(y_pred)
    lrc += word
lrc = re.sub(r'\s+', ' ', lrc)  # collapse whitespace into a single space
print(lrc)
When the rhythm began to turn whether the heart of this flying side line full of love a change of heart will feel there will be before oh my before I just want to come away from the dream of your high tears said to far enough to remember to square really forget to keep this rebirth one still understand the spring people's love from the force that is the only this is twelve students care about my love do not want to use words But see you like to look at the head forget a loud you in my heart a few times with a happy and after than any good broken love words from the head have heart fast OUT OF my dream with all life pain also don't listen to meet

But we can't really tell what it's trying to say, can we?

As a function

def generating_lrc(lrc, length=200):
    for i in range(length):
        X_sequence = input_sequence(lrc)
        y_pred = model.predict(X_sequence)
        word = next_word(y_pred)
        lrc += word
    lrc = re.sub(r'\s+', ' ', lrc)  # collapse whitespace into a single space
    return lrc

generating_lrc('Big city, little love')
'Big city small love can not close not to come I don't cut will say any forgotten words cold to my eyes spring has played with me to listen to clear in the life to see this a love excuse me you pull hero I can return to the injury forever in the eye wish but don't think how can I star music good You accompany all know that you are my hand is you, my wish is here, spring is gone, now it is time for you to come, in every sad heart of my heart, people will become today, you are the god of the whole world with the season of the flowers. '
generating_lrc("He doesn't know how to sing with Dan Tian.")
'he wouldn't sing in abdomen Root root spikes spikes He wouldn't sing in abdomen Root root spikes spikes He wouldn't sing in abdomen Root root spikes spikes He wouldn't sing in abdomen Root root spikes spikes He wouldn't sing in abdomen Root root He can't sing with Dan Tian at all. He can't sing with Dan Tian at all. He can't sing with Dan Tian at all.

References

For the full code, see the GitHub project: an LSTM lyric-generation exercise

  1. Generating Drake Rap Lyrics using Language Models and LSTMs
  2. Sequence preprocessing – Keras Chinese document
  3. Embedding layer Embedding – Keras Chinese document
  4. Losses – Keras Chinese document
  5. Understanding and use of Embedding in Keras in deep learning