Code address: github.com/hjptriplebe… Forks and stars are welcome.

The robot, named MC Panghu, currently uses only the simplest and crudest approach. It is built with TensorFlow, and its output looks more like artificial stupidity than artificial intelligence, which fits Panghu's character perfectly.



There is plenty of material online about how LSTMs work; if you are not familiar with them, have a look here: www.jianshu.com/p/9dc9f41f0…

This article focuses on the implementation of the poetry-writing robot rather than on theory or on how to use TensorFlow, so it is easy to get started with.

Training data preprocessing

About 30,000 Tang poems are used as training data. They can be found in the dataset folder of the GitHub repo, one poem per line in the format "Title:Poem", as shown below:
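
For example, a line in the dataset looks like this (an illustrative line following the stated format; the actual file holds roughly 30,000 such lines):

静夜思:床前明月光，疑是地上霜。举头望明月，低头思故乡。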



First we separate the title from the content, then clean the data by filtering out bad training samples: poems containing special symbols, and poems that are too short or too long. Finally we add a start marker and an end marker, represented by square brackets, before and after each poem so that the LSTM knows where a poem begins and ends.

poems = []
file = open(filename, "r")
for line in file:  # every line is a poem
    # print(line)
    title, poem = line.strip().split(":")  # get title and poem
    poem = poem.replace(' ', '')  # remove spaces
    if '_' in poem or '《' in poem or '[' in poem or '(' in poem or '（' in poem:  # filter special symbols
        continue
    if len(poem) < 10 or len(poem) > 128:  # filter poems that are too short or too long
        continue
    poem = '[' + poem + ']'  # add start and end signs
    poems.append(poem)

Then the number of occurrences of each word is counted, and rare words that appear only a few times are deleted.

# counting words
allWords = {}
for poem in poems:
    for word in poem:
        if word not in allWords:
            allWords[word] = 1
        else:
            allWords[word] += 1
# erase words which are not common
erase = []
for key in allWords:
    if allWords[key] < 2:
        erase.append(key)
for key in erase:
    del allWords[key]

Next, sort the words by their number of occurrences and build a mapping from word to ID. Why sort? After sorting, the ID reflects the word's frequency to some extent, so the two are correlated; this makes it easier for the model to learn patterns than an unsorted, arbitrary mapping would.

We also add a space character: poems have different lengths and will be padded with spaces, so the space needs its own ID. Finally, each poem is converted into a vector of word IDs.

wordPairs = sorted(allWords.items(), key=lambda x: -x[1])
words, a = zip(*wordPairs)
words += (" ", )
wordToID = dict(zip(words, range(len(words))))  # word to ID
wordTOIDFun = lambda A: wordToID.get(A, len(words))
poemsVector = [([wordTOIDFun(word) for word in poem]) for poem in poems]  # poem to vector
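
As a quick illustration of what the mapping looks like, here is a toy example with made-up character counts (not data from the repo):

allWordsToy = {'[': 6, ']': 5, '。': 4, '月': 3, '山': 2}      # pretend character counts
toyPairs = sorted(allWordsToy.items(), key=lambda x: -x[1])
toyWords, toyCounts = zip(*toyPairs)                           # ('[', ']', '。', '月', '山')
toyWords += (" ", )                                            # the space gets the last ID, here 5
toyWordToID = dict(zip(toyWords, range(len(toyWords))))
print(toyWordToID)            # {'[': 0, ']': 1, '。': 2, '月': 3, '山': 4, ' ': 5}
print([toyWordToID.get(w, len(toyWords)) for w in "[月山。]"])  # [0, 3, 4, 2, 1]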

Next, construct the training batches. Within each batch, every poem is padded with spaces up to the length of the longest poem in that batch; because of this padding, the model also learns that a space is followed by a space. X and Y are the input and output respectively, and Y is X shifted by one position: given the word the model currently sees, the expected output is the next word.

Make sure to use np.copy here; otherwise temp2 would just be another reference to the array already appended to X, and the shifted assignment would corrupt the input as well.

# padding length to batchMaxLength
batchNum = (len(poemsVector) - 1) // batchSize
X = []
Y = []
# create batch
for i in range(batchNum):
    batch = poemsVector[i * batchSize: (i + 1) * batchSize]
    maxLength = max([len(vector) for vector in batch])
    temp = np.full((batchSize, maxLength), wordTOIDFun(" "), np.int32)
    for j in range(batchSize):
        temp[j, :len(batch[j])] = batch[j]
    X.append(temp)
    temp2 = np.copy(temp)  # must be a copy, not a reference!
    temp2[:, :-1] = temp[:, 1:]
    Y.append(temp2)
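
To make the shift and the need for np.copy concrete, here is a toy example with made-up word IDs (not data from the repo); 5 plays the role of the space/padding ID:

import numpy as np

temp = np.array([[0, 7, 3, 9, 1, 5, 5]], np.int32)   # one padded poem: [ w w w ] _ _
temp2 = np.copy(temp)                                # an independent copy
temp2[:, :-1] = temp[:, 1:]                          # shift left by one position
print(temp)    # [[0 7 3 9 1 5 5]] -> the input row X, still intact
print(temp2)   # [[7 3 9 1 5 5 5]] -> the target row Y: the "next word" at every position
# With temp2 = temp instead of np.copy, temp2 would be the same object that was
# just appended to X, and the shifted assignment would overwrite the input too.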

Building the model

Build an LSTM model followed by a softmax layer that outputs the probability of each word. Here you can basically copy a standard LSTM template and adjust the parameters.

with tf.variable_scope("embedding"):  # embedding
    embedding = tf.get_variable("embedding", [wordNum, hidden_units], dtype=tf.float32)
    inputbatch = tf.nn.embedding_lookup(embedding, gtX)

basicCell = tf.contrib.rnn.BasicLSTMCell(hidden_units, state_is_tuple=True)
stackCell = tf.contrib.rnn.MultiRNNCell([basicCell] * layers)
initState = stackCell.zero_state(np.shape(gtX)[0], tf.float32)
outputs, finalState = tf.nn.dynamic_rnn(stackCell, inputbatch, initial_state=initState)
outputs = tf.reshape(outputs, [-1, hidden_units])

with tf.variable_scope("softmax"):
    w = tf.get_variable("w", [hidden_units, wordNum])
    b = tf.get_variable("b", [wordNum])
    logits = tf.matmul(outputs, w) + b

probs = tf.nn.softmax(logits)
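
The training and generation code below calls this graph as buildModel(wordNum, gtX) and unpacks five return values, so in the repo this snippet is presumably wrapped in a function. A minimal wrapper consistent with those call sites might look like this (a sketch; the default values for hidden_units and layers are assumptions, in the repo they come from its configuration):

import numpy as np
import tensorflow as tf

def buildModel(wordNum, gtX, hidden_units=128, layers=2):
    """Sketch of a wrapper around the graph shown above; the repo's version may differ."""
    with tf.variable_scope("embedding"):
        embedding = tf.get_variable("embedding", [wordNum, hidden_units], dtype=tf.float32)
        inputbatch = tf.nn.embedding_lookup(embedding, gtX)

    basicCell = tf.contrib.rnn.BasicLSTMCell(hidden_units, state_is_tuple=True)
    stackCell = tf.contrib.rnn.MultiRNNCell([basicCell] * layers)
    initState = stackCell.zero_state(np.shape(gtX)[0], tf.float32)
    outputs, finalState = tf.nn.dynamic_rnn(stackCell, inputbatch, initial_state=initState)
    outputs = tf.reshape(outputs, [-1, hidden_units])

    with tf.variable_scope("softmax"):
        w = tf.get_variable("w", [hidden_units, wordNum])
        b = tf.get_variable("b", [wordNum])
        logits = tf.matmul(outputs, w) + b

    probs = tf.nn.softmax(logits)
    return logits, probs, stackCell, initState, finalState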

Model training

First define the input and output placeholders and build the model, then set up the loss function, learning rate, and other training parameters.

gtX = tf.placeholder(tf.int32, shape=[batchSize, None])  # input
gtY = tf.placeholder(tf.int32, shape=[batchSize, None])  # output
logits, probs, a, b, c = buildModel(wordNum, gtX)
targets = tf.reshape(gtY, [-1])
# loss
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([logits], [targets],
                                                          [tf.ones_like(targets, dtype=tf.float32)], wordNum)
cost = tf.reduce_mean(loss)
tvars = tf.trainable_variables()
grads, a = tf.clip_by_global_norm(tf.gradients(cost, tvars), 5)
learningRate = learningRateBase
optimizer = tf.train.AdamOptimizer(learningRate)
trainOP = optimizer.apply_gradients(zip(grads, tvars))
globalStep = 0
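
For intuition: with a single logits/targets pair and all-ones weights, the sequence loss above boils down to the per-position cross-entropy between the predicted distribution and the true next word. A rough equivalent, reusing the logits and targets defined above (a sketch, not the repo's code; perPositionLoss and costAlt are names made up here):

perPositionLoss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=logits)
costAlt = tf.reduce_mean(perPositionLoss)   # mean cross-entropy over batch * time, like cost above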

Then start training. First check whether a checkpoint exists; if so, restore it, otherwise train from scratch. The data is then read batch by batch for training, the learning rate gradually decays, and the model is saved every so many steps.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    if reload:
        checkPoint = tf.train.get_checkpoint_state(checkpointsPath)
        # if there is a checkpoint, restore it
        if checkPoint and checkPoint.model_checkpoint_path:
            saver.restore(sess, checkPoint.model_checkpoint_path)
            print("restored %s" % checkPoint.model_checkpoint_path)
        else:
            print("no checkpoint found!")

    for epoch in range(epochNum):
        if globalStep % learningRateDecreaseStep == 0:  # learning rate decreases with the epoch
            learningRate = learningRateBase * (0.95 ** epoch)
        epochSteps = len(X)  # equal to the number of batches
        for step, (x, y) in enumerate(zip(X, Y)):
            # print(x)
            # print(y)
            globalStep = epoch * epochSteps + step
            a, loss = sess.run([trainOP, cost], feed_dict={gtX: x, gtY: y})
            print("epoch: %d steps:%d/%d loss:%3f" % (epoch, step, epochSteps, loss))
            if globalStep % 1000 == 0:
                print("save model")
                saver.save(sess, checkpointsPath + "/poem", global_step=epoch)
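
One caveat: the AdamOptimizer above is constructed once with the initial Python float, so re-assigning the learningRate variable inside the loop does not change the rate used by the already-built trainOP. If the decay should actually take effect, one common TF1 pattern is to feed the rate through a placeholder (a sketch, not the repo's code; learningRatePH is a name made up here):

learningRatePH = tf.placeholder(tf.float32, shape=[], name="learning_rate")
optimizer = tf.train.AdamOptimizer(learningRatePH)
trainOP = optimizer.apply_gradients(zip(grads, tvars))
# and inside the training loop:
# a, loss = sess.run([trainOP, cost],
#                    feed_dict={gtX: x, gtY: y, learningRatePH: learningRate})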

Writing poems automatically

Before writing poems automatically, we need a function that maps the output probabilities to a word. To avoid generating the same poem every time, we introduce some randomness: instead of always picking the word with the highest probability, we map the probabilities onto intervals and sample randomly. A word with a high probability gets a large interval and is therefore more likely to be sampled, but Panghu still has a small chance of choosing other words. Because each word is chosen with some randomness, the poem comes out different every time.

def probsToWord(weights, words):
    """probs to word"""
    t = np.cumsum(weights)  # prefix sum
    s = np.sum(weights)
    coff = np.random.rand(1)
    index = int(np.searchsorted(t, coff * s))  # a large interval has a high chance of being sampled
    return words[index]
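
A quick worked example with made-up numbers (not from the repo): three candidate words with probabilities 0.5, 0.3 and 0.2, and a pretend random draw of 0.65.

import numpy as np

weights = np.array([0.5, 0.3, 0.2])
wordsToy = ('月', '山', '风')                 # made-up candidates
t = np.cumsum(weights)                        # [0.5, 0.8, 1.0] -> intervals [0, 0.5), [0.5, 0.8), [0.8, 1.0)
s = np.sum(weights)                           # 1.0
coff = 0.65                                   # pretend np.random.rand(1) returned 0.65
index = int(np.searchsorted(t, coff * s))     # 0.65 falls into [0.5, 0.8) -> index 1
print(wordsToy[index])                        # '山', the word with probability 0.3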

Then we start writing poems: build the model again, define the parameters, and load the checkpoint.

gtX = tf.placeholder(tf.int32, shape=[1, None])  # input
logits, probs, stackCell, initState, finalState = buildModel(wordNum, gtX)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    checkPoint = tf.train.get_checkpoint_state(checkpointsPath)
    # if there is a checkpoint, restore it
    if checkPoint and checkPoint.model_checkpoint_path:
        saver.restore(sess, checkPoint.model_checkpoint_path)
        print("restored %s" % checkPoint.model_checkpoint_path)
    else:
        print("no checkpoint found!")
        exit(0)

We generate generateNum poems in total. Each poem starts with the left bracket and ends when a right bracket or a space is produced. The probability vector produced at each step is converted into a word with the probsToWord function.

poems = []
for i in range(generateNum):
    state = sess.run(stackCell.zero_state(1, tf.float32))
    x = np.array([[wordToID['[']]])  # init start sign
    probs1, state = sess.run([probs, finalState], feed_dict={gtX: x, initState: state})
    word = probsToWord(probs1, words)
    poem = ''
    while word != ']' and word != ' ':
        poem += word
        if word == '。':
            poem += '\n'
        x = np.array([[wordToID[word]]])
        # print(word)
        probs2, state = sess.run([probs, finalState], feed_dict={gtX: x, initState: state})
        word = probsToWord(probs2, words)
    print(poem)
    poems.append(poem)

We can also write acrostic (hidden-head) poems; the model building, checkpoint loading, and so on are the same as above. Whenever a punctuation mark is reached, the next input word is manually set to the given character instead of being sampled. Note that after emitting the punctuation mark you still have to feed it through the model once to roll the state forward, discarding the word it would generate, because the model's output at that step is not used to choose a word.

flag = 1
endSign = {-1: "，", 1: "。"}
poem = ''
state = sess.run(stackCell.zero_state(1, tf.float32))
x = np.array([[wordToID['[']]])
probs1, state = sess.run([probs, finalState], feed_dict={gtX: x, initState: state})
for c in characters:
    word = c
    flag = -flag
    while word != ']' and word != '，' and word != '。' and word != ' ':
        poem += word
        x = np.array([[wordToID[word]]])
        probs2, state = sess.run([probs, finalState], feed_dict={gtX: x, initState: state})
        word = probsToWord(probs2, words)

    poem += endSign[flag]
    # to keep the context, the state must be updated
    if endSign[flag] == '。':
        probs2, state = sess.run([probs, finalState],
                                 feed_dict={gtX: np.array([[wordToID["。"]]]), initState: state})
        poem += '\n'
    else:
        probs2, state = sess.run([probs, finalState],
                                 feed_dict={gtX: np.array([[wordToID["，"]]]), initState: state})

print(characters)
print(poem)

Training for 20 epochs on a GPU already gives decent results!

Code address: github.com/hjptriplebe… Forks and stars are welcome.

A follow-up is probably coming: MC Panghu 2.0, a robot that writes poems from pictures.

That's all for Panghu!