Author | Carly Stambaugh

Source | AI Technology Review

People keep finding new things for neural networks to do, from drawing to writing poetry, and Microsoft's Xiaoice has even published a book of poems. In this post, Automattic data scientist Carly Stambaugh shows how quickly and easily you can build an AI that writes poetry.

“Code is poetry” is the philosophy of the WordPress community.

As a coder and a poet, I have always loved this motto. But turning it on its head, I can't help asking, "Can I write poetry with code? Can I build a machine that writes original poetry?" So I ran a series of experiments to find out.

First, we all know that if a machine wants to learn to write poetry, it must first learn to read poetry. In 2017 alone, more than 500,000 posts tagged as poetry were published on WordPress.com (https://wordpress.com/tag/poetry). I contacted a few prolific poets who share their work through WordPress and asked whether they would like to collaborate with me on an interesting experiment: could my machine read their work, learn the form and structure of poetry, and eventually write poems of its own?

O at the Edges — Robert Okaji (https://robertokaji.com)

Wolff Poetry — Linda J. Wolff (http://wolffpoetry.com)

Poetry, Short Prose and Walking — Frank Hubeny (https://frankhubeny.blog)

Perspectives on Life, the Universe and Everything – Aurangzeb Bozdar (https://abozdar.wordpress.com)

What is an LSTM and how does it generate text?

I used a type of neural network called an LSTM (long short-term memory network) to build my poetry robot.

Neural networks break a problem down into many smaller problems through a layered structure. For example, if you wanted to train a neural network to recognize squares, one layer might detect right angles and another might detect parallel lines. The machine combines all of these features to decide whether an image is a square. It learns the parameters of those features by being fed millions of example images during training, and it also learns which features of an image matter for recognizing squares and which do not.
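As a loose illustration of this layered idea (not code from the original experiment), here is a minimal Keras sketch of a small stacked network for a made-up "square or not square" image task; the layer sizes and the 28x28 input shape are assumptions for the example only.

from keras.models import Sequential
from keras.layers import Flatten, Dense

# Illustrative only: a tiny layered classifier for 28x28 images.
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))   # raw pixels in
model.add(Dense(64, activation='relu'))    # earlier layer: simple features such as edges
model.add(Dense(32, activation='relu'))    # later layer: combinations such as corners and parallel lines
model.add(Dense(1, activation='sigmoid'))  # final decision: square or not
model.compile(loss='binary_crossentropy', optimizer='adam')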

Now, suppose you wanted to use a neural network to predict the letter that follows these two letters:

Th_

For a person, this task is very simple. Most likely, you guessed that the next letter would be e. And I bet that if you are an English speaker, you would not guess q, because, as you have learned, q does not follow th in English. The letters at the beginning of a word are highly predictive of the letters that follow. An LSTM can "remember" its previous state and use it to inform its current decision. For a more in-depth explanation of how LSTMs work, check out this excellent article by Chris Olah of Google Brain.

Like many LSTM-based text generators, my poetry robot produces text one character at a time. So before it can put words together in any meaningful pattern, it must first learn how to form words at all. To do this, it needs millions of examples of valid words. Thankfully, WordPress.com has a wealth of poetry.

Data set preparation

First, I pulled the poems from all the sites listed above out of an Elasticsearch index. I used a very simple rule, based on the number of words between each "\n" character and the previous one, to strip out everything except the text of the poems. If a block of text contains many words but few "\n" characters, it is probably one or more prose paragraphs, while text spread over many short lines is more likely to be a poem. This is a crude heuristic, and I can think of many excellent poems that would not pass it! But for the purpose of this experiment I was interested in whether an LSTM could learn poetic structure such as line breaks and stanzas, along with other devices such as rhyme (both internal and end rhyme) and alliteration, so it made sense to limit the training data to fairly structured poems.
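The post does not include the cleaning code, but a minimal sketch of the line-length heuristic described above might look like this; the threshold of 10 words per line is an assumption made only for illustration.

def looks_like_a_poem(text, max_words_per_line=10):
    # Heuristic sketch: many short lines suggest verse;
    # long runs of words with few line breaks suggest prose.
    lines = [line for line in text.split("\n") if line.strip()]
    if len(lines) < 2:
        return False
    avg_words_per_line = sum(len(line.split()) for line in lines) / len(lines)
    return avg_words_per_line <= max_words_per_line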

Once a piece of text was identified as a poem, I wrote it to a text file, prefixed with "++++\n" to mark the start of a new poem. Doing so produced about 500KB of training data. I usually try to train an LSTM network on at least 1MB of text, so I needed to find more poems. To supplement the work of the featured poets, I used a random sample of public posts tagged as poetry published over the past year, the same posts you would see by browsing the poetry tag in the WordPress.com Reader (https://en.wordpress.com/tag/poetry/). I limited this random sample to one post per poet.
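This step is not shown in the original post either; a minimal sketch of assembling the training file, assuming the cleaned poems are available as a list of strings called poems, could be:

import os

# Assemble the training corpus, marking the start of each poem with "++++\n".
with open("poems.txt", "w") as f:
    for poem in poems:
        f.write("++++\n" + poem.strip() + "\n")

print("training data size: %.1f KB" % (os.path.getsize("poems.txt") / 1024.0))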

Train the LSTM network

Once I had more than 1MB of poetry, I started building an LSTM network. I use the Python deep learning library Keras for all my neural network needs. The Keras repository on GitHub (https://github.com/keras-team/keras) has a number of example files to help you learn about a range of different neural networks, including one that uses an LSTM to generate text (https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py). I based my model on this example and began experimenting with different configurations. The goal of this model is to produce original poetry. In this setting, overfitting, that is, learning the training data in such detail that the model does not generalize, results in generated text that is too similar to the input text. (That would be like plagiarism, and no poet wants that!) One way to discourage overfitting is to use dropout in the network, which forces a random subset of node weights to zero in each training batch. This is a bit like forcing the network to "forget" some of what it has just learned. (I also added a post-processing check to make sure the poetry robot was not copying the poets.)

I used FloydHub (https://www.floydhub.com/) to run the GPU-heavy neural network training, which let me train my networks nearly ten times faster than on my laptop. My first network had a single LSTM layer followed by a Dropout layer. It produced text that looks a lot like poetry: it has line breaks and stanzas, and almost every character combination is a real word. Occasionally a whole line is relatively fluent. In fact, its very first iteration produced this wonderful line:

I added a few more LSTM layers, varying the amount of dropout in each, until I settled on the final model shown in the code below. I ended up with three LSTM layers because training time was getting really long and the results were already quite good.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation

# maxlen is the training sequence length and chars is the set of characters
# in the corpus, both produced by the data preparation step.
model = Sequential()
# Three stacked LSTM layers, each with dropout to discourage overfitting.
model.add(LSTM(300, input_shape=(maxlen, len(chars)), return_sequences=True, dropout=.20, recurrent_dropout=.20))
model.add(LSTM(300, return_sequences=True, dropout=.20, recurrent_dropout=.20))
model.add(LSTM(300, dropout=.20, recurrent_dropout=.20))
model.add(Dropout(.20))
# A softmax over the character set gives a probability for each possible next character.
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
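The post stops at the model definition. A minimal sketch of how such a model might be trained, assuming one-hot encoded input tensors x and y built from maxlen-character sequences as in the Keras text-generation example, could look like this; the batch size, epoch count, and checkpoint file name are assumptions.

from keras.callbacks import ModelCheckpoint

# Save the weights after every epoch (what the post calls an iteration),
# so earlier snapshots of the model can be reloaded later with model.load_weights.
checkpoint = ModelCheckpoint("weights-{epoch:02d}.hdf5", save_weights_only=True)
model.fit(x, y, batch_size=128, epochs=30, validation_split=0.1, callbacks=[checkpoint])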

Here is a graph comparing the models' loss curves as the number of LSTM layers increases.

As the number of LSTM layers in the model increases, the validation loss decreases more rapidly

Oh! What is going on with those spikes? As it turns out, they are quite common when training with the Adam optimizer (https://stats.stackexchange.com/questions/303857/explanation-of-spikes-in-training-loss-vs-iterations-with-adam-optimizer). Note that as I add LSTM layers to the network, the model's validation loss falls faster overall. This suggests that usable results can be reached in fewer iterations, but each additional LSTM layer increases the training time per iteration. Training the single-layer LSTM took about 600 seconds per iteration, so an experiment could finish overnight, while the three-layer LSTM took around 7,000 seconds per iteration and several days in total. So a faster drop in validation loss does not actually mean faster results. Still, in my entirely subjective judgment, the poems generated by the three-layer network were better, even though it took much longer to train.

Generate the poetry

To produce completely original text, I also needed to change how the text was generated. In the Keras example, the script selects a random sequence of characters from the training data to use as the seed for the generated text. I wanted to build a poetry robot that writes original poetry, not one that transcribes other poets' lines! So I experimented with different seeds for the generation step. Since I had already prefixed each poem in the training set with "++++\n", I assumed that seeding with that marker would naturally produce brand-new poems. But the result was a meaningless jumble of "\n", ".", "_", and "&" characters. Through trial and error, I found that the seed needs to contain the same number of characters as the training sequences. In hindsight this is obvious. In the end I used training sequences of 300 characters, so I built a seed of exactly 300 characters by repeating "++++\n". The poetry robot generates several poems per round, occasionally separating them with "++++\n".
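The generation loop itself is not shown in the post. Here is a minimal sketch, modeled on the Keras lstm_text_generation example and using the repeated "++++\n" seed described above; the chars list, the char_indices and indices_char lookup tables, and the fixed 1,000-character output length are assumptions carried over from that example.

import numpy as np

maxlen = 300
seed = "++++\n" * 60  # 60 * 5 characters = exactly 300 characters

generated = ""
sentence = seed
for _ in range(1000):  # generate 1,000 characters
    # One-hot encode the current 300-character window.
    x_pred = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(sentence):
        x_pred[0, t, char_indices[char]] = 1.0
    preds = model.predict(x_pred, verbose=0)[0].astype("float64")
    preds = preds / preds.sum()  # renormalize before sampling
    next_char = indices_char[np.random.choice(len(chars), p=preds)]
    generated += next_char
    sentence = sentence[1:] + next_char  # slide the window forward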

After the script generated a new round of poems, I did a final plagiarism check. To do this, I built the set of all 4-grams (four-word phrases) in the training data, did the same for the poems written by my poetry robot, and then took the intersection of the two sets. For the purposes of this experiment, I checked the intersection by hand to make sure that any phrase appearing in both sets was generic rather than a distinctive borrowed line. Most of the time, the phrases in this intersection look like this:

i don’t want

i can not be

i want to be

the sound of the

To be more thorough, I repeated this step with 5-grams and 6-grams. If I were to automate this process, I would probably take a frequency-based approach and exclude n-grams that appear across many different authors' poems from being flagged as plagiarism. A sketch of this kind of check appears below.
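The post does not include the checking code; a minimal sketch of the 4-gram overlap check, assuming the training corpus and the generated poems are available as the strings training_text and generated_text, could be:

def word_ngrams(text, n=4):
    # Lowercase word n-grams, e.g. ("the", "sound", "of", "the") for n=4.
    words = text.lower().split()
    return set(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Phrases that appear in both the training poems and the generated poems.
overlap = word_ngrams(training_text, 4) & word_ngrams(generated_text, 4)
for phrase in sorted(overlap):
    print(" ".join(phrase))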

Magical poem!

Because the model weights are saved after each iteration, we can load snapshots of the model from different points during training. Looking at early iterations of the final model, it is clear that the poetry robot picks up line breaks almost immediately. I expected this because, by design, the most striking feature of the training data is the small number of characters per line. Here is a poem generated after a single iteration of training:

The poetry robot has learned some real words and mimics the common practice of leaving blank space between lines. At first glance, without looking too closely, it resembles a poem. After the loss function of the single-layer LSTM model converged, the model had learned not only line breaks but also the stanza structure of the poems, and it even showed some common repetition-based poetic devices.

The power of the LSTM is most evident in individual lines of verse. Besides the line in the title of this article, another of my favorites is:

With the help of Inspirobot, one of the funniest aphorism bots of all time, Demet drew on her favorite lines to create these gems:

Although the single-layer LSTM model never fully sustains a theme within a poem, there does seem to be a common thread running through its work as a whole. Here is a word cloud generated from all the poems produced by the single-layer LSTM model:

How intoxicating! The robot was fascinated by the sun and the stars

This would not be surprising if the sun were also the most common theme in the training data, but it is not! Here is a word cloud generated from the training data:

Poets love to sing the praises of love

Emily Dickinson wrote poems about nature and death. My poetry robot writes poetry about celestial bodies. Each to his own!
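The post does not show how the word clouds were made; a minimal sketch using the wordcloud package (an assumption, not necessarily the tool used in the original experiment) could be, where generated_text is all the generated poems joined into one string:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Build and display a word cloud from the generated poems.
cloud = WordCloud(width=800, height=400, background_color="white").generate(generated_text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()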

After adding a second LSTM layer, I began to see other poetic devices such as alliteration and rhyme:

It also began to write some genuinely poetic phrases, similar to the occasional great line produced by the previous model, but sometimes spanning more than one line. For example:

Oh, my gosh! That’s profound!

At this point we have seen line breaks, rhythm, rhyme (both internal and end rhyme), repetition, and alliteration. Not bad! But apart from the occasional good line, most of what the poetry robot produced was a jumble of broken words, and its nonsense phrases usually were not even grammatically structured.

With the addition of a third LSTM layer, however, this changed. Even when it still makes no sense, the verse the model produces is more likely to be grammatically correct. For example:

This sentence does not quite make sense, but its parts of speech are arranged correctly. It also contains alliteration, and its noun phrases have a poetic feel. The three-layer LSTM model also produced three lines of verse that I find strikingly powerful and poetic:

The complete poem below, however, is the crowning achievement of the three-layer LSTM model.

This poem is not an excerpt from a longer text; these lines appeared exactly as shown, neatly set off between two "++++\n" markers!

Look how funny human nature is! We are unique, and the possibilities within us are endless.

Special thanks to the poets who collaborated with me on this interesting experiment! Please be sure to visit their website to enjoy their excellent work!

O at the Edges — Robert Okaji

Wolff Poetry — Linda J. Wolff

Poetry, Short Prose and Walking — Frank Hubeny

Perspectives on Life, the Universe and Everything — Aurangzeb Bozdar

End