Friendly reminder: there’s a bonus at the end!!

I’ll start with some chicken soup:

“Don’t think of the overwhelming majority of the impossible.”

“Don’t think impossible”

“Grew up your bliss and the world.”

“Strive to win my own happiness and the world”

“What We would end create, creates the ground and you are the one to warm it”

“The creation that we wanted to end made the earth, and you were the one to hold it.”

“Look and Give Up in Miracles”

“Hope for miracles, give up illusions.”

But the above chicken soup sentences are all computer-generated, and they took less than 20 lines of Python to generate.

When it comes to natural language generation, people often assume it requires a very advanced AI system built on very advanced mathematics. But that’s not the case. In this article, I will use Markov chains and a small data set of chicken soup quotes to generate new chicken soup text.

Markov chains

A Markov chain is a stochastic model that predicts the next event based only on the previous event. As a simple example, let’s look at the daily life of my cat. My cat only ever eats, sleeps, and plays with toys. She sleeps most of the time, but occasionally wakes up to eat. Usually, after eating, she is refreshed and plays with her toys, and when she is done playing, she goes back to sleep until she wakes up to eat again.

Using a Markov chain, it’s easy to simulate my cat’s life, because what she does next depends only on her previous state. She doesn’t usually wake up and go straight to playing with toys, but after eating there’s a good chance she’ll play for a while. These life transitions can also be drawn as a graph:

Each oval represents a state of her life, each arrow points to a possible next state, and the number next to an arrow is the probability of her moving from one state to the other. As you can see, the probability of each transition depends only on her previous state.
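To make this concrete, here is a tiny simulation of such a chain in Python. The transition probabilities below are made-up values for illustration only, not the exact numbers from the diagram:

import random

# Made-up transition probabilities: from each state, the chance of each possible next state.
transitions = {
    'sleep': {'sleep': 0.6, 'eat': 0.3, 'play': 0.1},
    'eat':   {'sleep': 0.2, 'eat': 0.1, 'play': 0.7},
    'play':  {'sleep': 0.7, 'eat': 0.2, 'play': 0.1},
}

def next_state(current):
    # Pick the next state using only the current state's outgoing probabilities.
    states = list(transitions[current])
    weights = list(transitions[current].values())
    return random.choices(states, weights=weights)[0]

state = 'sleep'
for _ in range(10):
    state = next_state(state)
    print(state)  # e.g. sleep, sleep, eat, play, sleep, ...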

Generating text with Markov chains

Generating text with a Markov chain uses the same idea: it tries to find the probability of one word following another. To learn these transition probabilities, we train the model on some example text.

For example, we train the model with these sentences:

I like to eat apples. You eat oranges.

From the two training sentences above, we can see that “I”, “like”, “to” and “eat” always appear in that order, and that “you” is always followed by “eat”. Meanwhile, “apples” and “oranges” each appear equally often after the word “eat.” Here’s a transition diagram that shows this more clearly:
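If you want to check those transition counts yourself, here is a small sketch (separate from the generator code later in this article) that counts which words follow which in the two training sentences:

from collections import Counter

sentences = ["i like to eat apples", "you eat oranges"]

# For each word, count which words immediately follow it.
followers = {}
for sentence in sentences:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        followers.setdefault(current, Counter())[nxt] += 1

print(followers['eat'])  # Counter({'apples': 1, 'oranges': 1}) -> a 50/50 split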

A model trained on just these two sentences can only ever produce a couple of new sentences, but that isn’t always the case. I trained another model on the four sentences below, and the results were quite different:

My friend makes the best raspberry pies in town. I think apple pies are the best pies. Steve thinks Apple makes the best computers in the world. I own two computers and they’re not Apple, because I am not Steve or rich.

The transition diagram for a model trained on these four sentences is much larger.

While this diagram looks very different from a typical Markov chain transition diagram, the main idea behind both is the same.

A path starts at the start node, randomly selects following words one after another, and continues until it reaches the end node. The width of the connection between two words indicates the probability of that word being selected.

Although trained with only four sentences, the model above can generate hundreds of different sentences.

Code

The code for the text generator above is very simple and doesn’t require any modules or libraries beyond Python’s random module. The code consists of two parts: one for training and one for generation.

Training

The training code builds the model that we will later use to generate chicken soup sentences. I use a dictionary as the model: each key is a word, and the corresponding value is a list of words that may follow it. For example, the dictionary of a model trained on the two sentences above, “I like to eat apples” and “You eat oranges”, looks like this:

{'START': ['i', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'], 'you': ['eat'], 'eat': ['apples', 'oranges'], 'END': ['apples', 'oranges']}

We don’t need to calculate explicit probabilities for the following words, because words with a higher probability simply appear more times in the list. For example, if we add the training sentence “We eat apples”, the word “apples” now appears after “eat” in two of the training sentences, so it should be more likely to be chosen. In the model’s dictionary, that higher probability is represented by “apples” appearing twice in the “eat” list.

{'START': ['i', 'we', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'], 'you': ['eat'], 'we': ['eat'], 'eat': ['apples', 'oranges', 'apples'], 'END': ['apples', 'oranges', 'apples']}

There are also two special keys in the model dictionary above: “START” and “END”. They hold the words that can begin and end a generated sentence.
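To see why listing a word twice really does make it twice as likely, remember that random.choice picks uniformly from a list. A quick check on the “eat” list from the dictionary above:

import random

eat_followers = ['apples', 'oranges', 'apples']  # the 'eat' list from the model above

# Sample many times: 'apples' should come up roughly twice as often as 'oranges'.
samples = [random.choice(eat_followers) for _ in range(10000)]
print(samples.count('apples') / len(samples))   # roughly 0.67
print(samples.count('oranges') / len(samples))  # roughly 0.33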

model = {}  # maps each word to the list of words that can follow it, plus 'START' and 'END'

# dataset_file is the training text: an open file (or any iterable of lines), one quote per line.
for line in dataset_file:
    line = line.lower().split()
    for i, word in enumerate(line):
        if i == len(line)-1:
            # Last word of the line: record it as a possible ending word.
            model['END'] = model.get('END', []) + [word]
        else:
            if i == 0:
                # First word of the line: record it as a possible starting word.
                model['START'] = model.get('START', []) + [word]
            # Record that the next word in the line can follow the current word.
            model[word] = model.get(word, []) + [line[i+1]]
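As a quick sanity check, dataset_file only needs to be something you can loop over line by line, so we can run the same loop over the two example sentences in memory (periods dropped for simplicity, since the code doesn’t strip punctuation) and confirm it builds the dictionary shown earlier:

model = {}
dataset_file = ["I like to eat apples", "You eat oranges"]  # stands in for a real text file

for line in dataset_file:
    line = line.lower().split()
    for i, word in enumerate(line):
        if i == len(line)-1:
            model['END'] = model.get('END', []) + [word]
        else:
            if i == 0:
                model['START'] = model.get('START', []) + [word]
            model[word] = model.get(word, []) + [line[i+1]]

print(model)
# {'START': ['i', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'],
#  'eat': ['apples', 'oranges'], 'END': ['apples', 'oranges'], 'you': ['eat']}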

Make chicken soup sentences

The generation part contains a loop. It starts by picking a random starting word and appending it to a list. Then it looks up, in the dictionary, the list of words that can follow the last word in that list, randomly picks one of them, and appends the new pick to the list. The generator keeps choosing random following words until it reaches an ending word, then stops the loop and outputs the generated word sequence, or so-called “quote.”

import random

generated = []
while True:
    if not generated:
        # Start a new sentence: choose from the possible starting words.
        words = model['START']
    elif generated[-1] in model['END']:
        # The last word can end a sentence, so stop here.
        break
    else:
        # Otherwise, look up the words that can follow the last generated word.
        words = model[generated[-1]]
    generated.append(random.choice(words))
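To turn the word list into an actual quote, and to make several quotes in one go, the same loop can be wrapped in a small helper function. This is just a sketch that assumes the trained dictionary from the training step is available as model:

import random

def generate_quote(model):
    # Walk the chain from a starting word until we reach a word that can end a sentence.
    generated = []
    while True:
        if not generated:
            words = model['START']
        elif generated[-1] in model['END']:
            break
        else:
            words = model[generated[-1]]
        generated.append(random.choice(words))
    return ' '.join(generated)

for _ in range(5):
    print(generate_quote(model))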

I’ve generated a lot of chicken soup text with Markov chains, but since this is a general text generator, you can feed it any text you like and have it produce similar sentences.

Another cool thing you can do with a Markov chain text generator is mix different kinds of text. For example, one of my favorite TV shows, Rick and Morty, has a character called Abradolf Lincler, a hybrid of Abraham Lincoln and Adolf Hitler.

You can do something similar by feeding the names of famous people into the Markov chain and having it generate playful mashups of their names (say, Gerda Statham or Nicholas Zhao).

You can even go one step further and take quotes from famous people, such as Lincoln and Hitler’s speeches, and mix them with Markov chains to create a whole new style of speech.

Markov chains can be used in almost any field. Text generation is not their most useful application, but I do find it fun. And who knows, maybe your chicken soup will one day attract more fans than Mimun’s.

Take this bonus and make some chicken soup:

We’ll continue our zero-basics Introduction to Python livestream tonight at 20:00! It’s free! Free! Free!

The livestream link is here.

Click here for the summary of past sessions