• Soul of the Machine: How Chatbots Work
  • George Kassabgi
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: lsvih
  • Proofreaders: lileizhenshuai, jasonxia23

Soul of the Machine: How Chatbots Work

Since the early industrial age, humans have been fascinated by devices that operate autonomously, because such devices represent the “humanization” of technology.

Today, all kinds of software are gradually becoming humanized, and one of the most visible examples is the “chatbot”.

But how do these “machines” work? First, let’s go back in time and explore a primitive, but similar, technique.

How does a music box work

An early example of automation is the mechanical music box: a rotating cylinder studded with pins plucks a set of tuned metal teeth arranged in a comb structure. Each pin corresponds to a note at a particular time.

When the cylinder turns, it plays music by plucking one or more teeth at predetermined times. To play a different song, you swap in a different cylinder (assuming the song uses the same set of notes).

In addition to producing musical notes, the rotation of the cylinder can be accompanied by other actions, such as moving figurines. Either way, the basic mechanics of the music box won’t change.

How do chatbots work

The input text is processed by a function called a classifier, which associates the input sentence with an “intent” (the purpose of the sentence) and then produces a response for that intent.

An example of a chatbot

You can think of a classifier as a way of grouping a piece of data (a sentence) into one of several categories (an intent). Given the input “How are you?”, the sentence is classified as an intent, which is then associated with a response (such as “I’m good” or, better, “I am well”).
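As a concrete sketch (the function names, intent labels, and responses here are invented for illustration, not taken from any particular framework), the classify-then-respond pipeline might look like:

```python
# A minimal sketch of the classify-then-respond pipeline described above.
# The intent names and canned responses are hypothetical.
RESPONSES = {
    "greeting": "I am well.",
    "unknown": "Sorry, I don't understand.",
}

def classify(sentence):
    # Stand-in classifier: any of the three methods discussed in this
    # article (pattern matching, algorithms, neural networks) could
    # fill this role.
    if "how are you" in sentence.lower():
        return "greeting"
    return "unknown"

def respond(sentence):
    # Map the input to an intent, then the intent to a response.
    return RESPONSES[classify(sentence)]

print(respond("How are you?"))  # I am well.
```

Whatever classification machinery sits inside `classify`, the outer shape of a chatbot is just this: sentence in, intent found, response out.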

We learn this kind of classification early in basic science: a chimpanzee is a “mammal,” a bluebird is a “bird,” Earth is a “planet,” and so on.

Generally speaking, there are three different ways to classify text. Think of them as software machines built for a specific purpose, like the cylinders of a music box.

Text classification methods for chatbots

  • Pattern matching
  • Algorithms
  • Neural networks

No matter which classifier you use, the end result is a response. Just as a music box can use mechanical linkages to perform additional “actions,” so can a chatbot. A response may pull in additional information (weather, sports scores, web searches, and so on), but that information is not part of the chatbot proper; it’s just extra code. A response can also be built from specific “parts of speech” in the sentence (such as a proper noun). In addition, an intent’s response can use logical conditions on the “state” of the conversation to choose among several different responses, or pick one at random (to make the conversation feel more “natural”).
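To illustrate those last two points, here is a hypothetical sketch (intent names and phrasing invented, not from any real framework) of response selection that combines random variation with a simple conversation “state”:

```python
import random

# A sketch of response selection for a matched intent: random choice
# plus a simple conversation "state" (all names here are hypothetical).
RESPONSES = {
    "greeting": {
        "first": ["Hello!", "Hi there!"],
        "repeat": ["Hello again.", "We already said hi!"],
    },
}

def respond(intent, state):
    # state records which intents have already occurred in this conversation,
    # so the bot can answer a second greeting differently from the first.
    key = "repeat" if intent in state else "first"
    state.add(intent)
    return random.choice(RESPONSES[intent][key])

state = set()
print(respond("greeting", state))  # e.g. "Hello!"
print(respond("greeting", state))  # e.g. "Hello again."
```

The random choice keeps the bot from sounding like a broken record; the state check is the simplest possible “logical condition” on the conversation.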

Pattern matching

Early chatbots used pattern matching to classify text and generate responses. This is often called the “brute force” approach, because the author of the system must spell out every pattern for every response.

The standard structure for these patterns is “AIML” (Artificial Intelligence Markup Language). The name uses “artificial intelligence” as a modifier, but AIML and artificial intelligence are not the same thing at all.

Here is a simple pattern matching definition:

<aiml version="1.0.1" encoding="UTF-8">
    <category>
        <pattern>WHO IS ALBERT EINSTEIN</pattern>
        <template>Albert Einstein was a German physicist.</template>
    </category>
    <category>
        <pattern>WHO IS ISAAC NEWTON</pattern>
        <template>Isaac Newton was an English physicist and mathematician.</template>
    </category>
    <category>
        <pattern>DO YOU KNOW WHO * IS</pattern>
        <template><srai>WHO IS <star/></srai></template>
    </category>
</aiml>

The machine then processes the input and responds:

Human: Do you know who Albert Einstein is
Robot: Albert Einstein was a German physicist.

It knows which physicist is being asked about only by matching the pattern associated with his name. Likewise, it can respond to any intent using patterns preset by its creator. Give it thousands of patterns, and eventually a “human-like” chatbot emerges.
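The AIML above can be approximated in a few lines of Python (a toy sketch using regular expressions, not a real AIML interpreter):

```python
import re

# A toy pattern matcher in the spirit of the AIML example above
# (a sketch for illustration, not a real AIML engine).
TEMPLATES = {
    "WHO IS ALBERT EINSTEIN": "Albert Einstein was a German physicist.",
    "WHO IS ISAAC NEWTON": "Isaac Newton was an English physicist and mathematician.",
}

def respond(sentence):
    # Normalize: strip punctuation, uppercase (AIML patterns are uppercase)
    text = re.sub(r"[^\w\s]", "", sentence).upper().strip()
    # Direct pattern match
    if text in TEMPLATES:
        return TEMPLATES[text]
    # Wildcard rule: "DO YOU KNOW WHO * IS" redirects to "WHO IS *",
    # like the <srai> recursion in the AIML
    m = re.match(r"DO YOU KNOW WHO (.+) IS$", text)
    if m:
        return respond("WHO IS " + m.group(1))
    return "I don't know."

print(respond("Do you know who Albert Einstein is"))
# Albert Einstein was a German physicist.
```

The dictionary lookup plays the role of `<pattern>`/`<template>` pairs, and the recursive call mirrors how `<srai>` reduces one pattern to another.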

In 2000, John Denning and his colleagues built a chatbot this way (related news) that passed a Turing Test. It was designed to imitate a 13-year-old boy from Ukraine who spoke very poor English. I met John in 2015, and he did not deny the inner workings of this automaton; the chatbot very likely used brute-force pattern matching. But it demonstrates that with a large enough set of pattern definitions, most conversations can be kept reasonably “natural.” It is also consistent with Alan Turing’s assertion that there is “no point” in building machines merely to fool humans.

Another example of a bot using this approach is PandoraBots, which claims that more than 285,000 chatbots have been built with its framework.

Algorithms

Brute-force automata are daunting to build: for every input there must be a pattern available to match a response. The web of patterns quickly becomes a rat’s nest, so creators organize patterns into hierarchies.

To make the machine manageable, we can use an algorithm to reduce the classifier to something more tractable; in other words, we can create an equation for it. This is what computer scientists call a “reductionist” approach: the problem is reduced so that the solution is simplified.

There is a classic text classification algorithm called Multinomial Naive Bayes, which you can study in depth elsewhere. Here is how it works in outline:

It’s a lot simpler to use than it looks. You are given a set of sentences, each belonging to a class, and then a new input sentence. By counting how often each of the input’s words occurs in each class (weighted by commonality), you give each class a score. (Weighting by commonality matters: matching the word “cheese” says more than matching the word “it,” for example.) The class with the highest score is the most likely match for the input sentence. This is a simplification, since you would first stem each word, but by now you should have the basic idea of the algorithm.

Here is a simple training set:

class: weather
    "is it nice outside?"
    "how is it outside?"
    "is the weather nice?"

class: greeting
    "how are you?"
    "hello there"
    "how is it going?"

Let’s classify a few simple input sentences:

input: "Hi there"
 term: "hi" (no matches)
 term: "there" (class: greeting)
 classification: greeting (score=1)

input: "What’s it like outside?"
 term: "it" (class: weather (2), greeting)
 term: "outside" (class: weather (2))
 classification: weather (score=4)

Note that when classifying “What’s it like outside,” some of its words were also found in another class, but the correct class gave them a higher score. Through the algorithm’s formula, we compute a word-frequency score per class for the sentence, so there is no need to spell out every pattern.
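The worked example above can be sketched in a few lines (a simplification: every word is counted, with no stemming or commonality weighting, so the raw scores differ slightly from those shown, but the winning class comes out the same):

```python
# A sketch of the scoring classifier described above: count how often
# each input word appears in each class's training sentences.
TRAINING = {
    "weather": ["is it nice outside?", "how is it outside?", "is the weather nice?"],
    "greeting": ["how are you?", "hello there", "how is it going?"],
}

def score(sentence, cls):
    # crude normalization: lowercase, drop "?", expand "'s"
    words = sentence.lower().replace("?", "").replace("'s", " is").split()
    corpus = " ".join(TRAINING[cls]).replace("?", "").split()
    # each input word scores one point per occurrence in the class corpus
    return sum(corpus.count(w) for w in words)

def classify(sentence):
    # the class with the highest score wins
    return max(TRAINING, key=lambda cls: score(sentence, cls))

print(classify("What's it like outside?"))  # weather
print(classify("Hi there"))                 # greeting
```

A real implementation would weight rare words more heavily than common ones and stem each word first, but the core mechanism is exactly this tally.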

This classifier picks the best-matching class by its score (a word-frequency tally), but it still has limitations. Unlike a probability, a score only tells us which class the sentence’s intent most resembles, not how likely each candidate class is, so it is hard to set a threshold for accepting or rejecting a match. The highest score produced by this kind of algorithm can only serve as a measure of relative relevance, which makes it a comparatively weak classifier. Moreover, the algorithm cannot handle negation: it simply counts which words appear, so a sentence containing “not” scores much like its affirmative counterpart and is easily misclassified.

A number of chatbot frameworks use this method to classify intent. Most of them simply compute word frequencies over the training set, and this “naive” approach is sometimes surprisingly effective.

Neural networks

Artificial neural networks, invented in the 1940s, iteratively process training data to compute weights for connections (“synapses”), which are then used to classify new input. The weights are adjusted by running the training data through the network again and again, so that the network’s output gains higher “accuracy” (a lower error rate).

Pictured above is a neural network consisting of neurons (circles) and synapses (lines)

There’s nothing new about these structures; what’s new is that today’s software runs on faster processors with more memory. When you are doing hundreds of thousands of matrix multiplications (the basic mathematical operation of a neural network), memory and processing speed become critical.

As in the previous method, each class is given some example sentences, and every word becomes an input to the neural network. The data is then iterated over, tens of thousands of times, with each pass adjusting the synaptic weights toward higher accuracy. The weights of each layer are recomputed by comparing the training set’s expected output with the network’s computed output (back-propagation). You remember something because you have seen it many times; each time you see it, the corresponding weights rise slightly.

Sometimes, after the weights have been adjusted to a certain point, the results gradually start getting worse; this is called “over-fitting,” and continuing to train past that point is counterproductive.

The amount of code required for a trained neural network model is small, but it depends on a potentially large weight matrix. For a relatively small example with 150 words across the training sentences and 30 classes, the weights might form a 150×30 matrix; now imagine multiplying a matrix of that size 100,000 times to drive the error rate down. This is why high-performance processors are needed.

What lets a neural network handle both complexity and sparseness is just matrix multiplication plus a formula (the activation function, in this case the sigmoid, which squashes values into the range 0 to 1) that a middle-school student could learn in a few hours. The really hard work is cleaning the training data.
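The whole mechanism fits in a short sketch. The following pure-Python toy (vocabulary, layer size, learning rate, and iteration count are all invented for illustration; a real network would have hidden layers and far more data) trains a single layer of sigmoid neurons on the earlier weather/greeting sentences:

```python
import math
import random

def sigmoid(x):
    # the activation function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Toy bag-of-words data built from the earlier training set (a sketch)
VOCAB = ["is", "it", "nice", "outside", "how", "weather", "are", "you", "hello", "there"]

def vec(sentence):
    words = sentence.lower().replace("?", "").split()
    return [1.0 if w in words else 0.0 for w in VOCAB]

# class 0 = weather, class 1 = greeting
DATA = [(vec("is it nice outside?"), 0), (vec("how is it outside?"), 0),
        (vec("is the weather nice?"), 0), (vec("how are you?"), 1),
        (vec("hello there"), 1), (vec("how is it going?"), 1)]

random.seed(1)
# one weight per (class, word) synapse: 10 inputs -> 2 outputs
W = [[random.uniform(-1, 1) for _ in VOCAB] for _ in range(2)]

def forward(x):
    # matrix multiply, then squash: the two basic operations of the network
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]

# training: repeatedly nudge each weight against the error, the
# "adjust synaptic weights on every iteration" idea described above
for _ in range(1000):
    for x, label in DATA:
        out = forward(x)
        for k in range(2):
            target = 1.0 if k == label else 0.0
            delta = (target - out[k]) * out[k] * (1 - out[k])  # sigmoid derivative
            for i, xi in enumerate(x):
                W[k][i] += 0.5 * delta * xi

def classify(sentence):
    out = forward(vec(sentence))
    return "weather" if out[0] > out[1] else "greeting"

print(classify("is the weather nice?"))  # weather
print(classify("hello there"))           # greeting
```

After training, the learned weights are just numbers: words like “outside” end up pulling toward the weather class, “hello” toward greeting. The network never “understands” either word.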

Just as with pattern matching and algorithms, there are many variations of neural networks, some quite complex. But the basic principle is the same: the main job is classification.

Just as mechanical music boxes don’t understand music theory, chatbots don’t understand language.

Chatbots essentially look for patterns in collections of phrases, each of which can be broken down into individual words. Inside a chatbot, words have no meaning beyond the patterns and training data in which they appear. Labeling such a “robot” as “artificial intelligence” is misleading.

Bottom line: a chatbot is like a mechanical music box, a machine that produces output based on patterns; but instead of cylinders and pins, it uses software code and mathematics.

