Abstract: This paper introduces the natural language understanding (NLU) module of the dialogue system. NLU is a critical module of the dialogue system, covering intent recognition and slot filling.

1. Introduction

With the rapid development of the mobile Internet and intelligent terminals, task-oriented dialogue robots are used more and more widely, and no mature dialogue-robot product can do without a task-oriented dialogue system. At present, the mainstream industry approach is to implement the task-oriented dialogue system as a relatively fixed pipeline, described below.

The whole pipeline consists of three modules: natural language understanding (NLU), dialogue management (DM), and natural language generation (NLG). More and more products also integrate a knowledge base, which is mainly used within the dialogue management module. Natural language understanding, dialogue management, and natural language generation are all natural language processing techniques. In voice conversation, speech recognition (ASR) and speech synthesis (TTS) are added at the input and output ends. Natural language understanding (NLU): its main function is to process the sentence input by the user, or the result of speech recognition, and to extract the user's dialogue intent and the information the user conveys.

Dialogue management (DM): dialogue management is divided into two sub-modules, dialogue state tracking (DST) and dialogue policy learning (DPL). Its main function is to update the system state according to the results of NLU and to generate the corresponding system action.

Natural language generation (NLG): the system action output by DM is converted into text, that is, the system action is expressed in textual form.
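To make the division of labor concrete, here is a schematic sketch of one dialogue turn through the pipeline. All component functions are illustrative stubs, not any real library's API; they only show how data flows from NLU through DM to NLG.

```python
# A schematic sketch of one turn through the NLU -> DM -> NLG pipeline.
# Every component below is a stub placeholder, only showing data flow.

def nlu(utterance):                       # NLU: intent + slots from text
    return "check_weather", {"city": "Beijing"}

def track_state(state, intent, slots):    # DST: merge new info into the state
    return {**state, "intent": intent, **slots}

def select_action(state):                 # DPL: pick the next system action
    return ("inform_weather", state.get("city"))

def generate_text(action):                # NLG: verbalize the system action
    return f"The weather in {action[1]} is sunny."

def dialogue_turn(utterance, state):
    intent, slots = nlu(utterance)
    state = track_state(state, intent, slots)
    return generate_text(select_action(state)), state

reply, state = dialogue_turn("What's the weather in Beijing?", {})
print(reply)  # -> "The weather in Beijing is sunny."
```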

Each of the above modules corresponds to a research field in academia. This paper mainly introduces the natural language understanding (NLU) module of the dialogue system. NLU is a very important module, consisting mainly of intent recognition and slot filling. The concepts of intent recognition and slot filling, and the mainstream methods for both tasks in recent years, are described below.

2. The concepts of intent recognition and slot filling

(1) Intent recognition

Intent recognition, as the name suggests, determines what the user wants to do. For example, when a user asks a robot a question, the robot needs to determine whether the user is asking about the weather, travel, or a movie. In the final analysis, intent recognition is a text classification problem. Since it is text classification, the intent categories must be defined first; in other words, we need to define the set of intent categories in advance before intent recognition can even be considered. So how are intent categories defined? This differs from sentiment classification, which can be divided into positive, negative, and neutral no matter what the situation is; intent classification must be considered in a specific scenario, and different application scenarios have different intent taxonomies. The Meituan app, for example, divides users' search intents into categories such as food delivery, hotel, travel tickets, movie tickets, and air tickets.
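As a concrete illustration, the following minimal sketch treats intent recognition as plain text classification with scikit-learn. The intent labels and training utterances are hypothetical toy examples modeled on the Meituan categories above, not data from any real system.

```python
# A minimal sketch of intent recognition as text classification.
# Labels and utterances are hypothetical toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "order some noodles for lunch delivery",
    "book a hotel room near the airport",
    "buy two movie tickets for tonight",
    "find a cheap flight to Beijing",
]
train_intents = ["food_delivery", "hotel", "movie_ticket", "air_ticket"]

# TF-IDF features + logistic regression: a simple text classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_intents)

print(clf.predict(["I want a flight from Shanghai to Beijing"]))  # likely 'air_ticket'
```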

(2) Slot filling

One way to understand a passage of text is to mark the words or symbols that are meaningful for the sentence. In natural language processing, this is known as the semantic slot filling problem.

In a dialogue system, slot filling generally refers to extracting key information from the user's words and converting the user's implicit intent into explicit instructions that the computer can understand. Semantic slots are generally used to represent user requirements, such as origin, destination, and departure time. As the keywords of intent recognition, semantic slot values also prompt the next turn of the dialogue.
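Concretely, slot filling is usually cast as sequence labeling with BIO tags, where B- begins a slot, I- continues it, and O marks tokens outside any slot. The short sketch below shows how tagged tokens map back to slot-value pairs; the slot names are illustrative, echoing the origin/destination/departure-time example above.

```python
# Slot filling viewed as sequence labeling with BIO tags.
# Slot names below are illustrative examples.
tokens = ["book", "a", "flight", "from", "Boston", "to", "Denver", "tomorrow"]
tags   = ["O", "O", "O", "O", "B-fromloc", "O", "B-toloc", "B-depart_date"]

def extract_slots(tokens, tags):
    """Collect (slot_name, value) pairs from a BIO-tagged sentence."""
    slots, name, value = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if name:                      # close the previous slot
                slots.append((name, " ".join(value)))
            name, value = tag[2:], [token]
        elif tag.startswith("I-") and name:
            value.append(token)           # continue the current slot
        else:
            if name:
                slots.append((name, " ".join(value)))
            name, value = None, []
    if name:
        slots.append((name, " ".join(value)))
    return slots

print(extract_slots(tokens, tags))
# [('fromloc', 'Boston'), ('toloc', 'Denver'), ('depart_date', 'tomorrow')]
```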

In the dialogue system, slot filling plays two roles: one is to act as a conditional branch for carrying out multiple rounds of dialogue, and the other is to complete the information needed to express the user's intent explicitly. In other words, slot filling not only completes the user's intent but also guides the direction of subsequent information gathering.

3. Methods for intent recognition and slot filling

Intent recognition and slot filling can be handled as two separate tasks or jointly. Because the two tasks are highly correlated (the intent and the slot values depend on each other), joint modeling generally works better. Some representative papers are introduced below.

Paper 1: A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding

(1) Overview

This paper proposes a joint intent-slot filling model for SLU (Spoken Language Understanding). Its strength lies in providing an RNN-based joint model that achieves excellent results on two basic NLP tasks at once: text classification and sequence labeling. Recurrent neural networks (RNNs) have been shown to be effective for text classification, and since intent recognition is closely related to slot filling, the authors propose a joint model suited to both tasks.

The authors adopt the gated recurrent unit (GRU), an improved variant of the RNN, which learns a representation for each word (token) along the time sequence; each slot label is predicted from these representations. Meanwhile, a max-pooling layer captures the global features of the sentence for intent recognition. The per-token features captured by the GRU and the global sentence features captured by the max-pooling layer are shared by the two tasks. For example, if the intent of a sentence is to find a flight, the sentence is more likely to contain a departure and a destination city, and vice versa. Experiments show that the joint model works better than each model trained separately. A united loss function is used as the training strategy, and the experimental results show that the paper achieved state-of-the-art results on both tasks.

(2) Model structure
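The model structure is given as a figure in the paper; as a rough stand-in, here is a minimal PyTorch sketch of the architecture as described in the overview: a GRU encodes the tokens, each hidden state feeds a per-token slot classifier, and max-pooling over the hidden states feeds the intent classifier. This is an illustration of the idea, not the authors' exact implementation.

```python
# Illustrative sketch of the joint GRU + max-pooling architecture.
# Not the authors' exact implementation; hyperparameters are arbitrary.
import torch
import torch.nn as nn

class JointGRU(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, n_slots, n_intents):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.slot_head = nn.Linear(hidden_dim, n_slots)      # per-token slot tags
        self.intent_head = nn.Linear(hidden_dim, n_intents)  # sentence-level intent

    def forward(self, token_ids):
        h, _ = self.gru(self.embed(token_ids))   # (batch, seq_len, hidden_dim)
        slot_logits = self.slot_head(h)          # one prediction per token
        pooled, _ = h.max(dim=1)                 # max-pooling: global sentence features
        intent_logits = self.intent_head(pooled)
        return slot_logits, intent_logits

model = JointGRU(vocab_size=1000, embed_dim=64, hidden_dim=128, n_slots=10, n_intents=5)
slot_logits, intent_logits = model(torch.randint(0, 1000, (2, 12)))
print(slot_logits.shape, intent_logits.shape)  # (2, 12, 10) and (2, 5)
```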

(3) Definition of loss
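The exact loss definition is given in the paper. A common formulation of such a united loss, sketched here under the assumption of a weighted sum of the two cross-entropy terms (the balance weight α is my assumption, not necessarily the paper's exact form), is:

```latex
% Sketch of a united (joint) loss: weighted sum of the intent
% cross-entropy and the per-token slot cross-entropy.
% The balance weight \alpha is an assumption.
\mathcal{L}
  = \alpha\,\mathcal{L}_{\mathrm{intent}} + (1-\alpha)\,\mathcal{L}_{\mathrm{slot}},
\qquad
\mathcal{L}_{\mathrm{intent}} = -\log p\!\left(y^{I} \mid x_{1:T}\right),
\qquad
\mathcal{L}_{\mathrm{slot}} = -\sum_{t=1}^{T} \log p\!\left(y^{S}_{t} \mid x_{1:T}\right).
```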

(4) Model results

The experimental results are shown below. The second column lists the features used by each method, where W, N, and S represent word, named entity, and semantic features, respectively; on the CQUD dataset, W represents per-character features for Chinese. The results of CRF are better than those of SVM, indicating that CRF is more suitable for sequence labeling tasks. In addition, RNN beats CRF because RNN can capture long-range dependencies. The R-CRF model combines the advantages of the RNN and CRF models: it can model label transitions and obtain a globally optimal label sequence. For the slot filling task, sentence simplification, which uses dependency parsing to extract keywords from sentences, is the strongest baseline. RecNN uses semantic information extracted by deep learning, but its effect is worse than that of sentence simplification; according to the authors, a possible reason is that the corpus is not large enough. Overall, the paper's model improved accuracy by 1.59% and 1.24% on the intent determination (ID) and slot filling (SF) tasks, respectively.

Comparison of the joint model and the separate models:

Paper 2: Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

(1) Overview

This paper uses attention-based RNN models for intent detection and slot filling. Intent detection is usually treated as classification, assigning a sentence to the corresponding intent type. Slot filling can be regarded as a sequence labeling problem, that is, each word in a given sentence is labeled accordingly. For the example sentence "first / class / fares / from / Boston / to / Denver", the words are labeled "B-class_type / I-class_type / O / O / B-fromloc / O / B-toloc". Slot filling can be solved either with an RNN model or with an encoder-decoder model, where the source input is the word sequence and the target output is the label sequence. Intent detection can likewise be solved with an encoder-decoder model, where the source input is the word sequence and the target output is the intent type. For slot filling, however, the sentence is in one-to-one alignment with the corresponding labels, that is, there is an "explicit alignment". The authors therefore propose two models: one adds the alignment information to the attention-based encoder-decoder model; the other adds alignment information and attention to the RNN model. Both solve slot filling and intent detection jointly.

(2) Joint model

The attention-based RNN model that combines intent detection and slot filling works as follows.

In a bidirectional RNN for sequence labeling, the hidden state of each time step carries information about the entire sequence, but this information may gradually be lost as it propagates forward and backward. Therefore, when making a slot prediction, we do not want to rely only on the aligned hidden state at each step; we also want to see whether a context vector provides additional supporting information, especially long-range dependencies that the hidden state does not fully capture.

In the model, a bidirectional RNN (BiRNN) reads the source sequence forward and backward; the paper uses the long short-term memory network (LSTM) as the RNN cell. The slot label dependencies are modeled in the forward RNN. As in the encoder module of the encoder-decoder architecture above, the hidden state at each step is the concatenation of the forward and backward states; each hidden state contains information about the entire input sequence, with a focus on the parts around the current word. This hidden state is then combined with a context vector to produce the label distribution, where the context vector is computed as a weighted average of the RNN hidden states.
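Written out, this is the standard attention computation (a sketch; the exact scoring function may differ from the paper's):

```latex
% Context vector c_i as an attention-weighted average of the BiRNN
% hidden states h_1..h_T. The scoring function g is a standard
% choice and may differ from the paper's exact form.
c_i = \sum_{j=1}^{T} \alpha_{i,j}\, h_j,
\qquad
\alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})},
\qquad
e_{i,j} = g(s_{i-1}, h_j).
```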

For joint modeling of intent detection and slot filling, the hidden states of the bidirectional RNN are reused to generate the intent class distribution. Without the attention mechanism, the hidden state vectors are first max-pooled over time and then fed into a logistic regression classifier for the intent; with attention, a weighted average of the hidden state vectors is used instead.
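In symbols (again only a sketch; the weight matrix W_I and the softmax classifier are assumed notation):

```latex
% Intent distribution from the BiRNN hidden states: max-pooling over
% time without attention, or an attention-weighted average with it.
% W_I and the softmax classifier are assumed notation.
c^{I}_{\mathrm{maxpool}} = \max_{1 \le t \le T} h_t,
\qquad
c^{I}_{\mathrm{attn}} = \sum_{t=1}^{T} \alpha^{I}_{t}\, h_t,
\qquad
p\!\left(y^{I}\right) = \operatorname{softmax}\!\left(W_I\, c^{I}\right).
```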

It is worth noting that the attention-based RNN model is computationally more efficient than the attention-based encoder-decoder model that utilizes explicitly aligned inputs: during training, the encoder-decoder slot filling model reads the input sequence twice, while the attention-based RNN model reads it only once.

(3) Experimental results

Table 1 shows the slot filling F1 scores of the proposed models. Table 2 compares the slot filling performance with previously reported results.

Table 3 shows the intent classification error rates of the proposed intent models and previous approaches.

Table 4 compares the performance of the joint training model on intent detection and slot filling with previously reported results.

To further verify the performance of the joint training model, the paper applies the proposed models to additional ATIS datasets and evaluates them with 10-fold cross-validation. Both the encoder-decoder and the attention-based RNN approaches achieve satisfactory results.

References

[1] Zhang X., Wang H. A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding. IJCAI, 2016: 2993-2999.

[2] Liu B., Lane I. Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling. Interspeech, 2016.

[3] www.jianshu.com/p/cec045c55…

[4] zhuanlan.zhihu.com/p/92909762

This article is shared from the Huawei Cloud community post "Natural Language Understanding of Task-based Dialogue Robots (1)", original author: Xiaobi~.
