Welcome to the Tencent Cloud + Community, where you can find more of Tencent's large-scale technical practice articles.

This article is from the column "Language, Knowledge and Artificial Intelligence", written by Tencent Zhiwen Lab.

1. What is a task-oriented robot?

A task-oriented robot provides information or services under specific conditions. It typically serves users who have a clear goal, in task scenarios such as checking data traffic, checking phone charges, ordering meals, booking tickets and consulting. Because user needs are complex, several rounds of interaction are usually required, and users may keep revising and refining their needs during the dialogue. A task-oriented robot therefore helps users pin down their goals by asking, clarifying and confirming.

2. Components of a task-oriented robot

The core of a task-oriented robot consists of three modules:

  1. Natural Language Understanding (NLU) module

  2. Dialog Management (DM) module

  3. Natural Language Generation (NLG) module

The overall framework is as follows:

Each module is described in detail below:

2.1 Natural language understanding module

2.1.1 Overview

A user utterance passing through the natural language understanding module goes through three sub-modules: domain recognition, user intent recognition and slot extraction. Domain recognition identifies whether the utterance belongs to the task scenario. When several robots are integrated, such as chat robots and question answering robots, domain recognition usually makes this decision and dispatches the utterance before it enters the task-oriented robot. Intent recognition identifies the user's intent and subdivides the task scenario into sub-scenarios. Entity recognition and slot filling produce the slots that serve as input to the dialog management module.
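The three sub-modules can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the keyword lists, the currency lexicon and the function names (`recognize_domain`, `recognize_intent`, `extract_slots`, `understand`) are invented for illustration, and a real system would use trained classifiers and sequence labelers instead of keyword matching.

```python
from typing import Optional, Tuple

# Hypothetical lexicons for this sketch; not taken from any real system.
TASK_KEYWORDS = ("exchange rate", "ticket", "phone charges")
INTENT_KEYWORDS = {"query_exchange_rate": "exchange rate", "book_ticket": "ticket"}
CURRENCIES = {"rmb": "RMB", "us dollar": "USD", "dollar": "USD", "euro": "EUR"}


def recognize_domain(text: str) -> str:
    """Decide whether the utterance belongs to the task scenario at all."""
    lowered = text.lower()
    return "task" if any(k in lowered for k in TASK_KEYWORDS) else "chat"


def recognize_intent(text: str) -> Optional[str]:
    """Pick the sub-scenario (intent) inside the task scenario."""
    lowered = text.lower()
    for intent, keyword in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return None


def extract_slots(text: str) -> dict:
    """Collect currency mentions in order of appearance."""
    lowered = text.lower()
    hits = {}  # currency code -> earliest position in the text
    for surface, code in CURRENCIES.items():
        pos = lowered.find(surface)
        if pos >= 0:
            hits[code] = min(pos, hits.get(code, pos))
    ordered = sorted(hits, key=hits.get)
    return dict(zip(["source_currency", "target_currency"], ordered))


def understand(text: str) -> Optional[Tuple[str, Optional[str], dict]]:
    domain = recognize_domain(text)
    if domain != "task":
        return None  # hand the utterance to a chat or QA robot instead
    return domain, recognize_intent(text), extract_slots(text)


result = understand("What is the exchange rate of RMB to US dollar")
```

Returning `None` for out-of-domain input is just one way to model the dispatch step; the caller would then route the utterance to another robot.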

2.1.2 Example

Take a simple example for this module. Suppose Text = “What is the exchange rate of RMB to US dollar”, and the output has the form Act(slot1 = value1, slot2 = value2, ...). The text is then parsed into “query(slot1 = RMB, slot2 = USD)”.
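The Act(slot = value, ...) notation maps naturally onto a small data structure. The sketch below is an assumption about how such a frame could be held in code; the class name `DialogueAct` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    """An act name plus its slot/value pairs (hypothetical container)."""
    act: str
    slots: dict = field(default_factory=dict)

    def __str__(self) -> str:
        args = ", ".join(f"{name}={value}" for name, value in self.slots.items())
        return f"{self.act}({args})"


frame = DialogueAct("query", {"slot1": "RMB", "slot2": "USD"})
print(frame)  # query(slot1=RMB, slot2=USD)
```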

2.1.3 Related research work on natural language understanding module

As core parts of a task-oriented robot, intent understanding and slot extraction have attracted wide research interest. The main methods are the following:

1. Rule-based understanding method

Examples are VoiceXML and the Phoenix Parser (Ward et al., 1994; Seneff et al., 1992; Dowding et al., 1993). The Phoenix Parser maps an input sentence (a word sequence) onto a semantic frame composed of multiple semantic slots. The matching rule of a semantic slot consists of several slot value types and connecting words, and can represent a complete piece of information, as shown in Figure 2. Advantage: it does not require a large amount of training data. Disadvantages: 1. Rule development is error-prone. 2. Tuning the rules takes many iterations. 3. Rules can conflict with each other, which makes them hard to maintain.

The test results of Phoenix on the TownInfo corpus are shown in Table 1:

2. Methods combining rules and statistics

For example, Combinatory Categorial Grammar (CCG) can be used, on the basis of annotated data, for statistical modeling and automatic rule extraction over a large number of complex language phenomena. Because the grammar rules are relaxed and combined with statistical information, this method can learn and parse irregular text when applied to spoken language understanding (Zettlemoyer et al., 2007). The test results on the ATIS corpus are shown in Table 2:

3. Statistical methods (aligned data)

Spoken language understanding on word-aligned data is usually treated as a sequence labeling problem. Generative models include stochastic finite state transducers (FST), statistical machine translation (SMT), dynamic Bayesian networks (DBN) and so on. Discriminative models include CRF, SVM, MEMM and so on (Hahn et al., 2011). The test results on the MEDIA evaluation corpus are shown in Table 3.

4. Statistical methods (unaligned data)

One example is the generative dynamic Bayesian network (DBN) (Schwartz et al., 1996). Its disadvantage is that the Markov assumption prevents the model from accurately capturing long-range dependencies between words. The hierarchical hidden state method can model these long-range dependencies, but at a high computational cost (He et al., 2006). A semantic tuple classifier based on support vector machines was also proposed (Mairesse et al., 2009). Test results on the TownInfo and ATIS corpora are shown in Table 4.

5. Deep learning methods

Unidirectional RNNs have been applied to the semantic slot filling task and significantly outperform the CRF model on the ATIS evaluation set (Yao et al., 2013; Mesnil et al., 2013). Other approaches include LSTM and its extensions (such as BiLSTM+CRF); CNNs for sequence labeling (Xu et al., 2013; Vu, 2016); sequence-to-sequence encoder-decoder models, extended with attention (Zhu et al., 2016; Liu et al., 2016); recurrent neural networks with an external memory unit, which improves the memory capacity of the network (Peng et al., 2015); and recursive neural networks (RecNN) (Guo et al., 2014). Test results for the above methods are shown in Table 5.

6. Examples based on the above methods

Rule-based parsing: if you enter “I want to query the current exchange rate for the dollar” into a rule-based parser, it can parse out the following intent and slots.
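As an illustration of what such rules might look like, here is a minimal regex-based sketch. It is not the Phoenix Parser or any real rule engine: the intent name, the slot name `currency`, the patterns and the normalization table are all assumptions made for this example.

```python
import re

# Hypothetical rules: one trigger pattern per intent, one value pattern per slot.
INTENT_RULES = {
    "query_exchange_rate": re.compile(r"\bexchange rate\b", re.IGNORECASE),
}
SLOT_RULES = {
    "currency": re.compile(r"\b(US dollar|dollar|RMB|euro|yen)s?\b", re.IGNORECASE),
}
# Map surface forms to a canonical value.
NORMALIZE = {"us dollar": "USD", "dollar": "USD", "rmb": "RMB", "euro": "EUR", "yen": "JPY"}


def parse(text):
    """Return the first matching intent and every slot value found by the rules."""
    intent = next((name for name, rule in INTENT_RULES.items() if rule.search(text)), None)
    slots = {}
    for slot, rule in SLOT_RULES.items():
        values = [NORMALIZE[m.group(1).lower()] for m in rule.finditer(text)]
        if values:
            slots[slot] = values
    return {"intent": intent, "slots": slots}


parsed = parse("I want to query the current exchange rate for the dollar")
# parsed == {"intent": "query_exchange_rate", "slots": {"currency": ["USD"]}}
```

Even this toy version shows the maintenance cost mentioned earlier: every new currency or phrasing needs another pattern and another normalization entry.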

LSTM-based model: sentences are annotated in the following format. Slots are labeled with BIO tags and the whole sentence is labeled with its intent; the model parameters are learned by maximizing the likelihood of the slots and the intent.
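To make the BIO scheme concrete, the sketch below shows one hypothetical annotated sentence and a helper that turns a tag sequence back into slot values. The tag names (`source_currency`, `target_currency`), the intent label and the function `bio_to_slots` are assumptions for illustration; the LSTM itself is not shown.

```python
def bio_to_slots(tokens, tags):
    """Group B-/I- tagged tokens into (slot, value) pairs; O tokens are skipped."""
    slots = []
    current_name, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot starts; close the previous one if it is still open.
            if current_name:
                slots.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            current_tokens.append(token)
        else:
            if current_name:
                slots.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = None, []
    if current_name:
        slots.append((current_name, " ".join(current_tokens)))
    return slots


tokens = ["what", "is", "the", "exchange", "rate", "of", "RMB", "to", "US", "dollar"]
tags = ["O", "O", "O", "O", "O", "O",
        "B-source_currency", "O", "B-target_currency", "I-target_currency"]
intent = "query_exchange_rate"  # one label for the whole sentence

slots = bio_to_slots(tokens, tags)
```

In joint training, the objective would typically be the sum of the log-likelihood of the tag sequence and the log-likelihood of the sentence-level intent.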

Model based on statistical methods (SVM): n-gram features are extracted from the sentence, which is then classified by trained SVMs for the domain and for the slots.
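The feature extraction step can be sketched as follows. This is only an assumed, simplified version (lowercased whitespace tokens, unigrams and bigrams); the function name `ngram_features` is hypothetical, and the SVM classifiers that would consume these features are not shown.

```python
from collections import Counter


def ngram_features(sentence, max_n=2):
    """Count all word n-grams of length 1..max_n in the sentence."""
    tokens = sentence.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


feats = ngram_features("exchange rate of RMB to US dollar")
# feats contains unigrams such as "rmb" and bigrams such as "exchange rate"
```

The resulting sparse counts would then be turned into a fixed-length vector (one dimension per n-gram seen in training) before being passed to a classifier.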

2.2 Dialog Management module

2.2.1 Overview

The triplet output by the natural language understanding module (domain, intent and slots) serves as input to the dialogue management system. The dialogue management system consists of two parts: state tracking and the dialogue policy. The state tracking module holds the information about the ongoing conversation and updates the current dialogue state from the old state, the user state (i.e., the triplet above) and the system state (i.e., the result of querying the database), as shown in Figure 3. The dialogue policy is closely tied to the task scenario, and its decision is usually the output of the dialogue management module. For example, the policy may decide to ask the user for a slot that is still missing in the scenario.

2.2.2 Example

For Text = “What is the exchange rate of RMB against US dollar”, the form “query (slot 1 = RMB, slot 2 = USD)” is used as the input of the dialog management module. The state tracking module then determines the query state of the current round from the information of the previous rounds and this input, determines which slots the query involves, and interacts with the database; here, the user wants the exchange rate of RMB against the US dollar. The dialogue policy then assesses the current slot status and produces the final output of the dialogue management module, such as the query result (source currency = RMB, target currency = USD, exchange rate = 1:0.16).
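The state update and the policy decision in this example can be sketched as a rule-based dialogue manager. Everything here is an assumption for illustration: the slot names, the `RATES` dictionary standing in for the database (it reuses the 1:0.16 figure from the example), and the functions `update_state` and `choose_action`.

```python
REQUIRED_SLOTS = ("source_currency", "target_currency")  # assumed for this scenario
RATES = {("RMB", "USD"): 0.16}  # stand-in for the database lookup


def update_state(state, user_act):
    """New state = old state, overwritten by what the user said this turn."""
    return {
        "intent": user_act.get("intent") or state.get("intent"),
        "slots": {**state.get("slots", {}), **user_act.get("slots", {})},
    }


def choose_action(state):
    """Policy: ask for the first missing slot, otherwise query and inform."""
    for slot in REQUIRED_SLOTS:
        if slot not in state["slots"]:
            return {"act": "request", "slot": slot}
    key = (state["slots"]["source_currency"], state["slots"]["target_currency"])
    return {"act": "inform", "slots": dict(state["slots"]), "exchange_rate": RATES.get(key)}


# Two-turn dialogue: the user first names only the source currency.
state = {"intent": None, "slots": {}}
state = update_state(state, {"intent": "query", "slots": {"source_currency": "RMB"}})
first = choose_action(state)   # the policy requests the missing target currency
state = update_state(state, {"slots": {"target_currency": "USD"}})
second = choose_action(state)  # all slots filled: look up the rate and inform
```

A learned policy would replace `choose_action` with a model that maps the state to an action, but the inputs and outputs would keep the same shape.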

2.2.3 Related research on dialogue management system

The dialogue management module is the brain of a task-oriented robot. The main approaches are rule-based methods and statistical learning methods, and dialogue management based on reinforcement learning is currently popular. A reinforcement-learning-based dialogue management system needs a large amount of data to train. Jost Schatzmann, Steve Young et al. proposed the agenda-based user simulator, which simulates users so that the dialogue management module can be trained against it continuously; this alleviates the scarcity of annotated data to some extent. However, it still does not handle complex conversations very well. Through experiments, Jianfeng Gao et al. showed that a dialogue management system trained with reinforcement learning is highly robust to noise. They also found that, in terms of overall error, slot errors do more damage than intent errors.
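The core idea of an agenda-based simulator, a stack of pending user acts that is popped each turn and reshuffled in response to the system, can be illustrated with a heavily simplified sketch. This is not the model from Schatzmann and Young; the class `AgendaUserSimulator`, the act tuples and the update rule are assumptions made for this example.

```python
class AgendaUserSimulator:
    """Toy user simulator: the agenda is a stack of (act, slot, value) tuples."""

    def __init__(self, goal):
        self.goal = dict(goal)
        # The top of the stack is the end of the list; "bye" sits at the bottom.
        self.agenda = [("bye", None, None)]
        for slot, value in reversed(list(self.goal.items())):
            self.agenda.append(("inform", slot, value))

    def step(self, system_act=None):
        """Update the agenda from the system act, then pop the next user act."""
        if system_act and system_act[0] == "request":
            slot = system_act[1]
            # Drop any pending inform for this slot, then answer it right away.
            self.agenda = [a for a in self.agenda
                           if not (a[0] == "inform" and a[1] == slot)]
            self.agenda.append(("inform", slot, self.goal.get(slot)))
        return self.agenda.pop() if self.agenda else ("bye", None, None)


user = AgendaUserSimulator({"source_currency": "RMB", "target_currency": "USD"})
turn1 = user.step()                                 # volunteers the first goal slot
turn2 = user.step(("request", "target_currency"))   # answers the system's question
turn3 = user.step()                                 # nothing left to say
```

A dialogue manager under training would call `step` once per turn and receive simulated user acts instead of real annotated conversations.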

2.3 Natural language generation module

Natural language generation modules are usually template-based, grammar-based or model-based. Template and grammar approaches are primarily rule-based, while model-based approaches generate natural language with networks such as LSTM.
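A template-based generator is the simplest of the three. The sketch below assumes the dialogue manager hands over an action dictionary with an `act` key; the template texts, the key names and the function `generate` are hypothetical.

```python
# One hypothetical template per system act.
TEMPLATES = {
    "request": "Which {slot} would you like?",
    "inform": "The exchange rate from {source_currency} to {target_currency} is {exchange_rate}.",
}


def generate(action):
    """Fill the template chosen by the act with the remaining fields."""
    template = TEMPLATES[action["act"]]
    return template.format(**action)


reply = generate({
    "act": "inform",
    "source_currency": "RMB",
    "target_currency": "USD",
    "exchange_rate": "1:0.16",
})
print(reply)  # The exchange rate from RMB to USD is 1:0.16.
```

Templates are easy to control but rigid; a model-based generator trades that control for more varied wording.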

2.4 End-to-end model

Here, Microsoft’s end-to-end model (Jianfeng Gao et al., 2018) is taken as an example, as shown in the figure below.

The main highlight of this paper is that, building on the work of Jost Schatzmann, Steve Young et al. (2009), it raises the simulation from the semantic level to the natural language level and uses an error model to study in depth a dialogue management system trained with DQN reinforcement learning. The user simulator adopts Steve Young's stack-based agenda model, the natural language generation and natural language understanding modules adopt LSTM models, and the dialogue management module adopts a DQN model (Jianfeng Gao et al., 2018).
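An error model of this kind can be thought of as controlled noise injected into the understanding result before it reaches the dialogue manager. The article does not give the paper's exact formulation, so the sketch below is an assumed form: the function `corrupt`, its parameters and the uniform replacement strategy are all hypothetical.

```python
import random


def corrupt(frame, intents, slot_values, intent_error=0.1, slot_error=0.1, rng=random):
    """Return a noisy copy of an NLU frame to simulate understanding errors."""
    noisy = {"intent": frame["intent"], "slots": dict(frame["slots"])}
    # Intent error: swap the intent for a different one with some probability.
    if rng.random() < intent_error:
        others = [i for i in intents if i != frame["intent"]]
        if others:
            noisy["intent"] = rng.choice(others)
    # Slot error: independently swap each slot value with some probability.
    for slot, value in frame["slots"].items():
        if rng.random() < slot_error:
            others = [v for v in slot_values if v != value]
            if others:
                noisy["slots"][slot] = rng.choice(others)
    return noisy


frame = {"intent": "query", "slots": {"source_currency": "RMB"}}
noisy = corrupt(frame, ["query", "book"], ["RMB", "USD"],
                intent_error=0.2, slot_error=0.2, rng=random.Random(0))
```

Sweeping `intent_error` and `slot_error` separately while measuring task success is one way to compare how much each kind of error hurts the dialogue manager.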

3. Application: a survey of Ali Xiaomi

The overall algorithm system of the Ali Xiaomi robot is as follows. It uses domain recognition to distribute the input query and its context to different robots, which then perform the task.

The algorithm framework of its task-oriented robot is as follows; it basically adopts the framework described earlier in this article.

4. Summary

This article briefly introduces the framework of task-oriented dialogue and some of its methods. For further study, you can read the corresponding papers in the references. Of course, many problems remain open in this field, such as:

  1. Representation of semantics. How to map sentences into an appropriate semantic structure, and how to add semantic parsing, semantic reasoning and robust domain transfer, remains a very challenging problem.

  2. Collecting and annotating task-oriented data is very difficult, so how to design a general data annotation format needs further study. As users expect more and more task domains to be covered, using existing resources to study domain transfer becomes particularly important.


This article was published by the Tencent Cloud + Community with the author's authorization. Cloud.tencent.com/developer/a…
