Speaker | Liu Shengping (Senior Technical Expert, Yunzhisheng AI Labs)

Source | AI Technology Base online open class

Human-computer dialogue systems, i.e., conversational interaction, are expected to become the main interaction mode in the era of the Internet of Things. Understanding and producing language are closely tied to knowledge. As a form of large-scale knowledge representation, the knowledge graph plays an important role in every module of a human-machine dialogue system. Knowledge-based dialogue is a human-machine conversational interaction service built around the knowledge graph.

AI Technology Base invited Liu Shengping, a senior expert at Yunzhisheng AI Labs, to systematically explain and organize "methods and practice of knowledge-graph-based human-machine dialogue systems".


This course introduces the architecture and key technologies of knowledge-based dialogue, and illustrates the application of knowledge graphs in the core modules of a human-computer dialogue system, drawing on practical experience with industrial-grade systems.

Liu Shengping: senior technical expert and senior R&D director at Yunzhisheng AI Labs; formerly a senior researcher at IBM Research; member of the Language and Knowledge Computing Committee of the Chinese Information Processing Society. He received his PhD from the School of Mathematics at Peking University in 2005 and is one of the pioneers of Semantic Web research in China. He served on the program committee of the International Semantic Web Conference in 2010 and 2011. He has published more than 20 papers in the fields of the Semantic Web, machine learning, information retrieval, and medical informatics. During his time at IBM he received two IBM Research Achievement Awards. At the end of 2012, Dr. Liu joined Yunzhisheng AI Labs to lead the NLP team, with full responsibility for research, development, and management across natural language understanding and generation, human-machine dialogue systems, chatbots, knowledge graphs, intelligent medicine, and more.

In this open class, he comprehensively and concretely described the development and application of knowledge graphs in human-machine dialogue systems, in four parts:

  • A survey of language, knowledge, and dialogue systems

  • Basic concepts of knowledge-based dialogue, with case studies

  • Key technologies of knowledge-based dialogue: knowledge graph construction, entity discovery and linking

  • Key technologies of knowledge-based dialogue: utterance understanding and natural response generation


A review of language, knowledge, and human-machine dialogue systems


1. Language and knowledge


Language and knowledge go hand in hand. The iceberg metaphor vividly captures their relationship: the language we see, the words we speak, is just the tip of the iceberg, while the background needed to understand a sentence is like the iceberg's mass below the waterline.

Therefore, natural language is very different from speech and images. When we listen to speech or look at an image, all of its information is in the speech signal or the image pixels; language is completely different. This is why natural language understanding is far harder than speech recognition or image recognition.

This report is a synthesis of the three talks I gave at the CCKS conferences over the past three years.


2. Man-machine dialogue system


The first human-computer dialogue system to cause a big stir in industry was Apple's Siri, which at the time was an iPhone app; Apple acquired it in 2010. Siri's innovation was adding a voice UI (VUI) on top of the traditional mobile GUI.

What really revolutionized human-computer dialogue systems was the 2014 launch of the Amazon Echo, a hardware device based entirely on voice interaction. Its speech technology was a big step beyond Siri's because it supported far-field speech.

In 2017, Amazon released the Echo Show, a speaker with a screen. Is this a return to Siri, a return to the GUI? Note the difference: the Echo Show is VUI+GUI, with VUI first. The advantage of voice is that input is very convenient: a few spoken words can stand for a whole sequence of instructions and replace many interface screens. The downside is that voice output is inefficient: reading out a screenful of content can take several minutes. So VUI+GUI combines the strengths of both: VUI for input, GUI for output.

A more advanced form is the humanoid robot now seen in many movies, such as those in WALL·E (Eva), Iron Man, or Westworld, which can converse freely with people. Its interaction is VUI++, truly simulating multi-modal human interaction.

Why are human-computer dialogue systems so popular in industry right now? Their most important significance is that they are expected to replace today's mobile apps and become the dominant form of human-computer interaction in the IoT era.


3. Interaction forms and application scenarios of human-machine dialogue systems


Just as human speech serves many purposes and takes many forms, human-computer dialogue systems include many forms of interaction:

1. Chat. The typical example is XiaoIce. Chat consists of small talk and pleasantries; it has no clear purpose and does not necessarily answer the user's questions. In existing human-machine dialogue systems, chat mainly provides emotional companionship.

2. Q&A. It must give accurate answers to users' questions. These can be factual questions, such as "How tall is Yao Ming?", or questions that define, describe, or compare things. By data source, Q&A systems divide into FAQ-based systems built on question-answer lists, CQA systems built on community Q&A data, and KBQA systems built on knowledge bases.

3. Control. Here the system only parses the semantics for a third party to execute. The most typical commands are turning on the air conditioner, turning on a desk lamp, or playing a song.

4. Task-oriented dialogue. A purposeful conversation whose goal is to gather information to complete a form-filling task; the most common examples are ordering takeout, booking a hotel, or booking a flight through conversation.

5. Proactive dialogue. Here the machine initiates the conversation; in all the previous forms, the interaction is initiated by the human.


At present, human-machine dialogue systems have many application scenarios, such as speakers, televisions, and air conditioners. Their distinguishing feature is that the device is not touched directly: voice interaction can be regarded as a substitute for the remote control, so wherever a remote control works, voice interaction can be used.

Another application scenario is the car: while driving, your eyes, hands, and feet are occupied, so using voice to answer the phone, navigate, or even send and receive WeChat messages is convenient and safe. The in-car scenario is one of hard demand, so it accounts for the largest shipments. For example, we began developing in-car voice interaction solutions in 2014 and have now shipped more than 15 million units.

Another application area is children's educational robots. The variously shaped children's robots in the lower right corner of the slide can be regarded as children's versions of the smart speaker: the content is for children, but the interaction is still human-computer dialogue.

4. Technical architecture of man-machine dialogue


Academic research on human-machine dialogue systems has a long history, beginning soon after AI was proposed, with systematic study starting in the 1970s and 1980s. The technology divides into five major parts:


1. Speech recognition: converting what people say into text despite complex real-world noise and diverse user accents, i.e., "hearing clearly".

2. Semantic understanding: translating what the user says into instructions or queries the machine can understand, i.e., "understanding".

3. Dialogue management: maintaining the state and goal of the dialogue and deciding what the system should say next, i.e., generating the intent of a response.

4. Natural language generation: expressing the system's response intent in natural language.

5. Speech synthesis: reading the response aloud with machine-generated speech.


Together, these form the complete closed loop of human-machine dialogue.


5. Evolution of speech recognition scenarios

Since human-computer dialogue systems use speech as the entry point, we need to discuss the development of speech technology. To build a really good human-computer dialogue system, you must understand speech technology in addition to natural language processing.

The earliest scenario, as with Siri, is the near-field mode, which mainly solves the accent problem. Recognition accuracy here is now very high, around 97%. The phone voice-input methods we use every day work in this mode; a distance of about 30 cm from the microphone is generally recommended.

The Amazon Echo works in far-field mode, where you may be 3 or even 5 meters from the microphone. This raises many problems: at a distance you are more susceptible to ambient noise, and a particularly deadly effect is reverberation caused by sound reflection. In a glass room, for instance, sound reflects repeatedly, and the microphone picks up many overlapping copies mixed together. There is another major difference: with WeChat voice messages you can press to talk, but facing a speaker 3 to 5 meters away you cannot press anything. Hence a new technology, "voice wake-up": just as we call a person's name before speaking to them, a wake word like "Hi, Google" wakes the machine so you can talk to it.

At present, the most difficult scenario for speech recognition is conversation between people: recording human-to-human talk and converting it into text. The most common case is a meeting, where the speech of different participants is automatically transcribed and even turned into minutes; judicial trials are similar. Wherever people talk to people, this can be used. The hardest version is the cocktail party problem: many people together in a noisy environment, everyone talking. A person can attend to just the one speaker they care about and hold a conversation despite the noise, but this is very difficult for a machine.


6. Machine role evolution in human-machine dialogue system

The machine's role in human-computer dialogue has evolved. Early human-computer dialogue was very simple and can be seen as a replacement for the remote control: users controlled the system with fixed sentence patterns or single-sentence commands.

Siri and the Amazon Echo take the assistant form: you interact through natural language, conversations span multiple turns, and the machine can even show some emotion.

The next stage is the expert, especially for a particular industry or domain. When we talk to a speaker, we hope it is also a music expert that can discuss music, even classical music, and even teach us something about it. When we talk to a child-education robot, we want it to be an expert in children's education; when we talk to an air conditioner, we want an air-conditioning expert behind it. This stage is characterized by the need for domain knowledge, which lets the system help you make recommendations and decisions.


Basic concepts of knowledge-based dialogue and case analysis

We must understand three concepts: semantics, context, and pragmatics. Context is especially important in dialogue. Context refers to the specific environment in which a human conversation occurs. It includes the verbal context, what we usually call the linguistic context, and many kinds of nonverbal context: the time, location, and weather of the conversation, information about the speaker, and so on. The knowledge we emphasize today is also an important nonverbal context.

If the user says "it's too cold", the literal meaning is that the temperature is a bit low, but pragmatics tells us the sentence conveys a conversational, real meaning in its specific context. If the air conditioner is on in the car, we understand it to mean "turn up the air conditioner's temperature". If it is winter and the air conditioner is off, it might mean "turn up the heat" or "close the windows". It's almost fall now, and if a girl tells you it's "too cold", she probably wants a hug or something. So context and pragmatics are very important concepts, and anyone building a human-computer dialogue system will encounter them.

1. Context of human-machine (device) dialogue system


As just noted, context is very important when people talk to people. When we build human-machine dialogue systems, what contexts exist when a person talks to a device?

1. Physical context, i.e., information about the situation in which you are speaking, including: (1) time and place, e.g., in a car or at home; (2) the weather; (3) mood and emotion; (4) the content displayed on the device; (5) information the device can perceive, e.g., when we talk to an air conditioner, it can sense the temperature and humidity inside and outside the room. The life cycle of this context is request-level.

2. Linguistic context, i.e., the preceding utterances; the feedback already shown on the device is also part of this context. Its life cycle can be regarded as session-level.

3. Knowledge context, which includes:


(1) Human common sense and domain knowledge. A simple example is the pair of Chinese sentences "nobody can beat the Chinese table tennis team" and "the Chinese football team can't beat anybody". Literally the two look parallel, but people understand the difference because we share the common sense that the Chinese football team is very weak and the Chinese table tennis team is very strong. Knowledge is crucial to understanding such sentences.


(2) The user portrait, including basic user information: gender, age, education level, hobbies, and so on. (3) The agent portrait, the information that defines the robot; for example, XiaoIce's agent portrait is an 18-year-old girl next door. (4) The device information base: if the speaker acts as a central controller, then the information and state of the devices it connects to are also context. What does "I'm home" mean when said to the home central controller? Based on the device states and the current environment, it may decide whether to turn the lights on or off, open the window or close the curtains, and so on.

2. Don't mythologize the knowledge graph

The history and concepts of the knowledge graph are well understood by now. Here I mainly stress one basic idea: the most important concept of the knowledge graph is "Things, not Strings". The things in a knowledge graph are entities rather than strings.

Also, we should not deify the knowledge graph: it is really just a way of organizing knowledge. Whatever application you build, knowledge appears in all kinds of scenarios, and you may have represented that knowledge in other ways before. At the conceptual level we have seen similar things: the ER model in database design is a conceptual model, and the class diagrams we draw in object-oriented design are conceptual models too; these models translate easily into knowledge graphs. I see the knowledge graph first and foremost as an organizational form of knowledge. At the data layer, a knowledge graph is a graph model that uses nodes and edges to express entities, values, relations, and attributes.
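As a concrete illustration of this data layer, here is a minimal Python sketch of a knowledge graph stored as subject-predicate-object triples, where edges to entities are relations and edges to literal values are attributes. The class and the sample facts (drawn from examples later in this talk) are illustrative, not any particular system's API:

```python
# Minimal sketch: a knowledge graph as (subject, predicate, object) triples.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # adjacency index: subject -> predicate -> set of objects
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, subj, pred, obj):
        self.edges[subj][pred].add(obj)

    def objects(self, subj, pred):
        return self.edges[subj][pred]

kg = KnowledgeGraph()
kg.add("Nicholas Tse", "girlfriend", "Faye Wong")   # entity -> entity: a relation
kg.add("Faye Wong", "sang", "Legend")
kg.add("Li Jian", "original_singer_of", "Legend")
kg.add("Yao Ming", "height", "2.26 m")              # entity -> value: an attribute

print(kg.objects("Nicholas Tse", "girlfriend"))      # {'Faye Wong'}
```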


3. What is knowledge-based dialogue?


What is knowledge-based dialogue? As an example, a user might chat with the speaker: "Do you like Nicholas Tse?" "Yeah, he's cool." "Do you know who his girlfriend is?" That is Q&A, and the answer is Faye Wong. Asking to hear one of her songs is control: the machine plays Faye Wong's "Legend" for you. After playing it, the system can ask, "Would you like to hear Li Jian's original version?" That is proactive dialogue. The user says "OK" and the system plays Li Jian's "Legend". The user can then ask, "What is his music style?" and the system answers, "Li Jian's style has the simplicity of folk songs, but is much more ornate than folk."

Looking at this example, it involves a lot of knowledge about music, including facts about the singers. The interaction forms include chat, Q&A, control, and proactive dialogue, all linked together by knowledge, so the whole conversation feels very smooth.

To sum up, knowledge-based dialogue means: centered on the knowledge graph, integrating the various data sources usable for dialogue through entity discovery and linking, to realize multi-turn dialogue across domains and across interaction forms.


The main characteristics of knowledge-based dialogue are: first, context is shared across domains and interaction forms, as you saw when chat and Q&A connected to each other; second, it embodies the robot's positioning as a domain expert: it understands the knowledge of these domains well, reflects that knowledge in chat and Q&A, and can use it to initiate dialogues.

The core technologies of knowledge-based dialogue are:


  • Offline processing. Knowledge-based dialogue requires a knowledge graph first, so there is the problem of knowledge graph construction. In addition, entity discovery and linking techniques associate the various dialogue-related data sources with the knowledge graph.

  • Online processing. Knowledge-informed utterance understanding, integrating knowledge into chat, knowledge-graph-based Q&A, knowledge-graph-based proactive dialogue, and so on.


Key technologies of knowledge-based dialogue


(1) Knowledge graph construction


1. Construction method of knowledge graph


Here I borrow the knowledge graph construction method summarized by Professor Xiao of Fudan University. The first step is schema design: we define which classes or concepts exist and which attributes or relations connect them.

The second step is to determine where our knowledge comes from, the data sources: these may be structured data we can transform directly, or unstructured data, i.e., text, from which we extract information.

Third, the most important part of the knowledge graph is vocabulary mining, including synonyms, acronyms, phrases, and so on.

Fourth, we group synonyms together into concepts; this is entity discovery, including entity recognition, entity classification, entity linking, and so on.

Fifth, besides entities, the knowledge graph has edges, i.e., relations, so we need relation extraction.

Sixth, because our knowledge graph may come from different data sources, we need knowledge fusion, mainly entity alignment, attribute fusion, and value normalization.

Finally, we check and control the quality of the knowledge graph, including knowledge completion, error correction, and knowledge updating, finally forming a domain knowledge graph.

2. Evaluation method of knowledge graph

If you don't know how to evaluate a knowledge graph, you don't know whether it is good or bad, useful or useless. Evaluation methods fall into roughly four categories, of which the most important is application-based evaluation: judging the knowledge ontology indirectly through application performance. Rather than having dozens of people build a knowledge graph for a year or two and then look for applications, knowledge graph work must be application-driven, and evaluating it by application effectiveness is the recommended approach.

There is also gold-standard evaluation: if a good knowledge graph exists, or we can build a small one, we evaluate the graph we build against that standard graph. We can assess completeness by coverage of concepts and relations, i.e., how many of the concepts and relations in the standard graph our graph includes.

The simplest methods are indicator-based: we set statistical indicators such as the number of concepts, relations, and attributes in the graph, and then spot-check for accuracy and consistency.
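To make the last two categories concrete, here is a small hypothetical sketch of gold-standard evaluation as coverage of concepts and relations, plus indicator-based statistics; the sets and numbers are invented for illustration:

```python
# Hypothetical sketch: gold-standard coverage and indicator-based statistics.
def coverage(built, gold):
    """Fraction of gold-standard items also present in the built graph."""
    return len(built & gold) / len(gold) if gold else 0.0

gold_concepts = {"Singer", "Song", "Album"}
built_concepts = {"Singer", "Song", "Genre"}

gold_relations = {("Singer", "sang", "Song"), ("Song", "in_album", "Album")}
built_relations = {("Singer", "sang", "Song")}

print("concept coverage: ", coverage(built_concepts, gold_concepts))    # ~0.67
print("relation coverage:", coverage(built_relations, gold_relations))  # 0.5
# Indicator-based: simple size statistics, then spot-check samples by hand.
print("num concepts:", len(built_concepts), "num relations:", len(built_relations))
```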

3. Agile Build


Today, applications are often developed in an agile fashion, releasing a version every two weeks or every month. The knowledge graph then also needs to iterate rapidly with the application, which demands fast construction. I emphasize here that we should run automated tests on the knowledge graph, decide from the test results whether it can be released, and keep analyzing its problems after release: treat the knowledge graph as a piece of software, ask whether it has bugs or needs new features, and plan the next release accordingly. The core idea is to treat the knowledge graph as software, with version management and agile development.
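As a toy illustration of such automated tests (the checks and data here are hypothetical, not our actual test suite), one might gate each knowledge graph release on scripts like this:

```python
# Hypothetical release gate: sanity tests run on each knowledge graph version.
def run_release_checks(triples, entities):
    errors = []
    for s, p, o in triples:
        if s not in entities:                      # edges must start at known entities
            errors.append(f"dangling subject: {s}")
    seen = {}
    for s, p, o in triples:                        # functional attributes must be unique
        if p == "birthday":
            if s in seen and seen[s] != o:
                errors.append(f"conflicting birthday for {s}: {seen[s]} vs {o}")
            seen[s] = o
    return errors

entities = {"Andy Lau", "Faye Wong"}
triples = [("Andy Lau", "birthday", "1961-09-27"),
           ("Andy Lau", "birthday", "1961-09-28"),      # injected bug
           ("Unknown Singer", "birthday", "1970-01-01")]  # injected bug

errors = run_release_checks(triples, entities)
if errors:
    print("block this release:", errors)   # two problems found
```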


(2) Entity discovery and linking

The problem to be solved: even with a knowledge graph in hand, we still rely on entity discovery and linking. This technique addresses "Things, not Strings": the key problem is associating strings with entities in the knowledge graph. It must solve two issues. One is that the same meaning can take different surface forms: "Kobe", "Black Mamba", and "Ke Shen" may all refer to Kobe Bryant. The other is the ambiguity of natural language, of the string itself: "apple" could mean an Apple computer, an iPhone, or a fruit.

The solution proceeds in two steps: entity discovery and entity linking. Entity discovery finds mentions in the text; a mention is a string, e.g., in "This apple is expensive", "apple" is the mention. Entity linking maps the mention to an entity in the knowledge graph. Multiple entities may be candidates, including Apple the company, the Apple brand, the Apple phone, the Apple computer, and the fruit; which one "apple" denotes depends on the context.

1. Entity-based multi-source data fusion


Here is a very simple knowledge graph: Nicholas Tse's girlfriend is Faye Wong, who sang the song "Legend", whose original singer is Li Jian.

The dialogue side has several data sources. One is the chat corpus, e.g., "Do you like the singer Nicholas Tse?" "Yes, he is very cool." Another is the FAQ library; community Q&A data can be found on Baidu and many other places, e.g., "Who can describe Li Jian's music style?" "Li Jian's style has the simplicity of folk songs, but is much more ornate than folk."

We also gather many documents from the Internet, including encyclopedia articles and web pages. For these documents, the chat corpus, the FAQ library, and the document library, we perform entity linking, connecting the singers that appear in them with the singers in our knowledge graph.

2. How to discover and link entities?


The first step is preprocessing: building a table of which entities each mention may correspond to, known in advance. This is also a limitation of current algorithms. We then extract entity-related features:

The first feature is the entity's prior probability: "apple" might have a 40% prior of being the fruit and a 60% prior of being the iPhone, whereas "grape" might have a 90% prior of being the fruit and 10% of being something else. The second is the word distribution of the entity's context: we look at which words surround these entities, or at the topic words of the document, just as an article mentioning the Apple phone tends to carry technology topic words. The third is semantic relatedness between entities: since the knowledge graph is a graph, each entity is surrounded by other entities, and these supply relatedness features.

In step two, entity linking becomes a ranking problem. After finding a mention, we look up its candidate entities in the mention table; we then only need to rank the candidates and return the most likely entity.

The third step is ranking the candidate entities. The most basic signals fall into two categories: information about the entity itself, and the coherence between entities, which can also drive the ranking. If the entities near "apple" are mostly about computers, then "apple" probably refers to the Apple computer.
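Putting the three steps together, here is a hypothetical minimal sketch: a precomputed mention table with priors, a context-word-overlap feature, and a linear combination used to rank candidates. The entity names, priors, and weighting are invented for illustration:

```python
# Hypothetical sketch of entity linking as ranking:
# step 1: mention lookup, step 2: feature scoring, step 3: rank and return.
mention_table = {  # precomputed mention -> candidate entities with prior probabilities
    "apple": [("Apple_phone", 0.6), ("Apple_fruit", 0.4)],
}
context_words = {  # words typically seen around each entity
    "Apple_phone": {"phone", "computer", "screen", "price"},
    "Apple_fruit": {"eat", "sweet", "juice", "tree"},
}

def link(mention, sentence_words, alpha=0.5):
    candidates = mention_table.get(mention, [])
    scored = []
    for entity, prior in candidates:
        overlap = len(sentence_words & context_words[entity]) / max(len(sentence_words), 1)
        scored.append((alpha * prior + (1 - alpha) * overlap, entity))  # combine features
    return max(scored)[1] if scored else None                          # rank, take best

print(link("apple", {"this", "apple", "is", "sweet", "eat"}))   # -> Apple_fruit
```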


(3) Knowledge-informed utterance understanding


Once entity linking is in place, we can build the actual dialogue system. The most basic part of a dialogue system is understanding what the user says.



The first step is entity discovery and linking. For example, for "Do you like Nicholas Tse", we associate "Nicholas Tse" with the entity in the knowledge graph.

The second step is coreference resolution. In "Do you know who his girlfriend is", who does "his" refer to? We first detect that "his" is a referential pronoun, then use the context to decide that here it refers to the entity Nicholas Tse.

Another case of semantic understanding is disambiguation with knowledge. When the user says "Zhou Qiaowen's Birthday", since "Birthday" is the name of a song and Zhou Qiaowen is its singer, and the request arrives through a speaker, we understand it as music and directly play Zhou Qiaowen's "Birthday". But suppose the user then asks for "Andy Lau's birthday". Named entity recognition may likewise tag "birthday" as a song title and Andy Lau as a singer, which would suggest playing music. Verifying against the knowledge graph, however, shows that Andy Lau never sang a song called "Birthday", so this becomes a question rather than a control instruction, and we answer directly: "Andy Lau's birthday is September 27, 1961."
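A hypothetical sketch of this verification step: before treating "X's Birthday" as a play-music command, check whether the (singer, song) edge actually exists in the knowledge graph. The data and intent labels are illustrative:

```python
# Hypothetical sketch: knowledge-verified intent disambiguation.
kg_sang = {("Zhou Qiaowen", "Birthday")}       # singer -sang-> song edges
kg_birthday = {"Andy Lau": "1961-09-27"}       # person -birthday-> date attributes

def interpret(singer, title):
    if (singer, title) in kg_sang:
        return ("PLAY_MUSIC", singer, title)   # a control instruction
    if title.lower() == "birthday" and singer in kg_birthday:
        return ("ANSWER", f"{singer}'s birthday is {kg_birthday[singer]}")  # Q&A
    return ("ASK_CLARIFY",)

print(interpret("Zhou Qiaowen", "Birthday"))   # play the song
print(interpret("Andy Lau", "Birthday"))       # answer the factual question
```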

These are examples of using knowledge to help understand the user's instructions. Now let me say a bit more about how to combine knowledge with conversation.


(4) Chat with knowledge


1. Context


Deep learning models now dominate academic work, so let me briefly describe the deep learning approach and its basic ideas. Academia generally casts chat as a sequence-to-sequence model: an encoder encodes the input as a vector, and a decoder generates the response. The core problem then becomes how to add the context. The most basic method is to concatenate the context text with the current utterance as the encoder input. Alternatively, we can encode the context as a vector and feed it in at the decoding stage, or model the session with a topic model and feed the session topics to the decoder as well; either way the context takes effect.
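For illustration, here is a minimal PyTorch sketch of the simplest variant: concatenating the context turns with the current utterance before encoding. The vocabulary, separator token, and dimensions are toy assumptions, and only the encoder side is shown:

```python
# Minimal sketch: fold the dialogue history into the encoder input.
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "<sep>": 1, "do": 2, "you": 3, "like": 4, "him": 5, "yes": 6}
emb = nn.Embedding(len(vocab), 16)
encoder = nn.GRU(16, 32, batch_first=True)

def encode(context_turns, current_turn):
    tokens = []
    for turn in context_turns:                 # prepend each context turn
        tokens += [vocab[w] for w in turn] + [vocab["<sep>"]]
    tokens += [vocab[w] for w in current_turn]
    x = emb(torch.tensor([tokens]))            # (1, seq_len, 16)
    _, h = encoder(x)                          # h: (1, 1, 32) summary vector
    return h                                   # would seed the decoder's initial state

h = encode([["do", "you", "like", "him"], ["yes"]], ["do", "you"])
print(h.shape)   # torch.Size([1, 1, 32])
```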

2. Consistency


Another important aspect of chat is consistency. As noted above, the context includes an agent portrait. Although the party I am chatting with is a robot, it should have a unified persona: the same gender, age, birthplace, and hobbies. This is currently the hardest point in chatbots. Ask the robot "How old are you?" and it might say "18"; ask it again in a different way and it might say "I'm 88", or even answer in the third person, "She's 18."


Why does this happen? Because today's chatbots collect corpora from various sources and pile them together without normalization: one corpus says "I am 88 years old", another says "I am 18 years old", so different phrasings of the same question can produce inconsistent answers. A more complicated example: you ask "Where were you born?" and it says "In Beijing"; you then ask "Are you Chinese?" and it may fail to answer, even though human common sense knows that Beijing is in China.

In deep learning, the approach is to model or vectorize the robot's profile and feed it into the decoder, so the decoder preferentially generates responses from the identity-information word vectors, achieving a degree of consistency.
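A hypothetical sketch of this idea: a fixed persona vector representing the agent portrait is fused into the decoder's initial state, so generation is conditioned on the same identity every time. The dimensions and the fusion layer are illustrative choices, not a specific published model:

```python
# Sketch: persona-conditioned decoding via the decoder's initial state.
import torch
import torch.nn as nn

hidden, persona_dim = 32, 8
persona = torch.randn(1, 1, persona_dim)       # embedding of the agent profile
project = nn.Linear(hidden + persona_dim, hidden)
decoder = nn.GRU(16, hidden, batch_first=True)

def init_decoder_state(encoder_state):         # encoder_state: (1, 1, hidden)
    fused = torch.cat([encoder_state, persona], dim=-1)
    return torch.tanh(project(fused))          # (1, 1, hidden), persona-aware

state = init_decoder_state(torch.randn(1, 1, hidden))
out, _ = decoder(torch.randn(1, 5, 16), state)  # decode 5 steps from that state
print(out.shape)   # torch.Size([1, 5, 32])
```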

3. Blending in knowledge


In addition, for questions like our example "How tall is Yao Ming", we can generate a more natural answer: "He is 2.26 meters tall, and he is the only human being visible from space." (Joking, of course.) This kind of chat fuses knowledge: the system knows Yao Ming's height. When decoding with a deep learning model, besides generating ordinary responses, some response content is retrieved from the knowledge base and spliced together with the generated text.
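A deliberately simplified sketch of the retrieval side only (not the full neural model): look up the fact in the knowledge base and splice it into the response text. The fact table and phrasing are illustrative:

```python
# Sketch: retrieve a fact at response time and splice it into the chat reply.
facts = {("Yao Ming", "height"): "2.26 m"}

def respond(entity, attribute, chat_tail):
    value = facts.get((entity, attribute))
    if value is None:
        return chat_tail                       # fall back to pure chat
    return f"He is {value}. {chat_tail}"       # fact + generated small talk

print(respond("Yao Ming", "height",
              "He is the only human visible from space. (Just kidding.)"))
```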


For more examples of such work, see the papers of Huang Minlie, who won this year's IJCAI Outstanding Paper Award.


(5) Question answering based on knowledge


There are two main approaches to knowledge base question answering. One is the traditional approach based on semantic parsing, in which a question is parsed into a formal query language and the query is then executed against the knowledge base. The biggest difficulty here is translating natural language questions into such a formal query language. There are many methods: the simplest are rule-based and template-based; more complex ones are based on translation models, deep learning models, and so on.


There are now many machine-learning-based knowledge base QA methods in academia. The basic idea is to embed the question and also embed the knowledge graph, turning both into vectors; question answering then becomes a similarity-matching problem, matching the question's vector against the vectors of candidate subgraphs in the knowledge base.
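As an illustration of the matching step only, with random vectors standing in for learned question and subgraph embeddings:

```python
# Sketch: embedding-based KBQA reduced to cosine-similarity ranking.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
q_vec = rng.normal(size=64)                       # question embedding (stand-in)
candidates = {                                    # subgraph embeddings (stand-ins)
    "Yao Ming -height-> 2.26 m": rng.normal(size=64),
    "Yao Ming -team-> Rockets": rng.normal(size=64),
}
best = max(candidates, key=lambda k: cosine(q_vec, candidates[k]))
print(best)   # the subgraph whose vector best matches the question
```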


There are many other approaches, including ones based on recurrent neural networks with attention mechanisms. As a reference, the article "Lifting the Veil of KB-QA" is very detailed and very good. In my view, deep-learning-based knowledge base question answering is not yet mature for industry, and its behavior is not very controllable, so in our system we still use traditional semantic parsing.

There are also many ways to incorporate knowledge into CQA. The core problem of CQA is computing semantic similarity between the user's question and the questions in our QA library, and the key issue is how to inject knowledge into the vector representations of sentences. A recent SIGIR 2018 paper describes a way to combine knowledge with attention in neural networks; these papers are basically each a network diagram. Another article is similar, roughly vectorizing knowledge for use in text ranking.


(6) Proactive dialogue based on knowledge


This is actually critical. In human-computer dialogue, especially VUI interaction, a speaker has no interface, so you cannot tell what functions it supports. Standing in front of a speaker, how do you know what it can do, what you can and cannot say, or what content it has? The robot therefore needs proactive dialogue to guide users and teach them its functions.

For example, if a user asks for "Legend", after playing it the machine can proactively ask, "Would you like to hear Li Jian's original version?" The idea is simple: based on our knowledge graph, look for other relations or attributes under the same entity, or recommend other entities under the same relation.
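A hypothetical sketch of this topic-suggestion logic over a toy graph: after serving one relation of an entity, offer a sibling relation of the same entity as the follow-up. The graph contents and phrasing are invented:

```python
# Sketch: proactive follow-up from sibling relations in the knowledge graph.
graph = {
    "Legend": {"original_singer": "Li Jian", "cover_singer": "Faye Wong"},
    "Li Jian": {"music_style": "folk-influenced pop"},
}

def follow_up(entity, used_relation):
    others = [(r, o) for r, o in graph.get(entity, {}).items() if r != used_relation]
    if others:
        rel, obj = others[0]
        return f"Would you also like to hear about {obj} ({rel.replace('_', ' ')})?"
    return None

# The user asked for Faye Wong's cover of "Legend"; suggest the original singer.
print(follow_up("Legend", "cover_singer"))
```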



A Baidu paper follows a similar idea: when the chat seems unable to continue, first do entity analysis and entity linking on the context, find the entity serving as the chat topic, then find related entities in the knowledge graph and generate new topics from them.


Conclusion


Above, I briefly introduced chat, question answering, dialogue, semantic parsing, and how to combine them with knowledge. Here is a summary:

First, why are human-computer conversation systems important?


1. It has the potential to become the dominant form of interaction in the Internet of Things era, playing a role similar to an OS.

2. The core of knowledge-based dialogue is the knowledge graph. The two most important things are: offline, integrating multi-source data around the knowledge graph; online, integrating chat, Q&A, dialogue, and control around the knowledge graph.

3. Technically, the combination of deep learning and knowledge graphs is one of the most important trends. Personally, I am optimistic about the sequence-to-sequence model because of its rich expressive power and many application scenarios: most problems in natural language processing can be modeled as sequence-to-sequence, including translation from one language to another, chat and question answering, even pinyin input (converting a pinyin sequence into a text sequence), as well as word segmentation, part-of-speech tagging, and named entity recognition. The model has two stages, encoder and decoder, and knowledge can be injected into either stage.

Second, what is the evolution of technology in human-machine dialogue systems?


1. In dialogue we must look not only at semantics but also at pragmatics, i.e., "semantics + context".

2. We should not just build small-talk robots; we want robots that master domain knowledge, that are well-read and highly cultured, that are domain experts. That is knowledge-based dialogue.

3. Streaming conversation. Today, interacting with a speaker means waking it first: "Xiao Ai, play me a song", then "Xiao Ai, next song". That is tedious; people do not keep calling each other's names during a conversation, so streaming conversation is needed. The technical difficulties are deciding whether a person has finished speaking and whether the system may interrupt, which is currently the hardest point, plus noise rejection: without a wake word, speech from people nearby or even from the TV can be misrecognized and answered by the machine.


Answers to audience questions

Q: Our company is building a knowledge graph for e-commerce, but e-commerce data updates every day. Is there a good way to update the knowledge graph? And how can we do knowledge inference on neo4j?

A: That's a good question. We emphasized that the knowledge graph needs agile construction, and agile construction means releasing versions frequently, so there is a version-merging issue, which is exactly the update issue. The main update technologies are ontology fusion and entity matching and alignment. If the amount of updated data is small, I suggest adding it to the knowledge graph automatically through entity alignment, followed by manual review to check that the updates are correct. I don't think there is a particularly good method here, because updating is the hardest problem in knowledge graphs.

As for knowledge reasoning on neo4j: first, I personally think neo4j is not suitable for storing massive knowledge graphs, and e-commerce data volumes should be large, so whether neo4j fits remains to be discussed. On reasoning itself: we generally treat the knowledge graph as knowledge and do as little reasoning as possible, because reasoning is very difficult and there are no particularly mature industrial tools. Second, if we must reason, we usually do it offline: compute the inferences in advance and expand all the data that can be expanded, which is also called "knowledge completion", e.g., expanding simple transitive relations ahead of time. This trades storage space for time and is a fairly common method. We do not recommend real-time reasoning in online services, because the required performance is generally hard to achieve.
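As an illustration of such offline knowledge completion, here is a naive sketch that precomputes the transitive closure of a relation so that online queries need no real-time reasoning; a real system would use a scalable batch job, not this quadratic loop, and the relation name is illustrative:

```python
# Sketch: precompute the transitive closure of located_in, trading space for time.
def transitive_closure(edges):
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

located_in = {("Haidian", "Beijing"), ("Beijing", "China")}
print(transitive_closure(located_in))
# adds ("Haidian", "China") ahead of time, so "Are you Chinese?" needs no online reasoning
```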

Q: Can you briefly introduce the general method of ontology construction?

A: Generally, there are two methods of ontology construction. One is the traditional expert-based method: invite domain experts to construct the ontology manually, holding meetings and discussions about each word, each entity, and each relation before deciding one way or the other. But this approach is no longer feasible; since we expect agile construction, it becomes a bottleneck for building the knowledge graph.

Most current methods are data-driven: we use data mining to build the knowledge graph automatically, with manual review where appropriate. I prefer the more extreme approach: make the entire construction fully automatic, with experts participating not in review or construction but in evaluation. The quality of the knowledge graph is judged by application performance, and the application must not assume the graph is completely correct and complete. We keep updating the graph through rapid iteration and judge its quality through automated tests, manual sampling, and application results. As long as the quality meets the application's needs, it is good enough.

Q: Is there a general best practice for entity extraction?

A: From an industrial point of view, the best practice for entity extraction is definitely a combination of approaches: dictionary-based, rule-based, statistical learning, and deep learning. No single approach solves all the problems. Although dictionary mining has little technical glamour, in practice dictionary-based methods are very effective, especially in vertical domains such as healthcare. Of course, in some domains the method is unreliable; in music, for instance, almost every word could be a song title.

However, dictionary-based methods have another important consideration: whether a dictionary entry is ambiguous, i.e., the word's prior probability. For example, "I love you" is also the name of a song, but the probability that it is intended as a song title may not be high, whereas "Forgetting Water" is very likely a song title. So the dictionary is not a simple list of entries but a list of entries with prior probabilities.
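A toy sketch of such a prior-weighted dictionary (the titles and probabilities are invented): an entry only matches as a song when its prior, possibly combined with context evidence, clears a threshold:

```python
# Sketch: dictionary entries carry priors; ambiguous titles need more evidence.
song_prior = {"Forgetting Water": 0.95, "I love you": 0.15}

def match_song(phrase, threshold=0.5):
    prior = song_prior.get(phrase, 0.0)
    return phrase if prior >= threshold else None

print(match_song("Forgetting Water"))   # matched as a song title
print(match_song("I love you"))         # rejected: too ambiguous without context
```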

Q: Does the knowledge graph still require knowledge of the Semantic Web? Does building OWL require strong domain knowledge?

A: As mentioned, the knowledge graph's predecessor is the Semantic Web, so to understand the knowledge graph more deeply you need to understand the Semantic Web, especially the RDF and OWL specifications.

The OWL ontology language is still somewhat complicated. I currently do not recommend making the knowledge graph that complicated; something roughly equivalent to RDF is enough. We want the knowledge graph to be as large as possible but logically simple, not complicated by OWL. A little semantics goes a long way; there is no need to over-complicate the model, because one of the biggest problems with an over-complicated model is that when you insert an entity it becomes hard to tell which concept the entity belongs to.

Q: Are psychologists competitive in the NLP academic field? What advice is there for psychology researchers who want to move into NLP?

A: That's an interesting question. One of the backbone members of our team was trained in psychology, but his specialty was statistical, i.e., quantitative, psychology, so he had a foundation in statistics; with that mathematical basis, switching to NLP is easier. Another point: psychology is especially meaningful for cognition, since the principles of neural networks bear some relation to cognitive psychology, so psychological knowledge is quite helpful when moving into NLP.

As for specific suggestions: whatever your original major, the most important thing when moving into NLP is to learn the fundamentals of mathematics and machine learning. Once that foundation is laid, the switch is relatively easy.

Q: Knowledge-based methods and statistical methods need to integrate and complement each other. Do you have typical ideas for combining them, exploiting the stability and controllability of knowledge-based methods while using statistics to extract automatically from supervised big data? Could you also share some general NLP experience?

A: There are currently three schools of artificial intelligence: knowledge graphs, statistical learning, and deep learning. From an industry perspective, each has its strengths for specific problems, so the three must be integrated; a real online system never uses only one method. The knowledge-based method is therefore very important and complements deep learning well; in particular, it provides the interpretability that deep learning lacks.

The simplest fusion method is the model ensemble: assembling several classifiers together. For this, see Mr. Zhou Zhihua's "watermelon book", since Mr. Zhou is the leading expert on ensemble models.

In addition, knowledge or rules can serve as features, combining the approaches from that angle; and, as discussed, deep learning decoders can also integrate knowledge. So there are many ways to do the fusion.