Preface

A knowledge graph (KG for short) is the infrastructure for a new generation of intelligent (AI) applications such as search engines and question answering systems. Current products include Baidu's "Zhixin" and Sogou's "Zhilifang". A knowledge graph is a data structure based on a directed graph, which consists of nodes and directed edges. Each node in the graph is called an entity. Edges represent logical relations between entities, or link an attribute to its value.

Everything in the world can be treated as an entity. Entities can be real things, concepts (like cold medicine), or categories (like fruit). Through these entities and the relationships between them, everything can be connected into the world's largest knowledge graph.

A knowledge graph counts as infrastructure because it is a knowledge base whose data is stored as a directed graph. It provides the data support on which new-generation search engines and intelligent question answering are built. For example, here is a simple knowledge graph describing tourist attractions:

[Figure: a knowledge graph describing tourist attractions]

Now let's explain why a knowledge graph is the infrastructure for intelligent applications such as new-generation search engines and question answering systems. If an intelligent system is regarded as a brain, then the knowledge graph is the knowledge base in that brain, which enables the machine to analyze and think about problems from the perspective of "relationships". From the figure above, for example, simple facts such as "Mount Tai is 1545 meters above sea level" and "the two mountains named Hengshan (Mount Heng in Hunan and Mount Heng in Shanxi) have the same pronunciation" can be read directly from the knowledge graph.

Knowledge graphs are divided into general knowledge graphs and industry knowledge graphs. A general knowledge graph is built from massive multi-source, open-domain data. An industry knowledge graph is built from vertical data in a specific field, such as medicine, hospitals, film, or automobiles.

Representation of knowledge graph

The knowledge graph is a large network, but when you break it down, it is made entirely of SPOs (subject, predicate, object). A knowledge graph can therefore be represented by triples of the form (entity-1, relation, entity-2). The attribute value of one entity can itself be another entity. Some attributes have only one value, such as someone's wife, while others have multiple values, such as the leading actors of a movie.

Each SPO describes a fact. For example, (Five Great Mountains, one of the five mountains, Mount Tai) states the fact that Mount Tai is one of the Five Great Mountains. Note that once the relation is fixed, the positions of entity-1 and entity-2 cannot be swapped, because a triple describes a directed edge (a fact). An entity does not have to be a specific thing in real life. It can also be an attribute value of a thing, in which case the relation is the attribute.

When triples are used to store a knowledge graph, another issue needs to be considered: entity recognition and entity disambiguation. For example, the entity "apple" could be the fruit or the iPhone mobile phone. In this case the knowledge graph needs some extra processing and the search strategy needs to be adjusted.
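The triple representation described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real graph database: the `TripleStore` class and its method names are assumptions made for the example, and "Movie X", "Actor A", and "Actor B" are hypothetical placeholders. The Mount Tai facts come from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One SPO fact: a directed edge from subject to object."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """A tiny in-memory triple store (hypothetical helper for illustration)."""

    def __init__(self):
        # Index objects by (subject, predicate) so lookups follow edge direction.
        self._index = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Store one fact and return it as a Triple."""
        self._index[(subject, predicate)].append(obj)
        return Triple(subject, predicate, obj)

    def objects(self, subject, predicate):
        """Return every object reachable via (subject, predicate, ?)."""
        return list(self._index.get((subject, predicate), []))


store = TripleStore()
# Facts taken from the text.
store.add("Five Great Mountains", "one of the five mountains", "Mount Tai")
store.add("Mount Tai", "elevation", "1545 m")
# A multi-valued attribute, using hypothetical placeholder names.
store.add("Movie X", "leading actor", "Actor A")
store.add("Movie X", "leading actor", "Actor B")

print(store.objects("Mount Tai", "elevation"))
print(store.objects("Movie X", "leading actor"))
# The edge is directed: swapping subject and object finds nothing.
print(store.objects("Mount Tai", "one of the five mountains"))
```

Indexing by (subject, predicate) makes the direction of each edge explicit: a query only succeeds when it starts from the subject side of a stored fact.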

In the process of building the knowledge graph, if an ambiguity is found, the parent node of the corresponding entity is added, and that parent node is used to disambiguate the word "apple". It is worth emphasizing again that the knowledge graph is only infrastructure, a knowledge base. Several examples are given later to illustrate its application value. A related topic is knowledge reasoning, which covers how an intelligent system can learn to complete the knowledge graph by interacting with human users.

An intelligent question answering system understands the user's intention, finds the most appropriate answer in the knowledge graph, and presents it to the user. For example, for the query "Wang Jianlin's son's girlfriend", the system needs to understand that the user wants to know about Wang Sicong's girlfriend, not Wang Jianlin's. The system therefore first finds the entity for Wang Jianlin's son, Wang Sicong, in the knowledge graph, then looks up Wang Sicong's girlfriend, and finally presents the answer.
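As a rough sketch of disambiguating "apple" through its parent node, the example below attaches a parent node to each candidate entity and picks the candidate whose parent best matches the words around the mention. The keyword sets, the scoring rule, and all names (`CANDIDATES`, `PARENT_KEYWORDS`, `disambiguate`) are assumptions made for illustration. Real systems use far richer context models.

```python
# Each ambiguous mention maps to candidate entities; every candidate
# carries a parent node that tells the senses apart.
CANDIDATES = {
    "apple": [
        {"entity": "apple (fruit)", "parent": "fruit"},
        {"entity": "apple (iPhone)", "parent": "mobile phone"},
    ],
}

# Hypothetical context words associated with each parent node.
PARENT_KEYWORDS = {
    "fruit": {"eat", "sweet", "juice", "tree"},
    "mobile phone": {"screen", "battery", "call", "charger"},
}


def disambiguate(mention, context):
    """Return the candidate whose parent node shares the most words
    with the context, or None if nothing matches."""
    candidates = CANDIDATES.get(mention.lower(), [])
    if not candidates:
        return None
    words = set(context.lower().split())

    def score(candidate):
        return len(PARENT_KEYWORDS[candidate["parent"]] & words)

    best = max(candidates, key=score)
    return best["entity"] if score(best) > 0 else None


print(disambiguate("apple", "the battery of my apple drains fast"))
print(disambiguate("apple", "I eat an apple every day"))
```

When no context word matches any parent node, the function returns `None` rather than guessing, which leaves room for a fallback strategy such as asking the user.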

The nature of the knowledge graph

A knowledge graph displays knowledge in the form of a graph. Nodes describe entities or concepts in the objective world. Edges describe relationships between entities or properties of an entity. Through this structured representation, a knowledge graph expresses the rich knowledge of the objective world in a form that machines (or intelligent systems) can process and understand.

There is no standard definition of a knowledge graph, but here is one from Exploiting Linked Data and Knowledge Graphs in Large Organisations: "A knowledge graph consists of a set of interconnected typed entities and their attributes." That is, a knowledge graph is composed of interconnected entities and their attributes. In its simplest case it looks like this:

[Figure: a simple knowledge graph]

If it's more complicated, it looks like this:

[Figure: a more complex knowledge graph]

As mentioned earlier, the knowledge graph brings together many fields. From the Web perspective, a KG establishes semantic links between data, much like hyperlinks between texts, and supports semantic search. From the NLP perspective, the focus is on how to extract semantic, structured data from text. From the knowledge representation (KR) perspective, the question is how to represent and process knowledge with computer symbols. From the AI perspective, it is how to use a knowledge base to help understand human language. From the database perspective, it is how to store knowledge as a graph. Building a good KG therefore requires a combination of methods and technologies from KR, NLP, the Web, ML, and DB.

An overview of knowledge graph technology

Application of knowledge graph

Traditional search engines simply filter target web pages based on the keywords the user enters and then return a list of links, leaving the user to open the pages and find the answer. Applications of the knowledge graph try to provide more intelligent answers in addition to the corresponding links. For example, a user typing "Taj Mahal" into Bing will get the following results:

[Figure: Bing search results for "Taj Mahal"]

Here the results provide information about the Taj Mahal such as travel information and geographic location, along with related topics such as the seven wonders of the ancient world. This matches the user's intent better and is less rigid than a traditional search engine, where users have to filter the information one item at a time themselves. That is why knowledge graph technology has great commercial value. For example, I typed "ping pong" into Baidu and got the following results (I actually wanted to search for "Zhang Jike" but had forgotten his name):

[Figure: Baidu search results for "ping pong"]

At the same time, the knowledge graph gives a search engine a certain reasoning ability. For example, if you type "Liang Qichao's son's wife" into Baidu, a traditional search engine simply matches web pages, which makes it hard to truly understand the user's intention, let alone answer the question. The knowledge graph makes the problem easier. We first obtain from the knowledge base that Liang Qichao's son is Liang Sicheng, and then that Liang Sicheng's wife is Lin Huiyin.
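The two-step lookup for "Liang Qichao's son's wife" can be sketched as a walk along a chain of relations. This is a simplified illustration: the dictionary `KG` and the function `follow_path` are names invented for the example, and each (entity, relation) pair is assumed to have a single value, which is not true of real data.

```python
# A toy knowledge base: (entity, relation) -> entity.
# Simplifying assumption: each relation has exactly one value.
KG = {
    ("Liang Qichao", "son"): "Liang Sicheng",
    ("Liang Sicheng", "wife"): "Lin Huiyin",
}


def follow_path(graph, start, relations):
    """Follow a chain of relations from a start entity.

    Returns the final entity, or None if any hop is missing.
    """
    current = start
    for relation in relations:
        current = graph.get((current, relation))
        if current is None:
            return None
    return current


# "Liang Qichao's son's wife" becomes the relation path ["son", "wife"].
print(follow_path(KG, "Liang Qichao", ["son", "wife"]))
```

Turning the natural-language query into the path `["son", "wife"]` is the intent-understanding step and is not covered by this sketch.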

Some techniques and technical terms in the knowledge graph

Association: determining whether the entity to be included and an entity in the core set are the same entity.
Entity normalization: determining whether entities to be included are the same entity and, if so, normalizing them into one entity.
Entity disambiguation: determining which entity a mention refers to.
Edge construction: associating the O (object) of an SPO.
Data fusion: integrating knowledge from multi-source data.

The last key technique for constructing a general knowledge graph is knowledge integration based on multi-source data. Faced with billions of entities in the open domain, we use entity disambiguation and entity normalization based on semantic space transformation to normalize and fuse large-scale entities from multiple open-domain sources, which addresses the problems of diverse knowledge representations and difficult association and fusion.

Entity: an entity is an abstraction of an objective individual. A person, a movie, or a sentence can all be considered an entity. For example: Yao Ming, Ang Lee, "I Am Not Pan Jinlian", and "Roll up your sleeves and work hard".

Type: a type is an abstraction of a collection of entities that share the same characteristics or attributes. For example, China is an entity, the United States is an entity, and France is an entity. These entities share characteristics such as capital, population, and area, so they can be abstracted as the type "country".

Domain: a domain is a collection of types. It sits above the type and is the abstraction of all types in a certain field. A country is an abstraction of entities like China and the United States, so it is a type. Besides country, geographical location also includes other types such as city, region, and continent. The geographical location domain is formed by abstracting all these types: continent, country, city, region, and so on.

Property: a property is an abstraction of the relationship between entities. For example, Ang Lee is an entity of the type person, and Life of Pi is an entity of the type movie. There is clearly a relationship between the two entities: Ang Lee directed Life of Pi. This relationship can be characterized by the property "director". A layer of relationships can then be built from properties: person (type) → director (property) → movie (type).

Relation: a relation is the relationship between specific entities. Ang Lee (entity) → director (relation) → Life of Pi (entity) describes the relationship between Ang Lee and Life of Pi.

Value: a value is used to describe an entity and can be of text type or numerical type. For example: Yao Ming (entity) → height (relation) → 226 cm (value).
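The terms above can be tied together in a small schema sketch. It models type, domain, entity, property, and relation as Python dataclasses and checks that a relation respects the type-level pattern person → director → movie. The class layout and the domain names "people" and "film" are assumptions made for illustration. Only "geographical location" is named as a domain in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    """An abstraction over entities; belongs to one domain."""
    name: str
    domain: str


@dataclass(frozen=True)
class Entity:
    name: str
    type: Type


@dataclass(frozen=True)
class Property:
    """Type-level pattern: subject_type -> name -> object_type."""
    name: str
    subject_type: Type
    object_type: Type


@dataclass(frozen=True)
class Relation:
    """Entity-level instance of a property."""
    subject: Entity
    prop: Property
    obj: Entity

    def is_valid(self):
        """Check that both entities have the types the property expects."""
        return (self.subject.type == self.prop.subject_type
                and self.obj.type == self.prop.object_type)


# Domain names "people" and "film" are hypothetical.
person = Type("person", domain="people")
movie = Type("movie", domain="film")
country = Type("country", domain="geographical location")

director = Property("director", subject_type=person, object_type=movie)

ang_lee = Entity("Ang Lee", person)
life_of_pi = Entity("Life of Pi", movie)
china = Entity("China", country)

print(Relation(ang_lee, director, life_of_pi).is_valid())
print(Relation(china, director, life_of_pi).is_valid())
```

Keeping the property at the type level and the relation at the entity level mirrors the distinction drawn in the text: the property says what kinds of things can be connected, and the relation says which specific things are.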