What is a knowledge graph?

A knowledge graph can be understood simply as a graph whose nodes are real-world entities, such as people, objects, and organizations, and whose edges represent the relationships between nodes or their attributes, as shown in the figure.
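As a minimal sketch of this idea, a small graph can be written as a list of (head, relation, tail) triples and indexed by head node. The entity names, relation names, and the `neighbors` helper below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical example facts as (head, relation, tail) triples.
triples = [
    ("Alice", "works_for", "Acme Corp"),
    ("Bob", "works_for", "Acme Corp"),
    ("Alice", "knows", "Bob"),
    ("Acme Corp", "located_in", "Berlin"),
]

# Index outgoing edges by head node so lookups do not scan the whole list.
outgoing = defaultdict(list)
for head, relation, tail in triples:
    outgoing[head].append((relation, tail))


def neighbors(node, relation=None):
    """Return the tails reachable from `node`, optionally filtered by relation."""
    return [t for r, t in outgoing.get(node, []) if relation is None or r == relation]


print(neighbors("Alice"))               # ['Acme Corp', 'Bob']
print(neighbors("Alice", "works_for"))  # ['Acme Corp']
```

Each triple is one edge of the graph, so adding a new fact is just appending another tuple.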

The role of the knowledge graph

Knowledge graphs are now widely used in chatbots, recommendation systems, and similar applications, as well as in vertical fields such as finance, agriculture, e-commerce, healthcare, environmental protection, and industrial manufacturing, because they encode prior knowledge. More abstractly, a knowledge graph is a huge network of knowledge whose discrete symbolic representation can be transformed into a continuous vector representation.

Representation and storage of knowledge graphs

Currently, there are two main approaches. One is RDF, which represents knowledge as a collection of triples. Its advantage is that data is easy to publish and share; its disadvantage is that plain triples do not let entities or relationships carry their own attributes. The other is the graph database, which focuses on efficient graph query and search. Neo4j is the most widely used example, with a relatively clear interface that makes it easier to express the relationships found in real business scenarios. It performs acceptably when the data volume is below the hundred-million scale; its main shortcoming is that it does not support distributed deployment.
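The difference between the two representations can be sketched in plain Python. This is an illustration only, not real RDF syntax or the Neo4j API: the node identifiers, property names, and the `to_triples` helper are assumptions. It shows that an attribute on a relationship (here `since`) sits naturally on a property-graph edge, while a triple-only representation needs an intermediate statement node (a simplified form of reification).

```python
# Property-graph style: nodes and edges both carry attribute dictionaries.
nodes = {
    "p1": {"label": "Person", "name": "Alice", "age": 34},
    "c1": {"label": "Company", "name": "Acme Corp"},
}
edges = [
    {"src": "p1", "dst": "c1", "type": "WORKS_FOR", "since": 2019},
]


def to_triples(nodes, edges):
    """Flatten a property graph into plain (subject, predicate, object) triples.

    An edge attribute cannot hang off a single triple, so each edge becomes
    an intermediate statement node that the attribute is attached to.
    """
    triples = []
    for node_id, props in nodes.items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    for i, edge in enumerate(edges):
        stmt = f"stmt{i}"
        triples.append((stmt, "subject", edge["src"]))
        triples.append((stmt, "predicate", edge["type"]))
        triples.append((stmt, "object", edge["dst"]))
        for key, value in edge.items():
            if key not in ("src", "dst", "type"):
                triples.append((stmt, key, value))
    return triples


for triple in to_triples(nodes, edges):
    print(triple)
```

One edge with one attribute turns into four triples, which is the kind of overhead that makes property graphs attractive when relationships need attributes.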

Knowledge extraction

There are two main sources of data for building a knowledge graph. One is the company's internal business data, which is generally stored in a structured database and can be used directly. The other is external data, either web pages captured from the Internet by a crawler or data provided by third parties. Such data is messy and unstructured and must be processed first, so most of the difficulty comes from this second source. Processing it mainly involves natural language processing techniques such as named entity recognition, relation extraction, entity resolution (entity unification), and coreference resolution. The figure below shows a knowledge graph being built from unstructured text.

Several of the NLP techniques mentioned above are used in this process.
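To make these steps concrete, here is a toy sketch of the pipeline on two sentences. It is not how production systems work: the hand-written regular expressions stand in for trained entity and relation extractors, the `ALIASES` table stands in for entity resolution, and the pronoun rule is a deliberately naive form of coreference resolution. All names and the sample text are hypothetical.

```python
import re

# Hypothetical alias table: maps surface forms to one canonical entity name.
ALIASES = {"International Business Machines": "IBM"}

# Pronouns that the naive coreference step replaces with the last subject.
PRONOUNS = {"He", "She", "It"}

# Hand-written patterns standing in for a trained relation extractor.
PATTERNS = [
    (re.compile(r"^(.+?) works for (.+?)\.?$"), "works_for"),
    (re.compile(r"^(.+?) lives in (.+?)\.?$"), "lives_in"),
]


def extract_triples(text):
    """Turn simple sentences into (head, relation, tail) triples."""
    triples = []
    last_subject = None
    # Split into sentences on whitespace that follows a period.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for pattern, relation in PATTERNS:
            match = pattern.match(sentence)
            if not match:
                continue
            head, tail = match.group(1), match.group(2)
            # Coreference resolution: a pronoun refers to the previous subject.
            if head in PRONOUNS and last_subject is not None:
                head = last_subject
            # Entity resolution: unify aliases to a canonical name.
            head = ALIASES.get(head, head)
            tail = ALIASES.get(tail, tail)
            triples.append((head, relation, tail))
            last_subject = head
            break
    return triples


text = "Alice Smith works for International Business Machines. She lives in Boston."
print(extract_triples(text))
# [('Alice Smith', 'works_for', 'IBM'), ('Alice Smith', 'lives_in', 'Boston')]
```

Real text needs statistical or neural models for each step, but the order of operations (find entities and relations, resolve pronouns, unify names) is the same.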

Building a knowledge graph

Common mistake: Many people overlook the importance of data and think that building a knowledge graph is all about algorithms and development. In practice this is not the case. The most important part is understanding the business and designing the knowledge graph itself, along with some ability to anticipate how the business will evolve. This is similar to building a business system, where the design of the database tables is critical and cannot be separated from an in-depth understanding of the business and a forecast of how its scenarios will change.

Main steps:

  1. Determine whether your business needs the support of the knowledge graph

  2. Define specific business problems

  3. Data collection and preprocessing

    FAQ:

    1. What data do we already have?
    2. What data might be available, though not yet?
    3. Which part of the data can be used to reduce risk?
    4. Which part of the data can be used to construct the knowledge graph?
    5. Note that not all target-related data should go into the knowledge graph.

  4. Design of knowledge graph

    FAQ:

    1. What entities, relationships, and attributes are needed?
    2. Which attributes can be used as entities and which entities can be used as attributes?
    3. What information does not need to be included in the knowledge graph?

    Design principles:

    Business principle: Everything starts with business logic. A good design makes it easy to infer the business logic behind the knowledge graph, as well as possible future changes to the business, simply by looking at the graph.

    Analysis principle: There is no need to put entities in the graph that are unrelated to relationship analysis.

    Efficiency principle: The knowledge graph should be designed as a small, lightweight storage carrier, and information irrelevant to relationship analysis should be kept in a traditional relational database.

    Redundancy principle: Some repetitive, high-frequency information can be placed in a traditional database.

  5. Storage of knowledge graphs

    For storage, we have to choose a storage system. Since the knowledge graph we designed has properties, a graph database is the natural first choice. Which graph database to choose depends on the business volume and the efficiency requirements. If the amount of data is extremely large, Neo4j may not be able to meet the needs of the business. In that case, we can either choose systems with distributed support, such as OrientDB and JanusGraph (formerly Titan), or apply the efficiency and redundancy principles and move information into traditional databases, reducing the amount of information the knowledge graph has to carry. In general, Neo4j is sufficient.

  6. Upper-layer application development and system evaluation

    Once the knowledge graph is built, valuable information is mined from it according to requirements. From an algorithmic perspective, there are three different scenarios. The first is rule-based, with common applications such as inconsistency checking, rule-based feature extraction, and pattern-based judgment. The second is probability-based, with common applications such as community mining and clustering. The third is based on dynamic networks, where a common application is tracking how risk changes from time T to time T+1.

    The disadvantage of the probabilistic approach compared to the rule-based approach is that it requires a large amount of data. If the data volume is small and the entire graph is sparse, a rule-based approach is preferable. In finance in particular, labeled data is scarce, which is the main reason why rule-based methods are still more commonly used there.

    Given the current state of AI technology, rule-based methods are still dominant in vertical applications, but as the volume of data increases and methods improve, probabilistic models will gradually bring greater value.
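Two of the application scenarios from step 6 can be sketched on a tiny graph. The data, relation names, and both function names are hypothetical. The first function is a rule-based inconsistency check. The second groups nodes by connected components, which is a deterministic stand-in used here for simplicity; it is not one of the probabilistic community-mining methods the text refers to.

```python
from collections import defaultdict

# Hypothetical loan-application facts as (head, relation, tail) triples.
triples = [
    ("Alice", "HAS_PHONE", "555-0100"),
    ("Bob", "HAS_PHONE", "555-0100"),
    ("Alice", "REPORTS_EMPLOYER", "Acme Corp"),
    ("Bob", "REPORTS_EMPLOYER", "Globex"),
    ("Bob", "LISTS_COLLEAGUE", "Alice"),
    ("Carol", "HAS_PHONE", "555-0199"),
]


def inconsistent_colleagues(triples):
    """Rule: two people listed as colleagues should report the same employer."""
    employer = {h: t for h, r, t in triples if r == "REPORTS_EMPLOYER"}
    return [
        (h, t)
        for h, r, t in triples
        if r == "LISTS_COLLEAGUE"
        and h in employer
        and t in employer
        and employer[h] != employer[t]
    ]


def communities(triples):
    """Group nodes into connected components, ignoring edge direction."""
    adjacency = defaultdict(set)
    for h, _, t in triples:
        adjacency[h].add(t)
        adjacency[t].add(h)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


print(inconsistent_colleagues(triples))  # [('Bob', 'Alice')]
print(communities(triples))
```

The rule needs no training data at all, which illustrates why rule-based checks remain practical when labels are scarce.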

In closing

First of all, the main function of a knowledge graph is to analyze relationships, especially deep (multi-hop) relationships. So before adopting one, make sure the business actually needs it; many problems can be solved without a knowledge graph.
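A deep relationship query can be sketched as a breadth-first search that returns every node within a given number of hops. The edge list and the `within_hops` helper are hypothetical, and a graph database would normally run this kind of traversal for you.

```python
from collections import defaultdict, deque

# Hypothetical undirected links between people, phones, and employers.
edges = [
    ("Alice", "555-0100"),
    ("Bob", "555-0100"),
    ("Bob", "Globex"),
    ("Dave", "Globex"),
]


def within_hops(edges, start, max_hops):
    """Return {node: hop distance} for nodes reachable within `max_hops`."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    distance.pop(start)
    return distance


print(within_hops(edges, "Alice", 2) == {"555-0100": 1, "Bob": 2})  # True
```

With a limit of 2 hops Alice is linked to Bob through a shared phone number; raising the limit to 4 also reaches Dave through Bob's employer, which is the kind of indirect link that is awkward to express as joins in a relational database.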

One of the most important topics in knowledge graphs is knowledge reasoning, which is often regarded as a necessary step toward strong artificial intelligence. Unfortunately, many reasoning techniques discussed from the perspective of semantic networks (such as deep learning and probabilistic statistics) are difficult to implement in practical vertical applications. Unless we have very large data sets, the most effective approach is still rule-based.
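As a minimal sketch of rule-based reasoning, the function below applies a single hand-written rule repeatedly until no new facts appear (forward chaining). The rule, that the `controls` relation is transitive, and the company names are assumptions made for illustration only.

```python
def infer(triples):
    """Apply one rule to a fixed point: `controls` is transitive."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(facts):
            for b2, r2, c in list(facts):
                # Rule: (a controls b) and (b controls c) => (a controls c).
                if r1 == r2 == "controls" and b == b2 and a != c:
                    new_fact = (a, "controls", c)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts


base = [
    ("HoldCo", "controls", "SubA"),
    ("SubA", "controls", "SubB"),
    ("SubB", "controls", "SubC"),
]

# Show only the facts that were derived, in a stable order.
for fact in sorted(infer(base) - set(base)):
    print(fact)
# ('HoldCo', 'controls', 'SubB')
# ('HoldCo', 'controls', 'SubC')
# ('SubA', 'controls', 'SubC')
```

The nested loop is quadratic in the number of facts on each pass, which is fine for a sketch but would need indexing on a graph of realistic size.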

Finally, it is important to emphasize that a knowledge graph is business-focused and data-centric. Don’t underestimate the importance of business and data.

Reference for this article: blog.csdn.net/lzw17750614…